Author Topic: IMDb plugin and Titles with reserved characters  (Read 9032 times)

Offline Data1001

  • User
  • ***
  • Posts: 50
IMDb plugin and Titles with reserved characters
« on: January 22, 2012, 10:23:16 am »
First of all, I'd like to say thanks once again Nostra for all your tireless efforts in making such a terrific piece of software, and to you, Rick and others for being so helpful here on the boards.

I'm noticing something with the IMDb import plugin (and it may be an across-the-board plugin issue, I don't know), regarding punctuation. Specifically, PVD seems to ignore titles listed on IMDb with certain types of punctuation (returning another version of the title instead), and even when the punctuation matches both in PVD and IMDb, it can't seem to find the right title.

Examples:

I had the Beatles movie "Help!" listed in my database, and ran the IMDb import plugin. Even though the way I had written the title in PVD exactly matched the way it was listed on IMDb (with the exclamation point at the end), it did not return the proper title as an option -- instead, it gave me the two choices of "Call for Help" (1998) and "A Cry for Help 2" (2002). If I enter the title in PVD simply as "Help", without the exclamation point at the end, then the IMDb plugin does return the Beatles film, along with several dozen other choices.

When running the IMDb import plugin on the PVD title of "And Justice For All" (spelled that exact way in the database), it does not even return the option to choose, but instead retrieves a Michael Moore documentary (http://www.imdb.com/title/tt0207965/) by that name, instead of the Al Pacino title, which IMDb spells with an ellipsis (three dots) at the beginning of the title. But here's the crazy part -- I actually had written the title in PVD the way I remembered it, with the ellipsis -- "...And Justice For All" -- but again, the Al Pacino movie was not returned as an option; instead this 2012 title was automatically retrieved: http://www.imdb.com/title/tt1988529/. Needless to say, this was throwing me, as I was sure I had the title right, so I opened up my browser and searched IMDb directly. Turns out they have the title spelled not only with the leading ellipsis, but with an extra period at the end, like "...And Justice for All.". Now, I'd not seen it spelled that way, but the IMDb import plugin should have returned the correct title as an option, nonetheless. So it was being confused both by the periods at the beginning and the period at the end.

Seems that perhaps it needs to be told to ignore punctuation, both outgoing and incoming, yes?

(P.S. I am running the latest beta version of PVD, and the latest version of the IMDb import plugin.)

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: IMDb plugin and Titles with reserved characters
« Reply #1 on: January 22, 2012, 01:28:15 pm »
Understand the plugin can only request a search, then parse the search results page returned. Most of these kinds of issues you would see using the host directly, although you're likely to fare better with the actual results page.

In the case of Help!, the '!' is a reserved character and therefore sent as '%21'. I wonder if that's necessary, as it's part of the query component of the URL, but it doesn't seem to be incorrect. IMDb, however, doesn't handle it properly. Ironically, it seems to handle an actual '!' by ignoring it.
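
For illustration only, here is a minimal Python sketch of that encoding step. It is not the plugin's code (the thread does not show it); it just uses the standard library's urllib.parse.quote to show how '!' turns into '%21', and how marking it as safe leaves it alone.

```python
from urllib.parse import quote

title = "Help!"

# By default quote() percent-encodes '!', because it is not in the
# always-safe set (letters, digits, and "_.-~").
encoded = quote(title)

# Listing '!' in the safe parameter leaves it as a literal character.
literal = quote(title, safe="!")

print(encoded)  # Help%21
print(literal)  # Help!
```

Whether a given host treats the two forms the same way is up to the host, which is the behaviour being questioned here.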

As for ...And Justice For All. the period at the end doesn't seem right. But there's nothing the plugin can do about that. And looking at the search results page, it's obvious what the problem is.

These search exceptions (whether IMDb or any other source) are small enough in number they're probably best handled by manually searching using a browser, and getting the URL of the target page. If you happen to know the host will have a problem with a special character, you might find it easier to just remove or change it, and then run the plugin. Even if the frequency of exceptions could be reduced by modifying the plugin, they can't be eliminated. There's nothing a plugin can do about a flaky search engine at the host.

[Edit] I now wonder, however, as a result of the following exchange, if changing the handling of reserved characters in the search string portion of the URI might reduce the number of such exceptions. :-\
« Last Edit: January 23, 2012, 12:26:31 am by rick.ca »

Offline Morgenstern72

  • User
  • ***
  • Posts: 39
Re: IMDb plugin and Titles with reserved characters
« Reply #2 on: January 22, 2012, 09:54:48 pm »
Movies with "/" in their names (like "20/20") produce an HTTP error (see picture) when trying to update movie information.

Version 0.9.9.21 portable

[attachment deleted by admin]
PVD 1.0.2.7
Windows 7 Pro 64bit

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: IMDb plugin and Titles with reserved characters
« Reply #3 on: January 23, 2012, 12:17:12 am »
What plugin are you using? '/' is a reserved character that has to be replaced (with '%2F'). The host search (e.g., IMDb) may not handle that correctly, but it shouldn't cause a 404 error.

When I try (in 1.0.2.2) updating '20/20' using the IMDb plugin, the search fails because IMDb does not interpret the replacement correctly. This is exactly the same issue mentioned above [topics merged]. Maybe nostra can clarify... When part of the query component of a URI, why is it necessary to replace reserved characters? When done from a browser,

http://www.imdb.com/find?s=tt&q=20/20 works fine, while (what the program does)...

http://imdb.com/find?s=tt&q=20%2F20 fails, because IMDb converts it to...

http://www.imdb.com/find?s=tt&q=20%252F20 (the '%' is replaced with '%25').
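
The '%2F' to '%252F' step looks like ordinary double encoding: an already-encoded string gets percent-encoded a second time. A small Python sketch of the effect, using only the standard library (this illustrates the symptom, not what IMDb's servers actually run):

```python
from urllib.parse import quote, unquote

title = "20/20"

# First pass: '/' becomes '%2F' (safe="" means nothing is exempt).
once = quote(title, safe="")

# Second pass on the already-encoded text: the '%' itself becomes '%25'.
twice = quote(once, safe="")

print(once)   # 20%2F20
print(twice)  # 20%252F20

# Decoding once only undoes the outer layer, so a search would see
# the literal text '20%2F20' rather than '20/20'.
print(unquote(twice))  # 20%2F20
```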

I note my usual practice of changing reserved characters that I know cause problems fails in this case. Nothing is going to find '20/20' but a search for '20/20'. What I would normally do after such a failure is go to the site and do the search manually.
« Last Edit: January 23, 2012, 12:21:44 am by rick.ca »

Offline goddert

  • User
  • ***
  • Posts: 39
Re: IMDb plugin and Titles with reserved characters
« Reply #4 on: January 24, 2012, 09:28:35 am »
@rick.ca

Quote
'/' is a reserved character that has to be replaced (with '%2F').

This isn't quite correct.

In general '/' is a reserved character, but in the query part of a URI (everything after the '?') the '/' no longer acts as a delimiter: it doesn't separate one part of the URI from another there. So it may be percent-encoded, but it doesn't have to be. So the behaviour of IMDb in this case is absolutely correct. It is IMDb's choice whether to handle the percent-encoded form or not.
Unfortunately I doubt that IMDb will change it, so I fear we have to work around it.
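
One possible workaround, sketched in Python under assumptions: the function name build_search_url and its keep_slash flag are hypothetical, and only the host, path and the 's' and 'q' parameters come from the URLs quoted in this thread. The idea is to leave '/' literal in the query string while still encoding everything else.

```python
from urllib.parse import urlencode

# Host and path as they appear in the URLs quoted in this thread.
SEARCH_BASE = "http://www.imdb.com/find"


def build_search_url(title: str, keep_slash: bool = True) -> str:
    """Build a title-search URL (hypothetical helper, not PVD's code).

    With keep_slash=True, '/' is passed through unencoded in the query,
    which RFC 3986 permits; otherwise it is sent as '%2F'.
    """
    safe = "/" if keep_slash else ""
    query = urlencode({"s": "tt", "q": title}, safe=safe)
    return f"{SEARCH_BASE}?{query}"


print(build_search_url("20/20"))
# http://www.imdb.com/find?s=tt&q=20/20
print(build_search_url("20/20", keep_slash=False))
# http://www.imdb.com/find?s=tt&q=20%2F20
```

Other characters (spaces, '&', '#') still get encoded, since leaving those raw would change the meaning of the URL.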


Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: IMDb plugin and Titles with reserved characters
« Reply #5 on: January 24, 2012, 10:51:22 am »
I don't quite understand. How is it the character may be percent encoded, but IMDb is absolutely correct in not handling it? Regardless, it seems you're confirming what I suspected—special characters in the query section should not be percent encoded. And I guess that should apply to any source, not just IMDb (although I don't know if we have the same problem with any other plugins).

Offline goddert

  • User
  • ***
  • Posts: 39
Re: IMDb plugin and Titles with reserved characters
« Reply #6 on: January 24, 2012, 11:22:19 am »
Yes, you're right. My explanation wasn't very good ...

Because '/' is not a strictly reserved character in a query, a site operator is free to accept only the literal '/'. That is what IMDb is doing.
Nobody is obliged to accept percent-encoded characters where encoding is not mandatory. If it were a strictly reserved character, you would be obliged to percent-encode it and the site operator would be obliged to parse the percent-encoded form (and indeed, if you use a '/' as data in a non-query part, you are).

I'm not saying that it wouldn't be nicer if IMDb parsed the %2F, but I fear they won't. Anyway, I'll write them an email ... we'll see whether they respond ...

Offline goddert

  • User
  • ***
  • Posts: 39
Re: IMDb plugin and Titles with reserved characters
« Reply #7 on: January 24, 2012, 12:06:12 pm »
If you put a "www" in front of the URL it works ...
http://www.imdb.com/find?s=tt;q=20%2F20

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: IMDb plugin and Titles with reserved characters
« Reply #8 on: January 25, 2012, 02:46:59 am »
Quote
If you put a "www" in front of the URL it works ...

Now that's interesting. Maybe the fix is an easy one. :-\

 
