Author Topic: [SOLVED] Multi threaded IMDB fetching? (Read 49564 times)

Happy2k · « **Reply #20 on:** August 27, 2010, 09:06:19 pm »

Quote from: nostra on August 27, 2010, 08:48:10 pm

Quote
But I really like PVD - it was hard getting it to work properly in the beginning, but the possibilities are endless.

Unfortunately it seems like many users have difficulties when they start using PVD. If you have suggestions on how to make the process easier for beginners, then feel free to post in the Feature Suggestions board.

Tutorials would probably help a lot. A video tutorial on how to add movies and series, and how to fix common errors.
If I get any other ideas, I will post them in the suggestions board.

patch · « **Reply #21 on:** August 27, 2010, 11:48:47 pm »

Quote from: Happy2k on August 27, 2010, 09:06:19 pm

Tutorials would probably help a lot. A video tutorial on how to add movies and series, and how to fix common errors.

Better still, update the wiki while you are still learning and have your best appreciation of a beginner's experience.

rick.ca · « **Reply #22 on:** August 28, 2010, 12:08:15 am »

Now there's an idea! Videos for every possible help topic could be produced, and a PVD database created to catalogue them—along with any textual help that might be available (e.g., a wiki or forum topic). Users could add the videos to their collection and import the catalogue. Help topics could be found using Search or Advanced search, and the videos played directly from PVD.

Do we have any volunteers for this project?

patch · « **Reply #23 on:** August 28, 2010, 02:13:00 am »

Quote from: Happy2k on August 27, 2010, 10:55:30 am

Quote from: rick.ca on August 27, 2010, 02:25:24 am
Quote
Multi-threaded IMDb fetching

That's an interesting idea for other reasons (e.g., update speed), but I would be very surprised if IMDb did not ban IPs making multiple simultaneous requests. I'm actually surprised we're able to get away with what we are doing.

Before using PVD, I was trying out different programs. Most of them had a very fast IMDb fetch. I don't think IMDb would care - I've never heard of IMDb IP-banning someone.
I wouldn't mind being a guinea pig for testing out multi-threaded IMDb fetching.

I strongly suspect most time is spent waiting for the remote web sites to respond and downloading data.
PVD is likely to be slower than some other programs as it downloads more data.
Trying to increase the download speed by having multiple requests from the same user / program going to these databases is likely to cause more maintenance problems: the HTML request stream generated will put further stress on these remote databases, will not generate income for them, and will differ more from the browsing user for whom they were designed and marketed.

Things which are at least theoretically possible:
1) Run different plugins / web site queries in parallel. By this I mean that if for a movie you are getting information from imdb, allmovie, and amazon, then for each movie the sites would still be accessed in series, but the 3 sites could be accessed in parallel by overlapping the accesses for different movies. Implementing this would probably mean calling different plugins from different tasks with a queue between each, so it would involve considerable change to the plugin calling code but fewer changes to the individual plugins. Maximum speed-up of x3 for 3 plugins run in parallel.

2) Only information which is going to be actually stored in PVD could be downloaded. By this I mean PVD could look at the field flags set for the plugin and the data already in PVD and determine what pages actually need to be downloaded for each website - PVD movie update. The speed benefit would be minimal when initially populating an empty PVD movie record from the first web site, so it would be no faster than 1) for a mass movie import, but maybe a lot faster for later incremental updates to an established PVD database. From a coding perspective it would be a major change to the plugin architecture, with each plugin being passed information on which fields need to be filled, then each plugin being coded to selectively download web pages as required for each PVD movie record update. So it implies a re-write of each plugin for which selective access is practical, and considerable changes to the plugin calling code.

Of course I have no idea how hard any of this is for nostra but the theoretical discussion is still interesting. None of it sounds easy to me so I'm not confident any of it will be implemented.

rick.ca · « **Reply #24 on:** August 28, 2010, 03:09:37 am »

The idea of running in parallel is fine—assuming each field is filled using only one source. But most multi-source configurations are going to include some degree of "get this from the first source, but if there's nothing there use the second source." That could be handled, but it would make things a lot more complicated—and probably lose much of the speed improvement that might otherwise be achieved.

Even more problematic is the idea of getting only data for fields that need it. (I'm not sure this is what you meant, but it's implied by "maybe a lot faster for later incremental updates to an established PVD database.") Data changes. The only way to determine whether data needs to be updated is to compare it to what's currently available. So it's faster just to download all the data.

It would be helpful if fields set to "ignore" were omitted. This would make no difference to the "average" plugin configuration, where only a small number of fields are ignored. For special-purpose updates of one or a few fields (e.g., updating the IMDb Top 250 rank and votes), however, it would probably be considerably faster. This applies only to plugins. For scripts, a different version of a script can be created for downloading only the data required, if that would be faster in a particular circumstance.

Happy2k · « **Reply #25 on:** August 28, 2010, 04:50:17 am »

Quote from: rick.ca on August 28, 2010, 12:08:15 am

Now there's an idea! Videos for every possible help topic could be produced, and a PVD database created to catalogue them—along with any textual help that might be available (e.g., a wiki or forum topic). Users could add the videos to their collection and import the catalogue. Help topics could be found using Search or Advanced search, and the videos played directly from PVD.

Do we have any volunteers for this project?

I would, if I wasn't starting a new semester on Monday. An up-to-date wiki could help a lot - it shouldn't be too hard to fix some topics, so hopefully I get some time to do that.

patch · « **Reply #26 on:** August 28, 2010, 05:07:15 am »

Quote from: rick.ca on August 28, 2010, 03:09:37 am

The idea of running in parallel is fine—assuming each field is filled using only one source.

In option 1) above the plugins for an individual movie are run in series, ensuring all the current data dependencies are preserved, i.e. with time on the horizontal axis:

movie 1 imdb -> movie 1 allmovie -> movie 1 amazon
--------------> movie 2 imdb -> movie 2 allmovie -> movie 2 amazon
----------------------------------> movie 3 imdb -> movie 3 allmovie -> movie 3 amazon

This results in all sites being accessed in parallel. How to do this is what the separate tasks and queues in the earlier post were for.
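
The overlap in the diagram above can be sketched with one worker per site and a queue between each pair of workers. This is only a rough Python illustration under stated assumptions, not PVD's plugin code: `fake_fetch`, `stage` and `run_pipeline` are made-up names, and the fetch is simulated so nothing touches a real web site.

```python
import queue
import threading

SITES = ["imdb", "allmovie", "amazon"]  # order in which plugins run for each movie
_DONE = object()  # sentinel that tells a stage to shut down


def fake_fetch(site, movie):
    """Hypothetical stand-in for a plugin call; no network access."""
    return f"{site}:{movie}"


def stage(site, inbox, outbox, fetch):
    """Handle one movie at a time for a single site, then pass it on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        movie, results = item
        results.append(fetch(site, movie))
        outbox.put((movie, results))


def run_pipeline(movies, fetch=fake_fetch):
    """Send every movie through all sites in order, with the stages overlapped."""
    queues = [queue.Queue() for _ in range(len(SITES) + 1)]
    workers = [
        threading.Thread(target=stage, args=(site, queues[i], queues[i + 1], fetch))
        for i, site in enumerate(SITES)
    ]
    for worker in workers:
        worker.start()
    for movie in movies:
        queues[0].put((movie, []))
    queues[0].put(_DONE)

    finished = {}
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        movie, results = item
        finished[movie] = results
    for worker in workers:
        worker.join()
    return finished
```

Calling `run_pipeline(["movie 1", "movie 2", "movie 3"])` gives each movie its results in imdb, allmovie, amazon order. Because each site has a single worker, no site ever sees more than one request at a time from this sketch; only different sites are busy concurrently.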

Quote

Even more problematic is the idea of getting only data for fields that need it. (I'm not sure this is what you meant, but it's implied by "maybe a lot faster for later incremental updates to an established PVD database.") Data changes. The only way to determine whether data needs to be updated is to compare it to what's currently available. So it's faster just to download all the data.

It would be helpful if fields set to "ignore" were omitted.

Currently for each field we can specify,
a) Solid tick -> get value and overwrite existing value
b) Grey -> store value only if no prior data (so really only need to get it if field is currently empty).
c) White / blank check box -> do not use this value (so no need to get it).

For a populated field, b) and c) do not need to be downloaded; for an unpopulated field, c) doesn't need to be downloaded.
The potential saving depends on the granularity used by each site's web interface and how slow a particular page is to download, for example images, the full cast list, deeper technical pages, etc.
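
The a) / b) / c) rule above can be written down as a small decision function. This is a sketch only, in Python, under assumptions of my own: the `Flag` names, the field names and the page names ("main", "fullcredits", "media") are hypothetical and do not come from PVD or any plugin.

```python
from enum import Enum


class Flag(Enum):
    OVERWRITE = "solid tick"  # a) always get the value and overwrite
    IF_EMPTY = "grey"         # b) store only when the field has no prior data
    IGNORE = "blank"          # c) never use the value


def needs_download(flag, populated):
    """Apply the a) / b) / c) rule to one field."""
    if flag is Flag.OVERWRITE:
        return True
    if flag is Flag.IF_EMPTY:
        return not populated
    return False


def pages_to_fetch(field_pages, flags, record):
    """Return the set of pages that must be downloaded for one movie record.

    field_pages maps a field name to the page it is scraped from
    (a hypothetical mapping; a real plugin would define its own).
    """
    pages = set()
    for field, page in field_pages.items():
        populated = bool(record.get(field))
        if needs_download(flags.get(field, Flag.IGNORE), populated):
            pages.add(page)
    return pages
```

With flags of grey for title and cast, solid tick for rating, and blank for poster, a record that already has a title and cast only needs the page carrying the rating. The saving therefore depends on how fields are spread across pages, which is the granularity point made above.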

patch · « **Reply #27 on:** August 28, 2010, 05:25:44 am »

Quote from: nostra on August 27, 2010, 08:48:10 pm

Quote
But I really like PVD - it was hard getting it to work properly in the beginning, but the possibilities are endless.

Unfortunately it seems like many users have difficulties when they start using PVD. If you have suggestions on how to make the process easier for beginners, then feel free to post in the Feature Suggestions board.

I suspect the problem is that there is a trade-off between a program's ease of use and its flexibility.
What we could do is focus on new users' initial experience, because if this can be made positive then many will see reason to put in enough time to see the benefit of the more advanced features.

Some ideas to help achieve this:
1) Wiki should open on a contents and search page focused on PVD not the wiki engine.
2) Clear simplistic tutorials for basic initial set up tasks.
3) Let advanced users, rather than beginners, be the ones who have to burrow through multiple pages to get to the things relevant to them. As such documenting program use instructions in a forum is the opposite to what we need.

rick.ca · « **Reply #28 on:** August 28, 2010, 06:05:42 am »

Quote

In option 1) above the plugins for an individual movie are run in series ensuring all the current data dependencies are preserved...

OIC. Yes, that would work.

So do you think running multiple versions of the program, each simultaneously running the plugin, would be a valid test of whether IMDb would tolerate this kind of hammering?

Quote

For a populated field, b) and c) do not need to be downloaded; for an unpopulated field, c) doesn't need to be downloaded.

Yes, this is essentially what I was referring to in the last paragraph of my post. I don't think it matters much if there isn't much of a time saving in the general case, as long as there is a big saving in cases where only a few fields are required. But the savings are only going to happen to the extent pages don't have to be downloaded. If only three fields need data, but they come from three different pages, it's not going to be much faster. :-\

Quote

As such documenting program use instructions in a forum is the opposite to what we need.

I, of course, disagree with this. We have a wiki. It doesn't seem to be used much, and there's quite obviously no one with any interest in contributing to it. So the evidence seems to be to the contrary. No one is going to argue that things like tutorials or videos or help files wouldn't be nice to have. But it seems clear no one is willing to create them. Besides, most people are still going to find software like this a bit of a challenge to learn, regardless of the documentation available. So they're going to be back here asking for help anyway.

buah · « **Reply #29 on:** August 28, 2010, 09:09:51 am »

Maybe we should try to make videos on specific issues from now on? The one who resolved the issue could make a video tutorial for it, or someone else who's willing to. A new "Video Tutorials" board could be established for that purpose...

patch · « **Reply #30 on:** August 28, 2010, 09:48:21 am »

Quote from: rick.ca on August 28, 2010, 06:05:42 am

So do you think running multiple versions of the program, each simultaneously running the plugin, would be a valid test of whether IMDb would tolerate this kind of hammering?

From an accessed site perspective what is suggested will demand the same or less than the current approach so the risk is low there.
The concurrency happens at different web sites.

Running multiple versions of PVD partially tests a much higher risk implementation imo.

Quote

Quote
As such documenting program use instructions in a forum is the opposite to what we need.

I, of course, disagree with this. We have a wiki. It doesn't seem to be used much, and there's quite obviously no one with any interest in contributing to it. So the evidence seems to be to the contrary.

The suggestion was to address nostra's observation that the perceived PVD complexity is a significant barrier to a new novice user.

Quote

Unfortunately it seems like many users have difficulties when they start using PVD.

I agree we all tend to be more interested in addressing issues relevant to what we do. Unfortunately the issues are different for the novice compared to a regular forum contributor.
The forum is a good way to explore new and complex problems.
It is not a good guide for a first-time user who just wants to try out PVD to see if it meets their needs. Many will walk away.

So we interpret the evidence differently.

rick.ca · « **Reply #31 on:** August 28, 2010, 12:30:47 pm »

Quote

The concurrency happens at different web sites.

If you're suggesting concurrent download of different movies from IMDb (or perhaps concurrent download of the various different pages of one movie) is not feasible, then it doesn't seem worthwhile to pursue the idea further.

Quote

The suggestion was to address nostra's observation that the perceived PVD complexity is a significant barrier to a new novice user.

I understand why the comment was made. To suggest the forum is good for some things and not others just doesn't make any sense. It seems we've proven a wiki not integrated with the software's home (this forum) doesn't work very well. The fact it offers more features and a format more convenient for software documentation is pointless if it's not going to be used. Just like the idea of tutorials, manuals, help files and videos is pointless if no one is going to create them. But if anyone does care to create any of these things, they can be posted here, where they will get the most exposure and can be integrated with everything else that goes on here.

Quote

It is not a good guide for a first-time user who just wants to try out PVD to see if it meets their needs.

Why not? If a well-written tutorial would do the trick, such a thing (or a collection of such things) can just as easily be posted here as anywhere else. And here, new users (who are going to have difficulties regardless of how good the "documentation" is) can ask questions and get help for their specific issues. And if no one has the time or inclination to create a formal help document, forum discussions themselves will provide a reasonable alternative. I always try to answer questions in a way that will also benefit others reading the exchange (at the time or later). I hope you're not suggesting I'm wasting my time.

Quote

Many will walk away.

Of course they will. So what? You have no evidence of who, how many, why or what would make any difference. Maybe more would walk away if they tried some clear instruction and still had difficulty. They might reasonably conclude there's something wrong with the software, or it's just not for them. Who knows? Maybe that's what's happening to many who go only to the wiki for help, while those who come here learn from others that the program is a challenge to learn, but mainly because of its power and flexibility. I believe the first and most important thing any user who is ever likely to develop a serious interest in this software needs to learn is that it is worth their while to learn it. The only credible source for that information is the personal advice of other experienced users, reinforced by their own experience. It seems to me this forum is the perfect means for facilitating that.

Happy2k · « **Reply #32 on:** August 28, 2010, 01:47:18 pm »

From my point of view, all that is needed is a "Getting started tutorial". It should contain:

- How to scan folders for movies
- How to fix movies like Transporter 2 and 3 being grouped as episodes
- How to update info
- How to add a movie if the auto-detect got no correct results
- How to run silent mode, and search for skipped movies
- How to import posters located in the same folder as the movie itself
- How to customize the export (found the template list via Google - couldn't find it anywhere here)
- How to add series
- How to get it to display the episode titles correctly

Just my 5 cent

rick.ca · « **Reply #33 on:** August 28, 2010, 08:39:02 pm »

Quote

From my point of view, all that is needed is a "Getting started tutorial". It should contain...

That, more or less, is the intent behind the Help Index. Anyone is welcome to post a tutorial there. Feel free to use an edited version of a post or thread found elsewhere, or excerpts from the wiki. Add a link to that topic at the top, and give it a meaningful "How to" title. I'll add each such post to the index (i.e., the first post). That's getting a bit messy, but I'd be happy to redo that if there's more content.

An advantage to this approach is that we can openly discuss and collaborate on the creation of such tutorials right in that thread. Users who have difficulty following a tutorial can say so right there. Others could then help them and suggest improvements to the tutorial at the same time. Such posts would probably need to be moved to separate topics (or a "Help Index Discussion" topic) after each discussion is over, but that's easy to do.

nostra · « **Reply #34 on:** August 28, 2010, 11:39:31 pm »

Quote

1) Run different plugins / web site queries in parallel. By this I mean that if for a movie you are getting information from imdb, allmovie, and amazon, then for each movie the sites would still be accessed in series, but the 3 sites could be accessed in parallel by overlapping the accesses for different movies. Implementing this would probably mean calling different plugins from different tasks with a queue between each, so it would involve considerable change to the plugin calling code but fewer changes to the individual plugins. Maximum speed-up of x3 for 3 plugins run in parallel.

I am afraid that running multiple different plugins will be too difficult to implement and could also cause problems such as the user not knowing exactly what is happening, or the user getting 3 movie selection windows simultaneously. I do think that a plugin that downloads all the pages it needs from the same source at the same time would work much faster, if the source allows it. Such a system is also not too difficult to implement, but you will need to update the plugin to support multithreading.
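
As a rough illustration of a plugin fetching all of its pages from one source at the same time, here is a minimal Python sketch using a thread pool. It is not how PVD plugins are written: `fake_download` and `fetch_pages` are hypothetical names, the download is simulated, and the cap of 3 workers is an arbitrary assumption about what a source might tolerate.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_download(url):
    """Hypothetical stand-in for an HTTP GET; returns a placeholder body."""
    return f"<html>{url}</html>"


def fetch_pages(urls, download=fake_download, max_workers=3):
    """Download every page for one movie concurrently.

    Executor.map yields results in the same order as the input urls,
    so each body can be paired back with its url.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(download, urls))
    return dict(zip(urls, bodies))
```

A plugin would pass a real download function in place of `fake_download`. Keeping `max_workers` small limits how many simultaneous requests the source sees, which is the concern raised earlier in the thread.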

Quote

2) Only information which is going to be actually stored in PVD could be downloaded. By this I mean PVD could look at the field flags set for the plugin and the data already in PVD and determine what pages actually need to be downloaded for each website - PVD movie update. The speed benefit would be minimal when initially populating an empty PVD movie record from the first web site, so it would be no faster than 1) for a mass movie import, but maybe a lot faster for later incremental updates to an established PVD database. From a coding perspective it would be a major change to the plugin architecture, with each plugin being passed information on which fields need to be filled, then each plugin being coded to selectively download web pages as required for each PVD movie record update. So it implies a re-write of each plugin for which selective access is practical, and considerable changes to the plugin calling code.

The speed benefit will be too low to consider.

Quote

From my point of view, all that is needed is a "Getting started tutorial". It should contain:

- How to scan folders for movies
- How to fix movies like Transporter 2 and 3 being grouped as episodes
- How to update info
- How to add a movie if the auto-detect got no correct results
- How to run silent mode, and search for skipped movies
- How to import posters located in the same folder as the movie itself
- How to customize the export (found the template list via Google - couldn't find it anywhere here)
- How to add series
- How to get it to display the episode titles correctly

Just my 5 cent

+

Quote

1) Wiki should open on a contents and search page focused on PVD not the wiki engine.
2) Clear simplistic tutorials for basic initial set up tasks.
3) Let advanced users, rather than beginners, be the ones who have to burrow through multiple pages to get to the things relevant to them. As such documenting program use instructions in a forum is the opposite to what we need.

I think I will implement some kind of tutorial or online help in version 1. I do not have time for 1) and 3).

Quote

Maybe we should try to make videos on specific issues from now on? The one who resolved the issue could make a video tutorial for it, or someone else who's willing to. A new "Video Tutorials" board could be established for that purpose...

I think it is a good idea. Any volunteers?

mgpw4me@yahoo.com · « **Reply #35 on:** August 29, 2010, 12:08:00 am »

Video tutorials are an interesting idea. What tools do we (as a group) know of that would make this a reality?

nostra · « **Reply #36 on:** August 29, 2010, 12:30:07 am »

Jing is free and pretty good for such stuff: http://www.techsmith.com/jing/

buah · « **Reply #37 on:** August 29, 2010, 11:44:49 am »

You can count on me. I could make videos on issues that I helped resolve or was involved in. The only thing we have to keep in mind is that we all use different skins with different custom fields, which may confuse people (newbies, or others).

Happy2k · « **Reply #38 on:** August 29, 2010, 03:15:52 pm »

Quote from: buah on August 29, 2010, 11:44:49 am

You can count on me. I could make videos on issues that I helped resolve or was involved in. The only thing we have to keep in mind is that we all use different skins with different custom fields, which may confuse people (newbies, or others).

Use the standard skin then. It's what a newbie would use, and newbies are who the tutorials are aimed at.

rick.ca · « **Reply #39 on:** August 29, 2010, 05:32:28 pm »

Quote from: buah on August 29, 2010, 11:44:49 am

The only thing we have to keep in mind is that we all use different skins with different custom fields, which may confuse people (newbies, or others).

I think you would have to judge that based on the nature of the video. If it's confusing or distracting, you may have to use the standard skin. But in most cases, it should be fine. An effective video will keep the focus on the actions being taken, not the appearance. Of more concern would be menus and captions modified using a custom language file. Imagine a newbie wondering, "Why isn't that command on my menu?!"
