The idea of running in parallel is fine—assuming each field is filled using only one source.
In option 1) above, the plugins for an individual movie are run in series, ensuring all the current data dependencies are preserved, i.e. with time on the horizontal axis:
movie 1 imdb -> movie 1 allmovie -> movie 1 amazon
--------------> movie 2 imdb -> movie 2 allmovie -> movie 2 amazon
----------------------------------> movie 3 imdb -> movie 3 allmovie -> movie 3 amazon
The result is that all sites are accessed in parallel, each one working on a different movie. How to do this is what the separate tasks and queues in the earlier post were for.
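To make the staggered diagram concrete, here is a minimal sketch in Python of one way the tasks and queues could be wired up. It is an illustration only, not how PVD or its plugins actually work: `fetch`, `run_pipeline` and the string results are hypothetical stand-ins, and the only thing taken from the diagram is the site order imdb -> allmovie -> amazon. Each site gets one worker thread and one input queue, so a movie passes through the sites in series while different movies occupy different sites at the same time.

```python
import queue
import threading

# Plugin order taken from the diagram; everything else here is hypothetical.
SITES = ["imdb", "allmovie", "amazon"]


def fetch(site, movie):
    """Stand-in for a real plugin call (assumption: returns that site's data)."""
    return f"{movie}:{site}"


def run_pipeline(movies):
    # One input queue per site, plus a final queue for finished movies.
    queues = [queue.Queue() for _ in range(len(SITES) + 1)]

    def worker(stage):
        site = SITES[stage]
        while True:
            item = queues[stage].get()
            if item is None:
                # Sentinel: tell the next stage to stop, then exit.
                queues[stage + 1].put(None)
                break
            movie, collected = item
            # Earlier sites' data is already in `collected`, so any
            # dependency on a previous plugin's result is preserved.
            collected.append(fetch(site, movie))
            queues[stage + 1].put((movie, collected))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(SITES))]
    for t in threads:
        t.start()

    for movie in movies:
        queues[0].put((movie, []))
    queues[0].put(None)

    for t in threads:
        t.join()

    # Drain the final queue; one worker per stage keeps movies in order.
    results = {}
    while True:
        item = queues[-1].get()
        if item is None:
            break
        movie, collected = item
        results[movie] = collected
    return results


if __name__ == "__main__":
    for movie, data in run_pipeline(["movie 1", "movie 2", "movie 3"]).items():
        print(movie, data)
```

With one worker per site, no site is ever hit by more than one request at a time, which is the same load each site sees today; only the waiting between sites is overlapped.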
Even more problematic is the idea of getting only data for fields that need it. (I'm not sure this is what you meant, but it's implied by "maybe a lot faster for later incremental updates to an established PVD database.") Data changes. The only way to determine whether data needs to be updated is to compare it to what's currently available. So it's faster just to download all the data.
It would be helpful if fields set to "ignore" were omitted.
Currently, for each field we can specify:
a) Solid tick -> get value and overwrite existing value
b) Grey -> store value only if no prior data (so really only need to get it if the field is currently empty).
c) White / blank check box -> do not use this value (so no need to get it).
For a populated field, b) and c) do not need to be downloaded; for an unpopulated field, c) doesn't need to be downloaded.
The potential saving depends on the granularity of each site's web interface and on how slow a particular page is to download, for example images, the full cast list, and deeper technical pages.
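The rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the mode names, the `(page, mode, populated)` tuples and the page names are made up for the example and are not PVD's real settings or any site's real page layout. The first function applies the a) / b) / c) rule to one field; the second shows the granularity point, since a page can only be skipped when none of the fields it supplies are needed.

```python
# Hypothetical names for the three check box states described above.
OVERWRITE = "overwrite"   # a) solid tick
IF_EMPTY = "if_empty"     # b) grey
IGNORE = "ignore"         # c) white / blank


def needs_download(mode, populated):
    """Return True if a field's value has to be fetched at all."""
    if mode == OVERWRITE:
        return True            # always fetched, existing value is replaced
    if mode == IF_EMPTY:
        return not populated   # only fetched when there is no prior data
    return False               # ignored fields are never fetched


def pages_to_fetch(fields):
    """fields: iterable of (page, mode, populated) tuples.

    A page is downloaded if at least one field on it is needed.
    """
    return sorted({page for page, mode, populated in fields
                   if needs_download(mode, populated)})


if __name__ == "__main__":
    # Page names are invented for illustration.
    fields = [
        ("main", OVERWRITE, True),
        ("fullcredits", IF_EMPTY, True),
        ("technical", IGNORE, False),
        ("main", IF_EMPTY, False),
    ]
    print(pages_to_fetch(fields))
```

In this example only the "main" page would be downloaded; the slower cast and technical pages are skipped entirely, which is where most of the saving would come from.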