Author Topic: Dealing with movie duplicates  (Read 13496 times)

Offline AimHere

  • Older Power User
  • *****
  • Posts: 213
Dealing with movie duplicates
« on: December 28, 2010, 07:29:26 pm »
Hi,

I'm having a minor issue with duplication of movies in my database. Here's what's happening:

I like to maintain a [reasonably] complete filmography for each of the actors in my database. So, for each actor, I use the IMDB People plugin to import the filmography from IMDB. The plugin is set to overwrite filmographies, and "Merge filmography" selected in Preferences.

This works well enough most of the time. But occasionally, for certain movies, IMDB lists a different release year than the one I already have in my database record for that movie. This causes the actor's filmography to have TWO entries with that movie title, one for the complete movie record I already had, and another linking to an "invisible" movie record with the release year from IMDB.

Now, I can usually get rid of the duplicate entry by editing my "complete" movie record to match the release year from IMDB (this has the effect of merging the two records, as long as the titles match exactly). Occasionally, though, I also have to take the additional step of clicking on the imported IMDB title in the actor's filmography to make the movie record visible in Movies View, then holding down the CTRL key while deleting it.
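The symptoms above can be illustrated with a toy model. It assumes records are matched on the exact title plus the release year; that rule is inferred from the behaviour described in this post, not taken from PVD's documentation, and every name in the sketch is hypothetical.

```python
# Toy model only: assumes a movie record is identified by (exact title, year).
# This is inferred from the behaviour described in the post, not PVD's actual code.

def import_filmography(db, filmography):
    """Link each credit to a matching record, creating an invisible one if none matches."""
    for title, year in filmography:
        key = (title, year)
        if key not in db:
            db[key] = {"visible": False}  # placeholder record created by the import
    return db


def duplicate_titles(db):
    """Return titles that appear under more than one release year."""
    years_by_title = {}
    for title, year in db:
        years_by_title.setdefault(title, set()).add(year)
    return {t: sorted(y) for t, y in years_by_title.items() if len(y) > 1}


db = {("Some Movie", 2009): {"visible": True}}   # year taken from the other source
import_filmography(db, [("Some Movie", 2010)])   # IMDB lists a different year
print(duplicate_titles(db))  # {'Some Movie': [2009, 2010]}
```

Under that assumption, changing the year on the complete record to 2010 would make both entries share one key, which matches the "merging" effect described above.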

What I'd like to know is, is there a simple way to scan through all of my actor filmographies to find these duplicates? Right now, I'm dealing with them as I notice them, but I suspect there are a ton more I haven't found yet. I don't know if the "Delete movie duplicates" in "Tools/Optimize Database" will do what I want, and I'd rather not risk having it delete any "complete" movie records.

Aimhere

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: Dealing with movie duplicates
« Reply #1 on: December 28, 2010, 11:52:18 pm »
Quote
The plugin is set to overwrite filmographies, and "Merge filmography" selected in Preferences.

Why do you have "Merge filmography" selected? If it wasn't, running the plugin would fix the problem by replacing everything with the person's current filmography.

Quote
IMDB lists a different release year than the one I already have in my database record for that movie.

Why are they different?

Quote
I don't know if the "Delete movie duplicates" in "Tools/Optimize Database" will do what I want, and I'd rather not risk having it delete any "complete" movie records.

I believe the only way that could happen is if both records are visible. That's not the case in the scenario you describe. But if you're concerned about that possibility, you can still run the utility with the option to remove only invisible records.

Offline AimHere

  • Older Power User
  • *****
  • Posts: 213
Re: Dealing with movie duplicates
« Reply #2 on: December 30, 2010, 05:17:51 pm »
Quote
Quote
The plugin is set to overwrite filmographies, and "Merge filmography" selected in Preferences.

Why do you have "Merge filmography" selected? If it wasn't, running the plugin would fix the problem by replacing everything with the person's current filmography.

What happens if I don't have "merge filmography" selected, and I have a movie in my database that ISN'T in IMDB's filmography? (IMDB isn't always complete, after all.) What happens to my movie record? Would the actor's filmography in PVD continue to list that movie, even if it wasn't found on IMDB? Or would the filmography omit it, thus disconnecting the movie from the actor? I thought that's what "Merge filmographies" was supposed to prevent?

Quote
Quote
IMDB lists a different release year than the one I already have in my database record for that movie.

Why are they different?

Because I import most of the movie info (posters/box art, studio, descriptions, etc.) from another source, AdultDVDEmpire. (In this genre, IMDB rarely has any info other than the title and cast.) And the two sources just happen to disagree on release year.

In my typical workflow, I'll add a movie, import the movie info from ADE, confirm the list of cast members, go to each cast member in turn, and update the person's filmography from IMDB. It's that last step that's giving me the duplicate movies in the filmography, when ADE and IMDB disagree on the release year.

Quote
Quote
I don't know if the "Delete movie duplicates" in "Tools/Optimize Database" will do what I want, and I'd rather not risk having it delete any "complete" movie records.

I believe the only way that could happen is if both records are visible. That's not the case in the scenario you describe. But if you're concerned about that possibility, you can still run the utility with the option to remove only invisible records.

"Delete movie duplicates" (unlike "Delete movie orphans") doesn't have an option to only remove invisible records.

Okay, as a test I made a copy of my database, then ran "Optimize database" with "Delete movie duplicates" selected. It did delete some 300+ movies. Unfortunately, it removed eight (out of 2009) visible movie records, i.e. movies that were legitimately in my collection. I don't know why this would happen; I'm certain I didn't have any actual duplicates. It's possible there may have been a couple of movies (from different studios) with identical titles, but whenever I run across those, I add the studios to the titles to avoid confusion...

Aimhere
« Last Edit: December 30, 2010, 05:34:44 pm by AimHere »

Offline rick.ca

  • Global Moderator
  • *****
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: Dealing with movie duplicates
« Reply #3 on: December 30, 2010, 08:34:18 pm »
Quote
I thought that's what "Merge filmographies" was supposed to prevent?

Yes, not using that option results in the existing filmography being replaced. The only good reason for using it is the situation you didn't describe at first. The cause of your issue is the inconsistency in data between two sources.

Quote
And the two sources just happen to disagree on release year.

Before updating the people, update the movie with the IMDb plugin set to overwrite Title and Year, and to not change credits.

Quote
"Delete movie duplicates" (unlike "Delete movie orphans") doesn't have an option to only remove invisible records.

I know that. I was only stating what I believe based on my experience. I've never noticed it delete a visible movie. You could export before and after lists of visible movies and compare those in Excel to determine what was deleted. That should indicate why the deletions are happening.
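If a spreadsheet is awkward for this, the same before-and-after comparison can be sketched in a few lines of Python. This assumes the two lists were exported as CSV files with "Title" and "Year" columns; those column names are an assumption, so adjust them to whatever the export actually contains.

```python
import csv
import io


def movie_keys(csv_file, title_col="Title", year_col="Year"):
    """Return the set of (title, year) pairs found in an exported CSV list.

    The column names are assumptions about the export, not a known PVD format.
    """
    reader = csv.DictReader(csv_file)
    return {(row[title_col].strip(), row[year_col].strip()) for row in reader}


def removed_movies(before_file, after_file):
    """Movies present in the 'before' list but missing from the 'after' list."""
    return sorted(movie_keys(before_file) - movie_keys(after_file))


# Demo with in-memory data. For real exports, pass file objects instead, e.g.
# open("before.csv", newline="", encoding="utf-8").
before = io.StringIO("Title,Year\nSome Movie,2009\nSome Movie 2,2010\n")
after = io.StringIO("Title,Year\nSome Movie,2009\n")
removed = removed_movies(before, after)
print(removed)  # [('Some Movie 2', '2010')]
```

The set difference lists only the records that disappeared, which is usually a short enough list to inspect by hand.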

Offline AimHere

  • Older Power User
  • *****
  • Posts: 213
Re: Dealing with movie duplicates
« Reply #4 on: December 31, 2010, 12:39:51 am »
Quote
I've never noticed it delete a visible movie. You could export before and after lists of visible movies and compare those in Excel to determine what was deleted. That should indicate why the deletions are happening.

Okay, did before-and-after exports, and compared them. PVD deleted eight out of 2022 visible movie records. I can't find any rhyme or reason why it would think they are duplicates, either. Looking at my "before" database, the movies all have proper titles and "original titles", all unique as far as I can tell. (A couple are parts of series, e.g. "Some Movie", "Some Movie 2", etc., but they shouldn't be considered duplicates.)

I wonder if we could get Nostra to weigh in on the methodology used to determine whether movies are duplicates?

Aimhere

Offline AimHere

  • Older Power User
  • *****
  • Posts: 213
Re: Dealing with movie duplicates
« Reply #5 on: December 31, 2010, 01:17:09 am »
In my testing just now, I stumbled across a helpful tool in PVD... Filters/Advanced/Show All. By being able to see both visible and invisible movie records side-by-side (and searching for the titles that had been removed by the test optimization), I found that the movies in question DID have duplicate records. In some cases, a single person (out of half a dozen or more) had an extra entry for the movie in their filmography. In other cases, the duplicate was listed under a director's or producer's filmography, when previously I had only been considering actors. D'oh!!!  ;D
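For anyone wanting to automate the side-by-side check, here is a minimal sketch. It assumes the "Show All" list can be turned into records that carry a title, a year and a visibility flag; that data layout is hypothetical, and the case-insensitive title comparison is my own choice rather than anything PVD is known to do.

```python
from collections import defaultdict


def suspect_duplicates(records):
    """Group records by normalized title and keep the groups that contain
    both a visible and an invisible record."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["title"].strip().casefold()].append(rec)
    return {
        title: recs
        for title, recs in groups.items()
        if any(r["visible"] for r in recs) and any(not r["visible"] for r in recs)
    }


# Hypothetical sample data; the field names are assumptions.
records = [
    {"title": "Some Movie", "year": 2009, "visible": True},
    {"title": "Some Movie", "year": 2010, "visible": False},
    {"title": "Some Movie 2", "year": 2010, "visible": True},
]

for title, recs in suspect_duplicates(records).items():
    print(title, [(r["year"], r["visible"]) for r in recs])
# some movie [(2009, True), (2010, False)]
```

Because whole titles are compared, parts of a series such as "Some Movie 2" are not flagged as duplicates of "Some Movie".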

I still don't know why PVD's optimization would delete the visible record instead of the invisible one, though. The dupes I saw appeared to have the same URLs (for IMDB) as the visible ones. Many of my movies have multiple URLs, though (from both IMDB and ADE), so maybe that has something to do with it?

So, I deleted all the "invisible" duplicates I could find, and now I'm doing another test optimization (again, with "Delete movie duplicates" selected) to see what happens.  It's so slow, though, with that option... updates to follow...

Aimhere

Offline AimHere

  • Older Power User
  • *****
  • Posts: 213
Re: Dealing with movie duplicates
« Reply #6 on: December 31, 2010, 01:50:29 am »
Success! The number of visible movie records didn't change after this optimization pass. PVD did delete 368 movies, but they were all invisible ones.

I guess now it's just a matter of preventing future duplications.

Aimhere