My database has about 2,200 movies, 33,000 people (all with filmographies), and 245,000 invisible movies (i.e., data for the filmographies). I can't remember the last time I let the optimize routine run long enough to remove duplicates. I usually give up after 12-15 hours.
Maybe I need to delete and re-download the people data as suggested, but I don't want to do that unless I'm sure it will make a difference. With numbers like these, it's very difficult to tell how extensive the problem is. Many apparent duplicates are separate records for different years of a series; those are necessary so the series will appear with the year a particular actor appeared in it. And, however uncommon, it is possible for two different videos to have the same title and year.
So this is what I did: I exported my invisible movies to Excel and used that to find duplicate URLs. There are 20 of them. In each case the titles are identical except for their capitalization. I guess this might be the result of updating different people at different times, with the capitalization of the entry having been changed at IMDb in the interim. PVD then considers the item new because the titles differ, even though the URLs are the same. In any case, it really doesn't matter: 20 duplicates in 145,000 records (0.014%) is of no possible consequence.
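For anyone who wants to run the same check, it's easy to script instead of doing it in Excel. Here's a rough Python sketch of the idea; the file name and the "title"/"url" column names are my assumptions about how the export comes out, so adjust to match yours:

```python
import csv
from collections import defaultdict

# Group exported records by URL; more than one row per URL = a duplicate.
titles_by_url = defaultdict(list)
with open("invisible_movies.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        titles_by_url[row["url"]].append(row["title"])

for url, titles in titles_by_url.items():
    if len(titles) > 1:
        # In my data the duplicate titles always matched case-insensitively,
        # differing only in capitalization.
        print(url, titles)
```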
I suppose one question remains: is the optimize routine removing duplicates even though I typically abort it before it finishes? I think I let it run overnight less than a week ago. Maybe that's why there are so few duplicates.
And now for the funny part. I decided I should do the same thing for my visible movies. Sometimes I notice a duplicate that happened because I added something under a different title than an existing "wish list" item. There are 20 of those as well! Not hard to lose among 2,200 movies, but still...
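Since those wish-list duplicates can carry genuinely different titles, a URL check alone won't necessarily catch them. A fuzzy title comparison might. Here's a sketch using Python's difflib; the 0.85 threshold and the column names are guesses on my part, and with only 2,200 titles the pairwise loop is still quick:

```python
import csv
from difflib import SequenceMatcher

with open("movies.csv", newline="", encoding="utf-8") as f:
    movies = [(r["title"], r.get("year", "")) for r in csv.DictReader(f)]

# Compare every pair of titles and flag near-matches as possible duplicates.
for i, (t1, y1) in enumerate(movies):
    for t2, y2 in movies[i + 1:]:
        ratio = SequenceMatcher(None, t1.lower(), t2.lower()).ratio()
        if ratio > 0.85:
            print(f"possible duplicate: {t1!r} ({y1}) ~ {t2!r} ({y2}), ratio {ratio:.2f}")
```

Anything it flags still needs a manual look, of course, since two different videos really can have nearly identical titles.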