IMDB_ [EN] [HTTPS] script
While testing, I found several errors. I corrected them and extended some parts of the code so that more information is transferred. I also added some new code sections for downloading additional information.
There are many errors in the IMDB_ [EN] [HTTPS] script.
The errors are in Rating, Top 250 IMDB Votes, Studio, and the full actor list in the Full Cast and Crew section. They appeared because the source code of the IMDB movie web pages changed. I have already tested the corrections, and they will be published in a few days. Please be patient, because I am a little short of free time.
First of all, here is what I corrected to match the changed source code of the IMDB pages.
Full Cast and Crew section
I've repaired the pieces of code that transfer the actors' information.
Function ParsePage_IMDBMovieCREDIT(HTML:String):Cardinal; //BlockOpen
//Returns:
// Result:=prFinished; Script has finished gathering data
// Result:=prError; If any big problem with exit;
//Retrieve: ~crew~ctDirectors,ctWriters,ctComposers,ctProducers
// ~actors~ ctActors
Var
curPos,endPos,index:Integer;
ItemList:String;
Name,Role,PersonURL:String;
Begin
LogMessage('Function ParsePage_IMDBMovieCREDIT BEGIN=====================||');
Result:=prFinished; //It will change to prError if any big problem with exit;
//Get ~crew~ctDirectors,ctWriters,ctComposers,ctProducers
.
.
//Get to "Cast" ~actors~ ctActors
//Go Cast list
curPos:=Pos('<table class="cast_list">',HTML); //Strings start which opens the block content data. WEB_SPECIFIC
curPos:=curPos+Length('<table class="cast_list">'); //Strings end which opens the block content data. WEB_SPECIFIC
//Get all "raw" crew summary (in raw because we need the hidden person links) May one person or severals in the ~crew~
endPos:=PosFrom('</div>',HTML,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
ItemList:=Copy(HTML,curPos,endPos-curPos);
//LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
curPos:=Pos('<td class="primary_photo">',ItemList); //String which opens the subList data. WEB_SPECIFIC
index:=1;
While curPos>0 Do Begin
If (index>PEOPLE_LIMIT) Then break; //Limit exceeded (index starts at 1).
//Get PersonURL (must always be present)
PersonURL:=BASE_URL_PERSON_PRE + TextBetWeen(ItemList,'<a href="/name/','/',false,curPos) + BASE_URL_SUF; //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results PersonURL:'+PersonURL+'||');
//Get Name (must always be present)
//Name:=TextBetWeen(ItemList,'itemprop="name">','<',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
Name:=TextBetWeen(ItemList,'> ','</a>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results Name:'+Name+'||');
//Get Role
//Role:=TextBetWeen(ItemList,'<td class="character">','</a>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
Role:=TextBetWeen(ItemList,'<td class="character">','</td>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results Role:'+Role+'||');
Role:=StringReplace(Role,' (',' (',True,False,True);
Role:=StringReplace(Role,'(uncredited) ','(uncredited) • ',True,False,True);
Role:=StringReplace(Role,') ',') ',True,False,True);
Role:=StringReplace(Role,' / ',' - ',True,False,True);
Role:=StringReplace(Role,' / ... ',' • ',True,False,True);
Role:=StringReplace(Role,' ',' • ',True,False,True);
Role:=StringReplace(Role,' (uncredited)',' (uncredited)',True,False,True);
LogMessage(' Parse Results Role_1:'+Role+'||');
AddMoviePerson(Name,'',Role,PersonURL,ctActors);
LogMessage(' Get results cast:#'+IntToStr(index)+'|'+Name+'|'+Role+'|'+PersonURL+'||ctActors');
curPos:=PosFrom('<td class="primary_photo">',ItemList,curPos); //String which opens the Web Result item List data. WEB_SPECIFIC
index:=index+1;
End;
LogMessage('Function ParsePage_IMDBMovieCREDIT END=====================||');
End; //BlockClose
Note:
The Actors code also contains the role handling for actors in series. I corrected the pieces of code that transfer the director, writer, producer and music information, because in certain cases they transferred wrong information from the Full Cast and Crew pages. This happened when the page had no information for actors, writers, producers or music.
For example, the same name was added to the database as both director and producer, even though no producer was listed on the Full Cast and Crew page. Director, writer, producer and music have no role information in the database. For the information to be transferred correctly, I added role parsing to every crew type. The Role value and its LogMessage are also there for the debug log file. The role is not visible in the database unless it is transferred to a custom field.
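The failure described above can be avoided by checking that a section header was really found before reading anything after it, as the Writers block in Part 1 does with If 0<curPos. The following is only a minimal stand-alone sketch of that guard, not the script's actual code: FindSectionStart is a hypothetical helper name, and PosEx from Free Pascal's StrUtils unit is used here as an assumed stand-in for PVD's PosFrom.

```pascal
program SectionGuardSketch;
{$mode objfpc}{$H+}

uses
  StrUtils; // PosEx: assumed stand-in for PVD's PosFrom

// Hypothetical helper: returns the position just after the header's </h4>,
// or 0 when the header (and so the whole section) is missing from the page.
function FindSectionStart(const Header, HTML: String): Integer;
var
  p: Integer;
begin
  Result := 0;
  p := Pos(Header, HTML);
  if p = 0 then Exit;                 // section missing: report "not found"
  p := PosEx('</h4>', HTML, p);
  if p = 0 then Exit;                 // malformed header: report "not found"
  Result := p + Length('</h4>');
end;

const
  // Tiny made-up page fragment that has a director section but no producer section.
  PAGE = '<h4 class="dataHeaderWithBorder">Directed by</h4><table></table>';
begin
  // Section present: the returned position is greater than 0.
  WriteLn(FindSectionStart('<h4 class="dataHeaderWithBorder">Directed by', PAGE) > 0);
  // Section absent: the helper returns 0, so the caller skips the block
  // instead of reading names from whatever section follows.
  WriteLn(FindSectionStart('<h4 class="dataHeaderWithBorder">Produced by', PAGE));
end.
```

With a guard like this, a crew block is parsed only when its own header exists, so a missing producer section cannot pick up the names of another section.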
Part 1:
//Get ~crew~ctDirectors,ctWriters,ctComposers,ctProducers
//Go to "Directed by" ~crew~ctDirectors
//curPos:=Pos('Directed by',HTML); //Strings start which opens the block content data. WEB_SPECIFIC
curPos:=Pos('<h4 class="dataHeaderWithBorder">Directed by',HTML); //Strings start which opens the block content data. WEB_SPECIFIC
curPos:=PosFrom('</h4>',HTML,curPos); //Strings end which opens the block content data. WEB_SPECIFIC
curPos:=curPos+Length('</h4>'); //Strings end which opens the block content data. WEB_SPECIFIC
//Get all "raw" crew summary (in raw because we need the hidden person links). May be one person or several in the ~crew~
endPos:=PosFrom('</table>',HTML,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
ItemList:=Copy(HTML,curPos,endPos-curPos);
//LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
curPos:=Pos('<td class="name">',ItemList); //String which opens the subList data. WEB_SPECIFIC
index:=1;
While curPos>0 Do Begin
If (index>PEOPLE_LIMIT) Then break; //Limit exceeded (index starts at 1).
//Get PersonURL (must always be present)
PersonURL:=BASE_URL_PERSON_PRE + TextBetWeen(ItemList,'<a href="/name/','/',false,curPos) + BASE_URL_SUF; //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results PersonURL:'+PersonURL+'||');
//Get Name (must always be present)
//Name:=TextBetWeen(ItemList,'>','</a>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
Name:=TextBetWeen(ItemList,'> ','</a>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results Name:'+Name+'||');
//Get Role (for the debug log only: PVD doesn't save Role for crew people)
Role:=TextBetWeen(ItemList,'<td class="credit">','</td>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results Role:'+Role+'||');
AddMoviePerson(Name,'','',PersonURL,ctDirectors);
LogMessage(' Get results Director:#'+IntToStr(index)+'|'+Name+'|'+PersonURL+'||ctDirectors');
curPos:=PosFrom('<td class="name">',ItemList,curPos); //String which opens the subList data. WEB_SPECIFIC
index:=index+1;
End;
//Go to "Writer:" or "Writers:" ~crew~ctWriters
curPos:=Pos('<h4 class="dataHeaderWithBorder">Writing Credits',HTML);
If 0<curPos Then Begin
//curPos:=Pos('Writing',HTML); //Strings start which opens the block content data. WEB_SPECIFIC
curPos:=Pos('<h4 class="dataHeaderWithBorder">Writing Credits',HTML); //Strings start which opens the block content data. WEB_SPECIFIC
curPos:=PosFrom('</h4>',HTML,curPos); //Strings end which opens the block content data. WEB_SPECIFIC
curPos:=curPos+Length('</h4>'); //Strings end which opens the block content data. WEB_SPECIFIC
//Get all "raw" crew summary (in raw because we need the hidden person links) May one person or severals in the ~crew~
endPos:=PosFrom('</table>',HTML,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
ItemList:=Copy(HTML,curPos,endPos-curPos);
//LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
curPos:=Pos('<td class="name">',ItemList); //String which opens the subList data. WEB_SPECIFIC
index:=1;
While curPos>0 Do Begin
If (index>PEOPLE_LIMIT) Then break; //Limit exceeded (index starts at 1).
//Get PersonURL (must always be present)
PersonURL:=BASE_URL_PERSON_PRE + TextBetWeen(ItemList,'<a href="/name/','/',false,curPos) + BASE_URL_SUF; //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results PersonURL:'+PersonURL+'||');
//Get Name (must always be present)
//Name:=TextBetWeen(ItemList,'>','</a>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
Name:=TextBetWeen(ItemList,'> ','</a>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results Name:'+Name+'||');
//Get Role (for the debug log only: PVD doesn't save Role for crew people)
Role:=TextBetWeen(ItemList,'<td class="credit">','</td>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
LogMessage(' Parse Results Role:'+Role+'||');
AddMoviePerson(Name,'','',PersonURL,ctWriters);
LogMessage(' Get results Writer:#'+IntToStr(index)+'|'+Name+'|'+PersonURL+'||ctWriters');
curPos:=PosFrom('<td class="name">',ItemList,curPos); //String which opens the subList data. WEB_SPECIFIC
index:=index+1;
End;
End;