Topic: New IMDb People v3 (Selenium) script comments

Ivek23
New IMDb People v3 (Selenium) script comments
« on: January 09, 2025, 06:38:32 pm »
Here are some pieces of code to add to the script, or to update in an existing copy of the script.

Here are the basic settings.

Quote
//Script Options-------------------------------------------------------------------------------------------------------
  //Retrieve Data Config
  USE_SAVED_PVDCONFIG  = True ;  //Use the script's Overwrite Options saved in pvdconf.ini to avoid downloading unused pages. Remember that PVD only saves them on exit.
  //MAX_IMAGE_HEIGHT  = 12000; //Height limit of the stored photos.
  MAX_IMAGE_HEIGHT  = 1200; //Height limit of the stored photos.
  //MAX_IMAGE_HEIGHT  = 500; //Height limit of the stored photos.
  //Process Data Config
  PHOTO_URL_IN_TRANSNAME  = False ; //Use the PVD field ~transname~ to store the URL of the person's photo, to send to KODI in a template.
  //BIRTH_NAME_IN_TRANSNAME  = True ; //Use the PVD field ~transname~ to store the person's Birth Name for Biography Pages.  // Does not work
  BIRTH_NAME_IN_TRANSNAME  = False ; //Use the PVD field ~transname~ to store the person's Birth Name for Biography Pages. // Does not work
  GET_FULL_BIO  = True ;  //Download the provider's Biography page to retrieve the info. Otherwise only the info from the main people page is used.
  //GET_FULL_BIO  = False ;  //Download the provider's Biography page to retrieve the info. Otherwise only the info from the main people page is used.
  //BIO_INFO_IN_BIO  = True ; //Use the PVD field ~bio~ to store the person's Biography Info URL link for Biography Pages.
  BIO_INFO_IN_BIO  = False ; //Do not store the person's Biography Info URL link in the PVD field ~bio~.
  //BIO_URL_IN_BIO  = True ; //Use the PVD field ~bio~ to store the URLs found in the person's Biography Info (Mini Bio).
  BIO_URL_IN_BIO  = False ; //Do not store the URLs found in the person's Biography Info (Mini Bio) in the PVD field ~bio~.
  //IMDB_MINI_IN_BIO  = True ; //Use the PVD field ~bio~ to store the "IMDb Mini Biography By" line of the Biography Info (Mini Bio).
  IMDB_MINI_IN_BIO  = False ; //Do not store the "IMDb Mini Biography By" line of the Biography Info (Mini Bio) in the PVD field ~bio~.
  GET_FULL_CREDIT  = True ;  //Download the provider's Credits (text only) page to retrieve the info. Otherwise only the info from the main people page is used.
  //GET_FULL_CREDIT  = False ;  //Download the provider's Credits (text only) page to retrieve the info. Otherwise only the info from the main people page is used.
  GET_FULL_GENRES  = True ;  //Download the provider's Genres page to retrieve the info. Otherwise only the info from the main people page is used.
  //GET_FULL_GENRES  = False ;  //Download the provider's Genres page to retrieve the info. Otherwise only the info from the main people page is used.
  GET_FULL_AWARDS  = True ;  //Download the provider's Awards page to retrieve the info. Otherwise nothing is retrieved, because the main page has no awards info.
  //GET_FULL_AWARDS  = False ;  //Download the provider's Awards page to retrieve the info. Otherwise nothing is retrieved, because the main page has no awards info.
  EVENTS_LIMIT  = 1000;   //Limit on the number of events (USA Academy Awards, Golden Globes, etc.) from which to retrieve awards.
  //Process Behaviour Config
  BYPASS_SILENT  = True ; //Ensure critical ShowMessage alerts bypass the PVdB Silent preference
  CHECK_WEBSITE  = False ;  //Add to SearchResult List the true HTTPS links 'Just to check the website' with the browser
  //CHECK_WEBSITE  = True ;  //Add to SearchResult List the true HTTPS links 'Just to check the website' with the browser
  POSTER_IN_SEARCH  = False ; //Download and show people posters in the list of the SearchResult
  //POSTER_IN_SEARCH  = True ; //Download and show people posters in the list of the SearchResult
  //SEARCH_ENGINE  = True ;  //If the provider returns no search results, try the Bing search engine
  SEARCH_ENGINE  = False ;  //If the provider returns no search results, try the Bing search engine
  PHOTO_DWN_RONDABOUT  = True ; //Activate the "HTTPS image download function" and the "ImageListSearch exit" as a roundabout (workaround for a bug) to download photos.
                            //   Because there is no choice (there is only one photo), it normally downloads without asking; but if PVdB starts to ask,
                            //   enabling the PVdB preference Plugin/Silent is more comfortable for large databases.*)
  INTERNET_TEST_ITERATIONS  = 6;  //Attempts before alerting the user that no internet connection was detected. Increase if the provider is slow.
//Script data------------------------------------------------------------------------------------------------------------

With these settings, the bio information is downloaded without URL links and without the "IMDb Mini Biography By: ..." line.

If these settings are changed to
Quote
  BIO_URL_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  //BIO_URL_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages. 
  IMDB_MINI_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
  //IMDB_MINI_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
then the bio is downloaded with the URL links and with the "IMDb Mini Biography By: ..." line.

Quote
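Function RemoveTagsEx00(AText:String):String; //BlockOpen
    //Ivek23 function to speed up the script: removes every <link url="..."> opening tag
Var
   B,E:Integer;
Begin
   Result:=AText;
   B:=PosFrom('<link url="',Result,1);
   E:=PosFrom('">',Result,B);
   While (B>0) AND (B<E) Do Begin
      Delete(Result,B,E-B+2);
      //Search for the closing '">' only after the tag start, so an earlier '">' cannot end the loop too soon
      B:=PosFrom('<link url="',Result,B);
      If B>0 Then E:=PosFrom('">',Result,B);
   End;
End; //BlockClose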
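
To make the intent of RemoveTagsEx00 easier to check outside PVD, here is a minimal Python sketch of the same idea (Python is used only because a PVD script cannot run on its own). The names strip_link_open_tags and clean_bio, the sample text and the nm0000001/nm0000002 IDs are made up for the example; only the `<link url="` and `">` markers and the BIO_URL_IN_BIO behaviour come from the script.

```python
def strip_link_open_tags(text):
    """Remove every <link url="..."> opening tag, keeping the link text."""
    open_marker = '<link url="'
    close_marker = '">'
    start = text.find(open_marker)
    while start != -1:
        # Look for the end of the tag only after its start.
        end = text.find(close_marker, start)
        if end == -1:
            break  # unterminated tag: leave the rest untouched
        text = text[:start] + text[end + len(close_marker):]
        start = text.find(open_marker, start)
    return text


def clean_bio(bio, bio_url_in_bio):
    """Mimic the BIO_URL_IN_BIO switch: keep or drop the link markup."""
    if bio_url_in_bio:
        return bio
    return strip_link_open_tags(bio).replace('</link>', '')


# Hypothetical bio fragment with two links.
sample = ('Born to <link url="http://www.imdb.com/name/nm0000001/">A Person</link>'
          ' and <link url="http://www.imdb.com/name/nm0000002/">Another</link>.')
print(clean_bio(sample, False))
```

With bio_url_in_bio set to False the sample prints as plain text; with True it is returned unchanged.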

« Last Edit: January 09, 2025, 11:50:09 pm by Ivek23 »
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Ivek23
Re: New IMDb People v3 (Selenium) script comments
« Reply #1 on: January 09, 2025, 06:57:27 pm »
Function ParsePage_IMDBPersonBASE changes

Quote
//(*
Function ParsePage_IMDBPersonBASE(HTML:String):Cardinal; //BlockOpen
    //Returns:
    //     Result:=prFinished; Script has finished gathering data
    //     Result:=prListImage; As RONDABOUT (bypass a bug) for download Photos
    //     Result:=prError; if there is any big problem, with exit
    //Retrieve: IMDB has a json container, easy to scrape.
    //         ~url~, ~name~,~altnames~(NO),~birthday~,~birthplace~,~death~,
    //         If Not(GET_FULL_BIO) ~bio~   
    //          ~Photo~,
    //         If PHOTO_URL_IN_TRANSNAME. ~transname~ The PVdB ~transname~ Translated Name not stored in TheMovieDB. Used for PhotoURL
    //         ~genre~(NO) Female or Male (Even if PVB Scripting Manual says 'comma separated list' because it is in the same list as movie ~genre~)
    //         ~comment~ Not used
    //         ~age~ Not used. Calculated in PVdB. ~age~
    //         ~dateadded~ Not used. Calculated in PVdB.
    //         "homepage": Not used by PVdB   
  Var
    curPos,endPos,debug_pos1,index:Integer;
    PhotoURL,ItemValue,ItemList,ImageFile:String;
   PersonID,ItemValue0,ItemValue1,ItemValue2,ItemValue3:String;
   jobTitle,AltNames,AltNames1,DeathAge:String;
   ItemList0,ItemList1,ItemList2,ItemList4:String;   
   Title,Role,Year,MovieURL:String; 
   AwardsValue,AwardList:String;
  Begin

Below the Get ~Photo~ code, the old Alternate Names block is removed and the updated code is added, as seen below.

Quote
   //(*
    //Get ~Photo~ . Remember that the PVdB ~transname~ Translated Name is not stored in TheMovieDB. Can be used for PhotoURL
    ItemValue:=TextBetWeenFirst(ItemList,'"image":"','",');                  // WEB_SPECIFIC.
    If (Length(ItemValue)>0) and (Pos('nopicture',ItemValue)=0)Then Begin            //"https://m.media-amazon.com/images/G/01/imdb/images/nopicture/...' NOT exists working httpS
        PhotoURL:=TextBetWeenFirst(ItemValue,BASE_URL_IMAGE_PRE_TRUE,'.');       //Get poster code. Strings which opens/closes the data. WEB_SPECIFIC       
        If ((Length(PhotoURL)>0) and Not(USE_SAVED_PVDCONFIG and (Copy(PVDConfigOptions,opPhoto,1)='0'))) then begin  //The Poster will be saved in PVD
            PhotoURL:=BASE_URL_IMAGE_PRE_TRUE + PhotoURL;                             //Base poster URL without '.jpg'. WEB_SPECIFIC
            ImageFile:=GetAppPath+'Scripts\'+BASE_DOWNLOAD_FILE_IMAGE_NAME+'-Photo.jpg';
            // Avoid HTTPS redirection: Download https image to file
            If (1=DownloadImage(PhotoURL + '._V1_UY' + IntToStr(MAX_IMAGE_HEIGHT) + '_.jpg',ImageFile)) then begin  //Download at the user's selected max size. WEB_SPECIFIC
                //LogMessage('Image successfully downloaded to: ' + ImageFile);
                //Log the actual value being added
                //LogMessage('Adding image with URL: ' + ImageFile + ' and type: itPoster');

                //Call the AddImageURL procedure
                AddImageURL(itPoster,ImageFile);    //Add the photo to the database. For some unknown reason this does not work: the photo is not retrieved the way a movie poster is

               //Log a confirmation message after adding the image
              LogMessage('AddImageURL in user-s size has been called with ImageType: ' + IntToStr(itPoster) + ' and ImageFile: ' + ImageFile);

                AddSearchResult(GetFieldValueXML('name'), '', '', ImageFile, ImageFile); //It's not possible avoid GetFieldValueXML because the name can't be the same.
                if PHOTO_URL_IN_TRANSNAME then AddFieldValueXML('transname',PhotoURL + '._V1_UY' + IntToStr(MAX_IMAGE_HEIGHT) + '_.jpg'); //For storing the URL to the person photo, for send to KODI in a Template
                //LogMessage('      Get result PhotoURL:'+PhotoURL + '._V1_UY' + IntToStr(MAX_IMAGE_HEIGHT) + '_.jpg'+'||');
                LogMessage('Script end. After, PVdB will retrieve from ListImage and info of person in order get the photo');
                Result:=prListImage;
            end else if (1=DownloadImage(ItemValue +'.jpg',ImageFile)) then begin  //Download at the web base size. WEB_SPECIFIC
                AddImageURL(itPoster,ImageFile);    //Add the photo to the database. For some unknown reason this does not work: the photo is not retrieved the way a movie poster is
                LogMessage('AddImageURL web based size has been called with ImageType: ' + IntToStr(itPoster) + ' and ImageFile: ' + ImageFile);

                AddSearchResult(GetFieldValueXML('name'), '', '', ImageFile, ImageFile); //It's not possible to avoid GetFieldValueXML because the name can't be the same.
                if PHOTO_URL_IN_TRANSNAME then AddFieldValueXML('transname',PhotoURL+'.jpg'); //Store the URL of the person's photo, to send to KODI in a template
                //LogMessage('      Get result PhotoURL:'+PhotoURL+'.jpg'+'||');
                LogMessage('Script end. After, PVdB will retrieve from ListImage and info of person in order get the photo');
                Result:=prListImage;
            end;       
        End;       
    End Else Begin
        PhotoURL:='';
    End;
   //*)
   
   //(*
   ItemList:='';
   //~jobTitle~
    //Begin scraping the JSON container.
    ItemList1:=TextBetWeenFirst(HTML,'<script type="application/ld+json">','</script>');
    //LogMessage('           Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList: '+'<script type="application/ld+json"'+ItemList+'}</script>'+'||');      
    ItemList1:=StringReplace(ItemList1,'}',',',True,True, False); //Replace each } with a comma so that every JSON field, even the last one, ends with a comma. WEB_SPECIFIC.
    //Get ~jobTitle~
    ItemValue:=TextBetWeenFirst(ItemList1,'","jobTitle":["','"],');                   //WEB_SPECIFIC.
    ItemValue:=StringReplace(ItemValue,'","',',  ',True,False,True);
    if (2<Length(ItemValue)) then begin
        ItemList:=ItemList+'JobTitle:  '+ItemValue+'  ';
        AddFieldValueXML('careertype',ItemValue);
        LogMessage('      Get result jobTitle:'+ItemValue+'||');
    end;
   //*)   
   //(*
   //~Career~
   If Pos('<h3 class="ipc-title__text"><span id="credits">Credits</span>',HTML)>0 Then Begin
      curPos:=Pos('<h3 class="ipc-title__text"><span id="credits">Credits</span>',HTML);
      If curPos>0 Then Begin
         EndPos:=curPos;
         //ItemValue:=TextBetween(HTML,'<div id="jumpto">','<div id="filmography">',True,curPos);
         ItemValue1:=HTMLValues2(HTML,'<h3 class="ipc-title__text"><span id="credits">Credits</span>','<span class="ipc-chip__text">IMDbPro</span>','" class="ipc-chip ipc-chip--on-base-accent2" tabindex="0" aria-disabled="false"><span class="ipc-chip__text">','<span class="ipc-chip__count">',',  ',EndPos);
         If ItemValue1 <> '' then ItemList:=ItemList+#13+'Filmography  -  Career:  '+ItemValue1+'  ';
      End;
   End;   
   //*)
   
   //(*   
   //Get ~Main Page URL~
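
For comparison with the string-marker approach used for ~jobTitle~ above, here is a minimal Python sketch that reads the same `application/ld+json` block with a real JSON parser. This is a swapped-in technique, not what the script does, and it assumes the block is valid JSON with a top-level "jobTitle" entry. The function name job_titles_from_html and the sample HTML are made up for the example.

```python
import json
import re


def job_titles_from_html(html):
    """Return the jobTitle entries from the first JSON-LD block, or []."""
    match = re.search(
        r'<script type="application/ld\+json">(.*?)</script>', html, re.DOTALL
    )
    if not match:
        return []
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    titles = data.get("jobTitle", [])
    # jobTitle may be a single string or a list of strings.
    if isinstance(titles, str):
        return [titles]
    return list(titles)


# Hypothetical page fragment shaped like the container the script scrapes.
sample_html = (
    '<html><head><script type="application/ld+json">'
    '{"@type":"Person","name":"Aaron Spelling",'
    '"jobTitle":["Producer","Writer","Actor"]}'
    '</script></head></html>'
)
print("JobTitle:  " + ",  ".join(job_titles_from_html(sample_html)))
```

Joining with a comma and two spaces gives the same layout as the JobTitle line that the script writes to the comment field.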

Here is another part of the code change.

Quote
   //(*   
   //~Died~
   curPos:=Pos('<li role="presentation" class="ipc-metadata-list__item" data-testid="nm_pd_dl"><span class="ipc-metadata-list-item__label" aria-disabled="false">Died</span>',HTML);
   If curPos>0 Then Begin
   //*)   
   //(*
      EndPos:=curPos;
      ItemValue1:=HTMLValues(HTML,'<span class="ipc-metadata-list-item__label" aria-disabled="false">Died</span>','</li></ul></div></li>','<li role="presentation" class="ipc-inline-list__item test-class-react','</li></ul></div></li>','  ',EndPos);
      ItemValue1:=StringReplace(ItemValue1,'<li role="presentation" class="ipc-inline-list__item">',' in ',True,False,True);
      ItemValue1:=StringReplace(ItemValue1,'<span class="ipc-metadata-list-item__list-content-item--subText">','  ',True,False,True);      
      //LogMessage('  **    Parse Results Died10:'+ItemValue1+'||');      
      //ItemValue1:=RemoveTags(ItemValue1, False);
      //Apply RemoveTagsEx0 20 times, as before, to strip the nested tags
      For index:=1 To 20 Do ItemValue1:=RemoveTagsEx0(ItemValue1);
      ItemValue1:=StringReplace(ItemValue1,'">','',True,False,True);
      ItemValue1:=StringReplace(ItemValue1,'  (undisclosed)','',True,False,True);      
      If ItemValue1 <> '' then ItemList:=ItemList+#13+'Died:  '+ItemValue1 Else ItemList:=ItemList+#13;      
      LogMessage('      Parse Results Died10:'+ItemValue1+'||');
   End;      
   //*)   
   //(*
   //~AwardsSummary~ 
   //If Pos('<section cel_widget_id="StaticFeature_Awards" class="ipc-page-section ipc-page-section--base celwidget" data-csa-c-id="1va8oc-j9jhr6-xyfws0-meswem" data-cel-widget="StaticFeature_Awards">',HTML)>0 Then Begin
      curPos:=PosFrom('<div data-testid="awards" class="sc-710dd9d1-0 iiIXRd base"><div class="sc-710dd9d1-1 cmBtRN">',HTML,EndPos);
      //If curPos>0 Then Begin
      EndPos:=PosFrom('</ul></div></section>',HTML,curPos);
      //AwardList:=Copy(HTML,curPos,endPos-curPos);
      AwardList:=Trim(Copy(HTML,curPos,endPos-curPos));
      //LogMessage('    *    Parse Results AwardsSummary1x:'+AwardList+'||');
      if (2<Length(AwardList)) then begin
         AwardsValue:=TextBetWeenFirst(AwardList,'/awards/?ref_=nm_awd"','</span></li></ul></div>');       
         LogMessage('    *    Parse Results AwardsSummary1a:'+AwardsValue+'||');
         AwardsValue:=StringReplace(AwardsValue,'</a><div class="ipc-metadata-list-item__content-container"><ul class="ipc-inline-list ipc-inline-list--show-dividers ipc-inline-list--inline ipc-metadata-list-item__list-content base" role="presentation"><li role="presentation" class="ipc-inline-list__item"><span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">',' • ',False,True,True);
         AwardsValue:=StringReplace(AwardsValue,'>','',True,False,True);
         LogMessage('    *    Parse Results AwardsSummary:'+AwardsValue+'||');
         If AwardsValue <> '' then AwardsValue:=#13+'--------------------------------------------------------------------------'+#13+'<link url="http://www.imdb.com/name/'+PersonID+'/awards/">Awards link</link> •• '+AwardsValue+' •• ';      
      End;   
   //End;
   //*)   
   //(*
   //~Alternate Names~
   curPos := Pos('<script>if(typeof uet === ''function''){ uet(''be'', ''StaticFeature_PersonalDetails'', {wb: 1}); }</script>', HTML);
   LogMessage('curPos after finding Alternative Names curPos: ' + IntToStr(curPos));
   If curPos > 0 Then Begin
      EndPos := curPos;
      LogMessage('EndPos set to curPos: ' + IntToStr(EndPos));
      // Extract values between the specified tags
      AltNames1 := HTMLValues(HTML, '<script>if(typeof uet === ''function''){ uet(''be'', ''StaticFeature_PersonalDetails'', {wb: 1}); }</script>', '"feature_contribution_header":"Contribute to this page"', '{"node":{"displayableProperty":{"value":{"plainText":"', '","__typename":"Markdown"},"__typename":"DisplayableNameAkaProperty"},"__typename":"NameAka"},"', ',  ', EndPos);
   
      LogMessage('    *    Parsed Result Alternative Name: ' + AltNames1);   
      AltNames1:=StringReplace(AltNames1,'\u0026',#38,True,False,True);
      
      If AltNames1 <> '' then AddFieldValueXML('AltNames', AltNames1); //Store the parsed alternate names (ItemValue1 holds the Died value here)
      
      If AltNames1 <> '' then ItemList:=ItemList+#13+'Alternate Names:  '+AltNames1+'  ';
      If AltNames1 <> '' then LogMessage('      Parsed Results All Expanded Alternative Names: ' + AltNames1 + '||');
   End;
   //*)      
   //(*
   //~Height~
   curPos:=Pos('<h3 class="ipc-title__text"><span id="personalDetails">Personal details</span>',HTML);
   If curPos>0 Then Begin
      EndPos:=curPos;
      ItemValue0:=HTMLValues2(HTML,'<li role="presentation" class="ipc-metadata-list__item" data-testid="nm_pd_he"><span class="ipc-metadata-list-item__label" aria-disabled="false">Height</span>','</li></ul></div></li>','<span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">','</li></ul></div></li>','<br>',EndPos);
      If ItemValue0 <> '' then ItemList:=ItemList+#13+'Height:  '+ItemValue0+'  ';
      LogMessage('      Parse Results Height:'+ItemValue0+'||');
   End;
   //*)
   //(*
   //~Nicknames~
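
The Alternate Names step above can be sketched in Python as follows. The pattern is built from the start and end markers passed to HTMLValues, and the `\u0026` handling mirrors the StringReplace call. The function name alternate_names and the sample fragment are made up for the example, and the sketch assumes the names sit in JSON string literals inside the page.

```python
import json
import re

# Pattern built from the start/end markers used in the HTMLValues call.
AKA_PATTERN = re.compile(
    r'\{"node":\{"displayableProperty":\{"value":\{"plainText":"((?:[^"\\]|\\.)*)",'
    r'"__typename":"Markdown"\},"__typename":"DisplayableNameAkaProperty"\}'
)


def alternate_names(html):
    """Return the decoded alternate names found in the embedded JSON."""
    names = []
    for raw in AKA_PATTERN.findall(html):
        # Decode JSON escapes such as \u0026 (the ampersand).
        names.append(json.loads('"' + raw + '"'))
    return names


# Hypothetical fragment shaped like the markers in the script.
sample = (
    '{"node":{"displayableProperty":{"value":{"plainText":"Aaron \\u0026 Candy",'
    '"__typename":"Markdown"},"__typename":"DisplayableNameAkaProperty"},"__typename":"NameAka"},'
    '{"node":{"displayableProperty":{"value":{"plainText":"Shelly Colbert",'
    '"__typename":"Markdown"},"__typename":"DisplayableNameAkaProperty"},"__typename":"NameAka"}'
)
print("Alternate Names:  " + ",  ".join(alternate_names(sample)))
```

Decoding through json.loads handles any JSON escape, not only `\u0026`, which is the one case the script replaces by hand.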

« Last Edit: January 09, 2025, 08:06:43 pm by Ivek23 »


Ivek23
Re: New IMDb People v3 (Selenium) script comments
« Reply #2 on: January 09, 2025, 08:02:00 pm »
Here is another part of the code change.

Quote
   //(*
   //~Nicknames~
   curPos:=Pos('<li role="presentation" class="ipc-metadata-list__item ipc-metadata-list__item--stacked" data-testid="name-dyk-nickname"><span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nickname</span>',HTML);
   If curPos>0 Then Begin
      EndPos:=curPos;
      ItemValue0:=HTMLValues2(HTML,'<span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nickname</span>','</li></ul></div','<span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">','</span>','<br>',EndPos);
      //LogMessage('    *    Parse Results Nickname1:'+ItemValue0+'||');
      //ItemValue0:=StringReplace(ItemValue0,'                                    See more                                »','',True,False,True);   
      If ItemValue0 <> '' then ItemList:=ItemList+#13+'Nickname:  '+ItemValue0+'  ';
      LogMessage('      Parse Results Nickname:'+ItemValue0+'||');
   End;
   //*)
   //(*
   //~Nicknames~
   curPos:=Pos('<li role="presentation" class="ipc-metadata-list__item ipc-metadata-list__item--stacked" data-testid="name-dyk-nickname"><span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nicknames</span>',HTML);
   If curPos>0 Then Begin
      EndPos:=curPos;
      ItemValue1:=HTMLValues2(HTML,'<span class="ipc-metadata-list-item__label ipc-metadata-list-item__label--btn" aria-label="See more" aria-disabled="false">Nicknames</span>','</li></ul></div','<span class="ipc-metadata-list-item__list-content-item" aria-disabled="false">','</span>',',  ',EndPos);
      //LogMessage('    *    Parse Results Nickname1:'+ItemValue0+'||');
      //ItemValue0:=StringReplace(ItemValue0,'                                    See more                                »','',True,False,True);   
      If ItemValue1 <> '' then ItemList:=ItemList+#13+'Nickname:  '+ItemValue1+'  ';
      LogMessage('      Parse Results Nickname:'+ItemValue1+'||');
   End;
   //*)   
   
   ItemList:=ItemList+AwardsValue;   
   ItemList:=ItemList+#13+'--------------------------------------------------------------------------';
   
   //(*   
   //Get ~Biography URL~
   //http://www.imdb.com/name/nm0002031/bio?ref_=nm_ql_pdtls_1
   EndPos:=Pos('">Biography</a></li>',HTML);
   If endPos>0 Then Begin
      ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/bio">Biography</link>';
      If ItemValue0 <> '' then ItemList:=ItemList+#13+ItemValue0;
      //LogMessage('      Parse Results Biography URL:'+ItemValue0+'||');
   End;   
   //*)
   //(*   
   //Get ~Awards URL~
   //http://www.imdb.com/name/nm0002031/awards?ref_=nm_ql_op_1 
   //EndPos:=Pos('">Awards</a></li>',HTML);
   curPos:=Pos('">Awards</a></li>',HTML);
   //If endPos>0 Then Begin
   If curPos>0 Then Begin
      ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/awards">Awards</link>';
      If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
      LogMessage('      Parse Results Awards URL:'+ItemValue0+'||');
   End;   
   //*)   
   //(*   
   //Get ~External Sites URL~
   //http://www.imdb.com/name/nm0002031/externalsites?ref_=nm_ql_rel_3
   EndPos:=Pos('">External sites</a>',HTML);
   //If endPos>0 Then Begin
      ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/externalsites">External Sites</link>';
      //If ItemValue0 <> '' then
      ItemList:=ItemList+#32#32+ItemValue0;
      //LogMessage('      Parse Results External Sites URL:'+ItemValue0+'||');
   //End;
   //*)
   //(*   
   //Get ~Genre index URL~   //http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role=nm0005455&ref_=nm_ql_flmg_4
   //https://www.imdb.com/search/title/?explore=genres&role=nm0005455
   EndPos:=Pos('">by Genre</a>',HTML);
   //If endPos>0 Then Begin
      ItemValue0:='<link url="http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role='+PersonID+'">Genres</link>';
      //If ItemValue0 <> '' then
      ItemList:=ItemList+#32#32+ItemValue0;
      //If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
      //LogMessage('      Parse Results Genre URL:'+ItemValue0+'||');
   //End;
   //*)      
   //(*   
   //Get ~Photo Gallery URL~
   //http://www.imdb.com/name/nm0002031/mediaindex?ref_=nm_ql_pv_1
   EndPos:=Pos('<h3 class="ipc-title__text">Photos<',HTML);
   If endPos>0 Then Begin
      ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/mediaindex/">Photo Gallery</link>';
      If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
   //   LogMessage('      Parse Results Photo Gallery URL:'+ItemValue0+'||');
   End;      
   //*)   
   //(*
   //Get ~Filmography  URL~
   //http://m.imdb.com/name/nm0002031/filmotype
   //https://m.imdb.com/name/nm0002031/?showAllCredits=true
   //curPos:=Pos('<h3 class="ipc-title__text"><span>Credits</span></h3>',HTML);
   curPos:=Pos('<h3 class="ipc-title__text"><span id="credits">Credits</span>',HTML);
   If curPos>0 Then Begin
      //ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/fullcredits">Filmography</link>';
      ItemValue0:='<link url="http://www.imdb.com/name/'+PersonID+'/?showAllCredits=true">Filmography</link>';
      If ItemValue0 <> '' then ItemList:=ItemList+#32#32+ItemValue0;
      //LogMessage('      Parse Results Filmography URL:'+ItemValue0+'||');
   End;   
   //*)         
   //(*   
   ItemList:=ItemList+#13+'--------------------------------------------------------------------------'+#13+SCRIPT_NAME+' on '+DateToStr(CurrentDateTime)+' at '+TimeToStr(CurrentDateTime);
   If (Length(ItemList)>0) Then Begin
        AddFieldValueXML('comment',ItemList);
        //LogMessage('      Get result Filmography  -  Career:'+ItemList+'||');
    End;   
    //*)
 
    //Get ~dateadded~ Not used. Calculated in PVdB.
    //Get ~orating~ Not documented in PVB Scripting Manual; it does not work in the script even though it works in the skin.
    //Get ~transname~ TranslateName. The PVdB ~transname~ Translated Name not stored in IMDB. Used for PhotoURL
   
//(*    
    LogMessage('Function ParsePage_IMDBPersonBASE END=====================||');
   LogMessage('ParsePage_IMDBPersonBASE: Ending processing.');
  End; //BlockClose
//*)

The comment field then looks like this.

Quote
JobTitle:  Producer,  Writer,  Actor 
Filmography  -  Career:  Additional Crew,  Soundtrack,  Director,  Self,  Thanks,  Archive Footage 
<link url="http://www.imdb.com/name/nm0005455">Main Page</link>
PID ID:  79324
People ID:  nm0005455           
Name:  Aaron Spelling  († 1923-2006) 
Born:  April 22, 1923 in Dallas, Texas, USA
Died:  June 23, 2006 in Los Angeles, California, USA  (complications following a stroke)
Alternate Names:  Aaron & Candy
Shelly Colbert 
Height:  1.65 m 
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0005455/awards/">Awards link</link> •• Won 2 Primetime Emmys • 14 wins & 11 nominations total ••
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0005455/bio">Biography</link>  <link url="http://www.imdb.com/name/nm0005455/awards">Awards</link>  <link url="http://www.imdb.com/name/nm0005455/externalsites">External Sites</link>  <link url="http://www.imdb.com/filmosearch/?sort=moviemeter&explore=genres&role=nm0005455">Genres</link>  <link url="http://www.imdb.com/name/nm0005455/mediaindex/">Photo Gallery</link>  <link url="http://www.imdb.com/name/nm0005455/?showAllCredits=true">Filmography</link>
--------------------------------------------------------------------------
IMDB_People_[EN][Selenium]-v3.2 on 2025-01-09 at 13:29:34
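
The row of links in the sample above can be reproduced with a short Python sketch. The URL patterns and labels are copied from the script; the helper names link and link_row are made up for the example. As a simplification, the sketch always emits all six links, whereas the script adds some of them only when the matching section is found on the page.

```python
BASE = "http://www.imdb.com"


def link(url, label):
    """Format one link in the <link url="...">label</link> markup used in the comment field."""
    return '<link url="{}">{}</link>'.format(url, label)


def link_row(person_id):
    """Build the row of page links for one IMDb person ID."""
    name_url = "{}/name/{}".format(BASE, person_id)
    links = [
        link(name_url + "/bio", "Biography"),
        link(name_url + "/awards", "Awards"),
        link(name_url + "/externalsites", "External Sites"),
        link(BASE + "/filmosearch/?sort=moviemeter&explore=genres&role=" + person_id, "Genres"),
        link(name_url + "/mediaindex/", "Photo Gallery"),
        link(name_url + "/?showAllCredits=true", "Filmography"),
    ]
    # The script separates the links with two spaces (#32#32).
    return "  ".join(links)


print(link_row("nm0005455"))
```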


Ivek23
Re: New IMDb People v3 (Selenium) script comments
« Reply #3 on: January 09, 2025, 08:16:32 pm »
Function ParsePage_IMDBPeopleBIO changes

Here is another part of the code change.

Quote
//(*
Function ParsePage_IMDBPeopleBIO(HTML:String):Cardinal; //BlockOpen
    //Returns:
    //     Result:=prFinished; Script has finished gathering data
    //     Result:=prError; if there is any big problem, with exit;
    //Retrieve: ~bio~ Biography from "Mini Bio" IMDB section
  Var
    curPos,endPos,debug_pos1:Integer;
    ItemValue:String;
    PersonID,ItemValue0,ItemValue10,ItemValue1,ItemValue11:String;
   ItemList,ItemList00,ItemList0,ItemList1,ItemList11,ItemList12:String;
   FinalValue: String;
   ItemList2,ItemList10,ItemList20,ItemValue3:String;
  Begin
    LogMessage('ParsePage_IMDBPeopleBIO: Starting processing.');
   LogMessage('HTML length: ' + IntToStr(Length(HTML)));
    LogMessage('Function ParsePage_IMDBPeopleBIO BEGIN=====================||');       
    Result:=prFinished;  //It will change to prError if any big problem with exit;
   
      LogMessage('Result set to prFinished');  //Log the initial result setting
   
      //(*
      //Get "Biography" info
      curPos:=Pos('<h1 class="ipc-title__text">Biography</h1>',HTML);      //Strings start which opens the block content data. WEB_SPECIFIC
      if (curPos=0) then Exit;    
      //*)   
   
      ItemList2:='';
      ItemList11:='';
      //(*   
        //Get PersonID
        //LogMessage('Attempting to find PersonID');
        PersonID := TextBetWeenFirst(HTML, '<link rel="canonical" href="https://', '/">');  //WEB_SPECIFIC   
        if (Length(PersonID) > 2) then begin
          ItemList2 := '--------------------------------------------------------------------------'+#13+'<link url="http://' + PersonID + '/#overview">Biography Info</link>';
            //ItemList2 := '--------------------------------------------------------------------------'+#13+'<link url="http://www.imdb.com/name/' + PersonID + '/bio/#overview">Biography Info</link>';
            LogMessage('Get result PersonID: ' + PersonID + '||');
        end else begin
            LogMessage('Error: PersonID not found');
            Result := prError; //Set the result to error if PersonID is not found
        end;   
      //*)   
      //(*
        //Get "Biography" info
        LogMessage('Attempting to find Biography section');
        curPos := Pos('<div data-testid="sub-section-mini_bio"', HTML);  //Updated to reflect new layout
        if (curPos = 0) then Begin
            LogMessage('Error: Biography section not found');
            Result := prError; //Set the result to error if the section is not found
            Exit;
        End;
        endPos := Pos('</ul>', Copy(HTML, curPos, Length(HTML) - curPos + 1)) + curPos - 1;
        if endPos = curPos - 1 then Begin
            LogMessage('Error: End of Biography section not found');
            Result := prError; //Set the result to error if the section is not found
            Exit;
        End;
        ItemList0 := Copy(HTML, curPos, endPos - curPos + Length('</ul>')); //Include </ul> in the end position
        LogMessage('Biography section found');

        //Extract "Mini bio" Biography text
        LogMessage('Extracting Mini Bio text:');
        curPos := Pos('<div class="ipc-html-content-inner-div" role="presentation">', ItemList0);  //Updated to reflect new layout
        LogMessage('curPos for Mini Bio set to: ' + IntToStr(curPos));
        if curPos > 0 then Begin
            endPos := Pos('</ul>', Copy(ItemList0, curPos, Length(ItemList0) - curPos + 1)) + curPos - 1; //Update to match exact structure
            LogMessage('endPos for Mini Bio set to: ' + IntToStr(endPos));
            if endPos > curPos Then Begin
                ItemValue := Trim(Copy(ItemList0, curPos, endPos - curPos + Length('</ul>')));
               
                //Normalize whitespace but keep empty lines
                ItemValue := StringReplace(ItemValue, #13#10, #10, True, True, False); //Normalize line endings
                ItemValue := StringReplace(ItemValue, #13, #10, True, True, False);
                ItemValue := StringReplace(ItemValue, #10#10, #13#13, True, True, False); //Mark empty lines with a placeholder that contains no #10
                ItemValue := StringReplace(ItemValue, #10, ' ', True, True, False);
                ItemValue := StringReplace(ItemValue, #13#13, #10#10, True, True, False); //Restore the empty lines
                While Pos('  ', ItemValue) > 0 Do
                    ItemValue := StringReplace(ItemValue, '  ', ' ', True, True, False);

                //Transform links
                ItemValue := StringReplace(ItemValue, '<a class="ipc-md-link ipc-md-link--entity" href="', '<link url="http://www.imdb.com', True, True, False);
                ItemValue := StringReplace(ItemValue, '/?ref_=nmbio_mbio">', '/">', True, True, False);
                ItemValue := StringReplace(ItemValue, '</a>', '</link>', True, True, False);

                //Remove unwanted tags
                ItemValue := StringReplace(ItemValue, '<div class="ipc-html-content-inner-div" role="presentation">', '', True, True, False);
                ItemValue := StringReplace(ItemValue, '<div class="ipc-html-content ipc-html-content--base ipc-metadata-list-item-html-item" role="presentation">', '', True, True, False);
                ItemValue := StringReplace(ItemValue, '</div>', '', True, True, False);
                ItemValue := StringReplace(ItemValue, '</ul>', '', True, True, False);
            
                If Not(BIO_URL_IN_BIO) then ItemValue:=RemoveTagsEx00(ItemValue);
                If Not(BIO_URL_IN_BIO) then ItemValue:=StringReplace(ItemValue,'</link>','',True,True,False);

                If ItemValue <> '' then ItemList := ItemValue;

                //LogMessage('      Get result bio (from Mini bio)002:'+ItemList+'||');
                If ItemList <> '' then ItemList11:=ItemList11+ItemList;
            
            End Else LogMessage('Error: End position not found for Mini Bio');
        End Else LogMessage('Error: Start position not found for Mini Bio');
      //(*
        //Extract the final "IMDb Mini Biography By: ..." value and clean tags
        If Pos('- IMDb Mini Biography By:', ItemList0) > 0 Then Begin
            curPos := Pos('- IMDb Mini Biography By:', ItemList0);
            endPos := Pos('</div>', Copy(ItemList0, curPos, Length(ItemList0) - curPos + 1)) + curPos - 1;
            FinalValue := Copy(ItemList0, curPos, endPos - curPos + Length('</div>'));

            //Clean surrounding tags without using RemoveTags
            FinalValue := StringReplace(FinalValue, '<div class="ipc-html-content-inner-div" role="presentation">', '', True, True, False);
            FinalValue := StringReplace(FinalValue, '<div class="ipc-html-content ipc-html-content--base ipc-metadata-list-item-html-item" role="presentation">', '', True, True, False);
            FinalValue := StringReplace(FinalValue, '</div>', '', True, True, False);
            FinalValue := StringReplace(FinalValue, '</ul>', '', True, True, False);
            
            //Append the final value to ItemList only if it's not already present
            If Pos(FinalValue, ItemList) = 0 Then Begin
                If Length(ItemList) > 0 Then
                    ItemList := ItemList + ' ' + FinalValue
                Else
                    ItemList := FinalValue;
            End;                        
            LogMessage('   *   Get result bio (from Mini bio)002:'+ItemList+'||');
            //Cut the "- IMDb Mini Biography By:" credit only when IMDB_MINI_IN_BIO is off.
            //Both statements sit in one block so a stale curPos is never reused.
            If Not(IMDB_MINI_IN_BIO) then Begin
                curPos:=Pos('- IMDb Mini',ItemList);
                if curPos >0 then ItemList := Copy(ItemList,1,curPos-1);
            End;
            LogMessage('      Get result bio (from Mini bio) a:'+ItemList+'||');
            LogMessage('      Get result bio (from Mini bio):'+ItemList+'||');
            If ItemList <> '' then ItemList11:=ItemList11+ItemList;
        End;
      //*)   

        //AddFieldValueXML('bio', ItemList);
        //LogMessage('Added ItemList to XML: ' + ItemList);

         
      If (ItemList11 <> '') AND (ItemList2 <> '') Then 
         //ItemList12:=ItemList11;   
         ItemList12:=ItemList11+#13+ItemList2;
      
      //Get "Birth name" Biography text
      ItemList00:='';
      //ItemList10:=TextBetWeenFirst(HTML,'" data-testid="title"><hgroup><h1 class="ipc-title__text"','<h3 class="ipc-title__text"><span>Contribute to this page</span></h3>');
      curPos := PosFrom('<h3 class="ipc-title__text"><span id="overview">Overview', HTML,curPos);
      EndPos:=PosFrom('</div></section>',HTML,curPos);
      ItemList00:=Copy(HTML,curPos,endPos-curPos);   
      //LogMessage('  ** Parse Biography '+#13+ItemList00+' **');   
      //(*   
      If (Length(ItemList00)>0) Then Begin
         ItemValue10:=TextBetWeenFirst(ItemList00,'<li role="presentation" class="ipc-metadata-list__item" id="name" data-testid="list-item"><span class="ipc-metadata-list-item__label" aria-disabled="false">Birth name</span>','</div></div></div></li>');
         //if BIRTH_NAME_IN_TRANSNAME then
            //if ItemValue10 <> '' then
            //AddFieldValueXML('transname',ItemValue10);         
         If ItemValue10 <> '' then LogMessage('      Get result from Birth Name02:'+ItemValue10+'||');
         If ItemValue10 <> '' then ItemValue10:='BirthName:  '+ItemValue10;
      If ItemValue10 <> '' then ItemList12:=ItemList12+#13+'--------------------------------------------------------------------------'+#13+ItemValue10;      
      End;   
      //*)         
               
      If BIO_INFO_IN_BIO then AddFieldValueXML('bio',ItemList12);
               
      If Not(BIO_INFO_IN_BIO) Then AddFieldValueXML('bio',ItemList11);   
          
   Result := prFinished;
 
    LogMessage('Function ParsePage_IMDBPeopleBIO END=====================||');
   LogMessage('ParsePage_IMDBPeopleBIO: Ending processing.');
  End; //BlockClose
//*)       
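
The "find the start marker, then the first end marker after it, then copy the span" pattern used above can be tried outside PVD. The following is only a minimal sketch in plain Free Pascal, not the script itself: ExtractSection and the sample HTML are hypothetical names made up for illustration, and PosEx comes from the Free Pascal StrUtils unit (the script's own PosFrom seems to play a similar role, but that is an assumption).

```pascal
program ExtractSectionSketch;
{$mode objfpc}{$H+}
uses
  SysUtils, StrUtils;

// Hypothetical helper: returns the text from StartMarker through the first
// EndMarker that follows it (EndMarker included), or '' if either is missing.
function ExtractSection(const Source, StartMarker, EndMarker: String): String;
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  StartPos := Pos(StartMarker, Source);
  if StartPos = 0 then Exit;          // start marker not found
  // PosEx searches from an offset, so no temporary Copy of the tail is needed
  EndPos := PosEx(EndMarker, Source, StartPos);
  if EndPos = 0 then Exit;            // end marker not found
  Result := Copy(Source, StartPos, EndPos - StartPos + Length(EndMarker));
end;

const
  // Made-up sample; real IMDb markup is much longer
  SampleHTML = '<p>intro</p><div data-testid="sub-section-mini_bio"><ul>' +
               '<li>Bio text</li></ul></div><ul><li>other</li></ul>';
begin
  // Section found: prints the span up to and including the first </ul>
  WriteLn(ExtractSection(SampleHTML, '<div data-testid="sub-section-mini_bio"', '</ul>'));
  // Section missing: prints an empty pair of brackets
  WriteLn('[', ExtractSection(SampleHTML, '<div data-testid="missing"', '</ul>'), ']');
end.
```

Returning an empty string for both failure cases keeps the caller's check to a single comparison; the script instead logs a separate message for each case, which is more useful when IMDb changes its layout.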
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline Ivek23

Re: New IMDb People v3 (Selenium) script comments
« Reply #4 on: January 09, 2025, 08:25:52 pm »
If these changes are applied to the Function ParsePage_IMDBPeopleBIO code, then the bio field will look like the one described below.

If these settings are in use

Quote
  GET_FULL_BIO  = True ;  //Download Biography provider page to retrieve the info. Otherwise only the info of the principal people page.
  //GET_FULL_BIO  = False ;  //Download Biography provider page to retrieve the info. Otherwise only the info of the principal people page.
  //BIO_INFO_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
  BIO_INFO_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
  //BIO_URL_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  BIO_URL_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  //IMDB_MINI_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
  IMDB_MINI_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.

then the bio field looks like this.

Quote
Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA. She is an actress and writer, known for Polna hiša (1987), Fuller House (2016) and Days of Our Lives (1965). She was previously married to Jeremy Rytky.

If these settings are in use

Quote
  GET_FULL_BIO  = True ;  //Download Biography provider page to retrieve the info. Otherwise only the info of the principal people page.
  //GET_FULL_BIO  = False ;  //Download Biography provider page to retrieve the info. Otherwise only the info of the principal people page.
  BIO_INFO_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
  //BIO_INFO_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
  //BIO_URL_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  BIO_URL_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  //IMDB_MINI_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
  IMDB_MINI_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.

then the bio field looks like this.

Quote
Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA. She is an actress and writer, known for Polna hiša (1987), Fuller House (2016) and Days of Our Lives (1965). She was previously married to Jeremy Rytky.
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0053347/bio/#overview">Biography Info</link>
--------------------------------------------------------------------------
BirthName:  Andrea Laura Barber


If these settings are in use

Quote
  GET_FULL_BIO  = True ;  //Download Biography provider page to retrieve the info. Otherwise only the info of the principal people page.
  //GET_FULL_BIO  = False ;  //Download Biography provider page to retrieve the info. Otherwise only the info of the principal people page.
  BIO_INFO_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
  //BIO_INFO_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
  BIO_URL_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  //BIO_URL_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
  //IMDB_MINI_IN_BIO  = True ; //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
  IMDB_MINI_IN_BIO  = False ; //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.

then the bio field looks like this.

Quote
Andrea Barber was born on July 3, 1976 in Los Angeles, California, USA. She is an actress and writer, known for <link url="http://www.imdb.com/title/tt0092359/">Polna hiša (1987)</link>, <link url="http://www.imdb.com/title/tt3986586/">Fuller House (2016)</link> and <link url="http://www.imdb.com/title/tt0058796/">Days of Our Lives (1965)</link>. She was previously married to Jeremy Rytky.
--------------------------------------------------------------------------
<link url="http://www.imdb.com/name/nm0053347/bio/#overview">Biography Info</link>
--------------------------------------------------------------------------
BirthName:  Andrea Laura Barber
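
How BIO_INFO_IN_BIO decides between the short and the long bio field in the three cases above can be sketched as a small standalone Free Pascal program. This is only an illustration under assumptions, not the script's real code: ComposeBio is a hypothetical stand-in for the end of ParsePage_IMDBPeopleBIO, the separator is shortened, LineEnding replaces the script's #13, and BIO_URL_IN_BIO is left out because it only decides whether the bio text passed in still contains <link> tags.

```pascal
program BioFlagsSketch;
{$mode objfpc}{$H+}

const
  Separator = '----------';  // shortened; the script uses a longer dashed line
  InfoLink  = '<link url="http://www.imdb.com/name/nm0053347/bio/#overview">Biography Info</link>';

// Hypothetical stand-in for the final step of the Bio function:
// plain bio text, or bio text plus the info link and the birth name.
function ComposeBio(const BioText, BirthName: String; BioInfoInBio: Boolean): String;
begin
  Result := BioText;
  if not BioInfoInBio then Exit;   // BIO_INFO_IN_BIO = False: bio text only
  Result := Result + LineEnding + Separator + LineEnding + InfoLink;
  if BirthName <> '' then
    Result := Result + LineEnding + Separator + LineEnding + 'BirthName:  ' + BirthName;
end;

begin
  // BIO_INFO_IN_BIO = False
  WriteLn(ComposeBio('Andrea Barber was born on July 3, 1976.', 'Andrea Laura Barber', False));
  WriteLn;
  // BIO_INFO_IN_BIO = True
  WriteLn(ComposeBio('Andrea Barber was born on July 3, 1976.', 'Andrea Laura Barber', True));
end.
```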
« Last Edit: January 09, 2025, 11:46:44 pm by Ivek23 »


Offline afrocuban

  • Moderator
  • *****
  • Posts: 586
    • View Profile
Re: New IMDb People v3 (Selenium) script comments
« Reply #5 on: January 10, 2025, 02:14:32 am »
Great thanks! It's good to clean the code and to fix additional things in order to revive them!



I have some questions about Alternative Names, though:


Is there any specific reason you changed ItemValue1 to AltNames1?


I also don't understand what the var AltNames is used for, and where.

One more thing I am not sure about: how does this work in the line you posted?
Quote
If AltNames1 <> '' then AddFieldValueXML('AltNames', ItemValue1);

Offline Ivek23

Re: New IMDb People v3 (Selenium) script comments
« Reply #6 on: January 10, 2025, 07:31:51 am »
I apologize, I forgot to make that change. The correct line is this:

Quote
If AltNames1 <> '' then AddFieldValueXML('AltNames', AltNames1);

AltNames and AltNames1 are named this way because I have more Alternative Names code in other parts of Function ParsePage_IMDBPersonBASE, which gives a better transfer of information to the altnames field and the comment field.

You can also use ItemValue1 in this code, as you wish.



Offline afrocuban

Re: New IMDb People v3 (Selenium) script comments
« Reply #7 on: January 10, 2025, 09:06:49 am »
Thanks!


If someone (like me) wants the bio field to look only like this (without the Biography link, because it is redundant with the comment field):


Quote


 In 2015, he played a journalist in <link url="http://www.imdb.com/title/tt1895587/">Под лупом (2015)</link>, which, like Birdman, won the Academy Award for Best Picture. In 2016, he starred as <link url="http://www.imdb.com/name/nm4669566/">Ray Kroc</link>, the developer of McDonald's, in the drama <link url="http://www.imdb.com/title/tt4276820/">Osnivač (2016)</link>.
 
 He is a visiting scholar at Carnegie Mellon University.
--------------------------------------------------------------------------
- IMDb Mini Biography By: firehouse44 and Pedro Borges
--------------------------------------------------------------------------
BirthName:  Michael John Douglas

--------------------------------------------------------------------------


then the script options need to be set like this:


Quote
  BIO_INFO_IN_BIO                        = False ;   //Use the PVD field ~bio~ for not storing the person Biography Info Url link for Biography Pages.
   //BIO_INFO_IN_BIO                     = True ;   //Use the PVD field ~bio~ for storing the person Biography Info Url link for Biography Pages.
  BIO_URL_IN_BIO                        = True ;   //Use the PVD field ~bio~ for storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
   //BIO_URL_IN_BIO                          = False ;   //Use the PVD field ~bio~ for not storing the person Url's for Biography Info (Mini Bio) for Biography Pages.
   //IMDB_MINI_IN_BIO                               = True ;   //Use the PVD field ~bio~ for storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.
  IMDB_MINI_IN_BIO                            = False ;   //Use the PVD field ~bio~ for not storing the person IMDb Mini Biography letters for Biography Info (Mini Bio) for Biography Pages.



and the function has to be:


Quote

Function ParsePage_IMDBPeopleBIO(HTML:String):Cardinal; //BlockOpen
//Returns:
//     Result:=prFinished; Script has finished gathering data
//     Result:=prError; If any big problem with exit;
//Retrieve: ~bio~ Biography from "Mini Bio" IMDB section
Var
curPos,endPos,debug_pos1:Integer;
ItemValue:String;
PersonID,ItemValue0,ItemValue10,ItemValue1,ItemValue11:String;
ItemList,ItemList00,ItemList0,ItemList1,ItemList11,ItemList12:String;
FinalValue: String;
ItemList2,ItemList10,ItemList20,ItemValue3:String;
BirhNameValue: String;
Begin
LogMessage('ParsePage_IMDBPeopleBIO: Starting processing.');
LogMessage('HTML length: ' + IntToStr(Length(HTML)));
LogMessage('Function ParsePage_IMDBPeopleBIO BEGIN=====================||');
Result:=prFinished;  //It will change to prError if any big problem with exit;

LogMessage('Result set to prFinished');  //Log the initial result setting

//(*
//Get "Biography" info
curPos:=Pos('<h1 class="ipc-title__text">Biography</h1>',HTML);      //Strings start which opens the block content data. WEB_SPECIFIC
if (curPos=0) then Exit;   
//*)
//(*
ItemList2:='';
ItemList11:='';
//*)
(*
ItemList2:='';
ItemList11:='';
//*)
//(*   
//Get PersonID
//LogMessage('Attempting to find PersonID');
PersonID := TextBetWeenFirst(HTML, '<link rel="canonical" href="https://', '/">');  //WEB_SPECIFIC   
if (Length(PersonID) > 2) then begin
ItemList2 := '--------------------------------------------------------------------------'+#13+'<link url="http://' + PersonID + '/#overview">Biography Info</link>';
//ItemList2 := '--------------------------------------------------------------------------'+#13+'<link url="http://www.imdb.com/name/' + PersonID + '/bio/#overview">Biography Info</link>';
LogMessage('Get result PersonID: ' + PersonID + '||');
end else begin
LogMessage('Error: PersonID not found');
Result := prError;      //Set the result to error if PersonID is not found
end;   
//*)   
//(*
//Get "Biography" info
LogMessage('Attempting to find Biography section');
curPos := Pos('<div data-testid="sub-section-mini_bio"', HTML);         //Updated to reflect new layout
if (curPos = 0) then Begin
LogMessage('Error: Biography section not found');
Result := prError;      //Set the result to error if the section is not found
Exit;
End;
endPos := Pos('</ul>', Copy(HTML, curPos, Length(HTML) - curPos + 1)) + curPos - 1;
if endPos = curPos - 1 then Begin
LogMessage('Error: End of Biography section not found');
Result := prError;      //Set the result to error if the section is not found
Exit;
End;
ItemList0 := Copy(HTML, curPos, endPos - curPos + Length('</ul>'));    //Include </ul> in the end position
LogMessage('Biography section found');

//Extract "Mini bio" Biography text
LogMessage('Extracting Mini Bio text:');
curPos := Pos('<div class="ipc-html-content-inner-div" role="presentation">', ItemList0);      //Updated to reflect new layout
LogMessage('curPos for Mini Bio set to: ' + IntToStr(curPos));
if curPos > 0 then Begin
endPos := Pos('</ul>', Copy(ItemList0, curPos, Length(ItemList0) - curPos + 1)) + curPos - 1;      //Update to match exact structure
LogMessage('endPos for Mini Bio set to: ' + IntToStr(endPos));
if endPos > curPos Then Begin
ItemValue := Trim(Copy(ItemList0, curPos, endPos - curPos + Length('</ul>')));
               
//Normalize whitespace but keep empty lines
ItemValue := StringReplace(ItemValue, #13#10, #10, True, True, False);      //Normalize line endings
ItemValue := StringReplace(ItemValue, #13, #10, True, True, False);
ItemValue := StringReplace(ItemValue, #10#10, #13#10#13#10, True, True, False);      //Preserve empty lines
ItemValue := StringReplace(ItemValue, #10, ' ', True, True, False);
ItemValue := StringReplace(ItemValue, #13#10#13#10, #10#10, True, True, False);      //Revert empty line placeholders
While Pos('  ', ItemValue) > 0 Do
ItemValue := StringReplace(ItemValue, '  ', ' ', True, True, False);

//Transform links
ItemValue := StringReplace(ItemValue, '<a class="ipc-md-link ipc-md-link--entity" href="', '<link url="http://www.imdb.com', True, True, False);
ItemValue := StringReplace(ItemValue, '/?ref_=nmbio_mbio">', '/">', True, True, False);
ItemValue := StringReplace(ItemValue, '</a>', '</link>', True, True, False);

//Remove unwanted tags
ItemValue := StringReplace(ItemValue, '<div class="ipc-html-content-inner-div" role="presentation">', '', True, True, False);
ItemValue := StringReplace(ItemValue, '<div class="ipc-html-content ipc-html-content--base ipc-metadata-list-item-html-item" role="presentation">', '', True, True, False);
ItemValue := StringReplace(ItemValue, '</div>', '', True, True, False);
ItemValue := StringReplace(ItemValue, '</ul>', '', True, True, False);

If Not(BIO_URL_IN_BIO) then ItemValue:=RemoveTagsEx00(ItemValue);   
If Not(BIO_URL_IN_BIO) then ItemValue:=StringReplace(ItemValue,'</link>','',True,True,False);

If ItemValue <> '' then ItemList := ItemValue;

//LogMessage('      Get result bio (from Mini bio)002:'+ItemList+'||');           
If ItemList <> '' then ItemList11:=ItemList11+ItemList;

End Else LogMessage('Error: End position not found for Mini Bio');
End Else LogMessage('Error: Start position not found for Mini Bio');
//(*
// Extract the final "IMDb Mini Biography By: ..." value and clean tags
If Pos('- IMDb Mini Biography By:', ItemList0) > 0 Then Begin
curPos := Pos('- IMDb Mini Biography By:', ItemList0);
endPos := Pos('</div>', Copy(ItemList0, curPos, Length(ItemList0) - curPos + 1)) + curPos - 1;
FinalValue := Copy(ItemList0, curPos, endPos - curPos + Length('</div>'));

// Clean surrounding tags without using RemoveTags
FinalValue := StringReplace(FinalValue, '<div class="ipc-html-content-inner-div" role="presentation">', '', True, True, False);
FinalValue := StringReplace(FinalValue, '<div class="ipc-html-content ipc-html-content--base ipc-metadata-list-item-html-item" role="presentation">', '', True, True, False);
FinalValue := StringReplace(FinalValue, '</div>', '', True, True, False);
FinalValue := StringReplace(FinalValue, '</ul>', '', True, True, False);

// Remove existing occurrence of FinalValue from ItemList
ItemList := StringReplace(ItemList, FinalValue, '', True, True, False);

// Log the cleaned ItemList
//LogMessage('   *   Cleaned ItemList without FinalValue:' + ItemList + '||');

// Append the final value to ItemList only if it's not already present
If Pos(FinalValue, ItemList) = 0 Then Begin
If Length(ItemList) > 0 Then
ItemList := ItemList + #13#10 + '--------------------------------------------------------------------------'+ #13#10 + FinalValue
Else
ItemList := FinalValue;
End;

// Log the updated ItemList
//LogMessage('   *   Get result bio (from Mini bio)002:' + ItemList + '||');

//Cut the "- IMDb Mini Biography By:" credit only when IMDB_MINI_IN_BIO is off.
If Not(IMDB_MINI_IN_BIO) then Begin
curPos:=Pos('- IMDb Mini',ItemList);
if curPos >0 then ItemList := Copy(ItemList,1,curPos-1);
End;
//LogMessage('      Get result bio (from Mini bio) a:'+ItemList+'||');
//LogMessage('      Get result bio (from Mini bio):'+ItemList+'||');
If ItemList <> '' then ItemList11:=ItemList11+ItemList;
End;
//*)

//AddFieldValueXML('bio', ItemList);
        //LogMessage('Added ItemList to XML: ' + ItemList);

If (ItemList11 <> '') AND (ItemList2 <> '') Then
//ItemList12:=ItemList11;
ItemList12:=ItemList11+#13+ItemList2;

//Get "Birth name" Biography text
ItemList00:='';
//ItemList10:=TextBetWeenFirst(HTML,'" data-testid="title"><hgroup><h1 class="ipc-title__text"','<h3 class="ipc-title__text"><span>Contribute to this page</span></h3>');
curPos := PosFrom('<h3 class="ipc-title__text"><span id="overview">Overview', HTML,curPos);
EndPos:=PosFrom('</div></section>',HTML,curPos);
ItemList00:=Copy(HTML,curPos,endPos-curPos);
//LogMessage('  ** Parse Biography '+#13+ItemList00+' **');
//(*
If (Length(ItemList00)>0) Then Begin
ItemValue10:=TextBetWeenFirst(ItemList00,'<li role="presentation" class="ipc-metadata-list__item" id="name" data-testid="list-item"><span class="ipc-metadata-list-item__label" aria-disabled="false">Birth name</span>','</div></div></div></li>');
//if BIRTH_NAME_IN_TRANSNAME then
//if ItemValue10 <> '' then
//AddFieldValueXML('transname',ItemValue10);
//If ItemValue10 <> '' then LogMessage('      Get result from Birth Name02:'+ItemValue10+'||');
If ItemValue10 <> '' then ItemValue10:='BirthName:  '+ItemValue10;
If ItemValue10 <> '' then ItemList12:=ItemList12+#13+'--------------------------------------------------------------------------'+#13+ItemValue10;
If ItemValue10 <> '' then BirhNameValue:='--------------------------------------------------------------------------' + #13#10 + ItemValue10 + #13#10 + '--------------------------------------------------------------------------'
End;   
//*)         
If BIO_INFO_IN_BIO then
AddFieldValueXML('bio', ItemList12)
Else If Not(BIO_INFO_IN_BIO) and BIO_URL_IN_BIO and (IMDB_MINI_IN_BIO) then
AddFieldValueXML('bio', ItemList + #13#10 + BirhNameValue)
Else
AddFieldValueXML('bio', ItemList11);


Result := prFinished;

LogMessage('Function ParsePage_IMDBPeopleBIO END=====================||');
End; //BlockClose
//*)
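
A side note on the "Normalize whitespace but keep empty lines" step: if the script's StringReplace replaces every occurrence, the placeholder #13#10#13#10 itself contains #10, so the following "#10 to space" step rewrites the placeholder as well and the revert step finds nothing to restore. Below is a minimal sketch of the same idea in plain Free Pascal with a placeholder that contains no line feed. NormalizeKeepBlankLines is a hypothetical name, #1 is assumed never to occur in the page text, and Free Pascal's StringReplace takes a flag set instead of the three booleans used in PVD.

```pascal
program NormalizeWhitespaceSketch;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: joins wrapped lines with single spaces but keeps
// blank lines (paragraph breaks) as #10#10.
function NormalizeKeepBlankLines(const S: String): String;
const
  Placeholder = #1;  // assumption: this control character never occurs in the input
begin
  Result := StringReplace(S, #13#10, #10, [rfReplaceAll]);              // CRLF -> LF
  Result := StringReplace(Result, #13, #10, [rfReplaceAll]);            // lone CR -> LF
  Result := StringReplace(Result, #10#10, Placeholder, [rfReplaceAll]); // protect blank lines
  Result := StringReplace(Result, #10, ' ', [rfReplaceAll]);            // join wrapped lines
  while Pos('  ', Result) > 0 do                                        // collapse space runs
    Result := StringReplace(Result, '  ', ' ', [rfReplaceAll]);
  Result := StringReplace(Result, Placeholder, #10#10, [rfReplaceAll]); // restore blank lines
end;

begin
  // Prints "Line one continues", an empty line, then "Second paragraph"
  WriteLn(NormalizeKeepBlankLines('Line one'#13#10'continues'#13#10#13#10'Second  paragraph'));
end.
```

Whether PVD shows #10#10 as an empty line in the bio field is something I have not checked; elsewhere the script joins lines with #13, so the restore step may need #13 there instead.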


In v4 I'll probably leave only links, people id and PID in the comments field and move the rest to bio, where, to me, it naturally belongs.
« Last Edit: January 10, 2025, 09:26:50 am by afrocuban »

Offline Ivek23

Re: New IMDb People v3 (Selenium) script comments
« Reply #8 on: January 10, 2025, 12:25:26 pm »
No, that's not true. Not everything belongs in the bio field, only what is there now, because there is simply no other information on the bio pages, and the rest can't be imported anywhere else, only into the comment field.

For example, the proper name (such as Andrew Keegan (I)), jobtitle, Filmography - Career and alternative names are not found on the bio pages, so this information is transferred to the comment field.


Offline afrocuban

Re: New IMDb People v3 (Selenium) script comments
« Reply #9 on: January 10, 2025, 02:59:55 pm »
Oh, I see! Thanks! I was thinking of an update like AddFieldValueXML('bio', ItemList + #13#10 + CommentValuesForBio), for example, pulling those values out of Base, or of merging the functions... It was just an idea, not thought through in detail at all...

Or this concept:



add an additional argument to the Bio function, and sort that out in ParsePage when calling the Bio function...


Quote
Function ParsePage(...);
Var
    HTML, CommentValuesForBio: String;
    BioResult: Cardinal;
Begin
    // Parse Base Page
    HTML := 'Base page HTML content';
    CommentValuesForBio := ParsePage_IMDBPersonBASE(HTML);

    // Parse Bio Page and combine with the 10th result from base page
    HTML := 'Bio page HTML content';
    BioResult := ParsePage_IMDBPeopleBIO(HTML, CommentValuesForBio);

    // The Bio function returns a status code (prFinished/prError), not the bio text
    LogMessage('Bio parse result code: ' + IntToStr(BioResult));
End;


and in ParseBio then:

Quote
Function ParsePage_IMDBPeopleBIO(HTML: String; CommentValuesForBio: String): Cardinal;
Var
    .....
Begin
    // Extract and process bio information
    BioList:= 'Bio result';

    // Combine with the 10th result from the base page
    BioItemListFinal := BioList+ #13#10 + CommentValuesForBio;

    // Store the combined result in the bio field and report success
    AddFieldValueXML('bio', BioItemListFinal);
    Result := prFinished;
End;

I don't know...
« Last Edit: January 10, 2025, 04:16:20 pm by afrocuban »

Offline afrocuban

Re: New IMDb People v3 (Selenium) script comments
« Reply #10 on: January 15, 2025, 08:59:47 am »
Thanks to the new approach of downloading pages in parallel to speed up the process, it is actually pretty easy to achieve this goal too:


Quote
//Parse Biography provider page = BASE_URL_BIO_PERSON-----------------------------------------------------------------------
If (GET_FULL_BIO and Not(USE_SAVED_PVDCONFIG and (Copy(PVDConfigOptions, opBio, 1) = '0'))) Then Begin
    DownloadURL := StringReplace(BASE_URL_BIO_PERSON, '%IMDB_ID', PersonID, True, True, False);
    HTML := DownloadPageBio(FileNameBio);  //True page for parsing
    HTML := HTMLToText(HTML);
    CombinedHTML := HTML;
    DownloadURL := BASE_URL_PERSON_PRE_TRUE+ PersonID +BASE_URL_SUF;
    HTML := DownloadPageMain(FileNameMain);
    //LogMessage('HTML set to FileNameMain: ' + HTML);
    HTML := HTMLToText(HTML);
    CombinedHTML := CombinedHTML + HTML;
    ResultTmp := ParsePage_IMDBPeopleBIO(CombinedHTML);
    If Not(ResultTmp = prFinished) Then Begin
        Result := ResultTmp;
        Exit;
    End;


This makes it possible to theoretically combine whatever we want.
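
As a rough illustration of why one combined string is enough, here is a minimal sketch in plain Free Pascal. Everything in it is made up for the example: TextBetween is a hypothetical helper (the script has its own TextBetWeenFirst), and the <bio> and <job> markers only stand in for real IMDb markup.

```pascal
program CombinedPagesSketch;
{$mode objfpc}{$H+}
uses
  SysUtils, StrUtils;

// Hypothetical helper: text between the first StartMarker and the first
// EndMarker after it, or '' if either marker is missing.
function TextBetween(const S, StartMarker, EndMarker: String): String;
var
  P1, P2: Integer;
begin
  Result := '';
  P1 := Pos(StartMarker, S);
  if P1 = 0 then Exit;
  P1 := P1 + Length(StartMarker);     // first character after the start marker
  P2 := PosEx(EndMarker, S, P1);
  if P2 = 0 then Exit;
  Result := Copy(S, P1, P2 - P1);
end;

const
  BioPage  = '<bio>Mini bio text</bio>';   // stands in for the bio page
  MainPage = '<job>Actor</job>';           // stands in for the main page
var
  Combined: String;
begin
  // Same order as in the snippet above: bio page first, then main page
  Combined := BioPage + MainPage;
  WriteLn('bio: ', TextBetween(Combined, '<bio>', '</bio>'));
  WriteLn('job: ', TextBetween(Combined, '<job>', '</job>'));
end.
```

One thing to watch with this approach: a marker that exists on both pages is matched at its first occurrence, so the order in which the pages are concatenated decides which page wins.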

I have finished the final, complete and definitive all-in-one IMDB_People_[EN][Selenium]-v4.0.psf with full search brought back. I am now working on IMDB_Movie_[EN][Selenium]-v4.0.psf to get the same result, and will publish both then.

 
