Topic: Imdb People script issues

Offline afrocuban

  • Moderator
  • *****
  • Posts: 546
    • View Profile
Re: Imdb People script issues
« Reply #20 on: December 28, 2024, 08:38:43 pm »
And probably FINALLY this one is a winner (I still have to tweak Recipient):





Quote
Function ParsePage_IMDBPeopleAWARDS(HTML: String): Cardinal;
Var
  curPos, endPos, awardPos, categoryPos, recipientPos, yearPos, eventEndPos, namePos: Integer;
  Event, Award, AwardName, Category, Recipient, Year: String;
  Won: Boolean;
Begin
  LogMessage('Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||');
  Result := prFinished;


  // Locate the start of the first event section
  curPos := Pos('<section class="ipc-page-section ipc-page-section--base">', HTML);
  While curPos > 0 Do Begin
    LogMessage('curPos after finding event section: ' + IntToStr(curPos));


    // Extract event name
    curPos := PosFrom('<span id="ev', HTML, curPos);
    If curPos = 0 Then Begin
      LogMessage('Event name not found');
      Break;
    End;
    curPos := PosFrom('>', HTML, curPos) + 1;
    endPos := PosFrom('</span>', HTML, curPos);
    Event := Trim(Copy(HTML, curPos, endPos - curPos));
    LogMessage('Parsed Event: ' + Event);


    // Move cursor to start processing awards within the event
    curPos := endPos;
    eventEndPos := PosFrom('<section class="ipc-page-section ipc-page-section--base">', HTML, curPos);
    If eventEndPos = 0 Then
      eventEndPos := Length(HTML);  // Set to the end of HTML if no more events


    // Process awards within the event
    While curPos < eventEndPos Do Begin
      // Find next award div within the current event
      awardPos := PosFrom('<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item">', HTML, curPos);
      If (awardPos = 0) Or (awardPos >= eventEndPos) Then Begin
        LogMessage('No more awards found in this event');
        Break;
      End;
      LogMessage('curPos after finding award div: ' + IntToStr(awardPos));


      // Extract entire award block
      curPos := awardPos;
      endPos := PosFrom('</li>', HTML, curPos);
      If endPos = 0 Then Begin
        LogMessage('No closing tag for award div found');
        Break;
      End;


      Award := Copy(HTML, curPos, endPos - curPos);
      curPos := endPos + Length('</li>');
      LogMessage('Award Content Extracted Successfully: ' + Award);


      // Extract year
      yearPos := PosFrom('<a class="ipc-metadata-list-summary-item__t"', Award, 1);
      If yearPos = 0 Then Begin
        LogMessage('Year not found');
        Continue;
      End;
      yearPos := PosFrom('>', Award, yearPos) + 1;
      endPos := PosFrom(' ', Award, yearPos);
      Year := Copy(Award, yearPos, endPos - yearPos);
      Year := Trim(Year);
      LogMessage('Parsed Year: ' + Year);


      // Determine if the award was won
      Won := PosFrom('Winner', Award, 1) > 0;
      If Won Then
        LogMessage('Parsed Won: True')
      Else
        LogMessage('Parsed Won: False');


      // Extract award name
      namePos := PosFrom('<span class="ipc-metadata-list-summary-item__tst">', Award, 1);
      If namePos > 0 Then Begin
        namePos := PosFrom('>', Award, namePos) + 1;
        endPos := PosFrom('</span>', Award, namePos);
        AwardName := Copy(Award, namePos, endPos - namePos);
        LogMessage('Parsed Award Name: ' + AwardName);
      End Else Begin
        LogMessage('Award Name not found');
        AwardName := '';
      End;


      // Extract category
      categoryPos := PosFrom('<span class="ipc-metadata-list-summary-item__li awardCategoryName" aria-disabled="false">', Award, 1);
      If categoryPos > 0 Then Begin
        categoryPos := PosFrom('>', Award, categoryPos) + 1;
        endPos := PosFrom('</span>', Award, categoryPos);
        Category := Copy(Award, categoryPos, endPos - categoryPos);
        LogMessage('Parsed Category: ' + Category);
      End Else Begin
        LogMessage('Category tag not found');
        Category := '';
      End;


      // Extract recipient
      recipientPos := PosFrom('<a class="ipc-metadata-list-summary-item__li ipc-metadata-list-summary-item__li--link"', Award, endPos + Length('</span>') + 1);
      If recipientPos > 0 Then Begin
        recipientPos := PosFrom('>', Award, recipientPos) + 1;
        endPos := PosFrom('</a>', Award, recipientPos);
        Recipient := Copy(Award, recipientPos, endPos - recipientPos);
        LogMessage('Parsed Recipient: ' + Recipient);
      End Else Begin
        LogMessage('Recipient tag not found');
        Recipient := '';
      End;


      // Add award to the database
      AddAward(Event, AwardName, Category, Recipient, Year, Won);
      If Won Then
        LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + AwardName + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=True')
      Else
        LogMessage('AddAward executed successfully: Event=' + Event + ', Award=' + AwardName + ', Category=' + Category + ', Recipient=' + Recipient + ', Year=' + Year + ', Won=False');
    End;


    // Move to the next event section
    curPos := eventEndPos;
    curPos := PosFrom('<section class="ipc-page-section ipc-page-section--base">', HTML, curPos);
  End;


  LogMessage('Function ParsePage_IMDBPeopleAWARDS END=====================||');
  Result := prFinished;
End;
//BlockClose
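
One fragile spot in the function above is the award selector: it matches the whole opening tag, including `sc-15fc9ae6-1 gQbMPJ`. Assuming those two class names are build-generated (they look like it, but that is an assumption), the match breaks whenever they change. A minimal Python sketch, not PVD script, of matching on the `data-testid="list-item"` attribute instead; `find_award_items` and the sample markup are made up for illustration.

```python
import re

# Opening <li ...> tag that carries data-testid="list-item",
# regardless of what else is in its class attribute.
AWARD_ITEM = re.compile(r'<li\b[^>]*\bdata-testid="list-item"[^>]*>')


def find_award_items(html: str) -> list:
    """Return the start offset of every award <li> in html."""
    return [m.start() for m in AWARD_ITEM.finditer(html)]


# Hypothetical markup: two award items with different generated classes,
# plus one unrelated list item that must not match.
SAMPLE = (
    '<ul>'
    '<li class="ipc-metadata-list-summary-item sc-15fc9ae6-1 gQbMPJ" data-testid="list-item">A</li>'
    '<li class="ipc-metadata-list-summary-item sc-0000 zzzzzz" data-testid="list-item">B</li>'
    '<li class="other">C</li>'
    '</ul>'
)

positions = find_award_items(SAMPLE)
print(len(positions), "award items found at", positions)
```

In the script itself, the same idea might be approximated by searching with PosFrom for a shorter prefix such as `<li class="ipc-metadata-list-summary-item` and leaving the rest of the tag out of the search string.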


Here's the beginning of the log (first event is "Ariel Awards, Mexico" and the first award in it is "Golden Ariel") and the end of the log ("BOFA" is the last award of the last event of this page - "Brazil Online Film Award"):


Quote
(12/28/2024 8:35:07 PM) Function ParsePage_IMDBPeopleAWARDS BEGIN=====================||
(12/28/2024 8:35:07 PM) curPos after finding event section: 148924
(12/28/2024 8:35:07 PM) Parsed Event: Ariel Awards, Mexico
(12/28/2024 8:35:07 PM) curPos after finding award div: 149982
(12/28/2024 8:35:07 PM) Parsed Year: 2019
(12/28/2024 8:35:07 PM) Parsed Won: True
(12/28/2024 8:35:07 PM) Parsed Award Name:  Golden Ariel
(12/28/2024 8:35:07 PM) Parsed Category: Best Picture (Mejor Película)
(12/28/2024 8:35:07 PM) Recipient tag not found
(12/28/2024 8:35:07 PM) AddAward executed successfully: Event=Ariel Awards, Mexico, Award= Golden Ariel, Category=Best Picture (Mejor Película), Recipient=, Year=2019, Won=True
============= intermediate logs here
(12/28/2024 8:35:10 PM) AddAward executed successfully: Event=Premios Eres, Award= Premio Eres, Category=Best Picture (Mejor Película), Recipient=, Year=1993, Won=False
(12/28/2024 8:35:10 PM) No more awards found in this event
(12/28/2024 8:35:10 PM) curPos after finding event section: 1574223
(12/28/2024 8:35:10 PM) Parsed Event: Brazil Online Film Award
(12/28/2024 8:35:10 PM) curPos after finding award div: 1575285
(12/28/2024 8:35:10 PM) Parsed Year: 2019
(12/28/2024 8:35:10 PM) Parsed Won: True
(12/28/2024 8:35:10 PM) Parsed Award Name:  BOFA
(12/28/2024 8:35:10 PM) Parsed Category: Best Director
(12/28/2024 8:35:10 PM) Recipient tag not found
(12/28/2024 8:35:10 PM) AddAward executed successfully: Event=Brazil Online Film Award, Award= BOFA, Category=Best Director, Recipient=, Year=2019, Won=True
(12/28/2024 8:35:10 PM) No more awards found in this event
(12/28/2024 8:35:10 PM) Function ParsePage_IMDBPeopleAWARDS END=====================||
(12/28/2024 8:35:10 PM) After calling ParsePage_IMDBPeopleAWARDS
(12/28/2024 8:35:10 PM) Parsed awards page.
(12/28/2024 8:35:10 PM) Parsing awards page finished successfully.
(12/28/2024 8:35:10 PM)     Provider data info retrieved Ok on 2024-12-28 20:35:10|
(12/28/2024 8:35:10 PM) Function ParsePage smNormal END======================|
(12/28/2024 8:35:10 PM) Person -> LoadStatic -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadMultivalues -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadFilms -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadAwards -> 15ms
(12/28/2024 8:35:10 PM) Person -> LoadImages -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadStatic -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadMultivalues -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadFilms -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadAwards -> 0ms
(12/28/2024 8:35:10 PM) Person -> LoadImages -> 16ms
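
The log shows the award name stored with a leading space (`Award= Golden Ariel`), because AwardName is copied without the Trim that Event and Year get. A minimal Python sketch, not PVD script, of the cleanup step; `clean_field` is a hypothetical helper name. In the script, wrapping the Copy(...) call for AwardName in Trim, as is already done for Event, would presumably remove the leading space as well.

```python
def clean_field(raw: str) -> str:
    """Strip both ends and collapse inner runs of whitespace to one space."""
    return " ".join(raw.split())


# Values as they appear in the log above (note the leading space in Award).
parsed = {
    "Event": "Ariel Awards, Mexico",
    "Award": " Golden Ariel",
    "Category": "Best Picture (Mejor Película)",
    "Year": "2019",
}

cleaned = {name: clean_field(value) for name, value in parsed.items()}
print(cleaned)
```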


Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2751
    • View Profile
Re: Imdb People script issues
« Reply #21 on: December 29, 2024, 09:46:28 am »
Quote
      // Extract entire award block
      curPos := awardPos;
      endPos := PosFrom('</li>', HTML, curPos);
      If endPos = 0 Then Begin
        LogMessage('No closing tag for award div found');
        Break;
      End;


      Award := Copy(HTML, curPos, endPos - curPos);
      curPos := endPos + Length('</li>');
      LogMessage('Award Content Extracted Successfully: ' + Award);

Just replace the part of the code above with the part below and Recipient will work.

Quote
      // Extract entire award block
      curPos := awardPos;
      endPos := PosFrom('</div></div></li>', HTML, curPos);
      If endPos = 0 Then Begin
        LogMessage('No closing tag for award div found');
        Break;
      End;


      Award := Copy(HTML, curPos, endPos - curPos);
      curPos := endPos + Length('</div></div></li>');
      LogMessage('Award Content Extracted Successfully: ' + Award);
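
A small Python sketch, not PVD script, of why the terminator matters. The markup is a simplified, hypothetical stand-in for one award item, assuming the real `<li>` contains a nested list whose own `</li>` comes before the recipient link; `extract_block` is an illustrative helper that mirrors the PosFrom/Copy pair.

```python
# Hypothetical, simplified award item: the category sits in a nested <li>,
# so the first "</li>" appears before the recipient link.
AWARD_HTML = (
    '<li data-testid="list-item"><div><div>'
    '<a class="t">2019 Winner</a>'
    '<ul><li><span class="awardCategoryName">Best Director</span></li></ul>'
    '<a class="recipient">Some Person</a>'
    '</div></div></li>'
)


def extract_block(html: str, start: int, terminator: str) -> str:
    """Return html from start up to (not including) the first terminator."""
    end = html.find(terminator, start)
    if end == -1:
        return ""  # same role as the "No closing tag" branch in the script
    return html[start:end]


short_block = extract_block(AWARD_HTML, 0, "</li>")
full_block = extract_block(AWARD_HTML, 0, "</div></div></li>")

print("recipient in short block:", "Some Person" in short_block)
print("recipient in full block: ", "Some Person" in full_block)
```

With the short terminator the block ends at the nested item, so the category is found but the recipient link is never inside the text that gets searched.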

Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline afrocuban

« Reply #22 on: December 29, 2024, 10:43:11 am »
Quote
      // Extract entire award block
      curPos := awardPos;
      endPos := PosFrom('</li>', HTML, curPos);
      If endPos = 0 Then Begin
        LogMessage('No closing tag for award div found');
        Break;
      End;


      Award := Copy(HTML, curPos, endPos - curPos);
      curPos := endPos + Length('</li>');
      LogMessage('Award Content Extracted Successfully: ' + Award);


Just replace the part of the code above with the part below and Recipient will work.

Quote
      // Extract entire award block
      curPos := awardPos;
      endPos := PosFrom('</div></div></li>', HTML, curPos);
      If endPos = 0 Then Begin
        LogMessage('No closing tag for award div found');
        Break;
      End;


      Award := Copy(HTML, curPos, endPos - curPos);
      curPos := endPos + Length('</div></div></li>');
      LogMessage('Award Content Extracted Successfully: ' + Award);


Is that for Recipient? Because everything else works except Recipient.
« Last Edit: December 29, 2024, 10:48:57 am by afrocuban »

Offline afrocuban

« Reply #23 on: December 29, 2024, 10:57:30 am »
Ohhhhh, I see now!!!! The extracted Award block didn't contain the Recipient!!! Thank you, I will try it later!
« Last Edit: December 29, 2024, 11:03:08 am by afrocuban »

Offline afrocuban

« Reply #24 on: December 29, 2024, 11:49:44 am »
I can now confirm that parsing awards works completely.

What doesn't work is populating the database, at least for me. No award or event is populated, although everything is parsed properly. Here's the log for the person and page given above.

What can that be???

Offline Ivek23

« Reply #25 on: December 29, 2024, 02:41:24 pm »
This standalone IMDB_People_[EN][HTTPS]_Awards 1 script with selenium transfers the awards data to the awards field without any problems. The regular IMDB_People_[EN][HTTPS]_Awards 2 script for the PVD MOD version also transfers the awards data to the awards field without any problems.

The problem is in your IMDB_People_[EN][HTTPS]-Awards script, because this script did not add any awards data to the awards field for me either.

I already know where the problem is. For me, it transfers downpage-UTF8_NO_BOM.htm for the awards from the website; for you, the problem is probably in the awards parsing at the end of the script.


Offline afrocuban

« Reply #26 on: December 29, 2024, 08:48:33 pm »
Thanks for the quick feedback, Ivek. I have two questions that puzzle me now.
1. What selenium script do you use to download a person's page? Is it the same one as for aka?
2. I tried moving the Awards function to the beginning of the script, to the same place where it is in your scripts, just after Function ParsePage_IMDBSearchName, but nothing changed. So it seems that's not the cause, or did I misunderstand your remark?


Now, I tried the regular IMDB_People_[EN][HTTPS]_Awards 2 script for the PVD MOD version that you posted, but even that one didn't populate any awards in my database, unlike for you. So it's probably something with my database then, what do you think?

Offline Ivek23

« Reply #27 on: December 29, 2024, 09:33:22 pm »
Quote
Thanks for the quick feedback, Ivek. I have two questions that puzzle me now.
1. What selenium script do you use to download a person's page? Is it the same one as for aka?

It is the same selenium script as for aka, and I can download both the movies and people awards data with your awards code.

Quote
2. I tried moving the Awards function to the beginning of the script, to the same place where it is in your scripts, just after Function ParsePage_IMDBSearchName, but nothing changed. So it seems that's not the cause, or did I misunderstand your remark?

The problem is the file extension in the name Tmp\UTF8_NO_BOM-Awards.mhtml. Change it to .htm and it should work.

Quote
Now, I tried the regular IMDB_People_[EN][HTTPS]_Awards 2 script for the PVD MOD version that you posted, but even that one didn't populate any awards in my database, unlike for you. So it's probably something with my database then, what do you think?

It's probably not your database. Maybe you don't have the awards field checked: it was the same for me until I checked the awards field in the settings, and after that it transferred the awards data to the awards fields without any problems.


Offline afrocuban

« Reply #28 on: December 29, 2024, 11:43:21 pm »

Quote
It is the same selenium script as for aka, and I can download both the movies and people awards data with your awards code.

Thanks! Do you get all awards with it, or only "static" ones?

Quote
The problem is the file extension in the name Tmp\UTF8_NO_BOM-Awards.mhtml. Change it to .htm and it should work.

Wait WHAT? It works THIS WAY, thanks a lot(!), but where on Earth does that come from, and why? What does htm or mhtml have to do with the database at all?

Quote
It's probably not your database. Maybe you don't have the awards field checked: it was the same for me until I checked the awards field in the settings, and after that it transferred the awards data to the awards fields without any problems.

You were right. The scripts were set to "- Set if empty". It works now, of course.


But I'm still puzzled, not to say shocked, about htm and mhtml...

Offline Ivek23

« Reply #29 on: December 30, 2024, 08:04:32 am »

Quote
It is the same selenium script as for aka, and I can download both the movies and people awards data with your awards code.
Thanks! Do you get all awards with it, or only "static" ones?

A regular script only transfers "static" awards data, while the selenium script transfers all awards data.

Quote
The problem is the file extension in the name Tmp\UTF8_NO_BOM-Awards.mhtml. Change it to .htm and it should work.
Wait WHAT? It works THIS WAY, thanks a lot(!), but where on Earth does that come from, and why? What does htm or mhtml have to do with the database at all?

Your IMDB_People_[EN][HTTPS]-Awards script uses the .htm extension for transferring all other data. The only exception is the awards data, whose file the script does not transfer to the Tmp folder. This is what I meant when I mentioned the change at the end of the script; alternatively, an additional change is needed at the beginning of the script. For example, this needs to be added there:
Quote
BASE_DOWNLOAD_FILE_NO_BOM-AWARDS  = 'Tmp\UTF8_NO_BOM-Awards.mhtml';

The selenium script will also transfer the same file if you do not change the extension in it.

Quote
It's probably not your database. Maybe you don't have the awards field checked: it was the same for me until I checked the awards field in the settings, and after that it transferred the awards data to the awards fields without any problems.
You were right. The scripts were set to "- Set if empty". It works now, of course.
But I'm still puzzled, not to say shocked, about htm and mhtml...

Great that it works now.

The htm and mhtml question has already been covered above.


Offline afrocuban

« Reply #30 on: December 30, 2024, 03:16:50 pm »

Quote
Unfortunately, I don't plan on working on the Imdb Awards section anymore, whether for updates or fixes to the movies or people code in Function ParsePage_IMDBMovieAWARDS. The layout and notation of the Awards page source code are too complicated and completely unsuitable for editing the code so that it records the Awards data properly.


Thanks Ivek. I will now continue integrating everything to get a fully revised and functional IMDB_People_[EN][HTTPS].psf script, and I will do my best to maintain it in the future...


For that, I prepared a Chrome selenium script at:
Quote
http://www.videodb.info/forum_en/index.php?topic=4364.msg22706
« Last Edit: December 30, 2024, 03:48:39 pm by afrocuban »