Topic: New FilmAffinity Script

afrocuban

  • Moderator
  • Posts: 537
New FilmAffinity Script
« on: November 22, 2024, 09:13:35 pm »
Recently, the FilmAffinity script has been parsing the movie title custom field "title1" together with the movie type when the title is not a movie but a series, documentary, etc. For example, for https://www.filmaffinity.com/en/film348275.html the result after <h1> was


Quote
Orson Welles: The One-Man Band                documentary

instead of only


Quote
Orson Welles: The One-Man Band


Using AI, I corrected this after confirming that in such cases there are exactly 16 spaces between the title and the movie type. In the script, instead of this


Code:

//Get ~title~
    curPos:=1;
    ItemValue:=TextBetWeen(HTML,'<h1 id="main-title">','</h1>',false,curPos);            //Strings which opens/closes the data. WEB_SPECIFIC
    AddCustomFieldValueByName('title1',ItemValue);
    LogMessage('      Get result title:'+ItemValue+'||');


I replaced it with this (I chose to match 8 or more spaces, just in case):
Code:

// Get ~title~
curPos := 1;

// Extract the section within <h1 id="main-title"> and </h1>
ItemValue := TextBetWeen(HTML, '<h1 id="main-title">', '</h1>', false, curPos);
LogMessage('Intermediate result after <h1>: ' + ItemValue + '||');

// Look for a run of 8 or more spaces separating the title from the movie type
curPos := Pos('        ', ItemValue); // 8 spaces here between single quotes

if curPos > 0 then
begin
    // Keep only the text before the run of spaces; the movie type after it is dropped
    ItemValue := Copy(ItemValue, 1, curPos - 1);
    LogMessage('Cleaned title result: ' + ItemValue + '||');
end
else
begin
    LogMessage('No extra trailing content found.');
end;


// Trim any leading or trailing whitespace
ItemValue := Trim(ItemValue);


// Add the title to the custom field
AddCustomFieldValueByName('title1', ItemValue);


// Log the final cleaned title for verification
LogMessage('Get result title: ' + ItemValue + '||');
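
To try the clean-up outside PVD, here is a minimal standalone sketch of the same logic in Free Pascal. It assumes that PVD's script engine behaves like standard Pascal for `Pos`, `Copy` and `Trim`; it is not PVD script code, and `CleanTitle` and `TypeSeparator` are hypothetical names. The 16-space sample mirrors the FilmAffinity output quoted above.

```pascal
program CleanTitleDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;

const
  // Assumed separator: a run of 8 spaces, which also matches the
  // 16-space gap observed between the title and the movie type.
  TypeSeparator = '        ';

{ Returns the title with any trailing movie type (e.g. "documentary") removed. }
function CleanTitle(const Raw: string): string;
var
  SepPos: Integer;
begin
  SepPos := Pos(TypeSeparator, Raw);
  if SepPos > 0 then
    Result := Copy(Raw, 1, SepPos - 1)  // keep only the text before the spaces
  else
    Result := Raw;                       // no movie type appended
  Result := Trim(Result);
end;

begin
  WriteLn(CleanTitle('Orson Welles: The One-Man Band' + StringOfChar(' ', 16) + 'documentary'));
  WriteLn(CleanTitle('Ghostlight'));
end.
```

Matching the first run of 8 spaces rather than exactly 16 keeps the cut working if FA ever shortens the gap, at the cost of truncating a title that itself contains 8 consecutive spaces.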



Ivek, it would be good to include this officially in the FilmAffinity script if you are interested; otherwise, if anyone else is interested, I can post my already customized script.
« Last Edit: December 05, 2024, 12:16:30 am by afrocuban »

Ivek23

  • Global Moderator
  • Posts: 2746
Re: FilmAffinity correction
« Reply #1 on: November 23, 2024, 03:42:20 pm »
It would be good Ivek to officially include it in FilmAffinity script if interested in
Not interested.

otherwise if someone is interested in, I can post my already customized script
You can post the script.
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


afrocuban
Re: FilmAffinity correction
« Reply #2 on: November 23, 2024, 06:54:40 pm »
Thank you, Ivek. Here is my script that corrects this.

afrocuban
Re: FilmAffinity correction
« Reply #3 on: December 02, 2024, 11:11:12 pm »
Does anyone have an idea why this script doesn't populate some data even though it parses it (the poster)? And why doesn't it parse some data at all (actors, description)? As far as I can tell, the web layout hasn't changed...


Code:

(12/2/2024 11:05:05 PM) Get result title: Ghostlight||
(12/2/2024 11:05:05 PM)       Get result poster:http://pics.filmaffinity.com/ghostlight-171637604-large.jpg||
(12/2/2024 11:05:05 PM)       Get result rating/orating (CF~FilmAffinity_Rating~):7.0||
(12/2/2024 11:05:05 PM)       Get result (~FilmAffinity_Votes~):122||
(12/2/2024 11:05:05 PM)       Get result origtitle:Ghostlight||
(12/2/2024 11:05:05 PM)       Get result year:2024||
(12/2/2024 11:05:05 PM)       Get result Original Runtime:110||
(12/2/2024 11:05:05 PM)       Get result Movie Features:Original Runtime: 110 min.<br>||
(12/2/2024 11:05:05 PM)       Get results country:United States||
(12/2/2024 11:05:05 PM)            Parse results List Directors:Alex Thompson, Kelly O'Sullivan||
(12/2/2024 11:05:05 PM)       Get results Directors:Alex Thompson||
(12/2/2024 11:05:05 PM)       Get results Directors:Kelly O'Sullivan||
(12/2/2024 11:05:05 PM)            Parse results List Writers:Kelly O'Sullivan||
(12/2/2024 11:05:05 PM)       Get results Writers:Kelly O'Sullivan||
(12/2/2024 11:05:05 PM)            Parse results List Composers:Quinn Tsan||
(12/2/2024 11:05:05 PM)       Get results Composers:Quinn Tsan||
(12/2/2024 11:05:05 PM)            Parse results List Actors:||
(12/2/2024 11:05:05 PM)       Get results studio:Little Engine, Runaway Train.  Distributor: IFC Films, Sapan Studios||
(12/2/2024 11:05:05 PM)            Parse results List Genre+Category:Drama.                 Comedy |                 Comedy-Drama.                 Stage Play||
(12/2/2024 11:05:05 PM)            Parse results List Category:                 Comedy-Drama.                 Stage Play||
(12/2/2024 11:05:05 PM)       Get results Category:Comedy-Drama,Stage Play||
(12/2/2024 11:05:05 PM)            Parse results List Genre:Drama.                 Comedy ||
(12/2/2024 11:05:05 PM)       Get results Genre:Drama,Comedy||
(12/2/2024 11:05:05 PM)       Get result description:||
(12/2/2024 11:05:05 PM) Function ParsePage NORMAL END======================|
(12/2/2024 11:05:05 PM)     Provider data info retreived Ok in 2024-12-02 23:05:05| (~Updated~)
(12/2/2024 11:05:05 PM) Function ParsePage smNormal END======================|
(12/2/2024 11:05:05 PM) GET: http://pics.filmaffinity.com/ghostlight-171637604-large.jpg
(12/2/2024 11:05:05 PM) Redirected to: https://pics.filmaffinity.com/ghostlight-171637604-large.jpg


afrocuban
Re: FilmAffinity correction
« Reply #4 on: December 02, 2024, 11:56:23 pm »
Using AI, I've now managed to parse the actors, but I can't populate them into PVD:

Code:

// Get ~Actors~
curPos := 1;
ItemList := TextBetWeen(HTML, '<dt>Cast</dt>', '</dd>', false, curPos);
LogMessage('Parsed results List Actors: ' + ItemList + '||');


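// Extract actor names
// NOTE: TextBetWeen removes all tags from its result, so ItemList holds plain text only;
// this Pos() never finds the '<li ...' marker, and the loop body (AddMoviePerson) never runs.
while Pos('<li class="nb" itemprop="actor"', ItemList) > 0 do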
begin
    curPos := Pos('<li class="nb" itemprop="actor"', ItemList);
    ItemSubList := TextBetWeen(ItemList, '<div class="name" itemprop="name">', '<>', false, curPos);
    ItemValue := Trim(ItemSubList);
   
    // Adding actor to database
    LogMessage('Adding Actor: ' + ItemValue);
    AddMoviePerson(ItemValue, '', '', '', ctActors);
    LogMessage('Added Actor: ' + ItemValue);
   
    // Move to the next actor
    ItemList := Copy(ItemList, curPos + Length('<li class="nb" itemprop="actor"'), Length(ItemList));
end;





Quote
(12/2/2024 11:27:23 PM) Parsed results List Actors: Keith Kupferer Dolly De Leon Katherine Mallen Kupferer Tara Mallen Hanna Dworkin Tommy Rivera-Vega Alma Washington H.B. Ward Dexter Zollicoffer Deanna Dunagan Francis Guinan David Bianco Matthew C. Yee See all credits||


Scripting procedure:
Quote

procedure AddMoviePerson(Name, TransName, Role, URL : String; const AType : Byte);

Adds movie credits (actor, director, etc.).

Parameters:
Name - name of the person
TransName - translated name of the person
Role - role of the person (if applicable)
URL - web address of the person (if available)
AType - credits type


Possible values for AType variable:
0 - actor
1 - director
2 - writer
3 - composer
4 - producer
My code follows this signature, so for now I have no idea what the problem could be.

afrocuban
Re: FilmAffinity correction
« Reply #5 on: December 03, 2024, 03:29:30 am »
This is the best I could do to get the actors back into the field:



Code:
   // Get ~Actors~ (Only name in Web from !file! list)
    curPos := 1;
    ItemList := TextBetWeen(HTML, '<dt>Cast</dt>', '</dd>', false, curPos); // Strings which opens/closes the data. WEB_SPECIFIC
    LogMessage('Parsed results List Actors: ' + ItemList + '||');


    // Remove " See all credits" if it exists
    curPos := Pos(' See all credits', ItemList);
    if curPos > 0 then
        ItemList := Copy(ItemList, 1, curPos - 1);
    LogMessage('Cleaned ItemList: ' + ItemList);


    // Forward the plain text to the field
    AddMoviePerson(ItemList, '', '', '', 0); // Use 0 for actors
    LogMessage('Added All Actors: ' + ItemList);


The problem is that the TextBetween function removes all tags from the resulting substring. Since the parsed FA actors are then separated only by spaces, and the names themselves contain spaces, it's impossible to tell individual names apart using the space as a delimiter, and therefore impossible to generate a comma-separated list.
« Last Edit: December 03, 2024, 07:17:42 am by afrocuban »

Ivek23
Re: FilmAffinity correction
« Reply #6 on: December 03, 2024, 10:40:49 am »
This is the best I could achieve to get back actors to the field:
.
.
.
The problem is because function TextBetween removes all tags from the resulting substring, and since the actors are separated by spaces only on FA when parsed, it's impossible to distinguish between individual names by space only as delimiter, thus impossible to generate comma separated list.

Quote
     //Get ~Actors~ (Only name in Web from !file! list)(Separated with ',' and with spaces).
   curPos:=Pos('<dt>Cast</dt>',HTML);
   //curPos:=curPos+Length('<dt>Cast</dt>');                                                    //Strings end which opens the block content data.  WEB_SPECIFIC
   curPos:=PosFrom('<div class="cast-wrapper">',HTML,curPos);                                                    //Strings end which opens the block content data.  WEB_SPECIFIC   
   curPos:=curPos+Length('<div class="cast-wrapper">');                                                    //Strings end which opens the block content data.  WEB_SPECIFIC
   endPos:=PosFrom('</dd>',HTML,curPos);                                            //Strings which opens/closes the data. WEB_SPECIFIC
    //ItemList:=TextBetWeenFirst(HTML,'<div class="cast-wrapper"','</dd>');    //Strings which opens/closes the data. WEB_SPECIFIC
   ItemList:=Copy(HTML,curPos,endPos-curPos);
    LogMessage('           Parse results List Actors:'+ItemList+'||');
   ItemList:=StringReplace(ItemList,'</a>','|',True,True,False);                      //WEB_SPECIFIC   
   ItemList:=StringReplace(ItemList,'See all credits','',True,True,False);                      //WEB_SPECIFIC      
   ItemList:=RemoveTags(ItemList, False);
/LogMessage('           Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
    ExplodeString(ItemList,ItemArray, '|');
    for index := Low(ItemArray) to High(ItemArray) do begin
        ItemValue:=Trim(ItemArray[index]);
        AddMoviePerson(ItemValue,'','','',ctActors);
        LogMessage('      Get results Actors:'+ItemValue+'||');
    end;   

Try this code, it transferred the actors to the Actors field for me
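
Ivek's snippet works because it turns every `</a>` into a `|` marker before the tags are removed, so the boundaries between names survive tag stripping; splitting on `|` then yields one actor per item. Below is a minimal standalone sketch of that technique in Free Pascal, not PVD script code: `StripTags` is a simplified hypothetical stand-in for PVD's `RemoveTags`, `SplitActors` is a hypothetical name, and the cast markup is a simplified sample rather than real FilmAffinity HTML. Unlike the snippet above, it also skips empty items, since the removed `See all credits` link can leave an empty entry behind.

```pascal
program SplitActorsDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;

type
  TNameArray = array of string;

{ Simplified stand-in for PVD's RemoveTags: drops everything from '<' to '>'. }
function StripTags(const S: string): string;
var
  I: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for I := 1 to Length(S) do
    if S[I] = '<' then
      InTag := True
    else if S[I] = '>' then
      InTag := False
    else if not InTag then
      Result := Result + S[I];
end;

{ Marks every link end with '|', strips the tags, then splits on '|',
  skipping empty items such as the one left by "See all credits". }
function SplitActors(const CastHtml: string): TNameArray;
var
  Plain, Item: string;
  SepPos: Integer;
begin
  Result := nil;
  Plain := StringReplace(CastHtml, '</a>', '|', [rfReplaceAll, rfIgnoreCase]);
  Plain := StringReplace(Plain, 'See all credits', '', [rfReplaceAll, rfIgnoreCase]);
  Plain := StripTags(Plain) + '|';  // trailing separator flushes the last item
  SepPos := Pos('|', Plain);
  while SepPos > 0 do
  begin
    Item := Trim(Copy(Plain, 1, SepPos - 1));
    if Item <> '' then
    begin
      SetLength(Result, Length(Result) + 1);
      Result[High(Result)] := Item;
    end;
    Delete(Plain, 1, SepPos);  // drop the item and its separator
    SepPos := Pos('|', Plain);
  end;
end;

const
  // Simplified, hypothetical markup modelled on the FA cast block.
  SampleCast =
    '<li class="nb" itemprop="actor"><a href="#"><div class="name" itemprop="name">Keith Kupferer</div></a></li>' +
    '<li class="nb" itemprop="actor"><a href="#"><div class="name" itemprop="name">Dolly De Leon</div></a></li>' +
    '<a href="#">See all credits</a>';

var
  Actors: TNameArray;
  I: Integer;
begin
  Actors := SplitActors(SampleCast);
  for I := 0 to High(Actors) do
    WriteLn(Actors[I]);
end.
```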


afrocuban
Re: FilmAffinity correction
« Reply #7 on: December 03, 2024, 03:27:43 pm »

Wow, thanks! It works like a charm! Two small things: endPos has to be declared as an additional Integer variable in this function, and there's a "/" missing in
Quote
/LogMessage('           Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');


so the line should be commented out with "//".
« Last Edit: December 03, 2024, 03:55:05 pm by afrocuban »

Ivek23
Re: FilmAffinity correction
« Reply #8 on: December 03, 2024, 05:25:06 pm »
Either like this

Quote
//LogMessage('           Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');

or like this

Quote
LogMessage('           Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');

It's up to you which one you use.


afrocuban
Re: FilmAffinity correction
« Reply #9 on: December 04, 2024, 01:14:48 am »
You are right!


For my personal purposes I have slightly modified your snippet to:

Code:
AddMoviePerson(ItemValue,'','No Role On FilmAffinity','',ctActors);

so that each actor appears on a new line (keeping the look and feel of IMDb parsing) while indicating that roles aren't stated on FA. Next I'll try to implement PersonURL for FA, but I'm not planning to create an FA script for people, at least not in the near future.
« Last Edit: December 04, 2024, 01:18:35 am by afrocuban »

afrocuban
Re: FilmAffinity correction
« Reply #10 on: December 04, 2024, 04:31:22 pm »
Here's the updated code for the "Description" field:

Code:
// Get ~description~ from the HTML.
curPos := Pos('<dt>Synopsis</dt>', HTML);
if curPos > 0 then // Ensure the synopsis section is found
begin
    curPos := PosFrom('<dd class="" itemprop="description">', HTML, curPos); // Locate the synopsis <dd> element
    curPos := curPos + Length('<dd class="" itemprop="description">'); // Move curPos past the opening tag
    endPos := PosFrom('</dd>', HTML, curPos); // Locate the closing tag

    // Extract the relevant section of the HTML
    ItemValue := Copy(HTML, curPos, endPos - curPos);
    LogMessage('Parse results description: ' + ItemValue + '||');

    // Clean up: drop the "(FILMAFFINITY)" credit, strip the tags, then trim whitespace
    ItemValue := StringReplace(ItemValue, '(FILMAFFINITY)', '', True, True, False);
    ItemValue := Trim(RemoveTags(ItemValue, False));
    LogMessage('Cleaned description: ' + ItemValue + '||');

    // Add the cleaned description to the XML field
    AddFieldValueXML('description', ItemValue);
    LogMessage('Get result description: ' + ItemValue + '||');
end // no semicolon here, because an else follows
else
begin
    LogMessage('Synopsis section not found.');end;
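
The snippet above assumes every `PosFrom` call succeeds. If `PosFrom` returns 0 when the text is not found (an assumption here, by analogy with `Pos`), a missing `<dd>` would leave `curPos` near the start of the page and the wrong text would be copied. The following Free Pascal sketch adds explicit guards, with `PosEx` from the `StrUtils` unit standing in for `PosFrom`; tag removal is left out to keep it short, and `ExtractDescription` is a hypothetical name.

```pascal
program ExtractDescriptionDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils, StrUtils;

const
  SectionMarker = '<dt>Synopsis</dt>';
  OpenTag = '<dd class="" itemprop="description">';
  CloseTag = '</dd>';

{ Returns the synopsis text, or '' when any marker is missing.
  PosEx (StrUtils) stands in for PVD's PosFrom. }
function ExtractDescription(const Html: string): string;
var
  StartPos, EndPos: Integer;
begin
  Result := '';
  StartPos := Pos(SectionMarker, Html);
  if StartPos = 0 then
    Exit;                                  // no synopsis section
  StartPos := PosEx(OpenTag, Html, StartPos);
  if StartPos = 0 then
    Exit;                                  // opening <dd> not found
  StartPos := StartPos + Length(OpenTag);  // move past the opening tag
  EndPos := PosEx(CloseTag, Html, StartPos);
  if EndPos = 0 then
    Exit;                                  // closing </dd> not found
  Result := Copy(Html, StartPos, EndPos - StartPos);
  Result := Trim(StringReplace(Result, '(FILMAFFINITY)', '', [rfReplaceAll]));
end;

begin
  WriteLn(ExtractDescription(
    '<dt>Synopsis</dt><dd class="" itemprop="description">Sample synopsis text. (FILMAFFINITY)</dd>'));
end.
```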

afrocuban
Re: FilmAffinity correction
« Reply #11 on: December 04, 2024, 09:33:00 pm »
Does anyone have an idea why this script wouldn't populate data, although it parses them (poster)?

Unfortunately, FilmAffinity no longer serves pictures over plain http; the http URL now returns a 301 redirect to the https one:

Quote
C:\Users\user>curl -I http://pics.filmaffinity.com/bookworm-371036333-large.jpg
HTTP/1.1 301 Moved Permanently
Date: Wed, 04 Dec 2024 20:11:52 GMT
Content-Type: text/html
Content-Length: 167
Connection: keep-alive
Cache-Control: max-age=3600
Expires: Wed, 04 Dec 2024 21:11:52 GMT
Location: https://pics.filmaffinity.com/bookworm-371036333-large.jpg
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=Mjz%2B8pecaYxQxzpSkhAuFdftbG6ya0xXg%2BrjQQWzjZQmTEKL%2FGvFTGYbXU9HCka1CJMZPDOLkmSrCuhdqDJ%2BXbeorrMFwgP6FrKFpZZovX8btXvLWdYfrK6jjdS6zssZ4f3snbIHGA%3D%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 8ece72b65de05b67-VIE
server-timing: cfL4;desc="?proto=TCP&rtt=10158&min_rtt=10158&rtt_var=5079&sent=1&recv=3&lost=0&retrans=0&sent_bytes=0&recv_bytes=114&delivery_rate=0&cwnd=249&unsent_bytes=0&cid=0000000000000000&ts=0&x=0"


Here's an updated snippet that provides the poster, but only via Proxomitron:
Code:

//Get ~poster~
    ItemValue:=TextBetWeen(HTML,'<a class="lightbox" style="display: block;" href="','"',false,curPos);            //Strings which opens/closes the data, e.g. 'https://pics.filmaffinity...-large.jpg'. WEB_SPECIFIC
    //Unfortunately, http links to the pics are no longer available, so this snippet works only via Proxomitron-like proxies
    AddImageURL(itPoster,ItemValue);
    LogMessage('      Get result poster:'+ItemValue+'||');
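
The curl output shows that the http address merely redirects to the same path over https. One possible way to avoid depending on that redirect (an assumption, not necessarily what the later "Poster now available via HTTPS" change does) is to rewrite the scheme before the URL reaches `AddImageURL`. A minimal Free Pascal sketch; `ForceHttps` is a hypothetical helper, and `AnsiStartsText` comes from the `StrUtils` unit:

```pascal
program PosterUrlDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  StrUtils;

{ Rewrites a leading "http://" (any case) to "https://"; other URLs pass through unchanged. }
function ForceHttps(const Url: string): string;
const
  PlainScheme = 'http://';
begin
  if AnsiStartsText(PlainScheme, Url) then
    Result := 'https://' + Copy(Url, Length(PlainScheme) + 1, MaxInt)
  else
    Result := Url;
end;

begin
  WriteLn(ForceHttps('http://pics.filmaffinity.com/ghostlight-171637604-large.jpg'));
end.
```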

afrocuban
Re: New FilmAffinity Script
« Reply #12 on: December 05, 2024, 12:23:14 am »
Using AI (I won't mention this again; from now on it should be taken as implied), I have done a major rewrite of the FilmAffinity script. I will try to maintain and upgrade it in the future, at least for myself, and will offer it to others here. Here are the changes since VVV last updated it in 2020.


Quote
CHANGE LOG :
           V 5.0.0.1-afrocuban (12/05/2024) afrocuban: Poster now available via HTTPS
           V 4.2.0.0-afrocuban (12/01/2024) afrocuban: Updated. Actors and Description are available for import again. Poster now available only via Proxomitron

afrocuban
Re: New FilmAffinity Script
« Reply #13 on: December 15, 2024, 08:58:48 pm »
Here's my completely new FilmAffinity_[EN][HTTPS_Poster]_v6.1.psf script.


It includes major changes and additions.


CHANGE LOG:

           V 6.1.0.1-afrocuban (12/15/2024) afrocuban: MAJOR CHANGES INTRODUCED.

   Custom field types need to be strictly followed in order to be presented properly.

   The following custom/original fields were added/changed, respectively:

   • ~Studio~ section completely re-written:
1. The original "producer (human, category 4)" field now imports only when a producer is explicitly stated on FA.
2. The original "studio" field (AField, value 8) is now populated with studios and distributors concatenated. Co-productions (countries) and producer persons are not included.
3. "FA Co-productions" Multiselect list custom field introduced for the first time.
4. "FA Producers" Multiselect list custom field introduced for the first time.
5. "FA Studio" Multiselect list custom field introduced for the first time. This field includes only studios, not co-productions (countries), producer persons, or distributors.
6. "FA Distributors" Multiselect list custom field introduced for the first time.

   • ~Writers~ section completely re-written:
1. The original "writers (human, category 2)" field now imports only when a screenwriter is explicitly stated on FA.
2. "FA Writers" Multiselect list custom field added for the same value as under 1.
3. "FA Script" Memo custom field introduced for the first time, filled with everything else found in the Writers block ("novel:...", "book:...", etc.).

   • New custom Memo field "FA Related": movies related to the current one.
   • New custom Memo field "FA SimMov": similar movies with their % of similarity.
   • New custom Memo field "FA Awards": years linked to the FA pages listing all awards of that competition for that year.
   • New custom Memo field "FA Misc URLs": links to the trailers, image gallery and pro reviews of the current movie, with the count for each of the three.
   • New custom Multiselect list field "FA Cinematography".
   • New custom Memo field "FA Ranking Position", linked to external top lists.
   • New custom Memo field "FA Ranking Lists Position", linked to external users' lists.
   • New custom Memo field "FA Critics" with Review, Authors and Magazines, each linked to external sources when available.
   • New custom Memo field "FA Released By" added.
   • New custom Memo field "FA Release Date" added.
   • New custom Memo field "FA Lists" added.
   • New custom Multiselect list field "FA Genre" added.
   • New custom Multiselect list field "FA Category" added.
   • New custom Memo field "FA OrigTitle" added.
   • New custom Multiselect list field "FA Year" added.
   • New custom Long Text field "FA length" added.
   • New custom Long Text field "FA features" added.
   • New custom Multiselect List field "FA Country" added.
   • New custom Multiselect List field "FA Directors" added.
   • New custom Multiselect List field "FA Writers" added.
   • New custom Multiselect List field "FA Composers" added.
   • New custom Memo field "FA Description" added.
   • New custom Memo field "FA Actors" added.


   • Two new functions introduced and two existing functions modified in the script:
1. "DownloadAndParseTrailerPage" function added; it constructs the URL for the trailer page, downloads it with the "DownloadPage" function, and then calls a parsing function to process it.
2. "ParseTrailerPage" function added to parse the downloaded trailer page HTML content.
3. "DownloadPage" function modified to accept the output file name as a new OutFile parameter, so that it can download both the main movie page and the trailer page.
4. "ParsePage" main function modified to call the "DownloadAndParseTrailerPage" function.
An additional page, "downpage_trailer-UTF8_NO_BOM_FA.htm", is downloaded alongside "downpage-UTF8_NO_BOM_FA.htm", but nothing is parsed from it yet: it contains only dynamic content, which would need Selenium integration, because PVdBDownPage.exe can download only static HTML.

TO DO:
- Selenium integration to PVD.
- Logging in to the movie site through the script.

           V 5.0.0.1-afrocuban (12/05/2024) afrocuban: Poster now available via HTTPS.
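
The ~Studio~ rework above separates studios from distributors. A minimal Free Pascal sketch of such a split, using the studio line from the debug log earlier in the thread, might look like this. It assumes distributors always follow a `Distributor:` label and ignores co-productions and producers; `SplitList` and `SplitStudioLine` are hypothetical names, not functions from the script.

```pascal
program SplitStudioDemo;
{$mode objfpc}{$H+}{$ASSERTIONS ON}

uses
  SysUtils;

type
  TNameArray = array of string;

const
  // Studio line copied from the debug log earlier in the thread.
  SampleLine = 'Little Engine, Runaway Train.  Distributor: IFC Films, Sapan Studios';

{ Splits a comma-separated list, trimming each item and one trailing '.'. }
function SplitList(const S: string): TNameArray;
var
  Rest, Item: string;
  CommaPos: Integer;
begin
  Result := nil;
  Rest := S + ',';  // trailing comma flushes the last item
  CommaPos := Pos(',', Rest);
  while CommaPos > 0 do
  begin
    Item := Trim(Copy(Rest, 1, CommaPos - 1));
    if (Item <> '') and (Item[Length(Item)] = '.') then
      Delete(Item, Length(Item), 1);
    if Item <> '' then
    begin
      SetLength(Result, Length(Result) + 1);
      Result[High(Result)] := Item;
    end;
    Delete(Rest, 1, CommaPos);
    CommaPos := Pos(',', Rest);
  end;
end;

{ Puts everything before 'Distributor:' into Studios and the rest into Distributors. }
procedure SplitStudioLine(const Line: string; out Studios, Distributors: TNameArray);
const
  Marker = 'Distributor:';
var
  MarkerPos: Integer;
begin
  MarkerPos := Pos(Marker, Line);
  if MarkerPos > 0 then
  begin
    Studios := SplitList(Copy(Line, 1, MarkerPos - 1));
    Distributors := SplitList(Copy(Line, MarkerPos + Length(Marker), MaxInt));
  end
  else
  begin
    Studios := SplitList(Line);
    Distributors := nil;  // no distributor label on the page
  end;
end;

var
  Studios, Distributors: TNameArray;
  I: Integer;
begin
  SplitStudioLine(SampleLine, Studios, Distributors);
  for I := 0 to High(Studios) do
    WriteLn('Studio: ', Studios[I]);
  for I := 0 to High(Distributors) do
    WriteLn('Distributor: ', Distributors[I]);
end.
```

Stripping one trailing '.' removes the period that follows the last studio in the logged sample, but it would also clip a name that genuinely ends in a dot, such as one ending in "Inc.".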

You can see how the imported data looks, and what is actually imported, in my new skins in this topic.

You can find the most recent custom fields in this topic's message and later ones, and use them to keep track more easily of which custom fields to add to your PVD.