Personal Video Database
English => Support => Topic started by: afrocuban on November 22, 2024, 09:13:35 pm
-
At least recently, the FilmAffinity script has been parsing the movie title into the custom field "title1" together with the movie type whenever the title is not a film but a series, documentary, or similar. For example, for https://www.filmaffinity.com/en/film348275.html the result after <h1> was
Orson Welles: The One-Man Band documentary
instead of just
Orson Welles: The One-Man Band
Using AI, I corrected this after confirming that in such cases there are exactly 16 spaces between the title and the movie type. In the script, instead of this
//Get ~title~
curPos:=1
ItemValue:=TextBetWeen(HTML,'<h1 id="main-title">','</h1>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
AddCustomFieldValueByName('title1',ItemValue);
LogMessage(' Get result title:'+ItemValue+'||');
I now use this (I chose to match 8 or more spaces, just in case):
// Get ~title~
curPos := 1;
// Extract the section within <h1 id="main-title"> and </h1>
ItemValue := TextBetWeen(HTML, '<h1 id="main-title">', '</h1>', false, curPos);
LogMessage('Intermediate result after <h1>: ' + ItemValue + '||');
// Check and clean up any trailing content with 8 or more spaces
curPos := Pos('        ', ItemValue); // 8 spaces between the single quotes
if curPos > 0 then
begin
// Move the cursor past the whole run of spaces
// This loop handles any number of consecutive spaces greater than or equal to 8
while (curPos <= Length(ItemValue)) and (ItemValue[curPos] = ' ') do // a single space here: one character is compared at a time
begin
curPos := curPos + 1;
end;
// Extract the title up to the first non-space character after the spaces
ItemValue := Copy(ItemValue, 1, curPos - 1);
LogMessage('Cleaned title result: ' + ItemValue + '||');
end
else
begin
LogMessage('No extra trailing content found.');
end;
// Trim any leading or trailing whitespace
ItemValue := Trim(ItemValue);
// Add the title to the custom field
AddCustomFieldValueByName('title1', ItemValue);
// Log the final cleaned title for verification
LogMessage('Get result title: ' + ItemValue + '||');
It would be good, Ivek, to include this officially in the FilmAffinity script if you are interested; otherwise, if anyone else is interested, I can post my already customized script.
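For anyone who wants to try the idea outside PVD first, here is a minimal sketch of the same cleanup as a standalone Free Pascal program. It assumes, as described above, that the title and the movie type are separated by a run of at least 8 spaces. CutAtSpaceRun is a hypothetical helper name, not a PVD function, and the sample string is built by hand rather than taken from the live page.

```pascal
program TitleCleanupDemo;
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical helper (not part of PVD): returns S cut at the first run
// of MinRun or more consecutive spaces, with surrounding whitespace trimmed.
function CutAtSpaceRun(const S: string; MinRun: Integer): string;
var
  P: Integer;
begin
  // Look for MinRun spaces in a row; Pos returns 0 when there is no such run.
  P := Pos(StringOfChar(' ', MinRun), S);
  if P > 0 then
    Result := Trim(Copy(S, 1, P - 1))
  else
    Result := Trim(S);
end;

begin
  // Hand-built sample: title, 16 spaces, then the movie type.
  WriteLn(CutAtSpaceRun('Orson Welles: The One-Man Band' +
    StringOfChar(' ', 16) + 'documentary', 8));
  // A plain title with no long space run is returned unchanged.
  WriteLn(CutAtSpaceRun('Ghostlight', 8));
end.
```

Cutting directly at the start of the space run avoids the extra loop over the spaces; the result is the same once Trim is applied.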
-
It would be good, Ivek, to include this officially in the FilmAffinity script if you are interested
Not interested.
otherwise, if anyone else is interested, I can post my already customized script
You can post the script.
-
Thank you Ivek. Here is my script that corrects this.
-
Does anyone have an idea why this script doesn't populate some data even though it parses them (poster)? And why doesn't it parse some data at all (actors, description)? As far as I can tell, the web layout hasn't changed...
(12/2/2024 11:05:05 PM) Get result title: Ghostlight||
(12/2/2024 11:05:05 PM) Get result poster:http://pics.filmaffinity.com/ghostlight-171637604-large.jpg||
(12/2/2024 11:05:05 PM) Get result rating/orating (CF~FilmAffinity_Rating~):7.0||
(12/2/2024 11:05:05 PM) Get result (~FilmAffinity_Votes~):122||
(12/2/2024 11:05:05 PM) Get result origtitle:Ghostlight||
(12/2/2024 11:05:05 PM) Get result year:2024||
(12/2/2024 11:05:05 PM) Get result Original Runtime:110||
(12/2/2024 11:05:05 PM) Get result Movie Features:Original Runtime: 110 min.<br>||
(12/2/2024 11:05:05 PM) Get results country:United States||
(12/2/2024 11:05:05 PM) Parse results List Directors:Alex Thompson, Kelly O'Sullivan||
(12/2/2024 11:05:05 PM) Get results Directors:Alex Thompson||
(12/2/2024 11:05:05 PM) Get results Directors:Kelly O'Sullivan||
(12/2/2024 11:05:05 PM) Parse results List Writers:Kelly O'Sullivan||
(12/2/2024 11:05:05 PM) Get results Writers:Kelly O'Sullivan||
(12/2/2024 11:05:05 PM) Parse results List Composers:Quinn Tsan||
(12/2/2024 11:05:05 PM) Get results Composers:Quinn Tsan||
(12/2/2024 11:05:05 PM) Parse results List Actors:||
(12/2/2024 11:05:05 PM) Get results studio:Little Engine, Runaway Train. Distributor: IFC Films, Sapan Studios||
(12/2/2024 11:05:05 PM) Parse results List Genre+Category:Drama. Comedy | Comedy-Drama. Stage Play||
(12/2/2024 11:05:05 PM) Parse results List Category: Comedy-Drama. Stage Play||
(12/2/2024 11:05:05 PM) Get results Category:Comedy-Drama,Stage Play||
(12/2/2024 11:05:05 PM) Parse results List Genre:Drama. Comedy ||
(12/2/2024 11:05:05 PM) Get results Genre:Drama,Comedy||
(12/2/2024 11:05:05 PM) Get result description:||
(12/2/2024 11:05:05 PM) Function ParsePage NORMAL END======================|
(12/2/2024 11:05:05 PM) Provider data info retreived Ok in 2024-12-02 23:05:05| (~Updated~)
(12/2/2024 11:05:05 PM) Function ParsePage smNormal END======================|
(12/2/2024 11:05:05 PM) GET: http://pics.filmaffinity.com/ghostlight-171637604-large.jpg
(12/2/2024 11:05:05 PM) Redirected to: https://pics.filmaffinity.com/ghostlight-171637604-large.jpg
-
Using AI, I've managed to parse the actors now, but I cannot get them populated in PVD:
// Get ~Actors~
curPos := 1;
ItemList := TextBetWeen(HTML, '<dt>Cast</dt>', '</dd>', false, curPos);
LogMessage('Parsed results List Actors: ' + ItemList + '||');
// Extract actor names
while Pos('<li class="nb" itemprop="actor"', ItemList) > 0 do
begin
curPos := Pos('<li class="nb" itemprop="actor"', ItemList);
ItemSubList := TextBetWeen(ItemList, '<div class="name" itemprop="name">', '<>', false, curPos);
ItemValue := Trim(ItemSubList);
// Adding actor to database
LogMessage('Adding Actor: ' + ItemValue);
AddMoviePerson(ItemValue, '', '', '', ctActors);
LogMessage('Added Actor: ' + ItemValue);
// Move to the next actor
ItemList := Copy(ItemList, curPos + Length('<li class="nb" itemprop="actor"'), Length(ItemList));
end;
(12/2/2024 11:27:23 PM) Parsed results List Actors: Keith Kupferer Dolly De Leon Katherine Mallen Kupferer Tara Mallen Hanna Dworkin Tommy Rivera-Vega Alma Washington H.B. Ward Dexter Zollicoffer Deanna Dunagan Francis Guinan David Bianco Matthew C. Yee See all credits||
The scripting documentation for the procedure:
procedure AddMoviePerson(Name, TransName, Role, URL : String; const AType : Byte);
Adds movie credits (actor, director, etc.).
Parameters:
Name       name of the person
TransName  translated name of the person
Role       role of the person (if applicable)
URL        web address of the person (if available)
AType      credits type
Possible values for AType variable:
0  actor
1  director
2  writer
3  composer
4  producer
is being followed, so for now I have no idea what the problem could be.
-
This is the best I could achieve to get back actors to the field:
// Get ~Actors~ (Only name in Web from !file! list)
curPos := 1;
ItemList := TextBetWeen(HTML, '<dt>Cast</dt>', '</dd>', false, curPos); // Strings which opens/closes the data. WEB_SPECIFIC
LogMessage('Parsed results List Actors: ' + ItemList + '||');
// Remove " See all credits" if it exists
curPos := Pos(' See all credits', ItemList);
if curPos > 0 then
ItemList := Copy(ItemList, 1, curPos - 1);
LogMessage('Cleaned ItemList: ' + ItemList);
// Forward the plain text to the field
AddMoviePerson(ItemList, '', '', '', 0); // Use 0 for actors
LogMessage('Added All Actors: ' + ItemList);
The problem is that the function TextBetWeen removes all tags from the resulting substring. Once parsed, the actors on FA are separated only by spaces, so individual names cannot be told apart using a space as the delimiter, and a comma-separated list cannot be generated.
-
This is the best I could achieve to get back actors to the field:
.
.
.
The problem is because function TextBetween removes all tags from the resulting substring, and since the actors are separated by spaces only on FA when parsed, it's impossible to distinguish between individual names by space only as delimiter, thus impossible to generate comma separated list.
//Get ~Actors~ (Only name in Web from !file! list)(Separated with ',' and with spaces).
curPos:=Pos('<dt>Cast</dt>',HTML);
//curPos:=curPos+Length('<dt>Cast</dt>'); //Strings end which opens the block content data. WEB_SPECIFIC
curPos:=PosFrom('<div class="cast-wrapper">',HTML,curPos); //Strings end which opens the block content data. WEB_SPECIFIC
curPos:=curPos+Length('<div class="cast-wrapper">'); //Strings end which opens the block content data. WEB_SPECIFIC
endPos:=PosFrom('</dd>',HTML,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
//ItemList:=TextBetWeenFirst(HTML,'<div class="cast-wrapper"','</dd>'); //Strings which opens/closes the data. WEB_SPECIFIC
ItemList:=Copy(HTML,curPos,endPos-curPos);
LogMessage(' Parse results List Actors:'+ItemList+'||');
ItemList:=StringReplace(ItemList,'</a>','|',True,True,False); //WEB_SPECIFIC
ItemList:=StringReplace(ItemList,'See all credits','',True,True,False); //WEB_SPECIFIC
ItemList:=RemoveTags(ItemList, False);
/LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
ExplodeString(ItemList,ItemArray, '|');
for index := Low(ItemArray) to High(ItemArray) do begin
ItemValue:=Trim(ItemArray[index]);
AddMoviePerson(ItemValue,'','','',ctActors);
LogMessage(' Get results Actors:'+ItemValue+'||');
end;
Try this code; it transferred the actors to the Actors field for me.
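The key step in the snippet above is marking the name boundaries before the tags are stripped. Below is a minimal standalone Free Pascal sketch of just that technique. StripTags is a hypothetical stand-in for PVD's RemoveTags, the markup is made up and only loosely shaped like a cast list, and StringReplace here is the SysUtils version, whose signature differs from the PVD one used in the thread.

```pascal
program CastSplitDemo;
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical stand-in for PVD's RemoveTags: drops everything
// between '<' and '>' and keeps the remaining text.
function StripTags(const S: string): string;
var
  I: Integer;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for I := 1 to Length(S) do
    if S[I] = '<' then
      InTag := True
    else if S[I] = '>' then
      InTag := False
    else if not InTag then
      Result := Result + S[I];
end;

var
  Block, Actor: string;
  P: Integer;
begin
  // Made-up markup (assumption), not the real FilmAffinity HTML.
  Block := '<a href="#">Keith Kupferer</a> <a href="#">Dolly De Leon</a>';
  // Mark the end of each name BEFORE the tags disappear.
  Block := StringReplace(Block, '</a>', '|', [rfReplaceAll]);
  Block := StripTags(Block);
  // Walk the '|'-separated pieces and print each non-empty name.
  while Block <> '' do
  begin
    P := Pos('|', Block);
    if P = 0 then
      P := Length(Block) + 1;
    Actor := Trim(Copy(Block, 1, P - 1));
    if Actor <> '' then
      WriteLn(Actor);
    Delete(Block, 1, P);
  end;
end.
```

Because the delimiter is inserted while the closing tags still exist, names that contain spaces stay intact.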
-
Wow, thanks! It works like a charm! Two small things: endPos has to be declared as an additional Integer variable in this function, and there's a missing "/" in
/LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
so that the line is commented out.
-
Either like this
//LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
or like this
LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList:'+ItemList+'||');
Which one you use is up to you.
-
You are right!
For my personal purposes I have slightly modified your snippet to:
AddMoviePerson(ItemValue,'','No Role On FilmAffinity','',ctActors);
in order to get each actor on a new line (keeping the look and feel of IMDb parsing) while indicating that roles aren't listed on FA. Next I'll try to implement PersonURL for FA, but I'm not planning to create an FA script for people, at least not in the near future.
-
Here's updated code for "Description" field:
// Get ~description~ from the HTML.
curPos := Pos('<dt>Synopsis</dt>', HTML);
if curPos > 0 then // Ensure the synopsis section is found
begin
curPos := PosFrom('<dd class="" itemprop="description">', HTML, curPos); // Locate the synopsis description div
curPos := curPos + Length('<dd class="" itemprop="description">'); // Adjust curPos to the end of the opening tag
endPos := PosFrom('</dd>', HTML, curPos); // Locate the closing tag
// Extract the relevant section of the HTML
ItemValue := Copy(HTML, curPos, endPos - curPos);
LogMessage('Parse results description: ' + ItemValue + '||');
// Clean up and format the ItemValue
ItemValue := Trim(StringReplace(ItemValue, '(FILMAFFINITY)', '', True, True, False)); // Remove specific tags or text
ItemValue := RemoveTags(ItemValue, False);
LogMessage('Cleaned description: ' + ItemValue + '||');
// Add the cleaned description to the XML field
AddFieldValueXML('description', ItemValue);
LogMessage('Get result description: ' + ItemValue + '||');
end // Properly terminate the if block
else
begin
LogMessage('Synopsis section not found.');
end;
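One thing the snippet above does not check is whether the opening <dd ...> tag and the closing </dd> are actually found. Here is a minimal standalone Free Pascal sketch of the same "text between two markers" extraction with both checks added. Between is a hypothetical helper, PosEx from StrUtils stands in for PVD's PosFrom, and the sample page string is invented for illustration; it assumes a failed search returns 0.

```pascal
program BetweenDemo;
{$mode objfpc}{$H+}
uses SysUtils, StrUtils;

// Hypothetical helper: returns the text between OpenTag and CloseTag,
// searching from StartAt. Returns '' when either marker is missing.
function Between(const S, OpenTag, CloseTag: string; StartAt: Integer): string;
var
  P, E: Integer;
begin
  Result := '';
  P := PosEx(OpenTag, S, StartAt);
  if P = 0 then
    Exit;                      // opening marker not found
  P := P + Length(OpenTag);    // move past the opening marker
  E := PosEx(CloseTag, S, P);
  if E = 0 then
    Exit;                      // closing marker not found
  Result := Trim(Copy(S, P, E - P));
end;

var
  Page: string;
begin
  // Invented sample (assumption), not the real FilmAffinity page.
  Page := '<dt>Synopsis</dt><dd class="" itemprop="description">' +
          'A short plot. (FILMAFFINITY)</dd>';
  WriteLn(Between(Page, '<dd class="" itemprop="description">', '</dd>', 1));
  // A marker that is not on the page yields an empty string.
  WriteLn('[' + Between(Page, '<dd class="missing">', '</dd>', 1) + ']');
end.
```

With the checks in place, a layout change on the site gives an empty result instead of a slice of unrelated HTML.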
-
Does anyone have an idea why this script wouldn't populate data, although it parses them (poster)?
Unfortunately, FilmAffinity doesn't provide http links to pics any more:
C:\Users\user>curl -I http://pics.filmaffinity.com/bookworm-371036333-large.jpg
HTTP/1.1 301 Moved Permanently
Date: Wed, 04 Dec 2024 20:11:52 GMT
Content-Type: text/html
Content-Length: 167
Connection: keep-alive
Cache-Control: max-age=3600
Expires: Wed, 04 Dec 2024 21:11:52 GMT
Location: https://pics.filmaffinity.com/bookworm-371036333-large.jpg
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v4?s=Mjz%2B8pecaYxQxzpSkhAuFdftbG6ya0xXg%2BrjQQWzjZQmTEKL%2FGvFTGYbXU9HCka1CJMZPDOLkmSrCuhdqDJ%2BXbeorrMFwgP6FrKFpZZovX8btXvLWdYfrK6jjdS6zssZ4f3snbIHGA%3D%3D"}],"group":"cf-nel","max_age":604800}
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Server: cloudflare
CF-RAY: 8ece72b65de05b67-VIE
server-timing: cfL4;desc="?proto=TCP&rtt=10158&min_rtt=10158&rtt_var=5079&sent=1&recv=3&lost=0&retrans=0&sent_bytes=0&recv_bytes=114&delivery_rate=0&cwnd=249&unsent_bytes=0&cid=0000000000000000&ts=0&x=0"
Here's an updated snippet that provides the poster only via Proxomitron:
//Get ~poster~
ItemValue:=TextBetWeen(HTML,'<a class="lightbox" style="display: block;" href="','"',false,curPos); //Strings which opens/closes the data.'https://pics.filmaffinity...-large.jpg. WEB_SPECIFIC
//Unfortunately the http links for the pics are not available anymore. This snippet works only via Proxomitron-like proxies
AddImageURL(itPoster,ItemValue);
LogMessage(' Get result poster:'+ItemValue+'||');
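The 301 response above changes only the scheme of the URL, so one possible building block is to rewrite http:// to https:// before the URL is handed on. The following is a minimal standalone Free Pascal sketch of that rewrite only. ForceHttps is a hypothetical helper, not a PVD function, and whether PVD's own downloader can then fetch the HTTPS URL is a separate question this sketch does not answer.

```pascal
program HttpsUpgradeDemo;
{$mode objfpc}{$H+}
uses SysUtils;

// Hypothetical helper: rewrites a leading 'http://' to 'https://'.
// Any other URL is returned unchanged.
function ForceHttps(const URL: string): string;
const
  Prefix = 'http://';
begin
  if LowerCase(Copy(URL, 1, Length(Prefix))) = Prefix then
    Result := 'https://' + Copy(URL, Length(Prefix) + 1, MaxInt)
  else
    Result := URL;
end;

begin
  // Poster URL taken from the log earlier in this thread.
  WriteLn(ForceHttps('http://pics.filmaffinity.com/ghostlight-171637604-large.jpg'));
  // An URL that is already HTTPS passes through untouched.
  WriteLn(ForceHttps('https://pics.filmaffinity.com/ghostlight-171637604-large.jpg'));
end.
```

Both calls print the same https:// address.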
-
Using AI (I won't mention that from now on, since it will be implied by default), I have done a major rewrite of the FilmAffinity script. I will try to maintain and upgrade it in the future, at least for myself, and will offer it to others here. Here are the changes since VVV last changed it in 2020.
CHANGE LOG :
V 5.0.0.1-afrocuban (12/05/2024) afrocuban: Poster now available via HTTPS
V 4.2.0.0-afrocuban (12/01/2024) afrocuban: Updated Actors and Description are now again available for import. Poster now available only via Proxomitron