Author Topic: v4 of Movie and People scripts progress

Offline afrocuban

  • Moderator
  • *****
  • Posts: 609
    • View Profile
v4 of Movie and People scripts progress
« on: January 21, 2025, 02:47:47 am »
This is a temporary topic so you can track where I am at the moment with the scripts.
Most probably I will keep only this message and update it with the latest changes.


IMDb People script:
CHANGE LOG :

            V 4.0.0.1 (23/01/2025) afrocuban:
         - Full and complete transition to Selenium. No more instances of or references to PVdBDownPage. Huge thanks to VVVEasy and Ivek for maintaining, for a decade, the option to keep PVD alive with it.
         - Search function brought back to the script with thumbnails in the search window.
         - Full implementation of PvdConfigOptions. Extremely important for optimization and especially useful when refreshing only a certain set of data.
         - Job Title, Career and other personal data now moved from comment to bio field.
         - Starting positions provided for all the fields in section "Field Overwrite Options position in pvdconf.ini".
         - No more pop-up windows stealing focus! In earlier versions, the downpage-UTF8_NO_BOM.htm file was repeatedly downloaded and deleted to parse different pages. Each time this occurred, a pop-up window would steal the focus. Thanks to a different approach in this script, which now downloads separate pages in parallel using Selenium, as well as the special PurgeTmpFiles procedure and vbs script instead of cmd.exe, this focus-stealing issue is now resolved.
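The last item above (separate pages saved in parallel, then purged) can be sketched roughly as follows. This is only an illustration, not the actual scripts: `download_pages`, `purge_tmp_files` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for whatever the Selenium scripts really do to get a page.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict


def download_pages(urls: Dict[str, str], out_dir: Path,
                   fetch: Callable[[str], str]) -> Dict[str, Path]:
    """Fetch every URL in parallel and save each page to its own file."""
    out_dir.mkdir(parents=True, exist_ok=True)

    def save(item):
        name, url = item
        target = out_dir / f"{name}.htm"
        target.write_text(fetch(url), encoding="utf-8")
        return name, target

    # One file per page, so nothing is repeatedly overwritten and deleted.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return dict(pool.map(save, urls.items()))


def purge_tmp_files(out_dir: Path, pattern: str = "*.htm") -> int:
    """Delete the saved pages once parsing is done; return how many were removed."""
    removed = 0
    for path in out_dir.glob(pattern):
        path.unlink()
        removed += 1
    return removed


def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a Selenium download of the given URL.
    return f"<html><body>{url}</body></html>"
```

If a real Selenium driver replaced `fake_fetch`, each worker would typically need its own driver instance, since a single driver is not meant to be shared between threads.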
TO DO:

SCRIPT COMPLETE
Script and instructions available in a package here.


IMDb Movie script:
CHANGE LOG :

            V 4.0.0.1 (18/01/2025) afrocuban:
         - Transition to Selenium started.
         - Various DownloadPageXXX functions for each page to be downloaded constructed.
         - ParsePage function adapted accordingly.
         - DownloadPage and DownloadImage functions set up for Selenium scripts.
         - ParsePage_IMDBSearchTitle function fully transitioned to Selenium, which means that the search function is back in the script, with thumbnails in the search window.
         - Starting positions provided for all the fields in section "Field Overwrite Options position in pvdconf.ini".
         - No more pop-up windows stealing focus! In earlier versions, the downpage-UTF8_NO_BOM.htm file was repeatedly downloaded and deleted to parse different pages. Each time this occurred, a pop-up window would steal the focus. Thanks to a different approach in this script, which now downloads separate pages in parallel using Selenium, as well as the special PurgeTmpFiles procedure and vbs script instead of cmd.exe, this focus-stealing issue is now resolved.
---------------------------------
News as of February 1st, 2025:
----------------------------------
- Script fully transitioned to Selenium.
- Script Options, Script Data and Global Vars upgraded.
- New GetPvdConfigOptions Function introduced, so now the whole script relies on PVDConfigOptions.
- GetDownloadURL Function completely set and functional.
- DownloadPage Function completely set and functional.
- DownloadImage Function completely set and functional.
- ParsePage_IMDBSearchTitle Function completely set and functional.
- ParsePage_IMDBMovieBASE Function completely set and functional.
- ParsePage_IMDBMovieAWARDS Function completely set and functional. (check the screenshot here)
- ParsePage_IMDBMoviePLOTSUMMARY Function completely set and functional.
- ParsePage Function completely set and functional.
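To illustrate why relying on config options helps when refreshing only a certain set of data, here is a rough sketch. The real layout of pvdconf.ini and the real GetPvdConfigOptions are not shown in this topic, so the section name `FieldOverwrite`, the keys and both Python functions below are assumptions made only for the example.

```python
import configparser

# Hypothetical ini content; the real pvdconf.ini sections and keys may differ.
SAMPLE_INI = """
[FieldOverwrite]
title = 1
plot = 0
awards = 1
"""


def get_config_options(ini_text, section="FieldOverwrite"):
    """Read on/off switches for each field from ini text."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    if not parser.has_section(section):
        return {}
    return {key: parser.getboolean(section, key) for key in parser.options(section)}


def pages_to_parse(options, field_pages):
    """Return only the pages that feed at least one enabled field."""
    return sorted({page for field, page in field_pages.items() if options.get(field)})
```

With `plot` switched off, the plot summary page is neither needed nor parsed, which is where the time saving would come from.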


---------------------------------
News as of February 6th, 2025:
----------------------------------
- ParsePage_IMDBMovieCREDIT Function redesigned and completely set and functional.
- ParsePage_IMDBMovieAKA Function completely set and functional.
- ParsePage_IMDBMovieCONNECTIONS Function brought back to the script, with an additional connection type that did not exist in the script so far (check the screenshot here)


---------------------------------
News as of February 16th, 2025:
----------------------------------
SCRIPT BASICALLY COMPLETED

---------------------------------
News as of March 13th, 2025:
----------------------------------
SCRIPT COMPLETE
Script and instructions available in a package here.

IMDb Series Script:
Nothing yet. When I finish the movie and people scripts, I will assess the possibility of merging it with the episodes script.


IMDb Episodes Script:
Nothing yet. When I finish the movie and people scripts, I will assess the possibility of merging it with the series script.


FilmAffinity Script:
I will strip the version down to v4. For now, for testing purposes, I have successfully transitioned only DownloadImage (poster) to Selenium. The plan is to fully transition the script to Selenium too, but since it works perfectly at the moment, it is not a priority. The transition should be fast, though, because most probably it only requires setting up the Selenium scripts and switching the functions over to them (there are not as many functions for FA as for the IMDb scripts). A trailer page is intended to be added too.
TO DO:
SCRIPT COMPLETE
Script and instructions available in a package here.

Selenium Scripts:
Single Download base page set for IMDb Movie, People and FilmAffinity Movie. ("3 in 1")
Single Search script set for IMDb Movie, People and FilmAffinity Movie. ("3 in 1")
Single Download image script set for IMDb Movie, People and FilmAffinity Movie. ("3 in 1")

Single Download additional pages scripts set for IMDb Movie and FilmAffinity Movie. ("2 in 1")
Single Download additional pages scripts set for IMDb People.

Scripts and instructions available in a package here.


TO DO:
- To set download base page and additional pages scripts for Series and Episodes script.
- To adapt Search script for Series and Episodes.

OTHER:
- vbs script set to use instead of cmd.exe in order to avoid annoying pop-up windows stealing focus.
- SCRIPTS CONFIGURATOR completely rewritten from scratch (yes, I had to learn about ahk, too :-\ ), so now we have a resizable window with scrollbars and all the options I could think of.
Scripts, configurator exe and source code, screenshots and instructions available in a package here.

WISHFUL THINKING:
- Bringing back Allmovie and Rottentomatoes scripts too.
« Last Edit: March 14, 2025, 12:17:53 am by afrocuban »

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #1 on: January 23, 2025, 01:56:13 am »
People Script Complete.

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #2 on: February 01, 2025, 07:52:08 pm »
New advances in the first post on February 1st 2025.

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #3 on: February 06, 2025, 08:56:58 pm »
New advances in the first post as of February 6th 2025.

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #4 on: February 10, 2025, 03:22:50 am »
Function ParsePage_IMDBMovieREFERENCE is giving me a lot of headaches because of its huge potential for redundancy and overlap with ParsePage_IMDBMovieBASE and ParsePage_IMDBMovieCREDIT, which is not good for time optimization. But it's a great resource, so at the moment I am busy reorganizing Base and Credit so that they do not parse what the reference page already provides when it is downloaded, while adding to the reference parsing everything from both that is redundant for base and credit. I will check for redundancy with other pages later too.
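The idea of not parsing the same data twice can be sketched like this. It is a simplified illustration only: the field lists in `PAGE_FIELDS` and the function `assign_fields` are made up for the example and do not reflect what each IMDb page actually provides or how the script is organized.

```python
from typing import Dict, Iterable, List

# Hypothetical field coverage per page; the real lists live in the script.
PAGE_FIELDS: Dict[str, List[str]] = {
    "reference": ["title", "year", "directors", "cast"],
    "base": ["title", "year", "rating"],
    "credit": ["directors", "cast", "producers"],
}


def assign_fields(downloaded: Iterable[str], priority: Iterable[str]) -> Dict[str, List[str]]:
    """Give each field to the first downloaded page (in priority order) that provides it."""
    available = set(downloaded)
    taken = set()
    plan: Dict[str, List[str]] = {}
    for page in priority:
        if page not in available:
            continue
        # Skip fields that an earlier page in the priority order already covers.
        fields = [f for f in PAGE_FIELDS.get(page, []) if f not in taken]
        taken.update(fields)
        plan[page] = fields
    return plan
```

When the reference page is downloaded, base and credit are left with only the fields the reference page lacks; when it is not, they parse everything as before.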
« Last Edit: February 10, 2025, 03:25:26 am by afrocuban »

Offline Pacifist

  • User
  • ***
  • Posts: 71
    • View Profile
Re: v4 of Movie and People scripts progress
« Reply #5 on: February 15, 2025, 08:17:35 am »
Where can I download the new IMDb Movie Selenium script?

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #6 on: February 15, 2025, 01:10:33 pm »
I am preparing 3 scripts: IMDb Movie, People and FilmAffinity, which I will publish all together so people can test them all. I don't want to publish them separately, because I can't work on them and update already-published scripts in parallel. I guess I am a week away from publishing. Thanks for your patience, but the Movie script now has almost 7000 lines...

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #7 on: February 15, 2025, 02:30:05 pm »
News is that I adapted ParsePage_IMDBMovieCREDIT to parse either from reference or fullcredits page based on a new script option:
Quote
  GET_FULL_CREDIT_FROM_REFERENCE      =   True;   //Instead of the FullCredits.htm page, Function ParsePage_IMDBMovieCREDIT gets full credits from Reference.htm.


This way we save time and load by not downloading and parsing the fullcredits page. But if we set the option to "False", the script will download the fullcredits page and, with the same Function ParsePage_IMDBMovieCREDIT and without any code changes, parse it and get exactly the same result as when this function parses the reference.htm page.
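In rough terms, the switch works like the sketch below. The function `plan_credit_parsing` is hypothetical and written in Python only for illustration; it also assumes the reference page is already being downloaded for other data, which is what makes the saving possible.

```python
def plan_credit_parsing(get_full_credit_from_reference: bool,
                        reference_downloaded: bool = True) -> dict:
    """Decide which page the credit parser reads and whether an extra download is needed."""
    if get_full_credit_from_reference and reference_downloaded:
        # Reuse the page that is already on disk: no extra download.
        return {"download": [], "parse_from": "Reference.htm"}
    # Otherwise fetch the dedicated page and feed it to the same parser.
    return {"download": ["FullCredits.htm"], "parse_from": "FullCredits.htm"}
```

Either way the same parsing function is called; only its input page changes.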



Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #8 on: February 17, 2025, 02:34:20 pm »
New advances in the first post as of February 16th 2025.

Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #9 on: March 13, 2025, 10:26:56 pm »
As of March 13, 2025, the Selenium IMDb Movie, People and FilmAffinity Movie scripts, the vbs script and the Scripts Configurator for them are complete. Read the first message for the details, as well as how and where to get them.

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2792
    • View Profile
Re: v4 of Movie and People scripts progress
« Reply #10 on: March 14, 2025, 08:44:35 am »
Question

How do I fix this for those of us who use Firefox browser instead of Chrome browser?
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #11 on: March 14, 2025, 11:34:39 am »
Question

How do I fix this for those of us who use Firefox browser instead of Chrome browser?



To adapt your existing Selenium script to use Firefox with geckodriver, you'll need to make a few key changes. Here's a detailed explanation of the necessary modifications:
 1. Change WebDriver to Firefox: Selenium has a Firefox WebDriver (webdriver.Firefox()), similar to how you're using the Chrome WebDriver (webdriver.Chrome()).
 2. Install and Use Geckodriver: You need to install geckodriver (the WebDriver for Firefox) and make sure it is available in your system’s PATH or specify its path explicitly.
 3. Modify Chrome-Specific Options to Firefox-Specific Options: Some Chrome-specific options (like chrome_options) need to be replaced with their Firefox counterparts. The FirefoxOptions object is used to set browser-specific configurations.
 4. Remove Chrome-specific arguments and replace them with Firefox-specific ones: For Firefox, you would use FirefoxOptions and its methods instead of ChromeOptions.
 

Step-by-Step Adaptation
  • Install Firefox and Geckodriver
     
    • Firefox: If you don’t have Firefox installed already, you can install it from the official website.
    • Geckodriver: You can download it from the Geckodriver GitHub releases page. Make sure to download the version that matches your operating system and place it in a directory that's included in your PATH or specify the path in the script.
  • Modify Imports: You need to import the Firefox-specific classes instead of the Chrome ones.
    Quote
    from selenium import webdriver
    from selenium.webdriver.firefox.service import Service as FirefoxService
    from selenium.webdriver.firefox.options import Options as FirefoxOptions
    from selenium.webdriver.common.by import By
  • Set Firefox Options: Replace chrome_options with firefox_options, and replace the Chrome-specific arguments with their Firefox equivalents.
    Quote
    # Set Firefox options
    firefox_options = FirefoxOptions()
    firefox_options.add_argument("--headless")  # Run Firefox in headless mode
    firefox_options.set_preference("intl.accept_languages", language_code)  # Set language (Firefox uses a preference, not a --lang argument)
    firefox_options.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0")  # User agent
    Note that Firefox handles some settings differently from Chrome. For example, the language and the user agent are set through preferences (intl.accept_languages, general.useragent.override) rather than command-line arguments.
  • Replace Chrome WebDriver with Firefox WebDriver: When initializing the WebDriver, use webdriver.Firefox instead of webdriver.Chrome.
    Quote
    # Path to geckodriver
    geckodriver_path = os.path.join(current_dir, "geckodriver.exe")
    # Ensure geckodriver exists
    if not os.path.exists(geckodriver_path):
        logging.error(f"Geckodriver not found at path: {geckodriver_path}")
        sys.exit(f"Geckodriver not found at path: {geckodriver_path}")
    # Initialize FirefoxDriver
    service = FirefoxService(executable_path=geckodriver_path)
    driver = webdriver.Firefox(service=service, options=firefox_options)
    logging.info(f"Geckodriver started from: {geckodriver_path}")
  • Make Other Necessary Changes for Firefox You don't need to make major changes for general functionality, but if you're handling Firefox-specific settings (like cookies or scrolling behavior), you may need to adjust those based on Firefox’s behavior.


Full Example of Key Changes:
Here’s an example of how you can modify the initialization and WebDriver setup for Firefox:
Quote
# Import the necessary components for Firefox
from selenium import webdriver
from selenium.webdriver.firefox.service import Service as FirefoxService
from selenium.webdriver.firefox.options import Options as FirefoxOptions
from selenium.webdriver.common.by import By
import os
import logging
import sys

# current_dir and language_code are expected to be defined earlier in your existing script

# Set Firefox options
firefox_options = FirefoxOptions()
firefox_options.add_argument("--headless")  # Run Firefox in headless mode
firefox_options.set_preference("intl.accept_languages", language_code)  # Set language (a preference in Firefox, not a --lang argument)
firefox_options.set_preference("general.useragent.override", "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Firefox/91.0")  # User agent

# Path to geckodriver
geckodriver_path = os.path.join(current_dir, "geckodriver.exe")

# Ensure geckodriver exists
if not os.path.exists(geckodriver_path):
    logging.error(f"Geckodriver not found at path: {geckodriver_path}")
    sys.exit(f"Geckodriver not found at path: {geckodriver_path}")

# Initialize FirefoxDriver
service = FirefoxService(executable_path=geckodriver_path)
driver = webdriver.Firefox(service=service, options=firefox_options)
logging.info(f"Geckodriver started from: {geckodriver_path}")
Summary of Changes:
  • Import the FirefoxService and FirefoxOptions.
  • Set Firefox-specific options (firefox_options), such as headless mode and language settings.
  • Initialize the webdriver.Firefox instead of webdriver.Chrome with the appropriate service and options.
  • Ensure the path to geckodriver is correct.


That should be all you need to switch from Chrome to Firefox using Selenium and Geckodriver! The rest of your script (e.g., interacting with elements, taking screenshots, saving the page) should work with minimal modification because Selenium provides a consistent API for interacting with different browsers.
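If you would rather keep one script for both browsers, the switch could be driven by a single setting, roughly as sketched here. This is untested against the published scripts; `browser_settings` and `create_driver` are hypothetical helper names, and `create_driver` assumes a Selenium 4 setup where the driver executable can be located without an explicit path.

```python
from typing import Any, Dict


def browser_settings(browser: str, language_code: str, headless: bool = True) -> Dict[str, Any]:
    """Collect arguments and preferences for the chosen browser without touching Selenium."""
    browser = browser.lower()
    if browser == "firefox":
        return {
            "browser": "firefox",
            "arguments": ["-headless"] if headless else [],
            # Firefox takes the language as a preference.
            "preferences": {"intl.accept_languages": language_code},
        }
    if browser == "chrome":
        args = [f"--lang={language_code}"]
        if headless:
            args.insert(0, "--headless")
        return {"browser": "chrome", "arguments": args, "preferences": {}}
    raise ValueError(f"Unsupported browser: {browser}")


def create_driver(settings: Dict[str, Any]):
    """Build a WebDriver from the settings; Selenium is imported only when this runs."""
    from selenium import webdriver

    if settings["browser"] == "firefox":
        options = webdriver.FirefoxOptions()
        for name, value in settings["preferences"].items():
            options.set_preference(name, value)
    else:
        options = webdriver.ChromeOptions()
    for argument in settings["arguments"]:
        options.add_argument(argument)
    driver_class = webdriver.Firefox if settings["browser"] == "firefox" else webdriver.Chrome
    return driver_class(options=options)
```

Keeping the settings separate from driver creation means the browser choice can be checked or logged before any browser is started.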
« Last Edit: March 14, 2025, 11:56:19 am by afrocuban »

Offline Ivek23

Re: v4 of Movie and People scripts progress
« Reply #12 on: March 14, 2025, 11:56:15 am »
Thanks for the comprehensive explanation for Firefox browsers.

Thank you very much.


Offline afrocuban

Re: v4 of Movie and People scripts progress
« Reply #13 on: March 14, 2025, 12:04:39 pm »
Thanks for the comprehensive explanation for Firefox browsers.

Thank you very much.

You are more than welcome, Ivek. I have never tried it, so I am not at all sure how Firefox would download pages (clicking "See more" pages, "Storyline" sections and others), or whether the final html code would be the same as that downloaded with Chrome, so it might be frustrating to discover that the htmls scraped with the two browsers actually differ.
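One way to check this before blaming the parser would be to save the same page with both browsers and compare the visible text, along these lines. This is just a sketch using the Python standard library; `visible_text` and `compare_pages` are hypothetical helpers, not part of the published scripts.

```python
import difflib
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text chunks, ignoring script and style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.chunks.append(text)


def visible_text(html):
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def compare_pages(chrome_html, firefox_html):
    """Return a similarity ratio plus the text found in only one of the two pages."""
    a, b = visible_text(chrome_html), visible_text(firefox_html)
    ratio = difflib.SequenceMatcher(None, a, b).ratio()
    only_chrome = [t for t in a if t not in b]
    only_firefox = [t for t in b if t not in a]
    return ratio, only_chrome, only_firefox
```

A section that shows up for only one browser (for example an expanded "Storyline") would appear in one of the two lists and point to where the downloads diverge.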

P.S. In the People script, I brought the career option back to the base function too, so just make sure the proper switch (ShouldParseCareer) is set so that it is not also parsed by the bio function.

Offline Ivek23

Re: v4 of Movie and People scripts progress
« Reply #14 on: March 14, 2025, 05:07:16 pm »
I have never tried it, so I am not at all sure how Firefox would download pages (clicking "See more" pages, "Storyline" sections and others), or whether the final html code would be the same as that downloaded with Chrome, so it might be frustrating to discover that the htmls scraped with the two browsers actually differ.

Notice

It does not work on Win 10 with the Firefox browser and geckodriver.