Topic: Python scripts documentation and instructions

Ivek23

« on: December 24, 2024, 09:06:51 am »
Documentation and instructions for using Python scripts

The script automates access to IMDb pages. It:

  • Dismisses the "Select Your Preferences" pop-up window so that the page can continue loading.
  • Clicks every "More" button to expand the hidden information on the page.
  • Saves the entire source code (HTML) of the page to a file for further processing.

Requirements

  • Python: Python 3.x installed on your computer.
  • Selenium: Browser automation library.
  • Geckodriver: Driver for using the Firefox browser with Selenium (or the matching driver for another browser, if available).
  • Firefox browser: Firefox installed on your computer (or another browser).

Preparing the environment

    Installing libraries: Install the Selenium library using the command:

    pip install selenium

    Downloading Geckodriver (or the matching driver for another browser, if available):
    Download geckodriver from the official site: Geckodriver Releases.
    Unzip the file and note the path to the geckodriver.exe program.

    Adjusting the paths in the script:
        Set the following variables in the script to the correct paths:
            gecko_path: Path to the geckodriver.exe file.
            pvd_path: Path to your PVD folder.
            log_path: Path to the log file.


How to use the script

    Calling the script from the command line:
        Run the script with the following command (the script name is, for example, "selenium_aka_script.py"):

    python script_name.py "https://www.imdb.com/title/tt1234567/"

    Replace "https://www.imdb.com/title/tt1234567/" with the actual URL of the IMDb page. Otherwise the script will use the URL from the database, if the URL already exists there.
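
How the URL is chosen might be sketched as follows. This assumes the URL is passed as the first command-line argument; the function name pick_url and the database_url parameter are hypothetical, and how the real script reads the URL from the PVD database is not shown in this post.

```python
import sys


def pick_url(argv, database_url=None):
    """Return the URL given on the command line, else the one from the database."""
    # argv[0] is the script name, so the URL (if any) is argv[1].
    if len(argv) > 1 and argv[1].strip():
        return argv[1].strip()
    # Fallback: a URL that was already stored in the database (assumed input).
    if database_url:
        return database_url
    raise SystemExit("No IMDb URL given and none found in the database.")
```

In a script this would be called as pick_url(sys.argv, database_url), where database_url is whatever value was read from the database beforehand.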


Script functionalities:

    When loading the page, it first checks if the "Select Your Preferences" window appears, and closes it by clicking the "Accept" button.

    Clicks all visible "More" buttons to expand additional information (e.g., a list of actors or similar movies).

    After clicking all buttons, it saves the entire page to the file downpage-UTF8_NO_BOM.htm, which is located in the Scripts/Tmp folder.


Results:

    The entire page is saved as an HTML file in the folder:

        C:\Program Files (x86)\Personal Video Database\Scripts\Tmp\  (or wherever your PVD installation folder is)

    The log file that records all events is located in:

        C:\Program Files (x86)\Personal Video Database\Scripts\python_script.log  (or wherever your PVD installation folder is)


A more detailed explanation of the functionality

    Closing the pop-up window:
        The script waits up to 10 seconds for the "Select Your Preferences" window to appear.
        It looks for the "Accept" button and clicks it.
        If the window is not displayed, it continues without error.
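
The pop-up handling described above might look roughly like this. It is a sketch, not the actual script: the XPath in ACCEPT_XPATH is an assumed locator, and the waiting is done with a simple polling loop on the driver's find_elements method rather than with Selenium's WebDriverWait.

```python
import logging
import time

# Assumed locator; the XPath used by the real script is not shown in this post.
ACCEPT_XPATH = "//button[normalize-space()='Accept']"


def close_preferences_popup(driver, timeout=10.0, poll=0.5):
    """Click 'Accept' if the pop-up appears within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        # "xpath" is the value of Selenium's By.XPATH constant;
        # find_elements returns an empty list when nothing matches.
        buttons = driver.find_elements("xpath", ACCEPT_XPATH)
        if buttons:
            buttons[0].click()
            logging.info("Clicked on the 'Accept' button to close the popup.")
            return True
        if time.monotonic() >= deadline:
            # No pop-up within the time limit: carry on without an error.
            logging.info("'Select Your Preferences' popup not displayed.")
            return False
        time.sleep(poll)
```

The function returns True or False instead of raising an error, so the rest of the script can continue either way.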

    Clicking "More" buttons:
        The script repeatedly searches for "More" buttons on the page.
        When it finds them, it clicks them one by one and waits for additional content to load.
        This process is repeated until all "More" buttons have been clicked.
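
A loop of this kind could be sketched as below. The XPath in MORE_XPATH, the pause length and the max_rounds limit are all assumptions for illustration; the real script may locate and click the buttons differently.

```python
import time

# Assumed locator; the XPath used by the real script is not shown in this post.
MORE_XPATH = "//button[contains(., 'More')]"


def click_all_more(driver, pause=1.0, max_rounds=50):
    """Click visible 'More' buttons until none are left; return the click count."""
    clicked = 0
    for _ in range(max_rounds):  # upper bound guards against an endless loop
        # "xpath" is the value of Selenium's By.XPATH constant.
        buttons = [b for b in driver.find_elements("xpath", MORE_XPATH)
                   if b.is_displayed()]
        if not buttons:
            break
        clicked_this_round = 0
        for button in buttons:
            try:
                button.click()
            except Exception:
                # The page changed in the meantime (e.g. a stale or covered
                # element); the button is found again on the next round.
                continue
            clicked_this_round += 1
            time.sleep(pause)  # give the expanded content time to load
        if clicked_this_round == 0:
            break  # nothing could be clicked, so stop instead of spinning
        clicked += clicked_this_round
    return clicked
```

Searching again after every round matters because expanding one section can reveal further "More" buttons that were not on the page before.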

    Saving the page:
        After all buttons have been clicked, the script saves the entire page to a file in UTF-8 encoding.
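
Saving the page could be done along these lines. The function name save_page is hypothetical; with Selenium, the HTML would typically come from driver.page_source.

```python
from pathlib import Path


def save_page(html, target):
    """Write the page source to `target` as UTF-8 without a byte order mark."""
    target = Path(target)
    # Create the Tmp folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # The "utf-8" codec writes no BOM (the "utf-8-sig" codec would add one).
    target.write_text(html, encoding="utf-8")
    return target
```

A call might look like save_page(driver.page_source, "Scripts/Tmp/downpage-UTF8_NO_BOM.htm"), with the path adjusted to your PVD folder.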

    Logging:
        The log file records all important events, such as a successful button click, any errors, or the unavailability of a pop-up window.
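
A logger that writes lines in the "date - level - message" form shown in the log example in this post might be set up as follows. The function name make_logger and the logger name "pvd_imdb_script" are assumptions; the real script may configure logging in another way.

```python
import logging


def make_logger(log_path):
    """Return a logger that appends 'date - LEVEL - message' lines to log_path."""
    logger = logging.getLogger("pvd_imdb_script")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate lines when called twice
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(logging.Formatter(
            "%(asctime)s - %(levelname)s - %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        ))
        logger.addHandler(handler)
    return logger
```

After log = make_logger(log_path), a call such as log.info("Starting the Python script.") appends one timestamped line to the file.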


Common Problems and Solutions

    Geckodriver not found:
        Check if the gecko_path is set correctly.
        Make sure geckodriver.exe is compatible with your version of Firefox.

    Pop-up window does not close:
        Check if the "Accept" button attributes have been changed on the IMDb page.
        Update the XPath for the button if there have been any changes.

    Log file not created:
        Check the write permissions on the Scripts folder.
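
Two of these checks (the geckodriver path and write access to the Scripts folder) can be tested before the browser is started. The helper below is a hypothetical sketch and not part of the script described here; it tests write access by creating a throwaway temporary file in the folder.

```python
import tempfile
from pathlib import Path


def preflight(gecko_path, scripts_dir):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if not Path(gecko_path).is_file():
        problems.append(f"geckodriver not found at {gecko_path}")
    try:
        # Creating (and automatically deleting) a temporary file proves
        # that the folder exists and is writable.
        with tempfile.TemporaryFile(dir=scripts_dir):
            pass
    except OSError as exc:
        problems.append(f"cannot write to {scripts_dir}: {exc}")
    return problems
```

Printing the returned list at startup gives a quick hint about which of the problems above applies.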


Example of a log file

2024-12-19 12:00:00 - INFO - Starting the Python script.
2024-12-19 12:00:01 - INFO - Using IMDb URL: https://www.imdb.com/title/tt1234567/
2024-12-19 12:00:05 - INFO - Page https://www.imdb.com/title/tt1234567/ loaded successfully.
2024-12-19 12:00:06 - INFO - 'Select Your Preferences' popup detected.
2024-12-19 12:00:07 - INFO - Clicked on the 'Accept' button to close the popup.
2024-12-19 12:00:10 - INFO - Clicked a 'More' button.
2024-12-19 12:00:12 - INFO - HTML saved to file: D:/MyPVD/PVD_0.9.9.21_MOD-Simple AllMovies/Scripts/Tmp/downpage-UTF8_NO_BOM.htm
2024-12-19 12:00:15 - INFO - Browser closed.
« Last Edit: December 24, 2024, 09:08:44 am by Ivek23 »
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


 
