Author Topic: Python (+Selenium) Chrome script for IMDb People script (Read 1254 times)

afrocuban · « **on:** January 04, 2025, 04:48:45 am »

This is a Selenium script written specifically for the People script here:

Script Description: This script automates the download of various IMDb pages using Selenium and ChromeDriver, including handling cookies and popups, and saving the resulting HTML to local files.
It detects your locale automatically: it queries the http://ipinfo.io API for your country code, then looks that code up in a dictionary that maps countries to languages. If you don't want this, comment out the first part of the script and uncomment the alternative at the end of the script. Open the script in a text editor and read the comments there for details.
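
A minimal sketch of the locale lookup described above, not the script's actual code. It assumes the ipinfo.io JSON response carries a `country` field; the `COUNTRY_TO_LANGUAGE` table, the function names and the `en-US` fallback are illustrative, and it uses the standard library's `urllib` where the real script uses `requests`.

```python
import json
from urllib.request import urlopen

# Hypothetical mapping from ISO country code to a browser language tag.
COUNTRY_TO_LANGUAGE = {
    "US": "en-US",
    "GB": "en-GB",
    "DE": "de-DE",
    "FR": "fr-FR",
    "SI": "sl-SI",
}


def fetch_country_code(url="https://ipinfo.io/json", timeout=5):
    """Ask ipinfo.io for the caller's country code; return None on any failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return json.load(response).get("country")
    except (OSError, ValueError):
        # Network errors are OSError subclasses; malformed JSON raises ValueError.
        return None


def language_for_country(country_code, mapping=COUNTRY_TO_LANGUAGE, default="en-US"):
    """Map a country code to a language tag, falling back to the default."""
    if not country_code:
        return default
    return mapping.get(country_code.upper(), default)


def detect_language(fetch=fetch_country_code):
    """Combine the lookup and the mapping; 'fetch' can be swapped out for a fixed value."""
    return language_for_country(fetch())
```

Calling `detect_language()` with no arguments performs the network lookup. Passing something like `fetch=lambda: "DE"` skips it, which is roughly what hard-coding the language at the end of the script amounts to.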
For this to work ensure that:

Quote

A. Python is installed.
B. selenium and requests are installed, with:

Quote
pip install selenium requests

C. The Chrome binary is on your PATH.
D. The Python folder is on your PATH.
E. pythonw.exe is present in the Python folder, or its containing folder is on your PATH.
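
The checklist above can be verified with a small helper before running anything. This is only a sketch: the executable names (`python`, `pythonw`, `chrome`) are assumptions about how the binaries are named on your PATH, and the function name is made up for illustration.

```python
import importlib.util
import shutil


def missing_prerequisites(executables=("python", "pythonw", "chrome"),
                          modules=("selenium", "requests")):
    """Return a list of human-readable problems; an empty list means all were found."""
    problems = []
    for name in executables:
        # shutil.which searches PATH (and PATHEXT on Windows, so 'chrome' matches chrome.exe).
        if shutil.which(name) is None:
            problems.append(f"'{name}' was not found on PATH")
    for name in modules:
        # find_spec returns None when a top-level package cannot be imported.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python package '{name}' is not installed")
    return problems
```

Printing the result of `missing_prerequisites()` from cmd shows which of points A to E still needs attention.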

This script:

Quote

1. Uses the Chrome browser instead of Firefox
2. Uses chromedriver.exe instead of geckodriver
3. Starts chromedriver.exe silently
4. Silently invokes the browser in headless mode (no browser pop-up windows)
5. Scrapes the .htm page of a given URL
6. Needs no path set manually inside the script: paths are relative to the location of the Selenium script itself!

To use the relative path, ensure that:

Quote

6A. You put this script into the "Scripts" folder of your PVD instance.
6B. You put the appropriate chromedriver.exe into the "Scripts" folder, too.
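
A rough sketch of how the relative path and headless start can fit together, assuming Selenium 4 and a Chrome version that accepts `--headless=new`. The helper names are hypothetical and this is not taken from the attached script.

```python
import os


def script_dir(script_file):
    """Folder that contains the given script file."""
    return os.path.dirname(os.path.abspath(script_file))


def chromedriver_path(script_file, driver_name="chromedriver.exe"):
    """Expect chromedriver.exe next to the script, i.e. in the same Scripts folder."""
    return os.path.join(script_dir(script_file), driver_name)


def start_headless_chrome(script_file):
    """Start Chrome without a visible window, using the driver beside the script."""
    # Imported here so the path helpers above work even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    options.add_argument("--headless=new")  # no visible browser window
    service = Service(executable_path=chromedriver_path(script_file))
    return webdriver.Chrome(service=service, options=options)
```

Inside the script this would be called as `start_headless_chrome(__file__)`, so moving the whole Scripts folder does not require editing any path.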

To invoke the Selenium script itself silently from PVD's .psf script (no cmd window popping up), be sure to use pythonw.exe instead of python.exe, for example:

Quote

FileExecute('pythonw.exe', '"' + ScriptPath + 'selenium_script-Chrome_People.py" "' + URL + '" "' + ScriptPath + BASE_DOWNLOAD_FILE_NO_BOM + '"');

The maintainers of the corresponding .psf scripts (for now, Ivek and me) will probably take care of this last point, but check that the call is there anyway.

You may want to test the script manually first, from cmd, for example:

Quote

C:\Users\user\selenium_script-Chrome_People.py "https://www.imdb.com/name/nm0000017"

From this point on, everything is automated and headless.
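
Both invocations above pass the URL as the first argument, and the .psf call adds an output path as the second. A sketch of how a script could accept either form follows; `parse_cli` and `DEFAULT_OUTPUT_NAME` are hypothetical names, and the default file name is an assumption rather than the one the real script uses.

```python
import os

DEFAULT_OUTPUT_NAME = "downpage-UTF8_NO_BOM.htm"  # hypothetical default file name


def parse_cli(argv, script_file):
    """Return (url, output_path) from argv = [script, url, optional output_path]."""
    if len(argv) < 2:
        raise SystemExit("usage: script.py <imdb_url> [output_path]")
    url = argv[1]
    if len(argv) >= 3:
        # Called from the .psf script: the output path is given explicitly.
        output_path = argv[2]
    else:
        # Called manually with only a URL: save next to the script itself.
        folder = os.path.dirname(os.path.abspath(script_file))
        output_path = os.path.join(folder, DEFAULT_OUTPUT_NAME)
    return url, output_path
```

In a real script the call would be `url, output_path = parse_cli(sys.argv, __file__)` after `import sys`.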

afrocuban · « **Reply #1 on:** January 04, 2025, 08:39:56 am »

Here's an optimized Selenium script that should reduce the wait time significantly.

afrocuban · « **Reply #2 on:** January 05, 2025, 10:37:58 pm »

New scripts. Delete the earlier ones and put these into the Scripts folder.

Read more here:

Quote

http://www.videodb.info/forum_en/index.php/topic,4367.msg22727.html#msg22727

Ivek23 · « **Reply #3 on:** January 06, 2025, 10:02:12 pm »

The selenium_script-People_4_pages_v3.2 script does not transfer all the awards data because it does not open all the "More" buttons, at least in my case.

Here is my updated part of the code, which opens all the "More" buttons and works for me; you will have to adapt it for the Chrome version.

Quote

# Define URLs and save paths
URLS_AND_PATHS = {
    f"{base_url}/awards/": os.path.join(tmp_dir, "downpage-UTF8_NO_BOM-Awards.htm"),
    f"{base_url}/bio/": os.path.join(tmp_dir, "downpage-UTF8_NO_BOM-Bio.htm"),
    f"https://www.imdb.com/search/title/?explore=genres&role={base_url.split('/')[-1]}": os.path.join(tmp_dir, "downpage-UTF8_NO_BOM-Genres.htm"),
    f"{base_url}/?showAllCredits=true": os.path.join(tmp_dir, "downpage-UTF8_NO_BOM-Credit.htm")
}

# Improved function to click all "More" buttons with scrolling
def click_all_more_buttons(driver, max_rounds=50):
    """
    Scrolls down the page and clicks all the "More" buttons that are visible.
    Stops after max_rounds passes so a button that never disappears cannot loop forever.
    """
    body = driver.find_element(By.TAG_NAME, 'body')
    for _ in range(max_rounds):
        try:
            # Find visible "More" buttons
            more_buttons = driver.find_elements(By.XPATH, '//span[contains(@class, "ipc-see-more__text")]/..')

            # If no buttons are found, break the loop
            if not more_buttons:
                logging.info("No more 'More' buttons found.")
                break

            # Iterate through and click all visible "More" buttons
            for button in more_buttons:
                try:
                    # Scroll into view before clicking
                    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", button)
                    time.sleep(0.5)  # Allow page to stabilize
                    button.click()
                    logging.info("Clicked a 'More' button.")
                    time.sleep(1)  # Allow time for new content to load
                except Exception as e:
                    # A failed click (e.g. a stale element) should not stop the others
                    logging.warning(f"Error clicking a 'More' button: {e}")

            # Scroll the page down to load more buttons
            body.send_keys(Keys.PAGE_DOWN)
            time.sleep(1)  # Wait for page to load more buttons
        except Exception as e:
            logging.warning(f"Stopped looking for 'More' buttons: {e}")
            break

# Function to download a page
def download_page(imdb_url, output_path, retries=3):
    for attempt in range(retries):
        driver = None  # keeps the finally block safe if the driver fails to start
        try:
            # Initialize FirefoxDriver
            service = Service(gecko_path)
            driver = webdriver.Firefox(service=service, options=firefox_options)
            logging.info(f"Started FirefoxDriver for: {imdb_url}")

            driver.get(imdb_url)
            logging.info(f"Loaded URL: {imdb_url}")

            # Handle "Select Your Preferences" popup
            try:
                WebDriverWait(driver, 5).until(
                    EC.presence_of_element_located((By.XPATH, "//div[contains(@class, 'sc-kDvujY')]"))
                )
                accept_button = WebDriverWait(driver, 5).until(
                    EC.element_to_be_clickable((By.XPATH, "//button[@data-testid='accept-button']"))
                )
                accept_button.click()
                logging.info("Accepted preferences popup.")
            except TimeoutException:
                logging.info("No preferences popup appeared.")

            # Click all "More" buttons on the page
            click_all_more_buttons(driver)

            # Save the HTML after clicking all "More" buttons
            html_source = driver.page_source
            with open(output_path, 'w', encoding='utf-8') as file:
                file.write(html_source)
            logging.info(f"Saved HTML to: {output_path}")
            break
        except Exception as e:
            logging.error(f"Error in attempt {attempt + 1}: {e}")
        finally:
            # Always close the browser, but only if it was actually started
            if driver is not None:
                driver.quit()

# Download pages in parallel
threads = []
for url, path in URLS_AND_PATHS.items():
    thread = threading.Thread(target=download_page, args=(url, path))
    threads.append(thread)
    thread.start()

afrocuban · « **Reply #4 on:** January 07, 2025, 02:58:37 am »

Strange.

I deleted the record for Alfonso Cuaron (https://www.imdb.com/name/nm0190859/), who has 262 wins and 207 nominations, imported it from scratch, and counted manually up to 262+207=469, and all of them are there.

Can you give me the link you got stuck on, so I can try to reproduce it?

Ivek23 · « **Reply #5 on:** January 07, 2025, 05:50:03 am »

Quote from: afrocuban on January 07, 2025, 02:58:37 am

Strange.

I deleted the record for Alfonso Cuaron (https://www.imdb.com/name/nm0190859/), who has 262 wins and 207 nominations, imported it from scratch, and counted manually up to 262+207=469, and all of them are there.

Can you give me the link you got stuck on, so I can try to reproduce it?

Andrea Barber
https://www.imdb.com/name/nm0053347/

Aaron Spelling
https://www.imdb.com/name/nm0005455/

afrocuban · « **Reply #6 on:** January 07, 2025, 09:10:01 am »

I can't seem to reproduce it.

Ivek23 · « **Reply #7 on:** January 07, 2025, 03:25:54 pm »

Quote from: afrocuban on January 07, 2025, 09:10:01 am

I can't seem to reproduce it.

As I can see from the attached pictures, the problem was on my side. What about downloading the filmography: does it download on the first attempt for a given person, or do I have to run the script again for the same person? In most cases, for me, the filmography download succeeds only on the second attempt, which means that my modified Firefox selenium_script-People_4_pages_v3.2 is not fully compatible with the script. My Selenium script will need to be fixed.

afrocuban · « **Reply #8 on:** January 08, 2025, 07:20:48 am »

Quote from: Ivek23 on January 07, 2025, 03:25:54 pm

What about downloading the filmography: does it download on the first attempt for a given person, or do I have to run the script again for the same person?

For me everything works without a problem.

Ask Copilot to revise my Chrome script for Firefox and geckodriver. Compare that revision with the Firefox version you have, using Notepad++ with the ComparePlus plugin. If they differ, try Copilot's version. If you still have issues, then at the moment I can only think that your computer is a bit outdated and can't handle the page loading, scrolling and clicking that fast. Increase the wait times in the script for page loading and scrolling. If you come back with testing details after trying these and it is still unsuccessful, maybe something else will come to mind.
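
One way to make "increase the wait times" a single setting rather than many edits is to route every delay through one helper. This is a sketch only: the `PVD_WAIT_SCALE` environment variable and both function names are invented for illustration and are not part of either script.

```python
import os
import time


def wait_scale(environ=os.environ, name="PVD_WAIT_SCALE"):
    """Read a delay multiplier from the environment; use 1.0 if unset or invalid."""
    try:
        scale = float(environ.get(name, "1"))
    except ValueError:
        return 1.0
    return scale if scale > 0 else 1.0


def pause(seconds, scale=None, sleep=time.sleep):
    """Sleep for seconds * scale and return the delay that was actually used."""
    if scale is None:
        scale = wait_scale()
    delay = seconds * scale
    sleep(delay)
    return delay
```

With this in place, the `time.sleep(0.5)` and `time.sleep(1)` calls in the "More" button loop would become `pause(0.5)` and `pause(1)`, and a slower machine could set the multiplier to 2 or 3 without touching the code.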

Ivek23 · « **Reply #9 on:** January 08, 2025, 05:23:26 pm »

In selenium_script-Chrome_People_Base_page_v3.2, a function needs to be added to open the Expand buttons for alternative names, because at the moment it passes only one alternative name even when there are several.

Here is a nice example, Ann Gillespie's alternative names:

https://www.imdb.com/name/nm0318905/

afrocuban · « **Reply #10 on:** January 09, 2025, 06:08:09 am »

Nice find, thanks!

Here it is. I had to update IMDB_People_[EN][Selenium]-v3.2.psf to v3.2.0.2 too. Please check here:

Quote

http://www.videodb.info/forum_en/index.php/topic,4367.msg22753.html#msg22753

I will not update v3.x of the script(s) anymore (maintaining changes across versions gave me a headache) because I'm working on v4 of the script, which will bring major additions. Please let me know about anything else to improve and include in v4.

Quote

CHANGE LOG:
Version: 3.2.0.2 Date: 2025-01-09
- Expanding the "Alternative Names" section.
- Further optimization of the script (the "Select Your Preferences" popup is handled with a timeout, so it is no longer an obstacle for the process).
- "Hidden" feature

Did you debug your FF script for Awards and Filmography?

Ivek23 · « **Reply #11 on:** January 09, 2025, 07:12:20 am »

Quote from: afrocuban on January 09, 2025, 06:08:09 am

Did you debug your FF script for Awards and Filmography?

Yes, it's ok now.
