Author Topic: Hello from NetworkShark  (Read 14892 times)

Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Hello from NetworkShark
« on: October 14, 2023, 10:10:17 am »
Hi guys and ladies,

Let me introduce myself: I am a 32-year-old Italian sysadmin.
I like to eat pizza and pasta (strange, huh?) and to be at the PC (yes, I'm damned).

I was lucky enough to discover this project years ago, but I was only able to join it a few days ago (thanks to Ivek).

As much as I can, I will try to lend a hand to keep PVD going.

Thank you for your attention,
your friendly NetworkShark
SysAdmin for Work? For Fun!

Offline Pacifist

  • User
  • ***
  • Posts: 68
    • View Profile
Re: Hello from NetworkShark
« Reply #1 on: October 21, 2023, 09:21:33 am »
welcome!!!!!!!!!!!!!!

Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Re: Hello from NetworkShark
« Reply #2 on: October 22, 2023, 12:24:38 pm »
SysAdmin for Work? For Fun!

Offline afrocuban

  • Moderator
  • *****
  • Posts: 558
    • View Profile
Re: Hello from NetworkShark
« Reply #3 on: October 27, 2023, 05:19:16 pm »
Welcome, NetworkShark, and thank you immensely for your offer. It would be great if we could preserve PVD and somehow maintain at least the IMDb script!

Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Re: Hello from NetworkShark
« Reply #4 on: October 27, 2023, 07:40:54 pm »
Thanks for the welcome afrocuban!

I think Ivek is already working on the IMDb script.
SysAdmin for Work? For Fun!

Offline afrocuban

  • Moderator
  • *****
  • Posts: 558
    • View Profile
Re: Hello from NetworkShark
« Reply #5 on: November 05, 2023, 11:37:51 pm »
You're welcome. I wasn't aware that Ivek was working on it. The last thing I read from him here was:

http://www.videodb.info/forum_en/index.php/topic,4163.msg21754.html#msg21754


Quote
So there won't be any new IMDB_[EN][HTTPS] script updates anymore because it is no longer possible to download many full downloads of information. The new Imdb website design does not allow this anymore.

Unless this means he'll write a completely new one...

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2757
    • View Profile
Re: Hello from NetworkShark
« Reply #6 on: November 08, 2023, 08:57:01 am »
I think Ivek is already working on it on IMDb.

I wonder how many users are still using PVD.


Do you think more than a few dozen?

Maybe.

I wonder how many users are still using PVD.

There is no doubt: PVD is the first choice and will remain so in the future, as long as it is possible to update the scripts for the information I need for my PVD database.

From time to time I test EMDB or another program to see whether a workable solution can be found, so that I could then update the IMDb movie scripts for all other PVD users as well.

Not yet (the second script I'm currently testing takes precedence), but maybe I'll fix the IMDb script in the future, as I mentioned above. If there's anything new, I'll let you know here.

Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline afrocuban

  • Moderator
  • *****
  • Posts: 558
    • View Profile
Re: Hello from NetworkShark
« Reply #7 on: November 09, 2023, 10:12:54 am »
Thank you, Ivek, a thousand times!


I recently decided to keep using PVD in the future anyway, regardless of the state of the scripts. Its basic functionality, at least recording file entries, will stay unbroken.

Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Re: Hello from NetworkShark
« Reply #8 on: November 10, 2023, 08:48:30 pm »
Thank you for your hope in this project!

I hope Nostra will be heard, and I hope this will happen soon.
This project has a very good chance to grow.
SysAdmin for Work? For Fun!

Offline Pacifist

  • User
  • ***
  • Posts: 68
    • View Profile
Re: Hello from NetworkShark
« Reply #9 on: November 19, 2023, 08:58:24 am »
Well, at the moment all that is left of the PVD users is one Global Moderator, one Moderator, one Administrator and one User (15 years of experience using PVD).  :'( :'( :'(

Offline afrocuban

  • Moderator
  • *****
  • Posts: 558
    • View Profile
Re: Hello from NetworkShark
« Reply #10 on: November 27, 2023, 10:34:03 pm »
Well, 14 years here. But I doubt it's all that is left:

Quote
Most Online Today: 67. Most Online Ever: 908 (January 22, 2020, 05:58:28 am)

« Last Edit: November 27, 2023, 10:35:58 pm by afrocuban »

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2757
    • View Profile
Re: Hello from NetworkShark
« Reply #11 on: December 14, 2024, 12:29:46 pm »
NetworkShark I have a question for you.

Would it be possible to write a script that downloads all IMDb Also Known As (AKA) titles from, for example, this URL:

https://www.imdb.com/title/tt0147800/releaseinfo/?ref_=tt_ov_at_dt_rdat

IMDb displays the Also Known As (AKA) titles dynamically, and it uses the same method on some of its other pages.

I am attaching the IMDB_[EN][HTTPS]_TEST_2c 2c script to make this easier.

I hope some solution can be found within PVD so that this information can be transferred to the PVD database.
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Re: Hello from NetworkShark
« Reply #12 on: December 15, 2024, 11:00:43 am »
Are you sure the AKA data is dynamic? I downloaded the page on my Linux machine and the data is there in the page.

I will run some tests with the 2c2c script you linked here and come back with some answers.



Sorry for joining the conversation uninvited and without knowing how to program.

No problem, afrocuban. In fact, I am interested in other ways to improve how PVD retrieves information.

Maybe this part can be a separate thread in this forum


But I learned a bit along the way while trying to download the FA page with trailers locally. I managed to download it as downpage_trailer-UTF8_NO_BOM_FA.htm alongside downpage-UTF8_NO_BOM.htm in order to parse them both, but there is no good news: I can parse them both, but the dynamic content isn't downloaded.

I don't think we can do it with PVdBDownPage.exe:

Quote
// Download the URL page.
  LogMessage(Chr(9) + Chr(9) + '      Download with PVdBDownPage in file:|' + ScriptPath + OutFile + ' the information of:|' + URL + '||');
  FileExecute(ScriptPath + 'PVdBDownPage.exe', '"' + URL + '" "' + ScriptPath + OutFile + '"');

because it cannot download dynamic content since it
Quote
is likely due to the JavaScript that loads the video elements dynamically on the original page. When you download the page using DownloadPage, the JavaScript doesn’t execute, so the dynamic content (like embedded trailers) isn’t included in the downloaded HTML file.
To work around this, we have a couple of options:

Use a Web Browser Automation Tool: Tools like Selenium can control a web browser to visit the page and let all JavaScript execute before saving the fully rendered page’s HTML. This way, the dynamic content will be included.

Find the Video URL Manually: Check the original HTML of the page in your browser (right-click and “View Page Source” or inspect elements with Developer Tools). Look for the actual URLs of the video files. You can then use those URLs directly in your script.

Using a web browser automation tool like Selenium is a bit more involved and requires additional setup, but it is a more robust solution for handling dynamic content. If you prefer to go this route, I can guide you through the setup process.

Alternatively, manually inspecting the page’s source for video URLs is quicker but less convenient for repeated tasks.
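
Before setting up a browser automation tool, it may be worth checking whether the statically downloaded file really lacks the element you need. The following is only a sketch using Python's standard library: the class name "divvid" is taken from the wait condition in the Selenium script quoted in this conversation, while the two HTML strings and the helper names are made up for illustration.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the wanted CSS class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        for name, value in attrs:
            if name == "class" and value and self.wanted_class in value.split():
                self.found = True


def page_has_class(html_text, wanted_class):
    """Return True if html_text contains an element with wanted_class."""
    finder = ClassFinder(wanted_class)
    finder.feed(html_text)
    return finder.found


# Hypothetical samples: a page saved without JavaScript and a rendered one.
static_html = '<html><body><div id="player"></div></body></html>'
rendered_html = '<html><body><div class="divvid big"></div></body></html>'

print(page_has_class(static_html, "divvid"))    # False
print(page_has_class(rendered_html, "divvid"))  # True
```

If the check returns False for the file saved by the downloader but the element is visible in the browser, the content is most likely added by JavaScript.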



Quote
Here's the rest of the conversation:
Invoking Selenium directly from within a PVD script (written in Pascal) is not straightforward because Selenium typically operates via Python, Java, or another high-level language. However, you can use a workaround by invoking an external Python script from your PVD Pascal script.

Here’s how you can integrate Selenium into your workflow:

Create a Python Script Using Selenium:

This script will navigate to the URL, wait for the content to load, and save the fully rendered HTML.

python
Quote
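from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Set up the WebDriver (Selenium 4 removed the executable_path argument;
# the driver path is passed through a Service object instead)
driver = webdriver.Chrome(service=Service('path_to_chromedriver'))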

# Navigate to the URL
driver.get("https://www.filmaffinity.com/en/evideos.php?movie_id=xxxx")

# Wait for the trailers to load (adjust the waiting condition as needed)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "divvid"))
)

# Get the fully rendered HTML
html = driver.page_source

# Save the HTML to a file
with open('downpage_trailer-UTF8_NO_BOM_FA.htm', 'w', encoding='utf-8') as file:
    file.write(html)

# Close the browser
driver.quit()

Invoke the Python Script from Your PVD Script:

Use the FileExecute method to call the Python script from within your Pascal script.

pascal
Quote
// Log the action
LogMessage(Chr(9) + Chr(9) + '      Download with Selenium in file:|' + ScriptPath + OutFile + ' the information of:|' + URL + '||');

// Construct the command to run the Python script
// (the script path is quoted so that a ScriptPath containing spaces still works)
FileExecute('python.exe', '"' + ScriptPath + 'selenium_script.py"');
In this setup:

The Python script handles the browsing and downloading of the fully rendered HTML using Selenium.

Your PVD script uses the FileExecute method to call the Python script, passing any necessary parameters if needed.

After the Python script finishes and the HTML is saved, your PVD script can continue processing the saved HTML as usual.

Example Complete Workflow
1. Python Script (selenium_script.py)

python
Quote
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import sys

# Get the URL and output file from the command line arguments
url = sys.argv[1]
output_file = sys.argv[2]

# Set up the WebDriver (Selenium 4 removed the executable_path argument;
# the driver path is passed through a Service object instead)
driver = webdriver.Chrome(service=Service('path_to_chromedriver'))

# Navigate to the URL
driver.get(url)

# Wait for the trailers to load (adjust the waiting condition as needed)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "divvid"))
)

# Get the fully rendered HTML
html = driver.page_source

# Save the HTML to a file
with open(output_file, 'w', encoding='utf-8') as file:
    file.write(html)

# Close the browser
driver.quit()
2. Pascal Script (Invoke Selenium Script)

pascal
Quote
// Log the action
LogMessage(Chr(9) + Chr(9) + '      Download with Selenium in file:|' + ScriptPath + OutFile + ' the information of:|' + URL + '||');

// Construct the command to run the Python script
FileExecute('python.exe', '"' + ScriptPath + 'selenium_script.py" "' + URL + '" "' + ScriptPath + OutFile + '"');
By following this workflow, you can effectively integrate Selenium into your PVD script to handle pages with dynamic content. This way, you can ensure that all necessary content, like embedded trailers, is included in the HTML that your PVD script processes.

How would selenium know which url to open?
Great question. You can pass the URL to Selenium through command line arguments when you invoke the Python script. This way, your PVD script can dynamically supply the URL to Selenium. Here's how you can do it:

Step-by-Step Implementation
Modify Your Python Script to Accept Command Line Arguments:

You’ll adjust the Python script to accept the URL as an argument. This ensures that Selenium knows which URL to open.

Example Python Script (selenium_script.py):

python
Quote
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import sys

# Get the URL and output file from the command line arguments
url = sys.argv[1]
output_file = sys.argv[2]

# Set up the WebDriver (Selenium 4 removed the executable_path argument;
# the driver path is passed through a Service object instead)
driver = webdriver.Chrome(service=Service('path_to_chromedriver'))

# Navigate to the URL
driver.get(url)

# Wait for the trailers to load (adjust the waiting condition as needed)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "divvid"))
)

# Get the fully rendered HTML
html = driver.page_source

# Save the HTML to a file
with open(output_file, 'w', encoding='utf-8') as file:
    file.write(html)

# Close the browser
driver.quit()
Invoke the Python Script from Your PVD Script:

Use the FileExecute method to call the Python script, passing the URL and the output file path as arguments.

Example Pascal Script:

pascal
Quote
// Log the action
LogMessage(Chr(9) + Chr(9) + '      Download with Selenium in file:|' + ScriptPath + OutFile + ' the information of:|' + URL + '||');

// Construct the command to run the Python script
FileExecute('python.exe', '"' + ScriptPath + 'selenium_script.py" "' + URL + '" "' + ScriptPath + OutFile + '"');
How It Works
Python Script: The Python script (selenium_script.py) receives the URL and the output file path as command line arguments. It then uses Selenium to open the URL, waits for the trailers to load, and saves the fully rendered HTML to the specified output file.

Pascal Script: Your PVD script constructs the command to run the Python script with the necessary arguments (the URL and the output file path) and then executes it. This ensures that the correct URL is passed to Selenium dynamically.

This approach allows you to integrate Selenium seamlessly into your PVD script, enabling it to handle pages with dynamic content effectively. If you need further assistance with any part of the setup, feel free to ask!
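
The script above reads sys.argv[1] and sys.argv[2] directly, so a missing argument ends in an IndexError. As a possible refinement, here is a sketch of the argument handling done with argparse from the standard library. The --timeout option is an assumption added for illustration (its default of 10 matches the WebDriverWait value used above); it is not part of the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the URL and output file that the PVD script passes in."""
    parser = argparse.ArgumentParser(
        description="Render a page in a browser and save its HTML."
    )
    parser.add_argument("url", help="page to open")
    parser.add_argument("output_file", help="where to write the rendered HTML")
    # Hypothetical extra option: how long to wait for dynamic content.
    parser.add_argument("--timeout", type=float, default=10.0,
                        help="seconds to wait for the content to appear")
    return parser.parse_args(argv)


# Example with the same two positional arguments FileExecute would pass.
args = parse_args([
    "https://www.filmaffinity.com/en/evideos.php?movie_id=xxxx",
    "downpage_trailer-UTF8_NO_BOM_FA.htm",
])
print(args.url)
print(args.output_file)
print(args.timeout)
```

With this in place, calling the script without arguments prints a usage message and exits with a non-zero code instead of a traceback.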

How slow is this process?
The process of using Selenium to render and download the HTML can be slower than simply downloading a static page because Selenium has to:

Open a Web Browser: Launching a browser instance (e.g., Chrome or Firefox) takes a few seconds.

Navigate to the URL: Just like a human user, Selenium will navigate to the specified URL, which involves network latency.

Wait for JavaScript Execution: Selenium waits for the page to fully load, including the execution of any JavaScript that dynamically loads content. This can vary in time depending on the complexity of the page and the speed of your internet connection.

Save the HTML: Once the page is fully loaded, Selenium will extract and save the HTML content, which is relatively quick.

Estimated Time
Browser Launch: 1-3 seconds

Page Navigation: 1-5 seconds (depends on network speed)

JavaScript Execution & Content Load: 3-10 seconds (depends on page complexity)

Saving HTML: <1 second

In total, it can take anywhere from 5 to 20 seconds or more per page, depending on various factors.
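
The per-stage figures above can be added up to check the total and to get a feel for a whole batch. This is only arithmetic on the estimates quoted above, not a measurement; "<1 second" is modelled as 0 to 1 second, and the batch of 1000 pages is a made-up number.

```python
# Estimated (min, max) seconds per stage, copied from the list above.
STAGES = {
    "browser launch": (1, 3),
    "page navigation": (1, 5),
    "JavaScript execution and content load": (3, 10),
    "saving HTML": (0, 1),  # assumption: "<1 second" treated as 0 to 1
}


def total_range(stages):
    """Sum the lower and upper bounds of all stages."""
    low = sum(lo for lo, _ in stages.values())
    high = sum(hi for _, hi in stages.values())
    return low, high


def batch_minutes(pages, seconds_per_page):
    """Minutes needed to process `pages` one after another."""
    return pages * seconds_per_page / 60


low, high = total_range(STAGES)
print(low, high)  # 5 19

# Hypothetical batch of 1000 titles, processed sequentially.
print(round(batch_minutes(1000, low)), round(batch_minutes(1000, high)))
```

So the quoted stages sum to 5 to 19 seconds per page, which for a large database means hours rather than minutes.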

Performance Considerations
Network Speed: Faster internet connections will reduce the navigation and loading time.

Page Complexity: Pages with heavy JavaScript and multiple dynamic elements will take longer to fully load.

Browser & System Performance: Using a faster browser and a system with more resources (CPU, RAM) will speed up the process.

While this is slower compared to a direct HTML download using PVdBDownPage.exe, it ensures that all dynamic content is fully loaded and accessible. This trade-off can be worthwhile if you need to scrape content that is loaded dynamically by JavaScript.

If speed becomes a critical issue, you might look into optimizing the Selenium script further, such as by reducing the wait times where possible or running multiple instances in parallel if applicable.
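
As a rough idea of what "running multiple instances in parallel" could look like, here is a sketch with a thread pool from the standard library. The function fake_render is a stand-in that only sleeps; in a real setup it would be replaced by a function that starts its own browser instance per call, since a single WebDriver object should not be shared between threads. The example.com URLs and all names are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def render_many(urls, render_one, max_workers=3):
    """Call render_one(url) for every URL, a few at a time.

    Results come back in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(render_one, urls))


def fake_render(url):
    """Stand-in for a real browser-based download (hypothetical)."""
    time.sleep(0.1)  # pretend the browser needs some time
    return f"<html>{url}</html>"


urls = [f"https://example.com/page{i}" for i in range(6)]

start = time.perf_counter()
pages = render_many(urls, fake_render)
elapsed = time.perf_counter() - start

print(len(pages), f"{elapsed:.2f}s")
```

Keep max_workers small: every real browser instance costs memory, and many parallel requests to the same site may get blocked.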

SysAdmin for Work? For Fun!

Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Re: Hello from NetworkShark
« Reply #13 on: December 15, 2024, 11:28:26 am »

I hadn't noticed the button "50 more", good morning to me!  ;D ;D
« Last Edit: December 15, 2024, 11:35:44 am by NetworkShark »
SysAdmin for Work? For Fun!

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2757
    • View Profile
Re: Hello from NetworkShark
« Reply #14 on: December 15, 2024, 12:42:57 pm »

There is a "50 more" button (or one with a smaller number), but if there are more than 50 additional titles, an "All" button appears next to it, as is the case with the link below for the movie 10 Things I Hate About You.

https://www.imdb.com/title/tt0147800/releaseinfo/?ref_=tt_ov_at_dt_rdat#akas

This is what I meant by dynamic: you have to click the "All" button, for example, to see all 60 AKA titles at the link above.
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline afrocuban

  • Moderator
  • *****
  • Posts: 558
    • View Profile
Re: Hello from NetworkShark
« Reply #15 on: December 15, 2024, 09:38:31 pm »

I split the topic and moved it to the Development forum, here. Feel free to rearrange it further.


It wasn't clear where Ivek's message (http://www.videodb.info/forum_en/index.php/topic,4357.msg22644.html#msg22644) belonged, so I moved it there too, but if you think it should stay here, please accept my apologies and move it back.


Best regards,
« Last Edit: December 15, 2024, 09:40:15 pm by afrocuban »

Offline Ivek23

  • Global Moderator
  • *****
  • Posts: 2757
    • View Profile
Re: Hello from NetworkShark
« Reply #16 on: December 16, 2024, 08:25:44 am »
No problem.
Ivek23
Win 10 64bit (32bit)   PVD v0.9.9.21, PVD v1.0.2.7, PVD v1.0.2.7 + MOD


Offline NetworkShark

  • Administrator
  • *****
  • Posts: 34
  • SysAdmin for Work? For Fun!
    • View Profile
Re: Hello from NetworkShark
« Reply #17 on: December 18, 2024, 10:02:38 pm »
Hi guys,

I believe it's impossible to retrieve the data in the traditional way.

The url to call is like this:
Code: [Select]
https://caching.graphql.imdb.com/?operationName=TitleReleaseDatesPaginated&variables={"after":"NQ==","const":"tt0147800","first":50,"locale":"en-US","originalTitleText":false}&extensions={"persistedQuery":{"sha256Hash":"0e4e6468b8bc55114f80551e7a062301c78999ee538789a936902e4ab5239ccd","version":1}}
The page has all the data except the sha256Hash, which seems to be a key for both authentication and data retrieval.
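
For reference, this is roughly how such a URL can be assembled in Python from its parts. It is only a sketch: the parameter names and the hash are copied from the URL above, the function names are made up, and I have not verified that the server accepts a request built this way. Note that the "after" value looks like a Base64 cursor ("NQ==" decodes to "5"); that it means "skip the first 5 entries" is my assumption.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "https://caching.graphql.imdb.com/"


def build_release_dates_url(title_id, after_cursor, sha256_hash, first=50):
    """Build the query URL from the parameters seen in the browser."""
    variables = {
        "after": after_cursor,
        "const": title_id,
        "first": first,
        "locale": "en-US",
        "originalTitleText": False,
    }
    extensions = {
        "persistedQuery": {"sha256Hash": sha256_hash, "version": 1}
    }
    query = urlencode({
        "operationName": "TitleReleaseDatesPaginated",
        # Compact JSON, percent-encoded by urlencode.
        "variables": json.dumps(variables, separators=(",", ":")),
        "extensions": json.dumps(extensions, separators=(",", ":")),
    })
    return ENDPOINT + "?" + query


def decode_cursor(cursor):
    """Decode the Base64 'after' cursor into plain text."""
    return base64.b64decode(cursor).decode("ascii")


# Values copied from the URL in this post; the hash may stop working.
url = build_release_dates_url(
    "tt0147800",
    "NQ==",
    "0e4e6468b8bc55114f80551e7a062301c78999ee538789a936902e4ab5239ccd",
)
print(url)
print(decode_cursor("NQ=="))  # 5

# Round trip: read the variables back out of the URL.
params = parse_qs(urlsplit(url).query)
print(json.loads(params["variables"][0])["const"])  # tt0147800
```

The title ID and the cursor are easy to fill in; the sha256Hash is the part a PVD script cannot produce on its own.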

Sorry :'(
SysAdmin for Work? For Fun!