Topic: HOW can I get the clicked URL of the SearchResults list


meriator
HOW can I get the clicked URL of the SearchResults list
« on: December 24, 2011, 03:32:03 am »
The script I am currently writing
uses URLs
that need either the POST or the GET method,
and the site does not accept
POST instead of GET or vice versa
(http://www.filmportal.de).

I need to get the URL from the search results list
that was chosen (clicked) before the download starts,
so I can parse it and return the correct method.

I have tried everything without success:
there seems to be no way to get this URL before GetDownloadMethod is executed.


The earliest point at which I can get the URL is after the download,
but by then the request has of course already failed with either ...

HTTP/1.1 400 Bad Request
 (GET instead of POST)
or
HTTP/1.1 405 Method Not Allowed
 (POST instead of GET)

Is there a way?

thanks

cu meriator
while 1000 thanks crawling after one....they may never reach...the journey is the reward

meriator
Re: HOW can I get the clicked URL of the SearchResults list
« Reply #1 on: December 24, 2011, 03:34:12 am »
by the way Happy Christ.... ;) 
to all here

cu  meriator

mgpw4me

Re: HOW can I get the clicked URL of the SearchResults list
« Reply #2 on: December 24, 2011, 10:53:38 pm »
The following link has a script that MAY help you...

http://www.videodb.info/forum_en/index.php/topic,1665.20.html

If you follow the smPage global variable, you'll see how I implemented multiple pages within the search results. In your case, something along these lines:

function ParsePage(HTML : String; URL : AnsiString): Cardinal;
var
   RVal : Integer;
begin
   if Pos('/mediaindex', URL) > 0 then begin                             // this is a valid IMDB image location
      if (Pos('&ipage=load', URL) > 0) and (Mode <> smPage) then begin   // user selected a page, not an image
         PageUrl := URL;                                                 // set the URL to retrieve another search page
         Result  := prDownLoad;                                          // download the page
         Mode    := smPage;                                              // parse search results
         // SET POST METHOD HERE
      end
      else begin
         // SET GET METHOD HERE
      end;
   end;
   // ... rest of ParsePage omitted ...
end;

If you have trouble, post the script and I'll take a look at it.
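The suggestion above carries state between script callbacks in globals: when ParsePage recognizes a pager URL it stores it (PageUrl) and switches Mode, and the later method decision reads that stored state instead of needing the clicked URL. A minimal Python model of that pattern follows; the class, method and constant names are hypothetical stand-ins that only mimic the Pascal snippet, not PVD's scripting API.

```python
# Hypothetical model of the "remember the URL in a global" idea.
# None of these names are PVD API calls; they mirror the Pascal globals.
SM_SEARCH, SM_PAGE = "search", "page"

class ScriptState:
    def __init__(self):
        self.mode = SM_SEARCH   # plays the role of the Mode global
        self.page_url = ""      # plays the role of the PageUrl global

    def parse_page(self, url: str) -> None:
        # Same tests as the Pascal excerpt: an image-index URL that is
        # really a "load another page" link, seen while not already paging.
        if "/mediaindex" in url and "&ipage=load" in url and self.mode != SM_PAGE:
            self.page_url = url
            self.mode = SM_PAGE

    def download_method(self) -> str:
        # The method decision reads the stored state, not the clicked URL.
        return "POST" if self.mode == SM_PAGE else "GET"

state = ScriptState()
print(state.download_method())  # GET
state.parse_page("http://www.example.com/title/mediaindex?page=2&ipage=load")
print(state.download_method())  # POST
```

This only helps if the script gets to see the URL before the method is requested, which is exactly the limitation raised in the next reply.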

meriator
Re: HOW can I get the clicked URL of the SearchResults list
« Reply #3 on: December 25, 2011, 09:22:28 pm »
Thanks for trying, mgpw4me,
but this does not really solve the problem.

As I already explained above,
the script parses the search results;
at this point I already know which URLs need which METHOD,
but I can't set a METHOD here, because
I don't know yet which of the result items will be chosen (clicked),
and I can't apply an individual METHOD to each item of the search list.

After an item has been chosen (clicked),
the function GetDownloadMethod gets called,
but at that point I can't get the URL that was clicked, so I can't parse it
and return the right METHOD.

Try it yourself:
search for "Fluch" on http://www.filmportal.de.
You will get a long list of results, but only 10 per page,
so the script has to add navigation buttons to the search results,
at least
one to go to the next page (if curPage < lastPage)
and
one to go to the previous page (if curPage > 1).

The results themselves need the GET METHOD;
the NAV buttons need the POST METHOD.

But how can I know what will be clicked?

I hope this is more understandable now.

cu meriator
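If the clicked URL were available to the script, the rule above (result entries via GET, navigation buttons via POST) would reduce to a string test. A minimal Python sketch, assuming the navigation entries are the only URLs that carry the FF___ResultPager_ field mentioned later in the thread; that marker choice is an assumption, and the example URLs are hypothetical.

```python
# Assumption: only navigation-button URLs contain the pager field name.
PAGER_MARKER = "FF___ResultPager_"

def method_for(url: str) -> str:
    """Return the HTTP method a filmportal.de search-result entry needs."""
    # Navigation buttons resubmit the search form, so they must be POSTed;
    # everything else is treated as an ordinary film link fetched with GET.
    return "POST" if PAGER_MARKER in url else "GET"

nav = ("http://www.filmportal.de/dif/?FormName=PublicSearchForm&NavID=SimpleSearch&"
       "FF___ResultPager_1___Value=2")           # hypothetical pager values
film = "http://www.filmportal.de/film/some-title"  # hypothetical film link
print(method_for(nav))   # POST
print(method_for(film))  # GET
```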



mgpw4me

Re: HOW can I get the clicked URL of the SearchResults list
« Reply #4 on: December 25, 2011, 10:32:19 pm »
Understood.  My explanation was not complete enough (or won't work).  I don't see the problem, so I'll have to try it myself.

Now that I have a sample title to work with, I'll see what I can do with it over the next day or two.

I've been wanting to play with "post" anyway...aveleyman.com has content I want, but it requires "post" and I've been too lazy to investigate the process.

mgpw4me

Re: HOW can I get the clicked URL of the SearchResults list
« Reply #5 on: December 27, 2011, 08:21:06 pm »
Finally looking at this....

I'm researching the exact process required by HTTP 1.1 standards for the POST method.  I've noted the Impawards poster plugin isn't working (for me), and it requires POST for search and GET for data...and it requires minimal post variable passing.  I didn't count, but filmportal looks like it requires at least 6 variables to be passed in the POST, so I'll do an example with Impawards, and expand on it (maybe tomorrow) for dealing with multiple search pages.
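For reference, the practical difference between the two methods for an HTML form is where the fields travel: GET appends them to the URL as a query string, while POST sends them in the request body as application/x-www-form-urlencoded, with a Content-Length header. A minimal Python sketch that only formats the raw HTTP/1.1 request text (nothing is sent); the host, path and field names are placeholders.

```python
from urllib.parse import urlencode

def form_request(method: str, host: str, path: str, fields: dict) -> str:
    """Format a raw HTTP/1.1 form request (GET or POST) without sending it."""
    body = urlencode(fields)  # application/x-www-form-urlencoded encoding
    if method == "GET":
        # GET: the encoded fields become the query string.
        sep = "&" if "?" in path else "?"
        return f"GET {path}{sep}{body} HTTP/1.1\r\nHost: {host}\r\n\r\n"
    # POST: the fields go in the body, so the server needs type and length.
    return (f"POST {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body.encode('ascii'))}\r\n"
            "\r\n"
            f"{body}")

print(form_request("POST", "www.example.com", "/search", {"title": "Fluch", "page": "2"}))
```

Sending the POST form of a request to a URL that expects GET (or the reverse) is what produces the 400/405 responses quoted at the start of the thread.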

mgpw4me

Re: HOW can I get the clicked URL of the SearchResults list
« Reply #6 on: December 29, 2011, 12:29:07 am »
My apologies for being so slow.

I've been sidetracked by making the script as flexible and easy to use as possible, while providing a "good" solution to the problem.  I'd prefer that the script be a good example for others trying to do the same thing (note: Pascal isn't my favorite language so it's taking longer than I'd like), and have been trying to make it portable...within the Windows and Wine environments (Wine support will be very limited).

I might post a zip file tomorrow. If not, then the next day for sure.

meriator
Re: HOW can I get the clicked URL of the SearchResults list
« Reply #7 on: December 29, 2011, 05:18:50 am »
My script is working,
except for the navigation buttons, which need the POST METHOD instead of GET.
I have to add them
whenever more than 10 items are found,
which means the search result page contains navigation buttons too.

The navigation URLs (POST) must contain ALL hidden form inputs;
they vary from search to search, depending on the current page and the number of result pages.
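Collecting "ALL hidden form inputs" is the part that changes from search to search. A minimal Python sketch of that step using the standard-library HTML parser (not PVD's scripting API; in a PSF script the same job would be string searching in Pascal). The parser unescapes attribute values, so a value written as "&#252;" comes back as "ü", which is consistent with the advice in this post to override EncodingCheckString rather than reuse the parsed value.

```python
from html.parser import HTMLParser

class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> also arrives here via handle_startendtag.
        if tag != "input":
            return
        a = dict(attrs)
        # HTMLParser unescapes attribute values: value="&#252;" becomes "ü".
        if (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""

def hidden_inputs(html: str) -> dict:
    collector = HiddenInputCollector()
    collector.feed(html)
    collector.close()
    return collector.fields
```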

plus
Code:

this field value must be searched for and set:
FF___FF__DF4425363084591DE0340003BA5CE267_0___Value=???

this field value
EncodingCheckString=ue
must be set manually and should be excluded from the parsed hidden fields
(do NOT parse or set the original value "&#252;", to prevent URL-encoding errors)

this field value
FF___FF__DF4425363084591DE0340003BA5CE267_1___Value=2
must be set manually and should be excluded from the parsed hidden fields;
it should be "2" (title search)

do not parse the action,
because it sometimes contains ";jsessionid=value[a-z|0-9]{10}" before the "?"
(which also leads to URL-encoding errors);
use this as the default action URL instead:
http://www.filmportal.de/dif/?FormName=PublicSearchForm&NavID=SimpleSearch&

the submit field for navigation should be set like this:
FF___ResultPager_%s___Value=%s
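Putting those rules together: a minimal Python sketch that filters the parsed hidden fields, applies the manual overrides, adds the pager field and ignores the page's own form action in favour of the default action URL. The thread does not say what the two %s slots in FF___ResultPager_%s___Value=%s mean, nor what the ??? value is, so pager_id, page and field0_value are hypothetical parameter names.

```python
from urllib.parse import urlencode

# Constants copied from the post; parameter meanings below are assumptions.
DEFAULT_ACTION = ("http://www.filmportal.de/dif/"
                  "?FormName=PublicSearchForm&NavID=SimpleSearch&")
FIELD_0 = "FF___FF__DF4425363084591DE0340003BA5CE267_0___Value"
FIELD_1 = "FF___FF__DF4425363084591DE0340003BA5CE267_1___Value"
MANUAL = {"EncodingCheckString", FIELD_1}  # never taken from the parsed page

def pager_post(hidden: dict, field0_value: str, pager_id: str, page: str):
    """Return (action_url, urlencoded_body) for a navigation button."""
    # Start from the parsed hidden fields, minus the ones set by hand.
    fields = {k: v for k, v in hidden.items() if k not in MANUAL}
    fields[FIELD_0] = field0_value            # value found on the page ("???")
    fields["EncodingCheckString"] = "ue"      # set manually instead of "&#252;"
    fields[FIELD_1] = "2"                     # 2 = title search
    fields["FF___ResultPager_%s___Value" % pager_id] = page
    # The parsed form action is ignored because it may carry ;jsessionid=...
    return DEFAULT_ACTION, urlencode(fields)
```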


Everything else works perfectly so far;
my script is multilingual (en/de) for most of the data results,

except for this POST/GET thing :(

I'm very excited to see a solution
(though I still believe it's currently impossible for scripts).

cu meriator


mgpw4me

Re: HOW can I get the clicked URL of the SearchResults list
« Reply #8 on: December 29, 2011, 07:29:33 am »

Quote from: meriator
I'm very excited to see a solution
(though I still believe it's currently impossible for scripts)


procedure FileExecute(const AFileName : String; const Params : String);

Opens a specified file with the application associated with its file type or starts another application if an executable file is specified.

Parameters:
AFileName    file to open or execute
Params    command line parameters to pass with the file

I started writing code this afternoon.

My plan for POST / GET is to load a web page to the hard drive, remove <script> tags (I hate pop-ups) and load it into a browser IFRAME under script control, do the initial search, have the user click a "save" button when they get to the right page, then pass the html from the IFRAME back to the .PSF file via a text file.  No passing post variables, no parsing of search results...the web site html takes care of the navigation.  Since the html will be accessible via javascript, the possibility exists to use the Document Object Model to process the html file for some information, rather than parsing it in the PSF file.  For example, with posters, I'll be able to look at the document.images collection without any messy string processing.

mgpw4me

Re: HOW can I get the clicked URL of the SearchResults list
« Reply #9 on: December 30, 2011, 08:28:21 pm »
I've been way too busy, so the following code is not even close to 'ready', but it does provide proof the method will work.

Some information about the ActiveX component used is here:
http://www.the-art-of-web.com/javascript/ajax/

At the moment, this is what it does:

- the PSF file invokes the HTA (HTML Application)
- the HTA uses GET to load the source code of the impawards.com index page into a textarea
- the HTA can save the textarea to a file
- the ActiveX component is 'standard' on Windows XP and up...no install required

Known issues:

ActiveXObject('msxml2.xmlhttp') works on Vista; other Windows versions use other objects. The loadXMLDoc(url, params) subroutine shows how to connect to the correct object.

POST isn't supported by this code yet, but it looks simple enough; in fact, it should be possible to write a generic "read the <input> tags" routine to build most of the POST variables on the fly.
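A generic "read the <input> tags" routine can also read the <form> element's own method attribute, which is exactly the POST-or-GET information this thread is after. A minimal Python sketch, assuming the first <form> on the page is the one to submit; it ignores <select>, <textarea> and checkbox state, so it builds most, not all, of the variables. It uses Python's standard library, not the HTA/ActiveX approach described here.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import Request

class FormReader(HTMLParser):
    """Read the first <form>: its action, its method and its named <input> values."""

    def __init__(self):
        super().__init__()
        self.action, self.method, self.fields = "", "get", {}
        self._in_form = self._done = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = a.get("action") or ""
            self.method = (a.get("method") or "get").lower()
        elif tag == "input" and self._in_form and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form, self._done = False, True

def form_to_request(html: str, page_url: str) -> Request:
    reader = FormReader()
    reader.feed(html)
    reader.close()
    url = urljoin(page_url, reader.action)   # resolve a relative action
    body = urlencode(reader.fields)
    if reader.method == "post":
        # Supplying data makes Request.get_method() return "POST".
        return Request(url, data=body.encode("ascii"))
    if body:
        url += ("&" if "?" in url else "?") + body
    return Request(url)                       # no data: a GET request
```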

It's very unlikely this code will work on anything but a "genuine Windows" install.

The FileExecute routine will NOT wait for the executable to finish before continuing.  You can either create a file and wait for it to be deleted by the HTA (you'll probably want a TIMEOUT value on the loop), or use the TASKLIST command (maybe in a batch file...TASKLIST > activetasks.txt) and check to see if the hta is active.
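The flag-file handshake described above, sketched in Python for clarity: create the file before calling FileExecute, let the HTA delete it once it has saved its result, and poll with a timeout. The function name, file name and timing values are hypothetical; in a PSF script the same loop would need the scripting engine's own file-exists and sleep routines, which are not shown here.

```python
import os
import time

def wait_for_deletion(path: str, timeout: float = 60.0, poll: float = 0.5) -> bool:
    """Poll until `path` no longer exists; return False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while os.path.exists(path):
        if time.monotonic() >= deadline:
            return False        # the HTA never removed the flag file
        time.sleep(poll)
    return True                 # flag file gone: the HTA has finished
```

A timeout is worth keeping even if the HTA is reliable, since a user who simply closes the HTA window would otherwise leave the script waiting forever.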

----------------------

I'll update this code as I have time, but meanwhile I hope it gives you a starting point.

BTW, the second file, posterconfig.zip, contains a user interface configuration dialog. Not very refined, but it works.

[attachment deleted by admin]
