proxy list downloader

doscode · #1 Post by **doscode** » 06 Aug 2013 08:01

With help from people on this forum, I have made this downloader, which should download proxy lists from different servers. I have now realized that the server hidemyass.com lets you filter out some kinds of IPs, so you can save time when reading the lists. At the moment I am debugging the hidemyass part, so there is a :jump label that skips a few commands. The script works, or seems to work, but the server sends back something different from what it should.

The first HTTP request is for the first page:
... /proxy-list/search-267968
and the second is for the second page:
... /proxy-list/search-267968/2
They should be saved as hidemyass_1.htm and hidemyass_2.htm, but if you open them you will find that there is no IP list inside. However, if you open these links in a browser, you do see the IP list. Any idea how to solve the problem?

You need curl to download the pages.

http://pastebin.com/yuTLj0vt

#2 Post by **penpen** » 06 Aug 2013 09:21

I am not familiar with curl, but according to its manpage you have forgotten to use the option -G to perform an HTTP/1.1 GET request instead of the standard HTTP/1.1 POST request: http://curl.haxx.se/docs/manpage.html

penpen

doscode · #3 Post by **doscode** » 06 Aug 2013 11:13

The -G argument does not seem to affect anything (though I don't understand the manual's description), so I will ask somewhere else, closer to Linux.

#4 Post by **penpen** » 06 Aug 2013 13:17

The effect of the different request methods may not always be visible, but the servers may react in a different way to each of them.

For more information on this, just read the "Request methods" section of:
https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

penpen

#5 Post by **penpen** » 06 Aug 2013 13:51

I had a look at the website because you said that nothing changed when using curl's -G option.
I found out that they have split the ip:port addresses up in a complicated way, I assume to prevent the ip:port addresses from being read.

The first address in the list (31.3.237.250:8090) in the file I downloaded from http://www.hidemyass.com/proxy-list/search-267968 looks like this:

Code: Select all

+-<html>
  +-<body>
    +-<div id="subpagetabs">
    | +-<div id="container">
    | | +-<table id="listtable" cellSpacing="0" cellPadding="0" rel="50">
    | | | + (...)
    | | | +-<tbody>
    | | |   +-<tr class="" rel="13357017">
    | | |   | +-<td />
    | | |   | +-<td>
    | | |   | | +-<span>
    | | |   | |   +-<style />
    | | |   | |   +- 31
    | | |   | |   +-<span>
    | | |   | |   | +- .
    | | |   | |   +-<span>
    | | |   | |   | +- 3
    | | |   | |   +-<span>
    | | |   | |   | +- .
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<div />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<div />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<span />
    | | |   | |   +-<div />
    | | |   | |   +-<span class="3">
    | | |   | |   | +- 237
    | | |   | |   +-<span class="N82k">
    | | |   | |   | +- .
    | | |   | |   +-<span class="N82k">
    | | |   | |     +- 250
    | | |   | +-<td>
    | | |   | | +- 8090

I recommend using some web developer tools, such as those in Internet Explorer if you have no others:
Just open the site in a tab and press F12, then use the Point Tool to find the parts of the HTML document easily.
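
If the address pieces need to be put back together in a script, a rough sketch in Python (not batch) might look like the following. It assumes the decoy pieces are hidden with `display:none`, either inline or through a class declared in the cell's `<style>` element; the tree above does not confirm this, so check it in the developer tools first. `VisibleTextParser`, `visible_text` and the class name `x9` in the sample are made up for illustration.

```python
import re
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text that is not inside a hidden element (assumed scheme)."""

    def __init__(self):
        super().__init__()
        self.hidden_classes = set()  # classes declared display:none in <style>
        self.stack = []              # one flag per open element: hidden or not
        self.in_style = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "style":
            self.in_style = True
        style = (attrs.get("style") or "").replace(" ", "").lower()
        classes = (attrs.get("class") or "").split()
        hidden = "display:none" in style or any(
            name in self.hidden_classes for name in classes
        )
        self.stack.append(hidden)

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        if self.stack:
            self.stack.pop()

    def handle_data(self, data):
        if self.in_style:
            # Remember classes whose rule hides the element, e.g. ".x9{display:none}"
            for name, body in re.findall(r"\.([\w-]+)\s*\{([^}]*)\}", data):
                if "display:none" in body.replace(" ", "").lower():
                    self.hidden_classes.add(name)
        elif not any(self.stack):
            # Keep text only when no enclosing element is hidden
            self.parts.append(data.strip())


def visible_text(cell_html):
    """Return the concatenated visible text of one table cell."""
    parser = VisibleTextParser()
    parser.feed(cell_html)
    return "".join(parser.parts)


# Hypothetical cell, shaped like the tree above plus two assumed decoys
sample = (
    '<span><style>.x9{display:none}.N82k{display:inline}</style>'
    '31<span>.</span><span>3</span><span class="x9">77</span>'
    '<div style="display:none">12</div><span>.</span>'
    '<span class="3">237</span><span class="N82k">.</span>'
    '<span class="N82k">250</span></span>'
)
print(visible_text(sample))  # prints 31.3.237.250
```

The sketch does not handle unclosed void tags such as `<br>`; a real page may need a more forgiving parser.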

When I set up a web server, it sends different content for GET, POST, ... so that was my first thought; sorry for that, as it is not the problem here.

penpen

doscode · #6 Post by **doscode** » 08 Aug 2013 16:26

I found that the downloader works, but I need to add a slash after the HTTP address. I lost two days until somebody noticed it :-)
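
For illustration, the trailing-slash rule can be written as a small helper. This is only a Python sketch, not part of the batch script; `page_url` is a made-up name, and it assumes that page N lives at the search address followed by a slash and N, as the two URLs in the first post suggest.

```python
def page_url(base, page):
    """Return the URL of one result page, making sure base ends with '/'."""
    if not base.endswith("/"):
        base += "/"
    return f"{base}{page}"


# The base may be given with or without the trailing slash
print(page_url("http://www.hidemyass.com/proxy-list/search-267968", 2))
```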

....

But I need to correct my script a bit, because I cannot tell when a downloaded page contains no proxies because there are no more results. So I should stop downloading when not enough results are found.

Old script:

Code: Select all

@echo off 
Setlocal EnableDelayedExpansion
SET proxy_3=hide_1.htm                    
SET source_3=http://www.hidemyass.com/proxy-list/search-227955/
FOR /L %%N IN (1,+1,40) DO CALL :download "http://www.hidemyass.com/proxy-list/2%%N" hidemyass_ %%N

Wait... I am not sure why I have the number 2 before %%N... I think it should rather be:

Code: Select all

FOR /L %%N IN (1,+1,40) DO CALL :download "!source_3!%%N" hidemyass_ %%N

The code for download:

Code: Select all

:download
REM %1 = URL, %2 = output name prefix, %3 = page number
REM Browser-like request headers; the quotes are stored in the variables on purpose
SET user_agent="User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.8) Gecko/20100214 Ubuntu/9.10 (karmic) Firefox/3.5.8"
SET accept="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
SET accept_language="Accept-Language: en-us,en;q=0.5"
SET accept_charset="Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"
SET keep_alive="Keep-Alive: 300"
SET connection="Connection: keep-alive"
REM url is not used by the curl call below
SET url=http://request.urih.com/
REM Remove any previous copy of the output file
del "%2%3.htm" 2>NUL
Echo Gonna download from %~1
curl -o "%2%3.htm" -H %user_agent% -H %accept% -H %accept_language% -H %accept_charset% -H %keep_alive% -H %connection% "%~1"
GOTO :eof

Now I need to check the size of the downloaded file: if it is less than or equal to 26823 bytes and it does not contain the string <span class="updatets ">, the page is empty and no further pages should be downloaded.

A better way, though, would be to count how many times the string <span class="updatets "> occurs in the downloaded file, to decide whether I should continue downloading.
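
The counting rule can be sketched like this in Python (a sketch only, not the batch solution asked for). `count_results`, `handle_page` and the `expected_per_page` threshold are made-up names; how many results a full page holds has to be checked on the real site.

```python
import os

MARKER = '<span class="updatets ">'  # the string quoted in the post


def count_results(path, marker=MARKER):
    """Count how many times marker occurs in the downloaded file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read().count(marker)


def handle_page(path, expected_per_page):
    """Return True if the next page should be downloaded.

    A page with zero results is deleted. A page with fewer results than
    expected_per_page is kept but treated as the last one.
    """
    found = count_results(path)
    if found == 0:
        os.remove(path)
        return False
    return found >= expected_per_page
```

A download loop would call `handle_page` after each curl run and stop as soon as it returns False.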

The download section should serve several servers, though; hidemyass is just one of them, so I need a universal solution. I would like to take the second argument of the call and remove everything from the underscore to the end of the string, so that hidemyass_ becomes hidemyass. Then, when working with hidemyass, I could choose which subroutine to call to specify how many occurrences of a certain string the downloaded file should contain. If the count is lower, don't download the next file. If the count is 0, delete the file that was just downloaded.
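
The per-server lookup described here might be sketched as follows, again in Python rather than batch. `server_name`, `rules_for` and the `RULES` table are made up for illustration, and the `per_page` value of 50 is only a placeholder, not a confirmed page size.

```python
def server_name(prefix):
    """'hidemyass_' -> 'hidemyass': drop everything from the first underscore on."""
    return prefix.split("_", 1)[0]


# Hypothetical per-server rules: the string to count and the number of
# results a full page is expected to contain (placeholder value).
RULES = {
    "hidemyass": {"marker": '<span class="updatets ">', "per_page": 50},
}


def rules_for(prefix):
    """Return the rules for the server behind an output prefix, or None."""
    return RULES.get(server_name(prefix))


print(server_name("hidemyass_"))  # prints hidemyass
```

Adding another server then means adding one entry to the table instead of another branch in the download routine.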

I know how to get the file size:
for %%I in ("!file!") do SET filesize=%%~zI
echo Filesize !filesize! B

but I would like help with filtering the file...

I have a few commands that do similar things with grep:

Code: Select all

FOR /F "tokens=1-20 delims=<>" %%A IN ('grep -B 1411 -E "</table>" %file% ^| grep -E ^"^(display^|^>[0-9]{1^,2}^<^|[0-9][0-9][0-9]^|[0-9][0-9]{1^,2}^</td^>^|flag^|^<td^>HTTP^|rightborder^).*$^" ') DO (

This searches within the block that ends at the line with the </table> tag and reaches back 1411 lines before it (that is what -B 1411 does). I would just need to change the second part so that it returns the number of occurrences of the string <span class="updatets ">.
So I believe I could have something like

Code: Select all

grep -B 1411 -E "</table>" %2%3.htm ^| grep -c ^"<span class="updatets "> ... "

to get the count of results.

Any tips on how to complete this code?
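
One caveat for the `grep -c` idea: `grep -c` reports the number of matching lines, not the number of matches, so several `<span class="updatets ">` strings on the same line count only once. The Python sketch below (helper names made up, sample text invented) shows the difference between the two ways of counting.

```python
MARKER = '<span class="updatets ">'


def count_lines_with(text, marker=MARKER):
    """What grep -c reports: the number of lines containing the marker."""
    return sum(1 for line in text.splitlines() if marker in line)


def count_occurrences(text, marker=MARKER):
    """The number of times the marker occurs, wherever the line breaks are."""
    return text.count(marker)


# Invented sample: two markers on the first line, one on the second
text = (
    'a<span class="updatets ">1<span class="updatets ">2\n'
    'b<span class="updatets ">3\n'
    'no match here\n'
)
print(count_lines_with(text), count_occurrences(text))  # prints 2 3
```

If the site puts each table row on its own line, both counts agree and `grep -c` is enough.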
