Help with grep in for loop

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

#1 Post by doscode » 15 Jun 2012 02:08

I'm glad your forum is working again. I am trying to solve a problem with reading a file:

I want to get proxy data from this site:
http://www.samair.ru/proxy/
I copied the source page content here:
http://codepaste.net/bzdrhr
I need the proxy:port, the country, and the type for the high anonymous entries. On the command line I can simply type

Code: Select all

grep -B 25 "</td></tr></tbody></table>" | grep high
but this does not work at all inside a FOR loop. How can I solve it?

Code: Select all

    @echo off 
    Setlocal EnableDelayedExpansion
    SET proxy=Proxy.htm     
    SET TAB=" "
   
    FOR /F "eol= tokens=* delims=*" %%A IN ('grep -B 25 "</td></tr></tbody></table>" !proxy! | grep high') DO (
    echo %%A
    pause
    )

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#2 Post by foxidrive » 15 Jun 2012 03:26

doscode wrote:I need the proxy:port, the country, and the type for the high anonymous entries. On the command line I can simply type
type "file.txt" ^|grep -B 25 "</td></tr></tbody></table>" ^| grep high
but this does not work at all inside a FOR loop. How can I solve it?



You need the ^ shown above in front of each | when the command sits inside the IN ('...') part of a FOR /F loop.


Or try this:

Code: Select all

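    @echo off
    Setlocal EnableDelayedExpansion
    rem Quote the assignment so trailing spaces do not end up in the value
    SET "proxy=Proxy.htm"

    rem The pipe is escaped with a caret so that it reaches the command run by FOR /F
    FOR /F "tokens=7,12,15 delims=<>" %%A IN ('grep -B 25 "</td></tr></tbody></table>" "!proxy!" ^| grep "high"') DO (
        echo %%A %%B %%C
        pause
    )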
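
If you want to see the effect of the escape without grep installed, here is a minimal sketch that uses only built-in commands. The sample text and the variable name found are made up for the illustration, and findstr merely stands in for grep.

```batch
@echo off
setlocal
set "found="
rem The caret stops the parser from handling the pipe on this FOR line;
rem the pipe is passed on to the command that FOR /F runs.
for /f "delims=" %%L in ('echo proxy high anonymous ^| findstr "high"') do set "found=%%L"
if defined found (echo Matched: %found%) else (echo No match)
```

Without the caret, cmd normally stops with a syntax error on the FOR line instead of running the loop.
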
Last edited by foxidrive on 15 Jun 2012 03:36, edited 1 time in total.

#3 Post by doscode » 15 Jun 2012 03:31

Unfortunately, I can write the command in any of these forms:

Code: Select all

'type !proxy! ^| grep -B 25 "</td></tr></tbody></table>" ^| grep high'

Code: Select all

'cat !proxy! ^| grep -B 25 "</td></tr></tbody></table>" ^| grep high'

Code: Select all

'grep -B 25 "</td></tr></tbody></table>" !proxy! ^| grep high'


But none of these works for me as part of the loop. Could you please test my script? I see no output.

#4 Post by foxidrive » 15 Jun 2012 03:37

See what I posted above, after checking the output. Hopefully the HTML line endings won't affect it.

#5 Post by doscode » 15 Jun 2012 03:45

And what if I want to grep for this line:

Code: Select all

<th colspan="10">&nbsp;</th>

which contains double quotes? The file is here:
http://www.proxynova.com/proxy-server-list/
Should I use ^ to escape double quotes?

#6 Post by doscode » 15 Jun 2012 03:51

foxidrive wrote:See what I posted above, after checking the output. Hopefully the HTML line endings won't affect it.


I'm surprised, it works. What was the problem? And why are the TD and /TD tags missing from the output?

#7 Post by foxidrive » 15 Jun 2012 03:52

Code: Select all

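    @echo off
    Setlocal EnableDelayedExpansion
    rem Quote the assignment so trailing spaces do not end up in the value
    SET "proxy=Proxy.htm"

    rem /c: makes findstr treat the pattern as one search string despite the space;
    rem each dot matches any single character, here the double quotes around 10
    FOR /F "delims=" %%A IN ('findstr /r /c:"<th colspan=.10.>&nbsp;</th>" "%proxy%"') DO (
        echo %%A
        pause
    )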
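
To try the pattern without the real page, here is a rough sketch that writes two header lines to a scratch file and counts the matches. The file name findstr_demo.htm, the second header line, and the variable names sample and hits are all invented for the example. It also uses /c:, because findstr otherwise splits the search string at the space and looks for either half.

```batch
@echo off
setlocal
rem Hypothetical scratch file standing in for the saved page.
set "sample=%TEMP%\findstr_demo.htm"
>"%sample%" echo ^<th colspan="10"^>^&nbsp;^</th^>
>>"%sample%" echo ^<th^>other header^</th^>
set "hits=0"
rem The dots stand in for the double quotes, so no quote needs escaping.
for /f "delims=" %%A in ('findstr /r /c:"<th colspan=.10.>&nbsp;</th>" "%sample%"') do set /a hits+=1
echo Matching lines: %hits%
del "%sample%"
```

With /c: only the first line should be counted. Without it, the first half of the pattern alone would also match the second line.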

#8 Post by foxidrive » 15 Jun 2012 03:54

doscode wrote:
foxidrive wrote:See what I posted above, after checking the output. Hopefully the HTML line endings won't affect it.


I'm surprised, it works. What was the problem? And why are the TD and /TD tags missing from the output?


The delims are set to < and >

So the line is split at every < and >, and each piece between them counts as a token: tag names such as td and /td as well as the text between tags, including any leading space. tokens=7,12,15 selects just the three items you asked for, so the td and /td pieces never reach the output.

Did you want the td /td in the output?

The basic error was that the | inside the FOR ... IN ('...') command was not escaped as ^|.
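
Here is a small sketch of how the splitting works, using a made-up table row instead of the real page. The row layout, and therefore the token numbers 3, 6 and 9, are assumptions for this example only, as are the variable names.

```batch
@echo off
setlocal
rem Hypothetical row; the real page has more tags, hence tokens=7,12,15 there.
set "row=<tr><td>1.2.3.4:8080</td><td>high</td><td>Brazil</td></tr>"
rem The pieces are: tr, td, 1.2.3.4:8080, /td, td, high, /td, td, Brazil, /td, /tr
for /f "tokens=3,6,9 delims=<>" %%A in ("%row%") do (
    set "addr=%%A"
    set "type=%%B"
    set "country=%%C"
)
echo %addr% %type% %country%
```

This should print 1.2.3.4:8080 high Brazil. Adjacent delimiters such as >< count as a single split, so no empty tokens appear between the tags.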

#9 Post by doscode » 15 Jun 2012 04:15

That's fine, it works. About the last command:

Code: Select all

findstr /r "<th colspan=.10.>&nbsp;</th>"

I need the 846 lines that come before the match. That would be simple with grep, which can also use extended regular expressions. So I will try

Code: Select all

grep -B 846 -E "<th colspan=.10.>&nbsp;</th>" file

and then filter for the proxy, port, country, and elite fields.
Here elite marks a High Anonymous server. Those should be listed but are missing at the moment, so I will look for Anonymous servers this time.
http://www.proxynova.com/proxy-server-list/?p=10

#10 Post by doscode » 15 Jun 2012 04:41

What's wrong with this command:

Code: Select all

FOR /F "tokens=4,5,6 delims=<>" %%A IN ('grep -B 846 -E "<th colspan=.10.>&nbsp;</th>" !proxy_2!' ^| grep -E ^"proxy^|port^|country^|proxy_^" ) DO (
  echo %%A %%B %%C
pause
)


Error:

Code: Select all

System cannot find file 'grep -B 846 -E "<th colspan=.10.>&nbsp;</th>" Proxy_2.htm' | grep -E "proxy|port|country|proxy_".

http://www.samair.ru/proxy/

#11 Post by foxidrive » 15 Jun 2012 04:43

I see 1 page of proxy servers there - I'm not a premium member.

They do have a proxy file you can download if you 'like' them...


The error appears to be because you renamed the proxy file.

#12 Post by doscode » 15 Jun 2012 05:05

foxidrive wrote:I see 1 page of proxy servers there - I'm not a premium member.

They do have a proxy file you can download if you 'like' them...


The error appears to be because you renamed the proxy file.


I know, but I don't want them to keep my IP. The downloadable file is free, but it lacks the country and proxy type information.

It is not a "renamed file" error: if I simplify the code to a single grep, there is no error. But I see the mistake now: the closing ' is in the wrong place. I forgot to move it to the end of the command, after the second grep.

#13 Post by doscode » 15 Jun 2012 05:32

It almost works, but the type is not working yet:

Code: Select all

FOR /F "tokens=3,4,5 delims=<>" %%A IN ('grep -B 810 -E "<th colspan=.10.>&nbsp;</th>" !proxy_2! ^| grep -E ^"row_proxy^|port^|country^|proxy_^" ') DO (
  echo 1: %%A 2:%%B 3:%%C
  if "%%C" == "/span" SET ip=%%B
  if "%%C" == "/a" SET port=%%B
  if "%%B" == "/a" SET location=%%A
  if "%%B" == "/span" (
    SET type=%%A;
    SET type=!type:~12,0!
    echo !type!
  )
  echo IP:!ip!:!port! in !location! is !type!
pause
)



P:\server\loop>parse_proxy.bat
1: span class="row_proxy_ip" 2:201.41.66.211 3:/span
IP:201.41.66.211: in is
Press any key to continue
1: a href="http://www.proxynova.com/proxy-server-list/port-8080/" title="proxy list - port 8080" 2:8080 3:/a
IP:201.41.66.211:8080 in is
Press any key to continue
1: div class="vbar proxy_speed" title="1077" 2:/div 3:/td
IP:201.41.66.211:8080 in is
Press any key to continue
1: Brazil 2:/a 3:span style="color:#666666; font-size:10px;"
IP:201.41.66.211:8080 in Brazil is
Press any key to continue
...

#14 Post by doscode » 15 Jun 2012 05:43

How can I get rid of the trailing part of this text?
span class="proxy_anonymous" style="font-weight:bold; font-size:10px;"


The command below removes the whole value, but I need to remove only everything from the second double quote onward:

Code: Select all

SET type=!type:~12,0!

The length of the word inside the double quotes can vary.

#15 Post by doscode » 15 Jun 2012 06:09

I tried this to get the word in double quotes:

Code: Select all

    SET type=%%A
    SET type=!type:"=$!
    FOR /F "tokens=1,2 delims=$" %%S IN ("%%A") DO (
    echo %%S
    echo %%T
    SET type=%%S %%T
    )

but without success.
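
One possible reason, offered as a guess and not tested against the real page: the inner FOR still reads %%A, which holds the original text with its double quotes, instead of the modified !type!, so there is no $ to split on. Below is a minimal sketch of the same idea with a hard-coded sample value, assuming that $ never occurs in the text.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical sample value, copied from the output shown earlier in the thread.
set "type=span class="proxy_anonymous" style="font-weight:bold; font-size:10px;""
rem Swap every double quote for a dollar sign, assuming the text holds no dollar sign.
set "type=!type:"=$!"
rem Token 2 is the text between the first and second dollar sign, whatever its length.
for /f "tokens=2 delims=$" %%T in ("!type!") do set "type=%%T"
echo !type!
```

With this sample the result should be proxy_anonymous. Because the split is on the delimiter and not on a fixed offset, it does not matter how long the quoted word is.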
