Page 3 of 3

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 04:26
by foxidrive
Here is a version that doesn't use SED; the file.htm it generates may be easier for you to parse in batch. It strips out all of the bogus IP information that the site includes.

Code:

@echo off
setlocal enabledelayedexpansion
call :sar filein.htm fileina.htm "<div style=.display:none.>[0-9]*<\/div>" ""
call :sar fileina.htm file.htm "<span style=.display:none.>[0-9]*<\/span>" ""

htmstrip /-table /border=b file.htm
find "TP" <file.out |find /v ">" >file.txt

for /f "delims=" %%a in (file.txt) do (
set var=%%a
set var=!var:~11!
for /f "tokens=1-6" %%b in ("!var!") do (
if /i "%%f%%g"=="High+KA" echo %%b,%%c,%%d,%%e,%%f,%%g
)
)
pause
goto :EOF
:sar
::Search and replace
@echo off
if "%~3"=="" (
echo.Search and replace
echo Syntax:
echo %0 "filein.txt" "fileout.ext" "regex" "replace_text" [first]
echo.
echo. if [first] is present only the first occurrence is changed
goto :EOF
)
if "%~5"=="" (set global=true) else (set global=false)
set s=regex.replace(wscript.stdin.readall,"%~4")
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=%global%
>>_.vbs echo regEx.IgnoreCase=True           
>>_.vbs echo regex.pattern="%~3"
>>_.vbs echo wscript.stdOut.write %s%
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 04:36
by doscode
When you use /border=b, spaces become the delimiter, and that is a problem. The | delimiter was good. With space as the delimiter, "High +KA" gets divided into two columns, and I still see many spaces there.

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 04:39
by foxidrive
The script prints out comma-delimited text here:


201.116.187.236,3128,Mexico,HTTPS,High,+KA
177.43.203.122,8080,Brazil,HTTPS,High,+KA
182.52.114.41,3128,Thailand,HTTPS,High,+KA
200.42.69.94,8080,Argentina,HTTPS,High,+KA
178.48.2.237,8080,Hungary,HTTPS,High,+KA
218.248.4.101,8080,India,HTTPS,High,+KA
219.83.100.202,8080,Indonesia,HTTPS,High,+KA
94.42.176.108,80,Poland,HTTPS,High,+KA
118.97.71.57,3128,Indonesia,HTTPS,High,+KA
186.227.174.242,3128,Brazil,HTTPS,High,+KA
201.251.62.137,8080,Argentina,HTTPS,High,+KA
95.215.48.158,8080,Ukraine,HTTPS,High,+KA
202.182.172.2,3128,Indonesia,HTTPS,High,+KA
190.121.143.254,8080,Colombia,HTTPS,High,+KA
201.49.77.3,8080,Brazil,HTTPS,High,+KA
189.2.162.165,3128,Brazil,HTTPS,High,+KA
190.254.196.211,3128,Colombia,HTTPS,High,+KA
101.255.36.234,81,China,HTTPS,High,+KA
79.101.37.131,8080,Serbia,HTTPS,High,+KA
187.60.96.7,3128,Brazil,HTTPS,High,+KA
200.171.182.128,3128,Brazil,HTTPS,High,+KA
190.206.44.35,8080,Venezuela,HTTPS,High,+KA
210.246.88.46,8080,Thailand,HTTPS,High,+KA
210.4.97.107,3128,Philippines,HTTPS,High,+KA
218.65.230.212,8080,China,HTTPS,High,+KA
219.83.100.195,8080,Indonesia,HTTPS,High,+KA
201.22.249.114,3128,Brazil,HTTPS,High,+KA

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 05:10
by foxidrive
Try this. From my short examination, it cleans up the HTML so that the real IP addresses are exposed. The input file is filein.htm and the final file is fileing.htm.

Code:

@echo off
call :sar filein.htm fileina.htm "<div style=.display:none.>[0-9]*<\/div>" ""
call :sar fileina.htm fileinb.htm "<span style=.display:none.>[0-9]*<\/span>" ""
call :sar fileinb.htm fileinc.htm "<span>" ""
call :sar fileinc.htm fileind.htm "</span>" ""
call :sar fileind.htm fileine.htm "<span class=.. style=..>" ""
call :sar fileine.htm fileinf.htm "<span class=.[0-9]*.>" ""
call :sar fileinf.htm fileing.htm "<span style=.display: inline.>" ""

pause
goto :EOF
:sar
::Search and replace
@echo off
if "%~3"=="" (
echo.Search and replace
echo Syntax:
echo %0 "filein.txt" "fileout.ext" "regex" "replace_text" [first]
echo.
echo. if [first] is present only the first occurrence is changed
goto :EOF
)
if "%~5"=="" (set global=true) else (set global=false)
set s=regex.replace(wscript.stdin.readall,"%~4")
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=%global%
>>_.vbs echo regEx.IgnoreCase=True           
>>_.vbs echo regex.pattern="%~3"
>>_.vbs echo wscript.stdOut.write %s%
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 05:49
by foxidrive
This version is now pure batch and VBS. Note that the input file is now FILE.HTM

Code:


@echo off
setlocal enabledelayedexpansion
call :sar file.htm fileina.htm "<div style=.display:none.>[0-9]*<\/div>" ""
call :sar fileina.htm fileinb.htm "<span style=.display:none.>[0-9]*<\/span>" ""
call :sar fileinb.htm fileinc.htm "<span>" ""
call :sar fileinc.htm fileind.htm "</span>" ""
call :sar fileind.htm fileine.htm "<span class=.. style=..>" ""
call :sar fileine.htm fileinf.htm "<span class=.[0-9]*.>" ""
call :sar fileinf.htm fileing.htm "<span style=.display: inline.>" ""
find "td" <fileing.htm > fileing.txt


setlocal enabledelayedexpansion
set debug=rem
for %%x in (a b c d e f g h i j k l) do set %%x=

for /f "skip=9 delims=" %%a in (fileing.txt) do (
set "a=!b!"
set "b=!c!"
set "c=!d!"
set "d=!e!"
set "e=!f!"
set "f=!g!"
set "g=!h!"
set "h=!i!"
set "i=!j!"
set "j=!k!"
set "k=!l!"
set "l=%%a"

if defined a (

%debug%  for %%x in (a b c d e f g h i j k l) do echo !%%x!

for /f "delims=</td> " %%x in ("!c!") do set IP=%%x
for /f "delims=</td> " %%x in ("!e!") do set PORT=%%x
for /f "tokens=5 delims=<>" %%x in ("!f!") do set "COUNTRY=%%x"
for /f "tokens=4 delims==>" %%x in ("!g!") do set speedbar response_time=%%~x
for /f "tokens=7 delims==> " %%x in ("!i!") do set speedbar connection_time=%%~x
for /f "tokens=3 delims=<>" %%x in ("!k!") do set PROTOCOL=%%x
for /f "tokens=4 delims==<>" %%x in ("!l!") do set LAST=%%x

set "COUNTRY=!COUNTRY:~1!"

echo !IP!,!PORT!,!COUNTRY!,!speedbar response_time!,!speedbar connection_time!,!PROTOCOL!,!LAST!
%debug% pause
for %%x in (a b c d e f g h i j k l) do set %%x=
)
)
pause
del filein?.htm
goto :EOF
:sar
::Search and replace
@echo off
if "%~3"=="" (
echo.Search and replace
echo Syntax:
echo %0 "filein.txt" "fileout.ext" "regex" "replace_text" [first]
echo.
echo. if [first] is present only the first occurrence is changed
goto :EOF
)
if "%~5"=="" (set global=true) else (set global=false)
set s=regex.replace(wscript.stdin.readall,"%~4")
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=%global%
>>_.vbs echo regEx.IgnoreCase=True           
>>_.vbs echo regex.pattern="%~3"
>>_.vbs echo wscript.stdOut.write %s%
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 06:31
by foxidrive
The version above seems stable now.

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 14:18
by doscode
What do I need to successfully run the script with WSH, VBS, regex and so on? I have Win XP. I tried to install Windows Script, which should contain WSH and VBS, but it told me that I have the service pack installed and don't need it. How can I test whether I have everything installed? I can run the last script, and it generates the HTML files and fileing.txt, which begins with these lines:

Code:

<//www.w3.org/TR/html4/strict.dtd">
<td id="theaderleft">Last update</td>
<td>IP address</td>
<td>Port</td>
<td>Country</td>
<td>Speed</td>
<td>Connection time</td>
<td>Type</td>
<td id="theaderright">Anonymity</td>
<td class="leftborder timestamp" rel="1340943063"><span class="updatets ">
190.37.135.49</td>   
<td>
3128</td>

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 21:37
by foxidrive
WSH is a standard part of XP and every version of Windows after that. It can be downloaded for Win9x and W2K.

So you should already have it installed in XP.
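
If you want to confirm that WSH, VBScript and its RegExp object all respond, a quick check along these lines might help. It is only a sketch: the file name _wshtest.vbs and the test string are made up for the illustration, and it builds the VBS file the same way the :sar routine does.

```batch
@echo off
rem Hypothetical self-test: _wshtest.vbs is a throwaway name chosen for this sketch.
rem Write a three-line VBScript that creates a RegExp object and tests a pattern.
 >_wshtest.vbs echo set re=new regexp
>>_wshtest.vbs echo re.pattern="[0-9]+"
>>_wshtest.vbs echo if re.test("abc123") then wscript.echo "WSH and RegExp OK"
rem Run it with the console script host and clean up afterwards.
cscript //nologo _wshtest.vbs
del _wshtest.vbs
```

If the confirmation line appears, the pieces that :sar relies on are present; an error from cscript would point at the part that is missing.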

Re: Double quotes in For loop input [SOLVED]

Posted: 30 Jun 2012 22:08
by foxidrive
doscode wrote: I can run the last script, and it generates the HTML files and fileing.txt, which begins with these lines:

Code:

<//www.w3.org/TR/html4/strict.dtd">
<td id="theaderleft">Last update</td>
<td>IP address</td>
<td>Port</td>
<td>Country</td>
<td>Speed</td>
<td>Connection time</td>
<td>Type</td>
<td id="theaderright">Anonymity</td>
<td class="leftborder timestamp" rel="1340943063"><span class="updatets ">
190.37.135.49</td>   
<td>
3128</td>


That shows it is working. The code skips the first 9 lines and then processes each block of 12 lines to generate one line of data.
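
The skip-then-group idea can be seen in a smaller sketch. Note that this uses a simple line counter instead of the shifting a..l variables in the script above, and that demo.txt, the skip count of 2 and the block size of 3 are assumptions made up for the illustration.

```batch
@echo off
setlocal enabledelayedexpansion
rem Sketch only: demo.txt, skip=2 and the block size 3 are illustrative assumptions.
rem Skip the header lines, then join every 3 lines into one comma-separated row.
set "row="
set /a n=0
for /f "skip=2 delims=" %%a in (demo.txt) do (
  rem Append the current line to the row being built.
  if defined row (set "row=!row!,%%a") else (set "row=%%a")
  set /a n+=1
  rem Once a full block has been collected, print it and start over.
  if !n! equ 3 (
    echo !row!
    set "row="
    set /a n=0
  )
)
```

As with the real script, for /f drops empty lines, so the block size has to be counted in non-empty lines. Lines containing ! would be altered by delayed expansion.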

Re: Double quotes in For loop input [SOLVED]

Posted: 01 Jul 2012 07:19
by doscode
To make sure I understand your scripts from page 3:

I have saved them to 3 files:
1) foxidrive_short (parse without sed).bat
The first script on this page.
It uses htmstrip + vbs.
This looks like the best choice for me because the code is short and it does not need sed.
2) foxidrive_pure+vbs (parse without sed).bat
This is the version without htmstrip.
3) foxidrive_exposed (IP exposed in HTML).bat
This exposes the IP addresses in the HTML.

I would like to go back to the | delimiter, but how can I remove the spaces around the tokens?

Re: Double quotes in For loop input [SOLVED]

Posted: 01 Jul 2012 07:30
by foxidrive
Examine this: it depends on the text format in many cases.

Code:

@echo off
set "var=a | bcd   |  Australia | High KA+   "
for /f "tokens=1,2,3,4,5 delims=| " %%a in ("%var%") do echo %%a,%%b,%%c,%%d %%e
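
To show why this depends on the text format, here is a small comparison. The sample line is invented for the illustration (it is not taken from the real proxy list) and contains a two-word country.

```batch
@echo off
rem Assumed sample line with a two-word country; not real data from the site.
set "var=1.2.3.4 | 8080 | Czech Republic | High +KA"
rem Space and | as delimiters: fields are trimmed, but "Czech Republic"
rem is split in two and every later token shifts one position.
for /f "tokens=1-4 delims=| " %%a in ("%var%") do echo [%%a] [%%b] [%%c] [%%d]
rem Only | as delimiter: each field stays whole, but keeps its surrounding spaces.
for /f "tokens=1-4 delims=|" %%a in ("%var%") do echo [%%a] [%%b] [%%c] [%%d]
```

The brackets make the leftover spaces visible. When a field can contain spaces, splitting on | alone and trimming each token afterwards is the safer route.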

Final script

Posted: 08 Jul 2012 06:13
by doscode
Foxidrive helped me to finish the script with trimming of spaces - this is his work:

Code:

@echo off
setlocal enabledelayedexpansion
call :sar filein.htm fileina.htm "<div style=.display:none.>[0-9]*<\/div>" ""
call :sar fileina.htm file.htm "<span style=.display:none.>[0-9]*<\/span>" ""

htmstrip /width=255 /-TABLE /BUFF=0 file.htm
find "TP" <file.out |find /v ">" >file.txt

for /f "tokens=1-4,7,8 delims=|" %%A in (file.txt) do (

rem echo A:"%%A",B:"%%B",C:"%%C",D:"%%D",E:"%%E",F:"%%F"

call :trim "%%A" a
call :trim "%%B" b
call :trim "%%C" c
call :trim "%%D" d
call :trim "%%E" e
call :trim "%%F" f

echo A:"!a!",B:"!b!",C:"!c!",D:"!d!",E:"!e!",F:"!f!"
)

pause
goto :EOF
:sar
  ::Search and replace
  @echo off
  if "%~3"=="" (
  echo.Search and replace
  echo Syntax:
  echo %0 "filein.txt" "fileout.ext" "regex" "replace_text" [first]
  echo.
  echo. if [first] is present only the first occurrence is changed
goto :EOF
)
if "%~5"=="" (set global=true) else (set global=false)
set s=regex.replace(wscript.stdin.readall,"%~4")
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=%global%
>>_.vbs echo regEx.IgnoreCase=True           
>>_.vbs echo regex.pattern="%~3"
>>_.vbs echo wscript.stdOut.write %s%
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs
goto :EOF
:trim
  if "%~1"=="" set "%~2=NULL"&goto :EOF
  set "str=%~1"
  for /f "tokens=* delims= " %%a in ("%str%") do set str=%%a
:right
  if "%str:~-1%"==" " set "str=%str:~0,-1%" & goto :right
  set "%~2=%str%"
goto :EOF


Using htmstrip + vbs.

Re: Double quotes in For loop input [SOLVED]

Posted: 08 Jul 2012 06:17
by doscode
Just before he sent me his version, I worked out my own way of trimming with the help of a subroutine:

Code:

@echo off
setlocal enabledelayedexpansion
call :sar filein.htm fileina.htm "<div style=.display:none.>[0-9]*<\/div>" ""
call :sar fileina.htm file.htm "<span style=.display:none.>[0-9]*<\/span>" ""

htmstrip /WIDTH=255 /-TABLE /BUFF=0 file.htm
find "TP" <file.out |find /v ">" >file.txt

for /f "tokens=1-3,6,7 delims=|" %%A in (file.txt) do (
REM echo RAW A:"%%A",B:"%%B",C:"%%C",D:"%%D",E:"%%E"
REM if /i "%%E"==" High +KA" (
  for /f "tokens=2,3 delims=s" %%e in ("%%A") do ( SET IP=%%f )
  call :trim !IP!
  SET IP=!str!
  call :trim  %%B
  SET port=!str!
  call :trim  %%C
  SET country=!str!
  call :trim  %%D
  SET type=!str!
  call :trim  %%E
  SET anonymity=!str!
  echo IP:!IP!,port:!port!,country:!country!,type:!type!,anonymity:"!anonymity!"
REM  )
)
pause
goto :EOF
:sar
  ::Search and replace
  @echo off
  if "%~3"=="" (
  echo.Search and replace
  echo Syntax:
  echo %0 "filein.txt" "fileout.ext" "regex" "replace_text" [first]
  echo.
  echo. if [first] is present only the first occurrence is changed
goto :EOF
)
if "%~5"=="" (set global=true) else (set global=false)
set s=regex.replace(wscript.stdin.readall,"%~4")
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=%global%
>>_.vbs echo regEx.IgnoreCase=True           
>>_.vbs echo regex.pattern="%~3"
>>_.vbs echo wscript.stdOut.write %s%
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs
goto :EOF
:trim
  set str=
  if NOT "%1"=="" ( set str=%1) else ( goto :EOF )
  if NOT "%2"=="" ( set str=%str% %2) else ( goto :EOF )
  if NOT "%3"=="" ( set str=%str% %3) else ( goto :EOF )
  if NOT "%4"=="" ( set str=%str% %4) else ( goto :EOF )
  if NOT "%5"=="" ( set str=%str% %5) else ( goto :EOF )
  if NOT "%6"=="" ( set str=%str% %6) else ( goto :EOF )
  if NOT "%7"=="" ( set str=%str% %7) else ( goto :EOF )
  if NOT "%8"=="" ( set str=%str% %8) else ( goto :EOF )
  if NOT "%9"=="" ( set str=%str% %9) else ( goto :EOF )
goto :EOF


The file that we parse is the proxy list from hidemyass.com.

This version changes names like "Korea, republic of" to "Korea republic of" and "Taiwan, republic of China" to "Taiwan republic of China".
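
The comma disappears because CALL treats a comma, like a space, as a separator between arguments, so the :trim routine never sees it. A minimal sketch of that behaviour follows; the :show label and the sample text are hypothetical and only serve the illustration.

```batch
@echo off
rem Illustration only: the :show label and the sample text are hypothetical.
set "c=Korea, Republic of"
rem Unquoted: the comma and the spaces split the text into three arguments.
call :show %c%
rem Quoted: the whole text arrives as a single argument, comma included.
call :show "%c%"
goto :EOF
:show
rem %~1 and %~2 are the first two arguments with surrounding quotes removed.
echo first argument: [%~1]  second argument: [%~2]
goto :EOF
```

The first call reports Korea and Republic as separate arguments, while the second reports the full name in the first argument and nothing in the second.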

I am also thinking about removing the part after the comma, because it does not need to be included in the result.

So this is the completed version:

Code:

@echo off
setlocal enabledelayedexpansion
call :sar filein.htm fileina.htm "<div style=.display:none.>[0-9]*<\/div>" ""
call :sar fileina.htm file.htm "<span style=.display:none.>[0-9]*<\/span>" ""

htmstrip /WIDTH=255 /-TABLE /BUFF=0 file.htm
find "TP" <file.out |find /v ">" >file.txt

for /f "tokens=1-3,6,7 delims=|" %%A in (file.txt) do (
REM echo RAW A:"%%A",B:"%%B",C:"%%C",D:"%%D",E:"%%E"
REM if /i "%%E"==" High +KA" (
  for /f "tokens=2,3 delims=s" %%e in ("%%A") do ( SET IP=%%f )
  call :trim !IP!
  SET IP=!str!
  call :trim  %%B
  SET port=!str!
  for /f "tokens=1 delims=," %%Z IN ("%%C") do SET country=%%Z
  call :trim  !country!
  SET country=!str!
  call :trim  %%D
  SET type=!str!
  call :trim  %%E
  SET anonymity=!str!
  echo IP:!IP!,port:!port!,country:!country!,type:!type!,anonymity:"!anonymity!"
REM  )
)
pause
goto :EOF
:sar
  ::Search and replace
  @echo off
  if "%~3"=="" (
  echo.Search and replace
  echo Syntax:
  echo %0 "filein.txt" "fileout.ext" "regex" "replace_text" [first]
  echo.
  echo. if [first] is present only the first occurrence is changed
goto :EOF
)
if "%~5"=="" (set global=true) else (set global=false)
set s=regex.replace(wscript.stdin.readall,"%~4")
 >_.vbs echo set regex=new regexp
>>_.vbs echo regex.global=%global%
>>_.vbs echo regEx.IgnoreCase=True           
>>_.vbs echo regex.pattern="%~3"
>>_.vbs echo wscript.stdOut.write %s%
cscript /nologo _.vbs <"%~1" >"%~2"
del _.vbs
goto :EOF
:trim
  set str=
  if NOT "%1"=="" ( set str=%1) else ( goto :EOF )
  if NOT "%2"=="" ( set str=%str% %2) else ( goto :EOF )
  if NOT "%3"=="" ( set str=%str% %3) else ( goto :EOF )
  if NOT "%4"=="" ( set str=%str% %4) else ( goto :EOF )
  if NOT "%5"=="" ( set str=%str% %5) else ( goto :EOF )
  if NOT "%6"=="" ( set str=%str% %6) else ( goto :EOF )
  if NOT "%7"=="" ( set str=%str% %7) else ( goto :EOF )
  if NOT "%8"=="" ( set str=%str% %8) else ( goto :EOF )
  if NOT "%9"=="" ( set str=%str% %9) else ( goto :EOF )
goto :EOF


So multi-word country names are kept, but the words after a comma in the country name are dropped.