Fastest way to extract IP:port occurrences in any text file?

#1 Post by MKANET » 14 Aug 2013 11:44

I'm hoping to find the fastest way to extract all occurrences of IP:port from almost any text file using a pure batch file (if possible). I searched Google for a solution but didn't find anything. Maybe it needs a little help from VBScript or PowerShell, without downloading or installing extra software? Linux already has multiple native tools for this, such as sed, awk, and grep.
textfile.txt

Code: Select all

This is a text file which shows ipaddress:port randomly 6.240.54.222:22.  The IP address can occur anywhere in the text file 72.222.7.123:8080 with no recognizable delimiters or surrounding characters such as brackets [] or colons:.  So 129.16.0.129:65080 can appear anywhere in the text file.


So, if I run the code against textfile.txt, the result would be:
6.240.54.222:22
72.222.7.123:8080
129.16.0.129:65080

I really wish findstr could show only the matching string instead of the entire line containing it. The command below doesn't really do me any good.

Code: Select all

findstr ".*[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\:[0-9][0-9]*" "D:\textfile.txt"
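For comparison, the behaviour asked for here (print only the matched substring, not the whole line) is what a regex engine's find-all call provides. The following is a minimal sketch in Python rather than batch, purely as an illustration; the function name `extract_ip_ports` is made up for this example, and the pattern checks only the shape of the text, not the numeric ranges.

```python
import re

# Shape-only pattern: four dot-separated digit runs, a colon, and a digit run.
# It does not check that octets are <= 255 or that the port is <= 65535.
IP_PORT = re.compile(r"\d+\.\d+\.\d+\.\d+:\d+")

def extract_ip_ports(text):
    """Return every IP:port-shaped substring, in order of appearance."""
    return IP_PORT.findall(text)

sample = (
    "This is a text file which shows ipaddress:port randomly 6.240.54.222:22.  "
    "The IP address can occur anywhere in the text file 72.222.7.123:8080 with "
    "no recognizable delimiters.  So 129.16.0.129:65080 can appear anywhere."
)

for hit in extract_ip_ports(sample):
    print(hit)
```

The trailing sentence period after the first address is not part of the match, because the pattern ends on a digit run.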

#2 Post by penpen » 14 Aug 2013 14:45

I don't know how fast this is, but it should give you all ip:port addresses in the file,
provided they are separated from the surrounding text by batch delimiters (space, tab, comma, ...), and
provided all lines in the text file are shorter than 1023 chars (including carriage return and line feed):

Code: Select all

@echo off
cls
setlocal enableDelayedExpansion

rem Coarse findstr pre-filter: keep only lines containing digits.digits.digits.digits:digits
set "regInetAddr=[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*:[0-9][0-9]*"
set "file=Z:\textfile.txt"

for /F "tokens=* delims=" %%a in ('findstr "%regInetAddr%" "%file%"') do (
   for %%b in (%%a) do (
      set token=%%b
      for /F "tokens=1 delims=0123456789.:" %%c in ("%%b") do set "token="
      for /F "tokens=1-5* delims=0123456789" %%c in ("!token!") do if NOT "%%c%%d%%e%%f%%g" == "...:" set "token="
      for /F "tokens=1-5 delims=.:" %%c in ("!token!") do (
         set /A "IP[3]=%%c", "IP[2]=%%d", "IP[1]=%%e", "IP[0]=%%f", "port=%%g"
         if not "!IP[3]!" == "%%c" set "token="
         if not "!IP[2]!" == "%%d" set "token="
         if not "!IP[1]!" == "%%e" set "token="
         if not "!IP[0]!" == "%%f" set "token="
         if not "!port!" == "%%g" set "token="
         if !IP[3]! GTR 255 set "token="
         if !IP[2]! GTR 255 set "token="
         if !IP[1]! GTR 255 set "token="
         if !IP[0]! GTR 255 set "token="
         if !port! GTR 65535 set "token="
         if defined token echo !token!
      )
   )
)

goto :eof

penpen
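
The validation steps in the batch file above (split into tokens, check the shape, reject numbers that do not survive a round trip through arithmetic, then range-check) may be easier to follow in a compact form. This is a rough Python sketch of the same idea, not a translation of the script; the names `SHAPE` and `valid_ip_ports` and the exact delimiter set are assumptions made for the illustration.

```python
import re

# Shape test: four dot-separated digit runs, one colon, one digit run.
SHAPE = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+):(\d+)")

def valid_ip_ports(text):
    """Yield tokens that look like IP:port and pass the range checks."""
    # Split on whitespace, comma, semicolon and equals sign, roughly the
    # delimiters a plain batch FOR loop splits on (an assumption).
    for token in re.split(r"[\s,;=]+", text):
        m = SHAPE.fullmatch(token)
        if not m:
            continue
        fields = m.groups()
        # Reject non-canonical numbers such as "007"; the batch version does
        # this by comparing the SET /A result with the original text.
        if any(f != str(int(f)) for f in fields):
            continue
        *octets, port = map(int, fields)
        if all(o <= 255 for o in octets) and port <= 65535:
            yield token

line = "up 72.222.7.123:8080, down 256.1.1.1:80 and 1.2.3.4:65536"
print(list(valid_ip_ports(line)))  # ['72.222.7.123:8080']
```

As with the batch file, an address followed directly by a sentence period (for example `6.240.54.222:22.`) is not recognized, because the whole token has to match.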

#3 Post by foxidrive » 14 Aug 2013 20:40

This is an ugly kludge but it gets you a long way there. It would be fairly simple to filter out results that were less than 10 characters.

Code: Select all

@echo off
for /f "tokens=1-10 delims=[]ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz " %%a in (file.txt) do echo "%%a" "%%b" "%%c" "%%d" "%%e" "%%f" "%%g" "%%h" "%%i" "%%j"

#4 Post by Aacini » 14 Aug 2013 21:02

You may use my FindRepl.bat program, which is a hybrid Batch-JScript program; it is very fast! For example:

Code: Select all

< file.txt findrepl "(\d+\.\d+\.\d+\.\d+:\d+)" /$:1

Output:

Code: Select all

 "6.240.54.222:22"
 "72.222.7.123:8080"
 "129.16.0.129:65080"

If you want to eliminate the quotes, just process the lines with a FOR.

Antonio

#5 Post by foxidrive » 14 Aug 2013 23:24

An excellent solution Antonio.

#6 Post by Endoro » 15 Aug 2013 00:35

Code for GNU grep for Windows.
It can also find IPs without port numbers, e.g. '129.16.0.129'.

Code: Select all

grep -Po "(\d+\.){3}\d+:?(\d?){4}" "file"

This is extremely fast.

#7 Post by penpen » 15 Aug 2013 02:18

I think these solutions are suboptimal, as they return something like this as an ip:port address:
256.256.256.256:65536 (octets above 255 and a port above 65535)
2147483648.1.1.1:1 (sint32 overflow)
so range and overflow checking should be added.
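
To make the point concrete, here is a small Python sketch (an illustration only, not one of the posted solutions) that runs a shape-only pattern over both kinds of input and then applies a separate range check. The names `LOOSE` and `in_range` are invented for this example. Python integers do not overflow, so the sint32 case is simply caught by the same range test.

```python
import re

# The shape-only pattern used by the simpler answers above.
LOOSE = re.compile(r"\d+\.\d+\.\d+\.\d+:\d+")

def in_range(candidate):
    """True if all four octets are <= 255 and the port is <= 65535."""
    host, port = candidate.split(":")
    octets = [int(part) for part in host.split(".")]
    return all(o <= 255 for o in octets) and int(port) <= 65535

text = "ok 72.222.7.123:8080 bad 256.256.256.256:65536 bad 2147483648.1.1.1:1"
hits = LOOSE.findall(text)
kept = [h for h in hits if in_range(h)]
print(hits)  # all three candidates match the shape
print(kept)  # only the first survives the range check
```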

Endoro wrote:

Code: Select all

grep -Po "(\d+\.){3}\d+:?(\d?){4}" "file"
The 4 should be a 5, as 65535 is a valid port (the maximum valid one).
And to avoid matching something like 1255.255.255.25565535, the grep command should look like this:

Code: Select all

grep -Po "((\d+){3}\.){3}(\d+){3}(:(\d+){5})?" "file"
Although i'm not firm with grep, so better someone who is, should have a view on it.

penpen

#8 Post by Endoro » 15 Aug 2013 04:23

@penpen our codes might be suboptimal but working for most real cases, but yours doesn't work ever.

Code: Select all

grep -Po "((\d+){3}\.){3}(\d+){3}(:(\d+){5})?" "file"


But if you don't want to find '299.222.7.123:8080' you need more than Regex, you need the ability to calculate.

And moreover, in '1255.255.255.25565535' grep finds with my code '255.255.255.25565535', so you need a Regex for IP with and without port.

With port:

Code: Select all

grep -Po "([0-2]?\d?\d\.){3}[0-2]?\d?\d:(\d?){5}" file


Without port:

Code: Select all

grep -Po "([0-2]?\d?\d\.){3}[0-2]?\d?\d" file
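
A quick way to see what the `[0-2]?\d?\d` octet does and does not exclude is to try it on the two problem inputs mentioned in this thread. This is a Python sketch under the assumption that Python's `re` module treats this particular pattern the same way as `grep -P`; the helper name `first_match` is made up.

```python
import re

# The "with port" pattern from the post above, unchanged.
WITH_PORT = re.compile(r"([0-2]?\d?\d\.){3}[0-2]?\d?\d:(\d?){5}")

def first_match(text):
    """Return the first matched substring, or None if nothing matches."""
    m = WITH_PORT.search(text)
    return m.group(0) if m else None

# An octet of 299 is still accepted: [0-2] only constrains the first digit.
print(first_match("299.222.7.123:8080"))

# A four-digit first field is not rejected; the match just starts one digit later.
print(first_match("1255.255.255.255:65535"))
```

So the three-digit limit holds, but the 0 to 255 range still needs either a stricter alternation or a separate numeric check.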

#9 Post by penpen » 15 Aug 2013 07:26

Endoro wrote:@penpen our codes might be suboptimal but working for most real cases, but yours doesn't work ever.

Code: Select all

grep -Po "((\d+){3}\.){3}(\d+){3}(:(\d+){5})?" "file"

This code indeed does not work in most cases, and it even produces false positives, as your code does.
But it recognizes everything from 000.000.000.000 or 000.000.000.000:00000 upwards (inclusive), so it recognizes nearly 1/64 of all possible valid representations (and everything above, like your solution).
Because of this I wrote: "Although i'm not firm with grep, so better someone who is, should have a view on it".
I thought what I wanted to do was obvious, namely (now I am firm with grep):

Code: Select all

grep -Po "(\d{1,3}\.){3}\d{1,3}(\:\d{1,5})?" "textfile.txt"


Endoro wrote:But if you don't want to find '299.222.7.123:8080' you need more than Regex, you need the ability to calculate.
That is not correct: the valid addresses form a finite set of words, and every finite language is regular.
So it is always possible to express this with a single regular expression;
the following regexp is a little longer, but it should be the fastest both when grep builds its internal structure and when it executes:

Code: Select all

grep -Po -w "(((\d{1,2})|([0-1]\d{2})|(2(([0-4]\d)|(5[0-5]))))\.){3}((\d{1,2})|([0-1]\d{2})|(2(([0-4]\d)|(5[0-5]))))(\:((\d{1,4})|(([0-5]\d{4})|(6(([0-4]\d{3})|(5(([0-4]\d{2})|(5(([0-2]\d)|(3[0-5]))))))))))?" "file"
Note that I didn't mean to offend anybody with my last post; it was only meant as a hint.

penpen

Edit: Fixed some spelling flaws.
Edit2: Added the edit notes.
Last edited by penpen on 15 Aug 2013 07:50, edited 2 times in total.
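
The claim that the ranges can be enforced by a regular expression alone can be checked exhaustively, since there are only 256 valid octets and 65536 valid ports. The sketch below uses Python and shorter alternations than the grep pattern above (they are written for this example and are not penpen's expression); the names `OCTET`, `PORT` and `accepts` are assumptions. Leading zeros such as `007` are accepted by these alternations.

```python
import re

# 0-255: 250-255, 200-249, then anything with at most three digits below 200.
OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"
# 0-65535: peel off the top ranges digit by digit, then everything below 60000.
PORT = r"(?:6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[0-5]?\d{1,4})"

def accepts(pattern, value):
    """True if the decimal text of value matches the pattern completely."""
    return re.fullmatch(pattern, str(value)) is not None

# Brute-force comparison of the patterns against the arithmetic definition.
octet_ok = all(accepts(OCTET, n) == (n <= 255) for n in range(1000))
port_ok = all(accepts(PORT, n) == (n <= 65535) for n in range(100000))
print(octet_ok, port_ok)

# Four octets separated by dots, then a colon and a port.
IP_PORT = re.compile(OCTET + r"(?:\." + OCTET + r"){3}:" + PORT)
print(bool(IP_PORT.fullmatch("129.16.0.129:65080")))
print(bool(IP_PORT.fullmatch("299.222.7.123:8080")))
```

This only shows that the range check fits into a pattern; whether that is faster than matching loosely and checking the numbers afterwards is a separate question.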

#10 Post by Endoro » 15 Aug 2013 07:47

This works for IP:port

Code: Select all

grep -Po "(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):(\d?){5}" file

#11 Post by penpen » 15 Aug 2013 10:13

I have computed (by hand, so it took a while) the optimal regular expression for scanning.
It is also optimal for building the internal structure grep uses, and
it allows all the optimizations grep may apply: minimum spanning tree, ... .

So this is an optimal regexp for use with grep

Code: Select all

grep -Po "(([0-1](\d\d?)?|2([0-4]\d?|5[0-5]?|[6-9])?|[3-9]\d?)\.){3}([0-1](\d\d?)?|2([0-4]\d?|5[0-5]?|[6-9])?|[3-9]\d?)(:([0-5](\d(\d(\d\d?)?)?)?|6([0-4](\d(\d\d?)?)?|5([0-4](\d\d?)?|5([0-2]\d?|3[0-5]?|[4-9])?|[6-9]\d?)?|[6-9](\d\d?)?)?|[7-9](\d(\d\d?)?)?))?" "textfile.txt"
for scanning both ip:port and ip.

Unfortunately it finds 0.1.2.3 as an IP address within .0.1.2.3.4.5; the -w option wasn't of any help.

You may be able to assume that the ip and ip:port addresses are enclosed by {spaces, start of line, end of line}, but that check would have to be added.

penpen

Edit: Added -w test result, corrected some flaws
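
One possible way to add the missing boundary check is with lookaround assertions, which PCRE supports. The following Python sketch shows the idea with shorter octet and port alternations written for this example (not the expression above) and with the port made mandatory; whether the same assertions behave identically under `grep -P` has not been tested here. The names `CORE`, `UNBOUNDED` and `BOUNDED` are made up.

```python
import re

OCTET = r"(?:25[0-5]|2[0-4]\d|[01]?\d?\d)"
PORT = r"(?:6553[0-5]|655[0-2]\d|65[0-4]\d{2}|6[0-4]\d{3}|[0-5]?\d{1,4})"
CORE = OCTET + r"(?:\." + OCTET + r"){3}:" + PORT

UNBOUNDED = re.compile(CORE)
# Refuse a match that continues a longer digit/dot run on either side, while
# still allowing an ordinary sentence-ending period after the port.
BOUNDED = re.compile(r"(?<!\d)(?<!\d\.)" + CORE + r"(?!\d|\.\d)")

text = "x 256.1.1.1:80 y .0.1.2.3.4:5.6 z 6.240.54.222:22.  w 72.222.7.123:8080"
loose_hits = UNBOUNDED.findall(text)
strict_hits = BOUNDED.findall(text)
print(loose_hits)   # includes fragments such as '56.1.1.1:80'
print(strict_hits)  # only the two free-standing addresses
```

Without the assertions, the engine simply starts the match one character later and returns a fragment of the invalid text; with them, a candidate glued to further digits or dotted fields is dropped.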

#12 Post by Adrianvdh » 15 Aug 2013 14:53

Use

Code: Select all

findstr /rc:"^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >nul

#13 Post by foxidrive » 15 Aug 2013 21:22

Adrianvdh wrote:Use

Code: Select all

findstr /rc:"^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*$" >nul


That will return the entire line - which is the problem when using findstr.

#14 Post by Endoro » 16 Aug 2013 02:08

Batch is just not a word processor like Word.
For such tasks it may be better to switch to another tool, e.g. grep, sed, awk or Perl, all available for Windows.
Or you take one of the VBS solutions from dostips (example).

#15 Post by MKANET » 19 Aug 2013 10:38

Penpen/Endoro and all, thanks so much for taking the time to offer various ways to quickly search for IP:port. It is very nice to see some of the greatest batch file scripters collaborate here.

I'm still trying to decipher Penpen's "ultimate" grep solution. The grep command below works fantastically! This is the first time I've seen the Perl regular expression option ("-P"). I am strongly considering adding GNU grep.exe to all my Windows PCs.

Code: Select all

grep -Po "(([0-1](\d\d?)?|2([0-4]\d?|5[0-5]?|[6-9])?|[3-9]\d?)\.){3}([0-1](\d\d?)?|2([0-4]\d?|5[0-5]?|[6-9])?|[3-9]\d?)(:([0-5](\d(\d(\d\d?)?)?)?|6([0-4](\d(\d\d?)?)?|5([0-4](\d\d?)?|5([0-2]\d?|3[0-5]?|[4-9])?|[6-9]\d?)?|[6-9](\d\d?)?)?|[7-9](\d(\d\d?)?)?))?" "textfile.txt"

Anyway, it looks like Penpen's original batch file is the ONLY way to do it without any other file dependencies (i.e., findrepl.bat or grep). This has been an extremely useful learning experience. Thanks again!!
