How to output a range of lines from a text file using findstr

#1 Post by shodan » 09 Apr 2024 22:54

You might have seen this method of using findstr to output chosen lines from a text file: number every line with /N, then keep only the lines whose number prefix matches.

Code: Select all

type "myfile.txt" | %SystemRoot%\System32\findstr /N /R /C:".*" | %SystemRoot%\System32\findstr /B /C:"5785:" /C:"5786:" /C:"5787:" /C:"5788:"
And that certainly works, but it gets unwieldy if you want to include thousands of lines.

So I asked ChatGPT if I could use regex to select a range of lines.

Its first attempt used one pattern for every group of 10 lines:

Code: Select all

type myfile.txt | %SystemRoot%\System32\findstr /N "^" | findstr /R "^453[3-9]: ^454[0-9]: ^455[0-9]: ^456[0-9]: ^457[0-9]: ^458[0-9]: ^459[0-9]: ^460[0-9]: ^461[0-9]: ^462[0-9]: ^463[0-9]: ^464[0-9]: ^465[0-9]: ^466[0-9]: ^467[0-9]: ^468[0-9]: ^469[0-9]: ^470[0-9]: ^471[0-9]: ^472[0-9]: ^473[0-9]: ^474[0-9]: ^475[0-9]: ^476[0-9]: ^477[0-9]: ^478[0-9]: ^479[0-9]: ^480[0-9]: ^481[0-9]: ^482[0-9]: ^483[0-9]: ^484[0-9]: ^485[0-9]: ^486[0-9]: ^487[0-9]: ^488[0-9]: ^489[0-9]: ^490[0-9]: ^491[0-9]: ^492[0-9]: ^493[0-9]: ^494[0-9]: ^495[0-9]: ^496[0-9]: ^497[0-9]: ^498[0-9]: ^499[0-9]: ^500[0-9]: ^501[0-9]: ^502[0-9]: ^503[0-9]: ^504[0-9]: ^505[0-9]: ^506[0-9]: ^507[0-9]: ^508[0-9]: ^509[0-9]: ^510[0-9]: ^511[0-9]: ^512[0-9]: ^513[0-9]: ^514[0-9]: ^515[0-9]: ^516[0-9]: ^517[0-9]: ^518[0-9]: ^519[0-9]: ^520[0-9]: ^521[0-9]: ^522[0-9]: ^523[0-9]: ^524[0-9]: ^525[0-9]: ^526[0-9]: ^527[0-9]: ^528[0-9]: ^529[0-9]: ^530[0-9]: ^531[0-9]: ^532[0-9]: ^533[0-9]: ^534[0-9]: ^535[0-9]: ^536[0-9]: ^537[0-9]: ^538[0-9]: ^539[0-9]: ^540[0-9]: ^541[0-9]: ^542[0-9]: ^543[0-9]: ^544[0-9]: ^545[0-9]: ^546[0-9]: ^547[0-9]: ^548[0-9]: ^549[0-9]: ^550[0-9]: ^551[0-9]: ^552[0-9]: ^553[0-9]: ^554[0-9]: ^555[0-9]: ^556[0-9]: ^557[0-9]: ^558[0-9]: ^559[0-9]: ^560[0-9]: ^561[0-9]: ^562[0-9]: ^563[0-9]: ^564[0-9]: ^565[0-9]: ^566[0-9]: ^567[0-9]: ^568[0-9]: ^569[0-9]: ^570[0-9]: ^571[0-9]: ^572[0-9]: ^573[0-9]: ^574[0-9]: ^575[0-9]: ^576[0-9]: ^577[0-9]: ^578[0-9]: ^579[0-9]: ^580[0-9]: ^581[0-9]: ^582[0-9]: ^583[0-9]: ^584[0-9]: ^585[0-9]: ^586[0-9]: ^587[0-9]: ^588[0-9]: ^589[0-9]: ^590[0-9]: ^591[0-9]: ^592[0-9]: ^593[0-9]: ^594[0-9]: ^595[0-9]: ^596[0-9]: ^597[0-9]: ^598[0-9]: ^599[0-9]: ^600[0-9]: ^601[0-9]: ^602[0-9]: ^603[0-9]: ^604[0-9]: ^605[0-9]: ^606[0-9]: ^607[0-9]: ^608[0-9]: ^609[0-9]: ^610[0-9]: ^611[0-9]: ^612[0-9]: ^613[0-9]: ^614[0-9]: ^615[0-9]: ^616[0-9]: ^617[0-9]: ^618[0-9]: ^619[0-9]:"
And that almost works, except I had asked for 4533 to 6219 and the list stops at 6199.

So I asked again and told it that it could probably cover 5000 to 5999 with a single pattern.

And it replied:

Code: Select all

type myfile.txt | %SystemRoot%\System32\findstr /N "^" | findstr /R "^453[3-9]: ^45[4-9][0-9]: ^4[6-9][0-9][0-9]: ^5[0-9][0-9][0-9]: ^6[0-1][0-9][0-9]: ^620[0-9]: ^621[0-9]:"

And that does actually work
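
One way to check a pattern list like this without a real file is to generate fake numbered lines and count how many pass the filter. This is only a sketch: the 4500 to 6300 test window and the _Patterns variable name are assumptions made for illustration.

```batch
@echo off
setlocal
rem Patterns copied from the post above; findstr treats the spaces between them as "or".
set "_Patterns=^453[3-9]: ^45[4-9][0-9]: ^4[6-9][0-9][0-9]: ^5[0-9][0-9][0-9]: ^6[0-1][0-9][0-9]: ^620[0-9]: ^621[0-9]:"
rem Emit lines that look like findstr /N output for line numbers 4500 to 6300,
rem filter them, and let find /C count the lines that survive.
(for /L %%i in (4500,1,6300) do @echo %%i:text) | findstr /R "%_Patterns%" | find /C ":"
rem Compare the printed count with 6219 - 4533 + 1 = 1687.
```

If the count is lower, some numbers inside the range are missing; if it is higher, the patterns also match numbers outside the range.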


So now I want to create a function that builds these regex lists from a simple range of lines.

Code: Select all

::Usage Call :GetRegexRange X-Y X1-Y1 X2-Y2 ... Xn-Yn
:: returns findstr compatible regex list describing the ranges: ^453[3-9]: ^45[4-9][0-9]: ^4[6-9][0-9][0-9]: ^5[0-9][0-9][0-9]: ^6[0-1][0-9][0-9]: ^620[0-9]: ^621[0-9]:


So what is a range? It is probably like the page selection when you print: individual pages and ranges of pages in one list.

Example:

Code: Select all

4,6,12,22-38,52-55
You might even have page numbers that repeat, or ranges of pages that go backward.

Code: Select all

55,56,1-5,31-21,17,5,5,5,5,30-1
Ranges might also go forward and then backward, with more than two stops.

Code: Select all

20-25-17-20,10,11,99-89,56,57,59,22-24-26,1-5
But for now I want the simplest working function, so just two numbers.

Code: Select all

235-11579
First, split that into two variables.

Code: Select all

rem tokens=1,2 is required, otherwise %%b never receives the second number
for /f "tokens=1,2 delims=- " %%a in ("%_MyRange%") do ( set /a "_Range1=%%a" & set /a "_Range2=%%b" )
Next, figure out which is the higher number

Code: Select all

rem the else branch swaps the two values so that _RangeLow is always the smaller one
if %_Range1% LSS %_Range2% ( set /a _RangeLow=%_Range1% & set /a _RangeHigh=%_Range2% ) else ( set /a _RangeLow=%_Range2% & set /a _RangeHigh=%_Range1% )
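
Put together, the two steps above can be tried as a small standalone script. This is a sketch using one of the backward ranges from the earlier examples (31-21); it assumes the input always has exactly two numbers, so a single number such as 17 is not handled.

```batch
@echo off
setlocal
set "_MyRange=31-21"

rem Split on the dash; tokens=1,2 puts the first number in %%a and the second in %%b.
for /f "tokens=1,2 delims=- " %%a in ("%_MyRange%") do (
    set /a "_Range1=%%a"
    set /a "_Range2=%%b"
)

rem Order the two values so that _RangeLow is never above _RangeHigh.
if %_Range1% LSS %_Range2% (
    set /a "_RangeLow=_Range1, _RangeHigh=_Range2"
) else (
    set /a "_RangeLow=_Range2, _RangeHigh=_Range1"
)

echo %_RangeLow% %_RangeHigh%
```

For 31-21 this should print the two numbers in ascending order.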

Now it gets harder.

In English, we have to figure out how many digits the higher number has.

I will use these examples:

Code: Select all

5-9,5-55,15-555,27-47852,25227-45319,29-2555,40000-40008,39987-40022,45315-45319
Ok, in pseudocode I think it looks like this. Start with the first example:

Code: Select all

5-9
Then get the length of each number:

Code: Select all

call :len _RangeHigh _RangeHigh_len
call :len _RangeLow _RangeLow_len
In this case both lengths are 1.
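
The :len helper is not defined anywhere in this post. A minimal sketch of what it could look like, assuming the first argument is the name of the variable to measure and the second is the name of the variable that receives the length:

```batch
@echo off
setlocal
set "_RangeHigh=47852"
call :len _RangeHigh _RangeHigh_len
echo %_RangeHigh_len%
exit /b

:len
rem Hypothetical helper. %1 = name of the input variable, %2 = name of the output variable.
rem Counts characters by removing one from the front until nothing is left.
setlocal EnableDelayedExpansion
set "_s=!%~1!"
set /a _n=0
:len_loop
if not defined _s goto :len_done
set "_s=!_s:~1!"
set /a _n+=1
goto :len_loop
:len_done
rem %_n% is expanded before endlocal runs, so the value survives into the caller's scope.
endlocal & set "%~2=%_n%"
exit /b
```

Removing one character per pass is slow for long strings, but line numbers are only a few digits long, so that should not matter here.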

Now loop %_RangeHigh_len% times, in this case once.

First loop:

get digit _RangeHigh[1] into _RangeHigh_CurrentDigit
get digit _RangeLow[1] into _RangeLow_CurrentDigit, using 0 if it is empty


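so _RangeLow_CurrentDigit=5 and _RangeHigh_CurrentDigit=9

With a single digit there are no positions left to pad, so both ends are inclusive and the digits are used as they are. For longer numbers, increment _RangeLow_CurrentDigit by one and decrement _RangeHigh_CurrentDigit by one first, as in the next example.

If _RangeHigh_CurrentDigit minus _RangeLow_CurrentDigit is greater than zero

create a regex: ^ for the beginning of the line, then [%_RangeLow_CurrentDigit%-%_RangeHigh_CurrentDigit%], then one [0-9] for each of the remaining _RangeHigh_len minus one positions, and end the regex string with :

The result should be ^[5-9]: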

-------------

Next example

Code: Select all

5-55

Code: Select all

call :len _RangeHigh _RangeHigh_len
call :len _RangeLow _RangeLow_len
get digit _RangeHigh[1] into _RangeHigh_CurrentDigit
get digit _RangeLow[1] into _RangeLow_CurrentDigit, using 0 if it is empty

And here is a problem: _RangeLow's first digit is 5, but that 5 is the ones digit, while _RangeHigh's first digit is the tens digit.
What needed to happen earlier was to left-pad _RangeLow with zeroes until it has as many digits as _RangeHigh.


Ok, new version:

Code: Select all

call :len _RangeHigh _RangeHigh_len
call :len _RangeLow _RangeLow_len
call :leftpad _RangeLow 0 %_RangeHigh_len%
get digit _RangeHigh[1] into _RangeHigh_CurrentDigit
get digit _RangeLow[1] into _RangeLow_CurrentDigit, using 0 if it is empty
The new state is:

_RangeLow=05
_RangeHigh=55
_RangeHigh_len=2
_RangeLow[1]->_RangeLow_CurrentDigit=0
_RangeHigh[1]->_RangeHigh_CurrentDigit=5
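
The :leftpad helper is also only named here, not written. A possible sketch, assuming the arguments are the variable name, the pad character and the target width, in that order, and that the variable is changed in place:

```batch
@echo off
setlocal
set "_RangeLow=5"
call :leftpad _RangeLow 0 2
echo %_RangeLow%
exit /b

:leftpad
rem Hypothetical helper. %1 = variable name, %2 = pad character, %3 = target width.
rem Prepends %3 pad characters, then keeps only the rightmost %3 characters.
rem A value longer than the target width would be cut off on the left.
setlocal EnableDelayedExpansion
set "_s=!%~1!"
for /L %%i in (1,1,%~3) do set "_s=%~2!_s!"
set "_s=!_s:~-%~3!"
endlocal & set "%~1=%_s%"
exit /b
```

With these inputs _RangeLow becomes 05. A padded value like this is best treated as a string: set /a reads numbers with a leading zero as octal, so 05 still works but 08 and 09 are rejected as invalid numbers.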

Increment _RangeLow_CurrentDigit by one and decrement _RangeHigh_CurrentDigit by one:

_RangeLow_CurrentDigit=1
_RangeHigh_CurrentDigit=4


_RangeHigh_CurrentDigit minus _RangeLow_CurrentDigit = 3, which is greater than zero,
so now we create the first regex

^[%_RangeLow_CurrentDigit%-%_RangeHigh_CurrentDigit%]
or ^[1-4]

Then pad with [0-9] _RangeHigh_len minus 1 times and end with :, so that's

Code: Select all

^[1-4][0-9]:
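
The step that produces this first regex could be sketched in batch roughly as follows. The _Regex and _Remaining names are assumptions made for this sketch, and unlike the "greater than zero" test described above it also emits a block when the two narrowed digits are equal, which is my own assumption.

```batch
@echo off
setlocal
set "_RangeLow=05"
set "_RangeHigh=55"
set "_RangeHigh_len=2"

rem Take the leading digit of each bound, then narrow by one on each side.
set /a "_RangeLow_CurrentDigit=%_RangeLow:~0,1% + 1"
set /a "_RangeHigh_CurrentDigit=%_RangeHigh:~0,1% - 1"

rem Nothing to emit if no whole block fits between the two leading digits.
if %_RangeHigh_CurrentDigit% LSS %_RangeLow_CurrentDigit% goto :eof

rem Inside the quotes the caret is kept as a literal character.
set "_Regex=^[%_RangeLow_CurrentDigit%-%_RangeHigh_CurrentDigit%]"

rem Append one [0-9] for each remaining digit position.
set /a "_Remaining=_RangeHigh_len - 1"
:pad_loop
if %_Remaining% LEQ 0 goto :pad_done
set "_Regex=%_Regex%[0-9]"
set /a "_Remaining-=1"
goto :pad_loop
:pad_done
set "_Regex=%_Regex%:"

rem "set _Regex" is used for display because a plain "echo %_Regex%" would swallow the caret.
set _Regex
```

For 05 and 55 this should show _Regex=^[1-4][0-9]: which matches the pattern worked out by hand above.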
Loop to the next digit position (the position counter is _RangeHigh_len_index).

Right now we have 10 to 49 covered; we need two more regexes, ^[5-9]: and ^5[0-5]:

I think for the rest of the loop this means a low side and a high side regex need to be created.
The low side regex should take _RangeLow_CurrentDigit, right-pad it with zeroes for all remaining positions of _RangeHigh_len, then subtract 1. This is the _Current_Regex_LowLimit.

Likewise, _RangeHigh_CurrentDigit needs to be right-padded with 9 and then incremented by one; this makes the _Current_Regex_HighLimit.

so

call :rightpad _RangeLow_CurrentDigit 0 %_RangeHigh_len%-%_RangeHigh_len_index%
call :rightpad _RangeHigh_CurrentDigit 9 %_RangeHigh_len%-%_RangeHigh_len_index%

_RangeLow_CurrentDigit is now 10
_RangeHigh_CurrentDigit is now 49

decrement _RangeLow_CurrentDigit and increment _RangeHigh_CurrentDigit

_RangeLow_CurrentDigit is now 9
_RangeHigh_CurrentDigit is now 50
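
The padding and limit steps above might look like this in batch. It is a sketch under assumptions: :rightpad is a hypothetical helper whose third argument is taken to be the number of characters to append (the remaining positions, here 2 minus 1), and _Current_Regex_HighLimit is a name assumed for the high side limit.

```batch
@echo off
setlocal
set "_RangeLow_CurrentDigit=1"
set "_RangeHigh_CurrentDigit=4"

rem One position remains after the leading digit (_RangeHigh_len 2 minus index 1).
call :rightpad _RangeLow_CurrentDigit 0 1
call :rightpad _RangeHigh_CurrentDigit 9 1

rem The low side ends just below the covered block, the high side starts just above it.
set /a "_Current_Regex_LowLimit=_RangeLow_CurrentDigit - 1"
set /a "_Current_Regex_HighLimit=_RangeHigh_CurrentDigit + 1"
echo %_Current_Regex_LowLimit% %_Current_Regex_HighLimit%
exit /b

:rightpad
rem Hypothetical helper. %1 = variable name, %2 = pad character, %3 = characters to append.
setlocal EnableDelayedExpansion
set "_s=!%~1!"
for /L %%i in (1,1,%~3) do set "_s=!_s!%~2"
endlocal & set "%~1=%_s%"
exit /b
```

With these inputs the padded values are 10 and 49, and the two limits come out as 9 and 50, the same numbers as in the walkthrough.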

I have to quit at this point, sorry. I will pick this up later.

_RangeLow_CurrentDigit might need to be 09; I will see.
