New technic: set /p can read multiple lines from a file

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

New technic: set /p can read multiple lines from a file

#1 Post by jeb » 12 Aug 2011 13:45

Hi,
today I saw a new technique from walid2mi on Stack Overflow: "Merge 2 txt files in a single tab delimited file in batch".

Perhaps you already know it, but I was absolutely impressed :D :)
I knew that you can read a single line (only the first line) with

Code: Select all

<myfile.txt set /p line=


But walid2mi uses a simple trick to extend it to read multiple lines from a file

Code: Select all

< myfile.txt (
  set /p line1=
  set /p line2=
  set /p line3=
  set /p line4=
  set /p line5=
)
set line

And it's even better than that: it can read empty lines and any content, even "%&|<>^ and also !
This works independently of the delayed expansion mode :!:
This is impossible with a simple FOR loop.
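
A minimal sketch of that difference, assuming a hypothetical file test.txt whose first line contains exclamation marks and carets. With delayed expansion enabled, the FOR /F variable is mangled when it is expanded, while the SET /P value is only ever expanded with !ln! and should come out intact:

```batch
@echo off
setlocal EnableDelayedExpansion
rem test.txt is a hypothetical input file; put exclamation marks and carets in its first line.

echo --- FOR /F with delayed expansion enabled ---
rem The expanded FOR variable passes through the delayed expansion phase,
rem so exclamation marks and carets in the line get lost.
rem FOR /F also skips empty lines and, by default, lines starting with a semicolon.
for /f "usebackq delims=" %%a in ("test.txt") do echo(%%a

echo --- SET /P inside a redirected block ---
<"test.txt" (
  set "ln="
  set /p "ln="
  echo(!ln!
)
```

Only the first line is read in the SET /P part; the FOR /F part prints every non-empty line.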

Btw, did I say that I'm totally excited :D :!:

jeb

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: New technic: set /p can read multiple lines from a file

#2 Post by aGerman » 13 Aug 2011 07:13

That's awesome. I tested with this code (skip the first 3 rows and put the next 2 into variables) and it worked like a charm:

Code: Select all

@echo off
setlocal enabledelayedexpansion

<"%~f0" (
  for /l %%i in (1,1,3) do set /p "="
  set /p "line_="
  set /p "line__="
)

echo(!line_!
echo(!line__!
pause
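
The same skip-then-read idea can be generalized to fetch an arbitrary line. This is only a sketch: the script name getline.bat and its two arguments are assumptions for illustration, not something from the posts above.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical usage: getline.bat "file" N   - prints line N of the file
set /a "skip=%~2-1"
<"%~1" (
  rem Discard the first N-1 lines, exactly like the FOR /L loop above.
  for /l %%i in (1,1,%skip%) do set /p "="
  rem The next SET /P continues where the skipped reads stopped.
  set "ln="
  set /p "ln="
)
echo(!ln!
```

If N is larger than the number of lines, ln stays undefined and an empty line is printed.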


jeb wrote:Btw, did I say that I'm totally excited :D :!:

You're indeed batch-crazy, jeb :lol:

Regards
aGerman

Ranguna173
Posts: 104
Joined: 28 Jul 2011 17:32

Re: New technic: set /p can read multiple lines from a file

#3 Post by Ranguna173 » 13 Aug 2011 13:18

Hmmmm....
Really good..

I think this can come in handy..

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: New technic: set /p can read multiple lines from a file

#4 Post by dbenham » 14 Aug 2011 19:28

aGerman wrote:
jeb wrote:Btw, did I say that I'm totally excited :D :!:

You're indeed batch-crazy, jeb :lol:

I'm with you jeb - This is fantastic. :D

My favorite part is there is no longer a need to enable and disable delayed expansion with each loop iteration. This makes it MUCH easier for complex logic that requires variables set for one line to be preserved for subsequent lines.

But - as a bonus, this technique is actually faster than the old way :!: (I think because there is no need to setlocal/endlocal with each iteration.)
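
A small sketch of what "variables preserved for subsequent lines" buys you: delayed expansion stays enabled for the whole loop, so a counter and a "last seen" value survive from one line to the next without any SETLOCAL/ENDLOCAL juggling. The script name and the variables nonEmpty and last are made up for this illustration.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical usage: countlines.bat "file"
set "file=%~1"
rem First pass: count the lines so the FOR /L loop knows when to stop.
for /f %%n in ('type "%file%" ^| find /c /v ""') do set "lines=%%n"
set /a nonEmpty=0
set "last="
<"%file%" (
  for /l %%l in (1,1,%lines%) do (
    set "ln="
    set /p "ln="
    rem State set here is still available when the next line is read.
    if defined ln (
      set /a nonEmpty+=1
      set "last=!ln!"
    )
  )
)
echo Lines: %lines%, non-empty: %nonEmpty%
echo Last non-empty line: !last!
```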

I wanted to test the performance so I wrote loops to copy a ~50k text file. Useless I know, but it is a good test that easily verifies everything was read correctly. I wanted to test this against the "traditional" way of reading a text file while preserving empty lines and special characters.

- Copy1 uses the new technique, writes entire file at end of outer block.
- Copy2 same as Copy1, but writes each line using append - MUCH SLOWER
- Copy3 "Traditional method"
- Copy4 Same as Copy3, but I wanted to investigate performance of type file|find /c /v "" vs. find /c /v "" file. No difference found.
- Copy1a Same as Copy1, but with an unneeded SETLOCAL/ENDLOCAL added to show that this is why Copy3/4 are slower than Copy1.

I tested each method 10 times. The new method was roughly twice as fast.

The timing macros used in this test are available here.

Code: Select all

@echo off
setlocal
if not defined macro\load.macrolib_time call macrolib_time

set file="%~1"
for %%a in (%file%) do set size=%%~za
for /f %%a in ('type %file%^|find /c /v ""') do set lines=%%a
echo file=%file%, lines=%lines%, size=%size%

::copy1
set out="%~1.copy1"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal enableDelayedExpansion
<%file% (
  for /f %%n in ('type %file%^|find /c /v ""') do for /l %%l in (1 1 %%n) do (
    set /p "ln="
    echo(!ln!
    set "ln="
  )
)>%out%
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy1") %macro.diffTime%
set copy1
)
fc %file% %out%

::copy2
set out="%~1.copy2"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal enableDelayedExpansion
if exist %out% del %out%
<%file% (
  for /f %%n in ('type %file%^|find /c /v ""') do for /l %%l in (1 1 %%n) do (
    set /p "ln="
    echo(!ln!>>%out%
    set "ln="
  )
)
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy2") %macro.diffTime%
set copy2
)
fc %file% %out%

::copy3 "traditional"
set out="%~1.copy3"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal disableDelayedExpansion
(for /f "skip=2 tokens=*" %%a in ('find /n /v "" %file%') do (
  set "ln=%%a"
  setlocal enableDelayedExpansion
  set "ln=!ln:*]=!"
  echo(!ln!
   endlocal
))>%out%
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy3") %macro.diffTime%
set copy3
)
fc %file% %out%

::copy4 "traditional"
set out="%~1.copy4"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal disableDelayedExpansion
(for /f "tokens=*" %%a in ('type %file%^|find /n /v ""') do (
  set "ln=%%a"
  setlocal enableDelayedExpansion
  set "ln=!ln:*]=!"
  echo(!ln!
   endlocal
))>%out%
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy4") %macro.diffTime%
set copy4
)
fc %file% %out%

::copy1a
set out="%~1.copy1a"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal enableDelayedExpansion
<%file% (
  for /f %%n in ('type %file%^|find /c /v ""') do for /l %%l in (1 1 %%n) do (
    setlocal enableDelayedExpansion
    set /p "ln="
    echo(!ln!
    set "ln="
    endlocal
  )
)>%out%
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy1a") %macro.diffTime%
set copy1a
)
fc %file% %out%

Results:

Code: Select all

C:\Users\Public\utils>test2.bat testcall.bat
file="testcall.bat", lines=1685, size=52081
copy1=00:00:01.09
copy1=00:00:00.60
copy1=00:00:00.59
copy1=00:00:01.10
copy1=00:00:00.59
copy1=00:00:00.59
copy1=00:00:00.58
copy1=00:00:00.59
copy1=00:00:00.59
copy1=00:00:00.59
Comparing files testCall.bat and TESTCALL.BAT.COPY1
FC: no differences encountered

copy2=00:00:18.83
copy2=00:00:18.78
copy2=00:00:18.79
copy2=00:00:19.27
copy2=00:00:18.82
copy2=00:00:18.84
copy2=00:00:18.90
copy2=00:00:18.80
copy2=00:00:19.68
copy2=00:00:18.98
Comparing files testCall.bat and TESTCALL.BAT.COPY2
FC: no differences encountered

copy3=00:00:01.41
copy3=00:00:01.42
copy3=00:00:01.41
copy3=00:00:01.40
copy3=00:00:01.41
copy3=00:00:01.41
copy3=00:00:01.45
copy3=00:00:01.41
copy3=00:00:01.41
copy3=00:00:01.40
Comparing files testCall.bat and TESTCALL.BAT.COPY3
FC: no differences encountered

copy4=00:00:01.41
copy4=00:00:01.41
copy4=00:00:01.43
copy4=00:00:01.43
copy4=00:00:01.46
copy4=00:00:01.43
copy4=00:00:01.44
copy4=00:00:01.43
copy4=00:00:01.43
copy4=00:00:01.45
Comparing files testCall.bat and TESTCALL.BAT.COPY4
FC: no differences encountered

copy1a=00:00:01.52
copy1a=00:00:02.00
copy1a=00:00:01.50
copy1a=00:00:01.48
copy1a=00:00:01.49
copy1a=00:00:01.50
copy1a=00:00:01.99
copy1a=00:00:01.98
copy1a=00:00:01.99
copy1a=00:00:01.49
Comparing files testCall.bat and TESTCALL.BAT.COPY1A
FC: no differences encountered


Dave Benham

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: New technic: set /p can read multiple lines from a file

#5 Post by jeb » 16 Aug 2011 02:36

And there is one more functionality ...
It can even read lines that are appended later (while reading).

I found no way of detecting the end of the file with "set /p"; it simply reads empty lines. :(
But if you then append a line, it reads it immediately.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
cls
<myText.txt (
   call :readLoop
)
goto :eof

:readLoop
set "line="
set /p line=
if defined line   echo(!line!
if !line!==STOP goto :eof
goto :readLoop
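
To try this out, a second script can append lines while the reader above is spinning in its loop. This is only a sketch: the one-second delay and the text of the appended lines are arbitrary choices, and TIMEOUT is assumed to be available (it is missing on XP).

```batch
@echo off
rem Run this in a second console while the reader loop is waiting on myText.txt.
for /l %%i in (1,1,3) do (
  >>myText.txt echo appended line %%i
  timeout /t 1 >nul
)
rem The reader stops as soon as it reads this line.
>>myText.txt echo STOP
```

Note that the reader is a busy loop: it keeps calling SET /P at full speed while it waits for new data.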


jeb

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: New technic: set /p can read multiple lines from a file

#6 Post by dbenham » 16 Aug 2011 06:24

jeb wrote:And there is one more functionality ...
It can even read lines that are appended later (while reading).

Does this not work in a normal FOR /F loop :?: I haven't tested. I realize the line would have to be written before the FOR /F loop reaches the end, but I would think it would work.

jeb wrote:I found no way of detecting the end of the file with "set /p"; it simply reads empty lines.
But if you then append a line, it reads it immediately.

Yes - I had hoped that the %errorlevel% would help. An error is set, but it doesn't differentiate between an exhausted input stream and an empty line. I am mildly surprised that the input stream remains open after exhaustion such that the "endless" loop terminates upon appending the STOP line. That modifies my understanding of how the input redirection works - thanks.

I confirmed that the terminating STOP line does not require the <CR><LF> at the end. Which raises an interesting question - what if STOP is written character by character and SET /P happens to read the line in the middle? It seems as though the loop should then carry on.

I thought of attempting to read the entire file via a GOTO loop, but didn't like the necessity of modifying the file to detect the end. I also assumed it would be slower than a FOR /L loop, even though the FOR /L solution must read the file twice - the 1st time to determine the number of lines.

I ran some tests and the GOTO solution is indeed significantly slower. I terminated the test file with ::STOP prior to running the test.

Code: Select all

@echo off
setlocal
if not defined macro\load.macrolib_time call macrolib_time

set file="%~1"
for %%a in (%file%) do set size=%%~za
for /f %%a in ('type %file%^|find /c /v ""') do set lines=%%a
echo file=%file%, lines=%lines%, size=%size%

::copy1
set out="%~1.copy1"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal enableDelayedExpansion
<%file% (
  for /f %%n in ('type %file%^|find /c /v ""') do for /l %%l in (1 1 %%n) do (
    set /p "ln="
    echo(!ln!
    set "ln="
  )
)>%out%
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy1") %macro.diffTime%
set copy1
)
fc %file% %out%

::copy2
set out="%~1.copy2"
for /l %%A in (1 1 10) do (
%macro_Call% ("t1") %macro.getTime%
setlocal enableDelayedExpansion
if exist %out% del %out%
<%file% (
  call :readLoop
)>%out%
endlocal
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 copy2") %macro.diffTime%
set copy2
)
fc %file% %out%

exit /b

:readLoop
set "line="
set /p line=
echo(!line!
if !line!==::STOP exit /b
goto :readLoop

Results:

Code: Select all

file="testCall.bat", lines=1686, size=52089
copy1=00:00:00.70
copy1=00:00:00.70
copy1=00:00:00.68
copy1=00:00:00.69
copy1=00:00:00.70
copy1=00:00:01.20
copy1=00:00:01.23
copy1=00:00:01.20
copy1=00:00:00.70
copy1=00:00:00.71
Comparing files testCall.bat and TESTCALL.BAT.COPY1
FC: no differences encountered

copy2=00:00:05.79
copy2=00:00:05.75
copy2=00:00:05.75
copy2=00:00:05.77
copy2=00:00:05.75
copy2=00:00:05.77
copy2=00:00:05.76
copy2=00:00:05.77
copy2=00:00:05.73
copy2=00:00:05.75
Comparing files testCall.bat and TESTCALL.BAT.COPY2
FC: no differences encountered


Dave Benham

Ranguna173
Posts: 104
Joined: 28 Jul 2011 17:32

Re: New technic: set /p can read multiple lines from a file

#7 Post by Ranguna173 » 16 Aug 2011 19:01

Awesome!

Works perfectly.. :D :D :D

*New save system coming* :D

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: New technic: set /p can read multiple lines from a file

#8 Post by aGerman » 17 Aug 2011 12:18

Dave,

you could measure again with this code:

Code: Select all

<%file% (
  for /f %%i in ('findstr /n "^" %file%') do (
    set /p "ln="
    echo(!ln!
    set "ln="
  )
)>%out%

Not sure, but it could be faster because you don't need the FOR /L loop.

Regards
aGerman

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: New technic: set /p can read multiple lines from a file

#9 Post by dbenham » 17 Aug 2011 13:36

Great idea aGerman :!: :D

We can also try this analogous form with FIND:

Code: Select all

<%file% (
  for /f "skip=2" %%i in ('find /n /v "" %file%') do (
    set /p "ln="
    echo(!ln!
    set "ln="
  )
)>%out%

I agree that at least one if not both of these forms should be faster.

I think it will come down to a race to see which is faster: FIND or FINDSTR

I'll run some timing tests later tonight.

Dave Benham

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: New technic: set /p can read multiple lines from a file

#10 Post by aGerman » 17 Aug 2011 13:46

Yeah, that would be interesting. I didn't test the speed because I'm virtually certain my results would be completely different (CPU, RAM ...).

Regards
aGerman

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: New technic: set /p can read multiple lines from a file

#11 Post by jeb » 18 Aug 2011 13:38

... and two more interesting effects.

1. As Dave also discovered, the errorlevel is set to 1 for an empty line, so you can't tell whether an empty line was read or the end of the file was reached. In addition, the errorlevel isn't reset to 0 when the next set /p reads a non-empty line.

2. (again demonstrated by walid2mi: Alternative creation of LF)
This demonstrates the usage of PAUSE: pause reads exactly one character, so it can even split a CR/LF at a line end.

Code: Select all

@echo off

:)
setlocal enabledelayedexpansion
>nul,(pause&set /p LF=&pause&set /p LF=)<%0
set LF=!LF:~0,1!

echo 1!LF!2!LF!3
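
The first effect (the errorlevel) can be checked with a small sketch like the following. The scratch file name and the variables a, b and c are assumptions for this illustration, and the script is meant to be saved as a .bat file. If the observations in this thread hold, all three reads should report errorlevel 1, which is why the value is of no use for telling an empty line from the end of the file.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical scratch file: one empty line followed by one line of text.
set "demo=%temp%\setp_errorlevel_demo.txt"
>"%demo%" (
  echo(
  echo text
)
set "a=" & set "b=" & set "c="
<"%demo%" (
  set /p "a="
  echo 1st read, empty line: a=[!a!] errorlevel=!errorlevel!
  set /p "b="
  echo 2nd read, text line:  b=[!b!] errorlevel=!errorlevel!
  set /p "c="
  echo 3rd read, past EOF:   c=[!c!] errorlevel=!errorlevel!
)
del "%demo%"
```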


jeb

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: New technic: set /p can read multiple lines from a file

#12 Post by dbenham » 18 Aug 2011 20:46

This was much more complicated than I thought it would be. Results are NOT as expected - in fact, I find some of them shocking!

1) This does NOT work :!:

Code: Select all

<%file% (
  for /f "skip=2" %%i in ('find /n /v "" %file%') do (
    set "ln="
    set /p "ln="
    echo(!ln!
  )
)>%out%

It reads and writes the correct number of lines - but the lines are empty :shock:
I don't understand the mechanism of failure. :?

Yet this works just fine:

Code: Select all

<%file% (
  for /f "skip=2" %%i in ('type %file%^|find /n /v ""') do (
    set "ln="
    set /p "ln="
    echo(!ln!
  )
)>%out%

And so does aGerman's original suggestion with FINDSTR.

2) As often happens with batch programming - the methods that seem like they should be faster are actually slower as the size of the file increases.

Once I proved that each method was able to copy properly, I stripped out the copy portion and preserved only the read portion of the code. In this way I was able to time just the code that is necessary to do the read.

I tested 6 different methods for reading a text file: 4 using the new SET /P syntax, and two using the "traditional" FOR /F approach. I did not further test the SET /P approach using a GOTO loop because A) I've already shown it is slower, and B) it requires altering the text file with an appended STOP flag. The read should be non-destructive.

For test files to read, I started with one batch file that was approximately 1 kbyte in size and progressively doubled the size until I reached 32k. I did the same with a file that was approximately 50k and doubled it until I reached 1600k.

I tested each of the methods 10 times against the 1k derived test files, and 3 times against the 50k derived files, and averaged the results.

I also ran the tests on two different machines.

The test code takes 2 arguments:
%1 = the test file to read
%2 = the number of times to test each method

Here is the test code:

Code: Select all

@echo off
setlocal enableDelayedExpansion
if not defined macro\load.macrolib_time call macrolib_time

set cnt=%2

set file="%~1"
for %%a in (%file%) do set size=%%~za
for /f %%a in ('type %file%^|find /c /v ""') do set lines=%%a
echo file=%file%, lines=%lines%, size=%size%

::read1 FOR /F ('FIND /C FILE') FOR /L () SET /P
for /l %%A in (1 1 %cnt%) do (
%macro_Call% ("t1") %macro.getTime%
<%file% (
  for /f "delims=" %%n in ('find /c /v "" %file%') do set "len=%%n"&for /l %%l in (1 1 !len:*: ^=!) do (
    set "ln="
    set /p "ln="
  )
)
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 read1") %macro.diffTime%
set read1
)
echo(

::read2 FOR /F ('TYPE FILE|FIND /C') FOR /L () SET /P
for /l %%A in (1 1 %cnt%) do (
%macro_Call% ("t1") %macro.getTime%
<%file% (
  for /f %%n in ('type %file%^|find /c /v ""') do for /l %%l in (1 1 %%n) do (
    set "ln="
    set /p "ln="
  )
)
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 read2") %macro.diffTime%
set read2
)
echo(

::read3 FOR /F ('FINDSTR FILE') SET /P
for /l %%A in (1 1 %cnt%) do (
%macro_Call% ("t1") %macro.getTime%
<%file% (
  for /f %%a in ('findstr /n "^" %file%') do (
    set "ln="
    set /p "ln="
  )
)
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 read3") %macro.diffTime%
set read3
)
echo(

::read4 FOR /F ('TYPE FILE|FIND') SET /P
for /l %%A in (1 1 %cnt%) do (
%macro_Call% ("t1") %macro.getTime%
<%file% (
  for /f %%a in ('type %file%^|find /n /v ""') do (
    set "ln="
    set /p "ln="
  )
)
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 read4") %macro.diffTime%
set read4
)
echo(

setlocal DisableDelayedExpansion
::read5 "Traditional" FOR /F ('FINDSTR')
for /l %%A in (1 1 %cnt%) do (
%macro_Call% ("t1") %macro.getTime%
(
  for /f "tokens=*" %%a in ('findstr /n "^" %file%') do (
    set "ln=%%a"
    setlocal enableDelayedExpansion
    set "ln=!ln:*:=!"
    endlocal
  )
)
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 read5") %macro.diffTime%
set read5
)
echo(

::read6 "Traditional" FOR /F ('FIND')
for /l %%A in (1 1 %cnt%) do (
%macro_Call% ("t1") %macro.getTime%
(
  for /f "skip=2 tokens=*" %%a in ('find /n /v "" %file%') do (
    set "ln=%%a"
    setlocal enableDelayedExpansion
    set "ln=!ln:*]=!"
    endlocal
  )
)
%macro_Call% ("t2") %macro.getTime%
%macro_Call% ("t1 t2 read6") %macro.diffTime%
set read6
)

I've summarized the methods used above using an abbreviated syntax:

Read1 = FOR /F ('FIND /C FILE') FOR /L () SET /P
This is a variation of my original Copy1 method where I eliminate the pipe while determining the number of lines.

Read2 = FOR /F ('TYPE FILE|FIND /C') FOR /L () SET /P
This is my original Copy1 method

Read3 = FOR /F ('FINDSTR /N FILE') SET /P
This is aGerman's suggestion

Read4 = FOR /F ('TYPE FILE|FIND /N') SET /P
This is my variation of aGerman's suggestion, with the added pipe to get around the unexplained failure.

Read5 = FOR /F ('FINDSTR /N')
This is the "traditional" method using FINDSTR /N to preserve the empty lines

Read6 = FOR /F ('FIND /N')
This is the "traditional" method using FIND /N to preserve the empty lines
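
Read1 is the one that needs the odd-looking !len:*: ^=! expression, so here is a sketch of that step in isolation. When FIND /C is given a file name instead of piped input, it prints a header line of the form "---------- NAME: count" rather than just the count, and the substitution strips everything up to and including the first colon-plus-space. The script layout is an assumption for illustration.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical usage: linecount.bat "file"
set "file=%~1"
rem FOR /F skips the empty line FIND prints first and captures the header line.
for /f "delims=" %%n in ('find /c /v "" "%file%"') do set "len=%%n"
echo raw:   !len!
rem Remove everything up to and including the first colon followed by a space.
set "len=!len:*: =!"
echo count: !len!
```

In the test code the equals sign is escaped as ^= only because the expression sits inside the IN() clause of FOR /L, where an unescaped = would be treated as a delimiter.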


Results on a Vista64 Gateway Intel Quad Core2

Code: Select all

                        A V E R A G E   T I M E  ( s e c o n d s )
    Size  Lines  runs  Read1  Read2  Read3   Read4   Read5   Read6
     ~1k     24    10   0.49   0.15   0.53    0.15    0.60    0.15
     ~2k     48    10   0.39   0.20   0.59    0.16    0.47    0.12
     ~4k     96    10   0.50   0.12   0.60    0.22    0.66    0.16
     ~8k    192    10   0.42   0.19   0.53    0.24    0.58    0.24
    ~16k    384    10   0.45   0.18   0.17    0.19    0.39    0.39
    ~32k    768    10   0.47   0.32   0.27    0.29    0.71    0.71
    ~50k   1685     3   0.56   0.65   0.54    0.56    1.47    1.48
   ~100k   3370     3   0.63   0.81   1.16    1.19    3.04    3.06
   ~200k   6740     3   1.32   1.66   2.92    2.98    6.66    6.71
   ~400k  13480     3   2.55   3.22   8.42    8.61   15.90   16.07
   ~800k  26960     3   5.00   6.37  27.54   28.17   42.44   43.16
  ~1600k  53920     3   9.90  12.62  98.73  101.32  129.13  131.55

Results on an Old Dell XP machine

Code: Select all

                        A V E R A G E   T I M E  ( s e c o n d s )
    Size  Lines  runs  Read1  Read2  Read3   Read4   Read5   Read6
     ~1k     24    10   0.34   0.49   0.34    0.34    0.37    0.36
     ~2k     48    10   0.35   0.49   0.34    0.51    0.42    0.39
     ~4k     96    10   0.35   0.51   0.37    0.52    0.47    0.45
     ~8k    192    10   0.38   0.55   0.39    0.56    0.60    0.62
    ~16k    384    10   0.45   0.64   0.45    0.68    0.90    0.94
    ~32k    768    10   0.56   0.77   0.62    1.12    1.48    1.58
    ~50k   1685     3   0.82   1.14   1.00    1.41    2.87    3.32
   ~100k   3370     3   1.51   1.96   1.96    2.77    5.63    6.31
   ~200k   6740     3   2.51   2.96   4.05    6.37   11.67   13.76
   ~400k  13480     3   4.31   5.49  10.93   17.58   28.25   32.44
   ~800k  26960     3   8.36  11.35  34.72   53.78   69.24   83.73
  ~1600k  53920     3  15.88  20.77 119.54  180.32  180.76  240.50

My Vista machine is faster, but it has a quirk in that sometimes when the machine needs to invoke CMD.EXE there is a consistent .5 second delay that randomly creeps in. If you look at my times in my first post in this thread you can see what I am talking about. In these tests, Read1 for a 1k file was either ~.10 or ~.60 seconds. The variations are significant for small files, but virtually disappear for large files.

The timings for the XP machine are slower, but much better clustered.

All methods are virtually equivalent for small files, but as the files grow, the methods really begin to differentiate.

The results surprised me initially, but I think I understand somewhat why aGerman's suggestion is slower.

Windows command shell does not have true pipes between processes or threads like Unix. Instead there is only one active process within a given session. So whenever Windows needs to pipe explicitly, or implicitly like when FOR /F executes a command, the spawned command shell must complete its job entirely and cache the results before they are sent to the next "process" in line. I think for small files everything is cached in memory and we don't see much degradation. But as the output grows, it has to cache to disk and this is what slows it down. Although this "disk caching" performance hit seems to be much worse for the 'command' within a FOR /F than it does for an explicit pipe. Perhaps the mechanisms are different.

EDIT - Now I'm not so sure this is true (that the Windows command shell lacks true pipes). I'm pretty sure it was true with original DOS and COMMAND.COM. But some recent reading indicates CMD.EXE has true pipes after all. But there definitely seems to be some kind of buffering issue when piping large amounts of data.

I'm sure I have some inaccuracies in my explanation, but I think there is at least a grain of truth to the above.

The Read1 method never caches more than one line of data while determining the file size, so it is by far the fastest.

The Read2 method is nearly identical except it "caches" the entire file once for the left half of the pipe operation.

The Read3 method caches the entire file with line number prefixes, but it is now for the FOR /F command and not an explicit pipe. For some reason this becomes increasingly slow for large outputs.

The Read4 method does the same as Read3, plus it must cache the entire file an additional time for an explicit pipe.

Read5 and Read6 must cache the entire file plus the line number prefixes, plus they must invoke SETLOCAL/ENDLOCAL for each line. However I did some timings without SETLOCAL/ENDLOCAL (not shown) and there is some additional mechanism that makes this slower than Read3/Read4. I'm guessing it has something to do with the fact that Read3/Read4 only preserve the 1st token of each line within the FOR /F, whereas Read5/Read6 must preserve the entire line. Does this imply that the tokenised results are also "cached"? :?

The results seem to show that FIND is inherently slower than FINDSTR. It's too bad FINDSTR does not have the /C option that FIND has.

The methods that don't require caching of the FOR /F command are fairly linear. Each time the file size is doubled the timing is also doubled.

But methods that do require caching of a FOR /F command are worse than linear. Doubling the size of a large file increases the time by a factor of three or more. :( :?
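
As a worked check of those two statements, here is a small sketch that computes the growth factor between the ~800k and ~1600k rows of the Vista table for Read1 and Read3. The times are entered in hundredths of a second because SET /A only does integer arithmetic; the :ratio helper is made up for this illustration.

```batch
@echo off
setlocal
rem Times in hundredths of a second, taken from the Vista table above.
call :ratio Read1 500 990
call :ratio Read3 2754 9873
exit /b

:ratio  name  time800k  time1600k
rem Scale by 100 to keep two decimal places in integer arithmetic.
set /a "r=%3*100/%2"
echo %~1 grew by a factor of %r:~0,-2%.%r:~-2%
exit /b
```

This prints a factor of 1.98 for Read1 (roughly linear) and 3.58 for Read3 (clearly worse than linear).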


So in the future I will be using either Read1 or Read2 method. Read1 is faster, but a bit more complex to write.

Dave Benham
Last edited by dbenham on 04 Oct 2011 21:43, edited 1 time in total.

OJBakker
Expert
Posts: 88
Joined: 12 Aug 2011 13:57

Re: New technic: set /p can read multiple lines from a file

#13 Post by OJBakker » 19 Aug 2011 15:04

dbenham wrote:1) This does NOT work :!:

<%file% (
for /f "skip=2" %%i in ('find /n /v "" %file%') do (
set "ln="
set /p "ln="
echo(!ln!
)
)>%out%


It reads and writes the correct number of lines - but the lines are empty :shock:
I don't understand the mechanism of failure. :?


I have done some tests and it seems that FIND is somehow eating everything from the input stream.
To confirm this I changed your code a bit.

Code: Select all

set filetmp=%File%.tmp
set bool=1 & type %file% > %filetmp%
<%filetmp% (
  for /f "skip=2" %%i in ('find /n /v "" %filetmp%') do (
    if defined bool (type %filetmp%>>%filetmp%) & set "bool="
    set "ln="
    set /p "ln="
    echo(!ln!
  )
)>%out%

It is cheating, but with these changes the FOR loop works as it should.

OJB

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: New technic: set /p can read multiple lines from a file

#14 Post by jeb » 19 Aug 2011 16:34

Some new empirical findings...

Testfile.txt:
one
two
three
four


FIND and MORE work differently from FINDSTR.
I suppose a file position variable exists internally.
None of the commands closes the input stream; even when they reach the end, the stream is still open.

FIND and MORE first reset the file position variable and then read the complete file up to EOF,
so both can be used to reread the same file multiple times.

FINDSTR simply reads the next data from the current position; if the position is at EOF, it sets the errorlevel to 1.
This is a way to detect the end of the file, but not a very useful one, as it consumes the rest of the input if the position isn't at the end :(

PAUSE reads exactly one character but never sets the errorlevel at all.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
< testfile.txt  (
   set /p line=
   echo set/p-read: !line!

   echo --- FINDSTR ---
   findstr /n ^^

   echo --- MORE ---
   more

   echo --- FIND ---
   find /n /v ""
)


Output:
set/p-read: one
--- FINDSTR ---
1:two
2:three
3:four
--- MORE ---
one
two
three
four
--- FIND ---
[1]one
[2]two
[3]three
[4]four
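
One practical use of the FINDSTR behaviour shown above is splitting a file into a header line and "everything else". This is a sketch that relies on the empirical behaviour observed here (FINDSTR continuing from the current file position); the script name is an assumption for illustration.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical usage: splitheader.bat "file"
<"%~1" (
  rem SET /P consumes the first line only.
  set "header="
  set /p "header="
  echo Header: !header!
  echo --- remaining lines ---
  rem FINDSTR reads from the current position, so it sees only what is left.
  findstr "^"
)
```

Replacing FINDSTR with MORE or FIND here would print the whole file again, header included, because both reset the file position first.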


jeb

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: New technic: set /p can read multiple lines from a file

#15 Post by dbenham » 19 Aug 2011 17:00

That's pretty cool jeb 8)

But I am still mystified how the 'FIND /N /V "" %FILE%' in my FOR IN() clause is sharing the same input stream (or file handle?) with the <%file% redirection :?: :?

I would have thought that each would maintain its own file position pointer, especially since the IN() clause command gets its own copy of CMD.EXE.

Dave Benham
