Mechanics of reading a file with FOR /F

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Mechanics of reading a file with FOR /F

#1 Post by dbenham » 01 Jan 2012 13:54

I was investigating what happens when you read a file with FOR /F and exit prematurely via GOTO. All testing was done on 64 bit Vista with a quad core processor.

I created a ~1.5 GB text file and used the following batch script to read the first line and exit:

Code:

echo on
for /f "delims=" %%a in (%1) do echo %%a&goto quit
:quit

First the FOR loop line was immediately echoed, and then my disk drive thrashed for what seemed like over a minute before anything more happened.

Next the FOR DO clause was echoed, the 1st line was immediately echoed, and the batch script ended.

But then my disk drive continued to thrash for what seemed like at least half an hour. I brought up the Resource Monitor to look at the disk activity and saw a process named "svchost.exe (LocalServiceNetworkRestricted)" that was continuously reading from pagefile.sys (virtual memory). There was virtually no write activity. At long last the process terminated and all was quiet again.

So I have two questions:
1) Why the long delay before reading and echoing the 1st line? - I think I know this one.
2) What was my machine reading for so long after the batch script terminated? - I have a theory.

1) Why the long delay before reading and echoing the 1st line?
I figure that FOR /F must load the entire file into memory (virtual memory if it is large enough) prior to reading any of the lines. If this is true, then lines that are read should be immune to any changes that are made to the file by the DO clause. So I decided to test this theory.

I created a small test.txt file

Code:

1
2
3
4
5

Processed by this batch

Code:

@echo off
echo Within append loop
for /f %%a in (test.txt) do (
  echo %%a
  if %%a==1 echo %%a>>test.txt
)
echo(
echo After append loop
type test.txt

echo(
echo Within modify loop
set "flag="
for /f %%a in (test.txt) do (
  if not defined flag del test.txt&set flag=1
  echo %%a
  echo Line %%a>>test.txt
)
echo(
echo After modify loop
type test.txt

echo(
echo Delete test
for /f "delims=" %%a in (test.txt) do (
  echo %%a
  if exist test.txt del test.txt
)

With these results

Code:

Within append loop
1
2
3
4
5

After append loop
1
2
3
4
5
1

Within modify loop
1
2
3
4
5
1

After modify loop
Line 1
Line 2
Line 3
Line 4
Line 5
Line 1

Delete test
Line 1
Line 2
Line 3
Line 4
Line 5
Line 1

Indeed, FOR /F reads the file as it existed at the start of the command, ignoring any changes that occur in the DO clause. :shock: That was a big eye-opener for me. I had always assumed FOR /F simply opened the file and began reading right away.


2) What was my machine reading for so long after the batch script terminated?
Edit - I disprove this theory later in the thread. I haven't a clue what is actually happening.
I have a theory, but I don't know how to test it. I'm guessing that FOR /F creates an asynchronous process to actually read the "file" from virtual memory and then consumes the output in a similar fashion to how pipes work. The parent batch process physically ends quickly when the FOR loop terminates after the 1st iteration because of the GOTO. But the auxiliary process is left open and it continues to read all 1.5GB of data to completion. I'm wondering if the auxiliary process might be responsible for the actual parsing of each line into tokens?

Well, that is my theory. I'd be curious if anyone can provide more evidence that this is correct. Or perhaps someone has a better theory?

Dave Benham
Last edited by dbenham on 08 Jan 2012 08:52, edited 1 time in total.

aGerman
Expert
Posts: 4710
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Mechanics of reading a file with FOR /F

#2 Post by aGerman » 01 Jan 2012 15:09

Hi Dave.

I can tentatively confirm your first issue. If I try to process a 1.5 GB file I get an error message saying there is not enough memory available. Currently I'm working on my small netbook with only 1 GB of installed RAM and approx. 1 GB of virtual RAM. I'm also unable to open this file with Notepad or PSPad (my default editor for scripts).

As to your second issue:
Svchost.exe is the host process for DLL-based services, such as "Automatic Updates", "Windows Firewall", "Plug and Play" and many others. What service could be reading the file into RAM? :?

Regards
aGerman

Aacini
Expert
Posts: 1927
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Mechanics of reading a file with FOR /F

#3 Post by Aacini » 03 Jan 2012 21:39

I have made some tests and concluded this: the FOR command always "processes" all the values included in its set, that is, all the lines of a file with the /F option, all the numbers of a loop with the /L option, etc. However, if a GOTO or EXIT, or a batch file invocation without CALL or CMD (I call it an Overlay), is executed, the rest of the values after the GOTO/EXIT/Overlay are simply "passed over", but still processed in some way. There is no way to break a FOR command in the middle, EXCEPT when the FOR command is executing in a second CMD.EXE session that is not permanent (one started without the /K switch). This behaviour is the key that makes the WHILE macro work.
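A minimal sketch of that second-session approach, assuming it is saved in a .BAT file (the 1000000 upper bound and the cut-off at 10 are arbitrary values chosen for illustration). The loop runs in a child CMD.EXE started with /C, and EXIT ends only that child, so the remaining iterations are abandoned instead of being passed over one by one:

Code:

@echo off
rem The batch parser turns %%n into %n, which is what the child CMD.EXE
rem expects, because the child runs the loop in command-line context.
rem @ stops the child from echoing each DO clause; EXIT ends only the
rem child process, so the parent script carries on afterwards.
cmd /c "for /l %%n in (1 1 1000000) do @(echo %%n&if %%n geq 10 exit)"
echo Back in the parent script.

It should print 1 through 10 and then the final message, without waiting for the remaining iterations of the loop.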

I have seen some very strange behaviour when an Overlay is executed inside a FOR command and a GOTO or EXIT is executed on the same line as the FOR or in the Overlay:

Code:

for /L %%i in (1,1,1000) do echo %%i & Overlay & if %%i gtr 10 goto out
:out

Overlay is another .BAT file. In this case it seems that the Overlay is executed for every value in the set. I need to do a more elaborate and complete test of this.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Mechanics of reading a file with FOR /F

#4 Post by dbenham » 03 Jan 2012 22:36

Aacini wrote:I had made some tests and conclude this point: FOR command always "process" all the values included in its set, that is, all the lines of a file with /F option, all the numbers of a loop with /L option, etc.

I used to think the same. I know that is true for FOR /L based on this:

Code:

echo on
for /l %%n in (1 1 10) do echo %%n&goto :break
:break

Each iteration is still echoed, but only one value is echoed.

But it is NOT quite true for the other types of FOR.

Evidence 1:

Code:

echo on
for /f %%a in (file) do echo %%a&goto :break
:break

Only one iteration and one value are echoed.

Evidence 2:
I wasn't totally convinced by Evidence 1, which is why I ran the test on a 1.5 gigabyte file. Once the first value was echoed, the batch file immediately finished. If it had to wait for the loop to complete, it would have taken MUCH longer. Based on this experiment I am absolutely convinced the FOR /F loop does terminate immediately after a GOTO. However, the asynchronous background process I talked about must still complete, whatever it may be doing. It continued long after the batch process was complete.

Dave Benham

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Mechanics of reading a file with FOR /F

#5 Post by alan_b » 04 Jan 2012 02:58

Could your malware protection be causing a massive delay by investigating the 1.5 GB file as it travels via the pagefile?

It is now second nature for me to switch mine to a special configuration :-
Firewall Blocked (keep out everything)
A.V. Disabled
Defense+ Disabled (ignore internal behaviour)

Sometimes when using a bit of VBS script I even fully disable the software firewall (after unplugging the router).
I don't understand why; I just accept that the Cscript engine sometimes "locks horns" with my firewall when it blocks outgoing traffic.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Mechanics of reading a file with FOR /F

#6 Post by dbenham » 04 Jan 2012 06:44

@alan_b - Great suggestion - I'll check it out.

EDIT - 1/2 hour later....

And the results are - no change. I tested with a 300MB file this time. There is still a noticeable delay before the first line is read and echoed, after which the batch immediately ends. But there is still the background process reading from the pagefile after the batch terminates.

So unless Avast is lying to me when it reports that all real-time monitoring is off, I'd say the behavior is not due to anti-malware interference. (I don't have anything else installed.)

The initial delay makes sense since I've already proven that FOR /F makes a copy of the file (in memory apparently) and then reads from the copy, such that the DO clause can not impact the results of the read.
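To put a rough number on that initial delay, here is a sketch that stamps the time before the loop and again when the first line arrives. It assumes the large test file is passed as the first argument, and %TIME% only resolves to 1/100 second. CALL forces a second expansion, so %%time%% is read when the DO clause actually runs; a plain %time% inside the loop would be expanded once, when the whole FOR command is parsed, i.e. before the file is read.

Code:

@echo off
rem Stamp the time, then let FOR /F deliver the first line of the file
rem passed as the first argument.
echo Start:      %time%
for /f "usebackq delims=" %%a in ("%~1") do (
  rem CALL expands %%time%% again at this point, not at parse time
  call echo First line: %%time%%
  goto :done
)
:done
echo Done:       %time%

The gap between Start and First line should reflect the load delay, and Done should follow almost immediately after First line.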

Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Mechanics of reading a file with FOR /F

#7 Post by dbenham » 04 Jan 2012 23:50

I think I have disproven my theory regarding the 2nd question in my original post.

First I tried this code with a 300MB file

Code:

echo on
for /f "usebackq" %%a in (%1) do echo %%a&pause&goto :break
:break

First the FOR loop is echoed followed by a delay while the file is read into memory.
Then the 1st DO clause is echoed, the first value is echoed and then the PAUSE executes.
No evidence of background read process for as long as the batch is paused.
When I finally press a key the batch file immediately terminates and then the background read process begins.

OK - the theory is still plausible

But then I tried this code with the same 300MB file:

Code:

@echo off
for /f "usebackq" %%a in (%1) do rem

The batch file took a long time to complete, during which time there was NO evidence of the background read process. Only after the batch file terminated did the background read process become evident, and it seemed to take just as long to complete as before. If my theory about the background process was correct, then it should have been complete by the time the batch file terminated.

So I am totally at a loss as to what is being read after the batch file completes. :?

Dave Benham

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

Re: Mechanics of reading a file with FOR /F

#8 Post by alan_b » 05 Jan 2012 01:42

Windows automatically transfers some W.I.P. (Work In Progress) to Pagefile for future use.

Perhaps it remains available for the rest of the life of the CMD.EXE process that put it there,
and when the batch script ends then Windows releases that space for future W.I.P.
Maybe Windows can only release that reservation by reading it all back again,
especially as the reservation was in "dribs and drabs" whilst "FOR /F ..." was iterating.
What a drag :x

My first P.C. had the luxury of 1 MegaByte of RAM on a card,
and because Bill said 640 kB was more than DOS could use I added something in config.sys that turned the spare 360 kB RAM into a virtual disk R:\.

Batch scripts that painfully clattered when run from Drive A:\ were so much sweeter running on Drive R:\

Crazy idea coming up :-

Configure the B.I.O.S. to convert a spare 2 GB of RAM into virtual Drive R:\ and then under Windows switch the Pagefile from C:\ to R:\

Aacini
Expert
Posts: 1927
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Mechanics of reading a file with FOR /F

#9 Post by Aacini » 08 Jan 2012 20:44

Here is evidence that the GOTO command does not always terminate a FOR loop that reads a file; in particular, when an Overlay batch file is executed before the GOTO. This is the Main file:

Code:

@echo off
for /F "delims=" %%a in (%1) do echo Main: "%%a" & Overlay "%%a" & goto quit
:quit

The Overlay.BAT file:

Code:

@echo off
echo Overlay: "%~1"

The results:

Code:

C:>test test.txt
Main: "1"
Overlay: "1"
C:>echo Main: "2"  & Overlay "2"  & goto quit
Main: "2"
Overlay: "2"
C:>echo Main: "3"  & Overlay "3"  & goto quit
Main: "3"
Overlay: "3"
C:>echo Main: "4"  & Overlay "4"  & goto quit
Main: "4"
Overlay: "4"
C:>echo Main: "5"  & Overlay "5"  & goto quit
Main: "5"
Overlay: "5"
C:>

It is interesting to note that the FOR iterations are echoed to the screen from the second one on, even though both the Main file and the Overlay have ECHO OFF commands.

If the GOTO is executed before the Overlay, it works as expected, that is, the Overlay is not executed at all.

If the GOTO is removed, the Overlay is executed 5 times; this also happens even if the Overlay includes an EXIT /B command. However, if the command is EXIT, both the Overlay and the Main file (and the DOS window) are closed after the first execution.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Mechanics of reading a file with FOR /F

#10 Post by dbenham » 08 Jan 2012 23:05

Interesting behavior - but I'm not sure what to make of it. CMD.EXE is certainly confused :lol:

It's obvious the loop doesn't terminate with your "overlay" scenario. But I still think the for /f loop does terminate immediately without the overlay.

jeb
Expert
Posts: 1058
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Mechanics of reading a file with FOR /F

#11 Post by jeb » 09 Jan 2012 08:37

dbenham wrote:Interesting behavior - but I'm not sure what to make of it. CMD.EXE is certainly confused :lol:


It's obvious :)
It's the well-known behaviour of cmd.exe.
This is the way to switch from the batch parser to the cmd-line parser even in a batch file.

The catch is that this is only active until the FOR loop ends, and then the batch file ends as well.
I can't see any useful scenario for this, but perhaps someone will find some...

As the percents are expanded before the FOR loop starts, you can't see the effect directly, but with a CALL it becomes visible.

Code:

@echo off
if "%1"=="quit" exit /b
setlocal EnableDelayedExpansion
set "undefinedVar="
set "var=myContent"
for /L %%n in (1 1 3) do @(
  echo(
  echo Round %%n
  call echo The content of undefinedVar="%%undefinedVar%%"
  echo Delayed content of var="!var!"
  call :function
  "%~f0" quit
)
echo Never reached

exit /b

:function
echo The function is called
exit /b


jeb

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Mechanics of reading a file with FOR /F

#12 Post by dbenham » 09 Jan 2012 09:41

Thanks jeb :D

I thought it might be in a command line context, but failed when I tried to derive a proper test. You win yet again. :lol:

Aacini's code does not need the GOTO, nor does the GOTO have any effect within a command-line context. The GOTO does not cause an error because GOTO in a command-line context never causes an error.

Now that you have shown the way, here is a FOR /F example that demonstrates the same thing.

Code:

@echo off & if "%~1"==":overlay" (goto :overlay) else cls
set n=0
for /f %%A in (test.bat) do (
  set /a "n+=1" & call echo before overlay: line %%n%% junk=%%junk%% & %~0 :overlay & call echo after overlay: line %%n%% junk=%%junk%%
)
:overlay
call echo within overlay: line %%n%% junk=%%junk%%


Dave Benham
