How Set/p works 

I have done many tests to better understand how set/p operates, especially when used with input redirection.

set/p can add a variable and give it a value.
set/p can NOT remove a variable.
Trying to use set/p to set a variable to an empty string does not produce a clear message, but it sets errorlevel to 1.
This can be interpreted as 'Error: no data/input'.

When reading from a stream, the same thing happens when an empty line is encountered.
This is also the reason you have to reset the variable yourself inside the read loop:
set/p will not reset the variable if the current input is empty, as with an empty line.

Special note: set/p can set errorlevel, but it never resets errorlevel to zero!
So if your batch file checks errorlevel somewhere, be sure to reset errorlevel in the read loop.
The fastest way to reset errorlevel seems to be 'verify>nul'.
Code:
   set /p line=
   if errorlevel 1 set "line=" & verify>nul

I have been experimenting a lot with the set /p command.
Apparently this is how 'set/p VAR=' works:

Reading characters:
Characters are read from the input stream and put in a character buffer until one of three conditions is true:
Condition 1: The last two characters read are CRLF. (a complete line)
Condition 2: There are 1024 characters in the charbuffer. (buffer full)
Condition 3: A timeout, usually caused by end-of-stream. (time-out error condition)

Processing the charbuffer:
All control characters at the end of the charbuffer are discarded (possible data loss).
If there is a NUL character in the charbuffer, the first NUL and all characters following it are discarded (data loss).

Moving from charbuffer to Var:
If the charbuffer is empty, report an error condition:
... set errorlevel to 1 (meaning 'no value entered'); the variable keeps its old value.
If the charbuffer is not empty:
... move the string from the charbuffer to the variable named in the set/p command.

Set/p is done and returns control to the batch file.

Some remarks regarding control characters (ASCII 0-31):
All control characters except NUL can be read and put in a variable by using set/p to read from a stream.
The end-of-line combination CRLF is discarded. However, a single CR or LF can be put in a variable as long
as it is not part of a CRLF pair and at least one non-control character follows it.
Remark: Because TAB is also a control character, TAB cannot be used to 'protect' control characters from being discarded.
All normal characters such as letters and digits, and even a space, can 'protect' control characters.
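
The trimming rules above can be modelled in a few lines of Python. This is only a sketch of the behaviour reported in this post, not cmd.exe's actual code: the name process_buffer is made up for illustration, and treating every byte below 32 as a control character follows the 'ASCII 0-31' remark. The reading phase (where a CRLF pair ends the read) is not modelled here.

```python
def process_buffer(buf: bytes) -> bytes:
    """Model of the two clean-up passes reported for set/p.

    Assumption: this mirrors the observations in the post, not cmd.exe source.
    """
    # Pass 1: the first NUL and everything after it is dropped.
    nul = buf.find(b"\x00")
    if nul != -1:
        buf = buf[:nul]
    # Pass 2: trailing control characters (ASCII 0-31) are dropped,
    # which also removes the CRLF line terminator.
    end = len(buf)
    while end > 0 and buf[end - 1] < 32:
        end -= 1
    return buf[:end]


if __name__ == "__main__":
    samples = [
        b"abc\r\n",        # plain line
        b"a\rb\r\n",       # lone CR protected by the following 'b'
        b"a\tb\t\r\n",     # trailing TAB is not protected
        b"x\r \r\n",       # a space protects the CR before it
        b"ab\x00cd\r\n",   # data after NUL is lost
        b"\t\r\n",         # only control characters: buffer ends up empty
    ]
    for sample in samples:
        print(repr(sample), "->", repr(process_buffer(sample)))
```

An empty result corresponds to the 'no value entered' case, where errorlevel is set to 1 and the variable is left alone.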

In pseudo-code:
Code:
initially charbuffer is empty
:innerloop
  REM Reading characters
  repeat
     get char from input (stream or manual input) and store it in charbuffer
  until - charbuffer is full (number of chars in charbuffer is 1024)
     OR - last two chars added to charbuffer are CRLF
     OR - a timeout occurred when requesting the next char (usually meaning end-of-input-stream)

  REM Processing the charbuffer
  loop over charbuffer from first received to last received char:
     if char == NUL, delete this char and all following chars in the charbuffer
  loop over charbuffer from last received char back to the first char or to a non-control char:
     if char is a control char, remove it from the charbuffer
     (REM this loop removes all trailing control chars, including the CRLF)

  REM Moving from charbuffer to Var
  if charbuffer is empty (REM error condition; note: Var value remains unchanged!)
     set errorlevel to 1 (meaning 'no value entered')
  else
     move the string from charbuffer to the variable named in the set/p command

  REM Set/p ends its inner loop
  Control is returned to the batch file or command prompt.

Of course, for manual input the entered value can be edited with backspace etc.

  On return to the batch file there are 2 possible states.
  1: Error state: errorlevel is set and no input was found. Don't use the Var value, because it
     is left over from a previous loop iteration.
     The error state can be caused by an empty line, but also by a timeout (end-of-stream).
  2: No error state: we got some value in our Var variable.
     This value can be one of the following:
     - a complete non-empty line
     - an incomplete non-empty line caused by a timeout (end-of-input-stream)
     - a data chunk of max 1024 characters, not line-oriented
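
The two return states, and the reason the variable has to be cleared before every read, can be illustrated with a small Python model. It is a sketch under the assumptions stated in this post (1024-character buffer, end-of-stream treated like the timeout, errorlevel set on empty input and never reset). The function name set_p and the env dictionary with an ERRORLEVEL entry are illustrative inventions, not anything cmd.exe exposes.

```python
import io


def set_p(stream, env, name, limit=1024):
    """Model one 'set /p NAME=' read from a redirected stream."""
    buf = bytearray()
    # Reading phase: stop on CRLF, on a full buffer, or at end of stream.
    while len(buf) < limit and not buf.endswith(b"\r\n"):
        ch = stream.read(1)
        if not ch:  # end of stream (the post calls this a timeout)
            break
        buf += ch
    # Processing phase: cut at the first NUL, then drop trailing control chars.
    nul = buf.find(b"\x00")
    if nul != -1:
        del buf[nul:]
    while buf and buf[-1] < 32:
        buf.pop()
    if not buf:
        env["ERRORLEVEL"] = 1  # error state; env[name] keeps its old value
    else:
        env[name] = buf.decode("latin-1")  # success; ERRORLEVEL is NOT reset


if __name__ == "__main__":
    env = {"ERRORLEVEL": 0}
    stream = io.BytesIO(b"first\r\n\r\nlast")
    for _ in range(4):
        set_p(stream, env, "line")
        print(env)
```

In this model the second read (the empty line) leaves the old value in place with errorlevel 1, and the third read stores a new value while errorlevel stays at 1, which is why the post recommends clearing both inside the loop.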


TODOs
[1]
What is the length of the timeout used internally by set/p?
[2]
I'm not sure whether a timeout can occur before the input stream is empty.
It might be that the inner loop of the set/p command can be interrupted by, for example, the pause key.
If so, set/p would return 2 half lines instead of 1 line.
So far I have tested the timeout behavior only in the end-of-stream situation.
[3]
Test whether time-polling can be used for better handling of the error state,
that is, to determine whether the error was caused by an empty line in the input stream
or by a timeout at end-of-input-stream.
[4]
Test whether special characters like ^ & ! % " ' = cause problems when read using set/p.
I have not yet tested these, but I don't expect them to get special handling in set/p.
[5]
Everything I might have missed or misinterpreted in my tests. :)

OJB


26 Aug 2011 07:17
Post Re: How Set/p works
Respect :!: 8)

Perfect analysis and a good explanation.
The theory sounds very convincing.

Now I will do some tests with pipes and set /p.

jeb


26 Aug 2011 10:01
Post Re: How Set/p works
My first tests with pipes ...

In all my previous tests with set /p and pipes (some months/years ago),
I simply used too little data, so I missed the 1024 byte buffer limit. :(

The main problem with pipes is that both sides run (mostly) asynchronously in different processes.

And one interesting thing is that the PAUSE key only pauses the process on the right side of the pipe.

For my tests I create a "num.txt" file with 500 lines, each with 32 bytes of data (including CR and LF).

CreateNum.bat wrote:
@echo off
setlocal EnableDelayedExpansion
(
for /L %%n in (1,1,500) DO (
set "num=1000%%n"
set "num=!num:~-4!"
echo a!num!,b!num!,c!num!,d!num!,e!num!#
)
) > num.txt


SlowType.bat wrote:
@echo off
setlocal EnableDelayedExpansion
set lineNr=0
for /F "delims=" %%A in (num.txt) DO (
call call call set wait=4
set /a lineNr+=1
title "SlowType !lineNr!"
echo(%%A
)


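ReadPipe.bat wrote:
@echo off
cls
setlocal EnableDelayedExpansion
set empty=0
set /a loopCnt=0
rem endless loop: a step of 0 never reaches the end value
for /L %%n in (1 0 1) do (
set /a loopCnt+=1
rem clear the variable first, set /p leaves it untouched on empty input
set line=
set /p line=
if defined line (
echo( !loopCnt!: !line!
) else (
set /a Empty+=1
if !empty! GTR 10 call :HALT
)
)
exit /b

rem abort the whole batch with a deliberate syntax error, hiding the message
:Halt
call :_halt 2> NUL
:_halt
()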


And now tested with
Code:
slowType.bat | ReadPipe.bat


On my system three CALLs are enough to slow down SlowType.bat,
so that ReadPipe.bat can read every line.

But if I comment out the 'call call call set wait' line, I get only 16 lines.
Output wrote:
1: a0001,b0001,c0001,d0001,e0001#
2:
a0033,b0033,c0033,d0033,e0033#
4: #
5: 8#
6: 60#
7: 192#
8: 0224#
9: e0256#
10: ,e0288#
11: 0,e0320#
12: 52,e0352#
13: 384,e0384#
14: 0416,e0416#
15: d0448,e0448#
16: ,d0480,e0480#


If you press the pause key, slowType.bat doesn't stop; the title still counts up,
but only until the internal "pipe buffer" is full, then slowType.bat stops as well.
If the pause key is pressed again, both sides start running again.

There seems to be a problem if the data producer is faster than the data consumer.
Then set /p can't read the correct data from the buffers (MORE or FINDSTR can read it).
Perhaps it's an effect of the internal timeout and/or the buffer limit.
One of my theories is that in the buffer the CR/LF pairs are reduced to single LFs, and therefore set/p can't access the data any more.

As one result I got ... we need more investigation :)
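
The 16 lines of output above can be reproduced with a short Python simulation. It rests on one hypothesis that the thread does not confirm: each set /p takes up to 1023 bytes off the pipe (the buffer size measured later in this thread), keeps only the text before the first CRLF, and loses the rest of the block. The function names are made up for this sketch.

```python
CHUNK = 1023  # assumption: buffer size reported later in this thread


def make_num_txt(lines=500):
    """Rebuild the contents of num.txt as CreateNum.bat writes it (32 bytes per line)."""
    out = []
    for n in range(1, lines + 1):
        num = ("1000%d" % n)[-4:]  # same trick as set "num=!num:~-4!"
        out.append("a{0},b{0},c{0},d{0},e{0}#\r\n".format(num))
    return "".join(out).encode("ascii")


def read_fast_pipe(data, chunk=CHUNK):
    """Hypothetical model: one block of `chunk` bytes is consumed per set /p,
    and only the part before the first CRLF ends up in the variable."""
    results = []
    for count, pos in enumerate(range(0, len(data), chunk), start=1):
        block = data[pos:pos + chunk]
        eol = block.find(b"\r\n")
        line = block if eol == -1 else block[:eol]
        line = line.rstrip(bytes(range(32)))  # trailing control chars are dropped
        if line:  # empty reads leave 'line' undefined, so nothing is echoed
            results.append((count, line.decode("ascii")))
    return results


if __name__ == "__main__":
    for count, line in read_fast_pipe(make_num_txt()):
        print(" %d: %s" % (count, line))
```

Under this hypothesis the 16000 bytes of num.txt split into 16 blocks. Block 2 starts with the LF of line 32 (hence the line break after "2:"), block 3 starts with a bare CRLF and yields nothing, and each later block starts one byte earlier within its line, which matches the growing fragments in the output. That agreement supports the hypothesis but does not prove it.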

jeb


26 Aug 2011 13:53
Post Re: How Set/p works
OJBakker wrote:
Reading characters:
Characters are read from the inputstream and put in a characterbuffer until one of three conditions is true:
Condition 1: The last 2 read char are CRLF. (a complete line)
Condition 2: There are 1024 characters in the charbuffer. (buffer full)
Condition 3: A timeout, usually caused by end-of-stream. (time-out error-condition)

I have slight corrections based on experiments on a Vista 64 machine.

Reading characters:
Characters are read from the inputstream and put in a characterbuffer until one of three conditions is true:
Condition 1: The last 2 read char are CRLF or LFCR. (a complete line)
Condition 2: There are 1023 characters in the charbuffer. (buffer full)
Condition 3: A timeout, usually caused by end-of-stream. (time-out error-condition)


The longest line that can be processed reliably is 1021 (not including the terminating CRLF or LFCR).

I suppose some moderately complex logic could be written to detect buffer full condition and concatenate lines appropriately. Special processing would be required to handle whenever CRLF (or LFCR) is split across the 1023 buffer length boundary. The process would have to assume that CR and LF are always paired in the source file.

Dave Benham


28 Dec 2011 12:12
Post Re: How Set/p works
dbenham wrote:
Condition 2: There are 1023 characters in the charbuffer. (buffer full)

I assume the 1024th character is NUL (string terminator).

Regards
aGerman


28 Dec 2011 12:44
Post Re: How Set/p works
OJBakker wrote:
I have done many test to better understand the way set/p operates especially when used with input-redirection.
Neat! And confirmed under xp.sp3 (with dbenham's correction of 1,023 vs. 1,024).

dbenham wrote:
The longest line that can be processed reliably is 1021 (not including the terminating CRLF or LFCR).
For a single line, 1023 looks OK (but if one continues reading off the same input stream then, yes, 1021 is the highest with "default" behavior).

From what I see here, a 1022-character line returns the expected 1022-character string to 'set /p', but the CR (the 1023rd character) is discarded and the following LF is left in the stream, where it is picked up by a subsequent 'set /p'. A 1023-character line also returns the full string to 'set /p', but leaves the CR/LF in the stream, which the next read will take as a blank line.
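
These three boundary cases can be checked against a small Python model of the read loop. It is a sketch that assumes the 1023-character buffer and the CRLF stop condition described in this thread; NUL handling is left out for brevity, and set_p and read_all are illustrative names.

```python
import io

LIMIT = 1023  # buffer size reported by dbenham; an assumption of this model


def set_p(stream, limit=LIMIT):
    """Return what one modelled set /p delivers ('' means no value, errorlevel 1)."""
    buf = bytearray()
    while len(buf) < limit and not buf.endswith(b"\r\n"):
        ch = stream.read(1)
        if not ch:  # end of stream
            break
        buf += ch
    while buf and buf[-1] < 32:  # drop trailing control characters
        buf.pop()
    return buf.decode("ascii")


def read_all(length):
    """Feed one line of `length` characters plus a short second line through the model."""
    stream = io.BytesIO(b"x" * length + b"\r\nnext\r\n")
    total = len(stream.getvalue())
    reads = []
    while stream.tell() < total:
        reads.append(set_p(stream))
    return reads


if __name__ == "__main__":
    for length in (1021, 1022, 1023):
        reads = read_all(length)
        print(length, [r if len(r) < 20 else "<%d chars>" % len(r) for r in reads])
```

In the model a 1021-character line is followed cleanly by the next line, a 1022-character line makes the next read start with a stray LF, and a 1023-character line is followed by one empty read before the next line arrives.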

dbenham wrote:
I suppose some moderately complex logic could be written to detect buffer full condition and concatenate lines appropriately. Special processing would be required to handle whenever CRLF (or LFCR) is split across the 1023 buffer length boundary. The process would have to assume that CR and LF are always paired in the source file.
Indeed. Below is a possible draft, assuming normal CRLF line endings. Save it as typefile.cmd and run it with a text file as first argument...
Code:
:: 'for /f (.txt) do echo' mockup, but preserves empty lines
:: and is safe with un-escaped odd characters, except it
:: drops trailing control characters at the end of the line
:: due to 'set /p' quirk
:: and misses by one multiple empty lines at the end of file
:: due to 'find' quirk

@echo off
setlocal disableDelayedExpansion

@rem original call must reference existing file
if exist "%~1" (
  cmd /s /c ""%~f0" :loop "%~1""
  @rem --- reached after nested call ---
  endlocal
  exit /b %errorlevel%
)

@rem nested call expected to reference ':loop'
@rem can't match existing file because of leading colon
set "arg1=%~1" || set "arg1="
if not "%arg1%"==":loop" (
  echo.
  echo *** unrecognized target "%~1" 1>&2
  exit /b -1
)
shift /1

@rem label itself not used, but 'cmd /c' nesting needed
@rem in order to break out of infinite 'for' loop cleanly
:loop
set lf=^


@rem above 2 blank lines are required - do not remove
set "file=%~1"

@rem 'find' counts 2 empty lines at the very end of file as 1
for /f %%a in ('find /c /v "" ^<"%file%"') do (
  set /a lines = %%a
)

@rem loop 'set /p' until line count matches
<"%file%" (
  setlocal enableDelayedExpansion
  set "line="
  for /l %%a in () do (
    @rem loop break condition which requires 'cmd /c' nesting
    if "!lines!" leq "0" exit
    @rem read next chunk
    set "chunk=" & set /p "chunk="
    @rem process current chunk
    if not defined chunk (
      @rem either empty line, or leftover '\r\n' from previous 1,023 one
      call :line
      set "line="
    ) else (
      if "!chunk:~0,1!"=="!lf!" (
        @rem leftover '\n' from previous 1,022+'\r', flush preceding line
        call :line
        set "line=!chunk:~1!"
      ) else (
        @rem regular chunk, append to current line
        set "line=!line!!chunk!"
      )
      if "!chunk:~1021,1!"=="" (
        @rem proper ending chunk, flush line
        call :line
        set "line="
      )
    )
  )
  endlocal
)
echo --- never reached ---
endlocal
goto :eof

@rem process current line
:line
@rem '(' parenthesis guards against '', '/?', echo* matches
echo(!line!
set /a lines -= 1
goto :eof
Critique most welcome, of course... For one thing, it won't handle lines ending in control characters, nor lines wider than around 8K, nor multiple empty lines at the end of the file, but other than that it seems to generate identical copies.

Liviu


09 Feb 2012 00:46
Post Re: How Set/p works
Good analysis!
But I assume that one thing is not correct (I cannot prove this, but I think a timeout is improbable enough):
OJBakker wrote:
Condition 3: A timeout, usually caused by end-of-stream. (time-out error-condition)

I think there is no timeout, simply because it is not necessary (using MS C or C++; I assume MS used MS C/C++ for programming every version of DOS/Windows).
Code:
// timeout version
void setSlashPImplTo (char* variable) {
   std::ifstream is (STDIN);
   is.read ((byte*) variable, 1023, timeout);   // last byte in buffer is always 0
}

// std version
void setSlashPImplStdUse (char* variable) {
   std::ifstream is (STDIN);
   if (is) {
      is.read ((byte*) variable, 1023);   // last byte in buffer is always 0
   }
}
This is C++ style pseudo-code.
The second version is faster and safer, as it has no side effects, and I assume this is why MS recommends doing it this way.
In addition, they would have had to implement the timeout functionality themselves, but no such functionality is found in MSVS (MS Visual Studio), and I doubt they simply forgot to add it in all versions since 1.0.
(The same applies if MS used their C language to program this.)

penpen


01 Oct 2013 17:46
Post Re: How Set/p works
It really is true - SET /P can read from a file while another process is writing to it. SET /P does not wait around for a newline or for the 1023 char buffer to fill. If it reaches the end of the available input stream, it returns the partial line. I don't know how to test whether there is a timeout period that must expire before continuing, or whether it returns immediately after detecting the end of the data. After reading the partial line, SET /P can then try again and read subsequent data.

Testing the behavior can be tricky, because the writing process may be buffered, and it may wait until it writes a newline before it flushes the content to disk.
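
The partial-line behaviour described above can be sketched in Python with an in-memory stream that a simulated writer keeps extending. The model assumes that a read returns as soon as no more data is available (whether cmd.exe waits for a timeout first is exactly what is unknown here). The names set_p and append are illustrative only.

```python
import io


def set_p(stream, limit=1023):
    """One modelled set /p read; returns '' when nothing was available."""
    buf = bytearray()
    while len(buf) < limit and not buf.endswith(b"\r\n"):
        ch = stream.read(1)
        if not ch:  # no more data available right now
            break
        buf += ch
    while buf and buf[-1] < 32:  # drop trailing control characters
        buf.pop()
    return buf.decode("ascii")


def append(stream, data):
    """Simulate the writer: add bytes at the end without moving the reader's position."""
    pos = stream.tell()
    stream.seek(0, io.SEEK_END)
    stream.write(data)
    stream.seek(pos)


if __name__ == "__main__":
    shared = io.BytesIO()
    append(shared, b"first half,")       # writer has not sent the newline yet
    print(repr(set_p(shared)))           # reader gets the partial line
    append(shared, b" second half\r\n")  # writer finishes the line
    print(repr(set_p(shared)))           # reader gets the remainder
```

One logical line reaches the reader as two separate values, so a script that follows a growing file would have to join such pieces itself.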


Dave Benham


03 Oct 2013 13:50
Post Re: How Set/p works
There is a way to test this (and to test all other unclear behavior):
You could write a C++ program using Microsoft Visual Studio that does the following:
Code:
#include <process.h>
#include <stdlib.h>   // system()

int main (int argc, char** argv) {
   // the backslash must be escaped, otherwise "\t" is read as a TAB character
   system("Z:\\test.bat");
   return 0;
}
And do what you like within the batch file.
You can debug using "trace into" and watch the source code, after you have downloaded all (needed) debug symbols (if you don't download them you will have fun with assembler):
http://msdn.microsoft.com/en-us/windows/hardware/gg463028.aspx

But the reason why I haven't actually done this is: at work I don't have the rights to install the debug symbols, I don't want to read assembler, and I think they wouldn't allow me to investigate this just out of curiosity.
And at home I don't have the full version (only the Express version), where this does not seem to be possible. In addition, the needed packages seem to be really BIG, and I have only a 20GB HDD.

If anyone has access to these things and wants to find this out, AND this is allowed in your country, AND it isn't disallowed by Microsoft's EULA or whatever else they have: feel free to find out.

penpen


03 Oct 2013 15:05