Fundamental string processing bug?

Posted: 21 Jan 2012 00:12
by dbenham
I've only tested this on Vista so far.

I am attempting to write a routine that builds a string consisting of a single character repeated N times, up to the maximum variable length of 8191.

It works great until it reaches a length of 8184. Then it fails for lengths 8185 - 8191!

I added some diagnostic messages, and it turns out I am not able to append a character to a string of length 8184 :shock: :?

I thought I had seen a variable with length 8191 before. Am I crazy?
Why can't I append to a variable of length 8184?
Do I have to compensate for the length of the variable name and/or the command line length?
I would have thought that delayed expansion would eliminate the worry about the command line length.

Code:

@echo off
:buildString
setlocal enableDelayedExpansion
if %~1 geq 8192 exit /b
set /a "n=%~1, leftBit=4096"
set "str="
echo target len=!n!
for /l %%n in (1 1 13) do (
  echo(
  call :strLen str len
  echo before double len=!len!
  set "str=!str!!str!"
  call :strLen str len
  echo after double len=!len!
  set /a "bit=n&leftBit, n<<=1"
  if !bit! neq 0 (
    set "str=!str!."
    call :strLen str len
    echo after append len=!len!
  )
)
exit /b
echo(
call :strLen str len
echo final len=!len!
exit /b


:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101116 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,0) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b


output:

Code:

>buildstring 8185
target len=8185

before double len=0
after double len=0
after append len=1

before double len=1
after double len=2
after append len=3

before double len=3
after double len=6
after append len=7

before double len=7
after double len=14
after append len=15

before double len=15
after double len=30
after append len=31

before double len=31
after double len=62
after append len=63

before double len=63
after double len=126
after append len=127

before double len=127
after double len=254
after append len=255

before double len=255
after double len=510
after append len=511

before double len=511
after double len=1022
after append len=1023

before double len=1023
after double len=2046

before double len=2046
after double len=4092

before double len=4092
after double len=8184
after append len=8184

Re: Fundamental string processing bug?

Posted: 21 Jan 2012 13:48
by aGerman
Well, it should never work with a length greater than 8190 because of the prepended A in :strLen.
However, the effect you see has something to do with the total length of the command line (with expanded variables).

Code:

@echo off
setlocal EnableDelayedExpansion
set "str=."
for /l %%n in (1 1 12) do (
  set "str=!str!!str!"
)
set str=!str!!str:~-4091!
call :strLen str len
echo !len!
pause

This returns 8186.
As you can see, I removed the enclosing quotation marks in the line
set str=!str!!str:~-4091!

To fully understand how it works we should ask jeb :wink: It seems the "tokenizer" runs before the variables are expanded. For that reason only the expanded expression str=!str!!str:~-4091! is the point of interest (without the set command).

Regards
aGerman

Re: Fundamental string processing bug?

Posted: 21 Jan 2012 18:03
by Aacini
Don't forget that the total length of 8191 includes the variable name and the equal sign, but StrLen just gets the VALUE length...

Re: Fundamental string processing bug?

Posted: 21 Jan 2012 18:14
by aGerman
Yeah that's what I tried to explain. I removed the quotes and gained 2 more characters for the variable. But why are only "varName", "=" and value counted? What about the command ("set" in this case) and the space between?

Regards
aGerman

Re: Fundamental string processing bug?

Posted: 21 Jan 2012 18:44
by Aacini
Batch variables are stored in the Environment (a memory block of up to 32 MB) as strings with this format: VARNAME=VARVALUE0; a byte with binary zero marks the end of each variable and is followed by the next variable, and a second zero marks the end of the Environment. CMD.EXE has a limit of 8 KB when it searches the Environment for variable names/values. In conclusion, the maximum VALUE length for variables with a 1-letter name is 8189 bytes.
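
A minimal sketch of that arithmetic, for names of any length. The :maxValueLen helper and its variable names are hypothetical (not from the thread), and it assumes the 8191 limit covers the whole NAME=VALUE string as described above:

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical helper (not from the thread). Assumption: the 8191 limit
rem covers the whole NAME=VALUE string, so the value gets what is left
rem after the name and the equal sign.
call :maxValueLen X
call :maxValueLen str
exit /b

:maxValueLen name -- prints the estimated maximum value length for name
set "rest=%~1"
set "nameLen=0"
:countLoop
rem Strip one character per pass until nothing is left of the name
if not defined rest goto :countDone
set "rest=!rest:~1!"
set /a "nameLen+=1"
goto :countLoop
:countDone
rem Subtract the name length and 1 for the equal sign
set /a "maxLen=8191-nameLen-1"
echo %~1: max value length=!maxLen!
exit /b
```

Under that assumption this prints 8189 for X and 8187 for str.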

I don't know the maximum length of the command line, but I am pretty sure it is larger than 8191, and there is a very easy way to check this. The following line in StrLen:

Code:

for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
is much larger than 8191 if the length of the variable is 8184 (unless the 8191 limit is taken before delayed variable expansions...)

Re: Fundamental string processing bug?

Posted: 21 Jan 2012 18:57
by aGerman
That sounds logical to me. But remember I gained 2 characters by removing the quotation marks. In this case the line
set "var=value"
would generate a string
"var=value"\0
instead of
var=value\0
in the environment. That's confusing :?

Regards
aGerman

Re: Fundamental string processing bug?

Posted: 22 Jan 2012 12:21
by aGerman
OK, I reread there.

[...]
On computers running Microsoft Windows XP or later, the maximum length of the string that you can use at the command prompt is 8191 characters.
[...]

Examples
[...]
In a batch file, the total length of the following command line that you use in the batch file cannot contain more than either 2047 or 8191 characters (as appropriate to your operating system):
cmd.exe /k ExecutableFile.exe parameter1, parameter2 ... parameterN
This limitation applies to command lines that are contained in batch files when you use Command Prompt to run the batch file.
[...]
In Command Prompt, the total length of EnvironmentVariable1 after you expand EnvironmentVariable2 and EnvironmentVariable3 cannot contain more than either 2047 or 8191 characters (as appropriate to your operating system):
c:> set EnvironmentVariable1=EnvironmentVariable2EnvironmentVariable3
[...]


It seems that neither the command line nor the variable can contain more than 8191 characters (even if the OS allows 32767 characters for an environment variable).

It remains confusing, since the command line with expanded variables is longer than 8191 characters...

Regards
aGerman

Re: Fundamental string processing bug?

Posted: 22 Jan 2012 23:01
by dbenham
Most definitely confusing.

It appears the total number of characters that appear after the SET command cannot exceed 8191, after delayed expansion. The first space does not count. But every character after that does count. So a command like

Code:

SET    "VAR=!STR!xxxx"
has a max final value length of 8182 because there are 4 spaces between SET and ". The first one doesn't count toward the limit. So 3 spaces + 2 quotes + 3 chars for variable name + 1 equal sign gives 9 characters to be subtracted from the theoretical limit of 8191.
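
A quick sketch of that subtraction, using only the counts given above (the variable names are made up for illustration):

```batch
@echo off
rem Sketch of the arithmetic for:  SET    "VAR=!STR!xxxx"
rem Assumption: every character after the first space following SET
rem counts toward the 8191 limit, as described above.
set /a "limit=8191"
set /a "extraSpaces=3, quotes=2, nameLen=3, equalSign=1"
set /a "maxValueLen=limit-extraSpaces-quotes-nameLen-equalSign"
echo max final value length=%maxValueLen%
```

This prints 8182, which is the length of the final value including the trailing xxxx.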

So Aacini is correct that the maximum value length that can be assigned within a script is 8189 for a single char variable name. But I wonder if CMD can expand a value of length 8191 that was set by an executable, not by batch?

If an option is used, then the 1st character after the last option (usually a space) does not count, but all remaining characters count toward the limit of 8191.

The equal sign is a valid token delimiter, so this command:

Code:

SET/P=!STR!xxxx
can print a maximum of 8191 characters since the = does not count toward the limit. But put a space in front of the = and the limit is 8190.


I've done some experiments with the maximum total command line length as well: the total command line length cannot exceed 8191 after the normal expansion phase.

Suppose I have a variable STR that has length 8187.
This command line with exactly 8191 characters after expansion succeeds

Code:

REM %STR%

But this command with 8192 characters fails

Code:

REM %STR%x

The total command line length after FOR variable and delayed expansion phases does not matter. Both of these commands succeed:

Code:

REM !STR!1234567890
for /f %%A in ("!STR!") do REM %%A1234567890

Re: Fundamental string processing bug?

Posted: 23 Jan 2012 06:50
by jeb
dbenham wrote: So Aacini is correct that the maximum value length that can be assigned within a script is 8189 for a single char variable name. But I wonder if CMD can expand a value of length 8191 that was set by an executable, not by batch?

A variable with 8191 characters can also be created with batch (obviously) :D

Code:

setlocal enableDelayedExpansion
set "X=."
for /L %%n in (1,1, 13) do set "X=!X:~0,4000!!X:~0,4000!"
for /L %%n in (1,1, 180) do set "X=!X!."
REM Now X has 8180 dots

set Y=%X%^
12345678#901
REM Y has now 8191 characters, ending with 12345678901, the # was removed by the 8192 "BUG"

echo pos 8189=!Y:~8188,1!
echo pos 8190=!Y:~8189,1!
echo pos 8191=!Y:~8190,1!
echo pos 8192=!Y:~8191,1!


dbenham wrote: I've done some experiments with the maximum total command line length as well: the total command line length cannot exceed 8191 after the normal expansion phase.


It can, without a syntax error.

Code:

echo %Y:~0,8185%^
abcdefghijklmnop
echo -------

The output is odd
Output wrote:........................................................................................................................
....................12345abcdeg-------


The rest of line 2 (after "g") is removed, the "f" is also removed, and the CR/LF is removed!

I used this to create a single CR before I learned the "copy /z" trick.

Code:

(
echo(!Y!#
) > tmpFile

If Y is 8189 characters long, only the LF will be removed, so a single CR will be placed at the end of the file.

dbenham wrote: The total command line length after FOR variable and delayed expansion phases does not matter. Both of these commands succeed: [...]

That's wrong: these lines cause an error and will not be executed, but no error message is shown.
Only the REM !LONG!!LONG! will never cause an error, as REM suppresses the delayed expansion.

jeb

Re: Fundamental string processing bug?

Posted: 23 Jan 2012 07:18
by dbenham
Thanks jeb for clearing everything up. :D

:shock: That 8192 "bug" is insane.

Dave Benham