Rules for how CMD.EXE parses numbers

dbenham

EDIT - After this initial post, many additional discoveries were made and documented by various people in later posts of this thread. This initial post has been edited to account for some, but not all, of those additional discoveries. Please read the entire thread for additional edge cases and differences between Windows versions.

There are multiple contexts where CMD.EXE parses a string into a 4 byte signed integer value ranging from -2147483648 to 2147483647:

- SET /A
- IF
- %var:~n,m% (variable substring expansion)
- FOR /F "TOKENS=n"
- FOR /F "SKIP=n"
- FOR /L %%A in (n1 n2 n3)

In all the above contexts, CMD can parse numbers expressed as decimal, hexadecimal, or octal notation:

```
decimal:     [-]{non-zero decimal digit}[{decimal digit}...]
hexadecimal: [-]0{x|X}{hexadecimal digit}[{hexadecimal digit}...]
octal:       [-]0{octal digit}[{octal digit}...]

{decimal digit}     = any of {0|1|2|3|4|5|6|7|8|9}
{hexadecimal digit} = any of {0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|a|b|c|d|e|f}
{octal digit}       = any of {0|1|2|3|4|5|6|7}
```
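As a quick cross-check of the notation above, here is a minimal Python sketch that turns the three productions into regular expressions. The names `classify`, `DECIMAL`, `HEXADECIMAL` and `OCTAL` are hypothetical; the sketch only checks the shape of a token and says nothing about how CMD computes the value or handles overflow.

```python
import re

# Hypothetical regexes transcribed from the grammar above (a model, not CMD source).
DECIMAL = re.compile(r"-?[1-9][0-9]*")
HEXADECIMAL = re.compile(r"-?0[xX][0-9A-Fa-f]+")
OCTAL = re.compile(r"-?0[0-7]+")


def classify(token: str):
    """Return 'hexadecimal', 'octal' or 'decimal', or None if the token fits no production."""
    for name, pattern in (("hexadecimal", HEXADECIMAL),
                          ("octal", OCTAL),
                          ("decimal", DECIMAL)):
        if pattern.fullmatch(token):
            return name
    return None


if __name__ == "__main__":
    for t in ("17", "0x11", "021", "-0xFF", "09"):
        print(t, "->", classify(t))
```

Note that "09" fits none of the productions: the leading 0 promises octal, and 9 is not an octal digit.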

But there are subtle differences depending on the context. The differences are in how negative numbers are parsed, and also how overflow and invalid number errors are handled. It appears there is one set of rules for SET /A, and another set of rules used by all other contexts. To make matters worse, SET /A behavior on XP is different from that on the more modern Windows versions (Vista onward).

One additional command accepts only decimal numbers:

- EXIT [/B] n

SET /A
(Vista, Windows 7, [Windows 8?])

Literals

decimal - The sign is initially ignored, and the string of decimal digits is first converted into its unsigned binary representation. Afterward, if the number was preceded by a negative sign, the negative value is computed by taking the 2's complement of the binary value (invert all bits and add 1).

-1 -> 0000 0000 0000 0000 0000 0000 0000 0001 -> 1111 1111 1111 1111 1111 1111 1111 1111

Everything works great except the negative limit of a signed 4 byte integer cannot be expressed! The problem is the parser limits itself to 31 bits in the 1st step. If the 32nd bit (the sign bit) is set, then the parser detects an overflow error.

-2147483648 -> 1000 0000 0000 0000 0000 0000 0000 0000 : ERROR - overflow detected

The actual error message is "Invalid number. Numbers are limited to 32-bits of precision.", with ERRORLEVEL=1073750992. Very misleading and unfortunate if you ask me.

If an invalid digit is used, then a different error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991.
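The Vista+ decimal rule can be modeled in a few lines of Python. This is an emulation of the behavior described above, not CMD.EXE's code; `SetAError`, `to_int32` and `set_a_decimal` are hypothetical names, and the error texts and ERRORLEVEL values are copied from this post.

```python
class SetAError(Exception):
    """Models a SET /A error message together with the ERRORLEVEL CMD sets."""

    def __init__(self, message: str, errorlevel: int):
        super().__init__(message)
        self.errorlevel = errorlevel


OVERFLOW = ("Invalid number. Numbers are limited to 32-bits of precision.", 1073750992)
INVALID = ("Invalid number. Numeric constants are either decimal (17),"
           "hexadecimal (0x11), or octal (021).", 1073750991)


def to_int32(bits: int) -> int:
    """Reinterpret the low 32 bits as a two's complement signed integer."""
    bits &= 0xFFFFFFFF
    return bits - 0x100000000 if bits & 0x80000000 else bits


def set_a_decimal(literal: str) -> int:
    """Model of a Vista+ SET /A decimal literal (token assumed to be decimal already)."""
    negative = literal.startswith("-")
    digits = literal[1:] if negative else literal
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise SetAError(*INVALID)
    magnitude = 0
    for ch in digits:
        magnitude = magnitude * 10 + int(ch)
        if magnitude > 0x7FFFFFFF:  # would need the 32nd (sign) bit: overflow
            raise SetAError(*OVERFLOW)
    if negative:
        magnitude = ~magnitude + 1  # two's complement: invert all bits, add 1
    return to_int32(magnitude)
```

Because the overflow check runs before the sign is applied, -2147483648 is rejected even though it fits in a signed 32-bit integer.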

hexadecimal - The parser initially ignores any sign, and the string of hexadecimal digits is converted into its unsigned binary representation. If the number was preceded by a negative sign, the negative value is computed by taking the 2's complement of the binary value.

The difference is that the SET /A hexadecimal parser allows the 32nd bit to be set during the initial parsing. After the initial parsing is complete, the 32nd bit is treated as the sign bit. So there are 2 representations for every number!

0x1 -> 0000 0000 0000 0000 0000 0000 0000 0001 = 1
-0xFFFFFFFF -> 1111 1111 1111 1111 1111 1111 1111 1111 -> 0000 0000 0000 0000 0000 0000 0000 0001 = 1

0xFFFFFFFF -> 1111 1111 1111 1111 1111 1111 1111 1111 = -1
-0x1 -> 0000 0000 0000 0000 0000 0000 0000 0001 -> 1111 1111 1111 1111 1111 1111 1111 1111 = -1

The oddball is -2147483648, because the 2's complement of that number is itself!

0x80000000 -> 1000 0000 0000 0000 0000 0000 0000 0000
-0x80000000 -> 1000 0000 0000 0000 0000 0000 0000 0000 -> 1000 0000 0000 0000 0000 0000 0000 0000

Actually, there are many more representations of each number, because additional leading 0s can be added. There is no limit other than the 8191-character limit of a command line.

0x1, 0x01, 0x00000000000000000000000000000000001 are all equivalent representations of 1.

Another odd SET /A behavior is that overflow conditions are ignored when parsing hexadecimal notation. Any hex notation that would require 33 or more bits will result in either 1 or -1.

The following all result in -1:

```
set /a 0x1000000000
set /a 0xFFFFFFFFFF
set /a 0x888888888888888888888
```

The following all result in 1:

```
set /a -0x1000000000
set /a -0xFFFFFFFFFF
set /a -0x888888888888888888888
```

If an invalid hex digit is used, then the error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991.
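A matching Python sketch for SET /A hexadecimal literals. Saturating any value that needs 33 or more bits to 0xFFFFFFFF is an assumption, chosen because it reproduces the observed "either 1 or -1" results; `to_int32` and `set_a_hex` are hypothetical names.

```python
import string


def to_int32(bits: int) -> int:
    """Reinterpret the low 32 bits as a two's complement signed integer."""
    bits &= 0xFFFFFFFF
    return bits - 0x100000000 if bits & 0x80000000 else bits


def set_a_hex(literal: str) -> int:
    """Model of a Vista+ SET /A hexadecimal literal such as 0x1F or -0xFFFFFFFF."""
    negative = literal.startswith("-")
    body = literal[1:] if negative else literal
    digits = body[2:]
    if body[:2] not in ("0x", "0X") or not digits or any(ch not in string.hexdigits for ch in digits):
        raise ValueError("invalid number (CMD reports ERRORLEVEL 1073750991)")
    # Assumption: anything needing 33+ bits saturates to all ones instead of raising.
    magnitude = min(int(digits, 16), 0xFFFFFFFF)
    if negative:
        magnitude = ~magnitude + 1  # two's complement
    return to_int32(magnitude)      # bit 32 is now read as the sign bit
```

Since the full 32 bits are accepted before the sign bit is interpreted, 0xFFFFFFFF and -0x1 both come out as -1.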

octal - The number is parsed similarly to decimal. The sign is initially ignored, and the octal digits are converted into a 31 bit unsigned integer. If the 32nd bit would be set, an overflow error is detected. Any negative sign is applied afterward by taking the 2's complement, but only if no error was detected.

So -2147483648 cannot be represented with octal notation, just as it cannot be represented with decimal notation.

Just like with hexadecimal, any number of leading zeros may be prefixed to a valid octal number.

00000000000000000000000000000000000000000000000001 --> 1
-00000000000000000000000000000000000000000000000001 --> -1

If an invalid octal digit is used, then the error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991. This error commonly occurs when a zero-prefixed value containing the digit 8 or 9 (such as 08 or 09) is parsed, as can happen when parsing date and time information.
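The octal rule follows the same 31-bit pattern as decimal. This Python sketch (an emulation with hypothetical names `to_int32` and `set_a_octal`, not CMD's code) also shows why 08 and 09 fail: the leading 0 selects octal, and 8 and 9 are not octal digits.

```python
def to_int32(bits: int) -> int:
    """Reinterpret the low 32 bits as a two's complement signed integer."""
    bits &= 0xFFFFFFFF
    return bits - 0x100000000 if bits & 0x80000000 else bits


def set_a_octal(literal: str) -> int:
    """Model of a Vista+ SET /A octal literal such as 021 or -0001."""
    negative = literal.startswith("-")
    body = literal[1:] if negative else literal
    digits = body[1:]  # everything after the leading 0
    if not body.startswith("0") or not digits or any(ch not in "01234567" for ch in digits):
        raise ValueError("invalid number (CMD reports ERRORLEVEL 1073750991)")
    magnitude = 0
    for ch in digits:
        magnitude = magnitude * 8 + int(ch)
        if magnitude > 0x7FFFFFFF:  # limited to 31 bits, like decimal
            raise OverflowError("32-bit overflow (CMD reports ERRORLEVEL 1073750992)")
    if negative:
        magnitude = ~magnitude + 1  # two's complement, applied only when no error occurred
    return to_int32(magnitude)
```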

(XP)

First, any leading minus sign is ignored and the value is parsed as a 32 bit unsigned integer. Afterward, the 32nd bit is treated as a sign bit, and then any minus sign is applied by taking the two's complement. So every value has at least two representations with each base:

SET /A 2147483650 = SET /A -2147483646 = -2147483646
SET /A 0x80000002 = SET /A -0x7FFFFFFE = -2147483646
SET /A 020000000002 = SET /A -017777777776 = -2147483646

Any value that exceeds 32 bits during the initial unsigned parsing results in an overflow error.

There is one complication when all bits are set. When CMD.EXE is first launched, the following all give the expected value of -1:

SET /A 4294967295 = -1
SET /A 0XFFFFFFFF = -1
SET /A 037777777777 = -1

But if a math overflow is ever detected by CMD.EXE, then the above three will raise an overflow error instead for the remainder of the CMD.EXE session. The triggering overflow can occur in SET /A as described above. It can also occur with IF, FOR "TOKENS=n", FOR "SKIP=n", FOR /L, and variable expansion with substring operations as described later on.
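The XP rules, including the session-wide quirk for all-ones values, can be sketched as a small stateful Python class. Modeling the quirk as a sticky `overflow_seen` flag is an assumption, and since the text above only says the three positive all-ones forms are affected, the model leaves negative forms alone. `XPSession` and `DIGITS` are hypothetical names.

```python
import string

DIGITS = {16: string.hexdigits, 10: string.digits, 8: "01234567"}


class XPSession:
    """Hypothetical model of SET /A literal parsing within one XP CMD.EXE session."""

    def __init__(self) -> None:
        # Assumption: "once an overflow is detected" is modeled as a sticky flag.
        self.overflow_seen = False

    def set_a(self, literal: str) -> int:
        negative = literal.startswith("-")
        body = literal[1:] if negative else literal
        if body[:2] in ("0x", "0X"):
            base, digits = 16, body[2:]
        elif body.startswith("0") and len(body) > 1:
            base, digits = 8, body[1:]
        else:
            base, digits = 10, body
        if not digits or any(ch not in DIGITS[base] for ch in digits):
            raise ValueError("invalid number")
        value = int(digits, base)  # full 32-bit unsigned parse, sign ignored
        all_ones_blocked = value == 0xFFFFFFFF and not negative and self.overflow_seen
        if value > 0xFFFFFFFF or all_ones_blocked:
            self.overflow_seen = True
            raise OverflowError("Invalid number. Numbers are limited to 32-bits of precision.")
        if negative:
            value = (~value + 1) & 0xFFFFFFFF  # two's complement
        return value - 0x100000000 if value & 0x80000000 else value  # bit 32 is the sign
```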

Variables (all versions)

The rules for parsing un-expanded numeric variables are different. All three numeric notations employ a similar strategy: first ignore any leading negative sign and convert the number into an unsigned binary representation, stopping as soon as an invalid character is reached. Then apply any leading negative sign by taking the 2's complement.

The big difference is that overflow conditions no longer result in an error. Instead the maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.

Undefined variables are treated as zero, and variables that do not contain a valid numeric format are treated as zero.

A defined variable that does not start out as a valid number is treated the same as an undefined variable - value equals zero. But something like -123JUNK is assigned value -123.
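A Python sketch of the variable rule: parse as far as the digits are valid, then saturate instead of raising an overflow error. `variable_value` is a hypothetical name, and `None` stands in for an undefined variable; this is a model of the described behavior, not CMD's implementation.

```python
import string

INT32_MAX = 2147483647


def variable_value(text):
    """Model of how SET /A reads a variable referenced by name (None = undefined)."""
    if text is None:
        return 0
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2] in ("0x", "0X"):
        base, digits, alphabet = 16, body[2:], string.hexdigits
    elif body.startswith("0"):
        base, digits, alphabet = 8, body[1:], "01234567"
    else:
        base, digits, alphabet = 10, body, string.digits
    magnitude = 0
    for ch in digits:
        if ch not in alphabet:
            break  # stop at the first invalid character
        magnitude = magnitude * base + int(ch, base)
    # Saturate to the appropriately signed maximum magnitude instead of raising.
    magnitude = min(magnitude, INT32_MAX + 1 if negative else INT32_MAX)
    return -magnitude if negative else magnitude
```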

IF

IF only parses numbers when one of (EQU, NEQ, LSS, LEQ, GTR, GEQ) is used. The == comparison operator always results in a string comparison.

All three numeric notations employ a similar strategy: first ignore any leading negative sign and parse the number into an unsigned binary 31 bit integer. Then apply any leading negative sign by taking the 2's complement.

The big difference is that overflow conditions no longer result in an error. Instead the appropriately signed maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.

```
if 2147483647 equ 999999999999999999999999 echo These numbers are equal
if -2147483648 equ -999999999999999999999999 echo These numbers are equal
if 0xFFFFFFFFFFFFF equ 2147483647 echo These numbers are equal
if -0xFFFFFFFFFFFF equ -2147483648 echo These numbers are equal
if 077777777777777 equ 2147483647 echo These numbers are equal
if -077777777777777 equ -2147483648 echo These numbers are equal
```

This is a radical departure for hex notation. With SET /A, 0xFFFFFFFF sets the sign bit and the value is -1. With IF, 0xFFFFFFFF is treated as a positive number with an overflow condition, so it becomes 2147483647.

One other major difference: numeric parsing is abandoned when an invalid digit is detected, and IF uses a string comparison instead. Numeric parsing is also abandoned if the number starts with two or more minus signs.

```
if 09 lss 9 echo TRUE because 9 is an invalid octal digit so string comparison is used
if --1 gtr 1 echo TRUE because only one minus allowed for numbers
```
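In contrast to the variable rule, IF either gets a whole valid number or gives up. This Python sketch (hypothetical name `if_operand`, an emulation of the rules above) returns the saturated value, or `None` when IF would fall back to a string comparison. CMD's own string ordering is not reproduced here.

```python
import string


def if_operand(text: str):
    """Value IF compares numerically, or None when IF falls back to a string comparison."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2] in ("0x", "0X") and len(body) > 2:
        base, digits, alphabet = 16, body[2:], string.hexdigits
    elif body.startswith("0") and len(body) > 1:
        base, digits, alphabet = 8, body[1:], "01234567"
    else:
        base, digits, alphabet = 10, body, string.digits
    if not digits or any(ch not in alphabet for ch in digits):
        return None  # invalid digit or a second minus sign: not a number
    magnitude = int(digits, base)
    # Overflow is not an error: clamp to the appropriately signed maximum magnitude.
    magnitude = min(magnitude, 2147483648 if negative else 2147483647)
    return -magnitude if negative else magnitude
```

Note how 0xFFFFFFFF becomes 2147483647 here, whereas SET /A on Vista+ reads it as -1.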

%var:~n,m% (variable substring expansion)

I believe substring numeric parsing is the same as for IF, but it is difficult to prove because variables are limited to length 8191. The only thing I can prove is that overflow conditions give the same result as a non-overflow number that exceeds the length of the string.

```
set var=hello
::All of the following statements print out the entire string
echo %var:~0,5%
echo %var:~0,10%
echo %var:~0,9999999999999999999%
echo %var:~0,0xA%
echo %var:~0,0xFFFFFFFFFFFFFFFFF%
echo %var:~0,05%
echo %var:~0,0777777777777777777%
echo %var:~-5%
echo %var:~-10%
echo %var:~-9999999999999999999%
echo %var:~-0xA%
echo %var:~-0xFFFFFFFFFFFFFFFFF%
echo %var:~-05%
echo %var:~-0777777777777777777%
```

If an invalid digit or more than one minus sign is detected, variable expansion is aborted, and the result is the expression text itself (minus the percent signs) instead of a substring of the value.

echo %var:~09% --> var:~09
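A Python sketch of substring expansion, assuming (as believed above) that the offsets use the IF parsing rules. `parse_number` and `substring` are hypothetical names; negative offsets count from the end and negative lengths drop characters from the end, as described in `set /?`. Returning `None` models an aborted expansion.

```python
import string


def parse_number(text: str):
    """Assumed IF-style parse: saturate on overflow, None if the token is not a number."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2] in ("0x", "0X") and len(body) > 2:
        base, digits, alphabet = 16, body[2:], string.hexdigits
    elif body.startswith("0") and len(body) > 1:
        base, digits, alphabet = 8, body[1:], "01234567"
    else:
        base, digits, alphabet = 10, body, string.digits
    if not digits or any(ch not in alphabet for ch in digits):
        return None
    magnitude = min(int(digits, base), 2147483648 if negative else 2147483647)
    return -magnitude if negative else magnitude


def substring(value: str, start_text: str, length_text=None):
    """Model of %var:~start[,length]%; None means CMD aborts the expansion."""
    start = parse_number(start_text)
    if start is None:
        return None
    size = len(value)
    start = max(size + start, 0) if start < 0 else min(start, size)
    if length_text is None:
        return value[start:]
    length = parse_number(length_text)
    if length is None:
        return None
    if length >= 0:
        return value[start:start + length]
    return value[start:max(size + length, start)]  # negative length trims the end
```

Overflowed offsets simply saturate and then get clamped to the string length, which is why they behave like any other out-of-range value.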

FOR /F "TOKENS=n"

Again I believe the numeric parsing rules are the same as for IF, but it is even more difficult to prove.

Any value < 1 results in a syntax error. This includes negative values with an overflow condition.

Also, an invalid number due to an invalid digit or multiple minus signs results in a syntax error.

Any value > 31 results in a FOR /F parsing no-op (that request for a token is ignored) because FOR /F is limited to parsing a maximum of 31 tokens. This includes positive numbers with an overflow condition.

```
for /f "tokens=31,32,0xFFFFFFFFFFFF,1" %A in (
  "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33"
) do @echo A=%A,  B=%B,  C=%C,  D=%D
```
results in A=1, B=31, C=%C, D=%D

This has nothing to do with number parsing, but note how the token numbers are sorted prior to assigning the letters.
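A Python sketch of the "TOKENS=n" rules above, assuming the IF parsing rules. `parse_number`, `for_f_tokens` and `LETTERS` are hypothetical names; the model only handles comma-separated numbers (no ranges or `*`), splits on whitespace, and leaves out tokens missing from the line.

```python
import string

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumption: the loop variable is %A


def parse_number(text: str):
    """Assumed IF-style parse: saturate on overflow, None if the token is not a number."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2] in ("0x", "0X") and len(body) > 2:
        base, digits, alphabet = 16, body[2:], string.hexdigits
    elif body.startswith("0") and len(body) > 1:
        base, digits, alphabet = 8, body[1:], "01234567"
    else:
        base, digits, alphabet = 10, body, string.digits
    if not digits or any(ch not in alphabet for ch in digits):
        return None
    magnitude = min(int(digits, base), 2147483648 if negative else 2147483647)
    return -magnitude if negative else magnitude


def for_f_tokens(spec: str, line: str) -> dict:
    """Map loop letters to tokens for a spec like '31,32,0xFFFFFFFFFFFF,1'."""
    wanted = set()
    for item in spec.split(","):
        n = parse_number(item)
        if n is None or n < 1:
            raise ValueError(f"syntax error in tokens={spec}")
        if n <= 31:  # requests above 31 (including overflow) are silently ignored
            wanted.add(n)
    fields = line.split()
    assigned = {}
    for letter, n in zip(LETTERS, sorted(wanted)):  # token numbers are sorted first
        if n <= len(fields):
            assigned[letter] = fields[n - 1]
    return assigned
```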

FOR /F "SKIP=n"

Exactly the same as "TOKENS" except I believe the max SKIP value is 2147483647. I did some testing, but it is a pain, and I'm not sure I tested properly.

Any SKIP value < 1 results in a syntax error.

Also, an invalid number due to an invalid digit or multiple minus signs results in a syntax error.

I believe any SKIP value > 2147483647 results in immediate termination of the parsing of the input. But the command in the IN() clause is still executed even if the positive overflow occurs.

```
for /f "skip=0x80000000" %A in ('dir "does not exist"') do @echo %A
```

From what I remember of my testing, "SKIP=0x7FFFFFFF" skipped the correct number of lines in a huge file, and "SKIP=0x80000000" returned immediately, without error and without taking the time to scan the huge file.

FOR /L %%A in (n1 n2 n3)

Again, all three numbers are parsed using basically the same rules as used by IF. Overflow values are converted into the appropriately signed maximum magnitude value.

Here is a demonstration of all three number bases:

```
C:\test>for /L %N in (0xF 012 35) do @echo %N
15
25
35
```

Here is a demonstration of both positive and negative overflow:

```
C:\test>for /L %N in (999999999999999999999999 -1 0x7FFFFFFD) do @echo %N
2147483647
2147483646
2147483645

C:\test>for /L %N in (-0777777777777777777777777 1 -0x7FFFFFFD) do @echo %N
-2147483648
-2147483647
-2147483646
-2147483645
```

FOR /L will parse each numeric token up until it finds an invalid character (invalid digit, or multiple minus signs). If the token starts off with an invalid character, then it is treated as zero. Not shown, but missing values are also treated as zero.

```
C:\test>for /L %N in (G45 1 038) do @echo %N
0
1
2
3
```
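A Python sketch of how each FOR /L operand is read: a prefix parse that stops at the first invalid character, treats an empty or invalid start as zero, and saturates on overflow. `for_l_operand` is a hypothetical name, and this is a model of the behavior above rather than CMD's code.

```python
import string


def for_l_operand(text: str) -> int:
    """Model of one FOR /L operand (start, step or end)."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if body[:2] in ("0x", "0X"):
        base, digits, alphabet = 16, body[2:], string.hexdigits
    elif body.startswith("0"):
        base, digits, alphabet = 8, body[1:], "01234567"
    else:
        base, digits, alphabet = 10, body, string.digits
    magnitude = 0
    for ch in digits:
        if ch not in alphabet:
            break  # "038" stops at 8 and yields octal 03 = 3
        magnitude = magnitude * base + int(ch, base)
    magnitude = min(magnitude, 2147483648 if negative else 2147483647)
    return -magnitude if negative else magnitude
```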

This has nothing to do with parsing, but setting the end value to the max (or min) value will result in an endless loop if the increment has the same sign, because incrementing past the limit overflows and wraps around to the opposite sign, so the end value is never exceeded. The examples below use EXIT to break out of the endless loop.

```
C:\test>cmd /c for /l %N in (0x7FFFFFFE 1 0x7FFFFFFF) do @(echo %N^&if %N equ -0x7FFFFFFE exit)
2147483646
2147483647
-2147483648
-2147483647
-2147483646

C:\test>cmd /c for /l %N in (-0x7FFFFFFE -1 -0x80000000) do @(echo %N^&if %N equ 0x7FFFFFFE exit)
-2147483646
-2147483647
-2147483648
2147483647
2147483646
```
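The wraparound can be reproduced with a Python generator that steps with 32-bit two's complement arithmetic. `to_int32` and `for_l` are hypothetical names, the loop test (continue while not past the end value) is an assumption consistent with the outputs above, and `max_iterations` stands in for the EXIT used to break the endless loop.

```python
def to_int32(value: int) -> int:
    """Wrap an integer to 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def for_l(start: int, step: int, end: int, max_iterations: int):
    """Yield FOR /L values with 32-bit wraparound, stopping after max_iterations."""
    value = start
    for _ in range(max_iterations):
        if (step >= 0 and value > end) or (step < 0 and value < end):
            return
        yield value
        value = to_int32(value + step)  # 2147483647 + 1 wraps to -2147483648
```

With an end value of 2147483647 and a positive step, no wrapped value can ever exceed the end, so only the iteration cap stops the loop.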

EXIT n
EXIT /B n

The EXIT command is radically different from all other contexts in that it only accepts decimal values. The return code can be any value represented by a 32 bit signed integer.

Any decimal number of any magnitude will be accepted. There is a perpetual overflow rollover into the opposite sign. I believe any leading minus sign is initially ignored, and the value is parsed into a 32 bit unsigned integer, modulo 4294967296. If there was a leading minus sign, then the two's complement is taken, and finally the 32nd bit is then treated as a sign bit.

The token is parsed as a decimal number up until the first invalid character. All remaining characters are then ignored.

A token that starts off with an invalid character is treated as no value (the result is the same as calling EXIT without any argument).

```
@echo off
setlocal enableDelayedExpansion
for %%N in (
  2147483647
  2147483648
  4294967295
  4294967296
  4294967297
  6442450943
  6442450944
  8589934591
  8589934592
  8589934593
  00000045
  51g
  ""
  --1
  hello
) do (
  echo EXIT /B %%~N
  call :test %%N
  echo !errorlevel!
  echo(
)
exit /b

:test
exit /b %1
```

--OUTPUT--

```
EXIT /B 2147483647
2147483647

EXIT /B 2147483648
-2147483648

EXIT /B 4294967295
-1

EXIT /B 4294967296
0

EXIT /B 4294967297
1

EXIT /B 6442450943
2147483647

EXIT /B 6442450944
-2147483648

EXIT /B 8589934591
-1

EXIT /B 8589934592
0

EXIT /B 8589934593
1

EXIT /B 00000045
45

EXIT /B 51g
51

EXIT /B
51

EXIT /B --1
51

EXIT /B hello
51
```
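The EXIT rules described above (decimal prefix only, modulo 4294967296, optional two's complement, then the 32nd bit read as the sign) fit in a short Python function. `exit_code` is a hypothetical name; `None` models "no value", in which case the ERRORLEVEL is left unchanged (51 in the run above).

```python
def exit_code(token: str):
    """Model of EXIT [/B] n; returns None when EXIT acts as if no argument was given."""
    negative = token.startswith("-")
    body = token[1:] if negative else token
    digits = ""
    for ch in body:
        if ch not in "0123456789":
            break  # everything after the first invalid character is ignored
        digits += ch
    if not digits:
        return None  # token starts with an invalid character
    value = int(digits) % 2**32  # perpetual rollover, any magnitude accepted
    if negative:
        value = (-value) % 2**32  # two's complement
    return value - 2**32 if value >= 2**31 else value  # 32nd bit is the sign bit
```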

Dave Benham
Last edited by dbenham on 21 Sep 2016 18:45, edited 5 times in total.

jeb

Hi Dave,

Nice to see that even the last undocumented places are now being discovered.

But I'm missing some new challenges, it seems to be nothing unclear left.

jeb

Liviu

jeb wrote: But I'm missing some new challenges, it seems to be nothing unclear left.

What oh what would it take to get you interested in Unicode matters?

For example, the most elementary challenge of iterating through all files in a directory (including system/hidden ones) is not satisfactorily resolved. Pipes, input redirection and for/f 'dir /a-d' loops all end up converting the filenames to the active codepage, and are therefore useless for Unicode names.

Maybe someone should start a "top 10 open issues" page on dostips.

Liviu

P.S. @Dave, thanks for the writeup, and sorry for the hijack - it was just too tempting

dbenham

I like that idea of a list - You've touched on another batch "unsolved" problem with your recent post dealing with search and replace.

Those types of problems are interesting, but they are more about finding an algorithm that will work efficiently and reliably in batch.

This post is more about understanding how CMD.EXE works. It doesn't solve any problem directly but potentially opens up new avenues for development and/or identifies blind alley dead ends.

Have no fear jeb - there is always more to discover.

I've been wanting to fully investigate file patterns with wild cards, especially in a RENAME context, but I haven't gotten around to it. I've got some understanding, but I think there is more to discover. I almost remember some very surprising RENAME results with multiple * that I haven't been able to reproduce.

Update - I've actually tackled the RENAME issue at How does the Windows RENAME command interpret wildcards?

Dave Benham
Last edited by dbenham on 21 Sep 2016 20:47, edited 1 time in total.

Liviu

dbenham wrote: It appears there is one set of rules for SET /A, and another set of rules used by all other contexts.

Here is one more oddity sighted in XP. It seems that set/a will strip the outer quotes from an immediate "123" number, or an explicit %var% expansion, but not from an implied use of the same variable.

```
set X="9"
set /a Y = %X% + 10
set /a Z = X + 10
echo X = %X%, Y = %Y%, Z = %Z%
```

```
X = "9", Y = 19, Z = 10
```

Liviu

jeb

Liviu wrote: It seems that set/a will strip the outer quotes from an immediate "123" number, or an explicit %var% expansion, but not from an implied use of the same variable.

Interesting. I tested it, and I suppose all quotes are removed after all batch parser phases, as the first step of the SET /A execution (before the variables named in the expression are expanded).

```
setlocal EnableDelayedExpansion
set Q="123"
set /a res1=!Q!+2
set /a res2="1"2"3"+"2"""
set /a res3=Q+2
set res
```

Only res3 fails, because the quote-removal phase is already over.
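A toy Python model of the ordering described here: strip every quote from the already-expanded expression text first, and only then look up variables named in the expression. `variable_number` and `set_a_sum` are hypothetical names; the toy only supports `+` and non-negative decimal values, and a variable whose value does not start with a digit reads as 0.

```python
import string


def variable_number(text):
    """Toy decimal-only variable rule: leading digits, else 0 (None = undefined)."""
    if text is None:
        return 0
    digits = ""
    for ch in text:
        if ch not in string.digits:
            break
        digits += ch
    return int(digits) if digits else 0


def set_a_sum(expanded_text: str, env: dict) -> int:
    """Toy SET /A for 'a+b+...' after %var% / !var! expansion has already happened."""
    expression = expanded_text.replace('"', "")  # step 1: remove all quotes
    total = 0
    for term in expression.split("+"):
        term = term.strip()
        if term and all(ch in string.digits for ch in term):
            total += int(term)  # literal number
        else:
            total += variable_number(env.get(term))  # step 2: named variable, quotes intact
    return total
```

In this model `Q+2` yields 2 because the value `"123"` is looked up after quote removal has finished, so its leading quote makes it read as zero.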

jeb

dbenham

I agree - SET /A removes all quotes prior to expanding any variables.

A SET /A variable is expected to contain a number. The interesting aspect is that SET /A treats the value of a variable as zero if the contents cannot be parsed as a number. There is no difference between an undefined variable and a variable with an invalid number.

An invalid number as a literal raises an error
An invalid number within a variable becomes zero

Dave Benham

carlos

Very interesting post, Dave. Just a little extra info about FOR /F:

In the tokens and skip options, any value <= 0 causes an error.
Internally, the default values are:
tokens=1
skip=0

So you can redundantly specify tokens=1, but you cannot redundantly specify skip: you cannot write skip=0.

Only in tokens: if you specify a negative number using hexadecimal notation, that token is ignored.

```
for /f "tokens=1,0xffffffff,2" %%a in ("t1 t2") do echo.%%b
```

Aacini

dbenham wrote: I like that idea of a list - You've touched on another batch "unsolved" problem with your recent post dealing with search and replace.

Those types of problems are interesting, but they are more about finding an algorithm that will work efficiently and reliably in batch.

This post is more about understanding how CMD.EXE works. It doesn't solve any problem directly but potentially opens up new avenues for development and/or identifies blind alley dead ends.

Dave Benham

I completely and utterly agree

Applications like the Batch file processor (cmd.exe) may have tons of small operational details, glitches, etc. Although it is certainly interesting to know about them, many of them will never appear in real Batch file programs, so they just fall under the "interesting curiosity" label.

Please don't misunderstand me. I have a good time when I read about these topics and appreciate the author's effort, but I think that writing endless programs to test every possible aspect of a certain Batch detail isn't worth the time spent. It would be preferable to try to find solutions to specific Batch problems that may be useful to a large number of Batch file developers, or to write large and complex Batch applications (like me: I am currently writing my own version of the figlet program).

Antonio

Liviu

dbenham wrote: IF

[...] The big difference is that overflow conditions no longer result in an error. Instead the maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.

FWIW the same logic is apparently used when implicitly expanding variables in set/a statements. Test run under xp.sp3:

```
C:\tmp>set "allones=0xFFFFFFFF"

C:\tmp>set /a allones
2147483647
C:\tmp>set /a 0x7FFFFFFF
2147483647
C:\tmp>set /a %allones%
Invalid number.  Numbers are limited to 32-bits of precision.
C:\tmp>set /a 0xFFFFFFFF
Invalid number.  Numbers are limited to 32-bits of precision.
```

Liviu

dbenham

Very nice Liviu

I've confirmed the same behavior on Vista, and I've updated the original post to include the rules you've discovered about SET /A variables.

Dave Benham

Liviu

And here is one more about un-expanded variables. Apparently, implicit expansion only works for "genuine" environment variables, not dynamic ones.

```
C:\tmp>cmd /c exit 123

C:\tmp>set /a errorlevel
0
C:\tmp>set /a %errorlevel%
123
C:\tmp>set /a random
0
C:\tmp>set /a %random%
19362
C:\tmp>
```

Liviu

cmderror

Hello,

I found this thread quite interesting as I was troubleshooting cmd script problems related to number limits.

Though, one very important point needs to be made regarding SET /A:
CMD.EXE does not handle numbers the same way on Windows 7 as on Windows XP.
In short:
- XP accepts any positive or negative 32bit number (that is, from -4294967294 up to +4294967295) and converts it to a signed 32bit value.
- 7 only accepts signed 32bit numbers in decimal or octal (in hex, it's "OK" from -0xFFFFFFFF to +0xFFFFFFFF).

Some examples at the limit:
• Set /A 0xFFFFFFFF
  XP: overflow
  7: -1
• Set /A 4294967295
  XP: overflow
  7: overflow

• Set /A 0xFFFFFFFE
  XP: -2
  7: -2
• Set /A 4294967294
  XP: -2
  7: overflow

• Set /A 0x80000000
  XP: -2147483648
  7: -2147483648
• Set /A -2147483648
  XP: -2147483648
  7: overflow

• Set /A 2147483648
  XP: -2147483648
  7: overflow

• Set /A -4294967294
  XP: 2
  7: overflow

So this becomes blatantly obvious (well, it wasn't that obvious to me before I realized it...) when trying to set the error level to a value beyond the positive bound:
• Set /A 2000000000
  XP: 2000000000
  7: 2000000000

• Set /A 3000000000
  XP: -1294967296
  7: overflow

(I would bet the behavior discrepancy was introduced with Vista, but I don't have any Vista box at hand to confirm)

PS: the IF comparison (if 2147483647==999999999999999999999999 echo These numbers are equal) should use EQU rather than ==, as stated a few lines above it.

npocmaka_

```
for /l %%M in (2147483647  1 2147483648) do ( echo %%M & pause )
```

One more case where 2147483647 overflows to -2147483648 and the result is a never-ending loop.
(Strange... when I execute this, my prompt disappears, even after Ctrl+C.)

and

```
exit /b 2147483648
```

penpen

This behaviour hints that the command line interpreter has an internal representation
for 32 bit signed integer numbers (sint32) of the following form (at minimum):
- 1 bit sign
- 32 bit unsigned integer number (uint32) value
This must be the form used prior to overflow/underflow detection:
if the number representation were a simple sint32, the input would overflow/underflow and the loop would produce no output.

If, additionally, the for /L loop uses this internal representation without overflow/underflow detection,
then this result is easy to explain:
the loop terminates if 2147483648<%%M, which will never happen, as %%M overflows.

The same endless loop occurs on the "negative" "underflowing" side:

```
Z:\>for /l %M in (-2147483648  -1 -2147483649) do ( echo %M & pause )
```

Tested on WinXP Home/Professional SP3.

penpen