There are multiple contexts where CMD.EXE parses a string into a 4-byte signed integer value ranging from -2147483648 to 2147483647:
- SET /A
- IF
- %var:~n,m% (variable substring expansion)
- FOR /F "TOKENS=n"
- FOR /F "SKIP=n"
In all contexts, CMD can parse numbers expressed in decimal, hexadecimal, or octal notation:
Code:
decimal: [-]{non-zero decimal digit}[{decimal digit}...]
hexadecimal: [-]0{x|X}{hexadecimal digit}[{hexadecimal digit}...]
octal: [-]0{octal digit}[{octal digit}...]
{decimal digit} = any of {0|1|2|3|4|5|6|7|8|9}
{hexadecimal digit} = any of {0|1|2|3|4|5|6|7|8|9|A|B|C|D|E|F|a|b|c|d|e|f}
{octal digit} = any of {0|1|2|3|4|5|6|7}
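As a cross-check of the grammar, here is a small Python sketch that reports which notation a token matches. Python is used only as a modeling language. `PATTERNS` and `classify` are hypothetical names. Accepting a bare `0` as octal is an assumption, because the grammar above does not list it.

```python
import re
from typing import Optional

# Regexes transcribed from the grammar above. The octal pattern also
# accepts a bare "0" (an assumption: the grammar does not list it).
PATTERNS = {
    "decimal": re.compile(r"-?[1-9][0-9]*"),
    "hexadecimal": re.compile(r"-?0[xX][0-9A-Fa-f]+"),
    "octal": re.compile(r"-?0[0-7]*"),
}


def classify(token: str) -> Optional[str]:
    """Return the notation that the whole token matches, or None."""
    for notation, pattern in PATTERNS.items():
        if pattern.fullmatch(token):
            return notation
    return None


for token in ("17", "0x11", "021", "-0xff", "08"):
    print(token, classify(token))
```

Note that `08` matches none of the patterns. The leading 0 selects octal, and 8 is not an octal digit.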
But there are subtle differences depending on the context. The differences are in how negative numbers are parsed, and also how overflow and invalid number errors are handled. It appears there is one set of rules for SET /A, and another set of rules used by all other contexts.
SET /A
Literals
decimal - The sign is initially ignored, and the string of decimal digits is first converted into the unsigned binary numeric representation. Afterward, if the number was preceded by a negative sign, then the negative value is computed by taking the 2's complement of the binary value (invert all bits and add 1).
-1 -> 0000 0000 0000 0000 0000 0000 0000 0001 -> 1111 1111 1111 1111 1111 1111 1111 1111
Everything works great, except that the negative limit of a signed 4-byte integer cannot be expressed! The problem is that the parser limits itself to 31 bits in the first step. If the 32nd bit (the sign bit) is set, then the parser detects an overflow error.
-2147483648 -> 1000 0000 0000 0000 0000 0000 0000 0000 : ERROR - overflow detected
The actual error message is "Invalid number. Numbers are limited to 32-bits of precision.", with ERRORLEVEL=1073750992. Very misleading and unfortunate if you ask me.
If an invalid digit is used, then a different error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991.
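The two-step decimal algorithm can be modeled in Python as below. This is a sketch of the observable behavior described above, not CMD's actual code. `set_a_decimal` and `SetANumberError` are hypothetical names, and the order of the invalid-digit and overflow checks is an assumption.

```python
ERR_OVERFLOW = 1073750992  # "Numbers are limited to 32-bits of precision."
ERR_INVALID = 1073750991   # "Numeric constants are either decimal (17), ..."
MASK32 = 0xFFFFFFFF


class SetANumberError(Exception):
    """Carries the ERRORLEVEL that SET /A reports for a bad literal."""

    def __init__(self, errorlevel: int) -> None:
        super().__init__(errorlevel)
        self.errorlevel = errorlevel


def set_a_decimal(token: str) -> int:
    """Model a decimal literal (expects no leading zero, i.e. not octal)."""
    negative = token.startswith("-")
    digits = token[1:] if negative else token
    # Assumption: invalid digits are reported before any overflow.
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise SetANumberError(ERR_INVALID)
    # Step 1: ignore the sign and build the unsigned value, 31 bits max.
    magnitude = 0
    for ch in digits:
        magnitude = magnitude * 10 + int(ch)
        if magnitude > 0x7FFFFFFF:  # the 32nd (sign) bit would be set
            raise SetANumberError(ERR_OVERFLOW)
    # Step 2: apply the sign via 2's complement (invert bits, add 1).
    bits = ((~magnitude + 1) & MASK32) if negative else magnitude
    # Read the 32-bit pattern back as a signed integer.
    return bits - (1 << 32) if bits & 0x80000000 else bits


print(set_a_decimal("-1"))          # -1
print(set_a_decimal("2147483647"))  # 2147483647
try:
    set_a_decimal("-2147483648")
except SetANumberError as err:
    print(err.errorlevel)           # 1073750992
```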
hexadecimal - The parser initially ignores any sign, and the string of hexadecimal digits is converted into the unsigned binary numeric representation. If the number was preceded by a negative sign, then the negative value is computed by taking the 2's complement of the binary value.
The difference is that the SET /A hexadecimal parser allows the 32nd bit to be set during the initial parsing. After the initial parsing is complete, the 32nd bit is treated as the sign bit. So there are 2 representations for every number!
0x1 -> 0000 0000 0000 0000 0000 0000 0000 0001 = 1
-0xFFFFFFFF -> 1111 1111 1111 1111 1111 1111 1111 1111 -> 0000 0000 0000 0000 0000 0000 0000 0001 = 1
0xFFFFFFFF -> 1111 1111 1111 1111 1111 1111 1111 1111 = -1
-0x1 -> 0000 0000 0000 0000 0000 0000 0000 0001 -> 1111 1111 1111 1111 1111 1111 1111 1111 = -1
The oddball is -2147483648, because the 2's complement of that number is itself!
0x80000000 -> 1000 0000 0000 0000 0000 0000 0000 0000
-0x80000000 -> 1000 0000 0000 0000 0000 0000 0000 0000 -> 1000 0000 0000 0000 0000 0000 0000 0000
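The two 32-bit operations used in the tables above, taking the 2's complement and then reading bit 32 as the sign, can be checked with a short Python sketch. `twos_complement` and `as_signed` are hypothetical helper names. Python integers are unbounded, so the sketch masks every result to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def twos_complement(bits: int) -> int:
    """Invert all 32 bits and add 1, discarding any carry past bit 32."""
    return (~bits + 1) & MASK32


def as_signed(bits: int) -> int:
    """Reinterpret a 32-bit pattern with bit 32 as the sign bit."""
    return bits - (1 << 32) if bits & 0x80000000 else bits


# The four rows above: two representations each of 1 and -1.
print(as_signed(0x1))                          # 1
print(as_signed(twos_complement(0xFFFFFFFF)))  # 1
print(as_signed(0xFFFFFFFF))                   # -1
print(as_signed(twos_complement(0x1)))         # -1
# The oddball: 0x80000000 is its own 2's complement.
print(hex(twos_complement(0x80000000)), as_signed(0x80000000))  # 0x80000000 -2147483648
```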
Actually, there are many more representations for each number, because additional leading 0s can be added. There is no limit other than the 8191-character command line limit.
0x1, 0x01, 0x00000000000000000000000000000000001 are all equivalent representations of 1.
Another odd SET /A behavior is that overflow conditions are ignored when parsing hexadecimal notation. Any hex notation that would require 33 or more bits will result in either 1 or -1.
The following all result in -1:
Code:
set /a 0x1000000000
set /a 0xFFFFFFFFFF
set /a 0x888888888888888888888
The following all result in 1:
Code:
set /a -0x1000000000
set /a -0xFFFFFFFFFF
set /a -0x888888888888888888888
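Putting the hexadecimal rules together, a SET /A hex literal can be modeled as below. Clamping any magnitude wider than 32 bits to 0xFFFFFFFF is this sketch's own way of reproducing the 1 and -1 results above, not a claim about how CMD is implemented. `set_a_hex` is a hypothetical name, and `ValueError` stands in for the "Invalid number" error.

```python
MASK32 = 0xFFFFFFFF
HEX_DIGITS = set("0123456789abcdefABCDEF")


def set_a_hex(token: str) -> int:
    """Model SET /A parsing of a literal such as 0x1F or -0x1F."""
    negative = token.startswith("-")
    body = token[1:] if negative else token
    digits = body[2:]
    if body[:2] not in ("0x", "0X") or not digits or not set(digits) <= HEX_DIGITS:
        # SET /A reports this case with ERRORLEVEL=1073750991.
        raise ValueError("Invalid number")
    # Leading zeros are harmless: int() simply ignores them.
    magnitude = int(digits, 16)
    # All 32 bits may be set; anything wider is clamped (no error).
    bits = min(magnitude, MASK32)
    if negative:
        bits = (~bits + 1) & MASK32  # 2's complement
    return bits - (1 << 32) if bits & 0x80000000 else bits


for literal in ("0xFFFFFFFF", "-0xFFFFFFFF", "-0x80000000",
                "0x888888888888888888888", "-0x1000000000"):
    print(literal, set_a_hex(literal))
```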
If an invalid hex digit is used, then the error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991.
octal - The number is parsed similarly to decimal. The sign is initially ignored, and the octal digits are converted into a 31-bit unsigned integer. If the 32nd bit is set, then an overflow error is detected. Any negative sign is applied afterward by taking the 2's complement, but only if no error was detected.
So -2147483648 cannot be represented with octal notation, just as it cannot be represented with decimal notation.
Just like with hexadecimal, any number of leading zeros may be prefixed to a valid octal number.
00000000000000000000000000000000000000000000000001 --> 1
-00000000000000000000000000000000000000000000000001 --> -1
If an invalid octal digit is used, then the error is reported: "Invalid number. Numeric constants are either decimal (17),hexadecimal (0x11), or octal (021)." with ERRORLEVEL=1073750991. This error is a common occurrence when decimal 8 or 9 is zero prefixed, as can occur when parsing date and time information.
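The octal rules follow the decimal pattern, so a model differs mainly in the digit set and the base. The sketch below uses hypothetical names and makes the same assumption as before, that invalid digits are reported before overflow. It also shows the date and time pitfall: a zero-padded two-digit field such as `08` is rejected.

```python
ERR_OVERFLOW = 1073750992
ERR_INVALID = 1073750991
MASK32 = 0xFFFFFFFF


class SetANumberError(Exception):
    """Carries the ERRORLEVEL that SET /A reports for a bad literal."""

    def __init__(self, errorlevel: int) -> None:
        super().__init__(errorlevel)
        self.errorlevel = errorlevel


def set_a_octal(token: str) -> int:
    """Model SET /A parsing of a zero-prefixed (octal) literal."""
    negative = token.startswith("-")
    digits = token[1:] if negative else token
    if not digits.startswith("0") or any(ch not in "01234567" for ch in digits):
        raise SetANumberError(ERR_INVALID)
    magnitude = 0
    for ch in digits:  # any number of leading zeros is accepted
        magnitude = magnitude * 8 + int(ch)
        if magnitude > 0x7FFFFFFF:  # 31-bit limit, as with decimal
            raise SetANumberError(ERR_OVERFLOW)
    bits = ((~magnitude + 1) & MASK32) if negative else magnitude
    return bits - (1 << 32) if bits & 0x80000000 else bits


print(set_a_octal("017777777777"))  # 2147483647
for field in ("07", "08", "09"):   # e.g. zero-padded hour or minute fields
    try:
        print(field, set_a_octal(field))
    except SetANumberError as err:
        print(field, "error", err.errorlevel)
```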
Edit - additional variable rules discovered by Liviu
Variables
The rules for parsing un-expanded numeric variables (variables referenced by name within the SET /A expression, without percents) are different. All three numeric notations employ a similar strategy: first ignore any leading negative sign and convert the number into an unsigned binary representation, then apply any leading negative sign by taking the 2's complement.
The big difference is that overflow conditions no longer result in an error. Instead the maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.
Undefined variables, and variables that do not contain a valid number, are treated as zero.
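A sketch of these variable rules, again in Python with hypothetical names: `env` stands in for the environment, and `parse_saturating` converts the magnitude first, applies the sign, and then clamps to the signed 32-bit range. For values inside the range this gives the same result as the 2's complement step described above. Clamping is simply how this sketch expresses the overflow rule.

```python
import re
from typing import Optional

INT_MAX, INT_MIN = 2147483647, -2147483648


def parse_saturating(text: str) -> Optional[int]:
    """Parse decimal/hex/octal text, clamping overflow; None if invalid."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", body):
        magnitude = int(body[2:], 16)
    elif re.fullmatch(r"0[0-7]*", body):
        magnitude = int(body, 8)
    elif re.fullmatch(r"[1-9][0-9]*", body):
        magnitude = int(body)
    else:
        return None
    value = -magnitude if negative else magnitude
    return max(INT_MIN, min(INT_MAX, value))


def set_a_variable(env: dict, name: str) -> int:
    """Value SET /A uses for a variable referenced by name (no percents)."""
    text = env.get(name)
    if text is None:  # undefined variable
        return 0
    value = parse_saturating(text)
    return 0 if value is None else value  # non-numeric content


env = {"big": "99999999999999", "small": "-0xFFFFFFFFFFFF", "word": "hello"}
for name in ("big", "small", "word", "undefined"):
    print(name, set_a_variable(env, name))
```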
IF
IF only parses numbers when one of the comparison operators EQU, NEQ, LSS, LEQ, GTR, or GEQ is used. The == comparison operator always results in a string comparison.
All three numeric notations employ a similar strategy: first ignore any leading negative sign and convert the number into an unsigned binary representation, then apply any leading negative sign by taking the 2's complement.
The big difference from SET /A literals is that overflow conditions no longer result in an error. Instead, the maximum magnitude value is used. A positive overflow becomes 2147483647, and a negative overflow becomes -2147483648.
Code:
if 2147483647 EQU 999999999999999999999999 echo These numbers are equal
if -2147483648 EQU -999999999999999999999999 echo These numbers are equal
if 0xFFFFFFFFFFFFF EQU 2147483647 echo These numbers are equal
if -0xFFFFFFFFFFFF EQU -2147483648 echo These numbers are equal
if 077777777777777 EQU 2147483647 echo These numbers are equal
if -077777777777777 EQU -2147483648 echo These numbers are equal
This is a radical departure for hex notation. With SET /A, 0xFFFFFFFF sets the sign bit and the value is -1. With IF, 0xFFFFFFFF is treated as a positive number with an overflow condition, so it becomes 2147483647.
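The contrast can be seen by running the same hex token through two small models: `set_a_hex` keeps all 32 bits and reads bit 32 as the sign, while `if_hex` treats the digits as a positive magnitude and clamps. Both are hypothetical Python sketches, and both assume a syntactically valid `0x` token.

```python
MASK32 = 0xFFFFFFFF
INT_MAX, INT_MIN = 2147483647, -2147483648


def split_hex(token: str):
    """Return (negative, magnitude) for a valid token like -0x1F."""
    negative = token.startswith("-")
    body = token[1:] if negative else token
    return negative, int(body[2:], 16)


def set_a_hex(token: str) -> int:
    negative, magnitude = split_hex(token)
    bits = min(magnitude, MASK32)          # wider values clamp to all ones
    if negative:
        bits = (~bits + 1) & MASK32        # 2's complement
    return bits - (1 << 32) if bits & 0x80000000 else bits


def if_hex(token: str) -> int:
    negative, magnitude = split_hex(token)
    value = -magnitude if negative else magnitude
    return max(INT_MIN, min(INT_MAX, value))  # overflow clamps


for token in ("0xFFFFFFFF", "0x80000000", "-0xFFFFFFFF"):
    print(token, set_a_hex(token), if_hex(token))
```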
One other major difference - numeric parsing is abandoned when an invalid digit is detected, and IF falls back to a string comparison.
Code:
if 09 lss 9 echo TRUE because 9 is an invalid octal digit so string comparison is used
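The IF behavior, a numeric comparison with a string fallback, can be sketched as follows. Assumptions: the fallback applies when either operand fails to parse, and Python's ordinal string ordering stands in for CMD's own string comparison. `parse_if_number` and `if_compare` are hypothetical names.

```python
import operator
import re
from typing import Optional

INT_MAX, INT_MIN = 2147483647, -2147483648


def parse_if_number(text: str) -> Optional[int]:
    """Parse decimal/hex/octal text, clamping overflow; None if invalid."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", body):
        magnitude = int(body[2:], 16)
    elif re.fullmatch(r"0[0-7]*", body):
        magnitude = int(body, 8)
    elif re.fullmatch(r"[1-9][0-9]*", body):
        magnitude = int(body)
    else:
        return None
    value = -magnitude if negative else magnitude
    return max(INT_MIN, min(INT_MAX, value))


OPERATORS = {
    "EQU": operator.eq, "NEQ": operator.ne,
    "LSS": operator.lt, "LEQ": operator.le,
    "GTR": operator.gt, "GEQ": operator.ge,
}


def if_compare(left: str, op: str, right: str) -> bool:
    """Model IF left op right for the six named comparison operators."""
    compare = OPERATORS[op.upper()]
    a, b = parse_if_number(left), parse_if_number(right)
    if a is None or b is None:
        # Assumed fallback: compare the original strings instead.
        return compare(left, right)
    return compare(a, b)


print(if_compare("2147483647", "EQU", "999999999999999999999999"))  # True
print(if_compare("0xFFFFFFFFFFFFF", "EQU", "2147483647"))           # True
print(if_compare("09", "lss", "9"))                                 # True
print(if_compare("10", "lss", "9"))                                 # False
```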
%var:~n,m% (variable substring expansion)
I believe substring numeric parsing is the same as for IF, but it is difficult to prove because variables are limited to a length of 8191. The only thing I can prove is that overflow conditions give the same result as a non-overflow number that exceeds the length of the string.
Code:
set var=hello
::All of the following statements print out the entire string
echo %var:~0,5%
echo %var:~0,10%
echo %var:~0,9999999999999999999%
echo %var:~0,0xA%
echo %var:~0,0xFFFFFFFFFFFFFFFFF%
echo %var:~0,05%
echo %var:~0,0777777777777777777%
echo %var:~-5%
echo %var:~-10%
echo %var:~-9999999999999999999%
echo %var:~-0xA%
echo %var:~-0xFFFFFFFFFFFFFFFFF%
echo %var:~-05%
echo %var:~-0777777777777777777%
If an invalid digit is detected, then variable expansion is aborted, and the result is the expansion expression itself (minus the percents) instead of a substring of the value.
echo %var:~09% --> var:~09
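The substring results above can be reproduced with the following model. It assumes the IF-style parser (overflow clamps, an invalid digit aborts) together with the offset rules documented by SET /?: a negative start counts from the end of the value, and a negative length omits characters from the end. `substring` is a hypothetical name; `name` and `spec` stand for the text between the percents.

```python
import re
from typing import Optional

INT_MAX, INT_MIN = 2147483647, -2147483648


def parse_saturating(text: str) -> Optional[int]:
    """Parse decimal/hex/octal text, clamping overflow; None if invalid."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", body):
        magnitude = int(body[2:], 16)
    elif re.fullmatch(r"0[0-7]*", body):
        magnitude = int(body, 8)
    elif re.fullmatch(r"[1-9][0-9]*", body):
        magnitude = int(body)
    else:
        return None
    value = -magnitude if negative else magnitude
    return max(INT_MIN, min(INT_MAX, value))


def substring(name: str, value: str, spec: str) -> str:
    """Model %name:~spec% where spec is "start" or "start,length"."""
    parts = spec.split(",")
    numbers = [parse_saturating(p) for p in parts]
    if len(parts) > 2 or None in numbers:
        return f"{name}:~{spec}"  # expansion aborted, percents dropped
    size = len(value)
    start = numbers[0]
    if start < 0:
        start += size             # negative start counts from the end
    start = max(0, min(start, size))
    if len(numbers) == 1:
        end = size
    elif numbers[1] < 0:
        end = size + numbers[1]   # negative length drops chars from the end
    else:
        end = start + numbers[1]
    end = max(start, min(end, size))
    return value[start:end]


for spec in ("0,9999999999999999999", "-0xFFFFFFFFFFFFFFFFF", "0,-2", "09"):
    print(spec, "->", substring("var", "hello", spec))
```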
FOR /F "TOKENS=n"
Again, I believe the numeric parsing rules are the same as for IF, but this is even more difficult to prove.
Any value < 1 results in a syntax error. This includes negative values with an overflow condition.
Any value > 31 results in a FOR /F parsing no-op (that request for a token is ignored) because FOR /F is limited to parsing a maximum of 31 tokens. This includes positive numbers with an overflow condition.
Code:
for /f "tokens=31,32,0xFFFFFFFFFFFF,1" %A in (
"1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33"
) do @echo A=%A, B=%B, C=%C, D=%D
results in A=1, B=31, C=%C, D=%D
This has nothing to do with number parsing, but note how the token numbers are sorted prior to assigning the letters.
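The TOKENS rules above can be modeled in Python as below. The model only handles a plain comma-separated list of numbers, so ranges such as 1-3 and the * suffix are not covered. It assumes the IF-style parser, splits on whitespace as a stand-in for FOR /F's default space and tab delimiters, and does not handle duplicate requests. Treating an unparseable number as a syntax error is a further assumption. `assign_tokens` is a hypothetical name, and `ValueError` stands in for CMD's syntax error.

```python
import re
from typing import Optional

INT_MAX, INT_MIN = 2147483647, -2147483648


def parse_saturating(text: str) -> Optional[int]:
    """Parse decimal/hex/octal text, clamping overflow; None if invalid."""
    negative = text.startswith("-")
    body = text[1:] if negative else text
    if re.fullmatch(r"0[xX][0-9A-Fa-f]+", body):
        magnitude = int(body[2:], 16)
    elif re.fullmatch(r"0[0-7]*", body):
        magnitude = int(body, 8)
    elif re.fullmatch(r"[1-9][0-9]*", body):
        magnitude = int(body)
    else:
        return None
    value = -magnitude if negative else magnitude
    return max(INT_MIN, min(INT_MAX, value))


def assign_tokens(spec: str, line: str, first_letter: str = "A") -> dict:
    """Model FOR /F "tokens=spec" for a comma-separated list of numbers."""
    requested = []
    for part in spec.split(","):
        number = parse_saturating(part)
        if number is None or number < 1:
            raise ValueError(f"syntax error in tokens={spec}")
        if number <= 31:        # requests above 31 are silently ignored
            requested.append(number)
    requested.sort()            # letters follow ascending token order
    tokens = line.split()
    result = {}
    for offset, number in enumerate(requested):
        if number <= len(tokens):
            result[chr(ord(first_letter) + offset)] = tokens[number - 1]
    return result


line = " ".join(str(n) for n in range(1, 34))
print(assign_tokens("31,32,0xFFFFFFFFFFFF,1", line))  # {'A': '1', 'B': '31'}
```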
FOR /F "SKIP=n"
Exactly the same as "TOKENS", except I believe the max SKIP value is 2147483647. I did some testing, but it is a pain, and I'm not sure I tested properly.
Any SKIP value < 1 results in a syntax error.
I believe any SKIP value > 2147483647 results in immediate termination of the parsing of the input. But the command that produces the input is still executed, even when the positive overflow occurs.
Code:
for /f "skip=0x80000000" %A in ('dir "does not exist"') do @echo %A
The above generates the "File Not Found" error.
From what I remember of my testing, "SKIP=0x7FFFFFFF" correctly skipped the expected number of lines in a huge file, while "SKIP=0x80000000" returned immediately, without error and without taking the time to scan the huge file.
Dave Benham