Discussion about jeb's batch parsing rules on StackOverflow

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Discussion about jeb's batch parsing rules on StackOverflow

#1 Post by dbenham » 25 Jan 2018 14:43

The purpose of this thread is to have a central place to discuss the batch parsing rules on StackOverflow that jeb initiated.

Of particular interest are discussions about shortcomings or inaccuracies of the current model, along with suggestions for improvements.

I've already made a great many changes to the original posted rules. But there is still room for improvement.

There are already a number of DosTips threads that investigate various aspects of this topic. At some point I may add links to those topics. But I hope future discussion always takes place here.

Currently there are two issues that I am thinking about:

1) Should phases 3 and 4 be reversed?

The echoing of parsed commands (phase 3) occurs at two points: after the initial round of phase 2 (main parser), and then again after each round of phase 4 (FOR variable expansion for each DO iteration).

I think the logic would be much simpler to describe if the order of phases 3 and 4 were reversed. But I am reluctant to renumber the phases for fear of breaking phase references in historical posts.

What do you think jeb :?:

2) I think phase 7 (command execution) needs some refinement

I greatly expanded phase 7. But I see a potential problem, and I'm not sure how to correct it.

Sometimes a name can refer to both an internal command and an external command, for example when an ECHO.BAT file has been created.

Clearly the parser generally selects the internal command over the external command in phase 7.

Assuming ECHO.BAT exists in the current folder, then ECHO OK will print OK (execute the internal command) instead of executing the ECHO.BAT.

The CALL rules in phase 6 already account for the fact that CALL ECHO will call the batch script instead, because phase 6 identifies the batch script before phase 7 has a chance to execute the internal command.

Also supporting the existing rules, if I have TEST.BAT in the current folder, then when I execute ECHO\..\TEST, it simply prints out ..\TEST

But I am disturbed by ECHO\..\TEST.BAT - it executes the batch script instead :!: :evil:

Also, ECHO.BAT will execute the batch script instead of the internal command.

I'm struggling to find a set of simple rules that can account for the differences.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#2 Post by dbenham » 25 Jan 2018 15:58

I just realized - I think issue 2 can be resolved by a small change to 7.1:
proposed change wrote:
  • 7.1 - Execute internal command - If the command token is quoted or the command token is a path to an existing file (any extension must be included), then 7.1 is skipped. Otherwise, if an internal command can be parsed from the command token, then execute the internal command.
    • Normally the command token exactly matches the name of an internal command. But it is possible for options and/or arguments to be included in the command token. For example `echo(Hello world` is parsed as an ECHO command with arguments `Hello world`. The exact internal command parsing rules vary from command to command.
Not explicitly stated, but the path need not match an executable file. If the command token matches any existing file, then 7.1 is skipped. Later on 7.3 execution will fail with an error if it is unable to match the command token with a valid executable file.
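
A rough Python sketch of the proposed 7.1 test may make it easier to check against examples. It is only a model of the rule as worded above, not of what cmd.exe really does. The names `INTERNAL_COMMANDS`, `leading_internal_command`, `step_7_1` and `existing_paths` are made up for illustration, the command list is a small sample, and "an internal command can be parsed from the command token" is simplified to "the leading letters spell an internal command".

```python
import re

# Illustrative subset of cmd.exe internal commands (assumption, not a full list).
INTERNAL_COMMANDS = {"ECHO", "SET", "CD", "DIR", "REM"}


def leading_internal_command(token):
    """Return the internal command spelled by the token's leading letters, or None.

    Simplification: the real per-command parsing rules are more varied.
    """
    match = re.match(r"[A-Za-z]+", token)
    name = match.group(0).upper() if match else ""
    return name if name in INTERNAL_COMMANDS else None


def step_7_1(token, existing_paths):
    """Apply the proposed 7.1 test to one command token.

    existing_paths holds the upper-cased spellings that resolve to an
    existing file; the caller decides which spellings those are.
    """
    # A quoted token, or a token that is a path to an existing file, skips 7.1.
    if token.startswith('"') or token.upper() in existing_paths:
        return "skip 7.1"
    name = leading_internal_command(token)
    return "internal " + name if name else "no internal command"
```

With `existing_paths = {"ECHO.BAT", "TEST.BAT", r"ECHO\..\TEST.BAT"}`, the model returns "internal ECHO" for `ECHO` and `ECHO\..\TEST`, and "skip 7.1" for `ECHO.BAT` and `ECHO\..\TEST.BAT`, which lines up with the four observations in the first post.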

Can anyone find any exceptions to the above rule?

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Discussion about jeb's batch parsing rules on StackOverflow

#3 Post by jeb » 26 Jan 2018 01:27

dbenham wrote:
25 Jan 2018 15:58
1) Should phases 3 and 4 be reversed?

The echoing of parsed commands (phase 3) occurs at two points: after the initial round of phase 2 (main parser), and then again after each round of phase 4 (FOR variable expansion for each DO iteration).

I think the logic would be much simpler to describe if the order of phases 3 and 4 were reversed. But I am reluctant to renumber the phases for fear of breaking phase references in historical posts.
I think it's already in the correct order now.

Code: Select all

@echo off
setlocal
prompt #
echo on
FOR /L %%n in ( 1 1 2) DO (
  echo %%n
)
Output wrote:
#FOR /L %n in (1 1 2) DO (echo %n )

#(echo 1 )
1

#(echo 2 )
2
For any command, phase 3 occurs first (in this case the FOR main line is echoed), and then the FOR-loop phase starts.
In each loop iteration (phase 4) the FOR variables are expanded; processing then goes to phase 3 to recheck the ECHO state, and returns to phase 4 after all other phases are done.
I'm not sure whether phase 4 is still a "phase", as it stands a little outside the normal phase order.

Your proposed changes for phase 7 look better than my original text, and they are much more extensive. :D

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#4 Post by dbenham » 26 Jan 2018 07:29

I think you missed my point for issue 1.

With the current phase numbering

A normal command flows as follows:

0 -> 1 -> 2 -> 3 -> skip 4 -> 5 -> usually skip 6 -> 7

Phase 3 only executes if command block in previously executed phase 2 did not start with @
Phase 3 shows the results of phase 2

When a FOR command executes in 7, it kicks off the DO commands, starting with phase 4:

3
^
4 -> 5 -> usually skip 6 -> 7

Phase 4 must explicitly call phase 3 as a subroutine.
Phase 3 only executes if command block in previously executed (not skipped) phase 4 :!: :? did not start with @
Phase 3 shows the results of phase 4
Phase 3 then returns to 4 before it flows to 5.

Or, in a linear layout, it flows as
4 -> 3 -> skip 4 -> 5 -> usually skip 6 -> 7

My proposed new phase numbering

A normal command flows as follows:
0 -> 1 -> 2 -> skip 3 -> 4 -> 5 -> usually skip 6 -> 7

Phase 4 only executes if command block in previously executed phase 2 did not start with @
Phase 4 shows the results of phase 2

No matter the order, a normal command always skips FOR expansion, so the order does not matter much

When a FOR command executes in 7, it kicks off the DO commands, starting with phase 3:

3 -> 4 -> 5 -> usually skip 6 -> 7

Phase 3 simply flows naturally into phase 4
Phase 4 only executes if command block in previously executed phase 3 did not start with @ - a sensible order
Phase 4 shows the results of phase 3
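
One way to see the difference is to write the four flows down as data and check which of them run in ascending phase order. This is only a small Python sketch of the sequences listed in this post. The names `FLOWS`, `is_ascending` and `out_of_order` are made up for illustration, and the usually skipped phase 6 is left out.

```python
# Phase sequences as listed in this post; skipped phases are omitted.
FLOWS = {
    ("current", "normal command"): [0, 1, 2, 3, 5, 7],
    ("current", "DO iteration"): [4, 3, 5, 7],
    ("proposed", "normal command"): [0, 1, 2, 4, 5, 7],
    ("proposed", "DO iteration"): [3, 4, 5, 7],
}


def is_ascending(flow):
    """True if every phase number is larger than the one before it."""
    return all(a < b for a, b in zip(flow, flow[1:]))


def out_of_order(flows):
    """Return the keys of all flows that do not run in ascending order."""
    return [key for key, flow in flows.items() if not is_ascending(flow)]
```

Calling `out_of_order(FLOWS)` returns only `("current", "DO iteration")`, the flow where phase 4 has to hand off to phase 3 before continuing.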

----------------

Does my proposal make sense now? If starting from scratch, I would definitely use the modified numbering. But for historical reasons, I am reluctant.

Back to Issue 2

Well, my proposed rules were too simple.

I've come up with the following revised rules that seem to work for me on Win 7. I'll test soon on Win 10.

7.1 - Execute internal command - If the command token is quoted, then skip this step. Otherwise, attempt to parse out an internal command and execute.
  • The following tests are run to determine if an unquoted command token represents an internal command:
    • If the command token exactly matches an internal command, then execute it.
    • Else break the command token at the first occurrence of + ( / [ or ]
      If the preceding text is an internal command, then execute it
    • Else break the original command token at the first occurrence of . \ or :
      If the preceding text is not an internal command, then goto 7.2
      Else the preceding text may be an internal command. Remember this command.
    • Break the original command token at the first occurrence of + ( / [ or ]
      If the preceding text is a path to an existing file, then goto 7.2
      Else execute the remembered command.
  • Just because a command token is parsed as an internal command does not mean that it will execute successfully. Each internal command has its own rules as to what syntax is allowed.
  • ...
7.2 - Execute volume change - Else if the command token does not begin with a quote, is exactly two characters long, and the 2nd character is a colon, then change the volume

7.3 - Execute external command - Else try to treat the command as an external command

7.4 - Ignore a label - Ignore the command and all its arguments if the command token begins with :
Rules in 7.2 and 7.3 may prevent a label from reaching this point.


I think the above rules for 7.1 are good, but they violate a rule that jeb posted in the thread "ECHO. FAILS to give text or blank line - Instead use ECHO/":
jeb wrote: These one fails, if files exists like echo*, the * is one of ".[]+'`~"

Code: Select all

echo.
echo[
echo]
echo+
I agree that ECHO. fails if a file named ECHO exists. Note that the trailing . is removed by the OS
The command fails with an error stating: 'echo.' is not recognized as an internal or external command, operable program or batch file.
This result is consistent with my proposed 7.1 rules - it is not recognized as an internal command, and eventually fails to execute as an external command.

But I cannot reproduce this behavior with ECHO[ ECHO] or ECHO+ on Windows 7. Update - I have confirmed Win 10 behaves the same as Win 7
If I create a file named ECHO[ and then execute the command ECHO[ then it successfully executes the internal ECHO command and prints a blank line.
The same is true with ] and +
If I could reproduce jeb's result, then it would invalidate my rules.

Did jeb get this wrong :?:
Or does the behavior described by jeb only apply to Win XP :?:
Or ... :?: :?


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#5 Post by dbenham » 27 Jan 2018 14:27

Argh :evil:

I just realized that there are critical differences in batch mode vs. command line (Today I'm testing on Win 10).
I'm still trying to figure things out, but my proposed rules are definitely not quite right.

I still haven't been able to reproduce failure of ECHO[ if the file ECHO[ exists. Not in batch mode or command line mode. It always echoes a blank line.

But if ECHO[.BAT exists, then ECHO[ in batch mode executes the batch file.
In command line ECHO[ still echoes a blank line.

I'm beginning to question whether I will ever figure this out. I'm thinking that I won't succeed unless I exactly nail down how phase 2 parses tokens. Currently that is a bit fuzzy.


Dave Benham

misol101
Posts: 475
Joined: 02 May 2016 18:20

Re: Discussion about jeb's batch parsing rules on StackOverflow

#6 Post by misol101 » 28 Jan 2018 07:19

Don't you think Microsoft would help out with this if you asked them nicely? (unless, of course, reverse engineering is the point)

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Discussion about jeb's batch parsing rules on StackOverflow

#7 Post by penpen » 29 Jan 2018 14:04

dbenham wrote:
26 Jan 2018 07:29
Did jeb get this wrong :?:
Or does the behavior described by jeb only apply to Win XP :?:
Or ... :?: :?
I tested the above with (a virtual machine) WinXP, SP3, x86:
There the behaviour is as jeb described.

penpen

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#8 Post by dbenham » 29 Jan 2018 22:23

@penpen - Thanks for verifying that ECHO[ ECHO] and ECHO+ do not work properly on XP if a file with that name exists in the current directory.

Now could you (or anyone?) confirm that ECHO[ ECHO] and ECHO+ do work properly on Win 7, 8, and/or 10 if a file with that name exists in the current directory.

It looks like we have a difference in the parsing rules for XP vs later versions. :(


Dave Benham

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Discussion about jeb's batch parsing rules on StackOverflow

#9 Post by jeb » 30 Jan 2018 02:40

penpen wrote:
29 Jan 2018 14:04
dbenham wrote: ↑
Fri Jan 26, 2018 2:29 pm
Did jeb get this wrong :?:
Or does the behavior described by jeb only apply to Win XP :?:
Or ... :?: :?

I tested the above with (a virtual machine) WinXP, SP3, x86:
There the behaviour is as jeb described.
I get different results than penpen.

I retested it with winXP32 and Win7 and get the same results for both.

Code: Select all

@echo off

call :test "["
call :test "]"
call :test "+"
call :test "."
exit /b

:test
call :testExt "%~1" ""
call :testExt "%~1" ".bat"
exit /b

:testExt
call :__testExt "%~1" "%~2"
if "%OK%" == "0" echo Last test FAILED, for "ECHO%~1" with FILE "ECHO%~1%~2"
exit /b

:__testExt
set ok=0
set "char=%~1"
set "EXT=%~2"
del echo* 2> nul
del echo*.bat 2> nul

echo(
echo Testing "echo%CHAR%" with existing file "echo%CHAR%%EXT%"

echo ECHO THIS IS %%0 > "echo%CHAR%%EXT%"

(
    echo%CHAR% #1 in block
)

for %%A in (1) DO (
    echo%CHAR% #2 in for block
)

echo%CHAR% #3 plain
set ok=1
exit /B
Output wrote:
Testing "echo[" with existing file "echo["
#1 in block
#2 in for block
#3 plain

Testing "echo[" with existing file "echo[.bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO[" with FILE "ECHO[.bat"

Testing "echo]" with existing file "echo]"
#1 in block
#2 in for block
#3 plain

Testing "echo]" with existing file "echo].bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO]" with FILE "ECHO].bat"

Testing "echo+" with existing file "echo+"
#1 in block
#2 in for block
#3 plain

Testing "echo+" with existing file "echo+.bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO+" with FILE "ECHO+.bat"

Testing "echo." with existing file "echo."
'echo.' is not recognized as an internal or external command,
operable program or batch file.
'echo.' is not recognized as an internal or external command,
operable program or batch file.
'echo.' is not recognized as an internal or external command,
operable program or batch file.

Testing "echo." with existing file "echo..bat"
#1 in block
#2 in for block
The system cannot find the batch label specified - __testExt
Last test FAILED, for "ECHO." with FILE "ECHO..bat"
My old statement was not quite correct. It should be:
ECHO. fails when a file "ECHO" (without extension) exists in the same directory (but when an "ECHO.BAT" file also exists, that file is executed instead).

ECHO<?> searches for and executes a file named "ECHO<?>.bat", where <?> is one of the characters . [ ] +
The file search only occurs when ECHO<?> is not inside a command block and is not the command of a FOR or IF.
This does not apply in command line context (echo. with the dot still fails there).

jeb

PS: Some more investigations are required :!:
echo. will fail when both "echo." and "echo.bat" exist, but when "echo..bat" exists, that file will be executed
The & && || operators modify the behaviour; they currently seem to disable the file search for "echo<?>.bat".

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#10 Post by dbenham » 30 Jan 2018 05:35

Thanks jeb. That is a relief that XP is not different than later versions. :D

But command concatenation, command blocks, FOR, and IF alter the behavior :shock: :!:

I was about to post a set of phase 7 rules that I thought for sure accounted for all the behavior. But your new discovery blows me away :evil: I never thought to test for that.

I wonder if command blocks, concatenation, FOR and IF simply use command line search rules.

One critical thing I have discovered about phase 2 - A left paren ( functions as a token delimiter when parsing the command token :!:


Dave Benham

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Discussion about jeb's batch parsing rules on StackOverflow

#11 Post by jeb » 30 Jan 2018 05:58

dbenham wrote:
30 Jan 2018 05:35
One critical thing I have discovered about phase 2 - A left paren ( functions as a token delimiter when parsing the command token :!:
Yes, I know, and I think I wrote something about that, as I assume that "echo(" gets its special abilities from exactly there.

The next test works with "ECHO[" and the others, but with "ECHO(" it works only for percent expansion. Therefore the splitting of "ECHO" and "(" has to happen in phase 2.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
set "myEcho=echo("

%myEcho% #1
!myEcho! #2
for /F "delims=" %%A in ("%myEcho%") do (
  %%A #3
)
And another test, which demonstrates that "(" splits the command token in the first test case.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
set "(var= PAREN"

echo!(var!
echo!^(var!
Output wrote:
var
PAREN
dbenham wrote:
30 Jan 2018 05:35
I wonder if command blocks, concatenation, FOR and IF simply use command line search rules
My question is, why the hell is there any difference at all :?:
I can't believe that this is intentional, but what type of code would produce such a behaviour?

jeb

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#12 Post by dbenham » 30 Jan 2018 07:51

OK - Here are my proposed rules for how it is determined if a command is an internal command. They account for all the test results I have seen, but I haven't tested all possible permutations.

There are 4 aspects of phase 2 that are critical to understanding my phase 7 rules:
  • ( functions as a token delimiter when parsing the command token
  • Token delimiters preceding the command token are stripped
  • Escaped token delimiters can be included in the command token
  • All token delimiters after the command token are preserved in the argument list for a command when it is passed to phase 7
7.1 - Execute internal command - If the command token is quoted, then skip this step. Otherwise, attempt to parse out an internal command and execute
  • The following tests are made to determine if an unquoted command token represents an internal command
    • If the command token exactly matches an internal command, then execute it.
    • Else break the command token before the first occurrence of + / [ ] or standard token delimiter
      If the preceding text is an internal command, then remember that command
      • If in command line mode, or if the command is from a parenthesized block, IF command block, FOR command block, or involved with command concatenation, then execute the internal command
      • Else (must be a stand-alone command in batch mode) scan the current folder and the PATH for a .COM, .EXE, .BAT, or .CMD file whose base name matches the original command token
        • If the first matching file is a .BAT or .CMD, then goto 7.3.exec and execute that script
        • Else (match not found or first match is .EXE or .COM) execute the remembered internal command
    • Else break the command token before the first occurrence of . \ or :
      If the preceding text is not an internal command, then goto 7.2
      Else the preceding text may be an internal command. Remember this command.
    • Break the command token before the first occurrence of + / [ ] or standard token delimiter
      If the preceding text is a path to an existing file, then goto 7.2
      Else execute the remembered command
  • If an internal command is parsed from a larger command token, then the unused portion of the command token is included in the argument list
  • Note that ( does not have any special meaning in phase 7 - it is not a standard token delimiter
  • Just because a command token is parsed as an internal command does not mean that it will execute successfully. Each internal command has its own rules as to what syntax is allowed
7.2 - Execute volume change - Else if the command token does not begin with a quote, is exactly two characters long, and the 2nd character is a colon, then change the volume
  • Details skipped for now
7.3 - Execute external command - Else try to treat the command as an external command
  • Details about error detection and label detection skipped for now
  • 7.3.exec - Execute the external command.
7.4 - Ignore a label - Ignore the command and all its arguments if the command token begins with :
Rules in 7.2 and 7.3 may prevent a label from reaching this point
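
The 7.1 tests above can be sketched as a small Python function, which makes them easier to run against test cases. This is only a model of the rules as worded here, not of cmd.exe itself. Every name in it is hypothetical: `classify`, `text_before`, and the two callables `path_exists` (does this text name an existing file?) and `first_match_ext` (extension of the first .COM, .EXE, .BAT or .CMD file whose base name matches the token, or None). The command list is a small sample, the "standard token delimiters" are assumed to be space, tab, semicolon, comma and equals, and "quoted" is assumed to mean "begins with a quote".

```python
# Illustrative subset of internal commands (assumption, not a full list).
INTERNAL = {"ECHO", "SET", "CD", "DIR", "REM"}
# + / [ ] plus the assumed standard token delimiters.
BREAK_A = set("+/[]") | set(" \t;,=")
# . \ :
BREAK_B = set(".\\:")


def text_before(token, chars):
    """Return the part of token before the first character found in chars."""
    for i, ch in enumerate(token):
        if ch in chars:
            return token[:i]
    return token  # no break character: the whole token


def classify(token, *, standalone_batch, path_exists, first_match_ext):
    """Model the 7.1 decision for one command token."""
    if token.startswith('"'):
        return "goto 7.2"  # quoted token: 7.1 is skipped
    if token.upper() in INTERNAL:
        return "internal " + token.upper()
    head = text_before(token, BREAK_A).upper()
    if head in INTERNAL:
        # Command line, blocks, IF, FOR and concatenation skip the file scan.
        if not standalone_batch:
            return "internal " + head
        if first_match_ext(token) in (".BAT", ".CMD"):
            return "goto 7.3.exec"
        return "internal " + head
    head = text_before(token, BREAK_B).upper()
    if head not in INTERNAL:
        return "goto 7.2"
    if path_exists(text_before(token, BREAK_A)):
        return "goto 7.2"
    return "internal " + head
```

For example, `classify("echo[", standalone_batch=True, path_exists=lambda p: False, first_match_ext=lambda t: ".BAT")` returns "goto 7.3.exec", while the same call with `standalone_batch=False` returns "internal ECHO". Passing the file checks in as callables keeps the sketch independent of the real file system, so each scenario from the tests in this thread can be described by hand.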


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#13 Post by dbenham » 30 Jan 2018 08:25

I just reread this post, and realized that ( does not function as a command token delimiter for the first command after an unexecuted label within a parenthesized block

Code: Select all

(
  :UnexecutedLabel
  echo(1 FAILS & echo(2 OK
  
  echo(3 OK
  
  :UnexecutedLabel
  :ExecutedLabel
  echo(4 OK
)
Also, a parenthesized block cannot follow immediately after an unexecuted label within a parenthesized block.

Code: Select all

(
  :UnexecutedLabel
  (echo 1 FAILS) & (echo 2 OK)

  (echo 3 OK)

  :UnexecutedLabel
  :ExecutedLabel
  (echo 4 OK)
)
It seems likely that the parenthesized block parser implementation is what causes ( to function as a command token delimiter, which I suspect is an unintended side effect.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#14 Post by dbenham » 30 Jan 2018 09:07

jeb wrote:
30 Jan 2018 05:58
dbenham wrote:
30 Jan 2018 05:35
I wonder if command blocks, concatenation, FOR and IF simply use command line search rules
My question is, why the hell is there any difference at all :?:
I can't believe that this is intentional, but what type of code would produce such a behavior?
Well, the extra test deals with discovery of a batch file (not any other type of external file), and this is often needed when using CALL. There is no obvious reason to ever CALL a batch file from the command line. (Yes there are some hacky issues that could make a command line CALL useful, but I doubt such uses were planned for)

So I'm guessing that the odd extra batch test is related to the CALL mechanism, and there are some unintended side effects that control when the test is performed, and when it is not.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Discussion about jeb's batch parsing rules on StackOverflow

#15 Post by dbenham » 30 Jan 2018 15:09

I've incorporated the revised phase 7 rules into the SO post, along with a number of additional changes.

The last major addition that I want to make is a new answer to the SO question that collects all the rules about labels in one place. Some of the information will be redundant with info in the main answer. But the rules about how GOTO and CALL parse labels will be new. Once I finish this, I will add a reference to the label answer within the main answer.

Another possible refinement is to flesh out the rules for how external commands are identified (involving current directory, PATH, PATHEXT, file associations, ...). I'm not yet committed to doing this, but I think it would be really useful.

The last major project I can think of would be to investigate and document the phase 7 option and argument parsing rules for each internal command. But I seriously doubt I will ever undertake this effort.


Dave Benham
