For example, when you build a command line from multiple parts, or pass arguments to a subroutine.
Each time, things get tricky if the command or arguments contain special characters like ^ & | ! etc.
In that case, it's necessary to escape the special characters using additional ^ characters.
And the escaping rules are tricky because the number of ^ characters to add depends on the presence of ! and " characters!
Things can become really horrible when there are several levels of parsing to go through, some with delayed !expansion, others without.
The worst case being macros: When writing a complex macro, you have to prepare things for at least two, often three parsings!
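To make the dependence on quotes and delayed expansion concrete, here is a minimal sketch (my own illustration, not part of the library) that can be saved as a standalone .bat file. It echoes the same kind of text unquoted, quoted, and with delayed expansion on:

```batch
@echo off
setlocal DisableDelayedExpansion
rem Unquoted: the tokenizer consumes one caret and the ampersand becomes literal
echo 1: R^&D
rem Quoted: carets are not escape characters here, so the caret is kept as-is
echo 2: "R^&D"
setlocal EnableDelayedExpansion
rem Delayed expansion on, exclamation mark present: a second caret removal
rem pass runs, so each special character needs an extra level of carets
echo 3: R^^^&D ^^!
endlocal
endlocal
```

The three lines printed should be 1: R&D, then 2: "R^&D", then 3: R&D !. The third case needs three carets ahead of the & where the first case needed only one, which is exactly the kind of bookkeeping that gets out of hand with several parsings.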
Anyway, after wasting too much time finding the correct escaping for a complex case, I decided to automate that:
What I needed was a routine that takes a string and escapes it so that it survives intact through a given number of parsings.
As the rules are context-dependent, this basically required writing a batch parser that duplicates the state machine used by the real cmd.exe parser and, depending on the current state anywhere in the string, generates the necessary number of ^ ahead of tricky characters.
Fortunately, I didn't need a full-fledged parser. A subset modeling the tokenizer seems to be enough.
(Stage 2 in the reference on the subject: https://stackoverflow.com/a/4095133)
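As a rough illustration of what "modeling the tokenizer state" means, the sketch below walks a string one character at a time and records whether each character is seen with the quote state on or off. It is a deliberately reduced model: it only tracks quotes, ignores ^ escapes, and the names STRING, MASK and Q are made up for this example.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Sample input without shell operators, to keep the sketch simple
set "STRING=a "b c" d"
set "MASK="
set "Q=0"
set "N=-1"
:loop
set /a "N+=1"
set "C=!STRING:~%N%,1!"
if not defined C goto :done
rem A double quote toggles the state, like in the real tokenizer
if "!C!!C!"=="""" set /a "Q=1-Q"
set "MASK=!MASK!!Q!"
goto :loop
:done
echo !STRING!
echo !MASK!
```

This should print the string a "b c" d followed by the mask 001111000, one digit per character (the opening quote is already counted as inside, the closing quote as outside). The real routine uses the same kind of loop, but emits ^ characters depending on the state instead of a mask.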
I decided to experiment in batch.
The first attempts were quite complex.
But after a while, I ended up with a relatively simple algorithm that seems to give good results:
:#----------------------------------------------------------------------------#
:# #
:# Function EscapeCmdString #
:# #
:# Description Prepare a command for passing through multiple parsings #
:# #
:# Arguments %1 Name of the variable containing the command string #
:# %2 Output variable name. Default: Same as input variable #
:# %3 Number of parsings to go through. Default: 1 #
:# %4 # of the above with !expansion. Default: 0|%3 if exp #
:# #
:# Notes The cmd parser tokenizer removes levels of ^ escaping. #
:# This routine escapes a command line, or an argument, so #
:# that special characters like ^ & | > < ( ) make it #
:# intact through one or more tokenizations. #
:# #
:# Known limitation: The LF character is not managed. #
:# #
:# History #
:# 2019-10-03 JFL Initial implementation #
:# #
:#----------------------------------------------------------------------------#
:EscapeCmdString %1=CMDVAR [%2=OUTVAR] [%3=# parsings] [%4=# with !expansion]
for /f "tokens=2" %%e in ("!! 0 1") do setlocal EnableDelayedExpansion & set "CallerExp=%%e"
set "H0=^^" &:# Return a Hat ^ with QUOTE_MODE 0=off
set "H1=^" &:# Return a Hat ^ with QUOTE_MODE 1=on
if %CallerExp%==1 set "H0=!H0!!H0!" & set "H1=!H1!!H1!" &:# !escape our return value
set "NPESC=1" &:# Default number of %expansion escaping to do
if not "%~3"=="" set "NPESC=%~3" &:# specified # of extra %expansion escaping to do
set /a "NXESC=%CallerExp%*NPESC" &:# Default number of !expansion escaping to do
if not "%~4"=="" set "NXESC=%~4" &:# specified # of extra !expansion escaping to do
for /l %%i in (1,1,%NXESC%) do set "H0=!H0!!H0!" & set "H1=!H1!!H1!"
for /l %%i in (1,1,%NPESC%) do set "H0=!H0!!H0!"
:# Define characters that need escaping outside of quotes
for %%c in ("<" ">" "|" "&" "(" ")") do set ^"EscapeCmdString.NE[%%c]=1^"
set ^"STRING=!%1!^"
set "OUTVAR=%2"
if not defined OUTVAR set "OUTVAR=%1"
set "RESULT="
set "QUOTE_MODE=0" &:# 1=Inside a quoted string
set "ESCAPE=0" &:# 1=The previous character was a ^ character
set "N=-1"
:EscapeCmdString.loop
set /a "N+=1"
set "C=!STRING:~%N%,1!" &:# Get the Nth character in the string
if not defined C goto :EscapeCmdString.end
if "!C!!C!"=="""" (
  if !ESCAPE!==0 (
    set /a "QUOTE_MODE=1-QUOTE_MODE"
  ) else ( :# Open " quotes can be escaped, but not close " quotes
    if "!QUOTE_MODE!"=="0" set "RESULT=!RESULT!!H0:~1!"
  )
) else if "!C!"=="^" (
  if "!QUOTE_MODE!"=="0" set /a "ESCAPE=1-ESCAPE"
  set "RESULT=!RESULT!!H%QUOTE_MODE%:~1!"
) else if "!C!"=="^!" (
  set "RESULT=!RESULT!!H%QUOTE_MODE%:~1!"
) else if defined EscapeCmdString.NE["!C!"] ( :# Characters that need escaping outside of quotes
  if "!QUOTE_MODE!"=="0" set "RESULT=!RESULT!!H0:~1!"
)
if not "!C!"=="^" set "ESCAPE=0"
set "RESULT=!RESULT!!C!"
goto :EscapeCmdString.loop
:EscapeCmdString.end
endlocal & set ^"%OUTVAR%=%RESULT%^" ! = &:# The ! forces always having !escaping ^ removal in delayed expansion mode
exit /b
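The first line of the routine relies on a compact trick to record whether the caller had delayed expansion enabled: with expansion on, the !! disappears and the second token of the string is 1; with expansion off, the !! stays and the second token is 0. A minimal standalone sketch of that trick alone (the :ShowExp label is a name made up for this illustration):

```batch
@echo off
setlocal DisableDelayedExpansion
call :ShowExp
setlocal EnableDelayedExpansion
call :ShowExp
endlocal
endlocal
exit /b

:ShowExp
rem The second token is 0 if the two marks survive, and 1 if they are removed
for /f "tokens=2" %%e in ("!! 0 1") do echo Caller delayed expansion flag: %%e
exit /b
```

This should report a flag of 0 for the first call and 1 for the second.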
Ex:
C:\JFL\Temp>Library.bat -te "echo R^&D !"
_INITIAL=echo R^&D !
# EnableDelayedExpansion
_ESCAPED=echo R^^^^^^^&D ^^^!
REPARSED=echo R^&D !
C:\JFL\Temp>Library.bat -te "echo R^&D !" off
_INITIAL=echo R^&D !
# DisableDelayedExpansion
_ESCAPED=echo R^^^&D ^!
REPARSED=echo R^&D !
C:\JFL\Temp>Library.bat -te "echo R^&D !" on 2
_INITIAL=echo R^&D !
# EnableDelayedExpansion
_ESCAPED=echo R^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^&D ^^^^^^^^^^^^^^^!
REPARSED=echo R^^^^^^^&D ^^^!
REPARSED=echo R^&D !
C:\JFL\Temp>
The result of :EscapeCmdString is stored in variable _ESCAPED.
Finally, I run (set REPARSED=%_ESCAPED%) to force one level of parsing, and display the value.
(The _ ahead of the first two variables names is there only to make sure all values are aligned, which makes it easier to check if the first and last strings match.)
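The reparsing step can be tried in isolation. The sketch below (standalone, with made-up variable names _P0 to _P2, and assuming no other variable starting with _P exists in the environment) stores a string escaped for two parsings, then pushes it through two unquoted assignments:

```batch
@echo off
setlocal DisableDelayedExpansion
rem Three carets ahead of the ampersand: enough to survive two parsings
set "_P0=R^^^&D"
rem Each unquoted assignment is parsed once, removing one level of carets
set _P1=%_P0%
set _P2=%_P1%
rem List the three variables to compare them
set _P
```

It should list _P0=R^^^&D, then _P1=R^&D, then _P2=R&D: each parsing halves the carets, which is why the escaped strings above roughly double in length for every extra parsing.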
The first two examples above show that the escaped string differs depending on whether delayed expansion is on or off in the test routine.
The third example shows how a string that must survive two parsings with delayed expansion on grows ridiculously long.
To allow testing trickier cases without losing the tricky characters in the library invocation or its argument processing loop, the test routine accepts HTML-style entities in place of those characters.
But I couldn't use the HTML syntax &name; as the & itself is tricky for batch. So, instead, I use [name] with brackets.
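For readers who want to experiment with the idea, here is one possible way to decode such bracketed entities. This is only a sketch under my own assumptions, not necessarily how Library.bat does it: it handles just [excl] and [quot], uses a made-up variable name _S, and relies on delayed expansion being off so that ! is an ordinary character.

```batch
@echo off
setlocal DisableDelayedExpansion
set "_S=0[excl]1[quot]2"
rem Delayed expansion is off, so the exclamation mark is an ordinary character
set "_S=%_S:[excl]=!%"
set "_S=%_S:[quot]="%"
rem Display the decoded value
set _S
```

This should display _S=0!1"2 (assuming no other variable name starts with _S).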
Ex:
C:\JFL\Temp>Library.bat -te "0[excl]1[quot]2[excl]3[quot]" on 2
_INITIAL=0!1"2!3"
# EnableDelayedExpansion
_ESCAPED=0^^^^^^^^^^^^^^^!1"2^^^!3"
REPARSED=0^^^!1"2^!3"
REPARSED=0!1"2!3"
C:\JFL\Temp>
Use (library.bat -?) to display a help screen with the list of HTML entities supported.
I considered using this :EscapeCmdString to pass return values through endlocal barriers.
This would work, but the performance would be poor. I don't recommend it unless you're desperate.
The next step is improving it for macro support:
Routine :EscapeCmdString does not yet support LF in strings. (The [lf] entity itself works in the test routine, but strings containing LF characters currently break :EscapeCmdString.)
I plan to add that eventually. And when it's done, we'll be able to do things like calling a macro from another macro:
Passing your favorite $macro to :EscapeCmdString will automatically generate a $$macro, hopefully usable from within other macros.
As this would be done only once during the program initialization, I hope the performance would be less of an issue.
Any feedback welcome!