Batch macros to convert between ASCII code and character

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#1 Post by dbenham » 09 Feb 2013 11:40

A long time ago, I created a charLib batch script that can interconvert between numeric ASCII codes and character values (new functions: :chr, :asc, :asciiMap). The library worked really well, but the script contains many characters that do not post well on forums. Also, CALLed subroutines can slow things down when used in a tight loop.

I've converted the :asc and :chr routines into @asc and @chr macros that are extremely fast (at least by batch standards). See Macros with parameters appended and http://www.dostips.com/forum/viewtopic.php?f=3&t=1827 for more information about batch macros.

Techniques that rely on external executables are available, but none of those executables are universally native to Windows, and they are slower than the macros. For example, see http://www.dostips.com/forum/viewtopic.php?f=3&t=3857. Hybrid batch/PowerShell or batch/VBScript could be used, but that is again slower than the macros. Of course, a fast solution is available with pure PowerShell or pure VBScript, but the batch macros are great for anyone who wants to stay within batch.

The macro definition script is designed to be used as a TSR program (Terminate and Stay Resident). It defines the macros and necessary character maps as persistent variables. Any script can use the macros, once they are loaded.

There is a slightly faster alternate algorithm that requires definition of one persistent variable for every character. I opted for this slightly slower algorithm that minimizes the number of required variables.

Instead of embedding problematic characters within the script, I use temporary VBScript to dynamically generate the character maps used by the macros. The script now posts perfectly on forums like this. The VBScript is only used during the definition of the macros. Once defined, all further processing is pure native batch.

The macros work on all modern Windows versions, including XP.

Here is the charMacros.bat script that loads the macros:

Code:

@echo off
:: charMacros.bat
::
::   This script installs macros that can be used to interconvert between
::   numeric extended ASCII codes and character values.
::
::   The script defines the following variables:
::
::     @asc - A macro used to convert a character into the numeric ASCII code
::
::     @chr - A macro used to convert a numeric ASCII code into a character
::
::     #LF - A variable containing a line feed character
::
::     #CR - A variable containing a carriage return character
::
::     #charMap - A variable used by the @asc macro
::
::     #asciiMap - A variable used by the @chr macro
::
::     \n - used for specifying the end of line in a macro definition
::

if "!" == "" >&2 echo ERROR: Delayed expansion must be disabled when loading %~nx0&exit /b 1

:: Define a Carriage Return string, only useable as !#CR!
for /f %%a in ('copy /Z "%~dpf0" nul') do set "#CR=%%a"

:: Define a Line Feed (newline) string (normally only used as !#LF!)
set #LF=^


:: Above 2 blank lines are required - do not remove

:: Define a newline with line continuation
set ^"\n=^^^%#LF%%#LF%^%#LF%%#LF%^^"

:: Define character maps used to interconvert between extended ASCII codes
:: and characters.
>"%temp%\charMacros.vbs" (
  echo for i=1 to 255
  echo   if i=10 then WScript.Stdout.Write " " else WScript.Stdout.Write chr(i^)
  echo next
  echo WScript.Stdout.Write chr(10^)+"#"
  echo for i=1 to 255
  echo   if i^<^>10 then WScript.Stdout.Write chr(i^)+chr(i^)+right("0"+hex(i^),2^)+"#"
  echo next
)
set "#charMap="
for /f "delims=" %%A in ('cscript /nologo "%temp%\charMacros.vbs"') do (
  if defined #charMap (set "#asciiMap=%%A") else set "#charMap= %%A"
)
del "%temp%\charMacros.vbs"


:: %@asc%  StrVar  Position  [RtnVar]
::
::   Converts a character into the extended ASCII code value.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   The macro is safe to "call" regardless of whether delayed expansion is
::   enabled or not.
::
::     StrVar = The name of a variable that contains the character
::              to be converted
::
::     Position = The position of the character within the string
::                to be converted. 0 based.
::
::     RtnVar = The name of the variable used to store the result.
::
set @asc=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1-3 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal enableDelayedExpansion%\n%
  if defined %%~a (%\n%
    set "str=!%%~a!"%\n%
    set /a "n=%%~b" 2^>nul%\n%
    for %%N in (!n!) do set "chr=!str:~%%N,1!"%\n%
    if defined chr (%\n%
      set "rtn="%\n%
      if "!chr!"=="=" set rtn=61%\n%
      if "!chr!"=="^!" set rtn=33%\n%
      if "!chr!"=="!#lf!" set rtn=10%\n%
      if not defined rtn for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
        set "test=!#asciiMap:*#%%c=!"%\n%
        if not "%%c"=="!test:~0,1!" set "test=!test:*#%%c=!"%\n%
        set /a "rtn=0x!test:~1,2!"%\n%
      )%\n%
    )%\n%
    for %%v in (!rtn!) do endlocal^&if "%%~c" neq "" (set "%%~c=%%v") else echo(%%v%\n%
  ) else endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


:: %@chr%  AsciiCode  [RtnVar]
::
::   Converts an extended ASCII code into the corresponding character.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   The macro is safe to "call" regardless of whether delayed expansion is
::   enabled or not.
::
::     AsciiCode - Any value from 1 to 255. The value can be expressed as any
::                 numeric expression supported by SET /A.
::
::     RtnVar - The name of the variable used to store the result
::
set @chr=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1,2 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal%\n%
  set "NotDelayed=!"%\n%
  setlocal EnableDelayedExpansion%\n%
  set "n=0"%\n%
  set /a "n=%%~a"%\n%
  if !n! gtr 255 set "n=0"%\n%
  if !n! gtr 0 (%\n%
    if !n! equ 10 (%\n%
      for %%C in ("!#LF!") do (%\n%
        endlocal^&endlocal%\n%
        if "%%~b" neq "" (set "%%~b=%%~C") else echo(%%~C%\n%
      )%\n%
    ) else (%\n%
      for %%N in (!n!) do set "c=!#charMap:~%%N,1!"%\n%
      if "!c!" equ "^!" if not defined NotDelayed set "c=^^^!"%\n%
      for /f delims^^=^^ eol^^= %%C in ("!c!!#CR!") do (%\n%
        endlocal^&endlocal%\n%
        if "%%~b" neq "" (set "%%~b=%%C") else echo(%%C%\n%
      )%\n%
    )%\n%
  ) else endlocal^&endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


exit /b 0

And here is a small test script that tests the functionality, and shows how easy it is to use the macros. It loops through codes 1 - 255 and converts the code into a character, and then back to a code again. It prints out each decimal code and resultant character, and prints an error if the starting code does not match the ending code. The script runs the test first with delayed expansion disabled, then again with delayed expansion enabled. Both work perfectly without errors :D

Code:

@echo off

:: Load the macros. This only needs to be done once per CMD.EXE session
if not defined @chr call charMacros

echo Test "calls" with delayed expansion DISABLED:
setlocal disableDelayedExpansion
for /l %%N in (1 1 255) do (
  %@chr% %%N c
  %@asc% c 0 n
  setlocal enableDelayedExpansion
  echo    %%N:[!c!]
  if !n! neq %%N echo Error at %%N
  endlocal
)
echo(

echo Test "calls" with delayed expansion ENABLED
setlocal enableDelayedExpansion
for /l %%N in (1 1 255) do (
  %@chr% %%N c
  %@asc% c 0 n
  echo    %%N:[!c!]
  if !n! neq %%N echo Error at %%N
)
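
As a further illustration of the calling convention documented in the script comments, here is a minimal sketch that is not part of the original test script. It assumes charMacros.bat can be found via PATH or the current directory, and the variable names upperA and upperB are hypothetical. Because the macro splits its arguments on spaces and commas, the SET /A expression is written without either.

```batch
@echo off
:: Load the macros once per CMD.EXE session
:: (assumes charMacros.bat is on PATH or in the current directory)
if not defined @chr call charMacros

:: AsciiCode may be any SET /A expression, here a hex literal and a sum.
:: upperA and upperB are hypothetical variable names.
%@chr% 0x41 upperA
%@chr% 64+2 upperB
echo %upperA%%upperB%

:: With RtnVar omitted, the character is ECHOed instead of stored
%@chr% 67
```

65, 66 and 67 are the ASCII codes for A, B and C, so the first ECHO should print AB and the last line should print C if the macros behave as documented.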


Dave Benham
Last edited by dbenham on 22 Aug 2013 10:18, edited 1 time in total.

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

#2 Post by Liviu » 11 Feb 2013 20:16

A few comments below, for clarification rather than critique.
dbenham wrote:The macro definition script is designed to be used as a TSR program (Terminate and Stay Resident). It defines the macros and necessary character maps as persistent variables. Any script can use the macros, once they are loaded.
It should be noted that the character maps are codepage dependent. If the script using the macros does a "chcp", it then needs to rebuild the macros in order for the maps to be updated.

dbenham wrote:

Code:

for /f "delims=" %%A in ('cscript "%temp%\charMacros.vbs"') do ( 
Under my xp.sp3 at least, that needs to be "cscript /nologo" for the output to be right.

dbenham wrote:And here is a small test script that tests the functionality, and shows how easy it is to use the macros. It loops through codes 1 - 255 and converts the code into a character, and then back to a code again.
Conversions code->char->code should always succeed, as your test script verifies. It should be noted, however, that conversions char->code may (a) fail for characters outside the current codepage, and (b) return different codes depending on the active codepage. For example...

Code:

C:\tmp>type charMacrosTest2.bat
@echo off
setlocal
call charMacros

setlocal enableDelayedExpansion
set "str=%~1"
set /a len = 0
:loop
if "!str:~%len%,1!" equ "" goto :eof
%@asc% str %len% chr
echo [!str:~%len%,1!] [!chr!]
set /a len += 1
goto :loop

C:\tmp>chcp 437
Active code page: 437

C:\tmp>charMacrosTest2 "αß©∂€"
[α] [224]
[ß] [225]
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[©] [©]
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[∂] [∂]
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[€] [€]

C:\tmp>chcp 850
Active code page: 850

C:\tmp>charMacrosTest2 "αß©∂€"
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[α] [α]
[ß] [225]
[©] [184]
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[∂] [∂]
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[€] [€]

C:\tmp>chcp 1252
Active code page: 1252

C:\tmp>charMacrosTest2 "αß©∂€"
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[α] [α]
[ß] [223]
[©] [169]
Invalid number.  Numeric constants are either decimal (17),
hexadecimal (0x11), or octal (021).
[∂] [∂]
[€] [128]

C:\tmp>

Liviu

#3 Post by dbenham » 13 Feb 2013 00:06

Liviu wrote:Under my xp.sp3 at least, that needs to be "cscript /nologo" for the output to be right.
Right you are, thanks. I don't know why my code was working on my Vista machine.

Liviu wrote:It should be noted that the character maps are codepage dependent. If the script using the macros does a "chcp", it then needs to rebuild the macros in order for the maps to be updated.
...
Conversions code->char->code should always succeed, as your test script verifies. It should be noted, however, that conversions char->code may (a) fail for characters outside the current codepage, and (b) return different codes depending on the active codepage.
Thanks for the education :D

I guess I've known that batch half supports unicode, and half doesn't. But I've never really understood exactly what is going on. Your explanation helps in my understanding.

So environment variables are stored as UTF-16, correct? But batch scripts must be written in ANSI (extended ASCII). The characters represented by codes 0 - 127 are fixed to be standard ASCII. But the meaning of extended codes 128 - 255 is dependent on the active code page. CMD silently converts between ANSI and unicode.

I've also been sort of aware that FOR supports unicode file names, but FOR /F does not. But I'm not exactly clear on what that means. I think that FOR /F cannot read unicode text files. FOR /F does weird things to unicode output from commands like WMIC and PING. How does FOR /F handle unicode strings? What other commands cannot handle unicode?

Here is a modified version of my charMacros.bat script that addresses your points:

  • I added the //NOLOGO switch.
  • I define #mapCodePage to hold the CHCP setting that was in effect when the macros were loaded. When in doubt, the stored value can be compared against the current CHCP output, and the macros reloaded if the two differ.
  • I modified the @asc macro to return -1 if the character is not in the current character map.

Code:

@echo off
:: charMacros.bat
::
::   This script installs macros that can be used to interconvert between
::   numeric extended ASCII codes and character values.
::
::   The script defines the following variables:
::
::     @asc - A macro used to convert a character into the numeric ASCII code
::
::     @chr - A macro used to convert a numeric ASCII code into a character
::
::     #LF - A variable containing a line feed character
::
::     #CR - A variable containing a carriage return character
::
::     #charMap - A variable used by the @asc macro
::
::     #asciiMap - A variable used by the @chr macro
::
::     #mapCodePage - The CHCP setting at the time the maps were loaded
::
::     \n - used for specifying the end of line in a macro definition
::

if "!" == "" >&2 echo ERROR: Delayed expansion must be disabled when loading %~nx0&exit /b 1

:: Define a Carriage Return string, only useable as !#CR!
for /f %%a in ('copy /Z "%~dpf0" nul') do set "#CR=%%a"

:: Define a Line Feed (newline) string (normally only used as !#LF!)
set #LF=^


:: Above 2 blank lines are required - do not remove

:: Define a newline with line continuation
set ^"\n=^^^%#LF%%#LF%^%#LF%%#LF%^^"

:: Define character maps used to interconvert between extended ASCII codes
:: and characters.
>"%temp%\charMacros.vbs" (
  echo for i=1 to 255
  echo   if i=10 then WScript.Stdout.Write " " else WScript.Stdout.Write chr(i^)
  echo next
  echo WScript.Stdout.Write chr(10^)+"#"
  echo for i=1 to 255
  echo   if i^<^>10 then WScript.Stdout.Write chr(i^)+chr(i^)+right("0"+hex(i^),2^)+"#"
  echo next
)
set "#charMap="
for /f "delims=" %%A in ('cscript //nologo "%temp%\charMacros.vbs"') do (
  if defined #charMap (set "#asciiMap=%%A") else set "#charMap= %%A"
)
del "%temp%\charMacros.vbs"
for /f "delims=" %%A in ('chcp') do set "#mapCodePage=%%A"


:: %@asc%  StrVar  Position  [RtnVar]
::
::   Converts a character into the extended ASCII code value.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   A value of -1 is returned if the character is not in the currently loaded
::   code page. The macro is safe to "call" regardless of whether delayed expansion
::   is enabled or not.
::
::     StrVar = The name of a variable that contains the character
::              to be converted
::
::     Position = The position of the character within the string
::                to be converted. 0 based.
::
::     RtnVar = The name of the variable used to store the result.
::
set @asc=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1-3 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal enableDelayedExpansion%\n%
  if defined %%~a (%\n%
    set "str=!%%~a!"%\n%
    set /a "n=%%~b" 2^>nul%\n%
    for %%N in (!n!) do set "chr=!str:~%%N,1!"%\n%
    if defined chr (%\n%
      set "rtn="%\n%
      if "!chr!"=="=" set rtn=61%\n%
      if "!chr!"=="^!" set rtn=33%\n%
      if "!chr!"=="!#lf!" set rtn=10%\n%
      if not defined rtn for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
        set "test=!#asciiMap:*#%%c=!"%\n%
        if not "%%c"=="!test:~0,1!" set "test=!test:*#%%c=!"%\n%
        if "%%c"=="!test:~-0,1!" (set /a "rtn=0x!test:~1,2!") else set "rtn=-1"%\n%
      )%\n%
    )%\n%
    for %%v in (!rtn!) do endlocal^&if "%%~c" neq "" (set "%%~c=%%v") else echo(%%v%\n%
  ) else endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


:: %@chr%  AsciiCode  [RtnVar]
::
::   Converts an extended ASCII code into the corresponding character.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   The macro supports value 1 - 255. The value 0 is not supported.
::   The macro is safe to "call" regardless of whether delayed expansion is
::   enabled or not.
::
::     AsciiCode - Any value from 1 to 255. The value can be expressed as any
::                 numeric expression supported by SET /A.
::
::     RtnVar - The name of the variable used to store the result
::
set @chr=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1,2 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal%\n%
  set "NotDelayed=!"%\n%
  setlocal EnableDelayedExpansion%\n%
  set "n=0"%\n%
  set /a "n=%%~a"%\n%
  if !n! gtr 255 set "n=0"%\n%
  if !n! gtr 0 (%\n%
    if !n! equ 10 (%\n%
      for %%C in ("!#LF!") do (%\n%
        endlocal^&endlocal%\n%
        if "%%~b" neq "" (set "%%~b=%%~C") else echo(%%~C%\n%
      )%\n%
    ) else (%\n%
      for %%N in (!n!) do set "c=!#charMap:~%%N,1!"%\n%
      if "!c!" equ "^!" if not defined NotDelayed set "c=^^^!"%\n%
      for /f delims^^=^^ eol^^= %%C in ("!c!!#CR!") do (%\n%
        endlocal^&endlocal%\n%
        if "%%~b" neq "" (set "%%~b=%%C") else echo(%%C%\n%
      )%\n%
    )%\n%
  ) else endlocal^&endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


exit /b 0
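
The #mapCodePage check and the -1 return value described in the list above could be used along these lines. This is only a sketch under stated assumptions: the caller runs with delayed expansion disabled, charMacros.bat can be found via PATH or the current directory, and the variable names currentCodePage, text and code are hypothetical.

```batch
@echo off
:: Hypothetical caller script. No SETLOCAL is used, so the macros (and
:: the helper variables below) stay defined for the rest of the session.

:: Capture the current CHCP output the same way charMacros.bat does
for /f "delims=" %%A in ('chcp') do set "currentCodePage=%%A"

:: Load the macros if they are missing, then reload them if the code
:: page has changed since the maps were built
if not defined @asc call charMacros
if not "%#mapCodePage%"=="%currentCodePage%" call charMacros

:: Convert the first character of the first argument
set "text=%~1"
if not defined text exit /b 1
%@asc% text 0 code

:: -1 signals a character that is not in the loaded character map
if "%code%"=="-1" (
  echo Character is not in the loaded code page map
) else (
  echo Code: %code%
)
```

Comparing the full CHCP output line rather than just the number keeps the check consistent with how #mapCodePage is populated, at the cost of being tied to the localized message text.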


Dave Benham

#4 Post by Liviu » 13 Feb 2013 20:16

dbenham wrote:Here is a modified version of my charMacros.bat script
Looks good, thank you.

As for the rest, that's a batch of loaded questions ;-) Maybe someday I'll get around to putting together the bits I've collected over time about unicode at the cmd prompt into one coherent piece. Until then, below are some line item answers. The examples assume variable "abc" has been defined at the prompt prior to running any other commands.

Code:

C:\tmp>set "abc=αß©"
The 3 characters are not chosen entirely at random. "α" = U+03B1 is present in OEM CP 437 with code 0xE0 (decimal 224), but not in cp 1252. "ß" = U+00DF is present in both OEM CP 437 with code 0xE1 (225), and ANSI CP 1252 with code 0xDF (223). "©" is present in ANSI CP 1252 with code 0xA9 (169) but not in cp 437. They can be entered at the cmd prompt as ALT+224, ALT+225 (or ALT+0223), and ALT+0169 respectively.

dbenham wrote:So environment variables are stored as UTF-16, correct?
Correct. In fact, both the name and value of an environment variable are stored as UTF-16 natively.

dbenham wrote:But batch scripts must be written in ANSI (extended ASCII). The characters represented by codes 0 - 127 are fixed to be standard ASCII. But the meaning of extended codes 128 - 255 is dependent on the active code page.
Essentially, yes. I'd rather avoid the "ANSI" designation in the context, since there is another meaning of "ANSI" (vs. "OEM" codepages) which may only confuse things further. Basically, the batch file needs to be saved in the same codepage as the one it's going to run under, in order for the high-bit ASCIIs to render as intended. Consider for example the following, which assumes that the batch file has been saved with the binary contents indicated on the top line - no matter the editor, or the "perceived" codepage at the time of saving. What happens at execution time is that the high-bit ASCII characters 128+ are interpreted according to the active codepage at run time - note the different output between chcp 437 and 1252.

Code:

C:\tmp>rem contents of e0_e1_a9.cmd is hex 40 65 63 68 6F 20 E0 E1 A9 0D 0A

C:\tmp>chcp 437 >nul

C:\tmp>type e0_e1_a9.cmd & e0_e1_a9
@echo αß⌐
αß⌐

C:\tmp>chcp 1252 >nul

C:\tmp>type e0_e1_a9.cmd & e0_e1_a9
@echo àá©
àá©

dbenham wrote:CMD silently converts between ANSI and unicode.
When needed, yes. But most of CMD is unicode internally, and conversions are only needed at the boundaries. From what I've seen - though not found spelled out in the reference docs - that means (a) input piping from another command, (b) input redirection from a file, (c) output piping to another command, or (d) output redirection to a file unless running under "cmd /u". For an example of (c)...

Code:

C:\tmp>chcp 437 >nul

C:\tmp>echo %abc% & echo %abc% |more
αß©
αßc

C:\tmp>chcp 1252 >nul

C:\tmp>echo %abc% & echo %abc% |more
αß©
aß©
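
Case (d) can be illustrated in a similar way. The following is a minimal sketch, not taken from the tests above: it redirects the same ECHO output to a file once under the default mode (requested explicitly with "cmd /a") and once under "cmd /u", then compares the file sizes. The file names demo_a.txt and demo_u.txt are hypothetical scratch files created in the current directory.

```batch
@echo off
:: Sketch only: demo_a.txt and demo_u.txt are hypothetical scratch files

:: /A mode: redirected output of internal commands is 8-bit text
cmd /a /c "echo xyz>demo_a.txt"

:: /U mode: redirected output of internal commands is written as UTF-16
cmd /u /c "echo xyz>demo_u.txt"

:: Show both file sizes, then clean up
for %%F in (demo_a.txt demo_u.txt) do echo %%F: %%~zF bytes
del demo_a.txt demo_u.txt
```

The three letters plus CR/LF make five characters, so the two files should come out at 5 and 10 bytes respectively.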

dbenham wrote:I've also been sort of aware that FOR supports unicode file names, but FOR /F does not.
That's more like whatever comes from disk (or Windows APIs) does support unicode, while piped-in or read-from-file strings don't. The first two "for" loops below work fine, the next two don't (note that the last one locates the file itself correctly).

Code:

C:\tmp>for /f "delims=" %u in ("%abc%") do @echo %u
αß©

C:\tmp>copy /y e0_e1_a9.cmd %abc%.tmp
        1 file(s) copied.

C:\tmp>for %u in (%abc:~0,2%?.tmp) do @echo %u
αß©.tmp

C:\tmp>for /f "delims=" %u in ('dir /b %abc:~0,2%?.tmp') do @echo %u
αßc.tmp

C:\tmp>for /f "delims=" %u in (%abc%.tmp) do @echo %u
@echo αß⌐

dbenham wrote: But I'm not exactly clear on what that means. I think that FOR /F cannot read unicode text files. FOR /F does weird things to unicode output from commands like WMIC and PING. How does FOR /F handle unicode strings?
The 'command' variant of FOR/F flattens unicode strings back to the current codepage before assigning the loop variable.

dbenham wrote: What other commands cannot handle unicode?
Almost all builtins that I've come across do. For example, string length or string replace routines (including those posted here that don't use temp files) should work with unicode strings out of the box.

Liviu

P.S. A few closing random thoughts...
- while toying with characters outside the default system OEM codepage, better set the cmd console to a truetype font, not raster; display confusion alone is bad enough, but there may be _behavior_ differences hinted at in the API documentation for SetConsoleOutputCP;
- tests above were done in xp.sp3, en-us, default (not cmd/u) prompt, truetype font;
- this doesn't even begin to touch the matter of double-byte codepages, such as those native to CJK (Asian) Windows, where for example one unicode codepoint could occupy two "character cells" in the console.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#5 Post by foxidrive » 13 Feb 2013 22:00

Thanks for the info. I don't deal with codepages other than what is default and avoid unicode, but it's interesting to see the info presented well.

#6 Post by dbenham » 14 Feb 2013 04:42

Thanks Liviu :D

#7 Post by Liviu » 15 Feb 2013 21:57

IMHO there still are at least two major problems plaguing "unicode" at the prompt.

1. The inability to process piped or redirected input. For example, piping "narrows" unicode down to the active 8-bit codepage, and the all too often recommended "for /f %x in ('dir ...')" will fail on pathnames with characters outside the current codepage. This is even worse because the plain "for %x in (*)" - which handles unicode just fine otherwise - does not have an option to iterate over 'h'idden files, like "for /f %x in ('dir /b /a-d')" does. Oddly enough, "for /r %x in (.)" does include hidden subdirectories, and also handles unicode names fine.

2. The difficulty of generating/hardcoding a "unicode" string in a batch file - other than having the user type it in interactively, or reading it off an existing file/directory name. Batch files remain 8-bit codepage-bound text files. And the half-baked support for UTF-8 codepage 65001 (which is still an 8-bit character encoding, but capable of representing any unicode codepoint) is crippled to the point that batch parsing breaks once it's active - can still be cheated for quick file conversions to/from UTF-8, but not much beyond that.

P.S. I lowered the level of #2 to "difficulty" (vs. "impossible") since I've just found a roundabout way to actually do it. Will follow up in a separate post shortly. Not production code, by a long shot, but still a proof of concept that "it might be conceivably done" ;-)

#8 Post by dbenham » 07 May 2014 08:29

I've used Liviu's WSF within batch discovery to eliminate the need for a temporary VBS file. :)

I've also added @ascHex and @Str2Hex macros.

Code:

<!-- : Begin batch script
@echo off
:: charMacros.bat
::
::   This script installs macros that can be used to interconvert between
::   numeric extended ASCII codes and character values.
::
::   The script defines the following variables:
::
::     @Str2Hex - A macro used to convert a string into a string of hex digit
::                pairs representing the ASCII codes in the string.
::
::     @asc - A macro used to convert a character into the decimal ASCII code
::
::     @ascHex - A macro used to convert a character into the hex ASCII code
::
::     @chr - A macro used to convert a numeric ASCII code into a character
::
::     #LF - A variable containing a line feed character
::
::     #CR - A variable containing a carriage return character
::
::     #charMap - A variable used by the @asc macro
::
::     #asciiMap - A variable used by the @chr macro
::
::     #mapCodePage - The CHCP setting at the time the maps were loaded
::
::     \n - used for specifying the end of line in a macro definition
::
:: Originally developed and posted by Dave Benham (with help from DosTips users) at
:: http://www.dostips.com/forum/viewtopic.php?f=3&t=4284

if "!" == "" >&2 echo ERROR: Delayed expansion must be disabled when loading %~nx0&exit /b 1

:: Define a Carriage Return string, only useable as !#CR!
for /f %%a in ('copy /Z "%~dpf0" nul') do set "#CR=%%a"

:: Define a Line Feed (newline) string (normally only used as !#LF!)
set #LF=^


:: Above 2 blank lines are required - do not remove

:: Define a newline with line continuation
set ^"\n=^^^%#LF%%#LF%^%#LF%%#LF%^^"

:: Define character maps used to interconvert between extended ASCII codes
:: and characters.
set "#charMap="
for /f "delims=" %%A in ('cscript //nologo "%~f0?.wsf"') do (
  if defined #charMap (set "#asciiMap=%%A") else set "#charMap= %%A"
)
for /f "delims=" %%A in ('chcp') do set "#mapCodePage=%%A"


:: %@Str2Hex%  StrVar  [RtnVar]
::
::   Converts the string within StrVar into a string of extended ASCII codes,
::   with each code represented as a pair of hexadecimal digits. The length of
::   the result will always be exactly twice the length of the original string.
::
::   Any character within the string that is not in the currently loaded code
::   page will be represented as 00.
::
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::
::   The macro is safe to "call" regardless of whether delayed expansion
::   is enabled or not.
::
::     StrVar = The name of a variable that contains the string
::              to be converted
::
::     RtnVar = The name of the variable used to store the result.
::
set @Str2Hex=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1,2 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal enableDelayedExpansion%\n%
  if defined %%~a (%\n%
    set "str=!%%~a!"%\n%
    set "s=!%%~a!"%\n%
    set "len=0"%\n%
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (%\n%
      if "!s:~%%P,1!" neq "" (%\n%
        set /a "len+=%%P"%\n%
        set "s=!s:~%%P!"%\n%
      )%\n%
    )%\n%
    set "rtn="%\n%
    for /l %%N in (0 1 !len!) do (%\n%
      set "chr=!str:~%%N,1!"%\n%
      set "hex="%\n%
      if "!chr!"=="=" set hex=3D%\n%
      if "!chr!"=="^!" set hex=21%\n%
      if "!chr!"=="!#lf!" set hex=0A%\n%
      if not defined hex for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
        set "test=!#asciiMap:*#%%c=!"%\n%
        if not "%%c"=="!test:~0,1!" set "test=!test:*#%%c=!"%\n%
        if "%%c"=="!test:~-0,1!" (set "hex=!test:~1,2!") else set "hex=00"%\n%
      )%\n%
      set "rtn=!rtn!!hex!"%\n%
    )%\n%
    for %%v in (!rtn!) do endlocal^&if "%%~b" neq "" (set "%%~b=%%v") else echo(%%v%\n%
  ) else endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


:: %@asc%  StrVar  Position  [RtnVar]
::
::   Converts a character into the extended ASCII code value.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   A value of -1 is returned if the character is not in the currently loaded
::   code page. The macro is safe to "call" regardless of whether delayed expansion
::   is enabled or not.
::
::     StrVar = The name of a variable that contains the character
::              to be converted
::
::     Position = The position of the character within the string
::                to be converted. 0 based.
::
::     RtnVar = The name of the variable used to store the result.
::
set @asc=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1-3 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal enableDelayedExpansion%\n%
  if defined %%~a (%\n%
    set "str=!%%~a!"%\n%
    set /a "n=%%~b" 2^>nul%\n%
    for %%N in (!n!) do set "chr=!str:~%%N,1!"%\n%
    if defined chr (%\n%
      set "rtn="%\n%
      if "!chr!"=="=" set rtn=61%\n%
      if "!chr!"=="^!" set rtn=33%\n%
      if "!chr!"=="!#lf!" set rtn=10%\n%
      if not defined rtn for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
        set "test=!#asciiMap:*#%%c=!"%\n%
        if not "%%c"=="!test:~0,1!" set "test=!test:*#%%c=!"%\n%
        if "%%c"=="!test:~-0,1!" (set /a "rtn=0x!test:~1,2!") else set "rtn=-1"%\n%
      )%\n%
    )%\n%
    for %%v in (!rtn!) do endlocal^&if "%%~c" neq "" (set "%%~c=%%v") else echo(%%v%\n%
  ) else endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


:: %@chr%  AsciiCode  [RtnVar]
::
::   Converts an extended ASCII code into the corresponding character.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   The macro supports value 1 - 255. The value 0 is not supported.
::   The macro is safe to "call" regardless of whether delayed expansion is
::   enabled or not.
::
::     AsciiCode - Any value from 1 to 255. The value can be expressed as any
::                 numeric expression supported by SET /A.
::
::     RtnVar - The name of the variable used to store the result
::
set @chr=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1,2 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal%\n%
  set "NotDelayed=!"%\n%
  setlocal EnableDelayedExpansion%\n%
  set "n=0"%\n%
  set /a "n=%%~a"%\n%
  if !n! gtr 255 set "n=0"%\n%
  if !n! gtr 0 (%\n%
    if !n! equ 10 (%\n%
      for %%C in ("!#LF!") do (%\n%
        endlocal^&endlocal%\n%
        if "%%~b" neq "" (set "%%~b=%%~C") else echo(%%~C%\n%
      )%\n%
    ) else (%\n%
      for %%N in (!n!) do set "c=!#charMap:~%%N,1!"%\n%
      if "!c!" equ "^!" if not defined NotDelayed set "c=^^^!"%\n%
      for /f delims^^=^^ eol^^= %%C in ("!c!!#CR!") do (%\n%
        endlocal^&endlocal%\n%
        if "%%~b" neq "" (set "%%~b=%%C") else echo(%%C%\n%
      )%\n%
    )%\n%
  ) else endlocal^&endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


:: %@ascHex%  StrVar  Position  [RtnVar]
::
::   Converts a character into the extended ASCII code hex value.
::   The result is stored in RtnVar, or ECHOed if RtnVar is not specified.
::   A value of -1 is returned if the character is not in the currently loaded
::   code page. The macro is safe to "call" regardless of whether delayed
::   expansion is enabled or not.
::
::     StrVar = The name of a variable that contains the character
::              to be converted
::
::     Position = The position of the character within the string
::                to be converted. 0 based.
::
::     RtnVar = The name of the variable used to store the result.
::
set @ascHex=for %%# in (1 2) do if %%#==2 (%\n%
for /f "eol= tokens=1-3 delims=, " %%a in ("!#args!") do (endlocal%\n%
  setlocal enableDelayedExpansion%\n%
  if defined %%~a (%\n%
    set "str=!%%~a!"%\n%
    set /a "n=%%~b" 2^>nul%\n%
    for %%N in (!n!) do set "chr=!str:~%%N,1!"%\n%
    if defined chr (%\n%
      set "rtn="%\n%
      if "!chr!"=="=" set rtn=3D%\n%
      if "!chr!"=="^!" set rtn=21%\n%
      if "!chr!"=="!#lf!" set rtn=0A%\n%
      if not defined rtn for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
        set "test=!#asciiMap:*#%%c=!"%\n%
        if not "%%c"=="!test:~0,1!" set "test=!test:*#%%c=!"%\n%
        if "%%c"=="!test:~-0,1!" (set "rtn=!test:~1,2!") else set "rtn=-1"%\n%
      )%\n%
    )%\n%
    for %%v in (!rtn!) do endlocal^&if "%%~c" neq "" (set "%%~c=%%v") else echo(%%v%\n%
  ) else endlocal%\n%
set "#args=")) else setlocal enableDelayedExpansion^&set #args=,


exit /b 0


----- Begin wsf script --->
<job><script language="VBScript">
for i=1 to 255
  if i=10 then WScript.Stdout.Write " " else WScript.Stdout.Write chr(i)
next
WScript.Stdout.Write chr(10)+"#"
for i=1 to 255
  if i<>10 then WScript.Stdout.Write chr(i)+chr(i)+right("0"+hex(i),2)+"#"
next
</script></job>
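
For anyone tracing the lookup in %@asc% and %@ascHex%, here is a rough JavaScript model of the #asciiMap layout that the VBScript above writes, and of the two-step search the macros run against it. It is only a sketch under stated assumptions: buildAsciiMap, stripThrough and ascModel are made-up names, the map is limited to codes 1 to 127 to stay clear of code page questions, and the lower-cased search stands in for the case-insensitive matching of batch search-and-replace expansion.

```javascript
// Illustrative model only; none of these names exist in the posted script.

// Layout: "#" followed by one entry per code: char, char again, 2 hex digits, "#".
function buildAsciiMap() {
  var map = "#";
  for (var i = 1; i <= 127; i++) {
    if (i === 10) continue;                       // line feed is handled separately
    var c = String.fromCharCode(i);
    var hex = ("0" + i.toString(16).toUpperCase()).slice(-2);
    map += c + c + hex + "#";
  }
  return map;
}

// Models !var:*#c=! : drop everything through the first case-insensitive "#c".
function stripThrough(text, c) {
  var pos = text.toLowerCase().indexOf(("#" + c).toLowerCase());
  return pos < 0 ? text : text.slice(pos + 2);
}

// Models the lookup: strip once, strip again if the first hit was a different
// character (other case, or a stray "#" + hex digit), then read the hex digits.
function ascModel(c, map) {
  var test = stripThrough(map, c);
  if (test.charAt(0) !== c) test = stripThrough(test, c);
  return test.charAt(0) === c ? parseInt(test.slice(1, 3), 16) : -1;
}
```

In this model, looking up "a" needs the second strip because the first "#a" match lands on the "#AA41#" entry. The doubled character is what makes the case-sensitive check after each strip possible.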


Dave Benham

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

#9 Post by Ed Dyreen » 04 Jun 2015 11:18

hi Dave,

just one question,

Code: Select all

...for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
      set "test=!#asciiMap:*#%%c=!"%\n%
why the #CR here ?

I am trying the javaScript variant

Code: Select all

if ( WScript.Arguments.length > 1 ) {

   if ( WScript.Arguments(0) == "charToDec_" ) {

      var $char = WScript.Arguments(1);
      //
      WScript.Echo( $char.charCodeAt( 0 ) );
   };

   if ( WScript.Arguments(0) == "decToChar_" ) {

      var $char = WScript.Arguments(1);
      //
      $char = String.fromCharCode( $char );
      //
      WScript.Echo( $char );
   };
};
but if that gives too many exceptions I might try char files

Code: Select all

to decimal

for /R "!$cache.fullPath!\chr" %%? in ("*.CHR") do (

   set "$=" &set /P "$="

   if "!s:~%%i,1!" == "!$!" set "dec=%%~n?"

) <"%%?"

to char

set "char=" &set /P "char=" <"!$cache.fullPath!\chr\!dec!.CHR"

#10 Post by dbenham » 04 Jun 2015 11:54

Ed Dyreen wrote:just one question,

Code: Select all

...for /f delims^^=^^ eol^^= %%c in ("!chr!!#CR!") do (%\n%
      set "test=!#asciiMap:*#%%c=!"%\n%
why the #CR here ?


FOR /F always strips the last character from the string if it happens to be a carriage return (\r). So I always append an extra \r, which is the one that gets stripped, just in case !chr! is itself \r. The extra \r protects the \r I want to keep.
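
A tiny model of that reasoning, written in JavaScript only because it is easy to run. It is a sketch of the behaviour described in this post, not of cmd.exe itself, and forFModel and protectedModel are hypothetical names.

```javascript
// Hypothetical model of the described behaviour; this is not cmd.exe.
var CR = "\r";

// FOR /F drops one trailing carriage return from the string it parses.
function forFModel(s) {
  return s.charAt(s.length - 1) === CR ? s.slice(0, -1) : s;
}

// What the macros do: append a sacrificial CR before handing the value over.
function protectedModel(chr) {
  return forFModel(chr + CR);
}
```

Without the extra \r, forFModel(CR) comes back empty and the character is lost. With it, protectedModel(CR) still holds the single \r, and ordinary characters pass through unchanged.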


Dave Benham

#11 Post by Ed Dyreen » 14 Jun 2015 17:56

I'm having a problem with JavaScript:

Code: Select all

for ( var i=0; i <= 255; i++ ) {
   if ( i == 128 ) {
      WScript.Stdout.Write( "Ç" );
   } else if ( i == 130 ) {
      WScript.Stdout.Write( "é" );
   } else if ( i >= 131 && i <= 159 ) {
      WScript.Stdout.Write( " " );
   } else {
      WScript.Stdout.Write( String.fromCharCode( i ) );
   };
};
Chars 128 and 130 to 159 give me the error "Microsoft JScript: invalid procedure call or argument". That is odd, because these chars print fine when hard coded, and they also print fine in VBScript.

What's more, I find it odd that Microsoft tells me it's running JScript when I used the //e:javaScript switch. It looks like JavaScript to me, but since I haven't studied JScript I'm not sure it is.

This is a downer. With this many unprintable characters I will have to resort to VBScript like Dave does, unless someone knows what's going on and how to resolve it.

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México

#12 Post by Aacini » 14 Jun 2015 19:49

What is the problem with those characters? See this post.

How to solve it? See this post and my next post below that one.

EDIT: I modified the example code to make it simpler.

Code: Select all

@if (@CodeSection == @Batch) @then

@echo off
cls
echo Normal VBScript output:
echo For i=0 To 255 : WScript.StdOut.Write Chr(i) : Next > test.vbs
Cscript //nologo test.vbs
echo/
echo/
echo Adjusted JScript output:
Cscript //nologo //E:JScript "%~F0"
echo/
echo/
echo Previous output filtered with FINDSTR:
Cscript //nologo //E:JScript "%~F0" | findstr "^"
echo/
goto :EOF

@end

var AsciiCode = new Array();

// Unicode code points of the Windows-1252 characters for byte values 128..159
AsciiCode[128]=8364;
AsciiCode.push(       129, 8218,  402, 8222, 8230, 8224, 8225, 710, 8240, 352, 8249, 338, 141, 381, 143,
                144, 8216, 8217, 8220, 8221, 8226, 8211, 8212, 732, 8482, 353, 8250, 339, 157, 382, 376 );

for ( var i = 0; i <= 255; i++ ) {
   var code = AsciiCode[i] ? AsciiCode[i] : i;
   WScript.Stdout.Write(String.fromCharCode(code));
}
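
As a side sketch, the same idea can be written as a pair of lookup functions, which also gives the reverse direction (character to byte value) that charCodeAt alone does not provide for codes 128 to 159. This assumes the ANSI code page is Windows-1252. The names cp1252High, byteToCodePoint and codePointToByte are illustrative only, and a plain loop is used instead of Array.indexOf so the code should not depend on a newer script engine.

```javascript
// Illustrative helpers; assumes the Windows-1252 code page.
// Unicode code points for byte values 128..159 (unassigned bytes map to themselves).
var cp1252High = [
  8364,  129, 8218,  402, 8222, 8230, 8224, 8225,
   710, 8240,  352, 8249,  338,  141,  381,  143,
   144, 8216, 8217, 8220, 8221, 8226, 8211, 8212,
   732, 8482,  353, 8250,  339,  157,  382,  376
];

// Byte value (0..255) -> Unicode code point to pass to String.fromCharCode.
function byteToCodePoint(b) {
  return (b >= 128 && b <= 159) ? cp1252High[b - 128] : b;
}

// Unicode code point -> byte value, or -1 if the table has no byte for it.
function codePointToByte(cp) {
  if (cp < 128 || (cp >= 160 && cp <= 255)) return cp;
  for (var i = 0; i < cp1252High.length; i++) {
    if (cp1252High[i] === cp) return i + 128;
  }
  return -1;
}
```

For example, the euro sign has charCodeAt value 8364, and codePointToByte(8364) maps it back to 128, while code point 128 itself has no entry in the table and yields -1.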

I read somewhere that JScript is just Microsoft's version of JavaScript, named that way to avoid naming problems...

Antonio

#13 Post by Ed Dyreen » 15 Jun 2015 15:34

Thanks Aacini for the explanation, but I still don't really understand. I only know that in ANSI code pages some characters are always the same and some differ, which had something to do with Microsoft's attempt at internationalization? Anyway, maybe one day this will be explained in a school book, if our professors find it that important.

For now I chose to avoid the complications and used the ?.WSF trick to embed multiple languages. So I ended up embedding the charToDec and decToChar functions in VBScript. These functions don't even really need VBScript, because I have files cached from 0.CHR to 255.CHR (created with makecab) that can be enumerated, but VBScript computes faster.

I also didn't use Dave's charmap. I know a map can enhance performance but if I really need performance I'd do it completely in VBScript.

#14 Post by Ed Dyreen » 14 Jul 2015 09:00

@Dave

I guess you have an unnecessary char here

Code: Select all

        if "%%c"=="!test:~-0,1!"
can just be

Code: Select all

        if "%%c"=="!test:~0,1!"
