How to use Unicode Chars?

einstein1969

How to use Unicode Chars?

#1 Post by einstein1969 » 14 May 2014 12:09

Hi to all,

I have read around about this and am now considering using it, but I am still confused about the topic. I have several questions, but I'll start with a simple problem whose solution may implicitly answer them.

Suppose that I want to display/print a set of Unicode characters.
I have chosen to print the characters from U+2500 to U+2600 on screen.

  • 1 - What is the best approach to do this?
  • 2 - I know that I can display a char using ALT+num. But I have seen that the behavior is different if I use ALT+0NUM. Can anyone explain the difference?
  • 2a - Can this method be used to create Unicode characters?
  • 3 - Why the notation U+? What does it mean?
  • 4 - I see that Windows (I use Windows 7 32-bit) offers the "Character Map" application for copy and paste. Does this mean that the clipboard is Unicode aware?
  • 5 - I use Notepad for editing the batch script. Do I need to pay attention to any options when creating this new Unicode-aware script?
  • 6 - I have read that the command window needs to use the Lucida Console font. Is that right?

PS: My questions may not be clear because I don't know Unicode well!

This is the code that I need to change:

Code:

@echo off & setlocal EnableDelayedExpansion

Rem Use FONT raster 8x8 for better visualization

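rem default color and chars
rem (assuming the script is saved as ANSI/Windows-1252: byte 0xDB "Û" is shown as
rem  a full block and byte 0xFA "ú" as a small centered dot under codepage 437)
set Def_Color=0A
set Def_Fill_char=Û
set Def_Empty_Char=ú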


Set /a Lines=80, Cols=120

Call :Init

For /L %%x in (1,1,10) do call :Plot_X_Y %%x+10 %%x+5
For /L %%x in (1,1,20) do call :Plot_X_Y 35-%%x %%x+8

For /L %%x in (-16,1,16) do For /L %%y in (-16,1,16) do (
  set /a c=%%x*%%x+%%y*%%y-15*15
  if !c! leq 0 call :Plot_X_Y %%x+50 %%y+25
)


call :Flush_Screen_Buff

goto :End

#############################################################################

:Plot_X_Y x y [Char]
  set /a X=%1, Y=%2
  if !random:~-1! leq 1 set /P "=Plot !X! !Y!     !CR!" <nul
  if !X! gtr 0 if !Y! gtr 0 if !X! leq !Cols! if !Y! leq !Lines! (
     if "%3"=="" (set P_!X!_!Y!=%Def_Fill_char%) else set P_!X!_!Y!=%3
  )
goto :eof

:Flush_Screen_Buff
 Cls
 set "Label_Y=  Y ³³³³³³"
 Echo((0,0^) X ÄÄÄÄÄÄ
 For /L %%y in (1,1,%lines%) do (
   if "!Label_Y:~%%y,1!"=="" (set/P"=%BS% " <nul) else set/P"=%BS%!Label_Y:~%%y,1!" <nul
   For /L %%x in (1,1,%cols%) do set /P "=!P_%%x_%%y:~0,1!" <nul
   echo(
 )
 Echo(
goto :eof


:Init

Echo For Better Visualization use FONT raster 8x8
Echo Please Wait... Init the system...
chcp 437

rem Define BS to contain a backspace
for /f %%a in ('"prompt $H&for %%b in (1) do rem"') do set "BS=%%a"

for /f %%a in ('copy /Z "%~dpf0" nul') do set "CR=%%a"

rem Screen buffer
rem One pixel (or 2 pixels in double-resolution mode) consists of a character C,
rem a foreground color F and a background color B.

Rem I use many variables of three characters each, representing these three pieces of information
Rem P_X_Y=CBF

set /a TL=Lines+5, TC=Cols+2

For /L %%y in (1,1,%lines%) do For /L %%x in (1,1,%cols%) do set "P_%%x_%%y=%Def_Empty_Char%%Def_Color%"

mode %TC%,%TL%
Cls & Color %Def_Color%

Call :Flush_Screen_Buff

goto :eof

:End
 endlocal
goto :eof


Einstein1969

aGerman

Re: How to use Unicode Chars?

#2 Post by aGerman » 14 May 2014 17:32

You can't encode Batch scripts in Unicode, and the console fonts don't really support Unicode (only parts of it).
Have a look at this thread:
viewtopic.php?f=3&t=5500
Also explicitly search the forum for Liviu's posts; he often explains Unicode-related issues.
I fear that in the end you won't get far with Unicode in Batch :|

Regards
aGerman

penpen

Re: How to use Unicode Chars?

#3 Post by penpen » 14 May 2014 17:56

If you only want to store/load Unicode data, you can read (or write) UTF-16 from (or to) a file using "cmd /U" and "<" (or ">>"); this works for every character except NUL (U+0000).
(Read the hex values using fc.)
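
A minimal sketch of the difference between "cmd /A" and "cmd /U" when an internal command writes to a file. The file names ansi.txt and utf16.txt are placeholders, and the byte counts in the comment assume that the redirected output of "cmd /U" is UTF-16LE without a BOM:

```batch
@echo off
setlocal

rem /A: output of internal commands to a file is ANSI/OEM (1 byte per char).
cmd /A /C "echo Hello>ansi.txt"

rem /U: output of internal commands to a file is UTF-16 (2 bytes per char).
cmd /U /C "echo Hello>utf16.txt"

rem "Hello" plus CR LF is 7 characters, so the sizes should be 7 and 14 bytes.
for %%F in (ansi.txt utf16.txt) do echo %%F: %%~zF bytes

rem Compare the raw bytes; fc /B lists the differing offsets as hex values.
fc /B ansi.txt utf16.txt

rem Remove the placeholder files again.
del ansi.txt utf16.txt

endlocal
```

The fc output makes the extra 00 byte after every ASCII character in utf16.txt visible, which is the quickest way to check what was really written.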

You have to create this file by hand, for example using cmd /A and "copy the values together" as described here:
http://www.dostips.com/forum/viewtopic.php?f=3&t=5326

As you encode these values using U+n notation (UTF-32), you have to convert UTF-32 to UTF-16 and vice versa.
This may help you (sketched, unoptimized):

Code:

@echo off

set /A "value1=0x064321"
set /A "value2=0x0000FF"


call :UTF_32_BE_2_UTF_16_BE "%value1%" "RESULT1"
call :UTF_32_BE_2_UTF_16_BE "%value2%" "RESULT2"

rem set /A stores decimal values, so all results are printed in decimal
echo surrogates ^(codepoint %value1%, decimal^) = %RESULT1%
echo surrogates ^(codepoint %value2%, decimal^) = %RESULT2%
echo(


call :UTF_16_BE_2_UTF_32_BE "%RESULT1%" "RESULT3"
call :UTF_16_BE_2_UTF_32_BE "%RESULT2%" "RESULT4"

echo codepoint ^(%RESULT1%^) = %RESULT3% ^(decimal^)
echo codepoint ^(%RESULT2%^) = %RESULT4% ^(decimal^)
echo(


goto :eof


:UTF_32_BE_2_UTF_16_BE
:: %~1   UTF 32 BE value
:: %~2   container to store the result
   setlocal enableDelayedExpansion

   set /A "MIN_CODE_POINT     = 0x000000"
   set /A "MAX_CODE_POINT     = 0x10FFFF"
   set /A "MIN_HIGH_SURROGATE = 0xD800"
   set /A "MIN_LOW_SURROGATE  = 0xDC00"
   set /A "MIN_SUPPLEMENTARY_CODE_POINT=0x010000"

   if %~1 LSS %MIN_CODE_POINT% (
      set "result=invalid codepoint"
   ) else if %~1 LSS %MIN_SUPPLEMENTARY_CODE_POINT% (
      set /A "result=%~1"
   ) else if %~1 LEQ %MAX_CODE_POINT% (
      set /A "U = %~1", "offset = U - MIN_SUPPLEMENTARY_CODE_POINT"
      set /A "hs = MIN_HIGH_SURROGATE + (offset >> 10)", "ls = MIN_LOW_SURROGATE + (offset & 0x3FF)"
      set "result=!hs! !ls!"
   ) else (
      set "result=invalid codepoint"
   )

   endlocal & set "%~2=%result%"
   goto :eof


:UTF_16_BE_2_UTF_32_BE
:: %~1   UTF 16 surrogate (pair with order: high surrogate low surrogate)
:: %~2   container to store the result
   setlocal enableDelayedExpansion

   set /A "MIN_HIGH_SURROGATE = 0xD800"
   set /A "MIN_LOW_SURROGATE  = 0xDC00"
   set /A "MIN_SUPPLEMENTARY_CODE_POINT=0x010000"

   if "%~1" == "invalid codepoint" (
      set "result=%~1"
   ) else (
      for /F "tokens=1-2" %%a in ("%~1") do if NOT "%%b" == "" (
         set /A "result=((%%a - MIN_HIGH_SURROGATE) << 10) + (%%b - MIN_LOW_SURROGATE) + MIN_SUPPLEMENTARY_CODE_POINT"
      ) else (
         set /A "result=%~1"
      )
   )

   endlocal & set "%~2=%result%"
   goto :eof
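
Because "set /A" always yields decimal numbers, the script above produces decimal results (55632 57121 for the input 0x064321). A small helper can turn such values back into hex for the usual U+ notation. This is only a sketch; the label :toHex and its variable names are illustrative and not part of the original script:

```batch
@echo off
setlocal

rem The two surrogate values computed for 0x064321, in decimal.
call :toHex 55632 hs
call :toHex 57121 ls

rem prints: surrogates: D950 DF21
echo surrogates: %hs% %ls%

endlocal
goto :eof


:toHex
:: %~1   non-negative decimal value
:: %~2   variable that receives the hex string
   setlocal enableDelayedExpansion
   set "digits=0123456789ABCDEF"
   set /A "n=%~1"
   set "hex="
:toHexLoop
   rem take the lowest 4 bits as the next hex digit, then shift them out
   set /A "d=n & 15, n>>=4"
   set "hex=!digits:~%d%,1!!hex!"
   if %n% gtr 0 goto :toHexLoop
   endlocal & set "%~2=%hex%"
   goto :eof
```

The loop uses goto instead of a parenthesized block so that %d% and %n% are re-expanded on every pass.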

penpen

Liviu

Re: How to use Unicode Chars?

#4 Post by Liviu » 14 May 2014 21:55

einstein1969 wrote: Suppose that I want to display/print a set of Unicode characters.
I have chosen to print the characters from U+2500 to U+2600 on screen.
  • 1 - What is the best approach to do this?
  • 2 - I know that I can display a char using ALT+num. But I have seen that the behavior is different if I use ALT+0NUM. Can anyone explain the difference?
  • 2a - Can this method be used to create Unicode characters?
  • 3 - Why the notation U+? What does it mean?
  • 4 - I see that Windows (I use Windows 7 32-bit) offers the "Character Map" application for copy and paste. Does this mean that the clipboard is Unicode aware?
  • 5 - I use Notepad for editing the batch script. Do I need to pay attention to any options when creating this new Unicode-aware script?
  • 6 - I have read that the command window needs to use the Lucida Console font. Is that right?

1. Short answer is that there is no (known and practical) portable solution to create arbitrary Unicode characters in pure batch in all versions of Windows. If you are willing to restrict it to Windows 7+ and cmd prompts configured to use Unicode/TT fonts then, yes, it's possible but still not entirely straightforward. See the thread at http://www.dostips.com/forum/viewtopic.php?f=3&t=5516 for a starting point, and also search this site for Unicode/utf-8 to learn more.
2. ALT+# inserts the character which has code '#' in the OEM codepage. ALT+0# inserts the character which has code '#' in the ANSI codepage. For example, on my system with OEM codepage 437 and ANSI codepage 1252, ALT+225 and ALT+0223 both insert the same character "ß" U+00DF.
2a. No. But look up the EnableHexNumpad registry setting.
3. That's the standard notation for Unicode codepoints. For background and history see http://unicode.org/mail-arch/unicode-ml/y2005-m11/0060.html.
4. Yes.
5. As long as it's plain-text 7-bit ASCII, no. If it uses extended ASCII characters (>= 0x80) then you must execute or read it back under the same codepage that Notepad saved it in. If you save it as UTF-8 without BOM then the non-ASCII part is only usable under Windows 7+ and codepage 65001.
6. You need a Unicode/TT font like Lucida Console (not a raster font) to both enable some automatic codepage translations in the console, and display characters outside the current codepage. What characters get displayed correctly depends on the font coverage, and FWIW even Lucida Console doesn't carry the full U+2500-2600 range.
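
Points 2a and 5 can be sketched roughly as follows, assuming Windows 7 or later. The registry value is the EnableHexNumpad setting named in 2a; it typically takes effect only after the next logon, and whether the resulting ALT input works depends on the application receiving it. The file utf8-sample.txt is a hypothetical file saved as UTF-8 without BOM, and the parsing of the chcp output assumes the usual "text: number" layout, which may differ on some localized systems:

```batch
@echo off
setlocal

rem 2a: allow ALT + numpad "+" + hex digits (e.g. ALT, "+", 2500) as input.
reg add "HKCU\Control Panel\Input Method" /v EnableHexNumpad /t REG_SZ /d 1 /f

rem 5: remember the active codepage, e.g. from "Active code page: 437".
for /f "tokens=2 delims=:." %%a in ('chcp') do set /a "oldCP=%%a"

rem Switch to UTF-8, show the (hypothetical) file, then restore the codepage.
chcp 65001 >nul
type "utf8-sample.txt"
chcp %oldCP% >nul

endlocal
```

As point 6 says, what is actually displayed still depends on the console using a TT font and on that font's coverage.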

Liviu

penpen

Re: How to use Unicode Chars?

#5 Post by penpen » 23 May 2014 04:58

I nearly forgot to post this:

Code:

:: list.bat
@echo off
setlocal enableDelayedExpansion
set "hexDigits=0 1 2 3 4 5 6 7 8 9 A B C D E F"
set "codepoints="
for %%h in (%hexDigits%) do for %%l in (%hexDigits%) do set "codepoints=!codepoints! U+25%%~h%%~l"
set "codepoints=%codepoints% U+2600"

rem "call" returns control to this script so that the endlocal below still runs
call testcon.bat %codepoints%

endlocal

You need the testcon.bat from the thread Liviu linked above (under point 1; sorry for not linking it myself, but only two links are allowed).
If you are using Windows XP then you need a patched cmd.exe; see http://www.dostips.com/forum/viewtopic.php?f=3&t=5588.
To switch to the font "Lucida Console", the program "CmdFont.exe" may help; see http://www.dostips.com/forum/viewtopic.php?p=34649#p34649:

Code:

CmdFont.exe SET 6

This works at least on my Windows XP Home 32-bit system.

penpen
