DosTips.com

A Forum all about DOS Batch
It is currently 29 May 2017 09:08

All times are UTC-06:00




Posted: 04 Feb 2017 22:34

Joined: 04 Feb 2017 22:03
Posts: 9
Hi all!

I'm using the "echo" command to write text both on-screen and to a text file. For aesthetic reasons, I now want to use box-characters like the ones shown at the top of the fourth column here:
http://www.tedmontgomery.com/tutorial/altchrc.html

If I copy and paste them into Notepad and try to save the file, I get a message warning that I'm using non-ANSI characters and need to change the encoding of the file. Unfortunately, it seems that batch files need to be in ANSI format to work.

But... I tried using Notepad++ and can paste and save a file with a box-character in it. The status bar shows that the file is encoded in UTF-8, but (unlike in Windows Notepad) the batch file saves and runs fine... except the box-characters echoed to the screen appear as capital accented characters. (The ones output to the text file appear fine.)

If I hold down the ALT key and type a 3-digit code (from the link above) in Notepad or Notepad++, the relevant box-character appears. If I do it in Wordpad, the accented capital letter appears. If I copy and paste that letter into the on-screen echo commands in my script, they appear "correctly" as box-characters when I execute the file.

I have to admit, I'm somewhat confused. I think it's something to do with "code pages"... but... :-/

I want to share my script with other people running other Windows versions and (maybe) different languages. How can I make sure my batch file can produce these box-characters consistently (to file and screen) on any system?

Is that possible?


Posted: 05 Feb 2017 06:13
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
esuhl wrote:
I think it's something to do with "code pages"

Right. Batch files have to be written in a single-byte code page. UTF-8 or UTF-16 won't work. The reason why you can't simply copy and paste the characters is that Windows applications (such as Notepad) and console applications (such as cmd) work with different code pages. That means only the ASCII characters are always the same (up to the ~ character). Higher bytes are interpreted differently.
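
To make the point about code pages concrete, here is a small Python sketch. Python is used only as a neutral way to decode bytes and is not part of any batch solution; pairing Windows-1252 (ANSI) with code page 437 (console) is an assumption chosen as a common example:

```python
# Decode the same bytes with an ANSI (Windows) code page and an OEM (console)
# code page. cp1252/cp437 is only an example pairing.
ANSI, OEM = "cp1252", "cp437"

# Printable ASCII (0x20-0x7E, i.e. space up to "~") is identical in both.
ascii_bytes = bytes(range(0x20, 0x7F))
same_ascii = ascii_bytes.decode(ANSI) == ascii_bytes.decode(OEM)

# A byte above 0x7F is a different character in each code page.
high = bytes([0xC4])
print(same_ascii)          # True
print(high.decode(ANSI))   # Ä, as a Windows app such as Notepad shows it
print(high.decode(OEM))    # ─, as the console shows it
```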

esuhl wrote:
How can I make sure my batch file can produce these box-characters consistently (to file and screen) on any system?

You can't. Not even every code page supports all these box-drawing characters.
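
This can be checked with a short Python sketch, assuming Python's cp437, cp850, etc. codecs match the Windows code pages with the same numbers. It counts how many of the 40 characters from box.txt each code page can encode; the OEM code pages come from the comments in the example further down (708 and 720 are left out), and cp850 and cp1252 are added for contrast:

```python
# The 40 box-drawing characters from box.txt.
BOX = "│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌"

def supported(codepage):
    """Return the characters of BOX that `codepage` can encode."""
    ok = ""
    for ch in BOX:
        try:
            ch.encode(codepage)
            ok += ch
        except UnicodeEncodeError:
            pass
    return ok

# OEM code pages listed in the batch comments (708 and 720 omitted here),
# plus cp850 and cp1252 for contrast.
for cp in ("cp437", "cp737", "cp860", "cp861", "cp862", "cp863",
           "cp865", "cp866", "cp850", "cp1252"):
    print(f"{cp:7} {len(supported(cp)):2} of {len(BOX)}")
```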

To work around this limitation, you can save these characters in a separate file:
box.txt
Code: Select all
│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌

Use Windows Notepad. In the "Save As" dialog, make sure the encoding is "Unicode"!

Code page 437 seems to support all these characters. So you have to change the code page using CHCP and use TYPE to read the characters and save them in a variable in your batch code.
Code: Select all
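@echo off &setlocal
:: Switch the console to code page 437, which has all 40 box-drawing characters.
>nul chcp 437
:: TYPE converts the UTF-16 text in box.txt to the current console code page.
for /f %%i in ('type "box.txt"') do set "box=%%i"
:: Print the whole string, then its first and second characters.
echo %box%
echo %box:~0,1%
echo %box:~1,1%
pause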
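
The conversion that TYPE performs can be mimicked in Python (illustration only; the batch code relies on TYPE, not on Python). Encoding the 40 characters with code page 437 shows that they occupy the contiguous byte range 179 to 218, so in this setup `%box:~n,1%` is the character with OEM code 179 + n:

```python
BOX = "│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌"

# TYPE converts the UTF-16 text of box.txt to the console code page (437 here).
oem_bytes = BOX.encode("cp437")
print(oem_bytes[0], oem_bytes[-1])   # 179 218

# %box:~0,1% and %box:~1,1% correspond to the Python slices BOX[0:1] and BOX[1:2].
first, second = BOX[0:1], BOX[1:2]
print(first, second)                 # │ ┤
```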


Alternatively, you could create box.txt with batch, too.
Code: Select all
>"temp.~b64" echo(//4CJSQlYSViJVYlVSVjJVElVyVdJVwlWyUQJRQlNCUsJRwlACU8JV4lXyVaJVQlaSVmJWAlUCVsJWclaCVkJWUlWSVYJVIlUyVrJWolGCUMJQ==
>nul certutil.exe -f -decode "temp.~b64" "box.txt"
del "temp.~b64"
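
For readers who want to inspect the Base64 string, this is a Python equivalent of the `certutil -decode` step (a sketch for illustration; the batch code uses certutil). The decoded bytes start with FF FE, the UTF-16 little-endian byte order mark, followed by the 40 characters in the same order as in box.txt:

```python
import base64

# The Base64 text from the batch snippet, split over two lines.
B64 = ("//4CJSQlYSViJVYlVSVjJVElVyVdJVwlWyUQJRQlNCUsJRwlACU8JV4lXyVa"
       "JVQlaSVmJWAlUCVsJWclaCVkJWUlWSVYJVIlUyVrJWolGCUMJQ==")

raw = base64.b64decode(B64)       # the bytes certutil writes to box.txt
bom, payload = raw[:2], raw[2:]   # FF FE = UTF-16 little-endian BOM
box = payload.decode("utf-16-le")

print(bom.hex(), len(box))        # fffe 40
print(box)
```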

Steffen

~~~~~~~~~ EDIT ~~~~~~~~~

An example of how to use this technique:
Code: Select all
@echo off &setlocal

:: Save the current OEM code page (in order to be able to reset it later on).
for /f "tokens=2 delims=:" %%i in ('chcp') do set /a oemcp=%%~ni

:: Change the console code page.
:: For supporting all of the box-drawing characters choose one out of
:: 437 (English US), 708 (Arabic ASMO), 720 (Arabic Microsoft), 737 (Greek), 860 (Portuguese),
:: 861 (Icelandic), 862 (Hebrew), 863 (French Canada), 865 (Nordic), 866 (Russian)
>nul chcp 437

:: Create, convert, and save the box-drawing characters.
>"%temp%\boxdrw.~b64" echo(//4CJSQlYSViJVYlVSVjJVElVyVdJVwlWyUQJRQlNCUsJRwlACU8JV4lXyVaJVQlaSVmJWAlUCVsJWclaCVkJWUlWSVYJVIlUyVrJWolGCUMJQ==
>nul certutil.exe -f -decode "%temp%\boxdrw.~b64" "%temp%\boxdrw.~u16"
for /f %%i in ('type "%temp%\boxdrw.~u16"') do set "box=%%i"
del "%temp%\boxdrw.~b64" "%temp%\boxdrw.~u16"

:: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ begin of examples ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
echo All box-drawing characters and their substring variables.
setlocal EnableDelayedExpansion
for /l %%i in (0 1 39) do echo  !box:~%%i,1! %%box:~%%i,1%%&echo(
endlocal
echo(

echo Draw boxes with single-frame characters.
echo  %box:~39,1%%box:~17,1%%box:~15,1%%box:~17,1%%box:~12,1%
echo  %box:~0,1%A%box:~0,1%B%box:~0,1%
echo  %box:~16,1%%box:~17,1%%box:~18,1%%box:~17,1%%box:~1,1%
echo  %box:~0,1%C%box:~0,1%D%box:~0,1%
echo  %box:~13,1%%box:~17,1%%box:~14,1%%box:~17,1%%box:~38,1%
echo(
echo Draw boxes with double-frame characters.
echo  %box:~22,1%%box:~26,1%%box:~24,1%%box:~26,1%%box:~8,1%
echo  %box:~7,1%A%box:~7,1%B%box:~7,1%
echo  %box:~25,1%%box:~26,1%%box:~27,1%%box:~26,1%%box:~6,1%
echo  %box:~7,1%C%box:~7,1%D%box:~7,1%
echo  %box:~21,1%%box:~26,1%%box:~23,1%%box:~26,1%%box:~9,1%
echo(
echo 1st combination (single vertical, double horizontal frames).
echo  %box:~34,1%%box:~26,1%%box:~30,1%%box:~26,1%%box:~5,1%
echo  %box:~0,1%A%box:~0,1%B%box:~0,1%
echo  %box:~19,1%%box:~26,1%%box:~37,1%%box:~26,1%%box:~2,1%
echo  %box:~0,1%C%box:~0,1%D%box:~0,1%
echo  %box:~33,1%%box:~26,1%%box:~28,1%%box:~26,1%%box:~11,1%
echo(
echo 2nd combination (double vertical, single horizontal frames).
echo  %box:~35,1%%box:~17,1%%box:~31,1%%box:~17,1%%box:~4,1%
echo  %box:~7,1%A%box:~7,1%B%box:~7,1%
echo  %box:~20,1%%box:~17,1%%box:~36,1%%box:~17,1%%box:~3,1%
echo  %box:~7,1%C%box:~7,1%D%box:~7,1%
echo  %box:~32,1%%box:~17,1%%box:~29,1%%box:~17,1%%box:~10,1%
echo(
:: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ end of examples ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

:: Reset the default OEM code page of your system.
>nul chcp %oemcp%

pause
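
The index arithmetic can be checked outside cmd.exe. This Python sketch reuses the same 40-character string and the same offsets as the single-frame example; `ch()` is a hypothetical helper that only mirrors `%box:~i,1%`:

```python
BOX = "│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌"

def ch(i):
    """Hypothetical helper mirroring %box:~i,1% (one character at offset i)."""
    return BOX[i:i + 1]

# Same offsets as the "single-frame characters" example.
single = [
    ch(39) + ch(17) + ch(15) + ch(17) + ch(12),
    ch(0) + "A" + ch(0) + "B" + ch(0),
    ch(16) + ch(17) + ch(18) + ch(17) + ch(1),
    ch(0) + "C" + ch(0) + "D" + ch(0),
    ch(13) + ch(17) + ch(14) + ch(17) + ch(38),
]
print("\n".join(single))
# ┌─┬─┐
# │A│B│
# ├─┼─┤
# │C│D│
# └─┴─┘
```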


Posted: 05 Feb 2017 09:05

Joined: 16 Dec 2016 22:31
Posts: 98
Another little trick, if I may. As aGerman said in his post, code page 437 supports almost all characters, so we are using
Code: Select all
chcp 437
at the top of our code.
Now the thing is that we need to write the file in the same character set, and this is where Notepad++ comes in handy. Go to 'Format -> Character sets -> OEM-US'.
As far as I have seen, OEM-US is the same as code page 437. Type any extended ASCII characters directly in the .bat file (using Alt+number or copy-paste). Make sure you have used 'chcp 437' and saved the file with that OEM-US encoding, and the cmd window will be able to echo the characters.
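
To see why the file's encoding matters, this Python sketch (an illustration, not part of Sounak's method) encodes one batch line as OEM-US, assumed here to be Python's cp437, and as UTF-16 little-endian with a FF FE BOM, which is what Notepad calls "Unicode". The OEM-US version has one byte per character, while the UTF-16 version contains a 0x00 byte after every ASCII character, which cmd.exe, reading the file byte by byte, cannot parse as batch code:

```python
line = "echo ┌─┐"

# Notepad++ "OEM-US": one byte per character (cp437).
oem = line.encode("cp437")
# Notepad "Unicode": UTF-16 little-endian with a FF FE byte order mark.
utf16 = b"\xff\xfe" + line.encode("utf-16-le")

print(oem)          # b'echo \xda\xc4\xbf'
print(utf16[:6])    # b'\xff\xfee\x00c\x00'
```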

Sounak


Posted: 05 Feb 2017 10:04
Expert

Joined: 06 Dec 2011 22:15
Posts: 1371
Location: México City, México
As I said in this SO answer:

You just need this equivalences file:

Code: Select all
Notepad: ┌┬┐ ├┼┤ └┴┘ ─ │
cmd.exe: ÚÂ¿ ÃÅ´ ÀÁÙ Ä ³

Notepad: ╔╦╗ ╠╬╣ ╚╩╝ ═ ║
cmd.exe: ÉË» ÌÎ¹ ÈÊ¼ Í º

Copy it into a text file that you save with Unicode encoding. Then, when you want to insert a character in your batch file, just choose the one below the graphic character you want to show!

Note: These characters are correct for code pages 850 or 437.
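
The cmd.exe rows follow mechanically from the code pages: take the OEM byte of each box character and look it up in the ANSI code page. A Python sketch of that derivation, assuming code page 437 for the console and Windows-1252 for Notepad (the combination Steffen points out in the next post):

```python
def ansi_equivalent(text, oem="cp437", ansi="cp1252"):
    """Map box characters to the ANSI characters that share their OEM byte values."""
    return text.encode(oem).decode(ansi)

print(ansi_equivalent("┌┬┐ ├┼┤ └┴┘ ─ │"))   # ÚÂ¿ ÃÅ´ ÀÁÙ Ä ³
print(ansi_equivalent("╔╦╗ ╠╬╣ ╚╩╝ ═ ║"))   # ÉË» ÌÎ¹ ÈÊ¼ Í º
```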

Antonio


Posted: 05 Feb 2017 11:29
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Antonio

It's also a matter of the encoding of the batch code itself. In other words your suggestion will work only if the "ANSI" setting of Notepad points to code page Windows-1252.

Steffen


Posted: 05 Feb 2017 22:15

Joined: 04 Feb 2017 22:03
Posts: 9
Wow! Thank you all for your incredibly helpful replies! :D

I'll have another play around to see if I can work out the best way to do this.

One thing I don't understand... is how Notepad++ uses UTF-8 encoding to save the batch file and it works fine, yet a batch file saved with UTF-8 in Notepad doesn't run properly. :-/

And... This is probably a stupid question, but... if I create a working batch file with a particular encoding (ANSI, UTF-8, etc.) and someone else downloads it onto their computer, will the encoding be preserved? I'm assuming it would be (if they don't edit it)...?


Posted: 06 Feb 2017 10:06

Joined: 16 Dec 2016 22:31
Posts: 98
esuhl wrote:
One thing I don't understand... is how Notepad++ uses UTF-8 encoding to save the batch file and it works fine, yet a batch file saved with UTF-8 in Notepad doesn't run properly. :-/

I'm not quite sure about this, but as far as I have seen, when I open .bat files saved as UTF-8 in Notepad++ with Sublime Text 3, it shows Unicode encoding. Maybe Notepad++ automatically saves the file in an encoding that cmd prefers if a batch-related extension is used. I'm not sure though; it's just an assumption.
esuhl wrote:
And... This is probably a stupid question, but... if I create a working batch file with a particular encoding (ANSI, UTF-8, etc.) and someone else downloads it onto their computer, will the encoding be preserved? I'm assuming it would be (if they don't edit it)...?

The encoding is the scheme used to store the text as bytes. You can easily see the difference by saving the same text as Unicode and as UTF-8 and then comparing the files in a hex editor.
Long story short: yes, the encoding will be kept unless the person edits the file and saves it with a different encoding.
Hope this helped.

Sounak


Posted: 06 Feb 2017 12:59
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
I wrote a simple batch script to explain what happens.
Code: Select all
@echo off &setlocal
echo ä
pause

I saved it using Notepad with ANSI encoding.

If I open it in a hex editor, the ä appears as byte E4. I know that the default ANSI code page on my PC is 1252. Let's have a look at the table:
https://en.wikipedia.org/wiki/Windows-1 ... age_layout
In row E and column 4 you'll find letter ä. So far everything okay.


If I execute the file, it outputs the following (the second line is the German version of "Press any key to continue . . ."):
Code: Select all
õ
Drücken Sie eine beliebige Taste . . .

Why does this happen? As I already told you, Windows and console apps work with different code pages. In my case the default OEM (console) code page is 850. Again, let's have a look at the table.
https://en.wikipedia.org/wiki/Code_page ... age_layout
In row E and column 4 you'll find letter õ. Of course the content of the file didn't change but the interpretation of the byte read by cmd.exe changed. That's the reason why the same byte shows up as ä in Notepad and as õ in the cmd window.
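
The two table lookups can be reproduced in Python (illustration only):

```python
b = bytes([0xE4])                    # the byte Notepad saved for "ä"
notepad_view = b.decode("cp1252")    # ANSI code page 1252
console_view = b.decode("cp850")     # OEM code page 850
print(notepad_view, console_view)    # ä õ
```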


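Next step is UTF-8. Saving the same code as UTF-8 using Notepad and executing it gives the following (the error message is German for "The command "´╗┐@echo" is either misspelled or could not be found."):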
Code: Select all
C:\Users\steffen\Desktop>´╗┐@echo off   & setlocal
Der Befehl "´╗┐@echo" ist entweder falsch geschrieben oder
konnte nicht gefunden werden.

C:\Users\steffen\Desktop>echo ├ñ
├ñ

C:\Users\steffen\Desktop>pause
Drücken Sie eine beliebige Taste . . .

The file seems to be totally broken. First of all we see three characters ´╗┐ in front of @echo off. This is the UTF-8 Byte Order Mark EF BB BF as it shows up in code page 850. See
https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
The BOM is an optional byte sequence. Its purpose is to help parsing programs easily determine that the following text is UTF-8 encoded. Notepad prepends it automatically. Since cmd.exe is designed to parse single-byte encoded source code only, it doesn't know anything about a BOM and parses it byte by byte instead. That's the reason why it fails to execute the first line correctly.
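
A Python sketch of what Notepad writes when it saves as UTF-8, and of how the three BOM bytes look when interpreted with code page 850 (Python's utf-8-sig codec is used here because it prepends the same BOM):

```python
src = "@echo off &setlocal\r\necho ä\r\npause\r\n"

data = src.encode("utf-8-sig")   # UTF-8 with BOM, as Notepad saved it
bom = data[:3]
print(bom.hex())                 # efbbbf
print(bom.decode("cp850"))       # ´╗┐
```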

If I remove the BOM with my hex editor and execute the code, the output looks like this:
Code: Select all
├ñ
Drücken Sie eine beliebige Taste . . .

Now you see two characters echoed instead of only one. That's the nature of UTF-8: any character beyond the 7-bit ASCII range (> 7F) is encoded as a sequence of 2 to 4 bytes.
However, the reason why your code works when you save it as UTF-8 using Notepad++ may be that Notepad++ doesn't prepend the BOM. (I can't say for sure because I don't have it installed.)
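
The byte counts can be checked in Python (an illustration; the sample characters are arbitrary):

```python
utf8 = "ä".encode("utf-8")
print(utf8, len(utf8))          # b'\xc3\xa4' 2
print(utf8.decode("cp850"))     # ├ñ, the two characters cmd.exe echoes in code page 850

# UTF-8 uses 1 to 4 bytes per character depending on the code point.
sizes = [len(c.encode("utf-8")) for c in ("A", "ä", "┌", "\U0001D11E")]
print(sizes)                    # [1, 2, 3, 4]
```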


Now we know the basics. What happens if we upload this batch file and, say, a Russian user downloads it? The byte E4 doesn't change, but ...
The default ANSI code page in Russia is 1251, so Notepad will interpret the E4 as д (see https://en.wikipedia.org/wiki/Windows-1251). When the code is executed, it shows up as ф because the default OEM code page in Russia is 866 (see https://en.wikipedia.org/wiki/Code_page_866). As you can see, any batch code that contains characters beyond the ASCII range isn't portable.
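
The same single byte E4 under the four code pages mentioned here, as a Python sketch:

```python
# One byte, E4, as seen by Western and Russian ANSI/OEM code pages.
CODE_PAGES = [
    ("Western ANSI", "cp1252"),
    ("Western OEM", "cp850"),
    ("Russian ANSI", "cp1251"),
    ("Russian OEM", "cp866"),
]
byte = bytes([0xE4])
shown = {cp: byte.decode(cp) for _, cp in CODE_PAGES}
for label, cp in CODE_PAGES:
    print(f"{label:<13}{cp:<8}{shown[cp]}")
```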

That's the reason why I created a UTF-16 file that contains the box-drawing characters. UTF-16 stays UTF-16 regardless of the local settings of your PC. The Base64 code that is used to create the UTF-16 file is plain ASCII and thus the same in every single-byte code page. The TYPE command converts the UTF-16 text to the code page previously set via CHCP. Of course, this code page has to support these characters; otherwise the conversion will fail.
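
The portability argument can also be sketched in Python (illustrative only; the list of code pages is an arbitrary sample). Because the Base64 text is pure ASCII, it is stored as identical bytes under each of these single-byte code pages, and decoding it always yields the same UTF-16 text:

```python
import base64

B64 = ("//4CJSQlYSViJVYlVSVjJVElVyVdJVwlWyUQJRQlNCUsJRwlACU8JV4lXyVa"
       "JVQlaSVmJWAlUCVsJWclaCVkJWUlWSVYJVIlUyVrJWolGCUMJQ==")

# The batch line that echoes B64 is saved as the same bytes on each of these systems.
SAMPLE_CODE_PAGES = ["cp437", "cp850", "cp866", "cp1250", "cp1251", "cp1252"]
stored = {cp: B64.encode(cp) for cp in SAMPLE_CODE_PAGES}
identical = len(set(stored.values())) == 1

# ...so decoding produces the same UTF-16 text (BOM handled by the codec) everywhere.
decoded = {base64.b64decode(b).decode("utf-16") for b in stored.values()}
print(identical, len(decoded))   # True 1
```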

Steffen


Posted: 07 Feb 2017 13:43

Joined: 28 Jun 2010 03:46
Posts: 272
Thank you Steffen for such a great explanation on Code Pages!

Your idea for displaying these characters under different Code Pages is great. (but your example does not work for me).

Currently (because my .cmd is in use only by me) I am using this: I have this in an ANSI .txt file:

Code: Select all
    218 196  194  196  191
    +-   -   -+-   -   -+
    |Ú   Ä    |   Ä   ż|
     
   ł|179     ł|179     ł|179
 
    |195      |197      |180
    +-   -   -+-   -   -+
    |Ă   Ä   Ĺ|    Ä   ´|
 
   ł|179     ł|179     ł|179
 
    |Ŕ   Ä    |Á   Ä   Ů|
    +-   -   -+-   -   -+
    192 196  193  196  217


Here I have a DECIMAL value and an OEM character that I must use with NOTEPAD. It displays correctly under CMD window.
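
Saso's mapping can be reproduced in Python, assuming, as Steffen recalls later in the thread, that his ANSI code page is Windows-1250. Each decimal code is a box character in OEM code page 437 but a letter in Windows-1250. This only checks the mapping and is not Saso's workflow:

```python
# Decimal codes from Saso's table.
CODES = [218, 196, 194, 191, 179, 195, 197, 180, 192, 193, 217]

table = {code: (bytes([code]).decode("cp437"),    # console (OEM) glyph
                bytes([code]).decode("cp1250"))   # Notepad glyph, assuming ANSI 1250
         for code in CODES}
for code, (oem, ansi) in table.items():
    print(code, oem, ansi)
```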

Saso


Posted: 07 Feb 2017 13:47

Joined: 28 Jun 2010 03:46
Posts: 272
Steffen: your example is displayed like this.

[Screenshot not preserved]

(I added CHCP command to show current CP)

Saso


Posted: 07 Feb 2017 14:25
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
miskox wrote:
(but your example does not work for me)

That's interesting. There are two possible reasons:
1) Code page 437 is not supported on your PC. Try one of the others that I noted in the comments.
2) The font does not support the missing characters. Change your console font to either Lucida Console or Consolas.

miskox wrote:
Currently (because my .cmd is in use only by me) I am using this: I have this in an ANSI .txt file:

Not surprisingly, this doesn't work for me. Since you posted the content of your file (instead of the file itself), you brought another player into the game: the clipboard. It's important to understand that we are now talking about windows and fonts instead of files and code pages. A window (such as the browser window) is basically able to display any character; it's only limited by the characters that its font supports. The clipboard copies the characters from one window and pastes them into the target window. As long as the target font supports the pasted characters, they will show up as the same characters in the target window.
Let's take one character out of your example:
Code: Select all
@echo off &setlocal
echo ż
pause

I copied the ż from your post and pasted it into Notepad. It shows up correctly in Notepad because the font supports the ż. The clipboard made a good job.

If I try to save the code, Notepad complains that I will lose some content if I continue saving the file ANSI encoded. Why? The default ANSI code page on my PC is 1252 (yours is 1250, as far as I remember). This code page doesn't support the character ż.
Conversely, I suppose Antonio's proposal won't work for you (while it works for me).
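
The Notepad warning can be modeled with a short Python check. It is a sketch that only asks whether a code page contains the characters, not a reproduction of Notepad, and it uses Windows-1252 and Windows-1250, the two ANSI code pages mentioned in this exchange:

```python
def ansi_can_save(text, ansi):
    """True if `text` can be stored in the single-byte code page `ansi`."""
    try:
        text.encode(ansi)
        return True
    except UnicodeEncodeError:
        return False

# Saso's character on a Windows-1252 system vs. a Windows-1250 system.
print(ansi_can_save("echo ż", "cp1252"))    # False: Notepad would warn
print(ansi_can_save("echo ż", "cp1250"))    # True
# The other way around: Antonio's characters on a Windows-1250 system.
print(ansi_can_save("echo ÚÂ¿", "cp1250"))  # False: ¿ is missing
```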

Steffen


Posted: 09 Feb 2017 06:43

Joined: 28 Jun 2010 03:46
Posts: 272
Steffen: you were correct: I changed the font and now it is ok.

Saso


Posted: 09 Feb 2017 09:33
Expert

Joined: 22 Jan 2010 18:01
Posts: 2651
Location: Germany
Thanks for your feedback, Saso. This will help others solve this potential problem.

Steffen


Posted: 09 Feb 2017 11:18

Joined: 04 Feb 2017 22:03
Posts: 9
Amazing! Thank you all for your help! I think I'm starting to understand now... :-)

Notepad++ has encoded the batch file in UTF-8 format without a BOM, which is why the box characters in it are preserved when the file is saved and reopened. And, although the console requires batch files to be in single-byte ANSI format, UTF-8 files (without BOM) still work because (apparently) UTF-8 is identical to ANSI/ASCII in the lower (0-127) character range, and (presumably?) the multi-byte box characters are just echoed to the text file by the interpreter as single bytes without it needing to know anything about UTF-8...?
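
The byte-level part of this reasoning can be illustrated in Python. This only shows what ends up in the file; how cmd.exe passes those bytes through is the hypothesis above, not something this sketch tests:

```python
line = "echo ┌─┐"

# Notepad++ "UTF-8" without a BOM.
utf8 = line.encode("utf-8")

# The ASCII part ("echo ") is byte-for-byte the same as plain ASCII...
ascii_part_same = utf8[:5] == line[:5].encode("ascii")
# ...while each box character becomes a three-byte sequence.
box_bytes = utf8[5:]
print(ascii_part_same, box_bytes.hex())   # True e2948ce29480e29490
```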

So... it sounds like this is a fairly safe way to echo box characters to a file on any Windows system.

That just leaves echoing box characters to screen. The console isn't (by default) compatible with UTF-8, so the easiest way to use box characters would be to chcp to a suitable code page, and use the equivalent characters (accented E, etc.) in the batch file, knowing that they will appear as box characters on screen on any system.

But from the recent comments above, this all depends upon the console font...? So, I just need to find a way to reset the font to the default and the box characters will appear correctly. Is this possible?

-----------------------------

Whilst looking into this, I had another idea. I saw that CMD.EXE can be started with the /U switch to start it in Unicode mode. At first I thought this wouldn't help as I wanted my batch file to run by itself without needing the user to manually start a command prompt first. Then I wondered if I could write a batch file to open a second console window in Unicode mode and re-direct the batch commands to it. I came across the following threads, which suggest solutions, but neither works. :-/

http://alt.msdos.batch.nt.narkive.com/7gd221w9/how-to-cmd-exe-switches-u-or-a-from-inside-a-dos-batch-script

https://www.pcreview.co.uk/threads/how-to-cmd-exe-switches-u-or-a-from-inside-a-dos-batch-script.3520673/

I also found a list of valid code page numbers (link below), and 65001 is supposedly UTF-8. Unfortunately it doesn't work in batch files: I get the message, "The system cannot write to the specified device." I thought that was because the console isn't aware of multi-byte characters. But then I opened a new command window with "CMD /U", ran the batch file containing the "chcp 65001" command from there, and it failed again with the same message.

https://msdn.microsoft.com/en-us/library/windows/desktop/dd317756(v=vs.85).aspx


Posted: 09 Feb 2017 12:52

Joined: 04 Feb 2017 22:03
Posts: 9
aGerman wrote:
miskox wrote:
(but your example does not work for me)

That's interesting. There are two possible reasons:
1) Code page 437 is not supported on your PC.


So... a computer won't necessarily support all valid code pages...? Is the only reliable solution to stick to pure ASCII characters?

