
Can I get access to an ascii character not in the standard set?

Posted: 27 Oct 2017 00:54
by Jer
My project, a batch-only script that draws boxes and borders around text and
handles any ASCII character except codes 7, 8, 9, 10, 13, 26 and 27, is doing well
up to a point, and now I need advice from the experts.

One of the options is entering all codes and text content from the command-line.
The other way to designate a text source is by entering a file ID, and the box or
border configuration codes are also entered from the command-line.

For entering all text from the command-line, I found that substituting poison characters
up front made it possible to configure (edit) the text for output without the
issues these characters cause when being manipulated. Some characters that don't
do well on the command-line are entered as codes instead: [AP] = ampersand, [GT] = greater-than
symbol, [QT] = double quote (which may cause an error), etc.
So an entry might be: boxtool 2 /L30 /+ {#Quoth the raven [qt]nevermore[qt]
which is box type #2, length of text area is 30 chars., center the line of text.
{# means that everything that follows is contents inside the box.
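
As an illustration only (this is not the actual boxtool code, which is not shown in this thread), bracket codes like these could be turned back into real characters with delayed-expansion substitution. The variable name text and the sample sentence are made up for this sketch.

```batch
@echo off
setlocal enableExtensions enableDelayedExpansion
rem Hypothetical input that uses the bracket codes instead of poison characters.
set "text=Tom [ap] Jerry said [qt]a [gt] b[qt]"
rem Swap each code for the real character; the search text is case-insensitive.
set "text=!text:[ap]=&!"
set "text=!text:[gt]=>!"
set "text=!text:[qt]="!"
rem Delayed expansion keeps the restored characters from being parsed as operators.
echo(!text!
endlocal
```

Run as written, this should print: Tom & Jerry said "a > b"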

I also found that changing the double quote to another character, ASCII #160, until
I am ready to echo to the output file, prevented errors during the edit process when
the double quote was in the same string as other poison characters.

The problem is that ASCII char. #160 is lost to the user because it is dedicated as the substitution character.
The excluded characters mentioned above do not work as substitution characters.

The solution needs a single substitution character, held in a variable, from another character set.
It has to be exactly 1 character because the text is being aligned.
Thanks for your advice.

Re: Can I get access to an ascii character not in the standard set?

Posted: 27 Oct 2017 02:14
by penpen
You could access any Unicode character using UTF-8 (codepage 65001):

Code:

@echo off
setlocal enableExtensions enableDelayedExpansion
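rem The next line keeps Notepad from autoconverting this file to UTF-8; save the file as ANSI.
::ÿþ
rem Remember the active codepage; fall back to 850 if it cannot be read.
set "cp="
for /F "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
if not defined cp set "cp=850"
rem Under UTF-8 the ANSI-stored bytes E2 99 A9 are read as U+2669, the quarter note.
>nul chcp 65001
set "quarterNote=â™©"
rem Restore the original codepage.
>nul chcp %cp%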

echo(!quarterNote!

endlocal
goto :eof
Notes:
There is no guarantee that such characters are displayed in a specific way (it depends on the font used).
I've added the "::ÿþ" line to prevent Notepad from autoconverting this text file to UTF-8; you have to store this file using ANSI encoding (which should be the default).


penpen

Re: Can I get access to an ascii character not in the standard set?

Posted: 27 Oct 2017 10:03
by Jer
Thanks penpen. The substitution works but I am having a problem restoring the " characters.

Code:

@echo off
setlocal enableExtensions enableDelayedExpansion
::ÿþ
set "cp="
for /F "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
if not defined cp set "cp=850"
>nul chcp 65001
set "quarterNote=â™©"
>nul chcp %cp%
echo(!quarterNote!
Set "string="quotes""
echo string var: %string%
Set "string=%string:"=!quarterNote!%"
echo quote characters substituted: %string%

rem not working:
Set "string=%string:!quarterNote!="%"
echo string should be restored: %string%

endlocal
goto :eof

C:\Temp\DOSBatch>test093

string var: "quotes"
quote characters substituted: ♩quotes♩
string should be restored: ♩quotes♩

Re: Can I get access to an ascii character not in the standard set?

Posted: 27 Oct 2017 10:20
by aGerman
I'm surprised that your first replacement works.

Code:

@echo off
setlocal enableExtensions enableDelayedExpansion
::ÿþ
set "cp="
for /F "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
if not defined cp set "cp=850"
>nul chcp 65001
set "quarterNote=â™©"
>nul chcp %cp%
echo(!quarterNote!
Set "string="quotes""
echo string var: %string%
Set "string=!string:"=%quarterNote%!"
echo quote characters substituted: %string%

Set "string=!string:%quarterNote%="!"
echo string should be restored: %string%

endlocal
goto :eof

Steffen
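
Why the order matters, as far as the parser rules go: %...% is expanded when the line is read, and !...! only afterwards. In the failing line, the percent substitution therefore searches the string for the literal text !quarterNote!, finds nothing, and leaves the string unchanged. The first replacement works by accident: each quote is replaced with the literal text !quarterNote!, which delayed expansion then turns into the note. A minimal sketch of the working order, using # as a stand-in (an assumption made here so the example stays plain ASCII):

```batch
@echo off
setlocal enableExtensions enableDelayedExpansion
rem sub is a stand-in for the quarter note so this sketch stays plain ASCII.
set "sub=#"
set "string="quotes""
echo original:    !string!

rem Percent expansion inserts the # first; delayed expansion then does the replacing.
set "string=!string:"=%sub%!"
echo substituted: !string!

rem Same order in reverse: the pattern is complete before delayed expansion runs.
set "string=!string:%sub%="!"
echo restored:    !string!
endlocal
```

The three echo lines should show "quotes", then #quotes#, then "quotes" again.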

Re: Can I get access to an ascii character not in the standard set?

Posted: 27 Oct 2017 21:57
by Jer
penpen & aGerman, you made my day. Thanks a million!

One day my non-hybrid batch-zilla creation will be cleaned up and ready
for sharing in zipped format with the forum.

Jerry

https://www.dropbox.com/s/yiwywbwrznq6s4h/image0004.jpg?dl=0

edit: I need instruction on how to embed an image. I have not used dropbox
for a long time and this attempt did not work. Thanks.

Re: Can I get access to an ascii character not in the standard set?

Posted: 31 Jan 2018 14:31
by Jer
penpen wrote:
You could access any unicode character using UTF-8 (codepage 65001)
How is this done? I put your solution to work in temporarily replacing the double quote
while the input string is being interpreted for text content and other coded instructions,
and now would like to explore doing the same for < and > in a particular function that
brings files together, side-by-side. The files could contain any ascii character, excluding
7 characters (7,8,9,10,13,26,27) which can't be echoed to file and displayed with the TYPE command.

Can you describe what is going on with the quarternote solution? How would I
assign to variables other UTF-8 characters, different from code page 437 characters,
and if possible, be able to get the characters myself without just copying and pasting
your codes?
Jerry

Re: Can I get access to an ascii character not in the standard set?

Posted: 02 Feb 2018 17:39
by penpen
You just need to create a byte sequence (for example using your Windows default ANSI codepage) that equals a UTF-8 code, see:
https://en.wikipedia.org/wiki/UTF-8#Description.

Assuming your default ANSI codepage is 1252:
https://en.wikipedia.org/wiki/Windows-1252
Note that this codepage leaves some byte values unmapped (81, 8D, 8F, 90, ... in hex).
If you need these values, you could switch to another codepage that defines a mapping for them.
If I remember right, codepage 850 should be fully defined for all single-byte values (00 .. FF).

A list of all UTF-8 character encodings can be found, for example, here:
http://www.fileformat.info/info/charset/UTF-8/list.htm


The UTF-8 code for the "QUARTER NOTE (U+2669)" character is "E2,99,A9" (in hex), see:
http://www.fileformat.info/info/charset ... start=8192

Now you look up these three bytes in the codepage 1252 table, and you get "â™©".


penpen
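
The byte sequence can also be computed instead of looked up. A rough sketch, assuming a code point in the range U+0800 to U+FFFF (the three-byte UTF-8 form); the variable names are made up here, and the bytes are printed in decimal rather than hex.

```batch
@echo off
setlocal enableExtensions
rem Code point to encode (hypothetical variable name); 0x2669 is QUARTER NOTE.
set /a "codepoint=0x2669"
rem Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
set /a "b1=0xE0 | (codepoint >> 12)"
set /a "b2=0x80 | ((codepoint >> 6) & 0x3F)"
set /a "b3=0x80 | (codepoint & 0x3F)"
rem Prints the bytes in decimal: 226 153 169, which is E2 99 A9 in hex.
echo %b1% %b2% %b3%
endlocal
```

Each of the three byte values is then looked up in the table of the ANSI codepage the batch file is saved in, to find the character to type.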

Re: Can I get access to an ascii character not in the standard set?

Posted: 09 Feb 2018 01:22
by Jer
I spent over an hour browsing through the links, viewing the huge list of UTF-8 codes and characters,
and I can't say all that reading did me any good. You gave me the solution when you
identified the hex code for quarternote, E2,99,A9. Next in line is eighth note E2,99,AA.
My text editor, KEDIT, allows me to enter a character by typing in the hex code, so "ci x'e299aa'"
inserts what represents the eighth note character: ♪
(ci = column insert at the current line, current column of focus)

An observation: the eighth note is visible when echoed or viewed with the SET command
but you can't see it in a file. I tried dozens of other UTF-8 characters and they all
displayed as a question mark. My code page is 437.

I had prepared a short batch file to demonstrate the advantage of double quote substitution
in a string that also has other poison characters, and that string will be edited, such as
putting the text in cells by adding interior cell walls at selected intervals.
Unfortunately, the test string (~|*&"!│=:<%>^) &"!│=:<%>^)(~|*, after being edited by inserting
a character (a vertical cell wall) in the middle, did not produce the same errors in the test script
that it did in my current project, errors like "& was not expected" or "| was not expected".

Quarternote substitution for double quotes did solve those issues in the big batch script,
which makes borders and boxes around text, file appending, prepending, overlay, and
patterns too! Incoming text lines have their quotes substituted, then restored just before
the final product is displayed, a TYPEd file.

At this point, substituting other characters is not needed. Getting the double quotes
out of the way creates harmony among the cast of characters.

Thanks penpen.

Re: Can I get access to an ascii character not in the standard set?

Posted: 09 Feb 2018 05:07
by penpen
Jer wrote:
09 Feb 2018 01:22
An observation: the eighth note is visible when echoed or viewed with the SET command
but you can't see it in a file. I tried dozens of other UTF-8 characters and they all
displayed as a question mark. My code page is 437.
This is the default behaviour.
The command shell internally stores characters as UTF-16LE codepoints, supporting all current Unicode characters, so it is able to display these characters correctly (if the currently selected command shell font contains a glyph for that character). If you print such a character to a file, the command shell uses the codepoint that your active codepage (437 in your case) has for this character.
If a character isn't part of this codepage, the command shell just writes a predefined replacement character of that codepage, which in most cases is the '?' character.

penpen
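
A small sketch to observe this, not a guaranteed result on every Windows version. It assumes the file is saved as ANSI with the quarter note typed as the three codepage 1252 characters for the bytes E2 99 A9; the output file names utf8.txt and oem.txt are made up for the example.

```batch
@echo off
setlocal enableExtensions enableDelayedExpansion
rem Save this file as ANSI; utf8.txt and oem.txt are hypothetical file names.
set "cp="
for /F "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
if not defined cp set "cp=850"
>nul chcp 65001
set "quarterNote=â™©"
rem Written while codepage 65001 is active: the file should get the bytes E2 99 A9.
>utf8.txt echo(!quarterNote!
>nul chcp %cp%
rem Written under the original codepage: a character it lacks is usually written as a question mark.
>oem.txt echo(!quarterNote!
endlocal
```

Comparing the two files in a hex viewer should show whether the character survived or was replaced.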

Re: Can I get access to an ascii character not in the standard set?

Posted: 22 May 2018 01:33
by Jer
Capturing a character from another codepage is changing the font
in the console window from a raster font to Courier New 12pt.
It occurs after this line is executed: >nul chcp 65001
The width of the window is reduced when this happens,
and it happens every time I run the code.

I did Windows 10 (64bit) updates today and noticed the problem right after.
Re-booting did not help. Any ideas?

edit: same results running in safe mode: run msconfig, selective startup, "system services" only checked

Code:

@echo off
setlocal enableExtensions enableDelayedExpansion
::ÿþ
set "cp="
for /F "tokens=2 delims=:." %%a in ('chcp') do set "cp=%%~a"
if not defined cp set "cp=850"
>nul chcp 65001
set "quarterNote=â™©"
>nul chcp %cp%
echo(!quarterNote!
Set "string="quotes""
echo string var: %string%
Set "string=!string:"=%quarterNote%!"
echo quote characters substituted: %string%

Set "string=!string:%quarterNote%="!"
echo string should be restored: %string%

endlocal
goto :eof

Re: Can I get access to an ascii character not in the standard set?

Posted: 22 May 2018 10:23
by penpen
Jer wrote:
22 May 2018 01:33
Capturing a character from another codepage is changing the font
in the console window from a raster font to Courier New 12pt.
It occurs after this line is executed: >nul chcp 65001
The width of the window is reduced when this happens,
and it happens every time I run the code.
:shock:
Never expected something like that to happen by only changing the codepage!

I might expect that to happen if a character outside the codepage is displayed in the shell, so sorry to ask, but:
Are you sure it isn't one of the "echo" lines?

Also, I've just tested the above code on Windows XP (32-bit, virtual image), Win 8.1 x86 (German localization, virtual image), and Win 10 x64 (German localization), and that never happened to me.
Which version of Windows do you use (and which codepages exactly are involved, so I can try to provoke that on one of my systems)?


penpen

Re: Can I get access to an ascii character not in the standard set?

Posted: 22 May 2018 13:44
by Jer
My Windows version: Microsoft Windows [Version 10.0.17134.48]

Now it is the next day and I can't reproduce the problem.
The only thing different yesterday was doing the Windows updates and
I did a video capture (for the first time) with the Windows 10 XBOX app.

I did another video capture today and the problem came back. So that
seems to be the cause of the issue, the XBOX app. Then I shut down and immediately
started up...same issue. Then I shut down and started up 15 minutes later
and the problem went away.

Thanks for your post penpen. I had put a pause after each line in the batch
script to see which line made the font change. Life is now good and my project
goes over one more hurdle.
Jerry

Re: Can I get access to an ascii character not in the standard set?

Posted: 22 May 2018 15:11
by penpen
You should report this bug to Microsoft, so I won't get that (experimental?) update (I'm on 10.0.16299.431):
https://www.quora.com/How-do-I-report-a ... -Microsoft.


penpen

Re: Can I get access to an ascii character not in the standard set?

Posted: 24 May 2018 00:24
by Jer
The issue of chcp 65001 changing a raster font to Courier New is back and
will not go away after 2 days. When I open a cmd window with the raster font
8x12 and type chcp 65001 the font changes to Courier New 12pt. This does not
happen with the other 9 raster font sizes. 8x12 happens to be the one I prefer.

This morning I re-installed Windows 10 and that did not solve the problem.
A potential bug in a pair of 8) batch utilities under development.
Any tricks to try fixing this?

Re: Can I get access to an ascii character not in the standard set?

Posted: 26 May 2018 14:42
by penpen
I don't even have a guess why something like that happens... (and I've never heard of such a bug before).

I would ask for technical help on a Microsoft forum.
If you find a solution, please also post a note here.

I'm sorry that I can't help you more :( .


penpen