a little unicode related subtopic

taripo
Posts: 227
Joined: 01 Aug 2011 13:48

#1 Post by taripo » 13 Dec 2011 15:17

This is the part of the unicode discussion that started in the set /a thread and has been moved here.
Last edited by taripo on 13 Dec 2011 15:48, edited 1 time in total.

#2 Post by taripo » 13 Dec 2011 15:18

Ed Dyreen wrote:'
...
for /f "usebackq tokens=1-3 delims=¦" %%b in ( '"1"¦"4"¦"RandomNumber"' ) do %Get.RandomNumber.TokenSTR% %()% >nul
call :LABEL%RandomNumber%



I see if I try to copy/paste ¦ into a cmd prompt, I get a white mark, it's the ANSI broken bar \xA6.

If I put that in a batch file, and TYPE blah.bat it shows ª (the IBM Extended ASCII symbol \xA6).

Why not write ª in the code? I guess that's what is executed when you run a batch file with that..

Do you have ¦ on your keyboard mapping to broken bar, so it's convenient to type, but if you do, how do you then type a proper pipe | ? Does your keyboard have both characters?

Your batch file works..

But if I try to get something simple to work.. That delimiter doesn't work for me as a delimiter. Comma does.. but ¦ doesn't


notepad fff.bat displays
for /f "delims=¦" %%f in (a¦b) do echo %%f
for /f "delims=," %%f in (a,b) do echo %%f


C:\gaa>type fff.bat
for /f "delims=ª" %%f in (aªb) do echo %%f
for /f "delims=," %%f in (a,b) do echo %%f


C:\gaa>echo abc>a

C:\gaa>echo def>b

C:\gaa>fff

C:\gaa>for /F "delims=ª" %f in (aªb) do echo %f
The system cannot find the file aªb.

C:\gaa>for /F "delims=," %f in (a b) do echo %f

C:\gaa>echo abc
abc

C:\gaa>echo def
def

C:\gaa>

#3 Post by taripo » 13 Dec 2011 15:20

Taripo wrote: If I put that in a batch file, and TYPE blah.bat it shows ª

Ed wrote: I have no clue, I don't have that problem. You can use whatever delimiter you like.

Code:

C:\PROFSYS\ADMIN>prompt $

@echo off
for /f "tokens=1-26 delims=¦" %a in ( "this¦works" ) do echo.a=%~a_ &echo.b=%~b_

a=this_
b=works_

#4 Post by taripo » 13 Dec 2011 15:22

what version of windows?

i'm on XP.

This pic shows the straight pipe, and the broken bar.. As they appear, in a webpage in chrome, or in notepad.

here is a link to fff.bat
http://ge.tt/9ApVvRA

This shows fff.bat shown with TYPE, run, and opened in Notepad.
Image

#5 Post by taripo » 13 Dec 2011 15:23

I have just replaced the delimiter in your script.. and it (still) works (this time without the weirdness).. Before, in your script, I tried replacing the delimiter with , and it failed. But replacing the delimiter with _ worked.

Not sure what's happening with the | and ¦ though.

What do your ones look like, one straight pipe, one broken bar? you see my screenshots.

#6 Post by taripo » 13 Dec 2011 15:24

Ed wrote: hmm, do you really need command.COM

Try .CMD for batch extension...

Taripo wrote:
.cmd looks exactly the same.

I included a link to the file - 3 posts up.

I am curious if you get a different result from exactly the same batch file.

It works though.

And the main thing, is your script works too, and without weirdness since I changed the delimiter to _

Ed wrote:
I get a decimal value 166 and 167 from 0xA6, and 0xA7, however ¦ is dec 221 = 0xDD

#7 Post by taripo » 13 Dec 2011 15:24

Ed Dyreen wrote:'
I get a decimal value 166 and 167 from 0xA6, and 0xA7, however ¦ is dec 221 = 0xDD




166 (decimal) is 0xA6 (hexadecimal), etc. So it'd be that for anybody.

¦ is not 0xDD. ▌ is 0xDD



My CMD cannot print ¦ But I can try to paste it in manually to the console.
It doesn't exist in IBM Extended ASCII, and it maps to another thing

it prints ▌ and ▌ is 0xDD and it works as a delimiter.


C:\gaa>for /F "tokens=1-2 delims=▌" %f in ("a▌b") do @echo %f - %g
a - b

My CMD prompt displays IBM Extended ASCII.. It sees 0xA6, doesn't know it's a broken bar, and it displays what is 0xA6 in Extended ASCII which is
http://www.jimprice.com/ascii-dos.gif
row A, column 6
also here 0xA6
http://ascii-table.com/ascii-extended-pc-list.php
it's the little a thing.


What version of Windows are you using? It looks like
What text editor are you using?

I am using Notepad, and saving in ANSI. It has a broken bar 'cos ANSI does, and it saves it as 0xA6. When the CMD prompt tries to execute a bat file with 0xA6, or TYPE a bat file with 0xA6, it thinks it is a little 'a' thing. As you see, that is 0xA6 in Extended ASCII.
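
The disagreement over 0xDD can be checked by decoding the two bytes under each codepage. This is a minimal sketch in Python, not part of the original thread. It assumes the codepages involved are 1252 (Notepad's "ANSI") and 437 or 850 (two common OEM codepages); `glyphs` is a hypothetical helper name.

```python
# Decode the same byte under several Windows codepages.
# Assumption: cp1252 is the "ANSI" codepage and cp437/cp850 are the OEM
# candidates; which OEM codepage a given console uses depends on its locale.
CODEPAGES = ("cp1252", "cp437", "cp850")

def glyphs(byte_value):
    """Return {codepage: character} for a single byte value."""
    raw = bytes([byte_value])
    return {cp: raw.decode(cp) for cp in CODEPAGES}

for value in (0xA6, 0xDD):
    row = ", ".join(f"{cp}={ch!r}" for cp, ch in glyphs(value).items())
    print(f"0x{value:02X}: {row}")
```

Under these assumptions 0xA6 is the broken bar only in codepage 1252, while 0xDD is the broken bar in codepage 850 but the half block ▌ in codepage 437, so both posters may be describing their own console correctly.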

#8 Post by taripo » 13 Dec 2011 15:25

That 0xDD character is not the pipe | or ¦

It doesn't work as a pipe.

C:\WINDOWS>dir ▌ more
Volume in drive C has no label.
Volume Serial Number is FC9D-4769

Directory of C:\WINDOWS


Directory of C:\WINDOWS

File Not Found

C:\WINDOWS>


It sounds like maybe you type ¦ and it comes out as ▌ in your bat file.
But then what would you type for a pipe to come out?

What are you using to write your bat file?

#9 Post by taripo » 13 Dec 2011 15:25

It depends on your locale settings which codepages are used. You will find them in the registry.

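Code:

@echo off &setlocal
rem Read the OEM and ANSI codepage numbers from the registry.
rem tokens=2* puts the value type in %%i and the value data in %%j.
for /f "tokens=2*" %%i in ('reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v "OEMCP"') do set /a OEMCP=%%j
for /f "tokens=2*" %%i in ('reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v "ACP"') do set /a ACP=%%j
rem Show both variables, then wait for a key press.
set OEMCP
set ACP
pause>nul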

That code returns for my German settings:

Code:

OEMCP=850
ACP=1252

That means codepage 850 (the OEM codepage, loosely called "ASCII" here) and codepage 1252 (ANSI).

If you save your code in ANSI, the command window nevertheless interprets it using the OEM codepage. For that reason some characters are not displayed in the same manner.
E.g. Hex 0xA9 represents character © in codepage 1252 but character ® in codepage 850.
(ref. http://en.wikipedia.org/wiki/Code_page_850, http://en.wikipedia.org/wiki/Windows-1252)
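
The 0xA9 example can be reproduced outside the console. This is a minimal sketch in Python, not part of the original post, assuming ANSI = 1252 and OEM = 850 as in the registry output above; `as_seen_in_console` is a hypothetical helper name.

```python
# One byte, two interpretations: the editor saves it using the ANSI codepage,
# the console displays it using the OEM codepage.
# Assumption: ANSI = cp1252 and OEM = cp850.
ANSI, OEM = "cp1252", "cp850"

def as_seen_in_console(text):
    """Encode text as the editor would, then decode as the console would."""
    return text.encode(ANSI).decode(OEM)

print(as_seen_in_console("\u00a9"))  # the byte 0xA9 re-read as cp850
print(as_seen_in_console("\u00a6"))  # the byte 0xA6 re-read as cp850
print(as_seen_in_console("a,b"))     # plain ASCII survives unchanged
```

Only characters above 0x7F are affected, which would fit the observation earlier in the thread that a comma or underscore delimiter behaves the same everywhere.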

BTW: That discussion is a bit off topic in a "SET /A" thread, isn't it :wink:

Regards
aGerman

#10 Post by taripo » 13 Dec 2011 15:46

all the above was originally from the set /a thread.
we then continued
viewtopic.php?f=3&t=2550

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

#11 Post by Ed Dyreen » 23 Jan 2012 00:38

'
Taripo reported codepage problems in this topic
viewtopic.php?f=3&t=1817&start=0&hilit=code+page+850
And took the discussion to a new level here
viewtopic.php?f=3&t=2550&start=0

To be very clear, this character as you see it '¦' I use as data delimiter, in my case this is safe as it should never occur in any delimited data. I use codepage 850, and my batch has no problems with it at all !

Code:

for /f "usebackq tokens=1-3 delims=¦" %%b in ( '"MinimumSTR"¦"MaximumSTR"¦"StoreVAR"' ) do

If I run

Code:

chcp 850

will my batch then finally work for people with different codepages :?:
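
One way to look at this question is to ask which byte the broken bar becomes in each codepage. This is a minimal sketch in Python, not part of the original post, assuming readers might have codepage 437, 850 or 1252; `byte_for` is a hypothetical helper name.

```python
# Which byte does the broken bar U+00A6 become in each codepage?
# Assumption: cp437, cp850 and cp1252 are the codepages readers might have.
BROKEN_BAR = "\u00a6"

def byte_for(char, codepage):
    """Return the single byte for char in codepage, or None if it has none."""
    try:
        return char.encode(codepage)[0]
    except UnicodeEncodeError:
        return None

for cp in ("cp437", "cp850", "cp1252"):
    value = byte_for(BROKEN_BAR, cp)
    print(cp, "no such character" if value is None else f"0x{value:02X}")
```

This only shows the byte each codepage assigns. Whether `chcp 850` makes the batch portable is not settled by the sketch, since that also depends on which byte the editor actually saved into the file.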

#12 Post by taripo » 23 Jan 2012 00:59

All the unicode-related posts in that set /a thread you link to are now in this thread. I pasted them in 'cos they were a bit off topic within that set /a thread.

Also,

here i've pasted your code into the cmd prompt..

here are two command prompts, one with lucida console, one with raster fonts.

Image

It might not make any difference, but which font are you using, and do you also get the display that picture shows when you paste it in the command prompt?

Here are images for when I paste it into notepad save it then I do TYPE on it.
Image

Note: aGerman mentioned two places where codepages are set. Whether that makes a difference to the above I don't know, probably not, though it does matter for some things. I'd have to check back on what he showed me there, 'cos the posts cover it.
Also I didn't save it as unicode.. which I suppose I should have...

Are you also using the unicode switch (/u) on cmd.exe? That's another thing aGerman mentioned. In the screenshots I've done I haven't used it. I haven't really tried the two codepage settings or the unicode switch for a while, only the last time it was discussed, though I may look back at it.

Here is the file saved as unicode (as opposed to ANSI which is notepad's default)
Image

And that big thick bar thing is funny business 'cos it's not even the broken bar character, it's 0xDD. So a broken bar pasted in when the console is set to raster fonts gets converted to that.

#13 Post by Ed Dyreen » 23 Jan 2012 01:14

'
Hi Taripo, what a coincidence :)

Well it looks like a small a with an underscore 'ª', but that is just how it displays.

I only care if the code is affected and whether forcing codepage 850 makes my code work.
You reported it didn't on your OS with your codepage and that you solved it with an underscore.
I want to make it work for everyone with this '¦' delimiter. :?

#14 Post by taripo » 23 Jan 2012 01:15

by the way Ed, until you made your post, this thread was continuing here
encodings
viewtopic.php?f=3&t=2550

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

#15 Post by Liviu » 23 Jan 2012 01:23

Ed Dyreen wrote:'
Taripo reported codepage problems in this topic [...]
Aacini wrote:My PIPE.COM program is only 69 bytes in size, so it is a very good replacement of FINDSTR to be used in these cases. If you are worried about where to get my program from, here it is; just copy the 69 bytes below to a file named PIPE.COM and it is ready to run:

ë2´)€ì!Í!ŠÐŠà€Ä!€ü.u.€þ+u)R²A€ê!´#€ì!Í!Z´#€ì!Í!Šò€Æ!´,€ì!Í!"ÀuôLÍ!ëã

I also read about the .COM files that can be created with batch, but this requires me to save the batch as unicode otherwise I'll get a warning of unsupported characters , but if I do that my batch scripts won't execute ! Only if I save them as ansi !

So how could that ever work ?

Haven't followed the old thread, but as far as copy/pasting extended characters into .com executables also consider viewtopic.php?p=12950#p12950 or, more to the point, do _not_ do it unless you completely positively understand the mechanics and possible pitfalls. You certainly do _not_ want to save it as unicode, and saving as "ansi" may or may not work depending on your vs. the original poster's codepage settings and chosen editor behaviors.
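
To make the pitfall concrete, here is a minimal sketch in Python, not part of the original post. It assumes the listing was displayed in codepage 1252 and uses only the first four characters of the quoted text for illustration; the byte values shown are whatever the chosen encodings produce, not a claim about the real PIPE.COM contents.

```python
# Why the save encoding matters for pasted binary: the same four characters
# produce different bytes on disk depending on the encoding chosen.
# Assumption: the characters were originally displayed in cp1252; the sample
# is only the first four characters of the quoted listing, for illustration.
sample = "\u00eb2\u00b4)"  # ë 2 ´ )

for label, encoding in (("ansi (cp1252)", "cp1252"),
                        ("oem (cp850)", "cp850"),
                        ("unicode (utf-16)", "utf-16")):
    data = sample.encode(encoding)
    print(f"{label:17} {len(data):2} bytes: {data.hex(' ')}")
```

Saving as "unicode" adds a byte-order mark and doubles every character, so the file can no longer be the intended 69 bytes, and an "ansi" save only reproduces the original bytes if the saving codepage matches the one the text was displayed in.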

Cheers,
Liviu
