JSORT.BAT v4.2 - problems with german umlauts

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Sponge Belly
Posts: 216
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: JSORT.BAT v4.2 - problems with german umlauts

#16 Post by Sponge Belly » 07 Aug 2022 05:43

Excellent work, Steffen! 8)

Just one question… what do you mean by ACP 1252 and OEMCP 850?

How can I have 2 different code pages active at once? If I chcp 1252, how can the code page be 850 at the same time?

Thanks!

- SB

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: JSORT.BAT v4.2 - problems with german umlauts

#17 Post by aGerman » 07 Aug 2022 06:18

ACP stands for ANSI Code Page. This is what Windows GUI apps use for character sets that are usually not Unicode compliant. (In contrast, Microsoft often uses the term "Unicode" to mean UTF-16, even though that is not the only Unicode-compliant encoding.) On older Windows versions, Notepad saved new text files ACP-encoded by default. And text editors usually fall back to rendering text files using the ACP if they can't detect the actual encoding.

OEMCP stands for Original Equipment Manufacturer Code Page. The name is a little misleading nowadays. It stems from a time when DOS was the operating system and Windows was only a subsystem running on DOS. DOS had to align its character encoding with the encoding the OEM used (e.g. for the BIOS) because both shared the same virtual terminal interface. That is, the whole screen was the interface, because originally there was no such thing as separate windows on one screen.
For historical/compatibility reasons the console host still uses this OEMCP rather than the ACP.

Microsoft uses these terms in the docs, so I used them here, too. An easy way to look up both values at once is the registry.
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
Under this key you'll find both the ACP and the OEMCP value, each holding the code page ID used by default on your system.
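
For anyone who prefers a script over opening regedit, here is a minimal sketch of the same lookup in Python. The function name read_default_code_pages is made up for illustration. The sketch assumes the ACP and OEMCP values are stored as strings under the key named above, and it simply returns None when it is not run on Windows.

```python
import sys


def read_default_code_pages():
    """Return {'ACP': ..., 'OEMCP': ...} read from the registry.

    Hypothetical helper for illustration; returns None on non-Windows
    systems, where the winreg module is not available.
    """
    if sys.platform != "win32":
        return None
    import winreg  # standard library, Windows only

    path = r"SYSTEM\CurrentControlSet\Control\Nls\CodePage"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
        # QueryValueEx returns a (data, type) tuple; only the data is kept.
        return {
            name: winreg.QueryValueEx(key, name)[0]
            for name in ("ACP", "OEMCP")
        }


if __name__ == "__main__":
    print(read_default_code_pages())
```

These are the system defaults. Running chcp in a console window only changes the code page of that console session, which is why the two terms can refer to different code pages at the same time.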

FWIW native English speakers are probably not quite aware of the problems caused by having different encodings for Windows apps and console apps. English is more or less ASCII only, and both the ACP and the OEMCP share the same ASCII subset. However, if your Batch script is ACP encoded but CMD interprets it as if it were OEMCP encoded, you can imagine that this causes all kinds of havoc as soon as the script contains non-ASCII text, too. I guess the first time English speakers get confused is when they have to use funny characters in the script to get boxes and blocks drawn into a console window :lol:
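
To make the mismatch concrete, here is a small sketch in Python rather than Batch, assuming ACP 1252 and OEMCP 850 as in this thread. The helper names and the sample string are made up for illustration. It encodes text the way an ACP editor would save it, decodes it the way an OEMCP console would read it, and does the reverse for three box-drawing bytes.

```python
ACP = "cp1252"   # assumed ANSI code page (Western European Windows)
OEMCP = "cp850"  # assumed OEM code page (Western European console)


def saved_as_acp_read_as_oem(text):
    """Encode text as an ACP editor would, then decode it as OEMCP."""
    return text.encode(ACP).decode(OEMCP)


def box_bytes_seen_by(codepage):
    """Decode three bytes that are box-drawing characters in CP850."""
    return b"\xc9\xcd\xbb".decode(codepage)


if __name__ == "__main__":
    # Pure ASCII survives because both code pages share that subset.
    print(saved_as_acp_read_as_oem("Hello"))
    # Umlauts and sharp s come out as different characters.
    print(saved_as_acp_read_as_oem("Grüße aus Köln"))
    # The same bytes: a box corner in the console, "funny characters"
    # in an editor that renders the script using the ACP.
    print(box_bytes_seen_by(OEMCP))
    print(box_bytes_seen_by(ACP))
```

The ASCII-only string comes back unchanged, while every umlaut turns into some unrelated character. That is the same effect CMD produces when it reads an ACP-encoded script using the OEMCP.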

Steffen
