JSORT.BAT v4.2 - problems with german umlauts

Posted: 29 Jul 2022 02:11
by Savion
Hello.

I have tested this nice script, but JSORT has problems with German umlauts.
If I convert a text file with this:

Code: Select all

JSORT old.txt /p 12 /I /N /o new.txt
the new.txt has errors with German umlauts.

The old.txt is an ANSI file, and this is the new.txt (ANSI) with errors:
[Image: 1.jpg]
If I convert the new.txt to UTF-8:
[Image: 2.jpg]
Correct would be:
[Image: 3.jpg]
What is the right syntax for the export file?
I hope someone can help me.

Thanks.

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 29 Jul 2022 06:06
by aGerman
Without trying to reproduce the behavior yet - just an idea: What happens if you place a

Code: Select all

chcp 1252
before calling JSORT?

Steffen

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 29 Jul 2022 13:22
by Savion
Hello Steffen.

I have tested it.
Same error: none of the umlauts come out right.
I have no more ideas.

Do you have another idea?

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 04:42
by aGerman
I'm afraid fixing this would require refactoring JSORT, e.g. reading and writing files in the JScript section instead of the Batch section of this hybrid script.

Steffen

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 07:15
by Savion
Thanks Steffen for your information.

Do you know a program or script that can correct umlauts?
Then I would run it over the txt files.

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 08:15
by miskox
Can't SORT do the job? I see that only /N could be a problem. But if you don't need it... or you can rearrange the input file.

Saso

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 08:34
by Savion
Hello miskox.
JSORT can sort. That is not the problem. The only problem is the German umlauts in the created new.txt.

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 09:37
by aGerman
I guess Saso refers to the SORT command that ships with Windows anyways.

Steffen

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 11:07
by aGerman
I found a relatively simple fix:
Replace all occurrences of
WScript.Echo
with
WScript.StdOut.WriteLine
in JSORT.BAT
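
For anyone who prefers not to do the search-and-replace by hand, here is a minimal sketch of the same edit in Python. It is only an illustration: the location of JSORT.BAT (current directory), the `.bak` backup name, and the helper names `patch_bytes` and `patch_file` are assumptions, not part of JSORT. The file is handled as raw bytes so that no other character in the script is re-encoded.

```python
from pathlib import Path
import shutil

OLD = b"WScript.Echo"
NEW = b"WScript.StdOut.WriteLine"


def patch_bytes(data: bytes) -> bytes:
    """Replace every occurrence of OLD with NEW; all other bytes stay untouched."""
    return data.replace(OLD, NEW)


def patch_file(path: Path) -> int:
    """Patch the script in place, keeping a .bak copy first.

    Returns the number of occurrences that were replaced.
    """
    data = path.read_bytes()
    count = data.count(OLD)
    if count:
        # Hypothetical backup name: JSORT.BAT -> JSORT.BAT.bak
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_bytes(patch_bytes(data))
    return count


if __name__ == "__main__":
    script = Path("JSORT.BAT")  # assumed to sit in the current directory
    if script.exists():
        print(f"replaced {patch_file(script)} occurrence(s)")
    else:
        print("JSORT.BAT not found in the current directory")
```

Running it a second time should report zero replacements, since no `WScript.Echo` calls remain.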

Steffen

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 12:17
by miskox
aGerman wrote:
30 Jul 2022 09:37
I guess Saso refers to the SORT command that ships with Windows anyways.

Steffen
Yes Steffen, you are right: I meant SORT.exe, which is part of the Windows OS.

Saso

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 30 Jul 2022 21:10
by Savion
Steffen - YES, THIS IS IT! :D
THANKS, THANKS, THANKS!

No more problems with umlauts.

Just replace
WScript.Echo
with
WScript.StdOut.WriteLine


Have a beautiful Sunday, Steffen.

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 31 Jul 2022 10:12
by miskox
Looks like Dave has some work to do.

Thanks Steffen.

Saso

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 01 Aug 2022 07:26
by aGerman
Not sure if Dave will still be maintaining this script. I should probably add this workaround to the original topic since the last commenter in 2019 seemingly faced the same problem.

Steffen

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 06 Aug 2022 05:59
by Sponge Belly
aGerman wrote:
Replace all occurrences of WScript.Echo with WScript.StdOut.WriteLine
Thanks for the tip, Steffen! :)

But can you explain why replacing WScript.Echo with WScript.StdOut.WriteLine solves the umlaut problem?

- SB

Re: JSORT.BAT v4.2 - problems with german umlauts

Posted: 06 Aug 2022 06:54
by aGerman
Quick investigation.

Script:

Code: Select all

@if (0)==(0) echo off
<%1 >CONOUT$ cscript //nologo //e:jscript "%~fs0"
pause
goto :eof @end

var ch = WScript.StdIn.ReadLine();
WScript.Echo(ch.charCodeAt(0).toString(16));
WScript.Echo(ch);
WScript.StdOut.WriteLine(ch);
Precondition for the output shown below:
ACP: 1252
OEMCP: 850
A test file containing only the byte 0xE9

Known character representation for byte 0xE9:
é in my ACP
Ú in my OEMCP

Output if the test file is dropped to the script:

Code: Select all

e9
é
Ú
Drücken Sie eine beliebige Taste . . .
Conclusion:
- WScript.StdIn.ReadLine reads byte 0xE9 without any charset conversion.
- WScript.Echo performs a conversion from ACP to OEMCP. The byte becomes 0x82, which is the value that represents é in CP 850. Redirected to a file and interpreted in the ACP, byte 0x82 would be a "single low-9 quotation mark" character (‚). This can be verified by replacing CONOUT$ with a file name in the script.
- WScript.StdOut.WriteLine writes the original byte value through unchanged. It appears as Ú in CP 850. Redirected to a file and interpreted in the ACP, it is still é.
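
The byte arithmetic behind these conclusions can be checked without Windows Script Host. The following is a rough Python sketch, not WSH code: it assumes ACP = 1252 and OEMCP = 850 as in the preconditions above, and it models the observed behaviour of the two output methods with Python's codecs rather than calling them. The names `echoed`, `written`, and `describe` are made up for the illustration.

```python
import unicodedata

ACP = "cp1252"   # assumed ANSI code page, as in the preconditions
OEMCP = "cp850"  # assumed OEM code page, as in the preconditions

original = b"\xe9"  # the single byte in the test file

# Model of WScript.Echo as observed: text is converted from ACP to OEMCP.
echoed = original.decode(ACP).encode(OEMCP)

# Model of WScript.StdOut.WriteLine as observed: the byte passes through.
written = original


def describe(data: bytes) -> str:
    """Return the hex value and the Unicode name of the byte in each code page."""
    in_acp = unicodedata.name(data.decode(ACP))
    in_oem = unicodedata.name(data.decode(OEMCP))
    return f"{data.hex()} -> ACP: {in_acp}, OEMCP: {in_oem}"


print("Echo:     ", describe(echoed))
print("WriteLine:", describe(written))
```

The Echo line reports byte 82, which reads as SINGLE LOW-9 QUOTATION MARK in the ACP, while the WriteLine line reports byte e9, which still reads as LATIN SMALL LETTER E WITH ACUTE in the ACP. Character names are printed instead of the characters themselves so the output does not depend on the console's own code page.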

Steffen