CONVERTCP.exe - Convert text from one code page to another

miskox
Posts: 330
Joined: 28 Jun 2010 03:46

Re: CONVERTCP.exe - Convert text from one code page to another

#76 Post by miskox » 22 Mar 2019 02:42

What was at first a 'simple' problem for me (converting .txt files between CP852 and CP1250; see viewtopic.php?p=50289#p50289) is now an ongoing project.

Steffen once wrote:
"I don't expect to get bug reports because the utility will not be found and used that often."
This thread has now been read 27,000 times!

Also, on 06 Dec 2016 14:05:
"Then I'll leave it as it is unless somebody finds a bug or has a request to add another feature ..."
and on 02 Feb 2017 20:21 (version 1.4.3):
"I'll archive it and leave it alone"
And now we are at version 6.1!

Steffen: thanks again.

Saso

aGerman
Expert
Posts: 3639
Joined: 22 Jan 2010 18:01
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#77 Post by aGerman » 22 Mar 2019 10:44

Yes, and now I have to struggle all the time to keep the thing up to date and running. You're the culprit, Saso :evil: (just kidding :lol:)
Seriously, when I started developing this utility I didn't expect that there would be no end in sight. But meanwhile it has become something like my baby. It's still fun to work on, and I still learn something new every time. As long as a few people have a use for it, that's motivation enough to continue. Curiously, Windows has no on-board tool like iconv on *nixoid systems.
So, thank you for having the idea :)

Steffen

Squashman
Expert
Posts: 4106
Joined: 23 Dec 2011 13:59

Re: CONVERTCP.exe - Convert text from one code page to another

#78 Post by Squashman » 23 Mar 2019 07:59

aGerman wrote:
22 Mar 2019 10:44
Curiously there is no on-board tool for Windows like iconv for *nixoid systems.
I wonder if it comes with the Linux subsystem for Windows 10? I have yet to install and try it.

aGerman
Expert
Posts: 3639
Joined: 22 Jan 2010 18:01
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#79 Post by aGerman » 23 Mar 2019 09:57

Squashman wrote:
23 Mar 2019 07:59
I wonder if it comes with the Linux subsystem for Windows 10?
Yes, of course. But the WSL isn't available for Win10 x86, and since iconv (along with the other Linux tools) is a native ELF file, you can't just execute it from the Windows command line. You always need the Linux shell of your installed distribution involved.

Steffen

penpen
Expert
Posts: 1695
Joined: 23 Jun 2013 06:15
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#80 Post by penpen » 23 Mar 2019 10:53

To be honest, I never needed such a tool:
Most of the time it was sufficient to be able to convert from UTF-16LE to any installed code page.
(I never tried to convert from a code page to UTF-16LE, so I never checked whether that is possible.)

UTF-16LE -> any installed code page:

@echo off
:: Needed files:
:: "bom.utf-16le.txt" contains two BOMs and nothing else
:: "test.utf-16le.txt" contains any text and must start with a UTF-16LE BOM

:: UTF-8 output, with or without a BOM
chcp 65001
>"test.utf-8.bom.txt" type "bom.utf-16le.txt" "test.utf-16le.txt"
>"test.utf-8.txt" type "test.utf-16le.txt"

chcp 65000
>"test.utf-7.txt" type "test.utf-16le.txt"

chcp 850
>"test.cp850.txt" type "test.utf-16le.txt"

penpen

aGerman
Expert
Posts: 3639
Joined: 22 Jan 2010 18:01
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#81 Post by aGerman » 23 Mar 2019 12:39

In post #3 I already addressed this possibility, penpen. ADO streams, as used in Dave's JREPL.BAT, are also a good alternative for converting the text encoding. I absolutely agree that you don't need any third-party tool whenever you can use what the operating system already provides.
penpen wrote:
(So i never tried to convert from codepage to utf-16le, so i never checked if that was possible.)
Think of CMD /u /c.

It's rather the multi-threaded processing in CONVERTCP that makes it quite useful if you have to convert big files. Furthermore, it can convert UTF-16 BE and UTF-32 LE/BE, where the combination of CHCP and TYPE is no longer applicable. And TYPE still causes problems with UTF-8 because character boundaries are not respected.
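To illustrate what "character boundaries are not respected" can mean in general, here is a small Python sketch. It is not a description of how TYPE or CONVERTCP work internally; it only assumes that a tool reads UTF-8 data in fixed-size chunks and decodes each chunk on its own. The function names are made up for this example.

```python
import codecs


def decode_chunks_naively(chunks, encoding="utf-8"):
    """Decode every chunk independently, ignoring character boundaries."""
    return "".join(chunk.decode(encoding, errors="replace") for chunk in chunks)


def decode_chunks_incrementally(chunks, encoding="utf-8"):
    """Decode chunk by chunk while carrying incomplete sequences forward."""
    decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True makes the decoder complain about a truncated last character.
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


text = "Grüße"
data = text.encode("utf-8")      # "ü" and "ß" take two bytes each
chunks = [data[:3], data[3:]]    # the cut falls in the middle of "ü"

naive = decode_chunks_naively(chunks)
careful = decode_chunks_incrementally(chunks)

print(repr(naive))
print(repr(careful))
```

The naive variant produces replacement characters where the two halves of "ü" ended up in different chunks, while the incremental decoder keeps the incomplete byte until the rest arrives.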

Steffen

aGerman
Expert
Posts: 3639
Joined: 22 Jan 2010 18:01
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#82 Post by aGerman » 23 Apr 2019 14:07

In C, the runtime's internal parser for command line arguments, as well as library functions made for this purpose, may return partially quoted paths incorrectly. These parsers treat a backslash in front of a quotation mark as an escape character, so that the quotation mark is kept as a literal character. This causes errors, though, because on Windows the backslash is the separator in paths. E.g. if you pass
C:\"my folder"\file.ext foo
you might have expected to get
C:\my folder\file.ext
as first argument and
foo
as second argument.
But instead, C sees
C:"my
as first argument and
folder\file.ext foo
as second argument.

CONVERTCP has no use for literal quotes in any of the arguments passed. To cope with such path specifications I implemented my own command line parser in version 6.2. It still uses quotation marks to preserve spaces and tab characters in a quoted substring, but it removes all quotation marks from the passed arguments and keeps backslashes as literal characters.
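The difference between the two approaches can be sketched in a few lines of Python. This is not the CONVERTCP source (which is written in C). The first function is a simplified model of the usual C runtime rules (backslashes only matter in front of a quotation mark; the doubled-quote rule and the special handling of the program name are left out), the second one follows the description above. Both function names are hypothetical.

```python
def msvc_style_split(cmdline):
    """Simplified model: a backslash escapes a following quotation mark."""
    args, cur, in_quotes, have_arg = [], [], False, False
    i, n = 0, len(cmdline)
    while i < n:
        ch = cmdline[i]
        if ch == "\\":
            j = i
            while j < n and cmdline[j] == "\\":
                j += 1
            count = j - i
            if j < n and cmdline[j] == '"':
                cur.append("\\" * (count // 2))
                if count % 2:
                    cur.append('"')            # odd count: literal quote
                else:
                    in_quotes = not in_quotes  # even count: quote toggles
                i = j + 1
            else:
                cur.append("\\" * count)       # no quote follows: literal
                i = j
            have_arg = True
        elif ch == '"':
            in_quotes = not in_quotes
            have_arg = True
            i += 1
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
            i += 1
        else:
            cur.append(ch)
            have_arg = True
            i += 1
    if have_arg:
        args.append("".join(cur))
    return args


def literal_backslash_split(cmdline):
    """Quotes only protect spaces/tabs and are dropped; backslashes stay."""
    args, cur, in_quotes, have_arg = [], [], False, False
    for ch in cmdline:
        if ch == '"':
            in_quotes = not in_quotes
            have_arg = True
        elif ch in " \t" and not in_quotes:
            if have_arg:
                args.append("".join(cur))
                cur, have_arg = [], False
        else:
            cur.append(ch)
            have_arg = True
    if have_arg:
        args.append("".join(cur))
    return args


line = r'C:\"my folder"\file.ext foo'
print(msvc_style_split(line))
print(literal_backslash_split(line))
```

With the example line from above, the first function reproduces the unexpected split and the second one returns the path and foo as two separate arguments.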


Virustotal scans of version 6.2:
x86: https://www.virustotal.com/gui/file/b14 ... /detection
x64: https://www.virustotal.com/gui/file/c5d ... /detection

Steffen

smrutibora
Posts: 1
Joined: 25 Apr 2019 04:50

Re: CONVERTCP.exe - Convert text from one code page to another

#83 Post by smrutibora » 26 Apr 2019 02:40

So the low-order ASCII code values stay the same, but the high-order values vary from code page to code page. I can see how some code pages may share some characters, while their high-order code values may be different. So your utility can do the necessary translation for the characters they have in common. But what happens to the other characters that are not shared?

Also, are there typically enough high-order characters in common to make the utility worthwhile?

I should think there are various code pages with no non-ASCII overlap at all, so I can't see how the utility could be useful in those cases.

At first, I wondered how the utility works - how does it know all the correct mappings? But I looked at the source and see that it converts the text to UTF-16, and then converts it back to a different single-byte character set. I guess these are the same underlying routines that cmd.exe uses to convert extended ASCII text to and from UTF-16.
Last edited by aGerman on 27 Apr 2019 07:18, edited 2 times in total.
Reason: Moderator note: later added advertising link removed - you get banned from the forum

aGerman
Expert
Posts: 3639
Joined: 22 Jan 2010 18:01
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#84 Post by aGerman » 26 Apr 2019 09:11

smrutibora wrote:
26 Apr 2019 02:40
So the low-order ASCII code values stay the same, but the high-order values vary from code page to code page. I can see how some code pages may share some characters, while their high-order code values may be different. So your utility can do the necessary translation for the characters they have in common. But what happens to the other characters that are not shared?
No utility can convert characters that are not shared between the involved codepages. Please read the initial post. The paragraph beginning with "The support of code pages is restricted ..." explains the behavior.
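A short Python sketch of what this means in practice, using CP852 and CP1250 from the beginning of this thread. It relies on Python's built-in "cp852" and "cp1250" codecs rather than on the Windows API, so the handling of unmappable characters may differ from what CONVERTCP does; convert_code_page is a name made up for this example.

```python
def convert_code_page(data, source, target, errors="strict"):
    """Convert bytes between two code pages by going through Unicode."""
    text = data.decode(source)                 # code page -> Unicode
    return text.encode(target, errors=errors)  # Unicode -> other code page


# Characters that exist in both code pages, but at different byte values.
shared = "čšž".encode("cp852")
converted = convert_code_page(shared, "cp852", "cp1250")
print(shared.hex(), "->", converted.hex())

# A box-drawing shade character exists in CP852 but not in CP1250.
shade = "░".encode("cp852")
try:
    convert_code_page(shade, "cp852", "cp1250")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# With errors="replace" the unmappable character is replaced, so it is lost.
lossy = convert_code_page(shade, "cp852", "cp1250", errors="replace")
print(strict_failed, lossy)
```

The shared letters survive with new byte values. The shade character cannot be represented in the target code page, so the conversion either fails or replaces it.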
smrutibora wrote:
26 Apr 2019 02:40
Also, are there typically enough high-order characters in common to make the utility worthwhile?

I should think there are various code pages with no non-ASCII overlap at all, so I can't see how the utility could be useful in those cases.
Single-byte code pages exist mainly for historical reasons. It's a poor concept and doomed to failure. Unfortunately it's still widespread on Windows. Convert your text to an encoding that fully supports Unicode, such as UTF-8 or UTF-16. That way it will be readable in every environment, regardless of local settings.
smrutibora wrote:
26 Apr 2019 02:40
how does it know all the correct mappings?
They are already stored in the *.NLS files in folder System32.
smrutibora wrote:
26 Apr 2019 02:40
I guess these are the same underlying routines that cmd.exe uses to convert extended ASCII text to and from UTF-16.
Correct. Conversions of UTF-16 BE and UTF-32 LE and BE are my own extensions, though. There are no Windows API functions for this purpose.
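As a rough illustration of why such an extension is feasible without API support: UTF-16 BE and UTF-16 LE differ only in the byte order of each 16-bit code unit. The Python sketch below swaps the bytes of every code unit and checks the result against Python's own codecs. It is only meant to show the principle and says nothing about how CONVERTCP implements it; swap_utf16_byte_order is a hypothetical name.

```python
def swap_utf16_byte_order(data):
    """Swap the two bytes of every 16-bit code unit (LE <-> BE)."""
    if len(data) % 2:
        raise ValueError("UTF-16 data must have an even number of bytes")
    swapped = bytearray(len(data))
    swapped[0::2] = data[1::2]   # second byte of each unit goes first
    swapped[1::2] = data[0::2]   # first byte of each unit goes second
    return bytes(swapped)


sample = "Grüße €"
little_endian = sample.encode("utf-16-le")
big_endian = swap_utf16_byte_order(little_endian)

print(little_endian.hex())
print(big_endian.hex())
print(big_endian.decode("utf-16-be"))
```

Surrogate pairs need no special treatment here because each of the two code units is swapped on its own. UTF-32 is a different matter since surrogate pairs have to be combined into one code point.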

Steffen

aGerman
Expert
Posts: 3639
Joined: 22 Jan 2010 18:01
Location: Germany

Re: CONVERTCP.exe - Convert text from one code page to another

#85 Post by aGerman » 10 Jun 2019 06:06

Most of the current command line utilities don't support Virtual Terminal processing yet. In this case, ANSI escape sequences are not interpreted to control the console output. Instead, they are printed to the screen as literal text. Example using an old version:

>nul chcp 65001
echo +ABsAWw-93;42m+JYgliCWIJZMlkyWTJZIlkiWSJZElkSWR-   +ABsAWw-0m|convertcp "UTF-7" "UTF-8"
[Screenshot: old_behavior.png]
Even though I expect that VT processing will rarely be used along with CONVERTCP, it won't hurt to enable it now that Windows 10 provides this possibility.
Same example code using CONVERTCP v. 6.3:
[Screenshot: virtual_terminal_processing_v6.3.png]
Virtual Terminal processing affects the output to the console window only. Thus, CONVERTCP has to print to the window directly by omitting option /o and any redirections of the standard output stream. It's supported on Windows version 10.0.10586 onwards if the new console host is used.
The behavior on older Windows versions, as well as for writing to files and for redirections, remains the same as before.
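For anyone wondering what the UTF-7 string in the example contains, it can be decoded with a few lines of Python. This only uses Python's built-in "utf-7" codec and is unrelated to how CONVERTCP works. The string is copied from the ECHO command above.

```python
# The text piped into CONVERTCP in the example above, as raw bytes.
utf7_bytes = b"+ABsAWw-93;42m+JYgliCWIJZMlkyWTJZIlkiWSJZElkSWR-   +ABsAWw-0m"

decoded = utf7_bytes.decode("utf-7")

# "+ABsAWw-" is the UTF-7 form of ESC followed by "[", which introduces
# an ANSI escape sequence. repr() makes the control characters visible.
print(repr(decoded))

# The same text as UTF-8 bytes, the target encoding of the example.
utf8_bytes = decoded.encode("utf-8")
print(utf8_bytes.hex(" "))
```

The string starts with ESC[93;42m (bright yellow foreground on green background), followed by block and shade characters, and it ends with ESC[0m, which resets the colors. Without VT processing, the sequences show up as text in the console window.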

Virustotal scans of version 6.3:
x86: https://www.virustotal.com/gui/file/94d ... /detection
x64: https://www.virustotal.com/gui/file/036 ... /detection

Steffen
