CONVERTCP.exe - Convert text from one code page to another
Re: CONVERTCP.exe - Convert text from one code page to another
@aGerman and @Dave:
As Steffen wrote, he made CONVERTCP for me. I had an old 16-bit 'pure' DOS .exe that I used almost daily to convert between CP852 and CP1250 (in both directions), but of course 16-bit .exe files don't work on 64-bit systems any more. Steffen checked the source file and decided it was easier to write the program from scratch than to try to rebuild it. I cannot thank him enough.
Dave's JREPL is an excellent tool, but it is too complex for my needs (as Steffen also mentioned). Also, as Steffen mentioned, I have very large .txt files to process (a few hundred megabytes).
Thank you.
Saso
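For readers curious what such a conversion involves, here is a minimal Python sketch (not CONVERTCP's code, which is written in C). It re-encodes CP852 to CP1250 in fixed-size chunks, so a file of a few hundred megabytes never has to sit in memory at once. The function name `convert_stream` and the chunk size are assumptions made for illustration.

```python
import codecs
import io

def convert_stream(src, dst, from_cp="cp852", to_cp="cp1250", chunk_size=1 << 20):
    """Copy bytes from src to dst, re-encoding them chunk by chunk."""
    decoder = codecs.getincrementaldecoder(from_cp)()
    encoder = codecs.getincrementalencoder(to_cp)()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(encoder.encode(decoder.decode(chunk)))
    # final=True flushes anything the decoder or encoder is still holding back
    dst.write(encoder.encode(decoder.decode(b"", final=True), final=True))

# Tiny in-memory demo; a deliberately small chunk size forces several reads
sample = "Šaško čeká".encode("cp852")
out = io.BytesIO()
convert_stream(io.BytesIO(sample), out, chunk_size=4)
converted = out.getvalue()
```

With real files, opening both in binary mode (`"rb"` and `"wb"`) keeps the line endings exactly as they are in the input.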
Re: CONVERTCP.exe - Convert text from one code page to another
Recently I found a silly little bug. In the past I supported options with a leading dash (in addition to the leading forward slash). It was an undocumented "feature" which, in the end, was a failure: file names with a leading dash were erroneously recognized as options. Fixed in version 1.5.
Virustotal:
x86: https://www.virustotal.com/en/file/cfbd ... /analysis/
x64: https://www.virustotal.com/en/file/1113 ... /analysis/
Steffen
Re: CONVERTCP.exe - Convert text from one code page to another
Out of curiosity, how did you solve the problem?
Did you remove the "feature" of - options?
Or did you come up with a mechanism for differentiating - option from file name beginning with - ? (Perhaps differentiating quoted vs. unquoted)
Dave
Re: CONVERTCP.exe - Convert text from one code page to another
Yes, I removed it, Dave. It would have been possible to check for the existence of the file argument following option /i. Leading dashes could also have been permitted after option /o. But any additional check would decrease the performance of the tool. Thus, I decided to remove it and keep it simple. I don't think it will cause a lot of backward compatibility problems because it wasn't the documented way to pass options.
FWIW, arguments in C arrive without surrounding quotes (similar to the WScript.Arguments items in the WSH). To differentiate quoted and unquoted arguments I would have to parse the command line myself. Possible but ... no
Steffen
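The simplified rule Steffen describes (only a leading forward slash marks an option) can be sketched as follows. This is a hypothetical Python illustration, not CONVERTCP's actual C parser; the function name `parse_args` is invented here, and only the /i and /o options mentioned in this thread are handled.

```python
def parse_args(args):
    """Split args into positional values and /i, /o options.

    Only a leading '/' marks an option, so a file name such as
    '-in.csv' is never mistaken for one.
    """
    options = {}
    positionals = []
    it = iter(args)
    for arg in it:
        if arg.lower() in ("/i", "/o"):
            # The next argument is taken verbatim as the file name
            value = next(it, None)
            if value is None:
                raise ValueError(f"missing file name after {arg}")
            options[arg[1].lower()] = value
        else:
            positionals.append(arg)
    return positionals, options

positionals, options = parse_args(
    ["65001", "1250", "/i", "-in.csv", "/o", "out.csv"]
)
```

Because the argument after /i or /o is consumed without further inspection, no existence check or quote detection is needed, which matches the "keep it simple" reasoning above.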
Re: CONVERTCP.exe - Convert text from one code page to another
Hi, please let me know how to proceed with a conversion. Should I go into cmd? Then what do I need to do? Please explain in detail. Suppose I have a file on my system at C:\abc\abc.txt; what do I have to do now?
Re: CONVERTCP.exe - Convert text from one code page to another
The very first post of this thread shows you how to use it. Read that and try something then come back with a specific question.
Re: CONVERTCP.exe - Convert text from one code page to another
Lately I wrote a little cross-platform library in pure C to convert between different Unicode charsets. I came across UTF-32, for which Windows doesn't provide any conversion API functions. Since I had already written my own functions, I thought I could also add UTF-32 support to CONVERTCP. Some bigger changes were needed in the core functions of the source code, which is the reason why I increased the major version number to 2.
Use codepage ID 12000 for UTF-32 Little Endian and 12001 for UTF-32 Big Endian.
While doing some tests I also found and fixed a bug that might have happened while reading UTF-16 BE containing surrogate pairs and having a size >1 MB.
Virustotal scans of version 2.0:
x86: https://www.virustotal.com/en/file/964c ... /analysis/
x64: https://www.virustotal.com/en/file/66ce ... /analysis/
Steffen
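To make the encodings mentioned above concrete, here is a small Python sketch (not taken from CONVERTCP) that encodes one character outside the Basic Multilingual Plane. In UTF-16 BE it becomes a surrogate pair; in UTF-32 it is a single 4-byte unit whose byte order depends on the endianness. The post does not say what caused the UTF-16 BE bug, so the last part only assumes one plausible scenario: a surrogate pair split across two read buffers.

```python
import codecs

text = "\U0001F600"  # one code point above U+FFFF

utf16_be = text.encode("utf-16-be")  # two 16-bit code units (surrogate pair)
utf32_le = text.encode("utf-32-le")  # code page 12000 according to the post
utf32_be = text.encode("utf-32-be")  # code page 12001 according to the post

# Recombine the surrogate pair by hand
high = int.from_bytes(utf16_be[:2], "big")
low = int.from_bytes(utf16_be[2:], "big")
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# Assumed scenario: the pair is cut in half by a buffer boundary.
# An incremental decoder keeps the first half until the second arrives.
decoder = codecs.getincrementaldecoder("utf-16-be")()
first = decoder.decode(utf16_be[:2])
second = decoder.decode(utf16_be[2:], final=True)
```

A converter that reads in fixed-size blocks has to carry such an incomplete unit over to the next block instead of decoding it on its own.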
Re: CONVERTCP.exe - Convert text from one code page to another
Hello,
I have discovered a bug in the CONVERTCP.exe utility. I converted the following file, and the last several lines of the output file are totally different from the original file; please take a look.
This output was generated by the command:
Code: Select all
convertcp.exe 65001 1250 /i .\issues-03-2018qqqq.csv /o tmpout.csv
You can download an archive with both files here (the size exceeded the allowed limit):
https://emerson.sendthisfile.com/c.jsp? ... Crx0yfTx6T
Note: These files will expire in 14 days.
Best regards
Lubomir
Re: CONVERTCP.exe - Convert text from one code page to another
Thank you very much for your feedback Lubomir!
I was able to reproduce this issue. I'll get back with a bugfix as soon as possible.
Steffen
Re: CONVERTCP.exe - Convert text from one code page to another
Bug fixed
Virustotal scans of version 2.1:
x86: https://www.virustotal.com/en/file/a313 ... /analysis/
x64: https://www.virustotal.com/en/file/b9ae ... /analysis/
For those of you who are interested in the technical reason ...
Output streams are buffered by the operating system. That's something that I already knew. But since the failure never occurred in my tests, I thought that the buffer was flushed at the latest when the thread function terminates. Obviously I was wrong. It took a while to find the reason, but the fix is quite simple. It's just a call to the FlushFileBuffers API function at the end of the thread function. EDIT: NOPE, THIS DID NOT SOLVE THE PROBLEM
So thanks again, Lubomir! Much appreciated indeed. Developers need people like you who report bugs rather than silently moving on to the next program they find.
Steffen
Re: CONVERTCP.exe - Convert text from one code page to another
Thanks again Steffen!
Though I never had this problem, it is good to have a new version.
The test file supplied has 40,000+ lines; my test file had more. Strange that I never encountered the problem.
Saso
P.S.: Steffen, maybe you could add a date below your name in the first post whenever you change the file. The release notes are way down the post. It would look something like this:
Code: Select all
Steffen
(updated 11-apr-2018)
convertcp_v2.1.zip
(86.46 KiB) Downloaded 5 times
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Re: CONVERTCP.exe - Convert text from one code page to another
Yeah, there must have been something magic in Lubomir's file.
Sure, I can add the date somewhere near the link, although there is already a list of release notes at the end of the initial post, as you mentioned.
Steffen
Re: CONVERTCP.exe - Convert text from one code page to another
Just one thing that might be worth noting:
I just discovered that the outputs of CONVERTCP and JREPL (and of various text editors) still differ when converting Lubomir's file.
CONVERTCP does not change line endings automatically. E.g. things like double line feeds (LF LF) can be found in Lubomir's file. CONVERTCP leaves them as LF LF, while other software may automatically convert them to CR LF CR LF.
If software changes the line endings, your data could get corrupted. A typical example:
If you wrap the line in an Excel cell using [Alt]+[Enter] and export this data as CSV, you'll find the wrap as a single LF (while the end of the row is CR LF). It would be fatal if this single LF were converted to CR LF.
Steffen
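The Excel example above can be reproduced with a short Python sketch. The sample data is made up for illustration: one row ends in CR LF and contains a quoted cell with a single LF inside. A naive pass that turns every line ending into CR LF, as some tools do, changes the content of the cell.

```python
import csv
import io

# Made-up CSV bytes: header row, then a row whose quoted cell holds a bare LF
raw = b'id,comment\r\n1,"first line\nsecond line"\r\n'

def rows(data):
    # newline="" passes line endings to the csv module untouched
    return list(csv.reader(io.StringIO(data.decode("ascii"), newline="")))

untouched = rows(raw)

# Naive normalisation of all line endings to CR LF
normalised = raw.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
changed = rows(normalised)
```

After normalisation the cell holds CR LF instead of LF, so the data no longer matches what was exported. Leaving line endings alone during a code page conversion avoids this.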