This command line utility is a codepage converter. It supports charsets such as single-byte code pages, UTF-8, UTF-16 LE/BE, and EBCDIC. It's designed to process big files as well. It should work on Windows XP and later (tested on XP, Windows 7, Windows 8.1, and Windows 10). It's a free and open source tool.
A few days ago miskox asked me to rewrite an old 16 bit tool that he uses so that it also runs on 64 bit Windows. The tool converts text from one single-byte code page to another. I bet the native English speakers among you are wondering what such a tool is even good for. The answer is that the CMD console and Windows applications use different code pages, in which non-ASCII characters have different code points. Thus, characters like Ü, É, Š, and the like show up as different/wrong characters.
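The mismatch is easy to reproduce outside of the console. The following is a minimal sketch in Python (it is not part of CONVERTCP, only an illustration). It assumes code page 1252 as the ANSI code page and code page 850 as the OEM code page, which is typical for Western European Windows installations; your system defaults may differ.

```python
# Illustration only. Assumption: ANSI = cp1252 and OEM = cp850
# (typical Western European defaults, not necessarily yours).
ANSI = "cp1252"
OEM = "cp850"

text = "Ü"

# The same character has a different byte value in each code page.
ansi_bytes = text.encode(ANSI)
oem_bytes = text.encode(OEM)

# Bytes written by a console program (OEM) but interpreted by a
# Windows application (ANSI) show up as a different character.
misread = oem_bytes.decode(ANSI)

# A proper conversion decodes with the source code page first and
# then encodes with the target code page.
converted = oem_bytes.decode(OEM).encode(ANSI)

print(repr(ansi_bytes), repr(oem_bytes), repr(misread), repr(converted))
```

The last step is, in principle, what a call like "convertcp 1 0" is for: the bytes are reinterpreted with the code page they were written in before they are written out in the target code page.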
convertcp_v1.4.4.zip [84.33 KiB]
Usage of convertcp.exe
Converts a stream of characters to another code page.
CONVERTCP CP_In CP_Out [/i "infile.txt"] [/o "outfile.txt"] [/b|/a]
  CP_In    Code Page Identifier of the input stream
  CP_Out   Code Page Identifier of the output stream
           To get a list of supported Code Page Identifiers use option /l
           Alternatively you can use 0 for the ANSI Code Page
           and 1 for the OEM Code Page of your system default settings.
  /i       Introduces the source file
  /o       Introduces the destination file
           (the content of an existing file will be truncated
           unless option /a was passed)
           Redirections to or from CONVERTCP can be used instead of /i and /o
  /b       Add the Byte Order Mark to the output stream
           (will be ignored if CP_Out was not one of
           65001, 1200, or 1201)
  /a       Append the output stream to the destination file
           (always use the same CP_Out)
           Do not combine options /b and /a
  /?       Display this help message
  /l       Display a list of supported Code Page Identifiers
           installed on this computer
  infile   Path of a text file whose content shall be converted
  outfile  Path of a text file where the converted stream
           shall be written
The support of code pages is restricted ...
a) by the characters shared by both code pages used. If a character that was read has no equivalent, the implementations of the API functions used decide whether to
- either convert it to an approximated ASCII character (e.g. Š to S)
- or replace it with a default character (usually a question mark)
b) by the maximum number of bytes used to represent a character. The table printed by option /l indicates in the second column whether or not a code page can be used by CONVERTCP for input streams greater than 1 MB (while all listed code pages can be used for output streams regardless of their size).
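The two fallbacks described under a) can be sketched in Python. This is only an approximation for illustration: it does not call the Windows API functions that CONVERTCP relies on, and the helper names replace_with_default and approximate are made up for this example. The approximation is emulated by stripping diacritics, which is not necessarily how the Windows mapping tables work.

```python
import unicodedata


def replace_with_default(text, codepage):
    """Replace every character missing in the code page with '?'."""
    return text.encode(codepage, errors="replace")


def approximate(text, codepage):
    """Fall back to the base ASCII letter where the code page has no
    equivalent (emulated by stripping diacritics, an assumption)."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            base = unicodedata.normalize("NFKD", ch).encode("ascii", errors="ignore")
            out += base or b"?"  # nothing left after stripping: use the default
    return bytes(out)


# Code page 850 contains Ü but not Š.
print(replace_with_default("Škoda", "cp850"))
print(approximate("Škoda", "cp850"))
```

Either way the conversion is lossy, so converting the result back to the original code page does not restore the original character.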
The utility was written in C/WinAPI. Besides the exe files (which are 32 bit and 64 bit MinGW/GCC release builds), the source code is included in the attached ZIP file. The program flow chart is for those who want to understand how the program works (even though it's simplified and incomplete). All files are under the MIT license.
Critique is always much appreciated.
Convert the output of a command and save it in a text file.
(The output of FINDSTR /? will be converted from the default OEM code page to UTF-16 LE with BOM prepended. The converted stream will be saved in "commands.txt".)
findstr /? | convertcp 1 1200 /b /o "commands.txt"
Convert the content of a text file and save it to another text file.
(The content of "commands.txt" will be converted from UTF-16 LE to the default ANSI code page and saved in "commands2.txt")
convertcp 1200 0 /i "commands.txt" /o "commands2.txt"
Convert the content of a text file and output it to the console window.
(The content of "commands2.txt" will be converted from the default ANSI code page to the default OEM code page and displayed.)
convertcp 0 1 /i "commands2.txt"
Append to an existing file.
(The output of FIND /? will be converted from the default OEM code page to UTF-16 LE. The converted stream will be appended to "commands.txt".)
find /? | convertcp 1 1200 /a /o "commands.txt"
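The help text advises against combining /b and /a but does not state why. One plausible reason (an assumption, not confirmed by the help text) is that a second Byte Order Mark would end up in the middle of the file. The following Python sketch simulates this in memory with UTF-16 LE; it does not run CONVERTCP.

```python
import codecs

BOM = codecs.BOM_UTF16_LE  # the two bytes FF FE

# First run: the file is created with a BOM (like option /b).
content = BOM + "findstr".encode("utf-16-le")

# Hypothetical second run that appends AND writes a BOM again.
content += BOM + "find".encode("utf-16-le")

# A reader consumes only the leading BOM; the second one stays in
# the text as the character U+FEFF.
text = content.decode("utf-16")
print(repr(text))
```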
Create a file with a Byte Order Mark only.
(NUL is redirected to CONVERTCP. Thus, the input stream is empty and the input code page ID is meaningless. Because the output code page ID is for UTF-8 and option /b was passed, only the UTF-8 BOM will be written to the file. This might be useful if you want to append text to the file in multiple steps afterwards.)
<nul convertcp 0 65001 /b /o "bom.txt"
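For reference, the Byte Order Marks that belong to the three code page IDs accepted by option /b can be looked up in Python as shown below. This is a small sketch for illustration; the helper name bom_for is made up, and the snippet says nothing about how CONVERTCP stores these values internally.

```python
import codecs

# Windows code page ID -> Byte Order Mark
BOMS = {
    65001: codecs.BOM_UTF8,      # UTF-8:     EF BB BF
    1200: codecs.BOM_UTF16_LE,   # UTF-16 LE: FF FE
    1201: codecs.BOM_UTF16_BE,   # UTF-16 BE: FE FF
}


def bom_for(cp_out):
    """Return the BOM for a code page ID, or nothing if it has none."""
    return BOMS.get(cp_out, b"")


# A BOM-only file followed by appended UTF-8 text reads back cleanly.
bom_only = bom_for(65001)
appended = bom_only + "äö".encode("utf-8")
print(bom_only.hex(), repr(appended.decode("utf-8-sig")))
```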
List the installed code pages.
(The list printed by CONVERTCP /L is processed in a FOR /F loop in order to write the values comma-separated.)
for /f "skip=3 tokens=1,3,4*" %%i in ('convertcp /l') do echo "%%i","%%j","%%l"
2017/05/27 - v18.104.22.168/1 added option /l to print a list of installed code pages
2017/02/02 - v22.214.171.124/1 added option /a for appending to an existing file
2017/01/29 - v126.96.36.199/1 reduced the size of the binary files by half (kudos to carlos)
2017/01/23 - v188.8.131.52/1 minor performance improvement
2016/12/28 - v184.108.40.206/1 UTF-16 BE support added, options /i and /o added
2016/12/09 - v220.127.116.11/1 fixed bug in conversion from UTF-8
2016/12/08 - v18.104.22.168/1 ambiguous code fixed, minor optimizations, source code tidied
2016/12/05 - v22.214.171.124/1 UTF-16 LE support added
2016/12/03 - v126.96.36.199/1 UTF-8 support added, fixed misleading error message if the size of the input stream is an exact multiple of 4 MB
2016/11/28 - v188.8.131.52/1 minor optimizations, source code tidied, 64bit utility added
2016/11/25 - v184.108.40.206 fixed possible deadlock caused by unsignaled threads
2016/11/24 - v220.127.116.11 fixed possible memory leak if reallocations fail
2016/11/24 - v18.104.22.168 moved to C, multithreaded conversion added
unpublished - first versions using C++ vector containers, without multithreading