
Is there a snippet similar to this one:
http://www.dostips.com/?t=Snippets.AnsiToUnicode
that converts a file from UTF-8 to Unicode (UTF-16LE) text?
I have to do this for thousands of files, so I need a command I can call from a script.

Thanks!
```bat
@echo off
:: usage: <this script> input.txt output.txt
setlocal disabledelayedexpansion
:: save original codepage
for /f "tokens=2 delims=:" %%a in ('chcp') do @set /a "cp=%%~a"
:: write utf-16le BOM: ÿþ are the bytes FF FE in codepage 1252,
:: so save this batch file as ANSI (Windows-1252), not UTF-8
chcp 1252 >nul
rem replace with 'cmd /a /c (set ..' if called at 'cmd /u' prompt
(set /p =ÿþ) <nul >"%~2" 2>nul
chcp %cp% >nul
:: convert utf-8 to utf-16le (paths are quoted so names with spaces work)
rem all on one line since batch parsing fails while active codepage is utf-8
chcp 65001 >nul & cmd /u /c type "%~1" >>"%~2" & chcp %cp% >nul
```
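If a batch-only solution is not a hard requirement for thousands of files, the same conversion can be cross-checked with a short Python sketch. This is not part of the batch answer above, and the function names utf8_to_utf16le and convert_folder as well as the *.txt pattern are hypothetical. It decodes each file with Python's "utf-8-sig" codec, which drops a leading UTF-8 BOM if one is present, and then writes exactly one FF FE BOM followed by the UTF-16LE text.

```python
from pathlib import Path

UTF16LE_BOM = b"\xff\xfe"


def utf8_to_utf16le(src: Path, dst: Path) -> None:
    """Convert one UTF-8 file to UTF-16LE with exactly one BOM."""
    # "utf-8-sig" strips an optional leading UTF-8 BOM (EF BB BF),
    # so it is not carried over as a second UTF-16LE BOM.
    text = src.read_bytes().decode("utf-8-sig")
    # Write the BOM ourselves, then the little-endian code units.
    # Reading/writing bytes keeps CRLF line endings unchanged.
    dst.write_bytes(UTF16LE_BOM + text.encode("utf-16-le"))


def convert_folder(src_dir: Path, dst_dir: Path, pattern: str = "*.txt") -> int:
    """Convert every matching file into dst_dir; return how many were converted."""
    dst_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob(pattern)):
        if src.is_file():
            utf8_to_utf16le(src, dst_dir / src.name)
            count += 1
    return count
```

For example, convert_folder(Path("in"), Path("out")) would write converted copies of the *.txt files in the folder "in" to the folder "out" and return the number of files it converted.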

> aGerman wrote: I wasn't aware that TYPE would return a usable output.

TYPE is indeed surprisingly well behaved for a built-in command.

> Squashman wrote: ...but there were a few unreadable characters at the beginning.

Maybe your input file had a UTF-8 BOM (neither required nor recommended), which TYPE doesn't like. Or maybe your viewer did not skip over the UTF-16LE BOM (both required and recommended). Or maybe the test file simply contained some characters that the viewer's font does not cover.

> Squashman wrote: I used notepad to save it as UTF-8

Notepad does indeed write a BOM to UTF-8 files. When TYPE's output is redirected to a file (here under "cmd /u" with code page 65001), the UTF-8 BOM (0xEF 0xBB 0xBF) comes out as a UTF-16LE BOM (0xFF 0xFE). Since the script already writes its own UTF-16LE BOM first, the result is a UTF-16LE file that wrongly starts with two BOM sequences (0xFF 0xFE 0xFF 0xFE).
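To make the double-BOM effect concrete, here is a small Python model of it. It is only a model, not TYPE's actual code: it assumes TYPE decodes the leading EF BB BF as the character U+FEFF and re-encodes it as FF FE, which matches the behavior described above. The helper names naive_convert and bom_safe_convert are hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"
UTF16LE_BOM = b"\xff\xfe"


def naive_convert(utf8_bytes: bytes) -> bytes:
    """Model of the script: write FF FE, then append the converted text.

    Plain "utf-8" keeps a leading EF BB BF as U+FEFF, which encodes
    to FF FE in UTF-16LE, so a BOM-prefixed input gets two BOMs.
    """
    return UTF16LE_BOM + utf8_bytes.decode("utf-8").encode("utf-16-le")


def bom_safe_convert(utf8_bytes: bytes) -> bytes:
    """Same conversion, but drop any input BOM so only ours remains."""
    if utf8_bytes.startswith(UTF8_BOM):
        utf8_bytes = utf8_bytes[len(UTF8_BOM):]
    return UTF16LE_BOM + utf8_bytes.decode("utf-8").encode("utf-16-le")


# A tiny file as Notepad would save it: UTF-8 BOM followed by "hi".
notepad_file = UTF8_BOM + "hi".encode("utf-8")
print(naive_convert(notepad_file).hex(" "))     # ff fe ff fe 68 00 69 00
print(bom_safe_convert(notepad_file).hex(" "))  # ff fe 68 00 69 00
```

Checking whether an input file starts with EF BB BF before converting (and skipping the extra BOM write in that case) is one way to avoid the doubled BOM.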