Unicode to UTF-8
Posted: 28 May 2014 11:52
by spradhan
I need to convert Unicode to UTF-8 format. How can I write a batch file to do so?
Thanks!
Re: Unicode to UTF-8
Posted: 28 May 2014 14:10
by penpen
Re: Unicode to UTF-8
Posted: 28 May 2014 19:19
by Liviu
It's not entirely clear from the question what the source "Unicode" is. Assuming it is a UTF-16LE encoded text file with the proper BOM (the bytes FF FE at the start of the file), converting it to a UTF-8 text file can be done with a simple 'type' at the cmd prompt. The following works since at least XP; just replace 'utf16le.txt' and 'utf8.txt' with your actual input and output filenames.
Code:
C:\tmp>chcp 65001
Active code page: 65001
C:\tmp>type utf16le.txt >utf8.txt
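For anyone who wants to see what this conversion does at the byte level, here is a minimal Python 3 sketch (not part of the batch answer, and assuming Python 3 is installed). It makes the same assumption as above: the input must start with the UTF-16LE BOM FF FE, and it refuses to guess otherwise. The function name utf16le_to_utf8 is hypothetical. It writes UTF-8 without a BOM; whether the 'type' approach produces byte-identical output is not checked here.

```python
UTF16LE_BOM = b"\xff\xfe"  # byte order mark that starts a UTF-16LE file


def utf16le_to_utf8(src: str, dst: str) -> None:
    """Hypothetical helper: convert a BOM-prefixed UTF-16LE file to UTF-8 (no BOM)."""
    with open(src, "rb") as f:
        raw = f.read()
    # Same assumption as the 'type' trick: without the BOM we cannot be sure
    # the file is UTF-16LE, so stop instead of guessing.
    if not raw.startswith(UTF16LE_BOM):
        raise ValueError(f"{src}: no UTF-16LE BOM (FF FE) found")
    # Skip the 2-byte BOM and decode the remaining bytes as UTF-16LE.
    text = raw[len(UTF16LE_BOM):].decode("utf-16-le")
    # newline="" writes CRLF line endings exactly as they appear in the source;
    # encoding="utf-8" (not "utf-8-sig") means no BOM is written.
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```

For example, utf16le_to_utf8("utf16le.txt", "utf8.txt") converts the same example files used above. Raising an error on a missing BOM mirrors the "proper BOM" assumption rather than silently producing garbage from a file in some other encoding.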
The same can be done in a batch file. For it to also work under XP (not just Win7+), the codepage must be changed and restored on the same line, and 'type' must run in a separate instance of 'cmd'.
Code:
@echo off
setlocal disabledelayedexpansion
:: save original codepage
for /f "tokens=2 delims=:." %%a in ('chcp') do @set /a "cp=%%~a"
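:: convert utf-16le file (1st argument) to utf-8 file (2nd argument)
rem all on one line since batch parsing fails while active codepage is utf-8
rem quoting "%~1" and "%~2" lets the filenames contain spaces
chcp 65001 >nul & cmd /a /c type "%~1" >"%~2" & chcp %cp% >nul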
Liviu
Re: Unicode to UTF-8
Posted: 29 May 2014 07:16
by spradhan
Thanks penpen and Liviu.
Liviu - the code worked!! Thanks!