
Unicode to UTF-8

Posted: 28 May 2014 11:52
by spradhan
I need to convert Unicode to UTF-8 format. How can I write a batch file to do so?

Thanks!

Re: Unicode to UTF-8

Posted: 28 May 2014 14:10
by penpen

Re: Unicode to UTF-8

Posted: 28 May 2014 19:19
by Liviu
It's not entirely clear from the question what the source "Unicode" is. Assuming it's a UTF-16LE encoded text file that starts with a byte order mark (BOM), converting it to a UTF-8 text file can be done with a simple 'type' at the cmd prompt. The following works on XP and later; replace 'utf16le.txt' and 'utf8.txt' with your actual input and output filenames.

C:\tmp>chcp 65001
Active code page: 65001

C:\tmp>type utf16le.txt >utf8.txt
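For reference, here is roughly what that 'type' conversion amounts to, written as a minimal Python sketch rather than as part of the batch approach. It assumes the same input as above (UTF-16LE with a leading BOM), and the function name `utf16le_to_utf8` is hypothetical. It strips the BOM, decodes the rest as UTF-16LE and writes the text back as UTF-8 without a BOM, which can be handy for cross-checking the output of 'type'.

```python
import codecs
from pathlib import Path


def utf16le_to_utf8(src: str, dst: str) -> None:
    """Convert a UTF-16LE text file that starts with a BOM into UTF-8 (no BOM).

    Hypothetical helper: assumes the input really is UTF-16LE with a BOM.
    """
    data = Path(src).read_bytes()
    # Only handle the case assumed in the post: UTF-16LE with the FF FE BOM.
    if not data.startswith(codecs.BOM_UTF16_LE):
        raise ValueError(f"{src} does not start with a UTF-16LE BOM")
    # Skip the 2-byte BOM and decode the remainder as little-endian UTF-16.
    text = data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    # Write raw bytes so CRLF line endings pass through unchanged.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Usage: `utf16le_to_utf8("utf16le.txt", "utf8.txt")`. Reading and writing bytes (instead of text mode) keeps Python from translating line endings, so the output differs from the input only in its encoding.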

The same 'type' conversion can be done in a batch file that takes the input file as its first argument and the output file as its second. For it to also work under XP (not just Windows 7 and later), the code page must be changed and restored on the same line, and 'type' must run in a separate instance of 'cmd'.

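@echo off
setlocal disabledelayedexpansion

:: expect two arguments: input (utf-16le) file and output (utf-8) file
if "%~2"=="" (echo usage: %~nx0 input.txt output.txt & exit /b 1)

:: save original codepage
for /f "tokens=2 delims=:." %%a in ('chcp') do @set /a "cp=%%~a"

:: convert utf-16le to utf-8
rem all on one line since batch parsing fails while active codepage is utf-8
rem quotes allow file names that contain spaces
chcp 65001 >nul & cmd /a /c type "%~1" >"%~2" & chcp %cp% >nul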
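The `for /f` line saves the current code page by parsing the output of `chcp`. As a rough illustration, the hypothetical Python helper `parse_chcp` below mimics how `tokens=2 delims=:.` splits a line such as `Active code page: 437`: like `for /f`, it treats runs of delimiters as one and takes the second token. The period in the delimiter list presumably covers localized Windows versions whose `chcp` output ends with a period (for example a German-style `Aktive Codepage: 850.`); that reading of the author's intent is an assumption.

```python
import re


def parse_chcp(output: str) -> int:
    """Extract the code page number from chcp output, e.g. 'Active code page: 437'.

    Hypothetical helper that imitates for /f "tokens=2 delims=:." from the batch file.
    """
    # Split on ':' and '.'; dropping empty strings mimics how for /f
    # collapses consecutive delimiters and ignores trailing ones.
    tokens = [t for t in re.split(r"[:.]", output) if t]
    # Token 2 (index 1) is the number; strip surrounding spaces/newlines,
    # much as set /a tolerates the leading space in " 437".
    return int(tokens[1].strip())
```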

Liviu

Re: Unicode to UTF-8

Posted: 29 May 2014 07:16
by spradhan
Thanks penpen and Liviu.

Liviu - the code worked!! Thanks!