Unicode to UTF-8

spradhan
Posts: 3
Joined: 28 May 2014 11:24

Unicode to UTF-8

#1 Post by spradhan » 28 May 2014 11:52

I need to convert Unicode to UTF-8 format. How can I write a batch file to do so?

Thanks!

penpen
Expert
Posts: 1725
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Unicode to UTF-8

#2 Post by penpen » 28 May 2014 14:10


Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Unicode to UTF-8

#3 Post by Liviu » 28 May 2014 19:19

It's not entirely clear from the question what the source "Unicode" is. Assuming it's a UTF-16LE encoded text file with a proper BOM, converting it to a UTF-8 text file can be done with a simple 'type' at the cmd prompt once the active codepage is set to 65001 (UTF-8). The following has worked since at least XP; replace 'utf16le.txt' and 'utf8.txt' with your actual input and output filenames.

Code:

C:\tmp>chcp 65001
Active code page: 65001

C:\tmp>type utf16le.txt >utf8.txt

The same can be done in a batch file, but for it to work under XP as well (not just Win7+), the codepage must be changed and restored on the same line, and 'type' must run in a separate instance of 'cmd'.

Code:

@echo off
setlocal disabledelayedexpansion

:: save original codepage
for /f "tokens=2 delims=:." %%a in ('chcp') do @set /a "cp=%%~a"

:: convert utf-16le to utf-8
rem all on one line since batch parsing fails while active codepage is utf-8
chcp 65001 >nul & cmd /a /c type %1 >%2 & chcp %cp% >nul
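
As a usage sketch, assuming the batch file above has been saved as 'utf16to8.cmd' (a hypothetical name, not one given in the post), the input file is passed as the first argument and the output file as the second:

```batch
rem Hypothetical script name; %1 is the UTF-16LE input, %2 is the UTF-8 output.
call utf16to8.cmd utf16le.txt utf8.txt
```

Paths containing spaces need to be quoted on the command line so that each one is passed as a single argument.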

Liviu

spradhan
Posts: 3
Joined: 28 May 2014 11:24

Re: Unicode to UTF-8

#4 Post by spradhan » 29 May 2014 07:16

Thanks penpen and Liviu.

Liviu - the code worked!! Thanks!
