Unicode to UTF-8
I need to convert Unicode to UTF-8 format. How can I write a batch file to do so?
Thanks!
Re: Unicode to UTF-8
This may help you:
- UTF-16 <-> UTF-32 http://www.dostips.com/forum/viewtopic.php?p=34442#p34442, and
- UTF-32 --> UTF-8 http://www.dostips.com/forum/viewtopic.php?p=34662#p34662.
penpen
Re: Unicode to UTF-8
It's not entirely clear from the question what the source "Unicode" is. Assuming it's a UTF-16LE encoded text file with the proper BOM, converting it to a UTF-8 text file can be done with a simple 'type' at the cmd prompt, after switching the active code page to 65001 (UTF-8). The following works on XP and later; replace 'utf16le.txt' and 'utf8.txt' with your actual input and output filenames.
Code:
C:\tmp>chcp 65001
Active code page: 65001
C:\tmp>type utf16le.txt >utf8.txt
The same can be done in a batch file. For it to also work under XP (not just Win7 and later), the code page must be changed and restored on the same line, and 'type' must run in a separate instance of 'cmd'.
Code:
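@echo off
setlocal disabledelayedexpansion
:: save original codepage (the '.' delimiter handles locales that end the chcp line with a period)
for /f "tokens=2 delims=:." %%a in ('chcp') do @set /a "cp=%%~a"
:: convert utf-16le (first argument) to utf-8 (second argument)
:: "%~1" / "%~2" strip any quotes the caller used, then re-quote, so paths with spaces work
:: cmd /a asks the child cmd for non-unicode output, i.e. output in the active codepage
rem all on one line since batch parsing fails while active codepage is utf-8
chcp 65001 >nul & cmd /a /c type "%~1" >"%~2" & chcp %cp% >nul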
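For reference, here is a usage sketch. It assumes the batch file above is saved as utf16to8.bat; that name is only an example. The batch file takes the UTF-16LE input file as its first argument and the UTF-8 output file as its second. Quote any path that contains spaces.

```batch
C:\tmp>utf16to8.bat utf16le.txt utf8.txt
C:\tmp>utf16to8.bat "my notes.txt" "my notes utf8.txt"
```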
Liviu
Re: Unicode to UTF-8
Thanks penpen and Liviu.
Liviu - the code worked!! Thanks!