save playlist as unicode or ?

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)
Contact:

save playlist as unicode or ?

#1 Post by Ed Dyreen » 07 Jan 2019 08:46

I was having some problems trying to play some MP3 files.

Code: Select all

Délinquant - LIM - Track 7.MP3
Using the code

Code: Select all

>"%~n0.m3u" (

	for %%@ in (

		ape, mp3, dts, flac, wav, mid, m4a, wma, mp4, flv

	) do 	2>nul dir /B /A-D /S "%~dp0*.%%@" &&set /A $ += 1
)
VLC goes nuts. When I open the M3U file, I find that every occurrence of the character 'é' has been replaced by ','.

Code: Select all

D,linquant - LIM - Track 7.MP3
If I do it from the console directly it works as expected.

Code: Select all

>chcp
Actieve codetabel: 850
>echo.>é.tmp
>dir /B /A-D "*.tmp"|more>out.tmp
>type out.tmp
out.tmp
é.tmp
But the same code run from within a batch file gives:

Code: Select all

Actieve codetabel: 850
Ú.tmp
out.tmp

Druk op een toets om door te gaan. . .

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: save playlist as unicode or ?

#2 Post by aGerman » 07 Jan 2019 11:21

Try to change the codepage to your default ANSI codepage (probably 1252) at the beginning of the script. As long as the characters are representable in ANSI it should work.

Steffen
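
A minimal sketch of this suggestion applied to the script from post #1. It assumes the ANSI code page really is 1252, as guessed above. The variable name `oldCP` is hypothetical, and saving and restoring the previous code page is an addition that the original script does not have. Parsing the `chcp` output this way assumes the localized message has the form `text: number`, optionally followed by a dot.

```batch
@echo off
setlocal

rem Remember the active console code page so it can be restored afterwards.
rem Assumption: chcp prints "<localized text>: <number>", optionally followed by a dot.
for /f "tokens=2 delims=:." %%C in ('chcp') do set /a "oldCP=%%C"

rem Switch to the ANSI code page (1252 is an assumption, check your own system).
>nul chcp 1252

rem Write the playlist while the ANSI code page is active.
>"%~n0.m3u" (
	for %%@ in (
		ape, mp3, dts, flac, wav, mid, m4a, wma, mp4, flv
	) do 2>nul dir /B /A-D /S "%~dp0*.%%@"
)

rem Restore the original console code page.
rem If parsing failed, oldCP is empty and this line only queries the code page.
>nul chcp %oldCP%
```

As noted above, this only helps as long as every character in the file names is representable in the chosen ANSI code page.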

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)
Contact:

Re: save playlist as unicode or ?

#3 Post by Ed Dyreen » 07 Jan 2019 12:54

aGerman wrote:
07 Jan 2019 11:21
Try to change the codepage to your default ANSI codepage (probably 1252) at the beginning of the script. As long as the characters are representable in ANSI it should work.

Steffen
The output written to the file is correct. Thanks. :D

Why is explorer using a different codepage ?

Now I see the wrong character 'Ú' displayed in the console. Why?

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: save playlist as unicode or ?

#4 Post by aGerman » 07 Jan 2019 13:59

Ed Dyreen wrote:
07 Jan 2019 12:54
Why is explorer using a different codepage ?

Now I see the wrong character 'Ú' displayed in the console. Why?
You'd better knock on Microsoft's door to ask these questions. I guess it's for historical reasons (coming from DOS) that console applications use an OEM codepage while Windows applications use ANSI codepages, although both use UTF-16 LE internally (even most of the ANSI versions of Windows API functions are just wrappers around their UTF-16 counterparts). Thus, UTF-16 LE is the one and only encoding that Windows Explorer understands without ambiguity. Whenever you work with the file system, you should access paths by reading and writing in UTF-16 if you have the opportunity to do so. And that's where you run into (one of many) limitations of Batch scripts… :wink:

However, to understand why you get a different output, just have a look at the character maps: 850 and 1252 in your case.
In codepage 1252 the letter é is represented by byte E9 (row E_, column _9). If you look up the same byte in codepage 850, you'll find the letter Ú. As you can see, the same byte represents different letters (Unicode code points) in different codepages. The reverse explains the corrupted playlist: in codepage 850 the letter é is byte 82, and byte 82 in codepage 1252 is the low quotation mark ‚, which looks like a comma. If you ask me, the concept of codepages is an outdated and ambiguous attempt to represent the letters of different languages with only one byte, and it was doomed to failure from the very beginning. But we are talking about a time when a byte was as precious as a diamond because memory was scarce and processing was slow.

Steffen

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)
Contact:

Re: save playlist as unicode or ?

#5 Post by Ed Dyreen » 08 Jan 2019 02:27

If I always switch my batch file to the user's Windows Explorer code page, could the batch file crash?

How do I figure out which code page Windows Explorer is using?

I do not want these kinds of code page errors to occur when my programs are used by others. What can I do?

Is it also possible to make the visible output show 'é'?
echo. still shows 'Ú' even though I switched to 1252 with chcp; I was expecting to see 'é'.

jfl
Posts: 226
Joined: 26 Oct 2012 06:40
Location: Saint Hilaire du Touvet, France
Contact:

Re: save playlist as unicode or ?

#6 Post by jfl » 08 Jan 2019 03:58

Ed Dyreen wrote:
08 Jan 2019 02:27
How do I figure out which code page windows explorer is using ?
Windows Explorer always uses UTF-16 LE.

Now, to find out the default 8-bit code page for non-Unicode Windows apps in your localized version of Windows, a.k.a. the "System Code Page", look at the registry value ACP:

Code: Select all

C:\JFL\Temp>reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v "ACP"

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Nls\CodePage
    ACP    REG_SZ    1252

C:\JFL\Temp>
My codepage.exe tool gives you this and more:

Code: Select all

C:\JFL\Temp>codepage
Current console code page: 437 = OEM - United States
Default console code page: 437 = OEM - United States
System code page: 1252 = ANSI - Latin I
Console font: [TrueType] Liberation Mono

C:\JFL\Temp>
The console indeed defaults to the old DOS code page for your localization, for compatibility with old batch files written for DOS.
Changing it to the system code page makes things better, as it's likely to contain all characters you're likely to encounter in your language and neighboring ones.
Changing it to code page 65001 (UTF-8) is even better when dealing with file names, because it handles all Unicode characters, not just the 256 in your system code page.
But changing it to CP 65001 is trickier when reading or writing file contents, as (unlike on Linux) very few files contain UTF-8 text in Windows.
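
The registry lookup above can also be done from inside a script, so that a batch file switches to whatever the system code page is instead of hard-coding 1252. This is only a rough sketch. The variable name `ACP` is chosen for illustration, and it assumes the `reg query` output has the three-column layout shown above.

```batch
@echo off
setlocal

rem Read the system (ANSI) code page from the registry value shown above.
rem Assumption: the value line looks like "ACP  REG_SZ  1252" (three columns).
set "ACP="
for /f "tokens=3" %%A in ('reg query "HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage" /v ACP 2^>nul ^| find "REG_SZ"') do set "ACP=%%A"

if not defined ACP (
	>&2 echo Could not read ACP from the registry.
	exit /b 1
)

rem Switch the console to the system code page.
>nul chcp %ACP%
echo Console switched to system code page %ACP%
```

The `find "REG_SZ"` filter is there to skip the header lines that `reg query` prints, so only the value line reaches the `for /f` parser.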

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)
Contact:

Re: save playlist as unicode or ?

#7 Post by Ed Dyreen » 08 Jan 2019 05:13

jfl wrote:
08 Jan 2019 03:58
Changing it to code page 65001 (UTF-8) is even better when dealing with file names, because it handles all Unicode characters, not just the 256 in your system code page.
But changing it to CP 65001 is more tricky when reading or writing files contents, as (contrary to Linux) very few files contain UTF-8 text in Windows.
I did not mention it, but switching to 65001 with chcp was actually the first thing I tried. The .M3U file then seemed empty in Notepad. Maybe it does not work on Windows XP?

I then stopped searching and decided to just ask.

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: save playlist as unicode or ?

#8 Post by penpen » 08 Jan 2019 05:47

I'm unsure (I currently can't test it), but the result might be caused by the Windows XP bug:
viewtopic.php?f=3&t=5588

I think the newest patcher is here:
http://consolesoft.com/p/cmd-xp-65001-fix/index.html


penpen
