Incorrect encoding after reading a file

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Incorrect encoding after reading a file

#1 Post by doscode » 28 Mar 2012 14:58

Hello, I need to solve this. I read directory names from a file. The names contain diacritics (Central European characters) and are not displayed properly.

Code: Select all

chcp 1250

FOR /F "tokens=* delims= usebackq" %%R IN ("directories.conf") DO (

 IF EXIST ".\%%R" (
  echo %%R
 ) ELSE (ECHO NOT INCLUDED %%R)

)


Can you help me with it?

abc0502
Posts: 1007
Joined: 26 Oct 2011 22:38
Location: Egypt

Re: Incorrect encoding after reading a file

#2 Post by abc0502 » 28 Mar 2012 17:01

I don't think DOS can display these characters.

Try this:
http://www.dostips.com/forum/viewtopic.php?f=3&t=2895
and this:
http://www.dostips.com/forum/viewtopic.php?f=3&t=1462&start=0
Check the last post in the second link.

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Incorrect encoding after reading a file

#3 Post by Liviu » 28 Mar 2012 23:00

doscode wrote: Hello, I need to solve this. I read directory names from a file. The names contain diacritics (Central European characters) and are not displayed properly.

It's not clear from your post whether (a) the batch file does in fact work and only the display is wrong i.e. the "%%R" vs. "NOT INCLUDED %%R" output is correct and just looks odd on screen, or (b) the directories are not correctly located at all.

In case (a) you probably need to just set your console font to "Lucida Console" instead of the default "Raster Font".

In case (b) you need to look at the "directories.conf" file more closely. If it's 8-bit extended ASCII, then you have to know the codepage in effect when the file was saved, and "chcp" to it first. Otherwise, if the .conf file is encoded as UTF-8 or UTF-16, then you're pretty much out of luck as far as plain batch files go.
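For case (b), the key point is that the reader must decode the file with the same code page that was active when it was written. The Python sketch below is not a batch solution; it only mirrors the FOR /F + IF EXIST loop so that point can be tested. It assumes the file was written under CP1250, and `read_conf`, `split_existing` and `demo` are hypothetical helper names. Reading the same bytes as CP852 (the OEM code page on Czech systems) instead turns "Jižní Čechy střed" into "Ji×nÝ ╚echy st°ed", and the existence check fails.

```python
import os
import tempfile


def read_conf(conf_path, encoding="cp1250"):
    """Read one directory name per line, decoding with an explicit code page.

    Assumption: the encoding must match the code page that was active when
    the file was written (CP1250 in this thread).
    """
    with open(conf_path, encoding=encoding) as f:
        # Drop line endings and skip empty lines, roughly like FOR /F does.
        return [line.rstrip("\r\n") for line in f if line.strip("\r\n")]


def split_existing(names, base_dir="."):
    """Mirror the IF EXIST test: split names into (found, not_included)."""
    found, missing = [], []
    for name in names:
        target = found if os.path.exists(os.path.join(base_dir, name)) else missing
        target.append(name)
    return found, missing


def demo(encoding_used_for_reading):
    """Write a CP1250 .conf file in a temporary folder, then check it."""
    with tempfile.TemporaryDirectory() as base:
        os.mkdir(os.path.join(base, "Jižní Čechy střed"))
        conf = os.path.join(base, "directories.conf")
        # Written the way a redirected "dir" under chcp 1250 would be: CP1250, CRLF.
        with open(conf, "w", encoding="cp1250", newline="\r\n") as f:
            f.write("Jižní Čechy střed\nMissing folder\n")
        return split_existing(read_conf(conf, encoding_used_for_reading), base)
```

Calling `demo("cp1250")` finds the folder; `demo("cp852")` reports both entries as not included, because the decoded name no longer matches the name on disk.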

Liviu

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#4 Post by doscode » 29 Mar 2012 02:19

Liviu:
The code works in the sense that there is no error other than the encoding problem (in other words, the program should work if the encoding were correct; I used the same technique for reading the directory and writing to the file). The program behaves differently than it should because of this error, so I believe it is not only the display that is incorrect: the "NOT INCLUDED %%R" branch runs instead of "echo %%R". The "echo %%R" branch does not run because the characters are not correct. The file "directories.conf" was generated by redirecting dir output, and before that I used "chcp 1250" to get the correct characters into the file.
Last edited by doscode on 29 Mar 2012 03:11, edited 1 time in total.

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#5 Post by doscode » 29 Mar 2012 02:31

abc0502:
I read there:
"I see several questions.
1. display unicode
2. Working with a fix batch-text inside your batch
3. Working with unicode in a for-loop
4. redirecting unicode to another (unicode)text file
5. comparing characters/internal representation"
(I don't understand all the points, but I use Windows-1250 in the file, i.e. CP1250.)

Would it be possible to write code that converts the characters from Windows-1250 to ANSI? I think the program works in ANSI but the file is in Windows-1250. The problem is that I cannot replace characters like "╚" with "Č" or "°" with "ř", because the script is saved in ANSI. Correct me if I am wrong, I am just an amateur. But would it be possible to write some code for the Windows-1250 characters in cmd?

Or does Windows have some command-line program for file conversion?
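The specific substitutions above (╚ for Č, ° for ř) are exactly what CP1250 bytes look like when they are drawn with CP852 glyphs. On a Czech Windows installation CP1250 already is the "ANSI" code page and CP852 is the console's OEM code page, which the Raster Font apparently keeps using regardless of chcp. So the mismatch is ANSI vs. OEM, not 1250 vs. ANSI. A minimal Python sketch of the mix-up and its reversal (the function names are illustrative only):

```python
def as_seen_in_raster_font(text):
    """Encode with the ANSI code page (1250), then read the bytes as OEM 852.

    For example Č->╚, ř->°, ž->×, í->Ý, ý->ř, á->ß, č->Ŕ.
    """
    return text.encode("cp1250").decode("cp852")


def repair(garbled):
    """Reverse the mix-up: recover the bytes via CP852, then decode them as CP1250."""
    return garbled.encode("cp852").decode("cp1250")
```

This only explains and undoes a display problem; it does not change what is stored in the file.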

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Incorrect encoding after reading a file

#6 Post by Liviu » 29 Mar 2012 09:49

doscode wrote:Liviu:
I believe it is not only the display which is incorrect. The branch "NOT INCLUDED %%R" runs instead of "echo %%R".

Please post an example of one such directory name for which the above happens. In particular, copy/paste the directory name from Windows Explorer, then also copy/paste it from the .conf file you generated.

My guess is that the name contains characters outside CP 1250, which can be verified if you provide the example. Note that the pathnames on disk are UTF-16 (unicode), and they are not always representable in a given 8-bit codepage.
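Whether a given name fits in CP 1250 can be checked character by character. A small Python sketch (the function name is illustrative, not an existing tool):

```python
def unrepresentable(name, codepage="cp1250"):
    """Return the characters of `name` that have no encoding in `codepage`."""
    missing = []
    for ch in name:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing
```

An empty result means the name survives a round trip through a CP 1250 text file; anything listed would be lost.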

Liviu

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#7 Post by doscode » 29 Mar 2012 17:15

More questions

* I have seen somewhere a command that prints the supported code pages, but I have lost it. What is the command?

* I found this command:

Code: Select all

cmd /u /c type oem.txt > utf.txt

Is this the code to convert a file from one CP to another CP? If yes, what does "type" do?

* Can I use this command to convert the 1250 file to Unicode?
* Does chcp support Unicode? What is the command? This does not work for me:
http://wei-jiang.com/system/windows/how-to-set-unicode-encoding-in-windows-command-line

I edited the code to use chcp 65001, and the batch file crashes:

Code: Select all

@echo off
cd ..
chcp 65001
del directories_.conf

FOR /F "tokens=2 delims=:" %%C IN ('chcp') DO (
Echo Your code page is %%C
)

FOR /F "delims=!" %%R IN ('dir * /b /a:d /o:n') DO (

 IF EXIST "%%R\scenery" (
  echo %%R
  echo %%R >> directories_.conf
 ) ELSE (ECHO NOT INCLUDED %%R)

)

Echo Directory list created...
pause



* So if I saved the file (dir /a:d /b > file.txt) as Unicode, could I open it in Notepad and read it? And would the characters be OK if I read the file again?

Example of the contents of the file:
Jižní Čechy střed
Jižní Čechy východ
Jižní Čechy západ
Jižní Morava západ - Telč
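About `cmd /u /c type oem.txt > utf.txt`: `type` is the internal command that prints a file's contents, and `/u` tells cmd to write the output of internal commands to a pipe or file as Unicode (UTF-16LE). As far as I know, the result has no byte-order mark and the input bytes are interpreted in the active code page, so running it after `chcp 1250` should convert a CP1250 file to UTF-16. The Python sketch below does the same conversion explicitly. `to_utf16le` and `convert_file` are hypothetical names, and the optional BOM is an assumption added because it lets Notepad detect the encoding reliably.

```python
def to_utf16le(data, source_codepage="cp1250", bom=False):
    """Decode 8-bit text and re-encode it as UTF-16LE, optionally with a BOM."""
    text = data.decode(source_codepage)
    prefix = b"\xff\xfe" if bom else b""
    return prefix + text.encode("utf-16-le")


def convert_file(src_path, dst_path, source_codepage="cp1250", bom=True):
    """Convert a whole file; CRLF line endings pass through unchanged."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_utf16le(data, source_codepage, bom))
```

Converting is only half the problem, though: a batch FOR loop still cannot read the UTF-16 result back.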

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Incorrect encoding after reading a file

#8 Post by Liviu » 29 Mar 2012 19:37

doscode wrote: Example of the contents of the file:
Jižní Čechy střed
Jižní Čechy východ
Jižní Čechy západ
Jižní Morava západ - Telč
I asked that you copy the directory names from both Windows Explorer, and separately from the .conf file. Please do that. Yes, it matters.

Taking the directory names you posted at face value, I do not see the problem you describe.

I created a directory C:\tmp\doscode. I created the 4 directories you quoted under it. I generated a .conf file as CP 1250 txt. I then ran your original batch file (saved as doscode.cmd) against the .conf file. It worked, details copied below - in particular, you'll notice that there is no "NOT INCLUDED" echo.

Code: Select all

C:\tmp\doscode>chcp 1250
Active code page: 1250

C:\tmp\doscode>dir /ad /b
Jižní Morava západ - Telč
Jižní Čechy střed
Jižní Čechy východ
Jižní Čechy západ

C:\tmp\doscode>dir /ad /b >directories.conf

C:\tmp\doscode>type doscode.cmd
@echo off

chcp 1250

FOR /F "tokens=* delims= usebackq" %%R IN ("directories.conf") DO (
  IF EXIST ".\%%R" (
    echo %%R
  ) ELSE (
    ECHO NOT INCLUDED %%R
  )
)

C:\tmp\doscode>doscode
Active code page: 1250
Jižní Morava západ - Telč
Jižní Čechy střed
Jižní Čechy východ
Jižní Čechy západ

C:\tmp\doscode>

Please tell what you are doing differently, or provide a complete reproducible test case where results don't match expectations.

Liviu

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#9 Post by doscode » 30 Mar 2012 02:53

Why Windows Explorer? If I copy the directory names from Explorer, they are the same as above, e.g. Jižní Morava západ - Telč

Do you mean to copy from the console?

This is the output of my program:

Code: Select all

(Edit: the line "Active codepage 1250" belongs here.
I forgot to include it, and I cannot add it in edit mode, because the editor shows the characters correctly, while the console did not.
)
NOT INCLUDED Ji×nÝ ╚echy st°ed
NOT INCLUDED Ji×nÝ ╚echy vřchod
NOT INCLUDED Ji×nÝ ╚echy zßpad
NOT INCLUDED Ji×nÝ Morava zßpad - TelŔ


I use a script called "make activation list.bat".

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#10 Post by doscode » 30 Mar 2012 03:02

If I change the font to Lucida Console:

Code: Select all

Aktivní znaková stránka: 1250
NOT INCLUDED Jižní Čechy střed
NOT INCLUDED Jižní Čechy východ
NOT INCLUDED Jižní Čechy západ
NOT INCLUDED Jižní Morava západ - Telč

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#11 Post by doscode » 30 Mar 2012 04:53

I have found the problem, and it has nothing to do with encoding. I have two files with similar names (one is a copy of the original), and I had made some changes in the original, so the path is not the same. I found that the copy works and the original does not, and then I found the path problem. Sorry for wasting your time.

Nevertheless, if I find an encoding problem concerning this script, I will write here. Thanks for your help.

doscode
Posts: 175
Joined: 15 Feb 2012 14:02

Re: Incorrect encoding after reading a file

#12 Post by doscode » 30 Mar 2012 05:11

This is the working script (just a short part):

Code: Select all

T:\test>(
echo on
 IF EXIST ".\Jižní Morava východ 1 " (echo Jižní Morava východ 1   )  ELSE (ECHO NOT INCLUDED Jižní Morava východ 1   )
)
Jižní Morava východ 1

T:\test>(
echo on
 IF EXIST ".\Jižní Morava východ 2 " (echo Jižní Morava východ 2
   )  ELSE (ECHO NOT INCLUDED Jižní Morava východ 2   )
)
Jižní Morava východ 2


Do you see the space after ".\Jižní Morava východ 2"?
It does not matter in this test condition, but there is no space in the real name.

If I change the script to:

Code: Select all

 IF EXIST ".\Jižní Morava východ 2 \scenery" 


then the space becomes a problem when testing the correct path. The space is read from the file, where it sits at the end of the line.

I found that the space was generated by this command:

Code: Select all

echo %%R >> directories.conf

which should be

Code: Select all

echo %%R>> directories.conf
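The only difference between the two echo lines is the space before `>>`: batch echoes everything up to the redirection, so `echo %%R >> file` writes the name followed by a space. A Python sketch of why that one character matters for the `%%R\scenery` test, which is the check that failed here (`write_conf`, `has_scenery` and `demo` are hypothetical helpers; the folder layout is assumed from the posts):

```python
import os
import tempfile


def write_conf(names, conf_path):
    """Write one CP1250 name per CRLF line with nothing before the line end.

    This matches `echo %%R>> directories.conf`; `echo %%R >> ...` adds a space.
    """
    with open(conf_path, "w", encoding="cp1250", newline="\r\n") as f:
        for name in names:
            f.write(name + "\n")


def has_scenery(base_dir, name):
    """Equivalent of the batch IF EXIST test on the scenery subfolder."""
    return os.path.isdir(os.path.join(base_dir, name, "scenery"))


def demo():
    with tempfile.TemporaryDirectory() as base:
        os.makedirs(os.path.join(base, "Jižní Morava východ 2", "scenery"))
        conf = os.path.join(base, "directories.conf")
        write_conf(["Jižní Morava východ 2"], conf)
        with open(conf, encoding="cp1250") as f:
            name = f.readline().rstrip("\n")
        # A stray trailing space, as produced by `echo %%R >> ...`, breaks the lookup.
        return has_scenery(base, name), has_scenery(base, name + " ")
```

`demo()` returns `(True, False)`: the clean name finds the scenery folder, the name with a trailing space does not.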

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Incorrect encoding after reading a file

#13 Post by Liviu » 30 Mar 2012 09:57

doscode wrote:Why Windows Explorer? If I would copy the names of directories from explorer, so the names are same as the above. E.g. Jižní Morava západ - Telč

The names are the same in this case, indeed. However, the names would be different if they contained characters outside CP 1250, for example "α" (Greek alpha). Explorer would show (and copy) the actual "α", while the CP 1250 text file would have a "?" in its place (assuming it was generated with a "dir /ad /b >etc" command).

A related point is that your batch file will not work for directory names containing characters outside CP 1250, and it cannot possibly work, since it's simply not possible to store arbitrary UTF-16 (unicode) names in a text file using an 8-bit codepage. Before you ask, it's also not possible to read UTF-8 or UTF-16 text files with a "for" loop, so there is no obvious workaround.
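The "?" substitution can be reproduced in Python, whose `errors="replace"` mode uses the same "?" placeholder. The sketch assumes CP 1250 and only illustrates the data loss; it is not a model of what dir does internally, and the function name is illustrative.

```python
def roundtrip_through_codepage(name, codepage="cp1250"):
    """Return what survives writing `name` to an 8-bit text file and reading it back.

    Characters outside the code page become '?', so the result no longer
    matches the real (UTF-16) name on disk.
    """
    return name.encode(codepage, errors="replace").decode(codepage)
```

For "Jižní α" the round trip yields "Jižní ?", so a later existence check on that name can no longer match the real folder.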

Liviu
