In fact, we should store multilanguage text in Unicode (UTF-16LE preferred), but before using it as BAT code, we can convert it to the actual OEM code page. Windows automatically recodes a Unicode file according to the local regional settings.
Ah, we may have a philosophical difference of opinion here.
I realize that a batch file is an interpreted text file, and Unicode is the best way (the only reliable standard way) to store text with any arbitrary combination of languages. BUT, my original intent was to have a functioning batch file that could be passed back and forth and work properly with "any" language. As you pointed out, a batch file in Unicode cannot be executed. If we follow your suggestion, then when the first person converts the Unicode to his/her OEM character set, all the unsupported characters are irreversibly translated into question marks "?". The batch file is no longer universal, because it cannot be passed on to the next person who needs the translation maps that are now corrupted.
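To make the loss concrete, here is a minimal sketch in Python. Its codecs only stand in for the Windows Unicode-to-OEM conversion, which may behave differently in detail; the sample text and the choice of cp866 as the Russian OEM code page are my assumptions for illustration.

```python
# Python's codecs stand in here for the Windows Unicode-to-OEM conversion.
text = "гг дд мм"  # sample Russian date letters, stored as Unicode (assumed example)

# Encoding to a code page that contains the letters keeps them intact.
as_cp866 = text.encode("cp866")

# Encoding to a code page without Cyrillic substitutes "?" (0x3F) for each letter.
as_cp437 = text.encode("cp437", errors="replace")

print(as_cp866.hex(" ").upper())
print(as_cp437.hex(" ").upper())

# Decoding the damaged bytes only gives the question marks back.
print(as_cp437.decode("cp437"))
```

Once the letters have become 0x3F, nothing in the file records which characters they used to be, so the next person cannot recover the map.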
The problem is even more acute if you consider the :asc, :chr, :str2hex, :hex2str functions. These require a string that represents all byte values from 0x01 - 0xFF. The Unicode representation will change from language to language, but the binary representation of the functioning batch file will not.
In my mind the problem is solved by treating the batch file not as text but as binary. The meaning of each character (byte) greater than 0x7F changes between languages, but the "universal" function(s) continue(s) to work as intended. In theory, anyway; I'm still not sure what happens with multi-byte languages. I acknowledge that my strategy introduces its own set of issues, probably including some that I'm not aware of. An obvious one is how to post a binary representation of a batch file on this site.
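A rough way to check the "binary, not text" idea is sketched below in Python. It assumes only single-byte OEM code pages (437, 858 and 866 are my picks for the test) and says nothing about multi-byte ones.

```python
# Every byte value 0x01-0xFF, like the lookup string those functions need.
raw = bytes(range(0x01, 0x100))

# View the same bytes through several single-byte OEM code pages.
views = {cp: raw.decode(cp) for cp in ("cp437", "cp858", "cp866")}

# Each view converts back to exactly the same bytes, so nothing is lost
# as long as the file itself is copied byte for byte.
round_trips = all(text.encode(cp) == raw for cp, text in views.items())

# Bytes 0x01-0x7F read the same everywhere; the bytes above them do not.
same_low = views["cp437"][:0x7F] == views["cp866"][:0x7F]
differs_high = views["cp437"][0x7F:] != views["cp866"][0x7F:]

print(round_trips, same_low, differs_high)
```

The displayed characters differ per code page, but the byte values that the functions index into stay the same.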
I don't think there is a perfect answer to this debate. I'd love input from more people on their opinion of the best way to proceed. Does the community at large think that treating a batch file as binary is a bad idea? Or is it worth pursuing?
With regard to my last posted code:
No, this Unicode representation is incorrect, probably because you did not copy it from a Unicode source. The binary dump of this representation differs from the original: 3F 3F 20 FD FD 20 BF BF
Well, as per my earlier discussion, I was not attempting to post Unicode. Can you tell me if the steps I outlined in my prior post work for you?
Addendum added 2011-05-13
Oops! I misunderstood what you wrote. I see that you already did try it and OEM 858 is obviously incorrect.
Try these maps using OEM 437. With correct manipulation using DOS OEM/Unicode conversion I am able to reconstruct the Russian, so I think it should work for you as well.
"dd mm yy,dd mm yy; 64 64 20 6D 6D 20 79 79 - English"
"yy mm dd,JJ MM TT; 4A 4A 20 4D 4D 20 54 54 - German"
"yy dd mm,úú ññ ¼¼; A3 A3 20 A4 A4 20 AC AC - Russian"
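As a cross-check of the hex dumps in these maps, the sketch below decodes each one with Python's codecs, first as OEM 437 and then as OEM 866. Taking 866 as the Russian OEM code page is my assumption; the dumps are copied from the maps above.

```python
# Hex dumps copied from the three maps above.
dumps = {
    "English": "64 64 20 6D 6D 20 79 79",
    "German": "4A 4A 20 4D 4D 20 54 54",
    "Russian": "A3 A3 20 A4 A4 20 AC AC",
}

decoded = {}
for language, dump in dumps.items():
    raw = bytes.fromhex(dump)
    # Same bytes, read once as OEM 437 and once as OEM 866.
    decoded[language] = (raw.decode("cp437"), raw.decode("cp866"))
    print(language, *decoded[language], sep=" | ")
```

The English and German bytes are plain ASCII and read the same under both code pages. The Russian bytes show up as "úú ññ ¼¼" under 437 and as "гг дд мм" under 866, which is why the 437 text above can carry the Russian map intact.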
In the meantime I will try to figure out what character set my editor is using for display.
Thanks, amel27, for your valued help and input.