Hi Dave
dbenham wrote (01 May 2020 23:22):
"pipes do not in and of themselves do any type of transformation."
You're splitting hairs! OK, it's not actually the pipe that does the transformation; it's the batch interpreter, which does it before passing the data to the pipe.
In practice, the effect is the same: any pipe in a batch involves a conversion of the UTF-16 strings used internally by cmd.exe to 8-bit strings encoded in the current console code page.
C:\JFL\Proj\Non-ASCII>chcp
Active code page: 437
C:\JFL\Proj\Non-ASCII>dir
Volume in drive C has no label.
Volume Serial Number is B4F9-E8AF
Directory of C:\JFL\Proj\Non-ASCII
2018-02-12 17:37 <DIR> .
2018-02-12 17:37 <DIR> ..
2017-04-13 18:07 <DIR> Arabic العربية
2017-04-13 18:07 <DIR> Chinese 中文
2017-08-17 14:34 <DIR> Czech Čeština
2017-08-17 14:35 <DIR> Español Spanish
2017-03-22 23:18 <DIR> French Français
2017-03-15 21:00 <DIR> German Deutsch
2017-08-17 14:36 <DIR> Greek Ελληνικά
2017-04-13 18:08 <DIR> Hebrew עִבְרִית
2017-08-17 14:37 <DIR> Hindi हिन्दी
2017-04-13 18:09 <DIR> Japanese 日本語
2017-08-17 14:37 <DIR> Korean 한국어
2017-04-13 18:06 <DIR> Russian Русский
2017-08-18 18:09 <DIR> Thai ภาษาไทย
2017-03-15 17:13 992 ansi.txt
2017-03-12 13:51 105 README.txt
2018-02-12 17:37 4,937 test.tar.gz
2017-03-15 15:37 1,986 utf16.txt
2017-03-15 17:33 1,251 utf7.txt
2017-03-15 13:14 1,063 utf8.txt
6 File(s) 10,334 bytes
15 Dir(s) 336,079,147,008 bytes free
C:\JFL\Proj\Non-ASCII>dir | more
Volume in drive C has no label.
Volume Serial Number is B4F9-E8AF
Directory of C:\JFL\Proj\Non-ASCII
2018-02-12 17:37 <DIR> .
2018-02-12 17:37 <DIR> ..
2017-04-13 18:07 <DIR> Arabic ???????
2017-04-13 18:07 <DIR> Chinese ??
2017-08-17 14:34 <DIR> Czech Cestina
2017-08-17 14:35 <DIR> Español Spanish
2017-03-22 23:18 <DIR> French Français
2017-03-15 21:00 <DIR> German Deutsch
2017-08-17 14:36 <DIR> Greek ε???????
2017-04-13 18:08 <DIR> Hebrew ????????
2017-08-17 14:37 <DIR> Hindi ??????
2017-04-13 18:09 <DIR> Japanese ???
2017-08-17 14:37 <DIR> Korean ???
2017-04-13 18:06 <DIR> Russian ???????
2017-08-18 18:09 <DIR> Thai ???????
2017-03-15 17:13 992 ansi.txt
2017-03-12 13:51 105 README.txt
2018-02-12 17:37 4,937 test.tar.gz
2017-03-15 15:37 1,986 utf16.txt
2017-03-15 17:33 1,251 utf7.txt
2017-03-15 13:14 1,063 utf8.txt
6 File(s) 10,334 bytes
15 Dir(s) 336,079,147,008 bytes free
C:\JFL\Proj\Non-ASCII>chcp 65001
Active code page: 65001
C:\JFL\Proj\Non-ASCII>dir
Volume in drive C has no label.
Volume Serial Number is B4F9-E8AF
Directory of C:\JFL\Proj\Non-ASCII
2018-02-12 17:37 <DIR> .
2018-02-12 17:37 <DIR> ..
2017-04-13 18:07 <DIR> Arabic العربية
2017-04-13 18:07 <DIR> Chinese 中文
2017-08-17 14:34 <DIR> Czech Čeština
2017-08-17 14:35 <DIR> Español Spanish
2017-03-22 23:18 <DIR> French Français
2017-03-15 21:00 <DIR> German Deutsch
2017-08-17 14:36 <DIR> Greek Ελληνικά
2017-04-13 18:08 <DIR> Hebrew עִבְרִית
2017-08-17 14:37 <DIR> Hindi हिन्दी
2017-04-13 18:09 <DIR> Japanese 日本語
2017-08-17 14:37 <DIR> Korean 한국어
2017-04-13 18:06 <DIR> Russian Русский
2017-08-18 18:09 <DIR> Thai ภาษาไทย
2017-03-15 17:13 992 ansi.txt
2017-03-12 13:51 105 README.txt
2018-02-12 17:37 4,937 test.tar.gz
2017-03-15 15:37 1,986 utf16.txt
2017-03-15 17:33 1,251 utf7.txt
2017-03-15 13:14 1,063 utf8.txt
6 File(s) 10,334 bytes
15 Dir(s) 336,078,675,968 bytes free
C:\JFL\Proj\Non-ASCII>dir | more
Volume in drive C has no label.
Volume Serial Number is B4F9-E8AF
Directory of C:\JFL\Proj\Non-ASCII
2018-02-12 17:37 <DIR> .
2018-02-12 17:37 <DIR> ..
2017-04-13 18:07 <DIR> Arabic العربية
2017-04-13 18:07 <DIR> Chinese 中文
2017-08-17 14:34 <DIR> Czech Čeština
2017-08-17 14:35 <DIR> Español Spanish
2017-03-22 23:18 <DIR> French Français
2017-03-15 21:00 <DIR> German Deutsch
2017-08-17 14:36 <DIR> Greek Ελληνικά
2017-04-13 18:08 <DIR> Hebrew עִבְרִית
2017-08-17 14:37 <DIR> Hindi हिन्दी
2017-04-13 18:09 <DIR> Japanese 日本語
2017-08-17 14:37 <DIR> Korean 한국어
2017-04-13 18:06 <DIR> Russian Русский
2017-08-18 18:09 <DIR> Thai ภาษาไทย
2017-03-15 17:13 992 ansi.txt
2017-03-12 13:51 105 README.txt
2018-02-12 17:37 4,937 test.tar.gz
2017-03-15 15:37 1,986 utf16.txt
2017-03-15 17:33 1,251 utf7.txt
2017-03-15 13:14 1,063 utf8.txt
6 File(s) 10,334 bytes
15 Dir(s) 336,078,675,968 bytes free
C:\JFL\Proj\Non-ASCII>
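The difference between the two `dir | more` runs above can be reproduced outside cmd.exe. The Python sketch below is only an approximation: the helper name `through_pipe` is made up for illustration, and Python's `cp437` codec replaces unmappable characters with `?` without the best-fit mapping that Windows applies (so `Č` would come out as `?` rather than `C`).

```python
def through_pipe(text: str, codepage: str) -> str:
    """Encode text to 8-bit bytes in `codepage`, then decode it back.

    Hypothetical helper: it mimics what the reader of a pipe receives when
    the writer converts its Unicode strings to the given code page.
    """
    raw = text.encode(codepage, errors="replace")  # unmappable chars become b"?"
    return raw.decode(codepage)


if __name__ == "__main__":
    for name in ["Français", "Español", "Русский", "中文"]:
        print(f"{name!r}: cp437 -> {through_pipe(name, 'cp437')!r}, "
              f"utf-8 -> {through_pipe(name, 'utf-8')!r}")
```

With code page 437, the French and Spanish names survive because `ç` and `ñ` exist in that code page, while the Russian and Chinese names collapse to question marks. With UTF-8 (code page 65001), every name survives.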
Exactly the same conversion happens for argument strings passed by a batch to sub-commands.
The only case where there is no conversion is in pipes between two external sub-commands.
Example: myprog1.exe | myprog2.exe
But even in this case, the standard input and arguments of myprog1.exe ARE converted by cmd to the console code page, and the output of myprog2.exe to the console IS converted from the code page to UTF-16, which trashes all characters that are not available in that console code page. And, de facto, the input of myprog2.exe IS encoded in the current console code page.
I'm very sensitive to all that because most third-party command-line programs mutilate my first name when displaying my home directory name in the Windows console, even more so when their output goes through a pipe. And I hate that.
To avoid that, programs that want to maximize their chance of correctly displaying characters in the user's language (French for me, but it would be the same for Chinese on a Chinese version of Windows) must assume that their console AND piped input and/or output are encoded in the current console code page. (Which is unfortunately never 65001 by default.)
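As a rough illustration of that rule, here is a Python sketch of the decoding side: bytes read from a pipe are decoded with the console code page instead of a hard-coded or guessed encoding. The function names `codec_for_code_page`, `console_code_page` and `decode_piped` are hypothetical. `GetConsoleCP` is the real Win32 call, reached here through `ctypes`. The fallback value 437 outside Windows is an arbitrary assumption, made only so that the sketch runs anywhere.

```python
import sys


def codec_for_code_page(cp: int) -> str:
    """Map a Windows code page number to a Python codec name."""
    return "utf-8" if cp == 65001 else f"cp{cp}"


def console_code_page(default: int = 437) -> int:
    """Return the console input code page, or `default` when unavailable."""
    if sys.platform == "win32":
        import ctypes
        cp = ctypes.windll.kernel32.GetConsoleCP()  # 0 signals failure
        if cp:
            return cp
    return default  # assumption: arbitrary fallback for non-Windows runs


def decode_piped(raw: bytes, cp: int) -> str:
    """Decode bytes read from a pipe, assuming they are in code page `cp`."""
    return raw.decode(codec_for_code_page(cp), errors="replace")
```

A filter would then call `decode_piped(sys.stdin.buffer.read(), console_code_page())`. Decoding with the wrong code page is exactly what mutilates names: bytes written as Windows-1252 but decoded as code page 437 turn the `ç` of `Français` into a different character.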
This is what my SysToolsLib programs do, and they display French characters (for me) in file names correctly in all cases, whatever the console code page, and whether in pipes or not. All the ports of Unix tools to Windows that I know of fail miserably at one or the other, and usually at both.
Jean-François