dbenham wrote:Here is a simple test case. [...] set a variable to extended ASCII 0xC8: [...] It looks like the direct read with the S option is working, and the piped method is failing. But all is not as it seems.
Why, it is exactly as it seems. If, instead of piping to repl, you just run "echo %input% | more", it will still return a plain "E" under codepage 437. The "corruption" is caused by the pipe, not by whatever is on the right-hand side of it.
dbenham wrote:The glyphs are being written within JScript, and it is using a different code page to write to the screen.
No, the console output of cscript is fully Unicode and codepage-independent (see also the old, but I believe still applicable, note at
http://blogs.msdn.com/b/ericlippert/archive/2004/02/11/71472.aspx). Conversion to the active codepage only occurs when the output is piped or redirected.
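To illustrate why a conversion to the active codepage is lossy for this test character, here is a minimal Python sketch. Python is used only as a neutral way to inspect the code pages and is not part of repl. It shows that È (U+00C8, byte 0xC8 in codepage 1252) has no slot in codepage 437. Python's codec simply refuses the character, whereas Windows falls back to a "best fit"; the best_fit_guess helper below is a hypothetical approximation of that fallback, not the actual Windows mapping table.

```python
import unicodedata

ch = b"\xc8".decode("cp1252")          # the test character: U+00C8

# Codepage 437 has no slot for U+00C8, so a strict conversion fails.
try:
    ch.encode("cp437")
    representable = True
except UnicodeEncodeError:
    representable = False

def best_fit_guess(text, codepage):
    """Hypothetical stand-in for a best-fit conversion: strip accents
    from characters the codepage cannot represent, and fall back to
    '?' if even the base letter does not fit."""
    out = []
    for c in text:
        try:
            c.encode(codepage)
            out.append(c)
        except UnicodeEncodeError:
            base = "".join(p for p in unicodedata.normalize("NFD", c)
                           if not unicodedata.combining(p))
            out.append(base.encode(codepage, errors="replace").decode(codepage))
    return "".join(out)

print(representable)                        # False
print(best_fit_guess(ch, "cp437"))          # E
print(best_fit_guess(ch, "cp1252") == ch)   # True: 1252 keeps the character
```

The plain "E" matches what the piped "echo %input% | more" shows under codepage 437, which is consistent with the pipe itself doing the conversion.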
dbenham wrote:My main goal for REPL.BAT was to enable editing of files using batch. If I redirect the output of REPL.BAT to a file, and then examine the contents, we see that the reverse is actually true. The direct read of the variable is corrupting the content
I don't dispute the symptom, but I disagree with the diagnosis.
The corruption occurs due to the file redirection, and has nothing to do with reading the variable directly.
Just to prove that point beyond doubt, modify the "cscript" call in repl as follows.
Code: Select all
::: cscript //E:JScript //nologo "%~f0" %*
cscript //U //E:JScript //nologo "%~f0" %*
Then copy the code below to, say, replVar7.cmd (with "7" as a reminder that it doesn't work under XP).
Code: Select all
@echo off & setlocal disableDelayedExpansion
:: save original codepage ('.' for some localized windows e.g. german)
for /f "tokens=2 delims=:." %%a in ('chcp') do @set /a "cp=%%~a"
:: run repl, save output to UTF16-LE file with BOM
chcp 1252 >nul
(set /p =ÿþ) <nul >repl-u16.tmp 2>nul
call repl %3 %4 LXS %1 >>repl-u16.tmp
:: convert file to UTF-8 and read variable from it *** does NOT work in XP ***
chcp 65001 >nul
type repl-u16.tmp >repl-u8.tmp
for /f "delims=" %%s in (repl-u8.tmp) do set "output=%%s"
:: restore original codepage
chcp %cp% >nul
endlocal & set "%2=%output%" & goto :eof
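The byte-level effect of the BOM trick and the UTF-16 to UTF-8 step in the script above can be sketched in Python. This is again only a way to inspect bytes; the variable names and the sample text are assumptions for illustration. Under codepage 1252 the characters ÿþ are the bytes 0xFF 0xFE, which is the UTF-16LE byte order mark, so the output appended by "cscript //U" forms a valid UTF-16 file that can be transcoded to UTF-8 without loss.

```python
import codecs

# "ÿþ" written under codepage 1252 yields exactly the UTF-16LE BOM.
bom = "\u00ff\u00fe".encode("cp1252")
print(bom == codecs.BOM_UTF16_LE)        # True

# Assumed sample of what cscript //U would append: UTF-16LE text
# (U+00C8, Greek alpha, euro sign).
payload = "\u00c8\u03b1\u20ac".encode("utf-16-le")
utf16_file = bom + payload

# Decoding with the generic "utf-16" codec consumes the BOM.
text = utf16_file.decode("utf-16")
utf8_file = text.encode("utf-8")

print(text == "\u00c8\u03b1\u20ac")      # True: nothing was lost
print(utf8_file.hex())                   # c388ceb1e282ac
```

No codepage-dependent conversion happens anywhere along this path, which is why the result survives regardless of the codepage that was active when the script started.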
Now run the following at a Win7 cmd prompt.
Code: Select all
C:\tmp>chcp 437 >nul
C:\tmp>set "input=‹αß©∂€›" & set input
input=‹αß©∂€›
C:\tmp>(set output=) && (call replVar7 input output "x" "y") && (set output) || (echo *** error)
output=‹αß©∂€›
C:\tmp>(set output=) && (call replVar7 input output "©" "-!-") && (set output) || (echo *** error)
output=‹αß-!-∂€›
C:\tmp>chcp 1252 >nul
C:\tmp>(set output=) && (call replVar7 input output "x" "y") && (set output) || (echo *** error)
output=‹αß©∂€›
C:\tmp>(set output=) && (call replVar7 input output "©" "-!-") && (set output) || (echo *** error)
output=‹αß-!-∂€›
C:\tmp>(set output=) && (call replVar7 input output "α" "-!-") && (set output) || (echo *** error)
output=‹-!-ß©∂€›
C:\tmp>(set output=) && (call replVar7 input output "a" "-!-") && (set output) || (echo *** error)
output=‹αß©∂€›
C:\tmp>set "input=È" & set input
input=È
C:\tmp>(set output=) && (call replVar7 input output "x" "y") && (set output) || (echo *** error)
output=È
C:\tmp>(set output=) && (call replVar7 input output "È" "y") && (set output) || (echo *** error)
output=y
C:\tmp>(set output=) && (call replVar7 input output "È" "ÈÈÈ") && (set output) || (echo *** error)
output=ÈÈÈ
All output is correct, and there is no corruption. Since the only change made to repl was in its output method, the corruption that happened before was not about reading the variable correctly, but about writing the result out correctly. (Note that I am not claiming there are no other codepage issues with repl/S, only that your test case is not an example of one.)
The way I see it, the issue here is not about repl in particular. You could replace "repl a a s input" with "cmd /v/c echo !input!" and have pretty much the same problem - it's a child process that outputs a string to the console correctly, but the parent has no (portable, Unicode-safe) way to capture that string into a variable of its own.
Liviu