Is there an explanation for the result of SET/P and CLIP?

Moderator: DosItHelp

trebor68
Posts: 146
Joined: 01 Jul 2011 08:47

#1 Post by trebor68 » 21 Mar 2015 05:15

I got an unexpected result while testing my batch file. The result can also be reproduced at the prompt.

Code: Select all

>echo. | set /p ".=a15(51|62)"
a15(51|62)
>echo. | set /p ".=a15(51|62)" | clip

>

The first command displays the result correctly.
The second command copies the result to the clipboard, but when it is pasted into a program, what look like Japanese characters are displayed.

Code: Select all

>echo. | set /p ".=A15(51|62)"
A15(51|62)
>echo. | set /p ".=A15(51|62)" | clip

>

The first command displays the result correctly.
The second command copies the result to the clipboard, and here the string is displayed correctly when pasted into a program.

The only difference between the two cases is that the letter "a" is lowercase in the first and uppercase in the second.

However, this result only occurs if the string has this structure:
a lowercase letter, two digits, round bracket, two digits, pipe symbol, two digits and round bracket
a15(51|62)


My questions are:
Is there an explanation for this?
Does this also happen in other versions of Windows? I use Win7.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Is there an explanation for the result of SET/P and CLIP

#2 Post by foxidrive » 21 Mar 2015 06:38

It's likely to be a codepage issue.

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Is there an explanation for the result of SET/P and CLIP

#3 Post by penpen » 21 Mar 2015 09:55

The command line interpreter (CLI: cmd.exe) internally encodes characters as UCS-2 (a subset of UTF-16 LE; always 2 bytes per character).
By default, however, the CLI uses ANSI whenever a character is read from or written to a pipe, a device, or a file. In general the length per character varies with the active codepage; here it is 1 byte per character because all characters are standard ASCII values.
clip.exe autodetects the character encoding and chooses UCS-2 in this case.

So these transformations occur (double quotes, left/right square brackets and commas are delimiters and not part of the string):
[6100, 3100, 3500, 2800, 3500, 3100, 7C00, 3600, 3200, 2900] (hex representation; UCS-2 string "a15(51|62)")
--> [61, 31, 35, 28, 35, 31, 7C, 36, 32, 29] (hex representation; transformed to ANSI and written to pipe)
--> [6131, 3528, 3531, 7C36, 3229] (hex representation; read UCS-2 string "ㅡ⠵ㄵ㙼⤲" from pipe)

Sidenote - the characters in detail (hex notation == Unicode codepoint == Unicode character <- Name):
- 6131 == U+3161 == "ㅡ" <-- HANGUL LETTER EU
- 3528 == U+2835 == "⠵" <-- BRAILLE PATTERN DOTS-1356
- 3531 == U+3135 == "ㄵ" <-- HANGUL LETTER NIEUN-CIEUC
- 7C36 == U+367C == "㙼" <-- CJK UNIFIED IDEOGRAPH-367C [no descriptive name, only the definition: A military wall, a rampart, to pile up, a pile]
- 3229 == U+2932 == "⤲" <-- NORTH WEST ARROW CROSSING NORTH EAST ARROW
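
The byte reinterpretation described above can be reproduced outside of cmd.exe. The following Python sketch is only an illustration: it assumes the string reaches the pipe as one byte per character and that the reader decodes those same bytes as UTF-16LE. The names `ansi_bytes`, `misread` and `code_points` are made up for this example.

```python
import unicodedata

text = "a15(51|62)"

# What is written to the pipe: one byte per character (all characters are plain ASCII).
ansi_bytes = text.encode("ascii")

# What a reader gets if it treats the same bytes as UTF-16LE (two bytes per character).
misread = ansi_bytes.decode("utf-16-le")

code_points = [f"U+{ord(ch):04X}" for ch in misread]

print(ansi_bytes.hex(" "))
for ch, cp in zip(misread, code_points):
    # Print only code point and name so the output stays ASCII on any console.
    print(cp, unicodedata.name(ch, "<unnamed>"))
```

Ten bytes become five 16-bit code units, which is why the pasted text has exactly five characters.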

Edit 2: Using "cmd /U" is not a solution; sorry.
Original (retracted) suggestion: to avoid this you may use the "cmd.exe" switch "/U" (if not using Win9x, the second variant should be slightly faster):

Code: Select all

echo. | cmd /U /C set /P ".=a15(51|62)" | clip
cmd /U /C ^<nul set /P "=a15(51|62)" | clip


penpen

Edit: clip.exe autodetects the character encoding (UCS-2/ANSI) and does not always use UCS-2, so I corrected this error; see Liviu's next post.
Edit 2: Using "cmd /U" does not solve the issue, it only changes the strings that are involved; see my next post for
Last edited by penpen on 23 Mar 2015 18:33, edited 2 times in total.

Re: Is there an explanation for the result of SET/P and CLIP

#4 Post by trebor68 » 21 Mar 2015 20:17

An uppercase letter, two digits, round bracket, two digits, pipe symbol, two digits and round bracket
A15(51|62)

Here, however, the correct text is passed: A15(51|62)

Code: Select all

echo. | set /p ".=A15(51|62)" | clip

Other strings have not caused any problems either.
shorter strings: a155\d
longer strings: a15(51|62|73|84)
Strings longer than 100 characters have also been copied to the clipboard and pasted into the program without problems.


I'll edit the batch file so that such a problem does not occur later.

Thank You.

trebor68

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Is there an explanation for the result of SET/P and CLIP

#5 Post by Liviu » 22 Mar 2015 00:15

trebor68 wrote: Does this also happen in other versions of Windows? I use Win7.
Confirmed in Win7 x64, as well as Server 2012 R2 (thus likely Windows 8.1, too).
(Just as a side note, next time please specify which particular program you pasted into, since that may be relevant in other cases.)

Here though, the (mis)behavior is not confined to clip.exe, or related to the clipboard. Same thing happens while simply piping into more.exe.

Code: Select all

C:\tmp>echo. | set /p ".=a15(51|62)" | more
?????

C:\tmp>echo. | set /p ".=a15(51|62" | more
a15(51|62

C:\tmp>echo. | set /p ".=a15(51|62)+" | more
a15(51|62)+

@penpen explained the mechanics of _what_ happens nicely. Here is my (unverified) guess about _why_ it happens.

Pipes are byte-oriented streams, so, lacking prior knowledge, a receiving program that expects text data (clip.exe, or more.exe) must make a guess as to what encoding it assumes for input. Most likely, they end up relying on Windows' own IsTextUnicode API (https://msdn.microsoft.com/en-us/library/windows/desktop/dd318672%28v=vs.85%29.aspx) which is notoriously unreliable for very short texts without newlines. In your case, it looks like the byte stream is "detected" as UTF-16LE.

As shown in my example above, deleting or adding a character leaves an odd byte count, which eliminates UTF-16 as a candidate and causes the text to be taken as ASCII. Don't see a good reason why changing "a" to a capital "A" also makes the text be recognized as ASCII, but then I am not going to second-guess all the guesswork that went into IsTextUnicode.
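
The odd-byte-count argument can be checked with a small Python sketch. It is not a model of IsTextUnicode (whose heuristics are not reproduced here); it only tests the necessary condition that the bytes decode cleanly as UTF-16LE. The helper name `fits_utf16le` is invented for this illustration.

```python
def fits_utf16le(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-16LE."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        # An odd number of bytes leaves a truncated code unit at the end.
        return False


samples = [b"a15(51|62)", b"a15(51|62", b"a15(51|62)+", b"A15(51|62)"]
for sample in samples:
    print(len(sample), fits_utf16le(sample))
```

The 9-byte and 11-byte strings fail the check, while both 10-byte strings pass it. So byte count alone rules out UTF-16 for the shortened and lengthened strings, but it does not explain why the uppercase "A" variant is recognized as ASCII.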

Liviu

Re: Is there an explanation for the result of SET/P and CLIP

#6 Post by trebor68 » 22 Mar 2015 04:39

Now I understand why I have to use a Unicode character string here. I will also change this in my other test batch files.

The program in which the string is to be inserted is VirtuaGirl (http://www.virtuagirl.com).
In the search box, it is possible to use regular expressions (RegEx).


Here is the beginning of the batch file.

Code: Select all

@echo off
setlocal enableextensions enabledelayedexpansion
if not %1#==# (set datei=%1.txt) else set datei=sport.txt
set "BoxStr="

rem  Card IDs of the collection
set strcol=
for %%a in (a b c d e f) do findstr /i /r "%%a[0-9][0-9][0-9][0-9]" %datei% >nul && if not errorlevel 1 set strcol=!strcol! %%a
set hcolx=%strcol%

:nextcol
if "%hcolx%"=="" ((cmd /u /c ^<nul set /p ".=%BoxStr%" | clip) & goto :eof) else if not "%strcol%"=="%hcolx%" set "BoxStr=%BoxStr%|"
set hcol=%hcolx:~1,1%
set hcolx=%hcolx:~2%
set "BoxStr=%BoxStr%%hcol%"

...


Thanks to Liviu.

trebor68

Re: Is there an explanation for the result of SET/P and CLIP

#7 Post by penpen » 23 Mar 2015 18:59

I've written a C# program to copy a line to the clipboard; just execute this batch file ("clipLine.bat") to create the executable:

Code: Select all

// // >nul 2> nul & @goto :main
/*
 * Author: Ulf Schneider aka penpen
 * Free for non profit use only.
 * For profit use contact me at www.dostips.com via private message (PM).
 * CmdFont.cs.bat
 */

/*
:main
   @echo off
   setlocal
   cls
   set "csc="

   pushd "%SystemRoot%\Microsoft.NET\Framework"
   for /f "tokens=* delims=" %%i in ('dir /b /o:n "v*"') do (
      dir /a-d /b "%%~fi\csc.exe" >nul 2>&1 && set "csc="%%~fi\csc.exe""
   )
   popd

   if defined csc (
      echo most recent C#.NET compiler located in:
      echo %csc%.
   ) else (
      echo C#.NET compiler not found.
      goto :eof
   )

   for %%a in ("%~dpn0") do for %%b in ("%%~dpna") do (
rem      %csc% /?
      %csc% /nologo /optimize /warnaserror /nowin32manifest /unsafe /debug- /target:exe /out:"%%~b.exe" "%~f0"
   )
   exit /B


System.Windows.Forms.Clipboard.SetText (Winforms) "winforms.dll"
System.Windows.Clipboard.SetText (WPF)   "PresentationCore.dll"
*/

using System;
using System.ComponentModel;
using System.Drawing;
using System.Windows.Forms;

using System.Text;
using System.IO;
using System.Runtime.InteropServices;


using DWORD = System.Int32;
using HANDLE = System.IntPtr;


public class ClipLine : Form {
   public ClipLine() {
   }

   [STAThread]
   public static unsafe void Main(string[] args) {
      Encoding currentEncoding = Console.InputEncoding;
      String unicodeString;

      switch (
         (args.Length != 1) ? "" :
         (!IsInputPiped) ? "" :
         args[0].ToUpper()
      ) {
         case "U":
         case "UNICODE":
            Console.InputEncoding = Encoding.Unicode;
            unicodeString = Console.ReadLine();
            Clipboard.SetText(unicodeString, System.Windows.Forms.TextDataFormat.UnicodeText);
            Console.InputEncoding = currentEncoding;
            break;

         case "A":
         case "ANSI":
            unicodeString = Console.ReadLine();
            Clipboard.SetText(unicodeString, System.Windows.Forms.TextDataFormat.UnicodeText);
            Console.InputEncoding = currentEncoding;
            break;

         default:
            Console.WriteLine ("Usage: program | clipLine[.exe] encoding");
            Console.WriteLine ("");
            Console.WriteLine ("Reads a line from STDIN and copies it to the clipboard.");
            Console.WriteLine ("  program   any valid program");
            Console.WriteLine ("  encoding  valid values are ANSI and UNICODE (UTF-16 LE), or A and U for short");
            Console.WriteLine ("  A[NSI]      data in STDIN is encoded using actual ANSI codepage.");
            Console.WriteLine ("  U[NICODE]   data in STDIN is encoded using UNICODE.");
            Console.Out.Flush ();
            break;
      }
   }

   public static bool IsInputPiped {
      get { return FILE_TYPE_PIPE == GetFileType(GetStdHandle(STD_INPUT_HANDLE)); }
   }


   public const DWORD FILE_TYPE_UNKNOWN = (DWORD) (0x0000);
   public const DWORD FILE_TYPE_DISK    = (DWORD) (0x0001);
   public const DWORD FILE_TYPE_CHAR    = (DWORD) (0x0002);
   public const DWORD FILE_TYPE_PIPE    = (DWORD) (0x0003);
   public const DWORD FILE_TYPE_REMOTE  = (DWORD) (0x8000);

   [DllImport("kernel32.dll")]
   private static extern DWORD GetFileType(HANDLE hFile);


   public const DWORD STD_INPUT_HANDLE  = (DWORD) (-10);
   public const DWORD STD_OUTPUT_HANDLE = (DWORD) (-11);
   public const DWORD STD_ERROR_HANDLE  = (DWORD) (-12);

   [DllImport("kernel32.dll")]
   private static extern HANDLE GetStdHandle(DWORD nStdHandle);
}

Just select the cmd.exe encoding actually in use:

Code: Select all

cmd /A /C echo abcdef| clipline.exe ANSI
cmd /U /C echo abcdef| clipline.exe UNICODE
cmd /A /C echo abcdef| clipline.exe A
cmd /U /C echo abcdef| clipline.exe U


penpen

Edit: Removed a bug, and shortened the code.

Re: Is there an explanation for the result of SET/P and CLIP

#8 Post by trebor68 » 28 Mar 2015 03:32

I understand that
- Automatic detection does not always work correctly
- CMD /U only changes the string
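
The second point can be made concrete with a few lines of Python. This is only a sketch: it assumes that cmd /A puts one byte per character on the pipe and that cmd /U puts UTF-16LE there, as described earlier in the thread. The variable names are made up for illustration.

```python
text = "a15(51|62)"

# Assumed pipe contents for the two cmd.exe output modes.
bytes_a = text.encode("ascii")      # cmd /A: 1 byte per character
bytes_u = text.encode("utf-16-le")  # cmd /U: 2 bytes per character

print(len(bytes_a), bytes_a.hex(" "))
print(len(bytes_u), bytes_u.hex(" "))

# A reader that assumes UTF-16LE recovers the text only from the /U bytes.
print(bytes_u.decode("utf-16-le") == text)
print(bytes_a.decode("utf-16-le") == text)
```

In both modes the receiving program still has to guess the encoding; /U only changes which bytes it is guessing about.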

This batch file creates a string of only the following characters:

Code: Select all

a b c d e f  0 1 2 3 4 5 6 7 8 9  ( ) [ ] \ | 

Here is an example of a string:

Code: Select all

a16(29|3[0-4])|c05(49|5[024])|e00(5[789]|60|61)|f0034


The internal values that can be found with the search string contain only characters between x20 and x79, i.e. ANSI code, with one exception.
This exception is the character:

Code: Select all

ï  ==  U+00EF  ==  LATIN SMALL LETTER I WITH DIAERESIS  ==  Input at prompt: Alt+0239
Ï  ==  U+00CF  ==  LATIN CAPITAL LETTER I WITH DIAERESIS  ==  Input at prompt: Alt+0207  ==  Note: does not occur in any value.
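
Unlike the other characters in the search string, ï lies outside 7-bit ASCII, so its byte value depends on the codepage in use. The Python sketch below shows this for three encodings. Note that cp1252 and cp850 are chosen here only as examples of a Western European ANSI and OEM codepage; the codepages actually active on a given system may differ.

```python
ch = "\u00ef"  # LATIN SMALL LETTER I WITH DIAERESIS

# Encode the same character under three example encodings.
encoded = {codec: ch.encode(codec) for codec in ("cp1252", "cp850", "utf-16-le")}

for codec, data in encoded.items():
    print(codec, data.hex(" "))

# Alt+0239 corresponds to the cp1252 byte value 239.
print(encoded["cp1252"][0])
```

Because the single-byte value differs between codepages, this is the one character in the string for which the choice between ANSI, OEM and Unicode output actually changes what the receiving program sees.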


With the 'CMD /U' command I get a result that the search box can use correctly.

I thank you again for your comments.

trebor68
