regex search and replace for batch - Easily edit files!

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#16 Post by dbenham » 28 Jun 2013 17:18

Borrowing an idea from Aacini's FindRepl.bat, namely an option to print only the lines that have been modified, I added the A option to REPL.BAT.

The A option causes only modified lines to be printed. Using that option, I was able to simplify the built-in help within REPL.BAT: I eliminated the need to call FINDSTR to filter out non-help lines.

Here is an example of using the A option. The following transforms a normal DIR listing into one that resembles DIR /B, except that short names are listed instead of long names. If a short name does not exist, then the long name is used.

Code: Select all

dir /x |repl "^\S.{38}(?:(\S+).*| {13}(.*[^.]))" "$1$2" a
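
To see what the regular expression does to each line, here is a minimal JavaScript sketch (runnable with Node.js). It assumes the usual en-US DIR /X column layout, where the date, time, and size or <DIR> fields fill the first 39 columns and the short name, if any, occupies the next 13. The padLeft, padRight, dirLine, and shortNames helpers and the sample entries are made up for illustration, and shortNames only imitates the A option by keeping the lines that the replacement changed.

```javascript
// Hypothetical helpers that build lines in the assumed DIR /X column layout.
function padLeft(s, width) { while (s.length < width) { s = " " + s; } return s; }
function padRight(s, width) { while (s.length < width) { s = s + " "; } return s; }
function dirLine(sizeField, shortName, longName) {
  // 20 columns of date and time, 18 of size or <DIR>, 1 space,
  // 12 of short name, 1 space, then the long name
  return "06/28/2013  05:18 PM" + padLeft(sizeField, 18) + " " +
         padRight(shortName, 12) + " " + longName;
}

// Same pattern and replacement as the REPL.BAT command above.
var DIR_X = /^\S.{38}(?:(\S+).*| {13}(.*[^.]))/;

// Imitates the A option: keep only the lines that the replacement altered.
function shortNames(lines) {
  var out = [];
  for (var i = 0; i < lines.length; i++) {
    var changed = lines[i].replace(DIR_X, "$1$2");
    if (changed !== lines[i]) { out.push(changed); }
  }
  return out;
}

var sample = [
  " Directory of C:\\test",                                // header: starts with a space
  dirLine("<DIR>         ", "", "."),                      // name ends in "." so no match
  dirLine("<DIR>         ", "", ".."),
  dirLine("1,234", "LONGFI~1.TXT", "Long file name.txt"),  // short name present
  dirLine("56", "", "short.txt")                           // no short name: 13 spaces
];

console.log(shortNames(sample).join("\n"));
```

With this sample, only the two file entries survive: the first is reduced to its short name and the second to its long name, while the header and the . and .. entries are left unchanged and therefore dropped.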

The result is the same as the output of the following command, except that the code below corrupts Unicode characters in file names.

Code: Select all

for /f "eol=: delims=" %F in ('dir /b') do @echo %~snxF

A simple FOR could be used to preserve Unicode names, but then the output would include only files, not folders.

See the edited code in my first post in this thread for the new version of REPL.BAT.


Dave Benham

#17 Post by dbenham » 29 Jun 2013 20:18

I've made another update to my initial post.

The X (eXtended escape sequences) option has been modified to support \q as an escape sequence for a double quote.

The \q sequence works in both the Search and Replace strings.

If the L (Literal) option is combined with X, then the Search string now supports all of the extended escape sequences that were previously only available for the Replace string.
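
As a rough illustration of what decoding such escape sequences involves, here is a minimal JavaScript sketch. It is not REPL.BAT's actual code: decodeExtended is a hypothetical helper, and the set of sequences it handles (\q, \\, \n, \r, \t, and \xnn) is a subset assumed for the example.

```javascript
// Hypothetical decoder for a few extended escape sequences.
// Unknown sequences such as \z are left untouched.
function decodeExtended(s) {
  return s.replace(/\\(\\|q|n|r|t|x[0-9A-Fa-f]{2})/g, function (whole, esc) {
    switch (esc.charAt(0)) {
      case "\\": return "\\";
      case "q":  return '"';    // \q stands for a double quote
      case "n":  return "\n";
      case "r":  return "\r";
      case "t":  return "\t";
      default:   return String.fromCharCode(parseInt(esc.substring(1), 16)); // \xnn
    }
  });
}

console.log(decodeExtended("say \\qhi\\q"));  // say "hi"
```

The point of \q is that a double quote cannot easily be passed inside a quoted batch argument, so the quote is spelled with characters that survive the command line and is decoded afterwards.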


Dave Benham

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

#18 Post by foxidrive » 25 Jul 2013 18:20

Dave, I'm not sure if this is a bug or I am triggering behaviour I don't understand.

The first command echoes the text being piped to repl.
The second command is where I want to replace the pipe character.
The third command shows what happens when I replace an = instead.

Code: Select all

d:\abc>type "file.txt" | find "result="
text=jam|hello=123|result=ok|cow=cat|...

d:\abc>type "file.txt" | find "result=" | repl "|" "a"
ataeaxata=ajaaama|ahaealalaoa=a1a2a3a|araeasaualata=aoaka|acaoawa=acaaata|a.a.a.a

d:\abc>type "file.txt" | find "result=" | repl "=" "a"
textajam|helloa123|resultaok|cowacat|...


Any ideas?

#19 Post by dbenham » 25 Jul 2013 21:28

The | character is the regex metacharacter for alternation, meaning match what is on the left or what is on the right. I must confess, I don't understand how your construct gives the result that it does, but I don't think it is a bug. I think I've seen it used before.

To get the result you are looking for, either escape the | or use the L (literal) option:

Code: Select all

type "file.txt" | find "result=" | repl "\|" "a"
:: or
type "file.txt" | find "result=" | repl "|" "a" L
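
One plausible explanation for the output foxidrive saw, sketched in JavaScript on the assumption that REPL.BAT hands the Search string to the JScript regex engine unchanged: a bare | is an alternation between two empty patterns, and an empty pattern matches at every position, so the replacement is inserted before each character and once more at the end. The escapeRegex helper is hypothetical and shows only one common way to get literal matching; it is not necessarily how the L option is implemented.

```javascript
var line = "text=jam|hello=123|result=ok|cow=cat|...";

// A bare "|" matches the empty string at every position.
var bare = line.replace(new RegExp("|", "g"), "a");

// Escaping the pipe, or writing it as \x7c, matches the character itself.
var escaped = line.replace(new RegExp("\\|", "g"), "a");
var hexEscaped = line.replace(new RegExp("\\x7c", "g"), "a");

// Hypothetical helper: backslash-escape every regex metacharacter in a string.
function escapeRegex(s) {
  return s.replace(/[\\^$.*+?()[\]{}|]/g, "\\$&");
}
var literal = line.replace(new RegExp(escapeRegex("|"), "g"), "a");

console.log(bare);     // ataeaxata=ajaaama|ahaealalaoa=a1a2a3a|araeasaualata=aoaka|acaoawa=acaaata|a.a.a.a
console.log(escaped);  // text=jamahello=123aresult=okacow=cata...
```

The escaped, hex-escaped, and literal variants all produce the same line.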


Dave Benham

#20 Post by foxidrive » 26 Jul 2013 06:42

Thanks Dave, that explains it. I used \x7c to get around it which worked too.

Regular expressions are great but they can also give behaviour that seems so odd, until you find the reason.

Ta.

Re: New function - :hexDump

#21 Post by foxidrive » 01 Oct 2013 09:13

Dave, why does this fail?

Code: Select all

@echo off
echo..>file.bin
type file.bin | repl "^.*$" "\x89\x50\x4E\x47\x0D\x0A\x1A\x0A\x00\x00\x00\x0D\x49\x48\x44\x52" x >file2.bin
pause


It's going to add an 0d0a on the end of it, but I thought it would work...

#22 Post by dbenham » 01 Oct 2013 13:43

I think you posted this to the wrong thread :wink:
I believe you wanted the "regex search and replace for batch - Easily edit files!" thread.

But... :shock: :?
That is really odd.

It seems that CSCRIPT cannot redirect or pipe output that contains certain characters. I'm not sure if this is the entire script engine, or just JScript.

Here is a really simple JScript: test.js

Code: Select all

WScript.StdOut.WriteLine('\x89')

This runs without any problem:

Code: Select all

cscript //nologo test.js

But redirection fails

Code: Select all

cscript //nologo test.js >test.out

And also pipe fails

Code: Select all

cscript //nologo test.js | findstr "^"

This really troubles me, as it has a severe impact on many hybrid JScript/batch solutions that I actively use :evil:

I've spot checked a few extended ASCII codes, and some work, and some don't.

I'm really not sure what the hell is going on. I wonder if it has something to do with Unicode characters that are somehow incompatible with the console. But that is really just an uneducated stab in the dark.

Unfortunately, I don't have time at the moment to really investigate. I'll certainly come back to this, but maybe someone else can figure it out.

Things to investigate: Exactly which characters work, and which don't? Does the active code page have any impact?


Dave Benham

npocmaka_
Posts: 512
Joined: 24 Jun 2013 17:10
Location: Bulgaria
Contact:

#23 Post by npocmaka_ » 01 Oct 2013 14:07

I think these are the problematic characters (in forward and backward arrays): http://www.codeproject.com/Articles/178 ... ng-JScript

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

#24 Post by Aacini » 01 Oct 2013 19:47

npocmaka_ wrote:I think these are the problematic characters (in forward and backward arrays): http://www.codeproject.com/Articles/17825/Reading-and-Writing-Binary-Files-Using-JScript

I had not realized what the problem was that foxidrive reported with Dave's REPL.BAT; I want to run some tests on this with my FindRepl.bat...

However, I reviewed the code posted at the link above when I developed my BinToBat.bat conversion program. My code presents the conversion process in a more readable and compact form than the original. These are the relevant parts:

Code: Select all

// Convert binary bytes from input file to Hex digits in Stdout

var ado = WScript.CreateObject("ADODB.Stream");
ado.Type = 2;  // adTypeText = 2
ado.CharSet = "iso-8859-1";  // code page with minimum adjustments for input
ado.Open();
ado.LoadFromFile(WScript.Arguments(1));

var adjustment = "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021" +
                 "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
                 "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
                 "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178" ;

var thisByte = ado.ReadText(1), lastByte = thisByte.charCodeAt(0);

if ( lastByte > 255 ) {
   lastByte = 128 + adjustment.indexOf(thisByte);
}

. . . .
. . . .

// Convert Hex digits from Stdin to binary bytes in output file

var ado = WScript.CreateObject("ADODB.Stream");
ado.Type = 2;  // adTypeText = 2
ado.CharSet = "iso-8859-1";  // right code page for output (no adjustments)
ado.Open();


I remember that I created this conversion method after completing several tests with different ado.CharSet values (code pages). All of them produced wrong results with certain bytes, but the chosen code page was the one with the fewest mismatches. The problem is that converting certain codes via JScript's String.fromCharCode() method generates a Unicode result instead of an ASCII character, as described at this post.

EDIT: If the problem with REPL.BAT is that it generates Unicode characters from certain hexadecimal codes, then the cure is simple: the output must be generated via an ADODB.Stream with .CharSet = "iso-8859-1", because in that case all 256 ASCII characters are generated correctly.

Antonio

#25 Post by dbenham » 02 Oct 2013 12:33

Thanks Antonio - those unicode maps worked perfectly. :D

I've updated REPL.BAT in the first post on this thread to substitute the appropriate unicode character for the problem \xnn characters. I did not have to resort to ADO.
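
A minimal JavaScript sketch of that kind of substitution, built from the adjustment string in Antonio's post. The byteToChar and charToByte names are hypothetical, and this is not the actual REPL.BAT code; it only shows how a byte value in the 128-159 range can be swapped for the character that produces that byte on output, and back again.

```javascript
// Characters that come out as bytes 128..159, in byte order (from Antonio's post).
var adjustment = "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021" +
                 "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
                 "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
                 "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

// Byte value (0..255) -> character to write so that this byte appears in the output.
function byteToChar(b) {
  return (b >= 128 && b <= 159) ? adjustment.charAt(b - 128) : String.fromCharCode(b);
}

// Character read as text -> original byte value (inverse of byteToChar).
// Assumes ch is one of the 256 characters that byteToChar can produce.
function charToByte(ch) {
  var code = ch.charCodeAt(0);
  return code > 255 ? 128 + adjustment.indexOf(ch) : code;
}

console.log(byteToChar(0x89).charCodeAt(0));  // 8240, i.e. U+2030
```

For \x89, the first byte of the PNG header in foxidrive's example, the character written is U+2030 rather than U+0089.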

I've done some testing, and all seems to be working. The displayed character isn't always what I expect, but when the output is redirected or piped, the correct binary value is produced. If I pipe the output to FINDSTR "^", then I get my desired screen output.

I'm a bit surprised (pleasantly surprised) that the active code page does not seem to have any effect on the binary output.


Dave Benham

#26 Post by Aacini » 04 Oct 2013 10:54

dbenham wrote:Thanks Antonio - those unicode maps worked perfectly. :D

Dave Benham

You are welcome, Dave! I am glad that it was useful to you... :mrgreen:

That conversion table was defined via several tests with files created with ADO, which makes it possible to work with both the ASCII and Unicode numbers. However, I was thinking about a method to get the conversion table without ADO, so I devised a brute-force method that sends each Unicode character from 128 to 65535 to a redirected disk file and then compares the result against each of the 32 problematic ASCII characters in the 128-159 range, which were previously stored in individual disk files via Batch code.

Code: Select all

@if (@CodeSection == @Batch) @begin

@echo off
setlocal EnableDelayedExpansion

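rem The loop below takes one character per code from 128 to 159, so this value needs
rem all 32 of them in order. Only 27 are visible in this listing; the characters for
rem 129, 141, 143, 144 and 157 have no printable glyph and may have been lost.
set "AsciiChars128-159=€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ"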
del AsciiChars*.txt 2> NUL

for /L %%c in (128,1,159) do (
   echo !AsciiChars128-159:~0,1!> AsciiChar%%c.txt
   set "AsciiChars128-159=!AsciiChars128-159:~1!"
)

del ConversionTable.txt 2> NUL
for /L %%c in (128,1,65535) do (
   if not exist AsciiChar*.txt goto break
   set /P "=%%c," < NUL
   Cscript //nologo //E:JScript "%~F0" %%c > JScriptChar.txt 2> NUL
   for %%s in (JScriptChar.txt) do if %%~Zs equ 3 (
      set "codeFound="
      for %%f in (AsciiChar*.txt) do if not defined codeFound (
         fc /B %%f JScriptChar.txt > NUL
         if not errorlevel 1 (
            set AsciiCode=%%~Nf
            set AsciiCode=!AsciiCode:~-3!
            echo AsciiCode[!AsciiCode!] = %%c; >> ConversionTable.txt
            del %%f
            set codeFound=true
            echo/
            echo Ascii code !AsciiCode! is Unicode character %%c
         )
      )
   )
)
:break
goto :EOF

@end

WScript.Stdout.WriteLine( String.fromCharCode(parseInt(WScript.Arguments.Unnamed.Item(0))) );

I ran this program on my computer and got this table:

Code: Select all

AsciiCode[129] = 129; 
AsciiCode[141] = 141;
AsciiCode[143] = 143;
AsciiCode[144] = 144;
AsciiCode[157] = 157;
AsciiCode[140] = 338;
AsciiCode[156] = 339;
AsciiCode[138] = 352;
AsciiCode[154] = 353;
AsciiCode[159] = 376;
AsciiCode[142] = 381;
AsciiCode[158] = 382;
AsciiCode[131] = 401;
AsciiCode[136] = 710;
AsciiCode[152] = 732;
AsciiCode[150] = 8211;
AsciiCode[151] = 8212;
AsciiCode[145] = 8216;
AsciiCode[146] = 8217;
AsciiCode[130] = 8218;
AsciiCode[147] = 8220;
AsciiCode[148] = 8221;
AsciiCode[132] = 8222;
AsciiCode[134] = 8224;
AsciiCode[135] = 8225;
AsciiCode[149] = 8226;
AsciiCode[133] = 8230;
AsciiCode[137] = 8240;
AsciiCode[139] = 8249;
AsciiCode[155] = 8250;
AsciiCode[128] = 8364;
AsciiCode[153] = 8482;


Just for completeness, here is a simpler way to achieve the conversion, one that does not require first generating the character and then checking whether it is greater than 255:

Code: Select all

@if (@CodeSection == @Batch) @begin

@echo off
Cscript //nologo //E:JScript "%~F0"
goto :EOF

@end

var i, code, AsciiCode = new Array();

AsciiCode[128] = 8364;
AsciiCode[130] = 8218;
AsciiCode[131] = 401;
AsciiCode[132] = 8222;
AsciiCode[133] = 8230;
AsciiCode[134] = 8224;
AsciiCode[135] = 8225;
AsciiCode[136] = 710;
AsciiCode[137] = 8240;
AsciiCode[138] = 352;
AsciiCode[139] = 8249;
AsciiCode[140] = 338;
AsciiCode[142] = 381;
AsciiCode[145] = 8216;
AsciiCode[146] = 8217;
AsciiCode[147] = 8220;
AsciiCode[148] = 8221;
AsciiCode[149] = 8226;
AsciiCode[150] = 8211;
AsciiCode[151] = 8212;
AsciiCode[152] = 732;
AsciiCode[153] = 8482;
AsciiCode[154] = 353;
AsciiCode[155] = 8250;
AsciiCode[156] = 339;
AsciiCode[158] = 382;
AsciiCode[159] = 376;

for ( i = 0; i <= 255; i++ ) {
   code = (AsciiCode[i] != undefined) ? AsciiCode[i] : i;
   WScript.Stdout.Write(String.fromCharCode(code));
}

As you indicated, the characters that appear on the screen are different when they are displayed directly by the JScript program than when they are displayed by any type of Batch code.

Antonio

#27 Post by dbenham » 04 Apr 2014 22:03

I've updated the code in the first post of this thread to version 3.3.

The A option (altered content only) can now be combined with the M option (multiline) if S (source is a variable) is also used.

The M option now preserves the original line terminators if the S option is used, thus making it consistent with how it works when reading from stdin.
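
The difference between line-by-line and multiline processing can be sketched in JavaScript as follows. This illustrates the idea only, under the assumption that the M option applies the regex to the whole text in one pass, with ^ and $ matching at line boundaries; replaceByLine and replaceMultiline are hypothetical names, not functions from REPL.BAT.

```javascript
var text = "one\r\ntwo\nthree";   // mixed CRLF and LF terminators

// Line by line: split off the terminators, replace, then rejoin with CRLF.
function replaceByLine(s, search, replacement) {
  var lines = s.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    lines[i] = lines[i].replace(new RegExp(search, "g"), replacement);
  }
  return lines.join("\r\n");
}

// One pass over the whole text: the terminators are never touched.
function replaceMultiline(s, search, replacement) {
  return s.replace(new RegExp(search, "gm"), replacement);
}

console.log(JSON.stringify(replaceByLine(text, "^t", "T")));     // "one\r\nTwo\r\nThree"
console.log(JSON.stringify(replaceMultiline(text, "^t", "T")));  // "one\r\nTwo\nThree"
```

In the second call the lone LF after "Two" survives, which is the sense in which a single multiline pass preserves the original line terminators.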

Two new help options were added:

REPL /?REGEX - Launches Microsoft's regex documentation in your browser

REPL /?REPLACE - Launches Microsoft's regex replace documentation in your browser


Dave Benham

#28 Post by foxidrive » 05 Apr 2014 04:15

Good to see the new features, Dave.

Never mind the question that was here - I forgot that | is a regex metacharacter and just needed to be escaped.

Badchip
Posts: 1
Joined: 30 May 2014 11:48

#29 Post by Badchip » 30 May 2014 11:55

How do I change a string with this batch file?

I have always used code similar to this: repl.exe oldstring newstring "old file">"new file"

#30 Post by dbenham » 27 Jul 2014 23:05

I've updated the original post on this thread to contain version 4.0. It now supports the N option to enable working with binary files that contain NULL bytes.

A big thanks to penpen for diagnosing the problem with NULL bytes, as well as deriving a work-around.


Dave Benham
