regex search and replace for batch - Easily edit files!

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: regex search and replace for batch - Easily edit files!

#31 Post by foxidrive » 28 Jul 2014 04:25

Badchip wrote:How to change a string with this batch?

I always used a similar code: repl.exe oldstring newstring "old file">"new file"


There is a purpose built solution here, by Dave too. Read the thread as I recall there was a limitation which you need to be aware of.

viewtopic.php?f=3&t=5492&hilit=REPLVAR.BAT+version+1.4

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: regex search and replace for batch - Easily edit files!

#32 Post by foxidrive » 28 Jul 2014 04:27

dbenham wrote:I've updated the original post on this thread to contain version 4.0. It now supports the N option to enable working with binary files that contain NULL bytes.

A big thanks to penpen for diagnosing the problem with NULL bytes, as well as deriving a work-around.


Dave Benham


Thanks Dave, and a big thanks to penpen. :thumbsup:

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#33 Post by dbenham » 30 Jul 2014 18:20

Yet one more change - now version 4.1

Reading binary files works properly with any size read as long as Read() is used instead of ReadAll(). So now start reading size 1024, doubling the size each time there is more content to be read. This is a major performance enhancement.

The N option is no longer needed so I removed it. The M option always works properly with binary files.

Binary files containing NULL bytes should not be processed without the M option.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#34 Post by dbenham » 28 Oct 2014 15:24

Updated the initial post with version 5.0 - Return codes have been refined and documented.

It is now possible to differentiate between a call that makes a change to the data, vs. a call that does not make a change, vs. a call that results in an error.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#35 Post by dbenham » 07 Nov 2014 12:27

I've decided to maintain a history of all versions from this point forward, in addition to keeping the most current version in the first post. So I'm posting version 5 here:

Code: Select all

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
::REPL.BAT version 5.0
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?[REGEX|REPLACE]
:::REPL  /V
:::
:::  Performs a global regular expression search and replace operation on
:::  each line of input from stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /?, then prints help documentation
:::  to stdout. If a single argument of /?REGEX, then opens up Microsoft's
:::  JScript regular expression documentation within your browser. If a single
:::  argument of /?REPLACE, then opens up Microsoft's JScript REPLACE
:::  documentation within your browser.
:::
:::  If called with a single argument of /V, case insensitive, then prints
:::  the version of REPL.BAT.
:::
:::  Search  - By default, this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript regex syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default, this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substituion patterns available to the JScript replace method.
:::
:::            For example, $& represents the portion of the source that matched
:::            the entire search pattern, $1 represents the first captured
:::            submatch, $2 the second captured submatch, etc. A $ literal
:::            can be escaped as $$.
:::
:::            An empty replacement string must be represented as "".
:::
:::            Replace substitution pattern syntax is fully documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            B - The Search must match the beginning of a line.
:::                Mostly used with literal searches.
:::
:::            E - The Search must match the end of a line.
:::                Mostly used with literal searches.
:::
:::            V - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            A - Only print altered lines. Unaltered lines are discarded.
:::                If both the M and V options are present, then prints the
:::                entire result if there was a change anywhere in the string.
:::                The A option is incompatible with the M option unless the S
:::                option is also present.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line, thus enabling
:::                search for \n. This also enables preservation of the original
:::                line terminators. If the M option is not present, then every
:::                printed line is termiated with carriage return and line feed.
:::                The M option is incompatible with the A option unless the S
:::                option is also present.
:::
:::                Note: If working with binary data containing NULL bytes,
:::                      then the M option must be used.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences within the Replace string:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \q     -  Quote
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Extended ASCII byte code expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Also enables the \q escape sequence for the Search string.
:::                The other escape sequences are already standard for a regular
:::                expression Search string.
:::
:::                Also modifies the behavior of \xnn in the Search string to work
:::                properly with extended ASCII byte codes.
:::
:::                Extended escape sequences are supported even when the L option
:::                is used. Both Search and Replace support all of the extended
:::                escape sequences if both the X and L opions are combined.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string. Without
:::                the M option, ^ anchors the beginning of the string, and $ the
:::                end of the string. With the M option, ^ anchors the beginning
:::                of a line, and $ the end of a line.
:::
:::  Return Codes:  0 = At least one change was made
:::                     or the /? or /V option was used
:::
:::                 1 = No change was made
:::
:::                 2 = Invalid call syntax or incompatible options
:::
:::                 3 = JScript runtime error, typically due to invalid regex
:::
::: REPL.BAT was written by Dave Benham, with assistance from DosTips user Aacini
::: to get \xnn to work properly with extended ASCII byte codes. Also assistance
::: from DosTips user penpen diagnosing issues reading NULL bytes, along with a
::: workaround. REPL.BAT was originally posted at:
::: http://www.dostips.com/forum/viewtopic.php?f=3&t=3855
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^:::" "" a
    exit /b 0
  ) else if /i "%~1" equ "/?regex" (
    explorer "http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/?replace" (
    explorer "http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/V" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^::(REPL\.BAT version)" "$1" a
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 2
  )
)
echo(%~3|findstr /i "[^SMILEBVXA]" >nul && (
  call :err "Invalid option(s)"
  exit /b 2
)
echo(%~3|findstr /i "M"|findstr /i "A"|findstr /vi "S" >nul && (
  call :err "Incompatible options"
  exit /b 2
)
cscript //E:JScript //nologo "%~f0" %*
exit /b %errorlevel%

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var rtn=1;
try {
  var env=WScript.CreateObject("WScript.Shell").Environment("Process");
  var args=WScript.Arguments;
  var search=args.Item(0);
  var replace=args.Item(1);
  var options="g";
  if (args.length>2) options+=args.Item(2).toLowerCase();
  var multi=(options.indexOf("m")>=0);
  var alterations=(options.indexOf("a")>=0);
  if (alterations) options=options.replace(/a/g,"");
  var srcVar=(options.indexOf("s")>=0);
  if (srcVar) options=options.replace(/s/g,"");
  if (options.indexOf("v")>=0) {
    options=options.replace(/v/g,"");
    search=env(search);
    replace=env(replace);
  }
  if (options.indexOf("x")>=0) {
    options=options.replace(/x/g,"");
    replace=replace.replace(/\\\\/g,"\\B");
    replace=replace.replace(/\\q/g,"\"");
    replace=replace.replace(/\\x80/g,"\\u20AC");
    replace=replace.replace(/\\x82/g,"\\u201A");
    replace=replace.replace(/\\x83/g,"\\u0192");
    replace=replace.replace(/\\x84/g,"\\u201E");
    replace=replace.replace(/\\x85/g,"\\u2026");
    replace=replace.replace(/\\x86/g,"\\u2020");
    replace=replace.replace(/\\x87/g,"\\u2021");
    replace=replace.replace(/\\x88/g,"\\u02C6");
    replace=replace.replace(/\\x89/g,"\\u2030");
    replace=replace.replace(/\\x8[aA]/g,"\\u0160");
    replace=replace.replace(/\\x8[bB]/g,"\\u2039");
    replace=replace.replace(/\\x8[cC]/g,"\\u0152");
    replace=replace.replace(/\\x8[eE]/g,"\\u017D");
    replace=replace.replace(/\\x91/g,"\\u2018");
    replace=replace.replace(/\\x92/g,"\\u2019");
    replace=replace.replace(/\\x93/g,"\\u201C");
    replace=replace.replace(/\\x94/g,"\\u201D");
    replace=replace.replace(/\\x95/g,"\\u2022");
    replace=replace.replace(/\\x96/g,"\\u2013");
    replace=replace.replace(/\\x97/g,"\\u2014");
    replace=replace.replace(/\\x98/g,"\\u02DC");
    replace=replace.replace(/\\x99/g,"\\u2122");
    replace=replace.replace(/\\x9[aA]/g,"\\u0161");
    replace=replace.replace(/\\x9[bB]/g,"\\u203A");
    replace=replace.replace(/\\x9[cC]/g,"\\u0153");
    replace=replace.replace(/\\x9[dD]/g,"\\u009D");
    replace=replace.replace(/\\x9[eE]/g,"\\u017E");
    replace=replace.replace(/\\x9[fF]/g,"\\u0178");
    replace=replace.replace(/\\b/g,"\b");
    replace=replace.replace(/\\f/g,"\f");
    replace=replace.replace(/\\n/g,"\n");
    replace=replace.replace(/\\r/g,"\r");
    replace=replace.replace(/\\t/g,"\t");
    replace=replace.replace(/\\v/g,"\v");
    replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
      function($0,$1,$2){
        return String.fromCharCode(parseInt("0x"+$0.substring(2)));
      }
    );
    replace=replace.replace(/\\B/g,"\\");
    search=search.replace(/\\\\/g,"\\B");
    search=search.replace(/\\q/g,"\"");
    search=search.replace(/\\x80/g,"\\u20AC");
    search=search.replace(/\\x82/g,"\\u201A");
    search=search.replace(/\\x83/g,"\\u0192");
    search=search.replace(/\\x84/g,"\\u201E");
    search=search.replace(/\\x85/g,"\\u2026");
    search=search.replace(/\\x86/g,"\\u2020");
    search=search.replace(/\\x87/g,"\\u2021");
    search=search.replace(/\\x88/g,"\\u02C6");
    search=search.replace(/\\x89/g,"\\u2030");
    search=search.replace(/\\x8[aA]/g,"\\u0160");
    search=search.replace(/\\x8[bB]/g,"\\u2039");
    search=search.replace(/\\x8[cC]/g,"\\u0152");
    search=search.replace(/\\x8[eE]/g,"\\u017D");
    search=search.replace(/\\x91/g,"\\u2018");
    search=search.replace(/\\x92/g,"\\u2019");
    search=search.replace(/\\x93/g,"\\u201C");
    search=search.replace(/\\x94/g,"\\u201D");
    search=search.replace(/\\x95/g,"\\u2022");
    search=search.replace(/\\x96/g,"\\u2013");
    search=search.replace(/\\x97/g,"\\u2014");
    search=search.replace(/\\x98/g,"\\u02DC");
    search=search.replace(/\\x99/g,"\\u2122");
    search=search.replace(/\\x9[aA]/g,"\\u0161");
    search=search.replace(/\\x9[bB]/g,"\\u203A");
    search=search.replace(/\\x9[cC]/g,"\\u0153");
    search=search.replace(/\\x9[dD]/g,"\\u009D");
    search=search.replace(/\\x9[eE]/g,"\\u017E");
    search=search.replace(/\\x9[fF]/g,"\\u0178");
    if (options.indexOf("l")>=0) {
      search=search.replace(/\\b/g,"\b");
      search=search.replace(/\\f/g,"\f");
      search=search.replace(/\\n/g,"\n");
      search=search.replace(/\\r/g,"\r");
      search=search.replace(/\\t/g,"\t");
      search=search.replace(/\\v/g,"\v");
      search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      search=search.replace(/\\B/g,"\\");
    } else search=search.replace(/\\B/g,"\\\\");
  }
  if (options.indexOf("l")>=0) {
    options=options.replace(/l/g,"");
    search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
    replace=replace.replace(/\$/g,"$$$$");
  }
  if (options.indexOf("b")>=0) {
    options=options.replace(/b/g,"");
    search="^"+search
  }
  if (options.indexOf("e")>=0) {
    options=options.replace(/e/g,"");
    search=search+"$"
  }
  var search=new RegExp(search,options);
  var str1, str2;

  if (srcVar) {
    str1=env(args.Item(3));
    str2=str1.replace(search,replace);
    if (!alterations || str1!=str2) if (multi) {
      WScript.Stdout.Write(str2);
    } else {
      WScript.Stdout.WriteLine(str2);
    }
    if (str1!=str2) rtn=0;
  } else if (multi){
    var buf=1024;
    str1="";
    while (!WScript.StdIn.AtEndOfStream) {
      str1+=WScript.StdIn.Read(buf);
      buf*=2
    }
    str2=str1.replace(search,replace);
    WScript.Stdout.Write(str2);
    if (str1!=str2) rtn=0;
  } else {
    while (!WScript.StdIn.AtEndOfStream) {
      str1=WScript.StdIn.ReadLine();
      str2=str1.replace(search,replace);
      if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2);
      if (str1!=str2) rtn=0;
    }
  }
} catch(e) {
  WScript.Stderr.WriteLine("JScript runtime error: "+e.message);
  rtn=3;
}
WScript.Quit(rtn);


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#36 Post by dbenham » 07 Nov 2014 13:27

Version 6 - Major new functionality :!:

A StackOverflow user asked if REPL.BAT could be modified to apply a function like toUpperCase() to the replacement string. See the OP's comment to my answer at http://stackoverflow.com/questions/2674 ... earch-repl.

I thought it should be possible given that the JScript replace method allows the replacement argument to be a function call. Then Aacini realized it could be quite easy to implement using eval(). I refined his technique to use the function's arguments object (renamed as $) to allow for variable number of captured strings.

So version 6 adds the J option that treats the replacement argument as a JScript expression. The expression can reference the following:

$[0] - the string that matched the full regex
$[1] - the first captured submatch
...
$[n] - the final captured submatch
$[n+1] - the offset of the match relative to the beginning of the original string
$[n+2] - the original string

This new feature is really cool 8)

Convert text to upper case:

Code: Select all

echo this IS cool|repl ".+" "$[0].toUpperCase()" j
--output--

Code: Select all

THIS IS COOL

Convert all words to mixed case (init caps):

Code: Select all

echo this IS cool|repl "(\b.)|(.)" "$[1]?$[1].toUpperCase():$[2].toLowerCase()" j
--output--

Code: Select all

This Is Cool

Multiply the 3rd column in a csv by 2 if it is an integer:

Code: Select all

echo 5,6,9,4|repl "^((?:.*?,){2})(\d+)(?=,|$)" "$[1]+$[2]*2" j
--output--

Code: Select all

5,6,18,4


Here is version 6:

Code: Select all

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
::REPL.BAT version 6.0
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?[REGEX|REPLACE]
:::REPL  /V
:::
:::  Performs a global regular expression search and replace operation on
:::  each line of input from stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /?, then prints help documentation
:::  to stdout. If a single argument of /?REGEX, then opens up Microsoft's
:::  JScript regular expression documentation within your browser. If a single
:::  argument of /?REPLACE, then opens up Microsoft's JScript REPLACE
:::  documentation within your browser.
:::
:::  If called with a single argument of /V, case insensitive, then prints
:::  the version of REPL.BAT.
:::
:::  Search  - By default, this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript regex syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default, this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substituion patterns available to the JScript replace method.
:::
:::            For example, $& represents the portion of the source that matched
:::            the entire search pattern, $1 represents the first captured
:::            submatch, $2 the second captured submatch, etc. A $ literal
:::            can be escaped as $$.
:::
:::            An empty replacement string must be represented as "".
:::
:::            Replace substitution pattern syntax is fully documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            A - Only print altered lines. Unaltered lines are discarded.
:::                If both the M and S options are present, then prints the
:::                entire result if there was a change anywhere in the string.
:::                The A option is incompatible with the M option unless the S
:::                option is also present.
:::
:::            B - The Search must match the beginning of a line.
:::                Mostly used with literal searches.
:::
:::            E - The Search must match the end of a line.
:::                Mostly used with literal searches.
:::
:::            I - Makes the search case-insensitive.
:::
:::            J - The Replace argument represents a JScript expression.
:::                The expression may access an array like arguments object
:::                named $. However, $ is not a true array object.
:::
:::                The $.length property contains the total number of arguments
:::                available. The $.length value is equal to n+3, where n is the
:::                number of capturing left parentheses within the Search string.
:::
:::                $[0] is the substring that matched the Search,
:::                $[1] through $[n] are the captured submatch strings,
:::                $[n+1] is the offset where the match occurred, and
:::                $[n+2] is the original source string.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in the Replace string
:::                are treated as $ literals.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line, thus enabling
:::                search for \n. This also enables preservation of the original
:::                line terminators. If the M option is not present, then every
:::                printed line is terminated with carriage return and line feed.
:::                The M option is incompatible with the A option unless the S
:::                option is also present.
:::
:::                Note: If working with binary data containing NULL bytes,
:::                      then the M option must be used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string. Without
:::                the M option, ^ anchors the beginning of the string, and $ the
:::                end of the string. With the M option, ^ anchors the beginning
:::                of a line, and $ the end of a line.
:::
:::            V - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences within the Replace string:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \q     -  Quote
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Extended ASCII byte code expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Also enables the \q escape sequence for the Search string.
:::                The other escape sequences are already standard for a regular
:::                expression Search string.
:::
:::                Also modifies the behavior of \xnn in the Search string to work
:::                properly with extended ASCII byte codes.
:::
:::                Extended escape sequences are supported even when the L option
:::                is used. Both Search and Replace support all of the extended
:::                escape sequences if both the X and L opions are combined.
:::
:::  Return Codes:  0 = At least one change was made
:::                     or the /? or /V option was used
:::
:::                 1 = No change was made
:::
:::                 2 = Invalid call syntax or incompatible options
:::
:::                 3 = JScript runtime error, typically due to invalid regex
:::
::: REPL.BAT was written by Dave Benham, with assistance from DosTips user Aacini
::: to get \xnn to work properly with extended ASCII byte codes. Also assistance
::: from DosTips user penpen diagnosing issues reading NULL bytes, along with a
::: workaround. REPL.BAT was originally posted at:
::: http://www.dostips.com/forum/viewtopic.php?f=3&t=3855
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^:::" "" a
    exit /b 0
  ) else if /i "%~1" equ "/?regex" (
    explorer "http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/?replace" (
    explorer "http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/V" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^::(REPL\.BAT version)" "$1" a
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 2
  )
)
echo(%~3|findstr /i "[^SMILEBVXAJ]" >nul && (
  call :err "Invalid option(s)"
  exit /b 2
)
echo(%~3|findstr /i "M"|findstr /i "A"|findstr /vi "S" >nul && (
  call :err "Incompatible options"
  exit /b 2
)
cscript //E:JScript //nologo "%~f0" %*
exit /b %errorlevel%

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var rtn=1;
try {
  var env=WScript.CreateObject("WScript.Shell").Environment("Process");
  var args=WScript.Arguments;
  var search=args.Item(0);
  var replace=args.Item(1);
  var options="g";
  if (args.length>2) options+=args.Item(2).toLowerCase();
  var multi=(options.indexOf("m")>=0);
  var alterations=(options.indexOf("a")>=0);
  if (alterations) options=options.replace(/a/g,"");
  var srcVar=(options.indexOf("s")>=0);
  if (srcVar) options=options.replace(/s/g,"");
  var jexpr=(options.indexOf("j")>=0);
  if (jexpr) options=options.replace(/j/g,"");
  if (options.indexOf("v")>=0) {
    options=options.replace(/v/g,"");
    search=env(search);
    replace=env(replace);
  }
  if (options.indexOf("x")>=0) {
    options=options.replace(/x/g,"");
    if (!jexpr) {
      replace=replace.replace(/\\\\/g,"\\B");
      replace=replace.replace(/\\q/g,"\"");
      replace=replace.replace(/\\x80/g,"\\u20AC");
      replace=replace.replace(/\\x82/g,"\\u201A");
      replace=replace.replace(/\\x83/g,"\\u0192");
      replace=replace.replace(/\\x84/g,"\\u201E");
      replace=replace.replace(/\\x85/g,"\\u2026");
      replace=replace.replace(/\\x86/g,"\\u2020");
      replace=replace.replace(/\\x87/g,"\\u2021");
      replace=replace.replace(/\\x88/g,"\\u02C6");
      replace=replace.replace(/\\x89/g,"\\u2030");
      replace=replace.replace(/\\x8[aA]/g,"\\u0160");
      replace=replace.replace(/\\x8[bB]/g,"\\u2039");
      replace=replace.replace(/\\x8[cC]/g,"\\u0152");
      replace=replace.replace(/\\x8[eE]/g,"\\u017D");
      replace=replace.replace(/\\x91/g,"\\u2018");
      replace=replace.replace(/\\x92/g,"\\u2019");
      replace=replace.replace(/\\x93/g,"\\u201C");
      replace=replace.replace(/\\x94/g,"\\u201D");
      replace=replace.replace(/\\x95/g,"\\u2022");
      replace=replace.replace(/\\x96/g,"\\u2013");
      replace=replace.replace(/\\x97/g,"\\u2014");
      replace=replace.replace(/\\x98/g,"\\u02DC");
      replace=replace.replace(/\\x99/g,"\\u2122");
      replace=replace.replace(/\\x9[aA]/g,"\\u0161");
      replace=replace.replace(/\\x9[bB]/g,"\\u203A");
      replace=replace.replace(/\\x9[cC]/g,"\\u0153");
      replace=replace.replace(/\\x9[dD]/g,"\\u009D");
      replace=replace.replace(/\\x9[eE]/g,"\\u017E");
      replace=replace.replace(/\\x9[fF]/g,"\\u0178");
      replace=replace.replace(/\\b/g,"\b");
      replace=replace.replace(/\\f/g,"\f");
      replace=replace.replace(/\\n/g,"\n");
      replace=replace.replace(/\\r/g,"\r");
      replace=replace.replace(/\\t/g,"\t");
      replace=replace.replace(/\\v/g,"\v");
      replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      replace=replace.replace(/\\B/g,"\\");
    }
    search=search.replace(/\\\\/g,"\\B");
    search=search.replace(/\\q/g,"\"");
    search=search.replace(/\\x80/g,"\\u20AC");
    search=search.replace(/\\x82/g,"\\u201A");
    search=search.replace(/\\x83/g,"\\u0192");
    search=search.replace(/\\x84/g,"\\u201E");
    search=search.replace(/\\x85/g,"\\u2026");
    search=search.replace(/\\x86/g,"\\u2020");
    search=search.replace(/\\x87/g,"\\u2021");
    search=search.replace(/\\x88/g,"\\u02C6");
    search=search.replace(/\\x89/g,"\\u2030");
    search=search.replace(/\\x8[aA]/g,"\\u0160");
    search=search.replace(/\\x8[bB]/g,"\\u2039");
    search=search.replace(/\\x8[cC]/g,"\\u0152");
    search=search.replace(/\\x8[eE]/g,"\\u017D");
    search=search.replace(/\\x91/g,"\\u2018");
    search=search.replace(/\\x92/g,"\\u2019");
    search=search.replace(/\\x93/g,"\\u201C");
    search=search.replace(/\\x94/g,"\\u201D");
    search=search.replace(/\\x95/g,"\\u2022");
    search=search.replace(/\\x96/g,"\\u2013");
    search=search.replace(/\\x97/g,"\\u2014");
    search=search.replace(/\\x98/g,"\\u02DC");
    search=search.replace(/\\x99/g,"\\u2122");
    search=search.replace(/\\x9[aA]/g,"\\u0161");
    search=search.replace(/\\x9[bB]/g,"\\u203A");
    search=search.replace(/\\x9[cC]/g,"\\u0153");
    search=search.replace(/\\x9[dD]/g,"\\u009D");
    search=search.replace(/\\x9[eE]/g,"\\u017E");
    search=search.replace(/\\x9[fF]/g,"\\u0178");
    if (options.indexOf("l")>=0) {
      search=search.replace(/\\b/g,"\b");
      search=search.replace(/\\f/g,"\f");
      search=search.replace(/\\n/g,"\n");
      search=search.replace(/\\r/g,"\r");
      search=search.replace(/\\t/g,"\t");
      search=search.replace(/\\v/g,"\v");
      search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      search=search.replace(/\\B/g,"\\");
    } else search=search.replace(/\\B/g,"\\\\");
  }
  if (options.indexOf("l")>=0) {
    options=options.replace(/l/g,"");
    search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
    replace=replace.replace(/\$/g,"$$$$");
  }
  if (options.indexOf("b")>=0) {
    options=options.replace(/b/g,"");
    search="^"+search
  }
  if (options.indexOf("e")>=0) {
    options=options.replace(/e/g,"");
    search=search+"$"
  }
  var search=new RegExp(search,options);
  var str1, str2;

  if (srcVar) {
    str1=env(args.Item(3));
    str2=str1.replace(search,jexpr?replFunc:replace);
    if (!alterations || str1!=str2) if (multi) {
      WScript.Stdout.Write(str2);
    } else {
      WScript.Stdout.WriteLine(str2);
    }
    if (str1!=str2) rtn=0;
  } else if (multi){
    var buf=1024;
    str1="";
    while (!WScript.StdIn.AtEndOfStream) {
      str1+=WScript.StdIn.Read(buf);
      buf*=2
    }
    str2=str1.replace(search,jexpr?replFunc:replace);
    WScript.Stdout.Write(str2);
    if (str1!=str2) rtn=0;
  } else {
    while (!WScript.StdIn.AtEndOfStream) {
      str1=WScript.StdIn.ReadLine();
      str2=str1.replace(search,jexpr?replFunc:replace);
      if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2);
      if (str1!=str2) rtn=0;
    }
  }
} catch(e) {
  WScript.Stderr.WriteLine("JScript runtime error: "+e.message);
  rtn=3;
}
WScript.Quit(rtn);

function replFunc() {
  var $=arguments;
  return(eval(replace));
}


Dave Benham

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: regex search and replace for batch - Easily edit files!

#37 Post by Aacini » 07 Nov 2014 14:11

Pretty cool! 8)

Why did you converted function's arguments into an array? You may directly use function arguments as in my original code. The only drawback is to declare enough number of parameters in order to accomodate the maximum number of subexpressions used. For example:

Code: Select all

fileContents.replace(search,JSexpr?function($0,$1,$2,$3,$4,$5,$6,$7){return(eval(replace))}:replace));

This point allows to write a cleaner replacement expression without all those square braquets. I like the next example! :D

Code: Select all

C:\> echo this IS cool| "FindRepl Modified.bat" "(\b.)|(.)" "$1?$1.toUpperCase():$2.toLowerCase()" /J
This Is Cool

Antonio

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#38 Post by dbenham » 07 Nov 2014 16:08

Aacini wrote:Pretty cool! 8)

Why did you converted function's arguments into an array? You may directly use function arguments as in my original code. The only drawback is to declare enough number of parameters in order to accomodate the maximum number of subexpressions used.

The bolded quote is exactly why - I don't want to guess how many captured submatches I should allow.

I find it humorous that when dealing with batch environment variable "arrays", I use var1 or var.1 without any brackets, precisely because they are "pesky", and they are not needed. Yet you seem to always use var[1]. You don't seem to mind the "pesky" square brackets in that context. :lol:

But in this JScript context our perspectives are reversed, except I think there is a very good reason for sticking with the square brackets - to guarantee I can access all of the arguments.

I suppose I could define arguments $1 through $N, and the user could use them, and revert to $[M] for M values greater than N.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#39 Post by dbenham » 07 Nov 2014 16:27

Here is version 6.1 that gives the best of both worlds. You can use $0 - $10 for arguments $[0] - $[10]. Square brackets must be used for arguments above 10.

Code: Select all

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
::REPL.BAT version 6.1
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?[REGEX|REPLACE]
:::REPL  /V
:::
:::  Performs a global regular expression search and replace operation on
:::  each line of input from stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /?, then prints help documentation
:::  to stdout. If a single argument of /?REGEX, then opens up Microsoft's
:::  JScript regular expression documentation within your browser. If a single
:::  argument of /?REPLACE, then opens up Microsoft's JScript REPLACE
:::  documentation within your browser.
:::
:::  If called with a single argument of /V, case insensitive, then prints
:::  the version of REPL.BAT.
:::
:::  Search  - By default, this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript regex syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default, this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substituion patterns available to the JScript replace method.
:::
:::            For example, $& represents the portion of the source that matched
:::            the entire search pattern, $1 represents the first captured
:::            submatch, $2 the second captured submatch, etc. A $ literal
:::            can be escaped as $$.
:::
:::            An empty replacement string must be represented as "".
:::
:::            Replace substitution pattern syntax is fully documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            A - Only print altered lines. Unaltered lines are discarded.
:::                If both the M and S options are present, then prints the
:::                entire result if there was a change anywhere in the string.
:::                The A option is incompatible with the M option unless the S
:::                option is also present.
:::
:::            B - The Search must match the beginning of a line.
:::                Mostly used with literal searches.
:::
:::            E - The Search must match the end of a line.
:::                Mostly used with literal searches.
:::
:::            I - Makes the search case-insensitive.
:::
:::            J - The Replace argument represents a JScript expression.
:::                The expression may access an array like arguments object
:::                named $. However, $ is not a true array object.
:::
:::                The $.length property contains the total number of arguments
:::                available. The $.length value is equal to n+3, where n is the
:::                number of capturing left parentheses within the Search string.
:::
:::                $[0] is the substring that matched the Search,
:::                $[1] through $[n] are the captured submatch strings,
:::                $[n+1] is the offset where the match occurred, and
:::                $[n+2] is the original source string.
:::
:::                Arguments $[0] through $[10] may be abbreviated as
:::                $1 through $10. Argument $[11] and above must use the square
:::                bracket notation.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in the Replace string
:::                are treated as $ literals.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line, thus enabling
:::                search for \n. This also enables preservation of the original
:::                line terminators. If the M option is not present, then every
:::                printed line is terminated with carriage return and line feed.
:::                The M option is incompatible with the A option unless the S
:::                option is also present.
:::
:::                Note: If working with binary data containing NULL bytes,
:::                      then the M option must be used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string. Without
:::                the M option, ^ anchors the beginning of the string, and $ the
:::                end of the string. With the M option, ^ anchors the beginning
:::                of a line, and $ the end of a line.
:::
:::            V - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences within the Replace string:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \q     -  Quote
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Extended ASCII byte code expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Also enables the \q escape sequence for the Search string.
:::                The other escape sequences are already standard for a regular
:::                expression Search string.
:::
:::                Also modifies the behavior of \xnn in the Search string to work
:::                properly with extended ASCII byte codes.
:::
:::                Extended escape sequences are supported even when the L option
:::                is used. Both Search and Replace support all of the extended
:::                escape sequences if both the X and L opions are combined.
:::
:::  Return Codes:  0 = At least one change was made
:::                     or the /? or /V option was used
:::
:::                 1 = No change was made
:::
:::                 2 = Invalid call syntax or incompatible options
:::
:::                 3 = JScript runtime error, typically due to invalid regex
:::
::: REPL.BAT was written by Dave Benham, with assistance from DosTips user Aacini
::: to get \xnn to work properly with extended ASCII byte codes. Also assistance
::: from DosTips user penpen diagnosing issues reading NULL bytes, along with a
::: workaround. REPL.BAT was originally posted at:
::: http://www.dostips.com/forum/viewtopic.php?f=3&t=3855
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^:::" "" a
    exit /b 0
  ) else if /i "%~1" equ "/?regex" (
    explorer "http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/?replace" (
    explorer "http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/V" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^::(REPL\.BAT version)" "$1" a
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 2
  )
)
echo(%~3|findstr /i "[^SMILEBVXAJ]" >nul && (
  call :err "Invalid option(s)"
  exit /b 2
)
echo(%~3|findstr /i "M"|findstr /i "A"|findstr /vi "S" >nul && (
  call :err "Incompatible options"
  exit /b 2
)
cscript //E:JScript //nologo "%~f0" %*
exit /b %errorlevel%

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var rtn=1;
try {
  var env=WScript.CreateObject("WScript.Shell").Environment("Process");
  var args=WScript.Arguments;
  var search=args.Item(0);
  var replace=args.Item(1);
  var options="g";
  if (args.length>2) options+=args.Item(2).toLowerCase();
  var multi=(options.indexOf("m")>=0);
  var alterations=(options.indexOf("a")>=0);
  if (alterations) options=options.replace(/a/g,"");
  var srcVar=(options.indexOf("s")>=0);
  if (srcVar) options=options.replace(/s/g,"");
  var jexpr=(options.indexOf("j")>=0);
  if (jexpr) options=options.replace(/j/g,"");
  if (options.indexOf("v")>=0) {
    options=options.replace(/v/g,"");
    search=env(search);
    replace=env(replace);
  }
  if (options.indexOf("x")>=0) {
    options=options.replace(/x/g,"");
    if (!jexpr) {
      replace=replace.replace(/\\\\/g,"\\B");
      replace=replace.replace(/\\q/g,"\"");
      replace=replace.replace(/\\x80/g,"\\u20AC");
      replace=replace.replace(/\\x82/g,"\\u201A");
      replace=replace.replace(/\\x83/g,"\\u0192");
      replace=replace.replace(/\\x84/g,"\\u201E");
      replace=replace.replace(/\\x85/g,"\\u2026");
      replace=replace.replace(/\\x86/g,"\\u2020");
      replace=replace.replace(/\\x87/g,"\\u2021");
      replace=replace.replace(/\\x88/g,"\\u02C6");
      replace=replace.replace(/\\x89/g,"\\u2030");
      replace=replace.replace(/\\x8[aA]/g,"\\u0160");
      replace=replace.replace(/\\x8[bB]/g,"\\u2039");
      replace=replace.replace(/\\x8[cC]/g,"\\u0152");
      replace=replace.replace(/\\x8[eE]/g,"\\u017D");
      replace=replace.replace(/\\x91/g,"\\u2018");
      replace=replace.replace(/\\x92/g,"\\u2019");
      replace=replace.replace(/\\x93/g,"\\u201C");
      replace=replace.replace(/\\x94/g,"\\u201D");
      replace=replace.replace(/\\x95/g,"\\u2022");
      replace=replace.replace(/\\x96/g,"\\u2013");
      replace=replace.replace(/\\x97/g,"\\u2014");
      replace=replace.replace(/\\x98/g,"\\u02DC");
      replace=replace.replace(/\\x99/g,"\\u2122");
      replace=replace.replace(/\\x9[aA]/g,"\\u0161");
      replace=replace.replace(/\\x9[bB]/g,"\\u203A");
      replace=replace.replace(/\\x9[cC]/g,"\\u0153");
      replace=replace.replace(/\\x9[dD]/g,"\\u009D");
      replace=replace.replace(/\\x9[eE]/g,"\\u017E");
      replace=replace.replace(/\\x9[fF]/g,"\\u0178");
      replace=replace.replace(/\\b/g,"\b");
      replace=replace.replace(/\\f/g,"\f");
      replace=replace.replace(/\\n/g,"\n");
      replace=replace.replace(/\\r/g,"\r");
      replace=replace.replace(/\\t/g,"\t");
      replace=replace.replace(/\\v/g,"\v");
      replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      replace=replace.replace(/\\B/g,"\\");
    }
    search=search.replace(/\\\\/g,"\\B");
    search=search.replace(/\\q/g,"\"");
    search=search.replace(/\\x80/g,"\\u20AC");
    search=search.replace(/\\x82/g,"\\u201A");
    search=search.replace(/\\x83/g,"\\u0192");
    search=search.replace(/\\x84/g,"\\u201E");
    search=search.replace(/\\x85/g,"\\u2026");
    search=search.replace(/\\x86/g,"\\u2020");
    search=search.replace(/\\x87/g,"\\u2021");
    search=search.replace(/\\x88/g,"\\u02C6");
    search=search.replace(/\\x89/g,"\\u2030");
    search=search.replace(/\\x8[aA]/g,"\\u0160");
    search=search.replace(/\\x8[bB]/g,"\\u2039");
    search=search.replace(/\\x8[cC]/g,"\\u0152");
    search=search.replace(/\\x8[eE]/g,"\\u017D");
    search=search.replace(/\\x91/g,"\\u2018");
    search=search.replace(/\\x92/g,"\\u2019");
    search=search.replace(/\\x93/g,"\\u201C");
    search=search.replace(/\\x94/g,"\\u201D");
    search=search.replace(/\\x95/g,"\\u2022");
    search=search.replace(/\\x96/g,"\\u2013");
    search=search.replace(/\\x97/g,"\\u2014");
    search=search.replace(/\\x98/g,"\\u02DC");
    search=search.replace(/\\x99/g,"\\u2122");
    search=search.replace(/\\x9[aA]/g,"\\u0161");
    search=search.replace(/\\x9[bB]/g,"\\u203A");
    search=search.replace(/\\x9[cC]/g,"\\u0153");
    search=search.replace(/\\x9[dD]/g,"\\u009D");
    search=search.replace(/\\x9[eE]/g,"\\u017E");
    search=search.replace(/\\x9[fF]/g,"\\u0178");
    if (options.indexOf("l")>=0) {
      search=search.replace(/\\b/g,"\b");
      search=search.replace(/\\f/g,"\f");
      search=search.replace(/\\n/g,"\n");
      search=search.replace(/\\r/g,"\r");
      search=search.replace(/\\t/g,"\t");
      search=search.replace(/\\v/g,"\v");
      search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      search=search.replace(/\\B/g,"\\");
    } else search=search.replace(/\\B/g,"\\\\");
  }
  if (options.indexOf("l")>=0) {
    options=options.replace(/l/g,"");
    search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
    replace=replace.replace(/\$/g,"$$$$");
  }
  if (options.indexOf("b")>=0) {
    options=options.replace(/b/g,"");
    search="^"+search
  }
  if (options.indexOf("e")>=0) {
    options=options.replace(/e/g,"");
    search=search+"$"
  }
  var search=new RegExp(search,options);
  var str1, str2;

  if (srcVar) {
    str1=env(args.Item(3));
    str2=str1.replace(search,jexpr?replFunc:replace);
    if (!alterations || str1!=str2) if (multi) {
      WScript.Stdout.Write(str2);
    } else {
      WScript.Stdout.WriteLine(str2);
    }
    if (str1!=str2) rtn=0;
  } else if (multi){
    var buf=1024;
    str1="";
    while (!WScript.StdIn.AtEndOfStream) {
      str1+=WScript.StdIn.Read(buf);
      buf*=2
    }
    str2=str1.replace(search,jexpr?replFunc:replace);
    WScript.Stdout.Write(str2);
    if (str1!=str2) rtn=0;
  } else {
    while (!WScript.StdIn.AtEndOfStream) {
      str1=WScript.StdIn.ReadLine();
      str2=str1.replace(search,jexpr?replFunc:replace);
      if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2);
      if (str1!=str2) rtn=0;
    }
  }
} catch(e) {
  WScript.Stderr.WriteLine("JScript runtime error: "+e.message);
  rtn=3;
}
WScript.Quit(rtn);

function replFunc($0, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10) {
  var $=arguments;
  return(eval(replace));
}


Dave Benham

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: regex search and replace for batch - Easily edit files!

#40 Post by Aacini » 08 Nov 2014 01:10

dbenham wrote:I find it humorous that when dealing with batch environment variable "arrays", I use var1 or var.1 without any brackets, precisely because they are "pesky", and they are not needed. Yet you seem to always use var[1]. You don't seem to mind the "pesky" square brackets in that context. :lol:

But in this JScript context our perspectives are reversed, except I think there is a very good reason for sticking with the square brackets - to guarantee I can access all of the arguments.


Dave Benham

My reasons are exactly the same in both cases!

When an "array" is used in a Batch file, no matter how the array element is written (var1 or var.1 or var[1]) the result is exactly the same. However, the three forms may make a huge difference from programmers point of view, especially if they have not enough experience. If a beginner see "var[1]" form and don't understand it, he/she may easily find information about this form as an "array" that include all required details, but the other forms are harder to understand and frequently require additional explanations. If we can write arrays in several forms, why don't write they in the clearer form in the benefit of unexperienced programmers? Anyway, you have to write array elements with square braquets in most other programming languages, so I don't see any reason to not do the same thing in Batch files.

(I have the same opinion about using special characters as replaceable parameters in FOR commands; I don't see any reason to not use a letter and insert a strange character instead, that just made the FOR more difficult to read. If your code is clear/standard/common, it will have less problems when it will be used by other people).

Accordingly to JScript documentation, the replaceText string may access the captured submatched substrings via a dollar-sign and a digit: $1, $2, $3, etc. The replaceText may also be a function, but in this case the parameters of the function could be named in any way, for example: "function(A,B,C,D)", in such a case the submatched substrings could be accessed via B, C, D, etc. parameters in the function body. However, the writers of the JScript documentation choose to name they in the same way of the original feature: "function($0,$1,$2,$3)"; this way, $1 always refers to the first submatched substring no matter if these characters appears in the replacement text as a string or in the body of the replacement function, and hence, in the JScript expression provided by the user of the program (for example, REPL.BAT or FindRepl.bat). I don't see any reason to not follow this standard.

I think that regular expressions is a topic difficult to comprehend, especially to beginners, so introducing a different way to access submatched substrings via array elements, that are not documented at any other place excepting in your program, is a bad idea. If this change was dictated by a technicall requirement, I think that is preferable to set a restriction instead ("You may define a maximum of N subexpressions"). Anyway, you may set this limit as high as you want; you just need to write a long "function($0,$1,...,$9,$10,...)" line; the documentation specifies that the maximum captured submatched subexpression is $99. Remember that these nuisances are in the benefit of unexperienced programmers.

Antonio

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#41 Post by dbenham » 11 Nov 2014 12:04

I am ceasing development of REPL.BAT, other than possible bug fixes. All new enhancements will be in a new utility called JREPL.BAT. I am changing the name because the syntax has changed, so it is not backward compatible with REPL.BAT. The new utility is available here: viewtopic.php?f=3&t=6044

@Aacini

I do agree $1 is much more convenient than $[1]. But, I don't want to blindly define the maximum number of arguments ($0 - $101) because there is significant overhead in declaring function arguments that are not used.

I realize now that a test replace can be performed to derive the number of capturing left parentheses, and then the function can be dynamically built using eval(). I will not bother making a change to REPL.BAT. That feature will go into the new JREPL.BAT utility.


Dave Benham
Last edited by dbenham on 14 Nov 2014 15:52, edited 1 time in total.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#42 Post by dbenham » 11 Nov 2014 12:08

Here is version 6.2 (perhaps the last ever version). It fixes a bug - the replacement JScript expression should not escape $ if L is used. It also corrects documentation of the A option.

REPL.BAT version 6.2

Code: Select all

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
::REPL.BAT version 6.2
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?[REGEX|REPLACE]
:::REPL  /V
:::
:::  Performs a global regular expression search and replace operation on
:::  each line of input from stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /?, then prints help documentation
:::  to stdout. If a single argument of /?REGEX, then opens up Microsoft's
:::  JScript regular expression documentation within your browser. If a single
:::  argument of /?REPLACE, then opens up Microsoft's JScript REPLACE
:::  documentation within your browser.
:::
:::  If called with a single argument of /V, case insensitive, then prints
:::  the version of REPL.BAT.
:::
:::  Search  - By default, this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript regex syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default, this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substituion patterns available to the JScript replace method.
:::
:::            For example, $& represents the portion of the source that matched
:::            the entire search pattern, $1 represents the first captured
:::            submatch, $2 the second captured submatch, etc. A $ literal
:::            can be escaped as $$.
:::
:::            An empty replacement string must be represented as "".
:::
:::            Replace substitution pattern syntax is fully documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            A - Only print altered lines. Unaltered lines are discarded.
:::                If the S options is present, then prints the result only if
:::                there was a change anywhere in the string. The A option is
:::                incompatible with the M option unless the S option is present.
:::
:::            B - The Search must match the beginning of a line.
:::                Mostly used with literal searches.
:::
:::            E - The Search must match the end of a line.
:::                Mostly used with literal searches.
:::
:::            I - Makes the search case-insensitive.
:::
:::            J - The Replace argument represents a JScript expression.
:::                The expression may access an array like arguments object
:::                named $. However, $ is not a true array object.
:::
:::                The $.length property contains the total number of arguments
:::                available. The $.length value is equal to n+3, where n is the
:::                number of capturing left parentheses within the Search string.
:::
:::                $[0] is the substring that matched the Search,
:::                $[1] through $[n] are the captured submatch strings,
:::                $[n+1] is the offset where the match occurred, and
:::                $[n+2] is the original source string.
:::
:::                Arguments $[0] through $[10] may be abbreviated as
:::                $1 through $10. Argument $[11] and above must use the square
:::                bracket notation.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in the Replace string
:::                are treated as $ literals.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line, thus enabling
:::                search for \n. This also enables preservation of the original
:::                line terminators. If the M option is not present, then every
:::                printed line is terminated with carriage return and line feed.
:::                The M option is incompatible with the A option unless the S
:::                option is also present.
:::
:::                Note: If working with binary data containing NULL bytes,
:::                      then the M option must be used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string. Without
:::                the M option, ^ anchors the beginning of the string, and $ the
:::                end of the string. With the M option, ^ anchors the beginning
:::                of a line, and $ the end of a line.
:::
:::            V - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences within the Replace string:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \q     -  Quote
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Extended ASCII byte code expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Also enables the \q escape sequence for the Search string.
:::                The other escape sequences are already standard for a regular
:::                expression Search string.
:::
:::                Also modifies the behavior of \xnn in the Search string to work
:::                properly with extended ASCII byte codes.
:::
:::                Extended escape sequences are supported even when the L option
:::                is used. Both Search and Replace support all of the extended
:::                escape sequences if both the X and L opions are combined.
:::
:::  Return Codes:  0 = At least one change was made
:::                     or the /? or /V option was used
:::
:::                 1 = No change was made
:::
:::                 2 = Invalid call syntax or incompatible options
:::
:::                 3 = JScript runtime error, typically due to invalid regex
:::
::: REPL.BAT was written by Dave Benham, with assistance from DosTips user Aacini
::: to get \xnn to work properly with extended ASCII byte codes. Also assistance
::: from DosTips user penpen diagnosing issues reading NULL bytes, along with a
::: workaround. REPL.BAT was originally posted at:
::: http://www.dostips.com/forum/viewtopic.php?f=3&t=3855
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^:::" "" a
    exit /b 0
  ) else if /i "%~1" equ "/?regex" (
    explorer "http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/?replace" (
    explorer "http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx"
    exit /b 0
  ) else if /i "%~1" equ "/V" (
    <"%~f0" cscript //E:JScript //nologo "%~f0" "^::(REPL\.BAT version)" "$1" a
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 2
  )
)
echo(%~3|findstr /i "[^SMILEBVXAJ]" >nul && (
  call :err "Invalid option(s)"
  exit /b 2
)
echo(%~3|findstr /i "M"|findstr /i "A"|findstr /vi "S" >nul && (
  call :err "Incompatible options"
  exit /b 2
)
cscript //E:JScript //nologo "%~f0" %*
exit /b %errorlevel%

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var rtn=1;
try {
  var env=WScript.CreateObject("WScript.Shell").Environment("Process");
  var args=WScript.Arguments;
  var search=args.Item(0);
  var replace=args.Item(1);
  var options="g";
  if (args.length>2) options+=args.Item(2).toLowerCase();
  var multi=(options.indexOf("m")>=0);
  var alterations=(options.indexOf("a")>=0);
  if (alterations) options=options.replace(/a/g,"");
  var srcVar=(options.indexOf("s")>=0);
  if (srcVar) options=options.replace(/s/g,"");
  var jexpr=(options.indexOf("j")>=0);
  if (jexpr) options=options.replace(/j/g,"");
  if (options.indexOf("v")>=0) {
    options=options.replace(/v/g,"");
    search=env(search);
    replace=env(replace);
  }
  if (options.indexOf("x")>=0) {
    options=options.replace(/x/g,"");
    if (!jexpr) {
      replace=replace.replace(/\\\\/g,"\\B");
      replace=replace.replace(/\\q/g,"\"");
      replace=replace.replace(/\\x80/g,"\\u20AC");
      replace=replace.replace(/\\x82/g,"\\u201A");
      replace=replace.replace(/\\x83/g,"\\u0192");
      replace=replace.replace(/\\x84/g,"\\u201E");
      replace=replace.replace(/\\x85/g,"\\u2026");
      replace=replace.replace(/\\x86/g,"\\u2020");
      replace=replace.replace(/\\x87/g,"\\u2021");
      replace=replace.replace(/\\x88/g,"\\u02C6");
      replace=replace.replace(/\\x89/g,"\\u2030");
      replace=replace.replace(/\\x8[aA]/g,"\\u0160");
      replace=replace.replace(/\\x8[bB]/g,"\\u2039");
      replace=replace.replace(/\\x8[cC]/g,"\\u0152");
      replace=replace.replace(/\\x8[eE]/g,"\\u017D");
      replace=replace.replace(/\\x91/g,"\\u2018");
      replace=replace.replace(/\\x92/g,"\\u2019");
      replace=replace.replace(/\\x93/g,"\\u201C");
      replace=replace.replace(/\\x94/g,"\\u201D");
      replace=replace.replace(/\\x95/g,"\\u2022");
      replace=replace.replace(/\\x96/g,"\\u2013");
      replace=replace.replace(/\\x97/g,"\\u2014");
      replace=replace.replace(/\\x98/g,"\\u02DC");
      replace=replace.replace(/\\x99/g,"\\u2122");
      replace=replace.replace(/\\x9[aA]/g,"\\u0161");
      replace=replace.replace(/\\x9[bB]/g,"\\u203A");
      replace=replace.replace(/\\x9[cC]/g,"\\u0153");
      replace=replace.replace(/\\x9[dD]/g,"\\u009D");
      replace=replace.replace(/\\x9[eE]/g,"\\u017E");
      replace=replace.replace(/\\x9[fF]/g,"\\u0178");
      replace=replace.replace(/\\b/g,"\b");
      replace=replace.replace(/\\f/g,"\f");
      replace=replace.replace(/\\n/g,"\n");
      replace=replace.replace(/\\r/g,"\r");
      replace=replace.replace(/\\t/g,"\t");
      replace=replace.replace(/\\v/g,"\v");
      replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      replace=replace.replace(/\\B/g,"\\");
    }
    search=search.replace(/\\\\/g,"\\B");
    search=search.replace(/\\q/g,"\"");
    search=search.replace(/\\x80/g,"\\u20AC");
    search=search.replace(/\\x82/g,"\\u201A");
    search=search.replace(/\\x83/g,"\\u0192");
    search=search.replace(/\\x84/g,"\\u201E");
    search=search.replace(/\\x85/g,"\\u2026");
    search=search.replace(/\\x86/g,"\\u2020");
    search=search.replace(/\\x87/g,"\\u2021");
    search=search.replace(/\\x88/g,"\\u02C6");
    search=search.replace(/\\x89/g,"\\u2030");
    search=search.replace(/\\x8[aA]/g,"\\u0160");
    search=search.replace(/\\x8[bB]/g,"\\u2039");
    search=search.replace(/\\x8[cC]/g,"\\u0152");
    search=search.replace(/\\x8[eE]/g,"\\u017D");
    search=search.replace(/\\x91/g,"\\u2018");
    search=search.replace(/\\x92/g,"\\u2019");
    search=search.replace(/\\x93/g,"\\u201C");
    search=search.replace(/\\x94/g,"\\u201D");
    search=search.replace(/\\x95/g,"\\u2022");
    search=search.replace(/\\x96/g,"\\u2013");
    search=search.replace(/\\x97/g,"\\u2014");
    search=search.replace(/\\x98/g,"\\u02DC");
    search=search.replace(/\\x99/g,"\\u2122");
    search=search.replace(/\\x9[aA]/g,"\\u0161");
    search=search.replace(/\\x9[bB]/g,"\\u203A");
    search=search.replace(/\\x9[cC]/g,"\\u0153");
    search=search.replace(/\\x9[dD]/g,"\\u009D");
    search=search.replace(/\\x9[eE]/g,"\\u017E");
    search=search.replace(/\\x9[fF]/g,"\\u0178");
    if (options.indexOf("l")>=0) {
      search=search.replace(/\\b/g,"\b");
      search=search.replace(/\\f/g,"\f");
      search=search.replace(/\\n/g,"\n");
      search=search.replace(/\\r/g,"\r");
      search=search.replace(/\\t/g,"\t");
      search=search.replace(/\\v/g,"\v");
      search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
        function($0,$1,$2){
          return String.fromCharCode(parseInt("0x"+$0.substring(2)));
        }
      );
      search=search.replace(/\\B/g,"\\");
    } else search=search.replace(/\\B/g,"\\\\");
  }
  if (options.indexOf("l")>=0) {
    options=options.replace(/l/g,"");
    search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
    if (!jexpr) replace=replace.replace(/\$/g,"$$$$");
  }
  if (options.indexOf("b")>=0) {
    options=options.replace(/b/g,"");
    search="^"+search
  }
  if (options.indexOf("e")>=0) {
    options=options.replace(/e/g,"");
    search=search+"$"
  }
  var search=new RegExp(search,options);
  var str1, str2;

  if (srcVar) {
    str1=env(args.Item(3));
    str2=str1.replace(search,jexpr?replFunc:replace);
    if (!alterations || str1!=str2) if (multi) {
      WScript.Stdout.Write(str2);
    } else {
      WScript.Stdout.WriteLine(str2);
    }
    if (str1!=str2) rtn=0;
  } else if (multi){
    var buf=1024;
    str1="";
    while (!WScript.StdIn.AtEndOfStream) {
      str1+=WScript.StdIn.Read(buf);
      buf*=2
    }
    str2=str1.replace(search,jexpr?replFunc:replace);
    WScript.Stdout.Write(str2);
    if (str1!=str2) rtn=0;
  } else {
    while (!WScript.StdIn.AtEndOfStream) {
      str1=WScript.StdIn.ReadLine();
      str2=str1.replace(search,jexpr?replFunc:replace);
      if (!alterations || str1!=str2) WScript.Stdout.WriteLine(str2);
      if (str1!=str2) rtn=0;
    }
  }
} catch(e) {
  WScript.Stderr.WriteLine("JScript runtime error: "+e.message);
  rtn=3;
}
WScript.Quit(rtn);

function replFunc($0, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10) {
  var $=arguments;
  return(eval(replace));
}


Dave Benham

bsrini
Posts: 1
Joined: 12 Nov 2014 20:47

Re: regex search and replace for batch - Easily edit files!

#43 Post by bsrini » 12 Nov 2014 21:19

Hello Dave,

I am using the latest 6.2 version of repl.bat to search and replace a string using regular expressions....I am running into an issue with a particular use case. Below is the code:

Code: Select all

@echo off
set allGroups=Requirements Contributors
set groupMembers=Smith, John (MIT) (SmithJ), BomarD, Ronald, Karl (non-empl) (RonaldK), Doe, Jane (DoeJ), ZimA
SET group=
SET mbrStr=
echo.

setlocal enabledelayedexpansion
:nextVariable
for /F "tokens=1* delims=;" %%a in ("%allGroups%") do (
      set allGroups=%%b
   set group=%%~a

   for /f "tokens=*" %%a in ("%groupMembers%") do (
      set input1=%%a
      set input2=%%a
      
      for /f "delims=" %%a in ('echo "!input1!," ^|repl "\(non-empl\)" "" ^|repl "\(MIT\)" "" ^|repl ".*?\((.*?)\),(.*?)" "$1," ^|repl " " "" ^|repl "..$" "" ') do set output1=%%a
      set mbrStr=!mbrStr!!output1!
      echo !group!=!mbrStr!

      set mbrStr=
      REM repeat for $2
      for /f "delims=" %%a in ('echo "!input2!," ^|repl "\(non-empl\)" "" ^|repl "\(MIT\)" "" ^|repl ".*?\((.*?)\),(.*?)" "$2," ^|repl " " "" ^|repl "..$" "" ') do set output2=%%a
      set mbrStr=!mbrStr!!output2!
      echo !group!=!mbrStr!
   )
)
if defined allGroups goto nextVariable

goto :EOF


Input in the above script is Smith, John (MIT) (SmithJ), BomarD, Ronald, Karl (non-empl) (RonaldK), Doe, Jane (DoeJ), ZimA
Those userIDs that are in parenthesis ONLY should be output as a comma separated string, excluding anything else in parenthesis like MIT or non-empl.

So, expected string would be:

Requirements Contributors=SmithJ,RonaldK,DoeJ

However, output is:

Requirements Contributors=SmithJ,RonaldK,DoeJ,ZimA
Requirements Contributors=,,,ZimA

I am separating $1 and $2 just to show that ZimA gets included no matter what. Is there any way to tweak the regular expression to show my expected result? I have tried various combinations without success.

Any help would be greatly appreciated.

Thank you,

Balu

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: regex search and replace for batch - Easily edit files!

#44 Post by foxidrive » 13 Nov 2014 03:31

Does this work for you?

Code: Select all

@echo off
set "var=Smith, John (MIT) (SmithJ), BomarD, Ronald, Karl (non-empl) (RonaldK), Doe, Jane (DoeJ), ZimA"
echo %var%|repl "\(.*?\) \((.*?)\)," "($1)," |repl ".*?\((.*?)\),.*?" "$1," |repl "(.*), .*" "$1"
pause
]

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: regex search and replace for batch - Easily edit files!

#45 Post by dbenham » 13 Nov 2014 07:53

The J option is cool 8)

Code: Select all

@echo off
set groupMembers=Smith, John (MIT) (SmithJ), BomarD, Ronald, Karl (non-empl) (RonaldK), Doe, Jane (DoeJ), ZimA
for /f "delims=, tokens=*" %%A in (
  'repl "[^,]*\((.*?)\)(?:,|$)|.*?(?:,|$)" "$1?','+$1:''" js groupMembers'
) do echo Requirements Contributors=%%A

--Output--

Code: Select all

Requirements Contributors=SmithJ,RonaldK,DoeJ


Dave Benham

Post Reply