New regex utility to search and replace strings in files 
Expert

Joined: 06 Dec 2011 22:15
Posts: 710
Location: México City, México
FIND.EXE was the first DOS command designed to search for strings in files, followed by FINDSTR.EXE, which includes limited regular expression support. The Windows Script Host, present in all Windows versions from XP on, provides advanced regular expression support that has been used to write "search and replace" JScript and VBScript programs designed for use in a Batch environment, like dbenham's large REPL.BAT program or foxidrive's small sar.bat.

FindRepl.bat is my own version of this type of program. I have included several additional features that allow it to output a more extensive set of results, so the same program may be used to solve a wider range of similar problems.

EDIT 2013-07-02: I slightly modified the program below to include these details: submatched substrings are printed enclosed in the /Q character (if any); an additional /VR switch applies to sReplace (previously /V worked in both rSearch and sReplace); and line delimiters are eliminated when /B:rBlock is "\r\n" and sReplace is "" (previously this replacement worked only when no blocks were defined). The program description below also covers these points.

EDIT 2013-07-06: I replaced the /VR switch with /R, and the program now allows searching for "\r\n" when the /$ switch is given with neither /V nor /N nor blocks.
Code:
@if (@CodeSection == @Batch) @then

:: The first line above is...
:: - in Batch: a valid IF command that does nothing.
:: - in JScript: a conditional compilation IF statement that is false,
::               so this Batch section is omitted until next "arroba-end".


@echo off

rem FindRepl.bat: Utility to search and search/replace strings in a file
rem Antonio Perez Ayala
rem   - June/26/2013: first version.
rem   - July/02/2013: use /Q in submatched substrings, use /VR for /V in Replace, eliminate "\r\n" in blocks.
rem   - July/07/2013: change /VR by /R, search for "\r\n" with /$ and no /V, /N nor blocks.

if "%~1" neq "" if "%~1" neq "/?" goto begin
   < "%~F0" CScript //nologo //E:JScript "%~F0" "^<usage>" /E:"^</usage>" /O:+1:-1
   goto :EOF
:begin
   CScript //nologo //E:JScript "%~F0" %*
   exit /B %errorlevel%
:end

<usage>
Searches for strings in a file, and prints or replaces them.

FINDREPL [/I] [/V] [/N] rSearch [/E:rEndBlk] [/O:s:e] [/B:rBlock] [/$:s1...]
         [[/R] [/A] sReplace] [/Q:c] [/S:sSource]

  /I         Specifies that the search is not to be case-sensitive.
  /V         Prints only lines that do not contain a match.
  /N         Prints the line number before each line that matches.
  rSearch    Text to be searched for, or Start text for a block of lines.
  /E:rEndBlk Text to be searched for the End of a block of lines.
  /O:s:e     Specifies offsets for Starting and Ending lines of blocks.
  /B:rBlock  Text to be searched again in the blocks of lines.
  /$:s1...   Specifies to print saved submatched substrings instead of lines.
  /R         Prints only replaced lines, instead of all file lines.
  /A         Specifies that sReplace has alternative values matching rSearch.
  sReplace   Text that will replace the matched text.
  /Q:c       Specifies a character that is used in place of quotation marks.
  /S:sSource Text to be processed instead of Stdin file.

All search texts must be given in VBScript regular expression format. See:
http://msdn.microsoft.com/en-us/library/ee236359(v=vs.84).aspx
(^ and $ anchor the beginning and end of line, respectively)
The replacement string may use $ to retrieve saved submatched substrings. See:
http://msdn.microsoft.com/en-us/library/t0kbytzc(v=vs.84).aspx
Use /A switch to insert several values separated by pipe in rSearch/sReplace.
Use /Q:c switch to insert a quote in the search/replacement texts.
If first character of any text is an equal-sign, specify a Batch variable.

There are three ways to define Blocks of lines using rSearch text as base:

/O:s:e                           /E:rEndBlk              /E:rEndBlk /O:s:e
-------------------------------------------------------------------------------
Add S and E (with optional       From rSearch line       Add S to rSearch line
signs) to matching lines.        to rEndBlk line.        and E to rEndBlk line.

... and one more way if rSearch is not given:   /O:s:e   From line S to line E.
If S or E is negative, it counts lines backwards from the end of file.
If E is not given, then it defaults to the last line of the file (same as -1).

The output varies depending on the given parameters and switches this way:

rSearch        /V                 Block            /B:rBlock        /$:s1...
-------------------------------------------------------------------------------
Matched        Non-matched        Blocks of        Search /B:       Saved
lines.         lines.             lines.           in blocks        submatches.

sReplace       /R                 Block            /B:rBlock
-------------------------------------------------------------------------------
All file       Only replaced      Search /B:rBlock in blocks
lines.         file lines.        and replaces matched lines.

The total number of matches/replacements is returned in ERRORLEVEL.
</usage>

End of Batch section


@end


// JScript section


// FINDREPL [/I] [/V] [/N] rSearch [/E:rEndBlk] [/O:s:e] [/B:rBlock] [/$:s1...]
//          [[/R] [/A] sReplace] [/Q:c] [/S:source]


var options = WScript.Arguments.Named,
    args    = WScript.Arguments.Unnamed,
    env     = WScript.CreateObject("WScript.Shell").Environment("Process"),

    ignoreCase   = options.Exists("I")?"i":"",
    notMatched   = options.Exists("V"),
    showNumber   = options.Exists("N"),
    search       = undefined,
    endBlk       = undefined,
    offset       = undefined,
    block        = undefined,
    submatches   = undefined,
    justReplaced = options.Exists("R"),
    replace      = undefined,
    quote        = options.Item("Q"),

    lineNumber = 0, range = new Array(),
    procLines = false, procBlocks = false,
    nextMatch, result  = 0,

    match = function ( line, regex ) { return line.search(regex) >= 0; },

    parseInts =
       function ( strs ) {
          var Ints = new Array();
          for ( var i = 0; i < strs.length; ++i ) {
             Ints[i] = parseInt(strs[i], 10);  // explicit radix: offsets are always decimal
          }
          return Ints;
       },

    getRegExp =
       function ( param, justLoad ) {
          var result = param;
          if ( result.substr(0,1) == "=" ) {
             result = env(result.substr(1));
          } else {
             if ( quote != undefined ) result = result.replace(eval("/"+quote+"/g"),'"');
          }
          if ( ! justLoad ) result = new RegExp(result,"gm"+ignoreCase);
          return result;
       }
    ;


if ( args.Length > 0 ) {
   search = getRegExp(args.Item(0),true);
}
if ( options.Exists("E") ) {
   endBlk = getRegExp(options.Item("E"));
   procBlocks = true;
}
if ( options.Exists("O") ) {
   offset = parseInts(options.Item("O").split(":"));
   procBlocks = true;
}
if ( options.Exists("B") ) {
   block = getRegExp(options.Item("B"),true);
}
if ( options.Exists("$") ) submatches = parseInts(options.Item("$").split(":"));
var removeCRLF = false;
if ( args.Length > 1     ) {
   replace = args.Item(1);
   removeCRLF = (block == "\\r\\n") && (replace == "");
   if ( replace.substr(0,1) == "=" ) replace = env(replace.substr(1));
   replace = eval('"' + replace + '"');
   if ( quote != undefined ) replace = replace.replace(eval("/"+quote+"/g"),'"');
   if ( options.Exists("A") ) {  // Enable alternation replacements from "Se|ar|ch" to "Re|pla|ce"
      var Asearch = search.split("|"),
          Areplace = replace.split("|"),
          repl = new Array();
      for ( var i = 0; i < Asearch.length; i++ ) {
         repl[Asearch[i]] = Areplace[i];
      }
      replace = function($0,$1,$2) { return repl[$0]; };
      Asearch.length = 0;
      Areplace.length = 0;
   }
}
if ( search != undefined ) search = new RegExp(search, "gm"+ignoreCase);
if ( block  != undefined ) block  = new RegExp(block , "gm"+ignoreCase);



// FINDREPL [/I] [/V] [/N] rSearch [/E:rEndBlk] [/O:s:e] [/B:rBlock] [/$:s1...]
//          [[/R] [/A] sReplace] [/Q:c] [/S:sSource]

//          In Search and Replace operations:
//            /V or /N switches implies line processing
//            /E or /O switches implies block (and line) processing
//          If Search operation (with no previous switches) have NOT /$ switch:
//            implies line processing (otherwise is file processing)


if ( options.Exists("S") ) {  // Process Source string instead of file
   var source = options.Item("S");
   if ( source.substr(0,1) == "=" ) source = env(source.substr(1));
   var fileContents = new Array(), lastLine = 1;
   fileContents[0] = source;
   procLines = true;
} else {  // Process Stdin file
// -> positive justification omitted

fileContents = WScript.StdIn.ReadAll();

if ( notMatched || showNumber || procBlocks ) procLines = true;
if ( replace==undefined && submatches==undefined ) procLines = true;

if ( procLines ) {  // Separate file contents in lines
   var lastByte = fileContents.slice(-1);
   fileContents = fileContents.replace(/([^\r\n]*)\r?\n/g,"$1\n").match(/^.*$/gm);
   lastLine = fileContents.length - ((lastByte == "\n")?1:0);
}

if ( procBlocks ) {  // Create blocks of lines
   if ( search != undefined ) {  // Blocks based on Search lines:
      if ( offset == undefined ) offset = new Array(0,0);
      for ( var i = 1; i <= lastLine; i++ ) {
         if ( match(fileContents[i-1],search) ) {
            if ( endBlk != undefined ) {  // 1- from Search line to EndBlk line [+offsets].
               for ( var j=i+1; j<=lastLine && !match(fileContents[j-1],endBlk); j++ );
               if ( j <= lastLine ) {
                  var s = i+offset[0], e = j+offset[1];
                  // Insert additional code here to cancel overlapped blocks
                  range.push(s>0?s:1, e>0?e:1);
               }
               i = j;
            } else {  // 2- surrounding Search lines with offsets.
               s = i+offset[0], e = i+offset[1];
               range.push(s>0?s:1, e>0?e:1);
            }
         }
      }
   } else {  // Offset with no Search: block is range of lines
      if ( offset.length < 2 ) offset[1] = lastLine;
      s = offset[0]<0 ? offset[0]+lastLine+1 : offset[0];
      e = offset[1]<0 ? offset[1]+lastLine+1 : offset[1];
      range.push(s>0?s:1, e>0?e:1);
   }
   if ( range.length == 0 ) WScript.Quit(0);
   range.push(0xFFFFFFFF,0xFFFFFFFF);
}

// <- negative justification omitted
}
// endif Process Source string instead of file

if ( replace == undefined ) {  // Search operations
   if ( procLines ) {  // Search on lines
   // -> positive justification omitted...
   if ( procBlocks ) {  // Process previously created blocks
      for ( var r=0, lineNumber=1; lineNumber <= lastLine; lineNumber++ ) {
         if ( (range[r]<=lineNumber && lineNumber<=range[r+1]) != notMatched ) {
            if ( submatches != undefined ) {
               if ( showNumber ) WScript.Stdout.Write(lineNumber+":");
               while ( (nextMatch = block.exec(fileContents[lineNumber-1])) != null ) {
                  for ( var s = 0; s < submatches.length; s++ ) {
                     WScript.Stdout.Write(" " + (quote!=undefined?quote:'"') +
                                                nextMatch[submatches[s]] +
                                                (quote!=undefined?quote:'"'));
                  }
                  result++;
               }
               WScript.Stdout.WriteLine();
            } else {
               if ( block == undefined  ||  match(fileContents[lineNumber-1],block) ) {
                  if ( showNumber ) WScript.Stdout.Write(lineNumber+":");
                  WScript.Stdout.WriteLine(fileContents[lineNumber-1]);
                  result++;
               }
            }
         }
         if ( lineNumber >= range[r+1] ) r += 2;
      }
   } else {  // Process all lines for Search
      for ( lineNumber = 1; lineNumber <= lastLine; lineNumber++ ) {
         if ( match(fileContents[lineNumber-1],search) != notMatched ) {
            if ( showNumber ) WScript.Stdout.Write(lineNumber+":");
            if ( submatches != undefined ) {
               search.lastIndex = 0;
               while ( (nextMatch = search.exec(fileContents[lineNumber-1])) != null ) {
                  for ( var s = 0; s < submatches.length; s++ ) {
                     WScript.Stdout.Write(" " + (quote!=undefined?quote:'"') +
                                                nextMatch[submatches[s]] +
                                                (quote!=undefined?quote:'"'));
                  }
                  result++;
               }
               WScript.Stdout.WriteLine();
            } else {
               WScript.Stdout.WriteLine(fileContents[lineNumber-1]);
               result++;
            }
         }
      }
   }
   // <- negative justification omitted...

   } else {  // Search on entire file and show submatched substrings
      if ( submatches != undefined ) {
         while ( (nextMatch = search.exec(fileContents)) != null ) {
            for ( var s = 0; s < submatches.length; s++ ) {
               WScript.Stdout.Write(" " + (quote!=undefined?quote:'"') +
                                          nextMatch[submatches[s]] +
                                          (quote!=undefined?quote:'"'));
            }
            result++;
            WScript.Stdout.WriteLine();
         }
      }
   }
} else {  // Replace operations
   if ( procLines ) {  // Replace on lines
   // -> positive justification omitted...
   if ( procBlocks ) {  // Process previously created blocks
      if ( block == undefined ) block = search;  // Replace rSearch or rBlock (the last one)
      var CRLFremoved = false;
      for ( var r=0, lineNumber=1; lineNumber <= lastLine; lineNumber++ ) {
         if ( range[r]<=lineNumber && lineNumber<=range[r+1] ) {
         if ( removeCRLF ) {
            WScript.Stdout.Write(fileContents[lineNumber-1]);
            CRLFremoved = true;
            result++;
         } else {
            if ( match(fileContents[lineNumber-1],block) ) {
               if ( CRLFremoved ) { WScript.Stdout.WriteLine(); CRLFremoved = false; }
               WScript.Stdout.WriteLine(fileContents[lineNumber-1].replace(block,replace));
               result++;
            } else {
               if ( CRLFremoved ) { WScript.Stdout.WriteLine(); CRLFremoved = false; }
               if ( ! justReplaced ) WScript.Stdout.WriteLine(fileContents[lineNumber-1]);
            }
         }
         } else {
            if ( CRLFremoved ) { WScript.Stdout.WriteLine(); CRLFremoved = false; }
            if ( ! justReplaced ) WScript.Stdout.WriteLine(fileContents[lineNumber-1]);
         }
         if ( lineNumber >= range[r+1] ) r += 2;
      }
      if ( CRLFremoved ) { WScript.Stdout.WriteLine(); CRLFremoved = false; }
   } else {  // Process all lines for Replace
      for ( lineNumber = 1; lineNumber <= lastLine; lineNumber++ ) {
         if ( match(fileContents[lineNumber-1],search) ) {
            WScript.Stdout.WriteLine(fileContents[lineNumber-1].replace(search,replace));
            result++;
         } else {
            if ( ! justReplaced ) WScript.Stdout.WriteLine(fileContents[lineNumber-1]);
         }
      }
   }
   // <- negative justification omitted...

   } else {  // Replace on entire file
      WScript.Stdout.Write(fileContents.replace(search,replace));
   }
}
WScript.Quit(result);

Although it may seem complex, using FindRepl.bat is straightforward. All optional switches may be included in any order, as in most standard "DOS" commands. The program processes lines read from Stdin and has two base parameters, a Search part and a Replace part: < theFile.txt FindRepl "Search" "Replace". If the Replace part is not given, FindRepl.bat behaves like the FINDSTR command.

- Simple find:
Code:
< theFile.txt FindRepl "any string"

- Enumerate/count all lines, even if the file has Unix-type end-of-line delimiters (<LF> only), or the last line lacks the end-of-line delimiter, or both:
Code:
< theFile.txt FindRepl /N
echo Number of lines: %errorlevel%

- Eliminate all lines with "XYZ" text:
Code:
< theFile.txt FindRepl /V "XYZ"
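
The line counting in the /N example depends on how the JScript section splits the input into lines. The following stand-alone sketch mirrors that step in plain JavaScript; splitLines is a hypothetical helper name used only for illustration, not a function of FindRepl.bat.

```javascript
// Sketch of the line-splitting step used by the JScript section.
// splitLines is a hypothetical name; the program does this inline.
function splitLines(text) {
   var lastByte = text.slice(-1);
   // Normalize <CR><LF> to <LF>, then collect every line (empty ones included).
   var lines = text.replace(/([^\r\n]*)\r?\n/g, "$1\n").match(/^.*$/gm);
   // A trailing <LF> yields one extra empty element that is not a real line.
   var count = lines.length - (lastByte == "\n" ? 1 : 0);
   return lines.slice(0, count);
}
```

With this approach a file with Windows or Unix delimiters, with or without a final delimiter, should give the same number of lines.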


Besides individual lines, FindRepl.bat may also output blocks of lines around matching lines. Blocks are defined via two offsets with optional signs that are added to the number of each matching line: FindRepl "Search" /O:s:e, or via an ending matching line: FindRepl "Start" /E:"End", or via a combination of both: FindRepl "Start" /E:"End" /O:s:e. The simplest way to output a block of lines is via a direct range of line numbers with no Search part.

- Show from line 12 to line 34:
Code:
< theFile.txt FindRepl /O:12:34

- Show the first 15 lines (head):
Code:
< theFile.txt FindRepl /O:1:15

- Show the last 20 lines (tail):
Code:
< theFile.txt FindRepl /O:-20

- Show both the first 15 lines and the last 20 lines (with line numbers):
Code:
< theFile.txt FindRepl /V /O:16:-21 /N

- Search for a certain string and show a block of 3 lines in each match: 1 line before and 1 line after the matching line (http://www.dostips.com/forum/viewtopic.php?f=3&t=3801):
Code:
< theFile.txt FindRepl "the string" /O:-1:+1

- Eliminate a block of lines surrounded by certain delimiting lines (http://stackoverflow.com/questions/17126655/how-to-remove-lines-or-text-in-given-lines-from-file-in-batch):
Code:
< theFile.txt FindRepl /V "start remove" /E:"end remove"

- Show a block of lines surrounded by delimiting lines, but not including them (like the show help part of FindRepl.bat program):
Code:
< theFile.txt FindRepl "start show" /E:"end show" /O:+1:-1
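
The head/tail examples above rely on how a direct /O:s:e range with no Search part is resolved. This is a minimal sketch of that calculation in plain JavaScript; resolveRange is a hypothetical name, and the logic is my reading of the "Offset with no Search" branch of the program.

```javascript
// Sketch: turn the /O:s:e values into a 1-based [start, end] line range.
// resolveRange is a hypothetical helper name used only for illustration.
function resolveRange(offset, lastLine) {
   var s = offset[0];
   var e = offset.length < 2 ? lastLine : offset[1];  // E defaults to last line
   if (s < 0) s = s + lastLine + 1;   // negative: count from end of file
   if (e < 0) e = e + lastLine + 1;
   return [s > 0 ? s : 1, e > 0 ? e : 1];  // never go before line 1
}
```

For a 100-line file, /O:-20 resolves to lines 81 through 100, and /O:16:-21 resolves to lines 16 through 80, which /V then inverts to show the first 15 and the last 20 lines.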


The Search string is not plain text, but a regular expression ("regexp"). You may use a backslash followed by another character to specify binary bytes or sets of common characters in a regexp. For example, to specify a <TAB> character you may use \t or \cI (Ctrl-I) or \x09 (ASCII hex code 9). For a <LF> (newline) character you may use \n or \cJ or \x0A, and use \r or \cM or \x0D for <CR>. You may also use \d for any digit, \D for nondigit characters, \w for alphanumeric and underscore characters, \b to anchor the beginning or end of a word, and many more. For a complete description of this topic, see: http://msdn.microsoft.com/en-us/library/1400241x(v=vs.84).aspx Please note that you must always escape the following special characters with a backslash in order to search for them: \*+?^$.[]{}()|

- Show lines that include a dot followed by a <TAB>:
Code:
< theFile.txt FindRepl "\.\t"
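
If the text to search for comes from data rather than from a hand-written regexp, the special characters listed above can be escaped mechanically. This is a small sketch in plain JavaScript; escapeRegExp is a hypothetical helper that is not part of FindRepl.bat.

```javascript
// Sketch: put a backslash before each special regexp character.
// escapeRegExp is a hypothetical helper name used only for illustration.
function escapeRegExp(text) {
   // "$&" in the replacement stands for the matched character itself.
   return text.replace(/[\\*+?^$.\[\]{}()|]/g, "\\$&");
}
```

For example, escapeRegExp("a.b*c") gives a\.b\*c, which matches only the literal text a.b*c.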


All text parameters are usually strings enclosed in quotes, or a single word with no quotes if it does not contain special Batch characters. Note that you cannot include a quote in a regexp, so in this case you may use another character and specify it in the /Q switch. Although you may use the \x22 hexadecimal value to insert a quote in a regexp, the /Q method is much clearer, and \x22 cannot be used in the sReplace text because it is not a regexp. The colons in all switches are mandatory.

- Show lines that include this text: He said: "Here I am" and gone
Code:
< theFile.txt FindRepl "He said: 'Here I am' and gone" /Q:'


If the first character of any text parameter is an equal-sign, then the text represents the name of a Batch variable whose contents will be processed; if you need a literal equal-sign as the first character of a string, just escape it with a backslash. The /S switch allows you to process a given text instead of the Stdin file.

- Check if "C:\Certain\Path" already exists in system PATH variable (http://stackoverflow.com/questions/17086292/how-to-insert-a-new-path-into-system-path-variable-if-it-is-not-already-there):
Code:
FindRepl "C:\Certain\Path" /S:=PATH > NUL
if %errorlevel% gtr 0 echo The given path exists in system PATH
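
The way a text parameter is preprocessed (equal-sign prefix and /Q character) can be sketched as follows. This is plain JavaScript under some assumptions: loadParam is a hypothetical name, env is an ordinary object standing in for the process environment, and the quote character is replaced literally, whereas the program itself builds a regexp from it.

```javascript
// Sketch of parameter preprocessing: "=name" loads a variable,
// otherwise the /Q character (if given) is turned into a quotation mark.
// loadParam and the env object are hypothetical, for illustration only.
function loadParam(param, env, quote) {
   if (param.charAt(0) == "=") {
      return env[param.slice(1)];          // contents of the Batch variable
   }
   if (quote !== undefined) {
      return param.split(quote).join('"'); // literal replacement of /Q char
   }
   return param;
}
```

As in the program, the /Q substitution is applied only to text given directly, not to text loaded from a variable.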


When blocks of lines are defined, their resulting lines may be searched again using a different regexp in a nested second-level search via the /B:rBlock switch. In this case, if the /V switch is also given, then it is applied just to the first rSearch.

- Search for the value of the "Name" field inside the "Category X" tag of an XML file, assuming that all values are placed on separate lines:
Code:
< theFile.xml FindRepl "<Category X>" /E:"</Category X>" /B:"Name"


A regular expression may be comprised of several subexpressions enclosed in parentheses. Usually a search with such a regexp returns matching lines, but the /$ switch may be used to extract individual submatched subexpressions via numbers that specify the desired ones (a zero will show the whole matched expression); each subexpression is shown enclosed in quotes or in the character given in the /Q switch. For a complete description of this point, see: http://msdn.microsoft.com/en-us/library/kstkz771(v=vs.84).aspx

- Search for the line next to "BEGIN DSJOB" and extract the value enclosed in quotes after "Identifier" (http://www.dostips.com/forum/viewtopic.php?f=3&t=4679):
Code:
< theFile.txt FindRepl "BEGIN DSJOB" /O:+1:+1 /B:"Identifier '([^']*)'" /Q:' /$:1


The next example uses the "(\w+)" regexp to match words separated by special characters and the "(\w+)@(\w+)\.(\w+)" regexp to match email addresses comprised of a user name, an at sign, a server name, a dot, and a domain name.
Code:
C:\> < theFile.txt FindRepl /N
1:Line with an email address: joedoe@unknown.org
2:Please send mail to george@contoso.com and someone@example.com. Thanks!
3:Line number 3 with no email address

C:\> echo Number of lines: %errorlevel%
Number of lines: 3

C:\> set email=(\w+)@(\w+)\.(\w+)

C:\> < theFile.txt FindRepl /N =email
1:Line with an email address: joedoe@unknown.org
2:Please send mail to george@contoso.com and someone@example.com. Thanks!

C:\> echo Lines with email addresses: %errorlevel%
Lines with email addresses: 2

C:\> rem Extract email addresses:

C:\> < theFile.txt FindRepl /N =email /$:0
1: "joedoe@unknown.org"
2: "george@contoso.com" "someone@example.com"

C:\> echo Number of email addresses: %errorlevel%
Number of email addresses: 3

C:\> rem Separate email parts in domain, server, and user order:

C:\> < theFile.txt FindRepl /N =email /$:3:2:1
1: "org" "unknown" "joedoe"
2: "com" "contoso" "george" "com" "example" "someone"

C:\> rem Separate words:

C:\> < theFile.txt FindRepl "(\w+)" /$:1
 "Line" "with" "an" "email" "address" "joedoe" "unknown" "org"
 "Please" "send" "mail" "to" "george" "contoso" "com" "and" "someone" "example" "com" "Thanks"
 "Line" "number" "3" "with" "no" "email" "address"

C:\> echo Number of words: %errorlevel%
Number of words: 27
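
The /$ extraction shown in the session above boils down to a loop over successive matches. This is a minimal stand-alone sketch in plain JavaScript; extractSubmatches is a hypothetical name, and the guard against empty matches is my addition, not something the program does.

```javascript
// Sketch of the /$:s1... extraction: for every match in a line,
// collect the requested submatches (0 = the whole match).
// extractSubmatches is a hypothetical helper name used for illustration.
function extractSubmatches(line, pattern, indexes) {
   var regex = new RegExp(pattern, "g"), found = [], m;
   while ((m = regex.exec(line)) !== null) {
      for (var i = 0; i < indexes.length; i++) {
         found.push(m[indexes[i]]);
      }
      // Added guard: avoid looping forever on a pattern that matches "".
      if (m[0] === "") regex.lastIndex++;
   }
   return found;
}
```

Called with the second line of the sample file, the email regexp and the indexes 3, 2, 1, it returns com, contoso, george, com, example, someone, the same order shown for /$:3:2:1 above.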

________________________________________________________________________________

If the Replace part is given, then FindRepl.bat replaces the matched Search text with the Replace part. Note that "the matched Search text" is the same text previously described in the Search part above, including blocks and the /B:rBlock regexp. However, you cannot use a direct range of line numbers to define a block for replacement, because in this case the Search part is mandatory. The only switches that don't work in a Replace operation are /N and /$, but you may use a $ match variable in the Replace text in order to retrieve the saved submatched substrings as described at this site: http://msdn.microsoft.com/en-us/library/t0kbytzc(v=vs.84).aspx

- Eliminate all "XYZ" words:
Code:
< theFile.txt FindRepl.bat "\bXYZ\b" ""

- Replace all "ACSD" strings with "XYZ" (http://www.dostips.com/forum/viewtopic.php?f=3&t=3282):
Code:
< theFile.txt FindRepl "ACSD" "XYZ"

- Search for a specific word and replace the next 4 characters (http://stackoverflow.com/questions/17085650/find-a-string-and-replace-specific-letters-in-batch-file):
Code:
< theFile.txt FindRepl "word ...." "word NEWC"


In the Replace text you may insert the usual escaped characters: \t (TAB), \n (LF), \r (CR), etc., but you must use a double dollar sign ($$) in order to insert a literal $ character. Also, because of Batch-JScript interface problems, you cannot insert a quote in the replace text even if it is stored in a Batch variable, but you may use the /Q switch to solve this problem.

- Replace each run of 1 to 8 spaces with a <TAB>:
Code:
< theFile FindRepl " {1,8}" "\t"

- Change the data value of a certain tag in an XML file (http://stackoverflow.com/questions/17054275/changing-tag-data-in-an-xml-file-using-windows-batch-file):
Code:
< theFile.xml FindRepl "(<TagName>).*(</TagName>)" "$1NewValue$2"
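
The $1, $2 and $$ notation in the Replace text is the one used by the underlying replace method. This short sketch in plain JavaScript shows both cases outside of FindRepl.bat; the sample strings and variable names are invented for illustration.

```javascript
// $1 and $2 reinsert the saved submatches around the new value.
var tagLine = "<TagName>OldValue</TagName>";
var newTag  = tagLine.replace(/(<TagName>).*(<\/TagName>)/g, "$1NewValue$2");

// "$$" inserts one literal dollar sign; here it is followed by "$1".
var price = "cost: 10".replace(/(\d+)/, "$$$1");
```

Here newTag holds <TagName>NewValue</TagName> and price holds cost: $10.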


If the /B:rBlock text is "\r\n" (<CR><LF> or end of line delimiter) and the sReplace string is "" (empty), then output lines are shown joined together in a long string without line separators. When the entire file is processed for Replace (no /V nor /N switches and no blocks are given), then the end of line delimiting characters ("\r\n" = <CR><LF>) may be searched and replaced in any way you wish. For example:

- Change <LF> Unix line delimiters to <CR><LF> Windows ones:
Code:
< unixFile.txt FindRepl "\n" "\r\n" > windowsFile.txt

- Change <CR><LF> Windows line delimiters to <LF> Linux ones:
Code:
< windowsFile.txt FindRepl "\r\n" "\n" > unixFile.txt
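
One detail to keep in mind: the first conversion above inserts a <CR> before every <LF>, so a file that already contains some <CR><LF> pairs would end up with doubled <CR> characters. A minimal sketch of a conversion that tolerates mixed delimiters, in plain JavaScript with hypothetical function names:

```javascript
// Sketch: delimiter conversions that tolerate files with mixed line ends.
// toWindows and toUnix are hypothetical names used only for illustration.
function toWindows(text) {
   // "\r?\n" matches both <LF> and <CR><LF>, so no <CR> is ever doubled.
   return text.replace(/\r?\n/g, "\r\n");
}
function toUnix(text) {
   return text.replace(/\r\n/g, "\n");
}
```

The equivalent FindRepl call should be a Search text of "\r?\n" with a Replace text of "\r\n", although I have only checked the regexp itself, not that command line.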


If the entire file is processed for Replace this way, the value returned in ERRORLEVEL is zero.

If the entire file is processed for Search (no /V nor /N nor blocks) and the /$ switch is given, then the end of line delimiters may be searched, but just the results of /$ switch will be displayed.

Finally, support for fast replacement of multiple strings in just one pass of the file has been added. This feature is achieved via the /A switch and a series of alternative values separated by pipe character and placed in both Search and Replace text in the form of an "alternation", as described at this site: http://msdn.microsoft.com/en-us/library/kstkz771(v=vs.84).aspx

- Replace certain names (three of them) by three different ones (http://www.dostips.com/forum/viewtopic.php?f=3&t=3848):
Code:
< theFile.txt FindRepl "Bob Jones|Mary|Tom Riley" /A "Fred Thomas|Jane|Doug Smith"


- Translate day-of-the-week names from Spanish to English:
Code:
set "spanish=lunes|martes|miércoles|jueves|viernes|sábado|domingo"
set "english=Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday"
< theFile.txt FindRepl =spanish /A =english

Please note that this multi-string replacement is always case sensitive; it is up to you to ensure that each alternative in the rSearch alternation does not match more than one result.

The next example loads a series of values from a replacements file with "old:new" format, and processes a data file to replace all the strings. The multi-string replacement selection method is based on direct access to an array element, so its speed is not affected by the number of elements. The only limit is the 8KB total size of the Batch variables that store the sets of replacements.
Code:
@echo off
setlocal EnableDelayedExpansion
set search=
set replace=
for /F "tokens=1,2 delims=:" %%a in (replacements.txt) do (
   set "search=!search!|%%a"
   set "replace=!replace!|%%b"
)
set "search=!search:~1!"
set "replace=!replace:~1!"
< theFile FindRepl =search /A =replace
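
The /A mechanism can be sketched as a lookup table indexed by the matched text. The following plain JavaScript version is a simplified illustration; replaceAlternatives is a hypothetical name, and the fallback that leaves unknown matches unchanged is my addition (the program returns the table entry directly).

```javascript
// Sketch of /A: split "Se|ar|ch" and "Re|pla|ce" into parallel lists,
// build a table keyed by the search text, and look up each match.
// replaceAlternatives is a hypothetical helper name used for illustration.
function replaceAlternatives(text, searchAlt, replaceAlt) {
   var from = searchAlt.split("|"), to = replaceAlt.split("|"), table = {};
   for (var i = 0; i < from.length; i++) {
      table[from[i]] = to[i];
   }
   return text.replace(new RegExp(searchAlt, "g"), function (matched) {
      // Added fallback: keep the text if the match is not a table key.
      return Object.prototype.hasOwnProperty.call(table, matched) ? table[matched] : matched;
   });
}
```

Because the replacement is found by the exact matched text, each alternative has to be literal text for the lookup to succeed, which is also why the replacement is case sensitive.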

You may also use an alternation to just Search a file for a long list of words.
________________________________________________________________________________

FindRepl.bat has no error detection of any kind; if you provide wrong parameters, the CScript run-time support will issue an error message. Although error detection code could be easily added, it would increase the size of the (already large) JScript program.

I encourage you to review the sites given in the previous links about the Windows Script Host regular expression capabilities. Mastery of this topic is the key to fully exploiting the capabilities of this and other similar utilities.

You are invited to use FindRepl.bat and report any problem or bug you may find. Be aware that the processing of very large files depends on the available memory, so if the file is huge, the program may take a long time to complete. I tested FindRepl.bat with a 10.5 MB file on my very old and limited Windows XP computer and got the right results after a couple of minutes.


Antonio


Last edited by Aacini on 07 Jul 2013 07:08, edited 9 times in total.



26 Jun 2013 21:00
Profile

Joined: 23 Dec 2011 13:59
Posts: 1760
Post Re: New regex utility to search and replace strings in files
How about an option to have your search strings in a file like the /G option of FINDSTR?

On a side note: I have been meaning for years to figure out a better way to split a file based on the search strings in a FILE without having to use a FOR loop or use two separate FINDSTR commands.

Input
Inputfile.txt
Searchstrings.txt

Output
Match.txt (lines that match the search strings text file)
NoMatch.txt (doesn't match the search strings.)

Years ago I used a for loop to read the file in and then used FINDSTR and then checked the errorlevel. But this was extremely slow and clunky.

Always wondered if there was a better or faster way to do it.


26 Jun 2013 21:20
Profile

Joined: 10 Feb 2012 02:20
Posts: 4027
Post Re: New regex utility to search and replace strings in files
That's impressive Aacini - it looks like a very powerful tool - and I only got 2/3 of the way through the great documentation before I ran out of steam. I'll come back to finish later.

Just two typos so far - this is missing findrepl.bat

< theFile.xml "<Category X>" /E:"</Category X>" /B:"Name"

and I think where you write arroba in English it is commonly known as an 'at sign'

One more point - the http links need to have spaces at the start and end, I think, otherwise they are plain text and not links.

Cheers


26 Jun 2013 21:59
Profile
Expert

Joined: 06 Dec 2011 22:15
Posts: 710
Location: México City, México
Post Re: New regex utility to search and replace strings in files
Squashman wrote:
How about an option to have your search strings in a file like the /G option of FINDSTR?
@Squashman,

I didn't include this option because it would add even more code and resources (like FileSystemObject) to the already large program. A JScript program is compiled every time it is used, and FindRepl is intended to be used frequently with small files (the /G switch is not one of the most used options). Besides, loading the search strings from a file may be done easily in Batch. If a /G option were included and the strings were used on several files, the strings file would be loaded again for each file (unless the program were modified to also process files by itself via wildcards). If the strings file is processed separately, you may use the resulting strings variable in several FindRepl executions.

For example, your program could be written this way:
Code:
@echo off
setlocal EnableDelayedExpansion
rem FINDSTRINGS.BAT inputFile stringsFile
set search=
for /F "delims=" %%a in (%2) do set "search=!search!|%%a"
set "search=!search:~1!"
< %1 call FindRepl =search > Match.txt
< %1 call FindRepl =search /V > NoMatch.txt



foxidrive wrote:
One more point - the http links need to have spaces at the start and end, I think, otherwise they are plain text and not links.
@foxidrive,

Thanks a lot for reporting the typos foxi, I fixed them...

In this forum the number of links in each post is limited to just 2, so in order to include more we need to disguise them in a way that the site engine doesn't detect. The trick (which I borrowed from someone else, I don't remember who) is to enclose the double slashes between italic tags. Of course, you need to copy the link and paste it in the browser's address bar in order to use it...

Antonio


27 Jun 2013 08:52

Joined: 23 Dec 2011 13:59
Posts: 1760
Post Re: New regex utility to search and replace strings in files
Aacini wrote:
For example, your program could be written this way:
Code:
< %1 call FindRepl =search > Match.txt
< %1 call FindRepl =search /V > NoMatch.txt

Antonio

Trying to avoid running it twice. I work with very large files. I guess I will have to get back into programming again.


27 Jun 2013 09:17

Joined: 27 Mar 2013 01:29
Posts: 240
Location: Bozen
Post Re: New regex utility to search and replace strings in files
I miss a "/g:file" option in sed every day (or every other day).
I get by with unpleasant constructions such as
Code:
sed -r "s#(.*)#/\1/d#" fileB | sed -f - fileA


It would be nice to have something better :)
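
As far as I can read the one-liner above, the first sed turns every line of fileB into a "/pattern/d" command and the second sed runs that generated script over fileA, deleting each line that matches any of the patterns. A minimal sketch of those two steps in plain JavaScript follows; the function names are made up for illustration, and the patterns are compiled as JavaScript regular expressions, which are close to sed's extended syntax but not identical:

```javascript
// Step 1: what the first sed produces, one "/pattern/d" command
// per line of the patterns file (fileB).
function toDeleteScript(patternLines) {
   var script = [];
   for (var i = 0; i < patternLines.length; i++) {
      script.push("/" + patternLines[i] + "/d");
   }
   return script;
}

// Step 2: what the second sed does with that script: keep only the
// lines of the data file (fileA) that match none of the patterns.
function deleteMatching(dataLines, patternLines) {
   var kept = [];
   for (var i = 0; i < dataLines.length; i++) {
      var matched = false;
      for (var j = 0; j < patternLines.length && !matched; j++) {
         matched = new RegExp(patternLines[j]).test(dataLines[i]);
      }
      if (!matched) kept.push(dataLines[i]);
   }
   return kept;
}
```

Note that a line in fileB containing a slash would end the sed address early, so the generated script is only safe for patterns without "/".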


27 Jun 2013 12:11

Joined: 25 Apr 2012 23:51
Posts: 57
Post Re: New regex utility to search and replace strings in files
Antonio,

thanks for providing this. Something to replace grep in batch + it has replace functions as well.

Now for reading the manual :D

thank you.


27 Jun 2013 19:02

Joined: 10 Feb 2012 02:20
Posts: 4027
Post Re: New regex utility to search and replace strings in files
Aacini wrote:
foxidrive wrote:
One more point - the http links need to have spaces at the start and end, I think, otherwise they are plain text and not links.
@foxidrive,

In this forum the number of links in each post is limited to just 2, so in order to include more we need to disguise them in a way that the site engine doesn't detect them. The trick (that I borrowed from someone else, I don't remember who) is to enclose the double slashes between italic tags. Of course, you need to copy the link and paste it into the browser's address bar in order to use it...

Antonio


I hadn't realised there was a limit - I see. ta.


27 Jun 2013 21:06
Expert

Joined: 06 Dec 2011 22:15
Posts: 710
Location: México City, México
Post Re: New regex utility to search and replace strings in files
@Squashman,

I wrote a small program that fulfills your specific needs:
Code:
@if (@CodeSection == @Batch) @then

@echo off
if "%~2" neq "" if "%~1" neq "/?" goto begin
   echo Load strings from a file and search them in another file
   echo/
   echo FINDSTRINGS [/I] [/N1] [/N2] dataFile stringsFile
   echo/
   echo   /I        Specifies that the search is not to be case-sensitive.
   echo   /N1 /N2   Prints line numbers before lines that match / do not match.
   echo/
   echo Matching lines are printed in Stdout and non-matching lines in Stderr.
   goto :EOF
:begin
   CScript //nologo //E:JScript "%~F0" %*
   exit /B %errorlevel%

@end

// JScript section

var options = WScript.Arguments.Named,
    args    = WScript.Arguments.Unnamed,
    ignoreCase  = options.Exists("I")?"i":"",
    showNumber1 = options.Exists("N1"),
    showNumber2 = options.Exists("N2");

if ( args.Length < 2 ) WScript.Quit(1);

var fso = new ActiveXObject("Scripting.FileSystemObject"),
    file = fso.OpenTextFile(args.Item(1), 1),  // stringsFile, ForReading
    // Escape regex metacharacters so each string is searched literally
    special = /([\[\]*+?^$.{}()|\/\\])/g,
    search = file.ReadLine().replace(special,"\\$1");
while ( ! file.AtEndOfStream ) {
   search += "|" + file.ReadLine().replace(special,"\\$1");
}
file.Close();
search = new RegExp(search,"g"+ignoreCase);
file = fso.OpenTextFile(args.Item(0), 1);  // dataFile, ForReading
while ( ! file.AtEndOfStream ) {
   var line = file.ReadLine();
   if ( line.search(search) >= 0 ) {
      if ( showNumber1 ) WScript.Stdout.Write((file.Line-1)+":");
      WScript.Stdout.WriteLine(line);
   } else {
      if ( showNumber2 ) WScript.Stderr.Write((file.Line-1)+":");
      WScript.Stderr.WriteLine(line);
   }
}
file.Close();
WScript.Quit(0);

For example:
Code:
FINDSTRINGS Inputfile.txt Searchstrings.txt > Match.txt 2> NoMatch.txt
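
The core of the program above is the step that turns the lines of the strings file into one regular expression: every string is escaped so it is taken literally, and the results are joined with "|". A minimal standalone sketch of just that step, in plain JavaScript without any WSH objects so it can be tried outside CScript as well; escapeRegExp and buildSearch are names invented here, and skipping empty lines is an assumption of this sketch, not something the program above does:

```javascript
// Put a backslash before every character that is special in a regex.
// The brackets are written as \[ and \] so the class does not depend
// on how a particular engine reads a leading "]".
function escapeRegExp(text) {
   return text.replace(/([\[\]*+?^$.{}()|\/\\])/g, "\\$1");
}

// Join the escaped strings into a single alternation.
// Empty strings are skipped, because an empty alternative matches every line.
// Returns null when there is nothing to search for.
function buildSearch(strings, ignoreCase) {
   var parts = [];
   for (var i = 0; i < strings.length; i++) {
      if (strings[i] !== "") parts.push(escapeRegExp(strings[i]));
   }
   if (parts.length === 0) return null;
   return new RegExp(parts.join("|"), ignoreCase ? "i" : "");
}

// Usage: "a.b" only matches a literal dot, not any character.
var demo = buildSearch(["a.b", "1+1", "[x]"], false);
var hit  = demo.test("value a.b here");   // true
var miss = demo.test("value aXb here");   // false
```

The regex is built without the "g" flag on purpose: test() on a global regex remembers its last position between calls, which is easy to trip over when the same object is reused for every line.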


Enjoy! :mrgreen:

Antonio


28 Jun 2013 09:11
Expert

Joined: 12 Feb 2011 21:02
Posts: 1179
Location: United States (east coast)
Post Re: New regex utility to search and replace strings in files
Very nice work Antonio. You have implemented many features on my wish list for a find utility, and added others I hadn't even thought of.

I'm not convinced that combining a search utility with a replace utility is the best option. I'm guessing that it might be easier to provide two specialized utilities that are cumulatively more powerful than a single combined utility. I'm thinking easier both from a coding standpoint, and from a documentation and usability standpoint. But that really is a guess - I haven't studied your code or done any design work.

One feature I miss in your utility is the ability to read one line at a time. It is not uncommon to have a chain of pipes, and your use of ReadAll() serializes that step in the chain. The next pipe can't begin until your process is complete. ReadAll() is also less than ideal for really large files. I wonder what it would take to restructure things to read and process one line at a time whenever possible.
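
To make the difference concrete, here is a rough sketch of a line-at-a-time loop. It is not taken from FindRepl or REPL.BAT; filterLines is a made-up name, and the function only assumes that the input object has AtEndOfStream and ReadLine() and that the output object has WriteLine(), which is what WScript.StdIn and WScript.StdOut provide under CScript:

```javascript
// Read, test and write one line at a time instead of calling ReadAll().
// "search" should be a regex without the "g" flag, so that test()
// keeps no state between lines.
function filterLines(input, output, search) {
   var count = 0;
   while (!input.AtEndOfStream) {
      var line = input.ReadLine();
      if (search.test(line)) {
         output.WriteLine(line);   // handed to the next pipe right away
         count++;
      }
   }
   return count;                   // number of matching lines
}

// Under CScript this could be called as:
//    filterLines(WScript.StdIn, WScript.StdOut, /Check Voltage is/);
```

Each matching line is written as soon as it is read, so memory use no longer grows with the file size. The obvious limit is anything that has to look across line ends, such as searching for "\r\n" itself, which still needs more than one line in memory.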

One feature I particularly like is your option to only print modified lines when doing a replace. I borrowed that idea and added it to my REPL.BAT utility.


Dave Benham


28 Jun 2013 17:37

Joined: 25 Apr 2012 23:51
Posts: 57
Post Re: New regex utility to search and replace strings in files
Antonio,

this is great. No need to get permission on using downloaded exe from web to use for potential virus etc.

was using grep from http://unxutils.sourceforge.net/. Very little knowledge on regex

sample on grep + repl
Code:
for %%g IN (*%%f_LN*) do (
    ..\grep -B 7 -P "Check Voltage is" "%%g" |findstr /v /c:"Check Voltage is" | ..\repl "\r|\n" "" M >> ..\test.htm
)


replace by findrepl
Code:
for %%g IN (*%%f_LN*) do (
< "%%g" ..\findrepl  "Check Voltage is" /O:-7:0 |findstr /v /c:"Check Voltage is" | ..\findrepl "\r\n" "" >> ..\test.htm
)


Not only that, I could not see any noticeable difference when running through 58 files, each less than 1 MB in size, on WIN2000 SP4.

Thanks again for this.


30 Jun 2013 23:47

Joined: 23 Dec 2011 13:59
Posts: 1760
Post Re: New regex utility to search and replace strings in files
brinda wrote:
No need to get permission on using downloaded exe from web to use for potential virus etc.

:?: :?: :?:


01 Jul 2013 05:35

Joined: 25 Apr 2012 23:51
Posts: 57
Post Re: New regex utility to search and replace strings in files
Squashman wrote:
brinda wrote:
No need to get permission on using downloaded exe from web to use for potential virus etc.

:?: :?: :?:


squashman,

at my work place, they do not allow grep.exe or anything else to be downloaded for fear of viruses etc., even though an anti-virus is there - policy. A lot of proof is needed before a downloaded program gets approved, and even then there is always the check that if anything goes wrong while using it, the fault lies with the user who downloaded it. sorry, left the word out. :)


01 Jul 2013 06:15

Joined: 23 Dec 2011 13:59
Posts: 1760
Post Re: New regex utility to search and replace strings in files
brinda wrote:
Squashman wrote:
brinda wrote:
No need to get permission on using downloaded exe from web to use for potential virus etc.

:?: :?: :?:


squashman,

at my work place, they do not allow grep.exe or anything else to be downloaded for fear of viruses etc., even though an anti-virus is there - policy. A lot of proof is needed before a downloaded program gets approved, and even then there is always the check that if anything goes wrong while using it, the fault lies with the user who downloaded it. sorry, left the word out. :)

Then your English is a bit broken. You basically said you did not need permission to download executable files from the web.


01 Jul 2013 09:56
Expert

Joined: 06 Dec 2011 22:15
Posts: 710
Location: México City, México
Post Re: New regex utility to search and replace strings in files
brinda wrote:
replace by findrepl
Code:
for %%g IN (*%%f_LN*) do (
< "%%g" ..\findrepl  "Check Voltage is" /O:-7:0 |findstr /v /c:"Check Voltage is" | ..\findrepl "\r\n" "" >> ..\test.htm
)


@brinda,

If I understood your code correctly, you first get a block of 8 lines that ends at the "Check Voltage is" line (with the first FindRepl), and then eliminate the bottom line (with findstr /v). In this case findstr is not necessary; just get the appropriate block of lines directly with FindRepl (from -7 to -1):
Code:
for %%g IN (*%%f_LN*) do (
< "%%g" ..\findrepl  "Check Voltage is" /O:-7:-1 | ..\findrepl "\r\n" "" >> ..\test.htm
)
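
For readers who want to see the offset arithmetic spelled out, here is a small sketch of how a block given as start:end offsets can be selected around each matching line. It only illustrates the idea as described in this thread and is not FindRepl's code; blockAroundMatches is a hypothetical name, and emitting overlapping blocks independently (without merging them) is an assumption of the sketch:

```javascript
// For every line that matches "search", collect the lines from
// (match + startOffset) to (match + endOffset), clipped to the file.
// Offsets are relative to the matching line: -7:0 includes the match,
// -7:-1 stops one line before it.
function blockAroundMatches(lines, search, startOffset, endOffset) {
   var result = [];
   for (var i = 0; i < lines.length; i++) {
      if (!search.test(lines[i])) continue;
      var first = Math.max(0, i + startOffset),
          last  = Math.min(lines.length - 1, i + endOffset);
      for (var j = first; j <= last; j++) result.push(lines[j]);
   }
   return result;
}

// Usage with a tiny "file": two lines before the match, match excluded.
var sample = ["a", "b", "c", "Check Voltage is 5", "d"];
var before = blockAroundMatches(sample, /Check Voltage is/, -2, -1);   // ["b", "c"]
```

With offsets -2:0 the same call would also return the matching line itself, which is the line that the extra findstr /v step was removing.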


I slightly modified the FindRepl.bat program so that it can also eliminate the end-of-line delimiters when /B:rBlock is "\r\n" and the sReplace string is "". If you use the modified version of the program (just copy it again from above), you can solve your problem this way:
Code:
for %%g IN (*%%f_LN*) do (
< "%%g" ..\findrepl  "Check Voltage is" /O:-7:-1 /B:"\r\n" "" >> ..\test.htm
)


Antonio


02 Jul 2013 04:51