FindRepl.bat:New regex utility to search and replace strings

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

dbenham
Expert
Posts: 2378
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: FindRepl.bat:New regex utility to search and replace str

#61 Post by dbenham » 22 Nov 2014 09:17

I like the idea of an "examples/tutorial" thread - it is something I have been thinking about starting. The tricky part will be discouraging/preventing people from posting questions like "How do I do this?", when the needed info has already been posted. It can very quickly devolve into a general discussion page. Ideally, people would post new questions separate from the examples/tutorial, and if the solution demonstrates an important concept, or if it is generally useful, then it could be added to the examples/tutorial.

There is another issue with a tutorial thread. Often times there is a logical progression of ideas in a tutorial. But what to do if a new example "belongs" early in the thread - should an earlier post be updated?

The ultimate would be to have control of our own content pages, separate from the forum :mrgreen:

I don't like the idea of separating code from discussion. I am happy with the design of my main JREPL thread. The first post is the current code. Subsequent posts show the historical development and discussion of the code. Each new version gets its own post, thus maintaining code history. In the case of a simple bug fix, I list only the change in a new post, and then update both the prior version post, as well as the current version at the top.


Dave Benham

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: FindRepl.bat:New regex utility to search and replace str

#62 Post by foxidrive » 22 Nov 2014 09:59

dbenham wrote:I like the idea of an "examples/tutorial" thread - The tricky part will be discouraging/preventing people from posting questions like "How do I do this?"

The mods can move any discussion/questions to the discussion thread.

dbenham wrote:There is another issue with a tutorial thread. Often times there is a logical progression of ideas in a tutorial. But what to do if a new example "belongs" early in the thread - should an earlier post be updated?

The ultimate would be to have control of our own content pages, separate from the forum :mrgreen:

Yes, the tutorial thread would be yours to control - if you can edit your prior posts. All posts should be made by you. You can add any code made by others to the list and put it wherever it is best suited.

dbenham wrote:I don't like the idea of separating code from discussion. I am happy with the design of my main JREPL thread.

That's up to you - I'll just comment that having monster posts in the leading parts of the thread makes it tedious to find the start of the discussion, for those joining the thread.

I also thought that being directed to the most recently posted code automatically, when it was updated (the last post) would be keen. ;)

Squashman
Expert
Posts: 4164
Joined: 23 Dec 2011 13:59

Re: FindRepl.bat:New regex utility to search and replace str

#63 Post by Squashman » 22 Nov 2014 11:52

Well I have said it a few times that we should update the library with a lot of the excellent code that has been written here.

Made me think we should organize it into a pdf book.

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: FindRepl.bat:New regex utility to search and replace str

#64 Post by foxidrive » 22 Nov 2014 22:47

Squashman wrote:Well I have said it a few times that we should update the library with a lot of the excellent code that has been written here.


That would be a hefty task, and by rights each snippet would have to be retested to confirm it does the job, before adding it to the library.

Made me think we should organize it into a pdf book.

Aacini
Expert
Posts: 1648
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: FindRepl.bat:New regex utility to search and replace str

#65 Post by Aacini » 01 Dec 2014 00:23

I released the new FindRepl.bat version 2.1 with the following modifications over version 2.


PREVIOUS FEATURES CHANGED
  • Fixed a couple of small bugs that incorrectly split the alternations.
  • Clearer documentation and built-in help screen (thanks foxidrive!).
  • The previous /ARG# switches were replaced by a single /VAR switch.
  • Global variables managed by sum(i,j) and prod(i,j) functions changed into arrays.


NEW FEATURES ADDED
  • Replacements on a block of lines selected by line numbers are now possible.
  • New global variables managed by max(i,j) and min(i,j) functions.
  • New toDate predefined function creates a Date value from any string.
  • New showDate predefined function shows a Date value in several formats.
  • New dateDiff, days and dateAdd predefined functions perform calculations on Date values.
  • New fileProperty, fileDelete, fileCopy, fileMove and fileExist predefined functions for file management.
  • New "// Comment" and "var name=value" lines may be included in the /G switch file.


The new code with the modifications is in the same place as the original, in the first post of this thread, as usual. The full description of the new features is given in the FindRepl.bat Version 2 Documentation at this post. I suggest you review the description of the most important FindRepl.bat version 2.1 features: the Date type functions in section "3. FindRepl.bat predefined functions" and the whole of section "4. File management predefined functions".

The new version 2.1 addresses some points that were previously suggested or reported, both in the program code and in the documentation.


dbenham wrote:From a few posts ago:

Aacini wrote:Accordingly to regexp alternation documentation, "The largest possible expression on either side of the pipe character is matched". This means that if the input data can be matched by more than one regular expression in the alternation, the selected one will be the largest possible one; if several regexps are of the same size, they are tested to match in left-to-right order.


I believe you are misinterpreting the MS doc. (I also was puzzled at first). It is simply stating that the entire expression on either side is alternated. For example: "a b|c d" should be thought of as "(a b) or (c d)", not "a (b or c) d".

The regex engine works left to right, and short circuits once a matching alternation is found. It does not pick the largest one when multiple expressions match.

Dave Benham
Thanks for this observation Dave, I modified the documentation accordingly.
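
Both of Dave's points can be checked directly in a regex engine of the same family. A minimal sketch in JavaScript (the sample strings are made up for illustration and nothing here is FindRepl code):

```javascript
// 1) The whole expression on each side of the pipe is alternated:
//    "a b|c d" means "(a b) or (c d)", not "a (b or c) d".
var sample = "a b d";
var wholeSides = sample.match(/a b|c d/)[0];       // only "a b" is matched
var innerChoice = sample.match(/a (?:b|c) d/)[0];  // the grouped form matches "a b d"

// 2) Alternatives are tried left to right and the first one that matches wins,
//    even when a later alternative would match a longer text.
var shortFirst = "abc".match(/ab|abc/)[0];  // "ab"
var longFirst = "abc".match(/abc|ab/)[0];   // "abc"
```

So when two alternatives can match at the same position, the order in which they are written decides the result.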




on http://www.dostips.com/forum/viewtopic.php?f=3&t=4697&p=34923#p34923 foxidrive wrote:Feature request:

Antonio, would you think it's a good idea to add a feature to change a specified line number only, and using regular expressions as usual in the rSearch and sReplace ?

So findrepl would pass all other lines through unchanged, and only change the desired line number.

Command something like this:

Code: Select all

type file.txt | findrepl /LINE 3 "VALUETOCHANGE" "Given Value"


file.txt


aaa VALUETOCHANGE bbb
aaa VALUETOCHANGE bbb
aaa VALUETOCHANGE bbb
aaa VALUETOCHANGE bbb
aaa VALUETOCHANGE bbb


changed text:


aaa VALUETOCHANGE bbb
aaa VALUETOCHANGE bbb
aaa Given Value bbb
aaa VALUETOCHANGE bbb
aaa VALUETOCHANGE bbb


or can this be done some way with findrepl itself?
Just put an empty string in the rSearch parameter and use the /O and /B switches to complete the desired replacement:

Code: Select all

type file.txt | findrepl "" /O:3:3 /B:"VALUETOCHANGE" "Given Value"
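
To make the idea concrete outside of FindRepl, here is a minimal JavaScript sketch of "replace only inside a block of lines selected by line numbers". The helper name replaceInLineRange is hypothetical and this is not FindRepl's code; it only assumes CR+LF line ends, as in a Windows text file.

```javascript
// Hypothetical helper: apply search -> replacement only to lines first..last
// (1-based, inclusive) and pass all the other lines through unchanged.
function replaceInLineRange(text, first, last, search, replacement) {
  var lines = text.split("\r\n");
  for (var i = 0; i < lines.length; i++) {
    var lineNo = i + 1;
    if (lineNo >= first && lineNo <= last) {
      lines[i] = lines[i].replace(search, replacement);
    }
  }
  return lines.join("\r\n");
}

var input = [
  "aaa VALUETOCHANGE bbb",
  "aaa VALUETOCHANGE bbb",
  "aaa VALUETOCHANGE bbb",
  "aaa VALUETOCHANGE bbb",
  "aaa VALUETOCHANGE bbb"
].join("\r\n");

// Same selection as /O:3:3 in the command above: the block is just line 3.
var output = replaceInLineRange(input, 3, 3, /VALUETOCHANGE/g, "Given Value");
```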





bars143 wrote:i had another problem in my new script here:

inputfile.txt wrote:there are two doublecolon : symbols : here
there are two doublequotes marks "symbols" here
there are less and greater than < symbols > here
there are questionmark and asterisk ? symbols * here
there are backslash and slash \ symbols / here
there are pipe and exclaimation marks | symbols ! here


my script here:

Code: Select all

@echo off
echo deleting characters:

type inputfile.txt |FindRepl ":|'|<|>|\?|\*|\\|/|\||\!" /A "" /Q:'

echo replacements:

type inputfile.txt |FindRepl ":|'|<|>|\?|\*|\\|/|\||\!" /A "[DC]|[DQ]|[LS]|[GR]|[QM]|[ST]|[BS]|[SL]|[PI]|[EM]" /Q:'

pause
exit

output here:

Code: Select all

deleting characters:


there are  two doublecolon  symbols  here
there are  two doublequotes marks symbols here
there are  less and greater than  symbols  here
there are  questionmark and asterisk  symbols  here
there are  backslash and slash  symbols  here
there are  pipe and exclaimation marks  symbols  here

replacements:


there are  two doublecolon [DC] symbols [DC] here
there are  two doublequotes marks [DQ]symbols[DQ] here
there are  less and greater than [LS] symbols [GR] here
there are  questionmark and asterisk  symbols  here
there are  backslash and slash [PI] symbols [SL] here
there are  pipe and exclaimation marks  symbols  here

Press any key to continue . . .


as shown above:

1) "[QM]" , "[ST]" , "[BS]" , "[EM]" are not displayed.

2) "[PI]" is misplaced and is found in "[BS]" position.

Bars
This bug was fixed. The code below is a more complete test on this point:

Code: Select all

@echo off
setlocal

set text=Duplicate characters that require escapes: "|\*+?^$.[]{}()"
echo %text%

echo/
echo Original alternation (literal strings)
set "search='|\||\\|\*|\+|\?|\^|\$|\.|\[|\]|\{|\}|\(|\)"
set "replace=''|\|\||\\\\|\*\*|\+\+|\?\?|\^\^|\$\$|\.\.|\[\[|\]\]|\{\{|\}\}|\(\(|\)\)"
echo %text% | FindRepl /Q:' =search /A =replace
call FindRepl /Q:' =search /A =replace /S:=text


echo/
echo New /J alternation (expressions)
set "search=@||\|||\\||\*||\+||\?||\^||\$||\.||\[||\]||\{||\}||\(||\)"
set "replace='@@'||'\|\|'||'\\\\'||'\*\*'||'\+\+'||'\?\?'||'\^\^'||'\$\$'||'\.\.'||'\[\['||'\]\]'||'\{\{'||'\}\}'||'\(\('||'\)\)'"
echo %text% | FindRepl /Q:@ =search /A =replace /J
call FindRepl /Q:@ =search /A =replace /J /S:=text
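
For readers wondering how a literal alternation with /A can map each matched character to its own replacement, here is a minimal JavaScript sketch of the general technique. It is not FindRepl's actual code: splitAlternation, unescapeLiteral and replaceAlternatives are hypothetical names, and the sketch assumes that every alternative is a single literal, possibly escaped with a backslash.

```javascript
// Split an alternation string on pipes that are NOT escaped by a backslash.
function splitAlternation(s) {
  var parts = [], cur = "";
  for (var i = 0; i < s.length; i++) {
    var ch = s.charAt(i);
    if (ch === "\\" && i + 1 < s.length) {
      cur += ch + s.charAt(++i);   // keep the escape pair together
    } else if (ch === "|") {
      parts.push(cur);
      cur = "";
    } else {
      cur += ch;
    }
  }
  parts.push(cur);
  return parts;
}

// Remove the regex escapes so the lookup key equals the text that is matched.
function unescapeLiteral(s) {
  return s.replace(/\\(.)/g, "$1");
}

function replaceAlternatives(text, search, replace) {
  var searchA = splitAlternation(search),
      replaceA = splitAlternation(replace),
      map = {};
  for (var i = 0; i < searchA.length; i++) {
    map[unescapeLiteral(searchA[i])] = (replaceA[i] === undefined ? "" : replaceA[i]);
  }
  return text.replace(new RegExp(search, "g"), function (matched) {
    return Object.prototype.hasOwnProperty.call(map, matched) ? map[matched] : matched;
  });
}

// Question mark, asterisk, pipe and backslash, each with its own tag.
var result = replaceAlternatives("a?b*c|d\\e", "\\?|\\*|\\||\\\\", "[QM]|[ST]|[PI]|[BS]");
```

The two details that matter are splitting only on unescaped pipes and looking up the matched text (for example "?") rather than the pattern that matched it ("\?").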





There are two important sets of new features in FindRepl.bat version 2.1: the Date management functions and the File management functions; I suggest you review them in the Version 2 documentation.

Below is the output of FComp.bat /M "FindRepl version 2.1.bat" "FindRepl version 2.bat", which contains just the modifications of the new version compared with the previous one. This listing may be used to roll back from FindRepl.bat version 2.1 to the previous version 2 via the Update-Rollback.bat program, which you may download from: http://www.dostips.com/forum/viewtopic.php?f=3&t=3968&p=38700#p38700

V2.1_rollbackto_V2.txt:

Code: Select all

18 -9 +6
   19:  rem   - Nov/20/2014: Version 2 - New switches: /J, /L, /G, /ARG#, and || separate regexp's in /Alternations.
   20:   
   21:  if "%~1" equ "" goto showUsage
   22:  if "%~1" equ "/?" goto showUsage
   23:   
   24:  CScript //nologo //E:JScript "%~F0" %*
5 -3 +3
   30:  FINDREPL [/I] [/V] [/N] rSearch [/E:rEndBlk] [/O:s:e] [/B:rBlock] [/$:1:2...]
   31:           [[/R] [/A] sReplace] [/Q:c] [/S:sSource]
   32:           [/J[:n] [/L:jEnd]] [/G:file] [/ARG1:text1 [/ARG2:text2] ...]
4 -77 +45
   37:    rSearch    Text to be searched for, or Start text for a block of lines.
   38:    /E:rEndBlk Text to be searched for the End of a block of lines.
   39:    /O:s:e     Specifies offsets for Starting and Ending lines of blocks.
   40:    /B:rBlock  Text to be searched again in the blocks of lines.
   41:    /$:1:2...  Specifies to print saved submatched substrings instead of lines.
   42:    /R         Prints only replaced lines, instead of all file lines.
   43:    /A         Specifies that sReplace has alternative values matching rSearch.
   44:    sReplace   Text that will replace the matched text.
   45:    /Q:c       Specifies a character that is used in place of quotation marks.
   46:    /S:sSource Text to be processed instead of Stdin file.
   47:    /J[:n]     Specify that rSearch/sReplace contain expressions (regexp/JScript)
   48:    /L:jEnd    Execute jEnd as a JScript expression after the end of file.
   49:    /G:file    Specifies the file to get rSearch and sReplace texts from.
   50:    /ARG1:...  Specifies texts to initialize JScript variables with same names.
   51:   
   52:  All search texts must be given in VBScript regular expression format.
   53:  The replacement string may use $ to retrieve saved submatched substrings. If /J
   54:  switch is given, the JScript expression in sReplace text use submatched $0-$n.
   55:  Use /A switch to insert alternatives separated by pipe in rSearch/sReplace;
   56:  if /J switch is given, the alternatives are regexp's separated by double-pipe.
   57:  Use /Q:c switch to insert a quote in the search/replacement texts.
   58:  If first character of any text is an equal-sign, specify a Batch variable.
   59:   
   60:  There are three ways to define Blocks of lines using rSearch text as base:
   61:   
   62:  /O:s:e                           /E:rEndBlk              /E:rEndBlk /O:s:e
   63:  -------------------------------------------------------------------------------
   64:  Add S and E (with optional       From rSearch line       Add S to rSearch line
   65:  signs) to matching lines.        to rEndBlk line.        and E to rEndBlk line.
   66:   
   67:  ... and one way more if rSearch is not given:   /O:s:e   From line S to line E.
   68:  If S or E is negative, specify a backwards line from the end of file.
   69:  If E is not given, then it defaults to the last line of the file (same as -1).
   70:   
   71:  The output vary depending on the given parameters and switches this way:
   72:   
   73:  rSearch        /V                 Block            /B:rBlock        /$:1:2...
   74:  -------------------------------------------------------------------------------
   75:  Matched        Non-matched        Blocks of        Search /B:       Saved
   76:  lines.         lines.             lines.           in blocks        submatches.
   77:   
   78:  sReplace       /R                 Block            /B:rBlock
   79:  -------------------------------------------------------------------------------
   80:  All file       Only replaced      Search /B:rBlock in blocks
   81:  lines.         file lines.        and replaces matched lines.
3 -1 +18
   85:     1- FindRepl.bat documentation:
   86:           Detailed description of FindRepl features with multiple examples.
   87:     2- FindRepl.bat version 2 documentation:
   88:           Additional descriptions on /J, /L, /G switches and || alternatives.
   89:     3- Regular Expressions documentation:
   90:           Describe the features that may be used in rSearch, rEndBlk and rBlock.
   91:     4- Alternation and Subexpressions documentation:
   92:           Describe how use | to separate values in rSearch with /A switch
   93:           and features of (subexpressions) for /$ switch and $0..$n in sReplace.
   94:     5- JScript expressions documentation (/J switch):
   95:           Describe the operators that may be used in JScript expressions.
   96:     6- Data types and functions for JScript expressions
   97:           Describe additional data types and functions in JScript:
   98:           - String Object: functions to manipulate strings.
   99:           - Math Object: arithmetic functions.
  100:           - Date Object: functions for date calculations.
  101:           See also topic 2- section 3- FindRepl.bat predefined functions.
  102:     0- End this help screen.
3 -39 +3
  106:  setlocal EnableDelayedExpansion
  107:   
  108:  < "%~F0" CScript //nologo //E:JScript "%~F0" "^<usage>" /E:"^</usage>" /O:+1:-1
7 -3 +1
  116:              "http://msdn.microsoft.com/en-us/library/htbw4ywd(v=vs.84).aspx") do (
4 -1 3 -1 +1
  124:  choice /C %choices%0 /N /M "Select one of previous help topics:"
6 -1 25 -1 +1
  156:      search       = undefined,
28 -2 +5
  185:            if ( result.substr(0,1) == "=" ) {
  186:               result = env(result.substr(1));
  187:            } else {
  188:               if ( quote != undefined ) result = result.replace(eval("/"+quote+"/g"),'"');
  189:            }
7 -1 +5
  197:  for ( var i = 1; block=options.Item("ARG"+i); i++ ) {
  198:     if ( block.substr(0,1) == "=" ) block = env(block.substr(1));
  199:     eval ( 'var ARG'+i+'="'+block+'";' );
  200:  }
  201:  var daysNow = Math.floor((new Date()).getTime()/86400000), n = 0;
3 -6 +1
  205:     for ( var i = 1; i <= JexprN; i++ ) eval( "var SUM"+i+"=0,N"+i+"=0,PROD"+i+"=1;" );
4 -1 +1
  210:  function choose(arg,i){return(arg[i]);};
6 -4 +5
  217:     var val = 0, v;
  218:     for ( var i=a; i<=b; i++ ) {
  219:        val+=v=parseFloat(arg[i]); eval("SUM"+i+"+=v;++N"+i+";");
  220:     }
  221:     n=b-a+1;
3 -14 +10
  225:     var val = 1, v;
  226:     for ( var i=a; i<=b; i++ ) {
  227:        val*=v=parseFloat(arg[i]); eval("PROD"+i+"*=v;");
  228:     }
  229:     n=b-a+1;
  230:     return(val);
  231:  }
  232:  function max(arg,a,b) {
  233:     var val=-Number.MAX_VALUE;
  234:     for ( var i=a; i<=b; i++ ) if ( arg[i]>val ) val=arg[i];
3 -105 +8
  238:     var val= Number.MAX_VALUE;
  239:     for ( var i=a; i<=b; i++ ) if ( arg[i]<val ) val=arg[i];
  240:     return(val);
  241:  }
  242:  function days(s){return(daysNow - Math.floor((new Date(s)).getTime()/86400000));}
  243:  function prompt(s){WScript.Stderr.Write('Replace "'+s+'" by: '); return(file.ReadLine());}
  244:   
  245:  if ( file = options.Item("G") ) {
3 -15 +9
  249:     var sepIn = (Jexpr?'||':'|'), sepOut = "";
  250:     while ( ! file.AtEndOfStream ) {
  251:        block = file.ReadLine().split(sepIn);
  252:        search += sepOut+block[0];
  253:        if ( alternation ) replace += sepOut+block[1];
  254:        sepOut = sepIn;
  255:     }
  256:     block = undefined;
  257:     file.Close();
9 -5 +5
  267:  if ( replace ) {  // Check this! It may cause problems when combined with /A and /J
  268:     // replace = eval('"' + replace + '"');
  269:     if ( quote != undefined ) replace = replace.replace(eval("/"+quote+"/g"),'"');
  270:  }
  271:   
9 -1 13 -5 +5
  294:           var searchA = search.split("|"),
  295:               replaceA = replace.split("|"),
  296:               repl = new Array();
  297:           for ( var i = 0; i < searchA.length; i++ ) {
  298:              repl[searchA[i]] = replaceA[i];
3 -1 +1
  302:           replaceA.length = 0;
4 -5 +6
  307:           Jexpr = false;
  308:   
  309:           var searchA = search.split("||"),              // divide search "regexp1||regexp2" in parts
  310:               repl = replace.split("||");                // the same for "replace1||replace2"
  311:           search = "";
  312:           replace = "$0,";
12 -3 28 -3 +3
  353:  if ( search != undefined ) search = new RegExp(search, "gm"+ignoreCase);
  354:  if ( block  != undefined ) block  = new RegExp(block , "gm"+ignoreCase);
  355:  file = fso.OpenTextFile("CONIN$", 1);
34 -1 +1
  390:        if ( search != undefined ) {  // Blocks based on Search lines:
28 -1 +1
  419:  // endif Process Source string instead of file
108 -3 +2
  528:     if ( lastLine = options.Item("L") ) {
  529:        if ( lastLine.substr(0,1) == '=' ) lastLine = env(lastLine.substr(1));
5


Antonio
Last edited by Aacini on 15 Dec 2014 23:38, edited 1 time in total.

Aacini
Expert
Posts: 1648
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: FindRepl.bat:New regex utility to search and replace str

#66 Post by Aacini » 01 Dec 2014 00:24

foxidrive wrote:. . .
Below I have a suggestion to help people use your tools, because there is a fair amount of chatter and questions in the main threads,
if you both think it is worthwhile then I'd suggest that you create dedicated threads called something like:


FindRepl: Examples and tutorials
FindRepl: Code
FindRepl: Discussion


. . .



dbenham wrote:I like the idea of an "examples/tutorial" thread - it is something I have been thinking about starting. The tricky part will be discouraging/preventing people from posting questions like "How do I do this?", when the needed info has already been posted. It can very quickly devolve into a general discussion page. Ideally, people would post new questions separate from the examples/tutorial, and if the solution demonstrates an important concept, or if it has generally useful utility, then it could be added to the examples/tutorial.

There is another issue with a tutorial thread. Often times there is a logical progression of ideas in a tutorial. But what to do if a new example "belongs" early in the thread - should an earlier post be updated?

. . .



Before I posted version 2 of FindRepl.bat, I tried to devise a method that allows efficient posting of new versions and, at the same time, discussion of ideas among several users. The method I am using now is based on the following points:

  • The program's built-in help screen should be short, just a quick reference guide, because complex applications cannot be properly described in text screens displayed in the command-line window. This also helps keep the source code from growing too large.
  • Extensive documentation of the program may be posted after the code, so it can use much better formatting options and two direct links per post. Several indirect links-as-text may also be added (e.g. to related questions or examples posted elsewhere), but they have to be copied to the address bar of the browser. An index may be included at the beginning of the documentation, so the reader can use the browser's "Find in this page" feature to jump quickly to any desired section just by searching for its number followed by a dot; this lessens the lack of more direct links. [SITE OWNER]: Perhaps it would be a good idea to define a new type of link that points to other targets in the same thread or post, like the "Table of Contents" links to ==Subdivision headers== in Wikipedia.
  • The first post of the thread should contain the latest version of the code followed by the latest version of the documentation, and nothing more. This keeps all reference information about the application in just one place, with no discussion posts interrupting it. All discussion is kept in later posts.
  • When a new version is released, there is no need to preserve the full code of all previous versions. Frequent readers of very active threads surely have all program versions, and casual visitors who just want to use the application don't care about previous versions. However, if preserving all versions is important for any reason, it is only necessary to preserve the changes between the last version and the previous one. My next step is to modify my old FComp.bat program to present this information in a way that allows a previous version to be obtained by applying these changes to the last posted version in a simple edit session. A further to-do item is to write a program that performs this modification automatically. This not only saves space, but also avoids interrupting an interesting discussion with large sections of code.
  • When a new version is released, the announcement post should describe the new features in comparison with the previous ones, and include some examples of the new points. The new features must also be included in the latest documentation, but described as part of the application, not as something new. If the new features do not alter any previous one, the announcement post may simply indicate that they are fully described in section "#. What's-its-name" of the documentation.

I posted the new FindRepl.bat version 2.1 above using this method.
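
The "keep only the changes and apply them to the last version" idea can be sketched in a few lines of JavaScript. The hunk format below is hypothetical, chosen only for illustration; it is NOT the format of the FComp.bat listing, and rollBack is not the Update-Rollback.bat program.

```javascript
// Hypothetical hunk format: "at 1-based line `at` of the new version, delete
// `remove` lines and put the `insert` lines of the old version in their place".
function rollBack(newLines, hunks) {
  var result = newLines.slice();
  // Apply the hunks from the bottom up so the earlier line numbers stay valid.
  var ordered = hunks.slice().sort(function (a, b) { return b.at - a.at; });
  for (var i = 0; i < ordered.length; i++) {
    var h = ordered[i];
    Array.prototype.splice.apply(result, [h.at - 1, h.remove].concat(h.insert));
  }
  return result;
}

var v21 = ["line A", "line B (new)", "line C", "line D (added in 2.1)"];
var toV2 = [
  { at: 2, remove: 1, insert: ["line B (old)"] },  // restore the old line 2
  { at: 4, remove: 1, insert: [] }                 // drop a line that 2.1 added
];
var v2 = rollBack(v21, toV2);
```

Only the small list of hunks has to be stored with each release; the full text of the previous version is rebuilt on demand.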

Antonio

foxidrive
Expert
Posts: 6033
Joined: 10 Feb 2012 02:20

Re: FindRepl.bat:New regex utility to search and replace str

#67 Post by foxidrive » 01 Dec 2014 00:50

Fantastic Antonio, that's another Xmas present for us! :)

Aacini wrote:
  • When a new version is released, there is no need to preserve the full code of all previous versions.


You're right there - and maybe a different solution is possible that will let people get a previous version in full and with little effort by you or them?
If you have some web hosting that is available: keep a ZIP or RAR file with all the versions of findrepl in them and just update that file every time findrepl changes.
The size of the ZIP file should be quite reasonable and a link can be placed in the first post.

Just a different suggestion

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: FindRepl.bat:New regex utility to search and replace str

#68 Post by bars143 » 01 Dec 2014 06:40

Aacini, great job! It seems you are working hard to find new ways to help us! :)

Well, here is your new script using your latest FindRepl:

Code: Select all

echo %text% |FindRepl  =search /A =replace 


input.txt

Code: Select all


there are  two doublecolon : symbols : here
there are  two doublequotes marks "symbols" here
there are  less and greater than < symbols > here
there are  questionmark and asterisk ? symbols * here
there are  backslash and slash \ symbols / here
there are  pipe and exclamation marks | symbols ! here
there are  dollar and ampersand $ symbols ^ here
there are  plus and dot + symbols . here


my bat file:

Code: Select all

@echo off
setlocal

echo/
echo Original alternation (literal strings)
set "search='|\||\\|\*|\+|\?|\^|\$|\.|\[|\]|\{|\}|\(|\)|\<|\>|:|\/|\!"
set "replace=[DQ]|[PI]|[BS]|[ST]|[PL]|[QM]|[AM]|[DR]|[DT]|[OSB]|[CSB]|[OCB]|[CCB]|[OP]|[CP]|[LS]|[GT]|[DC]|[SL]|[EM]"

type input.txt | FindRepl /Q:' =search /A =replace

echo.&echo.&pause


output in console:

Code: Select all


Original alternation (literal strings)

there are  two doublecolon [DC] symbols [DC] here
there are  two doublequotes marks [DQ]symbols[DQ] here
there are  less and greater than [LS] symbols [GT] here
there are  questionmark and asterisk [QM] symbols [ST] here
there are  backslash and slash [BS] symbols [SL] here
there are  pipe and exclamation marks [PI] symbols [EM] here
there are  dollar and ampersand [DR] symbols [AM] here
there are  plus and dot [PL] symbols [DT] here

Press any key to continue . . .


almost easy to edit and expand for more chars!
i will try more chars soon.

Bars

brinda
Posts: 78
Joined: 25 Apr 2012 23:51

Re: FindRepl.bat:New regex utility to search and replace str

#69 Post by brinda » 01 Dec 2014 07:03

antonio,

thank you

Aacini
Expert
Posts: 1648
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: FindRepl.bat:New regex utility to search and replace str

#70 Post by Aacini » 15 Dec 2014 23:51

The new FindRepl.bat version 2.2 was released.

FindRepl.bat provides a wide range of uses because of the many ways in which its parameters can be combined, although the large number of combinations may lead to some misunderstandings. For example, the ability to replace text in a block of lines selected by line numbers had in fact been in FindRepl.bat since the first version, but it was not until I tried to add this capability that I realized that the way to use it was by inserting a null search text (""), so I just removed a test on this point in order to make the feature operational! Thus, this feature is neither a "new" one nor a bug fix, so only an update to the documentation was required.

Something similar happens with the new /J switch in FindRepl version 2. Including this switch changes the way that other parameters are processed (as JScript expressions instead of text), but exactly which parameters should be affected by the /J switch? The obvious one was the sReplace text, but FindRepl has more parameters. The three "search" parameters (rSearch, rEndBlk and rBlock) are regular expressions, so they are not affected by the /J switch, although including /J removes the previous restriction of the /A switch that rSearch could contain only literal strings.

The last possible parameter is /S:sSource, which provides the text to be processed instead of the Stdin file, and in this case the conclusion is obvious: when the /J switch is included, the sSource text must be evaluated as a JScript expression that provides the actual input text to process. Although this is a very small change in the code (just an additional "eval" call is needed), the possibilities derived from it are substantial; this "not-new" feature once again widens the range of problems that FindRepl.bat can solve using the same concepts and methods as before.
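
The "one additional eval" idea can be illustrated with a minimal JavaScript sketch. The function resolveSource and its jexpr flag are hypothetical names used only for this illustration; this is not the code in FindRepl.bat.

```javascript
// Hypothetical sketch: with /J the source text is an expression whose value
// becomes the input; without /J the source text is the input itself.
function resolveSource(sSource, jexpr) {
  return jexpr ? String(eval(sSource)) : sSource;
}

// Without /J: the text is processed as given.
var literal = resolveSource("['one','two'].join('\\r\\n')", false);

// With /J: the expression runs first and its result is the text to process.
var evaluated = resolveSource("['one','two'].join('\\r\\n')", true);
```

Since eval runs whatever it is given, an expression used this way should come only from a source you trust.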

When the /J switch was added, several predefined functions were included in FindRepl version 2.1 to be used in the sReplace JScript expression, so this time we need to include some "data generating" functions designed to be used in the /S:sSource JScript expression. These additional predefined functions should belong to the same FindRepl version 2.1, not to a new version; however, the amount of added code is somewhat large, so I named it Version 2.2, although strictly speaking there is no "new" feature in it.

The FindRepl.bat listing in the first post of this thread now includes the additional code, as usual, and the Version 2 documentation was modified in order to present the new predefined functions in a more organized way. I made a few minor changes to the names of some original functions so that they fit better in the new organization. I suggest you review the "3. FindRepl.bat predefined functions" section, especially point "3.4. Data generating functions". Below are a few examples of these functions.

- Example of the input of a prefilled form via prompt function:

Code: Select all

C:\> echo NAME's favorite color is COLOR and favorite pets are PETS. | FindRepl "NAME|COLOR|PETS" "prompt('Enter your '+$0+': ')" /J
Enter your NAME: Antonio
Enter your COLOR: green
Enter your PETS: dogs
Antonio's favorite color is green and favorite pets are dogs.
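
The mechanism behind this example is a replacement that is computed by a function call for each match. A minimal JavaScript sketch follows; fillForm and cannedAsk are hypothetical names, and cannedAsk returns fixed answers in place of an interactive prompt so that the sketch runs unattended.

```javascript
// For every matched field name, call ask() and use its answer as the replacement.
function fillForm(template, fieldPattern, ask) {
  return template.replace(fieldPattern, function (field) {
    return ask("Enter your " + field + ": ");
  });
}

// Stand-in for an interactive prompt: records each question and returns a fixed answer.
var answers = { NAME: "Antonio", COLOR: "green", PETS: "dogs" };
var asked = [];
function cannedAsk(message) {
  asked.push(message);
  var field = message.replace(/^Enter your |: $/g, "");
  return answers[field];
}

var filled = fillForm(
  "NAME's favorite color is COLOR and favorite pets are PETS.",
  /NAME|COLOR|PETS/g,
  cannedAsk
);
```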

- List files that were modified more than 30 days ago and are larger than 1 MB, and allow the user to selectively delete them:

Code: Select all

dir /B | FindRepl "([^\r\n]*)\r\n" "(days(fileProperty($1,TW))>30)&&(fileProperty($1,Z)>1048576)?prompt('Delete '+$1+'? ').substr(0,1).toUpperCase()=='Y'?Delete($1)+'\r\n':'':''" /J
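
The selection test in this one-liner can be read more easily when written out. A minimal JavaScript sketch, using the same day arithmetic as the days() function visible in the rollback listing above; the file records, daysSince and isCandidate are made up for illustration and do not come from FindRepl.

```javascript
var MS_PER_DAY = 86400000;

// Whole days elapsed between a date and "now" (day numbers are subtracted).
function daysSince(date, now) {
  return Math.floor(now.getTime() / MS_PER_DAY) - Math.floor(date.getTime() / MS_PER_DAY);
}

// Same condition as the one-liner: modified more than 30 days ago AND larger than 1 MB.
function isCandidate(file, now) {
  return daysSince(file.modified, now) > 30 && file.size > 1048576;
}

var now = new Date(Date.UTC(2014, 11, 15));   // 15 Dec 2014
var files = [
  { name: "old-big.iso",   modified: new Date(Date.UTC(2014, 9, 1)),   size: 5000000 },
  { name: "old-small.txt", modified: new Date(Date.UTC(2014, 9, 1)),   size: 1000 },
  { name: "new-big.zip",   modified: new Date(Date.UTC(2014, 11, 10)), size: 5000000 }
];

var candidates = [];
for (var i = 0; i < files.length; i++) {
  if (isCandidate(files[i], now)) candidates.push(files[i].name);
}
```

Only the file that is both old and large is offered for deletion; the other two are skipped without asking.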

- Some examples taken from (http://www.dostips.com/forum/viewtopic.php?f=3&t=6081):

1) Convert the names of all .txt files in the current directory to lower case:

Code: Select all

dir /B *.txt | FindRepl "([^\r\n]*)\r\n" "fileRename($1,$1.toLowerCase())" /J

2) Rename all 76 ".jpg" files in current directory to an increasing padded number followed by a constant string. Resulting file names should look like "01_Christmas2014.jpg", "02_Christmas2014.jpg", etc.:

Code: Select all

dir /B *.jpg | FindRepl "([^\r\n]*)\r\n" "/VAR:Num=100" "fileRename($1,(++Num).toString().substr(1)+'_Christmas2014.jpg')+'\r\n'" /J
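
The zero padding comes from starting the counter at 100 and dropping the first digit of each number. A minimal JavaScript sketch of just that expression (no files are renamed here; the loop count of 3 is arbitrary):

```javascript
var num = 100;   // same starting value as /VAR:Num=100
var names = [];
for (var i = 0; i < 3; i++) {
  // 101 -> "101" -> "01", 102 -> "02", and so on.
  names.push((++num).toString().substr(1) + "_Christmas2014.jpg");
}
```

With a start value of 100 this gives two digits, which is enough for up to 99 files such as the 76 in the example; a larger set would need a larger start value, such as 1000 for three digits.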

- List all drives in the system and warn when the percentage of occupied space in a disk is greater than 85%:

Code: Select all

set "source=drivesCollection('Path','FreeSpace','TotalSize')"
set "search=#([^#]*)#,#([^#]*)#,#([^#]*)#\r\n"
FindRepl /S:=source /Q:# =search "$1+'\t'+(S=Math.floor(100-$2*100/$3))+'% Occupied'+(S>85?'\tRequires cleanup!':'')+'\r\n'" /J
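
The arithmetic inside that replacement expression, written out as a minimal JavaScript sketch. The function occupiedReport and the sample numbers are made up; only the formula and the 85% threshold come from the command above.

```javascript
// Same formula as the expression above: percentage of occupied space from
// free space and total size, with a warning above 85%.
function occupiedReport(path, freeSpace, totalSize) {
  var occupied = Math.floor(100 - freeSpace * 100 / totalSize);
  return path + "\t" + occupied + "% Occupied" +
         (occupied > 85 ? "\tRequires cleanup!" : "");
}

var c = occupiedReport("C:", 10, 100);   // 90% occupied, gets the warning
var d = occupiedReport("D:", 50, 100);   // 50% occupied, no warning
```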

- List all files in current folder with Created, Accessed and Modified Dates:

Code: Select all

FindRepl "/S:filesCollection('.','DateCreated','DateLastAccessed','DateLastModified','Name')" /J

- List the Path of all subfolders in current folder, including the Name of all files in each subfolder:

Code: Select all

FindRepl "/S:foldersCollection('.','Path','filesCollection(folder.Files,\'Name\')')" /J

- List the Path of all subfolders in current folder and all nested subfolders under them, but no more than two levels down:

Code: Select all

FindRepl "/VAR:Levels=2" "/S:foldersCollection('.','Path','Levels?(--Levels,folder.SubFolders):false','++Levels')" /J

- List the folders I have defined in My Documents:

Code: Select all

FindRepl "/S:specialFolders('MyDocuments.SubFolders','Name')" /J

- List the name of all Fonts installed in the system with .FNT extension:

Code: Select all

FindRepl "/S:specialFolders('Fonts.Files','Name')" /J ".*\.FNT"

- List the services installed in the system and their status:

Code: Select all

FindRepl "/S:wmiCollection('Win32_Service','Name','ServiceType','Started','State')" /J



As said before, it is important to understand the different ways that FindRepl's parameters may be combined in order to get a desired result. Let's review some possibilities.

If the rSearch argument is null, all input lines are just copied to the screen:

Code: Select all

C:\ type test.txt
This is just a small example of a
text file that
have three total lines.

C:\ type test.txt | FindRepl ""
This is just a small example of a
text file that
have three total lines.

You may let FindRepl read all input lines but show nothing, this way:

Code: Select all

C:\ type test.txt | FindRepl ".|\n*" "" /J

The input lines are stored in a long string called "inputLines", which may be shown via the /L switch after all lines have been "processed":

Code: Select all

C:\ type test.txt | FindRepl ".|\n*" "" /J /L:inputLines
This is just a small example of a
text file that
have three total lines.

This way, you may apply any additional JScript function or method to the input lines in the /L expression, for example:

Code: Select all

C:\ type test.txt | FindRepl ".|\n*" "" /J /L:inputLines.toUpperCase()
THIS IS JUST A SMALL EXAMPLE OF A
TEXT FILE THAT
HAVE THREE TOTAL LINES.

Do you want to sort the input lines? Just split the string into individual lines (an array), sort the array, and join the elements back into a string:

Code: Select all

C:\ type test.txt | FindRepl ".|\n*" "" /J /L:inputLines.split('\r\n').sort().join('\r\n')
This is just a small example of a
have three total lines.
text file that

Oops! JScript's default sort method is case sensitive! The JScript documentation indicates that the sort method may take a parameter that specifies "The name of a function with two arguments used to determine the order of the elements, that must return one of the following values: A negative value if the first argument passed is less than the second argument. Zero if the two arguments are equivalent. A positive value if the first argument is greater than the second argument". That is:

Code: Select all

C:\ type test.txt | FindRepl ".|\n*" "" /J "/L:inputLines.split('\r\n').sort(function(a,b){var A=a.toUpperCase(),B=b.toUpperCase();return(A<B?-1:A>B?1:0)}).join('\r\n')"
have three total lines.
text file that
This is just a small example of a

This way, you may adjust the sort comparison exactly as you need in order to solve your problem: start at a given column, left-fill a field with zeros, or any other requirement you may have, in the same way that you adjust the rest of FindRepl's parameters to get the result you want. FindRepl.bat is a tool that can be as powerful as you want it to be.
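
As a sketch of how the comparison function might be adjusted, the plain JavaScript below (runnable under Node.js, outside FindRepl) sorts the three lines of test.txt first ignoring case and then starting at a given column. The helper names `byIgnoreCase` and `fromColumn` are hypothetical:

```javascript
var lines = [
  'This is just a small example of a',
  'text file that',
  'have three total lines.'
];

// Case-insensitive comparison, same logic as the /L expression above
function byIgnoreCase(a, b) {
  var A = a.toUpperCase(), B = b.toUpperCase();
  return A < B ? -1 : A > B ? 1 : 0;
}

// Returns a comparison function that ignores the first "col" characters
function fromColumn(col) {
  return function (a, b) {
    return byIgnoreCase(a.substr(col), b.substr(col));
  };
}

// slice() copies the array so the original order is kept for the second sort
var sortedIgnoreCase = lines.slice().sort(byIgnoreCase);
var sortedFromCol5 = lines.slice().sort(fromColumn(5));

console.log(sortedIgnoreCase.join('\r\n'));
console.log(sortedFromCol5.join('\r\n'));
```

In principle, a function written this way could be placed inside the sort() call of the /L expression, in the same manner as the case-insensitive one.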

The number of different results that may be obtained by combining the parameters of FindRepl.bat Version 2.2 with its predefined functions is large, so the types of possible results cannot be easily anticipated. I surprised myself when I realized that a certain combination may lead to a new type of result! It is necessary to perform several additional tests and, after that, post examples of the new types of problems that may be solved using a certain combination of parameters in FindRepl.bat Version 2.2.

Antonio

PS - Below is the listing created by the FComp /Modifications program, which allows you to roll back from FindRepl.bat Version 2.2 to Version 2.1, as described in this post.

V2.2_rollbackto_V2.1.txt:

Code: Select all

20 -2 +1
   21:  rem                                Two new sets of predefined functions for Date and File management.
28 -1 +1
   50:    /J[:n]     Specifies that rSearch/sReplace have expressions (regexp/JScript).
10 -3 92 -12 +5
  153:           See also Topic 2- Sections 3. and 4. on predefined functions.
  154:   
  155:     7- FileSystemObject File object documentation:
  156:           Describe the properties that may be used in fileProperty
  157:           and the other four File management predefined functions.
13 -3 +1
  171:              "http://msdn.microsoft.com/en-us/library/1ft05taf(v=vs.84).aspx"
13 -1 +1
  185:  echo  - Help on topic %choice% opened...
12 +4
  198:   
  199:  // PARSE PARAMETERS
  200:   
  201:   
21 -1 27 -5 +6
  250:  // Predefined variables
  251:   
  252:  if ( block=options.Item("VAR") ) eval ( "var "+block+";" );
  253:  if ( Jexpr) {
  254:     var JexprN = 10;
  255:     if ( options.Item("J") ) JexprN = parseInt(options.Item("J"));
8 -1 +1
  264:  // Predefined functions
42 -2 37 -4 4 -16 5 -2 +1
  353:      daysNow = Math.floor((new Date()).getTime()/millisecsPerDay);
11 -4 +2
  365:  function fileExist(fileName) {
  366:     return(fso.FileExists(fileName));
2 -220 +23
  369:  function fileCopy(fileName,destination) {
  370:     file = fso.GetFile(fileName);
  371:     file.Copy(destination);
  372:     return('File "'+fileName+'" copied');
  373:  }
  374:  function fileMove(fileName,destination) {
  375:     file = fso.GetFile(fileName);
  376:     file.Move(destination);
  377:     return('File "'+fileName+'" moved');
  378:  }
  379:  function fileDelete(fileName) {
  380:     file = fso.GetFile(fileName);
  381:     file.Delete();
  382:     return('File "'+fileName+'" deleted');
  383:  }
  384:   
  385:  var A = "Attributes", TC = "DateCreated", TA = "DateLastAccessed", TW = "DateLastModified",
  386:      D = "Drive", N = "Name", P = "ParentFolder", F = "Path", SN = "ShortName",
  387:      SF = "ShortPath", Z = "Size", X = "Type";
  388:  function fileProperty(fileName,property) {
  389:     file = fso.GetFile(fileName);
  390:     return( eval("file."+property) );
  391:  }
9 -3 10 -1 2 -1 +1
  413:              if ( alternation ) replace += sepOut+block[1];
47 -1 +1
  461:              repl[eval('"'+searchA[i]+'"')] = replaceA?eval('"'+replaceA[i]+'"'):"";
56 -1 17 -1 3 -14 +6
  538:     var inputLines = new Array(), lastLine = 1;
  539:     inputLines[0] = source;
  540:     procLines = true;
  541:  } else {  // Process Stdin file
  542:   
  543:     inputLines = WScript.StdIn.ReadAll();
40 -1 +1
  584:  // endif Process Stdin file
108 -8 +7
  693:     if ( Jexpr && (lastLine=options.Item("L")) ) {
  694:        if ( lastLine.substr(0,1) == '=' ) lastLine = env(lastLine.substr(1));
  695:        if ( quote != undefined ) lastLine = lastLine.replace(eval("/"+quote+"/g"),"\\x22");
  696:        WScript.Stdout.WriteLine( eval(lastLine) );
  697:     }
  698:   
  699:  }
1

brinda
Posts: 78
Joined: 25 Apr 2012 23:51

Re: FindRepl.bat:New regex utility to search and replace str

#71 Post by brinda » 16 Dec 2014 06:47

thank you antonio

Aacini
Expert
Posts: 1648
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: FindRepl.bat:New regex utility to search and replace str

#72 Post by Aacini » 16 Dec 2014 21:16

brinda wrote:thank you antonio

You're welcome!

Did you test the FindRepl.bat program with your problem about removing unwanted characters from an OCR scanned file? You posted that request here. This is the replacements file that you may use to solve it:

Code: Select all

// OCRfix.txt: Fix the data captured via OCR scan, used with FindRepl.bat program

// These characters must be escaped with backslash: \*+?^$.[]{}()|
// * match "zero or more", + match "one or more", ? match "zero or one"

// a) for the symbols ,?!. is found, they should follow a word(no space).
//    A space should be there after ,?!. before the next word
// -> Replace "zero or more spaces" ",|?|!|." "zero or more spaces" by
//    "no spaces" ",|?|!|." "one space"
// c) Capitalization of first character after ?!.
// -> Convert next character, after NOT ",", to uppercase
 *(,|\?|!|\.) *(.)||$1+' '+(($1!=',')?$2.toUpperCase():$2)

// b) "(open quote) should have a space before and "(close quote) a space after
// -> Initialize "inQuote" variable with false
var inQuote=false;
// -> Replace "zero or more spaces" "quote" IF NOT inQuote by "space" "quote" AND set inQuote to true
 *\x22||inQuote?$0:(inQuote=true,' \x22')
// -> Replace "quote" "zero or more spaces" IF inQuote by "quote" "space" AND set inQuote to false
\x22 *||inQuote?(inQuote=false,'\x22 '):$0

// Previous lines use \x22 for "quote", because a literal quote can NOT be included;
// this way, this file does NOT depend on using the /Q switch in FindRepl execution

// d) paragraph begins is determined by 2 times ENTER and end after with 2 times enter as well.
// -> paragraph ends at "\r\n\r\n"    (Or should it be "\r\r"?)

// e) A tab is needed at the beginning of a paragraph with first letter capitalize.
// -> Replace "\r\n\r\n" "any char" by "\r\n\r\n" "tab" "any char capitalized"
\r\n\r\n(.)||'\r\n\r\n\t'+$1.toUpperCase()

// f) If there is ENTER in between this paragraph, strip the enter
// -> Remove any ENTER that passed previous rules
\r\n||''

// g) Any other tab or extra spaces should be strip.
//    Only a single space before and after a word.
// -> Replace multiple tabs/spaces by one space
(\t| )+||' '

I am very interested to know whether FindRepl is capable of creating the right output file in just one run. Perhaps the previous replacements file needs to be modified a little; for example, in the order in which the replacements are performed (the widest rules must be placed first, and the narrowest last). I'd appreciate it if you could post any news on this point.
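
To illustrate rules a) and c) outside of FindRepl, here is a minimal sketch in plain JavaScript (runnable under Node.js) that uses String.replace with a callback. The function name `fixPunctuation` is hypothetical. Note the parentheses around the conditional: in JavaScript `+` binds more tightly than `?:`, so without them the whole concatenation would become the condition.

```javascript
// Hypothetical helper: no space before ,?!. and exactly one space after it;
// the next character is capitalized unless the symbol is a comma.
function fixPunctuation(text) {
  return text.replace(/ *(,|\?|!|\.) *(.)/g, function (match, symbol, next) {
    return symbol + ' ' + (symbol != ',' ? next.toUpperCase() : next);
  });
}

console.log(fixPunctuation('this is 1 paragraph .this is 1 paragraph , this is 1'));
```

Because `(.)` needs a following character on the same line, a symbol at the very end of a line is left unchanged by this sketch.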

Antonio

brinda
Posts: 78
Joined: 25 Apr 2012 23:51

Re: FindRepl.bat:New regex utility to search and replace str

#73 Post by brinda » 16 Dec 2014 22:20

sure antonio. am always grateful for this forum and the help it gives

probably i am doing this wrong

Code: Select all

D:\Downloads>< test.txt findrepl /G:ocrfix.txt > new.txt
D:\Downloads\findrepl.bat(754, 21) Microsoft JScript runtime error: Expected ')' in regular expression


on trying individually

Code: Select all

D:\Downloads>type test.txt
this  is 1 paragraph .this  is 1 paragraph. this  is 1 paragraph .

this  is 2 paragraph     .
this    is 2 paragraph , this  is 2, paragraph .
this   is 2 paragraph .

this is 3 paragraph ,this is 3 paragraph, this is 3 paragraph

D:\Downloads>< test.txt findrepl "(\t| )+" " "
this is 1 paragraph .this is 1 paragraph. this is 1 paragraph .

this is 2 paragraph .
this is 2 paragraph , this is 2, paragraph .
this is 2 paragraph .

this is 3 paragraph ,this is 3 paragraph, this is 3 paragraph


need some help.

using the latest version on first page

brinda

nikhil3298
Posts: 1
Joined: 21 Jan 2015 11:56

Re: FindRepl.bat:New regex utility to search and replace str

#74 Post by nikhil3298 » 23 Jan 2015 00:35

Hi Aacini,

I am working on a requirement where I need to search for multiple keywords in a file.
For each match I need to print the keyword found, the line number in the file, and the line itself.
For example, for the file below:
Keyword is found here
NoKeyword is found here

the expected output is:
[Keyword]:1:Keyword is found here

Can you please help with this?
Thanks a lot... it's a great community.

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: FindRepl.bat:New regex utility to search and replace str

#75 Post by bars143 » 26 Jan 2015 01:37

Hi, Aacini or anyone

I have an input.text file that contains five sentences, including punctuation characters, with five empty lines, for an overall total of 10 lines:

Code: Select all

first line, with comma character

second line with punctuation mark !

third line with question mark ?

fourth line with two "double quote"

fifth line had dot in the end.


and I would like your help with a solution/findrepl.bat script that gives a temp-output.txt result with all those characters removed:

Code: Select all

first
line
with
comma
character
second
line
with
punctuation
mark
third
line
with
question
mark
fourth
line
with
two
double
quote
fifth
line
had
dot
in
the
end


then finally remove all duplicates to get the content of final-output.txt (I need this output for easy translation to my native language, Cebuano, a local dialect):

Code: Select all

first
line
with
comma
character
second
punctuation
mark
third
question
fourth
two
double
quote
fifth
had
dot
in
the
end


If a solution is given, I will use it for editing a DVD subtitle's .srt file.

sorry for not trying to give you a script of my own.


Bars,

windows xp sp3 32bit user
