JREPL.BAT v8.6 - regex text processor with support for text highlighting and alternate character sets

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: JREPL.BAT v7.1 - regex text processor now with Unicode and XRegExp support

#331 Post by aGerman » 10 Sep 2017 05:02

Glad it was helpful :D

Steffen

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v7.2 - regex text processor now with Unicode and XRegExp support

#332 Post by dbenham » 23 Sep 2017 13:11

Sorry guys. I discovered a stupid bug with the /T FILE ADO support - it was completely broken :oops:
I had tested the feature, but then I made one additional small change before release of 7.1, and I forgot to apply the change to the /T FILE feature.

The /T FILE option now properly supports ADO as was originally intended with v7.0 :)

I also improved the documentation of the new v7 features.

I've updated the prior release post to v7.2

The /X documentation describes how the \xnn escape sequence only works properly if your machine defaults to the Windows-1252 character set, or if you explicitly use ADO to read and write using the Windows-1252 character set.

I am working on a new version 7.4 that should enable \xnn to support any single byte character set. Hopefully it will not be long.


Dave Benham

dbenham

Re: JREPL.BAT v7.3 - regex text processor now with Unicode and XRegExp support

#333 Post by dbenham » 23 Sep 2017 16:03

Aaaargh. :evil:
Another $#!!* bug: the /O - option failed if the input was read with ADO.

Fixed and updated to version 7.3

dbenham

Re: JREPL.BAT v7.4 - regex text processor now with Unicode and XRegExp support

#334 Post by dbenham » 25 Sep 2017 15:45

Here is version 7.4 with new behavior for the /X \xnn extended ASCII escape sequence.
JREPL7.4.zip
Downloaded 112 times from the main release page in 13 days while v7.4 was the current release
(23.59 KiB) Downloaded 715 times

Prior to 7.4, \xnn was always treated as a Windows-1252 byte code. This worked great for most people in Western Europe and North and South America. But there are many others for whom this behavior is worthless.

Starting with v7.4, the /X \xnn sequence uses the correct local character set whenever possible. This is accomplished by creating a 256-byte binary file that contains all possible byte codes, and then letting JREPL read the file. Assuming the file is interpreted as a single byte character set, each byte is converted into a specific Unicode code point according to the rules of the character set. JREPL can then read the character at a particular offset to determine the correct mapping.

The other v7.0 features allow the character set to be selected independently for both input and output, using ADO. Version 7.4 automatically interprets the /X \xnn using the correct character set, depending on whether the sequence is in a search string (input) or a replacement string (output).

If the character set is not a fixed, single byte character set, then /X \xnn is treated as a Unicode code point.

If the binary file cannot be created for any reason, then JREPL falls back to v7.3 behavior, where /X \xnn is always treated as a Windows-1252 byte code.
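
The offset lookup described above can be sketched in plain JavaScript. This is only an illustration, not JREPL's actual code: the names CP1252_HIGH, decodeBytes, buildByteMap, and mapByte are hypothetical, and the stand-in decoder hard-codes the Windows-1252 mapping for bytes 0x80-0x9F, where JREPL would instead read XBYTES.DAT through ADO with the chosen character set.

```javascript
// Windows-1252 code points for bytes 0x80..0x9F. The five bytes that
// Windows-1252 leaves unassigned are mapped here to the same-numbered
// code point (an assumption made for this sketch).
var CP1252_HIGH = [
  0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
  0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
  0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
  0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
];

// Stand-in for "read the bytes using a single byte character set".
function decodeBytes(bytes) {
  var out = '';
  for (var i = 0; i < bytes.length; i++) {
    var b = bytes[i];
    var code = (b >= 0x80 && b <= 0x9F) ? CP1252_HIGH[b - 0x80] : b;
    out += String.fromCharCode(code);
  }
  return out;
}

// Decode all 256 byte codes once. The character at offset nn of the
// resulting string is the mapping for \xnn.
function buildByteMap(decode) {
  var all = [];
  for (var b = 0; b < 256; b++) all.push(b);
  return decode(all);
}

var byteMap = buildByteMap(decodeBytes);

function mapByte(nn) {
  return byteMap.charAt(nn);
}
```

With this table, mapByte(0x80) is the euro sign U+20AC, whereas treating \x80 as a Unicode code point would give the control character U+0080.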

Below are the relevant documentation changes for the new v7.4 behavior. Be sure to look at the other v7.0 features, as they work well with the v7.4 changes.

Code: Select all

>jrepl /?history

    2017-09-25 v7.4: Modified /X \xnn extended ASCII escape sequence to support
                     any single byte character set.
                     Added /X \x{nn,Charset} escape sequence.
                     Added /XBYTES and /XBYTESOFF options.
                     Modified decode() to support the new /X \xnn behavior.
<...truncated>

>jrepl /?/X & jrepl /?/XBYTES & jrepl /?/XBYTESOFF

      /X  - Preserves extended ASCII characters that may appear within
            command line arguments and/or variables by first writing the
            values to temporary files within the %TEMP% directory. Extended
            ASCII values are byte codes >= 128 (0x80). Extended ASCII within
            files, stdin, and stdout are preserved regardless.

            Also enables extended escape sequences for both Search strings and
            Replacement strings, with support for the following sequences:

            \\     -  Backslash
            \b     -  Backspace
            \c     -  Caret (^)
            \f     -  Formfeed
            \n     -  Newline
            \q     -  Quote (")
            \r     -  Carriage Return
            \t     -  Horizontal Tab
            \v     -  Vertical Tab
            \xnn   -  Extended ASCII byte code expressed as 2 hex digits nn.
                      If used within a Find string, then the input character
                      set is used. If within a Replacement string, then the
                      output character set is used. If the selected character
                      set is invalid or not a single byte character set, then
                      \xnn is treated as a Unicode code point.
            \x{nn,CharSet} - Same as \xnn, except explicitly uses CharSet
                      character set mapping.
            \unnnn -  Unicode code point expressed as 4 hex digits nnnn.
            \u{N}  -  Any Unicode code point where N is 1 to 6 hex digits

            JREPL automatically creates an XBYTES.DAT file containing all 256
            possible byte codes. The XBYTES.DAT file is preferentially created
            in "%ALLUSERSPROFILE%\JREPL\" if at all possible. Otherwise the
            file is created in "%TEMP%\JREPL\" instead. JREPL uses the file
            to establish the correct \xnn byte code mapping for each character
            set. Once created, successive runs reuse the same XBYTES.DAT file.
            If the file gets corrupted, then use the /XBYTES option to force
            creation of a new XBYTES.DAT file. If JREPL cannot create the file
            for any reason, then JREPL defaults to using pre v7.4 behavior
            where /X \xnn is interpreted as Windows-1252.

            Without the /X option, only standard JSCRIPT escape sequences
            \\, \b, \f, \n, \r, \t, \v, \xnn, \unnnn are available for the
            search strings. And the \xnn sequence represents a unicode
            code point, not extended ASCII.

            Extended escape sequences are supported even when the /L option
            is used. Both Search and Replace support all of the extended
            escape sequences if both the /X and /L options are combined.

            Extended escape sequences are not applied to JScript code when
            using any of the /Jxxx options. Use the decode() function if
            extended escape sequences are needed within the code.


      /XBYTES - Force creation of a new XBYTES.DAT file for use by the /X
            option when decoding \xnn sequences.


      /XBYTESOFF - Force JREPL to use pre v7.4 behavior where /X \xnn is
            always interpreted as Windows-1252.
 
>jrepl /?jscript

<...truncated>

      decode( String [,CharSet] )

               Decodes extended escape sequences within String as defined by
               the /X option, and returns the result. CharSet specifies the
               single byte character set to use for \xnn escape sequences.
               If CharSet is 'input', then the character set of the input is
               used. If CharSet is 'output', then the character set of the
               output is used. If CharSet is 'default' or undefined, then the
               default character set for the machine is used. Otherwise,
               CharSet should be a valid internet character set name understood
               by the machine. If the selected character set is invalid or not
               a single byte character set, then \xnn is treated as a Unicode
               code point.

               All backslashes within String must be escaped an extra time to
               use this function in your code.

               Examples:
                  quote literal:       decode('\\q','output')
                  extended ASCII(128): decode('\\x80','output')
                  backslash literal:   decode('\\\\','output')

               This function is only needed if you use any \q, \c, or \u{N}
               escape sequences, or \xnn escape sequence for extended ASCII.

<...truncated>
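
To make the escape rules above concrete, here is a rough single-pass decoder in modern JavaScript (Node.js rather than JScript). It is a sketch under assumptions, not the real decode(): decodeSketch and SIMPLE are hypothetical names, and instead of a CharSet name it takes an optional 256-character byteMap string whose character at offset nn is the mapping for \xnn.

```javascript
// One-character escapes listed in the /X help.
var SIMPLE = {
  '\\': '\\', b: '\b', c: '^', f: '\f', n: '\n',
  q: '"', r: '\r', t: '\t', v: '\v'
};

// byteMap: optional 256-character string giving the character for each
// byte code. Without it, \xnn is treated as a Unicode code point.
function decodeSketch(str, byteMap) {
  var pattern = /\\(?:([\\bcfnqrtv])|x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|u\{([0-9a-fA-F]{1,6})\})/g;
  return str.replace(pattern, function (match, simple, xnn, unnnn, uN) {
    if (simple !== undefined) return SIMPLE[simple];
    if (xnn !== undefined) {
      var code = parseInt(xnn, 16);
      return byteMap ? byteMap.charAt(code) : String.fromCharCode(code);
    }
    // \unnnn or \u{N}; values above 10FFFF make fromCodePoint throw.
    return String.fromCodePoint(parseInt(unnnn !== undefined ? unnnn : uN, 16));
  });
}
```

Because every sequence is handled in one left-to-right pass, an escaped backslash followed by n stays a backslash and an n, and any sequence not in the list is left untouched.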

I have no plans to introduce any new features to JREPL, so barring any bugs, this should be the last version of JREPL for quite some time.

Dave Benham

dbenham

Re: JREPL.BAT v7.6 - regex text processor now with Unicode and XRegExp support

#335 Post by dbenham » 08 Oct 2017 09:40

dbenham wrote:(Re: v7.4) I have no plans to introduce any new features to JREPL, so barring any bugs, this should be the last version of JREPL for quite some time.

Well, I guess I lied :twisted: , plus I fixed a minor bug.

Here is version 7.6 with new help options and a bug fix
JREPL7.6.zip
Downloaded 144 times from the main release page in 16 days while v7.6 was the current release.
(24.41 KiB) Downloaded 706 times

Code: Select all

>jrepl /?history

    2017-10-08 v7.6: Fixed /?Intro syntax help for /?Charset/[Query]   
    2017-10-08 v7.5: Added /?CHARSET and /?XREGEXP web page help options
                     Added /?CHARSET/[query] List character sets help option
                     Fixed ADO output.WriteLine() to use \r\n instead of \n
                     Improved documentation: /EXC, /OFF, /U, /?HELP, decode()
<truncated...>


1) Bugfix
The output.WriteLine() method used in user supplied JScript has been fixed to always use \r\n line terminators. Prior versions incorrectly used \n with ADO output.


2) New /?[?]CHARSET/[Query] help option

Code: Select all

>jrepl /?help

<truncated...>

      /?CHARSET/[Query] - List all character set names for use with ADO I/O
                  that are installed on this computer. Optionally restrict
                  the list to names that contain Query. Wildcards * and ? may
                  be used within Query. The default Query is an empty string,
                  meaning list all available character sets. The list is
                  generated via reg.exe.

                  Examples:

                     jrepl /??charset/    - Paged list of all available names
                     jrepl /?charset/utf  - List of names containing "utf"

Note that this help option requires that the user has privileges to query the registry via reg.exe.
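
The "names that contain Query" rule with * and ? wildcards could be implemented along these lines. This is a guess at the matching logic only: queryToRegExp and filterCharsets are hypothetical names, case-insensitive matching is an assumption, and the list of names is passed in rather than read from the registry via reg.exe.

```javascript
// Turn a Query such as "iso*1" into a "contains" regex:
// * matches any run of characters, ? matches exactly one character.
function queryToRegExp(query) {
  var source = query
    .replace(/[.+^${}()|[\]\\]/g, '\\$&')  // escape regex metacharacters
    .replace(/\*/g, '.*')
    .replace(/\?/g, '.');
  return new RegExp(source, 'i');          // case-insensitive: an assumption
}

// Keep only the names that contain the query; an empty query keeps all.
function filterCharsets(names, query) {
  var re = queryToRegExp(query || '');
  var kept = [];
  for (var i = 0; i < names.length; i++) {
    if (re.test(names[i])) kept.push(names[i]);
  }
  return kept;
}
```

For a sample list such as utf-8, utf-7, unicode, windows-1252, and iso-8859-1, the query utf keeps the first two names and an empty query keeps all five.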


3) New /?CHARSET web page help
Opens up a Microsoft documentation page listing code pages and their corresponding character set names.


4) New /?XREGEXP web page help
Opens up the home page for the XRegExp augmented regular expression JavaScript module.


Dave Benham

dbenham

Re: JREPL.BAT v7.7 - regex text processor now with Unicode and XRegExp support

#336 Post by dbenham » 24 Oct 2017 18:48

Here is JREPL.BAT version 7.7
JREPL7.7.zip
Downloaded 215 times in 3 weeks from the main release page while v7.7 was the latest version.
(24.6 KiB) Downloaded 709 times


Code: Select all

>jrepl /?history

    2017-10-24 v7.7: Fixed broken Microsoft documentation links
                     Allow /O "-|CharSet"
                     Fixed decode(Str[,CharSet]) bug when CharSet is undefined
<truncated...>


1) Fixed broken documentation links
Microsoft documentation links within JREPL were recently broken and had to be fixed.

2) Allow /O "-|CharSet"
The prior version forced the output character set to match the input when the /O - option was used (no |CharSet specification allowed). Version 7.7 allows the output character set to be different when using /O -.

Code: Select all

C:\test>jrepl /?/o

      /O OutFile[|CharSet]

<truncated...>

            If /F InFile is also used, then an OutFile value of "-" overwrites
            the original InFile with the output. A value of "-" preserves the
            original character set. A value of "-|" explicitly transforms the
            file into the machine default character set. A "-|CharSet" value
            explicitly transforms the file into the specified character set.
            The output is first written to a temporary file with the same path
            and name, with .new appended. Upon completion, the temp file is
            moved to replace the InFile.

<truncated...>

3) Fix decode(Str[,CharSet]) bug
Version 7.4 extended the decode() function to allow specification of the character set to be used with \x escape sequences. In the interest of remaining backward compatible, the CharSet argument was supposed to be optional, but versions 7.4 through 7.6 were bugged.

Version 7.7 truly makes the CharSet argument optional, as was originally intended.


Dave Benham

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: JREPL.BAT v7.7 - regex text processor now with Unicode and XRegExp support

#337 Post by zimxavier » 12 Nov 2017 06:41

Hi :)

1. Is there a way to avoid the buggy FINDSTR /G option by using JREPL? (I use the /R option to avoid a nasty bug even though my strings are literal, and it is very slow.) Maybe with /JMATCHQ?
And the \v parameter?

2. I would like to extract the strings that appear inside the braces after "function" in one step:

Example.txt

Code: Select all

function = { string1 string2 string3 }
function = {
    string4
}

My code is currently in two steps:
batch.bat

Code: Select all

for %%F in ("D:\folder\*.txt") do (
call JREPL "(\bfunction\s*=\s*{)([\s\S]*?)}" "$txt=$2" /jmatchq /m /x /f "%%F" >> "newfile.txt"
)
call JREPL "([A-Za-z0-9_-]+)" "$txt=$1" /jmatchq /f "newfile.txt" /o -

step1: newfile.txt

Code: Select all

 string1 string2 string3 
    string4

step2: newfile.txt

Code: Select all

string1
string2
string3 
string4

Thanks for any help!

dbenham

Re: JREPL.BAT v7.7 - regex text processor now with Unicode and XRegExp support

#338 Post by dbenham » 12 Nov 2017 12:46

1) There is no simple JREPL emulation of FINDSTR /G at the moment. But it is something that I have thought about in the past. I'm already working on a new JREPL release. Now that you have requested a solution, I think I will extend the /K and /R options to allow reading a set of search strings from a file. It shouldn't take long to whip up and release this new functionality (probably within 1 week).


2) Oh yes :!:
This begins to tap into the true power of JREPL :D

Read up on the /T (translate) option that allows you to specify multiple independent find/replace pairs. Couple that with /JMATCHQ and /JBEG to provide a little JSCRIPT logic, and the solution is relatively simple and elegant.

Code: Select all

@echo off
>newfile.txt (
  for %%F in ("D:\folder\*.txt") do (
    call jrepl "\bfunction\s*=\s*{ } [A-Za-z0-9_-]+" "$txt=!(go=true) $txt=go=false $txt=go?$0:go" /t " " /jmatchq /jbeg "var go=false" /f "%%F"
  )
)
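
For readers who want to follow the logic of that command, here is the same idea as a small JavaScript state machine. It is a sketch that assumes the three /T search terms behave like one alternation tried left to right at each position; extractFunctionArgs is a hypothetical name, and the braces are escaped here although the original leaves them bare.

```javascript
function extractFunctionArgs(text) {
  // Alternation of the three /T search terms, one capture group each.
  var re = /(\bfunction\s*=\s*\{)|(\})|([A-Za-z0-9_-]+)/g;
  var go = false;   // plays the role of /JBEG "var go=false"
  var out = [];
  var m;
  while ((m = re.exec(text)) !== null) {
    if (m[1] !== undefined) {
      go = true;            // like $txt=!(go=true): open a block, print nothing
    } else if (m[2] !== undefined) {
      go = false;           // like $txt=go=false: close the block, print nothing
    } else if (go) {
      out.push(m[0]);       // like $txt=go?$0:go: print the word only inside a block
    }
  }
  return out;
}
```

Run against the Example.txt content above, this collects string1, string2, string3, and string4, and ignores words that appear outside a "function = { ... }" block.
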
I wish JSCRIPT regex supported look behind, because then the solution would be oh so simple with the /P (pre-filter) option:

Code: Select all

call jrepl "[A-Za-z0-9_-]+" "$txt=$0" /p "(?<=\bfunction\s*=\s*{)[\c}]+(?=})" /m /x /jmatchq /f "%%F"

But alas :( ... No look behinds, so the above does not work.

It is possible to solve this with the /P option, without /T, but then you must search for "function = {" twice, which I do not like:

Code: Select all

call jrepl "\bfunction\s*=\s*{|([A-Za-z0-9_-]+)" "$txt=$1?$1:false" /p "\bfunction\s*=\s*{[\c}]+}" /jmatchq /m /x /f "%%F"

2017-11-23 Update - New version 7.9 adds a /PREPL option that circumvents the lack of look behind support.

The solution is now as simple as:

Code: Select all

call jrepl "[A-Za-z0-9_-]+" "" /match /p "\bfunction\s*=\s*\{([\c}]+)}" /prepl "{$1}" /m /f "%%F"
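
The two-stage idea behind that command can be sketched in JavaScript as follows. The thread does not spell out the exact /PREPL semantics, so this assumes that "{$1}" restricts the main search to capture group 1 of each /P match; extractWithPrefilter is a hypothetical name.

```javascript
function extractWithPrefilter(text) {
  var filter = /\bfunction\s*=\s*\{([^}]+)\}/g;  // role of the /P regex
  var word = /[A-Za-z0-9_-]+/g;                  // role of the Search argument
  var out = [];
  var block;
  while ((block = filter.exec(text)) !== null) {
    // Search only the captured group, not the whole filter match,
    // so the word "function" itself is never reported.
    var found = block[1].match(word);
    if (found) out = out.concat(found);
  }
  return out;
}
```

Capturing the brace contents in a group does the job that a look behind would otherwise do: the prefix must be present, but it is excluded from the text that gets searched.
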
Dave Benham

dbenham

Re: JREPL.BAT v7.8 - regex text processor now with Unicode and XRegExp support

#339 Post by dbenham » 13 Nov 2017 16:49

Here is JREPL.BAT version 7.8
JREPL7.8.zip
Downloaded 112 times in 10 days from the main release page while v7.8 was the current release.
(26.23 KiB) Downloaded 688 times

Code: Select all

Prompt>jrepl /?history

    2017-11-13 v7.8: Added \x{nn-mm} and \x{nn-mm,CharSet} escape sequences
                     Split /X into /XFILE and /XSEQ - /X implies both
                     Add :FILE syntax for /K and /R to load searches from file
                     Fixed /XSEQ escaped backslash bug with /INC, /EXC, AND /P
<truncated...>

1) Add :FILE syntax for /K and /R to load searches from file

This feature satisfies a request from zimxavier to emulate the FINDSTR /G option, but without the nasty FINDSTR bugs.

Code: Select all

Prompt>jrepl /?/k & jrepl /?/r

      /K PreContext:PostContext[:FILE]
      /K Context[:FILE]

            Keep matches - Search and write out lines that contain at least
            one match, without doing any replacement. The Replace argument is
            still required, but is ignored.

            The integers PreContext and PostContext specify how many non-
            matching lines to write before the match, and after the match,
            respectively. If a single Context integer is given, then the same
            number of non-matching lines are written before and after.
            A Context of 0 writes only matching lines.

            If :FILE is appended to the context, then the Search parameter
            specifies a file containing one or more search terms, one term
            per line. A line matches if any of the search terms are found
            within the line. The file can be opened via ADO if |CharSet
            (internet character set name) is appended to the file name.
            Note: the /V option does not apply to Search if /K :FILE is used.

            /K is incompatible with /A, /J, /JQ, /JMATCH, /JMATCHQ, /M,
            /MATCH, /R, /S, and /T.

      /R PreContext:PostContext[:FILE]
      /R Context[:FILE]

            Reject matches - Search and write out lines that do not contain
            any matches, without doing any replacement. The Replace argument
            is still required, but is ignored.

            The integers PreContext and PostContext specify how many matching
            lines to write before the non-match, and after the non-match,
            respectively. If a single Context integer is given, then the same
            number of matching lines are written before and after.
            A Context of 0 writes only non-matching lines.

            If :FILE is appended to the context, then the Search parameter
            specifies a file containing one or more search terms, one term
            per line. A line is rejected if any of the search terms are found
            within the line. The file can be opened via ADO if |CharSet
            (internet character set name) is appended to the file name.
            Note: the /V option does not apply to Search if /R :FILE is used.

            /R is incompatible with /A, /J, /JQ, /JMATCH, /JMATCHQ, /K, /M,
            /MATCH, /S, and /T.

Example - List all lines from input.txt that match at least one string found in file search.txt.

You might try the following with FINDSTR, but it could give the wrong results due to a FINDSTR /G bug that can lead to missed matches:

Code: Select all

findstr /L /G:search.txt input.txt

You can get the correct result using JREPL as follows:

Code: Select all

call jrepl search.txt "" /L /K 0:file /F input.txt
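
In outline, a literal /K 0:FILE or /R 0:FILE run amounts to the following filter. This is a simplified sketch with hypothetical names (containsAny, keepLines, rejectLines): it takes arrays instead of files, ignores the context counts, and skips empty search terms, which is an assumption rather than documented behavior.

```javascript
// terms: the search strings, one per line of the search file.
function containsAny(line, terms) {
  for (var i = 0; i < terms.length; i++) {
    // Skipping empty terms is an assumption made for this sketch.
    if (terms[i] !== '' && line.indexOf(terms[i]) !== -1) return true;
  }
  return false;
}

// Like /L /K 0:FILE - keep lines that contain at least one term.
function keepLines(lines, terms) {
  return lines.filter(function (line) { return containsAny(line, terms); });
}

// Like /L /R 0:FILE - keep lines that contain none of the terms.
function rejectLines(lines, terms) {
  return lines.filter(function (line) { return !containsAny(line, terms); });
}
```

Every line is tested against every term until one matches, so the work grows with the number of lines multiplied by the number of search terms.
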
2) Added \x{nn-mm} and \x{nn-mm,CharSet} escape sequences
3) Split /X into /XFILE and /XSEQ - /X implies both

Code: Select all

D:\test>jrepl /?/x & jrepl /?/xfile & jrepl /?/xseq

      /X  - Shorthand for combined /XFILE and /XSEQ.

      /XFILE - Preserves extended ASCII characters that may appear within
            command line arguments and/or variables by first writing the
            values to temporary files within the %TEMP% directory. Extended
            ASCII values are byte codes >= 128 (0x80). This option is ignored
            (no temporary files written) if /UTF is also used.

            Temporary files may be needed when the cmd.exe active code page
            does not match the default character set used by the CSCRIPT
            JSCRIPT engine.

      /XSEQ - Enables extended escape sequences for both Search strings and
            Replacement strings, with support for the following sequences:

            \\     -  Backslash
            \b     -  Backspace
            \c     -  Caret (^)
            \f     -  Formfeed
            \n     -  Newline
            \q     -  Quote (")
            \r     -  Carriage Return
            \t     -  Horizontal Tab
            \v     -  Vertical Tab
            \xnn   -  Extended ASCII byte code expressed as 2 hex digits nn.
                      The code is mapped to the correct Unicode code point,
                      depending on the chosen character set. If used within
                      a Find string, then the input character set is used. If
                      within a Replacement string, then the output character
                      set is used. If the selected character set is invalid or
                      not a single byte character set, then \xnn is treated as
                      a Unicode code point. Note that extended ASCII character
                      class ranges like [\xnn-\xnn] should not be used because
                      the intended range likely does not map to a contiguous
                      set of Unicode code points - use [\x{nn-mm}] instead.
            \x{nn-mm} - A range of extended ASCII byte codes for use within
                      a regular expression character class expression. The
                      min value nn and max value mm are expressed as hex
                      digits. The range is automatically expanded into the
                      full set of mapped Unicode code points. The character
                      set mapping rules are the same as for \xnn.
            \x{nn,CharSet} - Same as \xnn, except explicitly uses CharSet
                      character set mapping.
            \x{nn-mm,CharSet} - Same as \x{nn-mm}, except explicitly uses
                      CharSet character set mapping.
            \unnnn -  Unicode code point expressed as 4 hex digits nnnn.
            \u{N}  -  Any Unicode code point where N is 1 to 6 hex digits

            JREPL automatically creates an XBYTES.DAT file containing all 256
            possible byte codes. The XBYTES.DAT file is preferentially created
            in "%ALLUSERSPROFILE%\JREPL\" if at all possible. Otherwise the
            file is created in "%TEMP%\JREPL\" instead. JREPL uses the file
            to establish the correct \xnn byte code mapping for each character
            set. Once created, successive runs reuse the same XBYTES.DAT file.
            If the file gets corrupted, then use the /XBYTES option to force
            creation of a new XBYTES.DAT file. If JREPL cannot create the file
            for any reason, then JREPL defaults to using pre v7.4 behavior
            where /XSEQ \xnn is interpreted as Windows-1252.

            Without the /XSEQ option, only standard JSCRIPT escape sequences
            \\, \b, \f, \n, \r, \t, \v, \xnn, \unnnn are available for the
            search strings. And the \xnn sequence represents a unicode
            code point, not extended ASCII.

            Extended escape sequences are supported even when the /L option
            is used. Both Search and Replace support all of the extended
            escape sequences if both the /XSEQ and /L options are combined.

            Extended escape sequences are not applied to JScript code when
            using any of the /Jxxx options. Use the decode() function if
            extended escape sequences are needed within the code.
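
The reason \x{nn-mm} has to be expanded byte by byte can be seen in a short JavaScript sketch. The names SAMPLE_MAP, toUnicodeEscape, and expandByteRange are hypothetical, and the map is a hand-written fragment of Windows-1252 covering only bytes 0x80-0x84; JREPL derives its mapping from XBYTES.DAT instead.

```javascript
// Partial Windows-1252 mapping, bytes 0x80..0x84 only (illustration).
var SAMPLE_MAP = {
  0x80: 0x20AC, 0x81: 0x0081, 0x82: 0x201A, 0x83: 0x0192, 0x84: 0x201E
};

// Format a code point below 0x10000 as a \uXXXX regex escape.
function toUnicodeEscape(codePoint) {
  var hex = codePoint.toString(16).toUpperCase();
  return '\\u' + ('0000' + hex).slice(-4);
}

// Expand the byte range nn..mm into the list of mapped code points,
// ready to be placed inside a character class.
function expandByteRange(nn, mm, map) {
  var out = '';
  for (var b = nn; b <= mm; b++) out += toUnicodeEscape(map[b]);
  return out;
}

var charClass = new RegExp('[' + expandByteRange(0x80, 0x84, SAMPLE_MAP) + ']');
```

Here the mapped endpoints are U+20AC and U+201E, which are not even in ascending order, so a class built as a range between the two endpoints could not stand in for the five individual characters.
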
4) Fixed /XSEQ escaped backslash bug with /INC, /EXC, AND /P

Prior to v7.8, the /INC, /EXC, and /P options could give the wrong result if the regular expression contained a backslash literal and the /X option was used.
For example, the following command:

Code: Select all

jrepl "some search" "some replace" /x /inc "/\\n/" /f input.txt

is supposed to include lines that contain the following literal string "\n".

But the prior bugged versions would mistakenly treat the resultant "\n" as an escape sequence, and would attempt to include lines that contain a newline instead.

Version 7.8 fixes the bug and gives the correct behavior.
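
The symptom can be reproduced in a few lines of JavaScript. This only illustrates the effect described above and is not the actual JREPL code path: naiveUnescape is a hypothetical helper that collapses an escaped backslash before the text reaches the regex engine.

```javascript
// Naive escape pass: collapses \\ to \ and turns \n into a newline.
function naiveUnescape(s) {
  return s.replace(/\\([\\n])/g, function (match, c) {
    return c === 'n' ? '\n' : '\\';
  });
}

var typed = '\\\\n';   // the three characters \\n, as typed in /inc "/\\n/"

// Intended: the regex source \\n matches a backslash followed by n.
var intended = new RegExp(typed);

// Bugged: after the naive pass the source is \n, which the regex
// engine reads as a newline escape.
var bugged = new RegExp(naiveUnescape(typed));
```

The intended regex matches the two-character text backslash-n and not a line break, while the bugged one does the opposite.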


Dave Benham

zimxavier

Re: JREPL.BAT v7.8 - regex text processor now with Unicode and XRegExp support

#340 Post by zimxavier » 14 Nov 2017 09:15

dbenham wrote:1) There is no simple JREPL emulation of FINDSTR /G at the moment. But it is something that I have thought about in the past. I'm already working on a new JREPL release. Now that you have requested a solution, I think I will extend the /K and /R options to allow reading a set of search strings from a file. It shouldn't take long to whip up and release this new functionality (probably within 1 week).

Read up on the /T (translate) option that allows you to specify multiple independent find/replace pairs. Couple that with /JMATCHQ and /JBEG to provide a little JSCRIPT logic, and the solution is relatively simple and elegant.

Code: Select all

@echo off
>newfile.txt (
  for %%F in ("D:\folder\*.txt") do (
    call jrepl "\bfunction\s*=\s*{ } [A-Za-z0-9_-]+" "$txt=!(go=true) $txt=go=false $txt=go?$0:go" /t " " /jmatchq /jbeg "var go=false" /f "%%F"
  )
)

Dave Benham

Wow! It works fine :D I need to study the relatively simple solution though :roll:

-----------------------------------------------------------------------------------------------------

About JREPL.BAT version 7.8... Wow x2
No issue with the result, it works as expected. I currently have a huge performance loss though.

Code: Select all

call jrepl search.txt "" /L /K 0:file /F input.txt > output.txt

is a lot slower than

Code: Select all

findstr /b /L /g:search.txt input.txt > output.txt

(with or without the /B option)

Time - findstr: 52 files in 40 seconds
Time - jrepl: Not even 3 files in 5 minutes (I cancelled)
I hope I am not missing anything. Maybe it is related to the size of my search.txt (67 000 lines)?

Can I emulate the /B option? And /X?

dbenham

Re: JREPL.BAT v7.8 - regex text processor now with Unicode and XRegExp support

#341 Post by dbenham » 14 Nov 2017 09:40

zimxavier wrote: About JREPL.BAT version 7.8... Wow x2
No issue with the result, it works as expected. I currently have a huge performance loss though.

Code: Select all

call jrepl search.txt "" /L /K 0:file /F input.txt > output.txt

is a lot slower than

Code: Select all

findstr /b /L /g:search.txt input.txt > output.txt

(with or without the /B option)

Time - findstr: 52 files in 40 seconds
Time - jrepl: Not even 3 files in 5 minutes (I cancelled)
I hope I am not missing anything. Maybe it is related to the size of my search.txt (67 000 lines)?

Certainly 67,000 lines in your search file will take time - I'm a bit surprised that JREPL works at all with that many search terms.
Since FINDSTR /G has a bug that can lead to missed matches, the FINDSTR timing is a bit pointless. Perhaps if it gave the correct answer, it would be slower :?:

That being said, your FINDSTR command uses the /B option, which you did not use with JREPL. Looking anywhere within a line for a string is very computationally expensive compared to restricting matches to the beginning of a line.

zimxavier wrote: Can I emulate the /B option? And /X?

Have you tried to use the built-in help?

You can use the following to get a description of all available help options:

Code: Select all

Prompt>jrepl /?help

  Help is available by supplying a single argument beginning with /? or /??:

      /?        - Writes all available help to stdout.
      /??       - Same as /? except uses MORE for pagination.

      /?Topic   - Writes help about the specified topic to stdout.
                  Valid topics are:

                    INTRO   - Basic syntax and default behavior
                    OPTIONS - Brief summary of all options
                    JSCRIPT - JREPL objects available to user JScript
                    RETURN  - All possible return codes
                    VERSION - Display the version of JREPL.BAT
                    HISTORY - A summary of all releases
                    HELP    - Lists all methods of getting help

                  Example: List a summary of all available options

                     jrepl /?options

      /?WebTopic - Opens up a web page within your browser about a topic.
                  Valid web topics are:

                    REGEX   - Microsoft regular expression documentation
                    REPLACE - Microsoft Replace method documentation
                    UPDATE  - DosTips release page for JREPL.BAT
                    CHARSET - List of possible character set names for ADO I/O
                              Some character sets may not be installed
                    XREGEXP - xRegExp.com home page (extended regex docs)

      /?/Option - Writes detailed help about the specified /Option to stdout.

                  Example: Display paged help about the /T option

                     jrepl /??/t

      /?CHARSET/[Query] - List all character set names for use with ADO I/O
                  that are installed on this computer. Optionally restrict
                  the list to names that contain Query. Wildcards * and ? may
                  be used within Query. The default Query is an empty string,
                  meaning list all available character sets. The list is
                  generated via reg.exe.

                  Examples:

                     jrepl /??charset/    - Paged list of all available names
                     jrepl /?charset/utf  - List of names containing "utf"

You can get a brief summary of all options via:

Code: Select all

Prompt>jrepl /?options

  Options:  Behavior may be altered by appending one or more options.
  The option names are case insensitive, and may appear in any order
  after the Replace argument.

      /A                     - write Altered lines only
      /APP                   - Append results to the output file
      /B                     - match Beginning of line
      /C                     - Count number of source lines
      /D                     - Delimiter for /N and /OFF
      /E                     - match End of line
      /EXC BlockList         - EXClude lines from selected blocks
      /F InFile[|CharSet]    - read input from a File
      /I                     - Ignore case
      /INC BlockList         - INClude lines from selected blocks
      /J                     - JScript replace expressions
      /JBEG InitCode         - initialization JScript code
      /JBEGLN NewLineCode    - line initialization JScript code
      /JEND FinalCode        - finalization JScript code
      /JENDLN EndLineCode    - line finalization JScript code
      /JLIB FileList         - load file(s) of initialization code
      /JMATCH                - write matching JScript replacements only
      /JMATCHQ               - new Quick form of /JMATCH
      /JQ                    - new Quick form of /J
      /K Context or Pre:Post - search and Keep lines that match
      /L                     - Literal search
      /M                     - Multi-line mode
      /MATCH                 - Search and print each match, one per line
      /N MinWidth            - prefix output with liNe numbers
      /O OutFile[|CharSet]   - write Output to a file
      /OFF MinWidth          - add char OFFsets to /K, /JMATCHQ, /MATCH output
      /P Regex               - only search/replace strings that match a Regex
      /PFLAG Flags           - set the /P regex Flags to "g", "gi", "", or "i"
      /R Context or Pre:Post - search and Reject lines that match
      /RTN ReturnVar[:Line#] - Return result in a variable
      /S VarName             - Source is read from a variable
      /T DelimChar or FILE   - Translate multiple search/replace pairs
      /TFLAG Flags           - Specify XRegExp flags for use with /T
      /U                     - Unix line terminators (\n instead of \r\n)
      /UTF                   - All input and output as UTF-16LE (BOM optional)
      /V                     - use Variables for Search/Replace and code
      /X                     - enable eXtended ASCII and escape sequences
      /XBYTES                - force creation of new XBYTES.DAT
      /XBYTESOFF             - force all \xnn to be treated as Windows-1252
      /XREG FileList         - adds XRegExp support to JREPL
You can also get detailed information about a specific option. For example, to get details on /B:

Code: Select all

Prompt> jrepl /?/b

      /B  - The Search must match the Beginning of a line.
            Mostly used with literal searches.
So yes, JREPL has /B and /E options that serve the same purpose as with FINDSTR. You can emulate the FINDSTR /X option by using both /B and /E.

If you add /B to your JREPL /K 0:FILE search, then performance may be significantly better.
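
As a rough illustration of combining /B and /E, the two commands below should list the same lines. This is only a sketch: settings.properties is a made-up file name, and it assumes that /K 0 (keep matching lines, no context) accepts an empty Replace argument.

```batch
rem Hypothetical file name. Both commands look for lines that consist of exactly PORT=8003.
findstr /x /l /c:"PORT=8003" settings.properties

rem /L = literal search, /B + /E = match must span the whole line, /K 0 = keep matches only
jrepl "PORT=8003" "" /l /b /e /k 0 /f settings.properties
```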


Dave Benham

dbenham

Re: JREPL.BAT v7.9 - regex text processor now with Unicode and XRegExp support

#342 Post by dbenham » 23 Nov 2017 04:14

Here is JREPL.BAT v7.9
JREPL7.9.zip
Downloaded 1045 times from the main JREPL release page over 15 weeks while v7.9 was the current release.
(26.28 KiB) Downloaded 751 times

Summary of changes:

Code: Select all

prompt>jrepl /??history

    2017-11-23 v7.9: Allow escape sequences with /T "" coupled with /XSEQ
                     Added /PREPL option to augment /P behavior
                     Bug fix - Force /L when /T "" used, as per documentation
                     Bug fix - Allow /?charset/search to include non alpha
<truncated...>
1) Allow escape sequences with /T "" coupled with /XSEQ

Code: Select all

prompt>jrepl /?/t

      /T DelimiterChar
      /T FILE

            The /T option is very similar to the Oracle Translate() function,
            or the unix tr command, or the sed y command.

            The Search represents a set of search expressions, and Replace
            is a like sized set of replacement expressions. Expressions are
            delimited by DelimiterChar (a single character). If DelimiterChar
            is an empty string, then each character is treated as its own
            expression. The /L option is implicitly set if DelimiterChar is
            empty. Normally escape sequences are interpreted after the search
            and replace strings are split into expressions. But if the
            DelimiterChar is empty and /XSEQ is used, then escape sequences
            are interpreted prior to the split at every character.
 
 <truncated...>
 
Example - Use /T "" and /X coupled with \x{nn-mm} escape sequences (introduced in v7.8) to perform ROT13 obfuscation (a rotation cipher):

Code: Select all

prompt>echo Goodbye Cruel World! | jrepl "\x{41-5a}\x{61-7a}" "\x{4e-5a}\x{41-4d}\x{6e-7a}\x{61-6d}" /t "" /x
Tbbqolr Pehry Jbeyq!
The escape sequences are interpreted and expanded into the following strings before the /T "" option splits the strings:

Code: Select all

find=ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
repl=NOPQRSTUVWXYZABCDEFGHIJKLMnopqrstuvwxyzabcdefghijklm
After the split, we get the expected find/replace pairs: A -> N, B -> O, C -> P, etc.
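
Because ROT13 is its own inverse, feeding the obfuscated text back through the identical command should recover the original. A quick check, reusing the exact Search, Replace, and options from the example above:

```batch
rem Same arguments as the ROT13 example; only the piped input differs
echo Tbbqolr Pehry Jbeyq! | jrepl "\x{41-5a}\x{61-7a}" "\x{4e-5a}\x{41-4d}\x{6e-7a}\x{61-6d}" /t "" /x
```

If the pairs are applied as described, this should print Goodbye Cruel World! again.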

2) Added /PREPL option to augment /P behavior

Prior to v7.9, the /P option had limited use because JScript regular expressions do not support lookbehind.

The new /PREPL option circumvents that limitation by passing only a captured group within a matched /P filter.

Here is the modified documentation for /P, as well as the documentation for the new /PREPL option. Be sure to look at the two examples embedded within the docs:

Code: Select all

prompt>jrepl /?/p & jrepl /?/prepl

      /P FilterRegex

            Only Search/Replace strings that match the Pre-filter regular
            expression FilterRegex. All escape sequences defined by /XSEQ are
            available to FilterRegex, even if /XSEQ has not been set.

            FilterRegex is a global, case sensitive search by default.
            The behavior may be changed via the /PFLAG option.

            By default, /P passes the entire matched filter string to the
            main Search/Replace routine. If your FilterRegex includes captured
            groups, then you can add the /PREPL option to selectively pass one
            or more captured groups instead.

            The /P option ignores /I, but honors /M.

            The /P option may be combined with /INC and/or /EXC, in which case
            /P is applied after lines have been included and/or excluded.

            From the standpoint of the main "Search" argument, ^ matches the
            beginning of the matched filter, and $ matches the end of the
            matched filter.

            Example - Substitute X for each character within curly braces,
                      including the braces.

               echo abc{xyz}def|jrepl . X /p "{.*?}"

            result:

               abcXXXXXdef

            See /PREPL for an example showing how to preserve the enclosing
            braces.

      /PREPL FilterReplaceCode

            Specify a JScript expression FilterReplaceCode that controls
            what portion of the /P Pre-filter match is passed on to the main
            Search/Replace routine, and what portion is preserved as-is.

            The expression is mostly standard JScript, and should evaluate to
            a string value. $0 is the entire Pre-filter match, and $1 through
            $N are the captured groups. The only non-standard syntax is the
            use of curly braces to indicate what string expression gets passed
            on to the main Search/Replace. Prior to executing the /P filter,
            each brace expression within /PREPL is transformed as follows:

               {Expression}  -->  (Expression).replace(Search,Replace)

            Any JScript is allowed within /PREPL, except string literals
            should not contain $, {, or }.

            Using /P without /PREPL is the same as using /P with /PREPL "{$0}"

            /PREPL cannot be used with /K or /R.

            Note that neither /V nor /XFILE apply to /PREPL.

            Example - Substitute X for each character within curly braces,
                      excluding the braces.

               echo abc{xyz}def|jrepl . X /p "({)(.*?)(})" /prepl "$1+{$2}+$3"

            result:

               abc{XXX}def
3) Bug fix - Force /L when /T "" used, as per documentation

Using /T "" is supposed to implicitly set the /L option, but this feature was broken in some prior release.

Version 7.9 restores this behavior.

4) Bug fix - Allow /?charset/search to include non alpha

The ability to search and display installed character sets via /?charset/search was introduced in v7.5, but a bug allowed only alphabetic characters in the search.
Searching by a number, such as jrepl /?charset/1252, would result in an erroneous "Invalid /? option" error message.

Version 7.9 fixes the bug such that all characters can now be included in the search.


Dave Benham

naraen87
Posts: 17
Joined: 21 Dec 2017 06:41

Re: JREPL.BAT v7.9 - regex text processor now with Unicode and XRegExp support

#343 Post by naraen87 » 05 Jan 2018 04:36

Dave, could you please help me write a script using JREPL for the following change?

Code: Select all

URL=http://UKGSWTOWB12:
WEBSERVER=UKGSWTOWB12
WPORT=8003
PORT=8003
DEBUG=true
SS=false
UTC=false
InitServlet=/war_Servlet
Package=com.tcs.bancs
AccessVerifierRequired=true
MCAppDataBaseType=Oracle
NCSContext=NCSWeb

NoOfTabs=10
longDateFormat=false
#MasterCraftVector.SEC.AppendByValue=false
ejb.AM.Local=Y
ejb.FA.Local=Y
ejb.CR.Local=Y
ejb.CM.Local=Y
ejb.AN.Local=Y
ejb.IF.Local=Y
ejb.es.Local=Y
MCAppServer=weblogic

Environment=Training-v6.19.13.0

DisplayedPageTitle = TCS B&alpha;NCS
MasterCraftDateTime.corearch.BaseTZ=GMT

#################bancs.system.otherContextURL=http://ArchivalIntranetHost:ArchivalIntranetPort/Bancs
bancs.system.isarchival=NO
#entry to implement CA parser
WorkItemParserImplClass=com.tcs.bfsarch.workitem.BancsWorkItemParserImpl

##################bancs.system.otherContextURL=http://172.19.102.115:7001/Bancs
#This property has to be set "no" to stop archive of report when generated by batch
BatchReportArchive=yes

# These properties have to be specified for sending mails.
BaNCSAccLocked=<<correcsponding template id to be given>>
BaNCSPassReset=<<correcsponding template id to be given>>
BaNCSPassGen=<<correcsponding template id to be given>>

# Keberos Properties
# This is the separator in Active Directory.
KeyValueSeprator=:
# LDAP_PARTS should be 3 for IBM JRE and 2 for SUN JRE 
LDAP_PARTS=3
# Date format of whenChanged date from Active Directory , specify in lowercase.
# year in yyyy , month in mm , date in dd , hours in hh , minutes in mm , seconds in ss
noOfRecToExport=100

#This property is used to show/hide keyopad for entering password
VirtualKeypad_Password_Req = no
#Property to enable table header sorting
SortingRequired = yes

#Property to display confirmation box or alert box to user while doing data export
isConfirmationReq=yes

#Property to enable Non admin Users to view all generated reports
ReportViewAllUser=yes

ReflectionUtilClass=com.tcs.mastercraft.mctype.CachedReflectionUtil
#This property is used to show ammount and currency seperately in ADSL and GL windows
ExportToExcelAmtCurSep=yes
#This property is to be set 'yes' for the Product Logo to appear in the center
#Default value is yes
CenterLogo_Req = yes
###########Security Log Configuration start#############

# This property is used enable/disable (YES/NO) the HeartBeat message .
HeartBeatMsgReq=YES
#Maximum length for the security log
SecLogMaxLength=1024
#Incase the length of security log  is more than SecLogMaxLength property then Message body ( $MessageText will be truncated ).So we can user MessageAppender to mention that message is truncated and the appender will be added at the end of MessageText.
MessageAppender=...(cont...)
#This is used to seperate the multiple error messages in the MessageText.
MessageSeparator=,
#The Bellow properties used to escape some char from the MessageText.
#To enable escape char feature in MessageText. This replaces the regular exp 'MessageEscapeRegex' with 'MessageEscapeReplacement' property value(regex,replacement). EscapecChar need to be taken care.
MessageEscapeReq=YES
MessageEscapeRegex==
MessageEscapeReplacement=\\\\=

#This property is used to specify the missing properties from the security log. If any properties are missing the value set for this 'UnkownPropertyValue' will appear in the message, which indicate property is missing.
#If this feature is not needed, please delete the value part <UnkownProp>.
UnkownPropertyValue=<UnkownProp>

#We can define these two properties for what message needs to be printed for success/failure state of a event.	
ServiceState_Success=Success
ServiceState_Failure=Failure

#To specify the default severity. 
DefaultSeverity=5

#To specify the default EffectedUserID. 
DefaultEffectedUserID=

#These are the property to specify the signature id , signature name , service description and severity for each Actions.
#HeartBeat Message
SignatureID_HeartBeat=HB01
SignatureName_HeartBeat=HeartBeat
Severity_HeartBeat=0

#Login/Logoff/ChangePassword/UserAccountLocked events
SignatureName_LoginSuccess=Successful User Login
SignatureName_LoginFailure=User Login Failure
ServiceDesc_Login=LoginAction.Login
SignatureID_LoginSuccess=LL01
SignatureID_LoginFailure=LL03
Severity_Login=0

#These are the property to specify the signature id , signature name and severity for each service. User can add there own services for that in FUNCTION TABLE set diagonestic level 4 and add respective properties
#Format for SignatureName :  SignatureName_<ServiceID/FuncID>
#Format for SignatureID   :  SignatureID_<ServiceID/FuncID>_<0/4>. 0 for success and 4 for failure
#Format for Severity :  Severity_<ServiceID/FuncID>
SignatureName_226=CreateUser
SignatureID_226_0=SA0101
SignatureID_226_4=SA0102
Severity_226=0

SignatureName_229=UnlockUser
SignatureID_229_0=SA1901
SignatureID_229_4=SA1902
Severity_229=0

#Set this to as provide to use third party encryption algo apart from JCE . Eg - org.bouncycastle.jce.provider.BouncyCastleProvider is used for bouncy castle.
provider=

#Environment variable containing the key
BANCS_ENC_LOC=
#This property has to set as "YES" to make access flag condition visible in Create User Screen otherwise set as "NO"
ACCESS_FLG_REQD=NO
#The below property is used to show or hide the splash screen
#yes: Splash screen appears
#no : Splash screen does not appear
SplashScreen_Req=yes
In the above property file I have to change the value of Environment to UAT1-v6.19.15.0.
I don't know the current value of Environment.

dbenham

Re: JREPL.BAT v7.9 - regex text processor now with Unicode and XRegExp support

#344 Post by dbenham » 05 Jan 2018 07:49

@naraen87

I must say it is disappointing that you haven't demonstrated any effort at understanding regular expressions, or at studying the countless existing examples of JREPL usage. Your situation is about as simple a use case as there is.

Imagine you are the program that must change the value - how would you do it? What would you look for?

So you don't know what the current environment value is... What do you know about the line?

You should be able to quickly figure this out on your own if you expend just a little effort.

If you still cannot figure this out, but you demonstrate you have made an honest effort to solve this, then I can give you the simple answer.


Dave Benham

Squashman
Expert
Posts: 4465
Joined: 23 Dec 2011 13:59

Re: JREPL.BAT v7.9 - regex text processor now with Unicode and XRegExp support

#345 Post by Squashman » 05 Jan 2018 09:52

naraen87 wrote:
05 Jan 2018 04:36
In the above property file I have to change the value of Environment to UAT1-v6.19.15.0.
I don't know the current value of Environment.
You would not even need to use JREPL to accomplish this task. A FOR /F command with the tokens and delims options would do the trick. Just my two cents.
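
A minimal sketch of that FOR /F idea, under some assumptions: the file names are hypothetical, and the result is written to a new file rather than changed in place. A plain tokens/delims split on = treats consecutive = characters as one delimiter, and FOR /F skips empty lines, so this version reads the lines through FINDSTR /N and compares only the start of each line.

```batch
@echo off
setlocal disableDelayedExpansion

rem Hypothetical file names, plus the value requested in the question
set "inFile=app.properties"
set "outFile=app.properties.new"
set "newValue=UAT1-v6.19.15.0"

rem FINDSTR /N prefixes every line with "N:" so that FOR /F does not skip blank lines
>"%outFile%" (
  for /f "delims=" %%L in ('findstr /n "^" "%inFile%"') do (
    set "line=%%L"
    setlocal enableDelayedExpansion
    rem Strip the "N:" prefix that FINDSTR added
    set "line=!line:*:=!"
    rem Rewrite only the line that starts with Environment= and copy all others unchanged
    if "!line:~0,12!"=="Environment=" (
      echo(Environment=!newValue!
    ) else (
      echo(!line!
    )
    endlocal
  )
)
```

Delayed expansion is toggled inside the loop so that any ! characters in the file survive. After checking the new file, it could replace the original with move /y.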
