JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

dbenham
Expert
Posts: 2261
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.1 - regex text processor with support for text highlighting and alternate character sets

#406 Post by dbenham » 19 May 2019 20:21

Steffen (aGerman) discovered how to enable the ANSI escape sequences without resorting to modifying the registry. See series of posts starting at viewtopic.php?f=3&t=9144&p=59700#p59691

This is useful for the new /H option.
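
For readers unfamiliar with ANSI escape sequences, the idea behind highlighting matched text can be sketched in plain JavaScript (close to the JScript that JREPL runs on). This is only an illustration, not JREPL's implementation: the `highlight` helper and the choice of reverse video (`ESC[7m`) are assumptions made for this sketch, and the sequences that /H actually emits may differ.

```javascript
// Illustrative only: wrap every regex match in ANSI "reverse video" codes.
// ESC[7m switches reverse video on, ESC[0m resets all attributes.
var ESC = "\x1b";

function highlight(text, regex) {
  // regex needs the global flag so that every match is wrapped
  return text.replace(regex, function (match) {
    return ESC + "[7m" + match + ESC + "[0m";
  });
}

var out = highlight("one apple, two apples", /apple/g);
console.log(JSON.stringify(out));
```

On a console where escape sequences are not enabled, these bytes are printed literally instead of changing the text attributes.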

Here is version 8.1
JREPL8.1.zip
Downloaded 130 times in 17 days from the main release page while v8.1 was the current release.
(30.18 KiB) Downloaded 79 times

Summary of Changes

Code: Select all

C:\>jrepl /?history

    2019-05-19 v8.1: Add /VT to enable Virtual Terminal ANSI escape sequences.
                     Code courtesy of DosTips user aGerman (Steffen).
    ...  
New /VT option to enable ANSI escape sequences

Code: Select all

C:\>jrepl /?/vt

     /VT - Enables Virtual Terminal processing of ANSI escape sequences for
           the current JREPL.bat process within the Windows 10 console. This
           option is not needed if the registry has HKEY_CURRENT_USER\Console
           "VirtualTerminalLevel" DWORD set to 1 in the registry.
I also updated the documentation for /H

Code: Select all

    /H  - Highlight all replaced or matched text in the output using the ...
            ...
            Native support for ANSI escape sequences requires Windows 10 or
            higher. ANSI escape sequences only work on the Windows 10 console
            if the "Use legacy console" option is off in console properties.

            In addition, one of the following must be used:
             - The /VT option can be used to enable the escape sequences for
               a single JREPL run.
            or
             - The registry can have the following DWORD defined, which will
               enable escape sequences for all console applications.
                  [HKEY_CURRENT_USER\Console]
                  "VirtualTerminalLevel"=dword:00000001



Dave Benham

kaddet
Posts: 1
Joined: 20 May 2019 16:05

Re: JREPL.BAT v8.1 - regex text processor with support for text highlighting and alternate character sets

#407 Post by kaddet » 22 May 2019 07:13

Many, many thanks Dave for this amazing tool, it saves me work and most important: TIME!

PD: I sent you a PM :D

zimxavier
Posts: 52
Joined: 17 Jan 2016 10:09
Location: France

Re: JREPL.BAT v8.1 - regex text processor with support for text highlighting and alternate character sets

#408 Post by zimxavier » 30 May 2019 05:40

Hi!

My current script:

1) extract "string" from file1 to file2

Code: Select all

call JREPL "^(string)\b" "$txt=$1" /jmatchq /f "file1.txt" >> "file2.txt"
2) remove the same "string" from file1

Code: Select all

call JREPL "^(string)\r\n" "" /i /m /f file1.txt /o -
Can I do it in one go? Currently, the first step is a copy-paste, and I would like to know if a cut-paste is possible.

Thank you again for this awesome tool!

dbenham
Expert
Posts: 2261
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.1 - regex text processor with support for text highlighting and alternate character sets

#409 Post by dbenham » 30 May 2019 06:13

JREPL normally has only one output (or modifies one output file at a time). But you can supply your own JScript via /JQ if you want to have multiple outputs.

Your two search strings in your example are not quite the same, so I don't see how you are currently achieving a cut and paste in two steps. But here is all you need to do a cut and paste with a single JREPL call (cut a line from file1 and append it to file2):

Code: Select all

call JREPL "^string\r?\n" "stdout.Write($0);$txt=''" /I /M /JQ /F "file1.txt" /O - >> "file2.txt"
All standard JScript syntax is available in your replacement code. In addition, review JREPL /?JSCRIPT to see the additional "objects" that are predefined for your use. My code above uses the non-standard stdout object.
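
The logic of that single call can be sketched in plain JavaScript, assuming that /I and /M correspond to a case-insensitive, multiline, global regex applied to the whole text. The variable names (`file1`, `pasted`, `remaining`) are hypothetical stand-ins for the files, not JREPL objects.

```javascript
// Sketch of the /M approach: one pass over the whole text.
// Matched lines are collected (the "paste" side) and replaced by
// an empty string (the "cut" side).
var file1 = "keep 1\r\nstring\r\nkeep 2\r\nSTRING\r\n";
var pasted = "";               // stands in for what is appended to file2

var remaining = file1.replace(/^string\r?\n/gim, function ($0) {
  pasted += $0;                // plays the role of stdout.Write($0)
  return "";                   // plays the role of $txt=''
});

console.log(JSON.stringify(remaining));
console.log(JSON.stringify(pasted));
```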


Dave Benham

dbenham
Expert
Posts: 2261
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.1 - regex text processor with support for text highlighting and alternate character sets

#410 Post by dbenham » 30 May 2019 07:39

My previous code works well as long as the input file is not huge (well less than 1GB). But the /M option will cause it to fail if the input is huge.

With a bit of extra code, it is possible to do the cut from file1 and append to file2 without /M if you supply additional JScript code:

Code: Select all

call JREPL "^string$" "skip=true;stdout.WriteLine($0)" /I /JQ /JENDLN "if (skip)$txt=skip=false" /F "file1.txt" /O - >> "file2.txt"
The above will work with any size file as long as no single line is huge.
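
A rough JavaScript model of the line-at-a-time variant follows. It assumes, as the command above implies, that the /JENDLN code runs once after each line and that setting $txt to false suppresses that line's output. The arrays `kept` and `moved` are hypothetical stand-ins for file1 and file2.

```javascript
// Sketch of the line-at-a-time approach: the whole file never has to sit
// in one string, so only the longest single line limits memory use.
var lines = ["keep 1", "string", "keep 2", "STRING"];
var kept = [];     // lines that stay in file1
var moved = [];    // lines appended to file2

for (var i = 0; i < lines.length; i++) {
  var skip = false;
  if (/^string$/i.test(lines[i])) {
    skip = true;                    // remember that this line matched
    moved.push(lines[i]);           // plays the role of stdout.WriteLine($0)
  }
  if (!skip) kept.push(lines[i]);   // plays the role of the /JENDLN check
}

console.log(kept.join("|"), moved.join("|"));
```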


Dave Benham

dbenham
Expert
Posts: 2261
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.3 - regex text processor with support for text highlighting and alternate character sets

#411 Post by dbenham » 05 Jun 2019 16:01

Here is version 8.3 (edit 2019-07-16)
JREPL8.3.zip
Version 8.2 was downloaded 405 times in 6 weeks
(28.95 KiB) Downloaded 67 times


Summary of changes

Code: Select all

c:\>jrepl /?history

    2019-07-16 v8.3: Documentation correction - Binary data with null bytes may
                     be read via ADO.
    2019-06-05 v8.2: /RTN bug fix - preserve Unicode by using CHCP 65001/utf-8
                     to transfer value to variable unless /XFILE used.
...
Hopefully this is the last remaining encoding issue with JREPL. The /RTN option makes use of a temporary file to transfer the JScript result into an environment variable. Prior versions were using the default CSCRIPT input/output encoding - 1252 on my machine. This does not support Unicode.

Version 8.2 writes the temporary file as UTF-8 without BOM, and the batch code temporarily sets CHCP 65001 before reading the lines with a FOR /F. Unicode is now properly preserved in the output variable.

JREPL reverts to old behavior if /XFILE is used.


Dave Benham

ResonantStep
Posts: 4
Joined: 27 Apr 2019 06:32

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#412 Post by ResonantStep » 09 Jun 2019 17:24

Hi, thanks for your utility, very useful in batch scripts.
I'm trying to make a "drag and drop" script that removes duplicate lines, keeping the last occurrence of each.

Here's my code so far:

Code: Select all

@echo off
call "JREPL.bat" "^(.*?)$\s+?^(?=.*^\1$)" "" /m /f "%~dpnx1" /o -
call "JREPL.bat" "^(.*?)$\s+?^(?=.*^\1$)" "" /m /f "%~dpnx1" /o -
PowerShell -NoProfile -ExecutionPolicy Bypass "(Get-Content '%~dpnx1') | Set-Content '%~dpnx1'"
exit /b
Problems I have are:
- I need to call the script 2 times for it to work.
- It outputs CR instead of CRLF, so I have to add that PowerShell command (which is very slow from batch) to convert the line endings to Windows format.
- I'd like to preserve empty lines.

And because I want to learn, I also have 2 questions:
- What would the command be if I wanted to keep the first duplicate line instead of the last one?
- Why is it that this regex:

Code: Select all

^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$^)
(I copied that regex from "Regular Expressions Cookbook")
doesn't work from external batch, but works if I insert the command directly in JREPL.bat like this:

Code: Select all

...
if exist "%TEMP%\lock_process.tmp" ( goto :Batch) else ( echo repl>"%TEMP%\lock_process.tmp")
"%~f0" "^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$^)" "" /m /f "%~f1" /o -

============= :Batch portion ===========
setlocal disableDelayedExpansion
del /f /s /q "%TEMP%\lock_process.tmp" >NUL 2>&1
...
Ideally I would prefer to make this with one file/script...but Jrepl is more robust.

ps: a last question :)
I have a bunch of reg files encoded as UTF-16LE, how can I make it work with UTF-16 too?

dbenham
Expert
Posts: 2261
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#413 Post by dbenham » 11 Jun 2019 21:33

Your regex is definitely not correct. I'm not going to try to figure out exactly what is going on.

The regex I would use is:

Code: Select all

^([^\r\n]+)\r?\n(?=[\s\S]*^\1$)
But when you use CALL with quoted strings, the ^ is doubled to ^^. That doesn't matter for beginning of line anchor, but it gives the wrong result with something like [^x]. So I would use the /XSEQ option and use \c instead of ^.
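
The effect of the doubled caret can be checked directly against the regex engine. The following plain JavaScript sketch (variable names are illustrative) compares the intended character class with the one the engine receives after CALL has doubled the ^, and shows that a doubled anchor is harmless.

```javascript
// What the regex engine sees after CALL doubles each quoted ^.
var intended = new RegExp("[^x]+");   // any run of characters except x
var doubled  = new RegExp("[^^x]+");  // any run of characters except ^ and x

var sample = "a^bx";
var r1 = sample.match(intended)[0];   // stops only at x
var r2 = sample.match(doubled)[0];    // also stops at the literal ^

// Two consecutive start-of-line anchors still match at the start.
var anchorOk = new RegExp("^^abc").test("abc");

console.log(r1, r2, anchorOk);
```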

Here is how I would remove all duplicates except empty lines. The last instance of each unique line is kept.

Code: Select all

call jrepl "\c([\c\r\n]+)\r?\n(?=[\s\S]*\c\1$)" "" /xseq /m /f "%~f1" /o -
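
To see what this regex does, here is the same pattern in plain JavaScript with ^ written normally, assuming /M applies it to the whole text with the global and multiline flags. The sample text and the variable names are made up for illustration.

```javascript
// A line is removed only if an identical line appears later in the text,
// so the last instance of each line survives. Empty lines never match
// because the capture group requires at least one character.
var input = "a\r\nb\r\na\r\n\r\nb\r\nc\r\n";
var dedupKeepLast = input.replace(/^([^\r\n]+)\r?\n(?=[\s\S]*^\1$)/gm, "");

console.log(JSON.stringify(dedupKeepLast));
```
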
If you want to additionally remove all lines that are empty or contain only white space, then

Code: Select all

call JREPL "\c([\c\r\n]*[\c\r\n\t ][\c\r\n]*)\r?\n(?![\s\S]*\c\1$)" "$txt=$1" /xseq /m /jmatchq /f "%~f1" /o -
I don't know of a simple regex way to preserve the first instance of each unique line instead of the last instance. That would require a look behind assertion, which JScript regex does not support. Or if the lines were pre-sorted, then the following would work:

Code: Select all

call jrepl "\c(([\c\r\n]*)\r?\n?)(?:\2(?:\r?\n|$))*" $1 /xseq /m /f "%~f1" /o -
But if the lines are sorted, then it doesn't matter which is preserved - keep first or keep last will give the same result :wink:
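
Outside of a pure regex, keeping the first instance is straightforward once the lines already seen are remembered. The plain JavaScript sketch below illustrates that idea only; it is not a JREPL command, and `keepFirst` is a hypothetical helper name.

```javascript
// Keep the first instance of each non-empty line; preserve all empty lines.
function keepFirst(text) {
  var seen = {};                       // lines already emitted
  var out = [];
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    // Prefix the key so a line such as "constructor" cannot collide with
    // properties inherited from Object.prototype.
    var key = "L:" + line;
    if (line === "" || !seen[key]) {
      out.push(line);
      seen[key] = true;
    }
  }
  return out.join("\r\n");
}

var firstKept = keepFirst("b\r\na\r\nb\r\n\r\na\r\nc");
console.log(JSON.stringify(firstKept));
```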

For all of the solutions, you can work with UTF-16LE by appending |unicode to the /F option. This would include the BOM in the output. If you want no BOM, then append |unicode|nb.
For example:

Code: Select all

call jrepl "\c([\c\r\n]+)\r?\n(?=[\s\S]*\c\1$)" "" /xseq /m /f "%~f1|unicode|nb" /o -
Dave Benham

ResonantStep
Posts: 4
Joined: 27 Apr 2019 06:32

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#414 Post by ResonantStep » 13 Jun 2019 05:39

Thanks a lot, very helpful and complete answer, working great and fast now.
I implemented a "BOM check" and added your two commands (don't know what happens for files without BOM but I guess the script will work 99% of the time)

Code: Select all

setLocal
if exist "%~f1\" (
	echo This can not be used against directories.
	endlocal && timeout /t 2 /nobreak >nul 2>&1
	exit /b
)

if "%~z1" EQU "0" (
	echo Empty files are not accepted.
	endlocal && timeout /t 2 /nobreak >nul 2>&1
	exit /b
)

set "file=%~snx1"
del /Q /F "%file%.hex" >nul 2>&1
certutil -f -encodehex %file% %file%.hex>nul

for /f "usebackq delims=" %%E in ("%file%.hex") do (
	set "f_line=%%E" > nul
	goto :BOM_Check
)

:BOM_Check
del /Q /F "%file%.hex" >nul 2>&1
:: Check the BOMs
:: utf-8
echo %f_line% | find "ef bb bf" >nul && endlocal && goto :Jrepl_Command_1
:: utf-32 LE
echo %f_line% | find "ff fe 00 00" >nul && endlocal && goto :Jrepl_Command_2
:: utf-16 LE
echo %f_line% | find "ff fe" >nul && endlocal && goto :Jrepl_Command_2
:: utf-16 BE
echo %f_line% | find "fe ff" >nul && endlocal && goto :Jrepl_Command_2
:: utf-32 BE
echo %f_line% | find "00 00 fe ff" >nul && endlocal && goto :Jrepl_Command_2
:: ASCII
endlocal && goto :Jrepl_Command_1

:Jrepl_Command_1
	echo >"%TEMP%\lock_process.tmp"
	call jrepl "\c([\c\r\n]*[\c\r\n\t ][\c\r\n]*)\r?\n(?![\s\S]*\c\1$)" "$txt=$1" /xseq /m /jmatchq /f "%~f1" /o -
	exit /b

:Jrepl_Command_2
	echo >"%TEMP%\lock_process.tmp"
	call jrepl "\c([\c\r\n]*[\c\r\n\t ][\c\r\n]*)\r?\n(?![\s\S]*\c\1$)" "$txt=$1" /xseq /m /jmatchq /f "%~f1|unicode" /o -
	exit /b
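
Two details of BOM detection are easy to get wrong: the signature has to be compared at the very start of the file (a search of the whole first hex line, as `find` does, also accepts the byte pair when it appears later), and UTF-32 LE has to be tested before UTF-16 LE because both begin with FF FE. A minimal JavaScript sketch of that ordering follows; `detectBom` and its return values are hypothetical names chosen for this illustration.

```javascript
// bytes: array holding the first few byte values of a file
function detectBom(bytes) {
  function startsWith(sig) {
    if (bytes.length < sig.length) return false;
    for (var i = 0; i < sig.length; i++) {
      if (bytes[i] !== sig[i]) return false;
    }
    return true;
  }
  if (startsWith([0xEF, 0xBB, 0xBF])) return "utf-8";
  // The 4-byte signatures go first: FF FE 00 00 would otherwise be
  // reported as UTF-16 LE.
  if (startsWith([0xFF, 0xFE, 0x00, 0x00])) return "utf-32le";
  if (startsWith([0x00, 0x00, 0xFE, 0xFF])) return "utf-32be";
  if (startsWith([0xFF, 0xFE])) return "utf-16le";
  if (startsWith([0xFE, 0xFF])) return "utf-16be";
  return "none";                       // no BOM found
}

console.log(detectBom([0xFF, 0xFE, 0x41, 0x00]));
```

A UTF-16 LE file whose first character is U+0000 produces the same four bytes as the UTF-32 LE signature, so this sketch, like any BOM check, remains a heuristic.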

bblakey
Posts: 3
Joined: 21 Jun 2019 12:47

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#415 Post by bblakey » 21 Jun 2019 12:56

I'm struggling a bit trying to write the command that will read a string definitions file and then a target text config file, replacing each string that is matched.

Two text files look like these, one "definition" and one "target"...

This is a sample config text file:
There is $<a>$ apple
Now there were $<b>$ apples before
In the beginning there were $<c>$ apples

Text file with string definitions; for example, I want to replace "$<a>$" with "one":
$<a>$,one
$<b>$,two
$<c>$,three

Thank you in advance for any guidance!

Aacini
Expert
Posts: 1605
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#416 Post by Aacini » 21 Jun 2019 15:27

Mmmm... If you can change the format of both input files, then there is a very simple solution:

target.txt:

Code: Select all

There is !a! apple
Now there were !b! apples before
In the beginning there were !c! apples
definition.txt:

Code: Select all

a=one
b=two
c=three
test.bat:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Read string definitions file
for /F "delims=" %%a in (definition.txt) do set "%%a"

rem Read target file and replace text
(for /F "delims=" %%a in (target.txt) do echo %%a) > result.txt
result.txt

Code: Select all

There is one apple
Now there were two apples before
In the beginning there were three apples
Antonio

dbenham
Expert
Posts: 2261
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#417 Post by dbenham » 21 Jun 2019 16:03

Solution 1 - Use a FOR /F to parse your definitions and write temporary FIND.TXT and REPL.TXT files. Then use JREPL with the /L literal option and /T translate option.

Code: Select all

@echo off
>find.txt 2>repl.txt (
  for /f "delims=, tokens=1*" %%A in (definitions.txt) do (
    echo(%%A
    echo(%%B>&2
  )
)
call jrepl find.txt repl.txt /L /T file /F config.txt /O output.txt
del find.txt repl.txt
Solution 2 - Use FOR /F to parse your definitions and define environment variables for each definition. Then use JREPL with /JQ option specifying user supplied JScript to replace each found $<x>$ string with the environment variable value. This is my favorite solution.

Code: Select all

@echo off
setlocal disableDelayedExpansion
for /f "delims=, tokens=1*" %%A in (definitions.txt) do set "%%A=%%B"
call jrepl "\$<.*?>\$" "$txt=env($0)" /JQ /F config.txt /O output.txt
Solution 3 - See Aacini's post above.

I have used Aacini's strategy myself in the past. But if your config file has any ! or ^ literals, or empty lines, or lines that begin with ; then additional coding is required. There are lots of niggling details that make general purpose pure batch solutions much more difficult than they ought to be. That is one of the reasons I wrote and use JREPL.BAT.
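
The lookup idea behind Solution 2 can be modelled in plain JavaScript, with an ordinary object standing in for the environment variables. The names `table`, `definitions`, and `config` are hypothetical, and leaving unknown tokens unchanged is a choice made for this sketch that may differ from what env() does for an undefined variable.

```javascript
var definitions = "$<a>$,one\n$<b>$,two\n$<c>$,three";
var config = "There is $<a>$ apple\nNow there were $<b>$ apples before";

// Build the lookup table: the text before the first comma is the token,
// everything after it is the replacement (roughly "delims=, tokens=1*").
var table = {};
var defLines = definitions.split("\n");
for (var i = 0; i < defLines.length; i++) {
  var comma = defLines[i].indexOf(",");
  if (comma > 0) {
    table[defLines[i].slice(0, comma)] = defLines[i].slice(comma + 1);
  }
}

// Replace each $<...>$ token with its definition.
var result = config.replace(/\$<.*?>\$/g, function ($0) {
  return table.hasOwnProperty($0) ? table[$0] : $0;
});

console.log(result);
```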


Dave Benham

bblakey
Posts: 3
Joined: 21 Jun 2019 12:47

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#418 Post by bblakey » 21 Jun 2019 16:13

Ok, let me take a look at doing this. Unfortunately I don't really have control over changing the $<>$ tags, so I'll see what I can do - thanks!

bblakey
Posts: 3
Joined: 21 Jun 2019 12:47

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#419 Post by bblakey » 21 Jun 2019 16:33

I've verified that Solution 2 works like a champ! I'll spin thru Solution 1, as well, see which one makes more sense for me.

Thank you for your time and samples, much appreciated!

onlinestatements
Posts: 10
Joined: 24 Oct 2018 09:54

Re: JREPL.BAT v7.15 - regex text processor now with Unicode and XRegExp support

#420 Post by onlinestatements » 11 Jul 2019 11:22

onlinestatements wrote:
24 Oct 2018 15:24
The NUL characters will only ever appear in the very first line of every file I ever process.
Could we somehow tell it to always exclude or skip the first line and not have to use /M option.
Then I just need to scan only lines 2 thru 10 for the removal of the FF and the Esc E
Then I also wouldn't have to put the asterisks after the FF since it won't ever reach the other FF's.
Thanks
Is it possible yet to tell jrepl to only search line numbers 2 thru 10?
I have the latest 8.2 version
