JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

dbenham
Expert
Posts: 2241
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.1 - regex text processor with support for text highlighting and alternate character sets

#406 Post by dbenham » 19 May 2019 20:21

Steffen (aGerman) discovered how to enable ANSI escape sequences without modifying the registry. See the series of posts starting at viewtopic.php?f=3&t=9144&p=59700#p59691

This is useful for the new /H option.

Here is version 8.1
JREPL8.1.zip
(30.18 KiB) Downloaded 19 times

Summary of Changes

Code: Select all

C:\>jrepl /?history

    2019-05-19 v8.1: Add /VT to enable Virtual Terminal ANSI escape sequences.
                     Code courtesy of DosTips user aGerman (Steffen).
    ...

New /VT option to enable ANSI escape sequences

Code: Select all

C:\>jrepl /?/vt

     /VT - Enables Virtual Terminal processing of ANSI escape sequences for
           the current JREPL.bat process within the Windows 10 console. This
           option is not needed if the registry has HKEY_CURRENT_USER\Console
           "VirtualTerminalLevel" DWORD set to 1 in the registry.

I also updated the documentation for /H

Code: Select all

    /H  - Highlight all replaced or matched text in the output using the ...
            ...
            Native support for ANSI escape sequences requires Windows 10 or
            higher. ANSI escape sequences only work on the Windows 10 console
            if the "Use legacy console" option is off in console properties.

            In addition, one of the following must be used:
             - The /VT option can be used to enable the escape sequences for
               a single JREPL run.
            or
             - The registry can have the following DWORD defined, which will
               enable escape sequences for all console applications.
                  [HKEY_CURRENT_USER\Console]
                  "VirtualTerminalLevel"=dword:00000001



Dave Benham

kaddet
Posts: 1
Joined: 20 May 2019 16:05

#407 Post by kaddet » 22 May 2019 07:13

Many, many thanks, Dave, for this amazing tool. It saves me work and, most important, TIME!

PS: I sent you a PM :D

zimxavier
Posts: 52
Joined: 17 Jan 2016 10:09
Location: France

#408 Post by zimxavier » 30 May 2019 05:40

Hi!

My current script:

1) extract "string" from file1 to file2

Code: Select all

call JREPL "^(string)\b" "$txt=$1" /jmatchq /f "file1.txt" >> "file2.txt"

2) remove the same "string" from file1

Code: Select all

call JREPL "^(string)\r\n" "" /i /m /f file1.txt /o -

Can I do it in one go? Currently, the first step is a copy-paste, and I would like to know if a cut-paste is possible.

Thank you again for this awesome tool!

#409 Post by dbenham » 30 May 2019 06:13

JREPL normally has only one output (or modifies one output file at a time). But you can supply your own JScript via /JQ if you want to have multiple outputs.

Your two search strings in your example are not quite the same, so I don't see how you are currently achieving a cut and paste in two steps. But here is all you need to do a cut and paste with a single JREPL call (cut a line from file1 and append it to file2):

Code: Select all

call JREPL "^string\r?\n" "stdout.Write($0);$txt=''" /I /M /JQ /F "file1.txt" /O - >> "file2.txt"

All standard JScript syntax is available in your replacement code. In addition, review JREPL /?JSCRIPT to see the additional "objects" that are predefined for your use. My code above uses the non-standard stdout object.
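
To make the idea concrete outside of JREPL, here is a rough sketch in modern JavaScript (Node.js) of what the /M /JQ call does conceptually: the whole input is treated as one string, each match is sent to a second output, and an empty string is substituted in its place. The helper name cutLines and the sample data are hypothetical and only for illustration; this is not JREPL's internal code.

```javascript
// Sketch only: cutLines is a hypothetical helper, not part of JREPL.
// "cut" stands in for stdout (which the batch command appends to file2),
// "kept" stands in for the rewritten file1.
function cutLines(text, pattern) {
  const cut = [];
  const kept = text.replace(pattern, (match) => {
    cut.push(match);   // plays the role of stdout.Write($0)
    return "";         // plays the role of $txt=''
  });
  return { kept: kept, cut: cut.join("") };
}

// Same regex as the JREPL call, with flags i (/I), m (/M) and g (replace all).
const demoCut = cutLines("alpha\r\nstring\r\nbeta\r\nSTRING\r\n", /^string\r?\n/gim);
console.log(JSON.stringify(demoCut));
```

Because the entire file is held in memory as a single string, this approach shares the size limitation of /M.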


Dave Benham

#410 Post by dbenham » 30 May 2019 07:39

My previous code works well as long as the input file is not huge (well less than 1GB). But the /M option will cause it to fail if the input is huge.

With a bit of extra JScript code, it is possible to cut from file1 and append to file2 without /M:

Code: Select all

call JREPL "^string$" "skip=true;stdout.WriteLine($0)" /I /JQ /JENDLN "if (skip)$txt=skip=false" /F "file1.txt" /O - >> "file2.txt"

The above will work with any size file as long as no single line is huge.
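
The line-by-line logic can be sketched in modern JavaScript (Node.js) as follows. This is only an illustration of the skip-flag idea under the assumption that lines arrive one at a time; the helper name cutLinesPerLine and the sample lines are hypothetical, and the code is not taken from JREPL.

```javascript
// Sketch only: cutLinesPerLine is a hypothetical helper, not part of JREPL.
function cutLinesPerLine(lines, pattern) {
  const kept = [];   // lines that stay in file1
  const cut = [];    // lines sent to the second output (file2)
  for (const line of lines) {
    let skip = false;                 // the JREPL version resets this flag in /JENDLN
    line.replace(pattern, (match) => {
      skip = true;                    // remember that this line matched
      cut.push(match);                // plays the role of stdout.WriteLine($0)
      return match;
    });
    if (!skip) kept.push(line);       // a skipped line is not written, like $txt=false
  }
  return { kept: kept, cut: cut };
}

const demoPerLine = cutLinesPerLine(["alpha", "string", "beta", "String"], /^string$/i);
console.log(demoPerLine.kept.join(","), "|", demoPerLine.cut.join(","));
```

Only one line is examined at a time, which is why total file size stops being a concern.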


Dave Benham

Re: JREPL.BAT v8.2 - regex text processor with support for text highlighting and alternate character sets

#411 Post by dbenham » 05 Jun 2019 16:01

Here is version 8.2
JREPL8.2.zip
(28.79 KiB) Downloaded 19 times


Summary of changes

Code: Select all

c:\>jrepl /?history

    2019-06-05 v8.2: /RTN bug fix - preserve Unicode by using CHCP 65001/utf-8
                     to transfer value to variable unless /XFILE used.
...

Hopefully this is the last remaining encoding issue with JREPL. The /RTN option makes use of a temporary file to transfer the JScript result into an environment variable. Prior versions were using the default CSCRIPT input/output encoding - 1252 on my machine. This does not support Unicode.

Version 8.2 writes the temporary file as UTF-8 without BOM, and the batch code temporarily sets CHCP 65001 before reading the lines with a FOR /F. Unicode is now properly preserved in the output variable.

JREPL reverts to the old behavior if /XFILE is used.
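
As a small illustration of why the encoding of the temporary file matters, the following Node.js sketch compares a UTF-8 round trip with a single-byte round trip. It is an assumption-laden stand-in: Node's built-in "latin1" encoding is used in place of code page 1252, so the exact way the character is damaged differs from what CSCRIPT would produce, and none of this is JREPL code.

```javascript
// Sketch only: "latin1" is a stand-in for a single-byte code page such as 1252.
const sample = "\u03A9mega";   // starts with GREEK CAPITAL LETTER OMEGA

// UTF-8 without BOM: the byte stream does not start with EF BB BF.
const utf8Bytes = Buffer.from(sample, "utf8");
const hasBom = utf8Bytes[0] === 0xef && utf8Bytes[1] === 0xbb && utf8Bytes[2] === 0xbf;

// Round trips: UTF-8 preserves the text, a single-byte encoding does not.
const viaUtf8 = utf8Bytes.toString("utf8");
const viaSingleByte = Buffer.from(sample, "latin1").toString("latin1");

console.log(hasBom, viaUtf8 === sample, viaSingleByte === sample);
```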


Dave Benham

ResonantStep
Posts: 4
Joined: 27 Apr 2019 06:32

#412 Post by ResonantStep » 09 Jun 2019 17:24

Hi, thanks for your utility, very useful in batch scripts.
I'm trying to make a "drag and drop" script that removes duplicate lines, keeping the last occurrence of each.

Here's my code so far:

Code: Select all

@echo off
call "JREPL.bat" "^(.*?)$\s+?^(?=.*^\1$)" "" /m /f "%~dpnx1" /o -
call "JREPL.bat" "^(.*?)$\s+?^(?=.*^\1$)" "" /m /f "%~dpnx1" /o -
PowerShell -NoProfile -ExecutionPolicy Bypass "(Get-Content '%~dpnx1') | Set-Content '%~dpnx1'"
exit /b

The problems I have are:
- I need to call JREPL twice for it to work.
- It outputs CR instead of CRLF, so I have to add that PowerShell command (which is very slow from batch) to convert the line endings to Windows format.
- I'd also like to preserve empty lines.

Because I want to learn, I also have two questions:
- What would the command be if I wanted to keep the first duplicate line instead of the last one?
- Why does this regex:

Code: Select all

^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$^)

(I copied that regex from "Regular Expressions Cookbook")
not work from an external batch file, but work if I insert the command directly in JREPL.bat like this:

Code: Select all

...
if exist "%TEMP%\lock_process.tmp" ( goto :Batch) else ( echo repl>"%TEMP%\lock_process.tmp")
"%~f0" "^(.*)(?:\r?\n|\r)(?=[\s\S]*^\1$^)" "" /m /f "%~f1" /o -

============= :Batch portion ===========
setlocal disableDelayedExpansion
del /f /s /q "%TEMP%\lock_process.tmp" >NUL 2>&1
...

Ideally I would prefer to do this with a single file/script... but JREPL is more robust.

PS: one last question :)
I have a bunch of .reg files encoded as UTF-16LE. How can I make this work with UTF-16 too?

#413 Post by dbenham » 11 Jun 2019 21:33

Your regex is definitely not correct. I'm not going to try to figure out exactly what is going on.

The regex I would use is:

Code: Select all

^([^\r\n]+)\r?\n(?=[\s\S]*^\1$)

But when you use CALL with quoted strings, the ^ is doubled to ^^. That doesn't matter for beginning of line anchor, but it gives the wrong result with something like [^x]. So I would use the /XSEQ option and use \c instead of ^.
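
The effect of the doubled caret can be seen with a small sketch in modern JavaScript (Node.js), whose regex syntax behaves the same way as JScript for these constructs. The two patterns below are simply the start of the regex as typed and the same text with every ^ doubled, as CALL would deliver it; the sample line is made up.

```javascript
// Sketch only: compares a regex as typed with the same regex after every ^ is doubled.
const typed = /^([^\r\n]+)/;        // anchor, then "anything except CR or LF"
const afterCall = /^^([^^\r\n]+)/;  // two anchors (harmless), but the class now also excludes ^

const sampleLine = "a^b";
console.log(typed.exec(sampleLine)[1]);      // a^b
console.log(afterCall.exec(sampleLine)[1]);  // a
```

The doubled anchor still matches at the start of the line, but the negated character class silently gains a literal ^ as an excluded character, so lines containing a caret are no longer captured in full.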

Here is how I would remove all duplicates except empty lines. The last instance of each unique line is kept.

Code: Select all

call jrepl "\c([\c\r\n]+)\r?\n(?=[\s\S]*\c\1$)" "" /xseq /m /f "%~f1" /o -

If you want to additionally remove all lines that are empty or contain only white space, then

Code: Select all

call JREPL "\c([\c\r\n]*[\c\r\n\t ][\c\r\n]*)\r?\n(?![\s\S]*\c\1$)" "$txt=$1" /xseq /m /jmatchq /f "%~f1" /o -

I don't know of a simple regex way to preserve the first instance of each unique line instead of the last instance. That would require a lookbehind assertion, which JScript regex does not support. If the lines were pre-sorted, however, then the following would work:

Code: Select all

call jrepl "\c(([\c\r\n]*)\r?\n?)(?:\2(?:\r?\n|$))*" $1 /xseq /m /f "%~f1" /o -

But if the lines are sorted, then it doesn't matter which is preserved - keep first or keep last will give the same result :wink:
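
For comparison, here is a rough sketch in modern JavaScript (Node.js) of both behaviors on unsorted input. keepLast applies the keep-last regex from above (written with ^ instead of \c). keepFirst is not a regex solution and is not something JREPL provides: it is a swapped-in technique that remembers lines already seen, shown only to illustrate what "keep first" means. Both function names and the sample text are hypothetical.

```javascript
// Sketch only: keepLast and keepFirst are hypothetical helpers, not part of JREPL.

// Keep the LAST instance: delete a non-empty line if an identical line appears later.
function keepLast(text) {
  return text.replace(/^([^\r\n]+)\r?\n(?=[\s\S]*^\1$)/gm, "");
}

// Keep the FIRST instance: track lines already seen (no regex lookbehind needed).
// Empty lines are always preserved. Line endings are normalized to CRLF on output.
function keepFirst(text) {
  const seen = new Set();
  return text
    .split(/\r?\n/)
    .filter((line) => {
      if (line === "") return true;
      if (seen.has(line)) return false;
      seen.add(line);
      return true;
    })
    .join("\r\n");
}

const dupes = "a\r\nb\r\na\r\n\r\nc\r\nb\r\n";
console.log(JSON.stringify(keepLast(dupes)));   // "a\r\n\r\nc\r\nb\r\n"
console.log(JSON.stringify(keepFirst(dupes)));  // "a\r\nb\r\n\r\nc\r\n"
```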

For all of the solutions, you can work with UTF-16LE by appending |unicode to the /F option. This would include the BOM in the output. If you want no BOM, then append |unicode|nb.
For example:

Code: Select all

call jrepl "\c([\c\r\n]+)\r?\n(?=[\s\S]*\c\1$)" "" /xseq /m /f "%~f1|unicode|nb" /o -

Dave Benham

#414 Post by ResonantStep » 13 Jun 2019 05:39

Thanks a lot, very helpful and complete answer, working great and fast now.
I implemented a "BOM check" and added your two commands (I don't know what happens for files without a BOM, but I guess the script will work 99% of the time):

Code: Select all

setLocal
if exist "%~f1\" (
	echo This can not be used against directories.
	endlocal && timeout /t 2 /nobreak >nul 2>&1
	exit /b
)

if "%~z1" EQU "0" (
	echo Empty files are not accepted.
	endlocal && timeout /t 2 /nobreak >nul 2>&1
	exit /b
)

set "file=%~snx1"
del /Q /F "%file%.hex" >nul 2>&1
certutil -f -encodehex %file% %file%.hex>nul

for /f "usebackq delims=" %%E in ("%file%.hex") do (
	set "f_line=%%E" > nul
	goto :BOM_Check
)

:BOM_Check
del /Q /F "%file%.hex" >nul 2>&1
:: Check the BOMs
:: utf-8
echo %f_line% | find "ef bb bf" >nul && endlocal && goto :Jrepl_Command_1
:: utf-32 LE
echo %f_line% | find "ff fe 00 00" >nul && endlocal && goto :Jrepl_Command_2
:: utf-16 LE
echo %f_line% | find "ff fe" >nul && endlocal && goto :Jrepl_Command_2
:: utf-16 BE
echo %f_line% | find "fe ff" >nul && endlocal && goto :Jrepl_Command_2
:: utf-32 BE
echo %f_line% | find "00 00 fe ff" >nul && endlocal && goto :Jrepl_Command_2
:: ASCII
endlocal && goto :Jrepl_Command_1

:Jrepl_Command_1
	echo >"%TEMP%\lock_process.tmp"
	call jrepl "\c([\c\r\n]*[\c\r\n\t ][\c\r\n]*)\r?\n(?![\s\S]*\c\1$)" "$txt=$1" /xseq /m /jmatchq /f "%~f1" /o -
	exit /b

:Jrepl_Command_2
	echo >"%TEMP%\lock_process.tmp"
	call jrepl "\c([\c\r\n]*[\c\r\n\t ][\c\r\n]*)\r?\n(?![\s\S]*\c\1$)" "$txt=$1" /xseq /m /jmatchq /f "%~f1|unicode" /o -
	exit /b
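
For reference, the same BOM test can be sketched in modern JavaScript (Node.js) by comparing only the leading bytes, with the 4-byte UTF-32 signatures checked before the 2-byte UTF-16 ones. This differs from the batch code above, where each find test matches the byte pattern anywhere in the first line of the hex dump. The function name detectBom and the return labels are hypothetical, and the sketch does not resolve the inherent ambiguity between a UTF-32 LE BOM and a UTF-16 LE file whose first character is NUL.

```javascript
// Sketch only: detectBom is a hypothetical helper; it inspects leading bytes only.
function detectBom(bytes) {
  // True when every byte of the signature matches at the start of the input.
  const startsWith = (sig) => sig.every((b, i) => bytes[i] === b);

  if (startsWith([0xef, 0xbb, 0xbf])) return "utf-8";
  if (startsWith([0xff, 0xfe, 0x00, 0x00])) return "utf-32le";  // before utf-16le
  if (startsWith([0x00, 0x00, 0xfe, 0xff])) return "utf-32be";
  if (startsWith([0xff, 0xfe])) return "utf-16le";
  if (startsWith([0xfe, 0xff])) return "utf-16be";
  return "none";                                                // ASCII or BOM-less
}

console.log(detectBom([0xff, 0xfe, 0x41, 0x00]));  // utf-16le
console.log(detectBom([0x41, 0xff, 0xfe]));        // none (pattern is not at the start)
```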
