JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

dbenham
Expert
Posts: 2378
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

#451 Post by dbenham » 12 Apr 2020 10:45

Why are you using word boundary \b anchors? That is why the ! after = is not replaced.
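Here is a minimal Python illustration of that point. It assumes Python's `re` treats `\b` the same way as the JScript regex engine that JREPL uses, which holds for ASCII text, where both define word characters as `[A-Za-z0-9_]`. `\b` only matches between a word character and a non-word character. So `\b!` cannot match a `!` that follows `=`, because both are non-word characters:

```python
import re

# \b is a zero-width assertion: it matches only where a word character
# ([A-Za-z0-9_]) sits next to a non-word character (or the string edge).
line = 'title="Hi!" x=!y'

# '!' after 'i' follows a word char, so \b! matches there;
# '!' after '=' follows a non-word char, so \b! does not match.
with_boundary = re.sub(r"\b!", "_", line)

# Without the anchor, every '!' is replaced.
without_boundary = re.sub(r"!", "_", line)

print(with_boundary)     # title="Hi_" x=!y
print(without_boundary)  # title="Hi_" x=_y
```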

Why are you using delayed expansion? Delayed expansion won't help if the path contains spaces. Your paths should always be quoted, just in case there is a space. Without delayed expansion, you don't have to worry about escaping the ! character.

I assume you are using /M because of the potential for null bytes in your content. If that is not an issue, then you can drop the /M.

Note that " cannot be passed as an argument to any CScript program like JREPL.BAT. You must use the \x22 escape sequence, or if you add the /XSEQ option you can use the non-standard \q escape sequence.

If you want to substitute _ for all 3 problem characters, you can simply do:

Code:

call jrepl "[!\q:]" "_" /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
If you want to substitute a different character for each problem character, then you can use the /T option without any delimiter. Assume you want _ for !, ' for ", and # for :, then:

Code:

call jrepl "!\q:" "_'#" /t "" /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
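Conceptually, `/T ""` pairs each character of the search string with the character at the same position in the replacement string. Python's `str.translate` is a convenient stand-in for that idea. It only illustrates the mapping and is not how JREPL is implemented:

```python
# Each character in the search string maps to the character at the same
# position in the replacement string: ! -> _, " -> ', : -> #
table = str.maketrans('!":', "_'#")

line = 'name="A!B" time=12:30'
print(line.translate(table))  # name='A_B' time=12#30
```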
If you wanted to encode the problem characters in a way that can be reversed, then you can use the /T option with a delimiter - I'll use space as a delimiter. You could do something like _ --> _U, ! --> _B, " --> _Q, and : --> _C

Code:

call jrepl "_ ! \q :" "_U _B _Q _C" /t " " /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
Decoding is just as simple:

Code:

call jrepl "_U _B _Q _C" "_ ! \q :" /t " " /xseq /m /f "%dr%__lineup.mx" /o -
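The round trip can be modelled in Python. This is a sketch, and the helper `translate` below is hypothetical. It assumes that /T combines all search terms into one alternation and replaces each match in a single pass, so the `_U` produced for `_` is never re-scanned. Encoding `_` itself is what makes decoding unambiguous: afterwards, every `_` in the text starts one of the two-character codes.

```python
import re


# Hypothetical helper mimicking /T with a delimiter (as assumed above):
# all search terms form one alternation, and each match is replaced in a
# single pass, so replacement text is never re-scanned.
def translate(text, searches, replacements):
    mapping = dict(zip(searches, replacements))
    pattern = "|".join(re.escape(s) for s in searches)
    return re.sub(pattern, lambda m: mapping[m.group(0)], text)


PLAIN = ["_", "!", '"', ":"]
ENCODED = ["_U", "_B", "_Q", "_C"]

original = 'a_b="x!y":z'
encoded = translate(original, PLAIN, ENCODED)
decoded = translate(encoded, ENCODED, PLAIN)

print(encoded)              # a_Ub=_Qx_By_Q_Cz
print(decoded == original)  # True
```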

All that being said, I am concerned about your whole approach. I should think parsing your file with batch would only be plausible if all audio/video content is referenced by URL within your mxf. If any of the media content is embedded in your mxf then I should think all bets are off. Even if it can be done, batch seems like the worst possible option. Depending on what you are trying to do, JREPL might be a good choice for doing your actual parsing.


Dave Benham

ctm
Posts: 5
Joined: 11 Apr 2020 17:12

#452 Post by ctm » 12 Apr 2020 14:37

Thanks for the very extensive and detailed answer!
The \b thing was basically a silly copy-cat error: I saw it somewhere and assumed it was some kind of lead-in to the string...

The mxf is not a media file but a representation of a database, sort of (the channel listing of Windows Media Center)...
Anyway, it works, thanks.

I'm now using (the colons turned out not to be critical):
call jrepl.bat "! \q" "_ #" /t "" /XSEQ /f "%Lineup%" /o "%dr%___lineup.mx"

(With the /M option the process seems to be faster... can that be?)

One observation and one question:
When I use:
"! \q" "_ §" with § instead of #, I get this ("xBA"):
[screenshot: 1.jpg]
So § also seems to be "special"...

Question: the file also contains doubled double quotes (""), which I would like to turn into '§ §', or now '# #' (with a space in between)...
Is there a way to do it in one run? For now I simply do a second run, replacing ## with # #...
Something like "! \q\q \q" "_ '# #' #"
so that each pair of double quotes is replaced by the space-separated string and each single double quote by #...

Thanks; in any case, I got done what I needed!

ctm
Posts: 5
Joined: 11 Apr 2020 17:12

#453 Post by ctm » 14 Apr 2020 15:20

OK, so I'm making progress, in a sense:

If I apply the following:

Code:

 call _jrepl.bat "! \x22" "_ \xA7" /t "" /xseq /f "%Lineup%|UTF-8|NB" /o "%dr%___lineup.mx"
From original.mxf I get the resulting "___lineup.mx" without a BOM, which makes some things easier (it would still be nice to replace "\x22\x22" with "\xA7 \xA7" in one step);
but for some reason the process stops after a few lines with:

Code:

JScript runtime error: Ungültiger Prozeduraufruf oder ungültiges Argument
which means "Invalid procedure call or argument".

Could this be a bug, or a problem with the source MXF file? /M does not change the outcome...
Attachment: X.zip (1.23 MiB)

ctm
Posts: 5
Joined: 11 Apr 2020 17:12

#454 Post by ctm » 15 Apr 2020 14:32

So the line that crashes the script contains the following characters:

Code:

... name="地デジ" isAuto...
If I replace them with something else, the script runs through...
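One plausible explanation, offered as an assumption rather than a confirmed diagnosis: without a Unicode output option, JREPL writes its output in the system's ANSI code page. A character that this code page cannot represent, such as 地デジ on a Western European system, may then raise exactly this "invalid procedure call or argument" error. That would also fit the later fix of writing the output as UTF-8. Python shows the same asymmetry when encoding explicitly. Here cp1252 is assumed to be the ANSI code page:

```python
# Western European ANSI code page, assumed here for illustration.
ANSI = "cp1252"

# "§" exists in cp1252 as the single byte 0xA7 ...
print("§".encode(ANSI))  # b'\xa7'

# ... but these Japanese characters have no cp1252 representation at all.
try:
    "地デジ".encode(ANSI)
    representable = True
except UnicodeEncodeError:
    representable = False
print(representable)  # False

# A UTF-8 output encoding (analogous to JREPL's |UTF-8 output) has no such gap.
print("地デジ".encode("utf-8").decode("utf-8") == "地デジ")  # True
```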

ctm
Posts: 5
Joined: 11 Apr 2020 17:12

#455 Post by ctm » 16 Apr 2020 17:04

So, I solved my problem well enough by using:

Code:

 call _jrepl.bat "! \x22\x22 \x22" "_ \xA7_\xA7 \xA7" /t " " /xseq /UTF /f "%Lineup%|UTF-8|NB" /o "%dr%___lineup.mx|UTF-8|NB"
I traded "§ §" for "§_§", which is no problem;
still, I'm curious why "§" creates so much trouble...
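The one-run version can be illustrated in Python under one assumption: that the search terms given to /T are tried as an ordered alternation, like a plain regex `a|b|c`. In that case the earlier alternative wins at each position. The doubled quote `\x22\x22` therefore has to be listed before the single `\x22`, as it is in the command above. Otherwise each quote of a pair is consumed on its own. The `§_§` replacement mirrors the command's use of `_` in place of the space delimiter.

```python
import re

# Replacement values keyed by the matched text.
mapping = {"!": "_", '""': "§_§", '"': "§"}
line = 'a!b ""c"'


def apply(pattern):
    return re.sub(pattern, lambda m: mapping[m.group(0)], line)


# Longer alternative first: "" is matched as one unit.
print(apply(r'!|""|"'))  # a_b §_§c§
# Shorter alternative first: each quote is matched on its own.
print(apply(r'!|"|""'))  # a_b §§c§
```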

syrist
Posts: 2
Joined: 27 May 2020 07:42

#456 Post by syrist » 27 May 2020 07:51

JREPL is great and I'm going to use it all the time. It already massively improved my workflow for processing data.

One question... is there a way to use JREPL to strip all line feeds from a text file? So I can go from:

Code:

This
is
a
test
to

Code:

Thisisatest
I tried this command line but it didn't work:

Code:

jrepl "\r" "" /xseq /f INPUT.txt /o OUTPUT.txt

Squashman
Expert
Posts: 4160
Joined: 23 Dec 2011 13:59

#457 Post by Squashman » 27 May 2020 09:40

\r is a carriage return
\n is a linefeed.

In previous versions of JREPL, if you wanted to remove one of these, you needed to use the /M option as well.

But in February, Dave added a new /EOL option to specify the end-of-line terminator used when writing output lines.
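Here is why `"\r"` alone appeared to do nothing. By default JREPL processes the file line by line: the line terminator is removed before the regex sees the line and added back when the line is written. The following is a simplified Python model of the two modes, using the example text from the question. It illustrates the behaviour and is not JREPL's implementation:

```python
import re

data = "This\r\nis\r\na\r\ntest\r\n"

# Line mode (rough model): the terminator is removed before the regex runs
# and re-appended afterwards, so a search for \r or \n never sees it.
line_mode = "".join(re.sub(r"\r", "", line) + "\r\n"
                    for line in data.split("\r\n")[:-1])

# Whole-file mode (like /M): the regex sees the terminators and can remove them.
whole_file = re.sub(r"\r?\n", "", data)

print(line_mode == data)  # True
print(whole_file)         # Thisisatest
```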

syrist
Posts: 2
Joined: 27 May 2020 07:42

#458 Post by syrist » 27 May 2020 11:47

Thanks, that worked.

Another quick question... I have a text file that contains pipe | symbols that I want to remove. I don't see an escape code for pipes... so I just tried this, which didn't work:

Code:

JREPL "|" "" /xseq /m /f TEST2.txt /o TEST2b.txt
Any ideas for this one?

dbenham
Expert
Posts: 2378
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#459 Post by dbenham » 27 May 2020 14:22

Pipe is a regex metacharacter that means alternation (basically an OR operator). Like any metacharacter, you escape it with a backslash.

The /XSEQ and /M options aren't needed. So:

Code:

JREPL "\|" "" /f TEST2.txt /o TEST2b.txt
The other option is to use the /L literal option, so the search is treated as a literal string instead of a regular expression:

Code:

JREPL "|" "" /l /f TEST2.txt /o TEST2b.txt
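The unescaped `"|"` silently did nothing, and a quick check shows why. As a regex it is an alternation of two empty patterns, so it matches the empty string at every position, and replacing those empty matches with "" changes nothing. Python's `re` behaves the same way here. In this sketch, `re.escape` plays a role similar to JREPL's /L option:

```python
import re

text = "a|b|c"

# Unescaped: "|" means (empty) OR (empty), which matches zero-width
# everywhere; replacing empty matches with "" leaves the text unchanged.
print(re.sub("|", "", text))    # a|b|c

# An escaped pipe is a literal character.
print(re.sub(r"\|", "", text))  # abc

# Escaping the whole search string treats it literally, much like /L.
print(re.sub(re.escape("|"), "", text))  # abc
```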

Dave Benham

Squashman
Expert
Posts: 4160
Joined: 23 Dec 2011 13:59

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#460 Post by Squashman » 01 Jun 2020 18:16

dbenham wrote (20 Nov 2014 17:24):
"VBS and JScript strings are limited to 2GB, which is where the restrictions come from. The /M option must load the entire file into a single string."

Using the /M option has been the only way I know of to strip a carriage return out of the middle of a record. I just got a bunch of crap data thrown at me, with a lot of records that have a CR in the middle, while the actual line ending is CRLF. The file is 4 GB, so the /M option is out.

dbenham
Expert
Posts: 2378
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#461 Post by dbenham » 01 Jun 2020 20:37

You should have no problem removing \r (carriage return) in the middle of a line without the /M option. You can remove all \r that are not part of the standard \r\n end of line with the following:

Code:

call jrepl "\r" "" /f "file.txt" /o -
As each line is read, the \r\n end of line is automatically stripped by ReadLine, leaving only naked \r. The find/replace is then able to remove those unwanted \r. Then when each line is written as output using WriteLine, the \r\n ends of line are restored.

Only removal of the end of line (\n or \r\n) required the /M option.
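Below is a rough Python model of that line-by-line behaviour. The split on `\r\n` stands in for ReadLine and the appended `\r\n` for WriteLine. This is an illustration, not JREPL's code. In JREPL's line mode only one line is held in memory at a time, which is why the approach suits a file too large for /M. The model uses a small in-memory string for simplicity:

```python
import re

# A record with a stray CR in the middle, terminated by a normal CRLF.
data = "id=1\rcontinued\r\nid=2\r\n"

# Split on the real CRLF terminators (like ReadLine), strip remaining
# naked \r from each line, then re-append CRLF (like WriteLine).
lines = data.split("\r\n")[:-1]
cleaned = "".join(re.sub(r"\r", "", line) + "\r\n" for line in lines)

print(repr(cleaned))  # 'id=1continued\r\nid=2\r\n'
```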

But with the relatively new /EOL option, you should be able to selectively remove the end of line without /M, using something like this:

Code:

call jrepl "$" "\r\n" /xseq /eol "" /exc "/regexMatchingLinesWhereNewlineShouldBeStripped/" /f "hugeFile.txt" /o -
The empty /EOL strips the end of line from each output line, but the find/replace restores the end of line. Except the /EXC regular expression excludes those matching lines from the find/replace, so they are the only ones where the end of line is actually stripped.

The /EXC regex need not match the entire line, just enough to selectively identify only the lines where you want the EOL to be stripped.
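Here is a Python model of that command to make the logic concrete. The function `join_selected`, the `,$` exclusion regex and the sample records are all hypothetical. They represent records that were split after a trailing comma:

```python
import re


# Rough model of: jrepl "$" "\r\n" /xseq /eol "" /exc "/,$/"
# - /EOL "" : no terminator is written after any output line
# - the find/replace appends \r\n at the end of each line ...
# - ... except on lines matched by the /EXC regex, which are left alone,
#   so only those lines lose their end of line.
def join_selected(lines, exclude):
    out = []
    for line in lines:
        if re.search(exclude, line):
            out.append(line)           # excluded: EOL stays stripped
        else:
            out.append(line + "\r\n")  # replaced: "$" -> "\r\n"
    return "".join(out)


# Hypothetical data: the first record was split after a trailing comma.
lines = ["id=1,name=Ann,", "city=Oslo", "id=2,name=Bob,city=Rome"]
print(repr(join_selected(lines, r",$")))
# 'id=1,name=Ann,city=Oslo\r\nid=2,name=Bob,city=Rome\r\n'
```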

Dave Benham

Ridim
Posts: 1
Joined: 24 Jun 2020 06:40

#462 Post by Ridim » 24 Jun 2020 07:24

Hello,

It would be great if I could get some help, because I'm stuck with this powerful script.

Here is a sample of the JSON:

Code:

{"announcement":{"id":32322,"category":0,"layout_type":0,"title":"Campagne !","banner":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\"Hello ! \r\n Join our community","start_at":1592784000,"new_appear_at":0,"is_new":false,"announcement_tab_id":1,"bodies":[{"layout_type":0,"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=ggI1Aho0G2cdGEcn86-5b8gf~h1sPa0jwefexQO1hPLK~lAOxxsm8cuWS~2u8SUeNYd9zLUYPjqHQLyr5zA0vl08mBIPrl6v6IL9vER~5g8ununnxwyLAbRFIKXQtG1yaZj0QJBel9TJEspB0WTutbu5cDBW4~3Tb3jK7NKCLQt1ze4ZuTGszVmehz7V4SGuPtxyeQnOR1u1HuMbcsOHRfKJKdJ~X~hDAWDXSbXhXk3UrbpJngEO3nrYXDye93n0dtUDSGjpZ5Ebjq-iHCJ-9S4Hr0XF1KHvkYj6YPoz~Ql1W~qqbTX4PsX-3c34cXWt0dmDj4iACa2fVpb~p-eIWg__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}{shadow: #FFCC00, 3, 4, 5}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2 {color}\r\n{shadow: #FFCC00, 3, 4, 5}{size : 20}News\" {size}\r\n{size : 28}available !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"} 
Here is a cleaned-up version that is easier to read:

Code:

{"announcement":
{"id":32322,
"category":0,
"layout_type":0,
"title":"Campagne !",
"image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q",
"description":"\"Hello ! \r\n Join our community",
"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=ggI1Aho0G2cdGEcn86-5b8gf~h1sPa0jwefexQO1hPLK~lAOxxsm8cuWS~2u8SUeNYd9zLUYPjqHQLyr5zA0vl08mBIPrl6v6IL9vER~5g8ununnxwyLAbRFIKXQtG1yaZj0QJBel9TJEspB0WTutbu5cDBW4~3Tb3jK7NKCLQt1ze4ZuTGszVmehz7V4SGuPtxyeQnOR1u1HuMbcsOHRfKJKdJ~X~hDAWDXSbXhXk3UrbpJngEO3nrYXDye93n0dtUDSGjpZ5Ebjq-iHCJ-9S4Hr0XF1KHvkYj6YPoz~Ql1W~qqbTX4PsX-3c34cXWt0dmDj4iACa2fVpb~p-eIWg__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q",
"description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}{shadow: #FFCC00, 3, 4, 5}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2 {color}\r\n{shadow: #FFCC00, 3, 4, 5}{size : 20}News\" {size}\r\n{size : 28}available !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"} 
I need this converted to HTML. More precisely, I want to remove all the lines except "image" and "description".

I don't know how to do this, or whether it's possible to remove all entries except the ones that contain these 2 words.

I also want to replace:
"image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q",
with
<img src="https://test.com/pics1.png" /><br>
So the command should be something like this:

JREPL " "image":"" "" /f myfile.json /o -
and
JREPL "?Expires[all_chars_until_the last_",]"," "" /f myfile.json /o -


Same for this line; I need a cleanup like this:
"description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2\r\nNews\" \r\navailable !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"}
into

Code:

<span style="color: #ff9900;"><strong>Check 1-2 News available !<br> JREPL ROCKS ! <br></strong></span> 


So the command should be something like this:

JREPL " "description"[all_chars_until_the last_}_of_{color : #4B76F2}]" "" /f myfile.json /o -
and
JREPL "\r\n\r\n" "<br>" /f myfile.json /o -
and
JREPL "\r\n" "" /f myfile.json /o -
and
JREPL "{size}{outline}{shadow}{color}{center}" "" /f myfile.json /o -

Of course, none of these commands work, probably because I'm not using the correct syntax.

My first question is: is it possible to do such things with this script?

Thanks in advance,

Best regards!

Frits
Posts: 2
Joined: 06 Jul 2020 14:14

#463 Post by Frits » 06 Jul 2020 14:44

The delete duplicates code in post 17 isn't working anymore.

Code:

jrepl "" "" /N 10 /f "inputFile" ^
    | sort /+11 ^
    | jrepl ".*?:(.*)$"  "x=p;p=$1;($1==x?false:$src);" /jmatch /jbeg "var p='',x" ^
    | sort ^
    | jrepl "^.*?:" "" > "outputFile"
I tested with JREPL 8.5 on Windows 10 and only "false" lines are reported.
The post 17 code works with JREPL 1.0.

To overcome the problem I adapted the delete duplicates code to a normal replace statement.
Trailing blanks are removed and blank lines are kept.

Code:

call jrepl "" "" /N 10 /f input.txt ^
   | jrepl  " +$" "" ^
   | sort /+11 ^
   | jrepl "^(.*?:)(.{2,}?\n)(?:.*?:\2.*?)+" "$1$2" /m ^
   | sort ^
   | jrepl "^.*?:" "" > results.txt
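The technique both pipelines rely on can be sketched in Python. This models the idea and is not a translation of either command line. The steps are: number each line, sort by content so duplicates become adjacent, keep the first (lowest-numbered) line of each run, sort back by line number, and drop the numbers. Following Frits's version, trailing blanks are stripped and blank lines are kept. The comparison here is case-sensitive, whereas Windows `sort` orders lines without regard to case, so mixed-case input may be grouped differently. `dedupe_keep_order` is a hypothetical name.

```python
def dedupe_keep_order(lines):
    # Number lines and strip trailing blanks (like jrepl " +$" "").
    numbered = [(i, line.rstrip(" ")) for i, line in enumerate(lines)]
    # Stable sort by content: duplicates become adjacent, earliest first.
    numbered.sort(key=lambda pair: pair[1])
    kept = []
    previous = None
    for index, text in numbered:
        # Keep blank lines; drop a non-blank line equal to the one before it.
        if text == "" or text != previous:
            kept.append((index, text))
        previous = text
    # Restore original order by line number, then drop the numbers.
    kept.sort()
    return [text for _, text in kept]


print(dedupe_keep_order(["b", "a ", "", "b", "a", "", "c"]))
# ['b', 'a', '', '', 'c']
```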

Frits
Posts: 2
Joined: 06 Jul 2020 14:14

#464 Post by Frits » 11 Jul 2020 07:30

Hi Ridim,

Your cleaned version didn't exactly match your original input.
I have included the input file I used.

I am no expert so the conversion might be done more efficiently.

Input used:

Code:

{"announcement":{"id":32322,"category":0,"layout_type":0,"title":"Campagne !","image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\"Hello ! \r\n Join our community","start_at":1592784000,"new_appear_at":0,"is_new":false,"announcement_tab_id":1,"bodies":[{"layout_type":0,"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=ggI1Aho0G2cdGEcn86-5b8gf~h1sPa0jwefexQO1hPLK~lAOxxsm8cuWS~2u8SUeNYd9zLUYPjqHQLyr5zA0vl08mBIPrl6v6IL9vER~5g8ununnxwyLAbRFIKXQtG1yaZj0QJBel9TJEspB0WTutbu5cDBW4~3Tb3jK7NKCLQt1ze4ZuTGszVmehz7V4SGuPtxyeQnOR1u1HuMbcsOHRfKJKdJ~X~hDAWDXSbXhXk3UrbpJngEO3nrYXDye93n0dtUDSGjpZ5Ebjq-iHCJ-9S4Hr0XF1KHvkYj6YPoz~Ql1W~qqbTX4PsX-3c34cXWt0dmDj4iACa2fVpb~p-eIWg__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}{shadow: #FFCC00, 3, 4, 5}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2 {color}\r\n{shadow: #FFCC00, 3, 4, 5}{size : 20}News\" {size}\r\n{size : 28}available !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"}

Code:

:: step1 split lines
:: step2 only keep lines with image or description
:: step3 process image lines
:: step4 replace \r\n\r\n by <br>
:: step5 blank strings \r\n, \" and {...}
:: step6 process description lines
:: step7 remove empty lines

jrepl "({*?\q[^\q]*?\q:)" "\n$1" /f input.txt /xseq ^
  | jrepl "\qimage|description\q" "$src" /xseq /jmatch  ^
  | jrepl "(?:{*?\qimage\q:)(?:(.*?)\?Expires)(?:.*?)$" "<img src=$1\q /><br>" /xseq ^
  | jrepl "\\r\\n\\r\\n" "<br>" ^
  | jrepl "\\r\\n|\\\q|{.*?}" "" /xseq ^
  | jrepl "(?:{*?\qdescription\q:\q)(.*?)(?:\q*?(?:,|}))\r*?$" "<span style=\qcolor: #ff9900;\q><strong>$1</strong><\/span>" /xseq ^
  | jrepl "^" ""
Output:

Code:

<img src="https://test.com/pics1.png" /><br>
<span style="color: #ff9900;"><strong>Hello !  Join our community</strong><\/span>
<img src="https://test.com/pics2.png" /><br>
<span style="color: #ff9900;"><strong>Check 1-2 News available !<br>JREPL ROCKS !<br></strong><\/span>
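For comparison, the same filter and clean-up steps can be sketched in Python. This is an alternative for illustration, not a port of the pipeline above. It assumes every value of interest appears as `"image":"..."` or `"description":"..."` with only backslash-escaped quotes inside. The sample is a shortened, hypothetical version of the input, and `to_html` is a made-up helper name:

```python
import re

# Hypothetical, shortened sample of the input (raw strings keep the JSON
# escapes such as \" and \r\n as literal backslash sequences).
sample = (
    r'{"announcement":{"id":32322,"title":"Campagne !",'
    r'"image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=abc",'
    r'"description":"\"Hello ! \r\n Join our community",'
    r'"bodies":[{"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=def",'
    r'"description":"\r\n{size : 30}{color : #4B76F2}Check 1-2 {color}\r\n'
    r'{size : 20}News\" {size}\r\n{size : 28}available !{size}{color}\r\n\r\n'
    r'JREPL ROCKS !\r\n\r\n"}]}}'
)

# "image" or "description" followed by a JSON string value that may
# contain backslash escapes (\" \r \n \u0026 ...).
FIELD = re.compile(r'"(image|description)":"((?:[^"\\]|\\.)*)"')

# Literal \r\n sequences, escaped quotes, and {...} formatting tags.
NOISE = re.compile(r'\\r\\n|\\"|\{[^}]*\}')


def to_html(text):
    html = []
    for name, value in FIELD.findall(text):
        if name == "image":
            url = value.split("?", 1)[0]  # drop ?Expires=... and the rest
            html.append(f'<img src="{url}" /><br>')
        else:
            value = value.replace(r"\r\n\r\n", "<br>")  # paragraph breaks
            value = NOISE.sub("", value)
            html.append('<span style="color: #ff9900;"><strong>'
                        f'{value}</strong></span>')
    return html


for line in to_html(sample):
    print(line)
```

Pulling each value out with one regex that understands `\"` escapes avoids having to split the text into lines first.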
