Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 12 Apr 2020 10:45
by dbenham
Why are you using word boundary \b anchors? That is why the ! after = is not replaced.
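
To see why the \b anchors get in the way, here is a small sketch in plain JavaScript (not JREPL itself, though its regex syntax is close to the JScript engine JREPL uses). The sample strings are made up for illustration. \b only matches between a word character and a non-word character, and = and ! are both non-word characters:

```javascript
// \b matches only at a boundary between a word character [A-Za-z0-9_]
// and a non-word character. The sample strings below are hypothetical.
function replaceBang(text, useBoundary) {
  const re = useBoundary ? /\b!/g : /!/g;
  return text.replace(re, "_");
}

console.log(replaceBang("name!", true));   // name_  ("e" then "!" is a boundary)
console.log(replaceBang("x=!", true));     // x=!    ("=" then "!" is not a boundary)
console.log(replaceBang("x=!", false));    // x=_    (no anchor, always replaced)
```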

Why are you using delayed expansion? Delayed expansion won't help if the path contains spaces. Your paths should always be quoted, just in case there is a space. Without delayed expansion you don't have to worry about escaping !

I assume you are using /M because of the potential for null bytes in your content. If that is not an issue, then you can drop the /M.

Note that " cannot be passed as an argument to any CScript program like JREPL.BAT. You must use the \x22 escape sequence, or if you add the /XSEQ option you can use the non-standard \q escape sequence.

If you want to substitute _ for all 3 problem characters, you can simply do:

Code: Select all

call jrepl "[!\q:]" "_" /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
If you want to substitute a different character for each problem character, then you can use the /T option without any delimiter. Assume you want _ for !, ' for ", and # for :, then:

Code: Select all

call jrepl "!\q:" "_'#" /t "" /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
If you wanted to encode the problem characters in a way that can be reversed, then you can use the /T option with a delimiter - I'll use space as a delimiter. You could do something like _ --> _U, ! --> _B, " --> _Q, and : --> _C

Code: Select all

call jrepl "_ ! \q :" "_U _B _Q _C" /t " " /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
Decoding is just as simple:

Code: Select all

call jrepl "_U _B _Q _C" "_ ! \q :" /t " " /xseq /m /f "%Linup%" /o "%dr%__lineup.mx"
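
The encode/decode round trip can be sketched in plain JavaScript. This is only an illustration of the idea, not necessarily how JREPL implements /T internally: translate is a hypothetical helper, and it treats every term as a literal string. The property that matters is that all terms are handled in a single pass, so an underscore produced by one substitution is never re-scanned by another.

```javascript
// Hypothetical helper: replace each search term with the replacement at the
// same position, scanning the text once. Terms are treated as literals here.
function translate(text, searchTerms, replaceTerms) {
  const escaped = searchTerms.map(t => t.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"));
  const re = new RegExp(escaped.join("|"), "g");
  return text.replace(re, match => replaceTerms[searchTerms.indexOf(match)]);
}

const plain = ["_", "!", "\"", ":"];
const coded = ["_U", "_B", "_Q", "_C"];

const encode = text => translate(text, plain, coded);
const decode = text => translate(text, coded, plain);

const sample = "a_b!\"c:";                       // made-up sample text
console.log(encode(sample));                     // a_Ub_B_Qc_C
console.log(decode(encode(sample)) === sample);  // true
```

Because _ itself is encoded as _U, text that already contains something like _B survives the round trip unchanged.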

All that being said, I am concerned about your whole approach. I should think parsing your file with batch would only be plausible if all audio/video content is referenced by URL within your mxf. If any of the media content is embedded in your mxf then I should think all bets are off. Even if it can be done, batch seems like the worst possible option. Depending on what you are trying to do, JREPL might be a good choice for doing your actual parsing.


Dave Benham

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 12 Apr 2020 14:37
by ctm
Thanks for the very extensive and detailed answer!
the \b thing was basically a silly copy-paste error; I saw it somewhere and assumed it was a kind of lead-in to the string...

the mxf is not a media file but a representation of a database, sort of (the channel listing of the windows media center)...
anyways, it works, thanks;

I'm using now (the colons turned out not to be critical):
call jrepl.bat "! \q" "_ #" /t "" /XSEQ /f "%Lineup%" /o "%dr%___lineup.mx"

( with the /m option the process seems to be faster... can that be?)

One thing and one question:
when I use:
"! \q" "_ §" with § instead of #, I get this ("xBA"):
[Attachment: 1.jpg]
so, § also seems to be "special"...

question: in the file also double-double quotes occur ("") which I would like to resolve to '§ §', or now '# #' (with a space in between)...
is there a way to do it in one run? now I simply do a second run, exchanging ## for # #...
like "! \q\q \q" "_ '# #' #"
so that the double quotes are replaced by the space-separated string and the single-quotes by the #...

Thanks, in any case I get done what I need !

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 14 Apr 2020 15:20
by ctm
ok, so I'm progressing, in a sense:

if I apply the following:

Code: Select all

 call _jrepl.bat "! \x22" "_ \xA7" /t "" /xseq /f "%Lineup%|UTF-8|NB" /o "%dr%___lineup.mx"
I get from the original.mxf the resulting "___lineup.mx" without BOM, which makes some things easier (what would still be nice is to replace "\x22\x22" by "\xA7 \xA7" in one step);
but for some reason the process stops after a few lines with:

Code: Select all

JScript runtime error: Ungültiger Prozeduraufruf oder ungültiges Argument
which means "invalid procedure call or argument"

could be a bug? or a problem with the source-mxf-file; /M does not change the outcome...

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 15 Apr 2020 14:32
by ctm
so the line which crashes the script contains the following characters:

Code: Select all

... name="地デジ" isAuto...
if I replace them with something else, the script runs through...

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 16 Apr 2020 17:04
by ctm
so, I solved my problem sufficiently by using:

Code: Select all

 call _jrepl.bat "! \x22\x22 \x22" "_ \xA7_\xA7 \xA7" /t " " /xseq /UTF /f "%Lineup%|UTF-8|NB" /o "%dr%___lineup.mx|UTF-8|NB"
I trade-off "§ §" for "§_§", which is no problem;
still, I'm curious why "§" creates so much trouble...

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 27 May 2020 07:51
by syrist
JREPL is great and I'm going to use it all the time. It already massively improved my workflow for processing data.

One question... is there a way to use JREPL to strip all line feeds from a text file? So I can go from:

Code: Select all

This
is
a
test
to

Code: Select all

Thisisatest
I tried this command line but it didn't work:

Code: Select all

jrepl "\r" "" /xseq /f INPUT.txt /o OUTPUT.txt

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 27 May 2020 09:40
by Squashman
\r is a carriage return
\n is a linefeed.

In previous versions of JREPL, if you wanted to remove one of these you needed to use the /M option as well.

But in February Dave put in a new /EOL option to specify what end of line terminators you want to use.
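
A quick way to see the difference, sketched in plain JavaScript rather than JREPL (the sample text assumes Windows-style \r\n line endings): removing only \r leaves the \n characters in place, so the lines stay separate.

```javascript
// Sample text with Windows line endings (an assumption for this sketch).
const input = "This\r\nis\r\na\r\ntest\r\n";

// Removing only the carriage returns leaves the line feeds behind.
const onlyCR = input.replace(/\r/g, "");
console.log(JSON.stringify(onlyCR));   // "This\nis\na\ntest\n"

// Removing the optional \r together with the \n joins everything.
const joined = input.replace(/\r?\n/g, "");
console.log(joined);                   // Thisisatest
```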

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 27 May 2020 11:47
by syrist
Thanks, that worked.

Another quick question... I have a text file that contains pipe | symbols that I want to remove. I don't see an escape code for pipes... so I just tried this which didn't work:

Code: Select all

JREPL "|" "" /xseq /m /f TEST2.txt /o TEST2b.txt
Any ideas for this one?

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 27 May 2020 14:22
by dbenham
Pipe is a regex meta-character that means alternation (basically an OR operator). Like any meta character, you escape it with a backslash.

The /XSEQ and /M options aren't needed. So

Code: Select all

JREPL "\|" "" /f TEST2.txt /o TEST2b.txt
The other option is to use the /L literal option, so the search is treated as a literal string instead of a regular expression

Code: Select all

JREPL "|" "" /l /f TEST2.txt /o TEST2b.txt
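
The same behaviour can be reproduced in plain JavaScript, whose regex syntax is close to the JScript engine JREPL relies on (the sample text is invented for the example). An unescaped | is an alternation between two empty patterns, so it only ever matches the empty string and nothing is removed.

```javascript
const text = "a|b|c";   // made-up sample

// Unescaped: "empty OR empty" matches empty strings only, so nothing changes.
console.log(text.replace(/|/g, ""));     // a|b|c

// Escaped: matches the literal pipe character.
console.log(text.replace(/\|/g, ""));    // abc

// Literal search without any regex, similar in spirit to /L.
console.log(text.split("|").join(""));   // abc
```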

Dave Benham

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

Posted: 01 Jun 2020 18:16
by Squashman
dbenham wrote:
20 Nov 2014 17:24
VBS and JScript strings are limited to 2GB, which is where the restrictions come from. The /M option must load the entire file into a single string.


Dave Benham
And using the /M option has been the only way I know of to strip a Carriage Return out of the middle of a record. I just got a bunch of crap data thrown at me, with a lot of records that have a CR in the middle of the record while the actual line ending is CRLF. The file is 4 GB, so the /M option is out.

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 01 Jun 2020 20:37
by dbenham
You should have no problem removing \r (carriage return) in the middle of a line without the /M option. You can remove all \r that are not part of the standard \r\n end of line with the following:

Code: Select all

call jrepl "\r" "" /f "file.txt" /o -
As each line is read, the \r\n end of line is automatically stripped by ReadLine, leaving only naked \r. The find/replace is then able to remove those unwanted \r. Then when each line is written as output using WriteLine, the \r\n ends of line are restored.
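
The read/replace/write cycle described here can be sketched in JavaScript. This is a simplified model, not JREPL's actual code: it splits on \r\n in the way ReadLine is described above, removes any remaining \r, and writes each line back with \r\n.

```javascript
// Simplified model of line-by-line processing; stripStrayCR is a
// hypothetical name used only for this sketch.
function stripStrayCR(text) {
  return text
    .split("\r\n")                          // line ends are stripped on read
    .map(line => line.replace(/\r/g, ""))   // only naked \r remain to be removed
    .join("\r\n");                          // line ends are restored on write
}

console.log(JSON.stringify(stripStrayCR("rec\rord 1\r\nrecord 2\r\n")));
// "record 1\r\nrecord 2\r\n"
```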

Only removal of the end of line (\n or \r\n) required the /M option.

But with the relatively new /EOL option you should be able to selectively remove end of line without /M with something like this

Code: Select all

call jrepl "$" "\r\n" /xseq /eol "" /exc "/regexMatchingLinesWhereNewlineShouldBeStripped/" /f "hugeFile.txt" /o -
The empty /EOL strips the end of line from each output line, but the find/replace restores the end of line. Except the /EXC regular expression excludes those matching lines from the find/replace, so they are the only ones where the end of line is actually stripped.

The /EXC regex need not match the entire line, just enough to selectively identify only the lines where you want the EOL to be stripped.
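
In other words, every line gets its line ending back except the ones matched by /EXC. A rough JavaScript model of that logic follows; joinSelected and the sample exclusion pattern are assumptions made for the illustration, not part of JREPL.

```javascript
// Lines matching `exclude` are written without a line ending, so they are
// joined to the following line; all other lines get \r\n appended.
function joinSelected(lines, exclude) {
  return lines
    .map(line => (exclude.test(line) ? line : line + "\r\n"))
    .join("");
}

// Hypothetical rule: strip the end of line after any line ending in a comma.
const out = joinSelected(["a,", "b", "c"], /,$/);
console.log(JSON.stringify(out));   // "a,b\r\nc\r\n"
```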

Dave Benham

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 24 Jun 2020 07:24
by Ridim
Hello,

It would be great if I could get some help, because I'm stuck with this powerful script.

Here is a sample of the input:

Code: Select all

{"announcement":{"id":32322,"category":0,"layout_type":0,"title":"Campagne !","banner":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\"Hello ! \r\n Join our community","start_at":1592784000,"new_appear_at":0,"is_new":false,"announcement_tab_id":1,"bodies":[{"layout_type":0,"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=ggI1Aho0G2cdGEcn86-5b8gf~h1sPa0jwefexQO1hPLK~lAOxxsm8cuWS~2u8SUeNYd9zLUYPjqHQLyr5zA0vl08mBIPrl6v6IL9vER~5g8ununnxwyLAbRFIKXQtG1yaZj0QJBel9TJEspB0WTutbu5cDBW4~3Tb3jK7NKCLQt1ze4ZuTGszVmehz7V4SGuPtxyeQnOR1u1HuMbcsOHRfKJKdJ~X~hDAWDXSbXhXk3UrbpJngEO3nrYXDye93n0dtUDSGjpZ5Ebjq-iHCJ-9S4Hr0XF1KHvkYj6YPoz~Ql1W~qqbTX4PsX-3c34cXWt0dmDj4iACa2fVpb~p-eIWg__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}{shadow: #FFCC00, 3, 4, 5}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2 {color}\r\n{shadow: #FFCC00, 3, 4, 5}{size : 20}News\" {size}\r\n{size : 28}available !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"} 
Here is a cleaned-up version that is easier to read:

Code: Select all

{"announcement":
{"id":32322,
"category":0,
"layout_type":0,
"title":"Campagne !",
"image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q",
"description":"\"Hello ! \r\n Join our community",
"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=ggI1Aho0G2cdGEcn86-5b8gf~h1sPa0jwefexQO1hPLK~lAOxxsm8cuWS~2u8SUeNYd9zLUYPjqHQLyr5zA0vl08mBIPrl6v6IL9vER~5g8ununnxwyLAbRFIKXQtG1yaZj0QJBel9TJEspB0WTutbu5cDBW4~3Tb3jK7NKCLQt1ze4ZuTGszVmehz7V4SGuPtxyeQnOR1u1HuMbcsOHRfKJKdJ~X~hDAWDXSbXhXk3UrbpJngEO3nrYXDye93n0dtUDSGjpZ5Ebjq-iHCJ-9S4Hr0XF1KHvkYj6YPoz~Ql1W~qqbTX4PsX-3c34cXWt0dmDj4iACa2fVpb~p-eIWg__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q",
"description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}{shadow: #FFCC00, 3, 4, 5}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2 {color}\r\n{shadow: #FFCC00, 3, 4, 5}{size : 20}News\" {size}\r\n{size : 28}available !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"} 
I need this converted to HTML. To be more precise, I want to remove all the lines except the "image" and "description" ones.

I don't know how to do this, or whether it's possible to remove all entries except the ones that contain these 2 words.

I also want to replace:
"image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q",
by
<img src="https://test.com/pics1.png" /><br>
So the command should be something like this :

JREPL " "image":"" "" /f myfile.json /o -
and
JREPL "?Expires[all_chars_until_the last_",]"," "" /f myfile.json /o -


Same for this line, I need a cleanup like this :
"description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2\r\nNews\" \r\navailable !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"}
by

Code: Select all

<span style="color: #ff9900;"><strong>Check 1-2 News available !<br> JREPL ROCKS ! <br></strong></span> 


So the command should be something like this :

JREPL " "description"[all_chars_until_the last_}_of_{color : #4B76F2}]" "" /f myfile.json /o -
and
JREPL "\r\n\r\n" "<br>" /f myfile.json /o -
and
JREPL "\r\n" "" /f myfile.json /o -
and
JREPL "{size}{outline}{shadow}{color}{center}" "" /f myfile.json /o -

Of course none of these commands work, certainly because I'm not using the correct syntax.

My first question is: is it possible to do such things with this script?

Thanks in advance,

Best regards!

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 06 Jul 2020 14:44
by Frits
The delete duplicates code in post 17 isn't working anymore.

Code: Select all

jrepl "" "" /N 10 /f "inputFile" ^
    | sort /+11 ^
    | jrepl ".*?:(.*)$"  "x=p;p=$1;($1==x?false:$src);" /jmatch /jbeg "var p='',x" ^
    | sort ^
    | jrepl "^.*?:" "" > "outputFile"
I tested with JREPL 8.5 on Windows 10 and only "false" lines are reported.
The post 17 code is working with JREPL 1.0.

To overcome the problem I adapted the delete duplicates code to a normal replace statement.
Trailing blanks are removed and blank lines are kept.

Code: Select all

call jrepl "" "" /N 10 /f input.txt ^
   | jrepl  " +$" "" ^
   | sort /+11 ^
   | jrepl "^(.*?:)(.{2,}?\n)(?:.*?:\2.*?)+" "$1$2" /m ^
   | sort ^
   | jrepl "^.*?:" "" > results.txt
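
For comparison, the number / sort / drop duplicates / re-sort / strip idea behind this pipeline can be sketched in JavaScript. It is an illustration only: dedupeKeepOrder is a hypothetical helper, and keeping the first occurrence of each line is an assumption of the sketch. Like the command above, it removes trailing blanks and keeps blank lines.

```javascript
function dedupeKeepOrder(lines) {
  // "Number" each line and strip trailing blanks.
  const decorated = lines.map((text, index) => ({ index, text: text.replace(/ +$/, "") }));
  // Sort by content so duplicates become neighbours (ties keep original order).
  decorated.sort((a, b) => (a.text < b.text ? -1 : a.text > b.text ? 1 : a.index - b.index));
  // Keep blank lines and the first line of every run of equal lines.
  const kept = decorated.filter(
    (item, i) => item.text === "" || i === 0 || item.text !== decorated[i - 1].text
  );
  // Restore the original order and drop the numbering.
  kept.sort((a, b) => a.index - b.index);
  return kept.map(item => item.text);
}

console.log(dedupeKeepOrder(["b", "a ", "b", "", "a", ""]));
// [ 'b', 'a', '', '' ]
```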

Re: JREPL.BAT v8.5 - regex text processor with support for text highlighting and alternate character sets

Posted: 11 Jul 2020 07:30
by Frits
Hi Ridim,

Your cleaned version didn't exactly match your original input.
I have included the input file I have used.

I am no expert so the conversion might be done more efficiently.

Input used:

Code: Select all

{"announcement":{"id":32322,"category":0,"layout_type":0,"title":"Campagne !","image":"https://test.com/pics1.png?Expires=1592926851\u0026Signature=lBAIkBz3rrMHCBDHHPCnUk515Xoa~MtbB4Xsw4SCoFgrlkLojJjhNcOCrFx8LET3T1SQAVFYFYGYkrED2E5BiBpF48XJAWiN02RgyJY80JIeQDE2WTh04784OqnZRwQG3DkmmVfQNLjNxFAFeRWms~942r2qxIpQ-5tZojMVlNO9molhbGwnCwydXN4957r1s2EJ5LsRlHnxHySBB4nRZzAUCUFrPwrQBqFf0ZCkU-P0y-J80vcGTaCCI6TKmwvGZyeatEoPezqqiSFLwKgeF42z05Y-9kO5jbkOqHfgfAkBbohNn5pdNIfSHy44PuwFqex4XZlDnRofdqSZXZigKQ__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\"Hello ! \r\n Join our community","start_at":1592784000,"new_appear_at":0,"is_new":false,"announcement_tab_id":1,"bodies":[{"layout_type":0,"image":"https://test.com/pics2.png?Expires=1592926851\u0026Signature=ggI1Aho0G2cdGEcn86-5b8gf~h1sPa0jwefexQO1hPLK~lAOxxsm8cuWS~2u8SUeNYd9zLUYPjqHQLyr5zA0vl08mBIPrl6v6IL9vER~5g8ununnxwyLAbRFIKXQtG1yaZj0QJBel9TJEspB0WTutbu5cDBW4~3Tb3jK7NKCLQt1ze4ZuTGszVmehz7V4SGuPtxyeQnOR1u1HuMbcsOHRfKJKdJ~X~hDAWDXSbXhXk3UrbpJngEO3nrYXDye93n0dtUDSGjpZ5Ebjq-iHCJ-9S4Hr0XF1KHvkYj6YPoz~Ql1W~qqbTX4PsX-3c34cXWt0dmDj4iACa2fVpb~p-eIWg__\u0026Key-Pair-Id=APKAI5CNWVPOYRVLGC6Q","description":"\r\n{center: begin}{size : 30}{outline : #000000 , 2}{shadow: #FFCC00, 3, 4, 5}\"{shadow: #FFCC00, 3, 4, 5}{color : #4B76F2}Check 1-2 {color}\r\n{shadow: #FFCC00, 3, 4, 5}{size : 20}News\" {size}\r\n{size : 28}available !{size}{outline}{shadow}{color}{center}\r\n\r\nJREPL ROCKS !\r\n\r\n"}

Code: Select all

:: step1 split lines
:: step2 only keep lines with image or description
:: step3 process image lines
:: step4 replace \r\n\r\n by <br>
:: step5 blank strings \r\n, \" and {...}
:: step6 process description lines
:: step7 remove empty lines

jrepl "({*?\q[^\q]*?\q:)" "\n$1" /f input.txt /xseq ^
  | jrepl "\qimage|description\q" "$src" /xseq /jmatch  ^
  | jrepl "(?:{*?\qimage\q:)(?:(.*?)\?Expires)(?:.*?)$" "<img src=$1\q /><br>" /xseq ^
  | jrepl "\\r\\n\\r\\n" "<br>" ^
  | jrepl "\\r\\n|\\\q|{.*?}" "" /xseq ^
  | jrepl "(?:{*?\qdescription\q:\q)(.*?)(?:\q*?(?:,|}))\r*?$" "<span style=\qcolor: #ff9900;\q><strong>$1</strong><\/span>" /xseq ^
  | jrepl "^" ""
Output:

Code: Select all

<img src="https://test.com/pics1.png" /><br>
<span style="color: #ff9900;"><strong>Hello !  Join our community</strong><\/span>
<img src="https://test.com/pics2.png" /><br>
<span style="color: #ff9900;"><strong>Check 1-2 News available !<br>JREPL ROCKS !<br></strong><\/span>
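
Step 3 is the easiest one to check in isolation. Here is the same capture-and-rewrap idea in plain JavaScript, as a sketch; imageToHtml is a made-up helper and the shortened query string is sample data, not the real signature.

```javascript
function imageToHtml(line) {
  // Capture everything between "image":" and ?Expires, drop the rest.
  return line.replace(/^.*?"image":"(.*?)\?Expires.*$/, '<img src="$1" /><br>');
}

console.log(imageToHtml('"image":"https://test.com/pics1.png?Expires=1592926851\\u0026Signature=abc",'));
// <img src="https://test.com/pics1.png" /><br>
```

Lines that do not contain an "image" entry are returned unchanged, which is why the description lines need their own step.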

Re: JREPL.BAT v8.6 - regex text processor with support for text highlighting and alternate character sets

Posted: 30 Jul 2020 16:22
by dbenham
Here is JREPL version 8.6
JREPL8.6.zip

Summary of changes:

Code: Select all

    2020-07-30 v8.6: Added options to /K, /R, and /MATCH to count the number
                     of matches or rejects instead of printing them.
                     Added the counter /JScript variable.
                     Fixed /T Pig Latin example - $0 corrected to read $&
FIND /C "search" counts the number of lines that contain "search". But the types of searches are extremely limited.

I wanted to add a similar functionality to JREPL, but with all the additional sophisticated capabilities that JREPL offers.

JREPL "search" "" /L /K COUNT /F "file.txt" is the equivalent to FIND /C "search" "file.txt" - The number of lines that contain "search" is reported.

JREPL "search" "" /L /R COUNT /F "file.txt" is the equivalent to FIND /C /V "search" "file.txt" - The number of lines that do not contain "search" is reported.

Lastly there is a /MATCH version that counts the total number of matches rather than lines. A single line may contain multiple matches.
JREPL "search" "COUNT" /L /MATCH /F "file.txt" counts the total number of times "search" is found within the file.

Of course you don't have to use the /L literal option, you can do a regex search instead. And you can combine the counting with other options like /INC, /EXC and /RTN to name a few.

All three counting options utilize a counter JScript variable to keep track of the count. You are free to use this variable in your own JScript provided that you are not using one of the /K /R or /MATCH counting options. The counter variable is initialized to 0 at the start.
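
The difference between the three counts can be illustrated with a short JavaScript sketch. countStats is a hypothetical helper using a literal search, and the sample text is invented; it is meant to mirror the descriptions above, not JREPL's implementation.

```javascript
function countStats(text, search) {
  const lines = text.split(/\r?\n/);
  if (lines[lines.length - 1] === "") lines.pop();   // ignore trailing newline
  const linesWith = lines.filter(line => line.includes(search)).length;
  return {
    linesWith,                                 // like /K COUNT
    linesWithout: lines.length - linesWith,    // like /R COUNT
    matches: text.split(search).length - 1     // like /MATCH with COUNT
  };
}

console.log(countStats("search a search\r\nnone\r\nsearch\r\n", "search"));
// { linesWith: 2, linesWithout: 1, matches: 3 }
```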

See the following documentation for a complete description of the new features:
JREPL /?/K
JREPL /?/R
JREPL /?/MATCH
JREPL /?JSCRIPT