Re: JREN.BAT - Rename files/folders using regular expression
Posted: 22 Jan 2015 03:33
by foxidrive
dbenham wrote: That is exactly the type of problem that JREPL.BAT is designed to solve. The /J option provides a mechanism to apply the JScript toUpperCase() method to a portion of the replacement string.
Your question really belongs in the JREPL.BAT thread.
Mucho goodness Dave, thanks for taking the time to show and explain this. (posts moved too).
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 22 Jan 2015 13:25
by dbenham
Here is version 3.4 with a minor bug fix. The option parser defined a variable named TEST that could interfere with a user supplied variable name via the /S or /V option. The internal name was changed to /TEST to make it unlikely to collide with user defined variables. Also, the documentation was updated to indicate that /S and /V variable names cannot match an option name, nor be /TEST.
JREPL.BAT Version 3.4
Dave Benham
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 23 Jan 2015 22:51
by foxidrive
Hi Dave,
The code below gives this line - have I done something unusual?
File 'Airborne.mkv': container: Matroska call jeval "-1000000000"1647920000000
I had expected this:
File 'Airborne.mkv': container: Matroska call jeval "1647920000000-1000000000"
edit: and I just found that it was because of the ? in (.*?)
Code: Select all
@echo off
echo File 'Airborne.mkv': container: Matroska [duration:1647920000000|jrepl "\[duration:(.*?)" "call jeval \q$1-1000000000\q" /x
pause
goto :EOF
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 23 Jan 2015 23:51
by dbenham
Yes.
(.*?) is a non-greedy match, meaning it will match the minimum amount possible and still have the remainder of the search match. In this case it matches nothing.
(.*) is a greedy match, meaning it will match the maximum amount possible, yet still have the remainder of the search match. In this case it matches to the end of the string.
Dave Benham
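To make the greedy/non-greedy difference concrete, here is a small sketch in plain JavaScript rather than JREPL itself. JREPL's regex engine is Microsoft JScript, so these two patterns should behave the same way there, but that is an assumption. It applies foxidrive's search and replacement to his sample line: with (.*?) nothing later in the pattern forces the group to consume anything, so $1 is empty and the digits are left behind after the replaced text.

```javascript
// Plain JavaScript sketch (not JREPL): compare (.*?) and (.*) on foxidrive's sample line.
const line = "File 'Airborne.mkv': container: Matroska [duration:1647920000000";
const replacement = 'call jeval "$1-1000000000"'; // $1 is the captured group, as in the JREPL command

// Non-greedy: nothing follows (.*?) in the pattern, so the group captures "" and the digits stay put.
const lazy = line.replace(/\[duration:(.*?)/, replacement);

// Greedy: (.*) captures everything up to the end of the line.
const greedy = line.replace(/\[duration:(.*)/, replacement);

console.log(lazy);   // File 'Airborne.mkv': container: Matroska call jeval "-1000000000"1647920000000
console.log(greedy); // File 'Airborne.mkv': container: Matroska call jeval "1647920000000-1000000000"
```

The first line reproduces the unexpected output exactly; the second is the result foxidrive expected.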
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 24 Jan 2015 00:44
by foxidrive
dbenham wrote: Yes.
(.*?) is a non-greedy match, meaning it will match the minimum amount possible and still have the remainder of the search match. In this case it matches nothing.
Thanks.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 26 Jan 2015 20:28
by bars143
Hi @Dbenham,
Do you have a JREPL script that translates output.txt into translated.txt, using english-cebuano.txt as the translation source?
Please read my request to @Aacini in this link:
this page 39510
But the output should keep the " - " and " ' " characters:
The result after translation should be:
The source of translation is:
Code: Select all
my , ang akong
user-name , user-name
can't , 'ili
be , mao ang
Bars , Bars
My delimiter is a comma, but you can change it if that character causes problems when I create my own translation text file.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 26 Jan 2015 22:16
by dbenham
That will be a very crude translation, given that the same word can have many meanings, and thus different translations, depending on context.
But your request is easily done. I create a dictionary using environment variables. The JREPL search term identifies each word in the text, and then replaces it with the environment variable value if it exists. If the translation does not exist, then it preserves the original word.
I also check the initial letter of the original word, and if it is upper case, then I make the initial letter of the translation upper case as well.
I designed the code to work properly with normal text in paragraph form. The input does not need to have each word on a separate line.
For my test I removed user-name and Bars from the dictionary to verify that unknown words are preserved as is.
translate.txt
Code: Select all
my , ang akong
can't , 'ili
be , mao ang
test.txt
Code: Select all
"My user-name can't be Bars!"
code
Code: Select all
@echo off
setlocal
:: Delete any pre-existing _ variables
for /f "delims==" %%V in ('set _ 2^>nul') do set "%%V="
:: Load the dictionary
for /f "tokens=1* delims=, " %%A in (translate.txt) do set "_%%A=%%B"
:: Translate the text
call jrepl "(?:([A-Z])|[a-z0-9])(?:\S*[A-Za-z0-9])?" ^
"word=env('_'+$0)?env('_'+$0):$0;$1?word.slice(0,1).toUpperCase()+word.slice(1):word" ^
/j /f test.txt
--OUTPUT--
Code: Select all
"Ang akong user-name 'ili mao ang Bars!"
Dave Benham
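The same lookup-and-recapitalize logic can be sketched outside of batch, which may help when checking a dictionary file before running it through JREPL. This is a hypothetical JavaScript rendering, not Dave's script: a plain object named dictionary stands in for the _word environment variables, and keys are lower-cased because Windows environment variable names are case-insensitive (presumably why env('_'+$0) also finds _my for "My").

```javascript
// Hypothetical sketch of the dictionary translation in plain JavaScript (not JREPL).
// Assumption: this object replaces the "_word" environment variables; keys are lower case
// because Windows environment variable lookups are case-insensitive.
const dictionary = {
  "my": "ang akong",
  "can't": "'ili",
  "be": "mao ang",
};

// Same search pattern as the JREPL command: a word starts with a letter or digit,
// may contain any non-whitespace, and ends with a letter or digit.
// Group 1 captures the first character only when it is an upper-case letter.
const wordPattern = /(?:([A-Z])|[a-z0-9])(?:\S*[A-Za-z0-9])?/g;

function translate(text) {
  return text.replace(wordPattern, (match, upperInitial) => {
    const key = match.toLowerCase();
    // Keep the original word when no translation exists.
    const word = Object.prototype.hasOwnProperty.call(dictionary, key) ? dictionary[key] : match;
    // Re-apply an upper-case initial if the original word had one.
    return upperInitial ? word.slice(0, 1).toUpperCase() + word.slice(1) : word;
  });
}

console.log(translate("\"My user-name can't be Bars!\"")); // "Ang akong user-name 'ili mao ang Bars!"
```

Because the pattern requires each word to end in a letter or digit, trailing punctuation such as ! and the closing quote stays outside the match and is preserved.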
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 27 Jan 2015 01:25
by bars143
@Dbenham,
Thanks for the quick reply. Your additional info is also a big help, and coincidentally I had already planned some of the options you suggested above.
With your great answer I can start creating a dictionary file to suit my dialect.
Cebuano is a second dialect in my country, the Philippines.
I will start creating Cebuano versions of DVD subtitles soon.
Bars
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 27 Jan 2015 21:26
by bars143
@Dbenham, a big thanks for your good script: it works on multiple lines with one empty line per sentence, including the .srt format, which requires a timestamp before each sentence, like this example:
1
00:00:24,827 --> 00:00:29,827
"My user-name can't be Bars!"
2
00:00:59,587 --> 00:01:04,587
My other user-name is Bars143!
Here are my files to work on my assignment:
translation source file --> english-cebuano.txt
Code: Select all
my , ang akong
can't , 'ili
be , mao ang
other , uban
is , ay
input subtitle file --> english.srt
Code: Select all
1
00:00:24,827 --> 00:00:29,827
My user-name can't be Bars!
2
00:00:59,587 --> 00:01:04,587
My other user-name is Bars143!
output subtitle file --> cebuano.srt
Code: Select all
1
00:00:24,827 --> 00:00:29,827
Ang akong user-name 'ili mao ang Bars!
2
00:00:59,587 --> 00:01:04,587
Ang akong uban user-name ay Bars143!
But an actual subtitle file for one DVD movie can have 625 or more sentences/lines.
I think it will work on a big .srt file too.
Here is my edited code, based on your script, with the output redirected to a file at the end:
Code: Select all
@echo off
setlocal
:: Delete any pre-existing _ variables
for /f "delims==" %%V in ('set _ 2^>nul') do set "%%V="
:: Load the dictionary
for /f "tokens=1* delims=, " %%A in (english-cebuano.txt) do set "_%%A=%%B"
:: Translate the text
call jrepl "(?:([A-Z])|[a-z0-9])(?:\S*[A-Za-z0-9])?" ^
"word=env('_'+$0)?env('_'+$0):$0;$1?word.slice(0,1).toUpperCase()+word.slice(1):word" ^
/j /f english.srt >cebuano.srt
Bars
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 06 Apr 2015 06:28
by foxidrive
Can someone spare a few brain cells to say if this is feasible with jrepl, or findrepl?
I have a file like this and I want to remove the lines in each block of dates but keep the last 4 lines.
So all other lines would be kept, but in each block of dates only the last 4 would be kept.
There is further text at the end of many lines, if that is significant.
Code: Select all
[FONT=Courier New]
See the top
[b][color=orange]============[/color][/b]
[b][color=green]Date:[/color]/[b]
02/02/2003 [b][color=green]$02.50=[/color][/b]
25/00/2004 [color=red]$28.50 /05[/color] -$6.54
06/02/2004 [color=blue]$04.20 /05[/color] $0.08
22/03/2004 [color=blue]$09.70 /05[/color] $0.64
05/04/2004 [color=red]$28.50 /05[/color] -$6.54
03/05/2004 [color=blue]$9.40 /05[/color] $0.78
30/05/2004 [color=blue]$00.20 /05[/color] $0.93
07/06/2004 [color=blue]$00.95 /05[/color] $0.00
04/06/2004 [color=red]$39.25 /05[/color] -$3.27
24/06/2004 [b][color=green]$00.00=[/color][/b]
09/07/2004 [color=red]$84.50 /05[/color] -$2.04
26/07/2004 [color=blue]$04.85 /05[/color] $0.20
06/09/2004 [color=blue]$03.45 /05[/color] $0.02
03/09/2004 [color=blue]$02.40 /05[/color] $0.03
20/09/2004 [color=blue]$32.85 /05[/color] $2.70
27/09/2004 [color=red]$85.50 /05[/color] -$2.03
08/00/2004 [b][color=green]$08.00=[/color][/b]
08/00/2004 [color=blue]$04.70 /05[/color] $0.23
06/02/2004 [color=red]$85.50 /05[/color] -$2.03
04/02/2005 [color=red]$85.50 /05[/color] -$2.03
22/02/2005 [color=blue]$35.50 /05[/color] $2.96
04/03/2005 [color=blue]$22.50 /05[/color] $0.88
28/03/2005 [color=blue]$20.80 /05[/color] $0.73
[b][color=orange]============[/color][/b]
[b][color=green]Date:[/color]/[b]
09/00/2003 [color=blue]$20.40 /00=[/color] $2.04
06/00/2003 [color=red]$28.50 /00=[/color] -$2.04
25/00/2004 [color=red]$28.50 /05[/color] -$6.54
06/00/2004 [b][color=green]$00.00=[/color][/b]
08/02/2004 [b][color=green]$02.00=[/color][/b]
06/02/2004 [color=blue]$04.20 /05[/color] $0.08
22/03/2004 [color=blue]$09.70 /05[/color] $0.64
05/04/2004 [color=red]$28.50 /05[/color] -$6.54
03/05/2004 [color=blue]$9.40 /05[/color] $0.78
30/05/2004 [color=blue]$00.20 /05[/color] $0.93
07/06/2004 [color=blue]$00.95 /05[/color] $0.00
04/06/2004 [color=red]$39.25 /05[/color] -$3.27
09/07/2004 [color=red]$84.50 /05[/color] -$2.04
26/07/2004 [color=blue]$04.85 /05[/color] $0.20
06/09/2004 [color=blue]$03.45 /05[/color] $0.02
03/09/2004 [color=blue]$02.40 /05[/color] $0.03
20/09/2004 [color=blue]$32.85 /05[/color] $2.70
27/09/2004 [color=red]$85.50 /05[/color] -$2.03
08/00/2004 [color=blue]$04.70 /05[/color] $0.23
06/02/2004 [color=red]$85.50 /05[/color] -$2.03
04/02/2005 [color=red]$85.50 /05[/color] -$2.03
22/02/2005 [color=blue]$35.50 /05[/color] $2.96
04/03/2005 [color=blue]$22.50 /05[/color] $0.88
28/03/2005 [color=blue]$20.80 /05[/color] $0.73
[b][color=orange]============[/color][/b]
[b][color=green]Date:[/color]/[b]
09/00/2003 [color=blue]$20.40 /00=[/color] $2.04
06/00/2003 [color=red]$28.50 /00=[/color] -$2.04
25/00/2004 [color=red]$28.50 /05[/color] -$6.54
06/02/2004 [color=blue]$04.20 /05[/color] $0.08
22/03/2004 [color=blue]$09.70 /05[/color] $0.64
05/04/2004 [color=red]$28.50 /05[/color] -$6.54
03/05/2004 [color=blue]$9.40 /05[/color] $0.78
30/05/2004 [color=blue]$00.20 /05[/color] $0.93
07/06/2004 [color=blue]$00.95 /05[/color] $0.00
04/06/2004 [color=red]$39.25 /05[/color] -$3.27
09/07/2004 [color=red]$84.50 /05[/color] -$2.04
26/07/2004 [color=blue]$04.85 /05[/color] $0.20
06/09/2004 [color=blue]$03.45 /05[/color] $0.02
03/09/2004 [color=blue]$02.40 /05[/color] $0.03
20/09/2004 [color=blue]$32.85 /05[/color] $2.70
27/09/2004 [color=red]$85.50 /05[/color] -$2.03
08/00/2004 [color=blue]$04.70 /05[/color] $0.23
06/02/2004 [color=red]$85.50 /05[/color] -$2.03
06/00/2005 [b][color=green]$20.00=[/color][/b]
04/02/2005 [color=red]$85.50 /05[/color] -$2.03
22/02/2005 [color=blue]$35.50 /05[/color] $2.96
04/03/2005 [color=blue]$22.50 /05[/color] $0.88
28/03/2005 [color=blue]$20.80 /05[/color] $0.73
[/FONT]
I could split it up into files and use the tail feature but I wondered if there was a 'cleverer' method. Ta.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 06 Apr 2015 07:29
by dbenham
25/00/2004 is a date?
So your months are 0-based (0=Jan, 11=Dec)?
Anyway, the answer is actually quite simple: look for one or more consecutive lines beginning with a date that precede (look-ahead) 4 more lines beginning with a date, and replace them with nothing.
Code: Select all
jrepl "(^\d\d/\d\d/\d{4} .*\n)+(?=(^\d\d/\d\d/\d{4} .*\n){4})" "" /m /f test.txt
sample output:
Code: Select all
[FONT=Courier New]
See the top
[b][color=orange]============[/color][/b]
[b][color=green]Date:[/color]/[b]
04/02/2005 [color=red]$85.50 /05[/color] -$2.03
22/02/2005 [color=blue]$35.50 /05[/color] $2.96
04/03/2005 [color=blue]$22.50 /05[/color] $0.88
28/03/2005 [color=blue]$20.80 /05[/color] $0.73
[b][color=orange]============[/color][/b]
[b][color=green]Date:[/color]/[b]
04/02/2005 [color=red]$85.50 /05[/color] -$2.03
22/02/2005 [color=blue]$35.50 /05[/color] $2.96
04/03/2005 [color=blue]$22.50 /05[/color] $0.88
28/03/2005 [color=blue]$20.80 /05[/color] $0.73
[b][color=orange]============[/color][/b]
[b][color=green]Date:[/color]/[b]
04/02/2005 [color=red]$85.50 /05[/color] -$2.03
22/02/2005 [color=blue]$35.50 /05[/color] $2.96
04/03/2005 [color=blue]$22.50 /05[/color] $0.88
28/03/2005 [color=blue]$20.80 /05[/color] $0.73
[/FONT]
Having solved this, I now realize that head and tail could be implemented without any user-supplied JScript or /C option.
head.bat
Code: Select all
::head.bat count [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "((.*\n){%~1})[\s\S]+" "$1" /m %2 %3 %4 %5 %6 %7
tail.bat
Code: Select all
::tail.bat count [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "(.*\n)+(?=(.*\n|.+(?![\s\S])){%~1})" "" /m %2 %3 %4 %5 %6 %7
But these new versions are limited to ~2GB files because of the /M option, and I think they may also be less efficient. I would still use the older versions.
Dave Benham
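To see what the two head/tail regexes do without running the batch files, here is a JavaScript sketch of the same patterns applied to a whole string, the way /M mode processes the entire file as one string. The function names head and tail are only illustrative, and the sketch assumes \n line endings; JScript's handling of \r may differ, so this is not a drop-in replacement for the batch scripts.

```javascript
// Sketch of the regex-only head/tail idea in plain JavaScript (not JREPL).
// Assumption: the whole input is one string with "\n" line endings.

function head(text, count) {
  // Capture the first `count` lines; if anything follows them, the whole text is replaced by the capture.
  const pattern = new RegExp(`((.*\\n){${count}})[\\s\\S]+`);
  return text.replace(pattern, "$1");
}

function tail(text, count) {
  // (.*\n)+ greedily matches whole lines, then backs off until `count` more lines follow
  // (the last one may lack "\n"). The look-ahead is not consumed, so only the leading lines are removed.
  const pattern = new RegExp(`(.*\\n)+(?=(.*\\n|.+(?![\\s\\S])){${count}})`);
  return text.replace(pattern, "");
}

const sample = "line1\nline2\nline3\nline4\nline5\n";
console.log(JSON.stringify(head(sample, 2))); // "line1\nline2\n"
console.log(JSON.stringify(tail(sample, 2))); // "line4\nline5\n"
```

If the input has too few lines, neither pattern matches and the text comes back unchanged, which is the behavior you would want from head and tail.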
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 07 Apr 2015 04:56
by foxidrive
I obfuscated the data a tad by globally replacing a few numbers with a different number, but your solution looks wonderful - if only I could figure out how it works.
dbenham wrote: Anyway, the answer is actually quite simple - look for consecutive one or more lines beginning with a date that precede (look ahead) 4 lines that begin with a date, and replace with nothing.
Code: Select all
jrepl "(^\d\d/\d\d/\d{4} .*\n)+(?=(^\d\d/\d\d/\d{4} .*\n){4})" "" /m /f test.txt
I can see it's checking for the date format at the start of the lines and taking the entire line with the CR at the end,
and the next bit, +(?=(^\d\d/\d\d/\d{4} .*\n){4}), is adding 4 more of the same, using the + as some arithmetic to add 4 of the next (same) terms.
I have a basic appreciation of how it works, but the technique mashes my brain like adding a quart of whiskey to my orange juice, and I don't quite follow the lookaheads. But thank you for your generous assistance; I'll be able to apply the same code in future by copying and testing.
dbenham wrote: But these new versions are limited to ~2GB files because of the /M option, and I think they also may be less efficient. I would still use the older versions.
Thanks for these extra options too.
My file is only 300 kb so it scrapes in.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 07 Apr 2015 05:20
by dbenham
You just need to study the regex syntax a bit more.
The + means the previous expression must match 1 or more times (similar to * which matches 0 or more times)
The ?= is a look-ahead construct, meaning the expression within the parentheses (the 4 lines) must follow the previous expression, but the content is not considered part of the match.
So, in summary, it looks for as many matching lines as it can find that precede the 4 matching lines. Since the last 4 matching lines are not included in the match, we can replace everything with nothing.
Dave Benham
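Dave's explanation of + and ?= can be checked on a tiny, made-up sample. The sketch below is plain JavaScript, with the m flag standing in for JREPL's /M (so ^ matches at each line start), a hypothetical keepLastDates function that takes the number of lines to keep as a parameter, and \n line endings assumed.

```javascript
// Sketch in plain JavaScript (not JREPL): keep only the last `keep` dated lines of each block.
// Assumptions: "\n" line endings; the m flag makes ^ match at every line start.
const datedLine = String.raw`^\d\d/\d\d/\d{4} .*\n`;

function keepLastDates(text, keep) {
  // (dated line)+ is greedy, but the look-ahead demands `keep` more dated lines after it,
  // so the match stops exactly `keep` lines before the end of each run of dated lines.
  // The look-ahead lines are not part of the match, so replacing with "" leaves them in place.
  const pattern = new RegExp(`(${datedLine})+(?=(${datedLine}){${keep}})`, "gm");
  return text.replace(pattern, "");
}

// Made-up sample: a block of four dated lines and a block of two.
const sample =
  "Date:\n" +
  "01/01/2004 a\n" +
  "02/01/2004 b\n" +
  "03/01/2004 c\n" +
  "04/01/2004 d\n" +
  "====\n" +
  "Date:\n" +
  "05/01/2004 e\n" +
  "06/01/2004 f\n";

console.log(keepLastDates(sample, 2));
```

With keep set to 2, lines a and b are removed from the first block, while the second block already has only two dated lines and is left alone.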
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 07 Apr 2015 08:45
by foxidrive
Thanks for the clear explanation. I see now where my mistaken impressions were failing me.
Just now I've been playing with another aspect, using the multi-line /m switch, and here once more I seem to have misunderstood something.
I had expected all text from *** onwards to be removed, but that is not what happens.
Can you please advise how I can remove the last part of the file, from a *** at the start of a line onwards?
Code: Select all
@echo off
(
echo(aaa
echo(
echo(***
echo(
echo(bbb
)>accounts.txt
:remove text at bottom from *** and onward
echo ======
call jrepl "^(.*)\*\*\*.*" "$1" /m /f accounts.txt
echo ======
call jrepl "^(.*)\2A\2A\2A.*" "$1" /x /m /f accounts.txt
echo ======
pause
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Posted: 07 Apr 2015 10:08
by dbenham
^\*\*\*$ matches your *** line.
[\s\S] matches any character, including \n, which . does not match.
So [\s\S]* matches everything after your ***, including the \r and \n characters at the end of the *** line.
Putting it all together:
Code: Select all
jrepl "^\*\*\*$[\s\S]*" "" /m /f test.txt
Dave Benham
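Here is a quick JavaScript check of the difference between foxidrive's first attempt and Dave's pattern, on the same five-line accounts.txt content. It assumes \n line endings and uses the m flag in place of JREPL's /M; removeFromStars is a hypothetical name used only here.

```javascript
// Sketch in plain JavaScript (not JREPL). Assumptions: "\n" line endings; the m flag
// makes ^ and $ match at line boundaries.
function removeFromStars(text) {
  // ^\*\*\*$ finds the *** line; [\s\S]* then runs to the end of the text,
  // because unlike "." the class [\s\S] also matches "\n".
  return text.replace(/^\*\*\*$[\s\S]*/m, "");
}

const accounts = "aaa\n\n***\n\nbbb\n";
const trimmed = removeFromStars(accounts);

// foxidrive's first attempt: "." stops at the end of the line, so only the *** text itself is replaced.
const firstAttempt = accounts.replace(/^(.*)\*\*\*.*/m, "$1");

console.log(JSON.stringify(trimmed));      // "aaa\n\n"
console.log(JSON.stringify(firstAttempt)); // "aaa\n\n\n\nbbb\n"
```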