Batch script to extract everything between 2 regex

Elpolloloco
Posts: 6
Joined: 04 Feb 2020 09:49

#1 Post by Elpolloloco » 04 Feb 2020 09:54

Hello everyone. I am new to programming. I have about 400 *.txt files in a folder. They all contain data between two specific pieces of text, and I need that data in a new file (one file per *.txt).

I have read about jrepl.bat, but I really have no idea how to use it. I have tried everything and nothing works.

Can someone help me with that? What should I add to jrepl.bat to make this happen, and where in the .bat file?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#2 Post by dbenham » 04 Feb 2020 11:14

You have not given enough information to give any help.

What are the text markers that identify the text to keep? Are they string literals or regex?
Should the markers be included in the output?
Do the markers constitute an entire line? If not then the line containing the markers may need to be split.

In addition to fully describing the rules, you should provide a small test file as well as the expected output. It doesn't need to be real data, just enough to illustrate the rules.


Dave Benham

#3 Post by Elpolloloco » 04 Feb 2020 12:21

OK, sorry for not giving enough info. Here is an example.

I have a lot of files in a folder, with many names, all are *.txt

Original_file.txt
Line 1
Line 2
Line 3
Line 4
Line 5

I want a new file (or the file itself can be modified) with this result

Output_file.txt
Line 3
Line 4

The point is to get everything between the regex values "Line 2" and "Line 5".

Or, at least, one script to delete everything above "Line 3" and another to delete everything after "Line 4".

Keep in mind that the folder has hundreds of *.txt files, so I just want the batch to process everything inside that folder automatically.

Thank you very much for the help. :D

#4 Post by dbenham » 04 Feb 2020 14:54

You say regex, but it looks like you are looking to match a string literal.
I'll run with what you have given.

Code:

for %%F in (*.txt) do call jrepl "^" "" /k 0 /inc "'Line 2'be+1:'Line 5'be-1" /f "%%F" /o "%%~nF.mod.txt"
The /INC option specifies which lines to include in the search. The range starts with the line after the "Line 2" line and ends with the line before the "Line 5" line. The be suffix specifies that each string must match both the beginning and the end of the line (an exact line match).

The /K 0 option outputs only the lines that match the search. In this case the search simply matches the beginning of a line, so it matches every line that passes the /INC test. The replace argument is ignored. The 0 specifies that no additional lines before or after each matching line are included.
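
A minimal sketch of the same loop as a complete batch file, assuming jrepl.bat sits in the current folder or somewhere on the PATH. The outDir variable and the folder name "extracted" are illustrative and not part of the original answer. Writing the results to a subfolder keeps them out of the *.txt mask, so running the script a second time does not process earlier output again.

```batch
@echo off
setlocal

rem Hypothetical output folder name, chosen only for this sketch.
set "outDir=extracted"
if not exist "%outDir%\" mkdir "%outDir%"

rem Same JREPL call as in the post, but each result is written to
rem %outDir% under the original file name and extension (%%~nxF).
for %%F in (*.txt) do (
  call jrepl "^" "" /k 0 /inc "'Line 2'be+1:'Line 5'be-1" /f "%%F" /o "%outDir%\%%~nxF"
)
```

Save it next to the *.txt files and run it from that folder. The FOR loop is not recursive, so files already in the subfolder are never picked up as input.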

#5 Post by Elpolloloco » 05 Feb 2020 07:16

If I add that code to a .bat file I get

JScript runtime error opening input file: File not found

If I try to use it in a CMD window opened in that directory I get

%%F was unexpected at this time.

All the files in that directory have different names

#6 Post by Elpolloloco » 05 Feb 2020 07:17

Also, are the '' around "Line 2" necessary?

#7 Post by dbenham » 05 Feb 2020 09:31

Sorry, I forgot the /f option, and I also had a case problem. I've updated the code in my prior post.

All quotes are needed as written. The outer double quotes are needed to treat the entire construct as a single parameter. The single quotes signify that the string is to be treated as a string literal.

#8 Post by Elpolloloco » 05 Feb 2020 13:14

Yes, now it's working fine. Thank you. It seems like it needs the whole line value and not just a partial value.

Now, is there any way to do the complete opposite? Keep only the content OUTSIDE that range.
Last edited by Elpolloloco on 05 Feb 2020 14:18, edited 1 time in total.

#9 Post by dbenham » 05 Feb 2020 14:08

Argh. My mind is obviously elsewhere, and I am making silly mistakes.

My original code was correct, except for a lower case %%f that should have been %%F. I corrected that post once again.

But I just realized you are trying to run the command from the command line. I was assuming you wanted a batch script.

To run the command from the command line you need to change all doubled %% to a single % as follows:

Code:

for %F in (*.txt) do call jrepl "^" "" /k 0 /inc "'Line 2'be+1:'Line 5'be-1" /f "%F" /o "%~nF.mod.txt"
You should never have a reason to change the JREPL.BAT file (unless a new version comes out).
JREPL has extensive capability through its many command line options. Full documentation is available via JREPL /?
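
To see what the loop variable expands to before running JREPL on real data, a small batch file along these lines may help. It is only a sketch: it prints names and changes nothing. Inside a batch file the variable is written %%F; typed at the command prompt the same loop would use %F.

```batch
@echo off
rem Print the parts of each matching file name.
rem %%~nF is the name without extension, %%~xF is the extension.
for %%F in (*.txt) do (
  echo full name : %%F
  echo name only : %%~nF
  echo extension : %%~xF
  echo new name  : %%~nF.mod.txt
)
```

The last line shows the output name that the /o "%%~nF.mod.txt" argument produces for each input file.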

#10 Post by Elpolloloco » 05 Feb 2020 14:30

Thanks. I get a lot of these documents every month and they are not always so predictable, so I would appreciate it if you could help me out with a few more code examples that cover the other possible scenarios.

1. String 1
2. String 2
3. String 3
4. String 4
5. String 5

Keep everything ABOVE String 2
Keep everything ABOVE 3. (line, not string)
Keep everything BELOW String 4
Keep everything BELOW 4. (Line, not string)
The opposite of the former code, delete everything except whatever is between String 2 and String 4
The opposite of the former code (by line), delete everything except whatever is between 2. (line, not string) and 4. (Line, not string)

With these I'm sure I will be able to handle pretty much any scenario I get in the future.

#11 Post by dbenham » 05 Feb 2020 15:27

I don't mind giving an example of how to do something, but at this point you should learn to do this yourself.

Enter JREPL /?/EXC and JREPL /?/INC to get documentation on the syntax for specifying which lines to include or exclude. Study the help, including the examples, and also study the code I already gave you. You should be able to adapt that code to meet your needs.
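
For the earlier "keep only the content OUTSIDE that range" request, one possible starting point is to swap /INC for /EXC in the command already given. This sketch assumes /EXC accepts the same block syntax as /INC and that excluded lines are never searched, so /K 0 drops them. Both points should be checked against JREPL /?/EXC before relying on it. The ".outside.txt" suffix is a made-up name for illustration.

```batch
@echo off
rem Assumption: /EXC takes the same block list as /INC (verify with JREPL /?/EXC).
rem Intent: skip the lines between "Line 2" and "Line 5" and keep the rest.
for %%F in (*.txt) do (
  call jrepl "^" "" /k 0 /exc "'Line 2'be+1:'Line 5'be-1" /f "%%F" /o "%%~nF.outside.txt"
)
```

For the five-line sample from post #3 the intent is to keep Line 1, Line 2 and Line 5. Try it on a copy of one file first.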
