JREPL to check 12 columns for a certain string?

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
SIMMS7400
Posts: 475
Joined: 07 Jan 2016 07:47

JREPL to check 12 columns for a certain string?

#1 Post by SIMMS7400 » 17 Jun 2020 04:18

HI Folks -

I have a massive data file (600K rows) that I need to "clean". The source team where I get the file from is pushing back on doing this clean up from their end so I need to build a solution on my end before I import the file into our financial application.

There are TONS of rows that can be deleted. For instance the ENTIRE row can be deleted if columns 5-16 include "" or '#MISSING'. Example:
E01,S00900,6016-10,2020,"","","","","","",'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
E01,S00900,6016-13,2020,"","","","","","",'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
E01,S00900,6016-14,2020,"","","","","","",'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
E01,S00900,6016-15,2020,"","","","","","",'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
These rows however can't be deleted:
E01,S00900,6520-00,2020,279.56,2812.98,435.01,1072.39,51.75,318.20,'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
E01,S00900,6520-01,2020,32.10,32.10,"",450.00,"","",'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
Can this be done with JREPL? If so, how would I code the find portion of it? I guess you could say if columns 5+ do not have a numerical value, then delete entire row.

dbenham
Expert
Posts: 2388
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL to check 12 columns for a certain string?

#2 Post by dbenham » 17 Jun 2020 06:00

Quite simple actually. You just need to write a regex that matches lines to be deleted and use the /R option to reject lines that match and the /XSEQ option to enable substitution of \c for ^ and \q for "

The \c is needed due to the pesky doubling of ^ when using CALL

Code: Select all

call jrepl "\c(?:[\c,]*,){4}(?:\q\q|'#MISSING')(?:,(?:\q\q|'#MISSING'))*$" "" /xseq /r 0 /f input.csv /o output.csv
Use /O - (/O followed by a dash) if you want to overwrite the original file.

It would be even faster if you can get your hands on a copy of GNU grep for Windows:

Code: Select all

grep -v -P "^(?:[^,]*,){4}(?:\"\"|'#MISSING')(?:,(?:\"\"|'#MISSING'))*$" input.csv >output.csv

Dave Benham

SIMMS7400
Posts: 475
Joined: 07 Jan 2016 07:47

Re: JREPL to check 12 columns for a certain string?

#3 Post by SIMMS7400 » 19 Jun 2020 04:57

HI Dave -

As usual, thank you so much!!! That worked like an absolute charm!!!!

Thank you and have a great weekend!

Post Reply