Easy way to check tokens 5-16 and delete if specified pattern?

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
SIMMS7400
Posts: 451
Joined: 07 Jan 2016 07:47

Easy way to check tokens 5-16 and delete if specified pattern?

#1 Post by SIMMS7400 » 20 Feb 2020 03:22

Hi Folks -

I have the following data file (sample) that is created by another team and they are unwilling to work with me to clean up their process. Therefore, I need to handle cleaning up the file on my end:
E01,S00000,1040-00,2019,"","","","","","","","","","","",""
E01,S00000,1100-00,2019,267171.46,273027.23,717379.93,195954.96,296135.03,813816.07,252308.17,304287.16,698450.70,335489.17,567923.76,728968.55
E01,S00000,1100-30,2019,1889066.33,2863499.20,682036.53,1985084.97,3921360.24,980411.11,2915929.92,4929743.93,1060628.90,2744523.44,3903142.91,595767.42
E01,S00000,1130-00,2019,"","","","","","","","","","","",""
E01,S00000,1140-00,2019,"","","","","","","","","","","",""
E01,S00000,1210-30,2019,163016436.29,162153109.71,161479341.90,161241199.50,160727337.53,159780948.62,159718783.88,159194871.14,159133778.25,158872207.43,158751022.73,158882082.53
E01,S00000,1210-40,2019,-218211945.61,-217738772.03,-221904629.75,-225170015.31,-224702941.25,-225234443.63,-227407317.13,-230219433.60,-229532466.89,-217082211.44,-204252043.55,-190508732.54
E01,S00000,1210-90,2019,"",'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
E01,S00000,1210-50,2019,1578792.53,4648868.03,-263904.89,2608853.62,2688750.22,-937903.84,147911.79,-747464.56,765309.36,656466.40,-145911.30,501938.01
E01,S00000,1210-90,2019,'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'
What I need to do is remove the lines where tokens 5-16 are either NULL (""), 'MISSING' or a combination of "" & 'MISSING'.

Right now, I have this code running before the data upload and it's working fine:
FINDSTR /V /C:"\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\",\"\"" "%DATAFILENAME%" > "%DATAFILENAME%out"
FINDSTR /V /C:"'#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING','#MISSING'" "%DATAFILENAME%out" > "%DATAFILENAME%out1"
But I'm unsure how to code for the 3rd exception I mentioned above when it would be a combination of 'MISSING' and "". Essentially, as long as there are NO numerical values in tokens 5-16, I can delete that line. Any ideas?

penpen
Expert
Posts: 1786
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Easy way to check tokens 5-16 and delete if specified pattern?

#2 Post by penpen » 20 Feb 2020 06:04

You should use Dave's JREPL.bat:
viewtopic.php?f=3&t=6044

penpen

dbenham
Expert
Posts: 2347
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Easy way to check tokens 5-16 and delete if specified pattern?

#3 Post by dbenham » 20 Feb 2020 07:39

Your two existing FINDSTR statements are special cases of a general solution.

You have already stated the simple solution without realizing it:
SIMMS7400 wrote:Essentially, as long as there are NO numerical values in tokens 5-16, I can delete that line.
Based on your description, plus your sample data, all you need to do is delete any line that does not contain a digit after the 4th comma. Or, more appropriately, preserve any line that does have a digit after the 4th comma.

Code: Select all

findstr "^[^,]*,[^,]*,[^,]*,[^,]*,.*[0123456789]" "%DATAFILENAME%" >""%DATAFILENAME%out"
Certainly JREPL can do this easily, but there really is no need to use anything other than FINDSTR, except for the fact that the JREPL regex can be more compact.

Code: Select all

call jrepl "^(.*?,){4}.*\d" "" /k 0 /f "%DATAFILENAME%" /o "%DATAFILENAME%out"
And if you are OK with overwriting the original file, then

Code: Select all

call jrepl "^(.*?,){4}.*\d" "" /k 0 /f "%DATAFILENAME%" /o -
Dave Benham

SIMMS7400
Posts: 451
Joined: 07 Jan 2016 07:47

Re: Easy way to check tokens 5-16 and delete if specified pattern?

#4 Post by SIMMS7400 » 20 Feb 2020 11:28

wow!!!! THat was easy enough.

Thank you Dave, I really appreciate it!!!

penpen
Expert
Posts: 1786
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Easy way to check tokens 5-16 and delete if specified pattern?

#5 Post by penpen » 20 Feb 2020 13:06

We don't know exactly how those numbers are generated and most probably they are IEEE 754 numbers where you might get "inf", "+inf", "-inf" or "NaN" as well.
To not miss such lines i would use something like the following (untested):

Code: Select all

call jrepl "^[^,]*(,[^,]*){3}(,\x22\x22|,'#MISSING'){12}.*" "" /R 0 /f "%DATAFILENAME%"
penpen

Post Reply