Find All Filenames NOT Containing a String

Posted: 19 Jul 2017 06:05
by Samir
My brain is failing me this morning. :(

I have a set of filenames like this:

Code:

BAT000001480.RPT
BAT000001481.RPT
BAT000001482.RPT
BAT000001483.RPT
BAT000001484.RPT
BAT000001485.RPT
Inside most of these files is the following string along with other strings:

Code:

No Data To Report
I want to find all the files that DON'T contain this string. I don't need any of the other data inside the file (I'll look at that by opening the file); I just want the filename.

Basically I need the output to be the inverse of:

Code:

FINDSTR /M "No Data To Report" BAT*.RPT
That is, it should display only the filenames NOT containing the string, rather than those containing it.

Thank you in advance. Any additional questions, please ask. I tried to be as thorough as possible when describing the problem. (I also did several searches on this as I'm sure it has come up before, but alas I couldn't find anything that I could use. :cry: )

Re: Find All Filenames NOT Containing a String

Posted: 19 Jul 2017 07:22
by aGerman
Actually you have to apply two more options. First you need /c, otherwise it searches for the four words separately. Then you need /v to invert the search.

Code:

FINDSTR /MVC:"No Data To Report" BAT*.RPT


Steffen

Re: Find All Filenames NOT Containing a String

Posted: 19 Jul 2017 09:51
by Samir
aGerman wrote:Actually you have to apply two more options. First you need /c, otherwise it searches for the four words separately. Then you need /v to invert the search.

Code:

FINDSTR /MVC:"No Data To Report" BAT*.RPT


Steffen
I played with these earlier, and they still returned files that contained the string rather than only those without it. I tried it again exactly as you wrote it, on a single file I'm using for testing, and it still listed the file even though it shouldn't have, since I only want files without the phrase. :( (This seems to come from the /V still returning the rest of the contents of the file, so something is still 'found'.)

Thoughts?

Re: Find All Filenames NOT Containing a String

Posted: 19 Jul 2017 09:57
by aGerman
Samir wrote:(This seems to come from the /V still returning the rest of the contents of the file, so something is still 'found'.)

Hmm good point.

untested

Code:

for %%i in (BAT*.RPT) do >nul FINDSTR /C:"No Data To Report" "%%~i" || echo "%%i"


Steffen
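
For readers who want to see the logic outside of batch, here is a minimal Python sketch of the same per-file test. It is not from the thread: the function names are made up for illustration, and a plain case-sensitive substring check is assumed to stand in for FINDSTR /C:. It also shows why /V on its own lists almost every file: a file qualifies as soon as it has any line without the phrase.

```python
from pathlib import Path

PHRASE = "No Data To Report"


def has_phrase(path, phrase=PHRASE):
    """True if at least one line of the file contains the phrase."""
    with open(path, "r", errors="replace") as fh:
        return any(phrase in line for line in fh)


def has_line_without_phrase(path, phrase=PHRASE):
    """Roughly what a /V search reports on: any line NOT containing the phrase."""
    with open(path, "r", errors="replace") as fh:
        return any(phrase not in line for line in fh)


def files_without_phrase(folder, pattern="BAT*.RPT", phrase=PHRASE):
    """Names of files in `folder` where no line contains the phrase."""
    return sorted(
        p.name for p in Path(folder).glob(pattern) if not has_phrase(p, phrase)
    )
```

Calling files_without_phrase(".") would list the BAT*.RPT files in the current folder that never mention the phrase, which is the inverse of the original FINDSTR /M search.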

Re: Find All Filenames NOT Containing a String

Posted: 19 Jul 2017 16:21
by Samir
aGerman wrote:
Samir wrote:(This seems to come from the /V still returning the rest of the contents of the file, so something is still 'found'.)

Hmm good point.

untested

Code:

for %%i in (BAT*.RPT) do >nul FINDSTR /MC:"No Data To Report" "%%~i" || echo "%%i"


Steffen
It worked! 8)

But after looking at the results, I realize that I now have a second requirement. :oops:

And now when thinking about that other requirement, I realized why this was so easy the first time--I was looking for a single word rather than a phrase. :oops: A simple findstr command found everything I needed in one go. 8)

Thank you for your help!

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 11:24
by dbenham
aGerman wrote:Removed option /M, it doesn't make sense anymore.

Actually the /M option can still serve a purpose.

Without /M, the entire file must be scanned.

With /M, the scan will end as soon as the first matching line is found. This can have a significant impact if the file is very large.

Of course the entire file must be scanned if a matching line is never found.


Dave Benham

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 12:48
by Samir
dbenham wrote:
aGerman wrote:Removed option /M, it doesn't make sense anymore.

Actually the /M option can still serve a purpose.

Without /M, the entire file must be scanned.

With /M, the scan will end as soon as the first matching line is found. This can have a significant impact if the file is very large.

Of course the entire file must be scanned if a matching line is never found.


Dave Benham
Very interesting. Then this would have made sense if we had continued the original search, as there are 7000+ files. It would have been interesting to see the difference in time with and without the /M switch.

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 13:17
by dbenham
I should clarify what I meant by "matching line" - any line that would be printed by FINDSTR if the /M option is not used.

So if the /V option is used, then a "matching line" is any line that does not match the search string. Combine the /M option with /V, and the scan terminates as soon as a line is found that does not match the search string. Without the /V option, the /M option terminates at the first line that does match the search string.


Dave Benham
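
The definition above can be pictured with a small Python model. This is only a sketch of the behaviour as described in this post, not of how FINDSTR is actually implemented; the function name scan and its parameters are hypothetical, with invert standing in for /V and names_only for /M.

```python
def scan(lines, phrase, invert=False, names_only=False):
    """Model of the described scan.

    Returns (hit, lines_read): whether any line would be printed,
    and how many lines were read before the scan stopped.
    """
    lines_read = 0
    hit = False
    for line in lines:
        lines_read += 1
        # A "matching line" is one that would be printed without /M:
        # it contains the phrase, or (with invert) it does not.
        selected = (phrase in line) != invert
        if selected:
            hit = True
            if names_only:
                # With /M only the filename is needed, so stop early.
                break
    return hit, lines_read
```

In this model, scanning the lines "a", "No Data To Report", "b" with names_only stops after two lines, while adding invert stops after the first line, because "a" already counts as a matching line.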

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 13:33
by Samir
dbenham wrote:I should clarify what I meant by "matching line" - any line that would be printed by FINDSTR if the /M option is not used.

So if the /V option is used, then a "matching line" is any line that does not match the search string. Combine the /M option with /V, and the scan terminates as soon as a line is found that does not match the search string. Without the /V option, the /M option terminates at the first line that does match the search string.


Dave Benham
In the use case solution that was originally presented, it seems that the /M made no difference on the results. But according to what you are saying, the parsing time should definitely be less with /M since the entire file would not be read.

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 14:32
by dbenham
You will not notice much difference unless the file is quite large.

On my machine, a 1.8 GB file with a matching line on the first line took 53 seconds without the /M option, and only 0.07 seconds with the /M option.


Dave Benham

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 15:03
by dbenham
:oops: I must confess, I only did the positive test (without the /V option) with a search string that matches all lines. The timing difference between with and without /M was dramatic.

Code:

findstr    "^" testbig.txt >nul  --> 53 sec
findstr /M "^" testbig.txt >nul  --> 0.07 sec

The situation is a bit odd when I add the /V option while still doing a search that returns all lines (no line matches the search string).

Using the same 1.8 GB file:

Code:

findstr    /V zzzzzz testbig.txt >nul     --> 32 seconds
findstr /M /V zzzzzz testbig.txt >nul     --> 11 seconds


It is odd that the "^" search without /V is slower than the "zzzzzz" search with /V.

I still believe /M without /V terminates at the first matching line.

But I'm thinking that /V may prevent the early termination. I wonder if the observed difference between the /V searches is due to fewer printf executions with /M vs without (assuming FINDSTR is written in C). I know the redirection to nul dramatically cuts the time of printf, but printf still must take some time. With the /M option there is only 1 printf.


Dave Benham
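
For anyone wanting to repeat this kind of comparison, a small timing helper along these lines could work. It is only a sketch: time_command is a hypothetical name, and it measures the wall-clock time of a command with its output discarded, which is similar in spirit to redirecting to nul.

```python
import subprocess
import time


def time_command(args, repeats=2):
    """Run a command `repeats` times and return the elapsed seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Output is discarded so that console writes do not dominate the timing.
        subprocess.run(
            args,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
        timings.append(time.perf_counter() - start)
    return timings
```

On Windows, something like time_command(["findstr", "/M", "^", "testbig.txt"]) would time the second command above; running each variant more than once helps show whether file caching affects the numbers.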

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 15:27
by Samir
That is a dramatic difference indeed!

I'm going to try some testing on my end and see what type of results I get with the original command. Bear in mind the files are on a file server, so network latency might skew my results slightly.

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 15:31
by aGerman
dbenham wrote:Actually the /M option can still serve a purpose.

Oh I see. Thanks for pointing that out.

The test results are quite interesting. Your assumption that findstr was written in C is most likely true. It's not the printf (or WriteFile, both can be found in findstr.exe) that slows down the execution. It's rather the badly implemented way to write to the console window, regardless of what language or function was used.

Steffen

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 15:41
by dbenham
aGerman wrote:It's not the printf (or WriteFile, both can be found in findstr.exe) that slows down the execution. It's rather the badly implemented way to write to the console window, regardless of what language or function was used.

I was thinking that the redirection to nul would short circuit the code that writes to the console window, leaving only the printf within FINDSTR itself. But that is just an educated guess.

Without the redirection to nul, the writes to the console are absolutely horrifically slow.


Dave Benham

Re: Find All Filenames NOT Containing a String

Posted: 23 Jul 2017 15:46
by Samir
My results were quite interesting. I didn't see any noticeable speed increase or decrease. The file set was 5263 files totaling just over 7MB, with file sizes ranging from just over 100 bytes to 6842 bytes.

I ran the following commands; each took approximately 80 seconds to run.

Code:

FINDSTR /MVC:"No Data To Report" BAT*.RPT > NUL 
FINDSTR /VC:"No Data To Report" BAT*.RPT > NUL
I ran each twice to see if any caching played a part, as the entire set of files can fit in the hard drive's cache and there may also be some local caching by the system. The results were identical.

Not sure why my results didn't show any speed improvement. :oops: