Undocumented FINDSTR features and limitations

aGerman
Expert
Posts: 4654
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Undocumented FINDSTR features and limitations

#16 Post by aGerman » 22 Jan 2012 15:20

Haha, no, I also didn't want to discredit Dave. He already mentioned it was untested. I corrected this for all the Google users who stumble upon this thread.

Regards
aGerman

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#17 Post by dbenham » 06 Jun 2012 21:13

I updated my SO post: http://stackoverflow.com/a/8844873/1012053

I used to think that the Windows pipe operator appended <CR><LF> to the input if the last character in the stream was not a <LF>. But I've since discovered that FINDSTR is actually doing the alteration of the input.

FINDSTR also appends <CR><LF> to redirected input on Vista (and XP?) if the last character of the redirected file is not <LF>.

I've discovered a nasty FINDSTR "feature" on Windows 7: it hangs indefinitely if you search redirected input and the redirected file does not end with <LF>.
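
A guard along these lines could be run before redirecting a file into FINDSTR. This is only a minimal sketch in Python, not something from the original post: the helper name `ends_with_lf` is hypothetical, it inspects just the last byte as described above, and treating an empty file as "no trailing <LF>" is an assumption.

```python
import os

def ends_with_lf(path):
    """Return True if the last byte of the file is <LF> (0x0A).

    Hypothetical helper: an empty file is reported as False (assumption).
    """
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        f.seek(-1, os.SEEK_END)   # jump to the final byte
        return f.read(1) == b"\n"
```

If the check fails, one option is to append a line terminator to a copy of the file before using it as redirected input.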


Dave Benham

Fawers
Posts: 187
Joined: 08 Apr 2012 17:11
Contact:

Re: Undocumented FINDSTR features and limitations

#18 Post by Fawers » 06 Jun 2012 21:48

How could I not see this before? I have so many doubts about using FINDSTR.

Added this page to my favs. Gonna read it when I've got enough time to.
Thanks, Dave.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Undocumented FINDSTR features and limitations

#19 Post by foxidrive » 06 Jun 2012 21:57

dbenham wrote: FINDSTR also appends <CR><LF> to redirected input on Vista (and XP?) if the last character of the redirected file is not <LF>.

I've discovered a nasty FINDSTR "feature" on Windows 7: it hangs indefinitely if you search redirected input and the redirected file does not end with <LF>.


XP also has the issue if the file does not end with appropriate line endings. It hangs.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#20 Post by dbenham » 27 Nov 2012 21:29

I updated my SO FINDSTR post with two new sections:

1) Description of XP behavior displaying most control characters as dots

2) The buggy /S and /D options may fail to find files if short 8.3 names are encountered.


Dave Benham

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile
Contact:

Re: Undocumented FINDSTR features and limitations

#21 Post by carlos » 27 Nov 2012 22:02

Thanks.
I remember that the default is /R not /L.
Example:

Code: Select all

echo.#|Findstr "."

prints #,
but

Code: Select all

echo.#|Findstr /L "."

doesn't print anything. So the default is /R, not /L.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#22 Post by dbenham » 27 Nov 2012 23:18

I think you did not read the post carefully. It is more complicated than that.

I stated that the default for the /C option is literal.

The default for all other methods (anything other than /C option) depends on the content of the 1st search string. If the 1st search string contains an un-escaped meta character and the string is a valid regex, then all searches will be treated as regex. If the first string does not contain an un-escaped meta character, or if it is not a valid regex, then all search strings will be treated as literals.

The following is a regex search that matches because the first string is a valid regex that contains a meta character.

Code: Select all

echo #|findstr ". a"

But this next example is a literal search that does not match, because the first search string does not contain a meta character.

Code: Select all

echo #|findstr "a ."


Dave Benham

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile
Contact:

Re: Undocumented FINDSTR features and limitations

#23 Post by carlos » 28 Nov 2012 00:18

Thanks for the info.
When I specify /L or /R together with the /C option, I get a message saying that /c was ignored.

Code: Select all

C:\Users\Carlos>echo.#|findstr /c /l "#"
FINDSTR: /c ignored
#

C:\Users\Carlos>echo.#|findstr /c /r "#"
FINDSTR: /c ignored
#



Also, I have a question about the /O option.
I have this file:

Code: Select all

all#everybody#is#ok

If I use:

Code: Select all

findstr /N /O "#" file.txt

it prints:

Code: Select all

1:0:all#everybody#is#ok


Shouldn't the offset be 3, not 0?

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Undocumented FINDSTR features and limitations

#24 Post by foxidrive » 28 Nov 2012 03:13

It seems buggy.

file.txt:
d#dd
aaaa#aa#
all#everybody#is#ok
aaa


d:\ABC>findstr /O /c:"#" file.txt
0:d#dd
6:aaaa#aa#
16:all#everybody#is#ok

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#25 Post by dbenham » 28 Nov 2012 06:53

Here again, the correct information is already in my SO post.
SO FINDSTR post wrote: lineOffset: = The decimal byte offset of the start of the matching line, with 0 representing the 1st character of the 1st line. Only printed if /O option is specified.

Note - it is the byte offset of the beginning of the line that matches (measured from the beginning of the file), not the offset of the beginning of the match itself. Also, don't forget to count the CarriageReturn/LineFeed line terminators.

The /N and /O options specify the same locations within the file, but the /N option counts the number of lines, whereas the /O option counts the number of bytes. The /N option is 1 based, the /O option is 0 based.

So the results given by Carlos's and foxidrive's examples are correct, not bugs.
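
The offset arithmetic can be checked with a small Python model. This is a sketch of the rule as described above, not FINDSTR itself: `line_start_offsets` is a hypothetical helper, and the data assumes foxidrive's file.txt uses <CR><LF> terminators on every line.

```python
def line_start_offsets(data: bytes):
    """Return the 0-based byte offset at which each line of `data` starts."""
    offsets = []
    pos = 0
    for line in data.splitlines(keepends=True):
        offsets.append(pos)
        pos += len(line)          # includes the <CR><LF> terminator bytes
    return offsets

# foxidrive's file.txt, assuming <CR><LF> line terminators (assumption)
data = b"d#dd\r\naaaa#aa#\r\nall#everybody#is#ok\r\naaa\r\n"

# Mimic the layout of: findstr /N /O "#" file.txt
lines = data.splitlines()
for number, (offset, line) in enumerate(zip(line_start_offsets(data), lines), start=1):
    if b"#" in line:
        print(f"{number}:{offset}:{line.decode('ascii')}")
# 1:0:d#dd
# 2:6:aaaa#aa#
# 3:16:all#everybody#is#ok
```

The offsets 0, 6 and 16 agree with the FINDSTR output posted earlier: each offset is the sum of the preceding line lengths plus 2 bytes per terminator.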


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#26 Post by dbenham » 28 Nov 2012 08:52

I've updated my SO post to clarify the /O option:
SO FINDSTR post wrote: lineOffset: = The decimal byte offset of the start of the matching line, with 0 representing the 1st character of the 1st line. Only printed if /O option is specified. This is not the offset of the match within the line. It is the number of bytes from the beginning of the file to the beginning of the line.


Dave Benham

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Undocumented FINDSTR features and limitations

#27 Post by foxidrive » 28 Nov 2012 11:08

dbenham wrote: I've updated my SO post to clarify the /O option:
SO FINDSTR post wrote: lineOffset: = The decimal byte offset of the start of the matching line, with 0 representing the 1st character of the 1st line. Only printed if /O option is specified. This is not the offset of the match within the line. It is the number of bytes from the beginning of the file to the beginning of the line.


Dave Benham


Thanks Dave.

That could be used to count the length of a line also, or several lines.

It was very confusing, as you would normally expect the character offset to refer to the match of the regex/literal. The description should say "prints file offset before each matching line" instead of the current help text:

/O Prints character offset before each matching line.
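
The line-length idea is just a subtraction of consecutive line-start offsets. The following Python sketch illustrates the arithmetic only; `line_lengths` is a hypothetical helper, a 2-byte <CR><LF> terminator is assumed, and the offset 37 for the last line of the earlier file.txt is computed by hand because FINDSTR did not print it.

```python
def line_lengths(offsets, terminator_len=2):
    """Lengths of all but the last line, from consecutive line-start offsets.

    terminator_len=2 assumes <CR><LF> line endings (assumption).
    """
    return [nxt - cur - terminator_len for cur, nxt in zip(offsets, offsets[1:])]

# Line-start offsets of the four-line file.txt shown earlier in the thread:
# 0, 6 and 16 were printed by FINDSTR /O; 37 (start of "aaa") is hand-computed.
print(line_lengths([0, 6, 16, 37]))  # [4, 8, 19]
```

The length of the final line cannot be derived this way unless another line start (or the file size) follows it.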

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#28 Post by dbenham » 28 Nov 2012 13:12

foxidrive wrote: That could be used to count the length of a line also, or several lines.

Cool idea, foxidrive!

Code: Select all

@echo off
setlocal
set "test=Hello world!"

:: Echo the length of TEST
call :strLen test

:: Store the length of TEST in LEN
call :strLen test len
echo len=%len%
exit /b

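:strLen  strVar  [rtnVar]
:: Compute the length of the string stored in variable strVar.
:: FINDSTR /O with the regex ^ prints the byte offset of the start of every
:: line, so the offset of the 2nd (empty) line gives the size of the 1st line.
setlocal disableDelayedExpansion
set len=0
if defined %~1 for /f "delims=:" %%N in (
  '"(cmd /v:on /c echo !%~1!&echo()|findstr /o ^^"'
) do set /a "len=%%N-3"
endlocal & if "%~2" neq "" (set %~2=%len%) else echo %len%
exit /b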

I haven't figured out why I must subtract 3 instead of 2, but it appears to work.

Dave Benham

Squashman
Expert
Posts: 4465
Joined: 23 Dec 2011 13:59

Matching Whole Words

#29 Post by Squashman » 13 Dec 2017 19:19

What am I not understanding about matching two whole words?

Given the following input:

Code: Select all

squash, 22, 14, 15, 12, 18, 19
squashman,22,14,15,12,18,19
josh,10, 16, 19, 3, 5, 19, 18, 7, 2, 4
joshua,10, 16, 19, 3, 5, 19, 18, 7, 2, 4

And using this code:

Code: Select all

@echo off
set "userid=squash"
set "number=15"
echo match whole word userid
findstr "\<%userid%\>" "wholetest.txt"
echo match whole word number
findstr "\<%number%\>" "wholetest.txt"
echo match two whole words
findstr "\<%userid%\>.*\<%number%\>" "wholetest.txt"
pause
goto :EOF

I get this output:

Code: Select all

match whole word userid
squash, 22, 14, 15, 12, 18, 19
match whole word number
squash, 22, 14, 15, 12, 18, 19
squashman,22,14,15,12,18,19
match two whole words

Why does it not match two whole words? I expected this line:

Code: Select all

squash, 22, 14, 15, 12, 18, 19

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Undocumented FINDSTR features and limitations

#30 Post by dbenham » 13 Dec 2017 21:07

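The explanation is in the following quote from the 2nd answer in my SO Q&A. Note the requirements that \< be the very first term and \> the very last term in the regex: your third regex has terms after the first \> and before the second \<, so it cannot match anything.
dbenham on StackOverflow wrote:
Regex word boundary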
\< must be the very first term in the regex. The regex will not match anything if any other characters precede it. \< corresponds to either the very beginning of the input, the beginning of a line (the position immediately following a <LF>), or the position immediately following any "non-word" character. The next character need not be a "word" character.

\> must be the very last term in the regex. The regex will not match anything if any other characters follow it. \> corresponds to either the end of input, the position immediately prior to a <CR>, or the position immediately preceding any "non-word" character. The preceding character need not be a "word" character.
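
The placement rule quoted above can be turned into a quick sanity check. This Python sketch is an illustration only, not FINDSTR's parser: the function names are hypothetical, and it deliberately ignores character classes and the splitting of a search string on spaces.

```python
def tokenize(regex):
    """Split a regex into tokens, keeping a backslash with the character it escapes."""
    tokens, i = [], 0
    while i < len(regex):
        step = 2 if regex[i] == "\\" and i + 1 < len(regex) else 1
        tokens.append(regex[i:i + step])
        i += step
    return tokens

def boundaries_well_placed(regex):
    """True if \\< occurs only as the first term and \\> only as the last term."""
    tokens = tokenize(regex)
    last = len(tokens) - 1
    begin_ok = all(tok != r"\<" or idx == 0 for idx, tok in enumerate(tokens))
    end_ok = all(tok != r"\>" or idx == last for idx, tok in enumerate(tokens))
    return begin_ok and end_ok

# The three search strings from Squashman's script, after variable expansion
for regex in (r"\<squash\>", r"\<15\>", r"\<squash\>.*\<15\>"):
    print(regex, boundaries_well_placed(regex))
# \<squash\> True
# \<15\> True
# \<squash\>.*\<15\> False
```

Only the third search string fails the check, which is consistent with it being the one that printed no matches.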

Dave Benham
