Brilliant idea, but ...
You really threw me for a loop for a while. I was thinking that should not work at all.
But I think I have most everything figured out.
Your implementation is not what you think it is, but it works.
1) First off, it does not matter whether the last line ends with a line feed, for two reasons:
- The regex works regardless of whether the last line ends with a line feed.
- Whenever data is piped into FINDSTR, <CR><LF> is automatically appended to the data if the last line does not end with a line feed.
2) Your regex is not what you think.
You want to find lines that are not followed by a new line beginning with a digit. The correct syntax is:
Code:
findstr /n "^" "%~1" | findstr /rv "!lf![123456789]"
Your code actually amounts to the same thing, because delayed expansion requires literal carets to be escaped if the line also contains an exclamation point. You didn't escape it, so [^123456789] becomes [123456789]. The proper way to escape a quoted caret is "^^", but we really don't want the caret anyway.
Note that the left-hand side of the pipe works because the parser breaks the line into two at the pipe. The left side doesn't have an exclamation point, so the quoted caret is safe.
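The caret behavior is easy to check with a small test script. This is only a sketch: the variable name "tag" and its value are made up for the demo, and the script assumes it is run as a batch file with delayed expansion enabled.

```batch
@echo off
setlocal enableDelayedExpansion
rem "tag" is a hypothetical variable used only for this demo
set "tag=x"

rem No exclamation point on the line, so the quoted caret is left alone
echo "[^123456789]"

rem The line contains exclamation points, so the single quoted caret is consumed
echo "!tag![^123456789]"

rem Doubling the quoted caret leaves one literal caret after delayed expansion
echo "!tag![^^123456789]"
```

The three lines should print "[^123456789]", then "x[123456789]", then "x[^123456789]". The caret is kept away from the exclamation points on purpose, because a caret placed directly in front of an exclamation point would escape the exclamation point instead.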
3) The LF variable is expanded in the main script, before the pipe is executed. I'm pleasantly surprised this works. Behind the scenes, the FINDSTR command actually becomes:
Code:
cmd /c findstr /rv "<LF>[123456789]"
I would have thought passing a quoted line feed literal would have caused problems, but apparently not.
The following also works, and it makes more sense to me. The expansion is delayed until the right side of the pipe is actually executed.
Code:
@echo off
setlocal disableDelayedExpansion
rem Define lf as a single line feed; the blank line below is required
(set lf=^

)
findstr /n "^" "%~1" | cmd /v:on /c findstr /rv "!lf![123456789]"
In this case, the right side becomes:
Code:
cmd /c cmd /v:on /c findstr /rv "!lf![123456789]"
4) And now for the odd behavior with the large file. I was able to reproduce your problem with my own 12 MB file. I too was seeing inconsistent behavior.
I believe the problem has to do with FINDSTR input line length limits when piping data. When data is piped into FINDSTR, it will fail to match any line that exceeds 8191 characters. (Note that this is a feature of FINDSTR, not of pipes in general.) I think this is exacerbated by the fact that we are attempting to search across line breaks. So even if no single line exceeds 8191 characters, the buffer may be exceeded when FINDSTR attempts to read the next line. I think your idea of a pipe buffering/timing issue may also be coming into play, hence the inconsistent behavior. I don't think it would ever be a problem if we were not searching across line breaks.
FINDSTR has no line length limit when it reads a file directly. So the solution is simple. Ditch the pipe and use a temporary file.
Code:
@echo off
setlocal enableDelayedExpansion
rem Define LF as a single line feed; the blank line below is required
(set LF=^

)
findstr /n "^" "%~1" >"%~1.temp"
findstr /rv "!LF![123456789]" "%~1.temp"
del "%~1.temp"
Dave Benham