Re: robust line counter

Posted: 21 Oct 2014 10:27
by Sponge Belly
Thanks to everyone who replied…

@Siberia-Man

More has problems, as pointed out by Squashman above and in this post. Type would be a better alternative:

Code:

type filename.txt | find /c /v ""


@FoxiDrive

Using redirected input from a file with find is unreliable… as you yourself discussed in this Batch World thread from May, 2004. :lol:

My line counting subroutine isn’t for everyday use, but if you need to know the number of lines in a text file you can’t make any assumptions about (other than it uses LFs as line terminators), then this is the code for you! ;)

- SB

Re: robust line counter

Posted: 21 Oct 2014 11:00
by siberia-man
Sponge Belly,
I agree with you. "more" has many more problems with very long lines. I remember once getting incorrect behavior with lines over 64K characters.

Re: robust line counter

Posted: 21 Oct 2014 22:24
by foxidrive
Sponge Belly wrote: @FoxiDrive

Using redirected input from a file with find is unreliable… as you yourself discussed


Geez, well spotted - I'd forgotten about that foible.

It's still broken in Windows 8.1: the find command below returns 4 when it should return 5, as findstr does.

Code:

@echo off
(
echo.123
echo.456
echo.
echo.
echo.
)>file.txt
find /c /v "" <file.txt

for /f "delims=:" %%a in ('findstr /n "^" file.txt ') do set numlines=%%a
echo findstr=%numlines%
pause

Re: robust line counter

Posted: 22 Oct 2014 04:12
by penpen
If you also want to count unfinished empty lines (which should be the standard), then your example above should report 6 instead of 4 or 5.

Appending a character other than a newline should not change the line count:

Code:

cls
@echo off
setlocal
(
echo.123
echo.456
echo.
echo.
echo.
)>file.txt

find /c /v "" <file.txt
for /f "delims=:" %%a in ('findstr /n "^" file.txt ') do set numlines=%%a

rem to avoid overflow on linecount >= 2^32 create dummy2.txt that contains 10 bytes per newline
2>nul del "dummy.txt", "dummy2.txt"
for %%a in ("file.txt") do >nul fsutil file createnew "dummy.txt" %%~za
set "found=aaaaaaaa"
> "dummy2.txt" echo(%found%
rem set found outside a parenthesised block so that %found% expands to the new value
for %%a in ("dummy2.txt") do if NOT 10 == %%~za set "found=aaaaaaaaa"
> "dummy2.txt" echo(%found%
>> "dummy2.txt" (
   for /F "tokens=2 skip=1" %%a in ('fc /b "file.txt" "dummy.txt"') do if /I "0A" == "%%~a" echo(%found%
)
for %%a in ("dummy2.txt") do set numlines2=%%~za
set "numlines2=%numlines2:~0,-1%"
del "dummy.txt", "dummy2.txt"

echo findstr=%numlines%
echo(fc=%numlines2%
echo(


>> file.txt < nul set /P "=a"
find /c /v "" <file.txt
for /f "delims=:" %%a in ('findstr /n "^" file.txt ') do set numlines=%%a

2>nul del "dummy.txt", "dummy2.txt"
for %%a in ("file.txt") do >nul fsutil file createnew "dummy.txt" %%~za
set "found=aaaaaaaa"
> "dummy2.txt" echo(%found%
rem set found outside a parenthesised block so that %found% expands to the new value
for %%a in ("dummy2.txt") do if NOT 10 == %%~za set "found=aaaaaaaaa"
> "dummy2.txt" echo(%found%
>> "dummy2.txt" (
   for /F "tokens=2 skip=1" %%a in ('fc /b "file.txt" "dummy.txt"') do if /I "0A" == "%%~a" echo(%found%
)
for %%a in ("dummy2.txt") do set numlines2=%%~za
set "numlines2=%numlines2:~0,-1%"
del "dummy.txt", "dummy2.txt"

echo findstr=%numlines%
echo(fc=%numlines2%
echo(

pause
endlocal

I assume the safest way is to use fc.exe, count the hex newline characters, and add 1 to get the line count.
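
To make that counting rule concrete, here is a minimal sketch in Python rather than batch (chosen only because it is compact and easy to check; it is not part of the script above). The function name count_lines_lf_plus_one is made up for illustration. It counts LF bytes and adds 1, so an empty unfinished last line is counted as well:

```python
def count_lines_lf_plus_one(data: bytes) -> int:
    """Count LF bytes (0x0A) and add 1, so an empty unfinished last line counts too."""
    return data.count(b"\n") + 1


# Same content as file.txt in the batch example: five CRLF-terminated lines.
sample = b"123\r\n456\r\n\r\n\r\n\r\n"

print(count_lines_lf_plus_one(sample))         # 6
print(count_lines_lf_plus_one(sample + b"a"))  # 6 (appending a non-newline character changes nothing)
```

Under this convention an empty file counts as one (empty, unfinished) line.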

penpen
Edit: Added fc.exe based line count.

Re: robust line counter

Posted: 24 Oct 2014 12:41
by Sponge Belly
Argh! Hoisted by my own petard. :oops:

I neglected the edge-case of a file whose last line is a single non-LF character. See this post for revised code.

@Penpen

Using fc /b to count the LFs in a file is an interesting approach, but you fell into the same trap I did. Your code counts the number of LFs and adds 1, but this will return a wrong result if the file ends with an LF. :(
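
The convention meant here, where a trailing LF does not start a new line, can be sketched as follows. This is Python rather than batch, purely as an illustration, and the function name count_lines_ignoring_empty_tail is hypothetical. It counts the LFs and adds 1 only when the file is non-empty and does not end with an LF:

```python
def count_lines_ignoring_empty_tail(data: bytes) -> int:
    """Count LF-terminated lines, plus one more for a non-empty unterminated last line."""
    lf_count = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        lf_count += 1  # the last line has content but no terminating LF
    return lf_count


print(count_lines_ignoring_empty_tail(b"123\r\n456\r\n\r\n\r\n\r\n"))   # 5
print(count_lines_ignoring_empty_tail(b"123\r\n456\r\n\r\n\r\n\r\na"))  # 6
print(count_lines_ignoring_empty_tail(b""))                             # 0
```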

BFN! ;)

- SB

Re: robust line counter

Posted: 25 Oct 2014 05:39
by penpen
As described, my example above also counts empty unfinished lines, so this is not an error.

You may use this, if you don't want to count empty unfinished lines (currently untested):

Code:

rem to avoid overflow on linecount >= 2^32 create dummy2.txt that contains 10 bytes per newline
2>nul del "dummy.txt", "dummy2.txt"
for %%a in ("file.txt") do >nul fsutil file createnew "dummy.txt" %%~za
set "found=aaaaaaaa"
> "dummy2.txt" echo(%found%
for %%a in ("dummy2.txt") do if NOT 10 == %%~za set "found=aaaaaaaaa"
set "newline=1"
> "dummy2.txt" (
   for /F "tokens=2 skip=1" %%a in ('fc /b "file.txt" "dummy.txt"') do if /I "0A" == "%%~a" (
      echo(%found%
      set "newline=1"
   ) else (
      set "newline=0"
   )
)
rem tested after the block so that %newline% expands once the loop has finished
if "%newline%" == "0" >> "dummy2.txt" echo(%found%
for %%a in ("dummy2.txt") do set numlines2=%%~za
set "numlines2=%numlines2:~0,-1%"
del "dummy.txt", "dummy2.txt"

penpen

Re: robust line counter

Posted: 25 Oct 2014 06:56
by Sponge Belly
Penpen wrote: I assume the safest way is to use fc.exe, count the hex newline characters, and add 1 to get the line count.
That’s what made me think you were incorrectly adding 1 to the line count when a file ended with a LF. Sorry I misunderstood.

Re: robust line counter

Posted: 28 Feb 2018 13:46
by Sponge Belly
Below is my third (and hopefully final) version of my line-counting subroutine:

Code:

@echo off & setLocal enableExtensions disableDelayedExpansion

call :countLines noOfLines "%~1" || (
    >&2 echo(file "%~nx1" is empty & goto end
) %= cond exec =%
echo(file "%~nx1" has %noOfLines% line(s)

:end - exit program with appropriate errorLevel
endLocal & goto :EOF

:countLines result= "%file%"
:: counts the number of lines in a file
setLocal disableDelayedExpansion
(set "lc=0" & call)

for /f "delims=:" %%N in ('
    cmd /d /a /c type "%~2" ^^^& ^<nul set /p "=#" ^| (^
    2^>nul findStr /n "^" ^&^& echo(^) ^| ^
    findStr /blv 1: ^| 2^>nul findStr /lnxc:" "
') do (set "lc=%%N" & call;) %= for /f =%

endlocal & set "%1=%lc%"
exit /b %errorLevel% %= countLines =%

Empty files will now raise an error. Works with Unicode as well as ANSI text files.