robust line counter

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Sponge Belly
Posts: 231
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: robust line counter

#16 Post by Sponge Belly » 21 Oct 2014 10:27

Thanks to everyone who replied…

@Siberia-Man

More has problems, as pointed out by Squashman above and in this post. Piping the output of type into find would be a better alternative:

Code:

type filename.txt | find /c /v ""


@FoxiDrive

Using redirected input from a file with find is unreliable… as you yourself discussed in this Batch World thread from May, 2004. :lol:

My line counting subroutine isn’t for everyday use, but if you need to know the number of lines in a text file you can’t make any assumptions about (other than it uses LFs as line terminators), then this is the code for you! ;)

- SB

siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28

Re: robust line counter

#17 Post by siberia-man » 21 Oct 2014 11:00

Sponge Belly
I agree with you. "more" has many more problems with huge lines. I remember once seeing incorrect behavior with lines over 64K characters.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: robust line counter

#18 Post by foxidrive » 21 Oct 2014 22:24

Sponge Belly wrote:@FoxiDrive

Using redirected input from a file with find is unreliable… as you yourself discussed


Geeze, well spotted - I'd forgotten about that foible.

It's still broken in Windows 8.1: this returns 4 when it should return 5, as findstr does below.

Code:

@echo off
(
echo.123
echo.456
echo.
echo.
echo.
)>file.txt
find /c /v "" <file.txt

for /f "delims=:" %%a in ('findstr /n "^" file.txt ') do set numlines=%%a
echo findstr=%numlines%
pause

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: robust line counter

#19 Post by penpen » 22 Oct 2014 04:12

If you also want to count unfinished empty lines (which should be the standard), then your example above should report 6 instead of 4 or 5.

Appending a character other than a newline should not change the line count:

Code:

cls
@echo off
setlocal
(
echo.123
echo.456
echo.
echo.
echo.
)>file.txt

find /c /v "" <file.txt
for /f "delims=:" %%a in ('findstr /n "^" file.txt ') do set numlines=%%a

rem to avoid overflow on linecount >= 2^32 create dummy2.txt that contains 10 bytes per newline
2>nul del "dummy.txt", "dummy2.txt"
for %%a in ("file.txt") do >nul fsutil file createnew "dummy.txt" %%~za
set "found=aaaaaaaa"
> "dummy2.txt" echo(%found%
rem choose the padding outside a parenthesized block so the echo below sees the updated value
for %%a in ("dummy2.txt") do if NOT 10 == %%~za set "found=aaaaaaaaa"
> "dummy2.txt" echo(%found%
>> "dummy2.txt" (
   for /F "tokens=2 skip=1" %%a in ('fc /b "file.txt" "dummy.txt"') do if /I "0A" == "%%~a" echo(%found%
)
for %%a in ("dummy2.txt") do set numlines2=%%~za
set "numlines2=%numlines2:~0,-1%"
del "dummy.txt", "dummy2.txt"

echo findstr=%numlines%
echo(fc=%numlines2%
echo(


>> file.txt < nul set /P "=a"
find /c /v "" <file.txt
for /f "delims=:" %%a in ('findstr /n "^" file.txt ') do set numlines=%%a

2>nul del "dummy.txt", "dummy2.txt"
for %%a in ("file.txt") do >nul fsutil file createnew "dummy.txt" %%~za
set "found=aaaaaaaa"
> "dummy2.txt" echo(%found%
rem choose the padding outside a parenthesized block so the echo below sees the updated value
for %%a in ("dummy2.txt") do if NOT 10 == %%~za set "found=aaaaaaaaa"
> "dummy2.txt" echo(%found%
>> "dummy2.txt" (
   for /F "tokens=2 skip=1" %%a in ('fc /b "file.txt" "dummy.txt"') do if /I "0A" == "%%~a" echo(%found%
)
for %%a in ("dummy2.txt") do set numlines2=%%~za
set "numlines2=%numlines2:~0,-1%"
del "dummy.txt", "dummy2.txt"

echo findstr=%numlines%
echo(fc=%numlines2%
echo(

pause
endlocal

I assume the safest way is to use fc.exe to count the hex newline characters and add 1 to get the line count.

penpen
Edit: Added fc.exe based line count.
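
The counting rule described in the post above (number of LF bytes plus 1, so that an unfinished last line counts even when it is empty) can be sketched outside batch. The following is only an illustrative Python sketch, not part of the thread's code: the function name and the sample bytes are assumptions, and the sample reproduces the test file from the scripts above with CRLF line endings as written by echo. It also assumes an encoding in which the byte 0x0A appears only as a line feed (ANSI or UTF-8).

```python
def count_lines_including_unfinished(data: bytes) -> int:
    """Count LF bytes and add 1.

    Whatever follows the last LF is treated as one more line,
    even when it is empty (the "unfinished empty line").
    """
    return data.count(b"\n") + 1


# Same content as the test file: "123", "456" and three empty lines, CRLF-terminated.
sample = b"123\r\n456\r\n\r\n\r\n\r\n"

print(count_lines_including_unfinished(sample))          # 5 LFs + 1 = 6
# Appending a non-newline character must not change the result.
print(count_lines_including_unfinished(sample + b"a"))   # still 6
```

Under this rule an empty file reports 1 line, which is consistent with counting the unfinished empty line.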

Sponge Belly
Posts: 231
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: robust line counter

#20 Post by Sponge Belly » 24 Oct 2014 12:41

Argh! Hoisted by my own petard. :oops:

I neglected the edge-case of a file whose last line is a single non-LF character. See this post for revised code.

@Penpen

Using fc /b to count the LFs in a file is an interesting approach, but you fell into the same trap I did. Your code counts the number of LFs and adds 1, but this will return a wrong result if the file ends with an LF. :(

BFN! ;)

- SB
Last edited by Sponge Belly on 25 Oct 2014 06:50, edited 1 time in total.

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: robust line counter

#21 Post by penpen » 25 Oct 2014 05:39

As described, my above example counts empty unfinished lines, too, so this is no error.

You may use this, if you don't want to count empty unfinished lines (currently untested):

Code:

rem to avoid overflow on linecount >= 2^32 create dummy2.txt that contains 10 bytes per newline
2>nul del "dummy.txt", "dummy2.txt"
for %%a in ("file.txt") do >nul fsutil file createnew "dummy.txt" %%~za
set "found=aaaaaaaa"
> "dummy2.txt" echo(%found%
for %%a in ("dummy2.txt") do if NOT 10 == %%~za set "found=aaaaaaaaa"
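setlocal enableDelayedExpansion
set "newline=1"
> "dummy2.txt" (
   for /F "tokens=2 skip=1" %%a in ('fc /b "file.txt" "dummy.txt"') do if /I "0A" == "%%~a" (
      echo(%found%
      set "newline=1"
   ) else (
      set "newline=0"
   )

   rem newline must be read with delayed expansion, otherwise its value is fixed when the block is parsed
   if "!newline!" == "0" echo(%found%
)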
for %%a in ("dummy2.txt") do set numlines2=%%~za
set "numlines2=%numlines2:~0,-1%"
del "dummy.txt", "dummy2.txt"

penpen
Last edited by penpen on 26 Oct 2014 05:10, edited 1 time in total.
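
For comparison, the variant that does not count an empty unfinished line can be sketched the same way: count the LFs, and add 1 only when there is trailing text after the last LF. Again this is a hedged Python illustration rather than the batch code itself; the function name and sample bytes are assumptions, and the byte 0x0A is assumed to occur only as a line feed.

```python
def count_lines_without_empty_tail(data: bytes) -> int:
    """Count LF-terminated lines, plus a final line only if it is non-empty."""
    count = data.count(b"\n")
    # A last line without a terminating LF still counts, but only if it has content.
    if data and not data.endswith(b"\n"):
        count += 1
    return count


# Same content as the test file: "123", "456" and three empty lines, CRLF-terminated.
sample = b"123\r\n456\r\n\r\n\r\n\r\n"

print(count_lines_without_empty_tail(sample))          # 5
print(count_lines_without_empty_tail(sample + b"a"))   # 6, the unfinished line "a" counts
print(count_lines_without_empty_tail(b""))             # 0
```

The two rules differ only for files that end with an LF (or are empty), which is exactly the edge case discussed in the posts above.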

Sponge Belly
Posts: 231
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: robust line counter

#22 Post by Sponge Belly » 25 Oct 2014 06:56

Penpen wrote: I assume the safest way is to use fc.exe to count the hex newline characters and add 1 to get the line count.

That’s what made me think you were incorrectly adding 1 to the line count when a file ended with an LF. Sorry, I misunderstood.

Sponge Belly
Posts: 231
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: robust line counter

#23 Post by Sponge Belly » 28 Feb 2018 13:46

Below is my third (and hopefully final) version of my line-counting subroutine:

Code:

@echo off & setLocal enableExtensions disableDelayedExpansion

call :countLines noOfLines "%~1" || (
    >&2 echo(file "%~nx1" is empty & goto end
) %= cond exec =%
echo(file "%~nx1" has %noOfLines% line(s)

:end - exit program with appropriate errorLevel
endLocal & goto :EOF

:countLines result= "%file%"
:: counts the number of lines in a file
setLocal disableDelayedExpansion
(set "lc=0" & call)

for /f "delims=:" %%N in ('
    cmd /d /a /c type "%~2" ^^^& ^<nul set /p "=#" ^| (^
    2^>nul findStr /n "^" ^&^& echo(^) ^| ^
    findStr /blv 1: ^| 2^>nul findStr /lnxc:" "
') do (set "lc=%%N" & call;) %= for /f =%

endlocal & set "%1=%lc%"
exit /b %errorLevel% %= countLines =%

Empty files now raise an error: the subroutine returns a non-zero errorLevel, which the caller tests with ||. It works with Unicode as well as ANSI text files.
