Determining the number of lines in a file.

Squashman
Expert
Posts: 4176
Joined: 23 Dec 2011 13:59

Re: Determining the number of lines in a file.

#31 Post by Squashman » 07 Jan 2012 21:26

I think I can use Judago's Divide.bat
http://judago.webs.com/content/Scripts/divide.txt

#32 Post by Squashman » 09 Jan 2012 18:09

I was thinking about making this post a separate thread but since it is going to be integrated into this one script I am just going to post it here.

As I said in my last post, I am going to use Judago's Divide.bat to calculate the number of lines because we may run into file sizes that are larger than the integer maximum of 2147483647. I am going to take the total line length (data + EOL) from the code that Dave gave me and divide the file size by it.

Judago's script does seem to return the correct MOD (remainder) when you specify the number of decimal places as zero.

Code: Select all

E:\batch files\HEAD>divide.bat 24627955 / 985 "" 0
25003 - Leftover: (this isn't always the mod result)

E:\batch files\HEAD>divide.bat 24627956 / 985 "" 0
25003 - Leftover:1 (this isn't always the mod result)

E:\batch files\HEAD>divide.bat 24627954 / 985 "" 0
25002 - Leftover:984 (this isn't always the mod result)

When you don't specify the number of decimals, you get some weird output.

Code: Select all

E:\batch files\HEAD>divide.bat 24627956 / 985
25003.00101522 - Leftover:830 (this isn't always the mod result)

His code can also return the result in a variable, and that seems to work fine.

Code: Select all

E:\batch files\HEAD>divide.bat 24627956 / 985 result 0

E:\batch files\HEAD>echo %result%
25003

But I also want to return the MOD (remainder), or what he calls the Leftover. When I change the output code at the bottom to this, I do not get what is stored in the array variable.
The line I added is marked with a comment.

Code: Select all

if not "%~3"=="" (
    endlocal
    set %~3=%total%
    rem Added line:
    set %~5=!input%chunk%!
) else (
    echo %total% - Leftover:!input%chunk%! ^(this isn't always the mod result^)
    endlocal
)

So when I run it like this.

Code: Select all

E:\batch files\HEAD>divide.bat 24627956 / 985 result 0 remainder

E:\batch files\HEAD>echo %result% %remainder%
25003 !input0!

Why does the Leftover ECHO fine from the batch file, but the variable does not get set the way I need? The remainder should be 1. Not sure what I am doing wrong. Probably a rookie mistake.

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

#33 Post by Ed Dyreen » 09 Jan 2012 20:16

'
Without reading this lengthy thread nor understanding everything, try: SETLOCAL ENABLEDELAYEDEXPANSION
http://judago.webs.com/content/Scripts/divide.txt

Code: Select all

:finish
if "%total:~0,1%"=="0" if not "%total:~1%"=="" set total=%total:~1%&&goto finish
if "%total:~0,1%"=="." set total=0%total%
set "Squashman=!input%chunk%!"
if not "%~3"=="" (
    endlocal
    set %~3=%total%
    set "%~5=%Squashman%"
) else (
    echo %total% - Leftover:!input%chunk%! ^(this isn't always the mod result^)
    endlocal
)
exit /b 0

#34 Post by Squashman » 10 Jan 2012 10:23

Thanks Ed. That worked. Didn't realize I needed to assign that variable to another variable before the ENDLOCAL. Delayed Expansion was enabled higher up in the script.
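
For readers wondering why the extra assignment is needed, here is a minimal sketch, assuming the caller runs with delayed expansion disabled. Percent variables inside a parenthesized block are substituted when the block is parsed, before ENDLOCAL runs, whereas !var! is expanded only when each command executes, by which time ENDLOCAL has already discarded the local variables and restored the caller's expansion mode. The names :getValues, quotient, and leftover are hypothetical and are not part of Judago's script.

```batch
@echo off
rem Hypothetical demo, not Judago's code: return two values past ENDLOCAL.
setlocal DisableDelayedExpansion
call :getValues quotient leftover
echo quotient=%quotient% leftover=%leftover%
endlocal
exit /b 0

:getValues
setlocal EnableDelayedExpansion
set "chunk=0"
set "input0=1"
set "total=25003"
rem Copy the indirectly named value into a plain variable while the
rem local scope (and delayed expansion) is still active.
set "mod=!input%chunk%!"
rem %total% and %mod% are substituted when this block is parsed,
rem so their values survive the ENDLOCAL on the next line.
(
    endlocal
    set "%~1=%total%"
    set "%~2=%mod%"
)
exit /b 0
```

Writing `set "%~2=!input%chunk%!"` inside the block instead would leave the literal text `!input0!` in the caller's variable, which matches the symptom reported earlier in the thread.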

Aacini
Expert
Posts: 1659
Joined: 06 Dec 2011 22:15
Location: México City, México

#35 Post by Aacini » 13 Jan 2012 20:15

I wrote a very small program that reads bytes from stdin looking for the first LF character and displays the length of the first line; it is called LINE1LEN.COM. It returns an ERRORLEVEL of 2 if the line ends with CR+LF, or 1 if it ends with LF only.

I also wrote a second program that counts the number of LF characters it reads from stdin and displays it; it is called NUMLINES.COM. It returns an ERRORLEVEL of one or two digits that represent the last two characters of the file this way: 1=LF, 2=EOF, 0=any other. For example: 12=LF+EOF, 2=EOF only, etc. The number of lines is incremented by 1 if the last characters do not include an LF.

These programs have no limit on the size of the file they can read. The characters-per-line counter is a 16-bit value (up to 65535) and the number-of-lines counter is a 32-bit value. They are .COM executable files written in assembly language, but they are not the fastest possible because they read just one character at a time to keep them simple and small. However, I am confident that they will run faster than equivalent Batch methods.

Although I wrote these programs specifically for the requirements of this topic, they may also be of general use. For example, you no longer need the LINE1LEN program because NUMLINES directly gives the real number of lines in the file, no matter the length of each one.

Code: Select all

@echo off
if not exist line1len.com call :CreateLine1Len
if not exist numlines.com call :CreateNumLines

line1len < %1 > line1len.tmp
set line1eol=%errorlevel%
set /P line1len=< line1len.tmp
numlines < %1 > numlines.tmp
set fileEof=%errorlevel%
set /P numlines=< numlines.tmp
echo Length of first line: %line1len%
set /P =First line ends with: < NUL
if %line1eol% == 1 (echo LF only) else echo CR+LF
echo Number of lines in the file: %numlines%
set /P =The file ends with: < NUL
if %fileEof% == 0  echo Normal character (no LF or EOF)
if %fileEof% == 1  echo LF character only (no EOF)
if %fileEof% == 2  echo EOF character only (no previous LF)
if %fileEof% == 10 echo LF followed by a normal character (no EOF)
if %fileEof% == 12 echo LF and EOF characters
goto :EOF

:CreateLine1Len
setlocal DisableDelayedExpansion
set line1len=3ɲ+€ê!´)€ì!Í!:ÂtLAŠð´,€ì!Í!"Àuç2ÀþÀR².€ê!R:òu1þÀI‹ø‹Á3ÉAA2ÿ³+€ë!3Ò÷ó€Â0RA#ÀuóZ´#€ì!Í!âö‹Ç´LÍ!ëÀëÐ
setlocal EnableDelayedExpansion
echo !line1len!>line1len.com
exit /B

:CreateNumLines
setlocal DisableDelayedExpansion
set numlines=f3ɲ+€ê!´)€ì!Í!:Âu^|fAŠûŠØ´,€ì!Í!"ÀuäR:úuh²;€ê!:Úua°-,!ë °+,!fAëõ:ÚuPþÀëífA²;€ê!:ÚuâþÀþÀ².€ê!R‹øf‹Á3ÉAAf3Û³+€ë!f3Òf÷ó€Â0RAf#ÀuðZ´#€ì!Í!âö‹Ç´LÍ!ë„ë®ë¤ë²
setlocal EnableDelayedExpansion
echo !numlines!>numlines.com
exit /B

#36 Post by Squashman » 26 Jan 2012 14:57

Well here is some interesting output from this batch.

Code: Select all

E:\batch files\HEAD>FileAttrib.bat EST3_CRLF.txt
FC: cannot open 25712336 - No such file or folder

The system cannot find the batch label specified - error

E:\batch files\HEAD>type FileAttrib.log
Filename                                Quantity   RecLength  EOL
25712336                                           1026       0D0A
EST3_CRLF.txt                                      1026       0D0A

E:\batch files\HEAD>dir EST3_CRLF.txt | find "EST3_CRLF.txt"
11/29/11  01:20 PM        25,712,336 EST3_CRLF.txt

It looks like it is putting the file size back into the FC command. Not sure why I am getting the batch label error.
I basically combined Dave's script for getting the line length with Judago's Divide.bat for doing math on large numbers into one batch file. It takes the line length from Dave's script and uses Judago's script to divide the file size by the line length, which in theory should give the number of lines in the file, given that all the files I use are fixed-length files. I then output this to a log file.

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Filename for 40  Quantity for 11    RecLength for 11  EOL for 4
echo.Filename                                Quantity   RecLength  EOL >FileAttrib.log
:loop
set "fSize=%~z1"
::Build a dummy file with length 32kbytes to do a binary compare with
::This file could be made larger or smaller, depending on requirements
<nul set /p ".=A" >dummy.txt
for /l %%n in (1 1 15) do type dummy.txt >>dummy.txt

::Use FC /B to compare with dummy. Use FINDSTR to locate offset and hex representation of each CR or LF
::Use FOR /F to only look at the 1st two instances.
for /f "tokens=1,2 delims=: " %%A in ('fc /b "%~1" dummy.txt ^| findstr /r /c:": 0D 41$" /c:": 0A 41$"') do (
  if not defined eolOffset (
      set /a "eolOffset=0x%%A, next=eolOffset+1, eolSize=1"
      set "eol=%%B"
  ) else (
      set /a "eolOffset2=0x%%A"
      if "!eolOffset2!"=="!next!" if "!eol!" neq "%%B" (
        set "eol=!eol!%%B"
        set /a "eolSize+=1"
      )
      goto :break
  )
)

:break
set /a "recordLen=eolOffset, lnLen=recordLen+eolSize"

CALL :division %fSize% / %lnLen% records 0 remainder
SET "OFile=%~1                                        ."
SET "records=%records%           ."
SET "recordLen=%recordLen%           ."
SET "eol=%eol%    ."
echo.%Ofile:~0,40%%records:~0,11%%recordLen:~0,11%%eol:~0,4%>>FileAttrib.log


:: echo Record 1 length = %recordLen%
:: echo             eol = %eol%
:: echo         eolSize = %eolSize%
:: echo   Line 1 length = %lnLen%
endlocal
exit /b


:division

if "%~1"=="/?" (
    echo.&echo USAGE:&echo.
    echo "%~0" largenumber smallnumber [variablename] [places]
    echo "%~0" largenumber / smallnumber [variablename] [places]
    echo.&echo "largenumber" can floating point "smallnumber" can't.
    echo Only the first [places] decimal places are used for input or
    echo output. if [places] is omitted then the default of 8 is used.
    echo.&echo To specify [places] with out [variablename] pass an empty set.
    echo Below is an example:
    echo.&echo "%~0" 46546545464.123456789 / 1024 "" 9&echo.
    echo.&echo "smallnumber" must be below 2097153, "largenumber" can have
    echo hundreds of places.&echo.&echo -Judago 2009/2010
    exit /b 0
)
SETLOCAL ENABLEDELAYEDEXPANSION
set error=Invalid Input
if "%~1"=="" goto error
if "%~2"=="/" shift /2
if "%~2"=="" goto error
for /f "delims=1234567890." %%a in ("%~1%~2") do goto error
set divisor=%~2
set error=Divisor must be whole
if not "!divisor!"=="!divisor:.=!" goto error
if !divisor! gtr 2097152 (
    set error=Divisor too large, limited to: 2097152
    goto error
)
set dplace=%~4
if not defined dplace set dplace=8
for /f "delims=1234567890." %%a in ("%~4") do set dplace=8
set input=0%~1
for /l %%a in (1 1 %dplace%) do set input=!input!0
set error=Divide by zero
if "!divisor:0=!"=="" goto error
set chunk=
set total=

set /a fpos=dplace + 1
:isfloat
if not "!input:.=!"=="!input!" (
    if not "!input:~-%fpos%,1!"=="." (
        set input=!input:~0,-1!
        goto isfloat
    ) else (
        set input=!input:.=!
    )
)

:split
if not "%input:~3%"=="" (
    set /a chunk+=1
    set input!chunk!=%input:~-3%
    set input=%input:~0,-3%
    if defined input goto split
) else (
    set /a chunk+=1
    set input!chunk!=%input%
)

:loop
if defined input%chunk% (
    if "!input%chunk%:~0,1!"=="0" (
        set input%chunk%=!input%chunk%:~1!
        goto loop
    )
) else (
    set input%chunk%=0
    goto pad
)
set chunkresult=0

:divide
If !input%chunk%! geq !divisor! (
    If !input%chunk%! geq !divisor!000 (
        set /a input%chunk%-=!divisor!000
        set /a chunkresult+=1000
        goto divide
    ) else (
        If !input%chunk%! geq !divisor!00 (
            set /a input%chunk%-=!divisor!00
            set /a chunkresult+=100
            goto divide
        ) else (
            If !input%chunk%! geq !divisor!0 (
                set /a input%chunk%-=!divisor!0
                set /a chunkresult+=10
                goto divide
            ) else (
                set /a input%chunk%-=!divisor!
                set /a chunkresult+=1
                goto divide
            )
        )
    )
)
:pad
if "!chunkresult:~2,1!"=="" set chunkresult=0!chunkresult!
if "!chunkresult:~2,1!"=="" set chunkresult=0!chunkresult!
set total=%total%%chunkresult%
set chunkresult=0
if %chunk% gtr 0 (
    set /a chunk-=1
    if !input%chunk%! gtr 0 (
        set carry=!input%chunk%!
        for %%a in (!chunk!) do set input!chunk!=!carry!!input%%a!
    )
)
if %chunk% gtr 0 goto loop
if not defined total set total=0
if %dplace% gtr 0 set total=!total:~0^,-%dplace%!.!total:~-%dplace%!

:finish
if "%total:~0,1%"=="0" if not "%total:~1%"=="" set total=%total:~1%&&goto finish
if "%total:~0,1%"=="." set total=0%total%
set "mod=!input%chunk%!"
IF NOT DEFINED mod SET mod=0
if not "%~3"=="" (
    endlocal
    set %~3=%total%
    set "%~5=%mod%"
) else (
    echo %total% - Leftover:!input%chunk%! ^(this isn't always the mod result^)
    endlocal
)
exit /b 0

:Error
1>&2 echo %error% - See "%~0 /?"
endlocal
exit /b 1

#37 Post by Squashman » 05 Feb 2013 13:00

Just bumping this to the top. I abandoned this project last year because I got frustrated with it not working. Going to start working on this again.

#38 Post by Squashman » 05 Feb 2013 14:33

I threw in two echos after the :BREAK label and two after the CALL just to see where I am at in the code and to see what the %1 variable is.

Code: Select all

:break
echo OUT OF THE FC FOR LOOP
echo %1
set /a "recordLen=eolOffset, lnLen=recordLen+eolSize"
SET "OFile=%~1                                        ."
CALL :division %fSize% / %lnLen% records 0 remainder
echo Back from Division sub routine
echo %1


The output again baffles me. It looks like it is trying to run the For Loop with the FC command twice and it also looks like it is running the CALL to the division subroutine twice. It should only do each once.

Code: Select all

C:\batch files\Fixed_Attrib>Fixed_Attrib.bat EST2.txt
OUT OF THE FC FOR LOOP
EST2.txt
FC: cannot open 25690405 - No such file or folder

OUT OF THE FC FOR LOOP
25690405
Invalid Input - See ":division /?"
Back from Division sub routine
25690405
Back from Division sub routine
EST2.txt

C:\batch files\Fixed_Attrib>

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

#39 Post by Liviu » 05 Feb 2013 18:09

Squashman wrote:The output again baffles me. It looks like it is trying to run the For Loop with the FC command twice and it also looks like it is running the CALL to the division subroutine twice. It should only do each once.

Short answer is that you have two :loop labels.

That said, I haven't followed all of it, and you could probably make it easier on others if you described the context in more detail e.g. what the various files are (est2.txt, FileAttrib.log, dummy.txt).

Liviu
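
To illustrate why a duplicate label can produce this kind of output, here is a minimal sketch. It relies on the behavior that GOTO searches for its label from the current position to the end of the file and only then wraps around to the top, so a GOTO located after the second :loop finds the first one. The script and the :sub name are hypothetical and are not taken from the batch file posted above.

```batch
@echo off
rem Hypothetical demo of two labels with the same name in one file.
:loop
echo At the first :loop, argument is [%~1]
if "%~1"=="inner" exit /b 0
call :sub inner
echo Back in the main code
exit /b 0

:sub
:loop
echo At the second :loop, inside :sub
rem The search for "loop" starts below this line, reaches end of file,
rem then restarts at the top, so it lands on the FIRST :loop while
rem still inside the CALL, where %1 is now "inner".
goto loop
```

Run without arguments, this should reach the first :loop twice, the second time with the subroutine's argument in %1, which mirrors how the FC loop above ran again with the file size as its file name. Giving the label inside the subroutine a unique name (for example :divloop, a made-up name) avoids the collision.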

#40 Post by Squashman » 06 Feb 2013 07:15

Well you would have to read the entire thread to understand what everything is doing.

I never saw the two :LOOP labels. I have no idea how that got in there. I can't believe I never saw that.

I feel like a complete IDIOT! :oops:

I should have turned the ECHO ON to see all the commands executing!!!! Lesson Learned again!

Script is working as it should now.

If you haven't read through this thread in the past, what my script does is report fixed-file attributes. I work with fixed-length text files, which means every line within the text file is the same length. I wanted a quick way to find out what the line length was and how many lines were in the file. Dave's code also gave me the ability to determine what the end-of-line characters are: CR+LF or just LF.

Big Thanks to Dave for the Line Length code.
Thanks to Ed for helping me tweak Judago's script.
A Huge thanks to Liviu for finding that stupid coding mistake.

So basically my output looks like this.

Code: Select all

Filename                                Quantity   RecLength  EOL 
EST2.txt                                25015      1026       0A
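
As a quick sanity check of this output, the arithmetic can be reproduced with plain SET /A, since this particular file size fits in a 32-bit integer. The sketch assumes the 25690405-byte size that appeared for EST2.txt in the earlier FC error message, and a line length of 1026 data bytes plus 1 LF byte; the variable names are illustrative only.

```batch
@echo off
setlocal
rem Values taken from the thread: EST2.txt is assumed to be 25690405 bytes,
rem each record is 1026 bytes of data plus a 1-byte LF terminator.
set /a "fSize=25690405, recordLen=1026, eolSize=1"
set /a "lnLen=recordLen+eolSize"
rem In a batch file the modulo operator must be written as %%.
set /a "records=fSize/lnLen, remainder=fSize%%lnLen"
echo records=%records% remainder=%remainder%
endlocal
```

This prints `records=25015 remainder=0`. A zero remainder is what one would expect when every line, including the last, has the same length and terminator. For files larger than 2147483647 bytes SET /A is no longer usable, which is why the script relies on Divide.bat.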


I am going to tweak some more so that the script can take multiple input files. All my scripts at work do that so I will just make sure I don't do another stupid duplicate LABEL!

booga73
Posts: 108
Joined: 30 Nov 2011 16:16

#41 Post by booga73 » 08 Feb 2013 14:56

I came across this code to help identify the number of lines in a text file, and I have used it in my scripting too.

Perhaps this may help?

:: code start to identify number of lines in a text
Set FileName=NameOfFile
Set /a LineNumb=0
for /f "tokens=2 delims=:" %%a in ('find /c /v "" %FileName%') do set /a LineNumb=%%a
@Echo %FileName% has %LineNumb% lines.

#42 Post by Squashman » 08 Feb 2013 15:08

booga73 wrote:I came across this code to help identify the number of lines in a text file, and I have used it in my scripting too.

Perhaps this may help?

:: code start to identify number of lines in a text
Set FileName=NameOfFile
Set /a LineNumb=0
for /f "tokens=2 delims=:" %%a in ('find /c /v "" %FileName%') do set /a LineNumb=%%a
@Echo %FileName% has %LineNumb% lines.

Did you read my first post in this thread?

#43 Post by Aacini » 08 Feb 2013 22:20

Perhaps you may want to test the Batch file below; it uses FINDSTR with the /O option to get the line length, and divides the file size by the line length so that the remainder is obtained correctly.

FINDSTR /O gets the length of the first (any) line in bytes, regardless of whether it ends in CR+LF or just LF.

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set fSize=%~Z1
echo File size: %fSize%

rem Get the length of the first line
for /F "skip=1 delims=:" %%a in ('findstr /O "^" "%~1"') do set lineLen=%%a& goto break
:break
echo Line length: %lineLen%

rem Split the file size in groups of 4 digits
set N=0
:nextGroup
   set group=%fSize:~-4%
   :checkLeftZero
      if "%group:~0,1%" neq "0" goto noLeftZero
      set group=%group:~1%
   if defined group goto checkLeftZero
   :noLeftZero
   if not defined group set group=0
   set /A N+=1
   set group[%N%]=%group%
   set fSize=%fSize:~0,-4%
if defined fSize goto nextGroup

rem Divide the groups by the line length and assemble the result
set quotient=
set remainder=0
for /L %%i in (%N%,-1,1) do (
   set /A group=remainder*10000+group[%%i], group[%%i]=group/lineLen, remainder=group%%lineLen
   if not defined quotient (
      if !group[%%i]! neq 0 set quotient=!group[%%i]!
   ) else (
      set group=000!group[%%i]!
      set quotient=!quotient!!group:~-4!
   )
)

echo Number of records: %quotient%
echo Remainder: %remainder%


Antonio

EDIT: I fixed a small error: the longest line was split at this part "remainder=group%%lineLen" because of my text editor.
Last edited by Aacini on 09 Feb 2013 08:33, edited 1 time in total.

dbenham
Expert
Posts: 2383
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#44 Post by dbenham » 09 Feb 2013 00:47

Aacini wrote:Perhaps you may want to test the Batch file below; it uses FINDSTR with the /O option to get the line length, and divides the file size by the line length so that the remainder is obtained correctly.

FINDSTR /O gets the length of the first (any) line in bytes, regardless of whether it ends in CR+LF or just LF.

The whole point of this thread is to quickly get the number of lines in a file, assuming all lines are the same length. As Squashman clearly stated in the opening post to this thread, any technique that uses FOR /F or FINDSTR or FIND to scan the entire file is unacceptable for performance reasons. Try running your script against a multi-gigabyte file.


Dave Benham

#45 Post by Aacini » 09 Feb 2013 08:28

dbenham wrote:The whole point of this thread is to quickly get the number of lines in a file, assuming all lines are the same length. As Squashman clearly stated in the opening post to this thread, any technique that uses FOR /F or FINDSTR or FIND to scan the entire file is unacceptable for performance reasons. Try running your script against a multi-gigabyte file.


Dave Benham


Oops! You are right! I thought for a moment that the GOTO BREAK would break the FOR after reading the second line (I was so tired last night... :oops: )

This problem can be solved by executing FINDSTR /O outside a FOR and passing its output through a pipe to code that reads just the second line and terminates, so the OS will cancel FINDSTR after a "write to a non-existent pipe" error. Here it is:

Code: Select all

rem Get the length of the first line
findstr /O "^" "%~1" 2>NUL | findstr /V "^0:" 2>NUL | (set /P lineLen=& set lineLen ) > lineLen.txt
for /F "tokens=2 delims==:" %%a in (lineLen.txt) do set lineLen=%%a
echo Line length: %lineLen%

This way, the first FINDSTR starts to output line offsets, the second FINDSTR takes them and outputs the second line, and the code that follows reads it, saves it in a file, and terminates. At that moment the second FINDSTR is aborted, so the first FINDSTR is aborted as well.

I tried at first to directly read the second line this way:

Code: Select all

findstr /O "^" "%~1" 2>NUL | (set /P =&set /P lineLen=& set lineLen ) > lineLen.txt

but for a reason I don't understand, only the first output line can be read correctly in the piped block, so I had to insert a second piped FINDSTR to move the second line to the first place.

Antonio
