Count string occurrences in text file

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Count string occurrences in text file

#1 Post by renzlo » 18 May 2011 09:19

Experts,

How do you count string occurrences? For example, I have a text file with these contents:
hand randomtexthand randomtext
hand randomtext
hand randomtext
hand randomtext

I want to know the total count of hand, so it should be 5. But when I use find /c the output is 4, because it only counts the matching lines. How can I solve this?
Last edited by renzlo on 18 May 2011 15:55, edited 1 time in total.
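
For reference, the behaviour described above can be reproduced with a minimal sketch like the following. The scratch file name sample.txt is an assumption made for illustration; the contents are the four lines from the question.

```batch
@echo off
rem Recreate the sample text in a scratch file (the file name is hypothetical).
(
  echo hand randomtexthand randomtext
  echo hand randomtext
  echo hand randomtext
  echo hand randomtext
) > sample.txt

rem find /c reports the number of matching LINES, not the number of matches,
rem so the first line is counted once even though it contains "hand" twice.
type sample.txt | find /c "hand"
```

Piping the file into find makes it print only the number, which is the 4 reported in the question rather than the wanted 5.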

Ed Dyreen
Expert
Posts: 1441
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Count string occurrences

#2 Post by Ed Dyreen » 18 May 2011 10:33

Code: Select all

@echo off &SetLocal EnableExtensions EnableDelayedExpansion

echo.SKIP>"TST1.TXT"
::
for /f "useback tokens=*" %%! in (

   "TST.TXT"

) do    echo.%%~! >>"TST1.TXT"

set "ReadLine="
::
set /a COUNT = 0
set /a MEM = !COUNT!
::
set /a SKIP = 1
::
:LOOP ()
::
if !COUNT! equ !MEM! echo.SKIP=!SKIP!_

for /f "useback skip=%SKIP% tokens=*" %%! in (

   "TST1.TXT"

) do (
   if defined ReadLine (
      ::
      echo.if /i ["!ReadLine:~0,4!"] == ["hand"]
      if /i ["!ReadLine:~0,4!"] == ["hand"] (
         ::
         set /a COUNT += 1
         echo.COUNT=!COUNT!_

         set "ReadLine=!ReadLine:~4!"
         echo.ReadLine2=!ReadLine!_

      ) else (
         set "ReadLine=!ReadLine:~1!"
         echo.ReadLine0=!ReadLine!_
      )

      if not defined ReadLine (
         ::
         set /a MEM = !COUNT!
         ::
         set /a SKIP += 1
      )

   ) else (
      set "ReadLine=%%~!"
      echo.ReadLine1=!ReadLine!_
      ::
      if /i ["!ReadLine:~0,4!"] == ["hand"] (
         ::
         set /a COUNT += 1
         echo.COUNT=!COUNT!_

         set "ReadLine=!ReadLine:~4!"
         echo.ReadLine2=!ReadLine!_
      )

      if not defined ReadLine (
         ::
         set /a MEM = !COUNT!
         ::
         set /a SKIP += 1
      )
   )
   ::
   goto :LOOP "()"
)
echo.COUNT=!COUNT!_
echo.end
pause
exit


TST.TXT

Code: Select all

hand1 randomtext hand2 handhand
hand5 randomtext
hand6 randomtext


**censored** I lost a whole hour helping you!
censored == "Kut" in Dutch
censored == "Scheisse" in German
censored == "Merde" in French
Why are our languages not being censored?
That is discrimination and racism.

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: Count string occurrences

#3 Post by renzlo » 18 May 2011 15:44

Thanks but it seems inaccurate. It should be 5 not 6.

Ed Dyreen
Expert
Posts: 1441
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Count string occurrences

#4 Post by Ed Dyreen » 18 May 2011 15:45

Can you solve it for me? No offence, I am tired.

You've got to learn the language. It's not that hard; use pause to debug.
Last edited by Ed Dyreen on 18 May 2011 15:58, edited 1 time in total.

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: Count string occurrences

#5 Post by renzlo » 18 May 2011 15:54

Ed, thanks for your time, I really appreciate it. If only I knew what the problem is, but I'm still a newbie.

Ed Dyreen
Expert
Posts: 1441
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Count string occurrences in text file

#6 Post by Ed Dyreen » 18 May 2011 15:58

Maybe delete an "if defined" += 1 somewhere

Ed Dyreen
Expert
Posts: 1441
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Count string occurrences in text file

#7 Post by Ed Dyreen » 18 May 2011 16:04


With the script from post #2, the count is six for this TST.TXT:

hand1 randomtext hand2 handhand
hand5 randomtext
hand6 randomtext

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: Count string occurrences in text file

#8 Post by renzlo » 18 May 2011 17:36

Thanks Ed. I'm going to test the above code when I get back home.

dbenham
Expert
Posts: 1961
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Count string occurrences in text file

#9 Post by dbenham » 18 May 2011 19:13

If a case-insensitive search is acceptable, then the following is simpler and, I presume, significantly faster:

Code: Select all

@echo off
set "str= hand"
set file=hands.txt
set cnt=0
for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" %file%"') do set "ln=%%a"&call :countStr

echo '%str%' appears %cnt% times in hands.txt (case insensitive)
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  set "ln2=!ln:*%str%=!"
  if "!ln2!" neq "!ln!" (
    set "ln=!ln2!"
    set /a "cnt+=1"
    goto :loop
  )
  endlocal & set cnt=%cnt%
exit /b

hands.txt

Code: Select all

  HAND randomtext handrandomtext
  handrandomtext
; hand randomtext
  hand randomtext
hand

results:

Code: Select all

' hand' appears 5 times in hands.txt (case insensitive)

I changed the search string to include a leading space and changed the text file to demonstrate issues with the FOR /F eol option. See the eol discussion embedded within Sorting tokens within a string for more info. Note that the last hand in hands.txt should not be counted because it is not preceded by a space.

I also changed the text file to include uppercase to demonstrate that the search is indeed case insensitive. FINDSTR is by default case sensitive, but the string substitution technique that I used can only be case insensitive.

Using FINDSTR is still well worth it because we don't want to waste time parsing lines that don't contain the search string at all.
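
As a minimal sketch of the substitution step on its own (the sample line below is made up for illustration and is not read from hands.txt):

```batch
@echo off
setlocal enableDelayedExpansion

rem Hypothetical sample line, chosen to start with an uppercase match.
set "ln=HAND randomtext handrandomtext"
set "str=hand"

rem !ln:*%str%=! removes everything up to and including the first match.
rem The match ignores case, so the leading "HAND" is treated as a hit.
set "ln2=!ln:*%str%=!"

echo before: [!ln!]
echo after:  [!ln2!]
```

Here ln2 should end up as " randomtext handrandomtext": the uppercase HAND is matched and removed, which is why this technique cannot be made case sensitive. Comparing ln2 with ln is what tells the counting loop whether another occurrence was found.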

Dave Benham

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: Count string occurrences in text file

#10 Post by renzlo » 18 May 2011 20:32

Thanks Dave for the script. It is working. The problem now is that when hand is mixed into other words like handrandomtext, it is not counted. Is there a way to solve this?

dbenham
Expert
Posts: 1961
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Count string occurrences in text file

#11 Post by dbenham » 18 May 2011 20:57

renzlo wrote: The problem now is that when hand is mixed into other words like handrandomtext, it is not counted. Is there a way to solve this?

I don't understand - the example text file I ran has that exact string and it is counted properly. If you mean that something like "randomHANDrandom" is not counted, that is because my script intentionally is looking for a " hand" with a space in the front. You can simply modify my script to remove the leading space from the search string.

In other words set "str= hand" becomes set "str=hand"

Dave

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: Count string occurrences in text file

#12 Post by renzlo » 18 May 2011 21:23

Yes, I did that, Dave.

the code:

Code: Select all

@echo off
set "str=hand"
set file=hands.txt
set cnt=0
for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" %file%"') do set "ln=%%a"&call :countStr

echo '%str%' appears %cnt% times in hands.txt (case insensitive)
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  set "ln2=!ln:*%str%=!"
  if "!ln2!" neq "!ln!" (
    set "ln=!ln2!"
    set /a "cnt+=1"
    goto :loop
  )
  endlocal & set cnt=%cnt%
exit /b


the contents of hands.txt:

Code: Select all

HAND randomtexthandrandomtext
handrandomtext
hand randomtext
hand randomtext
hand


output:

Code: Select all

'hand' appears 8 times in hands.txt (case insensitive)


It should be 6.

dbenham
Expert
Posts: 1961
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Count string occurrences in text file

#13 Post by dbenham » 18 May 2011 21:34

Oops! There was a bug when the entire line consists of nothing but "hand"(s)
Here is the fix:

Code: Select all

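@echo off
rem Count occurrences of the search string in the file (case insensitive).
set "str=hand"
set file=hands.txt
set cnt=0
rem FINDSTR pre-filters the lines. The odd FOR /F options set eol to a
rem linefeed and delims to nothing, so no line is skipped or split.
for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" %file%"') do set "ln=%%a"&call :countStr

echo '%str%' appears %cnt% times in %file% (case insensitive)
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  rem Stop as soon as nothing is left of the line.
  if defined ln (
    rem Remove everything up to and including the first match.
    set "ln2=!ln:*%str%=!"
    if "!ln2!" neq "!ln!" (
      set "ln=!ln2!"
      set /a "cnt+=1"
      goto :loop
    )
  )
  rem The value of cnt is expanded before endlocal runs, so the count survives.
  endlocal & set cnt=%cnt%
exit /b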
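
A likely reading of why the extra check is needed, sketched under the assumption that the bug is triggered whenever the remaining text becomes empty: in batch an empty variable is undefined, and substring replacement on an undefined variable does not expand to an empty string, so the "did anything change" comparison can keep succeeding.

```batch
@echo off
setlocal enableDelayedExpansion
set "str=hand"

rem Start with a line that consists of the search string only.
set "ln=hand"

rem After removing the match nothing is left, so ln becomes undefined.
set "ln=!ln:*%str%=!"
if defined ln (echo ln is still defined) else (echo ln is undefined)

rem Repeating the substitution on the undefined variable leaves some
rem leftover text in ln2 instead of an empty value, which the original
rem loop then compared against the empty ln and counted as a match.
set "ln2=!ln:*%str%=!"
echo ln2=[!ln2!]
```

The exact leftover text is not shown here because it was not checked across Windows versions; the point is only that ln2 and ln differ, which would explain the count of 8 instead of 6 reported above.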


Dave

renzlo
Posts: 116
Joined: 03 May 2011 19:06

Re: Count string occurrences in text file

#14 Post by renzlo » 18 May 2011 21:50

Working great, thanks Dave. Thanks for your time. I really appreciate it.

kmbarre
Posts: 13
Joined: 30 Jul 2012 07:09

Re: Count string occurrences in text file

#15 Post by kmbarre » 30 Jul 2012 07:15

I know this is a really old thread. I've tried this and it works great. I was wondering how I might get it to loop through a series of files in a directory. I've tried the DOS find command, but my files are very large and each consists of a single line, so find does not help: it counts the number of lines that contain the string. What find does do is print a list of files (if I use *.txt as my file identifier) with a count for each. Can someone help me convert this script to do something similar?
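
One possible way to do that, as an untested sketch rather than a definitive answer: wrap the fixed script from post #13 in a plain FOR loop over *.txt and reset the counter for each file. The :countFile label is a hypothetical helper name introduced here; everything else is taken from the script above.

```batch
@echo off
setlocal
set "str=hand"

rem Run the per-file counter for every .txt file in the current directory.
for %%F in (*.txt) do call :countFile "%%F"
exit /b

:countFile
  rem %~1 is the file name passed in by the loop above, without quotes.
  set cnt=0
  for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" "%~1""') do set "ln=%%a"&call :countStr
  echo %~1: %cnt%
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  if defined ln (
    set "ln2=!ln:*%str%=!"
    if "!ln2!" neq "!ln!" (
      set "ln=!ln2!"
      set /a "cnt+=1"
      goto :loop
    )
  )
  endlocal & set cnt=%cnt%
exit /b
```

This prints one "name: count" line per file, similar to the list that find produces. A caveat for the case described: batch variables are limited to roughly 8191 characters, so this approach probably cannot cope with very large files that consist of a single line, and file names containing characters such as & may need extra care.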
