
Count string occurrences in text file

Posted: 18 May 2011 09:19
by renzlo
Experts,

How do you count string occurrences? For example, I have a text file with these contents:
hand randomtexthand randomtext
hand randomtext
hand randomtext
hand randomtext

I want to know the total count of hand, so it should be 5. But when using find /c the output is 4, because it only counts the lines that contain the string. How can I solve this?

Re: Count string occurrences

Posted: 18 May 2011 10:33
by Ed Dyreen

Code:

@echo off &SetLocal EnableExtensions EnableDelayedExpansion

echo.SKIP>"TST1.TXT"
::
for /f "useback tokens=*" %%! in (

   "TST.TXT"

) do    echo.%%~! >>"TST1.TXT"

set "ReadLine="
::
set /a COUNT = 0
set /a MEM = !COUNT!
::
set /a SKIP = 1
::
:LOOP ()
::
if !COUNT! equ !MEM! echo.SKIP=!SKIP!_

for /f "useback skip=%SKIP% tokens=*" %%! in (

   "TST1.TXT"

) do (
   if defined ReadLine (
      ::
      echo.if /i ["!ReadLine:~0,4!"] == ["hand"]
      if /i ["!ReadLine:~0,4!"] == ["hand"] (
         ::
         set /a COUNT += 1
         echo.COUNT=!COUNT!_

         set "ReadLine=!ReadLine:~4!"
         echo.ReadLine2=!ReadLine!_

      ) else (
         set "ReadLine=!ReadLine:~1!"
         echo.ReadLine0=!ReadLine!_
      )

      if not defined ReadLine (
         ::
         set /a MEM = !COUNT!
         ::
         set /a SKIP += 1
      )

   ) else (
      set "ReadLine=%%~!"
      echo.ReadLine1=!ReadLine!_
      ::
      if /i ["!ReadLine:~0,4!"] == ["hand"] (
         ::
         set /a COUNT += 1
         echo.COUNT=!COUNT!_

         set "ReadLine=!ReadLine:~4!"
         echo.ReadLine2=!ReadLine!_
      )

      if not defined ReadLine (
         ::
         set /a MEM = !COUNT!
         ::
         set /a SKIP += 1
      )
   )
   ::
   goto :LOOP "()"
)
echo.COUNT=!COUNT!_
echo.end
pause
exit


TST.TXT

Code:

hand1 randomtext hand2 handhand
hand5 randomtext
hand6 randomtext


**censored**! I lost a whole hour helping you!
censored == Kut in Dutch
censored == Scheisse in German
censored == Merde in French
Why are our languages not being censored!
That is discrimination and racism :mrgreen:

Re: Count string occurrences

Posted: 18 May 2011 15:44
by renzlo
Thanks but it seems inaccurate. It should be 5 not 6.

Re: Count string occurrences

Posted: 18 May 2011 15:45
by Ed Dyreen
Can you solve it for me? No offence, I am tired.

Gotta learn the language, it's not that hard. Use pause to debug.

Re: Count string occurrences

Posted: 18 May 2011 15:54
by renzlo
Ed, thanks for the time, I really appreciate it. If only I knew what the problem is, but I'm still a newbie. :(

Re: Count string occurrences in text file

Posted: 18 May 2011 15:58
by Ed Dyreen
Maybe delete isdefined += 1 somewhere

Re: Count string occurrences in text file

Posted: 18 May 2011 16:04
by Ed Dyreen

Code:

@echo off &SetLocal EnableExtensions EnableDelayedExpansion

echo.SKIP>"TST1.TXT"
::
for /f "useback tokens=*" %%! in (

   "TST.TXT"

) do    echo.%%~! >>"TST1.TXT"

set "ReadLine="
::
set /a COUNT = 0
set /a MEM = !COUNT!
::
set /a SKIP = 1
::
:LOOP ()
::
if !COUNT! equ !MEM! echo.SKIP=!SKIP!_

for /f "useback skip=%SKIP% tokens=*" %%! in (

   "TST1.TXT"

) do (
   if defined ReadLine (
      ::
      echo.if /i ["!ReadLine:~0,4!"] == ["hand"]
      if /i ["!ReadLine:~0,4!"] == ["hand"] (
         ::
         set /a COUNT += 1
         echo.COUNT=!COUNT!_

         set "ReadLine=!ReadLine:~4!"
         echo.ReadLine2=!ReadLine!_

      ) else (
         set "ReadLine=!ReadLine:~1!"
         echo.ReadLine0=!ReadLine!_
      )

      if not defined ReadLine (
         ::
         set /a MEM = !COUNT!
         ::
         set /a SKIP += 1
      )

   ) else (
      set "ReadLine=%%~!"
      echo.ReadLine1=!ReadLine!_
      ::
      if /i ["!ReadLine:~0,4!"] == ["hand"] (
         ::
         set /a COUNT += 1
         echo.COUNT=!COUNT!_

         set "ReadLine=!ReadLine:~4!"
         echo.ReadLine2=!ReadLine!_
      )

      if not defined ReadLine (
         ::
         set /a MEM = !COUNT!
         ::
         set /a SKIP += 1
      )
   )
   ::
   goto :LOOP "()"
)
echo.COUNT=!COUNT!_
echo.end
pause
exit


the count is six

hand1 randomtext hand2 handhand
hand5 randomtext
hand6 randomtext

Re: Count string occurrences in text file

Posted: 18 May 2011 17:36
by renzlo
Thanks Ed. I'm gonna test the above code when I get back home.

Re: Count string occurrences in text file

Posted: 18 May 2011 19:13
by dbenham
If a case insensitive search is acceptable then the following is simpler and I presume significantly faster:

Code:

@echo off
set "str= hand"
set file=hands.txt
set cnt=0
for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" %file%"') do set "ln=%%a"&call :countStr

echo '%str%' appears %cnt% times in hands.txt (case insensitive)
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  set "ln2=!ln:*%str%=!"
  if "!ln2!" neq "!ln!" (
    set "ln=!ln2!"
    set /a "cnt+=1"
    goto :loop
  )
  endlocal & set cnt=%cnt%
exit /b

hands.txt

Code:

  HAND randomtext handrandomtext
  handrandomtext
; hand randomtext
  hand randomtext
hand

results:

Code:

' hand' appears 5 times in hands.txt (case insensitive)

I changed the search string to include a leading space and changed the text file to demonstrate issues with the FOR /F eol option. See the eol discussion embedded within "Sorting tokens within a string" for more info. Note that the last hand in hands.txt should not be counted because it is not preceded by a space.

I also changed the text file to include uppercase to demonstrate that the search is indeed case insensitive. FINDSTR is by default case sensitive, but the string substitution technique that I used can only be case insensitive.

Using FINDSTR is still well worthwhile because we don't want to waste time parsing lines that don't have the search string anywhere within them.
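
A minimal sketch of the substitution loop on its own may make the technique easier to follow. The sample line and the names rest, :next and :done are illustrative assumptions, not part of the script above; the "if not defined" guard is an extra precaution for a line that is consumed completely.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical sample line and search string, for illustration only.
set "ln=HAND randomtext handrandomtext"
set "str=hand"
set cnt=0

:next
rem Stop once the line has been consumed completely.
if not defined ln goto :done
rem !ln:*%str%=! drops everything up to and including the first match.
rem The match performed by this substitution is case insensitive.
set "rest=!ln:*%str%=!"
rem If nothing was removed, there are no more matches on this line.
if "!rest!" equ "!ln!" goto :done
set /a cnt+=1
set "ln=!rest!"
goto :next

:done
echo '%str%' found %cnt% times
```

With this sample line the loop should report 2, because the uppercase HAND is matched as well.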

Dave Benham

Re: Count string occurrences in text file

Posted: 18 May 2011 20:32
by renzlo
Thanks Dave for the script. It is working. The problem now is that when hand is mixed with other words, like handrandomtext, it is not counted. Is there a way to solve this?

Re: Count string occurrences in text file

Posted: 18 May 2011 20:57
by dbenham
renzlo wrote: The problem now is that when hand is mixed with other words, like handrandomtext, it is not counted. Is there a way to solve this?

I don't understand - the example text file I ran has that exact string and it is counted properly. If you mean that something like "randomHANDrandom" is not counted, that is because my script intentionally is looking for a " hand" with a space in the front. You can simply modify my script to remove the leading space from the search string.

In other words set "str= hand" becomes set "str=hand"

Dave

Re: Count string occurrences in text file

Posted: 18 May 2011 21:23
by renzlo
Yes, I did that, Dave.

the code:

Code:

@echo off
set "str=hand"
set file=hands.txt
set cnt=0
for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" %file%"') do set "ln=%%a"&call :countStr

echo '%str%' appears %cnt% times in hands.txt (case insensitive)
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  set "ln2=!ln:*%str%=!"
  if "!ln2!" neq "!ln!" (
    set "ln=!ln2!"
    set /a "cnt+=1"
    goto :loop
  )
  endlocal & set cnt=%cnt%
exit /b


the contents of hands.txt:

Code:

HAND randomtexthandrandomtext
handrandomtext
hand randomtext
hand randomtext
hand


output:

Code:

'hand' appears 8 times in hands.txt (case insensitive)


It should be 6.

Re: Count string occurrences in text file

Posted: 18 May 2011 21:34
by dbenham
Oops! There was a bug when the entire line consists of nothing but "hand"(s).
Here is the fix:

Code:

@echo off
set "str=hand"
set file=hands.txt
set cnt=0
for /f ^"eol^=^

delims^=^" %%a in ('"findstr /i "/c:%str%" %file%"') do set "ln=%%a"&call :countStr

echo '%str%' appears %cnt% times in hands.txt (case insensitive)
exit /b

:countStr
  setlocal enableDelayedExpansion
  :loop
  if defined ln (
    set "ln2=!ln:*%str%=!"
    if "!ln2!" neq "!ln!" (
      set "ln=!ln2!"
      set /a "cnt+=1"
      goto :loop
    )
  )
  endlocal & set cnt=%cnt%
exit /b
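
One plausible reason for the inflated count, offered as an assumption rather than something stated in the thread: once ln has been emptied it becomes undefined, and cmd does not seem to expand a search/replace on an undefined variable to an empty string. The short sketch below (hypothetical, not part of the fix) simply prints what the expansion yields so the behavior can be checked on your own system. If the result is non-empty and still contains the search string, the unguarded loop keeps counting.

```batch
@echo off
setlocal enableDelayedExpansion
set "str=hand"
rem Assigning an empty value leaves ln undefined, as happens after the last match.
set "ln="
rem Same expression the unguarded loop evaluates on its next pass.
set "ln2=!ln:*%str%=!"
echo Result of substitution on undefined ln: [!ln2!]
rem The fix skips the expression entirely in that case:
if defined ln (echo ln is defined) else (echo ln is undefined, loop must stop)
```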


Dave

Re: Count string occurrences in text file

Posted: 18 May 2011 21:50
by renzlo
Working great, thanks Dave. Thanks for your time, I really appreciate it.

Re: Count string occurrences in text file

Posted: 30 Jul 2012 07:15
by kmbarre
I know this is a really old thread. I've tried this and it works great. I was wondering how I might get it to loop through a series of files in a directory. I've tried the DOS find command, but the problem is that my files are very large and each consists of one line. So the find command does not work, as it counts the number of lines that contain the string value. What the find command does do is spit out a list of files (if I use *.txt as my file identifier) with the count for each. Can someone help me convert this script to do something similar?