
How to count semicolons (;) in a row?

Posted: 29 Sep 2017 07:52
by miskox
Hi all!

I have a .txt file (.csv delimited by ;).

How can I count semicolons in each row?

Code:

field1;field2;field3;
field1;field2;field3;
field1;field2;field3;
field1;field2;
field1;field2;field3;


In this way I would be able to tell if there is a row in a file that is corrupt (wrong) - wrong number of semicolons.

Let's say that there must be 3 semicolons in a row. If there is a row with a different number of semicolons (fewer than 3 or more than 3), I want this row to be displayed.

Any ideas? I don't know where to start.

Thanks.
Saso

Re: How to count semicolons (;) in a row?

Posted: 29 Sep 2017 08:34
by Squashman
I believe this function will help you.
viewtopic.php?f=3&t=6429#p41035

Code:

@echo off

FOR /F "delims=" %%G IN (input.txt) DO (
   CALL :occur "%%~G"
)
pause
GOTO :EOF
:occur
setlocal EnableDelayedExpansion
set i=0
set "x=%~1"
set "x!i!=%x:;=" & set /A i+=1 & set "x!i!=%"
echo number of semicolons: %i%
endlocal

Re: How to count semicolons (;) in a row?

Posted: 29 Sep 2017 11:29
by aGerman
Actually, you don't even need to create an associative array.

Code:

@echo off &setlocal
set "file=test.csv"
set "n=3"

for /f usebackq^ delims^=^ eol^= %%i in ("%file%") do (
  setlocal
  set "x=%%i"
  call :count || echo %%i
  endlocal
)
pause
exit /b

:count
set "x=%x:;=" & set /a n-=1 & set "x=%"
exit /b %n%

Steffen

Re: How to count semicolons (;) in a row?

Posted: 29 Sep 2017 11:48
by Squashman
I like how you set the exitcode and used that with conditional execution. Very clever!

Re: How to count semicolons (;) in a row?

Posted: 29 Sep 2017 14:40
by dbenham
Yes, very clever solution.

But the n value must be reset each iteration.
And there is no need to set any string variables when counting.
And with a bit more work, the technique can safely deal with quotes and poison characters, though quoted ; will be mistakenly counted either way.
And no need to SETLOCAL/ENDLOCAL within loop, unless you really can't have an extra variable defined after the test.

Code:

@echo off
setlocal
set "file=test.csv"
set "count=3"

for /f "usebackq delims=" %%i in ("%file%") do (
  set "x=%%i"
  call :count || echo %%i
)
pause
exit /b

:count
set /a n=count
set "x=%x:"=%"
break "%x:;="&set /a n-=1&break "%"
exit /b %n%

I'm not sure which "null op" command is the fastest. Other options are:

Code:

rem.
rem^ %=with a trailing space after the caret=%

The technique will have problems if a line gets anywhere close to 8191 bytes long.

Another option is to use the fast strlen function to get the string length, remove all semicolons, and then compute the new length, and subtract to get the count.
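
The length-difference idea above could look roughly like the sketch below. It is only an outline under stated assumptions, not code posted in this thread: the labels :check and :strlen and the variables line, stripped, len1, len2 and diff are made up for illustration, and :strlen is a simplified take on the usual halving-style length routine. As with the other versions, semicolons inside quoted fields are counted too.

```batch
@echo off
setlocal DisableDelayedExpansion
set "file=test.csv"
set "count=3"

rem Echo every line whose number of semicolons differs from count.
for /f usebackq^ delims^=^ eol^= %%i in ("%file%") do (
  set "line=%%i"
  call :check || echo %%i
)
exit /b

:check
rem Hypothetical helper: returns count minus the number of semicolons in line.
setlocal EnableDelayedExpansion
rem Remove all semicolons, then compare the two lengths.
set "stripped=!line:;=!"
call :strlen line len1
call :strlen stripped len2
set /a diff=count-(len1-len2)
exit /b %diff%

:strlen
rem Hypothetical helper: first argument is the name of the variable to measure,
rem second argument is the name of the variable that receives the length.
setlocal EnableDelayedExpansion
rem The appended # keeps s defined even when the measured variable is empty.
set "s=!%~1!#"
set "len=0"
for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
  if "!s:~%%P,1!" neq "" (
    set /a "len+=%%P"
    set "s=!s:~%%P!"
  )
)
endlocal & set "%~2=%len%"
exit /b
```

One point in favour of this variant is that nothing from the line is ever injected into a command, so quotes and poison characters in the data need no special handling. The price is two length computations per line.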

Dave Benham

Re: How to count semicolons (;) in a row?

Posted: 29 Sep 2017 14:55
by aGerman
You are absolutely right about the setlocal/endlocal. And to be honest, I never thought about using any command other than SET :shock:

Bookmarked :wink:

Steffen

Re: How to count semicolons (;) in a row?

Posted: 30 Sep 2017 10:13
by dbenham
I just realized that the n count variable only needs to be reset if you remove the SETLOCAL/ENDLOCAL.

I'm thinking that resetting one variable is more efficient than SETLOCAL/ENDLOCAL. But I don't think I've actually ever done any testing.


Dave Benham

Re: How to count semicolons (;) in a row?

Posted: 30 Sep 2017 12:20
by aGerman
Yes, executing setlocal/endlocal multiple times is slow. I don't know exactly how it is implemented, but since endlocal has to restore the original environment, I assume that setlocal saves a copy of it. Resetting only one environment variable is for sure much more efficient.

Steffen

Re: How to count semicolons (;) in a row?

Posted: 02 Oct 2017 07:02
by miskox
Thank you all!

It really works - though I really don't understand the code.

Thanks again.

Saso

Re: How to count semicolons (;) in a row?

Posted: 02 Oct 2017 13:21
by aGerman
miskox wrote: I really don't understand the code.

I merged Dave's proposal with my FOR options and added some remarks.

Code:

@echo off &setlocal
set "file=test.csv"
set "count=3"

REM Read the file line by line and assign each line to %%i.
REM The escape sequences are to set eol to nothing. Thus, lines with leading semi-colons are processed as well.
for /f usebackq^ delims^=^ eol^= %%i in ("%file%") do (
  REM Assign the content of %%i to x.
  set "x=%%i"
  REM Call subroutine :count. If an errorlevel other than 0 was returned then echo will be executed.
  call :count || echo %%i
)
pause
exit /b

:count
REM Reset the variable n.
set /a n=count
REM Remove all quotation marks in x.
set "x=%x:"=%"
REM Access the internal iterations that CMD does if a character will be replaced.
REM Thus, for each occurrence of a semi-colon n will be decreased by one.
REM Only if exactly 3 semi-colons were found n is 0 after this operation.
break "%x:;="&set /a n-=1&break "%"
REM Return n as errorlevel to the point where :count was called.
exit /b %n%

Steffen

Re: How to count semicolons (;) in a row?

Posted: 03 Oct 2017 04:19
by miskox
Steffen, thank you. The part I don't understand is:

Code:

REM Access the internal iterations that CMD does if a character will be replaced.
REM Thus, for each occurrence of a semi-colon n will be decreased by one.
REM Only if exactly 3 semi-colons were found n is 0 after this operation.
break "%x:;="&set /a n-=1&break "%"


I ran the code with ECHO ON to see what happens. I see that BREAK (or CMD?) executes these commands as many times as there are fields (without there being a loop or anything like that). This part is a mystery to me.

Code:

>break "field15"  & set /a n-=1  & break "field25"  & set /a n-=1  & break "field35"  & set /a n-=1  & break ""


Thanks.
Saso

Re: How to count semicolons (;) in a row?

Posted: 03 Oct 2017 05:18
by aGerman
That's the tricky part Saso :lol:
If you remove the SET /A command and the concatenated second BREAK then the following is left over:

Code:

break "%x:;=%"

Forget about BREAK. It is a command that does absolutely nothing and is only there to make the syntax valid. What we are after is the string manipulation where the semi-colons in x are replaced with "nothing". To perform this replacement, CMD has to begin at the first character and search for the first occurrence of a semi-colon, which is then replaced. After that, CMD continues with the next character and searches for the next semi-colon. This is repeated until the end of the string is reached.
As your test with ECHO ON shows, we can access these internal iterations by putting command concatenations inside the replacement syntax. That's undocumented behavior, but it works :wink:
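
To try this in isolation, here is a minimal stand-alone sketch (the sample string and the variable names x and n are only for illustration). The %...% replacement is applied to the text of the line before CMD parses it into commands, so every semicolon is swapped for the command fragment that sits between the = and the closing %. That is why the ECHO ON output earlier in the thread shows one SET /A per semicolon.

```batch
@echo off
setlocal
rem Sample data with three semicolons.
set "x=a;b;c;"
set "n=0"
rem Each semicolon is replaced by a closing quote, a SET /A command and a new BREAK.
break "%x:;=" & set /a n+=1 & break "%"
rem n now holds the number of semicolons that were replaced.
echo %n%
```

With the sample value this should echo 3. Counting down from an expected value instead of up from zero, as in the scripts above, lets the result double as the exit code.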

Steffen

Re: How to count semicolons (;) in a row?

Posted: 03 Oct 2017 06:57
by Aacini
@miskox,

A detailed explanation of this method is given in this thread.

Antonio