Ways to aware that some string contains a substrings

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28
Contact:

Ways to aware that some string contains a substrings

#1 Post by siberia-man » 07 Jan 2014 07:09

I am trying to study for myself the most reliable ways to know that some string contains another string. Seems that some points are behind batch stuff.

Lets say we have two input string: STACK is a string to search in and NEEDLE is a string to be looked for. There are three answers.
1. STACK contains NEEDLE
2. STACK starts with NEEDLE
3. STACK ends with NEEDLE

There are explanations below showing possible troubles with them

1. We could say that the following code is responsible for the first answer:

Code: Select all

setlocal enabledelayedexpansion

set "if_stack=%~1"
set "if_needle=%~2"

if not "!if_stack!" == "!if_stack:%if_needle%=!" (
   endlocal
   exit /b 0
)

endlocal
exit /b 1


It works in the most of cases but we cannot be sure absolutely. It works until we use some special characters in input strings. I have investigated this issue with some of them and found that "=" (equal sign) and "*" (asterisk sign) bring to the issue. See the following examples:

Code: Select all

call if_contains "abc" "b" && echo OK
call if_contains "abc" "=" && echo OK


2. How to test if the string starts with another one we can learn by this link :StartsWith - Tests if a text starts with a given string. Unfortunately, we have the same troubles as described above.

In my opinion the better and the more reliable way is as follows. Estimate the length of NEEDLE. Bit the most beginning or ending part of the same length from STACK and compare it with NEEDLE. The NEEDLE length is calculated by the way as described here :strLen - returns the length of a string. The resulting algorithm is the following:

Code: Select all

setlocal enabledelayedexpansion

set "if_stack=%~1"
set "if_needle=%~2"

rem Estimate the length of the NEEDLE string
set "if_str=A!if_needle!"
set /a "if_len=0"
for /l %%a in ( 12, -1, 0 ) do (
   set /a "if_len|=1<<%%a"
   for %%b in ( !if_len! ) do if "!if_str:~%%b,1!" == "" set /a "if_len&=~1<<%%a"
)

rem Extract the leading %if_len% characters from the STACK string
call set "if_str=%%if_stack:~0,!if_len!%%"

if "!if_str!" == "!if_needle!" (
   endlocal
   exit /b 0
)

endlocal
exit /b 1


The same usage examples for the code above

Code: Select all

call if_starts "abc" "ab" && echo OK
call if_starts "abc" "a=" && echo OK


As a conclusion - my questions

Do you know the better solutions for StartsWith/EndsWith?
Do you know the reliable solution for Contains?

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Ways to aware that some string contains a substrings

#2 Post by foxidrive » 07 Jan 2014 07:30

Findstr with the /b and /e switches can handle begin and ends with.

Findstr also handles 'contains'.


Is your question academic?

berserker
Posts: 95
Joined: 18 Dec 2013 00:51

Re: Ways to aware that some string contains a substrings

#3 Post by berserker » 07 Jan 2014 07:48

siberia-man wrote:Do you know the better solutions for StartsWith/EndsWith?
Do you know the reliable solution for Contains?


what you are doing is re-inventing the wheel. like foxi said, the native findstr can do the job of "contains" and starts with, ends with. you can also use tools like grep.exe or awk.exe if you want to handle more string manipulation complexities, such as using advanced regular expressions.

siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28
Contact:

Re: Ways to aware that some string contains a substrings

#4 Post by siberia-man » 07 Jan 2014 10:30

what you are doing is re-inventing the wheel

You're absolutely right =). As foxidrive have said, I have academic interest: is there native batch facilities to achieve the target. Of course, I know about find/findstr as a tool for searching an example of strings (we could add to the list awk/sed/etc) but the more interesting case is usage of pure batch things. Also there is good investigation by dbenham on StackOverflow about findstr undocumented features.

ShadowThief
Expert
Posts: 1160
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Ways to aware that some string contains a substrings

#5 Post by ShadowThief » 07 Jan 2014 10:32

findstr is native batch...

siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28
Contact:

Re: Ways to aware that some string contains a substrings

#6 Post by siberia-man » 07 Jan 2014 10:50

ShadowThief wrote:findstr is native batch...

To be more clear... Talking about native batch I mean the CMD.EXE features only. In this case findstr is external command despite of that it is native thing in the operating systems.

ShadowThief
Expert
Posts: 1160
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Ways to aware that some string contains a substrings

#7 Post by ShadowThief » 07 Jan 2014 12:00

So based on http://ss64.com/nt/ you want to know if it's possible to determine if a substring exists using only

assoc, call, cd, cls, color, copy, date, del, dir, echo, endlocal, erase, exit, for, ftype, goto, if, md, mklink, move, path, pause, popd, prompt, pushd, rem, set, shift, start, time, timeout, type, ver, verify, vol, and ::

correct? If that's the case, then I am confident the answer is "yes," although I haven't tried it out yet.

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Ways to aware that some string contains a substrings

#8 Post by penpen » 07 Jan 2014 14:41

siberia-man wrote:Lets say we have two input string: STACK is a string to search in and NEEDLE is a string to be looked for. There are three answers.
1. STACK contains NEEDLE
2. STACK starts with NEEDLE
3. STACK ends with NEEDLE
(...)
As a conclusion - my questions
Do you know the better solutions for StartsWith/EndsWith?
Do you know the reliable solution for Contains?
You may implement one of the string matching algorithms in batch;
I think the Knuth–Morris–Pratt algorithm is simple enough to allow you to do this:
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
If the search string (NEEDLE) is only contained at maximum one time within the string to search (STACK),
then you only need to implement the algorithm kmp_search, (else you neeed the kmp_table algorithm):
kmp_search complexity is O(n) where n is the number of characters in the string to search (|STACK| in your case), so it is pretty fast (kmp_table is pretty fast, too).
You just have to translate the pseudocode to batch code.

You just have to compare the found index () of the first entry (0) and last possible entry (L:=|STACK|-|NEEDLE|).
If NEEDLE is contained in STACK found index (i) is in {0 : L}:
- i == 0 ==> StartsWith
- i == L ==> EndsWith
- else within

To compute the string length you may use jeb's strlen macro:
http://www.dostips.com/forum/viewtopic.php?p=11344#p11344

In whole a little bit work, but a nice and possible solution, i think.

penpen

siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28
Contact:

Re: Ways to aware that some string contains a substrings

#9 Post by siberia-man » 07 Jan 2014 17:21

penpen
First of all I was looking for simple and short solutions. Any way, your suggestion is good enough to be called as native and the more reliable.

Squashman
Expert
Posts: 4465
Joined: 23 Dec 2011 13:59

Re: Ways to aware that some string contains a substrings

#10 Post by Squashman » 07 Jan 2014 17:37

siberia-man wrote:penpen
First of all I was looking for simple and short solutions. Any way, your suggestion is good enough to be called as native and the more reliable.

Were you expecting someone to write the code for you?

Not going to be a simple and short solution using the shell commands only. That is why we have more evolved executable programs that can do more complicated tasks.

Not sure why you think Native means the shell commands only. If Microsoft gives us a utility to use then it is native and it becomes the SHORT and MORE RELIABLE solution.

There isn't anyway Unix system programmers would have been able to manage their servers all those years with only the commands that the "SHELL" provided. I am sure most of them would say they would have never been able to do their jobs without SED, GREP or AWK. I bet they threw a hell of a party when PERL was released. Now we have all those great tools ported to Windows as well.

I am all for Native scripting and that is what this website is really dedicated too but I am not that much of a purist. Our definition of Native means Microsoft provided the tool with the operating system.

siberia-man
Posts: 208
Joined: 26 Dec 2013 09:28
Contact:

Re: Ways to aware that some string contains a substrings

#11 Post by siberia-man » 07 Jan 2014 18:32

I don't expect if someone solve the issue for me. I would like to discuss the possible solutions of the topic. As I've said above it is the academic interest only.

For example working in bash I can use something like below:

Code: Select all

if echo $STACK | grep -q "$NEEDLE" ; then
    echo CONTAINS
fi

if grep -q "$NEEDLE" <<< $STACK ; then
    echo CONTAINS
fi


In other hands, I can use another code looking little bit more complicated but faster and something I call "native":

Code: Select all

case "$STACK" in
*$NEEDLE*)
    echo CONTAINS
    ;;
esac

if [[ $STACK == *"$NEEDLE"* ]] ; then
    echo CONTAINS
fi

if [[ $STACK =~ .*$NEEDLE.* ]] ; then
    echo CONTAINS
fi


From all these examples above we can see the bucket of different solutions on shell programming using external commands and builtin features on the shell itself. Which of them could be used in the real task depends on the bash version, the experience of the programmer or specifics of the task. For example, GREP usage is simple and good solution coming the first one. But it will have performance issues in loops.

Once again. I was just looking for implementation of the task using only the builtin commands of the interpreter. My question had the academic interest. And... I am not purist =)

The following code is almost reliable as requested but (sadly) has lack as well as the most of batch scripts. Don't take your attention for some spaghetti code - all of this is just illustration

Code: Select all

::
:: test STRING -contains NEEDLE && echo OK
:: test STRING -starts NEEDLE && echo OK
:: test STRING -ends NEEDLE && echo OK
::
@echo off

setlocal enabledelayedexpansion

rem Skip estimation if one of STACK or NEEDLE is empty
set "if_stack=%~1"
set "if_needle=%~3"

if not defined if_stack (
   endlocal
   exit /b 1
)

if not defined if_needle (
   endlocal
   exit /b 1
)

set "if_str=A!if_stack!"
set /a "if_stack_len=0"
for /l %%a in ( 12, -1, 0 ) do (
   set /a "if_stack_len|=1<<%%a"
   for %%b in ( !if_stack_len! ) do if "!if_str:~%%b,1!" == "" set /a "if_stack_len&=~1<<%%a"
)

set "if_str=A!if_needle!"
set /a "if_needle_len=0"
for /l %%a in ( 12, -1, 0 ) do (
   set /a "if_needle_len|=1<<%%a"
   for %%b in ( !if_needle_len! ) do if "!if_str:~%%b,1!" == "" set /a "if_needle_len&=~1<<%%a"
)

set /a "if_rest_len=if_stack_len-if_needle_len"
set /a "if_str_pos=0"

for /l %%l in ( 0, 1, %if_rest_len% ) do (
   if "!if_stack:~%%l,%if_needle_len%!" == "!if_needle!" (
      if "%~2" == "-contains" (
         endlocal
         exit /b 0
      )
      if "%~2" == "-starts" if %%l equ 0 (
         endlocal
         exit /b 0
      )
      if "%~2" == "-ends" if %%l equ %if_rest_len% (
         endlocal
         exit /b 0
      )
   )
)

endlocal
exit /b 1

berserker
Posts: 95
Joined: 18 Dec 2013 00:51

Re: Ways to aware that some string contains a substrings

#12 Post by berserker » 07 Jan 2014 19:27

siberia-man wrote:But it will have performance issues in loops.

just curious , how? you mean grepping for strings while iterating a big file?

Squashman
Expert
Posts: 4465
Joined: 23 Dec 2011 13:59

Re: Ways to aware that some string contains a substrings

#13 Post by Squashman » 07 Jan 2014 19:48

berserker wrote:
siberia-man wrote:But it will have performance issues in loops.

just curious , how? you mean grepping for strings while iterating a big file?

I was wondering the same thing as I used to do bash scripting many years ago. I hope he is not referring to the cardinal sin of piping cat to grep.

berserker
Posts: 95
Joined: 18 Dec 2013 00:51

Re: Ways to aware that some string contains a substrings

#14 Post by berserker » 07 Jan 2014 20:05

Squashman wrote:
berserker wrote:
siberia-man wrote:But it will have performance issues in loops.

just curious , how? you mean grepping for strings while iterating a big file?

I was wondering the same thing as I used to do bash scripting many years ago. I hope he is not referring to the cardinal sin of piping cat to grep.

something like this

Code: Select all

while read line
do
   if grep "string" $line
   ....
   ...
   fi
done


that's not optimized. That's why, depends on how one designs his algorithm. :)

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Ways to aware that some string contains a substrings

#15 Post by foxidrive » 07 Jan 2014 23:31

This is as robust as simple batch gets:

Code: Select all

@echo off
set "string1=needleb"
set "string2=aneedle"
set "string3=aneedleb"

if /i "|||%string1%"=="|||needle%string1:needle=%" echo needle is at the beginning
if /i "%string2%|||"=="%string2:needle=%needle|||" echo needle is at the end
if /i not "%string3%"=="%string3:needle=%" echo string contains needle
pause



needle is at the beginning
needle is at the end
string contains needle

Post Reply