Page 1 of 2

Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 07:09
by siberia-man
I am trying to study for myself the most reliable ways to know that some string contains another string. Seems that some points are behind batch stuff.

Lets say we have two input string: STACK is a string to search in and NEEDLE is a string to be looked for. There are three answers.
1. STACK contains NEEDLE
2. STACK starts with NEEDLE
3. STACK ends with NEEDLE

There are explanations below showing possible troubles with them

1. We could say that the following code is responsible for the first answer:

Code: Select all

setlocal enabledelayedexpansion

set "if_stack=%~1"
set "if_needle=%~2"

if not "!if_stack!" == "!if_stack:%if_needle%=!" (
   endlocal
   exit /b 0
)

endlocal
exit /b 1


It works in the most of cases but we cannot be sure absolutely. It works until we use some special characters in input strings. I have investigated this issue with some of them and found that "=" (equal sign) and "*" (asterisk sign) bring to the issue. See the following examples:

Code: Select all

call if_contains "abc" "b" && echo OK
call if_contains "abc" "=" && echo OK


2. How to test if the string starts with another one we can learn by this link :StartsWith - Tests if a text starts with a given string. Unfortunately, we have the same troubles as described above.

In my opinion the better and the more reliable way is as follows. Estimate the length of NEEDLE. Bit the most beginning or ending part of the same length from STACK and compare it with NEEDLE. The NEEDLE length is calculated by the way as described here :strLen - returns the length of a string. The resulting algorithm is the following:

Code: Select all

setlocal enabledelayedexpansion

set "if_stack=%~1"
set "if_needle=%~2"

rem Estimate the length of the NEEDLE string
set "if_str=A!if_needle!"
set /a "if_len=0"
for /l %%a in ( 12, -1, 0 ) do (
   set /a "if_len|=1<<%%a"
   for %%b in ( !if_len! ) do if "!if_str:~%%b,1!" == "" set /a "if_len&=~1<<%%a"
)

rem Extract the leading %if_len% characters from the STACK string
call set "if_str=%%if_stack:~0,!if_len!%%"

if "!if_str!" == "!if_needle!" (
   endlocal
   exit /b 0
)

endlocal
exit /b 1


The same usage examples for the code above

Code: Select all

call if_starts "abc" "ab" && echo OK
call if_starts "abc" "a=" && echo OK


As a conclusion - my questions

Do you know the better solutions for StartsWith/EndsWith?
Do you know the reliable solution for Contains?

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 07:30
by foxidrive
Findstr with the /b and /e switches can handle begin and ends with.

Findstr also handles 'contains'.


Is your question academic?

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 07:48
by berserker
siberia-man wrote:Do you know the better solutions for StartsWith/EndsWith?
Do you know the reliable solution for Contains?


what you are doing is re-inventing the wheel. like foxi said, the native findstr can do the job of "contains" and starts with, ends with. you can also use tools like grep.exe or awk.exe if you want to handle more string manipulation complexities, such as using advanced regular expressions.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 10:30
by siberia-man
what you are doing is re-inventing the wheel

You're absolutely right =). As foxidrive have said, I have academic interest: is there native batch facilities to achieve the target. Of course, I know about find/findstr as a tool for searching an example of strings (we could add to the list awk/sed/etc) but the more interesting case is usage of pure batch things. Also there is good investigation by dbenham on StackOverflow about findstr undocumented features.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 10:32
by ShadowThief
findstr is native batch...

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 10:50
by siberia-man
ShadowThief wrote:findstr is native batch...

To be more clear... Talking about native batch I mean the CMD.EXE features only. In this case findstr is external command despite of that it is native thing in the operating systems.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 12:00
by ShadowThief
So based on http://ss64.com/nt/ you want to know if it's possible to determine if a substring exists using only

assoc, call, cd, cls, color, copy, date, del, dir, echo, endlocal, erase, exit, for, ftype, goto, if, md, mklink, move, path, pause, popd, prompt, pushd, rem, set, shift, start, time, timeout, type, ver, verify, vol, and ::

correct? If that's the case, then I am confident the answer is "yes," although I haven't tried it out yet.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 14:41
by penpen
siberia-man wrote:Lets say we have two input string: STACK is a string to search in and NEEDLE is a string to be looked for. There are three answers.
1. STACK contains NEEDLE
2. STACK starts with NEEDLE
3. STACK ends with NEEDLE
(...)
As a conclusion - my questions
Do you know the better solutions for StartsWith/EndsWith?
Do you know the reliable solution for Contains?
You may implement one of the string matching algorithms in batch;
I think the Knuth–Morris–Pratt algorithm is simple enough to allow you to do this:
http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm
If the search string (NEEDLE) is only contained at maximum one time within the string to search (STACK),
then you only need to implement the algorithm kmp_search, (else you neeed the kmp_table algorithm):
kmp_search complexity is O(n) where n is the number of characters in the string to search (|STACK| in your case), so it is pretty fast (kmp_table is pretty fast, too).
You just have to translate the pseudocode to batch code.

You just have to compare the found index () of the first entry (0) and last possible entry (L:=|STACK|-|NEEDLE|).
If NEEDLE is contained in STACK found index (i) is in {0 : L}:
- i == 0 ==> StartsWith
- i == L ==> EndsWith
- else within

To compute the string length you may use jeb's strlen macro:
http://www.dostips.com/forum/viewtopic.php?p=11344#p11344

In whole a little bit work, but a nice and possible solution, i think.

penpen

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 17:21
by siberia-man
penpen
First of all I was looking for simple and short solutions. Any way, your suggestion is good enough to be called as native and the more reliable.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 17:37
by Squashman
siberia-man wrote:penpen
First of all I was looking for simple and short solutions. Any way, your suggestion is good enough to be called as native and the more reliable.

Were you expecting someone to write the code for you?

Not going to be a simple and short solution using the shell commands only. That is why we have more evolved executable programs that can do more complicated tasks.

Not sure why you think Native means the shell commands only. If Microsoft gives us a utility to use then it is native and it becomes the SHORT and MORE RELIABLE solution.

There isn't anyway Unix system programmers would have been able to manage their servers all those years with only the commands that the "SHELL" provided. I am sure most of them would say they would have never been able to do their jobs without SED, GREP or AWK. I bet they threw a hell of a party when PERL was released. Now we have all those great tools ported to Windows as well.

I am all for Native scripting and that is what this website is really dedicated too but I am not that much of a purist. Our definition of Native means Microsoft provided the tool with the operating system.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 18:32
by siberia-man
I don't expect if someone solve the issue for me. I would like to discuss the possible solutions of the topic. As I've said above it is the academic interest only.

For example working in bash I can use something like below:

Code: Select all

if echo $STACK | grep -q "$NEEDLE" ; then
    echo CONTAINS
fi

if grep -q "$NEEDLE" <<< $STACK ; then
    echo CONTAINS
fi


In other hands, I can use another code looking little bit more complicated but faster and something I call "native":

Code: Select all

case "$STACK" in
*$NEEDLE*)
    echo CONTAINS
    ;;
esac

if [[ $STACK == *"$NEEDLE"* ]] ; then
    echo CONTAINS
fi

if [[ $STACK =~ .*$NEEDLE.* ]] ; then
    echo CONTAINS
fi


From all these examples above we can see the bucket of different solutions on shell programming using external commands and builtin features on the shell itself. Which of them could be used in the real task depends on the bash version, the experience of the programmer or specifics of the task. For example, GREP usage is simple and good solution coming the first one. But it will have performance issues in loops.

Once again. I was just looking for implementation of the task using only the builtin commands of the interpreter. My question had the academic interest. And... I am not purist =)

The following code is almost reliable as requested but (sadly) has lack as well as the most of batch scripts. Don't take your attention for some spaghetti code - all of this is just illustration

Code: Select all

::
:: test STRING -contains NEEDLE && echo OK
:: test STRING -starts NEEDLE && echo OK
:: test STRING -ends NEEDLE && echo OK
::
@echo off

setlocal enabledelayedexpansion

rem Skip estimation if one of STACK or NEEDLE is empty
set "if_stack=%~1"
set "if_needle=%~3"

if not defined if_stack (
   endlocal
   exit /b 1
)

if not defined if_needle (
   endlocal
   exit /b 1
)

set "if_str=A!if_stack!"
set /a "if_stack_len=0"
for /l %%a in ( 12, -1, 0 ) do (
   set /a "if_stack_len|=1<<%%a"
   for %%b in ( !if_stack_len! ) do if "!if_str:~%%b,1!" == "" set /a "if_stack_len&=~1<<%%a"
)

set "if_str=A!if_needle!"
set /a "if_needle_len=0"
for /l %%a in ( 12, -1, 0 ) do (
   set /a "if_needle_len|=1<<%%a"
   for %%b in ( !if_needle_len! ) do if "!if_str:~%%b,1!" == "" set /a "if_needle_len&=~1<<%%a"
)

set /a "if_rest_len=if_stack_len-if_needle_len"
set /a "if_str_pos=0"

for /l %%l in ( 0, 1, %if_rest_len% ) do (
   if "!if_stack:~%%l,%if_needle_len%!" == "!if_needle!" (
      if "%~2" == "-contains" (
         endlocal
         exit /b 0
      )
      if "%~2" == "-starts" if %%l equ 0 (
         endlocal
         exit /b 0
      )
      if "%~2" == "-ends" if %%l equ %if_rest_len% (
         endlocal
         exit /b 0
      )
   )
)

endlocal
exit /b 1

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 19:27
by berserker
siberia-man wrote:But it will have performance issues in loops.

just curious , how? you mean grepping for strings while iterating a big file?

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 19:48
by Squashman
berserker wrote:
siberia-man wrote:But it will have performance issues in loops.

just curious , how? you mean grepping for strings while iterating a big file?

I was wondering the same thing as I used to do bash scripting many years ago. I hope he is not referring to the cardinal sin of piping cat to grep.

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 20:05
by berserker
Squashman wrote:
berserker wrote:
siberia-man wrote:But it will have performance issues in loops.

just curious , how? you mean grepping for strings while iterating a big file?

I was wondering the same thing as I used to do bash scripting many years ago. I hope he is not referring to the cardinal sin of piping cat to grep.

something like this

Code: Select all

while read line
do
   if grep "string" $line
   ....
   ...
   fi
done


that's not optimized. That's why, depends on how one designs his algorithm. :)

Re: Ways to aware that some string contains a substrings

Posted: 07 Jan 2014 23:31
by foxidrive
This is as robust as simple batch gets:

Code: Select all

@echo off
set "string1=needleb"
set "string2=aneedle"
set "string3=aneedleb"

if /i "|||%string1%"=="|||needle%string1:needle=%" echo needle is at the beginning
if /i "%string2%|||"=="%string2:needle=%needle|||" echo needle is at the end
if /i not "%string3%"=="%string3:needle=%" echo string contains needle
pause



needle is at the beginning
needle is at the end
string contains needle