Another take on safe string replace

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Another take on safe string replace

#1 Post by Liviu » 10 Sep 2012 00:16

As discussed many times before, the stock string substitution !text:%find%=%replace%! breaks when %find% starts with ~ or * or contains an = character. There are known workarounds in special cases, e.g. when %find% is a single character, and there is always the brute-force approach of scanning !text! character by character instead.

Below is an alternative approach, borrowing from prior art using delimited for/f loops and the tunneling trick. Basic idea is straightforward - replace the problem characters with %~1, %~2 etc, perform the substitution on the sanitized strings, then restore the characters replaced in the first step. One complication is that the initial replacement must be done in one pass, otherwise escaping % and ~ would step over each other.
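For readers who want the escape / substitute / restore idea in isolation, here is a minimal sketch of the logic in Python rather than batch. It is only a model: the function names are made up for illustration, Python's `str.replace` is case sensitive while the batch substitution is not, and nothing here reproduces cmd's parsing rules. The placeholder table is the one described above.

```python
import re

# Placeholder table from the post: each problem character becomes '%~N'.
ESCAPES = {"%": "%~1", "~": "%~2", "*": "%~3", "=": "%~4", "!": "%~5"}
RESTORE = {placeholder: char for char, placeholder in ESCAPES.items()}


def sanitize(s):
    """Escape all problem characters in ONE pass over the string."""
    return re.sub(r"[%~*=!]", lambda m: ESCAPES[m.group()], s)


def restore(s):
    """Turn every '%~N' placeholder back into its original character."""
    return re.sub(r"%~[1-5]", lambda m: RESTORE[m.group()], s)


def safe_replace(text, find, repl):
    """Substitute on the sanitized strings, then undo the sanitizing."""
    out = sanitize(text).replace(sanitize(find), sanitize(repl))
    return restore(out)


print(safe_replace("a=b*c", "=", "~"))  # a~b*c
print(safe_replace("%~", "~", "_"))     # %_
```

One caveat of this sketch: it does not guard against a search string that happens to match part of a placeholder, such as the digit in `%~1`.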

The code below won't handle quotes and control characters, and is way slower than optimal. That aside, if you find other holes or failure cases, please chime in.

Code: Select all

@echo off & setlocal disableDelayedExpansion
call :repStr "%~1" "%~2" "%~3" outStr
( endlocal
  if not "%~4"=="" (set "%~4=%outStr%") else if not "%~1"=="" (set "%~1=%outStr%")
) & goto :eof

:: replaces a substring with a (possibly empty) substitute string
:: to be called from 'setlocal disableDelayedExpansion' environment
::
:: %1 [in,  required] - name of variable containing the input string
:: %2 [in,  required] - name of variable containing the string being replaced
:: %3 [in,  optional] - name of variable containing the replacement string
::                      if missing or empty, occurrences of string %2 are removed
:: %4 [out, optional] - name of variable to receive the output string
::                      if missing or empty, the %1 value is updated in place
::
:: errorlevel  0  ok
:: errors      1  missing %1 input string variable, error
::             2  missing %2 string to replace variable, error
:: warnings   -1  empty %1 input string variable, output string set to empty
::            -2  empty %2 string to replace variable, input copied to output
::           -10  called from enableDelayedExpansion, returned output may be wrong

:repStr
if "%~1"=="" (exit /b 1) else if "%~2"=="" exit /b 2

setlocal disableDelayedExpansion
if not "%~4"=="" (set "outStrRef=%~4") else (set "outStrRef=%~1")

setlocal enableDelayedExpansion
set "inStr=!%~1!" & if not defined inStr endlocal & endlocal & set "%outStrRef%=" & exit /b -1
set "oldStr=!%~2!" & if not defined oldStr endlocal & endlocal & set "%outStrRef%=%inStr%" & exit /b -2
if not "%~3"=="" (set "newStr=!%~3!") else set "newStr="
endlocal & set "inStr=%inStr%" & set "oldStr=%oldStr%" & set "newStr=%newStr%"

call :rigStr inStr
call :rigStr oldStr
call :rigStr newStr

setlocal enableDelayedExpansion
set "outStr=!inStr:%oldStr%=%newStr%!"
endlocal & set "outStr=%outStr%"

for /f "tokens=1-5" %%1 in ("%% ~ * = !") do (
  set "outStr=%outStr%"
)
endlocal & set "outStr=%outStr%"
if not "!"=="" (exit /b 0) else (exit /b -10)

:: replaces '%' = '%~1', '~' = '%~2', '*' = '%~3', '=' = '%~4', '!' = '%~5'

:rigStr %~*=!
setlocal disableDelayedExpansion

setlocal enableDelayedExpansion
set "tail=#!%~1!"
endlocal & set "tail=%tail%"

set "outStr="
:loop
for /F "delims=%%~*=!" %%A in ("%tail%") do (
  set "next=%%A"
  call :next
)
if defined tail goto :loop
endlocal & set "%~1=%outStr%"
goto :eof

:: used by rigStr - replaces one delim, updates 'outStr' and 'tail'

:next
setlocal enableDelayedExpansion
call :strlen next offset
if %offset% leq 1 (
  if "!tail:~1,1!"=="%%" (
    set "outStr=!outStr!%%~1"
  ) else if "!tail:~1,1!"=="~" (
    set "outStr=!outStr!%%~2"
  ) else if "!tail:~1,1!"=="*" (
    set "outStr=!outStr!%%~3"
  ) else if "!tail:~1,1!"=="=" (
    set "outStr=!outStr!%%~4"
  ) else if "!tail:~1,1!"=="^!" (
    set "outStr=!outStr!%%~5"
  )
  set "tail=!tail:~2!"
) else (
  set "outStr=!outStr!!next:~1!"
  set "tail=!tail:~%offset%!"
)
if defined tail (
  set "tail=#!tail!"
)
endlocal & set "outStr=%outStr%" & set "tail=%tail%"
goto :eof

:: http://www.dostips.com/?t=Function.strLen
::
:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101116 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,0) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b

Liviu

jeb
Expert
Posts: 1058
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: Another take on safe string replace

#2 Post by jeb » 10 Sep 2012 04:34

Hi Liviu,

Nice idea to change them this way. :)
But I suppose it could be a bit faster with fewer setlocal changes.

Or even with another technique.
You use the FOR/F loop with delimiters and a strlen for each part,
but it could also be possible to use delayed substitution.

Code: Select all

set inStr=!inStr:%%=%%~1!
set head=!inStr:*~=!
call :rebuildStr "%%%%~2"
set head=!inStr:**=!
call :rebuildStr "%%%%~3"

...
:rebuildStr
call :strlen head strsize
set "inStr=!head!%~1!inStr:~strsize!"
exit /b


Liviu wrote:The code below won't handle quotes and control characters,...

But that would be the problematic part :), without this it's not a real solution.

jeb

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Another take on safe string replace

#3 Post by Liviu » 10 Sep 2012 10:27

jeb wrote:But I suppose it could be a bit faster with fewer setlocal changes.

Yes, and also fewer CALLs and GOTOs. At this point I consider it a proof of concept more than production-optimized code.

jeb wrote:Or even with another technique.
You use the FOR/F loop with delimiters and a strlen for each part,
but it could also be possible to use delayed substitution.

Code: Select all

set inStr=!inStr:%%=%%~1!
set head=!inStr:*~=!
call :rebuildStr "%%%%~2"

You could optimize some of the substitutions, but I don't think you can eliminate the for/f loops entirely. As I said, at least the '%' and '~' characters must be replaced in one pass.

Assume for example the input string is "%~" and you want to replace "~" with "_". Doing it sequentially would mean:
- replace "%" with "%~1": input = "%~1~", string-to-replace and replacement-string unchanged;
- replace "~" with "%~2": input = "%%~21%~2", string-to-replace = "%~2", replacement-string unchanged;
- perform replacement: output = "%_1_";
- replace back "%~1" with "%", "%~2" with "~": output = "%_1_";
In other words, it won't work: the expected output is "%_", not "%_1_".
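The walkthrough above can be replayed mechanically. The following is a small Python model (not batch, and limited to the two characters in the example) that contrasts escaping "%" and "~" one after the other with escaping both in a single pass. All function names are hypothetical.

```python
import re

TABLE = {"%": "%~1", "~": "%~2"}
BACK = {"%~1": "%", "%~2": "~"}


def escape_sequential(s):
    """Two separate passes: the second pass re-escapes the '~' that the
    first pass just introduced as part of '%~1'."""
    s = s.replace("%", "%~1")
    return s.replace("~", "%~2")


def escape_one_pass(s):
    """Single pass: every original character is looked at exactly once."""
    return re.sub(r"[%~]", lambda m: TABLE[m.group()], s)


def unescape(s):
    return re.sub(r"%~[12]", lambda m: BACK[m.group()], s)


def replace_with(escape, text, find, repl):
    return unescape(escape(text).replace(escape(find), escape(repl)))


print(escape_sequential("%~"))                           # %%~21%~2
print(replace_with(escape_sequential, "%~", "~", "_"))   # %_1_
print(replace_with(escape_one_pass, "%~", "~", "_"))     # %_
```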

jeb wrote:
Liviu wrote:The code below won't handle quotes and control characters,...

But that would be the problematic part :), without this it's not a real solution.

Control characters other than CR and LF do in fact work fine, and those two are of less concern in the context of text line-oriented processing. As for quotes, that's a solvable problem, but it will complicate the code a bit. Before going all the way out, thought I'd first check if the method itself was sane and sound ;-)

Liviu

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Another take on safe string replace

#4 Post by Ed Dyreen » 10 Sep 2012 17:25

[edit 10 Sep 2012 22:47]
Hi Liviu,

Here is another one,

The upside: macro, the downside: brute force :(

Code: Select all

@echo off &prompt $G &setlocal disableDelayedExpansion &set $lf=^


::
set "$plf=%%$lf%%"
set "$mn1c=%%$plf%%^"

%=   =%set ^"macro=^(^
%$mn1c%
%=      =%set $=!$:#=##!%$mn1c%
%$mn1c%
%=      =%set "$=!$:"=""!^"%$mn1c%
%=      =%set "$=!$:^=^^!"^^^&call set "$=%%^$:^!="#"^!%%"^^^&set "$=!$:"#"=^!"^^^!%$mn1c%
%=      =%set "$=!$:""="!^"%$mn1c%
%$mn1c%
%=      =%set "c=-1"^&for /l %%? in () do set /a c += 1 ^>nul ^&for %%? in (!c!) do (%$mn1c%
%=         =%if "!$:~%%?,1!" == "" (%$mn1c%
%=            =%set $=!$:##=#!%$mn1c%
%=            =%echo.!$!^^^!%$mn1c%
%=            =%exit%$mn1c%
%=         =%)%$mn1c%
%=         =%if "!$:~%%?,1!" == "*" (%$mn1c%
%=            =%set "0=!$:~0,%%?!"%$mn1c%
%=            =%set "1=!$:~%%?!"%$mn1c%
%=            =%set "$=!0!4#2!1:~1!"%$mn1c%
%=         =%)%$mn1c%
%=         =%if "!$:~%%?,1!" == "~" (%$mn1c%
%=            =%set "0=!$:~0,%%?!"%$mn1c%
%=            =%set "1=!$:~%%?!"%$mn1c%
%=            =%set "$=!0!1#26!1:~1!"%$mn1c%
%=         =%)%$mn1c%
%=         =%if "!$:~%%?,1!" == "=" (%$mn1c%
%=            =%set "0=!$:~0,%%?!"%$mn1c%
%=            =%set "1=!$:~%%?!"%$mn1c%
%=            =%set "$=!0!6#1!1:~1!"%$mn1c%
%=         =%)%$mn1c%
%=      =%)%$mn1c%
%$mn1c%
%=   =%)"

setlocal enableExtensions enableDelayedExpansion

set "$=#,*,~,=,^^" !

echo.
echo.0$=!$!_
for /f "delims=" %%§ in ( 'cmd /v:on /e:on /t:0B /q /c "!macro!"' ) do (
   ::
   set "$=%%§"
)
echo.$=!$!_


set "$=#,*,~,=,^^,^!" !

echo.
echo.1$=!$!_
for /f "delims=" %%§ in ( 'cmd /v:on /e:on /t:0B /q /c "!macro!"' ) do (
   ::
   set "$=%%§"
)
echo.$=!$!_

pause
exit

Code: Select all

0$=#,*,~,=,^_
$=#,4#2,1#26,6#1,^_

1$=#,*,~,=,^,!_
$=#,4#2,1#26,6#1,^,!_
Press any key to continue . . .

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Another take on safe string replace

#5 Post by Liviu » 10 Sep 2012 21:16

Ed Dyreen wrote:The upside: macro, the downside: brute force :(

Technically, another downside is that it's spawning a second instance of cmd.

Can't and won't say I followed all of it, but there seems to be a caret quoting imbalance. If you change '$' to just 'set $=#,*,~,=,^^^^' the output doubles the original carets.

Code: Select all

$=#,*,~,=,^^_
#,4#2,1#26,6#1,^^^^
$=#,4#2,1#26,6#1,^^^^_

Liviu

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Another take on safe string replace

#6 Post by Ed Dyreen » 10 Sep 2012 21:56

Liviu wrote:Technically, another downside is that it's spawning a second instance of cmd.
I didn't know how else to break the infinite loop but you could eliminate that and count the string.
Liviu wrote:there seems to be a caret quoting imbalance.
Sorry, forgot to retract. ( No production code, just brainstorming ) :wink:

Ed

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Another take on safe string replace

#7 Post by dbenham » 11 Sep 2012 10:31

@Liviu - interesting algorithm :D

@Ed - If you are going to take the time to do a brute force search and replace, you might as well do the desired search and replace instead of an intermediate result. :lol:

One big advantage of the brute force technique is it supports a case sensitive search.

Here is a version that avoids the use of CALL or GOTO. I think it is about as efficient as it can get for brute force, and it could easily be converted into a macro. I think it supports all characters except LF anywhere and CR at the end. It must be called with delayed expansion disabled.

With the addition of the safe return technique, it could support all chars, (except nul of course) and could also support calls with delayed expansion enabled.

I'd be curious about the performance difference between this brute force method and Liviu's new method, once it is finished and optimized.

Code: Select all

@echo off
:Replace InputVar  OutputVar  SearchVar  [ReplaceVar]  [/I]
::
::  Perform a search and replace on the contents of a variable.
::
::  InputVar = Name of a variable containing the source string.
::
::  OutputVar = Name of a variable where result is to be stored.
::
::  SearchVar = Name of variable containing the search string.
::
::  ReplaceVar = Name of variable containing the replacement string.
::               If ReplaceVar is missing or is not defined then the
::               search string is replaced with an empty string.
::
::  The /I option specifies a case insensitive search.
::
::  The number of replacements made is returned as errorlevel.
::
::  If an error occurs then OutputVar is not set and
::  the errorlevel is set to -1.
::
::  Replace should be called with delayed expansion disabled.
::  It could easily be extended with a safe return technique that
::  supports delayed expansion calls.
::
setlocal enableDelayedExpansion

  ::error checking
  if "%~3"=="" (
    >&2 echo ERROR: Insufficient arguments
    exit /b -1
  )
  if not defined %~1 (
    >&2 echo ERROR: variable %~1 not defined
    exit /b -1
  )
  if not defined %~3 (
    >&2 echo ERROR: searchVar %~3 not defined
    exit /b -1
  )
  if "%~5" neq "" if /i "%~5" neq "/I" (
    >&2 echo ERROR: Invalid option %~5
    exit /b -1
  )

  ::get input, search, and replace strings
  set "_input=!%~1!"
  set "_search=!%~3!"
  set "_replace=!%~4!"

  ::compute length of _input
  set "str=A!_input!"
  set inputLen=0
  for /l %%A in (12,-1,0) do (
    set /a "inputLen|=1<<%%A"
    for %%B in (!inputLen!) do if "!str:~%%B,1!"=="" set /a "inputLen&=~1<<%%A"
  )

  ::compute length of _search
  set "str=A!_search!"
  set searchLen=0
  for /l %%A in (12,-1,0) do (
    set /a "searchLen|=1<<%%A"
    for %%B in (!searchLen!) do if "!str:~%%B,1!"=="" set /a "searchLen&=~1<<%%A"
  )

  ::perform search and replace on input
  set "rtn="
  set /a "end=inputLen-searchLen, beg=0, replaceCnt=0"
  for %%l in (%searchLen%) do (
    for /l %%o in (0 1 !end!) do (
      if %%o geq !beg! if %~5 "!_input:~%%o,%%l!"=="!_search!" (
        set /a "len=%%o-beg"
        for /f "tokens=1,2" %%a in ("!beg! !len!") do set "rtn=!rtn!!_input:~%%a,%%b!!_replace!"
        set /a "beg=%%o+searchLen, replaceCnt+=1"
      )
    )
  )
  for %%a in (!beg!) do set "rtn=!rtn!!_input:~%%a!"

  ::define a linefeed variable
  set LF=^


  ::safely return the result if rtn defined
  for /f eol^=^%LF%%LF%delims^= %%a in ("!rtn!") do (
    endlocal
    set "%~2=%%a"
    exit /b %replaceCnt%
  )

::return empty string if rtn not defined
endlocal & set "%~2=" & exit /b %replaceCnt%
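For anyone who finds the nested FOR loops hard to follow, the scan in the listing above can be modeled in a few lines of Python. This is only a sketch of the logic, not a translation of the batch code: `brute_replace` is a hypothetical name, and `str.lower` merely stands in for the case-insensitive comparison that `if /I` performs in cmd.

```python
def brute_replace(text, search, repl="", ignore_case=False):
    """Scan every offset once; return (result, number_of_replacements)."""
    if not search:
        raise ValueError("search string must not be empty")
    fold = str.lower if ignore_case else (lambda s: s)
    needle = fold(search)
    width = len(search)
    parts, beg, count = [], 0, 0
    # 'o' walks all candidate offsets, like the inner FOR /L loop.
    for o in range(0, len(text) - width + 1):
        # 'o >= beg' skips offsets that lie inside the previous match.
        if o >= beg and fold(text[o:o + width]) == needle:
            parts.append(text[beg:o])   # unchanged text before the match
            parts.append(repl)
            beg = o + width
            count += 1
    parts.append(text[beg:])            # whatever follows the last match
    return "".join(parts), count


print(brute_replace("a=b=c", "=", "~"))              # ('a~b~c', 2)
print(brute_replace("Far far", "far", "x"))          # ('Far x', 1)
print(brute_replace("Far far", "far", "x", True))    # ('x x', 2)
```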


Dave Benham

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Another take on safe string replace

#8 Post by Liviu » 11 Sep 2012 23:21

Below is a variation on the original code, with support added for stray parentheses.

Code: Select all

@echo off

if not "!"=="" goto:repStr

setlocal disableDelayedExpansion
call :repStr "%~1" "%~2" "%~3" outStr
endlocal & (
  if not "%~4"=="" (
    set "%~4=%outStr%"
  ) else if not "%~1"=="" (
    set "%~1=%outStr%"
  )
)
goto :eof

:: replaces a substring with a (possibly empty) substitute string
:: to be called from 'setlocal disableDelayedExpansion' environment
::
:: %1 [in,  required] - name of variable containing the input string
:: %2 [in,  required] - name of variable containing the string being replaced
:: %3 [in,  optional] - name of variable containing the replacement string
::                      if missing or empty, occurrences of string %2 are removed
:: %4 [out, optional] - name of variable to receive the output string
::                      if missing or empty, the %1 value is updated in place
::
:: errorlevel  0  ok
:: errors      1  missing %1 input string variable, error
::             2  missing %2 string to replace variable, error
:: warnings   -1  empty %1 input string variable, output string set to empty
::            -2  empty %2 string to replace variable, input copied to output
::           -10  called from enableDelayedExpansion, returned output may be wrong

:repStr
if "%~1"=="" (exit /b 1) else if "%~2"=="" exit /b 2

setlocal disableDelayedExpansion
if not "%~4"=="" (set "outStrRef=%~4") else (set "outStrRef=%~1")

setlocal enableDelayedExpansion
set "inStr=!%~1!" & if not defined inStr endlocal & endlocal & (
  set "%outStrRef%="
) & exit /b -1
set "oldStr=!%~2!" & if not defined oldStr endlocal & endlocal & (
  set "%outStrRef%=%inStr%"
) & exit /b -2
if not "%~3"=="" (
  set "newStr=!%~3!"
) else (
  set "newStr="
)
endlocal & (
  @rem double double-quotes for caret sanity
  set "inStr=%inStr:"=""%"
  set "oldStr=%oldStr:"=""%"
  if defined newStr set "newStr=%newStr:"=""%"
)

call :rigStr inStr
call :rigStr oldStr
call :rigStr newStr

setlocal enableDelayedExpansion
set "outStr=!inStr:%oldStr%=%newStr%!"
endlocal & set "outStr=%outStr%"

@rem outer double-quotes %%0 and %%8 not used
for /f "usebackq tokens=1-8" %%0 in ('" %% ~ * = ! ^ "^"" "') do (
  endlocal & endlocal & set "outStr=%outStr%"
)
if not "!"=="" (exit /b 0) else (exit /b -10)

:: replaces '%' = '%~1', '~' = '%~2', '*' = '%~3', '=' = '%~4', '!' = '%~5', '^' = '%~6', '""' = '%~7'

:rigStr %~*=!"
setlocal enableDelayedExpansion
set "tail=#!%~1!"
endlocal & set "tail=%tail%"

set "outStr="
:loop
for /f delims^=%%~*^=^^!^" %%A in ("%tail:"=""%") do (
  set "next=%%A"
  call :next
)
if defined tail goto :loop
set "%~1=%outStr%"
goto :eof

:: used by :rigStr - replaces one delim, updates 'outStr' and 'tail'

:next
setlocal enableDelayedExpansion
call :strlen next offset
if %offset% leq 1 (
  if not "!tail:~1,1!"==^""" (
    if "!tail:~1,1!"=="%%" (
      set "outStr=!outStr!%%~1"
    ) else if "!tail:~1,1!"=="~" (
      set "outStr=!outStr!%%~2"
    ) else if "!tail:~1,1!"=="*" (
      set "outStr=!outStr!%%~3"
    ) else if "!tail:~1,1!"=="=" (
      set "outStr=!outStr!%%~4"
    ) else if "!tail:~1,1!"=="^!" (
      set "outStr=!outStr!%%~5"
    ) else if "!tail:~1,1!"=="^" (
      set "outStr=!outStr!^^"
    )
    set "tail=!tail:~2!"
  ) else (
    set "outStr=!outStr!%%~7"
    @rem skip over both double-quotes
    set "tail=!tail:~3!"
  )
) else (
  set "outStr=!outStr!!next:~1!"
  set "tail=!tail:~%offset%!"
)
if defined tail (
  set "tail=#!tail!"
)
endlocal & (
  set "outStr=%outStr%"
  set "tail=%tail%"
)
goto :eof

:: http://www.dostips.com/?t=Function.strLen
::
:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20101116 :$categories StringOperation
:$source http://www.dostips.com
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,0) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b


dbenham wrote:One big advantage of the brute force technique is it supports a case sensitive search.

Good point, which somehow got overlooked in the discussion so far.

dbenham wrote:I'd be curious about the performance difference between this brute force method and Liviu's new method, once it is finished and optimized.

For now, my code is only getting slower with the update above ;-) I'll try to tidy it up, but that may not happen right away. Roughly speaking, its run time scales about linearly with the number of special characters in the strings, while the brute force approach scales with the length of the string itself. For sufficiently long and random strings, my code would asymptotically win. However, for environment variables limited to 8K characters (even less than that due to the %~ expansions) I can easily see it losing, even badly.

Liviu

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Another take on safe string replace

#9 Post by Liviu » 16 Sep 2012 23:24

dbenham wrote:I'd be curious about the performance difference between this brute force method and Liviu's new method, once it is finished and optimized.

I've run a few test cases, and your "brute-force" code wins easily against my "piecewise-replace". I can't give a single hard number since it depends heavily on the text contents, but from what I've gathered the "brute-force" code is between 2x and 10x faster. I compared the snippets as posted above; I don't know that there is much left to optimize, or worth optimizing, at this point.

Liviu

Sponge Belly
Posts: 234
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: Another take on safe string replace

#10 Post by Sponge Belly » 04 Oct 2012 17:06

Hello All!

Sorry for the late arrival. What did I miss?

Anyways, I've cranked out a little program similar to Liviu's. Only my effort counts the occurrences of the substring in the source string and creates a 0-based index list of the position of each occurrence.

Code: Select all

@echo off & setlocal enableextensions
if "%~1" neq "/re-enter" goto main
shift & shift
call %~0
goto end

:main
set "str1=A galaxy far, far away..."
set "str2=one fish two fish red fish blue fish"

call :FindOccurs A "%str1%" occurs olist
call :FindOccurs FAR "%str1%" occurs olist
call :FindOccurs "Obi "Ben" Kinobi" "%str1%" occurs olist
call :FindOccurs FISH "%str2%" occurs olist
call :FindOccurs BIRD "%str2%" occurs olist

:end
endlocal & exit /b 0

:FindOccurs Substring String NoOfOccurs IndexListOfOccurs
setlocal
set "sub=%~1" & set "str=%~2"
set /a count=0 & set "list= "

set "sub=%sub:"=%"
if "%sub%" neq "%~1" (
echo(quotes not permitted in substring 1>&2
set /a count=-1,list=-1 & goto skip)
set "str=%str:"=_%"

for /f "skip=1 delims=:" %%i in ('call "%~f0" /re-enter :InsertLF') ^
do (set /a count+=1,current=%%i-count-1
call set "list=%%list%%%%current%% ")
if %count%==0 (set /a list=-1) else set "list=%list:~1,-1%"
echo(%count%: %list%

:skip
endlocal & set /a %~3=%count% & set "%~4=%list%" & goto :EOF

:InsertLF
setlocal enabledelayedexpansion
set ^"str=!str:%sub%=^

%sub%!"
(cmd /v:on /c echo("!str!")| findstr /o ^^
endlocal & goto :EOF


As you can see, I had to resort to chicanery to make it work. I inserted an Lf char before every occurrence of the substring and piped it to findstr. I couldn't get it to work inside the in (...) clause of the for loop in FindOccurs. If anyone out there can show me how, I'd appreciate it.
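The offset arithmetic behind the trick may be easier to see outside of batch. Below is a rough Python model of it, assuming that each inserted line feed counts as exactly one character in the offsets findstr reports (which is what the `%%i-count-1` expression in the listing implies) and that the string is echoed with one leading double quote. `find_occurrences` is a hypothetical name, and `re.IGNORECASE` only approximates the case-insensitive batch substitution.

```python
import re


def find_occurrences(text, sub):
    """Return 0-based offsets of the non-overlapping matches of sub in text."""
    # Step 1: put a line feed in front of every occurrence of the substring.
    marked = re.sub(re.escape(sub), lambda m: "\n" + m.group(), text,
                    flags=re.IGNORECASE)
    # Step 2: the echoed text is wrapped in double quotes, then split in lines.
    lines = ('"' + marked + '"').split("\n")
    offsets = []
    line_start = 0  # plays the role of the offset printed by 'findstr /o'
    for count, line in enumerate(lines):
        if count > 0:  # the first line is skipped, like 'skip=1'
            # Remove the leading quote and the 'count' inserted line feeds.
            offsets.append(line_start - count - 1)
        line_start += len(line) + 1
    return offsets


print(find_occurrences("A galaxy far, far away...", "FAR"))            # [9, 14]
print(find_occurrences("one fish two fish red fish blue fish", "FISH"))
```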

My next project is to adapt the code listed above using Dave Benham's brute force technique as a template.
Last edited by Sponge Belly on 06 Oct 2012 06:57, edited 1 time in total.

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Another take on safe string replace

#11 Post by Liviu » 04 Oct 2012 19:11

Sponge Belly wrote:Sorry for the late arrival. What did I miss?

Didn't miss much, everything is saved in the forum archives. Now, you are excused until after you browse all those thousands of topics ;-)

Sponge Belly wrote:Anyways, I've cranked out a little program similar to Liviu's. Only my effort counts the occurrences of the substring in the source string and creates a 0-based index list of the position of each occurrence.

That's an interesting topic, and replacing and finding are related, yet not quite the same. Consider for example the string "ahaha". Replacing "aha" with another string would make one substitution. However, finding "aha" might return 2 occurrences - depending on one's search rules.

Given the differences, maybe your topic of finding and indexing the occurrences could be better served by starting a dedicated thread.

Sponge Belly wrote:My next project is to adapt the code listed above using Dave Benham's brute force technique as a template.

As a side note, posted code looks and works better inside [code] tags.

Back to your code, can't say I followed it closely, but it needs more work to make it immune to the usual problem characters (as posted, substrings of ! * ~ & cause it to fail). Also, I'd think a more "brute force" approach might be competitive vs. shelling and findstr.

Liviu

Sponge Belly
Posts: 234
Joined: 01 Oct 2012 13:32
Location: Ireland
Contact:

Re: Another take on safe string replace

#12 Post by Sponge Belly » 06 Oct 2012 07:18

:oops: Classic newbie error. But when I went back to add the code tags, someone had beaten me to it. Thanks, whoever you are!

Anyways, I wrote the program to see if I could exploit findstr /o ^^. I acknowledge that it is grossly inefficient, case insensitive, and that it will choke on ~, =, and *. But the code you posted at the start of this thread has the same limitations, Liviu!

Take your point about starting a new thread, though. And if I ever get around to writing a brute force version, I'll do just that.

PS: Argh! All this talk of lazy versus eager matching brought back painful memories of reading Jeffrey Friedl's book on Regular Expressions. Mind-bending stuff. You're never quite the same after reading that book, be warned!

But my point is, if the search string is "aaaa" and the substring is "aa", should the substring match 2 or 3 times (lazy or eager)? Well, if you move the substring along by the length of the substring each time you find a match, it'll be lazy. If you move the substring along by one character every time regardless, it'll be eager.
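The two stepping rules described above differ only in how far the scan advances after a hit. A minimal Python sketch, with `match_offsets` and its `overlapping` flag being names invented for this illustration:

```python
def match_offsets(text, sub, overlapping):
    """List the 0-based offsets where sub occurs in text.

    overlapping=False advances by len(sub) after each hit,
    overlapping=True advances by a single character.
    """
    if not sub:
        raise ValueError("substring must not be empty")
    step = 1 if overlapping else len(sub)
    offsets, pos = [], 0
    while True:
        pos = text.find(sub, pos)
        if pos < 0:
            return offsets
        offsets.append(pos)
        pos += step


print(match_offsets("aaaa", "aa", overlapping=False))   # [0, 2]
print(match_offsets("aaaa", "aa", overlapping=True))    # [0, 1, 2]
print(match_offsets("ahaha", "aha", overlapping=True))  # [0, 2]
```

Advancing by the substring length gives the count a replace operation would see, while advancing by one gives every position at which the substring can be found.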

Looks like we have another parameter for the subroutine. How many does that make, 5 or 6? Hmm, at what point does "freeping creaturism" set in?

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Another take on safe string replace

#13 Post by Liviu » 06 Oct 2012 13:35

Sponge Belly wrote:I acknowledge that it is grossly inefficient, case insensitive, and that it will choke on ~, =, and *. But the code you posted at the start of this thread has the same limitations, Liviu!

Inefficient and case insensitive, yes. But, no, my code doesn't choke on poison characters. That's the whole point of the "safe" in the topic title, and the convolutions in the code. If you find an error case then please post the details.

Sponge Belly wrote:Looks like we have another parameter for the subroutine. How many does that make, 5 or 6?

I don't think literal search/replace routines need additional parameters. In the given example "ahaha", "aha" search should return two hits at offsets 0, 2, and "aha" replace should do one replacement at offset 0. If you are thinking of full regular expressions, which is a wholly different topic, then there are indeed at least 2 independent parameters (first begin/end, and shortest/longest) which yield 4 distinct combinations (described in some detail for example at http://www.lugaru.com/man/Searching.Rules.html for one particular dialect of regex).

Liviu

Sponge Belly
Posts: 234
Joined: 01 Oct 2012 13:32
Location: Ireland

Re: Another take on safe string replace

#14 Post by Sponge Belly » 23 Nov 2012 04:43

Thanks for enlightening me, Liviu!

And apologies for saying your code would choke on control characters. I don't know why I thought that. Maybe if I read first and typed later...

carlos
Expert
Posts: 503
Joined: 20 Aug 2010 13:57
Location: Chile

Re: Another take on safe string replace

#15 Post by carlos » 23 Nov 2012 22:45

Liviu, I would like to understand the method. Could you please explain it to me? It might help me improve the code.
I improved the nice strlen function. Because it works with numeric constants, I removed the arithmetic that computed them. I also return the length in the errorlevel.

Code: Select all

:strlen
(SETLOCAL ENABLEDELAYEDEXPANSION &set "len=0" &set "str=A!%~1!"
for %%A in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
set /a "len|=%%A"
for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~%%A"
))
(ENDLOCAL &IF "%~2" NEQ "" SET "%~2=%len%"
EXIT /b %len%)
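For reference, the bit-by-bit search that this strlen performs can be modeled as follows. This is a Python sketch of the logic only, with `strlen_bits` and `max_bit` being hypothetical names. The constants 4096 down to 1 in the batch version correspond to `1 << 12` down to `1 << 0` here.

```python
def strlen_bits(s, max_bit=12):
    """Find len(s) by deciding one bit of the answer per step.

    The string is prefixed with one extra character, so index 'length'
    exists in the padded string exactly when length <= len(s).
    """
    padded = "A" + s
    length = 0
    for bit in range(max_bit, -1, -1):
        length |= 1 << bit                      # tentatively set this bit
        if padded[length:length + 1] == "":     # nothing at that index
            length &= ~(1 << bit)               # too far, clear it again
    return length


print(strlen_bits(""))       # 0
print(strlen_bits("hello"))  # 5
```

With 13 bits the result tops out at 8191, so in this sketch longer strings would simply be reported as 8191.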

