Re: strLen boosted

Posted: 14 Jan 2011 06:17
by jeb
Nice,

but I see one thing to optimize.
Appending the "gauging" helper can be done without a FOR loop.
It's not much of a gain, but if you run a few million tests ...

Code:

:strLen string len -- returns the length of a string
(   SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for /L %%A in (12,-1,8) do (
        set /a "len|=1<<%%A"
        for %%B in (!len!) do if "!str:~%%B,1!"=="" set /a "len&=~1<<%%A"
    )
)
REM ##### HERE IS THE DIFFERENCE
(
    set str=!str:~%len%,-1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a len+=0x!str:~0x1FF,1!!str:~0xFF,1!
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
)
EXIT /b

Re: strLen boosted

Posted: 16 Jan 2011 17:03
by sowgtsoi
Oops, apparently the word 'gauge' is a false friend; it seems I should have used the word 'dipstick' to convey what I had in mind. I've corrected it in the preceding posts.
(By the way, I know of no preexisting name for the technique you employed. It reminds me of gauging the oil level of a car by dipping a fixed-length stick in the liquid and reading the result directly, hence the name I chose.)
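
For readers new to the idea, here is a minimal sketch of the dipstick on its own, assuming a string of at most 15 characters. The variable names s, d and len and the sample value are only for illustration and do not come from the posted functions. The string pushes the stick to the right, so the character found at a fixed offset encodes the length as a single hex digit:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical demo value; any string of 0 to 15 characters works here.
set "s=hello"
rem Append the 16-character stick, then read the character at offset 15.
rem A string of length r shifts the stick by r, so offset 15 holds the digit r.
set "d=!s!FEDCBA9876543210"
set /a "len=0x!d:~15,1!"
echo length: !len!
endlocal
```

With the sample value above this should print "length: 5". The 512-character stick in the posted functions applies the same trick to two hex digits at once, which covers a remainder of 0 to 255.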



Well .. we have a case here where differing amounts of comments lead to opposite conclusions when the speed is tested:
- your code, as-is (minus the separating comment), is faster in all cases (at a good 95% of the previous time),
- your code, prepared for a possible inclusion (changed date, site URL), is slower in all cases (at a small 105%).

The second case is more rigorous (same amount of comments in the functions compared), but the fact that it's not faster is counterintuitive considering the results of this test:

test_fornofor.cmd

Code:

@echo off
setlocal enabledelayedexpansion
set a=0
echo:Testing ... (takes a few seconds)

call :getTod tcs_1
for /L %%i in (1,1,100000) do ( for %%j in (!a!) do set a=%%j )
call :getTod tcs_2
for /L %%i in (1,1,100000) do ( set a=%a% )
call :getTod tcs_3

set /a   "tcs_for=tcs_2-tcs_1" & if   !tcs_for! LSS 0 set /a   tcs_for+=8640000
set /a "tcs_nofor=tcs_3-tcs_2" & if !tcs_nofor! LSS 0 set /a tcs_nofor+=8640000

echo:   With a 'for' : %tcs_for%cs.
echo:Without a 'for' : %tcs_nofor%cs. (should be faster)

endlocal
pause
@echo on
@exit /b


:: less locale dependent version of getTod
:: adapted from http://www.dostips.com/DtCodeFunctions.php#_Toc128586395
::
:getTod -- get a Time of Day value in 1/100th seconds
::   -- %~1: out - time of day
SETLOCAL
set t=%time: =0%
set /a t=((1%t:~0,2%*60+1%t:~3,2%)*60+1%t:~6,2%)*100+1%t:~9,2%-36610100
( ENDLOCAL & REM RETURN VALUES
    IF "%~1" NEQ "" SET %~1=%t%
)
GOTO:EOF

Re: strLen boosted

Posted: 26 Jan 2011 16:56
by jeb
Ok, +-5% is not really important.

So I built a much better variant, with only one FOR loop ;-)
And in your test suite it is always faster.
To compare the variants more fairly, I removed the comment blocks
and always moved the EXIT /b into the last block.

strLenN_binarySplit.bat

Code:

:strLen string len -- returns the length of a string via binary search
(
    setlocal EnableDelayedExpansion
    set "s=!%~1!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!s:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "s=!s:~%%P!"
        )
    )
)
(
    endlocal
    set "%~2=%len%"
    exit /b
)
::End of function


It tests like the normal binary search, but when a test succeeds the string is shortened by the tested amount, so each later test works on what remains.
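
The shrinking step is easier to follow on a tiny case. The sketch below is only an illustration: it assumes a hypothetical 5-character sample and a shortened power list (4 2 1), and it prints the state after each test:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical demo value: "abcde" plus the trailing # guard character.
set "s=abcde#"
set "len=0"
for %%P in (4 2 1) do (
    rem If a character exists at offset %%P, at least %%P characters precede the guard.
    if "!s:~%%P,1!" NEQ "" (
        set /a "len+=%%P"
        set "s=!s:~%%P!"
    )
    echo after testing %%P: len=!len! rest=!s!
)
echo final length: !len!
endlocal
```

The test for 4 succeeds and leaves "e#", the test for 2 fails, and the test for 1 succeeds and leaves "#", so the run should end with "final length: 5".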

btw. (with or without the ,1)

Code:

if "!str:~%%P,1!" NEQ ""
if "!str:~%%P!" NEQ ""

Both seem to run at nearly the same speed, even if str is long. :o

hope it helps
jeb

Re: strLen boosted

Posted: 23 Feb 2011 16:52
by sowgtsoi
(tests' sources in the next post)

(Using «,1» removes the need to worry about the command line limit. One less thing to think about? No impact on speed!? I like it!)


Wow, astute! 'strLenN_binarySplit'(.txt) manages to complete in a good 90% of the time taken by the already refined 'strLenL_dipstick'!

I've retrofitted 'strLenL_dipstick' with your enhancements to produce 'strLenO_dipstick_powers' which in turn completes in :
- a small 95% of the time taken by 'strLenN_binarySplit',
- a good 85% of the time taken by 'strLenL_dipstick',
- a small 75% of the time taken by 'strLenJ_20101116', the current version of «:strLen».

All these improvements lead me to propose this function for inclusion :

Code:

:strLen string len -- returns the length of a string
::                 -- string [in]  - variable name containing the string being measured for length
::                 -- len    [out] - variable to be used to return the string length
:: Many thanks to 'sowgtsoi', but also 'jeb' and 'amel27' dostips forum users helped making this short and efficient
:$created 20081122 :$changed 20110220 :$categories StringOperation
:$source http://www.dostips.com
( SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"&rem keep the A up front to ensure we get the length and not the upper bound
                     rem it also avoids trouble in case of empty string
    set "len=0"
    for %%P in (4096 2048 1024 512 256) do (
        if "!str:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "str=!str:~%%P!"
        )
    )
    set str=!str:~1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a "len+=0x!str:~0x1FF,1!!str:~0xFF,1!"
)
( ENDLOCAL & REM RETURN VALUES
    IF "%~2" NEQ "" SET /a %~2=%len%
    EXIT /b
)

:mrgreen: .. and this function is now three times faster than the original 'strLenA_loopUnrolled'!


Thanks for your insights !


Edits : grammar.

Re: strLen boosted

Posted: 23 Feb 2011 16:56
by sowgtsoi
To test on an XP system or higher, copy the following files into the dedicated folder described in the second post of this thread ("strLen_tests_stub.txt", "strLenA_loopUnrolled.txt", "strLenJ_20101116.txt" and "strLenL_dipstick.txt" should already be present there).

Notes :
| I've realized that using the variable name "s" was causing a problem here.
| In "strLen_tests_stub.txt", «s» also appears in the body of the testing function that calls «:strLen».
| As far as I can tell, the interpreter is not confused by the variable shadowing; things proceed correctly.
| What escapes me is that the name clash results in a slight speed boost.
| The problem is that there is no particular reason for it to happen in the wild
| (*cough* 'strLenI_chunks' *cough*).
| This slight boost is significant in that it's of the same magnitude as the improvements made at this stage of «:strLen»'s streamlining.
| In some sense it's good news: this sensitivity hints that «:strLen» is now only a tad removed from perfection!
|
| For the sake of accuracy, the latest functions are tested "all things otherwise equal": same amount of comments (none), same variables where relevant, etc.



strLen_even_more_tests.cmd

Code:

@echo off
:: 2011-01-08
SETLOCAL ENABLEDELAYEDEXPANSION

:: always working in the directory of this script :
%~d0 & cd "%~dp0"

:: setup :
if not exist logs\ mkdir logs
if exist tmp\ rmdir /S /Q tmp
mkdir tmp
for %%v in (
    A_loopUnrolled
    J_20101116
    L_dipstick
    N_binarySplit
    O_dipstick_powers
    ) do (
    type "strLen_tests_stub.txt" "strLen%%v.txt" > "tmp\test_strLen%%v.cmd"
) 1>nul 2>nul

:: syntactic sugar :
set      "test= call ^"tmp\test_strLen^^!version^^!.cmd^" "
set   "do_test= "
set "skip_test= if 0==1 "


%do_test% (echo:&echo:&ver&echo:Unit test :
    set "version=A_loopUnrolled"   &%test% unittest
    set "version=J_20101116"       &%test% unittest
    set "version=L_dipstick"       &%test% unittest
    set "version=N_binarySplit"    &%test% unittest
    set "version=O_dipstick_powers"&%test% unittest
) >con
::>"logs\strLen_unittest.log"
::>con


%skip_test% (echo:&echo:&ver&echo:Points of interest :
    set "version=A_loopUnrolled"   &%test% correctness    0_start 8_span
    set "version=J_20101116"       &%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
    set "version=L_dipstick"       &%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
    set "version=N_binarySplit"    &%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
    set "version=O_dipstick_powers"&%test% correctness    0_start 8_span
                                    %test% correctness 1020_start 8_span
                                    %test% correctness 7164_start 8_span
) >con
::>"logs\strLen_correctness_partial.log"
::>con


%do_test% (echo:&echo:&ver&echo:Limits :
    set "version=A_loopUnrolled"   &%test% correctness 1020_start 8_span
    set "version=J_20101116"       &%test% correctness 8183_start 8_span
    set "version=L_dipstick"       &%test% correctness 8183_start 8_span
    set "version=N_binarySplit"    &%test% correctness 8183_start 8_span
    set "version=O_dipstick_powers"&%test% correctness 8183_start 8_span
) >con
::>"logs\strLen_correctness_limits.log"
::>con


:: with all versions, takes 3 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&ver&echo:Comprehensive testing :
    set "version=A_loopUnrolled"   &%test% correctness 0_start 1030_span
    set "version=J_20101116"       &%test% correctness 0_start 8200_span
    set "version=L_dipstick"       &%test% correctness 0_start 8200_span
    set "version=N_binarySplit"    &%test% correctness 0_start 8200_span
    set "version=O_dipstick_powers"&%test% correctness 0_start 8200_span
) >"logs\strLen_correctness_full_test.log"
::>"logs\strLen_correctness_full_test.log"
::>con


:: with all versions, takes 6 good minutes at 1.7 GHz :
%skip_test% (echo:&echo:&echo:&ver&echo:Speed comparisons :>con
    echo:&set "version=A_loopUnrolled"   &%test% speed    0_start  250_span 8_times
    rem "warm up" done.
                                          %test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
    echo:&set "version=J_20101116"       &%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    echo:&set "version=L_dipstick"       &%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    echo:&set "version=N_binarySplit"    &%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    echo:&set "version=O_dipstick_powers"&%test% speed    0_start  250_span 8_times
                                          %test% speed  250_start  250_span 8_times
                                          %test% speed  500_start  250_span 8_times
                                          %test% speed  750_start  250_span 8_times
                                          %test% speed 1000_start 1000_span 2_times
                                          %test% speed 2000_start 1000_span 2_times
                                          %test% speed 7000_start 1000_span 2_times
    rem to check consistency, the first test is repeated :
    echo:&set "version=A_loopUnrolled"   &%test% speed    0_start  250_span 8_times
) >>"logs\strLen_speed_comparisons.log"
::>>"logs\strLen_speed_comparisons.log"
::>con


:: cleaning :
if exist tmp\ rmdir /S /Q tmp
echo:
echo:
echo:strLen tests : completed.
pause
ENDLOCAL
@echo on
@EXIT /b



strLenN_binarySplit.txt

Code:

:strLen
( SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=!%~1!#"
    set "len=0"
    for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
        if "!str:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "str=!str:~%%P!"
        )
    )
)
( ENDLOCAL
    IF "%~2" NEQ "" SET /a %~2=%len%
    EXIT /b
)



strLenO_dipstick_powers.txt

Code:

:strLen
( SETLOCAL ENABLEDELAYEDEXPANSION
    set "str=A!%~1!"
    set "len=0"
    for %%P in (4096 2048 1024 512 256) do (
        if "!str:~%%P,1!" NEQ "" (
            set /a "len+=%%P"
            set "str=!str:~%%P!"
        )
    )
    set str=!str:~1!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000
    set /a "len+=0x!str:~0x1FF,1!!str:~0xFF,1!"
)
( ENDLOCAL
    IF "%~2" NEQ "" SET /a %~2=%len%
    EXIT /b
)



And that's it !

Re: strLen boosted

Posted: 05 Apr 2011 12:35
by plp626
Very nice!

The list
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
........
3333333333333333222222222222222211111111111111110000000000000000
is a bit long. I replaced it with FEDCBA9876543210 and got this code, which is shorter and efficient.


Code:

:strlen
setlocal enabledelayedexpansion
set "$=!%~1!#"
set N=&for %%a in (4096 2048 1024 512 256 128 64 32 16)do if !$:~%%a^,1!. NEQ . set/aN+=%%a&set $=!$:~%%a!
set $=!$!fedcba9876543210&set/aN+=0x!$:~16,1!

endlocal&If %2. neq . (set/a%2=%N%)else echo %N%

Re: strLen boosted

Posted: 20 Feb 2013 03:19
by Queue
After searching for strlen threads, this one seems the most appropriate place to put this. Building on all previous work in this thread (so very little here is my work), here's my take on strlen:

Code:

:strlen
(   setlocal enabledelayedexpansion & set /a "}=0"
    if defined %~1 (
        for %%# in (4096 2048 1024 512 256 128 64 32 16) do (
            if "!%~1:~%%#,1!" neq "" set "%~1=!%~1:~%%#!" & set /a "}+=%%#"
        )
        set "%~1=!%~1!10000000000000000FEDCBA987654321" & set /a "}+=0x!%~1:~16,1!!%~1:~32,1!"
    )
)
endlocal & if "%~2" neq "" set /a "%~2=%}%" & exit /b

I apologize for it still being a little scrunched up but it should be somewhat readable.

In tests on my computer, this came out 1% to 17% faster (for short to long strings) than strLenO_dipstick_powers, which was the previous fastest when I tested everything in this thread. For short strings, the difference may come down to random variation and timing inaccuracy, but there were nice speed savings with massive strings.

Since we're using setlocal anyway, I don't copy the string to a new env var and instead just use the setlocal copy directly, which saves some time; it's as much as ~17% faster for a maximum-length string (on my computer). Since I'm not sticking a safety character onto the front or back of the string, the "dipstick" has to be slightly bulkier to account for a potential 16-character leftover, and I use an if defined to check for a zero-length string.
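
To illustrate why working on the setlocal copy is safe, here is a minimal sketch with a hypothetical variable named text. Changes made after setlocal are discarded by endlocal, so the caller's value survives even though the function overwrites the variable under the same name:

```batch
@echo off
rem Hypothetical demo value standing in for the caller's string.
set "text=hello"
setlocal EnableDelayedExpansion
rem Inside the local scope, chop the variable in place, as the function does.
set "text=!text:~2!"
echo Inside: !text!
endlocal
rem The caller's value is restored once the local scope ends.
echo Outside: %text%
```

This should print "Inside: llo" followed by "Outside: hello".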

I didn't get speed differences between if not a==b and if a neq b so I left it as neq based on precedent set in this thread.

Almost all of the speed improvement came from the lack of the initial string copy, so this could surely be improved upon; coping with the loss of the safety character is primarily what needs to be worked around to improve it.

Edit - Yes, it's obvious that the if defined will behave badly if the call sends an empty first argument. Throw an if "%~1" neq "" before the if defined %~1 for safety if it's a concern. Hm, thinking on it, a pre-setlocal abort if argument 1 is empty or the var isn't defined might be worth it (Edit 3 - It's not worth it; another block of code that has to be parsed separately slows it down too much. Now I see why things turned out the way they did. -_-).

Edit 2 - There are some minor structural flaws in the return that would also choke on bad arguments. Guess this needs more work. Regardless, I think there's merit in not initially copying the string data.

Edit 4 - Ok, maybe this instead:

Code:

:strlen
(   setlocal enabledelayedexpansion & set /a "}=0"
    if "%~1" neq "" if defined %~1 (
        for %%# in (4096 2048 1024 512 256 128 64 32 16) do (
            if "!%~1:~%%#,1!" neq "" set "%~1=!%~1:~%%#!" & set /a "}+=%%#"
        )
        set "%~1=!%~1!0FEDCBA9876543211" & set /a "}+=0x!%~1:~32,1!!%~1:~16,1!"
    )
)
endlocal & set /a "%~2=%}%" & exit /b

Set actually gives us good feedback in the case of an empty %2. It honestly seems like a waste to sanity check it. I'd also like to note that this function doesn't spit out a garbage return (often 4100 in previous implementations) on a failed env var creation (due to oversized input env vars) and can report string lengths up to 8189 (and reports 8189 for any string longer than 8189, at least in the test framework). Oh, and don't pass this function cmdcmdline without first "stabilizing" it (set cmdcmdline=%cmdcmdline%). %cmdcmdline% is actually the reason I started looking at a strlen function.

Queue

Re: strLen boosted

Posted: 20 Feb 2013 17:12
by Liviu
Queue wrote: Regardless, I think there's merit in not initially copying the string data [...] and can list string length up to 8189 (and reports 8189 for any string longer than 8189, at least in the test framework)

That's good thinking, and the accurate count all the way up is a plus, too.

Since this is a subject that never seems to grow old ;-) here are 2 more cents on it. I believe most algorithms are covered between this thread and Dave's roundup at http://ss64.org/viewtopic.php?pid=6478#p6478. However, there is one less obvious difference between them that is, IMHO, worth noting in the context of raw performance.

A couple of those algorithms can easily be made to work without temporary variables. This is significant because it means that, when called from code that already has enableDelayedExpansion, the function does not in fact need its own setlocal block. While it's normally assumed (and true) that the penalty of a nested setlocal is minimal, it is still an overhead, and it can become measurable with large environments, as discussed for example at http://www.dostips.com/forum/viewtopic.php?f=3&t=2597.

When enableDelayedExpansion can be pre-assumed, these variations on the known themes could be competitive.

Code:

:strlen4.edx  StrVar  RtnVar
set /a "%~2=%random%"
echo(!%~1!>"%temp%\strlen!%~2!.tmp"
for %%F in ("%temp%\strlen!%~2!.tmp") do (
  del "%temp%\strlen!%~2!.tmp"
  set /a "%~2 = %%~zF - 2"
)
exit /b

:strlen1.edx  StrVar  RtnVar
@rem use 'if "!%~1!" == ""' if StrVar might contain spaces
set /a "%~2 = 0" & if not defined %~1 exit /b
for %%A in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
  set /a "%~2 |= %%A"
  for %%B in (!%~2!) do if "!%~1:~%%B,1!" == "" set /a "%~2 &= ~%%A"
)
set /a "%~2 += 1" & exit /b

Liviu

Re: strLen boosted

Posted: 20 Feb 2013 21:08
by Queue
Wow, if I point one of the temp file variants at my RAM drive, it is hilariously fast; I don't think any of the in-batch processing variants can beat that.

Very good point though: when we can expect delayedexpansion to be enabled, a function that works on the string directly, non-destructively, gets a big leg up. The binary search should give consistent results regardless of how polluted the env var space is. The temp file variant should as well, but only consistent to a particular drive (and to some tiny extent the file system); I'm sure that's one of the common arguments against it, though not much different than the performance differences between processors for the temp-file-less variants. They both were certainly quick when I tested them; my temp folder isn't on the speediest drive, so it was a bit slower than any of the other fast variants, but not terrible.

Thanks for the ss64 link. This is fun stuff.

Edit - Why did it become the norm to set a var with the resultant length? It seems like all of the functions use exit /b, so why not just return the length as the errorlevel?

Re: strLen boosted

Posted: 20 Feb 2013 23:10
by Liviu
Queue wrote: Why did it become the norm to set a var with the resultant length? It seems like all of the functions use exit /b, so why not just return the length as the errorlevel?

My guess is convenience. Unless the %errorlevel% were to be used right away, it would then take one extra step in the caller code to save it into a persistent variable.
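
As a rough sketch of what that extra step looks like, here is a hypothetical variant named :strLenErr (not one of the functions posted in this thread) that reuses the binary split loop and hands the length back as the errorlevel. It assumes the caller already has delayed expansion enabled:

```batch
@echo off
setlocal EnableDelayedExpansion
rem Hypothetical demo value.
set "text=hello world"
call :strLenErr text
rem The extra step: save the errorlevel before another command overwrites it.
set "len=%errorlevel%"
echo Length: %len%
exit /b 0

:strLenErr  StrVar  -- hypothetical variant, returns the length as the errorlevel
setlocal
set "s=!%~1!#"
set "n=0"
for %%P in (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) do (
    if "!s:~%%P,1!" NEQ "" (
        set /a "n+=%%P"
        set "s=!s:~%%P!"
    )
)
rem %n% is expanded when this line is parsed, before endlocal discards it.
endlocal & exit /b %n%
```

With the sample value this should print "Length: 11". The function body is shorter, but every caller that wants to keep the result pays for the additional set.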

Also, there is the convention to use %errorlevel% as a success/fail indicator. A fully error-checked :strlen4.edx could be written as...

Code:

:strlen4.edx  StrVar  RtnVar  --  be sure to check if the returned errorlevel is 0
if "!" neq "" exit /b 1
set /a "%~2=%random%" || exit /b 2
echo(!%~1!>"%temp%\strlen!%~2!.tmp"|| exit /b 3
for %%F in ("%temp%\strlen!%~2!.tmp") do (
  del "%temp%\strlen!%~2!.tmp"
  set /a "%~2 = %%~zF - 2" && exit /b 0
)
exit /b 4

P.S. Or, just as defensive programming against some doofus having "set errorlevel=123" before "call" ;-)
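
A minimal sketch of that pitfall, assuming nothing beyond plain cmd.exe behaviour: once a variable named errorlevel has been set, %errorlevel% expands to that variable instead of the real exit code, while the "if errorlevel" test still sees the real one.

```batch
@echo off
setlocal
rem A user-defined variable now shadows the dynamic errorlevel value.
set errorlevel=123
rem Run a child command that exits with code 0.
cmd /c exit 0
echo Variable value: %errorlevel%
if errorlevel 1 (echo Real exit code is non-zero) else (echo Real exit code is 0)
endlocal
```

This should print "Variable value: 123" and then "Real exit code is 0", which is why a length returned through %errorlevel% could be silently wrong in such a session.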