strLen boosted

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

aGerman
Expert
Posts: 4748
Joined: 22 Jan 2010 18:01
Location: Germany

Re: strLen boosted

#76 Post by aGerman » 13 Aug 2025 12:20

This is so impressive :!:

BTW Was this known to work?

Code: Select all

if not defined %%1 (if %%? lss ' endlocal)^&(set /a %%2=0) else^
This would have been my expectation:

Code: Select all

if not defined %%1 ((if %%? lss ' endlocal)^&set /a %%2=0) else^

pieh-ejdsch
Posts: 259
Joined: 04 Mar 2014 11:14
Location: germany

Re: strLen boosted

#77 Post by pieh-ejdsch » 13 Aug 2025 15:12

I saw that in Francesco's code.
Then I tried it out.

IF (a TRUE condition) & some more code that should also be fulfilled if TRUE & then something else that should be done if TRUE & (then the last little thing for TRUE) ELSE now the code that comes up if FALSE & something else that comes up if FALSE

I didn't want to flip the logic around,
but with

Code: Select all

(if a==a call) && (if b==a call) && ... 
I could use the IF condition at the end without the entire bracket and continue with a simple &.
So the same bracket logic as with

Code: Select all

(if ... (..)&..&(..) else ..)
.
Since removing the IF DEFINED always slowed down the test code,
even when a bracket was placed in front of the IF,
I tried this (call)&& chaining.
Every attempt took longer to run.
I also had a (call) after the last FOR, but that made the chaining slower.
So the ELSE had to be moved to the front to make it run better.

#78 Post by aGerman » 13 Aug 2025 16:19

I'm still trying to address other given feedback without performance impact.

Code: Select all

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:initStrLen
:: Computes the number of resulting UTF-16 code units in a string.
:: %strLen% str [len]
::   str - [ByRef In] Name of the variable containing the string to be measured.
::   len - [ByRef Out, Optional] Name of the variable that receives the measured
::         length. If omitted, the result is assigned to variable len.
::   Variable names must be passed unquoted.
:: Strings of up to 8191 characters are supported.
      %== ! -> exclamation mark, # -> caret ==%     FOR /F "TOKENS=1-3" %%! IN (
                                                        "! ! ^ ^^^! . ^!=^!^^^^"
                                         ) DO FOR %%H IN (FEDCBA9876543210) DO ^
set strlen=^
%==% for /f %%? in ("%%! '") do for %%. in (1 2) do if %%.==2 (^
%=  =% for /f "tokens=1,2" %%1 in ("%%!$args%%! len") do^
%=  =% if not defined %%1 (if %%? lss ' endlocal)^&(set /a %%2=0) else^
%=  =% (if : neq :%%!%%1:~4095%%! (set $=1%%!%%1:~4096%%!)^
%=   =% else set $=0%%!%%1%%!)^&^
%=  =% set $Scale=^
%=   =%%%!$:~256%%#,1%%!%%!$:~512%%#,1%%!%%!$:~768%%#,1%%!%%!$:~1024%%#,1%%!^
%=   =%%%!$:~1280%%#,1%%!%%!$:~1536%%#,1%%!%%!$:~1792%%#,1%%!%%!$:~2048%%#,1%%!^
%=   =%%%!$:~2304%%#,1%%!%%!$:~2560%%#,1%%!%%!$:~2816%%#,1%%!%%!$:~3072%%#,1%%!^
%=   =%%%!$:~3328%%#,1%%!%%!$:~3584%%#,1%%!%%!$:~3840%%#,1%%!%%H^&^
%=  =% for %%_ in (%%!$Scale:~15%%#,1%%!) do set $=%%!$:~%%#,1%%!^
%=   =%%%!$:~0x%%_00%%!%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H^
%=   =%FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
%=   =%BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
%=   =%7777777777777777666666666666666655555555555555554444444444444444^
%=   =%3333333333333333222222222222222211111111111111110000000000000000^&^
%=  =% for %%- in (%%!$:~%%#,1%%!%%_%%!$:~513%%#,1%%!%%!$:~257%%#,1%%!) do^
%=  =% (if %%? lss ' endlocal)^&set /a %%2=0x%%-^
%==% ) else (if %%? gtr ' setlocal enabledelayedexpansion)^&set $args=
goto :eof
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Four-digit hex value evaluated, like 0xWXYZ with ...
W - !$:~,1! = multiples of 4096 [0,1]
X - %%_ = !$Scale:~15,1! = multiples of 256 [0..F]
Y - !$:~513,1! = multiples of 16 [0..F]
Z - !$:~257,1! = multiples of 1 [0..F] ([1..F] if WXY is 000, since empty strings are excluded early)
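
The digit lookup can be cross-checked outside of cmd.exe. What follows is a minimal Python sketch of one reading of the macro above, not a port of it. The name strlen_model and all variable names are invented for illustration, slicing stands in for !var:~N,1!, and only strings of up to 8191 characters are assumed.

```python
HEX_DESC = "FEDCBA9876543210"

def strlen_model(s):
    """Model of the 0xWXYZ lookup; assumes len(s) <= 8191."""
    if not s:                                   # the 'if not defined' branch
        return 0
    # W: is there a character at offset 4095? If so, drop the first 4096.
    buf = ("1" + s[4096:]) if s[4095:] else ("0" + s)
    # X: probe every 256th offset; each hit shifts the hex ruler by one,
    # so position 15 of the scale holds the number of hits as a hex digit.
    scale = "".join(buf[i:i + 1] for i in range(256, 4096, 256)) + HEX_DESC
    x = scale[15]
    # Keep W, cut off x*256 characters, then append two 256-character rulers.
    ruler_z = HEX_DESC * 16                      # FEDC...0 repeated 16 times
    ruler_y = "".join(d * 16 for d in HEX_DESC)  # 16 F's, 16 E's, ... 16 0's
    buf = buf[0] + buf[int(x, 16) * 256:] + ruler_z + ruler_y
    # Y and Z are read from fixed positions inside the rulers.
    return int(buf[0] + x + buf[513] + buf[257], 16)

print(strlen_model("x" * 300))
```

The remainder below 256 pushes the rulers to the right by exactly that many characters, which is why the fixed offsets 257 and 513 land on the matching hex digits.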

#79 Post by pieh-ejdsch » 16 Aug 2025 14:42

An if-else statement for measuring the length was replaced with a FOR loop,
resulting in a tiny improvement of 3 to 4 percent.
Other versions are also available in the test code in #65.
Since these are only bloated and do not achieve any performance gains,
I did not use them.

Code: Select all

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:initStrLenAll
:: Computes the number of UTF-16 code units in a string.
:: %strLen% str len
::   str - [ByRef In] Name of the variable containing the string to be measured.
::   len - [ByRef Out] Name of the variable that receives the measured length.
:: Strings of up to 8191 characters are supported.
    %== ! -> exclamation mark, # -> caret ==%     @FOR /F "tokens=1-3" %%! IN (
                                              "! ! ^ ^^^! . ^^^^") DO @^
set strLen=@^
for /f %%? in ("%%! '") do @for %%. in (1 2) do @if %%.==2 ^
 for /f tokens%%#=1-2 %%1 in ("%%!$args%%! len") do^
 @if not defined %%1 (if %%? lss ' endlocal)^&(set /a %%2=0) else^
 for /f tokens%%#=4delims%%#=%%#  %%4 in^
 ("x %%!%%1:~4095,1%%! x%%!%%1:~4095,1%%!x 1 0") do @set $=A%%!%%1:~0x%%4000%%!^&^
 set $Scale=^
%%!$:~256%%#,1%%!%%!$:~512%%#,1%%!%%!$:~768%%#,1%%!%%!$:~1024%%#,1%%!%%!$:~1280%%#,1%%!^
%%!$:~1536%%#,1%%!%%!$:~1792%%#,1%%!%%!$:~2048%%#,1%%!%%!$:~2304%%#,1%%!%%!$:~2560%%#,1%%!^
%%!$:~2816%%#,1%%!%%!$:~3072%%#,1%%!%%!$:~3328%%#,1%%!%%!$:~3584%%#,1%%!%%!$:~3840%%#,1%%!^
FEDCBA9876543210^&^
 for %%3 in (%%!$Scale:~15%%#,1%%!) do @set $=%%!$:~0x%%300%%!^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210FEDCBA9876543210^
FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
7777777777777777666666666666666655555555555555554444444444444444^
3333333333333333222222222222222211111111111111110000000000000000^&^
 for %%$ in (%%4%%3%%!$:~512%%#,1%%!%%!$:~256%%#,1%%!) do^
 @(if %%? lss ' endlocal)^&(set /A %%2=0x%%$^
 ) else (if %%? GTR ' setlocal enabledelayedexpansion)^&set $args=
@goto :eof

#80 Post by aGerman » 17 Aug 2025 04:04

Hmm, the new code performs slightly worse in my tests. Expanding !%%1:~4095,1! twice might be the reason, although it's a clever way to handle a space at this position.
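
Why the duplicated expansion copes with a space can be seen from how FOR /F splits the helper string. Here is a small Python sketch of that. The helper names forf_tokens and block_digit are invented for this example, and the tokenizer only models the collapsing of delimiter runs, not the eol or quoting rules of FOR /F.

```python
def forf_tokens(line, delim=" "):
    """Rough model of FOR /F splitting: runs of the delimiter collapse."""
    return [tok for tok in line.split(delim) if tok]

def block_digit(ch):
    """ch models !str:~4095,1!: one character, or '' if the string is shorter."""
    line = "x " + ch + " x" + ch + "x 1 0"
    return forf_tokens(line)[3]              # tokens=4

for ch in ("", "a", " "):
    print(repr(ch), "->", block_digit(ch))
```

With no character at offset 4095 the fourth token is 0. With any character, including a space, the x...x copy adds one more token and the fourth token becomes 1. In the macro this token is spliced into 0x%%4000 to choose the offset.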

#81 Post by pieh-ejdsch » 18 Aug 2025 13:16

A version that uses octal numbers.
It's really easy to work with because fewer FOR variables are needed: four instead of eight.
This also made the code easier to write (don't think it was easy for me...).
So the script is shorter too.
The shift in the first FOR loop for using the four different variables is, of course, optimal.
IcarusLives uses something like this all the time; I just hadn't quite understood how it works.
The two FOR loops save one variable creation, and only ten variable queries are made instead of seventeen.

Code: Select all

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:initStrLen2v
:: Computes the number of UTF-16 code units in a string.
:: %strLen% str len
::   str - [ByRef In] Name of the variable containing the string to be measured.
::   len - [ByRef Out] Name of the variable that receives the measured length.
:: Strings of up to 8191 characters are supported.
FOR /F %%! IN  ("! ! ^ ^^^^^^^^^^^!") DO set i=%%!%%1:~03777,1%%!^
 x%%!%%1:~03777,1%%!x&set o=%%!%%1:~0%%4777,1%%! x%%!%%1:~0%%4777,1%%!x^
 %%!%%1:~0%%4377,1%%! x%%!%%1:~0%%4377,1%%!x
set F=FEDCBA9876543210&set $=0000000000000000
   %== ! -> exclamation mark, # -> caret ==%    @FOR /F "tokens=1-3" %%! IN (
                                         "! ! ^ ^^^! . ^^^^") DO ^
set strLen=^
for /f %%? in ("%%! '") do for %%. in (1 2) do if %%.==2 ^
 for /f tokens%%#=1-2 %%1 in ("%%!$args%%! len") do^
 if not defined %%1 (if %%? lss ' endlocal)^&(set /A %%2=0) else^
 for /f "tokens=7,11,15,19delims= " %%4 in (^"^
 %i% %i:3=7% %i:3=13% 17 13 7 3  16 12 6 2  15 11 5 1  14 10 4 0^") do^
 for /f tokens%%#=15delims%%#=%%#  %%3 in (^" %o:4=7% %o:4=6% %o:4=5% %o:*x =%^
 %%44 %%40 %%54 %%50 %%64 %%60 %%74 %%70^") do^
 set $=%%!%%1:~0%%300%%!%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%%F%^
%$:0=F%%$:0=E%%$:0=D%%$:0=C%%$:0=B%%$:0=A%%$:0=9%%$:0=8%^
%$:0=7%%$:0=6%%$:0=5%%$:0=4%%$:0=3%%$:0=2%%$:0=1%%$%^&^
 for %%$ in (0%%300+0x0%%!$:~511%%#,1%%!%%!$:~255%%#,1%%!) do^
 (if %%? lss ' endlocal)^&(set /A %%2=%%$^
 ) else (if %%? GTR ' setlocal enabledelayedexpansion)^&set $args=
for %%i in (i o F $) do set "%%i="
@goto :eof
Test data output

Code: Select all

Check length str1 gtr 8190 X

~~~~~~~~~~~~~~~~~~~~
"2v           TEST"
Delayed ON
Functional test 0 ... 8189 + 8191 with Delayedexpansion ON (create macro)
04.93
04.48
04.48
Delayed OFF
Functional test 0 ... 8189 + 8191 with Delayedexpansion OFF (create macro)
04.50
04.50
04.50

~~~~~~~~~~~~~~~~~~~~
"2w           TEST"
Delayed ON
Functional test 0 ... 8189 + 8191 with Delayedexpansion ON (create macro)
05.05
05.05
05.05
Delayed OFF
Functional test 0 ... 8189 + 8191 with Delayedexpansion OFF (create macro)
05.03
05.04
05.03

~~~~~~~~~~~~~~~~~~~~
"2x           TEST"
Delayed ON
Functional test 0 ... 8189 + 8191 with Delayedexpansion ON (create macro)
05.25
05.25
05.27
Delayed OFF
Functional test 0 ... 8189 + 8191 with Delayedexpansion OFF (create macro)
05.25
05.23
05.24

~~~~~~~~~~~~~~~~~~~~
"2y           TEST"
Delayed ON
Functional test 0 ... 8189 + 8191 with Delayedexpansion ON (create macro)
05.50
05.49
05.50
Delayed OFF
Functional test 0 ... 8189 + 8191 with Delayedexpansion OFF (create macro)
05.39
05.42
05.42

~~~~~~~~~~~~~~~~~~~~
"All      Batch and CMDline"
Delayed ON
Functional test 0 ... 8189 + 8191 with Delayedexpansion ON (create macro)
04.55
04.55
04.56
Delayed OFF
Functional test 0 ... 8189 + 8191 with Delayedexpansion OFF (create macro)
04.56
04.56
04.58
done
Delayed OFF
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
I  x8400 I   all  I   2v   I   2w   I   2x   I   2y   I   2a   I   2b   I   all  I
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
I  8191  I 02.92  I 01.99  I 02.02  I 02.00  I 02.08  I 02.14  I 02.19  I 02.01  I
I  0     I 02.95  I 02.00  I 02.04  I 02.07  I 02.07  I 02.16  I 02.19  I 02.01  I
I  6000  I 03.10  I 02.11  I 02.18  I 02.17  I 02.20  I 02.27  I 02.31  I 02.12  I
I  4000  I 03.04  I 02.06  I 02.13  I 02.14  I 02.17  I 02.24  I 02.25  I 02.09  I
I  4500  I 03.08  I 02.11  I 02.13  I 02.14  I 02.17  I 02.23  I 02.28  I 02.11  I
I  1000  I 02.99  I 02.03  I 02.06  I 02.07  I 02.09  I 02.17  I 02.20  I 02.04  I
I  600   I 02.99  I 02.02  I 02.08  I 02.06  I 02.09  I 02.17  I 02.21  I 02.05  I
I  200   I 02.98  I 02.03  I 02.07  I 02.06  I 02.09  I 02.14  I 02.18  I 02.03  I
I  10    I 02.95  I 02.00  I 02.07  I 02.04  I 02.10  I 02.15  I 02.19  I 02.05  I
I  1     I 02.97  I 02.00  I 02.05  I 02.04  I 02.08  I 02.16  I 02.17  I 02.01  I
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
I  x8400 I   all  I   2v   I   2w   I   2x   I   2y   I   2a   I   2b   I   all  I
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
I  5412  I 03.10  I 02.10  I 02.15  I 02.14  I 02.21  I 02.26  I 02.28  I 02.10  I
I  8026  I 03.17  I 02.17  I 02.21  I 02.22  I 02.30  I 02.28  I 02.33  I 02.16  I
I  2194  I 03.01  I 02.03  I 02.10  I 02.10  I 02.13  I 02.19  I 02.22  I 02.04  I
I  4229  I 03.08  I 02.09  I 02.13  I 02.13  I 02.17  I 02.22  I 02.27  I 02.10  I
I  7246  I 03.13  I 02.15  I 02.17  I 02.20  I 02.21  I 02.30  I 02.33  I 02.16  I
I  690   I 02.99  I 02.03  I 02.06  I 02.06  I 02.11  I 02.14  I 02.20  I 02.03  I
I  5178  I 03.09  I 02.09  I 02.14  I 02.15  I 02.19  I 02.24  I 02.28  I 02.09  I
I  3009  I 03.02  I 02.07  I 02.09  I 02.08  I 02.13  I 02.21  I 02.25  I 02.08  I
I  4674  I 03.08  I 02.09  I 02.14  I 02.12  I 02.16  I 02.24  I 02.27  I 02.07  I
I  800   I 02.97  I 02.02  I 02.05  I 02.03  I 02.10  I 02.17  I 02.22  I 02.03  I
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
I  x8400 I   all  I   2v   I   2w   I   2x   I   2y   I   2a   I   2b   I   all  I
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
I  59    I 02.97  I 02.00  I 02.04  I 02.05  I 02.08  I 02.16  I 02.20  I 02.02  I
I  774   I 02.97  I 02.03  I 02.05  I 02.05  I 02.08  I 02.16  I 02.20  I 02.00  I
I  4284  I 03.06  I 02.08  I 02.14  I 02.14  I 02.16  I 02.24  I 02.26  I 02.08  I
I  6160  I 03.09  I 02.09  I 02.15  I 02.16  I 02.21  I 02.26  I 02.31  I 02.13  I
I  6320  I 03.11  I 02.13  I 02.16  I 02.17  I 02.20  I 02.26  I 02.30  I 02.11  I
I  6950  I 03.11  I 02.16  I 02.15  I 02.16  I 02.20  I 02.25  I 02.29  I 02.13  I
I  3621  I 03.03  I 02.11  I 02.13  I 02.11  I 02.16  I 02.20  I 02.21  I 02.08  I
I  4032  I 03.06  I 02.06  I 02.11  I 02.12  I 02.18  I 02.22  I 02.28  I 02.09  I
I  7713  I 03.15  I 02.14  I 02.19  I 02.18  I 02.23  I 02.25  I 02.29  I 02.14  I
I  4247  I 03.04  I 02.08  I 02.12  I 02.11  I 02.16  I 02.24  I 02.25  I 02.06  I
+--------+--------+--------+--------+--------+--------+--------+--------+--------+
Press any key to continue . . .
It is, of course, astonishing that the functional test runs in less time than with the other versions.
That alone should say enough about the performance.

#82 Post by aGerman » 19 Aug 2025 10:12

This is now unbelievably fast. Kudos!
However, I admit that I immediately changed the code to get rid of the variables i and o, as they seemed like code obfuscation :lol:

Talking of obfuscation - is there any good reason to prefer ...

Code: Select all

for /f tokens^=15delims^=^  %%3 ...
... to the more readable ...

Code: Select all

for /f "tokens=15 delims= " %%3 ...
I didn't notice any performance degradation.

#83 Post by aGerman » 20 Aug 2025 15:30

pieh-ejdsch
What if we claim (or define) that control characters are not part of a string? Would that be reasonable?

Code: Select all

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:initStrLen
:: Computes the number of resulting UTF-16 code units in a string.
:: %strLen% str [len]
::   str - [ByRef In] Name of the variable containing the string to be measured.
::   len - [ByRef Out, Optional] Name of the variable that receives the measured
::         length. If omitted, the result is assigned to variable len.
::   Variable names must be passed unquoted.
:: Strings of up to 8191 characters are supported.
      %== ! -> exclamation mark, # -> caret ==%     FOR /F "TOKENS=1-3" %%! IN (
                                                        "! ! ^ ^^^! . ^!=^!^^^^"
      %== _ -> backspace (used as delimiter) ==%            ) DO FOR /F %%_ IN (
                                             '"PROMPT $H&FOR %%B IN (1) DO REM"'
                                         ) DO FOR %%H IN (FEDCBA9876543210) DO ^
set strLen=^
%==% for /f %%? in ("%%! '") do for %%. in (1 2) do if %%.==2^
%=  =% for /f "tokens=1,2" %%1 in ("%%!$args%%! len") do^
%=    =% if not defined %%1 ((if %%? lss ' endlocal)^&set /a %%2=0)^
%=    =% else for /f "tokens=4,8,12,16 delims=%%_" %%4 in (^"^
%=     =%%%!%%1:~2047,1%%!%%_%%!%%1:~4095,1%%!%%_%%!%%1:~6143,1%%!%%_^
%=     =%17%%_13%%_7%%_3%%_16%%_12%%_6%%_2%%_^
%=     =%15%%_11%%_5%%_1%%_14%%_10%%_4%%_0^"^
%=    =% ) do for /f "tokens=8 delims=%%_" %%3 in (^"^
%=     =%%%!%%1:~0%%7777,1%%!%%_%%!%%1:~0%%7377,1%%!%%_^
%=     =%%%!%%1:~0%%6777,1%%!%%_%%!%%1:~0%%6377,1%%!%%_^
%=     =%%%!%%1:~0%%5777,1%%!%%_%%!%%1:~0%%5377,1%%!%%_%%!%%1:~0%%4377,1%%!%%_^
%=     =%%%44%%_%%40%%_%%54%%_%%50%%_%%64%%_%%60%%_%%74%%_%%70^"^
%=    =% ) do set $=%%!%%1:~0%%300%%!^
%=     =%%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H^
%=     =%FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
%=     =%BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
%=     =%7777777777777777666666666666666655555555555555554444444444444444^
%=     =%3333333333333333222222222222222211111111111111110000000000000000^&^
%=    =% for %%- in (0%%300+0x0%%!$:~511%%#,1%%!%%!$:~255%%#,1%%!) do^
%=      =% ((if %%? lss ' endlocal)^&set /a %%2=%%-)^
%==% else (if %%? gtr ' setlocal enabledelayedexpansion)^&set $args=
goto :eof
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
This is essentially your latest code, just with another arrangement. The main difference is that the x!var:~N,1!x workaround is removed. Instead I used the backspace character as delimiter because it should never appear in a string and it is easy to generate. So, we halve the number of substring operations in this area from 20 to 10.
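
The two-stage token selection can be traced in a few lines of Python. This is only a sketch of one reading of the macro, under stated assumptions: the names forf_tokens and strlen_octal_model are invented here, the tokenizer models nothing but the collapsing of delimiter runs, the string is at most 8191 characters long, and it contains no backspace.

```python
BS = "\b"                          # backspace, the delimiter used by the macro
HEX_DESC = "FEDCBA9876543210"

def forf_tokens(line):
    """Rough FOR /F model: split on BS, runs of delimiters collapse."""
    return [tok for tok in line.split(BS) if tok]

def strlen_octal_model(s):
    """Assumes len(s) <= 8191 and no BS inside s."""
    if not s:
        return 0
    at = lambda i: s[i:i + 1]      # stands in for !str:~i,1!
    # Loop 1: three probes pick the 2048-character block. Every probe that
    # hits shifts tokens 4, 8, 12 and 16 one column further into the table.
    table = "17 13 7 3 16 12 6 2 15 11 5 1 14 10 4 0".split()
    tok = forf_tokens(BS.join([at(2047), at(4095), at(6143)] + table))
    p4, p5, p6, p7 = tok[3], tok[7], tok[11], tok[15]
    # Loop 2: seven probes inside that block, at offsets written in octal.
    probes = [at(int("0" + p + tail, 8)) for p, tail in
              [(p7, "777"), (p7, "377"), (p6, "777"), (p6, "377"),
               (p5, "777"), (p5, "377"), (p4, "377")]]
    cands = [p4 + "4", p4 + "0", p5 + "4", p5 + "0",
             p6 + "4", p6 + "0", p7 + "4", p7 + "0"]
    p3 = forf_tokens(BS.join(probes + cands))[7]       # tokens=8
    base = int("0" + p3 + "00", 8)                     # a multiple of 256
    # The remainder 0..255 is read from two 256-character hex rulers.
    buf = s[base:] + HEX_DESC * 16 + "".join(d * 16 for d in HEX_DESC)
    return base + int(buf[511] + buf[255], 16)

print(strlen_octal_model("x" * 2348))
```

Because only the backspace is a delimiter, a space at a probed offset counts as an ordinary token here, which is what makes the x!var:~N,1!x workaround unnecessary.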

Steffen

jeb
Expert
Posts: 1064
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

Re: strLen boosted

#84 Post by jeb » 22 Aug 2025 03:34

Hi,
aGerman wrote:
20 Aug 2025 15:30
What if we claim (or define) that control characters are not part of a string? Would that be reasonable?
IMHO a bad idea, a function should do what it claims.
And strlen claims to get the length of a string.
Else it should be named strlenButNotControlCharacters.
And if a strlen function can't be used for some characters you need a second, more capable one.

jeb

#85 Post by aGerman » 22 Aug 2025 11:11

I absolutely get your point, jeb. And if you had asked me a decade ago, my answer would have been pretty much the same as yours.
However, it's a fact that we can't meet the expectation of a user who only relies on what the macro name suggests. That is not only because we are technically unable to do so, but also because it depends on the actual expectations of the users, which we can't even guess (I'll explain later what I mean by that). We have to make a contract with the user about what kind of input we need and what output we guarantee in return. And the place to do that is the macro description.

In more detail:
As long as we measure the "length" of the content of an environment variable, we are limited to both what we are able to store in the environment and how it is stored in process memory. It should be quite clear that we can't measure binary data. That's because binary data could contain NUL characters and binary data needs to be measured as number of bytes, while environment variables are stored UTF-16 encoded where a code unit has a size of 16 bits. I wrote about that earlier.
So, let's just take an example to investigate what the macro is suitable for.

Code: Select all

@echo off
setlocal EnableDelayedExpansion
>nul chcp 65001
call :initStrLen

set "str=🙋🏻‍♂️"
echo %str%
%strLen% str len
echo length: %len%

pause
exit /b

:: macro initialization here ...
(The script is saved UTF-8 encoded w/o BOM.)
I didn't take an emoji because I expect to have emojis in a string (as I also don't expect to have control characters in a string). I took it because it demonstrates pretty well what the macro actually measures, no matter if it contains emojis or not.
That's the output if I run the code:
[Screenshot 2025-08-22 174638.png: console output of the script]

The expectation of a user might be to get a length of 1 because it's only one glyph (grapheme cluster).
The expectation of a user might be to get a length of 2 because it has a width of two columns.
The expectation of a user might be to get a length of 5 because the emoji consists of 5 codepoints.
The expectation of a user might be to get a length of 17 because it takes 17 bytes (UTF-8 code units) in the script code.
None of those expectations can be met :!:
And that's the reason why I already updated the macro description to clearly state that we only guarantee to measure the number of UTF-16 code units a string consists of, after it was converted and stored into the process environment:
0xD83D 0xDE4B 0xD83C 0xDFFB 0x200D 0x2642 0xFE0F 0x0000 (we never count the null termination, that's why the result is 7).
So, what we are doing with the strLen macro is close to the behavior of the wcslen() C function (where wcs stands for "wide character string"). However, not even this function name can live up to what it pretends to promise :wink:
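
Three of the counts listed above can be reproduced independently of cmd.exe. This is a short Python sketch in which the function name utf16_units is made up for the example and the string is the emoji sequence from the script, written with escapes. Grapheme clusters and display width are left out because the Python standard library has no function for either.

```python
import struct

def utf16_units(text):
    """Return the UTF-16 code units of text, without a terminating NUL."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))

emoji = "\U0001F64B\U0001F3FB\u200D\u2642\uFE0F"
units = utf16_units(emoji)

print("code points:      ", len(emoji))
print("UTF-8 bytes:      ", len(emoji.encode("utf-8")))
print("UTF-16 code units:", len(units))
print(" ".join("0x%04X" % u for u in units))
```

The two code points outside the Basic Multilingual Plane each become a surrogate pair, so five code points turn into seven code units.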

So, what is the outcome of this essay?
We can't use the macro for binary data and, thus, there is no point for supporting control characters.
We can't use the macro for characters that exceed the range of code points contained by the Basic Multilingual Plane because there is little to no chance that a user is interested in the number of UTF-16 code units if a string contains surrogate pairs.
We had better make a contract with the user stating that we only support what we would consider plain text.
(And I'm not even talking about the displayed width as this is a completely different topic that we can't even evaluate in pure Batch. The tool that I offered for that can be found over there: https://github.com/german-one/wtswidth- ... ses/latest.)


Bonus chatter:
It's absurd what we need to do just to perform such a simple operation. What happens in the implementation if something like !var:~N,1! is evaluated? We can rely on getting an empty result if N exceeds the string length. It never reads beyond the end of the string into foreign memory. This can only be ensured if the length of var is known beforehand. Think about how many times the string is measured behind the scenes when our code is executed. And think about how cheap it would be if CMD supported something like SET /L or some other internal command to measure the length, either using a tight loop or by calling out to wcslen() ...

Steffen

#86 Post by pieh-ejdsch » 22 Aug 2025 14:57

And I thought our general default test string for the batch is:

Code: Select all

/? !;:"(!^^ %%)@&%1^"*,><+
@Steffen I got wrong lengths in your code. Using bs in a delayed variable worked differently, but I had two characters too many. Did your code give you a correct result?

#87 Post by aGerman » 22 Aug 2025 16:00

our general default test string
Huh? Did we really define that? :lol:
However, I actually still have something like that in my test case.

Code: Select all

:: make sure we are in the spotlight:
@if "%~1"=="" start /realtime conhost "%~f0" 1&exit /b
@echo off &setlocal DisableDelayedExpansion

:: 33
set "str0=&<>|^-+*/~!%%=?.,:;#$'`\][}{)(_@ ""

:: 8191
::16   32   64  128  256  512 1024 2048 4096 carets and one "test character"
call call call call call call call call call set "str1=^^^^^^^^x"
set "str1=%str1:~1%"
:: replace the carets with two "test characters" each
set str1=^
%str1:^=xx%

:: 7000
set "str2=%str1:~,7000%"
:: 4096
set "str3=%str1:~,4096%"
:: 4095
set "str4=%str1:~,4095%"
:: 1000
set "str5=%str1:~,1000%"
:: 256
set "str6=%str1:~,256%"
:: 255
set "str7=%str1:~,255%"
:: 100
set "str8=%str1:~,100%"
:: 10
set "str9=%str1:~,10%"
:: 1
set "str10=%str1:~,1%"
:: 0
set "str11="

call :initTimediff

:: comment this or leave it ...
setlocal EnableDelayedExpansion

echo #83
call :initStrLen
call :test str0
call :test str1
call :test str2
call :test str3
call :test str4
call :test str5
call :test str6
call :test str7
call :test str8
call :test str9
call :test str10
call :test str11

pause
exit /b

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:initStrLen
:: Computes the number of resulting UTF-16 code units in a string.
:: %strLen% str [len]
::   str - [ByRef In] Name of the variable containing the string to be measured.
::   len - [ByRef Out, Optional] Name of the variable that receives the measured
::         length. If omitted, the result is assigned to variable len.
::   Variable names must be passed unquoted.
:: Strings of up to 8191 characters are supported.
      %== ! -> exclamation mark, # -> caret ==%     FOR /F "TOKENS=1-3" %%! IN (
                                                        "! ! ^ ^^^! . ^!=^!^^^^"
      %== _ -> backspace (used as delimiter) ==%            ) DO FOR /F %%_ IN (
                                             '"PROMPT $H&FOR %%B IN (1) DO REM"'
                                         ) DO FOR %%H IN (FEDCBA9876543210) DO ^
set strLen=^
%==% for /f %%? in ("%%! '") do for %%. in (1 2) do if %%.==2^
%=  =% for /f "tokens=1,2" %%1 in ("%%!$args%%! len") do^
%=    =% if not defined %%1 ((if %%? lss ' endlocal)^&set /a %%2=0)^
%=    =% else for /f "tokens=4,8,12,16 delims=%%_" %%4 in (^"^
%=     =%%%!%%1:~2047,1%%!%%_%%!%%1:~4095,1%%!%%_%%!%%1:~6143,1%%!%%_^
%=     =%17%%_13%%_7%%_3%%_16%%_12%%_6%%_2%%_^
%=     =%15%%_11%%_5%%_1%%_14%%_10%%_4%%_0^"^
%=    =% ) do for /f "tokens=8 delims=%%_" %%3 in (^"^
%=     =%%%!%%1:~0%%7777,1%%!%%_%%!%%1:~0%%7377,1%%!%%_^
%=     =%%%!%%1:~0%%6777,1%%!%%_%%!%%1:~0%%6377,1%%!%%_^
%=     =%%%!%%1:~0%%5777,1%%!%%_%%!%%1:~0%%5377,1%%!%%_%%!%%1:~0%%4377,1%%!%%_^
%=     =%%%44%%_%%40%%_%%54%%_%%50%%_%%64%%_%%60%%_%%74%%_%%70^"^
%=    =% ) do set $=%%!%%1:~0%%300%%!^
%=     =%%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H%%H^
%=     =%FFFFFFFFFFFFFFFFEEEEEEEEEEEEEEEEDDDDDDDDDDDDDDDDCCCCCCCCCCCCCCCC^
%=     =%BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA99999999999999998888888888888888^
%=     =%7777777777777777666666666666666655555555555555554444444444444444^
%=     =%3333333333333333222222222222222211111111111111110000000000000000^&^
%=    =% for %%- in (0%%300+0x0%%!$:~511%%#,1%%!%%!$:~255%%#,1%%!) do^
%=      =% ((if %%? lss ' endlocal)^&set /a %%2=%%-)^
%==% else (if %%? gtr ' setlocal enabledelayedexpansion)^&set $args=
goto :eof
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:initTimediff
for /f %%! in ("! ^! ^^^!") do ^
set timediff=for /l %%# in (1 1 2) do if %%#==2 for /f "tokens=2" %%$ in ("%%!%%! 1 0") do ((if 1==%%$ setlocal EnableDelayedExpansion)^&for /f "tokens=1-3" %%- in ("%%!_i_%%!") do (set "_t1_=%%!%%~-: =0%%!"^&set "_t2_=%%!%%~.: =0%%!"^&^
set /a "_d_=(8640000+(((1%%!_t2_:~,2%%!*60+1%%!_t2_:~3,2%%!)*60+1%%!_t2_:~6,2%%!)*100+1%%!_t2_:~-2%%!-36610100)-(((1%%!_t1_:~,2%%!*60+1%%!_t1_:~3,2%%!)*60+1%%!_t1_:~6,2%%!)*100+1%%!_t1_:~-2%%!-36610100))%%8640000,_o_=100000000+(_d_%%100),_d_/=100,_o_+=(_d_%%60)*100,_d_/=60,_o_+=(_d_%%60)*10000+_d_/60*1000000"^&^
set "_o_=%%!_o_:~1,2%%!:%%!_o_:~3,2%%!:%%!_o_:~5,2%%!.%%!_o_:~-2%%!"^&for /f %%' in ("%%!_o_%%!") do ((if 1==%%$ endlocal)^&if "%%~/"=="" (echo %%') else set "%%~/=%%'"))) else set _i_=
goto :eof
::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

:test
echo(
set t1=%time%
for /l %%i in (1 1 5000) do %strLen% %1 len
set t2=%time%
echo %len%
%timediff% t1 t2 diff
echo %diff%
set t1=%time%
for /l %%i in (1 1 5000) do %strLen% %1 len
set t2=%time%
echo %len%
%timediff% t1 t2 diff
echo %diff%
set t1=%time%
for /l %%i in (1 1 5000) do %strLen% %1 len
set t2=%time%
echo %len%
%timediff% t1 t2 diff
echo %diff%
set t1=%time%
for /l %%i in (1 1 5000) do %strLen% %1 len
set t2=%time%
echo %len%
%timediff% t1 t2 diff
echo %diff%
goto :eof
Meanwhile I use your way to create the 8191-character string. And I also tried with "test character" changed from x to space or to exclamation mark.
The string lengths are chosen because some of them have been (or still are) edge cases.

This code still gives the correct results when I run it.
Does it always fail for you, or only in some cases?

Steffen

#88 Post by aGerman » 24 Aug 2025 06:00

I have a feeling my demonstration that measured length is not a good basis for making assumptions about displayed width has left you in paralysis :lol:
In this case - don't worry, you just know now that the two are not connected.

However, it will still work for printable ASCII, box drawing and block characters, scripts like Latin (used by ~70% of world population), Cyrillic, Greek... Perhaps also for Arabic and Hebrew although RTL comes in as a game changer for quite some terminal interfaces.

It doesn't work (and will never work) for a large range of symbols, ligatures, pictograms and emoticons. Also not for CJK scripts or for complex scripts commonly used in India, like Devanagari, Bengali, Telugu ... So, for many people in the world I didn't tell anything new.

miskox
Posts: 669
Joined: 28 Jun 2010 03:46

Re: strLen boosted

#89 Post by miskox » 24 Aug 2025 11:18

aGerman wrote:
22 Aug 2025 11:11
Bonus chatter:
It's absurd what we need to do just to perform such a simple operation. What happens in the implementation if something like !var:~N,1! is evaluated? We can rely on getting an empty result if N exceeds the string length. It never reads beyond the end of the string into foreign memory. This can only be ensured if the length of var is known beforehand. Think about how many times the string is measured behind the scenes when our code is executed. And think about how cheap it would be if CMD supported something like SET /L or some other internal command to measure the length, either using a tight loop or by calling out to wcslen() ...

Steffen
Something like this would be great: OpenVMS has lexical functions which could be used in DCL. See examples: https://wiki.vmssoftware.com/Category:Lexical_Functions

For example:

Code: Select all

set str=some string
set str_len=f$length(str)
echo string length: %str_len%
Or, for example, to get the PID of the current process:

Code: Select all

REM "" means the current process. Querying other processes requires privileges, of course; use a real PID to get that process's info.
REM Item can be many things (see the link above), for example the USERNAME of that process.
REM F$GETJPI(pid,item) 
set pid=F$GETJPI("","PID") 
echo PID=%pid%
set un=F$GETJPI("1234","USERNAME") 
echo un=%un%
and so on.

That would be great or maybe the best solution.

Saso

#90 Post by aGerman » 24 Aug 2025 13:30

I was referring more to why MS didn't implement it when they refactored Batch with the release of NT. There's no hope of something like that happening again in the future, as they've gone in a new direction with PowerShell (almost two decades ago already).
https://github.com/microsoft/terminal/b ... ksa.md#cmd
FWIW You may consider the CmdLets in PowerShell as lexical functions.

Steffen
