Why does SET performance degrade as environment size grows?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

#1 Post by dbenham » 05 Dec 2011 22:06

There are many clever uses for environment variables that can cause the size of the environment to grow dramatically. For example:

  • Loading rows of data or lines of a file into a pseudo-array for sorting (or other) purposes.
  • Loading commands into memory (macros) for the speed and convenience of an includable library of functions.

I remember seeing Ed "too complex" Dyreen report that he saw poor performance as his memory usage increased. I filed that in the back of my mind, but didn't worry much until I ran into performance issues myself with a Stack Overflow question and answer: http://stackoverflow.com/a/8369403/1012053.

I've done some extensive timing tests that demonstrate large environment sizes can dramatically impact the performance of SET and SETLOCAL/ENDLOCAL. Absolute time values are highly machine dependent, but I think the qualitative results are relevant to everyone.

Code: Select all

Approx. Env Size   Set a var (sec)   Setlocal/Endlocal (sec)   Expand a var (sec)
----------------   ---------------   -----------------------   ------------------
         10KB           0.0001            0.0005                     0.0003
       1293KB           0.0126            0.0322                     0.0003

Based on my tests, the amount of time it takes to SET a value or do a SETLOCAL/ENDLOCAL toggle is roughly linear with the size of the environment. This can have a devastating impact on performance with large data sets and/or with large macro libraries.

It stands to reason that the performance of SETLOCAL/ENDLOCAL will suffer as the environment grows: the amount of memory that must be allocated grows with the size of the environment.

But I'm shocked that the time it takes to SET a single value suffers just as badly! The only explanation I can think of is that CMD.EXE must store the entire environment in one contiguous block of memory and reallocate a new block every time a single value changes. The SET tests used a variable that resides near the beginning of the environment (a). I ran some additional tests (not shown) using a variable that resides near the end (z). SET was as much as 25% faster when working with z vs. a, so there is some positional dependency as well.

It is interesting that the time it takes to expand a variable appears to be independent of the environment size, thank goodness.

Question - Does anyone have any better insight as to why SET degrades linearly as the environment grows? I suppose a definitive answer would have to come from someone with knowledge of the internal workings of CMD.EXE. Even better would be a suggestion on how to improve performance when using a large environment, but I doubt there is much that can be done.

Here is my actual timing test code. The timer routine is a macro that is already loaded into memory prior to running the test script.

Code: Select all

@echo off
setlocal enableDelayedExpansion
set "test=a"
for /l %%n in (1 1 10) do set "test=!test!!test!"
set buf1=%test%
set buf2=%test%
set "buf3=a"
for /l %%n in (1 1 9) do set "buf3=!buf3!!buf3!"
set cnt=0
set "a=a"
for %%n in (0 10 20 40 80 160 320 640) do call :test %%n
exit /b

:test
for /l %%n in (1 1 %1) do (
  set /a cnt+=1
  set test!cnt!=%test%
)
set >env.txt
set t1=%time%
for /l %%n in (1 1 1000) do (
  rem
)
set t2=%time%
for /l %%n in (1 1 1000) do (
  rem
  setlocal
  endlocal
)
set t3=%time%
for /l %%n in (1 1 1000) do (
  rem
  setlocal
  set "a=b"
  endlocal
)
set t4=%time%
for /l %%n in (1 1 1000) do (
  rem
  setlocal
  set "a="
  endlocal
)
set t5=%time%
for /l %%n in (1 1 1000) do (
  rem
  echo !a!>nul
)
set t6=%time%
for /l %%n in (1 1 1000) do (
  set "a=b"
  set "a="
)
set t7=%time%
%macro.diffTimeRaw% t1 t2 base
%macro.diffTimeRaw% t2 t3 base_setlocal_endlocal
%macro.diffTimeRaw% t3 t4 base_setlocal_set_endlocal
%macro.diffTimeRaw% t4 t5 base_setlocal_unset_endlocal
%macro.diffTimeRaw% t5 t6 base_expand
%macro.diffTimeRaw% t6 t7 base_set_unset
set /a time_setlocal_endlocal=base_setlocal_endlocal-base
set /a time_set=base_setlocal_set_endlocal-base_setlocal_endlocal
set /a time_unset=base_setlocal_unset_endlocal-base_setlocal_endlocal
set /a time_expand=base_expand-base
set /a time_set_unset_predicted=base+time_set+time_unset
call :padNum time_setlocal_endlocal
call :padNum time_set
call :padNum time_unset
call :padNum time_expand
call :padNum time_set_unset_predicted
call :padNum base_set_unset
echo ---------------------------------------
for %%f in (env.txt) do echo approximate environment size = %%~zf
echo(
echo   setlocal/endlocal = %time_setlocal_endlocal%
echo                 set = %time_set%
echo               unset = %time_unset%
echo              expand = %time_expand%
echo(
echo predicted set/unset = %time_set_unset_predicted%
echo    actual set/unset = %base_set_unset%
echo(
del env.txt
exit /b

:padNum
set "%1=      !%1!"
set "%1=!%1:~-7,5!.!%1:~-2!"
set "%1=!%1: .=0.!"
set "%1=!%1:. =.0!"
exit /b
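
The script above calls a macro.diffTimeRaw that is not shown in this post. A minimal stand-in might look like the sketch below, assuming the macro takes the names of two variables holding %time% values plus the name of a result variable, and returns the difference in 1/100 seconds. The :diffTimeRaw label and the cs_ helper variables are hypothetical names, not the original macro, and the sketch assumes the 24-hour hh:mm:ss.cc time format.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical stand-in for the macro.diffTimeRaw used by the test script.
set "macro.diffTimeRaw=call :diffTimeRaw"

set "t1=%time%"
for /l %%n in (1 1 1000) do rem
set "t2=%time%"
%macro.diffTimeRaw% t1 t2 elapsed
echo elapsed = %elapsed% (1/100 seconds)
exit /b

:diffTimeRaw startVar endVar resultVar
rem Convert each hh:mm:ss.cc value to 1/100 seconds since midnight.
rem Prefixing 1 and subtracting 100 keeps 08 and 09 from being read as octal.
for %%v in (%1 %2) do (
  set "t=!%%v: =0!"
  set /a "cs_%%v=((1!t:~0,2!-100)*60+(1!t:~3,2!-100))*60*100+(1!t:~6,2!-100)*100+(1!t:~9,2!-100)"
)
set /a "%3=cs_%2-cs_%1"
rem Compensate for an interval that crosses midnight.
if !%3! lss 0 set /a "%3+=24*60*60*100"
exit /b
```

Because it goes through CALL, this stand-in is slower than a true macro, but every call happens after the time stamps have been captured, so it should not affect the measured intervals.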

Here are the results of one run. Each time is the total for 1000 iterations of the operation, shown in seconds with 1/100 second resolution, so a value can also be read as milliseconds per single operation.

Code: Select all

---------------------------------------
approximate environment size = 10352

  setlocal/endlocal =     0.45
                set =     0.09
              unset =     0.10
             expand =     0.25

predicted set/unset =     0.19
   actual set/unset =     0.20

---------------------------------------
approximate environment size = 21051

  setlocal/endlocal =     0.58
                set =     0.18
              unset =     0.15
             expand =     0.28

predicted set/unset =     0.33
   actual set/unset =     0.31

---------------------------------------
approximate environment size = 41711

  setlocal/endlocal =     0.84
                set =     0.31
              unset =     0.23
             expand =     0.29

predicted set/unset =     0.54
   actual set/unset =     0.51

---------------------------------------
approximate environment size = 83033

  setlocal/endlocal =     1.34
                set =     0.58
              unset =     0.41
             expand =     0.27

predicted set/unset =     1.00
   actual set/unset =     0.91

---------------------------------------
approximate environment size = 165726

  setlocal/endlocal =     3.02
                set =     1.23
              unset =     0.92
             expand =     0.24

predicted set/unset =     2.15
   actual set/unset =     2.10

---------------------------------------
approximate environment size = 331166

  setlocal/endlocal =     5.27
                set =     2.12
              unset =     1.56
             expand =     0.27

predicted set/unset =     3.69
   actual set/unset =     3.82

---------------------------------------
approximate environment size = 662046

  setlocal/endlocal =    15.80
                set =     5.91
              unset =     4.74
             expand =     0.28

predicted set/unset =    10.66
   actual set/unset =    11.66

---------------------------------------
approximate environment size = 1324081

  setlocal/endlocal =    32.15
                set =    12.57
              unset =    10.25
             expand =     0.26

predicted set/unset =    22.83
   actual set/unset =    24.88
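
A rough check of the "roughly linear" claim, using only the numbers from the run above: if SET time is proportional to environment size, the ratio of time to size should stay about constant. The sketch below computes that ratio with integer arithmetic. The ratio variable name and the scale factor of 1000000 are arbitrary choices for illustration.

```batch
@echo off
setlocal enableDelayedExpansion
rem Each pair is size:set, copied by hand from the run above.
rem size = approximate environment size in bytes
rem set  = SET time for 1000 iterations in 1/100 seconds (0.09 is entered as 9)
rem The time is scaled by 1000000 so integer division keeps three digits.
for %%p in (10352:9 21051:18 41711:31 83033:58 165726:123 331166:212 662046:591 1324081:1257) do (
  for /f "tokens=1,2 delims=:" %%a in ("%%p") do (
    set /a "ratio=%%b*1000000/%%a"
    echo size=%%a  set=%%b  ratio=!ratio!
  )
)
```

The ratio stays between about 640 and 950 while the environment grows roughly 128-fold, which fits a roughly linear relationship.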


Dave Benham

orange_batch
Expert
Posts: 442
Joined: 01 Aug 2010 17:13
Location: Canadian Pacific

#2 Post by orange_batch » 06 Dec 2011 01:07

My only guess is that, much as DOS searches a list of locations for the file associated with a command, maybe it runs through all the data in the environment to find and modify an existing variable's memory address (allocating more if the value got larger), or to find a new memory address for a new variable?

I don't know the technical aspects too well.

#3 Post by dbenham » 06 Dec 2011 07:27

@orange_batch - "DOS" is extremely efficient at finding a specific variable within the environment, as evidenced by the expansion timing results. If each variable were allocated independently, then environment size should have minimal impact on SET performance. Also, it takes virtually the same amount of time to clear a variable as it does to set one. So I don't think your suggestion is correct.

Dave Benham

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México

#4 Post by Aacini » 06 Dec 2011 22:43

I devised an idea that may solve, at least in part, the performance problems of the SET command caused by a very large environment. Let's suppose that the internal operation of the SET VAR=VALUE command follows these steps:

  • When a new variable is defined with a value that exceeds the current environment size, the environment is copied to a new area if the area beyond it is not available.
  • The new area is just large enough to receive the new variable. No additional space is reserved.
  • The important one: When a large variable is deleted, the remaining free space is NOT released. The environment memory block is never shrunk.

If these assumptions are true, then the performance problems may decrease if we first reserve the desired environment space via large (8 KB) variables with the same names as the working variables. For example, to reserve 1024 KB we define 128 large variables; I suppose that the time required to define these 128 variables will be less than the time required to fill the same 1024 KB with shorter variables.

While the process is running, defining each of the first 128 working variables will take the time needed to delete an 8 KB variable and define a shorter one, but from variable 129 onward the process should be faster because it just defines a new variable in already available space. To aid this process, the variables must have names that place them at the end of the environment, as dbenham indicated.

Code: Select all

:ReserveEnvSpace sizeInKB
rem Define the first large variable (reserving 6 bytes for variable name)
set z1=X
for /L %%i in (1,1,12) do set z1=!z1!!z1!
set z1=!z1!!z1:~8!
rem Define the rest of large variables
set /A lastVar=%1 / 8
for /L %%i in (2,1,%lastVar%) do set z%%i=!z1!
exit /B

The MEM /P command may be used to check the size and placement of the environment memory block. Below is a rudimentary batch test program that confirms that the environment block is NOT shrunk when a large variable is deleted:

Code: Select all

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
CALL :ENVDATA "Initial environment              "
SET Z1=X
FOR /L %%I IN (1,1,12) DO SET Z1=!Z1!!Z1!
CALL :ENVDATA "After defined a 4KB variable     "
SET Z1=!Z1!!Z1:~8!
CALL :ENVDATA "After defined a 8KB variable     "
SET Z2=!Z1!
CALL :ENVDATA "After defined two 8KB variables  "
SET Z2=
CALL :ENVDATA "Z2 deleted, there is one 8KB's   "
SET Z2=!Z1!
CALL :ENVDATA "After defined Z2 again: two 8KB's"
SET Z3=!Z1!
CALL :ENVDATA "After defined three 8KB variables"
SET Z3=
CALL :ENVDATA "Z3 deleted, there is two 8KB's   "
SET Z2=
CALL :ENVDATA "Z2 deleted, there is one 8KB's   "
SET Z2=!Z1!
CALL :ENVDATA "After defined Z2 again: two 8KB's"
SET Z3=!Z1!
CALL :ENVDATA "After defined Z3 again: three 8KB"
SET Z4=!Z1!
CALL :ENVDATA "After defined four 8KB variables "
GOTO :EOF

:ENVDATA
SET /P =%~1  < NUL
FOR /F "SKIP=1 TOKENS=1-4" %%A IN ('MEM /P ^| FINDSTR "COMMAND"') DO (
    ECHO Origin=%%A, Size=%%C
)
EXIT /B

What remains is a precise timing test that first reserves the desired environment space.

EDIT: I just added the results of the previous program:

Code: Select all

Initial environment                Origin=004B30, Size=000450
After defined a 4KB variable       Origin=00D740, Size=001370
After defined a 8KB variable       Origin=00D740, Size=002360
After defined two 8KB variables    Origin=00D740, Size=004360
Z2 deleted, there is one 8KB's     Origin=00D740, Size=004360
After defined Z2 again: two 8KB's  Origin=00D740, Size=004360
After defined three 8KB variables  Origin=00D740, Size=006360
Z3 deleted, there is two 8KB's     Origin=00D740, Size=006360
Z2 deleted, there is one 8KB's     Origin=00D740, Size=006360
After defined Z2 again: two 8KB's  Origin=00D740, Size=006360
After defined Z3 again: three 8KB  Origin=00D740, Size=006360
After defined four 8KB variables   Origin=00D740, Size=006360

Last edited by Aacini on 08 Dec 2011 21:57, edited 2 times in total.

#5 Post by dbenham » 06 Dec 2011 23:59

Welcome Aacini!

Unfortunately mem.exe is not available on my home Vista 64 machine.

When I did some very quick tests at work I was looking at the 2nd "environment" line when I saw the size shrink upon variable undefine. I'll have to look at the mem output again more closely.

It may be a few days before I can look at this in detail, but I am intrigued with your idea.


Dave Benham

alan_b
Expert
Posts: 357
Joined: 04 Oct 2008 09:49

#6 Post by alan_b » 07 Dec 2011 10:17

dbenham wrote: It is interesting that the time it takes to expand a variable appears to be independent of the environment size, thank goodness.

Question - Does anyone have any better insight as to why SET degrades linearly as the environment grows? I suppose a definitive answer would have to come from someone with knowledge of the internal workings of CMD.EXE. Even better would be a suggestion on how to improve performance when using a large environment, but I doubt there is much that can be done.
Dave Benham


I have no better insight, but here is a suggestion for a potentially informative test:

Obtain or create a 1293 KBytes text file called 1293KB.txt consisting of fairly short strings.
Select about 10 KBytes from that file and copy as 10KB.txt

Measure how long it takes for CMD.EXE to execute each of the following,
(possibly with drive read/write caching disabled to avoid another source of confusion)
COPY 1293KB.txt 1293KB_X.txt
SORT < 1293KB.txt > 1293KB_Y.txt
SORT < 1293KB_Y.txt > 1293KB_Z.txt

The time taken to create 1293KB_X.txt is disc read and write
The time taken to create 1293KB_Y.txt is disc read and write plus placing all strings in memory plus sorting them
The time taken to create 1293KB_Z.txt is disc read and write plus "placing all strings in memory" - already sorted.
Duration 1293KB_Z.txt - 1293KB_X.txt is "placing all strings in memory".

Repeat the exercise with 10KB.txt and if this indicates that time taken
"placing all strings in memory" is "roughly linear with the size of the environment"
then we have a problem that baffled the creators of SORT.EXE.
If however SORT.EXE has solved that problem perhaps there are tricks that could be learnt.

Regards
Alan

#7 Post by Aacini » 07 Dec 2011 20:25

I got a new idea for my method. The goal is to avoid the excessive data movement from one memory location to another when the SET command stores a variable. Let's suppose that after we have defined the 128 large variables named z1..z128 (as I suggested in my previous post), the program starts the real process by storing a shorter value in z1. To do that, after the new z1 value is stored in its place, all the values from z2 to z128 must be moved backwards, which is a huge number of bytes.

To avoid that, all the large variables could be deleted before the real process starts. This way, the environment will have the reserved space free, and the working variables need not have the same names as the large variables. I think (I am confident, really) that the time spent creating and deleting the large variables will be much less than the time the original process takes without such prior space reservation.

Code: Select all

:ReserveEnvSpace sizeInKB
rem Define the first large variable (reserving 6 bytes for variable name)
set z1=X
for /L %%i in (1,1,12) do set z1=!z1!!z1!
set z1=!z1!!z1:~8!
rem Define the rest of large variables
set /A lastVar=%1 / 8
for /L %%i in (2,1,%lastVar%) do set z%%i=!z1!
rem Delete all the large variables in bottom-up order
for /L %%i in (%lastVar%,-1,1) do set z%%i=
exit /B


Note that the reserved environment space will be lost if a SETLOCAL command or any external file is executed, like CMD.EXE. Only the occupied environment space will be copied to the new local environment.

#8 Post by dbenham » 10 Dec 2011 16:22

@alan_b

1) I don't understand why SORT.EXE memory management should correlate with how CMD.EXE manages memory for the environment, or how any tricks that SORT might use could be applied to CMD.EXE environment management. We are constrained by the limitations of how CMD.EXE manages its memory. If we understand how CMD.EXE works, we might be able to develop strategies to compensate (or not).

2) I don't trust that 'Duration 1293KB_Z.txt - 1293KB_X.txt is "placing all strings in memory"'. Do we know what sort algorithm SORT.EXE uses? It's been a long time since I thought about sort algorithms, but I seem to remember that at least one method is actually slow if the list is pre-sorted. I see no reason to expect that passing a pre-sorted list to SORT.EXE will result in zero time spent on sorting.

------------------------

@Aacini - Your ideas were good, but they don't appear to work. :(

First off, here are the results of mem /p on my Vista 32 at work:

Code: Select all

P:\>setlocal enableDelayedExpansion
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      000CE0     Environment
  012C30      MEM          000C20     Environment
 
P:\>set "test1=!path!"
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001190     Environment
  012C30      MEM          0011B0     Environment
 
P:\>set "test2=!path!"
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          001740     Environment
 
P:\>setlocal
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          001740     Environment
 
P:\>endlocal
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          001740     Environment
 
P:\>set "test1="
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          0011B0     Environment
 
P:\>set "test2="
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          000C20     Environment
 
P:\>setlocal
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          000C20     Environment
 
P:\>endlocal
 
P:\>mem /p   | find "Environment"
  008100      COMMAND      001730     Environment
  012C30      MEM          000C20     Environment
 
P:\>

The environment memory for COMMAND grows as variables are set, but never shrinks when unset.
The environment memory for MEM grows and shrinks with set and unset.
I think I messed up the tests I wanted to do with SETLOCAL / ENDLOCAL

I'm pretty sure both COMMAND and MEM are associated with the CMD environment. I'm not sure how to interpret the results.


Next I developed a script to test your pre-allocation theory. I have an \n var that is normally at the end of the environment. So I did my pre-allocation and z test using \z... to make sure they were at the end of the environment.

Code: Select all

@echo off
setlocal enableDelayedExpansion
set "test=a"
for /l %%n in (1 1 10) do set "test=!test!!test!"
set buf1=%test%
set buf2=%test%
set "buf3=a"
for /l %%n in (1 1 9) do set "buf3=!buf3!!buf3!"
set cnt=0
set "a=a"
for %%n in (0 10 20 40 80 160 320 640) do call :test %%n
exit /b

:test
for /l %%n in (1 1 %1) do (
  set /a cnt+=1
  set test!cnt!=%test%
)
set "testbuf=%test%"
set "testbuf="
set >env.txt

set t1=%time%
for /l %%n in (1 1 1000) do (
  rem
)
set t2=%time%

setlocal
set t3=%time%
for /l %%n in (1 1 1000) do (
  rem
  set "a=a"
  set "a="
)
set t4=%time%
endlocal&set "t3=%t3%"&set "t4=%t4%"

setlocal
set t5=%time%
for /l %%n in (1 1 1000) do (
  rem
  set "\z=a"
  set "\z="
)
set t6=%time%
endlocal&set "t5=%t5%"&set "t6=%t6%"

setlocal
set "\z_preallocate=%test%"
set "\z_preallocate="
set t7=%time%
for /l %%n in (1 1 1000) do (
  rem
  set "a=a"
  set "a="
)
set t8=%time%
endlocal&set "t7=%t7%"&set "t8=%t8%"

setlocal
set "\z_preallocate=%test%"
set "\z_preallocate="
set t9=%time%
for /l %%n in (1 1 1000) do (
  rem
  set "\z=a"
  set "\z="
)
set t10=%time%
endlocal&set "t9=%t9%"&set "t10=%t10%"

%macro.diffTimeRaw% t1 t2 rem
%macro.diffTimeRaw% t3 t4 rem_a_set_unset
%macro.diffTimeRaw% t5 t6 rem_z_set_unset
%macro.diffTimeRaw% t7 t8 rem_preallocate_a_set_unset
%macro.diffTimeRaw% t9 t10 rem_preallocate_z_set_unset
set /a a_set_unset=rem_a_set_unset-rem
set /a z_set_unset=rem_z_set_unset-rem
set /a preallocate_a_set_unset=rem_preallocate_a_set_unset-rem
set /a preallocate_z_set_unset=rem_preallocate_z_set_unset-rem
call :padNum a_set_unset
call :padNum z_set_unset
call :padNum preallocate_a_set_unset
call :padNum preallocate_z_set_unset
echo ---------------------------------------
for %%f in (env.txt) do echo approximate environment size = %%~zf
echo(
echo  set/unset  a = %a_set_unset%
echo  set/unset \z = %z_set_unset%
echo(
echo  set/unset  a after preallocation = %preallocate_a_set_unset%
echo  set/unset \z after preallocation = %preallocate_z_set_unset%
echo(
del env.txt
exit /b

:padNum
set "%1=    !%1!"
set "%1=!%1:~-4,2!.!%1:~-2!"
set "%1=!%1: .=0.!"
set "%1=!%1:. =.0!"
exit /b

Here are the disappointing results:

Code: Select all

---------------------------------------
approximate environment size = 10365

 set/unset  a =  0.18
 set/unset \z =  0.18

 set/unset  a after preallocation =  0.18
 set/unset \z after preallocation =  0.17

---------------------------------------
approximate environment size = 21059

 set/unset  a =  0.30
 set/unset \z =  0.28

 set/unset  a after preallocation =  0.31
 set/unset \z after preallocation =  0.27

---------------------------------------
approximate environment size = 41719

 set/unset  a =  0.50
 set/unset \z =  0.44

 set/unset  a after preallocation =  0.50
 set/unset \z after preallocation =  0.44

---------------------------------------
approximate environment size = 83039

 set/unset  a =  0.90
 set/unset \z =  0.77

 set/unset  a after preallocation =  0.90
 set/unset \z after preallocation =  0.76

---------------------------------------
approximate environment size = 165731

 set/unset  a =  2.05
 set/unset \z =  1.76

 set/unset  a after preallocation =  2.03
 set/unset \z after preallocation =  1.76

---------------------------------------
approximate environment size = 331175

 set/unset  a =  3.67
 set/unset \z =  3.07

 set/unset  a after preallocation =  3.76
 set/unset \z after preallocation =  3.38

---------------------------------------
approximate environment size = 662055

 set/unset  a = 12.43
 set/unset \z =  9.67

 set/unset  a after preallocation = 11.21
 set/unset \z after preallocation =  9.72

---------------------------------------
approximate environment size = 1324089

 set/unset  a = 23.10
 set/unset \z = 19.73

 set/unset  a after preallocation = 23.08
 set/unset \z after preallocation = 19.62
 


Dave Benham

#9 Post by alan_b » 11 Dec 2011 01:47

@Dave
dbenham wrote: 1) I don't understand why SORT.EXE memory management should correlate with how CMD.EXE manages memory for the environment, or how any tricks that SORT might use could be applied to CMD.EXE environment management. We are constrained by the limitations of how CMD.EXE manages its memory. If we understand how CMD.EXE works, we might be able to develop strategies to compensate (or not).

I assumed that SORT.EXE ran under the constraints of CMD.EXE,
and that if the architects of CMD.EXE and SORT.EXE did NOT know how to avoid performance degradation with environment size,
then we have little chance of defeating such degradation when using SET,
BUT if SORT can jump hurdles then perhaps SET could be "assisted" if we knew how.

dbenham wrote: 2) I don't trust that 'Duration 1293KB_Z.txt - 1293KB_X.txt is "placing all strings in memory"'. Do we know what sort algorithm SORT.EXE uses? It's been a long time since I thought about sort algorithms, but I seem to remember that at least one method is actually slow if the list is pre-sorted. I see no reason to expect that passing a pre-sorted list to SORT.EXE will result in zero time spent on sorting.

It was just a best guess.
It has been a few decades since I was heavily involved in sorting algorithms.
I think a general purpose algorithm should NOT make needless changes to what is already sorted,
but Microsoft make their own rules.
That is what makes computing fun - or not :twisted:

Regards
Alan

#10 Post by Aacini » 11 Dec 2011 14:25

Dave:

Let's go back a little. The goal was/is to speed up a program that uses a large environment and that runs very slowly, right? For example, loading a large file into memory variables (that was the origin of this topic, indeed):

Code: Select all

@echo off
setlocal EnableDelayedExpansion
findstr /N ^^ %1 | find /C ":" > lines.tmp
set /P lines=< lines.tmp
(for /L %%i in (1,1,%lines%) do set /P line%%i=) < %1
rem Process the lines here...

I asserted that the previous program will speed up if an environment of the right size is reserved before entering the process that fills it, and that the total time will be even shorter if the variable names help reduce the movement of large memory blocks inside the environment:

Code: Select all

@echo off
setlocal EnableDelayedExpansion
for %%a in (%1) do set size=%%~Za
set sizeInKB=%size:~0,-3%
call ReserveEnvironmentSpace %sizeInKB%
findstr /N ^^ %1 | find /C ":" > lines.tmp
set /P lines=< lines.tmp
(for /L %%i in (1,1,%lines%) do set /P z%%i=) < %1
rem Process the lines here...

However, that test has not been done yet, and I am still confident that the results will be good. To do an equivalent test with your program, it must be written this way:

Code: Select all

rem Run test with normal Environment
setlocal EnableDelayedExpansion
rem Run the complete tests here...
endlocal

rem Run test with pre-allocated Environment
setlocal EnableDelayedExpansion
call ReserveEnvironmentSpace 1300
rem Run again the complete tests here...

The two main factors that affect the performance of the SET command are the need to move the environment to another place in order to expand it, and the size of the memory blocks that must be moved in order to keep the variables alphabetically ordered. Note that SETLOCAL/ENDLOCAL tests are not meaningful for comparative purposes in this case, just the performance of the SET command.

Aacini wrote: Note that the reserved environment space will be lost if a SETLOCAL command or any external file is executed, like CMD.EXE. Only the occupied environment space will be copied to the new local environment.

However, this behavior may be used in a large program to reserve the environment space just once, and then use it in several subroutines:

Code: Select all

rem Main file
setlocal EnableDelayedExpansion
call :AllocateEnvironmentSpace %sizeInKB%
call :Sub1
call :Sub2
etc...

:Sub1 or Sub2...
setlocal
call :FreeEnvironmentSpace
rem Do my business...
exit /B

:AllocateEnvironmentSpace sizeInKB
rem Define the first large variable (reserving 6 bytes for variable name)
set z1=X
for /L %%i in (1,1,12) do set z1=!z1!!z1!
set z1=!z1!!z1:~8!
rem Define the rest of large variables
set /A lastVar=%1 / 8
for /L %%i in (2,1,%lastVar%) do set z%%i=!z1!
exit /B

:FreeEnvironmentSpace
rem Delete all the large variables in bottom-up order
for /L %%i in (%lastVar%,-1,1) do set z%%i=
exit /B

I want to run my own tests with your program and others; however, I don't have the macro.diffTimeRaw macro. I looked for it in these topics, but it is not there.

viewtopic.php?t=1827
viewtopic.php?p=6930

Please, tell me where I can get this macro from.

Antonio

jeb
Expert
Posts: 1041
Joined: 30 Aug 2007 08:05
Location: Germany, Bochum

#11 Post by jeb » 11 Dec 2011 14:56

Hi Aacini,

I don't have dbenham's macro, but my own works in nearly the same way.

Code: Select all

::: Timediff pTime1 pTime2 pResult
set $timediff=for /L %%n in (1 1 2) do if %%n==2 (%\n%
      for /F "tokens=1,2,3 delims=, " %%1 in ("!argv!") do (%\n%
         set "time1=!%%~1: =0!" %\n%
         set "time2=!%%~2: =0!" %\n%
         set /a "t1=((1!time1:~0,2!*60+1!time1:~3,2!)*60+1!time1:~6,2!-366100)*1000+1!time1:~-2!*10-1000" %\n%
         set /a "t2=((1!time2:~0,2!*60+1!time2:~3,2!)*60+1!time2:~6,2!-366100)*1000+1!time2:~-2!*10-1000" %\n%
         set /a "diff=t2-t1" %\n%
         for /f "delims=" %%r in ("!diff!") do endlocal^& set "%%~3=%%~r" %\n%
      ) %\n%
) ELSE setlocal enableDelayedExpansion ^& set argv=,


Code: Select all

set time_1start=%time%
call :TimeConsumer
set time_1end=%time%
%$timediff% time_1start time_1end result_time1


Btw, the macro expects the 24-hour time format hh:mm:ss.cc, with two fractional digits as returned by %time%, and it returns the difference in milliseconds.

jeb

#12 Post by Aacini » 14 Dec 2011 01:31

Dave: I finally found some time to run the missing timing test. I slightly modified your program to show results in a more compact form, and I added a measurement of the time the environment takes to grow up to the testing size.

These are the results of original program:

Code: Select all

   Environ       Grow    SetLocal-
      Size     Environ   EndLocal       Set      Unset     Expand
   --------------------------------------------------------------
      4664       0.00       0.46       0.09       0.06       0.11
     15434       0.01       0.67       0.22       0.11       0.21
     36095       0.01       1.02       0.53       0.27       0.27
     77418       0.05       1.79       1.00       0.41       0.34
    160110       0.19       3.26       2.00       0.83       0.48
    325551       0.68       6.23       3.99       1.63       0.84
    656432       2.33      12.81       8.15       3.23       1.52
   1318467      10.93      27.63      16.91       6.58       3.16

The results below were obtained after adding a CALL :ReserveEnvironmentSpace 1290 and changing the name of the test variable:

Code: Select all

   Environ       Grow    SetLocal-                                  Set
      Size     Environ   EndLocal       Set      Unset     Expand  Gain
   --------------------------------------------------------------------
      4678       0.00       0.52       0.01       0.02       0.12
     15416       0.02       0.65       0.21       0.12       0.20
     36017       0.01       1.05       0.28       0.22       0.25   47%
     77220       0.06       1.78       0.45       0.43       0.33   55%
    159672       0.22       3.17       0.87       0.79       0.48   56%
    324633       0.72       5.96       1.78       1.70       0.84   55%
    654553       2.50      12.43       3.33       3.35       1.47   59%
   1314669      11.58      27.30       6.65       6.48       3.12   60%
The last column is the percentage of time saved by the SET command with this method versus the original program. The trend in the values suggests even larger savings as the environment grows.
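The percentage figures appear to be the relative reduction in the Set column between the two tables. As a rough check, here is a small sketch that redoes the arithmetic for the largest environment (16.91 s before, 6.65 s after). The variable names are made up for illustration, and the times are entered in centiseconds because SET /A only handles integers.

```batch
@echo off
setlocal
rem Set times for the largest environment, in centiseconds,
rem copied from the Set column of the two result tables.
set /a before=1691, after=665
rem Relative saving as an integer percentage (SET /A truncates).
set /a gain=(before-after)*100/before
echo SET time saved: %gain%%%
```

With these two inputs the integer division gives 60, which agrees with the last row of the second table.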
Aacini wrote: The two main factors that affect the performance of the SET command are the need to move the environment to another place in order to expand it, and the size of the memory blocks that must be moved in order to keep the variables alphabetically ordered. Note that the SETLOCAL/ENDLOCAL tests are not meaningful for comparison in this case; only the performance of the SET command is.

Code: Select all

@echo off
setlocal enableDelayedExpansion

set macro.diffTimeRaw=call :diffTimeRaw

call :ReserveEnvironmentSpace 1290

set "test=a"
for /l %%n in (1 1 10) do set "test=!test!!test!"
set buf1=%test%
set buf2=%test%
set "buf3=a"
for /l %%n in (1 1 9) do set "buf3=!buf3!!buf3!"
set zz=0
set "zzz=a"
cls
echo    Environ       Grow    SetLocal-
echo       Size     Environ   EndLocal       Set      Unset     Expand
echo    --------------------------------------------------------------
for %%n in (0 10 20 40 80 160 320 640) do call :test %%n
exit /b

:test
set t0=%time%
for /l %%n in (1 1 %1) do (
  set /a zz+=1
  set z!zz!=%test%
)
set >env.txt
set t1=%time%
for /l %%n in (1 1 1000) do (
  rem
)
set t2=%time%
for /l %%n in (1 1 1000) do (
  rem
  setlocal
  endlocal
)
set t3=%time%
for /l %%n in (1 1 1000) do (
  rem
  setlocal
  set "zzz=b"
  endlocal
)
set t4=%time%
for /l %%n in (1 1 1000) do (
  rem
  setlocal
  set "zzz="
  endlocal
)
set t5=%time%
for /l %%n in (1 1 1000) do (
  rem
  echo !zzz!>nul
)
set t6=%time%
for /l %%n in (1 1 1000) do (
  set "zzz=b"
  set "zzz="
)
set t7=%time%
%macro.diffTimeRaw% t0 t1 time_fill
%macro.diffTimeRaw% t1 t2 base
%macro.diffTimeRaw% t2 t3 base_setlocal_endlocal
%macro.diffTimeRaw% t3 t4 base_setlocal_set_endlocal
%macro.diffTimeRaw% t4 t5 base_setlocal_unset_endlocal
%macro.diffTimeRaw% t5 t6 base_expand
%macro.diffTimeRaw% t6 t7 base_set_unset
REM set /a time_fill=base-fill
set /a time_setlocal_endlocal=base_setlocal_endlocal-base
set /a time_set=base_setlocal_set_endlocal-base_setlocal_endlocal
set /a time_unset=base_setlocal_unset_endlocal-base_setlocal_endlocal
set /a time_expand=base_expand-base
set /a time_set_unset_predicted=base+time_set+time_unset
call :padNum time_fill
call :padNum time_setlocal_endlocal
call :padNum time_set
call :padNum time_unset
call :padNum time_expand
call :padNum time_set_unset_predicted
call :padNum base_set_unset
for %%f in (env.txt) do set env_size=      %%~zf
echo    %env_size:~-7%   %time_fill%   %time_setlocal_endlocal%   %time_set%   %time_unset%   %time_expand%
del env.txt
exit /b

:padNum
set "%1=       !%1!"
set "%1=!%1:~-7,5!.!%1:~-2!"
set "%1=!%1: .=0.!"
set "%1=!%1:. =.0!"
exit /b

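:diffTimeRaw tStart tEnd result=
rem Convert each hh:mm:ss.cc time stamp to centiseconds since midnight.
rem The 10-prefix plus modulo-100 step keeps 08 and 09 from being read as invalid octal numbers.
for /F "tokens=1-4 delims=:." %%a in ("!%1!") do set /A tStart=((%%a*60+10%%b%%100)*60+10%%c%%100)*100+10%%d%%100
for /F "tokens=1-4 delims=:." %%a in ("!%2!") do set /A tEnd=((%%a*60+10%%b%%100)*60+10%%c%%100)*100+10%%d%%100
rem Store the elapsed centiseconds in the variable named by the third argument.
set /A %3=tEnd-tStart
exit /B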

:ReserveEnvironmentSpace sizeInKB
rem Define the first large variable (reserving 6 bytes for variable name)
set z1=X
for /L %%i in (1,1,12) do set z1=!z1!!z1!
set z1=!z1!!z1:~8!
rem Define the rest of large variables
set /A lastVar=%1 / 8
for /L %%i in (2,1,%lastVar%) do set z%%i=!z1!
rem Delete all the large variables in bottom-up order
for /L %%i in (%lastVar%,-1,1) do set z%%i=
exit /B
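
The second factor mentioned above (moving memory to keep the names in alphabetical order) could be probed on its own with something like the sketch below. It is only an illustration, not part of the test program: the filler names m1..m200 and the probe names a_probe and z_probe are invented, and it assumes the same %time% format (space-padded hours, period before the hundredths) that :diffTimeRaw relies on. It sets and unsets one variable that sorts before the filler and one that sorts after it, and prints the raw centiseconds for each.

```batch
@echo off
setlocal enableDelayedExpansion

rem Fill the environment with about 200 KB of filler named m1..m200.
set "fill=x"
for /l %%n in (1 1 10) do set "fill=!fill!!fill!"
for /l %%n in (1 1 200) do set "m%%n=!fill!"

rem a_probe sorts before the filler names, z_probe sorts after them.
for %%v in (a_probe z_probe) do (
  set "t0=!time!"
  for /l %%n in (1 1 1000) do (
    set "%%v=1"
    set "%%v="
  )
  set "t1=!time!"
  call :elapsed t0 t1 cs
  echo %%v: !cs! centiseconds for 1000 set/unset pairs
)
exit /b

:elapsed tStart tEnd result
rem Same centisecond conversion as :diffTimeRaw in the program above.
for /F "tokens=1-4 delims=:." %%a in ("!%1!") do set /A s=((%%a*60+10%%b%%100)*60+10%%c%%100)*100+10%%d%%100
for /F "tokens=1-4 delims=:." %%a in ("!%2!") do set /A e=((%%a*60+10%%b%%100)*60+10%%c%%100)*100+10%%d%%100
set /A %3=e-s
exit /B
```

No expected numbers are given here because the results are machine dependent. Like the original test, the sketch does not handle a run that crosses midnight.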

Antonio

Aacini
Expert
Posts: 1885
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: Why does SET performance degrade as environment size gro

#13 Post by Aacini » 14 Dec 2011 07:48

Here is a third test that uses AllocateEnvironmentSpace/FreeEnvironmentSpace instead of ReserveEnvironmentSpace, as described above; the resulting times behaved as expected:

Code: Select all

   Environ       Grow    SetLocal-                                  Set
      Size     Environ   EndLocal       Set      Unset     Expand  Gain
   --------------------------------------------------------------------
      4678       0.00       0.48       0.07       0.06       0.13   22%
     15416       0.00       0.67       0.17       0.11       0.21   22%
     36017       0.01       1.05       0.28       0.21       0.21   47%
     77220       0.06       1.78       0.47       0.40       0.31   53%
    159672       0.20       3.24       0.91       0.81       0.48   54%
    324633       0.74       6.22       1.67       1.62       0.79   58%
    654553       2.50      12.67       3.39       3.26       1.44   58%
   1314669      11.52      27.42       6.76       6.61       3.07   60%
The first lines of this third test are shown below:

Code: Select all

@echo off
setlocal enableDelayedExpansion

set macro.diffTimeRaw=call :diffTimeRaw

call :AllocateEnvironmentSpace 1290

call :TheProcess
exit /B


:TheProcess
setlocal EnableDelayedExpansion
call :FreeEnvironmentSpace

set "test=a"
for /l %%n in (1 1 10) do set "test=!test!!test!"
set buf1=%test%
etc...

Ed Dyreen
Expert
Posts: 1569
Joined: 16 May 2011 08:21
Location: Flanders(Belgium)

Re: Why does SET performance degrade as environment size gro

#14 Post by Ed Dyreen » 01 Nov 2012 05:16

I tried to reduce the negative effects of macros on global performance.

One of the first things I did was to defer loading macros until they were needed instead of preloading them.
When I implemented The Matrix screenSaver I noticed the rendering was still around 10 times slower.

So I built freeMem_( preserveVar, etc.. ), which allows me to unload them.
It did speed up the screenSaver, but the macroLess screenSaver still outperforms mine, as I only unloaded the current local scope.
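
For readers who want to try the unloading idea, here is a minimal sketch. It is not the freeMem_ macro itself, whose code is not shown in this thread. It assumes a hypothetical naming convention where every macro variable starts with "$", and it assumes single-line values, since a multi-line macro body would show up as extra lines in the SET listing and confuse the loop.

```batch
@echo off
setlocal enableDelayedExpansion

rem Hypothetical convention: every macro variable name starts with "$".
set "$hello=echo Hello from a macro"
set "$bye=echo Goodbye from a macro"

rem Use one macro while it is loaded.
%$hello%

rem Unload every variable whose name starts with "$".
rem "set $" lists name=value pairs; delims== keeps only the name.
for /f "delims==" %%v in ('set $ 2^>nul') do set "%%v="

if not defined $hello echo $hello has been unloaded
```

This only clears the variables in the current scope. Copies saved by an enclosing SETLOCAL are untouched, which matches the limitation described above.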

Liviu
Expert
Posts: 470
Joined: 13 Jan 2012 21:24

Re: Why does SET performance degrade as environment size gro

#15 Post by Liviu » 01 Nov 2012 09:44

dbenham wrote: Question - Does anyone have any better insight as to why SET degrades linearly as the environment grows?

As you guessed, the environment is indeed stored in contiguous memory. Also, the list is stored in sorted order, so each insertion causes a search-and-shift-to-make-room. I believe the latter is the dominant factor in the linear slowdown (since I guess the memory block is allocated with some granularity, possibly 4KB pages, so not every new variable would trigger a reallocation).

From http://msdn.microsoft.com/en-us/library/windows/desktop/ms682009(v=vs.85).aspx: Each process has an environment block associated with it. The environment block consists of a null-terminated block of null-terminated strings (meaning there are two null bytes at the end of the block), where each string is in the form: name=value. All strings in the environment block must be sorted alphabetically by name. The sort is case-insensitive, Unicode order, without regard to locale.
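
The sorted order can be observed from a batch file with a small sketch like the one below. The demo_ variable names are invented for illustration. The variables are defined out of order, one of them with a different letter case, and then listed with SET.

```batch
@echo off
setlocal
rem Define three hypothetical variables out of alphabetical order.
set "demo_c=3"
set "demo_A=1"
set "demo_b=2"
rem List every variable whose name starts with demo_.
set demo_
```

If the list is kept sorted without regard to case, as the quoted documentation states, the listing should show demo_A, demo_b, demo_c in that order rather than in definition order.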

Liviu
