Performance Issues with Code

SIMMS7400
Posts: 475
Joined: 07 Jan 2016 07:47

Performance Issues with Code

#1 Post by SIMMS7400 » 04 Aug 2020 07:41

Hi Folks -

A few months back, Antonio helped me with a chunk of code that determines the "min" and "max" month in a data file. I expanded it a bit to accept months in the following formats:
1, 01, Jan and January
It's working fine. However, performance is not so great: 1,600 lines take a few minutes to run, and I have data files that are 40k rows at times. Is there any way to speed up this code? I assume the call to the string length function slows it down quite a bit?

Sample file:
"Years"|"Period"|"Scenario"|"Version"|"Plan Element"|"Account"|"Entity"|"Funding"|"State"|"Segment"|"Department"|"Product"|"Amount"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"7020"|"150"|"No Fund"|"No State"|"No Segment"|"150ADCO152"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"7140"|"150"|"No Fund"|"No State"|"No Segment"|"150ADCO152"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"7750"|"150"|"No Fund"|"No State"|"No Segment"|"150ADCO152"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"6010"|"100"|"No Fund"|"No State"|"No Segment"|"100HCCN550"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"6010"|"100"|"No Fund"|"No State"|"No Segment"|"100HCCO100"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"6010"|"100"|"No Fund"|"No State"|"No Segment"|"100HCCO101"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"6010"|"100"|"No Fund"|"No State"|"No Segment"|"100HCCO105"|"No Product"|"100"
"2020"|"Jan"|"Actual"|"PreAlloc"|"TLoad"|"6010"|"100"|"No Fund"|"No State"|"No Segment"|"100HCCO150"|"No Product"|"100"
"2020"|"Feb"|"Actual"|"PreAlloc"|"TLoad"|"6010"|"100"|"No Fund"|"No State"|"No Segment"|"100HCCO154"|"No Product"|"100"
Code:

Code: Select all

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
		SET /A "MAXM=-99999", "MINM=99999"
		FOR /F "skip=1 USEBACKQ tokens=1-2 delims=|" %%a IN ("test.txt") DO ( 
            SET "MONTH=%%~b"
            ECHO "!MONTH!"| FINDSTR /r "^[1-9][0-9]*$">NUL || (
                SET "MONTH=!MONTH:~0,3!"
                IF "!MONTH!"=="Jan" SET "MONTH=01"
                IF "!MONTH!"=="Feb" SET "MONTH=02"
                IF "!MONTH!"=="Mar" SET "MONTH=03"
                IF "!MONTH!"=="Apr" SET "MONTH=04"
                IF "!MONTH!"=="May" SET "MONTH=05"
                IF "!MONTH!"=="Jun" SET "MONTH=06"
                IF "!MONTH!"=="Jul" SET "MONTH=07"
                IF "!MONTH!"=="Aug" SET "MONTH=08"
                IF "!MONTH!"=="Sep" SET "MONTH=09"
                IF "!MONTH!"=="Oct" SET "MONTH=10"
                IF "!MONTH!"=="Nov" SET "MONTH=11"
                IF "!MONTH!"=="Dec" SET "MONTH=12"
            ) && (
                CALL :STRLEN RESULT MONTH
                IF "!RESULT!"=="1" SET "MONTH=0%%~b"
            )

            IF !MONTH! GTR !MAXM! SET "MAXM=!MONTH!"
            IF !MONTH! LSS !MINM! SET "MINM=!MONTH!"
		)
        
        echo %MINM%
        echo %MAXM%
        pause
        rem Stop here so execution does not fall through into :STRLEN
        exit /b
        
        
 :STRLEN <resultVar> <stringVar>
(   
    SET "S=!%~2!#"
    SET "LEN=0"
    FOR %%P IN (4096 2048 1024 512 256 128 64 32 16 8 4 2 1) DO (
        IF "!S:~%%P,1!" NEQ "" ( 
            SET /a "LEN+=%%P"
            SET "S=!S:~%%P!"
        )
    )
)
( 
    ENDLOCAL
    SET "%~1=%LEN%"
    EXIT /B
)

ShadowThief
Expert
Posts: 986
Joined: 06 Sep 2013 21:28
Location: Virginia, United States

Re: Performance Issues with Code

#2 Post by ShadowThief » 04 Aug 2020 08:14

Since you're only checking to see if the string has one character, you can replace

Code: Select all

CALL :STRLEN RESULT MONTH
IF "!RESULT!"=="1" SET "MONTH=0%%~b"
with

Code: Select all

if "!MONTH:~1,1!"=="" set "MONTH=0%%~b"
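
A minimal sketch of why this substring test works, using made-up values rather than the data file. `!MONTH:~1,1!` is the second character of the string, so it expands to nothing when the month has only one character. The loop values 7 and 11 are assumptions chosen for illustration, and the sketch pads with `0!MONTH!` instead of `0%%~b` because there is no file token here.

```batch
@echo off
setlocal EnableDelayedExpansion
rem Illustrative values only: a one-character and a two-character month.
rem The substring at offset 1 is empty when the string has a single character.
for %%M in (7 11) do (
    set "MONTH=%%M"
    if "!MONTH:~1,1!"=="" set "MONTH=0!MONTH!"
    echo !MONTH!
)
```

This should print 07 and then 11, with no CALL and no loop over possible string lengths.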

#3 Post by ShadowThief » 04 Aug 2020 08:51

Or, if you don't mind adding a ton of array entries, you can manually set the value for each possible input (on my machine, this can process half a million lines in 26 seconds).

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
echo %TIME%

set /a "month_val[January]=1",   "month_val[Jan]=1",  "month_val[1]=1", "month_val[01]=1"
set /a "month_val[February]=2",  "month_val[Feb]=2",  "month_val[2]=2", "month_val[02]=2"
set /a "month_val[March]=3",     "month_val[Mar]=3",  "month_val[3]=3", "month_val[03]=3"
set /a "month_val[April]=4",     "month_val[Apr]=4",  "month_val[4]=4", "month_val[04]=4"
set /a "month_val[May]=5",                            "month_val[5]=5", "month_val[05]=5"
set /a "month_val[June]=6",      "month_val[Jun]=6",  "month_val[6]=6", "month_val[06]=6"
set /a "month_val[July]=7",      "month_val[Jul]=7",  "month_val[7]=7", "month_val[07]=7"
set /a "month_val[August]=8",    "month_val[Aug]=8",  "month_val[8]=8", "month_val[08]=8"
set /a "month_val[September]=9", "month_val[Sep]=9",  "month_val[9]=9", "month_val[09]=9"
set /a "month_val[October]=10",  "month_val[Oct]=10",                   "month_val[10]=10"
set /a "month_val[November]=11", "month_val[Nov]=11",                   "month_val[11]=11"
set /a "month_val[December]=12", "month_val[Dec]=12",                   "month_val[12]=12"

SET /A "MAXM=-99999", "MINM=99999"
FOR /F "skip=1 USEBACKQ tokens=1-2 delims=|" %%a IN ("test.txt") DO ( 
    SET "MONTH=!month_val[%%~b]!"
    IF !MONTH! GTR !MAXM! SET "MAXM=!MONTH!"
    IF !MONTH! LSS !MINM! SET "MINM=!MONTH!"
)

echo %MINM%
echo %MAXM%
echo %TIME%
exit /b

#4 Post by SIMMS7400 » 04 Aug 2020 10:13

Shadow -

Holy Shit** - this is an incredible performance increase!!! It just took a process from ~2 hours down to 8 seconds, wow! Thank you!

One thing is, I copied your code before you made your second edit, adding the SET /A logic. Prior to your edit, "Jan" in the file was returning as "01" which is correct. However now it's returning as "1", which poses an issue for me.

Is there a way to account for that in your new updates, or should I use the logic you provided earlier to ensure they are 2 digits:

Code: Select all

if "!MONTH:~1,1!"=="" set "MONTH=0%%~b"
Thank you again Shadow, this is awesome!!

#5 Post by ShadowThief » 04 Aug 2020 10:35

I had to do that because SET /A doesn't allow 08 or 09, since prefixing a number with 0 means it gets interpreted as octal, and octal numbers can't contain the digits 8 or 9.

If you need MINM and MAXM to be zero-padded, you can just add

Code: Select all

if !MINM! LSS 10 set "MINM=0!MINM!"
if !MAXM! LSS 10 set "MAXM=0!MAXM!"
after the for loop.
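
To illustrate the octal behaviour described above, here is a small sketch. The variable names A, B and M are illustrative and not from the original script. It relies on the cmd.exe rule that a numeric literal with a leading zero is read as octal, so 010 evaluates to 8 while 08 is rejected. The last part keeps the value unpadded for arithmetic and pads it only for display.

```batch
@echo off
setlocal EnableDelayedExpansion

rem A leading zero makes SET /A read the literal as octal, so 010 is eight.
set /a "A=010"
echo 010 is read as !A!

rem 08 is not a valid octal literal, so this SET /A reports an error.
rem The error text is hidden here, and B is expected to keep its old value.
set "B=unset"
set /a "B=08" 2>nul
echo B=!B!

rem Do the arithmetic on the plain number and zero-pad only at the end.
set /a "M=8"
if !M! LSS 10 set "M=0!M!"
echo padded: !M!
```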

Eureka!
Posts: 102
Joined: 25 Jul 2019 18:25

#6 Post by Eureka! » 04 Aug 2020 11:54

Purely out of curiosity ...

Would replacing this:

Code: Select all

FOR /F "skip=1 USEBACKQ tokens=1-2 delims=|" %%a IN ("test.txt") DO ( 
    SET "MONTH=!month_val[%%~b]!"
with:

Code: Select all

FOR /F "skip=1 USEBACKQ tokens=2 delims=|" %%a IN ("test.txt") DO ( 
    SET "MONTH=!month_val[%%~a]!"
make the code any faster? (it would save on setting / un-setting an extra variable every round/line)

#7 Post by SIMMS7400 » 04 Aug 2020 12:03

Shadow -

Awesome, that's exactly what I did and it's working great. Thank you! Just for my own curiosity, what's the advantage of the SET /A logic vs the previous way of just listing out the different arrays other than a smaller block of code? Any performance increases?

#8 Post by ShadowThief » 04 Aug 2020 12:04

No performance increase whatsoever; it just saves some lines by setting multiple values on the same line. It's purely for aesthetic reasons.

#9 Post by ShadowThief » 04 Aug 2020 12:06

Eureka! wrote:
04 Aug 2020 11:54
Purely out of curiosity ...

Would replacing this:

Code: Select all

FOR /F "skip=1 USEBACKQ tokens=1-2 delims=|" %%a IN ("test.txt") DO ( 
    SET "MONTH=!month_val[%%~b]!"
with:

Code: Select all

FOR /F "skip=1 USEBACKQ tokens=2 delims=|" %%a IN ("test.txt") DO ( 
    SET "MONTH=!month_val[%%~a]!"
make the code any faster? (it would save on setting / un-setting an extra variable every round/line)
In my tests, it saved about 2-3 seconds on a million-line file, so I don't think they'd see any benefits on their side, unfortunately.

#10 Post by ShadowThief » 04 Aug 2020 12:12

However, "not setting a variable" gave me an idea to not set the !MONTH! variable at all, and now the script runs in about half the time.

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
echo %TIME%

:: Month hashes
set /a "month_val[January]=1",   "month_val[Jan]=1",  "month_val[1]=1", "month_val[01]=1"
set /a "month_val[February]=2",  "month_val[Feb]=2",  "month_val[2]=2", "month_val[02]=2"
set /a "month_val[March]=3",     "month_val[Mar]=3",  "month_val[3]=3", "month_val[03]=3"
set /a "month_val[April]=4",     "month_val[Apr]=4",  "month_val[4]=4", "month_val[04]=4"
set /a "month_val[May]=5",                            "month_val[5]=5", "month_val[05]=5"
set /a "month_val[June]=6",      "month_val[Jun]=6",  "month_val[6]=6", "month_val[06]=6"
set /a "month_val[July]=7",      "month_val[Jul]=7",  "month_val[7]=7", "month_val[07]=7"
set /a "month_val[August]=8",    "month_val[Aug]=8",  "month_val[8]=8", "month_val[08]=8"
set /a "month_val[September]=9", "month_val[Sep]=9",  "month_val[9]=9", "month_val[09]=9"
set /a "month_val[October]=10",  "month_val[Oct]=10",                   "month_val[10]=10"
set /a "month_val[November]=11", "month_val[Nov]=11",                   "month_val[11]=11"
set /a "month_val[December]=12", "month_val[Dec]=12",                   "month_val[12]=12"

SET /A "MAXM=-99999", "MINM=99999"
FOR /F "skip=1 usebackq tokens=2 delims=|" %%a IN ("test.txt") DO ( 
    IF !month_val[%%~a]! GTR !MAXM! SET "MAXM=!month_val[%%~a]!"
    IF !month_val[%%~a]! LSS !MINM! SET "MINM=!month_val[%%~a]!"
)

if !MINM! LSS 10 set "MINM=0!MINM!"
if !MAXM! LSS 10 set "MAXM=0!MAXM!"

echo %MINM%
echo %MAXM%
echo %TIME%
exit /b

#11 Post by SIMMS7400 » 04 Aug 2020 12:45

Hi both -

I do need token 1 as I want to extract the year, but this runs so quickly now that keeping it in makes little difference.

Absolutely incredible performance gains on this, thank you Shadow!!! Very much appreciated!

Aacini
Expert
Posts: 1670
Joined: 06 Dec 2011 22:15
Location: México City, México

#12 Post by Aacini » 04 Aug 2020 14:41

Perhaps this would run a little faster...

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Empty environment
(
   for /F "delims==" %%a in ('set') do set "%%a="
   set "ComSpec=%ComSpec%"
)

set /A "i=0, j=100"
for %%a in (January February March April May June July August September October November December) do (
   set /A i+=1, j+=1
   set "month=%%a"
   set /A "m%%a=i, m!month:~0,3!=i, m!i!=i, m!j:~1!=i"
)

SET /A "zMAXM=1, zMINM=12"
FOR /F "skip=1 USEBACKQ tokens=2 delims=|" %%a in ("test.txt") DO (
   set /A "zDiff=zMAXM-m%%~a, zMAXM+=(zDiff>>31)*zDiff, zDiff=m%%~a-zMINM, zMINM-=(zDiff>>31)*zDiff"
)

if %zMAXM% lss 10 set "zMAXM=0%zMAXM%"
if %zMINM% lss 10 set "zMINM=0%zMINM%"

echo Min: %zMINM%
echo Max: %zMAXM%
Antonio

PS - Please, post the timing... Thanks
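
The min/max update in the loop above avoids IF by using the sign bit of the difference. Here is a worked sketch of one step with hypothetical values. The names cur, new, diff and mask are introduced here for illustration and are not part of the posted code. It assumes SET /A's 32-bit signed arithmetic, where shifting a negative number right by 31 gives -1.

```batch
@echo off
setlocal
rem Illustrative values: the current maximum is 3 and the new value is 7.
set /a "cur=3, new=7"

rem diff is negative only when new is greater than cur.
set /a "diff=cur-new"

rem Shifting right by 31 gives -1 for a negative value and 0 otherwise.
set /a "mask=diff>>31"

rem When mask is -1 this adds the gap and cur becomes new.
rem When mask is 0 nothing is added and cur is unchanged.
set /a "cur+=mask*diff"

echo diff=%diff% mask=%mask% cur=%cur%
```

With these values the difference is -4, the shift gives -1, and cur becomes 7. The minimum is handled the same way with the operands swapped, as in the zMINM part of the posted loop.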

#13 Post by ShadowThief » 04 Aug 2020 15:35

My tests have your code process one million lines in an average of 60 seconds.

#14 Post by Eureka! » 04 Aug 2020 16:25

ShadowThief wrote:
04 Aug 2020 12:06
In my tests, it saved about 2-3 seconds on a million-line file, so I don't think they'd see any benefits on their side, unfortunately.
Thanks, @ShadowThief!

Inspired by @Aacini's solution, some pseudo-code, as I don't have the time and experience to convert this to proper code:

Code: Select all

Instead of month n=1..12, set month= 2^^n -1 (1.. 4095)
set /a min=4095, max=0
For loop:
  set /a min="min & month", max="max | month"

After the for-loop, convert 2^^n - 1 back to n.
 
Might be faster ..
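
One possible way to turn the pseudo-code above into a batch script, offered as a sketch rather than a benchmarked solution. The mask_ variable names and the inline sample list are assumptions. In the real script the values would come from the Period column of test.txt, and only three-letter month names are mapped here to keep it short.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Month n maps to a mask with the n lowest bits set.
set /a "n=0"
for %%m in (Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) do (
    set /a "n+=1"
    set /a "mask_%%m=(1<<n)-1"
)

rem Hypothetical sample input standing in for the Period column.
rem AND keeps only the bits common to every month seen, OR keeps them all.
set /a "minMask=4095, maxMask=0"
for %%m in (Mar Nov Feb Jun) do (
    set /a "minMask&=mask_%%m, maxMask|=mask_%%m"
)

rem Convert each mask back to a month number by counting its set bits.
set /a "minM=0, maxM=0"
for /L %%i in (0,1,11) do (
    set /a "minM+=(minMask>>%%i)&1, maxM+=(maxMask>>%%i)&1"
)

echo Min: %minM%
echo Max: %maxM%
```

For the sample list the AND of the masks is 3 and the OR is 2047, so the script should report 2 as the minimum and 11 as the maximum.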

#15 Post by ShadowThief » 04 Aug 2020 17:32

I'm getting between 60 and 80 seconds for a million rows for this, likely because two set statements are being run every single iteration.
