Optimization of Mass Copy Via Batch

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV

Optimization of Mass Copy Via Batch

#1 Post by Samir » 18 May 2022 13:22

So this is more of a theoretical discussion to get an algorithm down as the coding itself should be fairly simple.

So there are x number of NAS units, each with a unique IP\path, i.e. NAS1 with path \\IP_NAS1\PATH_NAS1, NAS2 with path \\IP_NAS2\PATH_NAS2, etc. The paths will be unique, as will the IPs.

The data set is the same, but only one unit is the source or original. The others are copies that need to be replicated to. The source unit will always have the same IP and path.

Since there are multiple units, once replication has occurred to another NAS unit, that unit too can be used as a source, so now you have 2x sources. Once copies have been made to 4, you have 4x sources, etc. This can have a tremendous advantage in speed over simply copying the source over and over to each destination.

There is a certain point where you will have a 1:1 ratio between sources and destinations, which would allow the optimal speed for replication since each unit can run at full speed (negating any network bandwidth issues).

But how do you know when you've hit that 1:1 ratio? And how do you dynamically assign destinations to become sources?
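
One way to reason about the 1:1 point, assuming every copy takes roughly the same time and each source feeds exactly one destination per round: the number of sources doubles each round, and the ratio is reached in the first round where the sources are at least as many as the destinations still waiting. A minimal sketch of that arithmetic (TOTAL is a hypothetical count of online units, including the original source):

```batch
@echo off
setlocal
REM TOTAL is a hypothetical number of online NAS units, source included.
set /a TOTAL=20
set /a sources=1
set /a round=0

:next_round
set /a remaining=TOTAL-sources
if %remaining% LEQ 0 goto :done
set /a round+=1
REM Each source feeds one destination per round.
set /a jobs=sources
if %jobs% GTR %remaining% set /a jobs=remaining
if %sources% GEQ %remaining% (
    echo Round %round%: %jobs% copy jobs, 1:1 reached, this is the last round
) else (
    echo Round %round%: %jobs% copy jobs
)
REM Every finished destination becomes a source for the next round.
set /a sources+=jobs
goto :next_round

:done
echo All %TOTAL% units hold the data after %round% rounds.
endlocal
```

With TOTAL=20 this reports 5 rounds (1, 2, 4, 8, then 4 jobs), compared with 19 back-to-back copies from a single source. Real copy times will differ per unit, so treat the round count as a lower bound rather than a schedule.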

Now let's throw another thing into the mix--not all x units will be on all the time, so you may have let's say 5 on one day and 20 another.

There are also some constants--certain nas units that will always be on 24x7 as well as the source which will be on 24x7. Let's say there are 5 of these (because I can't find my paper where I put the real number).

Currently, I have a batch file that replicates from the source NAS to a second NAS, and both are then used to replicate to the other NAS units. But I haven't figured out how to make the scaling automatic beyond this. I think checking whether a NAS unit is available will be a simple 'IF EXIST \\IP\PATH\nul', and there's a single ROBOCOPY command which handles the replication.
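
A rough sketch of the availability check, so that new units only need a new line in a list file instead of a code change. The file name nas_list.txt and the online_N variables are assumptions made for illustration. The test uses a trailing backslash rather than \nul, a form commonly used on current Windows versions to match directories only:

```batch
@echo off
setlocal enabledelayedexpansion
REM nas_list.txt is a hypothetical file with one UNC path per line,
REM for example \\IP_NAS2\PATH_NAS2. The source itself is not listed.
set /a online=0
for /f "usebackq delims=" %%p in ("nas_list.txt") do (
    REM The trailing backslash makes IF EXIST test for a directory.
    if exist "%%p\" (
        set /a online+=1
        set "online_!online!=%%p"
        echo ONLINE:  %%p
    ) else (
        echo OFFLINE: %%p
    )
)
echo %online% units available as destinations.
endlocal
```

Expect a pause of several seconds for each unit that is powered off, since the check has to wait for the network timeout.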

Okay, that's it! I'd love to hear ideas on how to approach this better than just hard coding everything and then having to change it each time a new NAS unit is added. Thank you in advance!

penpen
Expert
Posts: 1991
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Optimization of Mass Copy Via Batch

#2 Post by penpen » 19 May 2022 06:07

I am not familiar with NAS devices, so the following idea might or might not be possible. Typically, when you want to send the same data to multiple recipients simultaneously, you would use broadcast (send to all nodes in the network) or multicast (send to multiple nodes in a network) in order to reduce network usage. Here I would suggest using multicast.
But you won't be able to do that using batch, so you would have to write a program in C++, Java, or whatever.

penpen

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV

Re: Optimization of Mass Copy Via Batch

#3 Post by Samir » 19 May 2022 14:50

The way I would be using the NAS devices would be as standard Windows network shares. I'm sure I could do something more advanced with rsync and the like, but I just want to keep it batch simple so when something goes wrong I have a better idea of how to fix it.

The main challenge is to optimize all the 'sources' when copying to 'destinations'. I've seen some algorithms based on linear algebra, but that was never my strong point so it's back to just doing something manually and hardcoded unless someone comes up with a better idea here.

miskox
Posts: 553
Joined: 28 Jun 2010 03:46

Re: Optimization of Mass Copy Via Batch

#4 Post by miskox » 21 May 2022 10:59

I came up with something:

source.original (contains source folder(s)):

Code: Select all

SOURCE1

target.original (contains target folder(s)):

Code: Select all

TARGET1
TARGET2
TARGET3
TARGET4
TARGET5
TARGET6
TARGET7
TARGET8
TARGET9
TARGET10
TARGET11

copy2many.cmd:

Code: Select all

@echo off
if "%1"=="" (
	copy /y source.original source.txt>nul
	copy /y target.original target.txt>nul
REM remove line below if not needed
	if exist logfile del logfile
)

:0
set /a cnt=1
setlocal enabledelayedexpansion
for /f "tokens=1 delims=" %%f in (source.txt) do set source_path_!cnt!=%%f&set /a cnt+=1
set /a src_max=cnt - 1
set /a cnt=1
for /f "tokens=1 delims=" %%f in (target.txt) do set target_path_!cnt!=%%f&set /a cnt+=1
setlocal disabledelayedexpansion

if not "%1"=="" goto :DO_COPY %1 %2

echo Executing jobs...waiting for completion. 

REM timeout of three seconds to make sure logfile (if present) is not locked and it is better for system resources.
for /L %%c in (1,1,%src_max%) do start "" /min "%~f0" %%c&echo Started job #%%c.&timeout /t 3 /nobreak>nul

set /a countr=1
timeout /t 10 /nobreak>nul
:1
call set srccompl=%%source_path_%countr%%%
call set trgcompl=%%target_path_%countr%%%
if "%trgcompl%"=="" (
	echo Everything done.
	if exist source.txt del source.txt
	if exist target.txt del target.txt
	goto :EOF
)
REM change timeout below to a higher value (5 seconds just for testing).
if not exist job_complete_%countr%.tmp echo Waiting 5 seconds...&&timeout /t 5&goto :1

echo Job %countr% (from %srccompl% to %trgcompl%) completed.

del job_complete_%countr%.tmp
findstr /x /i /v /c:"%trgcompl%" source.txt >source.tmp&move /y source.tmp source.txt>nul
>>source.txt (echo %trgcompl%)
findstr /x /i /v /c:"%trgcompl%" target.txt >target.tmp&move /y target.tmp target.txt>nul
set /a countr+=1

if %countr% LEQ %src_max% goto :1

echo NEXT ROUND...

endlocal
endlocal
goto :0

REM Below is the actual part that copies files...
:DO_COPY
set par=%1

ECHO started %1
call set sourcepath=%%source_path_%par%%%
call set targetpath=%%target_path_%par%%%

REM if target path is empty (end of table) then exit
if "%targetpath%"=="" exit

REM remove ECHO and change copy command to the correct one
ECHO copy "%sourcepath%\*.*" "%targetpath%"
if "%errorlevel%"=="0" break >job_complete_%par%.tmp
REM remove line below if not needed
ECHO copy "%sourcepath%\*.*" "%targetpath%" >>logfile

exit

What does it do?

It copies source.original (and target.original) to *.txt because it updates these two .txt files as the job progresses (each completed target path is added to source.txt and deleted from target.txt).
In the line where jobs are STARTed there could be an IF statement so that jobs without a defined target are not started; currently such jobs simply exit.

I am sure aacini, dbenham, agerman... would make it more optimal, but it looks like it is working.

Of course, update the copy command etc. Read the source to see what else should be changed.

I just reread your post: there is no check that the target exists (you say some NAS devices can be online while others are offline). As written, the copy operation would simply fail for an offline target, and in that case the job_complete_<n>.tmp file would not be created, so the main script would keep waiting for that job (change the ERRORLEVEL test if required, or remove that line).

Maybe just add another IF target exists...
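
A minimal sketch of what that extra IF could look like inside :DO_COPY, assuming ROBOCOPY is the copy command. The fixed paths, the job_skipped marker file, and the ROBOCOPY switches are all assumptions for illustration, not part of the script above:

```batch
@echo off
setlocal
REM Hypothetical stand-ins for the values :DO_COPY gets from its tables.
set "sourcepath=\\IP_NAS1\PATH_NAS1"
set "targetpath=\\IP_NAS2\PATH_NAS2"
set "par=1"

if not exist "%targetpath%\" (
    echo Target %targetpath% is offline, skipping.
    REM Hypothetical marker so the waiting loop can tell "skipped" from "still running".
    break >job_skipped_%par%.tmp
    goto :finish
)

REM /E copies subfolders; the switches actually needed are an assumption here.
robocopy "%sourcepath%" "%targetpath%" /E /R:2 /W:5
REM ROBOCOPY exit codes below 8 mean no copy failures (1 = files were copied).
if %errorlevel% LSS 8 break >job_complete_%par%.tmp

:finish
endlocal
```

Note that a test for errorlevel 0 would treat a successful ROBOCOPY run as a failure, because ROBOCOPY returns 1 when it has copied files. The waiting loop would also have to look for the skip marker, and a skipped target must not be added to source.txt.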

Saso
Last edited by miskox on 22 May 2022 09:30, edited 1 time in total.

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV

Re: Optimization of Mass Copy Via Batch

#5 Post by Samir » 21 May 2022 13:57

Thank you! This is an approach I never thought of, and hence why I posted since there are so many ways to skin this cat. :lol:

I'm going to digest this better when I have time to sit down with it, but as you mentioned, I would be using some 'if not exist' to test for the presence of a target so that test should also handle a situation where the target is undefined. 8)

miskox
Posts: 553
Joined: 28 Jun 2010 03:46

Re: Optimization of Mass Copy Via Batch

#6 Post by miskox » 21 May 2022 14:10

Just enter as many lines in source.original and in target.original files as you wish and run it. Check the logfile. You will see that everything is done (right column contains all the lines from target.original).

Saso

miskox
Posts: 553
Joined: 28 Jun 2010 03:46

Re: Optimization of Mass Copy Via Batch

#7 Post by miskox » 22 May 2022 09:33

Update: I updated the code above (added timeout /t 10). We must prevent source.txt and target.txt from being updated *before* all the jobs are running.

Another two ideas:
- to prevent this, jobs could be started with two parameters (source, target)
- also, when a job completes, the next one could be started immediately to save time
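
The first idea could be sketched roughly like this: the launcher hands each job its source and target on the command line, so a job no longer depends on the contents of source.txt and target.txt at the moment it starts. The SOURCE1/TARGET1/TARGET2 pairs are placeholders, and the :worker label is a hypothetical name:

```batch
@echo off
REM With two arguments this script acts as a worker; without them it is the launcher.
if not "%~2"=="" goto :worker

REM Placeholder pairs for illustration; a real launcher would read them
REM from source.txt and target.txt.
start "" /min "%~f0" "SOURCE1" "TARGET1"
start "" /min "%~f0" "SOURCE1" "TARGET2"
goto :EOF

:worker
REM %~1 and %~2 were fixed at launch time, so later edits to source.txt
REM or target.txt cannot change what this job copies.
REM Remove ECHO and change the copy command to the correct one.
ECHO copy "%~1\*.*" "%~2"
exit
```

With the pair fixed at launch, the fixed delay before the text files are updated should no longer be needed, though that is untested here.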

Of course more error checking should be implemented (if source folder exists, if target node/folder exists...).

Saso

Samir
Posts: 384
Joined: 16 Jul 2013 12:00
Location: HSV

Re: Optimization of Mass Copy Via Batch

#8 Post by Samir » 23 May 2022 13:13

Apparently I've asked this type of question before (but less specifically), and there was also an idea presented in that thread as well. I'm noting this here so I can digest that solution as well:
viewtopic.php?t=9164
