RAID 1.3 via DOS?
So here's the dilemma. RAID 1 is really great for redundancy--except when one of the bits on one of the drives gets corrupted. Then you don't know which is the 'correct' file!
I no longer use automated RAID1 because of this. So instead I manually mirror using xxcopy to 3 drives--what I dub RAID 1.3. So now if a bit changes on one of the drives, a simple compare to the other two drives usually sniffs out the bad copy. The problem is automating this task.
Because you guys are the sharpest batch file people I've ever seen, I'd like to hear your ideas on implementing a batch operation to do this 'error-checking'.
It would basically be given a set of three drive letters/paths. It would use one as the source. It would traverse down the tree and compare each file with the same file (with the same path since it's a mirror copy) on the other drives. It will note any discrepancies and automatically go into a comparison mode with all three drives being source to determine the bad file in a set of discrepancies. It will then, as an option, automatically replace the bad file with a good copy, with another option to rename the bad file to *.BAD. This would have to work on long filenames and be able to run in win9x/2000/xp environments or just xp if it's not feasible for the win9x/2000 command.com.
I know this is actually beyond my own batch file skills, but I'd love to attempt it with your help. Thank you!
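A minimal sketch of the majority-vote step described above, written in Python rather than batch purely for brevity. The function names `find_bad_copy` and `repair` are hypothetical, and appending `.BAD` to the full file name is an assumption about how the rename option should behave. It handles one file that already exists on all three drives, and leaves out the directory traversal.

```python
import filecmp
import os
import shutil


def find_bad_copy(paths):
    """Return the index of the copy that disagrees with the other two,
    None if all three copies match, or -1 if no two copies agree."""
    filecmp.clear_cache()  # avoid stale results if a file was just rewritten
    a, b, c = paths
    # shallow=False forces a byte-by-byte comparison instead of a stat() check
    ab = filecmp.cmp(a, b, shallow=False)
    if ab and filecmp.cmp(b, c, shallow=False):
        return None
    if ab:
        return 2          # a == b, so c is the odd one out
    if filecmp.cmp(a, c, shallow=False):
        return 1          # a == c, so b is the odd one out
    if filecmp.cmp(b, c, shallow=False):
        return 0          # b == c, so a is the odd one out
    return -1             # all three differ: unverifiable


def repair(paths, keep_bad=True):
    """Replace the outvoted copy with a good one; optionally keep it as *.BAD."""
    bad = find_bad_copy(paths)
    if bad is None or bad == -1:
        return bad
    good = paths[(bad + 1) % 3]  # either of the two agreeing copies will do
    if keep_bad:
        os.replace(paths[bad], paths[bad] + ".BAD")
    shutil.copy2(good, paths[bad])  # copies the data and the timestamps
    return bad
```

Called as `repair([r"X:\docs\a.txt", r"Y:\docs\a.txt", r"Z:\docs\a.txt"])`, it returns the index of the drive that was repaired, `None` when nothing needed fixing, or `-1` when the three copies all differ and no vote is possible.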
Re: RAID 1.3 via DOS?
Doing this in batch is not recommended.
If you want a fast and secure way to do this, I recommend learning C++ or something similar, plus a little about CRC32 and MD5. With those you can detect corruption even in small parts of a file. You also need to read far fewer bytes from disk than with batch jobs. C++ is faster anyway and has other advantages, but I don't want to say too much about C++, as it is far away from scripting.
Besides, files can easily become unverifiable if two disks produce errors.
If I were forced to program it in batch, I would intuitively do it this way, or in a similar way (warning: untested):
Code:
:main
@echo off
cls
setlocal enableDelayedExpansion
:: drive set
set "driveSet=X Y Z"
set "hdds=%driveSet: =%"
for %%a in (0 1 2) do set "hdd[%%a]=!hdds:~%%a,1!"
set "directories=directories.txt"
set "files=files.txt"
set "tmpFile=temp.txt"
set "log=log.txt"
:: build %files% and %directories% to verify, eliminating multiple entries
(rem:)>"%log%"
(
echo building %directories%
(for %%a in (0 1 2) do for /f "tokens=1,* delims=:" %%b in ('dir !hdd[%%a]!:\ /A:D /B /O:N /S') do echo %%c) > "%directories%"
sort "%directories%" /O "%tmpFile%"
(
set "lastLine="
for /f "tokens=* delims=" %%b in ('findstr "^" "%tmpFile%"') do (
if not "%%b" == "!lastLine!" (
echo %%~b
set "lastLine=%%~b"
)
)
) > "%directories%"
del "%tmpFile%"
echo building %directories%: finished
) >> "%log%"
(
echo building %files%
(for %%a in (0 1 2) do for /f "tokens=1,* delims=:" %%b in ('dir !hdd[%%a]!:\ /A:-D /B /O:N /S') do echo %%c) > "%files%"
sort "%files%" /O "%tmpFile%"
(
set "lastLine="
for /f "tokens=* delims=" %%a in ('findstr "^" "%tmpFile%"') do (
if not "%%a" == "!lastLine!" (
echo %%~a
set "lastLine=%%~a"
)
)
) > "%files%"
del "%tmpFile%"
echo building %files%: finished
) >> "%log%"
:: check for directory/file name collisions and missing directories; create missing directories if possible
(
echo checking directories
set "errors="
set "arg="
for /f "tokens=* delims=" %%b in ('findstr "^" "%directories%"') do (
set "args="
for %%a in (0 1 2) do (
for %%c in ("!hdd[%%a]!:%%~b") do (
if not exist %%c md %%c
if not exist %%c (
set "args=!args!F"
) else (
set "arg=%%~ac"
set "args=!args!!arg:~0,1!"
)
)
)
if not "!args:F=!" == "!args!" (
echo checking directories: could not create missing directory: %%b
set "errors=true"
set "args=!args:F=!"
)
if defined args if not "!args:D=!" == "" (
echo checking directories: detected directory-file-name collision: %%b
set "errors=true"
)
)
if defined errors set "errors=with errors"
echo checking directories: finished !errors!
) >> "%log%"
if defined errors (
echo There were errors that cannot be fixed automatically.
echo See %log% for further information.
exit /b 1
)
:: checking files, create files if needed and possible
:: ok is a 3-bit mask in positive logic: bit 2 = drives 2 and 0 match, bit 1 = drives 1 and 2 match, bit 0 = drives 0 and 1 match
:: source(ok): drive to copy from for each value of ok that allows a repair
set "source(1)=%hdd[1]%"
set "source(2)=%hdd[2]%"
set "source(3)=%hdd[2]%"
set "source(4)=%hdd[0]%"
set "source(5)=%hdd[0]%"
:: target(ok): drive to copy to for the same value of ok
set "target(1)=%hdd[2]%"
set "target(2)=%hdd[0]%"
set "target(3)=%hdd[0]%"
set "target(4)=%hdd[1]%"
set "target(5)=%hdd[1]%"
(
echo checking files
for /f "tokens=* delims=" %%b in ('findstr "^" "%files%"') do (
set "ok=7"
rem a missing file invalidates both pairs it belongs to
if not exist "%hdd[0]%:%%b" set /A "ok&=2"
if not exist "%hdd[1]%:%%b" set /A "ok&=4"
if not exist "%hdd[2]%:%%b" set /A "ok&=1"
rem binary compare of the remaining pairs, a mismatch clears the bit of that pair
if !ok! GEQ 4 (
fc /B "%hdd[0]%:%%b" "%hdd[2]%:%%b" > nul
if errorlevel 1 set /A "ok&=3"
)
set /A "check=!ok!&2"
if !check! == 2 (
fc /B "%hdd[1]%:%%b" "%hdd[2]%:%%b" > nul
if errorlevel 1 set /A "ok&=5"
)
if !ok! == 1 (
fc /B "%hdd[0]%:%%b" "%hdd[1]%:%%b" > nul
if errorlevel 1 set /A "ok&=6"
)
if not !ok! == 7 (
if !ok! GTR 0 (
for %%c in (!ok!) do (
set "source=!source(%%c)!:%%b"
set "target=!target(%%c)!:%%b"
)
if not exist "!target!" type nul > "!target!"
xcopy "!source!" "!target!" /V /H /R /K /O /X /Y > nul
if errorlevel 1 fc /B "!source!" "!target!" > nul
if errorlevel 1 (
echo checking files: updating failed: source: !source!
echo checking files: updating failed: target: !target!
set "errors=true"
)
) else (
echo checking files: unverifiable %%b
set "errors=true"
)
)
)
if defined errors set "errors=with errors"
echo checking files: finished !errors!
) >> "%log%"
if defined errors (
echo There were errors that cannot be fixed automatically.
echo See %log% for further information.
exit /b 1
)
echo Finished software raid 1.3 update successfully.
goto :eof
penpen
Re: RAID 1.3 via DOS?
penpen wrote: It is not recommended to do that using batch... . If you want a fast and secure way to do this I recommend you learn C++ or something similar and a little bit about CRC32 and MD5. [...] Besides, files can easily become unverifiable if two disks produce errors.
I like batch because it's stable, available, and relatively fast. I've taken some courses in C, but it gets really complicated and just as IO intensive if the entire file is being read. There are MD5 generator/verifiers out there, but that doesn't really do the job of comparing one file to another. And if an md5 file itself gets corrupted, well that becomes another issue in itself.
Definitely an amazing amount of batch to read! Thank you for sharing! It's going to take me weeks if not months to have enough time to parse all the way through it, and I'm sure I'll have some questions. But this is a great start on this project.
Re: RAID 1.3 via DOS?
Samir wrote: ... but it gets really complicated and just as IO intensive if the entire file is being read.
No. If you have file F on 3 drives A, B, C, then you have to compare it pairwise using batch, so you load this file (from different locations) at least 4 times if there is no difference, or up to 6 times if one file location is corrupted.
For example:
Code:
:: A:\F ?= B:\F read two times:
fc /b A:\F B:\F
:: B:\F ?= C:\F read two times:
fc /b B:\F C:\F
:: (not A:\F == B:\F) and (not B:\F == C:\F) because B:\F differs, then read another 2 times
fc /b A:\F C:\F
Using C++ or similar lets you read this file only 3 times, as you can stop reading the file at any position and compare the data parts in buffers: much faster.
Samir wrote: There are MD5 generator/verifiers out there, but that doesn't really do the job of comparing one file to another.
This is only the simplest use of these checksums, and you should not just compare them:
You can apply them to blocks of bytes smaller than whole files, and you may overlap these blocks, ... .
If you do it in an adequate way you may even restore the data of blocks of up to 128 bytes. That is impossible with simple batch; maybe it can be done with VBS or JScript, but I won't bet on it.
Samir wrote: And if an md5 file itself gets corrupted, well that becomes another issue in itself.
No, it is the same problem, because for example the CRC32 checksum is part of the protected data:
the CRC32 algorithm on (data, 0x00000000) gives you the CRC32 checksum, and
the CRC32 algorithm on (data, crc32-checksum) gives you 0x00000000 as the new CRC32 checksum.
Edit: Sorry, with a pure MD5 checksum it is indeed another issue, which is not least why CRC32 should be used in addition.
penpen
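The property penpen describes can be checked with a small Python sketch. It assumes a plain CRC with a zero initial value and no final inversion, and the helper name `crc32_plain` is hypothetical. In this register-style form the four zero bytes of the (data, 0x00000000) notation are implicit. Variants with a final inversion, such as the common zlib CRC-32, leave a fixed non-zero constant instead of zero, so the exact check value depends on the variant.

```python
POLY = 0x04C11DB7  # generator polynomial used by CRC-32


def crc32_plain(data, crc=0):
    """Bitwise CRC, most significant bit first, with zero initial value
    and no final inversion (an assumption made for this demonstration)."""
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ POLY) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def protect(data):
    """Append the checksum so that it becomes part of the protected data."""
    return data + crc32_plain(data).to_bytes(4, "big")


def is_intact(protected):
    """A protected record is intact when its overall CRC comes out as zero."""
    return crc32_plain(protected) == 0
```

With this layout a flipped bit is detected whether it lands in the data or in the stored checksum, which is the sense in which a damaged checksum is "the same problem" as damaged data.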
Re: RAID 1.3 via DOS?
penpen wrote: No. If you have file F on 3 drives A, B, C, then you have to compare it pairwise using batch [...] Using C++ or similar lets you read this file only 3 times, as you can stop reading the file at any position and compare the data parts in buffers: much faster. [...]
True, but if most of the files are the same, you still have to read them completely. The only time saved will be on corrupt files--which we hope would not be many!
Oh, I'm sure there's all sorts of sophisticated ways to do it even to a block or sector level, but that starts getting into RAID3/4/5. And of course, not really a way to do it with batch unless someone is using debug.
With a crc as part of the file, that does alleviate some of the problem, but it can affect file usability depending on the file format and how the application/os is going to read the file.
Re: RAID 1.3 via DOS?
Even in the good case you read only 75% of the data with C++ or similar, compared to pure batch.
And the checksum need not really be part of the file: you may decide to compute the MD5 hash and CRC32 checksum for every 2024 bytes (just an example) of file data and store them somewhere else, so the file's usability is not affected.
penpen
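A minimal Python sketch of such per-block checksums kept apart from the file. The names `block_checksums` and `damaged_blocks` are hypothetical, the block size of 2024 bytes is simply the example figure from the post, and the data is handled in memory to keep the sketch short. How the list is stored elsewhere (a sidecar file, a database) is left open.

```python
import hashlib
import zlib

BLOCK_SIZE = 2024  # the example figure from the post; any block size works


def block_checksums(data, block_size=BLOCK_SIZE):
    """Return one (crc32, md5 hex digest) pair per block of data."""
    pairs = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        pairs.append((zlib.crc32(block), hashlib.md5(block).hexdigest()))
    return pairs


def damaged_blocks(data, stored, block_size=BLOCK_SIZE):
    """Return the indices of blocks whose checksums differ from the stored
    ones. Blocks missing on either side also count as damaged."""
    current = block_checksums(data, block_size)
    count = max(len(current), len(stored))
    return [
        i for i in range(count)
        if i >= len(current) or i >= len(stored) or current[i] != stored[i]
    ]
```

Because the result names the damaged block, a repair only has to fetch that block from another copy, not the whole file.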
Re: RAID 1.3 via DOS?
Samir wrote: I like batch because it's stable, available, and relatively fast. I've taken some courses in C, but it gets really complicated and just as IO intensive if the entire file is being read.
Going to have to totally disagree with you on that. Any compiled language like C is going to be tenfold faster at performing file operations than batch.
Re: RAID 1.3 via DOS?
Squashman wrote: Going to have to totally disagree with you on that. Any compiled language like C is going to be tenfold faster at performing file operations than batch.
Depends on the compiler. If I want to compile this for pure DOS, it's just ancient Turbo C, which I've found is no faster than batch on pure file reads.
Re: RAID 1.3 via DOS?
penpen wrote: Even in the good case you are reading only 75% using C++ or similar, instead of using pure batch. And the checksum need not really be part of the file: you may decide to compute the MD5 hash and CRC32 checksum for every 2024 bytes (just an example) of file data and store them somewhere else, so the file's usability is not affected.
I'm a bit puzzled by how, in the good case, it's only 75% of the reading. If I understand correctly, you're saying that a C read will stop when it encounters an error, whereas batch will continue reading. If the files are all the same, wouldn't C and batch read the entire file?
Computing an md5 and crc32 sidecar file would work, but then there's even more files. Although the comparison would be much quicker since the files would be smaller.
Re: RAID 1.3 via DOS?
Samir wrote: Depends on the compiler. If I want to compile this for pure DOS, it's just ancient turbo c, which I've found is no faster than batch on pure file reads.
If it is an ANSI C/C++ compiler, then it depends on the knowledge (and the iron will) of the programmer, as you can use assembler code, or at minimum opcodes, and construct file streams and buffers yourself.
Samir wrote: I'm a bit puzzled by how by in the good case it's only 75% the reading? If I understand correct, you're saying that a C read will stop when it encounters an error, where as batch will continue reading. If the files are all the same, wouldn't C and batch read the entire file?
No, this is not what I meant.
You are using fc to compare files, and the good case is that all copies are the same.
Let's assume these files with the same content are named A, B and C.
With fc you have to perform one of the following cases to determine whether they are equal:
Code:
:: case 1: compare {A, B}, compare {A, C}
:: case 2: compare {A, B}, compare {B, C}
:: case 3: compare {A, C}, compare {A, B}
:: case 4: compare {A, C}, compare {C, B}
:: case 5: compare {B, C}, compare {B, A}
:: case 6: compare {B, C}, compare {C, A}
:: all cases are equivalent, so you have to do something like this (shown for case 1):
fc /b A B
fc /b A C
In total you have read this file 4 times (from different locations).
With ANSI C, C++, or any other language with the needed capabilities, you just read the files (in full or in parts) from disk into RAM.
Then you just compare the contents of the (RAM) buffers with each other.
Doing it this way, you have read the (same) file (from different locations) only 3 times from disk.
So you read only 3/4 = 75% of the data from disk.
And because hard disks are slow compared to RAM, you also need only about 3/4 of the time.
penpen
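The buffered approach can be sketched as follows, in Python rather than C++ for brevity. The function name `compare_three` and the 4 MB default buffer are assumptions made for illustration. Each copy is read exactly once, chunk by chunk, and all three pairwise comparisons are made on the chunks in RAM.

```python
def compare_three(paths, buffer_size=4 * 1024 * 1024):
    """Read three files once each and compare them chunk by chunk.

    Returns ((eq_ab, eq_bc, eq_ca), bytes_read), where the flags say
    which pairs of copies are identical."""
    eq_ab = eq_bc = eq_ca = True
    bytes_read = 0
    with open(paths[0], "rb") as fa, \
         open(paths[1], "rb") as fb, \
         open(paths[2], "rb") as fc:
        while True:
            a = fa.read(buffer_size)
            b = fb.read(buffer_size)
            c = fc.read(buffer_size)
            bytes_read += len(a) + len(b) + len(c)
            if not (a or b or c):
                break  # all three files are exhausted
            # Compare the buffers in RAM; no file is read a second time.
            eq_ab = eq_ab and a == b
            eq_bc = eq_bc and b == c
            eq_ca = eq_ca and c == a
            if not (eq_ab or eq_bc or eq_ca):
                break  # no two copies agree, so nothing more can be learned
    return (eq_ab, eq_bc, eq_ca), bytes_read
```

For three identical copies of size S, `bytes_read` comes out as 3 x S, against the 4 x S that two `fc /b` runs read. Memory use stays at three buffers regardless of the file size.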
Re: RAID 1.3 via DOS?
Ahh, I know what you mean about the compilers as assembly can be tedious but brutally efficient.
And your 75% case makes much more sense now. But given that most file sizes are within the cache limits of the drive or operating system, would it be safe to bet on file A (in your example) being a cache hit vs being re-read?
Re: RAID 1.3 via DOS?
It is always unsafe to bet on the state of any cache, unless you are writing a cache driver, which should know its own internal state.
But I didn't bet on any cache state in my example above; I used buffering instead of caching:
penpen wrote: Then you just compare the content of the (RAM) buffer with each others.
So there are no hit/(re)read problems.
penpen
Re: RAID 1.3 via DOS?
penpen wrote: It is always unsafe to bet on the state of any cache, unless you are writing a cache driver, which should know its own internal state. But I didn't bet on any cache state in my example above; I used buffering instead of caching [...] So there are no hit/(re)read problems.
Makes sense, except you'd have to have enough RAM for 2x the largest file size.
Re: RAID 1.3 via DOS?
There is no need to create a buffer that can hold any file on your system.
A buffer large enough to hold the contents of the hard disk's cache suffices.
So 4 MB per file is all you need on most systems to compare the file contents piecewise.
penpen
Re: RAID 1.3 via DOS?
penpen wrote: There is no need to create a buffer that can hold any file on your system. It suffices to use a buffer with the capacity to store the hdd cache content. So 4 MB per file on most systems is all you need to compare the file content piecewise.
Forgive me for not completely understanding, but if it's a piecewise comparison, won't the whole file, minus what is in the buffer and cache, still have to be read?