MD5 Message-Digest Algorithm

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

MD5 Message-Digest Algorithm

#1 Post by Magialisk » 02 May 2014 22:58

Good evening all. Since I got some great feedback last time on my implementation of AES encryption in batch, I thought I'd post here my latest crypto-related development. This one implements the common MD5 hashing algorithm, also known as RFC 1321: http://www.ietf.org/rfc/rfc1321.txt. This is a cryptographically secure* one-way hashing algorithm (* - see notes at the end) still in wide use all over the world. You can read more about it [wikipedia.org/wiki/MD5]

The script takes a single parameter which can be either the name of a file or a string. The hash/digest is written to the console and optionally stored in a variable, passed as the second parameter. First is the source code:

Code: Select all

:: This script implements the MD5 Message-Digest Algorithm in accordance with
:: RFC 1321, which can be found at http://www.ietf.org/rfc/rfc1321.txt.
:: The only known limitation is that files to be hashed by this script must
:: be less than 256 MB in size.
@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
IF "%~1"=="" (
   echo md5 ^<string/filename^> [^<variable^>]
   echo    Calculates the MD5 message-digest of the input string or file.
   echo    Prints the result to stdout and optionally stores it in 'variable'.
   exit /b
)
set "inFile=%~1"

:: Binary integer parts of the sines of integers (in Radians)
set i=0
FOR %%a IN (d76aa478 e8c7b756 242070db c1bdceee f57c0faf 4787c62a a8304613 fd469501
            698098d8 8b44f7af ffff5bb1 895cd7be 6b901122 fd987193 a679438e 49b40821
            f61e2562 c040b340 265e5a51 e9b6c7aa d62f105d 02441453 d8a1e681 e7d3fbc8
            21e1cde6 c33707d6 f4d50d87 455a14ed a9e3e905 fcefa3f8 676f02d9 8d2a4c8a
            fffa3942 8771f681 6d9d6122 fde5380c a4beea44 4bdecfa9 f6bb4b60 bebfbc70
            289b7ec6 eaa127fa d4ef3085 04881d05 d9d4d039 e6db99e5 1fa27cf8 c4ac5665
            f4292244 432aff97 ab9423a7 fc93a039 655b59c3 8f0ccc92 ffeff47d 85845dd1
            6fa87e4f fe2ce6e0 a3014314 4e0811a1 f7537e82 bd3af235 2ad7d2bb eb86d391
) DO set /a "Radians[!i!]=0x%%a, i+=1"

:: Per-round rotation amounts
set i=0
FOR %%a IN (7 12 17 22 7 12 17 22 7 12 17 22 7 12 17 22
            5  9 14 20 5  9 14 20 5  9 14 20 5  9 14 20
            4 11 16 23 4 11 16 23 4 11 16 23 4 11 16 23
            6 10 15 21 6 10 15 21 6 10 15 21 6 10 15 21
) DO set /a "Shift[!i!]=%%a, i+=1"

:: Powers of 2 lookup table for custom right shift operation
:: Only includes the values we could possibly need, to keep ENV size down.
set /a Powers2[3]=8, Powers2[4]=16, Powers2[5]=32, Powers2[6]=64, Powers2[8]=256
set /a Powers2[9]=512, Powers2[10]=1024, Powers2[11]=2048, Powers2[13]=8192
set /a Powers2[14]=16384, Powers2[15]=32768, Powers2[16]=65536, Powers2[19]=524288
set /a Powers2[20]=1048576, Powers2[21]=2097152, Powers2[22]=4194304

:: Output digest initialization
set /a digA0=0x67452301, digB0=0xefcdab89, digC0=0x98badcfe, digD0=0x10325476

:: Map for dec->hex conversion
set "hexMap=0123456789ABCDEF"

:: If the input is not a valid filename, it is treated as a string to be hashed. The string is first written to a
:: temp file (without appending a carriage return), then the script recursively calls itself to hash that file.
IF NOT EXIST !inFile! (
   set recursed=true
   echo|set /p ans="%inFile%">"%temp%\_md5.tmp"
   CALL %~f0 "%temp%\_md5.tmp" %~2
   del "%temp%\_md5.tmp"
   :: Pass the result back over the second ENDLOCAL barrier
   IF NOT "%~2"=="" FOR %%a IN (!md5Digest!) DO ENDLOCAL& set %~2=%%a
   exit /b
)
:: Abort with an error if the input file is larger than 268,435,455 bytes (see block padding description below)
IF %~Z1 GTR 268435455 (
   echo ERROR: File size must be ^<= 268,435,455 bytes ^(256 MB^)
   exit /b
)
:: Convert the input file into a hexadecimal representation and split it into 512-bit little endian blocks.
set "tempFile=%temp%\#.tmp"
del "%tempFile%" 2>NUL
fsutil file createnew "%tempFile%" %~Z1 >NUL
set /A i=0, d=0
set block=
set emptyfile=false
for /F "skip=1 tokens=1,2 delims=: " %%b in ('fc /B "%inFile%" "%tempFile%"') do (
   IF NOT %%c==no (
      set /A b=0x%%b
      if !i! neq !b! (
         set /A c=b-i
         for /L %%i in (!c!,-1,1) do (
            set "block=00!block!"
            set /A d+=1
            if !d! geq 64 (
               set /a d=0
               CALL:do_md5 !block!
               set block=
            )
         )
      )
      set "block=%%c!block!"
      set /A i=b+1, d+=1
      if !d! geq 64 (
         set /a d=0
         CALL:do_md5 !block!
         set block=
      )
   ) ELSE set emptyfile=true
)
IF NOT !emptyfile!==true (
   if !i! neq %~Z1 (
      set /A c=%~Z1-i
      for /L %%i in (!c!,-1,1) do (
         set "block=00!block!"
         set /A d+=1
         if !d! geq 64 (
            set /a d=0
            CALL:do_md5 !block!
            set block=
         )
      )
   )
)
:: Some portion of a 512-bit block will be leftover and needs to be padded before processing.
:: First pad the block with a '1' bit and seven '0' bits (0x80)
set "block=80!block!"
set /a d+=1
if !d! geq 64 (
   set /a d=0
   CALL:do_md5 !block!
   set block=
)
:: If the block is now larger than 448 bits, it needs to be zero-padded to 512, hashed, and
:: a second empty block created to hold the length field.
IF !d! GTR 56 (
   FOR /L %%a IN (!d!,1,63) DO set "block=00!block!"
   set /a d=0
   CALL:do_md5 !block!
   set block=
)
:: Zero-pad the remaining block 4 bits at a time until the total length is 448 bits (512-64)
set /a lenBits=!d!*8
FOR /L %%a IN (!lenBits!,4,444) DO set "block=0!block!"

:: The final 64 bits of padding are set equal to the original size in bits of the input.
:: Since cmd only handles 32-bit numbers, we assume the upper 32-bits are zero and the size
:: is stored in the lower 32 bits only. This supports files up to ~256 MB, much larger than
:: anyone with any sense would ever use this script on.
set /a dec=%~Z1*8
set "hex="
FOR /L %%N IN (1,1,8) do (
    set /a "d=dec&15,dec>>=4"
    FOR %%D in (!d!) do set "hex=!hexmap:~%%D,1!!hex!"
)
:: Hash the final block
CALL:do_md5 00000000!hex!!block!

:: Convert the MD5 digest chunks to hex
FOR %%a IN (digA0 digB0 digC0 digD0) DO (
   set /a dec=!%%a!
   set "hex="
   FOR /L %%N IN (1,1,8) do (
       set /a "d=dec&15,dec>>=4"
       FOR %%D in (!d!) do set "%%ahex=!hexmap:~%%D,1!!%%ahex!"
   )
)
:: Rearrange the hex to create the little endian output value
set md5Digest=
FOR %%a IN (digA0hex digB0hex digC0hex digD0hex) DO (
   FOR /L %%b IN (6,-2,0) DO (
      set thisByte=!%%a:~%%b,2!
      set "md5Digest=!md5Digest!!thisByte!"
   )
)
echo !md5Digest!
del "%temp%\#.tmp"
IF NOT "%~2"=="" IF NOT !recursed!==true (ENDLOCAL& set %~2=%md5Digest%) ELSE (ENDLOCAL& set md5Digest=%md5Digest%)
exit /b

:do_md5
set data=%~1
:: Break the input block into sixteen 32-bit words
FOR /L %%a IN (0,1,15) DO (
   set /a "startPos=(15-%%a)*8"
   FOR %%b IN (!startPos!) DO set M[%%a]=!data:~%%b,8!
)
:: Initialize this chunk's digest to the running digest value
set /a digA'=!digA0!, digB'=!digB0!, digC'=!digC0!, digD'=!digD0!

:: Main md5 loop
FOR /L %%i IN (0,1,63) DO (
   IF %%i LEQ 15 (
      set /a "F=!digD'! ^^ (!digB'! & (!digC'! ^^ !digD'!))"
      set /a g=%%i
   ) ELSE IF %%i LEQ 31 (
      set /a "F=!digC'! ^^ (!digD'! & (!digB'! ^^ !digC'!))"
      set /a "g=(5*%%i+1)%%16"
   ) ELSE IF %%i LEQ 47 (
      set /a "F=!digB'! ^^ !digC'! ^^ !digD'!"
      set /a "g=(3*%%i+5)%%16"
   ) ELSE IF %%i LEQ 63 (
      set /a "F=!digC'! ^^ (!digB'! | (!digD'! ^^ 0xFFFFFFFF))"
      set /a "g=(7*%%i)%%16"
   )
   set /a tempD=!digD'!
   set /a digD'=!digC'!, digC'=!digB'!
   FOR %%g IN (!g!) DO set /a "msg=0x!M[%%g]!"
   set /a X=!digA'!+!F!+!Radians[%%i]!+!msg!
   :: Because cmd does an arithmetic (sign-aware) right shift instead of a logical right shift^
   :: 'Negative' numbers will cause incorrect shift results. This implementation performs a proper^
   :: logical right shift on negative numbers by removing the negative sign bit and putting it back^
   :: in the proper position after the shift is completed.
   IF !X! LSS 0 (
      set /a "tempX=!X! & 0x7FFFFFFF"
      set /a rShiftAmt=32-!Shift[%%i]!
      set /a "tempB=!tempX! >> !rShiftAmt!"
      FOR %%j IN (!rShiftAmt!) DO set /a power=31-%%j
      FOR %%k IN (!power!) DO set /a tempB+=!Powers2[%%k]!
      set /a "digB'+=((!X! << !Shift[%%i]!) | !tempB!)"
   ) ELSE set /a "digB'+=((!X! << !Shift[%%i]!) | (!X! >> (32-!Shift[%%i]!)))"
   set /a digA'=!tempD!
)
:: Add this chunk's result to the running digest value
set /a digA0+=!digA'!, digB0+=!digB'!, digC0+=!digC'!, digD0+=!digD'!
exit /b
And the results of a test program, which tests all 7 cases specified in the RFC document:

Code: Select all

c:\Users\Marc\Desktop\md5>md5test
*** Test 1: MD5('emptystring')
D41D8CD98F00B204E9800998ECF8427E - result
d41d8cd98f00b204e9800998ecf8427e - expected
Time: 00:00:00.17
*** Test 2: MD5(a)
0CC175B9C0F1B6A831C399E269772661 - result
0cc175b9c0f1b6a831c399e269772661 - expected
Time: 00:00:00.22
*** Test 3: MD5(abc)
900150983CD24FB0D6963F7D28E17F72 - result
900150983cd24fb0d6963f7d28e17f72 - expected
Time: 00:00:00.20
*** Test 4: MD5(message digest)
F96B697D7CB7938D525A2F31AAF161D0 - result
f96b697d7cb7938d525a2f31aaf161d0 - expected
Time: 00:00:00.22
*** Test 5: MD5(abcdefghijklmnopqrstuvwxyz)
C3FCD3D76192E4007DFB496CCA67E13B - result
c3fcd3d76192e4007dfb496cca67e13b - expected
Time: 00:00:00.20
*** Test 6: MD5(ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789)
D174AB98D277D9F5A5611C2C9F419D9F - result
d174ab98d277d9f5a5611c2c9f419d9f - expected
Time: 00:00:00.29
*** Test 7: MD5(12345678901234567890123456789012345678901234567890123456789012345678901234567890)
57EDF4A22BE3C955AC49DA2E2107B67A - result
57edf4a22be3c955ac49da2e2107b67a - expected
Time: 00:00:00.31
c:\Users\Marc\Desktop\md5>
A generic use case for this would be storage of user passwords, similar to the goal of the original string encoding thread here: viewtopic.php?f=3&t=4579.

A user would enter their password, let's say it's "password" and you'd run it through md5 (preferably after salting, but that's a different topic) to obtain a hash like "5F4DCC3B5AA765D61D8327DEB882CF99". That hash can be stored in the clear, in a text file, etc. and there is no conceivable way to reverse it to obtain the original "password". Then when a user tries to log in to your system, you hash the password they provide, compare that hash to the original value you saved, and if they match the password must have been correct. It is nearly impossible* for someone to come up with another string, such as "DONUT" that will produce the same hash as the string "password". (* - again, see the notes below)

Another common use is for file validation, where before running itself a batch could hash either its own code, or the content of a configuration file, etc. and compare the resulting hash to a "known good" value. If the hashes don't match, the file in question has clearly been tampered with. I'm sure you've all seen the md5 checksums next to files you're downloading online which are intended to verify no corruption occurred during the download.

NOTES: You might have heard that MD5 is cryptographically "broken", which is absolutely true, but this simply means there are attacks against the algorithm that can provide a hash collision (ie: choosing "DONUT" to provide the same hash as "password") in less than brute-force time. Nowadays it's actually broken pretty thoroughly and shouldn't be used for real security-critical situations. However, that doesn't mean it's completely useless, and it's certainly plenty of security for something as trivial as a batch file. Certainly better than the simple substitution codes first suggested in the string encoding thread :D

Anyway if anyone would like to take a look through the code and suggest any efficiency improvements they'd be most welcome. Unfortunately the algorithm is strictly serial so cannot be sped up by multithreading like my AES implementation. Currently it's able to process a 1KB file in about 1.3 seconds, and a 100KB file in about 3 minutes (not sure why it scales so poorly...). You really don't want to use this on large files; for anything more than just "batch tinkering because it's fun" you should get a Jscript or C implementation. But where's the fun in that?! :lol:

Thanks all!

carlos
Expert
Posts: 501
Joined: 20 Aug 2010 13:57
Location: Chile
Contact:

Re: MD5 Message-Digest Algorithm

#2 Post by carlos » 03 May 2014 12:38

Very nice code.
I test only one string and works ok.

I only change a little thing, because I run the script from desktop (this folder have space inside the full path):


Code: Select all

IF NOT EXIST !inFile!

to:

Code: Select all

IF NOT EXIST "!inFile!"


and:

Code: Select all

CALL %~f0

to

Code: Select all

CALL "%~f0"

einstein1969
Expert
Posts: 776
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: MD5 Message-Digest Algorithm

#3 Post by einstein1969 » 03 May 2014 13:58

Hi Magialisk!

Interesting reading. I don't know very well this algos of cript. I have approccied time ago a crc32 for check data from two pipe in dos batch but I don't have finished.

I reading that MD5 i better than crc32 for security reason than this is very good for the examples of password and for check file again accidental change or made by some viruses etc. (even if ... " In 2012, the Flame malware exploited the weaknesses in MD5 to fake a Microsoft digital signature. " cit. wikipedia)

For crc32 there are various methos of implement. For examples, using hash table or directly.

There 's the same opportunity to do so with Md5? (I thinks that could be faster...)

EDIT: changed test a little...

einstein1969
Last edited by einstein1969 on 03 May 2014 15:15, edited 3 times in total.

einstein1969
Expert
Posts: 776
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: MD5 Message-Digest Algorithm

#4 Post by einstein1969 » 03 May 2014 14:21

Currently it's able to process a 1KB file in about 1.3 seconds, and a 100KB file in about 3 minutes (not sure why it scales so poorly...).


For the time scale i think that has same behavior of this code (I have not verified). I will do!

Than... use macro? i thinks will speed up.


EDIT: I have verified. Change the code will do the time scale near linear with size.

change this line of code:

Code: Select all

for /F "skip=1 tokens=1,2 delims=: " %%b in ('fc /B "%inFile%" "%tempFile%"') do (

to:

Code: Select all

set "tempFile2=%temp%\#2.tmp"
>"%tempFile2%" fc /B "%inFile%" "%tempFile%"
for /F "usebackq skip=1 tokens=1,2 delims=: " %%b in ("%tempFile2%") do (


einstein1969
Last edited by einstein1969 on 03 May 2014 15:57, edited 1 time in total.

Aacini
Expert
Posts: 1647
Joined: 06 Dec 2011 22:15
Location: México City, México

Re: MD5 Message-Digest Algorithm

#5 Post by Aacini » 03 May 2014 15:19

There is not much chance to optimize this code excepting the arithmetic operations at end of the main md5 loop, that also allows to eliminate the powers of 2 lookup table:

EDIT: I added several small modifications; I think that the simpler code in the main md5 loop may allows the program run slightly faster.

Code: Select all

:: This script implements the MD5 Message-Digest Algorithm in accordance with
:: RFC 1321, which can be found at http://www.ietf.org/rfc/rfc1321.txt.
:: The only known limitation is that files to be hashed by this script must
:: be less than 256 MB in size.

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
IF "%~1"=="" (
   echo md5 ^<string/filename^> [^<variable^>]
   echo    Calculates the MD5 message-digest of the input string or file.
   echo    Prints the result to stdout and optionally stores it in 'variable'.
   exit /b
)
set "inFile=%~1"

:: Binary integer parts of the sines of integers (in Radians)
set i=0
FOR %%a IN (d76aa478 e8c7b756 242070db c1bdceee f57c0faf 4787c62a a8304613 fd469501
            698098d8 8b44f7af ffff5bb1 895cd7be 6b901122 fd987193 a679438e 49b40821
            f61e2562 c040b340 265e5a51 e9b6c7aa d62f105d 02441453 d8a1e681 e7d3fbc8
            21e1cde6 c33707d6 f4d50d87 455a14ed a9e3e905 fcefa3f8 676f02d9 8d2a4c8a
            fffa3942 8771f681 6d9d6122 fde5380c a4beea44 4bdecfa9 f6bb4b60 bebfbc70
            289b7ec6 eaa127fa d4ef3085 04881d05 d9d4d039 e6db99e5 1fa27cf8 c4ac5665
            f4292244 432aff97 ab9423a7 fc93a039 655b59c3 8f0ccc92 ffeff47d 85845dd1
            6fa87e4f fe2ce6e0 a3014314 4e0811a1 f7537e82 bd3af235 2ad7d2bb eb86d391
) DO set /a "Radians[!i!]=0x%%a, i+=1"

:: Per-round rotation amounts
set i=0
FOR %%a IN (7 12 17 22 7 12 17 22 7 12 17 22 7 12 17 22
            5  9 14 20 5  9 14 20 5  9 14 20 5  9 14 20
            4 11 16 23 4 11 16 23 4 11 16 23 4 11 16 23
            6 10 15 21 6 10 15 21 6 10 15 21 6 10 15 21
) DO set /a "Shift[!i!]=%%a, i+=1"

:: Output digest initialization
set /a digA0=0x67452301, digB0=0xefcdab89, digC0=0x98badcfe, digD0=0x10325476

:: Map for dec->hex conversion
set "hexMap=0123456789ABCDEF"

:: If the input is not a valid filename, it is treated as a string to be hashed. The string is first written to a
:: temp file (without appending a carriage return), then the script recursively calls itself to hash that file.
IF NOT EXIST "%inFile%" (
   set recursed=true
   set /p ans="%inFile%" >"%temp%\_md5.tmp" < NUL
   CALL "%~f0" "%temp%\_md5.tmp" %~2
   del "%temp%\_md5.tmp"
   :: Pass the result back over the second ENDLOCAL barrier
   IF NOT "%~2"=="" FOR %%a IN (!md5Digest!) DO ENDLOCAL& set %~2=%%a
   exit /b
)
:: Abort with an error if the input file is larger than 268,435,455 bytes (see block padding description below)
IF %~Z1 GTR 268435455 (
   echo ERROR: File size must be ^<= 268,435,455 bytes ^(256 MB^)
   exit /b
)
:: Convert the input file into a hexadecimal representation and split it into 512-bit little endian blocks.
set "tempFile=%temp%\#.tmp"
del "%tempFile%" 2>NUL
fsutil file createnew "%tempFile%" %~Z1 >NUL
set /A i=0, d=0
set block=
set emptyfile=false
for /F "skip=1 tokens=1,2 delims=: " %%b in ('fc /B "%inFile%" "%tempFile%"') do (
   IF NOT %%c==no (
      set /A b=0x%%b
      if !i! neq !b! (
         set /A c=b-i
         for /L %%i in (!c!,-1,1) do (
            set "block=00!block!"
            set /A d+=1
            if !d! geq 64 (
               set /a d=0
               CALL:do_md5 !block!
               set block=
            )
         )
      )
      set "block=%%c!block!"
      set /A i=b+1, d+=1
      if !d! geq 64 (
         set /a d=0
         CALL:do_md5 !block!
         set block=
      )
   ) ELSE (
      set emptyfile=true
   )
)
IF NOT %emptyfile%==true (
   if %i% neq %~Z1 (
      set /A c=%~Z1-i
      for /L %%i in (!c!,-1,1) do (
         set "block=00!block!"
         set /A d+=1
         if !d! geq 64 (
            set /a d=0
            CALL:do_md5 !block!
            set block=
         )
      )
   )
)
:: Some portion of a 512-bit block will be leftover and needs to be padded before processing.
:: First pad the block with a '1' bit and seven '0' bits (0x80)
set "block=80%block%"
set /a d+=1
if %d% geq 64 (
   set /a d=0
   CALL:do_md5 !block!
   set block=
)
:: If the block is now larger than 448 bits, it needs to be zero-padded to 512, hashed, and
:: a second empty block created to hold the length field.
IF %d% GTR 56 (
   FOR /L %%a IN (%d%,1,63) DO set "block=00!block!"
   set /a d=0
   CALL:do_md5 !block!
   set block=
)
:: Zero-pad the remaining block 4 bits at a time until the total length is 448 bits (512-64)
set /a lenBits=d*8
FOR /L %%a IN (%lenBits%,4,444) DO set "block=0!block!"

:: The final 64 bits of padding are set equal to the original size in bits of the input.
:: Since cmd only handles 32-bit numbers, we assume the upper 32-bits are zero and the size
:: is stored in the lower 32 bits only. This supports files up to ~256 MB, much larger than
:: anyone with any sense would ever use this script on.
set /a dec=%~Z1*8
set "hex="
FOR /L %%N IN (1,1,8) do (
    set /a "d=dec&15, dec>>=4"
    FOR %%D in (!d!) do set "hex=!hexmap:~%%D,1!!hex!"
)
:: Hash the final block
CALL:do_md5 00000000%hex%%block%

:: Convert the MD5 digest chunks to hex
FOR %%a IN (digA0 digB0 digC0 digD0) DO (
   set /a dec=!%%a!
   set "hex="
   FOR /L %%N IN (1,1,8) do (
       set /a "d=dec&15, dec>>=4"
       FOR %%D in (!d!) do set "%%ahex=!hexmap:~%%D,1!!%%ahex!"
   )
)
:: Rearrange the hex to create the little endian output value
set md5Digest=
FOR %%a IN (digA0hex digB0hex digC0hex digD0hex) DO (
   FOR /L %%b IN (6,-2,0) DO (
      set "md5Digest=!md5Digest!!%%a:~%%b,2!"
   )
)
echo %md5Digest%
del "%temp%\#.tmp"
IF NOT "%~2"=="" IF NOT %recursed%==true (ENDLOCAL& set %~2=%md5Digest%) ELSE (ENDLOCAL& set md5Digest=%md5Digest%)
exit /b

:do_md5
set data=%~1
:: Break the input block into sixteen 32-bit words
FOR /L %%a IN (0,1,15) DO (
   set /a "startPos=(15-%%a)*8"
   FOR %%b IN (!startPos!) DO set M[%%a]=!data:~%%b,8!
)
:: Initialize this chunk's digest to the running digest value
set /a digA'=digA0, digB'=digB0, digC'=digC0, digD'=digD0
:: Main md5 loop
FOR /L %%i IN (0,1,63) DO (
   IF %%i LEQ 15 (
      set /a "F=digD' ^ (digB' & (digC' ^ digD')),      g=%%i"
   ) ELSE IF %%i LEQ 31 (
      set /a "F=digC' ^ (digD' & (digB' ^ digC')),      g=(5*%%i+1)%%16"
   ) ELSE IF %%i LEQ 47 (
      set /a "F=digB' ^ digC' ^ digD',                  g=(3*%%i+5)%%16"
   ) ELSE (
      set /a "F=digC' ^ (digB' | (digD' ^ 0xFFFFFFFF)), g=(7*%%i)%%16"
   )
   set /a tempD=digD', digD'=digC', digC'=digB'
   FOR %%g IN (!g!) DO set /a "X=digA'+F+Radians[%%i]+0x!M[%%g]!"
   :: Because cmd does an arithmetic (sign-aware) right shift instead of a logical right shift^
   :: 'Negative' numbers will cause incorrect shift results. This implementation performs a proper^
   :: logical right shift on negative numbers by removing the negative sign bit and putting it back^
   :: in the proper position after the shift is completed.
   IF !X! LSS 0 (
      set /a "rShiftAmt=32-Shift[%%i], tempB=((X & 0x7FFFFFFF)>>rShiftAmt) + (1<<(31-rShiftAmt))"
   ) ELSE (
      set /a "tempB=X >> (32-Shift[%%i])"
   )
   set /a "digB'+=(X << Shift[%%i]) | tempB, digA'=tempD"
)
:: Add this chunk's result to the running digest value
set /a digA0+=digA', digB0+=digB', digC0+=digC', digD0+=digD'
exit /b


Antonio

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: MD5 Message-Digest Algorithm

#6 Post by Magialisk » 03 May 2014 22:57

Wow! Imagine my surprise to come back here after only one night and see such enthusiastic responses to my post. Thank you all for the suggestions so far! I will give them all a try in the next couple of days and report back where any performance improvements were realized.

einstein I'm intrigued by your finding that saving the 'fc' output to a file and then FOR'ing the file is faster than FOR'ing the 'fc' command output directly. Intuition would lead me to believe the opposite, since my way removes the need for extra file I/Os, but we've all written enough batch to know intuition is seldom useful :wink: Do you think it might have something to do with filling up the ENV size since my way processes the output in memory? I will absolutely time test your approach and it'd be great if it does make the timing more linear for large files.

Also Aacini, I'm glad to see you back again. The help you gave me on my AES script was unbelievable, and I see a couple things you've done below that remind me of those previous contributions. Maybe I'll remember all these tips for whatever 3rd algorithm I tackle :lol:

Not to forget carlos, thank you for pointing out my sloppiness with the quotes. Not that it's an excuse, but my first cut at this script processed files only, and then before posting it to the forum I quickly tacked on the ability to process strings by writing a file and calling it recursively. I clearly didn't re-check the script with a spaced filename after that change, good catch!

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: MD5 Message-Digest Algorithm

#7 Post by Magialisk » 04 May 2014 00:10

Well I hate to double post but I couldn't stand the wait before testing out the suggestions and posting back.

First I looked over Aacini's code and when I hit this one line I just about fell off my chair:

Code: Select all

set /a "rShiftAmt=32-Shift[%%i], tempB=((X & 0x7FFFFFFF)>>rShiftAmt) + (1<<(31-rShiftAmt))"
The elegance of that one line in place of 4-5 of mine was jarring, I really wish I'd thought of that instead of the clunky "powers of 2" lookup table. This forum needs the emoticon for bowing :lol:

In any case I cut/paste Aacini's code as is and ran it, but got a syntax error. This line needs to have the 'recursed' in !!'s instead of %%'s, after that it ran great:

Code: Select all

IF NOT "%~2"=="" IF NOT %recursed%==true (ENDLOCAL& set %~2=%md5Digest%) ELSE (ENDLOCAL& set md5Digest=%md5Digest%)
With Aacini's code, tests 1-5 in the test suite run in 15-19 ms, compared to my original code's 17-22 ms. Tests 6 and 7 run in 27-28 ms instead of 31-32 ms. The test of a 1K file came down from 1.73s to 1.48s. So all around it's a 12-15% improvement in speed. Can't thank you enough Aacini!

Unfortunately the scaling with large files was still not great, with the 100k file taking 4m8s, down from 4m30s. That's only an 8% improvement over my own code, still appreciated but clearly not as good as with the small messages.

Alas this is where einstein's code came to the rescue. After changing the FOR loop to process a 2nd temp file as he suggested, the test of a 100k file completed in only 2m15s! That's exactly 50% off the time it took my original code, and technically slightly better than linear scaling (1.47s / 1k = 2m27s / 100k, and he beat that!)

You guys are really fantastic. I don't expect there's much more to squeeze out of such a simplistic algorithm, but I really appreciate these inputs. I'll look over the code a bit more and sometime in the next couple days update the first post with the newer faster source in case anyone wants to use it.

Speaking of "using it", both einstein and I did point out that md5 is seriously cryptographically weakened, so I can't with good conscience tell you to go protect your passwords with it. However it's still orders of magnitude better than the silly obfuscation techniques thrown around in the "string encoding" thread linked above. And of course, while they're intended for completely different purposes, if you'd like something seriously strong cryptographically, there's always the AES script :wink:

einstein1969
Expert
Posts: 776
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: MD5 Message-Digest Algorithm

#8 Post by einstein1969 » 04 May 2014 06:14

Magailisk wrote: I don't expect there's much more to squeeze out of such a simplistic algorithm, but I really appreciate these inputs. I'll look over the code a bit more and sometime in the next couple days update the first post with the newer faster source in case anyone wants to use it.

Speaking of "using it", both einstein and I did point out that md5 is seriously cryptographically weakened, so I can't with good conscience tell you to go protect your passwords with it. However it's still orders of magnitude better than the silly obfuscation techniques thrown around in the "string encoding" thread linked above. And of course, while they're intended for completely different purposes, if you'd like something seriously strong cryptographically, there's always the AES script.


I think that you could instead get a lot!! And with the experience here could also further optimize the AES algorithm in terms of performance.
There are some parts that you can put code a bit more complex and use other tricks to push the limits of the batch. Otherwise we can say that is not the batch to be slow but we are a bit lazy in discovering its hidden spots.

For example: how much it costs to run this operation?

Code: Select all

set /a  a = 1 + 1


at best about 68 micro seconds on my system.

with the trick of the depletion of the environment costs only 26 micro seconds. About 2,6-2,8X faster.

Moreover, even if some things do not parallelizable can be put in the pipeline. In addition, the For / F is not always the fastest way to read a file, etc. ...

Also you did not answer my first question: is it possible to avoid the lookup tables as the CRC32?

einstein1969

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: MD5 Message-Digest Algorithm

#9 Post by Magialisk » 04 May 2014 09:22

Sorry I overlooked your question einstein. I didn't quite understand it the first time I guess, but I do now. In fact this same discussion came up after posting my AES code. With just about any algorithm there is going to be a way to calculate numbers directly instead of using a lookup table, and the situation this creates is called a time-memory tradeoff.

Using the lookup table is trading off large memory consumption in order to save time, since the program does not have to calculate the values at runtime. Granted with cmd, unlike most normal languages, the increased ENV size slows the program down, but still not enough to be worse than calculating directly. The lookup table should always be faster executing than a real-time calculation of the same constants. The real-time calculation on the other hand is trading off slower execution in order to save memory. A couple lines of code (a few dozen bytes?) can replace a whole lookup table containing hundreds of entries of a dozen bytes each. You might get a memory savings of 2 orders of magnitude if you're working on a such a constrained system where that would be a major advantage. Unfortunately the code will run much slower for it.

On the AES algorithm we went through a couple iterations of this tradeoff with both the S-Box definitions and the Rijndael-Galois Field constants. Check out this post and the two that follow it regarding my change from a macro to calculate Galois constants, to a lookup table, which reduced execution time by 90%: viewtopic.php?p=28045#p28045. While the lookup tables were always faster, several people suggested different ways to define the lookup tables that were themselves faster than my initial naive approaches. jeb for example suggested a long string map, similar in concept to the '01-Jan;02-Feb;03-Mar' maps seen elsewhere on this site, and I believe it was Aacini? that suggested the FOR loop to build an array. Previously I'd been using an array, but building it manually with a set statement for each element. That FOR loop turned out to be the best tradeoff in my opinion of code size and code speed for building loookup tables in AES, so I used it again here.

To address your question directly though, if you look at the wikipedia page they do provide the psuedocode formula you'd use to calculate the integer parts of the sines of integers:

Code: Select all

//Use binary integer part of the sines of integers (Radians) as constants:
for i from 0 to 63
    K[i] := floor(abs(sin(i + 1)) × (2 pow 32))
end for
//(Or just use the following table):
K[ 0.. 3] := { 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee }
K[ 4.. 7] := { 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501 }
K[ 8..11] := { 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be }
K[12..15] := { 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821 }
K[16..19] := { 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa }
K[20..23] := { 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8 }
K[24..27] := { 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed }
K[28..31] := { 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a }
K[32..35] := { 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c }
K[36..39] := { 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70 }
K[40..43] := { 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05 }
K[44..47] := { 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665 }
K[48..51] := { 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039 }
K[52..55] := { 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1 }
K[56..59] := { 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1 }
K[60..63] := { 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391 }
So instead of creating the lookup tables what you'd have to do is change this line of code

Code: Select all

set /a X=!digA'!+!F!+!Radians[%%i]!+!msg!
and replace the !Radians[%%i]! lookup with a real-time calculation of 'floor(abs(sin(%%i + 1)) × (2^32))'.
There is another thread on this forum regarding fast methods of calculating sine in a batch, that would be a good place to start. There's another thread or two on implementing floating point arithmetic in batch that might help get past the multiplication by 2^32 which would otherwise be too large for cmd's 32-bit variables. However, as I'm sure is coming into focus by now, it's always going to be much faster to dereference a pre-created variable like !Radians[%%i]! than to calculate the sin of a number, absolute value it, round it down, and then perform floating point math on it :D

Thank you again though for your suggestions, especially that neat trick parsing a temp file instead of a command output. I can't believe how much of a speed difference that made on large hashes!

einstein1969
Expert
Posts: 776
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: MD5 Message-Digest Algorithm

#10 Post by einstein1969 » 05 May 2014 03:42

I have reading. The lookup table in most cases are good, in other cases no. You have to do some testing.

For example the lookup table for floor(abs(sin(%%i)*2^32)) is good if the implementation in dos batch is poor.

How implement in dos Batch? :cry: There are many method. I have developed various sin functions with mclaurin series expansion and cordic but the precision is not enoght for this cases.

So it would be better to abandon this path...

But... we need to calculate:

Code: Select all

sin(1)
sin(2)
...
sin(64)


and sin(n+1) = sin(n) * cos(1) + cos(n) * sin(1)

namely using the first element of lookup table we can calculate the next with 2 mul and 1 add.

Then " * 2^32 " is shift left 32 . This is no complex operation! this is copy from 4Byte to another 4Byte (is not quite so ... but ...)

The number to multiply are near bigger for dos math. But if necessary is possible use the karatsuba method... I have worked for the first time in this Problems with decimal point in my calculator program.

But I think this path is still too tortuous...

I have make some optimizations and for now i passed from 530Byte/s to 560Byte/s but I think you could double the speed without lookup tables!

EDIT:passed to 630Byte/s

einstein1969

einstein1969
Expert
Posts: 776
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: MD5 Message-Digest Algorithm

#11 Post by einstein1969 » 07 May 2014 13:12

Hi Magialisk!

I'm working on the subroutine that processes the md5. I tweaked the code and there is a significant increase in performance. (I have doubled!)

In order to reach high performance I had, however, use of different variable names (but looks like the RFC document) and I had to rewrite the code, which is less elegant and redundant but it was necessary.

This is the part that calculate the round 1 and round 2 (function G). Round 3 and 4 have minimal trick but work more speed and is compact code. I have to finish the rest....

This can be optimized further but i think that should first finish this step.

Code: Select all

:: Rif: http://en.wikipedia.org/wiki/MD5
::      http://www.ietf.org/rfc/rfc1321.txt
(
:do_md5_fast
setlocal

set block=%~1

:: Break the input block into sixteen 32-bit words
set "a=0" & For /L %%a in (120,-8,0) DO set /a M_!a!=0x!block:~%%a,8!, a+=1

:: Initialize this chunk's digest to the running digest value
set /a A=!digA0!, B=!digB0!, C=!digC0!, D=!digD0!

:: Round 1

:: F = (b & c) | ((~b) & d)
:: 7 12 17 22

:: Rotate-Shift = 7
:: X = A + %_F_% + Radians_0 + M_0, A = B + ( X << 7 | ( (X & 0x7FFFFFFF) >> 32-7 ) + ( X >> 31 & 1 << 7-1 ) )"

:: Merge two line of SET /A into one to increase performance.
(
set /a "? = A + ( B & C | ~B & D ) + 0xd76aa478 + M_0 , A = B + ( (? <<  7) | ((? & 0x7FFFFFFF) >> 25) ) + ( ?>>31 & 0x40 )"
set /a "? = D + ( A & B | ~A & C ) + 0xe8c7b756 + M_1 , D = A + ( (? << 12) | ((? & 0x7FFFFFFF) >> 20) ) + ( ?>>31 & 0x800 )"
set /a "? = C + ( D & A | ~D & B ) + 0x242070db + M_2 , C = D + ( (? << 17) | ((? & 0x7FFFFFFF) >> 15) ) + ( ?>>31 & 0x10000 )"
set /a "? = B + ( C & D | ~C & A ) + 0xc1bdceee + M_3 , B = C + ( (? << 22) | ((? & 0x7FFFFFFF) >> 10) ) + ( ?>>31 & 0x200000 )"
set /a "? = A + ( B & C | ~B & D ) + 0xf57c0faf + M_4 , A = B + ( (? <<  7) | ((? & 0x7FFFFFFF) >> 25) ) + ( ?>>31 & 0x40 )"
set /a "? = D + ( A & B | ~A & C ) + 0x4787c62a + M_5 , D = A + ( (? << 12) | ((? & 0x7FFFFFFF) >> 20) ) + ( ?>>31 & 0x800 )"
set /a "? = C + ( D & A | ~D & B ) + 0xa8304613 + M_6 , C = D + ( (? << 17) | ((? & 0x7FFFFFFF) >> 15) ) + ( ?>>31 & 0x10000 )"
set /a "? = B + ( C & D | ~C & A ) + 0xfd469501 + M_7 , B = C + ( (? << 22) | ((? & 0x7FFFFFFF) >> 10) ) + ( ?>>31 & 0x200000 )"
set /a "? = A + ( B & C | ~B & D ) + 0x698098d8 + M_8 , A = B + ( (? <<  7) | ((? & 0x7FFFFFFF) >> 25) ) + ( ?>>31 & 0x40 )"
set /a "? = D + ( A & B | ~A & C ) + 0x8b44f7af + M_9 , D = A + ( (? << 12) | ((? & 0x7FFFFFFF) >> 20) ) + ( ?>>31 & 0x800 )"
set /a "? = C + ( D & A | ~D & B ) + 0xffff5bb1 + M_10, C = D + ( (? << 17) | ((? & 0x7FFFFFFF) >> 15) ) + ( ?>>31 & 0x10000 )"
set /a "? = B + ( C & D | ~C & A ) + 0x895cd7be + M_11, B = C + ( (? << 22) | ((? & 0x7FFFFFFF) >> 10) ) + ( ?>>31 & 0x200000 )"
set /a "? = A + ( B & C | ~B & D ) + 0x6b901122 + M_12, A = B + ( (? <<  7) | ((? & 0x7FFFFFFF) >> 25) ) + ( ?>>31 & 0x40 )"
set /a "? = D + ( A & B | ~A & C ) + 0xfd987193 + M_13, D = A + ( (? << 12) | ((? & 0x7FFFFFFF) >> 20) ) + ( ?>>31 & 0x800 )"
set /a "? = C + ( D & A | ~D & B ) + 0xa679438e + M_14, C = D + ( (? << 17) | ((? & 0x7FFFFFFF) >> 15) ) + ( ?>>31 & 0x10000 )"
set /a "? = B + ( C & D | ~C & A ) + 0x49b40821 + M_15, B = C + ( (? << 22) | ((? & 0x7FFFFFFF) >> 10) ) + ( ?>>31 & 0x200000 )"
)

:: Round 2

:: 5  9 14 20
:: G = (B & D) | (C & (~D) )

(
set /a "? = A + ( B & D | C & ~D ) + 0xf61e2562 + M_1 , A = B + ( (? <<  5) | ((? & 0x7FFFFFFF) >> 27) ) + ( ?>>31 & 0x10 )"
set /a "? = D + ( A & C | B & ~C ) + 0xc040b340 + M_6 , D = A + ( (? <<  9) | ((? & 0x7FFFFFFF) >> 23) ) + ( ?>>31 & 0x100 )"
set /a "? = C + ( D & B | A & ~B ) + 0x265e5a51 + M_11, C = D + ( (? << 14) | ((? & 0x7FFFFFFF) >> 18) ) + ( ?>>31 & 0x2000 )"
set /a "? = B + ( C & A | D & ~A ) + 0xe9b6c7aa + M_0 , B = C + ( (? << 20) | ((? & 0x7FFFFFFF) >> 12) ) + ( ?>>31 & 0x80000 )"
set /a "? = A + ( B & D | C & ~D ) + 0xd62f105d + M_5 , A = B + ( (? <<  5) | ((? & 0x7FFFFFFF) >> 27) ) + ( ?>>31 & 0x10 )"
set /a "? = D + ( A & C | B & ~C ) + 0x02441453 + M_10, D = A + ( (? <<  9) | ((? & 0x7FFFFFFF) >> 23) ) + ( ?>>31 & 0x100 )"
set /a "? = C + ( D & B | A & ~B ) + 0xd8a1e681 + M_15, C = D + ( (? << 14) | ((? & 0x7FFFFFFF) >> 18) ) + ( ?>>31 & 0x2000 )"
set /a "? = B + ( C & A | D & ~A ) + 0xe7d3fbc8 + M_4 , B = C + ( (? << 20) | ((? & 0x7FFFFFFF) >> 12) ) + ( ?>>31 & 0x80000 )"
set /a "? = A + ( B & D | C & ~D ) + 0x21e1cde6 + M_9 , A = B + ( (? <<  5) | ((? & 0x7FFFFFFF) >> 27) ) + ( ?>>31 & 0x10 )"
set /a "? = D + ( A & C | B & ~C ) + 0xc33707d6 + M_14, D = A + ( (? <<  9) | ((? & 0x7FFFFFFF) >> 23) ) + ( ?>>31 & 0x100 )"
set /a "? = C + ( D & B | A & ~B ) + 0xf4d50d87 + M_3 , C = D + ( (? << 14) | ((? & 0x7FFFFFFF) >> 18) ) + ( ?>>31 & 0x2000 )"
set /a "? = B + ( C & A | D & ~A ) + 0x455a14ed + M_8 , B = C + ( (? << 20) | ((? & 0x7FFFFFFF) >> 12) ) + ( ?>>31 & 0x80000 )"
set /a "? = A + ( B & D | C & ~D ) + 0xa9e3e905 + M_13, A = B + ( (? <<  5) | ((? & 0x7FFFFFFF) >> 27) ) + ( ?>>31 & 0x10 )"
set /a "? = D + ( A & C | B & ~C ) + 0xfcefa3f8 + M_2 , D = A + ( (? <<  9) | ((? & 0x7FFFFFFF) >> 23) ) + ( ?>>31 & 0x100 )"
set /a "? = C + ( D & B | A & ~B ) + 0x676f02d9 + M_7 , C = D + ( (? << 14) | ((? & 0x7FFFFFFF) >> 18) ) + ( ?>>31 & 0x2000 )"
set /a "? = B + ( C & A | D & ~A ) + 0x8d2a4c8a + M_12, B = C + ( (? << 20) | ((? & 0x7FFFFFFF) >> 12) ) + ( ?>>31 & 0x80000 )"
)

set /a digA'=!A!, digB'=!B!, digC'=!C!, digD'=!D!

:: Round 3
:: 4, 11, 16, 23
:: H = b ^ c ^ d

FOR /L %%i IN (32,1,47) DO (

   set /a "S=4+%%i %% 2*7+12*(%%i>>1&1), F=!digB'! ^^ !digC'! ^^ !digD'!, g=(3*%%i+5)%%16, tempD=!digD'!, digD'=!digC'!, digC'=!digB'!"

   set /a "X=digA'+F+!Radians[%%i]!+M_!g!, digB'+=((X << S) | ((X & 0x7FFFFFFF) >> (32-S)))+( (X>>31) & (1<< (S-1)) ), digA'=tempD"

)

:: Round 4
:: 6, 10, 15, 21
:: I = c ^ (b | (~d))

FOR /L %%i IN (48,1,63) DO (

   set /a "S=(%%i %% 4+2)*(%%i %% 4+5)/2+1, F=!digC'! ^^ (!digB'! | ~!digD'!), g=(7*%%i)%%16, tempD=!digD'!, digD'=!digC'!, digC'=!digB'!"

   set /a "X=digA'+F+!Radians[%%i]!+M_!g!, digB'+=((X << S) | ((X & 0x7FFFFFFF) >> (32-S)))+( (X>>31) & (1<< (S-1)) ), digA'=tempD"

)

)

:: Add this chunk's result to the running digest value
rem set /a digA0+=!digA'!, digB0+=!digB'!, digC0+=!digC'!, digD0+=!digD'!
(endlocal & set /a digA0+=%digA'%, digB0+=%digB'%, digC0+=%digC'%, digD0+=%digD'%)

exit /b


You can add this subroutine in the main and it should work. Even if it is to be finished the performance tuning...

EDIT : Add complete subroutine code

einstein1969

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: MD5 Message-Digest Algorithm

#12 Post by Magialisk » 09 May 2014 00:32

Einstein that code is insane! I can't even understand what it does anymore :lol:

As you stated, the timing improvement is quite large so far. Tests 1-5 in the test suite run in 11-16ms, down from Aacini's 15-19 ms. Tests 6 and 7 run in 18-21 ms instead of 27-28 ms. The test of a 1K file came down from 1.48s to 0.92s. And the test of a 100k file came down from 2m15s to 1m24s!

I will definitely have to study your code to learn how you've done this. It seems like you were smart to break the 4 individual rounds into their own segments rather than handling the whole 0-63 operations loop in one pass. Within a given round the operations are identically defined, so there was lots of extraneous logic you were able to cut out that way.

Thank you again for all the time you've put into improving this. I can't wait to see the finished product!

carlos
Expert
Posts: 501
Joined: 20 Aug 2010 13:57
Location: Chile
Contact:

Re: MD5 Message-Digest Algorithm

#13 Post by carlos » 09 May 2014 01:49

@einstein1969: your code works ok?. I not read all, but I suspect that something may wrong because the c code in the rfc use unsigned int, and cmd use signed int.

For example: this:
0xd76aa478
is
3614090360
but for cmd is:
-680876936

Also, I always get code from netbsd and adapt it. Is very portable and easy of adapt (I get a rot13 and sha1 function from the netbsd code adapted to a program that run over windows). I never look wikipedia for implementations, I look these sources, because the license is really good (bsd or even beerware) and are really stables. :)
Here you can get a c implementation but it also use unsigned int.

Code: Select all

http://cvsweb.netbsd.org/bsdweb.cgi/src/external/bsd/bind/dist/lib/isc/md5.c?only_with_tag=MAIN

also, openbsd implementation:

Code: Select all

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/crypto/md5.c

look the latest revisions.

einstein1969
Expert
Posts: 776
Joined: 15 Jun 2012 13:16
Location: Italy, Rome

Re: MD5 Message-Digest Algorithm

#14 Post by einstein1969 » 09 May 2014 20:36

carlos wrote:@einstein1969: your code works ok?. I not read all, but I suspect that something may wrong because the c code in the rfc use unsigned int, and cmd use signed int.

For example: this:
0xd76aa478
is
3614090360
but for cmd is:
-680876936

Also, I always get code from netbsd and adapt it. Is very portable and easy of adapt (I get a rot13 and sha1 function from the netbsd code adapted to a program that run over windows). I never look wikipedia for implementations, I look these sources, because the license is really good (bsd or even beerware) and are really stables. :)
Here you can get a c implementation but it also use unsigned int.

Code: Select all

http://cvsweb.netbsd.org/bsdweb.cgi/src/external/bsd/bind/dist/lib/isc/md5.c?only_with_tag=MAIN

also, openbsd implementation:

Code: Select all

http://www.openbsd.org/cgi-bin/cvsweb/src/sys/crypto/md5.c

look the latest revisions.


Thanks Carlos for the info. :) I have see the code and seem ok with this version. For the future i look in this repo!

For the signed integer i study for possible problem...

Seem that the information is intact

Code: Select all

>cmd /c exit -680876936 & call echo %^=exitCode%
D76AA478

>set /a -680876936+1
-680876935

>cmd /c exit -680876935 & call echo %^=exitCode%
D76AA479


The operation on this type of info is sum(+), And (&), Xor, or, not, shift (<<, >>)
The only prolematic is >> but there is a patch in the code

i thinking about this value :? :

Code: Select all

>cmd /c exit -2147483648 & call echo %^=exitCode%
80000000

>cmd /c exit 2147483648 & call echo %^=exitCode%
80000000


but there is not in the initial data... is sufficient ? :?:


einstein1969

Magialisk
Posts: 104
Joined: 25 Jul 2013 19:00

Re: MD5 Message-Digest Algorithm

#15 Post by Magialisk » 10 May 2014 09:05

I myself was quite confused about whether the limitation of cmd's signed ints was going to cause a problem since all of the addition in MD5 (anything written as '+' in the algorithm) is actually defined as addition modulo 2^32 on an unsigned integer. At first I didn't think that this would work properly on a signed integer, and I was chasing a bug I couldn't catch so I thought maybe this was the reason...

I hate to say it, but I ended up writing a whole subroutine called "AddMod2_32" to "fix" this problem I'd created in my head. Only after doing that did I realize that no matter what data I sent through my batch, the result was coming out the same whether I used a normal 'set /a result = !DigA'! + !DigB'! + !Radians[%%i]!' etc. or used my fancy subroutine 'AddMod2_32 result DigA' DigB' Radians[%%i]'.

I can't swear that I exhaustively tested every possibility, but I tested the case of adding a negative and a positive number, two negative numbers, and rolling over the 31-bit maximum value in both directions (pos to neg and neg to pos). I didn't see any issues, so I dejectedly threw away my AddMod2_32 subroutine and looked harder elsewhere for the bug. If you think about it though, it makes sense that it should work because other than in the right shift, which I wrote a custom operation for we don't care what the decimal number equivalent to the value is. If you add two binary numbers it doesn't matter whether the MSB "represents" a sign bit or not, the operation will carry out the same and any carry will be thrown away, hence the mod2^32 result. At least that's what I convinced myself after running my subroutine tests and eventually finding my bug elsewhere :)

Post Reply