Page 1 of 4

Multi-process Advanced Encryption Standard (AES)

Posted: 02 Nov 2013 01:08
by Magialisk
It's been over 2 months since there's been any activity in the original "string encoding" thread (viewtopic.php?f=3&t=4579), so rather than post this in with all the cluttered proofs of concept I put there, I thought a full working release deserved a thread of it's own.

Picking up where I left off in that other thread, my AES batch can now encrypt and decrypt entire files (very slowly :wink:) using as many processes (PIDs) as you have CPU cores for. It's "thread safe", even if something were to go wrong the processes kill themselves after a pre-defined blocking period. The algorithm itself is compliant with FIPS-197 (tested to the examples in the standard document) and the mode of operation is Electronic Codebook (ECB). I do have to give a HUGE thanks to Aacini for his BinToHex and HexToBin scripts (viewtopic.php?f=3&t=4842), as I use modified versions of his work to convert to and from the hex codes processed by my AES batch.

There are still a couple of bugs to work out, and likely some performance optimizations, but the basic functionality is there so I wanted to post a demo. Here is what it looks like under normal operation:

Code: Select all

c:\Users\Marc\Desktop\aes>aes
AES <inFile> [<inKey>] [<pidMax>]
 Encrypts or decrypts the input file using the Advanced Encryption Standard
 algorithm as specified in NIST FIPS-197.  Files with extensions other than
 '.aes' are encrypted to a file of the same name, plus an .aes extension.
 '.aes' input files are decrypted to a file of the same name with the .aes
 extension removed.
   <inFile>:   A complete path to the file to be processed or the name of a
               variable containing the target file's path.
   [<inKey>]:  Either a complete path to a keyfile, a variable containing
               the path to a keyfile, or a varible containing a 32-, 48- or
               64-character string of hexadecimal values (ie: a 128-, 192- or
               256-bit key). If not provided the program will prompt for a key.
   [<pidMax>]: An integer corresponding to the number of parallel processes
               to run simultaneously.  Defaults to one process per CPU core.

c:\Users\Marc\Desktop\aes>aes file.bin aes.key
Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:00.55)
Beginning AES Encryption with 4 PIDs...
...AES Encryption Complete (Elapsed Time: 00:00:41.83)

c:\Users\Marc\Desktop\aes>aes file.bin.aes aes.key
Beginning AES Decryption with 4 PIDs...
The system cannot find the drive specified.
...AES Decryption Complete (Elapsed Time: 00:00:40.42)
Restoring binary file from ASCII-coded hexadecimal...
...Binary file restoration complete (Elapsed Time: 00:00:00.07)

c:\Users\Marc\Desktop\aes>fc /b file.bin file(1).bin
Comparing files file.bin and FILE(1).BIN
FC: no differences encountered
You'll notice one pesky bug I haven't been able to get rid of, a warning that "The system cannot find the drive specified." I know for a fact it finds it just fine and the output is correct, but I haven't been able to shake that message.

In any case, that test file is just barely over 1KB in size, 1041 bytes I believe, so you can see how terribly slow the whole operation is. On this Q6600 machine, with 4 processes going, it averages about 25 bytes/sec total throughput. On my X5660 Xeon at work, running 12 processes, the same file takes a little over 20 seconds, something like 48 bytes/sec. I'll be the first to admit this will never be practical beyond possibly encrypting a text file with a few passwords in it, but I still felt it was a neat application of batch scripting. I'll post the code below, and any comments or suggestions would be very welcome.

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 02 Nov 2013 01:18
by Magialisk
UPDATED CODE POSTED 11/25/13

OK the code comes in two batch files. The first one is the control script, providing the user interface and PID management, and is simply called 'aes.bat'. The second does the [enc/dec]ryption and is called 'aescore.bat'. I'll post their code separately in this post and the next.

So first up for 'aes.bat':

Code: Select all

@echo off
set LF=^


:: Above 2 blank lines are required - do not remove
set ^"\n=^^^%LF%%LF%^%LF%%LF%^^"

SETLOCAL ENABLEDELAYEDEXPANSION
:: Grab input parameters
set "infile=%~1"
set "key=%~2"
set "keyFile="
set /a numCPU=%~3 2>nul

:: Input validation on filename and key (exit with usage statement if necessary)
IF NOT EXIST "%inFile%" set "inFile=!%~1!"
IF NOT EXIST "%inFile%" GOTO:AES_usage
IF DEFINED key (
   :: Check if a valid path to a keyfile was input directly
   IF EXIST "%key%" (
      set keyFile=%key%
      set key=
   )
   :: Check if the input was a variable containing a valid path to a keyfile
   set "key=!%~2!"
   IF EXIST "%key%" (
      set keyFile=%key%
      set key=
   )
   IF NOT DEFINED keyFile (
      :: Check if the input was a variable containing a valid key
      set len=0
      set "strTest=A!key!"
      FOR /l %%A IN (12,-1,0) DO (
         set /a "len|=1<<%%A"
         FOR %%B IN (!len!) DO IF "!strTest:~%%B,1!"=="" set /a "len&=~1<<%%A"
      )
      IF NOT !len!==32 IF NOT !len!==48 IF NOT !len!==64 GOTO:AES_usage
      set /a keysize=!len!*4
      echo "%key%" | findstr /R "[ghijklmnopqrstuvwxyzGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()-_=+[{\]}\\|',<.>/?]" >nul
      IF %ERRORLEVEL%==0 GOTO:AES_usage
   )
)

:: If we still don't have a key, see if we have a keyfile.  Otherwise, prompt user for direct key input.
:AES_getKey
IF NOT DEFINED key (
   IF EXIST "%keyfile%" (
      FOR /F "usebackq" %%a IN ("%keyFile%") DO set key=%%a
      set len=0
      set "strTest=A!key!"
      FOR /l %%A IN (12,-1,0) DO (
         set /a "len|=1<<%%A"
         FOR %%B IN (!len!) DO IF "!strTest:~%%B,1!"=="" set /a "len&=~1<<%%A"
      )
      IF NOT !len!==32 IF NOT !len!==48 IF NOT !len!==64 (
         echo *INVALID KEY FILE*
         set "keyFile=" & set "key=" & GOTO:AES_getKey
      )
      echo "%key%" | findstr /R "[ghijklmnopqrstuvwxyzGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()-_=+[{\]}\\|',<.>/?]" >nul
      IF !ERRORLEVEL!==0 (
         echo *INVALID KEY FILE*
         set "keyFile=" & set "key=" & GOTO:AES_getKey
      )
      set /a keysize=!len!*4
   ) ELSE (
      echo Please enter a 128-, 192- or 256-bit key value using hexadecimal notation.
      echo ^(Type 'quit' to exit.^)
      set /p key="key="
      IF /I "!key!"=="quit" exit /b
      set len=0
      set "strTest=A!key!"
      FOR /l %%A IN (12,-1,0) DO (
         set /a "len|=1<<%%A"
         FOR %%B IN (!len!) DO IF "!strTest:~%%B,1!"=="" set /a "len&=~1<<%%A"
      )
      IF NOT !len!==32 IF NOT !len!==48 IF NOT !len!==64 echo *INVALID KEY LENGTH* & set "key=" & GOTO:AES_getKey
      set /a keysize=!len!*4
      echo "%key%" | findstr /R "[ghijklmnopqrstuvwxyzGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()-_=+[{\]}\\|',<.>/?]" >nul
      IF !ERRORLEVEL!==0 echo *ILLEGAL CHARACTERS IN KEY* & set "key=" & GOTO:AES_getKey
   )
)

:: Configure based on inputs
set "extension=%inFile:~-4%"
IF "%extension%"==".aes" (set mode=Dec) ELSE (set mode=Enc)
IF "%~3"=="" set numCPU=%NUMBER_OF_PROCESSORS%-1
IF %numCPU% LEQ 0 set /a numCPU=1
IF %numCPU% GTR %NUMBER_OF_PROCESSORS% set /a numCPU=%NUMBER_OF_PROCESSORS%-1

:: Clean up any garbage from previous execution
IF EXIST _mutex* del _mutex*
IF EXIST _pipe* del _pipe*

:: Break the input key into the first set of key words, to be expanded upon by the key scheduler algorithm.
set i=0
FOR /L %%c IN (0,1,3) DO FOR /L %%b IN (0,1,3) DO (
   FOR %%a IN (!i!) DO set /a "key[%%b][%%c]=0x!key:~%%a,2!"
   set /a i+=2
)
IF %keysize%==192 (
   FOR /L %%c IN (4,1,5) DO FOR /L %%b IN (0,1,3) DO (
      FOR %%a IN (!i!) DO set /a "key[%%b][%%c]=0x!key:~%%a,2!"
      set /a i+=2
   )
)
IF %keysize%==256 (
   FOR /L %%c IN (4,1,7) DO FOR /L %%b IN (0,1,3) DO (
      FOR %%a IN (!i!) DO set /a "key[%%b][%%c]=0x!key:~%%a,2!"
      set /a i+=2
   )
)

:: Define pre-computed round constants for use in the key scheduler algorithm
set i=1
FOR %%a IN (01 02 04 08 10 20 40 80 1b 36) DO set /a "RC[!i!]=0x%%a, i+=1"

:: Define pre-computed Rijndael S-box values for use in the SubBytes() function and key scheduler algorithm
set i=0
FOR %%a IN (63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
            ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
            b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
            04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
            09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
            53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
            d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
            51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
            cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
            60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
            e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
            e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
            ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
            70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
            e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
            8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
) DO set /a "S[!i!]=0x%%a, i+=1"

:: Key scheduler algorithm
:: Generates a set of round keys by expanding the input key into the required number of words.
:: The first 4 words of the input key will be used before the cipher rounds begin, and any leftover
:: words will be used in the first cipher round.  Each cipher round requires its own 4-word round key.
:: Input      Round 0     Cipher   Cipher     Round 0    Additional   Words/Key   Expansion
:: Key Size   Need/Have   Rounds   WordsReq   Leftover   WordsReq     Expansion    Rounds Req'd
:: 128-bit      4 / 4       10        40    -    0     =    40      /     4     =     10
:: 192-bit      4 / 6       12        48    -    2     =    46      /     6     =     8
:: 256-bit      4 / 8       14        56    -    4     =    52      /     8     =     7
IF %keySize%==128 set /a expStart=4, expEnd=43, numRounds=10
IF %keySize%==192 set /a expStart=6, expEnd=51, numRounds=12
IF %keySize%==256 set /a expStart=8, expEnd=59, numRounds=14

set nextHalfMod=-1
FOR /L %%a IN (%expStart%,1,%expEnd%) DO (
   set /a mod=%%a %% %expStart%
   :: If this word is a multiple of the input key size, start the generator algorithm
   IF !mod!==0 (
      IF %keysize%==256 set /a nextHalfMod=%%a+4
      set /a lastword=%%a-1
      FOR %%b IN (!lastword!) DO (
         :: First step is a one byte circular left shift of the previous word's values
         set /a tempword[0]=!key[1][%%b]!, %\n%
                tempword[1]=!key[2][%%b]!, %\n%
                tempword[2]=!key[3][%%b]!, %\n%
                tempword[3]=!key[0][%%b]!
         :: Next we substitute all bytes per the pre-computed S-box
         FOR /L %%c IN (0,1,3) DO FOR %%d IN (!tempword[%%c]!) DO set /a tempword[%%c]=!S[%%d]!
         :: Then we XOR the first byte with this round's pre-computed round constant
         set /a expRound=%%a/%expStart%
         FOR %%c IN (!expRound!) DO (
            set /a key[0][%%a]=!tempword[0]!^^^^!RC[%%c]!, %\n%
                   key[1][%%a]=!tempword[1]!, %\n%
                   key[2][%%a]=!tempword[2]!, %\n%
                   key[3][%%a]=!tempword[3]!
         )
      )
      :: Generator algorithm complete.  Now we XOR the result with the last word which was a multiple of the input key size
      set /a last=%%a-%expStart%
      FOR %%b IN (!last!) DO (
         set /a key[0][%%a]=!key[0][%%b]!^^^^!key[0][%%a]!, %\n%
                key[1][%%a]=!key[1][%%b]!^^^^!key[1][%%a]!, %\n%
                key[2][%%a]=!key[2][%%b]!^^^^!key[2][%%a]!, %\n%
                key[3][%%a]=!key[3][%%b]!^^^^!key[3][%%a]!
      )
   )
   IF NOT !mod!==0 (
      :: For 256-bit keys only, if 4 words ago was a multiple of the key size we perform the S-box%\n%
         substitution last-multiple XOR on this word as well.
      IF %%a==!nextHalfMod! (
         set /a lastword=%%a-1
         :: Create a temporary copy of the previous word
         FOR %%b IN (!lastword!) DO (
            set /a tempword[0]=!key[0][%%b]!, %\n%
                   tempword[1]=!key[1][%%b]!, %\n%
                   tempword[2]=!key[2][%%b]!, %\n%
                   tempword[3]=!key[3][%%b]!
         )
         FOR /L %%c IN (0,1,3) DO FOR %%d IN (!tempword[%%c]!) DO set /a key[%%c][%%a]=!S[%%d]!
         set /a last=%%a-8
         FOR %%b IN (!last!) DO (
            set /a key[0][%%a]=!key[0][%%b]!^^^^!key[0][%%a]!, %\n%
                   key[1][%%a]=!key[1][%%b]!^^^^!key[1][%%a]!, %\n%
                   key[2][%%a]=!key[2][%%b]!^^^^!key[2][%%a]!, %\n%
                   key[3][%%a]=!key[3][%%b]!^^^^!key[3][%%a]!
         )
      )
      :: The rest of the words in each round are simply formed by XOR operations between%\n%
         the immediately previous word and the word in the same position from the previous round.
      IF NOT %%a==!nextHalfMod! (
         set /a first=%%a-1
         set /a second=%%a-%expStart%
         FOR %%b IN (!first!) DO FOR %%c IN (!second!) DO (
            set /a key[0][%%a]=!key[0][%%b]!^^^^!key[0][%%c]!, %\n%
                   key[1][%%a]=!key[1][%%b]!^^^^!key[1][%%c]!, %\n%
                   key[2][%%a]=!key[2][%%b]!^^^^!key[2][%%c]!, %\n%
                   key[3][%%a]=!key[3][%%b]!^^^^!key[3][%%c]!
         )
      )
   )
)

::Below begins the encryption routine.  If we're in decrypt mode, jump over this section
IF %mode%==Dec GOTO AES_startDec
:AES_startEnc
set time1=0
for /f "tokens=1-4 delims=:." %%A in ("!time: =0!") do set /a "time1=(((1%%A*60)+1%%B)*60+1%%C)*100+1%%D-36610100"
set "outFile=%inFile%.aes"
IF EXIST !outFile! del !outFile!
set "tempFile=%temp%\#.tmp"
del "%tempFile%" 2>NUL

:: Define pre-computed multiplication tables for multiplying by 2 and 3 over the Rijndael Galois
:: finite field.  Required for the MixColumns() function. 
:: Original algorithm author Antonio Perez Ayala - Nov 12, 2013
:: This replaces the original "Peasant's algorithm"-based macro for generating Galois constants.
:: Multiplicand:                    2         3
FOR /L %%x IN (0,1,255) do (
   :: bit 0 value:                  0         1
   set /A "x=%%x,                             G3[%%x]=x" 
   :: bit 1 value:                  1         1
   set /A "x<<=1, x^=(x>>8)*0x11B,  G2[%%x]=x,G3[%%x]^=x"
)

:: Encryption routine requires administrative privileges for 'fsutil'
:: Validate proper privilege and exit if not running as administrator
fsutil >nul 2>&1
IF NOT %ERRORLEVEL%==0 echo FATAL ERROR:  User does not have administrative rights& exit /b
fsutil file createnew "%tempFile%" %~Z1 >NUL

:: The binary input file is converted to an ASCII-coded hexadecimal representation in blocks of 128-bits.
:: Each block is written into a pipe file in round robin fashion for later processing.
:: Each time a new pipe file is created, a child 'aescore' process is started to manage the pipe.
set /A i=0, d=0, pid=0
set block=
echo 1>_mutex1
for /F "skip=1 tokens=1,2 delims=: " %%b in ('fc /B "%inFile%" "%tempFile%"') do (
   set /A b=0x%%b
   if !i! neq !b! (
      set /A c=b-i
      for /L %%i in (!c!,-1,1) do (
         set "block=!block!00"
         set /A d+=1
         if !d! geq 16 (
            set /a d=0, pid+=1
            set /a pipe=!pid! %% %numCPU%
            echo !pid!,!block!>>_pipe!pipe!
            IF !pid! LEQ %numCPU% start /B aescore PIPE !pipe! enc outfile
            set block=
         )
      )
   )
   set "block=!block!%%c"
   set /A i=b+1, d+=1
   if !d! geq 16 (
      set /a d=0, pid+=1
      set /a pipe=!pid! %% %numCPU%
      echo !pid!,!block!>>_pipe!pipe!
      IF !pid! LEQ %numCPU% start /B aescore PIPE !pipe! enc outfile
      set block=
   )
)
if !i! neq %~Z1 (
   set /A c=%~Z1-i
   for /L %%i in (!c!,-1,1) do (
      set "block=!block!00"
      set /A d+=1
      if !d! geq 16 (
         set /a d=0, pid+=1
         set /a pipe=!pid! %% %numCPU%
         echo !pid!,!block!>>_pipe!pipe!
         IF !pid! LEQ %numCPU% start /B aescore PIPE !pipe! enc outfile
         set block=
      )
   )
)

:: Finished loading all the data into pipes (except any partial last block, see below)
:: Now we have to wait until all child processes finish and delete their pipes
CALL:waitOnPipesClosed

:: If the input data is not an even multiple of 32 bytes there will be a partial block leftover.
:: Pad out the block to 32 bytes then encrypt it separately.
IF DEFINED block (
   set len=0
   set "strTest=A!block!"
   FOR /l %%A IN (12,-1,0) DO (
      set /a "len|=1<<%%A"
      FOR %%B IN (!len!) DO IF "!strTest:~%%B,1!"=="" set /a "len&=~1<<%%A"
   )
   set /A numToPad=32-!len!, pid+=1
   FOR /L %%a IN (1,1,!numToPad!) DO set "block=!block!0"
   echo PAD !numToPad!>>!outfile!
   start /B aescore enc !block! outfile !pid!
   set /a pid+=1
   :WaitEncLastLine
   IF NOT EXIST _mutex!pid! GOTO:WaitEncLastLine
)
del "%tempFile%"
del _mutex!pid!

::Below begins the decryption routine.  If we're in encrypt mode, jump over this section
IF %mode%==Enc GOTO AES_done
:AES_startDec
set time1=0
for /f "tokens=1-4 delims=:." %%A in ("!time: =0!") do set /a "time1=(((1%%A*60)+1%%B)*60+1%%C)*100+1%%D-36610100"
set "hexFile=%inFile%.hex"
set "outfile=%infile:~0,-4%")
IF EXIST %hexFile% del %hexFile%

:: If the output file already exists ie: 'Example.txt', change the output file name to 'Example(1).txt'.
:: If that file already exists, delete it.
IF EXIST "!outFile!" set "outFile=!outFile:~0,-4!(1)!outFile:~-4!"
IF EXIST "!outFile!" del "!outFile!"

:: Define pre-computed Inverse Rijndael S-box values for InvSubBytes() function
:: To minimize CMD environment size, these overwrite the "normal" S-box values used
:: previously in the key scheduler algorithm, that are no longer needed
set i=0
FOR %%a IN (52 09 6A D5 30 36 A5 38 BF 40 A3 9E 81 F3 D7 FB
            7C E3 39 82 9B 2F FF 87 34 8E 43 44 C4 DE E9 CB
            54 7B 94 32 A6 C2 23 3D EE 4C 95 0B 42 FA C3 4E
            08 2E A1 66 28 D9 24 B2 76 5B A2 49 6D 8B D1 25
            72 F8 F6 64 86 68 98 16 D4 A4 5C CC 5D 65 B6 92
            6C 70 48 50 FD ED B9 DA 5E 15 46 57 A7 8D 9D 84
            90 D8 AB 00 8C BC D3 0A F7 E4 58 05 B8 B3 45 06
            D0 2C 1E 8F CA 3F 0F 02 C1 AF BD 03 01 13 8A 6B
            3A 91 11 41 4F 67 DC EA 97 F2 CF CE F0 B4 E6 73
            96 AC 74 22 E7 AD 35 85 E2 F9 37 E8 1C 75 DF 6E
            47 F1 1A 71 1D 29 C5 89 6F B7 62 0E AA 18 BE 1B
            FC 56 3E 4B C6 D2 79 20 9A DB C0 FE 78 CD 5A F4
            1F DD A8 33 88 07 C7 31 B1 12 10 59 27 80 EC 5F
            60 51 7F A9 19 B5 4A 0D 2D E5 7A 9F 93 C9 9C EF
            A0 E0 3B 4D AE 2A F5 B0 C8 EB BB 3C 83 53 99 61
            17 2B 04 7E BA 77 D6 26 E1 69 14 63 55 21 0C 7D
) DO set /a "S[!i!]=0x%%a, i+=1"

:: Define pre-computed multiplication tables for multiplying by by 9, 11, 13 and 14 over the
:: Rijndael Galois finite field.  Required for the InvMixColumns() function.
:: Original algorithm author Antonio Perez Ayala - Nov 12, 2013
:: This replaces the original "Peasant's algorithm"-based macro for generating Galois constants.
:: Multiplicand:                    9          11          13          14
FOR /L %%x IN (0,1,255) do (
   :: bit 0 value:                  1          1           1           0
   set /A "x=%%x,                   G9[%%x]=x, G11[%%x]=x, G13[%%x]=x" 
   :: bit 1 value:                  0          1           0           1
   set /A "x<<=1, x^=(x>>8)*0x11B,             G11[%%x]^=x,            G14[%%x]=x"
   :: bit 2 value:                  0          0           1           1
   set /A "x<<=1, x^=(x>>8)*0x11B,                         G13[%%x]^=x,G14[%%x]^=x"
   :: bit 3 value:                  1          1           1           1
   set /A "x<<=1, x^=(x>>8)*0x11B,  G9[%%x]^=x,G11[%%x]^=x,G13[%%x]^=x,G14[%%x]^=x"
)

:: Each block is written into a pipe file in round robin fashion for later processing.
:: Each time a new pipe file is created, a child 'aescore' process is started to manage the pipe.
echo 1>_mutex1
set pid=1
FOR /F "usebackq tokens=1,2 delims= " %%a IN ("%inFile%") DO (
   set data=%%a
   IF NOT !data!==PAD (
      set /a pipe=!pid! %% %numCPU%
      echo !pid!,!data!>>_pipe!pipe!
      IF !pid! LEQ %numCPU% start /B aescore PIPE !pipe! dec hexFile
      set /a pid+=1
   )
   IF !data!==PAD (
      set unPadLastLine=YES
      set numToUnPad=%%b
      set /A numGoodChars=32-!numToUnPad!
   )
)

:: Finished loading all the data into pipes
:: Now we have to wait until all child processes finish and delete their pipes
CALL:waitOnPipesClosed

:: If the final data block was padded during encryption it needs to be unpadded now
:: Generate a temporary file and then copy it back over the original hexfile
IF DEFINED unPadLastLine (
   :: Count the number of lines in the output hexFile
   FOR /F %%a in ('type "%hexFile%"^|find "" /v /c') do set /a numLines=%%a
   :: Walk the hexFile, copying everything but the last line to a temp file.%\n%
      The last line needs to have its padding characters removed.
   set "tempFile=%hexFile%.tmp"
   del "!tempFile!" 2>NUL
   set lineNum=0
   FOR /F "usebackq" %%a IN ("%hexFile%") DO (
      set /a lineNum+=1
      IF NOT !lineNum!==!numLines! (
         echo %%a>>"!tempFile!"
      ) ELSE (
         set data=%%a
         FOR %%i IN (!numGoodChars!) DO set data=!data:~0,%%i!
         echo !data!>>"!tempFile!"
      )
   )
)
IF EXIST "!tempFile!" (
   del "%hexFile%" 2>NUL
   ren "!tempFile!" "%hexFile%" 2>NUL
)

:: The last step is restoring the original binary data from the hex file
:: Writes a temporary VBScript file then executes and deletes the script.
echo Dim line,index>Hex2Bin.vbs
echo Do While Not WScript.StdIn.AtEndOfStream>>Hex2Bin.vbs
echo    line = WScript.StdIn.ReadLine^(^)>>Hex2Bin.vbs
echo    index = ^1>>Hex2Bin.vbs
echo    While index ^< 32>>Hex2Bin.vbs
echo       WScript.StdOut.Write Chr^(CByte^(^"^&H^"^&Mid^(line,index,2^)^)^)>>Hex2Bin.vbs
echo       index = index+2>>Hex2Bin.vbs
echo    WEnd>>Hex2Bin.vbs
echo Loop>>Hex2Bin.vbs
Cscript /B /E:VBS Hex2Bin.vbs < "%hexFile%" > "%outFile%"
del Hex2Bin.vbs 2>nul
del "%hexFile%" 2>nul
del _mutex!pid!

:AES_done
set time2=0
for /f "tokens=1-4 delims=:." %%A in ("!time: =0!") do set /a "time2=(((1%%A*60)+1%%B)*60+1%%C)*100+1%%D-36610100"
set /a "DD=(!time2!)-(!time1!)"
if !DD! lss 0 set /a "DD+=24*60*60*100"
set /a "HH=DD/360000, DD-=HH*360000, MM=DD/6000, DD-=MM*6000, SS=DD/100, DD-=SS*100"
if "!HH:~1!"=="" set "HH=0!HH!"
if "!MM:~1!"=="" set "MM=0!MM!"
if "!SS:~1!"=="" set "SS=0!SS!"
if "!DD:~1!"=="" set "DD=0!DD!"
set "result=!HH!:!MM!:!SS!.!DD!"
echo Time: !result!
exit /b

:: Subroutine to print usage information
:AES_usage
echo AES ^<inFile^> [^<inKey^>] [^<#CPU^>]
echo  Encrypts or decrypts the input file using the Advanced Encryption Standard
echo  algorithm as specified in NIST FIPS-197.  Files with extensions other than
echo  '.aes' are encrypted to a file of the same name, plus an .aes extension.
echo  '.aes' input files are decrypted to a file of the same name with the .aes
echo  extension removed.
echo    ^<inFile^>:   A complete path to the file to be processed or the name of a
echo                variable containing the target file's path.
echo    [^<inKey^>]:  Either a complete path to a keyfile, a variable containing
echo                the path to a keyfile, or a varible containing a 32-, 48- or
echo                64-character string of hexadecimal values ^(ie: a 128-, 192- or
echo                256-bit key^). If not provided the program will prompt for a key.
echo    [^<#CPU^>]:   An integer corresponding to the number of CPU cores to
echo                exercise in parallel.  Defaults to all CPU cores.
exit /b

:: Subroutine to delay while waiting for child processes to finish
:waitOnPipesClosed
IF EXIST _pipe* GOTO waitOnPipesClosed
exit /b


The most notable changes in this version are the removal of all macros, and automatic generation of the Galois lookup tables. A large amount of effort went into minimizing the size of the CMD environment, as the performance of the child processes slows down with larger environments. Removing the macros facilitated this goal, as did only defining encryption constants in encryption mode, and decryption in decryption mode, rather than defining all constants always.

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 02 Nov 2013 01:25
by Magialisk
UPDATED CODE POSTED 11/25/13

Now the code for the real workhorse, aescore.bat:

Code: Select all

@echo off
SETLOCAL ENABLEDELAYEDEXPANSION
:: Check whether we're supposed to operate with a pipe file
:: If so, grab the input parameters and start encoding from the pipe
set pipeTest=%~1
IF "%pipeTest%"=="PIPE" (
   set pipe=%~2
   set operation=%~3
   set "mode=!operation:~0,3!"
   set "output=!%~4!"
   CALL:encPipe <_pipe!pipe!
   :: Once the pipe is empty, delete the pipe file and return to the main program.
   del "_pipe!pipe!"
   exit
) ELSE GOTO:AES_notPiped

:encPipe
:: Get input one line at a time from the pipe file
:: Each line includes a pid# and data, comma separated
set "pipeIn="
set /p pipeIn=
:: As long as there's data in the pipe, tokenize it and run the AES algorithm
IF DEFINED pipeIn (
   FOR /F "tokens=1,2 delims=," %%a in ("!pipeIn!") do (
      set pid=%%a
      set data=%%b
   )
   CALL:AES_start
   GOTO:encPipe
) ELSE (exit /b)

:: Grab input parameters for Legacy (pre-piped) execution mode, which operates on a single 128-bit block
:AES_notPiped
:: Grab input parameters
set operation=%~1
set "mode=%operation:~0,3%"
set data=%~2
set "output=!%~3!"
set PID=%~4
set /a lastPID=%PID%-1
set "pipe="

::Regardless of whether we're running in legacy mode or piped mode, from here down it's all the same
:AES_start
:: Process input data from left to right as bytes 0 through 15
:: Rearrange each into 4 words of 4 bytes each and insert as columns in a state matrix
::   W0   W1   W2   W3        state[r][c] definitions
::  -------------------       ---------------------
::   B0 | B4 | B8 | B12       0,0 | 0,1 | 0,2 | 0,3
::   B1 | B5 | B9 | B13   =   1,0 | 1,1 | 1,2 | 1,3
::   B2 | B6 | B10| B14       2,0 | 2,1 | 2,2 | 2,3
::   B3 | B7 | B11| B15       3,0 | 3,1 | 3,2 | 3,3
set position=0
FOR /L %%c IN (0,1,3) DO FOR /L %%b IN (0,1,3) DO (
   FOR %%a IN (!position!) DO set /a "state[%%b][%%c]=0x!data:~%%a,2!"
   set /a position=!position!+2
)

:: The encryption algorithm starts here.  If we are in decryption mode, jump over it to
:: where the decryption algorithm starts.
IF %mode%==dec goto:AES_Decrypt

:: Prior to beginning the encryption rounds, perform the AddRoundKey() function using the initial key and state
FOR /L %%c IN (0,1,3) DO FOR /L %%b IN (0,1,3) DO set /a state[%%b][%%c]=!key[%%b][%%c]!^^^^!state[%%b][%%c]!

:: Begin the actual encryption algorithm on the state matrix
:: Each round (variable %%a) 1 through n-1 consists of the following four operations:
::  - SubBytes()
::  - ShiftRows()
::  - MixColumns()
::  - AddRoundKey()
:: The final round consists of all of the above except MixColumns()
FOR /L %%a IN (1,1,%numRounds%) DO (
   :: For efficiency, the SubBytes() and ShiftRows() operations are combined into one step to prevent an additional%\n%
      traversal of the entire state matrix.  Each byte is replaced by the appropriate SubBytes() substitution for the%\n%
      byte that would have shifted into its place in the ShiftRows() stage.
   FOR /L %%b IN (0,1,3) DO FOR /L %%c IN (0,1,3) DO (
      set /A newc=%%c-%%b
      IF !newc! LSS 0 set /A newc+=4
      FOR %%d in (!state[%%b][%%c]!) DO set /a tempstate[%%b][!newc!]=!S[%%d]!
   )
   :: Next is MixColumns().  Performed in all rounds except the last.  This function operates on the tempstate%\n%
      variable produced in the previous function, and outputs back into the original state variable.  The%\n%
      AddRoundKey() operation is combined into this step to prevent an extra loop over the state matrix.
   FOR /L %%c IN (0,1,3) DO (
      set stateTarget=tempstate
      IF NOT %%a==%numrounds% (
         FOR %%d IN (!tempstate[0][%%c]!) DO FOR %%e IN (!tempstate[1][%%c]!) DO FOR %%f IN (!tempstate[2][%%c]!) DO FOR %%g IN (!tempstate[3][%%c]!) DO (
            set /a state[0][%%c]=!G2[%%d]!^^^^!G3[%%e]!^^^^!tempstate[2][%%c]!^^^^!tempstate[3][%%c]!, %\n%
                   state[1][%%c]=!tempstate[0][%%c]!^^^^!G2[%%e]!^^^^!G3[%%f]!^^^^!tempstate[3][%%c]!, %\n%
                   state[2][%%c]=!tempstate[0][%%c]!^^^^!tempstate[1][%%c]!^^^^!G2[%%f]!^^^^!G3[%%g]!, %\n%
                   state[3][%%c]=!G3[%%d]!^^^^!tempstate[1][%%c]!^^^^!tempstate[2][%%c]!^^^^!G2[%%g]!
         )
         set stateTarget=state
      )
      set /A keyword=%%a*4+%%c
      FOR %%d IN (!keyword!) DO FOR %%e IN (!stateTarget!) DO FOR /L %%f IN (0,1,3) DO (
         set /a state[%%f][%%c]=!key[%%f][%%d]!^^^^!%%e[%%f][%%c]!
      )
   )
)

:: The decryption algorithm starts here.  If we are in encryption mode, jump over it to
:: where the output starts.
IF %mode%==enc goto:AES_Output
:AES_Decrypt

:: Prior to beginning the decryption rounds, perform the AddRoundKey() function using the final expanded key
:: This is the 'Nth' round, so decrement the number of rounds to run accordingly
set /a origNumRounds=%numrounds%, firstword=%numrounds%*4
FOR /L %%c IN (0,1,3) DO FOR /L %%b IN (0,1,3) DO (
   set /a keyword=%%c+!firstword!
   FOR %%d IN (!keyword!) DO set /a state[%%b][%%c]=!key[%%b][%%d]!^^^^!state[%%b][%%c]!
)
set /a numrounds-=1

:: Begin the actual decryption algorithm on the state matrix
:: Each round (variable %%a) 9 through 1 consists of the following four operations:
::  - InvShiftRows()
::  - InvSubBytes()
::  - AddRoundKey()
::  - InvMixColumns()
:: Round 0 consists of all of the above except InvMixColumns()
FOR /L %%a IN (!numrounds!,-1,0) DO (
   :: For efficiency, InvShiftRows() and InvSubBytes() operations are combined into one step to prevent an additional%\n%
      traversal of the entire state matrix.  Each byte is replaced by the appropriate InvSubBytes() substitution for the%\n%
      byte that would have shifted into its place in the InvShiftRows() stage.
   FOR /L %%b IN (0,1,3) DO FOR /L %%c IN (0,1,3) DO (
      set /A newc=%%c+%%b
      IF !newc! GTR 3 set /a newc-=4
      FOR %%d in (!state[%%b][%%c]!) DO set /a tempstate[%%b][!newc!]=!S[%%d]!
   )
   :: AddRoundKey() is its own inverse.  As before this operation has been combined with InvMixColumns() for efficiency%\n%
      however the order of the two operations is reversed.
   set /a firstword=%%a*4
   FOR /L %%c IN (0,1,3) DO (
      FOR /L %%b IN (0,1,3) DO (
         set /a keyword=%%c+!firstword!
         FOR %%d IN (!keyword!) DO set /a tempstate2[%%b][%%c]=!key[%%b][%%d]!^^^^!tempstate[%%b][%%c]!
         IF %%a==0 set /a state[%%b][%%c]=!tempstate2[%%b][%%c]!
      )
      IF NOT %%a==0 (
         FOR %%d IN (!tempstate2[0][%%c]!) DO FOR %%e IN (!tempstate2[1][%%c]!) DO FOR %%f IN (!tempstate2[2][%%c]!) DO FOR %%g IN (!tempstate2[3][%%c]!) DO (
            set /a state[0][%%c]=!G14[%%d]!^^^^!G11[%%e]!^^^^!G13[%%f]!^^^^!G9[%%g]!, %\n%
                   state[1][%%c]=!G9[%%d]!^^^^!G14[%%e]!^^^^!G11[%%f]!^^^^!G13[%%g]!, %\n%
                   state[2][%%c]=!G13[%%d]!^^^^!G9[%%e]!^^^^!G14[%%f]!^^^^!G11[%%g]!, %\n%
                   state[3][%%c]=!G11[%%d]!^^^^!G13[%%e]!^^^^!G9[%%f]!^^^^!G14[%%g]!
         )
      )
   )
)
:: Restore the round counter before we grab the next block of data from the input pipe.
set numRounds=%origNumRounds%

:: [En/De]cryption complete.  Prepare the output line by converting the state matrix back
:: to a single line of hexadecimal. 
:AES_Output
set outline=
set map=0123456789ABCDEF
FOR /L %%c IN (0,1,3) DO FOR /L %%b IN (0,1,3) DO (
   set dec=!state[%%b][%%c]!
   FOR /L %%i IN (1,1,2) DO (
      set /a "d=dec&15,dec>>=4"
      FOR %%j IN (!d!) DO set "hex=!map:~%%j,1!!hex!"
   )
   set "outline=!outline!!hex!"
   set hex=
)


:: Mutual exclusion code.
:: This is the critical section where the result block is written to the output file.
:: This code block is protected by the mutex/resource file.  No PID will be allowed to
:: output its result until the resource file is granted to it by the previous PID.  After writing its result, 
:: each PID renames the resource file with the number of the next PID to allow that PID into the critical section.
:: If a PID is stalled for too long before being granted the mutex it assumes something is wrong and commits suicide.
IF %PID%==1 GOTO:AES_Critical
set /a timeout=0, maxTimeout=1000
:AES_waitForRelease
IF NOT EXIST _mutex%PID% (
   set /A timeout=!timeout!+1
   IF !timeout! GTR !maxTimeout! (
      echo FATAL ERROR:  PID %PID% was unable to enter critical section and terminated.
      exit
   )
   GOTO:AES_waitForRelease
)
:AES_Critical
echo !outline!>>"!output!"
set /a nextpid=!pid!+1
ren _mutex!pid! _mutex!nextpid!
IF DEFINED pipe (exit /b) ELSE (exit)

The major change here is that the code now supports "legacy" operation mode and "piped" operation mode. So if you want to call it and pass a single 128-bit block of data (as in older versions of aes.bat) that still works, or you can point it to a pipe file full of 128-bit data chunks and it will process them all. The new version of AES.bat posted above takes advantage of the new piped input format. In either case aescore maintains "thread-safety" using a mutex file to guarantee proper output order amongst several child processes.

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 02 Nov 2013 01:29
by Magialisk
UPDATED 11/08/13
Since there's no more need to put code in this third post, I thought I'd post an update on how much faster the modified code runs. For the exact same ~1KB test file on my Q6600 the times are down in the 12-14s range now as opposed to the 40+s seen in the first post:

Code: Select all

c:\Users\Marc\Desktop\AES>aes test.bin 128.key
Converting input file to ASCII-coded hexadecimal...
...Hexadecimal Conversion Complete (Elapsed Time: 00:00:01.10)
Beginning AES Encryption with 8 PIDs...
...AES Encryption Complete (Elapsed Time: 00:00:12.02)

c:\Users\Marc\Desktop\AES>aes test.bin.aes 128.key
Beginning AES Decryption with 8 PIDs...
The system cannot find the drive specified.
...AES Decryption Complete (Elapsed Time: 00:00:14.41)
Restoring binary file from ASCII-coded hexadecimal...
...Binary file restoration complete (Elapsed Time: 00:00:00.08)


UPDATED 11/25/13
This isn't an apples to apples comparison, but I thought I'd post the speed results of the updated code here, running in 3 pipes on my Xeon machine at work. It should be relatively analogous to running on the Q6600 at home, enough to make the point anyway:

Code: Select all

c:\BTP\AESmult>aes test.bin 128.key 3
Time: 00:00:04.73

c:\BTP\AESmult>aes test.bin.aes 128.key 3
Time: 00:00:08.02

c:\BTP\AESmult>
Clearly the piped architecture made a significant difference. The encrypt side saw a much larger speed boost, because it's CMD environment size was cut down to about 25% of the original code by not defining unnecessary values. The decrypt side was only able to cut out about 1/3 of the environment size, so it's speed boost wasn't quite as substantial, but it's still moving along much quicker. There's not much that can be done about the decrypt routine needing access to over 1250 lookup table values...

On the 12-thread Xeon with 11 pipes running I can encrypt a 10KB file in about 19s and decrypt it in 31s. So throughput is up to about 540 Bytes/sec encrypting, and 330 Bytes/sec decrypting. You won't be encrypting .mp3's anytime soon, but if you're willing to wait a minute or so a reasonably sized 50kB document is now a reasonable target file.

As I said before I'd really love to hear what you guys think. And a huge thank you again to Antonio who not only developed the original Bin<=>Hex functions but gave me a number of great suggestions that led to this enhanced version of the code.

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 04 Nov 2013 19:07
by Magialisk
No comments yet, but that's OK, my feelings aren't hurt :D

I'm working on some improvements to the code. So far I tweaked the PID handler to give it finer resolution over starting PIDs. It used to launch 'maxPID' number of PIDs at once, wait for the last one to finish, and then launch 'maxPID' more. Like a sliding window that only allowed PIDs in the current window bounds to be launched, and the window moved 'maxPID' units at a time. Now it uses 'maxPID' (renamed 'pidNum') more like a target value than a max, and after launching that many PIDs it loops, checking the tasklist and counting cmd.exe instances. As soon as there are less instances than allowed, it starts enough PIDs to get back to 'pidNum', plus a fudge factor, speculating on how many PIDs will finish in the delta time before it gets back into its loop. It does a pretty good job making sure there are always around maxPID-2 to maxPID+2 PIDs in flight, instead of the original bursty method with gaps in between bursts.

I'm also planning a large change in the key expansion algorithm. I had a revelation that it needs to be moved out of 'aescore' and into the main control program. This way it only needs to be run one time, and when a new PID gets 'start'ed the CMD environment should copy all the variables (including all derived round keys) into the new PID's environment. I think I'll bring all of the lookup tables back into the main control program as well so those variables only get set once. It will be interesting to see what the difference in performance is with the slower CMD environment copy on PID start (due to ~1500 extra variables) vs. the faster PID execution (due to not re-expanding keys and not doing ~1500 "set" operations). I'm thinking the env. copy and the sets should roughly cancel each other out, but not having to run the expansion algorithm to re-derive everything should be a big savings...

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 05 Nov 2013 02:33
by foxidrive
It seems impressive, but it's not my sphere of interest - unless it creates pics of nudie women in ascii. ;)

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 05 Nov 2013 17:26
by Magialisk
Yes the sphere of interest for this is extremely small by it's very nature, and much more so because the code is so slow it's impractical to use beyond a proof of concept. Thank you for dropping me a kind comment in any case! Maybe in v2.0 I'll work in some ASCII pics for you :shock:

I did implement several changes to the code, most notably moving the key expansion into the controller program. This cut the execution time of the individual AES PIDs roughly in half. They finish so rapidly now that the PID handler couldn't keep up to keep enough of them in flight. After testing various values, I doubled the default PID count to 2x#CPUs to make up for this, and it does a great job keeping the pipe full now.

I gutted all extraneous code from aescore as well, input validation, etc. If it wasn't strictly necessary it's gone. I also fixed a bug where if the file was an exact even multiple of 32 bytes it would encrypt fine but couldn't be decrypted :oops: . After all these changes, the algorithm needs 9m21s to [en/de]crypt 166 Bytes of hex data, compared to ~30m from 2 days ago. Throughput as such is up to a hair over 300 Bytes/sec. That's still only a partial story of course, as the 166 Bytes of hex data is created from a 66 KB file, and it takes ~1.5m to do that Bin2Hex conversion. So from start to finish the operation encrypts 66KB in ~11m, netting just over 100 Bytes/sec.

That's about all I can squeeze out of it I think. I might try multithreading the Bin2Hex conversion for some more improvement, and there's one more algorithmic change I want to experiment with to see if I can eek another couple percent, then I'll update the first posts with the improved code and leave this one for the ages. Thanks again for dropping a line!

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 06 Nov 2013 06:40
by einstein1969
hi , Magialisk

I would like to try your code but I'm in trouble.

Could you post in order to be clear about what are the files?

es.

file1.bat:

Code: Select all

code of file1.bat



file2.bat:

Code: Select all

etc..


thanks anyway!

EDIT: Ok i have done! It's slow.... the set section is very slow.

Einstein1969

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 06 Nov 2013 17:30
by Magialisk
Thank you for giving it a try! I honestly didn't expect anyone to download and tinker with it, that's why I broke the code into sections instead of one big blob at the expense of making it harder to reassemble. I figured at best someone might want to read over some of the sections and comment.

When you say "the set section is very slow" I'm curious exactly which section you mean? The lookup table definitions up top? That's over 2000 'set' operations for the parser to chug through, and unfortunately nothing can be done to speed it along :|. Any tips on speeding up any portion of the code would be very much appreciated though! A hundredth of a second goes a long way when multiplied out by 5000+ PIDs! As I mentioned a couple posts up I've since tweaked it to be a little over twice as fast (probably should say "half as slow instead :lol:) and I'll be updating the posts with that code, probably over the weekend.

Even with the improved code it is still very, very slow, but considering the number of operations being done that was mostly expected. This is after all supposedly the most secure non-classified encryption algorithm known, and the only "public" algorithm allowed to be used with US Top Secret data. In other words, I didn't invent it, I can't change how it "works", but I could potentially implement it in a more or less efficient way...

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 06 Nov 2013 19:57
by dbenham
Magialisk wrote:Even with the improved code it is still very, very slow, but considering the number of operations being done that was mostly expected. This is after all supposedly the most secure non-classified encryption algorithm known, and the only "public" algorithm allowed to be used with US Top Secret data. In other words, I didn't invent it, I can't change how it "works", but I could potentially implement it in a more or less efficient way...

Like perhaps use a different language :twisted:

Your project is totally impractical, yet very interesting, and an amazing accomplishment.


Dave Benham

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 06 Nov 2013 22:15
by Magialisk
Thanks Dave for the kind word. I think I've said too many times already that this was never meant as anything more than a proof of concept, so practicality is a distant secondary concern :D . This was a somewhat natural conclusion from that "string encoding" thread up through the Vigenere cipher, the Enigma machine and now ultimately AES. I just like to see what can be done in batch, it's fun for me.

In fact the original idea to try this came from a lecture I gave about disk encryption that touched on AES at a high level. The lecture wasn't too long after I'd finished the Enigma code, and I made a comment that the algorithm was open and that anyone with the smarts/time/whatever could just sit down at home and write a working AES cipher. It wasn't long after saying that my brain started thinking "I wonder if I could batch it..?" :twisted: That and this gave me a good excuse to dive in and digest how the algorithm works instead of just hand-waving over it all the time.

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 06 Nov 2013 23:25
by Aacini
Excuse me. If the objective is to generate a more efficient Batch code for this application, I certainly will not do that trying to understand (parts of) your code and then modifying it. Doing that will only produce a series of patches that perhaps may interfere with each other and that does not represent the best way to achieve a certain task. I prefer to consult the original specification you used to develop this application and, after review and understand it, propose modifications. However, you have not posted a single reference to AES or FIPS-197 and I am not interested enough in this topic to google for they. In my opinion, you should start this topic with a brief description of what AES/FIPS-197 are, the benefits of their use and a couple links to the reference pages you used to develop this program.

Anyway, after a casual review of your code I can do a couple observations about it:

Although a macro is much more efficient than a subroutine call, it is slightly less efficient than the original code written in its place, because the macro expansion is an additional task that is not done in the second case. In the same way, a !delayed! expansion is less efficient than a %normal% one, and not needed switches and options just slow down a program. For example, in the code below you should change the two separated "set /A numPIDs=%%i & set /A numPIDs-=2" by just one "set /A numPIDs=%%i-2" and this "IF !numPIDs! GEQ !maxPID! ..." by "IF %numPIDs% GEQ %maxPID% ...". Finally, any loop assembled via GOTO are certainly the worst case of efficiency, especially if the program is large.

In my opinion, the code below waste a great amount of CPU resources:

Code: Select all

:: Subroutine to control the spawning of PIDs.  
:: Checks the task list to see how many cmd.exe instances are running.  Subtracts one for itself
:: and one for the tasklist command.  Loops until there are less than maxPID instances before
:: returning to launch another one.
:AESctrl_PIDctrl
FOR /F %%i IN ('tasklist ^| find "cmd.exe" /C') do set /A numPIDs=%%i
set /A numPIDs-=2
IF !numPIDs! GEQ !maxPID! GOTO AESctrl_PIDctrl
exit /b


Perhaps a ping command inserted in the cycle will free CPU resources in order to be used by the rest of the program. However, as I said before, I don't know how this subroutine is used in the whole program, so I don't know if my conclusion is right. To be sure of that, I would need to understand the whole code first!

Antonio

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 07 Nov 2013 00:29
by Magialisk
Antonio thank you so much for your feedback, and while I'm at it thank you again for your BinToHex and HexToBin scripts. I do hope you don't mind that I've modified them for this task! (note in the source I retained your original authorship statements)

Allow me to address your primary concern about "what is" AES. The short version is that the Advanced Encryption Standard is the result of a multi-year, worldwide, open "competition" held by NIST to create a successor to the Data Encryption Standard (DES). The winner of that competition was an algorithm named Rjindael designed by two Belgian cryptographers. NIST took the Rijndael algorithm and standardized a version with slightly reduced capabilities in a document called FIPS-197. This is a Federal Information Processing Standard, one of many that define requirements for protecting the US government's data. That said, NIST is only sanctioned to define standards for data up to "Sensitive but Unclassified" while the NSA has the charter to define standards for protecting data at the SECRET level and above. The AES algorithm is interesting because it has been blessed by the NSA for use in protecting up to US TOP SECRET data, the first and to my knowledge only open cryptographic standard to achieve this. So not only is it the de-facto standard commercially used in everything from hard disks to cell phones it is also the standard militarily, used in the US, EU, NATO, etc. As far as "symmetric block ciphers" go, this is "the one". In addition, with everything going on in the news, allegations of the NSA injecting backdoors or otherwise weakening several cryptographic standards, this is one of the rare few that is both blessed by NSA for it's strength and simultaneously above reproach in terms of their ability to have tampered with it's construction.

The standard itself can be downloaded here: http://csrc.nist.gov/publications/fips/ ... ps-197.pdf, and it contains everything one would need to write a working cipher, including psuedo-code and example block sequences of the algorithm in action. No other reference is necessary, though I admit I did plenty of Googling myself in constructing the cipher. I'd be happy to share more, but I doubt the majority would care to read it. As Foxidrive pointed out, this topic is of limited interest to the majority. I happen to work in a realm where NIST/NSA standards are relevant, I've taken a recent interest in cryptography as a study, and I enjoy writing batch scripts, so here we are. :D

Moving on, you made two very good points about my inefficiencies in that subroutine. Combining the two 'set's into one seems so obvious, it's almost embarrassing to have not done that by default. Also, the delayed vs. normal expansion is a very bad habit I admittedly have. I get burned very often when my variables in code blocks (even simple 'IF' statements) refuse to work without delayed expansion, so I've gotten in the bad habit of using delayed notation just about everywhere. It usually doesn't matter because I'm not trying to optimize my script, but in this case it does matter and I appreciate your critique. I will scrub the batch of all unnecessary delayed expansion before I post the updated code this weekend.

Finally, a comment about the GOTO loop and the waste of CPU resources. While I can't argue with that in principle, I believe that in this situation the GOTO is the best solution. I'll attempt to explain. That tight wait loop occurs in the main program which is controlling the dispatching of child processes (PIDs) that do the real work. It will consume one core on its own because of this unrestrained GOTO loop, but that seemed better than the alternative, especially on high core count machines. I was originally going to use a 0.1 second ping delay in this instance until a thread from earlier this week brought it to my attention that ping can only create a delay in 500ms intervals. Each one of my PIDs only runs for approximately 10-12 hundredths of a second, so if I used a ping, by the time it returned I should have completed at least 5 PIDs. On a quad-core PC, for example, the whole pipeline would be stalled after the 4th PID was finished and I'd have to wait on the ping to timeout before I could launch some more. My useful computational duty cycle would be cut by more than 20%. Instead I used the very tight loop below to detect as quickly as possible when a PID has ended so I can get another one in the pipe to replace it. In the improved version of this routine (not posted yet) I actually fire off additional PIDs based on a speculation of how many more will finish in the time between the subroutine deciding to fire a new PID and the time it takes to actually fire one and get back into the subroutine loop. I've re-written this PID handler at least a half-dozen different ways, and so far this is the best I've come up with (or technically, the revised as-yet-unposted variant of this is). Previous versions showed a constant sawtooth between 90% and 100% CPU, while this one gives a solid straight line on all 12 cores with only an occasional dip to ~98%. That doesn't mean it can't be made better, but there are several ways it could be made worse :D

Thank you again and I apologize for the overly long reply. I'm thrilled you've taken even a cursory interest in my project, so please if you have any other questions or suggestions let me know.

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 07 Nov 2013 14:44
by einstein1969
majalisk wrote:I'm also planning a large change in the key expansion algorithm. I had a revelation that it needs to be moved out of 'aescore' and into the main control program. This way it only needs to be run one time, and when a new PID gets 'start'ed the CMD environment should copy all the variables (including all derived round keys) into the new PID's environment. I think I'll bring all of the lookup tables back into the main control program as well so those variables only get set once. It will be interesting to see what the difference in performance is with the slower CMD environment copy on PID start (due to ~1500 extra variables) vs. the faster PID execution (due to not re-expanding keys and not doing ~1500 "set" operations). I'm thinking the env. copy and the sets should roughly cancel each other out, but not having to run the expansion algorithm to re-derive everything should be a big savings...

I have made some test.


Aacini wrote:Although a macro is much more efficient than a subroutine call, it is slightly less efficient than the original code written in its place, because the macro expansion is an additional task that is not done in the second case. In the same way, a !delayed! expansion is less efficient than a %normal% one, and not needed switches and options just slow down a program. For example, in the code below you should change the two separated "set /A numPIDs=%%i & set /A numPIDs-=2" by just one "set /A numPIDs=%%i-2" and this "IF !numPIDs! GEQ !maxPID! ..." by "IF %numPIDs% GEQ %maxPID% ...". Finally, any loop assembled via GOTO are certainly the worst case of efficiency, especially if the program is large.

This is good speed improvement.



There is other optimization that we can try:

- Map and lookup
- use of type / findstr for faster lookup , etc

- For spawn of process: Implemen an architecture more efficiency
Example:
- Spawn a process that work in the role of "dispatcher" (the main)
- Spawn 'n' processes that do not ending execute (spawn only once)
- For each process execute more work if possible and after finish wait for other work to do without respawn.

test:

Code: Select all


@echo off & setlocal EnableDelayedExpansion

set i=100

set t0=%time%

:loop

set /a RC[1]=0x01 & set /a RC[2]=0x02 & set /a RC[3]=0x04 & set /a RC[4]=0x08 & set /a RC[5]=0x10
set /a RC[6]=0x20 & set /a RC[7]=0x40 & set /a RC[8]=0x80 & set /a RC[9]=0x1b & set /a RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63 & set /a S[1]=0x7c & set /a S[2]=0x77 & set /a S[3]=0x7b & set /a S[4]=0xf2
set /a S[5]=0x6b & set /a S[6]=0x6f & set /a S[7]=0xc5 & set /a S[8]=0x30 & set /a S[9]=0x01
set /a S[10]=0x67 & set /a S[11]=0x2b & set /a S[12]=0xfe & set /a S[13]=0xd7 & set /a S[14]=0xab
set /a S[15]=0x76 & set /a S[16]=0xca & set /a S[17]=0x82 & set /a S[18]=0xc9 & set /a S[19]=0x7d
set /a S[20]=0xfa & set /a S[21]=0x59 & set /a S[22]=0x47 & set /a S[23]=0xf0 & set /a S[24]=0xad
set /a S[25]=0xd4 & set /a S[26]=0xa2 & set /a S[27]=0xaf & set /a S[28]=0x9c & set /a S[29]=0xa4
set /a S[30]=0x72 & set /a S[31]=0xc0 & set /a S[32]=0xb7 & set /a S[33]=0xfd & set /a S[34]=0x93
set /a S[35]=0x26 & set /a S[36]=0x36 & set /a S[37]=0x3f & set /a S[38]=0xf7 & set /a S[39]=0xcc
set /a S[40]=0x34 & set /a S[41]=0xa5 & set /a S[42]=0xe5 & set /a S[43]=0xf1 & set /a S[44]=0x71
set /a S[45]=0xd8 & set /a S[46]=0x31 & set /a S[47]=0x15 & set /a S[48]=0x04 & set /a S[49]=0xc7
set /a S[50]=0x23 & set /a S[51]=0xc3 & set /a S[52]=0x18 & set /a S[53]=0x96 & set /a S[54]=0x05
set /a S[55]=0x9a & set /a S[56]=0x07 & set /a S[57]=0x12 & set /a S[58]=0x80 & set /a S[59]=0xe2
set /a S[60]=0xeb & set /a S[61]=0x27 & set /a S[62]=0xb2 & set /a S[63]=0x75 & set /a S[64]=0x09
set /a S[65]=0x83 & set /a S[66]=0x2c & set /a S[67]=0x1a & set /a S[68]=0x1b & set /a S[69]=0x6e
set /a S[70]=0x5a & set /a S[71]=0xa0 & set /a S[72]=0x52 & set /a S[73]=0x3b & set /a S[74]=0xd6
set /a S[75]=0xb3 & set /a S[76]=0x29 & set /a S[77]=0xe3 & set /a S[78]=0x2f & set /a S[79]=0x84
set /a S[80]=0x53 & set /a S[81]=0xd1 & set /a S[82]=0x00 & set /a S[83]=0xed & set /a S[84]=0x20
set /a S[85]=0xfc & set /a S[86]=0xb1 & set /a S[87]=0x5b & set /a S[88]=0x6a & set /a S[89]=0xcb
set /a S[90]=0xbe & set /a S[91]=0x39 & set /a S[92]=0x4a & set /a S[93]=0x4c & set /a S[94]=0x58
set /a S[95]=0xcf & set /a S[96]=0xd0 & set /a S[97]=0xef & set /a S[98]=0xaa & set /a S[99]=0xfb

set /a i=i-1
if !i! gtr 0 goto :loop

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

set i=100

set t0=%time%

:loop1

call aes_set_1.cmd

set /a i=i-1
if !i! gtr 0 goto :loop1

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%

goto :next_test

rem  aes_set_1.cmd:

set /a RC[1]=0x01 & set /a RC[2]=0x02 & set /a RC[3]=0x04 & set /a RC[4]=0x08 & set /a RC[5]=0x10
set /a RC[6]=0x20 & set /a RC[7]=0x40 & set /a RC[8]=0x80 & set /a RC[9]=0x1b & set /a RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63 & set /a S[1]=0x7c & set /a S[2]=0x77 & set /a S[3]=0x7b & set /a S[4]=0xf2
set /a S[5]=0x6b & set /a S[6]=0x6f & set /a S[7]=0xc5 & set /a S[8]=0x30 & set /a S[9]=0x01
set /a S[10]=0x67 & set /a S[11]=0x2b & set /a S[12]=0xfe & set /a S[13]=0xd7 & set /a S[14]=0xab
set /a S[15]=0x76 & set /a S[16]=0xca & set /a S[17]=0x82 & set /a S[18]=0xc9 & set /a S[19]=0x7d
set /a S[20]=0xfa & set /a S[21]=0x59 & set /a S[22]=0x47 & set /a S[23]=0xf0 & set /a S[24]=0xad
set /a S[25]=0xd4 & set /a S[26]=0xa2 & set /a S[27]=0xaf & set /a S[28]=0x9c & set /a S[29]=0xa4
set /a S[30]=0x72 & set /a S[31]=0xc0 & set /a S[32]=0xb7 & set /a S[33]=0xfd & set /a S[34]=0x93
set /a S[35]=0x26 & set /a S[36]=0x36 & set /a S[37]=0x3f & set /a S[38]=0xf7 & set /a S[39]=0xcc
set /a S[40]=0x34 & set /a S[41]=0xa5 & set /a S[42]=0xe5 & set /a S[43]=0xf1 & set /a S[44]=0x71
set /a S[45]=0xd8 & set /a S[46]=0x31 & set /a S[47]=0x15 & set /a S[48]=0x04 & set /a S[49]=0xc7
set /a S[50]=0x23 & set /a S[51]=0xc3 & set /a S[52]=0x18 & set /a S[53]=0x96 & set /a S[54]=0x05
set /a S[55]=0x9a & set /a S[56]=0x07 & set /a S[57]=0x12 & set /a S[58]=0x80 & set /a S[59]=0xe2
set /a S[60]=0xeb & set /a S[61]=0x27 & set /a S[62]=0xb2 & set /a S[63]=0x75 & set /a S[64]=0x09
set /a S[65]=0x83 & set /a S[66]=0x2c & set /a S[67]=0x1a & set /a S[68]=0x1b & set /a S[69]=0x6e
set /a S[70]=0x5a & set /a S[71]=0xa0 & set /a S[72]=0x52 & set /a S[73]=0x3b & set /a S[74]=0xd6
set /a S[75]=0xb3 & set /a S[76]=0x29 & set /a S[77]=0xe3 & set /a S[78]=0x2f & set /a S[79]=0x84
set /a S[80]=0x53 & set /a S[81]=0xd1 & set /a S[82]=0x00 & set /a S[83]=0xed & set /a S[84]=0x20
set /a S[85]=0xfc & set /a S[86]=0xb1 & set /a S[87]=0x5b & set /a S[88]=0x6a & set /a S[89]=0xcb
set /a S[90]=0xbe & set /a S[91]=0x39 & set /a S[92]=0x4a & set /a S[93]=0x4c & set /a S[94]=0x58
set /a S[95]=0xcf & set /a S[96]=0xd0 & set /a S[97]=0xef & set /a S[98]=0xaa & set /a S[99]=0xfb


::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:next_test

set t0=%time%

For /L %%i in (1,1,100) do (

call aes_set_1.cmd

)

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

set i=100

set t0=%time%

:loop2

(
set /a RC[1]=0x01 & set /a RC[2]=0x02 & set /a RC[3]=0x04 & set /a RC[4]=0x08 & set /a RC[5]=0x10
set /a RC[6]=0x20 & set /a RC[7]=0x40 & set /a RC[8]=0x80 & set /a RC[9]=0x1b & set /a RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63 & set /a S[1]=0x7c & set /a S[2]=0x77 & set /a S[3]=0x7b & set /a S[4]=0xf2
set /a S[5]=0x6b & set /a S[6]=0x6f & set /a S[7]=0xc5 & set /a S[8]=0x30 & set /a S[9]=0x01
set /a S[10]=0x67 & set /a S[11]=0x2b & set /a S[12]=0xfe & set /a S[13]=0xd7 & set /a S[14]=0xab
set /a S[15]=0x76 & set /a S[16]=0xca & set /a S[17]=0x82 & set /a S[18]=0xc9 & set /a S[19]=0x7d
set /a S[20]=0xfa & set /a S[21]=0x59 & set /a S[22]=0x47 & set /a S[23]=0xf0 & set /a S[24]=0xad
set /a S[25]=0xd4 & set /a S[26]=0xa2 & set /a S[27]=0xaf & set /a S[28]=0x9c & set /a S[29]=0xa4
set /a S[30]=0x72 & set /a S[31]=0xc0 & set /a S[32]=0xb7 & set /a S[33]=0xfd & set /a S[34]=0x93
set /a S[35]=0x26 & set /a S[36]=0x36 & set /a S[37]=0x3f & set /a S[38]=0xf7 & set /a S[39]=0xcc
set /a S[40]=0x34 & set /a S[41]=0xa5 & set /a S[42]=0xe5 & set /a S[43]=0xf1 & set /a S[44]=0x71
set /a S[45]=0xd8 & set /a S[46]=0x31 & set /a S[47]=0x15 & set /a S[48]=0x04 & set /a S[49]=0xc7
set /a S[50]=0x23 & set /a S[51]=0xc3 & set /a S[52]=0x18 & set /a S[53]=0x96 & set /a S[54]=0x05
set /a S[55]=0x9a & set /a S[56]=0x07 & set /a S[57]=0x12 & set /a S[58]=0x80 & set /a S[59]=0xe2
set /a S[60]=0xeb & set /a S[61]=0x27 & set /a S[62]=0xb2 & set /a S[63]=0x75 & set /a S[64]=0x09
set /a S[65]=0x83 & set /a S[66]=0x2c & set /a S[67]=0x1a & set /a S[68]=0x1b & set /a S[69]=0x6e
set /a S[70]=0x5a & set /a S[71]=0xa0 & set /a S[72]=0x52 & set /a S[73]=0x3b & set /a S[74]=0xd6
set /a S[75]=0xb3 & set /a S[76]=0x29 & set /a S[77]=0xe3 & set /a S[78]=0x2f & set /a S[79]=0x84
set /a S[80]=0x53 & set /a S[81]=0xd1 & set /a S[82]=0x00 & set /a S[83]=0xed & set /a S[84]=0x20
set /a S[85]=0xfc & set /a S[86]=0xb1 & set /a S[87]=0x5b & set /a S[88]=0x6a & set /a S[89]=0xcb
set /a S[90]=0xbe & set /a S[91]=0x39 & set /a S[92]=0x4a & set /a S[93]=0x4c & set /a S[94]=0x58
set /a S[95]=0xcf & set /a S[96]=0xd0 & set /a S[97]=0xef & set /a S[98]=0xaa & set /a S[99]=0xfb
)

set /a i=i-1
if !i! gtr 0 goto :loop2

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

set t0=%time%

For /L %%i in (1,1,100) do (

set /a RC[1]=0x01 & set /a RC[2]=0x02 & set /a RC[3]=0x04 & set /a RC[4]=0x08 & set /a RC[5]=0x10
set /a RC[6]=0x20 & set /a RC[7]=0x40 & set /a RC[8]=0x80 & set /a RC[9]=0x1b & set /a RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63 & set /a S[1]=0x7c & set /a S[2]=0x77 & set /a S[3]=0x7b & set /a S[4]=0xf2
set /a S[5]=0x6b & set /a S[6]=0x6f & set /a S[7]=0xc5 & set /a S[8]=0x30 & set /a S[9]=0x01
set /a S[10]=0x67 & set /a S[11]=0x2b & set /a S[12]=0xfe & set /a S[13]=0xd7 & set /a S[14]=0xab
set /a S[15]=0x76 & set /a S[16]=0xca & set /a S[17]=0x82 & set /a S[18]=0xc9 & set /a S[19]=0x7d
set /a S[20]=0xfa & set /a S[21]=0x59 & set /a S[22]=0x47 & set /a S[23]=0xf0 & set /a S[24]=0xad
set /a S[25]=0xd4 & set /a S[26]=0xa2 & set /a S[27]=0xaf & set /a S[28]=0x9c & set /a S[29]=0xa4
set /a S[30]=0x72 & set /a S[31]=0xc0 & set /a S[32]=0xb7 & set /a S[33]=0xfd & set /a S[34]=0x93
set /a S[35]=0x26 & set /a S[36]=0x36 & set /a S[37]=0x3f & set /a S[38]=0xf7 & set /a S[39]=0xcc
set /a S[40]=0x34 & set /a S[41]=0xa5 & set /a S[42]=0xe5 & set /a S[43]=0xf1 & set /a S[44]=0x71
set /a S[45]=0xd8 & set /a S[46]=0x31 & set /a S[47]=0x15 & set /a S[48]=0x04 & set /a S[49]=0xc7
set /a S[50]=0x23 & set /a S[51]=0xc3 & set /a S[52]=0x18 & set /a S[53]=0x96 & set /a S[54]=0x05
set /a S[55]=0x9a & set /a S[56]=0x07 & set /a S[57]=0x12 & set /a S[58]=0x80 & set /a S[59]=0xe2
set /a S[60]=0xeb & set /a S[61]=0x27 & set /a S[62]=0xb2 & set /a S[63]=0x75 & set /a S[64]=0x09
set /a S[65]=0x83 & set /a S[66]=0x2c & set /a S[67]=0x1a & set /a S[68]=0x1b & set /a S[69]=0x6e
set /a S[70]=0x5a & set /a S[71]=0xa0 & set /a S[72]=0x52 & set /a S[73]=0x3b & set /a S[74]=0xd6
set /a S[75]=0xb3 & set /a S[76]=0x29 & set /a S[77]=0xe3 & set /a S[78]=0x2f & set /a S[79]=0x84
set /a S[80]=0x53 & set /a S[81]=0xd1 & set /a S[82]=0x00 & set /a S[83]=0xed & set /a S[84]=0x20
set /a S[85]=0xfc & set /a S[86]=0xb1 & set /a S[87]=0x5b & set /a S[88]=0x6a & set /a S[89]=0xcb
set /a S[90]=0xbe & set /a S[91]=0x39 & set /a S[92]=0x4a & set /a S[93]=0x4c & set /a S[94]=0x58
set /a S[95]=0xcf & set /a S[96]=0xd0 & set /a S[97]=0xef & set /a S[98]=0xaa & set /a S[99]=0xfb


)

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::


set t0=%time%

For /L %%i in (1,1,100) do (

set /a RC[1]=0x01, RC[2]=0x02, RC[3]=0x04, RC[4]=0x08, RC[5]=0x10
set /a RC[6]=0x20, RC[7]=0x40, RC[8]=0x80, RC[9]=0x1b, RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63, S[1]=0x7c, S[2]=0x77, S[3]=0x7b, S[4]=0xf2
set /a S[5]=0x6b, S[6]=0x6f, S[7]=0xc5, S[8]=0x30, S[9]=0x01
set /a S[10]=0x67, S[11]=0x2b, S[12]=0xfe, S[13]=0xd7, S[14]=0xab
set /a S[15]=0x76, S[16]=0xca, S[17]=0x82, S[18]=0xc9, S[19]=0x7d
set /a S[20]=0xfa, S[21]=0x59, S[22]=0x47, S[23]=0xf0, S[24]=0xad
set /a S[25]=0xd4, S[26]=0xa2, S[27]=0xaf, S[28]=0x9c, S[29]=0xa4
set /a S[30]=0x72, S[31]=0xc0, S[32]=0xb7, S[33]=0xfd, S[34]=0x93
set /a S[35]=0x26, S[36]=0x36, S[37]=0x3f, S[38]=0xf7, S[39]=0xcc
set /a S[40]=0x34, S[41]=0xa5, S[42]=0xe5, S[43]=0xf1, S[44]=0x71
set /a S[45]=0xd8, S[46]=0x31, S[47]=0x15, S[48]=0x04, S[49]=0xc7
set /a S[50]=0x23, S[51]=0xc3, S[52]=0x18, S[53]=0x96, S[54]=0x05
set /a S[55]=0x9a, S[56]=0x07, S[57]=0x12, S[58]=0x80, S[59]=0xe2
set /a S[60]=0xeb, S[61]=0x27, S[62]=0xb2, S[63]=0x75, S[64]=0x09
set /a S[65]=0x83, S[66]=0x2c, S[67]=0x1a, S[68]=0x1b, S[69]=0x6e
set /a S[70]=0x5a, S[71]=0xa0, S[72]=0x52, S[73]=0x3b, S[74]=0xd6
set /a S[75]=0xb3, S[76]=0x29, S[77]=0xe3, S[78]=0x2f, S[79]=0x84
set /a S[80]=0x53, S[81]=0xd1, S[82]=0x00, S[83]=0xed, S[84]=0x20
set /a S[85]=0xfc, S[86]=0xb1, S[87]=0x5b, S[88]=0x6a, S[89]=0xcb
set /a S[90]=0xbe, S[91]=0x39, S[92]=0x4a, S[93]=0x4c, S[94]=0x58
set /a S[95]=0xcf, S[96]=0xd0, S[97]=0xef, S[98]=0xaa, S[99]=0xfb


)

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

set t0=%time%

For /L %%i in (1,1,100) do (

set /a RC[1]=0x01, RC[2]=0x02, RC[3]=0x04, RC[4]=0x08, RC[5]=0x10, RC[6]=0x20, RC[7]=0x40, RC[8]=0x80, RC[9]=0x1b, RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63, S[1]=0x7c, S[2]=0x77, S[3]=0x7b, S[4]=0xf2, S[5]=0x6b, S[6]=0x6f, S[7]=0xc5, S[8]=0x30, S[9]=0x01
set /a S[10]=0x67, S[11]=0x2b, S[12]=0xfe, S[13]=0xd7, S[14]=0xab, S[15]=0x76, S[16]=0xca, S[17]=0x82, S[18]=0xc9, S[19]=0x7d
set /a S[20]=0xfa, S[21]=0x59, S[22]=0x47, S[23]=0xf0, S[24]=0xad, S[25]=0xd4, S[26]=0xa2, S[27]=0xaf, S[28]=0x9c, S[29]=0xa4
set /a S[30]=0x72, S[31]=0xc0, S[32]=0xb7, S[33]=0xfd, S[34]=0x93, S[35]=0x26, S[36]=0x36, S[37]=0x3f, S[38]=0xf7, S[39]=0xcc
set /a S[40]=0x34, S[41]=0xa5, S[42]=0xe5, S[43]=0xf1, S[44]=0x71, S[45]=0xd8, S[46]=0x31, S[47]=0x15, S[48]=0x04, S[49]=0xc7
set /a S[50]=0x23, S[51]=0xc3, S[52]=0x18, S[53]=0x96, S[54]=0x05, S[55]=0x9a, S[56]=0x07, S[57]=0x12, S[58]=0x80, S[59]=0xe2
set /a S[60]=0xeb, S[61]=0x27, S[62]=0xb2, S[63]=0x75, S[64]=0x09, S[65]=0x83, S[66]=0x2c, S[67]=0x1a, S[68]=0x1b, S[69]=0x6e
set /a S[70]=0x5a, S[71]=0xa0, S[72]=0x52, S[73]=0x3b, S[74]=0xd6, S[75]=0xb3, S[76]=0x29, S[77]=0xe3, S[78]=0x2f, S[79]=0x84
set /a S[80]=0x53, S[81]=0xd1, S[82]=0x00, S[83]=0xed, S[84]=0x20, S[85]=0xfc, S[86]=0xb1, S[87]=0x5b, S[88]=0x6a, S[89]=0xcb
set /a S[90]=0xbe, S[91]=0x39, S[92]=0x4a, S[93]=0x4c, S[94]=0x58, S[95]=0xcf, S[96]=0xd0, S[97]=0xef, S[98]=0xaa, S[99]=0xfb


)

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Execution time: %a%

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
rem test no spawn ambient

for /f "delims==" %%f in ('set s[') do set %%f=

echo main thread: echo S[50]:!S[50]!

set t0=%time%

For /L %%i in (1,1,10) do (start /B /WAIT cmd /c AEScore_set_part.bat)

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Ambient recreation: Execution time: %a%


goto :next_test1

rem AEScore_set_part.bat:

@echo off  & setlocal EnableDelayedExpansion

For /L %%j in (1,1,100) do (

set /a RC[1]=0x01, RC[2]=0x02, RC[3]=0x04, RC[4]=0x08, RC[5]=0x10, RC[6]=0x20, RC[7]=0x40, RC[8]=0x80, RC[9]=0x1b, RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63, S[1]=0x7c, S[2]=0x77, S[3]=0x7b, S[4]=0xf2, S[5]=0x6b, S[6]=0x6f, S[7]=0xc5, S[8]=0x30, S[9]=0x01
set /a S[10]=0x67, S[11]=0x2b, S[12]=0xfe, S[13]=0xd7, S[14]=0xab, S[15]=0x76, S[16]=0xca, S[17]=0x82, S[18]=0xc9, S[19]=0x7d
set /a S[20]=0xfa, S[21]=0x59, S[22]=0x47, S[23]=0xf0, S[24]=0xad, S[25]=0xd4, S[26]=0xa2, S[27]=0xaf, S[28]=0x9c, S[29]=0xa4
set /a S[30]=0x72, S[31]=0xc0, S[32]=0xb7, S[33]=0xfd, S[34]=0x93, S[35]=0x26, S[36]=0x36, S[37]=0x3f, S[38]=0xf7, S[39]=0xcc
set /a S[40]=0x34, S[41]=0xa5, S[42]=0xe5, S[43]=0xf1, S[44]=0x71, S[45]=0xd8, S[46]=0x31, S[47]=0x15, S[48]=0x04, S[49]=0xc7
set /a S[50]=0x23, S[51]=0xc3, S[52]=0x18, S[53]=0x96, S[54]=0x05, S[55]=0x9a, S[56]=0x07, S[57]=0x12, S[58]=0x80, S[59]=0xe2
set /a S[60]=0xeb, S[61]=0x27, S[62]=0xb2, S[63]=0x75, S[64]=0x09, S[65]=0x83, S[66]=0x2c, S[67]=0x1a, S[68]=0x1b, S[69]=0x6e
set /a S[70]=0x5a, S[71]=0xa0, S[72]=0x52, S[73]=0x3b, S[74]=0xd6, S[75]=0xb3, S[76]=0x29, S[77]=0xe3, S[78]=0x2f, S[79]=0x84
set /a S[80]=0x53, S[81]=0xd1, S[82]=0x00, S[83]=0xed, S[84]=0x20, S[85]=0xfc, S[86]=0xb1, S[87]=0x5b, S[88]=0x6a, S[89]=0xcb
set /a S[90]=0xbe, S[91]=0x39, S[92]=0x4a, S[93]=0x4c, S[94]=0x58, S[95]=0xcf, S[96]=0xd0, S[97]=0xef, S[98]=0xaa, S[99]=0xfb


)

rem use of array

set lectures=10000

For /L %%j in (1,1,%lectures%) do ( echo !S[50]! > nul )

set /a "r=(%random% * %random%) %% 10"

if !r! lss 2 echo !r! child s[50]:!S[50]!


:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:next_test1

rem test spawn ambient

set t0=%time%

(
set /a RC[1]=0x01, RC[2]=0x02, RC[3]=0x04, RC[4]=0x08, RC[5]=0x10, RC[6]=0x20, RC[7]=0x40, RC[8]=0x80, RC[9]=0x1b, RC[10]=0x36

:: Define pre-computed Rijndael S-box values for SubBytes() function and key scheduler algorithm
set /a S[0]=0x63, S[1]=0x7c, S[2]=0x77, S[3]=0x7b, S[4]=0xf2, S[5]=0x6b, S[6]=0x6f, S[7]=0xc5, S[8]=0x30, S[9]=0x01
set /a S[10]=0x67, S[11]=0x2b, S[12]=0xfe, S[13]=0xd7, S[14]=0xab, S[15]=0x76, S[16]=0xca, S[17]=0x82, S[18]=0xc9, S[19]=0x7d
set /a S[20]=0xfa, S[21]=0x59, S[22]=0x47, S[23]=0xf0, S[24]=0xad, S[25]=0xd4, S[26]=0xa2, S[27]=0xaf, S[28]=0x9c, S[29]=0xa4
set /a S[30]=0x72, S[31]=0xc0, S[32]=0xb7, S[33]=0xfd, S[34]=0x93, S[35]=0x26, S[36]=0x36, S[37]=0x3f, S[38]=0xf7, S[39]=0xcc
set /a S[40]=0x34, S[41]=0xa5, S[42]=0xe5, S[43]=0xf1, S[44]=0x71, S[45]=0xd8, S[46]=0x31, S[47]=0x15, S[48]=0x04, S[49]=0xc7
set /a S[50]=0x23, S[51]=0xc3, S[52]=0x18, S[53]=0x96, S[54]=0x05, S[55]=0x9a, S[56]=0x07, S[57]=0x12, S[58]=0x80, S[59]=0xe2
set /a S[60]=0xeb, S[61]=0x27, S[62]=0xb2, S[63]=0x75, S[64]=0x09, S[65]=0x83, S[66]=0x2c, S[67]=0x1a, S[68]=0x1b, S[69]=0x6e
set /a S[70]=0x5a, S[71]=0xa0, S[72]=0x52, S[73]=0x3b, S[74]=0xd6, S[75]=0xb3, S[76]=0x29, S[77]=0xe3, S[78]=0x2f, S[79]=0x84
set /a S[80]=0x53, S[81]=0xd1, S[82]=0x00, S[83]=0xed, S[84]=0x20, S[85]=0xfc, S[86]=0xb1, S[87]=0x5b, S[88]=0x6a, S[89]=0xcb
set /a S[90]=0xbe, S[91]=0x39, S[92]=0x4a, S[93]=0x4c, S[94]=0x58, S[95]=0xcf, S[96]=0xd0, S[97]=0xef, S[98]=0xaa, S[99]=0xfb
)

echo main thread: echo S[50]:!S[50]!

For /L %%i in (1,1,10) do (start /B /WAIT cmd /c AEScore_set_part_noset.bat)

set t1=%time%

for /F "tokens=1-8 delims=:.," %%a in ("!t0: =0!:!t1: =0!") do set /a "a=(((1%%e-1%%a)*60)+1%%f-1%%b)*6000+1%%g%%h-1%%c%%d, a+=(a>>31) & 8640000"

echo Ambient trasmission: Execution time: %a%

goto :next_test2

AEScore_set_part_noset.bat:

@echo off  & setlocal EnableDelayedExpansion

set lectures=10000

For /L %%j in (1,1,%lectures%) do ( echo !S[50]! > nul )

set /a "r=(%random% * %random%) %% 10"

if !r! lss 2 echo !r! child s[50]:!S[50]!

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
:next_test2

rem work in progress


result:

Code: Select all

Execution time: 123
Execution time: 140
Execution time: 106
Execution time: 98
Execution time: 63
Execution time: 50
Execution time: 47
main thread: echo S[50]:
child s[50]:35
child s[50]:35
child s[50]:35
child s[50]:35
Ambient recreation: Execution time: 1907
main thread: echo S[50]:35
child s[50]:35
child s[50]:35
child s[50]:35
child s[50]:35
Ambient trasmission: Execution time: 1507


Einstein1969

Re: Multi-process Advanced Encryption Standard (AES)

Posted: 07 Nov 2013 16:16
by Magialisk
Einstein, thank you very much for that. I'm going to need to take some time tonight and really digest it, as it looks like you've made some significant gains there.

In the meantime, I'll post some of the optimizations I've made myself to the code:
First of all I have good news and bad news. The bad news is that I lied last night. My PIDs were not running for only 0.10-0.12s each. I calculated that in my head too quickly and obviously borked the math, probably counting several PIDs as one or something. It turns out an individual PID, processing 128-bits of data, runs for between 0.71-0.81s with an average time of 75.475ms.

The good news is that Antonio's comment about macros taking longer to run than direct code led me to discover a huge inefficiency in my routine. I've completely removed the Num2Hex macro from the code and replaced it with direct code. However, I realized while doing this that I was using Num2Hex to generate an 8-digit hex value and then stripping off 6 leading zeroes. Obviously a huge waste of time so it now creates the required 2-digit hex value directly instead.

Re-testing my PID speed they now run for between 0.61-0.72s with an average time of 64.125ms. Shaving over a tenth of a second (15%) is a huge difference. That's what I get for having terribly inefficient code the first time around I guess :D

Beyond that "easy" change, I also finally got around to testing an algorithmic change I had been wanting to make. Instead of running SubBytes() and ShiftRows() as two separate operations, which requires traversing the entire state matrix twice, I re-wrote the algorithm to perform these two steps as one operation. Each element of the state matrix gets replaced by the SubBytes() output of the byte that would have later been shifted into its place. Doing it this way requires using a temporary state matrix to hold intermediate results, however that ended up being advantageous as well. My MixColumns() function was already using a temporary state matrix, and at the end of that function copying the data back from the temp into the original state. With the changes above, MixCloumns can operate directly on the temporary matrix created in the first step, and output back into the original state matrix instead of it's own temporary, removing the need for an extra copy operation at the end of the function!

All in all this removed one nested-FOR traversal of the state matrix as well as at least 22 'set' operations per round. Re-timing the code I now see between 0.54-0.61s per PID, with an average of 56.387ms. That's another 12% savings! Due to this success I decided to do something similar with the MixColumns() and AddRoundKey() functions, combining them into a single loop. Re-timing the code it is now down to 0.51-0.54s per PID, with an average of 52.735ms, a further reduction of ~6.5%.

At this point I think the aescore (child) code is pretty much tweaked out, and there are few if any significant performance gains left to be had. Even the variability has decreased greatly, what used to be over a 0.1s spread between worst and best times is down to ~0.03s. The code is much more tight and efficient, but of course I still welcome any suggestions! I will look very closely at what you're suggesting Einstein as your code could improve the efficiency of the main controller program. First I need to port all of my above changes into the decryption portion of aescore, as up to now I've only been tweaking the encryption side. Once I get that done I'll post the modified aescore code back over the first posts and change focus to the main controller. Thanks again!