VIEWS: 9 PAGES: 6 CATEGORY: Technology POSTED ON: 4/3/2010
AES implementation on 8-bit microcontroller Sungha Kim Ingrid Verbauwhede yevgeny@ee.ucla.edu ingrid@ee.ucla.edu Department of Electrical Engineering University of California, Los Angeles Los Angeles, CA-90024 Abstract and logic operation, therefore additional register assignment schedule is needed. The security of sensor network is even more important nowadays. However their physical 1.1 Notation and computational limitation makes achieving Following is the convention used to describe required security level challenging. In this the operations in this paper. project, I will propose highly optimized Rijndael implementation using 16 registers for Nb: input block length divided by 32 storing each state on Atmel’s AVR™ Nk : key length divided by 32 microcontroller. This implementation is 40% Nr : number of rounds faster and smaller than other implementation State : the intermediate cipher result on AES proposal [1]. Sub state: 8 bit, divided unit of state, if block length is 128-bit, it has 16 sub 1 Introduction state of 8-bit 8-bit microcontroller can be used in a GF: finite field, Galois field wide range of applications, such as wireless Indirect address register: to store at sensor network for environment monitoring and memory and load from memory, the battle field ad-hoc network. Also the smart card destination address should be saved at is equipped with 8-bit microcontroller. these registers. Independent of their limited ability in X=R27|R26, Y=R29|R28, Z=R31|R30 computation and power, the security is the one of the most important issue for these applications. Until recently, much effort to 2 Related Work provide reliable security was done with the This section will briefly introduce other assumption having enough computation and implementation of Rijndael. Rijndael can also power supports. In case of applying these be implemented very efficiently on a wide approaches to 8-bit microcontroller, new issues range of processors and in hardware. Rafael R. occur because of their limitation. 8-bit Sevilla implemented by 80186 assembly and assembly environment is critically limited in Geoffrey keating’s Motorola 6805 the perspective of their computational implementation is also available on Rijndael functionality and data managing scheme. The site [3]. maximum size of transferring data is under the size of 28 and fundamentally deficient functionality of assembly language makes every transformation more complicated. 3 What is Rijndael? Moreover small numbers of 32 registers are In early August 1999, NIST selected five barely suffice the requirement for arithmetic algorithms – Mars, Rc6, Rijndael, Serpent and 1 Twofish as candidates of AES(Advanced The ByteSub transformation is a non- Encryption Standard). Finally, the Rijndael linear byte substitution which takes 8-bit sub block cipher algorithm was announced as AES state as its input and produce same size next by FIPS-197 [2] in 2001. The cipher has a sub state. The output is defined at S-box which variable block length and key length. Currently takes 16 by 16 byte of memory. specified keys and blocks have the length of 128, 192, or 256 bits respectively.(all nine 3.2 ShiftRow operation combinations of key length and block length The ShiftRow transformation is are possible). Both block length and key length individual to every last 3 rows of the state. can be extended by multiples of 32 bits. Each of the three rows shifts by all different bit Moreover the operation is based on 8 bits size which decided by block length. of sub state, which gives 8-bit processor the highest advantage to implement it. Rigndael(State, CipherKey) { KeyExpansion (CipherKey, ExpandedKey) AddRoundKey (State, ExpandedKey) Fig4. ShiftRow operation For(i=1;i<Nr;i++) Round (State,ExpandedKey( i )) FinalRound (State, ExpandedKey(Nr)) 3.3 MixColumn operation } The MixColumn transformation operates Fig1. Rijndael algorithm independently on every column of the state and Fig1. shows the whole procedure of Rijndael treats each sub state of the column as term of algorithm. Like DES or other block cipher a(x) in the equation b(x)=c(x)⊗a(x), where algorithm, Rijndael is also composed by a c(x)= ‘03’X3+’01’X2+’01’X+’02’.For example, certain number of rounds which is decided by in the fig5. a(x) is a0,jX3+ai,jX2+a2,jX+a3,j and it input block(Nb) and key length(Nk). is used as multiplicand of operation. Round(State, RoundKey) { ByteSub(State) ShiftRow(State) MixColumn(State) AddRoundKey(State,RoundKey) } Fig5. MixColumn operation Fig2. Round of Rijndael Four different transformations constitute each round. The final round is same as normal round 3.4 AddRoundKey operation without MixColumn. The AddRoundKey operation is simply a bitwise EXOR of roundkey and state. 3.1 ByteSub operation Fig3. SubByte operation Fig6. AddRoundKey operation 2 4 Advantages of Rijndael for 5.1 ByteSub operation Byte substitution can be done in the two 8-bit implementation different ways. First, taking the multiplicative Because 8-bit microcontroller can not inverse in GF(28) and ‘00’ is mapped onto itself. provide any high level compiler except Then, applying an affine (over GF(2)) assembler, the implementation environment transformation defined by 8 by 8 matrix, should depend on the attribute of assembly produces the output. This approach needs language. However every transformation is sacrificing of speed because of several numbers ultimately divided by minimal 8-bit sub state. of extra operations. Whereas the other way Therefore if only load and store the 8-bit sub using S-box gains speed but loses the code size state at right time, right position, the 8-bit for storing S-box on the memory. In this operation size can be optimal implementation S-box approach was chosen. 4.1 SubByte operation 5.2 MixColumn operation Sub Byte is byte substitution done on each MixColumn operation needs a certain sub state independently. Therefore the number of multiplication operations on GF(28). operation target is 8-bit. Every finite field multiplication can be done using tables, Logs and Antilogs tables. Like 4.2 MixColumn operation ByteSub operation, using table need more The matrix of Fig7 describes memory space while increasing speed. theMixColumn operation. Following the matrix Especially, in MixColumn operation, below, the operation is also byte independent. multiplicand is confined as ‘01’, ‘02’, and ‘03’, b 0 02 03 01 01 a0 which means ‘01’ and ‘02’ multiplication can b 01 02 03 01 a provide the clue for ‘03’ multiplication. 1 1 Therefore to exploit this special feature, direct b2 01 01 02 03 a2 multiplication approach was chosen instead of b3 03 01 01 02 a3 using tables. Fig7. Matrix for MixColumn operation 5.3 Storing state Repeated round comprises 4 different 4.3 AddRoundKey operation transformation. Each transformation need state Round key addition is bitwise EXOR as its input and produce output as new state. operation applied between round key and state. Such data transaction between each state and Each 8-bit sub state of state has one to one register executed very frequently. Also for mapping with sub key of round key. Therefore arithmetic and logic operation the operand, this operation is also byte dependent. each sub state should stay at register. Therefore the storing state issue is very critical. If the state stays at memory it is easier to fetch each 5 Implementation 8-bit sub state from memory to register with For implementation on Atmel’s AVR indirect address register X,Y,Z. However this microcontroller, AVR Studio v.3.55 which is approach need more cycle consumption provided by Atmel was chosen for convenience. because from/to memory to/from register It provides integrated development transaction need 2 cycles each, which is twice environment composed of assembler and as much as from/to register to/from register simulator also. Every cycle number and code scheme. If all sub state can be stored at size output depends on AVR Studio. registers, the memory will be referenced only Every design decision is tradeoff between for round key and S-box, which is impossible speed and code size. to be stored at registers. 3 1) st Y+, register (2Cycles) modules have all different execution times as in ld register, Y+ (2Cycles) Cycle column of the Table2 Module Cycle Code 2) mov register1, register2 (1Cycle1) (Byte) Precomputation 1 1 Fig8. storing state issue ByteSub 10 2 1) memory-register scheme ShiftRow 10 2 2) register-register scheme MixColumn 9 1 AddRoundKey 11 3 The register-register scheme requires whole state should stay at register, which means only branch 1 1 16 registers out of 32 registers are available for Table2. Weight factor of each module operation. In this implementation these 16 .Whereas, code length weight factor is registers are barely meet the required registers irrelevant with cycle number weight factor at Mix Column operation. Even high part of because some modules are reused every time indirect address registers, R27, R29, R31 are while the other modules are not. For example, used as general purpose registers while still AddRoudKey module executed at round0, used for storing address. From register0 to round1-9 and round10 but MixColumn register 15, the state is stored. executed only at round1-9, which makes the different code weight factor 3 and 1 respectively. With weight factor total number 6 Simulation result of cycle and code are computed by equation Code was run on the AVR Studio v.3.55 below. with the test vector from Brian Gladman’s technical paper[5]. The implementation was Total cycle number = optimized many times. From many versions of ∑Cycle number( i ) * weight factor( i ) implementation, two simulation results is proposed depending on the storing state issue. Total code size = The one stores the state at memory and the ∑Code number( i ) * weight factor( i ) other at registers. Module Cycle Code ,where i is every module (Byte) Precomputation 2648 1848 The round0-10 number is pure execution ByteSub 161 32 number, which excluding precomputation ShiftRow 64 256 feature. Because precompuatation is composed MixColumn 288 134 of S-box input to memory, key input to AddRoundKey 163 48 memory, key expansion and data block input, branch 23 12 these phases happens just once in the whole life Total 9306 2714 time of sensors. Once done precomputation can (round0-10) 6658 866 be reused afterward if there is no key update Table1. Simulation result of memory version and S-box update. For data block input, it can be assumed as initially staying at each register, The total number includes consideration of the from R0 to R15, beforehand. This procedure number of rounds. In this implementation, the can be done by radio simply interconnect radio block and key length are 128-bit each. with microcontroller. Therefore 10 rounds constitute one encryption The Table3 shows much improved result after procedure. In the 10 rounds of encryption storing state at registers. The improvement is procedure, precomputation is executed just 43% and 42% in the cycle number and code once which gives weight factor 1, and other size respectively. 4 Module Cycle Code (Byte) size improvement Precomputation 2648 1848 300 ByteSub 49 66 Code Size(Byte) 250 ShiftRow 16 32 200 MixColumn 231 104 150 AddRoundKey 48 64 100 branch 45 12 50 0 Total 6463 2352 ShiftRow AddKey ByteSum MixColumn Branch (round0-10) 3815 504 Modules Table3. Simulation result of Register version memory register Table5. Size improvement by register 7 Evaluation Comparing with other implementations the speed improvement output is obviously better. The Table6 and 7 are from Rijndael proposal [1] and they show 350 execution time and code size of other Number of cycles 300 250 implementation depending on key and block 200 length. 150 Key/Bolck length Number of cycles Code length 100 (128, 128) a) 4065 cycles 768 byte 50 0 (128, 128) b) 3744 cycles 826 byte ShiftRow AddKey (128, 128) c) 3168 cycles 1016 byte ByteSub MixColumn Branch Modules (192, 128) 4512 cycles 1125 byte (256, 128) 5221 cycles 1041 byte memory register Table6. Execution time and code size Table4. Speed improvement by register Rijndael in Intel 8051 assembler The fully registered implementation In case of Intel 8051 assembler, as code size produced remarkably improved output as at table4 and table5. Trivially increased time increase, the speed decreases, this improvement consumption at branch module stems from in speed while sacrificing size can be also done at AVR microcontroller by executing complicated register assignment scheduling. This sacrifice makes utilization percent of multiplication by tables. Key/Bolck length Number of cycles Code length register nearly full, which means most of the (128, 128) a) 8390 cycles 919 byte data transactions happen between register and register. The MixColumn module is still takes (192, 128) 10780 cycles 1170 byte much part of cycle number. This module is also (256, 128) 12490 cycles 1135 byte the critical part for register scheduling because Table7. Execution time and code size it need four multipication with four different Rijndael in Motorola 68HC08 assembler sub state at the same time while keeping their initial state. Therefore indirect address registers was used temporarily for normal operations. For other modules the utilization of register is slightly over 50%, which gives more possibility 8 Summary and future work to improve the efficiency. Table5 shows size Rijndael implementation using 16 registers improvement after storing state in the register. for storing state improves the efficiency over 40% in both speed and code size. The other 16 registers are slightly meets the need of logical 5 and arithmetic operations. Therefore if input block size is over 128-bit, it needs different register assignment scheme. Until now 128-bit size of input block is optimal for Rijndael implementation on 8-bit microcontroller. References [1] AES proposal: Rijndael, Joan Daemen, Vincent Rijmen [2] FIPS-197 [2] http://csrc.nist.gov/publications/fi ps/ [3] Rijndael home site http://www.esat.kuleuven.ac. be/~rijmen/rijndael/ [4] Adrian Perrig ,Robert Szewczyk, Victor Wen,D avid Culler,and J.D.Tygar SPINS:Security Protocols for Sensor Networks, MobiCom, July 2001. [5] A Specification for Rijndael, the AES Algorithm v3.3, Brian Gladman, May 2002 [6] A Communications Security Architecture and Cryptographic Mechanisms for Distributed Sensor Networks DARPA SensIT Workshop, Oct 8, 1999 . 6