This post is a continuation of my previous post. I continued my implementation of the DES encryption standard in Verilog by adding the decryption operation and making some architectural and practical-use changes.
Disclaimer: I am a complete Verilog novice. I'm certain there are bad design/bad practice/bugs in this implementation. Eventually I intend for someone knowledgeable to review and critique my work in an effort to improve.
I finished the implementation of the DES algorithm by adding support for decryption. The decryption operation in DES is identical to the encryption operation except that the key schedule is reversed. This functional change, along with advice from more knowledgeable HDL developers, led to some architectural changes. First, I broke up my large always block into smaller logical sections: initial permutation, key scheduling, cryptographic operations, and final permutation. The reasoning behind this change is that the combinational logic inside an always block must settle within the clock cycle on which the block is triggered. Breaking apart the always block reduces the work done per cycle, which allows for a faster clock rate without risking late completion. However, because each always block is triggered on every clock cycle, I needed a mechanism to track which stages have completed, in effect a simple state machine. I chose to do this with a validity register. As each stage of the cryptographic process completes, a corresponding bit is set in this register. Once all of the bits are set, the output of the operation is valid. Any time prior, the output is invalid. To facilitate this functionality, I defined a bit mask for each operation.
//Initial permutations valid bit
localparam IP_VALID = 4'h1;
//Key scheduling valid bit
localparam KEY_VALID = 4'h2;
//Crypto operations valid bit
localparam CRYPTO_VALID = 4'h4;
//Final permutations valid bit
localparam FP_VALID = 4'h8;
The "done" state is the OR of all of these bits.
//Done flag
localparam OP_DONE = IP_VALID | KEY_VALID | CRYPTO_VALID | FP_VALID;
To store the state of the operation, I added a register called op_valid.
reg [3:0] op_valid;
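As a minimal, stand-alone sketch of how these masks behave, the simulation below sets one flag at a time and prints the derived valid signal. The module name valid_mask_demo is hypothetical and the block is only an illustration of the masking idea, not part of the DES module.

```verilog
`timescale 1ns / 1ps
// Stand-alone illustration of the validity-mask idea (not the DES module itself).
module valid_mask_demo;
    // Same masks as defined in the post
    localparam IP_VALID     = 4'h1;
    localparam KEY_VALID    = 4'h2;
    localparam CRYPTO_VALID = 4'h4;
    localparam FP_VALID     = 4'h8;
    localparam OP_DONE      = IP_VALID | KEY_VALID | CRYPTO_VALID | FP_VALID;

    // State register, cleared at the start of simulation
    reg [3:0] op_valid = 4'h0;

    // The output counts as valid only when every stage has set its bit
    wire valid = (op_valid == OP_DONE);

    initial begin
        #1 $display("op_valid=%b valid=%b", op_valid, valid);
        op_valid = op_valid | IP_VALID;
        #1 $display("op_valid=%b valid=%b", op_valid, valid);
        op_valid = op_valid | KEY_VALID;
        #1 $display("op_valid=%b valid=%b", op_valid, valid);
        op_valid = op_valid | CRYPTO_VALID;
        #1 $display("op_valid=%b valid=%b", op_valid, valid);
        op_valid = op_valid | FP_VALID;
        #1 $display("op_valid=%b valid=%b", op_valid, valid);
        $finish;
    end
endmodule
```

In this sketch valid stays low until the fourth flag is OR'd in, which is the behavior the OP_DONE comparison is meant to provide.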
As each step in the process completes, the corresponding bit is set in this register. The first step is the initial permutation operations. This block runs only when the module is enabled, which will allow software to enable or disable the module. To support this, an enable input was added to the module, along with parameters corresponding to the ENABLE and DISABLE states.
localparam DISABLE = 1'b0;
localparam ENABLE = 1'b1;
When the initial permutation block is finished, the corresponding bit is OR'd into the op_valid register.
always @(posedge clock)
begin: ip_ops
if(enable == ENABLE)
begin
//PC-1 permutation of input key
pc1_key = pc1(key);
//Initial permutation of the input data
in_p = ip(input_data);
//Flag the input permutations as valid
op_valid = op_valid | IP_VALID;
end
end
The next step is calculating the round keys. Previously I had integrated the key generation into the encryption loop. I decided the simplest approach for reversing the key schedule for decryption operations was to return to pre-calculation of the keys. The logic for the key generation is identical to my previous post; however, a conditional statement was added to store the keys in reverse order based on a new mode input to the module. At the bottom of the key scheduling block, the KEY_VALID bit is OR'd into the op_valid register.
//....
//Obtain initial C/D keys from PC-1 Key
cprev = pc1_key[0:27];
dprev = pc1_key[28:55];
//Calculate the 16 round keys
for(i = 0; i < 16; i=i+1)
begin
if(mode == ENCRYPT)
begin
cn = left_rotate(cprev, shift_table[i]);
dn = left_rotate(dprev, shift_table[i]);
round_keys[i] = pc2({cn, dn});
cprev = cn;
dprev = dn;
end
else
begin
cn = left_rotate(cprev, shift_table[i]);
dn = left_rotate(dprev, shift_table[i]);
round_keys[16-i-1] = pc2({cn, dn});
cprev = cn;
dprev = dn;
end
end
//Flag the keys as valid
op_valid = op_valid | KEY_VALID;
//...
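The two branches in the loop above differ only in the index used to store each round key. As a hedged sketch of that mapping, the simulation below prints the storage slot for every iteration in both modes. The helper key_slot and the module name are hypothetical, and the ENCRYPT/DECRYPT encodings are assumed from the test bench (0 for encryption, 1 for decryption), since their definitions are not shown in the post.

```verilog
`timescale 1ns / 1ps
// Illustration only: shows where each generated round key would be stored.
module key_order_demo;
    // Assumed encodings, matching the test bench (0 = encrypt, 1 = decrypt)
    localparam ENCRYPT = 1'b0;
    localparam DECRYPT = 1'b1;

    // Hypothetical helper: storage slot for the key generated on iteration i
    function [3:0] key_slot;
        input       mode;
        input [3:0] i;
        begin
            // Encryption keeps generation order; decryption mirrors it (15 - i)
            key_slot = (mode == ENCRYPT) ? i : 4'd15 - i;
        end
    endfunction

    integer i;

    initial begin
        for (i = 0; i < 16; i = i + 1)
            $display("iteration %0d: encrypt slot %0d, decrypt slot %0d",
                     i, key_slot(ENCRYPT, i), key_slot(DECRYPT, i));
        $finish;
    end
endmodule
```

With a helper like this, a single assignment of the form round_keys[key_slot(mode, i)] = pc2({cn, dn}); could stand in for both branches, though whether that reads better than the explicit if/else is a matter of taste.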
The cryptographic operation is identical to the implementation in my previous post. As with the other blocks, the op_valid register is updated with the CRYPTO_VALID flag.
//Perform the 16 rounds of the cipher
for(i = 0; i < 16; i=i+1)
begin
//Assign L to R-1
ln = rprev;
//Calculate R as L-1 XOR f(R-1, Round Key)
rn = lprev ^ f(rprev, round_keys[i]);
//Update -1 components
rprev = rn;
lprev = ln;
end
//Flag the crypto operations as valid
op_valid = op_valid | CRYPTO_VALID;
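The statement earlier in the post that decryption is the same computation with the key schedule reversed follows from the round structure itself, not from the details of f. The toy simulation below is a sketch of that property only: it is not DES, the halves are 16 bits wide, and the round function and keys are made up for illustration. The names feistel_demo and run are hypothetical.

```verilog
`timescale 1ns / 1ps
// Toy Feistel network (NOT DES): 16-bit halves and a made-up round function.
module feistel_demo;
    reg [15:0] keys [0:15];
    reg [15:0] l, r, t;
    reg [31:0] block, cipher, plain;
    integer i;

    // Made-up round function, for illustration only
    function [15:0] f;
        input [15:0] half;
        input [15:0] k;
        begin
            f = (half ^ k) + {half[7:0], half[15:8]};
        end
    endfunction

    // Run 16 rounds; 'reverse' selects whether the keys are used back to front
    task run;
        input  [31:0] din;
        input         reverse;
        output [31:0] dout;
        begin
            l = din[31:16];
            r = din[15:0];
            for (i = 0; i < 16; i = i + 1) begin
                // Same update as the post: Rn = Ln-1 ^ f(Rn-1, Kn), Ln = Rn-1
                t = l ^ f(r, keys[reverse ? 15 - i : i]);
                l = r;
                r = t;
            end
            // Halves are swapped on output, as in the post's {rprev, lprev}
            dout = {r, l};
        end
    endtask

    initial begin
        // Arbitrary round keys for the demonstration
        for (i = 0; i < 16; i = i + 1)
            keys[i] = 16'h1234 + i * 16'h0101;
        block = 32'hDEADBEEF;
        run(block, 1'b0, cipher);   // forward key order
        run(cipher, 1'b1, plain);   // reversed key order
        $display("block=%h cipher=%h plain=%h match=%b",
                 block, cipher, plain, plain == block);
        $finish;
    end
endmodule
```

Running the same rounds a second time with the keys reversed recovers the original block, which is why the decryption path in the post only needs to change how round_keys is filled.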
The final block in the chain is the final permutation. It ORs the last bit, FP_VALID, into the op_valid register.
//Final permutations
always @(posedge clock)
begin : fp_ops
//Transposed concatenation of Rn and Ln
reg [0:63] rl;
if (op_valid & CRYPTO_VALID)
begin
//Final concatenation
rl = {rprev, lprev};
//Assign output_data to inverse IP permutation
output_data = ip_inv(rl);
//Flag the final permutations as valid
op_valid = op_valid | FP_VALID;
end
end
The valid output from the module is assigned the result of comparing the op_valid register to the OP_DONE mask. If each step in the process has run to completion, the output data is valid.
//Set valid register if the operation is complete
assign valid = (op_valid == OP_DONE);
The data will remain valid unless the key, input data, or both are changed. To clear the flags when that happens, another always block was added.
//Reset if the input data or key changes
always @(key, input_data)
begin : reset
op_valid = 0;
end
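The block above works in simulation, but op_valid is then written from several always blocks, which synthesis tools commonly reject as a multiply-driven register. The sketch below is not the approach used in this post. It shows one possible alternative, under the assumption that a one-cycle detection delay is acceptable: keep registered copies of the inputs and compare them against the live values. The module change_detect, its changed output, and the small test bench are all hypothetical names introduced for illustration.

```verilog
`timescale 1ns / 1ps
// Sketch: detect a change on key/input_data using clocked logic only.
module change_detect (
    input  wire        clock,
    input  wire [0:63] key,
    input  wire [0:63] input_data,
    output wire        changed
);
    // Copies of the inputs as seen on the previous rising edge
    reg [0:63] key_q  = 64'd0;
    reg [0:63] data_q = 64'd0;

    always @(posedge clock) begin
        key_q  <= key;
        data_q <= input_data;
    end

    // High whenever the live inputs differ from the registered copies
    assign changed = (key != key_q) || (input_data != data_q);
endmodule

// Small test bench exercising the sketch
module change_detect_tb;
    reg        clock      = 0;
    reg [0:63] key        = 64'd1;
    reg [0:63] input_data = 64'd2;
    wire       changed;

    change_detect dut (
        .clock(clock),
        .key(key),
        .input_data(input_data),
        .changed(changed)
    );

    initial begin
        // One clock so the registered copies catch up with the inputs
        #5 clock = 1;
        #5 clock = 0;
        $display("after first clock: changed=%b", changed);
        // Write a new key and sample before the next clock edge
        key = key + 1;
        #1 $display("after new key:     changed=%b", changed);
        // The next rising edge captures the new key
        #4 clock = 1;
        #5 clock = 0;
        $display("after next clock:  changed=%b", changed);
        $finish;
    end
endmodule
```

A single clocked block that owns op_valid could then clear it whenever changed is high and OR in the stage flags otherwise, though that would mean restructuring the stage blocks shown earlier.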
The interface to this module is intended to be used (eventually by software via memory-mapped registers) as follows:
1. Set the enable input.
2. Set the mode input to 0 for encryption, or 1 for decryption.
3. Write the key to the key input.
4. Write the data to the input_data input.
5. Wait for the valid bit to be set.
To simulate these operations, I updated my test bench to encrypt 64 bits of data with a given key and then decrypt the resulting ciphertext using the same key. Additionally, I verify that the valid bit is cleared when the input parameters change. The complete test bench file becomes:
`timescale 1ns / 1ps
module test_bench;
//Clock signal
reg clock = 0;
//Mode input
reg mode = 0;
//Enable input
reg enable = 1;
//Key input register
reg [0:63] key = 0;
//Input register
reg [0:63] input_data = 0;
//Ciphertext output
wire [0:63] output_data;
//Valid bit output
wire valid;
//DES block under test
DES uut (
.clock(clock),
.mode(mode),
.enable(enable),
.key(key),
.input_data(input_data),
.output_data(output_data),
.valid(valid)
);
//Loop variable
integer i;
initial
begin
//Set the mode to encryption
mode = 0;
//Enable the module
enable = 1;
//Set the key
key = 64'B00010011_00110100_01010111_01111001_10011011_10111100_11011111_11110001;
//Plaintext input
input_data = 64'B00000001_00100011_01000101_01100111_10001001_10101011_11001101_11101111;
//Display the plaintext
$display("Plaintext = %b", input_data);
//Trigger the clock 5 times
for(i = 0; i < 5; i=i+1)
begin
#5 clock = 1;
#5 clock = 0;
end
//Display the cipher text
$display("Valid: %b Cipher = %b", valid, output_data);
//Pipe the ciphertext to the input
input_data = output_data;
//Set the mode to decrypt
mode = 1;
//Trigger the clock 5 times
for(i = 0; i < 5; i=i+1)
begin
#5 clock = 1;
#5 clock = 0;
end
//Display the plaintext
$display("Valid: %b Plaintext = %b", valid, output_data);
//Write in a new key
key = key + 1;
//Verify the valid bit is cleared
#5 $display("Valid after new key: %b", valid);
//Trigger the clock 5 times
for(i = 0; i < 5; i=i+1)
begin
#5 clock = 1;
#5 clock = 0;
end
//Verify the valid bit is set
#5 $display("Valid: %b", valid);
//Write a new plaintext
input_data = input_data + 1;
//Verify the valid bit is cleared again
#5 $display("Valid after new plaintext: %b", valid);
$finish;
end
endmodule
Running the above test bench in simulation yields the expected cipher text, successful decryption, and expected behavior of the valid output.
Plaintext = 0000000100100011010001010110011110001001101010111100110111101111
Valid: 1 Cipher = 1000010111101000000100110101010000001111000010101011010000000101
Valid: 1 Plaintext = 0000000100100011010001010110011110001001101010111100110111101111
Valid after new key: 0
Valid: 1
Valid after new plaintext: 0
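The test bench pulses the clock a fixed five times before sampling valid. As a hedged alternative sketch, the clock could instead be pulsed until valid is actually set, with a cycle budget as a safety net. To keep the example self-contained, the DES module is replaced by a hypothetical stand-in, valid_stub, which simply raises valid four clocks after being enabled. The task wait_for_valid is also a made-up name.

```verilog
`timescale 1ns / 1ps
// Hypothetical stand-in for the DES module: raises valid four clocks after enable.
module valid_stub (
    input  wire clock,
    input  wire enable,
    output wire valid
);
    reg [2:0] count = 3'd0;

    always @(posedge clock)
        if (enable && count < 3'd4)
            count <= count + 3'd1;

    assign valid = (count == 3'd4);
endmodule

module wait_valid_tb;
    reg     clock  = 0;
    reg     enable = 1;
    wire    valid;
    integer cycles;

    valid_stub uut (
        .clock(clock),
        .enable(enable),
        .valid(valid)
    );

    // Pulse the clock until valid is set or the cycle budget runs out
    task wait_for_valid;
        input integer max_cycles;
        begin
            cycles = 0;
            // !== also treats an unknown (x) value as "not yet valid"
            while (valid !== 1'b1 && cycles < max_cycles) begin
                #5 clock = 1;
                #5 clock = 0;
                cycles = cycles + 1;
            end
        end
    endtask

    initial begin
        wait_for_valid(20);
        if (valid === 1'b1)
            $display("valid after %0d clocks", cycles);
        else
            $display("timed out after %0d clocks", cycles);
        $finish;
    end
endmodule
```

Polling this way keeps the test bench independent of how many clocks the module needs, which may change as the design is reworked.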
I'm pleased with how this module is coming along. My next steps are going to be improvements to the design. I would like to introduce a register latching scheme so that a register changing mid-operation doesn't interfere with an on-going operation. Additionally, I'd like to wrap this implementation in a triple-DES (3DES) implementation that enforces the use of 3 separate keys. Lastly, once integrated as a memory mapped AXI peripheral, I'd like to add interrupt support to notify the processing system when encryption/decryption is complete. I will write those changes up in new posts as I make them. If I decide to take this further, I may add support for a data FIFO or memory region to facilitate operating on more than one block at a time.