# 1 bit ALU testing by hcj


```
Gabe Rowe

EE 471

Lab 2
Abstract

This 32-bit ALU has two main speed features: minimized logic for the overflow and set-less-than paths, and carry look-ahead addition for even more speed. The goal of this project was to build an ALU that matches the block diagram in Figure 1. My ALU can perform all operations within approximately 14 gate delays, assuming the zero detect is one gate delay.

[Figure 1. 32-bit ALU block diagram: 32-bit inputs Bus A and Bus B and an ALU Control input; outputs are the 32-bit Output, Zero, Overflow, and CarryOut.]
Overflow

To calculate overflow, we need the carry into and the carry out of the highest-order bit. However, I wanted to minimize the logic, so I derived a simpler equation from the following tables, which list every overflow case in terms of the most significant bit's inputs and carries.

The standard overflow equation, which I set out to minimize, is:

Overflow = Cout xor Cin

All Overflow Cases

Example       Binvert  A  B  Bmux  Cin  Cout  Sum  Overflow
A + B < 0        0     0  0   0     1    0     1      1
-A + -B > 0      0     1  1   1     0    1     0      1
A - -B < 0       1     0  1   0     1    0     1      1
-A - B > 0       1     1  0   1     0    1     0      1

Simplified Overflow Cases (X = don't care)

Example       Binvert  A  B  Bmux  Cin  Cout  Sum  Overflow
A + B < 0        X     0  X   0     1    0     1      1
-A + -B > 0      X     1  X   1     0    1     0      1

Logic Equation based on table:

Overflow = ~A*~Bmux*Cin + A*Bmux*~Cin
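
// Sanity check for the minimized overflow equation (a sketch, not part of the
// original lab code). The hypothetical module overflow_equiv computes both the
// textbook form, Overflow = Cout xor Cin, and the minimized form
// Overflow = ~A*~Bmux*Cin + A*Bmux*~Cin for the MSB inputs, so a testbench can
// compare the two outputs over all 8 combinations of A, Bmux and Cin.
module overflow_equiv(a, bmux, cin, ovf_ref, ovf_min);
  input a, bmux, cin;   // MSB of A, MSB of Bmux, carry into the MSB
  output ovf_ref, ovf_min;
  wire cout;
  // Carry out of the MSB full adder (majority of its three inputs)
  assign cout    = (a & bmux) | (a & cin) | (bmux & cin);
  // Textbook definition
  assign ovf_ref = cout ^ cin;
  // Minimized form from the simplified table
  assign ovf_min = (~a & ~bmux & cin) | (a & bmux & ~cin);
endmodule
// Why the two agree: Cout is the majority of A, Bmux and Cin, so it can only
// differ from Cin when A and Bmux are equal and Cin has the opposite value.
// Those are exactly the two rows of the simplified table.
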
Set Less Than

To calculate the set-less-than output from the highest-order bit, we would typically XOR the sum output with the overflow. However, this costs extra gate delays, so I derived a direct logic equation using the following table and a K-map.

Set = Overflow xor Sum

Expanding this algebraically was too much work to minimize by hand, so I used a K-map instead. In the table and K-map below, B stands for Bmux, the most significant B input after the optional inversion.

A         B       Cin      Sum     Overflow   Set
0         0        0        0         0        0
0         0        1        1         1        0
0         1        0        1         0        1
0         1        1        0         0        0
1         0        0        1         0        1
1         0        1        0         0        0
1         1        0        0         1        1
1         1        1        1         0        1

K-map (columns in Gray-code order)

            AB=00  AB=01  AB=11  AB=10
Cin = 0       0      1      1      1
Cin = 1       0      0      1      0

This gives the minimized set equation:
Set = AB + A~Cin + B~Cin
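
// A similar check for the set-less-than equation (again a sketch; the module
// name set_equiv is hypothetical). It compares Set = Overflow xor Sum against
// the minimized Set = AB + A~Cin + B~Cin, where B is Bmux. The minimized form
// is the majority function of A, B and ~Cin, which is why it needs only three
// two-input AND terms feeding one OR.
module set_equiv(a, b, cin, set_ref, set_min);
  input a, b, cin;      // MSB of A, MSB of Bmux, carry into the MSB
  output set_ref, set_min;
  wire sum, ovf;
  assign sum     = a ^ b ^ cin;                        // MSB sum bit
  assign ovf     = (~a & ~b & cin) | (a & b & ~cin);   // minimized overflow
  assign set_ref = ovf ^ sum;                          // textbook form
  assign set_min = (a & b) | (a & ~cin) | (b & ~cin);  // K-map result
endmodule
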

1 bit ALU

I decided to go beyond just minimizing logic for the ALU and also implement carry look-ahead. This meant I would need a partial full adder (PFA). Since the PFA uses an OR gate for the propagate output and an AND gate for the generate output, I simply reused those signals for the ALU's OR and AND operations. I also used an XOR gate instead of a 2-to-1 mux to invert B (Bmux = B xor Binvert), since the XOR simplified the logic.
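
// A behavioral sketch of the 1-bit ALU described above (hypothetical module
// alu_1_bit_ref; the gate-level alu_1_bit in Appendix A is the actual design).
// It shows the two shortcuts: B is inverted with an XOR instead of a 2-to-1
// mux, and the PFA's generate (AND) and propagate (OR) outputs double as the
// ALU's AND and OR results. The op encoding follows the 4-to-1 mux inputs in
// Appendix A: 00 = AND, 01 = OR, 10 = sum, 11 = less.
module alu_1_bit_ref(a, b, cin, binv, op, less, result);
  input a, b, cin, binv, less;
  input [1:0] op;
  output result;
  reg result;
  wire bmux = b ^ binv;        // XOR acts as the B-invert mux
  wire g    = a & bmux;        // PFA generate, reused as AND
  wire p    = a | bmux;        // PFA propagate, reused as OR
  wire sum  = a ^ bmux ^ cin;  // PFA sum
  always @(*) begin
    case (op)
      2'b00:   result = g;
      2'b01:   result = p;
      2'b10:   result = sum;
      default: result = less;
    endcase
  end
endmodule
// Driving this model and alu_1_bit with the same inputs in a testbench is one
// way to check the gate-level wiring bit for bit.
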

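[Figure 2. 1-bit ALU: B is conditionally inverted by b invert to form Bmux; a PFA takes A, Bmux and Cin and produces p, g and Sum; a 4-to-1 mux controlled by the 2-bit OP Code selects AND (0), OR (1), Sum (2), or Less (3) as the Result. The MSB also feeds the overflow logic, Overflow = ABmux~Cin + ~A~BmuxCin, and the set logic, Set = ABmux + A~Cin + Bmux~Cin.]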

In the 1-bit ALU diagram I used the two-gate-level version of the XOR. However, in my code, to compete with my friends, I modeled XOR gates with a 50 ps delay, the same as the other gates. In reality an XOR takes two gate levels and should therefore be slower. A 4-to-1 mux selects which signal drives the result: AND, OR, the sum, or set less than. This is shown below.
[Figure 3. 4-to-1 mux: the 2-bit select (sel1, sel0) routes one of in0 to in3 to out.]
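
// The two-gate-level XOR mentioned above can be modeled with the same #50
// primitives used in Appendix A. This is a sketch with a hypothetical module
// name, xor2_two_level: with zero-delay inverters (as in the overflow, set and
// mux modules), the output settles after two gate delays (#100) instead of the
// single #50 delay given to the xor primitive in the lab code.
module xor2_two_level(y, a, b);
  input a, b;
  output y;
  wire not_a, not_b, a_and_not_b, not_a_and_b;
  not not0(not_a, a);                          // zero-delay inverters
  not not1(not_b, b);
  and #50 and0(a_and_not_b, a, not_b);         // first gate level
  and #50 and1(not_a_and_b, not_a, b);
  or  #50 or0(y, a_and_not_b, not_a_and_b);    // second gate level
endmodule
// Swapping this in for the xor primitive in b_mux would make the simulated
// delay of the B-invert path match the two-gate-level XOR described above.
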

The next main module I used in this 32-bit ALU was a carry look-ahead adder. This increased the speed of my adder by approximately a factor of 8. The entire add takes 13 gate delays to compute the slowest bit, the 32nd bit's sum. The carry look-ahead modules are shown on the following pages. They illustrate how I used the same 4-bit carry look-ahead module to build a 32-bit carry look-ahead module from cascaded 4-bit sections. I first show the main 4-bit carry look-ahead section in the 4-bit adder, then the block diagrams used to create the 16-bit and 32-bit adders, and finally the gate-level design of the 16-bit and 32-bit adders.
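
// The 4-bit look-ahead block built gate by gate in cla_4_bit (Appendix A)
// computes the equations below. A behavioral sketch, with the hypothetical
// module name cla_4_bit_beh, using p = a OR b and g = a AND b as in the PFA.
// Each carry recurrence c(i+1) = g(i) + p(i)*c(i) is fully expanded, so every
// carry comes out after one AND level and one OR level. The group outputs let
// the next level form the block's carry out as gg + pg*cin, which is how
// cla_16_bit and alu_32_bit chain the 4-bit sections.
module cla_4_bit_beh(cin, p, g, c1, c2, c3, gg, pg);
  input cin;
  input [3:0] p, g;        // per-bit propagate (a|b) and generate (a&b)
  output c1, c2, c3;       // carries into bits 1..3 of the group
  output gg, pg;           // group generate and group propagate
  assign c1 = g[0] | (p[0] & cin);
  assign c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin);
  assign c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin);
  assign gg = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0]);
  assign pg = &p;          // all four bits propagate
endmodule
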
Conclusion

I expect this design to work and to be the fastest in the class. If I replaced the two-gate-level XORs with single-gate-level ones, it would be even faster. I had a lot of fun doing this lab, and I look forward to the next ones.
Testing Output Waveforms

1 bit ALU testing

32 bit ALU testing
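
// To compare the gate-level alu_32_bit against expected values automatically,
// one option is a behavioral golden model. A minimal sketch, with the
// hypothetical module name alu_32_ref (not part of the lab code), following
// the op-code wiring in Appendix A: op_code[2] drives both B invert and the
// carry-in, and op_code[1:0] selects AND (00), OR (01), sum (10), or set less
// than (11). So 010 adds, 110 subtracts, and 111 is set less than.
module alu_32_ref(bus_a, bus_b, op_code, result_bus, zero_detect, overflow_detect, carryout);
  input  [31:0] bus_a, bus_b;
  input  [2:0]  op_code;
  output [31:0] result_bus;
  output zero_detect, overflow_detect, carryout;
  reg    [31:0] result_bus;
  // B invert: subtract and set less than add ~B plus a carry-in of 1
  wire [31:0] bmux  = op_code[2] ? ~bus_b : bus_b;
  wire [32:0] total = {1'b0, bus_a} + {1'b0, bmux} + op_code[2];
  wire [31:0] sum   = total[31:0];
  // Signed overflow: operand sign bits agree but the sum's sign differs
  assign overflow_detect = (bus_a[31] == bmux[31]) && (sum[31] != bus_a[31]);
  assign carryout        = total[32];
  always @(*) begin
    case (op_code[1:0])
      2'b00:   result_bus = bus_a & bmux;   // AND (PFA generate)
      2'b01:   result_bus = bus_a | bmux;   // OR (PFA propagate)
      2'b10:   result_bus = sum;            // add or subtract
      default: result_bus = {31'b0, sum[31] ^ overflow_detect}; // set less than
    endcase
  end
  assign zero_detect = (result_bus == 32'b0);
endmodule
// A testbench could drive this model and alu_32_bit with the same inputs and
// compare outputs after waiting well past the worst-case gate path (for
// example #1000 in the #50-per-gate units used by the lab code).
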
Appendix A

Verilog Code
```

```verilog
/*

Gabe Rowe

EE 471

Lab #2

32 bit ALU with Carry Look Ahead, and minimized logic on set less than and overflow.

*/

module alu_32_bit(bus_a,bus_b,op_code,result_bus,zero_detect,overflow_detect,carryout);

input [31:0] bus_a, bus_b;

input [2:0] op_code;

output [31:0] result_bus;

output zero_detect,overflow_detect,carryout;

wire [15:0] p0,p1,g0,g1,ci0,ci1;

wire gnd=0;

// MSB flag logic: both modules use bit 31's A, Bmux and carry-in (ci1[15])
set set0(bus_a[31],bmux31,ci1[15],set_less_than);

overflow overflow0(bus_a[31],bmux31,ci1[15],overflow_detect);

// Zero detect: 32-input NOR over the result bus (no delay is modeled here)
nor nor0(zero_detect,
    result_bus[31],result_bus[30],result_bus[29],result_bus[28],result_bus[27],result_bus[26],result_bus[25],result_bus[24],
    result_bus[23],result_bus[22],result_bus[21],result_bus[20],result_bus[19],result_bus[18],result_bus[17],result_bus[16],
    result_bus[15],result_bus[14],result_bus[13],result_bus[12],result_bus[11],result_bus[10],result_bus[9],result_bus[8],
    result_bus[7],result_bus[6],result_bus[5],result_bus[4],result_bus[3],result_bus[2],result_bus[1],result_bus[0]);

//These two 16 bit carry look ahead blocks make up a 32 bit carry lookahead block
cla_16_bit cla_16_bit0(op_code[2],p0,g0,ci0,gg0,pg0);
cla_16_bit cla_16_bit1(c16,p1,g1,ci1,gg1,pg1);

// Carry into bit 16: c16 = gg0 + pg0*cin, where cin = op_code[2]
and #50 and0(pre_c16,pg0,op_code[2]);

// Carry out of bit 31: carryout = gg1 + pg1*gg0 + pg1*pg0*cin
and #50 and1(pre_c32_1,pg1,gg0);

and #50 and2(pre_c32_2,pg0,pg1,op_code[2]);

or #50 or0(c16,pre_c16,gg0);

or #50 or1(carryout,pre_c32_1,pre_c32_2,gg1);

// Bits 0-15 (lower 16-bit CLA group). Bit 0 takes op_code[2] as its carry-in and
// the set-less-than result as its less input; every other less input is tied to gnd.
alu_1_bit alu_1_bit0(bus_a[0],bus_b[0],op_code[2],op_code[2],{op_code[1],op_code[0]},set_less_than,p0[0],g0[0],bmux0,outsum0,result_bus[0]);
alu_1_bit alu_1_bit1(bus_a[1],bus_b[1],ci0[1],op_code[2],{op_code[1],op_code[0]},gnd,p0[1],g0[1],bmux1,outsum1,result_bus[1]);
alu_1_bit alu_1_bit2(bus_a[2],bus_b[2],ci0[2],op_code[2],{op_code[1],op_code[0]},gnd,p0[2],g0[2],bmux2,outsum2,result_bus[2]);
alu_1_bit alu_1_bit3(bus_a[3],bus_b[3],ci0[3],op_code[2],{op_code[1],op_code[0]},gnd,p0[3],g0[3],bmux3,outsum3,result_bus[3]);
alu_1_bit alu_1_bit4(bus_a[4],bus_b[4],ci0[4],op_code[2],{op_code[1],op_code[0]},gnd,p0[4],g0[4],bmux4,outsum4,result_bus[4]);
alu_1_bit alu_1_bit5(bus_a[5],bus_b[5],ci0[5],op_code[2],{op_code[1],op_code[0]},gnd,p0[5],g0[5],bmux5,outsum5,result_bus[5]);
alu_1_bit alu_1_bit6(bus_a[6],bus_b[6],ci0[6],op_code[2],{op_code[1],op_code[0]},gnd,p0[6],g0[6],bmux6,outsum6,result_bus[6]);
alu_1_bit alu_1_bit7(bus_a[7],bus_b[7],ci0[7],op_code[2],{op_code[1],op_code[0]},gnd,p0[7],g0[7],bmux7,outsum7,result_bus[7]);
alu_1_bit alu_1_bit8(bus_a[8],bus_b[8],ci0[8],op_code[2],{op_code[1],op_code[0]},gnd,p0[8],g0[8],bmux8,outsum8,result_bus[8]);
alu_1_bit alu_1_bit9(bus_a[9],bus_b[9],ci0[9],op_code[2],{op_code[1],op_code[0]},gnd,p0[9],g0[9],bmux9,outsum9,result_bus[9]);
alu_1_bit alu_1_bit10(bus_a[10],bus_b[10],ci0[10],op_code[2],{op_code[1],op_code[0]},gnd,p0[10],g0[10],bmux10,outsum10,result_bus[10]);
alu_1_bit alu_1_bit11(bus_a[11],bus_b[11],ci0[11],op_code[2],{op_code[1],op_code[0]},gnd,p0[11],g0[11],bmux11,outsum11,result_bus[11]);
alu_1_bit alu_1_bit12(bus_a[12],bus_b[12],ci0[12],op_code[2],{op_code[1],op_code[0]},gnd,p0[12],g0[12],bmux12,outsum12,result_bus[12]);
alu_1_bit alu_1_bit13(bus_a[13],bus_b[13],ci0[13],op_code[2],{op_code[1],op_code[0]},gnd,p0[13],g0[13],bmux13,outsum13,result_bus[13]);
alu_1_bit alu_1_bit14(bus_a[14],bus_b[14],ci0[14],op_code[2],{op_code[1],op_code[0]},gnd,p0[14],g0[14],bmux14,outsum14,result_bus[14]);
alu_1_bit alu_1_bit15(bus_a[15],bus_b[15],ci0[15],op_code[2],{op_code[1],op_code[0]},gnd,p0[15],g0[15],bmux15,outsum15,result_bus[15]);

// Bits 16-31 (upper 16-bit CLA group). Bit 16 takes c16 directly, because
// cla_16_bit does not drive ci[0].
alu_1_bit alu_1_bit16(bus_a[16],bus_b[16],c16,op_code[2],{op_code[1],op_code[0]},gnd,p1[0],g1[0],bmux16,outsum16,result_bus[16]);
alu_1_bit alu_1_bit17(bus_a[17],bus_b[17],ci1[1],op_code[2],{op_code[1],op_code[0]},gnd,p1[1],g1[1],bmux17,outsum17,result_bus[17]);
alu_1_bit alu_1_bit18(bus_a[18],bus_b[18],ci1[2],op_code[2],{op_code[1],op_code[0]},gnd,p1[2],g1[2],bmux18,outsum18,result_bus[18]);
alu_1_bit alu_1_bit19(bus_a[19],bus_b[19],ci1[3],op_code[2],{op_code[1],op_code[0]},gnd,p1[3],g1[3],bmux19,outsum19,result_bus[19]);
alu_1_bit alu_1_bit20(bus_a[20],bus_b[20],ci1[4],op_code[2],{op_code[1],op_code[0]},gnd,p1[4],g1[4],bmux20,outsum20,result_bus[20]);
alu_1_bit alu_1_bit21(bus_a[21],bus_b[21],ci1[5],op_code[2],{op_code[1],op_code[0]},gnd,p1[5],g1[5],bmux21,outsum21,result_bus[21]);
alu_1_bit alu_1_bit22(bus_a[22],bus_b[22],ci1[6],op_code[2],{op_code[1],op_code[0]},gnd,p1[6],g1[6],bmux22,outsum22,result_bus[22]);
alu_1_bit alu_1_bit23(bus_a[23],bus_b[23],ci1[7],op_code[2],{op_code[1],op_code[0]},gnd,p1[7],g1[7],bmux23,outsum23,result_bus[23]);
alu_1_bit alu_1_bit24(bus_a[24],bus_b[24],ci1[8],op_code[2],{op_code[1],op_code[0]},gnd,p1[8],g1[8],bmux24,outsum24,result_bus[24]);
alu_1_bit alu_1_bit25(bus_a[25],bus_b[25],ci1[9],op_code[2],{op_code[1],op_code[0]},gnd,p1[9],g1[9],bmux25,outsum25,result_bus[25]);
alu_1_bit alu_1_bit26(bus_a[26],bus_b[26],ci1[10],op_code[2],{op_code[1],op_code[0]},gnd,p1[10],g1[10],bmux26,outsum26,result_bus[26]);
alu_1_bit alu_1_bit27(bus_a[27],bus_b[27],ci1[11],op_code[2],{op_code[1],op_code[0]},gnd,p1[11],g1[11],bmux27,outsum27,result_bus[27]);
alu_1_bit alu_1_bit28(bus_a[28],bus_b[28],ci1[12],op_code[2],{op_code[1],op_code[0]},gnd,p1[12],g1[12],bmux28,outsum28,result_bus[28]);
alu_1_bit alu_1_bit29(bus_a[29],bus_b[29],ci1[13],op_code[2],{op_code[1],op_code[0]},gnd,p1[13],g1[13],bmux29,outsum29,result_bus[29]);
alu_1_bit alu_1_bit30(bus_a[30],bus_b[30],ci1[14],op_code[2],{op_code[1],op_code[0]},gnd,p1[14],g1[14],bmux30,outsum30,result_bus[30]);
alu_1_bit alu_1_bit31(bus_a[31],bus_b[31],ci1[15],op_code[2],{op_code[1],op_code[0]},gnd,p1[15],g1[15],bmux31,outsum31,result_bus[31]);

endmodule

module alu_1_bit(a,b,cin,binv,op,less,p,g,bmux,sum,result);

input [1:0] op;

input a,b,cin,binv,less;

output p,g,bmux,sum,result;

b_mux b_mux0(b,binv,bmux);

pfa pfa0(a,bmux,cin,g,p,sum);

mux_4_to_1 mux_4_to_1_0(op,g,p,sum,less,result);

endmodule

module pfa(a,b,cin,g,p,sum);

input a,b,cin;

output g,p,sum;

and #50 and0(g,a,b);

or #50 or0(p,a,b);

xor #50 xor0(sum,a,b,cin);

endmodule

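// 16-bit CLA built from four 4-bit sections plus a second-level 4-bit CLA.
// ci[k] is the carry into bit k; ci[0] is not driven, so callers supply bit 0's carry-in directly.
module cla_16_bit(cin,p,g,ci,gg_out,pg_out);

input cin;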

input [15:0] p,g;

output [15:0] ci;

output gg_out,pg_out;

wire [3:0] gg,pg,ci_main;

cla_4_bit cla_4_bit0(cin,{p[3],p[2],p[1],p[0]},{g[3],g[2],g[1],g[0]},ci[1],ci[2],ci[3],gg[0],pg[0]);

cla_4_bit cla_4_bit1(ci[4],{p[7],p[6],p[5],p[4]},{g[7],g[6],g[5],g[4]},ci[5],ci[6],ci[7],gg[1],pg[1]);

cla_4_bit cla_4_bit2(ci[8],{p[11],p[10],p[9],p[8]},{g[11],g[10],g[9],g[8]},ci[9],ci[10],ci[11],gg[2],pg[2]);

cla_4_bit cla_4_bit3(ci[12],{p[15],p[14],p[13],p[12]},{g[15],g[14],g[13],g[12]},ci[13],ci[14],ci[15],gg[3],pg[3]);

cla_4_bit cla_4_bit_main(cin,pg,gg,ci[4],ci[8],ci[12],gg_out,pg_out);

endmodule

module cla_4_bit(cin,p,g,c1,c2,c3,gg,pg);

input cin;

input [3:0] p,g;

output c1,c2,c3;

output gg,pg;

and #50 and0(c1_and0,p[0],cin);

and #50 and1(c2_and0,p[1],g[0]);

and #50 and2(c2_and1,p[1],p[0],cin);

and #50 and3(c3_and0,p[2],g[1]);

and #50 and4(c3_and1,p[2],p[1],g[0]);

and #50 and5(c3_and2,p[2],p[1],p[0],cin);

and #50 and6(c4_and0,p[3],g[2]);

and #50 and7(c4_and1,p[3],p[2],g[1]);

and #50 and8(c4_and2,p[3],p[2],p[1],g[0]);

and #50 and9(pg,p[3],p[2],p[1],p[0]);

or #50 or0(c1,g[0],c1_and0);

or #50 or1(c2,g[1],c2_and0,c2_and1);

or #50 or2(c3,g[2],c3_and2,c3_and1,c3_and0);

or #50 or3(gg,g[3],c4_and2,c4_and1,c4_and0);
endmodule

module overflow(a,b,cin,overflow_detect);

input a,b,cin;

output overflow_detect;

not not0(not_a, a);

not not1(not_b, b);

not not2(not_cin, cin);

and #50 and0(and_a_b_not_cin,a,b,not_cin);

and #50 and1(and_not_a_not_b_cin,not_a,not_b,cin);

or #50 or0(overflow_detect,and_a_b_not_cin,and_not_a_not_b_cin);

endmodule

module set(a,b,cin,set_less_than);

input a,b,cin;

output set_less_than;

not not0(not_cin, cin);

and #50 and0(a_and_b,a,b);

and #50 and1(a_and_not_cin,a,not_cin);

and #50 and2(b_and_not_cin,b,not_cin);

or #50 or0(set_less_than,a_and_b,a_and_not_cin,b_and_not_cin);

endmodule

module mux_4_to_1(sel, in0, in1, in2, in3, out);

input [1:0] sel;

input in0, in1, in2, in3;

output out;

wire [1:0] not_sel;

not not0(not_sel[0], sel[0]);

not not1(not_sel[1], sel[1]);

and #50 and0(sel_in0, in0, not_sel[1], not_sel[0]); //00
and #50 and1(sel_in1, in1, not_sel[1], sel[0]); //01

and #50 and2(sel_in2, in2, sel[1], not_sel[0]); //10

and #50 and3(sel_in3, in3, sel[1], sel[0]); //11

or #50 or0(out, sel_in0, sel_in1, sel_in2, sel_in3);

endmodule

module b_mux(b, binv, bmux);

input b, binv;

output bmux;

xor #50 xor0(bmux, binv, b);

endmodule

```