Advanced Computer Architecture
Homework 1, Oct. 22, 2012
1. (12 pts) Explain the differences between architecture, implementation, and
realization. Explain how each of these relates to the iron law of processor
performance.
2. (12 pts) Using Amdahl’s law, compute speedups for a program that is 85%
vectorizable for a system with 4, 8, 16, and 32 processors. What would be a
reasonable number of processors to build into a system for running such an
application?
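
As a starting point for problem 2, the following is a minimal sketch of Amdahl's law in its usual form, speedup = 1 / ((1 - f) + f / N), where f is the vectorizable (parallelizable) fraction and N is the number of processors. The function name amdahl_speedup and its parameter names are illustrative assumptions, not part of the assignment, and the sketch assumes the vectorizable part scales perfectly across N processors.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Return the Amdahl's-law speedup 1 / ((1 - f) + f / N).

    parallel_fraction (f): share of execution time that can be parallelized.
    processors (N): number of processors applied to that share.
    Both names are hypothetical helpers chosen for this sketch.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Processor counts taken from the problem statement.
    for n in (4, 8, 16, 32):
        print(f"N = {n:2d}: speedup = {amdahl_speedup(0.85, n):.3f}")
```

Comparing each result against the limiting value 1 / (1 - f) is one way to reason about where additional processors stop paying off.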
3. (10 pts) Equation 4, which relates the performance of an ideal pipeline to pipeline
depth, looks very similar to Amdahl’s law. Describe the relationship between the
terms in these two equations, and develop an intuitive explanation for why the two
equations are so similar.
4. (10 pts) The MIPS pipeline employs a two-phase clocking scheme that makes
efficient use of a shared TLB, since instruction fetch accesses the TLB in phase
one and data fetch accesses in phase two. However, when resolving a conditional
branch, both the branch target address and the branch fall-through address need to
be translated during phase one--in parallel with the branch condition check in
phase one of the ALU stage--to enable instruction fetch from either the target or
the fall-through during phase two. This seems to imply a dual-ported TLB.
Suggest an architected solution to this problem that avoids dual-porting the TLB.
5. (12 pts) Show possible functional block diagrams to implement the pipelines of the
MIPS R2000/R3000 and instructional DLX processors by modifying that of the
TYP pipelined processor implementation.
6. (26 pts) A load-immediate instruction, which sign-extends a 16-bit immediate value
to 32 bits, and stores the result in the destination register, is added to the TYP
instruction set and pipeline. Given that such instructions are able to write their
results into the register in the decode (ID) stage, answer the following
questions:
a. (6 pts) Identify what additional hazards such a change might introduce.
b. (8 pts) Ignoring pipeline interlock hardware, what additional pipeline
resources does the change require? Discuss these resources and their cost.
c. (12 pts) Redraw the pipeline interlock hardware shown in Figure 2-18 to
correctly handle the load-immediate instructions.
7. (18 pts) Assume the following instruction mix for a RISC instruction set:
20% stores, 20% loads, 20% branches, 25% integer arithmetic, 10%
integer shift, and 5% integer multiply. Given that stores require two
cycles, loads require two cycles, branches require three cycles,
integer ALU instructions (arithmetic and shift) require one cycle, and
integer multiplies require eight cycles, answer the following questions:
a. (6 pts) Compute the overall CPI.
b. (6 pts) If 50% of multiplies can be converted to shift-add sequences with an
average length of three instructions, compute the change in instructions per
program, cycles per instruction, and overall program speedup.
c. (6 pts) Assume that 60% of the additional integer and shift instructions
introduced by converting multiplies are shifts, which now take four cycles to
execute. Recompute the cycles per instruction and overall program speedup.
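
The bookkeeping in problem 7 can be sketched as follows. This is a hedged illustration, not a worked solution: overall CPI is taken as the frequency-weighted sum of per-class cycle counts, and speedup follows the iron law under the assumption that cycle time is unchanged. The names BASELINE_MIX, overall_cpi, and speedup are hypothetical helpers introduced here, and treating shifts as one-cycle ALU instructions in the baseline is an assumption drawn from the problem wording.

```python
# Baseline mix from the problem statement: class -> (fraction, cycles).
# Shifts are assumed to be one-cycle integer ALU instructions here.
BASELINE_MIX = {
    "store": (0.20, 2),
    "load": (0.20, 2),
    "branch": (0.20, 3),
    "int_arith": (0.25, 1),
    "int_shift": (0.10, 1),
    "int_mul": (0.05, 8),
}


def overall_cpi(mix: dict[str, tuple[float, float]]) -> float:
    """Frequency-weighted average cycles per instruction."""
    total_fraction = sum(fraction for fraction, _ in mix.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix.values())


def speedup(ic_old: float, cpi_old: float, ic_new: float, cpi_new: float) -> float:
    """Iron-law speedup, assuming the cycle time does not change."""
    return (ic_old * cpi_old) / (ic_new * cpi_new)


if __name__ == "__main__":
    print(f"baseline CPI = {overall_cpi(BASELINE_MIX):.2f}")
```

For parts b and c, one approach is to express the new instruction counts relative to 100 original instructions, renormalize them into fractions, and pass the resulting mix to overall_cpi before calling speedup.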