					                   Advanced Computer Architecture
                           Homework 1, Oct. 22, 2012
1. (12 pts) Explain the differences between architecture, implementation, and
    realization. Explain how each of these relates to the iron law of processor
    performance.

2. (12 pts) Using Amdahl’s law, compute speedups for a program that is 85%
    vectorizable for a system with 4, 8, 16, and 32 processors. What would be a
   reasonable number of processors to build into a system for running such an
   application?
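
The speedups asked for in problem 2 can be checked numerically. The sketch below assumes the usual form of Amdahl's law, speedup = 1 / ((1 - f) + f / N), where f is the vectorizable fraction and N is the number of processors. The function name amdahl_speedup is illustrative and not part of the assignment.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: the serial part (1 - f) is unchanged, the parallel part f is divided by N."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    f = 0.85  # vectorizable fraction given in problem 2
    for n in (4, 8, 16, 32):
        print(f"N = {n:2d}: speedup = {amdahl_speedup(f, n):.2f}")
    # Upper bound as N grows without limit: only the serial part remains.
    print(f"limit: {1.0 / (1.0 - f):.2f}")
```

Comparing each result against the limit of 1 / 0.15 (about 6.67) is one way to reason about how many processors are worth building in.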

3. (10 pts) Equation 4, which relates the performance of an ideal pipeline to pipeline
    depth, looks very similar to Amdahl’s law. Describe the relationship between the
    terms in these two equations, and develop an intuitive explanation for why the two
   equations are so similar.

4. (10 pts) The MIPS pipeline employs a two-phase clocking scheme that makes
    efficient use of a shared TLB, since instruction fetch accesses the TLB in phase
   one and data fetch accesses in phase two. However, when resolving a conditional
   branch, both the branch target address and the branch fall-through address need to
   be translated during phase one--in parallel with the branch condition check in
   phase one of the ALU stage--to enable instruction fetch from either the target or
   the fall-through during phase two. This seems to imply a dual-ported TLB.
   Suggest an architected solution to this problem that avoids dual-porting the TLB.

5. (12 pts) Show possible functional block diagrams to implement the pipelines of the
    MIPS R2000/R3000 and instructional DLX processors by modifying that of the
   TYP pipelined processor implementation.

6. (26 pts) A load-immediate instruction, which sign-extends a 16-bit immediate value
    to 32 bits, and stores the result in the destination register, is added to the TYP
    instruction set and pipeline. Given that such instructions are able to write their
   results into the register in the decode (ID) stage, answer the following
   questions:
     a. (6 pts) Identify what additional hazards such a change might introduce.
     b. (8 pts) Ignoring pipeline interlock hardware, what additional pipeline
       resources does the change require? Discuss these resources and their cost.
     c. (12 pts) Redraw the pipeline interlock hardware shown in Figure 2-18 to
       correctly handle the load-immediate instructions.
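
For reference, the sign extension performed by the load-immediate instruction in problem 6 can be sketched in software. This is only an illustration of the arithmetic, not of the TYP hardware; the function name sign_extend_16_to_32 is hypothetical, and the result is returned as an unsigned 32-bit bit pattern.

```python
def sign_extend_16_to_32(imm16: int) -> int:
    """Replicate bit 15 of a 16-bit immediate into bits 31..16."""
    imm16 &= 0xFFFF                  # keep only the low 16 bits
    if imm16 & 0x8000:               # sign bit set: fill the upper half with ones
        return imm16 | 0xFFFF0000
    return imm16                     # sign bit clear: the upper half stays zero


if __name__ == "__main__":
    for value in (0x0001, 0x7FFF, 0x8000, 0xFFFF):
        print(f"0x{value:04X} -> 0x{sign_extend_16_to_32(value):08X}")
```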
7. (18 pts) Assume the following instruction mix for a RISC instruction set:
    20% stores, 20% loads, 20% branches, 25% integer arithmetic, 10%
    integer shift, and 5% integer multiply. Given that stores require two
    cycles, loads require two cycles, branches require three cycles,
    integer ALU instructions require one cycle, and integer multiplies
    require eight cycles, answer the following questions:
     a. (6 pts) Compute the overall CPI.
     b. (6 pts) If 50% of multiplies can be converted to shift-add sequences with an
         average length of three instructions, compute the change in instructions per
       program, cycles per instruction, and overall program speedup.
     c. (6 pts) Assume that 60% of the additional integer and shift instructions
       introduced by converting multiplies are shifts, which now take four cycles to
       execute. Recompute the cycles per instruction and overall program speedup.
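
The arithmetic behind problem 7, parts (a) and (b), can be set up as a weighted sum. The sketch below makes several assumptions: shifts count as one-cycle integer ALU instructions, each converted multiply is replaced by exactly three one-cycle instructions, and the cycle time is unchanged, so speedup is the ratio of total cycles. The names MIX, CYCLES, total_cycles, and cpi are illustrative.

```python
# Instruction mix and per-class cycle counts from problem 7.
MIX = {"store": 0.20, "load": 0.20, "branch": 0.20,
       "arith": 0.25, "shift": 0.10, "multiply": 0.05}
CYCLES = {"store": 2, "load": 2, "branch": 3,
          "arith": 1, "shift": 1, "multiply": 8}


def total_cycles(mix, cycles):
    """Cycles per ORIGINAL instruction; the mix does not have to sum to 1."""
    return sum(mix[k] * cycles[k] for k in mix)


def cpi(mix, cycles):
    """Cycles per instruction: total cycles divided by the instruction count."""
    return total_cycles(mix, cycles) / sum(mix.values())


# Part (a): baseline CPI.
base_cpi = cpi(MIX, CYCLES)

# Part (b): half of the multiplies become three one-cycle instructions each.
converted = 0.5 * MIX["multiply"]
new_mix = dict(MIX)
new_mix["multiply"] -= converted
new_mix["arith"] += 3 * converted        # assumption: all replacements take one cycle

new_count = sum(new_mix.values())        # instructions relative to the original program
new_cpi = cpi(new_mix, CYCLES)
speedup = total_cycles(MIX, CYCLES) / total_cycles(new_mix, CYCLES)

if __name__ == "__main__":
    print(f"baseline CPI      = {base_cpi:.3f}")
    print(f"instruction count = {new_count:.3f} x original")
    print(f"new CPI           = {new_cpi:.3f}")
    print(f"speedup           = {speedup:.3f}")
```

Part (c) can reuse the same two functions once the four-cycle shift cost is entered in CYCLES and the added instructions are split between the "arith" and "shift" classes.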