

Final Exam Practice Problems
for EEL4713 Computer Architecture
Spring 2006
The present plan for the exam is to have two mandatory problems (on Multicycle
Datapath and Cache) and two optional problems (on Performance Metrics and Floating-
point representations). For full credit, you must do BOTH of the mandatory problems
and at least ONE of the optional problems. You may do both optional problems if you
wish. Your final exam grade will be calculated as the sum of your scores on the two
mandatory problems and the higher-scoring of the two optional problems. If you score
higher on either of the optional problems than the corresponding problem on the midterm,
your new score will replace your score on that problem on the midterm. All problems are
worth 35 points.

(MANDATORY) Q. #1 (HW7) Mcyc DP & ctrl (CIO 7 aeo)
(MANDATORY) Question #2 (HW8). Caches (CIO 8 aceo).
(OPTION 1) Question #3 – Perf.&Cost Metrics (CIO 1 aeo)
(OPTION 2) Question #4 – FP Reps. (CIO 1 aeo)

(MANDATORY) Q. #1 (HW7) Mcyc DP & ctrl (CIO 7 aeo)
Below are the schematic and controller state machine for a simple multicycle
implementation of MicroMIPS, a subset of the MIPS instruction set, as described in
Parhami ch. 14. Consider the operation of this datapath in executing the BEQ instruction.
Highlight all lines (including control lines) that are needed, and fill out the table on the
following page showing the control signal values on each clock cycle.
[Figure: multicycle MicroMIPS datapath schematic. Labeled elements: PC, Cache
(Address and Data ports), Inst Reg, Data Reg, Reg file with (rs) and (rt)
outputs, sign extension (SE) of the 16-bit immediate to 32 bits, x Reg, y Reg,
x Mux, y Mux, ALU with Zero and Ovfl outputs (ALUZero, ALUOvfl), ALU out,
z Reg, and SysCallAddr. Control signals, left to right: PCWrite, InstData,
MemRead, MemWrite, IRWrite, RegDst, RegInSrc, RegWrite, ALUSrcX, ALUSrcY,
ALUFunc, JumpAddr, PCSrc.]
                        The multicycle datapath from Parhami figure 14.3, p. 261.
[Figure: controller state diagram, given here as a list.]

Transitions:
    Cycle 1: State 0, then State 1.
    Cycle 2: State 1, then State 2 (lw/sw), State 5 (jump/branch), or State 7 (ALU-type).
    Cycle 3: State 2, then State 3 (lw) or State 6 (sw). State 7, then State 8.
    Cycle 4: State 3, then State 4.
    Cycle 5: State 4.

Control signal settings:
    State 0: InstData = 0, MemRead = 1, IRWrite = 1, ALUSrcX = 0, ALUSrcY = 0,
             ALUFunc = ‘+’, PCSrc = 3, PCWrite = 1
    State 1: ALUSrcX = 0, ALUSrcY = 3, ALUFunc = ‘+’
    State 2: ALUSrcX = 1, ALUSrcY = 2, ALUFunc = ‘+’
    State 3: InstData = 1, MemRead = 1
    State 4: RegDst = 0, RegInSrc = 0, RegWrite = 1
    State 5: ALUSrcX = 1, ALUSrcY = 1, ALUFunc = ‘−’, JumpAddr = %, PCSrc = @,
             PCWrite = #
    State 6: InstData = 1, MemWrite = 1
    State 7: ALUSrcX = 1, ALUSrcY = 1 or 2, ALUFunc = Varies
    State 8: RegDst = 0 or 1, RegInSrc = 1, RegWrite = 1

Notes for State 5:
    % 0 for j or jal, 1 for syscall, don’t-care for other instructions.
    @ 0 for j, jal, and syscall, 1 for jr, 2 for branches.
    # 1 for j, jr, jal, and syscall, ALUZero (its complement) for beq (bne),
      bit 31 of ALUout for bltz.
    For jal, RegDst = 2, RegInSrc = 1, RegWrite = 1.

Note for State 7:
    ALUFunc is determined based on the op and fn fields.
                                  Multicycle controller FSM from Parhami figure 14.4, p. 264.





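Cycle #  PCWrite  InstData  MemRead  MemWrite  IRWrite  RegDst  RegInSrc  RegWrite  ALUSrcX  ALUSrcY  ALUFunc  JumpAddr  PCSrc
1        1        0         1        0         1        X       X         0         0        0        ‘+’      X         3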
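
As a way to check a filled-in table, the controller FSM can be transcribed into a small lookup structure and walked along the beq path (State 0, State 1, State 5). The Python sketch below is only illustrative. The names SIGNALS, ENABLES, STATES, BEQ_PATH, and control_row are hypothetical. It assumes that enable signals a state does not list are 0, that select signals a state does not list are don't-cares (X), and that State 5 uses ALUFunc = '-' for the branch comparison.

```python
# Column order assumed to follow the control-signal table, left to right.
SIGNALS = ("PCWrite", "InstData", "MemRead", "MemWrite", "IRWrite",
           "RegDst", "RegInSrc", "RegWrite", "ALUSrcX", "ALUSrcY",
           "ALUFunc", "JumpAddr", "PCSrc")

# Signals that change machine state; assumed 0 whenever a state omits them.
ENABLES = {"PCWrite", "MemRead", "MemWrite", "IRWrite", "RegWrite"}

# Settings transcribed from the FSM for the states on the beq path.
# State 5 is specialised to beq using the State 5 notes: JumpAddr is a
# don't-care, PCSrc = 2 (branch), and PCWrite follows ALUZero.
# ALUFunc = "-" in State 5 is an assumption (the comparison subtracts).
STATES = {
    0: {"InstData": 0, "MemRead": 1, "IRWrite": 1, "ALUSrcX": 0,
        "ALUSrcY": 0, "ALUFunc": "+", "PCSrc": 3, "PCWrite": 1},
    1: {"ALUSrcX": 0, "ALUSrcY": 3, "ALUFunc": "+"},
    5: {"ALUSrcX": 1, "ALUSrcY": 1, "ALUFunc": "-", "JumpAddr": "X",
        "PCSrc": 2, "PCWrite": "ALUZero"},
}

BEQ_PATH = (0, 1, 5)  # one state per clock cycle


def control_row(state):
    """Return the control-signal values for one state, in SIGNALS order."""
    settings = STATES[state]
    row = []
    for name in SIGNALS:
        if name in settings:
            row.append(settings[name])
        elif name in ENABLES:
            row.append(0)      # unlisted enables are assumed inactive
        else:
            row.append("X")    # unlisted selects are assumed don't-care
    return row


if __name__ == "__main__":
    print("Cycle", *SIGNALS)
    for cycle, state in enumerate(BEQ_PATH, start=1):
        print(cycle, *control_row(state))
```

The row produced for State 0 can be compared against the cycle-1 row already given in the table as a sanity check on the transcription.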
(MANDATORY) Question #2 (HW8). Caches (CIO 8 aceo).
You are considering two alternative designs for the memory hierarchy for a simple CPU
with a base CPI (including level-1 hits) of 2, and with an average of 1.15 memory
accesses per instruction. Design #1 has a single-level 6 MB cache with a hit rate of 97%.
Design #2 has a two-level cache, where the first level is 2 MB and has a local hit rate of
91%, while the second level is 8 MB, requires 3 extra cycles to access, and has a local hit
rate of 88% for accesses that miss at level 1. In either design, for the cache system to
access main memory incurs an additional latency of 70 clock cycles. Assume the cache
hardware costs $10^-5 (0.001¢) per bit, while the rest of the processor costs $150. Your
goal is to select the cache design that leads to the best overall cost-performance for the
processor. For simplicity in this problem, ignore the cost of the bookkeeping bits (tags,
valid bits, etc.)

(a) Identify the engineering problem. What characteristics of each cache design do you
need to calculate? Describe them. What figure of merit (or demerit) do you want to
maximize (or minimize)?

(b) Formulate the engineering problem. Compose algebraic expressions for the
important design characteristics that you identified in part (a), for the cases of both
1-level and 2-level caches. You may use the following symbols:

       CPI_base – Base CPI of the CPU.

       a_inst – Number of memory accesses per instruction.

       S_1, S_2 – Sizes of the level-1 and level-2 caches.

       h_1, h_2 – Hit rates of the level-1 and level-2 caches.

       L_2, L_M – Latency in cycles to go to the level-2 cache and main memory, respectively.

       c_bit – Cost per bit of cache technology.

       c_CPU – Cost of the rest of the CPU aside from the memory hierarchy.

(c) Solve the engineering problem. Evaluate your formulas for the particular cache
designs described in the problem description by plugging in the numbers given. Compare
the two designs. Which design should you select, and why?
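
One common way to model this problem (an assumption, not necessarily the intended solution) charges every level-1 miss the level-2 latency, charges every level-2 miss the additional main-memory latency, and compares designs by cost × CPI, since both designs are assumed to share the same clock rate and instruction count. The Python sketch below evaluates that model. The function names are hypothetical, and 1 MB is assumed to mean 2^20 bytes.

```python
BITS_PER_MB = 8 * 2**20  # assumption: 1 MB = 2**20 bytes


def cpi_one_level(cpi_base, a_inst, h1, lat_mem):
    """Effective CPI when every cache miss goes straight to main memory."""
    return cpi_base + a_inst * (1 - h1) * lat_mem


def cpi_two_level(cpi_base, a_inst, h1, h2, lat_l2, lat_mem):
    """Effective CPI when level-1 misses pay lat_l2, and level-2 misses
    pay lat_mem on top of that (h2 is the local level-2 hit rate)."""
    return cpi_base + a_inst * (1 - h1) * (lat_l2 + (1 - h2) * lat_mem)


def total_cost(cache_sizes_mb, c_bit, c_cpu):
    """Processor cost in dollars, ignoring tag and valid bits."""
    return c_cpu + c_bit * BITS_PER_MB * sum(cache_sizes_mb)


if __name__ == "__main__":
    designs = {
        "Design #1": (cpi_one_level(2, 1.15, 0.97, 70),
                      total_cost([6], 1e-5, 150)),
        "Design #2": (cpi_two_level(2, 1.15, 0.91, 0.88, 3, 70),
                      total_cost([2, 8], 1e-5, 150)),
    }
    for name, (cpi, cost) in designs.items():
        # Lower cost x CPI is better under the assumed figure of demerit.
        print(f"{name}: CPI = {cpi:.4f}, cost = ${cost:.2f}, "
              f"cost x CPI = {cost * cpi:.1f}")
```

A different figure of merit, or a different reading of the "3 extra cycles", would change the formulas, so the output is only as good as these assumptions.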
(OPTION 1) Question #3 – Perf.&Cost Metrics (CIO 1 aeo)
Suppose you are in charge of setting up a corporate data center, and you have a total
budget of $100,000 to spend on a new cluster of computers. The users of your data
center need to constantly and repeatedly run a given application program “P” on the
machines in this cluster. You are trying to decide what type of computers to buy for the
cluster. The company’s goal is to enable the pool of users to run the program P as
frequently as possible – the more often, the better. You can buy as many computers for
your cluster as you can afford while staying within budget.

1a. Identify the true nature of the problem to be solved, as an engineering problem. What
      quantity or quantities should you really be trying to optimize, and for each one,
      should you be trying to maximize or minimize that particular quantity? Circle all
      that apply.

       i. Number of instructions-per-second executed per machine.          Max / min?
       ii. Total throughput of your data center, within budget.            Max / min?
       iii. Performance of each individual machine on program P.           Max / min?
       iv. Cost-performance (performance per unit cost) on P               Max / min?
                of the type of machine that is purchased.
       v. Execution time of each machine when running program P.           Max / min?
       vi. The CPI of the type of machine that is purchased.               Max / min?

1b. Now suppose that for each type of machine M, you know all of the following:
        The dynamic instruction count IC of machine M when running program P.
        The average cycles-per-instruction CPI of the machine when running P.
        The clock frequency f of the machine.
        The cost C of the machine, in dollars.
    Now, formulate an expression for the key figure of merit that you should be trying to
    maximize or minimize, in terms of the above variables. Write the expression below.

1c. Given the data below for three machines A, B, and C (with IC and CPI as
    measured for program P), use your formula from part (1b) to solve the problem of
    deciding which of these three types of machines you ought to buy. Show your work
    below the table. How many times better (according to the correct figure of merit) is
    the best machine, compared to the second-best alternative?

                       Type A computers       Type B computers       Type C computers
Instruction count      12×10^9                3×10^9                 4×10^9
Cycles per instr.      1                      1.5                    2
Clock frequency        4 GHz                  3 GHz                  2.8 GHz
Cost                   $1,000                 $2,000                 $200
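
If the figure of merit is taken to be throughput per dollar (an assumption about the intended answer), each machine completes f / (IC × CPI) runs of P per second, and dividing by the cost C gives a quantity to maximize. The Python sketch below ranks the three machine types on that basis. The names are hypothetical, and the sketch ignores any budget left over after buying a whole number of machines.

```python
def runs_per_second(ic, cpi, f_hz):
    """Runs of program P completed per second by one machine."""
    return f_hz / (ic * cpi)


def runs_per_second_per_dollar(ic, cpi, f_hz, cost):
    """Throughput per dollar: the assumed figure of merit (to maximize)."""
    return runs_per_second(ic, cpi, f_hz) / cost


# Data copied from the table above.
MACHINES = {
    "A": dict(ic=12e9, cpi=1.0, f_hz=4.0e9, cost=1000),
    "B": dict(ic=3e9, cpi=1.5, f_hz=3.0e9, cost=2000),
    "C": dict(ic=4e9, cpi=2.0, f_hz=2.8e9, cost=200),
}


def rank_machines(machines):
    """Return (name, score) pairs sorted from best to worst."""
    scores = {name: runs_per_second_per_dollar(**spec)
              for name, spec in machines.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    ranking = rank_machines(MACHINES)
    for name, score in ranking:
        print(f"Type {name}: {score:.3e} runs/s per dollar")
    best, second = ranking[0], ranking[1]
    print(f"Type {best[0]} is {best[1] / second[1]:.2f}x better "
          f"than the next alternative")
```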
(OPTION 2) Question #4 – FP Reps. (CIO 1 aeo)
Suppose you have been asked to design and implement a microprocessor-based
embedded system for analysis of sensor data. The application involves performing
floating-point arithmetic on input data values that could be as small as 10^-30 in
magnitude. In one part of the algorithm (section A), results are obtained by adding
various data values together. In another part of the algorithm (section B), results are
obtained by multiplying together pairs of data values. You are trying to decide which
IEEE standard floating-point data type (single or double precision) to use in each part of
the algorithm. You want the application to be as energy-efficient as possible, and you
know that your microprocessor has separate single-precision and double-precision
floating-point units that are each optimized to achieve the best possible energy efficiency.

1a) Identify the engineering problem to be solved. What must you calculate in order to
    determine which floating-point data type can be used in a given case?

1b) Formulate the engineering problem. For each of section A and section B, write an
    inequality that indicates whether single-precision can be used for that section. Use
    the variable M to stand for the minimum value (in this case, 10^-30) of an input datum.

1c) Solve the problem. Which data type should be used for section A? Which data type
    should be used for section B? Justify your answers.
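
A quick numerical check is possible if the deciding factor is taken to be whether the smallest possible result stays within the normalized range of each format (an assumption). The Python sketch below also assumes that sums in section A are never smaller in magnitude than M (no cancellation) and that the smallest product in section B is M × M. The names are hypothetical.

```python
import sys

FLOAT32_MIN_NORMAL = 2.0 ** -126         # smallest positive normalized single
FLOAT64_MIN_NORMAL = sys.float_info.min  # smallest positive normalized double


def fits_normalized(smallest_result, min_normal):
    """True if the smallest expected result is representable as a
    normalized number in a format whose smallest normal is min_normal."""
    return smallest_result >= min_normal


M = 1e-30                 # minimum input magnitude from the problem
smallest_sum = M          # assumption: additions never cancel below M
smallest_product = M * M  # two minimum-magnitude inputs multiplied

if __name__ == "__main__":
    for label, value in (("section A (add)", smallest_sum),
                         ("section B (multiply)", smallest_product)):
        print(label,
              "single ok:", fits_normalized(value, FLOAT32_MIN_NORMAL),
              "double ok:", fits_normalized(value, FLOAT64_MIN_NORMAL))
```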

1d) Hand-convert the value 10^-30 to IEEE-standard single-precision floating-point. Show
    your calculations and clearly delineate all fields in the binary result.
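
A hand conversion can be cross-checked by letting Python's standard struct module round the value to single precision and then splitting the 32-bit pattern into its sign, exponent, and fraction fields. This is a minimal sketch with hypothetical function names. It does not replace the hand calculation the question asks for.

```python
import struct


def float32_fields(value):
    """Round value to IEEE single precision and return the tuple
    (sign bit, 8-bit biased exponent, 23-bit fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def fields_to_value(sign, exponent, fraction):
    """Rebuild the value from its fields. Valid only for normalized
    numbers (0 < exponent < 255), which carry a hidden leading 1."""
    return (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (exponent - 127)


if __name__ == "__main__":
    sign, exponent, fraction = float32_fields(1e-30)
    print(f"sign     = {sign:01b}")
    print(f"exponent = {exponent:08b} (biased {exponent}, "
          f"unbiased {exponent - 127})")
    print(f"fraction = {fraction:023b}")
    print(f"value    = {fields_to_value(sign, exponent, fraction):.9e}")
```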
