Final Exam Practice Problems for EEL4713 Computer Architecture
Spring 2006

The present plan for the exam is to have two mandatory problems (on Multicycle Datapath and Cache) and two optional problems (on Performance Metrics and Floating-point representations). For full credit, you must do BOTH of the mandatory problems and at least ONE of the optional problems. You may do both optional problems if you wish. Your final exam grade will be calculated as the sum of your scores on the two mandatory problems and the higher-scoring of the two optional problems. If you score higher on either of the optional problems than on the corresponding problem on the midterm, your new score will replace your score on that problem on the midterm. All problems are worth 35 points.

(MANDATORY) Q. #1 (HW7) Mcyc DP & ctrl (CIO 7 aeo)
(MANDATORY) Question #2 (HW8). Caches (CIO 8 aceo).
(OPTION 1) Question #3 – Perf.&Cost Metrics (CIO 1 aeo)
(OPTION 2) Question #4 – FP Reps. (CIO 1 aeo)

(MANDATORY) Q. #1 (HW7) Mcyc DP & ctrl (CIO 7 aeo)

Below are the schematic and controller state machine for a simple multicycle implementation of MicroMIPS, a subset of the MIPS instruction set, as described in Parhami ch. 14. Consider the operation of this datapath in executing the BEQ instruction. Highlight all lines (including control lines) that are needed, and fill out the table below showing the control signal values on each clock cycle.

[Figure: datapath schematic. Control signals shown: InstData, MemRead, MemWrite, IRWrite, RegDst, RegInSrc, RegWrite, ALUSrcX, ALUSrcY, ALUFunc, JumpAddr, PCSrc, PCWrite. ALU status outputs: ALUZero, ALUOvfl.]
The multicycle datapath from Parhami figure 14.3, p. 261.

[Figure: controller state diagram. Control signal settings per state:]
Cycle 1, State 0 (Start): InstData = 0, MemRead = 1, IRWrite = 1, ALUSrcX = 0, ALUSrcY = 0, ALUFunc = ‘+’, PCSrc = 3, PCWrite = 1
Cycle 2, State 1: ALUSrcX = 0, ALUSrcY = 3, ALUFunc = ‘+’
Cycle 3, State 2 (lw/sw): ALUSrcX = 1, ALUSrcY = 2, ALUFunc = ‘+’
Cycle 3, State 5 (Jump/Branch): ALUSrcX = 1, ALUSrcY = 1, ALUFunc = ‘−’, JumpAddr = %, PCSrc = @, PCWrite = #
Cycle 3, State 7 (ALU-type): ALUSrcX = 1, ALUSrcY = 1 or 2, ALUFunc = Varies
Cycle 4, State 3 (lw): InstData = 1, MemRead = 1
Cycle 4, State 6 (sw): InstData = 1, MemWrite = 1
Cycle 4, State 8: RegDst = 0 or 1, RegInSrc = 1, RegWrite = 1
Cycle 5, State 4: RegDst = 0, RegInSrc = 0, RegWrite = 1

Notes for State 5:
% 0 for j or jal, 1 for syscall, don’t-care for other instr’s
@ 0 for j, jal, and syscall, 1 for jr, 2 for branches
# 1 for j, jr, jal, and syscall, ALUZero (ALUZero′) for beq (bne), bit 31 of ALUout for bltz
For jal, RegDst = 2, RegInSrc = 1, RegWrite = 1

Note for State 7: ALUFunc is determined based on the op and fn fields.

Multicycle controller FSM from Parhami figure 14.4, p. 264.

Cycle #  PCWrite  Inst’Data  MemRead  MemWrite  IRWrite  RegDst  RegInSrc  RegWrite  ALUSrcX  ALUSrcY  ALUFunc  JumpAddr  PCSrc
1        1        0          1        0         1        X       X         0         0        0        ‘+’      X         3
2
3
4
5

(MANDATORY) Question #2 (HW8). Caches (CIO 8 aceo).

You are considering two alternative designs for the memory hierarchy for a simple CPU with a base CPI (including level-1 hits) of 2, and with an average of 1.15 memory accesses per instruction. Design #1 has a single-level 6 MB cache with a hit rate of 97%.
Design #2 has a two-level cache, where the first level is 2 MB and has a local hit rate of 91%, while the second level is 8 MB, requires 3 extra cycles to access, and has a local hit rate of 88% for accesses that miss at level 1. In either design, an access that goes to main memory incurs an additional latency of 70 clock cycles. Assume the cache hardware costs $10^−5 (0.001¢) per bit, while the rest of the processor costs $150. Your goal is to select the cache design that leads to the best overall cost-performance for the processor. For simplicity in this problem, ignore the cost of the bookkeeping bits (tags, valid bits, etc.).

(a) Identify the engineering problem. What characteristics of each cache design do you need to calculate? Describe them. What figure of merit (or demerit) do you want to maximize (or minimize)?

(b) Formulate the engineering problem. Compose algebraic expressions for the important design characteristics that you identified in part (a), for the cases of both 1-level and 2-level caches. You may use the following symbols:
CPI_base – Base CPI of the CPU.
a_inst – Number of memory accesses per instruction.
S1, S2 – Sizes of level 1, 2 caches.
h1, h2 – Hit rates of level 1, 2 caches.
L2, LM – Latency in cycles to go to Level 2 cache and main memory, resp.
c_bit – Cost per bit of cache technology.
c_CPU – Cost of the rest of the CPU aside from the memory hierarchy.

(c) Solve the engineering problem. Evaluate your formulas for the particular cache designs described in the problem description by plugging in the numbers given. Compare the two designs. Which design should you select, and why?

(OPTION 1) Question #3 – Perf.&Cost Metrics (CIO 1 aeo)

Suppose you are in charge of setting up a corporate data center, and you have a total budget of $100,000 to spend on a new cluster of computers. The users of your data center need to constantly and repeatedly run a given application program “P” on the machines in this cluster.
You are trying to decide what type of computers to buy for the cluster. The company’s goal is to enable the pool of users to run the program P as frequently as possible: the more often, the better. You can buy as many computers for your cluster as you can afford while staying within budget.

1a. Identify the true nature of the problem to be solved, as an engineering problem. What quantity or quantities should you really be trying to optimize, and for each one, should you be trying to maximize or minimize that particular quantity? Circle all that apply.
i. Number of instructions-per-second executed per machine. Max / min?
ii. Total throughput of your data center, within budget. Max / min?
iii. Performance of each individual machine on program P. Max / min?
iv. Cost-performance (performance per unit cost) on P of the type of machine that is purchased. Max / min?
v. Execution time of each machine when running program P. Max / min?
vi. The CPI of the type of machine that is purchased. Max / min?

1b. Now suppose that for each type of machine M, you know all of the following quantities:
The dynamic instruction count IC of machine M when running program P.
The average cycles-per-instruction CPI of the machine when running P.
The clock frequency f of the machine.
The cost C of the machine, in dollars.
Now, formulate an expression for the key figure of merit that you should be trying to maximize or minimize, in terms of the above variables. Write the expression below.

1c. Given the data below for three machines A, B, C (with IC and CPI as measured for program P), use your formula from part (1b) to decide which of these three types of machines you ought to buy. Show your work below the table. How many times better (according to the correct figure of merit) is the best machine, compared to the second-best alternative?

                    Type A computers   Type B computers   Type C computers
Instruction count   12×10^9            3×10^9             4×10^9
Cycles per instr.   1                  1.5                2
Clock frequency     4 GHz              3 GHz              2.8 GHz
Cost                $1,000             $2,000             $200

(OPTION 2) Question #4 – FP Reps. (CIO 1 aeo)

Suppose you have been asked to design and implement a microprocessor-based embedded system for analysis of sensor data. The application involves performing floating-point arithmetic on input data values that could be as small as 10^−30 in magnitude. In one part of the algorithm (section A), results are obtained by adding various data values together. In another part of the algorithm (section B), results are obtained by multiplying together pairs of data values. You are trying to decide which IEEE standard floating-point data type (single or double precision) to use in each part of the algorithm. You want the application to be as energy-efficient as possible, and you know that your microprocessor has separate single-precision and double-precision floating-point units that are each optimized to achieve the best possible energy efficiency.

1a) Identify the engineering problem to be solved. What must you calculate in order to determine which floating-point data type can be used in a given case?

1b) Formulate the engineering problem. For each of section A and section B, write an inequality that indicates whether single-precision can be used for that section. Use the variable M to stand for the minimum value (in this case, 10^−30) of an input datum.

1c) Solve the problem. Which data type should be used for section A? Which data type should be used for section B? Justify your answers.

1d) Hand-convert the value 10^−30 to IEEE-standard single-precision floating-point. Show your calculations and clearly delineate all fields in the binary result.
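
A hand conversion like the one in part 1d) can be cross-checked by machine. The following is a minimal Python sketch, not part of the original problem set: the names float32_fields, format_fields, is_normal_in_single, and FLT_MIN_NORMAL are hypothetical helpers introduced here for illustration. It rounds a value to IEEE single precision with the standard struct module and splits the 32-bit pattern into its sign, biased exponent, and fraction fields. It also tests a magnitude against 2^−126, the smallest normalized single-precision value, which is one possible reading of the range check that part 1b) asks you to formulate.

```python
import struct

# Smallest positive normalized IEEE single-precision value, 2^-126.
# (Name chosen for this sketch; it is not defined by the problem.)
FLT_MIN_NORMAL = 2.0 ** -126


def float32_fields(x):
    """Round x to single precision; return (sign, biased_exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reinterpret as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, bias 127
    fraction = bits & 0x7FFFFF       # 23 bits, hidden leading 1 not stored
    return sign, exponent, fraction


def format_fields(x):
    """Return the three fields as binary strings separated by spaces."""
    sign, exponent, fraction = float32_fields(x)
    return f"{sign:01b} {exponent:08b} {fraction:023b}"


def is_normal_in_single(x):
    """True if |x| is at least the smallest normalized single-precision value."""
    return abs(x) >= FLT_MIN_NORMAL


if __name__ == "__main__":
    # A small value with a well-known encoding: 0.15625 = 1.01b x 2^-3.
    print(format_fields(0.15625))
```

Calling format_fields on the value from part 1d) prints a bit pattern that can be compared field by field against a hand-derived result, and is_normal_in_single can be applied to the smallest magnitude each section of the algorithm might produce.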