hw1 by ashrafp


CDA5155 Homework 1 (Summer)
Due: May 28th, 2010, 11:59pm

You are not allowed to give or receive help in completing this assignment. Submit a PDF
version of your submission on the e-Learning website before the deadline. Please include the
following sentence in bold at the top of your submission: “I have neither given nor received any
unauthorized aid on this assignment”.

Total Points: 70 pts

1. [10 points] Using the table below, answer the following questions:
    Chip                    Num of Cores    Memory performance    Processor performance

    Athlon 64 X2 4800+      2               3423                  20178
    Pentium EE 840          2               3228                  18893
    Pentium D 820           2               3000                  15220
    Athlon 64 X2 3800+      2               2941                  17129
    Pentium 4               1               2731                   7621
    Athlon 64 3000+         1               2953                   7628
    Pentium 4 570           1               3501                  11210
    Processor X             1               7000                   5000

        a. Create a table similar to the given table, except express the results as
           normalized to the Pentium 4 for both memory performance and processor
           performance.

        b. Calculate the arithmetic mean of the performance of each processor using
           both the original performance and your normalized performance in part a).

        c. Given the answer from part b), are there any conflicting conclusions you can
           draw?

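The normalization and averaging steps in parts a) and b) can be sketched as follows. This is only an illustrative outline, not a model answer: it uses three rows of the table, and the helper names `normalize` and `arithmetic_means` are made up for this example.

```python
# Scores copied from three rows of the table:
# (memory performance, processor performance).
scores = {
    "Pentium 4": (2731, 7621),
    "Pentium D 820": (3000, 15220),
    "Processor X": (7000, 5000),
}


def normalize(table, reference):
    """Divide every score by the matching score of the reference chip."""
    ref_mem, ref_cpu = table[reference]
    return {chip: (mem / ref_mem, cpu / ref_cpu)
            for chip, (mem, cpu) in table.items()}


def arithmetic_means(table):
    """Unweighted mean of the two scores of each chip."""
    return {chip: sum(pair) / len(pair) for chip, pair in table.items()}


raw_means = arithmetic_means(scores)
normalized_means = arithmetic_means(normalize(scores, "Pentium 4"))

for chip in scores:
    print(f"{chip:15s} raw mean = {raw_means[chip]:8.1f}   "
          f"normalized mean = {normalized_means[chip]:.3f}")
```

Comparing how the chips rank under `raw_means` and under `normalized_means` is one way to approach part c).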
2. [15 points] Your company’s internal studies show that a single-core system is
   sufficient for the demand on your processing power. You are exploring, however,
   whether you could save power by using two cores.

       a. Assume that your application is 90% parallelizable. By how much could you
          decrease the frequency and get the same performance?

       b. Assume that the voltage may be decreased linearly with the frequency. Using
          the equation in Section 1.5, how much dynamic power would the dual-core
          system require as compared to the single-core system?

       c. Now assume that the voltage may not decrease below 30% of the original
          voltage. This voltage is referred to as the “voltage floor,” and any voltage
          lower than that will lose the state. Using the equation in Section 1.5, how
          much dynamic power would the dual-core system from part (a) require
          compared to the single-core system when taking into account the voltage floor?
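A minimal sketch of the reasoning behind this problem, under stated assumptions: the speedup follows Amdahl's law, the Section 1.5 equation is taken to be the usual dynamic power relation (power proportional to capacitive load × voltage² × frequency), and each core is assumed to contribute the same capacitive load. The function names are hypothetical.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup when only part of the work uses all cores."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def relative_dynamic_power(cores, freq_scale, voltage_floor=0.0):
    """Dynamic power relative to one core at full voltage and frequency.

    Assumptions (not stated in the handout): power ~ C * V^2 * f, every core
    adds the same capacitive load, and voltage scales linearly with frequency
    until it reaches the voltage floor.
    """
    voltage_scale = max(freq_scale, voltage_floor)
    return cores * voltage_scale ** 2 * freq_scale


speedup = amdahl_speedup(0.90, 2)
freq_scale = 1.0 / speedup  # frequency that gives back single-core performance

print(f"speedup at full frequency: {speedup:.3f}")
print(f"frequency scale for equal performance: {freq_scale:.3f}")
print(f"relative power, no floor: {relative_dynamic_power(2, freq_scale):.3f}")
print(f"relative power, 30% floor: {relative_dynamic_power(2, freq_scale, 0.30):.3f}")
```

The floor only changes the result when the scaled voltage would otherwise fall below it, which is why `max` is used rather than a separate branch.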

3. [10 points] You are designing a 32-bit instruction-set architecture which needs to
   support 100 opcodes, three source operands and two destination operands. All the
   source and destination operands are registers. Moreover, all the operands should be
   able to access all the registers. What is the maximum size of the register file that this
   architecture can use (show your computations)?
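The bit budget for this kind of question can be sketched as below. The sketch assumes a fixed-length encoding with a single opcode field, equally wide register fields, and no other fields in the instruction word; `max_register_count` is a hypothetical helper name.

```python
def max_register_count(instruction_bits, opcode_count, operand_count):
    """Largest register file addressable by every operand field.

    Assumes one opcode field plus `operand_count` equally wide register
    fields, with nothing else stored in the instruction word.
    """
    # Smallest field width that can give every opcode a distinct number.
    opcode_bits = (opcode_count - 1).bit_length()
    # Remaining bits are shared evenly; leftover bits stay unused.
    bits_per_operand = (instruction_bits - opcode_bits) // operand_count
    return 2 ** bits_per_operand


# 32-bit instructions, 100 opcodes, 3 source + 2 destination operands.
print(max_register_count(32, 100, 3 + 2))
```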

4. [15 points] In the load-store architecture of MIPS, operands of arithmetic and logical
   instructions must come from registers. For a typical integer program, the instruction
   distribution and CPI of four groups are given in the following table.

                   Type                 Frequency                  CPI
                    ALU                    50%                      1
                   Load                    25%                      2
                   Store                   15%                      2
                  Branch                   10%                      4

       a. Calculate the average CPI of the integer program.

       b. Now, assume that a set of new memory-register arithmetic and logical
          instructions is added to the ISA. Each memory-register ALU instruction
          combines one Load and one original ALU instruction. It takes 4
          cycles to execute this new type of instruction. Assume 60% of the Load
          instructions can be combined in this way for the program; calculate the new CPI
          of the integer program.

       c. Assume the modification increases the overall cycle time by 5%. Is
          this modification really worthwhile?
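The weighted-CPI bookkeeping can be sketched as follows. This is one possible reading, not the official solution: it assumes every combined instruction replaces exactly one Load and one ALU instruction, and it counts cycles and instructions per original instruction. The names `cycles_and_count` and `MemALU` are hypothetical.

```python
# Instruction mix from the table: type -> (frequency, CPI).
base_mix = {
    "ALU": (0.50, 1),
    "Load": (0.25, 2),
    "Store": (0.15, 2),
    "Branch": (0.10, 4),
}


def cycles_and_count(mix):
    """Total cycles and total instructions, per original instruction."""
    cycles = sum(freq * cpi for freq, cpi in mix.values())
    count = sum(freq for freq, _ in mix.values())
    return cycles, count


# Assumption: 60% of Loads each merge with one ALU instruction (4 cycles each).
combined = 0.60 * base_mix["Load"][0]
new_mix = dict(base_mix)
new_mix["ALU"] = (base_mix["ALU"][0] - combined, 1)
new_mix["Load"] = (base_mix["Load"][0] - combined, 2)
new_mix["MemALU"] = (combined, 4)

base_cycles, base_count = cycles_and_count(base_mix)
new_cycles, new_count = cycles_and_count(new_mix)

base_cpi = base_cycles / base_count
new_cpi = new_cycles / new_count

# Relative execution time = (cycles * cycle time) compared with the baseline.
relative_time = (new_cycles * 1.05) / base_cycles

print(f"base CPI = {base_cpi:.3f}, new CPI = {new_cpi:.3f}")
print(f"execution time relative to baseline = {relative_time:.3f}")
```

Because the instruction count changes along with the CPI, the comparison in part c) is made on total cycles times cycle time rather than on CPI alone.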

5. [20 points] Assume that values A, B, C and D reside in memory. Also assume that
   instruction operation codes are represented in 8 bits, memory addresses are 64 bits
   and register addresses are 8 bits. Assume all data are 32 bits wide, and the instruction
   lengths are given in the table below.

       a. Write the code sequence for D = A + B*(A + C) for the following instruction set
          architectures: 1) Stack; 2) Accumulator; 3) Register (Register-memory); 4)
          Register (Load-Store). (You can refer to the class slides, or Figures B.1-B.2 on page
          B-4 of Appendix B.)

       b. Compute the total number of instructions and the code size for each sequence
          you wrote.

       c. Compute how many bytes are transferred to or from memory in executing
          the code sequences, including fetching instructions, reading data, and
          writing data.

               ISA                        Instruction Length (bits)
               Stack                      8 or 72
               Accumulator                72
               Register-memory            32 or 80 or 88
               Load-Store                 32 or 80
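The counting in parts b) and c) can be sketched for one candidate stack sequence. Everything here is an assumption made for illustration: the mnemonics are hypothetical, Push/Pop are taken to be 72 bits (8-bit opcode plus 64-bit address), Add/Mul are taken to be 8 bits, and each instruction is assumed to be fetched exactly once.

```python
DATA_BYTES = 4  # all data are 32 bits wide

# (mnemonic, instruction length in bits, data words moved to or from memory)
# Candidate stack code for D = A + B*(A + C); the mnemonics are hypothetical.
stack_sequence = [
    ("Push A", 72, 1),
    ("Push B", 72, 1),
    ("Push A", 72, 1),
    ("Push C", 72, 1),
    ("Add", 8, 0),     # A + C
    ("Mul", 8, 0),     # B * (A + C)
    ("Add", 8, 0),     # A + B * (A + C)
    ("Pop D", 72, 1),
]


def tally(sequence):
    """Return (instruction count, code bytes, data bytes, total memory bytes)."""
    count = len(sequence)
    code_bytes = sum(bits for _, bits, _ in sequence) // 8
    data_bytes = sum(words for _, _, words in sequence) * DATA_BYTES
    # Total traffic = instruction fetches plus data reads and writes.
    return count, code_bytes, data_bytes, code_bytes + data_bytes


count, code_bytes, data_bytes, total_bytes = tally(stack_sequence)
print(f"{count} instructions, {code_bytes} code bytes, "
      f"{data_bytes} data bytes, {total_bytes} bytes of memory traffic")
```

The same `tally` helper could be applied to the accumulator, register-memory, and load-store sequences by listing their instructions with the lengths from the table.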
