Shared by: Nuhman Paramban
posted:
10/28/2011
CSC 462/562 Homework #1

Due: Monday, January 22



Graduate students answer all 6 questions. Undergraduates answer any 5 questions (answer all 6 for extra credit).

Word process all answers. Figures may be hand drawn. Show your work for partial credit.



1) A given benchmark consists of 35% loads, 10% stores, 16% branches, 27% integer ALU operations, 8% FP

+/-, 3% FP * and 1% FP /. We want to compare the benchmark as run on two processors, as described

below. Which processor is faster and by how much? IC will be the same for both machines.

                 Processor 1   Processor 2
ALU CPI          4             3
Load/store CPI   5             4
Branch CPI       3             2
FP +, - CPI      8             4
FP * CPI         10            6
FP / CPI         30            15
Clock rate       2.5 GHz       2.0 GHz
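
Since IC is the same on both machines, the comparison reduces to (average CPI) / (clock rate) for each processor. The following is a minimal sketch of that calculation, assuming every table entry other than the clock rate is a per-class CPI; the names `MIX`, `P1`, `P2`, `average_cpi` and `time_per_instruction` are illustrative and not part of the assignment.

```python
# Instruction mix from question 1, as fractions of the instruction count.
# Loads and stores share one CPI in the table, so they are merged here.
MIX = {"load_store": 0.35 + 0.10, "branch": 0.16, "alu": 0.27,
       "fp_add": 0.08, "fp_mul": 0.03, "fp_div": 0.01}

# Per-class CPI and clock rate (Hz), read from the table (assumed to be CPI).
P1 = {"cpi": {"load_store": 5, "branch": 3, "alu": 4,
              "fp_add": 8, "fp_mul": 10, "fp_div": 30},
      "clock_hz": 2.5e9}
P2 = {"cpi": {"load_store": 4, "branch": 2, "alu": 3,
              "fp_add": 4, "fp_mul": 6, "fp_div": 15},
      "clock_hz": 2.0e9}


def average_cpi(mix, cpi):
    """Weighted average CPI: sum over classes of fraction * class CPI."""
    return sum(mix[k] * cpi[k] for k in mix)


def time_per_instruction(mix, proc):
    """Seconds per instruction: average CPI / clock rate (IC cancels out)."""
    return average_cpi(mix, proc["cpi"]) / proc["clock_hz"]


t1 = time_per_instruction(MIX, P1)
t2 = time_per_instruction(MIX, P2)
print(f"P1: CPI {average_cpi(MIX, P1['cpi']):.2f}, {t1 * 1e9:.3f} ns/instr")
print(f"P2: CPI {average_cpi(MIX, P2['cpi']):.2f}, {t2 * 1e9:.3f} ns/instr")
print(f"time ratio P1/P2: {t1 / t2:.3f}")
```

A ratio above 1 means Processor 2 finishes the benchmark sooner; a ratio below 1 favors Processor 1.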



2) Architects have added numerous registers to a processor and are deciding whether the registers should be

used as ordinary registers in order to reduce the number of loads and stores, or used for parameter passing

by making them register windows (see the figure below). In the former case, an optimizing compiler can

successfully remove 40% of the loads and 60% of the stores from a given benchmark. In the latter case,

procedure calls and returns no longer require accessing memory (cache) as the values being passed and

returned will be stored in registers so that the CPI of procedure calls and returns drops from 15 down to 4.

In both cases, the clock rate will be the same, so this value can be factored out of your comparison. Given

the following CPI values and the benchmark’s breakdown of instructions, how should the architects use

these new registers, as normal or as register windows? Demonstrate your answer by determining which

provides the greater speedup on the benchmark and by how much. NOTE: the first approach will require

that you determine a new breakdown of instructions because the IC will change by removing loads and

stores such that all percentages will need to be adjusted (we covered a problem like this in class).

CPI: Load/Store: 4; ALU and Unconditional Branch: 2; Conditional Branch: 3; Procedure Call and Return: 15

Benchmark breakdown: 40% Load, 13% Store, 31% ALU, 8% Conditional Branch, 2% Unconditional Branch, 3% Procedure Call, 3% Return









[Figure: register windows]

Local variables at level j that will be passed as parameters to the next level, j+1, are stored in the “temporary registers”, whereas local variables that are not passed are stored in “local registers”.

No data movement is necessary; instead, the CPU merely shifts its focus in the set of registers in the window by moving to the next group counterclockwise (in the figure) for any function call, and clockwise upon function return.

3) Including a dual processor in a computer gives the computer the potential for a 2 times speedup if the

second processor can be utilized full time. To take advantage of this when running a single program, the

program must be completely parallelized. This is not practical. However, the dual processor can still

provide a decent amount of speedup if a program can be parallelized enough. Provide a table that shows

how much speedup the computer will gain in using the dual processor if a program can be parallelized by

each of the following: 25%, 50%, 75%, 90%, 95% and 99%.
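
The table in question 3 follows from Amdahl's Law: overall speedup = 1 / ((1 - f) + f / s), where f is the parallelizable fraction and s is the speedup of that fraction. A minimal sketch, assuming the parallel portion runs exactly 2 times faster on the dual processor with no overhead; the function name `amdahl_speedup` is illustrative.

```python
def amdahl_speedup(fraction, enhanced_speedup):
    """Overall speedup when `fraction` of the run is sped up by `enhanced_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / enhanced_speedup)


# Fractions from question 3; s = 2 models the second processor.
FRACTIONS = [0.25, 0.50, 0.75, 0.90, 0.95, 0.99]

print("parallel fraction   speedup")
for f in FRACTIONS:
    print(f"{f:>17.0%}   {amdahl_speedup(f, 2):.3f}")
```

Even at 99% parallelization the result stays below 2, because the serial remainder never benefits from the second processor.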



4) Architects are considering implementing one of three enhancements to a processor. The first offers a 3

times speedup in enhanced mode, which is available 25% of the time. The second offers a 20 times

speedup in enhanced mode, which is available 10% of the time. The third offers a 1.5 times speedup in

enhanced mode, which is available 60% of the time. Which enhancement should be selected? Show why

by computing the overall speedup for each enhancement.
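
The three candidates in question 4 can be ranked with the same Amdahl's Law formula, treating "available X% of the time" as the fraction of original execution time that can use enhanced mode. That reading is an assumption, and the names `ENHANCEMENTS` and `overall_speedup` are illustrative.

```python
def overall_speedup(fraction, enhanced_speedup):
    """Amdahl's Law: 1 / ((1 - f) + f / s)."""
    return 1.0 / ((1.0 - fraction) + fraction / enhanced_speedup)


# (fraction of time enhanced mode is usable, speedup in enhanced mode)
ENHANCEMENTS = {
    "first": (0.25, 3.0),
    "second": (0.10, 20.0),
    "third": (0.60, 1.5),
}

results = {name: overall_speedup(f, s) for name, (f, s) in ENHANCEMENTS.items()}
for name, value in results.items():
    print(f"{name:>6}: overall speedup {value:.3f}")

best = max(results, key=results.get)
print("largest overall speedup:", best)
```

The comparison illustrates that a large enhanced-mode speedup matters little when the mode is rarely usable.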



5) In the 1980s and 1990s, architects debated whether the RISC or CISC approach was better. The list below

denotes some of the differences in philosophy between the two forms of architecture. For each of the

following, explain how it would improve CPU time in terms of which of the following in our CPU time

formula would be decreased: IC, CPI, Clock Cycle Time, or some combination. NOTE: some of these

may increase but you do not need to discuss what increases, only what decreases.

a. In RISC, there are a great number of registers available, less so in a CISC machine

b. In CISC, there can be complex addressing modes such as indirect addressing to obtain the datum

pointed to by a pointer

c. In RISC, a pipeline is used to perform each part of the fetch-execute cycle as an independent stage

d. In CISC, variable sized instruction lengths are common so that multiple memory operands can be

accessed at the same time



6) A floating point benchmark has the following breakdown of instructions executed (note: these are the

number of instructions, not percentages):

Loads: 103,198

Stores: 28,998

Branches: 37,643

Integer ALU operations: 75,387

FP +/-: 53,837

FP *: 12,111

FP /: 3,002

FP Comparisons: 6,391

A processor executes floating point operations as sequences of integer operations as follows:

FP +/-: 8 integer operations for 1 +/-

FP *: 32 integer operations for 1 *

FP /: 128 integer operations for 1 /

FP Comparison: 10 integer operations for 1 compare

Assume loads and stores have a CPI of 4 and branches and integer operations have a CPI of 3.

a. If the processor has a clock rate of 2.5 GHz, what is the machine’s MIPS rating? MIPS can be computed as clock rate / (CPI * 10^6), where CPI is the average CPI over the integer operation IC.

b. If we replace the processor with one capable of performing floating point operations in hardware, such that FP +/- and compare are performed in 8 cycles, * in 12 and / in 25, what is the machine’s new MIPS rating? To compute this, you have to recompute the integer IC because all of the FP operations are now being performed in FP hardware and not as sequences of integer operations.

c. How much faster is the machine from part b over the machine from part a?

d. Does your change in MIPS rating from part a to part b roughly agree with your answer in part c? If not, try to explain why not.
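
The bookkeeping for question 6 can be sketched as follows, using the standard definition MIPS = IC / (execution time * 10^6), which is equivalent to clock rate / (average CPI * 10^6). Two assumptions are made here that the assignment does not state: the replacement processor in part b keeps the 2.5 GHz clock, and the FP cycle counts in part b are per-instruction CPI values. All names are illustrative.

```python
CLOCK_HZ = 2.5e9  # part a clock rate; ASSUMED unchanged for part b

# Instruction counts from question 6.
COUNTS = {"load": 103_198, "store": 28_998, "branch": 37_643,
          "int_alu": 75_387, "fp_add": 53_837, "fp_mul": 12_111,
          "fp_div": 3_002, "fp_cmp": 6_391}

INT_OPS_PER_FP = {"fp_add": 8, "fp_mul": 32, "fp_div": 128, "fp_cmp": 10}
FP_HW_CPI = {"fp_add": 8, "fp_mul": 12, "fp_div": 25, "fp_cmp": 8}
MEM_CPI, INT_CPI = 4, 3


def mips(ic, cycles, clock_hz):
    """MIPS = clock rate / (average CPI * 10^6), with CPI = cycles / IC."""
    return clock_hz / ((cycles / ic) * 1e6)


mem = COUNTS["load"] + COUNTS["store"]
int_base = COUNTS["branch"] + COUNTS["int_alu"]

# Part a: each FP operation expands into a sequence of integer operations.
emulated = sum(COUNTS[k] * n for k, n in INT_OPS_PER_FP.items())
ic_a = mem + int_base + emulated
cycles_a = mem * MEM_CPI + (int_base + emulated) * INT_CPI

# Part b: each FP operation is a single instruction run in FP hardware.
ic_b = mem + int_base + sum(COUNTS[k] for k in FP_HW_CPI)
cycles_b = (mem * MEM_CPI + int_base * INT_CPI
            + sum(COUNTS[k] * c for k, c in FP_HW_CPI.items()))

mips_a = mips(ic_a, cycles_a, CLOCK_HZ)
mips_b = mips(ic_b, cycles_b, CLOCK_HZ)
speedup = cycles_a / cycles_b  # same clock, so the time ratio is the cycle ratio

print(f"part a: IC {ic_a}, MIPS {mips_a:.1f}")
print(f"part b: IC {ic_b}, MIPS {mips_b:.1f}")
print(f"part c: speedup of b over a = {speedup:.2f}")
```

Under these assumptions the MIPS rating and the actual speedup move in opposite directions, which is the discrepancy part d asks about: MIPS counts instructions per second without regard to how much work each instruction does.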


