Amdahl’s Law - Computer Architecture and Engineering - Solved Exams, Exams of Computer Architecture and Organization

Kannur University Computer Architecture and Organization

Main points of this past exam are: Amdahl’s Law, Principal Components, Integer Arithmetic, Instruction Class, Floating Point, Branches, MIPS Rating, Optimized Version, Unoptimized Program, Maximum Speedup

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shashikanth_0p3 🇮🇳


University of California, Berkeley

College of Engineering

Computer Science Division  EECS

Fall 1999 John Kubiatowicz

Midterm I

SOLUTIONS

October 6, 1999

CS152 Computer Architecture and Engineering

Your Name:

SID Number:

Discussion Section:

Problem   Possible   Score
1         20
2         15
3         35
4         30
Total


Problem 1: Performance

Problem 1a: Name the three principal components of runtime that we discussed in class. How do they combine to yield runtime?

Three components: Instruction Count, CPI, and Clock Period (or Rate)

$$\text{Runtime} = \text{InstCount} \times \text{CPI} \times \text{ClockPeriod} = \frac{\text{InstCount} \times \text{CPI}}{\text{ClockRate}}$$

Problem 1b: What is Amdahl’s law for speedup? State as a formula which includes a factor for clock rate.

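$$\text{Speedup} = \frac{\text{Time}_{old}}{\text{Time}_{new}} = \frac{1}{(1 - f) + \frac{f}{n}} \times \frac{\text{Freq}_{new}}{\text{Freq}_{old}}$$

Where f = fraction of cycles sped up by the optimization, and n is the speedup of that fraction.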
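As a quick numerical check, the formula Speedup = 1/((1 - f) + f/n) × (Freq_new/Freq_old) can be sketched in Python. The function name and the default frequency arguments are illustrative choices, not part of the exam.

```python
def amdahl_speedup(f, n, freq_old=1.0, freq_new=1.0):
    """Overall speedup when a fraction f of the cycles is sped up by a
    factor n and the clock changes from freq_old to freq_new."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if n <= 0 or freq_old <= 0 or freq_new <= 0:
        raise ValueError("n and both frequencies must be positive")
    cycle_ratio = 1.0 / ((1.0 - f) + f / n)  # classic Amdahl term
    clock_ratio = freq_new / freq_old        # extra factor for clock rate
    return cycle_ratio * clock_ratio


if __name__ == "__main__":
    # Half of the cycles run twice as fast, same clock.
    print(amdahl_speedup(0.5, 2.0))
    # Same optimization, with the clock doubled from 300 MHz to 600 MHz.
    print(amdahl_speedup(0.5, 2.0, 300.0, 600.0))
```

With f = 0 the cycle term is 1, so any remaining speedup comes only from the clock-rate factor.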

Let us suppose that you have been running an important program on your company’s 300 MHz Acme II processor. By running a detailed simulator, you were able to collect the following instruction mix and breakdown of costs for each instruction type:

Instruction Class                Frequency (%)   Cycles
Integer arithmetic and logical   40              1
Load                             20              1
Store                            10              2
Branches                         20              3
Floating Point                   10              5

Problem 1c: What is the CPI and MIPS rating of the Acme II for this program?

CPI = 0.4(1) + 0.2(1) + 0.1(2) + 0.2(3) + 0.1(5) = 1.9
MIPS = 300/1.9 ≈ 157.9
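The weighted-average CPI and the MIPS rating can be reproduced with a few lines of Python. This is only a sketch: the dictionary keys and function names are made up for illustration, and the numbers come from the instruction mix in the problem statement.

```python
# Instruction mix of the Acme II program: class -> (fraction of instructions, cycles)
ACME_II_MIX = {
    "int_alu": (0.40, 1),
    "load": (0.20, 1),
    "store": (0.10, 2),
    "branch": (0.20, 3),
    "fp": (0.10, 5),
}


def average_cpi(mix):
    """Weighted average of cycles per instruction over the mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def mips_rating(clock_mhz, cpi):
    """Millions of instructions per second = clock rate in MHz / CPI."""
    return clock_mhz / cpi


if __name__ == "__main__":
    cpi = average_cpi(ACME_II_MIX)
    print(f"CPI  = {cpi:.2f}")
    print(f"MIPS = {mips_rating(300.0, cpi):.1f}")
```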

Problem 1d: Suppose that you turn on the optimizer and it eliminates 30% of the arithmetic/logic instructions (i.e. 12% of the total instructions), 30% of load instructions, and 20% of the floating-point instructions. None of the other instructions are affected. What is the speedup of the optimized program? (Be sure to state the formula that you are using for speedup and show your work)

The easiest way to compute this is to imagine that the original program had 100 instructions in it. Then, we can compute the number of cycles for the optimized version of this program vs the original:

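The original 100 instructions take 100 × 1.9 = 190 cycles. The optimizer removes 12 arithmetic/logic instructions (1 cycle each), 6 loads (1 cycle each), and 2 floating-point instructions (5 cycles each):

$$\text{Speedup} = \frac{\text{Time}_{old}}{\text{Time}_{new}} = \frac{190 \times \text{ClockPeriod}}{[190 - (12 \times 1 + 6 \times 1 + 2 \times 5)] \times \text{ClockPeriod}} = \frac{190}{162} \approx 1.17$$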
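The "imagine 100 instructions" argument can be checked by counting cycles before and after optimization. The sketch below assumes the instruction mix and removal percentages from the problem statement; the names are illustrative, and the clock period cancels out of the ratio.

```python
# Instruction counts per 100 original instructions, and cycles per class.
COUNTS = {"int_alu": 40, "load": 20, "store": 10, "branch": 20, "fp": 10}
CYCLES = {"int_alu": 1, "load": 1, "store": 2, "branch": 3, "fp": 5}
# Percentage of each class eliminated by the optimizer (others untouched).
REMOVED_PCT = {"int_alu": 30, "load": 30, "fp": 20}


def total_cycles(counts):
    """Total cycles for a given per-class instruction count."""
    return sum(counts[name] * CYCLES[name] for name in counts)


def optimized_counts(counts, removed_pct):
    """Per-class counts left after the optimizer removes its share."""
    return {
        name: count * (100 - removed_pct.get(name, 0)) / 100
        for name, count in counts.items()
    }


if __name__ == "__main__":
    old = total_cycles(COUNTS)
    new = total_cycles(optimized_counts(COUNTS, REMOVED_PCT))
    print(old, new, old / new)
```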

Problem 2: Propagation Delay

Problem 2a: Assume the following characteristics for NAND gates: Input load: 150 fF; Internal delay: TPlh = 0.2 ns, TPhl = 0.5 ns; Load-dependent delay: TPlhf = 0.0020 ns/fF, TPhlf = 0.0021 ns/fF

For the circuit below, assume that inputs X0 – X5 are all set to 1. What are the propagation delays from A to Y (for rising and falling edges of Y)?

Because there are an even number of gates between A and Y, and because they are equally loaded, both transitions have equal propagation delay:

$$T_{AY\uparrow} = T_{AY\downarrow} = 2 \times [(0.002 \times 150 + 0.2) + (0.0021 \times 150 + 0.5)] = 2.63\ \text{ns}$$

Or, including a wire-delay estimate:

$$T_{AY\uparrow} = T_{AY\downarrow} = 2 \times [(0.002 \times 300 + 0.2) + (0.0021 \times 300 + 0.5)] = 3.86\ \text{ns}$$
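The linear delay model used here (delay = internal delay + load-dependent slope × load capacitance) is easy to evaluate in code. The following Python sketch reproduces both results; the function names are illustrative, and the wire-delay estimate is modeled as a doubled load (300 fF instead of 150 fF), as in the second expression.

```python
# NAND parameters from the problem: internal delays in ns, slopes in ns/fF.
NAND = {"tplh": 0.2, "tphl": 0.5, "tplhf": 0.0020, "tphlf": 0.0021}


def stage_delay(internal_ns, slope_ns_per_ff, load_ff):
    """Linear delay model for one gate driving load_ff femtofarads."""
    return internal_ns + slope_ns_per_ff * load_ff


def delay_a_to_y(load_ff, gate=NAND):
    """Path delay when the path sees two low-to-high and two high-to-low
    gate transitions, each gate driving the same load."""
    rising = stage_delay(gate["tplh"], gate["tplhf"], load_ff)
    falling = stage_delay(gate["tphl"], gate["tphlf"], load_ff)
    return 2 * (rising + falling)


if __name__ == "__main__":
    print(f"{delay_a_to_y(150):.2f} ns without wire delay")
    print(f"{delay_a_to_y(300):.2f} ns with the wire-delay estimate")
```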

[Figure: NAND-gate circuit from input A to output Y, with side inputs X0–X5 and intermediate node Z]

Problem 2b : Suppose that we construct a new gate, XOR, as follows:

Compute the standard parameters for the linear delay models for this complex gate, assuming the parameters given above for the NAND gate:

A Input Capacitance: 150 + 150 = 300 fF
B Input Capacitance: 300 fF

Load-dependent delays:
TPAYlhf: 0.0020 ns/fF
TPAYhlf: 0.0021 ns/fF
TPBYlhf: 0.0020 ns/fF
TPBYhlf: 0.0021 ns/fF

Internal delays for A ⇒ Y, assuming that B is set to 1 (worst-case delays):
TPAYlh: 0.2 + 0.002×300 + 0.5 + 0.0021×150 + 0.2 = 1.815 ns
TPAYhl: 0.5 + 0.0021×300 + 0.2 + 0.002×150 + 0.5 = 2.13 ns

With estimated wire delay, these numbers would be 2.73 ns and 3.06 ns respectively.
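The two internal-delay sums can be parameterized by the loads on the internal nodes, which also reproduces the wire-delay figures. This Python sketch simply mirrors the arithmetic of the two sums; it assumes (as in part 2a) that the wire-delay estimate doubles each capacitive load, and the function name is illustrative.

```python
def xor_internal_delays(first_load_ff, second_load_ff):
    """Worst-case internal delays of the XOR (A to Y, with B = 1), as a sum
    of NAND internal delays (ns) and load-dependent terms (ns/fF x fF)."""
    tpay_lh = 0.2 + 0.0020 * first_load_ff + 0.5 + 0.0021 * second_load_ff + 0.2
    tpay_hl = 0.5 + 0.0021 * first_load_ff + 0.2 + 0.0020 * second_load_ff + 0.5
    return tpay_lh, tpay_hl


if __name__ == "__main__":
    # Internal loads of 300 fF and 150 fF, as in the solution.
    print(xor_internal_delays(300, 150))
    # Assumption: the wire-delay estimate doubles both loads.
    print(xor_internal_delays(600, 300))
```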

Problem 2c: Now, suppose we use our new XOR gate in the circuit below. Let X0 – X5 be set to 1. Compute the propagation delays from A ⇒ Y (both rising and falling edges):

This has the same symmetry as in part (a). So:

$$T_{AY\uparrow} = T_{AY\downarrow} = 2 \times [1.815 + (0.002 \times 300) + 2.13 + (0.0021 \times 300)] = 10.35\ \text{ns}$$

or, with wire delay:

$$T_{AY\uparrow} = T_{AY\downarrow} = 2 \times [2.73 + (0.002 \times 600) + 3.06 + (0.0021 \times 600)] = 16.5\ \text{ns}$$
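The same linear model applies to the composite XOR gate: each gate contributes its internal delay plus its slope times the load it drives. A minimal Python sketch of the two expressions, with an illustrative function name and the slopes taken from the XOR parameters computed in part 2b:

```python
def xor_chain_delay(tpay_lh, tpay_hl, load_ff, slope_lh=0.0020, slope_hl=0.0021):
    """Delay through a path with two low-to-high and two high-to-low XOR
    transitions, each XOR driving load_ff femtofarads."""
    rising = tpay_lh + slope_lh * load_ff
    falling = tpay_hl + slope_hl * load_ff
    return 2 * (rising + falling)


if __name__ == "__main__":
    # XOR input capacitance is 300 fF, so each gate drives 300 fF.
    print(f"{xor_chain_delay(1.815, 2.13, 300):.2f} ns")
    # With the wire-delay estimate: larger internal delays and a 600 fF load.
    print(f"{xor_chain_delay(2.73, 3.06, 600):.2f} ns")
```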

[Figure: circuit of XOR gates with labeled signals A, B, Y and side inputs X0–X5]

Problem 3a: The above example showed unsigned M. Is it easy to extend the algorithm for a signed M?

Yes: Since it doesn’t make sense to take the square root of a negative number, we simply need to cause a “bad operand” fault if the sign bit of M is set.

Problem 3b: From this point on, assume M is unsigned. For a 64-bit, unsigned-value M, what is the largest possible integer square-root, S_max? How many bits would it take to represent? Explain without using a calculator. (hint: Start by finding the smallest integer that is bigger than S_max).

First, note that 2^32 × 2^32 = 2^64 = M_max + 1. So, S_max < 2^32. Further, we know that:

(2^32 – 1)^2 < (2^32)^2 = M_max + 1 ⇒ (2^32 – 1)^2 ≤ M_max

Thus, we can conclude: S_max = (2^32 – 1)

This takes 32 bits to represent.

Problem 3c: Also for a 64-bit unsigned-value M, what is the largest possible remainder, R_max? How many bits would it take to represent? Explain without using a calculator. (Use the same hint as above).

First, note that the spacing between successive squares keeps increasing:

1, 4, 9, 16, 25, 36, 49, ...

This means that the maximum remainder occurs between S_max^2 and (S_max + 1)^2 = M_max + 1:

R_max = (M_max – S_max^2) = (S_max + 1)^2 – 1 – S_max^2 = S_max^2 + 2S_max + 1 – 1 – S_max^2 = 2S_max

Thus, R_max = 2(2^32 – 1)

This would take 33 bits to represent.
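The exam asks for a no-calculator argument, but the bounds on S_max and R_max are easy to confirm afterwards with Python's arbitrary-precision integers. The sketch below uses the standard `math.isqrt`; the brute-force helper `max_remainder` is an illustrative addition that checks the "largest gap is at the top" claim on a small word size.

```python
import math

M_MAX = 2**64 - 1             # largest 64-bit unsigned value
S_MAX = math.isqrt(M_MAX)     # largest integer square root
R_MAX = M_MAX - S_MAX**2      # remainder at M_MAX


def max_remainder(bits):
    """Brute-force largest remainder M - isqrt(M)^2 over all bits-wide M."""
    return max(m - math.isqrt(m) ** 2 for m in range(2**bits))


if __name__ == "__main__":
    print(S_MAX == 2**32 - 1, S_MAX.bit_length())
    print(R_MAX == 2 * S_MAX, R_MAX.bit_length())
    # Small-scale check: for 8-bit M the maximum is also 2 * isqrt(M_max).
    print(max_remainder(8), 2 * math.isqrt(2**8 - 1))
```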

Here is pseudo-code for a square root algorithm. Assume that the input value of M has been restricted so that S_max is no more than 31 bits in size and R_max is no more than 32 bits in size. Let Result and Remainder be 32-bit global values which will store the square root and remainder respectively. Inputs Mhi and Mlow are 32-bit arguments that give the upper and lower 32 bits of the input. This code is modeled after version 3 of the divider from class:

    isqrt(Mlow, Mhi) ⇒ (Result, Remainder)
    {
        /* All temporaries are 32-bit values */
        int nextbit, temp, topbits, lowerbits;

        /* missing initialization instructions */

        while (nextbit > 0) {
            ROL96(topbits, Remainder, lowerbits);

            /* Above restrictions on M ensure temp only 32 bits. */
            temp = (2 * Result) | nextbit;
            if (topbits > 0 || Remainder ≥ temp) {
                Result = Result | nextbit;
                SUBcarry(topbits, Remainder, temp);
            }
            nextbit = nextbit >> 1;
        }
    }

The ROL96(hi,low,extra) pseudo-instruction takes three 32-bit registers and treats them as a combined 96-bit register. It shifts the combined value left by one position, inserting a zero at the far right (of the extra register).

The SUBcarry(hi,low,subvalue) pseudo-instruction takes three 32-bit registers. It treats the first two as a combined 64-bit register. It subtracts the 32-bit subvalue from this 64-bit register.

Problem 3d: The pseudo-code is missing some initialization instructions. What should be there? ( hint: look at the example square root again and try to figure out what the various arguments to ROL96 must be. Also, make sure that every variable has an initial value!):

nextbit = 2^31

Initialization for the 96-bit shift register:
topbits = 0
Remainder = Mhi
lowerbits = Mlow

Result = 0
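To see that this initialization makes the loop work, here is a hedged Python model of the pseudo-code. The helper names (`rol96`, `subcarry`, `isqrt64`) are illustrative, 32-bit registers are modeled with masks, and the input is assumed to satisfy the restriction stated earlier (M < 2^62, so the root fits in 31 bits and the remainder in 32 bits).

```python
import math

MASK32 = 0xFFFFFFFF


def rol96(hi, low, extra):
    """Shift the 96-bit value hi:low:extra left by one, inserting a zero."""
    combined = (((hi << 64) | (low << 32) | extra) << 1) & ((1 << 96) - 1)
    return (combined >> 64) & MASK32, (combined >> 32) & MASK32, combined & MASK32


def subcarry(hi, low, subvalue):
    """Subtract a 32-bit subvalue from the 64-bit value hi:low."""
    combined = (((hi << 32) | low) - subvalue) & ((1 << 64) - 1)
    return combined >> 32, combined & MASK32


def isqrt64(m_low, m_hi):
    """Model of the exam's isqrt loop; returns (Result, Remainder)."""
    nextbit = 1 << 31
    topbits, remainder, lowerbits = 0, m_hi, m_low  # 96-bit shift register
    result = 0
    while nextbit > 0:
        topbits, remainder, lowerbits = rol96(topbits, remainder, lowerbits)
        temp = (2 * result) | nextbit
        if topbits > 0 or remainder >= temp:
            result |= nextbit
            topbits, remainder = subcarry(topbits, remainder, temp)
        nextbit >>= 1
    return result, remainder


if __name__ == "__main__":
    m = 10**18  # assumption: any M below 2**62
    s, r = isqrt64(m & MASK32, m >> 32)
    print(s == math.isqrt(m), r == m - s * s)
```

The model compares against `math.isqrt` only as a reference; it does not claim to match the cycle-level behavior of the MIPS version.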

Problem 3f: Implement the ROL96($t0,$t1,$t2) pseudo-instruction in 7 MIPS instructions. Assume that $t0, $t1, and $t2 are the three input registers (with $t0 the most significant). (hint: what happens if you use signed slt on unsigned numbers?)

Soln: We are going to use the assembler register $at as a temp to hold the MSBs

    slt $at, $t1, $r0    ; Get top bit of $t1
    sll $t0, $t0, 1
    or  $t0, $at, $t0
    slt $at, $t2, $r0    ; Get top bit of $t2
    sll $t1, $t1, 1
    or  $t1, $at, $t1
    sll $t2, $t2, 1
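The hint works because a signed `slt` against zero returns 1 exactly when the top bit of the register is set. A small Python model of the seven-instruction sequence makes this concrete; it assumes the second `or` merges the carried bit into `$t1`, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def slt(a, b):
    """Signed 32-bit 'set on less than': interpret both patterns as signed."""
    def to_signed(x):
        return x - (1 << 32) if x & 0x80000000 else x
    return 1 if to_signed(a) < to_signed(b) else 0


def rol96_mips(t0, t1, t2):
    """Instruction-by-instruction model of the ROL96 sequence."""
    at = slt(t1, 0)            # slt $at, $t1, $r0  (top bit of $t1)
    t0 = (t0 << 1) & MASK32    # sll $t0, $t0, 1
    t0 = at | t0               # or  $t0, $at, $t0
    at = slt(t2, 0)            # slt $at, $t2, $r0  (top bit of $t2)
    t1 = (t1 << 1) & MASK32    # sll $t1, $t1, 1
    t1 = at | t1               # or  $t1, $at, $t1
    t2 = (t2 << 1) & MASK32    # sll $t2, $t2, 1
    return t0, t1, t2


def rol96_reference(t0, t1, t2):
    """Reference: one 96-bit left shift done with big integers."""
    combined = (((t0 << 64) | (t1 << 32) | t2) << 1) & ((1 << 96) - 1)
    return combined >> 64, (combined >> 32) & MASK32, combined & MASK32


if __name__ == "__main__":
    args = (0, 0x80000000, 0x80000001)
    print(rol96_mips(*args) == rol96_reference(*args))
```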

Problem 3g: Implement the SUBcarry($t0,$t1,$t2) pseudo-instruction in 3 MIPS instructions.

Soln: Compute the carry out and put it in $at. Be careful to use sltu!

    sltu $at, $t1, $t2
    sub  $t1, $t1, $t2
    sub  $t0, $t0, $at
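The unsigned comparison computes the borrow out of the low word: the low word underflows exactly when it is smaller than the value being subtracted. The Python sketch below models the three instructions and compares them with a plain 64-bit subtraction. It assumes wrap-around (modulo 2^32) arithmetic and ignores any overflow trap; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def subcarry_mips(t0, t1, t2):
    """Model of the 3-instruction SUBcarry: ($t0:$t1) -= $t2."""
    at = 1 if t1 < t2 else 0   # sltu $at, $t1, $t2  (borrow from low word)
    t1 = (t1 - t2) & MASK32    # sub  $t1, $t1, $t2
    t0 = (t0 - at) & MASK32    # sub  $t0, $t0, $at
    return t0, t1


def subcarry_reference(t0, t1, t2):
    """Reference: one 64-bit subtraction done with big integers."""
    combined = (((t0 << 32) | t1) - t2) & ((1 << 64) - 1)
    return combined >> 32, combined & MASK32


if __name__ == "__main__":
    # Low word is smaller than the subtrahend, so a borrow propagates.
    print(subcarry_mips(1, 0, 1) == subcarry_reference(1, 0, 1))
```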

Problem 3h: What is the maximum “CPI” of your isqrt() procedure? (i.e. what is the total number of cycles to perform an isqrt)? Assume that each real MIPS instruction takes 1 cycle, and pseudo-instructions ROL96 and SUBcarry take 7 and 3 cycles respectively:

Number of cycles in inner loop = 18
Start up = 5 cycles
Ending = 1 cycle

CPI for this “instruction” = 5 + 18×32 + 1 = 582

EXTRA CREDIT [5pts => Save until last!]: Draw the data path for a hardware square-root engine that does 64-bit square-roots. Explain what you are doing and how this will be controlled.

The following data path will do the trick. Notice that we have essentially duplicated the algorithm in hardware. A little thinking will verify that you never need more than 34 bits in the remainder register to handle the maximum temporary results.

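[Figure: square-root data path. Labeled elements: a 34-bit register (Remainder + topbits) and a 32-bit register (lowerbits), loaded with 00||Mhi and Mlo; a 34-bit subtractor with a Negative? output; a 32-bit Quotient register; a decoder; a 33-bit OR and a 32-bit OR; a 5-bit counter (Set 31 / Decrement, zero? output); and a controller driving the Load1/Load2/Shift, Load/Shift, and Load/Clear controls.]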

In class, we made our multicycle machine support the following six MIPS instructions:

op | rs | rt | rd | shamt | funct = MEM[PC]
op | rs | rt | Imm16 = MEM[PC]

INST   Register Transfers
ADDU   R[rd] ← R[rs] + R[rt]; PC ← PC + 4
SUBU   R[rd] ← R[rs] - R[rt]; PC ← PC + 4
ORI    R[rt] ← R[rs] | zero_ext(Imm16); PC ← PC + 4
LW     R[rt] ← MEM[R[rs] + sign_ext(Imm16)]; PC ← PC + 4
SW     MEM[R[rs] + sign_ext(Imm16)] ← R[rt]; PC ← PC + 4
BEQ    if (R[rs] == R[rt]) then PC ← PC + 4 + sign_ext(Imm16) || 00
       else PC ← PC + 4

For your reference, here is the microcode for two of the 6 MIPS instructions:

(- marks a blank field)

Label     ALU   SRC1  SRC2     ALUDest  Memory  MemReg  PCWrite     Sequence
Fetch     Add   PC    4        -        ReadPC  IR      ALU         Seq
Dispatch  Add   PC    ExtShft  -        -       -       -           Dispatch
RType     Func  rs    rt       -        -       -       -           Seq
          -     -     -        rd-ALU   -       -       -           Fetch
BEQ       Sub   rs    rt       -        -       -       ALUoutCond  Fetch

In this problem, we are going to add three new instructions to this data path:

lui $rd, ⇒ R[rd] ← Imm16 || 0000000000000000
multacc $rd, $rs, $rt ⇒ R[rd] ← (R[rs] × R[rt]) + R[rd]
bltual $rs, $rt ⇒ if (R[rs] < R[rt]) then
                      PC ← PC + 4 + sign_ext(Imm16) || 00
                      R[31] ← PC + 4
                  else
                      PC ← PC + 4

1. The lui instruction is familiar to you from the normal MIPS instruction set. It places the 16 bit immediate field into the upper 16 bits of R[rd], filling the lower 16 bits of R[rd] with zeros. Important note: the encoding for the lui instruction has a zero in the rs field.

2. The multacc instruction (multiply-accumulate) uses register R[rd] as both a source and a destination register. It multiplies the values R[rs] and R[rt], adds the result to register R[rd], then places the result back into register R[rd]. Assume that this instruction does not overflow.

3. The bltual instruction (branch on less than unsigned and link) checks to see if R[rs] is less than R[rt]. If it is, it will save the PC in $ra (like jal), then branch to the offset.
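The register transfers for the three new instructions can be written out as a small behavioral model. This Python sketch is only an illustration of the semantics listed above: the register file is a plain list of 32 unsigned values, the function names mirror the mnemonics, and `bltual` returns the next PC rather than updating a global one.

```python
MASK32 = 0xFFFFFFFF


def sign_ext16(imm16):
    """Sign-extend a 16-bit immediate to a Python integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def lui(regs, rd, imm16):
    """R[rd] <- Imm16 || 16 zero bits."""
    regs[rd] = (imm16 & 0xFFFF) << 16


def multacc(regs, rd, rs, rt):
    """R[rd] <- (R[rs] * R[rt]) + R[rd], truncated to 32 bits."""
    regs[rd] = (regs[rs] * regs[rt] + regs[rd]) & MASK32


def bltual(regs, pc, rs, rt, imm16):
    """Unsigned compare; on taken branch, link PC + 4 into R[31].
    Returns the next PC."""
    if regs[rs] < regs[rt]:  # register values are stored unsigned
        regs[31] = (pc + 4) & MASK32
        return (pc + 4 + (sign_ext16(imm16) << 2)) & MASK32
    return (pc + 4) & MASK32


if __name__ == "__main__":
    regs = [0] * 32
    lui(regs, 8, 0x1234)
    print(hex(regs[8]))
```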

Problem 4a: How wide are microinstructions in the original datapath (answer in bits and show some work!)?

15 = 2+1+3+2+2+1+2+2 (ALU, SRC1, SRC2, ALUDest, Memory, MemReg, PCWrite, Sequence)

The trickiest part of this computation is the PC Write field. We have to remember to represent the “do nothing” option, which means that there are actually three different values for the PC Write field.

Problem 4b: Draw a block diagram of a microcontroller for the unmodified datapath. Include sequencing hardware, the dispatch ROM, the microcode ROM, and decode blocks to turn the fields of the microcode into control signals. Make sure to show all of the control signals coming from somewhere. (hint: The PCWr, PCWrCond, and PCSrc signals must come out of a block connected to the PCWrite field of the microinstruction).

Problem 4c: Come up with a binary encoding for the ALUDest field of the microinstruction (rd-ALU, rt-ALU, rt-Mem, or blank). Construct logic which maps this binary field to the appropriate control signals from problem 4b.

Name     Code
rt-ALU   00
rt-Mem   01
rd-ALU   10
blank    11

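[Figure, Problem 4b: microcontroller block diagram with the microcode ROM, the microPC with a +1 incrementer, a MUX, and the Dispatch ROM. Microinstruction fields shown: ALU, SRC1, SRC2, ALUDest, Memory, MemReg, PCWrite, with widths 2, 1, 3, 2, 2, 1, 2 bits. Decoded control signals: ALUop (with ALU Funct), ALUSelA, ExtOp, ALUSelB, RegDest, MemtoReg, RegWr, IorD, MemWr, IrWr, PCWr, PCWrCond, PCSrc.]

[Figure, Problem 4c: logic mapping the ALUDest code bits I1 and I0 to MemToReg, RegDest, and RegWr.]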

Problem 4e: Describe changes to the microinstruction assembly language for these new instructions. How wide are your microinstructions now?

New Field: BSRC. Possible values: GetRT, GetRD, blank. GetRT and blank are equivalent.

Addition to ALU field: Mul (Do a multiplication)
Addition to SRC1 field: ALUOut (Use ALUOut register value)
Addition to SRC2 field: Shift16 (Use value of immediate shifted left by 16)
Addition to ALUDest field: ra-PC-cond (Write PC to $ra if PC condition is true)
Addition to PCWrite: ALUNegBr (Update PC if ALU NEG output is true)

Need 1 new bit for BSRC, 1 additional bit for ALU field, 1 additional bit for SRC1, 0 additional bits for SRC2 field, 1 additional bit for ALUDest, 0 additional bits for PCWrite. So, total = 15 + 4 = 19 bits.
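The width bookkeeping can be double-checked with a short script. In this sketch the per-field widths of the original microinstruction (ALU 2, SRC1 1, SRC2 3, ALUDest 2, Memory 2, MemReg 1, PCWrite 2, Sequence 2) are an assumed assignment that is consistent with the 15-bit total from Problem 4a; the helper names are illustrative.

```python
def bits_needed(num_values):
    """Bits required to encode num_values distinct field values."""
    return max(1, (num_values - 1).bit_length())


# Assumed per-field widths of the original microinstruction (sum = 15).
ORIGINAL_WIDTHS = {
    "ALU": 2, "SRC1": 1, "SRC2": 3, "ALUDest": 2,
    "Memory": 2, "MemReg": 1, "PCWrite": 2, "Sequence": 2,
}

# Extra bits required by the three new instructions.
EXTRA_BITS = {"BSRC": 1, "ALU": 1, "SRC1": 1, "SRC2": 0, "ALUDest": 1, "PCWrite": 0}


def microinstruction_width(widths, extra=None):
    """Total width in bits, optionally including extra bits per field."""
    total = sum(widths.values())
    if extra:
        total += sum(extra.values())
    return total


if __name__ == "__main__":
    print(microinstruction_width(ORIGINAL_WIDTHS))
    print(microinstruction_width(ORIGINAL_WIDTHS, EXTRA_BITS))
    # ALUDest grows from 4 values to 5 (adding ra-PC-cond): 2 bits -> 3 bits.
    print(bits_needed(4), bits_needed(5))
```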

Problem 4f: Write complete microcode for the three new instructions. Include the Fetch and Dispatch microinstructions. If any of the microcode for the original instructions must change, explain how ( Hint: since the original instructions did not use R[rd] as a register input, you must make sure that your changes do not mess up the original instructions).

(- marks a blank field)

Label     ALU   BSRC   SRC1    SRC2     ALUDest     Memory  MemReg  PCWrite   Sequence
Fetch     Add   -      PC      4        -           ReadPC  IR      ALU       Seq
Dispatch  Add   GetRT  PC      ExtShft  -           -       -       -         Dispatch
lui       Add   -      rs      Shift16  -           -       -       -         Seq
          -     -      -       -        rd-ALU      -       -       -         Fetch
multacc   Mul   GetRD  rs      rt       -           -       -       -         Seq
          Add   -      ALUOut  rt       -           -       -       -         Seq
          -     -      -       -        rd-ALU      -       -       -         Fetch
bltual    Sub   -      rs      rt       ra-PC-cond  -       -       ALUNegBr  Fetch

Note that we assert GetRT during the dispatch stage, so that the first post-dispatch cycle of every instruction has the value of R[rt] in the “B” register (so as not to mess up our other instructions). Since there are only two options for the BSRC field, you could imagine that “blank” produces this normal behavior. However, we have coded it explicitly in order to make our point.

Problem 4g: What are the CPI values for each of the three new instructions?

lui: CPI = 4 (if you didn’t go through an “Add”, this could be as low as 3)
multacc: CPI = 5
bltual: CPI = 3
