University of California, Berkeley
College of Engineering
Department of Electrical Engineering and Computer Science
Spring 2000 Prof. Bob Brodersen
Midterm 1
March 15, 2000
CS152: Computer Architecture
This midterm consists of four problems, each of which has multiple parts, so budget your time accordingly. The
exam is closed-book, but calculators and one sheet of notes are allowed. Good luck!
Name SOLUTIONS
SID
Discussion
1
2
3
4
Total

Problem 1: Critical Path and Delay (25 points)

Throughout this problem, use the simple linear delay model presented in class. For the circuit below, assume the following delay parameters:
  • NAND: input capacitance 100 fF; tplh = 0.5 ns, tphl = 0.5 ns, tplhf = 0.002 ns/fF, tphlf = 0.002 ns/fF
  • Inverter: input capacitance 50 fF; tplh = 0.2 ns, tphl = 0.2 ns, tplhf = 0.001 ns/fF, tphlf = 0.001 ns/fF
  • Wiring capacitance (equal for all nodes): 5 fF

[Figure: circuit with inputs X, Y, Z and output F]

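a) What is the worst-case delay? Assume there is no delay at the inputs X, Y and Z. The equation for the worst-case delay is as follows:

$$
\begin{aligned}
t_{\text{worst}} ={} & \underbrace{0.2\,\text{ns} + 105\,\text{fF} \times 0.001\,\text{ns/fF}}_{\text{INVERTER}}
 + \underbrace{0.5\,\text{ns} + 205\,\text{fF} \times 0.002\,\text{ns/fF}}_{\text{NAND1}} \\
& + \underbrace{0.5\,\text{ns} + 205\,\text{fF} \times 0.002\,\text{ns/fF}}_{\text{NAND2}}
 + \underbrace{0.5\,\text{ns} + 105\,\text{fF} \times 0.002\,\text{ns/fF}}_{\text{NAND3}} \\
& + \underbrace{0.5\,\text{ns} + 5\,\text{fF} \times 0.002\,\text{ns/fF}}_{\text{NAND4}}
 = 3.345\,\text{ns}
\end{aligned}
$$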

Note: There is no delay at the input nodes, and remember to include fan-out and wiring delay!

b) Now assume that you want to generate a symbol for the circuit in part (a). Determine the following parameters for your symbol: tplh, tphl, and the load-dependent delay (in ns/fF).

[Figure: symbol with inputs X, Y, Z and output F]

First, the propagation delays: tplh = tphl = 3.345 ns. This is the same as the critical-path delay from part (a). For the load-dependent delay, since only a single NAND drives the output, it is the same as that of the NAND itself: 0.002 ns/fF.
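The arithmetic in parts (a) and (b) can be reproduced with a short Python sketch of the linear delay model, t = t_p + k × C_load, where C_load is the sum of the gate input capacitances being driven plus 5 fF of wire. Because tplh = tphl and tplhf = tphlf for both gates, a single t_p and k per gate suffice. The schematic is not reproduced here, so the fan-out of each stage is an assumption inferred from the loads in the equation (105, 205, 205, 105 and 5 fF); `Gate`, `stage_delay` and `critical_path` are hypothetical names.

```python
from dataclasses import dataclass

WIRE_FF = 5.0  # wiring capacitance on every node (fF)


@dataclass(frozen=True)
class Gate:
    name: str
    t_p: float   # intrinsic delay in ns (t_plh = t_phl for both gates here)
    k: float     # load-dependent delay in ns/fF (t_plhf = t_phlf)
    c_in: float  # input capacitance in fF


NAND = Gate("NAND", 0.5, 0.002, 100.0)
INV = Gate("INV", 0.2, 0.001, 50.0)


def stage_delay(driver: Gate, fanout: list) -> float:
    """Linear model: t = t_p + k * (sum of fan-out input caps + wire cap)."""
    c_load = sum(g.c_in for g in fanout) + WIRE_FF
    return driver.t_p + driver.k * c_load


# Assumed critical path, inferred from the loads in the solution's equation.
critical_path = [
    (INV, [NAND]),         # inverter drives one NAND input: 105 fF
    (NAND, [NAND, NAND]),  # NAND1 drives two NAND inputs: 205 fF
    (NAND, [NAND, NAND]),  # NAND2 drives two NAND inputs: 205 fF
    (NAND, [NAND]),        # NAND3 drives one NAND input: 105 fF
    (NAND, []),            # NAND4 drives only the wire at output F: 5 fF
]

worst_case = sum(stage_delay(g, fo) for g, fo in critical_path)
# For the symbol, the load-dependent delay is that of the gate driving F.
symbol_k = critical_path[-1][0].k

print(f"t_plh = t_phl = {worst_case:.3f} ns, load-dependent delay = {symbol_k} ns/fF")
```

The per-stage delays come out to 0.305, 0.91, 0.91, 0.71 and 0.51 ns, which sum to the 3.345 ns in the solution.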

Problem 2: Single-cycle Processors (25 points)

The following MIPS code finds the maximum integer within a bounded array, where $4 contains a pointer to the beginning of the array, $5 contains the length of the array, and $3 contains the address at which the result is stored at the end. (Assume there is no branch delay slot.)

            LW    $2, 0($4)        // assume the first number is the largest
            ADDI  $4, $4, 4
            ADDI  $5, $5, -1
    max:    LW    $6, 0($4)        // load array element and increment pointer
            ADDI  $4, $4, 4
            SLT   $7, $2, $6       // update $2 if $6 is larger
            BEQ   $7, $0, next
            ADD   $2, $0, $6
    next:   ADDI  $5, $5, -1       // continue the search until end of array
            BEQ   $5, $0, finish
            J     max
    finish: SW    $2, 0($3)        // store result

The single-cycle datapath and control unit are shown on the next page. Assume that the delay and energy consumption per operation for each functional unit are as follows:

  • Memory (read or write): 3 ns, 3 pJ
  • ALU and adder: 2 ns, 2 pJ
  • Register file (read or write): 1 ns, 1 pJ
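  • All other units: 0 ns, 0 pJ

a) What is the minimum clock cycle time for this processor?
The minimum cycle time of the processor is 10 ns (a frequency of 100 MHz). It is set by LW, the longest instruction: instruction memory (3 ns) + register read (1 ns) + ALU (2 ns) + data memory (3 ns) + register write (1 ns) = 10 ns.

b) For an array of length N, what is the range of execution time for this program (i.e., the minimum possible and the worst-case execution time)?
We asked for a range on this problem because the exact number of instructions executed is data dependent (it depends on whether the first branch, BEQ $7, $0, next, is taken). The program executes 3 setup instructions, then N-1 loop iterations of 7 instructions each (8 when the ADD executes), minus the J that is skipped on the last iteration, plus the final SW. The range of execution times is therefore a minimum of 10(7N-4) ns and a maximum of 10(8N-5) ns.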
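The instruction count behind this range can be checked by stepping through the listing in Python. This is a minimal sketch, not part of the original solution: `run_max_search` is a hypothetical name, $4 is modeled as a list index rather than a byte address, and it assumes N >= 2 (with N = 1 the listing decrements $5 past zero and never reaches `finish`).

```python
CYCLE_NS = 10  # single-cycle clock period from part (a)


def run_max_search(arr):
    """Mirror the MIPS max-search listing; return (max, instructions executed)."""
    r2 = arr[0]            # LW   $2, 0($4)  (assume first is largest)
    i = 1                  # ADDI $4, $4, 4
    r5 = len(arr) - 1      # ADDI $5, $5, -1
    count = 3
    while True:
        r6 = arr[i]        # max: LW $6, 0($4)
        i += 1             # ADDI $4, $4, 4
        r7 = int(r2 < r6)  # SLT  $7, $2, $6
        count += 4         # LW, ADDI, SLT, BEQ $7, $0, next
        if r7 != 0:        # branch not taken: $6 is strictly larger
            r2 = r6        # ADD  $2, $0, $6
            count += 1
        r5 -= 1            # next: ADDI $5, $5, -1
        count += 2         # ADDI, BEQ $5, $0, finish
        if r5 == 0:
            break
        count += 1         # J max
    count += 1             # finish: SW $2, 0($3)
    return r2, count


for n in (2, 5, 10):
    _, best = run_max_search(list(range(n, 0, -1)))   # first element largest
    _, worst = run_max_search(list(range(1, n + 1)))  # strictly increasing
    print(n, best * CYCLE_NS, worst * CYCLE_NS)       # execution time in ns
```

The best case is any array whose first element is never strictly exceeded, so the ADD never runs (7N-4 instructions); the worst case is a strictly increasing array, where the ADD runs on every iteration (8N-5 instructions).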

c) What is the energy consumption (per instruction) for each type of instruction in the program? Assume that components are completely “turned off” and do not consume energy when they are not needed.
The energy consumption of each type of instruction is listed below.
  • LW: 12 pJ
  • ADDI/ADD/SLT: 9 pJ
  • BEQ: 10 pJ
  • SW: 11 pJ
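Both the cycle time in part (a) and the energies in part (c) follow from which units each instruction uses. The Python sketch below tabulates that; the per-instruction unit usage is an assumption (the datapath figure is not reproduced), chosen to match a standard single-cycle design: every instruction fetches from instruction memory and uses the PC + 4 adder, BEQ also uses a branch-target adder, and a read of one or two registers counts as a single register-file operation. `COST`, `USAGE`, `delay_ns` and `energy_pj` are hypothetical names.

```python
# (delay ns, energy pJ) per operation, from the problem statement.
COST = {"mem": (3, 3), "alu": (2, 2), "adder": (2, 2), "rf": (1, 1)}

# Assumed unit usage per instruction. "serial" is the chain that sets the
# instruction's delay; "parallel" units (PC + 4 adder, and for BEQ the
# branch-target adder) consume energy but are off that chain.
USAGE = {
    "LW": {"serial": ["mem", "rf", "alu", "mem", "rf"], "parallel": ["adder"]},
    "ADDI/ADD/SLT": {"serial": ["mem", "rf", "alu", "rf"], "parallel": ["adder"]},
    "BEQ": {"serial": ["mem", "rf", "alu"], "parallel": ["adder", "adder"]},
    "SW": {"serial": ["mem", "rf", "alu", "mem"], "parallel": ["adder"]},
}


def delay_ns(instr):
    """Delay of the serial chain for one instruction."""
    return sum(COST[u][0] for u in USAGE[instr]["serial"])


def energy_pj(instr):
    """Energy of every unit the instruction turns on."""
    units = USAGE[instr]["serial"] + USAGE[instr]["parallel"]
    return sum(COST[u][1] for u in units)


for instr in USAGE:
    print(f"{instr}: {delay_ns(instr)} ns, {energy_pj(instr)} pJ")

min_cycle_ns = max(delay_ns(instr) for instr in USAGE)
print(f"minimum cycle time: {min_cycle_ns} ns")
```

Under these assumptions the totals reproduce the solution (12, 9, 10 and 11 pJ), and LW's 10 ns chain sets the clock period.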

Diagram and scratch space for Problem 2:

a) Draw the datapath showing all interconnections and components (including the controller).

b) What is the critical path?
The critical path is exercised by the ADD and ADDIU instructions. It runs through the PC (clock-to-Q), instruction memory, register file, the ALUSrc mux, the ALU, the WrSrc mux, and the setup time for writing to the register file.

c) What is the delay of the critical path?
The sum of all the delays above is 10. Don't forget the clock-to-Q of the PC and the setup time of the register file!

d) Show the values of all the control points for each instruction. (The Enable for the PC is given as an example.)

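| Instruction | PCEnable | ALUSrc | ALUOp | Sign/Zero | Rotate | WrSrc |
|---|---|---|---|---|---|---|
| ADDIU | 1 | 0 | 00 | Sign | X | 0 |
| ADD | 1 | 1 | 00 | X | X | 0 |
| SRL | 1 | X | XX | X | 0 | 1 |
| Rotate | 1 | X | XX | X | 1 | 1 |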
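The control table maps each instruction to fixed control values, with X marking don't-cares that the controller designer may resolve freely. The Python sketch below checks one possible fully specified controller against the table. The names `CONTROL`, `CONCRETE`, `matches` and `consistent` are hypothetical, and the particular don't-care choices in `CONCRETE` are an illustrative assumption, picked so that Sign/Zero and ALUOp become constants.

```python
SIGNALS = ("PCEnable", "ALUSrc", "ALUOp", "Sign/Zero", "Rotate", "WrSrc")

# Control table from part (d); "X" is a don't-care.
CONTROL = {
    "ADDIU": ("1", "0", "00", "Sign", "X", "0"),
    "ADD": ("1", "1", "00", "X", "X", "0"),
    "SRL": ("1", "X", "XX", "X", "0", "1"),
    "Rotate": ("1", "X", "XX", "X", "1", "1"),
}

# One hypothetical fully specified controller: don't-cares resolved so that
# Sign/Zero and ALUOp are constant, which simplifies the control logic.
CONCRETE = {
    "ADDIU": ("1", "0", "00", "Sign", "0", "0"),
    "ADD": ("1", "1", "00", "Sign", "0", "0"),
    "SRL": ("1", "0", "00", "Sign", "0", "1"),
    "Rotate": ("1", "0", "00", "Sign", "1", "1"),
}


def matches(spec: str, value: str) -> bool:
    """True if a concrete value satisfies a table entry ('X' matches anything)."""
    if spec == "X":
        return True
    if len(spec) != len(value):
        return False
    return all(s in ("X", v) for s, v in zip(spec, value))


def consistent(spec_row, value_row) -> bool:
    """True if every signal in value_row satisfies the table row spec_row."""
    return all(matches(s, v) for s, v in zip(spec_row, value_row))


for instr, row in CONCRETE.items():
    status = "OK" if consistent(CONTROL[instr], row) else "violates table"
    print(instr, dict(zip(SIGNALS, row)), status)
```

With these choices PCEnable, ALUOp and Sign/Zero need no logic at all, and Rotate is simply "instruction is Rotate".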

Problem 4: Multi-cycle Processors (25 points)

For this problem you will be working with the multi-cycle datapath components on the next page. All inputs for the functional units are labeled, and registers have only one data input and one data output (you should not draw the clock lines). You will not need to deal with control in this problem, so the control inputs to each block are not shown.

a) Given the datapath components on the next page, determine the register transfer language description for each of the standard MIPS instructions in the table below. You do not need to fill in every row. Hint: do not write a functional unit's output to the register file at the end of the cycle in which it is computed (i.e., write to the register file only directly from a register, not from a functional unit).

ADDU
  Cycle 1: IR ← Mem(PC); PC ← PC + 4;
  Cycle 2: A ← Reg(IR[rs]); B ← Reg(IR[rt]);
  Cycle 3: S ← A + B;
  Cycle 4: Reg(IR[rd]) ← S;

LW
  Cycle 1: IR ← Mem(PC); PC ← PC + 4;
  Cycle 2: A ← Reg(IR[rs]);
  Cycle 3: S ← A + Ext(imm);
  Cycle 4: M ← Mem(S);
  Cycle 5: Reg(IR[rt]) ← M;

SW
  Cycle 1: IR ← Mem(PC); PC ← PC + 4;
  Cycle 2: A ← Reg(IR[rs]); B ← Reg(IR[rt]);
  Cycle 3: S ← A + Ext(imm);
  Cycle 4: Mem(S) ← B;

JAL
  Cycle 1: IR ← Mem(PC); S ← PC + 4;
  Cycle 2: Reg(31) ← S; PC ← PC[31:28] || IR[j] || 00;
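To check the CPI values, the register transfers above can be executed directly. This is a minimal Python sketch, not part of the original solution: each instruction is a list of per-cycle functions that read the current state and return the transfers to commit at the clock edge, so every transfer in a cycle sees the pre-edge values. The state layout, the helper names (`ir`, `ext`, `execute`, `encode`) and the test program are hypothetical; the encodings use the standard MIPS opcodes (ADDU funct 0x21, LW 0x23, SW 0x2B, JAL 0x03). Control is not modeled, so the caller names each instruction.

```python
MASK32 = 0xFFFFFFFF
FIELDS = {"rs": (25, 21), "rt": (20, 16), "rd": (15, 11), "imm": (15, 0), "j": (25, 0)}


def ir(s, name):
    """IR[name]: extract an instruction field, as in the RTL notation."""
    hi, lo = FIELDS[name]
    return (s["IR"] >> lo) & ((1 << (hi - lo + 1)) - 1)


def ext(x):
    """Ext(imm): sign-extend a 16-bit value to 32 bits."""
    return (x - 0x10000 if x & 0x8000 else x) & MASK32


def fetch(s):
    # Cycle 1 of ADDU/LW/SW: IR <- Mem(PC); PC <- PC + 4
    return [("IR", s["mem"][s["PC"]]), ("PC", (s["PC"] + 4) & MASK32)]


# One list entry per clock cycle, mirroring the RTL table.
RTL = {
    "ADDU": [fetch,
             lambda s: [("A", s["reg"][ir(s, "rs")]), ("B", s["reg"][ir(s, "rt")])],
             lambda s: [("S", (s["A"] + s["B"]) & MASK32)],
             lambda s: [(("reg", ir(s, "rd")), s["S"])]],
    "LW": [fetch,
           lambda s: [("A", s["reg"][ir(s, "rs")])],
           lambda s: [("S", (s["A"] + ext(ir(s, "imm"))) & MASK32)],
           lambda s: [("M", s["mem"][s["S"]])],
           lambda s: [(("reg", ir(s, "rt")), s["M"])]],
    "SW": [fetch,
           lambda s: [("A", s["reg"][ir(s, "rs")]), ("B", s["reg"][ir(s, "rt")])],
           lambda s: [("S", (s["A"] + ext(ir(s, "imm"))) & MASK32)],
           lambda s: [(("mem", s["S"]), s["B"])]],
    "JAL": [lambda s: [("IR", s["mem"][s["PC"]]), ("S", (s["PC"] + 4) & MASK32)],
            lambda s: [(("reg", 31), s["S"]),
                       ("PC", (s["PC"] & 0xF0000000) | (ir(s, "j") << 2))]],
}


def commit(s, updates):
    """Apply all transfers of one cycle at the clock edge."""
    for target, value in updates:
        if isinstance(target, tuple):
            kind, idx = target
            if kind == "mem":
                s["mem"][idx] = value
            elif idx != 0:  # $0 is hard-wired to zero
                s["reg"][idx] = value
        else:
            s[target] = value


def execute(name, s):
    """Run one instruction cycle by cycle and return its CPI."""
    for cycle in RTL[name]:
        commit(s, cycle(s))  # cycle(s) reads only pre-edge values
    return len(RTL[name])


def encode(op, rs=0, rt=0, low=0):
    """Pack op | rs | rt | low (low holds rd/funct, imm16, or the jump target)."""
    return (op << 26) | (rs << 21) | (rt << 16) | low


program = [("ADDU", encode(0, 8, 9, (10 << 11) | 0x21)),  # $10 <- $8 + $9
           ("SW", encode(0x2B, 0, 10, 0x100)),            # Mem(0x100) <- $10
           ("LW", encode(0x23, 0, 11, 0x100)),            # $11 <- Mem(0x100)
           ("JAL", encode(0x03, low=0x40))]               # $31 <- PC + 4; PC <- 0x100
s = {"PC": 0, "IR": 0, "A": 0, "B": 0, "S": 0, "M": 0, "reg": [0] * 32,
     "mem": {4 * i: word for i, (_, word) in enumerate(program)}}
s["reg"][8], s["reg"][9] = 5, 7

cpi = {}
for name, _ in program:
    cpi[name] = execute(name, s)
print(cpi)  # {'ADDU': 4, 'SW': 4, 'LW': 5, 'JAL': 2}
print(s["reg"][10], s["reg"][11], hex(s["reg"][31]), hex(s["PC"]))  # 12 12 0x10 0x100
```

Because each cycle's transfers are computed before any are committed, the sketch also shows why the JAL can read the old PC in its second cycle: PC is not written until that cycle's clock edge.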

b) You’ll notice that some components need to be reused during the execution of an instruction. Wire the datapath to support all four instructions, adding only muxes as needed. You may provide constants as inputs to any component. Be sure to label special buses, such as instruction fields. You do not need to draw any control signals (including mux select signals); just assume they will be correctly generated in all cases.

c) For each instruction in part (a), calculate the CPI and indicate on the table above which operations occur during each cycle.
The register transfers in part (a) above are already separated into the operations that can be performed in each clock cycle. This corresponds to the following CPI: ADDU – 4, LW – 5, SW – 4, and JAL – 2.

d) The table below indicates the worst-case delay through each of the functional units used in the datapath. Given these delays, calculate the execution time of this processor for a program consisting of 400,000 adds, 250,000 loads, 250,000 stores, and 100,000 branches.

| Functional Unit | Worst-case Delay |
|---|---|
| Memory | 50 ns |
| ALU | 30 ns |
| Register File (read) | 25 ns |
| Register File (write) | 15 ns |
| All others | 0 ns |

The trick to this question was realizing that the cycle time is determined only by the longest path between registers in the datapath. Since there is no setup time or clock-to-Q delay, and each cycle's register transfers pass through at most one functional unit, the cycle time is set by the slowest unit, the memory: 50 ns. There is no need to place two functional units in series in this datapath, and doing so would only reduce performance. The execution time of the processor is the total number of cycles required multiplied by the cycle time (the 100,000 branches are counted at the JAL CPI of 2):

Time = (400,000 × 4 + 250,000 × 5 + 250,000 × 4 + 100,000 × 2) × 50 ns = 4,050,000 cycles × 50 ns = 202.5 ms
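The same calculation as a short Python sketch; the dictionary names are hypothetical, and the 100,000 branches are counted at the JAL CPI, as in the solution.

```python
# Worst-case unit delays from the table (ns). The two register-file entries
# are 25 ns and 15 ns; only the larger matters for the maximum.
UNIT_DELAY_NS = {"memory": 50, "ALU": 30, "register file": 25, "all others": 0}

# No setup or clock-to-Q time, and each cycle's transfer crosses at most one
# functional unit, so the clock period is the slowest single unit.
cycle_ns = max(UNIT_DELAY_NS.values())

CPI = {"ADDU": 4, "LW": 5, "SW": 4, "JAL": 2}
MIX = {"ADDU": 400_000, "LW": 250_000, "SW": 250_000, "JAL": 100_000}

total_cycles = sum(MIX[op] * CPI[op] for op in MIX)
time_ms = total_cycles * cycle_ns / 1e6  # ns -> ms
print(f"{total_cycles:,} cycles x {cycle_ns} ns = {time_ms} ms")  # 4,050,000 cycles x 50 ns = 202.5 ms
```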