Instruction Decode - Computer Architecture and Engineering - Exams, Exams of Computer Architecture and Organization


Typology: Exams

2012/2013

Uploaded on 04/02/2013 by shashikanth_0p3
CS152
COMPUTER ARCHITECTURE AND ENGINEERING
EXAMINATION #2
NAME:____________________________
DISCUSSION SECTION TIME:_____________
PROBLEM NUMBER SCORE
#1
#2
#3
#4
TOTAL SCORE
NOTE: Please show your work CLEARLY for all problems. I hope you enjoy the test!

PROBLEM 1: PIPELINING/HAZARDS

For all five parts of this question, assume that we are using the five-stage pipelined MIPS machine described in the CS152 textbook.

a. (6 points) The following is some code from Mr. Oza’s Nut Factory. Assume that the pipelined datapath has NO FORWARDING. Find the register hazards in the following code. Enter your answers in the table on the next page. Also, for each hazard that you find, classify the hazard (under “Hazard Type” in the table below) into one of the following three types:

(1). The write register of the instruction in the EXECUTION stage is the same as the read register of the instruction in the INSTRUCTION DECODE stage.

(2). The write register of the instruction in the MEMORY stage is the same as the read register of the instruction in the INSTRUCTION DECODE stage.

(3). The write register of the instruction in the WRITE-BACK stage is the same as the read register of the instruction in the INSTRUCTION DECODE stage.

When designating the two instructions between which there is a hazard (under Instruction #1 and Instruction #2 below), use the number to the left of the instruction. When designating the type of hazard, use the number corresponding to one of the three hazards listed above.

  1. add $3,$1,$
  2. lw $1,0($4)
  3. and $5,$3,$
  4. and $6,$1,$
  5. or $1,$3,$
  6. sw $1,4($4)
  7. lw $2,4($4)
  8. sub $3,$5,$
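The classification above depends only on how many instructions separate a writer from a later reader, assuming no stalls have been inserted. The Python sketch below applies that rule. Its input format, the `find_hazards` name, and the short program are hypothetical (several operands in the listing above are cut off in this copy), so treat it as an illustration of the method rather than an answer key. Like the three types above, it counts a write-back/decode overlap as a hazard; a register file that writes in the first half of the cycle and reads in the second would not need to stall in that case.

```python
# Hypothetical helper: classify no-forwarding RAW hazards by pipeline distance.
# Each instruction is (destination register or None, list of source registers).
# Assumes no stalls have been inserted, so distance 1 = writer in EX (type 1),
# distance 2 = writer in MEM (type 2), distance 3 = writer in WB (type 3).

def find_hazards(program):
    """Return sorted (writer, reader, hazard_type) tuples, numbered from 1."""
    hazards = []
    for j, (_, sources) in enumerate(program, start=1):
        for reg in sorted(set(sources)):
            if reg == "$0":
                continue  # $0 is hardwired to zero: never a real dependence
            for distance in (1, 2, 3):
                i = j - distance
                if i >= 1 and program[i - 1][0] == reg:
                    hazards.append((i, j, distance))
                    break  # only the most recent writer of reg matters
    return sorted(hazards)


# Hypothetical four-instruction program (operands invented for illustration).
program = [
    ("$3", ["$1", "$2"]),  # 1. add $3,$1,$2
    ("$1", ["$4"]),        # 2. lw  $1,0($4)
    ("$5", ["$3", "$1"]),  # 3. and $5,$3,$1
    (None, ["$1", "$4"]),  # 4. sw  $1,4($4)  (stores write no register)
]
print(find_hazards(program))  # [(1, 3, 2), (2, 3, 1), (2, 4, 2)]
```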

d. (4 points) Pipeline the following circuit for maximum throughput by adding pipeline registers (by drawing vertical lines on one or more wires) at appropriate places. Use as few pipeline registers as possible. On each component, the number in parentheses is that component’s latency. THERE ARE NO WIRE DELAYS OR OTHER DELAYS IN THE CIRCUIT. None of the components are clocked; therefore, you have to make sure that, for each component, both inputs arrive at the same time.

[Figure: circuit to be pipelined; component latencies as labeled: (1), (2), (1), (2), (1), (1), (2)]

e. (4 points) What is the lowest possible cycle time for the pipelined version of the circuit above? How can you figure it out without actually pipelining the circuit?
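Because the problem assumes ideal pipeline registers and no wire delays, a register can be placed on any wire between components but never inside a component. The slowest single component therefore sets a lower bound on the cycle time, while the unpipelined cycle time is the critical-path delay. The sketch below computes both for a small hypothetical circuit; the exam's figure is not reproduced here, so the component names, wiring, and the helper names `critical_path` and `min_cycle_time` are invented for illustration.

```python
from functools import lru_cache

# Hypothetical circuit: latency of each component and the components feeding it.
latency = {"a": 1, "b": 2, "c": 1, "d": 2}
inputs = {"a": [], "b": [], "c": ["a"], "d": ["b", "c"]}


def critical_path(latency, inputs):
    """Longest input-to-output delay: the unpipelined cycle time."""
    @lru_cache(maxsize=None)
    def finish(node):
        # A component starts once its slowest input arrives.
        return latency[node] + max((finish(p) for p in inputs[node]), default=0)
    return max(finish(node) for node in latency)


def min_cycle_time(latency):
    """Registers cannot split a component, so the slowest one bounds the clock."""
    return max(latency.values())


print(critical_path(latency, inputs), min_cycle_time(latency))  # 4 2
```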

PROBLEM 2: BUSES

a. (6 points) Name three traditionally classified buses and their features:




b. (16 points) We want to compare the maximum bandwidth for a synchronous and an asynchronous bus.

  • The synchronous bus has a clock cycle time of 30ns, and each bus transmission takes 2 clock cycles.
  • The asynchronous bus requires 35ns per handshake.
  • The data portion of both buses is 32 bits wide.
  • Addresses are 32 bits (one word) long, but the returned data are 64 bits (a double word) long.

Find the bandwidth for each bus when performing one-double-word READS from a 100ns memory (64 bits data).

Calculations for Synchronous bus (4 points):

Synchronous Bus bandwidth (4 points): _________________ (Megabytes/Second)
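One common way to account for a synchronous read is to add the address transfer, the memory access rounded up to whole clock cycles, and the data transfers, with no overlap between them. The Python sketch below encodes that accounting. The function name is hypothetical, and both the rounding and the no-overlap rule are assumptions that a particular answer key may treat differently. MB/s here means 10^6 bytes per second.

```python
import math


def sync_read_bandwidth_mb_s(clock_ns, cycles_per_transfer, mem_ns,
                             bus_bits, addr_bits, data_bits):
    """Bandwidth of back-to-back reads on a synchronous bus, in MB/s (10^6 B/s).

    Assumptions: address, memory access and data phases do not overlap, and
    the memory access is rounded up to a whole number of bus clock cycles.
    """
    transfer_ns = cycles_per_transfer * clock_ns
    addr_ns = math.ceil(addr_bits / bus_bits) * transfer_ns
    mem_wait_ns = math.ceil(mem_ns / clock_ns) * clock_ns
    data_ns = math.ceil(data_bits / bus_bits) * transfer_ns
    total_ns = addr_ns + mem_wait_ns + data_ns
    return (data_bits / 8) / total_ns * 1000  # bytes per ns -> MB/s


# Parameters from the problem: 30 ns clock, 2 cycles per transmission,
# 100 ns memory, 32-bit bus, 32-bit address, 64-bit data.
print(round(sync_read_bandwidth_mb_s(30, 2, 100, 32, 32, 64), 2))  # 26.67
```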

PROBLEM 3: MEMORY SYSTEM DESIGN

In this problem, you are going to design a memory system for a computer. One thing we want you to learn from this class is how to solve a BIG problem by solving a series of little problems. So we have divided this problem into multiple sub-problems.

[Figure: block diagram of the CPU, a write-through, physically addressed data cache, a store buffer, and main memory]

First here are some numbers and equations you may need to solve this problem:

Virtual Memory Page Size = 4 KB

DRAM Chips Available:
  • Size: 2 Megabit = 256K words x 8-bit
  • Read Access Time = Write Access Time = 50 ns
  • Read Cycle Time = Write Cycle Time = 100 ns

Data Cache Options              Hit Time   Miss Rate   Miss Penalty
J-byte Direct-Mapped            1 cycle    8%          4 cycles
L-byte 2-way Set Associative    1 cycle    4%          6 cycles
M-byte 4-way Set Associative    1 cycle    2%          8 cycles

Note: J, L, M are values you need to pick in Part (a) of this problem

Queuing Theory Review:
  • Utilization = Request Rate / Service Rate
  • Average Q Length = Utilization / (1 - Utilization)
  • Probability of Overflow of an N-entry Q = (Utilization)^N

Part (a) Physically Addressed Data Cache (12 points)

Part (a): The first thing you need to pick is the configuration of the write-through, physically addressed, data cache. Your options are:

  • J-byte Direct Mapped
  • L-byte 2-Way Set Associative
  • M-byte 4-Way Set Associative

(a.1) (2 points): What are J, L, and M in order to index these caches with the virtual address?
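Indexing with the virtual address before translation works only if the index and block-offset bits all lie within the page offset, which translation leaves unchanged. Each way can therefore hold at most one page, so the largest such cache is the page size times the associativity. A minimal sketch, assuming no page coloring or similar technique is used to relax that limit (the function name is hypothetical):

```python
def max_virtually_indexed_bytes(page_bytes, ways):
    """Largest cache whose index + block offset fit in the page offset.

    Each way may span at most one page, so capacity = page size x ways.
    """
    return page_bytes * ways


PAGE_BYTES = 4 * 1024  # 4 KB virtual-memory pages, from the problem
for label, ways in [("J (direct-mapped)", 1), ("L (2-way)", 2), ("M (4-way)", 4)]:
    print(label, max_virtually_indexed_bytes(PAGE_BYTES, ways) // 1024, "KB")
```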

a.2 (6 points): Assume Load instructions have a CPI of 1.2 if they hit in the data cache and a CPI of (1.2 + Miss Penalty) if they miss. All other instructions have a CPI of 1.2. Furthermore, assume 20% of the instructions are loads. What is the machine’s CPI for each of the 3 cache options (keep 3 decimal places in your answer, for example: 1.xxx)? Which cache should you use if CPI is the only criterion?
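Since every instruction has a base CPI of 1.2 and only loads that miss pay the extra penalty, the machine CPI is 1.2 + (load fraction) x (miss rate) x (miss penalty). A short Python check of that formula for the three options (the function name is hypothetical):

```python
def machine_cpi(base_cpi, load_fraction, miss_rate, miss_penalty):
    """Average CPI when only load misses add stall cycles."""
    return base_cpi + load_fraction * miss_rate * miss_penalty


options = {
    "direct-mapped": (0.08, 4),  # (miss rate, miss penalty in cycles)
    "2-way set associative": (0.04, 6),
    "4-way set associative": (0.02, 8),
}
for name, (miss_rate, penalty) in options.items():
    print(f"{name}: CPI = {machine_cpi(1.2, 0.20, miss_rate, penalty):.3f}")
# prints 1.264, 1.248 and 1.232 respectively
```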

b.2 (1 point): What is the maximum rate at which DRAM can service store requests? Assume that the DRAM Write Cycle Time = 100ns.

b.3 (2 points) What are the utilization and mean Q length based on queuing theory?

b.4 (1 point) What is the probability of overflow if the queue has two entries?

b.5 (3 points) What is the minimum number of entries the Store Buffer needs in order to lower the probability of buffer overflow to less than 4%?
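The queuing formulas given earlier chain together directly: the DRAM write cycle time gives the service rate, utilization follows from the store request rate, and the smallest buffer meeting an overflow target is the first N with Utilization^N below it. The store request rate comes from part b.1, which does not appear in this copy, so the sketch below uses a hypothetical 5 million stores per second; substitute the real value. All function names are hypothetical.

```python
def utilization(request_rate, service_rate):
    return request_rate / service_rate


def mean_queue_length(u):
    return u / (1 - u)


def overflow_probability(u, entries):
    return u ** entries


def min_entries(u, target):
    """Smallest N with u**N < target (requires 0 <= u < 1)."""
    if not 0 <= u < 1:
        raise ValueError("utilization must be in [0, 1)")
    n = 1
    while overflow_probability(u, n) >= target:
        n += 1
    return n


DRAM_WRITE_CYCLE_NS = 100
service_rate = 1e9 / DRAM_WRITE_CYCLE_NS  # stores per second the DRAM can absorb
request_rate = 5e6                        # HYPOTHETICAL: part b.1 is missing here
u = utilization(request_rate, service_rate)
print(u, mean_queue_length(u), overflow_probability(u, 2), min_entries(u, 0.04))
# 0.5 1.0 0.25 5
```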

Part (c) Main Memory Design (5 points)

[Figure: data cache (block size = 32 B) connected to main memory built from N = ? chips; Total = ? MB]

(c) In this part, assume the data cache you designed in Part (a) has a block size of 32 B. On a cache miss, you want to fill the cache block in 2 DRAM read cycles.

c.1 (2 points) How wide does the datapath between the DRAM and your data cache have to be in order to achieve our goal of filling the cache block in 2 DRAM read cycles?

c.2 (3 points) What is the minimum number of DRAM chips (256K x 8) you need for the main memory, and what is the minimum memory size?
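Both answers follow from the block size, the fill requirement, and the chip organization: the datapath must carry half a block per DRAM read cycle, enough 8-bit-wide chips must sit side by side to supply that width, and the total capacity is the number of chips times the chip capacity. A minimal sketch, assuming a single bank of 256K x 8 chips read in parallel (no interleaving) and MB meaning 2^20 bytes; the function name is hypothetical.

```python
def dram_sizing(block_bytes, fill_cycles, chip_words, chip_width_bits):
    """Return (datapath bits, number of chips, total bytes) for one bank."""
    datapath_bits = block_bytes * 8 // fill_cycles  # bits per DRAM read cycle
    chips = -(-datapath_bits // chip_width_bits)    # ceiling division
    total_bytes = chips * chip_words * chip_width_bits // 8
    return datapath_bits, chips, total_bytes


bits, chips, total = dram_sizing(block_bytes=32, fill_cycles=2,
                                 chip_words=256 * 1024, chip_width_bits=8)
print(bits, chips, total // 2**20)  # 128 16 4
```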

The ALU operation is controlled by the AOP register. AOP may be one of {ADD, SUB, SEQ}, which are all defined constants that may be used as the Immediate of an instruction. The two operands to the ALU are the A and B registers, and the result of the ALU operation can be read from AOUT. ADD is defined as A + B, SUB is defined as A - B, and SEQ is defined as 1 if (A == B), else 0 if (A != B). The memory is a simple synchronous, word-addressed, one-cycle memory. For a read operation, the address is first placed in the ADDR register; then a data value can be read from the READ register. For a write, first place the address in the ADDR register, and then write the data to the WRITE register. Immediate data values may be in the range -32 to 31 (6-bit signed 2's complement). This system uses a shared instruction/data memory model. Large immediate values may be in-lined in the code and accessed using PC-relative addressing. Shown below are two examples, the first one very simple and the second one more complex.

example 1) Write out code equivalent to the MIPS instruction "addi r6, r5, #12":

    AOP   <- ADD     Get the ALU ready for addition.
    ADDR  <- 5       Get the memory ready to read 'register' 5.
    A     <- READ    Read the memory and store in register A.
    B     <- 12      Load the immediate constant 12.
    ADDR  <- 6       Get ready to write back to 'register' 6.
    WRITE <- AOUT    Write the ALU result to memory.

example 2) Write out code equivalent to the MIPS instruction "addi r6, r5, #-142":

     1. AOP   <- ADD     Get the ALU ready for addition.
     2. A     <- PC      Load the PC to calculate the effective address.
     3. B     <- 11      13 - 2 = 11 (data at line 13, PC from line 2).
     4. ADDR  <- AOUT    Save the address of the in-line immediate.
     5. B     <- READ    Load the in-line immediate data.
     6. ADDR  <- 5       Get ready to load 'register' 5.
     7. A     <- READ    Read the input data from memory.
     8. ADDR  <- 6       Get ready to save the result.
     9. WRITE <- AOUT    Save the result of the ALU add.
    10. A     <- PC      Now we need to skip over line 13, because
    11. B     <- 4       it is immediate data and won't execute.
    12. NPC   <- AOUT    So add 14 - 10 = 4 to the PC and jump!
    13. -142             Data for the ALU add. Don't try to execute!

Below are four incomplete code fragments. Fill in all the missing blanks, INCLUDING COMMENTS.

a) (5 points) Write out code equivalent to the MIPS instruction "sub r10, r11, r12"

    AOP   <- ____    Get the ALU ready to do a subtract.
    ADDR  <- 11      Get ready to read memory location ____
    A     <- READ    Read the memory into register A.
    ADDR  <- ____    ____
    B     <- READ    Read the memory into register B.
    ____  <- ____    ____
    WRITE <- AOUT    Write the result to memory.

d) (10 points) Write out code equivalent to the MIPS instruction "bne r4, r5, -4"

    <<< If r4 != r5, this branch will loop forever >>>
     1. AOP   <- SEQ     Ready the ALU for the comparison.
     2. ADDR  <- 4       Ready to read memory 4.
     3. A     <- READ    Read location 4.
     4. ADDR  <- 5       Ready to read memory 5.
     5. B     <- READ    Read location 5.
     6. A     <- ____    ____
     7. B     <- ____    ____
     8. AOP   <- ADD     No more comparison tests.
     9. A     <- AOUT    ____
    10. B     <- PC      Compute offsets from here.
    11. ____  <- ____    ____
    12. ____  <- ____    ____
    13. NPC   <- AOUT    And jump!
    14. -9               !=, jump to line 1: 1 - 10 = -9.
    15. 6                ==, jump to line 16: 16 - 10 = 6.
    16. <<< If r4 = r5, continue execution here >>>

THE END!!