Midterm Questions on Computer Architecture and Engineering from UC Berkeley EECS, Exams of Computer Architecture and Organization

Midterm questions for a computer architecture and engineering course at the University of California, Berkeley, in the Electrical Engineering and Computer Sciences (EECS) division. The questions cover cache hierarchy, performance analysis, and processor design. Students are allowed to use a calculator and one double-sided page of notes. The questions ask for calculations, comparisons, and explanations.

Typology: Exams

2012/2013

Uploaded on 04/02/2013

University of California, Berkeley
College of Engineering
Computer Science Division | EECS
Fall 1997 D.A. Patterson
Midterm II
October 19, 1997
CS152 Computer Architecture and Engineering
You are allowed to use a calculator and one 8.5" x 11" double-sided page of notes. You
have 3 hours. Good luck!
Your Name:
SID Number:
Discussion Section:
1 /20
2 /20
3 /20
4 /20
Total /80
Question 1

In addition to higher hardware costs, a larger cache can also have a longer access time. Beyond a certain size, an L1 cache will reduce the performance of the CPU. It is possible, however, to reduce the miss penalty without affecting cycle time. To do this, many modern computers use a second-level cache. The L2 cache is not inside the pipeline and is accessed only when there is a miss in L1. Main memory access is required only when there is a miss in L2.

For this question, we will not distinguish reads from writes. Both the L1 and L2 caches are synchronized to the CPU clock. For both L1 and L2, a miss is handled with an access to the next lower level in the memory hierarchy, and after the miss the request is handled exactly like a hit. For example, in a read, L1 will first update its contents with data from L2, and then pass it on to the CPU.

Definitions:

• MR - Miss rate. Fraction of accesses to this cache that result in a miss.

• HT - Hit time. Access time during a cache hit.

• MP - Miss penalty. Additional access time incurred by a cache miss.

a) For a system with two levels of caching, L1 and L2, give the average access time of L1 in terms of $MR_{L1}$, $HT_{L1}$, $MR_{L2}$, $HT_{L2}$, and $MP_{L2}$, for the case where HT and MP are integral multiples of CPU cycle time.
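As a rough illustration of how the definitions above combine, the following Python sketch evaluates one common reading of the two-level model: an L1 miss costs a full L2 access, and an L2 miss adds the L2 miss penalty on top. The function name and the sample numbers are hypothetical and are not taken from the exam.

```python
def amat_two_level(mr_l1, ht_l1, mr_l2, ht_l2, mp_l2):
    """Average L1 access time, in CPU cycles, for a two-level cache.

    Assumes MR_L2 is the local miss rate of L2 (misses per L2 access)
    and that the L1 miss penalty equals the average L2 access time.
    """
    mp_l1 = ht_l2 + mr_l2 * mp_l2      # average cost of going to L2
    return ht_l1 + mr_l1 * mp_l1       # L1 hit time plus weighted miss cost


# Hypothetical numbers: 1-cycle L1, 10-cycle L2, 100-cycle memory penalty.
example = amat_two_level(mr_l1=0.05, ht_l1=1, mr_l2=0.20, ht_l2=10, mp_l2=100)
print(f"average access time: {example:.2f} cycles")
```

With these made-up inputs the average works out to about 2.5 cycles; the point is only to show how the L2 terms nest inside the L1 miss term.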

Question 1 (cont)

c) An L2 cache was added to the system. Now is it a good idea to double the L1 cache from its original size? What will be the change in performance?

d) In general, you tend to care more about miss ratio for L1 caches and hit times for L2 caches. True or False?

e) Compare your answer to part b) with part c). Does the presence or absence of an L2 cache influence design decisions for L1? Explain why.

Question 2

You have recently been recruited by InDec Corp. to work on a second version of their "Alphium" processor. Alphium is a simple RISC processor with 5 pipeline stages, just like the one presented in class. Its clock frequency is 200MHz and the chip power supply (Vdd) is 3.3V. For the highly appreciated MisSpec benchmark, Alphium achieves an IPC of 0.8 instructions per cycle ($IPC = 1/CPI$).

Your assignment is to evaluate the improvement in execution time, power and energy consumption for four proposed new versions of the Alphium processor. Here is a description of the alternatives:

• Low Power Alphium (LPA): This would be the same design as the original Alphium but it would be clocked at 133MHz to save power. The power supply (Vdd) would remain 3.3V and the achieved IPC for MisSpec would be 0.8 again.

• Superscalar Alphium (SSA): This would be a 4-way superscalar version of the Alphium. It would have the ability to issue up to 4 instructions per cycle. The power supply (Vdd) would remain 3.3V but the clock frequency would be reduced to 166MHz, due to the complexity of the issuing logic. The achieved IPC for MisSpec would be 2. Assume that the effective capacitance switched in the superscalar design would be 4 times that of the original.

• Low Voltage Alphium (LVA): This would be the same design as the original Alphium again but both the power supply and the clock frequency would be reduced. Power supply (Vdd) would be 2V and clock frequency would be 133MHz. The IPC for MisSpec would be 0.8 once again.

• Low Voltage Superscalar Alphium (LVSSA): This would be a 4-way superscalar version with reduced power supply. The clock frequency would be 100MHz, the power supply (Vdd) would be 2V and an IPC of 2 would be achieved for MisSpec. Assume that the effective capacitance switched in the superscalar design would be 4 times that of the original.

Here is some information that you may find useful:

Power: When we measure power for a system, we care about the maximum instantaneous power the system can consume. This is important as it determines the maximum current that the power supply must be able to supply to the system and the amount of heat that has to be removed from the system.

Energy: $E = C V_{dd}^2$ is just the energy per transaction. This is not interesting. We care about the energy consumed from the power supply to execute a task (or perform some computation). Once the task is executed, the processor can be turned off and no further energy is needed. The energy per task determines how many tasks you can execute before the battery runs out.
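The comparison among the four variants can be sketched in Python. This is only one plausible model, not an official solution: it assumes power scales as C·Vdd²·f (the usual dynamic-power relation, which the exam text does not state explicitly), that execution time for a fixed instruction count scales as 1/(IPC·f), and that energy per task is power multiplied by execution time. The `Design` class and `relative_metrics` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    name: str
    ipc: float
    freq_mhz: float
    vdd: float
    cap: float = 1.0  # switched capacitance relative to the original Alphium


BASE = Design("Alphium", ipc=0.8, freq_mhz=200.0, vdd=3.3)
VARIANTS = [
    Design("LPA", ipc=0.8, freq_mhz=133.0, vdd=3.3),
    Design("SSA", ipc=2.0, freq_mhz=166.0, vdd=3.3, cap=4.0),
    Design("LVA", ipc=0.8, freq_mhz=133.0, vdd=2.0),
    Design("LVSSA", ipc=2.0, freq_mhz=100.0, vdd=2.0, cap=4.0),
]


def relative_metrics(new, base=BASE):
    """Return (exec time, power, energy) of `new` relative to `base`."""
    # Same instruction count, so time is inversely proportional to IPC * f.
    exec_time = (base.ipc * base.freq_mhz) / (new.ipc * new.freq_mhz)
    # Assumed model: power proportional to C * Vdd^2 * f.
    power = (new.cap * new.vdd ** 2 * new.freq_mhz) / (
        base.cap * base.vdd ** 2 * base.freq_mhz
    )
    # Assumed model: energy per task = power * execution time.
    energy = power * exec_time
    return exec_time, power, energy


for design in VARIANTS:
    t, p, e = relative_metrics(design)
    print(f"{design.name:6s} time={t:.2f} power={p:.2f} energy={e:.2f}")
```

Under these assumptions, lowering only the clock (LPA) saves power but leaves the energy per task unchanged, because the task simply takes longer; reducing Vdd is what lowers energy.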

Question 2 (cont)

b) Fill in the two following tables. In each box write how the proposed new version compares to the original Alphium for that feature (e.g. $\frac{ExecTime_{new}}{ExecTime_{original}}$, $\frac{Power_{new}}{Power_{original}}$, etc.). Two fractional digits per entry are enough. Use the following (blank) page as scratch paper.

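|       | IPC | Freq   | Vdd  | Relative ExecTime | Relative Power | Relative Energy |
|-------|-----|--------|------|-------------------|----------------|-----------------|
| LPA   | 0.8 | 133MHz | 3.3V |                   |                |                 |
| SSA   | 2.0 | 166MHz | 3.3V |                   |                |                 |
| LVA   | 0.8 | 133MHz | 2.0V |                   |                |                 |
| LVSSA | 2.0 | 100MHz | 2.0V |                   |                |                 |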

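|       | IPC | Freq   | Vdd  | Relative Performance/Power | Relative Performance/Energy |
|-------|-----|--------|------|----------------------------|-----------------------------|
| LPA   | 0.8 | 133MHz | 3.3V |                            |                             |
| SSA   | 2.0 | 166MHz | 3.3V |                            |                             |
| LVA   | 0.8 | 133MHz | 2.0V |                            |                             |
| LVSSA | 2.0 | 100MHz | 2.0V |                            |                             |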

Question 2 (cont)

Question 3

The DiSPlacement is a hypothetical DSP variation of the MIPS architecture. Here are the 3 changes from MIPS:

1. Load and store instructions are changed to have ONLY the following two addressing modes:

   (a) Register indirect: the address is the contents of the register. For example:

       lwi r5, r1     # r5 <- Mem[r1]

   (b) Register autoincrement (ai): the address is the contents of the register; as part of this instruction, increment this register by the size of the data in bytes. Note that the memory address is the original value of the register before incrementing. For example:

       lwai r5, r1    # r5 <- Mem[r1]; r1 <- r1 + 4

2. There is a new 64-bit register called Acc, standing for accumulator.
3. There is a multiply accumulate instruction (MAC), which both adds the contents of Hi:Lo to Acc and multiplies two 32-bit registers and puts the 64-bit product into the existing registers Hi:Lo. For example:

       mac r3, r4     # Acc <- Acc + Hi:Lo; Hi:Lo <- r3 * r4

Putting these extensions together, the unrolled loop of the FIR filter looks like this (assume that Acc and Hi:Lo are initialized to 0):

    lwai r5, r1    # r5 <- Mem[r1]; r1 <- r1 + 4
    mac r2, r5     # Acc <- Acc + Hi:Lo; Hi:Lo <- r2 * r5
    lwai r5, r1    # r5 <- Mem[r1]; r1 <- r1 + 4
    mac r2, r5     # Acc <- Acc + Hi:Lo; Hi:Lo <- r2 * r5

...
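To make the semantics of `lwai` and `mac` concrete, here is a small Python sketch that interprets the two instructions as described above. It is a functional model only: it ignores the pipeline, treats all values as unsigned, and wraps Acc and Hi:Lo at 64 bits. The class name `DisplacementModel`, the memory contents, and the register values are hypothetical.

```python
MASK64 = (1 << 64) - 1


class DisplacementModel:
    """Functional (non-pipelined) model of lwai and mac."""

    def __init__(self, memory):
        self.mem = dict(memory)  # byte address -> 32-bit word
        self.reg = {}            # register name -> value
        self.acc = 0             # 64-bit accumulator
        self.hilo = 0            # 64-bit Hi:Lo pair, held as one integer

    def lwai(self, rd, rs):
        # rd <- Mem[rs]; rs <- rs + 4 (address is the value before increment)
        addr = self.reg[rs]
        self.reg[rd] = self.mem[addr]
        self.reg[rs] = addr + 4

    def mac(self, ra, rb):
        # Acc <- Acc + Hi:Lo uses the OLD product; then Hi:Lo <- ra * rb
        self.acc = (self.acc + self.hilo) & MASK64
        self.hilo = (self.reg[ra] * self.reg[rb]) & MASK64


# Hypothetical data: three samples at 0x100 and a coefficient of 10 in r2.
m = DisplacementModel({0x100: 1, 0x104: 2, 0x108: 3})
m.reg["r1"] = 0x100
m.reg["r2"] = 10
for _ in range(3):
    m.lwai("r5", "r1")
    m.mac("r2", "r5")
print(m.acc, m.hilo, hex(m.reg["r1"]))
```

After three iterations Acc holds 10 + 20 = 30 while the third product (30) is still waiting in Hi:Lo, which shows that the accumulation lags the multiplication by one `mac`.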

Since the memory accesses are based on the contents of registers only, the designers of DiSPlacement decided to change the 5-stage pipeline by swapping the EX and MEM stages:

1. Instruction Fetch
2. Instruction Decode/Register Fetch
3. Memory Access
4. Execute
5. Write Back

Assume that the execute stage has a 1 clock cycle multiplier and that the ALU can perform 64-bit additions. The figure on the next page shows the modified pipeline datapath.

[Figure placeholder: DiSPlacement pipeline datapath]

(e) Arithmetic-logical then Arithmetic-logical

(f) Arithmetic-logical then Store

(g) Arithmetic-logical then Branch

Question 3 (cont)

b) Remove as many of these hazards as you can, but you are limited to changes in the datapath from the following list:

1. Change the number of read or write ports on the register file;
2. Add one more adder (of whatever width you need);
3. Add multiplexors to the inputs of memory, multipliers, ALUs, or adders.

Do not worry about the control of any changes. In the table below, list the original hazard, hardware changes, and why the change resolves the hazard.

| Hazard | Hardware Changes | Why It Resolves |
|--------|------------------|-----------------|
|        |                  |                 |

Question 4

The I/O bus and memory system of a computer are capable of sustaining 1000 MB/s without interfering with the performance of a 700-MIPS CPU (costing $50,000). This system will be used as a transaction processing (TP) system. TP involves many relatively small changes (transactions) to a large body of shared information (the database account file). For example, airline reservation systems as well as banks are traditional customers for TP. Here are the assumptions about the software on the system that will execute a TP benchmark:

• Each transaction requires 2 disk reads plus 2 disk writes.

• The operating system uses 50,000 instructions for each disk read or write.

• The database software executes 500,000 instructions to process a transaction.

• The amount of data transferred per transaction is 2048 bytes.

You have a choice of two different types of disks:

• A small disk (2.5") that stores 1000 MB and costs $60.

• A big disk (3.5") that stores 2500 MB and costs $150.

Either disk in the system can support on average 100 disk reads or writes per second.

You wish to evaluate different system configurations based on a transaction processing benchmark that uses a 20 GB database account file. Answer parts (a)-(e) based on this benchmark. Assume that the requests are spread evenly to all the disks, and that there is no waiting time due to busy disks. Show all work for all parts.

a) Complete the table below. "Number of Units" refers to the minimum number of that item required for each organization; "Demand per Transaction" refers to the demand (in MIPS, bytes, or I/Os) that each transaction places on that component; and "TP/s Limit" refers to the maximum number of transactions per second that each subsystem (processor, bus, or disks) could support.

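| Units      | Performance | Number of Units | Demand per Transaction | TP/s Limit |
|------------|-------------|-----------------|------------------------|------------|
| CPU        | 700 MIPS    | 1               | ____ MIPS              |            |
| Bus        | 1000 MB/s   | 1               | ____ bytes             |            |
| 2.5" disks | 100 IOs/s   |                 | ____ I/Os              |            |
| 3.5" disks | 100 IOs/s   |                 | ____ I/Os              |            |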

Question 4 (cont)

b) How many transactions per second are possible with each disk organization, assuming that each uses the minimum number of disks to hold the account file?

c) What is the system cost per transaction per second of each alternative for the benchmark?
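The bottleneck arithmetic behind Question 4 can be sketched in Python as follows. This is a hedged illustration rather than an official solution. It assumes decimal units (1 MB = 10^6 bytes, 20 GB = 20,000 MB), that system cost is the CPU plus the disks (the only priced components in the problem), and that throughput is limited by the slowest of CPU, bus, and disks. The function name `evaluate` and the constant names are hypothetical.

```python
import math

CPU_MIPS = 700
CPU_COST = 50_000                      # dollars
BUS_BYTES_PER_S = 1000 * 10**6         # assumes 1 MB = 10^6 bytes
IOS_PER_TXN = 2 + 2                    # 2 disk reads + 2 disk writes
INSTR_PER_TXN = 500_000 + IOS_PER_TXN * 50_000
BYTES_PER_TXN = 2048
DB_SIZE_MB = 20 * 1000                 # assumes 1 GB = 1000 MB
DISK_IOS_PER_S = 100


def evaluate(capacity_mb, unit_cost):
    """Return (number of disks, TP/s, dollars per TP/s) for one disk type."""
    n_disks = math.ceil(DB_SIZE_MB / capacity_mb)   # minimum to hold the file
    cpu_limit = CPU_MIPS * 1e6 / INSTR_PER_TXN
    bus_limit = BUS_BYTES_PER_S / BYTES_PER_TXN
    disk_limit = n_disks * DISK_IOS_PER_S / IOS_PER_TXN
    tps = min(cpu_limit, bus_limit, disk_limit)     # slowest subsystem wins
    system_cost = CPU_COST + n_disks * unit_cost
    return n_disks, tps, system_cost / tps


for label, (capacity_mb, unit_cost) in {
    '2.5"': (1000, 60),
    '3.5"': (2500, 150),
}.items():
    n, tps, cost_per_tps = evaluate(capacity_mb, unit_cost)
    print(f"{label}: {n} disks, {tps:.0f} TP/s, ${cost_per_tps:.2f} per TP/s")
```

Under these assumptions the disks, not the CPU or the bus, set the throughput limit in both organizations, so the configuration with more spindles delivers more transactions per second for the same total disk spend.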