










Midterm questions for a computer architecture and engineering course at the University of California, Berkeley, in the Electrical Engineering and Computer Sciences (EECS) division. The questions cover cache hierarchy, performance analysis, and processor design. Students are allowed to use a calculator and one double-sided page of notes. The questions ask for calculations, comparisons, and explanations.
University of California, Berkeley College of Engineering Computer Science Division | EECS
Fall 1997 D.A. Patterson
Midterm II, October 19, 1997. CS152 Computer Architecture and Engineering
You are allowed to use a calculator and one 8.5" x 11" double-sided page of notes. You have 3 hours. Good luck!
In addition to higher hardware costs, a larger cache can also have a longer access time. Beyond a certain size, an L1 cache will reduce the performance of the CPU. It is possible, however, to reduce the miss penalty without affecting cycle time. To do this, many modern computers use a second-level cache. The L2 cache is not inside the pipeline and is accessed only when there is a miss in L1. Main memory access is required only when there is a miss in L2.
For this question, we will not distinguish reads from writes. Both the L1 and L2 caches are synchronized to the CPU clock. For both L1 and L2, a miss is handled with an access to the next lower level in the memory hierarchy, and after the miss the request is handled exactly like a hit. For example, in a read, L1 will first update its contents with data from L2, and then pass it on to the CPU.
Definitions:
MR - Miss rate. Fraction of accesses to this cache that result in a miss.
HT - Hit time. Access time during a cache hit.
MP - Miss penalty. Additional access time incurred by a cache miss.
a) For a system with two levels of caching, L1 and L2, give the average access time of L1 in terms of $MR_{L1}$, $HT_{L1}$, $MR_{L2}$, $HT_{L2}$, and $MP_{L2}$, for the case where HT and MP are integral multiples of the CPU cycle time.
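One way to reason about part a) is to apply the single-level relation (average access time = HT + MR × MP) twice, treating the average access time of L2 as the miss penalty of L1. The Python sketch below illustrates that composition under this assumption; it is not the official solution, and the function names and the numeric values in the example are hypothetical.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Single-level relation: every access pays the hit time,
    and a fraction miss_rate of accesses also pays the miss penalty."""
    return hit_time + miss_rate * miss_penalty


def two_level_access_time(ht_l1, mr_l1, ht_l2, mr_l2, mp_l2):
    """Average access time seen at L1 when an L1 miss is served by L2.

    Assumption: the L1 miss penalty equals the average access time of L2,
    because after a miss the request is handled exactly like a hit.
    """
    l1_miss_penalty = average_access_time(ht_l2, mr_l2, mp_l2)
    return average_access_time(ht_l1, mr_l1, l1_miss_penalty)


# Hypothetical values, in CPU cycles (not taken from the exam).
example_cycles = two_level_access_time(
    ht_l1=1, mr_l1=0.05, ht_l2=10, mr_l2=0.2, mp_l2=100
)
print(f"average access time: {example_cycles:.2f} cycles")
```

Because HT and MP are integral multiples of the cycle time, every term can be expressed in cycles and the result converted to time by multiplying by the cycle time.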
c) An L2 cache was added to the system. Now is it a good idea to double the L1 cache from its original size? What will be the change in performance?
d) In general, you tend to care more about miss ratio for L1 caches and hit times for L2 caches. True or False?
e) Compare your answer to part b) with part c). Does the presence or absence of an L2 cache influence design decisions for L1? Explain why.
You have recently been recruited by InDec Corp. to work on a second version of their "Alphium" processor. Alphium is a simple RISC processor with 5 pipeline stages, just like the one presented in class. Its clock frequency is 200MHz and the chip power supply (Vdd) is 3.3V. For the highly appreciated MisSpec benchmark, Alphium achieves an IPC of 0.8 instructions per cycle ($IPC = \frac{1}{CPI}$).
Your assignment is to evaluate the improvement in execution time, power, and energy consumption for four proposed new versions of the Alphium processor. Here is a description of the alternatives:
Low Power Alphium (LPA): This would be the same design as the original Alphium but it would be clocked at 133MHz to save power. The power supply (Vdd) would remain 3.3V and the achieved IPC for MisSpec would be 0.8 again.
Superscalar Alphium (SSA): This would be a 4-way superscalar version of the Alphium. It would have the ability to issue up to 4 instructions per cycle. The power supply (Vdd) would remain 3.3V but the clock frequency would be reduced to 166MHz, due to the complexity of the issuing logic. The achieved IPC for MisSpec would be 2. Assume that the effective capacitance switched in the superscalar design would be 4 times that of the original.
Low Voltage Alphium (LVA): This would be the same design as the original Alphium again but both the power supply and the clock frequency would be reduced. Power supply (Vdd) would be 2V and clock frequency would be 133MHz. The IPC for MisSpec would be 0.8 once again.
Low Voltage Superscalar Alphium (LVSSA): This would be a 4-way superscalar version with reduced power supply. The clock frequency would be 100MHz, the power supply (Vdd) would be 2V and an IPC of 2 would be achieved for MisSpec. Assume that the effective capacitance switched in the superscalar design would be 4 times that of the original.
Here is some information that you may find useful:
Power: When we measure power for a system, we care about the maximum instantaneous power the system can consume. This is important as it determines the maximum current that the power supply must be able to supply to the system and the amount of heat that has to be removed from the system.
Energy: $E = C V_{dd}^2$ is just the energy per transaction. This is not interesting. We care about the energy consumed from the power supply to execute a task (or perform some computation). Once the task is executed, the processor can be turned off and no further energy is needed. The energy per task determines how many tasks you can execute before the battery runs out.
b) Fill in the two following tables. In each box, write how the proposed new version compares to the original Alphium for that feature (e.g. $\frac{ExecTime_{new}}{ExecTime_{original}}$, $\frac{Power_{new}}{Power_{original}}$, etc.). Two fractional digits per entry are enough. Use the following (blank) page as scratch paper.
Version   IPC   Freq     Vdd    Relative ExecTime   Relative Power   Relative Energy
LPA       0.8   133MHz   3.3V
SSA       2.0   166MHz   3.3V
LVA       0.8   133MHz   2.0V
LVSSA     2.0   100MHz   2.0V
Version   IPC   Freq     Vdd    Relative Performance/Power   Relative Performance/Energy
LPA       0.8   133MHz   3.3V
SSA       2.0   166MHz   3.3V
LVA       0.8   133MHz   2.0V
LVSSA     2.0   100MHz   2.0V
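The ratios asked for in the tables can be explored with a short Python sketch. It rests on assumptions that the exam text does not state outright: the instruction count is the same for every version, so execution time scales as 1 / (IPC × frequency); power scales as C × Vdd² × f, which is the given energy per transaction multiplied by the clock frequency; and energy per task is power × execution time. The class and function names are hypothetical, and this is an illustration rather than the official solution.

```python
from dataclasses import dataclass


@dataclass
class Design:
    ipc: float
    freq_mhz: float
    vdd: float
    cap: float = 1.0  # effective switched capacitance, relative to the original


ORIGINAL = Design(ipc=0.8, freq_mhz=200, vdd=3.3)
ALTERNATIVES = {
    "LPA": Design(ipc=0.8, freq_mhz=133, vdd=3.3),
    "SSA": Design(ipc=2.0, freq_mhz=166, vdd=3.3, cap=4.0),
    "LVA": Design(ipc=0.8, freq_mhz=133, vdd=2.0),
    "LVSSA": Design(ipc=2.0, freq_mhz=100, vdd=2.0, cap=4.0),
}


def relative_metrics(new, old=ORIGINAL):
    """Return (exec time, power, energy) of `new` relative to `old`."""
    # Assumed: same instruction count, so time ~ 1 / (IPC * f).
    exec_time = (old.ipc * old.freq_mhz) / (new.ipc * new.freq_mhz)
    # Assumed power model: P ~ C * Vdd^2 * f.
    power = (new.cap * new.vdd ** 2 * new.freq_mhz) / (
        old.cap * old.vdd ** 2 * old.freq_mhz
    )
    # Assumed: energy per task = power * execution time.
    energy = power * exec_time
    return exec_time, power, energy


for name, design in ALTERNATIVES.items():
    t, p, e = relative_metrics(design)
    # Relative performance is taken as 1 / relative execution time.
    print(
        f"{name:6s} time={t:.2f} power={p:.2f} energy={e:.2f} "
        f"perf/power={1 / (t * p):.2f} perf/energy={1 / (t * e):.2f}"
    )
```

Under this model, the frequency cancels out of energy per task, so only capacitance, Vdd, and IPC affect the energy column.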
The DiSPlacement is a hypothetical DSP variation of the MIPS architecture. Here are the 3 changes from MIPS:
(a) Register indirect: the address is the contents of the register. For example:
    lwi r5, r1      # r5 ← Mem[r1]
(b) Register autoincrement (ai): the address is the contents of the register; as part of this instruction, increment this register by the size of the data in bytes. Note that the memory address is the original value of the register before incrementing. For example:
    lwai r5, r1     # r5 ← Mem[r1]; r1 ← r1 + 4
Putting these extensions together, the unrolled loop of the FIR filter looks like this (assume that Acc and Hi:Lo are initialized to 0):
    lwai r5, r1     # r5 ← Mem[r1]; r1 ← r1 + 4
    mac  r2, r5     # Acc ← Acc + Hi:Lo; Hi:Lo ← r2 × r5
    lwai r5, r1     # r5 ← Mem[r1]; r1 ← r1 + 4
    mac  r2, r5     # Acc ← Acc + Hi:Lo; Hi:Lo ← r2 × r5
...
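To make the instruction semantics concrete, here is a minimal Python sketch of the two instructions used in the loop. It assumes that `mac` first adds the previous Hi:Lo value to Acc and then loads Hi:Lo with the product of its two operand registers; the product operands are partly illegible in the source, so that reading is an assumption. The `Machine` class and the memory contents are hypothetical.

```python
class Machine:
    """Tiny model of the registers touched by lwai and mac."""

    def __init__(self, memory, registers):
        self.mem = dict(memory)     # byte address -> word value
        self.reg = dict(registers)  # register name -> value
        self.acc = 0                # Acc, assumed initialized to 0
        self.hilo = 0               # Hi:Lo, assumed initialized to 0

    def lwai(self, rd, rs):
        # Load from the address held in rs *before* incrementing it,
        # then advance rs by the word size (4 bytes).
        address = self.reg[rs]
        self.reg[rs] = address + 4
        self.reg[rd] = self.mem[address]

    def mac(self, ra, rb):
        # Accumulate the previous product, then compute the new one
        # (assumed operands: the two named registers).
        self.acc += self.hilo
        self.hilo = self.reg[ra] * self.reg[rb]


# Hypothetical data: three samples in memory and a constant coefficient in r2.
m = Machine(memory={0: 1, 4: 2, 8: 3}, registers={"r1": 0, "r2": 10})
for _ in range(3):
    m.lwai("r5", "r1")
    m.mac("r2", "r5")
print(m.acc, m.hilo, m.reg["r1"])
```

With these semantics, Acc always lags one product behind: the last product is still in Hi:Lo after the final `mac`, so one more accumulate step is needed to fold it in.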
Since the memory accesses are based on the contents of registers only, the designers of DiSPlacement decided to change the 5-stage pipeline by swapping the EX and MEM stages:
Assume that the execute stage has a 1 clock cycle multiplier and that the ALU can perform 64-bit additions. The figure on the next page shows the modified pipeline datapath.
[Figure: DiSPlacement pipeline datapath]
(e) Arithmetic-logical then Arithmetic-logical
(f) Arithmetic-logical then Store
(g) Arithmetic-logical then Branch
b) Remove as many of these hazards as you can, but you are limited to changes in the datapath from the following list:
Do not worry about the control of any changes. In the table below, list the original hazard, hardware changes, and why the change resolves the hazard.
Hazard          Hardware Changes          Why It Resolves
The I/O bus and memory system of a computer are capable of sustaining 1000 MB/s without interfering with the performance of a 700-MIPS CPU (costing $50,000). This system will be used as a transaction processing (TP) system. TP involves many relatively small changes (transactions) to a large body of shared information (the database account file). For example, airline reservation systems as well as banks are traditional customers for TP. Here are the assumptions about the software on the system that will execute a TP benchmark:
Each transaction requires 2 disk reads plus 2 disk writes.
The operating system uses 50,000 instructions for each disk read or write.
The database software executes 500,000 instructions to process a transaction.
The amount of data transferred per transaction is 2048 bytes.
You have a choice of two different types of disks:
A small disk (2.5") that stores 1000 MB and costs $60.
A big disk (3.5") that stores 2500 MB and costs $150.
Either disk in the system can support on average 100 disk reads or writes per second.
You wish to evaluate different system configurations based on a transaction processing benchmark that uses a 20 GB database account file. Answer parts (a)-(e) based on this benchmark. Assume that the requests are spread evenly to all the disks, and that there is no waiting time due to busy disks. Show all work for all parts.
a) Complete the table below. "Number of Units" refers to the minimum number of that item required for each organization; "Demand per Transaction" refers to the demand (in MIPS, bytes, or I/Os) that each transaction places on that component; and "TP/s Limit" refers to the maximum number of transactions per second that each subsystem (processor, bus, or disks) could support.
Units        Performance   Number of Units   Demand per Transaction   TP/s Limit
Bus          1000 MB/s     1                         bytes
2.5" disks   100 IOs/s                               I/Os
3.5" disks   100 IOs/s                               I/Os
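The table entries follow from dividing each subsystem's capacity by the demand one transaction places on it. The Python sketch below works through that arithmetic under two labeled assumptions: 1 MB is taken as 10^6 bytes and 1 GB as 1000 MB. The constant and function names are hypothetical, and the sketch is an illustration rather than the official solution.

```python
import math

CPU_MIPS = 700
BUS_MB_PER_S = 1000
IOS_PER_DISK_PER_S = 100
DATABASE_MB = 20 * 1000  # assumes 1 GB = 1000 MB

IOS_PER_TRANSACTION = 2 + 2        # 2 disk reads plus 2 disk writes
OS_INSTR_PER_IO = 50_000
DB_INSTR_PER_TRANSACTION = 500_000
BYTES_PER_TRANSACTION = 2048

# Processor: instructions executed per transaction, then transactions/s.
instr_per_transaction = (
    IOS_PER_TRANSACTION * OS_INSTR_PER_IO + DB_INSTR_PER_TRANSACTION
)
cpu_limit = CPU_MIPS * 1e6 / instr_per_transaction

# Bus: assumes 1 MB = 10**6 bytes.
bus_limit = BUS_MB_PER_S * 1e6 / BYTES_PER_TRANSACTION


def disk_limit(capacity_mb):
    """Minimum disk count to hold the database, and the TP/s those disks allow."""
    count = math.ceil(DATABASE_MB / capacity_mb)
    return count, count * IOS_PER_DISK_PER_S / IOS_PER_TRANSACTION


small_count, small_limit = disk_limit(1000)  # 2.5" disks
big_count, big_limit = disk_limit(2500)      # 3.5" disks

print(f"CPU limit: {cpu_limit:.0f} TP/s, bus limit: {bus_limit:.0f} TP/s")
print(f'2.5" disks: {small_count} units, {small_limit:.0f} TP/s')
print(f'3.5" disks: {big_count} units, {big_limit:.0f} TP/s')
```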
b) How many transactions per second are possible with each disk organization, assuming that each uses the minimum number of disks to hold the account file?
c) What is the system cost per transaction per second of each alternative for the benchmark?
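For parts b) and c), the achievable rate is set by the slowest subsystem, and the cost figure divides the system price by that rate. The sketch below assumes the system cost is the CPU price plus the disks only, that 20 and 8 disks are needed (20 GB taken as 20,000 MB), and that the processor and bus limits are 700 MIPS / 700,000 instructions and 10^9 bytes/s / 2048 bytes respectively. The names are hypothetical and this is not the official solution.

```python
CPU_COST = 50_000
CPU_LIMIT_TPS = 700e6 / (4 * 50_000 + 500_000)  # instructions/s over instructions/transaction
BUS_LIMIT_TPS = 1000e6 / 2048                   # assumes 1 MB = 10**6 bytes


def evaluate(disk_count, disk_price, ios_per_disk=100, ios_per_transaction=4):
    """Return (transactions/s, dollars per transaction/s) for one disk organization."""
    disk_limit_tps = disk_count * ios_per_disk / ios_per_transaction
    # The system runs at the rate of its bottleneck subsystem.
    tps = min(CPU_LIMIT_TPS, BUS_LIMIT_TPS, disk_limit_tps)
    # Assumed cost model: CPU plus disks only.
    system_cost = CPU_COST + disk_count * disk_price
    return tps, system_cost / tps


small_tps, small_cost_per_tps = evaluate(disk_count=20, disk_price=60)
big_tps, big_cost_per_tps = evaluate(disk_count=8, disk_price=150)

print(f'2.5" disks: {small_tps:.0f} TP/s at ${small_cost_per_tps:.2f} per TP/s')
print(f'3.5" disks: {big_tps:.0f} TP/s at ${big_cost_per_tps:.2f} per TP/s')
```

Both organizations come to the same total price under these assumptions, so the difference in cost per TP/s comes entirely from the number of disk arms available to serve I/Os.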