University of California
College of Engineering
Computer Science Division -EECS
Sp 1996 D.E. Culler
CS 152 Midterm I
Your Name:________SOLUTION_______________
ID Number:_______________________________________________________
Discussion Section:__________________________________________________
You may bring two pages of notes and you may use a calculator, but no book or computer.
Please print your name clearly on the cover sheet and on every page. The point value of
each question is indicated in brackets. There are a total of 120 points. You have 170 minutes.
Show your work. Write neatly and be well organized. It never hurts to make it easy to
grade.
Good luck.
Problem Possible Score
Total 120
Problem 1 (40 points)
1a [3] State the five major components of a computer.
Processor datapath
Processor Control
Memory
Input
Output
1b [5] State five major distinct issues that must be addressed in an instruction set architecture.
programmable storage
data types and encodings
set of operations
instruction formats
number of operands
where besides memory can operands be located
how memory operands are specified (addressing modes)
1c [2] Define Little Endian.
word is addressed by the byte address of the least significant byte (least
significant byte is at lowest address in the word.)
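To make the definition concrete, here is a small C sketch. The helper names `store_le32` and `host_is_little_endian` are hypothetical, not from the exam: the first lays out a word in little-endian order regardless of the host, and the second inspects the byte at the lowest address of a word on the machine running the code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write w so that its least significant byte is at out[0] (the lowest
   address) and its most significant byte at out[3]. Host-independent. */
void store_le32(uint32_t w, unsigned char out[4])
{
    assert(out != NULL);
    for (int k = 0; k < 4; k++) {
        out[k] = (unsigned char)(w >> (8 * k));   /* bits 8k..8k+7 */
    }
}

/* Return 1 if the machine running this code is little endian: copy a
   known word into raw bytes and look at the lowest-addressed one. */
int host_is_little_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char bytes[sizeof probe];
    memcpy(bytes, &probe, sizeof probe);
    return bytes[0] == 0x04;
}
```

For the word 0x01020304, a little-endian layout puts 0x04 at the lowest address and 0x01 at the highest.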
1d[3] Decode the following MIPS instruction using the opcode encoding table at the end
of the exam (Fig A.18) 10001100111001110000000000000111. Give its RTL (register
transfer language) meaning.
lw $7, 7($7)     R[7] <- mem(R[7] + 7)
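The decoding can be checked mechanically. The sketch below (hypothetical names `decode_iformat` and `encode_iformat`) splits a 32-bit word into the I-format fields op | rs | rt | Imm16 and sign-extends the immediate as lw does; it assumes the standard MIPS field positions rather than reading them from Fig A.18.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a MIPS I-format instruction: op | rs | rt | Imm16. */
struct iformat {
    unsigned op;   /* bits 31..26 */
    unsigned rs;   /* bits 25..21 */
    unsigned rt;   /* bits 20..16 */
    int32_t imm;   /* bits 15..0, sign-extended as lw does */
};

struct iformat decode_iformat(uint32_t inst)
{
    struct iformat f;
    uint32_t low = inst & 0xFFFFu;
    f.op = (inst >> 26) & 0x3Fu;
    f.rs = (inst >> 21) & 0x1Fu;
    f.rt = (inst >> 16) & 0x1Fu;
    /* sign extension: bit 15 set means the 16-bit value is negative */
    f.imm = (low & 0x8000u) ? (int32_t)low - 0x10000 : (int32_t)low;
    return f;
}

/* Inverse of decode_iformat, for checking a decoding by re-encoding it. */
uint32_t encode_iformat(struct iformat f)
{
    assert(f.op < 64 && f.rs < 32 && f.rt < 32);
    assert(f.imm >= -32768 && f.imm <= 32767);
    return ((uint32_t)f.op << 26) | ((uint32_t)f.rs << 21) |
           ((uint32_t)f.rt << 16) | ((uint32_t)f.imm & 0xFFFFu);
}
```

Decoding 0x8CE70007 (the bit pattern in the question) gives op = 35 (lw), rs = 7, rt = 7, imm = 7, which matches lw $7, 7($7).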
1e [3] What is the value of 1000 1100 1110 0111 0000 0000 0000 0111 as a 32-bit 2's complement number?
The number is negative (sign bit is 1), so complement and add 1 to get its magnitude: 0111 0011 0001 1000 1111 1111 1111 1001
-(2^30 + 2^29 + 2^28 + 2^25 + 2^24 + 2^20 + 2^19 + 2^16 - 7) = -1931018233
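The same value can be computed directly from the weights of the bits: in two's complement the sign bit carries weight -2^31 and the other bits their usual positive weights. This C sketch (hypothetical function names) does that in 64-bit arithmetic, which avoids relying on implementation-defined unsigned-to-signed conversion, and also shows the "complement and add 1" step used above.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a 32-bit two's complement pattern: bits 30..0 carry their usual
   weights and the sign bit carries weight -2^31. */
int64_t twos_complement_value(uint32_t bits)
{
    int64_t v = (int64_t)(bits & 0x7FFFFFFFu);
    if (bits & 0x80000000u) {
        v -= (int64_t)1 << 31;
    }
    assert(v >= INT32_MIN && v <= INT32_MAX);   /* always fits in 32 bits */
    return v;
}

/* "Complement and add 1": for a negative pattern this yields the
   magnitude, as in the solution above. */
uint32_t negate_bits(uint32_t bits)
{
    return (uint32_t)(~bits + 1u);
}
```

For 0x8CE70007, negate_bits gives 0x7318FFF9 and twos_complement_value gives -1931018233.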
1f [3] What is the value of 10001100111001110000000000000111 as a single-precision
IEEE floating-point number?
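For 1f, the same bit pattern can be split into IEEE 754 single-precision fields and evaluated as (-1)^S x (1 + F/2^23) x 2^(E-127). The C sketch below does this for normalized numbers only (zero, denormals, infinities and NaNs are not handled); the function names are hypothetical, and `bits_to_float` assumes the host's float is IEEE 754 single precision, as on common hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IEEE 754 single-precision fields of a 32-bit pattern. */
struct ieee_single {
    unsigned sign;      /* bit 31 */
    unsigned exponent;  /* bits 30..23, biased by 127 */
    uint32_t fraction;  /* bits 22..0; the hidden leading 1 is not stored */
};

struct ieee_single split_single(uint32_t bits)
{
    struct ieee_single s;
    s.sign = bits >> 31;
    s.exponent = (bits >> 23) & 0xFFu;
    s.fraction = bits & 0x7FFFFFu;
    return s;
}

/* (-1)^S * (1 + F / 2^23) * 2^(E - 127) for normalized numbers
   (1 <= E <= 254). Every step is exact in double precision. */
double normalized_value(struct ieee_single s)
{
    assert(s.exponent >= 1 && s.exponent <= 254);
    double v = 1.0 + (double)s.fraction / 8388608.0;   /* 8388608 = 2^23 */
    for (int e = (int)s.exponent - 127; e > 0; e--) v *= 2.0;
    for (int e = (int)s.exponent - 127; e < 0; e++) v /= 2.0;
    return s.sign ? -v : v;
}

/* Reinterpret the same 32 bits as the host's float. */
float bits_to_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For 0x8CE70007 the fields are S = 1, E = 25 (unbiased exponent 25 - 127 = -102) and F = 0x670007.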
1k [5] Assume the NAND gate has the following characteristics: input load = 100 fF,
propagation delay low-to-high TPlh = 0.5 ns, high-to-low TPhl = 0.1 ns, load-dependent delays
TPlhf = 0.002 ns/fF and TPhlf = 0.002 ns/fF. Identify the critical path in the following cell and fully
characterize it using the linear delay model.
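Input load is 100 fF on A and B.
TPlhf = 0.002 ns/fF, TPhlf = 0.002 ns/fF (the output stage is a NAND).
The fixed internal delay is a little weird because, in one case, the output ends up high after a
glitch. The important part was understanding the calculation: the first NAND on the critical path
drives three NAND inputs (3 x 100 fF), the second drives one (1 x 100 fF), and the output NAND
contributes only its fixed delay:
TPcell = (TPnand + 3 x 100 fF x TPfnand) + (TPnand + 1 x 100 fF x TPfnand) + TPnand
This gives
TPlh = (0.5 + 0.6) + (0.1 + 0.2) + 0.5 = 1.9 ns (or 0, since there is no change on the output)
TPhl = (0.1 + 0.6) + (0.5 + 0.2) + 0.1 = 1.5 ns (the output glitches, then settles)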
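The arithmetic above is the linear delay model applied stage by stage. A minimal C sketch, assuming the fanouts the solution implies (3 NAND inputs, then 1, then none counted for the output stage), reproduces both path delays; `gate_delay` and `path_delay` are hypothetical names. Because each NAND inverts, the transition alternates along the path, which is why the fixed delays alternate between 0.5 and 0.1 ns.

```c
#include <assert.h>

#define NAND_INPUT_LOAD_FF 100.0   /* input load of one NAND input, fF */
#define NAND_LOAD_FACTOR   0.002   /* TPlhf = TPhlf, ns per fF */

/* Linear delay model for one gate: fixed internal delay plus
   load-dependent factor times the capacitance being driven. */
double gate_delay(double internal_ns, double factor_ns_per_fF, double load_fF)
{
    assert(load_fF >= 0.0);
    return internal_ns + factor_ns_per_fF * load_fF;
}

/* Critical path of three NANDs; internal_ns[k] is the fixed delay of
   stage k for the transition that stage makes. */
double path_delay(const double internal_ns[3])
{
    const int fanout[3] = {3, 1, 0};   /* NAND inputs driven by each stage */
    double total = 0.0;
    for (int k = 0; k < 3; k++) {
        total += gate_delay(internal_ns[k], NAND_LOAD_FACTOR,
                            fanout[k] * NAND_INPUT_LOAD_FF);
    }
    return total;
}
```

With fixed delays {0.5, 0.1, 0.5} the path takes 1.9 ns and with {0.1, 0.5, 0.1} it takes 1.5 ns, up to floating-point rounding.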
1l [3] Give the definition of speedup due to an enhancement.
Speedup with E = (Time without E) / (Time with E)
= (Performance with E) / (Performance without E)
[Figure for 1k: cell built from NAND gates, with inputs A and B and output Q]
Problem 2 (20 points).
This problem looks at performance and cost in the real world. The Feb 20, 1996 issue of
PC Magazine provides the following data in its "Pentium or Pro?" cover story.
CPUMark 32 and CPUMark 16 are indicators of performance (speed) similar to SPECMarks
(bigger is better) for 32 and 16 bit programs.
2a. [3] Assuming CPUmarks are indicative of performance on real programs on these
machines, how much faster is the Pentium Pro on 32 bit code? Show your work.
(a) 0.
(b) 0.
(c) 1.
(d) 1.
2b. [3] Assuming CPUmarks are indicative of performance on real programs on these
machines, how much faster is the Pentium Pro on 16bit code? Show your work.
(a) 0.
(b) 0.
(c) 1.
(d) 1.
2c. [2] How much faster in performance per dollar is the Pro on 32 and 16 bit code?
32 bit: (430/7800) / (273/3750) = 0.76, so the Pro delivers only about 0.76 times the Pentium's performance per dollar
16 bit: (270/7800) / (276/3750) = 0.47, only about 0.47 times
Pretty sad, isn't it.
                      Pentium (P5)    Pentium Pro (P6)
Clock Rate            150 MHz         150 MHz
Transistor Count      3.3 M           5.5 M
Ave. System Price     $3,750          $7,800
CPUMark 32            273             430
CPUMark 16            276             270
32 bit: Speedup = Performance Pro / Performance P5 = 430/273 ≈ 1.58
16 bit: Speedup = Performance Pro / Performance P5 = 270/276 ≈ 0.98
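The speedups and performance-per-dollar ratios for Problem 2 can be reproduced with a few lines of C from the CPUMark and price figures in the table; `speedup` and `perf_per_dollar_ratio` are hypothetical names, and the first simply restates the definition from 1l.

```c
#include <assert.h>

/* Speedup = performance with the change / performance without it. */
double speedup(double perf_new, double perf_old)
{
    assert(perf_old > 0.0);
    return perf_new / perf_old;
}

/* Performance per dollar of the new machine relative to the old one. */
double perf_per_dollar_ratio(double perf_new, double price_new,
                             double perf_old, double price_old)
{
    assert(price_new > 0.0 && price_old > 0.0);
    return speedup(perf_new / price_new, perf_old / price_old);
}
```

Calling speedup(430, 273) gives about 1.575 and perf_per_dollar_ratio(430, 7800, 273, 3750) about 0.757; a ratio below 1 means the more expensive machine is worse per dollar even when it is faster.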
Problem 3. (20 points) Complete the skeleton of MIPS assembly language (with delayed
branches) below for the following C function. (Underlines may not be exactly right.)
extern int f(int);

int foo(int *A, int n)
{
    int i = 0;
    int sum = 0;
    for (i = 0; i < n; i++) {
        sum = sum + f(A[i+1]);
    }
    return(sum);
}
foo:    subu  $sp,$sp,
        sw    $31,32($sp)
        sw    $19,28($sp)      # save $19; A lives in $19
        sw    $18,24($sp)      # save $18; n lives in $18
        sw    $17,20($sp)      # save $17; i lives in $17
        sw    $16,16($sp)      # save $16; sum lives in $16
        move  $16,$0           # sum = 0 (2 pts for initialization)
        move  $17,$0           # i = 0
        slt   $2,$17,$5        # i=0 < n (1 pt)
        beq   $2,$0,$L3        # delayed branch to fall-through
        move  $19,$4           # (delay slot) setup A (2 pts)
        move  $18,$5           # setup n
$L7:                           # top of loop
        addi  $4,$17,1         # i+1 (2 pts)
        sll   $4,$4,2          # convert index i+1 to byte offset (2 pts)
        addu  $4,$19,$4        # &A[i+1] (2 pts)
        lw    $4,0($4)         # fetch A[i+1] into argument register (2 pts)
        jal   f                # delayed jump and link
        addi  $17,$17,1        # (delay slot) i++; $17 is preserved across the call
        addu  $16,$16,$2       # accumulate return value into sum (2 pts)
        slt   $2,$17,$18       # i < n (2 pts)
        bne   $2,$0,$L7        # delayed branch to top of loop
        nop                    # (delay slot)
$L3:                           # fall-through
        move  $2,$16           # return sum (1 pt)
        lw    $31,32($sp)
        lw    $19,28($sp)      # (2 pts)
        lw    $18,24($sp)
        lw    $17,20($sp)
        lw    $16,16($sp)
        jr    $31              # delayed return jump
        addu  $sp,$sp,         # (delay slot) pop the frame
        .end  foo
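One way to test a hand translation is to run the C source with a known f and compare results. In this sketch, `f(x) = 2x` and the checked wrapper `run_foo` are hypothetical stand-ins (the exam leaves f external). Note that foo reads A[1] through A[n], so the array must hold n + 1 elements.

```c
#include <assert.h>

/* Hypothetical stand-in for the external f; the exam does not define it. */
int f(int x)
{
    return 2 * x;
}

/* The C function from Problem 3. */
int foo(int *A, int n)
{
    int i = 0;
    int sum = 0;
    for (i = 0; i < n; i++) {
        sum = sum + f(A[i + 1]);
    }
    return sum;
}

/* Hypothetical checked wrapper: foo reads A[1]..A[n], so an array of
   len elements must satisfy n + 1 <= len. */
int run_foo(int *A, int len, int n)
{
    assert(n <= 0 || n + 1 <= len);
    return foo(A, n);
}
```

With A = {100, 1, 2, 3} and n = 3, foo returns f(1) + f(2) + f(3) = 12; A[0] is never read. An off-by-one in the loop test of the assembly shows up immediately as a different sum.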
Problem 4 (20 points):
The single-cycle processor developed in class (shown below, and very similar to the one
in the book) supports the following instructions. (Note that, as in the virtual machine, the
branch is not delayed.)
Consider adding the following instructions to our subset: ADDIU, OR, AND, BLTZAL
(branch on less than zero and link). On the following pages, write the register transfers
for the new instructions. Sketch the modifications to the datapath and specify the control
points for each of the new instructions.
op | rs | rt | rd | shamt | funct = MEM[PC]     (R-format)
op | rs | rt | Imm16 = MEM[PC]                  (I-format)

Inst    Register Transfers
ADDU    R[rd] <- R[rs] + R[rt];                   PC <- PC + 4
SUBU    R[rd] <- R[rs] - R[rt];                   PC <- PC + 4
ORi     R[rt] <- R[rs] OR zero_ext(Imm16);        PC <- PC + 4
LOAD    R[rt] <- MEM[R[rs] + sign_ext(Imm16)];    PC <- PC + 4
STORE   MEM[R[rs] + sign_ext(Imm16)] <- R[rt];    PC <- PC + 4
BEQ     if (R[rs] == R[rt]) then PC <- PC + 4 + (sign_ext(Imm16) || 00)
        else PC <- PC + 4
[Figure: single-cycle datapath: instruction fetch unit (PC, Inst Memory, adders for PC + 4 and the branch target, nPC_sel); register file of 32 32-bit registers (Ra = Rs, Rb = Rt, Rw = Rt or Rd via RegDst, busA, busB, busW, RegWr); Extender (ExtOp); ALUSrc mux; ALU (ALUctr, Equal); Data Memory (Adr, Data In, MemWr); MemtoReg mux]
Problem 4 (cont)
TABLE 1.
        Ext   ALUsrc  ALUCtr  MemWr  Mem2Reg  PC2Reg  RegDst  RegWr  nPC
ADDIU   sign  1       add     0      0        0       0       1      0
OR      x     0       OR      0      0        0       1       1      0
AND     x     0       AND     0      0        0       1       1      0
BLTZAL  x     x       x       0      x        1       2       1      LT
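The register transfers for the new instructions can be written as a small behavioral model. This C sketch is an illustration under stated assumptions, not the datapath: it uses standard MIPS semantics, so ADDIU sign-extends its immediate and never traps, and BLTZAL writes the link register $31 whether or not the branch is taken (consistent with RegWr = 1 for BLTZAL in Table 1). Because the branch is not delayed here, the link value is PC + 4 (real MIPS, with a delay slot, links PC + 8). The names `struct cpu`, `exec_addiu`, and so on are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical machine state: program counter and register file. */
struct cpu {
    uint32_t pc;
    uint32_t r[32];
};

static uint32_t read_reg(const struct cpu *c, unsigned rs)
{
    assert(rs < 32);
    return c->r[rs];
}

/* Register writes to $0 are discarded, as in MIPS. */
static void write_reg(struct cpu *c, unsigned rd, uint32_t value)
{
    assert(rd < 32);
    if (rd != 0) {
        c->r[rd] = value;
    }
}

static uint32_t sign_ext16(uint16_t imm)
{
    return (imm & 0x8000u) ? (0xFFFF0000u | imm) : (uint32_t)imm;
}

/* ADDIU: R[rt] <- R[rs] + sign_ext(Imm16); PC <- PC + 4 */
void exec_addiu(struct cpu *c, unsigned rs, unsigned rt, uint16_t imm)
{
    write_reg(c, rt, read_reg(c, rs) + sign_ext16(imm));
    c->pc += 4;
}

/* OR: R[rd] <- R[rs] | R[rt]; PC <- PC + 4 */
void exec_or(struct cpu *c, unsigned rs, unsigned rt, unsigned rd)
{
    write_reg(c, rd, read_reg(c, rs) | read_reg(c, rt));
    c->pc += 4;
}

/* AND: R[rd] <- R[rs] & R[rt]; PC <- PC + 4 */
void exec_and(struct cpu *c, unsigned rs, unsigned rt, unsigned rd)
{
    write_reg(c, rd, read_reg(c, rs) & read_reg(c, rt));
    c->pc += 4;
}

/* BLTZAL: R[31] <- PC + 4;
           if (R[rs] < 0) PC <- PC + 4 + (sign_ext(Imm16) || 00)
           else           PC <- PC + 4 */
void exec_bltzal(struct cpu *c, unsigned rs, uint16_t imm)
{
    int negative = (read_reg(c, rs) & 0x80000000u) != 0;  /* read before linking */
    uint32_t next = c->pc + 4;
    write_reg(c, 31, next);
    c->pc = negative ? next + (sign_ext16(imm) << 2) : next;
}
```

Reading R[rs] before writing $31 keeps the sketch well defined even when rs is 31.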
Problem 5 (20 points)
5a [7] Write a MIPS subroutine to perform an unsigned 64-bit subtraction. The operands
are passed in registers A1:A0 and A3:A2 with the MSW in the higher numbered register.
The result should be returned in registers v1:v0 with the same convention. Explain on the
reverse side why your code works.
__________________________________________________________________
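/* v1:v0 = a1:a0 - a3:a2 */
/* 4 instructions/cycles, plus the return */
        sltu  $t0, $a0, $a2     # t0 = borrow out of the low word (a0 < a2, unsigned)
        subu  $v0, $a0, $a2     # low word of the result
        subu  $v1, $a1, $a3     # high word, before the borrow
        subu  $v1, $v1, $t0     # subtract the borrow
        jr    $ra               # return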
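The borrow logic can be checked against native 64-bit arithmetic. This C sketch (hypothetical name `sub64`) mirrors the four instructions one-to-one on 32-bit halves: the unsigned comparison plays the role of sltu, and unsigned wraparound in C matches subu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* v1:v0 = a1:a0 - a3:a2, with each 64-bit value held as hi:lo halves. */
void sub64(uint32_t a1, uint32_t a0, uint32_t a3, uint32_t a2,
           uint32_t *v1, uint32_t *v0)
{
    assert(v1 != NULL && v0 != NULL);
    uint32_t t0 = (a0 < a2);   /* sltu: 1 if the low word must borrow */
    *v0 = a0 - a2;             /* subu: low word, modulo 2^32 */
    uint32_t hi = a1 - a3;     /* subu: high words */
    *v1 = hi - t0;             /* subu: propagate the borrow */
}
```

Why it works: the low-word subtraction borrows exactly when a0 < a2 as unsigned numbers, which is what sltu computes; subtracting that 0 or 1 from the high-word difference propagates the borrow, and since all arithmetic is modulo 2^32, no overflow check is needed.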
5b.[13] In class (and in the book) we developed an unsigned multiplier that required 32
shift-and-add steps for a 32-bit multiply. We also developed a multi-bit shifter. Using
adders, registers, and multiplexors, design a 32-bit multiply unit that skips over sequences
of trailing zeros in the multiplier. Give the algorithm and the block diagram and explain
how it works.
You could do this with any of the four multipliers that we discussed in class. Hang a
piece of logic off the multiplier to determine the number of trailing zeros.
(This is essentially like a carry chain, but simpler.) If you used a barrel shifter, the
unary shift amount is exactly right; otherwise you need a unary-to-binary
conversion. If the entire multiplier is zero (32 zeros), it is time to stop the algorithm.
Otherwise, shift the multiplier right over the string of zeros and shift the multiplicand
left by the same amount relative to the product (equivalently, shift the product right).
If you used the third version of the multiplier design, this is just shifting the entire
64-bit product register right by the shift amount.
Whenever there is a one in the lsb of the multiplier, do the usual add step.
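A software model of this algorithm may help check the logic. It is a sketch written against the first multiplier version (a 64-bit multiplicand register shifting left, the multiplier shifting right), not a hardware design; `trailing_zeros` stands in for the zero-detection logic with a simple loop, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of trailing zero bits (the zero-detection logic). Requires x != 0. */
static int trailing_zeros(uint32_t x)
{
    assert(x != 0);
    int n = 0;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Shift-and-add multiply that skips each run of zeros in the multiplier
   with one multi-bit shift. *adds (if non-NULL) receives the add count. */
uint64_t multiply_skip_zeros(uint32_t multiplicand, uint32_t multiplier, int *adds)
{
    uint64_t mcand = multiplicand;   /* 64-bit register, shifted left */
    uint32_t mplier = multiplier;    /* shifted right */
    uint64_t product = 0;
    int steps = 0;

    while (mplier != 0) {            /* multiplier all zeros: done */
        int k = trailing_zeros(mplier);
        mplier >>= k;                /* skip the run of zeros in one shift */
        mcand <<= k;
        product += mcand;            /* lsb is now 1: the usual add step */
        steps++;
        mplier >>= 1;                /* consume that 1 bit */
        mcand <<= 1;
    }
    if (adds != NULL) {
        *adds = steps;
    }
    return product;
}
```

The number of add steps equals the number of 1 bits in the multiplier, so a multiplier of 0x80000000 needs one add step instead of 32.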