









In Lecture 1 we learnt that the separation of data from control helped simplify the definition and design of sequential circuits, particularly when there were many registers involved only with storage.
We saw that the main memory, as well as holding plain data, could hold the program data, that is, the ordered list of instructions that specify what you want the machine to do.
We speculated that when an instruction is read from memory, it could be passed immediately to the control part of the machine in order to change the effective transfer function of the data section.
In this lecture we develop the detailed organization of the CPU to support that idea.
Rather than introduce individual components separately and then stick them together, we will dive in at the deep end by revealing a “Bog Standard Architecture” for the CPU.
The CPU contains
2/2 LECTURE 2. THE CPU, INSTRUCTION FETCH & EXECUTE
Figure 2.1: Our Bog Standard Architecture. (Diagram not reproduced. Surviving labels: SETalu, Address Bus, Data Bus, CLKmem, IR(opcode), IR(address), Status, Memory, INCpc/LOADpc, and Control Lines to Registers, ALU, Memory, etc.)
2.1.1 CPU Registers
• MAR The Memory Address Register stores the address used to access memory.
• MBR The Memory Buffer Register stores information that is being sent to, or received from, the memory along the bidirectional data bus.
• AC The Accumulator is used to store data that is being worked on by the ALU, and is the key register in the data section of the CPU. Notice that the memory can’t access the AC directly. The MBR is an intermediary.
• PC Connected to the internal address bus, the Program Counter holds the address in memory of the next program instruction. Notice that it doesn’t connect directly to the memory, but must go via the MAR. (The PC is special in that it is both a register and a counter. More later.)
• IR When memory is read, the data first goes to the MBR. If the data is an instruction it gets moved to the Instruction Register. The IR has two parts: IR (opcode) and IR (address).
• ALU The Arithmetic Logic Unit is responsible for bit operations on data held in the AC and MBR and for storing the results. It contains arithmetic adders, logical AND-ers and OR-ers, and so on. A special requirement in our architecture is a “null operation” or “no-op” which simply allows the output of the AC to appear at the output of the ALU. Again we will detail its hardware later.
• SR Closely associated with the ALU is the Status Register (also called the Condition Control Word or Status Word). It is not quite the same as the other registers in that it is really just a collection of 1-bit flags that indicate the outcome of operations that the ALU has just carried out. The flags (which you met in P2) are the carry flag C, the overflow flag V, the negative flag N, and the zero flag Z. These are monitored by the CU.
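As a rough illustration of how the four flags relate to an ALU result, here is a minimal Python sketch of a 16-bit addition. The function name add16_with_flags and the dictionary layout are illustrative assumptions, not part of the lecture's architecture; the flag definitions are the usual ones for two's complement arithmetic.

```python
def add16_with_flags(a, b):
    """Add two 16-bit words; return (result, flags) with C, V, N, Z."""
    mask = 0xFFFF          # 16-bit data width, as assumed in these notes
    sign = 0x8000          # bit 15 is the sign bit in two's complement
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    flags = {
        # Carry: the true sum did not fit in 16 bits.
        "C": int(total > mask),
        # Overflow: both operands have the same sign, the result has the other.
        "V": int(((a ^ result) & (b ^ result) & sign) != 0),
        # Negative: the sign bit of the result is set.
        "N": int((result & sign) != 0),
        # Zero: every bit of the result is 0.
        "Z": int(result == 0),
    }
    return result, flags
```

For instance, adding 1 to 0x7FFF sets V and N but not C, while adding 1 to 0xFFFF sets C and Z but not V.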
2.1.3 Buses, registers, and their widths
The buses carry words of information which are many bits wide, and on diagrams a bus is indicated either by a wide line, or by a single line with a dash through it often accompanied by the bus width in bits.
Data: Microcontrollers have data bus widths of 4 bits, 8 bits, 16 bits and 32 bits, while the most advanced PCs use 64 bits. In these lectures we will assume that the “memory width” is 16 bits or 2 Bytes. This means that each location can store 2 Bytes. We will also assume that the data bus is 16 bits wide, and the MBR and AC registers on the data side of the CPU are therefore also 16 bits wide. The ALU is also 16 bits wide.
Figure 2.3: (a) The data side is 2 Bytes or 16 bits wide. The ALU has been omitted here, but is also 16 bits wide. (b) You should not think that the MBR register (for example) has grown multiple electrical inputs. The actual wiring involves tri-state buffers, as becomes clear in Lecture 3. (Diagram not reproduced. Surviving labels: AC, MBR, Memory, each path marked 16 bits.)
Address: The address bus does not have to be the same width as the data bus. The width on CPUs has increased over time in step with contemporary memory technology, from the Intel 8086 (introduced in 1978) with n = 20 address lines to current processors with n = 36 to 40.
Having n address lines means that there are 2^n addresses or locations in the address space. A convenient method of figuring out 2^n is to remember that 2^10 = 1024, so n = 10 lines address 1K locations, n = 20 lines address 1M locations, and n = 30 can address 1G locations. Of course microcontrollers tend to have a smaller amount of memory, because they are not designed to multitask (i.e., run multiple programs), and 256K locations is the largest number spotted (in 2010).
However, for lecturing purposes it is useful (i) to have different numbers on the address and data side, and (ii) to keep things in multiples of 8, so here we will assume a 24-bit address bus, able to access 16M locations. (Note this is not necessarily 16MByte of memory. Why not?) The PC, SP, and MAR in our CPU will therefore be 24 bits wide.
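The arithmetic here can be checked with a short calculation. The sketch below uses the figures assumed in these notes (24 address lines, 2 Bytes per location); the function name address_space is purely illustrative.

```python
def address_space(n_address_lines, bytes_per_location):
    """Return (number of locations, total bytes) for a given bus width."""
    locations = 2 ** n_address_lines
    return locations, locations * bytes_per_location


locations, total_bytes = address_space(24, 2)
print(locations // 2**20)     # 16, i.e. 16M locations
print(total_bytes // 2**20)   # 32, i.e. 32 MByte if every location is fitted
```

The number of locations depends only on the address width, whereas the number of Bytes also depends on the memory width, which is why 16M locations need not mean 16MByte.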
Figure 2.4: The address side is 24 bits or 3 Bytes wide. The address space has 2^24 locations. (Diagram not reproduced. Surviving labels: SP, MAR, PC, Memory, each path marked 24 bits.)
The IR is special. The IR (opcode) part should be wide enough to take the largest opcode. We assume the opcode is a fixed 8 bits wide, allowing 256 different instructions, which is plenty. The IR (address) part has to have the same width as the address bus, 24 bits. So the whole IR is 32 bits wide.
It is however fed from the internal data bus which is only 16 bits wide in our architecture. We will return to solve this conundrum in §2.6.1.
Figure 2.5: The IR must be 8 + 24 = 32 bits wide. (Diagram not reproduced. Surviving labels: IR(opcode), IR(address), MBR.)
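To make the 8 + 24 split concrete, here is a small Python sketch that packs an opcode and an address into one 32-bit value and separates them again. Placing the opcode in the top 8 bits is an assumption suggested by the left-to-right order IR(opcode), IR(address) in the figures; the names pack_ir and unpack_ir are illustrative only.

```python
OPCODE_BITS = 8      # fixed opcode width assumed in these notes
ADDRESS_BITS = 24    # same width as the address bus


def pack_ir(opcode, address):
    """Combine opcode and address into a single 32-bit IR value."""
    if not 0 <= opcode < 2 ** OPCODE_BITS:
        raise ValueError("opcode does not fit in 8 bits")
    if not 0 <= address < 2 ** ADDRESS_BITS:
        raise ValueError("address does not fit in 24 bits")
    return (opcode << ADDRESS_BITS) | address


def unpack_ir(ir):
    """Split a 32-bit IR value into (opcode, address)."""
    return ir >> ADDRESS_BITS, ir & (2 ** ADDRESS_BITS - 1)
```

With this layout, LDA 5 (opcode 00000001) packs to 0x01000005.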
2.3. FETCHING AND DECODING AN INSTRUCTION
Fetching and executing an instruction simply require the CPU’s Control Section to issue Levels and Pulses which set up pathways and fire register transfers so that
If you are in need of an analogy, we are doing little more than “playing trains” with data. The Control Section uses Levels to “set the points” and create the route between A and B, and uses a Pulse to send the train from A to B.
To start processing, the CPU needs to fetch the first instruction in the program from the main memory. The Program Counter is the key register here. The PC always holds the address of the next program instruction in the main memory. It is said to point to the next instruction^1. But remember that the memory address register acts as a gatekeeper to the memory, so the first thing to happen is that the program counter gets copied into the memory address register. The register transfer is
MAR←PC
Because it is the MAR that is clocked, this leaves the PC unaltered. Now read the memory into the MBR.
MBR←〈MAR〉
The next step is to copy the instruction from the MBR to the instruction register.
IR←MBR
In our standard architecture the IR is split into two parts, IR (opcode) and IR (address). As far as the instruction fetch is concerned it is the IR (opcode) that is important. The opcode is decoded by the control unit, as described later.
Last comes a touch of housekeeping. Usually the next instruction in the program is located in the next memory location, so the program counter is incremented.
PC←PC + 1
^1 Any memory address points to the memory contents at that address.
So to summarize, the instruction fetch requires the following in RTL, where you should note that the program counter can be incremented at the same clock tick as loading the instruction register.
Instruction fetch
(Diagram of the Bog Standard Architecture not reproduced. Surviving labels: CPU, Outside the CPU, SP, MAR, PC, AC, IR(opcode), IR(address), MBR, Status, CU, ALU, Memory, Address Bus, Data Bus, SETalu, CLKmem, INCpc/LOADpc, and Control Lines to Registers, ALU, Memory, etc.)
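The four register transfers of the fetch described above can be mimicked in a few lines of Python. This is only a sketch: representing the registers and the memory as dictionaries, and the function name fetch, are assumptions made for illustration, and the 16-bit versus 32-bit width issue is ignored here.

```python
def fetch(regs, memory):
    """Perform one instruction fetch on dictionary-based registers and memory."""
    regs["MAR"] = regs["PC"]             # MAR←PC (the PC itself is unaltered)
    regs["MBR"] = memory[regs["MAR"]]    # MBR←〈MAR〉
    # IR←MBR and PC←PC + 1 happen at the same clock tick. Both right-hand
    # sides are evaluated from the old register values before either
    # register is updated, so the two transfers do not interfere.
    regs["IR"], regs["PC"] = regs["MBR"], regs["PC"] + 1
    return regs
```

The tuple assignment on the last line is a convenient way of modelling two transfers that are clocked together.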
Our CPU uses 8-bit opcodes, so it could distinguish 256 different instructions. For the purpose of explanation we give just nine from our instruction set. Column 1 contains the assembler language mnemonic, which is shorthand for several lines of RTL. Column 2 gives an overall “RTL-like” description. Column 3 is the binary opcode. Column 4 describes what the instruction does.
Inst    Overall RT            Opcode    Meaning
HALT                          00000000  Stop the clock
LDA x   AC←〈x〉               00000001  Load AC with contents of mem address x
STA x   〈x〉←AC               00000010  Store AC in memory at address x
ADD x   AC←AC + 〈x〉          00000011  Add mem contents at x to AC
AND x   AC←AC ∧ 〈x〉          00000100  Logical and ...
JMP x   PC←x                  00000101  Jump to instruction at address x
BZ x    if Z=1 then PC←x      00000110  if Z-flag is set then jump
NOT     AC←¬AC                00000111  One’s complement (invert each bit of) the AC
SHR     AC←RightShift(AC)     00001000  Shift the AC 1 bit to right
An assembler language is designed around a particular cpu, and there is no standard set of mnemonics. However, once you understand the purpose of the instructions, it is trivial to convert between languages.
During the instruction fetch, an opcode is put into the IR (opcode), and is decoded by the control unit (exactly how we will see later). The CU now “knows” which instruction it should execute, and can therefore output a sequence of Levels and Pulses to set up paths and effect the desired register transfers.
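In software terms, decoding amounts to a lookup from the 8-bit opcode to the instruction it names. The Python sketch below uses only the nine opcodes listed in the table; the dictionary and the function name decode are illustrative, and the real control unit does this in hardware, as described later in the course.

```python
# The nine opcodes given in the instruction table of these notes.
OPCODES = {
    0b00000000: "HALT",
    0b00000001: "LDA",
    0b00000010: "STA",
    0b00000011: "ADD",
    0b00000100: "AND",
    0b00000101: "JMP",
    0b00000110: "BZ",
    0b00000111: "NOT",
    0b00001000: "SHR",
}


def decode(opcode):
    """Return the mnemonic for an opcode listed in the notes."""
    try:
        return OPCODES[opcode]
    except KeyError:
        # The full instruction set is not given in this lecture.
        raise ValueError(f"opcode {opcode:08b} is not one of the nine listed")
```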
2.5.2 Example of LDA x fetch and execute.
Figure 2.7 gives an example of the information flows in the entire fetch and execute of LDA x. In the example, the instruction is found in memory location 2, and the actual instruction is LDA 5.
Figure 2.7: Example of the fetch and execute of the LDA x instruction with x=5 and starting with PC=2. The steps follow a, b, c, ..., i. (Diagram not reproduced. It shows memory locations 0 to 5, the opcode 00000001 for LDA, the operand 00000000 00000101, the data word 00000010 00001111 passing through the MBR, and the final result AC=527 decimal. The execute steps are annotated MAR←IR(address), then MBR←M.)
(a) During the fetch, MAR←PC
(b) Addressing location 2
(c) Reading the memory MBR←〈MAR〉
(d) Now the MBR is transferred to the IR.
(e) The last part of the fetch is to increment the PC.
(f) Decode, then first step of execute is MAR←IR (operand)
(g) Now addressing location 5.
(h) Reading the memory MBR←〈MAR〉 again.
(i) Now transfer to the Accumulator AC←MBR.
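Steps (a) to (i) can be traced with a short Python sketch. It assumes, as the example does, that the whole instruction LDA 5 sits in memory location 2, and it takes the data word at location 5 to be 00000010 00001111 (527 decimal), matching the value shown for the AC in the figure. The dictionary representation, the packing of the opcode into the top 8 bits, and the function name run_lda_example are illustrative assumptions.

```python
LDA = 0b00000001   # opcode for LDA from the instruction table


def run_lda_example():
    """Trace the fetch and execute of LDA 5 starting with PC = 2."""
    memory = {2: (LDA << 24) | 5, 5: 0b0000001000001111}
    r = {"PC": 2, "MAR": 0, "MBR": 0, "IR": 0, "AC": 0}

    r["MAR"] = r["PC"]              # (a) MAR←PC, (b) addressing location 2
    r["MBR"] = memory[r["MAR"]]     # (c) MBR←〈MAR〉
    r["IR"] = r["MBR"]              # (d) IR←MBR
    r["PC"] += 1                    # (e) PC←PC + 1

    opcode, operand = r["IR"] >> 24, r["IR"] & 0xFFFFFF
    if opcode != LDA:               # decode
        raise ValueError("this sketch only executes LDA")

    r["MAR"] = operand              # (f) MAR←IR(operand), (g) addressing location 5
    r["MBR"] = memory[r["MAR"]]     # (h) MBR←〈MAR〉 again
    r["AC"] = r["MBR"]              # (i) AC←MBR
    return r
```

After the call, the AC holds 527, the PC holds 3, and the MAR is left pointing at location 5.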
2.5.3 Execute STA x
Store the contents of the AC in the memory location x which is held in the IR (operand).
STA x
(Diagram of the Bog Standard Architecture not reproduced. Surviving labels: CPU, Outside the CPU, SP, MAR, PC, AC, IR(opcode), IR(address), MBR, Status, CU, ALU, Memory, Address Bus, Data Bus, SETalu, CLKmem, INCpc/LOADpc, and Control Lines to Registers, ALU, Memory, etc.)
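A plausible sequence of transfers for STA x can be sketched in Python. The three steps below are an assumption based on the rules stated earlier (the MAR is the gatekeeper to the memory, and the AC reaches the memory only via the MBR); they are not a transcription of the lecture's own RTL for STA. The function name execute_sta and the dictionary representation are illustrative.

```python
def execute_sta(regs, memory):
    """Execute STA x, with x already held in the low 24 bits of the IR."""
    regs["MAR"] = regs["IR"] & 0xFFFFFF   # MAR←IR(address)
    regs["MBR"] = regs["AC"]              # MBR←AC (AC passed through the ALU no-op)
    memory[regs["MAR"]] = regs["MBR"]     # 〈MAR〉←MBR, a memory write
    return memory
```

Note that the AC is left unchanged: STA copies the accumulator rather than moving it.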
2.5.4 Execute ADD x
Add the memory contents of x to the AC. This involves the ALU. Looking back at the BSA diagram, we see that the ALU has two inputs, one from the AC and the other from the MBR. To perform this line of RTL we must, before firing the transfer, set up the ALU to perform the binary addition of its inputs, rather than, say, the logical AND or logical OR. As you might guess, and as we will see in Lecture 4, this requires the configuration of combinational circuitry using a level signal. So this line of RTL needs a CSL in addition to a CSP.
ADD x
(Diagram of the Bog Standard Architecture not reproduced. Surviving labels: CPU, Outside the CPU, SP, MAR, PC, AC, IR(opcode), IR(address), MBR, Status, CU, ALU, Memory, Address Bus, Data Bus, SETalu, CLKmem, INCpc/LOADpc, and Control Lines to Registers, ALU, Memory, etc.)
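The distinction between the level that configures the ALU and the pulse that fires the transfer can be imitated in Python by first selecting a function and then applying it. This is a sketch under assumptions: the table ALU_OPS, the string values for the SETalu setting, and the function name execute_alu_instruction are illustrative, and the operand fetch steps are assumed to mirror those of LDA.

```python
# Combinational functions the ALU can be configured to perform (16-bit data).
ALU_OPS = {
    "ADD": lambda ac, mbr: (ac + mbr) & 0xFFFF,
    "AND": lambda ac, mbr: ac & mbr,
    "NOP": lambda ac, mbr: ac,     # the no-op: AC appears at the ALU output
}


def execute_alu_instruction(regs, memory, set_alu):
    """Execute ADD x or AND x, with x held in the low 24 bits of the IR."""
    regs["MAR"] = regs["IR"] & 0xFFFFFF    # MAR←IR(address)
    regs["MBR"] = memory[regs["MAR"]]      # MBR←〈MAR〉
    alu = ALU_OPS[set_alu]                 # level: choose the ALU function
    regs["AC"] = alu(regs["AC"], regs["MBR"])   # pulse: clock the result into AC
    return regs
```

The same routine serves for AND x simply by changing the selection, which is the point of separating the level from the pulse.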
2.5.5 AND x
This again involves the ALU, but now we are performing a logical AND.
AND x
2.6. DECODING THE OPCODE
Earlier on, when discussing the Instruction Fetch, we wrote “then decode the opcode”. Although it is common to talk of the fetch-execute cycle, it is useful to think about decoding as a separate distinct phase in a fetch-decode-execute cycle.
Now suppose that
then we could write a decoding step using RTL’s conditional goto (see Lecture 1):
Decoding (NB! this is RTL)
where 10, etc, are the first lines of the RTL which execute the respective instruction.
2.6.1 Decoding detail — engineering to the rescue
So far, so good. However, in our BSA, the MBR and data bus are 16 bits wide. Hence IR←MBR can only supply the 8-bit opcode together with an 8-bit operand. If operands are only 8 bits long, we can only access 256 of our 2^24 locations. How can we fill the operand up to its full 24 bits?
The engineered solution is to make the LDA instruction (and any other instruction that needs a full operand) read the next 16-bit word of memory into the MBR, and then into the IR (address), building up a 24-bit address for transfer to the MAR.
In our instruction set the six opcodes LDA, STA, ADD, AND, JMP and BZ (00000001 to 00000110) require this extra read of memory. We could write the decoding stage in RTL as follows:
Longer Decoding
This solution effectively divides up the opcodes into two sets: those that need full operands and those that don’t.
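The two-word scheme can be sketched in Python as follows. Several details here are assumptions made for illustration and are not stated in the notes: that the first 16-bit word carries the opcode in its high byte and the top 8 address bits in its low byte, that the second word supplies the low 16 address bits, and that the PC is incremented after each read. The names NEEDS_FULL_OPERAND and fetch_and_decode are hypothetical.

```python
# Opcodes for LDA, STA, ADD, AND, JMP, BZ: these need the extra memory read.
NEEDS_FULL_OPERAND = set(range(0b00000001, 0b00000110 + 1))


def fetch_and_decode(regs, memory):
    """Fetch one instruction from 16-bit memory, reading a second word if needed."""
    regs["MAR"] = regs["PC"]
    regs["MBR"] = memory[regs["MAR"]]
    regs["PC"] += 1
    opcode = regs["MBR"] >> 8        # assumed: opcode in the high byte
    address = regs["MBR"] & 0xFF     # assumed: top 8 address bits in the low byte

    if opcode in NEEDS_FULL_OPERAND:
        # Extra read: the next word provides the remaining 16 address bits.
        regs["MAR"] = regs["PC"]
        regs["MBR"] = memory[regs["MAR"]]
        regs["PC"] += 1
        address = (address << 16) | regs["MBR"]

    regs["IR_opcode"], regs["IR_address"] = opcode, address
    return regs
```

Under these assumptions an instruction such as SHR occupies one memory word, while LDA x occupies two, so the PC advances by one or two accordingly.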