Three Engines And Interface With Memory-Advance Computer Architecture-Lecture Slides, Slides of Advanced Computer Architecture

This course focuses on quantitative principles of computer design, instruction set architectures, datapath and control, memory hierarchy design, main memory, cache, hard drives, multiprocessor architectures, storage and I/O systems, and computer clusters. This lecture includes: Three, Engines, Interface, Memory, Architecture, Fetch, Instructions, Cache, Bytes, Decoder

Typology: Slides

2011/2012

Uploaded on 08/06/2012

amrusha
Intel P-VI: Three Engines and Interface with Memory

Intel P-VI: Major Units

The FETCH/DECODE unit:

- An in-order unit that takes the user program instruction stream from the instruction cache as input and decodes it into a series of μ-operations (μops) that represent the dataflow of that instruction stream

The pre-fetch is speculative

Intel P-VI: Major Units

The BUS INTERFACE unit:

- The bus interface unit communicates directly with the L2 (second level) cache, supporting up to four concurrent cache accesses.
- The bus interface unit also controls a transaction bus, with the MESI snooping protocol, to system memory.

Intel P-VI: Inside Fetch


The μops are queued and sent to the Register Alias Table (RAT) unit, where the logical Intel Architecture-based register references are converted into P6 family processor physical register references

The μops are then entered into the instruction pool

The instruction pool is implemented as an array of Content Addressable Memory called the Re-Order Buffer (ROB).
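The renaming step can be pictured with a small sketch. This is a toy model and not Intel's implementation: the class name `RegisterAliasTable`, the `rename` method, and the dictionary layout of a pool entry are all assumptions made for illustration, and ROB entry indices stand in for physical registers.

```python
class RegisterAliasTable:
    """Toy RAT: maps logical register names to ROB entry indices (hypothetical model)."""

    def __init__(self):
        self.alias = {}  # logical register -> ROB index of its latest in-flight producer
        self.rob = []    # instruction pool, kept in program order

    def rename(self, dest, sources):
        # A source reads the most recent in-flight producer if there is one;
        # otherwise it keeps its logical (architectural) name.
        renamed_sources = [self.alias.get(s, s) for s in sources]
        rob_index = len(self.rob)
        self.rob.append({"dest": dest, "sources": renamed_sources, "done": False})
        # Later readers of `dest` must now refer to this new entry.
        self.alias[dest] = rob_index
        return rob_index, renamed_sources


rat = RegisterAliasTable()
first = rat.rename("eax", ["ebx", "ecx"])   # eax = ebx op ecx
second = rat.rename("eax", ["eax", "edx"])  # eax = eax op edx
```

In this sketch the second μop reads ROB entry 0 instead of the logical register `eax`, which is what lets the two writes to `eax` coexist in the pool.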

Intel P-VI: Inside Dispatch /Execute


The results of the μop are later returned to the pool

There are five ports on the Reservation Station, and the multiple resources are accessed as shown

The P6 family of processors can schedule (in an out-of-order fashion) at a peak rate of 5 μops per clock, one to each resource port, but a sustained rate of 3 μops per clock is more typical
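The one-μop-per-port limit can be sketched as a single scheduling cycle. The function name `dispatch_cycle`, the entry fields, the port numbers in the example, and the oldest-first tie-break are assumptions for illustration; the slides do not state which μop types map to which port or how ties are broken.

```python
def dispatch_cycle(pool, num_ports=5):
    """Pick at most one ready, undispatched uop per port (hypothetical model)."""
    busy_ports = set()
    dispatched = []
    for uop in pool:  # pool is in program order, so older uops win a contested port
        if uop["dispatched"] or not uop["ready"]:
            continue
        port = uop["port"]
        if 0 <= port < num_ports and port not in busy_ports:
            busy_ports.add(port)
            uop["dispatched"] = True
            dispatched.append(uop["name"])
    return dispatched


pool = [
    {"name": "a", "port": 0, "ready": False, "dispatched": False},
    {"name": "b", "port": 0, "ready": True, "dispatched": False},
    {"name": "c", "port": 0, "ready": True, "dispatched": False},
    {"name": "d", "port": 2, "ready": True, "dispatched": False},
]
cycle1 = dispatch_cycle(pool)
cycle2 = dispatch_cycle(pool)
```

Here `b` dispatches ahead of the older `a` because `a` is not yet ready (out-of-order scheduling), while `c` waits a cycle because its port is already taken. With five ports, no cycle can dispatch more than five μops.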

Intel P-VI: Inside Dispatch /Execute

Note that many of the μops are branches

The Branch Target Buffer (BTB) will correctly predict most of these branches

Branch μops are tagged (in the in-order pipeline) with their fall-through address and the destination that was predicted for them
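A rough sketch of why both addresses are carried with the branch: once the branch executes, its actual outcome can be compared with the prediction, and the correct next address is already at hand. The `BranchUop` type and the `resolve` function are hypothetical names, and the addresses are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class BranchUop:
    fall_through: int      # address of the next sequential instruction
    predicted_target: int  # destination chosen by the predictor


def resolve(branch, taken, taken_target):
    """Return (mispredicted, correct_next_address) once the branch has executed."""
    actual = taken_target if taken else branch.fall_through
    return actual != branch.predicted_target, actual


b = BranchUop(fall_through=0x1004, predicted_target=0x2000)
correct = resolve(b, taken=True, taken_target=0x2000)
wrong = resolve(b, taken=False, taken_target=0x2000)
```

In the mispredicted case the tag supplies the fall-through address `0x1004` as the place to resume fetching, without re-decoding the branch.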


Intel P-VI: Inside Retire

The Retire Unit is also checking the status of μops in the instruction pool

Once removed, the original architectural target of the μops is written as per the original Intel Architecture instruction.

The Retire Unit must also re-impose the original program order on them


Intel P-VI: Inside Retire

The Retire Unit must first read the instruction pool to find the potential candidates for retirement and determine which of these candidates are next in the original program order

Then it writes the results of this cycle’s retirements to the Retirement Register File (RRF).

The Retire Unit is capable of retiring 3 μops per clock.
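In-order retirement with a width of three can be sketched as follows. The function `retire_cycle`, the entry fields, and the use of a plain dictionary for the RRF are assumptions for illustration only.

```python
def retire_cycle(rob, head, rrf, width=3):
    """Retire up to `width` completed uops in program order (hypothetical model)."""
    retired = []
    # Stop at the first unfinished uop: younger uops may be done,
    # but they cannot retire ahead of it.
    while head < len(rob) and len(retired) < width and rob[head]["done"]:
        uop = rob[head]
        rrf[uop["dest"]] = uop["value"]  # commit the result to architectural state
        retired.append(uop["name"])
        head += 1
    return retired, head


rob = [
    {"name": "u0", "dest": "eax", "value": 1, "done": True},
    {"name": "u1", "dest": "ebx", "value": 2, "done": True},
    {"name": "u2", "dest": "eax", "value": 3, "done": True},
    {"name": "u3", "dest": "ecx", "value": 4, "done": False},
    {"name": "u4", "dest": "edx", "value": 5, "done": True},
]
rrf = {}
first, head = retire_cycle(rob, 0, rrf)
second, head = retire_cycle(rob, head, rrf)
```

The second cycle retires nothing: `u4` has finished executing, but it must wait behind the unfinished `u3` so that program order is preserved.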

Intel P-VI: Bus Interface Unit

Loads are encoded into a single μop.

Stores must supply both an address and data, so they require two μops: one to generate the address and one to generate the data. These μops must later recombine for the store to complete.

Stores are never performed speculatively since there is no transparent way to undo them

Stores are also never re-ordered among themselves

Intel P-VI: Bus Interface Unit

A store is dispatched only when both the address and the data are available and there are no older stores awaiting dispatch

A study of the importance of memory access reordering concluded:

- Stores must be constrained from passing other stores, for only a small impact on performance.
- Stores can be constrained from passing loads, for an inconsequential performance loss.
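The store dispatch rule stated on this slide (address and data both available, no older store still waiting) can be sketched as a check on the oldest undispatched store. The function name `next_store_to_dispatch` and the queue entry fields are hypothetical.

```python
def next_store_to_dispatch(store_queue):
    """Return the name of the store that may dispatch now, or None (hypothetical model)."""
    for store in store_queue:  # queue is in program order
        if store["dispatched"]:
            continue
        # Only the oldest undispatched store is a candidate, so stores
        # are never re-ordered among themselves.
        if store["address_ready"] and store["data_ready"]:
            return store["name"]
        return None
    return None


queue = [
    {"name": "s0", "address_ready": True, "data_ready": False, "dispatched": False},
    {"name": "s1", "address_ready": True, "data_ready": True, "dispatched": False},
]
blocked = next_store_to_dispatch(queue)   # s1 is ready but s0 is older
queue[0]["data_ready"] = True
oldest = next_store_to_dispatch(queue)
queue[0]["dispatched"] = True
following = next_store_to_dispatch(queue)
```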

Summary

Today we have studied four advanced computer architectures: PowerPC 750 and 970 FX, Intel P-VI

With this we have completed our discussion of all the topics of Advanced Computer Architecture

Next time, in the last lecture, we will review all the concepts we have studied in our earlier lectures

Till then

Thanks

and

Allah Hafiz