









Brief Introduction to Cache Memory in Computer Organisation and Architecture
๏ The maximum size of the memory that can be used in any computer is determined by the addressing scheme.
๏ For example, a 16-bit computer that generates 16-bit addresses is capable of addressing up to 2^16 = 2^6 × 2^10 = 64K memory locations.
๏ Similarly, machines whose instructions generate 32-bit addresses can utilize a memory that contains up to 2^32 = 2^2 × 2^30 = 4G memory locations.
๏ Suppose the processor takes 5 ns to process an instruction, but fetching it from the memory takes 10 ns (the time taken to access a memory location (data + instruction) is known as the Memory Access Time).
๏ There is a mismatch between the speed of the processor and the speed of the memory. So, after processing an instruction, the processor sits in an idle state (stall) for 5 ns, which decreases the throughput of the processor.
๏ The processor of a computer can usually process instructions and data faster than they can be fetched from the memory, so the processor is the fast unit and the memory is the slow one.
๏ The memory cycle time is therefore the bottleneck in the system.
๏ One way to reduce the memory access time is to use a cache memory.
๏ Cache memory is the fastest system memory, required to keep up with the CPU as it fetches and executes instructions. The data most frequently used by the CPU is stored in cache memory. The fastest portion of the CPU cache is the register file, which contains multiple registers. Registers are small storage locations used by the CPU to store instructions and data.
๏ Virtual memory is another important concept related to memory organization. It is used to increase the apparent size of the physical memory. Data are addressed in a virtual address space that can be as large as the addressing capability of the processor, but at any given time only the active portion of this space is mapped onto locations in the physical memory. The remaining virtual addresses are mapped onto the bulk storage devices used, such as magnetic disks.
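To make the numbers above concrete, here is a minimal Python sketch (illustrative only, not part of the original notes) that computes the addressable locations for 16- and 32-bit addresses and the stall in the 5 ns / 10 ns example:

```python
# Minimal sketch: addressable memory from address width, and the
# processor stall from the 5 ns / 10 ns example in the notes.

def addressable_locations(address_bits: int) -> int:
    """Number of memory locations an n-bit address can reach."""
    return 2 ** address_bits

print(addressable_locations(16))        # 65536 -> 64K locations
print(addressable_locations(32))        # 4294967296 -> 4G locations

# Stall time = memory access time - processor processing time
memory_access_ns = 10
processor_ns = 5
print(memory_access_ns - processor_ns)  # 5 ns idle (stall) per fetch
```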
๏ When the required data is not present in the Main Memory, it is a miss, and the reference is forwarded to Secondary storage (HDD).
๏ Secondary storage is the final level of memory in the computer system, so the required data will always be hit in Secondary storage. The data is then transferred from Secondary storage to Main Memory in the form of pages, from Main Memory to Cache in the form of blocks, and from Cache to CPU in the form of words.
๏ To keep up with the processor speed, cache memories are kept very small so that it takes less time to find and fetch data. They are usually divided into multiple levels based on the architecture. The cache is organized into blocks of the same size as the blocks of the Main Memory.
๏ Example: Main Memory (MM) size = 16 bytes; Block size / Offset size / Word size = 2 bytes.
No. of Main Memory blocks = Main Memory size / Block size = 16 / 2 = 8 (3 bits are required for the block number, i.e., Tag + Line Offset).
No. of Cache blocks (cache lines / line numbers) = Cache size / Block size = 8 / 2 = 4 (2 bits are required for the Line Offset; the cache size of 8 bytes follows from 4 lines of 2 bytes each).
๏ The process of transferring the data from Main Memory to Cache Memory is known as cache mapping.
๏ Direct mapping example: Main Memory size = 128 bytes (7-bit addresses), Block size = 4 bytes, Cache size = 16 bytes.
๏ So, 2 bits (LSB) are required to represent the word within a block, and the remaining 5 bits (MSB) represent the Block No.
๏ The MM block size is equal to the Cache Memory block size.
๏ No. of cache lines = CM size / Block size = 16 / 4 = 4, so 2 bits form the Line Offset, and the remaining bits (7 − 5 = ... here 3) form the Tag.
๏ Example: word W10 sits at address 0001010. As a main memory address this splits into Block No. = 00010 (B2, 5 bits MSB) and word offset = 10 (2 bits LSB). For the cache it splits into Tag = 000 (3 bits), Line Offset = 10 (2 bits) and word offset = 10 (2 bits).
๏ If block B2 (containing W10) is present in the cache memory and its tag bits match the tag of the processor-generated request, it is a cache Hit; otherwise it is a cache Miss, and B2 must be transferred from MM to the Cache Memory.
๏ For example, if the tag of the CPU-generated memory request (request 00010 10, Tag = 000) does not match the tag stored in that cache line (say 001), it is a Cache Miss.

Types of cache Miss
๏ Compulsory Miss: the first access to a block always causes a miss (the cache is initially empty).
๏ Capacity Miss: occurs when the cache is too small to hold all concurrently used data.
๏ Conflict Miss (drawback): caused when several addresses map to the same set/line and evict blocks that are still needed.
Changing cache parameters can affect one or more types of cache miss.
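The bit split in the W10 example above can be checked with a short Python sketch; the constants mirror the 128-byte MM / 16-byte cache / 4-byte block figures, and the function name is purely illustrative:

```python
# Sketch of the direct-mapped address split used above:
# 7-bit address = Tag (3) | Line (2) | Word offset (2),
# for MM = 128 B, cache = 16 B, block = 4 B.
from math import log2

MM_SIZE, CACHE_SIZE, BLOCK_SIZE = 128, 16, 4
offset_bits = int(log2(BLOCK_SIZE))                 # 2
line_bits   = int(log2(CACHE_SIZE // BLOCK_SIZE))   # 2
tag_bits    = int(log2(MM_SIZE)) - line_bits - offset_bits  # 3

def split(addr: int):
    offset = addr & (BLOCK_SIZE - 1)
    line   = (addr >> offset_bits) & ((1 << line_bits) - 1)
    tag    = addr >> (offset_bits + line_bits)
    return tag, line, offset

# Word W10 lives at byte address 10 = 0001010:
print(split(10))   # (0, 2, 2) -> Tag 000, Line 10, word offset 10
```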
We know that a MM block can map only to one particular line of the Cache Memory: block K maps to line K mod n, where n is the number of cache lines (here n = 4). So B1 maps to L1. In the next iteration B5 also maps to L1, evicting B1. In the next iteration B9 maps to L1, evicting B5. In the next iteration B1 maps to L1 again, evicting B9. B1 was initially present in the cache memory, but we removed it because of B5, which creates a cache miss on its next access. Meanwhile L0, L2 and L3 are empty, but we are unable to utilize them; this is a conflict miss.
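A small simulation (a sketch, not part of the original notes) reproduces this behaviour: every access in the sequence B1, B5, B9, B1 misses, even though three lines stay empty.

```python
# Conflict-miss demo: blocks B1, B5, B9, B1 all map to line
# (K mod 4) = 1 of a 4-line direct-mapped cache, so every
# access misses while lines L0, L2 and L3 stay unused.
NUM_LINES = 4
lines = [None] * NUM_LINES    # one block per cache line

for block in [1, 5, 9, 1]:
    line = block % NUM_LINES
    hit = lines[line] == block
    print(f"B{block} -> L{line}: {'hit' if hit else 'miss'}")
    lines[line] = block       # direct mapping: evict whatever was there

print("unused lines:", [i for i, b in enumerate(lines) if b is None])
```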
๏ In direct mapping, there is no need for any replacement algorithm. ๏ This is because a main memory block can map only to one particular line of the cache, and the position of each block is predetermined. ๏ Thus, the new incoming block will always replace the existing block (if any) in that particular line.
Fully Associative Mapping
๏ In this type of mapping, a block of main memory can map to any line of the cache that is freely available at that moment.
๏ The word-offset bits are still used to identify which word in the block is needed, but the tag becomes all of the remaining bits (Tag = Tag + Line Offset). This enables the placement of any block at any place in the cache memory, at the cost of a larger tag.
๏ It is considered to be the fastest and the most flexible mapping form.
๏ Example: Cache size = 16 bytes, Block size = 4 bytes = 2^2, Main memory size = 128 bytes (7-bit addresses). No. of MM blocks = MM size / Block size = 128 / 4 = 32 = 2^5, so the 7-bit address splits into a 5-bit tag (the whole block number) and a 2-bit word offset.
๏ In direct mapping each cache line L0(00)..L3(11) stored only the 3-bit tag (e.g., 000); in fully associative mapping each line stores the full 5-bit block number (00000, 00001, 00010, 00011, ...) as its tag.
๏ Main memory layout (block number → words):
00000 (B0): W0, W1, W2, W3
00001 (B1): W4, W5, W6, W7
00010 (B2): W8, W9, W10, W11
00011 (B3): W12, W13, W14, W15
00100 (B4): W16, W17, W18, W19
00101 (B5): W20, W21, W22, W23
00110 (B6): W24, W25, W26, W27
00111 (B7): W28, W29, W30, W31
...
11111 (B31): W124, W125, W126, W127
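For comparison with the direct-mapped split shown earlier, here is a minimal sketch of the fully associative split, where the whole block number serves as the tag (the function name is illustrative):

```python
# Fully associative split: with no line field, the 7-bit address
# is just Tag (5 bits, the block number) | word offset (2 bits).
BLOCK_SIZE = 4

def fa_split(addr: int):
    return addr >> 2, addr & (BLOCK_SIZE - 1)   # (tag, word offset)

print(fa_split(10))   # (2, 2) -> W10 is word 10 of block B2 (tag 00010)
```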
Set-Associative Mapping
๏ Here the 7-bit address splits into Tag (4 bits), Set No. (1 bit) and word offset (2 bits): the 4 cache lines are grouped into 2 sets of 2 lines each (2-way set associative), so 1 bit selects the set.
๏ Set associative mapping is a combination of direct mapping and fully associative mapping: direct mapping selects the set, and fully associative mapping is used within each set.
๏ If all the cache lines of a set are occupied, then one of the existing blocks will have to be replaced — hence the need for a replacement algorithm.
๏ Thus, set associative mapping requires a replacement algorithm.

Q1. Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size of main memory is 128 KB. Find the number of bits in the tag and the tag directory size.
Soln:
Cache memory size = 16 KB = 2^14 bytes
Block size = Frame size = Line size = 256 bytes = 2^8 bytes
Main memory size = 128 KB = 2^17 bytes
Number of bits in the physical address = 17 bits
Number of lines = CM size / Block size = 2^14 / 2^8 = 2^6 lines, so 6 bits are required for the Line Offset.
Cache address split: Tag (3 bits) | Line Offset (6 bits) | Block/word offset (8 bits)
Main memory address split: Block No. (9 bits) | Block/word offset (8 bits)
Number of bits in Tag = 17 − 14 = 3 bits
Tag directory size = Number of tags × Tag size
= Number of lines in cache × Number of bits in tag
= 2^6 × 3 bits = 64 × 3 bits = 192 bits = 192/8 = 24 bytes
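A few lines of Python (illustrative only, using the figures from Q1) confirm the arithmetic:

```python
# Checking Q1: direct-mapped, 16 KB cache, 256 B blocks, 128 KB MM.
from math import log2

CACHE, BLOCK, MM = 16 * 1024, 256, 128 * 1024
address_bits = int(log2(MM))                   # 17
offset_bits  = int(log2(BLOCK))                # 8
lines        = CACHE // BLOCK                  # 64
line_bits    = int(log2(lines))                # 6
tag_bits     = address_bits - line_bits - offset_bits  # 3

directory_bits = lines * tag_bits              # 64 x 3 = 192 bits
print(tag_bits, directory_bits, directory_bits // 8)   # 3 192 24
```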
๏ Since the size of the cache memory is small compared to the main memory, which part of main memory should be given priority and loaded into the cache is decided based on locality of reference.
Types of Locality of Reference
Spatial Locality of Reference (Space)
๏ This says that there is a good chance that words in close proximity to the referenced word will be accessed next.
Temporal Locality of Reference (Time)
๏ This says that if a word is referenced now, the same word is likely to be referenced again in the near future.
๏ This is the property exploited by the least recently used (LRU) algorithm.
๏ In case of a cache miss, a replacement policy needs to be followed to decide which block in the corresponding set will be replaced. This requires additional decision hardware. There are several possible replacement policies:
LRU (Least Recently Used):
๏ Replace the block which was least recently referenced (more complex and expensive hardware, but a lower miss rate, assuming that the most recently referenced words are the most likely to be referenced again in the near future).
FIFO (First In First Out):
๏ Replace the block that was loaded into the cache earliest, so blocks are replaced based on the order in which they were copied in, rather than the order in which they were accessed.
๏ Initially all slots are empty, so 7, 0, 1, 2 are allocated to the empty slots: 4 cache Misses.
๏ 0 is already there, so it's a cache Hit.
๏ When 3 arrives, it takes the place of 7, because 7 is the least recently used: a cache Miss.
๏ 0 is already in the cache memory, so it's a cache Hit.
๏ 4 takes the place of 1: a cache Miss.
๏ For the rest of the page reference string, every access is a cache Hit, because the referenced pages are already available in the cache memory.
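The same trace can be reproduced with a minimal LRU simulation in Python (a sketch using OrderedDict for recency ordering, not hardware-accurate):

```python
# LRU cache simulation reproducing the trace above
# (4 slots, reference string 7, 0, 1, 2, 0, 3, 0, 4).
from collections import OrderedDict

def simulate_lru(refs, capacity=4):
    cache = OrderedDict()            # keys ordered oldest -> newest use
    for ref in refs:
        if ref in cache:
            cache.move_to_end(ref)   # refresh recency on a hit
            print(f"{ref}: hit")
        else:
            if len(cache) == capacity:
                victim, _ = cache.popitem(last=False)  # evict LRU entry
                print(f"{ref}: miss, evicts {victim}")
            else:
                print(f"{ref}: miss (empty slot)")
            cache[ref] = True

simulate_lru([7, 0, 1, 2, 0, 3, 0, 4])
# 7,0,1,2 miss into empty slots; 0 hits; 3 evicts 7; 0 hits; 4 evicts 1
```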
๏ Whenever a processor wants to read or write a word, it checks whether the address it wants to read/write is present in the cache or not.
๏ If the address is present in the cache, it is a Read/Write Hit.
๏ On a Read hit, the Main Memory is not involved.
๏ On a Write hit, the Main Memory is involved: we can update the value in the cache and avoid an expensive main memory access, but then the cache and main memory hold different data at the same memory location (e.g., data written at location 10101100 in the cache is not updated in the Main Memory simultaneously). This causes problems when two or more devices share the main memory (as in a multiprocessor system); keeping the copies consistent is known as cache coherence.
๏ Otherwise, this results in the Inconsistent Data Problem.
๏ For write operations, the system can proceed in two ways: the Write Through protocol and the Write Back protocol.

Write Through Protocol
๏ In write through, data is simultaneously updated in the cache and in memory. This process is simpler and more reliable. It is used when there are no frequent writes to the cache (the number of write operations is small).
๏ It helps in data recovery (in case of power outage or system failure).
๏ A data write experiences latency (delay), as we have to write to two locations (both Memory and Cache).
๏ It solves the inconsistency problem.

Write Back Protocol
๏ The data is updated only in the cache and written to the memory at a later time. Data is updated in the memory only when the cache line is about to be replaced (cache line replacement is done using the Least Recently Used algorithm, FIFO, LIFO and others, depending on the application).
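A toy sketch contrasting the two write policies described above (class and method names are hypothetical, purely for illustration):

```python
# Write-through updates memory on every write; write-back marks the
# line dirty and defers the memory update until replacement time.

class WriteThroughCache:
    def __init__(self):
        self.cache, self.memory = {}, {}
    def write(self, addr, value):
        self.cache[addr] = value
        self.memory[addr] = value          # memory updated simultaneously

class WriteBackCache:
    def __init__(self):
        self.cache, self.memory, self.dirty = {}, {}, set()
    def write(self, addr, value):
        self.cache[addr] = value
        self.dirty.add(addr)               # memory NOT updated yet
    def evict(self, addr):
        if addr in self.dirty:             # write back only on replacement
            self.memory[addr] = self.cache[addr]
            self.dirty.discard(addr)
        self.cache.pop(addr, None)

wb = WriteBackCache()
wb.write(0b10101100, 3)
print(wb.memory)        # {} -> main memory still stale (inconsistent)
wb.evict(0b10101100)
print(wb.memory)        # {172: 3} -> updated only at replacement time
```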
๏ Here the CPU first checks whether the desired data is present in the Cache Memory or not, i.e., whether there is a "hit" or a "miss" in the cache. Suppose there are 3 misses in the Cache Memory; then the Main Memory will be accessed 3 times.
๏ Cache performance is optimized further by introducing multilevel caches.
๏ Consider a 2-level cache design. Suppose there are 3 misses in the L1 Cache Memory, and of these 3 misses only 1 also misses in the L2 Cache Memory; then the Main Memory will be accessed only once. Clearly the miss penalty is reduced considerably compared with the previous case, thereby improving the performance of the cache memory.
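As a rough illustration of why this helps, consider the average memory access time (AMAT). The cycle counts below are assumptions for the sketch, not from the notes; only the 1-in-3 L2 miss ratio comes from the example above.

```python
# AMAT = hit time + miss rate x miss penalty. With an L2 cache,
# only the fraction of L1 misses that also miss in L2 pays the
# full main-memory penalty, so the effective penalty shrinks.
def amat(hit_time, miss_rate, miss_penalty):
    return hit_time + miss_rate * miss_penalty

# Single level: L1 backed directly by main memory (assumed numbers).
single = amat(hit_time=1, miss_rate=0.10, miss_penalty=100)

# Two levels: an L1 miss goes to L2; 1 of 3 L1 misses also misses in L2.
l2_penalty = amat(hit_time=10, miss_rate=1/3, miss_penalty=100)
two_level  = amat(hit_time=1, miss_rate=0.10, miss_penalty=l2_penalty)

print(single, two_level)   # 11.0 vs ~5.33 cycles on these assumptions
```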