Shared Memory Multiprocessors
- Introduction
- UMA systems
- NUMA systems
- COMA systems
- Cache coherency
- Process synchronization
- Models of memory consistency
Shared memory multiprocessors
- A system with multiple CPUs “sharing” the same main memory is called a multiprocessor.
- In a multiprocessor system, all processes running on the various CPUs share a single logical address space, which is mapped onto a physical memory that may be distributed among the processors.
- Each process can read and write a data item simply by using load and store operations, so processes communicate through shared memory.
- It is the hardware that makes all CPUs access and use the same main memory.
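The idea that communication happens through plain loads and stores can be sketched with two threads sharing one variable (a toy Python illustration, not code from the course; the event is used only to order the two accesses):

```python
import threading

shared = {"data": None}          # memory visible to every thread
ready = threading.Event()        # orders the store before the load

def producer():
    shared["data"] = 42          # a plain "store" into shared memory
    ready.set()

def consumer(out):
    ready.wait()
    out.append(shared["data"])   # a plain "load" from shared memory

result = []
t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer, args=(result,))
t2.start(); t1.start()
t1.join(); t2.join()
```

No message is ever sent: the consumer sees the producer's value only because both access the same memory location.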
- Since all CPUs share the address space, only a single instance of the operating system is required.
- When a process terminates or enters a wait state for whatever reason, the O.S. can look in the process table (more precisely, in the ready-process queue) for another process to dispatch to the idle CPU.
- By contrast, in systems with no shared memory, each CPU must have its own copy of the operating system, and processes can only communicate through message passing.
- The basic issue in shared memory multiprocessor systems is memory itself: the larger the number of processors involved, the harder it is to access memory efficiently.
- All modern OSs (Windows, Solaris, Linux, macOS) support symmetric multiprocessing (SMP), with a scheduler running on every processor (a simplified description, of course).
- “Ready to run” processes can be inserted into a single queue accessible to every scheduler; alternatively, there can be a “ready to run” queue for each processor.
- When a scheduler is activated on a processor, it chooses one of the “ready to run” processes and dispatches it on its processor (with a single queue, things are somewhat more difficult; can you guess why?)
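One answer to the question above: with a single queue, every per-CPU scheduler must serialize on the same lock before dispatching. A minimal sketch (names are illustrative) of that shared, lock-protected ready queue:

```python
import threading
from collections import deque

ready_queue = deque(["P1", "P2", "P3", "P4"])
queue_lock = threading.Lock()    # every CPU's scheduler contends on this
dispatched = {}                  # cpu -> process chosen by its scheduler

def schedule_on(cpu):
    # Each scheduler must hold the same lock, so dispatch is serialized
    # across all CPUs: this is the contention cost of a single queue.
    with queue_lock:
        if ready_queue:
            dispatched[cpu] = ready_queue.popleft()

schedulers = [threading.Thread(target=schedule_on, args=(c,)) for c in range(4)]
for t in schedulers: t.start()
for t in schedulers: t.join()
```

With one queue per processor, each scheduler would take only its own lock, eliminating this global serialization point.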
- Modern OSs designed for SMP often have a separate queue for each processor (to avoid the problems associated with a single queue).
- There is an explicit mechanism for load balancing, by which a process on the wait list of an overloaded processor is moved to the queue of another, less loaded processor.
- As an example, SMP Linux activates its load-balancing scheme every 200 ms, and whenever a processor's queue empties.
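The balancing policy can be sketched as follows (a hedged toy model: the function name and the "difference greater than one" threshold are invented for illustration, not Linux's actual algorithm):

```python
def balance(queues):
    """queues: dict cpu -> list of ready processes (index 0 runs next).
    Migrate one process from the busiest queue to the idlest one."""
    busiest = max(queues, key=lambda c: len(queues[c]))
    idlest = min(queues, key=lambda c: len(queues[c]))
    if len(queues[busiest]) - len(queues[idlest]) > 1:
        queues[idlest].append(queues[busiest].pop())  # migrate one process
    return queues

qs = {0: ["A", "B", "C", "D"], 1: []}   # CPU 0 overloaded, CPU 1 idle
balance(qs)
```

After the call, one process has moved from CPU 0's queue to CPU 1's; in the real scheme this would be triggered periodically and whenever a queue empties.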
- Migrating a process to a different processor can be costly when each core has a private cache (can you guess why?).
- This is why some OSs, such as Linux, offer a system call to specify that a process is tied to a given processor (processor affinity), independently of the processors' load.
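A toy model of the affinity mechanism (all names here are invented for illustration; on Linux the real interface is the `sched_setaffinity` system call, exposed in Python as `os.sched_setaffinity`): each process carries a set of CPUs it is allowed to run on, and the dispatcher honours that set.

```python
def dispatch(process, allowed_cpus, idle_cpus):
    """Pick the first idle CPU the process is allowed to run on.
    Returns None if the process must wait despite idle CPUs existing."""
    for cpu in idle_cpus:
        if cpu in allowed_cpus:
            return cpu
    return None

# A process pinned to CPU 2 lands on CPU 2 even if CPUs 0 and 1 are idle:
chosen = dispatch("P", allowed_cpus={2}, idle_cpus=[0, 1, 2])
```

Note the trade-off the model makes visible: a pinned process keeps its warm cache, but may sit waiting even while other CPUs are idle.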
- There are three classes of multiprocessors, according to the way each CPU sees main memory:
1. Uniform Memory Access (UMA) : every CPU sees the same access time to any location in main memory (thus the UMA denomination).
2. Non-Uniform Memory Access (NUMA) : these systems have a shared logical address space, but physical memory is distributed among the CPUs, so that the access time to a data item depends on its position, in local or in remote memory (thus the NUMA denomination).
- These systems are also called Distributed Shared Memory (DSM) architectures (Hennessy-Patterson, Fig. 6.2)
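The NUMA cost asymmetry can be captured in a simplified model (the latency figures are made up for illustration, not real hardware data): an access is fast if the address lives in the requesting node's local memory, slow otherwise.

```python
LOCAL_NS, REMOTE_NS = 100, 300   # assumed latencies, illustrative only

def access_time(cpu_node, addr, node_of):
    """node_of maps an address to the node whose memory holds it."""
    return LOCAL_NS if node_of[addr] == cpu_node else REMOTE_NS

placement = {0x10: 0, 0x20: 1}   # which node each address resides on

t_local = access_time(0, 0x10, placement)    # local access: fast
t_remote = access_time(0, 0x20, placement)   # remote access: slow
```

This is why data placement (and process placement) matters on NUMA/DSM machines: the same load instruction can cost several times more depending on where the page happens to live.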
3. Cache Only Memory Architecture (COMA) : data items have no specific “permanent” location (no specific memory address) where they reside and from which they can be read (copied into local caches) and/or modified (first in the cache, then updated at their “permanent” location).
- Data can migrate and/or be replicated among the various memory banks of the central main memory.
UMA multiprocessors
- Larger multiprocessor systems (>32 CPUs) cannot use a single bus to interconnect CPUs and memory modules, because bus contention becomes unmanageable.
- The CPU–memory connection is realized through an interconnection network (in jargon, a “fabric”).
- Caches local to each CPU alleviate the problem; furthermore, each processor can be equipped with a private memory to store data of computations that need not be shared with other processors. Traffic to/from shared memory can thus be reduced considerably (Tanenbaum, Fig. 8.24)
UMA multicores - manycores
(Figure: two chip organizations, each with multiple cores sharing RAM. Every core has private L1 and L2 caches; all cores share a last-level cache (L3/LLC). Typical core counts: multicores 2 ÷ 22, manycores ∼ 70.)
Caches and memory in multiprocessors
- Memory (and the memory hierarchy) in multiprocessors poses two different problems:
- Coherency: whenever the address space is shared, the same memory location can have multiple instances (cached copies) at different processors.
- Consistency: whenever different access times can be seen by processors, write operations from different processors require some model guaranteeing a sound, consistent behaviour (the when issue, namely the ordering of writes).
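The coherency problem can be made concrete with a toy model (all names invented; real hardware uses protocols such as snooping or directories, not explicit deletes): each CPU keeps a private cached copy of location `x`, and a write that is not propagated leaves another CPU reading a stale value until its copy is invalidated.

```python
memory = {"x": 0}
cache = {0: {}, 1: {}}           # one private cache per CPU

def read(cpu, addr):
    if addr not in cache[cpu]:   # miss: fetch from main memory
        cache[cpu][addr] = memory[addr]
    return cache[cpu][addr]

def write(cpu, addr, value):     # write-through, but no invalidation
    cache[cpu][addr] = value
    memory[addr] = value

read(1, "x")                     # CPU 1 caches x == 0
write(0, "x", 7)                 # CPU 0 updates x in its cache and memory
stale = read(1, "x")             # CPU 1 still sees its old copy: 0
del cache[1]["x"]                # what a coherency protocol would do: invalidate
fresh = read(1, "x")             # CPU 1 re-fetches and now sees 7
```

A coherency protocol automates exactly the invalidation step done by hand here; a consistency model then specifies *when* (in what order) such updates become visible to the other processors.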
Crossbar switch UMA systems
- A switch is located at each crosspoint between a vertical and a horizontal line, allowing the two to be connected when required.
- In the figure, three switches are closed, thus connecting CPU- memory pairs (001-000), (101-101) and (110-010). (Tanenbaum, Fig. 8.27)
- It is possible to configure the switches so that each CPU can connect to each memory bank (and this makes the system UMA).
- The number of switches in this scheme scales with the number of CPUs and memories: n CPUs and n memories require n² switches.
- This pattern fits medium-scale systems well (various multiprocessor systems from Sun Microsystems use this scheme); certainly, a 256-processor system cannot use it (256² = 65,536 switches would be required!).
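The quadratic growth is worth checking explicitly: a full crossbar needs one switch per crosspoint, i.e. one per (CPU, memory) pair.

```python
def crossbar_switches(n_cpus, n_memories):
    """Switch count for a full crossbar: one per crosspoint."""
    return n_cpus * n_memories

small = crossbar_switches(8, 8)        # a medium-scale system: 64 switches
large = crossbar_switches(256, 256)    # 65,536 switches: impractical
```

This is why large-scale systems turn to multistage networks (e.g. omega networks), whose switch count grows as n·log n rather than n².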