MCA 502 – Computer Architecture and Organization (Lecture Notes)

Author: Dr. Deepti Mehrotra
Vetter: Dr. Sandeep Arya
Lesson No. 01: Parallel Computer Models
1.1 Objective
1.2 Introduction
1.3 The state of computing
1.3.1 Evolution of computer systems
1.3.2 Elements of Modern Computers
1.3.3 Flynn's Classical Taxonomy
1.3.4 System attributes
1.4 Multiprocessor and multicomputers
1.4.1 Shared memory multiprocessors
1.4.2 Distributed Memory Multiprocessors
1.4.3 A taxonomy of MIMD Computers
1.5 Multivector and SIMD computers
1.5.1 Vector Supercomputer
1.5.2 SIMD supercomputers
1.6 PRAM and VLSI model
1.6.1 Parallel Random Access machines
1.6.2 VLSI Complexity Model
1.7 Keywords
1.8 Summary
1.9 Exercises
1.10 References
1.1 Objective
The main aim of this chapter is to learn about the evolution of computer systems, the attributes on which the performance of a system is measured, the classification of computers by their ability to perform multiprocessing, and the trends towards parallel processing.
1.2 Introduction
From an application point of view, mainstream computer usage is moving through four ascending levels of sophistication:

  • Data processing
  • Information processing
  • Knowledge processing
  • Intelligence processing

With more and more data structures being developed, many users are shifting their computer use from pure data processing to information processing. A high degree of parallelism has been found at these levels. As accumulated knowledge bases expanded rapidly in recent years, a strong demand grew to use computers for knowledge processing. Intelligence is very difficult to create, and its processing even more so. Today's computers are very fast and obedient and have many reliable memory cells, qualifying them for data, information and knowledge processing. Parallel processing is emerging as one of the key technologies of modern computing. Parallelism appears in various forms such as lookahead, vectorization, concurrency, simultaneity, data parallelism, interleaving, overlapping, multiplicity, replication, multiprogramming, multithreading and distributed computing at different processing levels.

1.3 The state of computing
Modern computers are equipped with powerful hardware and, at the same time, loaded with sophisticated software packages. To assess the state of computing we first review the history of computers and then study the attributes used to analyse their performance.

1.3.1 Evolution of computer systems
The technology behind the hardware components of computers, and their overall architecture, is changing very rapidly: processor clock rates increase about 20% a year, logic capacity improves about 30% a year, memory speed increases about 10% a year, memory capacity grows about 60% a year, disk capacity grows about 60% a year, and so the overall cost per bit improves about 25% a year. But before we go further into the design and organization of parallel computer architectures, it is necessary to understand how computers evolved. Initially, man used simple mechanical devices – the abacus (about 500 BC), knotted string, and the slide rule for

implementations. Designers always tried to make each new machine upward compatible with the older machines.

  1. The concept of specialized registers was introduced: index registers appeared in the Ferranti Mark I, a register to save the return address was introduced in the UNIVAC I, immediate operands in the IBM 704, and the detection of invalid operations in the IBM 650.
  2. Punched cards and paper tape were the devices used at that time for storing programs. By the end of the 1950s the IBM 650 had become one of the popular computers of its time; it used drum memory onto which programs were loaded from punched cards or paper tape. Some high-end machines also introduced core memory, which provided higher speeds, and hard disks started becoming popular.
  3. As said earlier, machines of the early 1950s were design-specific, and most of them were built for particular numerical processing tasks. Many even used decimal numbers as the base number system of their instruction sets; in such machines there were actually ten vacuum tubes per digit in each register.
  4. The software used was machine-level language and assembly language.
  5. They were mostly designed for scientific calculation; later, some systems were developed for simple business applications.
  6. Architecture features: vacuum tubes and relay memories; CPU driven by a program counter (PC) and accumulator; fixed-point arithmetic only.
  7. Software and applications: machine and assembly language; single user at a time; no subroutine linkage mechanisms; programmed I/O requiring continuous use of the CPU.
  8. Examples: ENIAC, Princeton IAS, IBM 701.

IInd generation of computers (1954–64)
The transistor was invented by Bardeen, Brattain and Shockley in 1947 at Bell Labs, and by the 1950s transistors had started an electronic revolution: a transistor is smaller, cheaper and dissipates less heat than a vacuum tube. Transistors were now used instead of vacuum tubes to construct computers. Another major invention was the magnetic core for storage; these cores were used to build large random-access memories. Computers of this generation had better processing speed, larger memory capacity and smaller size than the previous generation. The key features of this generation were:

  1. IInd generation computers were designed using germanium transistors, a technology much more reliable than vacuum tubes.
  2. Transistor technology reduced switching times to 1–10 microseconds, providing an overall speedup.
  3. Magnetic cores were used as main memory, with capacities of about 100 KB; tapes and disks were used as secondary (peripheral) memory.
  4. The concept of an instruction set was introduced, so that the same program could be executed on different systems.
  5. High-level languages (FORTRAN, COBOL, ALGOL) and batch operating systems appeared.
  6. Computers were now used for extensive business applications, engineering design, optimization using linear programming, and scientific research.
  7. The binary number system was widely used.
  8. Technology and architecture: discrete transistors and core memories; I/O processors; multiplexed memory access; floating-point arithmetic available; Register Transfer Language (RTL) developed.
  9. Software and applications: high-level languages (FORTRAN, COBOL, ALGOL) with compilers and subroutine libraries; batch operating systems, although mostly single user at a time.
  10. Examples: CDC 1604, UNIVAC LARC, IBM 7090.

IIIrd Generation computers (1965 to 1974)
In the 1950s and 1960s, discrete components (transistors, resistors, capacitors) were manufactured and packaged in separate containers. To design a computer, these discrete

  1. Software and applications: multiprogramming and time-sharing operating systems; multi-user applications.
  2. Examples: IBM 360/370, CDC 6600, TI ASC, DEC PDP-

IVth Generation computers (1975 to 1990)
The microprocessor was invented: a single VLSI (Very Large Scale Integration) chip containing an entire CPU. Main-memory chips of 1 MB and more were introduced as single VLSI chips. Caches were invented and placed between the main memory and the microprocessor. These VLSI devices greatly reduced the space required by a computer and significantly increased its computational speed.

  1. Technology and architecture features: LSI/VLSI circuits, semiconductor memory; multiprocessors, vector supercomputers, multicomputers; shared or distributed memory; vector processors.
  2. Software and applications: multiprocessor operating systems, languages, compilers and parallel software tools.
  3. Examples: VAX 9000, Cray X-MP, IBM 3090, BBN TC.

Fifth Generation computers (1990 onwards)
In the mid-to-late 1980s, in order to further improve system performance, designers started using a technique known as "instruction pipelining". The idea is to break instruction execution into small steps so that the processor works on several instructions in different stages of completion; for example, while computing the result of the current instruction, the processor also fetches the operands for the next instruction. Superscalar processors were later designed on this basis: to execute multiple instructions in parallel they provide multiple execution units, i.e. separate arithmetic-logic units (ALUs). Instead of executing a single instruction at a time, the CPU looks for several instructions that are not dependent on each other and executes them in parallel. Examples of this design are VLIW and EPIC.
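To see why pipelining pays off, a rough timing comparison helps (this sketch is added for illustration and is not part of the original notes; the instruction count n and stage count k are assumed values). An unpipelined processor needs about n*k stage-times for n instructions, whereas an ideal k-stage pipeline needs about k + (n - 1) cycles, since one instruction completes per cycle once the pipe is full:

```c
#include <stdio.h>

/* Classical timing estimate for an ideal k-stage pipeline (illustrative only):
 * unpipelined: every instruction occupies all k stage-times sequentially;
 * pipelined:   after the first instruction fills the pipe, one instruction
 *              completes per cycle (no stalls or hazards assumed).          */
int main(void) {
    const long n = 1000;   /* instructions (assumed workload) */
    const int  k = 5;      /* pipeline stages (assumed depth) */

    long unpipelined = n * k;        /* cycles without pipelining  */
    long pipelined   = k + (n - 1);  /* cycles with ideal pipeline */

    printf("unpipelined: %ld cycles\n", unpipelined);
    printf("pipelined:   %ld cycles\n", pipelined);
    printf("speedup:     %.2f\n", (double)unpipelined / pipelined);
    return 0;
}
```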

  1. Technology and architecture features: ULSI/VHSIC processors, memories and switches; high-density packaging; scalable architectures; vector processors.
  2. Software and applications: massively parallel processing; grand-challenge applications; heterogeneous processing.
  3. Examples: Fujitsu VPP500, Cray MPP, TMC CM-5, Intel Paragon.

1.3.2 Elements of Modern Computers
The hardware, software and programming elements of modern computer systems can be characterized by looking at a variety of factors. In the context of parallel computing these factors are:
  • Computing problems
  • Algorithms and data structures
  • Hardware resources
  • Operating systems
  • System software support
  • Compiler support

Computing Problems
  • Numerical computing: complex mathematical formulations; tedious integer or floating-point computation
  • Transaction processing: accurate transactions; large database management; information retrieval
  • Logical reasoning: logic inference; symbolic manipulation
  • Parallel software can be developed using entirely new languages designed specifically with parallel support as their goal, or by using extensions to existing sequential languages.
  • New languages have obvious advantages (like new constructs specifically for parallelism), but require additional programmer education and system software.
  • The most common approach is to extend an existing language.

Compiler Support
  • Preprocessors: use existing sequential compilers and specialized libraries to implement parallel constructs.
  • Precompilers: perform some program flow analysis, dependence checking, and limited parallel optimizations.
  • Parallelizing compilers: require full detection of parallelism in source code and transformation of sequential code into parallel constructs.
  • Compiler directives: often inserted into source code to aid the compiler's parallelizing efforts.

1.3.3 Flynn's Classical Taxonomy
Among the classification schemes mentioned above, the one widely used since 1966 is Flynn's taxonomy. It distinguishes multiprocessor computer architectures along two independent dimensions: the instruction stream and the data stream. An instruction stream is the sequence of instructions executed by the machine, and a data stream is the sequence of data (including input and partial or temporary results) used by the instruction stream. Each of these dimensions can be in only one of two states: single or multiple. Flynn's classification depends on the distinction between the activity of the control unit and that of the data-processing unit, rather than on their operational and structural interconnections. The four categories of Flynn's classification and the characteristic features of each are described below.

a) Single instruction stream, single data stream (SISD)

Figure 1.1 Execution of instructions in SISD processors

Figure 1.1 represents the organization of a simple SISD computer having one control unit, one processor unit and a single memory unit.

Figure 1.2 SISD processor organization

  • They are also called scalar processors, i.e. they execute one instruction at a time, and each instruction has only one set of operands.
  • Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
  • Single data: only one data stream is being used as input during any one clock cycle
  • Deterministic execution
  • Instructions are executed sequentially.
  • This is the oldest and, until recently, the most prevalent form of computer.
  • Examples: most PCs, single-CPU workstations and mainframes.

b) Single instruction stream, multiple data stream (SIMD) processors
  • A type of parallel computer
  • Single instruction: all processing units execute the same instruction, issued by the control unit, at any given clock cycle; as shown in the SIMD organization figure, multiple processors execute the instruction given by one control unit.
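To make the SISD/SIMD distinction concrete, here is a small illustrative sketch in C (added here, not taken from the notes). The first loop is scalar: one add instruction per pair of operands. The second uses x86 SSE intrinsics, so a single ADDPS instruction operates on four floats at once; it assumes an SSE-capable processor and compiler.

```c
#include <stdio.h>
#include <xmmintrin.h>   /* SSE intrinsics: one instruction, four floats */

#define N 8

int main(void) {
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N], d[N];

    /* SISD style: one instruction operates on one pair of operands at a time */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* SIMD style: one ADDPS instruction operates on four pairs of operands */
    for (int i = 0; i < N; i += 4) {
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&d[i], _mm_add_ps(va, vb));
    }

    for (int i = 0; i < N; i++)
        printf("%g %g\n", c[i], d[i]);   /* both columns agree */
    return 0;
}
```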

c) Multiple instruction stream, single data stream (MISD)

  • A single data stream is fed into multiple processing units.
  • Each processing unit operates on the data independently via an independent instruction stream. As shown in figure 1.5, a single data stream is forwarded to different processing units, each connected to a different control unit and executing the instructions issued by the control unit to which it is attached.

Figure 1.5 MISD processor organization

  • Thus in these computers the same data flows through a linear array of processors executing different instruction streams, as shown in figure 1.6.
  • This architecture is also known as a systolic array, for pipelined execution of specific instructions.
  • Few actual examples of this class of parallel computer have ever existed. One is the experimental Carnegie-Mellon C.mmp computer (1971).
  • Some conceivable uses might be:
  1. multiple frequency filters operating on a single signal stream
  2. multiple cryptography algorithms attempting to crack a single coded message.

Figure 1.6 Execution of instructions in MISD processors
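The "multiple filters on a single signal stream" idea can be sketched loosely in C (an illustration added here; the three filter routines are hypothetical). In a real MISD organization each routine would run on its own processing unit against the same data stream; here they are simply called one after another:

```c
#include <stdio.h>

/* Three independent "instruction streams" applied to the same data stream.
 * In a real MISD machine each routine would run on a separate processing
 * unit; here they are invoked sequentially purely for illustration.       */
static double scale(double x)  { return 0.5 * x; }   /* hypothetical filter */
static double square(double x) { return x * x;   }   /* hypothetical filter */
static double negate(double x) { return -x;      }   /* hypothetical filter */

int main(void) {
    double stream[] = {1.0, 2.0, 3.0, 4.0};          /* the single data stream */
    for (int i = 0; i < 4; i++) {
        double x = stream[i];
        printf("%4.1f -> %6.2f %6.2f %6.2f\n", x, scale(x), square(x), negate(x));
    }
    return 0;
}
```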

d) Multiple instruction stream, multiple data stream (MIMD)

  • Multiple Instruction: every processor may be executing a different instruction stream
  • Multiple data: every processor may be working with a different data stream; as shown in figure 1.7, the multiple data streams are provided by shared memory.
  • Can be categorized as loosely coupled or tightly coupled depending on sharing of data and control
  • Execution can be synchronous or asynchronous, deterministic or non-deterministic.

Figure 1.7 MIMD processor organizations

  • As shown in figure 1.8, there are different processors, each processing a different task.
  • Examples: most current supercomputers, networked parallel computer "grids" and multi-processor SMP computers - including some types of PCs.

Figure 1.8 Execution of instructions in MIMD processors
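A minimal software sketch of the MIMD idea, using POSIX threads (added for illustration, not part of the notes): the two threads below execute different instruction streams on different data at the same time. Compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

/* Thread 1: one instruction stream, sums an integer array */
static void *sum_task(void *arg) {
    int *data = arg, total = 0;
    for (int i = 0; i < 4; i++) total += data[i];
    printf("sum = %d\n", total);
    return NULL;
}

/* Thread 2: a different instruction stream, scales a float array */
static void *scale_task(void *arg) {
    float *data = arg;
    for (int i = 0; i < 4; i++) data[i] *= 2.0f;
    printf("scaled last = %f\n", data[3]);
    return NULL;
}

int main(void) {
    int   ints[4]   = {1, 2, 3, 4};
    float floats[4] = {1.5f, 2.5f, 3.5f, 4.5f};
    pthread_t t1, t2;

    /* Different programs, different data: the essence of MIMD */
    pthread_create(&t1, NULL, sum_task, ints);
    pthread_create(&t2, NULL, scale_task, floats);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
```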

benchmarks were developed. Computer architects have come up with a variety of metrics to describe computer performance.

Clock rate and CPI/IPC: Since I/O and system overhead frequently overlap the processing of other programs, it is fair to consider only the CPU time used by a program, and the user CPU time is the most important factor. The CPU is driven by a clock with a constant cycle time τ (usually measured in nanoseconds), which controls the rate of internal operations in the CPU. The inverse of the cycle time is the clock rate (f = 1/τ, measured in megahertz). A shorter clock cycle time, or equivalently a larger number of cycles per second, implies that more operations can be performed per unit time. The size of a program is determined by its instruction count Ic, the number of machine instructions to be executed by the program. Different machine instructions require different numbers of clock cycles to execute, so CPI (cycles per instruction) is an important parameter.

Average CPI: It is easy to determine the average number of cycles per instruction for a particular processor if we know the frequency of occurrence of each instruction type. Of course, any estimate is valid only for a specific set of programs (which defines the instruction mix), and then only if there is a sufficiently large number of instructions. In general, the term CPI is used with respect to a particular instruction set and a given program mix. The time required to execute a program containing Ic instructions is T = Ic * CPI * τ.

Each instruction must be fetched from memory and decoded, then its operands fetched from memory, the instruction executed, and the results stored. The time required to access memory is called the memory cycle time, which is usually k times the processor cycle time τ; the value of k depends on the memory technology and the processor–memory interconnection scheme. The processor cycles required for each instruction (CPI) can be attributed to cycles needed for instruction decode and execution (p) and cycles needed for memory references (m * k). The total time needed to execute a program can then be rewritten as T = Ic * (p + m * k) * τ.
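As a small worked example of T = Ic * (p + m * k) * τ, the numbers below are assumed values chosen only for illustration, not measurements from the text:

```c
#include <stdio.h>

int main(void) {
    /* Assumed, illustrative values -- not figures from the notes */
    double Ic  = 2.0e9;   /* instructions executed by the program        */
    double p   = 1.5;     /* cycles for decode and execution, on average */
    double m   = 0.4;     /* memory references per instruction           */
    double k   = 4.0;     /* memory cycle time / processor cycle time    */
    double tau = 1.0e-9;  /* processor cycle time: 1 ns (1 GHz clock)    */

    double cpi = p + m * k;       /* effective cycles per instruction      */
    double T   = Ic * cpi * tau;  /* total CPU time, T = Ic*(p + m*k)*tau  */

    printf("CPI = %.2f\n", cpi);        /* 3.10            */
    printf("CPU time = %.2f s\n", T);   /* about 6.2 s     */
    return 0;
}
```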

MIPS: Millions of instructions per second; this is calculated by dividing the number of instructions executed in a running program by the time required to run the program. The MIPS rate is directly proportional to the clock rate and inversely proportional to the CPI. All four system attributes (instruction set, compiler, processor, and memory technologies) affect the MIPS rate, which also varies from program to program. MIPS has not proved very effective, because it does not account for the fact that different systems often require different numbers of instructions to implement the same program; it says nothing about how many instructions are required to perform a given task. With variations in instruction styles, internal organization and number of processors per system, it is almost meaningless for comparing two systems.

MFLOPS (pronounced "megaflops") stands for millions of floating-point operations per second. This is often used as a "bottom-line" figure. If one knows ahead of time how many operations a program needs to perform, one can divide the number of operations by the execution time to come up with a MFLOPS rating. For example, the standard algorithm for multiplying two n x n matrices requires 2n^3 - n operations (n^2 inner products, with n multiplications and n - 1 additions in each product). Suppose you compute the product of two 100 x 100 matrices in 0.35 seconds. Then the computer achieves (2(100)^3 - 100)/0.35 = 5,714,000 ops/sec = 5.714 MFLOPS.

The term "theoretical peak MFLOPS" refers to how many operations per second would be possible if the machine did nothing but numerical operations. It is obtained by calculating the time it takes to perform one operation and then computing how many of them could be done in one second. For example, if it takes 8 cycles to do one floating-point multiplication, the cycle time on the machine is 20 nanoseconds, and arithmetic operations are not overlapped with one another, it takes 160 ns for one multiplication, and (1,000,000,000 ns / 1 sec) * (1 multiplication / 160 ns) = 6.25 * 10^6 multiplications/sec, so the theoretical peak performance is 6.25 MFLOPS. Of course, programs are not just long sequences of multiply and add instructions, so a machine rarely comes close to this level of performance on any real program. Most machines will achieve less than 10% of their peak rating, but vector processors or other machines with internal pipelines that have an effective CPI near 1.0 can often achieve 70% or more of their theoretical peak on small programs.
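The matrix-multiplication example above can be reproduced in a few lines of C; the 0.35-second timing and the 8-cycle/20-ns peak figures are taken from the text, and everything else follows from the 2n^3 - n operation count:

```c
#include <stdio.h>

int main(void) {
    double n       = 100.0;                 /* matrix dimension               */
    double seconds = 0.35;                  /* measured run time (from text)  */
    double ops     = 2.0 * n * n * n - n;   /* 2n^3 - n floating-point ops    */

    double mflops = ops / seconds / 1.0e6;  /* millions of FP ops per second  */
    printf("%.0f operations in %.2f s -> %.3f MFLOPS\n", ops, seconds, mflops);

    /* Theoretical peak from the second example: 8 cycles per multiply,
       20 ns cycle time, no overlap -> 160 ns per operation               */
    double peak = 1.0e9 / (8 * 20.0) / 1.0e6;
    printf("theoretical peak = %.2f MFLOPS\n", peak);
    return 0;
}
```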

  • compiler technology (affects Ic, p and m)
  • CPU implementation and control (affects p·τ)
  • cache and memory hierarchy (affects the memory access latency, k·τ)
  • Total CPU time can be used as a basis for estimating the execution rate of a processor.

Programming Environments
Programmability depends on the programming environment provided to the user. Conventional computers are used in a sequential programming environment, with tools developed for uniprocessor computers. Parallel computers need parallel tools that allow specification or easy detection of parallelism, and operating systems that can perform parallel scheduling of concurrent events, shared memory allocation, and shared peripheral and communication links.

Implicit Parallelism
Use a conventional language (such as C, Fortran, Lisp or Pascal) to write the program, and a parallelizing compiler to translate the source code into parallel code. The compiler must detect the parallelism and assign target machine resources. Success relies heavily on the quality of the compiler.

Explicit Parallelism
The programmer writes explicit parallel code using parallel dialects of common languages. The compiler has less need to detect parallelism, but must still preserve the existing parallelism and assign target machine resources (a small sketch of explicit parallelism follows the list of needed software tools below).

Needed Software Tools
  • Parallel extensions of conventional high-level languages.
  • Integrated environments providing different levels of program abstraction; validation, testing and debugging; performance prediction and monitoring; and visualization support to aid program development, performance measurement, and graphics display and animation of computational results.
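A minimal sketch of explicit parallelism, using OpenMP as the parallel extension of C (OpenMP is an assumption of this example, not something named in the notes): the programmer states directly that the loop iterations are independent, and the compiler and runtime assign them to processors. Compile with -fopenmp.

```c
#include <stdio.h>
#include <omp.h>          /* OpenMP runtime library */

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* Explicit parallelism: the directive tells the compiler these
       iterations are independent and may be split across threads.  */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f (up to %d threads)\n", c[N - 1], omp_get_max_threads());
    return 0;
}
```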
1.4 Multiprocessor and Multicomputers
Two categories of parallel computers are discussed below: those with shared common memory and those with unshared distributed memory.

1.4.1 Shared memory multiprocessors
  • Shared memory parallel computers vary widely, but generally have in common the ability for all processors to access all memory as a global address space.
  • Multiple processors can operate independently but share the same memory resources.
  • Changes in a memory location effected by one processor are visible to all other processors.
  • Shared memory machines can be divided into classes based upon memory access times: UMA, NUMA and COMA.

Uniform Memory Access (UMA):

  • Most commonly represented today by Symmetric Multiprocessor (SMP) machines
  • Identical processors
  • Equal access and access times to memory
  • Sometimes called CC-UMA - Cache Coherent UMA. Cache coherent means if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.

Figure 1.9 Shared Memory (UMA)
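A rough software analogue of the shared-memory model (a sketch added here, not from the notes): the threads below stand in for processors that address the same global memory location, so an update made by one is visible to all; the mutex plays the role of the hardware's synchronization support. Compile with -pthread.

```c
#include <pthread.h>
#include <stdio.h>

/* One location in the global address space, visible to every thread */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* serialize updates to the shared data */
        counter++;                    /* change is visible to all threads     */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);   /* 400000: every update was seen */
    return 0;
}
```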