
Frontiers of Supercomputing
Parallel processing, or the application of several processors to a single task, is an old idea with a relatively large literature. The advent of very large-scale integrated technology has made testing the idea feasible, and the fact that single-processor systems are approaching their maximum performance level has made it necessary. We shall show, however, that successful use of parallel processing imposes stringent performance requirements on algorithms, software, and architecture.

The so-called asynchronous systems that use a few tightly coupled high-speed processors are a natural evolution from high-speed single-processor systems. Indeed, systems with two to four processors will soon be available (for example, the Cray X-MP, the Cray-2, and the Control Data System 2XX). Systems with eight to sixteen processors are likely by the early 1990s. What are the prospects of using the parallelism in such systems to achieve high speed in the execution of a single application? Early attempts with vector processing have shown that plunging forward without a precise understanding of the factors involved can lead to disastrous results. Such understanding will be even more critical for systems now contemplated that may use up to a thousand processors.

The key issue in the parallel processing of a single application is the speedup achieved, especially its dependence on the number of processors used. We define speedup (S) as the factor by which the execution time for the application changes; that is,
$$S = \frac{\text{execution time for one processor}}{\text{execution time for } p \text{ processors}}.$$
To estimate the speedup of a tightly coupled system on a single application, we use a model of parallel computation introduced by Ware. We define α as the fraction of work in the application that can be processed in parallel. Then we make the simplifying assumption of a two-state machine; that is, at any instant either all p processors are operating or only one processor is operating. If we normalize the execution time for one processor to unity, then
$$S(p, \alpha) = \frac{1}{(1 - \alpha) + \dfrac{\alpha}{p}}.$$
Note that the first term in the denominator is the execution time devoted to that part of the application that cannot be processed in parallel, and the second term is the time for that part that can be processed in parallel. How does speedup vary with α? In particular, what is this relationship near α = 1, the ideal limit of complete parallelization? Differentiating S, we find that

$$\frac{dS}{d\alpha} = \frac{1 - \dfrac{1}{p}}{\left[(1 - \alpha) + \dfrac{\alpha}{p}\right]^2},$$

which at α = 1 equals p² − p, that is, approximately p² for large p.
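As a minimal numerical sketch of the Ware model (the function names and the processor counts below are our own illustration, not part of the original article), the following Python fragment evaluates S(p, α) and its derivative and confirms that the sensitivity at α = 1 is p² − p:

```python
def ware_speedup(p, alpha):
    """Ware-model speedup: 1 / ((1 - alpha) + alpha / p)."""
    return 1.0 / ((1.0 - alpha) + alpha / p)


def ware_derivative(p, alpha):
    """dS/d(alpha) = (1 - 1/p) / ((1 - alpha) + alpha / p) ** 2."""
    return (1.0 - 1.0 / p) / ((1.0 - alpha) + alpha / p) ** 2


for p in (4, 8, 16):
    # At alpha = 1 the derivative equals p**2 - p, i.e. roughly p**2 for large p.
    print(p, ware_speedup(p, 1.0), ware_derivative(p, 1.0), p**2 - p)
```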
The accompanying figure shows the Ware model of speedup as a function of α for a 4-processor, an 8-processor, and a 16-processor system. The quadratic dependence of dS/dα on p near α = 1 means that the speedup is extremely sensitive to α: for each of these systems the speedup remains modest for α less than 0.9. Consequently, to achieve significant speedup, we must have highly parallel algorithms.
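To make the figure's message concrete, the Ware formula can also be inverted to show how close to 1 the fraction α must be. A brief sketch, with the target speedup of p/2 chosen arbitrarily for illustration:

```python
def ware_speedup(p, alpha):
    return 1.0 / ((1.0 - alpha) + alpha / p)


def alpha_required(p, target_speedup):
    """Invert the Ware model: alpha needed to reach a given speedup on p processors."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / p)


for p in (4, 8, 16):
    for alpha in (0.5, 0.9, 0.99):
        print(f"p={p:2d}  alpha={alpha:.2f}  S={ware_speedup(p, alpha):5.2f}")
    # alpha needed to reach even half the ideal speedup p
    print(f"p={p:2d}  needs alpha={alpha_required(p, p / 2):.3f} for S = p/2")
```

For 16 processors, for example, reaching even a speedup of 8 requires α ≈ 0.93, which is consistent with the steep rise of the curves near α = 1.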
It is by no means evident that algorithms in current use on single-processor machines contain the requisite parallelism, and research will be required to find suitable replacements for those that do not. Further, the highly parallel algorithms available must be implemented with care. For example, it is not sufficient to look at just those portions of the application amenable to parallelism, because α is determined by the entire application. For α close to 1, changes in those few portions less amenable to parallelism will cause small changes in α, but the quadratic behavior of the derivative will translate those small changes in α into large changes in speedup.

Those who have experience with vector processors will note a striking similarity between the Ware curves and plots of vector processor performance versus the fraction of vectorizable computation. This similarity is due to the assumption in the Ware model of a two-state machine, since a vector processor can also be viewed in that manner: in one state it is a relatively slow, general-purpose machine, and in the other state it is capable of high performance on vector operations.

Ware's model is inadequate in that it assumes that the instruction stream executed on a parallel system is the same as that executed on a single processor. Seldom is this the case, because multiple-processor systems usually require execution of instructions dealing with synchronization of the processes and communication between
processors. Further, parallel algorithms may inherently require additional instructions. To correct for this inadequacy, we add a term, σ(p), to the denominator of the Ware model to account for the overhead of synchronization, communication, and implementation. This term is at best nonnegative and is a function of the algorithm, the architecture, and even of the application itself. Let S_σ denote the speedup predicted by the modified model. Then

$$S_\sigma(p, \alpha) = \frac{1}{(1 - \alpha) + \dfrac{\alpha}{p} + \sigma(p)}.$$
If the application can be put completely in parallel form (α = 1), then

$$S_\sigma(p, 1) = \frac{1}{\dfrac{1}{p} + \sigma(p)} = \frac{p}{1 + p\,\sigma(p)} < p.$$

In other words, the maximum speedup of a real system is less than the number of processors.
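A short sketch of the modified model (the overhead value σ = 0.01 is an arbitrary illustrative choice, not a measured figure) shows how far below p even a completely parallel application is capped:

```python
def modified_speedup(p, alpha, sigma):
    """Ware model with a nonnegative overhead term sigma added to the denominator."""
    return 1.0 / ((1.0 - alpha) + alpha / p + sigma)


sigma = 0.01  # illustrative overhead, not a measured value
for p in (4, 16, 64, 256):
    s = modified_speedup(p, 1.0, sigma)
    print(f"p={p:3d}  S_sigma={s:6.2f}  bound p/(1 + p*sigma)={p / (1 + p * sigma):6.2f}")
```

At α = 1 the two printed columns coincide, since 1/(1/p + σ) and p/(1 + pσ) are the same expression.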
Also note that, whatever the value of α, S_σ will have a maximum for sufficiently large p and will then decline if the overhead σ(p) continues to increase with p. Thus the research challenge in parallel processing involves finding algorithms, programming languages, and parallel architectures that, when used as a system, yield a large amount of work processed in parallel (large α) at the expense of a minimum number of additional instructions (small σ).
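As a rough numerical illustration of this peaking behavior, consider a sketch in which the overhead grows linearly with the processor count; the model σ(p) = 0.002p and the choice α = 0.99 are assumptions made for the illustration only:

```python
def modified_speedup(p, alpha, sigma):
    return 1.0 / ((1.0 - alpha) + alpha / p + sigma)


def sigma_of_p(p):
    # Hypothetical overhead that grows linearly with the processor count.
    return 0.002 * p


# Search for the processor count that maximizes the modified speedup.
best_p, best_s = max(
    ((p, modified_speedup(p, 0.99, sigma_of_p(p))) for p in range(1, 513)),
    key=lambda pair: pair[1],
)
print(f"speedup peaks at about S={best_s:.1f} with p={best_p} processors")
```

Under these assumed numbers the speedup peaks at a few dozen processors and then falls as p grows, which is the behavior described above.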