




























































































Tata McGraw-Hill
ADVANCED COMPUTER ARCHITECTURE: Parallelism, Scalability, Programmability
Copyright © 2000 by The McGraw-Hill Companies, Inc. All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception that the program listings may be entered, stored, and executed in a computer system, but they may not be reproduced for publication.
Tata McGraw-Hill Edition 2001
Eighteenth reprint 2008
Reprinted in India by arrangement with The McGraw-Hill Companies, Inc., New York
Sales territories: India, Pakistan, Nepal, Bangladesh, Sri Lanka and Bhutan
Library of Congress Cataloging-in-Publication Data
Hwang, Kai, 1943–
Advanced Computer Architecture: Parallelism, Scalability, Programmability / Kai Hwang
p. cm. (McGraw-Hill computer science series. Computer organization and architecture. Network, parallel and distributed computing. McGraw-Hill computer engineering series)
Includes bibliographical references (p.) and index.
ISBN 0-07-031622-8
1. Computer architecture. I. Title. II. Series.
QA76.9.A73H87 1993
004'.35 dc20 92-44944
ISBN-13: 978-0-07-053070-6
ISBN-10: 0-07-053070-X
Published by Tata McGraw-Hill Publishing Company Limited, 7 West Patel Nagar, New Delhi 110 008, and printed at Gopaljee Enterprises, Delhi 110 053
Foreword
by Gordon Bell
Kai Hwang has introduced the issues in designing and using high performance parallel computers at a time when a plethora of scalable computers utilizing commodity microprocessors offer higher peak performance than traditional vector supercomputers. These new machines, their operating environments including the operating system and languages, and the programs to effectively utilize them are introducing more rapid changes for researchers, builders, and users than at any time in the history of computer structures.
For the first time since the introduction of the Cray 1 vector processor in 1975, it may again be necessary to change and evolve the programming paradigm — provided that massively parallel computers can be shown to be useful outside of research on massive parallelism. Vector processors required modest data parallelism, and these operations have been reflected either explicitly in Fortran programs or implicitly with the need to evolve Fortran (e.g., Fortran 90) to build in vector operations.
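As a minimal illustration of that modest data parallelism, consider a DAXPY-style loop in C (the function name and signature here are illustrative, not from the text). Its iterations are independent, so a vectorizing compiler can map the loop onto vector operations, much as Fortran 90 expresses the same computation directly in array syntax:

#include <stddef.h>

/* y = a*x + y: every iteration is independent, so the loop exhibits
   the modest data parallelism that vector hardware exploits. */
void daxpy(size_t n, double a, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

In Fortran 90, the entire loop collapses to the single array statement y = a*x + y.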
So far, the main line of supercomputing, as measured by usage (hours, jobs, number of programs, program portability), has been the shared-memory vector multiprocessor as pioneered by Cray Research. Fujitsu, IBM, Hitachi, and NEC all produce computers of this type. In 1993, the Cray C90 supercomputer delivers a peak of 16 billion floating-point operations per second (16 Gigaflops) with 16 processors and costs about $30 million, providing roughly 500 floating-point operations per second per dollar.
In contrast, massively parallel computers introduced in the early 1990s are nearly all based on the same powerful RISC-based CMOS microprocessors that are used in workstations. These scalar processors provide a peak of ≈ 100 million floating-point operations per second and cost $20 thousand, providing an order of magnitude more peak per dollar (5000 flops per dollar). Unfortunately, to obtain peak power requires large-scale problems that can require O(n³) operations over supers, and this significantly increases the running time when peak power is the goal.
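The price-performance comparison above is simple arithmetic, and the short C program below reproduces it from the quoted figures (the numbers are the circa-1993 quotes given above, not independent measurements):

#include <stdio.h>

int main(void) {
    /* Figures quoted above (circa 1993). */
    double c90_peak   = 16e9;   /* Cray C90 peak: 16 Gflops          */
    double c90_cost   = 30e6;   /* about $30 million                 */
    double micro_peak = 100e6;  /* RISC microprocessor: ~100 Mflops  */
    double micro_cost = 20e3;   /* about $20 thousand                */

    /* Peak floating-point operations per second per dollar. */
    printf("C90:            %6.0f flops/dollar\n", c90_peak / c90_cost);
    printf("Microprocessor: %6.0f flops/dollar\n", micro_peak / micro_cost);
    return 0;
}

Running it gives roughly 533 flops per dollar for the C90 against 5000 for the microprocessor, the order-of-magnitude gap cited above.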
Preface
The Aims

This book provides a comprehensive study of scalable and parallel computer architectures for achieving a proportional increase in performance with increasing system resources. System resources are scaled by the number of processors used, the memory capacity enlarged, the access latency tolerated, the I/O bandwidth required, the performance level desired, etc.

Scalable architectures delivering sustained performance are desired in both sequential and parallel computers. Parallel architecture has a higher potential to deliver scalable performance. The scalability varies with different architecture-algorithm combinations. Both hardware and software issues need to be studied in building scalable computer systems.
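One concrete reading of a "proportional increase in performance" is efficiency under a fixed-workload performance law. The sketch below, a minimal illustration rather than anything from the text, evaluates Amdahl's law S(p) = 1/(f + (1 - f)/p), which Chapter 3 treats in depth; the serial fraction f = 0.05 is an assumed value:

#include <stdio.h>

/* Amdahl's law: speedup on p processors when a fraction f of the
   work is inherently serial (see Chapter 3 on performance laws). */
static double speedup(double f, int p) {
    return 1.0 / (f + (1.0 - f) / (double)p);
}

int main(void) {
    const double f = 0.05;  /* illustrative serial fraction (assumed) */
    for (int p = 1; p <= 1024; p *= 4) {
        double s = speedup(f, p);
        /* Efficiency near 1.0 means performance scales proportionally. */
        printf("p = %4d   speedup = %7.2f   efficiency = %4.2f\n",
               p, s, s / (double)p);
    }
    return 0;
}

Efficiency S(p)/p falls as p grows for any fixed f > 0, which is one reason scalability must be judged per architecture-algorithm combination rather than by processor count alone.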
It is my intent to put the reader in a position to design scalable computer systems. Scalability is defined in a broader sense to reflect the interplay among architectures, algorithms, software, and environments. The integration between hardware and software is emphasized for building cost-effective computers.

We should explore cutting-edge technologies in scalable parallel computing. Systems architecture is thus studied with generality, scalability, programmability, and performability in mind. Since high technology changes so rapidly, I have presented the material in a generic manner, unbiased toward particular machine implementations. Representative processors and systems are presented only if they contain important features which may last into the future.

Every author faces the same dilemma in writing a technology-dependent book which may become obsolete quickly. To cope with the problem, frequent updates with newer editions become a necessity, and I plan to make revisions every few years in the future.
The Contents
This book consists of twelve chapters divided into four parts covering theory, technology, architecture, and software aspects of parallel and vector computers, as shown in the flowchart:
[Flowchart: the twelve chapters organized into four parts along two tracks, an Electrical Engineering track and a Computer Science track, with CS bypass paths around the technology chapters:]

Part I: Theory
    Chapter 1: Machine Models
    Chapter 2: Program, Network Properties
    Chapter 3: Performance, Scalability

Part II: Technology
    Chapter 4: Processor, Memory Hierarchy
    Chapter 5: Bus, Cache, Memory
    Chapter 6: Pipelining, Superscalar

Part III: Architectures
    Chapter 7: Multiprocessors, Multicomputers
    Chapter 8: Multivector, SIMD Machines
    Chapter 9: Scalable, Multithreaded Architectures (EE optional)

Part IV: Software
    Chapter 10: Programming Models, Compilers (EE optional)
    Chapter 11: Parallel Program Development
    Chapter 12: Parallel UNIX
Readers' Guide
The first four chapters should be taught to all disciplines. The three technology chapters are necessary for EE and CE students. The three architecture chapters can be selectively taught to CE and CS students, depending on the instructor's interest and the computing facilities available to teach the course. The three software chapters are written for CS students and are optional to EE students.
Five course outlines are suggested below for different audiences. The first three outlines are for 45-hour, one-semester courses. The last two outlines are for two-quarter courses in a sequence.
(1) For a Computer Science course on Parallel Computers and Programming, the minimum coverage should include Chapters 1-4, 7, and 9-12.
(2) For an exclusive Electrical Engineering course on Advanced Computer Architecture, the minimum coverage should include Chapters 1-9.
(3) For a joint CS and EE course on Parallel Processing Computer Systems, the minimum coverage should include Chapters 1-4 and 7-12.
(4) Chapters 1 through 6 can be taught to a senior or first-year graduate course under the title Computer Architecture in one quarter (10 weeks / 30 hours).
(5) Chapters 7 through 12 can be taught in a graduate course on Parallel Computer Architecture and Programming in one quarter, with course (4) as the prerequisite.
Instructors may wish to include some advanced research topics treated in Sections 1.4, 2.3, 3.4, 5.4, 6.2, 6.5, 7.2, 7.3, 8.3, 10.4, 11.1, 12.5, and selected sections from Chapter 9 in each of the above course options. The architecture chapters present four different families of commercially available computers. Instructors may choose to teach a subset of these machine families based on the accessibility of corresponding machines on campus or via a public network. Students are encouraged to learn through hands-on programming experience on parallel computers.
A Solutions Manual is available to instructors only from the Computer Science Editor, College Division, McGraw-Hill, Inc., 1221 Avenue of the Americas, New York, NY 10020. Answers to a few selected exercise problems are given at the end of the book.
The Prerequisites
This is an advanced text on computer architecture and parallel programming. The reader should have been exposed to some basic computer organization and programming courses at the undergraduate level. Some of the required background material can be found in Computer Architecture: A Quantitative Approach by John Hennessy and David Patterson (Morgan Kaufmann, 1990) or in Machine and Assembly Language Programming by Arthur Gill (Prentice-Hall, 1978).

Students should have some knowledge and experience in logic design, computer hardware, system operations, assembly languages, and Fortran or C programming. Because of the emphasis on scalable architectures and the exploitation of parallelism in practical applications, readers will find it useful to have some background in probability, discrete mathematics, matrix algebra, and optimization theory.
Acknowledgments
I have tried to identify all sources of information in the bibliographic notes. As the subject area evolves rapidly, omissions are almost unavoidable. I apologize to those whose valuable work has not been included in this edition. I am responsible for all omissions and for any errors found in the book. Readers are encouraged to contact me directly regarding error correction or suggestions for future editions.

The writing of this book was inspired, taught, or assisted by numerous scholars or specialists working in the area. I would like to thank each of them for intellectual exchanges, valuable suggestions, critical reviews, and technical assistance.

First of all, I want to thank a number of my former and current Ph.D. students. Hwang-Cheng Wang has assisted me in producing the entire manuscript in LaTeX. Besides, he has coauthored the Solutions Manual with Jung-Gen Wu, who visited USC during 1992. Weihua Mao has drawn almost all the figure illustrations using FrameMaker, based on my original sketches. I want to thank D.K. Panda, Joydeep Ghosh, Ahmed Louri, Dongseung Kim, Zhi-Wei Xu, Sugih Jamin, Chien-Ming Cheng, Santosh Rao, Shisheng Shang, Jih-Cheng Liu, Scott Toborg, Stanley Wang, and Myungho Lee for their assistance in collecting material, proofreading, and contributing some of the homework problems. The errata from Teerapon Jungwiwattanaporn were also useful. The Index was compiled by H.C. Wang and J.G. Wu jointly.
I want to thank Gordon Bell for sharing his insights on supercomputing with me and for writing the Foreword to motivate my readers. John Hennessy and Anoop Gupta provided the Dash multiprocessor-related results from Stanford University. Charles Seitz has taught me through his work on Cosmic Cube, Mosaic, and multicomputers. From MIT, I received valuable inputs from the works of Charles Leiserson, William Dally, Anant Agarwal, and Rishiyur Nikhil. From the University of Illinois, I received the Cedar and Perfect benchmark information from Pen Yew. Jack Dongarra of the University of Tennessee provided me the Linpack benchmark results.

James Smith of Cray Research provided up-to-date information on the C-90 clusters and on the Cray/MPP. Ken Miura provided the information on the Fujitsu VPP500. Lionel Ni of Michigan State University helped me in the areas of performance laws and adaptive wormhole routing. Justin Rattner provided information on the Intel Delta and Paragon systems. Burton Smith provided information on the Tera computer development.

Harold Stone and John Hayes suggested corrections and ways to improve the presentation. H.C. Torng of Cornell University, Andrew Chien of the University of Illinois, and Daniel Tabak of George Mason University made useful suggestions. Among my colleagues at the University of Southern California, Jean-Luc Gaudiot, Michel Dubois, Rafael Saavedra, Monte Ung, and Viktor Prasanna have made concrete suggestions to improve the manuscript. I appreciate the careful proofreading of an earlier version of the manuscript by D.K. Panda of the Ohio State University. The inputs from Vipin Kumar of the University of Minnesota, Xian-He Sun of NASA Langley Research Center, and Alok Choudhary of Syracuse University are also appreciated.
In addition to the above individuals, my understanding of computer architecture and parallel processing has been influenced by the works of David Kuck, Ken Kennedy,
ADVANCED COMPUTER ARCHITECTURE:
Parallelism, Scalability, Programmability