

ADVANCED COMPUTER ARCHITECTURE

Parallelism, Scalability, Programmability

TATA McGRAW-HILL EDITION

Tata McGraw-Hill

ADVANCED COMPUTER ARCHITECTURE: Parallelism, Scalability, Programmability

Copyright © 2000 by The McGraw-Hill Companies, Inc. All rights reserved. No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior written permission of the publisher, with the exception that the program listings may be entered, stored, and executed in a computer system, but they may not be reproduced for publication.

Tata McGraw-Hill Edition 2001

Eighteenth reprint 2008 RA<YCDRXRBZQD

Reprinted in India by arrangement with The McGraw-Hill Companies, Inc., New York

Sales territories: India, Pakistan, Nepal, Bangladesh, Sri Lanka and Bhutan

Library of Congress Cataloging-in-Publication Data

Hwang, Kai, 1943-
Advanced Computer Architecture: Parallelism, Scalability, Programmability / Kai Hwang
p. cm. - (McGraw-Hill computer science series. Computer organization and architecture. Network, parallel and distributed computing. McGraw-Hill computer engineering series)
Includes bibliographical references (p. ) and index.
ISBN 0-07-031622-8
1. Computer architecture. I. Title. II. Series.
QA76.9.A73H87 1993
004'.35-dc20    92-44944

ISBN-13: 978-0-07-053070-6
ISBN-10: 0-07-053070-X

Published by Tata McGraw-Hill Publishing Company Limited, 7 West Patel Nagar, New Delhi 110 008, and printed at Gopaljee Enterprises, Delhi 110 053

The McGraw-Hill Companies

Contents

  • Preface
  • PART I THEORY OF PARALLELISM
    • Chapter 1 Parallel Computer Models - 1.1 The State of Computing - 1.1.1 Computer Development Milestones - 1.1.2 Elements of Modern Computers - 1.1.3 Evolution of Computer Architecture - 1.1.4 System Attributes to Performance
      • 1.2 Multiprocessors and Multicomputers - 1.2.1 Shared-Memory Multiprocessors - 1.2.2 Distributed-Memory Multicomputers - 1.2.3 A Taxonomy of MIMD Computers
      • 1.3 Multivector and SIMD Computers - 1.3.1 Vector Supercomputers - 1.3.2 SIMD Supercomputers
      • 1.4 PRAM and VLSI Models - 1.4.1 Parallel Random-Access Machines - 1.4.2 VLSI Complexity Model
      • 1.5 Architectural Development Tracks - 1.5.1 Multiple-Processor Tracks - 1.5.2 Multivector and SIMD Tracks - 1.5.3 Multithreaded and Dataflow Tracks
      • 1.6 Bibliographic Notes and Exercises
    • Chapter 2 Program and Network Properties - 2.1 Conditions of Parallelism - 2.1.1 Data and Resource Dependences - 2.1.2 Hardware and Software Parallelism - 2.1.3 The Role of Compilers
      • 2.2 Program Partitioning and Scheduling - 2.2.1 Grain Sizes and Latency - 2.2.2 Grain Packing and Scheduling - 2.2.3 Static Multiprocessor Scheduling
      • 2.3 Program Flow Mechanisms - 2.3.1 Control Flow Versus Data Flow - 2.3.2 Demand-Driven Mechanisms - 2.3.3 Comparison of Flow Mechanisms
      • 2.4 System Interconnect Architectures - 2.4.1 Network Properties and Routing - 2.4.2 Static Connection Networks - 2.4.3 Dynamic Connection Networks
        • 2.5 Bibliographic Notes and Exercises
    • Chapter 3 Principles of Scalable Performance - 3.1 Performance Metrics and Measures - 3.1.1 Parallelism Profile in Programs - 3.1.2 Harmonic Mean Performance - 3.1.3 Efficiency, Utilization, and Quality - 3.1.4 Standard Performance Measures - 3.2 Parallel Processing Applications - 3.2.1 Massive Parallelism for Grand Challenges - 3.2.2 Application Models of Parallel Computers - 3.2.3 Scalability of Parallel Algorithms
      • 3.3 Speedup Performance Laws - 3.3.1 Amdahl's Law for a Fixed Workload - 3.3.2 Gustafson's Law for Scaled Problems - 3.3.3 Memory-Bounded Speedup Model
      • 3.4 Scalability Analysis and Approaches - 3.4.1 Scalability Metrics and Goals - 3.4.2 Evolution of Scalable Computers - 3.4.3 Research Issues and Solutions
        • 3.5 Bibliographic Notes and Exercises
  • PART II HARDWARE TECHNOLOGIES
    • Chapter 4 Processors and Memory Hierarchy - 4.1 Advanced Processor Technology - 4.1.1 Design Space of Processors - 4.1.2 Instruction-Set Architectures - 4.1.3 CISC Scalar Processors - 4.1.4 RISC Scalar Processors - 4.2 Superscalar and Vector Processors - 4.2.1 Superscalar Processors - 4.2.2 The VLIW Architecture - 4.2.3 Vector and Symbolic Processors - 4.3 Memory Hierarchy Technology - 4.3.1 Hierarchical Memory Technology - 4.3.2 Inclusion, Coherence, and Locality - 4.3.3 Memory Capacity Planning - 4.4 Virtual Memory Technology - 4.4.1 Virtual Memory Models - 4.4.2 TLB, Paging, and Segmentation - 4.4.3 Memory Replacement Policies - 4.5 Bibliographic Notes and Exercises
  • Chapter 5 Bus, Cache, and Shared Memory - 5.1 Backplane Bus Systems - 5.1.1 Backplane Bus Specification - 5.1.2 Addressing and Timing Protocols - 5.1.3 Arbitration, Transaction, and Interrupt - 5.1.4 The IEEE Futurebus+ Standards - 5.2 Cache Memory Organizations - 5.2.1 Cache Addressing Models - 5.2.2 Direct Mapping and Associative Caches - 5.2.3 Set-Associative and Sector Caches - 5.2.4 Cache Performance Issues - 5.3 Shared-Memory Organizations - 5.3.1 Interleaved Memory Organization - 5.3.2 Bandwidth and Fault Tolerance - 5.3.3 Memory Allocation Schemes - 5.4 Sequential and Weak Consistency Models - 5.4.1 Atomicity and Event Ordering - 5.4.2 Sequential Consistency Model - 5.4.3 Weak Consistency Models - 5.5 Bibliographic Notes and Exercises
  • Chapter 6 Pipelining and Superscalar Techniques - 6.1 Linear Pipeline Processors - 6.1.1 Asynchronous and Synchronous Models - 6.1.2 Clocking and Timing Control - 6.1.3 Speedup, Efficiency, and Throughput - 6.2 Nonlinear Pipeline Processors - 6.2.1 Reservation and Latency Analysis - 6.2.2 Collision-Free Scheduling - 6.2.3 Pipeline Schedule Optimization - 6.3 Instruction Pipeline Design - 6.3.1 Instruction Execution Phases - 6.3.2 Mechanisms for Instruction Pipelining - 6.3.3 Dynamic Instruction Scheduling - 6.3.4 Branch Handling Techniques - 6.4 Arithmetic Pipeline Design - 6.4.1 Computer Arithmetic Principles - 6.4.2 Static Arithmetic Pipelines - 6.4.3 Multifunctional Arithmetic Pipelines - 6.5 Superscalar and Superpipeline Design - 6.5.1 Superscalar Pipeline Design - 6.5.2 Superpipelined Design - 6.5.3 Supersymmetry and Design Tradeoffs - 6.6 Bibliographic Notes and Exercises
  • PART III PARALLEL AND SCALABLE ARCHITECTURES
    • Chapter 7 Multiprocessors and Multicomputers - 7.1 Multiprocessor System Interconnects - 7.1.1 Hierarchical Bus Systems - 7.1.2 Crossbar Switch and Multiport Memory - 7.1.3 Multistage and Combining Networks - 7.2 Cache Coherence and Synchronization Mechanisms - 7.2.1 The Cache Coherence Problem - 7.2.2 Snoopy Bus Protocols - 7.2.3 Directory-Based Protocols - 7.2.4 Hardware Synchronization Mechanisms - 7.3 Three Generations of Multicomputers - 7.3.1 Design Choices in the Past - 7.3.2 Present and Future Development - 7.3.3 The Intel Paragon System
      • 7.4 Message-Passing Mechanisms - 7.4.1 Message-Routing Schemes - 7.4.2 Deadlock and Virtual Channels - 7.4.3 Flow Control Strategies - 7.4.4 Multicast Routing Algorithms
        • 7.5 Bibliographic Notes and Exercises
  • Chapter 8 Multivector and SIMD Computers - 8.1 Vector Processing Principles - 8.1.1 Vector Instruction Types - 8.1.2 Vector-Access Memory Schemes - 8.1.3 Past and Present Supercomputers - 8.2 Multivector Multiprocessors - 8.2.1 Performance-Directed Design Rules - 8.2.2 Cray Y-MP, C-90, and MPP - 8.2.3 Fujitsu VP2000 and VPP500 - 8.2.4 Mainframes and Minisupercomputers - 8.3 Compound Vector Processing - 8.3.1 Compound Vector Operations - 8.3.2 Vector Loops and Chaining - 8.3.3 Multipipeline Networking
    • 8.4 SIMD Computer Organizations - 8.4.1 Implementation Models - 8.4.2 The CM-2 Architecture - 8.4.3 The MasPar MP-1 Architecture
    • 8.5 The Connection Machine CM-5 - 8.5.1 A Synchronized MIMD Machine - 8.5.2 The CM-5 Network Architecture - 8.5.3 Control Processors and Processing Nodes - 8.5.4 Interprocessor Communications
    • 8.6 Bibliographic Notes and Exercises
  • Chapter 9 Scalable, Multithreaded, and Dataflow Architectures
    • 9.1 Latency-Hiding Techniques - 9.1.1 Shared Virtual Memory - 9.1.2 Prefetching Techniques - 9.1.3 Distributed Coherent Caches - 9.1.4 Scalable Coherence Interface - 9.1.5 Relaxed Memory Consistency
    • 9.2 Principles of Multithreading - 9.2.1 Multithreading Issues and Solutions - 9.2.2 Multiple-Context Processors - 9.2.3 Multidimensional Architectures
    • 9.3 Fine-Grain Multicomputers - 9.3.1 Fine-Grain Parallelism - 9.3.2 The MIT J-Machine - 9.3.3 The Caltech Mosaic C
      • 9.4 Scalable and Multithreaded Architectures - 9.4.1 The Stanford Dash Multiprocessor - 9.4.2 The Kendall Square Research KSR-1 - 9.4.3 The Tera Multiprocessor System
      • 9.5 Dataflow and Hybrid Architectures - 9.5.1 The Evolution of Dataflow Computers - 9.5.2 The ETL/EM-4 in Japan - 9.5.3 The MIT/Motorola *T Prototype
      • 9.6 Bibliographic Notes and Exercises
  • PART IV SOFTWARE FOR PARALLEL PROGRAMMING
    • Chapter 10 Parallel Models, Languages, and Compilers - 10.1 Parallel Programming Models - 10.1.1 Shared-Variable Model - 10.1.2 Message-Passing Model - 10.1.3 Data-Parallel Model - 10.1.4 Object-Oriented Model - 10.1.5 Functional and Logic Models - 10.2 Parallel Languages and Compilers - 10.2.1 Language Features for Parallelism - 10.2.2 Parallel Language Constructs - 10.2.3 Optimizing Compilers for Parallelism - 10.3 Dependence Analysis of Data Arrays - 10.3.1 Iteration Space and Dependence Analysis - 10.3.2 Subscript Separability and Partitioning - 10.3.3 Categorized Dependence Tests - 10.4 Code Optimization and Scheduling - 10.4.1 Scalar Optimization with Basic Blocks - 10.4.2 Local and Global Optimizations - 10.4.3 Vectorization and Parallelization Methods - 10.4.4 Code Generation and Scheduling - 10.4.5 Trace Scheduling Compilation - 10.5 Loop Parallelization and Pipelining - 10.5.1 Loop Transformation Theory - 10.5.2 Parallelization and Wavefronting - 10.5.3 Tiling and Localization - 10.5.4 Software Pipelining - 10.6 Bibliographic Notes and Exercises
    • Chapter 11 Parallel Program Development and Environments - 11.1 Parallel Programming Environments - 11.1.1 Software Tools and Environments - 11.1.2 Y-MP, Paragon, and CM-5 Environments - 11.1.3 Visualization and Performance Tuning - 11.2 Synchronization and Multiprocessing Modes - 11.2.1 Principles of Synchronization - 11.2.2 Multiprocessor Execution Modes - 11.2.3 Multitasking on Cray Multiprocessors - 11.3 Shared-Variable Program Structures - 11.3.1 Locks for Protected Access - 11.3.2 Semaphores and Applications - 11.3.3 Monitors and Applications - 11.4 Message-Passing Program Development - 11.4.1 Distributing the Computation - 11.4.2 Synchronous Message Passing - 11.4.3 Asynchronous Message Passing
      • 11.5 Mapping Programs onto Multicomputers - 11.5.1 Domain Decomposition Techniques - 11.5.2 Control Decomposition Techniques - 11.5.3 Heterogeneous Processing
      • 11.6 Bibliographic Notes and Exercises
  • Chapter 12 UNIX, Mach, and OSF/1 for Parallel Computers - 12.1 Multiprocessor UNIX Design Goals - 12.1.1 Conventional UNIX Limitations - 12.1.2 Compatibility and Portability - 12.1.3 Address Space and Load Balancing - 12.1.4 Parallel I/O and Network Services - 12.2 Master-Slave and Multithreaded UNIX - 12.2.1 Master-Slave Kernels - 12.2.2 Floating-Executive Kernels - 12.2.3 Multithreaded UNIX Kernels - 12.3 Multicomputer UNIX Extensions - 12.3.1 Message-Passing OS Models - 12.3.2 Cosmic Environment and Reactive Kernel - 12.3.3 Intel NX/2 Kernel and Extensions - 12.4 Mach/OS Kernel Architecture - 12.4.1 Mach/OS Kernel Functions - 12.4.2 Multithreaded Multitasking - 12.4.3 Message-Based Communications - 12.4.4 Virtual Memory Management
    • 12.5 OSF/1 Architecture and Applications
      • 12.5.1 The OSF/1 Architecture
        • 12.5.2 The OSF/1 Programming Environment
        • 12.5.3 Improving Performance with Threads
    • 12.6 Bibliographic Notes and Exercises
  • Bibliography
  • Index

Foreword

by Gordon Bell

Kai Hwang has introduced the issues in designing and using high performance parallel computers at a time when a plethora of scalable computers utilizing commodity microprocessors offer higher peak performance than traditional vector supercomputers. These new machines, their operating environments including the operating system and languages, and the programs to effectively utilize them are introducing more rapid changes for researchers, builders, and users than at any time in the history of computer structures.

For the first time since the introduction of the Cray 1 vector processor in 1975, it may again be necessary to change and evolve the programming paradigm — provided that massively parallel computers can be shown to be useful outside of research on massive parallelism. Vector processors required modest data parallelism, and these operations have been reflected either explicitly in Fortran programs or implicitly with the need to evolve Fortran (e.g., Fortran 90) to build in vector operations.

So far, the main line of supercomputing as measured by usage (hours, jobs, number of programs, program portability) has been the shared-memory vector multiprocessor as pioneered by Cray Research. Fujitsu, IBM, Hitachi, and NEC all produce computers of this type. In 1993, the Cray C90 supercomputer delivers a peak of 16 billion floating-point operations per second (a Gigaflops per processor) with 16 processors and costs about $30 million, providing roughly 500 floating-point operations per second per dollar.

In contrast, massively parallel computers introduced in the early 1990s are nearly all based on the same powerful, RISC-based, CMOS microprocessors that are used in workstations. These scalar processors provide a peak of about 100 million floating-point operations per second and cost $20 thousand, providing an order of magnitude more peak per dollar (5000 flops per dollar). Unfortunately, obtaining peak power requires large-scale problems that can require O(n³) operations over supers, and this significantly increases the running time when peak power is the goal.
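Bell's price/performance point can be checked directly from the figures he quotes. The short sketch below is not from the book; it simply recomputes peak flops per dollar for the two classes of machines described above, and the function name and script are illustrative only.

```python
# Back-of-the-envelope check of the peak price/performance figures quoted above.
# "Flops per dollar" = peak floating-point operations per second / system cost.

def peak_flops_per_dollar(peak_flops, cost_dollars):
    """Peak floating-point operations per second obtained per dollar of cost."""
    return peak_flops / cost_dollars

# Cray C90: 16 billion flops peak across 16 processors, roughly $30 million.
c90 = peak_flops_per_dollar(16e9, 30e6)      # ~533 flops per dollar

# Workstation-class RISC microprocessor: ~100 Mflops peak, roughly $20 thousand.
micro = peak_flops_per_dollar(100e6, 20e3)   # 5000 flops per dollar

print(f"Cray C90:       {c90:6.0f} flops per dollar")
print(f"Microprocessor: {micro:6.0f} flops per dollar")
print(f"Advantage:      {micro / c90:.1f}x")  # about an order of magnitude
```

The roughly tenfold ratio is the "order of magnitude more peak per dollar" cited above; Bell's caveat is that realizing it depends on problems large enough to keep many processors busy.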


Preface

The Aims

This book provides a comprehensive study of scalable and parallel computer architectures for achieving a proportional increase in performance with increasing system resources. System resources are scaled by the number of processors used, the memory capacity enlarged, the access latency tolerated, the I/O bandwidth required, the performance level desired, etc. Scalable architectures delivering sustained performance are desired in both sequential and parallel computers. Parallel architecture has a higher potential to deliver scalable performance. The scalability varies with different architecture-algorithm combinations. Both hardware and software issues need to be studied in building scalable computer systems.

It is my intent to put the reader in a position to design scalable computer systems. Scalability is defined in a broader sense to reflect the interplay among architectures, algorithms, software, and environments. The integration between hardware and software is emphasized for building cost-effective computers. We should explore cutting-edge technologies in scalable parallel computing. Systems architecture is thus studied with generality, scalability, programmability, and performability in mind. Since high technology changes so rapidly, I have presented the material in a generic manner, unbiased toward particular machine implementations. Representative processors and systems are presented only if they contain important features which may last into the future. Every author faces the same dilemma in writing a technology-dependent book which may become obsolete quickly. To cope with the problem, frequent updates with newer editions become a necessity, and I plan to make revisions every few years in the future.
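To make the notion of scalable performance above a little more concrete, the sketch below applies the speedup and efficiency metrics developed in Chapter 3 to a synthetic fixed-workload example. It is an illustration only, assuming a simple Amdahl-style timing model; the numbers and function names are assumptions, not material from the book.

```python
# Illustrative speedup/efficiency calculation (not from the text). An
# architecture-algorithm combination scales well when efficiency E(n) = S(n)/n
# stays high as the number of processors n (and the problem size) grows.

def speedup(t1, tn):
    """S(n) = T(1) / T(n): serial running time over time on n processors."""
    return t1 / tn

def efficiency(t1, tn, n):
    """E(n) = S(n) / n."""
    return speedup(t1, tn) / n

# Assumed fixed workload with a 5% serial fraction (an Amdahl-style bound).
serial_fraction = 0.05
t1 = 100.0  # serial running time, arbitrary units

for n in (1, 4, 16, 64, 256):
    tn = t1 * (serial_fraction + (1.0 - serial_fraction) / n)
    print(f"n={n:3d}  S(n)={speedup(t1, tn):6.2f}  E(n)={efficiency(t1, tn, n):5.2f}")
```

For a fixed workload the efficiency of this model collapses as n grows, which is precisely why the scaled-workload arguments (Gustafson's law, Section 3.3.2) and the memory-bounded speedup model of Chapter 3 matter for scalability analysis.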



The Contents

This book consists of twelve chapters divided into four parts covering theory, technology, architecture, and software aspects of parallel and vector computers, as shown in the flowchart:

[Flowchart: the twelve chapters are grouped into Part I Theory (Chapter 1: Machine Models; Chapter 2: Program, Network Properties; Chapter 3: Performance, Scalability), Part II Technology (Chapter 4: Processor, Memory Hierarchy; Chapter 5: Bus, Cache, Memory; Chapter 6: Pipelining, Superscalar), Part III Architectures (Chapter 7: Multiprocessors, Multicomputers; Chapter 8: Multivector, SIMD Machines; Chapter 9: Scalable, Multithreaded Architectures), and Part IV Software (Chapter 10: Programming Models, Compilers; Chapter 11: Parallel Program Development; Chapter 12: Parallel UNIX). An Electrical Engineering track and a Computer Science track run through the four parts, with CS bypass paths around some technology chapters and EE-optional branches around the later architecture and software chapters.]

Readers' Guide


The first four chapters should be taught to all disciplines. The three technology chapters are necessary for EE and CE students. The three architecture chapters can be selectively taught to CE and CS students, depending on the instructor's interest and the computing facilities available to teach the course. The three software chapters are written for CS students and are optional to EE students.

Five course outlines are suggested below for different audiences. The first three outlines are for 45-hour, one-semester courses. The last two outlines are for two-quarter courses in a sequence.

(1) For a Computer Science course on Parallel Computers and Programming, the minimum coverage should include Chapters 1-4, 7, and 9-12.

(2) For an exclusive Electrical Engineering course on Advanced Computer Architecture, the minimum coverage should include Chapters 1-9.

(3) For a joint CS and EE course on Parallel Processing Computer Systems, the minimum coverage should include Chapters 1-4 and 7-12.

(4) Chapters 1 through 6 can be taught to a senior or first-year graduate course under the title Computer Architecture in one quarter (10 weeks / 30 hours).

(5) Chapters 7 through 12 can be taught in a one-quarter graduate course on Parallel Computer Architecture and Programming, with course (4) as the prerequisite.

Instructors may wish to include some advanced research topics treated in Sections 1.4, 2.3, 3.4, 5.4, 6.2, 6.5, 7.2, 7.3, 8.3, 10.4, 11.1, 12.5, and selected sections from Chapter 9 in each of the above course options. The architecture chapters present four different families of commercially available computers. Instructors may choose to teach a subset of these machine families based on the accessibility of corresponding machines on campus or via a public network. Students are encouraged to learn through hands-on programming experience on parallel computers.

A Solutions Manual is available to instructors only from the Computer Science Editor, College Division, McGraw-Hill, Inc., 1221 Avenue of the Americas, New York, NY 10020. Answers to a few selected exercise problems are given at the end of the book.

The Prerequisites

This is an advanced text on computer architecture and parallel programming. The reader should have been exposed to some basic computer organization and programming courses at the undergraduate level. Some of the required background material can be found in Computer Architecture: A Quantitative Approach by John Hennessy and David Patterson (Morgan Kaufmann, 1990) or in Machine and Assembly Language Programming by Arthur Gill (Prentice-Hall, 1978). Students should have some knowledge and experience in logic design, computer hardware, system operations, assembly languages, and Fortran or C programming. Because of the emphasis on scalable architectures and the exploitation of parallelism in practical applications, readers will find it useful to have some background in probability, discrete mathematics, matrix algebra, and optimization theory.


Acknowledgments

I have tried to identify all sources of information in the bibliographic notes. As the subject area evolves rapidly, omissions are almost unavoidable. I apologize to those whose valuable work has not been included in this edition. I am responsible for all omissions and for any errors found in the book. Readers are encouraged to contact me directly regarding error correction or suggestions for future editions. The writing of this book was inspired, taught, or assisted by numerous scholars or specialists working in the area. I would like to thank each of them for intellectual exchanges, valuable suggestions, critical reviews, and technical assistance. First of all, I want to thank a number of my former and current Ph.D. students. Hwang-Cheng Wang has assisted me in producing the entire manuscript in LaTeX. Besides, he has coauthored the Solutions Manual with Jung-Gen Wu, who visited USC during 1992. Weihua Mao has drawn almost all the figure illustrations using FrameMaker, based on my original sketches. I want to thank D.K. Panda, Joydeep Ghosh, Ahmed Louri, Dongseung Kim, Zhi-Wei Xu, Sugih Jamin, Chien-Ming Cheng, Santosh Rao, Shisheng Shang, Jih-Cheng Liu, Scott Toborg, Stanley Wang, and Myungho Lee for their assistance in collecting material, proofreading, and contributing some of the homework problems. The errata from Teerapon Jungwiwattanaporn were also useful. The Index was compiled by H.C. Wang and J.G. Wu jointly.

I want to thank Gordon Bell for sharing his insights on supercomputing with me and for writing the Foreword to motivate my readers. John Hennessy and Anoop Gupta provided the Dash multiprocessor-related results from Stanford University. Charles Seitz has taught me through his work on Cosmic Cube, Mosaic, and multicomputers. From MIT, I received valuable inputs from the works of Charles Leiserson, William Dally, Anant Agarwal, and Rishiyur Nikhil. From the University of Illinois, I received the Cedar and Perfect benchmark information from Pen Yew. Jack Dongarra of the University of Tennessee provided me the Linpack benchmark results. James Smith of Cray Research provided up-to-date information on the C-90 clusters and on the Cray/MPP. Ken Miura provided the information on Fujitsu VPP500. Lionel Ni of Michigan State University helped me in the areas of performance laws and adaptive wormhole routing. Justin Rattner provided information on the Intel Delta and Paragon systems. Burton Smith provided information on the Tera computer development. Harold Stone and John Hayes suggested corrections and ways to improve the presentation. H.C. Torng of Cornell University, Andrew Chien of the University of Illinois, and Daniel Tabak of George Mason University made useful suggestions. Among my colleagues at the University of Southern California, Jean-Luc Gaudiot, Michel Dubois, Rafael Saavedra, Monte Ung, and Viktor Prasanna have made concrete suggestions to improve the manuscript. I appreciate the careful proofreading of an earlier version of the manuscript by D.K. Panda of the Ohio State University. The inputs from Vipin Kumar of the University of Minnesota, Xian-He Sun of NASA Langley Research Center, and Alok Choudhary of Syracuse University are also appreciated.

In addition to the above individuals, my understanding of computer architecture and parallel processing has been influenced by the works of David Kuck, Ken Kennedy,

ADVANCED COMPUTER ARCHITECTURE:

Parallelism, Scalability, Programmability
