Abstract

This paper analyzes job scheduling for parallel computers by using theoretical and experimental means. Based on existing architectures, we first present a machine and a job model. Then we propose a simple on-line algorithm employing job preemption without migration and derive theoretical bounds for the performance of the algorithm. The algorithm is experimentally evaluated with trace data from a large computing facility. These experiments show that the algorithm is highly sensitive to parameter selection and that substantial performance improvements over existing non-preemptive scheduling methods are possible.
Introduction
Today's massively parallel computers are built to execute a large number of different and independent jobs with a varying degree of parallelism. For reasons of efficiency, most of these architectures allow space sharing, i.e. the concurrent execution of jobs with little parallelism on disjoint node sets. This produces a complex management task, with the scheduling problem, i.e. the assignment of jobs to nodes and time slots, being a central part. Also, in a typical workload it can neither be assumed that all jobs have the same properties nor that there is a random distribution of jobs with different properties, see e.g. Feitelson and Nitzberg. As most scheduling problems have been shown to be computationally hard, theoretical research has centered around proofs of NP-completeness, approximation algorithms, and heuristic methods to obtain optimal solutions. At the same time, the experimental evaluation of various scheduling heuristics, the study of workload characteristics, and the consideration of architectural constraints have been the focus of applied research in this area. But the interaction between both groups has been rather limited. For instance, so far very few algorithms from the theoretical community have been implemented within real schedulers. Similarly, heuristics used in real parallel systems have rarely been the subject of a theoretical analysis. Various reasons are frequently cited to be responsible for the lack of interaction between both communities.
* Supported by a grant from the NRW Metacomputing project.
† Computer Engineering Institute, University Dortmund, Dortmund, Germany, uwe@ds.e-technik.uni-dortmund.de
‡ Computer Engineering Institute, University Dortmund, Dortmund, Germany, yahya@ds.e-technik.uni-dortmund.de
For instance, designers of commercial schedulers do not care much about approximation factors; for them, deviations of factor … or factor … from an optimal solution will usually both be unacceptable for real workloads. Also, applied researchers often claim that the models and optimality criteria used in theoretical research rarely match the restrictions of many real-life situations. On the other hand, as the worst case behavior of those heuristics typically found in commercial schedulers is usually quite bad with respect to the criteria often used in theoretical research, these algorithms are of little interest to theoreticians. In our paper we want to demonstrate that there are algorithmic issues in job scheduling where theoretical and applied research can both contribute to a solution. To this end we discuss first-come-first-serve (FCFS) scheduling, a simple job scheduling method which can often be found in real systems but is usually considered inadequate by many researchers. First we derive a job scheduling model based on the IBM RS/6000 SP as described by Hotovy and address scheduling objectives. Then we show that for many workloads good performance can be expected from an FCFS schedule. Moreover, bad utilization of the parallel computer can be prevented by introducing gang scheduling into FCFS. We also demonstrate that the term fairness can be transformed into a specific selection of job weights. Using this weight selection we can prove the first constant competitive factor for general on-line weighted completion time scheduling where submission times and job execution times are unknown. Finally, we use simulation experiments with real workloads to determine good and bad strategies for FCFS scheduling with preemption. As the performance of the strategies is highly dependent on the workload, we conclude that an adaptive scheduling strategy may produce the best results in job scheduling for parallel computers.
The Model

Our model is based on the IBM RS/6000 SP parallel computer. We assume a massively parallel computer consisting of R identical nodes. Each node contains one or more processors, main memory, and local hard disks, while there is no shared memory. Equal access from all nodes to mass storage is provided.
Fast communication between the nodes is achieved via a special interconnection network. This network does not prioritize clustering of some subsets of nodes over others, as in a hypercube or a mesh. Therefore, the parallel computer allows free variable partitioning, that is, the resource requirement r_i of a job i can be fulfilled by each node subset of sufficient size. Further, the execution time h_i of job i does not depend on the assigned subset. The parallel computer also supports gang scheduling by switching simultaneously the context of all processors belonging to the same partition. This context switch is executed by use of the local processor memory and/or the local hard disk, while the interconnection network is not affected except for message draining. The context switch also causes a preemption penalty, mainly due to processor synchronization, message draining, saving of job status, and page faults. In the model we describe this preemption penalty by a constant time delay p. During this time delay no node of an affected partition is able to execute any part of a job. Note that the context switch does not include job migration, i.e. a change of the node subset assigned to a job during job execution. Individual nodes are assigned to jobs in an exclusive fashion, i.e. at any time instant any node belongs to at most a single partition. This partition and the corresponding job are said to be active at this time instant. This property assures that the performance of a well balanced parallel job is not degraded by sharing one or a few nodes with another independent job. As gang scheduling is used, a job must therefore be active either on all or on none of the nodes of its partition at the same time. As this simple model does not perfectly match the original architecture, we briefly discuss the deviations of our model from actually implemented IBM RS/6000 SP computers:
There are different types of nodes in IBM SP architectures: thin nodes, wide nodes, and high (SMP) nodes. A wide node usually has some kind of server functionality and contains more memory than a thin node, while there is little difference in processor performance. Recently introduced high nodes contain several processors and cannot form a partition with other nodes at the present time. However, most installations contain predominantly thin nodes.
Nodes need not necessarily be equipped with the same quantity of memory. But in most applications the majority of nodes have the same or a similar amount of memory. While e.g. the Cornell Theory Center (CTC) SP contains nodes with memory ranging from … MB to … MB, more than … of the nodes have either … MB or … MB.
Access to mass storage usually requires the inclusion of an I/O node in the partition, which is typically a wide node.
Interactive jobs running only on a single processor must not be exclusively assigned to a node. However, during operation a parallel computer is usually divided into a batch and an interactive partition. In our paper we focus on the batch partition, as the management of the interactive partition is closely related to the management of workstations.
While most present SP installations do not allow preemption of parallel jobs, IBM has already produced and implemented a prototype for gang scheduling.
Altogether, our model comes reasonably close to real implementations. Finally, note that parallel computers may contain several hundred nodes, as e.g. the batch partition of the CTC SP. Next we describe the job model of the SP. At first the user identifies whether he has a batch or an interactive job. As there typically are separate interactive and batch partitions, we restrict ourselves to batch jobs. Besides requesting special hardware, the user can also specify a minimum number of nodes required and a maximum number of nodes the program is able to use. The scheduler will then assign to the program a number of nodes within this range. During execution of the program, neither the number of nodes nor the assigned subset of nodes will change. Using the terminology of Feitelson and Rudolph, we therefore have a moldable job model and adaptive partitioning. A user may submit his job to one of several batch queues at any time. Each queue is described by the maximum time (wall clock time) a job in this queue is allowed for execution. No other information about the execution time h_i of a job i is provided. When a job exceeds the time limit of the assigned batch queue, it is automatically canceled. For our study we use the same model with two exceptions:
1. No special requests are allowed.
2. The exact number of required processors is given for each job.
Both restrictions are addressed elsewhere in the paper. A similar model to the one described above has also been used by Feldmann et al.
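To summarize the machine and job model in code form, here is a minimal sketch written for this rewrite (it is not part of the paper). All class and field names are illustrative, and the moldable node range is reduced to the fixed requirement used in this study.

    from dataclasses import dataclass, field
    from typing import Optional, Set

    @dataclass
    class Job:
        """A batch job in this model: a fixed node requirement r_i, an execution
        time h_i unknown to the scheduler, a submission time t_i, and a queue
        time limit after which the job is canceled."""
        job_id: int
        r: int                 # exact number of required nodes (fixed, no migration)
        h: float               # execution time h_i
        t: float               # submission time t_i
        time_limit: float      # wall clock limit of the chosen batch queue
        nodes: Optional[Set[int]] = None   # node subset, fixed once assigned
        remaining: float = 0.0

        def __post_init__(self):
            self.remaining = self.h

    @dataclass
    class Machine:
        """R identical nodes, free variable partitioning, exclusive node
        assignment, gang scheduling with a constant preemption penalty p."""
        R: int
        p: float = 0.0
        busy: Set[int] = field(default_factory=set)

        def idle_nodes(self) -> Set[int]:
            return set(range(self.R)) - self.busy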
In many commercial schedulers, fairness is further observed by using the first-come-first-serve principle. But starting jobs in this order will only guarantee fairness in a non-preemptive schedule. In a preemptive schedule, a job may be interrupted by another job which has been submitted later. In the worst case this may result in job starvation, i.e. the delay of job completion for a long period of time. Therefore, we introduce the following parameterized definition of fairness.
Definition. A scheduling strategy is fair if all jobs submitted after a job i cannot increase the flow time of i by more than a given factor.
It is therefore the goal to find a method which produces schedules with small values for the makespan ratio m_S/m_opt, the weighted completion time ratio c_S/c_opt, and the fairness factor.
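As a concrete illustration (not taken from the paper), the schedule-side quantities in these ratios, together with the flow time used in the fairness definition above, can be computed from a finished schedule as follows. The ScheduledJob encoding and the helper names are assumptions made for this sketch; the weight choice w_i = r_i · h_i is the one used later in the paper.

    from dataclasses import dataclass
    from typing import Dict, List

    @dataclass
    class ScheduledJob:
        r: int          # resource requirement r_i (nodes)
        h: float        # execution time h_i
        t: float        # submission time t_i
        finish: float   # completion time in the schedule S under evaluation

    def makespan(jobs: List[ScheduledJob]) -> float:
        """m_S: the latest completion time in the schedule."""
        return max(j.finish for j in jobs)

    def weighted_completion_time(jobs: List[ScheduledJob]) -> float:
        """c_S = sum of w_i * C_i with the weight choice w_i = r_i * h_i."""
        return sum(j.r * j.h * j.finish for j in jobs)

    def flow_times(jobs: List[ScheduledJob]) -> List[float]:
        """Flow time of each job: completion time minus submission time."""
        return [j.finish - j.t for j in jobs]

    def fairness_factor(flow_actual: Dict[int, float],
                        flow_undisturbed: Dict[int, float]) -> float:
        """Largest factor by which later submissions inflate the flow time of
        any job (cf. the fairness definition above); 1.0 means no job suffers."""
        return max(flow_actual[i] / flow_undisturbed[i] for i in flow_undisturbed)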
The Algorithm
At first we consider a simple non-preemptive first-fit list scheduling algorithm where the ordering of all jobs is determined by the submission times. This method also allows taking into account special hardware requests or node ranges with an appropriate strategy, as discussed elsewhere in the paper. Although this algorithm is actually used in commercial schedulers, it may produce bad results, as can be seen by the example below.
Example. Simply assume R jobs with h_i = …, r_i = … and R jobs with h_i = …, r_i = R. If jobs are submitted in quick succession from both groups alternately, then m_S exceeds m_opt by a factor that grows with R.
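To make the non-preemptive strategy concrete, here is a minimal sketch (not the paper's implementation; all names are illustrative) of first-fit list scheduling in submission order: each job waits until its requested number of nodes is free, and no later job may overtake it.

    import heapq
    from typing import Dict, List, Tuple

    def fcfs_schedule(jobs: List[Tuple[float, int, float]],
                      R: int) -> Dict[int, Tuple[float, float]]:
        """Non-preemptive first-fit FCFS in submission order.
        jobs: list of (t_i, r_i, h_i), sorted by submission time, with r_i <= R.
        Returns {job index: (start time, completion time)}."""
        free = R
        clock = 0.0
        running: List[Tuple[float, int]] = []        # min-heap of (finish time, nodes)
        schedule: Dict[int, Tuple[float, float]] = {}
        for idx, (t, r, h) in enumerate(jobs):
            clock = max(clock, t)                    # a job cannot start before submission
            while free < r:                          # strict FCFS: wait, never overtake
                finish, nodes = heapq.heappop(running)
                clock = max(clock, finish)
                free += nodes
            schedule[idx] = (clock, clock + h)
            heapq.heappush(running, (clock + h, r))
            free -= r
        return schedule

    # A 4-node job submitted just after a long 1-node job has to wait for it:
    print(fcfs_schedule([(0.0, 1, 10.0), (0.1, 4, 1.0)], R=4))
    # -> {0: (0.0, 10.0), 1: (10.0, 11.0)}

The run at the end mirrors the example above: a single narrow job forces the wide job, and everything behind it, to wait even though most of the machine is idle.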
Even modifications like backfilling cannot guarantee to avoid such a situation. To solve this problem we introduce preemption and accept the drawbacks that come with it. A suitable algorithm for this purpose may be PSRS (Preemptive Smith Ratio Scheduling), which is based on a list and uses gang scheduling. The list order in PSRS is determined by the ratio w_i / (r_i · h_i). As this ratio is the same for all jobs in our scheduling problem (by the choice of job weights w_i = r_i · h_i), any order can be used and PSRS is able to incorporate FCFS. An adaptation of PSRS to our scheduling problem is called PFCFS (Preemptive FCFS) and given in the table below. Intuitively, a schedule produced by Algorithm PFCFS can be described as the interleaving of two non-preemptive FIFO schedules, where one schedule contains at most one wide job at any time instant. Note that only a wide job (r_i > R/2) can cause preemption and therefore increase the completion time of a previously submitted job. Further, all jobs are started in FIFO order. Instruction A is responsible for the on-line character of Algorithm PFCFS. We call any time period a period of available resources (PAR) if during the whole period Instruction A is executed and at least R resources are idle. Note that any execution of Algorithm PFCFS will end with a PAR.
Theoretical Analysis

Before addressing the bounds of schedules produced by Algorithm PFCFS in detail, we describe a few specific properties of schedules where w_i = r_i · h_i holds for all jobs i. Note that these properties can also be easily derived from more general statements of other publications.
Corollary. Assume that w_i = h_i holds for all jobs i of a sequential job system τ. Then any non-preemptive schedule S with no intermediate idle times between time 0 and the last completion time is optimal, and

    c_opt(τ) = c_S(τ) ≤ (max_i{t_i} + Σ_i h_i) · Σ_i h_i.
Proof. The optimality of schedule S for any order of the jobs is a direct consequence of Smith's rule. The bound is clearly true for |τ| = 1. Adding a new job k to job system τ and executing it directly after the other jobs of schedule S produces a schedule S' with

    c_{S'}(τ ∪ {k}) = c_S(τ) + h_k · (max_i{t_i} + Σ_i h_i + h_k)
                    ≤ (max_i{t_i} + Σ_i h_i) · Σ_i h_i + h_k · (max_i{t_i} + Σ_i h_i + h_k)
                    ≤ (max_{i∈τ∪{k}}{t_i} + Σ_{i∈τ∪{k}} h_i) · Σ_{i∈τ∪{k}} h_i.
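The optimality claim (with w_i = h_i every non-idle order has the same cost) can also be checked numerically. The following sketch is purely illustrative, ignores release times, and is not taken from the paper.

    from itertools import permutations

    def weighted_completion(order, h):
        """Sum of w_i * C_i on one node with no idle time and w_i = h_i."""
        total, clock = 0.0, 0.0
        for i in order:
            clock += h[i]          # job i completes at C_i = clock
            total += h[i] * clock  # its weight is w_i = h_i
        return total

    h = [3.0, 1.0, 4.0, 2.0]
    costs = {round(weighted_completion(p, h), 9) for p in permutations(range(len(h)))}
    print(costs)  # a single value: with w_i = h_i every non-idle order is optimal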
Corollary. Assume that w_i = r_i · h_i holds for all jobs i in a job system τ. Replacing a job i in any non-preemptive schedule S by the successive execution of two jobs i1 and i2 with r_{i1} = r_{i2} = r_i, h_{i1} + h_{i2} = h_i, and w_{i1} + w_{i2} = w_i reduces c_S(τ) by h_{i1} · h_{i2} · r_i.
Proof. Splitting job i has no effect on the contribution to c_S(τ) of any other job. It is also independent of the location of job i in the schedule, as the weight of i and the sum of the weights of i1 and i2 are the same. Let t denote the start time of job i in S. While the contribution of job i is r_i · h_i · (t + h_i) in the original schedule, it is r_i · h_{i1} · (t + h_{i1}) + r_i · h_{i2} · (t + h_i) = r_i · h_i · (t + h_i) − r_i · h_{i1} · h_{i2} in the new schedule. The proof still holds in the preemptive case if the second job is not preempted.
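A quick numeric check of this reduction (an illustrative sketch written for this rewrite; the numbers are arbitrary):

    def contribution(r, h, start):
        """w * C with w = r * h and completion time C = start + h."""
        return r * h * (start + h)

    r, h, start = 4, 10.0, 7.0          # an arbitrary job and start time
    h1, h2 = 3.5, 10.0 - 3.5            # split into two pieces run back to back
    before = contribution(r, h, start)
    after = contribution(r, h1, start) + contribution(r, h2, start + h1)
    print(before - after, r * h1 * h2)  # both print 91.0: the reduction is r * h1 * h2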
As already mentioned in the previous section, PFCFS will generate non-preemptive FCFS schedules if all jobs are small, i.e. they require at most half of the maximum amount of resources (r_i ≤ R/2). First we restrict ourselves to this case and prove some bounds. In the lemma below we consider scenarios with a single PAR.
while the parallel computer is active {
    if Q is empty and no new jobs have been submitted
        (A) wait for the next job to be submitted;
    attach all newly submitted jobs to Q in FIFO order;
    pick the first job i of Q and delete it from Q;
    if r_i <= R/2 {
        (B) wait until r_i resources are available;
        start i immediately;
    } else {
        (C) wait until less than r_i resources are used;
        (D) if job i has not been started
            (E1) wait until r_i resources are available or time period h has passed;
        else
            (E2) wait until the previously used subset of r_i resources is available
                 or time period h has passed;
        if the required r_i resources are available
            start or resume execution of i immediately;
        else {
            (F) preempt all currently running jobs;
                start or resume execution of i immediately;
            (G) wait until i has completed or time period h has passed;
            (H) resume the execution of all previously preempted jobs;
            if the execution of i has not been completed
                goto (D);
        }
    }
}
Table: The Scheduling Algorithm PFCFS
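The following is one possible reading of this pseudocode as a toy discrete-time simulation, written for this rewrite rather than taken from the paper. It assumes integer times, ignores the preemption penalty p, skips the initial waiting period of Instruction E, and handles at most one wide job preemptively at a time; the parameter h is the time slice from the pseudocode.

    from collections import deque

    def pfcfs(jobs, R, h):
        """Toy PFCFS: jobs is a list of (t_i, r_i, h_i) with integer values.
        'Wide' means r_i > R/2. Returns {job index: completion time}."""
        queue = deque(sorted(range(len(jobs)), key=lambda i: jobs[i][0]))  # FIFO
        rem = {i: jobs[i][2] for i in range(len(jobs))}
        small_running = {}        # job -> nodes, the non-preemptive FIFO part
        wide = None               # at most one wide job is handled preemptively
        wide_on = False           # True while the wide job holds the whole machine
        slice_left = 0
        done = {}
        t = 0
        while queue or small_running or wide is not None:
            free = R - sum(small_running.values())
            # start jobs from the head of the queue in strict FIFO order
            while queue and jobs[queue[0]][0] <= t and wide is None:
                i = queue[0]
                r = jobs[i][1]
                if r <= free:                      # Instruction B: the job fits
                    small_running[i] = r
                    free -= r
                    queue.popleft()
                elif r > R / 2:                    # Instructions F/G: preempt everything
                    wide = queue.popleft()
                    wide_on, slice_left = True, h
                else:
                    break                          # FIFO: nobody overtakes the head
            if wide_on:                            # the wide job runs alone (gang scheduling)
                rem[wide] -= 1
                if rem[wide] == 0:
                    done[wide] = t + 1
                    wide, wide_on = None, False    # Instruction H: others resume next step
                else:
                    slice_left -= 1
                    if slice_left == 0:            # give the preempted small jobs h units
                        wide_on, slice_left = False, h
            else:                                  # the small jobs run
                for i in list(small_running):
                    rem[i] -= 1
                    if rem[i] == 0:
                        done[i] = t + 1
                        del small_running[i]
                if wide is not None:
                    slice_left -= 1
                    if slice_left == 0:
                        if jobs[wide][1] <= R - sum(small_running.values()):
                            small_running[wide] = jobs[wide][1]  # Instruction E2: fits now
                            wide = None
                        else:
                            wide_on, slice_left = True, h        # Instruction F again
            t += 1
        return done

    # Two 6-unit single-node jobs on a 2-node machine, then a wide 2-node job:
    print(pfcfs([(0, 1, 6), (0, 1, 6), (1, 2, 3)], R=2, h=2))  # -> {2: 6, 0: 9, 1: 9}

In the run above the wide job preempts both small jobs after one time unit, alternates with them in slices of h = 2, and completes at time 6, while the small jobs complete at time 9.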
Lemma. Let r_i ≤ R/2 and w_i = r_i · h_i for all jobs. Also assume that there is no PAR before the submission of the last job. Then Algorithm PFCFS will only produce non-preemptive schedules S with the properties

1. m_S is within a constant factor of m_opt,
2. c_S is within a constant factor of c_opt, and
3. S is fair.
Proof. As Instructions C–H cannot be executed, Algorithm PFCFS only produces non-preemptive FCFS schedules, which are fair. Note that c_opt is lower bounded by the cost of the optimal schedule in which the submission times of all jobs are ignored. We define the time instant x = max{t : for any time instant t' ≤ t there are less than R resources idle}. Note that x ≥ max_i{s_i}. Next we transform the job system τ into a job system τ' by splitting each job i with t_i ≤ x ≤ t_i + h_i into two jobs at time x. According to the corollary above, we have c_{S'}(τ') ≤ c_S(τ) and c_opt(τ') ≤ c_opt(τ). We partition τ' into two disjoint sets τ_1 and τ_2, where τ_2 is the set of all jobs starting at x in S'.

Now we define another time instant

    x̃ = (1/R) · ( Σ_{i∈τ_1} h_i · r_i + Σ_{i∈τ_2} min{x, h_i} · r_i ).

In order to produce a worst case schedule we maximize x̃ by making the (impossible) assumption that exactly … resources are idle in schedule S at any time instant t ≤ x. With the corollary above we then obtain
[Bounds on c_{S'}(τ') and c_opt(τ') in terms of x, x̃, R, and the sums of h_i · r_i over τ_1 and τ_2, with terms min{x, h_i} and max{h_i − x, 0}.]
By use of the definition of x̃ and the relations x̃ ≤ x and min{x, h_i} + max{h_i − x, 0} = h_i for all jobs i, this results in
[…]

…that a wide job i (r_i > R/2) is completed during Instruction G, and examine the time period Δ from the start of Instruction D to the next execution of Instruction A, B, or C. Δ can be split into the following parts:

1. a single part of length h + p,
2. ⌊h_i / h⌋ parts of length h + p, and
3. a single part of length h_i − ⌊h_i / h⌋ · h + p.
Note that each invocation of Instructions E1 and E2 is executed for a time period h during Δ. As h_i ≥ h, a resource–time product of more than a fixed fraction of R · h is used during the first part of Δ. During the second part of Δ it is possible that a single long-running sequential job prevents the availability of the necessary resources to execute the rest of job i in a non-preemptive fashion. Hence we only know that the resource–time product exceeds a smaller fraction of R · h for each part of length h + p. Finally, only a resource–time product proportional to (h_i − ⌊h_i / h⌋ · h) · R is used during the last part of Δ. For the purpose of worst case analysis we can therefore assume that h_i / h ≈ ⌊h_i / h⌋. This means that Δ has a length proportional to h_i, plus preemption overhead depending on p and h, and that a combined resource–time product proportional to h_i · R is used during Δ.
The average usage of resources can be significantly increased if we allow a version of backfilling, that is, the earlier execution of small jobs while a wide job is executed in a preemptive fashion. Provided that a sufficient number of those small jobs is available, the average usage of resources will increase to a level limited only by the preemption penalty p. However, the completion time of a wide job may be delayed by this method, contrary to the original backfilling suggested by Lifka. Next we generalize the previous lemma to the general case.
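A minimal sketch of this backfilling idea (illustrative only; it is neither Lifka's algorithm nor the paper's implementation): while a wide job runs in its preemptive phase, later small jobs that fit on the nodes it leaves idle may be started early.

    from typing import List, Tuple

    def backfill_candidates(waiting: List[Tuple[int, int]], idle: int) -> List[int]:
        """Pick waiting small jobs, in FIFO order, that fit on the `idle` nodes
        a preemptively executed wide job does not use. waiting: (job_id, r_i)."""
        picked = []
        for job_id, r in waiting:
            if r <= idle:
                picked.append(job_id)
                idle -= r
        return picked

    # A wide job uses 6 of 8 nodes; the two 1-node jobs can be backfilled.
    print(backfill_candidates([(4, 3), (5, 1), (6, 1)], idle=8 - 6))  # -> [5, 6]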
Lemma. Let w_i = r_i · h_i for all jobs. Also assume that there is no PAR before the submission of the last job. Then Algorithm PFCFS will only produce schedules S with the properties

1. m_S is within a factor of m_opt that depends on the preemption penalty p,
2. c_S is within a factor of c_opt that depends on p, and
3. S is fair up to a factor that depends on p.
Proof. The completion time of a small job (r_i ≤ R/2) may increase, by an amount proportional to h_i and depending on p, due to a wide job which is submitted later and causes preemption. This yields the fairness result. Based on the proof of the previous lemma, we assume that the execution time of all jobs is small compared to the
makespan of the schedule. Using our experience from the previous lemma, we also consider for the determination of c_S/c_opt only those schedules where the job set τ_2 of that lemma is empty. Hence our schedules consist of periods with wide jobs causing preemption and other periods containing only small jobs. W.l.o.g. we can assume that the average usage of the first kind of periods is given by the bound of the previous lemma, while during the other periods R resources are always used (see the proof of the previous lemma). As the resource usage in the second group of periods is higher and w_i = h_i · r_i holds for all jobs, a worst case schedule is generated by scheduling all those periods after the preemptive periods. Of course, there are always a few small jobs which must also be scheduled in the preemptive periods, but for the purpose of the analysis they are assumed to be scheduled on top of the preemptive jobs. As derived in the proof of the previous lemma, a preemptive job i requires in the worst case a period of length proportional to h_i (plus overhead depending on h and p) if those small jobs in the beginning are not taken into account. Note that the second bound is not tight for h_i ≈ h. By assuming that in schedule S the sum of the execution times of all preemptive jobs is x_1 and all small jobs require a combined resource–time product of x_2 · R, we obtain the following weighted completion time costs:
    c_S(τ) ≤ [an expression in x_1, x_2, R and p],
    c_opt(τ) ≥ [an expression in x_1, x_2 and R],

with separate expressions for the cases x_1 ≥ x_2 and x_1 < x_2. First we consider the case x_1 ≥ x_2. From these inequations we determine a bound for the ratio

    c_S / c_opt ≤ [a function of y and p].

As p may be different for each machine, we generate two separate functions of y, one without and one with the p-dependent term, and maximize both individually. This way we obtain

    c_S / c_opt ≤ … .

In both cases we have y = x_1 / x_2. For x_1 < x_2 we obtain the following smaller bound:

    c_S / c_opt ≤ … .
Next assume that we have a set τ̂ of jobs which are scheduled at or after time x_1 + x_2 in schedule S. Then those jobs are replaced by small jobs such that the resource–time product Σ_{i∈τ̂} h_i · r_i remains invariant and the small jobs are scheduled using R resources at any time instant. This reduces c_opt at least as much as c_S. Therefore the ratio c_S/c_opt cannot be increased by τ̂. Using the same definitions of x_1 and x_2, the following lower bounds hold for the optimal makespan: m_opt ≥ h_i for each job i, m_opt ≥ x_1, and m_opt is also lower bounded in terms of x_1 and x_2 combined. With j being the job that finishes last we obtain

    m_S ≤ h_j + x_1 + x_2 + (preemption overhead proportional to p)
        ≤ … · m_opt.
Finally, we again remove the constraint on the number of PARs.
Theorem. Let w_i = r_i · h_i for all jobs. Then Algorithm PFCFS will only produce schedules S with the properties

1. m_S is within a factor of m_opt that depends on the preemption penalty p,
2. c_S is within a factor of c_opt that depends on p, and
3. S is fair up to a factor that depends on p.
Proof. The proof is done in the same fashion as the proof of the corresponding earlier theorem; there are only some minor differences.

1. If job i with submission time s_i is the last job ending a PAR, then the cut must be positioned within a distance of about p from s_i, which gives enough freedom to choose the cut at a time instant when the execution of small jobs is resumed after a preemption penalty.

2. There is an extension of the splitting corollary to preemptive schedules. However, if a split reduces c_S by a certain amount, then c_opt may only be reduced by a corresponding amount x if the difference between the starting time and the completion time of any job i is bounded by x · h_i.

Taking these changes into account, the techniques of the proof for the earlier theorem also prove the statements of this theorem.
As mentioned before, the bounds for both ratios c_S/c_opt and m_S/m_opt can be reduced if backfilling is used. However, this will result in a more complicated analysis.
Experimental Analysis

In this section we evaluate various forms of preemptive FCFS scheduling with the help of workload data from the CTC SP for the months July to May. These data include all batch jobs which ran on the CTC SP during this time frame. For each job the submission time, the start time, and the completion time were recorded. The submission queue was also provided, as well as requests for special hardware for some jobs. The reasons for the termination of a job (successful completion, failure to complete, user termination, termination due to exceeding the queue time limit) were not given and are not relevant for our simulations. The generation of the trace data had no influence on the execution time of the individual jobs. The CTC uses a batch partition consisting of several hundred nodes, but there are only few jobs which need a very large number of nodes. Taking further into account the execution times of these jobs, preemption will not produce any noticeable gain for them (see the theorem above). Therefore, we assumed for our experiments parallel computers with two different (smaller) numbers of nodes; all jobs requiring more nodes were simply removed. The number of jobs for each month is given in the following table.

[Columns: total number of jobs, and number of jobs within each of the two node limits; rows: Jul–May.]
Table: Number of Jobs
In the experiments, a wide job only preempts enough currently running jobs to generate a sufficiently large partition. Those small jobs are selected by use of a simple greedy strategy. This modification of Algorithm PFCFS does not affect the theoretical analysis, but it improves the schedule performance for real workloads significantly. We generated our own FCFS schedule as a reference schedule and did not use the CTC schedule, as some jobs were, for instance, submitted in October and started in November; using the CTC schedule would not allow evaluating each month separately. As the preemption penalty for the IBM gang scheduler is less than a millisecond, it is neglected (p = 0). We did a large number of simulations with different preemption strategies. For each strategy and each month we determined the makespan, the total weighted […]
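The "preempt only as many jobs as necessary" modification can be sketched as follows; this is an assumed greedy rule written for this rewrite, since the exact selection criterion is not spelled out here.

    from typing import Dict, List

    def jobs_to_preempt(running: Dict[int, int], free: int, demand: int) -> List[int]:
        """Greedily pick running jobs to preempt until a wide job with node
        requirement `demand` fits. running maps job id -> nodes held; jobs
        holding the most nodes are preempted first. Purely illustrative."""
        victims = []
        for job_id, nodes in sorted(running.items(), key=lambda kv: -kv[1]):
            if free >= demand:
                break
            victims.append(job_id)
            free += nodes
        return victims

    # 10 of 64 nodes are free and a 48-node job arrives: preempt the two largest jobs.
    print(jobs_to_preempt({1: 8, 2: 30, 3: 16}, free=10, demand=48))  # -> [2, 3]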
References

W. Smith. Various optimizers for single-stage production. Naval Research Logistics Quarterly.

C. Stein and J. Wein. On the existence of schedules that are near-optimal for both makespan and total weighted completion time. Preprint.

J. J. Turek, W. Ludwig, J. L. Wolf, L. Fleischer, P. Tiwari, J. Glasgow, U. Schwiegelshohn, and P. Yu. Scheduling parallelizable tasks to minimize average response time. In Proceedings of the Annual Symposium on Parallel Algorithms and Architectures, Cape May, NJ.

F. Wang, H. Franke, M. Papaefthymiou, P. Pattnaik, L. Rudolph, and M. S. Squillante. A gang scheduling design for multiprogrammed parallel computing environments. In D. G. Feitelson and L. Rudolph, editors, IPPS Workshop: Job Scheduling Strategies for Parallel Processing, Springer-Verlag, Lecture Notes in Computer Science.
Table: Results for Makespan and … Nodes (columns: three PFCFS variants; rows: Jul–May and Sum).
Table: Results for Completion Time and … Nodes (columns: three PFCFS variants; rows: Jul–May and Sum).
Table: Results for Flow Time and … Nodes (columns: three PFCFS variants; rows: Jul–May and Sum).
Table: Results for Makespan and … Nodes (columns: three PFCFS variants; rows: Jul–May and Sum).
Table: Results for Completion Time and … Nodes (columns: three PFCFS variants; rows: Jul–May and Sum).
Table: Results for Flow Time and … Nodes (columns: three PFCFS variants; rows: Jul–May and Sum).