Libra: An Economy-Driven Cluster Scheduler
Design Documentation
Version: 1.0
Date: 23/Dec/01
Revision History
Date: 28/Jan/02   Version: 1.0   Description: First draft
Project Owner and Client: Rajkumar Buyya
Faculty Advisor: Dr. Arif Zaman
Project Group: Jahanzeb Sherwani, Nosheen Ali, Nausheen Lotia, Zahra Hayat
Table of Contents
1. Introduction
2. Design
3. Architecture
4. Interfaces
4.1 QMonitor
4.2 QMonitor Job Control (Pending Jobs)
4.3 QMonitor Job Control (Finished Jobs)
4.4 Submit Job
4.5 Queue Control
4.6 Queue Configuration: Modify
4.7 Host Configuration (Administration Host)
4.8 Complex Configuration
4.9 Cluster Configuration
4.10 Scheduler Configuration
5. Data Definitions
8. References
1. Introduction
This document contains the high-level and low-level design specifications for the software. The architecture and the user interfaces for each of the deliverables are described as well.
2. Design
For the high-level design, a call-return architecture has been specified using structure charts. For
procedural design, the high-level design modules have been explained using PDL.
3. Architecture
The cluster consists of a master host, submit hosts, and execution hosts, linked by communications daemons. The execution hosts contain execution queues, each with its own scheduling policies, on which jobs are sent to execute. The master host contains the central queue-managing daemon, qmaster, and the scheduling daemon, which makes the actual scheduling decisions regarding which job is to be assigned to which execution queue. Submit hosts are the entry points for users: a submit host submits jobs using the qsub command, which sends each job along with its corresponding details to the master host. The master host summons the scheduler, which has access to the Queue State Table, which in turn contains information on each of the execution hosts. Based on the scheduling algorithm implemented via Libra, the scheduler chooses an execution queue, to which the master host then dispatches the job.
4. Interfaces
4.1 QMonitor
This is the central control screen for the Sun Grid Engine within which the Libra Scheduler is to
be implemented. From here, the options (from top left to bottom right) are:
1. Job Control
2. Queue Control
3. Job Submission
4. Complexes Configuration
5. Host Configuration
6. Cluster Configuration
7. Scheduler Configuration
8. Calendar Configuration
9. User Configuration
12. Browser
13. Exit
Once a job is submitted, it goes to the master host which places it in the pending jobs queue.
Once there, its ID, Priority, Name, Owner, Status and Queue (to which it will be sent) are stored
until the job is actually dispatched to the queue. Every job initially goes to the pending queue
while the scheduling daemon runs the selected scheduling algorithm to determine the execution
queue to which the job is to proceed. Once this is done, the job goes into the Running Jobs tab
(not shown in the screen images), after which it ends up in the finished jobs queue.
The jobs in the finished jobs queue are those that have been scheduled, executed, and completed. It is essentially a list of all the jobs completed so far, together with their returned outputs.
While job status may be viewed from the Job Control window, new jobs may be submitted by
clicking on the Submit button, which opens up the Job Submit dialog window.
Jobs are submitted using this dialog window. The job script specifies the location of the file containing the actual executable commands (for example, sleeper.sh). Job args takes the arguments for the job, including deadline, budget, and expected execution time. Output may be directed to different files, specified in the stdout and stderr input boxes. Note, however, that only hosts designated as submit hosts on the master host have the privilege to submit jobs.
This window lists the currently configured queues for the execution hosts in the cluster. In the
above example, execution hosts mspc29 and mspc36 each run a queue (called q), and each of these queues has 0 out of a maximum of 1 job running on it.
Further, more queues may be added to the specified execution hosts, and existing queues may be
modified or deleted.
This is the window from which queue configurations may be changed. For instance, load/suspend thresholds may be set, specifying the maximum load allowed on the queue, beyond which it will begin to suspend jobs running on it until the threshold is satisfied. Also, complexes may be attached to or detached from the queue, making queues cater to different resource requirements (in our case, deadline and budget). Checkpointing (the process of saving the current status of a job, for later restarting or for job migration) can also be configured for each queue individually, to set the conditions under which checkpointing and/or job migration occur. User access specifies which users/hosts are to be given access and which are to be denied. Subordinates lists those queues that follow suit with whatever instructions are sent to this queue; for instance, if a queue is suspended, its subordinates will be suspended as well. Limits specifies the hard or soft limits imposed on jobs running on the queue (for example, no job may take more than 10 minutes of CPU time, otherwise it is to be suspended or killed).
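As a minimal illustration of such a hard limit, a queue-level check might look like the following sketch; the function and enum names are hypothetical, and the 600-second ceiling simply matches the 10-minute example above:

```c
#include <assert.h>

/* Hypothetical hard-limit check of the kind a queue might apply to a
   running job; not part of the actual SGE code. */
enum limit_action { LIMIT_OK, LIMIT_SUSPEND_OR_KILL };

enum limit_action check_cpu_limit(double cpu_seconds, double hard_limit) {
    /* exceeding the hard limit means the job is to be suspended or killed */
    return cpu_seconds > hard_limit ? LIMIT_SUSPEND_OR_KILL : LIMIT_OK;
}
```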
In this window, hosts are configured. The three types of hosts recognized are Administration, Submit, and Execution hosts. Administration hosts are allowed to view and edit the configuration for the cluster as a whole, for individual hosts, for queues within specific hosts, and for jobs running on specific queues. Submit hosts are allowed to submit jobs into the cluster. Execution hosts have at least one queue set up in which jobs may be executed.
Complexes are entities representing some aspect of the cluster that the cluster administrator defines. For instance, CPU usage, virtual memory usage, and disk storage usage are all complexes that the administrator can define and then attach to specific queues or hosts (as shown in the window above). By specifying an associated limit for that complex, the administrator can effectively define a policy for the queue. For instance, one queue may be configured such that no process requiring more than 10 MB of disk space may run. Alternatively, another queue may be defined such that the total amount of memory used by processes does not exceed the physical RAM on the host, so that no virtual memory is used, thus speeding execution. In our case, a complex defining user budget and another defining user deadline can be used concurrently to define the behavior of a queue, effectively ensuring that user deadlines and budgets are accounted for during queue execution. This can act as a sentinel in case the scheduling algorithm schedules a job inaccurately.
This window allows the administrator to define the local policies for hosts in the cluster as well
as the global policies for the entire cluster as a whole. Global policies are defined which are
inherited by hosts, and in turn inherited by queues in those hosts. Alternatively, hosts may define
their individual policies that override the global policies as required. Finally, policies may also be
defined for individual queues that override both the global as well as host-level policies. Thus,
there is a great degree of customizability for the specific needs of the cluster.
This is the window from which the scheduling algorithm (as described by the high-level design diagrams) is selected for the master host's scheduling daemon. This is effectively the master scheduling policy for the cluster, which selects the hosts and queues to be allocated to each job. The algorithm schedules jobs within the user constraints of deadline and budget, while optimizing system-centric metrics such as CPU utilization as far as possible.
5. Data Definitions
full = string
overloaded = string
partially-full = string
Content Description: Job Details = Job ID + Job Name + Job Type + Standalone
Execution Time + Location of Executable and Input Data Sets +
Location of Output + System Type + Budget + Deadline
Job ID = integer
Job Name = string
Job Type = [Sequential Job | Embarrassingly Parallel Job]
Sequential Job = char
Embarrassingly Parallel Job = char
Job Category = [Deadline | Budget | Deadline and Budget]
(This specifies which criterion is provided for job scheduling.)
Job Start Time = integer
Job Submit Time = integer
Standalone Execution Time = Job Duration
Job Duration = (hours) + (minutes) + (seconds)
Hours = double
Minutes = double
Seconds = double
Location of Executable and Input Data Sets = string
Location of Output = string
System Type = string
Budget = Rupees
Rupees = double
Deadline = (hours) + (minutes) + (seconds)
Hours = double
Minutes = double
Seconds = double
Supplementary Information: none
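As a sketch, the Job Details record above might map to a C structure as follows; the concrete C types and field representations are assumptions, since the dictionary only gives abstract types such as "string" and "double":

```c
#include <assert.h>

/* Duration as defined in the dictionary: (hours) + (minutes) + (seconds) */
struct duration { double hours, minutes, seconds; };

/* Job Type = [Sequential Job | Embarrassingly Parallel Job] */
enum job_type { SEQUENTIAL_JOB, EMBARRASSINGLY_PARALLEL_JOB };

/* Job Details record; field names follow the data dictionary. */
struct job_details {
    int             job_id;
    const char     *name;
    enum job_type   type;
    struct duration standalone_exec_time;
    const char     *exec_and_input_location;
    const char     *output_location;
    const char     *system_type;
    double          budget;   /* Rupees */
    struct duration deadline;
};

/* Collapse an (hours, minutes, seconds) duration to seconds, e.g. for
   comparing a deadline with a computed finish time. */
double duration_seconds(struct duration d) {
    return d.hours * 3600.0 + d.minutes * 60.0 + d.seconds;
}
```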
Tickets = integer
Stride = integer
Pass = integer
Supplementary Information: Tickets represent the client's resource allocation, relative to other
clients.
Stride represents the interval between selections for execution,
measured in passes, and is the inverse of tickets.
Pass represents the virtual time index for the client's next selection.
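The relationship between tickets, stride, and pass can be illustrated with standard stride-scheduling arithmetic; STRIDE1 is a fixed-point scaling constant whose particular value here is an assumption (a large power of two is typical):

```c
#include <assert.h>

/* Fixed-point constant from which integer strides are derived
   (assumed value). */
#define STRIDE1 (1 << 20)

/* Stride is inversely proportional to the client's ticket allocation:
   more tickets means a smaller stride and thus more frequent selection. */
int stride_for(int tickets) {
    return STRIDE1 / tickets;
}

/* After each quantum, the selected client's pass (its virtual time index
   for next selection) advances by its stride. */
int advance_pass(int pass, int stride) {
    return pass + stride;
}
```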
Supplementary Information: This is decided by the scheduler, based on the Cluster Information
available to it.
Supplementary Information: This data object represents a job once it has been accepted by the
system, together with its assigned execution node and execution queue.
[Structure chart: Libra Scheduler. The top-level module calls six subordinate controllers, exchanging job details, the pending job list, scheduling info, scheduled jobs, and cluster updates: the Job Submission, Job Initialization, Job Assignment, Job Execution, Job Querying, and Job Modification Controllers.]
[Structure chart: Job Submission Controller. It calls the Job Input Controller and the Job Acceptance Controller, passing job details and an accepted-job signal between them.]
[Structure chart: Job Assignment Controller. It calls the Scheduling Controller, the Execution Host and Queue Determination Controller, and the Cluster Update Controller, exchanging job details, cluster state, job status, and updated cluster details.]
[Structure chart: Job Execution Controller. It calls the Job Selection Controller and the Quantum Allocation Controller, passing the selected job and its scheduling info.]
[Structure chart: Job Input Controller. It calls Log User In (itself decomposed into Get User ID, Get User Password, and Authenticate User), Get Job Parameters, and Forward Job Details To Master Host, passing the user ID, password, authentication signal, and job details.]
[Structure chart: Job Acceptance Controller. It calls Assess Cluster State, Retrieve Relevant Job Details, Compute Job Finish Time, Compare Job Finish Time to Job Deadline, and Inform User, reading available resources from the Queue State Table and Resource List and emitting the accepted-job signal. Legend: t = type (sequential, embarrassingly parallel), e = execution time, b = budget, d = deadline.]
[Structure chart: Job Initialization Controller. It calls Retrieve Job Details, Set Job Details, Assign Job ID, and Update Pending Job List, passing the job details, job ID, and updated pending job list.]
[Structure chart: Job Assignment Controller (detail). The Scheduling Controller retrieves scheduled job details and calculates scheduling info (allocated tickets, calculated stride, pass) and reserves resources on the execution host; the Execution Host and Queue Controller supplies the chosen execution host and queue; the Cluster Update Controller updates the execution host queue status and the Cluster State Table.]
[Structure chart: Execution Host and Queue Determination Controller. It calls Gather Host Load Info, Sort Hosts by Load, Choose Min Loaded Host, Set Submit Time, and Select Appropriate Queue, which queries the Queue State Table and determines queue slot availability before the job is dispatched to the selected queue.]
[Structure chart: Job Execution Controller (detail). The Job Selection Controller gets the currently executing job list, sorts it by pass value, and selects the minimum-pass-value job; the Quantum Allocation Controller allots the quantum/s to the selected job and runs it, calculates and sets the new pass value, and inserts the job back into the currently executing jobs list.]
[Structure chart: Job Querying Controller. It calls Get Info from User, Get Job Details, and Display Job Details, which displays options and retrieves details based on the user's choice, after authentication via user ID and password.]
[Structure chart: Job Modification Controller. It calls Get Info from User and Display Change Job/Delete Job Option, exchanging the user ID, password, job ID, user choice, execution host and queue, scheduled job details, and the updated cluster status.]
Output: userID
userID = getUserIdFromUser
password = getPasswordFromUser
If authentication fails
display error and ask for userID and password again
Else
Return userID
Input: jobDetails
Output: jobAcceptedSignal
qLoadList = clusterStateTable.QueueList
clusRes = clusterResourceList
c = jobDetails.category
t = jobDetails.type
e = jobDetails.execTime
b = jobDetails.budget
d = jobDetails.deadline
Output: finishTime
If currentTimeSlot is free
Return currentTimeSlot

chosenHost = HostLoadList(minimum)
Return (chosenExecQueue, chosenHost)
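The min-loaded-host step above can be sketched in C as follows; the struct and function names are illustrative, not part of the actual SGE code:

```c
#include <assert.h>
#include <stddef.h>

/* One entry of the host load list kept in the Queue State Table
   (illustrative representation). */
struct host_load {
    const char *host;
    double      load;  /* e.g. normalized CPU load average */
};

/* "Choose Min Loaded Host": return the index of the least loaded
   execution host in the list. */
size_t min_loaded_host(const struct host_load *list, size_t n) {
    size_t min = 0;
    for (size_t i = 1; i < n; i++)
        if (list[i].load < list[min].load)
            min = i;
    return min;
}
```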
Ai = Min (No. of Unassigned Jobs, No. of Jobs Ri can complete by the remaining deadline)
Job Assignment():
Jobs are assigned their priorities according to their budget and deadline. The tickets represent their priority and are determined by prioritizing a list of jobs sorted by budget and a list sorted by deadline, and combining the two priorities to arrive at the eventual ticket allocation.
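The combination described above can be sketched as follows. The rank-sum rule and the BASE_TICKETS scale factor are illustrative assumptions, since the text does not give the exact formula; the sketch only shows that a higher budget and a tighter deadline both yield more tickets:

```c
#include <assert.h>

#define BASE_TICKETS 100  /* assumed scale factor for ticket counts */

struct job {
    double budget;    /* Rupees the user will pay */
    double deadline;  /* time by which the job must finish */
    int    tickets;   /* resulting share for stride scheduling */
};

/* Rank each job against every other: a higher budget and a tighter
   deadline each raise its combined rank. Tickets are proportional to
   the combined rank, so better-paying, more urgent jobs get more. */
void assign_tickets(struct job *jobs, int n) {
    for (int i = 0; i < n; i++) {
        int rank = 0;
        for (int j = 0; j < n; j++) {
            if (jobs[i].budget > jobs[j].budget) rank++;
            if (jobs[i].deadline < jobs[j].deadline) rank++;
        }
        jobs[i].tickets = BASE_TICKETS * (rank + 1);
    }
}
```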
/* per-client state */
typedef struct {
    int tickets, stride, pass, remain;
} *client_t;

/* add client c to the run queue q */
void client_join(client_t c, queue_t q) {
    global_tickets_update(c->tickets);
    queue_insert(q, c);
}
Output: selectedJob
currJobList = currentlyExecutingJobList
minimum = index of the job with the minimum pass value
selectedJob = currJobList(minimum)
Return selectedJob
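The selection step can be sketched in C; the struct mirrors the Tickets/Stride/Pass state from the data dictionary, while the type and function names are illustrative:

```c
#include <assert.h>
#include <stddef.h>

/* Per-job scheduling state, mirroring Tickets, Stride, and Pass in the
   data dictionary. */
struct sched_client {
    int tickets;
    int stride;
    int pass;
};

/* "Select Minimum Pass Value Job": return the job whose pass value is
   lowest, i.e. the one whose turn comes next in virtual time.
   Returns NULL for an empty list. */
struct sched_client *select_min_pass(struct sched_client *list, size_t n) {
    if (n == 0) return NULL;
    struct sched_client *min = &list[0];
    for (size_t i = 1; i < n; i++)
        if (list[i].pass < min->pass)
            min = &list[i];
    return min;
}
```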
userID = getUserIdFromUser
password = getPasswordFromUser
Else display error and ask for userID and password again
jobID = getJobIDFromUser
Else display error message and ask user for jobID again
If remainingTime is selected
display calculateRemainingTime(jobID)
If startTime is selected
display jobDetails.startTime
If budget is selected
display jobDetails.budget
If Deadline is selected
display jobDetails.deadline
Input: jobID
Output: updatedClusterStatus
eHost = jobDetails.execHost
eQueue = jobDetails.execQueue
eQueue.delete(jobID)
clusterStateTable.host(eHost).resources -= calculateRemainingTime(jobID)
If choice = name
newName = readNewNameFromUser()
If newName is blank
Display error message and ask user to enter new name again
Else
jobDetails.name = newName
currJobList = currentlyExecutingJobList
If currentJob.jobID = jobID
Return
newLoc = readNewLocationFromUser()
jobDetails.inputLocation = newLoc
newLoc = readNewLocationFromUser()
jobDetails.outputLocation=newLoc
8. References
[1] R. Buyya, D. Abramson, and J. Giddy, Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid, HPC ASIA 2000, China, May 2000.
[2] R. Buyya, D. Abramson, and J. Giddy, An Economy Driven Resource Management Architecture for Global Computational Power Grids, The 2000 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA 2000), Las Vegas, USA, June 26-29, 2000.
[3] R. Buyya, D. Abramson, and J. Giddy, An Economy Grid Architecture for Service-Oriented Grid Computing, 10th IEEE International Heterogeneous Computing Workshop (HCW 2001), 2001.
[4] R. Buyya, H. Stockinger, J. Giddy, and D. Abramson, Economic Models for Management of Resources in Peer-to-Peer and Grid Computing, Monash University.
[5] B. N. Chun and D. E. Culler, Market-based Proportional Resource Sharing for Clusters, University of California at Berkeley.
[6] Sun Microsystems, Inc., Sun Grid Engine 5.2.3 Manual, July 2001.