From Bovet and Cesati
SWAP: A Scheduler With Automatic
Process Dependency Detection
Maybe: Review Threads
User-level vs. Kernel level threads
Virtual-Time Round Robin:
An O(1) Proportional Share Scheduler
Nieh, Vaill, Zhong
History of Proportional Share Schedulers
Weighted round-robin (WRR) – ancient, simple, O(1)
Run all clients with the same frequency but adjust the size of their time quanta in proportion to their shares.
After a client has received its service, it is well ahead of its proportional allocation.
Fair-share scheduling: based on controlling priority values.
Fast, but limited in accuracy.
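The WRR scheme above can be sketched in a few lines (a toy model; the function name and the quantum-per-share convention are illustrative):

```python
def wrr_schedule(shares, periods=1):
    """Weighted round-robin: every client runs with the same frequency,
    but each turn lasts `share` quanta in a row."""
    out = []
    for _ in range(periods):
        for client, share in shares.items():
            out.extend([client] * share)  # one long burst per client
    return out

# Shares 3:2:1 over one period of 6 quanta.
print(wrr_schedule({"A": 3, "B": 2, "C": 1}))  # ['A', 'A', 'A', 'B', 'B', 'C']
```

The long per-client bursts are exactly what makes WRR cheap but inaccurate: client A is far ahead of its proportional allocation after its burst, while B and C have received nothing.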
Starting in late 1980s: Fair Queuing
Better proportional sharing accuracy
Time to select a client for execution grows with the number of clients: many schemes are linear time, some logarithmic.
Scheduling overhead can hit 20% of system resources for large
numbers of clients.
Shares A:B:C = 3:2:1 – WRR service error ranges from -1 to +1.5 quanta.
Shares 3000:2000:1000 – error ranges from -1000 to +1500.
Fair share: hard to analyze; error likely big.
Fair queueing: error would be small for both examples above.
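The -1 to +1.5 range for the 3:2:1 example can be checked directly. The sketch below simulates one WRR period and measures each client's service error, i.e. quanta received minus the ideal proportional service (variable names are illustrative):

```python
from fractions import Fraction

# WRR for shares 3:2:1 -- each client runs `share` quanta in a row.
shares = {"A": 3, "B": 2, "C": 1}
total = sum(shares.values())
timeline = [c for c in shares for _ in range(shares[c])]  # A A A B B C

received = {c: 0 for c in shares}
errors = []
for t, running in enumerate(timeline, start=1):
    received[running] += 1
    for client in shares:
        # service error = quanta received minus ideal proportional service
        ideal = Fraction(shares[client] * t, total)
        errors.append(received[client] - ideal)

print(min(errors), max(errors))  # -1 3/2
```

The worst cases both occur at t = 3, right after A's burst: A is 1.5 quanta ahead of its ideal service, and B is a full quantum behind. Scaling the shares to 3000:2000:1000 scales the errors to -1000 and +1500.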
Proportional Share Scheduler
Virtual time of a client is a measure of the degree to
which a client has received its proportional allocation
relative to other clients.
When a client executes, its virtual time advances at a
rate inversely proportional to the client’s share;
specifically, the virtual time of a client A at time t is the ratio of
the service A has received to A’s share: VT_A(t) = W_A(t) / S_A.
Given a client’s virtual time, the client’s virtual finishing time
(VFT) is defined as the virtual time the client would have after
executing for one more time quantum.
WFQ: Schedule client with smallest VFT.
Applied as stride scheduling.
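With these definitions, WFQ can be sketched directly (a toy model with quantum = 1; exact fractions avoid floating-point ties):

```python
from fractions import Fraction

def wfq_schedule(shares, quanta):
    """Each quantum, run the client with the smallest virtual finishing time."""
    vt = {c: Fraction(0) for c in shares}  # per-client virtual time
    order = []
    for _ in range(quanta):
        # VFT = virtual time after one more quantum; VT advances by 1/share
        vft = {c: vt[c] + Fraction(1, shares[c]) for c in shares}
        chosen = min(vft, key=vft.get)  # ties broken by queue order
        vt[chosen] = vft[chosen]
        order.append(chosen)
    return order

sched = wfq_schedule({"A": 3, "B": 2, "C": 1}, 6)
print(sched)  # ['A', 'B', 'A', 'A', 'B', 'C']
```

Over one period of 6 quanta the counts come out exactly 3:2:1, and the service is interleaved rather than bursty, which is why the error stays small. The cost is the `min` over all clients on every decision, the linear-time overhead criticized above.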
Combines benefit of low overhead
round-robin scheduling with high-
accuracy mechanisms of virtual time and
virtual finishing time.
1. Order clients in the run queue from largest to smallest share. Unlike
fair queueing, a client’s position on the run queue only changes
when its share changes (an infrequent event), not on each
scheduling decision.
2. Starting from the beginning of the run queue, run each client for
one time quantum in a round-robin manner. VTRR uses the fixed
ordering property of RR in order to choose in constant time
which client to run.
3. In Step 2, if a client has received more than its proportional
allocation, skip the remaining clients in the run queue and start
running clients from the beginning of the run queue again. Since
the clients with the larger share values are placed first in the
queue, this allows them to get more service than the lower-share
clients at the end of the queue.
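The three steps can be sketched as below. This is a greatly simplified model: the real VTRR decides when to jump back to the head of the queue with a virtual-time check, while this sketch only skips clients whose per-period allocation counter has hit zero. Function and variable names are illustrative.

```python
def vtrr_schedule(shares, periods=1):
    """Simplified VTRR sketch: round-robin over a share-ordered queue."""
    # Step 1: order clients by share, largest first; this ordering only
    # changes when a share changes, not on each scheduling decision.
    queue = sorted(shares, key=lambda c: -shares[c])
    schedule = []
    for _ in range(periods):
        # Time counters: quanta each client is still owed this period.
        counters = {c: shares[c] for c in queue}
        while any(counters.values()):
            # Step 2: run each client for one quantum, in queue order.
            for c in queue:
                if counters[c] == 0:
                    # Step 3 (simplified): this client has received its full
                    # allocation, so skip the rest and restart from the head.
                    break
                schedule.append(c)
                counters[c] -= 1
    return schedule

print(vtrr_schedule({"A": 3, "B": 2, "C": 1}))  # ['A', 'B', 'C', 'A', 'B', 'A']
```

Each decision touches only the current queue position and one counter, which is the constant-time property; the high-share clients at the head naturally absorb the extra passes.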
State Maintained for VTRR
Share (resource rights)
Time counter: tracks the number of time quanta the client must receive
before the period is over and perfect fairness is reached.
Implementation is easy – changes fewer than 100 lines of
Linux kernel code.
VTRR has fairness properties that are
much better than WRR and nearly as good
as fair queueing.
Figure 3: It’s O(1).
User-Level vs. Kernel-Level
Two broad categories.
User-level: all of the work of thread
management is done by the application. No magic
hooks into secret kernel routines.
Advantages of ULT
Thread creation, scheduling, and switching
do not require kernel-mode privileges.
Saves overhead of mode switch, no use of
kernel resources – can be fast and cheap.
Scheduling can be application specific
ULTs can run on any OS.
Disadvantages of ULT
When a thread makes a blocking system call,
the entire process blocks.
Can’t easily take advantage of
multiprocessing. (More on this later).
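The user-level model can be sketched with Python generators: each `yield` is a voluntary context switch handled entirely in user space, with no kernel involvement (names here are illustrative):

```python
def worker(name, steps):
    for i in range(steps):
        # yield = a user-space context switch; no mode switch, no kernel
        yield f"{name}:{i}"

def run(threads):
    """Round-robin user-level scheduler living entirely in the application."""
    trace = []
    while threads:
        t = threads.pop(0)
        try:
            trace.append(next(t))  # run one "quantum" of this thread
            threads.append(t)      # rotate to the back of the run queue
        except StopIteration:
            pass                   # thread finished; drop it
    return trace

trace = run([worker("a", 2), worker("b", 1)])
print(trace)  # ['a:0', 'b:0', 'a:1']
```

The scheduling policy here is entirely application-specific, which is the ULT advantage; but if any worker made a blocking system call, every "thread" would stall with it, which is exactly the drawback above.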
Kernel-level: all work done by the kernel. No thread management
code in the application, simply an API call to the kernel
thread facility.
Kernel maintains context information for the
process as a whole and for individual threads
within the process.
Overcomes the two major drawbacks of ULTs.
Disadvantage: transfer of control from one
thread to another requires a mode switch to the
kernel.
Combined Approach (Solaris)
Thread creation is done completely in user
space as is the bulk of scheduling and
synchronization of threads within an application.
Multiple ULTs from a single application are
mapped onto some smaller or equal number of
kernel-level threads.
Programmer may adjust the number of KLTs for
a particular application and machine to achieve
best overall results.
Could be the best of both worlds.
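The M:N idea can be loosely illustrated with a thread pool: many logical tasks multiplexed onto a small, tunable number of kernel-backed worker threads (the task and worker count here are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def task(i):
    return i * i

# Eight logical tasks multiplexed onto two kernel-backed worker threads.
# The programmer tunes max_workers per application and machine, just as
# the combined model lets the programmer adjust the number of KLTs.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(task, range(8)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

A blocking call in one task only ties up one worker thread, while the others keep running, which is the key benefit over the pure user-level model.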
Process: normal UNIX process. Includes the user’s address
space, stack, and process control block.
Threads (user level). Implemented through a threads
library in the address space of a process, invisible to OS.
“Interface for application parallelism”
Lightweight processes (LWP).
Mapping between ULTs and kernel threads
Each LWP supports one or more ULTs and maps to exactly one
kernel thread.
Kernel threads – fundamental entities that can be
scheduled and dispatched to run on one of the system
processors.
A picture (4.15) is worth 1000 words.
Process 1: No concurrency
Process 2: Pure ULT strategy
Process 3: Several threads on smaller number of LWPs.
Specifies degree of parallelism at kernel level that will
support this process.
Process 4: Threads permanently bound to LWPs in a 1-1
mapping. Makes kernel-level parallelism fully visible to the
application. Useful if threads will frequently be
suspended in a blocking fashion.
Process 5: Kitchen sink.
Not shown: kernel threads to execute system functions.
Flexibility = power
Multiple windows, only one active at a time
Many threads on one LWP. Creation, Destruction,
blocking, etc. w/o involving kernel.
Application with threads that block: multiple
LWPs, so non-blocked threads keep running.
Independent matrix computations: 1-1.