K42: an Open-Source Linux-Compatible Scalable Operating System Kernel

Jonathan Appavoo†, Marc Auslander†, Maria Butrico†, Dilma Da Silva†,
Orran Krieger†, Mark Mergen†, Michal Ostrowski†,
Bryan Rosenburg†, Robert W. Wisniewski†, Jimi Xenidis†

† IBM T. J. Watson Research Center. Work supported by Defense Advanced
Research Projects Agency Contract NBCH30390004.

Abstract

K42 is an open-source scalable kernel supporting the Linux API and ABI. Its
design allows rapid prototyping of operating system policies and mechanisms,
providing open-source developers a conduit via which they may quickly test
potential operating system innovations. K42’s object-oriented model provides
per-instance resource management. Per-instance management allows K42 to be
customized simultaneously to differing demands of applications, allows that
customization to change on-the-fly, providing autonomic and on-demand
functionality, and allows base K42 source to affect only specific
applications. Thus, developers with non-mainstream needs can have their code
integrated into K42 without affecting other users unless the other users also
desire the modifications.

K42 is being used both internally by IBM teams, and externally by an
increasing number of universities. In this paper we describe K42’s overall
structure and design. We share the experiences we have had designing and
implementing K42, and highlight those aspects of K42’s design and features
that facilitate smooth open-source collaboration.

1 Introduction

The K42 project is developing a new scalable open-source operating system
kernel incorporating innovative mechanisms and policies, and modern
programming technologies. K42 supports the Linux API and ABI.

Open-source system software has a set of stresses placed on it by the user
community. A common goal of open-source software is to be available to a wide
audience of users. This implies the software should be configurable in
different ways to meet the disparate audiences’ needs and should be flexible
enough so the core code base does not fragment. As the open-source community
is diverse, the system should 1) allow groups to leverage development work by
other groups, 2) allow developers on different hardware platforms to take
advantage of the features available on those platforms, 3) run on both large-
and small-scale machines, and 4) provide a mechanism and tools for developers
to understand the performance of the system. Though additional requirements
could be listed, the overarching theme is that open-source system software
should be flexible so it can run across different platforms, support
different applications, and be used by different users that have varying
constraints. The description of K42 throughout this paper focuses on the
items enumerated above to demonstrate its successful development model for an
open-source community.

K42 was designed to scale up to machines containing thousands of processors,
and down to machines containing two to four processors. To achieve such
scalability, K42 has been structured in an object-oriented manner where each
resource of the system is managed by a per-instance set of objects. The
object-oriented design also provides a high degree of customizability because
applications can choose the implementation of a system resource best suited
to their needs. Further, the per-resource management allows different
concurrent applications to choose different implementations. Even within one
application, different uses of a given resource may be managed differently.
We have also provided autonomic and on-demand capabilities by implementing
mechanisms that allow these per-instance resources to be swapped on-the-fly
while the system is running.

The modular design allows rapid prototyping of operating system policies and
mechanisms, providing open-source developers a conduit via which they may
quickly test potential operating system innovations. The modular design, in
combination with the Linux API and ABI support, allows developers interested
in new open-source operating system policies, e.g., a new memory management
policy, to experiment without requiring detailed knowledge of the whole
system. As evidence of the open-source model, several specific technologies
from K42 have been incorporated into Linux [?, ?, ?]. Further, K42’s
per-instance resource management allows base K42 code to affect only specific
applications. Thus, open-source developers with non-mainstream needs can have
their code integrated into K42 without affecting other users, unless those
users also desire the modifications.

A few key philosophies govern the design of K42. In K42 we have 1) structured
the system using modular, object-oriented code, 2) avoided centralized code
paths, global data structures, and global locks, and 3) moved system
functionality from the kernel to server processes and into application
libraries. While we emphasized these design principles, our primary focus has
been on performance and we made compromises when the two conflicted. The
intent was not to carry the design philosophies to an extreme in order to
fully explore their ramifications, but rather to use them to achieve a better
performing, more customizable system adhering to our goals.

The key goals of the K42 project include:

  • Scalability/Performance: A) Scale up to run well on large multiprocessors
    and support large-scale applications efficiently. B) Scale down to run
    well on small multiprocessors. C) Support small-scale applications as
    efficiently on large multiprocessors as on small multiprocessors.

  • Customizability: A) Provide applications with system resources managed to
    meet their specific needs. B) Be able to change that management
    dynamically as the application needs change. C) Allow resource management
    policy selection to occur either autonomically, on demand, or by
    compiler-inserted or direct application hints.

  • Maintainability/Extensibility: A) Provide a modularized model for
    supporting different hardware platforms and for tuning to different
    applications. B) Allow the system to be upgraded with new components
    either for performance, security, or correctness, without interruption in
    system services.

  • Accessibility/Linux Compatibility: A) Be available to a large open-source
    and research community. B) Make it easy to add specialized components for
    experimenting with policies and implementation strategies. C) Open up for
    experimentation parts of the system that are traditionally accessible
    only to experts.

K42 is Linux API and ABI compatible. Also, we exploit the rich set of device
drivers, file systems, and other code available with Linux, and are part of
the community that is developing core kernel technology. We are developing an
alternative to the Linux kernel for research prototyping, not a new operating
system.

K42 is open source. In combination with supporting Linux interfaces, this
allows K42 to be accessible to a large community interested in exploring
operating system policies. One of the goals of the modular design of K42 is
to provide access to kernel policies to a wider set of developers. The intent
is that rather than requiring an understanding of the whole kernel, knowledge
of a particular module, e.g., memory management or file system, could be
gained more quickly, thus allowing a wider audience to explore resource
management policies. While this modular structure has been successful to this
extent, the overall kernel structure is significantly different from previous
Unix kernel implementations and thus requires kernel developers to understand
a new model. In Section 3, Experiences, we describe in more detail the
advantages and challenges our design presents.

The rest of the paper is structured as follows. Section 2 presents an
overview of K42’s structure and technologies. More details of specific
components can be found in white papers on the K42 web site
(w.research.ibm.com/K42). Section 3 describes the experiences we and others
have had with K42. We highlight those aspects of K42’s design

and features that facilitate smooth open-source collaboration, and present
some of the challenges K42 and Linux provide to achieving smooth
collaboration. Section 4 describes our current research directions and
presents selected results. Section 5 concludes.

2 K42’s Structure and Technologies

2.1 Structural Overview

K42 is structured around a client-server model as in Figure 1. The kernel is
one of the core servers. It currently provides memory management, process
management, inter-process communication (IPC) infrastructure, base
scheduling, networking, and device support. Above the kernel are applications
and system servers, including an NFS file server, a KFS scalable file server,
name server, socket server, PTY server, and pipe server. For flexibility, and
to avoid IPC overhead, we implement as much functionality as possible in
application-level libraries. For example, all thread scheduling is done by a
user-level scheduler linked into each process.

All layers of K42, the kernel, system servers, and user-level libraries, make
extensive use of object-oriented technology. We use a stub compiler with
decorations on the C++ class declarations to automatically generate IPC calls
from a client to a server, and have optimized these IPC paths to have good
performance. The kernel provides the basic IPC transport, and attaches
sufficient information for the server to perform authentication on those
calls.

Linux API and ABI compatibility is accomplished by an emulation layer that
implements Linux system calls by method invocations on K42 objects. When
writing an application to run on K42, it is possible to program to the Linux
API or directly to the native K42 interfaces. Programming against the native
interfaces allows the application to take advantage of K42-specific
optimizations. The translation of standard Linux system calls is done by
intercepting glibc system calls and implementing them with K42 code. While
Linux is the first and currently only personality we support, the base
facilities of K42 are designed to be personality-independent.

The Linux-kernel internal personality is provided by a set of libraries that
allow Linux-kernel components such as device drivers, file systems, and
network protocols to run inside the kernel or in user mode. These libraries
provide the run-time environment that Linux-kernel components expect.

2.2 Object-Oriented Design

We have applied object-oriented technology to the entire system. Each virtual
(e.g., virtual memory region, network connection, file, process) and physical
(e.g., memory bank, network card, processor, disk) resource is managed by a
different set of object instances. Each object encapsulates the meta-data
necessary to manage the resource as well as the locks necessary to manipulate
the meta-data. We avoid global locks, data structures, and policies.

The vast majority of requests to an operating system are independent and may
be processed independently if the underlying design and data structures
permit. Some requests, however, are based on actual sharing of resources. K42
provides an enhanced object-oriented model via clustered objects [?, ?] that
makes this distinction transparent. Clients use clustered objects, and the
underlying implementation determines transparently the appropriate
distribution for achieving good multiprocessor performance.

There are several advantages object-oriented technology provides in K42. One
of the advantages is good multiprocessor performance, which is achieved
because 1) independent requests to different resources proceed independently
because no shared data structures are traversed and no shared locks are
accessed, 2) good locality is achieved for resources accessed by a small
number of processors, and 3) the clustered-object technology lets widely
accessed objects be implemented in a distributed fashion.

A second advantage is customizability. K42 can provide applications or
sub-systems with resource management tailored to their needs because
per-resource object instances allow multiple policies and implementations to
be supported simultaneously by the system. Because each resource instance is
implemented by an independent object, resource management policies and
implementations can be controlled on a per-virtual-resource basis, thus
different applications can use different resource management policies. Even
within a given application, different policies may be supplied for different
instances of a given resource. For example, every open file may have a
different pre-fetching policy, and different page caches may


[Figure 1: Structural Overview of K42. Labels recoverable from the figure:
Linux API/ABI; Linux libraries/glibc; Linux emulation; K42 OS libraries;
File Server; Name; Linux device drivers/IP stack.]
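
The clustered objects introduced in Section 2.2 can be illustrated with a
minimal sketch. It is written in Python for brevity rather than in K42's C++,
and every name in it (ClusteredCounter, Representative, the explicit cpu
argument) is hypothetical and not part of K42's interfaces. Clients call one
logical object; behind it, each processor updates its own representative, so
independent requests traverse no shared data and take no shared lock.

```python
import threading


class Representative:
    """Per-processor part of the object; only its own processor updates it."""

    def __init__(self):
        self.count = 0
        self.lock = threading.Lock()  # local lock, not contended across processors


class ClusteredCounter:
    """One logical counter whose state is distributed over representatives."""

    def __init__(self):
        self._reps = {}
        self._table_lock = threading.Lock()  # guards creation and enumeration only

    def _rep_for(self, cpu):
        rep = self._reps.get(cpu)
        if rep is None:  # first use on this processor: create its representative
            with self._table_lock:
                rep = self._reps.setdefault(cpu, Representative())
        return rep

    def inc(self, cpu):
        """Common path: touches only the caller's representative."""
        rep = self._rep_for(cpu)
        with rep.lock:
            rep.count += 1

    def value(self):
        """Rare path: gathers the partial counts from every representative."""
        with self._table_lock:
            reps = list(self._reps.values())
        total = 0
        for rep in reps:
            with rep.lock:
                total += rep.count
        return total


# Four threads stand in for four processors. The cpu ids are an assumption of
# this sketch; Python threads are not pinned to processors.
counter = ClusteredCounter()


def worker(cpu):
    for _ in range(1000):
        counter.inc(cpu)


threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = counter.value()
```

The caller never sees how the state is distributed: inc() and value() form the
whole interface, so a single shared implementation could be substituted behind
the same calls for an object used by only a few processors.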

have different replacement policies. K42 extends these customizability
advantages by providing hot swapping. Hot swapping allows object instances to
be replaced dynamically as an application’s behavior changes, as new
functionality becomes available, or as bug fixes are implemented [?, ?, ?].

There are other advantages object-oriented technology provides. The modular
nature of the system makes it more maintainable by providing a clean model
for supporting new applications and new hardware. For each new platform or
application that K42 supports, additional objects may be created. Thus, each
object remains simple and easy to program for the given platform or
application. As code does not affect all users in the system, developers
working on features that apply to a narrow audience may still contribute to
K42 because only those applications for which features are advantageous need
to use them. Further, because modifying a given object does not involve code
from many aspects of the system, non-kernel developers may more easily
develop a new resource management policy for their application. In theory,
these advantages are nicely described. In practice, details often interfere
with cleanly achieving the described advantages. In Section 3, Experiences,
we examine the intended advantages described in this paragraph, and describe
the challenges we have found and some solutions that may be effective in
realizing the advantages.

2.3 User-Level Implementation of System Services

In K42, much of the functionality traditionally implemented in the kernel or
servers is moved to libraries in the application’s address space. This work
has a similar flavor to the Exokernel [?], Psyche [?], and Scheduler
Activations [?]. This change allows for a large degree of customization
because applications can implement traditional system functionality using
libraries customized to their needs. For example, applications with
specialized needs, e.g., subsystems, games, and scientific applications, can
provide their own libraries, replacing much of the operating system
functionality that would traditionally be implemented in the kernel, without
sacrificing security and without impacting the performance of other
applications. Security is not affected because only information that would
have been accessible to an application is stored in the library. Overhead is
reduced in many cases because crossing address space boundaries to invoke
system services can be avoided. Also, space and time are consumed in the
application and not in the kernel or servers. For example, an application can
have a large number of threads without consuming any additional kernel
memory. In many cases, we can handle common-case critical paths, e.g.,
non-shared files, efficiently at user level, while handling more complex
situations in the kernel or in a system server. A discussion of our
experiences with this model appears in Section 3. Along with systems such as
L4 [?], K42 utilizes an efficient IPC mechanism that has performance
comparable to system calls. In the rest of this section we describe some of
the services that have user-level implementations in K42.

Thread Scheduling: All thread scheduling has been moved to user level. The
kernel is aware only of user processes and maintains an entity called a
dispatcher to represent each process. All threads are multiplexed at user
level on this dispatcher. Events that would ordinarily block the process are
instead reflected back to the scheduler library code running in the
application. This scheduler code can then take the appropriate action, for
example blocking the current thread and running another thread. In this way,
any number of threads can be multiplexed on a dispatcher without negative
consequences for the application. This ties up fewer kernel resources, makes
the scheduling more efficient, and most importantly, allows flexibility for
optimizations at user level. This user-level scheduling facility provides a
framework that has allowed other services to be moved to user level [?].

Timer Interrupts: If an application has thousands of threads, many of those
threads may be waiting for timer events, e.g., timeouts on socket operations.
In K42, the dispatcher has a single timer request outstanding, for its next
timeout, and all subsequent timeouts are maintained in the application’s
address space. This results in better performance because most timeouts
handle exceptional events, and are canceled before occurring. By keeping the
state in the application address space, we can avoid interaction with the
kernel when a timeout is canceled, providing an inexpensive mechanism for the
common-path timer operation. When a timer event for a dispatcher actually
occurs, it is passed up to the dispatcher as an asynchronous notification and
the user-level code can decide how to handle the event.

Page Fault Handling: On a page fault, we maintain the state of the faulting
thread in the kernel only long enough to determine if the fault is in-core.
If it is in-core, the kernel handles the fault directly. Otherwise, the fault
is reflected back to the dispatcher, and the user-level scheduler can
schedule another runnable thread or yield. This contrasts with existing
systems that use an M on N thread model. With our model, kernel resources are
saved because the kernel only needs to reserve resources for one entity per
address space. As with timer events, page-fault completions are passed up to
the scheduler as asynchronous notifications. The user-level functionality
provided for page faults enables customizations.

IPC Services: The IPC services implemented in the K42 kernel are simple. The
kernel hands the processor from the sender to the receiver address space,
keeping most registers intact, and giving the receiver an unforgeable
identifier for the sender. Most of the work of IPC is done in user-level
libraries that are responsible for marshaling and de-marshaling arguments
into registers, setting up shared regions for transferring bulk data, and
authenticating requests. The K42 IPC facility is as efficient as the best
kernel IPC facilities in the literature [?]. The kernel provides the basic
IPC transport, and attaches sufficient information for the server to perform
authentication on those calls. Because the implementation is at user level,
it can be customized to, for example, use problem-domain-specific transports
for efficiency, minimize authentication overhead, and minimize state saving
when communicating between trusted parties.

I/O Servers: In most client/server operating systems, servers maintain state
for every outstanding request from a client thread. In K42, if the resources
necessary to handle a request are unavailable, the server returns an error,
the application blocks the thread in its own address space, and the server
notifies the client when a request can be re-issued. For example, servers
that provide services such as sockets, PTYs, and pipes maintain information
about all the applications attached to a communication port, and notify the
applications when new data becomes available. While we first introduced this
scheme in order to avoid using up server resources, it also has two other
benefits. First, complete state about the file descriptors an application is
accessing is available in the application’s own address space. This means
that operations like POSIX select() can be implemented efficiently without
communication with the kernel or servers. Second, and more importantly, it
allows us to use an event-driven rather than

a polling model for handling I/O requests, making implementations of services
such as select() more efficient. This allows more efficient implementations
of, for example, web servers, because there is no need to block threads for
long periods of time.

2.4 Additional K42 Technologies

Integrated Performance Monitoring: As part of the original design, K42
included an integrated tracing and performance monitoring infrastructure.
More recently, we have extended the model to encompass all aspects of the
software stack. K42’s event tracing infrastructure provides for correctness
debugging, performance debugging, and performance monitoring of the system.
The infrastructure allows for cheap and parallel logging of events by all
levels of the system including applications, libraries, servers, and the
kernel. This event log may be examined while the system is running, written
out to disk, or streamed over the network. Post-processing tools allow the
event log to be converted to a human readable form or to be displayed
graphically.

We achieved several goals with the tracing infrastructure in K42 [?]. The
goals are to 1) provide a unified set of events for correctness debugging,
performance debugging, and performance monitoring, 2) have the infrastructure
always compiled into the system thereby allowing data gathering to be enabled
dynamically, 3) separate the collection of events from their analysis, 4)
allow events to be efficiently gathered on a multiprocessor, 5) have minimal
impact on the system when tracing is not in use and allow for zero impact by
providing the ability to compile out events if desired, and 6) provide cheap
but flexible logging for either small or large amounts of data per event.

This performance monitoring infrastructure has proven invaluable not only in
helping us achieve good performance in K42, but also in aiding developers of
Linux programs struggling to understand their application behavior.

Hot Swapping: Hot swapping allows the individual object instances used to
implement a service to be tuned to the varying demands on that service. For
example, in K42, when a file is accessed exclusively by one application, an
object in the application’s address space handles the file control
structures, allowing it to take advantage of mapped file I/O, thereby
achieving performance benefits of 40% or more. With hot swapping, when the
file becomes shared, a new object can dynamically replace the old object.
This new object communicates with the file system to maintain the control
information. We have also used hot swapping to switch between shared and
distributed implementations of the object representing a region in the kernel
when we discovered the region object being used across the multiprocessor.

We envision extending our hot swapping infrastructure to 1) perform dynamic
granularity system monitoring, 2) provide better system availability, and 3)
allow more extensive testing. 1) System monitoring: Monitoring is required to
detect security threats, performance problems, etc. However, there is a
tradeoff between placing extensive monitoring in the system and the
performance overhead this entails. With hot swapping, upon detection of a
problem by broad-based monitoring, it becomes possible to dynamically insert
additional monitoring, tracing, or debugging without incurring overhead when
the more extensive code is not needed. 2) System availability: Numerous
mission-critical systems require five-nines level availability, making
software upgrades challenging. Support for hot swapping allows software to be
upgraded, e.g., for bug fixes, security patches, new features, or performance
improvements, without having to take the system down. 3) Testing: Even in
existing relatively inflexible systems, testing is a significant cost that
constrains development. Hot swapping can ease the burden of testing the
system by hot swapping a control object in front of the to-be-tested object,
having the control object generate input values, including faulty ones, and
then examining the results. In a similar manner, delays can be injected into
the system at internal interfaces, allowing potential race conditions to be
explored.

Customizable Scalable File System: In addition to supporting standard Linux
file systems such as ext2, K42 includes KFS [?], a fine-grained adaptable
file system that is customizable at the granularity of files and directories,
allowing K42 to meet the requirements and usage access patterns of various
workloads. In KFS, each file or directory may have its own tailored service
implementation, and these implementations may be replaced on the fly. By
doing so, KFS addresses the difficulties found in traditional file systems
designed to handle a specific set of requirements and assumptions about file
characteristics, expected workload, and usage and failure patterns. KFS

also includes meta-data snapshotting, allowing it to have          K42’s threading model allows the kernel to be preempted
the properties of a journalled file system with much lower          at any point. This provides for low-latency interrupt han-
performance overhead. KFS induces less overhead than               dling, and requires pinning of only kernel code and data
a write-ahead journalling file system, and scales better as         of low-level objects. Reducing the required pinned mem-
the number of clients and file system operations grows.             ory potentially reduces the kernel’s footprint and provides
KFS has also been implemented successfully in Linux.               more physical memory for applications.
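
The idea that each file has its own service implementation, replaceable on the
fly, can be made concrete with a minimal sketch of the exclusive-to-shared
file case described under Hot Swapping. It is written in Python for brevity
rather than in K42's C++, and all names (FileHandle, ExclusiveFile,
SharedFile, hot_swap, the dictionary standing in for a file server) are
hypothetical. A single lock stands in for whatever mechanism the real system
uses to wait for in-flight calls to drain; that simplification is an
assumption of the sketch, not a description of K42.

```python
import threading


class ExclusiveFile:
    """Used while one application owns the file: control info is kept locally."""

    def __init__(self, length=0):
        self.length = length

    def append(self, nbytes):
        self.length += nbytes
        return self.length

    def export_state(self):
        return self.length


class SharedFile:
    """Used once the file is shared: control info lives in a stand-in server."""

    def __init__(self, server, name, length):
        self.server = server
        self.name = name
        server[name] = length  # hand the current state to the server

    def append(self, nbytes):
        self.server[self.name] += nbytes
        return self.server[self.name]

    def export_state(self):
        return self.server[self.name]


class FileHandle:
    """Stable reference held by clients; the object behind it can be replaced."""

    def __init__(self, impl):
        self._impl = impl
        self._lock = threading.Lock()

    def append(self, nbytes):
        with self._lock:
            return self._impl.append(nbytes)

    def hot_swap(self, make_new_impl):
        # Holding the lock means no call is in flight on the old object, so
        # its state can be transferred before the new object takes over.
        with self._lock:
            state = self._impl.export_state()
            self._impl = make_new_impl(state)

    def impl_name(self):
        return type(self._impl).__name__


server = {}                                  # stand-in for the file server
handle = FileHandle(ExclusiveFile())
first = handle.append(100)                   # handled locally
handle.hot_swap(lambda length: SharedFile(server, "log.txt", length))
second = handle.append(50)                   # now handled through the server
```

Clients keep using the same handle before and after the swap; only the object
behind it changes, which is what lets a policy be replaced for one file
without affecting any other.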
   Comprehensive Scheduling: We are developing a
scheduling infrastructure that can provide quality-of-
service guarantees for processors, memory, and I/O, and            3    Experiences
that simultaneously supports real-time, gang-scheduled,
regular time-shared, and background work. K42 uses                 In this section we describe our experiences with K42 as
synchronized clocks (hardware or software) on different            well as those of our collaborators. We classify our expe-
processors to allow work to be scheduled simultaneously            riences into the following general topics: overall object-
for short periods of time on multiple processors, without          oriented design; modularity for maintainability, new ap-
the need for global synchronization. The ability to sup-           plications, and new hardware; development and collab-
port fine-grained gang-scheduled applications can sim-              orator model; user-level implementation; and version is-
plify parallel programming tasks, and the ability to run           sues.
mixes of all classes of jobs allows developers to develop             One of the attractions of K42 is that it provides a ve-
and test in environment similar to the final production en-         hicle with which to perform rapid prototyping of ideas.
vironment.                                                         For example, one could use it to determine whether a new
   Lock Avoidance: Traditionally, the error of using a             memory management policy shows promise. If it does,
stale pointer to deleted storage is avoided by existence           then there is a stronger motivation for taking the time to
locks or use counts to protect pointers. Full-scale garbage        implement it in Linux or another system. Because K42 is
collection can also solve this problem, but is not appropri-       well modularized, a prototype implementation of a policy
ate for low-level operating system code. K42 uses an in-           is reasonably easy to program, and potentially able to be
dependently developed mechanism similar to the RCU [?]             implemented and tested by developers without significant
(Read Copy Update) mechanism, in which deletion of                 kernel expertise.
K42 objects is deferred until all currently running threads           Overall Object-Oriented Design:
have finished. This allows a programming style whereby                 K42’s object-oriented model provides a development
an object releases its own lock before making a call on            environment in which it is comparatively easy to bring
another object, thus improving base system performance,            up a system. By developing simple or stubbed-out imple-
increasing scalability, and eliminating the need for com-          mentations behind a full and correct interface, we were
plex lock hierarchies and the resulting complex deadlock           able to quickly bring up K42. This would hold true when
avoidance algorithms. The technique used is related to             moving to a new architecture as well. For example, be-
type safe memory [?], but minimizes the amount of time             hind the interface to the objects that provide a file system
during which the memory is guaranteed to be type safe.             and IP service, we implemented a simple serial protocol to
   Other K42 features: K42 was designed to run on                  an existing machine that could provide the service. Com-
64-bit processors. Designing for 64-bit architecture en-           ponent by component we have replaced these services as
ables pervasive implementation optimizations. Examples             needed. So far this model has worked well. There have
include the use of large virtual arrays rather than hash           been a few glitches that we have had to handle. For ex-
tables, the allocation of memory bits for distinguishing           ample, when we implemented a new variant of the page
classes of allocated memory, and exploiting the fact that          cache object, it would have been more convenient to have
we can atomically manipulate 64-bit quantities efficiently.         had a virtual address rather than physical. Overall how-
K42 is fully preemptable and most of the kernel data               ever, the model has been been a solid success.
structures are pageable. Except for low-level interrupt               One of our concerns about individual instances manag-
handling and code for dispatching real-time applications,          ing each resource in the system was that it might be diffi-

cult to achieve a global state if all the data for achieving
that understanding is scattered throughout many object in-
stances. For example, we have many objects managing
the pages for many small files. However, to effectively
use a working set algorithm, a certain minimum number
of pages is needed; therefore we cannot run such an al-
gorithm on what might be a natural granularity. As an-
other example, it is difficult to select the globally next
highest priority thread when the priorities of threads are
distributed throughout a series of user-level schedulers.
Moving to instances of objects for each resource in the
system has had tremendous advantages, but it has also
opened interesting research issues that we have had to ad-
dress. To date, the difficulty of implementing local poli-
cies that behave as if they had global state has not been
an inhibitor towards achieving good performance. If any-
thing, we have found that changes to a specific object
were more easily made because their effects were local.
This should imply that open-source developers will have
an easier time tuning the performance of a particular com-
ponent because they will not need to gain an understand-
ing of the overall system in order to do so.
   The largest drawback of the object-oriented nature of
the system is the complexity it introduces when trying to
understand the interactions between different components
of the system. Originally, the plan was to avoid imple-
mentation inheritance, but in many places we decided the
benefits outweighed the difficulties introduced. These de-
cisions, in addition to the nature of object-oriented soft-
ware, mean that understanding cross-module interactions
requires wading through significant layers of interface. For
open-source developers familiar with standard Unix oper-
ating system structure, gaining this understanding poses a
barrier to overall system development. We believe, how-
ever, as stated above, that work on individual components
can proceed with less expertise.
   A concern we had involved the assembly code gener-
ated by the compiler for the C++ code of K42. We have
found that a pragmatic approach is needed when using
C++, for example, using larger-grained objects to amor-
tize the overhead, and examining the assembly code gen-
erated to discover performance bugs in the compiler. For
the most part, we have found these aspects of the perfor-
mance to be sufficient, and in a few cases have worked
with g++ developers to fix or understand particular be-
havior. Another consequence of C++ performance is that
because we have avoided compiler-supported multiple in-
heritance, we have had increased object proliferation.
   Currently, we do not have many choices for what ob-
ject to choose to manage a given resource. As the number
of choices increases, we may need to add infrastructure
that tracks each application’s selection, and tracks which
objects perform best for that application. We have begun
work on a continuous program optimization infrastructure
that would enable this.
   Modularity for Maintainability, New Applications,
and New Hardware:
   Our experience leads us to believe that the object-
oriented model will continue to make it simpler for open-
source developers to implement new services, or port
K42 to new architectures. The model allows new ser-
vices, such as large page or segment support, to be added
with minimal impact. There is no duplication of struc-
tures forced by writing to a HAL (Hardware Abstraction
Layer); rather, the machine specific hardware interface
can be at different levels for different architectures.
   As stated above, stubbed interfaces allow quick bring-
up of the system. We have provided unoptimized
architecture-independent code for much of the machine
dependent part of K42. This allows ports to new architec-
tures to proceed quickly, with tuning occurring after the
system is booted and running.
   While we have not had much experience with either
new hardware or new architectures, two examples favor-
ably support our intuition. Both a port to the AMD x86
64-bit platform (through kernel bring-up), and test code
for a theoretical new memory model were incorporated
into K42 relatively quickly by developers who were not
formerly familiar with the system. The developers did,
however, work closely with the core K42 team.
   Development and Collaborator Model:
   One of the tenets behind the object-oriented model was
that developers working on policies not desired by the ma-
jority of users could still contribute code to K42. Only ap-
plications desiring this code would have to invoke it. As
a concrete example, NASA Ames made significant scal-
ability development patches to Linux, but this work did
not make it back into the main kernel because of the con-
cern it would negatively impact the bulk of users. The
desire to make specific changes that are potentially not
advantageous to small workstation or uniprocessor users
is representative of the HPC (High Performance Comput-
ing) community.
   We believe that K42 will be able to incorporate such
efforts into a single code base. There are, however, sig-
nificant hurdles that must be overcome to make this a
reality. We are just beginning to have a significant collab-
orator base outside the core team. A first, and pragmatic,
issue is deciding who will commit code directly to the
K42 base. This has several implications. In addition to
maintaining the code style and, more importantly, overall
programming principles of K42, there are security issues
with allowing anyone to commit source to K42. The cur-
rent plan has been to design and code review the first sev-
eral commits and then allow direct commits with a part-
ner from the core K42 team examining the “diffs”. While
this addresses to some extent the first two issues, it does
not address security concerns. However, as witnessed by
the recent security issues in Linux, maintaining commit
control by a limited group does not solve security issues
either.
   A more fundamental issue involves the testing and ver-
ification of the objects in the system. Unlike projects
such as SPIN [?] or Vino [?], K42 does not place restric-
tions on where code may be replaced, or on the program-
ming model for that code. The model has been to allow
trusted developers to download new kernel objects writ-
ten in C++. Even given the assumption of well-intended
developers, there is still a need to verify that the new ob-
jects match the specifications of the object they are to re-
place. While formal verification models do not handle the
needed generality, a technique that combines verification
and an empirical harness based on specifications with it-
erative testing of the code shows promise.
   User-Level Implementation:
   Implementation in the application’s address space im-
pacts the design of many operating system services. The
implementation is not necessarily more difficult, but it is
different. So far, we have been able to develop imple-
mentations for the user-level services described in the last
section that are as efficient as those of other operating sys-
tems. Moreover, in some cases we have found implemen-
tations that are more efficient. Once again, this model is
different from what an open-source developer is used to
from Linux. Our experience indicates, though, that once
the model is understood, it is advantageous from an open-
source development perspective because it allows finer
control over what is being modified.
   Version Issues:
   We stated earlier that K42 supports a Linux API and
ABI. If a program compiles or runs on PowerPC Linux,
it should run or compile and run on K42. However, that
is for a specific version of glibc, Linux, tool chain, etc.
Dealing with multiple versions of the software that are
among themselves incompatible makes it difficult for K42
to generically support Linux. This problem has been ex-
acerbated until now because we required that collabora-
tors build a development environment which includes the
tool chain. We will soon be moving to a model where
the K42 kernel will be able to be placed onto an exist-
ing Linux box and run. This will significantly reduce the
complexity of the developer’s responsibility of running
K42, but does not reduce the impact of the wide variety of
Linux versions that exist. To run on K42, the application
will first have to run on a particular variant or small set
of variants. Though the above sounds prohibitive, most
applications run across all versions of Linux, glibc, etc.,
and thus will not pose a problem when running on K42.
   In addition to causing difficulties in building and run-
ning K42, the incompatibilities between glibc and Linux
kernel data structures also add complexity and hurt perfor-
mance in the emulation layer. For example, the stat(2)
system call in Linux, invoked from glibc, involves con-
versions between Linux’s struct stat passed in from
the glibc layer and Linux’s struct kstat used in
the kernel implementation. K42 uses glibc’s definition
of struct stat, so the stat(2) operation involves
conversions between glibc’s struct stat, Linux’s
struct stat, and Linux’s struct kstat. As with
the above, most of the incompatibility issues have been
dealt with, so new developers will be able to concentrate
on designing modules targeted to their application needs.
However, developers will always need to be aware of ver-
sion and data structure incompatibility issues. This is true
independent of K42. As with other open-source projects,
because of incompatibilities, it can be challenging for our
collaborators to debug problems that we can not repro-
duce.
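
The stat(2) conversion chain described under Version Issues can be pictured with a minimal C++ sketch. Everything named here is an assumption made for illustration: the three structure layouts are hypothetical stand-ins, not the real glibc or Linux definitions, and the identifiers KernelKstat, LinuxStat, GlibcStat, toLinuxStat, toGlibcStat, and statChain do not come from K42 or Linux. The direction of conversion shown, from the kernel form outward to the application, is also only one plausible reading of the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for Linux's kernel-internal struct kstat.
struct KernelKstat {
    std::uint64_t ino;
    std::uint32_t mode;
    std::int64_t  size;
};

// Hypothetical stand-in for Linux's struct stat (kernel/user boundary).
struct LinuxStat {
    std::uint64_t st_ino;
    std::uint32_t st_mode;
    std::int64_t  st_size;
};

// Hypothetical stand-in for glibc's struct stat (seen by applications).
struct GlibcStat {
    std::uint64_t st_ino;
    std::uint32_t st_mode;
    std::uint32_t pad;      // illustrative layout difference only
    std::int64_t  st_size;
};

// Hop 1: kernel-internal form to the boundary form.
inline LinuxStat toLinuxStat(const KernelKstat& k) {
    assert(k.size >= 0);    // a file size is never negative
    return LinuxStat{k.ino, k.mode, k.size};
}

// Hop 2: boundary form to the form handed to the application.
inline GlibcStat toGlibcStat(const LinuxStat& s) {
    return GlibcStat{s.st_ino, s.st_mode, 0u, s.st_size};
}

// Full chain: every stat call pays for both field-by-field copies.
inline GlibcStat statChain(const KernelKstat& k) {
    return toGlibcStat(toLinuxStat(k));
}
```

Each hop copies every field, so a layout that differs between versions costs both extra code and extra cycles on every call, which is the kind of overhead the text attributes to the emulation layer.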

4 Current Directions and Status

4.1 Current Directions

A large part of the early effort on K42 was to design for,
and show good performance on, a scalable multiproces-
sor. This continues to be an important direction, and we
make sure that new implementations of services are con-
sistent with the scalable K42 model.
   A recent initiative has been to engage more groups to
use K42 as a prototyping platform in order to gather feed-
back from the community about its advantages and disad-
vantages. As we stated in the experiences section, using
K42 allows developers to test out new ideas quickly and
ascertain their value for inclusion into Linux, or other op-
erating systems. Some of the technology originally devel-
oped for K42 has been transferred to Linux [?, ?, ?], and
because of the ability to easily prototype ideas, we believe
this will continue. In addition, as more useful policies
are implemented in K42, it will gain a wider user base
and work for more applications. K42 is currently the pro-
totyping environment being used for the DARPA HPCS
IBM PERCS project [?], and has been useful in that role.
   Hot swapping began as an initiative to support auto-
nomic computing, allowing the operating system to swap
in new components as needed for performance, or to be
upgraded while remaining available. Early work in that
area demonstrated the usefulness of swapping individual
objects for performance reasons. Based on that positive
experience, current work is underway to extend K42’s hot
swapping capability to allow the system to switch all in-
stances of a given class, thus allowing for dynamic up-
grade.
   To decide what instances of which objects to switch,
K42 must have performance monitoring data. However,
just monitoring the kernel may not provide the best data
on which to make a hot-swapping decision. We are partic-
ipating in a broad continuous optimization effort with the
goal of monitoring the entire system, from the hardware
counters and low-level firmware and software, up through
the operating system, compiler, runtime and middleware,
to the application. Code running across the system will
examine this data and make recommendations back to the
various layers, including the operating system. This feed-
back, for example, might be to use large pages for a given
region. This work is being prototyped on K42, with po-
tential transfer to other operating systems if it is success-
ful.

[Figure 2 plot: throughput in scripts/hour (thousands) versus number of processors.]
Figure 2: Results of a benchmark based on SDET running
on K42 on a 24-way multiprocessor

   The file system is another area of active research. KFS
has been shown to be highly adaptable. It follows an
object-oriented design, with each element in the file sys-
tem being represented by a different set of objects. Devel-
opers can add new implementations to address their spe-
cific needs without affecting the performance and func-
tional behavior of other clients. KFS also has been imple-
mented on Linux with performance similar to ext2, while
providing the journalling capabilities of ext3. Work is on-
going in KFS to explore its flexible design beyond perfor-
mance gains. Our current goal is to evaluate how much
KFS’s flexibility simplifies the support of object-based
storage.
   We are examining and optimizing significant subsys-
tems such as a database or a JVM (Java Virtual Machine).
Work has begun to get these applications running on K42
and to understand the areas in which K42’s customizabil-
ity could most help improve their performance.

4.2    Status

We have made considerable progress towards making
K42 a real system capable of supporting large applica-
tions. K42 runs on PowerPC 64-bit platforms including
POWER3, POWER4, POWER4+, Power Mac G5, and
Apple G5 Xserve hardware, and several different simu-
lated PowerPC hardware platforms. We have run a next
generation JVM, a benchmark based on SPEC SDET,
Apache, mySQL, various scientific applications such as
umt2k, and an ASCII nuclear transport simulation appli-
cation. We are working to be fully self-hosting and soon
hope to be running our development environment on K42.
K42 is being used by several university collaborators and
by multiple groups inside IBM.
   Other papers have focused on the performance aspects
of K42. While that is not the focus of this paper, we
highlight two aspects that have been and continue to be
a focus, namely scalability and hot swapping. Details of
these experiments can be found in our Freenix [?] and
USENIX [?] papers, respectively.
   Figure 2 shows the results from an experiment based on
SPEC SDET that models the behavior of n simultaneous
Unix users. K42 scales well through 24 processors where
its peak of 33808.1 scripts/hour is achieved, yielding an
efficiency of 89.4 percent. The results demonstrate the
effectiveness of K42’s scaling.
   Figure 3 shows the performance of a 1-way experiment
based on SDET in the face of competing streaming ap-
plications, all running over NFS. The higher throughput
of the top line is achieved when hot swapping is enabled,
allowing the detection of a streaming file access pattern to
trigger dynamic modification of the page management policy
for that file. The streaming application achieves the same
performance while the other users in the system, as repre-
sented by SDET scripts, achieve better throughput.

[Figure 3 plot: 1-way SDET throughput (scripts/hour) versus number of concurrent background streams, for the Default FCM and the Adaptive FCM.]
Figure 3: Result of K42 hot swapping page management
strategies to maintain SDET performance in light of n
streaming applications.

5     Conclusions

We have outlined the structure and core technology of
K42, and described our and our collaborators’ experi-
ences with it, and its ability to serve as an open-source
development platform. Its modular structure makes it a
valuable teaching, research, and prototyping vehicle, and
we expect that the policies and implementations studied in
this framework will continue to be transferred into Linux
and other systems. In the longer term we believe the fun-
damental technologies we are studying will be important.
   Our system is available open source under an LGPL
license. Please see our home page
(http://www.research.ibm.com/K42) for additional papers on
K42 or to participate in this research project.

Acknowledgments

A kernel development project is a huge undertaking and
without the efforts of many people, K42 would not be
in the state it is today. In addition to the authors, the
following people have contributed to K42 and we appre-
ciate their work: Andrew Baumann, Michael Britvan,
Chris Colohan, Phillipe DeBacker, Khaled Elmeleegy,
David Edelsohn, Hubertus Franke, Ben Gamsa, Garth
Goodson, Kevin Hui, Craig MacDonald, Michael Peter,
Jim Peterson, Eduardo Pinheiro, Rick Simpson, Livio
Soares, Craig Soules, David Tam, Manu Thambi, Nathan
Thomas, Gerard Tse, Timothy Vail, and Amos Water-
land.