Scale and Performance in the Denali Isolation Kernel


									 Andrew Whitaker, Marianne Shaw, and Steven D. Gribble

Presented By
 Steve Rizor
   The Denali isolation kernel is an operating system architecture designed to safely multiplex
    a large number of internet services on shared hardware

   Allows new services to be “pushed” onto third-party infrastructures, relieving authors from
    the burden of maintaining physical infrastructure

   Exposes a virtual machine abstraction but does not attempt to emulate the underlying
    hardware precisely

   Modifies the virtual architecture to gain scale, performance, and simplicity of
    implementation

With the proliferation of Internet services comes the need for more hardware, but one
machine per service is usually highly inefficient.

A large fraction of web services are infrequently accessed, while a small fraction is
frequently accessed.

Why not virtualize all of the infrequently accessed services?

If one machine can handle 10,000 requests per hour for one service, why can’t one machine
handle 1 request per hour for each of 10,000 services?

Making a Case for Isolation Kernels

   Many services can already run on one machine, but they also need to be secure from
    one another
     Isolation lets many services run side by side without the ability to affect
      one another
     This enables new or untrusted services to be pushed without the worry of harming
      other services
     It also provides an interesting experimentation infrastructure: wide-area testbeds
      for network research, with thousands of running subjects but without thousands of
      physical machines
    Isolation Kernel Design Principles
            An isolation kernel is a small-kernel operating system architecture targeted
             at hosting multiple untrusted applications that require little data sharing.

1. Expose low-level resources rather than high-level abstractions.
        •   High-level abstractions entail significant complexity and typically have a wide API,
            violating the security principle of economy of mechanism. They also invite “layer below”
            attacks, in which an attacker gains unauthorized access to a resource by requesting it
            below the layer of enforcement

2. Prevent direct sharing by exposing only private, virtualized namespaces.

        •   Little direct sharing is needed across Internet services, and therefore an isolation kernel
            should prevent direct sharing by confining each application to a private namespace.
            Memory pages, disk blocks, and all other resources should be virtualized, eliminating
            the need for a complex access control policy: the only sharing allowed is through the
            virtual network.

3. Scalability.

         •   An isolation kernel designed for internet services must be able to scale up into the
             thousands on a single machine. As such, the memory footprint (including the kernel
             metadata) must be minimized. Since the set of all unpopular services won’t fit in
             memory, the kernel must treat memory as a cache of popular services, swapping
             inactive services to disk. This cache will have a poor hit rate, so swapping
             must be rapid to reduce the cache miss penalty.

4. Modify the virtualized architecture for simplicity, scale, and performance.

         •   VMMs such as Disco adhere to the first two principles. They also strive to support
             legacy operating systems by precisely emulating the physical hardware. In this case,
             however, deviating from the underlying physical hardware can enhance performance,
             simplicity, and scalability. The drawback is that doing so removes support for
             unmodified legacy operating systems.
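
The "memory as a cache of popular services" idea from principle 3 can be sketched with a toy model. This is only an illustration: the slides do not say which replacement policy Denali uses, so least-recently-used eviction is an assumption here, and the names ServiceCache, capacity, and access are hypothetical.

```python
from collections import OrderedDict


class ServiceCache:
    """Toy model: physical memory as a cache of resident services.

    Assumption: least-recently-used eviction. The real policy is not
    stated in the source.
    """

    def __init__(self, capacity):
        self.capacity = capacity        # number of services that fit in memory
        self.resident = OrderedDict()   # service id -> None, least recent first
        self.hits = 0
        self.misses = 0

    def access(self, service_id):
        """Record a request for a service; return True if it was resident."""
        if service_id in self.resident:
            self.resident.move_to_end(service_id)   # mark as most recent
            self.hits += 1
            return True
        self.misses += 1
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)       # swap the coldest one out
        self.resident[service_id] = None            # swap the requested one in
        return False


cache = ServiceCache(capacity=2)
for sid in ["a", "b", "a", "c", "b"]:
    cache.access(sid)
print(cache.hits, cache.misses, list(cache.resident))
```

Each miss in this model stands for a swap-in from disk, which is why a poor hit rate makes fast swapping important.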
Denali Isolation Kernel
While the Denali isolation kernel looks like a standard VMM, its virtual machine
interface is quite different from most others.

The Denali virtual instruction set is a subset of x86, so that most virtual instructions execute
directly on the physical processor. x86 VMMs normally have to use binary rewriting and
memory protection techniques to virtualize some of the instructions. Since Denali does not
support legacy operating systems, those instructions are simply defined to have ambiguous
semantics. At worst, a VM that executes one will harm only itself. In practice, such
instructions are rarely used, and none are emitted by C compilers such as gcc.

The instruction set also adds an “idle-with-timeout” instruction that relinquishes control to
another VM instead of using time in an idle loop, an instruction to terminate the VM, and
several virtual registers revealing information about the system.
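
A rough way to see why yielding beats spinning is to count scheduling quanta in a toy round-robin model. This sketch is not taken from Denali: the function name and parameters are hypothetical, one VM has real work while the rest are idle, and a yielding VM is assumed to cost zero quanta.

```python
def quanta_until_done(num_vms, work_quanta, idle_vms_yield):
    """Round-robin over num_vms VMs; VM 0 needs work_quanta quanta of CPU.

    The other VMs have nothing to do. If idle_vms_yield is True they give
    up the CPU at once (modelled as costing zero quanta); otherwise each
    one spins through a full quantum in its idle loop.
    Returns the total quanta elapsed when VM 0 finishes.
    """
    elapsed = 0
    remaining = work_quanta
    while remaining > 0:
        for vm in range(num_vms):
            if vm == 0:
                elapsed += 1            # the busy VM uses its quantum
                remaining -= 1
                if remaining == 0:
                    return elapsed
            elif not idle_vms_yield:
                elapsed += 1            # an idle VM wastes a quantum spinning
    return elapsed


spinning = quanta_until_done(4, 3, idle_vms_yield=False)
yielding = quanta_until_done(4, 3, idle_vms_yield=True)
print(spinning, yielding)
```

In this model the cost of spinning grows with the number of idle VMs, which is the situation an isolation kernel hosting many rarely used services would be in.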
Denali Isolation Kernel
   Denali’s virtual machine interface is also different in that the emulated hardware
    is not a representation of the physical system:
       By keeping the emulated devices static, there is no need to poll for hardware.
       Keeping the devices simple reduces the number of programmed I/O instructions used to transmit
        or receive a single packet.

   Denali uses round-robin scheduling across all the active VMs (those with active
    threads) and uses a buffered interrupt scheme to prevent thrashing
       VMs that voluntarily give up time via the “idle-with-timeout” instruction are given priority once
        the timeout has expired

   Each Denali VM is given its own (virtualized) physical 32-bit address space.
       A VM may only access a subset of this 32-bit address space, the size and range of which is chosen by
        the isolation kernel when the VM is instantiated. The kernel itself is mapped into a portion of the address
        space that the VM cannot access; because of this, Denali can avoid physical TLB flushes on VM/VMM
        crossings.
       Virtual registers are stored in a page at the beginning of a VM's (virtual) physical address space. This
        page is shared between the VM and the isolation kernel, avoiding the overhead of kernel traps for
        register modifications. In other respects, the virtual registers behave like normal memory (for example,
        they can be paged out to disk).
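
The scheduling behavior described above (round-robin over active VMs, priority after an idle timeout, and buffered interrupts) can be sketched as follows. This is a minimal model under stated assumptions, not Denali's implementation: the VM and Scheduler classes, the one-tick-per-decision clock, and the choice to deliver buffered interrupts when a VM is next scheduled are all illustrative.

```python
from collections import deque


class VM:
    def __init__(self, name):
        self.name = name
        self.pending_interrupts = []   # buffered until the VM next runs
        self.wake_at = None            # tick at which an idle timeout expires


class Scheduler:
    def __init__(self, vms):
        self.run_queue = deque(vms)    # round-robin order of active VMs
        self.sleeping = []             # VMs that executed idle-with-timeout
        self.now = 0                   # hypothetical clock, one tick per pick

    def post_interrupt(self, vm, irq):
        """Buffer an interrupt instead of switching to the VM immediately."""
        vm.pending_interrupts.append(irq)

    def idle_with_timeout(self, vm, ticks):
        """Take a VM off the run queue until its timeout expires."""
        self.run_queue.remove(vm)
        vm.wake_at = self.now + ticks
        self.sleeping.append(vm)

    def pick_next(self):
        """Advance one tick; return (vm, delivered interrupts) or None."""
        self.now += 1
        # VMs whose timeout has finished get priority: front of the queue.
        for vm in [v for v in self.sleeping if v.wake_at <= self.now]:
            self.sleeping.remove(vm)
            vm.wake_at = None
            self.run_queue.appendleft(vm)
        if not self.run_queue:
            return None
        vm = self.run_queue.popleft()
        self.run_queue.append(vm)      # rotate to the back of the queue
        delivered, vm.pending_interrupts = vm.pending_interrupts, []
        return vm, delivered


a, b, c = VM("a"), VM("b"), VM("c")
sched = Scheduler([a, b, c])
sched.idle_with_timeout(c, ticks=2)
sched.post_interrupt(b, "net")
sched.post_interrupt(b, "timer")
trace = [sched.pick_next() for _ in range(3)]
print([(vm.name, irqs) for vm, irqs in trace])
```

Delivering all of a VM's buffered interrupts in one batch means an interrupt alone never forces a context switch, which is one plausible reading of how buffering prevents thrashing.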
  For testing, since a standard operating system must be modified to run on the Denali
  isolation kernel, a small guest OS named Ilwaco was developed against the virtual
  machine interface.

Because of the simplification of the virtual network device, fewer programmed I/O instructions
are needed per packet. However, Denali still needs a user/kernel switch where BSD does not.
Adding a system call per packet to BSD (forcing the same user/kernel switch) brings BSD
performance more into line with Denali.

Buffering interrupt requests yields a clear performance gain.
Note the performance hit around 800 VMs due to memory demands and
excessive paging.

Using the new idle-with-timeout instruction, there is a large performance gain over
normal OS idle loops.

Even with 800 virtual machines running, there is still substantial throughput.

The effects of paging are clear: with a larger amount of memory, the cliff can be
pushed further out.

Running the Quake II Linux server on Denali, it is apparent that even with
30 servers (4 clients each), there is no change in latency or reliability. The
scheduling algorithm, combined with the idle-with-timeout instruction and
the buffered interrupts, keeps the servers running without issues.
   Andrew Whitaker, Marianne Shaw, and Steven D. Gribble, “Scale and Performance in
    the Denali Isolation Kernel”, OSDI’02.
