OpenSolaris Virtualization Technologies




David Edmondson
Solaris Engineering
Agenda
•   Virtualization Overview
•   Zones
•   BrandZ
•   Xen
•   CrossBow
•   Q&A
The Need for Virtualization
• Driven by the need to consolidate multiple hosts
  and services on a single machine
• Leads to...
  > Increased hardware utilization (average data center
    utilization is currently below 15%)
  > Greater flexibility in resource allocation
  > Reduced power requirements
  > Lower management costs
  > Lower total cost of ownership
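Types of Virtualization
• Hard Partitions: Dynamic System Domains
• Virtual Machines: Logical Domains, Xen
• OS Virtualization: Solaris Containers (Zones + SRM),
  BrandZ, CrossBow
• Resource Management: Solaris Resource Manager (SRM)
• Hard partitions and virtual machines run multiple OSes;
  OS virtualization and resource management share a single OS
• Isolation is strongest at the hard-partition end; flexibility
  is greatest at the resource-management end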
Solaris Zones
• Basic concept: isolated execution environment
  within a Solaris instance
• Virtualizes OS layer: file system, devices, network,
  processes
• Provides:
  > Privacy: can't see outside zone
  > Security: can't affect activity outside zone
  > Failure isolation: application failure in one zone doesn't
    affect others
• Lightweight, granular, efficient
• No porting for most apps; ABI/APIs are the same
Typical Uses for Zones
• Consolidating data center workloads, such as
  multiple databases
• Deploying multiple-tier application stacks
• Hosting untrusted or hostile applications or those
  that require global resources like IP port space
• Deploying Internet facing services
• Software development
Zones Block Diagram
(figure: one Solaris system hosting three non-global zones; each zone
is an application environment running on a virtual platform)
• global zone (serviceprovider.com)
  > zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...)
  > core services (LDAP, inetd, rpcbind, automountd, snmpd, dtlogin,
    Sendmail 8.13.1, Sun SSH)
  > remote administration (SNMP, SunMC, WBEM, ...)
  > platform administration (syseventd, devfsadm, ...)
  > one zoneadmd per non-global zone
  > network devices ce0 and ge0, storage complex
• twilight zone (twilight.com), zone root /zone/twilight
  > web services (WS 6.1, J2SE 5.0)
  > enterprise services (Oracle 10g, AS 8.1EE)
  > core services (NIS, inetd, automountd)
• drop zone (drop.net), zone root /aux0/drop
  > login services (OpenSSH 3.4)
  > network services (BIND 8.3, Sendmail 8.13.1)
  > core services (NIS+, inetd, rpcbind)
• fracture zone (fracture.org), zone root /export/fracture
  > web services (Apache 2.0.52, J2SE 1.4)
  > network services (BIND 9.2.4, Postfix 2.1)
  > core services (DNS, inetd, automountd)
• Each zone's virtual platform includes a zone console (zcons),
  logical network interfaces on ge0 and ce0 (ge0:1, ge0:2, ce0:1,
  ce0:2), and file systems such as /usr and /opt
Zones Security
• Each zone has a security boundary around it
• Runs with a subset of privileges(5)
• A compromised zone is unable to escalate its
  privileges
• Important name spaces are isolated
• BSM auditing can be configured globally or on a
  per-zone basis
• Processes running in a zone are unable to affect
  activity in other zones
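The “no escalation” property can be pictured as set arithmetic: a process inside a zone never holds a privilege outside the zone's limit set, whatever it asks for. A minimal Python sketch, assuming a hypothetical limit set `ZONE_LIMIT`; the names are real privileges(5) names, but this membership is illustrative, not the actual default zone set.

```python
# Illustrative model only: a process in a non-global zone can never hold a
# privilege outside the zone's limit set, so a compromised zone cannot
# escalate beyond that set.

# Hypothetical limit set for one zone. The names are real privileges(5)
# names, but this membership is an assumption, not the actual default.
ZONE_LIMIT = frozenset({"proc_fork", "proc_exec", "file_dac_read", "net_privaddr"})


def effective_privileges(requested, zone_limit=ZONE_LIMIT):
    """Return the subset of the requested privileges the zone allows."""
    return frozenset(requested) & zone_limit


def denied_privileges(requested, zone_limit=ZONE_LIMIT):
    """Return the requested privileges the zone will not grant."""
    return frozenset(requested) - zone_limit


if __name__ == "__main__":
    asked = {"proc_fork", "sys_time", "sys_config"}
    print(sorted(effective_privileges(asked)))  # ['proc_fork']
    print(sorted(denied_privileges(asked)))     # ['sys_config', 'sys_time']
```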
Zones Processes
• Certain system calls are not permitted or have
  restricted scope inside a zone
• From the global zone, all processes can be seen,
  but control is privileged
• From within a zone, only processes in the same
  zone can be seen or affected
• The /proc file system has been virtualized to only
  show processes in the same zone
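A minimal Python sketch of the visibility rule, assuming hypothetical process data: the global zone (zone ID 0) sees every process, while a non-global zone sees only its own. In Solaris this filtering happens in the kernel and in the virtualized /proc, not in a user-level helper like the hypothetical `visible()` below.

```python
from dataclasses import dataclass

GLOBAL_ZONEID = 0  # the global zone's ID in Solaris


@dataclass(frozen=True)
class Proc:
    pid: int
    zoneid: int
    comm: str


def visible(procs, viewer_zoneid):
    """Return the processes an observer in viewer_zoneid can see."""
    if viewer_zoneid == GLOBAL_ZONEID:
        return list(procs)  # the global zone sees everything
    return [p for p in procs if p.zoneid == viewer_zoneid]


def can_signal(sender_zoneid, target):
    """Non-global zones may only affect processes in their own zone.
    (Control from the global zone is additionally privilege-checked.)"""
    return sender_zoneid == GLOBAL_ZONEID or sender_zoneid == target.zoneid


if __name__ == "__main__":
    procs = [Proc(1, 0, "init"), Proc(812, 1, "httpd"), Proc(901, 2, "named")]
    print([p.pid for p in visible(procs, 1)])  # [812]
```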
Zones Networking
• Single TCP/IP instance for the system
  > Shields non-global zones from configuration details
  > Denies zones low-level control: no ARP spoofing and
    no per-zone routing or tuning
• Each zone can be assigned any number of IPv4
  and IPv6 addresses, and each zone has its own
  port space
• Applications can bind to INADDR_ANY and will only
  get packets addressed to their own zone
• A zone cannot send packets with source addresses
  other than those assigned to it
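The addressing rules above amount to a two-step demultiplex: the destination address selects the zone, then the port is looked up in that zone's own port space. A toy Python model under that assumption; the class, the zone addresses (from the 192.0.2.0/24 documentation range), and the listener names are hypothetical.

```python
import ipaddress


class SharedIPStack:
    """Toy model of one TCP/IP instance shared by several zones."""

    def __init__(self):
        self.addr_zone = {}   # assigned address -> owning zone
        self.listeners = {}   # (zone, port) -> listener; one port space per zone

    def assign(self, zone, addr):
        self.addr_zone[ipaddress.ip_address(addr)] = zone

    def bind_any(self, zone, port, listener):
        """Bind to INADDR_ANY: claims the port only within this zone."""
        if (zone, port) in self.listeners:
            raise OSError(f"port {port} already bound in zone {zone}")
        self.listeners[(zone, port)] = listener

    def deliver(self, dst_addr, port):
        """Find the listener for an inbound packet, or None."""
        zone = self.addr_zone.get(ipaddress.ip_address(dst_addr))
        return self.listeners.get((zone, port)) if zone is not None else None

    def may_send(self, zone, src_addr):
        """A zone may only use its own addresses as the source."""
        return self.addr_zone.get(ipaddress.ip_address(src_addr)) == zone


if __name__ == "__main__":
    stack = SharedIPStack()
    stack.assign("twilight", "192.0.2.10")
    stack.assign("fracture", "192.0.2.20")
    stack.bind_any("twilight", 80, "twilight-web")
    stack.bind_any("fracture", 80, "fracture-web")  # same port, different zone
    print(stack.deliver("192.0.2.20", 80))           # fracture-web
```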
Zones and Resource Pools
• Multiple zones can share a resource pool, or, to
  meet service guarantees, a single zone can be
  bound to a specific pool
• CPUs can be partitioned with arbitrary granularity
  using the fair share scheduler
• By default, all zones, including the global one, have
  one fair share scheduler share assigned to them
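A rough Python sketch of what the share counts mean: within a pool, a zone's steady-state entitlement is its shares divided by the total shares of the zones that currently want CPU. The share values are hypothetical, and the real fair share scheduler reaches this split by adjusting thread priorities over time, which is not modeled here.

```python
from fractions import Fraction


def fss_entitlement(shares, active, ncpus):
    """CPU entitlement (in CPUs) of each zone competing in one pool.

    shares: zone -> FSS shares; active: zones with runnable work.
    Idle zones do not count, so their shares are redistributed.
    """
    total = sum(shares[z] for z in active)
    if total == 0:
        return {z: Fraction(0) for z in active}
    return {z: Fraction(shares[z] * ncpus, total) for z in active}


if __name__ == "__main__":
    shares = {"global": 1, "drop": 1, "fracture": 2}  # hypothetical values
    print(fss_entitlement(shares, ["global", "drop", "fracture"], 4))
    print(fss_entitlement(shares, ["drop", "fracture"], 4))
```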
Zones and Resource Pools Example
(figure: zones bound to resource pools)
• Pool whirl (cpu1-cpu4): zones drop and fracture
• Pool tide (cpu5, cpu6): zone twilight
• Pool default (cpu7, cpu8): global zone
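For the example above, assuming every zone is busy and each has the default single share, a small Python calculation gives the CPUs each zone can use: a zone runs only on its pool's CPUs, and zones sharing a pool split it by shares. The pool sizes and bindings come from the figure; the function itself is illustrative.

```python
# Pools and bindings taken from the example (CPU count per pool).
POOL_CPUS = {"whirl": 4, "tide": 2, "default": 2}
ZONE_POOL = {"drop": "whirl", "fracture": "whirl",
             "twilight": "tide", "global": "default"}


def zone_cpu_capacity(pool_cpus, zone_pool, shares=None):
    """CPUs available to each zone when every zone is busy.

    A zone can only run on its pool's CPUs; zones sharing a pool split
    it in proportion to their FSS shares (default: 1 share each).
    """
    shares = shares or {}
    capacity = {}
    for pool, ncpus in pool_cpus.items():
        members = [z for z, p in zone_pool.items() if p == pool]
        total = sum(shares.get(z, 1) for z in members)
        for z in members:
            capacity[z] = ncpus * shares.get(z, 1) / total if total else 0.0
    return capacity


if __name__ == "__main__":
    print(zone_cpu_capacity(POOL_CPUS, ZONE_POOL))
    print(zone_cpu_capacity(POOL_CPUS, ZONE_POOL, {"fracture": 3}))
```

Raising fracture to three shares only shifts CPU away from drop, its pool partner; twilight and the global zone are unaffected because they run on other pools.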
Zones Observability
• From the global zone, process tools like prstat(1M),
  ps(1), and truss(1) can be used to observe
  processes in other zones
• DTrace can be used from the global zone and
  supports a “zonename” variable as well as
  psinfo_t's pr_zoneid field for use with the proc
  provider, e.g.
 global# dtrace -n 'io:::start{@[zonename] = count()}'

• A subset of the DTrace functionality (non-kernel
  probes) is available from within non-global zones
Zones Configuration & Installation
• zonecfg(1M) is used to configure a zone by
  specifying resources (file systems, network
  interfaces, etc.) and properties (zone name, file
  system directory path, etc.)
• zoneadm(1M) is used to administer zones (boot,
  install, etc.)
• Each zone is assigned its own file system,
  constructed by copying packaged files from the
  global zone
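A configuration can also be scripted. The Python sketch below builds a command file for `zonecfg -z <zone> -f <file>` from standard zonecfg commands (`create`, `set zonepath`, `add net`, `add fs`, `verify`, `commit`). The helper name and the address 192.0.2.10 are hypothetical; the zone name, zone root, and `ge0` come from the block diagram.

```python
def zonecfg_commands(zonepath, nets=(), lofs=()):
    """Build a command file for `zonecfg -z <zone> -f <file>`.

    nets: (address, physical interface) pairs.
    lofs: (mount point in the zone, global-zone directory) pairs.
    """
    cmds = ["create", f"set zonepath={zonepath}", "set autoboot=true"]
    for address, physical in nets:
        cmds += ["add net", f"set address={address}",
                 f"set physical={physical}", "end"]
    for directory, special in lofs:
        cmds += ["add fs", f"set dir={directory}", f"set special={special}",
                 "set type=lofs", "end"]
    cmds += ["verify", "commit"]
    return "\n".join(cmds) + "\n"


if __name__ == "__main__":
    print(zonecfg_commands("/zone/twilight", nets=[("192.0.2.10", "ge0")]), end="")
```

Saved to a file (say, `twilight.cfg`), the output could be applied with `zonecfg -z twilight -f twilight.cfg`, followed by `zoneadm -z twilight install` and `zoneadm -z twilight boot`.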
Zones Status (1)
• Initially supported in Solaris 10
• Available today through OpenSolaris
  > Faster provisioning with ZFS
  > Configurable privileges on a per non-global zone basis
  > Full boot argument support
  > Zone migration, local (enables zones to be moved and
    renamed) and remote (using a new attach/detach facility)
  > New resource controls
Zones Status (2)
• Coming to OpenSolaris soon
  > Tighter integration between resource management and
    zones
  > Integration with Live Upgrade
  > DTrace within a non-global zone
  > Enhancements to resource management (memory sets,
    swap sets, networking through CrossBow)
BrandZ: Branded Zones
• Extends Zones model to support “non-native” zones
  on a Solaris system
  > Only supports user-space environments
  > If you need a different kernel, see Xen
• Each distinct zone type is called a Brand
• Possible uses:
  > A Linux zone
  > A Solaris GNU zone (Nexenta/SchilliX/BeleniX)
  > Support for Solaris N-1 on Solaris N
  > A Mac OS X zone
Linux Zones
• Users expressed interest in running:
  > Matlab, Maple, EDA, Cadence, Acroread
  > Home-grown and open source apps
• 59% intend to deploy mission-critical applications
• Most are willing to accept a 5-10% performance hit
• Most want a Red Hat Enterprise Linux-based system
  > SuSE a distant second
The lx Brand
• Implements Solaris Containers for Linux Applications
• Enables Linux binaries to run unmodified on Solaris
• Creates a zone for Linux application execution
  > Zone is populated only with Linux software
  > At boot, it runs the Linux init(8), configuration scripts, and
    applications
  > It all runs on a Solaris kernel
• There is no Linux software delivered with BrandZ
  > This is not a Linux distro and we do not include our own
    special Linux software, we install and run standard Linux
    distributions
BrandZ Use Cases
• As a transition tool, reducing the Linux “barrier to exit”
  > Customer would like to move to Solaris, but has legacy
    Linux applications
• Best of both worlds
  > Users familiar with Linux environment
  > Administrators want Solaris' enterprise-class features:
    resource management, fault management, DTrace
• Developer/ISV workload
  > Solaris has strong development tools; let Linux developers
    leverage them
  > We want Solaris to be a better Linux development platform
    than Linux.
What BrandZ is Not
• Not a full system emulator or virtualization layer
  > No non-Solaris kernel code is ever executed.
  > You can't run an arbitrary Linux distribution.
• Doesn't support all Linux kernel functionality.
  > No support for Linux file systems, kernel modules, or
    device drivers.
  > Not all system calls are fully supported.
• Not simply binary emulation (like lxrun, wine, etc.)
  > You can't just run the Linux version of acroread from
    your Solaris shell prompt.
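The lx brand intercepts a Linux process's system calls and emulates each one on top of Solaris. A minimal Python sketch of that translation layer as a lookup table, assuming Linux i386 system-call numbers; `FakeSolaris`, the handlers, and returning ENOSYS for unknown calls are illustrative assumptions, not the lx brand's actual code.

```python
# Linux i386 system call numbers for a few calls.
LINUX_SYS_WRITE = 4
LINUX_SYS_GETPID = 20
LINUX_ENOSYS = 38  # Linux errno value for "function not implemented"


class FakeSolaris:
    """Hypothetical stand-in for the native services the emulation calls."""

    def getpid(self):
        return 4242

    def write(self, fd, data):
        return len(data)


def build_table(native):
    """Map Linux syscall numbers to emulation routines."""
    return {
        LINUX_SYS_GETPID: lambda: native.getpid(),
        LINUX_SYS_WRITE: lambda fd, buf, count: native.write(fd, buf[:count]),
    }


def linux_syscall(table, number, *args):
    """Dispatch one Linux system call; unsupported calls fail with ENOSYS.

    As in the Linux kernel ABI, errors come back as a negative errno.
    """
    handler = table.get(number)
    if handler is None:
        return -LINUX_ENOSYS
    return handler(*args)


if __name__ == "__main__":
    table = build_table(FakeSolaris())
    print(linux_syscall(table, LINUX_SYS_GETPID))
    print(linux_syscall(table, 9999))
```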
Devices in a Linux Zone
• The zones approach makes it easy to control which
  devices are accessible to Linux apps
• Initially only supporting minimum needed:
  > /dev/null, /dev/zero, /dev/ptmx, /dev/pts/*, /dev/tty,
    /dev/console, /dev/random, /dev/urandom
  > OSS audio devices (good for Quake)
• No network or disk devices
  > Network plumbing done by the global zone
     > Status is collected from /proc/net and ioctl()s on sockets
  > Filesystems mounted by the global zone
• No support for framebuffers (bad for Quake)
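Device access in a Linux zone is effectively a whitelist. A small Python sketch of the rule using the device list above (OSS audio omitted). In a real zone the restriction comes from which device nodes exist in the zone's /dev, not from a runtime check like the hypothetical `device_allowed()`.

```python
from fnmatch import fnmatchcase

# Device nodes from the list above (OSS audio devices omitted).
LINUX_ZONE_DEVICES = (
    "/dev/null", "/dev/zero", "/dev/ptmx", "/dev/pts/*", "/dev/tty",
    "/dev/console", "/dev/random", "/dev/urandom",
)


def device_allowed(path, allowed=LINUX_ZONE_DEVICES):
    """True if path matches an allowed device (glob patterns permitted)."""
    return any(fnmatchcase(path, pattern) for pattern in allowed)


if __name__ == "__main__":
    for dev in ("/dev/pts/3", "/dev/dsk/c0t0d0s0", "/dev/fb"):
        print(dev, device_allowed(dev))
```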
Observability in a Linux Zone
• Supports both Solaris and Linux tools
• In the Linux zone:
  > strace – syscall tracer
  > gdb – GNU debugger
• From the Global zone:
  > DTrace: has the PID provider and Linux syscall provider
  > mdb: is able to manipulate live processes and core files
• Goal: to be a better Linux development platform
  than Linux
BrandZ Status
• Zones running a Red Hat Enterprise Linux 3.x or
  CentOS 3.5 operating environment
  > Support for Linux 2.4.21 system call interface
  > Basic /proc and /dev support
• DTrace support for Linux applications
  > Linux syscall provider
  > PID provider
• Rapid deployment and teardown of Linux zones.
  Perfect for building 'throwaway' zones for
  development/QA
• In OpenSolaris from build 49
Hypervisors 101
●   The “Virtual Machine” abstraction
    ●   Yes, the one from the 1970s
    ●   Virtualizes hardware – memory, CPU, I/O devices
    ●   May emulate real devices
●   For x86/x64, multiple choices
    ●   Xen, VMware, Parallels, Qemu
    ●   Microsoft now: Virtual Server, future: Viridian
Solaris Containers and Hypervision
●   Containers (zones)
    •   Scalable, fast, virtual platform, platform agnostic
    •   Emphasis on controlled sharing, simpler admin
    •   Improved fault isolation over “single system.”
    •   Alternate brands
●   Hardware Virtualization
    •   Emphasis on separation
    •   Fault isolation (Xen: single points of failure remain)
    •   Live Migration
    •   Foreign OSes
Xen
• Open source hypervisor technology developed at
  the University of Cambridge
  > http://www.cl.cam.ac.uk/Research/SRG/netos/xen/
• OpenSolaris on Xen community
  > http://www.opensolaris.org/os/community/xen
• 2006: HW Virtualization Everywhere
  > x64 CPU capabilities (VT-x, AMD-V)
  > Workload consolidation
Xen 3.x Architecture
(figure: Xen 3.x architecture)
• The Xen Virtual Machine Monitor (32/64-bit) runs directly on the
  hardware (SMP, MMU, physical memory, Ethernet, SCSI/IDE) and
  provides a control interface, a safe hardware interface, event
  channels, virtual CPUs, and a virtual MMU
• dom0 (VM0): device manager and control software on a Solaris
  guest OS, with native device drivers (AGP, ACPI, PCI) and
  back-end drivers
• domU1 (VM1): unmodified user software on XenLinux, with native
  device drivers and back-end drivers
• domU2 (VM2): unmodified user software on a Solaris guest OS,
  using front-end device drivers
• domU3 (VM3): unmodified user software on an unmodified guest
  OS (WinXP), using VT-x/AMD-V hardware support and front-end
  device drivers
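The front-end/back-end driver split communicates through rings in memory shared between domains. A simplified Python model, loosely following the producer/consumer index scheme of Xen's shared I/O rings; the class and method names are hypothetical, and event-channel notifications and grant tables are left out.

```python
class SplitDriverRing:
    """Simplified shared ring between a front-end driver (in a domU)
    and a back-end driver (in dom0 or a driver domain).

    Free-running indices are masked by the power-of-two ring size, and a
    response reuses the slot of the request it answers.
    """

    def __init__(self, size=4):
        if size <= 0 or size & (size - 1):
            raise ValueError("ring size must be a power of two")
        self.size = size
        self.mask = size - 1
        self.slots = [None] * size
        self.req_prod = 0  # advanced by the front-end
        self.req_cons = 0  # back-end's private consumer index
        self.rsp_prod = 0  # advanced by the back-end
        self.rsp_cons = 0  # front-end's private consumer index

    def push_request(self, req):
        """Front-end: queue a request; False if the ring is full."""
        if self.req_prod - self.rsp_cons >= self.size:
            return False
        self.slots[self.req_prod & self.mask] = req
        self.req_prod += 1
        return True

    def backend_service(self, handle):
        """Back-end: consume pending requests and write responses."""
        while self.req_cons < self.req_prod:
            req = self.slots[self.req_cons & self.mask]
            self.req_cons += 1
            self.slots[self.rsp_prod & self.mask] = handle(req)
            self.rsp_prod += 1

    def pop_responses(self):
        """Front-end: collect all available responses."""
        out = []
        while self.rsp_cons < self.rsp_prod:
            out.append(self.slots[self.rsp_cons & self.mask])
            self.rsp_cons += 1
        return out
```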
Key Xen Capabilities
●   Checkpoint/Restart and Live Migration
    •   Provisioning
    •   Grid operations
●   Multiple OSes running simultaneously
    •   Linux, Solaris, Windows XP
    •   No longer a boot-time decision
●   Special purpose kernels
    •   Drivers, filesystems
Xen Live Migration Experiment (2004)
●   Two machines running Xen 2.0
    • 2GHz Hyperthreaded CPUs
    • 1Gbit Ethernet
    • Remote storage
    • XenoLinux
●   SPECweb99 benchmark
    • 800 MB domU, 90% CPU utilization

Now, move workload from machine A to machine B ...
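Xen performs live migration with iterative pre-copy: memory is copied while the guest keeps running, pages dirtied in the meantime are re-sent, and the guest is paused only for a final small copy. A Python simulation of that loop; the 4 KiB page size, the 10% dirtying rate, and the stopping threshold are assumptions, not measurements from the experiment above.

```python
def precopy_migrate(total_pages, dirtied_while_sending, stop_threshold=50, max_rounds=30):
    """Simulate iterative pre-copy live migration.

    Round 1 copies every page while the guest keeps running; each later
    round re-copies only the pages dirtied during the previous round.
    Once few enough pages are dirty (or rounds run out), the guest is
    paused and the rest is copied: that stop-and-copy is the downtime.

    dirtied_while_sending(n): pages dirtied while n pages are copied.
    Returns (live_rounds, pages_copied_live, pages_copied_while_paused).
    """
    to_copy = total_pages
    copied_live = 0
    rounds = 0
    while rounds < max_rounds and to_copy > stop_threshold:
        rounds += 1
        copied_live += to_copy
        to_copy = min(total_pages, dirtied_while_sending(to_copy))
    return rounds, copied_live, to_copy


if __name__ == "__main__":
    pages = 800 * 1024 * 1024 // 4096  # 800 MB domU, assuming 4 KiB pages
    print(precopy_migrate(pages, lambda n: n // 10))  # (4, 227532, 20)
```

If a guest keeps rewriting the same pages faster than they can be sent, the dirty set never shrinks; `max_rounds` caps the live phase at the cost of a longer pause.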
Live Migration: SPECweb99
(chart of the SPECweb99 run during the migration; source: LinuxWorld 2005 Virtualization BoF)
Why a Solaris Domain 0?
•   Observability, debugging tools
•   ZFS
•   Containers and Trusted Extensions
•   CrossBow network virtualization
•   Hardware support
OpenSolaris on Xen Port
• “Platform” rather than “arch” port
  > Privileged operations -> hypercalls
  > MMU, segmentation, exceptions
  > Xen “event” model of interrupts
• Direct boot
  > No more multiboot
• New virtual device drivers
  > net, disk, console
• Dom0 infrastructure and tools
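Under Xen, a guest receives interrupts as events on event channels rather than from a native interrupt controller. A toy Python model of the delivery logic, assuming pending and mask bitmaps like those Xen keeps in its shared-info page; the class and methods are hypothetical, and per-VCPU upcall flags and hypercalls are omitted.

```python
class EventChannels:
    """Toy model of Xen-style event delivery using pending/mask bitmaps."""

    def __init__(self, nports=64):
        self.nports = nports
        self.pending = 0  # bit n set: an event on port n awaits handling
        self.mask = 0     # bit n set: port n does not trigger an upcall
        self.handlers = {}

    def _check(self, port):
        if not 0 <= port < self.nports:
            raise ValueError(f"no such port {port}")

    def bind(self, port, handler):
        self._check(port)
        self.handlers[port] = handler

    def set_masked(self, port, masked):
        self._check(port)
        if masked:
            self.mask |= 1 << port
        else:
            self.mask &= ~(1 << port)

    def notify(self, port):
        """Hypervisor side: mark the port pending.
        Returns True if the guest would receive an upcall now."""
        self._check(port)
        self.pending |= 1 << port
        return not (self.mask >> port) & 1

    def dispatch(self):
        """Guest upcall: run handlers for pending, unmasked ports in order."""
        ready = self.pending & ~self.mask
        handled = []
        port = 0
        while ready:
            if ready & 1:
                self.pending &= ~(1 << port)
                handler = self.handlers.get(port)
                if handler is not None:
                    handler()
                handled.append(port)
            ready >>= 1
            port += 1
        return handled
```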
OpenSolaris on Xen Status
• OpenSolaris domU and dom0
  > 32/64-bit, UP, MP (virtual 32-way!)
  > Virtual disks, network, bridge
  > CPU and Memory Hot plug support
• Versions
  > Xen 3.0.2-3 (xen-3.0.3-sun)
  > OpenSolaris build 44
• Future OpenSolaris drops
  > Performance, Bug fixes
  > Soon-ish: Mercurial SCM
The Need for Network Virtualization
• ISP offering web and e-mail services
  > Consolidate multiple hosts on a single machine
  > Users expect minimal performance level per virtual host
• Financial services
  > Consolidate multiple services on a single machine
  > Some services have minimum performance
    requirements, or higher priority
• Data centers using separate VLANs or LANs
  > Want separate Zones to only connect to “their” (V)LANs
  > May want to control network utilization as well
• The need applies to Zones as well as Xen
CrossBow: Foundations for Network
Virtualization in Solaris
• Virtual Network Interface Cards (VNICs): define
  multiple virtual NICs on hardware NICs, and assign
  them to Zones, Xen domains, etc.
• Network Resource Control: associate priorities and
  bandwidth limits with VNICs, services, or protocols.
• IP Instances: provide the option to assign a Zone
  its own instance of IP, ensuring IP-level separation.
CrossBow Virtual NICs
• Carve up a 1 Gb/s or 10 Gb/s hardware NIC into
  multiple virtual NICs
• Implemented as a Nemo/GLDv3 MAC driver.
• Assign NIC hardware resources (interrupts, rings,
  etc) to virtual NICs
• Rely on hardware-based flow classification to steer
  traffic to VNICs and maximize performance
• Assign VNICs to Zones or Xen domains
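Classification by destination MAC address is what lets each VNIC, with its own MAC address, get a dedicated receive ring. A toy Python model of that steering step; the VNIC names and locally administered MAC addresses are hypothetical, and broadcast/multicast replication is not modeled.

```python
from collections import defaultdict


class MacClassifier:
    """Toy model: steer received frames to per-VNIC queues by destination MAC."""

    def __init__(self):
        self.vnic_for_mac = {}
        self.rx_queues = defaultdict(list)

    def add_vnic(self, name, mac):
        self.vnic_for_mac[mac.lower()] = name

    def receive(self, dst_mac, frame):
        """Queue a frame on the owning VNIC, or on the default queue."""
        vnic = self.vnic_for_mac.get(dst_mac.lower(), "default")
        self.rx_queues[vnic].append(frame)
        return vnic


if __name__ == "__main__":
    nic = MacClassifier()
    nic.add_vnic("vnic1", "02:00:00:00:00:01")  # e.g. assigned to Zone 1
    nic.add_vnic("vnic2", "02:00:00:00:00:02")  # e.g. assigned to Zone 2
    print(nic.receive("02:00:00:00:00:02", b"frame"))
```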
CrossBow Virtual NICs Example
(figure: VNICs assigned to zones)
• The NIC's flow classifier steers incoming packets to hardware
  rings: Zone 1 has separate HTTP, HTTPS, and default rings;
  Zone 2 has one ring for all its traffic
• Zone 1's rings belong to VNIC1; Zone 2's ring belongs to VNIC2
• Each ring feeds its own squeue (Zone 1: HTTP, HTTPS, and
  default squeues; Zone 2: one all-traffic squeue), grouped
  into a per-zone virtual squeue
• The virtual squeues of Zones 1 to n are bound to
  compute resources
Network Resource Control
• Allows the assignment of bandwidth and priorities to
  VNICs, protocols, or services
• Each VNIC, protocol, or service is associated with a
  dedicated squeue (extended from FireEngine and
  Nemo)
• The squeues control the receive rings:
  > Dynamically switch the NIC between interrupt and polling
    mode
  > Resource usage policies are controlled by pulling only
    the allowed number of packets from the ring
  > Results in little or no performance overhead
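A toy Python model of the mechanism described above: a packet arriving in interrupt mode switches the ring to polling mode, each tick pulls at most a fixed number of packets (standing in for the bandwidth limit), and the ring returns to interrupt mode once drained. The class, the per-tick budget, and the drop-on-full behavior are assumptions for illustration.

```python
from collections import deque


class PolledRxRing:
    """Toy model of a receive ring drained by an squeue under a packet budget."""

    def __init__(self, capacity, budget_per_tick):
        self.ring = deque()
        self.capacity = capacity
        self.budget = budget_per_tick  # stand-in for a bandwidth limit
        self.polling = False           # False means interrupt mode
        self.dropped = 0

    def nic_receive(self, pkt):
        if len(self.ring) >= self.capacity:
            self.dropped += 1          # ring full: the NIC drops the packet
            return
        self.ring.append(pkt)
        if not self.polling:
            # An interrupt arrives; the squeue takes over and polls.
            self.polling = True

    def tick(self):
        """Pull at most `budget` packets, then return to interrupt mode if idle."""
        n = min(self.budget, len(self.ring))
        pulled = [self.ring.popleft() for _ in range(n)]
        if not self.ring:
            self.polling = False
        return pulled
```

Because the squeue simply leaves excess packets in the ring, the limit is enforced without any per-packet accounting on the fast path, which fits the claim of little or no overhead.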
CrossBow Virtual NICs for Xen
(figure: VNICs assigned to a Solaris host OS and Solaris guest OSes)
• The same model applies to Xen domains: the Solaris host OS,
  Solaris Guest OS 1, and Solaris Guest OS 2 each have a NIC
  virtualization engine and their own VNIC
• The NIC's flow classifier steers traffic to rings: one
  all-traffic ring for the host OS; HTTP, HTTPS, and default
  rings for Guest OS 1; one all-traffic ring for Guest OS 2
• Each ring feeds a squeue within that domain's virtual squeue
IP Instances for Zones -
The Benefits of Separation
• Separate LANs or VLANs can be attached to
  different zones with no IP leakage between them
  > Possible to have a management network separate from
    the data network
• Enables the use of IP-level features for zones
  > DHCP
  > IPsec
  > IP Filter
• Per-instance network configuration (routing tables,
  transport tunables, etc.)
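Because each exclusive IP instance has its own routing table, the same destination can be routed differently in different zones. A toy Python model with longest-prefix lookup, assuming hypothetical zones and gateways taken from the documentation address ranges.

```python
import ipaddress


class IPInstance:
    """Toy model: an exclusive IP instance with its own routing table."""

    def __init__(self, name):
        self.name = name
        self.routes = []  # list of (network, gateway)

    def add_route(self, destination, gateway):
        self.routes.append((ipaddress.ip_network(destination), gateway))

    def lookup(self, addr):
        """Return the gateway of the longest matching prefix, or None."""
        addr = ipaddress.ip_address(addr)
        matches = [(net, gw) for net, gw in self.routes if addr in net]
        if not matches:
            return None
        return max(matches, key=lambda m: m[0].prefixlen)[1]


if __name__ == "__main__":
    zone1, zone2 = IPInstance("zone1"), IPInstance("zone2")
    zone1.add_route("0.0.0.0/0", "192.0.2.1")
    zone1.add_route("203.0.113.0/24", "192.0.2.254")
    zone2.add_route("0.0.0.0/0", "198.51.100.1")
    for z in (zone1, zone2):
        print(z.name, z.lookup("203.0.113.5"))
```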
IP Instances and separate VLANs
(figure: shared and exclusive IP instances on separate networks)
• Specific to containers:
  > Global zone: shared IP instance on the untagged LAN, with
    its own squeue
  > Zones 1 to n: each has an exclusive IP instance and its own
    squeue, attached to its own VLAN
• Common to all zones:
  > Management network bge0: the global zone's ring, behind the
    NIC's flow classifier
  > Data network bge1 with VLAN interfaces bge10001 and bge33001:
    one ring per zone, behind the NIC's flow classifier
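The data-network interface names follow the Solaris VLAN naming convention, in which the PPA is the VLAN ID times 1000 plus the device instance: `bge33001` is VLAN 33 on `bge1`, and `bge10001` decodes to VLAN 10 on `bge1`. A small Python sketch of the encoding; the helper names are hypothetical.

```python
import re


def vlan_ifname(driver, instance, vid):
    """Solaris VLAN interface name: PPA = VLAN ID * 1000 + device instance."""
    if not 1 <= vid <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= instance < 1000:
        raise ValueError("device instance must be below 1000")
    return f"{driver}{vid * 1000 + instance}"


def parse_ifname(name):
    """Split e.g. 'bge33001' into (driver, vid, instance); vid 0 means untagged."""
    m = re.fullmatch(r"(.*\D)(\d+)", name)
    if m is None:
        raise ValueError(f"not an interface name: {name!r}")
    vid, instance = divmod(int(m.group(2)), 1000)
    return m.group(1), vid, instance


if __name__ == "__main__":
    print(vlan_ifname("bge", 1, 33))
    print(parse_ifname("bge10001"))
```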
CrossBow Status
• Available at OpenSolaris.org:
  > Core VNIC functionality
  > Bandwidth Control for TCP
  > IP Instances
Join Us...
• Our communities and projects are open on
  OpenSolaris.org:
  > Zones: http://opensolaris.org/os/community/zones
  > Resource management:
    http://opensolaris.org/os/project/rm
  > BrandZ: http://opensolaris.org/os/community/brandz
  > Xen: http://opensolaris.org/os/community/xen
  > CrossBow: http://opensolaris.org/os/project/crossbow
• Where you will find:
  > Lively discussions, design docs, FAQs, source code
    drops, preliminary binary releases, etc.
OpenSolaris Virtualization
Technologies



David Edmondson
dme@sun.com

				