User-space System Device Enumeration (uSDE)
Mark Bellon
MontaVista Software, Inc.
• Enumerate - to specify one after another
  – Specify/instantiate/remove system devices
     • create
     • delete
     • diagnostics
  – Deal with devices in a dynamic environment
     • system startup
     • hot insertions and removals
               uSDE (1)?

• An architecturally and philosophically
  neutral framework for enumerating the
  devices attached to a computer system
• An open, extensible implementation (even
  in real-time!) of device enumeration that
  supports one or more systems of
  enumeration - simultaneously if necessary!
               uSDE (2)?
• Provides transaction-protected, consistent,
  real-time (low-latency) access to data
• Designed for carrier-grade and embedded
  environments; desktops fall out trivially
• Optimized for speed; can handle a huge
  number of devices
• Small and reliable
                  uSDE (3)?
• It did not start life as a specialized or
  limited handler; from the beginning it has
  been designed to handle all device types
• It does not mandate a formal database
• It operates entirely in user space
  – MVL CGE 3.1
  – Linux 2.6.0-test6 or later
                 uSDE Overview
[Figure: the uSDE executive, uSDE scanner, uSDE agents, uSDE utilities, configuration files and policy methods]
        uSDE External Stimuli (1)
[Figure: insert/remove events from the uSDE hotplug replacement, appear events from the uSDE scanner and aspect-change events from the uSDE agent all flow into the uSDE executive]
    uSDE External Stimuli (2)
• uSDE /sbin/hotplug replacement
  – A binary that provides the functionality of
    existing shell scripts
  – Forwards all hotplug events to the uSDE
    executive for processing
  – Device insert and remove events are of
    particular interest
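The forwarding step can be sketched as below. The 2.6-era kernel ran /sbin/hotplug with the subsystem name as its first argument and described the event in environment variables such as ACTION and DEVPATH (a path relative to the sysfs mount point). Mapping add/remove to insert/remove, assuming sysfs is mounted at /sys, and the message dictionary itself are assumptions; the actual uSDE message format is not described here.

```python
import os
import sys

# Hypothetical mapping of hotplug ACTION values to uSDE event names;
# any other action is forwarded under its own name.
ACTION_TO_EVENT = {"add": "insert", "remove": "remove"}


def build_event(argv, env):
    """Turn one /sbin/hotplug invocation into a (hypothetical) uSDE message."""
    action = env.get("ACTION")
    devpath = env.get("DEVPATH")
    if action is None or devpath is None:
        return None
    return {
        "event": ACTION_TO_EVENT.get(action, action),
        "subsystem": argv[1] if len(argv) > 1 else "",
        # DEVPATH is relative to the sysfs mount point (assumed /sys).
        "sysfs-path": "/sys" + devpath,
    }


if __name__ == "__main__":
    message = build_event(sys.argv, os.environ)
    if message is not None:
        # A real replacement would send this to the uSDE executive.
        print(message)
```

Because every event is forwarded, the executive rather than the hotplug binary decides which ones matter.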
    uSDE External Stimuli (3)
• uSDE scanner
  – Invoked by the uSDE executive to determine
    the initial ensemble of system devices
  – Scans sysfs for appropriate devices and sends
    “appear” events
  – Typically runs only once (when the uSDE
    executive runs for “the first time”)
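A minimal sketch of the initial scan, assuming the scanner only walks /sys/block and /sys/class/net and that an appear event needs nothing more than the device's sysfs path; the real scanner's coverage and event format are not given here. The sysfs root is a parameter so the walk can be tried against a scratch directory.

```python
import os

# Hypothetical subset of sysfs directories walked by the scanner.
SCAN_DIRS = ("block", "class/net")


def scan_sysfs(root="/sys"):
    """Return one 'appear' event per device directory found under root."""
    events = []
    for rel in SCAN_DIRS:
        top = os.path.join(root, rel)
        if not os.path.isdir(top):
            continue
        for name in sorted(os.listdir(top)):
            events.append({"event": "appear",
                           "sysfs-path": os.path.join(top, name)})
    return events
```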
    uSDE External Stimuli (4)
• uSDE agent
  – A program, usually a daemon, that provides
    information necessary for the manipulation of a device
    that is otherwise unavailable from sysfs, /proc or the kernel
  – Commonly used to send aspect-change events
     • Multi-chassis, geographical addressing
         – ATCA
         – “well known” platforms
     • IPMI and/or networks
       uSDE Executive (1)
[Figure: the uSDE executive daemon passing internal events to policy methods]
         uSDE Executive (2)
• Loads configuration files
• Determines initial device ensemble
  – device scanner
• Initializes event/device handlers
  – sends (internal) “init” event to each handler
• Processes events
  – handles out-of-order event arrival
           uSDE Executive (3)
• Event processing
  – Classifies the device associated with an event
  – Maps the external event to an internal event
  – Queues the internal event for servicing
  – Schedules internal event processing
  – Provides logging of critical data
          uSDE Executive (4)
• Device classification (phase 1)
  – Derived directly from device’s sysfs path
     • class
        – disk, ethernet, cdrom, floppy, loop, raid, etc.
     • sub-class
        – sda -> class “disk”, sub-class “scsi”
        – eth0 -> class “ethernet”, sub-class “generic”
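Phase 1 amounts to a table lookup on the device's sysfs name. The sketch below assumes name-prefix rules; only the sda and eth0 results come from this slide, and the other table entries are illustrative.

```python
import os
import re

# (name pattern, class, sub-class); hypothetical except sd* and eth*.
PHASE1_RULES = [
    (r"sd[a-z]+$", "disk", "scsi"),
    (r"hd[a-z]+$", "disk", "ide"),
    (r"sr\d+$", "cdrom", "scsi"),
    (r"fd\d+$", "floppy", "generic"),
    (r"loop\d+$", "loop", "generic"),
    (r"md\d+$", "raid", "generic"),
    (r"eth\d+$", "ethernet", "generic"),
]


def classify_phase1(sysfs_path):
    """Return (class, sub-class) from the last sysfs path component, or None."""
    name = os.path.basename(sysfs_path.rstrip("/"))
    for pattern, dev_class, sub_class in PHASE1_RULES:
        if re.match(pattern, name):
            return dev_class, sub_class
    return None
```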
          uSDE Executive (5)
• Device classification (phase 2)
  – sub-class from phase 1 may be updated
     • Determine parent device
     • Search for additional information and, if present,
       override initial classification
        – “scsi” may become “fibrechannel”, “ieee-1394”, etc.
        – “ide” may become “eide”, “serial-ata”, etc.
  – No limitations on sub-class override
  – pci-info file provides information for this phase
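Phase 2 can be sketched as a lookup keyed on the parent controller's PCI vendor and device IDs, where a hit replaces the phase 1 sub-class. The table contents and ID values below are placeholders, not entries from a real pci-info file, and passing the parent's IDs in as a tuple is an assumption.

```python
# Hypothetical pci-info contents: (vendor ID, device ID) -> sub-class.
# The IDs are placeholders, not real controller IDs.
PCI_INFO = {
    (0x1234, 0x0001): "fibrechannel",
    (0x1234, 0x0002): "ieee-1394",
}


def classify_phase2(dev_class, sub_class, parent_pci_id, pci_info=PCI_INFO):
    """Override the phase 1 sub-class when the parent device is listed."""
    override = pci_info.get(parent_pci_id)
    return dev_class, (override if override is not None else sub_class)
```

The class never changes in this phase; only the sub-class is refined.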
         uSDE Executive (6)
• The internal event is queued for service
  – sysfs path of device
  – internal event type
  – class and sub-class assigned to device
• Enumeration service maintains queues
  – each class has a queue
  – sub-class is ignored
         uSDE Executive (7)
• Device queues are aggressively scheduled
  – All queues may be running concurrently
  – No concurrent servicing within a queue
• Events may be coalesced
  – identical event type and sub-class
  – each sysfs path is added to a list
• A service container is invoked in response
  to an event
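The queueing rules on the last two slides can be sketched with one FIFO per class and one worker thread per busy queue: different classes are serviced concurrently, while each queue is drained strictly one event at a time. Coalescing here only merges a new event into the newest pending event of its queue when event type and sub-class match, which keeps ordering intact. That restriction, the class name and the threading model are assumptions, not the executive's documented scheduler.

```python
import threading
from collections import deque


class ClassQueues:
    """One FIFO per device class; the sub-class never selects a queue."""

    def __init__(self, service):
        self.service = service   # runs the service container for one event
        self.queues = {}         # class -> deque of pending events
        self.busy = set()        # classes whose queue has a running worker
        self.lock = threading.Lock()

    def submit(self, event_type, dev_class, sub_class, sysfs_path):
        """Queue an internal event; start a worker if the class is idle."""
        with self.lock:
            queue = self.queues.setdefault(dev_class, deque())
            tail = queue[-1] if queue else None
            if (tail is not None and tail["type"] == event_type
                    and tail["sub-class"] == sub_class):
                tail["paths"].append(sysfs_path)   # coalesce with pending event
            else:
                queue.append({"type": event_type, "class": dev_class,
                              "sub-class": sub_class, "paths": [sysfs_path]})
            if dev_class in self.busy:
                return None                        # running worker will see it
            self.busy.add(dev_class)
        worker = threading.Thread(target=self._drain, args=(dev_class,))
        worker.start()
        return worker

    def _drain(self, dev_class):
        """Service one class's events strictly one at a time."""
        while True:
            with self.lock:
                queue = self.queues[dev_class]
                if not queue:
                    self.busy.discard(dev_class)
                    return
                event = queue.popleft()
            self.service(event)
```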
           uSDE Executive (8)
• A service container is a list of one or more actions
  that are invoked in a definite order
  – a configuration file specifies the service containers
• Class and sub-class control handling
  – A service container is associated with each class and
    sub-class
• An internal event is sent to each action
  within the service container
         uSDE Executive (9)
• An action contained in a service container is
  known as a policy method
  – implements the policies of its designer
  – Each policy method is sent the same parameters
• Policy methods must be prepared to accept
  multiple arguments (devices)
  – minimizes the number of invocations
  – “closeness” optimizations are possible
uSDE Policy Methods (1)

     uSDE Policy Methods (2)
• Policy methods:
  – Are Linux programs
     • Can be written in any language, including shell scripts
  – Are invoked with a standardized command line
     • class
     • sub-class
     • event type
     • device argument(s) - sysfs path
     • standardized options
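A sketch of how a service container might be run, assuming the arguments appear in the order listed (class, sub-class, event type, then one sysfs path per device) and leaving out the standardized options, which are not spelled out here. The event type string and function names are hypothetical. Each policy method runs to completion before the next starts, matching the definite order of a service container.

```python
import subprocess


def policy_argv(method, dev_class, sub_class, event_type, sysfs_paths):
    """Build one policy method's command line (standardized options omitted)."""
    return [method, dev_class, sub_class, event_type, *sysfs_paths]


def run_service_container(methods, dev_class, sub_class, event_type, sysfs_paths):
    """Invoke each policy method in order and collect the exit statuses."""
    statuses = []
    for method in methods:
        argv = policy_argv(method, dev_class, sub_class, event_type, sysfs_paths)
        statuses.append(subprocess.run(argv).returncode)
    return statuses
```

Passing every coalesced sysfs path in one invocation is what lets a policy method apply “closeness” optimizations across devices.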
     uSDE Policy Methods (3)
• Policy methods:
  – actually enumerate a device
  – determine which instance within the class should
    be associated with a device
  – are free to implement whatever policies they
    see fit
                 uSDE Files (1)
[Figure: the uSDE executive and uSDE utilities with the configuration files, policy methods and backing store]
              uSDE Files (2)
• Human readable - ASCII
• Formal grammars (YACC) for each file
  – One can be sure the file is valid
• Hand-optimized lexer for speed
  – still room for improvement
• Separate API for each file via a shared library
  – No wasted memory
                 uSDE Files (3)
• deployment-model
   – how to handle events and permissions
• hardware-map (optional)
   – how to control your special hardware
• pci-info (optional)
   – additional information for classification
• backing-store (optional)
   – a place to retain critical information
• exec-cache (special case; to become optional in the future)
   – executive caches classification here
uSDE Policy Method Toolkit (1)
[Figure: trivial, persistent and emulation policy methods]
    A wonderful set of sample code to play with...
     Policy Method Toolkit (2)
• disk-ide-policy
  – implements persistent device naming
     • Vendor/model string, Serial number
  – handles IDE, EIDE, serial ATA and USB
    hosted [E]IDE devices
  – Implements replacement and relocation policies
    for [E]IDE and mapped serial ATA
     Policy Method Toolkit (3)
• disk-scsi-policy
  – implements persistent device naming
     • Vendor ID, Product ID, Serial number
  – handles parallel SCSI, IEEE-1394,
    FibreChannel and USB hosted SCSI devices
  – handles multi-ported storage devices
  – implements replacement and relocation policies
    for parallel SCSI
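Persistent naming by vendor ID, product ID and serial number can be sketched as a lookup against previously recorded devices, assigning a new name only when the triple is unknown; a relocated device keeps its name because its location is not part of the key. The dictionary stands in for the backing-store and the diskN pattern follows the backing-store example later in this document, which also shows blank-padded vendor and product strings, so the sketch strips surrounding blanks before comparing (an assumption).

```python
def persistent_disk_name(records, vendor, product, serial):
    """Return the recorded name for (vendor, product, serial) or assign one.

    records maps name -> (vendor, product, serial); it stands in for the
    backing-store and is updated when a new name is assigned.
    """
    key = (vendor.strip(), product.strip(), serial.strip())
    for name, attrs in records.items():
        if attrs == key:
            return name
    used = {int(n[len("disk"):]) for n in records if n.startswith("disk")}
    instance = 0
    while instance in used:   # lowest free instance number
        instance += 1
    name = f"disk{instance}"
    records[name] = key
    return name
```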
    Policy Method Toolkit (4)
• floppy-policy
  – handles internal floppies
  – USB floppies show up as disks
• simple-device-policy
  – handles block and character devices
  – “catch-all” for many device classes
     Policy Method Toolkit (5)
• ethernet-policy
  – implements persistent device naming
     • initial MAC address
  – implements replacement and relocation policies
     • USB Ethernet devices not supported yet (trivial to add)
  – uses hardware-map file to ensure specific
    interfaces retain their names regardless of discovery order
     Policy Method Toolkit (6)
• Emulation policies (for those that need it)
  – devfs
  – Linux Standard Base (LSB)
     Policy Method Toolkit (7)
• Special purpose policies
  – disk-cs-policy
     • An example of a policy that makes use of an agent
     • Names are based on the geographical address of a
       disk in a chassis/slot environment
  – multipath-policy
     • automatic provisioning of multi-ported disks
     • Not limited to SCSI or FibreChannel
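Automatic provisioning of multi-ported disks starts by noticing that several block devices report the same identity. A minimal sketch, assuming each block device has already been reduced to its sysfs path plus a (vendor, product, serial) identity, and that ports are numbered in discovery order; both assumptions are illustrative only.

```python
def group_ports(block_devices):
    """Group block devices that share an identity into one disk.

    block_devices: iterable of (sysfs_path, identity) in discovery order.
    Returns identity -> list of sysfs paths; the index is the port number.
    """
    disks = {}
    for sysfs_path, identity in block_devices:
        disks.setdefault(identity, []).append(sysfs_path)
    return disks


def multiported(disks):
    """Identities seen through more than one port: multipath candidates."""
    return [identity for identity, paths in disks.items() if len(paths) > 1]
```

Nothing here depends on the transport, which fits the point that multipath handling is not limited to SCSI or FibreChannel.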
               Where is it?
         Future Directions (1)
• A sufficient portion of our ideas are
  expressed in this prototype; it’s time to get
  lots of feedback and additional input
• Implementation is open source and
• SourceForge project is up and running
         Future Directions (2)
• Event mechanism is a closed socket hack.
  This should be replaced with an open
  messaging system
• grammar cleanup throughout
• classification scheme should be reviewed,
  simplified; scripted?
• Utilities should be improved and expanded
  – helpers for scripted policies that want retention
          Future Directions (3)
•   general walk-through and review
•   multipathing - additional controls
•   more device classes; more policies
•   devfs and lsb emulation needs work
•   flood of ideas from the community
•   backing store content wars
             Discussion Items
•   Disk naming
•   Multi-chassis agent example
•   backing-store and deployment-model examples
•   Critical definitions
•   Configuration file details
•   Transaction details
•   More on events
       A Few Definitions (1)
• Interface Technology Path (ITP)
  – The unique, unambiguous and repeatable path
    over which a system traverses hardware to
    arrive at the “location” of a device.
  – Must remain constant across system crashes,
    resets and reboots
  – For PCI devices the ITP is the Slot Path
    Address (SPA) of a device
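Judging from the backing-store examples later in this document, which show interface-technology-path values such as /devices/pci0000:00/0000:XX:03.0/..., a PCI ITP appears to keep the device and function of each hop while masking the bus number, which can change when buses are renumbered. The sketch below applies that reading to a sysfs device path; it is an inference from the examples, not a documented algorithm.

```python
import re

# One PCI hop in a sysfs path: domain:bus:device.function (hex).
PCI_COMPONENT = re.compile(r"^([0-9a-f]{4}):[0-9a-f]{2}:([0-9a-f]{2}\.[0-7])$")


def pci_itp(sysfs_device_path):
    """Replace the bus number of every PCI hop with XX."""
    parts = sysfs_device_path.split("/")
    return "/".join(PCI_COMPONENT.sub(r"\1:XX:\2", part) for part in parts)
```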
        A Few Definitions (2)
• Interface Domain IDentifier (IDID)
  – The unique identification of a device within the
    domain managed by the device’s parent device
  – Examples: address/LUN, Dev/Func
       A Few Definitions (3)
• Device Discrimination (DD)
  – The ability to discern a difference between
    devices that on the surface appear to be
    identical. Specifically, it is the ability to
    uniquely identify one device from another
    where the devices share the same class, vendor
    and product descriptions
        A Few Definitions (4)
• Device Discrimination (continued)
  – The most common form of device
    discrimination is implemented via a serial number
  – When a device cannot be discriminated, a useful
    equivalent is possible: use the ITP and IDID!
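The fallback rule can be sketched as a small function: prefer a serial number when the device reports one, otherwise combine the ITP and IDID. The dictionary keys are modelled on the backing-store attribute names later in this document, and the "#" separator is hypothetical.

```python
def discriminator(attrs):
    """Return a string that distinguishes a device from look-alikes."""
    serial = attrs.get("serial", "").strip()
    if serial:
        return serial
    # No serial number: use the location-based equivalent (ITP plus IDID).
    return attrs["interface-technology-path"] + "#" + attrs["interface-domain-ID"]
```

The fallback only stays stable while the device stays in the same place, which is why a serial number is preferred whenever one exists.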
       A Few Definitions (5)
• Persistent Device Naming (PDN)
  – Associates a unique name with a device based
    on several of the device’s attributes
  – This differs from the current Linux device
    naming scheme where the “name” of a device is
    actually a (shorthand) description of the data
    path and selection criteria used to access the device
        A Few Definitions (6)
• Persistent Device Naming (PDN) (cont.)
  – Persistently named devices must provide an
    ensemble of attributes, including the ITP, IDID
    and DD, that unambiguously discriminates one
    device from all others. It is then possible to
    recognize and ensure that the device name
    remains constant regardless of how the device
    is interfaced to the system
       A Few Definitions (7)
• Persistent Device Naming (PDN) (cont.)
  – When a device’s name cannot be built directly
    from its attributes some form of non-volatile
    storage must be available to record the
    unique attributes along with the name assigned
    (aliased) to the device
                 uSDE Files in Detail (1)
[Figure: the uSDE executive and uSDE utilities with the configuration files, policy methods and backing store]
    deployment-model File (1)
• service directive
  – Specifies which list of policy methods is
    associated with a given class and sub-class
• device-node-default directive
  – specifies the device node control information
    for a given class and sub-class
     • mode
     • group
     • owner
     deployment-model File (2)
• device-node-specific directive
  – specifies the device node control information
    for a specific device - class and instance within class
     • mode
     • group
     • owner
• alias directive
  – specifies an alias associated with a specific
    device - class and instance within class
          hardware-map File
• Optional
• map directive
  – specifies that a particular device, identified by
    its ITP, is to be treated as a specific instance
    within a class
  – force eth0 hardware to stay eth0 no matter what
    the discovery order
• Additional information in the future
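A sketch of how a policy method such as ethernet-policy might honour the map directive: devices whose ITP appears in the hardware map get their pinned instance, and the rest take the lowest instances still free, in discovery order. Pinned instances stay reserved even when their device is absent, so no other interface can take eth0. The function and data layout are hypothetical.

```python
def assign_instances(itps, hardware_map):
    """Assign class instances to devices listed in discovery order.

    itps: device ITPs in discovery order.
    hardware_map: ITP -> pinned instance number (the map directive).
    Returns ITP -> instance number.
    """
    result = {itp: hardware_map[itp] for itp in itps if itp in hardware_map}
    used = set(hardware_map.values())   # pinned instances are always reserved
    next_free = 0
    for itp in itps:
        if itp in result:
            continue
        while next_free in used:
            next_free += 1
        result[itp] = next_free
        used.add(next_free)
    return result
```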
              pci-info File
• Specifies the sub-class associated with a
  given PCI device by mapping the PCI
  vendor and product registers to a sub-class
• Will be generalized to handle other
  interfaces in the near future
• Optional
              exec-cache File
• Not a configurable file
• Used internally by the uSDE executive to cache
  the mapping of a sysfs path to class and sub-class
   – Have to remember how a device was classified
     so the correct service action can be invoked
     upon remove/disappear
• Will be made optional via a special insert/appear
  mode of the executive in the near future
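The cache matters because, by the time a remove or disappear event arrives, the device's sysfs entries may already be gone, so the executive cannot classify the device again. A minimal in-memory sketch follows; the class name is hypothetical, and the real cache is a file rather than a dictionary.

```python
class ExecCache:
    """Remember how each sysfs path was classified (hypothetical sketch)."""

    def __init__(self):
        self._by_path = {}

    def record(self, sysfs_path, dev_class, sub_class):
        # On insert/appear, while sysfs still describes the device.
        self._by_path[sysfs_path] = (dev_class, sub_class)

    def forget(self, sysfs_path):
        # On remove/disappear: return the saved classification so the
        # matching service container can be invoked, then drop it.
        return self._by_path.pop(sysfs_path, None)
```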
           backing-store File
• Optional file used to store non-volatile data
  – policy methods store their data, if any, here
  – simple “database”
  – hierarchical model
        File Transactions (1)
• All files are protected via a transaction
• Transaction framework is tuned for speed
  and simplicity
  – lock contention is expected to be minimal
  – files are expected to be small
  – files are human readable - ASCII
          File Transactions (2)
• Serialization is performed at transaction
  start and end times:
  – lock is held only within the formal transaction start and
    end routines
  – All of the files involved in the transaction are read into
    memory
  – Files are rewritten at the end only if they were modified
  – The transaction must be repeated if a file it modified has
    also been modified by another thread of execution since
    this thread started the transaction
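These rules describe a form of optimistic concurrency. The sketch below models it with an in-memory store in place of files: the lock is held only while starting and ending a transaction, each file carries a change counter, and commit fails (so the caller repeats the transaction) if another thread changed a file this transaction modified. The class, methods and counter are hypothetical.

```python
import threading


class FileStore:
    """In-memory stand-in for the uSDE files, with per-file versions."""

    def __init__(self, files):
        self._lock = threading.Lock()
        self._data = dict(files)                # name -> text
        self._version = {n: 0 for n in files}   # name -> change counter

    def begin(self, names):
        # Transaction start: hold the lock only while reading into memory.
        with self._lock:
            return {n: (self._data[n], self._version[n]) for n in names}

    def commit(self, snapshot, changes):
        """Write modified files; return False if the caller must retry."""
        with self._lock:
            for name in changes:
                if self._version[name] != snapshot[name][1]:
                    return False                # changed by another thread
            for name, text in changes.items():
                if text != snapshot[name][0]:   # rewrite only if modified
                    self._data[name] = text
                    self._version[name] += 1
            return True

    def read(self, name):
        with self._lock:
            return self._data[name]


def update(store, name, edit):
    """Apply edit(text) -> text to one file, repeating on conflict."""
    while True:
        snap = store.begin([name])
        if store.commit(snap, {name: edit(snap[name][0])}):
            return
```

Since the files are small and contention is expected to be rare, retrying is usually cheaper than holding a lock for the whole transaction.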
     uSDE External Events (1)
• Insert event
  – a device has been physically inserted into the system
• Remove event
  – a device has been physically removed from the system
     uSDE External Events (2)
• Appear event
  – a device has been detected that was not inserted
     • initial device scanning
     • diagnostics (return to service)
• Disappear event
  – a device currently known to the system and in
    service has disappeared from the system
     • no longer in service
     • diagnostics (removal from service)
    uSDE External Events (3)
• Aspect-Change Event
  – A parameter associated with a device has
    become available or has changed
     • information otherwise unavailable from the kernel
     • “unusual” information sources - “out of band”
 Unambiguous Disk Naming (1)
• Names should be persistent
  – Name remains fixed across reboots and
    configuration changes
• Multi-ported disks are a challenge:
  – How is a disk named?
  – How does one unambiguously access a port?
  – How does generic SCSI logically work?
     • One node or multiple?
 Unambiguous Disk Naming (2)
• /dev/sde-disk/disk-name/d<n>p<m>
  – <n> is data port number (all disks have 0)
  – <m> is partition number
• generic SCSI node is either:
  – generic (if one)
  – generic_d<n> (if multiple)
• multi-path nodes are “multi_p<m>”
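The naming scheme can be captured in a few helper functions. The disk name, port and partition numbers are inputs; placing the generic and multi-path nodes in the same per-disk directory as the partition nodes is an assumption, as is the function naming.

```python
import posixpath

SDE_DISK_ROOT = "/dev/sde-disk"


def partition_node(disk_name, port, partition):
    """/dev/sde-disk/<disk-name>/d<n>p<m>"""
    return posixpath.join(SDE_DISK_ROOT, disk_name, f"d{port}p{partition}")


def generic_nodes(disk_name, n_ports):
    """'generic' for one data port, 'generic_d<n>' per port otherwise."""
    if n_ports == 1:
        names = ["generic"]
    else:
        names = [f"generic_d{n}" for n in range(n_ports)]
    return [posixpath.join(SDE_DISK_ROOT, disk_name, x) for x in names]


def multipath_node(disk_name, partition):
    """multi_p<m>: assumed to live beside the per-port nodes."""
    return posixpath.join(SDE_DISK_ROOT, disk_name, f"multi_p{partition}")
```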
                 backing-store details (1)
object "ethernet0"
    string "class" "ethernet"
    string "sub-class" "generic"
    string "vendor-string" "Intel Corp. 82544EI Gigabit Ethernet Controller"
    string "product-string" "Intel Corp. 82544EI Gigabit Ethernet Controller"
    string "discriminator" "00:02:b3:c3:5d:ac"
    string "interface-technology-path" "/devices/pci0000:00/0000:XX:03.0/0000:XX:1d.0/0000:XX:01.0"
    integer "class-instance" 0
    string "state" "present"
                 backing-store details (2)
object "disk0"
    string "device-path" "/dev/sde-disk/disk0"
    string "class" "disk"
    string "sub-class" "fibrechannel"
    string "vendor-string" "IBM      "
    string "product-string" "DDYF-T36950R     "
    string "discriminator" "TFF6C829"
    integer "class-instance" 0
    string "state" "present"
    string "service-location" "unknown"
    object "ports"
        object "0"
            string "interface-technology-path" "/devices/pci0000:00/0000:XX:02.0/0000:XX:1d.0/0000:XX:01.0"
            string "interface-domain-ID" "0:9:0"
            string "sysfs-path" "/sys/block/sdd"
            integer "reference-count" 3
        object "1"
            string "interface-technology-path" "/devices/pci0000:00/0000:XX:02.0/0000:XX:1d.0/0000:XX:01.1"
            string "interface-domain-ID" "0:9:0"
            string "sysfs-path" "/sys/block/sdb"
            integer "reference-count" 3
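A reader for the layout shown above can be sketched as follows, assuming that indentation alone expresses nesting and that each entry is object NAME, string NAME VALUE or integer NAME VALUE with double-quoted names and values. The real file is defined by a YACC grammar that is not reproduced here, so this only illustrates the hierarchical model.

```python
import shlex


def parse_backing_store(text):
    """Parse indented object/string/integer entries into nested dicts."""
    root = {}
    stack = [(-1, root)]              # (indent of owning line, dict)
    for line in text.splitlines():
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        kind, name, *value = shlex.split(line)   # keeps padding inside quotes
        while indent <= stack[-1][0]:
            stack.pop()               # leave objects this line is not inside
        parent = stack[-1][1]
        if kind == "object":
            child = {}
            parent[name] = child
            stack.append((indent, child))
        elif kind == "string":
            parent[name] = value[0]
        elif kind == "integer":
            parent[name] = int(value[0])
        else:
            raise ValueError(f"unknown entry type {kind!r}")
    return root
```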
            deployment-model details
service-container disk fibrechannel { disk-scsi-policy multipath-policy }
service-container disk ide { disk-ide-policy lsb-policy devfs-policy }
service-container ethernet generic { ethernet-policy }

device-node-default disk fibrechannel
        mode 0x642
        owner "root"
        group "foo"
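Directives like these can be read with a short parser. This sketch assumes one service-container per line with the policy methods listed between braces in invocation order, and a device-node-default line followed by mode, owner and group lines. The example writes the mode as 0x642 without stating the intended base, so the sketch accepts C-style hex, octal and decimal literals. The real YACC grammar may accept more forms.

```python
import shlex


def parse_mode(value):
    """C-style literal: 0x... is hex, a leading 0 is octal, else decimal."""
    if value.lower().startswith("0x"):
        return int(value, 16)
    if len(value) > 1 and value.startswith("0"):
        return int(value, 8)
    return int(value, 10)


def parse_deployment_model(text):
    """Return (service containers, device-node defaults), keyed by (class, sub-class)."""
    containers = {}   # (class, sub-class) -> ordered list of policy methods
    defaults = {}     # (class, sub-class) -> {"mode": int, "owner": str, ...}
    current = None    # device-node-default block being filled in
    for line in text.splitlines():
        words = shlex.split(line)
        if not words:
            continue
        if words[0] == "service-container":
            start, end = words.index("{"), words.index("}")
            containers[(words[1], words[2])] = words[start + 1:end]
            current = None
        elif words[0] == "device-node-default":
            current = defaults.setdefault((words[1], words[2]), {})
        elif current is not None and words[0] in ("mode", "owner", "group"):
            value = words[1]
            current[words[0]] = parse_mode(value) if words[0] == "mode" else value
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return containers, defaults
```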
  Multi-chassis agent example (1)
[Figure: chassis 0x1234 and chassis 0x5678, each with four slots (1 to 4) holding CPU and disk boards; the chassis are joined by disk and network interconnects]

Chassis have their disks and networks interconnected
Hot swap notification is limited to the chassis (IPMI)
A publisher agent broadcasts hot swap events to other chassis
Each CPU runs a subscriber agent - processes hot swap events
Each CPU is running a uSDE executive
    Multi-chassis agent example (2)
[Figure: an insert event for chassis 0x5678, slot 2 (with its disk ID) is delivered to the hot swap subscriber and uSDE agent on each CPU; the agent sends an aspect-change event to its uSDE executive, the executive passes an aspect-change event to disk-cs-policy, and disk-cs-policy creates the device node /dev/chassis5678/slot2/... . CPUs shown: chassis 0x1234, slots 1, 2 and 3; chassis 0x5678, slots 1, 3 and 4]
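The subscriber agent's translation step can be sketched as below, assuming the broadcast hot swap message carries the chassis number, slot number and disk ID shown in the example, and that an aspect-change event is a small dictionary; both message formats are hypothetical. The helper builds only the /dev/chassis<NNNN>/slot<N> directory under which disk-cs-policy creates nodes; the node names inside it are not given.

```python
def aspect_change_from_hotswap(msg):
    """Translate a broadcast hot swap insert into a uSDE aspect-change event."""
    return {
        "event": "aspect-change",
        "chassis": msg["chassis"],
        "slot": msg["slot"],
        "disk-id": msg["disk-id"],
    }


def geographic_dir(chassis, slot):
    """Directory for a disk's geographic name, e.g. chassis 0x5678, slot 2."""
    return f"/dev/chassis{chassis:04x}/slot{slot}"
```

Every CPU runs the same translation, so each one derives the same geographic name for the shared disk independently.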
