					                         SCSI Mid-layer



Eric Youngdale




   2nd Annual Linux Storage Management Workshop
   October 2000
                                   Introduction

Main points of this talk:
  –   Historical evolution of Linux SCSI.
  –   Explain state of the art in Linux 2.2.
  –   Discuss changes for 2.4.
  –   Discuss pending changes in the 2.5 kernel.
             Block devices and Linux
• Linux has a generic block device layer with
  which all filesystems will interact.
• SCSI is no different in this regard – it
  registers itself with the block device layer
  so it can receive requests.
• SCSI also handles character device requests
  and ioctls that do not originate in the block
  device layer.
          What is the “Mid-Layer”?

• Linux SCSI support can be viewed as 3 levels.
• Upper level is device management, such as
  tape, cdrom, disk, etc.
• Lower level talks to host adapters.
• Middle layer is essentially a traffic cop,
  handling requests from the rest of the kernel
  and dispatching them to the rest of SCSI.
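The three levels can be pictured with a small user-space sketch. This is an illustration only, not the kernel's actual interfaces: every name below (sketch_command, sketch_host_ops, sketch_mid_dispatch and so on) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command descriptor; the real kernel structure is far larger. */
struct sketch_command {
    unsigned char opcode;   /* SCSI operation code */
    int result;             /* filled in by the host adapter */
};

/* Lower level: each host adapter driver supplies its own entry point. */
struct sketch_host_ops {
    int (*queue_command)(struct sketch_command *cmd);
};

/* Middle layer: the traffic cop that routes a command to the adapter. */
static int sketch_mid_dispatch(const struct sketch_host_ops *host,
                               struct sketch_command *cmd)
{
    assert(host != NULL && host->queue_command != NULL && cmd != NULL);
    return host->queue_command(cmd);
}

/* Upper level: a disk-like driver builds a command and hands it down. */
static int sketch_disk_read(const struct sketch_host_ops *host)
{
    struct sketch_command cmd = { 0x28 /* READ(10) */, -1 };

    if (sketch_mid_dispatch(host, &cmd) != 0)
        return -1;
    return cmd.result;
}

/* Toy adapter used only for illustration: completes every command. */
static int sketch_fake_adapter_queue(struct sketch_command *cmd)
{
    cmd->result = 0;
    return 0;
}
```

The upper level never calls the adapter directly; it only sees the mid-layer entry point, which is the property the three-level split is meant to provide.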
        State of the art in Linux-2.2
• Error handling is improved for drivers
  that use the new error handling code
  introduced in 2.2.
• Queue management fundamentally
  unchanged since the Linux 1.x days. “The
  Code that Time Forgot”. Lots of dinosaurs
  running around in the code.
• Rest of mid-level largely stagnant.
              What was wrong in 2.2?
• The elevator algorithms in 2.2 allowed requests to
  grow regardless of the capabilities of the
  underlying device.
• All SCSI disks were handled in a single queue.
• Disk driver had to split requests that had become
  too large.
• One common set of logic was used to verify that
  requests had not become too large.
     What was wrong in 2.2 (cont)

• Character device requests not in queue.
• SMP safety was clumsily handled, leading
  to race conditions and poor performance.
• Poor scalability.
• Many drivers continue to use old error
  handling code.
             Queue handling in 2.2

Disk Queue Head -> Disk1 -> Disk2 -> Disk1 -> Disk3 -> Disk1
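With every disk sharing one queue, the code has to walk past requests for devices that cannot take a command right now. A rough user-space sketch of that scan follows. The names and the "busy" criterion are assumptions made for illustration, not the 2.2 code itself.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NDISKS 4   /* assumed number of disks, for illustration */

struct sketch_shared_req {
    int disk;                         /* index of the target disk */
    struct sketch_shared_req *next;   /* next request in the shared queue */
};

/*
 * Scan the single shared queue for the first request whose disk is not
 * busy. Unlink and return it, or return NULL if nothing is queueable.
 */
static struct sketch_shared_req *
sketch_find_queueable(struct sketch_shared_req **head,
                      const int busy[SKETCH_NDISKS])
{
    struct sketch_shared_req **link;

    assert(head != NULL);
    for (link = head; *link != NULL; link = &(*link)->next) {
        struct sketch_shared_req *req = *link;

        if (!busy[req->disk]) {
            *link = req->next;   /* unlink from the middle of the queue */
            req->next = NULL;
            return req;
        }
    }
    return NULL;
}
```

With the queue order shown on this slide and Disk1 busy, the scan skips the head and returns the Disk2 request. Every dispatch pays for a walk like this.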
               Changes for Linux-2.4
• Block device layer was generalized to
  support a “request_queue_t” abstract
  datatype that represents a queue.
• Contains function pointers that drivers can
  use for managing the size of requests
  inserted into queues.
• Requests no longer can grow to be too large
  to be handled at one time.
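A minimal user-space sketch of the idea, not the kernel's real definitions: the queue carries a driver-supplied function pointer, and the generic layer consults it before letting a request grow. All names here (sketch_queue, sketch_request, max_sectors, back_merge) are hypothetical stand-ins.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct request. */
struct sketch_request {
    unsigned long sector;       /* first sector of the request */
    unsigned long nr_sectors;   /* current length in sectors */
};

struct sketch_queue;
typedef int (*sketch_merge_fn)(struct sketch_queue *q,
                               struct sketch_request *req,
                               unsigned long extra_sectors);

/* Hypothetical stand-in for request_queue_t. */
struct sketch_queue {
    unsigned long max_sectors;    /* assumed per-device size limit */
    sketch_merge_fn back_merge;   /* policy hook supplied by the driver */
};

/* Driver policy: allow the merge only if the result still fits the device. */
static int sketch_limit_merge(struct sketch_queue *q,
                              struct sketch_request *req,
                              unsigned long extra_sectors)
{
    return req->nr_sectors + extra_sectors <= q->max_sectors;
}

/*
 * Generic layer: ask the driver before growing a request.
 * Returns 1 if the request grew, 0 if the caller must start a new one.
 */
static int sketch_try_back_merge(struct sketch_queue *q,
                                 struct sketch_request *req,
                                 unsigned long extra_sectors)
{
    assert(q != NULL && req != NULL && q->back_merge != NULL);
    if (!q->back_merge(q, req, extra_sectors))
        return 0;
    req->nr_sectors += extra_sectors;
    return 1;
}
```

Because the limit is checked at insertion time, a request never exceeds what the device can handle, so nothing downstream needs to split it.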
               Changes for 2.4 (cont)

• No longer any need for splitting requests.
• No need for ugly logic to scan a queue for a
  queueable request.
• SMP locking in mid-layer cleaned up to
  provide finer granularity.
               Changes for 2.4 (cont)
• A SCSI queuing library was created – a set
  of functions for queue management that are
  tailored to different sets of requirements.
• SCSI was modified to use a single queue for
  each physical device.
• Character device requests and ioctls are
  inserted into the same queue at the tail, and
  handled the same as other requests.
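A per-device queue with tail insertion could look roughly like the following user-space sketch. The types and function names are invented for illustration; the real queue is managed through the block layer.

```c
#include <assert.h>
#include <stddef.h>

/* Where a command came from; all three share one queue in this sketch. */
enum sketch_origin { SKETCH_BLOCK_IO, SKETCH_CHAR_IO, SKETCH_IOCTL };

struct sketch_cmd {
    enum sketch_origin origin;
    struct sketch_cmd *next;
};

/* One queue per physical device (hypothetical layout). */
struct sketch_dev_queue {
    struct sketch_cmd *head;
    struct sketch_cmd *tail;
};

static void sketch_queue_init(struct sketch_dev_queue *q)
{
    q->head = NULL;
    q->tail = NULL;
}

/* Character device requests and ioctls are appended at the tail. */
static void sketch_add_tail(struct sketch_dev_queue *q, struct sketch_cmd *cmd)
{
    assert(q != NULL && cmd != NULL);
    cmd->next = NULL;
    if (q->tail != NULL)
        q->tail->next = cmd;
    else
        q->head = cmd;
    q->tail = cmd;
}

/* Dispatch simply takes the head; no scan is needed. */
static struct sketch_cmd *sketch_next_cmd(struct sketch_dev_queue *q)
{
    struct sketch_cmd *cmd = q->head;

    if (cmd != NULL) {
        q->head = cmd->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return cmd;
}
```

Since every entry in the queue targets the same device, the head is always the next candidate, and an ioctl queued behind block I/O is issued in arrival order.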
                                      Queuing library
Maintainability is a problem if multiple instances of
 code perform a similar function.
__inline static int
__scsi_merge_requests_fn(request_queue_t * q,
                   struct request * req, struct request * next,
                   int use_clustering,
                   int dma_host)
{
/* Appropriate contents */
}
               Queuing library (cont)
#define MERGEREQFCT(_FUNCTION, _CLUSTER, _DMA) \
static int _FUNCTION(request_queue_t * q, \
           struct request * req, \
           struct request * next) \
 {\
 return __scsi_merge_requests_fn(q, req, next, _CLUSTER, _DMA); \
 }
    MERGEREQFCT(scsi_merge_requests_fn_, 0, 0)
    MERGEREQFCT(scsi_merge_requests_fn_d, 0, 1)
    MERGEREQFCT(scsi_merge_requests_fn_c, 1, 0)
    MERGEREQFCT(scsi_merge_requests_fn_dc, 1, 1)
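The same pattern can be tried outside the kernel. The sketch below is a user-space imitation with hypothetical names and a placeholder merge policy (the segment limits are invented). It shows one shared inline body, four generated wrappers, and one plausible way to pick a wrapper once at setup time.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_SEGMENTS     8   /* assumed limit, illustration only */
#define SKETCH_MAX_SEGMENTS_DMA 4   /* assumed tighter limit for DMA-restricted hosts */

struct sketch_req { int segments; };

/* Single shared body. In each wrapper the two flags are constants,
 * so the compiler can drop the branches that do not apply. */
static inline int sketch_merge_body(struct sketch_req *req,
                                    struct sketch_req *next,
                                    int use_clustering, int dma_host)
{
    int limit = dma_host ? SKETCH_MAX_SEGMENTS_DMA : SKETCH_MAX_SEGMENTS;
    int total = req->segments + next->segments;

    assert(req != NULL && next != NULL);
    if (use_clustering)
        total -= 1;          /* placeholder: boundary segments coalesce */
    if (total > limit)
        return 0;            /* merge refused */
    req->segments = total;
    return 1;
}

#define SKETCH_MERGEFCT(_FUNCTION, _CLUSTER, _DMA)                    \
static int _FUNCTION(struct sketch_req *req, struct sketch_req *next) \
{                                                                     \
    return sketch_merge_body(req, next, _CLUSTER, _DMA);              \
}

SKETCH_MERGEFCT(sketch_merge_, 0, 0)
SKETCH_MERGEFCT(sketch_merge_d, 0, 1)
SKETCH_MERGEFCT(sketch_merge_c, 1, 0)
SKETCH_MERGEFCT(sketch_merge_dc, 1, 1)

typedef int (*sketch_merge_fn)(struct sketch_req *, struct sketch_req *);

/* Choose the specialised wrapper once, when the queue is configured. */
static sketch_merge_fn sketch_select_merge_fn(int use_clustering, int dma_host)
{
    static const sketch_merge_fn table[2][2] = {
        { sketch_merge_,  sketch_merge_d  },
        { sketch_merge_c, sketch_merge_dc },
    };
    return table[!!use_clustering][!!dma_host];
}
```

The logic lives in one place, which addresses the maintainability concern, while each queue still gets a function tailored to its host without testing the flags on every call.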
               Changes for 2.4 (cont)

• In 2.2, there were separate functions and
  code paths for initializing SCSI depending
  on whether it was compiled into the kernel
  or loaded as a module.
• In 2.4, this was cleaned up – redundant code
  was removed, and the same code now
  initializes SCSI whether it is loaded as a
  module or compiled into the kernel.
          Upcoming changes for 2.5

• All drivers will be forced to use new error
  handling code.
• Disk driver will be updated to handle larger
  number of disks.
• SMP locking will be cleaned up some more
  to improve scalability.
            Old error handling code
• Essentially a bad state machine.
• Has tons of SMP problems that are not
  easily fixed.
• Tries to resolve errors while allowing new
  requests to be queued.
• Many kernel reliability problems are
  caused by the old error handling code.
• Needs to be discarded in the worst way.
           New error handling code
• The new error handling code has been
  available since the 2.1.75 kernel.
• To force driver authors to update their
  drivers, the old error handling code will
  simply be removed. Drivers that have not
  been updated will fail to compile.
• Orphaned drivers will be handled on a case-
  by-case basis.
               Further SMP cleanups
• All low-level drivers currently use
  io_request_lock for SMP safety.
• This lock is also used by all other block
  devices on the system to protect their
  queues.
• Plans are in the works to switch the block
  device layer to use a per-queue lock,
  thereby isolating SCSI from other devices.
                SMP Cleanups (cont).

• Low-level drivers don’t need to protect
  queue – they don’t have access to it.
• Each low-level driver should have a
  separate lock – ideally one per instance of
  host, but could be a driver-wide lock
  initially. This should be up to the low-level
  driver.
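One lock per host instance might be sketched as below. This is a user-space illustration using a C11 atomic_flag as a simple spinlock. The structure and function names are hypothetical, and a real driver would use the kernel's own spinlock primitives instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-host state: each instance carries its own lock. */
struct sketch_host {
    atomic_flag lock;   /* protects only this host's fields */
    int active_cmds;    /* commands currently outstanding */
};

static void sketch_host_init(struct sketch_host *h)
{
    assert(h != NULL);
    atomic_flag_clear(&h->lock);
    h->active_cmds = 0;
}

static void sketch_host_lock(struct sketch_host *h)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
        ;
}

static void sketch_host_unlock(struct sketch_host *h)
{
    atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

/* Update host state under that host's lock; returns the new count. */
static int sketch_host_start_cmd(struct sketch_host *h)
{
    int n;

    sketch_host_lock(h);
    n = ++h->active_cmds;
    sketch_host_unlock(h);
    return n;
}
```

Holding the lock of one host does not stall work on another, which is the scalability gain over a single driver-wide or system-wide lock.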
                 SMP Cleanups (cont)

• Block device layer has a number of arrays,
  indexed by major/minor:

     blksize_size[MAJOR(dev)][MINOR(dev)]
• Access is not protected by any locks.
• Impossible for block drivers to resize
  these arrays without introducing a race
  condition.
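The hazard can be illustrated in user space. The sketch below assumes a 16-bit device number with 8-bit major and minor fields and a table of per-major arrays indexed by minor. The SKETCH_ names are hypothetical and the layout is a simplification.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MINORBITS 8   /* assumed width of the minor field */
#define SKETCH_MAJOR(dev) ((unsigned)(dev) >> SKETCH_MINORBITS)
#define SKETCH_MINOR(dev) ((unsigned)(dev) & ((1u << SKETCH_MINORBITS) - 1))
#define SKETCH_MAX_MAJOR 256

/* One driver-owned array per major, indexed by minor. No lock guards it. */
static int *sketch_blksize[SKETCH_MAX_MAJOR];

/* Reader side: what generic code does on every lookup. */
static int sketch_lookup_blksize(unsigned dev)
{
    int *per_minor = sketch_blksize[SKETCH_MAJOR(dev)];

    /* A concurrent resize could free per_minor at this point. */
    return per_minor ? per_minor[SKETCH_MINOR(dev)] : 0;
}

/* Writer side: grow a major's array from old_n to new_n entries.
 * Returns 0 on success, -1 if allocation fails. */
static int sketch_resize(unsigned major, size_t old_n, size_t new_n, int fill)
{
    int *old = sketch_blksize[major];
    int *grown;
    size_t i;

    assert(major < SKETCH_MAX_MAJOR && new_n >= old_n);
    grown = malloc(new_n * sizeof *grown);
    if (grown == NULL)
        return -1;
    if (old_n != 0)
        memcpy(grown, old, old_n * sizeof *grown);
    for (i = old_n; i < new_n; i++)
        grown[i] = fill;
    sketch_blksize[major] = grown;   /* a reader may still hold `old` */
    free(old);                       /* which now dangles for that reader */
    return 0;
}
```

Run from a single thread this works. With a second CPU inside the lookup between the pointer load and the index, the free leaves the reader with a stale pointer, and there is no lock a driver could take to close that window.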
              Large numbers of disks
• Current disk driver allocates 8 majors,
  allowing for only 128 disks.
• Plans are in the works to allow disk driver
  to dynamically allocate major numbers.
• Would support up to about 4000 disks, at
  which point major numbers are exhausted.
• Possible to go beyond this by using fewer
  bits for partitions.
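These figures follow from simple arithmetic. The sketch assumes 8-bit minor numbers with 4 bits (16 minors) reserved per disk; both values are assumptions, chosen because they reproduce the 128-disk figure for 8 majors. The function name is hypothetical.

```c
#include <assert.h>

#define SKETCH_MINOR_BITS 8   /* assumed width of the minor number */

/*
 * Number of disks addressable with `majors` major numbers when each
 * disk reserves 2^partition_bits minors (whole disk plus partitions).
 */
static unsigned sketch_max_disks(unsigned majors, unsigned partition_bits)
{
    assert(partition_bits <= SKETCH_MINOR_BITS);
    return majors << (SKETCH_MINOR_BITS - partition_bits);
}
```

Under these assumptions sketch_max_disks(8, 4) gives 128, sketch_max_disks(255, 4) gives 4080 (the "about 4000" case, assuming 255 usable majors), and dropping to 3 partition bits doubles that to 8160.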
                                 Wish list.

• Implement some SCSI-3 features (larger
  commands, sense buffers).
• Improve support for shared busses.
• Support target-mode.
• Check module add/remove code for SMP
  safety, implement locks.
• Improvements related to high-availability.
                                  Conclusions
   The major goal of a rewrite of SCSI queuing has
been accomplished. A number of architectural
problems were resolved at the same time.
   There are still some interesting tasks to be
addressed for 2.5.
   See http://www.andante.org/scsi.html for more
info, and http://www.andante.org/scsi_todo.html for
“todo” list.
                                   Contacts

Email: eric@andante.org
Web: http://www.andante.org
The notes for this talk are on the website.

				