					File System Interface and
     Implementations

           Fred Kuhns
    CS523 – Operating Systems




           WASHINGTON UNIVERSITY IN ST LOUIS
                   FS Framework in UNIX

• Provides persistent storage
• Facilities for managing data
    – file - abstraction for data container, supports
      sequential and random access
    – file system - permits organizing, manipulating and
      accessing files
• User interface specifies behavior and
  semantics of relevant system calls
    – Interface exported abstractions: files, directories,
      file descriptors and different file systems


Fred Kuhns (11/13/2010)   CS523 – Operating Systems        2
             Kernel, Files and Directories
• kernel provides control operations to name,
  organize and control access to files but it
  does not interpret contents
• Running programs have an associated current
  working directory. Permits use of relative
  pathnames. Otherwise complete pathnames
  are required.
• File viewed as a collection of bytes
    – Applications requiring more structure must define
      and implement themselves



             Kernel, Files and Directories
• files and directories form hierarchical tree
  structure name space.
   – tree forms a directed acyclic graph
• Directory entry for a file is known as a hard
  link.
   – Files may also have symbolic links
• File may have one or more links
• POSIX defines library routines {opendir(),
  readdir(), rewinddir(), closedir()}
                           struct dirent {
                             ino_t d_ino;
                             char d_name[NAME_MAX + 1];
                           }
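
The POSIX directory routines listed above can be exercised with a short sketch. The function name count_entries is made up for illustration; only opendir(), readdir() and closedir() come from the slide.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, skipping "." and "..".
 * Returns -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* d_name is the link's name; d_ino is the inode it refers to */
        if (strcmp(entry->d_name, ".") == 0 ||
            strcmp(entry->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_entries("/usr") would return the number of hard links stored in that directory, not counting the two self-referential entries.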
        File and Directory Organization


   (hard) links                             /


     bin                  etc        dev                    usr         vmunix


     sh                                         local             etc

            /usr/local/bin/bash
                                                bin

                                                bash

                              File Attributes
• Type – directory, regular file, FIFO, symbolic link, special.
• Reference count – number of hard links {link(), unlink()}
• size in bytes
• device id – device file resides on
• inode number - one inode per file, inodes are unique within a
  disk partition (device id)
• ownership - user and group id {chown()}
• access modes - Permissions and modes {chmod()}
    – {read, write, execute} for {owner, group or other}
• timestamps – three different timestamps: last access, last
  modify, last attribute change. {utime()}



                       Permissions and Modes
• Three Mode Flags = {suid, sgid and sticky}
   – suid –
        • File: if set and executable, the process runs with its effective user id
          set to the owner of the file
        • Directory: Not used
   – sgid –
        • File: if set and executable, the effective group id is set to the group of
          the file. If sgid is set but the file is not executable, mandatory
          file/record locking is enabled
        • Directory: if set then new files inherit group of directory otherwise
          group of creator.
   – sticky –
        • File: if set and executable file then keep copy of program in swap area.
        • Directory: if set and directory writable then remove/rename if EUID =
          owner of file/directory or if process has write permission for file.
          Otherwise any process with write permission to directory may remove or
          rename.



                            User View of Files
• File Descriptors (open, dup, dup2, fork)
    –   All I/O is through file descriptors
    –   references the open file object
    –   per process object
    –   file descriptors may be dup’ed {dup(), dup2()}, copied on fork
        {fork()} or passed to an unrelated process {see ioctl() or sendmsg(),
        recvmsg()}, permitting multiple descriptors to reference one object.
• File Object - holds context
    – created by an open() system call
    – stores file offset
    – reference to vnode
• vnode - abstract representation of a file
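
Because the offset lives in the open file object rather than in the descriptor, a dup()'ed descriptor shares it while a second open() of the same file does not. The sketch below demonstrates this; offset_demo is a hypothetical name and the temporary file location under /tmp is an assumption.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Write 5 bytes through one descriptor, then report the file offset seen
 * through a dup()'ed descriptor and through a second open() of the same
 * file. Returns 0 on success, -1 on any error. */
int offset_demo(off_t *dup_offset, off_t *reopen_offset)
{
    char path[] = "/tmp/fd_demo_XXXXXX";
    int fd = mkstemp(path);            /* creates a new open file object */
    if (fd < 0)
        return -1;

    int dup_fd = dup(fd);              /* second descriptor, same object */
    int new_fd = open(path, O_RDONLY); /* separate object, same vnode */
    unlink(path);                      /* name goes away; open files remain */

    int ok = dup_fd >= 0 && new_fd >= 0 && write(fd, "hello", 5) == 5;
    if (ok) {
        *dup_offset = lseek(dup_fd, 0, SEEK_CUR);
        *reopen_offset = lseek(new_fd, 0, SEEK_CUR);
    }
    if (dup_fd >= 0) close(dup_fd);
    if (new_fd >= 0) close(new_fd);
    close(fd);
    return ok ? 0 : -1;
}
```

After the write, the dup()'ed descriptor reports offset 5 and the independently opened one still reports 0.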




                           How it works
fd = open(path, oflag, mode);         lseek(), read(), write() affect offset

[Figure: the per-process file descriptor table, entries {0..5, uf_ofile},
 points to open file objects {*f_vnode, f_offset, f_count, ...}. Each open
 file object points to a vnode/vfs, the in-memory representation of the file.]
                                  Overview

                                   System calls



                                  vnode interface



tmpfs        swapfs        UFS       HSFS          PCFS          RFS    /proc    NFS

[Figure: backing stores drawn beneath the file systems: anonymous memory,
 disk, cdrom, diskette and process address space. Example from Solaris.]
                          File Systems

 • File hierarchy composed of one or more File
   Systems
 • One File System is designated the Root File
      System
 • Attached to mount points
 • File can not span multiple File Systems
 • Resides on one logical disk




                            Logical Disks
• Viewed as linear sequence of fixed sized, randomly
  accessible blocks.
   – device driver maps FS blocks to underlying storage device.
   – created using newfs or mkfs utilities
• A file system must reside in a logical disk, however a logical
  disk need not contain a file system (for example the swap
  device).
• Typically a logical disk corresponds to a partition of a physical
  disk. However, a logical disk may:
   – map to multiple physical disks
   – be mirrored on several physical disks
   – striped across multiple disks or other RAID techniques.




                            File Abstraction
• Abstracts different types of I/O objects
  – for example directories, symbolic links, disks, terminals,
    printers, and pseudodevices (memory, pipes sockets etc).
• Control interface includes fstat, ioctl, fcntl
• Symbolic links: file contains a pathname to the
  linked file/directory. {lstat(), symlink(), readlink()}
• Pipe and FIFO files:
  – FIFO created using mknod(), lives in the file system
    name space
  – Pipe created using pipe(), persists as long as opened for
    reading or writing.


                     OO Style Interfaces

   Abstract base class
struct interface_t
{
// Common functions:
   open (), close ()
// Common data:
   type, count
// Pure virtual functions
   *ops (Null pointer)
// Private data
   *data (Null pointer)
}

   Instance of derived class
struct interface_t
{
   open (), close ()
   type, count
   *ops   -> {my_read(), my_write(), my_init(), my_open(), …}
   *data  -> {device_no, free_list, lock, …}
}
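
The base class and derived instance pictured here map naturally onto a C struct with an operations vector. The following is a sketch only: struct object, struct object_ops and the in-memory membuf type are hypothetical names chosen for illustration, not taken from any kernel.

```c
#include <stddef.h>
#include <string.h>

struct object;                       /* forward declaration */

/* The "pure virtual" part: functions each derived type must supply. */
struct object_ops {
    size_t (*read)(struct object *obj, char *buf, size_t len);
    size_t (*write)(struct object *obj, const char *buf, size_t len);
};

/* The base "class": common data plus pointers the derived type fills in. */
struct object {
    int type;
    int count;                       /* reference count */
    const struct object_ops *ops;    /* NULL in the abstract base */
    void *data;                      /* private data of the derived type */
};

/* A hypothetical derived type: a small in-memory buffer. */
struct membuf {
    char bytes[64];
    size_t used;
};

static size_t membuf_read(struct object *obj, char *buf, size_t len)
{
    struct membuf *m = obj->data;
    if (len > m->used)
        len = m->used;
    memcpy(buf, m->bytes, len);
    return len;
}

static size_t membuf_write(struct object *obj, const char *buf, size_t len)
{
    struct membuf *m = obj->data;
    if (len > sizeof m->bytes)
        len = sizeof m->bytes;
    memcpy(m->bytes, buf, len);
    m->used = len;
    return len;
}

static const struct object_ops membuf_ops = { membuf_read, membuf_write };

/* "Constructor": bind the generic object to the membuf implementation. */
void membuf_init(struct object *obj, struct membuf *storage)
{
    storage->used = 0;
    obj->type = 1;
    obj->count = 1;
    obj->ops = &membuf_ops;
    obj->data = storage;
}
```

Callers only ever write obj->ops->read(obj, ...), so a new implementation can be added without touching them.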

Sun’s (SVR4) Vfs/Vnode Framework
• Concurrently support multiple file system
  types
• transparent interoperation of different file
  systems within one file hierarchy
     – enable file sharing over network
     – abstract interface allowing easy integration of
       new file systems by vendors




                           Objectives
• Operation performed on behalf of current
  process
• Support serialized access, i.e. locking
• must be stateless
• must be reentrant
• encourage use of global resources (cache,
  buffer)
• support client server architectures
• use dynamic storage allocation


                           Vnode/vfs interface
• Define abstract interfaces
• vfs: Fundamental abstraction representing a file
  system to the kernel
   – Contains pointers to file system (vfs) dependent
     operations such as mount, unmount.
• vnode: Fundamental abstraction representing a
  file in the kernel
   – defines interface to the file, pointer to file system
     specific routines. Reference counted.
   – accessed in two ways:
         • 1) I/O related system calls
         • 2) pathname traversal



                              vfs Overview
                                           fs dependent                            fs dependent
                                          Struct vfsops {                         Struct vfsops {
                                           *vfs_mount,                             *vfs_mount,
                                           *vfs_root, …}                           *vfs_root, …}
rootvfs                                         private data                           private data


                Struct vfs {                                Struct vfs {
                 *vfs_next,                                  *vfs_next,
                 *vfs_vnodecovered,                          *vfs_vnodecovered,
                 *vfs_ops,                                   *vfs_ops,
                 *vfs_data, …}                               *vfs_data, …}




  Struct vnode {                  Struct vnode {                     Struct vnode {
   *v_vfsp,                        *v_vfsp,                           *v_vfsp,
   *v_vfsmountedhere,…}            *v_vfsmountedhere,…}               *v_vfsmountedhere,…}
        / (root)                             /usr                        / (mounted fs)
                          Mounting a FS


• mount(spec, dir, flags, type, dataptr,
  datalen);
• SVR4 uses a global virtual file system switch
  table (vfssw)
• allocate and initialize private data
• initialize vfs struct
• locate and initialize root vnode of FS in
  memory (VFS_ROOT)


                            Pathname traversal
• Path traversal must, for each path component perform
  the following:
   –   Verify vnode is directory, if not then stop
   –   invoke VOP_LOOKUP (ufs_lookup()),
   –   if component found, return pointer to vnode.
   –   if not found and last component return vnode of parent directory
   –   otherwise (not found and not the last component) return an ENOENT error.
• If a component corresponds to a mount point then locate
  root vnode of mounted fs.
• If component is a symbolic link, then read the link and continue with
  its pathname followed by the rest of the path
• vnodes reference counts incremented during lookup
• May use a Directory Lookup Cache (name to vnode)
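
The per-component loop can be sketched over a toy in-memory tree. Everything here is a simplification under stated assumptions: struct vnode is a hypothetical stand-in, dir_lookup plays the role of VOP_LOOKUP, symbolic links and the lookup cache are omitted, and only the final vnode has its reference count incremented.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Toy vnode: just enough state for lookup. */
struct vnode {
    const char *name;
    int is_dir;
    int refcount;
    struct vnode *mounted_root;            /* root of a fs mounted here */
    struct vnode *children[MAX_CHILDREN];  /* directory entries */
};

/* Stand-in for VOP_LOOKUP: search one directory for one component. */
static struct vnode *dir_lookup(struct vnode *dir, const char *name,
                                size_t len)
{
    for (int i = 0; i < MAX_CHILDREN; i++) {
        struct vnode *c = dir->children[i];
        if (c && strlen(c->name) == len && strncmp(c->name, name, len) == 0)
            return c;
    }
    return NULL;
}

/* Walk path one component at a time, starting at start.
 * Returns the final vnode with its reference count incremented, or NULL
 * with *err set to ENOTDIR or ENOENT. */
struct vnode *lookup_path(struct vnode *start, const char *path, int *err)
{
    struct vnode *vp = start;
    while (*path != '\0') {
        while (*path == '/')               /* skip separators */
            path++;
        if (*path == '\0')
            break;
        size_t len = strcspn(path, "/");   /* length of this component */

        if (!vp->is_dir) { *err = ENOTDIR; return NULL; }
        vp = dir_lookup(vp, path, len);
        if (vp == NULL) { *err = ENOENT; return NULL; }
        while (vp->mounted_root)           /* cross the mount point */
            vp = vp->mounted_root;
        path += len;
    }
    vp->refcount++;
    return vp;
}
```

A create-style lookup would additionally return the parent directory when only the last component is missing; that case is left out to keep the sketch short.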




             Other vfs/vnode interfaces
• 4.4 BSD vfs/vnode interface
     – Adds state to interface
     – enhanced lookup
     – vnode locking across multiple operations
• OSF/1
     – uses timestamps to optimize lookups




                          Local File Systems
• S5fs - System V file system. Based on the
  original implementation.
• FFS/UFS - BSD developed filesystem with
  optimized disk usage algorithms




                          S5fs - Disk layout

• Viewed as a linear array of blocks
• Typical disk block size 512, 1024, 2048
  bytes
• Physical block number is the block’s index in
  array
• disk uses cylinder, track and sector
• first few blocks are the boot area, which is
  followed by the inode list (fixed size)



                            Disk Layout


[Figure: disk geometry showing platters, heads, cylinder, track and sector.
 Annotations: rotational speed, disk seek time.]

                          S5fs disk layout


       bootarea superblock inode list                    data

Boot area - code to initialize (bootstrap) the system

Superblock - metadata for filesystem. Size of FS,
size of inode list, number of free blocks/inodes,
free block/inode list

   inode list - linear array of 64-byte inode structs
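
Since the inode list is a linear array of fixed-size structs, finding an on-disk inode is simple arithmetic. The sketch below assumes the boot area occupies block 0, the superblock occupies block 1, the inode list starts at block 2, and inode numbers start at 1; those layout numbers and the name locate_inode are assumptions, only the 64-byte inode size comes from the slide.

```c
#include <stdint.h>

#define INODE_SIZE       64u  /* bytes per on-disk inode (from the slide) */
#define INODE_LIST_START 2u   /* assumed: block 0 = boot, block 1 = superblock */

struct inode_loc {
    uint32_t block;    /* disk block holding the inode */
    uint32_t offset;   /* byte offset of the inode within that block */
};

/* Locate on-disk inode number ino (numbered from 1) for a block size. */
struct inode_loc locate_inode(uint32_t ino, uint32_t block_size)
{
    uint32_t per_block = block_size / INODE_SIZE;
    uint32_t index = ino - 1;              /* position in the inode list */
    struct inode_loc loc = {
        INODE_LIST_START + index / per_block,
        (index % per_block) * INODE_SIZE
    };
    return loc;
}
```

With 1024-byte blocks there are 16 inodes per block, so inode 17 is the first inode of the second inode-list block.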
                          s5fs - some details

Directory (2-byte inode number, 14-byte name):

              inode         name
                8             .
               45             ..
                0             “”
               123          myfile

On-disk inode (field sizes in bytes):

              di_mode (2)
              di_nlinks (2)
              di_uid (2)
              di_gid (2)
              di_size (4)
              di_addr (39)
              di_gen (1)
              di_atime (4)
              di_mtime (4)
              di_ctime (4)
                       Locating file data blocks
Assume 1024-byte blocks and 3 bytes per di_addr entry:
3 B/index => 2^24 = 16 M blocks, or 16 GB of data

di_addr entries:
   0 - 9    direct (data blocks)
   10       indirect: one block of 256 links to data blocks
   11       double indirect: 256 links, each to a block of 256 links
   12       triple indirect: 256 links, each to a double indirect tree
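
The figure's numbers (ten direct entries, then single, double and triple indirect blocks of 256 links each) determine how many extra disk reads a given logical block costs. The helper below is a sketch; indirection_level is a hypothetical name, not an s5fs routine.

```c
#include <stdint.h>

#define NDIRECT        10u    /* di_addr entries 0-9 point at data blocks */
#define PTRS_PER_BLOCK 256u   /* links per 1024-byte indirect block */

/* Number of indirect blocks read before reaching logical block lbn:
 * 0 = direct, 1 = single, 2 = double, 3 = triple, -1 = beyond the limit. */
int indirection_level(uint64_t lbn)
{
    uint64_t span = 1;        /* data blocks reachable at the current level */
    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;
    for (int level = 1; level <= 3; level++) {
        span *= PTRS_PER_BLOCK;            /* 256, 256^2, 256^3 */
        if (lbn < span)
            return level;
        lbn -= span;
    }
    return -1;
}
```

Small files therefore need no indirect reads at all, while blocks far into a large file can cost up to three.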


            S5fs Kernel Implementation
  • In-Core Inodes - also include vnode, device
    id, inode number, flags
  • Inode lookup uses a hash queue based on
    inode number (may also use device number)
  • kernel locks inode for reading/writing
  • Read/Write use a buffer cache or VM




                           Problems with s5fs
• Superblock – contains essential information but is
  not replicated.
• on-disk inodes – inodes physically located at front
  of disk, may result in long seek times
• Disk block allocation – free block order is not
  optimized (blocks of a file may not be “close”)
• Disk block size – 512 or 1024 Byte blocks
• file name size – max of 14 chars




      Berkeley Fast File System - FFS
• Disk partition divided into cylinder groups
• superblocks restructured and replicated across
  partition
    – Constant information
    – cylinder group summary info such as free inodes and
      free block
• support block fragments – typical block size
  8KB, fragment can be as small as 512B
• Long file names
• new disk block allocation strategy



                     FFS Allocation strategy
• Goal: Collocate similar data/info
• attempt to locate file inodes in same cyl group as directory
• new directories created in different cyl groups
   – choose from list of groups with above average free inode counts
• attempt to place file data blocks and inode in same cyl
  group
• Change cyl group when file size reaches 48KB, and
  thereafter every 1 MB.
• allocate sequential blocks at a rotationally optimal position.
• Choose cyl group with “best” free count




                          Is FFS/UFS Better?
• Measurements have shown substantial
  performance benefits over s5fs
• FFS however, is sub-optimal when the disk is
  nearly full. Thus 10% is always kept free.
• Modern disks however, no longer match the
  underlying assumptions of FFS




                 Traditional Buffer Cache


[Figure: buffer cache with hash queues keyed by (device, inode) and a
 free list kept in LRU order.]




   Other Limitations of s5fs and FFS

   • Performance - hardware designs and
     modern architectures have redefined the
     computing environment
   • Crash recovery - do you like waiting for
     fsck?
   • Security - do we need more than just 7
     bits?
   • File Size limitations



                           Performance Issues
• FFS has a target rotational delay which
  estimates the time spent by kernel calculating
  the next read/write.
    – alternative is to read/write entire track
    – factor in that many disks have built-in caches
• Due to the buffer cache, most disk I/O
  operations are writes. Note, given locality of
  reference assumptions most writes should be
  deferred.
• Synchronous writes of metadata
• Disk head seeks are expensive

                            Sun-FFS (cluster)
• Goal: Cluster I/O Operations to improve
  performance
• Keeps disk block allocator
• Assume rotational interleaving is not necessary:
  – sets rotational delay to 0
  – store cluster size in superblock, overloading maxcontig
• read clustering: read in physically contiguous blocks
  for file up to maxcontig blocks.
• write clustering: pages are left in cache until
  either a synchronous write is necessary or
  contigsize blocks can be written.
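
Read clustering comes down to finding how long a physically contiguous run starts at the requested block. A minimal sketch, assuming the file's logical-to-physical block mapping is already available as an array; read_cluster_len is a hypothetical name.

```c
#include <stddef.h>

/* Given the physical block number of each logical block of a file, return
 * how many blocks starting at logical block start can be fetched in one
 * I/O: the run must be physically contiguous and at most maxcontig long. */
size_t read_cluster_len(const long phys[], size_t nblocks,
                        size_t start, size_t maxcontig)
{
    if (start >= nblocks || maxcontig == 0)
        return 0;

    size_t len = 1;
    while (len < maxcontig && start + len < nblocks &&
           phys[start + len] == phys[start] + (long)len)
        len++;
    return len;
}
```

The better the allocator keeps a file's blocks adjacent, the closer each read gets to the full maxcontig cluster.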


              4.4BSD Log-Structured FS
• Entire disk dedicated to log – completely
  describes the file system.
   – Log divided into segments, with each segment pointing
     to the next (non-contiguous segments)
   – all writes are to tail of log file
• garbage collection by a cleaner daemon to
  permit the log to wrap around.
• Segment describes physical partitioning of disk
  and is comprised of partial segments.



                           BSD-LFS
• Directory and inode structures retained, issue is
  locating inodes
   – inodes written to disk as part of log, modified inodes
     written to a new location on disk.
• Requires new data structure: inode map. A map of
  all inodes and their location on disk. Map is
  periodically written to disk (checkpointed).
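
The role of the inode map can be shown with a toy log in which each slot holds one inode. This is a deliberately tiny model, not BSD-LFS code: struct toy_lfs, the fixed sizes and all function names are hypothetical.

```c
#include <stddef.h>

#define NINODES   8
#define LOG_SIZE 32
#define NO_ADDR  (-1L)

struct toy_lfs {
    int  log[LOG_SIZE];       /* log[addr] = inode number stored at addr */
    long tail;                /* next free address at the end of the log */
    long imap[NINODES];       /* inode number -> address of newest copy */
};

void lfs_init(struct toy_lfs *fs)
{
    fs->tail = 0;
    for (size_t i = 0; i < NINODES; i++)
        fs->imap[i] = NO_ADDR;
}

/* Write (or rewrite) an inode: append at the tail, then repoint the map.
 * Returns the new address, or NO_ADDR if the arguments are out of range
 * or the log is full (a real LFS would rely on the cleaner instead). */
long lfs_write_inode(struct toy_lfs *fs, int ino)
{
    if (ino < 0 || ino >= NINODES || fs->tail >= LOG_SIZE)
        return NO_ADDR;
    long addr = fs->tail++;
    fs->log[addr] = ino;
    fs->imap[ino] = addr;     /* the old copy, if any, is now garbage */
    return addr;
}

/* A logged copy is live only if the inode map still points at it. */
int lfs_is_live(const struct toy_lfs *fs, long addr)
{
    if (addr < 0 || addr >= fs->tail)
        return 0;
    return fs->imap[fs->log[addr]] == addr;
}
```

The liveness test is the same check a cleaner needs when deciding which entries of a segment must be copied forward.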




                          Segments
  • Segment usage table: contains Bytes
    stored in segment and time of last
    modification
  • partial segment is an atomic write and
    contains
        – checksum,
        – for each file with data blocks in segment the
          inode number, version and logical block
          numbers.
        – disk address of each inode contained in PS



                          Example Write

  • Dirty buffers are collected until there is a full
    segment.
  • logical blocks are ordered, inode updated
    and segment written to tail of log file. Old
    copies of file blocks and inode are now free
    and available to the garbage collector.




                           Log-structured FS
• Requires a large cache for read efficiency
• Write efficiency is obtained since the system is
  always writing to the end of the log file.
   – Why does this help?
• How does performance compare to Sun-FFS?
• What about crash recovery?
   – locate checkpointed imap and segment table, update
     from subsequent log entries (rely on timestamps)
   – cycle through timestamps until reach last checkpoint




                          Garbage Collection
• log wraps from end to start of disk
  necessitating GC
• GC reads segment and identifies valid entries
  which are written to tail, allowing segment to be
  freed.
• GC implemented by cleaner process which uses
  the ifile (system files holding the imap and
  segment table)




                            Assessing BSD-LFS
• all changed metadata may not fit into a single
  partial segment, which complicates recovery
• Blocks are allocated when the segment is written to disk, so
  the system must ensure blocks will be available at write
  time.
• Requires large physical memory for the large cache.
• BSD-LFS superior to FFS but compared to Sun-
  FFS advantages are less clear.
  – BSD-LFS faster at metadata operations
  – Sun-FFS faster with I/O intensive applications
  – comparable for general purpose use.



                          4.4BSD Portal FS


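[Figure: a user process opens /p/<path> through the Portal file system and
 receives an fd. The Portal file system passes <path> to the portal daemon,
 which returns an fd; the daemon side is labelled Sockets.]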




                     Stackable Filesystems


                          application                   application

       /mylocal
                            MyFS

                                                         /local
                                        UFS

• For a given mount point, there may now be
  several file systems
