                 File System Interface and Implementations

                                 Fred Kuhns
                          CS523 – Operating Systems




                                   WASHINGTON UNIVERSITY IN ST LOUIS




                   FS Framework in UNIX

• Provides persistent storage
• Facilities for managing data
    – file - abstraction for data container, supports
      sequential and random access
    – file system - permits organizing, manipulating and
      accessing files
• User interface specifies behavior and
  semantics of relevant system calls
    – Abstractions exported by the interface: files, directories,
      file descriptors and different file systems


Fred Kuhns (11/25/2003)         CS523 – Operating Systems              2




            Kernel, Files and Directories
• kernel provides control operations to name,
  organize and control access to files but it
  does not interpret contents
• Each running program has an associated current
  working directory, which permits use of relative
  pathnames. Otherwise complete (absolute) pathnames
  are required.
• File viewed as a collection of bytes
   – Applications requiring more structure must define
     and implement it themselves



                 Kernel, Files and Directories
    • files and directories form hierarchical tree
      structure name space.
        – with hard links the name space is really a directed acyclic graph
    • Directory entry for a file is known as a hard
      link.
        – Files may also have symbolic links
    • File may have one or more links
    • POSIX defines library routines {opendir(),
      readdir(), rewinddir(), closedir()}
                                     struct dirent {
                                       ino_t d_ino;                /* inode number */
                                       char d_name[NAME_MAX + 1];  /* null-terminated name */
                                     };
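
A minimal sketch of how the POSIX directory routines fit together, assuming a POSIX system. The function name list_directory is made up for illustration; opendir(), readdir() and closedir() are the routines named above.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/* Print every entry of the directory at `path`.
 * Returns the number of entries seen, or -1 if it cannot be opened. */
int list_directory(const char *path)
{
    assert(path != NULL);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* d_ino is the inode number, d_name the name of the link. */
        printf("%10lu  %s\n", (unsigned long)entry->d_ino, entry->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

Calling list_directory(".") lists the current working directory. Two names that refer to the same inode number are hard links to the same file.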




             File and Directory Organization


         /                              (each edge is a hard link)
         ├── bin
         │   └── sh
         ├── etc
         ├── dev
         ├── usr
         │   ├── local
         │   │   └── bin
         │   │       └── bash           (/usr/local/bin/bash)
         │   └── etc
         └── vmunix





                                     File Attributes
• Type – directory, regular file, FIFO, symbolic link, special.
• Reference count – number of hard links {link(), unlink()}
• size in bytes
• device id – device file resides on
• inode number - one inode per file, inodes are unique within a
  disk partition (device id)
• ownership - user and group id {chown()}
• access modes - Permissions and modes {chmod()}
     – {read, write, execute} for {owner, group, other}
• timestamps – three different timestamps: last access, last
  modification, last attribute change. {utime()}
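
The attributes above are what the stat() family returns. A rough sketch, assuming a POSIX system; file_type_name and print_attributes are illustrative names, not part of any standard API.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX/XSI declarations such as lstat() */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Map the type bits of st_mode to a short description. */
const char *file_type_name(mode_t mode)
{
    if (S_ISDIR(mode))  return "directory";
    if (S_ISREG(mode))  return "regular file";
    if (S_ISFIFO(mode)) return "FIFO";
    if (S_ISLNK(mode))  return "symbolic link";
    if (S_ISCHR(mode) || S_ISBLK(mode)) return "special (device)";
    return "other";
}

/* Print the attributes of `path`. lstat() is used so that a symbolic
 * link is reported itself rather than its target. Returns 0 or -1. */
int print_attributes(const char *path)
{
    assert(path != NULL);

    struct stat st;
    if (lstat(path, &st) != 0)
        return -1;

    printf("type:       %s\n", file_type_name(st.st_mode));
    printf("hard links: %lu\n", (unsigned long)st.st_nlink);
    printf("size:       %lld bytes\n", (long long)st.st_size);
    printf("device id:  %lu\n", (unsigned long)st.st_dev);
    printf("inode:      %lu\n", (unsigned long)st.st_ino);
    printf("owner:      uid %lu, gid %lu\n",
           (unsigned long)st.st_uid, (unsigned long)st.st_gid);
    printf("mode:       %04o\n", (unsigned)(st.st_mode & 07777));
    printf("times:      access %lld, modify %lld, change %lld\n",
           (long long)st.st_atime, (long long)st.st_mtime,
           (long long)st.st_ctime);
    return 0;
}
```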



                       Permissions and Modes
• Three mode flags = {suid, sgid, sticky}
   – suid
         • File: if set and the file is executable, a process that executes it
           runs with its effective user id set to the file's owner
         • Directory: not used
   – sgid
         • File: if set and the file is executable, the process runs with its
           effective group id set to the file's group. If sgid is set but the
           file is not executable, mandatory file/record locking is enabled
         • Directory: if set, new files inherit the group of the directory;
           otherwise they get the group of the creator
   – sticky
         • File: if set and the file is executable, keep a copy of the program
           in the swap area
         • Directory: if set and the directory is writable, a process may
           remove or rename a file only if its EUID owns the file or the
           directory, or if it has write permission for the file. Otherwise
           any process with write permission to the directory may remove or
           rename
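
A small sketch of how the three mode flags sit beside the nine permission bits, rendered the way `ls -l` shows them. It assumes a POSIX/XSI system that defines S_ISUID, S_ISGID and S_ISVTX; format_mode is an illustrative name.

```c
#define _XOPEN_SOURCE 700   /* S_ISVTX is an XSI extension */
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Write the permission string for `mode` into `out`, which must hold
 * 10 bytes (9 characters plus the terminator). */
void format_mode(mode_t mode, char out[10])
{
    assert(out != NULL);

    static const mode_t bits[9] = {
        S_IRUSR, S_IWUSR, S_IXUSR,   /* owner */
        S_IRGRP, S_IWGRP, S_IXGRP,   /* group */
        S_IROTH, S_IWOTH, S_IXOTH    /* other */
    };
    static const char letters[] = "rwxrwxrwx";

    for (int i = 0; i < 9; i++)
        out[i] = (mode & bits[i]) ? letters[i] : '-';

    /* The flags replace the execute column: lower case when the
     * execute bit is also set, upper case when it is not. */
    if (mode & S_ISUID) out[2] = (mode & S_IXUSR) ? 's' : 'S';
    if (mode & S_ISGID) out[5] = (mode & S_IXGRP) ? 's' : 'S';
    if (mode & S_ISVTX) out[8] = (mode & S_IXOTH) ? 't' : 'T';
    out[9] = '\0';
}
```

For example, mode 04755 is rendered as rwsr-xr-x and a world-writable sticky directory (01777) as rwxrwxrwt.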







                             User View of Files
 • File Descriptors (open, dup, dup2, fork)
     –   All I/O is through file descriptors
     –   references the open file object
     –   per process object
     –   file descriptors may be dup’ed {dup(), dup2()}, copied on fork
         {fork()} or passed to an unrelated process {see ioctl() or sendmsg(),
         recvmsg()}, permitting multiple descriptors to reference one object.
 • File Object - holds context
     – created by an open() system call
     – stores file offset
     – reference to vnode
 • vnode - abstract representation of a file
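
A sketch of what "multiple descriptors reference one object" means in practice: a dup()'ed descriptor shares the open file object, and therefore the file offset, with the original. It assumes a POSIX system with a writable /tmp; the function name and scratch file name are made up.

```c
#define _XOPEN_SOURCE 700   /* ask for POSIX declarations such as mkstemp() */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if an offset change made through one descriptor is visible
 * through its dup()'ed copy, 0 if not, and -1 if setup failed. */
int dup_shares_offset(void)
{
    char path[] = "/tmp/fd-demo-XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);          /* the file lives until its last descriptor closes */

    int copy = dup(fd);    /* second descriptor, same open file object */
    if (copy < 0) {
        close(fd);
        return -1;
    }
    assert(copy != fd);

    /* Advance the offset through the original descriptor only ... */
    ssize_t written = write(fd, "hello", 5);
    /* ... then query the current offset through the copy. */
    off_t seen = lseek(copy, 0, SEEK_CUR);

    close(copy);
    close(fd);
    return (written == 5 && seen == 5) ? 1 : 0;
}
```

Two independent open() calls on the same path would instead create two open file objects with separate offsets, both referring to one vnode.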








                                How it works
 fd = open(path, oflag, mode);     lseek(), read(), write() affect offset

 [Figure: a table of file descriptors {0, uf_ofile} ... {5, uf_ofile}
  references open file objects {*f_vnode, f_offset, f_count, ...}; each
  open file object references a vnode/vfs, the in-memory representation
  of the file.]
                                   Overview

                                    System calls
                                         |
                                   vnode interface
                                         |
 tmpfs        swapfs        UFS       HSFS          PCFS          RFS    /proc    NFS

 Backing store:  tmpfs, swapfs - anonymous memory;  UFS - disk;
                 HSFS - cdrom;  PCFS - diskette;
                 /proc - process address space

                                                               Example from Solaris




                              File Systems

   • File hierarchy composed of one or more File
     Systems
   • One File System is designated the Root File
        System
   • Each is attached at a mount point
   • A file cannot span multiple File Systems
   • Each File System resides on one logical disk








                               Logical Disks
• Viewed as linear sequence of fixed sized, randomly
  accessible blocks.
   – device driver maps FS blocks to underlying storage device.
   – created using newfs or mkfs utilities
• A file system must reside in a logical disk, however a logical
  disk need not contain a file system (for example the swap
  device).
• Typically a logical disk corresponds to a partition of a physical
  disk. However, a logical disk may:
   – map to multiple physical disks
   – be mirrored on several physical disks
   – be striped across multiple disks or use other RAID techniques.




                            File Abstraction
• Abstracts different types of I/O objects
   – for example directories, symbolic links, disks, terminals,
     printers, and pseudodevices (memory, pipes, sockets, etc.).
• Control interface includes fstat, ioctl, fcntl
• Symbolic links: file contains a pathname to the
  linked file/directory. {lstat(), symlink(), readlink()}
• Pipe and FIFO files:
   – FIFO created using mknod(), lives in the file system
     name space
   – Pipe created using pipe(), persists as long as it is open for
     reading or writing.






                       OO Style Interfaces

  Abstract base class:

  struct interface_t
  {
  // Common functions:
     open (), close ()
  // Common data:
     type, count
  // Pure virtual functions
     *ops (Null pointer)
  // Private data
     *data (Null pointer)
  }

  Instance of derived class:

  struct interface_t
  {
     open (), close ()
     type, count
     *ops   -> {my_read(), my_write(), my_init(), my_open(), …}
     *data  -> {device_no, free_list, lock, …}
  }
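
One way to express this object-oriented style in plain C is a base struct holding an ops table pointer and a private data pointer, both filled in by the "derived" type. This is only a sketch: the struct layout, the two operations and the my_* names are assumptions chosen for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

struct interface;   /* forward declaration for the ops signatures */

/* "Pure virtual" functions: each derived type supplies its own table. */
struct interface_ops {
    int (*read)(struct interface *self);
    int (*write)(struct interface *self, int value);
};

/* Base "class": common data plus two pointers the derived type sets. */
struct interface {
    int type;
    int count;                          /* reference count */
    const struct interface_ops *ops;    /* NULL in the abstract base */
    void *data;                         /* private data, NULL in the base */
};

/* Hypothetical derived type whose private data is one stored integer. */
struct my_data { int stored; };

static int my_read(struct interface *self)
{
    return ((struct my_data *)self->data)->stored;
}

static int my_write(struct interface *self, int value)
{
    ((struct my_data *)self->data)->stored = value;
    return 0;
}

static const struct interface_ops my_ops = { my_read, my_write };

/* "Constructor": bind the base object to the derived implementation. */
void my_init(struct interface *obj, struct my_data *priv)
{
    assert(obj != NULL && priv != NULL);
    obj->type = 1;
    obj->count = 1;
    obj->ops = &my_ops;
    obj->data = priv;
}

/* Callers go through the ops table and never see the derived type. */
int interface_read(struct interface *obj)
{
    return obj->ops->read(obj);
}

int interface_write(struct interface *obj, int value)
{
    return obj->ops->write(obj, value);
}
```

The vfs and vnode structures follow the same pattern: generic code calls through the ops pointer, and each file system interprets the private data pointer in its own way.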





  Sun’s (SVR4) Vfs/Vnode Framework
  • Concurrently support multiple file system
    types
  • transparent interoperation of different file
    systems within one file hierarchy
       – enable file sharing over network
       – abstract interface allowing easy integration of
         new file systems by vendors




                                Objectives
• Operation performed on behalf of current
  process
• Support serialized access, i.e. locking
• must be stateless
• must be reentrant
• encourage use of global resources (cache,
  buffer)
• support client server architectures
• use dynamic storage allocation






                           Vnode/vfs interface
• Define abstract interfaces
• vfs: Fundamental abstraction representing a file
  system to the kernel
   – Contains pointers to file system (vfs) dependent
     operations such as mount, unmount.
• vnode: Fundamental abstraction representing a
  file in the kernel
   – defines interface to the file, pointer to file system
     specific routines. Reference counted.
   – accessed in two ways:
         • 1) I/O related system calls
         • 2) pathname traversal







                              vfs Overview
  rootvfs -> struct vfs -> struct vfs          (one struct vfs per file system)

  struct vfs {
   *vfs_next,
   *vfs_vnodecovered,
   *vfs_ops,        -> fs dependent struct vfsops { *vfs_mount, *vfs_root, … }
   *vfs_data, …}    -> fs dependent private data

  struct vnode {
   *v_vfsp,
   *v_vfsmountedhere, …}

  Three vnodes are shown: / (root), /usr, and / (mounted fs)
                             Mounting a FS


  • mount(spec, dir, flags, type, dataptr,
    datalen);
  • SVR4 uses a global virtual file system switch
    table (vfssw)
  • allocate and initialize private data
  • initialize vfs struct
  • locate and initialize root vnode of FS in
    memory (VFS_ROOT)
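
A toy sketch of the switch-table idea: mount looks up the file system type by name, records that type's operations in the vfs struct and calls its mount routine. The structs are heavily simplified stand-ins, and toyfs and do_mount are hypothetical names; real kernels differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct vfs;   /* forward declaration for the ops signature */

/* File system dependent operations (only mount is shown). */
struct vfsops {
    int (*vfs_mount)(struct vfs *vfsp, const char *spec);
};

/* Simplified stand-in for the kernel's vfs struct. */
struct vfs {
    const struct vfsops *vfs_ops;
    const void *vfs_data;        /* file system private data */
};

/* One entry of the switch table: type name -> operations. */
struct vfssw_entry {
    const char *name;
    const struct vfsops *ops;
};

/* Hypothetical file system type used only for this example. */
static int toyfs_mount(struct vfs *vfsp, const char *spec)
{
    vfsp->vfs_data = spec;   /* a real fs would allocate private data here */
    return 0;
}

static const struct vfsops toyfs_ops = { toyfs_mount };

static const struct vfssw_entry vfssw[] = {
    { "toyfs", &toyfs_ops },
};

/* Find `type` in the switch table, initialize the vfs struct and call
 * the type's mount routine. Returns -1 for an unknown type. */
int do_mount(struct vfs *vfsp, const char *spec, const char *type)
{
    assert(vfsp != NULL && spec != NULL && type != NULL);

    for (size_t i = 0; i < sizeof vfssw / sizeof vfssw[0]; i++) {
        if (strcmp(vfssw[i].name, type) == 0) {
            vfsp->vfs_ops = vfssw[i].ops;
            return vfsp->vfs_ops->vfs_mount(vfsp, spec);
        }
    }
    return -1;
}
```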






                           Pathname traversal
• For each path component, path traversal must perform
  the following:
   –   Verify that the current vnode is a directory; if not, stop
   –   Invoke VOP_LOOKUP (for example ufs_lookup())
   –   If the component is found, return a pointer to its vnode
   –   If it is not found and it is the last component, return the vnode
       of the parent directory
   –   If it is not found and it is not the last component, fail with ENOENT
• If a component corresponds to a mount point, then locate the
  root vnode of the mounted fs.
• If a component is a symbolic link, then append the remaining path to
  the link's contents and continue the traversal
• vnode reference counts are incremented during lookup
• May use a Directory Lookup Cache (name to vnode)
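
A simplified sketch of component-by-component lookup over a toy in-memory tree. struct node stands in for a vnode and lookup_component plays the role of VOP_LOOKUP; both are invented for illustration. Mount points, symbolic links and the lookup cache are left out, and only the final node's reference count is incremented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 4

/* Toy stand-in for a vnode. */
struct node {
    const char *name;
    int is_dir;
    int refcount;
    struct node *children[MAX_CHILDREN];   /* NULL-terminated if not full */
};

/* Stand-in for VOP_LOOKUP: find one component in one directory. */
static struct node *lookup_component(struct node *dir,
                                     const char *name, size_t len)
{
    for (int i = 0; i < MAX_CHILDREN && dir->children[i] != NULL; i++) {
        const char *child = dir->children[i]->name;
        if (strlen(child) == len && strncmp(child, name, len) == 0)
            return dir->children[i];
    }
    return NULL;
}

/* Walk an absolute `path` starting at `root`. Returns the node found,
 * or NULL if a component is missing or a non-directory is traversed. */
struct node *traverse(struct node *root, const char *path)
{
    assert(root != NULL && path != NULL);

    struct node *cur = root;
    while (*path != '\0') {
        while (*path == '/')             /* skip separators */
            path++;
        if (*path == '\0')
            break;

        size_t len = strcspn(path, "/"); /* length of this component */
        if (!cur->is_dir)
            return NULL;                 /* not a directory: stop */

        cur = lookup_component(cur, path, len);
        if (cur == NULL)
            return NULL;                 /* corresponds to ENOENT */
        path += len;
    }

    cur->refcount++;                     /* the caller now holds a reference */
    return cur;
}
```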








              Other vfs/vnode interfaces
 • 4.4 BSD vfs/vnode interface
       – Adds state to interface
       – enhanced lookup
       – vnode locking across multiple operations
 • OSF/1
       – uses timestamps to optimize lookups




                          Local File Systems
• S5fs- System V file system. Based on the
  original implementation.
• FFS/UFS- BSD developed filesystem with
  optimized disk usage algorithms








                          S5fs - Disk layout

• Viewed as a linear array of blocks
• Typical disk block size 512, 1024, 2048
  bytes
• Physical block number is the block’s index in
  array
• the disk itself is addressed by cylinder, track and sector
• first few blocks are the boot area, which is
  followed by the inode list (fixed size)







                             Disk Layout


  [Figure: disk geometry, labeling the track, sector, heads, cylinder
   and platters, together with rotational speed and disk seek time]

                                 S5fs disk layout


        | boot area | superblock | inode list | data |

      Boot area - code to initialize and bootstrap the system

      Superblock - metadata for the file system: size of FS,
      size of inode list, number of free blocks/inodes,
      free block/inode list

      Inode list - linear array of 64-byte inode structs




                                s5fs - some details

      Directory: each entry is a 2-byte inode number and a 14-byte name

          inode     name
             8      .
            45      ..
             0      ""
           123      myfile

      On-disk inode: field (size in bytes)

          di_mode   (2)
          di_nlinks (2)
          di_uid    (2)
          di_gid    (2)
          di_size   (4)
          di_addr   (39)
          di_gen    (1)
          di_atime  (4)
          di_mtime  (4)
          di_ctime  (4)




                       Locating file data blocks
      Assume 1024-byte blocks. The inode's block address array has
      entries 0-12, 3 bytes per entry:

          entries 0-9    direct: each points to one data block
          entry 10       indirect: 256 links -> 256 blocks
          entry 11       double indirect: 256 x 256 links -> 64K blocks
          entry 12       triple indirect: 256 x 256 x 256 links -> 16M blocks

      3 B/index => 2^24 = 16M blocks, or 16 GB of data
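
The arithmetic above can be sketched as a function that tells how many indirect blocks must be followed to reach a given logical block. It assumes the figures above (10 direct entries, 256 links per indirect block); the function and macro names are made up.

```c
#include <assert.h>

#define NDIRECT       10     /* entries 0-9 point straight at data blocks */
#define PER_INDIRECT  256L   /* links held by one indirect block */

/* Number of indirect blocks followed to reach logical block `lbn`:
 * 0 = direct, 1 = indirect, 2 = double, 3 = triple,
 * -1 = beyond what the 13 entries can address. */
int indirection_level(long lbn)
{
    assert(lbn >= 0);

    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;

    long span = 1;
    for (int level = 1; level <= 3; level++) {
        span *= PER_INDIRECT;   /* 256, then 64K, then 16M blocks */
        if (lbn < span)
            return level;
        lbn -= span;
    }
    return -1;
}

/* Total number of blocks reachable through all 13 entries. */
long max_file_blocks(void)
{
    return NDIRECT
         + PER_INDIRECT
         + PER_INDIRECT * PER_INDIRECT
         + PER_INDIRECT * PER_INDIRECT * PER_INDIRECT;
}
```

With these numbers logical blocks 0-9 are direct, 10-265 need one indirect block, 266-65801 need two, and everything from 65802 up to the limit needs three.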
             S5fs Kernel Implementation
    • In-core inodes also include vnode, device
      id, inode number, flags
    • Inode lookup uses a hash queue based on
      inode number (may also use device number)
    • kernel locks inode for reading/writing
    • Read/Write use a buffer cache or VM








                           Problems with s5fs
• Superblock – contains essential information but is
  not replicated.
• On-disk inodes – inodes physically located at front
  of disk, may result in long seek times
• Disk block allocation – free block order is not
  optimized (blocks of a file may not be “close”)
• Disk block size – 512 or 1024 Byte blocks
• file name size – max of 14 chars








       Berkeley Fast File System - FFS
• Disk partition divided into cylinder groups
• superblocks restructured and replicated across
   partition
     – Constant information
     – cylinder group summary info such as free inodes and
       free blocks
• support block fragments – typical block size
   8KB, fragment can be as small as 512B
• Long file names
• new disk block allocation strategy



                    FFS Allocation strategy
• Goal: Collocate similar data/info
• attempt to locate file inodes in same cyl group as directory
• new directories created in different cyl groups
   – choose from list of groups with above average free inode counts
• attempt to place file data blocks and inode in same cyl
  group
• Change cyl group when file size reaches 48KB, and
  thereafter every 1 MB.
• allocate sequential blocks at a rotationally optimal position.
• Choose cyl group with “best” free count








                        Is FFS/UFS Better?
 • Measurements have shown substantial
   performance benefits over s5fs
 • FFS, however, is suboptimal when the disk is
   nearly full. Thus 10% is always kept free.
 • Modern disks however, no longer match the
   underlying assumptions of FFS








                   Traditional Buffer Cache


   [Figure: buffer cache with a hash keyed on (device, inode) and a
    free list kept in LRU order]




     Other Limitations of s5fs and FFS

     • Performance: hardware designs and
       modern architectures have redefined the
       computing environment
     • Crash recovery: do you like waiting for
       fsck?
     • Security: do we need more than just 7
       bits?
     • File size limitations







                            Performance Issues
 • FFS has a target rotational delay which
   estimates the time spent by kernel calculating
   the next read/write.
     – alternative is to read/write entire track
     – factor in that many disks have built-in caches
 • Due to the buffer cache, most disk I/O
   operations are writes. Note, given locality of
   reference assumptions most writes should be
   deferred.
 • Synchronous writes of metadata
 • Disk head seeks are expensive





                             Sun-FFS (cluster)
• Goal: Cluster I/O Operations to improve
  performance
• Keeps disk block allocator
• Assume rotational interleaving is not necessary:
  – sets rotational delay to 0
  – store cluster size in superblock, overloading maxcontig
• read clustering: read in physically contiguous blocks
  for file up to maxcontig blocks.
• write clustering: pages are left in cache until
  either a synchronous write is necessary or
  contigsize blocks can be written.


               4.4BSD Log-Structured FS
• Entire disk dedicated to log – completely
  describes the file system.
     – Log divided into segments, with each segment pointing
       to the next (non-contiguous segments)
     – all writes are to tail of log file
• garbage collection by a cleaner daemon to
  permit the log to wrap around.
• Segment describes physical partitioning of disk
  and is comprised of partial segments.







                           BSD-LFS
• Directory and inode structures retained, issue is
  locating inodes
    – inodes written to disk as part of log, modified inodes
      written to a new location on disk.
• Requires new data structure: inode map. A map of
  all inodes and their location on disk. Map is
  periodically written to disk (checkpointed).








                           Segments
    • Segment usage table: contains Bytes
      stored in segment and time of last
      modification
    • partial segment is an atomic write and
      contains
         – checksum,
         – for each file with data blocks in segment the
           inode number, version and logical block
           numbers.
         – disk address of each inode contained in PS



                            Example Write

    • Dirty buffers are collected until there are enough
      to fill a segment.
    • Logical blocks are ordered, the inode is updated
      and the segment is written to the tail of the log. Old
      copies of the file blocks and inode are now free
      and available to the garbage collector.








                           Log-structured FS
• Requires a large cache for read efficiency
• Write efficiency is obtained since the system is
  always writing to the end of the log file.
   – Why does this help?
• How does performance compare to Sun-FFS?
• What about crash recovery?
   – locate checkpointed imap and segment table, update
     from subsequent log entries (rely on timestamps)
   – cycle through timestamps until reach last checkpoint








                           Garbage Collection
• log wraps from end to start of disk
  necessitating GC
• GC reads segment and identifies valid entries
  which are written to tail, allowing segment to be
  freed.
• GC implemented by cleaner process which uses
  the ifile (system files holding the imap and
  segment table)




                            Assessing BSD-LFS
• All changed metadata may not fit into a single
  partial segment, which complicates recovery
• Block allocation occurs when a segment is written to disk, so
  the FS must ensure blocks will be available when it is time to
  write.
• Requires large physical memory for the large cache.
• BSD-LFS is superior to FFS, but compared to Sun-FFS
  its advantages are less clear.
  – BSD-LFS faster at metadata operations
  – Sun-FFS faster with I/O intensive applications
  – comparable for general purpose use.







                              4.4BSD Portal FS


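   [Figure: a user process opens /p/<path> through the portal file system
    and gets back an fd; the portal file system passes <path> to the
    portal daemon, which returns an fd; the daemon sits on top of sockets]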








                      Stackable Filesystems


   [Figure: one application uses /mylocal, where MyFS is stacked on top
    of UFS; another application uses /local, served by UFS]

  • A given mount point may now have several file systems
    stacked on it


				