                                 File System Interface and Implementation

                                 Fred Kuhns
                          CS523 – Operating Systems

                                   WASHINGTON UNIVERSITY IN ST LOUIS

                   FS Framework in UNIX

• Provides persistent storage
• Facilities for managing data
    – file - abstraction for data container, supports
      sequential and random access
    – file system - permits organizing, manipulating and
      accessing files
• User interface specifies the behavior and
  semantics of the relevant system calls
    – Exported abstractions: files, directories,
      file descriptors and different file systems

Fred Kuhns (11/25/2003)         CS523 – Operating Systems              2

            Kernel, Files and Directories
• The kernel provides control operations to name,
  organize and control access to files, but it
  does not interpret their contents
• Each running program has an associated current
  working directory, which permits the use of relative
  pathnames; otherwise complete (absolute) pathnames
  are required.
• A file is viewed as a collection of bytes
   – Applications requiring more structure must define
     and implement it themselves

                 Kernel, Files and Directories
    • Files and directories form a hierarchical, tree-structured
      name space.
        – the tree forms a directed acyclic graph
    • A directory entry for a file is known as a hard link
        – Files may also have symbolic links
    • A file may have one or more links
    • POSIX defines the library routines {opendir(),
      readdir(), rewinddir(), closedir()}, which return entries of type
                                     struct dirent {
                                       ino_t d_ino;
                                       char d_name[NAME_MAX + 1];
                                     };
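A minimal sketch of the POSIX directory routines listed above, assuming a POSIX system. The helper names count_dir_entries and find_entry are hypothetical; each call opens the directory, walks it with readdir(), and closes it again.

```c
#include <dirent.h>
#include <string.h>
#include <sys/types.h>

/* Count the entries in a directory with opendir()/readdir()/closedir().
   Returns -1 if the directory cannot be opened. "." and ".." are
   returned by readdir() like any other entry. */
int count_dir_entries(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;
    int n = 0;
    while (readdir(dir) != NULL)
        n++;
    closedir(dir);
    return n;
}

/* Return the inode number (d_ino) recorded in the entry called name,
   or 0 if no such entry exists (0 is only a "not found" marker here). */
ino_t find_entry(const char *path, const char *name)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return 0;
    ino_t ino = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (strcmp(ent->d_name, name) == 0) {
            ino = ent->d_ino;    /* a hard link is just this name -> inode pair */
            break;
        }
    }
    closedir(dir);
    return ino;
}
```

Because a hard link is nothing more than a directory entry, two names that are hard links to the same file report the same d_ino.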

             File and Directory Organization

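   [Figure: example name space. The root / contains bin, etc, dev, usr
    and vmunix; usr contains local and etc. Each directory entry is a
    (hard) link to a file.]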

                                     File Attributes
• Type – directory, regular file, FIFO, symbolic link, special (device) file.
• Reference count – number of hard links {link(), unlink()}
• Size in bytes
• Device id – device the file resides on
• Inode number – one inode per file; inode numbers are unique within a
  disk partition (device id)
• Ownership – user and group id {chown()}
• Access modes – permissions and modes {chmod()}
     – {read, write, execute} for {owner, group, other}
• Timestamps – three different timestamps: last access, last
  modification, last attribute change. {utime() sets the first two}

                       Permissions and Modes
• Three mode flags = {suid, sgid, sticky}
   – suid –
         • File: if set and the file is executable, then executing it sets the
           process's effective user id to the file owner's id
         • Directory: not used
   – sgid –
         • File: if set and executable, then executing it sets the effective group id
           to the file's group. If sgid is set but the file is not group-executable,
           then mandatory file/record locking applies
         • Directory: if set, new files inherit the group of the directory; otherwise
           they get the group of the creator.
   – sticky –
         • File: if set on an executable file, keep a copy of the program in the swap area.
         • Directory: if set and the directory is writable, a file may be removed/renamed
           only if EUID = owner of the file/directory or the process has write permission
           for the file. Otherwise any process with write permission to the directory
           may remove or rename its files.


                             User View of Files
 • File descriptors (open, dup, dup2, fork)
     –   All I/O is through file descriptors
     –   a descriptor references an open file object
     –   descriptors are per-process objects
     –   file descriptors may be dup'ed {dup(), dup2()}, copied on fork
         {fork()} or passed to an unrelated process {ioctl() or sendmsg(),
         recvmsg()}, permitting multiple descriptors to reference one object.
 • File object – holds context
     – created by an open() system call
     – stores the file offset
     – holds a reference to the vnode
 • vnode – abstract representation of a file
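Because the offset lives in the open file object rather than in the descriptor, descriptors created by dup() share it. A minimal sketch, assuming a POSIX system (the function name offset_seen_by_dup is hypothetical): it writes 5 bytes through one descriptor and reads the current offset through the duplicate.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write through one descriptor, then ask the dup'ed descriptor for its
   offset. Returns that offset, or -1 on error. */
off_t offset_seen_by_dup(void)
{
    FILE *f = tmpfile();                 /* anonymous temporary file */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    int fd2 = dup(fd);                   /* new descriptor, same open file object */
    off_t off = -1;
    if (fd2 >= 0 && write(fd, "hello", 5) == 5)
        off = lseek(fd2, 0, SEEK_CUR);   /* offset is shared, so this is 5 */
    if (fd2 >= 0)
        close(fd2);
    fclose(f);
    return off;
}
```

Opening the same path twice with open() would instead create two file objects, each with its own offset.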


                                How it works
 fd = open(path, oflag, mode);       lseek(), read() and write() affect the offset

     File descriptors (per process)        Open file objects
       {0, uf_ofile}                         {*f_vnode, f_offset, f_count, ...}
       {1, uf_ofile}                         {*f_vnode, f_offset, f_count, ...}
       {2, uf_ofile}                         {*f_vnode, f_offset, f_count, ...}
       {3, uf_ofile}
       {4, uf_ofile}
       {5, uf_ofile}

   [Figure: each descriptor's uf_ofile entry points to an open file object,
    and several descriptors may share one object. Each object's f_vnode
    points to a vnode/vfs, the in-memory representation of the file.]

                                    System calls
                                         |
                                   vnode interface
                                         |
   tmpfs, swapfs, UFS (disk), HSFS (cdrom), PCFS (diskette), RFS,
   /proc (process), NFS

                                                               Example from Solaris

                              File Systems

   • A file hierarchy is composed of one or more file systems
   • One file system is designated the root file system
   • Other file systems are attached to mount points
   • A file cannot span multiple file systems
   • A file system resides on one logical disk


                               Logical Disks
• Viewed as a linear sequence of fixed-size, randomly
  accessible blocks.
   – the device driver maps FS blocks to the underlying storage device.
   – file systems are created using the newfs or mkfs utilities
• A file system must reside in a logical disk; however, a logical
  disk need not contain a file system (for example, the swap area).
• Typically a logical disk corresponds to a partition of a physical
  disk. However, a logical disk may:
   – map to multiple physical disks
   – be mirrored on several physical disks
   – be striped across multiple disks or use other RAID techniques.
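Striping is the case where the driver's block mapping is most visible. The sketch below assumes a simple RAID-0 layout (the struct and function names are hypothetical): logical blocks are grouped into stripe units of stripe_unit blocks, and the units are dealt round-robin across ndisks disks.

```c
/* Physical location of a logical block on a striped logical disk. */
struct phys_addr {
    unsigned disk;           /* which physical disk */
    unsigned long block;     /* block number on that disk */
};

/* Map a logical block number to (disk, block) for a RAID-0 style stripe
   of ndisks disks with stripe_unit blocks per disk per stripe. */
struct phys_addr stripe_map(unsigned long lblock, unsigned ndisks, unsigned stripe_unit)
{
    unsigned long unit = lblock / stripe_unit;     /* stripe unit index overall */
    unsigned long within = lblock % stripe_unit;   /* block offset inside the unit */
    struct phys_addr p;
    p.disk = (unsigned)(unit % ndisks);            /* units rotate over the disks */
    p.block = (unit / ndisks) * stripe_unit + within;
    return p;
}
```

With 4 disks and 16-block units, logical block 70 falls in unit 4, which wraps back to disk 0 at block 16 + 6 = 22. A mirrored logical disk, by contrast, maps each block to the same block number on every copy.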

                            File Abstraction
• Abstracts different types of I/O objects
   – for example directories, symbolic links, disks, terminals,
     printers, and pseudodevices (memory, pipes, sockets, etc.).
• Control interface includes fstat(), ioctl(), fcntl()
• Symbolic links: the file contains a pathname of the
  linked file/directory. {lstat(), symlink(), readlink()}
• Pipe and FIFO files:
   – a FIFO is created using mknod() and lives in the file system
     name space
   – a pipe is created using pipe() and persists as long as it is opened
     for reading or writing.
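A minimal sketch of an anonymous pipe, assuming a POSIX system (pipe_roundtrip is a hypothetical name). Both ends stay in one process here, so the message must be small enough to fit in the pipe without blocking.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Send msg through a new pipe and read it back into buf.
   Returns the number of bytes read, or -1 on error.
   msg must be short (well under the pipe capacity) or write() would block. */
ssize_t pipe_roundtrip(const char *msg, char *buf, size_t bufsize)
{
    int fds[2];                          /* fds[0] = read end, fds[1] = write end */
    if (pipe(fds) == -1)
        return -1;
    size_t len = strlen(msg);
    ssize_t n = -1;
    if (write(fds[1], msg, len) == (ssize_t)len)
        n = read(fds[0], buf, bufsize);  /* bytes come out in FIFO order */
    close(fds[0]);
    close(fds[1]);                       /* pipe disappears once both ends are closed */
    return n;
}
```

A FIFO behaves the same way once opened, but because it has a name in the file system, unrelated processes can open it by path.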


                       OO Style Interfaces

     Abstract base class                  Instance of derived class
  struct interface_t {                 struct interface_t {
  // Common functions:                   open(), close()
     open(), close()                     type, count
  // Common data:                        *ops  -> {my_read(), my_write(),
     type, count                                   my_init(), my_open(), …}
  // Pure virtual functions              *data -> {device_no, free_list,
     *ops (null pointer)                           lock, …}
  // Private data                      }
     *data (null pointer)
  }
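In C the "pure virtual functions" become a table of function pointers. A minimal sketch under stated assumptions: only a read operation is modeled, and struct interface, struct my_data, my_read, my_init and interface_read are hypothetical names mirroring the figure.

```c
#include <string.h>

struct interface;   /* forward declaration */

/* Operations every derived class must supply. */
struct interface_ops {
    int (*read)(struct interface *self, char *buf, int len);
};

/* "Base class": common data, an ops pointer and private data. */
struct interface {
    int type;
    int count;
    const struct interface_ops *ops;   /* NULL in the abstract base */
    void *data;                        /* implementation-specific state */
};

/* One hypothetical derived class: returns a fixed string on read. */
struct my_data { const char *text; };

static int my_read(struct interface *self, char *buf, int len)
{
    struct my_data *d = self->data;
    int n = (int)strlen(d->text);
    if (n > len)
        n = len;
    memcpy(buf, d->text, (size_t)n);
    return n;
}

static const struct interface_ops my_ops = { my_read };

/* "Constructor" for the derived class. */
void my_init(struct interface *obj, struct my_data *d)
{
    obj->type = 1;      /* arbitrary type tag */
    obj->count = 1;
    obj->ops = &my_ops;
    obj->data = d;
}

/* Generic code dispatches through the table without knowing the type. */
int interface_read(struct interface *obj, char *buf, int len)
{
    if (obj->ops == NULL || obj->ops->read == NULL)
        return -1;      /* abstract object: no implementation */
    return obj->ops->read(obj, buf, len);
}
```

The vnode and vfs structures follow the same pattern: a pointer to an operations vector plus a pointer to file-system-private data.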


  Sun’s (SVR4) Vfs/Vnode Framework
  • Concurrently support multiple file system types
  • Transparent interoperation of different file
    systems within one file hierarchy
       – enable file sharing over a network
       – abstract interface allowing easy integration of
         new file systems by vendors

• Each operation is performed on behalf of the current process
• Support serialized access, i.e. locking
• The interface must be stateless
• The interface must be reentrant
• Encourage use of global resources (such as the buffer cache)
• Support client-server architectures
• Use dynamic storage allocation


                           Vnode/vfs interface
• Defines abstract interfaces
• vfs: fundamental abstraction representing a file
  system to the kernel
   – contains pointers to file system (vfs) dependent
     operations such as mount and unmount.
• vnode: fundamental abstraction representing a
  file in the kernel
   – defines the interface to the file and holds a pointer to
     file-system-specific routines. Reference counted.
   – accessed in two ways:
         • 1) I/O related system calls
         • 2) pathname traversal


                              vfs Overview

  rootvfs --> struct vfs {          --vfs_next-->   struct vfs {
                *vfs_next,                             *vfs_next,
                *vfs_vnodecovered,                     *vfs_vnodecovered,
                *vfs_ops,                              *vfs_ops,
                *vfs_data, …}                          *vfs_data, …}

  [Figure: each vfs_ops points to an fs-dependent struct vfsops {*vfs_mount,
   *vfs_root, …} and each vfs_data to fs private data. Three vnodes,
   struct vnode {*v_vfsp, *v_vfsmountedhere, …}, represent / (root), /usr,
   and / (root of the mounted fs). The /usr vnode's v_vfsmountedhere points
   to the second vfs, whose vfs_vnodecovered points back to /usr.]
                             Mounting a FS

  • mount(spec, dir, flags, type, dataptr, datalen)
  • SVR5 uses a global virtual file system switch
    table (vfssw)
  • allocate and initialize private data
  • initialize vfs struct
  • locate and initialize root vnode of FS in
    memory (VFS_ROOT)


                           Pathname traversal
• Path traversal must, for each path component, perform
  the following:
   –   verify the vnode is a directory; if not, stop
   –   invoke VOP_LOOKUP (ufs_lookup() for UFS)
   –   if the component is found, return a pointer to its vnode
   –   if not found and it is the last component, return the vnode of the parent directory
   –   otherwise (not found and not the last component), return an ENOENT error.
• If a component corresponds to a mount point, then locate the
  root vnode of the mounted fs.
• If a component is a symbolic link, then append the rest of the path
  to the link's contents and continue
• vnode reference counts are incremented during lookup
• May use a Directory Lookup Cache (name to vnode)
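The traversal loop can be sketched over a toy in-memory tree. Everything here is hypothetical: toy_lookup stands in for VOP_LOOKUP, NULL plays the role of ENOTDIR/ENOENT, and symbolic links, "..", the lookup cache and the "return the parent for a missing last component" case are deliberately left out.

```c
#include <stddef.h>
#include <string.h>

enum vtype { VDIR, VREG };

/* Toy vnode: just enough state to walk a path. */
struct vnode {
    enum vtype type;
    const char *name;             /* component name within the parent */
    struct vnode *children[8];    /* directory entries (VDIR only) */
    int nchildren;
    struct vnode *mountedhere;    /* root of a file system mounted here, or NULL */
    int refcount;
};

/* Stand-in for VOP_LOOKUP: search one directory for one component. */
static struct vnode *toy_lookup(struct vnode *dir, const char *comp, size_t len)
{
    for (int i = 0; i < dir->nchildren; i++) {
        struct vnode *c = dir->children[i];
        if (strlen(c->name) == len && strncmp(c->name, comp, len) == 0)
            return c;
    }
    return NULL;
}

/* Resolve an absolute path one component at a time. The returned vnode
   is held (refcount incremented); NULL means the lookup failed. */
struct vnode *traverse(struct vnode *root, const char *path)
{
    struct vnode *vp = root;
    vp->refcount++;                           /* hold the starting directory */
    const char *p = path;
    for (;;) {
        while (*p == '/')
            p++;                              /* skip separators */
        if (*p == '\0')
            return vp;                        /* all components consumed */
        size_t len = strcspn(p, "/");         /* length of next component */
        if (vp->type != VDIR) {               /* not a directory: stop */
            vp->refcount--;
            return NULL;
        }
        struct vnode *next = toy_lookup(vp, p, len);
        if (next == NULL) {                   /* component not found */
            vp->refcount--;
            return NULL;
        }
        while (next->mountedhere != NULL)     /* cross into the mounted fs */
            next = next->mountedhere;
        next->refcount++;                     /* hold the child ... */
        vp->refcount--;                       /* ... then release the parent */
        vp = next;
        p += len;
    }
}
```

Holding the child before releasing the parent mirrors the reference-counting rule above: the caller ends up owning exactly one reference to the vnode it gets back.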


              Other vfs/vnode interfaces
 • 4.4 BSD vfs/vnode interface
       – Adds state to interface
       – enhanced lookup
       – vnode locking across multiple operations
 • OSF/1
       – uses timestamps to optimize lookups

                          Local File Systems
• s5fs – System V file system, based on the
  original UNIX implementation.
• FFS/UFS – BSD-developed file system with
  optimized disk usage algorithms


                          S5fs - Disk layout

• Viewed as a linear array of blocks
• Typical disk block sizes: 512, 1024 or 2048 bytes
• A block's physical block number is its index in this array
• The disk itself is addressed by cylinder, track and sector
• The first few blocks are the boot area, which is followed
  by the superblock and the inode list (fixed size)


                             Disk Layout

   [Figure: disk geometry showing tracks, sectors, heads and platters,
    annotated with rotational speed and disk seek time.]

                                 S5fs disk layout

              boot area | superblock | inode list | data

      Boot area – code to bootstrap (initialize) the system

      Superblock – metadata for the file system: size of the FS,
      size of the inode list, number of free blocks/inodes,
      free block/inode lists

      Inode list – linear array of 64-byte inode structs
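Because the inode list is a fixed array of 64-byte entries, an inode number maps directly to a disk location. A minimal sketch, assuming 1024-byte blocks (16 inodes per block), one boot block followed by one superblock block (so the inode list starts at block 2), and inode numbering from 1; all macro and function names are hypothetical.

```c
/* Assumed s5fs-like parameters; only the 64-byte inode size is from the slide. */
#define BLOCK_SIZE      1024
#define INODE_SIZE      64
#define INODES_PER_BLK  (BLOCK_SIZE / INODE_SIZE)   /* 16 */
#define ILIST_START     2      /* block 0: boot area, block 1: superblock */

struct inode_loc {
    unsigned long block;       /* disk block holding the inode */
    unsigned offset;           /* byte offset of the inode within that block */
};

/* Inode numbers start at 1, so inode n is entry n-1 of the inode list. */
struct inode_loc inode_location(unsigned long ino)
{
    struct inode_loc loc;
    loc.block = ILIST_START + (ino - 1) / INODES_PER_BLK;
    loc.offset = (unsigned)((ino - 1) % INODES_PER_BLK) * INODE_SIZE;
    return loc;
}
```

Under these assumptions inode 123 is entry 122, i.e. block 2 + 7 = 9 at byte offset 10 × 64 = 640. The same arithmetic explains why the inode list must be fixed in size: growing it would shift the data area.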

                                s5fs - some details

      Directory (2-byte inode number, 14-byte name):
          inode    name
            8      .
           45      ..
            0      ""        (unused slot)
          123      myfile

      On-disk inode (field sizes in bytes, 64 bytes in total):
          di_mode (2), di_nlinks (2), di_uid (2), di_gid (2), di_size (4),
          di_addr (39), di_gen (1), di_atime (4), di_mtime (4), di_ctime (4)
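An s5fs directory is just an array of the 16-byte entries shown above, and lookup is a linear scan. A minimal sketch with hypothetical names (struct s5_direct, s5_dir_lookup); it assumes native byte order and treats inode 0 as an unused slot. Note that a name of exactly 14 characters fills d_name with no terminating NUL, so comparisons must be bounded.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* s5fs-style directory entry: 2-byte inode number + 14-byte name = 16 bytes. */
struct s5_direct {
    uint16_t d_ino;        /* 0 marks an unused (deleted) slot */
    char     d_name[14];   /* NUL-padded; not NUL-terminated if 14 chars long */
};

/* Linear scan of n directory entries; returns the inode number or 0. */
uint16_t s5_dir_lookup(const struct s5_direct *ents, size_t n, const char *name)
{
    if (strlen(name) > sizeof ents[0].d_name)
        return 0;                          /* names are limited to 14 chars */
    for (size_t i = 0; i < n; i++) {
        if (ents[i].d_ino == 0)
            continue;                      /* skip free slots */
        if (strncmp(ents[i].d_name, name, sizeof ents[i].d_name) == 0)
            return ents[i].d_ino;          /* bounded compare handles 14-char names */
    }
    return 0;
}
```

With the example directory from the slide, looking up "myfile" yields inode 123 and ".." yields 45; the 14-character limit here is exactly the "file name size" problem listed for s5fs.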

                       Locating file data blocks
      Assume 1024-byte blocks. di_addr holds 13 block addresses of
      3 bytes each (3 B/index => 2^24 = 16M blocks, or 16 GB of data).
      Each indirect block holds 256 links (block addresses).

          0 - 9   direct blocks
          10      single indirect: 256 blocks
          11      double indirect: 256 x 256 = 64K blocks
          12      triple indirect: 256 x 256 x 256 = 16M blocks
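The table above determines how many indirect blocks must be read to reach a given logical block. A minimal sketch of that calculation, assuming the slide's layout of 10 direct addresses followed by single, double and triple indirect, with 256 addresses per indirect block; bmap_level and max_file_blocks are hypothetical names.

```c
#define NDIRECT  10       /* direct addresses in the inode */
#define NINDIR   256UL    /* addresses per 1024-byte indirect block */

/* Level needed to reach logical block lbn:
   0 = direct, 1 = single, 2 = double, 3 = triple indirect, -1 = beyond the file limit. */
int bmap_level(unsigned long lbn)
{
    if (lbn < NDIRECT)
        return 0;
    lbn -= NDIRECT;
    unsigned long span = NINDIR;          /* blocks reachable at this level */
    for (int level = 1; level <= 3; level++) {
        if (lbn < span)
            return level;
        lbn -= span;                      /* skip past this level's blocks */
        span *= NINDIR;
    }
    return -1;
}

/* Largest file, in blocks, reachable through the 13 inode addresses. */
unsigned long max_file_blocks(void)
{
    return NDIRECT + NINDIR + NINDIR * NINDIR + NINDIR * NINDIR * NINDIR;
}
```

max_file_blocks() evaluates to 10 + 256 + 65536 + 16777216 = 16843018 blocks; logical block 266 (byte offset 266 × 1024) is the first one that needs a double indirect lookup.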
             S5fs Kernel Implementation
    • In-core inodes – also include a vnode, the device
      id, inode number and flags
    • Inode lookup uses hash queues based on the
      inode number (and possibly the device number)
    • The kernel locks an inode for reading/writing
    • Read/write use the buffer cache or VM
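The hash-queue lookup of in-core inodes can be sketched as a chained hash table keyed on (device, inode number). The queue count and the hash function (ino + dev) % NHASH are hypothetical choices, and the locking mentioned above is omitted.

```c
#include <stddef.h>

#define NHASH 64   /* hypothetical number of hash queues */

struct incore_inode {
    unsigned dev;                  /* device id */
    unsigned long ino;             /* inode number */
    struct incore_inode *hnext;    /* next inode on the same hash queue */
};

static struct incore_inode *hashq[NHASH];

/* Hash on inode number and device number. */
static unsigned ihash(unsigned dev, unsigned long ino)
{
    return (unsigned)((ino + dev) % NHASH);
}

/* Put an in-core inode on the front of its hash queue. */
void iinsert(struct incore_inode *ip)
{
    unsigned h = ihash(ip->dev, ip->ino);
    ip->hnext = hashq[h];
    hashq[h] = ip;
}

/* Return the cached inode for (dev, ino), or NULL if it is not in core.
   Both fields are compared, since inode numbers repeat across devices. */
struct incore_inode *ifind(unsigned dev, unsigned long ino)
{
    for (struct incore_inode *ip = hashq[ihash(dev, ino)]; ip != NULL; ip = ip->hnext)
        if (ip->dev == dev && ip->ino == ino)
            return ip;
    return NULL;
}
```

On a miss, a real kernel would allocate a free in-core inode, read the on-disk inode into it and insert it with iinsert().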


                           Problems with s5fs
• Superblock – contains essential information but is
  not replicated.
• On-disk inodes – inodes are physically located at the front
  of the disk, which may result in long seek times
• Disk block allocation – free block order is not
  optimized (blocks of a file may not be “close”)
• Disk block size – 512- or 1024-byte blocks
• File name size – max of 14 chars


       Berkeley Fast File System - FFS
• Disk partition divided into cylinder groups
• Superblock restructured and replicated across cylinder groups
     – constant information
     – cylinder group summary info, such as free inodes and
       free blocks
• Supports block fragments – typical block size is
   8KB; a fragment can be as small as 512B
• Long file names
• New disk block allocation strategy
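Fragments matter mainly for the tail of a file. A simplified sketch (hypothetical function names): full blocks hold the body of the file and the remainder is rounded up to whole fragments, compared with rounding the whole file up to blocks. Real FFS places further restrictions on when fragments are used; those are ignored here.

```c
/* Bytes allocated for a file of `size` bytes when the tail is stored in
   fragments of `frag` bytes and the rest in blocks of `block` bytes. */
unsigned long ffs_alloc_bytes(unsigned long size, unsigned long block, unsigned long frag)
{
    unsigned long full = size / block;                /* whole blocks */
    unsigned long tail = size % block;                /* leftover bytes */
    unsigned long nfrags = (tail + frag - 1) / frag;  /* round tail up to fragments */
    return full * block + nfrags * frag;
}

/* Bytes allocated for the same file using whole blocks only. */
unsigned long blocks_only_bytes(unsigned long size, unsigned long block)
{
    return (size + block - 1) / block * block;
}
```

For an 18000-byte file with 8KB blocks and 1KB fragments, ffs_alloc_bytes gives 2 × 8192 + 2 × 1024 = 18432 bytes, while whole blocks alone would need 24576.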

                    FFS Allocation strategy
• Goal: collocate related data/information
• Attempt to locate file inodes in the same cyl group as their directory
• New directories are created in different cyl groups
   – choose from the list of groups with above-average free inode counts
• Attempt to place file data blocks and the inode in the same cyl group
• Change cyl group when the file size reaches 48KB, and
  thereafter every 1 MB.
• Allocate sequential blocks at rotationally optimal positions.
• Choose the cyl group with the “best” free count
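Two of these rules can be written as small functions. A sketch under stated assumptions: cg_switches counts group changes using the 48KB / 1 MB thresholds from the slide, and pick_dir_cg considers groups with at least the average free inode count (so some group always qualifies); breaking ties by the fewest existing directories is our assumption, not something stated on the slide. Both names are hypothetical.

```c
#define KB 1024UL
#define MB (1024UL * KB)

/* Number of cylinder group changes made by the time a file reaches
   `offset` bytes: first at 48KB, then every 1MB thereafter. */
unsigned long cg_switches(unsigned long offset)
{
    if (offset < 48 * KB)
        return 0;
    return 1 + (offset - 48 * KB) / MB;
}

/* Pick a cylinder group for a new directory: among groups whose free
   inode count is at least the average, take the one with the fewest
   directories. Returns -1 if there are no groups. */
int pick_dir_cg(const unsigned *free_inodes, const unsigned *ndirs, int ngroups)
{
    if (ngroups <= 0)
        return -1;
    unsigned long total = 0;
    for (int i = 0; i < ngroups; i++)
        total += free_inodes[i];
    unsigned long avg = total / (unsigned long)ngroups;
    int best = -1;
    for (int i = 0; i < ngroups; i++) {
        if (free_inodes[i] < avg)
            continue;                       /* below average: skip */
        if (best == -1 || ndirs[i] < ndirs[best])
            best = i;
    }
    return best;
}
```

Spreading directories this way leaves room in each group for the files that will later be placed next to their directory.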


                        Is FFS/UFS Better?
 • Measurements have shown substantial
   performance benefits over s5fs
 • FFS, however, is suboptimal when the disk is
   nearly full. Thus 10% is always kept free.
 • Modern disks, however, no longer match the
   underlying assumptions of FFS


                   Traditional Buffer Cache

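   [Figure: buffers are located through hash queues keyed on (device, block
    number) and are reused from a free list in LRU order.]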
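A minimal sketch of LRU replacement in a tiny buffer cache. For brevity it finds buffers with a linear scan instead of hash queues and uses a logical clock instead of a free list; dirty-buffer write-back, locking and the actual disk read are omitted. NBUF, struct buf and getblk here are hypothetical.

```c
#define NBUF 4   /* hypothetical cache size */

struct buf {
    int valid;
    unsigned dev;
    unsigned long blkno;
    unsigned long last_used;   /* logical clock value of the latest access */
};

static struct buf cache[NBUF];
static unsigned long clock_tick;

/* Return the buffer for (dev, blkno); *hit reports whether it was cached.
   On a miss, an empty buffer or else the least recently used one is reused
   (a real cache would write it back if dirty and read the new block). */
struct buf *getblk(unsigned dev, unsigned long blkno, int *hit)
{
    struct buf *victim = &cache[0];
    for (int i = 0; i < NBUF; i++) {
        struct buf *bp = &cache[i];
        if (bp->valid && bp->dev == dev && bp->blkno == blkno) {
            bp->last_used = ++clock_tick;   /* move to most recently used */
            *hit = 1;
            return bp;
        }
        /* prefer an empty buffer, otherwise the oldest valid one */
        if (!bp->valid || (victim->valid && bp->last_used < victim->last_used))
            victim = bp;
    }
    victim->valid = 1;
    victim->dev = dev;
    victim->blkno = blkno;
    victim->last_used = ++clock_tick;
    *hit = 0;
    return victim;
}
```

After blocks 10, 11, 12, 13 are cached and block 10 is touched again, a request for block 14 evicts block 11, the least recently used.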

     Other Limitations of s5fs and FFS

     • Performance- hardware designs and
       modern architectures have redefined the
       computing environment
     • Crash recovery – do you like waiting for fsck?
     • Security – do we need more than just 7
     • File Size limitations


                            Performance Issues
 • FFS has a target rotational delay, which
   estimates the time the kernel spends calculating
   the next read/write.
     – the alternative is to read/write an entire track
     – factor in that many disks have built-in caches
 • Due to the buffer cache, most disk I/O
   operations are writes. Note, given locality of
   reference assumptions most writes should be
 • Synchronous writes of metadata
 • Disk head seeks are expensive


                             Sun-FFS (cluster)
• Goal: cluster I/O operations to improve performance
• Keeps the FFS disk block allocator
• Assumes rotational interleaving is not necessary:
  – sets rotational delay to 0
  – stores the cluster size in the superblock, overloading maxcontig
• Read clustering: read in physically contiguous blocks
  of a file, up to maxcontig blocks.
• Write clustering: pages are left in the cache until
  either a synchronous write is necessary or
  contigsize blocks can be written.
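The benefit of write clustering can be estimated by counting write operations. A minimal sketch (cluster_writes is a hypothetical name): dirty blocks, sorted by physical block number, are grouped into runs of consecutive blocks, and each run is capped at maxcontig blocks per write.

```c
#include <stddef.h>

/* Number of writes needed to flush n dirty blocks (sorted by physical
   block number) when consecutive blocks are grouped into clusters of at
   most maxcontig blocks. */
size_t cluster_writes(const unsigned long *blks, size_t n, size_t maxcontig)
{
    size_t ios = 0, run = 0;
    for (size_t i = 0; i < n; i++) {
        int contiguous = (i > 0 && blks[i] == blks[i - 1] + 1);
        if (contiguous && run < maxcontig) {
            run++;             /* extend the current cluster */
        } else {
            ios++;             /* start a new write */
            run = 1;
        }
    }
    return ios;
}
```

For dirty blocks 1 through 8 plus 20 and 21, maxcontig = 4 needs 3 writes, maxcontig = 1 (no clustering) needs 10, and a large maxcontig needs only 2.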

               4.4BSD Log-Structured FS
• Entire disk dedicated to the log, which completely
  describes the file system.
     – the log is divided into segments, with each segment pointing
       to the next (segments need not be contiguous)
     – all writes go to the tail of the log
• Garbage collection by a cleaner daemon
  permits the log to wrap around.
• A segment describes a physical partition of the disk
  and is made up of partial segments.


• Directory and inode structures are retained; the issue is
  locating inodes
    – inodes are written to disk as part of the log, so a modified
      inode is written to a new location on disk.
• Requires a new data structure: the inode map, a map of
  all inodes and their locations on disk. The map is
  periodically written to disk (checkpointed).
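A toy model of the inode map, assuming a tiny fixed-size log in memory (all names and sizes are hypothetical). Every inode write is appended at the tail and the imap is redirected to it; an older copy stays in the log but is dead, which is exactly the liveness test the cleaner needs.

```c
#define MAX_INODES 16
#define LOG_SIZE   64

/* Each log slot records which inode (and which version) was written there. */
struct log_entry { unsigned ino; unsigned version; };

static struct log_entry log_area[LOG_SIZE];
static unsigned log_tail;             /* next free slot: all writes go here */
static int imap[MAX_INODES];          /* inode number -> log address, -1 if none */

void lfs_init(void)
{
    log_tail = 0;
    for (int i = 0; i < MAX_INODES; i++)
        imap[i] = -1;
}

/* Append a new copy of an inode at the tail and point the imap at it.
   Returns the new log address, or -1 if the inode number or log is out of range. */
int lfs_write_inode(unsigned ino, unsigned version)
{
    if (ino >= MAX_INODES || log_tail >= LOG_SIZE)
        return -1;
    int addr = (int)log_tail++;
    log_area[addr].ino = ino;
    log_area[addr].version = version;
    imap[ino] = addr;                 /* older copies are now garbage */
    return addr;
}

/* Current log address of an inode, or -1. */
int lfs_lookup(unsigned ino)
{
    return ino < MAX_INODES ? imap[ino] : -1;
}

/* A log entry is live only if the imap still points at it. */
int lfs_is_live(int addr)
{
    return addr >= 0 && (unsigned)addr < log_tail
        && imap[log_area[addr].ino] == addr;
}
```

Writing inode 3, then inode 5, then inode 3 again leaves the first copy of inode 3 dead: the cleaner can skip it, while live entries would be copied to the tail.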


    • Segment usage table: contains the number of bytes
      stored in each segment and the time of its last
      modification
    • A partial segment is an atomic write and carries a summary with:
         – a checksum,
         – for each file with data blocks in the segment, the
           inode number, version and logical block numbers
         – the disk address of each inode contained in the PS

                            Example Write

    • Dirty buffers are collected until there is a full segment
    • Logical blocks are ordered, the inode is updated
      and the segment is written to the tail of the log. Old
      copies of the file blocks and inode are now free
      and available to the garbage collector.


                           Log-structured FS
• Requires a large cache for read efficiency
• Write efficiency is obtained since the system is
  always writing to the end of the log file.
   – Why does this help?
• How does performance compare to Sun-FFS?
• What about crash recovery?
   – locate the checkpointed imap and segment table, then update
     them from subsequent log entries (relying on timestamps)
   – cycle through timestamps until the last checkpoint is reached


                           Garbage Collection
• The log wraps from the end to the start of the disk,
  necessitating GC
• GC reads a segment and identifies valid entries, which
  are written to the tail, allowing the segment to be reused
• GC is implemented by a cleaner process, which uses
  the ifile (a system file holding the imap and
  segment table)

                            Assessing BSD-LFS
• All changed metadata may not make it into a single
  partial segment, which complicates recovery
• Blocks are allocated when a segment is written to disk, so the FS
  must ensure blocks will be available when it is time to write
• Requires large physical memory for the large cache.
• BSD-LFS is superior to FFS, but compared to Sun-FFS
  its advantages are less clear.
  – BSD-LFS faster at metadata operations
  – Sun-FFS faster with I/O-intensive applications
  – comparable for general-purpose use.


                              4.4BSD Portal FS

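   [Figure: a user process opens /p/<path> in the Portal file system and
    receives an fd; the portal file system opens <path> (for example, via
    sockets) and the resulting fd is returned to the process.]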


                      Stackable Filesystems

   [Figure: two applications accessing stacked file systems.]

  • For a given mount point, there may now be
    many file systems

