					  Chapter 6
File Systems

6.1 Files
6.2 Directories
6.3 File system implementation
6.4 Example file systems

• Requirements for long term information storage:
  1.Must store large amounts of data
  2.Information stored must survive the termination of
    the process using it
  3.Multiple processes must be able to access the
    information concurrently
• Solution: Store information on disk or other
  external media in units called files.

• The file system is the part of the operating system that
  manages files.
• File naming - file name.file extension
• File structure
  – unstructured sequence of bytes.   MS-DOS/UNIX
  – record sequence.                  CP/M
  – B-tree.

   File Naming

Typical file extensions.
File Structure

• Three kinds of files
  – byte sequence
  – record sequence
  – tree
• File types
  – Regular files contain user information either ASCII
    or binary.
  – Directories are system files for maintaining the
    structure of the file system.
  – Character special files are used to model serial I/O
    devices such as terminals, printers, and networks.
  – Block special files are used to model disks.
• A UNIX executable file starts with a magic
  number, identifying the file as an executable.
            File Types

(a) An executable file (b) An archive
                   File Access
• Sequential access
  – read all bytes/records from the beginning
  – cannot jump around, could rewind or back up
  – convenient when medium was magnetic tape
• Random access
  – bytes/records read in any order
  – essential for database systems
  – Two methods are used for specifying where to start
     • read and then move file marker
     • move file marker (seek), then read
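
The two access styles above can be sketched with ordinary file calls. This is a minimal illustration, assuming a scratch file of ten fixed-size 4-byte records; the function name `demo_access` and the record layout are made up for the example.

```python
import os
import tempfile

def demo_access():
    # Create a small scratch file of 10 fixed-size 4-byte records.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        for i in range(10):
            f.write(f"R{i:03d}".encode("ascii"))

    with open(path, "rb") as f:
        # Sequential access: read records in order from the beginning.
        first = f.read(4)
        second = f.read(4)
        # Random access: move the file marker (seek), then read.
        f.seek(7 * 4)
        eighth = f.read(4)
        # Rewind, as a sequential medium such as tape would allow.
        f.seek(0)
        again = f.read(4)
    os.remove(path)
    return first, second, eighth, again
```

With fixed-size records, the byte offset of record i is simply i times the record size, which is what makes the seek-then-read style convenient for database-like workloads.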

               File Attributes
• Operating systems associate extra information with
  each file, called file attributes.

                Possible file attributes
        File Operations

1. Create         7. Append
2. Delete         8. Seek
3. Open           9. Get attributes
4. Close          10.Set Attributes
5. Read           11.Rename
6. Write

An Example Program Using File System Calls


   Memory-Mapped Files (Ex11.c)
• To facilitate access to files, some systems provide system
  calls to map files into the address space of a running
  process and to remove (unmap) them from the address space.
• The file serves as a backing store for the process; when
  the process finishes, all mapped, modified pages are
  written back to their files.
   – Advantage: eliminates the need for explicit read/write I/O calls.
   – Disadvantages:
      • It is difficult to know the size of the output file. If the
        mapped pages are all zeroes, does the file hold 10 zeroes or 100?
      • A mapped file modified by one process may be read differently by
        another; the two processes need to see consistent views of the file.
      • The file may be too large to fit in the address space.
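
As a rough illustration of the idea (not the Ex11.c program named in the slide title, which is not reproduced here), the sketch below maps a scratch file with Python's standard `mmap` module and modifies it through memory. The function name `demo_mmap` and the file contents are assumptions made for the example.

```python
import mmap
import os
import tempfile

def demo_mmap():
    # A mapping needs a file of known, nonzero size, so create one first.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        path = f.name
        f.write(b"hello world\n")

    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:   # length 0 maps the whole file
            m[0:5] = b"HELLO"                 # plain memory write, no write() call
            m.flush()                         # push modified pages back to the file

    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data
```

Note that the mapping cannot grow the file: the slice assignment must keep the same length, which mirrors the point above about the output file size being hard to determine.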
      Memory-Mapped Files

(a) Segmented process before mapping files
  into its address space
(b) Process after mapping
   existing file abc into one segment
   creating new segment for xyz
• File systems have directories or folders to keep track
  of files.
   – A single-level directory has one directory (root) containing
     all the files.
   – A two-level directory has a root directory and user directories.
   – A hierarchical directory has a root directory and arbitrary
     number of subdirectories.
• Two different methods are used to specify file names in
  a directory tree:
   – Absolute path name consists of the path from the root
     directory to the file.
   – Relative path name consists of the path from the current
     directory (working directory).

      Directories - A single level
              directory system

• A single level directory system
  – contains 4 files
  – owned by 3 different people, A, B, and C
    Two-level Directory Systems

Letters indicate owners of the directories and files

Hierarchical Directory Systems

  A hierarchical directory system
• The path name would be written:
  – Windows \usr\ast\mailbox
  – UNIX /usr/ast/mailbox
  – MULTICS >usr>ast>mailbox
• Dot and dot dot are two special entries in the file system:
  – Dot (.) refers to the current directory.
  – Dot dot (..) refers to its parent.
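
A small sketch of how absolute names, relative names, and the dot entries combine, using UNIX syntax. The helper `resolve` is hypothetical; it works on the text of the path only and does not consult a real file system.

```python
import posixpath

def resolve(cwd, name):
    """Turn a path name into a normalized absolute path (UNIX syntax)."""
    # An absolute name starts at the root; a relative one starts at cwd.
    if not posixpath.isabs(name):
        name = posixpath.join(cwd, name)
    # normpath collapses "." and ".." entries textually.
    return posixpath.normpath(name)
```

For example, with working directory /usr/ast, the relative name mailbox and the absolute name /usr/ast/mailbox refer to the same file.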

Path Names

A UNIX directory tree
        Directory Operations

1.   Create       5. Readdir
2.   Delete       6. Rename
3.   Opendir      7. Link
4.   Closedir     8. Unlink

       File System Implementation
• File system layout:
   – MBR (Master Boot Record) is used to boot the computer.
   – The partition table gives the starting and ending addresses of
     each partition.
   – Partitions:
      • The first block (the boot block) of the active partition is read in by the
        MBR program when the system is booted.
      • The superblock contains all the key parameters about the file system.
      • Free blocks information
      • I-nodes, one per file, tell all about the file.
      • Root directory
      • Directories and files

File System Implementation

   A possible file system layout

       File System Implementation
• Implementing file storage means keeping track of which
  disk blocks go with which files.
• Contiguous Allocation - store each file as a contiguous
  run of disk blocks.
   – Advantages:
      • Simple to implement
      • Read performance is excellent
   – Disadvantages:
      • Disk fragmentation
      • The maximum file size must be known when the file is created.
   – Example: CD-ROMs, DVDs, and other write-once optical media
• Linked List Allocation - keep each file as a linked list of disk blocks
   – Disadvantages:
      • Random access is slow
      • The amount of data in a block is not a power of 2, because the
        pointer to the next block takes up a few bytes
            Implementing Files

(a) Contiguous allocation of disk space for 7 files
(b) State of the disk after files D and E have been removed
        Implementing Files

Storing a file as a linked list of disk blocks

      File System Implementation
• Linked List Allocation using an index - take the
  pointer word from each disk block and put it
  in an index table, the FAT (File Allocation Table),
  in memory.
  – Disadvantage - the entire table must be in memory all
    the time
• I-node (index-node) - a per-file structure that lists the
  attributes and disk addresses of the file's blocks.
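
To make the index-table idea concrete, here is a minimal sketch of following a file's chain through an in-memory table. The table contents, the `END_OF_FILE` marker value, and the function name `file_blocks` are all made up for illustration; a real FAT uses its own reserved values.

```python
END_OF_FILE = -1  # hypothetical end-of-chain marker

def file_blocks(fat, first_block):
    """Follow the chain in the table and return the file's blocks in order."""
    blocks = []
    block = first_block
    while block != END_OF_FILE:
        blocks.append(block)
        block = fat[block]      # the table entry holds the next block number
    return blocks

# Made-up 16-entry table: one file starts at block 4, another at block 6.
# Unused entries also hold the marker in this toy table.
fat = [END_OF_FILE] * 16
for cur, nxt in [(4, 7), (7, 2), (2, 10), (10, 12), (12, END_OF_FILE),
                 (6, 3), (3, 11), (11, 14), (14, END_OF_FILE)]:
    fat[cur] = nxt
```

The directory entry only needs to record the first block number. Random access still walks the chain, but the walk happens entirely in memory rather than on disk.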

             Implementing Files

Linked list allocation using a file allocation table in RAM
Implementing Files

An example i-node
       Implementing Directories
• When a file is opened, the file system uses the
  path name to locate the directory entry.
• The directory provides information needed to
  find the disk blocks.
  – disk address of the entire file (contiguous blocks)
  – the number of first block (linked list)
  – the number of i-node (i-node)
• Where to store attributes? In directory or i-node?

           Implementing Directories

(a) A simple directory – MS-DOS/Windows
   fixed size entries
   disk addresses and attributes in directory entry
(b) Directory in which each entry just refers to an i-node - UNIX
       Implementing Directories
• Handling long file names in a directory:
  – Fixed-length names (Waste space)
  – In-line (When a file is removed, a variable-sized gap
    is introduced.)
  – Heap (The heap management needs extra effort.)
• How to search files in each directory?
  – Linearly
  – Hash table
  – Cache the results of searches

      Implementing Directories

• Two ways of handling long file names in directory
   – (a) In-line
   – (b) In a heap
                  Shared files
• A shared file is used to allow a file to appear in
  several directories.
• The connection between a directory and the
  shared file is called a link. The file system is a
  Directed Acyclic Graph (DAG).
• Problem: If directories contain disk addresses, a
  copy of the disk addresses will have to be made in
  directory B. If either A or B then appends to the file,
  the new blocks will appear only in the directory of the
  user doing the append.

                   Shared files
• Solutions:
  – Do not list disk block addresses in directories but in
    a little data structure associated with the file itself
    (the i-node); directories point to it (hard link).
  – Create a new file of type link which contains the
    path name of the file to which it is linked
    (symbolic link).

         Shared Files

File system containing a shared file
                      Shared files
• ln file1 file2
• Problem with hard links - when should the i-node be removed?
  Suppose A runs: rm file2
         The system can decrement the link count to 1 and leave the i-node intact;
         when the count reaches 0, it deletes the file and the i-node.
• The problem above does not occur with symbolic links because
  only the owner's directory has a pointer to the i-node. The
  cost is extra overhead in traversing the path.
• Another problem is that a linked file has multiple paths and may
  get copied more than once when dumping files onto a disk (tar).
   – Do not descend paths involving symbolic links.
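
The link-count behavior described above can be observed directly on a POSIX-style file system. This is a sketch under that assumption (hard links and symbolic links must be supported by the platform); the function name `demo_links` and the file names are chosen for the example.

```python
import os
import tempfile

def demo_links():
    d = tempfile.mkdtemp()
    file1 = os.path.join(d, "file1")
    file2 = os.path.join(d, "file2")
    sym = os.path.join(d, "sym")
    with open(file1, "w") as f:
        f.write("data")

    os.link(file1, file2)        # hard link, like `ln file1 file2`
    os.symlink(file1, sym)       # symbolic link: a small file holding a path name
    count_after_link = os.stat(file1).st_nlink

    os.remove(file1)             # removes one directory entry, not the i-node
    count_after_rm = os.stat(file2).st_nlink
    with open(file2) as f:
        still_readable = f.read()
    # The symbolic link still exists but its target path is gone.
    dangling = os.path.islink(sym) and not os.path.exists(sym)

    for p in (file2, sym):
        os.remove(p)
    os.rmdir(d)
    return count_after_link, count_after_rm, still_readable, dangling
```

The symbolic link does not contribute to the link count, so it is left dangling once the name it refers to is removed, while the hard link keeps the data reachable.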

             Shared Files

(a) Situation prior to linking
(b) After the link is created
(c) After the original owner removes the file
         Disk space management
• Strategies for storing an n-byte file:
  – Allocate n consecutive bytes of disk space (like a segment).
     • Problem: if the file grows, it will have to be moved
       on the disk, which is an expensive operation and causes
       external fragmentation.
  – Allocate ⌈n/k⌉ blocks of size k bytes each (like paging).
  => Nearly all file systems chop files up into fixed-size blocks
     that need not be adjacent.

                  Block size
• When block size increases, disk space utilization
  decreases (space efficiency decreases and internal
  fragmentation increases).
• When block size decreases, data transfer rate
  decreases (time efficiency decreases).
• Usual size k = 512 bytes, 1 KB (UNIX), or 2 KB

          Disk Space Management

                           Block size

• Dark line (left hand scale) gives data rate of a disk
• Dotted line (right hand scale) gives disk space efficiency
• All files 2KB
                            Block size
• Example: disk with 131072 bytes per track.
             rotation time = 8.33 msec
             average seek time = 10 msec.
   time to read a block of k bytes
   = 10 + 8.33/2 + (k/131072) * 8.33
   = 10 + 4.165 + k/131072 * 8.33
   If k = 1 KB = 1024 bytes
       = 14.165 + 1024/131072 * 8.33
       = 14.165 + 0.065
       = 14.23 msec
• Disk space efficiency = % of block used by data.
   – Observation: assume that all files are 1 KB; on average, 1/2 of the last
     block is empty.
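
The timing calculation above can be reproduced for any block size with a few lines. The constants are taken from the example; the function names are made up, and the data-rate helper is an added convenience that assumes one block is read per seek.

```python
BYTES_PER_TRACK = 131072
ROTATION_MS = 8.33
SEEK_MS = 10.0

def block_read_time_ms(k):
    """Average seek + half a rotation + transfer time for k bytes."""
    return SEEK_MS + ROTATION_MS / 2 + (k / BYTES_PER_TRACK) * ROTATION_MS

def data_rate_kb_per_s(k):
    """Effective data rate when one k-byte block is read per seek."""
    return (k / 1024) / (block_read_time_ms(k) / 1000)
```

Seek and rotational delay dominate for small blocks, so the effective data rate rises with block size while the space efficiency falls.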
       Keeping Track of Free Blocks
• Use a linked list of disk blocks: each block holds as
  many free disk block numbers as will fit.
   – With a 1-KB block and a 32-bit disk block number: 1024 *
     8/32 = 256 disk block numbers => 255 free blocks and 1
     next-block pointer.
• Use a bit map: a disk with n blocks requires a bit map
  with n bits.
   –   Free blocks are represented by 1's
   –   Allocated blocks are represented by 0's
   –   A 16-GB disk has 2^24 1-KB blocks and requires 2^24 bits => 2048 blocks
   –   Using a linked list: 2^24/255 ≈ 65793 blocks. However, these
       blocks can be freed up as the disk fills up.
• A bit map is generally better if it can be kept completely in
  memory.
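
The bookkeeping arithmetic above can be checked directly. The variable names are made up for the sketch; the numbers are the ones used in the example (1-KB blocks, 32-bit block numbers, a 16-GB disk).

```python
BLOCK_SIZE = 1024            # bytes per block
BLOCK_NUMBER_BITS = 32       # bits per disk block number
DISK_BLOCKS = 2 ** 24        # 16-GB disk with 1-KB blocks

# Each free-list block holds 256 block numbers; one of them is the next pointer.
numbers_per_block = BLOCK_SIZE * 8 // BLOCK_NUMBER_BITS
free_per_list_block = numbers_per_block - 1

# Bit map: one bit per disk block, 8192 bits per 1-KB block.
bitmap_blocks = DISK_BLOCKS // (BLOCK_SIZE * 8)

# Linked list: full list blocks needed if every disk block were free,
# plus how many block numbers are left over for a final partial block.
full_list_blocks, leftover = divmod(DISK_BLOCKS, free_per_list_block)
```

The division for the linked list is not exact: one block number is left over, which is why the figure is approximate.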
         Disk Space Management

(a) Storing the free list on a linked list
(b) A bit map
           Disk Space Management

(a) Almost-full block of pointers to free disk blocks in memory
   - three blocks of pointers on disk
(b) Result of freeing a 3-block file
(c) Alternative strategy for handling 3 free blocks
   - shaded entries are pointers to free disk blocks
      Disk Space Management

Quotas for keeping track of each user’s disk use
             File System Reliability
• The loss of a file system can be catastrophic.
• Methods to safeguard a file system:
   – Bad Block Management
   – Backups
• Bad Block Management
   – Hardware solution - dedicate a sector to a "bad block list".
     When the disk controller is initialized, the bad block list is read and
     a spare block is picked to replace each bad block. The
     mapping is recorded in the bad block list.
   – Software solution - the user or file system carefully constructs a
     file containing all the bad blocks.

         File System Reliability
• Backups are made to handle two problems: recovery from
  disaster and recovery from stupidity.
• Considerations for backups
  – Entire or part of the file system
  – Incremental dumps: dump only files that have changed
  – Compression
  – Backing up an active file system
  – Security

          File System Reliability
• Two strategies can be used for dumping a disk to tape:
  – Physical dump: starts at block 0 and writes every block in order up to the last one.
     • Advantages: simple and fast
     • Disadvantages: backs up everything, including unused blocks
  – Logical dump: starts at one or more specified
    directories and recursively dumps all files and
    directories found that have changed since some given
    base date.

        File System Reliability

• A file system to be dumped
  – squares are directories, circles are files
  – shaded items have been modified since the last dump; unshaded items have not changed
  – each directory and file is labeled by its i-node number
         File System Reliability

• Bit maps used by the logical dumping algorithm
  – After 4 phases, the dump is complete.
        File System Consistency
• A utility program, called a file system checker
  (fsck in UNIX or scandisk in Windows), can be
  used to test the consistency of a file system.
• Two types of consistency checks can be made:
  (a) blocks (b) files (directory)

        File System Consistency
• Block consistency:
  – Build two tables with a counter per block, initially
    set to 0.
     • The counters in the first table keep track of the number of
       times each block is present in a file.
     • The counters in the second table record the number of
       times each block is present in the free list.
  – Then the program reads all the i-nodes and uses them
    to build a list of all blocks used in the files
    (incrementing the file counter as each block is read).
  – Check the free list or bit map to find all blocks not in
    use (incrementing the free-list counter for each block in
    the free list).
        File System Reliability

• File system states
  (a) consistent
  (b) missing block – add it to the free list
  (c) duplicate block in free list – rebuild the free list
  (d) duplicate data block – copy the block to a free block
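
The two-table block check can be sketched as follows. This is a simplified model, not the fsck implementation: files are given as a plain dictionary from i-node number to block list, and the function name `check_blocks` and the result labels are made up. The "in use and on free list" case is an extra check added here and is not among the four states listed above.

```python
def check_blocks(n_blocks, files, free_list):
    """files maps an i-node number to the list of block numbers it uses."""
    in_files = [0] * n_blocks    # table 1: times each block appears in a file
    in_free = [0] * n_blocks     # table 2: times each block appears in the free list
    for blocks in files.values():
        for b in blocks:
            in_files[b] += 1
    for b in free_list:
        in_free[b] += 1

    problems = {}
    for b in range(n_blocks):
        if in_files[b] > 1:
            problems[b] = "duplicate data block"
        elif in_files[b] == 1 and in_free[b] > 0:
            problems[b] = "in use and on free list"   # extra case, see lead-in
        elif in_files[b] == 0 and in_free[b] == 0:
            problems[b] = "missing block"
        elif in_free[b] > 1:
            problems[b] = "duplicate in free list"
    return problems
```

A consistent file system yields an empty result: every block has a 1 in exactly one of the two tables.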
          File System Consistency
• For checking directories – keep a table of counters, one per file. Starting
  at the root directory, recursively inspect each directory. For each
  file, increment the counter for the file's i-node.
• Compare the computed value with the link count stored in each i-node.
   – i-node link count > computed value (= number of directory entries)
        • Even if all files are removed from the directories, the i-node link count > 0, so
          the i-node will not be removed.
        • Solution: set i-node link count = computed value
   – i-node link count < computed value
        • The i-node may be freed even when another
          directory entry still points to it
        • That directory entry will then be pointing to an unused i-node
        • Solution: set i-node link count = computed value
• Protecting the user - e.g., typing rm * .o instead of rm *.o removes every file in the directory
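
The link-count comparison described above can be sketched in the same style. For brevity the directory tree is given as a flat dictionary instead of being walked recursively from the root; the function name `check_link_counts` and the data layout are assumptions made for the example.

```python
def check_link_counts(directories, stored_counts):
    """directories: directory name -> i-node numbers of its entries.
    stored_counts: i-node number -> link count stored in the i-node."""
    computed = {}
    for entries in directories.values():
        for ino in entries:
            computed[ino] = computed.get(ino, 0) + 1

    repairs = {}
    for ino, stored in stored_counts.items():
        actual = computed.get(ino, 0)
        if stored != actual:
            repairs[ino] = actual    # fix: set link count = computed value
    return repairs
```

Both error cases get the same repair. The difference is in severity: a stored count that is too high only wastes space, while one that is too low can free an i-node that is still referenced.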
          File System Performance
• A block cache or buffer cache is a collection of blocks
  that logically belong on the disk, but are kept in
  memory to improve performance.
   – All of the page replacement algorithms discussed previously can be
     used to determine which block should be evicted (and written back
     if modified) when a new block is needed and the cache is full.
• Modified LRU Scheme:
   – The block is not likely to be needed again soon? No => go to
   – Is the block essential to the consistency of the file system
     (e.g., i-nodes)? Yes => write it to disk immediately
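
A block cache with LRU replacement and immediate write-out of critical blocks might look like the sketch below. It is a toy model: the disk is a plain dictionary, and the class name `BlockCache`, its methods, and the `critical` flag are made up for illustration. It covers only plain LRU plus the write-immediately rule for critical blocks.

```python
from collections import OrderedDict

class BlockCache:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # dict: block number -> bytes (stand-in for the disk)
        self.blocks = OrderedDict()      # block number -> (data, dirty); least recently used first

    def read(self, n):
        if n in self.blocks:
            self.blocks.move_to_end(n)   # most recently used goes to the rear
            return self.blocks[n][0]
        data = self.disk[n]              # cache miss: fetch from disk
        self._insert(n, data, False)
        return data

    def write(self, n, data, critical=False):
        if critical:
            self.disk[n] = data          # e.g. i-node blocks: write immediately
            self._insert(n, data, False)
        else:
            self._insert(n, data, True)  # ordinary data: write back on eviction

    def _insert(self, n, data, dirty):
        if n in self.blocks:
            del self.blocks[n]
        elif len(self.blocks) >= self.capacity:
            old, (old_data, old_dirty) = self.blocks.popitem(last=False)
            if old_dirty:
                self.disk[old] = old_data   # evicted modified block goes to disk
        self.blocks[n] = (data, dirty)
```

Ordinary modified blocks reach the disk only when they are evicted, which is why a periodic flush is also needed in practice.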

         File System Performance
• Periodically, all modified data blocks should be written out (e.g.,
   so that a user who writes all day does not lose the work in a crash).
• UNIX - the system call sync forces all modified blocks out to
   the disk immediately. Hard-disk oriented. E.g., a program, update,
   runs in the background and issues a sync every 30 seconds.
• MS-DOS - write-through cache => all modified
   blocks are written immediately. Floppy-disk oriented.
• E.g., writing a 1-KB block one character at a time:
              UNIX collects the characters together in the cache
              MS-DOS writes them to disk one at a time

File System Performance

 The block cache data structures

       File System Performance

• Reading a block needs one access for the i-node and
   one for the block. Placing i-nodes close to their blocks reduces the seek between the two.
(a) I-nodes placed at the start of the disk
(b) Disk divided into cylinder groups
   – each with its own blocks and i-nodes
    Log-Structured File Systems
• As CPUs get faster and memories larger
  – disk caches can also be larger
  – an increasing number of read requests can be satisfied from
    the cache
  – thus, most disk accesses will be writes
• The LFS strategy structures the entire disk as a log
  – all writes are initially buffered in memory
  – periodically, these are written to the end of the disk log
  – when a file is opened, locate its i-node via the i-node map,
    then find the blocks
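
The log idea can be illustrated with a very small model. This sketch assumes one-block files, models the disk as an append-only list, and omits cleaning of stale log entries; the class name `TinyLFS` and its methods are hypothetical.

```python
class TinyLFS:
    def __init__(self):
        self.log = []          # the disk, modeled as an append-only list of blocks
        self.inode_map = {}    # i-node number -> log address of the latest i-node
        self.buffer = []       # pending (i-node number, data) writes held in memory

    def write(self, ino, data):
        self.buffer.append((ino, data))      # writes are buffered first

    def flush(self):
        """Append all buffered writes to the end of the log in one batch."""
        for ino, data in self.buffer:
            self.log.append(data)                      # data block
            data_addr = len(self.log) - 1
            self.log.append({"blocks": [data_addr]})   # new copy of the i-node
            self.inode_map[ino] = len(self.log) - 1    # map points at the newest copy
        self.buffer = []

    def read(self, ino):
        # Locate the i-node through the i-node map, then find its blocks.
        inode = self.log[self.inode_map[ino]]
        return b"".join(self.log[a] for a in inode["blocks"])
```

Old versions of data blocks and i-nodes stay in the log after an update; only the i-node map changes to point at the newest copy.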

 Example File Systems
    CD-ROM File Systems

The ISO 9660 directory entry

The CP/M File System

  Memory layout of CP/M
The CP/M File System

The CP/M directory entry format

The MS-DOS File System

  The MS-DOS directory entry

     The MS-DOS File System

• Maximum partition for different block sizes
• The empty boxes represent forbidden combinations
        The Windows 98 File System


The extended MS-DOS directory entry used in Windows 98

        The Windows 98 File System



An entry for (part of) a long file name in Windows 98

    The Windows 98 File System

An example of how a long name is stored in Windows 98

The UNIX V7 File System

   A UNIX V7 directory entry

The UNIX V7 File System

     A UNIX i-node
The UNIX V7 File System

The steps in looking up /usr/ast/mbox
