							HARD HAT AREA - WHITE PAPER




OS File Systems
Longhorn Brings Changes For Windows

The upcoming Longhorn operating system will bring several new technologies to the next generation of Windows. Although many of the changes will be immediately visible and will affect the way you interact with Windows, many others will happen in the background and will affect the way the operating system performs.

The work of the file system in Longhorn will be one of those important, behind-the-scenes changes. Every type of writeable storage disk, whether it is a hard drive platter, a CD, or a diskette, must use a file system. The file system performs several tasks related to data storage, including naming, storing, and retrieving data.

Before delving into the features of Longhorn’s file system, let’s look at the file systems Linux and Mac OS X use. Most of you are probably familiar with the current main Windows file systems: FAT/FAT16, FAT32, and NTFS. For a refresher, see the “Windows File Systems” sidebar.

Linux File Systems
Linux can use several file systems, but what many people refer to as the “native” Linux file system is ext2. Ext2, which first appeared in 1993, is short for the Second Extended File System. The first version (called ext) appeared in 1992 but is no longer part of the Linux kernel. A third version (called ext3) is backward-compatible with ext2 and essentially adds journaling to ext2. Ext3 is included in some 2.4.x kernels. However, ext2 remains the foundation of the Linux file system.

Ext2 has its roots in basic Unix file systems, and it uses many of the same principles as the Windows file systems. It stores files in data blocks (called clusters in Windows file systems) and uses a hierarchical tree of directories, subdirectories, and files. Ext2 can handle long file names and partitions up to 4TB in size. In ext2, each file is represented by an inode, which holds a detailed description of the file, including the file type, access rights, size, and pointers to the appropriate data blocks.

Of course, Linux wouldn’t be Linux unless many developers were contributing to the project. Dozens of different file systems are under development or are available for use with Linux. Some of them include:

GFS. The Global File System is designed for Linux clusters, in which several machines share the same storage. It provides better security than ext2, as well.

Microsoft file systems support. Some developers have created support for FAT32 and NTFS within Linux.

Minix. This file system format originated in Minix, a variation of Unix that inspired many features in Linux. However, Minix probably won’t appear in future versions of the Linux kernel, because ext2 has replaced it.

Read-only file systems. A few read-only file systems designed for boot disks are available for Linux, too. Cramfs is a read-only file system that can use compression. Romfs is a basic read-only file system that cannot use compression. Squashfs is a read-only file system, still in development, that will squeeze as much data as possible into the boot area.

Reiserfs. The Linux 2.4.x kernel makes Reiserfs available. The strength of Reiserfs lies in its ability to efficiently handle large numbers of small files.

Mac OS X File Systems
Newer Macintosh computers use the HFS (Hierarchical File System) Plus file system, which debuted in Mac OS 8.1. HFS Plus (also called HFS Extended and HFS+) is the primary file system for Mac OS X.

HFS Plus replaces HFS, the long-time Macintosh file system that Apple developed in the mid-1980s. Apple decided to replace HFS because of problems it had with larger hard drives. (In that regard, HFS is comparable to FAT/FAT16 on Windows computers, which was replaced by FAT32 and NTFS.)

Both HFS and HFS Plus use B-trees to catalog the file system, just as NTFS does. In 2002, Apple added optional journaling to HFS Plus for additional reliability. Even though Apple uses different terminology to describe its file system (Apple calls its data blocks allocation blocks, while Windows calls them clusters), many HFS Plus features are comparable to those of FAT32 and NTFS in cluster size and disk space allocation.

Longhorn & WinFS
WinFS (Windows File System) should debut in the upcoming Longhorn OS, the next major desktop version of Windows, which is expected to replace WinXP, probably in 2005 or 2006. (Although some say WinFS is short for Windows Future Storage, Bill Gates called it Windows File System in a recent speech.) WinFS is a data storage system that will make information easier to find. The goal of WinFS is to push efficient file-storage capabilities to heights that most of us can scarcely imagine today.

WinFS Data Model
The WinFS data model describes the concepts for data structure and organization within the file system. Within the WinFS data model are types, which describe the pieces of data. Each type has certain properties and fields that relate to it and describe it. For example, a type called “person” might have properties and fields such as “name” and “address.” The types and their properties aid in data organization and data searches.

WinFS types also have relationships, which are rules for organizing and using the data. A relationship requires a source type. A target type can be part of the relationship, but it isn’t required. (A relationship with a source type and no target type is called a dangling relationship.) Two major types of relationships exist in WinFS: holding relationships and reference relationships.

Holding relationships. In a holding relationship, the source type controls the target type, and the relationship doesn’t end until the source type ends it. Dangling relationships are not possible in a holding relationship. While organizing data, WinFS might use a holding relationship between folders and documents, for instance. (More than one folder can have a holding relationship with the same document at a time.) If a source-type folder is deleted, the target-type document remains available as long as one or more other source-type folders still point to it. However, if all of the source-type folders in the holding relationship are deleted and the document has no other source-type relationships, the document is deleted. One other important point about holding relationships: Cycles are not allowed, as shown in Figures 1 and 2. WinFS won’t allow any holding relationship that would create a cycle, or loop, among two or more types.

Reference relationships. In a reference relationship, source types have no control over target types. No restrictions are placed on these relationships, and dangling relationships are allowed. A reference relationship can involve cycles, too, as shown in Figure 3. A reference relationship might be used over a network, for example. If the network experiences a problem and the link between a source and target type is temporarily broken, a reference relationship could still exist for the other types that remain linked.

Implicit Query
It also appears that, as part of WinFS and Longhorn, Microsoft is seeking a Google-like user interface and search function that will make it easier to find a particular piece of data. Microsoft has created an application called IQ (Implicit Query) that would serve this function. (A version of IQ was part of a Microsoft presentation at Comdex in November 2003.) The key feature of IQ would be its ability to perform searches in the background without prompting from you. The file structure and file organization in WinFS would make technology such as IQ possible. For example, if you’re working on a particular document (as shown below), IQ would scan your document for key text strings as you enter them and perform searches on those strings, making data available to you in the background if you need it, and maybe even before you realize you need it. Analysts say that in the long run, an idea such as IQ could make search engines obsolete by building into WinFS and the Longhorn OS the ability to search the Web automatically alongside the ability to search hard drives.

IQ searches your email program for any messages matching your To: line, just in case you need to look back at previous messages to John.
IQ would look through your hard drive for any email messages, calendar entries, Word documents, and other documents and items linked to the conference.
IQ would look for any documents, email messages, calendar items, and Web sites related to Bill Gates and Omaha.

Plans for WinFS involve using several types of technologies, including NTFS. Microsoft officials have said that WinFS won’t replace NTFS but will build on top of it, allowing the strengths of both technologies to work together. Industry experts say the ability of WinFS to work on top of NTFS should speed the acceptance of WinFS
and Longhorn. If WinFS used a file system vastly different from the one in today’s Windows PCs, Longhorn might experience compatibility problems with today’s software packages. In addition, customers would probably be leery of making the switch until WinFS had proven its stability.

To make data easier to find, however, WinFS will use some APIs that differ from those of NTFS. Look for XML technologies to appear in WinFS and provide the file system with high-end data-labeling capabilities. WinFS will also use relational and object-oriented APIs to access data. Together, these technologies will let the WinFS data model handle what Microsoft calls the three key components of a data storage platform: organization, searching, and sharing.

Organization. When organizing and presenting data, WinFS will follow a different path from NTFS, which uses a B-tree structure. WinFS will present data in what Microsoft calls a DAG (directed acyclic graph). Data organization will be far more flexible under WinFS, with the ability to organize by several methods, including relationships.

Searching. By organizing data in a DAG, WinFS opens a new world of possibilities for searching. Users will be able to search using multiple criteria, which is difficult in a strict tree structure, where each item can sit in only one place. Microsoft says the WinFS search capabilities will be superior to the filtering capabilities of today’s search engines.

Sharing. Sharing data between users and between applications will be easier under WinFS. The technology in WinFS for sharing data will allow for a common security model in Longhorn and will work well with other types of technologies, such as peer-to-peer networking.

Data Storage In Windows File Systems
Depending on the Windows operating system you’re using, you may have a choice between FAT/FAT16, FAT32, and NTFS as your file system. As the “Cluster Size” chart and this graphic show, each file system stores data a little differently, which can cause wasted space within the hard drive (also called slack space).

FAT/FAT16 vs. FAT32
On a 2GB hard drive or partition, FAT/FAT16 uses a default cluster size of 32KB, while FAT32 uses 4KB clusters. (Each small square represents 1KB; the bold-lined squares and rectangles represent a cluster.)
[Graphic: FAT/FAT16 | FAT32]
In the above example of a 31KB file (blue), both file systems have the same amount of slack space (peach). However, in the bottom example of a 40KB file, the FAT32 file system still has six 4KB clusters available as free space (white) and no slack space, while the FAT/FAT16 file system has 24KB of slack space in the second cluster.

FAT32 vs. NTFS
On a 20GB hard drive or partition, FAT32 uses a default cluster size of 16KB, while NTFS uses 4KB clusters. (Each small square represents 1KB; the bold-lined squares and rectangles represent a cluster.)
[Graphic: FAT32 | NTFS]
In the above example of a 29KB file (blue), both file systems have the same amount of slack space (peach). However, in the bottom example of a 50KB file, the NTFS file system still has three 4KB clusters available as free space (white) and 2KB of slack space, while the FAT32 file system has 14KB of slack space in the fourth cluster.
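The slack-space arithmetic in this sidebar can be checked with a few lines of Python. This is a minimal sketch that assumes a file always occupies whole clusters (so the space it takes is its size rounded up to a multiple of the cluster size) and that each diagram row spans 64KB; that 64KB width is inferred from the figures rather than stated in the sidebar, and `clusters_needed`, `slack_kb`, and `free_clusters` are hypothetical helper names.

```python
def clusters_needed(file_kb: int, cluster_kb: int) -> int:
    """Whole clusters a file occupies: its size rounded up to a cluster multiple."""
    return -(-file_kb // cluster_kb)  # ceiling division on integers


def slack_kb(file_kb: int, cluster_kb: int) -> int:
    """Unused space (slack) left over in the file's last cluster."""
    return clusters_needed(file_kb, cluster_kb) * cluster_kb - file_kb


def free_clusters(file_kb: int, cluster_kb: int, area_kb: int = 64) -> int:
    """Whole clusters left free in a diagram row of area_kb.
    The 64KB default is an assumption inferred from the sidebar's figures."""
    used_kb = clusters_needed(file_kb, cluster_kb) * cluster_kb
    return (area_kb - used_kb) // cluster_kb


# (file system, default cluster size in KB, file size in KB) from the sidebar
examples = [
    ("FAT/FAT16", 32, 31), ("FAT32", 4, 31),
    ("FAT/FAT16", 32, 40), ("FAT32", 4, 40),
    ("FAT32", 16, 29), ("NTFS", 4, 29),
    ("FAT32", 16, 50), ("NTFS", 4, 50),
]
for fs, cluster_kb, file_kb in examples:
    print(f"{fs:<9} {file_kb}KB file in {cluster_kb}KB clusters: "
          f"{clusters_needed(file_kb, cluster_kb)} clusters, "
          f"{slack_kb(file_kb, cluster_kb)}KB slack, "
          f"{free_clusters(file_kb, cluster_kb)} free")
```

Running it reproduces the sidebar’s numbers: 24KB of slack on FAT/FAT16 versus none on FAT32 (with six free clusters) for the 40KB file, and 14KB on FAT32 versus 2KB on NTFS (with three free clusters) for the 50KB file.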
The Hope For WinFS
Microsoft hopes WinFS can become a jack-of-all-trades file system, giving Microsoft a single storage system that can find stored information quickly while working well with a variety of applications. Microsoft says WinFS will be far more than a file system; it will also deal with nonfile data, such as personal contacts and email messages.

Microsoft officials say they expect WinFS to simplify and streamline data organizing, searching, and sharing. This will be no small task; after all, application types currently play the key role in determining how data is stored. Because each type of application (whether it’s a database, an email server, a file system, or another application) follows a slightly different method for storing data, it can be a nightmare to find and retrieve data stored by different applications.

To improve the ability of Windows computers to search for data across all applications, WinFS will use metadata stored within each file. The metadata system, which will use XML technology, will let developers link specific pieces of data to each file to aid searching and write applications that allow for more focused searches. For example, instead of searching for a keyword in the file name of a Word document, users will have the option to search for a topic in the actual text of the Word document as part of a general file search. Developers will also be able to use the metadata system in WinFS to search all types of applications at the same time, which is a difficult task at best now.

Many industry experts say the improved search capabilities of WinFS can’t come quickly enough. As hard drive sizes soar into the hundreds of gigabytes, improved search capabilities are vitally important. Without them, efficiently organizing and using the huge amounts of data stored on a hard drive will be next to impossible.

Microsoft has many hopes for the future of WinFS. However, industry experts and analysts say the daunting task of making WinFS work as expected is going to be extremely difficult for Microsoft. In fact, developing WinFS may already have delayed the release of Longhorn, which initially was rumored for early 2005. The release is now expected sometime later in 2005 or in 2006.

by Kyle Schurman

Cluster Size
In a Windows file system, clusters are the smallest possible storage units on a hard drive. (You can think of a cluster as a drawer in a filing cabinet.) A file can extend over several clusters, but each cluster can hold only one file. Because of this rule, if a file or a portion of a file occupies only a small percentage of a cluster, the remainder of the cluster is wasted, empty space.

Clusters play a key role in hard drive and system performance. Most experts agree that a 4KB cluster size is best for balancing system performance with minimal wasted space. Clusters that are too small waste less space, but they aren’t as efficient in performance, because the file system has more clusters to track for each file. Clusters that are too large have the opposite problem: They perform well but waste more space.

Cluster size depends on the size of partitions, on the size of hard drives, and, most importantly, on the file system in place. The chart below shows the default cluster sizes for varying sizes of partitions under each type of Windows file system.

FAT/FAT16
Partition/Hard Drive Size     Default Cluster Size
16MB to 127MB                 2KB
128MB to 255MB                4KB
256MB to 511MB                8KB
512MB to 1,023MB              16KB
1,024MB to 2,047MB            32KB
2,048MB to 4,096MB            64KB

FAT32
Partition/Hard Drive Size     Default Cluster Size
0.5GB to 8GB                  4KB
8GB to 16GB                   8KB
16GB to 32GB                  16KB
32GB and larger               32KB

NTFS
Partition/Hard Drive Size     Default Cluster Size
0.5GB or less                 512 bytes
0.5GB to 1GB                  1KB
1GB to 2GB                    2KB
2GB and larger                4KB

Fragmentation In Windows File Systems
When the hard drive is empty, it’s easy for the file system to fill the clusters in order (Figure 1). However, as files are deleted and new files are added, the file system might not be able to squeeze files into the available space, forcing it to split a file’s storage across different areas of the hard drive (Figure 2). Although splitting files across different areas of the hard drive doesn’t affect the file itself, it does affect system performance, because the file system must take additional time to collect each portion of the file from the different areas of the hard drive. When the file system can retrieve an entire file from adjacent clusters, it can work faster. Running a defragmenting program rearranges the clusters to try to place each file’s clusters closer together (Figure 3). (In the graphic, different files are represented by different colors; each square represents a cluster.)
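To make the Fragmentation sidebar’s Figures 1 through 3 concrete, here is a toy Python model. It is a sketch under stated assumptions, not how Windows actually allocates space: the disk is a list of cluster slots, new files fill the first free slots in order (splitting when necessary), and `defragment` simply packs each file’s clusters together. `allocate`, `delete`, `fragments`, and `defragment` are hypothetical names, not a real API.

```python
from typing import List, Optional

# Toy model (hypothetical, not a real Windows allocator): each list slot is one
# cluster holding a file name, or None when the cluster is free.
Disk = List[Optional[str]]


def allocate(disk: Disk, name: str, n_clusters: int) -> None:
    """Fill the first free clusters in order, splitting the file if it must."""
    free = [i for i, owner in enumerate(disk) if owner is None]
    if len(free) < n_clusters:
        raise ValueError("not enough free clusters")
    for i in free[:n_clusters]:
        disk[i] = name


def delete(disk: Disk, name: str) -> None:
    """Free every cluster that belongs to the named file."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None


def fragments(disk: Disk, name: str) -> int:
    """Count runs of adjacent clusters owned by a file (1 means contiguous)."""
    runs, prev = 0, None
    for owner in disk:
        if owner == name and prev != name:
            runs += 1
        prev = owner
    return runs


def defragment(disk: Disk) -> Disk:
    """Return a new layout with each file's clusters packed side by side,
    keeping files in the order they first appear and free space at the end."""
    order = list(dict.fromkeys(owner for owner in disk if owner is not None))
    packed: Disk = []
    for name in order:
        packed.extend([name] * disk.count(name))
    return packed + [None] * (len(disk) - len(packed))


disk: Disk = [None] * 10
allocate(disk, "A", 3)         # A A A . . . . . . .   (filled in order, like Figure 1)
allocate(disk, "B", 2)         # A A A B B . . . . .
allocate(disk, "C", 3)         # A A A B B C C C . .
delete(disk, "B")              # A A A . . C C C . .
allocate(disk, "D", 4)         # A A A D D C C C D D   (D is split, like Figure 2)
print(fragments(disk, "D"))    # 2
disk = defragment(disk)        # A A A D D D D C C C   (packed, like Figure 3)
print(fragments(disk, "D"))    # 1
```

The first-fit rule is the simplest policy that reproduces the effect the sidebar describes: once a deletion leaves a gap too small for the next file, that file ends up in two pieces until a defragmenter moves the clusters.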
Windows File Systems
Microsoft currently uses two file systems with its newest Windows operating systems: FAT32 and NTFS. However, Microsoft’s file system technology started with FAT.

FAT/FAT16
In older versions of Windows (Win95 and older), most hard drives used the FAT file system. The latest, 16-bit version of FAT goes by the name FAT16. Win95 OSR2/98/XP users have the option of using FAT or FAT32. (WinXP users can also select NTFS.)

FAT can trace its roots to 1976, when it appeared in BASIC, and to 1981, when it appeared in Microsoft’s first version of DOS. The initial FAT only worked with hard drives up to 32MB in size. As hard drives grew throughout the 1980s and early 1990s, FAT morphed in subsequent versions of DOS to allow for larger hard drives. By 1991, Microsoft had released DOS 5.0, which included a 16-bit FAT. That 16-bit FAT supports hard drives up to 2GB in size in Win95/98 and up to 4GB in size in WinNT. FAT has continued to use 16-bit technology. One of FAT’s biggest advantages is its ability to work with computers running a variety of Windows operating systems.

FAT works best with smaller hard drives because its simple design allows it to work quickly and provide good data-access times on a small hard drive. That simple design hampers its ability to work through a lot of data on larger hard drives, though, resulting in poor data-access times. FAT also makes inefficient use of hard drive space, a problem that’s compounded on larger hard drives.

Other companies now have the option of licensing Microsoft’s various versions of its FAT file system, thanks to a recent decision by the company. Microsoft says it wants to join a broad industry effort to make core technologies more available to all kinds of companies through licensing.

Because FAT is a popular format for exchanging media between computers and digital devices, and because it works well with many operating systems, Microsoft decided to make FAT one of the first technologies it’s offering under the new policy. By making FAT available for licensing, Microsoft hopes other companies will have an easier time making compatible products. You can find more information on Microsoft’s licensing policy at www.microsoft.com/mscorp/ip/.

FAT32
FAT32 is a 32-bit file system (each table entry is 32 bits wide, 28 of which are used for cluster numbers), which allows it to address far more clusters than FAT/FAT16. FAT32 first appeared in 1996 with the OSR2 release of Win95, which allowed for hard drives larger than 2GB. (Win95 OSR2 was only available to computer manufacturers; Win98 was the first retail OS featuring FAT32.)

Microsoft attempted to make FAT and FAT32 as compatible as possible, and they’re fairly similar. However, a few significant differences can cause problems with running certain types of older software on FAT32 systems. For example, some hard drive compression software made for FAT will not run on FAT32.

FAT32’s biggest advantage over FAT/FAT16 is its ability to use smaller clusters on larger hard drives and partitions. (See the “Cluster Size” chart.) This feature greatly reduced wasted space and, in most instances, improved the performance of FAT32 over FAT.

NTFS
Windows NT 3.1, which appeared in 1993, was the first Windows version to use NTFS. WinNT 4.0 users had the option of using FAT or NTFS. Microsoft’s release of the WinXP Home Edition operating system marked the first time Windows home users had the option of using NTFS instead of FAT or FAT32.

Microsoft designed NTFS to be a more reliable and secure option for network users in a corporate environment. NTFS gave system administrators the option of assigning permissions to individual files within folders, which allows for flexible control of the network’s most sensitive files.

One area where NTFS shines is file organization. NTFS stores file attributes in its MFT (master file table), which is well organized and can hold a large amount of file attribute information. The MFT is similar to FAT’s file allocation table, but the MFT can store a more complex set of data about the files.

The MFT contains metadata files, which NTFS uses to manage the file system. Some of the important MFT metadata files include:

Bad Cluster File. NTFS uses this metadata file to mark any malfunctioning clusters on the hard drive so that it can avoid storing data there.

Cluster Allocation Bitmap. This map of all clusters on the partition helps the file system find any available clusters.

Log File. Whenever changes occur in the file system, NTFS records the change in the Log File. It doesn’t record the actual changes to the data; it just makes note of when the change occurred, which is useful to particular types of software, such as antivirus software.

MFT Mirror. The MFT Mirror is a copy of the first 16 MFT files, which are responsible for aspects of system operation. NTFS stores the MFT Mirror in a different area of the hard drive than the MFT, allowing the MFT Mirror to serve as a backup if the MFT suffers damage. NTFS reserves the first 12% of the hard drive for the MFT, and the 16 system files are at the beginning of that area. The MFT Mirror is in the middle of the other 88% of the hard drive.

Quota Table. Microsoft added this metadata file to the MFT in Win2000/XP. The Quota Table gives you control over the amount of hard drive space each user can occupy on a volume. For example, this feature might be handy at home if you’re sharing a computer with the rest of your family; each person can have a limit on how much space his or her personal files take up.

NTFS also uses B-trees when dealing with large folder structures. (A B-tree is a balanced tree structure for organizing and quickly finding records in a database or on a hard drive.) The B-trees let NTFS precisely organize the attributes for the files and folders within a large folder structure, making for efficient searches. This folder structure wasn’t available with FAT.

See www.cpumag.com/cpufeb04/filesystems for information on NTFS search methods.
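The Cluster Allocation Bitmap described above can be pictured as one bit per cluster. The Python below is a hypothetical illustration: the `ClusterBitmap` class and its first-fit `find_free_run` search are assumptions made for this sketch, not NTFS’s actual on-disk `$Bitmap` format or allocation policy. It shows how a bitmap lets a file system find free, adjacent clusters without reading the clusters themselves.

```python
class ClusterBitmap:
    """Hypothetical one-bit-per-cluster map: 1 = in use, 0 = free.
    Illustrates the idea only; not NTFS's real $Bitmap format or policy."""

    def __init__(self, total_clusters: int) -> None:
        self.total = total_clusters
        self.bits = bytearray((total_clusters + 7) // 8)  # every cluster starts free

    def is_used(self, cluster: int) -> bool:
        return bool(self.bits[cluster // 8] & (1 << (cluster % 8)))

    def mark(self, cluster: int, used: bool) -> None:
        mask = 1 << (cluster % 8)
        if used:
            self.bits[cluster // 8] |= mask
        else:
            self.bits[cluster // 8] &= ~mask & 0xFF  # keep the byte in 0..255

    def find_free_run(self, count: int) -> int:
        """First-fit search: start of the first run of `count` adjacent free
        clusters, or -1 if there is none."""
        run_start, run_len = 0, 0
        for c in range(self.total):
            if self.is_used(c):
                run_start, run_len = c + 1, 0
            else:
                run_len += 1
                if run_len == count:
                    return run_start
        return -1


bitmap = ClusterBitmap(16)
for c in (0, 1, 2, 5):            # clusters that already hold data
    bitmap.mark(c, True)
print(bitmap.find_free_run(2))    # 3  (clusters 3 and 4)
print(bitmap.find_free_run(4))    # 6  (clusters 6 through 9)
```

Because each cluster costs a single bit, the whole map stays small enough to scan quickly, which is why a bitmap suits the “find any available clusters” job the sidebar describes.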
NTFS Search Methods
NTFS uses B-trees to aid in organizing and finding files on the hard drive. (A B-tree diagram is shown below.) At right, you can see that, in most instances, a tree-type search is faster than a brute-force search, which goes from top to bottom through a list. Within the NTFS tree search, each record specifies whether the file in question is above or below the current position. As the search progresses, the number of files remaining to be searched continues to be cut in half.
SOURCE: DIGIT-LIFE.COM, WHATIS.COM
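The halving behavior described in this caption is the idea behind binary search. The sketch below compares it with a brute-force scan, assuming a sorted Python list of file names as a stand-in for NTFS index records. Real NTFS B-tree nodes hold many entries and branch more than two ways, so this illustrates the principle rather than NTFS’s implementation; `brute_search` and `halving_search` are hypothetical names.

```python
from typing import List, Tuple


def brute_search(names: List[str], target: str) -> Tuple[int, int]:
    """Scan from top to bottom; return (index or -1, comparisons made)."""
    for i, name in enumerate(names):
        if name == target:
            return i, i + 1
    return -1, len(names)


def halving_search(names: List[str], target: str) -> Tuple[int, int]:
    """Binary search on sorted names; return (index or -1, records examined)."""
    lo, hi, steps = 0, len(names) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if names[mid] == target:
            return mid, steps
        if target < names[mid]:   # the file is "above" the current record
            hi = mid - 1
        else:                     # the file is "below" the current record
            lo = mid + 1
    return -1, steps


files = sorted(f"file{n:05d}.txt" for n in range(100_000))
print(brute_search(files, "file99999.txt"))    # (99999, 100000)
print(halving_search(files, "file99999.txt"))  # finds index 99999 in at most 17 steps
```

On 100,000 names, the brute-force scan examines all 100,000 records to reach the last file, while the halving search examines at most 17, because each step discards half of the remaining records and 2^17 is larger than 100,000.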

						