"OS File Systems"
HARD HAT AREA - WHITE PAPER

Longhorn Brings Changes For Windows

The upcoming Longhorn operating system will bring several new file technologies to the next generation of Windows. Many of the changes will be immediately visible and will affect the way you interact with Windows, but many others will work in the background and affect the way the operating system performs.

The work of the file system in Longhorn will be one of those important, behind-the-scenes changes. Every type of writeable storage disk, whether it is a hard drive platter, a CD, or a diskette, must use a file system. The file system performs several tasks related to data storage, including naming, storing, and retrieving data.

Before delving into the features of Longhorn’s file system, let’s look at the types of file systems Linux and Mac OS X use. Most of you are probably familiar with the current main Windows file systems: FAT/FAT16, FAT32, and NTFS. For a refresher, see the “Windows File Systems” sidebar.

Linux File Systems

Linux can use several file systems, but what many people refer to as the “native” Linux file system is ext2. Ext2, which first appeared in 1993, is short for The Second Extended File System. The first version (called ext) initially appeared in 1992 but is no longer part of the Linux kernel. A third version (called ext3) is backward-compatible with ext2 and essentially adds a journaling file system to ext2. Ext3 is included in some 2.4.x kernels. However, ext2 is the foundation for the Linux file system.

Ext2 has its roots as a basic Unix file system. Ext2 uses many of the same principles as Windows OS file systems. It stores files in data blocks (called clusters in Windows file systems) and uses a hierarchical tree for its structure of directories, subdirectories, and files. Ext2 can handle long file names and partitions up to 4TB in size. In ext2, each file is represented by an inode, which includes a detailed description of the file, including the file type, access rights, size, and pointers to the appropriate data blocks.

Of course, Linux wouldn’t be Linux unless many developers were contributing to the project. Dozens of different file systems are under development or are available for use with Linux. Some of them include:

GFS. The Global File System works especially well with Linux cluster file systems. It provides better security than ext2, as well.

Microsoft file systems support. Some developers have created support for FAT32 and NTFS within Linux.

Minix. This file system format originated in Minix, which is a variation of Unix that sparked many features in Linux. However, Minix probably won’t appear in future versions of the Linux kernel, as it has been replaced by ext2.

Read-only file systems. A few read-only file systems designed for boot disks are available for Linux, too. Cramfs is a read-only file system that can use compression. Romfs is a basic read-only file system that cannot use compression. Squashfs is a read-only file system still in development that will squeeze as much data as possible in the boot area.

Reiserfs. The Linux 2.4.x kernel makes Reiserfs available. The strength of Reiserfs lies in its ability to efficiently handle large numbers of small files.

Mac OS X File Systems

Newer Macintosh computers use the HFS (Hierarchical File System) Plus file system, which debuted in Mac OS 8.1. HFS Plus (also called HFS Extended and HFS+) is the primary file system for Mac OS X.

HFS Plus is a replacement for HFS, which was a long-time file system for Macintosh computers developed in the late 1980s. Apple decided to replace HFS because of problems it was having with larger hard drives. (In that regard, HFS is comparable to FAT/FAT16 from Windows computers, which was replaced by FAT32 and NTFS.)

Both HFS and HFS Plus use B-trees for cataloging the file system, just as NTFS does. In 2002, Apple added optional journaling features to HFS Plus for additional security. Even though Apple uses different terminology to describe its file system (Apple calls its data blocks allocation blocks, while Windows calls them clusters), many of the HFS Plus features are comparable to FAT32 and NTFS in cluster size and disk space allocation.

Longhorn & WinFS

WinFS (Windows File System) should debut in the upcoming Longhorn OS, which is the next major desktop version of Windows, expected to replace WinXP, probably in 2005 or 2006. (Although some say WinFS is short for Windows Future Storage, in a recent speech, Bill Gates called it Windows File System.) WinFS is a data storage system that will make information easier to find. The goal of WinFS is to push efficient file-storage capabilities to heights that most of us can scarcely imagine today.

Plans for WinFS involve using several types of technologies, including NTFS. Microsoft officials have said that WinFS won’t replace NTFS but will build on top of it, using NTFS and allowing the strengths of both technologies to work together. Industry experts say the ability of WinFS to work on top of NTFS should speed the acceptance of WinFS and Longhorn. If WinFS used a vastly different file system than is available in today’s Windows PCs, Longhorn might experience compatibility problems with today’s software packages. In addition, customers would probably be leery of making the switch until WinFS had proven its stability.

To make data easier to find, however, WinFS will use a few APIs that NTFS does not. Look for XML technologies to appear in WinFS and provide the file system with high-end data-labeling capabilities. WinFS will also use relational and object-oriented APIs to access data. Each of these technologies will let the WinFS data model handle what Microsoft calls the three key components of a data storage platform: organization, searching, and sharing.

Organization. When organizing and presenting data, WinFS will follow a different path from NTFS, which uses a B-tree structure. WinFS will present data in what Microsoft calls a DAG (directed acyclic graph). Data organization will be far more flexible under WinFS, with the ability to organize by several methods, including relationships.

Searching. By organizing data in a DAG, WinFS opens a new world of possibilities for searching. Users will be able to search using multiple criteria, which is impossible in a tree structure. Microsoft says the WinFS search capabilities will be superior to the filtering capabilities of today’s search engines.

Sharing. Sharing data between users and between applications will be an easier process under WinFS. The technology in WinFS for sharing data will allow for a common security model in Longhorn and will work well with other types of technologies, such as peer-to-peer networking.

Implicit Query

It also appears that as part of WinFS and Longhorn, Microsoft is seeking a Google-like user interface and search function that will make it easier to find a particular piece of data. Microsoft has created an application called IQ (Implicit Query) that would service this function. (A version of IQ was part of a Microsoft presentation at Comdex in November 2003.)

The key force driving IQ would be its ability to perform searches in the background without your prompts. The file structure and file organization in WinFS would make technology such as IQ possible. For example, if you’re working on a particular document (as shown below), IQ would scan your document for key text strings as you enter them and perform searches on those strings, making data available to you in the background if you need it, and maybe even before you realize you need it. Analysts say that in the long run, an idea such as IQ could make search engines obsolete by building into WinFS and the Longhorn OS the ability to search the Web automatically alongside the ability to search hard drives.

IQ searches your email program for any messages matching your To: line, just in case you need to look back at previous messages to John.

IQ would look through your hard drive for any email messages, calendar entries, Word documents, and other documents and items linked to the conference.

IQ would look for any documents, email messages, calendar items, and Web sites related to Bill Gates and Omaha.

WinFS Data Model

The WinFS data model describes the concepts for data structure and organization within the file system. Within the WinFS data model are types, which describe the pieces of data. Each type has certain properties and fields that relate to it and describe it. For example, a type called “person” might have properties and fields such as “name” and “address” that relate to it. The types and their properties aid in data organization and data searches.

WinFS types take part in relationships, too, which are rules related to organizing and using the data. Within the relationship, a source type is required. A target type can be part of the relationship, but it isn’t required. (A relationship with a source type and no target type is called a dangling relationship.) Two major types of relationships exist in WinFS, called holding relationships and reference relationships.

Holding relationships. In a holding relationship, the source type controls the target type, and the relationship doesn’t end until the source type ends it. Dangling relationships are not possible in a holding relationship. While organizing data, WinFS might use a holding relationship with folders and documents, for instance. (More than one folder can have a holding relationship with a document at a time.) If a source-type folder is deleted, the target-type document remains available as long as one or more other source-type folders are pointing to it. However, if all source-type folders are deleted in the holding relationship, and the target-type document has no source-type relationships, it’s deleted.

One other important point about holding relationships: Cycles are not allowed, as shown in Figures 1 and 2. Any holding relationships that would create a cycle or a loop between two or more types with a holding relationship in place aren’t allowed.

Reference relationships. In a reference relationship, source types have no control over target types. Relationships are unrestricted, and dangling relationships are allowed. A reference relationship can involve cycles, too, as shown in Figure 3. A reference relationship might be used over a network, for example. If the network experiences a problem and the link between a source and target type is temporarily broken, a reference relationship could still exist for the other types that remain linked.

Data Storage In Windows File Systems

Depending on the Windows operating system you’re using, you may have a choice between FAT/FAT16, FAT32, and NTFS as your file system. As the “Cluster Size” chart and this graphic show, each file system stores data a little differently, which can cause wasted space within the hard drive (also called slack space).

FAT/FAT16 vs. FAT32

On a 2GB hard drive or partition, FAT/FAT16 uses a default cluster size of 32KB, while FAT32 uses 4KB clusters. (Each small square represents 1KB; the bold-lined squares and rectangles represent a cluster.)

In the above example of a 31KB file (blue), both types of file systems have the same amount of slack space (peach). However, in the bottom example of a 40KB file, the FAT32 file system still has six 4KB clusters available as free space (white) and no slack space, while the FAT/FAT16 file system has 24KB of slack space in the second cluster.

FAT32 vs. NTFS

On a 20GB hard drive or partition, FAT32 uses a default cluster size of 16KB, while NTFS uses 4KB clusters. (Each small square represents 1KB; the bold-lined squares and rectangles represent a cluster.)

In the above example of a 29KB file (blue), both types of file systems have the same amount of slack space (peach). However, in the bottom example of a 50KB file, the NTFS file system still has three 4KB clusters available as free space (white) and 2KB of slack space, while the FAT32 file system has 14KB of slack space in the fourth cluster.

Fragmentation In Windows File Systems

When the hard drive is empty, it’s easy for the file system to fill the clusters in order (Figure 1). However, as files are deleted and as new files are added, the file system might not be able to squeeze files into the available space, forcing it to split the file storage into different areas of the hard drive (Figure 2). Although splitting files across different areas of the hard drive doesn’t affect the file, it does affect system performance because the file system must take additional time to collect each portion of the file from the different areas of the hard drive. When the file system can retrieve an entire file from adjacent clusters, it can work faster. Running a defragmenting program will rearrange the clusters to try to place them closer together (Figure 3). (In the graphic, different files are represented by different colors; each square represents a cluster.)

Cluster Size

In a Windows file system, clusters are the smallest possible storage units on a hard drive. (You can think of a cluster as a drawer in a filing cabinet.) A file can extend over several clusters, but each cluster can only hold one file. Because of this rule, if a file or a portion of a file only occupies a small percentage of a cluster, the remainder of the cluster is wasted, empty space.

Clusters play a key role in hard drive and system performance.
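The one-file-per-cluster rule can be made concrete with a little arithmetic. The sketch below is only an illustration, not anything taken from Windows itself: the function names `clusters_needed` and `slack_kb` are hypothetical, and sizes are treated as whole kilobytes. It recomputes the four slack-space figures quoted in the “Data Storage In Windows File Systems” sidebar.

```python
def clusters_needed(file_kb: int, cluster_kb: int) -> int:
    """Whole clusters a file occupies (a cluster holds data from one file only)."""
    # Integer ceiling division: any partly used cluster still counts as a full one.
    return (file_kb + cluster_kb - 1) // cluster_kb


def slack_kb(file_kb: int, cluster_kb: int) -> int:
    """Allocated but unused space in the file's last cluster, in KB."""
    return clusters_needed(file_kb, cluster_kb) * cluster_kb - file_kb


# File sizes and cluster sizes from the sidebar's two comparisons.
examples = [
    (31, [("FAT/FAT16", 32), ("FAT32", 4)]),
    (40, [("FAT/FAT16", 32), ("FAT32", 4)]),
    (29, [("FAT32", 16), ("NTFS", 4)]),
    (50, [("FAT32", 16), ("NTFS", 4)]),
]

for file_kb, systems in examples:
    for name, cluster_kb in systems:
        print(
            f"{file_kb}KB file on {name} ({cluster_kb}KB clusters): "
            f"{clusters_needed(file_kb, cluster_kb)} cluster(s), "
            f"{slack_kb(file_kb, cluster_kb)}KB slack"
        )
```

For the 40KB file, for instance, two 32KB clusters leave 24KB of slack, while ten 4KB clusters leave none, which matches the sidebar.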
Most experts agree that a 4KB cluster size is best for balancing system performance with minimal wasted space. Clusters that are too small result in less wasted, empty space, but they aren’t as efficient in performance. Clusters that are too large have the opposite problems.

Cluster size is dependent on the size of partitions, on the size of hard drives, and, most importantly, on the file system in place. The chart below shows the default cluster sizes for varying sizes of partitions under each type of Windows file system.

FAT/FAT16
Partition/Hard Drive Size    Default Cluster Size
16MB to 127MB                2KB
128MB to 255MB               4KB
256MB to 511MB               8KB
512MB to 1,023MB             16KB
1,024MB to 2,047MB           32KB
2,048MB to 4,096MB           64KB

FAT32
Partition/Hard Drive Size    Default Cluster Size
0.5GB to 8GB                 4KB
8GB to 16GB                  8KB
16GB to 32GB                 16KB
32GB and larger              32KB

NTFS
Partition/Hard Drive Size    Default Cluster Size
0.5GB or less                512 bytes
0.5GB to 1GB                 1KB
1GB to 2GB                   2KB
2GB and larger               4KB

The Hope For WinFS

Microsoft hopes WinFS can become a jack-of-all-trades file system, giving Microsoft a single storage system that can find stored information quickly while working well with a variety of applications. Microsoft says WinFS will be far more than a file system; it will also deal with nonfile data, such as personal contacts and email messages.

Microsoft officials say they expect WinFS will be able to simplify and streamline data organizing, searching, and sharing. This will be no small task; after all, application types currently play the key role in determining the storage of data. Because each type of application (whether it’s a database, an email server, a file system, or another application) follows a slightly different method for storing data, it can be a nightmare to retrieve and find data stored by different applications.

To improve the ability of Windows computers to search for data throughout all applications, WinFS will use metadata stored within each file. The metadata system, which will use XML technology, will let developers link specific pieces of data to each file that will aid in the search capabilities and write applications that allow for more focused searches.

For example, instead of searching for a keyword in the file name of a Word document, users will have the option to search for a topic in the actual text of the Word document as part of a general file search. Developers will also be able to use the metadata system in WinFS to search all types of applications at the same time, which is a difficult task at best now.

Many industry experts say the improved search capabilities of WinFS can’t come quickly enough. As hard drive sizes soar beyond hundreds of gigabytes, improved search capabilities are vitally important. Without an improved search capability, efficiently organizing and using the huge amounts of data stored on a hard drive will be next to impossible.

Microsoft has many hopes for the future of WinFS. However, the reality, according to industry experts and analysts, is that the daunting task of making WinFS work as expected is going to be extremely difficult for Microsoft. In fact, developing WinFS may already have caused a delay in the eventual release of Longhorn, which initially was rumored to be early in 2005. The release is now expected to occur sometime later in 2005 or 2006.

by Kyle Schurman

Windows File Systems

Microsoft currently uses two file systems with its newest Windows operating systems: FAT32 and NTFS. However, Microsoft’s file system technology started with FAT.

FAT/FAT16

In older versions of Windows (Win95 and older), most hard drives used the FAT file system. The latest, 16-bit version of FAT goes by the name FAT16. Win95 OSR2/98/XP users have the option of using FAT or FAT32. (WinXP users can also select NTFS.)

FAT can trace its roots to 1976, when it appeared in BASIC, and 1981, when it appeared in Microsoft’s first version of DOS. The initial FAT only worked with hard drives up to 32MB in size. As hard drives grew throughout the 1980s and early 1990s, FAT morphed in subsequent versions of DOS to allow for larger hard drives. By 1991, Microsoft released DOS 5.0, and it included a 16-bit FAT, which supported hard drives up to 2GB in size in Win95/98 and up to 4GB in size in WinNT. FAT has continued to use 16-bit technology. One of FAT’s biggest advantages is its ability to work with computers running a variety of Windows operating systems.

FAT works best with smaller hard drives because its simplistic design allows it to work quickly and provide good data-access times on a small hard drive. FAT’s simplistic design hampers its ability to work through a lot of data on larger hard drives, though, resulting in poor data-access times. FAT also makes inefficient use of hard drive space, which is a problem that’s compounded on larger hard drives.

Other companies now have the option of licensing Microsoft’s various versions of its FAT file system, thanks to a recent decision by the company. Microsoft says it wants to join a broad industry effort to make core technologies more available to all kinds of companies through licensing.

Because FAT is a popular format for exchanging media between computers and digital devices, and because it works well with many operating systems, Microsoft decided to make FAT one of the first technologies it’s offering under the new policy. By having FAT available for licensing, Microsoft hopes other companies will have an easier time making compatible products. You can find more information on Microsoft’s licensing policy at www.microsoft.com/mscorp/ip/.

FAT32

FAT32 is a 32-bit file system, which allows it to address more clusters than FAT/FAT16. FAT32 initially appeared in 1996 with the OSR2 release of Win95, which allowed for hard drives of larger than 2GB. (Win95 OSR2 was only available to computer manufacturers; Win98 was the first retail OS featuring FAT32.)

Microsoft attempted to make FAT and FAT32 as compatible as possible, and they’re fairly similar. However, a few significant differences can cause problems with running certain types of older software on FAT32 systems. For example, some hard drive compression software made for FAT will not run on FAT32.

FAT32’s biggest advantage over FAT/FAT16 is its ability to use smaller clusters on larger hard drives and partitions. (See the “Cluster Size” chart.) This feature greatly improved the performance of FAT32 over FAT, in most instances.

NTFS

Windows NT 3.1, which appeared in 1993, was the first to use NTFS. WinNT 4.0 users had the option of using FAT or NTFS. Microsoft’s release of the WinXP Home Edition operating system marked the first time Windows home users had the option of using NTFS instead of FAT or FAT32.

Microsoft designed NTFS to be a more reliable and secure option for network users in a corporate environment. NTFS technology gave system administrators the option of assigning permission to individual files within folders, which allows for flexible control of the network’s most secure files.

One area where NTFS shines is in its file organization. NTFS stores file attributes in its MFT (master file table), which is well organized and can hold a large amount of file attribute information. The MFT is similar to FAT’s file allocation table, but the MFT can store a more complex set of data about the files.

MFT consists of metadata files, which NTFS uses to manage the file system. Some of the important MFT metadata files include:

Bad Cluster File. NTFS uses this metadata file to mark any malfunctioning clusters on the hard drive and avoids storing data there.

Cluster Allocation Bitmap. This map of all clusters on the partition helps the file system find any available clusters.

Log File. Whenever changes occur in the file system, NTFS records the change in the Log File. It doesn’t record the actual changes to the data; it just makes note of when the change occurred, which is useful to particular types of software, such as antivirus software.

MFT Mirror. The MFT Mirror is a copy of the first 16 MFT files, which are responsible for aspects of system operation. NTFS stores the MFT Mirror in a different area of the hard drive than the MFT, allowing the MFT Mirror to serve as a backup if the MFT suffers damage. NTFS reserves the first 12% of the hard drive for the MFT, and the 16 system operation files are at the beginning of that area. The MFT Mirror is in the middle of the other 88% of the hard drive.

Quota Table. Microsoft added this metadata file to the MFT in Win2000/XP. The Quota Table gives you control over the amount of hard drive space any directory can occupy. For example, this feature might be handy at home if you’re sharing a computer with the rest of your family; each person’s folder for personal files can have a limit on its size.

NTFS also uses B-trees when dealing with large folder structures. (A B-tree is a method of organizing and finding files in a database or on a hard drive.) The B-trees let NTFS precisely organize the attributes for the files and folders within the large folder structure, making for efficient searches. This folder structure wasn’t available with FAT.

See www.cpumag.com/cpufeb04/filesystems for information on NTFS search methods.

NTFS Search Methods

NTFS uses B-trees to aid in organizing and finding files on the hard drive. (A B-tree diagram is shown below.) At right, you can see that, in most instances, a tree-type search is faster than a brute search, which goes from top to bottom in a list.
Within the NTFS tree search, each record specifies whether the file in question is above or below the current position. As the search progresses, the number of files remaining to be searched continues to be cut in half.

SOURCE: DIGIT-LIFE.COM, WHATIS.COM
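To see why halving the remaining files beats a top-to-bottom scan, the two approaches can be compared by counting comparisons. The sketch below is a simplification under stated assumptions. It uses a binary search over a sorted Python list rather than a real B-tree, and it does not reflect NTFS’s on-disk structures. The function names and file names are hypothetical.

```python
def linear_search_steps(names, target):
    """Brute search: scan from top to bottom. Returns (index or None, comparisons)."""
    for steps, name in enumerate(names, start=1):
        if name == target:
            return steps - 1, steps
    return None, len(names)


def halving_search_steps(sorted_names, target):
    """Tree-style search on a sorted list. Returns (index or None, comparisons)."""
    lo, hi = 0, len(sorted_names) - 1
    steps = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_names[mid] == target:
            return mid, steps
        if sorted_names[mid] < target:
            lo = mid + 1  # target sorts after the midpoint: discard the lower half
        else:
            hi = mid - 1  # target sorts before the midpoint: discard the upper half
    return None, steps


# Hypothetical directory of 1,024 zero-padded names, so text order matches numeric order.
names = [f"file{i:04d}.txt" for i in range(1024)]
target = "file1000.txt"

print("brute search:", linear_search_steps(names, target))
print("tree search: ", halving_search_steps(names, target))
```

With 1,024 names, the halving search never needs more than 11 comparisons, while the brute search needs 1,001 to reach the same entry. The gap widens as the list grows.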