PAGES: 7 CATEGORY: Software POSTED ON: 6/10/2010
What is file organization? How does file organization work? What are the types of file organization?
ASSIGNMENT
Database Management System
Prepared by: Nazama Liaqat, Nicegirl_laj@hotmail.com

File Organization

A file consists of a collection of records. A key element in file management is the way in which the records themselves are organized inside the file, since this heavily affects system performance when finding and accessing records. Note carefully that by "organization" we refer here to the logical arrangement of the records in the file, not to the physical layout of the file on a storage medium. To prevent confusion, the latter is referred to by the expression "record blocking", and will be treated later on.

Choosing a file organization is a design decision, so it must be made with the aim of achieving good performance for the most likely usage of the file. The criteria usually considered important are:
1. Fast access to a single record or a collection of related records.
2. Easy record addition, update, and removal, without disrupting fast access (criterion 1).
3. Storage efficiency.
4. Redundancy as a safeguard against data corruption.

Needless to say, these requirements conflict with each other in all but the most trivial situations, and it is the designer's job to find a good compromise among them, yielding an adequate solution to the problem at hand. For example, ease of adding, updating, and removing records is not an issue when defining the data organization of a CD-ROM product, whereas fast access is, given the huge amount of data that this medium can store. However, as will become apparent shortly, fast access techniques rely on additional information about the records, which in turn competes for space with the high volume of data to be stored. Logical data organization is the subject of whole shelves of books in the "Database" section of your library.
Here we briefly address some of the simpler techniques, mainly because of their relevance to data management from the point of view of an operating system, which works at a lower level than a database. Five organization models will be considered:
1. Pile.
2. Sequential.
3. Indexed-sequential.
4. Indexed.
5. Hashed.

Pile File Organization:
A heap file is also known as a random file or pile file. In heap file organization, records are inserted at the end of the file or in any file block with free space, so record insertion is efficient. Data are collected in the file in the order in which they arrive. They are not analyzed, categorized, or forced to fit field definitions or field sizes; at best, the order of the records may be chronological. Records may be of variable length and need not have similar sets of data elements.

Uses of Pile File Organization:
Heap files are used where data are collected prior to processing, where data are not easy to recognize, and in some research on file structures. Since much of the data collected in real-world situations is in the form of piles, this file organization is considered the baseline for evaluating other organizations.

Drawbacks of Pile Organization:
In heap file organization, data analysis can become very expensive because of the time required to retrieve a statistically adequate number of sample records:
1. Searching for records is difficult. Normally, linear search is used to locate a record.
2. Deleting a record is difficult, because the record must first be located (by the same linear search) and only then deleted.

Sequential File Organization:
This is the most common structure for large files that are typically processed in their entirety, and it is at the heart of the more complex schemes. In this scheme, all the records have the same size and the same field format, and the fields have a fixed size as well. The records are sorted in the file according to the content of a field of a scalar type, called the "key".
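The pile organization described above can be sketched in Python. This is a minimal in-memory model rather than an on-disk format: the `PileFile` class and its method names are hypothetical, and each record is a dictionary so that records can carry different sets of fields, as a pile allows.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class PileFile:
    """Hypothetical in-memory pile (heap) file: records kept in arrival order."""
    records: List[dict] = field(default_factory=list)

    def insert(self, record: dict) -> None:
        # Insertion is cheap: the new record is simply appended at the end.
        self.records.append(record)

    def find(self, field_name: str, value: Any) -> Optional[dict]:
        # No ordering and no index, so a linear scan is the only option.
        for record in self.records:
            if record.get(field_name) == value:
                return record
        return None

    def delete(self, field_name: str, value: Any) -> bool:
        # Deletion must first locate the record with the same linear scan.
        for position, record in enumerate(self.records):
            if record.get(field_name) == value:
                del self.records[position]
                return True
        return False


pile = PileFile()
pile.insert({"id": 7, "name": "Ali"})
pile.insert({"id": 3, "reading": 21.5})  # a record with a different set of fields
print(pile.find("id", 3))  # prints {'id': 3, 'reading': 21.5}
```

Both `find` and `delete` cost time proportional to the number of records, which is exactly the drawback listed for this organization.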
The key must uniquely identify a record, so different records have different keys. This organization is well suited to batch processing of the entire file without adding or deleting items: this kind of operation can take advantage of the fixed size of records and file. Moreover, this organization is easily stored on both disk and tape. The key ordering, together with the fixed record size, makes this organization amenable to binary (dichotomic) search. However, adding records to and deleting records from this kind of file is a tricky process: to ease file navigation, the logical sequence of records typically matches their physical layout on the storage medium, so adding a record while maintaining the key order requires reorganizing the whole file. The usual solution is to use a "log file" (also called a "transaction file"), structured as a pile, to record such modifications, and to periodically perform a batch update on the master file.

Advantages of Sequential File Organization:
The sequential file organization permits the economical and efficient use of sequential processing techniques when the activity rate is high. This organization also provides quick access to records in a relatively efficient way. Records can be inserted or updated in the middle of the file.

Disadvantages of Sequential File Organization:
Indexed file organization is less efficient in the use of storage space than some other file organizations. It requires relatively expensive hardware and software resources. It requires unique keys. Processing is occasionally slow. It requires periodic reorganization of the file.

Indexed File Organization:
Each record in the file has one or more embedded keys (referred to as key data items); each key is associated with an index. An index provides a logical path to the data records according to the contents of the associated embedded record key data items. Indexed files must reside on direct-access storage. Records can be fixed length or variable length.
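The sequential organization described above can be sketched as a master file kept sorted by a unique key, searched by binary (dichotomic) search, plus a pile-structured transaction log applied in one merge pass during the periodic batch update. The record layout, the operation names "add", "update" and "delete", and the rule that the last logged transaction for a key wins are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical fixed-format record: (key, payload). The master list is kept
# sorted by key, and keys are unique, as the sequential organization requires.
Record = Tuple[int, str]
Transaction = Tuple[str, int, Optional[str]]  # (operation, key, payload)


def binary_search(master: List[Record], key: int) -> Optional[Record]:
    """Dichotomic search: halve the remaining key range at each step."""
    low, high = 0, len(master) - 1
    while low <= high:
        mid = (low + high) // 2
        mid_key = master[mid][0]
        if mid_key == key:
            return master[mid]
        if mid_key < key:
            low = mid + 1
        else:
            high = mid - 1
    return None


def batch_update(master: List[Record], log: List[Transaction]) -> List[Record]:
    """Apply a pile-structured transaction log to the master file in one pass."""
    # Collapse the log: for each key, the last transaction in arrival order wins.
    latest = {}
    for operation, key, payload in log:
        latest[key] = (operation, payload)
    changes = sorted(latest.items())  # now in key order, like the master file
    new_master: List[Record] = []
    i = j = 0
    while i < len(master) or j < len(changes):
        if j == len(changes) or (i < len(master) and master[i][0] < changes[j][0]):
            new_master.append(master[i])  # unchanged record is copied through
            i += 1
            continue
        key, (operation, payload) = changes[j]
        if i < len(master) and master[i][0] == key:
            i += 1  # the old version of this record is replaced or dropped
        if operation != "delete":
            # Simplification: "add" and "update" both write the new record.
            new_master.append((key, payload))
        j += 1
    return new_master


master = [(1, "a"), (3, "c"), (5, "e")]
log = [("add", 4, "d"), ("delete", 1, None), ("update", 5, "E"), ("add", 2, "b")]
master = batch_update(master, log)
print(master)  # prints [(2, 'b'), (3, 'c'), (4, 'd'), (5, 'E')]
print(binary_search(master, 4))  # prints (4, 'd')
```

Because the sorted changes and the master file are read in the same key order, the whole update needs only one sequential pass over the master file, which is why batching suits this organization.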
Each record in an indexed file must have an embedded prime key data item. When records are inserted, updated, or deleted, they are identified solely by the values of their prime keys. Thus, the value in each prime key data item must be unique and must not be changed when the record is updated. You tell COBOL the name of the prime key data item in the RECORD KEY clause of the file-control paragraph.

In addition, each record in an indexed file can contain one or more embedded alternate key data items. Each alternate key provides another means of identifying which record to retrieve. You tell COBOL the name of any alternate key data items in the ALTERNATE RECORD KEY clause of the file-control paragraph.

As in the sequential organization, the data are stored in physically contiguous blocks; the difference lies in the use of indexes. There are three areas in the disk storage:
Primary Area: contains file records stored by key or ID number.
Overflow Area: contains records that cannot be placed in the primary area.
Index Area: contains the keys of records and their locations on the disk.

Advantages of Indexed File:
• Faster access to rows when the search is on an indexed column.
Disadvantages of Indexed File:
• Inserts and deletes are slower.
• Updates to indexed columns are slower.
• Storage usage increases.

Hash File Organization:
Hashing (hash addressing) is a technique for providing fast direct access to a specific record on the basis of a given value of some field. If two or more key values hash to the same disk address, we have a collision. The hash function should distribute the key domain as evenly as possible over the address space of the file to minimize the chance of collisions. Collisions may cause a page to overflow.
1. Hashing involves computing the address of a data item by computing a function on the search key value.
2. A hash function h is a function from the set of all search key values K to the set of all bucket addresses B.
We choose a number of buckets to correspond to the number of search key values we will have stored in the database. To perform a lookup on a search key value Ki, we compute h(Ki) and search the bucket with that address. If two search keys Ki and Kj map to the same address, because h(Ki) = h(Kj), then the bucket at that address will contain records with both search key values. In this case we have to check the search key value of every record in the bucket to get the ones we want. Insertion and deletion are simple.

Hash Functions
1. A good hash function gives an average-case lookup cost that is a small constant, independent of the number of search keys.
2. We hope records are distributed uniformly among the buckets.
3. The worst hash function maps all keys to the same bucket.
4. The best hash function maps all keys to distinct addresses.
5. Ideally, the distribution of keys to addresses is uniform and random.
6. Suppose we have 26 buckets and map names beginning with the ith letter of the alphabet to the ith bucket. Problem: this does not give a uniform distribution. Many more names will be mapped to "A" than to "X".

Typical hash functions perform some operation on the internal binary machine representation of the characters in a key. For example, compute the sum, modulo the number of buckets, of the binary representations of the characters of the search key.

Handling of Bucket Overflows
1. Open hashing occurs where records are stored in different buckets. Compute the hash function and search the corresponding bucket to find a record.
2. Closed hashing occurs where all records are stored in one bucket. The hash function computes addresses within that bucket. (Deletions are difficult.) It is not used much in database applications.
3. Drawback of this approach: the hash function must be chosen at implementation time. The number of buckets is fixed, but the database may grow. If the number is too large, we waste space.
If the number is too small, we get too many "collisions", resulting in records of many search key values being in the same bucket. Choosing the number to be twice the number of search key values in the file gives a good space/performance trade-off.

Disadvantages: Any search other than on equality is very expensive (it requires a linear search or sorting). Predicting the total number of buckets is difficult; the options are to allocate a large space, or to estimate a "reasonable" size and periodically reorganize.

Indexed Sequential File Organization
An index file can be used to overcome the limitations of the plain sequential organization discussed earlier, and to speed up the key search as well. The simplest indexing structure is the single-level one: a file whose records are key-pointer pairs, where the pointer is the position in the data file of the record with the given key. Only a subset of the data records, evenly spaced along the data file, is indexed, so as to mark intervals of data records. A key search then proceeds as follows: the search key is compared with the index keys to find the highest index key not greater than the search key, and a linear search is performed from the record that index key points to onward, until the search key is matched or until the record pointed to by the next index entry is reached. In spite of the double file access (index + data) needed by this kind of search, the decrease in access time with respect to a sequential file is significant. Consider, for example, a simple linear search on a file with 1,000 records. With the sequential organization, an average of 500 key comparisons is necessary (assuming the search keys are uniformly distributed among the data keys). However, with an evenly spaced index of 100 entries, each indexed interval holds 10 records, so the number of comparisons is reduced to about 50 in the index file plus about 5 in the data file: roughly a 9:1 reduction in the number of operations.
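The comparison counts in the 1,000-record example can be checked with a small simulation. This is a sketch under stated assumptions: the keys are the integers 0 to 999 (a hypothetical data set), every key is searched exactly once, the index holds every 10th record, the index itself is scanned linearly, and one comparison is counted per key examined. The function names are illustrative only.

```python
from typing import List, Tuple


def build_sparse_index(keys: List[int], spacing: int) -> List[Tuple[int, int]]:
    """Index every `spacing`-th record as a (key, position) pair."""
    return [(keys[pos], pos) for pos in range(0, len(keys), spacing)]


def sequential_search(keys: List[int], target: int) -> Tuple[int, int]:
    """Linear search over the data file; returns (position, comparisons)."""
    for pos, key in enumerate(keys):
        if key == target:
            return pos, pos + 1
    return -1, len(keys)


def indexed_sequential_search(
    keys: List[int], index: List[Tuple[int, int]], target: int
) -> Tuple[int, int]:
    """Scan the index, then scan one interval of the data file."""
    comparisons = 0
    start, end = 0, len(keys)
    # Find the highest index key not greater than the target.
    for index_key, position in index:
        comparisons += 1
        if index_key > target:
            end = position  # stop at the record the next index entry points to
            break
        start = position
    # Linear search inside the selected interval of the data file.
    for pos in range(start, end):
        comparisons += 1
        if keys[pos] == target:
            return pos, comparisons
    return -1, comparisons


keys = list(range(1000))  # 1,000 records sorted by key
index = build_sparse_index(keys, 10)  # 100 evenly spaced index entries
avg_sequential = sum(sequential_search(keys, k)[1] for k in keys) / len(keys)
avg_indexed = sum(indexed_sequential_search(keys, index, k)[1] for k in keys) / len(keys)
print(avg_sequential, avg_indexed)  # prints 500.5 56.99
```

Under these assumptions the indexed search averages about 57 comparisons (roughly 51.5 in the index plus 5.5 in the data file) against 500.5 for the plain sequential search.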
This scheme can obviously be extended hierarchically: an index is itself a sequential file, which can in turn be indexed by a second-level index, and so on, thus exploiting the hierarchical decomposition of searches more and more to decrease access time. Obviously, if the layering of indexes is pushed too far, a point is reached where the advantages of indexing are outweighed by the increased storage costs and by the index access times as well.

Advantages of Indexed Sequential File

Reorganizing Indexed Files:
This operation is usually done by a utility program supplied by the manufacturer of the COBOL compiler that you use. Refer to your compiler's manuals for details and instructions.
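The hash file lookup described in the hashing section (compute h(Ki), then examine every record in that bucket) can be sketched with the sum-modulo hash function mentioned there. The `HashFile` class and `char_sum_hash` function are hypothetical names; the buckets are unbounded in-memory lists, so page overflow is not modeled, and the choice of 8 buckets is arbitrary.

```python
from typing import List, Tuple


def char_sum_hash(key: str, num_buckets: int) -> int:
    """Sum of the character codes of the key, modulo the number of buckets."""
    return sum(ord(ch) for ch in key) % num_buckets


class HashFile:
    """Hypothetical in-memory hash file: a fixed list of buckets of (key, record)."""

    def __init__(self, num_buckets: int) -> None:
        self.num_buckets = num_buckets
        self.buckets: List[List[Tuple[str, dict]]] = [[] for _ in range(num_buckets)]

    def _bucket(self, key: str) -> List[Tuple[str, dict]]:
        # h(K): the bucket address is computed directly from the search key.
        return self.buckets[char_sum_hash(key, self.num_buckets)]

    def insert(self, key: str, record: dict) -> None:
        self._bucket(key).append((key, record))

    def lookup(self, key: str) -> List[dict]:
        # Colliding keys share a bucket, so every entry's key is compared.
        return [record for k, record in self._bucket(key) if k == key]

    def delete(self, key: str) -> int:
        bucket = self._bucket(key)
        before = len(bucket)
        bucket[:] = [(k, r) for k, r in bucket if k != key]
        return before - len(bucket)


# "ab" and "ba" have the same character sum, so they land in the same bucket.
hash_file = HashFile(num_buckets=8)
hash_file.insert("ab", {"id": 1})
hash_file.insert("ba", {"id": 2})
print(hash_file.lookup("ba"))  # prints [{'id': 2}]
```

Because a plain character sum ignores character order, anagrams such as "ab" and "ba" always collide; practical hash functions usually also take character positions into account to spread keys more evenly.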