VIEWS: 25 PAGES: 56 POSTED ON: 9/11/2012
Chapter 10 Distributed Database 10 Management System Database Systems: Design, Implementation, and Management 4th Edition Story of Papa John’s by Nation’s Restaurant News (Aug. 1997) Papa John’s selects Oracle Universal Server as its database platform 10 Papa John’s operates 1300 –plus units in 39 states. The Oracle platform is to support distributed databases for relational video, audio, text, messages, and other types of data. The Key component of Oracle’s Universal data server is Oracle 7, which has the following capabilities. Share data across its communication network, allowing users to access data without knowing or specifying the location Provide parallel server technology Replicates databases among multiple locations What Distributed Databases and Distributed DBMS One database, and many pieces 10 Many large user organizations are abandoning the traditional concept of a massive, single, centralized data repository. To expedite the delivery of data into the hands of those who need it the most, businesses are employing complex new software systems that replace the central-site information warehouse with a more efficient network of data distribution outlets. What Is A Distributed Database and DDBMS? One database, and many pieces 10 A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) permits the management of the DDB and makes the distribution transparent to the users. Transparent-- easily understood; manifest; obvious. The Adv/disadv of Distributed DBMS DDBMS Advantages DDBMS Disadvantages Data are located near the Complexity of “greatest demand” site. management and control 10 Faster data access Faster data processing Security Lack of standards Growth facilitation Increased storage requirements Improved communications Reduced operating costs User-friendly interface Less danger of a single- point failure Processor independence The Evolution of Distributed DBMS Social and Technical Changes in the 1980’s Business operations became more decentralized geographically. 10 Competition increased at the global level. Customer demands and market needs favored a decentralized management style. Rapid technological change created low-cost microcomputers. The LANs became the basis for computerized solutions. The large number of applications based on DBMSs and the need to protect investments in centralized DBMS software made the notion of data sharing attractive. The Evolution of Distributed DBMS Two Database Requirements in a Dynamic Business Environment: Quick ad hoc data access became crucial in the quick- 10 response decision making environment. The decentralization of management structure based on the decentralization of business units made decentralized multiple-access and multiple-location databases a necessity. Developments in the 1990’s affecting DBMS The growing acceptance of the Internet and the World Wide Web as the platform for data access and distribution. The increased focus on data analysis that led to data mining and data warehousing. Distributed Processing and Distributed Database Distributed processing shares the database’s logical processing among two or more physically 10 independent sites that are connected through a network. (See Figure 10.1) Distributed database stores a logically related database over two or more physically independent sites connected via a computer network. (See Figure 10.2) Distributed Database Environment 10 Figure 10.2 Distributed Processing and Distributed Database Distributed processing does not require a distributed database, but a distributed database requires 10 distributed processing. Distributed processing may be based on a single database located on a single computer. In order to manage distributed data, copies or parts of the database processing functions must be distributed to all data storage sites. Both distributed processing and distributed databases require a network to connect all components. What Is A Distributed Database and DDBMS? A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. 10 A distributed database management system (DDBMS) permits the management of the DDB and makes the distribution transparent to the users. Transparent-- easily understood; manifest; obvious. Features of a DDBMS Distribution transparency-- If a DDBMS exhibits it, you, the user, are treating a DDB as a single logical DB and therefore you do not need to know 10 Whether the data is partitioned (Fragmentation transparency) the data location (location transparency) Whether the data is replicated at several sites Performance Transparency– A query can be submitted from any location and it will run with comparable performance. The system performs as if it were a centralized DBMS. Transaction Transparency– A user can run a transaction that updates data at a number of sites or that it will be aborted. The transaction behaves exactly like a local transaction. Heterogeneity transparency-- If a DDBMS exhibits it, it allows the integration of different database models (hierarchical, relational, network) under a 10 common/global schema. Distributed DB Transparency DDBMS transparency features have the common property of allowing the end users 10 to think that he is the database’s only user. Distribution transparency Transaction transparency Failure transparency Performance transparency Heterogeneity transparency Distribution Transparency Distribution transparency allows us to manage a physically dispersed database as though it were a centralized database. 10 Three Levels of Distribution Transparency Fragmentation transparency Location transparency Local mapping transparency Table 10.2 Distribution Transparency Example (Figure 10.9): Employee data (EMPLOYEE) are distributed over three locations: New York, Atlanta, and Miami. 10 Depending on the level of distribution transparency support, three different cases of queries are possible: Figure 10.9 Fragment Locations Distribution Transparency Case 1: DB Supports Fragmentation Transparency 10 SELECT * FROM EMPLOYEE WHERE EMP_DOB < ‘01-JAN-1940’; Distribution Transparency Case 2: DB Supports Location Transparency 10 SELECT * FROM E1 WHERE EMP_DOB < ‘01-JAN-1940’; UNION SELECT * FROM E2 WHERE EMP_DOC < ‘01-JAN-1940’; UNION SELECT * FROM E3 WHERE EMP_DOC < ‘01-JAN-1940’; Distribution Transparency Case 3: DB Supports Local Mapping Transparency 10 SELECT * FROM E1 NODE NY WHERE EMP_DOB < ‘01-JAN-1940’; UNION SELECT * FROM E2 NODE ATL WHERE EMP_DOC < ‘01-JAN-1940’; UNION SELECT * FROM E3 NODE MIA WHERE EMP_DOC < ‘01-JAN-1940’; What Is A Distributed DBMS? Functions of a DDBMS Application interface Validation to analyze data requests 10 Transformation to determine request’s components Query-optimization to find the best access strategy Mapping to determine the data location I/O interface to read or write data Formatting to prepare the data for presentation Security to provide data privacy Backup and recovery Database administration Concurrency control Transaction management Fully Distributed Database Management System 10 Figure 10.4 DDBMS Components Computer workstations that form the network system. Network hardware and software components that 10 reside in each workstation. Communications media that carry the data from one workstation to another. Transaction processor (TP) is the software component found in each computer that (1) request data and (2) receives and processes the application’s data requests. Data processor (DP) stores and retrieves data located at the site. Also known as data manager (DM). A Distributed Transaction 10 Figure 10.12 DDBMS Components DDBMS protocol determines how the DDBMS will: Interface with the network to transport data and commands between DPs and TPs. 10 Synchronize all data received from DPs (TP side) and route retrieved data to the appropriate TPs (DP side). Ensure common database functions in a distributed system -- security, concurrency control, backup, and recovery. Levels of Data & Process Distribution Single-Site Processing, Single-Site Data (SPSD) All processing is done on a single CPU or host computer. All data are stored on the host computer’s local disk. 10 The DBMS is located on the host computer. The DBMS is accessed by dumb terminals. Typical of most mainframe and minicomputer DBMSs. Typical of the 1st generation of single-user microcomputer database. Table 10.1 Nondistributed (Centralized) DBMS 10 Figure 10.6 Levels of Data & Process Distribution Multiple-Site Processing, Single-Site Data (MPSD) All processing is done on different computers sharing a single database 10 Typically, MPSD requires a network file server on which conventional applications are accessed through a LAN. Figure 10.7 Levels of Data & Process Distribution Multiple-Site Processing, Multiple-Site Data (MPMD) Fully distributed DBMS with support for multiple DPs and TPs at multiple sites. 10 Homogeneous DDMS integrate only one type of centralized DBMS over the network. Heterogeneous DDBMS integrate different types of centralized DBMSs over a network. (See Figure 10.8) Distribution Transparency Distribution transparency is supported by distributed data dictionary (DDD) or a distributed data catalog (DDC). 10 The DDC contains the description of the entire database as seen by the database administrator. The database description, known as the distributed global schema, is the common database schema used by local TPs to translate user requests into subqueries. Transaction Transparency Transaction transparency ensures that database transactions will maintain the database’s integrity and consistency. The transaction will be completed 10 only if all database sites involved in the transaction complete their part of the transaction. Related Concepts: Remote Requests Remote Transactions Distributed Transactions Distributed Requests Transaction Transparency Distributed Requests and Distributed Transactions A remote request allows us to access data to be processed by a single remote database processor. 10 (Figure 10.10) A remote transaction, composed of several requests, may access data at only a single site. (Figure 10.11) A distributed transaction allows a transaction to reference several different (local or remote) DP sites. (Figure 10.12) A distributed request lets us reference data from several remote DP sites. (Figure 10.13) It also allows a single request to reference a physically partitioned table. (Figure 10.14) A Remote Request 10 Figure 10.10 A Remote Transaction 10 Figure 10.11 A Distributed Transaction 10 Figure 10.12 A Distributed Request 10 Figure 10.13 Another Distributed Request 10 Figure 10.14 Performance Transparency and Query Optimization The objective of a query optimization routine is to minimize the total cost associated with the execution 10 of a request. The costs associated with a request are a function of the: Access time (I/O) cost involved in accessing the physical data stored on disk. Communication cost associated with the transmission of data among nodes in distributed database systems. CPU time cost associated with the processing overhead of managing distributed transactions. Performance Transparency and Query Optimization Query optimization must provide distribution transparency as well as replica transparency. 10 Replica transparency refers to the DDBMSs ability to hide the existence of multiple copies of data from the user. Most of the query optimization algorithms are based on two principles: Selection of the optimum execution order Selection of sites to be accessed to minimize communication costs Performance Transparency and Query Optimization Operation Modes of Query Optimization Automatic query optimization means that the DDBMS 10 finds the most cost-effective access path without user intervention. Manual query optimization requires that the optimization be selected and scheduled by the end user or programmer. Timing of Query Optimization Static query optimization takes place at compilation time. Dynamic query optimization takes place at execution time. Performance Transparency and Query Optimization Optimization Techniques by Information Used Statistically based query optimization uses statistical 10 information about the database. In the dynamic statistical generation mode, the DDBMS automatically evaluates and updates the statistics after each access. In the manual statistical generation mode, the statistics must be updated periodically through a user-selected utility. Rule-based query optimization algorithm is based on a set of user-defined rules to determine the best query access strategy. Distributed Database Design The design of a distributed database introduces three new issues: 10 How to partition the database into fragments. Which fragments to replicate. Where to locate those fragments and replicas. Data Fragmentation Data fragmentation allows us to break a single object into two or more segments or fragments. Each fragment can be stored at any site over a 10 computer network. Data fragmentation information is stored in the distributed data catalog (DDC), from which it is accessed by the transaction processor (TP) to process user requests. Three Types of Fragmentation Strategies: Horizontal fragmentation Vertical fragmentation Mixed fragmentation A Sample CUSTOMER Table 10 Figure 10.16 Data Fragmentation Horizontal Fragmentation Division of a relation into subsets (fragments) of tuples (rows). Each fragment is stored at a different node, and 10 each fragment has unique rows. Each fragment represents the equivalent of a SELECT statement, with the WHERE clause on a single attribute. Table 10.3 Horizontal Fragmentation of the CUSTOMER Table By State Table Fragments In Three Locations 10 Figure 10.17 Data Fragmentation Vertical Fragmentation Division of a relation into attribute (column) subsets. Each subset (fragment) is stored at a different node, 10 and each fragment has unique columns -- with the exception of the key column. This is the equivalent of the PROJECT statement. Table 10.4 Vertical Fragmentation of the CUSTOMER Table Vertically Fragmented Table Contents 10 Figure 10.18 Data Fragmentation Mixed Fragmentation Combination of horizontal and vertical strategies. A table may be divided into several horizontal subsets 10 (rows), each one having a subset of the attributes (columns). 10 Table 10.5 Mixed Fragmentation of the CUSTOMER Table 10 Figure 10.19 Data Replication Data replication refers to the storage of data copies at multiple sites served by a computer network. Fragment copies can be stored at several sites to serve 10 specific information requirements. The existence of fragment copies can enhance data availability and response time, reducing communication and total query costs. Figure 10.20 Data Replication Mutual Consistency Rule Replicated data are subject to the mutual consistency 10 rule, which requires that all copies of data fragments be identical and that DDBMS must ensure that a database update is performed at all sites where replicas exist. Data Replication Replication Conditions A fully replicated database stores multiple copies of all database fragments at multiple sites. 10 A partially replicated database stores multiple copies of some database fragments at multiple sites. Factors for Data Replication Decision Database Size Usage Frequency Data Allocation Data allocation describes the processing of deciding where to locate data. 10 Data Allocation Strategies Centralized The entire database is stored at one site. Partitioned The database is divided into several disjoint parts (fragments) and stored at several sites. Replicated Copies of one or more database fragments are stored at several sites. Data Allocation Data allocation algorithms take into consideration a variety of factors: 10 Performance and data availability goals Size, number of rows, the number of relations that an entity maintains with other entities. Types of transactions to be applied to the database, the attributes accessed by each of those transactions. Client/Server vs. DDBMS Client/server architecture refers to the way in which computers interact to form a system. 10 It features a user of resources or a client and a provider of resources or a server. The architecture can be used to implement a DBMS in which the client is the transaction processor (TP) and the server is the data processor (DP).
Pages to are hidden for
"Database Systems: Design, Implementation, and Management THIRD EDITION"Please download to view full document