DISTRIBUTED DBMS SUSHIL KULKARNI DDBMS Concepts Applications Characteristics, Properties of DDBMS Distributed Processing Advantages & Disadvantages DDBMS Types & Functions of DDBMS Main Issues of DDBMS Component Architecture for DDBMS Data Allocation & Fragmentation Transparencies CONCEPTS CONCEPTS • So far, we assume a centralized database Data are stored in one location (e.g. a single hard disk) A centralized database management system to handle transaction To handle multiple requests, a client-server system is used - Client send requests for data to server - Server handle query, transaction management etc. SUSHIL KULKARNI CONCEPTS • This is not the only possibility • In many cases, it may be advantageous for data to be distributed – Branches of a bank – Different part of the government storing different kind of data about a person – Different organizations sharing part of their data • Thus, distributed databases SUSHIL KULKARNI CONCEPTS • Data spread over multiple machines (also referred to as sites or nodes. • Network interconnects the machines • Data shared by users on multiple machines SUSHIL KULKARNI CONCEPTS Distributed database Logical interrelated collection of shared data, along with description of data, physically distributed over a computer network. SUSHIL KULKARNI CONCEPTS Distributed DBMS The software system that permits the management of the distributed database and makes the distribution transparent to users SUSHIL KULKARNI CONCEPTS Applications • User access distributed database via applications SUSHIL KULKARNI CONCEPTS TWO types of Applications • Local application : Application that do not required data from other sites. • Global application : Application that required data from other sites. SUSHIL KULKARNI TYPES OF DDBMS • In a homogeneous distributed database: – All sites have identical software. – Are aware of each other and agree to cooperate in processing user requests. – Each site surrenders part of its autonomy in terms of right to change schemas or software. – Appears to user as a single system. SUSHIL KULKARNI TYPES OF DDBMS • In a heterogeneous distributed database: – Different sites may use different schemas and software. • Difference in schema is a major problem for query processing. • Difference in software is a major problem for transaction processing. – Sites may not be aware of each other and may provide only limited facilities for cooperation in transaction processing. SUSHIL KULKARNI TYPE: HOMOGENEOUS DBMS Identical DBMSs SUSHIL KULKARNI TYPE: HETROGENEOUS DBMS Non-identical DBMSs SUSHIL KULKARNI OBJECTIVES : DISTRIBUTED ARCHITECTURE • Location Transparency – User does not have to know the location of the data. – Data requests automatically forwarded to appropriate sites • Local Autonomy – Local site can operate with its database when network connections fail – Each site controls its own data, security, logging, recovery SUSHIL KULKARNI SIGNIFICANT TRADE -OFF Synchronous • Asynchronous Distributed Database Distributed Database • All copies of the same • Some data inconsistency data are always is tolerated identical • Data update propagation is delayed • Data updates are immediately applied to • Lower data integrity all copies throughout • Less overhead faster network response time • Good for data integrity • High overhead slow response times NOTE: all this assumes replicated data (to be discussed later) Advantages & Disadvantages Advantages Disadvantages • Increased reliability • Software cost & & availability complexity • Local control • Processing overhead • Modular growth • Data integrity • Lower • Slow response communication costs • Faster response DISTRIBUTED PROCESSING A centralized database that can be accessed over a computer network. SUSHIL KULKARNI DISTRIBUTED PROCESSING T T T T T T COM 1 COM 2 Communication Network DB T T T SUSHIL KULKARNI COM 3 FUNCTIONS OF DDBMS Functions of a centralized DBMS plus: extended communication to allow the transfer of queries and data among sites extended system catalog to store data distribution details distributed query processing , including query optimization SUSHIL KULKARNI FUNCTIONS OF DDBMS extended concurrency control to maintain consistency of replicated data. extended recovery services to take account of failures of individual sites and common links SUSHIL KULKARNI TWO MAIN ISSUES IN DDBMS Making query from one site to the same or remote site. Logical database is partitioned in to different data streams and located at different sites. SUSHIL KULKARNI COMPONENT ARCHITECTURE FOR DDBMS • Local DBMS • Data Communication Component • Global System Catalog • Distributed DBMS component SUSHIL KULKARNI DATA ALLOCATION DATA ALLOCATION • Centralized • Fragmented • Complete replication • Selective replication SUSHIL KULKARNI Distributed Data Storage • Assume relational data model. • Replication: – System maintains multiple copies of data, stored in different sites, for faster retrieval and fault tolerance. • Fragmentation: – Relation is partitioned into several fragments stored in distinct sites • Replication and fragmentation can be combined: – Relation is partitioned into several fragments: System maintains several identical replicas of each such fragment. Data Replication • A relation or fragment of a relation is replicated if it is stored redundantly in two or more sites. • Full replication of a relation is the case where the relation is stored at all sites. • Fully redundant databases are those in which every site contains a copy of the entire database. SUSHIL KULKARNI Data Replication (Cont.) Data Replication • Advantages of Replication: – Availability: failure of site containing relation r does not result in unavailability of r is replicas exist. – Parallelism: queries on r may be processed by several nodes in parallel. – Reduced data transfer: relation r is available locally at each site containing a replica of r. SUSHIL KULKARNI Data Replication (Cont.) Data Replication • Disadvantages of Replication – Increased cost of updates: each replica of relation r must be updated. – Increased complexity of concurrency control: concurrent updates to distinct replicas may lead to inconsistent data unless special concurrency control mechanisms are implemented. • One solution: choose one copy as primary copy and apply concurrency control operations on primary copy. Data Fragmentation • Division of relation r into fragments r1, r2, …, rn which contain sufficient information to reconstruct relation r. • Horizontal fragmentation: each tuple of r is assigned to one or more fragments. • Vertical fragmentation: the schema for relation r is split into several smaller schemas. – All schemas must contain a common candidate key (or superkey) to ensure lossless join property. – A special attribute, the tuple-id attribute may be added to each schema to serve as a candidate key. • Example : relation account with following schema. • Account-schema = (branch-name, account-number, balance). HORIZONTAL FRAGMENTATION Original relation A1 A2 ………. An T1 A1 A2 ………. An T2 T1 T3 1 T2 . 1 T3 .T60 1 . 2 Site 1 .T60 T61 2 A1 A2 ………. An 3 . T61 3 . . 3 Tn . Tn Site 2 -Fragments contain subsets of complete tuples (all attributes at all sites) How to reconstruct R= Rs1 Rs2 ……. Rsn VERTICAL FRAGMENTATION A1 A2 A3 A4 Original t1 How to Reconstruct: Relation (R) t2 R=Rs1 Rs2 Rsn TID –Tuple ID Hidden Attribute to ensure account tn and simple join reconstruction A1 A2 TID TID A3 A4 RS2 t1 1 1 t1 RS1 2 2 t2 t2 RS1.TID=RS2.TID n n tn tn Join condition SITE1 SITE2 VERTICAL FRAGMENTATION A1 A2 A3 A4 Original t1 How to Reconstruct: Relation (R) t2 R=Rs1 Rs2 Rsn TID –Tuple ID Hidden Attribute to ensure account tn and simple join reconstruction A1 A2 TID TID A3 A4 RS2 t1 1 1 t1 RS1 2 2 t2 t2 RS1.TID=RS2.TID n n tn tn Join condition SITE1 SITE2 MIXED FRAGMENTATION Rs1 A1 A2 A3 A4 A5 Rs3 u R s a A1 A2 A3 A4 A5 Rs2 A1 A2 A3 E A4 A5 u (Salary (Benefit r Attributes) Attributes) o p Rs4 e MIXED FRAGMENTATION A1 A2 A3 A4 Original t1 How to Reconstruct: Relation (R) t2 R=Rs1 Rs2 Rsn TID –Tuple ID Hidden Attribute to ensure account tn and simple join reconstruction A1 A2 TID TID A3 A4 RS2 t1 1 1 t1 RS1 2 2 t2 t2 RS1.TID=RS2.TID n n tn tn Join condition SITE1 SITE2 Horizontal Fragmentation of account Relation branch-name account-number balance Hillside A-305 500 Hillside A-226 336 Hillside A-155 62 account1=branch-name=“Hillside”(account) branch-name account-number balance Valleyview A-177 205 Valleyview A-402 10000 Valleyview A-408 1123 Valleyview A-639 750 account2=branch-name=“Valleyview”(account) SUSHIL KULKARNI Vertical Fragmentation of employee-info Relation branch-name customer-name tuple-id Hillside Lowman 1 Hillside Camp 2 Valleyview Camp 3 Valleyview Kahn 4 Hillside Kahn 5 Valleyview Kahn 6 Valleyview Green 7 deposit1=branch-name, customer-name, tuple-id(employee-info) account number balance tuple-id A-305 500 1 A-226 336 2 A-177 205 3 A-402 10000 4 A-155 62 5 A-408 1123 6 A-639 750 7 deposit2=account-number, balance, tuple-id(employee-info) Advantages of Fragmentation • Horizontal: – allows parallel processing on fragments of a relation – allows a relation to be split so that tuples are located where they are most frequently accessed • Vertical: – allows tuples to be split so that each part of the tuple is stored where it is most frequently accessed – tuple-id attribute allows efficient joining of vertical fragments – allows parallel processing on a relation • Vertical and horizontal fragmentation can be mixed. – Fragments may be successively fragmented to an arbitrary depth. SUSHIL KULKARNI REPLICATION and FRAGMENTATION Partition of Attributes/tuples need not be disjoint A1 A2 A3 A4 A5 A1 A2 A3 A4 A2 A3 A4 A5 Overlap (replication of attributes) TRANSPARENCIES TRANSPARENCIES IN DDBMS • Transparencies hide implementation details from the user • Example in Centralized databases : Data independence • Main types of transparencies in DDBMS: o Distributed Transparency o Transaction Transparency SUSHIL KULKARNI DISTRIBUTED TRANSPARENCY Allows the user to see the database as a single, logical entity. If this transparency is exhibited then the user does not need to know that 1. The data are partitioned. 2. Data can be replicated at several sites. 3. Data location. SUSHIL KULKARNI EXAMPLE Staff (staffNo, position, sex, dob, salary, fName, lName, branchNo) Vertical fragmentation: S Π staffNo, position, sex , dob, salary (Staff) 1 S Π staffNo, fName, lName , dbranchNo (Staff) 2 SUSHIL KULKARNI EXAMPLE Fragment S 2 according to branch number. Assume that there are only three branches. Horizontal fragmentation: S (Staff) 21 branchNo ' B003 ' S (Staff) 22 branchNo ' B005 ' S (Staff) 23 branchNo ' B007 ' SUSHIL KULKARNI EXAMPLE Assume that : S 1 and S 2 are at site 5, S 21 at site 3 S 22 at site 5 S 23 at site 7 SUSHIL KULKARNI FRAGMENTATION TRANSPARENCY If it is provided then the user does not need to know the data is fragmented. Example: SELECT fName, lName FROM Staff WHERE position = ‘ Manager ’ SUSHIL KULKARNI LOCATION TRANSPARENCY If it is provided then the user must know how the data has been fragmented but still does not have know the location of the data. SUSHIL KULKARNI LOCATION TRANSPARENCY Example: SELECT fName, lName FROM S21 WHERE staffNo IN (SELECT staffNO FROM S1 where position = ‘ Manager ’) UNION SELECT fName, lName FROM S22 WHERE staffNo IN (SELECT staffNO FROM S1 where position = ‘ Manager ’) SUSHIL KULKARNI LOCATION TRANSPARENCY Example: UNION SELECT fName, lName FROM S23 WHERE staffNo IN (SELECT staffNO FROM S1 where position = ‘ Manager ’ ) SUSHIL KULKARNI LOCAL MAPPING TRANSPARENCY If it is provided then the user must know how the data has been fragmented as well as the location of the data. SUSHIL KULKARNI LOCATION TRANSPARENCY Example: SELECT fName, lName FROM S21 AT SITE 3 WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 5 where position = ‘ Manager ’) UNION SELECT fName, lName FROM S22 AT SITE 5 WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3 where position = ‘ Manager ’) SUSHIL KULKARNI LOCATION TRANSPARENCY Example: UNION SELECT fName, lName FROM S23 AT SITE 7 WHERE staffNo IN (SELECT staffNO FROM S1 AT SITE 3 where position = ‘ Manager ’ ) SUSHIL KULKARNI TRANSACTION TRANSPARENCY It maintains distributed database’s integrity and consistency. SUSHIL KULKARNI QUERY PROCESSING IN DDMS Issues 1: Horizontal Parallel Processing across Fragments fragmentations =Emp1 U Emp2 LName(salary>40,000(Employee)) 2 Fragments LName( salary>40,000(Emp1)) U LName( salary>40,000(Emp2)) Site 1 Site 2 Execution in Parallel on fragments and union results together QUERY PROCESSING IN DDMS Site1 Site2 Site3 Joins- symmetric and associative (A B) C Parallel Processing (xx(A)) (B C) A (B C) QUERY PROCESSING IN DDMS Join Strategies R= Fnames, Cnames, Dnames (Employee Department) Site 3 Site 1 Mg rssn Site 2 100 records, 2000 bytes 10,000 records, to ssn 100 records, 3000 1,000,000 bytes bytes Strategies: 1,003,000 1)Ship both relations to the result site and join there bytes transfered 1,002,000 2)Ship employee to 2, join at 2, results to 3 bytes transfered 3)Ship Department to 1, join at 1, results to 3 5,000 bytes transfered minimize total communication cost of data transfer THANKS !