Bootstrapping Grid Database
Krzysztof Kaczmarski
Faculty of Mathematics and Information Science, Warsaw University of Technology
Bootstrapping
• start from something small but at least partially working
• develop the system in iterations (spiral model)
• better understand the problem
• better recognize possible risks
• continuous system’s operation (!)
• slow (cautious) evolution of system’s environment
• slow (cautious) update of cooperating systems
Distributed P2P Database difficulties
• General questions (business problems)
  – What are we going to share?
  – What is the connection schema?
  – Who can join us?
  – ...
• Implementation questions (development problems)
  – How shall we evaluate queries in uncertain network?
  – What shall we do in case of redundant (conflicting) information?
  – How shall we propagate updates (what about temporary off-line sites)?
  – ...
Current popular P2P applications
• File Sharing
  – read-only access
  – simplified data structure and search engines (name index)
  – download heuristics
• Knowledge (scientific) databases
  – sometimes sophisticated data structures
  – dedicated servers
  – web-services architecture
  – policies for users
  – cooperation and common strategy
• Collaboration networks
  – focused on many services
  – partial data sharing
Categories of P2P systems
• open applications
  – easy to access
  – open for public
  – accidental architecture
  – fully independent nodes (some initiation required)
  – very simple data/metadata
  – very dynamic
  – main goal – get more users
• corporation applications
  – closed to certain group of users
  – designed architecture
  – independent nodes (group of specialized servers)
  – still rather simple data/metadata
  – not so dynamic
  – main goal – access data better
Databases and P2P architecture
• architecture must be carefully designed (but still open)
• data structures must be fixed (impossible?)
• all nodes independent
• high redundancy of data
• carefully designed search/update strategies
• no data without metadata
• It is not easy to create a Grid Database!
  – Avoiding concentration points (search strategies, update strategies, enrollment, ...)
  – Dynamic cooperation between nodes
  – Possible unpredictable data redundancy
  – Short time of a node existence
Strategy for Bootstrapping
• Uncertain goals and future – many open questions
• Start from something known
  – Get working solution as soon as possible
  – Use well known technologies
• Using currently working solution, iterate until everything is done:
  – Identify and analyze new targets
  – Design solutions and implement them
  – Test parts of the system
  – Deploy new stable version
Bootstrapping Grid
• data sharing
  – from centralized to decentralized services and data (views)
• data schemes
  – from chained views to cumulated views
  – from constant to changing data structures (?!!!)
• enrollment
  – from fixed to unlimited number of users
Separated Client–Server Nodes (1)
• easy maintenance
• separated data schemes
• no transparent data sharing nor exchange
• independent databases
• all changes are limited to single server
Establishing agreement on common schema between selected nodes (2)
• partial data exchange and sharing
• some nodes have common concentration point (Global View)
• each node has its specific Contributory View
• agreement’s clients access the Global View Node
• Classical methodology of federation construction
  – Strategic phase: decision on creating a Grid is made (gov. initiative, ...)
  – Analysis phase: existing resources are elaborated (heterogeneity, incompleteness, ...)
  – Design phase: precise definition of global schema and contributions of all participants
  – Finalization phase: all participants sign final agreement
  – Implementation phase: all necessary data transformations are implemented, grid is created
• Federation Construction
[Figure: Global Schema with Contributory Schema slices for node 1, node 2 and node 3]
• Data transformation
[Figure: Contributory Views (CV) on Node 1 and Node 2 lead to data source 1 and data source 2; the Global View (GV) leads to clients. Labels: Node 1 Contributory Schema, Global Schema, Node 2 Contributory Schema. Caption: results of a design phase, constraints for integration]
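The data transformation on this slide can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the author's implementation: a Contributory View is modeled as a plain attribute renaming, and the Global View simply collects the transformed records for clients. The class names, the `rename` mapping, the `node` attribute and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


@dataclass
class ContributoryView:
    """Maps one node's local records onto the agreed global schema.

    Assumption: the mapping is a simple renaming of attributes.
    """
    node: str
    rename: Dict[str, str]  # local attribute name -> global attribute name

    def to_global(self, rows: Iterable[Record]) -> List[Record]:
        result = []
        for row in rows:
            # Keep only the attributes the node agreed to contribute.
            shared = {self.rename[k]: v for k, v in row.items() if k in self.rename}
            shared["node"] = self.node  # remember where the record came from
            result.append(shared)
        return result


class GlobalView:
    """Concentration point: clients query it instead of the data sources."""

    def __init__(self, schema: Iterable[str]):
        self.schema = set(schema)
        self.sources: List[Tuple[ContributoryView, Callable[[], Iterable[Record]]]] = []

    def attach(self, view: ContributoryView, fetch: Callable[[], Iterable[Record]]) -> None:
        self.sources.append((view, fetch))

    def query(self) -> List[Record]:
        rows: List[Record] = []
        for view, fetch in self.sources:
            for row in view.to_global(fetch()):
                # The global schema acts as a constraint for integration.
                missing = self.schema.difference(row.keys())
                if missing:
                    raise ValueError(f"{view.node} does not provide {sorted(missing)}")
                rows.append(row)
        return rows


# Hypothetical data sources with different local schemas.
source1 = lambda: [{"Name": "Ann", "IDNumber": 1, "Year": 2}]
source2 = lambda: [{"StudentName": "Bob", "StudentIdNo": 7, "Value": 5}]

gv = GlobalView(["name", "id", "node"])
gv.attach(ContributoryView("node1", {"Name": "name", "IDNumber": "id"}), source1)
gv.attach(ContributoryView("node2", {"StudentName": "name", "StudentIdNo": "id"}), source2)
```

Clients call only `gv.query()`; each node keeps its local schema and exposes just the slice named in its renaming.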
Establishing further cooperation of Global View Nodes (2b)
• desired level of [all] data sharing/integration
• desired Global View Nodes have additional Contributory View
• establishing a next level of Global View Node
• next level clients access the Global View Node
• (this step may be repeated up to total data sharing)
[Figure: several levels of Global View (GV) nodes and Contributory Views (CV)]
Properties of the federated database
• independent nodes
• local changes limited to one node
• data is shared as a result of cooperation
• transparent data sharing
• concentration point(s) required
• global management (agreement required)
• limited number of nodes (consortium)
Identifying and opening data sharing groups (3a)
• some groups of users have common needs
• they want to share data but also want to be open for new users
• Global View for such a group must be redesigned
  – For each node in ... do ...
  – must be open for unlimited number of users
  – merge-like algorithms
• each group shares only predefined resources (fixed structure)
• (but) one node may participate in many sharing groups
• Merging View is orthogonal to the number of participating nodes
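A merge-like view of the "for each node in ... do ..." kind might look like the sketch below. It is a hedged illustration only: the function name, the `node` attribute added to each record, and the decision to skip unreachable nodes are assumptions, not part of the original design.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]
Fetch = Callable[[], Iterable[Record]]


def merging_view(nodes: Dict[str, Fetch]) -> Iterator[Record]:
    """Union of one fixed-structure resource over all participating nodes.

    The body does not depend on how many nodes there are, so the view
    stays the same when nodes join or leave the sharing group.
    """
    for node_id, fetch in nodes.items():  # "for each node in ... do ..."
        try:
            rows = list(fetch())
        except ConnectionError:
            # Assumption: an unreachable node is skipped, not treated as an error.
            continue
        for row in rows:
            yield {**row, "node": node_id}


def offline() -> Iterable[Record]:
    """Hypothetical node that is temporarily off-line."""
    raise ConnectionError("node is off-line")


group = {
    "n1": lambda: [{"Name": "Ann"}],
    "n2": offline,
    "n3": lambda: [{"Name": "Bob"}, {"Name": "Cid"}],
}
merged = list(merging_view(group))
```

Adding a fourth entry to `group` changes the result but not the view itself, which is the sense in which the Merging View is independent of the number of nodes.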
How to join a sharing group?
• conform to resources sharing rules
• create Contributory View if necessary
• pass authorization (ask Coordination Node)
• connect to dedicated Merging View
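The four steps for joining a sharing group can be sketched as follows. This is a minimal illustration under assumptions: the `CoordinationNode` class, its methods, the allow-list style of authorization and the attribute check are hypothetical stand-ins for whatever rules a real group would define.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]


class CoordinationNode:
    """Hypothetical coordinator: knows the sharing rules and current participants."""

    def __init__(self, required_attributes: Iterable[str], authorized: Iterable[str]):
        self.required_attributes = set(required_attributes)
        self.authorized = set(authorized)
        self.participants: Dict[str, Callable[[], List[Record]]] = {}

    def authorize(self, node_id: str) -> bool:
        # Assumption: authorization is a simple allow-list lookup.
        return node_id in self.authorized

    def connect(self, node_id: str, fetch: Callable[[], List[Record]]) -> None:
        # The Merging View reads its data sources from this registry.
        self.participants[node_id] = fetch


def join_sharing_group(coordinator: CoordinationNode, node_id: str,
                       local_rows: List[Record],
                       rename: Optional[Dict[str, str]] = None) -> None:
    # Create a Contributory View if the local schema differs from the shared one.
    def contributory_view() -> List[Record]:
        if rename is None:
            return [dict(row) for row in local_rows]
        return [{rename.get(k, k): v for k, v in row.items()} for row in local_rows]

    # Conform to the resource sharing rules (here: required attributes).
    for row in contributory_view():
        if not coordinator.required_attributes.issubset(row.keys()):
            raise ValueError(f"{node_id} does not conform to the sharing rules")

    # Pass authorization by asking the Coordination Node.
    if not coordinator.authorize(node_id):
        raise PermissionError(f"{node_id} is not authorized")

    # Connect to the dedicated Merging View.
    coordinator.connect(node_id, contributory_view)


coordinator = CoordinationNode({"Name", "IDNumber"}, {"n1", "n2"})
join_sharing_group(coordinator, "n1", [{"Name": "Ann", "IDNumber": 1}])
join_sharing_group(coordinator, "n2", [{"StudentName": "Bob", "IDNumber": 7}],
                   rename={"StudentName": "Name"})
```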
Integration of resources within a single node (3b)
• participation in separated groups may require vertical integration
  – must be designed for each integration task separately
  – join-like queries
[Figure: Contributory Views (C1V, C2V), Merging Views (M1-V, M2-V) and an Integration View (I-V) within a single node]
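A join-like Integration View over the outputs of two Merging Views might be sketched as below. The function name, the hash-join technique and the sample attributes are assumptions chosen for illustration; as the slide notes, a real integration has to be designed for each task separately.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def integration_view(left: Iterable[Record], right: Iterable[Record],
                     key: str) -> Iterator[Record]:
    """Vertical integration: combine records that describe the same object.

    Implemented as a simple in-memory hash join on a shared key attribute.
    """
    index: Dict[object, List[Record]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}


# Hypothetical outputs of two Merging Views from two sharing groups.
m1 = [{"IDNumber": 1, "Name": "Ann"}, {"IDNumber": 2, "Name": "Bob"}]
m2 = [{"IDNumber": 1, "Value": 5}, {"IDNumber": 1, "Value": 4}]

integrated = list(integration_view(m1, m2, "IDNumber"))
```

Records without a partner on the other side (here the student with `IDNumber` 2) are dropped, which matches an inner join; an outer join would be an equally valid design choice.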
Distributing Global View within a sharing group (4)
• some coordination required
• join (connection) strategy (appropriate for number of users)
  – all-to-all
  – single-parent-tree (grand-parent rescue, max. child no.)
  – rings and other ...
[Figure: Global View (GV) distributed over many nodes, with Contributory Views (CV)]
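One possible reading of the single-parent-tree strategy is sketched below. The slide only names the ingredients (single parent, grand-parent rescue, maximum number of children), so the breadth-first placement of new nodes and the exact rescue procedure are assumptions, and the class and method names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional


class SingleParentTree:
    """Overlay in which every node except the root has exactly one parent."""

    def __init__(self, root: str, max_children: int = 2):
        if max_children < 1:
            raise ValueError("max_children must be at least 1")
        self.root = root
        self.max_children = max_children
        self.parent: Dict[str, Optional[str]] = {root: None}
        self.children: Dict[str, List[str]] = {root: []}

    def _first_free(self, start: str) -> str:
        # Breadth-first search for the closest node with a free child slot.
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if len(self.children[node]) < self.max_children:
                return node
            queue.extend(self.children[node])
        raise RuntimeError("unreachable: a leaf always has a free slot")

    def _attach(self, node: str, parent: str) -> None:
        self.parent[node] = parent
        self.children[parent].append(node)

    def join(self, node: str) -> str:
        """Attach a new node and return the parent it was given."""
        if node in self.parent:
            raise ValueError(f"{node} is already connected")
        parent = self._first_free(self.root)
        self.children[node] = []
        self._attach(node, parent)
        return parent

    def leave(self, node: str) -> None:
        """Grand-parent rescue: orphans reconnect at or below the old grandparent."""
        if node == self.root:
            raise ValueError("the root cannot leave in this sketch")
        grandparent = self.parent.pop(node)
        orphans = self.children.pop(node)
        self.children[grandparent].remove(node)
        for orphan in orphans:
            self._attach(orphan, self._first_free(grandparent))


tree = SingleParentTree("A", max_children=2)
for name in ["B", "C", "D", "E"]:
    tree.join(name)
tree.leave("B")  # D and E lose their parent and are rescued via A
```

The child limit keeps the load on each node bounded, at the cost of a deeper tree as the group grows.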
Types of Nodes, Views and Clients in Grid
• Global View – the first level of data integration (stored on Concentration Node)
• Contributory View – transforms local Participant’s data to form required by Grid
• Merging View – merges horizontally fragmented data
• Integration View – integrates vertically fragmented data
• Concentration Node – contains Views used by Clients and Participants
• Client Node – uses Global View
• Participating Node – shares data within a sharing group
• Coordination Node – stores information about current Participants, available Resources
Merging-View needs internal states
• current state of merging
• list of active nodes in sharing group / connection to coordination node
[Figure: Merge-View; List of Data Source Nodes; Schema of Res. Updateable View; ...]
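The two kinds of internal state listed on this slide can be made concrete with a small sketch. It assumes that "current state of merging" means the set of nodes already consumed and that the list of active nodes is re-read from the coordination node before every step; both are interpretations, and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

Record = Dict[str, object]
Fetch = Callable[[], Iterable[Record]]


class MergingView:
    """Merging view that can be consumed step by step while membership changes."""

    def __init__(self, active_nodes: Callable[[], Dict[str, Fetch]]):
        # Connection to the coordination node: returns the current participants.
        self.active_nodes = active_nodes
        # Current state of merging: nodes whose data has already been returned.
        self.visited: Set[str] = set()

    def next_batch(self) -> Optional[Tuple[str, List[Record]]]:
        """Return data from the next unvisited node, or None when all are done."""
        for node_id, fetch in self.active_nodes().items():
            if node_id not in self.visited:
                self.visited.add(node_id)
                return node_id, list(fetch())
        return None


# Hypothetical registry kept by a coordination node.
participants: Dict[str, Fetch] = {
    "n1": lambda: [{"Name": "Ann"}],
    "n2": lambda: [{"Name": "Bob"}],
}
view = MergingView(lambda: participants)

first = view.next_batch()
participants["n3"] = lambda: [{"Name": "Cid"}]  # a node joins mid-merge
rest = [view.next_batch(), view.next_batch(), view.next_batch()]
```

Because the participant list is re-read on every call, a node that joins during merging is still picked up, which a view without internal state could not guarantee.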
Summary of Bootstrapping Data Sharing and Participation
• from centralized to decentralized services and data (views)
  – Establishing agreement on common schema between selected nodes
  – Identifying and opening data sharing groups
  – Integration of resources within a single node
  – Distributing Global View within sharing group
• from fixed to unlimited number of users
  – Fixed Membership – dedicated concentration point
  – Unpredicted node existence
  – Unlimited number of nodes within a sharing group
Not the last stage yet!
Advantages of a Distributed Store
[Figure: views C1V, C2V, C3V, G1V and G2V placed in a distributed store]
• data 100% safe and accessible
• no bottlenecks
  – automatic data replication
• easy control
• middle-layer must be really huge (Ocean Store?)
Grid Modeling Schemes
• Local schema: describing node’s internal database
• Global schema: grid database available for grid’s users
• Contributory schema: description of node’s shared data
• Integration schema: informal description of integration and semantics
Grid’s modeling language
• description of objects, services (interfaces)
• ability to be cut into parts describing roles for nodes
• ability to describe data attributes required by data integration
• notions for replications’ descriptions
• reuse a global schema as a contributory schema (grid embedding)
• modeling views
Example
[Figure: UML class diagrams. Classes: Student (Name, IDNumber, Year); Lecture (Name); Professor (Name; Talk()); Lecture (Name); Mark (Value); Mark (Value, StudentName, StudentIdNo). Associations: attends (role: participant), leads, results; multiplicity labels: *]
Example
[Figure: the class diagrams from the previous slide, followed by a "?" and a further schema. Classes in the further schema: Student (Name, IDNumber, AvgMark); Year (Number, AvgMark); Lecture (Name, ProfName). Multiplicity labels: 1..5, *]
Example
[Figure: UML class diagrams. Classes: Student (Name, IDNumber); Lecture (Name); Professor (Name; Talk()); Lecture (Name); Mark (Value, Year); Mark (Value, StudentName, StudentIdNo). Associations: attends (role: participant), leads, results. Further classes shown: Lecture (Name, Mark, Year); Student (Name, IDNumber); Lecture (Name, ProfName, Mark); Student (Name, IDNumber); Student (Name, IDNumber, AvgMark); Year (Number, AvgMark); Lecture (Name, ProfName). Multiplicity labels: 1..5, *]
Key Uniqueness Example
[Figure: the class diagrams from the previous slide with key annotations added. Annotated attributes: Student.IDNumber {$: Node}; Lecture.Name {$: Node}; Mark.StudentIdNo {$: Lecture}; Lecture.Name {$: Student}. Other classes as before: Professor (Name; Talk()); Mark (Value, Year); Student (Name, IDNumber {$: Node}, AvgMark); Year (Number, AvgMark); Lecture (Name {$: Node}, ProfName). Multiplicity labels: 1..5, *]
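The slide does not define the `{$: Node}` notation. Assuming it marks an attribute as a key that is unique only within the named scope (here, within one node), the consequence for integration can be sketched as follows: a grid-wide key has to pair the local key with its scope. The function name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]


def merge_with_scoped_keys(per_node: Dict[str, List[Record]],
                           key: str) -> Dict[Tuple[str, object], Record]:
    """Merge records from several nodes without key collisions.

    Assumption: `key` is unique within one node only, so the grid-wide
    identifier is the pair (node, local key).
    """
    merged: Dict[Tuple[str, object], Record] = {}
    for node, rows in per_node.items():
        for row in rows:
            merged[(node, row[key])] = row
    return merged


# Two hypothetical nodes that both issued IDNumber 1 to different students.
students = {
    "node1": [{"Name": "Ann", "IDNumber": 1}],
    "node2": [{"Name": "Bob", "IDNumber": 1}],
}

scoped = merge_with_scoped_keys(students, "IDNumber")
# Keying by IDNumber alone silently overwrites one of the two students.
naive = {row["IDNumber"]: row for rows in students.values() for row in rows}
```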
Modeling Grid means Modeling Views (1) (dependency)
Modeling Grid means Modeling Views (2) (integration)
Conclusions
• Updateable Views are good for Bootstrapping
• Bootstrapping is good for Grid Databases
• Updateable Views are good for Grid Databases
• Working on Modeling Language
• Working on P2P dedicated SBQL algorithms and strategies
Thank You.