VIEWS: 23 PAGES: 45 POSTED ON: 8/8/2011
Designing the data warehouse / data marts Methodologies and Techniques Basic principles Life cycle of the DW First time load Operational Databases Warehouse Database Refresh Refresh Purge or Archive Refresh Oracle Warehouse Any Source Components Any Data Any Access Relational Relational / tools Operational Multidimensional data Oracle Medi` OLAP Text, image Spatial tools Audio, External Web video data Applications/ Web Oracle Intelligence Tools IS develops Business users Analysts user’s Views Current Tactical Strategic Oracle Reports Oracle Discoverer Oracle Express Oracle Data Mart Suite Data Modeling Oracle Data Mart Designer OLTP Data Mart Databases Database OLTP Ware- Engines housing Oracle8 Engines SQL*PLUS Data Data Data Access Extraction Management & Analysis Oracle Data Mart Oracle Enterprise Discoverer & Builder Manager Oracle Reports “Big Bang” Approach: Advantages and • Disadvantages Advantages: – warehouse built as part of major project (eg: BPR) – Having a “big picture” of the data warehouse before starting the data warehousing project • Disadvantages: – Involves a high risk, takes a longer time – Runs the risk of needing to change requirements Incremental Approach to Warehouse Development Strategy • Multiple iterations • Shorter Definition Analysis Design implementations Build • Validation of each Production phase Benefits of an Incremental Approach • Delivers a strategic data warehouse solution through incremental development efforts • Provides extensible, scalable architecture • Quickly provides business benefits and ensures a much earlier return of investment • Allows a data warehouse to be built based on a subject or application area at a time • Allows the construction of an integrated data mart environment Data Mart • A subset of a data warehouse that supports the requirements of a particular department or business function. • Characteristics include: – Do not normally contain detailed operational data unlike data warehouses. – May contain certain levels of aggregation Dependent Data Mart Flat Files Operational Systems Marketing Marketing Sales Sales Finance Human Resources Data Finance Warehouse Data Marts External Data Independent Data Mart Operational Flat Files Systems Sales or Marketing External Data Reasons for Creating a Data Mart • To give users more flexible access to the data they need to analyse most often. • To provide data in a form that matches the collective view of a group of users • To improve end-user response time. • Potential users of a data mart are clearly defined and can be targeted for support Reasons for Creating a Data Mart • To provide appropriately structured data as dictated by the requirements of the end-user access tools. • Building a data mart is simpler compared with establishing a corporate data warehouse. • The cost of implementing data marts is far less than that required to establish a data warehouse. Data Marts Issues • Data mart functionality • Data mart size • Data mart load performance • Users access to data in multiple data marts • Data mart Internet / Intranet access • Data mart administration • Data mart installation Example of DW tool OLAP • Rotate and drill down to successive levels of detail. • Create and examine calculated data interactively on large volumes of data. • Determine comparative or relative differences. • Perform exception and trend analysis. • Perform advanced analytical functions for example forecasting, modeling, and regression analysis Original OLAP Rules 1. Multidimensional conceptual view 2. Transparency 3. Accessibility 4. Consistent reporting performance 5. Client-server architecture Original OLAP Rules 6. Multiuser support 7. Unrestricted cross-dimensional operations 8. Intuitive data manipulation 9. Flexible reporting 10. Unlimited dimensions and aggregation levels Relational Database Model Attribute 1 Attribute 2 Attribute 3 Attribute 4 Name Age Gender Emp No. Row 1 Anderson 31 F 1001 Row 2 Green 42 M 1007 Row 3 Lee 22 M 1010 Row 4 Ramos 32 F 1020 The table above illustrates the employee relation. Multidimensional Database Customer Store Model Store Time Time SALES FINANCE Product GL_Line The data is found at the intersection of dimensions. Two dimensions Three dimensions Specialised Multidimensional tool • Benefits: – Quick access to very large volumes of data – Extensive and comprehensive libraries of complex functions • analysis • Strong modeling and forecasting capabilities – Can access multidimensional and relational database structures – Caters for calculated fields • Disadvantages: – Difficulty of changing model – Lack of support for very large volumes of data – May require significant processing power MOLAP Server • The application layer stores data in a multidimensional structure DSS client • The presentation layer provides the MOLAP multidimensional view Engine • Efficient storage and processing Application • Complexity hidden from the layer user • Analysis using preaggregated summaries and precalculated Warehouse measures ROLAP Server • The warehouse stores DSS client atomic data. • The application layer ROLAP generates SQL for the engine three- dimensional view. Application • The presentation layer Multiple layer SQL provides the multidimensional view. Warehouse server MOLAP MDDB Query Periodic load Data Warehouse Express Express Server user ROLAP Cache Live fetch Query Data Data cache Express Express Warehouse Server user Also Hybrid (HOLAP) Choosing a Reporting Architecture • Business needs Good • Potential for growth MOLAP • interface Query Performance • enterprise architecture ROLAP OK • Network architecture Simple Complex • Speed of access Analysis • Openness Data Acquisition • Identify, extract, transform, and transport source data • Consider internal and external data • Perform gap analysis between source data and target database objects • Plan move of data between sources and target • Define first-time load and refresh strategy • Define tool requirements • Build, test, and execute data acquisition modules Modeling • Warehouses differ from operational structures: – Analytical requirements – Subject orientation • Data must map to subject oriented information: – Identify business subjects – Define relationships between subjects – Name the attributes of each subject • Modeling is iterative • Modeling tools are available Modeling the Data Warehouse 1 1. Defining the business model Select a 2. Creating the dimensional business process model 2, 3 3. Modeling summaries 4. Creating the physical model 4 Physical model Identifying Business Rules Location Product Geographic proximity Type Monitor Status 0 - 1 miles PC 15 inch New 1 - 5 miles Server 17 inch Rebuilt > 5 miles 19 inch Custom None Time Store Month > Quarter > Year Store > District > Region Creating the Dimensional Model Identify fact tables – Translate business measures into fact tables – Analyze source system information for additional measures – Identify base and derived measures – Document additivity of measures Identify dimension tables Link fact tables to the dimension tables Create views for users Dimension Tables Dimension tables have the following characteristics: • Contain textual information that represents the attributes of the business • Contain relatively static data • Are joined to a fact table through a foreign key reference Product Channel Facts (units, price) Customer Time Fact Tables Fact tables have the following characteristics: • Contain numeric measures (metrics) of the business • May contain summarized (aggregated) data • May contain date-stamped data • Are typically additive • Have key value that is typically a concatenated key composed of the primary keys of the dimensions • Joined to dimension tables through foreign keys that reference primary keys in the dimension tables Dimensional Model (Star Schema) Fact table Product Channel Facts (units, price) Customer Time Dimension tables Star Schema Model Product Table Store Table Product_id Store_id Product_desc District_id … ... Sales Fact Table • Central fact table Product_id Store_id • Radiating dimensions Item_id Day_id • Denormalized model Sales_dollars Sales_units ... Time Table Item Table Day_id Item_id Month_id Item_desc Period_id ... Year_id Star Schema Model • Easy for users to understand • Fast response to queries • Simple metadata • Supported by many front end tools • Less robust to change • Slower to build • Does not support history Snowflake Schema Model Product Table Store Table District Table Product_id Store_id District_id Product_desc Store_desc District_desc District_id Sales Fact Table Item_id Store_id Sales_dollars Sales_units Time Table Item Table Dept Table Mgr Table Week_id Item_id Dept_id Dept_id Period_id Item_desc Dept_desc Mgr_id Year_id Dept_id Mgr_id Mgr_name Snowflake Schema Model • Direct use by some tools • More flexible to change • Provides for speedier data loading • May become large and unmanageable • Degrades query performance • More complex metadata Using Summary Data Phase 3: Modeling summaries • Provides fast access to precomputed data • Reduces use of I/O, CPU, and memory • Is distilled from source systems and precalculated summaries • Usually exists in summary fact tables Designing Summary Tables • Average • Total • Maximum • Percentage Units Sales(€) Store Product A Total Product B Total Product C Total Summary Tables Example SALES FACTS SALES BY MONTH/REGION Sales Region Month Month Region Tot_Sales$ 10,000 North Jan 99 Jan 99 North 41,000 12,000 South Feb 99 Jan 99 East 10,000 11,000 North Jan 99 Feb 99 South 40,000 15,000 West Mar 99 Mar 99 West 17,000 18,000 South Feb 99 20,000 North Jan 99 10,000 East Jan 99 2,000 West Mar 99 SALES BY MONTH Month Tot_Sales Jan 99 51,000 Feb 99 40,000 Mar 99 17,000 Summary Management in Oracle8i Sales Sales summary Region State City Product Time Summary advisor Summary Space usage Summary requirements recommendations The Time Dimension • Time is critical to the data warehouse. • A consistent representation of time is required for extensibility. Time Sales fact dimension How and where should it be stored?
Pages to are hidden for
"Designing the data warehouse data marts"Please download to view full document