Designing the data warehouse data marts
Shared by: pptfiles
-
Stats
- views:
- 22
- posted:
- 8/8/2011
- language:
- English
- pages:
- 45
Document Sample


Designing the data warehouse
/ data marts
Methodologies and Techniques
Basic principles
Life cycle of the DW
First time load
Operational Databases Warehouse Database
Refresh
Refresh
Purge or Archive
Refresh
Oracle Warehouse
Any Source Components
Any Data Any Access
Relational
Relational / tools
Operational Multidimensional
data
Oracle Medi`
OLAP
Text, image Spatial
tools
Audio,
External Web video
data Applications/ Web
Oracle Intelligence Tools
IS develops Business users Analysts
user’s Views
Current Tactical Strategic
Oracle Reports Oracle Discoverer Oracle Express
Oracle Data Mart Suite
Data Modeling
Oracle Data Mart Designer
OLTP Data Mart
Databases Database
OLTP Ware-
Engines housing Oracle8
Engines
SQL*PLUS
Data Data Data Access
Extraction Management & Analysis
Oracle Data Mart Oracle Enterprise Discoverer &
Builder Manager Oracle Reports
“Big Bang” Approach:
Advantages and
• Disadvantages
Advantages:
– warehouse built as part of major project
(eg: BPR)
– Having a “big picture” of the data
warehouse before starting the data
warehousing project
• Disadvantages:
– Involves a high risk, takes a longer time
– Runs the risk of needing to change
requirements
Incremental Approach to
Warehouse Development
Strategy • Multiple iterations
• Shorter
Definition
Analysis
Design implementations
Build
• Validation of each
Production
phase
Benefits of an Incremental
Approach
• Delivers a strategic data warehouse
solution through incremental development
efforts
• Provides extensible, scalable architecture
• Quickly provides business benefits and
ensures a much earlier return of
investment
• Allows a data warehouse to be built based
on a subject or application area at a time
• Allows the construction of an integrated
data mart environment
Data Mart
• A subset of a data warehouse that
supports the requirements of a
particular department or business
function.
• Characteristics include:
– Do not normally contain detailed operational
data unlike data warehouses.
– May contain certain levels of aggregation
Dependent Data Mart
Flat Files
Operational
Systems Marketing
Marketing
Sales Sales
Finance
Human Resources
Data Finance
Warehouse
Data Marts
External Data
Independent Data Mart
Operational Flat Files
Systems
Sales or Marketing
External Data
Reasons for Creating a Data
Mart
• To give users more flexible access to
the data they need to analyse most
often.
• To provide data in a form that matches
the collective view of a group of users
• To improve end-user response time.
• Potential users of a data mart are
clearly defined and can be targeted for
support
Reasons for Creating a Data
Mart
• To provide appropriately structured data as
dictated by the requirements of the end-user
access tools.
• Building a data mart is simpler compared with
establishing a corporate data warehouse.
• The cost of implementing data marts is far
less than that required to establish a data
warehouse.
Data Marts Issues
• Data mart functionality
• Data mart size
• Data mart load performance
• Users access to data in multiple data
marts
• Data mart Internet / Intranet access
• Data mart administration
• Data mart installation
Example of DW tool OLAP
• Rotate and drill down to successive
levels of detail.
• Create and examine calculated data
interactively on large volumes of data.
• Determine comparative or relative
differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions
for example forecasting, modeling, and
regression analysis
Original OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture
Original OLAP Rules
6. Multiuser support
7. Unrestricted cross-dimensional
operations
8. Intuitive data manipulation
9. Flexible reporting
10. Unlimited dimensions and
aggregation levels
Relational Database Model
Attribute 1 Attribute 2 Attribute 3 Attribute 4
Name Age Gender Emp No.
Row 1 Anderson 31 F 1001
Row 2 Green 42 M 1007
Row 3 Lee 22 M 1010
Row 4 Ramos 32 F 1020
The table above illustrates the employee relation.
Multidimensional Database
Customer Store
Model
Store
Time Time
SALES FINANCE
Product GL_Line
The data is found at the intersection of
dimensions.
Two dimensions
Three dimensions
Specialised Multidimensional tool
• Benefits:
– Quick access to very large volumes of data
– Extensive and comprehensive libraries of
complex functions
• analysis
• Strong modeling and forecasting capabilities
– Can access multidimensional and relational
database structures
– Caters for calculated fields
• Disadvantages:
– Difficulty of changing model
– Lack of support for very large volumes of data
– May require significant processing power
MOLAP Server
• The application layer
stores data in a
multidimensional structure DSS client
• The presentation layer
provides the
MOLAP
multidimensional view Engine
• Efficient storage and processing Application
• Complexity hidden from the layer
user
• Analysis using preaggregated
summaries and precalculated Warehouse
measures
ROLAP Server
• The warehouse stores DSS client
atomic data.
• The application layer
ROLAP
generates SQL for the engine
three- dimensional view. Application
• The presentation layer Multiple layer
SQL
provides the
multidimensional view.
Warehouse
server
MOLAP
MDDB
Query
Periodic
load Data
Warehouse Express Express
Server user
ROLAP
Cache
Live
fetch Query
Data Data
cache
Express Express
Warehouse
Server user
Also Hybrid (HOLAP)
Choosing a Reporting
Architecture
• Business needs Good
• Potential for growth MOLAP
• interface Query
Performance
• enterprise architecture ROLAP
OK
• Network architecture
Simple Complex
• Speed of access Analysis
• Openness
Data Acquisition
• Identify, extract, transform, and transport
source data
• Consider internal and external data
• Perform gap analysis between source data
and target database objects
• Plan move of data between sources and target
• Define first-time load and refresh strategy
• Define tool requirements
• Build, test, and execute data acquisition
modules
Modeling
• Warehouses differ from operational
structures:
– Analytical requirements
– Subject orientation
• Data must map to subject oriented
information:
– Identify business subjects
– Define relationships between subjects
– Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
Modeling the Data Warehouse
1
1. Defining the business
model Select a
2. Creating the dimensional business
process
model
2, 3
3. Modeling summaries
4. Creating the physical model
4
Physical model
Identifying Business Rules
Location Product
Geographic proximity Type Monitor Status
0 - 1 miles PC 15 inch New
1 - 5 miles Server 17 inch Rebuilt
> 5 miles 19 inch Custom
None
Time Store
Month > Quarter > Year Store > District > Region
Creating the Dimensional Model
Identify fact tables
– Translate business measures into fact
tables
– Analyze source system information for
additional measures
– Identify base and derived measures
– Document additivity of measures
Identify dimension tables
Link fact tables to the dimension
tables
Create views for users
Dimension Tables
Dimension tables have the following
characteristics:
• Contain textual information that
represents the attributes of the business
• Contain relatively static data
• Are joined to a fact table through a
foreign key reference Product Channel
Facts
(units,
price)
Customer Time
Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the
business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have key value that is typically a concatenated
key composed of the primary keys of the
dimensions
• Joined to dimension tables through foreign
keys that reference primary keys in the
dimension tables
Dimensional Model (Star
Schema)
Fact table
Product Channel
Facts
(units,
price)
Customer Time
Dimension tables
Star Schema Model
Product Table Store Table
Product_id Store_id
Product_desc District_id
… ...
Sales Fact Table
• Central fact table Product_id
Store_id
• Radiating dimensions Item_id
Day_id
• Denormalized model Sales_dollars
Sales_units
...
Time Table Item Table
Day_id Item_id
Month_id Item_desc
Period_id ...
Year_id
Star Schema Model
• Easy for users to understand
• Fast response to queries
• Simple metadata
• Supported by many front end tools
• Less robust to change
• Slower to build
• Does not support history
Snowflake Schema Model
Product Table Store Table
District Table
Product_id Store_id
District_id
Product_desc Store_desc
District_desc
District_id
Sales Fact Table
Item_id
Store_id
Sales_dollars
Sales_units
Time Table Item Table Dept Table Mgr Table
Week_id Item_id Dept_id Dept_id
Period_id Item_desc Dept_desc Mgr_id
Year_id Dept_id Mgr_id Mgr_name
Snowflake Schema Model
• Direct use by some tools
• More flexible to change
• Provides for speedier data loading
• May become large and
unmanageable
• Degrades query performance
• More complex metadata
Using Summary Data
Phase 3: Modeling summaries
• Provides fast access to precomputed
data
• Reduces use of I/O, CPU, and memory
• Is distilled from source systems and
precalculated summaries
• Usually exists in summary fact tables
Designing Summary Tables
• Average • Total
• Maximum • Percentage
Units Sales(€) Store
Product A
Total
Product B
Total
Product C
Total
Summary Tables Example
SALES FACTS SALES BY MONTH/REGION
Sales Region Month Month Region Tot_Sales$
10,000 North Jan 99 Jan 99 North 41,000
12,000 South Feb 99 Jan 99 East 10,000
11,000 North Jan 99 Feb 99 South 40,000
15,000 West Mar 99 Mar 99 West 17,000
18,000 South Feb 99
20,000 North Jan 99
10,000 East Jan 99
2,000 West Mar 99 SALES BY MONTH
Month Tot_Sales
Jan 99 51,000
Feb 99 40,000
Mar 99 17,000
Summary Management
in Oracle8i
Sales Sales
summary
Region
State
City
Product Time
Summary advisor
Summary Space
usage Summary requirements
recommendations
The Time Dimension
• Time is critical to the data warehouse.
• A consistent representation of time is
required for extensibility.
Time
Sales fact
dimension
How and where should it be stored?