Data Warehouse Concepts
Contents
Data & Information Introduction to Data warehouse (DWH) Characteristics of DWH Operational System Vs DWH DWH Architectures Data Marts Metadata
Data & Information
A fundamental concept of data warehouse is the distinction between data and information. Data is composed of observable and recordable facts that are often found in operational or transactional systems. In a data warehouse environment, data only comes to have value to end-users when it is organized and presented as information. Information is an integrated collection of facts and is used as the basis for decision making.
Introduction to Data Warehouse
Definitions:
"A data warehouse is a subject oriented, integrated, time-variant, non volatile collection of data in support of management's decision making process". A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. A Data Warehouse is a structured repository (Subject Oriented) of Historic Data.
Data warehouses separate analysis part from transactional part and enables the organization to collect data from several sources.
Characteristics of Data Warehouse
Data Warehouse is usually: Subject Oriented Integrated Non-Volatile Time-Variant Accessible & Process Oriented
Subject Oriented
Sales
DWH
Marketing
Finance
Information is presented according to specific subjects or Information is presented according to specific subjects or areas of interest. areas of interest. Data is manipulated to provide information about a particular Data is manipulated to provide information about a particular subject. subject.
Integrated
Operational Systems Appln A – m/f Appln B – 1/0 Appln C – Male/Female Appln A – Bal_On_Hand Appln B – Current_Balance Appln C – Cash_On_Hand DWH m/f
Current_Balance
Though the data in the data warehouses is scattered around different tables, databases or even servers but the data is integrated consistently in the values of variables, naming conventions and physical data definitions (datatype).
Time-Variant
Operational Systems View of Business Today DWH Designated Time Frame (3 – 10 years). DWH stores historical data.
Contains a history of the subject, as well as current Contains a history of the subject, as well as current information. information. Historical information is an important component of a data Historical information is an important component of a data warehouse. warehouse.
Non-Volatile
Operational Systems
Insert Create Read Load
DWH
Read Read
Update
Read Delete
Read Read Read Only
Stable information that doesn’t change each time an Stable information that doesn’t change each time an operational process is executed. Information is consistent operational process is executed. Information is consistent regardless of when the warehouse is accessed. regardless of when the warehouse is accessed. There exist only two operations – time based loading of data, There exist only two operations – time based loading of data, accessing the loaded data. accessing the loaded data.
Accessible & Process Oriented
Accessible: The primary purpose of a data warehouse is to provide readily accessible information to end-users. Process-Oriented: It is important to view data warehousing as a process for delivery of information.
Operational System Vs Data Warehouse
Operational System Characteristics Data Focused, Transaction Processing focused system. Current, Near-term (Today, Last week). Day-to-day decisions, Current operational results. Twice daily, Daily, Weekly. Data Warehouse Subject Oriented, Integrated, Non-Volatile, Time-Variant. Historic (Last month, Quarterly, Five years). Long-term decisions, Reporting, Trend detection. Weekly, Monthly, Quarterly.
Age of the data
Primary Use
Frequency of load
DWH Architectures
Data Warehouse Architecture (Basic) Data Warehouse Architecture (with a Staging Area) Data Warehouse Architecture (with a Staging Area and Data Marts)
DWH Architectures (contd..)
Operational Systems
Data Extraction Operational System Data Transformation
Data Warehouse
Data Storing Data Access
Users
Meta Data
Analysis
DWH
Reporting
Data Loading Legacy Systems
Data Warehouse Architecture (Basic)
Mining
DWH Architectures (contd..)
Operational Systems
Data Extraction Operational System Data Transformation
Data Warehouse
Data Storing Data Access
Users
Staging Area
Meta Data
Analysis
DWH
Reporting
Data Loading Legacy Systems
Data Warehouse Architecture (with a Staging Area)
Mining
DWH Architectures (contd..)
Operational Systems
Data Extraction Operational System Data Transformation Staging Area
Data Warehouse
Data Storing Data Marts Data Access
Users
Meta Data
Analysis Sales
DWH
Marketing Reporting
Data Loading Legacy Systems Finance
Data Warehouse Architecture (with a Staging Area and Data Marts)
Mining
Data Marts
Data Marts:
Data mart is a subset of DWH. A data mart is a specialized version of a DWH. A data mart configuration emphasizes easy access to relevant information.
DWH
Data Marts
Data Marts (contd..)
Dependent data mart: Data can be derived from an enterprise-wide data warehouse. Independent data mart: Data can be collected directly from sources.
Data Marts (contd..)
Reasons for creating a Data mart
Eases access to frequently needed data Creates collective view by a group of users Improves end-user response time Ease of creation Lower cost than implementing a full Data warehouse
Metadata
Metadata:
Metadata is data about data. Something can be data and metadata at the same time. It is possible to create meta-meta-...-metadata.
Metadata is used to speed up and enrich searching for resources.
E.g: Browsers automatically download and locally cache metadata, to improve the speed at which files can be accessed and searched.
Questions ?
Thank You !