Data Warehouse - DOC by fdjerue7eeu


									Data Warehouse

I have read a great deal of material on data warehouses, much of which touches on the concept of a "data mart". At first, my understanding of the difference between a data warehouse and a data mart was rather superficial. Here is a thorough summary, organized around the following aspects:
(1) Basic concepts
(2) Why build data marts
(3) Data warehouse design methodology
(4) Differences between data marts and data warehouses
(5) Data warehouse modeling and data mart modeling
(6) Case study: a telecom CRM data warehouse
Bill Inmon once said that the most important question IT managers ultimately face is whether to build the data warehouse first or the data mart first. Explaining the relationship between the two clearly is therefore both important and urgent. Before building a data warehouse, the following questions usually come up:
(1) Whether to use a top-down or a bottom-up design
(2) Whether the scope is enterprise-wide or department-wide
(3) Whether to build the data warehouse or the data mart first
(4) Whether to build a pilot system first or go directly to full implementation
(5) Whether the data marts should be independent

First, basic concepts
There is still no uniform definition of the term data warehouse. The well-known expert W. H. Inmon, in his book "Building the Data Warehouse", describes it as follows: a data warehouse is a subject-oriented, integrated, relatively stable (non-volatile), and time-variant collection of data used to support management decisions. This definition can be read on two levels. First, the data warehouse supports decision-making and is used for analytical data processing, which distinguishes it from the enterprise's existing operational databases. Second, the data warehouse effectively integrates multiple heterogeneous data sources; after integration, the data is reorganized by subject and includes historical data, and the data stored in the data warehouse is generally not modified.
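The last two points of the definition ("includes historical data" and "generally not modified") can be illustrated with a minimal Python sketch. It assumes a hypothetical customer subject area whose attribute changes are recorded by appending dated rows on each scheduled load rather than by overwriting; the names `CustomerHistory`, `segment`, and `load_date` are illustrative only and are not part of Inmon's definition.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Snapshot:
    """One immutable row: subject key, attribute value, and the date it was loaded."""
    customer_id: str
    segment: str
    load_date: date


class CustomerHistory:
    """Hypothetical store for the 'customer' subject.

    Non-volatile: rows are only appended by scheduled loads, never updated.
    Time-variant: every row carries the date on which it was loaded.
    """

    def __init__(self) -> None:
        self._rows: List[Snapshot] = []

    def load(self, customer_id: str, segment: str, load_date: date) -> None:
        # A change in the source system becomes a new row, not an overwrite.
        self._rows.append(Snapshot(customer_id, segment, load_date))

    def as_of(self, customer_id: str, when: date) -> Optional[str]:
        # Return the most recent value loaded on or before `when`, if any.
        candidates = [r for r in self._rows
                      if r.customer_id == customer_id and r.load_date <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.load_date).segment

    def history(self, customer_id: str) -> List[Snapshot]:
        # Full change history for one customer, oldest first.
        return sorted((r for r in self._rows if r.customer_id == customer_id),
                      key=lambda r: r.load_date)
```

Because nothing is overwritten, the store can answer "what did we know on date X?", which an operational database that updates rows in place usually cannot.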
To achieve maximum flexibility, the integrated data of a data warehouse should be stored in a standard RDBMS using a normalized database design, with some summary information and denormalized structures added to improve performance.
This type of data warehouse design is known as the atomic data warehouse. A subset of the atomic data warehouse is called a data mart. Each data mart can be copied to another computer and used as its users' own data warehouse. Data marts can grow as large as the atomic data warehouse, or even larger. They can be located near the atomic data warehouse or distributed to locations closer to their users; where they are placed depends on usage and communication costs.
Data marts are used to meet the needs of specific groups of data warehouse users, and they may reach several hundred GB in size. What defines a data mart is its purpose and scope, not its size.
A data mart can be understood as a small, department-level or workgroup-level data warehouse. There are two types of data marts:
Independent data marts (fed directly from the operational environment): such a data mart is controlled by a specific workgroup, department, or line of business and is built entirely to meet its needs. In practice, it may have no connectivity at all with the data marts of other workgroups, departments, or lines of business.
Dependent data marts (fed from the enterprise data warehouse): these data marts are often deployed in a distributed way. Although each one is implemented by a particular workgroup, department, or production line, they can be integrated and interconnected to provide a more global view of enterprise data. At the highest level of integration, they can in effect become the enterprise data warehouse. This means that end users in one department can access and use the data in other departments' data marts.
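The practical difference between the two types can be sketched in Python. This is a minimal illustration under assumed data: the warehouse rows, the department names, and the source key fields (`cust_no`, `acct`) are hypothetical. Dependent marts are sliced from an already integrated warehouse, so a cross-mart question can be answered; independent marts keep each operational system's own keys, so the same customer cannot be matched across them.

```python
from typing import Dict, List, Set

Row = Dict[str, object]

# Hypothetical enterprise data warehouse: already integrated, so every row
# uses the same customer key no matter which department it concerns.
ENTERPRISE_DW: List[Row] = [
    {"customer_key": 1, "department": "sales", "amount": 120.0},
    {"customer_key": 1, "department": "service", "amount": 15.0},
    {"customer_key": 2, "department": "sales", "amount": 80.0},
]


def dependent_mart(edw: List[Row], department: str) -> List[Row]:
    """Dependent data mart: one department's slice of the enterprise warehouse."""
    return [row for row in edw if row["department"] == department]


def independent_mart(operational_rows: List[Row], key_field: str) -> List[Row]:
    """Independent data mart: loaded straight from one operational system,
    keeping that system's own customer key (no enterprise-wide integration)."""
    return [{"customer_key": row[key_field], "amount": row["amount"]}
            for row in operational_rows]


def customers_in_both(mart_a: List[Row], mart_b: List[Row]) -> Set[object]:
    """A cross-mart question: which customers appear in both marts?"""
    return {r["customer_key"] for r in mart_a} & {r["customer_key"] for r in mart_b}
```

For example, if the sales system stores a customer as `cust_no="S-001"` and the service system stores the same customer as `acct=9001`, two independent marts built from them share no key, so `customers_in_both` returns an empty set, while the two dependent marts return `{1}`.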
Second, why build data marts
OLTP and legacy systems hold valuable information, but extracting meaningful information from them can be difficult and slow. Although these systems generally support predefined operational reports, they often cannot meet an organization's need for historical, integrated, intelligent, or easily accessible information. Because the data is spread across many tables on many systems and platforms, and is usually "dirty" (containing inconsistent and invalid values), it is difficult to analyze.
A data mart combines data from different source systems to meet business information needs. When implemented effectively, it provides quick and easy access to information and a simple view of its history. A well-designed data mart has the following features (some are shared with the data warehouse, and some are stated relative to it):
(1) It provides the information needed by a specific user group, usually a department or the users of a particular organization, without having to cope with the requirements and operational risks of a large number of source systems (as a data warehouse must).
(2) It supports access to non-volatile business information. (Non-volatile information is updated from the OLTP systems at scheduled intervals.)
(3) It reconciles information from the organization's multiple operational systems, such as accounting, sales, inventory, and customer management, as well as industry data from outside the organization.
(4) It applies default valid values so that values are consistent across systems, and adds descriptions that give hidden codes meaning, thereby providing cleansed data.
(5) It provides reasonable query response times for ad hoc analysis and predefined reports (because a data mart is department-level, its queries and analysis respond much faster than those against a huge data warehouse).
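Feature (4) is essentially a cleansing step. The Python sketch below assumes two hypothetical source systems, `billing` and `crm`, that encode the same attribute differently. Each raw value is mapped to one standard code, falls back to a default valid value when it is missing or unrecognized, and is paired with a human-readable description. The code tables are invented for illustration.

```python
from typing import Dict, Optional

# Hypothetical per-system code tables: each source encodes gender differently.
SOURCE_CODE_MAPS: Dict[str, Dict[str, str]] = {
    "billing": {"1": "M", "2": "F"},
    "crm": {"male": "M", "female": "F"},
}

# Descriptions that make the standardized codes meaningful to analysts.
DESCRIPTIONS: Dict[str, str] = {"M": "Male", "F": "Female", "U": "Unknown"}

# Default valid value used when a source value is missing or not recognized.
DEFAULT_CODE = "U"


def cleanse(system: str, raw: Optional[str]) -> Dict[str, str]:
    """Map one raw source value to a consistent code plus its description."""
    mapping = SOURCE_CODE_MAPS.get(system, {})
    # Normalize whitespace and case before lookup; treat None/empty as missing.
    key = raw.strip().lower() if raw else ""
    code = mapping.get(key, DEFAULT_CODE)
    return {"code": code, "description": DESCRIPTIONS[code]}
```

For instance, `cleanse("billing", "1")` and `cleanse("crm", " Male ")` both yield the code `M` with the description `Male`, so reports built on the mart no longer depend on which system a row came from.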

Third, data warehouse design methodology
Before building a data warehouse, the implementation approach must be chosen. There are usually three: top-down, bottom-up, and an integrated combination of the two. Each is briefly described below:
(1) Top-down implementation
The top-down approach implements a single data warehouse as one project phase. It requires more planning and design work at the start of the project and involves staff from every workgroup, department, or line of business that takes part in the implementation. Decisions about the data sources to use, security, data structures, data quality, data standards, and the data model generally have to be completed before real implementation begins.
(2) Bottom-up implementation
The bottom-up approach plans and designs the data warehouse implementation without waiting for a larger, enterprise-wide data warehouse design to be in place. This does not mean the enterprise will never develop a broader data warehouse design; as the initial implementation expands, that design is built up gradually. This approach is now more widely accepted than the top-down approach, because it produces results directly, and those results can serve as proof for extending the effort to a larger business scope.
(3) A compromise
Each approach has its advantages and disadvantages, and in many cases the best approach may be a combination of the two. One key to this combined method is determining the level of enterprise-wide planning and design needed to support integration, because the data warehouse itself is built bottom-up. By using a bottom-up or phased project model to build a series of enterprise-scope data marts, the data marts for different subject areas can be integrated one by one into a well-designed enterprise data warehouse. This method works well across the enterprise. Here, data marts can be understood as logical subsets of the entire data warehouse system; in other words, the data warehouse is the harmonized collection of its data marts. This approach is normally implemented in the following steps:
(1) Define the business plan and requirements from an enterprise perspective
(2) Build a complete warehouse architecture
(3) Make the data consistent and standardized
(4) Implement the data warehouse as a superset of the data marts
The great debate between Inmon and Kimball:
Ralph Kimball and Bill Inmon have both been innovators in the field of business intelligence, developing and testing new technologies and architectures.
Bill Inmon defines a data warehouse as "a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process". By "subject-oriented" he means that the data in the data warehouse should be organized around the organization's subjects, such as customers, sales, and products. Each subject area contains only the information relevant to that subject. The data warehouse should be built first, subject by subject, and when users need easy access to many subjects, data marts should be created with the data warehouse as their source. In other words, all the data in a particular data mart should come from the subject-oriented data store. Inmon's approach requires more up-front work and delays initial access to information. But he believes that a centralized architecture continues to provide greater consistency and flexibility, and in the long run actually saves resources and work. The figure below is a diagram of his design:

Ralph Kimball says that "the data warehouse is nothing more than the union of all the data marts". He holds that a data warehouse is built incrementally from a series of data marts that share the same dimensions. Each data mart combines multiple data sources to meet specific business needs. By using "conformed" dimensions, information can be shared across different data marts, which means they share common definitions of their elements. His design is shown below:

Kimball's approach delivers integrated data that answers pressing business questions, and it does so faster than the Inmon approach. The Inmon approach builds the centralized data warehouse first, even if it covers only a few single subject areas, and only then creates data marts from it. Kimball considers this approach inflexible and too slow for the current business environment.
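Kimball's conformed dimensions are what allow separately built data marts to be combined. As a minimal sketch, assume two hypothetical marts (sales and returns) that share one product dimension with identical keys and attribute values. A "drill-across" query totals each fact table by the same conformed attribute and then merges the results. Product names and amounts are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical conformed product dimension shared by every data mart:
# the same surrogate keys and the same attribute values everywhere.
PRODUCT_DIM: Dict[int, Dict[str, str]] = {
    1: {"name": "Prepaid SIM", "category": "Mobile"},
    2: {"name": "Fiber 100M", "category": "Broadband"},
    3: {"name": "Postpaid plan", "category": "Mobile"},
}

# Fact rows of two separately built data marts: (product_key, amount).
SALES_FACT: List[Tuple[int, float]] = [(1, 10.0), (2, 50.0), (3, 30.0)]
RETURNS_FACT: List[Tuple[int, float]] = [(1, 2.0), (3, 5.0)]


def total_by(fact: List[Tuple[int, float]], attribute: str) -> Dict[str, float]:
    """Aggregate one mart's facts by an attribute of the conformed dimension."""
    totals: Dict[str, float] = defaultdict(float)
    for product_key, amount in fact:
        totals[PRODUCT_DIM[product_key][attribute]] += amount
    return dict(totals)


def drill_across(attribute: str) -> Dict[str, Tuple[float, float]]:
    """Merge both marts on the shared attribute: value -> (sales, returns)."""
    sales = total_by(SALES_FACT, attribute)
    returns = total_by(RETURNS_FACT, attribute)
    return {value: (sales.get(value, 0.0), returns.get(value, 0.0))
            for value in sorted(set(sales) | set(returns))}
```

With this data, `drill_across("category")` gives one (sales, returns) pair per category. If each mart had defined its own product categories, the merge step would have nothing reliable to align on, which is the integration problem conformed dimensions are meant to prevent.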
In practice, the choice of approach depends on the project's main business drivers. If the organization suffers from poor data management and inconsistent data, or wants to lay a solid foundation for the future, the Inmon approach is the better choice. If the organization urgently needs to deliver information to its users, the Kimball approach will meet that need. Once the urgent information requirements have been met, you should plan a transition to a data architecture that includes a standalone data warehouse. Having both a data warehouse and data marts isolates the marts from legacy and OLTP systems and makes it faster to create new data marts in the future. Because much of the overall work has already been done while developing the data warehouse, it can strongly support the data marts built on top of it. In practice, driven by business needs, the last of the three design approaches above, the combined top-down and bottom-up approach, adapts well to the process of building a data warehouse.

Fourth, differences between data warehouses and data marts
A data warehouse is enterprise-wide and serves as a decision support tool for all departments across the enterprise. A data mart is a miniature data warehouse: it usually holds less data, fewer subject areas, and less historical data. It is department-level and generally serves the management of only a local area, so it is also called a departmental data warehouse. The differences between data warehouses and data marts are illustrated in the figure:
The differences between a data warehouse and a data mart can be understood from the following three aspects:
(1) The data warehouse provides data to the various data marts.
(2) The data marts of several departments together make up the data warehouse.

(3) In terms of data content, the data warehouse uses a normalized data model while data marts use a star schema, and the grain of the data warehouse is usually finer than that of a data mart. The figure reflects these differences in data structure and data content.
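Point (3) can be made concrete with Python's built-in `sqlite3` module. The sketch assumes a hypothetical telecom example: the warehouse side keeps normalized `customer` and `call_record` tables at the finest grain (one row per call), while the mart side is a small star schema whose fact table is aggregated to one row per region per day. All table and column names are illustrative, not taken from any real system.

```python
import sqlite3
from typing import List, Tuple


def build_demo() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- Warehouse side: normalized tables at the finest grain (one row per call).
        CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE call_record (
            call_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            call_date   TEXT,
            minutes     REAL
        );
        INSERT INTO customer VALUES (1, 'North'), (2, 'South');
        INSERT INTO call_record VALUES
            (1, 1, '2024-01-01', 3.0),
            (2, 1, '2024-01-01', 7.0),
            (3, 2, '2024-01-01', 4.0),
            (4, 2, '2024-01-02', 6.0);

        -- Mart side: star schema at a coarser grain (one row per region per day).
        CREATE TABLE dim_region (region_key INTEGER PRIMARY KEY, region TEXT UNIQUE);
        CREATE TABLE fact_daily_usage (
            region_key    INTEGER REFERENCES dim_region(region_key),
            call_date     TEXT,
            total_minutes REAL,
            call_count    INTEGER
        );
        INSERT INTO dim_region (region) SELECT DISTINCT region FROM customer;
        INSERT INTO fact_daily_usage
        SELECT d.region_key, r.call_date, SUM(r.minutes), COUNT(*)
        FROM call_record AS r
        JOIN customer AS c ON c.customer_id = r.customer_id
        JOIN dim_region AS d ON d.region = c.region
        GROUP BY d.region_key, r.call_date;
    """)
    return con


def daily_usage(con: sqlite3.Connection) -> List[Tuple]:
    # Typical star join: fact table joined to its dimension, then read by attribute.
    return con.execute(
        "SELECT d.region, f.call_date, f.total_minutes, f.call_count "
        "FROM fact_daily_usage AS f JOIN dim_region AS d ON d.region_key = f.region_key "
        "ORDER BY d.region, f.call_date"
    ).fetchall()
```

The four call records collapse into three fact rows. The mart is smaller and simpler to query, but it can no longer answer per-call or per-customer questions; that is the trade-off between the warehouse's finer grain and the mart's coarser one.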

Fifth, data warehouse modeling and data mart modeling
Data is the record of all of an enterprise's business activities, resources, and results. A data model is a well-organized abstraction of that data, which makes it a very natural and effective way to understand and manage the business. A data model or plan guides the implementation of the data warehouse. Before real implementation begins, a common data model for each line of business helps ensure that the resulting data warehouse is valid and helps reduce implementation costs.
(1) Data warehouse modeling
Data warehouse modeling is the process of translating requirements into diagrams and into the metadata that supports those requirements. For readability, this article discusses requirements gathering and modeling as separate phases, but in practice these steps often overlap. Once some initial requirements are documented, an initial model begins to take shape; as the requirements become more complete, so does the model. The most important thing is to give end users an integrated, easily interpreted logical model of the data warehouse. This logical model is one of the core parts of the data warehouse metadata. Simplicity for end users and integration of historical data are the key principles that modeling should help deliver.
(2) Data mart modeling
Because data warehouse end users interact directly with the data mart, data mart modeling is the most effective tool for capturing end users' business requirements. The data mart modeling process depends on many factors; the three most important are described below:
Data mart modeling is driven by end users. End users must take part in the data mart modeling process, since they are the people who will actually use the data mart. Because you should expect end users to be unfamiliar with complex data models, the modeling techniques and the modeling process as a whole should be organized so that this complexity is transparent to them.
Data mart modeling is driven by business requirements. Data mart models are useful for capturing business requirements because they are usually used directly by end users and are easy to understand.
Data mart modeling is strongly influenced by data analysis. The analysis techniques in use can influence both the type of data model chosen and its content. Several data analysis techniques are commonly used today: query and reporting, multidimensional analysis, and data mining.
If the intent is only to provide query and reporting capabilities, an ER model with normalized or denormalized data structures is most appropriate, although a dimensional data model may be a better choice because it is user friendly and performs better. If the goal is multidimensional data analysis, the dimensional data model is the only choice. Data mining, however, usually works best at the lowest available level of detail. Therefore, if the data warehouse is used for data mining, the model should include data at a low level of detail.
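The guidelines in this paragraph can be summarized as a small decision helper. This is only a sketch that restates the text; the `Analysis` enum and the `recommend_model` function are hypothetical names, and a real model choice also depends on the other factors discussed above.

```python
from enum import Enum
from typing import Dict, Set, Union


class Analysis(Enum):
    QUERY_AND_REPORTING = "query and reporting"
    MULTIDIMENSIONAL = "multidimensional analysis"
    DATA_MINING = "data mining"


def recommend_model(techniques: Set[Analysis]) -> Dict[str, Union[str, bool]]:
    """Restate the text's model-selection guidelines for a set of techniques."""
    if not techniques:
        raise ValueError("at least one analysis technique is required")
    if Analysis.MULTIDIMENSIONAL in techniques:
        # Multidimensional analysis: the dimensional model is the only choice.
        model = "dimensional"
    elif Analysis.QUERY_AND_REPORTING in techniques:
        # Query and reporting: an ER model fits, though a dimensional model
        # may be the better choice for usability and performance.
        model = "ER (normalized or denormalized) or dimensional"
    else:
        # Data mining alone: the text fixes the level of detail, not the model type.
        model = "unspecified"
    return {
        "model": model,
        # Data mining works best on the lowest available level of detail.
        "keep_lowest_detail": Analysis.DATA_MINING in techniques,
    }
```

For example, a mart serving both reporting and data mining would get an ER or dimensional model that still retains detail-level rows.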
