Data warehouse and data mart

I have read a lot of material on data warehouses, and much of it mentions the notion of a "data mart". At first my understanding of the difference between a data warehouse and a data mart was fairly superficial, so here is a thorough summary, organized around the following aspects:
(1) Basic concepts
(2) Why build a data mart
(3) Data warehouse design methodology
(4) The difference between a data warehouse and a data mart
(5) Data warehouse modeling and data mart modeling
(6) Case study: a telecom CRM data warehouse

Bill Inmon once said that "the most important question IT managers face is whether to build the data warehouse first or the data mart first". That is enough to show how important, and how pressing, it is to be clear about the relationship between the two. Before a data warehouse is built, the following questions usually come up:
(1) Should the design be top-down or bottom-up?
(2) Should the scope be enterprise-wide or department-wide?
(3) Should the data warehouse or the data mart be built first?
(4) Should a pilot system be built first, or should the system be implemented directly?
(5) Should the data marts be independent of one another?

First, basic concepts

There is still no single agreed definition of the term data warehouse. W. H. Inmon, a well-known data warehouse expert, describes it as follows in his book "Building the Data Warehouse": a data warehouse (Data Warehouse) is a subject-oriented (Subject Oriented), integrated (Integrated), relatively stable (Non-Volatile) collection of data that reflects historical change (Time Variant) and is used to support management decisions.
This definition can be read on two levels. First, a data warehouse supports decision-making and is oriented to analytical data processing, which sets it apart from the enterprise's existing operational databases. Second, a data warehouse is an effective integration of multiple heterogeneous data sources: after integration the data is reorganized by subject, it includes historical data, and data stored in the warehouse is generally not modified.

For maximum flexibility, the integrated warehouse data should be stored in a standard RDBMS and normalized through standard database design, with some summary information and some denormalization added to improve performance. A data warehouse designed this way is called an atomic data warehouse. Subsets of the atomic data warehouse are called data marts. The main purpose of the atomic warehouse is to serve as the foundation for the data marts, although it can also be used as a data warehouse for reference.

The size, central location, and database design of the atomic warehouse may not meet the varied needs of specific types of users. The data mart subsets that are copied to other computers can each be used as a data warehouse in their own right. Taken together, the data marts can be as large as the atomic data warehouse, or even larger. They can sit next to the atomic data warehouse or be distributed closer to their users; where they are placed depends on usage and communication costs.

A data mart is a data warehouse built to meet the needs of a specific user application, and its size may reach several hundred GB. What defines a data mart is its purpose and scope of use, not its size. A data mart can be thought of as a small data warehouse at the department or workgroup level. There are two types of data mart:
[Figure: the two types of data mart]

Independent (data taken directly from the operational environment): this kind of data mart is controlled by a specific workgroup, department, or business line, and is built entirely to meet that group's own needs.
In fact, independent data marts have no connectivity at all with the data marts of other workgroups, departments, or business lines.

Dependent (data taken from the enterprise data warehouse): this kind of data mart is usually implemented in a distributed fashion. Although each data mart is implemented for a specific workgroup, department, or business line, the marts can be integrated and interconnected to provide a more global, enterprise-wide view of the data. In fact, at the highest level of integration they can become the enterprise data warehouse. This means that end users in one department can access and use the data in another department's data mart.

Second, why build a data mart

OLTP and legacy systems hold valuable information, but extracting meaningful information from them can be difficult and slow. These systems usually support predefined operational reports, but they often cannot support an organization's need for historical, consolidated, "smart", or easily accessible information. Because the data is spread across many tables, systems, and platforms, and is usually "dirty" (it contains inconsistent and invalid values), it is hard to analyze.

A data mart combines data from different source systems to meet business information needs. Implemented effectively, it gives quick and easy access to simple information as well as to systematic and historical views. A well-designed data mart has the following features (some are shared with the data warehouse, others are stated relative to it):
(1) It provides specialized information to a specific user group, usually a department or a particular set of users in the organization, without being constrained by the demands and operational crises of a large number of source systems (as a data warehouse is).
(2) It supports access to non-volatile (nonvolatile) business information. (Non-volatile information is updated at scheduled intervals and is not affected by ongoing updates in the OLTP systems.)
(3) It reconciles information from the organization's multiple operational systems, such as accounting, sales, inventory, and customer management, as well as industry data from outside the organization.
(4) It provides cleansed (cleansed) data: it supplies default valid values, makes values consistent across systems, and adds descriptions that give meaning to cryptic codes.
(5) It provides reasonable query response times for ad hoc analysis and predefined reports. (Because a data mart is departmental, its queries and analyses respond much faster than those against a huge data warehouse.)

Third, data warehouse design methodology

Before a data warehouse is built, a concrete implementation approach has to be chosen. There are usually three options: top-down, bottom-up, and a combination of the two. Each is described briefly below.

(1) Top-down implementation
A top-down implementation builds the data warehouse in a single project phase. It requires more planning and design work at the start of the project, and it requires involving people from every workgroup, department, or business line that takes part in the data warehouse implementation. Decisions about the data sources to use, security, data structures, data quality, data standards, and the overall data model generally have to be completed before real implementation starts.

(2) Bottom-up implementation
A bottom-up implementation plans and designs a data warehouse without waiting for a broader, enterprise-wide data warehouse design to be in place. This does not mean that the broader design will never be developed; it is built up gradually as the initial implementation expands. This approach is now more widely accepted than the top-down approach, because a data warehouse can deliver direct results early, and those results can serve as proof when making the case for a wider, enterprise-scale implementation.
(3) A combined approach
Each approach has advantages and disadvantages. In many cases the best choice may be a combination of the two. The key to the combined approach is deciding how much enterprise-level architecture planning and design is needed to support integration, given that the data warehouse itself is built bottom-up. When a bottom-up or phased project model is used to build a series of data marts within an enterprise-wide architecture, the marts for the different subject areas can be integrated one by one into a well-designed enterprise data warehouse. This approach suits enterprises very well. Under it, a data mart can be understood as a logical subset of the overall data warehouse system; in other words, the data warehouse is the collection of consistent data marts. Such a program is normally implemented in the following steps:
(1) Define the plan and the requirements from the business point of view
(2) Design a complete warehouse architecture
(3) Make the data content consistent and standardized
(4) Implement the data warehouse as a superset of the data marts

The great debate between Inmon and Kimball:
Ralph Kimball and Bill Inmon have long been innovators in business intelligence, developing and testing new techniques and architectures.

Bill Inmon defines the data warehouse as "a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision-making process". By "subject-oriented" he means that the data warehouse should organize its data around subjects such as customers, sales, and products. Each subject area contains only the information relevant to that subject. The data warehouse should be extended one subject at a time, and when convenient access to several subjects is needed, data marts should be created with the data warehouse as their source. In other words, all the data in a given data mart should come from the subject-oriented data store.
Inmon's approach involves more up-front work and delays initial access to information. He argues, however, that a centralized architecture provides greater consistency and flexibility over time, and that in the long run it actually saves resources and effort. His design is shown in the figure:
[Figure: Inmon's design]

Ralph Kimball says that "the data warehouse is nothing more than the union of its constituent data marts". In his view, a data warehouse can be built incrementally from a series of data marts that share the same dimensions. Each data mart combines multiple data sources to meet a specific business need. Using "conformed" dimensions makes it possible to view the information in different data marts together, because the marts share commonly defined elements. His design is shown in the figure:
[Figure: Kimball's design]

Kimball's approach delivers integrated data that answers an organization's pressing business questions faster than Inmon's approach. Under Inmon's approach, data marts are created only after a centralized data warehouse covering a few subject areas has been built. Kimball considers that approach inflexible and too slow for the current business environment.

In practice, the choice depends on the main business drivers of the project. If the organization suffers from poor data management and inconsistent data, or wants to lay a solid foundation for the future, Inmon's approach is the better one. If the organization urgently needs to get information to its users, Kimball's approach will meet that need. Once the urgent information needs are met, the organization should consider a plan for moving to a data architecture that includes a separate data warehouse. A data warehouse isolates the data marts from the legacy and OLTP systems and makes future data marts faster to create. Because most of the development work has already been done in the data warehouse, it can support data marts with a narrow focus.
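The idea of conformed dimensions described above can be illustrated with a small sketch. It is only a sketch under stated assumptions: the table names (dim_product, fact_sales, fact_inventory), the sample rows, and the two marts themselves are hypothetical and do not come from the article, and both marts are placed in one in-memory SQLite database for brevity. Each mart is summarized on its own, and the summaries are then lined up on the shared dimension attribute:

```python
import sqlite3


def build_marts():
    """Create two hypothetical marts (sales, inventory) sharing one product dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conformed dimension: one definition of "product", used by both marts.
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY,
            name        TEXT,
            category    TEXT
        );
        -- Sales mart fact table.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            units_sold  INTEGER
        );
        -- Inventory mart fact table.
        CREATE TABLE fact_inventory (
            product_key   INTEGER REFERENCES dim_product(product_key),
            units_on_hand INTEGER
        );
        INSERT INTO dim_product VALUES
            (1, 'Handset A', 'Phones'),
            (2, 'Handset B', 'Phones'),
            (3, 'SIM card',  'Accessories');
        INSERT INTO fact_sales VALUES (1, 10), (2, 5), (3, 40), (1, 2);
        INSERT INTO fact_inventory VALUES (1, 30), (2, 20), (3, 100);
    """)
    return conn


def drill_across(conn):
    """Summarize each mart separately, then join the summaries on the shared category."""
    return conn.execute("""
        WITH sales AS (
            SELECT p.category, SUM(f.units_sold) AS sold
            FROM fact_sales f JOIN dim_product p USING (product_key)
            GROUP BY p.category
        ),
        stock AS (
            SELECT p.category, SUM(f.units_on_hand) AS on_hand
            FROM fact_inventory f JOIN dim_product p USING (product_key)
            GROUP BY p.category
        )
        SELECT sales.category, sales.sold, stock.on_hand
        FROM sales JOIN stock ON sales.category = stock.category
        ORDER BY sales.category
    """).fetchall()


if __name__ == "__main__":
    for row in drill_across(build_marts()):
        print(row)
```

The two summaries can be combined only because both fact tables reference the same product dimension; if each mart defined its own product categories, the final join would have nothing reliable to match on.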
In practice, when the work is driven by business needs, the last of the three design approaches above, the combination of top-down and bottom-up, adapts well to the process of building a data warehouse.

Fourth, the difference between a data warehouse and a data mart

A data warehouse is enterprise-level: it can provide decision support for the operations of every department across the enterprise. A data mart is a miniature data warehouse. It usually holds less data, covers fewer subject areas, and keeps less history. It is department-level and generally serves the managers of one local area only, which is why it is also called a departmental data warehouse. The difference between the two is illustrated in the figure:
[Figure: the difference between a data warehouse and a data mart]

The difference can be understood from the following three aspects:
(1) The data warehouse supplies data to the various data marts.
(2) The data marts of several departments together form a data warehouse.
(3) In terms of data content, the data warehouse uses a normalized data structure, while the data in a data mart uses a star schema, and the granularity of the data warehouse is usually finer than that of the data mart. The figure shows the difference in data structure and data content:
[Figure: data structure and data content of a data warehouse versus a data mart]

Fifth, data warehouse modeling and data mart modeling

Data is the record of all of an enterprise's business activities, resources, and results. A data model is a well-organized abstraction of that data, so it is only natural that the data model is the best way to understand and manage the business. The data model serves as a guide, or a plan, for implementing the data warehouse. Before real implementation starts, consolidating the data models of the individual business areas helps ensure that the resulting data warehouse is effective, and it helps reduce implementation costs.

(1) Data warehouse modeling
Data warehouse data modeling is the process of turning the requirements into a picture, together with the metadata that supports those requirements.
For readability, this article discusses requirements and modeling separately, but in practice the two steps often overlap. As soon as some initial requirements have been documented, an initial model starts to take shape. As the requirements become more complete, so does the model. The most important thing is to give end users a logical model of the data warehouse that is well integrated and easy to interpret. This logical model is one of the core pieces of data warehouse metadata. Simplicity for end users, and the integration and consolidation of historical data, are the key principles that the modeling should help deliver.

(2) Data mart modeling
Because the end users of the warehouse interact directly with the data marts, data mart modeling is the most effective tool for capturing end-user business requirements. The data mart modeling process depends on many factors. The three most important are described below.

Data mart modeling is driven by end users. End users must take part in the modeling process, because they are obviously the people who will use the data mart. Since end users should not be expected to be familiar with complex data models, the modeling techniques and the modeling process as a whole should be organized so that this complexity is transparent to them.

Data mart modeling is driven by business requirements. Data mart models are useful for capturing business requirements because they are usually used directly by end users and are easy to understand.

Data mart modeling is strongly influenced by data analysis techniques. The analysis technique affects both the type of data model chosen and its content. Several data analysis techniques are in common use today: query and reporting, multidimensional analysis, and data mining. If the intent is only to provide query and reporting capabilities, an ER model with normalized (normalized) or denormalized (denormalized) data structures is the most appropriate.
A dimensional data model may also be a good choice, because it is user friendly and performs better. If the goal is multidimensional data analysis, a dimensional data model is the only choice. Data mining, however, usually works best at the lowest available level of detail (level of detail). So if the data warehouse is to be used for data mining, the model should include data at that lower level of detail.
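The contrast drawn in this article between a normalized, fine-grained warehouse and a star-schema, coarser-grained data mart can be sketched in a few lines. This is a minimal illustration, not a recommended design: the telecom-style tables (region, customer, call_detail), the mart tables (dim_customer, fact_monthly_usage), and all sample rows are hypothetical. The warehouse keeps one row per call, the level of detail that data mining prefers, while the dependent mart is loaded from it at one row per customer per month for multidimensional analysis:

```python
import sqlite3


def build_warehouse(conn):
    """Normalized warehouse tables at the finest grain: one row per call."""
    conn.executescript("""
        CREATE TABLE region (
            region_id   INTEGER PRIMARY KEY,
            region_name TEXT
        );
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            region_id   INTEGER REFERENCES region(region_id)
        );
        CREATE TABLE call_detail (
            call_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            call_date   TEXT,     -- ISO date, e.g. '2024-01-05'
            minutes     INTEGER
        );
        INSERT INTO region VALUES (1, 'North'), (2, 'South');
        INSERT INTO customer VALUES (1, 'Ann', 1), (2, 'Bo', 1), (3, 'Cy', 2);
        INSERT INTO call_detail VALUES
            (1, 1, '2024-01-05', 10), (2, 1, '2024-01-20', 5),
            (3, 2, '2024-01-07', 8),  (4, 3, '2024-01-09', 12),
            (5, 1, '2024-02-02', 7),  (6, 3, '2024-02-11', 3);
    """)


def build_mart(conn):
    """Dependent mart as a star schema: denormalized dimension, monthly-grain fact."""
    conn.executescript("""
        -- Dimension: customer and region flattened into one table.
        CREATE TABLE dim_customer AS
            SELECT c.customer_id AS customer_key, c.name, r.region_name AS region
            FROM customer c JOIN region r ON r.region_id = c.region_id;
        -- Fact: calls rolled up to one row per customer per month.
        CREATE TABLE fact_monthly_usage AS
            SELECT customer_id              AS customer_key,
                   substr(call_date, 1, 7)  AS month,
                   COUNT(*)                 AS call_count,
                   SUM(minutes)             AS total_minutes
            FROM call_detail
            GROUP BY customer_id, substr(call_date, 1, 7);
    """)


def minutes_by_region_and_month(conn):
    """A simple multidimensional query: slice the fact by two dimension attributes."""
    return conn.execute("""
        SELECT d.region, f.month, SUM(f.total_minutes)
        FROM fact_monthly_usage f JOIN dim_customer d USING (customer_key)
        GROUP BY d.region, f.month
        ORDER BY d.region, f.month
    """).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_warehouse(conn)
    build_mart(conn)
    for row in minutes_by_region_and_month(conn):
        print(row)
```

The mart query needs a single join because the region name was folded into the customer dimension when the mart was loaded. The per-call rows are no longer visible in the mart, which is why detail-level work such as data mining would go back to the warehouse tables.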