Primer on Data Warehousing
Chapter 1
Introduction

1.1 General Concepts

This thesis deals with the design and implementation of an integrated data warehouse for the telecommunication industry in Bangladesh. The enormous expansion of databases in almost every area of human endeavor has created a great demand for new, powerful tools for turning data into useful, task-oriented knowledge. Data warehousing is one of the most widely used methods for extracting useful knowledge from the plethora of data accumulated in these ever-expanding databases. The construction of a data warehouse (DW) involves data cleaning, data integration, and data transformation, and can be viewed as an important preprocessing step for data mining. Moreover, data warehouses provide on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, which facilitates effective data generalization and data mining. Many other data mining functions, such as association, classification, prediction, and clustering, can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction. Hence, the data warehouse has become an increasingly important platform for data analysis and on-line analytical processing, and data warehousing and OLAP form an essential step in the knowledge discovery process.

Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated historical data for analysis. According to William H. Inmon, a leading architect in the construction of data warehouse systems, "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process."
This short but comprehensive definition presents the major features of a data warehouse. In their book Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber opine that four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. According to them, these salient features of the data warehouse are:

Subject-oriented: A data warehouse is organized around major subjects, such as customer, supplier, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of a business, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, attribute measures, and so on. Thus a data warehouse combines multiple heterogeneous data sources under a unified schema and provides a homogeneous querying capability.

Time-variant: Data are stored to provide information from a historical perspective by arranging the data according to specific time periods. Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms.
It usually requires only two operations in data accessing: the initial loading of data and the access of data. In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. This is the main difference between a data warehouse and other business databases. Trying to use the data contained in the operational databases to carry out relevant statistical analysis for the business is almost impossible. A data warehouse, on the other hand, is built with the specific aim of finding relevant information or knowledge. Often the data warehouse is a copy of operational data with value-added data (e.g., summaries, history), which makes it easier to comprehend. It outperforms the traditional query-driven approach to learning in terms of efficiency, clarity, and speed. The high query performance of data warehouses is aided by the fact that queries are not visible outside the warehouse and that local processing at the sources is unaffected. These features make the data warehouse a hugely popular and formidable analytical tool.

The architecture of a typical data warehouse and its environment is shown in Fig. 1. The main components of a data warehouse include a database; data loading and extracting tools; administration and management platforms; and application tools such as data mining, query and reporting, visualization and display, and so on.

Fig-1: Architecture of a typical data warehouse

Of the various querying and reporting tools, one of the most efficient and widely used is Online Analytical Processing, or OLAP. OLAP offers analytical modeling capabilities, including a calculation engine for deriving ratios, variances, and so on, and for computing measures across multiple dimensions. It can generate summarizations, aggregations, and hierarchies at each granularity level and at every dimension intersection.
OLAP also supports functional models for forecasting, trend analysis, and statistical analysis. In this context, an OLAP engine is a powerful data analysis tool.

Telecommunication companies have traditionally been ardent users of data warehouses. Carriers rely on the scrutiny of terabytes of Call Detail Record (CDR) data to make the critical management decisions that aid them in realizing their profit aspirations. High-end data warehouses and powerful Business Intelligence (BI) solutions are thus becoming indispensable tools to help carriers meet profit goals. Analyzing and integrating in-depth data enables carriers to reduce revenue leakage and churn, mitigate fraud, optimize network usage, and increase profits. Interestingly, as mobile penetration rises and even approaches saturation, the focus of telecom business intelligence is shifting from customer acquisition to customer retention. It has been estimated that it is much cheaper to retain an existing customer than to acquire a new one. Carriers also require pertinent knowledge of the shared characteristics of their most profitable customer base, and of the inherent expectations of their customers, in order to reduce the churn rate. Executives are increasingly challenged to make quick, well-informed decisions that address growing business issues and regulatory standards. Today's telecommunications operators face growing challenges such as continuing competition from global competitors, high churn rates in increasingly penetrated markets, and a systemic decline in core voice telephony revenues.
In response to these forces, many telecommunications operators are striving to obtain and maintain market leadership by: increasing emphasis on internal growth in core businesses with market leadership; shifting focus from market share and acquisition to customer penetration, profitability, and retention; lowering provisioning and network costs through downsizing and increased utilization; developing, bundling, and selling more products in shorter timeframes; improving customer relationships to build brand loyalty; empowering staff through governance processes; and restructuring the organization so it is more responsive to market needs and customer requirements.

In this thesis we describe the implementation of an integrated data warehouse for the telecommunication companies in Bangladesh. Our integrated data warehouse provides a common framework in which temporal data from the different vendors rendering services in the telecommunication sector in Bangladesh are simulated, then gathered, cleaned, transformed, and loaded into a common data warehouse implemented through a snowflake schema, which in turn provides analytical options through an OLAP implementation that creates user-queried materialized views.

1.2 Problems

In the course of designing and implementing the data warehouse, various problems were faced. These are listed as follows:

The initial problem concerned the determination of the appropriate schema design. Although the star schema was a viable and widely used option, the snowflake schema was deemed more appropriate for the design of the data warehouse.

The next problem concerned the selection of features in the fact table. This selection had to include the fields necessary to gather useful intelligence from the Call Detail Records (CDR) while simultaneously excluding redundancies.

The next problem concerned the selection of the dimension tables.
These had to include all the necessary demographic and economic information for useful knowledge discovery.

The next problem was deciding which sort of partitioning to use for the efficient functioning of the data warehouse. The main type of partitioning used in building the data warehouse was range partitioning, chosen over hash, list, and composite partitioning. This was both efficient and required less processing power.

The next problem was the generation of the data simulating the heterogeneous data sources. The data sources had to have some marked dissimilarity to truly imitate the real-life scenario.

The next problem was faced while loading the disparate data into the fact table. An efficient algorithm was used to transform the incongruent data into the unified fact table.

The next problem was faced while making the extracted raw data usable for loading into the data warehouse.

The next problem was faced in creating the various materialized views. A user-friendly interface was designed to aid the whole process of querying.

The last problem was faced while synchronizing the user interface with all the required functions of the whole development of the data warehouse, from data generation to extraction, transformation, and loading (ETL), to the creation and updating of the fact table and the materialized views.

1.3 Objectives

The main objective of the thesis is to design and implement an integrated data warehouse for the telecom industry in Bangladesh, simulating the heterogeneous data sources and providing a common interface for analysis and knowledge discovery by a third party. This project aims to explore the possible difficulties in designing and implementing an integrated data warehouse for the telecommunication companies in Bangladesh and to devise possible ways to get around those problems.
The following objectives are set for our thesis endeavor:

To design a schema that enables the analysis of various heterogeneous data sources.
To generate random data simulating the real-life scenario of disparate Call Detail Records (CDR) from different vendors.
To load all the heterogeneous data into the unified fact table.
To provide a common interface for analysis and knowledge discovery on the generated heterogeneous data.

1.4 Organization of the Thesis

Chapter 2
Literature Survey

2.1 Data Warehouse (Existing Technologies)

2.1.1 Data Warehouse Technology

One of the most important assets of any organization is its information. This asset is almost always kept by an organization in two forms: the operational systems of record and the data warehouse. Crudely speaking, the operational systems are where the data is put in, and the data warehouse is where we get the data out. A data warehouse is a repository (or archive) of information gathered from multiple sources, stored under a unified schema, at a single site. Once gathered, the data are stored for a long time, permitting access to historical data. Thus data warehouses provide the user a single consolidated interface to data, making decision support queries easier to write. Moreover, by accessing information for decision support from a data warehouse, the decision maker ensures that online transaction-processing systems are not affected by the decision-support workload.

2.1.2 Goals of a Data Warehouse

Nowadays a data warehouse has become an essential part of the information system of a business organization. A data warehouse has particular goals to meet the business requirements of a corporate body. They are described below:

a) The data warehouse must make an organization's information easily accessible. The contents of the data warehouse must be understandable. The data must be intuitive and obvious to the business user, not merely the developer.
b) The data warehouse must present the organization's information consistently. The data in the warehouse must be credible. Data must be carefully assembled from a variety of sources around the organization, cleansed, quality assured, and released only when it is fit for user consumption.

c) The data warehouse must be adaptive and resilient to change. User needs, business conditions, data, and technology are all subject to the shifting sands of time. The data warehouse must be designed to handle this inevitable change. Changes to the data warehouse should be graceful, meaning that they do not invalidate existing data or applications.

d) The data warehouse must be a secure bastion that protects information assets.

e) The data warehouse must serve as the foundation for improved decision making. The data warehouse must have the right data in it to support decision making.

f) The data warehouse must provide sufficient facilities for reporting as well as analysis.

2.1.3 Components of a Data Warehouse

A data warehouse environment needs some specific components, each of which serves a specific function. There are four separate and distinct components to be considered:

a) Operational Source Systems
b) Data Staging Area
c) Data Presentation Area
d) Data Access Tools

2.1.3.1 Operational Source Systems

Operational source systems are developed to capture and process original business transactions. These systems are designed for data entry, not for reporting, but it is from them that the data warehouse gets populated.

Figure: Basic components of the data warehouse

The main priorities of a source system are processing performance and availability. In a source-driven architecture for gathering data, the data sources transmit new information, either continually or periodically. In a destination-driven architecture, the data warehouse periodically sends requests for new data to the sources.
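The contrast between the source-driven and destination-driven gathering architectures can be sketched in a few lines of Python. This is a toy illustration only: the class names, method names, and sample record are invented for the example, not taken from any real system.

```python
# Toy sketch of the two gathering architectures: sources pushing data
# (source-driven) versus the warehouse pulling it (destination-driven).

class Source:
    """An operational source system holding new transaction records."""
    def __init__(self, name):
        self.name = name
        self.pending = []            # records not yet shipped to the warehouse

    def record(self, row):
        self.pending.append(row)

    def hand_over(self):
        """Give up all pending records (used when the warehouse polls us)."""
        rows, self.pending = self.pending, []
        return rows

class Warehouse:
    def __init__(self):
        self.staged = []             # records landed in the staging area

    def receive(self, rows):
        """Source-driven: sources push new data to the warehouse."""
        self.staged.extend(rows)

    def poll(self, sources):
        """Destination-driven: the warehouse periodically pulls new data."""
        for src in sources:
            self.staged.extend(src.hand_over())

wh = Warehouse()
billing = Source("billing_system")
billing.record({"caller": "8801712345678", "minutes": 4})

wh.poll([billing])                                        # destination-driven pull
wh.receive([{"caller": "8801912345678", "minutes": 2}])   # source-driven push
```

Either way the records end up in the warehouse's staging area; the difference is solely in which side initiates the transfer.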
2.1.3.2 Data Staging Area

The data staging area of the data warehouse is both a storage area and a set of processes commonly referred to as extract-transformation-load (ETL). The data staging area is everything between the operational source systems and the data presentation area. It is typically not accessible to users, only to skilled professionals. In this area raw operational data is transformed into a warehouse deliverable fit for user query and consumption. Data staging is a major process that includes the following sub-procedures:

Extraction: Extraction is the first step in the process of getting data into the data warehouse environment. Extracting means reading and understanding the source data and copying the data needed for the data warehouse into the staging area for further manipulation.

Transformation: Once the data is extracted into the data staging area, there are many transformation steps, including:

a) Cleansing the data by correcting misspellings, resolving domain conflicts, dealing with missing elements, or parsing into standard formats.
b) Purging selected fields from the legacy data that are not useful for the data warehouse.
c) Combining data sources by matching exactly on key values or by performing fuzzy matches on non-key attributes.
d) Creating surrogate keys for each dimension record in order to avoid dependency on legacy-defined keys, where the surrogate key generation process enforces referential integrity between the dimension tables and fact tables.
e) Building aggregates for boosting the performance of common queries.

Loading and indexing: At the end of the transformation process, the data is in the form of load record images. Loading in the data warehouse environment usually takes the form of replicating the dimension tables and fact tables and presenting these tables to the bulk loading facilities of each recipient data mart.
Bulk loading is a very important capability, to be contrasted with record-at-a-time loading, which is far slower. The target data mart must then index the newly arrived data for query performance.

2.1.3.3 Data Presentation Area

The data presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical applications. It is all the business community sees and touches via data access tools. The data presentation area is mainly made up of a series of integrated data marts. A data mart presents the data from a single business process. These business processes cross the boundaries of organizational functions. Data marts contain detailed, atomic data; atomic data is required to withstand assaults from unpredictable ad hoc user queries. The data marts may also contain performance-enhancing summary data, or aggregates. All the data marts must be built using common dimensions and facts. Dimensional data marts are organized by subject area, such as finance, sales, and marketing, and coordinated by data category, such as customer, product, and location. These flexible information stores allow the data structures to respond to business changes such as product line additions, new staff responsibilities, mergers, consolidations, and acquisitions.

2.1.3.4 Data Access Tools

The final major component of the data warehouse environment is the data access tools. All data access tools query the data in the data warehouse's presentation area. A data access tool can be as simple as an ad hoc query tool or as complex as a sophisticated data mining or modeling application. Ad hoc query tools, as powerful as they are, can be understood and used effectively only by a small percentage of the potential data warehouse business user population. The majority of the business user base will likely access the data via prebuilt, parameter-driven analytic applications.
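The transformation steps performed in the staging area, such as cleansing inconsistent values and creating surrogate keys, can be sketched as follows. This is a minimal, hypothetical example: the operator names, record layout, and mapping table are assumptions made for illustration, not the thesis's actual ETL code.

```python
# Hypothetical raw CDR-like records from heterogeneous sources, with
# inconsistent operator naming and a missing duration field.
raw_records = [
    {"operator": "GP", "caller": "8801712345678", "duration": "125"},
    {"operator": "Grameen Phone", "caller": "8801712345679", "duration": "63"},
    {"operator": "banglalink", "caller": "8801912345678", "duration": ""},
]

# a) Cleansing: resolve naming conflicts via a (hypothetical) lookup table.
OPERATOR_NAMES = {
    "gp": "Grameenphone",
    "grameen phone": "Grameenphone",
    "banglalink": "Banglalink",
}

# d) Surrogate keys: warehouse-controlled integer keys, assigned in
# arrival order, so the dimension never depends on legacy identifiers.
surrogate_keys = {}

def transform(record):
    """Produce a load record image from one raw source record."""
    operator = OPERATOR_NAMES.get(record["operator"].strip().lower(),
                                  record["operator"])
    key = surrogate_keys.setdefault(operator, len(surrogate_keys) + 1)
    # Cleansing continued: a missing duration becomes an explicit zero.
    duration = int(record["duration"]) if record["duration"] else 0
    return {"operator_key": key, "operator": operator, "duration": duration}

load_images = [transform(r) for r in raw_records]
```

Note how the first two records, named differently at their sources, end up sharing one operator name and one surrogate key, which is exactly the consistency the staging area is meant to enforce.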
2.1.4 Warehouse Schemas

Data warehouses typically have schemas that are designed for data analysis, using tools such as OLAP tools. Thus, the data are usually multidimensional, with dimension attributes and measure attributes. For this reason, two types of tables are used: fact tables and dimension tables.

2.1.4.1 Fact Table

Tables containing multidimensional data are called fact tables and are usually very large. These are used to record the actual facts and measures of the business. Facts are numeric data items that are of interest to the business. Example: in telecommunications, the length of a call in minutes, or the average number of calls.

2.1.4.2 Dimension Table

To minimize storage requirements, dimension attributes are usually short identifiers that are foreign keys into other tables called dimension tables. Dimension tables establish the context of the facts; they store fields that describe the facts. Example: in telecommunications, the call origin and call destination.

Using both fact tables and dimension tables, a data warehouse schema is built. There are two types of data warehouse schema: the star schema and the snowflake schema.

2.1.4.3 Star Schema

The star schema (sometimes referred to as a star join schema) is the simplest style of data warehouse schema. The star schema consists of a few fact tables (possibly only one, justifying the name) referencing any number of dimension tables. The star schema is considered an important special case of the snowflake schema. The main characteristics of a star schema are:

One fact table.
De-normalized dimension tables.
One column per level/attribute.
Simple and easy overview, hence ease of use.
Relatively flexible.
The fact table is normalized.
Dimension tables are often relatively small.
Recognized by many RDBMSs.
Good performance.
Hierarchies are hidden in the columns.
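These characteristics can be made concrete with a minimal star schema for call data, sketched here in SQLite. The table and column names are illustrative only, not the schema designed in this thesis.

```python
import sqlite3

# A minimal telecom star schema: one normalized fact table referencing
# de-normalized dimension tables (all names here are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_time (
    time_key     INTEGER PRIMARY KEY,
    call_date    TEXT,
    month        TEXT,
    year         INTEGER
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT,
    city         TEXT,
    region       TEXT      -- the city/region hierarchy is hidden in columns
);
-- A snowflake schema would instead normalize dim_customer into separate
-- dim_city and dim_region tables, one table per hierarchy level.
CREATE TABLE fact_call (
    time_key         INTEGER REFERENCES dim_time(time_key),
    customer_key     INTEGER REFERENCES dim_customer(customer_key),
    duration_minutes REAL,     -- measure
    call_count       INTEGER   -- measure
);
""")

# A sample dimension row and a fact row referencing it.
conn.execute("INSERT INTO dim_customer VALUES (1, 'Sample Customer', "
             "'Dhaka', 'Dhaka Division')")
conn.execute("INSERT INTO fact_call VALUES (1, 1, 12.5, 3)")

tables = {r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

The fact table holds only foreign keys and numeric measures; every descriptive attribute lives in a dimension table, which is what keeps the fact table narrow even when it grows very large.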
Figure: The Star Schema

2.1.4.4 Snowflake Schema

A snowflake schema is a logical arrangement of tables in a relational database such that the entity-relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions. In the snowflake schema, however, dimensions are normalized into multiple related tables, whereas the star schema's dimensions are de-normalized, with each dimension represented by a single table. When the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and where child tables have multiple parent tables, a complex snowflake shape starts to emerge. The "snowflaking" effect only affects the dimension tables, not the fact tables. The main characteristics of a snowflake schema are:

Dimensions are normalized.
One dimension table per level.
Each dimension table has an integer key, a level name, and one column per attribute.
Hierarchies are made explicit/visible.
Very flexible.
Dimension tables use less space.
Harder to use due to many joins.
Worse performance.

Figure: The Snowflake Schema

Both types of schema have advantages and disadvantages. A data warehouse designer should choose the schema wisely, because it has a big effect on the other stages of the design.

2.1.5 OLAP (Online Analytical Processing)

Online analytical processing, or OLAP, is an approach to quickly answering multidimensional analytical queries. OLAP is part of the broader category of business intelligence, which also encompasses relational reporting and data mining. OLAP tools support interactive analysis of summary information. Databases configured for OLAP use a multidimensional data model, allowing for complex analytical and ad hoc queries with rapid execution time. At the core of any OLAP system is the concept of an OLAP cube (also called a multidimensional cube or a hypercube).
It consists of numeric facts called measures which are categorized by dimensions. The cube metadata is typically created from a star schema or snowflake schema of tables in a relational database. Measures are derived from the records in the fact table, and dimensions are derived from the dimension tables. OLAP systems have traditionally been categorized using the following taxonomy.

2.1.5.1 MOLAP (Multidimensional)

MOLAP is the "classic" form of OLAP and is sometimes referred to as just OLAP. MOLAP stores data in optimized multidimensional array storage, rather than in a relational database. It therefore requires the pre-computation and storage of information in the cube, an operation known as processing.

2.1.5.2 ROLAP (Relational)

ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information. ROLAP depends on a specialized schema design.

2.1.5.3 HOLAP (Hybrid)

There is no clear agreement across the industry as to what constitutes "hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and specialized storage for at least some aspects of the smaller quantities of more aggregated or less detailed data.

2.2 Data Warehouse Development Methodologies

Building a data warehouse is a very challenging undertaking because, compared to software engineering, it is quite a young discipline and does not yet offer well-established strategies and techniques for the development process. Current data warehouse development methods fall within three basic groups: data-driven, goal-driven, and user-driven. They are described below.
2.2.1 Data-Driven Methodologies

Bill Inmon, the founder of data warehousing, argues that data warehouse environments are data-driven, in comparison to classical systems, which have a requirement-driven development life cycle. He states that requirements are the last thing to be considered in the decision support development life cycle; they are understood only after the data warehouse has been populated with data and the results of queries have been analyzed by users. The data warehouse development strategy is based on the analysis of the corporate data model and relevant transactions. The approach ignores the needs of data warehouse users a priori: company goals and user requirements are not reflected at all. User needs are integrated in the second cycle. Golfarelli, Maio, and Rizzi propose a semi-automated methodology to build a dimensional data warehouse model from the pre-existing E/R schemes that represent operational databases. The data-driven development methodology is recommended for data mining and data exploration purposes. This bottom-up approach exploits the database and is suited for tailor-made measurement. The data-driven development methodology is particularly suited for production workflows. These workflows generate high business value, have a high degree of repetition, are customer focused, are often time critical, and therefore require tight and close monitoring.

2.2.2 Goal-Driven Methodologies

Böhnlein and Ulbrich-vom Ende present an approach based on the SOM (Semantic Object Model) process modeling technique in order to derive the initial data warehouse structure. The first stage of the derivation process determines the goals and services the company provides to its customers. Then the business process is analyzed by applying the SOM interaction schema, which highlights the customers and their transactions with the process under study.
In a third step, sequences of transactions are transformed into sequences of existing dependencies that refer to information systems. The last step identifies measures and dimensions: one has to find enforcing (information request) transactions for measures and derive dimensions from existing dependencies. In our opinion this highly complex approach works well only when business processes are designed throughout the company and are combined with business goals. Kimball proposes a four-step approach in which he starts by choosing a business process, takes the grain of the process, and then chooses dimensions and facts. He defines a business process as a major operational process in the organization that is supported by some kind of legacy system (or systems). The goal-driven development methodology supports modern management methods and is a foundation for decision support at all organizational levels. The level of granularity is much higher compared to that of the data-driven approach. While a Process Warehouse based on the goal-driven development methodology measures only the cycle time for business processes and has only one cube, a Process Warehouse based on the data-driven development methodology measures the duration of all process and activity states as well as the workload of departments and roles. End users are rarely involved; they are required only when operational detail matters. As the model is aligned with the corporate strategy, it is very stable. Measures and dimensions are balanced: financial, non-financial, qualitative, and quantitative aspects are all considered. Many data sources are integrated, because a holistic approach is based on all aspects of an organization.

2.2.3 User-Driven Methodologies

Westerman describes an approach that was developed at Wal-Mart and has its main focus on implementing business strategy.
The methodology assumes that the company goal is the same for everyone and that the entire company will therefore be pursuing the same direction. It is proposed to set up a first prototype based on the needs of the business. Business people define goals and gather priorities, as well as define business questions supporting these goals. Afterwards the business questions are prioritized, and the most important ones are defined in terms of data elements, including the definition of hierarchies. Although the Wal-Mart approach focuses on business needs, the business goals defined by the organization are not taken into consideration at all. Poe proposes a catalogue for conducting user interviews in order to collect end-user requirements. She recommends interviewing different user groups in order to get a complete understanding of the business. The questions cover a very broad field and also include topics like job responsibilities. Relying exclusively on this user-driven development methodology is risky and must be avoided, as it generates performance information that reflects the organizational level of the people involved. Consequently, the selected measures, dimensions, level of granularity, and targeted level of the organizational hierarchy are very unstable. The methodology has a bottom-up tendency, because most employees do not see the organization from a broad angle; theirs is a narrow, egocentric point of view. The project duration may be long-winded and very costly, as project participants request long discussions on many unnecessary measures and dimensions. This development methodology may well raise acceptance of a system, but it must be combined with the data-driven or goal-driven development methodology in order to improve the longevity of the system. The more a system suffers rejection, the more user involvement is required, besides a focus on organizational strategies or the corporate data model.
2.3 Existing Related Work Done in the Telecom Industry

Beginning in the late 1990s, data integration moved from the lab into the commercial arena. Today this industry is known as Enterprise Information Integration (EII). The vision underlying this industry is to provide tools for integrating data from multiple sources without having to first load all the data into a central warehouse, as required by previous solutions. A collection of short articles by some of the players in this industry appears in the literature. A data integration scenario starts with identifying the data sources that will participate in the application, then building a mediated schema (often called a virtual schema) which will be queried by users or applications, and building semantic mappings from the data sources to the mediated schema. Data integration products are offered by most major DBMS vendors and are also playing a significant role in business analytics products (e.g., Actuate and HyperRoll) [15, 16, 17]. Personal information management is also an application where data integration is taking a significant role. SCORE is an information integration approach that focuses on integrating structured and unstructured data. Information integration solutions typically require the application to formulate the SQL logic to retrieve the needed structured data on one hand, and to identify a set of keywords to retrieve the related unstructured data on the other. SCORE proposes a novel approach wherein the application specifies its information needs using only a SQL query on the structured data, and this query is automatically "translated" into a set of keywords that can be used to retrieve relevant unstructured data. Korea Telecom's Call Data Analysis Team built SIMS (Strategic Information Management System), a prototype data warehouse system for telecommunications pricing strategy. Telecom Italia has its own data warehouse project, IBDA, which now consists of 52 databases.
LGR Telecommunications, a specialized solutions provider to the global telecommunications industry, offers a business solution that taps directly into the source of each customer interaction with the network by accessing the call data record (CDR) it creates. LGR's approach captures CDR data in real time, appends additional business information to the record, stores the data within a comprehensive Oracle data warehouse, and provides real-time analysis to the telecom service provider [22]. Other telecom companies using Oracle's data warehouse include Telefonica Germany, Turkcell Telecommunications, Mobiltel (Bulgaria), Anhui Telecom Company Ltd (China), and others [23]. In Bangladesh, Grameenphone engaged Oracle Certified Advantage Partner IBCS-PRIMAX Software to develop a data warehouse based on Oracle Database 10g with a built-in business intelligence solution, the first such solution developed in Bangladesh [24]. So although data integration has come a long way, and individual data warehouses for individual telecommunication companies have existed for some time, there is a real dearth of industry-wide integrated data warehouses suited to all the telecommunication companies in a country. Our project is therefore unique in that sense and a prototype in Bangladesh.

References

[1] W. H. Inmon. Building the Data Warehouse. John Wiley & Sons, 1996.
[2] Jiawei Han and Micheline Kamber. Data Mining: Concepts and Techniques. Elsevier Inc., 2006.
[3] A. Berson and S. J. Smith. Data Warehousing, Data Mining, and OLAP. McGraw-Hill, New York, 1997.
[4] Koustuv Dasgupta, Rahul Singh, Balaji Viswanathan, Dipanjan Chakraborty, Sougata Mukherjea, and Amit A. Nanavati. Social Ties and their Relevance to Churn in Mobile Telecom Networks. IBM India Research Lab.
[5] T. Euler. Churn Prediction in Telecommunications Using MiningMart. In Proceedings of the Workshop on Data Mining and Business (DMBiz), 2005.
[6] IBM. Telecommunications Data Warehouse (TDW) General Information Manual.
[7] R. Kimball, L. Reeves, M. Ross, et al. The Data Warehouse Lifecycle Toolkit.
[8] A. Silberschatz, H. F. Korth, and S. Sudarshan. Database System Concepts. McGraw-Hill Education, 2006.
[9] R. Kimball. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley & Sons, 1996.
[10] M. Golfarelli, D. Maio, and S. Rizzi. Conceptual Design of Data Warehouses from E/R Schemes. In Proceedings of the 31st HICSS, IEEE Press, 1998.
[11] M. Boehnlein and A. Ulbrich vom Ende. Business Process Oriented Development of Data Warehouse Structures. In Proceedings of Data Warehousing 2000, Physica-Verlag, 2000.
[12] P. Westerman. Data Warehousing Using the Wal-Mart Model. Morgan Kaufmann, 2001.
[13] V. Poe. Building a Data Warehouse for Decision Support. Prentice Hall, 1996.
[14] Alon Y. Halevy, Naveen Ashish, Dina Bitton, Michael J. Carey, Denise Draper, Jeff Pollock, Arnon Rosenthal, and Vishal Sikka. Enterprise Information Integration: Successes, Challenges and Controversies. In SIGMOD Conference, pages 778–787, 2005.
[15] Xin Dong and Alon Halevy. A Platform for Personal Information Management and Integration. In Proceedings of CIDR, 2005.
[16] Jim Gemmell, Gordon Bell, Roger Lueder, Steven Drucker, and Curtis Wong. MyLifeBits: Fulfilling the Memex Vision. In ACM Multimedia, 2002.
[17] Dennis Quan, David Huynh, and David R. Karger. Haystack: A Platform for Authoring End User Semantic Web Applications. In ISWC, 2003.
[18] Alon Halevy, Anand Rajaraman, and Joann Ordille. Data Integration: The Teenage Years. In VLDB, Seoul, Korea, September 12–15, 2006. ACM.
[19] P. Roy, M. Mohania, B. Bamba, and S. Raman. Towards Automatic Association of Relevant Unstructured Content with Structured Query Results. In CIKM, 2005.
[20] Seungjae Shin, Gilju Park, Wonjun Lee, and Sunmi Lee. Case Study: How to Make Telecom Pricing Strategy Using Data Warehouse Approach. Korea Telecom R&D Group, Telecom Economic Research Lab.
[21] M. Trisolini et al. Data Warehousing and Integration in Telecom Italia. 2000.
[22] Oracle Customer Snapshot: http://www.oracle.com/customers/snapshots/lgr-telecommunications-db-business-benefits-case-study.pdf
[23] Oracle Data Warehousing Customers list: http://www.oracle.com/customers/products/data-warehouse.html
[24] Oracle Customer Snapshot: http://www.oracle.com/customers/snapshots/grameenphone-tech-snapshot.pdf