Oracle Warehouse Builder 10g Release 2
Implementing Advanced Data Warehouse Concepts at Elizabeth Arden Inc.
October 2006

Note: This document is for informational purposes. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described in this document remain at the sole discretion of Oracle. This document in any form, software or printed matter, contains proprietary information that is the exclusive property of Oracle. This document and the information contained herein may not be disclosed, copied, reproduced, or distributed to anyone outside Oracle without prior written consent of Oracle. This document is not part of your license agreement, nor can it be incorporated into any contractual agreement with Oracle or its subsidiaries or affiliates.

INTRODUCTION

In many organizations, information requirements become more advanced and call for an adaptive architecture that can deliver on them in a timely manner. This forces organizations to put more emphasis on planning ahead and on implementing more advanced concepts in production environments. The specific solutions discussed in this paper are technically elegant and effective, but they must achieve real business goals to warrant their development time. For Elizabeth Arden Inc. the goals were concrete and tangible:

· Ensure business users own good as well as bad data, allowing for better overall information quality and clear information ownership in the business.
· Empower business users to create their own complex reporting solutions, reducing the involvement of IT developers and the cost to the business.
· Allow for quick and easy conversion of outside or external data, reducing the time and effort required to integrate these new data sets.

Ultimately, if all of these goals are achieved, Elizabeth Arden gains better information quality, faster turnaround cycles to add or move new data, and better decisions directly from users. The company also offloads some reporting-related efforts from the IT staff, removing bottlenecks from the reporting process for end users. In this paper we discuss these advanced concepts, which are useful in any data warehouse situation:

· In-place rejection and realignment of data elements, to allow for business ownership of all data.
· Embedded special time constructs, to enable business users to create complex analysis reports.
· Hierarchy-based data conversions, to easily incorporate new data, loading data at a different level from the level displayed to end users.

We will elaborate on the theory behind these concepts and show their implementation using Warehouse Builder 10g Release 2 as a framework.

IN PLACE REJECTION AND AUTOMATED REALIGNMENT OF DATA ELEMENTS

The in-place rejection concept is based on the fact that business users ultimately own the data they are viewing in their reports. This simple fact means that business users are also ultimately responsible for bad data. To allow business users to see for themselves where bad data resides, it should be allowed into the warehouse. In-place rejection and realignment is a way to achieve this in a dimensional model.

Concepts

In a dimensional model, the fact table is linked to the dimensions based on some key value. To load a fact record into the fact table, the dimension keys must match; otherwise the constraints are enforced and the record is not loaded.
To load facts, therefore, the dimensions must hold the exact data elements that the new fact records expect. If a fact record contains bad data (for at least one of the dimensions it references), the normal process in a relational database is to reject the record. Rejecting the record, however, does not expose it to the business user. In some cases it ends up in a pile of incorrect data that someone should go through at some point in time. Experience shows that that task is hardly ever completed in a timely manner. Hence the concept of in-place rejection is introduced. By rejecting the records in place within the fact table, business users see the misaligned data in every report, are consciously aware that the bad data is there, and can see how much of it is lying around. They also see the business value (in dollars, units, or whatever the measurement type is) of misaligned data.

In place rejection

To place the bad data in the actual reporting system, in-place rejection uses generated dimension members to capture the bad data. For each dimension, at the linked level (typically the lowest level of the dimension), a value is introduced that is then used as a garbage collector. All data that does not match any of the members is assigned to this garbage collector value. The garbage collector value can be either a static value like "rejected" or a dynamic value like, for example, REJ_originalvalue. Alternatively, you can log the entire input row and the new key value in a table for later processing of reloads. In all cases the lowest level rolls up to a distinct member at the top of the hierarchy, the total includes the in-place rejected records, and the total of a star therefore always rolls up correctly.

Realignment

In some cases in-place rejection is caused not by bad data, but by bad timing.
In that case the realignment happens without any intervention, directly upon the next load of the facts and dimensions. When bad data is truly the cause of misalignment, realignment can only happen once the data is fixed and reloaded. Regardless of the fix, the system must be built in such a way that the realignment is detected and the data that was labeled as incorrect is realigned in the database. It is crucial that the system does not double count due to a previous error.

A look at Warehouse Builder

These concepts are automated in the ETL processes in Warehouse Builder 10g Release 2. Here we describe some of the implementation details. The first pass through the data loads the dimensions in a regular mapping. For this paper that mapping is not interesting, as it simply loads all dimension records currently present in the source. The fact mapping is the first mapping that will encounter any issues with the load if a dimension record is not (yet) loaded. That mapping utilizes a plain Warehouse Builder lookup, but returns a default link value if the dimension record is not found.

Figure 1 Using no-match rows in key-lookup to always load a fact

This way, all facts are loaded with a default dimension ID, which, as shown in Figure 1, is set to 4. Because of the lookup functionality that comes with Warehouse Builder, this is a straightforward step in the ETL process. The next step is less trivial, because the keys need to be realigned when a dimension record is found. To match and recode, a source table is used that contains the original references; from there a lookup is done against the final dimension to establish whether there is a new record that requires recoding.

Figure 2 Lookup a dimension value for recoding

If there is no recoding (e.g. the record is still not found in the dimension), the default key is loaded into the recoding table again.
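The lookup-with-default and recoding passes described above can be sketched end to end in plain SQL. The sketch below uses Python's sqlite3 purely for illustration; the table names (d_product, f_sales), the column names, and the default key value are invented stand-ins, not the Elizabeth Arden schema or the code Warehouse Builder generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_product (product_id INTEGER PRIMARY KEY, product_code TEXT);
CREATE TABLE f_sales   (product_id INTEGER, product_code TEXT, sales_units INTEGER);
-- Generated 'garbage collector' member that catches unmatched facts.
INSERT INTO d_product VALUES (4, 'REJECTED');
INSERT INTO d_product VALUES (1, 'A');
""")

# First load: look up the surrogate key, but fall back to the default
# member (key 4, as in Figure 1) instead of rejecting the row outright,
# so every fact stays visible in reports.
for code, units in [('A', 100), ('Z', 40)]:
    conn.execute("""
        INSERT INTO f_sales
        SELECT COALESCE((SELECT product_id FROM d_product
                         WHERE product_code = ?), 4), ?, ?
    """, (code, code, units))

# Next load: the dimension member that was missing finally arrives.
conn.execute("INSERT INTO d_product VALUES (2, 'Z')")

# Realignment pass: recode facts still on the default key whose member
# now exists; members that are still missing keep the default key, so
# nothing is ever double counted.
conn.execute("""
    UPDATE f_sales
    SET product_id = (SELECT product_id FROM d_product d
                      WHERE d.product_code = f_sales.product_code)
    WHERE product_id = 4
      AND product_code IN (SELECT product_code FROM d_product)
""")

print(list(conn.execute(
    "SELECT product_code, product_id FROM f_sales ORDER BY product_code")))
# → [('A', 1), ('Z', 2)]
```

In Warehouse Builder the same effect is produced declaratively by the no-match behavior of the key lookup operator; the SQL here only mirrors the logic.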
Figure 3 If no record is found, the ID remains set to the default value

Because the recoding table holds the identifying information for the entire fact record (all of its logical key references) as well as the surrogate key for the recoded dimension, facts now correctly roll up to their new dimension link.

Benefits

By applying this strategy Elizabeth Arden achieves the following benefits, in either the infrastructure provided or the business environment:

· No special structure is needed to store error data, which simplifies the data model.
· All data is visible in a single environment rather than in a special area for erroneous data.
· Balancing data to the source is more efficient.
· Simple dependencies between ETL jobs make the loading process simpler and less prone to delays.

EMBEDDED SPECIAL TIME CONSTRUCTS

While time is a simple concept in our everyday life, it can actually complicate reporting on a system. In most cases the data warehouse or reporting database takes only a base time period as its storage point. In other words, sales orders are loaded when they are booked, and that is the time of the transaction. Users on the reporting side, however, often have questions like: how does this month's sales compare to last month, or to the same month last year? When the reporting system is only concerned with the base time, the user has to create the calculations to get the desired answers. Embedded special time constructs move the calculation of these desired time constructs down into the reporting system. Essentially, the data model embeds the special time constructs for the user to choose from. By embedding them in the base data layer of the reporting system, the heavy lifting to run the calculations is done when the data is refreshed, not when the user needs an answer.
Concepts

The most important concept is pre-calculation, where a reporting entity is pre-built for the users. A data warehouse in itself is a good example of this concept: instead of having each user join the tables to get the sales for all customers over the last month out of the ERP system, it is pre-built. Embedded special time goes one step further. It creates more reporting constructs up front, giving easy access to hard-to-get data. Many options for pre-calculation can be considered. On-Line Analytical Processing (OLAP) is one storage technique that heavily uses pre-built information. In this section, however, we cover a strictly relational implementation, using two distinct techniques to pre-build the special time constructs:

· Analytic SQL
· A bitmap table linking fact to dimensions

Using Analytic SQL

As with OLAP data, if you start to create facts that only exist in certain cells for the dimensions, sparsity is the result. The data is sparse in that for certain dimension combinations no fact is stored. The problem with sparse data is that it increases storage and typically decreases performance. To avoid this sparsity in the query objects, an extra join is used to condense the data set, and then analytic SQL functions enable the data to be shown to users as embedded special time constructs. The other advantage (this is part of why we do this) is that the heavy lifting is done not by users in client calculations, but by the database. This increases performance and alleviates the burden of creating the calculations.

Using a Bitmap Table

A bitmap table is an intermediate table that links the fact table to the time dimension, allowing the facts to be looked at for different time constructs by pre-joining the data set.
A bitmap table would look something like this (note this is not a complete table, but it should convey the idea):

from_period  to_period  CP  LY_CP  LY_R2  LY_R3  QTD_CP  LY_QTD_CP
200701       200701     1   0      0      0      1       0
200601       200701     0   1      1      1      0       0
200512       200701     0   0      1      1      0       0
200511       200701     0   0      0      1      0       0

Table 1 Bitmap Table Example

The columns in the bitmap table represent the special time constructs we want to work with. The SQL joins on the rows to determine when there are values to be shown, leveraging the bitmap in the table.

A look at Warehouse Builder

These concepts are added to Warehouse Builder in the schema design, and the queries are created in views, again stored in the Warehouse Builder repository.

Table 2 A densified time view in the data object editor

Focus on Analytic SQL

As we said, there are two important concepts here. The first is the analytic SQL itself. The fragment below shows a particular snippet from the select clause of the statement; note the analytic SQL constructs.

SELECT <snip>
, NVL ( SUM ( fact.quantity_on_hand )
        OVER ( PARTITION BY fact.company_id, fact.item_sku_id, v_time.fiscal_period_cd
               ORDER BY v_time.fiscal_period_cd
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW )
      , 0 ) ptd_quantity_on_hand
<snip>

The second part is the actual join used to densify the data. It literally squeezes the air out of the result set and gives us better performance in retrieving the results. That join is shown in the snippet below.

SELECT <snip>
FROM Hybrid_Eaden.c_inventory_rpt fact
     PARTITION BY (fact.company_id, fact.item_sku_id)
     RIGHT OUTER JOIN Hybrid_Eaden.d_ea_month v_time
     ON ( fact.ea_month_id = v_time.ea_month_id )
<snip>

Focus on the Bitmap Table

The bitmap table is linked from the fact table to the dimension. The bitmap (either a 1 or a 0) is then added into the select clause of the query.
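To make the densify-then-accumulate pattern concrete, here is a small, hedged sketch in Python's sqlite3. SQLite has no partitioned outer join, so the partition-by-item behavior of the Oracle query is emulated with a cross join of the distinct items against the time dimension; all object names (d_month, f_inventory) are illustrative, and the window function requires SQLite 3.25 or later (bundled with recent Python releases).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_month (month_id INTEGER PRIMARY KEY);
CREATE TABLE f_inventory (item TEXT, month_id INTEGER, qty INTEGER);
INSERT INTO d_month VALUES (200701), (200702), (200703);
-- Sparse facts: item 'A' has no row at all for 200702.
INSERT INTO f_inventory VALUES ('A', 200701, 10), ('A', 200703, 5);
""")

sql = """
SELECT dense.item, dense.month_id,
       -- running period-to-date total over the densified rows
       SUM(COALESCE(f.qty, 0)) OVER (
           PARTITION BY dense.item
           ORDER BY dense.month_id
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ptd_qty
FROM (SELECT i.item, m.month_id
      FROM (SELECT DISTINCT item FROM f_inventory) i
      CROSS JOIN d_month m) dense                -- densify: every item x month
LEFT JOIN f_inventory f
       ON f.item = dense.item AND f.month_id = dense.month_id
ORDER BY dense.item, dense.month_id
"""
for row in conn.execute(sql):
    print(row)
# → ('A', 200701, 10)
#   ('A', 200702, 10)   -- gap month carries the running total forward
#   ('A', 200703, 15)
```

The densifying subquery plays the role of the partitioned outer join: the gap month 200702 appears in the output even though no fact exists for it, so the period-to-date measure is continuous for reporting.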
If we look at our bitmap table again, and then at its usage in the select clause, we can understand where data is shown (there is a 1) or not shown (there is a 0) in the results, allowing us to create a clean reporting solution.

, (fact.sales_units * time.ly_qtd_cp)
, (fact.sales_units * time.ly_tot_qtr_cp)
, (fact.sales_units * time.ly_tot_hyr_cp)
, (fact.sales_units * time.ly_tot_yr_cp)
, (fact.sales_units * time.ly_r3)
, (fact.sales_units * time.ly_r6)
, (fact.sales_units * time.ly_r12)
FROM c_global_sales_affiliate fact
   , d_ea_month_matrix time
WHERE fact.ea_month_id = time.from_ea_month_id

The bitmap is simply joined to the fact table, but the measures in the fact are multiplied by the value in the bitmap. This allows each measure either to hold data or to hold a zero value. In this case zero means that the cell is not applicable for the measure. Rolling the totals up will then just add a zero value and not influence any totals. Where the bitmap holds a one, the value is shown in the measure and is counted in rollups and in reporting. Users do not have to figure this out; they simply see either a value or a zero value.

Advantages and disadvantages

Both methods have their advantages and disadvantages. At Elizabeth Arden the following points emerged. Analytic SQL has the advantage that it outperforms the traditional approach of writing complex sub-queries. It has the disadvantage that performance can be impacted because query predicates are not always passed to the underlying view. The main disadvantage of the bitmap method is that the query pulls in more data than is strictly required. This, of course, impacts performance as well. Both methods, however, do achieve their main goal of creating a simpler reporting environment for the end user, allowing the IT professionals to focus on developing better systems.
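The multiply-by-the-flag idea can be demonstrated with a toy bitmap table. The following sketch, again in Python's sqlite3 with invented names (d_month_matrix, f_sales, and the cp and ly_cp flags), shows a fact joined to a two-construct bitmap for reporting period 200701.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_month_matrix (
    from_month_id INTEGER, to_month_id INTEGER, cp INTEGER, ly_cp INTEGER);
CREATE TABLE f_sales (month_id INTEGER, sales_units INTEGER);
-- For reporting period 200701: the current period (cp) is 200701 and
-- last year's current period (ly_cp) is 200601.
INSERT INTO d_month_matrix VALUES (200701, 200701, 1, 0),
                                  (200601, 200701, 0, 1);
INSERT INTO f_sales VALUES (200701, 50), (200601, 30);
""")

# Each measure is the fact multiplied by the bitmap flag: a 0 flag
# zeroes the measure out, so rollups stay correct with no sub-queries.
sql = """
SELECT t.to_month_id,
       SUM(f.sales_units * t.cp)    AS cp_units,
       SUM(f.sales_units * t.ly_cp) AS ly_cp_units
FROM f_sales f
JOIN d_month_matrix t ON f.month_id = t.from_month_id
GROUP BY t.to_month_id
"""
print(list(conn.execute(sql)))
# → [(200701, 50, 30)]
```

Both periods' facts are scanned for the one reporting row, which is exactly the trade-off described above: more data is pulled in than strictly required, in exchange for a trivially simple query for the end user.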
DIMENSION BASED DATA CONVERSION

In today's world a successful project faces some interesting challenges, mostly the addition of new data from new subsidiaries, acquisitions, or systems similar to the ones the project started with. Adding data to an existing system may sound simple: just load the new (or old historical) data into the database and you are done. If it were just that simple, this section would not be written. The problem, especially with adding similar but different systems, is that in many cases data needs to be recoded or rolled up in different ways. When loading the new data, it needs to conform to the current hierarchies and strategies. Achieving this can be a daunting task.

Concepts

The dimension-based conversion is based on a simple but elegant concept. Rather than changing data structures or the original data loaded into a dimension, the rollup in a hierarchy manages the changes. This concept has a number of very compelling benefits. One is that a change is not permanent or destructive to the original data. The original record and linkage are still in place; the record just rolls up differently. The second benefit is that these changes can typically be made quickly, with much less impact on the system than a normal update.

Load level vs. Reporting levels

The two benefits we mentioned are achieved by adding a level to each dimension (or rather, to a hierarchy) that acts as the lowest level. This lowest level is called a load level. In other words, data for the fact is always loaded at this level in the hierarchy. The load level, however, is never exposed to users of the reporting system. These users only see the so-called reporting levels.

Realigning vs. Overwriting

In many cases conversion methods are destructive. For example, fact data might be stored at a higher grain, and the new data needs to be aggregated.
Doing this in a normal load converts the data once. If that conversion turns out not to be completely accurate, there is no easy fix. If the source is lost by that time, there is no fix at all. Realigning simply keeps the new records as they are, in the grain they arrive in. Overwriting is not what we want to do to our data.

An example

If we look at a table structure and data, this concept becomes a lot clearer and the benefits become instantly visible. Let's say we have loaded three products into the dimension table (in this example we only show a few dimension columns). The result, with the booked level (which is what the load level is called in this example) and the reporting level, looks like this:

booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD

Table 3 The dimension table with booked and reported levels

Now let's say we need to merge products from a new vendor into this dimension, and we have the following mapping table that shows how a product from the new vendor can be recoded to the Elizabeth Arden product codes (and therefore hierarchies):

Vendor XYZ Product  Elizabeth Arden Product
123                 A
654                 A
234                 D
A                   C

Table 4 Recoding or matching table

With the levels as shown in Table 3 we can now quickly load the dimension, and after loading and recoding we see the following result:

booked_products  reported_products  source_key
A                A                  STANDARD
C                C                  STANDARD
D                D                  STANDARD
123              A                  VENDORXYZ
654              A                  VENDORXYZ
234              D                  VENDORXYZ
A                C                  VENDORXYZ

Table 5 Adding new products is simple and retains history

We can now see where the big benefits come in. Table 5 shows both the original data and the Elizabeth Arden product presented to the user. The revenue for this dimension is now directly available in the relevant business units.
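A runnable sketch of this rollup, using a subset of the data from Tables 3 through 5 (the schema, the surrogate-key choice, and the revenue figures are assumptions for illustration), shows how facts loaded at the booked level surface at the reporting level, and how a later recoding touches only the dimension:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE d_product (booked_products TEXT PRIMARY KEY,
                        reported_products TEXT, source_key TEXT);
CREATE TABLE f_sales   (booked_products TEXT, revenue INTEGER);
INSERT INTO d_product VALUES
    ('A', 'A', 'STANDARD'), ('C', 'C', 'STANDARD'), ('D', 'D', 'STANDARD'),
    ('123', 'A', 'VENDORXYZ'), ('654', 'A', 'VENDORXYZ'),
    ('234', 'D', 'VENDORXYZ');
-- Facts are always loaded at the booked (load) level...
INSERT INTO f_sales VALUES ('A', 100), ('123', 20), ('654', 5), ('234', 7);
""")

# ...but users only ever query the reporting level.
rollup = """
SELECT d.reported_products, SUM(f.revenue)
FROM f_sales f JOIN d_product d USING (booked_products)
GROUP BY d.reported_products ORDER BY d.reported_products
"""
print(list(conn.execute(rollup)))
# → [('A', 125), ('D', 7)]   -- vendor rows roll up under A and D

# A later recoding (say, 234 -> A) is one dimension update; no fact
# rows are rewritten and the original booked value is retained.
conn.execute("UPDATE d_product SET reported_products = 'A'"
             " WHERE booked_products = '234'")
print(list(conn.execute(rollup)))
# → [('A', 132)]
```

The fact table never changes during the recoding, which is exactly the non-destructive property the concept is after.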
The other advantage comes when a recoding is required; if, for example, product 234 should instead be linked to Elizabeth Arden product A, the dimension can be updated instead of all related facts.

More to come

The concept of building a dimension with two separate levels playing two separate roles, one for loading and one for reporting, is currently not in production at Elizabeth Arden, but may be implemented in the future.

SUMMARY AND CONCLUSION

Inventive use of an ETL tool, forward-looking data modeling, and embedding smarts in data structures allow Elizabeth Arden to achieve two goals for its business. First, the IT staff can focus on their core capabilities of building and maintaining systems, not on creating reports for end users. Second, business users get better information more quickly, allowing them to do their job: run the business. With Warehouse Builder, Elizabeth Arden has found a tool it can use to implement these concepts and get the maximum benefit for the company.

Implementing Advanced Data Warehouse Concepts at Elizabeth Arden
October 2006
Author: Jean-Pierre Dijcks – Oracle and Tim Seyfried – Elizabeth Arden

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
www.oracle.com

Copyright © 2006, Oracle. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose.
We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.