Testing the Incremental Load Improvements in 10gR2 OLAP - Part I

December 23, 2005

Mark Rittman

One of the questions that often comes up when I'm helping people with Oracle OLAP is: what is the most efficient way to load incremental data into a cube? Take, for example, a situation where you have an initial load of 8000 rows into a cube that has five dimensions, and subsequently you need to load another 2000-odd rows. Do you rebuild the cube from scratch when you need to add the further 2000 rows, or do you leave the cube as it is and add the 2000 rows on top of what's already there, either as a separate load on their own or mixed in with the data that was already loaded? If you're using AWM10g, do you select the "Aggregate the full cube" option or the "Aggregate just for incoming values" option, and if you choose the latter, what benefits does it bring?

This is actually one of the questions I've addressed in the forthcoming article I've written for DBAZine on Oracle Database 10g OLAP performance tuning, but I don't think I'm giving too much away when I tell you that, with 9iR2 and 10gR1, it actually doesn't make any difference at all whether you choose to aggregate just for new values or for all values in the cube, or whether you separate out the new data and load it in on its own or reload the entire dataset, old data and new - the aggregation process takes the same time whatever you do. However, one of the new things promised with 10gR2 is that the loading of incremental data is much faster than with 9iR2 or 10gR1, as this excerpt from the Oracle OLAP 10gR2 PDF shows:

Quite an impressive improvement. With this in mind, I thought I'd load up the dataset that I used for the article and see what sort of improvements I could find. The cube consisted of two measures (order quantity, ship quantity) and five dimensions (product, customer, channel, time and promotion). The source table for the measures contained 11,429 rows, which I split into two load batches, one of 8,884 rows and one of 2,545 rows. I then put together four scenarios:

1. Load the first 8,884 rows into the cube, select "Aggregate the full cube", then load these 8,884 rows plus the 2,545 additional rows, again selecting "Aggregate the full cube".
2. Load the first 8,884 rows as before, select "Aggregate the full cube", then load these 8,884 rows plus the 2,545 additional rows, but this time select "Aggregate for incoming values only".
3. Load the first 8,884 rows as before, then additionally load just the 2,545 new rows and select "Aggregate the full cube".
4. Load the first 8,884 rows as before, then additionally load just the 2,545 new rows, but this time select "Aggregate for incoming values only".

These scenarios were tested on 10gR1 and 10gR2 on Windows XP SP2, with 10gR2 being the developer preview that's available on OTN. 10gR2 was running in a VMware virtual machine with 1GB of RAM allocated, whilst 10gR1 was running on the OS proper. All settings were left at the default, with no changes to SGA_TARGET, PGA_AGGREGATE_TARGET and so on. With 10g Release 1, the results looked like this:

As I said earlier, there's no real difference between any of the load scenarios; each of them took around the same time to process the additional data. Also, the rate at which incremental data is processed is about one-fifth of the rate of the original data load - my take on this is that the load routine within Oracle OLAP has to "merge" the incoming data in with existing cells in the measures, which is a lot slower than straight "inserts".
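To picture what I mean, here's a purely relational analogy - the AW load routine doesn't literally run SQL, and the table and column names below are made up for illustration - but the cost difference is similar to that between a MERGE, which has to look up the existing cell before it writes, and a plain INSERT:

  -- Relational analogy only: an incremental OLAP load is closer in cost to a
  -- MERGE (find the existing cell, then update or insert) than to a straight
  -- INSERT. SALES_FACT, SALES_STAGE and the column names are hypothetical.
  merge into sales_fact f
  using sales_stage s
  on (    f.prodid    = s.prodid
      and f.custid    = s.custid
      and f.channelid = s.channelid
      and f.timeid    = s.timeid
      and f.promoid   = s.promoid)
  when matched then
    update set f.order_qty = s.order_qty,
               f.ship_qty  = s.ship_qty
  when not matched then
    insert (prodid, custid, channelid, timeid, promoid, order_qty, ship_qty)
    values (s.prodid, s.custid, s.channelid, s.timeid, s.promoid,
            s.order_qty, s.ship_qty);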
Either way, there's no performance boost to be had by selecting "Aggregate for incoming values only" or by separating out new data from old. In fact, if you look at the last scenario, doing a full load of all the 11k rows, you'll see that it's actually quicker to rebuild the cube from scratch than to try and add in an incremental load.

With 10g Release 2 (remember, it's running in a VMware virtual machine, so we're not strictly comparing apples with apples, and it's beta code) the results look like this:

A few points to note here:

- The rate at which initial data is loaded is about 50% faster than 10gR1 - not bad.
- When the new data is mixed in with the old and reloaded into the cube, 10gR2 is still faster than 10gR1, but the benefit is less than with our initial data load.
- If the new data is split out from the old before it's loaded, 10gR2 is up to twice as fast as 10gR1 for loading incremental data.
- Strangely, it didn't seem to matter whether I selected the "Aggregate just incoming values" option or not; in fact, in the tests I carried out, the load was actually faster if I just selected "Aggregate all values".
- With these performance improvements, it's now worthwhile doing an incremental load rather than rebuilding your cube from scratch each time.

I ran the tests a few times to try and get some stable, averaged-out figures, but you should bear in mind that the dataset is comparatively small, and the incremental load "boosting" algorithm may provide more benefits the larger your cube.

So, what this says is that there are performance improvements in 10gR2 for loading incremental data, but (from my tests, anyway) they are found when you load just the new data into the cube, rather than reloading the entire dataset including your new data. I wasn't, however, able to see any performance benefit from choosing the "Aggregate for incoming values only" option - I seemed to get the benefit whether the option was selected or not. The performance benefit isn't huge - certainly not the 18-fold improvement that the Oracle slide suggests - but it is there, and it made the difference for me between reloading the cube from scratch each time and putting in place a system where we perform an incremental load of just the new rows. Anyone else had a play around with this, perhaps got closer to the improvements in the slide? Let me know if you do.

UPDATE 14-Sep-05: Scott Powell dropped me a line this evening with some comments on the above article. They're actually very significant, as they explain why the incremental loads are taking almost as long as the full loads. Read on...

"I suspect this is what is also causing your performance problems / results with the incremental loads in the article you published on Sept 8th, "Testing the Incremental Load Improvements in 10gR2 OLAP". I've attached a quick script I wrote. This script creates 3 "tuple" tables that hold all of the composite tuples from the source data.

  create table tuples_all as
    select distinct channelid, promoid, prodid, custid from gs_sales;

  create table tuples_load1 as
    select distinct channelid, promoid, prodid, custid from gs_sales_load_1;

  create table tuples_load2 as
    select distinct channelid, promoid, prodid, custid from gs_sales_load_2;

  select count(*) from tuples_all;
  select count(*) from tuples_load1;
  select count(*) from tuples_load2;

TUPLES_ALL loads from gs_sales, TUPLES_LOAD1 loads from gs_sales_load_1, and TUPLES_LOAD2 loads from gs_sales_load_2.
Here are the results:

  tuples_all     1,688 tuples
  tuples_load1   1,634 tuples
  tuples_load2   1,135 tuples

Basically, with time as the dense dimension, I've removed it as a consideration from the tuples (because any reference to any time period will have to load the entire page). As you can see, your "load1" and "load2" data is touching almost all of the tuples. I believe this is why your performance is almost always taking nearly the same amount of time as a full load - because essentially you are updating almost every page with both the initial and the incremental loads. I suspect (but have not yet tested) that your performance results would be markedly different if you did the following:

a) redesigned the variables to be dimensioned like this: <channel composite<time customer product promotion>>
b) split out gs_sales_load_1 to include all data for time periods < 01-apr-2005
c) split out gs_sales_load_2 to include all data for time periods >= 01-apr-2005

This gives a better representation of what an incremental load would look like - both because the variable now has TIME as a sparse dimension (meaning when the engine tries to aggregate it, it doesn't have to touch every page), and because loading incremental data based on time periods is pretty common from a business perspective. I will do my best to clean up this documentation, but hopefully you'll find this useful. I'll try to check the performance of the incremental load in the next couple of days - if you beat me to it, let me know how it goes."

Thanks Scott! I'll rerun the tests as soon as possible and report back on the results. Until then, treat the results in my test as unreliable until verified otherwise.

Update 15-Sep-05: Scott sent a follow-up email:

"Mark, just FYI, I had the chance to redefine the cube and do some further testing. See the results immediately below - I saw huge performance gains by choosing channel as the dense dimension instead of time:

  Your initial cube: <time <customer product promotion channel>>
    Load data time: 65 seconds
    Aggregate data time: almost 1.5 hours
    Size of variables: approx 315 Mb each

  New cube: <channel <time customer product promo>>
    Load data time: 1 second
    Aggregate data time: 33 seconds
    Size of variables: approx 10 Mb each

Redefining sparsity obviously made a huge difference. However, I then tried to do an incremental load, and did NOT see the results I expected. Basically, I broke up the gs_sales table into gs_sales_load_1, containing everything with a date < 01-apr-2005 (10,675 records), and gs_sales_load_2, with a date >= 01-apr-2005 (754 records). I expected to see a huge difference between the initial load and the incremental update, but didn't:

  Step 1: load data from gs_sales_load_1 and do a full cube aggregation:
    Load time: 1 sec
    Full aggregation time: 35 sec (note - a bit longer than the 33 seconds before, but not significant???)

  Step 2: incremental load from gs_sales_load_2 and do an incremental cube aggregation:
    Load time: 3 sec
    Incremental agg time: 21 secs

Obviously the 21 secs is better than the 35 seconds, but not by the orders of magnitude I expected to see (I thought it would take just 3 or 4 seconds to do the whole thing). I'm not sure exactly what to make of this. With the solve times and disk size so small, it's hard to judge if these results are a true representation. I'm considering loading up my big play cube to try a more robust test. What do you think? Note also that I haven't tested anything yet with compressed composites turned on.
Thx, Scott"

Update 16-Sep-05: I had a word with Paul Cannon, our resident OLAP/Express expert, who came back with the following observations:

"Mark, this depends a bit on what Scott means by an incremental load. Presumably he means loading data for one time period at a time. If this is the case, then having time inside the composite is forcing the creation of a new set of tuples for each load. For example, if the combination of customer, product and promotion results in, say, 1000 tuples, then you will have 1000 'sets' of these tuples for each time period, but of course these tuples won't exist until data for each time period has been loaded. Therefore, when loading a new set of data for a new time period, as well as loading the data, the AW has the internal overhead of creating the extra tuples. If time is outside the composite, then the same single set of tuples will get used over again for each time period. (This is a very simplistic view - the actual number of tuples will usually vary for each load, but we'll ignore that for this example!)

To put this another way, with time outside the composite, loading a new set of data will result in the variable growing by one dimension value. With time inside the composite, the variable is growing by 1000 dimension values. Whilst putting time inside the composite may result in gains in one area (aggregations etc.), you have to bear in mind the negative effect this extra internal maintenance is having. Hence Scott won't get quite the performance improvement he's expecting. Unless the time dimension is genuinely sparse (which seems unlikely, as data is being loaded on a per-time-period basis and presumably aggregated up a time hierarchy), perhaps Scott should try leaving both time and channel outside of the composite.

Paul."

Update 16-Sep-05: Scott has posted a question on this subject to the OTN Forum. You can check it out here.

Update 16-Sep-05: Scott subsequently commented on Paul's observations:

"I think Paul is referring to a typical OFA type setup, where almost every composite value (usually account & org) exists for just about every value of time (which is usually at the month level), making time a very dense dimension. However, your GSW sample data shows customer purchases at the day level spread out over 8 years. Unless the same customer purchases the same product through the same channel with the same promotion on just about every day over 8 years, time is going to be sparse.

If you look at the original .xls worksheet (copied here), the sparsity rating of 433 for TIME on the base-level data means that the same customer / product / channel / promo tuple only shows up once for every 433 time periods - meaning that if time is dense, the other 432 cells on a page would be empty. Note also that Paul is completely right that the data gets more dense after aggregation - in this case, the time sparsity rating goes from 433 down to 41, but that still means that, on average, the same customer / product / channel / promo tuple shows up only once every 41 time periods. This is why making time sparse instead of dense resulted in the variable sizes shrinking from 600+ Mb down to about 20 Mb, and reduced the solve times from 1.5+ hours to only 33 seconds.

One final thing to note is that I did re-split the data for the "initial" vs. "incremental" load to better highlight the incremental performance problem for Oracle.
I split your GS_SALES table down to have everything prior to 6/28/2005 in gs_sales_load_1 (11,411 records), and only 6/28/2005 - 6/29/2005 in gs_sales_load_2 (18 records). This means that the incremental load should only be creating 18 new tuples plus the associated parent tuples (it comes out to being exactly 1,466 total tuples). Given that the entire solved cube contains 435,274 total tuples, I would expect the time required to aggregate to be (1,446 / 435,274) * the 33 seconds the full aggregation takes - or about 0.11 seconds. Obviously there may be some overhead in setting up the incremental loads (perhaps that is what is distorting the numbers on this small sample?).

So, long and short - Paul's comments make sense in more of an "OFA" type setup, but not for a cube that focuses on customer purchases, etc. Also note that this is one of the big reasons I hate to see recommendations by Oracle that time always be made a dense dimension... because I have found, for MANY applications, that time is often one of the LEAST dense dimensions. Let me know what you and Paul think! I'll let you know what I hear from the Oracle side (the thread is on OTN also)."

There's obviously a lot more to dealing with sparsity than saying that "time is always the dense dimension", and Scott in particular has sent me some useful calculations and examples for working out sparsity both prior to, and after, a cube load. I'll post an article with my take on all this when I get back from Open World.
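In the meantime, here's a rough sketch of the sort of check involved, against the same GS_SALES table Scott used. This is my own approximation rather than Scott's actual worksheet, and it assumes the time key column on GS_SALES is called TIMEID; the idea is to compare the number of cells that would exist if TIME were dense (every time value for every customer / product / channel / promo tuple) with the number of rows actually present:

  -- Rough sparsity check for TIME (an approximation, not Scott's worksheet).
  -- Assumes the time key column on GS_SALES is called TIMEID.
  -- possible cells  = distinct time values x distinct <chan, promo, prod, cust> tuples
  -- populated cells = rows actually present in the source table
  select round(t.n_times * c.n_tuples / f.n_rows) as time_sparsity_rating
  from   (select count(distinct timeid) as n_times from gs_sales) t,
         (select count(*) as n_tuples
          from (select distinct channelid, promoid, prodid, custid
                from gs_sales)) c,
         (select count(*) as n_rows from gs_sales) f;

A rating well above 1 (like the 433 Scott quotes for the base-level data) suggests that, for this dataset at least, TIME belongs inside the composite rather than being treated as the dense dimension.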