Oracle Database 10g SQL Model Clause

Document Sample
Oracle Database 10g SQL Model Clause Powered By Docstoc
					40166



        Oracle Database 10g
         SQL Model Clause


          Andy Witkowski, Architect
          Thomas Kyte, VP
          Oracle Corporation
What’s now in SQL for Modeling
    Aggregation Enhancements
     –   Cube, Rollup, Grouping Sets
     –   New aggregates: Inverse Distribution,
         FIRST/LAST,etc
    Analytic Functions
     –   Window Functions: Rank, Moving, Cumulative
     –   Statistical Functions: Correlation, Linear
         Regression,etc
  Old tools still have more modeling power than
   SQL
 SQL Model enhances SQL with modeling power
     –   Spreadsheets, MOLAP engines
Case Study – Modeling with
Excel fits well at the personal scale
 Excel
    –   UI and Formatting
    –   Calculations (build-in functions, formulas)
    –   What-If analysis
   Excel fits poorly at corporate scale for
    modeling
    –  Cryptic row-column addressing
    – No metadata, No standards, No mathematical
       model
    – 100s of spreadsheets and consolidation by hand
    – Does not scale (1000’s formulas, TB of data)
        Excel Modeling with SQL Modeling
ReplacePerpetual data exchange: databases->Excel
    –
 Modeling with SQL Model
 Language: Spreadsheet-like calculations in SQL
    –   Inter-row calculation. Treats relations as an N-Dim array
    –   Symbolic references to cells and their ranges
    –   Multiple Formulas over N-Dim arrays
    –   Automatic Formula Ordering
    –   Recursive Model Solving
    –   Model is a relation & can be processed further in SQL
    –   Multiple arrays with different dimensionality in one query
 Performance
    –   Parallel Processing in partitioning & formulas
    –   Multiple-self joins with one data access structure
    –   Multiple UNIONs with one data access structure
 Why Better?
    –   Automatic Consolidation (models as views – combine using
        SQL)
    –   Self Adjusting (as database changes no need to re-define)
    –   One version of truth (calc directly over data base, no exchange)
SQL Model
Concepts
        prod time                 s
                                              Define Relation as Array
           vcr   2001         9
           dvd   2001         0


Relation                                        SELECT prod, time, s FROM sales

Array        1    2      3        4    1999
             5    6      7        8    2000
                                                DIMENSION BY (prod, time) MEASURES (s)
             9    0      1        2    2001
           vcr   dvd     tv       pc

                  prod
        prod time                s
                                             Define Business Rules
        vcr     2001         9
        dvd     2001         0


 Relation                                      SELECT prod, time, s FROM sales

Array       1    2      3        4    1999
            5    6      7        8    2000
                                               DIMENSION BY (prod, time) MEASURES (s)
            9    0      1        2    2001
        vcr     dvd     tv       pc

                 prod             RULES UPSERT
                                  (
 Sales in 2000 2x of previous year s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,
 Predict vcr sales in 2002          s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000],
 Predict dvd sales in 2002          s[dvd, 2002] =AVG(s) [CV(prod), time<2001]
                                  )
        prod time                s
                                             Evaluate Formulas – 1st
           vcr   2001        9
           dvd   2001        0


Relation                                       SELECT prod, time, s FROM sales

Array        1    2     3        4    1999
             2    4     6        8    2000
                                               DIMENSION BY (prod, time) MEASURES (s)
             9    0     1        2    2001
           vcr   dvd    tv       pc

                                   RULES UPSERT
                                   (
 Sales in 2000 2x of previous year s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,
 Predict vcr sales in 2002          s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000],
 Predict dvd sales in 2002          s[dvd, 2002] = AVG(s) [CV(prod), time<2001]
                                   )
       prod time                  s
                                              Evaluate Formulas – 2nd
           vcr    2001        9
           dvd    2001        0


Relation                                        SELECT prod, time, s FROM sales

             1     2     3        4    1999
             2     4     6        8    2000
                                                DIMENSION BY (prod, time) MEASURES (s)
             9     0     1        2    2001

             11                        2002

           vcr    dvd    tv      RULES UPSERT
                                  pc
                                 (
Sales in 2000 2x of previous year s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,
Predict vcr sales in 2002          s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000],
Predict dvd sales in 2002          s[dvd, 2002] = AVG(s) [CV(prod), time<2001]
                                 )
       prod time                 s
                                             Evaluate Formulas – 3rd
           vcr   2001        9
           dvd   2001        0


Relation                                       SELECT prod, time, s FROM sales

             1    2     3        4    1999
             2    4     6        8    2000
                                               DIMENSION BY (prod, time) MEASURES (s)
             9    0     1        2    2001

             11 3                     2002

           vcr   dvd    tv        RULES UPSERT
                                 pc
                                  (
Sales in 2000 2x of previous year s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,
Predict vcr sales in 2002          s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000],
Predict dvd sales in 2002          s[dvd, 2002] = AVG(s) [CV(prod), time<2001]
                                  )
        prod time                 s
                                              Return as Relation
         vcr      2001        9
         dvd      2001        0


 Relation                                        SELECT prod, time, s FROM sales

            1      2      3       4    1999
            2      4      6       8    2000
                                                 DIMENSION BY (prod, time) MEASURES (s)
            9      0      1       2    2001

            11 3                       2002

         vcr      dvd    tv       pc             RULES UPSERT
                                                 (
 Self-join.                                        s[ANY, 2000] = s[CV(prod), CV(time) - 1] * 2,
 join + UNION                                      s[vcr, 2002] = s[vcr, 2001] + s[vcr, 2000],
 join + UNION                                      s[dvd, 2002] = AVG(s) [CV(prod), time<2001]
                                                 )

Relation again
            vcr    2001           9
            dvd    2001           0
            vcr    2002       11               Rows updated & inserted by the Model clause
            dvd    2002        3
Model Clause – Components
 Model clause                                               Partitioning

SELECT region, prod, time, s                              Dims of array
FROM sales
GROUP BY region, prod, time
MODEL PARTITION BY (region) DIMENSION BY (prod, time)
        MEASURES (sum(sales) s, count(sales) c)
RULES ITERATE ( ) UNTIL ( )                              Model options
(
           s[ANY, 2000] = s[CV(prod), CV(time) - 1],
           s[dvd, 2003] = s[dvd, 2002] + s[dvd, 2001],       Formulas
   UPSERT s[vcr, 2003] = AVG(s) [vcr, time < 2001]
)
ORDER BY region, product, time, s;
                                                    Formula Options
Key Concepts (1)
 New SQL Model Clause:
    –   Data as N-dim arrays with DIMENSIONS & MEASURES
    –   Data can be PARTITION-ed - creates an array per partition
    –   Formulas defined over the arrays express a (business) model
 Formulas within a Model:
    –   Use symbolic addressing using familiar array notation
    –   Can be ordered automatically based on dependency between
        cells
    –   Can be recursive with a convergence condition – recursive
        models
    –   Can UPDATE or UPSERT cells
    –   Support most SQL functions including aggregates
Key Concepts (2)
 Result of a SQL Model is a relation
   –   Can participate further in processing via joins, etc.
   –   Can define views containing Model computations
 SQL Model is the last query clause
   –   Executed after joins, aggregation, window
       functions
   –   Before ORDER BY
 Main Model and Reference Models
   –   Can relate models of different dimensionality
  Formula Fundamentals (1)
 Formulas: SQL expressions over cells with aggs, functions,
  etc.
 Formula has a left and right side and represents assignment
    –   s[‘vcr’, 2002] = s[‘vcr’, 2001] + s[‘vcr’, 2000]   – single ref
    –   s[‘vcr’, 2002] = AVG(s)[‘vcr’, t<2002]               – multi ref on right


 Left side can qualify multiple cells
    –   s[p IN (‘vcr’,’dvd’), t<2002] = 1000       – multi ref on left
    –   s[ANY, t=2002] = 2 * s[CV(p), CV(t)-1]      – left-right
        correlation
    –   s[p IN (SELECT prod FROM prod_tb), 2000] = 1000


 Formula can operate in update or upsert mode
    –   update s[‘vcr’, 2002] = s[‘vcr’, 2001] + s[‘vcr’, 2000]
    –   upsert s[‘vcr’, 2002] = s[‘vcr’, 2001] + s[‘vcr’, 2000]
Formula Fundamentals (2)
 Function CV(dimension) propagates values from
  left to the right side. In example, products in 2002
  are sum of two previous years.
    s[ANY, 2002] = s[CV(p), CV(t)-1] +s[CV(p), CV(t) – 2]
Formula Fundamentals (2)
 Function CV(dimension) propagates values from
  left to the right side. In example, products in 2002
  are sum of two previous years.
    s[ANY, 2002] = s[CV(p), CV(t) -1] + s[CV(p), CV(t) – 2]


 Formula result can depend on processing order.
  Can specify order in each formula. E.g., shift by
  time:
    s[‘vcr’, ANY] ORDER BY t = s[‘vcr’, CV(t) - 1]
Formula Fundamentals (2)
 Function CV(dimension) propagates values from
  left to the right side. In example, products in 2002
  are sum of two previous years.
    s[ANY, 2002] = s[CV(p), CV(t) -1] + s[CV(p), CV(t) – 2]


 Formula result can depend on processing order.
  Can specify order in each formula. E.g., shift by
  time:
    s[‘vcr’, ANY] ORDER BY t = s[‘vcr’, CV(t) - 1]
            vcr   2001   300.00     0
            vcr   2002   350.00 300.00
            vcr   2003   400.00
            vcr   2004   450.00
            vcr   2005   500.00
                  ORDER BY t
Formula Fundamentals (2)
 Function CV(dimension) propagates values from
  left to the right side. E.g, products in 2002 are
  sum of two previous years
    s[ANY, 2002] = s[CV(p), CV(t) -1] + s[CV(p), CV(t) – 2]


 Formula result can depend on processing order.
  Can specify order in each formula. E.g., shift by
  time:
    s[‘vcr’, ANY] ORDER BY t = s[‘vcr’, CV(t) - 1]
              vcr    2001   300.00     0
              vcr    2002   350.00 300.00
              vcr    2003   400.00 350.00
              vcr    2004   450.00
              vcr    2005   500.00
                    ORDER BY t
Formula Fundamentals (2)
 Function CV(dimension) propagates values from
  left to the right side. E.g, products in 2002 are
  sum of two previous years
    s[ANY, 2002] = s[CV(p), CV(t) -1] + s[CV(p), CV(t) – 2]


 Formula result can depend on processing order.
  Can specify order in each formula. E.g., shift by
  time:
    s[‘vcr’, ANY] ORDER BY t = s[‘vcr’, CV(t) - 1]
             vcr   2001   300.00       0
             vcr   2002   350.00   300.00
             vcr   2003   400.00   350.00
             vcr   2004   450.00   400.00
             vcr   2005   500.00   450.00
                   ORDER BY t
   Model Options – Fundamentals
                                                            global options
MODEL
         [ UNIQUE DIMENSIONS | UNIQUE SINGLE REFERENCE ]
         [ IGNORE NAV | KEEP NAV ]
PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES [ UPDATE | UPSERT ]
        [ AUTOMATIC ORDER | SEQUENTIAL ORDER ]
                                                             rule   options
(
   s[ANY, 2002] = 1.2 * s[CV(product), 2002],
   s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001],
   s[‘vcr’, 2003] = AVG(s) [‘vcr’, t BETWEEN 1995 AND 2000],
   s[‘video’, 2003] = s[‘dvd’, 2003], s[‘vcr’, 2003]
)
NAV Options: Handling Sparse
Data     West dvd 2001 300.00
                     West     tv    2002   500.00
                     West     vcr   2001   200.00
                     West     vcr   2002   400.00


MODEL KEEP NAV PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES UPSERT
(
  s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001]
  s[‘tv’ ,2003] = sum(s) [‘tv’, t BETWEEN 2001 AND 2002]
)
         keep nav
                              2001 ?
 West   dvd   2001   300.00
 West   tv    2002   500.00
 West   dvd   2003   -
 West   tv    2003   500.00
 West   vcr   2001   200.00
 West   vcr   2002   400.00
NAV Options: Handling Sparse
Data     West dvd 2001 300.00
                   West   tv    2002   500.00
                   West   vcr   2001   200.00
                   West   vcr   2002   400.00


MODEL INGNORE NAV PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES UPSERT
(
  s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001]
  s[‘tv’ ,2003] = sum(s) [‘tv’, t BETWEEN 2001 AND 2002]
)
                                                   ignore nav
                                                                assume 0
                                       West   dvd   2001   300.00
                                       West    tv   2002   500.00
                                       West   dvd   2003   300.00
                                       West   tv    2003   500.00
                                       West   vcr   2001   200.00
                                       West   vcr   2002   400.00
NAV Options: Handling Sparse
Data     West dvd 2001 300.00
                     West     tv    2002   500.00
                     West     vcr   2001   200.00
                     West     vcr   2002   400.00


MODEL KEEP NAV PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES UPSERT
(
  s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001]
  s[‘tv’ ,2003] = sum(s) [‘tv’, t BETWEEN 2001 AND 2002]
)
         keep nav                                  ignore nav
                              2001 ?                                assume 0
 West   dvd   2001   300.00                West   dvd   2001   300.00
 West   tv    2002   500.00                West    tv   2002   500.00
 West   dvd   2003   -                     West   dvd   2003   300.00
 West   tv    2003   500.00                West   tv    2003   500.00
 West   vcr   2001   200.00                West   vcr   2001   200.00
 West   vcr   2002   400.00                West   vcr   2002   400.00
   Automatic Formula Ordering
MODEL PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES UPDATE AUTOMATIC ORDER
(
F1: s[‘vcr’, 2003] = AVG(s) [‘vcr’, t BETWEEN 1995 AND 2002],
F2: s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001],
F3: s[ANY, 2002] = s[CV(p), 2001] + s[CV(p), 1999]
)
   Automatic Formula Ordering
MODEL PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES UPDATE AUTOMATIC ORDER
(
F1: s[‘vcr’, 2003] = AVG(s) [‘vcr’, t BETWEEN 1995 AND 2002],
F2: s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001],
F3: s[ANY, 2002] = s[CV(p), 2001] + s[CV(p), 1999]
)


F1 depends on F3 and F2 depends on F3, thus F3 automatically first:
   Automatic Formula Ordering
MODEL PARTITION BY (r) DIMENSION BY (p, t) MEASURES (s)
RULES UPDATE AUTOMATIC ORDER
(
F1: s[‘vcr’, 2003] = AVG(s) [‘vcr’, t BETWEEN 1995 AND 2002],
F2: s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001],
F3: s[ANY, 2002] = s[CV(p), 2001] + s[CV(p), 1999]
)


F1 depends on F3 and F2 depends on F3, thus F3 automatically first:

RULES UPDATE AUTOMATIC ORDER
(
F3: s[ANY, 2002] = s[CV(p), 2001] + s[CV(p), 1999]
F2: s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001],
F1: s[‘vcr’, 2003] = AVG(s) [‘vcr’, t BETWEEN 1995 AND 2000]
)
UPDATE, UPSERT & Partitions
  Region         Product Time          s
  East           dvd      2001        100
  East           dvd      2002        150
  East           vcr     2002         100
  West           dvd     2001         200

MODEL PARTITION BY (r) DIMENSION BY (p,t) MEASURES (s)
RULES UPDATE IGNORE NAV
(
  UPDATE s[‘vcr’, 2002] = s[‘vcr’, 2002] * 1.2,
  UPSERT s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001]
)
UPDATE, UPSERT & Partitions
  Region         Product Time          s
  East           dvd      2001        100
  East           dvd      2002        150
  East           vcr     2002         100
  West           dvd     2001         200

MODEL PARTITION BY (r) DIMENSION BY (p,t) MEASURES (s)
RULES UPDATE IGNORE NAV
(
  UPDATE s[‘vcr’, 2002] = s[‘vcr’, 2002] * 1.2,
  UPSERT s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001]
)
  Region         Product Time         Old s    New s
  East           dvd      2001        100      100
  East           dvd      2002        150      100
  East           vcr     2002         100      120          updated

  West           dvd       2001       200      200
UPDATE, UPSERT & Partitions
  Region         Product Time          s
  East           dvd      2001        100
  East           dvd      2002        150
  East           vcr     2002         100
  West           dvd     2001         200

MODEL PARTITION BY (r) DIMENSION BY (p,t) MEASURES (s)
RULES UPDATE IGNORE NAV
(
  UPDATE s[‘vcr’, 2002] = s[‘vcr’, 2002] * 1.2,
  UPSERT s[‘dvd’, 2003] = s[‘dvd’, 2002] + s[‘dvd’, 2001]
)
  Region         Product Time         Old s    New s
  East           dvd      2001        100      100
  East           dvd      2002        150      100
  East           vcr     2002         100      120          updated
  East           dvd     2003           -      250
  West           dvd     2001          200     200          upserted
  West           dvd     2003           -      200
Different dimensions: Reference
 Relate Models with different dimensions. Represent each as n-
 dimensional array: one main, others as reference or lookup arrays.

    Sales Table
                                         Conv table converts currency to $
    c         p    t       s
    USA      dvd   2001   300.00 $                c        ratio
    USA      tv    2001   500.00 $                USA      1
    Poland   vcr   2001   200.00 zl               Poland   0.24
    France   vcr   2001   100.00 fr               France   0.12


SELECT c, p, t, s FROM sales
MODEL
  REFERENCE convert ON (SELECT c, ratio FROM conv) DBY (c) MEASURES(r)
MAIN DIMENSION BY (c,p,t) MEASURES (s)
RULES UPSERT
(
   s[ANY, ANY, ANY] = r[CV(c)] * s[CV(c), CV(p), CV(t)]
)
Different dimensions: Reference
    Sales Table
    c         p    t       s                   Conv table converts currency to $
    USA      dvd   2001   300.00 $                     c       ratio
    USA      tv    2001   500.00 $                     USA 1
    Poland vcr     2001   200.00 zl                    Poland 0.24
    France vcr     2001   100.00 fr                    France 0.12

SELECT c, p, t, s FROM sales
MODEL
  REFERENCE convert ON (SELECT c, ratio FROM conv) DBY (c) MEASURES(r)
MAIN DIMENSION BY (c,p,t) MEASURES (s)
RULES UPSERT
(
   s[ANY, ANY, ANY] = r[CV(c)] * s[CV(c), CV(p), CV(t)]
)

                     USA      dvd     2001   300.00 $
                     USA      tv      2001   500.00 $
                     Poland   vcr     2001    48.00 $
                     France   vcr     2001    12.00 $      Converted values
Recursive Model Solving
     Model can contain cyclic (recursive) formulas.
      - If cyclic formulas desired, use ITERATE option
      - If ITERATE not present, cyclic formulas automatically
           detected, and an error reported.
       Use ITERATE clause to specify # of iterations or
       Use UNTIL clause to specify convergence conditions
    SELECT x, s FROM dual
    MODEL DIMENSION BY (1 x) MEASURES (1024 s)
    RULES ITERATE (8)
    (
      s[1] = s[1] / 2
    )


    Iteration  1   2     3      4      5     6   7    8
    S value 1024 512   128     64     32    16   8    4
Recursive Model Solving
     Model can contain cyclic (recursive) formulas.
      - If cyclic formulas desired, use ITERATE option
      - If ITERATE not present, cyclic formulas automatically
           detected, and an error reported.
       Use ITERATE clause to specify # of iterations or
       Use UNTIL clause to specify convergence conditions
    SELECT x, s FROM dual
    MODEL DIMENSION BY (1 x) MEASURES (1024 s)
    RULES ITERATE (8)
    (
      s[1] = s[1] / 2
    )


    Iteration  1   2     3      4      5     6   7    8
    S value 1024 512   128     64     32    16   8    4
Recursive Model Solving
     Model can contain cyclic (recursive) formulas.
      - If cyclic formulas desired, use ITERATE option
      - If ITERATE not present, cyclic formulas automatically
           detected, and an error reported.
       Use ITERATE clause to specify # of iterations or
       Use UNTIL clause to specify convergence conditions
    SELECT x, s FROM dual
    MODEL DIMENSION BY (1 x) MEASURES (1024 s)
    RULES ITERATE (8)
    (
      s[1] = s[1] / 2
    )


    Iteration  1   2     3      4      5     6   7    8
    S value 1024 512   128     64     32    16   8    4
Recursive Model Solving
     Model can contain cyclic (recursive) formulas.
      - If cyclic formulas desired, use ITERATE option
      - If ITERATE not present, cyclic formulas automatically
           detected, and an error reported.
       Use ITERATE clause to specify # of iterations or
       Use UNTIL clause to specify convergence conditions
    SELECT x, s FROM dual
    MODEL DIMENSION BY (1 x) MEASURES (1024 s)
    RULES ITERATE (8)
    (
      s[1] = s[1] / 2
    )


    Iteration  1   2     3      4      5     6   7    8
    S value 1024 512   128     64     32    16   8    4
Recursive Model Solving with
Until can contain cyclic (recursive) formulas.
 Model
   - If cyclic formulas desired, use ITERATE option
   - If ITERATE not present, cyclic formulas automatically
        detected, and an error reported.
    Use ITERATE clause to specify # of iterations or
    Use UNTIL clause to specify convergence conditions
 SELECT x, s FROM dual
 MODEL DIMENSION BY (1 x) MEASURES (1024 s)
 RULES ITERATE (10000) UNTIL (PREVIOUS(s[1]) – s[1] <= 1)
 (
   s[1] = s[1] / 2
 )


  Iteration  1   2           3    4     5      6     7      8   9   10
  S value 1024 512         256   128   64     32    16      8   4    2


  previous(s[1]) - s[1] = 512
Recursive Model Solving with
Until can contain cyclic (recursive) formulas.
 Model
   - If cyclic formulas desired, use ITERATE option
   - If ITERATE not present, cyclic formulas automatically
        detected, and an error reported.
    Use ITERATE clause to specify # of iterations or
    Use UNTIL clause to specify convergence conditions
 SELECT x, s FROM dual
 MODEL DIMENSION BY (1 x) MEASURES (1024 s)
 RULES ITERATE (10000) UNTIL (PREVIOUS(s[1]) – s[1] <= 1)
 (
   s[1] = s[1] / 2
 )


  Iteration  1   2         3            4     5    6    7   8   9   10
  S value 1024 512       256           128   64   32   16   8   4    2


         previous(s[1]) - s[1] = 256
Recursive Model Solving with
Until can contain cyclic (recursive) formulas. They are
 Model
   automatically detected, and error is reported. Unless cycles
   are intentional which is indicated with ITERATE option
      Use ITERATE clause to specify # of iterations or
     Use UNTIL to specify convergence conditions. Stop if
      true.
 SELECT x, s FROM dual
 MODEL DIMENSION BY (1 x) MEASURES (1024 s)
 RULES ITERATE (10000) UNTIL (PREVIOUS(s[1]) – s[1] <= 1)
 (
   s[1] = s[1] / 2
 )


  Iteration  1   2      3       4       5      6     7       8        9     10
  S value 1024 512    256      128     64     32    16       8        4      2


                                                    previous(s[1]) - s[1] = 4
    SQL Model
Business Examples
Time Series Calculation (1)
Compute the ratio of current month sales of each product to
sales one year ago, one quarter ago and one month ago.

Assume: Sales cube with product sales per year, quarter, and month
& a time table mapping periods to prior year, quarter and month

     time table: maps t to y_ago, q_ago, m_ago   Sales cube: prod sales per y, q, m

 t             y_ago       q_ago       m_ago     t           product      sales
 1999-         1998-       1998-       1998-     1999-       vcr          100.00
 m01           m01         m10         m12       m01
 1999-         1998-       1998-       1999-     1999-       vcr          120.00
 m02           m02         m11         m01       m02
 …             …           …           …         …           …            …
 1999-         1998-       1998-       NULL      1999-       vcr          360.00
 q01           q01         q04                   q01
 …             …           …           …         …           …            …
Time Series Calculation (2)
• Reference model with Time table acts like look-up table
• CV carries values from the left side to the right side
• Without Model, you need 3 outer joins and a regular join

 SELECT product, sales, r_y_ago, r_q_ago, r_m_ago
 FROM sales_cube
 MODEL REFERENCE r ON (SELECT * from time)
         DIMENSION BY (t) MEASURES (y_ago, q_ago, m_ago)
 MAIN PARTITION BY (product) DIMENSION BY (t)
       MEASURES (sales, 0 r_y_ago, 0 r_q_ago, 0 r_m_ago)
 RULES
 (
    r_y_ago[ANY] = s[CV(t)] / s[ y_ago[CV(t)] ],  -- year ago
    r_q_ago[ANY] = s[CV(t)] / s[ q_ago[CV(t)] ], -- quarter ago
    r_m_ago[ANY] = s[CV(t)] / s[ m_ago[CV(t)] ] -- month ago
 );
Time Series Calculation (3)
Compute the ratio of current period sales of each product to sales a
year ago, quarter ago and a month ago.
For each row, we use the reference Model to find 3 other rows.

Sales cube: prod sales per y, q, m
t           product      sales       r_y_ago   r_q_ago   a_m_ago
1999-       vcr          100.00      0.050     0.280     0.830
m01
1999-       vcr          120.00
m02
…           …            …
1999-       vcr          360.00
q01
1998-       vcr          370.00
q04
…           …            …
Time Series Calculation (3)
Compute the ratio of current period sales of each product to sales a
year ago, quarter ago and a month ago.
For each row, we use the reference Model to find 3 other rows.

Sales cube: prod sales per y, q, m
t           product      sales       r_y_ago   r_q_ago   a_m_ago
1999-       vcr          100.00      0.050     0.280     0.830
m01
1999-       vcr          120.00      0.055     0.330     …
m02
…           …            …
1999-       vcr          360.00      0.160     0.970     null
q01
1998-       vcr          370.00
q04
…           …            …
Time Series Calculation (3)
Compute the ratio of current period sales of each product to sales a
year ago, quarter ago and a month ago.
For each row, we use the reference Model to find 3 other rows.

Sales cube: prod sales per y, q, m
t           product      sales       r_y_ago   r_q_ago   a_m_ago
1999-       vcr          100.00      0.050     0.280     0.830
m01
1999-       vcr          120.00      0.055     0.330     ...
m02
…           …            …
1999-       vcr          360.00      0.160     0.970     null
q01
1998-       vcr          370.00      …         …         null
q04
…           …            …
Recursive Model Solving: Ledger
(1) ledger, I have accounts: Net income, Interest, Taxes, etc.
In my

•I want to have 30 % of my Net income as Interest (F1)
•My Net income is Salary minus Interest, minus Tax (F2)
•Taxes are 38% of Gross (salary–interest) and 28% of Capital_gain (F3)

 SELECT account, b FROM ledger
 MODEL IGNORE NAV DIMENSION (account) MEASURES (balance b)
 RULES ITERATE (..) UNTIL ..
 (
   b[‘interest’] = b[‘net’] * 0.30, --F1
   b[‘net’] = b[‘salary’] – b[‘interest’] – b[‘tax’], --F2
   b[‘tax’] = (b[‘salary’] – b[‘interest’]) * 0.38 + b[‘capital_gain’] *0.28 --F3
 )
Recursive Model Solving: Ledger
(1) ledger, I have accounts: Net income, Interest, Taxes, etc.
In my

•I want to have 30 % of my Net income as Interest (F1)
•My Net income is Salary minus Interest, minus Tax (F2)
•Taxes are 38% of Gross (salary–interest) and 28% of Capital_gain (F3)

 SELECT account, b FROM ledger
 MODEL IGNORE NAV DIMENSION (account) MEASURES (balance b)
 RULES ITERATE (..) UNTIL ..
 (
   b[‘interest’] = b[‘net’] * 0.30, --F1
   b[‘net’] = b[‘salary’] – b[‘interest’] – b[‘tax’],    --F2
   b[‘tax’] = (b[‘salary’] – b[‘interest’]) * 0.38 + b[‘capital_gain’] *0.28 --F3
 )

             net
     F1               F2               F3       two cycles in the formulas
           interest           tax
           interest
Recursive Model Solving: Ledger
(2) ledger, I know Salary & Capital_gains. What are my
 In my
 Net income, Interest expense & Taxes?
 SELECT account, b FROM ledger
 MODEL IGNORE NAV DIMENSION BY (account) MEASURES (balance b)
 RULES ITERATE (10000) UNTIL (ABS(b[‘net’] - PREVIOUS(b[‘net’])) < 0.01)
 (
      b[‘interest’] = b[‘net’] * 0.30,
      b[‘net’] = b[‘salary’] – b[‘interest’] – b[‘tax’],
      b[‘tax’] = (b[‘salary’] – b[‘interest’]) * 0.38 + b[‘capital_gain’] *0.28
 )


    Input Ledger                                                      Output
  Account          Balance                                  Account         Balance
  salary          100,000    Iterate till accuracy of .01   salary          100,000
  capital_gains    15,000                                   capital_gains    15,000
  net                   0                                   net             100,000
  tax                   0                                   tax              42,220
  interest              0        after 1st iteration
                                                            interest         30,000
 Recursive Model Solving: Ledger
 (2)
In my ledger, I know Salary & Capital_gains. What is my Net & Taxes?

 SELECT account, b FROM ledger
 MODEL IGNORE NAV DIMENSION BY (account) MEASURES (balance b)
 RULES ITERATE (10000) UNTIL (ABS(b[‘net’] - PREVIOUS(b[‘net’])) < 0.01)
 (
      b[‘interest’] = b[‘net’] * 0.30,
      b[‘net’] = b[‘salary’] – b[‘interest’] – b[‘tax’],
      b[‘tax’] = (b[‘salary’] – b[‘interest’]) * 0.38 + b[‘capital_gain’] *0.28
 )


    Input Ledger
                                                                      Output
  Account          Balance                                  Account         Balance
  salary          100,000    Iterate till accuracy of .01   salary          100,000
  capital_gains    15,000                                   capital_gains    15,000
  net                   0                                   net              27,800
  tax                   0                                   tax              30,800
  interest              0      after 2nd iteration
                                                            interest          8,340
 Recursive Model Solving: Ledger
 (2)
In my ledger, I know Salary & Capital_gains. What is my Net & Taxes?

 SELECT account, b FROM ledger
 MODEL IGNORE NAV DIMENSION BY (account) MEASURES (balance b)
 RULES ITERATE (10000) UNTIL (ABS(b[‘net’] - PREVIOUS(b[‘net’])) < 0.01)
 (
      b[‘interest’] = b[‘net’] * 0.30,
      b[‘net’] = b[‘salary’] – b[‘interest’] – b[‘tax’],
      b[‘tax’] = (b[‘salary’] – b[‘interest’]) * 0.38 + b[‘capital_gain’] *0.28
 )


    Input Ledger                                                      Output
  Account          Balance                                  Account         Balance
  salary          100,000    Iterate till accuracy of .01   salary          100,000
  capital_gains    15,000                                   capital_gains    15,000
  net                   0                                   net              48,735
  tax                   0                                   tax              36,644
  interest              0     after reaching accuracy
                                                            interest         14,620
                                   (26 iterations)
 Financial Functions: NPV
                                                                  amount
                                                                            i
NPV – net present value of a series of periodic cash flows. 
                                                    values        (1  rate) i
                                                             i
                                               
                                                   (1  rate) i

Cash_Flow table

 year     i       prod   amoun    npv
                         t
 1999     0       vcr         -
                         100.00
 2000     1       vcr     12.00
 2001     2       vcr     10.00
 2002     3       vcr     20.00
 1999     0       dvd         -
                         200.00
 2000     1       dvd     22.00

 2001     2       dvd     12.00
 Financial Functions: NPV
                                                                   amount
                                                                             i
NPV – net present value of a series of periodic cash flows. 
                                                     values        (1  rate) i
                                                              i
                                                
                                                    (1  rate) i

Cash_Flow table

 year      i      prod   amoun    npv
                         t
 1999      0      vcr         -
                                        amount[1]/power(1+rate,1) + npv[1-1]
                         100.00
 2000      1      vcr     12.00
 2001      2      vcr     10.00
 2002      3      vcr     20.00
 1999      0      dvd         -
                         200.00
 2000      1      dvd     22.00

 2001      2      dvd     12.00
 Financial Functions: NPV
                                                                    amount
                                                                              i
NPV – net present value of a series of periodic cash flows. 
                                                      values        (1  rate) i
                                                               i
                                                 
                                                     (1  rate) i

Cash_Flow table

 year      i      prod   amoun    npv
                         t
 1999      0      vcr         -
                         100.00
 2000      1      vcr     12.00         amount[2]/power(1+rate,2) + npv[2-1]
 2001      2      vcr     10.00
 2002      3      vcr     20.00
 1999      0      dvd         -
                         200.00
 2000      1      dvd     22.00

 2001      2      dvd     12.00

 2002      3      dvd     14.00
 Financial Functions: NPV
                                                                   amount
                                                                             i
NPV – net present value of a series of periodic cash flows. 
                                                     values        (1  rate) i
                                                              i
                                                
                                                    (1  rate) i

Cash_Flow table

 year      i      prod   amoun    npv
                         t
 1999      0      vcr         -
                         100.00
 2000      1      vcr     12.00
 2001      2      vcr     10.00         amount[3]/power(1+rate,3) + npv[3-1]

 2002      3      vcr     20.00
 1999      0      dvd         -
                         200.00
 2000      1      dvd     22.00

 2001      2      dvd     12.00
 Financial Functions: NPV
                                                                       amount
                                                                                 i
NPV – net present value of a series of periodic cash flows. 
                                                         values        (1  rate) i
                                                                  i
                                                    
                                                        (1  rate) i

Cash_Flow table

  year     i      prod   amoun    npv
                         t
  1999     0      vcr         -
                         100.00
  2000     1      vcr     12.00
  2001     2      vcr     10.00
  2002     3      vcr     20.00         amount[i]/power(1+rate, i) + npv[i-1]
  1999     0      dvd         -
                         200.00         npv[ANY] ORDER BY i
  2000     1      dvd     22.00         = amount[ CV(i) ] / power(1+rate, CV(i))
                                         + npv[CV(i) – 1]
  2001     2      dvd     12.00

  2002     3      dvd     14.00
Financial Functions: NPV (2)
                                                                        amount
                                                                                  i
NPV – Net present value of a series of                              
      periodic cash flows.                                              (1  rate) i


Cash_Flow table and npv for rate = 0.14
                                                SELECT year, i, prod, amount, npv
year      i    prod      amoun            npv   FROM cash_flow
                         t                      MODEL PARTITION BY (prod)
1999      0    vcr            -         -        DIMENSION BY (i)
                         100.00    100.00        MEASURES
2000      1    vcr         12.00    -89.47          (amount, 0 npv, year)
                                                 RULES
2001      2    vcr         10.00    -81.78       (
2002      3    vcr         20.00    -68.28         npv[ 0] = amount[0],
                                                   npv[i !=0] ORDER BY i
1999      0    dvd            -         -           = amount[ CV() ] /
                         200.00    200.00             POWER(1.14, CV() ) +
2000      1    dvd         22.00        -             npv[CV() - 1]
                                   180.70        )
2001      2    dvd         12.00        -
                                   171.47
 SQL Model
Performance
SQL Model – Time Series
 Earlier example: ratio of sales to year, quarter and month ago
 SELECT product, sales, r_y_ago, r_q_ago, r_m_ago
 FROM sales_cube
 MODEL REFERENCE r ON (SELECT * from time)
         DIMENSION BY (t) MEASURES (y_ago, q_ago, m_ago)
 MAIN PARTITION BY (product) DIMENSION BY (t)
       MEASURES (sales, 0 r_y_ago, 0 r_q_ago, 0 r_m_ago)
 RULES
 (
    r_y_ago[ANY] = s[CV(t)] / s[ y_ago[CV(t)] ], -- year ago
    r_q_ago[ANY] = s[CV(t)] / s[ q_ago[CV(t)] ], -- quarter ago
    r_m_ago[ANY] = s[CV(t)] / s[ m_ago[CV(t)] ] -- month ago
 );

 • ANSI SQL version needs outer join for each formula plus a join for
      reference model.
 • N formulas, M reference models  N+M joins  4 joins in this example:
      sales_cube  time  sales_cube  sales_cube  sales_cube
SQL Model vs. ANSI Joins

  400
                                                 ANSI joins
  350

  300

  250

  200

  150                                               SQL Model

  100

   50
                                             Number of rules or joins

        1   2   3   4   5   6   7   8   9   10    11   12     13   14
Summary
 New facility for spreadsheet-like computations in SQL
 High Performance
    –   Replaces multiple joins, unions
    –   Scalable in size and parallel processing
    –   Powerful optimizations
 Collaborative analysis
 Move external processing such as spreadsheets into
  RDBMs for manageability and consolidation
Next Steps….

 Demonstration at Oracle DEMOgrounds
    –   Exhibit hall, Booth 1326, Database Area
    –   Monday: 5:00 PM - 8:00, Tuesday: 10:30 - 1:00, 3:00 -
        6:00,
        Wednesday: 11:00 - 4:30, Thursday: 10:30 - 2:00
 Hands-on Lab
    –   Marriott Hotel - Golden Gate B1
    –   Lab Section: Use Information from your Data Warehouse
        Lesson 1: Using the SQL Model clause
    –   Monday: 10:30 - 5:00, Tuesday: 8:30 - 12:30, 3:00 -
        5:00,
        Wednesday: 8:30 - 4:30, Thursday: 8:30 - 2:30
Reminder –
please complete the
OracleWorld online session
survey

Thank you.
Q U E S T I O N S
 A N S W E R S