Advanced Teradata Concepts



                         Last Updated : 21st Dec 2004



  Center of Excellence
Data Warehousing Group
Topics
 Primary and Secondary Indexes

 Join Processing

 Join Indexes

 Hash Indexes

 Partitioned Primary Indexes

 Collect Statistics

 Priority Scheduler

 Teradata Dual Active Solution
Primary and Secondary Indexes
Indexes

 Teradata provides numerous indexing options that
  can improve query performance for different types
  of queries and workloads. The following kinds of
  indexes are available:

   Primary Index
   Secondary Indexes
   Join Indexes
   Hash Indexes
   Partitioned Primary Indexes.
Primary Indexes
 In Teradata, the Primary Index is the mechanism used to
  assign a data row to an AMP and store it there.

 Since the primary index determines where data rows are
  stored, retrieving data by primary index value is very
  efficient.

 Primary Index can be Unique or Non-Unique.

 Choosing the Primary Index is critical, as it determines
  the data distribution across the processing units
  (AMPs) and hence affects performance.
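
A minimal DDL sketch (hypothetical tables and columns, shown only to illustrate the syntax)
of a unique and a non-unique primary index declaration:

CREATE TABLE Customer_Acct
 (AcctNo   INTEGER NOT NULL,
  AcctName VARCHAR(30))
UNIQUE PRIMARY INDEX (AcctNo);   -- UPI: unique values, one-AMP, one-row access

CREATE TABLE Acct_Txn
 (TxnId  INTEGER NOT NULL,
  AcctNo INTEGER NOT NULL,
  Amount DECIMAL(12,2))
PRIMARY INDEX (AcctNo);          -- NUPI: rows with the same AcctNo hash to the same AMP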
Primary Index Choice Criteria
 Access Demographics – Choose the column most
  frequently used for access to maximize the
  number of one AMP operations.

 Distribution Demographics – Better distribution
  optimizes parallel processing.

 Volatility – Changing PI may cause the row itself to
  be moved to another AMP. Stable PI reduces data
  movement overhead.
UPI and NUPI
 UPI
     Best distribution due to unique value.
     One AMP operation and uses only one I/O.
     Best performance.


 NUPI
     Good distribution for 'near unique' values.
     Duplicate PI rows go to same block. No extra I/O if all
      duplicate rows fit in single block.
     Duplicate row check required if there is no USI
      defined.
     Multiple I/Os required if rows do not fit in a single
      data block.
UPI and NUPI (cont.)
 Highly non-unique values cause skewed distribution.

 Highly non-unique values cause extra overhead in
  duplicate row check.
 A Multi-Column PI gives better distribution.

 But as the number of columns increases, the index becomes
  less usable.

 Partial values cannot be used for PI access.

 Do not include a column for index selection that does not
  improve the selectivity of the index.
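
A sketch of a multi-column primary index (hypothetical table, for illustration only); all
PI columns must be supplied for one-AMP PI access:

CREATE TABLE Sales_Detail
 (StoreNo  INTEGER NOT NULL,
  ItemNo   INTEGER NOT NULL,
  SaleDate DATE,
  Amount   DECIMAL(12,2))
PRIMARY INDEX (StoreNo, ItemNo);   -- supplying only StoreNo cannot use the PI access path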
Secondary Indexes

 Secondary Index values are stored in sub tables.
 May be unique or non unique.
 Teradata implements USI and NUSI differently.

  [Diagram: the secondary index value is passed through the hash algorithm to
   locate the index subtable; each subtable row pairs the SI value with a base
   table Row ID, which is then used to access the base table row.]
Secondary Indexes
 USIs are hash distributed across all AMPs.
   Subtable rows may reside on an AMP other than the base
    table row.
   USI access involves a two-AMP operation.



 NUSIs are implemented on an AMP-local basis.
   Subtable rows are located on the same AMP as the base
    table rows.
   NUSI access involves an all-AMP operation.
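
A minimal sketch of how the two kinds of secondary indexes are created, assuming an
Employee table with a unique SocSecNo column and a non-unique DeptNo column
(hypothetical columns, for illustration only):

CREATE UNIQUE INDEX (SocSecNo) ON Employee;   -- USI: hash-distributed subtable, two-AMP access
CREATE INDEX (DeptNo) ON Employee;            -- NUSI: AMP-local subtables, all-AMP access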
Secondary Index Considerations
 Need additional storage to hold sub-table.

 Need additional I/O.

 Choose as NUSI candidates only those columns that are
  frequently used for access.

 If collected statistics are not available, Teradata
  may not choose the NUSI as the access path.

 Use the EXPLAIN facility to see the plan chosen by the
  optimizer.
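
For example, any request can be prefixed with EXPLAIN to return the optimizer's plan
without executing the query (hypothetical predicate):

EXPLAIN
SELECT * FROM Employee WHERE DeptNo = 500;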
NUSI Bit Mapping
 Used when multiple NUSIs are combined with an AND
  condition.

 Identifies the common Row IDs satisfied by the query
  before retrieving the base table rows.

  [Diagram: two NUSI subtables (Indx1 and Indx2), each mapping an SI value to a
   list of Row IDs; the Row ID lists are intersected by bit mapping.]

  Note: Multiple-column secondary indexes are less usable. Define multiple
  single-column secondary indexes to allow bit mapping.
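
A sketch of the kind of request that can use NUSI bit mapping, assuming separate
single-column NUSIs have been defined on JobTitle and DeptNo of the Employee table
(hypothetical indexes):

CREATE INDEX (JobTitle) ON Employee;
CREATE INDEX (DeptNo)   ON Employee;

SELECT EmpNo, Name
FROM   Employee
WHERE  JobTitle = 'Analyst'    -- NUSI 1
AND    DeptNo   = 500;         -- NUSI 2: common Row IDs are bit mapped before base rows are read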
Row Access Methods

   [Diagram: comparison of the row access paths for PI/NUPI, USI, NUSI and full
    table scan (FTS), showing for each method whether the value is passed through
    the hashing algorithm, whether an index subtable is read, and how the base
    table rows are finally accessed.]
Value Ordered NUSI

 NUSI sub-tables are local to the AMP holding the
  corresponding base table rows and, by default, are
  sorted by the row hash of the secondary index
  column.

 Value-Ordered NUSI sub-tables are sorted by the
  secondary index column value rather than its row
  hash.

 Value-Ordered NUSIs are efficient for processing
  queries with range conditions and inequality
  conditions on the secondary index column.
Value Ordered NUSI (Cont.)

CREATE INDEX Indx_Shipdate(L_Shipdate) on LineItem;

SELECT * FROM LineItem WHERE L_Shipdate < DATE '1992-02-28';

      …
      …
      3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an
         all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE <
         DATE '1992-02-28'") into Spool 1 (group_amps), which is built
         locally on the AMPs. The input table will not be cached in memory,
         but it is eligible for synchronized scanning. The size of Spool 1
         is estimated with high confidence to be 1,764 rows. The estimated
         time for this step is 2.58 seconds.
      …
      …


     Optimizer did not choose the secondary index. Why? By default the NUSI
     subtable is ordered by row hash, so it cannot satisfy the range condition
     efficiently; the value-ordered NUSI on the next page can.
Value Ordered NUSI (Cont.)

CREATE INDEX Indx_Shipdate(L_Shipdate) ORDER BY VALUES ON
     Lineitem;

SELECT * FROM Lineitem WHERE L_Shipdate < DATE '1992-02-28';
      …
      …
      3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of a
         traversal of index # 4 with a range constraint of (
         "TPCH.lineitem.Field_1035 < DATE '1992-02-28'") extracting row ids
         only into Spool 2 (all_amps), which is built locally on the AMPs.
         Then we do a SORT to order Spool 2 by row id eliminating duplicate
         rows. The size of Spool 2 is estimated to be 1,764 rows. The
         estimated time for this step is 0.04 seconds.
      4) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of row
         ids from Spool 2 (Last Use) with no residual conditions into Spool
         1 (group_amps), which is built locally on the AMPs. The input
         table will not be cached in memory, but it is eligible for
         synchronized scanning. The size of Spool 1 is estimated with high
         confidence to be 1,764 rows. The estimated time for this step is
         2.58 seconds.
      …
      …
Join Processing
Join Processing

 Each AMP performs join processing in parallel.

 Optimizer chooses best join strategy based on
  Available indexes, and
  Data Demographics (Collect Statistics/Dynamic
    Sampling)

 Rows must be on the same AMP for matching.

 Teradata temporarily moves rows to the same AMP if they
  are not already on the same AMP for the join. This is
  called row redistribution.
Join Processing

 General Join Scenarios:

   Join column is the PI of both the tables.

   Join column is PI of one of the tables.

   Join column is not the PI of either of the tables.
Case 1- PI of both the tables

 Rows taking part in the join are already in the
  same AMP.

 No data movement is necessary.

 This is the best-case scenario.
Case 1 - Example
CREATE SET TABLE EMPLOYEE
  (
   EmpNo SMALLINT,
   Name VARCHAR(12),
   DeptNo SMALLINT,
   JobTitle VARCHAR(12),
   Salary DECIMAL(8,2),
   DOB DATE
  )
UNIQUE PRIMARY INDEX ( EmpNo );

CREATE SET TABLE LOCATION
  (
   EmpNo INTEGER,
   Loc VARCHAR(25)
  )
PRIMARY INDEX ( EmpNo );




                      SELECT E.EmpNo, E.Name, L.Loc
                      FROM Employee E, Location L
                      WHERE E.EmpNo = L.EmpNo;
Case 1 - Explain Output

 1) First, we lock a distinct PERSONNEL."pseudo table" for read on a
    RowHash to prevent global deadlock for PERSONNEL.LOCATION.
  2) Next, we lock a distinct PERSONNEL."pseudo table" for read on a
    RowHash to prevent global deadlock for PERSONNEL.EMPLOYEE.
  3) We lock PERSONNEL.LOCATION for read, and we lock
    PERSONNEL.EMPLOYEE for read.
  4) We do an all-AMPs JOIN step from PERSONNEL.EMPLOYEE by way of a
    RowHash match scan with no residual conditions, which is joined to
    PERSONNEL.LOCATION. PERSONNEL.EMPLOYEE and PERSONNEL.LOCATION are
    joined using a merge join, with a join condition of (
    "PERSONNEL.EMPLOYEE.EmpNo = PERSONNEL.LOCATION.EmpNo"). The
    result goes into Spool 1 (group_amps), which is built locally on
    the AMPs. The size of Spool 1 is estimated with low confidence to
    be 24 rows. The estimated time for this step is 0.04 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
    in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
    statement 1. The total estimated time is 0.04 seconds.
Case 2 - PI of one of the tables

 One table has its rows on the target AMP.

 Rows of the other table need to be redistributed to
  their target AMPs by the hash code of the join
  column value.

 If one of the tables is small, the optimizer may choose
  to duplicate that table on all AMPs instead.
Case 2 – Example
CREATE SET TABLE EMPLOYEE
  (
   EmpNo SMALLINT,
   Name VARCHAR(12),
   DeptNo SMALLINT,
   JobTitle VARCHAR(12),
   Salary DECIMAL(8,2),
   DOB DATE
  )
UNIQUE PRIMARY INDEX ( EmpNo );

CREATE SET TABLE Department
  (
   DeptNo SMALLINT,
   DeptName VARCHAR(14),
   Loc CHAR(3),
   MgrNo SMALLINT
  )
UNIQUE PRIMARY INDEX ( DeptNo );




                    SELECT E.EmpNo, E.Name, D.DeptName
                    FROM Employee E, Department D
                    WHERE E.Deptno = D.DeptNo;
Case 2 – Explain Output
   …
   …
   4) We do an all-AMPs RETRIEVE step from PERSONNEL.EMPLOYEE by way of
     an all-rows scan with a condition of ("(PERSONNEL.EMPLOYEE.DeptNo
     >= 100) AND ((PERSONNEL.EMPLOYEE.DeptNo <= 900) AND (NOT
     (PERSONNEL.EMPLOYEE.DeptNo IS NULL )))") into Spool 2 (all_amps),
     which is redistributed by hash code to all AMPs. Then we do a
     SORT to order Spool 2 by row hash. The size of Spool 2 is
     estimated with no confidence to be 5 rows. The estimated time for
     this step is 0.03 seconds.
   5) We do an all-AMPs JOIN step from PERSONNEL.Department by way of a
     RowHash match scan with no residual conditions, which is joined to
     Spool 2 (Last Use). PERSONNEL.Department and Spool 2 are joined
     using a merge join, with a join condition of ("DeptNo =
     PERSONNEL.Department.DeptNo"). The result goes into Spool 1
     (group_amps), which is built locally on the AMPs. The size of
     Spool 1 is estimated with no confidence to be 5 rows. The
     estimated time for this step is 0.04 seconds.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
       statement 1. The total estimated time is 0.08 seconds.
Case 3 - Not a PI of either of the tables

 Rows of both the tables need to be redistributed to
  their target AMPs by the hash code of the join
  column value.

 Optimizer might choose to duplicate the smaller
  table on all AMPs.

 This join scenario involves the largest amount of
  data movement.
Case 3 - Example
CREATE SET TABLE Lineitem               CREATE SET TABLE Partsupp
  (                                        (
   L_ORDERKEY INTEGER NOT NULL,             PS_PARTKEY INTEGER NOT NULL,
   L_PARTKEY INTEGER NOT NULL,              PS_SUPPKEY INTEGER NOT NULL,
   L_SUPPKEY INTEGER NOT NULL,              PS_AVAILQTY INTEGER NOT NULL,
   L_LINENUMBER INTEGER NOT NULL,           PS_SUPPLYCOST DECIMAL(15,2) NOT NULL,
   L_QUANTITY DECIMAL(15,2) NOT NULL        PS_COMMENT VARCHAR(199)
  )                                       )
PRIMARY INDEX ( L_ORDERKEY );           PRIMARY INDEX ( PS_PARTKEY );




                  SELECT L_Suppkey, L_Quantity
                  FROM Lineitem, Partsupp
                  WHERE L_Suppkey = Ps_Suppkey;
Case 3 – Explain Output

   …
   …
   4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from TPCH.partsupp by way of
         an all-rows scan with no residual conditions into Spool 2
         (all_amps), which is redistributed by hash code to all AMPs.
         Then we do a SORT to order Spool 2 by row hash. The size of
         Spool 2 is estimated with low confidence to be 31,938 rows.
         The estimated time for this step is 0.79 seconds.
       2) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of
         an all-rows scan with no residual conditions into Spool 3
         (all_amps), which is redistributed by hash code to all AMPs.
         Then we do a SORT to order Spool 3 by row hash. The result
         spool file will not be cached in memory. The size of Spool 3
         is estimated with low confidence to be 240,480 rows. The
         estimated time for this step is 6.27 seconds.
Case 3 – Explain Output Cont…

   5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
       RowHash match scan, which is joined to Spool 3 (Last Use). Spool
       2 and Spool 3 are joined using a merge join, with a join condition
       of ("L_SUPPKEY = PS_SUPPKEY"). The result goes into Spool 1
       (group_amps), which is built locally on the AMPs. The result
       spool file will not be cached in memory. The size of Spool 1 is
       estimated with no confidence to be 15,661,999 rows. The estimated
       time for this step is 1 minute and 1 second.
   6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
   -> The contents of Spool 1 are sent back to the user as the result of
      statement 1. The total estimated time is 1 minute and 7 seconds.
Join Strategies

 Nested Join

 Merge Join

 Product Join
Nested Join

 The optimizer chooses this join strategy when the query
  has the following form:

       SELECT ...
       FROM Table_1, Table_2
       WHERE Table_1.Col1 = Table_2.<Any Index>
       AND Table_1.<Unique Index> = <value>;

       Example:

       SELECT E.Name, D.DeptName
       FROM Employee E, Department D
       WHERE E.DeptNo = D.DeptNo
       AND E.Name = 'Sandy M';
Nested Join Explain Output


  1) First, we do a two-AMP JOIN step from PERSONNEL.E by way of unique
     index # 4 "PERSONNEL.E.Name = 'Sandy M'" with a residual condition
     of ("(PERSONNEL.E.DeptNo >= 100) AND ((PERSONNEL.E.DeptNo <= 900)
     AND (NOT (PERSONNEL.E.DeptNo IS NULL )))"), which is joined to
     PERSONNEL.D by way of the unique primary index "PERSONNEL.D.DeptNo
     = PERSONNEL.E.DeptNo". PERSONNEL.E and PERSONNEL.D are joined
     using a nested join, with a join condition of ("(1=1)"). The
     result goes into Spool 1 (one-amp), which is built locally on the
     AMPs. The size of Spool 1 is estimated with high confidence to be
     1 row. The estimated time for this step is 0.04 seconds.
   -> The contents of Spool 1 are sent back to the user as the result of
     statement 1. The total estimated time is 0.04 seconds.
Merge Join
 Commonly done when the join conditions are based on
  equality.

 Steps
   Identify the smaller table.
   Put the qualifying rows from one or both tables into spool.
   Move the spool rows to the AMPs based on join column hash (if
    required).
   Sort the spool rows by join column hash value (if necessary).
   Compare those rows with matching join column hash values.



    Example : Case 1, Case 2 and Case 3 as described.
Product Join
 Most general form of join: A × B.

 The optimizer usually chooses a product join under the
  following conditions:
   The WHERE clause is missing.
   The join condition is not based on equality.


 Steps:
   Identify the smaller table
   Duplicate it in spool on all AMPs.
   Join each spool row of the smaller table to every row of
    the larger table.
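
A minimal sketch of a request that typically results in a product join, because the join
condition is not based on equality (tables as defined in the earlier examples):

SELECT E.Name, D.DeptName
FROM   Employee E, Department D
WHERE  E.DeptNo <> D.DeptNo;   -- inequality condition: the smaller table is duplicated on all AMPs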
Exclusion Merge Join
CREATE SET TABLE EMPLOYEE
  (
   EmpNo SMALLINT,
   Name VARCHAR(12),
   DeptNo SMALLINT NOT NULL,
   JobTitle VARCHAR(12),
   Salary DECIMAL(8,2),
   DOB DATE
  )
UNIQUE PRIMARY INDEX ( EmpNo );

CREATE SET TABLE Department
  (
   DeptNo SMALLINT NOT NULL,
   DeptName VARCHAR(14),
   Loc CHAR(3),
   MgrNo SMALLINT
  )
UNIQUE PRIMARY INDEX ( DeptNo );




          SELECT EmpNo, Name, Salary
          FROM Employee
          WHERE DeptNo NOT IN ( SELECT DeptNo FROM Department);
Exclusion Merge Join Explain Output
   4) We do an all-AMPs RETRIEVE step from PERSONNEL.employee by way of
      an all-rows scan with no residual conditions into Spool 2
      (all_amps), which is redistributed by hash code to all AMPs. Then
      we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
      estimated with high confidence to be 21 rows. The estimated time
      for this step is 0.03 seconds.
    5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of an
      all-rows scan, which is joined to PERSONNEL.department. Spool 2
      and PERSONNEL.department are joined using an exclusion merge join,
      with a join condition of ("DeptNo = PERSONNEL.department.DeptNo").
      The result goes into Spool 1 (group_amps), which is built locally
      on the AMPs. The size of Spool 1 is estimated with index join
      confidence to be 21 rows. The estimated time for this step is
      0.03 seconds.
    6) Finally, we send out an END TRANSACTION step to all AMPs involved
      in processing the request.
    -> The contents of Spool 1 are sent back to the user as the result of
      statement 1. The total estimated time is 0.06 seconds.
Exclusion Merge Join Example
  [Diagram: NOT IN processing for the query above. The Employee rows (EmpNo, DeptNo)
   and the Department DeptNo values are shown hash-distributed across AMPs 1-4; the
   Employee rows are redistributed and sorted by DeptNo hash so that each AMP can
   compare them against the Department DeptNo values and return the employees whose
   DeptNo does not appear in Department.]
Join Indexes
Join Indexes

 A Join Index is an index structure that stores and
  maintains the result of joining two or more tables.

 The optimizer can resolve a query using the join index,
  rather than performing the joins every time the query is
  executed.

 Teradata supports a variety of Join Indexes such as:
   Multi-table Join Indexes
   Single-table join Indexes
   Aggregate Join Indexes
Join Index Example
CREATE JOIN INDEX EmpDept AS              SELECT e.Name, d.DeptName, e.Salary
SELECT (e.DeptNo, d.DeptName) ,           FROM Employee e INNER JOIN
       (E.Name, E.Salary)                      Department d
FROM Employee e INNER JOIN                ON e.DeptNo = d.DeptNo
     Department d                         ORDER BY d.DeptName;
ON   e.DeptNo = d.DeptNo;



Does the index cover the query ?
       1) First, we lock a distinct PERSONNEL."pseudo table" for read on a
          RowHash to prevent global deadlock for PERSONNEL.EmpDept.
        2) Next, we lock PERSONNEL.EmpDept for read.
        3) We do an all-AMPs RETRIEVE step from PERSONNEL.EmpDept by way of
          an all-rows scan with no residual conditions into Spool 1
          (group_amps), which is built locally on the AMPs. The size of
          Spool 1 is estimated with low confidence to be 6 rows. The
          estimated time for this step is 0.03 seconds.
       ……
       ……
Join Index Example
CREATE JOIN INDEX EmpDept AS      SELECT e.Name, d.DeptName, e.Salary, e.YrsExp
SELECT (e.DeptNo, d.DeptName) ,   FROM Employee e INNER JOIN
       (E.Name, E.Salary)               Department d
FROM Employee e INNER JOIN        ON e.DeptNo = d.DeptNo
     Department d                 ORDER BY d.DeptName;
ON   e.DeptNo = d.DeptNo;




Does the index cover the query ?
Join Index Example
                                CREATE JOIN INDEX EmpDept AS
                                SELECT (e.DeptNo, d.DeptName) ,
                                       (E.Name, E.Salary)
                                FROM Employee e LEFT JOIN
                                    Department d
                                ON   e.DeptNo = d.DeptNo;


 SELECT e.Name, d.DeptName, e.Salary                  SELECT e.Name, d.DeptName, e.Salary
 FROM Employee e INNER JOIN                           FROM Employee e LEFT JOIN
      Department d                                         Department d
 ON e.DeptNo = d.DeptNo                               ON e.DeptNo = d.DeptNo
 ORDER BY d.DeptName;                                 ORDER BY d.DeptName;


Does the index cover the query ?                      Does the index cover the query ?

Note: A join index with outer join covers both inner join query as well as outer join query.
Join Index Example
CREATE JOIN INDEX EmpDept AS               SELECT e.Name, d.DeptName, e.Salary , c.Hours
SELECT (e.EmpNo, e.DeptNo ,d.DeptName ),   FROM Employee e,
      (e.Name ,e.Salary )                      Department d,
FROM Employee e JOIN                           Charges c
    Department d                           WHERE e.DeptNo = d.DeptNo
ON e.DeptNo = d.DeptNo                     AND    e.EmpNo = c.EmpNo;

                   Does the index cover the query ?
         4) We do an all-AMPs RETRIEVE step from PERSONNEL.c by way of an
           all-rows scan with no residual conditions into Spool 2 (all_amps),
           which is redistributed by hash code to all AMPs. Then we do a
           SORT to order Spool 2 by row hash. The size of Spool 2 is
           estimated with low confidence to be 16 rows. The estimated time
           for this step is 0.03 seconds.
         5) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
           RowHash match scan, which is joined to PERSONNEL.EMPDEPT. Spool 2
           and PERSONNEL.EMPDEPT are joined using a merge join, with a join
           condition of ("PERSONNEL.EMPDEPT.EmpNo = EmpNo"). The result goes
           into Spool 1 (group_amps), which is built locally on the AMPs.
           The size of Spool 1 is estimated with index join confidence to be
           32 rows. The estimated time for this step is 0.04 seconds.
Secondary Index On Top Of Join Index
 Further performance improvement can be achieved by defining a
 Secondary Index on top of join index to avoid full table scan of the
 join index table.

 SELECT C_Name, C_Address, O_Orderdate, O_TotalPrice
 FROM Customer JOIN
      Ordertbl
 ON C_Custkey = O_Custkey
 WHERE O_Orderdate BETWEEN 950101 AND 970101;


 CREATE JOIN INDEX OrderByCust AS
 SELECT (C_Name ,C_Address ),
        (O_Orderdate ,O_TotalPrice)
 FROM Customer INNER JOIN Ordertbl
 ON C_CustKey = O_Custkey;


 CREATE INDEX(O_Orderdate) ORDER BY VALUES ON OrderByCust;

 COLLECT STATISTICS ON OrderByCust INDEX(O_Orderdate);
Sparse Indexes
Sparse Index can be used to index a portion of a table.

 CREATE JOIN INDEX OrderByCust AS
 SELECT (C_Custkey, C_Name ,C_Address, O_Orderdate),
        (O_TotalPrice)
 FROM Customer INNER JOIN Ordertbl
 ON C_CustKey = O_Custkey
 WHERE O_Orderdate > DATE '2004-01-01'
 PRIMARY INDEX(C_Custkey);

SELECT C_Name ,C_Address,
       O_Orderdate ,O_TotalPrice
FROM Customer INNER JOIN Ordertbl
ON C_CustKey = O_Custkey
WHERE O_Orderdate BETWEEN DATE '2004-06-01' AND DATE '2004-12-31';



A shorter index table enables a faster full table scan.
Join Index - Compressed Form
Uncompressed Form
 CREATE JOIN INDEX OrderByCust AS
 SELECT C_Custkey, C_Name ,C_Address, O_Orderdate, O_TotalPrice
 FROM Customer INNER JOIN Ordertbl
 ON C_CustKey = O_Custkey
 PRIMARY INDEX(C_Custkey);


          SELECT SUM(CurrentPerm) FROM DBC.TableSize
          WHERE DataBaseName = 'tpch' AND
          TableName = 'OrderByCust';


                 Sum(CurrentPerm)
                -------------------------
                            4,791,296
Join Index - Compressed Form
Compressed Form

 CREATE JOIN INDEX OrderByCust AS
 SELECT (C_Custkey, C_Name ,C_Address), (O_Orderdate, O_TotalPrice)
 FROM Customer INNER JOIN Ordertbl
 ON C_CustKey = O_Custkey
 PRIMARY INDEX(C_Custkey);


           SELECT SUM(CurrentPerm) FROM DBC.TableSize
           WHERE DataBaseName = 'tpch' AND
           TableName = 'OrderByCust';


                   Sum(CurrentPerm)
                  -------------------------
                              1,209,856
Single-Table Join Indexes
Single-table join indexes can improve performance in certain
kinds of joins by partially covering the query.

                 SELECT d.DeptName, e.Name
                 FROM Employee e,Department d
                 WHERE e.DeptNo = d.DeptNo;

      4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an
        all-rows scan with no residual conditions into Spool 2 (all_amps),
        which is redistributed by hash code to all AMPs. Then we do a
        SORT to order Spool 2 by row hash. The size of Spool 2 is
        estimated with high confidence to be 21 rows. The estimated time
        for this step is 0.03 seconds.
      5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash
        match scan with no residual conditions, which is joined to Spool 2
        (Last Use). PERSONNEL.d and Spool 2 are joined using a merge join,
        with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The
        result goes into Spool 1 (group_amps), which is built locally on
        the AMPs. The size of Spool 1 is estimated with low confidence to
        be 18 rows. The estimated time for this step is 0.04 seconds.
                                                                             Cont..
Single-Table Join Indexes
                CREATE JOIN INDEX EmpDept AS
                SELECT Empno, Deptno, Name
                FROM Employee
                PRIMARY INDEX(DeptNo);

As the join index covers the Employee part of the query, the optimizer joins the
Department table with the join index instead of the Employee table itself.


   4) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash
      match scan with no residual conditions, which is joined to
      PERSONNEL.EmpDept. PERSONNEL.d and PERSONNEL.EmpDept are joined
      using a merge join, with a join condition of (
      "PERSONNEL.EmpDept.DeptNo = PERSONNEL.d.DeptNo"). The result goes
      into Spool 1 (group_amps), which is built locally on the AMPs.
      The size of Spool 1 is estimated with low confidence to be 18 rows.
      The estimated time for this step is 0.04 seconds.
Single-Table Join Indexes

                        SELECT d.DeptName, e.Name, e.Salary
                        FROM Employee e,Department d
                        WHERE e.DeptNo = d.DeptNo;
          4) We do an all-AMPs RETRIEVE step from PERSONNEL.e by way of an
            all-rows scan with no residual conditions into Spool 2 (all_amps),
            which is redistributed by hash code to all AMPs. Then we do a
            SORT to order Spool 2 by row hash. The size of Spool 2 is
            estimated with high confidence to be 21 rows. The estimated time
            for this step is 0.03 seconds.
          5) We do an all-AMPs JOIN step from PERSONNEL.d by way of a RowHash
            match scan with no residual conditions, which is joined to Spool 2
            (Last Use). PERSONNEL.d and Spool 2 are joined using a merge join,
            with a join condition of ("DeptNo = PERSONNEL.d.DeptNo"). The
            result goes into Spool 1 (group_amps), which is built locally on
            the AMPs. The size of Spool 1 is estimated with low confidence to
            be 18 rows. The estimated time for this step is 0.04 seconds.

 Note: The optimizer went for a full table scan of the Employee table instead of using the join
 index, because the existing join index EmpDept does not fully cover the Employee part of the query.
Single-Table Join Indexes
ROWID can be included in the join index definition to enable rowid
join for partially covered queries.

 CREATE JOIN INDEX JIorders AS
 SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE, ROWID)
 FROM Ordertbl
 PRIMARY INDEX (O_CUSTKEY);


    SELECT C_Name, C_Address, O_Orderdate,
           O_TotalPrice, O_Orderstatus
    FROM Customer, Ordertbl
    WHERE C_Custkey = O_Custkey
    AND C_Nationkey = 10;



                                                                Cont…
Single-Table Join Indexes

    5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
       match scan with a condition of ("TPCH.Customer.C_NATIONKEY = 10"),
       which is joined to TPCH.JIorders. TPCH.Customer and TPCH.JIorders
       are joined using a merge join, with a join condition of (
       "TPCH.Customer.C_CUSTKEY = TPCH.JIorders.O_CUSTKEY"). The input
       table TPCH.JIorders will not be cached in memory. The result goes
       into Spool 2 (all_amps), which is redistributed by hash code to
       all AMPs. Then we do a SORT to order Spool 2 by row hash. The
       size of Spool 2 is estimated with no confidence to be 9,000 rows.
       The estimated time for this step is 0.44 seconds.
     6) We do an all-AMPs JOIN step from TPCH.Ordertbl by way of a RowHash
       match scan with no residual conditions, which is joined to Spool 2
       (Last Use). TPCH.Ordertbl and Spool 2 are joined using a merge
       join, with a join condition of ("Field_2 = TPCH.Ordertbl.RowID").
       The input table TPCH.Ordertbl will not be cached in memory, but it
       is eligible for synchronized scanning. The result goes into Spool
       1 (group_amps), which is built locally on the AMPs. The size of
       Spool 1 is estimated with no confidence to be 9,000 rows. The
       estimated time for this step is 0.56 seconds.
Single-Table Join Indexes
Unique Primary Index column can also be used in place of ROWID
as shown in the example below.

  CREATE JOIN INDEX JIorders AS
  SELECT (O_CUSTKEY ), (O_ORDERDATE,O_TOTALPRICE,
                        O_ORDERKEY)
  FROM Ordertbl
  UNIQUE PRIMARY INDEX (O_CUSTKEY);

  SELECT C_Name, C_Address, O_Orderdate,
         O_TotalPrice, O_Orderstatus
  FROM Customer, Ordertbl
  WHERE C_Custkey = O_Custkey
  AND C_Nationkey = 10;
Aggregate Join Index

Aggregate Join Indexes are used to store pre-calculated summary data.

SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;

       3) We do an all-AMPs SUM step to aggregate from TPCH.lineitem by way
         of an all-rows scan with no residual conditions, and the grouping
         identifier in field 1. Aggregate Intermediate Results are
         computed globally, then placed in Spool 3. The input table will
         not be cached in memory, but it is eligible for synchronized
         scanning. The aggregate spool file will not be cached in memory.
         The size of Spool 3 is estimated with low confidence to be 238,809
         rows. The estimated time for this step is 10.04 seconds.
       4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
         an all-rows scan into Spool 1 (group_amps), which is built locally
         on the AMPs. The result spool file will not be cached in memory.
         The size of Spool 1 is estimated with low confidence to be 238,809
         rows. The estimated time for this step is 1.56 seconds.
Aggregate Join Index
CREATE JOIN INDEX JIAggLineItem AS
SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;



SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;



   3) We do an all-AMPs RETRIEVE step from TPCH.JIAggLineItem by way of
     an all-rows scan with no residual conditions into Spool 1
     (group_amps), which is built locally on the AMPs. The input table
     will not be cached in memory, but it is eligible for synchronized
     scanning. The result spool file will not be cached in memory.
     The size of Spool 1 is estimated with high confidence to be
     238,809 rows. The estimated time for this step is 0.19 seconds.
Aggregate Join Index
CREATE JOIN INDEX JIAggLineItem AS
SELECT L_PartKey, L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1,2;

SELECT L_ShipDate, SUM(L_Quantity) AS SumQty
FROM Lineitem GROUP BY 1;

Does the index cover the query?
     3) We do an all-AMPs SUM step to aggregate from TPCH.JIAggLineItem by
       way of an all-rows scan with no residual conditions, and the
       grouping identifier in field 1. Aggregate Intermediate Results
       are computed globally, then placed in Spool 3. The input table
       will not be cached in memory, but it is eligible for synchronized
       scanning. The size of Spool 3 is estimated with no confidence to
       be 491 rows. The estimated time for this step is 0.84 seconds.
     4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
       an all-rows scan into Spool 1 (group_amps), which is built locally
       on the AMPs. The size of Spool 1 is estimated with no confidence
       to be 491 rows. The estimated time for this step is 0.04 seconds.
Hash Indexes
Hash Indexes
Index file structures that share properties with single table join indexes and
secondary indexes.

Hash indexes are like single-table join indexes, but they automatically carry the base
table primary index value.
CREATE HASH INDEX HIOrder
(O_CustKey,
 O_OrderDate,
 O_TotalPrice)
ON OrderTbl
BY (O_CustKey)
ORDER BY (O_CustKey);

SELECT O_CustKey,
       O_Orderdate,
       O_Totalprice
FROM OrderTbl
WHERE O_CustKey > 12;

SELECT O_CustKey,
       O_Orderdate,
       O_Totalprice,
       O_Orderstatus
FROM OrderTbl
WHERE O_CustKey > 12;
Hash Indexes
       CREATE HASH INDEX HIOrder
       (O_CustKey ,
        O_OrderDate,
        O_TotalPrice)
       ON OrderTbl
       BY (O_CustKey)
       ORDER BY (O_CustKey)


      SELECT C_Name, C_Address,
           O_Orderdate, O_TotalPrice, O_Orderstatus
      FROM Customer, Ordertbl
      WHERE C_Custkey = O_Custkey
      AND O_Custkey < 10;



                                        Explain
Hash Indexes
   5) We do an all-AMPs JOIN step from TPCH.Customer by way of an
     all-rows scan with a condition of ("TPCH.Customer.C_CUSTKEY < 10"),
     which is joined to TPCH.HIOrder with a range constraint of (
     "TPCH.HIOrder.O_CUSTKEY <= 9") with an additional condition of (
     "TPCH.HIOrder.O_CUSTKEY <= 9"). TPCH.Customer and TPCH.HIOrder
     are joined using a product join, with a join condition of (
     "TPCH.Customer.C_CUSTKEY = TPCH.HIOrder.O_CUSTKEY"). The input
     table TPCH.HIOrder will not be cached in memory, but it is
     eligible for synchronized scanning. The result goes into Spool 2
     (all_amps), which is redistributed by hash code to all AMPs. Then
     we do a SORT to order Spool 2 by row hash. The size of Spool 2 is
     estimated with no confidence to be 3 rows. The estimated time for
     this step is 0.10 seconds.
   6) We do an all-AMPs JOIN step from Spool 2 (Last Use) by way of a
     RowHash match scan, which is joined to TPCH.Ordertbl. Spool 2 and
     TPCH.Ordertbl are joined using a merge join, with a join condition
     of ("(Field_3 = (SUBSTRING((TPCH.Ordertbl.RowID) FROM 7 FOR 4 )))
     AND (Field_2 =)"). The input table TPCH.Ordertbl will not be
     cached in memory. The result goes into Spool 1 (group_amps),
     which is built locally on the AMPs. The size of Spool 1 is
     estimated with no confidence to be 3 rows. The estimated time for
     this step is 0.05 seconds.
Hash Indexes
A Hash Index (and likewise a Join Index) can also be used to avoid row
redistribution during join preparation.




             SELECT C_Name, C_Address,
                  O_Orderdate, O_TotalPrice
             FROM Customer, Ordertbl
             WHERE C_Custkey = O_Custkey;
Hash Indexes
Without Hash Index defined:

       4) We do an all-AMPs RETRIEVE step from TPCH.Ordertbl by way of an
          all-rows scan with no residual conditions into Spool 2 (all_amps),
          which is redistributed by hash code to all AMPs. Then we do a
          SORT to order Spool 2 by row hash. The result spool file will not
          be cached in memory. The size of Spool 2 is estimated with high
          confidence to be 60,000 rows. The estimated time for this step is
          1.20 seconds.
        5) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
          match scan with no residual conditions, which is joined to Spool 2
          (Last Use). TPCH.Customer and Spool 2 are joined using a merge
          join, with a join condition of ("TPCH.Customer.C_CUSTKEY =
          O_CUSTKEY"). The result goes into Spool 1 (group_amps), which is
          built locally on the AMPs. The size of Spool 1 is estimated with
          low confidence to be 60,000 rows. The estimated time for this
          step is 0.48 seconds.
       …
       …
          The total estimated time is 1.68 seconds.
Hash Indexes
With Hash Index defined:

CREATE HASH INDEX HIOrder(O_Custkey,
                       O_TotalPrice,
                       O_OrderDate)
ON OrderTbl BY (O_CustKey) ORDER BY HASH (O_CustKey);


4) We do an all-AMPs JOIN step from TPCH.Customer by way of a RowHash
   match scan with no residual conditions, which is joined to
   TPCH.HIOrder. TPCH.Customer and TPCH.HIOrder are joined using a
   merge join, with a join condition of ("TPCH.Customer.C_CUSTKEY =
   TPCH.HIOrder.O_CUSTKEY"). The input table TPCH.HIOrder will not
   be cached in memory. The result goes into Spool 1 (group_amps),
   which is built locally on the AMPs. The size of Spool 1 is
   estimated with low confidence to be 60,000 rows. The estimated
   time for this step is 0.52 seconds.
……

  The total estimated time is 0.52 seconds.

No redistribution, no sorting. Total join time significantly reduced.
Partitioned Primary Index
Partitioned Primary Indexes

 Partitioned Primary Index (PPI) allows a class of
  queries to access a portion of a large table instead
  of the whole table.

 PPI table rows are assigned to user-defined
  partitions on each AMP, enabling enhanced
  performance for range queries that are predicated
  on the partitioning column.

 PPIs increase query efficiency by avoiding full
  table scan without the overhead and maintenance
  cost of secondary indexes.
NON-PPI Table
Records are sorted in row hash (not shown) sequence within the AMP.

   [Diagram: Employee rows (EmpId, a second column, and a date) hash-distributed
    across four AMPs; within each AMP, rows for the different dates (01/02, 01/10,
    01/12, 01/18, 01/20) are intermixed, so a date-range query must examine every
    row on every AMP.]

     SELECT * FROM Employee WHERE EmpId = 114;

     SELECT * FROM Employee WHERE EmpId BETWEEN DATE '2004-01-12' AND
     DATE '2004-01-18';
PPI Table
Records are sorted in row hash (not shown) sequence in each partition within the AMP.

  [Diagram: the same Employee rows on the same four AMPs, but now grouped into date
   partitions (01/02, 01/10, 01/12, 01/18, 01/20) within each AMP, so a date-range
   query needs to read only the qualifying partitions.]

   SELECT * FROM Employee WHERE EmpId = 114;

   SELECT * FROM Employee WHERE EmpId BETWEEN DATE '2004-01-12' AND DATE
   '2004-01-18';
PPI Example
CREATE TABLE Lineitem (             CREATE TABLE LineitemPPI (
   L_ORDERKEY INTEGER,                 L_ORDERKEY INTEGER,
   L_PARTKEY INTEGER,                  L_PARTKEY INTEGER,
   L_SUPPKEY INTEGER,                  L_SUPPKEY INTEGER,
   L_LINENUMBER INTEGER ,              L_LINENUMBER INTEGER ,
   L_QUANTITY DECIMAL(15,2),           L_QUANTITY DECIMAL(15,2),
   L_EXTENDEDPRICE DECIMAL(15,2),      L_EXTENDEDPRICE DECIMAL(15,2),
   L_DISCOUNT DECIMAL(15,2),           L_DISCOUNT DECIMAL(15,2),
   L_TAX DECIMAL(15,2),                L_TAX DECIMAL(15,2),
   L_RETURNFLAG CHAR(1),               L_RETURNFLAG CHAR(1),
   L_LINESTATUS CHAR(1),               L_LINESTATUS CHAR(1),
   L_SHIPDATE DATE,                    L_SHIPDATE DATE,
   L_COMMITDATE DATE,                  L_COMMITDATE DATE,
   L_RECEIPTDATE DATE,                 L_RECEIPTDATE DATE,
   L_SHIPINSTRUCT CHAR(25),            L_SHIPINSTRUCT CHAR(25),
   L_SHIPMODE CHAR(10),                L_SHIPMODE CHAR(10),
   L_COMMENT VARCHAR(44)               L_COMMENT VARCHAR(44)
)                                   )
PRIMARY INDEX (L_ORDERKEY);         PRIMARY INDEX (L_ORDERKEY)
                                    PARTITION BY RANGE_N(L_ShipDate BETWEEN
                                    DATE '1992-01-03' AND DATE '1998-11-30'
                                    EACH INTERVAL '1' MONTH );
PPI Example
       SELECT MIN(L_Shipdate), MAX(L_Shipdate) FROM Lineitem;

       Minimum(L_SHIPDATE)               Maximum(L_SHIPDATE)
       -------------------------------   --------------------------------
       1992-01-03                        1998-11-30


NON-PPI Table:

EXPLAIN SELECT * FROM Lineitem WHERE l_Shipdate > DATE '1997-12-31';


3) We do an all-AMPs RETRIEVE step from TPCH.lineitem by way of an
   all-rows scan with a condition of ("TPCH.lineitem.L_SHIPDATE >
   DATE '1997-12-30'") into Spool 1 (group_amps), which is built
   locally on the AMPs. The input table will not be cached in memory,
   but it is eligible for synchronized scanning. The result spool
   file will not be cached in memory. The size of Spool 1 is
   estimated with high confidence to be 27,783 rows. The estimated
   time for this step is 3.00 seconds.
PPI Example
PPI Table:

EXPLAIN SELECT * FROM LineitemPPI WHERE l_Shipdate > DATE '1997-12-31';




   3) We do an all-AMPs RETRIEVE step from 12 partitions of
      TPCH.lineitemppi with a condition of (
      "TPCH.lineitemppi.L_SHIPDATE > DATE '1997-12-30'") into Spool 1
      (group_amps), which is built locally on the AMPs. The input table
      will not be cached in memory, but it is eligible for synchronized
      scanning. The result spool file will not be cached in memory.
      The size of Spool 1 is estimated with high confidence to be 27,735
      rows. The estimated time for this step is 0.88 seconds.


 Only 12 partitions are retrieved instead of a full table scan
PPI Example
NON-PPI Table:

EXPLAIN SELECT * FROM Lineitem WHERE L_Orderkey = 240000;


1) First, we do a single-AMP RETRIEVE step from TPCH.lineitem by
   way of the primary index "TPCH.lineitem.L_ORDERKEY = 240000"
   with no residual conditions into Spool 1 (one-amp), which is built
   locally on that AMP. The input table will not be cached in memory,
   but it is eligible for synchronized scanning. The size of Spool 1 is
   estimated with high confidence to be 5 rows. The estimated time
   for this step is 0.03 seconds.



 Rows are stored in row hash order within an AMP, so the search is very efficient.
 Only one data block is read.
PPI Example
PPI Table:

EXPLAIN SELECT * FROM LineitemPPI WHERE L_Orderkey = 240000;


 1) First, we do a single-AMP RETRIEVE step from all partitions of
    TPCH.lineitemppi by way of the primary index
    "TPCH.lineitemppi.L_ORDERKEY = 240000" with a residual
    condition of ("TPCH.lineitemppi.L_ORDERKEY = 240000")
    into Spool 1 (one-amp), which is built locally on that AMP.
    The input table will not be cached in memory, but it is eligible
    for synchronized scanning. The size of Spool 1 is estimated with
    high confidence to be 5 rows. The estimated time for this step is
    0.67 seconds.

 The primary index value must be probed in every partition, so the retrieve is slower
 than on the non-PPI table.
PPI – Delete Performance
CREATE TABLE Lineitem (             CREATE TABLE LineitemPPI (
   L_ORDERKEY INTEGER,                 L_ORDERKEY INTEGER,
   L_PARTKEY INTEGER,                  L_PARTKEY INTEGER,
   L_SUPPKEY INTEGER,                  L_SUPPKEY INTEGER,
   L_LINENUMBER INTEGER ,              L_LINENUMBER INTEGER ,
   L_QUANTITY DECIMAL(15,2),           L_QUANTITY DECIMAL(15,2),
   L_EXTENDEDPRICE DECIMAL(15,2),      L_EXTENDEDPRICE DECIMAL(15,2),
   L_DISCOUNT DECIMAL(15,2),           L_DISCOUNT DECIMAL(15,2),
   L_TAX DECIMAL(15,2),                L_TAX DECIMAL(15,2),
   L_RETURNFLAG CHAR(1),               L_RETURNFLAG CHAR(1),
   L_LINESTATUS CHAR(1),               L_LINESTATUS CHAR(1),
   L_SHIPDATE DATE,                    L_SHIPDATE DATE,
   L_COMMITDATE DATE,                  L_COMMITDATE DATE,
   L_RECEIPTDATE DATE,                 L_RECEIPTDATE DATE,
   L_SHIPINSTRUCT CHAR(25),            L_SHIPINSTRUCT CHAR(25),
   L_SHIPMODE CHAR(10),                L_SHIPMODE CHAR(10),
   L_COMMENT VARCHAR(44)               L_COMMENT VARCHAR(44)
)                                   )
PRIMARY INDEX (L_ORDERKEY);         PRIMARY INDEX (L_ORDERKEY)
                                    PARTITION BY RANGE_N(L_ShipDate BETWEEN
                                    DATE '1992-01-03' AND DATE '1998-11-30'
                                    EACH INTERVAL '1' MONTH );
PPI – Delete Performance
DELETE FROM Lineitem
WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';

  …
  3) We do an all-AMPs DELETE from TPCH.Lineitem by way of an all-rows
     scan with a condition of ("(TPCH.Lineitem.L_SHIPDATE >= DATE
     '1996-12-31') AND (TPCH.Lineitem.L_SHIPDATE <= DATE '1997-12-31')").
  …

DELETE FROM LineitemPPI
WHERE l_Shipdate BETWEEN DATE '1996-12-31' AND DATE '1997-12-31';


     …
     3) We do an all-AMPs DELETE from 2 partitions of TPCH.LineitemPPI
        with a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE
        '1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE
        '1997-12-31')").
      4) We do an all-AMPs DELETE of 11 partitions of TPCH.LineitemPPI with
        a condition of ("(TPCH.LineitemPPI.L_SHIPDATE >= DATE
        '1996-12-31') AND (TPCH.LineitemPPI.L_SHIPDATE <= DATE
        '1997-12-31')").
PPI Joins
CREATE TABLE LineitemPPI (
   L_ORDERKEY INTEGER,
   L_PARTKEY INTEGER,
   L_SUPPKEY INTEGER,
   L_LINENUMBER INTEGER,
   L_QUANTITY DECIMAL(15,2),
   L_EXTENDEDPRICE DECIMAL(15,2),
   L_DISCOUNT DECIMAL(15,2),
   L_TAX DECIMAL(15,2),
   L_RETURNFLAG CHAR(1),
   L_LINESTATUS CHAR(1),
   L_SHIPDATE DATE,
   L_COMMENT VARCHAR(44)
)
PRIMARY INDEX (L_ORDERKEY)
PARTITION BY RANGE_N(L_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
EACH INTERVAL '1' MONTH );

CREATE TABLE Shipping (
   S_ORDERKEY INTEGER,
   S_SHIPDATE DATE,
   S_RECEIPTDATE DATE,
   S_SHIPINSTRUCT CHAR(25),
   S_SHIPMODE CHAR(10)
)
PRIMARY INDEX (S_ORDERKEY)
PARTITION BY RANGE_N(S_ShipDate BETWEEN
DATE '1992-01-03' AND DATE '1998-11-30'
EACH INTERVAL '1' MONTH );

SELECT L_Orderkey, L_Shipdate, S_Shipmode
FROM LineitemPPI
INNER JOIN Shipping
ON L_Orderkey = S_Orderkey;
PPI Joins


    4) We do an all-AMPs JOIN step from all partitions of TPCH.shipping
       by way of a RowHash match scan with a condition of ("NOT
       (TPCH.shipping.S_SHIPDATE IS NULL)"), which is joined to
       TPCH.lineitemppi with a condition of ("NOT
       (TPCH.lineitemppi.L_SHIPDATE IS NULL)"). TPCH.shipping and
       TPCH.lineitemppi are joined using a rowkey-based merge join, with
       a join condition of ("(TPCH.lineitemppi.L_SHIPDATE =
       TPCH.shipping.S_SHIPDATE) AND (TPCH.lineitemppi.L_ORDERKEY =
       TPCH.shipping.S_ORDERKEY)"). The input tables TPCH.shipping and
       TPCH.lineitemppi will not be cached in memory, but TPCH.shipping
       is eligible for synchronized scanning. The result goes into Spool
       1 (group_amps), which is built locally on the AMPs. The result
       spool file will not be cached in memory. The size of Spool 1 is
       estimated with low confidence to be 401,785 rows. The estimated
       time for this step is 4.85 seconds.
Collect Statistics
Collect Statistics

 The optimizer must be provided with correct
  demographic information about your data to choose
  an optimal plan for executing your query.

 Statistics tell the optimizer
   how many rows there are per value, and
   how many distinct values there are in the column.


 If collected statistics are not available, the optimizer
  does random AMP sampling to derive the
  demographics.
Collect Statistics

 Collected statistics are not automatically updated
  by Teradata DBS.

 Users must refresh statistics when 5% to 10% of the
  table rows have changed.

 Collect Statistics on

   All non-unique Indexes of a table or a join index.
   Any column used in a WHERE clause for set selection or
    as a join constraint.
Collect Statistics
COLLECT STATISTICS ON Lineitem COLUMN L_Orderkey;

COLLECT STATISTICS ON Lineitem COLUMN L_Shipdate;

COLLECT STATISTICS ON Lineitem COLUMN (L_Orderkey, L_Shipdate);


  HELP STATISTICS Lineitem;


  Date       Time       Unique Values  Column Names
  ---------- ---------- -------------- ----------------------------
  04/10/05   11:04:48           60,000 L_ORDERKEY
  04/10/05   09:57:52            2,524 L_SHIPDATE
  04/10/05   11:49:47          236,352 L_ORDERKEY,L_SHIPDATE
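
A sketch of how the collected statistics can later be refreshed, assuming the statistics
above have already been defined on the table:

COLLECT STATISTICS ON Lineitem;                     -- re-collects every statistic already defined on Lineitem
COLLECT STATISTICS ON Lineitem COLUMN L_Shipdate;   -- or refresh a single column's statistics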
Data Compression
Data Compression
 Makes row sizes smaller

 Allows more rows per block

 Reduces the number of I/Os

 Implemented at the column level

 Compression trades CPU (to decode compressed values) for reduced
  I/O, so it benefits I/O-intensive workloads.

 The improvement gained through the more-rows-per-block effect is
  most significant in full table scan operations.

 Compression is transparent to applications.
Data Compression
Single-Value Compression (V2R4 and prior):

CREATE TABLE Employee
(EmployeeNo INTEGER
…
Jobtitle CHARACTER(30)
       COMPRESS ('cashier')
…
);

Nulls and cashiers will be compressed.

Multi-Value Compression (V2R5 and later):

CREATE TABLE Employee
(EmployeeNo INTEGER
…
Jobtitle CHARACTER(30)
       COMPRESS ('cashier',
                 'manager',
                 'programmer')
…
);

Cashiers, managers and programmers will be compressed, including nulls.
Up to 255 distinct values for an individual column can be compressed.
Data Compression - Implementation
The following graphic shows how data compression is implemented in Teradata.

CREATE TABLE CompressExample (
 StreetAddress VARCHAR(40),
 City CHARACTER(20)
      COMPRESS ('New York', 'Los Angeles', 'Chicago')
      NOT NULL,
 StateCode CHARACTER(2));

[Diagram: the table header stores the compressed City values ('Chicago' = 01,
 'Los Angeles' = 10, 'New York' = 11); each data row carries only the presence
 bits for a compressed city, while uncompressed cities such as San Francisco and
 Racine (presence bits 00) are stored in the row itself along with StreetAddress
 and StateCode.]
Multi Value Compression & VARCHAR
 VARCHAR consumes two extra bytes for each value whereas
  compression consumes CPU resource to decode compress values.

 The data demographics determine whether variable length character
  data type or fixed length plus compression is more efficient.

 VARCHAR is better when the difference of max and average field
  length is high and a low percentage of fields are compressible.

 Compression is better when the difference of max and average field
  length is low and a high percentage of fields are compressible.

 If no clear picture of the data demographics is available, use
  VARCHAR, as it is less CPU intensive.
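
A sketch of the two alternatives for the same column (hypothetical table and city values,
for illustration only):

CREATE TABLE Customer_V
 (CustNo INTEGER NOT NULL,
  City   VARCHAR(20));                      -- variable length: two extra bytes per value, no decode cost

CREATE TABLE Customer_C
 (CustNo INTEGER NOT NULL,
  City   CHARACTER(20)
         COMPRESS ('New York', 'Chicago', 'Los Angeles'));   -- fixed length plus multi-value compression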
Query Management
Priority Scheduler

A DBA may want to:

Configure the system to execute queries submitted by Sales Managers
at a higher priority.


Or

Configure the system to execute queries submitted by the Development
group at a lower priority between 8:00 AM and 3:00 PM, and at medium
priority between 3:00 PM and 8:00 AM.

Or

Lower the priority of a job if it takes more than one hour to complete.
Priority Scheduler

 Can be used to control the resources allocated to
  users.

 The administrator can specify a performance group
  while creating the user, as sketched below.

 It manages resource distribution to improve the
  performance of one application at the expense of
  others.
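
A sketch of assigning a performance group through the account string when the user is
created (hypothetical user, password and account name; the $M$ prefix selects the M
performance group):

CREATE USER WHDev AS
 PERM     = 1000000,
 SPOOL    = 5000000,
 PASSWORD = WHDev2004,
 ACCOUNT  = '$M$DEVACCT';   -- the $M$ prefix maps the user's sessions to performance group M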
Priority Scheduler Components

[Diagram: a Resource Partition (RP#) contains Performance Groups L, M, H and R.
 Each Performance Group is linked through Performance Periods (for example
 8am-5pm, 5pm-9pm, 9pm-8am, 8pm-8am, 8am-8pm) to Allocation Groups AG1, AG2,
 AG3 and AG4, which carry the relative weights 5, 10, 20 and 40.]
Priority Scheduler Components
 Resource Partition
   High-level resource partitioning.
   Default is Partition 0.

 Performance Group
   Provides relative priority within the Resource Partition.
   Can be specified in the Account String in the CREATE USER statement.
   Can be specified in the user logon string ($M$, $DEV$, etc.).

 Performance Period
   Controls the scheduling policy at that point in time.
   Links a PG to an Allocation Group's weight and policy.

 Allocation Group
   Defines a method for disbursing resources among sessions active
    within that allocation group.
   Carries the weight.
   Defines a scheduling policy.
Example 1 – Percentage of Resource Allocation

 User WHDev with performance group $L$ logged on to the system at
 9:30 PM.

 What is the percentage share of system resources the user WHDev
 will get ?


 Sum of the weights = 5 + 10 + 20 + 40 = 75

 At 9:30 PM performance group L will be
 assigned to allocation group AG3.

 So the percentage of resource allocation = 20/75 ≈ 26.7%
Example 2 - Automatic Change in Priority Based on CPU Usage

[Chart: session priority over time (CPU seconds used). Performance Period 1 maps
 the session to Allocation Group AG11 (weight 40) until 3600 CPU seconds have been
 consumed; Performance Period 2 then maps it to Allocation Group AGDEF (weight 5),
 automatically lowering the priority of the long-running work.]
Teradata Dynamic Query Manager

A DBA may want to:

Prevent all queries that are estimated to return more than 100,000
rows from running between the hours of 8:00 a.m. and 1:00 p.m.
on Fridays.

Or

Prevent all queries from the Testing group that are estimated to take
more than 3 minutes from running between the hours of 8:00 a.m. and
3:00 p.m. on Mondays.

Or

Schedule a request to run every Friday at 8:00 p.m.
Teradata Dynamic Query Manager
 Teradata Dynamic Query Manager (TDQM) is a product that
  enables you to effectively manage the access to and
  utilization of a Teradata database system.

 Managing the database system increases the workload
  capacity and efficiency of database usage.

 TDQM addresses the key problems of database system
  overload and network saturation that result from a large
  number of clients accessing the Teradata system.

 The two main functionalities of TDQM are:
    Limiting the execution of some queries on the Teradata database
     according to rules – Query Management.
    Scheduling SQL requests for batch execution – Scheduled Requests.
TDQM Architecture
[Diagram: TDQM architecture. On the Query Management side, all client systems
 accessing Teradata are governed by rules defined through the TDQM Administrator
 and stored as TDQM metadata in a TDQM partition of the Teradata RDBMS. On the
 Scheduled Requests side, a Scheduled Requests Client submits requests that a
 Scheduled Request Server executes against the user data.]
Teradata Dual Active Solution

 Provides support for unplanned down time.

 Eliminates the need for planned down time.

 Provides additional processing power to smooth
  out peak workload on the primary system.

 Both 100% data replication and replication of only
  mission-critical data are possible.
Teradata Dual Active Solution
Architecture

[Diagram: a Primary System and a Backup System kept in step by Data
 Synchronization and Operation Control; the Teradata Query Director routes
 users and applications to either system.]
Teradata Query Director

 Designed to intelligently route queries based on
  customer-established rules.

 Helps to share the workload between the systems.

 Provides failover capability.

				