TERADATA 13

Teradata Database 13.10 Overview

Todd Walter

CTO Teradata Labs

Fine Print



• Nothing in this presentation constitutes a commitment to deliver any specific functionality at any specific time.

• Current planning date for the 13.10 release is Q3 2010.










Key Features

What is a Temporal Database

Definitions

• Temporal – the ability to store all historic states of a given set of data (a database row) and, as part of the query, select a point in time at which to reference the data. Examples:

> What was this account balance (share price, inventory level, asset value, etc.) on this date?

> What data went into the calculation on 12/31/05, and what adjustments were made in 1Q06?

> On this historic date, what was the service level (contract status, customer value, insurance policy coverage) for a given customer?

• Three Types of Temporal Tables

> Valid Time Tables

– When a fact is true in the modeled reality

– User specified times

> Transaction Time Tables

– When a fact is stored in the database

– System maintained time, no user control

> Bitemporal Tables

– Both Transaction Time and Valid Time

• User Defined Time

> User can add time period columns, and take advantage of the added

temporal operators

> Database does not enforce any rules on user defined time columns
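
The two time dimensions defined above can be sketched in a few lines of Python. This is only an illustrative model, not how Teradata stores or evaluates temporal tables: the `Row` class, the `as_of` helper, the account data, and the use of `date.max` for an open-ended period are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date

FOREVER = date.max  # assumed stand-in for an open-ended period


@dataclass(frozen=True)
class Row:
    account: str
    balance: int
    vt_begin: date  # valid time: when the fact is true in the modeled reality
    vt_end: date
    tt_begin: date  # transaction time: when the database recorded the row
    tt_end: date


def as_of(rows, valid_on, known_on):
    """Rows valid on `valid_on`, as the database knew them on `known_on`."""
    return [
        r for r in rows
        if r.vt_begin <= valid_on < r.vt_end
        and r.tt_begin <= known_on < r.tt_end
    ]


rows = [
    # Recorded on 2006-01-02: the balance is 100 from 2005-12-01 onward.
    Row("A1", 100, date(2005, 12, 1), FOREVER, date(2006, 1, 2), date(2006, 2, 10)),
    # Adjustment recorded on 2006-02-10: the balance was really 90.
    Row("A1", 90, date(2005, 12, 1), FOREVER, date(2006, 2, 10), FOREVER),
]

# Same valid date, two different "knowledge" dates.
before_adjustment = as_of(rows, date(2005, 12, 31), date(2006, 1, 15))
after_adjustment = as_of(rows, date(2005, 12, 31), date(2006, 3, 1))
```

Both lookups ask about 12/31/05, but the first sees the value as originally recorded and the second sees the 1Q06 adjustment, which is the kind of question a bitemporal table is meant to answer.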






Temporal Query



Provide a list of members who were reported as covered on

Jan. 15, 2000 in the Feb. 1, 2000 NCQA report, with names as

accurate as our best data shows today.

Without Temporal Support

select member.member_id
      ,member.member_nm
from edw.member_x_coverage coverage
    ,edw.member
where coverage.member_id = member.member_id
  and coverage.observation_start_dt <= '2000-02-01'
  and (coverage.observation_end_dt > '2000-02-01'
       or coverage.observation_end_dt is NULL)
  and coverage.effective_dt <= '2000-01-15'
  and (coverage.termination_dt > '2000-01-15'
       or coverage.termination_dt is NULL)

With Temporal Support

SELECT member.member_id, member.member_nm
FROM edw.member_x_coverage
     VALIDTIME AS OF DATE '2000-01-15' AND
     TRANSACTIONTIME AS OF DATE '2000-02-01'
    ,edw.member
WHERE member_x_coverage.member_id = member.member_id;






Temporal Update – BiTemporal Table



Current valid time, current transaction time query

Jeans (125,102) are sold today (2005-08-30)

With Temporal Support

UPDATE objectlocation
SET LOCATION = 'External'
WHERE item_id = 125
AND item_serial_num = 102;

Without Temporal Support

INSERT INTO objectlocation
SELECT item_id, item_serial_num, 'External', CURRENT_TIME, END(vt), CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
AND BEGIN(vt) <= CURRENT_TIME AND END(vt) > CURRENT_TIME
AND END(tt) = 'Until_Closed';

INSERT INTO objectlocation
SELECT item_id, item_serial_num, location, BEGIN(vt), CURRENT_TIME, CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
AND BEGIN(vt) <= CURRENT_TIME AND END(vt) > CURRENT_TIME
AND END(tt) = 'Until_Closed';

UPDATE objectlocation
SET END(tt) = CURRENT_TIME
WHERE item_id = 125 AND item_serial_num = 102
AND BEGIN(vt) <= CURRENT_TIME AND END(vt) > CURRENT_TIME
AND END(tt) = 'Until_Closed';

INSERT INTO objectlocation
SELECT item_id, item_serial_num, 'External', BEGIN(vt), END(vt), CURRENT_TIME, 'Until_Closed'
FROM objectlocation
WHERE item_id = 125 AND item_serial_num = 102
AND BEGIN(vt) > CURRENT_TIME
AND END(tt) = 'Until_Closed';

UPDATE objectlocation
SET END(tt) = CURRENT_TIME
WHERE item_id = 125 AND item_serial_num = 102
AND BEGIN(vt) > CURRENT_TIME
AND END(tt) = 'Until_Closed';
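
The row bookkeeping that the manual statements on this slide perform can be sketched in Python as follows. This is a simplified model under stated assumptions, not Teradata's implementation: the `ObjectLocation` class, the `update_location` helper, the starting location "Store", and the use of `datetime.max` for 'Until_Closed' (and for an open-ended valid time) are all invented for illustration.

```python
from dataclasses import dataclass, replace
from datetime import datetime

UNTIL_CLOSED = datetime.max  # assumed stand-in for the slide's 'Until_Closed' marker


@dataclass(frozen=True)
class ObjectLocation:
    item_id: int
    item_serial_num: int
    location: str
    vt_begin: datetime  # valid time period [vt_begin, vt_end)
    vt_end: datetime
    tt_begin: datetime  # transaction time period [tt_begin, tt_end)
    tt_end: datetime


def update_location(rows, item_id, serial, new_location, now):
    """Apply a current bitemporal update without overwriting any existing fact."""
    result = []
    for row in rows:
        open_target = (
            row.item_id == item_id
            and row.item_serial_num == serial
            and row.tt_end == UNTIL_CLOSED
        )
        if open_target and row.vt_begin <= now < row.vt_end:
            # Close the old version in transaction time.
            result.append(replace(row, tt_end=now))
            # Keep the old location for the part of valid time before `now`.
            if row.vt_begin < now:
                result.append(replace(row, vt_end=now, tt_begin=now))
            # Record the new location from `now` onward.
            result.append(replace(row, location=new_location, vt_begin=now, tt_begin=now))
        elif open_target and row.vt_begin > now:
            # Rows that only become valid in the future are closed and re-recorded.
            result.append(replace(row, tt_end=now))
            result.append(replace(row, location=new_location, tt_begin=now))
        else:
            result.append(row)
    return result


start = datetime(2005, 8, 1)
sold = datetime(2005, 8, 30)
table = [ObjectLocation(125, 102, "Store", start, UNTIL_CLOSED, start, UNTIL_CLOSED)]
table = update_location(table, 125, 102, "External", sold)

# Rows still open in transaction time describe what the database believes now.
current = [(r.location, r.vt_begin) for r in table if r.tt_end == UNTIL_CLOSED]
```

One original row becomes three: a closed copy of the old belief, the old location restricted to the time before the sale, and the new location from the sale onward. That bookkeeping is what the single temporal UPDATE hides from the user.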










Moving Current Date in PPI





• Description

> Support use of CURRENT_DATE and CURRENT_TIMESTAMP built-in

functions in Partitioning Expression.

> Ability to reconcile the values of these built-in functions to a newer date or

timestamp using ALTER TABLE.

– Optimally reconciles the rows with the newly resolved date or timestamp value.

– Reconciles the PPI expression.

• Benefit

> Users can easily define partitions with 'moving' dates and timestamps instead of manually redefining the PPI expression using constants.

– Date-based partitioning is the typical use for PPI. If a PPI is defined with a 'moving' current date or current timestamp, the partition that contains the recent data can be as small as possible for efficient access.

> Required for the Temporal semantics feature: provides the ability to define 'current' and 'history' partitions.
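
As a rough illustration of 'moving' partitions and reconciliation, the Python sketch below models a two-partition table whose boundary depends on a resolved current date. The partitioning rule (current month versus everything older) and the function names are assumptions chosen for the example; the slide does not specify either.

```python
from datetime import date


def partition(row_date, resolved_current):
    """1 = 'current' partition (same month or later), 2 = 'history' partition."""
    month_start = resolved_current.replace(day=1)
    return 1 if row_date >= month_start else 2


def rows_to_move(row_dates, old_current, new_current):
    """Rows whose partition changes when the resolved date is advanced."""
    return [
        d for d in row_dates
        if partition(d, old_current) != partition(d, new_current)
    ]


row_dates = [date(2010, 1, 20), date(2010, 2, 5), date(2010, 3, 1)]

# Reconcile from a date resolved in February to one resolved in March.
moved = rows_to_move(row_dates, date(2010, 2, 15), date(2010, 3, 18))
```

Only the February row changes partition, which suggests why a reconcile step can be much cheaper than redefining the partitioning expression and repartitioning every row.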










Time Series Expansion Support





• Description

> New EXPAND ON clause added to SELECT to expand row with a

period column into multiple rows

– EXPAND ON clause allowed in views and derived tables

> EXPAND ON syntax supports multiple ways to expand rows



• Benefit

> Permits time based analysis on period values

– Allows business questions such as 'Get the month-end average inventory cost during the last quarter of the year 2006'

– Allows OLAP analysis on period data

> Allows charting of period data in an Excel format

> Provides infrastructure for sequenced query semantics on

Temporal tables










Time series Expansion support





• What will it do?

> Expand a time period column and produce value-equivalent rows, one for each time granule in the period

– Time granule is user specified

– Permits a period representation of the row to be changed into an event representation

> The following forms of expansion are provided:

– Interval expansion

– By user-specified intervals such as INTERVAL '1' MONTH

– Anchor point expansion

– By the user specified anchored points in a time line

– Anchor period expansion

– By user specified anchored time durations in a time line
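
Interval expansion can be pictured with the Python sketch below, which splits one half-open period into one-month granules. It is a conceptual model only: the helper names are invented, and details such as how a short final granule or a day-of-month overflow is handled are assumptions rather than documented EXPAND ON behavior.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


def expand_by_month(begin, end):
    """Yield (start, stop) sub-periods covering the half-open period [begin, end)."""
    k = 0
    start = begin
    while start < end:
        # Each granule boundary is computed from the original begin to avoid drift.
        stop = min(add_months(begin, k + 1), end)
        yield (start, stop)
        start = stop
        k += 1


# One period row becomes one row per month granule.
granules = list(expand_by_month(date(2006, 10, 15), date(2007, 1, 10)))
```

Each resulting granule would carry the same non-period column values as the source row, which is what turns a period representation into an event-style representation suitable for per-month aggregation.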










Geospatial Enhancements





• Description

> Enhancements to the Teradata 13 Geospatial offering that substantially increase performance, add functionality, and provide integration points for partner tools

• Benefits

> Increased performance by changing UDFs to Fast Path System functions

> Replace the Shape File Generator client tool (ogr2ogr) with a stored procedure for tighter integration with the database and tools such as ESRI ArcGIS

> Provide geodetic distance methods – SphericalBufferMBR()

> WFS Server provides better tool integration support for MapInfo and ESRI products










ESRI ArcGIS Connecting to Teradata via Safe

Software FME





1. FME connection in

ArcView

2. Connect to Teradata

via TPT API

3. Select Teradata

tables for ArcView

analysis










Projection of Impact Zone

& Storm Path to Google Earth





Where do I deploy my cat (catastrophe) management team?










Algorithmic Compression



• Description

> Lets users define compression/decompression algorithms, implemented as UDFs, that are specified and applied to data at the column level in a row. Initially, Teradata will provide two sets of compression/decompression algorithms: one for UNICODE columns and another for LATIN columns.

• Benefit

> Data compression is the process by which data is encoded so that

it consumes less physical storage space. This capability reduces

both the overall storage capacity needs and the number of

physical disk I/Os required for a given operation. Additionally,

because less physical data is being operated on there is the

potential to improve query response time as well.

• Considerations

> Compressed data has to be decompressed whenever it is needed, which costs some extra CPU cycles. In general, the advantages of compression outweigh the extra cost of decompression.
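
The compress/decompress pairing described above can be illustrated with a small Python sketch. The choice of algorithm here (storing two-byte-per-character text as UTF-8) is an assumption made for the example, and the function names are hypothetical; the slide does not name the algorithms Teradata supplies.

```python
def compress_unicode(value: str) -> bytes:
    """Hypothetical 'compress' function: store the text as UTF-8 bytes."""
    return value.encode("utf-8")


def decompress_unicode(stored: bytes) -> str:
    """Matching 'decompress' function: must exactly invert compress_unicode."""
    return stored.decode("utf-8")


text = "Teradata Database 13.10"

# Baseline: a fixed two bytes per character, as with UTF-16 for this text.
uncompressed_size = len(text.encode("utf-16-le"))
compressed_size = len(compress_unicode(text))
round_trip = decompress_unicode(compress_unicode(text))
```

The essential contract is that the two functions form an exact inverse pair. The space saved depends on the data: text that is mostly ASCII shrinks by about half under this scheme, while other scripts may not shrink at all.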




Multi-Value Compression For Varchar Columns



• Example – Multi-Value Compression for Varchar Column:





CREATE TABLE Customer
(Customer_Account_Number INTEGER
,Customer_Name VARCHAR(150)
COMPRESS ('Rich','Todd')
,Customer_Address CHAR(200));
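
The idea behind the COMPRESS list above can be modeled in Python as a lookup against a small shared value list. This is a deliberately simplified picture: the `(code, payload)` encoding is invented for the example and does not describe Teradata's actual row or table header layout.

```python
# Values named in the COMPRESS clause are stored once and shared by all rows.
COMPRESS_VALUES = ("Rich", "Todd")


def encode(name):
    """Return (code, payload): code > 0 indexes the value list, 0 means stored inline."""
    if name in COMPRESS_VALUES:
        return (COMPRESS_VALUES.index(name) + 1, b"")
    return (0, name.encode("utf-8"))


def decode(code, payload):
    """Invert encode(): look up a listed value or decode the inline bytes."""
    return COMPRESS_VALUES[code - 1] if code else payload.decode("utf-8")


names = ["Rich", "Todd", "Alice", "Rich"]
encoded = [encode(n) for n in names]
inline_bytes = sum(len(payload) for _, payload in encoded)
```

Rows holding a listed value carry no per-row copy of the string, so the benefit grows with how often the listed values repeat in the column.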










Block Level Compression



• Description

> Feature provides the capability to perform compression on whole

data blocks at the file system level before the data blocks are

actually written to storage.

• Benefit

> Block level compression reduces the actual storage required for the data, especially cool/cold data, and significantly reduces the I/O required to read the data.

• Considerations

> Compressing or decompressing whole data blocks has a CPU cost. This is generally considered a good trade since CPU cost is decreasing while I/O cost remains high.










User-Defined SQL Operators



• Description

> This feature lets users define and encapsulate complex SQL expressions in a User Defined Function (UDF) database object.

• Benefits

> The use of the SQL UDFs Feature allows users to define their own

functions written using SQL expressions. Previously, the desired

SQL expression would have to be written into the query for each

use or alternatively, an external UDF could have been written in

another programming language to provide the same capability.

> Additionally, SQL UDFs allow one to define functions available in

other databases and with alternative syntax (e.g. ANSI).

• Considerations

> The Teradata SQL UDF feature is a subset of the SQL function

feature described in the ANSI SQL:2003 standard.

> Additionally, this feature does not introduce any changes to the

definition of the Dictionary Tables per se, but will add additional

rows into the DBC.TVM and DBC.UDFInfo tables to indicate the

presence of a SQL UDF.




SQL UDF - Example



• The “Months_Between” Function:



CREATE FUNCTION Months_Between
(Date1 DATE, Date2 DATE)
RETURNS INTERVAL MONTH(4)
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
PARAMETER STYLE SQL
RETURN (CAST(Date1 AS DATE) - CAST(Date2 AS DATE)) MONTH(4);

SELECT MONTHS_BETWEEN ('2008-01-01', '2007-01-01');

MONTHS_BETWEEN ('2008-01-01', '2007-01-01')
---------------------------------------------------
12
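
For readers more familiar with general-purpose languages, the calculation in this example can be approximated in Python as below. The sketch counts calendar-month boundaries and ignores the day of the month; whether the SQL expression on the slide treats partial months the same way is not stated there, so that behavior is an assumption.

```python
from datetime import date


def months_between(date1, date2):
    """Difference in whole calendar months, ignoring the day of the month."""
    return (date1.year - date2.year) * 12 + (date1.month - date2.month)


result = months_between(date(2008, 1, 1), date(2007, 1, 1))
```

Like the SQL UDF, the Python function simply gives a reusable name to an expression that would otherwise be repeated in every query that needs it.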






Performance

Character-Based PPI (CPPI)



• Description

> This feature leverages current Teradata Partitioned Primary Index (PPI) technology and extends it to allow character data (CHAR, VARCHAR, GRAPHIC, VARGRAPHIC) to be used as a table partitioning mechanism.

• Benefit

> Currently, only an integer data type may be used as the partitioning mechanism in a PPI scheme, which provides a query performance advantage via partition elimination. Extending this capability to character-based data types allows more partitioning options and in turn yields a query performance advantage similar to that of current PPI technology.

• Considerations

> As with all Teradata indexes or partitioning database design

choices, the Optimizer will determine the appropriate index/PPI to

use that will provide the best-cost plan for executing the query.

No end-user query modification is required.
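
Partition elimination on character ranges can be sketched in Python as follows. The range boundaries, the case-insensitive comparison, and the function names are assumptions made for illustration; the actual partitioning expressions and the Optimizer's logic are not shown on the slide.

```python
from bisect import bisect_right

# Hypothetical lower bounds of four character ranges, one partition per range.
BOUNDARIES = ["A", "G", "N", "T"]


def partition_of(value):
    """1-based partition of a character value (0 if it sorts below the first range)."""
    return bisect_right(BOUNDARIES, value.upper())


def partitions_to_scan(low, high):
    """Partitions that can hold values between low and high; the rest are skipped."""
    return list(range(partition_of(low), partition_of(high) + 1))


# A predicate such as: last_name BETWEEN 'Ha' AND 'Hz'
scan = partitions_to_scan("Ha", "Hz")
```

Because every value in the requested range maps to a partition between those of its endpoints, only one of the four partitions needs to be read for this predicate, and the query text itself does not change.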


Timestamp Partitioning



• Description

> Allow users to explicitly specify a time zone for PPI tables involving DateTime partitioning expressions in order to make the expressions deterministic (i.e., not dependent on the session time zone).

> Extend the PPI partition elimination capability to include timestamp data types in partitioning expressions.

• Benefit

> Ensuring that DateTime partitioning expressions are deterministic eliminates errors that could otherwise result from an incorrect dependence on session time zones.

> Extending this capability to timestamp data types allows more partitioning options and in turn yields a query performance advantage similar to that of current PPI technology.

• Considerations

> Enhancements related to deterministic time zone handling will also be applied to sparse join index search conditions.
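
Why a session-dependent time zone makes a partitioning expression non-deterministic can be shown with a short Python sketch. The timestamp and the two session offsets are arbitrary values chosen for the example.

```python
from datetime import datetime, timedelta, timezone


def partition_date(ts, zone):
    """Calendar date of a timestamp when it is interpreted in the given time zone."""
    return ts.astimezone(zone).date()


ts = datetime(2010, 3, 18, 23, 30, tzinfo=timezone.utc)

session_a = timezone(timedelta(hours=-5))  # hypothetical session time zone
session_b = timezone(timedelta(hours=9))   # another hypothetical session time zone

date_a = partition_date(ts, session_a)
date_b = partition_date(ts, session_b)

# With one explicitly specified zone, every session derives the same date.
date_fixed = partition_date(ts, timezone.utc)
```

The same stored instant falls on different calendar days for the two sessions, so a date-based partition derived from it would differ by session. Pinning the zone in the expression removes that dependence.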






Fastpath Functions



• Description

> The Fastpath Function project combines the extensibility, short development cycles, and ease-of-use aspects of UDFs with the high performance and ease-of-use aspects of Teradata system functions to yield an alternate development path by which Teradata Engineering software developers may add new Teradata system functions to the Teradata server.

• Benefit

> The Fastpath Function project will allow Teradata to use a shorter

development cycle to fulfill many customer specific requests for

implementing new system functions that additionally perform in

the same manner as native Teradata system functions.

• Considerations

> Source code and/or libraries used in the development of Teradata

system functions must be solely managed and maintained by

Teradata Engineering. End-users will not be able to develop

Fastpath system functions.






FastExport – Without Spooling



• Description

> Enhance the FastExport utility to provide an option that would

allow the utility to execute in a mode that eliminates the

requirement that the query data be spooled prior to the actual

export process.

• Benefit

> The “direct without spooling” method provides a mechanism to extract data from Teradata tables quickly and efficiently, with the main benefits being a performance gain and minimal resource utilization.

• Considerations

> The “direct without spooling” method is not transparent to the

user and must be specified as a discrete option when executing

the FastExport utility. It is a user decision to choose between

using either the “spool” or “no spool” method.










Teradata Workload Management

TASM: Additional Workload Definitions



• Description

> Feature increases the number of available TASM Workload

Definitions (WDs) to 250 (instead of 40).

• Benefits

> Complex mixed workloads require the ability to have a finer

degree of granular control over the parts of the workload.

Increasing the number of WDs will allow customers to better

manage and report on resource usage of their system to meet

either subject area (e.g. by country, application or division)

resource distribution requirements, or category-of-work (e.g. high

vs. low priority) resource distribution requirements.

• Considerations

> Administrators should be aware that when defining a large

number of workloads which will run concurrently, it will become

difficult to create significant differentiation among the different

workloads when the resource division granularity itself gets very

small.




TASM: Common Classifications



• Description

> This feature makes Workload Definition classification criteria available for Teradata Workload Management Categories 1, 2 and 3 (Filters, System Throttles and Workload Definitions) and additionally extends wildcard support to Filters and Throttles.

• Benefit

> The implementation of Common Classifications addresses the differences and delivers consistency between the TDWM categories (Filters, System Throttles and Workload Definitions), which improves the Teradata Workload Management user interface and its usability.

• Considerations

> Consideration should be given to re-evaluating the current

settings for the different categories insofar as common

classification extends the ability to manage a workload in an

easier and simpler fashion.




TASM: Common Classifications



• “Who” Criteria

> Account String / Account Name

> Teradata Username / Teradata Profile

> Application Name

> Client Address or Client Name

> QueryBand

• “Where” Criteria (Data Objects)

> Databases

> Tables / Views / Macros

> Stored Procedures

• “What” Criteria

> Statement Type (SELECT, DDL, DML)

> Utility Type

> AMP Limits, Row Count, Final Row Count

> Estimated Processing (CPU time)

> Join Types

– ALL or no joins

– ALL or no product joins

– ALL or no unconstrained product joins


TASM Utility Management



• Description

> This feature augments the existing TD Utility Management capability in TASM with controls similar to the workload management of regular SQL requests, and provides for the automatic selection of the number of sessions used by Teradata utilities.

• Benefits

> Feature provides for more granular and centralized control of utility

execution and allows deployment to a much wider audience of users

and applications. Additionally, the use of Teradata utility sessions is

moved inside the database and is automated to eliminate the

detailed management of sessions in each job.

• Considerations

> Consideration should be given to a reevaluation of current rule sets

and settings to maximize control of the workload and relative utility

execution.

> Throttling in TASM eliminates the need for Tenacity and Sleep. Queued jobs execute in FIFO order, and a queued job starts as soon as resources are available rather than at the end of the Sleep time.


TASM Utility Session Configuration Rules



• For FastLoad, MultiLoad, and FastExport utilities, the DBS

default for number of AMP sessions is one per AMP.

• On a large system with hundreds or thousands of AMPs, this

default becomes inappropriate.

• Currently, a user can override this default by changing an individual load/export script, changing the MAXSESS parameter in the configuration file, or specifying runtime parameters (i.e., MAXSESS or -M).

• These overriding methods are inconvenient.

• This feature allows a DBA to define TDWM rules in one central place that specify the number of AMP sessions to be used based on a combination of the following criteria:

> Utility Name

> “Who” criteria (user, account, client address, query band, etc.)

> Data size






TASM Utility Session Configuration Rules



• Session configuration rules are optional.

• These rules are active when any category of TDWM is enabled.

• In each session configuration rule, the DBA specifies the

criteria and the number of sessions to be used when these

criteria are met.

• For example, for stand alone MultiLoad jobs submitted by user

Charucki, use 10 sessions.

• Session configuration rules also support the Archive/Restore

utility.

• The DBA can define similar rules to specify the number of

HUTPARSE sessions to be used for a specific set of criteria.

• A new internal DBSControl field: DisableTDWMSessionRules

is provided to disable user-defined session configuration rules

and default sessions rules while TDWM is enabled.

• When this field is set, Client and DBS will operate as in

Teradata 13.
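
The way a central rule set could choose a session count is sketched below in Python. The `SessionRule` fields, the first-match evaluation order, and the fallback to one session per AMP when no rule matches are assumptions made for the example; the slides describe the criteria but not how TDWM evaluates them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionRule:
    sessions: int                  # number of sessions to use when the rule matches
    utility: Optional[str] = None  # None means "any utility"
    user: Optional[str] = None     # one example of a "Who" criterion
    min_data_mb: int = 0           # data size threshold


def sessions_for(rules, utility, user, data_mb, amp_count):
    """Session count from the first matching rule, else the one-per-AMP default."""
    for rule in rules:
        if rule.utility is not None and rule.utility != utility:
            continue
        if rule.user is not None and rule.user != user:
            continue
        if data_mb < rule.min_data_mb:
            continue
        return rule.sessions
    return amp_count


rules = [
    # The slide's example: MultiLoad jobs submitted by user Charucki use 10 sessions.
    SessionRule(sessions=10, utility="MultiLoad", user="Charucki"),
    # A hypothetical second rule keyed on utility name and data size.
    SessionRule(sessions=32, utility="FastExport", min_data_mb=1024),
]

charucki_sessions = sessions_for(rules, "MultiLoad", "Charucki", 5, amp_count=400)
```

Keeping the rules in one place means the individual load and export scripts no longer need their own session settings.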




Availability, Serviceability, DBA Tasks

Improvements

Fault Isolation



• Description

> Remove cases where faults can cause restarts

> Specific cases

– EVL fault isolation

– Unprotected UDFs

– Dictionary cache re-initialization



• Benefits

> Identify and isolate the fault to only the query or session

> Issues in query calculation and qualification will be isolated

> Badly behaving UDFs will have less opportunity to affect the

system

> Faults in the dictionary cache will result in the dictionary cache

being flushed and reloaded rather than affecting the entire system










AMP Fault Isolation





• Description

> This feature is intended to catch those AMP errors that currently

cause DBS restarts where the error can be dealt with by taking a

snapshot dump and aborting the transaction that caused the error

• Benefit

> This feature can reduce the number of DBS restarts for customers,

thus improving overall system availability

• What will it do?

> Current AMP Fault Isolation only avoids a full database restart for

errors when accessing spool tables

> The scope of fault isolation will be increased to cover ERRAMP* or ERRFIL* errors on permanent tables as well as spools

> Retrofitted to current supported releases










Read From Fallback



• Description

> In the event of encountering a data block read error, either

unreadable or corrupt data blocks, this feature will leverage the

pre-existing Fallback Table facility to transparently retrieve the

required data block from the fallback copy.

• Benefit

> When fallback is available, this feature significantly improves fault tolerance and system availability. It increases the value of having fallback and protects non-redundant (RAID 0 or JBOD) storage media, such as SSD, from data loss without restart/failover.

• Considerations

> Fallback does not need to be instantiated as a system-wide property. Because fallback is a table-level attribute, it can be applied selectively to the largest/most critical customer tables.

> This facility does not in and of itself repair bad data blocks, but allows them to be read from fallback until they can be repaired.
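
The read path described on this slide can be sketched in Python as a try-primary-then-fallback lookup. The dictionaries standing in for the two copies, the `BlockReadError` exception, and the function names are all hypothetical and exist only to make the control flow concrete.

```python
class BlockReadError(Exception):
    """Raised when a data block is unreadable or corrupt."""


def read_from(copy, block_id):
    """Read one block from a copy, treating a missing or damaged block as an error."""
    block = copy.get(block_id)
    if block is None:
        raise BlockReadError(block_id)
    return block


def read_block(block_id, primary, fallback=None):
    """Read from the primary copy; on a read error, use the fallback copy if present."""
    try:
        return read_from(primary, block_id)
    except BlockReadError:
        if fallback is None:
            raise  # table has no fallback: the error surfaces to the caller
        return read_from(fallback, block_id)


primary = {1: b"rows-1", 2: None}  # block 2 is damaged on the primary copy
fallback = {1: b"rows-1", 2: b"rows-2"}

recovered = read_block(2, primary, fallback)
```

Note that the damaged primary block is left untouched, which matches the consideration above: the facility serves reads from fallback but does not repair the bad block.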






Read From Fallback - Particulars



• Reading data blocks from the Fallback copy is transparent to both a

user and/or application. Manual intervention is not required

whatsoever.



• Feature does not require any special or particular locking mechanism.



• A manual process is still required to rebuild the table to repair

unreadable or corrupt data blocks.



• Facility cannot recover from data block errors in the Cylinder Index, Non-Unique Secondary Indexes (NUSIs) or Permanent Journals.



• Read errors are fallback recoverable on TD Data Dictionary tables

with the exception of the unhashed system tables such as the WAL

log, Transient Journal and Space Accounting tables.



• Facility applies to SQL Queries with data block read errors, SQL

Insert…Select statements and the Archive utility where the block read

error is on the source table only.






Transparent Cylinder Packing



• Description

> Develop a new file system background task that will pro-actively

and transparently monitor and adjust the utilization (high or low)

of user data cylinders and pack/unpack said cylinders accordingly

with the goal of returning them to a more efficiently utilized state.

• Benefit

1. Cylinder Packing will result in cylinders having a higher datablock to cylinder index ratio, making Cylinder Read operations more effective by reading fewer unoccupied sectors.

2. Higher cylinder utilization translates into data tables occupying fewer cylinders, leaving more cylinders available for other purposes.

3. Diminishes the chances that a “mini-cylpack” operation will be executed and lessens the need for administrators to perform regularly scheduled Packdisk operations.

• Considerations

> This feature will have several customer-tunable parameters in DBSControl that will allow customers to manage and adjust the level of impact of the Transparent Cylinder Packing operations.




Merge Data Blocks

During Full Table Modify Operations



• Description

> During full table modification operations such as Multiload, Insert

Select and Update or Delete Where, combine adjacent blocks

when small blocks are present.

• Benefit

> Small data blocks increase the I/Os necessary to read a table and interfere with features such as compression and large cylinders.

> Reduce the instances of small data blocks by combining them when doing work on those blocks or adjacent ones.
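
A minimal Python sketch of combining adjacent small blocks during a full pass over a table is shown below. The thresholds, the list-of-rows block model, and the greedy left-to-right policy are assumptions made for the example, not a description of the file system's actual algorithm.

```python
def merge_small_blocks(blocks, min_rows, max_rows):
    """Combine a block with its left neighbor when either is small and the result fits."""
    merged = []
    for block in blocks:
        can_merge = (
            merged
            and (len(merged[-1]) < min_rows or len(block) < min_rows)
            and len(merged[-1]) + len(block) <= max_rows
        )
        if can_merge:
            merged[-1] = merged[-1] + list(block)  # row order is preserved
        else:
            merged.append(list(block))
    return merged


blocks = [[1, 2], [3], [4, 5, 6, 7], [8], [9]]
result = merge_small_blocks(blocks, min_rows=3, max_rows=5)
```

Five blocks become three here, so a later full scan of the same rows would need fewer reads.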










Archive DBQL Rule Table



• Description

> Enhance the Teradata Archive utility to include two additional DBC tables in the DBC database (Dictionary) backup/restore:

– DBC.DBQLRuleTbl

– DBC.DBQLRuleCountTbl



• Benefit

> Inclusion of the additional DBC tables in the DBC Archive/Restore process provides a mechanism by which these tables can be archived/restored and eliminates the cumbersome task of redefining the appropriate DBQL rules after every Dictionary initialization.

> Implementation of this feature avoids the possibility of any table

synchronicity issues and offers simplicity, convenience, and

integrity when conducting a DBC archive/restore.

• Considerations

> DBC Archive will include these tables automatically in the

Dictionary Archive; no user intervention is required.




Be Aware

Especially if Considering Tech Refresh

Large Cylinder Support



• Description

> This feature increases data storage cylinder size, the basic

allocation unit for disk space in the Teradata file system. This also

includes an increase in the Cylinder Index size thus allowing for a

commensurate increase in storing more data blocks per cylinder.

• Benefit

> Eliminates the inefficiency associated with managing a large

number of small cylinders on very large disk drives, allows larger

AMP sizes (~10 TB per AMP), permits the more efficient storage of

Large Objects and provides the foundation for block level

compression by allowing more small blocks on a cylinder.

• Consideration

> This capability is only available starting in Teradata 13.10 and

going forward and requires a System Initialization (SysInit) to be

performed so that large cylinder support can be engaged. It is

anticipated that typically this activity would be performed during

technology refresh opportunities.






Packed Row format for 64-bit platforms



• Description

> With the introduction of Teradata 13.10, data will now be stored

on the database in byte-packed format whereas previously the

data had been stored in byte-aligned format.

• Benefits

> Translates directly into a 4-7% disk space savings, since byte-packed data requires less disk space than byte-aligned data. Additionally, enables data rows to be accessed using fewer I/Os, thus potentially enhancing the performance of some workloads.
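
The difference between aligned and packed layouts can be seen with Python's `struct` module, used here purely as an analogy. The field list is invented for the example and has nothing to do with Teradata's real row format, and the saving it shows is not comparable to the 4-7% figure above.

```python
import struct

# Hypothetical record: 1-byte flag, 4-byte int, 1-byte flag, 8-byte int.
FIELDS = "bibq"

# "@" uses the platform's native alignment, which inserts padding between fields.
aligned_size = struct.calcsize("@" + FIELDS)

# "=" uses standard sizes with no padding, so fields are stored back to back.
packed_size = struct.calcsize("=" + FIELDS)

saving = 1 - packed_size / aligned_size
```

The packed layout occupies exactly the sum of the field sizes (14 bytes), while the aligned size depends on the platform's alignment rules and is never smaller.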

• Considerations

> This capability is only available starting in Teradata 13.10 and

going forward and requires a System Initialization (SysInit) to be

performed so that packed row format support can be engaged. It

is anticipated that typically this activity would be performed

during technology refresh opportunities.






Enhanced Teradata Hashing Algorithm



• Description

> Enhance the Teradata Hashing Algorithm to reduce the effects of

irregularities in character data on hash results.

• Benefit

> This enhancement is targeted to reduce the number of hash collisions for character data stored as either Latin or Unicode, notably strings that contain primarily numeric data. A reduction in hash collisions reduces access time per AMP and produces a more balanced row distribution, which in turn improves parallelism. Reduced access time and increased parallelism translate directly to better performance.

• Considerations

> This capability is only available starting in Teradata 13.10 and

going forward and requires a System Initialization (SysInit) to be

performed so that the enhanced hashing algorithm can be

engaged. It is anticipated that typically this activity would be

performed during technology refresh opportunities.




Teradata Database 13.10 3/18/10



• AMP fault isolation • Dictionary cache re-initialization

Support- Performance

Quality/

ability









• Parser diagnostic information capture • EVL fault isolation and unprotected UDFs



• FastExport without spooling • Merge data blocks during full table modify operations

• Character-based PPI • Statement independence

• Timestamp partition elimination • TVS Initial suggested temperature tables

• User Defined Ordered Analytics

• Restart time reduction • TASM: Utilities Management

Enable

Active









• Read from Fallback • TASM: Additional Workload Definitions

• TASM: Workload Designer

• Teradata 13.10 Teradata Express Edition • Moving current date in PPI

Ease of

Use









• Domain Specific System Functions • Automatic cylinder packing



• Algorithmic Compression for Character Data • Archive DBQL rule table

• VLC for VARCHAR columns • Enhanced trusted session security

• Block level compression • External Directory support enhancements

• Variable fetch size (JDBC) • Geospatial enhancements

Enterprise Fit









• User Defined SQL Operators • Statement Info Parcel Enhancements (JDBC)

• Temporal Processing • Support for IPv6

• Temporal table support • Support unaligned row format for 64-bit platforms

• Period data type enhancements • Enhanced hashing algorithm

• Replication support • Large cylinder support

• Time series Expansion support


Teradata Developer Exchange

http://developer.teradata.com/

• What is it?

> Portal for technical

insights

– Articles, blogs, podcasts

– Forums, FAQs, “How to”,

etc.

> Community of Teradata

experts

– Customers, Teradata R&D

and PS

> Share software

– Portlets, UDFs, SPs,

scripts, etc.

– Sample applications

• Who can use it?

> Anyone (read only)

> Registered contributors

– Blogs, code, ratings, articles, etc.

