Teradata Best Practices
Informatica Confidential. Do not duplicate. 1 of 23
This document discusses configuration and how-tos using PowerCenter 7.1.2 and NCR’s Teradata
RDBMS. It covers Teradata basics and also describes some “tweaks” which experience has shown
may be necessary to adequately deal with some of the “common” practices one may encounter at
a Teradata account. The Teradata documentation (especially the MultiLoad, FastLoad and Tpump
reference) is highly recommended reading material, as is the “External Loader” section of the
PowerCenter’s Server Manager Guide.
Additional Information: All Teradata documentation can be downloaded from the NCR web site
(http://www.info.ncr.com/Teradata/eTeradata-BrowseBy.cfm), it is also available on the Informatica
Tech Support website (tsspider.informatica.com/Docs/page1.html). There is a nice Teradata FAQ in
the Informatica Tech Support knowledge base (it contains a section on how to handle “timestamp”
columns). Finally, there is a “Teradata Forum” that provides a wealth of sometimes useful
information ( http://www.Teradataforum.com ).
Teradata is a relational database management system from NCR. It offers high performance for
very large databases tables because of its highly parallel architecture. It is a major player in the
retail space. While Teradata can run on other platforms, it is predominantly found on NCR hardware
(which runs NCR’s version of Unix). It is very fast and very scalable.
The NCR computers on which Teradata runs support both MPP (Massively Parallel Processing) and
SMP (Symetric Multi-Processing). Each MPP “node” (or semi-autonomous processing unit) can
Teradata can be configured to communicate directly with a mainframe’s I/O channel. This is known
as “channel attached”. Alternatively, it can be “network attached”. That is, configured to
communicate via TCP/IP over a LAN. Since PowerCenter runs on Unix, most of the time you will be
dealing with a “network attached” configuration. However, once in a while, a client will want to use
their existing “channel attached” configuration under the auspices of better performance. Do not
necessarily assume that “channel attached” is always faster than “network attached”. Similar
performance has been observed across a channel attachment as well as a 100MB LAN. In addition,
“channel attached” requires an additional sequential data move because the data must be moved
from the PowerCenter server to the mainframe prior to moving the data across the mainframe
channel to Teradata.
In the Teradata world, there are Teradata Director Program Ids (TDPIDs), databases and users. The
TDPID is simply the name that one uses to connect from a Teradata client to Teradata server (think
Oracle “tnsnames.ora” entry). Teradata also looks at databases and users somewhat
synonymously. A user has a userid, password and space to store tables. A database is basically a
user without a login and password (or a user is a database with a userid and password).
Teradata AMPs are Access Module Processors. Think of AMPs as Teradata’s parallel database
engines. Although they are strictly software (“virtual processors” according to NCR terminology),
Teradata folks often seem to use AMP and hardware “node” interchangeable because in the “old
days” an AMP was a piece of hardware.
Informatica Confidential. Do not duplicate. 2 of 23
Client Configuration Basics for Teradata
The client side configuration is wholly contained in the “hosts” file (/etc/hosts on Unix or
winnt\system32\drivers\etc\hosts on Win). Since INFA does not run on NCR Unix, one should not
ever have to deal with the server side. Teradata uses a naming nomenclature in the “hosts” file.
The name of the Teradata instance (i.e. tdpid – Teradata Director Program Id) is indicated by the
letters and numbers that precede the string “cop1” in a hosts file entry. For example:
127.0.0.1 localhost demo1099cop1
192.168.80.113 curly pcop1
This tells Teradata that when a client tool references the instance “demo1099”, it should direct
requests to “localhost” (or ip address 127.0.0.1), when a client tool references instance “p”, this
located on the server “curly” (or ip address 192.168.80.113). There is no tie here to any kind of
database server specific information (this is not similar to Oracle’s instance id. Tdpid <> Oracle
instance id!!!). That is, the tdpid is used strictly to define the name a client uses to connect to a
server. You can really call a server whatever you want. Teradata does not care. It simply takes the
name you specify, looks in the “host” file to map the <name>cop1 (or cop2, etc.) to an IP address,
and then attempts to establish a connect with Teradata at the IP address.
Sometimes you’ll see multiple entries in a hosts file with similar tdpids:
127.0.0.1 localhost demo1099cop1
192.168.80.113 curly_1 pcop1
192.168.80.114 curly_2 pcop2
192.168.80.115 curly_3 pcop3
192.168.80.116 curly_4 pcop4
This setup allows load balancing of clients among multiple Teradata nodes. That is, most Teradata
systems have many nodes, and each node has its own IP address. Without the multiple hosts file
entries, every client will connect to one node and eventually this node will be doing more than its
“fair share” of client processing. With multiple host file entries, if it takes too long for the node
specified with the “cop1” suffix to respond (i.e. curly_1) to the client request to connect to “p”, then
the client will automatically attempt to connect to the node with the “cop2” suffix (i.e. curly_2) and
Informatica / Teradata touch points
Informatica 7.1.2 accesses Teradata with severalthrough various Teradata tools. Each will be
defined and as to how it is configured within PowerCenter.
ODBC: Teradata provides 32-bit ODBC drivers for Windows and Unix platforms. If possible, use the
ODBC driver from Teradata’s TTU7 release (or above) of their client software because this version
supports “array reads”. Tests have shown these “new” drivers (3.02) can be 20%-30% faster than
the “old” drivers (3.01). This lastest release of Teradata’s TTU8.0 uses ODBC v3.0421. Teradata’s
ODBC is on a performance par with Teradata’s SQL CLI. In fact, ODBC is Teradata recommended
SQL interface for their partners.
Do not use ODBC to write to Teradata unless you’re writing very small data sets (and even then,
you should probably use Tpump defined later instead) because Teradata’s ODBC is optimized for
query access and, hence, is not optimized for writing data. ODBC is good for sourcing and lookups.
PowerCenter Designer uses Teradata’s ODBC to import Source and Target table.
Informatica Confidential. Do not duplicate. 3 of 23
If you are having performance problems you can use a cmd task with a shell script to call BTEQ.
SQL with intermediate work table which could be sourced by PowerCenter
Informatica Confidential. Do not duplicate. 4 of 23
When the PowerCenter server is running on Unix, then ODBC is required to read (both sourcing and
lookups) from Teradata
As with all Unix ODBC drivers, the key to configuring the Unix ODBC driver is adding the appropriate
entries to the “.odbc.ini” file. To correctly configure the “.odbc.ini” file, there must be an entry
under [ODBC Data Sources] that points to the Teradata ODBC driver shared library (tdata.sl on HP-
UX, standard shared library extension on other flavors of Unix ).
The following example shows the required entries from an actual “.odbc.ini” file (note the path to
the driver may be different on each computer):
[ODBC Data Sources]
dBase=MERANT 3.60 dBase Driver
Oracle8=MERANT 3.60 Oracle 8 Driver
Text=MERANT 3.60 Text Driver
Sybase11=MERANT 3.60 Sybase 11 Driver
Informix=MERANT 3.60 Informix Driver
DB2=MERANT 3.60 DB2 Driver
MS_SQLServer7=MERANT SQLServer driver
Informatica Confidential. Do not duplicate. 5 of 23
Description=Teradata Test System
Similar to the client “hosts” file setup, one can specify multiple IP addresses for the DBCName to
balance the client load across multiple Teradata nodes. Consult with the Teradata administrator for
exact details on this (or copy the entries from the PC client’s “hosts” file (see section Client
Important note: Make sure that the Merant ODBC path precedes the Teradata ODBC path
information in the PATH and SHLIB_PATH (or LD_LIBRARY_PATH, etc.) environment variables. This is
because both sets of ODBC software use some of the same file names. PowerCenter should use the
Merant files because this is the software that has been certified.
Teradata external loaders
PowerCenter 7.1.2 supports 4 different Teradata external loaders: Tpump, FastLoad, MultiLoad,
TeradataWarehouse Builder ( TWB ). The actual Teradata loader executables ( tpump, mload,
fastload, tbuild ) must be accessible by the Powercenter Server generally in the path statement.
All of the Teradata loader connections will require a value to the TDPID attribute. Refer to the first
section of this document to understand how to correctly enter the value. All of these loaders
• a load file ( can be configured to be a stream/pipe and is autogenerated by PowerCenter )
• a control file of commands to tell the loader what to do ( PowerCenter autogenerates)
All of these loaders will also produce a log file. This log file will be the means to debug the loader if
something goes wrong. As these are external loaders, PowerCenter will only receive back from the
loader whether it ran successfully or not.
Informatica Confidential. Do not duplicate. 6 of 23
By default, the input file, control file and log file will be created in $PMTargetFileDir of the
PowerCenter Server executing the workflow
Informatica Confidential. Do not duplicate. 7 of 23
Any of these loaders can be used by configuring the target in the PowerCenter session to be a “File
Writer” and then choose the appropriate loader.
Informatica Confidential. Do not duplicate. 8 of 23
The auto-generated control file can be overridden. Click the “Pencil” icon next to the loader
Informatica Confidential. Do not duplicate. 9 of 23
Scroll to the bottom of the connection attribute list and click the value next to the “control file
content Override” attribute. Then click the down arrow.
Informatica Confidential. Do not duplicate. 10 of 23
Click the Generate button and change the control file as you wish. The changed control file is
stored in the repository,
Informatica Confidential. Do not duplicate. 11 of 23
Most of the loaders also use some combination of internal Work, Error and Log tables. By default,
these will be in the same database as the target table. All of these can now be overridden in the
attributes of the connection.
To land the input flat file that the loaders need to disk, the “is staged” attribute must be checked. If
the “is staged” attribute is not set, then the file will be piped/streamed to the loader.
If one selects the non-staged mode for a loader, one should also set the “checkpoint” property to 0.
This effectively turns off the “checkpoint” processing. “Checkpoint” processing is used for
recovery/restart of fastload and multiload sessions. However, if one is not using a physical file as
input, but rather a named pipe, then the recovery/restart mechanism of the loaders does not work.
Not only does this impact performance (i.e. the checkpoint processing is not free and we want to
eliminate as much unnecessary overhead as possible), but a non-zero checkpoint value will
sometimes cause seemingly random errors and session failures when used with named pipe input
(as is the case with “streaming” mode).
Informatica Confidential. Do not duplicate. 12 of 23
Teradata loader Requirements for PowerCenter servers on Unix
All Teradata load utilities require a “non-null” standard output and standard error to run properly.
“Standard output” (stdout) and “standard error” (stderr) are Unix conventions that determine the
default location for a program to write output and error information. When you start the pmserver
without explicitly defining stdout and stderr, these both point to the current terminal session. If you
logout of Unix, Unix redirects stdout and stderr to /dev/null (i.e. a placeholder that throws out
anything written to it). At this point, Teradata loader sessions will fail because they do not permit
stdout and stderr to be /dev/null. Therefore, you must start pmserver as follows (“cd” to the
PowerCenter installation directory):
./pmserver ./pmserver.cfg > ./pmserver.out 2>&1
This starts the pmserver using the “pmserver.cfg” config file and points stdout and stderr to the file
“pmserver.out”. In this way, stderr and stdout will be defined even after the terminal session logs
out. Important note: There are no spaces in the token “2>&1”. This tells Unix to point stderr to
the same place stdout is pointing.
As an alternative to this method, you can specify the console output file name in the pmserver.cfg
file. That is, information written to “standard output” and “standard error” will go the file specified
With this entry in the pmserver.cfg file, one can start the pmserver normally (i.e.
With PowerCenter v7.x, if one sets a “round robin” partition point on the target definition and sets
each target instance to be loaded using the same loader connection instance, then PowerCenter
automatically writes all data to the first partition and only starts one instance of FastLoad or
MultiLoad. You will know you are getting this behavior if you see the following entry in the session
MAPPING> DBG_21684 Target [TD_INVENTORY] does not support multiple partitions. All
data will be routed to the first partition.
If you do not see this message, then chances are the session fails with the following error:
WRITER_1_*_1> WRT_8240 Error: The external loader [Teradata Mload Loader] does not
support partitioned sessions.
WRITER_1_*_1> Thu Jun 16 11:58:21 2005
WRITER_1_*_1> WRT_8068 Writer initialization failed. Writer terminating.
Informatica Confidential. Do not duplicate. 13 of 23
Tpump is an external loader that supports inserts, updates, upserts and deletes and data driven
updates. Multiples Tpump’s can execute simultaneously against the same table as it doesn’t use
many resource nor does it require table level locks. It is often used to “trickle load” a table. As
stated earlier, it will be a faster way to update a table as opposed to ODBC, but will not be as fast
as the other loaders.
Informatica Confidential. Do not duplicate. 14 of 23
This is a sophisticated bulk load utility and is the primary method PowerCenter uses to load/update
mass quantities of data into Teradata. Unlike bulk load utilities from other vendors, MultiLoad
supports inserts, updates, upserts, delete and data driven operations in PowerCenter. You can also
use variables and embed conditional logic into MultiLoad scripts. It is very fast (millions of rows in a
few minutes). It can be resource intensive and will take a table lock.
Cleaning up after a failed MultiLoad: MultiLoad supports sophisticated error recovery. That is, it allows load
jobs to be restarted without having to redo all of the prior work. However, for the types of problems normally
encountered during a POC (loading null values into a column that does not support nulls, incorrectly formatted
date columns), the error recovery mechanisms tend to get in the way. To learn about MultiLoad’s sophisticated
error recovery read the MultiLoad manual. To learn how to work around the recovery mechanisms to restart a
failed MultiLoad script from scratch, read this section.
Informatica Confidential. Do not duplicate. 15 of 23
MultiLoad puts the target table into the “MultiLoad” state. Upon successful completion, the target
table is returned to the “normal” (non-“MultiLoad”) state. Therefore, when a MultiLoad fails for any
reason, the table is left in “MultiLoad” state, and one cannot simply re-run the same MultiLoad.
MultiLoad will report an error. In addition, MultiLoad also queries the target table’s MultiLoad log
table to see if it contains any errors. If a MultiLoad log table exists for the target table, then you
also will not be able to rerun your MultiLoad job.
To recover from a failed MultiLoad, one must “release” the target table from the “MultiLoad” state
and also drop the MultiLoad log table. One can do this using BTEQ or QueryMan to issue the
drop table mldlog_<table name>;
release mload <table name>;
Note: The “drop table” command assumes that you’re recovering from a MultiLoad script
generated by PowerCenter (PowerCenter always names the MultiLoad log table “mldlog_<table
name>). If you’re working with a hand-coded MultiLoad script, the name of the MultiLoad log table
could be anything.
Here is the actual text from a BTEQ session which cleans up a failed load to the table “td_test”
owned by the user “infatest”:
BTEQ -- Enter your DBC/SQL request or BTEQ command:
drop table infatest.mldlog_td_test;
drop table infatest.mldlog_td_test;
*** Table has been dropped.
*** Total elapsed time was 1 second.
BTEQ -- Enter your DBC/SQL request or BTEQ command:
release mload infatest.td_test;
release mload infatest.td_test;
*** Mload has been released.
*** Total elapsed time was 1 second.
Using one instance of MultiLoad to load multiple tables
MultiLoad is a big consumer of resources on a Teradata system. Some systems will have hard limits
on the number of concurrent MultiLoad sessions allowed. By default, PowerCenter will start an
instance of MultiLoad for every target file. Sometimes, this is illegal (if the multiple instances target
the same table). Other times, it is just expensive. Therefore, a prospect may ask that PowerCenter
use a single instance of MultiLoad to load multiple tables (or to load both inserts and updates into
the same target table). To make this happen, we’re back to heavy editing of the generated
MultiLoad script file. Note: This should not be an issue with Tpump because Tpump is not as
resource intensive as MultiLoad (and a multiple concurrent instances of Tpump can target the same
table). Here’s the workaround:
1) Use a dummy session (i.e. set test rows to 1 and target a test database) to generate
MultiLoad control files for each of the targets.
Informatica Confidential. Do not duplicate. 16 of 23
2) Merge the multiple control files (one per target table) into a single control file (one for all
3) Configure the session to call MultiLoad from a post-session script using the control file
created in step 2. Integrated support cannot be used because each input file is processed
sequentially and this causes problems when combined with PowerCenter’s integrated named
pipes and streaming.
Details on “merging” the control files:
1) There is a single log file for each instance of MultiLoad. Therefore, you do not have to
change or add anything the “LOGFILE” statement. However, you might want to change the
name of the log table since it may be a log that spans multiple tables.
2) Copy the work and error table delete statements into the common control file
3) Modify the “BEGIN MLOAD” statement to specify all the tables that the MultiLoad will be
4) Copy the “Layout” sections into the common control file and give each a unique name.
Organize the file such that all the layout sections are grouped together.
5) Copy the “DML sections into the common control file and give each a unique name.
Organize the file such that all the DML sections are grouped together.
6) Copy the “Import” statements into the common control file and modify them to reflect the
unique names created for the referenced LAYOUT and DML sections created in steps 4) and
5). Organize the file such that all the Import sections are grouped together.
7) Run “chmod –w” on the newly minted control file so PowerCenter doesn’t overwrite it, or,
better yet, name it something different so PowerCenter cannot overwrite it.
8) It’s just that easy!!! Also remember, a single instance of MultiLoad can target at most 5
tables. Therefore, don’t combine more than 5 target files into a common file.
Here’s an example of a control file merged from two default control files:
DROP TABLE infatest.UV_TD_TEST ;
DROP TABLE infatest.WT_TD_TEST ;
DROP TABLE infatest.ET_TD_TEST ;
DROP TABLE infatest.UV_TD_CUSTOMERS ;
DROP TABLE infatest.WT_TD_CUSTOMERS ;
DROP TABLE infatest.ET_TD_CUSTOMERS ;
.ROUTE MESSAGES WITH ECHO TO FILE
.BEGIN IMPORT MLOAD TABLES infatest.TD_TEST, infatest.TD_CUSTOMERS
Informatica Confidential. Do not duplicate. 17 of 23
/* Begin Layout Section */
.Field CUST_KEY 1 CHAR( 12) NULLIF CUST_KEY = '*' ;
.Field CUST_NAME 13 CHAR( 20) NULLIF CUST_NAME = '*' ;
.Field CUST_DATE 33 CHAR( 10) NULLIF CUST_DATE = '*' ;
.Field CUST_DATEmm 33 CHAR( 2) ;
.Field CUST_DATEdd 36 CHAR( 2) ;
.Field CUST_DATEyyyy 39 CHAR( 4) ;
.Field CUST_DATEtd CUST_DATEyyyy||'/'||CUST_DATEmm||'/'||CUST_DATEdd NULLIF
CUST_DATE = '*' ;
.Filler EOL_PAD 43 CHAR( 2) ;
.Field CUSTOMER_KEY 1 CHAR( 12) ;
.Field CUSTOMER_ID 13 CHAR( 12) ;
.Field COMPANY 25 CHAR( 50) NULLIF COMPANY = '*' ;
.Field FIRST_NAME 75 CHAR( 30) NULLIF FIRST_NAME = '*' ;
.Field LAST_NAME 105 CHAR( 30) NULLIF LAST_NAME = '*' ;
.Field ADDRESS1 135 CHAR( 72) NULLIF ADDRESS1 = '*' ;
.Field ADDRESS2 207 CHAR( 72) NULLIF ADDRESS2 = '*' ;
.Field CITY 279 CHAR( 30) NULLIF CITY = '*' ;
.Field STATE 309 CHAR( 2) NULLIF STATE = '*' ;
.Field POSTAL_CODE 311 CHAR( 10) NULLIF POSTAL_CODE = '*' ;
.Field PHONE 321 CHAR( 30) NULLIF PHONE = '*' ;
.Field EMAIL 351 CHAR( 30) NULLIF EMAIL = '*' ;
.Field REC_STATUS 381 CHAR( 1) NULLIF REC_STATUS = '*' ;
.Filler EOL_PAD 382 CHAR( 2) ;
/* End Layout Section */
/* begin DML Section */
.DML Label tagDML1;
INSERT INTO infatest.TD_TEST (
) VALUES (
.DML Label tagDML2;
INSERT INTO infatest.TD_CUSTOMERS (
Informatica Confidential. Do not duplicate. 18 of 23
) VALUES (
/* end DML Section */
/* Begin Import Section */
.Import Infile c:\LOGS\TgtFiles\td_test.out
.Import Infile c:\LOGS\TgtFiles\td_customers.out
/* End Import Section */
Informatica Confidential. Do not duplicate. 19 of 23
Multiple workflows that Multiload to the same table
Since multiload puts a lock on the table, we require that all multiload sessions must handle wait
events so they don't try to access the table simultaneously. Also, any log files should be given
unique names for the same reason.
As the name suggests, this is a very fast utility to load data into Teradata. It is the fastest method
to load data into Teradata. However, there is one major restriction: the target table must be empty.
Informatica Confidential. Do not duplicate. 20 of 23
Teradata Warehouse Builder (TWB)
Teradata Warehouse Builder (TWB) is a single utility that was intended to replace FastLoad,
MultiLoad, Tpump and FastExport. It was to support a single scripting environment with different
“modes”, where each “mode” roughly equates to one of the legacy utilities. It also was to support
parallel loading (i.e. multiple instances of a TWB client could run and load the same table at the
same time – something the legacy loaders cannot do). PowerCenter supports TWB. Unfortunately,
NCR/Teradata does not. TWB has never been formally released (never went “GA”). According to
NCR, the release was delayed primarily because of issues with the mainframe version. If you find a
prospect willing to use TWB, please do. Its ability to support parallel load clients makes some
things quite a bit easier.
Informatica Confidential. Do not duplicate. 21 of 23
PowerCenter 7.1.2 PAM for Teradata
Server Client Software
Platform Version & Version Source Target Comment
PowerCenter uses the following components
of the Teradata Tools and Utilities (TTU):
Teradata ODBC driver, FastLoad, MultiLoad,
and Tpump. The version numbers of each of
the Teradata Client components vary with
release (e.g. TTU7 contains ODBC 3.02,
FastLoad 7.05, MultiLoad 3.03, Tpump 1.07).
The TeradataTools and Utilities (TTU) was
previously called Teradata Utilities
Foundation (TUF). Compatibility between a
particular version of the Teradata RDBMS
TUF 6.1, TTU 6.1.1, and the Teradata Client software is
NCR Teradata Unix and NT V2R4 TTU 6.2 x x determined by Teradata -- not Informatica.
TUF 6.1, TTU 6.1.1,
Unix and NT V2R4.1 TTU 6.2, TTU 7 x x
The Teradata Client/Teradata RDBMS
pairings listed here represent our
understanding based on
Teradata's documentation. Note that the
minimum version number for the Teradata
ODBC driver is 3.00.01.04. Teradata has
made many fixes to the 3.02 ODBC driver. If
you are using this driver, please contact NCR
Unix and NT V2R5 TTU 7 X X support for the latest maintenance release
Unix and NT V2R5.1 TTU 7 x x
Unix and NT V2R6 TTU 8 x x Supported Teradata ODBC driver is v220.127.116.11.
Informatica Confidential. Do not duplicate. 22 of 23
Future Support of Teradata:
FastExport is just as it’s name implies, that is a utility to extract data from Teradata very quickly. It
will be supported in PowerCenter 7.1.3 for Teradata sources. It should extract large table used for
lookups quicker that ODBC.
The is also commonly referred to as ELT. The Zeus release of PowerCenter will have the ability to
create SQL that will execute in the Teradata database server that will replace certain transformation
that would normally run in the PowerCenter Server. This is critical to Teradata as it is common for
the Source and Target to reside in the same database.
Informatica Confidential. Do not duplicate. 23 of 23