expressor Documentation

Table of Contents
What's New in expressor Version 3.4 RC1............................................................................................................................................................ 1

    expressor 3.4 ................................................................................................................................................................................................................ 1

    Lookup Tables ............................................................................................................................................................................................................. 1

    Simplified Data Propagation ................................................................................................................................................................................. 1

    New Configuration Parameters ............................................................................................................................................................................ 1

    Multi-Test Filter Operator ....................................................................................................................................................................................... 2

    Native Database Merge Support ......................................................................................................................................................................... 2

    Pivot Row and Pivot Column Operators ........................................................................................................................................................... 2

    Multi-Step Dataflows ................................................................................................................................................................................................ 3

    Persistent Values ........................................................................................................................................................................................................ 3

    Desktop Edition........................................................................................................................................................................................................... 3

    Free Studio Download ............................................................................................................................................................................................. 3

    Community Center .................................................................................................................................................................................................... 3

Key Concepts in expressor software ....................................................................................................................................................................... 5

    Reusability ..................................................................................................................................................................................................................... 5

    Semantic Types ........................................................................................................................................................................................................... 7

    Transformations .......................................................................................................................................................................................................... 7

    Studio Explorer ............................................................................................................................................................................................................ 9

    Integrated Source Control ................................................................................................................................................................................... 10

    Deployment ............................................................................................................................................................................................................... 10

Databases Supported ................................................................................................................................................................................................. 13

Product Installation ..................................................................................................................................................................................................... 15

    Install Free expressor Studio .............................................................................................................................................................................. 15

    Install expressor Standard Edition.................................................................................................................................................................... 17

    Install expressor Desktop Edition ..................................................................................................................................................................... 29

    Remove expressor Licenses ................................................................................................................................................................................ 31

Getting Started with Studio ..................................................................................................................................................................................... 33

    Studio Basics ............................................................................................................................................................................................................. 33

    Studio Interface ........................................................................................................................................................................................................ 33







     Workspace Explorer ............................................................................................................................................................................................... 35

     Samples and Solutions .......................................................................................................................................................................................... 36

Workspaces, Projects, and Libraries ..................................................................................................................................................................... 37

     Workspaces, Projects, and Libraries ................................................................................................................................................................ 37

     Create a New Workspace..................................................................................................................................................................................... 38

     Open or close a workspace ................................................................................................................................................................................. 40

     Convert a Standalone to a Repository Workspace ................................................................................................................................... 41

     Create a Project........................................................................................................................................................................................................ 41

     Create a Library ........................................................................................................................................................................................................ 42

     Associate a Library with a Project..................................................................................................................................................................... 43

     Export Projects ......................................................................................................................................................................................................... 43

     Import Projects ......................................................................................................................................................................................................... 44

     Manage Artifacts ..................................................................................................................................................................................................... 45

Dataflows ......................................................................................................................................................................................................................... 49

     Dataflows .................................................................................................................................................................................................................... 49

     Create a New Dataflow ......................................................................................................................................................................................... 54

     Build Dataflows with Operators ........................................................................................................................................................................ 55

     Configure Operators .............................................................................................................................................................................................. 59

     Run a Dataflow ......................................................................................................................................................................................................... 64

     Manage Log File Output ...................................................................................................................................................................................... 65

     Debug a Dataflow ................................................................................................................................................................................................... 66

     Reference: Propagation Rules ............................................................................................................................................................................ 66

Operators ........................................................................................................................................................................................................................ 69

     Aggregate Operator............................................................................................................................................................................................... 69

     Buffer Operator ........................................................................................................................................................................................................ 74

     Copy Operator .......................................................................................................................................................................................................... 75

     Filter Operator .......................................................................................................................................................................................................... 75

     Funnel Operator....................................................................................................................................................................................................... 77

     Join Operator ............................................................................................................................................................................................................ 78

     Pivot Column Operator......................................................................................................................................................................................... 80

     Pivot Row Operator ................................................................................................................................................................................................ 82






    Read Custom Operator ......................................................................................................................................................................................... 84

    Read File Operator.................................................................................................................................................................................................. 91

    Read Lookup Table Operator ............................................................................................................................................................................. 92

    Read Table Operator ............................................................................................................................................................................................. 92

    Sort Operator ............................................................................................................................................................................................................ 94

    SQL Query .................................................................................................................................................................................................................. 94

    Transform Operator ............................................................................................................................................................................................... 95

    Trash Operator ......................................................................................................................................................................................................... 98

    Unique Operator ..................................................................................................................................................................................................... 98

    Write Custom Operator ...................................................................................................................................................................................... 101

    Write File Operator ............................................................................................................................................................................................... 105

    Write Lookup Table Operator .......................................................................................................................................................................... 106

    Write Table Operator ........................................................................................................................................................................................... 106

Operator Templates .................................................................................................................................................................................................. 111

    Operator Templates ............................................................................................................................................................................................. 111

    Create Operator Templates .............................................................................................................................................................................. 111

Pivot Editor ................................................................................................................................................................................................................... 113

    What is a Pivot?...................................................................................................................................................................................................... 113

    Create a Pivot.......................................................................................................................................................................................................... 114

Rules Editor .................................................................................................................................................................................................................. 117

    Transformations ..................................................................................................................................................................................................... 117

    Learning to Write Datascripts .......................................................................................................................................................................... 118

    Mapping Input to Output Attributes ............................................................................................................................................................ 124

    Write Rules ............................................................................................................................................................................................................... 128

expressor Datascript language............................................................................................................................................................................. 141

    The Datascript language .................................................................................................................................................................................... 141

    expressor Datascript Pattern Matching ....................................................................................................................................................... 143

    Lexical conventions .............................................................................................................................................................................................. 145

    Values and Types................................................................................................................................................................................................... 147

    Variables.................................................................................................................................................................................................................... 150

    Statements ............................................................................................................................................................................................................... 150






     Expressions .............................................................................................................................................................................................................. 154

     Visibility rules .......................................................................................................................................................................................................... 162

     Scope of variables in an expressor datascript ........................................................................................................................................... 163

     Multiple assignment statements .................................................................................................................................................................... 164

     Relational and logical operators ..................................................................................................................................................................... 165

     Write and use functions ..................................................................................................................................................................................... 167

     Datascript tables .................................................................................................................................................................................................... 170

     Object oriented programming ........................................................................................................................................................................ 175

     Call external scripts from an expressor datascript .................................................................................................................................. 176

     Use Runtime Parameters in expressor Datascript ................................................................................................................................... 179

expressor functions ................................................................................................................................................................................................... 181

     Datascript functions ............................................................................................................................................................................................. 181

     basic functions ........................................................................................................................................................................................................ 182

     bit functions............................................................................................................................................................................................................. 192

     byte functions ......................................................................................................................................................................................................... 199

     datetime functions ................................................................................................................................................................................................ 201

     decimal functions .................................................................................................................................................................................................. 210

     expressor Operator helper functions ............................................................................................................................................................ 218

     expressor runtime parameters......................................................................................................................................................................... 228

     The initialize and finalize functions ............................................................................................................................................................... 228

     is functions ............................................................................................................................................................................................................... 231

     logging functions .................................................................................................................................................................................................. 239

     lookup functions .................................................................................................................................................................................................... 241

     math functions ....................................................................................................................................................................................................... 246

     string functions ...................................................................................................................................................................................................... 260

     table functions ........................................................................................................................................................................................................ 283

     ustring functions .................................................................................................................................................................................................... 286

     utility functions....................................................................................................................................................................................................... 300

Connections ................................................................................................................................................................................................................. 311

     Connections ............................................................................................................................................................................................................. 311

     Create a Connection for a file-based resource ......................................................................................................................................... 312






    Create a Connection for a Database ............................................................................................................................................................. 313

    Change the Directory Path in a File Connection ...................................................................................................................................... 316

    Change a DSN Connection ............................................................................................................................................................................... 316

    Change a Provider Connection ....................................................................................................................................................................... 317

Schema ........................................................................................................................................................................................................................... 319

    Schema ...................................................................................................................................................................................................................... 319

    Create Delimited Schema .................................................................................................................................................................................. 323

    Create a Schema for a Database resource.................................................................................................................................................. 326

    Create an SQL Query Schema .......................................................................................................................................................................... 329

    Create a Schema for a Rejected Record ...................................................................................................................................................... 330

    Change Schema delimiters and fields .......................................................................................................................................................... 331

    Change a Table Schema ..................................................................................................................................................................................... 332

    Change SQL Query Schema .............................................................................................................................................................................. 334

    Map Schema Fields to Composite Type Attributes ................................................................................................................................ 336

    Change Data Format for Mapping ................................................................................................................................................................. 338

    Change Schema Mappings to Composite Types ..................................................................................................................................... 358

    Change Composite Type Attributes for a Schema .................................................................................................................................. 360

Parameter Management ......................................................................................................................................................................................... 363

    What Are Parameters? ........................................................................................................................................................................................ 363

    Substitute Parameters in a Dataflow ............................................................................................................................................................. 363

    Supported Parameters ........................................................................................................................................................................................ 365

Semantic Types ........................................................................................................................................................................................................... 369

    Semantic Types ...................................................................................................................................................................................................... 369

    Create an Atomic Type ....................................................................................................................................................................................... 372

    Create a Composite Type .................................................................................................................................................................................. 372

    Select the Data Type and Constraints for an Atomic Type .................................................................................................................. 373

    Add Attributes to a Composite Type ............................................................................................................................................................ 373

    Map Types in the Rules Editor ......................................................................................................................................................................... 375

    Change Semantic Types ..................................................................................................................................................................................... 376

    Reference: Constraint Pattern Matching ..................................................................................................................................................... 381

    Reference: Constraints on Semantic Types ................................................................................................................................................ 383



expressor Documentation



    Reference: Constraint Corrective Actions ................................................................................................................................................... 388

Datascript Modules ................................................................................................................................................................................................... 393

    Datascript Modules .............................................................................................................................................................................................. 393

    Create a Datascript Module .............................................................................................................................................................................. 394

    Include a Datascript Module in a Datascript Operator ......................................................................................................................... 396

    Reference: Datascript Module Editing .......................................................................................................................................................... 397

Lookup Tables ............................................................................................................................................................................................................. 399

    What is a Lookup Table? .................................................................................................................................................................................... 399

    Define the Structure of a Lookup Table ...................................................................................................................................................... 400

    Populate a Lookup Table ................................................................................................................................................................................... 401

    Use Data in a Lookup Table .............................................................................................................................................................................. 402

    Read Data in a Lookup Table ........................................................................................................................................................................... 403

Deployment Packages ............................................................................................................................................................................................. 405

    Deployment Packages ......................................................................................................................................................................................... 405

    Create a New Deployment Package .............................................................................................................................................................. 405

    Build a Deployment Package ........................................................................................................................................................................... 406

    Update Artifacts in a Deployment Package ............................................................................................................................................... 407

    Manage External Files.......................................................................................................................................................................................... 409

    Run a Compiled Dataflow .................................................................................................................................................................................. 410

    Deploying Packages Manually ......................................................................................................................................................................... 410

Samples and Solutions ............................................................................................................................................................................................ 413

    Build and Deploy an expressor Application ............................................................................................................................................... 413

    Connect to Data Services with Datascript Operators ............................................................................................................................. 430

Using expressor Engine ........................................................................................................................................................................................... 435

    Running Dataflows with the expressor Engine ......................................................................................................................................... 435

    Check Out a Project ............................................................................................................................................................................................. 435

    Run a Dataflow in a Deployment Package ................................................................................................................................................. 436

    Run a Compiled Dataflow .................................................................................................................................................................................. 436

Managing and Using expressor Repository.................................................................................................................................................... 437

    Storing Artifacts in the Repository................................................................................................................................................................. 437

    Set User Access to a Repository ..................................................................................................................................................................... 437






    Backup and Restore Repository Data ........................................................................................................................................................... 438

    Check Artifacts Out of a Repository .............................................................................................................................................................. 439

    Manage Changes to Projects and Artifacts Checked Out of a Repository ................................................................................... 440

    Lock Artifacts in a Repository .......................................................................................................................................................................... 442

    Resolve Conflicts between Versions of Projects or Artifacts ............................................................................................................... 443

    Change Repository Properties ......................................................................................................................................................................... 444

    Change Repository Credentials ....................................................................................................................................................................... 445

Command Line Utilities ........................................................................................................................................................................................... 447

    ecommand ............................................................................................................................................................................................................... 447

    eflowsubst ................................................................................................................................................................................................................ 447

    ekill .............................................................................................................................................................................................................................. 448

    elicense ...................................................................................................................................................................................................................... 449

    eproject ..................................................................................................................................................................................................................... 449

    datascript .................................................................................................................................................................................................................. 451

    etask............................................................................................................................................................................................................................ 453

Glossary .......................................................................................................................................................................................................................... 463




What's New in expressor Version 3.4 RC1

Product Version:                        expressor 3.4 RC1


Documentation Revision Date:            September 16, 2011



expressor 3.4
The expressor 3.4 data integration platform includes new capabilities focused on three areas: enhancements to
support data migration and data warehousing use cases, dataflow usability, and deployment. With this release, users
will find new tools and functionality to address more sophisticated data integration projects and deployments.


Lookup Tables
Lookup Tables enable expressor data integration application developers to access data records using simple lookup
rules within a dataflow. Lookup Tables are described by easy-to-share reusable metadata artifacts and are treated like
any other source or target. They are easily referenced using the Lookup Rules in any expressor transformation
operator. The new Read and Write Lookup Table operators allow Lookup Tables to be read, initialized, or updated at
any step in a dataflow.

Lookup Table operations are also exposed through Datascript, giving users complete control over how Lookup Table
records are found, used, and updated.
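As an illustrative model only — the class and method names below are assumptions, not the actual Datascript Lookup Table API — a Lookup Table behaves like a keyed store that operators can populate and query:

```python
# Lookup Table sketch: a keyed store that dataflow operators can
# initialize, update, and query with simple lookup rules.

class LookupTable:
    """In-memory lookup table keyed on one or more fields."""

    def __init__(self, key_fields):
        self.key_fields = key_fields
        self.rows = {}

    def _key(self, record):
        return tuple(record[f] for f in self.key_fields)

    def write(self, record):
        # Initialize or update the entry for this key.
        self.rows[self._key(record)] = record

    def find(self, **keys):
        # Return the matching record, or None if absent.
        return self.rows.get(tuple(keys[f] for f in self.key_fields))

# Populate the table, then enrich incoming records via a lookup rule.
countries = LookupTable(key_fields=["code"])
countries.write({"code": "US", "name": "United States"})
countries.write({"code": "DE", "name": "Germany"})

match = countries.find(code="DE")  # {"code": "DE", "name": "Germany"}
```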


Simplified Data Propagation
The new Rules Editor replaces the Transform Editor. The Rules Editor's simplified interface enables users to
"compartmentalize" and visualize transformation rules while insulating them from changes to the names of upstream
input attributes and downstream output attributes. Attribute propagation is a new built-in capability that
automatically manages the flow of data through such operators. With this capability, users no longer need to
explicitly manage an "output Composite Type" for interior operators: all input attributes are automatically
"transferred" to the output, and any "unused" outputs are automatically eliminated. All transformation operators
automatically upgrade to the new Rules Editor, so existing applications can take advantage of these usability
enhancements seamlessly.

    Note: Dataflows created prior to expressor Version 3.4 should be backed up before opening them in
              the Version 3.4 Studio. There are known issues that could cause the upgrade of a dataflow to
              fail and possibly corrupt it.


New Configuration Parameters
Parameterization dramatically improves deployment to Development/Test/Production servers. This new capability
allows users to easily change the source and target databases for the same dataflow after it has already been
deployed to different servers. With this capability, it is no longer necessary to create Connection artifacts for each
server or multiple Dataflows to use the appropriate Connection artifact on each server. Instead, users create only one







Database (or File) Connection, use it on the relevant source and target operators in the Dataflow, create one
Deployment Package, and then deploy that one Package to multiple servers.

Once an application is deployed, specific properties in the deployment such as the server name, credentials, and other
properties that are likely to change between Development, Test, and Production, can be overridden.

Along with the Deployment Packages artifact and the simplicity of the expressor Engine command line,
Parameterization in version 3.4 further enhances the deployment capabilities of your expressor applications. This
capability is only available in the expressor Standard Edition.


Multi-Test Filter Operator
The Filter Operator can now process multiple boolean tests and output positive results on multiple ports instead of
just one. The new Rules Editor simplifies the writing of rules for each test by mapping each rule to a separate output
port. The Filter operator can be configured either to execute every defined Filter test and emit a corresponding
output record for each successful test, or to stop executing Filter tests after the first successful test.
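The behavior can be modeled with a small sketch. The function and test names below are illustrative assumptions, not the operator's actual interface:

```python
# Sketch of a multi-test filter: each rule maps to its own output port.
# first_match_only mirrors the "stop after the first successful test" setting.

def multi_test_filter(record, tests, first_match_only=False):
    """Return the list of port indexes whose test accepts the record."""
    ports = []
    for port, test in enumerate(tests):
        if test(record):
            ports.append(port)
            if first_match_only:
                break
    return ports

tests = [
    lambda r: r["amount"] > 1000,   # port 0: large orders
    lambda r: r["region"] == "EU",  # port 1: European orders
]

record = {"amount": 2500, "region": "EU"}
all_ports = multi_test_filter(record, tests)         # [0, 1]
first_port = multi_test_filter(record, tests, True)  # [0]
```

In "execute every test" mode the record is emitted on both ports; in "first match" mode it is emitted only on port 0.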


Native Database Merge Support
The Write Table operator now supports "merge" mode for applications that must update existing rows or add new
rows to a table without deleting existing rows. This capability is available for select databases that provide a native
merge function, including Microsoft SQL Server, Oracle, MySQL, Teradata, and DB2.
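A minimal model of merge (upsert) semantics, using a Python dict in place of a database table — illustrative only, since the real work is done by the database's native merge statement:

```python
# Merge semantics: update existing rows by key, insert new ones,
# and never delete rows that are already in the table.

def merge_rows(table, incoming, key):
    for row in incoming:
        # Existing fields are overwritten by incoming ones; new keys are inserted.
        table[row[key]] = {**table.get(row[key], {}), **row}
    return table

table = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 3}}
merge_rows(table, [{"id": 2, "qty": 7}, {"id": 3, "qty": 1}], key="id")
# id 2 is updated, id 3 is inserted, id 1 is untouched
```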


Pivot Row and Pivot Column Operators
The new Pivot Row and Pivot Column operators allow users to easily transpose one input record to many output
records and vice versa. For example, using the Pivot Row operator, you can convert each input record that includes
one field for each month of the year:


ID        January         February        March           ...
6712399   $1,976,123.00   $1,854,236.00   $2,010,442.89   ...

into multiple records, one for each month of the year:

ID        Month      Total           ...
6712399   January    $1,976,123.00   ...
6712399   February   $1,854,236.00   ...
6712399   March      $2,010,442.89   ...
...       ...        ...             ...
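The row pivot above can be sketched as follows (an illustrative model, not the operator's implementation; the field names are taken from the example):

```python
# Pivot Row sketch: one wide input record becomes one narrow output
# record per pivoted field.

def pivot_row(record, id_field, pivot_fields, name_field, value_field):
    return [
        {id_field: record[id_field], name_field: f, value_field: record[f]}
        for f in pivot_fields
    ]

wide = {"ID": 6712399, "January": 1976123.00, "February": 1854236.00}
rows = pivot_row(wide, "ID", ["January", "February"], "Month", "Total")
# → [{"ID": 6712399, "Month": "January", "Total": 1976123.0}, ...]
```

The Pivot Column operator performs the inverse transformation, collapsing many narrow records back into one wide record.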







Multi-Step Dataflows
Multi-step dataflows provide sequential processing of separate but related data processing tasks. The
multi-step dataflow functionality removes the need to use separate dataflows to accomplish related but sequential
data processing, and it provides an easy-to-use mechanism for "staging" data processing or initializing parameters
such as the properties of source and target operators.

Prior to expressor 3.4, each Dataflow allowed only one "step". In expressor 3.4, each Dataflow can be
composed of one or more "steps", where each step is executed in succession.
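The sequencing can be sketched as follows; the step functions are hypothetical stand-ins, not expressor operators:

```python
# Multi-step dataflow sketch: steps run in succession, each consuming the
# context produced by the previous step.

def run_dataflow(steps, context):
    for step in steps:
        context = step(context)
    return context

def stage_data(ctx):
    # Step 1: "stage" the raw values (here, a trivial transformation).
    ctx["staged"] = [r * 2 for r in ctx["raw"]]
    return ctx

def load_data(ctx):
    # Step 2: consume the staged output of step 1.
    ctx["loaded"] = sum(ctx["staged"])
    return ctx

result = run_dataflow([stage_data, load_data], {"raw": [1, 2, 3]})
# result["loaded"] == 12
```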


Persistent Values
Operational values can be made to persist across different runs of a dataflow, and these persistent values can also be
visible to separate dataflows. Function calls written in datascript operators such as Transform and Write Custom store
variables during the execution of a dataflow; those values persist and can be recalled during subsequent
executions of the same dataflow or by other dataflows.
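As an illustrative sketch only (not the actual Datascript persistence API), persistent values behave like state saved to shared storage at the end of one run and reloaded at the start of the next:

```python
# Persistent-value sketch: state written at the end of one run is
# reloaded at the start of the next, even by a different dataflow.
import json
import os
import tempfile

STORE = os.path.join(tempfile.gettempdir(), "dataflow_state.json")

def load_values():
    if os.path.exists(STORE):
        with open(STORE) as f:
            return json.load(f)
    return {}

def save_values(values):
    with open(STORE, "w") as f:
        json.dump(values, f)

# Run 1: remember a high-water mark at the end of the run.
state = load_values()
state["last_id"] = 6712399
save_values(state)

# Run 2 (or another dataflow): read the persisted value back.
restored = load_values()["last_id"]
```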


Desktop Edition
The expressor Desktop Edition provides a license to create deployment packages and run them from the command
line using the etask command on desktop systems, Windows 7 and Windows XP.


Free Studio Download
The free download version of expressor Studio no longer requires a license to run. Users can, however, install an
expressor Engine or Repository license in the free download version of Studio to allow it to interact with the Engine
and Repository. Both 30-day trial and full-term licenses can be installed in Studio. If those licenses expire, Studio
reverts to running as a free, standalone version.


Community Center
For the expressor 3.4 release, the community web site has been enhanced to support its fast-growing user base.
The community site offers new and updated knowledge base articles to enhance the user experience and productivity
with Version 3.4.

The Community Center continues to offer blogs, forums, tutorials, and online documentation. It is also the center for
free community support of the expressor 3.4 version and subsequent releases.

Visit the community web site to review the version 3.4 Release Notes, which list bugs that have been fixed in
version 3.4 as well as any outstanding issues.




Key Concepts in expressor software

Reusability

Semantic Types

Transformations

Studio Explorer

Integrated Source Control

Deployment


Reusability
Reusability is at the core of expressor software. The metadata model at the heart of all expressor applications makes
sharing and reuse an integral part of the development process. As a result, application development is simple and
efficient.

Data integration applications consist of artifacts such as Dataflows, Schema, Semantic Types, and Connections that
can all be shared across applications. Many, such as Schema, Semantic Types, and Connections, can be reused within
a single application to eliminate redundant effort.

Semantic Types, in particular, offer a new level of reusability in data integration application development, especially in
data mapping.

expressor Studio also provides for creating templates from configured Dataflow Operators. An Operator Template
can be reused in the same or a different Dataflow without configuring it from scratch.

Projects and Libraries
Projects and Libraries are the system containers where the artifacts reside. Projects contain all the artifacts required
for a data integration application. Libraries are designed to contain artifacts that can be shared across Projects, so
that the same artifacts do not have to be created multiple times or stored separately for each application. Libraries
make sharing and reuse simple. Libraries cannot, however, be deployed independently to run artifacts. The artifacts
they contain are deployed as part of the Project in which they are used. Libraries are linked to Projects with Library
References.

Users manage Projects and Libraries on their local machines in Workspaces.

Artifacts
Of the artifacts that comprise an expressor Studio application, the Dataflow is the centerpiece because it contains
the ETL (Extract, Transform, Load) flow of the data integration. Dataflows use Connection artifacts to specify the
location and type of the resources they will read in to transform and write out to a data store. They use Schema
artifacts to specify the metadata read into or written from the application, and they use Semantic Types to normalize
the Schema from different sources for unified processing in the application.







Operator Templates are copies of operators that have been configured and saved for reuse. Datascript Modules are
collections of datascript functions, written independently of specific operator transformations, that can be referenced
by multiple operators.

Deployment Packages compile dataflows and their related artifacts into packages that can be deployed; see
Deployment below. External Files are any files used in an application that must be included in Deployment Packages.


The adjoining screen illustrates a workspace with multiple Projects, a Library, multiple artifacts, and, in the center
panel, the Dataflow for a simple data integration application. The Properties tab on the right displays the properties
of the Write File operator. (Throughout the Studio documentation, icons like the one below are used to indicate that
a full-screen image is viewable by clicking on the icon.)




Top-down and Bottom-up Development
Because the artifacts are independent and can be used in multiple applications, development can start at the top with
the Dataflow and build the Schema, Connection, and Semantic Types the application needs. Alternatively, the Schema
and other artifacts can be created first and Dataflows then built on top of them. For example, the Schema for a
particular data source can be defined, a Connection artifact for the location of that data source built, and both then
used in multiple Dataflows, in the same project or in different projects.

Even the Dataflows can be reused and additional Schema, Connections, and Semantic Types built to adapt the
dataflow to different data. Dataflows can be shared across multiple projects and applications, though other artifacts
that they require from within the first project must be moved along with them. Once a dataflow or any of the artifacts
exist, they can be reused to support new applications.








Semantic Types
The expressor software architecture includes a new type system known as "Semantic Types." This type system
allows you to build smarter data processing applications by letting you define "smart" types that include
constraints and well-defined rules about how they may or may not be transformed. Applications can thus
ensure data consistency and integrity early in the flow and rely on the data represented by the type
elsewhere in the application.

Semantic Types are used to capture important characteristics of external data while also providing an important
common model for reusable source/target mappings. This simple capability allows a developer to use a single string
type, named for example "SocialSecurityNumber," to represent a Social Security number in two different database
columns of differing names, such as SSNO and ssid, and differing types, such as varchar and integer. After defining
the new type once, you only need to associate this Semantic Type with the data being read or written, and the
expressor system does the rest, including automatic type conversions and validations.

The intelligence of Semantic Types is further enhanced by their reusability. As new applications are built, previously
defined Semantic Types can easily be reused, thus ensuring that the organization's data rules can easily be
propagated from application to application.

There are two kinds of Semantic Types: Atomic and Composite. An Atomic Type specifies the type for an attribute in
a Composite Type. A Composite Type defines a structure of one or more attributes whose type can be an Atomic
Type. Atomic Types specify the primitive data type and other constraints such as length, maximum length, total
digits, and fraction digits. An Atomic Type can be reused for multiple attributes in the same or other Composite
Types. When mapping to Schema, one Atomic Type can be used for multiple fields in the Schema. For example,
fields such as employee number and manager number may be distinct, but the semantics of their usage or type can
be the same. Processing them as a single Semantic Type simplifies the application, ensures that they adhere to the
same constraints, and emphasizes their semantic equivalence.

Constraints set on Atomic Types provide a powerful method for cleansing external data as it enters a dataflow and for
maintaining data integrity as it is transformed during dataflow processing. Data that violates constraints can be
handled with predetermined Corrective Actions, which are set by the user when defining an Atomic Type.
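A toy model of an Atomic Type with a pattern constraint and a corrective action — the class and method names are illustrative assumptions, not expressor's actual type system:

```python
# Atomic Type sketch: a pattern constraint plus an optional corrective
# action that is applied when a value violates the constraint.
import re

class AtomicType:
    def __init__(self, pattern, corrective=None):
        self.pattern = re.compile(pattern)
        self.corrective = corrective  # called on constraint violations

    def validate(self, value):
        if self.pattern.fullmatch(value):
            return value
        if self.corrective:
            return self.corrective(value)
        raise ValueError(f"constraint violation: {value!r}")

# A "SocialSecurityNumber" type: nine digits; the corrective action
# strips dashes from incoming values.
ssn = AtomicType(r"\d{9}", corrective=lambda v: v.replace("-", ""))

ok = ssn.validate("123456789")       # passes as-is
fixed = ssn.validate("123-45-6789")  # corrected to "123456789"
```

The same type instance can then be associated with any field that carries a Social Security number, regardless of the field's name or native database type.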

Although the expressor system is built on this new types model, by default you do not have to actively define or
manage types. During normal usage, the expressor system automatically harvests types based on the metadata
structures of the sources and targets. Users decide when particular types should be modified and when they should
be "shared" for reuse.




Transformations
At their simplest, data integration applications map input data to output data and specify transformation functions for
individual data fields. Depending on the requirements of the application, the mapping and transformation tasks can
range from being trivial to complex. In the trivial case, the input and output is a direct match, and beyond a simple
copy from input to output, only a small handful of trivial transformations must be specified. In the complex case, the







mapping from input to output may not be direct, such as when some inputs are dropped, some new outputs are
added, and when multiple inputs can contribute to the value of a single output.

The expressor Rules Editor is a key interface designed to accommodate both the simple and complex use cases. The
Rules Editor enables you to visually define the mapping between source and target data fields. When building an ETL
application, it is common that the output type of a transformation must be defined or modified explicitly. In the event
that the output of the transformation must be built or modified from within the Rules Editor, you have access to the
same powerful yet easy-to-use editor that you would use to define types elsewhere in the system.


The point-and-click expression builder provides an easy-to-use interface for defining simple transformation rules.
More complex transformations can be written with Datascript, expressor's powerful data transformation language.
While the Rules Editor does not require knowledge of Datascript, its interface simplifies the construction of
Datascript functions with the ribbon bar and typing-completion feature.


Datascript can also be used in custom read and write operators to build data adapters for reading and writing data
from/to custom data sources such as Web services.








Studio Explorer
The expressor Studio is the environment in which the operations described above are carried out. It enables the
creation of all expressor Artifacts, provides a platform on which to run an application, and allows for the
management of projects and artifacts.

Management takes place in the Explorer. The Explorer panel visible at any particular time contains all the projects and
artifacts of a single workspace. Like the rest of the Studio, Explorer employs many of the common graphical user
interface tools, such as drag-and-drop and point-and-click. Objects in the Explorer can be moved by dragging
and dropping, and other operations are performed by pointing to an object and using its right-click menu.

The Explorer can display workspace components in several ways. All projects and libraries can be displayed, a single
project can be displayed, and artifacts can be displayed in groups.




Explorer offers a convenient impact analysis search capability that enables users to find artifacts that may be impacted
by changes to other artifacts. Explorer's search functionality also allows for artifacts to be found by name pattern
match.

A key feature in expressor Studio is integrated online help. Most of the information you need to perform tasks,
including identifying objects such as operators and icons and filling out forms, is available directly in the Studio
interface. You do not have to launch a separate help window to see the basic information you need; that
information is displayed automatically in the context where it is relevant.

The most prominent integrated help component is the Task panel. This panel is displayed on the right side of the
Studio window. It contains brief explanations, links that initiate action directly, and links to the full online help system
displayed in a separate window when necessary.








Another key component of the integrated online help is the operator properties panel. When an operator in a
Dataflow is selected, the fields required to configure its properties are displayed in the right panel. As each property
field is selected, a description of that field is displayed in the top portion of the panel.




Tool tips are available for icons, buttons, and fields in the Studio interface.


Integrated Source Control
The expressor Repository provides a fully-functional revision control system based on the open source standard
Subversion. The Repository provides a wide variety of source control operations including branching, locking, conflict
resolution, and access to revision history. These operations are directly integrated into the Studio Explorer.

Integrated source control facilitates sharing and reusability. Artifacts can easily be checked out and managed by
multiple team members, and they can be ported from one project to another.

The Repository also makes it easier to move the applications developed in Studio into a deployment environment
because it provides centralized storage and ongoing version management.


Deployment
The vehicle for deploying applications is the Deployment Package, which is a first-class Project artifact that compiles
Dataflows so that they can be run on a deployment system. Deployment Packages are created in Studio just like other
artifacts, and are managed in the same way with the Studio Explorer. Deployment Packages can be moved manually







from a Studio environment or, more conveniently and with better control, they can be checked out of the Repository
into the deployment environment.

expressor Engine constitutes the deployment environment; it runs the compiled Dataflows in a Deployment
Package. The Engine is a runtime application that can be installed independently of Studio on a Windows Server.
Dataflows in a Deployment Package are run with a utility named "etask.exe," which can be called from a command
line or configured to run from a scheduler.
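For example, a compiled Dataflow might be run once from a command line and then scheduled with the standard Windows schtasks utility. The installation path, package name, and etask.exe arguments below are illustrative assumptions, not documented syntax; consult the Engine reference for the actual command-line options:

```shell
REM Run a packaged Dataflow once from the command line.
REM The path and the argument are assumptions for illustration only.
"C:\Program Files\expressor\expressor3\Engine\etask.exe" MyDeploymentPackage

REM Schedule the same command to run nightly at 2:00 AM
REM using the standard Windows schtasks utility.
schtasks /create /tn "expressor Nightly Dataflow" ^
         /tr "\"C:\Program Files\expressor\expressor3\Engine\etask.exe\" MyDeploymentPackage" ^
         /sc daily /st 02:00
```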

Because the Engine core is built into Studio, program validation can be performed during compilation in Studio,
avoiding costly troubleshooting errors on the server. And when Repository stores the compiled applications under
version control, it provides a strong construct for managing compliance of the deployment environment.




Databases Supported
The database connectors in expressor support connectivity through database-specific drivers and through ODBC
drivers. The certified versions and driver requirements are described below:


   RDBMS or Data source            Certified Versions*   Client Driver Support

   IBM DB2                         7, 8, or 9            Drivers provided by expressor

   Informix                        11.5                  Drivers provided by expressor

   Microsoft SQL Server            2005, 2008            Drivers provided by expressor

   MySQL Community Edition         5.1                   Customer must provide and install the official MySQL
                                                         5.1 ODBC driver from MySQL (Oracle).

                                                         When creating a Connection using a MySQL Community
                                                         Edition driver, expressor will attempt to find the
                                                         driver with this name: MySQL ODBC 5.1 Driver

   MySQL Enterprise Edition        5.1                   Drivers provided by expressor

   Netezza                         4.6, 5.0, 6.0         Customer must provide and install the official
                                                         Netezza ODBC drivers from Netezza (IBM).

                                                         When creating a Connection using a Netezza driver,
                                                         expressor will attempt to find the driver with this
                                                         name: NetezzaSQL

   Oracle                          8, 9, 10, or 11       Drivers provided by expressor

   PostgreSQL                      8.4                   Drivers provided by expressor

   Sybase                          11, 12, or 15         Drivers provided by expressor

   Teradata                        12 and 13             Drivers provided by expressor, but the customer
                                                         must also install the Teradata Client per
                                                         installation instructions.

   Other through ODBC                                    Customer must provide and install an ODBC driver.
   (e.g., Microsoft Excel)
                                                         ODBC Driver Requirements: at a minimum, the ODBC
                                                         driver must support the ODBC 3.5 API Specification.
                                                         Note that not all drivers support all of the
                                                         functionality and capabilities described in the
                                                         specification and exposed by expressor, so
                                                         expressor's database connectivity will only be as
                                                         functional as the driver allows.

* Certified versions are those versions that have been certified to work with expressor. While the drivers may
support additional versions of the listed RDBMS, their utility and reliability are unknown.




Product Installation

Install Free expressor Studio

Installation Requirements


Install the Free Studio


Uninstall Studio




Installation Requirements
expressor Studio can be installed onto computers with the following operating systems.

        Microsoft Windows 7 Professional

        Microsoft Windows XP Professional, SP3 or higher

        Microsoft Windows Server 2003 SP2

        Microsoft Windows Server 2008 R2

    Note: expressor Studio requires the following hotfix to run on Windows XP. If you receive a message
              during installation that indicates the hotfix is not installed on your system, use the following URL
              to get the hotfix:

    http://support.microsoft.com/kb/943326

    All of the supported platforms require the following hotfix. If you receive a message during installation
              that indicates the hotfix is not installed on your system, use the following URL to get the hotfix:

    http://support.microsoft.com/kb/967328

Systems require:

        2 GB of RAM

        345 MB of disk space

        .NET 3.5 or higher

        If the Microsoft .NET framework is not installed, it will be installed automatically during the expressor Studio
        installation.

        Monitors for expressor Studio must be able to display 1280 x 800 resolution.

VMware images with these operating systems may also be used.

In order to install expressor Studio, you must be an administrator on the target computer.







Install the Free Studio
During the installation procedure, your computer must have an Internet connection. Once the software is installed,
you may work either online or offline.

     1.   Download the expressorStudioInstaller.exe installation application from the expressor Community
          web site:
          http://expressor-community.com/

     2.   Run the installation application, expressorStudioInstaller.exe.
          If necessary, the application will download prerequisite software (e.g., updates to the .NET framework) and
          then install the expressor Studio application.

You do not need a license to run the free version of Studio. If you install expressor Engine and/or Repository,
install one of the licenses for those components on Studio to activate the functionality that enables Studio to
interact with those components.




Uninstall Studio
You can remove expressor Studio through the Windows Control Panel. Remove the program named expressor
Studio.

If you want to archive your work, copy or rename the Workspaces directory

     C:\Documents and Settings\<user_name>\My
     Documents\expressor\Workspaces (Windows XP)
     C:\Users\<user_name>\Documents\expressor\Workspaces
     (Windows 7)
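If you prefer to archive from a command prompt, the copy can be scripted with the standard Windows xcopy utility. This sketch assumes the Windows 7 path shown above; the archive destination is an arbitrary name chosen for illustration:

```shell
REM Copy the entire Workspaces tree, including subdirectories (/E),
REM into a new archive directory (/I treats the target as a folder).
xcopy "%USERPROFILE%\Documents\expressor\Workspaces" ^
      "%USERPROFILE%\Documents\expressor-archive\Workspaces" /E /I
```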

If you are planning to reinstall expressor Studio, it is best to start with a clean workspace. You should delete the
directory that the setup program created for expressor Studio.

     C:\Documents and Settings\<user_name>\Application
     Data\expressor (Windows XP)
     C:\Users\<user_name>\AppData\Local\expressor_software
     (Windows 7)







Install expressor Standard Edition

Installation Requirements


Install the Complete Standard Edition


Install the Engine on a Separate Server


Install Studio


Install the Repository on a Separate Server


Get a Full-Term License


Install the License


Uninstall expressor Software


Installation Requirements
expressor Standard Edition can be installed onto computers with the following operating systems.

        Microsoft Windows Server 2003 SP2

        Microsoft Windows Server 2008 R2

        Microsoft Windows 7 Professional

        Microsoft Windows XP, SP3 or higher

VMware images with these operating systems can also be used.

Systems can have either 32-bit or 64-bit processors.

Systems require 2 GB of RAM.

Disk space:

        Studio: 345 MB

        Engine: 270 MB

        Repository: 100 MB

Monitors for expressor Studio must be able to display 1280 x 800 resolution.

expressor Studio requires the Microsoft .NET 3.5 framework. If the framework is not already installed, it will be
installed automatically during the Studio installation.

Do not install Standard Edition in an NFS shared directory.

In order to install Standard Edition, you must be an administrator on the target computer.







     Note: expressor Studio requires the following hotfix to run on Windows XP. If you receive a message
                during installation that indicates the hotfix is not installed on your system, use the following URL
                to get the hotfix:

     http://support.microsoft.com/kb/943326

     All of the supported platforms require the following hotfix to run expressor Studio. If you receive a
                message during installation that indicates the hotfix is not installed on your system, use the
                 following URL to get the hotfix:

     http://support.microsoft.com/kb/967328

Install the Complete Standard Edition
During the installation procedure, your computer must have an Internet connection. Once the software is installed,
you may work either online or offline.

     1.    Download the expressorPlatformInstaller.exe installation application from the expressor site you
           were directed to when you bought the product or from the expressor Community web site
           (http://expressor-community.com/) if you intend to install a 30-day trial version.
           Place the installation file in any convenient file system location.

     2.    Run the expressorPlatformInstaller.exe file.
           If necessary, the application will download prerequisite software (e.g., updates to the .NET framework) and
           then install the expressor Standard Edition applications.

     3.    On the Custom Setup screen in the installation wizard, ensure that none of the components (Engine, Studio,
           or Repository) is marked for exclusion.

          The Custom Setup screen should look like the following screen shot when all components are selected for
          installation:








     7.    On the next screen, specify the name of the user account that will be used to log on to this Repository
           installation.

          You can change the location in which to install the repository. Repository data is stored in
          \ProgramData\expressor\repository. If a repository already exists in that location, you can change the
          location of the repository data created with this new installation. Or you can elect to use an existing
          repository, in which case you would not need to specify the user name and password. You could, however,
          specify a different port number if necessary.

     8.    Set a password for logging into the Repository.

     9.    Assign a port number.
           The port number must be one that can be used exclusively for Repository. You can use the default port
           selected by the installation program, or you can specify another port for use by Repository. In most cases,
           the default port number will work.
           The Repository program is installed in \Program Files\expressor\expressor3\Repository.

     Note: Write down the hostname or IP address of the computer on which you installed Repository and the
                port number assigned to the Repository service. You will need this information to connect Studio
                and Repository.

After installation is complete, you must install a license to enable the Standard Edition. If you installed a 30-day trial
version of the Standard Edition, the license key is in the email sent to you after you downloaded the installation file.
Since you already received the license, you are ready to install the license.

If you have purchased the Standard Edition, then you will get a full-term license for the specific machine on which
you have installed it.






     Note: When you request a full-term license, you will receive separate licenses for Engine and Repository.
              Both licenses must be installed even though Engine and Repository are running on the same
              machine.

Install the Engine on a Separate Server
This procedure installs expressor Engine on a separate server without Studio or Repository.

During the installation procedure, your computer must have an Internet connection. Once the software is installed,
you may work either online or offline.

     1.   Download the expressorPlatformInstaller.exe installation application from the expressor site you
          were directed to when you bought the product or from the expressor Community web site (http://expressor-
          community.com/) if you intend to install a 30-day trial version.
          Place the installation file in any convenient file system location.

     2.   Run the expressorPlatformInstaller.exe file.
          If necessary, the application will download prerequisite software (e.g., updates to the .NET framework) and
          then install the expressor Engine application.

     3.   On the Custom Setup screen in the installation wizard, select "This feature will be installed on local hard
          drive" from the expressor Engine drop-down menu.







7.   Select the option "This feature will not be available" from the expressor Studio drop-down menu on the
     Custom Setup screen.




8.   Select the option "This feature will not be available" from the expressor Repository drop-down menu on
     the Custom Setup screen.








After installation is complete, you must install a license to enable the Engine. If you installed a 30-day trial version of
the Engine, the license key is in the email sent to you after you downloaded the installation file. Since you already
received the license, you are ready to install the license.

If you have purchased Engine, then you will get a license for the specific machine on which you have installed Engine.

After you install the license for Engine, you should use the license to enable Studio on any machines where it is
installed. Using the license for Engine activates Studio functionality for creating deployment packages. See the install
license instructions for Studio interface.

Install Studio
Installing on a client machine is similar to installing the Engine, except in this case you elect to install Studio instead of
selecting "This feature will not be available." When elect to install Studio, the Engine is automatically installed with it.
You cannot select "This feature will not be available" from the expressor Engine drop-down menu without also
negating the installation of Studio.

During the installation procedure, your computer must have an Internet connection. Once the software is installed,
you may work either online or offline.

     1.   Download the expressorPlatformInstaller.exe installation application from the expressor site you
          were directed to when you bought the product or from the expressor Community web site
          (http://expressor-community.com/) if you intend to install a 30-day trial version.
          Place the installation file in any convenient file system location.







2.   Run the expressorPlatformInstaller.exe file.
     If necessary, the application will download prerequisite software (e.g., updates to the .NET framework) and
     then install the expressor Engine and Studio applications.

3.   On the Custom Setup screen in the installation wizard, select "This feature will be installed on local hard
     drive" from the expressor Studio drop-down menu.




4.   Ensure the option "This feature will not be available" on the expressor Repository drop-down menu on the
     Custom Setup screen is selected.








     8.    Ensure that you have the appropriate license.

          If you installed a 30-day trial version, the license key is in the email sent to you after you downloaded the
          installation file. Since you already received the license, you are ready to install the license. If you have
          purchased Standard Edition, then you will get a full-term license for the specific machine on which you have
          installed Studio and Engine.




Install the Repository on a Separate Server
During the installation procedure, your computer must have an Internet connection. Once the software is installed,
you do not have to maintain an Internet connection.

     1.    Download the expressorPlatformInstaller.exe installation application from the expressor site you
           were directed to when you bought the product or from the expressor Community web site
           (http://expressor-community.com/) if you intend to install a 30-day trial version.
           Place the installation file in any convenient file system location.

     2.    Run the expressorPlatformInstaller.exe file.

     3.    On the Custom Setup screen in the installation wizard, select "This feature will not be available" from the
           expressor Engine drop-down menu.


           Both the expressor Engine and expressor Studio drop-down menus will then show that they are not selected.








4.   On the next screen, specify the name of the user account that will be used to log on to this Repository
     installation.







     5.    Set a password for logging into the Repository.

     6.    Assign a port number.
           The port number must be one that can be used exclusively for Repository. You can use the default port
           selected by the installation program, or you can specify another port for use by Repository. In most cases,
           the default port number will work.
           The Repository program is installed in \Program Files\expressor\expressor3\Repository.
           Repository data is stored in \ProgramData\expressor\repository.

     Note: Write down the hostname or IP address of the computer on which you installed Repository and the
                port number assigned to the Repository service. You will need this information to connect Studio
                and Repository.

After installation is complete, you must install a license to enable the Repository. If you installed a 30-day trial version
of the Repository, the license key is in the email sent to you after you downloaded the installation file. Since you
already received the license, you are ready to install the license.

If you have purchased the Standard Edition, then you will get a full-term license for the specific machine on which you
have installed the Repository.

After you install the license for Repository, you should use the license to enable Studio on any machines where it is
installed. Using the license for Repository allows Studio users to connect to Repository.

You can install either an Engine license or Repository license with Studio. Either license will enable Studio to interact
with both Engine and Repository.

See the install license instructions for Studio interface.

Get a Full-Term License
After the installation is complete, you get a license from expressor software corporation by sending back
information about your installation.

     1.    Open the expressor command prompt window from the expressor folder on the Microsoft Windows Start
           menu.
           The expressor command prompt option is in the expressor3 subfolder.

     2.    Type the command:

               elicense -r

     3.    Copy the output from the elicense command.

     4.    Paste the command output into an email message and send it to platform-license@expressor-software.com
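The request steps amount to capturing the elicense output and mailing it in. From the expressor command prompt, the output can also be redirected to a file for convenience (the file name here is arbitrary):

```shell
REM Generate the installation fingerprint used for licensing.
elicense -r > license-request.txt

REM Open license-request.txt, copy its contents, and paste them into an
REM email to platform-license@expressor-software.com.
```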

Install the License
     1.    Copy the license text string from the email you received after downloading the 30-day trial version or in
           response to the email you sent with the output of the elicense -r command.

          If you have installed both Engine and Repository on the same machine, then you will receive two licenses in
          response to the email you send with the output of the elicense -r command. In that case, you must
          perform step 3 twice to install both licenses.






2.    Open the expressor command prompt window in the expressor folder on the Microsoft Windows Start
      menu.
      The expressor command prompt option is in the expressor3 subfolder.

3.    Type the elicense command and paste the license key after the -k argument:

          elicense -k license_key

     Alternatively, if Studio is installed on the system on which the license is to be installed, you can use the Studio
     interface to install the license.

           1.   Start Studio.
                The License Information dialog box will open automatically, unless Studio already has a valid license
                for the 3.3 version.
                If you want to change the installed license for any reason, then open the drop-down menu from the
                help button in the upper right corner of Studio and select License Information.

           2.   Click Install License... on the License Information dialog box.

           3.   In the Install License dialog box, paste the license string into the license entry field and click the
                Install button.




4.    Select repository Start from the repository folder in the expressor folder on the Microsoft Windows Start
      menu.
      The Repository will run as a Windows Service and automatically restart whenever the system is rebooted.

     Do not perform step 4 if you have not installed Repository.







Uninstall expressor Software
You can remove expressor Standard Edition, or the individual Engine or Repository components, through the
Windows Control Panel.

Uninstall entire Standard Edition
     1.   Select Programs and Features from the Windows Control Panel.

     2.   Select expressor Software.

     3.   Right-click and select Uninstall.

Uninstall Engine or Repository
     1.   Select Programs and Features from the Windows Control Panel.

     2.   Select expressor Software.

     3.   Right-click and select Change.
          The expressor Installation Wizard displays.

     4.   Move to the second screen of the Wizard and select Modify.

     5.   On the following screen, select This feature will not be available from the drop-down menu for either
          Engine or Repository.
          If you choose Engine, Studio will also be removed during the uninstall process. You cannot select Studio
          alone for removal.







Install expressor Desktop Edition

Installation Requirements


Install the Desktop Edition


Get a License


Install the License


Uninstall Desktop Edition


Installation Requirements
expressor Desktop Edition can be installed onto computers with the following operating systems.

        Microsoft Windows 7 Professional

        Microsoft Windows XP, SP3 or higher

VMware images with these operating systems can also be used.

Systems can have either 32-bit or 64-bit processors.

Systems require 2 GB of RAM.

Disk space:

        Studio: 345 MB

        Engine: 270 MB

Monitors for expressor Studio must be able to display 1280 x 800 resolution.

expressor Studio requires the Microsoft .NET 3.5 framework. If the framework is not already installed, it will be
installed automatically during the Studio installation.

Do not install the Desktop Edition in an NFS shared directory.

In order to install the Desktop Edition, you must be an administrator on the target computer.

    Note: expressor Studio requires the following hotfix to run on Windows XP. If you receive a message
                during installation that indicates the hotfix is not installed on your system, use the following URL
                to get the hotfix:

    http://support.microsoft.com/kb/943326

    All of the supported platforms require the following hotfix to run expressor Studio. If you receive a
                message during installation that indicates the hotfix is not installed on your system, use the
                following URL to get the hotfix:

    http://support.microsoft.com/kb/967328







Install the Desktop Edition
This installation is designed for Windows XP and Windows 7. The Desktop Edition is not supported on Windows
Server. See Studio Installation Requirements.

During the installation procedure, your computer must have an Internet connection. Once the software is installed,
you may work either online or offline.

     1.    Download the expressorPlatformInstaller.exe installation application from the expressor site you
           were directed to when you bought the product.
           Place the installation file in any convenient file system location.

     2.    Run the expressorPlatformInstaller.exe file.
           If necessary, the application will download prerequisite software (e.g., updates to the .NET framework) and
           then install the expressor Desktop Edition applications.

     3.    On the Custom Setup screen in the installation wizard, ensure that neither of the components (Engine or
           Studio) is marked for exclusion.

After installation is complete, you must install a license to enable the Desktop Edition.

Get a License
After the installation is complete, you get a license from expressor software corporation by sending back
information about your installation.

     1.    Open the expressor command prompt window from the expressor folder on the Microsoft Windows Start
           menu.
           The expressor command prompt option is in the expressor3 subfolder.

     2.    Type the command:

               elicense -r

     3.    Copy the output from the elicense command.

     4.    Paste the command output into an email message and send it to platform-license@expressor-software.com

Install the License
     1.    Copy the license text string from the email you received in response to the email you sent with the output of
           the elicense -r command.

     2.    Open the expressor command prompt window in the expressor folder on the Microsoft Windows Start
           menu.
           The expressor command prompt option is in the expressor3 subfolder.

     3.    Type the elicense command and paste the license key after the -k argument:

               elicense -k license_key

          Alternatively, you can use the Studio interface to install the license.







             1.   Start Studio.
                  The License Information dialog box will open automatically, unless Studio already has a valid license
                  for the 3.3 version.
                   If you want to change the installed license for any reason, then open the drop-down menu from the
                   help button in the upper right corner of Studio and select License Information.

             2.   Click Install License... on the License Information dialog box.

             3.   In the Install License dialog box, paste the license string into the license entry field and click the
                  Install button.




Uninstall Desktop Edition
    1.   Select Programs and Features from the Windows Control Panel.

    2.   Select expressor Software.

    3.   Right-click and select Uninstall.


Remove expressor Licenses
In some circumstances, you may need to remove an expressor license. If you need to upgrade your license for
Engine or Repository, you must remove the old license before the elicense command can add the new license.

Licenses can be removed with the command elicense -x. However, because access privileges to license files vary
according to the privileges of the user who installed the license, the following Best Practice is recommended to
ensure that removal is simple and error-free.

Best Practice for Removing Licenses
    1.   Open the Windows Start menu.

    2.   Open the expressor > expressor3 folder.

    3.   Right-click on the expressor command prompt option.

    4.   Select Run as Administrator.

    5.   Type "elicense -x" in the command prompt window.

    6.   Close the expressor command prompt window.








Getting Started with Studio

Studio Basics


Studio Interface


Workspace Explorer


Samples and Solutions




Studio Basics
expressor Studio is the graphical user interface through which you create and manage data integration applications.


Studio contains an embedded version of the expressor Engine that supports database connectivity through high-
performance drivers to the most popular databases as well as standard ODBC drivers. The Engine supports
connection to CSV (comma-separated values) text files in the expressor Studio 3.4 version. Studio provides
management of all the components or Artifacts of data integration applications. Through Studio, you can connect to
data sources, create drawings of the data flow from input to output connectivity, and map the transformation of data
in the flow. As you construct Dataflows, Studio displays real-time verification messages. With the embedded Engine,
Studio provides an environment for run-time validation and testing prior to deployment.

The Studio interface presents actions through point-and-click and drag-and-drop operations and through the ribbon
bar and the Quick Access toolbar. The ribbon bar contains buttons and drop-down menus that change according to
the context you are working in. The Quick Access toolbar remains constant, though it can be customized.



expressor Studio is launched by double-clicking the desktop icon              or by using the Windows Start menu. The
Studio icon is placed in the Windows Start menu during the expressor product installation.


Studio Interface
When you launch Studio, the first screen presents Workspaces, the highest organizational level in Studio. (Click this

icon     to see a full image of the first Studio screen. At a number of places in Studio documentation, this icon is
used to indicate that a full screen image is viewable by clicking on the icon.)








Most online help in expressor Studio is integrated into the user interface. Information you need to perform tasks,
including identifying objects such as operators and icons and filling out forms is available directly in the Studio
interface. You do not have to call up a separate help window to see the basic information you need. That
information is displayed automatically in the context where it is relevant.

Links indicated by More... open complete explanations in a separate online help window. You can also access the full


online help system by using the F1 key or the help icon          .

Another key component of the integrated online help is the Operator Properties panel. When an Operator in a
Dataflow is selected, the properties required to configure it are displayed in the right pane. As each property is
selected, a description of it is displayed in the top portion of the panel. You can use the <Previous and Next> links as
well to move through the descriptions.








Some of the property descriptions contain active links to actions that help with the operator configuration. For
example, there are links for creating Connections and Schema while you are in the process of configuring an operator
that needs them.

Tool tips are available for icons, buttons, and fields in the Studio interface.


Workspace Explorer
The Workspace Explorer is a tool for managing all the Projects, Libraries, and Artifacts in an open Workspace.


The Workspace Explorer is part of the Studio interface. Within it, you can perform most of the actions you can
perform with other aspects of the Studio interface. For example, a right-click menu enables actions available on the
ribbon bar, such as creating new Schema, Connections, and other Artifacts. It also enables moving and copying,
renaming and deleting, and searching.


There are two types of searches available: searching for an artifact by name and searching where an artifact is used.

The former is initiated with the search button at the top of the Explorer panel         . Searching for where an Artifact
is used is provided by a choice on the right-click menu associated with an individual artifact.

The Explorer also permits drag-and-drop operations. You can move artifacts between Projects and Libraries.

You control the display in the Explorer by expanding and collapsing containers such as Projects and Artifact groups.
You can display all Projects or selected Projects with the drop-down menu at the top of the Explorer. You can also

arrange the tree display by Artifact type by clicking the         icon at the top of the Explorer panel.







Samples and Solutions
Studio includes samples and solutions that promote learning by doing. The help topics explain how the dataflows and
other artifacts that make up the applications are constructed. The Web Services sample is preconstructed, but you
must customize the application with your own Google and Salesforce.com accounts before you can run it
successfully.

The help topics include links to other topics in the help system that explain in detail the steps taken to construct the
applications. Working through samples and solutions is one approach to learning the concepts and tasks needed to
build data integration applications with expressor Studio.

The two samples and solutions in Version 3.4 are:

Build and Deploy an expressor Application

Connect to Data Services with Datascript Operators

There is also the Sample Workspace available on the Studio Workspaces screen. The projects and library in the
Sample Workspace are preconstructed and ready to run. When you open the Sample Workspace, a special help file
opens to walk you through how the workspace was constructed.




Workspaces, Projects, and Libraries

Workspaces, Projects, and Libraries
Workspaces, Projects, and Libraries are containers for organizing Dataflows and other components of data integration
applications created with expressor Studio. Workspaces contain Projects and Libraries. You can use just one
workspace for all your projects, or you can create multiple workspaces and place projects in them according to your
preferred organizing scheme.

There are two types of Workspaces.

        Standalone Workspace — allows you to work independently when you do not need the benefits of
         collaboration and version control provided by a centralized expressor Repository

        Repository Workspace — allows you to collaborate with other team members on the same projects and
         provides the security and benefits of version control for your projects through a centralized expressor
         Repository

Projects and Libraries are the system containers where the artifacts reside. They are designed to make sharing and
reuse simple. Projects and Libraries are very similar in that they both can contain the same types of artifacts.

A Project is a collection of expressor Dataflows, Schemas, Semantic Types, Connections, Operator Templates,
Datascript Modules, Deployment Packages, and External Files that together provide the implementation of an
expressor data integration application. Projects reference artifacts in Libraries for their own use. Projects can use
artifacts that they contain, and they can use artifacts contained in Libraries that they reference.

Projects also contain Deployment Packages. These are special artifacts used to compile a project's dataflows and
deploy them in a data integration environment.

A Library is like a Project in that it is a collection of Dataflows, Schemas, Semantic Types, Connections, Operator
Templates, Datascript Modules, and External Files. However, the primary purpose of a Library is to collect artifacts that
can be shared among multiple Projects and Libraries. Consequently, a Library may be associated with more than one
Project within the same workspace. Also, Libraries can be associated with one another so that when one is associated
with a Project, the other Libraries are associated as well. The references Projects and Libraries make are contained in
folders called "Library References," which are listed in Explorer along with the related projects and libraries.

    Note: Libraries cannot create circular references; that is, two libraries cannot refer to each other. One
              library can refer to another, but the referenced library cannot refer back to the library that
              references it.
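The no-circular-references rule is the standard acyclicity constraint on a dependency graph. As a rough illustration only (the function and data names are hypothetical, not expressor internals), a tool could reject a new library reference whenever the target library can already reach the source:

```python
# Toy sketch of the acyclicity rule for library references.
# Names and structure are illustrative, not expressor internals.

def creates_cycle(references, source, target):
    """Return True if adding a reference source -> target would create a cycle."""
    # A cycle appears exactly when target can already reach source.
    stack, seen = [target], set()
    while stack:
        lib = stack.pop()
        if lib == source:
            return True
        if lib in seen:
            continue
        seen.add(lib)
        stack.extend(references.get(lib, ()))
    return False

refs = {"LibA": ["LibB"], "LibB": []}
print(creates_cycle(refs, "LibB", "LibA"))  # True: LibA already refers to LibB
print(creates_cycle(refs, "LibA", "LibC"))  # False
```

The depth-first walk mirrors the note above: LibB may not refer back to LibA because LibA already refers to LibB.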

The biggest difference between a Library and a Project is that a Library cannot contain Deployment Packages. The
artifacts Libraries contain are deployed as part of the Project that uses the Library.







You manage Workspaces with simple point-and-click, drag-and-drop operations in the Workspace Explorer.




Explorer manages one workspace at a time. It is available when you select an individual workspace. It lists Projects
and Libraries you have created and the artifacts (Dataflows, Connections, Schemas, Semantic Types, Operator
Templates, Datascript Modules, External Files, and Deployment Packages) in each. The implications of copying,
moving, renaming, and deleting artifacts are explained in Manage Artifacts.

Projects and Libraries can also be renamed and deleted, and they can be copied to different workspaces. Projects and
Libraries can be copied to workspaces on a local system or they can be exported and made available for copying on
other systems. Copying Projects and Libraries is explained in Import Projects and Export Projects.

When you start Studio, the opening display allows you to create a new workspace or open an existing one. There you
can choose to open a sample workspace that contains several complete data integration applications. The sample
workspace includes a guided tour through each application that explains how it is built and how the various artifacts
like Dataflows and Semantic Types are used.


Create a New Workspace
Create a Standalone Workspace

Create a Repository Workspace

Create a Standalone Workspace
     1.     Select Studio > New Workspace... in the expressor Studio Ribbon bar.

          The New Workspace dialog appears:




     2.     Select Standalone Workspace.

     3.     Name the workspace with a name that is unique among all workspaces.







   4.    Browse for a directory location in which to store the workspace.

   5.    Describe the purpose of the workspace (optional).

        The new Workspace appears.




Create a Repository Workspace
   1.    Select Studio > New Workspace... in the expressor Studio Ribbon bar.

        The New Workspace dialog appears:




   2.    Select Repository Workspace.

   3.    Name the workspace in the New Workspace dialog.

   4.    Provide a description of the workspace (optional).

   5.    Supply the name or IP address of the system on which the repository resides.

   6.    Indicate the port number on the repository system through which connection to the repository will be
         received.

   7.    Click the Create button to create the workspace.

        The workspace home page appears.








          The message in the lower right status pane displays the repository name and the user name. This message
          does not reflect the state of the connection to the repository. It indicates instead which repository the
          workspace uses and the name of the repository user whose credentials are cached in this workspace. If you
          receive a message indicating your credentials are not valid, you can try again or cancel the authentication. If
          you cancel, the status pane will display the repository name and the name "no user."


Open or close a workspace
     1.     Select Studio > Open Workspace... or Studio > Close Workspace in the expressor Studio ribbon bar.


          When you open a new or existing workspace, the Explorer panel opens on the left.

          When you open a Repository Workspace, the cached credentials are automatically used to connect to the
          repository.

          If those credentials are invalid, you will be prompted to provide new credentials. If the credentials are
          valid but the repository does not connect, you receive a notice to that effect.




     Note: If you open workspaces created with a prior version of Studio, dataflows and database connections
                  within the workspace must be upgraded to work with the later Studio version. When you open a
                  dataflow or database connection, you will be given the opportunity to create a backup before the
                  upgrade.






        When you close a workspace, you are returned to the Workspaces home page:




Convert a Standalone to a Repository Workspace
   1.    Open a Standalone Workspace.

   2.    Select Convert Workspace from the Studio tab drop-down menu.

   3.    Enter the Repository host name or IP address and the port number used by Repository.




        If a connection to the repository can be established, you will be prompted for your access credentials.


Create a Project
   1.    Click the New Project button on the Home tab of the ribbon bar.

        The New Project dialog appears.







     2.    Name the Project.
           Project names must be unique within a Workspace and should be unique among all projects. Completely
           unique names prevent clashes. For example, when projects are stored in Repository, they are stored by
           project name without reference to workspace.

     3.    Provide a description of the Project (optional).

          The Project appears in the Explorer pane.




Create a Library
     1.    Click the New Library button on the Home tab of the ribbon bar.

          The New Library window appears.




     2.    Name the Library.
           Library names must be unique within a Workspace and should be unique among all libraries. Completely
           unique names prevent clashes. For example, when libraries are stored in Repository, they are stored by
           library name without reference to workspace.

     3.    Provide a description of the Library (optional).

          The new Library appears in the Explorer pane.







Associate a Library with a Project
In order to use the artifacts created within a Library, you must associate the Library with a Project.

     1.    Select the Project to which you want to associate a Library.

     2.    Click the Library References button in the Home tab of the ribbon bar.

          The Library References dialog box appears.




          The direct library references listed in Selected references are those libraries explicitly referenced by the project.
          The indirect library references are those libraries linked to the project by way of another library linked directly
          to the project.

     3.    Select the Library (or Libraries) in the left-hand panel that you want to associate with the Project.

The artifacts included in the Library will be accessible to dataflows within the referencing Project.

To break the association between a Library and a Project, right-click the Library References icon in the Explorer to
open the Library References dialog box and deselect the Library.


Export Projects
Projects are exported as zip files and placed in a file system directory. There they can be copied into another
workspace local to the system on which Studio is running or into a workspace on another system with access to the
zip file location.







     1.   Select the Export Projects option from the Studio drop-down menu.




     2.   Select the Projects and Libraries to be exported.
          When you select a Project that contains a Library Reference, the related Library is automatically selected at
          the same time.

     3.   Specify the name and location of the zip file.
          All selected Projects and Libraries are placed in a single zip file.

     4.   Provide a description of the exported project (optional).
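Conceptually, a project export of this kind bundles the selected project and library directories into one archive. A minimal sketch using Python's zipfile module; the function name and directory layout are assumptions for illustration, not expressor's actual export format:

```python
import os
import zipfile

def export_projects(container_dirs, zip_path):
    """Bundle project/library directories into a single zip archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root_dir in container_dirs:
            base = os.path.basename(root_dir.rstrip(os.sep))
            for dirpath, _dirnames, filenames in os.walk(root_dir):
                for name in filenames:
                    full = os.path.join(dirpath, name)
                    # Store paths relative to the container so the archive
                    # can be unpacked into another workspace intact.
                    rel = os.path.join(base, os.path.relpath(full, root_dir))
                    zf.write(full, rel)
```

As in step 3 above, all selected containers land in one zip file, which can then be copied to another system for import.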


Import Projects
Projects and Libraries can be imported into a workspace either directly from another workspace or from a zip file
created by a project export. Importing Projects and Libraries creates copies of them in the current workspace. They
are then independent of the original Projects and Libraries from which they were copied in the other workspace.

     1.   Select the Import Projects option from the Studio drop-down menu.







    2.     Select ZIP file to import projects exported to a zip file, or select Workspace to copy projects from another
           local workspace.

    3.     Select the Projects and Libraries to be imported.

    4.     If necessary, rename the imported Projects and Libraries if there are name clashes.


Manage Artifacts

What are Artifacts?                      Copy and Move Artifacts


Explorer View of Artifacts               Delete Artifacts


Search Artifacts




What are Artifacts?
Artifacts are the objects that can be managed within projects. There are nine types of artifacts:

Dataflows define the actual flow of data for an expressor application.

Connections capture the information necessary for an expressor Dataflow to connect to an external data source.

Schemas capture the structure of the external data that will be read or written by a Dataflow as well as the mappings
of the fields in that external structure to the attributes in the common Semantic Types model.

Semantic Types capture the data type, constraints, and other semantics for a unit of business data that will be
processed by an expressor Dataflow.

Operator Templates capture the configuration of an operator in a Dataflow so that it can be reused within the
project.

Datascript Modules contain functions that can be called by Datascript operators such as Transform and Read
Custom.

Lookup Tables are database tables created locally within an expressor Project that are referenced using the Lookup
Rules in a transformation operator.

Deployment Packages are collections of Dataflows, compiled with their related Connections, Schemas, and Semantic
Types, that can be run by the expressor Engine, independent of Studio.

External Files are files referenced by a Dataflow and included in Deployment Packages. They can be data files, but
usually they are external scripts called by expressor Datascript written for Transform operators.

Artifacts are created with the six buttons on the left side of the Home tab on the Studio ribbon bar. Operator
Templates are created with the Save As Template button in the Dataflow Build tab in the ribbon bar. External Files
are added to a project by right-clicking on the artifact folder in the Explorer and selecting Add Files. Creation of the
other artifacts is described in the sections of online help that cover each of the artifact types. While the processes for
creating artifacts vary according to the type of artifact, there are certain management tasks that are the same for all
artifacts.

Explorer View of Artifacts
Explorer is the Studio tool for managing artifacts. Explorer groups artifacts by type and lists them under their Projects
and Libraries.




Explorer also displays all artifacts by type regardless of which Project or Library they are in. That view is provided by

selecting the        icon in the Explorer tool bar.




46
                                                                                      Workspaces, Projects, and Libraries




Search Artifacts
Explorer provides two methods of searching: searching on artifact names and searching where artifacts are used.


    1.    Click the search icon         in Explorer to search on artifact names.

    2.    Select the Projects and types of artifacts to search from the two drop-down lists below the text-entry field.

    3.    Type all or part of an artifact name in the text-entry field.

         If the string you enter appears anywhere in an artifact name, the artifact name and associated project and
         libraries are listed.

To search for where an artifact is used:

    1.    Select an artifact listed in Explorer.







     2.    Right-click and select Search Where Used from the menu.

          If the artifact is used by other artifacts, such as a Dataflow, the artifact names and associated Projects and
          Libraries are listed.
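Both searches can be pictured with a small sketch: a substring match over artifact names, and a reverse lookup over artifact references. This is a toy model only; the function names are hypothetical, and the case-insensitive matching shown here is an assumption, not documented expressor behavior:

```python
# Toy models of the two Explorer searches; not expressor internals.

def search_by_name(artifacts, text):
    """Name search: match the string anywhere in an artifact name.

    Assumed case-insensitive for illustration.
    """
    return [name for name in artifacts if text.lower() in name.lower()]

def search_where_used(uses, artifact):
    """Where-used search: list the artifacts that reference the given one."""
    return [user for user, refs in uses.items() if artifact in refs]

artifacts = ["CustomerSchema", "OrdersSchema", "LoadCustomers"]
uses = {"LoadCustomers": ["CustomerSchema"], "OrdersSchema": []}
print(search_by_name(artifacts, "customer"))      # ['CustomerSchema', 'LoadCustomers']
print(search_where_used(uses, "CustomerSchema"))  # ['LoadCustomers']
```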

Copy and Move Artifacts
Copy and move individual artifacts by dragging and dropping them within the Explorer panel or select the artifact and
use the right-click menu.

Copying creates a new artifact in the Project or Library to which it is copied. The copy reflects the artifact at the time
it is copied. Subsequent changes to either the original or copy are not reflected in the other.

Moving removes the artifact from the Project or Library in which it resides. If it is moved to another Project or to a
Library that is not referenced by the Project from which it is moved, then the artifact is no longer available in the
Project. Any use that has been made of the artifact within the Project, such as a Schema used by an input operator in
a Dataflow, will become invalid unless it is replaced with a reference to an artifact that is available to the Project.

     Note: Deployment Packages cannot be copied or moved to Libraries because Libraries cannot contain
                Deployment Packages. Also, Deployment Packages cannot be moved or copied from one Project
                to another. That is because Deployment Packages are specific to the Projects in which they are
                created.

Renaming affects artifacts similarly to moving. If the artifact is already in use under its original name, it will not be
found by its new name. Consequently, the new name must be applied in each instance where the original name is
used.

Rename an artifact by selecting it and using the right-click menu.

Delete Artifacts
Delete an artifact by selecting it and using the right-click menu.

If the deleted artifact is referenced by another artifact in the Project, it must be replaced. Otherwise, a Dataflow will
break at points where the deleted artifact is used. For example, if you delete a Connection, then each operator that
uses that Connection will not be able to connect to the data source.




Dataflows

Dataflows

What Are Dataflows?                    Examining Data as it Moves
                                       through a Dataflow


How Data Moves through a               Operator Properties
Dataflow


Propagating a Record through           Transformation and the Rules
Multiple Operators                     Editor


Persistent Values


What Are Dataflows?
Dataflows are the graphical representation of an expressor data integration application. Dataflows are built in the
expressor Studio Dataflow editor. The Dataflow editor is active when a project is open and a dataflow has been
created.


Operators are the building blocks of a dataflow. The operators representing the various operations are listed in
Studio's left panel when the Operators tab is selected.

The operators perform well-defined operations in a data integration application. After the operators are placed in
the dataflow panel, they must be connected and configured. Configured operators can be saved as templates and
reused.




When a dataflow is complete and ready to be deployed into a production environment, it is added to a Deployment
Package. In a Deployment Package, a dataflow is compiled with all the other artifacts it uses (Connections, Schemas,
and Semantic Types). The Deployment Package can be placed on a system running an expressor Engine, and
individual dataflows are executed there.

Dataflows can also be stored in an expressor Repository. There they are placed under version control and can be
shared with other users. To be stored by Repository, the dataflow must be in a Repository Workspace.

How Data Moves through a Dataflow
Input Operators, such as Read File and Read Table, bring data into a dataflow from an outside source. The Read
Custom operator generates data from computation or HTTP or similar calls. Input Operators pass the data on to
operators that perform actions on the data, such as Transform, Sort, Filter, and Join, and finally the data goes to an
Output Operator, such as Write File, Write Table, or Write Custom. The data flows from operator to operator through
channels that are represented by links in the dataflow that connect the operators to one another.

     Note: There are two additional Read and Write operators for reading and populating Lookup Tables,
              which are internal to expressor Projects. See Populate a Lookup Table and Read Data in a Lookup
              Table for an explanation of their use.

In order to pass data to one another, operators must know the Composite Type Attributes they will receive and/or
send. Input Operators, which receive data from an external source rather than another operator, read data whose
structure and format, except in the case of the Read Custom operator, is defined by a Schema. They then map that
Schema's fields to Composite Type Attributes that specify the format of the data as it flows "downstream" to the next
operator.

     Note: The names used for Attributes are case-sensitive. When attributes move through a dataflow, they
              are matched automatically with attributes that have the same name. If an input attribute has
              name "foo" and the user creates an output attribute name "Foo," the two attributes will not be
              mapped to one another automatically. They can be mapped to each other manually through a
              rule, but all other properties of the attributes, such as data type and constraints, must match.
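The case-sensitive matching this note describes can be pictured as an exact-name lookup. A toy sketch (not expressor code; the function name is invented for illustration) showing why "foo" and "Foo" do not pair automatically:

```python
def auto_match(input_attrs, output_attrs):
    """Pair attributes whose names match exactly (case-sensitive)."""
    outputs = set(output_attrs)
    return {name: name for name in input_attrs if name in outputs}

# "foo" finds no exact-case partner in the outputs, so only "Date" maps.
print(auto_match(["foo", "Date"], ["Foo", "Date"]))  # {'Date': 'Date'}
```

Pairing "foo" with "Foo" would require a manual rule, as the note states, and the attributes' other properties would still have to match.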

The Read Custom operator, which generates data rather than reading structured data, does not use Schemas. It
creates a Composite Type directly when generating data.

Output Operators must know the Composite Type Attributes of data they will receive, but instead of passing data to
another operator, they take the data received from the "upstream" Operator and map from its Attributes to the
Schema fields that define its format in the external system. The Schema is the structure and format used to send the
data to the external source. As with the Read Custom operator, the Write Custom operator does not use a Schema. It
takes data from the Composite Type and generates output independent of a structured output source.

The Schema and Semantic Types the Input and Output operators use in a particular dataflow are specified in each
operator's properties. Before an operator can function in a dataflow, its properties must be configured.







When Schemas are defined for Input and Output operators, the Schemas must be mapped to Composite Types. That
mapping is done in the Schema Editor. Input Operators will pass incoming data downstream to other operators in
the form specified by the Composite Type's Attributes. Output Operators take the data they receive from upstream
operators in the form specified by Composite Type Attributes and map it to the Schema it will use to send it to an
external data source. This flow of data is elaborated upon below in Propagating a Record through Multiple
Operators.

Certain operators, such as Transform and Aggregate, transform the data in ways that require their operation to be
precisely specified for the particular dataflow. In addition to Property configuration, transformation rules must be
specified for those operators that transform data. Those rules are specified by mapping input and output in the
expressor Rules Editor. expressor Datascript is used to construct transformations in the Rules Editor.
Transformation mapping is explained in more detail below in Transformations and the Rules Editor.

Propagating a Record through Multiple Operators
In the dataflow representation, data moves from left to right from Input Operators to operators that transform the
data in some way and then on to Output Operators. Each operator must know the type of data it will receive and the
type of data it will transmit to the next operator in the flow. Semantic Types simplify the flow of data because they
create consistency in data from multiple sources. As a result, it is much easier to match the data as it moves from one
operator to the next.

The flow of data is further simplified by the independent configuration of operators. As you connect and configure
each operator left to right, Attributes are automatically available to the next operator "downstream." Building the flow
thus mirrors the actual flow of the data.

Consider the following dataflow that includes the Read File, Transform, and Write File operators.




The Read File operator propagates the Composite Type Attributes downstream as Input Attributes for the Transform
operator. In the following example, there are four attributes that propagate to the Transform operator, and they
automatically transfer to the Transform operator's output.




    Note: Attribute Propagation refers to the movement of attributes from operator to operator. Attribute
              Transfer refers to the movement of input attributes to output attributes within an operator.
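The distinction in this note can be sketched in a few lines. This is a toy model only; the function name and the "explicit"/"auto" labels are invented to mirror the diamond-icon behavior described below, not expressor internals:

```python
# Toy model of attribute movement between and within operators.

def transfer(input_attrs, explicit_names=()):
    """Move an operator's input attributes to its output attributes.

    Attributes named in explicit_names are explicitly assigned by a rule;
    all others transfer automatically.
    """
    explicit = set(explicit_names)
    return {name: ("explicit" if name in explicit else "auto")
            for name in input_attrs}

# Propagation: Read File's output attributes become Transform's inputs.
read_file_out = ["Firstname", "Lastname", "Party", "Chronological_Place"]
# Transfer: inside Transform, two attributes are mapped by explicit rules.
transform_out = transfer(read_file_out,
                         explicit_names=["Lastname", "Chronological_Place"])
print(transform_out["Lastname"])   # explicit
print(transform_out["Firstname"])  # auto
```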

If the goal of the application is to create a list of last names in chronological order, then the Firstname and Party
attributes would not be needed. At this point, you could use the Rules Editor to map the two attributes needed for
the output: Lastname and Chronological_Place. Notice that the icon next to the output attributes' names changes
when the two attributes are mapped. The diamond-shaped icon indicates that the attribute is explicitly assigned, not
just automatically transferred.




The Firstname and Party attributes are automatically transferred to the Transform operator's output attributes and are
available to the Write File operator. The Write File operator's Composite Type presumably would not use Firstname
and Party because the goal is to produce a chronological list that includes only last names. For that reason, the
Firstname and Party attributes would not be propagated to the Write File operator.

See the reference topic on propagation rules for complete explanations of the rules.

Using Steps in a Dataflow
Some dataflows require multiple input and output operators. These could be constructed as separate dataflows, but
because they are logically related, it is better to make them part of a single dataflow. In such cases, the dataflow can
be divided into multiple Steps.

When expressor Studio runs the dataflow, it will execute the Steps in sequence. You can include as many Steps in a
dataflow as you require and arrange them in the desired execution sequence.

Steps are listed as tabs at the bottom of the Dataflow panel. See Create Multiple Sequential Dataflows.

Examining Data as it Moves through a Dataflow
After data is read into an Input Operator's Schema, it is examined as it passes to each Semantic Type to ensure that it
conforms to each operator's requirements for processing. Every field or unit of data is mapped to Atomic Types that
specify a data type and establish Constraints on the data. expressor supports seven data types--string, integer,
decimal, datetime, boolean, byte, and double. Constraints specify rules such as the minimum and maximum length of
the data, patterns data must conform to, maximum and minimum numeric or datetime values, and the number of
digits before and after a decimal point. See Reference: Constraints on Semantic Types.

As each field or unit of data passes to an Atomic Type, it is validated against the data type and constraint
requirements of the Type. When the data does not conform to the requirements, it can disrupt the dataflow. However,
when constraints are defined for an Atomic Type, Recovery Actions can also be specified for cases in which the data
does not conform to the constraint. For example, when a string is not long enough to meet the minimum length
constraint, a recovery action can specify that the string will be padded with enough spaces on the right side to make
it conform to the constraint. Recovery actions correct the data so the dataflow can continue.
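For example, the right-padding recovery action just described behaves, conceptually, like the following datascript (Lua) logic. This is a sketch of the behavior only; in practice the recovery action is configured on the Atomic Type, not hand-written, and the values shown are illustrative:

```lua
-- Conceptual sketch of a "pad right" recovery action for a minimum-length
-- string constraint. The constraint value and field value are illustrative.
local min_length = 10
local value = "ABC"

if string.len(value) < min_length then
  -- append enough spaces on the right to satisfy the constraint
  value = value .. string.rep(" ", min_length - string.len(value))
end
-- value now has length 10, so the dataflow can continue processing it
```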

If recovery actions for constraints are not specified, then the data cannot be processed, and the operator processing
the data in the dataflow must determine how to proceed. An operator can skip a record and continue, or it can stop
the dataflow if a record does not conform to the constraints. Recovery Actions for operators are specified in the
Operator Properties. See Reference: Constraint Corrective Actions.




                                                                                                               Dataflows



Operator Properties
Operators' Properties vary according to the function of the individual operator. For example, operators that read data
into or write it out of the dataflow have a Connection Property that specifies a file containing information about
where the data source is or where it is to be written. Input and Output Operators also have a Schema Property to
indicate the Schema of the input or output data.


Properties are specified by filling in the fields in the Properties panel on the right side of the Studio window.

Note that the portion of the panel above the Property fields contains information about the individual fields. When you
select a field, the appropriate information displays, and you can also page through the information for the fields with
the Previous and Next links. To close the information display portion of the Properties panel, click the button.


Transformation and the Rules Editor
Operators such as Filter, Join, Transform, and Aggregate allow for data transformation and/or control logic during
operator processing.

The Rules Editor is a graphical interface that simplifies transformation tasks and provides ready access to expressor's
library of transformation functions and operations. The Rules Editor also provides drag-and-drop capability for
visually mapping input and output. In some cases, the transformation logic of the Operator can be constructed with
the mapping capability alone.

All transformations and control logic in the Rules Editor are built on top of Datascript, expressor’s scripting language.
Datascript is written in the Rules Editor. When transformation or control logic is possible or required for an Operator,
the Edit Rules button is enabled on the ribbon bar.
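As a rough illustration, a rule written in datascript might derive an output attribute from input attributes. The attribute names and the input/output access syntax below are assumptions made for illustration only, not taken from the Rules Editor reference:

```lua
-- Hypothetical Transform rule in datascript (expressor's Lua-based
-- scripting language): derive one output attribute from two inputs.
-- "input", "output", "Lastname", and "Firstname" are assumed names.
output.Fullname = input.Lastname .. ", " .. input.Firstname
```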







Persistent Values
Operational values can be made to persist across different runs of a dataflow. These persistent values can also be
visible to separate dataflows. Function calls written in datascript in operators such as Transform and Write Custom
store values during the execution of a dataflow. Those values persist and can be retrieved during subsequent
executions of the dataflow, and they can also be retrieved by other dataflows.

The store and retrieve functions are part of the utility function library. They store and retrieve name/value
pairs with defined datatypes. The datatypes supported are:

          decimal

          number

          integer

          datetime

          string

          binary

          boolean

When a value is stored, it overwrites any existing value stored in the named variable, regardless of the datatype
previously stored in the variable. Persistent Values are stored in a system global location equivalent to the APPDATA
directory. The name of that directory can vary, but a commonly used directory name is CommonAppData.

The retrieve functions return values of the requested data type, and if no value is found for the variable named, nil is
returned. Nil is also returned if the value contained in the variable is not of the data type specified in the retrieval call.

See the utility function library for details about the store and retrieve functions.
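A sketch of how persistent values might be used from datascript follows. The function names and signatures shown here are assumptions; consult the utility function library reference for the actual calls:

```lua
-- Hypothetical use of the persistent store/retrieve utility functions.
-- utility.store and utility.retrieve_integer are assumed names, not
-- verified API.
utility.store("last_run_count", 42)   -- store a named integer value

-- in a later run of this dataflow, or in another dataflow:
local count = utility.retrieve_integer("last_run_count")
if count == nil then
  -- nil means no value was found, or the stored value has another datatype
  count = 0
end
```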


Create a New Dataflow
To create a new Dataflow, you must have a Project open.

     1.    Click the New Dataflow button on the Home tab of the ribbon bar.

     2.    Name the Dataflow with a name that is unique within the workspace.

          The name should be unique among all the artifacts within the workspace regardless of which projects and
          libraries it is included in.

     3.    Select the Project or Library in which to place the Dataflow.

     4.    Provide a description of the purpose of the Dataflow (optional).

          A Dataflow Build tab opens in the center panel of the project work area.








Once a dataflow has been created, it must be built out by adding, connecting, and configuring operators.



Build Dataflows with Operators

Required Steps:

          Add Operators

          Connect Operators

          Configure Operators

Additional Steps:

          Set Semantic Types for Operators

          Rename Operators

          Move Operators

          Create Multiple Sequential Dataflows

          Delete Operators or Links



Add Operators
Once a Dataflow has been created, you can add operators to specify exactly how data will be transformed by the
application.

    1.    Click the Dataflow Build tab on the ribbon bar to display the Dataflow tools.

    2.    Select the Operators tab in the left panel to display a listing of Operator categories.







     3.    Expand the Inputs category to display the Operators for reading data from a resource such as a database.




          Note that the Inputs category displays two tabs--New and Templates. The New tab lists the unconfigured
          operators, and the Templates tab lists any Operator Templates the user has created (see Create Operator
          Templates).




     4.    Drag an Input Operator into the Dataflow panel.

     5.    Continue expanding Operator categories and dragging the operators you need into the Dataflow panel.

Placing an operator in the Dataflow panel populates the Properties in the right panel. The properties fields for each
operator you add to the dataflow can be filled in when the operator is added, or they can be completed when all the
operators have been added. If you have used a template operator, the property fields are already filled in. They can be
changed if necessary.

Input and Output Operators must be fully defined before some of the properties for the Operators between Read and
Write can be completed. That is because the Schemas for the data brought into and sent out of the application affect
how the transformation operators work.

Once you've started working on a new dataflow, you should save it. Click the save icon on the Quick Access toolbar.







Connect Operators
    1.    Click an output port on an operator shape in the dataflow and drag the cursor to an input port on another
          operator in the dataflow.




    2.    Draw a line between each pair of Operators you intend to connect.

         When all the operators in a dataflow have been linked, you can finish filling out the properties fields for each
         operator.

Configure Operators
    1.    Select an Operator in the Dataflow panel.


The Operator's properties appear in the Properties panel on the right side of the Studio window.




    2.    Rename the Operator, if you haven't already done so.

         The new name will appear in the center of the Operator shape in the dataflow. The name in the oval attached
         to the shape does not change. The new name should indicate the exact function of the operator in this
         dataflow. The base name of the operator will continue to indicate the basic function. Renaming is optional, but
         it can alleviate confusion when the dataflow uses multiple copies of an operator.

         If you are using a template operator, leaving the name unchanged will probably better indicate the origin of
         the operator. On the other hand, if you reconfigure the template operator in any way, you probably want to
         create a unique name for the new configuration.

    3.    Make entries in each field of each operator's properties.

         The properties fields vary according to the basic function of each operator. An explanation of each property
         field is available in the panel above the property fields when an individual property is selected. Operator
         properties are also explained in Operators section of the online help.

         Configuration of the Filter, Join, Transform, and Aggregate operators entails using the Rules Editor to define
         the transformation of data that is to take place in the operator at the particular place where the operator is
         placed in a dataflow. See Transformations.

         When an operator is configured completely, the color of its shape in the Dataflow turns from yellow to white.







Set Semantic Types for Operators
Semantic Types for a Dataflow's operators are set by connecting and configuring the operators. You can, however,
change Composite Type attributes for operator output.

Input to operators is controlled by the upstream operator connected to the input port.

The Composite Type attributes used for output in transformation operators come from the operator's input attributes
and from downstream operators. Attributes that are assigned from input or from downstream operators cannot be
changed.

Users can, however, add new attributes to an operator's output. Composite Type attributes used for output in
transformation operators can be changed by:

          Adding and editing an output attribute in the Rules Editor.

          Importing an attribute from a Shared Composite Type.

          Importing an attribute from a Local Composite Type mapped to a Schema.

See Summary of Propagation Rules for a complete explanation of the rules for setting Semantic Types in operators.

Rename Operators
     1.    Select an Operator in the Dataflow panel.

          The properties of that Operator appear in the Properties tab to the right of the Dataflow panel.




     2.    Enter a new name in the Name: field.

          The new name will appear in the center of the shape in the dataflow. The name in the oval attached to the
          shape does not change. The new name should indicate the exact function of the shape in this dataflow. The
          base name of the shape will continue to indicate the basic function of this shape.

Move Operators
     1.    Select the Pointer tool in the Dataflow Build tab of the ribbon bar.

     2.    Drag an Operator in the Dataflow panel to a new location in the Dataflow.

          Connections between the moved operator remain intact.







Create Multiple Sequential Dataflows
    1.    Select the Add Step button from the Steps section of the Dataflow Build tab on the ribbon bar.
          A new Step tab is added beneath the Dataflow panel.

    2.    Select the Rename button from the Steps section of the Dataflow Build tab on the ribbon bar to give the
          new Step a distinctive name.

Delete Operators or Links
    1.    Select an Operator or a Link with the Pointer arrow.

    2.    Click the Delete button on the Dataflow Build tab.

To delete a Step:

    1.    Open a dataflow step by clicking its tab beneath the Dataflow panel.

    2.    Select the Delete button from the Steps section of the Dataflow Build tab on the ribbon bar.




Configure Operators

Configure Input Operators


Configure Output Operators


Configure Transformation Operators


Create Operator Templates


Configure Input Operators
Input Operators have several distinctive properties that make configuring them somewhat different from configuring
Output and transformation Operators.

    1.    Select an Input Operator shape in the Dataflow panel.

         The Operator's properties appear in the Properties panel on the right side of the Studio window. The
         following screen shot shows the properties for the Read Table operator.







     2.    Rename the operator.

          The new name will appear in the center of the shape in the dataflow. The name in the oval attached to the
          shape does not change. The base name of the shape will continue to indicate the basic function of this
          operator. Providing a new name is optional, but it can be useful to indicate the exact function of the shape in
          the Dataflow.

     3.    Select a Connection from the drop-down list of existing Connection artifacts, or use the button to the right
           of the list to create a new Connection for the operator.

     4.    Select a Schema from the drop-down list of existing Schema artifacts, or use the button to the right of the
           list to choose to create a new Schema for the operator.
           For the Read File operator, which uses Delimited Schema, a new Schema can be created from scratch or from
           an existing Composite Type. The existing Composite Type can be a Shared Type or it can be a Local Type
           that exists within the Schema. See Create New Delimited Schema from a Composite Type.

     5.    Select a Type from the drop-down list of existing Composite Types that the Schema can be mapped to.
           The list of Types consists of those Composite Types to which the Schema has been mapped. If the Schema
           has not been explicitly mapped to a Type in the Schema editor, then the drop-down list will contain a single
           Local Type.

     6.    Select a mapping from the drop-down list of existing mappings that have been created for the Schema.
           If the Schema has not been explicitly mapped to a Type in the Schema editor, then the drop-down list will
           contain a single default mapping to the default Local Type.

     7.    For a Read File operator, enter the file name to be read from the location indicated by the Connection
           property. Be sure to include the file name extension, e.g., .txt.

     8.    For a Read File operator, select from the drop-down list whether Record fields in the incoming file do not
           have quotes or might have quotes.

     9.    For a Read File operator, enter the number of rows to skip at the beginning of the file.

     10. Select an option for Error handling.
           The action specified will be employed when the operator encounters a data error. The most common sources
           of data errors are mappings from schema fields to Atomic Types and constraint violations.

     11. Select or deselect the Show errors check box to display data errors and the results of recovery
           actions in the Studio Messages panel and in the expressor command prompt window when using the etask
           command.

     12. For a Read Table operator, select or deselect the Override check box.
           When Override is selected, four additional text boxes display. The Current values text box shows the
           database, schema, and table currently specified for the operator (e.g., demo.dbo.customers). The other three
           text boxes allow you to specify another database, schema, and table to use for the dataflow.

     Note: Oracle databases show current values for the schema and table. The database value is meant to be
                the service name. The service name must be supplied in the Database entry field when overriding
                the database, schema, and table defined for the operator.

     13. For the Read Custom operator, select the Edit Rules button and write datascript that will generate data.
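As an informal sketch, the datascript for a Read Custom operator might look like the following. The entry-point name and record shape are assumptions made for illustration; see the Read Custom operator reference for the actual interface:

```lua
-- Hypothetical Read Custom rule that generates three records and then
-- ends the input stream. The function name "generate" and the return
-- conventions are illustrative assumptions, not verified API.
local counter = 0

function generate()
  counter = counter + 1
  if counter > 3 then
    return nil                          -- no more data: end the stream
  end
  return { id = counter, label = "row " .. counter }
end
```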






Configure Output Operators
Output Operators have several distinctive properties that make configuring them somewhat different from
configuring Input and transformation Operators.

    1.    Select an Output Operator shape in the Dataflow panel.

         The Operator's properties appear in the Properties panel on the right side of the Studio window. The
         following screen shot shows the properties for the Write Table operator.




    2.    Rename the operator.

         The new name will appear in the center of the shape in the dataflow. The name in the oval attached to the
         shape does not change. The base name of the shape will continue to indicate the basic function of this
         operator. Providing a new name is optional, but it can be useful to indicate the exact function of the shape in
         the Dataflow.

    3.    Select a Connection from the drop-down list of existing Connection artifacts, or use the button to the right
          of the list to choose to create a new Connection for the operator.

    4.    Select a Schema from the drop-down list of existing Schema artifacts, or use the button to the right of the
          list to choose to create a new Schema for the operator.
          For the Write File operator, which uses Delimited Schema, a new Schema can be created from scratch or
          from an existing Composite Type. The existing Composite Type can be a Shared Type or it can be a Local
          Type that exists within the Schema.

    5.    Select a Type from the drop-down list of existing Composite Types that the Schema can be mapped to.
          The list of Types consists of those Composite Types to which the Schema has been mapped. If the Schema
          has not been explicitly mapped to a Type in the Schema editor, then the drop-down list will contain a single
          Local Type.

    6.    Select a mapping from the drop-down list of existing mappings that have been created for the Schema.
          If the Schema has not been explicitly mapped to a Type in the Schema editor, then the drop-down list will
          contain a single default mapping to the default Local Type.







     7.   For a Write Table operator, select the Mode of operation to perform.
          See Write Table Operator for details about the Mode setting.

     8.   For a Write Table operator, select the key fields to use when Mode is set to Update.

     9.   For a Write Table operator, check whether or not to truncate the database table.

     10. For a Write File operator, enter the file name to be written to the location indicated by the Connection
           property.

     11. For a Write File operator, select from the drop-down list whether Record fields should always be quoted,
          never be quoted, or quoted as needed.
           The default setting is Quote Always. If you upgrade a Write File operator in a dataflow that was created with
           expressor 3.0, which did not have a setting for quotes, the default setting is No Quotes.

     12. For a Write File operator, check the Include header box to indicate whether or not to include a header row at
          the beginning of the data output.

     13. For a Write File operator, check the Append to output box to indicate whether the new output should be
          appended to any existing content in the output file. If you do not check Append to output, any existing
          content will be overwritten by the new output.

     14. For the Write Table operator, select whether or not to create a table in the database if the table matching the
          Schema does not already exist.

     15. For the Write Table operator, set the batch size, which is the maximum number of table rows to write in one
          batch.

     16. For the Write Table operator, select an option for Error handling.
          The action specified will be employed when the operator encounters a data error. The most common sources
          of data errors are mappings from schema fields to Atomic Types and constraint violations.

     17. For the Write Table operator, select or deselect the Show errors check box to display data errors and the
           results of recovery actions in the Studio Messages panel and in the expressor command prompt window
           when using the etask command.

     18. For a Write Table operator, select or deselect the Override check box.
          When Override is selected, four additional text boxes display. The Current values text box shows the
          database, schema, and table currently specified for the operator (e.g., demo.dbo.customers). The other three
          text boxes allow you to specify another database, schema, and table to use for the dataflow.

     Note: Oracle databases show current values for the schema and table. The database value is meant to be
              the service name. The service name must be supplied in the Database entry field when overriding
              the database, schema, and table defined for the operator.

     19. For the Write Custom operator, select the Edit Rules button and write datascript that will process the data
          received through the dataflow.
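As an informal sketch, the datascript for a Write Custom operator might look like the following. The entry-point name and the way the record is accessed are assumptions made for illustration; see the Write Custom operator reference for the actual interface:

```lua
-- Hypothetical Write Custom rule that handles each record arriving from
-- the dataflow. The function name "write" and the attribute access are
-- illustrative assumptions, not verified API.
function write(record)
  -- the record could be logged, sent to a service, or stored elsewhere
  print(record.Lastname .. "," .. record.Chronological_Place)
end
```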







Configure Transformation Operators
Transformation Operators comprise all the operators that are not Input or Output Operators. They are
classified in two groups--Transformers and Utility. They have several distinctive properties that make configuring
them different from configuring Input and Output Operators. The properties of transformation operators vary, so the
following procedure points to the separate operator documentation for descriptions of the properties.

    1.    Select a transformation operator shape in the Dataflow panel.

         The Operator's properties appear in the Properties panel on the right side of the Studio window. The
         following screen shot shows the properties for the Sort operator.




    2.    Rename the operator.

         The new name will appear in the center of the shape in the dataflow. The name in the oval attached to the
         shape does not change. The base name of the shape will continue to indicate the basic function of this
         operator. Providing a new name is optional, but it can be useful to indicate the exact function of the shape in
         the Dataflow.

    3.    See the documentation for the specific operators for descriptions of each one's particular properties.

         Aggregate

         Buffer

         Copy

         Filter

         Funnel

         Join

         Sort

         Transform

         Unique

    4.    Use the Rules Editor to write transformations in any of the following operators used in the dataflow:

         Aggregate

         Filter

         Join

         Transform







Create Operator Templates
After an operator has been configured, it can be saved as a template to use as a preconfigured operator in another
dataflow or in a different location within the same dataflow. The template for an operator appears under the
Operators tab on the left, under the Templates tab in each of the operator groups. It also appears in the Workspace
Explorer panel as an Operator Templates artifact. To use an operator template in a dataflow, you can drag it from
either the Operators list or the Explorer.

When an Operator Template is used in a dataflow, it becomes a separate instance of the template, so its configuration
can be changed without affecting the original. In fact, after configuration changes are made, the instance can be
saved as another template.

Operator Templates made from the Read and Write Custom operators are different from other Operator Templates in
that the configured Composite Type is not preserved in the template. To reuse the Composite Type with the Operator
Template, you can save the Type as a Shared Type and then assign it to the template when the template is reused.

     1.    Select a configured operator in a dataflow.

     2.    Select the Save As Template button on the Dataflow Build tab in the ribbon bar.

     3.    Select the Project or Library in which to save the Operator Template.

     4.    Enter a unique name for the Operator Template.

     5.    Provide a description of the Operator Template (optional).
           The new Operator Template appears under the Operators tab, in the appropriate templates section and as
           an Operator Templates artifact in the Workspace Explorer.


Run a Dataflow
     1.    Open a dataflow in the center panel of Studio.

     2.    Click the Start button on the Dataflow Build tab in the ribbon bar.

     Note: When running a dataflow within Studio, the paths to data files and any external scripts used by the
                dataflow must be accessible. Paths specified in Connection files must be reachable from the
                system running Studio. If a relative pathname is used, it must start with the dataflow's external
                directory (workspace_name\Metadata\project_name\dfp\dataflow_name\external).
                External scripts called by datascript within transformation operators must be in the dataflow's
                external directory or accessible in the require statement's search path order. See Call external
                scripts from an expressor script for details on including external scripts in datascript. If you use
                Datascript Modules for external scripts, then you do not need to set paths because Datascript
                Modules are part of the Project.

     3.    Select the Results tab in the bottom portion of the center panel, below the dataflow.

          The results of the processing are displayed under the Results tab.

     4.    Check the output source file or database to confirm the dataflow produced the intended results.

     Note: Multiple dataflows can be run simultaneously.
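External scripts are brought into datascript with the require statement, which follows standard Lua conventions. The module name and exported function below are hypothetical, used only to show the shape of the call:

```lua
-- Load an external script (e.g., cleanup.lua placed in the dataflow's
-- external directory, or elsewhere on the require search path).
-- "cleanup" and "trim" are hypothetical names used only for illustration.
local cleanup = require "cleanup"

local trimmed = cleanup.trim("  some value  ")
```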






Manage Log File Output
When expressor Studio runs a dataflow, logging information is displayed in the Results tab of the Status panel.
Currently the detail of the logging messages is not managed from within expressor Studio but by setting an
environment variable in the Environment Variables window accessed from the Start > Control Panel > System >
Advanced > Environment Variables button.

The default setting for the log variable is 4. Specifying a value for this variable is completely optional and should only
be done if you need to alter the detail of the logging messages, for example when tracking down a connectivity issue.

    1.   Create a System variable named EXP_TRACE_LEVEL.

    2.    Set EXP_TRACE_LEVEL to a value between 1 and 7.
         This has the effects summarized in the following table.



          EXP_TRACE_LEVEL       Description

          1                     Logs critical errors and failures and a description of why the error occurred.

          2                     Logs noncritical errors and failures, such as the license file being about to expire.

          3                     Logs noticeable events.

          4                     Logs standard operational information, such as initialization of operators or
                                anything out of the ordinary.

          5                     Logs extended operational information.

          6                     Logs standard debugging information.

          7                     Logs extended debugging information.




    3.   Restart Studio after setting or changing the value of EXP_TRACE_LEVEL.







Debug a Dataflow
Studio provides for Dataflow debugging during development and when the Dataflow is run.

While building a Dataflow:

     1.    Select the Messages tab in the bottom portion of the center panel, beneath the Dataflow, to debug while
           building a Dataflow.

     2.    Read messages displayed as you add, link, and configure operators.

Before running a Dataflow:

     1.    Insert Output Operators, such as Write File and Write Table, after an operator that processes data, such as
           Sort or Join.

     2.    Click the Start button on the Dataflow Build tab on the ribbon bar.

     3.    Examine the results written by the Output Operator to determine if the processing operator performed as
           intended.

While running a Dataflow:

     1.    Select the Results tab in the bottom portion of the center panel, beneath the Dataflow, to display tracking
           and error messages in the running Dataflow.
           To specify the information to log during the dataflow run, see Manage Log File Output.

     2.    Click the Start button on the Dataflow Build tab on the ribbon bar.

     3.    Read messages under the Results tab as the Dataflow runs.

While writing Datascript in the Rules Editor:

          See Testing Datascript with Synthetic Debugging.


Reference: Propagation Rules
While data types generally propagate from left to right in a dataflow, there are variations in the propagation rules that
give users flexibility and simplify the matching of Composite Type attributes in operators.

Attributes propagate in only one direction: downstream. Propagation refers to the movement of attributes between
operators. Similarly, within operators, attributes transfer from input to output; this movement of attributes from
input to output is called Attribute Transfer.

     Note: A transformation operator is one of the five operators that use the Rules Editor to change data
               (Transform, Join, Aggregate, Pivot Row, and Pivot Column). Transformation operators are the
               only operators that can change output attributes to match the attributes required by Write
               operators.







General Rules
   1.   A Composite Type is set in a Read operator, and its attributes flow downstream from there.

   2.   All downstream operators, except Write operators, automatically take the current output attributes from the
        upstream operator as their input attributes.

   3.   Downstream operators, except Write operators, automatically transfer input attributes to output attributes.

   4.   Users can explicitly create additional output attributes.

   5.   Users cannot explicitly assign or change input attributes. No input attributes are assigned until an upstream
        operator is attached to the input port of the transformation operator.

Input Attributes
   1.   Output attributes are marked with an arrow pointing right, or downstream, when they have been transferred
        from an input attribute.

   2.   When an attribute propagates as input to a transformation operator and the operator contains a Local
        output attribute with the same name, the Local output attribute is used instead of the transferred attribute.
        The Local attribute propagates downstream, as necessary, from that point.

   3.   In a Join operator, all input attributes from input1 are transferred to output except those for which a Local
        attribute with the same name exists. Attributes from input2 are transferred to output if there is no Local
        attribute or input1 attribute with the same name.

   4.   In Aggregate operators, only the input attributes selected as keys propagate to the output.

   5.   When an output attribute has the same name but a different data type than the input attribute with that
        name, the input attribute transfers to output and is propagated to the next operator. The original output
        attribute with that name is not propagated.

   6.   When the name of an input attribute is changed in an upstream operator, it propagates downstream and
        replaces the attribute with the original name.

   7.   When the data type of an input attribute is changed in an upstream operator, the change is reflected in the
        propagated attribute.

Local Attributes
   1.   Local Attributes are marked with a diamond icon in the Rules Editor.

   2.   When a new Local attribute is created with the same name as an existing transferred output attribute, it
        replaces the attribute transferred from input.

   3.   When a Local attribute is renamed with the name of an existing transferred attribute, it replaces the attribute
        transferred from input.

   4.   When a Local attribute is renamed with the name of a Required Attribute, the Required Attribute is fulfilled
        and disappears from the output.

   5.   When a Local attribute that fulfills a Required Attribute is renamed, the Required Attribute reappears in the
        output as unfulfilled.






     6.   The data type of a Local attribute can be changed.

     7.   When a Local attribute is deleted and an input attribute exists with the same name, the input attribute will
          appear as a transferred output attribute.

     8.   When a Local attribute that fulfills a Required Attribute is deleted, the Required Attribute reappears in the
          output as unfulfilled.

Required Attributes
     1.   Composite Type attributes set in a Write operator must be matched by attributes in the transformation
          operator immediately upstream. Such attributes are called Required Attributes.

     2.   When a Required Attribute appears in a transformation operator's output, an arrow icon pointing left, or
          upstream, is displayed next to it.

     3.   While attributes only propagate downstream, Required Attributes can appear in an upstream transformation
          operator to indicate that they are required downstream.

     4.   When an input attribute is mapped to the required attribute, its icon changes to the diamond shape used to
          indicate Local attributes, attributes that are explicitly assigned in the operator.

     5.   When a transformation operator contains a Required Attribute with the same name as an input attribute, the
          input attribute is assigned to the output and the Required Attribute is removed.

     6.   If the input attribute assigned to a Required Attribute has a different data type, a Type Mismatch error will
          be indicated on the Write operator.

     7.   When a new Local attribute is created with the same name and data type as a Required Attribute, the Local
          attribute replaces the Required Attribute.

     8.   When a new Local attribute is created with the same name but a different data type than a Required Attribute,
          the Local attribute replaces the Required Attribute, but the Write operator containing that Required
          Attribute will get a Type Mismatch error.

     9.   When an attribute in a Write operator is renamed, it will be fulfilled if the upstream operator contains an
          attribute with the same name.

     10. When a renamed attribute in a Write operator is not matched by an upstream attribute, it will appear as a
          Required Attribute in the upstream operator.




Operators

Aggregate Operator
The Aggregate operator performs grouping operations on incoming records.

The primary use of the Aggregate operator is to calculate summary data (e.g. averages or maximum and minimum
values) from a group of incoming records. Groups are formed on the basis of key values or the result of a change
function.

The following table lists the properties that control its functionality.


      Property                                                       Effect


Use Change                Specifies that the change function, rather than the Aggregate keys, will be
Function                  used to group records for aggregation. The change function must be written
                          for the aggregate operation, and the Change Rule must be enabled.


Aggregate keys            Specifies the key field(s) used to group the incoming records.


Error handling            Specifies the action to take if a record contains invalid data. One common reason for
                          invalid data is that it failed to pass constraint validation. See Reference: Constraint
                          Corrective Actions.

                          Error handling actions available to the Aggregate operator are:

                          Abort Dataflow
                          Skip Remaining
                          Skip Group


Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.


Method                    Specifies whether records need to be maintained in memory or on disk during
                          processing.

                          Sorted indicates that records are sorted in ascending order by key value and that
                          records do not need to be maintained in memory or on disk during processing.

                          The other options – On disk and In memory – specify where incoming records
                          are stored during processing:

                                   In memory means all processing takes place in RAM.

                                   On disk means records are written to disk.

                          If all the records from input fit into the operator's internal buffer
                          (aggregate.alignment.block.size), the method defaults to In memory
                          regardless of the actual configuration setting.


Working connection       Name of a File Connection that identifies the location for writing temporary files
                         used in processing.




When writing a script for this operator, implementations are provided for mandatory and helper functions.


Global Variable

work

        This variable is re-initialized to an empty table as the operator begins
        processing each group of records with the same key value. It is available
        when processing all the records in a group and is used to retain
        information throughout the processing of the records.

Mandatory Functions

aggregate(input, index)

        The Engine invokes this function as each record is processed. Include in
        the method body expressor Datascript that performs the desired data
        manipulation.

        Arguments:

                input – the record to be processed
                index – which record in the group is being processed

        Data extracted and processed from the record are used to update the
        value(s) stored in work.

        Note: The aggregate function can also be called as the "collate" function.
        However, the name "collate" will be deprecated in the future.

result(input, count)

        The Engine invokes this function after the last record with a specific key
        value has been processed. Include in the method body expressor Datascript
        that finalizes the processing and returns the output record.

        Arguments:

                input – the last record in the group of records with the same
                key value

                count – the number of records in the group

        Data extracted and processed from work and/or input are used to initialize
        the output record. This function returns the output record.




Helper Functions

change(input, previous)

        Providing an implementation for this function is optional, though a rule
        for the change function is included in the Aggregate operator by default.
        This rule cannot be deleted, but initially it is disabled. It must be
        enabled in the Rules Editor before it can be executed.

        Arguments:

                input – the current input record

                previous – the previous input record

        If this function returns true, the operator begins processing another
        group. If it returns false, processing of the current group continues.

        When providing an implementation for this function, the Method
        configuration option must be set to Sorted.

prepare(input)

        The Engine invokes this function before processing records with a new key
        value.

        Argument:

                input – the first record in the group of records having the same
                key value

        Include in the method body expressor Datascript that initializes variables
        needed for the operator processing. If desired, initialize a variable
        named work with the fields needed to complete the processing and
        initialize the output record.

        If this function is not implemented, an instance of the work variable
        (representing an empty table) is available to the aggregate function when
        processing the first record with a specific key value.







sieve(output)

        Providing an implementation for this function is optional.

        Argument:

                output – the output record

        If the script returns true or a non-nil value, the evaluation is "true"
        and the record is emitted by the operator. If the script returns false or
        nil, the evaluation is "false" and the record is not emitted by the
        operator.

initialize()

        Invoked one time before the operator begins to process records. This
        function has no arguments and no return value. See The initialize and
        finalize functions for details on using initialize.

finalize()

        Invoked one time after the operator has processed all records
        successfully. This function has no arguments and no return value. See The
        initialize and finalize functions for details on using finalize.
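
As an illustration, a sieve function can suppress output records entirely. The following sketch (the count field is hypothetical, not one defined elsewhere in this documentation) emits a group's output record only when at least one value was accumulated:

```
function sieve(output)
  -- returning true or a non-nil value emits the record;
  -- returning false or nil suppresses it
  return output.count ~= nil and output.count > 0
end
```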




When the Aggregate operator is opened in the Rules Editor, its basic functions are in place, and the prepare
function is set up based on the configuration of the operator's properties. The following is an example of the
operator in the Rules Editor.

     function prepare(input)

       --assign work.average_balance to the first input.average_balance
       work.average_balance = input.average_balance

       --assign work.average_balance.sum to the first input.average_balance.sum
       work.average_balance.sum = input.average_balance.sum

       --assign work.average_balance.count to the first input.average_balance.count
       work.average_balance.count = input.average_balance.count

       --assign work.average_line to the first input.average_line
       work.average_line = input.average_line

       --assign work.average_line.count to the first input.average_line.count
       work.average_line.count = input.average_line.count

       --assign work.maximum_balance to the first input.maximum_balance
       work.maximum_balance = input.maximum_balance

       --assign work.minimum_balance to the first input.minimum_balance
       work.minimum_balance = input.minimum_balance

     end

     function aggregate(input, index)

     end

     function result(input, count)

       --set the output record to the values calculated in work
       output = work
       return output

     end

The initial set up in the prepare function usually needs to be changed for the particular processing that will be done
by the aggregate function. For example, the following script in the prepare function prepares data records for the
aggregate and result functions to calculate the average account balance and the average credit line.

     function prepare(input)

        work.average_balance = {}
        work.average_balance.sum = 0
        work.average_balance.count = 0
        work.average_line = {}
        work.average_line.sum = 0
        work.average_line.count = 0
        work.maximum_balance = input.account_balance
        work.minimum_balance = input.account_balance

     end

     function aggregate(input, index)
       if input.account_balance ~= nil then
         work.average_balance.sum = work.average_balance.sum + input.account_balance
         work.average_balance.count = work.average_balance.count + 1
       end
       if input.credit_line ~= nil then
         work.average_line.sum = work.average_line.sum + input.credit_line
         work.average_line.count = work.average_line.count + 1
       end
       if input.account_balance ~= nil and input.account_balance > work.maximum_balance
         then work.maximum_balance = input.account_balance end
       if input.account_balance ~= nil and input.account_balance < work.minimum_balance
         then work.minimum_balance = input.account_balance end
     end

     function result(input, count)
       work.average_balance = work.average_balance.sum / work.average_balance.count
       work.average_line = work.average_line.sum / work.average_line.count

       output = work
       return output

     end
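
When the Use Change Function property is selected, a change function determines the group boundaries instead of the Aggregate keys. The following sketch (the region attribute is hypothetical, not part of the example above) starts a new group whenever the value of region differs from the previous record; the Method property must be set to Sorted:

```
function change(input, previous)
  -- returning true begins a new group;
  -- returning false continues the current group
  return input.region ~= previous.region
end
```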



Buffer Operator
The Buffer operator provides temporary storage space in a dataflow that has a multi-input operator, such as Join,
accepting input from a common source. The Buffer operator copies data on its single input port to its output port
without changing or reordering the data in any way. It accepts input data continuously and holds it, if necessary, until
the downstream operator is ready to accept it.

The Buffer operator is used to prevent deadlock in a dataflow. To be effective, Buffer operators must be used on each
data stream targeted to the multi-input operator.

Buffer operators are needed when a single stream of data is split into two streams and then later recombined using a
multi-input operator other than a Funnel operator. The Join operator is currently the only such multi-input operator.

The following table lists the properties that control its functionality.







        Property                                                    Effect


Working connection        Name of a File Connection that identifies the location for writing temporary files
                          used in processing.




Copy Operator
The Copy operator copies each input record to two to ten output ports. To configure the operator, specify the
number of outputs and a file describing the connection.

The following table lists the properties that control its functionality.


        Property                                                    Effect


Number of outputs         Select from the drop down the number of output ports; default 2, maximum 10.




Filter Operator
The Filter operator is unusual for a Transformer Operator in that it does not change the structure of the record.
Instead, it sorts input records into one of two categories: true or false. This determination is based on one or more
user-supplied tests, which are entered as rules in the Rules Editor. One Filter operator in a dataflow can have up to
ten rules. It has one output port for each rule and a single output port for all records that test false. Records that
evaluate true are sent out the port attached to the rule that evaluated the record.

When the Filter operator uses more than one user-supplied test, it operates in one of two modes:

         All: all rules are evaluated and each one that evaluates "true" sends the appropriate data to its associated
          output port.

         First: rules are evaluated in order until one of the rules evaluates "true." Rules evaluation stops at that point.

The mode is selected when the Filter operator is open in the Rules Editor.


        Property                                                     Effect


Error handling             Specifies the action to take if a record contains invalid data. One common reason
                           for invalid data is that it failed to pass constraint validation. See Reference:
                           Constraints on Semantic Types and Reference: Constraint Corrective Actions.

                           Error handling actions available to the Filter operator are:

                           Abort Dataflow
                           Skip Remaining
                           Reject Remaining







                          Skip Record
                          Reject Record
                          Reject Current and Skip Remaining

Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.




When writing a script for this operator, implementations are provided for mandatory and helper functions.


Mandatory Function

filter(input)

        The Engine invokes this function as each record is processed. Include in
        the method body expressor Datascript code that uses field values from the
        record to return "true" or "false." The argument to filter is the incoming
        record.

        Unlike the other operators, the Filter operator has two outputs on the
        right side of its icon. The Filter operator uses the values extracted from
        the incoming record's fields in its processing logic. If the processing
        logic returns "true" or a non-nil value, the operator interprets the
        result as "true" and emits the record through the upper output. Otherwise,
        the record is emitted through the lower output.

        Additional outputs can be added in the Rules Editor so that more than one
        rule can be applied to test the incoming record. The lowest output is
        always reserved for the output records that test "false."

        Outputs do not have to be connected. An unconnected output is treated as
        if its rule evaluates to "false."

        The Reject output on the bottom of the operator shape is used when the
        execution of any rule fails. The Reject output does not have to be
        connected. When it is connected, it affects handling of records when the
        All option has been selected. In that situation, records are not written
        as soon as a rule evaluates "true." Instead, all rules must be executed
        first to determine whether or not the record will be rejected.


Helper Functions

initialize()

        Invoked one time before the operator begins to process records. This
        function has no arguments and no return value. See The initialize and
        finalize functions for details on using initialize.

finalize()

        Invoked one time after the operator has processed all records
        successfully. This function has no arguments and no return value. See The
        initialize and finalize functions for details on using finalize.




The Datascript in the Filter operator is constructed to test the selected input values for adherence to a stated rule.
The basic script provides the framework: a rule whose body simply returns "true" for every record. A simple true-false
expression can replace the "true" value.
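
For example, a rule that routes overdrawn accounts to its output port might look like the following sketch (account_balance is a hypothetical input attribute, not one defined by this documentation):

```
function filter(input)
  -- emit through this rule's output port when the balance is negative;
  -- a nil balance makes the expression false, so the record goes to the false port
  return input.account_balance ~= nil and input.account_balance < 0
end
```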




Funnel Operator
The Funnel operator combines data from multiple inputs that have the same Composite Type into a single output
stream. The Funnel operator takes input data from whichever input port has data ready to process so that it cannot
become blocked or deadlocked waiting for input from a specific port. It does not interleave or concatenate the input
data. It simply takes data as it comes through any input port and adds it to the stream.

The Funnel operator is used to combine data from two or more paths in a dataflow into a stream that can be delivered
to an operator that accepts data on only one port, such as a Write File or Write Table operator. The Funnel operator
differs from the Join operator, which also accepts input on multiple ports, in that it does not perform any join
processing; it simply combines the input data in the most efficient way possible.

The order in which data is selected from among the available inputs is non-deterministic and may vary from run to
run of a dataflow, even with the same data.

The following table lists the properties that control its functionality.


      Property                                                      Effect


Number of inputs          Select from the drop down the number of input ports; default 2, maximum 10.







Join Operator
The Join operator joins records from two inputs by a common key, similar to joining two database tables.

The following table lists the properties that control its functionality.


      Property                                                         Effect


Join keys                 Specifies the key field(s) used to join input records.


Primary input             Specifies the driving, or primary, input port. The default is 0, which sets
                          the driver to input0. The input0 port is the top port on the left side of
                          the Join operator shape.


Join type                 inner means that when both inputs include a record with the same key value(s),
                          the records will be emitted from the primary output port on the Join operator
                          shape's right side. These emitted records might contain data derived from each of
                          the input records. Unmatched records will be emitted on one of the secondary
                          output ports on the shape's lower side.

                          outer means that all matched and unmatched records will be emitted from the
                          primary output port. Any fields in this record derived from the missing record will
                          be set to nil. No records will be emitted on the secondary output ports on the
                          shape's lower side.

                          0 or 1 is the same as specifying an outer join for only one of the inputs. On the Join
                          operator shape, the top port on the left side is the 0 input port, and the bottom
                          port on the left side is the 1 input port.

                          Join Type 0: records arriving on the 0 input port that do not have a matching
                          record arriving on input port 1 are emitted on the secondary output 0 port, which
                          is the port on the bottom left of the Join operator shape.

                          Join Type 1: records arriving on the 1 input port that do not have a matching
                          record arriving on input port 0 are emitted on the secondary output 1 port, which
                          is the port on the bottom right of the Join operator shape.


Method                    Specifies whether records need to be maintained in memory or on disk during
                          processing.

                          sorted indicates that records are sorted in ascending order by key value and that
                          records do not need to be maintained in memory or on disk during processing.

                          The other options – On disk and In memory – specify where incoming records
                          are stored during processing:

                                   In memory means all processing takes place in RAM.

                                   On disk means records are written to disk.

                          If all the records in the non-primary input fit into the operator's
                          internal buffer (joiner.alignment.block.size), the method defaults to
                          In memory regardless of the actual configuration setting.


Working connection        Specifies the name of a File Connection that identifies the location for writing
                          temporary files used in the sorting process.

                          When the Join operator fills its specified allocation of in-memory
                          storage, a temporary file is created. These files are merged into the
                          output. The Working connection specifies the location for these
                          temporary files.


de-duplicate input 0      Allows selection of only the first or last record for a specific key field value on the
de-duplicate input 1      specified input port. These options are used to select a single record from the
                          corresponding input when more than one record has the same key value. This
                          record is then joined with all matching records from the other input. If a selection is
                          made for both de-duplicate options, at most one record is emitted for each key.

                          The default (blank) is to select all records with the same key field value. Alternative
                          options, first and last, select a single record.


Error handling            Specifies the action to take if a record contains invalid data. One common reason
                          for invalid data is that it failed to pass constraint validation. See Reference:
                          Constraint Corrective Actions.

                          Error handling actions available to the Join operator are:

                          Abort Dataflow
                          Skip Remaining
                          Skip Record

                          If any rule in the Join operator has an execution error, the recovery action will apply
                          to all of the rules. All other errors, such as input validation and rule parameter
                          validation, will cause the dataflow to abort.

Show errors               Displays record errors associated with constraint validation and the recovery
                          actions taken. The errors are displayed in the Studio Messages panel and in the
                          expressor command prompt window when using the etask command.

When writing a script for this operator, you provide implementations for the following mandatory and
optional functions.


 Mandatory
 Function
                    join                 The Engine invokes this function as each pair of matched
                                         records is processed. Include in the method body expressor
                                         Datascript code that uses field values from the records to
                                         initialize the output record.

                                         The arguments to join are the two incoming records. Data
                                         extracted and processed from the records are used to initialize
                                         the output record. This function returns the output record.

                                       Note: The join function can also be called as the "joiner"
                                       function. The Engine looks for "join" first.


                    sieve               Providing an implementation for this function is optional.
                                        Argument: output (the output record).

                                        If the script returns true or a non-nil value, the evaluation is
                                        "true" and the record is emitted by the operator.

                                        If the script returns false or nil, the evaluation is "false" and
                                        the record is not emitted by the operator.
 Helper
 Functions
                    initialize          Invoked one time before the operator begins to process
                                        records.

                                        This function has no arguments and no return value. See The
                                        initialize and finalize functions for details on using initialize.


                                       Invoked one time after the operator has processed all records
                                       successfully.
                    finalize
                                       This function has no arguments and no return value. See The
                                       initialize and finalize functions for details on using finalize.
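
Taken together, a minimal Join operator script might look like the following sketch. This is a hypothetical example: the attribute names (custId, name, total) are illustrative and would have to match the Composite Types assigned to the operator's ports.

```lua
-- Hypothetical Join script; the attribute names are illustrative and
-- must match the Composite Types assigned to the operator's ports.
function join(input0, input1)
  -- build the output record from fields of both matched records
  return {
    custId = input0.custId,
    name   = input0.name,
    total  = input1.total
  }
end

function sieve(output)
  -- optional filter: emit only joined records with a positive total
  return output.total > 0
end
```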




Pivot Column Operator
The Pivot Column operator takes data from the columns of multiple records and creates a single record. Records
often need to be pivoted to redirect data to a new database schema.

For example, a column containing records with individual months could be pivoted to create a single record
containing fields for each month.

The following table lists the properties that control the Pivot Column operator's functionality.


      Property                                                     Effect


Error handling           Specifies the action to take if a record contains invalid data. One common reason for
                         invalid data is that it failed to pass constraint validation. See Reference: Constraint
                         Corrective Actions.







                             Error handling actions available to the Pivot Column operator are:

                             Abort Dataflow
                             Skip Record
                             Skip Remaining


Show errors                  Displays record errors associated with constraint validation and the recovery actions
                             taken. The errors are displayed in the Studio Messages panel and in the expressor
                             command prompt window when using the etask command.


Method                       Specifies whether records need to be maintained in memory or on disk during
                             processing.

                             Sorted indicates that records are sorted in ascending order by key value and that
                             records do not need to be maintained in memory or on disk during processing.

                              The other options – On disk and In memory – specify where incoming records
                              are stored during processing:

                                     In memory means all processing takes place in RAM

                                     On Disk means records are written to disk

                              If all the records from the input fit into the operator's internal buffer, the method
                              defaults to In memory regardless of the actual configuration setting.


Working connection           Name of a File Connection that identifies the location for writing temporary files
                             used in processing.


To follow the example indicated above, a table whose records show the monthly sales totals for each sales
representative could be pivoted to list each sales representative's sales totals month by month in a single record. See
Write Pivot Rules for instruction on using the Rules Editor's Pivot tool to restructure the input attributes into the
output attributes.

Input attributes:


Month      ID        Sales


Jan.       1011      100


Feb.       1011      200


Mar.       1011      300


Jan.       1012      101


Feb.       1012      201







Mar.      1012      301


Jan.      1013      102


Feb.      1013      202


Mar.      1013      302


Jan.      1014      103


Feb.      1014      203


Mar.      1014      303




Output attributes:


ID       Jan.     Feb.    Mar.


1011     100      200     300


1012     101      201     301


1013     102      202     302


1014     103      203     303
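
The restructuring shown in these tables can be sketched in expressor Datascript terms. This is illustrative only: the Pivot Column operator itself is configured with the Rules Editor's Pivot tool, not with a script.

```lua
-- Illustrative sketch of the Pivot Column restructuring (the operator
-- itself is configured in the Rules Editor, not scripted).
local input = {
  { Month = "Jan", ID = 1011, Sales = 100 },
  { Month = "Feb", ID = 1011, Sales = 200 },
  { Month = "Mar", ID = 1011, Sales = 300 }
}
local output = {}
for _, rec in ipairs(input) do
  -- one output record per ID, with one field per month
  output[rec.ID] = output[rec.ID] or { ID = rec.ID }
  output[rec.ID][rec.Month] = rec.Sales
end
-- output[1011] is now { ID = 1011, Jan = 100, Feb = 200, Mar = 300 }
```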




Pivot Row Operator
The Pivot Row operator takes data from a row of record fields and creates separate records for each field. Records
often need to be pivoted to redirect data to a new database schema.

For example, a record row with data organized by month could be pivoted to create individual records for each
month.

The following table lists the properties that control the Pivot Row operator's functionality.


       Property                                                      Effect


Error handling             Specifies the action to take if a record contains invalid data. One common reason
                           for invalid data is that it failed to pass constraint validation. See Reference:
                           Constraints on Semantic Types and Reference: Constraint Corrective Actions.







                            Error handling actions available to the Pivot Row operator are:

                            Abort Dataflow
                            Skip Record
                            Reject Record
                            Skip Remaining
                            Reject Remaining


Show errors                 Displays record errors associated with constraint validation and the recovery actions
                            taken. The errors are displayed in the Studio Messages panel and in the expressor
                            command prompt window when using the etask command.


To follow the example indicated above, a table whose records show the monthly sales totals for each sales
representative could be pivoted to list each month separately for each sales rep. See Write Pivot Rules for instruction
on using the Rules Editor's Pivot tool to restructure the input attributes into the output attributes.

Input attributes:


ID       Jan.     Feb.    Mar.


1011     100      200     300


1012     101      201     301


1013     102      202     302


1014     103      203     303




Output attributes:


Month      ID       Sales


Jan.       1011     100


Feb.       1011     200


Mar.       1011     300


Jan.       1012     101


Feb.       1012     201


Mar.       1012     301







Jan.       1013     102


Feb.       1013     202


Mar.       1013     302


Jan.       1014     103


Feb.       1014     203


Mar.       1014     303
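
The inverse restructuring can be sketched the same way. Again, this is illustrative only: the Pivot Row operator itself is configured with the Rules Editor's Pivot tool, not with a script.

```lua
-- Illustrative sketch of the Pivot Row restructuring (the operator
-- itself is configured in the Rules Editor, not scripted).
local input = { ID = 1011, Jan = 100, Feb = 200, Mar = 300 }
local months = { "Jan", "Feb", "Mar" }
local output = {}
for _, m in ipairs(months) do
  -- one output record per month field of the input record
  output[#output + 1] = { Month = m, ID = input.ID, Sales = input[m] }
end
-- output now holds three records, e.g. { Month = "Jan", ID = 1011, Sales = 100 }
```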




Read Custom Operator
The Read Custom operator executes expressor Datascript code to produce data that can be processed by a
downstream operator. Using the built-in capabilities of Datascript, Read Custom can access source data systems that
might not be accessible by other Read operators provided by expressor. For example, in the Web Services sample
application, a Read Custom operator is used to access source data through a web services call to a SaaS application.

Additional sections in this topic

Read Custom Operator Properties

Read Custom Functions

Return values for the read function

Defining the Output Type of the data produced by the Read Custom operator

Error Handling

Example 1: Standard Use

Example 2: Using an iterator function to produce records

Related topics

The expressor Datascript language

Transformations

Write Custom Operator

Read Custom Operator Properties
The following table lists the properties that control its functionality.


       Property                                                     Effect







 Error handling          Specifies the action to take if a record contains invalid data. See Reference:
                         Constraints on Semantic Types and Reference: Constraint Corrective Actions.

                         Error handling actions available to the Read Custom operator are:

                         Abort Dataflow
                         Skip Remaining
                         Skip Record


 Show errors             Displays record errors associated with constraint validation and the recovery
                         actions taken. The errors are displayed in the Studio Messages panel and in the
                         expressor command prompt window when using the etask command.




 Read Custom Functions
 Implementation of the following functions controls the behavior of the Read Custom operator.




 Mandatory
 Functions
                  read                The Engine invokes this function to get data to send to the downstream
                                      operator in the dataflow.

                                      The standard approach is for the expressor Engine to call the read
                                      function repeatedly; each time the function returns, it provides a
                                      record to send downstream. The custom code in the read function
                                      populates and returns that record, or one of the other acceptable
                                      return values that control the behavior of the read function as
                                      described in the Return Values table below.



 Helper
 Functions
                  generate            A Lookup Rule uses this function to generate a new record when
                                      the requested record does not exist in the specified Lookup
                                      Table.

                                      The generate function is written when the On miss setting on a
                                      Lookup Rule is Generate Record. When the On miss setting is
                                      Output Nil or Escalate Error, the generate function is not
                                      needed.


                                     The Engine invokes this function one time before the operator

                  initialize         begins to invoke the read function.

                                     This function has no arguments or return values.







                                    Use this function to set up a connection to your data source, e.g.,
                                    establish a connection to an FTP server or obtain a handle to a
                                    file that will be progressively read on each invocation of the
                                    read function.

                                    See The initialize and finalize functions for details on using
                                    initialize.


                                    The Engine invokes this function one time after the operator has
                                    completed emitting all records, that is, after the operator stops
                                    receiving records from its upstream operator.

                                    This function has no arguments or return values.
                 finalize
                                    Use this function to free any resources obtained during execution
                                    of the initialize function.

                                    See The initialize and finalize functions for details on using
                                    finalize.




Return values for the read function

Returned value     Meaning


table              This return value indicates the output record to send to the
                   operator downstream of the Read Custom operator. Note that
                   the data values must be assigned using the names of the output
                   type attributes expected by the Read Custom operator. See the
                   Example below.

                   With this return value, the read function will be called again.


true               This return value indicates that there are no more records to
                   send downstream and the read function should not be called
                   again.


false              This return value indicates that a record value is not being
                   provided on this call to the read function but that the read
                   function should be called again by the Engine anyway.


nil                This return value indicates that an error occurred and that the
                   read function should not be called again.







Advanced use


iterator function    This is for advanced use. See Example 2: Using an iterator
                     function to produce records.
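
As a compact illustration of these return values, a read function might be structured as in the following sketch. The source() helper is hypothetical; it stands in for whatever call fetches the next unit of data.

```lua
-- Minimal sketch of a read function exercising each return value.
-- source() is a hypothetical helper that yields the next line of input,
-- or nil when the source is exhausted, plus an error value on failure.
function read()
  local line, err = source()
  if err then
    return nil          -- error: do not call read again
  elseif not line then
    return true         -- no more records: stop calling read
  elseif line == "" then
    return false        -- no record this time, but call read again
  else
    return { recordData = line }  -- emit a record; read is called again
  end
end
```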




Defining the Output Type of the data produced by the Read Custom operator
Composite Types are required to define the data produced by the read function and propagated to the downstream
operator. The Composite Type attributes are assigned within the Rules Editor. The attributes must, of course, map to
the data produced by the Read Custom datascript.

The Datascript example below illustrates the syntax used to assign data from the read function to the attributes
defined by the output Type.

Error Handling
Errors that occur while the Read Custom operator is processing are handled as follows.


Description of the     When detected                                    Action
error


Syntax error in        On operator initialization                       The flow will terminate
datascript                                                              abnormally.


Datascript logic       On operator initialization in the                The flow will terminate
errors                 initialize function                              abnormally.


Generated errors       While read function or an iterator is            The error can be
                       running                                          handled. See the error
                                                                        handling property for
                                                                        more details.


Field processing       When a record is converted from datascript       The error can be
errors                 table to an internal form                        handled. See the error
                                                                        handling property for
                                                                        more details.




Example 1: Standard Use
This example reads lines from a text file and produces records containing a record number and the text of each line.
The operator's output Composite Type consists of two attributes:


        recordNumber






         recordData

Empty lines are skipped (not sent downstream).

Setup
To successfully use this example, add any text file named "data.txt" to the External Files folder in the same project as
the dataflow that contains this Read Custom operator.

Code for Read Custom operator

fileName = "data.txt"

function initialize()
  local message = ""
  infile, message = io.open(fileName, "r")
  if not infile then
    error("Unable to open file '" .. fileName .. "' for reading.")
  end
  recordNumber = 0
end

function read()
  -- return records or behavioral flags
  local data = infile:read()
  if not data then
    -- all done
    return true
  end
  recordNumber = recordNumber + 1
  if string.length(data) < 1 then
    -- skip this empty line
    return false
  else
    -- return/produce a table (the record)
    return {
      recordNumber = recordNumber,
      recordData = data
    }
  end
end

function finalize()
  infile:close()
end

Example 2: Using an iterator function to produce records
As described above, the standard approach to using the read function is to have the expressor Engine call the read
function repeatedly to produce records or provide behavioral indications.







An alternative approach is to use an iterator function to produce records. In this approach, the custom code in the
read function would return a function to the expressor Engine. The Engine would no longer call the read function
but would instead call the returned iterator function repeatedly to produce records.

In the example below, a Read Custom operator contains datascript that can read all files in the C:\temp directory
whose names end in ".txt" and streams their lines to the downstream operators along with their file names and record
numbers. It relies on an iterator function to read the data incrementally.

Setup
To successfully use this example, add one or more text files with the ".txt" extension to the C:\temp directory.

Code for the Read Custom operator
This example does the same thing as the preceding example but uses an iterator instead of returning records directly
from the read function. It is a contrived example in this case, but a realistic use would be to request a batch of data
from some external system that has significant latency (such as a web service). You can copy the datascript code
below and paste it into a Read Custom operator in your dataflow to serve as a starting point for your own customized
function.

     -- some globals to track where we're at
     file_directory = "C:\\temp"
     file_pattern = "*.txt"
     files = {}    -- array for the names of files that match the pattern
     cur_file_num = 1
     cur_line_iter = nil
     cur_file_name = nil
     cur_line_num = nil

     function initialize()
          -- find the names of the files that match the pattern
          local dir_cmd = "dir /B /A-D-H-S"
          local prefix = ''
          if type(file_directory) == 'string' then
               if type(file_pattern) == 'string' then
                    dir_cmd = dir_cmd .. ' ' .. file_directory .. '\\' .. file_pattern
               else
                    dir_cmd = dir_cmd .. ' ' .. file_directory
               end
               prefix = file_directory .. '\\'
          elseif type(file_pattern) == 'string' then
               dir_cmd = dir_cmd .. ' ' .. file_pattern
          end
          local pipe = io.popen(dir_cmd)
          for fname in pipe:lines() do
               -- strip trailing CR & NL
               files[#files + 1] = prefix .. string.gsub(fname, "([^%s]*)%s*", "%1")
          end
          pipe:close()
     end

     -- helper function to get a line iterator for the next file
     function getNextFileIterator()
          if infile then
               infile:close() -- close the file we just finished reading
          end
          while cur_file_num <= #files do
               cur_file_name = files[cur_file_num]
               infile, message = io.open(cur_file_name, "r")
               if infile then
                    cur_line_num = 1
                    cur_file_num = cur_file_num + 1 -- advance to next file
                    return infile:lines()
               end
               log.information("Could not open file %s for read (error: %s). Skipping.",
                    cur_file_name, message)
               cur_file_num = cur_file_num + 1 -- advance to next file
          end
          return nil -- must have exhausted all files in the pattern match list
     end

     -- the line iterator function (multi-file capable, thanks to helper above)
     function nextFileLine()
          while 1 do
               if not cur_line_iter then
                    cur_line_iter = getNextFileIterator()
                    if not cur_line_iter then
                         return nil -- exhausted all files
                    end
               end
               local data = cur_line_iter()
               while data do
                    if string.length(data) > 0 then
                         local tmpline = cur_line_num
                         cur_line_num = cur_line_num + 1
                         return { recordFile = cur_file_name,
                                  recordNumber = tmpline, recordData = data }
                    end
                    data = cur_line_iter()
               end
               -- exhausted data in current file
               cur_line_iter = nil -- to force next file open
          end
     end

     function read()
          return nextFileLine -- NOT a call, but returning a function instead
     end



Read File Operator
The Read File operator reads data from a file.

The following table lists the properties that control its functionality.


       Property                                                     Effect


Connection                The name of a File Connection that identifies the location of the input file.


Schema                    The name of a Schema that specifies the metadata structure of an incoming record.


Type                      Name of the Composite Type the Schema will be mapped to.


Mapping                   Name of the Schema-to-Composite-Type mapping.


File name                 The name of the file to read, including the file name extension, e.g., .txt. The file
                          system location of this file is provided by the Connection field.


Skip rows                 The number of rows to skip at the beginning of the file.


Quotes                    Specifies whether record fields in the incoming file are unquoted or might be
                          quoted.


Error handling            Specifies the action to take if a record contains invalid data. One common reason
                          for invalid data is that it failed to pass constraint validation. See Reference:
                          Constraint Corrective Actions.

                          Error handling actions available to the Read File operator are:







                          Abort Dataflow
                          Skip Remaining
                          Reject Remaining
                          Skip Record
                          Reject Record


Show errors               Displays record errors associated with constraint validation and the recovery
                          actions taken. The errors are displayed in the Studio Messages panel and in the
                          expressor command prompt window when using the etask command.




Read Lookup Table Operator
The Read Lookup Table operator reads the records in a Lookup Table. It is used to send the records to an operator
such as Write File that can write them to a readable file. This provides a means for examining the data in the Lookup
Table.


         Property                                                    Effect


Lookup Table              Specifies the name of the Lookup Table artifact that identifies the Lookup Table to
                          read.


Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.




Read Table Operator
The Read Table operator reads records from a single database table or view in a variety of RDBMS or proprietary
databases.

The following table lists the properties that control its functionality.


         Property                                                    Effect


Connection                 The name of a Database Connection that identifies the location of the input table.


Schema                     The name of a Schema that specifies the metadata structure of an incoming record.


Type                       Name of the Composite Type the Schema will be mapped to.


Mapping                    Name of the Schema-to-Composite-Type mapping.







Error handling          Specifies the action to take if a record contains invalid data. One common reason
                        for invalid data is that it failed to pass constraint validation. See Reference:
                        Constraint Corrective Actions.

                        Error handling actions available to the Read Table operator are:

                        Abort Dataflow
                        Skip Remaining
                        Reject Remaining
                        Skip Record
                        Reject Record


Show errors             Displays record errors associated with constraint validation and the recovery actions
                        taken. The errors are displayed in the Studio Messages panel and in the expressor
                        command prompt window when using the etask command.


Override                Specify a database, table, and schema to use for the operator instead of those
                        defined in the Schema artifact assigned to the operator. When this option is
                        selected, fields for Current values, Database, Schema, and Table display below.


Current values          Displays the current database, schema, and table specified for the operator. This is a
                        read-only field.

                        Note: Oracle databases show current values for the schema and table. The database
                        value for Oracle is not included and need not be supplied in the Override's
                        Database field.


Database                A database available through the specified Connection.

                        Note: This field can be left blank when the database you are connecting is an
                        Oracle database.


Schema                  An ODBC schema available with the newly specified database.


Table                   A table available in the newly specified database. The metadata structure of this
                        table must match the Schema artifact named in the Schema operator property.


    Note: To use the database operators with Microsoft MSSQL Server, both the SQL Server
              instance and the SQL Server Browser service must be running on the computer hosting
              the database system. The SQL Server instance must be configured to use mixed mode
              authentication if you want to specify the database credentials from within the project
              (i.e., through the Connection file). In this case, select the SQL Server and Windows
              Authentication mode radio button in the Server authentication grouping on the Server
              Properties, Security page accessed through the management console. Optionally,
              Windows authentication only may be used.






     Note: Tables read from an Excel spreadsheet must define the intended headers in the
              spreadsheet. Schemas created from Excel spreadsheet data derive their headers from
              the first row in the spreadsheet.

     Note: To use the Read Table operator to connect to a Teradata database, you must install the
              Teradata client libraries CLIV2, tdicu, and Teradata generic security service. expressor
              includes the driver used by the Read Table operator to communicate with Teradata, but
              it does not include those libraries. They must be acquired directly from the Teradata
              Download Center.



Sort Operator
The Sort operator sorts records according to specified key fields. The amount of memory allocated to the operator is
specified through a configuration variable or in an operator option.

The following table lists the properties that control its functionality.


      Property                                                      Effect


Sort keys                 Specifies the key field(s) on which input records are sorted.

                          An ascending sort is the default, but it can be explicitly specified with the
                          keyword asc.


Working connection        Specifies the name of a File Connection that identifies the location for writing
                          temporary files used in the sorting process.

                          When the Sort operator fills its allocated working memory, it writes a temporary
                          file. These temporary files are later merged into the sorted output. The Working
                          connection identifies the location for these files.


Working memory            The amount of memory that the sort operator can use while processing records.

                          A setting less than 8 megabytes defaults to 8 megabytes.




SQL Query
The SQL Query operator reads records from a database with a user-supplied SQL statement. Use this operator to
execute stored procedures or SELECT statements that include table joins and/or complex WHERE clauses.

The following table lists the options that control its functionality.


      Property                                                          Effect







Connection                The name of a Database Connection that identifies the location of the input table.


Schema                    The name of an SQL Query Schema that contains the SQL query to be performed on
                          the connected database table or view.


Type                      Name of the Composite Type the Schema will be mapped to.


Mapping                   Name of the Schema-to-Composite-Type mapping.


Error handling            Specifies the action to take if a data record fails to pass constraint validation. See
                          Reference: Constraints on Semantic Types and Reference: Constraint Corrective
                          Actions.

                          Error handling actions available to the SQL Query operator are:

                          Abort Dataflow
                          Skip Remaining
                          Reject Remaining
                          Skip Record
                          Reject Record


Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.


    Note: To use the SQL Query operator with Microsoft SQL Server, both the SQL Server instance and the
              SQL Server Browser service must be running on the computer hosting the database system. The
              SQL Server instance must be configured to use mixed mode authentication if you want to specify
              the database credentials from within the project (i.e., through the Database Connection).


Transform Operator
The Transform operator performs a transformation that is written in the Rules Editor. A transformation can be written
as either an Expression Rule or a Function Rule. The Transform operator can also use Lookup Rules to perform queries
on a Lookup Table.

The following table lists the properties that control its functionality.


       Property                                                      Effect


Error handling             Specifies the action to take if a record contains invalid data. One common reason
                           for invalid data is that it failed to pass constraint validation. See Reference:
                           Constraints on Semantic Types and Reference: Constraint Corrective Actions.

                           Error handling actions available to the Transform operator are:

                           Abort Dataflow






                            Skip Remaining
                            Reject Remaining
                            Skip Record
                            Reject Record


Show errors                 Displays record errors associated with constraint validation and the recovery actions
                            taken. The errors are displayed in the Studio Messages panel and in the expressor
                            command prompt window when using the etask command.




A transformation can be written as either an Expression Rule or a Function Rule. When writing Function Rules for the
Transform operator, the following mandatory and helper functions are available in the Rules Editor.


 Mandatory Function

 transform          The Engine invokes this function as each record is processed. Include one
                    rule with expressor Datascript code that transforms the data and returns
                    the output record.

                    The argument to the transform function is the incoming record. Data
                    extracted and processed from the record are used to initialize the output
                    record. This function returns the output record.


 Helper Functions

 filter             The Engine invokes this function as each record is processed, before
                    invoking the transform function. Include in the method body expressor
                    Datascript code that determines whether the record should be processed by
                    the transform function.

                    The argument to filter is the incoming record:

                    function filter(input)

                    If the processing logic returns true or a non-nil value, the operator
                    interprets the result as "true" and passes the record to the transform
                    function.

                    If it returns false or a nil value, the operator interprets the result as
                    "false" and does not pass the record to the transform function.

 generate           A Lookup Rule uses this function to generate a new record when the
                    requested record does not exist in the specified Lookup Table.

                    The generate function is written when the On miss setting on a Lookup Rule
                    is Generate Record. When the On miss setting is Output Nil or Escalate
                    Error, the generate function is not needed.

 sieve              Providing an implementation for this function is optional.

                    The argument to sieve is the output record.

                    If the script returns true or a non-nil value, the evaluation is "true"
                    and the record is emitted by the operator.

                    If the script returns false or a nil value, the evaluation is "false" and
                    the record is not emitted by the operator.

 initialize         Invoked one time before the operator begins to process records.

                    This function has no arguments and no return value. See The initialize and
                    finalize functions for details on using initialize.

 finalize           Invoked one time after the operator has processed all records
                    successfully.

                    This function has no arguments and no return value. See The initialize and
                    finalize functions for details on using finalize.
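The filter and sieve helpers described above can be sketched in Datascript as follows. This is an illustration only; the attribute names Id and Status are hypothetical, not taken from any expressor sample.

```
-- Hypothetical sketch of the helper functions.
-- filter: pass only records whose Id attribute is present on to transform.
function filter(input)
  return input.Id ~= nil
end

-- sieve: emit only output records whose Status attribute equals "ok".
function sieve(output)
  return output.Status == "ok"
end
```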




The Transform operator starts as a simple mapping of input to output. If the transformation is meant simply to
map input attributes to output attributes, no rules need be written.

Usually, though, the Transform operator is placed in the dataflow to perform a transformation that goes beyond
copying input to output. As indicated above, the transformation can be written as an Expression Rule or a
Function Rule. An Expression Rule is a simple assignment statement in which the expression operates on the input
parameters and assigns the result to a single output parameter. It does not explicitly use the transform
function; in an Expression Rule, the transform function is implied.
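As an illustration, an Expression Rule reduces to a single assignment. The string.concatenate call and the Firstname, Lastname, and Fullname attributes here follow the transform sample shown later in this section:

```
-- An Expression Rule is one assignment; the transform function is implied
output.Fullname = string.concatenate(input.Firstname, " ", input.Lastname)
```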

In a Function Rule, however, the transform function must be used explicitly. A Function Rule for the Transform
operator opens with the following skeletal Datascript:
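A minimal skeleton, inferred from the sample that follows, has this shape:

```
function transform(input)
  -- Add expressor Datascript here: read attributes from input,
  -- assign values to output, then return the output record
  return output
end
```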







The following sample Datascript for the transform function creates a full name out of separate fields for first name
and last name.

     function transform(input)
       output.Fullname = string.concatenate(input.Firstname, " ", input.Lastname)
       return output
     end




Trash Operator
The Trash operator serves as a "no operation" endpoint. It simply drops all records that it receives.

     Note: The Trash operator was previously named "Null." Null operators used in Dataflows in expressor
              software versions prior to version 3.4 will automatically be renamed as the Trash operator.




Unique Operator
The Unique operator selects data records based on one or more key fields and the specified mode of operation.

The following table lists the properties that control its functionality.


       Property                                                      Effect


Aggregate key             Specifies the key field(s) used to group the incoming records.


Mode                      Specifies the operation to be performed.

                                   first — Selects the first record in a series of records with matching keys. A
                                    single unique record is selected.

                                   last — Selects the last record in a series of records with matching keys. A
                                    single unique record is selected.

                                   unique — Selects only those records that have unique (no matching) keys.
                                      A single unique record is selected.

                                   duplicate — Selects only those records that have duplicate (matching)
                                    keys. A collection of records is selected.


Method                    Specifies whether records need to be maintained in memory or on disk during
                          processing.







                          Sorted indicates that incoming records are already sorted in ascending order by
                          key value and that records do not need to be maintained in memory or on disk
                          during processing.

                          The other options, On disk and In memory, specify where incoming records are
                          stored during processing:

                                  In memory means all processing takes place in RAM.

                                  On disk means records are written to disk.

                          If all the records in the non-primary input fit into the operator's internal buffer
                          (unique.alignment.block.size), the method defaults to In memory regardless of the
                          actual configuration setting.


Working connection        Name of a File Connection that identifies the location for writing temporary files
                          used in processing.


The following example illustrates how records are selected using each of the four modes of operation. The input data
records contain five fields:

Employee_Number

Last_name

First_name

Hire_Date

Department

In the following configuration of the Unique operator, the Hire_Date of each record is evaluated, and because the
Mode specified is "First record," the record or records with the earliest hire date are selected.







To get the latest hire date, you would select the same Aggregate key (Hire_Date) and specify the "Last record" Mode.

If we want to use the Unique operator to find all departments with only one employee, we would select the
Department key and specify "Unique records" as the Mode.




Finally, we could use the Unique operator to find out if, by mistake, we have two or more employees with the same
employee number. To do that, we would configure the operator as follows:




In this configuration, the Employee_Number field of each record would be compared with the Employee_Number
fields of all the other records, and if any two records have the same number, those two records would be selected and
passed on through the Unique operator's output port.







Write Custom Operator
The Write Custom operator executes expressor Datascript to consume and process data produced by the upstream
operator. Using the built-in capabilities of Datascript, Write Custom can access target data systems that might not be
accessible by other Write operators provided by expressor. For example, a Write Custom operator could be used to
write to a target data system through a web services call to a SaaS application.

A Composite Type is required to match data from the upstream operator. The Composite Type is assigned
automatically when Write Custom is linked to an upstream operator.

Additional sections in this topic

Write Custom Operator Properties

Write Custom Functions

Return Values for the write function

Defining the Input Type of the data consumed by the Write Custom operator

Error Handling

Example

Related topics

The expressor Datascript language

Transformations

Read Custom Operator

Write Custom Operator Properties
The following table lists the properties that control its functionality.


      Property                                                       Effect


Error handling            Specifies the action to take if a record contains invalid data. See Reference:
                          Constraints on Semantic Types and Reference: Constraint Corrective Actions.

                          Error handling actions available to the Write Custom operator are:

                          Abort Dataflow
                          Skip Record
                          Reject Record
                          Skip Remaining
                          Reject Remaining


Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.







 Send nil at end           Sends a final record with nil values when all records have been written.




 Write Custom Functions
 Implementation of the following functions controls the behavior of the Write Custom operator.



 Mandatory Function

 write              The Data Processing Engine invokes this function as each record enters the
                    operator from its upstream operator. The upstream record is provided as an
                    input argument to this function.

                    The method body should be customized with expressor Datascript to write
                    the record to the external target data system or otherwise process the
                    record.

                    The return values for this function are described in the Return Values
                    section below.


 Helper Functions

 generate           A Lookup Rule uses this function to generate a new record when the
                    requested record does not exist in the specified Lookup Table.

                    The generate function is written when the On miss setting on a Lookup Rule
                    is Generate Record. When the On miss setting is Output Nil or Escalate
                    Error, the generate function is not needed.

 initialize         The Engine invokes this function one time before the operator begins to
                    invoke the write function.

                    This function has no arguments or return values.

                    Use this function to perform one-time initialization tasks, such as
                    setting up a connection to your data source (e.g., establishing a
                    connection to an FTP server or obtaining a handle to a file that will be
                    written during each invocation of the write function).

                    See The initialize and finalize functions for details on using initialize.

 finalize           The Engine invokes this function one time after the operator has completed
                    emitting all records, that is, after the operator stops receiving records
                    from its upstream operator.

                    This function has no arguments or return values.

                    Use this function to free any resources obtained during execution of the
                    initialize function.

                    See The initialize and finalize functions for details on using finalize.




Return Values for the write function

Returned value      Meaning

nil                 The record was accepted/processed.

false               The record was skipped (not written).




Defining the Input Type of the data consumed by the Write Custom operator
A Composite Type is required to match data from the upstream operator. The Composite Type is assigned
automatically when Write Custom is linked to an upstream operator.

The Datascript example below illustrates the syntax used by the write function to take data from the attributes
defined by the input Type.
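A minimal sketch of that syntax, using the recordNumber and recordData attributes defined in the Example section of this topic. The body is illustrative only:

```
-- Sketch: the write function reads attributes directly from its input record
function write(input)
  local line = tostring(input.recordNumber) .. "|" .. input.recordData
  -- ... write the assembled line to the external target system here ...
end
```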

Error Handling
Errors that occur while the Write Custom operator is processing are handled as follows.


Type of error              When detected                   Action

Syntax error in            On operator initialization      The flow will terminate
Datascript                                                 abnormally.

Datascript logic errors    On operator initialization      The flow will terminate
                                                           abnormally.

Generated errors           While the write function        The error can be handled. See
                           is running                      the Error handling property
                                                           for more details.




Example
This example writes the data from the incoming record to a text file. The operator's input Composite Type consists of
two attributes:

          recordNumber

          recordData

Code for the Write Custom operator



fileName = "output.txt"

function initialize()
  -- Open the output file; error if unsuccessful
  outfile, message = io.open(fileName, "w")
  if not outfile then
    error("open failed for " .. fileName .. " - " .. message)
  end
end

function write(input)
  if string.length(input.recordData) < 1 then
    -- No data; skip the record
    return false
  else
    -- Write the record
    outfile:write(tostring(input.recordNumber) .. "|" .. input.recordData .. "\n")
  end
end

function finalize()
  -- Close the file
  outfile:close()
end







Write File Operator
The Write File operator writes data records to a file.

The following table lists the properties that control its functionality.


       Property                                                      Effect


Connection                The name of a Connection file that identifies the location of the output file.


Schema                    The name of a Schema that specifies the metadata structure of the outgoing record.


Type                      Name of the Composite Type the Schema will be mapped to.


Mapping                   Name of the Schema-to-Composite-Type mapping.


File name                 The name of the file to write.


Quotes                    Whether Record fields should always be quoted, never be quoted, or quoted as
                          needed.

                          The default setting is Quote Always. If you upgrade a Write File operator in a
                          dataflow that was created with expressor version 3.0, which did not have a
                          setting for quotes, the default setting is No Quotes.


Include header            Whether to write a header row at the beginning of the file.


Append to output          Whether to append the output to the contents of an existing file or to delete the file
                          content before writing.

                          If the output file does not exist, it is created.


Append timestamp          Whether to apply a timestamp to the name of the output file. If the file name has an
to filename               extension, the timestamp is placed before the dot. For example, Output.txt would
                          become Output20110519T175630.168945@.txt. The timestamp format is
                          YYYYMMDDThhiiss.milliseconds.

                          If the file name does not have an extension, the timestamp is appended to the end
                          of the name (Output20110519T175630.168945@).


Error handling            Specifies the action to take if a data record fails to pass constraint validation. See
                          Reference: Constraint Corrective Actions.

                          Error handling actions available to the Write File operator are:

                          Abort Dataflow
                          Skip Record
                          Reject Record






                          Skip Remaining
                          Reject Remaining


Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.




Write Lookup Table Operator
The Write Lookup Table operator initializes or updates a Lookup Table, which is a persistent store of data that
can be accessed by a Lookup Rule.

The following table lists the properties that control its functionality.


      Property                                                       Effect


Lookup Table              Specifies the name of the Lookup Table artifact that identifies the Lookup Table to
                          which to write.


Error handling            Specifies the action to take if a record contains invalid data. One common reason for
                          invalid data is that it failed to pass constraint validation. See Reference: Constraint
                          Corrective Actions.

                          Error handling actions available to the Write Lookup Table operator are:

                          Abort Dataflow
                          Skip Remaining
                          Reject Remaining
                          Skip Record
                          Reject Record


Show errors               Displays record errors associated with constraint validation and the recovery actions
                          taken. The errors are displayed in the Studio Messages panel and in the expressor
                          command prompt window when using the etask command.


Truncate                  Specifies whether to truncate the database table.




Write Table Operator
The Write Table operator writes to a single database table in a variety of relational database management systems or
proprietary databases.

The following table lists the properties that control its functionality.







       Property                                                 Effect


Connection             The name of a Database Connection that identifies the location of the output table.



Schema                 The name of a schema file that specifies the metadata structure of the outgoing
                       record.


Type                   Name of the Composite Type the Schema will be mapped to.


Mapping                Name of the Schema-to-Composite-Type mapping.


Mode                   The type of operation to perform: bulk, normal, update, merge, or delete.

                                Bulk and normal modes write (insert) records to the table, optionally
                                 deleting all rows from the table before writing new rows.

                                Update mode changes data in an existing row in the table. Rows are
                                 specified in the key selection property.

                                Merge mode updates existing rows or adds new rows to the table without
                                 deleting existing rows. Rows are specified in the key selection property.
                                 Merge mode is supported for the following databases:

                                         DB2

                                         Oracle

                                         SQL Server

                                         MySQL

                                         Teradata

                                Delete mode deletes rows from the table.


Keys                   Specifies the Schema field(s) used when Mode is set to Update, Merge, or Delete.


Truncate               Specifies whether to truncate the database table.


Create Missing Table   Creates a table in the target database when a table does not already exist for the
                       schema.

                       Note: If the CREATE TABLE statement attempts to create a table with a schema that
                       contains a user-defined data type, the attempt will fail if the target database does
                       not support the user-defined data type.


Maximum Batch Size     Specifies the maximum number of rows written to the database in one batch. For
                       Sybase databases, the batch size is limited to 150 records per batch for insert,
                       delete, and update modes. If a batch size higher than 150 records is specified
                       for a Sybase database, it is automatically resized to 150.


Error handling           Specifies the action to take if a record contains invalid data, such as a formatting
                         error.

                         Error handling actions available to the Write Table operator are:

                         Abort Dataflow
                         Skip Record
                         Reject Record
                         Skip Remaining
                         Reject Remaining

Show errors              Displays record errors and the recovery actions taken. The errors are displayed in
                         the Studio Messages panel and in the expressor command prompt window when
                         using the etask command.


Override                 Specify a database, table, and schema to use for the operator instead of those
                         defined in the Schema artifact assigned to the operator. When this option is
                         selected, fields for Current values, Database, Schema, and Table display below.


Current values           Displays the current database, schema, and table specified for the operator. This is
                         a read-only field.

                         Note: Oracle databases show current values for the schema and table. The
                         database value for Oracle is not included and need not be supplied in the
                         Override's Database field.


Database                 A database available through the specified Connection.

                         Note: This field can be left blank when the database you are connecting to is an
                         Oracle database.


Schema                   An ODBC schema available with the newly specified database.


Table                    A table available in the newly specified database. The metadata structure of this
                         table must match the Schema artifact named in the Schema operator property.


For best performance, the key(s) should be located in the first field(s) of the Schema that describes the record.

      Note: When executing a normal mode operation against an IBM DB2 database, the user under
              which the dataflow is running must have ALTER or CONTROL privilege on the table,
              ALTERIN privilege on the schema of the table, or SYSADM or DBADM authority.
              Bulk mode is not supported for the IBM DB2 database.




       Since IBM did not introduce table partitioning for DB2 until version 9.5, multi-partition
       connection to write to an IBM DB2 table is not supported in version 8.

Note: To use the database operators with Microsoft SQL Server, both the SQL Server
       instance and the SQL Server Browser service must be running on the computer hosting
       the database system. The SQL Server instance must be configured to use mixed mode
       authentication if you want to specify the database credentials from within the project
       (i.e., through the connection file, URI, or configuration variables). In this case, select the
       SQL Server and Windows Authentication mode radio button in the Server
       authentication grouping on the Server Properties, Security page accessed through the
       management console. Optionally, Windows authentication only may be used.

Note: In order to use this operator with Netezza, the user ID under which the dataflow runs must
       have CREATE EXTERNAL TABLE privileges.

Note: To use the Write Table operator to connect to a Teradata database, you must install the
       Teradata client libraries CLIV2, tdicu, and Teradata generic security service. expressor
       includes the driver used by the Write Table operator to communicate with Teradata,
       but it does not include those libraries. They must be acquired directly from the
       Teradata Download Center.




Operator Templates
After an operator has been configured, it can be saved as a template to use as a preconfigured operator in another
dataflow or in a different location within the same dataflow. The template for an operator appears under the
Operators tab on the left, under the Templates tab in each of the operator groups. It also appears in the Workspace
Explorer panel as an Operator Templates artifact. To use an operator template in a dataflow, you can drag it from
either the Operators list or the Explorer.

When an Operator Template is used in a dataflow, it becomes a separate instance of the template, so its
configuration can be changed without affecting the original template. After configuration changes are made, the
instance can even be saved as another template.

Operator Templates made from the Read and Write Custom operators are different from other Operator Templates in
that the configured Composite Type is not preserved in the template. To reuse the Composite Type with the Operator
Template, you can save the Type as a Shared Type and then assign it to the template when the template is reused.


Create Operator Templates
    1.   Select a configured operator in a dataflow.

    2.   Select the Save As Template button on the Dataflow Build tab in the ribbon bar.

    3.   Select the Project or Library in which to save the Operator Template.

    4.   Enter a unique name for the Operator Template.

    5.   Provide a description of the Operator Template (optional).
         The new Operator Template appears under the Operators tab, in the appropriate templates section and as
         an Operator Templates artifact in the Workspace Explorer.

    Note: Operator templates made from the Read and Write Custom operators are different from other
              operator templates in that the configured Composite Type is not preserved in the template. To
              reuse the Composite Type with the operator template, you can save the Type as a Shared Type
              and then assign it to the template when the template is reused.




Pivot Editor

What is a Pivot?
A pivot is a transformation that changes the structure of records by changing, or "pivoting," columns into rows and
vice versa.

The Pivot Column operator takes data from the columns of multiple records and creates a single record. The Pivot
Row operator takes data from a row of record fields and creates separate records for each field. Records often need
to be pivoted to redirect data to a new database schema.

For example, a column containing records with individual months could be pivoted to create a single record
containing columns for each month.


Before the pivot, each record contains one month's sales for one ID:

  Month    ID     Sales
  Jan.     1011   100
  Feb.     1011   200
  Mar.     1011   300
  Jan.     1012   101
  Feb.     1012   201
  Mar.     1012   301
  Jan.     1013   102
  Feb.     1013   202
  Mar.     1013   302
  Jan.     1014   103
  Feb.     1014   203
  Mar.     1014   303

After the Pivot Column operation, each record contains a column for each month:

  ID     Jan.   Feb.   Mar.
  1011   100    200    300
  1012   101    201    301
  1013   102    202    302
  1014   103    203    303



In Pivot Row, a record row with data organized by month could be pivoted to create individual records for each
month.







Before the pivot, each record contains a column for each month:

  ID     Jan.   Feb.   Mar.
  1011   100    200    300
  1012   101    201    301
  1013   102    202    302
  1014   103    203    303

After the Pivot Row operation, each record contains one month's sales for one ID:

  Month    ID     Sales
  Jan.     1011   100
  Feb.     1011   200
  Mar.     1011   300
  Jan.     1012   101
  Feb.     1012   201
  Mar.     1012   301
  Jan.     1013   102
  Feb.     1013   202
  Mar.     1013   302
  Jan.     1014   103
  Feb.     1014   203
  Mar.     1014   303



Create a Pivot
      1.   Select the Edit Pivot button on the Dataflow Build tab of the Studio ribbon bar, or in the Properties panel
           of a Pivot Row or Pivot Column operator.


           The Pivot Editor opens to the Build Output tab. The editor that opens is specific to the operator being
           configured, either Pivot Row or Pivot Column.




2.   Select the Add or Import button above the Output Attributes box to create the output attributes for the
     Pivot operator.
     Imported attributes come from existing shared Composite Types or local Composite Types associated with a
     Schema artifact.

3.   Click the Specify Transfers tab or the Next button in the Pivot Editor.

4.   Select the input attributes that are to be used as output attributes, without change.
     The list of Output attributes consists of the attributes specified on the Build Output tab. Input and output
     attributes with the same name are mapped to one another automatically.

5.   Click the Specify Pivots tab or the Next button in the Pivot Editor.
     In the Pivot Row Editor, use Select... at the bottom of the Input attributes box to choose which input
     attributes are to move into the output attribute selected from the Pivot into drop-down menu. The Values
     into drop-down menu indicates the output attribute into which the data values contained in each input
     attribute are to be placed.

6.   Click the Edit Output tab or the Next button in the Pivot Editor.
     The final tab shows how an input record for the Pivot operator will be structured for output.

7.   Edit the Pivot attribute if necessary.
     Only the record fields in the Pivot attribute record can be edited. For example, if the Pivot attribute is Month,
     you can change the record field names from abbreviations (Jan., Feb., etc.) to full names (January, February,
     etc.).




Rules Editor

Transformations
Transformations are performed by the expressor Operators that transform data (e.g., Transform and Aggregate) and
are accomplished with rules written in the Rules Editor. There are six types of rules that can be written in the Rules
Editor:

           Expression Rules

           Function Rules

           Lookup Expression and Function Rules

           Aggregation Rules

           Change Rules

Expression Rules can have one or multiple inputs, but they can have only one output. Because Expression Rules have
only one output, they have only one function.

Function Rules can have multiple inputs and multiple outputs. They can also contain multiple functions and an
iterator.

Lookup Rules populate output attributes by reading data from a Lookup Table using Lookup Keys. Lookup Rules can
be written as expressions (Lookup Expression Rules) and functions (Lookup Function Rules).

Aggregation Rules select the values from a group's records that will be used in the Aggregate operator's output
attributes.

Change Rules compare records in the Aggregate operator and return true or false, depending on the requirements of
the comparison.

To open the Rules Editor:

     1.     Select a shape in the Dataflow panel that represents an operator that performs transformations.
            The operator's properties appear in the Properties panel.

     2.     Click the Edit Rules button on the Dataflow Build tab of the ribbon bar or on the Properties panel.
            The Rules Editor opens.




See Write Rules for instructions on writing the different types of rules.




The operators for which transformations can be written with expressor Datascript are:

Aggregate

Filter

Join

Transform

Two connectivity operators also use Datascript to read and write data from/to custom data sources.

Read Custom

Write Custom

As with the transformation operators, rules for the two connectivity operators are written in the Rules Editor.


Learning to Write Datascripts
Script Writing Concepts

expressor Operators and Functions

Where to Write Datascript

Using Functions Together

expressor Datascript is easy for people unaccustomed to writing scripts to learn, and for experienced programmers,
it provides such a rich set of functions that they can start writing complex transformations immediately. The
following topic progresses from basic concepts and illustrations for beginners to examples of complex scripts that
use Datascript's built-in functions in increasingly complex combinations.

expressor engineers and customers are continually finding new applications for Datascript, and many of these will be
posted on the expressor Community web site. For starters, if you feel you are ready to jump in, read the paper entitled
expressor Datascript on the Community site. Beginners may first need to understand some of the concepts in the
Script Writing Concepts section below, but almost everyone will find the paper's presentation of Datascript and the
Studio interface for writing scripts a very helpful primer.

Script Writing Concepts
Whether you are writing in a scripting language like Python or expressor Datascript or a programming language like
C or C++, there are really only a few concepts you need to understand, and they form the basis of the most complex
programs or scripts you will ever see or write. Let's start with the following:

          variables

          assignment statements

          functions

          data types




Variables are very important because we don't want to have to name each piece of data processed in an application.
They are critically important in data integration applications because those applications process large amounts of
data. Here's a simple use of a variable:

     x = 3

The variable is "x" and it is given the value 3. Next you could assign the value of 4.

     x = 4

The variable "x" is the same, but its value has changed. It is "variable."

When you write an expressor Datascript, things like "Order_Number" are variables. Order_Number is the name given
to all the different pieces of data in a particular data field. Instead of naming each piece of data uniquely, like
Order_Number1, Order_Number2, etc., you simply read the value in the field into Order_Number, process it, and then
read the next value from the field into Order_Number.

Assignment statements. Guess what? You've just seen one. x = 3. That's an assignment statement. It assigns a value
to "x."

Is "a = b" an assignment statement? Sure, it assigns the value of "b" to "a." Pretty simple. And it can get pretty
complex. But the basis of most expressor Datascripts is an assignment statement. You assign an input value to an
output value. It can be that simple. For example,

     output.Order_Number = input.Order_Number

Simple is fine. Most of the time, you are not interested in changing an order number. But let's say you are running a
real big splash promotion, and you are going to double the size of all current orders. You would take another field in
the record that contains Order_Number and change it. That field is probably named something like "Order_Size." The
assignment statement to double the order would be:

     output.Order_Size = 2 * input.Order_Size

Now you have some very happy customers, and you know how to make an assignment statement in Datascript.

A note about the format of names in these statements: expressor Studio appends a prefix to variable names to
indicate whether they hold data input to the script (input.) or output from it (output.). You see these prefixes
whenever you create a new Expression Rule in the Rules Editor.




Functions are reusable pieces of script or code. They are written for processing steps that are used multiple times.
Once you have a function, you can execute the steps in the function simply by referring to it. You do not have to
rewrite the code (or copy and paste it) each time you want to repeat those processing steps.

Here's a simple function you can write:

      function change_num(number)
          number = 3 * number
          number = number / 2
          return number
      end

Can you explain what happened here? Probably. Not much really. The function is given the name "change_num," and
it takes a value named "number" and manipulates it. It multiplies the variable "number" by 3 and then divides it by 2
and returns the new number.

When you "call" or refer to this function, you would type:

      change_num(4)

In this call, 4 is the number you want to process.

Let's stop and look at an integral part of any function, the "return" statement. The return statement specifies what is
going to come out of the function when it is done. In this case, the function returns the value in the "number"
variable. If you don't get something out of the function, then it is not going to do you much good.

Now we are going to write a little script that creates and calls a function and shows you what the result is. Open a text
editor like Notepad and enter the following:

      function get_average(table)
          len = #table
          total = 0
          for i = 1,len do
              total = total + table[i]
          end
          return total/len
      end

      scores = {84, 90, 79, 96, 90, 86, 86, 96, 88, 81}

      avg_score = get_average(scores)

      print("Average Score is", avg_score)

The first thing you are doing in this script is defining the function named "get_average." The word "table" in
parentheses designates the data that the function is going to get. The function determines how many elements are in
the input "table" with the # length operator. When # determines the number of elements in the variable designated
by "table," the number it gets is assigned to the variable "len," which as you might guess is short for "length."

The function then defines two more variables and initializes or sets a beginning value for them. The variable "total"
will be used to add up the values in the table. To start, it is set to zero. The variable "i" will be used as an index to keep
track of which element we are working with in the table.

Notice though that the assignment statement for "i" is preceded by the word "for." In Datascript, "for" sets up a loop.
It starts with i=1, and continues until i=len. Each time the for statement executes, it increases the value of "i" by 1 and
then executes the statement or statements after the word "do." In this script, there is just one statement after "do":
total = total + table[i]. That is an assignment statement that makes the variable "total" equal to the current value of
"total" plus the value of the element in the table at the location indicated by the index variable "i." This statement will



continue to be executed, adding to the value of "total," until "i" is greater than the variable "len." Since "len" indicates
the number of elements in the table, the for statement loop will have added all the table elements to the total by the
time it is done.

The word "end" marks the end of the for statement loop. Then the function returns the average by dividing the total
by the variable "len." The second "end" marks the end of the function "get_average."

Next, the script creates a table named "scores." It puts a number of test scores or golf scores or whatever scores
you're dealing with in a list separated by commas.

Now that we have data, we can use the get_average function to give us the average of the values in the scores table.
We've set up the get_average function to return the average of the scores in a table, so we write an assignment
statement with a new variable named "avg_score" and assign it the value returned by get_average. Notice that when
we call the get_average function, we substitute the word "scores" for the word "table" because that is the particular
table we want to process with the function.

When the function finishes processing all the values in the scores table and assigns the average to the avg_score
variable, we invoke the built-in function named "print" to print out the results. In doing so, we add some text to
indicate what the printed number is. The print function takes two pieces of input, separated by a comma. The first is
the text "Average Score is" enclosed in quotation marks and the second is the number in the variable avg_score.

Go ahead and save the file you've written the script in with the name "average."

    1.   Open the expressor command prompt window from the Windows Start menu.

    2.   Change directory to the one where you saved "average."
         For example,

         cd C:\Users\me\Documents

    3.   Type:

         datascript average

The results should be:

         Average Score is                      87.6

Data types are the last important concept you need to understand. There are three basic types of data:

        alphanumeric and related characters (e.g., %, @, !)

        numbers

        dates and times

There are many variations within those types, such as integer numbers and decimal numbers.

When you perform operations on data, you have to know which type you are dealing with because something you
can do with numbers, like multiply them, you can't do with alphanumeric data. Even though alphanumeric data
includes characters that represent numbers (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), they are not the same as a number data type.
They are not numbers like those used in the Integer data type. The alphanumeric data type is usually referred to as
String data, as in character strings.
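The distinction can be sketched in Datascript; the variable names and values below are illustrative, not taken from
any particular dataflow:

      id = "1011"        -- String data: digit characters, not a number
      count = 250        -- Integer data: a number you can do arithmetic on
      count = count * 2  -- multiplication is defined for number data, not for String data

Operations on a value are only meaningful for its data type: doubling count works, but doubling id would first
require converting the string to a number.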




expressor Operators and Functions
Before we start building datascripts with expressor Datascript's built-in functions, we should talk about where
Datascript fits into expressor Dataflows.

It is the operators in a dataflow that control what gets done in the data-integration application. Read File, Read Table,
and SQL Query operators read data into the dataflow and prepare it for processing by mapping it from the schema
structure it has in the source file or database to the expressor Semantic Types that define the data as it moves
through the dataflow.

On the other end of the dataflow, Write File and Write Table prepare the data for output to a file or database. Neither
the read nor the write operators use Datascript.

It's the operators used between the read and write operators that do the processing. Some of them are very
specialized and do only one thing. The Copy operator, for instance, takes an input record and makes copies of it.
Nothing more. It doesn't change the input record in any way. Just copies it.

There are operators, however, whose basic function can be tailored by using Datascript. Those operators are
Transform, Filter, and Aggregate. Each of these three has a basic function with the same name as the operator. Let's
have a look at one of them, Transform.




As you can see, the function name is "transform," and it takes one input. The input it takes is the input record, which
in this case includes two fields (Lastname, Firstname).

We'll do something simple here and show more complicated stuff later. For now, we are simply combining the last
names and first names to get a full name for each customer. To do this, we use the string.concatenate function.
What is that dot in the function name? Well, it's there because there are lots of functions associated with the string
type. (See the string function documentation for all of them.)
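A minimal sketch of such a transform rule follows; the attribute names are the ones described above, and the exact
argument list accepted by string.concatenate is an assumption here:

      -- Hedged sketch: combine the two input fields into one output attribute.
      output.CustomerFullname = string.concatenate(input.Firstname, " ", input.Lastname)

The rule is a single assignment statement: the function's result becomes the value of the output attribute.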

Let's look at one more of the operators that take Datascript--Filter. The object of this operator is to separate data
based on some criteria. What criteria? That's up to you.

The Filter operator wants to know what is true and what is false. The records that test true will go out the top output
port on the Filter operator shape (output0) and the false will go the bottom port (output1).

So let's write a little datascript to test the input. The input records are US presidents names, the order in which they
held office. The last is their political party. We are going to filter the presidents by political party. To do that, we look
for a match on one of the parties; in this case, we'll test for "Democratic." For that, the built-in function
string.match does the job.
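A hedged sketch of what that filter rule might look like follows; the attribute name Party and the exact shape of the
operator's basic function are assumptions:

      -- A true result sends the record out the top port (output0);
      -- false sends it out the bottom port (output1).
      function filter(input)
          return string.match(input.Party, "Democratic") ~= nil
      end

string.match returns the matched text when it finds a match and nil otherwise, so comparing against nil turns the
result into the true/false value the Filter operator wants.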





Where to Write Datascript
The datascript examples in the screen shots above are all displayed in the Rules Editor. The Rules Editor is the
interface for writing Datascript for all the operators for which you can write datascripts.




Notice what we have in the Rules Editor: an assignment statement, one of the basic concepts for writing scripts. In fact,
all four of the basic concepts come into play in this assignment statement. First is the variable for output, which is set
as output.CustomerFullname. What you script on the right becomes the value assigned to the output variable. It
could be simple, an assignment of the input variable to the output. But here we have a function--string.concatenate--
transforming two input variables into one output variable. And of course, data type is critical because we are using a
function that manipulates String data. The input variables must contain String data in order for this transformation to
work.

Using Functions Together
Now that we understand how basic programming concepts are used in expressor Datascript and how to write
datascripts in expressor Studio, we are ready to try more complicated scripts that use multiple functions together.
One thing you will see is that many datascripts can be written as a single assignment statement because expressor
Datascript provides a wealth of operators to help you accomplish common scripting tasks.

Let's start by building on a statement we have already used--the concatenation of our customers' first names and last
names.

Whereas in the first transformation we combined two input attributes into one (CustomerFullname), in the next
transformation we are going to use three input attributes, so that the output contains not only the customer's full
name but his or her geographic location as well. What we're going to do with this is write an assignment statement
with nested functions.





Now we will have a record that contains our customers' full names and what part of the country they are from. We
start with the same string concatenation as earlier, but instead of leaving it at that, we have added a second function.
The decision function acts like an if-elseif-else statement. It checks the value of the input.State variable to see if it is
equal to one of the postal service abbreviations for state names. See the function documentation for a full description
of the decision function.

If you prefer, you can use explicit if-elseif-else statements.
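An explicit if-elseif-else version of this rule might look like the following sketch; the attribute names, state
abbreviations, and region groupings are illustrative, and the string.concatenate argument list is an assumption:

      -- Hedged sketch: classify customers by region, then build the output.
      if input.State == "ME" or input.State == "NH" or input.State == "VT" then
          region = "Northeast"
      elseif input.State == "CA" or input.State == "OR" or input.State == "WA" then
          region = "West"
      else
          region = "Other"
      end
      output.CustomerRecord = string.concatenate(input.Lastname, " ", input.Firstname, " ", region)

The if-elseif-else form trades the compactness of the decision function for explicit, easily extended branching.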




Mapping Input to Output Attributes
In the Rules Editor, input attributes are automatically mapped to the same attributes for output. Input attributes
cannot be changed. They are always determined by the output of the upstream operator. Furthermore, they cannot
be removed from the list of output attributes.

The input attributes can, however, be changed by rules that govern their mapping to output attributes. If no rule is
written for an input attribute, then it transfers as is to the same-named attribute for output.


When the Rules Editor is opened, before any rules are written, this is the mapping state displayed: the input
attributes are transferred to identical output attributes.

The arrow icon next to an output attribute indicates the direction from which it comes to the operator's output.

If the downstream operator has different input attributes than the upstream operator's output, its additional
attributes will also be listed as output attributes for the upstream operator.



The CustomerFullname attribute has an arrow icon that points in the opposite direction from those of the attributes
mapped from the input. That icon changes to a diamond icon when a rule is written that uses the attribute for its
output.




The attributes whose icons contain arrows cannot be changed. The Rules Editor Home tab on the ribbon bar contains
an Edit button for Output Attributes, but that button is not active when an output attribute with an arrow icon is
selected. That is because the properties of those attributes are determined by their source, either the upstream
operator or the downstream operator.

Output attributes can, however, be added with the Add, Copy, or Import button. The properties of those
attributes are, of course, set when the attribute is added to the output attributes in the Rules Editor. For example,
when you add an attribute, the same Add Attribute dialog box that is used for adding attributes with the Composite
Type editor displays. That dialog box enables you to name the attribute, define a default value, specify the data type,
and set constraints.

Copy allows you to create a copy of a selected attribute, regardless of the type of attribute (Transferred, Local, or
Required). The Make Local Copy option creates a copy of the selected attribute and adds or increments a number at
the end of the name. For example, a copy of the Street attribute would be named Street1. The Replace with Local
Copy option turns a Transferred or Required attribute into a Local Attribute.

Local attributes, whether created by adding or copying, must be assigned a value to transfer downstream. If a Local
attribute is created but not assigned a value, the operator will be flagged with an error. The usual way of providing
Local output attributes with values is by mapping them to the output of a rule. Values can also be assigned by




allowing the attribute to be null or by assigning it a default value. Null and default settings are assigned in dialog
boxes used for adding and editing attributes.




Attributes created by the user with the Rules Editor are labeled with a diamond icon. The diamond icon is also used
for output attributes that have been mapped to an input attribute.

The following illustration of the Rules Editor shows how both the attribute from the downstream operator and a
newly created attribute are mapped from input attributes through the use of rules.

In this example, the input attributes mapped to output attributes through rules are mapped to output attributes
with different names. Nevertheless, all the input attributes are listed as output attributes. They will all move
downstream to the next operator and continue moving downstream even if no operator uses them. In that case,
their data will simply be discarded by the Write operator at the end of the flow.


This automatic transfer of attributes can be blocked in the Rules Editor by selecting an input attribute and clicking the
Block Transfer button in the ribbon bar.


Write Rules
Select a Type of Rule

Connect Input and Output Attributes to Expression and Function Rules

Connect Input and Output Attributes to Lookup Rules

Write the Rule

Execute Multiple Functions in a Rule

Use an Iterative Function

Select a Type of Rule
There are six types of rules that can be written in the Rules Editor:

           Expression Rules

           Function Rules

           Lookup Expression Rules

           Lookup Function Rules

           Aggregation Rules

           Change Rules

Expression Rules can have one or more inputs, but they can have only one output. Because they have only one
output, they contain only one function.

Function Rules can have multiple inputs and multiple outputs. They can also contain multiple functions and an
iterator.

Lookup Rules populate output attributes by reading data from a Lookup Table using Lookup Keys. Lookup Rules can
be written as expressions (Lookup Expression Rules) and functions (Lookup Function Rules).

Aggregation Rules select the values from a group's records that will be used in the Aggregate operator's output
attributes.

Change Rules compare records in the Aggregate operator and return true or false, depending on the requirements of
the comparison.

To create a new rule:

      1.    Click the New Rule button on the Home tab of the Rules Editor's ribbon bar.







    2.   Select a rule type from the New Rule button's drop-down list.
         A rule editing box opens in the center panel of the Rules Editor.




Rules are enabled by default. Rules can, however, be disabled when they are not needed. The Disable button in the
Settings section of the Rules Editor's ribbon bar turns a rule off. Similarly, the Enable button turns the rule back on.

    Note: The Change Rule is not selected from the New Rule drop-down menu. It appears automatically
              when the Aggregate operator is opened in the Rules Editor. It is disabled by default. There can be
              only one Change Rule in an Aggregate operator.

Connect Input and Output Attributes to Expression and Function Rules
    1.   Select an input attribute and drag the cursor to the Input label in the Rule box.
         Repeat this step for any additional input attributes you intend to transform with the rule.

    2.   Select an output attribute and drag the cursor to the Output label in the Rule box.
         Repeat this step for any additional output attributes you intend to use with the rule.




Connect Input and Output Attributes to Lookup Rules
The input and output parameters of Lookup Rules are determined by the Lookup Table that is chosen in the Rule box.
Input attributes are mapped to specific input parameters in the Rule box, and output attributes are mapped to
specific output parameters in the Rule box.

    1.   Select the Lookup Table from the Lookup drop-down menu in the Rule box.

    2.   Select the Lookup Key from the Key drop-down menu in the Rule box.

    3.   Select an action from the On miss drop-down menu.
         This selection does not affect the mapping of input and output attributes, but if you select the Generate
         Record option, the Lookup Rule requires that you write Datascript to generate replacement values for the
         record. You can also elect to use the generated data for output only or to update the Lookup Table as well.







      4.   Select an input attribute and drag the cursor to an input parameter in the Rule box.
           Repeat this step for any additional input parameters the Lookup Key includes for input to the Lookup Rule.

      5.   Select an output attribute and drag the cursor to an output parameter in the Rule box.
           Repeat this step for any additional output parameters you intend to map from the Lookup Rule to the
           operator's output attributes. Not all output parameters must be mapped to output attributes. Output
           attributes might be mapped to output parameters from other rules in the operator, or they might be left
           unmapped if they are not required by downstream operators.

Write the Rule
The type of rule, Expression, Function, or Lookup, determines how the rule is written. Nevertheless, the Rules Editor
provides several ease-of-use features that help with writing all rules.

Rules are written with expressor Datascript. Basic instructions for writing Datascript are provided in the topic
Learning to Write Datascripts. Complete instructions are contained in The expressor Datascript language.

The foremost tool is the Edit tab on the Rules Editor's ribbon bar. The Edit tab provides access to all of expressor's
built-in Datascript functions.




The Rules Editor also provides typing-completion aids. When you start typing an input or output name or a
Datascript function name, a list of completion options pops up. At first, the option pop-up window has a white
background. At this point, you can choose an option with the cursor.


As you continue typing, the background in the pop-up window turns blue. At that point, the typing-completion tool
has determined which option you are typing and allows you to complete the typing by pressing the Enter key.


If you are typing the name of a built-in Datascript function, the typing-completion tool narrows your choices as it
receives more input.






For example, when typing string.concatenate, typing-completion will start by listing all the functions beginning
with "s" as you start typing.

As your typing narrows the choices, the tool makes the selection so that all you need do is press Enter.

When you type a period after the function name, the typing-completion tool pops up a list of all the subfunctions
for string.

Again, as your typing narrows the options, the tool displays the shorter list and finally the choice matching what
you've typed.




Write Expression Rules
An Expression Rule takes one or more input attributes and returns one output parameter that can be mapped to an
output attribute. An Expression Rule is a single assignment statement in which an expression, such as a string
concatenation, manipulates the input parameters to produce one output parameter.

    1.   Select Expression Rule from the New Rule button's drop down menu on the Home tab of the Rule Editor's
         ribbon bar.







           An empty Expression Rule box opens in the center panel of the Rules Editor.




      2.   Add one or more input parameters to the rule by dragging a map line from an input attribute to the Add
           Input label in the rule box.

      3.   Add one output parameter to the rule by dragging a map line from an output attribute to the Add Output
           label in the rule box.

      4.   Place the cursor in the center section of the rule box and type an expression to transform the input to the
           desired output value.
           For example, building on the attributes in the screen shot above, you could write the following string
           concatenation expression.




Basic instructions for writing Datascript expressions are provided in the topic Learning to Write Datascripts. Complete
instructions are contained in The expressor Datascript language.
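As a minimal sketch, an Expression Rule body that concatenates two input parameters into the rule's single output parameter might look like the following. The attribute names FirstName and LastName, and the exact argument list accepted by string.concatenate, are assumptions for illustration, not taken from the product screens.

```lua
-- Hypothetical Expression Rule body: a single expression whose result is
-- mapped to the rule's one output parameter.
-- FirstName and LastName are assumed input parameters.
string.concatenate(FirstName, " ", LastName)
```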

Write Function Rules
      1.   Select Function Rule from the New Rule button's drop down menu on the Home tab of the Rule Editor's
           ribbon bar.

      2.   Add one or more input parameters to the rule by dragging a map line from an input attribute to the Add
           Input label in the rule box.

      3.   Add one or more output parameters to the rule by dragging a map line from an output attribute to the Add
           Output label in the rule box.

      4.   Place the cursor in the center section of the rule box and type expressions to transform the input to the
           desired output value.
           The starting point code for a Function Rule is a function that generally has the same name as the operator
           for which it is being written. For example, the Transform operator's main function is named "transform."






           When the application runs, the expressor Engine invokes this function at the appropriate time (e.g., as each
           record is processed by the operator).

In addition to an operator's principal function, a Function Rule can also contain helper functions. Helper
functions are used in conjunction with the main function and may precede it in execution. For example, the
initialize function can be used in all of the datascript operators. It executes before a function like
transform, usually to initialize variables that will be used in the transformation.
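For instance, a Function Rule along the following lines could use initialize to build a table once, which transform then uses for every record. The attribute names and the exact transform signature are assumptions for illustration; the skeleton generated in the rule box is authoritative.

```lua
-- Hypothetical sketch of a Function Rule with a helper function.
local regions   -- shared by the functions in this rule

function initialize()
  -- Executed once, before any records are processed.
  regions = { MA = "Northeast", NY = "Northeast", CA = "West" }
end

function transform(input)
  -- Executed once per record. State is an assumed input attribute,
  -- Region an assumed output attribute.
  output.Region = regions[input.State]
  return output
end
```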


The specific rules of the transformation are written into the transform function. The function in the illustration
uses the input value from the State attribute to create data for the Region output attribute.


Each rule written in a Transform operator uses a transform function, though each rule will be different.


In this example, two rules are used within a Transform operator to transform different input attributes.

The first rule takes two input attributes and concatenates them into one output attribute.

The second rule transforms the name of a state, such as MA, to a new value that indicates the region it is in. The
rule's output is sent to the Region output attribute.

Rules are written with expressor Datascript. Basic instructions for writing Datascript are provided in the topic
Learning to Write Datascripts. Complete instructions are contained in The expressor Datascript language.







Write Lookup Expression and Function Rules
The difference between Lookup Expression Rules and Lookup Function Rules comes into play when Generate Record
is selected as the On miss option. In that case, one or more expressions or functions are required to generate data
that is used in place of the data that is missing from the Lookup Table.

Both types of Lookup Rules can look up values in a range rather than looking for an exact match on the key value.

    1.   Select Lookup Expression Rule or Lookup Function Rule from the New Rule button's drop down menu on
         the Home tab of the Rule Editor's ribbon bar.
         In either case, the top part of the rule box that opens in the center panel of the Rules Editor has the same
         drop-down menus.




    2.   Select the appropriate Lookup Table artifact from the Lookup drop-down menu.

    3.   Select a key field for searching the Lookup Table from the Key drop-down menu.
         The keys available for the lookup are specified when the Lookup Table artifact is created.

    4.   Select the action to take if the lookup fails from the On miss drop-down menu.

             a.   If you select Generate Record as the failure action in a Lookup Expression Rule, an expression entry
                  field displays in the center panel of the rule box.




                  There is one expression entry field for each output parameter on the rule. Write a single-statement
                  expression using the functions on the Edit tab of the Rules Editor's ribbon bar. The output
                  generated by the datascript expression is used in place of the record that was not found in the
                  Lookup Table.


                  If you want to add the generated record to the Lookup Table, use the check box in the lower right
                  corner of the rule box.

             b.   If you select Generate Record as the failure action in a Lookup Function Rule, the generate
                  function displays in the center panel of the rule box.







                  Write datascript for the generation of Lookup Table output using the functions on the Edit tab of the
                  Rules Editor's ribbon bar. The output generated by the datascript function is used in place of the
                  record that was not found in the Lookup Table.

                    If you want to add the generated record to the Lookup Table, you must add an expressor.lookup.writer
                  function before the return statement. Note that there is an automatically inserted comment for the
                  writer:execute method. Place the writer function beneath that comment.

                  Basic instructions for writing datascript are provided in the topic Learning to Write Datascripts.
                  Complete instructions are contained in The expressor Datascript language. See the lookup functions
                  for a complete description of the functions to use in a Lookup Function Rule.
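A sketch of such a generate function follows. The output parameter name Region and the placement of the writer call are assumptions for illustration; use the skeleton and the automatically inserted comment in the rule box as the authoritative guide.

```lua
-- Hypothetical sketch of a generate function in a Lookup Function Rule.
function generate()
  local Region = "UNKNOWN"   -- assumed replacement value for the missed record
  -- An expressor.lookup.writer call would go here, beneath the
  -- auto-inserted writer:execute comment, if the generated record
  -- should also be added to the Lookup Table.
  return Region
end
```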

Write Aggregation Rules
      1.    Select Aggregation Rule from the New Rule button's drop down menu on the Home tab of the Rule Editor's
            ribbon bar.
            An empty Aggregation Rule box opens in the center panel of the Rules Editor. Note that the rule entry panel
            in the center contains a drop-down menu. After the input and output parameters are set, the drop-down
            menu will contain the appropriate options for the input and output (e.g., string data or numeric data).



      2.    Add one input parameter to the rule by dragging a map line from an input attribute to the Add Input label in
            the rule box.
            Aggregation Rules have only one input parameter. The Aggregate operator processes groups of records, and
            separate Aggregation Rules are used for each field in the record.

      3.    Add one output parameter to the rule by dragging a map line from an output attribute to the Add Output
            label in the rule box.
            Aggregation Rules have only one output parameter. Separate Aggregation Rules must be written for each
            output attribute that is to be passed downstream from the Aggregate operator.

      4.    Select an option from the drop-down menu to specify which value from the group's records to use as output.

           Fields with string data have the following selection options:

                  first specifies the value from the first record in each group

                  last specifies the value from the last record in each group

                  min specifies the value that comes first alphabetically in each group

                  max specifies the value that comes last alphabetically in each group

           Fields with numeric data have the following selection options:

                  first specifies the value from the first record in each group

                  last specifies the value from the last record in each group

                  count sets the value to the number of records in the group

                  count null sets the value to the number of records in the group in which the input field is null. The
                  input field can be a string or numeric data type.






              count not-null sets the value to the number of records in the group in which the input field is not null.
              The input field can be a string or numeric data type.

              count unique sets the value to the number of unique values in the input field in the group. The input
              field can be a string or numeric data type.

               min specifies the smallest value in each group

               max specifies the largest value in each group

              average sets the value to the average of the values in the input field. Fields containing a null value are
              not used in this calculation.

              sum sets the value to the sum of the values in the input field.

              std deviation sets the value to the standard deviation of the values in the input field. Fields containing
              a null value are not used in this calculation.
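To make the null handling concrete, the following plain-Lua illustration (not operator Datascript; the Aggregate operator performs these computations internally) walks through a group whose input field holds the values 10, 20, null, and 30:

```lua
-- Plain-Lua illustration of the counting and averaging options above.
local amounts = {10, 20, nil, 30}   -- four records; the third field is null
local count, nulls, sum = 4, 0, 0
for i = 1, count do
  if amounts[i] == nil then
    nulls = nulls + 1
  else
    sum = sum + amounts[i]
  end
end
-- count = 4, count null = 1, count not-null = 3, sum = 60
-- average = sum / (count - nulls) = 60 / 3 = 20; null fields are excluded
```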

Write Change Rules
   1.   Open an Aggregate operator in the Rules Editor.
        The Change Rule appears at the top of the center panel.

   2.   Open the rule by clicking the open/close arrow in the right corner of the Change Rule box.




   3.   Add one or more input parameters to the rule by dragging a map line from an input attribute to the Add
        Input label in the rule box.
        The Change Rule does not have output parameters to map to the Aggregate operator's output attributes
        because the rule returns a boolean value.

   4.   Place the cursor in the center section of the rule box and type expressions to compare the input record with
        the previous input record.
        The comparison will determine when to start, or change to, a new group. When the rule returns true, a new
        group starts. As long as it returns false, the operator will continue to aggregate the input records.


        Rules are written with expressor Datascript. Basic instructions for writing Datascript are provided in the
        topic Learning to Write Datascripts. Complete instructions are contained in The expressor Datascript
        language.

   5.   Select the Enable button in the Settings section on the Home tab of the Rules Editor's ribbon bar.
        The Change Rule must be enabled and the Use Change Function checked in the Aggregate operator's
        Properties before the rule can be executed in the dataflow.
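As a sketch of such a comparison, the rule below starts a new group whenever a customer identifier changes. The attribute name CustomerID and the way the previous record is exposed are assumptions for illustration; see the expressor Operator helper functions topic for the exact signature.

```lua
-- Hypothetical Change Rule sketch: return true to start a new group.
-- "previous" stands in for however the prior record is exposed.
function change(input, previous)
  return input.CustomerID ~= previous.CustomerID
end
```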







Execute Multiple Functions in a Rule
An operator can have multiple rules, and each rule can have multiple functions. When a rule contains multiple
functions, those functions are executed in a sequence determined by their type. The execution order for the types
of functions is:

      1.   change (Aggregate operator only)

      2.   initialize (Executed once, before any input records are processed)

      3.   filter (Executed once per record)

      4.   transform, aggregate, and join (Executed once per record)

              Note: These three functions are used in the respective operators with the same names. They are
                        not used together in any operator.

      5.   sieve (Executed once per record)

      6.   finalize (Executed once, after all input records have been processed)

When an operator contains multiple rules, each type of function is executed in each rule before the next type of
function is executed in any rule. For example, in a Transform operator that uses two rules, where both rules contain
initialize, filter, transform, sieve, and finalize functions, the initialize functions in both rules would be executed
before a filter function in either rule.

The filter and sieve functions are boolean functions. If they return false, processing stops for all rules at that
point.

See expressor Operator helper functions for details on all of these functions.

Use an Iterative Function
Function Rules and Lookup Function Rules in Transform, Aggregate, and Join operators can contain functions that
iterate a value. An iterative function executes repeatedly until it returns nil. For example:

      function iterator()
        index = index + 1;
        if index <= count then
          return { copy=index, value=value }
        else
          return nil -- done
        end
      end -- of iterator function

An operator can contain only one rule that uses an iterative function.

Lookup Function Rules that use non-unique keys are iterative by nature, so a rule is marked Iterative as soon as a
non-unique key is chosen for the lookup.

The generate function in a Lookup Function Rule cannot return an iterator for a Lookup Table that has Unique keys.

Function Rules that contain an iterative function must be tagged. (Lookup Function Rules are automatically tagged as
Iterative when a non-unique key is chosen.)







    1.    Select the Function Rule containing an iterative function.

    2.    Select the Iterative button in the Settings section on the Home tab of the Rules Editor's ribbon bar.
          This marks the rule as iterative.


An operator can have only one rule that is iterative. If more than one rule is marked as Iterative, the Iterative
symbols on the rules turn red. Using this error check while writing rules will prevent the multiple-iterator error
from occurring at runtime.




Use Ranges in Lookup Function and Expression Rules
Lookup Tables can be constructed in ranges so that key values can be searched for in a range rather than as an exact
value. For example, if a Lookup Table contains records with key values of 100, 200, 300, and 400, a range lookup on a
value of 80 would return the record with the key value of 100, 150 would return the record with the key value of 200,
and 300 would return the record with the key value of 300. If the lookup were not done by range, 80 and 150 would
be misses because they have no exact match in the Lookup Table.

To designate a Lookup Rule as a range rule:

    1.    Select the Lookup Rule in the Rule Editor.

    2.    Click the Range button next to the Key drop-down menu in top portion of the Lookup Rule box.





The Datascript language
The expressor Datascript language is a lightweight scripting language that combines robust functionality with ease
of use, and it gives you the ability to extend it by including independently written scripts.

Datascript is used with expressor Operators that are capable of modifying the content of a record. Those operators
are:

          Aggregate

          Filter

          Join

          Transform

Datascript is also used with two connectivity operators to read and write data from/to custom data sources.

          Read Custom

          Write Custom

When any of the operators that use Datascript are selected, the Edit Rules button is enabled on the Build tab of the
Studio ribbon bar. Clicking the button opens the Rules Editor, which enables you to write more complex datascript
statements.

To enter Datascript code, you simply start typing. Appropriate entries for that point in the script appear in a pop-up
list of available functions or fields.




In the following documentation, the expressor Datascript language constructs are explained using the usual
extended BNF notation, in which:

          {a} means 0 or more a's

          [a] means an optional a

          non-terminals are shown as non-terminal

          keywords are shown as kword

          other terminal symbols are shown as '='.

The following topics cover the rules and conventions of the Datascript language.







expressor patterns

Lexical conventions

Values and Types

Variables

Statements

Expressions

Visibility rules

      Note: expressor Datascript is based on the Lua scripting language developed at the Pontifical Catholic
               University of Rio de Janeiro in Brazil. Information on the language is available in the Lua
               Reference Manual. expressor Datascript includes functions that are not part of Lua, and some
               functionality available in Lua has been removed from expressor Datascript. Specifically, the
               ability to invoke functions in the Lua I/O, OS, and debug libraries, as well as the ability to call
               batch/script files, are not accessible to an expressor script. Additionally, some Lua functions have
               been overwritten or removed from expressor Datascript. Consequently, the function API
               descriptions included in the expressor documentation are definitive and should be used instead
               of the descriptions in the Lua Reference Manual.







expressor Datascript Pattern Matching
You can use pattern matching in the scripting in Transform, Filter, and Aggregate operators.


Character Class                  Pattern Item


Pattern                          Captures


Character Class
A character class is used to represent a set of characters. The following character combinations are allowed when
describing a character class.

         Any keyboard character represents itself.

                  The characters ^$()%.[]*+-? are "magic" characters that cannot directly represent themselves.
                   These characters must be escaped with a preceding % character.

                  The % escape character can be placed before any non-alphanumeric character (e.g. punctuation
                   marks, slashes or the pipe character) to ensure that no special interpretation is attached to the
                   character.

         A dot . represents all characters.

In the following pattern combinations, use the %! prefix rather than the % prefix when working with strings containing
unicode characters.

         The combination %a represents all letters.

                  The combination %A represents the complement of %a.

         The combination %c represents all control characters.

                  The combination %C represents the complement of %c.

         The combination %d represents all digits.

                  The combination %D represents the complement of %d.

         The combination %l represents all lower case letters.

                  The combination %L represents the complement of %l.

         The combination %p represents all punctuation characters.

                  The combination %P represents the complement of %p.

         The combination %s represents all space characters.

                  The combination %S represents the complement of %s.

         The combination %u represents all upper case letters.

                  The combination %U represents the complement of %u.






         The combination %w represents all alphanumeric characters.

                   The combination %W represents the complement of %w.

         The combination %x represents all hexadecimal digits.

                   The combination %X represents the complement of %x.

         The combination %z represents the character with representation 0.

                   Use %z rather than the zero character in describing a character class.

                   The combination %Z represents the complement of %z.

         The combination %x, where x is a non-alphanumeric character, represents the character x.

                   Use this combination to represent the "magic" characters in a character class.

Set
A character class that includes a union of characters is indicated by enclosing the characters in square brackets:
[...] is referred to as a set.

You can specify a range of characters in a set by separating the end characters of the range with a dash. All character
combinations in the preceding list can also be used as components in a set. All other characters in the set represent
themselves.

For example:

              [%w_] or [_%w] represent all alphanumeric characters plus the underscore

              [0-7] represents the octal digits

              [0-7%l%-] represents the octal digits plus the lowercase letters plus the '-' character

The complement of a set is represented by including the ^ character at the beginning of the set. For example,
[^1-5] represents the complement of the set [1-5].

For all character combinations, the corresponding combinations that use uppercase letters represent the complement
of the combination. For example, %S represents all non-space characters (since %s represents all space characters).

The definitions of letter, space, and other character groups depend on the current locale. In particular, the class
[a-z] may not be equivalent to %l.

Pattern Item
A pattern item can be represented by:

         A single character class, which matches any single character in the class.

         A single character class followed by *, which matches zero or more repetitions of characters in the class.
          These repetition items always match the longest possible sequence.

         A single character class followed by +, which matches 1 or more repetitions of characters in the class. These
          repetition items always match the longest possible sequence.







         A single character class followed by -, which also matches zero or more repetitions of characters in the class.
          Unlike '*', these repetition items always match the shortest possible sequence.

         A single character class followed by ?, which matches zero or 1 occurrence of a character in the class.

         The character combination %n, for n between 1 and 9, which matches a substring equal to the n-th captured
          string.

         The character combination %bxy, where x and y are two distinct characters, which matches strings that
          start with x, end with y, and in which the x and y are balanced. This means that in reading the string from
          left to right, counting +1 for an x and -1 for a y, the ending y is the first y where the count reaches 0. For
          example, the item %b() matches expressions with balanced parentheses.

Pattern
A pattern is a sequence of pattern items:

             ^ at the beginning of a pattern anchors the match at the beginning of the subject string.

             $ at the end of a pattern anchors the match at the end of the subject string.

             At other positions, ^ and $ have no special meaning and represent themselves.

A pattern cannot contain embedded zeros. Use %z instead.

Captures
A pattern can contain sub-patterns enclosed in parentheses, which describe "captures." When a match succeeds, the
substrings of the analyzed string that match the captures are stored (captured) for future use.

Captures are numbered according to their left parentheses. For example, in the pattern (a*(.)%w(%s*)):

             The part of the string matching a*(.)%w(%s*) is stored as the first capture (and therefore has number
              1)

             The character matching . is captured with number 2

             The part matching %s* has number 3

As a special case, the empty capture () captures the current string position (a number). For example, if you apply the
pattern ()aa() on the string flaaap, there are two captures: 3 and 5.


Lexical conventions
Names (also called identifiers) in expressor Datascript can be any string of letters, digits and underscores, not
beginning with a digit. This coincides with the definition of names in most languages. Identifiers are used to name
variables and table fields.

The following keywords are reserved and cannot be used as names:

and           break      do       else           elseif
end           false      for      function       if
in            local      nil      not            or
repeat        return     then     true           until     while


expressor Datascript is a case-sensitive language:

         and is a reserved word

         And and AND are two different, valid names.

As a convention, names starting with an underscore followed by uppercase letters (such as _VERSION) are reserved
for internal global variables used by expressor Datascript.

The following strings denote other tokens:

+         -       *          /      %         ^          #

==        ~=      <=         >=     <         >          =

(         )       {          }      [         ]

;         :       ,          .      ..        ...


Literal strings can be delimited by matching single or double quotes, and can contain the following C-like escape
sequences:

         '\b' (backspace)

         '\n' (new line)

         '\r' (carriage return)

         '\t' (horizontal tab)

         '\\' (backslash)

         '\"' (quotation mark [double quote])

         '\'' (apostrophe [single quote]).

A character in a string can also be specified by its numerical value using the escape sequence \dd, where dd is a
sequence of up to two decimal digits.

      Note: If a numerical escape is to be followed by a digit, it must be expressed using exactly two digits.

Strings in expressor Datascript can contain any 8-bit value, including embedded zeros, which can be specified as
'\0'.

You can write a numerical constant with an optional decimal part and an optional decimal exponent. expressor
Datascript also accepts integer hexadecimal constants, by prefixing them with 0x. Examples of valid numerical
constants are
        3       3.0       3.1416       314.16e-2          0.31416E1         0xff       0x56
A comment starts with a double hyphen (--) anywhere outside a string and continues to the end of the line.


Values and Types
expressor Datascript is a dynamically typed language. This means variables do not have types, only values do. There
are no type definitions in the language. All values carry their own type.

All values in expressor Datascript are first-class values. This means all values can be stored in variables, passed as
arguments to other functions and returned as results.


Basic Types


Coercion


Basic Types
There are ten basic types in expressor Datascript:

           nil: the type of the value nil, whose main property is to be different from any other value. It usually
            represents the absence of a useful value.

           boolean: the type of the values false and true

           datetime: represents a date or date and time value

           decimal: represents decimal numbers, 34 significant digits

           integer/long: represents integer numbers as 8-byte values

           number: represents real (double-precision floating-point) numbers, 15 significant digits

           string: represents arrays of characters. expressor Datascript is 8-bit clean: strings can contain any 8-bit
            character, including embedded zeros ('\0').

           function

           table: implements associative arrays (i.e. arrays that can be indexed not only with numbers, but with any
            value except nil)

           ustring: represents arrays of Unicode characters.

Both nil and false make a condition false; any other value makes it true.

Tables can be heterogeneous, which means they can contain values of all types (except nil). Tables are the sole data
structuring mechanism in expressor Datascript. They can be used to represent ordinary arrays, symbol tables, sets,
records, graphs, trees, etc.

To represent records, expressor Datascript uses the field name as an index. The language supports this
representation by providing a.name as a shorthand representation of a["name"].
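For example, the following sketch (using only core language syntax) shows the two equivalent ways of reading and writing a record field:

```lua
-- Create a table used as a record.
local person = { name = "Ada", year = 1815 }

-- a.name is shorthand for a["name"]; both access the same field.
print(person.name)         -- Ada
print(person["name"])      -- Ada

-- The shorthand also works for assignment.
person.year = 1816
print(person["year"])      -- 1816
```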

Like indices, the value of a table field can be of any type (except nil). In particular, because functions are first-class
values, table fields can contain functions. Thus tables can also carry methods.

Tables and functions are objects: variables do not actually contain these values, only references to them. Assignment,
parameter passing and function returns always manipulate references to such values. These operations do not imply
any kind of copy.
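Because tables are manipulated by reference, assignment does not copy them, as this small sketch illustrates:

```lua
local t1 = { value = 1 }
local t2 = t1            -- t2 refers to the same table, not a copy

t2.value = 99
print(t1.value)          -- 99: both names reference one object

-- Two independently created tables are never equal, even with identical contents.
print({} == {})          -- false
```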

Coercion
expressor Datascript provides automatic conversion between string, integer and number types at run time. Any
arithmetic operation applied to a string tries to convert this string to a number, following the usual conversion rules.
Conversely, whenever an integer or number is used where a string is expected, the numeric is converted to a string, in
a reasonable format. For complete control over how numerics are converted to strings, use the format function
from the string library.
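Assuming the usual conversion rules described above, arithmetic on numeric strings behaves as follows:

```lua
-- A string that looks like a number is converted when used in arithmetic.
print("10" + 5)      -- 15
print("3" * "4")     -- 12
```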

expressor datetime types are stored as the number of seconds after the epoch, which is January 1, 2000. If a variable
referencing a datetime value is used in an expressor function other than the functions specific to datetime values, it
will generally be used as a numeric value and not as a formatted datetime value.

expressor decimal types are not automatically coerced to non-numeric types and must be explicitly converted using
a function.

expressor Datascript provides functions to explicitly convert between types.

         datetime.string: operates on an integer, datetime, or number field, converting it into a datetime
          formatted string type

         decimal.integer: operates on a decimal field, converting it into an integer/long type

         decimal.number: operates on a decimal field, converting it into a number type

         decimal.string: operates on a decimal field, converting it into a string type

         decimal.todecimal: operates on an integer, decimal, number, or string field, converting it into a decimal
          type

         string.datetime: operates on a string field, converting it into a datetime type

         todecimal: operates on a datetime, decimal, integer, number, or string field, converting it into a decimal
          type

         tointeger: operates on a datetime, decimal, integer, number, or string field, converting it into an 8-byte
          integer type

         tolong: operates on a datetime, decimal, integer, number, or string field, converting it into an 8-byte
          integer type

         tonumber: operates on a datetime, decimal, integer, number, or string field, converting it into a number
          type

         tostring: operates on a datetime, decimal, integer, number, or string field, converting it into a string type

         ustring.datetime: operates on a Unicode string, returning an unformatted datetime as the number of
          seconds since January 1, 2000

         ustring.decimal: operates on a Unicode string that is interpretable as a number, returning a decimal

Type conversions are summarized below, by source and target type.

From datetime:

         to decimal: todecimal or decimal.todecimal

         to integer: tointeger

         to number: conversion not needed; within expressor, a datetime value is represented as a number

         to string: datetime.string

From decimal:

         to datetime: direct conversion not permitted; convert the decimal to a number first

         to integer: tointeger or decimal.integer

         to number: tonumber or decimal.number

         to string: tostring or decimal.string

From integer:

         to datetime: conversion not needed; an integer value is interpreted as a datetime

         to decimal: todecimal or decimal.todecimal

         to number: tonumber

         to string: tostring

From number:

         to datetime: conversion not needed; a number value is interpreted as a datetime

         to decimal: todecimal or decimal.todecimal

         to integer: tointeger

         to string: tostring

From string:

         to datetime: string.datetime

         to decimal: todecimal or decimal.todecimal

         to integer: tointeger

         to number: tonumber

From ustring:

         to datetime: ustring.datetime

         to decimal: ustring.decimal


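A small sketch chaining the conversion functions listed above (assuming they are available in the default datascript environment, and that the string holds a whole number):

```lua
local d = decimal.todecimal("42")   -- string  -> decimal
local n = decimal.number(d)         -- decimal -> number
local i = tointeger(n)              -- number  -> integer/long
local s = tostring(i)               -- integer -> string
```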

Variables
Variables are places that store values. There are three kinds of variables in expressor Datascript:

         global variables

         local variables

         table fields

A single name can denote a global variable or a local variable (or a function's formal parameter, which is a particular
kind of local variable):


              var ::= Name
Name denotes identifiers.

Any variable is assumed to be global unless explicitly declared local. Local variables are lexically scoped and can be
freely accessed by functions defined inside their scope.

Before the first assignment to a variable, its value is nil.

Square brackets are used to index a table:


              var ::= prefixexp `[´ exp `]´
The syntax var.Name is a shorthand representation of var["Name"]:


              var ::= prefixexp `.´ Name

Statements
expressor Datascript supports an almost conventional set of statements, similar to those in Pascal or C. This set
includes assignments, control structures, function calls, and variable declarations.


Chunks

Blocks

Assignment

Control Structures

For Statement

Function Calls as Statements

Local Declarations

Chunks
The unit of execution is called a chunk. A chunk is a sequence of statements which are executed sequentially. Each
statement can be optionally followed by a semicolon:

         chunk ::= {stat [`;´]}

Empty statements are not allowed so ';;' is not legal.

A chunk is processed as the body of an anonymous function with a variable number of arguments. Chunks can define
local variables, receive arguments and return values.

Blocks
A block is a list of statements. Syntactically, a block is the same as a chunk:

         block ::= chunk

A block can be explicitly delimited to produce a single statement:

         stat ::= do block end

Explicit blocks are useful to control the scope of variable declarations. Explicit blocks are sometimes used to add a
return or break statement in the middle of another block.
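For instance, an explicit block can limit a variable's scope (a minimal sketch):

```lua
do
  local x = 10      -- x is visible only inside this block
  print(x)          -- 10
end

print(x)            -- nil: the local x is out of scope here
```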

Assignment
expressor Datascript allows multiple assignments. Therefore, the syntax for assignment defines a list of variables on
the left side and a list of expressions on the right side. The elements in both lists are separated by commas:

         stat ::= varlist `=´ explist

         varlist ::= var {`,´ var}

         explist ::= exp {`,´ exp}

Before the assignment, the list of values is adjusted to the length of the list of variables.

          If there are more values than needed, excess values are thrown away.

          If there are fewer values than needed, the list is extended with as many nils as needed.

          If the list of expressions ends with a function call, all values returned by that call enter the list of values,
           before the adjustment (except when the call is enclosed in parentheses).

The assignment statement evaluates all its expressions and then performs the assignments. In the following example,
the code

         i = 3

         i, a[i] = i+1, 20

sets a[3] to 20, without affecting a[4] because the i in a[i] is evaluated (to 3) before it is assigned 4.

Similarly, the line

         x, y = y, x

exchanges the values of x and y, and

         x, y, z = y, z, x

cyclically permutes the values of x, y, and z.

Control Structures
The control structures if, while and repeat have the usual meaning and familiar syntax:

          stat ::= while exp do block end

          stat ::= repeat block until exp

          stat ::= if exp then block {elseif exp then block} [else block] end

expressor Datascript also has a for statement, in two flavors (see next section).

The condition expression of a control structure can return any value. Both false and nil are considered false. All other
values are considered true (in particular, the number 0 and the empty string are also true).

In the repeat–until loop, the inner block does not end at the until keyword, but only after the condition. So the
condition can refer to local variables declared inside the loop block.
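A sketch of a repeat–until loop whose condition uses a local declared inside the loop body:

```lua
local i = 0
repeat
  local doubled = i * 2   -- local to the loop body
  print(doubled)
  i = i + 1
until doubled >= 4        -- the condition can still see 'doubled'
-- prints 0, 2, 4
```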

The return statement returns values from a function or a chunk (which is just a function). Functions and chunks can
return more than one value, so the syntax for the return statement is

          stat ::= return [explist]

The break statement terminates the execution of a while, repeat or for loop, skipping to the next statement after the
loop:

          stat ::= break

A break ends the innermost enclosing loop.

The return and break statements can only be written as the last statement of a block. If it is really necessary to return
or break in the middle of a block, an explicit inner block can be used, as in the idioms do return end and do
break end, because now return and break are the last statements in their (inner) blocks.

For Statement
The for statement has two forms:

          numeric

          generic

The numeric for loop repeats a block of code while a control variable runs through an arithmetic progression. It has
the following syntax:

stat ::= for Name `=´ exp `,´ exp [`,´ exp] do block end

The block is repeated for Name starting at the value of the first exp, until it passes the second exp, in steps of the
third exp. More precisely, a for statement like

          for v = e1, e2, e3 do block end

is equivalent to the code:

          do

               local var, limit, step = tonumber(e1), tonumber(e2), tonumber(e3)

               if not (var and limit and step) then error() end

               while (step > 0 and var <= limit) or (step <= 0 and var >= limit) do


                       local v = var

                       block

                       var = var + step

               end

        end

Note the following:

         All three control expressions are evaluated only once, before the loop starts. They must evaluate to
          values of the expressor integer, decimal, or number types.

        var, limit, and step are invisible variables. The names shown here are for explanatory purposes only.

        If the third expression (the step) is absent, then a step of 1 is used.

        You can use break to exit a for loop.

        The loop variable v is local to the loop; you cannot use its value after the for ends or is broken. If you need
         this value, assign it to another variable before breaking or exiting the loop.
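For example, a numeric for loop that sums the integers 1 through 5:

```lua
local sum = 0
for v = 1, 5 do          -- the step is omitted, so it defaults to 1
  sum = sum + v
end
print(sum)               -- 15
```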

The generic for statement works over functions called iterators. On each iteration, the iterator function is called to
produce a new value, stopping when this new value is nil. The generic for loop has the following syntax:

        stat ::= for namelist in explist do block end

        namelist ::= Name {`,´ Name}

A for statement like

        for var_1, ···, var_n in explist do block end

is equivalent to the code:

        do

               local f, s, var = explist

               while true do

                       local var_1, ···, var_n = f(s, var)

                       var = var_1

                       if var == nil then break end

                       block

               end

        end


Note the following:

          explist is evaluated only once. Its results are an iterator function, a state and an initial value for the first
           iterator variable.

          f, s, and var are invisible variables. The names are here for explanatory purposes only.

          You can use break to exit a for loop.

          The loop variables var_i are local to the loop; you cannot use their values after the for ends. If you need
           these values, assign them to other variables before breaking or exiting the loop.
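A sketch of the generic for driven by a hand-written iterator; the loop ends when the iterator returns nil:

```lua
-- Iterator factory: returns an iterator function that yields successive squares.
local function squares(limit)
  local i = 0
  return function()
    i = i + 1
    if i <= limit then return i * i end   -- returns nil past the limit
  end
end

for sq in squares(3) do
  print(sq)              -- prints 1, then 4, then 9
end
```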

Function Calls as Statements
To allow possible side-effects, function calls can be executed as statements:

          stat ::= functioncall

In this case, all returned values are thrown away.

Local Declarations
Local variables can be declared anywhere inside a block. The declaration can include an initial assignment:

          stat ::= local namelist [`=´ explist]

If present, an initial assignment has the same semantics as a multiple assignment. Otherwise, all variables are
initialized with nil.

A chunk is also a block, so local variables can be declared in a chunk outside any explicit block. The scope of such
local variables extends until the end of the chunk.


Expressions

Basic Expressions

Arithmetic Operators

Relational Operators

Logical Operators

Concatenation

Length Operator

Precedence

Table Constructors

Function Calls

Function Definitions




Basic Expressions
The basic expressions in expressor Datascript are the following:

          exp ::= prefixexp

          exp ::= nil | false | true

          exp ::= Number

        exp ::= String

        exp ::= function

        exp ::= tableconstructor

        exp ::= `...´

        exp ::= exp binop exp

        exp ::= unop exp

        prefixexp ::= var | functioncall | `(´ exp `)´

Variable argument expressions, denoted by three dots ('...'), can only be used when directly inside a vararg
function.

Binary operators comprise arithmetic operators, relational operators, logical operators and the concatenation
operator. Unary operators comprise the unary minus, the unary not, and the unary length operator.

Both function calls and vararg expressions can result in multiple values. If an expression is used as a statement, its
return list is adjusted to zero elements, discarding all returned values. If an expression is used as the last (or only)
element of a list of expressions, no adjustment is made (unless the call is enclosed in parentheses). In all other
contexts, expressor Datascript adjusts the result list to one element, discarding all values except the first one.

Here are some examples:

        f()                        -- adjusted to 0 results

        g(f(), x)                  -- f() is adjusted to 1 result

        g(x, f())                  -- g gets x plus all results from f()

        a,b,c = f(), x             -- f() is adjusted to 1 result (c gets nil)

        a,b = ...                  -- a gets the first vararg parameter, b gets


                                      -- the second (both a and b can get nil if there


                                      -- is no corresponding vararg parameter)

        a,b,c = x, f()             -- f() is adjusted to 2 results

        a,b,c = f()                -- f() is adjusted to 3 results

        return f()                 -- returns all results from f()

        return ...                 -- returns all received vararg parameters

        return x,y,f()             -- returns x, y, and all results from f()

        {f()}                      -- creates a list with all results from f()

        {...}                      -- creates a list with all vararg parameters

        {f(), nil}                 -- f() is adjusted to 1 result

Any expression enclosed in parentheses always results in only one value. Thus, (f(x,y,z)) is always a single value,
even if f returns several values. (The value of (f(x,y,z)) is the first value returned by f or nil if f does not return
any values.)

Arithmetic Operators
expressor Datascript supports the usual arithmetic operators:

          binary

                     + (addition)

                     - (subtraction)

                     * (multiplication)

                     / (division)

                     % (modulo)

                     ^ (exponentiation)

          unary

                     - (negation)

If the operands are numerics, or strings that can be converted to numbers, all operations have the usual meaning.
Unless explicitly typed, operands are the number type.

      Note: When using an arithmetic operator, the operands do not need to be the same expressor type. That
               is, the expression x=decimal.todecimal(5)+5 is acceptable as written; it is not necessary to
               convert both operands to the same type.

Exponentiation works for any exponent. For instance, x^(-0.5) computes the inverse of the square root of x.

Modulo is defined as

          a % b == a - math.floor(a/b)*b

It is the remainder of a division that rounds the quotient towards minus infinity.
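For example, with a negative operand the result follows the floor definition above:

```lua
print(7 % 3)       -- 1
print(-7 % 3)      -- 2:  -7 - math.floor(-7/3)*3 = -7 - (-3)*3 = 2
print(2 ^ -1)      -- 0.5: exponentiation works for any exponent
```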

      Note: Dividing by zero is an undefined operation. Check the value of the divisor before invoking the
               division operation.

Relational Operators
The relational operators are


          ==         ~=       <         >     <=        >=
These operators always result in false or true.

Equality (==) first compares the type of its operands. If the types are different, the result is false. Otherwise, the values
of the operands are compared.

          Numbers and strings are compared in the usual way.

          Objects (tables and functions) are compared by reference: two objects are considered equal only if they are
           the same object. Every time you create a new object, this new object is different from any previously existing
           object.

The conversion rules do not apply to equality comparisons. Thus, "0"==0 evaluates to false, and t[0] and
t["0"] denote different entries in a table.

The operator ~= is exactly the negation of equality (==).

The order operators work as follows:

        If both arguments are numbers, they are compared as such.

        If both arguments are strings, their values are compared according to the current locale.

        Otherwise, expressor Datascript tries to call the "less than" or the "less than, equal to" metamethods. A
         comparison a > b is translated to b < a and a >= b is translated to b <= a.

    Note: When using a relational operator, the operands must be the same expressor type. That is, the
              expression x=decimal.todecimal(5)>4 must be rewritten so that the operands are the same type:
              for example, x=5>4, or x=tonumber(decimal.todecimal(5))>4, or
              x=decimal.todecimal(5)>decimal.todecimal(4).

Logical Operators
The logical operators are and, or, and not. Like the control structures, all logical operators consider both false and nil
as false and anything else as true.

        The negation operator not always returns false or true.

        The conjunction operator and returns its first argument if this value is false or nil. Otherwise, and returns its
         second argument.

        The disjunction operator or returns its first argument if this value is different from nil and false. Otherwise,
         or returns its second argument.

Both and and or use short-cut evaluation (i.e. the second operand is evaluated only if necessary).

Here are some examples:

        10 or 20                      --> 10

        10 or error()                 --> 10

        nil or "a"                    --> "a"

        nil and 10                    --> nil

        false and error()             --> false

        false and nil                 --> false

        false or nil                  --> nil

        10 and 20                     --> 20

    Note: When using a logical operator, the operands do not need to be the same expressor type. That is,
              the expression x=decimal.todecimal(10)or 20 is acceptable as written; it is not necessary to
              convert both operands to the same type.

Concatenation
String concatenation is performed using the string.concatenate Datascript function.

Length Operator
The length operator is denoted by the unary operator #, or obtained using the string.length function. The
length of a string is its number of bytes (i.e. the usual meaning of string length when each character is one byte).

The length of a table t is defined to be any integer index n such that t[n] is not nil and t[n+1] is nil. If t[1] is
nil, n can be zero.

For a regular array, with non-nil values from 1 to a given n, its length is exactly n, the index of its last value. If the
array has "holes" (i.e. nil values between other non-nil values), #t can be any of the indices that directly precedes a
nil value (i.e. it might consider any such nil value as the end of the array).
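A sketch of the length operator on strings and arrays, using the unary # form:

```lua
local s = "hello"
print(#s)                -- 5: length in bytes

local t = { "a", "b", "c" }
print(#t)                -- 3: index of the last non-nil value

t[5] = "e"               -- t now has a "hole" at index 4
print(#t)                -- may be 3 or 5; either index directly precedes a nil
```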

Precedence
Operator precedence follows the table below, from lowest to highest priority:


          or
          and
          <          >     <=        >=        ~=        ==
          +          -
          *          /     %
          not        #     - (unary)
          ^
You can use parentheses to change the precedence within an expression. The exponentiation (^) operator is right
associative. All other binary operators are left associative.

Table Constructors
Table constructors are expressions that create tables.

Every time a constructor is evaluated, a new table is created. A constructor can create an empty table or create a table
and initialize some of its fields. The general syntax for constructors is

                 tableconstructor ::= `{´ [fieldlist] `}´

                 fieldlist ::= field {fieldsep field} [fieldsep]

                 field ::= `[´ exp `]´ `=´ exp | Name `=´ exp | exp

                 fieldsep ::= `,´ | `;´

          Each field of the form [exp1] = exp2 adds to the new table an entry with key exp1 and value exp2.

          A field of the form name = exp is equivalent to ["name"] = exp.

          Fields of the form exp are equivalent to [i] = exp, where i takes consecutive integer values, starting
           with 1.

Fields in the other formats do not affect this counting. For example,

          a = { [f(1)] = g; "x", "y"; x = 1, f(x), [30] = 23; 45 }

is equivalent to

        do

                 local t = {}

                 t[f(1)] = g

                 t[1] = "x"                 -- 1st exp

                 t[2] = "y"                 -- 2nd exp

                 t.x = 1                    -- t["x"] = 1

                 t[3] = f(x)                -- 3rd exp

                 t[30] = 23

                 t[4] = 45                  -- 4th exp

                 a = t

        end

If the last field in the list has the form exp and the expression is a function call or a vararg expression, all values
returned by this expression enter the list consecutively. To avoid this, enclose the function call or the vararg
expression in parentheses.

The field list can have an optional trailing separator, as a convenience for machine-generated code.

Function Calls
A function call has the following syntax:

        functioncall ::= prefixexp args

In a function call, first prefixexp and args are evaluated. If the value of prefixexp has type function, this
function is called with the given arguments. Otherwise, the "call" metamethod of that value is invoked, having as its
first parameter the value of prefixexp, followed by the original call arguments.

The form

        functioncall ::= prefixexp `:´ Name args

can be used to call "methods". A call v:name(args) is a shorthand representation for v.name(v,args),
except that v is evaluated only once.
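As an illustration of the method-call shorthand (the table and function names are hypothetical):

```lua
account = { balance = 100 }
function account.deposit(self, amount)
  self.balance = self.balance + amount
end

account:deposit(50)     -- equivalent to account.deposit(account, 50)
-- account.balance is now 150
```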

Arguments have the following syntax:

        args ::= `(´ [explist] `)´

        args ::= tableconstructor

        args ::= String

All argument expressions are evaluated before the call.

          A call of the form f{fields} is a shorthand representation for f({fields}), which means the
           argument list is a single new table.




          A call of the form f'string' (or f"string" or f[[string]]) is a shorthand representation for
           f('string'), which means the argument list is a single literal string.

As an exception to the free-format syntax of expressor Datascript, you cannot put a line break before the '(' in a
function call. This restriction avoids some ambiguities in the language. If you write

          a = f

          (g).x(a)

expressor Datascript would see that as a single statement, a = f(g).x(a). If you mean it to be two statements,
you must place a semi-colon between them. If you intend to call f, you must remove the line break before (g).

A call of the form return functioncall is called a tail call. expressor Datascript implements proper tail calls (or
proper tail recursion): in a tail call, the called function reuses the stack entry of the calling function. Therefore, there is
no limit on the number of nested tail calls that a program can execute.

However, a tail call erases any debug information about the calling function. A tail call only happens with a particular
syntax, where the return has one single function call as argument. This syntax makes the calling function return
exactly the returns of the called function.
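By contrast, the following sketch is a proper tail call, because the return statement has a single function call as its argument:

```lua
-- recursion in constant stack space: each call reuses
-- the stack entry of its caller
function countdown(n)
  if n <= 0 then return end
  return countdown(n - 1)    -- proper tail call
end

countdown(1000000)           -- completes without exhausting the stack
```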

None of the following examples are tail calls:

          return (f(x))                 -- results adjusted to 1

          return 2 * f(x)

          return x, f(x)                -- additional results

          f(x); return                  -- results discarded

          return x or f(x)              -- results adjusted to 1

Function Definitions
The syntax for function definition is

          function ::= function funcbody

          funcbody ::= `(´ [parlist] `)´ block end

The alternative representations simplify function definitions:

          stat ::= function funcname funcbody

          stat ::= local function Name funcbody

          funcname ::= Name {`.´ Name} [`:´ Name]

The statement

          function f () body end

translates to

          f = function () body end

The statement

          function t.a.b.c.f () body end




translates to

         t.a.b.c.f = function () body end

The statement

         local function f () body end

translates to

         local f; f = function () body end

not to

         local f = function () body end

(This only makes a difference when the body of the function contains references to f.)
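The difference matters for recursive functions, as in this sketch:

```lua
-- works: the translation 'local fact; fact = function ...' puts
-- 'fact' in scope inside its own body
local function fact(n)
  if n <= 1 then return 1 end
  return n * fact(n - 1)
end

-- with 'local fact = function (n) ... end', the inner reference
-- to 'fact' would instead resolve to an outer (likely nil) variable
```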

A function definition is an executable expression whose value has type function. When expressor Datascript pre-
compiles a chunk, all its function bodies are pre-compiled too. When expressor Datascript executes the function
definition, the function is instantiated (or closed).

This function instance (or closure) is the final value of the expression. Different instances of the same function can
refer to different external local variables and can have different environment tables.

Parameters act as local variables that are initialized with the argument values:

         parlist ::= namelist [`,´ `...´] | `...´

When a function is called, the list of arguments is adjusted to the length of the parameter list unless the function is a
variadic or vararg function, which is indicated by three dots ('...') at the end of its parameter list.

A vararg function does not adjust its argument list. Instead, it collects all extra arguments and supplies them to the
function through a vararg expression, which is also written as three dots. The value of this expression is a list of all
actual extra arguments, similar to a function with multiple results.

If a vararg expression is used inside another expression or in the middle of a list of expressions, its return list is
adjusted to one element. If the expression is used as the last element of a list of expressions, no adjustment is made
unless that last expression is enclosed in parentheses.

As an example, consider the following definitions:

         function f(a, b) end

         function g(a, b, ...) end

         function r() return 1,2,3 end

We have the following mapping from arguments to parameters and to the vararg expression:

         CALL                  PARAMETERS

         f(3)                  a=3, b=nil
         f(3, 4)               a=3, b=4
         f(3, 4, 5)            a=3, b=4
         f(r(), 10)            a=1, b=10
         f(r())                a=1, b=2

         g(3)                  a=3, b=nil, ... -->  (nothing)
         g(3, 4)               a=3, b=4,   ... -->  (nothing)
         g(3, 4, 5, 8)         a=3, b=4,   ... -->  5  8
         g(5, r())             a=5, b=1,   ... -->  2  3

Results are returned using the return statement. If control reaches the end of a function without encountering a
return statement, the function returns with no results.

The colon syntax is used for defining methods (i.e. functions that have an implicit extra parameter self). Thus, the
statement

       function t.a.b.c:f (params) body end

is a shorthand representation for

       t.a.b.c.f = function (self, params) body end


Visibility rules
expressor Datascript is a lexically scoped language. The scope of variables begins at the first statement after their
declaration and lasts until the end of the innermost block that includes the declaration.

Consider the following example:

       x = 10                          -- global variable

       do                              -- new block

               local x = x                  -- new 'x', with value 10

               print(x)                     --> 10

               x = x+1

               do                           -- another block

                       local x = x+1             -- another 'x'

                       print(x)                  --> 12

               end

               print(x)                     --> 11

       end

       print(x)                        --> 10      (the global one)

Notice that, in a declaration like local x = x, the new x being declared is not in scope yet, so the second x refers
to the outside variable.

Because of the lexical scoping rules, local variables can be freely accessed by functions defined inside their scope. A
local variable used by an inner function is called an upvalue, or external local variable, inside the inner function.

Each execution of a local statement defines new local variables. Consider the following example:

       a = {}



        local x = 20

        for i=1,10 do

               local y = 0

               a[i] = function () y=y+1; return x+y end

        end

The loop creates ten closures (that is, ten instances of the anonymous function). Each of these closures uses a
different y variable, while they share the same x.


Scope of variables in an expressor datascript
The data transformation functions and their associated helper functions are invoked as required by expressor
Engine (e.g. as each record, or group of records, is processed by the operator). Application-specific operator helper
functions are called from statements in the data transformation and helper functions.

The scope of variables declared in functions is limited to the function or to blocks of code in a function. Variables with
function scope exist only when the function or block of code is executing.

Variables declared outside of a function have operator scope and are available to all functions in the script. These
variables exist for the lifetime of the operator (expressor process) and can be used to store values used in processing
each record.

In the following example, the variable c, initialized to zero when the operator first starts, is incremented each time the
transform function executes. The new value is used during subsequent executions of transform and the variable
c retains this altered value until the next execution of the function, when it is again incremented. In this example, the
variable c implements a counter whose current value is the number of records processed.
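A minimal sketch of such a counter (the field name recordNumber is illustrative):

```lua
-- operator scope: initialized once, when the operator starts
c = 0

-- function scope: transform runs once per record
function transform(input)
  c = c + 1                   -- c retains its value between records
  output = input
  output.recordNumber = c     -- current count of records processed
  return output
end
```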




The important points are:

        Code outside a data transformation function or associated helper function is called during initialization of
         the operator and not each time the operator processes a record.

        Code in a data transformation function or associated helper function is called only when a record is being
         processed and not during initialization of the operator.

        Variables declared outside a data transformation or associated helper function retain their values after
         processing of a record completes and are available during processing of subsequent data records.




          Variables declared inside a data transformation or associated helper functions are re-initialized each time the
           function is invoked.

          Variables are assumed to be global unless explicitly declared as local. Local variables are lexically scoped
           and can be freely accessed by functions defined inside their scope.

          The variables input and output, which represent the incoming and outgoing records, have operator scope
           and are available to all functions in a script, but are re-initialized when the processing of each record begins.


Multiple assignment statements
expressor Datascript supports multiple assignment, where a list of values is assigned to a list of variables in a single
statement. After executing the following statement, the variable a has the value 5 and b has the value 10.

      a, b = 5, 10

This syntax is useful in two situations.

      1.   If it is necessary to swap the values stored in two variables.

      a, b = b, a

In executing this statement, the values on the right are evaluated and assigned to the variables on the left. The
equivalent logic would require four single assignment statements.

      c    =   a
      a    =   b
      b    =   c
      c    =   nil

      2.   To capture multiple return values from function calls. For example, the string.find function returns
           multiple values that can be captured by a multiple assignment statement. The following function call returns
           the starting position for the pattern .([lo]+) into var1, the pattern's ending position into var2 and the
           capture llo into var3.

                 var1, var2, var3 = string.find("Hello all users", ".([lo]+)")

Return values are captured in order. Consequently, both of the following statements assign the pattern starting and
ending positions to variables, although the variables are named differently. In both cases, it is the third return value,
the capture, which is not assigned to a variable.

      var1, var2 = string.find("Hello all users", ".([lo]+)")
      var2, var3 = string.find("Hello all users", ".([lo]+)")




In a similar vein, if a function returns fewer values than the number of listed variables, the extra variables are
assigned the value nil. In the following statement, var1 is assigned the value 10 while var2 is assigned the value
nil.

      var1, var2 = math.abs(-10)

Relational and logical operators
The relational operators (==, ~=, <, >, <=, >=) return true or false. These operators can be used to compare
numeric or string values, but not a combination of numeric and string values.

      5 > 6            -- returns false
      "5" > "6"        -- returns false
      "5" > 6          -- will not execute;
                       --   cannot compare a string to a number
      "abc" < "bcd"    -- returns true

The logical operators and and or do not necessarily return boolean values, but rather the value in one of their inputs.

        The and operator returns its first input if the input value is false or nil. Otherwise it returns the value of
         its second input.

        The or operator returns the value of its first input if the input value is not false or nil. Otherwise it returns
         the value of its second input.

      "hello" and "goodbye"    -- returns "goodbye"
      6 and 0                  -- returns 0; zero is not
                               --   interpreted as false
      nil and 0                -- returns nil
      "hello" or "goodbye"     -- returns "hello"
      "" or "xx"               -- returns ""; the empty string is not
                               --   interpreted as false
      nil or 0                 -- returns 0

If the inputs are boolean values, the returns are boolean values.

      true and false       -- returns false
      false and true       -- returns false
      true or false        -- returns true
      false or true        -- returns true
      false or false       -- returns false; it's the
                           --   second input that is returned

The logical operator not reverses a logical value.

      not false        -- returns true
      not nil          -- returns true; nil is
                       --   interpreted as false
      not 0            -- returns false; zero is
                       --   interpreted as true
      not ""           -- returns false; the empty string is
                       --   interpreted as true





Write and use functions
When writing a function that returns multiple values, list the multiple values after the keyword return as illustrated
in the following fragment.


      function myFunction (list of arguments)
        -- statements

        -- return multiple values
        return val1, val2
      end

You can also write functions with a variable number of arguments. Replace the parameter list with an ellipsis (...). In
the function body, access the passed values by:

        using a multiple assignment statement to capture each argument into a separate variable or

        using the arguments in a table constructor.

The following fragment illustrates how to save and return a variable number of arguments. (The statements are not
meant to represent a meaningful script, only usage.)

      function myFunction (...)
        -- assign arguments to local variables
        local var1, var2, var3 = ...

        -- assign arguments to table elements
        local t = {...}

        -- additional statements

        -- return all the arguments
        return ...
      end

With a multiple assignment statement, if there are not enough arguments to assign to each variable in the variable
list, the remaining variables have the value nil. If there are more arguments than variables, the extra arguments are
dropped.

Functions can also have optional arguments which, when absent, are replaced with a default value.

In the following illustration, the assignment statement initializes the variable either with the value supplied as an
argument or with a default value. Notice the use of the or logical operator in the assignment statement. If
optional_argument is not supplied, the or operator returns default_value, which is stored in the variable
var1.

      function myFunction(optional_argument)
        local var1 = optional_argument or default_value

        -- additional statements
      end

(The preceding statement is an example: the actual code should be appropriate to the type of
optional_argument.)

If a function returns multiple values, the number of values returned depends on how the function is used.

          If the function call appears anywhere other than the last position in a listing of values, only its first
           return value is used.

         If the function is the only, or last, entry in the listing of values, the function returns all its return values.

In the following example, myFunction returns two return values.

      -- save both return values into local variables
      local var1, var2 = myFunction()

      -- save both return values into local variables
      --   and store another value
      -- note that myFunction must be the last entry
      --   in the listing of values
      local var1, var2, var3 = another_value, myFunction()

      -- save the first return value into a local variable
      --   and discard the second return value
      local var1 = myFunction()

      -- suppress all but the first return value from
      --   myFunction; note the enclosing parentheses
      local var1 = (myFunction())

      -- save the first return value into a local variable
      --   and store another value into a second local variable;
      --   note that the second return value from myFunction
      --   has not been returned and assigned to a variable
      local var1, var2 = myFunction(), another_value

      -- the second return value from myFunction is still
      --   discarded, the second variable is set to another_value,
      --   and the third variable is set to nil
      local var1, var2, var3 = myFunction(), another_value

If a function that returns one or more values is not used in an assignment statement, all return values are discarded.

      -- call myFunction and discard all return values
      myFunction()


Datascript tables
This topic covers the following subjects:

          Table structure

          Table memory management

          Store functions in tables

Table structure
In expressor Datascript, tables are the only data-structuring paradigm. Tables have no fixed size, can be indexed
using numbers, strings, or a combination of both numbers and strings, and can change size as needed. Table entries
evaluate to nil when they are not initialized.

A table is created using a table constructor, which is represented by a set of curly braces.

      t = {}                  -- creates an empty table
      t = {7, 8, 9, 20}       -- creates a table containing four elements
                              --   using an integer index
                              --   the indices are 1, 2, 3, 4 and the
                              --   values are 7, 8, 9, 20
      t = {fifth=5, sixth=6}  -- creates a table containing two elements
                              --   using a string key
                              --   the keys are "fifth" and "sixth" and
                              --   the values are 5 and 6

         When used to represent an array or list, a table uses integer indices

         When used to represent a data record, a table (referred to as a dictionary) uses string keys.

In the scripting example at the beginning of this topic, fields in the incoming and outgoing data records were
referenced using dot notation, record_identifier.field_identifier.

For example:

      input.fname

This syntax is just a shorthand way to select a field value from a dictionary and is equivalent to the following
statement.

      input["fname"]


The transform function shown in the earlier example could have been written as follows.



      function transform(input)
        output["individual_full_name"] =
          string.concatenate(input["Firstname"], " ", input["Lastname"])

        return output;
      end;




The syntax record_identifier["field_identifier"] specifies a string key, with the literal value
field_identifier, in the table record_identifier.

The quotation marks are a required part of the syntax. Without the quotation marks, field_identifier becomes
the name of a variable whose value is interpreted as the table index or key.

The # operator applied against a table that uses numeric indexing starting at index 1 returns an integer value that is
the number of elements in the table.

     numberOfElements = #tableName

The # operator applied against a table that uses string indices, or numeric indices that do not start at 1, returns zero.

To determine the number of entries in a table regardless of the type of index, use the generic for loop in coding
similar to the following example.

      numberOfElements = 0
      for k in pairs(tableName) do
        numberOfElements = numberOfElements + 1
      end

    Note: The # operator applied against a string field returns the length of the string currently stored in the
              field.

Although expressor Datascript uses tables to store data records, this fact is hidden by the more convenient dot
notation used in transformation scripts. Individual data transformation scripts can make use of tables (see
Programming in Lua for more details), and tables are the underlying implementation for modules (libraries used to
incorporate existing scripts into expressor scripting).




Table memory management
When the assignment operator is used to directly set a field value in a table, the value is copied into the field.

For example, if the incoming record to a transformation script contains the two fields prodCode and desc, the
following statements copy the values of these fields from input to output.

      output.prodCode = input.prodCode
      output.desc = input.desc

If the assignment operator is used to initialize a new table to the values stored in an existing table, the new table and
the existing table are references to the same memory. The new table is not a copy of the original table. If the script
then changes a value represented by the new table variable, it also changes the value referenced by the original table
variable.

For example, many transformation scripts use the assignment operator to initialize the output record from the input
record. In this case, references to the fields in input are stored in the same memory locations as the identically
named fields in output. If a subsequent statement changes the value of a field in output, the value of the identically
named field in input also changes.

      output = input
      output.someField = someNewValue

      -- the value of input.someField is now equal to
      --   someNewValue, not its original value

Similarly, if an additional field is added to the new table, it also becomes a field in the original table.

      output = input
      output.additionalField = someValue

      -- the field additionalField also exists in input

Copy by reference results in fast and efficient processing, but it can lead to a problem if a script uses a modified field
in multiple conditional statements.

      -- the input record includes two fields
      --   prodCode, a number, and desc, a string
      -- an input record might be equivalent to
      --   input = { prodCode = 9999, desc = "Widget" }

      -- the script includes the following statements
      output = input
      if (input.prodCode < 10000) then output.prodCode = input.prodCode * 2 end
      if (input.prodCode < 10000) then output.desc = "NotAWidget" end

In this example, since the original value of input.prodCode is less than 10000, the first conditional statement is true
and the value of output.prodCode is reset. Consequently, the second conditional statement is false and
output.desc is not reset. output contains the values 19998 and Widget, which is probably not the intended result.

There are several ways to avoid this potential problem.

    1.   Include all statements dependent on a certain condition in the same then clause.

      output = input
      if (input.prodCode < 10000) then
        output.prodCode = input.prodCode * 2
        output.desc = "NotAWidget"
      end

    2.   Store the value from input in a local variable and use the local variable in the conditional statements.

      local Code = input.prodCode
      output = input
      if (Code < 10000) then output.prodCode = input.prodCode * 2 end
      if (Code < 10000) then output.desc = "NotAWidget" end

    3.   Use field-by-field initialization of output so that the input and output variables represent different
         copies of the data.

      output.prodCode = input.prodCode
      output.desc = input.desc
      if (input.prodCode < 10000) then output.prodCode = input.prodCode * 2 end
      if (input.prodCode < 10000) then output.desc = "NotAWidget" end
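A related approach, not shown above, is a small helper that performs the field-by-field copy generically; a sketch, assuming the record fields are simple values rather than nested tables:

```lua
-- shallow copy: gives the result its own fields,
-- independent of the source table
local function copyRecord(source)
  local copy = {}
  for key, value in pairs(source) do
    copy[key] = value
  end
  return copy
end

output = copyRecord(input)
-- changes to output no longer affect input
```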

Store Functions in Tables
A table can store more than state; it can also store functions.

For example, the following table – setters – contains 15 elements.

      -- create a table containing setter functions for all image types
      --   each function is passed two arguments
      --     the object on which to invoke the function
      --     the value to set
      --   function definitions must be on a single line
      setters = {array = function(obj, val) obj:setArray(val) end,
                 contact = function(obj, val) obj:setContact(val) end,
                 count = function(obj, val) obj:setCount(val) end,
                 default = function(obj, val) obj:setDefault(val) end,
                 definition = function(obj, val) obj:setDefinition(val) end,
                 encoding = function(obj, val) obj:setEncoding(val) end,
                 endian = function(obj, val) obj:setEndian(val) end,
                 extent = function(obj, val) obj:setExtent(val) end,
                 external = function(obj, val) obj:setExternal(val) end,
                 format = function(obj, val) obj:setFormat(val) end,
                 implied = function(obj, val) obj:setImplied((val=="true")) end,
                 key = function(obj, val) obj:setKey(tonumber(val)) end,
                 modifier = function(obj, val) obj:setModifier(val) end,
                 predicate = function(obj, val) obj:setPredicate(val) end,
                 value = function(obj, val) obj:setValue(val) end
                }

Each element is a function declaration. The function name becomes the table element key while the function body is
the table element value. Each table entry is equivalent to the following syntax.

      function setters.array(obj, val) obj:setArray(val) end

One advantage of storing functions in a table is that a function can be invoked indirectly: the name of the function
to invoke is stored in a variable that is used as the table key. For example, the following code extract invokes the
function array.

      function_to_invoke = "array"
      setters[function_to_invoke](obj, val)

Alternatively, the function can be declared outside the table constructor. This is useful when the function requires
multiple statements.




      t = {}
      function t.am_or_pm()
        if datetime.string(datetime.start(), "HH24") > tostring(12) then
          return "PM"
        else
          return "AM"
        end
      end;

To invoke this method, use the table and function name.

     print(t.am_or_pm())

Object oriented programming
When a table stores both state and the functions that act on the state, it supports the concepts of object oriented
programming.

For example, in the following code, the table greetings contains both state and functions. The first (and in this
example the only) argument to each function is a reference to an instance of the table (self), which indicates that
the function is called on this specific instance of the table.

      greetings = {eng="Hello", span="Hola"}
      function greetings.english(self) return self.eng end
      function greetings.spanish(self) return self.span end

The most typical way to use this table as an object is illustrated in the following code fragment, where the variable
representing the table is passed as an argument to the function call (G.english(G)).

      G = greetings
      string.concatenate(G.english(G), " ", "Mr. Jones")
      string.concatenate(G.spanish(G), " ", "Senior Martinez")

To simplify the object oriented syntax, use the colon operator instead of the dot operator. The colon operator
transparently passes the reference to the table to the function call (G:english()) as shown in the following code.

     G = greetings
     string.concatenate(G:english(), " ", "Mr. Jones")
     string.concatenate(G:spanish(), " ", "Señor Martinez")







Call external scripts from an expressor datascript
In some applications, the coding you include in an Aggregate, Filter, Join, or Transform operator may also be usable in
other data integration applications. While you could re-enter this code in each operator where it is useful, doing so
is inefficient and error prone. A more effective approach is to consolidate this coding in an external script file and
integrate that file into each data integration application where its contents are required. Datascript Modules are the
most effective way to manage external script files.

You can also include other external scripts that were written independently from the transformation operators
mentioned above.

There are two ways to reference external files from the code within the Rules Editor:

         require statement

         dofile function

Both approaches allow you to access external scripts and therefore accomplish the same thing, but the require
statement approach is more adaptable and portable across operating systems and host computers. With the
require statement approach, a system environment variable is used to specify the file system location, and file
extension, of the external script file(s). The dofile function approach requires that the operator code explicitly
specify the location and complete name of the external script file.

Use the require Statement
The require statement makes a datascript easy to port to other systems because it searches set paths to find
modules for inclusion. First and foremost, if the datascript function is contained in a Datascript Module, the require
statement finds it in the module named in the require statement. If the function is contained in an external file, the
require statement looks in the external directory where dataflows are run, so the statement does not need a path.
As long as you include the script files in the dataflow's external directory
(workspace_name\Metadata\project_name\dfp\dataflow_name\external), the require statement will find them. In
addition, if you set a path to datascripts, the require statement will search that path if it does not find the script file
in the external directory. This enables you to use scripts that may be located in a variety of directories on a system
without relocating them. Nevertheless, the most efficient method for including user-written datascript functions is to
use Datascript Modules.

When attempting to locate a datascript file in the file system, the require statement searches for the file using the
following search path order:

./?.eds
../modules/?.eds
EXP_HOME/datascript/?.eds
./?.lua
../modules/?.lua
EXP_HOME/datascript/?.lua

where:







The ./ refers to the external directory of either the Dataflow Package or the Deployment Package, and ? is the
base name of the datascript file specified in the require statement. EXP_HOME is the root installation
directory for expressor.

Using the wild card character, you can place all files containing modules into a group of common directories and
expressor will find the file corresponding to the required module.

The structure of the require statement is quite simple. You place the statement at the beginning of your datascript,
as in this example:

     require "Project1.0.MyScript"

     function transform(input)
        output.full_name = string.concatenate(t.title(), input.first_name, " ", input.last_name)
        return output;
     end;

     Note: The quotation marks around the script file name (MyScript) are a necessary part of the syntax.

In this example, the require statement returns a reference to a table. All the functions that the module makes
available to the calling script need to be defined in this table. So MyScript contains a single function that returns a
string.

     t = {}
     function t.title() return "President " end
     return t

         The first statement creates an empty table named t.

         The second statement defines a function named title() which returns the string "President ". The
          function is added to the table by qualifying the function name with the table name. Therefore, the function
          title() is stored in the table at the index "title".

         The third statement returns the table from the module.

Alternatively, the module could include the following coding in which the table is initialized with the function using
the key "title", which becomes the name of the function.

     t = {title = function() return "President " end}
     return t

Additional functions can be added to the table by initializing another table entry. For example,

     function t.suffix() return "Sr." end

or






      t = {title = function() return "President " end, suffix = function() return "Sr." end}

After running the dataflow, each line in the output file is similar to the following example.

      President George Washington


Use the dofile function
Another way to use coding in an external file in a script is to use the dofile function to execute the statements in
the external file. With this approach, variables or functions created in the external script become available in the
script that calls the dofile function.

Within the coding of any operator that requires access to this file, include, at the beginning of the scripting, the
statement

      dofile("./MyScript.txt")

For example, if the file MyScript.txt, contained in the dataflow's external subdirectory, includes the following
script:

      title = "President "

      function honorific() return "President " end

a transform operator containing either

      dofile("./MyScript.txt")

      function transform(input)

         output.individual_full_name =
           string.concatenate(title, input.individual_first_name, " ", input.individual_last_name)

         return output;
      end;

or

      dofile("./MyScript.txt")

      function transform(input)

         output.individual_full_name =
           string.concatenate(honorific(), input.individual_first_name, " ", input.individual_last_name)

         return output;
      end;

produces an output file with content similar to the following.

     President George Washington



Use Runtime Parameters in expressor Datascript
When a dataflow is running, there are several pieces of runtime information that can be referenced from within an
expressor Datascript. The following table lists the available information.


   variable name       description

   record.partition    When a dataflow uses a multi-channel network, this variable contains the
                       number of the channel carrying the record.

   dataflow.name       The name of the currently running dataflow.

   dataflow.start      The time at which execution of the dataflow began. This is different from the
                       value returned by the datetime.start function, which returns the time at
                       which the function was invoked.

   dataflow.pid        The process identifier associated with the running dataflow.

   dataflow.ppid       The parent process identifier.
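For example, a Transform operator could stamp each output record with this runtime information (an illustrative
sketch; the output field names are invented for this example):

     function transform(input)
        output.source_flow = dataflow.name
        output.flow_pid = tostring(dataflow.pid)
        output.channel = record.partition
        return output;
     end;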




expressor functions

Datascript functions
When you use expressor Studio to create a dataflow, expressor Engine operators use Datascript functions to
transform data between the incoming and emitted images. These are the categories of transformation functions:

        Basic functions provide core processing functionality.

        Bit functions create and manipulate 64-bit bit sets.

        Byte functions perform transformations on byte and string fields.

        Datetime functions perform transformations on datetime, number and string fields.

        Decimal functions perform transformations on decimal fields.

        Is functions perform logical tests on fields.

        Logging functions print information to the expressor, and possibly system, log files.

        Lookup functions match input data with records in a Lookup Table.

        Math functions perform transformations on datetime and number fields.

        String functions perform transformations on string fields.

        Table functions manipulate the contents of tables.

        Ustring functions manipulate Unicode characters.

        Utility functions are general purpose, non-typed functions.

Datascript extension libraries provide access to Web services and FTP, making it possible to build dataflows that
integrate on-premise and cloud sources and targets, or that run directly cloud-to-cloud.

        dscurl (from luacurl): http://luacurl.luaforge.net

        dssql (from luasql): http://www.keplerproject.org/luasql/manual.html

        lxp (from Luaexpat): http://www.keplerproject.org/luaexpat/manual.html

        json (from json4lua): http://json.luaforge.net (note that JSONRPC4Lua is not included)

        memcached (from Luamemcached): http://code.google.com/p/luamemcached/w/list

        socket (from Luasocket): http://w3.impa.br/~diego/software/luasocket

Use the expressor functions when writing expressor Datascript. You can also use expressor runtime parameters
when writing datascript.







basic functions
Basic functions include:


assert              error            pcall               tolong


decision            ipairs           select              tonumber


decode              next             todecimal           tostring


dofile              pairs            tointeger



assert
assert can terminate the code from which it is invoked and can optionally return an error message.


usage                assert(statement[, message])


arguments            statement                a code statement to execute

                                              if the statement executes successfully, assert returns the
                                              result from executing the statement

                                              if the statement evaluates to false or nil, assert invokes
                                              the error function passing the optional message

                     message                  optional error message returned when statement evaluates
                                              to false or nil


return             the result from statement or a message (which can be an informative string or data
                   value)


Examples
      ret_val = assert(is.datetime(input.field), "not a date")

      If input.field is not a datetime type, assert will throw a fault that contains the message not a date,
      otherwise the variable ret_val will contain the value returned from the is.datetime function, that is, true.

      Note: If statement returns false or nil, assert will throw a fault containing message. If statement
              throws a fault, assert will throw a fault containing a system generated message. Whenever
              assert throws a fault, processing will stop. The function assert should not be used in an
              attempt to catch an error and continue processing; use pcall for this purpose.







decision
decision provides a set of value tests and related values to be returned based on the outcome of the value tests.
The function examines the result of a logical expression.

           If the expression evaluates to true, it returns the value associated with this expression.

           If the expression evaluates to false, it examines the result of a following expression, returning a specific value
            associated with the expression if true, or examining another expression if false.

The function returns a value from the first expression that evaluates to true. Consequently, list more restrictive logical
expressions before less restrictive logical expressions. If no expressions evaluate to true, the function returns a default
value if provided, or nil.


usage                  decision (comparison_1, return_val_1
                      [, comparison_n, return_val_n]*
                      [,default_return])


arguments               comparison_1              a logical expression

                        comparison_n              optional additional logical expressions

                                                  * zero or more logical expressions

                        return_val_1...           the return value associated with each logical expression
                        return_val_n

                        default_return            optional default return value


return                  return_val_n or default_return or nil


Examples
     decision(input.state=="MA", "northeast",
              input.state=="GA" or input.state=="LA", "south",
              input.state!="AK", "lower 48")

            If input.state is equal to "MA", returns "northeast"

            If input.state is equal to "GA" or "LA", returns "south"

            If input.state is not equal to "AK", returns "lower 48"

            Otherwise, returns nil

    Note: The arguments can be literal values, expressor Datascript function calls, references to datascript
                variables or record fields.

     The decision function is a true function, which means that each argument to the function must be resolvable
     before the function begins its processing. For example, if in the following fragment the value of
     ledger_balance is nil, the third argument to the function cannot be resolved before the function begins
     executing and an error is raised.

     decision(is.empty(ledger_balance), "empty", tonumber(ledger_balance) + 100)

      Calls to a Datascript function can be used as arguments to the decision function. If an argument to a
      Datascript function is a decimal value, you must copy the value into a new variable and use the new variable as
      the function argument. This ensures that the argument is passed by value and that optimizations in the
      underlying code will not alter the original value.

      -- ledger_balance is a decimal
      ledger_balance_copy = ledger_balance
      -- some_function operates on the copy, leaving the original value unchanged
      decision(is.empty(ledger_balance_copy), "empty", some_function(ledger_balance_copy))

      Be certain that the values compared in the expressions are compatible. For example, in the following statement,
      since ledger_balance_copy is a decimal type, the comparison value must also be a decimal type.

      decision(ledger_balance_copy > todecimal(10000), "overlimit", "withinlimit")



decode
decode compares val_1 to val_2.

         If the values are equal, the function returns a value associated with val_2.

         If the comparison is false, the decode function compares val_1 to val_n, returning a specific value associated
          with val_n if true or performing another comparison if false.

The function returns a value from the first true comparison. If no comparisons are true, the function returns a default
value if provided, or nil.


usage                decode(val_1, val_2, return_val_2
                    [, val_n, return_val_n]*[,default_return])


arguments             val_1                      the value to be compared

                      val_2                      required comparison value

                      val_n                      optional additional comparison values

                                                 * zero or more comparison values







                     return_val_2...          the return value associated with each comparison value
                     return_val_n

                     default_return           optional default return value


return              return_val_n or default_return or nil


Examples
     decode(input.state, "MA", "northeast",
     "GA", "south",
     "CA", "west",
     "lower 48")

         If input.state is equal to "MA", "GA", or "CA", returns "northeast", "south", or "west"

         Otherwise, returns "lower 48".

    Note: The arguments can be literal values, expressor Datascript function calls, references to datascript
             variables or record fields.

     The decode function is a true function, which means that each argument to the function must be resolvable
     before the function begins its processing. Calls to a Datascript function can be used as arguments to the decode
     function. If an argument to a Datascript function is a decimal value, you must copy the value into a new variable
     and use the new variable as the function argument. This ensures that the argument is passed by value and that
     optimizations in the underlying code will not alter the original value. (See example in the decision function.)

     Be certain that the values compared in the expressions are compatible. For example, in the following statement,
     since department_number is a number type, the comparison values must also be number types.

     decode(department_number, tonumber(100),
     "marketing",
     tonumber(200), "sales",
     tonumber(300), "engineering",
     "administration")



dofile
dofile loads and executes the contents of a file.


usage               dofile(filename)


arguments            filename         the path to, and name of, the file to execute


return              the variables, tables, and functions defined in the file







error
error terminates the code from which it is invoked and returns an error message.


usage                error(message [, level])


arguments             message                the message

                      level                  the position of the error

                                                      level 0 suppresses addition of position information.

                                                      level 1 (the default) returns the line number in the
                                                       datascript where the error function was called.

                                                      level 2 returns the line number in the datascript of the
                                                       code that called the function in which the error
                                                       function was invoked.


return               message (which can be an informative string or data value)



Examples
      function callme(arg)
        if arg then return arg else error("a nil argument", 0) end
      end

If the error function is invoked, it will return the indicated message, but because the level specified is zero, no
information is returned about where the error occurred.

If level 1 is specified, then the error message would be "Datascript:10: a nil argument", where 10 is the line in the
datascript from where the error function is called.




ipairs
ipairs creates an iterator over an integer indexed table.


usage                ipairs(table)


arguments             table             the table over which to iterate


return               index, value







Examples
     for i, v in ipairs(table) do ... end

     Returns the index into i and the value t[i] into v for each element in the table up to the first empty index.




next
next allows retrieval of any entry from a table. The function returns the index and value of the entry at the next index.


usage                 next(table [, index])


arguments             table                   the table from which to extract values

                      index                   an index in this table

                                                      if nil, or absent, returns the value at the first index

                                                      if equal to the last index, returns nil


return                index, value


Examples
     i, v = next(table,index)

     Returns the index of the following element into i and the value t[i] into v.

     i, v = next(table)

         Returns nil if the table is empty.




pairs
pairs creates an iterator over a table.


usage                 pairs(table)


arguments             table             the table over which to iterate


return                key, value


Examples
     for k, v in pairs(table) do ... end

     Returns the key into k and the value t[k] into v for each element in the table up to the first empty key.







pcall
pcall invokes a function in protected mode, returning true and the function results or false and an error
message.


usage                pcall(f [,args[,...]])


arguments            f                     the function to execute; just the function name, do not include
                                           parentheses

                     args[,...]            arguments to the function


return               boolean, function return value(s) or error message


Examples
      -- first, write the function that will be called in protected mode
      function callme(arg)
        if arg then return arg else error("a nil argument", 0) end
      end

      -- then call the function in protected mode
      --   be certain to capture all return values
      -- this invocation will return false and the message "a nil argument"
      success, result = pcall(callme)
      print(success, " ", result)

      -- this invocation will return true and the return value "expressor software"
      success, result = pcall(callme, "expressor software")
      print(success, " ", result)

      In the preceding example, print represents scripting that handles either the expected return values or the
      error message. In actual usage, testing the value of success determines whether result contains the return
      value or error message.

      -- you may also call a prewritten datascript function
      success, v1 = pcall(string.datetime, "01012010", "MMDDCCYY")
      success, v2 = pcall(string.datetime, "01322010", "MMDDCCYY")

      In the first call, success will be true and v1 will be a datetime value (expressed as a number). In the second
      call, success will be false and v2 will contain a system generated error message. With both calls, testing the
      value of success will allow you to determine whether the function executed without error.




select
select returns a subset of the passed arguments.


usage                select(index, ...)


arguments             index                  the first argument to return

                      ...                    arguments, or expression, to evaluate


return               arguments or expression return values beginning with the value at index


Examples
     function multi_return() return 1, 2, 3, 4 end

     print(select(2, multi_return()))
     -- prints 2 3 4, the return values beginning with the second value

     print(select(3, "a", "b", "c", "d"))
     -- prints c d, the arguments beginning with the third value

You can use the select function to choose only one return value from a function.

For example, enclosing a function in parentheses limits the return to the first return value.

     function multi_return() return 1, 2, 3, 4 end

     print((multi_return()))
     -- prints 1, the first return value

By combining parentheses with select, any of the multiple return values can be selected.

     print(select(3, multi_return()))
     -- prints 3 4, the third and fourth return values

     -- to return only the third return value
     print((select(3, multi_return())))
     -- prints 3, the first of the selected return values



todecimal
todecimal tries to convert its argument to a decimal.


usage               todecimal(value)


arguments           value                 a number or a string convertible to a decimal


return              decimal


Examples
      todecimal ("5")

      Returns 5.

      todecimal (tolong(5))

      Returns 5.




tointeger
tointeger tries to convert its argument to an 8-byte integer.


usage               tointeger(value [, base])


arguments           value                 a number or a string convertible to a number

                                          when the base is 10 (the default), value can have a decimal
                                          part or optional exponent; with all other base specifications,
                                          value must be an unsigned integer

                    base                  the base used to interpret the value

                                                  an integer entry between 2 and 35


return              8-byte integer







tolong
tolong tries to convert its argument to an 8-byte integer.


usage               tolong(value [, base])


arguments            value                 a number or a string convertible to a number

                                           when the base is 10 (the default), value can have a decimal
                                           part or optional exponent; with all other base specifications,
                                           value must be an unsigned integer

                     base                  the base used to interpret the value

                                                   an integer entry between 2 and 35


return              8-byte integer


    Note: This function has been deprecated. The tointeger function provides equivalent functionality.




tonumber
tonumber tries to convert its argument to a number.


usage               tonumber(value [, base])


arguments            value                 a number or a string convertible to a number

                                           when the base is 10 (the default), value can have a decimal
                                           part or optional exponent; with all other base specifications,
                                           value must be an unsigned integer

                     base                  the base used to interpret the value

                                                   an integer entry between 2 and 35


return              number


Examples
     tonumber ("1021", 2)

     Returns nil: the string "1021" cannot be interpreted as a valid base 2 number.

     tonumber ("1021")

     Returns 1021: the string "1021" can be interpreted as a base 10 number.

     tonumber ("1011", 2)






      Returns 11: the string "1011" can be interpreted as a valid base 2 number.

      tonumber (1011, 2)

      Returns 11: the number 1011 is a valid base 2 number.

      tonumber ("0xff", 16)

      Returns 255: the string "0xff" can be interpreted as a valid base 16 number.




tostring
tostring tries to convert its argument to a string.


usage                 tostring(value)


arguments                value                   an argument in any type to be converted into a string (see
                                                 string.format to control how numbers are converted)


return                string
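Examples (illustrative)

      tostring(255)

      Returns "255".

      tostring(255.5)

      Returns "255.5".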




bit functions
All bit functions operate on and return 64-bit bit sets. The right-most bit is bit position 1 and the left-most bit is bit
position 64.

Bit functions include:


create                                   position


every                                    set


exclusive                                shift


inclusive                                string


negation                                 toggle



create
create creates a bit set variable from a binary pattern.


usage                 bit set bit.create(value)







arguments            value            string representation of the binary pattern


return              bit set


Examples
     bit.create("1111")

     Returns the bit set 0000000000000000000000000000000000000000000000000000000000001111.




every
every implements the logical exclusive or on a bit set variable.


usage               bit set bit.every(value, mask [, ...])


arguments            value                 the bit set on which to operate

                     mask                  the bit set used to toggle the value argument

                                           one or more mask arguments can be provided; the arguments
                                           are applied in turn to the value argument


return              bit set


Examples
     x=bit.create("1111")
     y=bit.create("1100")
     a=bit.every(x,y)

     Returns the bit set 0000000000000000000000000000000000000000000000000000000000000011.

     x=bit.create("1111")
     y=bit.create("1100")
     z=bit.create("0001")
     a=bit.every(x,y,z)

     Returns the bit set 0000000000000000000000000000000000000000000000000000000000000010.

         The exclusive or is performed on x and y

         An exclusive or is performed on the resulting value with the argument z.







exclusive
exclusive implements the logical and on a bit set variable.


usage               bit set bit.exclusive(value, mask [, ...])


arguments           value                   the bit set on which to operate

                    mask                    the bit set used to set the value argument

                                            one or more mask arguments can be provided; the arguments
                                            are applied in turn to the value argument


return              bit set


Examples
      x=bit.create("1111")
      y=bit.create("1100")
      a=bit.exclusive(x,y)

      Returns the bit set 0000000000000000000000000000000000000000000000000000000000001100.

      x=bit.create("1111")
      y=bit.create("1100")
      z=bit.create("1000")
      a=bit.exclusive(x,y,z)

      Returns the bit set 0000000000000000000000000000000000000000000000000000000000001000.

         The and is performed on x and y

         Another and is performed on the resulting value with the argument z.




inclusive
inclusive implements the logical or on a bit set variable.


usage               bit set bit.inclusive(value, mask [, ...])


arguments           value                   the bit set on which to operate

                    mask                    the bit set used to set the value argument

                                            one or more mask arguments can be provided; the arguments
                                            are applied in turn to the value argument


return              bit set







Examples
    x=bit.create("1110")
    y=bit.create("1100")
    a=bit.inclusive(x,y)

    Returns the bit set 0000000000000000000000000000000000000000000000000000000000001110.

    x=bit.create("1110")
    y=bit.create("1100")
    z=bit.create("0001")
    a=bit.inclusive(x,y,z)

    Returns the bit set 0000000000000000000000000000000000000000000000000000000000001111.

         The or is performed on x and y

         Another or is performed on the resulting value with the argument z.

negation
negation inverts the mask bit set and then applies the logical and to the value argument.


usage               bit set bit.negation(value, mask [, ...])


arguments           value                  the bit set on which to operate

                    mask                   the bit set used to set the value argument

                                           one or more mask arguments can be provided; the
                                           arguments are applied in turn to the value argument


return              bit set


Examples
    x=bit.create("1111")
    y=bit.create("1100")
    a=bit.negation(x,y)

    Returns the bit set 0000000000000000000000000000000000000000000000000000000000000011.

    x=bit.create("1111")
    y=bit.create("1100")
    z=bit.create("0011")
    a=bit.negation(x,y,z)

    Returns the bit set 0000000000000000000000000000000000000000000000000000000000000000.

         Inversion is performed on argument y

         An and is performed with argument x

         Another and is performed with the resulting value and the inversion of argument z.
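Equivalently, each step is value AND NOT mask within the 64-bit width. A Python sketch under that reading (helper names are illustrative, not part of the expressor API):

```python
from functools import reduce

WIDTH = 64
MASK = (1 << WIDTH) - 1

def bit_negation(value: int, *masks: int) -> int:
    """AND the value with the 64-bit inversion of each mask in turn."""
    return reduce(lambda acc, m: acc & (~m & MASK), masks, value)

# Mirrors the documented examples:
print(format(bit_negation(0b1111, 0b1100), "04b"))          # 0011
print(format(bit_negation(0b1111, 0b1100, 0b0011), "04b"))  # 0000
```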






position
position returns true or false depending on whether the bit specified through position is set or unset.


usage                  boolean bit.position(value, position)


arguments              value                   the bit set on which to operate

                       position                the bit position to test

                                               one or more position arguments can be provided

                                               when negative, the bit position is referenced from the left end
                                               of the bit set


return                 boolean


Examples
      x=bit.create("1111")
      a=bit.position(x,1)

      Returns true.

      x=bit.create("1110")
      a=bit.position(x,1)

      Returns false.

      x=bit.create("1111")
      a=bit.position(x,-2)

      Returns false, because the bit set actually includes 64 bits and the second bit from the left end is 0.

      x=bit.create("1111")
      a=bit.position(x,5)

      Returns false, because the bit set actually includes 64 bits and the fifth bit from the right end is 0.
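The examples imply that positions are 1-indexed from the right end, and that a negative position counts from the left end of the full 64-bit set. A Python sketch of that indexing (bit_position is an illustrative name):

```python
WIDTH = 64

def bit_position(bits: int, pos: int) -> bool:
    """Test bit `pos`: 1 is the rightmost bit; -1 is the leftmost of the 64 bits."""
    if pos < 0:
        pos = WIDTH + 1 + pos  # e.g. -2 becomes position 63 counted from the right
    return bool((bits >> (pos - 1)) & 1)

print(bit_position(0b1111, 1))   # True
print(bit_position(0b1110, 1))   # False
print(bit_position(0b1111, -2))  # False: bit 63 of the 64-bit set is 0
print(bit_position(0b1111, 5))   # False
```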




set
set sets the specified bit.


usage                  bit set bit.set(value, position [ , ...])


arguments              value                   the bit set on which to operate

                       position                the bit position to set

                                               one or more position arguments can be provided







                                                  when negative, the bit position is referenced from the left end
                                                  of the bit set


return                bit set


Examples
     x=bit.create("1111")
     a=bit.set(x,5)

     Returns the bit set 0000000000000000000000000000000000000000000000000000000000011111.

     x=bit.create("1111")
     a=bit.set(x,8,9)

     Returns the bit set 0000000000000000000000000000000000000000000000000000000110001111.

     x=bit.create("1111")
     a=bit.set(x,-2)

     Returns the bit set 0100000000000000000000000000000000000000000000000000000000001111.




shift
shift moves the bits in a bit set to the left or right.


usage                 bit set bit.shift(value, length)


arguments             value              the bit set on which to operate

                      length             the number of bits to shift; range -63 to +63

                                                    when positive, the bit set shifts to the
                                                    left

                                                    when negative, the bit set shifts to the
                                                    right


return                bit set


Examples
     x=bit.create("1110")
     a=bit.shift(x,1)

     Returns the bit set 0000000000000000000000000000000000000000000000000000000000011100.

     x=bit.create("1110")
     a=bit.shift(x,-1)

     Returns the bit set 0000000000000000000000000000000000000000000000000000000000000111.
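A left shift discards any bits pushed past the 64-bit width. A Python sketch of this behavior (bit_shift is an illustrative name, not an expressor function):

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def bit_shift(bits: int, length: int) -> int:
    """Positive length shifts left, negative shifts right, within 64 bits."""
    if length >= 0:
        return (bits << length) & MASK
    return bits >> -length

print(format(bit_shift(0b1110, 1), "05b"))   # 11100
print(format(bit_shift(0b1110, -1), "04b"))  # 0111
```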






string
string converts a bit set, or a portion of a bit set, into a string. Optionally, you can specify characters for the
numerics 0 and 1.


usage                 string bit.string(value [, length [ , format]])


arguments             value                  the bit set on which to operate

                      length                 the number of bits, beginning from the right end, to include in
                                             the string

                      format                 the characters to use as replacements for 0 and 1


return                string


Examples
      x=bit.create("1001")
      a=bit.string(x,3)

      Returns the string 001.

      x=bit.create("1001")
      a=bit.string(x,3, "TF")

      Returns the string FFT.

      x=bit.create("1001")
      a=bit.string(x,3, "+-")

      Returns the string --+.
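As the examples suggest, the first format character stands in for 1 and the second for 0 (e.g. "TF" renders set bits as T). A Python sketch under that reading (bit_string is an illustrative name):

```python
WIDTH = 64

def bit_string(bits: int, length: int = WIDTH, fmt: str = "10") -> str:
    """Render the rightmost `length` bits; fmt[0] replaces 1, fmt[1] replaces 0."""
    text = format(bits, "064b")[-length:]
    return text.translate(str.maketrans("10", fmt))

print(bit_string(0b1001, 3))        # 001
print(bit_string(0b1001, 3, "TF"))  # FFT
print(bit_string(0b1001, 3, "+-"))  # --+
```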




toggle
toggle reverses the specified bit.


usage                 bit set bit.toggle(value, position [ , ...])


arguments             value                   the bit set on which to operate

                      position                the bit position to toggle

                                              one or more position arguments can be provided

                                                      when positive, the bit position is referenced from the
                                                      right end of the bit set

                                                      when negative, the bit position is referenced from the







                                                   left end of the bit set


return               bit set


Examples
       x=bit.create("1111")
       a=bit.toggle(x,3)

       Returns the bit set 0000000000000000000000000000000000000000000000000000000000001011.

       x=bit.create("1111")
       a=bit.toggle(x,2,3)

       Returns the bit set 0000000000000000000000000000000000000000000000000000000000001001.

       x=bit.create("1111")
       a=bit.toggle(x,2,3,-1)

       Returns the bit set 1000000000000000000000000000000000000000000000000000000000001001.
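Toggling is an exclusive or with a 1 at each listed position. A Python sketch, reusing the convention that negative positions count from the left end of the 64-bit set (bit_toggle is an illustrative name):

```python
WIDTH = 64
MASK = (1 << WIDTH) - 1

def bit_toggle(bits: int, *positions: int) -> int:
    """Flip each listed bit: 1 is the rightmost bit, -1 the leftmost of the 64."""
    for pos in positions:
        if pos < 0:
            pos = WIDTH + 1 + pos
        bits ^= 1 << (pos - 1)
    return bits & MASK

print(format(bit_toggle(0b1111, 3), "04b"))     # 1011
print(format(bit_toggle(0b1111, 2, 3), "04b"))  # 1001
print(format(bit_toggle(0b1111, 2, 3, -1), "064b"))  # leftmost bit set, then ...1001
```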




byte functions
Byte functions include:


decode


encode


size


Return to Reference: Datascript Module Editing

decode
decode operates on a base64 encoded string field, returning a readable/printable string field.


usage                string byte.decode(value)


arguments            value            base64 encoded string field


return               string or byte[]


Examples
       byte.decode("ZXhwcmVzc29yIHByb2Nlc3Nvcg==")

       Returns the string expressor processor.






       byte.decode(nil)

       Returns nil.

      Note: This function only works when called from scripting within an operator in a running dataflow; do
               not invoke it from a command window running the datascript command.




encode
encode operates on a readable/printable string or byte[ ] field, returning a base64 encoded string field.


usage                 string byte.encode(value)


arguments             value          a string or byte[ ]


return                string


Examples
       byte.encode("expressor processor")

       Returns the string ZXhwcmVzc29yIHByb2Nlc3Nvcg==.

       byte.encode(nil)

       Returns nil.

      Note: This function only works when called from scripting within an operator in a running dataflow; do
               not invoke it from a command window running the datascript command.
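The encode/decode pair corresponds to standard base64. The same round trip can be sketched with Python's standard library:

```python
import base64

# byte.encode analogue: readable bytes -> base64 string
encoded = base64.b64encode(b"expressor processor").decode("ascii")
print(encoded)  # ZXhwcmVzc29yIHByb2Nlc3Nvcg==

# byte.decode analogue: base64 string -> original bytes
decoded = base64.b64decode(encoded)
print(decoded.decode("ascii"))  # expressor processor
print(len(decoded))             # byte.size analogue: number of bytes
```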

size
size operates on a string or byte[ ] field, returning the number of bytes.


usage                 string byte.size(value)


arguments             value          a string or byte[ ]


return                integer


Examples
       byte.size("John")

       Returns 4.

       byte.size("\001\002\003\004\005\006\007")

       Returns 7.

      Note: nil is not a valid argument to this function. Passing nil raises an exception and processing
               terminates.






datetime functions
Datetime functions include:


adjust                                 past


elapse                                 start


future                                 string


moment                                 timestamp


    Note: For datetime functions, the epoch starts on January 1, 2000. The default length of one year is
             365.25 days. If datetime values are provided with time information only, the date is assumed to
             be the epoch.

Return to Reference: Datascript Module Editing

adjust
adjust operates on a datetime field, returning a datetime field adjusted by a specified interval.


usage               datetime datetime.adjust(value, interval[, format[, exact]])


arguments            value                     starting datetime

                     interval                  the adjustment to be applied to the starting datetime

                     format                    the interpretation of interval; the default is seconds


                                                     format (case insensitive)           interpretation


                                                 none                            seconds


                                                 s                               seconds


                                                 i                               minutes


                                                 h                               hours


                                                 d                               days


                                                 y                               years








                                              c                                 centuries

                                            months (m) is not a valid format.

                        exact               If false (the default), intervals are calculated using a 365.25 day
                                            year. If true, intervals are calculated using a 365 day year.


return                  datetime


Examples
      datetime.adjust(string.datetime("02152008", "MMDDCCYY"), 1, "y")
      datetime.adjust(string.datetime("02152008", "MMDDCCYY"), 1, "y", false)

      Both return 2009-02-14 06:00:00.

      datetime.adjust(string.datetime("02152008", "MMDDCCYY"), 1, "y", true)

      Returns 2009-02-15 00:00:00.

      datetime.adjust(string.datetime("02292008", "MMDDCCYY"), 1, "y", true)

      Applying the adjust function with a year length of 365 days to leap day returns March 1, not February 28. This
      example returns 2009-03-01 00:00:00. (Notice the use of the string.datetime function to create a
      datetime type.)

      datetime.adjust(string.datetime("02292008", "MMDDCCYY"), 1, "y", false)

      Applying the adjust function with a year length of 365.25 days to leap day returns February 28. This example
      returns 2009-02-28 06:00:00.

      Note: nil is not a valid argument to this function. Passing nil raises an exception and processing
               terminates.
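The 06:00:00 results above follow from the default 365.25-day year: a quarter day is six hours. This can be checked with Python's datetime module, which models only the default (exact = false) behavior, not the calendar-aware exact = true case:

```python
from datetime import datetime, timedelta

# One "year" taken literally as 365.25 days, as datetime.adjust does by default.
year = timedelta(days=365.25)

print(datetime(2008, 2, 15) + year)  # 2009-02-14 06:00:00
print(datetime(2008, 2, 29) + year)  # 2009-02-28 06:00:00
```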




elapse
elapse operates on datetime fields, returning the difference (seconds, minutes, hours, days, years, centuries)
between the values.


usage                   number datetime.elapse(value1, value2[, format])


arguments               value1         starting datetime
                        value2         ending datetime







                       format          the time period (units) in which to express the
                                       differential


                                             format (case           interpretation
                                              insensitive)


                                         none                    seconds


                                         s                       seconds


                                         i                       minutes


                                         h                       hours


                                         d                       days


                                         y                       years


                                         c                       centuries

                                       months (m) is not a valid format.


return                 number


Examples
      datetime.elapse(string.datetime("08081992", "MMDDCCYY"),
                      string.datetime("08081993", "MMDDCCYY"), "y")

     Returns 1 year.

    Note: nil is not a valid argument to this function. Passing nil raises an exception and processing
              terminates.




future
future operates on a datetime field, returning true if the field represents a date in the future.


usage                  boolean datetime.future(value)


arguments              value           datetime to be tested


return                 boolean







Examples
      datetime.future(nil)

      Returns false.

      datetime.future(0)

      Interpreted as January 1, 2000 returns false.

      datetime.future(string.datetime("10128888", "MMDDCCYY"))

      Interpreted as October 12, 8888; returns true.

      Note: This function is equivalent to the is.future function.




moment
moment operates on a datetime field, returning a specific element (seconds, minutes, hours, day, month, year,
century).


usage                  variable datetime.moment(value[, format])


arguments              value              starting datetime

                       format             the element to return; the default is seconds


                                            format (case insensitive)           interpretation


                                            none                        seconds

                                            s                           seconds


                                            i                           minutes


                                            h                           hours


                                            d                           day


                                            j                           Julian day (leap year
                                                                        has 366 days)


                                            w                           day of week (Sunday
                                                                        is 0)








                                             m                             month


                                             y                             year


                                             c                             century


return               string, number


Examples
     datetime.moment(string.datetime("07122007", "MMDDCCYY"), "y")

     Returns 2007.

    Note: nil is not a valid argument to this function. Passing nil raises an exception and processing
              terminates.
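Several of the elements have direct equivalents in Python's datetime module, which can serve as a cross-check (note the "w" convention of Sunday = 0, whereas Python's weekday() uses Monday = 0):

```python
from datetime import datetime

d = datetime(2007, 7, 12)
print(d.year)                  # 2007, the "y" element
print(d.timetuple().tm_yday)   # 193, the Julian day of year ("j")
print((d.weekday() + 1) % 7)   # 4, day of week with Sunday = 0 ("w")
```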




past
past operates on a datetime field, returning true if the field represents a date in the past.


usage                boolean datetime.past(value)


arguments             value            datetime to be tested


return               boolean


Examples
     datetime.past(0)

     Interpreted as January 1, 2000; returns true.

     datetime.past(string.datetime("10128888", "MMDDCCYY"))

     Interpreted as October 12, 8888; returns false.

    Note: This function is equivalent to the is.past function.
              nil is not a valid argument to this function. Passing nil raises an exception and processing
              terminates.







start
start returns the datetime as computed by the current component channel at the start of an expressor dataflow
run.


usage                 datetime datetime.start()


arguments                                 none


return                datetime


Examples
       datetime.start()

        Returns the operating system time corresponding to the start time for the current channeled component's thread of
        execution. GMT is the time zone.

       Note: It is not guaranteed that all component channels compute the same start value.




string
string operates on a datetime field, returning the string representation of the datetime.


usage                 string datetime.string(value[, format])


arguments             value                      datetime to convert into a string representation

                      format                     the format of the string representation;
                                                 the default format is
                                                 CCYY-MM-DD HH24:MI:SS


                                                     Format            Interpretation

                                                     HH24              hours in 24 hour format

                                                     H*24              hours in 24 hour format; one digit
                                                                       hour formatting when appropriate

                                                     HH12              hours in 12 hour format

                                                     H*12              hours in 12 hour format; one digit
                                                                       hour formatting when appropriate

                                                     HH                hours in 24 hour format

                                                     H*                hours in 24 hour format; one digit
                                                                       hour formatting when appropriate

                                                     MI                minutes

                                                     SS                seconds

                                                     s[ssssss]         fractional seconds

                                                     AM or PM          used with HH or HH12 to indicate
                                                                       whether hour values are AM or PM;
                                                                       only valid if a full time format,
                                                                       including fractional seconds, is
                                                                       specified; this value is passed to
                                                                       the output

                                                     DD                day in numeric format

                                                     D*                day specified as either one or two
                                                                       digits; the format pattern must be
                                                                       delimited, i.e. MM-D*-CCYY or
                                                                       MM/D*/CCYY, not MMD*CCYY; valid
                                                                       format delimiters are space, hyphen,
                                                                       forward slash, comma and period

                                                     D?                invalid day specification accepted;
                                                                       converts the day to either 01 or the
                                                                       last day of the month based on the
                                                                       input value

                                                     DM                allows processing of mixed day/month,
                                                                       giving precedence to day; used in
                                                                       conjunction with the MD format

                                                     DDD               day of week abbreviated (e.g., MON)

                                                     DAY               day of week abbreviated (e.g., MON)

                                                     DDDD              day of week in long format (e.g.,
                                                                       Monday)

                                                     DDAY              day of week in long format (e.g.,
                                                                       Monday)

                                                     JJJ               Julian day of year

                                                     MM                month in numeric format

                                                     M*                month specified as either one or two
                                                                       digits; the format pattern must be
                                                                       delimited, i.e. M*-DD-CCYY or
                                                                       M*/DD/CCYY, not M*DDCCYY; valid
                                                                       format delimiters are space, hyphen,
                                                                       forward slash, comma and period

                                                     M?                invalid month specification accepted

                                                     MD                allows processing of mixed month/day,
                                                                       giving precedence to month; used in
                                                                       conjunction with the DM format

                                                     MMM               month in short format (e.g., JAN)

                                                     MMMM              month in long format (e.g., January)

                                                     YY                years

                                                     YNN               forces a century designation anchored
                                                                       to NN; in a date field, a two
                                                                       character year is interpreted as the
                                                                       current century if less than NN and
                                                                       the previous century if greater than
                                                                       NN

                                                     CC                century

return             string


Examples
    datetime.string(0)

    Returns the string 2000-01-01 00:00:00.

    datetime.string(datetime.start())

    Returns a string containing the date and time at which the start function was invoked.

     datetime.string(string.datetime("2009-07-04", "CCYY-MM-DD"), "MM/DD/YY")

    Returns a string containing the date 07/04/09. The format specification to the datetime.string function
    ("MM/DD/YY") overrides the format of the datetime created by the string.datetime function ("CCYY-MM-
    DD").

    Note: nil is not a valid argument to this function. Passing nil raises an exception and processing
            terminates.
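The last example has a close analogue in Python's strptime/strftime pair: parse with one format, render with another.

```python
from datetime import datetime

# Parse the CCYY-MM-DD string, then render it as MM/DD/YY.
d = datetime.strptime("2009-07-04", "%Y-%m-%d")
print(d.strftime("%m/%d/%y"))  # 07/04/09
```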




timestamp
timestamp returns the current datetime as computed by the operating system hosting the component channel.


usage              datetime datetime.timestamp()


arguments                           none


return             datetime


Examples
    datetime.timestamp()

    Returns the current operating system time. GMT is the time zone.

    Note: For performance reasons it might be better to use the datetime.start() function.







decimal functions
expressor decimal functions operate on decimal values or convert non-decimal values into decimal values.
Arguments to these functions (except as noted below) must be expressor decimal values. Whenever a decimal
function operates on a non-decimal numeric, a decimal result is returned.

expressor decimal types are not automatically coerced to non-numeric types and must be explicitly converted using
the functions as described in this topic. You must coerce a decimal value to a non-numeric type in order to use the
value as an argument to non-numeric functions (e.g. string or utility functions).

Arithmetic and comparative operations involving numeric values of different types will be properly executed.
expressor will transparently promote one of the values to a compatible type so the operation may proceed. For
example, in adding decimal and number(real) values, the number will be promoted to a decimal and the result will be
a decimal type. An integer will be promoted to number(real) or decimal; a value of the number(real) type will be
promoted to decimal.

Invoking a math function on a decimal value will result in a fault if the decimal value is too large to be converted into
a number(real) type.

Decimal functions include:


abs         compare       exp         floor        fma


integer     invert        log         logb         log10


max         min           number      quantize     sqrt


string      todecimal


Return to Reference: Datascript Module Editing

abs
abs operates on a decimal field, returning the absolute value.


usage                  decimal decimal.abs(value)


arguments              value           decimal field


return                 decimal


Examples
      decimal.abs(decimal.todecimal(-2))

      Returns 2.

      decimal.abs(decimal.todecimal("-22"))

      Returns 22.








compare
compare operates on two decimal fields, returning an indicator of which field is greater.


usage                 decimal decimal.compare(value1, value2)


arguments             value              decimal field


return                decimal


Examples
      decimal.compare(decimal.todecimal(2), decimal.todecimal(3))

      Returns -1; the second value is greater than the first.

      decimal.compare(decimal.todecimal(5), decimal.todecimal(3))

      Returns 1; the first value is greater than the second.

      decimal.compare(decimal.todecimal(2), decimal.todecimal(2))

      Returns 0; the values are equal.

      Returns 0; the values are equal.
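Python's decimal module follows the same -1 / 0 / 1 convention, so it can be used to cross-check these results:

```python
from decimal import Decimal

# Decimal.compare returns -1, 0 or 1 as a Decimal value.
print(Decimal(2).compare(Decimal(3)))  # -1
print(Decimal(5).compare(Decimal(3)))  # 1
print(Decimal(2).compare(Decimal(2)))  # 0
```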




exp
exp operates on a decimal field, returning e raised to the argument.


usage                 decimal decimal.exp(value)


arguments             value                   decimal field


return                decimal (34 digits)


Examples
      decimal.exp(decimal.todecimal(2))

      Returns 7.389056098....

      decimal.exp(decimal.todecimal("2"))

      Returns 7.389056098....

      decimal.exp(2)

      Returns 7.389056098...; as noted above, a non-decimal numeric argument still yields a decimal result.







floor
floor operates on a decimal field, returning the largest integer value less than or equal to the argument divided by
a scale.


usage                decimal decimal.floor(value, scale)


arguments            value             decimal field

                     scale             decimal field


return               decimal


Examples
      decimal.floor(decimal.todecimal(3.14159), decimal.todecimal(1))

      Returns 3.

      decimal.floor(decimal.todecimal(3.14159), decimal.todecimal(2))

      Returns 1.
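For positive values, this matches Python's Decimal floor division, which can serve as a cross-check:

```python
from decimal import Decimal

# decimal.floor(value, scale) behaves like integer division of value by scale
# (for positive operands, Decimal's // operator gives the same results).
print(Decimal("3.14159") // Decimal(1))  # 3
print(Decimal("3.14159") // Decimal(2))  # 1
```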




fma
fma computes the value (field1 * field2) + field3 where all three fields are decimal values.


usage                decimal decimal.fma(value1, value2, value3)


arguments            value             decimal field


return               decimal


Examples
      a = decimal.todecimal(1)
      b = decimal.todecimal(2)
      c = decimal.todecimal(3)
      decimal.fma(a, b, c)

      Returns 5.
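Python's decimal module offers the same fused multiply-add, computing (a * b) + c without intermediate rounding:

```python
from decimal import Decimal

a, b, c = Decimal(1), Decimal(2), Decimal(3)
print(a.fma(b, c))  # 5, i.e. (1 * 2) + 3
```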







integer
integer operates on a decimal field, converting it into an expressor integer type.


usage                 integer decimal.integer(value)


arguments             value           decimal field


return                integer


Examples
     is.integer(decimal.integer(decimal.todecimal(2)))

     Returns true.

     is.decimal(decimal.integer(decimal.todecimal(2)))

     Returns false.

     decimal.integer(decimal.todecimal(12.4))

     Returns 12.

     decimal.integer(decimal.todecimal(12.6))

     Returns 13.

     decimal.integer(decimal.todecimal(12.5))

     Returns 12. Rounding to even is performed.

     decimal.integer(decimal.todecimal(13.5))

     Returns 14. Rounding to even is performed.
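The round-half-to-even rule shown above can be sketched with Python's decimal module (decimal_integer is an illustrative helper, not an expressor function):

```python
from decimal import Decimal, ROUND_HALF_EVEN

def decimal_integer(value: str) -> int:
    """Convert a decimal value to an integer with round-half-to-even."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))

print(decimal_integer("12.4"))  # 12
print(decimal_integer("12.5"))  # 12 (rounds to even)
print(decimal_integer("13.5"))  # 14 (rounds to even)
```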

invert
invert operates on a decimal field composed only of the digits 0 and 1, returning a 34 digit decimal with the value
of each digit reversed.


usage                 decimal decimal.invert(value)


arguments             value                 decimal field


return                decimal (34 digits)


Examples
     decimal.invert(decimal.todecimal(101))

     Returns 1111111111111111111111111111111010.






      decimal.invert(decimal.todecimal(2))

      Throws a fault; the argument contains digits other than 0 or 1.
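The operation amounts to padding the value to 34 digits and flipping each digit. A Python sketch of that reading (decimal_invert is an illustrative name; the fault becomes an exception here):

```python
def decimal_invert(value: int) -> int:
    """Flip each digit of a 0/1-only value, padded on the left to 34 digits."""
    digits = str(value)
    if set(digits) - {"0", "1"}:
        raise ValueError("argument may contain only the digits 0 and 1")
    flipped = "".join("1" if d == "0" else "0" for d in digits.rjust(34, "0"))
    return int(flipped)

print(decimal_invert(101))  # 31 ones followed by 010
```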




log
log operates on a decimal field, returning the natural logarithm.


usage                 decimal decimal.log(value)


arguments             value             decimal field


return                decimal (34 digits)


Examples
      decimal.log(decimal.todecimal(2))

      Returns 0.69314718....

      decimal.log(decimal.todecimal("2"))

      Returns 0.69314718....




logb
logb operates on a decimal field, returning the characteristic (the whole number part) of the base 10 logarithm.


usage                 decimal decimal.logb(value)


arguments             value             decimal field


return                decimal


Examples
      decimal.logb(decimal.todecimal(123))

      Since the base 10 logarithm of 123 is 2.0899, this function returns 2.
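Python's Decimal type provides the same operation under the same name, which can be used as a cross-check:

```python
from decimal import Decimal

# Decimal.logb returns the exponent of the leading digit, i.e. the whole
# number part of the base 10 logarithm for positive values.
print(Decimal(123).logb())  # 2
```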




log10
log10 operates on a decimal field, returning the base 10 logarithm.


usage                 decimal decimal.log10(value)







arguments              value          decimal field


return                decimal (34 digits)


Examples
      decimal.log10(decimal.todecimal(2))

      Returns 0.30102999....

      decimal.log10(decimal.todecimal("2"))

      Returns 0.30102999....




max
max operates on two decimal fields, returning the larger value.


usage                 decimal decimal.max(value1, value2)


arguments             value1, value2        decimal fields


return                decimal


Examples
     decimal.max(decimal.todecimal(2), decimal.todecimal(3))

     Returns the second value, 3.




min
min operates on two decimal fields, returning the smaller value.


usage                 decimal decimal.min(value1, value2)


arguments             value1, value2        decimal fields


return                decimal


Examples
     decimal.min(decimal.todecimal(2), decimal.todecimal(3))

     Returns the first value, 2.







number
number operates on a decimal field, converting it into an expressor number type.


usage                  number decimal.number(value)


arguments              value          decimal field


return                 number


Examples
      is.number(decimal.number(decimal.todecimal(2)))

      Returns true.

      is.decimal(decimal.number(decimal.todecimal(2)))

      Returns false.

      Note: Coercing a decimal value to the number type may result in loss of precision. For example, the
              decimal value 99999999998888888887777777777 when coerced to a number will only retain 14
              significant digits, becoming the value 9.9999999998889e+028.




quantize
quantize operates on a decimal field, producing a result in the format specified by a masking argument. The value is rounded to even if the mask requires truncation.


usage                  decimal decimal.quantize(value, mask)


arguments              value          decimal field

                       mask           decimal field; a pattern showing the format of
                                      the return (any digit may be used in creating
                                      the pattern)


return                 decimal


Examples
      decimal.quantize(decimal.todecimal(3.135), decimal.todecimal(9.99))

      Returns 3.14. Rounding to even is performed.






     decimal.quantize(decimal.todecimal(3.145), decimal.todecimal(9.99))

     Returns 3.14. Rounding to even is performed.

     decimal.quantize(decimal.todecimal(3.145), decimal.todecimal(1.1))

     Returns 3.1.




sqrt
sqrt operates on a decimal field, returning the square root of the argument.


usage                 decimal decimal.sqrt(value)


arguments             value           decimal field


return                decimal


Examples
     decimal.sqrt(decimal.todecimal(4))

     Returns 2.

     decimal.sqrt(decimal.todecimal("4"))

     Returns 2.




string
string operates on a decimal field, converting it into an expressor string type.


usage                 string decimal.string(value)


arguments             value           decimal field


return                string


Examples
     is.string(decimal.string(decimal.todecimal(2)))

     Returns true.

     is.decimal(decimal.string(decimal.todecimal(2)))

     Returns false.







todecimal
todecimal operates on a string, decimal, number, or integer field, converting it into an expressor decimal type.


usage                 decimal decimal.todecimal(value)


arguments             value            string, decimal, number, integer field


return                decimal


Examples
      is.decimal(decimal.todecimal("2"))

      Returns true. The argument is a string type.

      is.decimal(decimal.todecimal(decimal.todecimal(2)))

      Returns true. The argument is a decimal type.

      is.decimal(decimal.todecimal(2))

      Returns true. The argument is a number type.

      is.decimal(decimal.todecimal(tolong(2)))

      Returns true. The argument is an integer type.




expressor Operator helper functions
Data transformation scripts are invoked transparently by the expressor Engine. Sometimes the processing logic of an operator involves multiple interrelated functions, or you might want to perform set-up work before the transformation script runs. The Engine helper functions, which are also invoked transparently, work in concert with the primary operator functions.

The helper functions are available in the following operators for the expressor Studio 3.4 version:

         Aggregate

         Filter

         Join

         Read Custom

         Transform

         Write Custom

The initialize and finalize helper functions are available in all these operators. They are not included in the script editor starting point code, so you must explicitly add them.







When these functions are executing, the functions in the Lua input/output and operating system libraries
are enabled, which permits, for example, reading from and writing to external files or Web services. While
either function is executing, the expressor runtime parameters are accessible.

When present, these functions are invoked once for each channel described in the network file used by the operator, but the functions cannot access any information (e.g., the value assigned to the path attribute) present in the network file.

The following sections describe the helper functions for each operator.

Aggregate
When writing a script for this operator, implementations are provided for mandatory and helper functions.


Global Variable

work              This variable is re-initialized to an empty table as the operator
                  begins processing each group of records with the same key value. It
                  is available when processing all the records in a group and is used
                  to retain information throughout the processing of the records.

Mandatory Functions

aggregate         The Engine invokes this function as each record is processed.
                  Include in the method body expressor Datascript that performs the
                  desired data manipulation.

                  Arguments:

                      input – the record to be processed

                      index – which record in the group is being processed

                  Data extracted and processed from the record are used to update the
                  value(s) stored in work.

                  Note: The aggregate function can also be called as the "collate"
                  function. However, the name "collate" will be deprecated in the
                  future.

result            The Engine invokes this function after the last record with a
                  specific key value has been processed. Include in the method body
                  expressor Datascript that finalizes the processing and returns the
                  output record.

                  Arguments:

                      input – the last record in the group of records with the same
                      key value

                      count – the number of records in the group

                  Data extracted and processed from work and/or input are used to
                  initialize the output record. This function returns the output
                  record.

Helper Functions

change            Providing an implementation for this function is optional, though a
                  rule for the change function is included in the Aggregate operator
                  by default. This rule cannot be deleted, but initially it is
                  disabled. It must be enabled in the Rules Editor before it can be
                  executed.

                  Arguments:

                      input

                      previous – the previous input record

                  If this function returns true, the operator begins processing
                  another group. If it returns false, processing of the current group
                  continues.

                  When providing an implementation for this function, the method
                  configuration option must be set to sorted.

prepare           The Engine invokes this function before processing records with a
                  new key value. Argument:

                      input – the first record in the group of records having the
                      same key value

                  Include in the method body expressor Datascript that initializes
                  variables needed for the operator processing. If desired, initialize
                  a variable named work with the fields needed to complete the
                  processing and initialize the output record.

                  If this function is not implemented, an instance of the work
                  variable (representing an empty table) is available to the
                  aggregate function when processing the first record with a specific
                  key value.

sieve             Providing an implementation for this function is optional. Argument:

                      output

                  If the script returns true or a non-nil value, the evaluation is
                  "true" and the record is emitted by the operator. If the script
                  returns false or a nil value, the evaluation is "false" and the
                  record is not emitted by the operator.

initialize        Invoked one time before the operator begins to process records.
                  This function has no arguments and no return value. See The
                  initialize and finalize functions for details on using initialize.

finalize          Invoked one time after the operator has processed all records
                  successfully. This function has no arguments and no return value.
                  See The initialize and finalize functions for details on using
                  finalize.
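
For orientation, the functions above can be combined into a minimal Aggregate script. This is only a sketch: the record field names (amount) and the per-group total it computes are hypothetical, not taken from the expressor samples.

      -- Hypothetical Aggregate script: sum a (made-up) amount field per key group.

      function prepare(input)
        -- first record of a new group; seed the group-scoped work table
        work.total = 0
      end

      function aggregate(input, index)
        -- called once per record in the group
        work.total = work.total + input.amount
      end

      function result(input, count)
        -- called after the last record in the group; build the output record
        output = input
        output.total = work.total
        output.records = count
        return output
      end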




Filter
When writing a script for this operator, implementations are provided for mandatory and helper functions.


Mandatory Function

filter            The Engine invokes this function as each record is processed.
                  Include in the method body expressor Datascript code that uses
                  field values from the record to return "true" or "false."

                  The argument to filter is the incoming record.

                  Unlike the other operators, the Filter operator has two outputs on
                  the right side of its icon.

                  The Filter operator uses the values extracted from the incoming
                  record's fields in its processing logic. If the processing logic
                  returns "true" or a non-nil value, the operator interprets the
                  result as "true" and emits the record through the upper output.
                  Otherwise, the record is emitted through the lower output.

                  Additional outputs can be added in the Rules Editor so that more
                  than one rule can be applied to test the incoming record. The
                  lowest output is always reserved for the output records that test
                  "false."

                  Outputs do not have to be connected. An unconnected output is
                  treated as if its rule evaluates to "false."

                  The Reject output on the bottom of the operator shape is used when
                  the execution of any rule fails. The Reject output does not have to
                  be connected. When it is connected, it affects handling of records
                  when the All option has been selected. In that situation, records
                  are not written as soon as a rule evaluates "true." Instead, all
                  rules must be executed first to determine whether or not the record
                  will be rejected.

Helper Functions

initialize        Invoked one time before the operator begins to process records.
                  This function has no arguments and no return value. See The
                  initialize and finalize functions for details on using initialize.

finalize          Invoked one time after the operator has processed all records
                  successfully. This function has no arguments and no return value.
                  See The initialize and finalize functions for details on using
                  finalize.
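
As an illustration (the status field name is invented for this sketch), a filter rule that routes active records to the upper output might look like:

      -- Hypothetical filter rule: records whose (made-up) status field equals
      -- "active" test "true" and leave through the upper output; all other
      -- records leave through the lower output.
      function filter(input)
        return input.status == "active"
      end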




Join
When writing a script for this operator, implementations are provided for mandatory and optional
functions.




Mandatory Function

join              The Engine invokes this function as each pair of matched records is
                  processed. Include in the method body expressor Datascript code
                  that uses field values from the records to initialize the output
                  record.

                  The arguments to join are the two incoming records. Data extracted
                  and processed from the records are used to initialize the output
                  record. This function returns the output record.

                  Note: The join function can also be called as the "joiner"
                  function. The Engine looks for "join" first.

Helper Functions

sieve             Providing an implementation for this function is optional. Argument:

                      output

                  If the script returns true or a non-nil value, the evaluation is
                  "true" and the record is emitted by the operator. If the script
                  returns false or a nil value, the evaluation is "false" and the
                  record is not emitted by the operator.

initialize        Invoked one time before the operator begins to process records.
                  This function has no arguments and no return value. See The
                  initialize and finalize functions for details on using initialize.

finalize          Invoked one time after the operator has processed all records
                  successfully. This function has no arguments and no return value.
                  See The initialize and finalize functions for details on using
                  finalize.
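
A minimal join implementation might be sketched as follows; the field names and the choice to start from the left-hand record are illustrative only, not part of the product samples.

      -- Hypothetical join script: merge a matched pair of records.
      function join(left, right)
        output = left                 -- start from the left-hand record
        output.region = right.region  -- pull one (made-up) field from the right
        return output
      end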







 Read Custom
 When writing a script for this operator, implementations are provided for mandatory and helper functions.



Mandatory Functions

read              The Engine invokes this function to get data to send to the
                  downstream operator in the dataflow.

                  The standard approach to using this function is for the expressor
                  Engine to call the read function repeatedly; each time the function
                  returns, it provides the processor with a record to send
                  downstream. The custom code in the read function would populate and
                  return that record, or one of the other acceptable return values
                  that control the behavior of the read function as described in the
                  Return Values table below.

generate          A Lookup Rule uses this function to generate a new record when the
                  requested record does not exist in the specified Lookup Table.

                  The generate function is written when the On miss setting on a
                  Lookup Rule is Generate Record. When the On miss setting is Output
                  Nil or Escalate Error, the generate function is not needed.

Helper Functions

initialize        The Engine invokes this function one time before the operator
                  begins to invoke the read function.

                  This function has no arguments or return values.

                  Use this function to set up a connection to your data source,
                  e.g., establish a connection to an FTP server or obtain a handle
                  to a file that will be progressively read on each invocation of
                  the read function.

                  See The initialize and finalize functions for details on using
                  initialize.

finalize          The Engine invokes this function one time after the operator has
                  completed emitting all records.

                  This function has no arguments or return values.

                  Use this function to free any resources obtained during execution
                  of the initialize function.

                  See The initialize and finalize functions for details on using
                  finalize.
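
The division of labor between initialize, read, and finalize can be sketched as below. The file name, the single text field, and the use of nil to signal end-of-data are assumptions for illustration; consult the Return Values table for the exact return-value semantics.

      -- Hypothetical Read Custom script: emit one record per line of a file.
      local source   -- handle obtained in initialize, reused by read

      function initialize()
        source = io.open("../input.txt", "r")   -- made-up path
      end

      function read()
        local line = source:read()
        if line == nil then
          return nil   -- assumed end-of-data signal; see the Return Values table
        end
        output = {}
        output.text = line
        return output
      end

      function finalize()
        source:close()
      end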




Transform
A transformation can be written as either an Expression Rule or a Function Rule. When writing Function Rules for the
Transform operator, the following mandatory and helper functions are available in the Rules Editor.


Mandatory Function

transform         The Engine invokes this function as each record is processed.
                  Include one rule with expressor Datascript code that transforms
                  the data and returns the output record.

                  The argument to the transform function is the incoming record.
                  Data extracted and processed from the record are used to
                  initialize the output record. This function returns the output
                  record.

Helper Functions

filter            The Engine invokes this function as each record is processed,
                  before invoking the transform function. Include in the method body
                  expressor Datascript code that determines whether the record
                  should be processed by the transform function.

                  The argument to filter is the incoming record:

                  function filter(input)

                  If the processing logic returns true or a non-nil value, the
                  operator interprets the result as "true" and passes the record to
                  the transform function.

                  If the calculation returns false or a nil value, the operator
                  interprets the result as "false" and does not pass the record to
                  the transform function.

generate          A Lookup Rule uses this function to generate a new record when the
                  requested record does not exist in the specified Lookup Table.

                  The generate function is written when the On miss setting on a
                  Lookup Rule is Generate Record. When the On miss setting is Output
                  Nil or Escalate Error, the generate function is not needed.

sieve             Providing an implementation for this function is optional.
                  Argument:

                      output

                  If the script returns true or a non-nil value, the evaluation is
                  "true" and the record is emitted by the operator. If the script
                  returns false or a nil value, the evaluation is "false" and the
                  record is not emitted by the operator.

initialize        Invoked one time before the operator begins to process records.
                  This function has no arguments and no return value. See The
                  initialize and finalize functions for details on using initialize.

finalize          Invoked one time after the operator has processed all records
                  successfully. This function has no arguments and no return value.
                  See The initialize and finalize functions for details on using
                  finalize.
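
Putting the filter and transform functions together, a Function Rule might be sketched as follows; the price field and the tax calculation are invented for illustration.

      -- Hypothetical Transform script: filter screens each record, and
      -- transform builds the output record for those that pass.
      function filter(input)
        return input.price ~= nil   -- skip records without a (made-up) price field
      end

      function transform(input)
        output = input
        output.price_with_tax = input.price * 1.08   -- illustrative calculation
        return output
      end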




Write Custom
When writing a script for this operator, implementations are provided for mandatory and helper functions.


Mandatory Functions

write             The Data Processing Engine invokes this function as each record
                  enters the operator from its upstream operator. The upstream
                  record is provided as an input argument to this function.

                  The method body should be customized with expressor Datascript to
                  write the record to the external target data system or otherwise
                  process the record.

                  The return values for this function are described in the Return
                  Values section below.

generate          A Lookup Rule uses this function to generate a new record when the
                  requested record does not exist in the specified Lookup Table.

                  The generate function is written when the On miss setting on a
                  Lookup Rule is Generate Record. When the On miss setting is Output
                  Nil or Escalate Error, the generate function is not needed.

Helper Functions

initialize        The Engine invokes this function one time before the operator
                  begins to invoke the write function.

                  This function has no arguments or return values.

                  Use this function to perform one-time initialization tasks, e.g.,
                  establishing a connection to an FTP server or obtaining a handle
                  to a file that will be written during each invocation of the write
                  function.

                  See The initialize and finalize functions for details on using
                  initialize.

finalize          The Engine invokes this function one time after the operator has
                  completed processing all records, that is, after the operator
                  stops receiving records from its upstream operator.

                  This function has no arguments or return values.

                  Use this function to free any resources obtained during execution
                  of the initialize function.

                  See The initialize and finalize functions for details on using
                  finalize.
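
A Write Custom script typically mirrors the Read Custom pattern: acquire a resource in initialize, use it in write, and release it in finalize. The file path and the text field below are assumptions for illustration.

      -- Hypothetical Write Custom script: append each record to a text file.
      local target   -- handle obtained in initialize, reused by write

      function initialize()
        target = io.open("../output.txt", "w")   -- made-up path
      end

      function write(input)
        target:write(input.text, "\n")
      end

      function finalize()
        target:close()
      end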







expressor runtime parameters
When a dataflow runs, several pieces of runtime information can be referenced from an expressor script. The
following table lists the available runtime parameters:


record.partition     dataflow.name     dataflow.start     dataflow.pid     dataflow.ppid


record.partition
When a dataflow uses a multi-channel network, this variable contains the channel number carrying the record.

dataflow.name
The name of the currently running dataflow.

dataflow.start
The time at which execution of the dataflow began. This is different from the value returned by the datetime.start function, which returns the time at which the function was invoked.

dataflow.pid
The process identifier associated with the running dataflow.

dataflow.ppid
The parent process identifier.
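
These parameters read like ordinary variables inside an operator script. As a sketch (assuming a Transform operator, with invented output field names), a record could be tagged with its provenance:

      -- Hypothetical transform rule: stamp each record with runtime information.
      function transform(input)
        output = input
        output.source_dataflow = dataflow.name   -- name of the running dataflow
        output.channel = record.partition        -- channel carrying this record
        return output
      end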


The initialize and finalize functions
The initialize and finalize helper functions are available in operators that support scripting: Read Custom,
Write Custom, Aggregate, Filter, and Transform.

         initialize is invoked one time before the operator begins to process records

         finalize is invoked one time after the operator has processed all records successfully

The functions do not have arguments and their method bodies do not include a return statement. Starting point code for these functions is not automatically included when the operator opens in the Rules Editor: the functions must be explicitly added to the script, as illustrated in the following code fragments.

      function initialize()
        -- add datascript
      end

      function finalize()
        -- add datascript
      end







When these functions are executing, the functions in the Lua input/output and operating system libraries are enabled, which permits, for example, reading from and writing to external files. While either function is
executing, runtime parameters are accessible.

    Note: The os.exit function executes differently depending on where it is called. If it is called at the
            command line with the datascript command, it executes as designed. If it is called from within
            Studio, it does not execute because all unsaved artifacts would be lost on the exit.

The following illustrates usage of these functions.

      t={}

      function initialize()
        io.input("../presidents-party.csv")

        while true do
          -- reads a line from file
          local line=io.read()
          -- jump out of loop on EOF
          if (line==nil) then break end
          -- Insert each party affiliation into a specific element
          -- of the table. The table index is the first field in
          -- each line; the party is the second field in each line.
          table.insert(
            t,
            string.substring(line,1,string.find(line,",")-1),
            string.substring(line,string.find(line,",")+1)
            )
        end
      end

      function transform(input)
        output=input
        -- initialize an output field with a value from the table
        output.party=t[input.position]
        return output
      end

      function finalize()
        io.output("../confirmation.txt")
        io.write("Party list:","\r\n\t")
        -- print out the contents of the table
        for i,v in pairs(t) do io.write(i,"\t",v,"\r\n\t") end
      end



A table, t, declared outside the scope of any function, has global scope and can be referenced from any
function.

         Before the operator begins processing records, the code in the initialize function reads an
          external file (presidents-party.csv) and the file contents are stored in the table t. Each line
          in the file contains two entries, the index and value of a table element. Notice the usage of the
          datascript string functions substring and find to parse each line read from the file.

         The transform function populates the output record with information extracted from the input
          record and the table.

         The finalize function, invoked as the execution of the dataflow is shutting down, creates a new
          file into which the contents of the table are written.

The io.input, io.read, io.output, and io.write function calls are valid in the bodies of the
initialize and finalize functions but cannot be invoked from the transform, filter, or sieve
functions (optional helper functions not shown in this example).







is functions
Is functions include:


blank       decimal       datetime     empty       finite


future      integer       null         number      past


pattern     string


Return to Reference: Datascript Module Editing

blank
blank operates on a string field, returning true if the field is nil, empty, or consists entirely of space characters.


usage                   boolean is.blank(value)


arguments               value           string field or field (e.g., an integer) that may
                                        be converted into a string field


return                  boolean


Examples
     is.blank("")

     Returns true.

     is.blank("                  ")

     Returns true.

     is.blank("1001")

     Returns false.

     is.blank(1001)

     Returns false. Notice the numeric value was converted into a string.

     is.blank(nil)

     Returns true.




datetime
datetime operates on any field, returning true if the value is a datetime or can be converted into a valid datetime using the supplied format (or the default format when none is given).


usage                   boolean is.datetime(value [, format])







arguments              value           field to be analyzed

                       format          datetime formatting specification


return                 boolean


Examples
      is.datetime("")

      Returns false.

      is.datetime(string.datetime("10128888", "MMDDCCYY"))

      Returns true.

      is.datetime("10182007")

      Returns false. The string cannot be converted into a valid datetime with the default format "CCYY-MM-DD
      HH:MI:SS".

      is.datetime("10182007", "MMDDYYYY")

      Returns true. The string is converted into a valid datetime with the format "MMDDYYYY".

      is.datetime(1018, "MMDDYY")

      Returns false. The number cannot be converted into a valid datetime with the specified format.

      is.datetime(nil)

      Returns false.

      Note: If value is a number and the format is nil, a runtime error is raised. A missing format evaluates
              to the default format ("CCYY-MM-DD HH:MI:SS"), not nil.
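
      Given that note, a datascript rule can guard against passing an unformatted number by testing the value's type
      first. A minimal sketch (the field name hire_date and the format are hypothetical, not from the original):

      function transform(input)
        output = input
        -- Only test string values, and always supply an explicit format
        if is.string(input.hire_date) and is.datetime(input.hire_date, "MMDDYYYY") then
          output.valid_hire_date = 1
        else
          output.valid_hire_date = 0
        end
        return output
      end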




decimal
decimal operates on any field, returning true if the field is a decimal.


usage                  boolean is.decimal(value)


arguments              value           field to be analyzed


return                 boolean


Examples
      is.decimal(decimal.todecimal(2))

      Returns true.






     is.decimal(decimal.todecimal("2"))

     Returns true.

     is.decimal(98.6)

     Returns false. A numeric value must be explicitly converted to a decimal. By default, numeric values, even
     those with decimal parts, are the expressor number type.




empty
empty operates on any field. It is used to test for nil or an empty value.

        When used with a numeric argument, true is returned if the value is nil or 0.

        When used with a string argument, true is returned if the value is nil or the empty string.

        When used with a datetime argument, true is returned for nil and is undefined for all other values. Use the
         is.null function for testing datetime values.


usage                 boolean is.empty(value)


arguments             value            field to be analyzed


return                boolean


Examples
     is.empty("")

     Returns true.

      is.empty(string.datetime("10128888", "MMDDCCYY"))

     Returns false.

     is.empty(98.6)

     Returns false.

     is.empty(nil)

     Returns true.




finite
finite operates on any field, returning true if the field is finite or can be represented by a finite number, (a number
that is a repeating or terminating decimal).

Since expressor does not include a boolean type, the return from this function cannot be used directly. It must be
used in a logical expression (e.g. as the test condition in an if statement) rather than being assigned to a variable.






usage                  boolean is.finite(value)


arguments              value           field to be analyzed


return                 boolean


Examples
      is.finite("")

      Returns false. This function returns false for all string arguments.

      is.finite(string.datetime("10128888", "MMDDCCYY"))

      Returns true.

      is.finite(98.6)

      Returns true.

      is.finite(nil)

      Returns false.

      Note: This function is equivalent to the math.finite function.




integer
integer operates on any field, returning true if the field is an integer.


usage                  boolean is.integer(value)


arguments              value           field to be analyzed


return                 boolean


Examples
      is.integer(2)

      Returns false. The default expressor numeric type is number.

      is.integer(tolong(2))

      Returns true.

      is.integer(tolong("2"))

      Returns true.







future
future operates on a datetime field, returning true if the field represents a date in the future.


usage                   boolean is.future(value)


arguments               value            datetime field to be analyzed


return                  boolean


Examples
       is.future("")

       Returns false.

       is.future(string.datetime("10128888", "MMDDCCYY"))

       Returns true.

       is.future(4554)

       Returns false. The number is interpreted as seconds after the epoch January 1, 2000.

       is.future(nil)

       Returns false.

    Note: This function is equivalent to the datetime.future function.
               If datetime values are provided with only time information, the date is assumed to be the epoch.




null
null operates on any field, returning true if the field is nil.


usage                   boolean is.null(value)


arguments               value            field to be analyzed


return                  boolean


Examples
       is.null("")

       Returns false.

       is.null(string.datetime("10128888", "MMDDCCYY"))

       Returns false.






      is.null(98.6)

      Returns false.

      is.null(nil)

      Returns true.





number
number operates on any field, returning true if the value is a numeric or datetime type (without conversion). Numeric
must be the double data type, not integer or decimal.


usage                  boolean is.number(value)


arguments              value           field to be analyzed


return                 boolean


Examples
      is.number("")

      Returns false.

      is.number("15")

      Returns false.

      is.number(15)

      Returns true.

      is.number(math.cos(90))

      Returns true.

      is.number(tointeger(90))

      Returns false.

      is.number(nil)

      Returns false.




past
past operates on a datetime field, returning true if the field represents a date in the past.







usage                   boolean is.past(value)


arguments               value                field to be analyzed


return                  boolean


Examples
       is.past("")

       Returns false.

       is.past(string.datetime("10128888", "MMDDCCYY"))

       Returns false.

       is.past(4554)

       Returns true. The number is interpreted as a datetime in seconds after the epoch January 1, 2000.

       is.past(nil)

       Returns false.

    Note: This function is equivalent to the datetime.past function.
               If datetime values are provided with only time information, the date is assumed to be the epoch.




pattern
pattern operates on a string field, returning the matched text if the field contains the pattern expression; otherwise, returns
nil.


usage                 string is.pattern(value, pattern[, begin])


arguments               value              string field to search

                        pattern            pattern to find

                        begin              optional start position at which to begin the search. Default
                                           is 1.
                                           negative values are offsets from the right end of value


return                string (the matched text) or nil


Examples
       is.pattern("hello all users", "hello")

       Returns hello.





      is.pattern("hello all users", "hello", 5)

      Returns nil because the search begins with character 5, which is beyond the matching string.

      is.pattern("Today is Tuesday, 11/13/2007", "%d+/%d+/%d+")

      Returns 11/13/2007 since the pattern %d+/%d+/%d+ matches the characters 11/13/2007.

      Note: nil is not a valid argument to this function. Passing nil raises an exception and processing
               terminates.
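
      Because is.pattern returns the matched text (or nil) and rejects nil input, a transform can test the field
      before matching. A minimal sketch (the field names log_line and found_date are hypothetical):

      function transform(input)
        output = input
        -- Guard against nil before matching, then keep any date-like token
        if not is.null(input.log_line) then
          output.found_date = is.pattern(input.log_line, "%d+/%d+/%d+")
        end
        return output
      end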




string
string operates on any field, returning true if the field is a string type (without conversion).


usage                  boolean is.string(value)


arguments              value                field to be analyzed


return                 boolean


Examples
      is.string("")

      Returns true.

      is.string(string.datetime("10128888", "MMDDCCYY"))

      Returns false.

      is.string("10182007")

      Returns true.

      is.string(10182007)

      Returns false.

      is.string(nil)

      Returns false.







logging functions
Logging functions include:


abort       information      notice   warning


    Note: The logging functions are only operational at runtime. They cannot be tested in a business rule
               created in expressor Studio or when using synthetic debugging in Studio.

Return to Reference: Datascript Module Editing

abort
abort prints a message to the expressor log and the Windows event log or Linux syslogd and then ends the
application.


usage                log.abort(value1 [, value2, ...])


arguments            value                 the message to print to the log; ... arguments are
                                           formatting directives; format as a C print statement.


return               none (the application terminates)
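

Examples
     A rule typically calls log.abort when a record violates a condition the dataflow cannot recover from. A
     minimal sketch (the field name account_id is hypothetical):

     function transform(input)
       -- Stop the application if a required field is missing
       if is.null(input.account_id) then
         log.abort("Fatal: record is missing account_id")
       end
       output = input
       return output
     end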




information
information prints a message to the expressor log only.


usage                boolean log.information(value1 [, value2, ...])


arguments            value                 the message to print to the log; ... arguments are
                                           formatting directives; format as a C print statement.


return               true


Examples
     log.information("Processing entry: %s", some_field)

     Writes content similar to the following into the expressor log.

     <information item="log">Processing entry: ...</information>

     Where ... is the content from some_field.







notice
notice prints a message to the expressor log and the Windows event log or Linux syslogd.


usage                boolean log.notice(value1 [, value2, ...])


arguments             value                 the message to print to the log; ... arguments are
                                            formatting directives; format as a C print statement.


return               true


Examples
      log.notice("Processing entry: %s", some_field)

      Writes content similar to the following into the expressor log.

      <notice item="log">Processing entry: ...</notice>

      Where ... is the content from some_field.




warning
warning prints a message to the expressor log and to the Windows event log or Linux syslogd.


usage                boolean log.warning(value1 [, value2, ...])


arguments             value                 the message to print to the log; ... arguments are
                                            formatting directives; format as a C print statement.


return               true
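

Examples
      warning follows the same calling convention as information and notice. A sketch mirroring those
      examples (some_field is a placeholder field):

      log.warning("Processing entry: %s", some_field)

      Writes the message into the expressor log and the Windows event log or Linux syslogd.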







lookup functions
The lookup functions are designed for use in a Lookup Rule, particularly a Lookup Function Rule. To use them in a
Lookup Expression Rule would require calling them from a Datascript Module because more than a single code
statement is required for practical use of the functions.

The lookup functions are used when a Lookup Rule specifies that a record is to be generated when a lookup
operation fails to find a record in a Lookup Table. Generate Record is selected from the On miss drop-down menu at
the top of a Lookup Rule box:




There is one lookup function: get_connection. This function takes as an argument the name of an expressor
Lookup Artifact and returns an expressor.lookup.connection object value if it finds the artifact and returns
nil if the artifact is not found in the Project.

In addition to the get_connection function, the Lookup library contains six objects:

expressor.lookup.connection

expressor.lookup.reader

expressor.lookup.writer

expressor.lookup.updater

expressor.lookup.deleter

expressor.lookup.rowid

    Note: These objects use the colon operator to extract data from the in-memory data stores
              created by the Write Lookup operator.

expressor.lookup.connection is the top-level object that is used to create other objects. It is created using the
lookup.get_connection function. For example:

     connection = lookup.get_connection("ProjectName_LookupTableArtifactName")

    Note: The get_connection function requires the Project name and the name of the Lookup Table artifact.
              The two names must be separated by an underscore (_) character.

The connection is then used to get the reader, range_reader, writer, updater, and deleter objects such as:

     writer = connection:get_writer()

See the example below.







expressor.lookup.connection
The connection object has the following methods:

get_reader

usage              expressor.lookup.connection:get_reader(key)


input               key                 One value corresponding to the key fields in the Lookup
                                        Table artifact. The key identifies the record(s) selected from
                                        the connected Lookup Table.


return            On success, returns an expressor.lookup.reader object that
                  performs exact key matching. On failure returns nil.


get_range_reader

usage              expressor.lookup.connection:get_range_reader(key)


input               key                 One value corresponding to the key fields in the Lookup
                                        Table artifact. The key identifies the record(s) selected from
                                        the connected Lookup Table.


return            On success, returns an expressor.lookup.reader object that
                  performs key matching within a range. On failure returns nil.


get_writer

usage              expressor.lookup.connection:get_writer()


input               none


return            On success, returns an expressor.lookup.writer object that
                  inserts data. On failure returns nil.


get_updater

usage              expressor.lookup.connection:get_updater()


input               none


return            On success, returns an expressor.lookup.updater object that
                  updates a single row. On failure returns nil.


get_deleter

usage              expressor.lookup.connection:get_deleter()







input               none


return            On success, returns an expressor.lookup.deleter object that
                  deletes a single row. On failure returns nil.


lock

usage               expressor.lookup.connection:lock()


input               none


return            On success, prevents any other operator or dataflow from
                  accessing the Lookup Table and returns true. Returns nil if
                  the Lookup Table is locked by another process.


unlock

usage               expressor.lookup.connection:unlock()


input               none


return            On success, releases a locked Lookup Table for use by other
                  processes.


    Note: Frequent locking and unlocking of a Lookup Table slows down performance of lookups to the table.
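
     A lock/unlock pair can bracket a group of writes so that no other process sees a partially updated table. A
     minimal sketch, built on the CityStation example later in this section (the inserted values are hypothetical):

     conn = lookup.get_connection("Project2_CityStation")
     if conn:lock() then
       writer = conn:get_writer()
       writer:execute({city="Philadelphia", call_sign="WCAU"})
       conn:unlock()
     else
       log.warning("Lookup Table is locked by another process")
     end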




expressor.lookup.reader
The reader object has the following methods:

execute

usage               expressor.lookup.reader:execute(table)


input               table                Table with the same structure as the Lookup Key specified
                                         in the expressor.lookup.connection:get_reader
                                         method.


return            none


expressor.lookup.reader:execute queries the Lookup Table using the appropriate match
method with the key values passed in.







next

usage               expressor.lookup.reader:next()


input               none


return            A table with the same structure as the Lookup Table and an
                  expressor.lookup.rowid object when there is a record
                  available from the previous query. Returns {nil,nil} when a
                  record is not available.

expressor.lookup.reader:next can be called multiple times for each call to
expressor.lookup.reader:execute when a non-unique key is used.
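
When the key is non-unique, the matches from one execute call can be drained in a loop. A minimal sketch,
assuming a connection conn and the CallSign key from the CityStation example later in this section:

     reader = conn:get_reader("CallSign")
     reader:execute({call_sign="WCAU"})
     out, rowid = reader:next()
     while out do
       log.notice("Matched city: %s", out.city)
       out, rowid = reader:next()
     end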




expressor.lookup.writer
The writer object has the following method:

execute

usage               expressor.lookup.writer:execute(table)


input               table                Table with the same structure as the Lookup Table
                                         specified in the
                                         expressor.lookup.connection:get_writer
                                         method.


return            On success, inserts a new record and returns the rowid of the
                  new record and an empty string. On failure, returns nil and
                  an error string.

expressor.lookup.writer:execute can fail because of a unique key constraint violation
or a Semantic Type constraint violation.




expressor.lookup.updater
The updater object has the following method:

execute

usage               expressor.lookup.updater:execute(rowid,table)


input               rowid                Identifier of the row to be updated.

                    table                Table with the same structure as the Lookup Table
                                         specified in the






                                           expressor.lookup.connection:get_updater
                                           method.


return             On success, updates the identified record and returns the
                   rowid of the updated record and an empty string. On failure,
                   returns nil and an error string.

expressor.lookup.updater:execute can fail because of a unique key constraint violation
or a Semantic Type constraint violation.
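
A typical update first reads a record to obtain its rowid, then rewrites it. A minimal sketch, assuming a
connection conn and the CityStation fields from the example later in this section (the new city value is
hypothetical):

     reader = conn:get_reader("CallSign")
     reader:execute({call_sign="WCAU"})
     row, rowid = reader:next()
     if row then
       row.city = "Philadelphia"
       updater = conn:get_updater()
       id, message = updater:execute(rowid, row)
       if not id then log.warning("Update failed: %s", message) end
     end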




expressor.lookup.deleter
The deleter object has the following method:

execute

usage                expressor.lookup.deleter:execute(rowid)


input                rowid                 Identifier of the row to be deleted.


return             none


expressor.lookup.deleter:execute deletes the identified record if it exists in the
Lookup Table.
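
Deletion follows the same read-then-act pattern. A minimal sketch, assuming a connection conn and the
CallSign key from the CityStation example below:

     reader = conn:get_reader("CallSign")
     reader:execute({call_sign="WCAU"})
     row, rowid = reader:next()
     if rowid then
       deleter = conn:get_deleter()
       deleter:execute(rowid)
     end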




expressor.lookup.rowid
The rowid object is an opaque type and has no methods.

Example
The datascript below uses an initialize function to create a connection object for the Lookup Table identified by the
Lookup Table artifact named "CityStation." That connection object is, in turn, used to create a writer object that
can be used to write a record to the Lookup Table.

The generate function then creates an output table with two fields, city and call_sign. The call_sign comes as input to
the Transform operator and is used to look up the related city in the Lookup Table. The writer:execute function
writes the output table to the Lookup Table and returns the rowid of the table record it has added. If
writer:execute is successful, the message is an empty string.

     function initialize()
        conn = lookup.get_connection("Project2_CityStation")
        writer = conn:get_writer()
      end

     function generate(input)
       output = {}





           output.city="XXXX"
           output.call_sign=input.call_sign

            rowid, message = writer:execute(output)
            if not (rowid) then error(message)
            else log.notice("RowID: " .. tostring(rowid))end

           reader = conn:get_reader("CallSign")
           if reader then log.notice("Good reader")
           else log.notice("No reader")
           end

           r={call_sign="WCAU"}
           reader:execute(r)
           out,rowid=reader:next()
           log.notice("Return: " .. out.call_sign .. out.city .. "ID: " .. tostring(rowid))

           return output;

         end;



math functions
expressor math functions include all of the math manipulation functions contained in the Lua math library as well as
additional functions. While the expressor function names might appear to be aliases for the standard Lua math
functions, the expressor versions also produce error messages specific to expressor. Refer to the functional
descriptions in this manual.

Math functions include:


abs             acos    asin           atan      atan2


ceiling         cosh    cos/cosine     deg       exp


finite          floor   log            log10     max


min             power   rad            round     sin/sine


sinh            sqrt    tan/tangent    tanh


Return to Reference: Datascript Module Editing







abs
abs operates on a numeric field, or string field converted into a numeric, returning the absolute value.


usage                numeric math.abs(value)


arguments            value                  numeric field or string field converted into numeric field


return               numeric


Examples
      math.abs("-44")

      Returns 44.

      math.abs(-44)

      Returns 44.

      math.abs(44)

      Returns 44.

      Note: Passing nil returns nil.




acos
acos operates on a numeric field, or string field converted into a numeric, returning the arc cosine value.


usage                numeric math.acos(value)


arguments            value                   numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric


Examples
      math.acos(0)

      Returns 1.570796.

      Note: Passing nil returns nil.







asin/asine
asine operates on a numeric field, or string field converted into a numeric, returning the arc sine value.


usage                numeric math.asine(value)


arguments              value                 numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric


Examples
      math.asine(0)

      Returns 0.

      math.asine(".5")

      Returns 0.5236.

      Note: Passing nil returns nil.



atan
atan operates on a numeric field, or string field converted into a numeric, returning the arc tangent value.


usage                  numeric math.atan(value)


arguments              value                 numeric field or string field converted into numeric field; units
                                             are in radians


return                 numeric


Examples
      math.atan(1)

      Returns 0.785.

      math.atan(".5")

      Returns 0.463.

      Note: Passing nil returns nil.







atan2
atan2 operates on two numeric arguments of the same type, returning the principal value of the arc tangent of
Y/X, using the signs of both arguments to determine the quadrant of the result.


usage               numeric math.atan2(Y, X)


arguments            Y                      numeric fields or string field converted into numeric field; units
                     X                      are in radians


return              numeric


Examples
     math.atan2(1.5574077, 1.0)

     Returns 1.0.

    Note: The arguments must be the same numeric type.

            The result is in the range -PI (exclusive) to PI (inclusive).

            If Y=0, X cannot be 0.

            If Y=0 and X>0, the result is zero.

            If Y=0 and X<0, the result is PI.

            If X=0, the absolute value of the result is PI/2.

            Passing nil returns nil.



ceiling
ceiling operates on a numeric field, or string field converted into a numeric, returning the smallest integer value
greater than or equal to the argument.


usage               integer math.ceiling(value)


arguments            value                  numeric field or string field converted into numeric field


return              integer


Examples
     math.ceiling("-1.55")

     Returns -1.

     math.ceiling(55.99)






      Returns 56.

      Note: Passing nil returns nil.



cosh
cosh operates on a numeric field, or string field converted into a numeric, returning the hyperbolic cosine.


usage                numeric math.cosh(value)


arguments            value                   numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric


Examples
      math.cosh(0.5)

      Returns 1.12763.

      Note: Passing nil returns nil.



cos/cosine
cos operates on a numeric field, or string field converted into a numeric, returning the cosine value. You can also use
cosine.


usage                numeric math.cos(value)


arguments            value                   numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric


Examples
      math.cos(0.5)

      Returns 0.8775.

      Note: Passing nil returns nil.







deg
deg operates on a numeric field (or string field converted into a numeric) interpreted as radians, returning the value
in degrees.


usage                numeric math.deg(value)


arguments            value                   numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric (units are degrees)


Examples
      math.deg(3.14159)

      Returns 180.

    Note: Passing nil returns nil.

exp
exp operates on a numeric field, returning e raised to the argument.


usage                numeric math.exp(value)


arguments            value             numeric field


return               numeric


Examples
      math.exp(2.302585)

      Returns 10.

      math.exp(0)

      Returns 1.

    Note: Passing nil returns nil.



finite
finite operates on a numeric field or any field that can be interpreted as a number, returning true if the field is
finite or can be represented by a finite number (a number that is a repeating or terminating decimal).

Since expressor does not include a boolean type, the return from this function cannot be used directly. It must be
used in a logical expression rather than being assigned to a variable.







usage                boolean math.finite(value)


arguments             value            field to be analyzed


return               boolean


Examples
      if (math.finite(nil)) then ... [else ...] end

      The function call returns false, so the then clause is skipped and the optional else clause is executed.

      if (math.finite(98.6)) then ... [else ...] end

      The argument is a finite number, so the call returns true. The then clause is executed.

      Note: This function is equivalent to the is.finite function.
              Passing nil returns nil.



floor
floor operates on a numeric field, or string field converted into a numeric, returning the largest integer value less
than or equal to the argument.


usage                integer math.floor(value)


arguments             value                  numeric field or string field converted into numeric field


return               integer


Examples
      math.floor(2.8)

      Returns 2.

      math.floor("-2.8")

      Returns -3.

      Note: Passing nil returns nil.







log
log operates on a numeric field, returning the natural logarithm.


usage                  numeric math.log(value)


arguments              value          numeric field


return                 numeric


Examples
      math.log(2)

      Returns 0.693.

      math.log(10000)

      Returns 9.210.

      Note: If value is zero, the return is negative infinity.
              If value is negative, the return is not a number.
              Passing nil returns nil.



log10
log10 operates on a numeric field, returning the base 10 logarithm.


usage                  numeric math.log10(value)


arguments              value          numeric field


return                 numeric


Examples
      math.log10(2)

      Returns 0.301.

      math.log10(10000)

      Returns 4.

      Note: If value is zero, the return is negative infinity.
              If value is negative, the return is not a number.
              Passing nil returns nil.







max
max operates on a list of numeric fields, or string field converted into a numeric, returning the argument with the
largest value.


usage                 numeric math.max(value, ...)


arguments              value                 numeric field or string field converted into numeric field


return                numeric


Examples
      math.max(9, "999", 99)

      Returns 999.

      math.max(-4, -3)

      Returns -3.

      Note: nil is not a valid argument to this function. Passing nil raises an exception and
                 processing terminates.



min
min operates on a list of numeric fields, or string fields converted into numerics, returning the argument with the
smallest value.


usage                 numeric math.min(value, ...)


arguments              value                 numeric field or string field converted into numeric field


return                numeric


Examples
      math.min(9, "999", 99)

      Returns 9.

      Note: nil is not a valid argument to this function. Passing nil raises an exception and
                 processing terminates.







power
power operates on two numeric fields, or string fields converted into numerics, returning the first argument raised to
the power of the second argument. You can also use pow.


usage               numeric math.power(value1, value2)


arguments            value1, value2          numeric fields or string fields converted into numeric fields


return              numeric


Examples
      math.power(2, 3)

      Returns 8.

      math.power("4", 2)

      Returns 16.

      Note: Passing nil returns nil.



rad
rad operates on a numeric field (or string field converted into a numeric) interpreted as degrees, returning the value
in radians.


usage               numeric math.rad(value)


arguments            value                   numeric field or string field converted into numeric field; units
                                             are in degrees


return              numeric (units are radians)


Examples
      math.rad(180)

      Returns 3.14159.

      Note: Passing nil returns nil.



round
round returns a number rounded to a specified number of decimal places. Rounding can be up or down.


usage               numeric math.round(value, precision[, direction])







arguments              value       numeric field to be rounded

                       precision   number of decimal places

                       direction   optional; overrides the default rounding rules
                                   (by default, discarded digit values of 5 and
                                   above result in upward rounding)

                                   positive entry: rounding is always upwards,
                                   regardless of the value of the discarded digits
                                   negative entry: rounding is always downwards


return                 numeric


Examples
      math.round(6.234, 2)

      Returns 6.23.

      math.round(-7.234, 2)

      Returns -7.23.

      math.round(8.235, 2)

      Returns 8.24.

      math.round(-9.235, 2)

      Returns -9.24.

      math.round(1.234, 2, 1)

      Returns 1.24.

      math.round(-2.234, 2, 1)

      Returns -2.23.

      math.round(3.235, 2, 1)

      Returns 3.24.

      math.round(-4.235, 2, 1)

      Returns -4.23.

      math.round(5.230, 2, 1)

      Returns 5.23.

      math.round(3.234, 2, -1)

      Returns 3.23.






     math.round(-5.234, 2, -1)

     Returns -5.24.

     math.round(7.235, 2, -1)

     Returns 7.23.

     math.round(-9.235, 2, -1)

     Returns -9.24.

    Note: Passing nil returns nil.
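The rules above can be sketched outside expressor. The following is a Python approximation (not expressor code; the name round_directed is made up for illustration) of the documented default and directional behavior, using decimal arithmetic to avoid binary floating point surprises:

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP

def round_directed(value, precision, direction=None):
    # Approximation of the documented math.round behavior; not expressor code.
    quantum = Decimal(1).scaleb(-precision)   # precision 2 -> Decimal('0.01')
    number = Decimal(str(value))
    if direction is None:
        mode = ROUND_HALF_UP      # default: discarded digits of 5+ round upward
    elif direction > 0:
        mode = ROUND_CEILING      # positive entry: always round upwards
    else:
        mode = ROUND_FLOOR        # negative entry: always round downwards
    return float(number.quantize(quantum, rounding=mode))
```

With these rules, round_directed(8.235, 2) gives 8.24 and round_directed(3.234, 2, -1) gives 3.23, matching the examples above.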



sin/sine
sin operates on a numeric field, or string field converted into a numeric, returning the sine value. You can also use
sine.


usage                numeric math.sin(value)


arguments             value                  numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric


Examples
     math.sin(0)

     Returns 0.

     math.sin(1.570796)

     Returns 1.

    Note: Passing nil returns nil.



sinh
sinh operates on a numeric field, or string field converted into a numeric, returning the hyperbolic sine.


usage                numeric math.sinh(value)


arguments             value                  numeric field or string field converted into numeric field; units
                                             are in radians


return               numeric







Examples
      math.sinh(0.5)

      Returns 0.521.

      Note: Passing nil returns nil.



sqrt
sqrt operates on a numeric field, or string field converted into a numeric, returning the square root of the argument.


usage                  numeric math.sqrt(value)


arguments              value                numeric field or string field converted into numeric field


return                 numeric


Examples
      math.sqrt(36)

      Returns 6.

      Note: If value is negative, the return is not a number.
              Passing nil returns nil.

tan/tangent
tan operates on a numeric field, or string field converted into a numeric, returning the tangent value. You can also
use tangent.


usage               numeric math.tan(value)


arguments              value                numeric field or string field converted into numeric field; units
                                            are in radians


return              numeric


Examples
      math.tan(1)

      Returns 1.557.

      Note: Passing nil returns nil.







tanh
tanh operates on a numeric field, or string field converted into a numeric, returning the hyperbolic tangent.


usage                 numeric math.tanh(value)


arguments             value                 numeric field or string field converted into numeric field; units
                                            are in radians


return                numeric


Examples
     math.tanh(1)

     Returns 0.762.

    Note: Passing nil returns nil.







string functions
expressor string functions include all of the string manipulation functions contained in the Lua string library as well
as additional functions. While some expressor function names appear to be aliases for existing string functions (e.g.
repeat as an alias for rep, or substring as an alias for sub), this interpretation is too simplistic: the expressor
string functions produce error messages that are specific to expressor. Refer to the functional descriptions in this
manual.

The string functions include:


allow         bit          byte         char            concatenate


datetime      duplicate    filter       find            format


frequency     insert       iterate      leftpad         lefttrim


length        lower        match        metaphone       replace


reverse       rightpad     righttrim    soundex         squeeze


substring     title        trim         upper           ..


Several of the string functions employ pattern matching capabilities.


allow
allow operates on a string field, returning a string containing only the "allowed" characters.


usage                  string string.allow(value, allowed)


arguments              value           byte or string field

                       allowed         subset of characters to return


return                 string


Examples
      string.allow("George W. Bush", "GWB")

      Returns the string GWB.

      string.allow("I.B.M.", "IBM")

      Returns the string IBM.

      string.allow(nil, "IBM")







      Returns nil.
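The behavior of allow can be mimicked in ordinary Python (a hypothetical helper, not expressor code) by keeping only the characters that appear in the allowed set:

```python
def allow(value, allowed):
    # Keep only the characters present in `allowed`; nil (None here) passes through.
    if value is None:
        return None
    keep = set(allowed)
    return "".join(ch for ch in str(value) if ch in keep)
```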




bit
bit operates on a string field returning a bit set. The string field can optionally use characters to represent the
numerics 0 and 1. In expressor, bit sets contain 64 bits. The right-most bit is bit position 1 and the left-most bit is bit
position 64.


usage                bit set string.bit(value[, format])


arguments             value             byte or string field

                      format            the characters representing 0 and 1


return               bit set


Examples
      string.bit("1101")

      Returns the bit set 0000000000000000000000000000000000000000000000000000000000001101.

      string.bit("TFTF", "TF")

      Returns the bit set 0000000000000000000000000000000000000000000000000000000000001010.
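A Python sketch of this mapping (not expressor code; based on the examples above, the first character of the format string is assumed to represent 1):

```python
def string_bit(value, fmt="10"):
    # Build a 64-character bit string. The first format character is taken to
    # represent 1, which matches the "TFTF", "TF" example above (assumption).
    one = fmt[0]
    bits = "".join("1" if ch == one else "0" for ch in value)
    return bits.rjust(64, "0")    # expressor bit sets are 64 bits wide
```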




byte
byte operates on a string field, returning the internal numerical codes for the characters. If both begin and end are
specified, it returns an integer[]. To capture the elements of a multi-element array, assign the return to multiple
variables or use the select function to specify a single element.


usage                integer[] string.byte(value[, begin[,end]])


arguments             value                   byte or string field

                      begin                   optional positional offset in value of the first character to
                                              return.
                                              Default is 1, negative entries offset from the right end of value

                      end                     optional positional offset of last character to return.
                                              Default is begin


return               integer[]







Examples
      string.byte("abcd")

      Returns the single element array: 97, which corresponds to the character "a."

      string.byte("abcd", 2)

      Returns the single element array: 98, which corresponds to the character "b."

      var1, var2 = string.byte("abcd", 1, 2)

      Returns the two element array: 97, 98, which are assigned to the variables var1 and var2 respectively.

      var2 = select(2, string.byte("abcd", -2,
      -1))

      Returns the two element array: 99, 100, which correspond to the characters "c" and "d" respectively, but the
      select function extracts only the second element (100, "d") into the variable var2.

      Note: Passing nil returns nil.



char
char operates on multiple byte codes, returning the corresponding character string.


usage                string string.char(integer[, ...])


arguments             integer                comma separated list of byte codes to be converted into
                                             alphanumeric characters

                                             only the values 0 through 127 can be specified; nil cannot be
                                             used as a value


return               string


Examples
      string.char(97, 98, 99, 100)

      Returns the string abcd.

      string.char(65, 66, 67)

      Returns the string ABC.

      Note: Passing nil returns nil.
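The mapping from byte codes to characters parallels chr in Python. A minimal sketch (a hypothetical helper, not expressor code):

```python
def string_char(*codes):
    # Convert a list of byte codes to a character string, mirroring string.char.
    if any(not 0 <= c <= 127 for c in codes):
        raise ValueError("only byte codes 0 through 127 are accepted")
    return "".join(chr(c) for c in codes)
```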







concatenate
concatenate operates on a list of fields, returning a concatenated string.


usage               string string.concatenate(value[, ...])


arguments            value                  fields to be concatenated.
                                            fields can be any type convertible to a string


return              string


Examples
     string.concatenate("expressor", " ",
     "software")

     Returns the string expressor software.

     string.concatenate(77, "+", 78)

     Returns the string 77+78. Notice the numeric fields are converted to string types and then concatenated.

     Note: All fields are converted to string.
             A nil argument is ignored and the remaining arguments concatenated.



datetime
datetime operates on a string field, returning an unformatted datetime as the number of seconds since January 1,
2000.


usage               datetime string.datetime(value[, format])


arguments            value                 string representation of a datetime

                     format                datetime format specification of value (not the format of the
                                           return value)


                                           the default format is CCYY-MM-DD HH24:MI:SS

                                           the format is optional only when value has the default format


                                            Format          Interpretation

                                            HH24            hours in 24 hour format

                                            H*24            hours in 24 hour format;
                                                            one digit hour formatting when appropriate

                                            HH12            hours in 12 hour format

                                            H*12            hours in 12 hour format;
                                                            one digit hour formatting when appropriate

                                            HH              hours in 24 hour format

                                            H*              hours in 24 hour format;
                                                            one digit hour formatting when appropriate

                                            MI              minutes

                                            SS              seconds

                                            s[ssssss]       fractional seconds

                                            AM or PM        used with HH or HH12 to indicate whether
                                                            hour values are AM or PM;
                                                            only valid if a full time format, including
                                                            fractional seconds, is specified;
                                                            this value is passed to the output

                                            DD              day

                                            D*              day specified as either one or two digits;
                                                            format pattern must be delimited, i.e.
                                                            MM-D*-CCYY or MM/D*/CCYY, not MMD*CCYY;
                                                            valid format delimiters are space, hyphen,
                                                            forward slash, comma and period

                                            D?              invalid day specification accepted;
                                                            converts the day to either 01 or the last
                                                            day of the month based on the input value

                                            DM              allows processing of mixed day/month,
                                                            giving precedence to day;
                                                            used in conjunction with the MD format

                                            DDD             day of week abbreviated

                                            DAY             day of week abbreviated

                                            DDDD            day of week in long format

                                            DDAY            day of week in long format

                                            JJJ             Julian day of year

                                            MM              month

                                            M*              month specified as either one or two digits;
                                                            format pattern must be delimited, i.e.
                                                            M*-DD-CCYY or M*/DD/CCYY, not M*DDCCYY;
                                                            valid format delimiters are space, hyphen,
                                                            forward slash, comma and period

                                            M?              invalid month specification accepted

                                            MD              allows processing of mixed month/day,
                                                            giving precedence to month;
                                                            used in conjunction with the DM format

                                            MMM             month in short format (e.g., JAN)

                                            MMMM            month in long format (e.g., January)

                                            YY              years

                                            YNN             forces a century designation anchored to NN;
                                                            in a date field, a two character year is
                                                            interpreted as the current century if less
                                                            than NN and the previous century if greater
                                                            than NN

                                            CC              century

return                datetime


Examples
      string.datetime("2005-01-12 16:30:30")

      Returns a datetime field containing 158862630, the number of seconds from January 1, 2000 to
      2005-01-12 16:30:30.

      string.datetime("05-012 16:30", "YY-JJJ
      HH24:MI")

      Returns a datetime field containing 158862600, the number of seconds from January 1, 2000 to 16:30 on Julian
      day 012 of 2005. Notice how the second argument describes the format of the first argument.

      string.datetime("05-01-12", "Y09-MM-DD")

      Returns a datetime field containing the date 2005-01-12, January 12, 2005. The year 05 is interpreted as the
      current century, 2005, since it is less than 09.

      string.datetime(nil)

      Returns nil.
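The "seconds since January 1, 2000" arithmetic can be reproduced in Python (a sketch, not expressor code; Python's strptime directives stand in for expressor's CCYY-MM-DD HH24:MI:SS notation):

```python
from datetime import datetime

EPOCH = datetime(2000, 1, 1)

def seconds_since_2000(value, fmt="%Y-%m-%d %H:%M:%S"):
    # Parse the string with a strptime format and count seconds from the epoch.
    return int((datetime.strptime(value, fmt) - EPOCH).total_seconds())
```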




duplicate
duplicate operates on a string field, returning the concatenation of a specified number of repeats of the string.


usage                 string string.duplicate(value, count)







arguments            value                  string field or field (e.g. an integer) that can be converted into a
                                            string field

                     count                  the number of times to concatenate the string to itself


return              string


Examples
     string.duplicate(9, 9)

     Returns the string 999999999. Notice the numeric value has been converted into a string.

     string.duplicate("*", 4)

     Returns the string ****.

    Note: Passing nil returns nil.



filter
filter operates on a string field, returning a string from which "filtered" characters have been removed.


usage               string string.filter(value, filter)


arguments            value            byte or string field

                     filter           subset of characters to remove


return              string


Examples
     string.filter("George W. Bush", "BgG.")

     Returns the string eore W ush. Notice that multiple characters can be filtered and that the characters
     to be filtered can be listed in any order.

     string.filter(nil)

     Returns nil.
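filter is the complement of allow. A Python sketch (a hypothetical helper, not expressor code) that drops every character found in the filter set:

```python
def string_filter(value, filt):
    # Remove every character of `value` that appears in `filt`; None passes through.
    if value is None:
        return None
    drop = set(filt)
    return "".join(ch for ch in value if ch not in drop)
```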







find
find operates on a string field, returning the starting and ending character positions of a specified character pattern.
If the pattern has subpatterns (referred to as captures), the characters in the subpattern are also returned.


usage                 integer[] string.find
                     (value, pattern[, begin[,off]])


arguments             value                    byte or string field

                      pattern                  character pattern to find

                      begin                    optional starting point offset in value. If negative, offset search
                                               from right end of value and perform search from left to right

                      off                      if true, turn off character class capabilities in pattern and the
                                               function does a basic find substring operation


return                integer, integer[, capture[, ...]]


Examples
      var1, var2 = string.find("Hello all
      users", "all")

      Begins its pattern search from position 1 (H). Returns the starting position 7, into var1, and the ending position
      9 into var2.

      var1, var2 = string.find("Hello all
      users", "e", -5)

      Begins its pattern search from the fifth position from the string's right end, and searches from left to right.
       Returns the starting position 13 and the ending position 13, the position of the e character in the word users
      into both var1 and var2.

      var1, var2 = string.find("Hello all
      users", "%su")

      Begins its pattern search from position 1 (H). Returns the starting position 10, into var1, and the ending position
      11, into var2, corresponding to the space before, and the u character in, the word users.

      var1, var2 = string.find("Hello all
      users", "%su", 1, true)

      The fourth argument turns off pattern matching. Returns nil into var1 and var2.

      var1, var2, var3 = string.find("Hello all
      users", ".([lo]+)")

      Begins its pattern search from position 1 (H). Returns the starting position 2 into var1, the ending position 5 into
      var2, and the capture llo into var3.





     var1, var2, var3 = string.find("Hello all
     users", ".([l]+)")

     Begins its pattern search from position 1 (H). Returns the starting position 2 into var1, the ending position 4 into
      var2, and the capture ll into var3.

      In the two preceding examples, notice the interpretations of the two patterns. The character class [lo]+ is
      interpreted as one or more "l" or "o" characters, which makes the "e" character the start of the pattern and the
      final "o" in "Hello" the end of the pattern. The character class [l]+ is interpreted as one or more "l"
      characters, which also starts the pattern at the "e" character but ends the pattern at the second "l" character.

     var1 = 0
     while true do
         var1 = string.find("Hello all users",
     "%s", var1+1)
         if var1 == nil then break end
         -- other code using var1
     end

      Begins its pattern search from position 1 (H) and iterates through the string, finding each space character and
     placing its index into var1. This example was derived from an example in the book version of the Lua reference
     manual (Ierusalimschy, R., Programming in Lua, 2nd edition, Lua.org, Rio de Janeiro, 2006).

     Note: If off is specified, begin must also be specified.
              Pattern matching guidelines are described in expressor pattern guidelines.
              Passing nil returns nil.



format
format returns a formatted version of its variable number of arguments. The format of the return value is described
in the first argument, which must be a string.


usage                string string.format(formatstring, ...)


arguments            formatstring            the formatting to be applied to the remaining arguments. The
                                             format string uses the same terminology as the C function
                                             printf

                                             the options c, d, E, e, f, g, G, i, o, u, X, and x require numbers
                                             as arguments; q and s require string arguments


                     ...                     one, or more, arguments that are returned as a string formatted
                                             as per formatstring


return               string







Examples
      string.format("%04d", 5)

      Returns the string 0005 (the numeric argument has been left padded with zeros to create a four character string).

      string.format("%s%s", "hello ",
      "expressor user")

      Returns the string hello expressor user.

      Note: The function does not accept string values containing embedded zeros except as an
              argument to a formatstring containing the q option.
              Passing nil returns nil.
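Python's % operator follows the same C printf conventions, so the examples above can be checked with a small stand-in (not expressor code):

```python
def c_format(formatstring, *args):
    # Apply C printf-style conversions, as string.format does.
    return formatstring % args
```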



frequency
frequency operates on a string field, returning the number of occurrences of a specified character.


usage                integer string.frequency(value, character)


arguments             value                   string field or field (e.g. an integer) that can be converted into a
                                              string field

                      character               character to determine frequency.
                                              Although character can be a string, only the frequency is
                                              determined for the first character


return               integer


Examples
      string.frequency("expressor", "s")

      Returns the integer 2.

      string.frequency("apple     pie", " ")

      Returns the integer 5 (the number of spaces between apple and pie).

      string.frequency(000001234, "0")

      Returns the integer 0. Notice that for the analysis, the numeric 000001234 was transparently converted into the
      string 1234.

      string.frequency("000001234", "0")

      Returns the integer 5. Notice that for the analysis, leading zeros in the string numeric are not removed.

      string.frequency(nil, "a")

      Returns the integer 0.
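Counting occurrences of a single character maps directly onto Python's str.count. A sketch of the documented behavior (not expressor code), including the rule that nil counts as 0:

```python
def frequency(value, character):
    # Count occurrences of the first character of `character`; nil counts as 0.
    if value is None:
        return 0
    return str(value).count(character[0])
```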







insert
insert operates on a string field, returning a string with another embedded string at a specified position.


usage                string string.insert(value, insert[, begin])


arguments            value                   string field

                     insert                  string to insert

                     begin                   optional position in value after which to embed insert. If
                                             unspecified, the insertion point is at the beginning of value


return               string


Examples
     string.insert("pressor", "ex")

     Returns the string expressor.

     string.insert("expreor", "ss", 5)

     Returns the string expressor.

     string.insert("hello users", "all ", 6)

     Returns the string hello all users.

     Note: Passing nil returns nil.



iterate
iterate operates on a string field using an iterator function to return all occurrences of the specified pattern in the
string.


usage                string string.iterate(value, pattern)


arguments            value             the string to examine

                     pattern           a pattern to find in value


return               string


     Note: Passing nil returns nil.
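No examples are given above. The following Python sketch (not expressor code, and using Python regular expressions rather than expressor patterns) illustrates the idea of walking a string and collecting every match:

```python
import re

def iterate(value, pattern):
    # Return every occurrence of `pattern` in `value`; None passes through.
    if value is None:
        return None
    return [m.group(0) for m in re.finditer(pattern, value)]
```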







leftpad
leftpad operates on a string field, returning the string padded at the left end to a specified length with a defined
character. You can also use lpad.


usage                  string string.leftpad(value, length[, character])


arguments              value                  string field or field (e.g. an integer) that can be converted into a
                                              string field

                       length                 final length of padded string. If the length of the starting string is
                                              greater than length, the starting string is returned unaltered

                       character              the padding character. Default is space


return                 string


Examples
      string.leftpad("expressor", 12)

      Returns the string     expressor (the initial string preceded by three space characters).

      string.leftpad("expressor", 12, "*")

      Returns the string ***expressor.

      string.leftpad("expressor", 2)

      Returns the string expressor (since the starting string is greater than the specified padded string, the starting
      string is returned).

      string.leftpad(1234, 5, "0")

      Returns the string 01234.

      string.leftpad(nil, 5, "a")

      Returns nil.
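leftpad behaves like Python's str.rjust. A sketch (not expressor code) covering the documented cases:

```python
def leftpad(value, length, character=" "):
    # Pad on the left to `length`; longer strings pass through; None stays None.
    if value is None:
        return None
    return str(value).rjust(length, character)
```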




lefttrim
lefttrim operates on a string field, returning the string trimmed of space characters from the left end up to the first
character that is not a space. You can also use ltrim.


usage                  string string.lefttrim(value)


arguments              value                 string field or field (e.g. an integer) that can be converted into a
                                             string field







return               string


Examples
     string.lefttrim("              expressor          ")

     Returns the string expressor         . Only the space characters to the left of the starting string are removed.

     string.lefttrim(string.concatenate("
     Hello", " world"))

     Returns the string Hello world. Only the space character to the left of the concatenated string is removed.
     The space character at the left end of world is internal to the concatenated string and is not removed.

     string.lefttrim(nil)

     Returns nil.




length
length operates on a string field, returning the length of the string.


usage                integer string.length(value)


arguments            value             string field


return               integer


Examples
     string.length("expressor")

     Returns 9.

     string.length(string.concatenate("Hello",
     " world"))

     Returns 11.

    Note: The empty string has length zero.
             Passing nil returns nil.



lower
lower operates on a string field, returning the string with all upper case characters converted to lower case. Lower
case characters are returned unaltered.


usage                string string.lower(value)







arguments             value             string field


return               string


Examples
      string.lower("EXPRESSOR")

      Returns expressor.

      string.lower(string.concatenate("HELLO", " World"))

      Returns hello world.

      Note: Passing nil returns nil.



match
match operates on a string field looking for the first match of a specified pattern. If a match is identified, the function
returns the characters matching the pattern.


usage                string string.match(value, pattern[, begin])


arguments             value                    string field to search

                      pattern                  pattern to find

                      begin                    optional start position at which to begin the search. Default is 1.
                                               Negative values are offsets from the right end of value


return               string


Examples
      string.match("hello all users", "hello")

      Returns the string hello.

      string.match("hello all users", "hello", 5)

      Returns nil because the search begins with character 5, which is beyond the matching string.

      string.match("Today is Tuesday, 11/13/2007", "%d+/%d+/%d+")

      string.match("Today is Tuesday, 1/11/08", "%d+/%d+/%d+")







     The first function call returns 11/13/2007 while the second function call returns 1/11/08. Notice how the
     pattern successfully selects the date content from the string without any prior knowledge of the number of
     characters used to represent the month, day, or year: the pattern %d+ matches one or more digit characters
     in each position.

    Note: Passing nil returns nil.



metaphone
metaphone operates on a string field, returning a key string using a metaphone phonetic algorithm. Similar sounding
strings return the same keys. Keys are variable length and composed of upper case alphabetic characters.


usage                  string string.metaphone(value)


arguments              value          string to be encoded


return                 string with the pattern %u+


Examples
     string.metaphone("Gammon")

     Returns KMN.

     string.metaphone("gamon")

     Returns KMN.

     string.metaphone("gamin")

     Returns KMN.

     string.metaphone("Cameron")

     Returns KMRN.

     string.metaphone("bruise")

     Returns BRS.

     string.metaphone("Bruce")

     Returns BRS.

     string.metaphone(nil)

     Returns nil.







replace
replace operates on a string field, returning a string with occurrences of a specified character sequence replaced
with another character sequence. The number of occurrences replaced is also returned.


usage                string string.replace(value, old, new[, count])


arguments            value                  string field or field (e.g. an integer) that can be converted into a
                                            string field

                     old                    the character sequence to be replaced

                     new                    the character sequence to be inserted

                     count                  the number of times to replace the character string old


return               string, integer


      Examples
       string.replace("expreSSor", "SS", "ss")

      Returns the string expressor.

      string.replace(12345, "23", "99")

      Returns the string 19945. Notice the numeric value 12345 has been converted into a string type.

      Note:

      If new is a string, its value is used for replacement. The character % works as an escape
      character:

             any sequence in new of the form %n, with n between 1 and 9, stands for the value
              of the n-th captured substring

             the sequence %0 stands for the whole match

             the sequence %% stands for a single %

      If new is a function, the function is called every time a match occurs, with all captured
      substrings passed as arguments in order. If the pattern specifies no captures, the whole
      match is passed as the sole argument.

      Passing nil returns nil.
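The capture escapes above follow the conventions of Lua's gsub, on which datascript's replace appears to be based. A hedged sketch, assuming replace handles captures the same way:

```lua
-- Hypothetical sketch: swap two captured characters using the %n escapes.
-- Assumes expressor's replace follows Lua gsub capture semantics.
result, count = string.replace("expressor", "(e)(x)", "%2%1")
-- result would be "xepressor"; count would be 1
```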







reverse
reverse operates on a string field, returning a string with the order of characters reversed.


usage               string string.reverse(value)


arguments            value                    string field or field (e.g. an integer) that can be converted into a
                                              string field


return              string


Examples
     string.reverse("expressor")

     Returns the string rosserpxe.

     string.reverse(12345)

     Returns the string 54321. Notice the numeric value 12345 has been converted into a string type.

    Note: Passing nil returns nil.



rightpad
rightpad operates on a string field, returning the string padded at the right end to a specified length with a
specified character. You can also use rpad.


usage               string string.rightpad(value, length[, character])


arguments            value                    string field or field (e.g. an integer) that can be converted into a
                                              string field

                     length                   final length of padded string. If the length of the starting string
                                              is greater than length, the starting string is returned unaltered

                     character                the padding character. Default is space


return              string


Examples
     string.rightpad("expressor", 12)

     Returns the string expressor        (the initial string followed by three space characters).

     string.rightpad("expressor", 12, "*")

     Returns the string expressor***.






      string.rightpad("expressor", 2)

      Returns the string expressor (because the starting string is longer than the specified padded length, the
      starting string is returned unaltered).

      string.rightpad(1234, 5, "0")

      Returns the string 12340.

      string.rightpad(nil, 5, "a")

      Returns nil.




righttrim
righttrim operates on a string field, returning the string trimmed of space characters from the right up to the first
character that is not a space. You can also use rtrim.


usage                  string string.righttrim(value)


arguments              value                  string field or field (e.g. an integer) that can be converted into a
                                              string field


return                 string


Examples
      string.righttrim("              expressor            ")

      Returns the string     expressor. Only the space characters to the right of the starting string are removed.

      string.righttrim(string.concatenate(" Hello ", "world "))

      Returns the string Hello world. Only the space character to the right of the concatenated string is removed.
      The space character at the right end of Hello is internal to the concatenated string and is not removed.

      string.righttrim(nil)

      Returns nil.




soundex
soundex operates on a string field, creating a Soundex code, which is used to compare strings that have the same
pronunciation but different spellings. Generally, an encoding consists of four code points: the string's first
character followed by three numeric characters representing the string's remaining consonants (see Wikipedia:
Soundex). The expressor soundex function returns a Soundex code containing a specified number of code points,
which need not be four.







usage               string string.soundex(value[, points])


arguments            value                  string to be converted into a Soundex code

                     points                 optional specification of the number of code points;
                                            not necessarily a multiple of 4


return              string with the pattern %u%d+


Examples
     string.soundex("expressor", 4)

     Returns E216. Notice that four code points are not sufficient to fully represent the string. When invoked with
     6 code points, the function returns E21626, which includes points for all of the string's consonants; with 8
     code points it returns E2162600, which is longer than necessary.

     string.soundex("parallel", 4)

     Returns P644. This Soundex code is a complete representation of the string.

     string.soundex("expressor parallel", 8)

     Returns E2162616. As with the first example, the number of code points is not sufficient to fully represent the
     string; 10 code points generates E216261644, which is of a sufficient length.

     string.soundex(nil)

     Returns nil.




squeeze
squeeze operates on a string field, returning the string with duplicate adjacent characters removed.


usage               string string.squeeze(value[, character])


arguments            value                   string field or field (e.g. an integer) that can be converted into a
                                             string field

                     character               the character to remove. Default is the space character


return              string


Examples
     string.squeeze("expressor", "s")

     Returns the string expresor.

     string.squeeze("apple               pie")






      Returns the string apple pie.

      string.squeeze(00001234, "0")

      Returns the string 01234. Notice the numeric value 00001234 has been converted into a string type.

      string.squeeze(nil)

      Returns nil.




substring
substring operates on a string field, returning a substring.


usage                 string string.substring(value[, begin[, end]])


arguments             value                byte or string field

                      begin                optional positional offset into value. Default is 1.
                                           If negative, the offset is from the right end of value

                      end                  optional positional offset of last character to return.
                                           Default is to the last character of value


return                string


Examples
      string.substring("expressor", 2, 2)

      Returns the string x.

      string.substring("expressor", 2)

      Returns the string xpressor.

      string.substring("expressor", -3)

      Returns the string sor.

      string.substring("expressor", -5, -4)

      Returns the string es.

      Note: Passing nil returns nil.







title
title operates on a string field, returning the string converted to title format.


usage                string string.title(value)


arguments            value             string field


return               string


Examples
     string.title("expressor")

     Returns the string Expressor.

     string.title("mR. jones")

     Returns the string Mr. Jones.

     string.title(string.concatenate("mR.", " ", "SMITH"))

     string.title(string.concatenate("dr.", "BILL"))

     The first invocation returns the string Mr. Smith and the second invocation returns Dr.bill. The separating
     space character is essential for proper title formatting of the last argument.

    Note: Passing nil returns nil.

trim
trim operates on a string field, returning the string with space characters removed from the left and right ends.


usage                string string.trim(value)


arguments            value             string field


return               string


Examples
     string.trim("            expressor          ")

     Returns the string expressor.

     string.trim(string.concatenate("   hello ", "world "))

     Returns the string hello world. The space character at the right end of hello is internal to the concatenated
     string and is not removed.

     string.trim(nil)






      Returns nil.




upper
upper operates on a string field, returning the string with all lower case characters converted to upper case. Upper
case characters are returned unaltered.


usage                string string.upper(value)


arguments            value            string field


return               string


Examples
      string.upper("expressor")

      Returns EXPRESSOR.

      string.upper(string.concatenate("hello", " World"))

      Returns HELLO WORLD.

      Note: Passing nil returns nil.



..
The string concatenation operator, denoted by two dots (..), operates on two operands that are strings or numbers.
When a number is used where a string is expected, the number is converted to a string in a reasonable format. For
complete control over how numbers are converted to strings, use string.format.

usage                value1 .. value2


arguments            value            string or number field


return               string


Examples
      message = "The " .. funcName .. " function failed with return code " .. retCode

         Assuming funcName holds exit and retCode holds 0014, message is set to the string
         The exit function failed with return code 0014.







table functions
Table functions, which are only relevant for numerically indexed tables, include:


concat      remove


insert      sort


maxn        #


Return to Reference: Datascript Module Editing

concat
concat returns a string that is the concatenation of all, or a range of, elements in a table.


usage                table.concat(table [, separator [, first [, last]]])


arguments             table                  a numerically indexed table where all values are strings or
                                             numbers. Table elements are extracted in continuous
                                             numerical order until the last element in the continuous
                                             order. For example, if a table has elements numbered 1, 2, 3,
                                             4, 6, elements 1-4 would be extracted and concatenated.
                                             Element number 6 would not be included. If the first element
                                             is not numbered 1, then no elements will be extracted.

                      separator              the character(s) to insert between each value extracted from
                                             the table. The default separator is the empty string.

                      first                  the index of the first table element to extract. Default is 1.
                                             If there is no element at index first, no elements are
                                             extracted.

                      last                   the index of the last table element to extract. Default is the
                                             length of the table (#table). If numerical indexing is not
                                             continuous, the last element extracted will be the last element
                                             in the continuous numerical order.


return               string
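A sketch of concat, assuming datascript follows standard Lua's table.concat behavior:

```lua
local colors = { "red", "green", "blue" }

table.concat(colors)              -- "redgreenblue" (default empty-string separator)
table.concat(colors, ", ")        -- "red, green, blue"
table.concat(colors, "-", 2, 3)   -- "green-blue" (elements 2 through 3 only)
```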







insert
insert inserts an element into a table, shifting existing elements up if necessary.


usage                table.insert(table, [position,] value)


arguments             table                  a numerically indexed table into which to insert an element.

                      position               the index at which to insert the element. Default is at the end
                                             of the table (#table + 1). If there is a break in the numerical
                                             sequence of the index, the inserted element will be placed at
                                             the end of the continuous numerical sequence. If the first
                                             element is not numbered 1, then the inserted element will be
                                             placed at the beginning of the table, at index position 1.

                      value                  the value to insert


return


      Examples
       table.insert(table, 2, value)

      Inserts value into table at index 2. If necessary, existing elements at indices 2 and higher are shifted up.

      table.insert(table, value)

      Inserts value at the end of table.




maxn
maxn returns the largest positive numerical index of a table, or zero if the table has no positive numerical indices.


usage                table.maxn(table)


arguments             table             a numerically indexed table. Table elements
                                        are evaluated in continuous numerical order
                                        until the last element in the continuous order.
                                        For example, if a table has elements numbered
                                        1, 2, 3, 4, 6, the largest positive numerical index
                                        would be 4.


return              number
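Standard Lua's table.maxn scans all positive numerical indices, even across gaps; a sketch assuming datascript follows that behavior:

```lua
local t = { "a", "b", "c", "d" }
t[6] = "f"        -- gap at index 5

table.maxn(t)     -- 6: the largest positive numerical index
```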







remove
remove removes an element from a table, shifting existing elements down if necessary.


usage                table.remove(table [, position])


arguments            table                    a numerically indexed table from which to remove an element.

                     position                 the index at which to remove the element. Default is at the end
                                              of the table (#table), or if there is a break in the numerical
                                              sequence of the index, the default location is the end of the
                                              continuous numerical sequence.


return
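In standard Lua, on which datascript is based, table.remove also returns the removed element; a sketch assuming the same behavior here:

```lua
local t = { "a", "b", "c" }

local v = table.remove(t, 1)   -- v is "a"; t is now { "b", "c" } (elements shifted down)
table.remove(t)                -- removes the last element, "c"
```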




sort
sort sorts the elements in a table.


usage                table.sort(table [, order])


arguments            table                    the table to sort

                     order                    a function that takes two arguments and returns true when the
                                              table element represented by the first argument should come
                                              before the table element represented by the second argument.


return             none, the table is sorted in place


   Examples
    table.sort(table)

      Sorts the contents of table in place.

      table.sort(table, function(a,b) return (a > b) end)

     Sorts the contents of table in-place such that element a has a lower position than element b if element a's field
     value is greater than element b's field value. The table is sorted descending.

      table.sort(table, function(a,b) return (a < b) end)

     Sorts the contents of table in-place such that element a has a lower position than element b if element a's field
     value is less than element b's field value. The table is sorted ascending.








#
The # operator returns the length of a numerically indexed table.


usage                     #table_name


arguments                  table_name             a sequentially numbered indexed table whose length is being
                                                  determined.


return                  the number of elements in the table, up to the highest continuously numbered element.
                          If the first element is not numbered 1, then the length value returned will be zero.
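A sketch of the # operator, following the behavior described above:

```lua
local t = { "a", "b", "c" }
local n = #t     -- n is 3

t[10] = "j"      -- break in the numerical sequence
-- per the description above, #t still reports the continuous run: 3
-- (standard Lua leaves the length of a sparse table unspecified)
```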




ustring functions
expressor ustring functions are used to manipulate Unicode characters. They are similar to the string functions with
the following caveats:

          When a string function takes or returns a byte index, the ustring function has a character index.

          When a string function operates on individual bytes, the ustring function operates on whole characters.

          When a string function takes a byte value, the ustring function takes a codepoint value.

       Note: Passing Unicode strings to non-ustring functions can produce unpredictable results.
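A hedged illustration of the first caveat, assuming strings are stored as UTF-8 (so é occupies two bytes):

```lua
-- Byte-oriented vs character-oriented length of the same value.
string.length("café")    -- 5: counts bytes
ustring.length("café")   -- 4: counts whole characters
```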

The ustring functions include:


allow             bit              char          codepoint


concatenate       datetime         decimal       duplicate      filter


find              format           frequency     insert         iterate


leftpad           lefttrim         length        lower          match


replace           reverse          rightpad      righttrim      squeeze


substring         title            trim          unescape       upper


Several of the ustring functions employ pattern matching capabilities.

Return to Reference: Datascript Module Editing







allow
allow operates on a Unicode string, returning a string containing only the "allowed" characters.


usage                ustring ustring.allow(value, allowed)


arguments             value             Unicode string

                      allowed           subset of codepoints to return


return               ustring
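A hypothetical sketch; the exact form of the allowed argument (here, a string listing the permitted characters) is an assumption:

```lua
-- Keep only the digit characters from the input string.
ustring.allow("A1B2C3", "0123456789")   -- would return "123"
```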




bit
bit operates on a Unicode string, returning a bit set. The string can optionally use other characters to represent the
numeric values 0 and 1. In expressor, bit sets contain 64 bits: the right-most bit is bit position 1 and the left-most bit
is bit position 64.


usage                bit set ustring.bit(value[, format])


arguments             value             Unicode string

                      format            the characters representing 0 and 1


return               bit set




char
char operates on multiple codepoints, returning the corresponding Unicode string. The ustring.char function does
not work with the datascript command in the expressor command prompt window.


usage                ustring ustring.char(integer[, ...])


arguments             integer                 comma separated list of codepoints to be converted into
                                              alphanumeric characters


return               ustring







codepoint
codepoint operates on a Unicode string, returning the internal numerical codes for the characters. If both begin
and end are specified, it returns an integer array (integer[]). To capture the elements of a multi-element return,
assign it to multiple variables or use the select function to specify a single element.


usage               integer[] ustring.codepoint(value[, begin[, end]])


arguments            value                  Unicode field

                     begin                  optional positional offset in value of the first character to
                                            return.
                                            Default is 1, negative entries offset from the right end of value

                     end                    optional positional offset of the last character to return.
                                            Default is begin


return              integer[]


      Note: Passing nil returns nil.
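A sketch of capturing the multi-element return described above (the codepoint values shown assume ASCII input):

```lua
-- Assign the return to multiple variables...
local c1, c2, c3 = ustring.codepoint("abc", 1, 3)
-- c1, c2, c3 would hold 97, 98, 99

-- ...or use select to pick a single element.
local second = select(2, ustring.codepoint("abc", 1, 3))   -- 98
```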



concatenate
concatenate operates on a list of Unicode strings, returning a concatenated string.


usage               ustring ustring.concatenate(value[, ...])


arguments            value                  Unicode strings


return              ustring


      Note: A nil argument is ignored and the remaining arguments concatenated.



datetime
datetime operates on a Unicode string, returning an unformatted datetime as the number of seconds since January
1, 2000.


usage               datetime ustring.datetime(value[, format])


arguments            value                  Unicode string representation of a datetime

                     format                 datetime format specification of value (not the format of
                                            the return value)

                                            the default format is CCYY-MM-DD HH24:MI:SS

                                            the format is optional only when value has the default
                                            format


         Format                    Interpretation

    HH24               hours in 24 hour format

    H*24               hours in 24 hour format; one digit hour formatting when
                       appropriate

    HH12               hours in 12 hour format

    H*12               hours in 12 hour format; one digit hour formatting when
                       appropriate

    HH                 hours in 24 hour format

    H*                 hours in 24 hour format; one digit hour formatting when
                       appropriate

    MI                 minutes

    SS                 seconds

    s[ssssss]          fractional seconds

    AM or PM           used with HH or HH12 to indicate whether hour values are
                       AM or PM; only valid if a full time format, including
                       fractional seconds, is specified; this value is passed
                       to the output

    DD                 day

    D*                 day specified as either one or two digits; the format
                       pattern must be delimited, i.e. MM-D*-CCYY or MM/D*/CCYY,
                       not MMD*CCYY; valid format delimiters are space, hyphen,
                       forward slash, comma and period

    D?                 invalid day specification accepted; converts the day to
                       either 01 or the last day of the month based on the
                       input value

    DM                 allows processing of mixed day/month, giving precedence
                       to day; used in conjunction with the MD format

    DDD                day of week abbreviated

    DAY                day of week abbreviated

    DDDD               day of week in long format

    DDAY               day of week in long format

    JJJ                Julian day of year

    MM                 month

    M*                 month specified as either one or two digits; the format
                       pattern must be delimited, i.e. M*-DD-CCYY or M*/DD/CCYY,
                       not M*DDCCYY; valid format delimiters are space, hyphen,
                       forward slash, comma and period

    M?                 invalid month specification accepted

    MD                 allows processing of mixed month/day, giving precedence
                       to month; used in conjunction with the DM format

    MMM                month in short format (e.g., JAN)

    MMMM               month in long format (e.g., January)

    YY                 years

    YNN                forces a century designation anchored to NN; in a date
                       field, a two character year is interpreted as the
                       current century if less than NN and the previous century
                       if greater than NN

    CC                 century

return              datetime
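Illustrative sketches built from the format codes above; these are assumptions about typical usage, not verified runtime output:

```lua
ustring.datetime("2007-11-13 14:30:00")          -- default CCYY-MM-DD HH24:MI:SS format
ustring.datetime("11/13/2007", "MM/DD/CCYY")     -- explicit format for non-default input
ustring.datetime("13-NOV-2007", "DD-MMM-CCYY")   -- short month name via MMM
```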




decimal
decimal operates on a Unicode string that is interpretable as a number, returning a decimal.


usage               decimal ustring.decimal(value)


arguments           value                  Unicode string


return              decimal
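A minimal sketch:

```lua
ustring.decimal("123.45")   -- the decimal value 123.45
ustring.decimal("-7")       -- the decimal value -7
```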







duplicate
duplicate operates on a Unicode string, returning the concatenation of a specified number of repeats of the string.


usage                ustring ustring.duplicate(value, count)


arguments            value                   Unicode string

                     count                   the number of times to concatenate the string to itself


return               ustring


      Note: Passing nil returns nil.



filter
filter operates on a Unicode string, returning a string from which "filtered" characters have been removed.


usage                ustring ustring.filter(value, filter)


arguments            value             Unicode string

                     filter            subset of codepoints to remove


return               ustring




find
find operates on a Unicode string, returning the starting and ending codepoint positions of a specified character
pattern. If the pattern has subpatterns (referred to as captures), the characters in the subpattern are also returned.


usage               integer[] ustring.find
                   (value, pattern[, begin[,off]])


arguments            value                   Unicode string

                     pattern                 codeset pattern to find

                     begin                   optional codepoint starting offset in value. If negative, the
                                             offset is counted from the right end of value and the search
                                             still proceeds from left to right




292
                                                                                                     expressor functions



                     off                     if true, turns off character class capabilities in pattern and the
                                             function performs a plain substring search


return               integer, integer[, capture[, ...]]


    Note: Be certain to use the %! prefix when specifying patterns using Unicode character
              combinations.
              If off is specified, begin must also be specified.
              Passing nil returns nil.
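      Example (hypothetical call; positions follow from the description above)
       ustring.find("catalog", "tal")

      Returns 3, 5 (the starting and ending codepoint positions of the match).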

format
format returns a formatted version of its variable number of arguments. The format of the return value is described
in the first argument, which must be a string.


usage                ustring ustring.format(formatstring, ...)


arguments            formatstring            the formatting to be applied to the remaining arguments. The
                                             format string uses the same terminology as the C function
                                             printf

                                             the options c, d, E, e, f, g, G, i, o, u, X, and x require numbers
                                             as arguments; q and s require string arguments


                     ...                     one or more values to be formatted into the returned Unicode
                                             string according to formatstring


return               ustring


    Note: The function does not accept string values containing embedded zeros except as an
              argument to a formatstring containing the q option.
              Passing nil returns nil.
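      Example (hypothetical call; formatting follows the printf conventions described above)
       ustring.format("%s scored %d", "Ann", 92)

      Returns Ann scored 92.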

frequency
frequency operates on a Unicode string, returning the number of occurrences of a specified character.


usage                integer ustring.frequency(value, character)


arguments            value                   Unicode string

                     character               codepoint or codepoint sequence whose occurrences are to be
                                             counted


return               integer
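      Example (hypothetical call; the result follows from the description above)
       ustring.frequency("banana", "a")

      Returns 3.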








insert
insert operates on a Unicode string, returning a string with another embedded string at a specified position.


usage               ustring ustring.insert(value, insert[, begin])


arguments            value                  Unicode string

                     insert                 Unicode string to insert

                     begin                  optional codepoint position in value after which to embed
                                            insert. If unspecified, the insertion point is at the beginning of
                                            value


return              ustring


      Note: Passing nil returns nil.
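      Example (hypothetical call; assumes a begin of 3 places insert after the third codepoint)
       ustring.insert("abcf", "de", 3)

      Returns abcdef.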



iterate
iterate operates on a Unicode string using an iterator function to return all occurrences of the specified pattern.


usage               ustring ustring.iterate(value, pattern)


arguments            value            Unicode string

                     pattern          a pattern to find in value


return              ustring


      Note: Be certain to use the %! prefix when specifying patterns using Unicode character
             combinations.
             Passing nil returns nil.



leftpad
leftpad operates on a Unicode string, returning the string padded at the left end to a specified length with a
defined character. You can also use lpad.


usage               ustring ustring.leftpad(value, length[, character])







arguments            value                   string field, or a field (e.g., an integer) that can be converted
                                             into a string field

                     length                  final length, in codepoints, of padded string. If the length of the
                                             starting string is greater than length, the starting string is
                                             returned unaltered

                     character               the padding character. Default is space


return              ustring
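      Example (hypothetical call; the result follows from the description above)
       ustring.leftpad("42", 5, "0")

      Returns 00042.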




lefttrim
lefttrim operates on a Unicode string, returning the string trimmed of whitespace characters from the left
end up to the first character that is not whitespace. You can also use ltrim.


usage               ustring ustring.lefttrim(value)


arguments            value                  Unicode string


return              ustring




length
length operates on a Unicode string, returning the number of codepoints.


usage               integer ustring.length(value)


arguments            value            Unicode string


return              integer


    Note: An empty value has length zero.
             Passing nil returns nil.
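      Example (hypothetical call; the result follows from the description above)
       ustring.length("hello")

      Returns 5.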







lower
lower operates on a Unicode string, returning the Unicode string with all upper case codepoints converted to lower
case. Lower case codepoints are returned unaltered.


usage                ustring ustring.lower(value)


arguments            value             Unicode string


return               ustring


      Note: Passing nil returns nil.



match
match operates on a Unicode string looking for the first match of a specified pattern. If a match is identified, the
function returns the codepoints matching the pattern.


usage                ustring ustring.match(value, pattern[, begin])


arguments            value                   Unicode string

                     pattern                 pattern to find

                     begin                   optional codepoint start position at which to begin the search.
                                             Default is 1.
                                             Negative values are offsets from the right end of value


return               ustring


      Note: Be certain to use the %! prefix when specifying patterns using Unicode character
              combinations.
              Passing nil returns nil.
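      Example (hypothetical call; the pattern uses the character class syntax described for patterns)
       ustring.match("order 1234", "%d+")

      Returns 1234.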



replace
replace operates on a Unicode string, returning a Unicode string with occurrences of a specified codepoint
sequence replaced with another codepoint sequence. The number of occurrences replaced is also returned.


usage                ustring ustring.replace(value, old, new[, count])


arguments            value                   Unicode string







                     old                      the codepoint sequence to be replaced

                     new                      the codepoint sequence to be inserted

                     count                    optional limit on the number of occurrences of old to replace


return              ustring, integer


    Note: Be certain to use the %! prefix when specifying patterns using Unicode character
             combinations.
             If new is a function, this function is called every time a match occurs, with all captures
             passed as arguments in order.
             If the pattern specifies no captures, the whole match is passed as a sole argument.
             Passing nil returns nil.
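      Example (hypothetical call; the results follow from the description above)
       ustring.replace("2011-01-01", "-", "/")

      Returns 2011/01/01 and 2.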

reverse
reverse operates on a Unicode string, returning a Unicode string with the order of codepoints reversed.


usage               ustring ustring.reverse(value)


arguments            value                    Unicode string


return              ustring


    Note: Passing nil returns nil.
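      Example (hypothetical call; the result follows from the description above)
       ustring.reverse("abc")

      Returns cba.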



rightpad
rightpad operates on a Unicode string, returning the string padded at the right end to a specified length with a
specified character. You can also use rpad.


usage               ustring ustring.rightpad(value, length[, character])


arguments            value                    Unicode string

                     length                   final length, in codepoints, of padded string. If the length of the
                                              starting string is greater than length, the starting string is
                                              returned unaltered

                     character                the padding character. Default is space


return              ustring







righttrim
righttrim operates on a Unicode string, returning the string trimmed of whitespace characters from the right up to
the first character that is not whitespace. You can also use rtrim.


usage                ustring ustring.righttrim(value)


arguments            value                   Unicode string


return               ustring




squeeze
squeeze operates on a Unicode string, returning the string with duplicate adjacent characters removed.


usage                ustring ustring.squeeze(value[, character])


arguments            value                   Unicode string

                     character               the codepoint whose adjacent duplicates are removed. Default
                                             is the space character


return               ustring
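      Example (hypothetical call; the result follows from the description above)
       ustring.squeeze("a  b  c")

      Returns a b c (runs of the default space character are collapsed to a single space).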




substring
substring operates on a Unicode string, returning a segment of the string.


usage                ustring ustring.substring(value[, begin[, end]])


arguments            value                  Unicode string

                     begin                  optional codepoint index of the first character to return.
                                            Default is 1.
                                            Negative values are offsets from the right end of value

                     end                    optional codepoint index of the last character to return.
                                            Default is the last character of value


return               ustring


      Note: Passing nil returns nil.
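      Example (hypothetical call; the result follows from the description above)
       ustring.substring("catalog", 2, 4)

      Returns ata.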






title
title operates on a Unicode string, returning the string converted to title case, in which the first letter of each word is capitalized.


usage               ustring ustring.title(value)


arguments            value            Unicode string


return              ustring


    Note: Passing nil returns nil.

trim
trim operates on a Unicode string, returning the string with whitespace characters removed from the left and right
ends.


usage               ustring ustring.trim(value)


arguments            value            Unicode string


return              ustring
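      Example (hypothetical call; the result follows from the description above)
       ustring.trim("  data  ")

      Returns data.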




unescape
unescape parses a Unicode string, converting \x and \u escape sequences into the characters they represent.


usage               ustring ustring.unescape(value)


arguments            value            Unicode string


return              ustring




upper
upper operates on a Unicode string, returning the Unicode string with all lower case codepoints converted to upper
case. Upper case codepoints are returned unaltered.


usage               ustring ustring.upper(value)


arguments            value            Unicode string


return              ustring


    Note: Passing nil returns nil.
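      Example (hypothetical call; the result follows from the description above)
       ustring.upper("abc")

      Returns ABC.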





utility functions
Utility functions include:


crc                decrypt              encrypt            first


sequence           surrogate            unpack


store_decimal      retrieve_decimal     store_number       retrieve_number


store_integer      retrieve_integer     store_datetime     retrieve_datetime


store_string       retrieve_string      store_binary       retrieve_binary


store_boolean      retrieve_boolean     store_double       retrieve_double


Return to Reference: Datascript Module Editing

crc
crc operates over a list of string arguments, returning a cyclic redundancy check (CRC) error-detecting code typically
used to detect errors in data transmission or storage.


usage                 numeric utility.crc(value, ...)


arguments              value               string field from which to generate the CRC. Maximum number of
                                           fields is 1. An error is returned if a non-string datatype is passed in.


return                numeric


      Examples
       utility.crc(...)

      Returns a 10 character numeric.

      utility.crc(nil)

      Returns 0.




decrypt
decrypt returns a single field decrypted with a key known only to expressor Engine. The argument must have been
encrypted with the utility.encrypt function.


usage                 variable utility.decrypt(value)







arguments           value            an encrypted byte[32] data field


return             variable


   Examples
    utility.decrypt(...)

    Returns the original non-encrypted field.

    Note: The return type is the same as the original type of the field prior to encryption.
             Decrypted strings are no longer than 30 characters.
             nil is not a valid argument to this function. Passing nil raises an exception and
             processing terminates.



encrypt
encrypt returns a single field encrypted with a key known only to the Engine. The argument must subsequently be
decrypted with the utility.decrypt function. This function always returns a 32 byte value.


usage              byte[32] utility.encrypt(value)


arguments           value                  a non-encrypted data field

                                           maximum length is 32 bytes for all types except strings, which
                                           are limited to 30 bytes


return             byte[32]


   Examples
    utility.encrypt(...)

    Returns a byte[32] containing an encrypted representation of the original data field value.

    If you need to include an encrypted field in an image definition, the corresponding image field must be of type
    byte[], varbyte or varbyte2, as shown in the following example (assignments to the name and external
    attributes are illustrations: the extent attribute must be fixed).

    <varbyte2 name="encryptedField" external="EF"
    extent="fixed" value="32"/>

    Note: Since this function always returns byte[32] containing the encrypted value, strings to
             be encrypted can be no longer than 30 characters (2 bytes in the encrypted value are
             required to manage the string type).
             nil is not a valid argument to this function. Passing nil raises an exception and
             processing terminates.







first
first operates on a list of arguments, returning the first argument that is not nil.


usage                  variable utility.first(value, ...)


arguments              value                  list of fields to examine.
                                              Maximum number of fields is 20


return                 variable


      Examples
       utility.first(nil, 77)

      Returns 77.

      utility.first("apple", "pear")

      Returns apple.




sequence
sequence returns an auto-incremented 64-bit integer value.


usage                  integer utility.sequence([start[, increment]])


arguments              start              The value returned by the first call to the function. Optional.

                       increment          The amount by which each subsequent return value is incremented.
                                          Optional; requires that start also be specified.


return                 integer


      Examples
       utility.sequence()

      Returns 1.

      Rerunning the function returns 2, 3, ....

      utility.sequence(3,6)

      Returns 3.

      Rerunning the function returns 9, 15, ....

      Note: Sequence values are unique to each partition.






surrogate
surrogate returns a unique surrogate key that can be used across multiple network channels without collisions.


usage               string utility.surrogate([value])


arguments           value             optional value between 1 and 7.
                                      Default is 0


return              string


The return string has a maximum length of 19 characters.

   Examples
    utility.surrogate()

    Returns 5879631589425635158.

    utility.surrogate(1)

    Returns 5983697425891123456.

    utility.surrogate(7)

    Returns 6898785236914732135.

    Note: Using surrogate keys allows seven applications to operate concurrently against the same table.




unpack
unpack returns each numerically indexed element of a table.

    Note: The unpack function works with the datascript command but not embedded in a transformation.
             To get the functionality of table.unpack in a transformation, use the table.concat function
             with a Tab character separator (table.concat(table_name, "\t")).


usage               unpack(table)


arguments           table            the table to unpack


return            each value from the table, up to the highest continuously numbered
                  element. If the first element is not numbered 1, then no elements
                  will be unpacked.


Examples
     t = {6, 7, 8, 9}
     print(unpack(t))
     -- prints 6 7 8 9








      tt = {six=6, seven=7, 8, 9}
      print(unpack(tt))
      -- prints 8 9



store_decimal
store_decimal saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage                store_decimal(name, value)


arguments            name              variable in which to store a decimal value




                     value             a decimal value to be stored for persistence
                                       between separate runs of a dataflow and
                                       visible between multiple dataflows.




      Example
This example stores a decimal value in the variable PI and uses the retrieve_decimal function to print out the stored
value.

      > utility.store_decimal("PI",todecimal("3.1415926535"))
      > print(utility.retrieve_decimal("PI"))
      3.1415926535



retrieve_decimal
retrieve_decimal returns the Persistent Value stored in a variable.


usage                retrieve_decimal(name)


arguments            name              variable in which the decimal value was stored


return             a decimal value stored for persistence between separate runs of a
                   dataflow and visible between multiple dataflows.







store_number
store_number saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage               store_number(name, value)


arguments            name             variable in which to store a numeric value




                     value            a numeric value to be stored for persistence
                                      between separate runs of a dataflow and
                                      visible between multiple dataflows.


retrieve_number
retrieve_number returns the Persistent Value stored in a variable.


usage               retrieve_number(name)


arguments            name             variable in which the numeric value was stored


return             a numeric value stored for persistence between separate runs of a
                   dataflow and visible between multiple dataflows.




store_integer
store_integer saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage               store_integer(name, value)


arguments            name             variable in which to store an integer value




                     value            an integer value to be stored for persistence
                                      between separate runs of a dataflow and
                                      visible between multiple dataflows.








      Example
This example stores an integer value in the variable INT and uses the retrieve_integer function to print out the stored
value.

      > utility.store_integer("INT",tointeger("12345678"))
      > print(utility.retrieve_integer("INT"))
      12345678

If the string contains a decimal point, the tointeger function rounds the number.

      > utility.store_integer("INT2",tointeger("1234.5678"))
      > print (utility.retrieve_integer("INT2"))
      1235



retrieve_integer
retrieve_integer returns the Persistent Value stored in a variable.


usage                retrieve_integer(name)


arguments            name              variable in which the integer value was stored


return              an integer value stored for persistence between separate runs of a
                    dataflow and visible between multiple dataflows.




store_datetime
store_datetime saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage                store_datetime(name, value)


arguments            name              variable in which to store a datetime value




                     value             a datetime value to be stored for persistence
                                       between separate runs of a dataflow and
                                       visible between multiple dataflows.








retrieve_datetime
retrieve_datetime returns the Persistent Value stored in a variable.


usage                retrieve_datetime(name)


arguments            name              variable in which the datetime value was stored


return             a datetime value stored for persistence between separate runs of a
                   dataflow and visible between multiple dataflows.




store_string
store_string saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage                store_string(name, value)


arguments            name              variable in which to store a string value




                     value             a string value to be stored for persistence
                                       between separate runs of a dataflow and
                                       visible between multiple dataflows.




   Example
This example stores a string value in the variable column2 and uses the retrieve_string function to print out the stored
value.

     > utility.store_string("column2","consolidated first quarter data")
     > print(utility.retrieve_string("column2"))
     consolidated first quarter data







retrieve_string
retrieve_string returns the Persistent Value stored in a variable.


usage               retrieve_string(name)


arguments            name             variable in which the string value was stored


return             a string value stored for persistence between separate runs of a
                   dataflow and visible between multiple dataflows.




store_binary
store_binary saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage               store_binary(name, value)


arguments            name             variable in which to store a binary value




                     value            a binary value to be stored for persistence
                                      between separate runs of a dataflow and
                                      visible between multiple dataflows.




retrieve_binary
retrieve_binary returns the Persistent Value stored in a variable.


usage               retrieve_binary(name)


arguments            name             variable in which the binary value was stored


return             a binary value stored for persistence between separate runs of a
                   dataflow and visible between multiple dataflows.







store_boolean
store_boolean saves a name/value pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.


usage                store_boolean(name, value)


arguments            name              variable in which to store a boolean value




                     value             a boolean value to be stored for persistence
                                       between separate runs of a dataflow and
                                       visible between multiple dataflows.




   Example
This example stores a boolean value in the variable TorF and uses the retrieve_boolean function to print out the stored
value.

     > utility.store_boolean("TorF",is.blank(" "))
     > print(utility.retrieve_boolean("TorF"))
     true



retrieve_boolean
retrieve_boolean returns the Persistent Value stored in a variable.


usage                retrieve_boolean(name)


arguments            name              variable in which the boolean value was stored


return             a boolean value stored for persistence between separate runs of a
                   dataflow and visible between multiple dataflows.




store_double
store_double saves a name/value (double precision) pair as a Persistent Value. It does not return a value.

Persistent Values are stored in a system-global location equivalent to the APPDATA directory. The name of that
directory can vary, but a commonly used directory name is CommonAppData.







usage              store_double(name, value)


arguments           name             variable in which to store a double precision
                                     value




                    value            a double precision value to be stored for
                                     persistence between separate runs of a
                                     dataflow and visible between multiple
                                     dataflows.




retrieve_double
retrieve_double returns the double precision Persistent Value stored in a variable.


usage              retrieve_double(name)


arguments           name             variable in which the double precision
                                     value was stored


return            a double precision value stored for persistence between separate
                  runs of a dataflow and visible between multiple dataflows.





Connections
A Connection specifies the location of resources processed by Input and Output operators in a Dataflow.

There are two types of Connections:

        File

        Database

For Input operators such as Read File and Read Table, the Connection specifies the location of the file or database
containing the record to be read into the Dataflow. For Output operators, the Connection specifies the location where
the processed records are to be written.

    Note: The directory path to files and tables might be different in the development environment than
                what it will be when the dataflow is deployed. Connection paths must be set for the deployment
                environment before the dataflows that use the Connections are added to Deployment Packages.

File-based resources contain data in non-database form. In the expressor Studio 3.4 version, only connections to
flat files are supported. The encodings supported are:

        ASCII

        UTF

        ISO

        UCS

        others

Database Connections can be made through an ODBC DSN or through a provider-specific connection, such as
Oracle or IBM. A DSN (data source name) is a data structure containing the information an ODBC driver needs to
connect to a data source. Similarly, a provider-specific connection specifies the data structure the provider's
database needs to connect. expressor supports provider-specific connections for the databases in the following list,
and Studio allows you to create additional connections for databases that are not in the list.

        IBM DB2

        Microsoft SQL Server

        MySQL Community Edition

        MySQL Enterprise Edition

        Oracle Database

        PostgreSQL

        Sybase ASE







           Netezza

           Teradata

      Note: DSN specifications are specific to the machine they are configured on. If the Connection files or
                Dataflows using them are moved to another machine, the DSN must be configured again on that
                machine.

For the Operators between Input and Output, such as Copy and Transform operators, a Connection does not apply.
However, operators such as Sort require a workspace in which to write data temporarily while they process.
Connections you have defined for a project are listed in the drop-down menu for the Working connection Operator
Property and so can be used to specify a location for the operator's temporary files. The location for that processing
must be a file system. It cannot be a database.

Connections can be stored in an expressor Repository. There they are placed under version control and can be
shared with other users. To be stored in a Repository, the Connection must be in a Repository Workspace.


Create a Connection for a file-based resource
      1.    Click the New Connection button on the Home tab of the Studio ribbon bar.

      2.    Select File Connection.

           A New File Connection dialog box opens.




      3.    Enter the directory path to the location of the file or other source for the data.

      Note: The directory path to files in the development environment might differ from the path in the
                deployment environment. Connection paths must be set for the deployment environment
                before the dataflows that use the Connections are added to Deployment Packages.

      4.    Select the Project in which to place the new file Connection.

      5.    Name the Connection with a name unique within the workspace.

      6.    Provide a description of the purpose of the Connection (optional).







Create a Connection for a Database

        Create a Connection for a Specific Database Provider


        Create a Connection for a Provider Not Supplied by expressor


        Create a Connection for an Existing DSN


        Drivers Supplied with the expressor Software




Create a Connection for a Supplied Database Provider
   1.    Click the New Connection button on the Home tab of the Studio ribbon bar.

   2.    Select Database Connection.

        The New Database Connection dialog box opens.




   3.    Select Create a connection for a specific database provider (e.g., Microsoft SQL Server).

             Note: The list of supplied databases includes MySQL Community Edition and Netezza. The drivers
                      for MySQL Community Edition and Netezza are not supplied with expressor Studio
                      Version 3.4, but if the latest driver has been installed (MySQL ODBC 5.1 Driver or
                      NetezzaSQL), selecting it from the list of supplied drivers will enable you to configure
                      the connection normally.







4.    Fill in the properties listed for the specific provider.

      The properties listed in the Connection and Credentials sections are required.




If additional connection properties are required, they can be named and set in the Advanced section. See
Drivers Supplied with the expressor Software.

      Note: Access through the Connection is tested before you can finish creating it.

      5.    In the final dialog box, select the Project in which to place the new Connection.

      6.    Name the Connection with a name unique within the workspace.

      7.    Provide a description of the purpose of the Connection (optional).




Create a Connection for Providers Not Supplied by expressor
      1.    Click the New Connection button on the Home tab of the Studio ribbon bar.

      2.    Select Create a Connection for an Additional Database Provider and click Next.
            The next screen in the New Database Connection dialog box provides text-entry fields for naming
            connection properties and specifying values for each property.




            You need to know the minimum set of connection properties that must be specified to enable
            connectivity.



      3.    Specify the names of the properties that the database requires to establish a connection.

      4.    Specify the value for each of the properties.
            For example, if the connection properties are host name, port number, and service name, you would fill in
            the properties like the following:








          Also note that when creating a connection for this type of provider, you must specify the driver name.
          Drivers must be 32-bit.

    5.    Enter the username and password credentials for the database.

    6.    In the final dialog box, select the Project in which to place the new Connection.

    7.    Name the Connection with a name unique within the workspace.

    8.    Provide a description of the purpose of the Connection (optional).




Create a Connection for an Existing DSN
This type of connection is useful for connecting to Microsoft Excel spreadsheets, Microsoft Access databases, or some
other data resource for which there is an ODBC driver.

    1.    Click the New Connection button on the Home tab of the Studio ribbon bar.

    2.    Select Database Connection.

         The New Database Connection dialog box opens.




    3.    Select Create a connection for an existing DSN (Data Source Name).

    Note: Microsoft Windows 7 32-bit systems have four drivers for connecting to Excel spreadsheets. Only
               the driver with version 12 allows Studio to connect to Excel spreadsheets.

    4.    Choose an expressor-3 driver that handles the type of database your resources are stored in.

         If there is no expressor-3 driver listed in the New Database Connection dialog box, your system administrator
         must add one.






      Note: When making a DSN connection to an Oracle database, N-CHAR support must be enabled.
               Otherwise, NCHAR and NVARCHAR columns are reported as UNKNOWN.

      5.   Specify the Username and Password credentials required to access the database.

      6.   In the final dialog box, select the Project in which to place the new Connection.

      7.   Name the Connection with a name unique within the workspace.

      8.   Provide a description of the purpose of the Connection (optional).




Drivers Supplied with the expressor Software
To see a list of drivers available on your system, both those supplied by expressor and those supplied separately,
open the ODBC Data Source Administrator.

      1.   Select the Start > All Programs > expressor > expressor3 > system tools > Data Sources (ODBC) menu item.

      2.   Click the Add command button.

      3.   In the Create New Data Source window, scroll down the listing until you see the entries beginning with
           expressor-3.

      4.   Select an expressor-3 driver from the list and press Finish.

      5.   Select the Advanced tab in the ODBC Driver Setup dialog box to see advanced properties that can be
           configured for the driver.

A complete listing of the connection properties for each of the drivers supplied with expressor 3.2 is available at:
http://media.datadirect.com/download/docs/odbc/allodbc/wwhelp/wwhimpl/js/html/wwhelp.htm


Change the Directory Path in a File Connection
      1.   Select the File Connection from the Explorer panel and double-click or use the right-click menu to open it.
           The Connection displays in the center panel of Studio.

      2.   Change the path to the directory containing the file to be read from or written to by an expressor Input or
           Output Operator.

      3.   Save the change with the Save icon on the Quick Access Toolbar.


Change a DSN Connection
      1.   Select the Database Connection from the Explorer panel and double-click or use the right-click menu to
           open it.







       The Connection displays in the center panel of Studio.




  2.   Change the DSN connection by selecting a DSN name from the drop-down list of existing DSN connections.

   Note: When making a DSN connection to an Oracle database, N-CHAR support must be enabled.
              Otherwise, NCHAR and NVARCHAR columns are reported as UNKNOWN.

  Note: Microsoft Windows 7 32-bit systems have four drivers for connecting to Excel spreadsheets. Only
             the driver with version 12 allows Studio to connect to Excel spreadsheets.

  3.   Enter access credentials for the new connection.

  4.   Test the Connection by using the Test Connection button on the Connection Edit tab in the ribbon bar.
       Changes can be saved before testing the Connection. Testing is recommended before you use the
       Connection in a Dataflow.

  5.   Save the changes with the Save icon on the Quick Access Toolbar.


Change a Provider Connection
  1.   Select the Provider Connection from the Explorer panel and double-click or use the right-click menu to open
       it.
       The Connection displays in the center panel of Studio with the Connection and Credentials properties for
       the specific provider's driver.

  2.   Change Connection properties as appropriate.

  3.   Enter access credentials for the new connection.

   4.   Test the Connection by using the Test Connection button on the Connection Edit tab in the ribbon bar.

  5.   Save the changes with the Save icon on the Quick Access Toolbar.





Schema

What Are Schemas?


Changing and Mapping Schema


Schemas for Rejected Records




What Are Schemas?
Schemas are artifacts that define the structure (metadata) of the data read or written by Input and Output
Operators. For example, if you were to read employee records from a Human Resources database, the table would
likely contain fields such as EmployeeNumber, FirstName, LastName, HireDate, DepartmentID, and Salary. Those
names identify the fields that define the schematic structure of each employee record.

The Schema also indicates the data type for each field. For example, EmployeeNumber is likely to be "integer," the
Name fields are likely to be "varchar," and HireDate would likely be "datetime."

Specifying the metadata for the Schema is critical to reading in data and mapping it to Semantic Types for processing.
It is also critical for defining how the processed data will be represented in the data store it is sent to when
processing finishes.

Most external data systems provide some form of metadata that describes the schema of the records in that system.
Studio includes wizards that obtain this metadata to automatically construct Schema for expressor applications. For
example, when you create a Delimited Schema or Database Schema, Studio reads the metadata for the specified input
source.

Delimited Schema define the metadata for data typically stored in a file where the data fields and records are
separated with well-defined delimiters. For example, fields within a record can be delimited by commas, and the
records can be delimited by CR (carriage return).

During creation of a Delimited Schema, the user specifies the field names for the data and the explicit delimiters that
separate the fields and records. These values can be changed directly in the Schema editor after the initial
specification.

Delimited Schema can also be created from an existing Composite Type. Fields are based on the Composite Type
attributes, and default delimiters are assigned.

Studio handles the following characters as field delimiters and allows you to specify alternative field delimiters:

         Comma

         Tab

         Vertical bar







         Semi-colon

         Space

Studio provides the following as record delimiters and allows for additional characters to be specified as record
delimiters:

         CR+LF (carriage return plus line feed)

         CR

         LF

Quote characters can be specified to allow the field and record delimiters to be used within a field. The special
character would not be interpreted as a delimiter when bounded by the Quote character. The default Quote character
is the double quotation mark (").

Escape characters can also be specified to "escape" the special use of the Quote character. The default Escape
character is the double quotation mark ("). For example, if you are using the default Quote and Escape characters and
want to literally quote string data in a field, the string could be something like George ""The Man"" Jackson. The part
of the string surrounded by double quotation marks would be read as "The Man".
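Python's csv module implements the same doubling convention for the Quote character, so the behavior described above can be demonstrated directly. This is a generic illustration, not expressor code:

```python
import csv
import io

# A record using the default Quote and Escape character (") for both:
# the doubled "" inside the quoted field escapes the quote itself.
line = '7,"George ""The Man"" Jackson",boxer\r\n'

row = next(csv.reader(io.StringIO(line)))
print(row[1])  # George "The Man" Jackson

# A field delimiter inside a quoted field is treated as literal data,
# not as a delimiter.
row2 = next(csv.reader(io.StringIO('"Smith, John",42\r\n')))
print(row2[0])  # Smith, John
```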

expressor Delimited Schema handle multiple types of data encoding. expressor also recognizes the Byte Order Mark
in Unicode strings, which designates high-endian or low-endian mode.
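Byte Order Mark recognition of the kind described can be sketched with Python's codecs constants. This is a generic illustration of how a leading BOM identifies the encoding and byte order, not the expressor implementation:

```python
import codecs

def detect_bom(raw: bytes) -> str:
    """Return the encoding implied by a leading Byte Order Mark, if any."""
    # Check the four-byte UTF-32 marks first so they are not mistaken
    # for the two-byte UTF-16 marks they begin with.
    if raw.startswith(codecs.BOM_UTF32_LE):
        return "utf-32-le"
    if raw.startswith(codecs.BOM_UTF32_BE):
        return "utf-32-be"
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"   # low-endian (little-endian) mode
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"   # high-endian (big-endian) mode
    return "no BOM"

print(detect_bom(b"\xff\xfeA\x00"))  # utf-16-le
print(detect_bom(b"\xfe\xffA"))      # utf-16-be
```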

Table Schema define the metadata for data in database tables and views. The Schema is derived directly from the
database tables. Table Schema can be changed after their initial specification. However, the changes must be derived
directly from the database table or view on which it is based. A Table Schema can be changed in the Schema editor
with the Update button in the ribbon bar.

The Table Schema wizard requires you to select a Database Connection. If an appropriate Database Connection does
not exist, a new Connection can be created during the process of creating the Schema.

SQL Query Schema contain SQL Query statements used to query the connected database table or view. This schema
is used exclusively in the SQL Query operators. The SQL Query Schema wizard requires you to select a Database
Connection. If an appropriate Database Connection does not exist, a new Connection can be created during the
process of creating the Schema.

Schemas can be stored in an expressor Repository. There they are placed under version control and can be shared
with other users. To be stored by Repository, the Schema must be in a Repository Workspace.

Changing and Mapping a Schema
Some Schemas can be changed after their initial creation. The Schema editor for file-based data sources allows you
to change the delimiters and to add, edit, and delete fields. Delimited Schema can be created without reading in the
intended source file, so changes might be necessary if the source file changes or if fields or delimiters were
specified incorrectly when the Schema was created.

Changes to Table Schemas are driven by changes to the database tables or views they represent. After changing a
database table or view, create a new Table Schema and choose the Overwrite option to capture the updates in the
existing Schema.







After a Schema is created, it must include at least one mapping to a Composite Type in order to be used in a
Dataflow. This is because all data records processed between Input and Output Operators in an expressor Dataflow
are represented by Composite Types.

When a Schema is created using one of the Schema wizards, the wizard will create a default Composite Type for that
Schema and default Mapping from the Schema fields to the Composite Type attributes. This default Composite Type
is known as a Local type because it is local to the Schema and has not been Shared for reuse.

This powerful mechanism of Semantic Mapping is a central aspect of the expressor product. Unlike legacy source-to-
target mapping approaches, Semantic Mapping allows you to map multiple Schemas to the same Composite Type.
When an Input Operator reads a data record, it maps the data into the Composite Type based on the mapping
configured for the operator. All of the common rules and constraints specific to that Composite Type are applied at
that time and throughout the Dataflow processing. Downstream non-output operators need only be concerned with
the data as commonly represented by the Composite Type, not with the particular source schema that defines its
external format. When an Output Operator writes a data record, it maps the data from the Composite Type to the
external format defined by the Schema, again based on the mapping configured for the operator. All of the common
rules and constraints specific to that Composite Type are applied in that case as well.

The characteristics of data, such as padding and rounding, are also specified in the Mapping. When string, number,
and date-time data types are mapped to one another, the Mapping specifies how a variety of characteristics are to be
handled. These Mapping settings affect how data changes when mapped from a Schema field to a Composite Type
attribute, but they do not change the basic data type of the mapped Composite Type attribute.

Since Composite Types, Schemas, and their Mappings are all reusable, developers and teams can easily reuse
mappings and ensure that their Dataflows conform to the constraints captured by their Semantic Types without
having to reinvent the mappings, constraints, and rules on each project.
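A minimal sketch of the idea, using hypothetical names (expressor configures all of this through the Schema editor, not code): two source schemas with different field names map onto one shared Composite Type, and the Type's constraints are applied whenever a record is mapped in.

```python
# Hypothetical sketch of Semantic Mapping. All names here are invented
# for illustration; they are not expressor APIs.

composite_type = {"emp_id": int, "lname": str}   # the shared Composite Type
constraints = {"lname": {"min_length": 5}}       # a Type-level constraint

# Per-schema mappings: source field name -> Composite Type attribute.
hr_mapping  = {"EmployeeNumber": "emp_id", "LastName": "lname"}
csv_mapping = {"id": "emp_id", "surname": "lname"}

def map_record(record, mapping):
    """Map a source record into the Composite Type and apply constraints."""
    row = {attr: record[field] for field, attr in mapping.items()}
    for attr, rule in constraints.items():
        if "min_length" in rule and len(row[attr]) < rule["min_length"]:
            raise ValueError(f"constraint failed for {attr!r}")
    return row

# Both sources land in the same common representation:
print(map_record({"EmployeeNumber": 7, "LastName": "Jackson"}, hr_mapping))
print(map_record({"id": 8, "surname": "Lincoln"}, csv_mapping))
```

The point of the sketch is that downstream logic works only with `emp_id` and `lname`, regardless of which source schema the record came from.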


When the Schema is opened in the editor, its default Composite Type mapping is displayed along with it.


The editor is where you map the Schema to Composite Types. A Schema may specify multiple mappings to the same
Type or different Types.

This mapping capability is critical for the Schema. Without mapping the Schema to a Composite Type, the data read
into the Dataflow cannot be processed. However, because the Schema wizards automatically generate default
Composite Types and mappings, mappings only need to be actively managed on an as-needed basis.

Schemas for Rejected Records
When one or more fields of a data record fail to pass validations against Atomic Type attribute constraints, operators
must have a method for handling records with constraint errors. An operator could be set to abort the dataflow when






such a record is encountered, or it could skip or reject the record and proceed with the rest of the dataflow. Rejected
records are sent to the operator's reject port and passed to a Write File or Write Table operator, from which they can
be written out to a file or database for further examination.

The Write File and Write Table operators need a special schema to handle rejected data records. Schemas for rejected
records are created from the reject Composite Type that is automatically generated for records emitted on the reject
port of an operator. The Composite Type for rejected records has five fields, and Schemas generated from a reject
Composite Type have the same five fields:


RejectType         An integer that indicates the type of error that
                   caused the rejection:

                   1 - One or more errors occurred while converting
                   the source field to the corresponding Semantic Type
                   value.

                   2 - Not assigned.

                   3 - Not enough fields in the input line.

                   4 - Too many fields in the input line.


RecordNumber       The number of the rejected input line.


RecordData         A UTF-8 encoding of the data in the input line.


RejectReason       A UTF-8 string containing a structured description
                   of the problem that caused the rejection.


RejectMessage      An explanation of the reason for the rejection.




The following is an example of rejected-record output:

      "RejectType","RecordNumber","RecordData","RejectReason","RejectMessage"
      "1","12","11,Polk,James K,Democratic","1|TRANSLITERATE-1016,lname,Polk,4,5","The value 'Polk' for attribute 'lname' has a length of '4', which is shorter than the defined limit of '5'"
      "1","28","27,Taft,William H,Republican","1|TRANSLITERATE-1016,lname,Taft,4,5","The value 'Taft' for attribute 'lname' has a length of '4', which is shorter than the defined limit of '5'"
      "1","39","38,Ford,Gerald,Republican","1|TRANSLITERATE-1016,lname,Ford,4,5","The value 'Ford' for attribute 'lname' has a length of '4', which is shorter than the defined limit of '5'"




     "1","42","41,Bush,George HW,Republican","1|TRANSLITERATE-
     1016,lname,Bush,4,5","The value 'Bush' for attribute
     'lname' has a length of '4', which is shorter than the
     defined limit of '5'"
     "1","44","43,Bush,George W,Republican","1|TRANSLITERATE-
     1016,lname,Bush,4,5","The value 'Bush' for attribute
     'lname' has a length of '4', which is shorter than the
     defined limit of '5'"

The Atomic Type constraint is that the president's last name (lname) have a Min Length of 5. The last names of these
five presidents are only four characters long, so they did not satisfy the constraint and were rejected. See Reference:
Constraint Corrective Actions for all the recovery actions that can be taken when data fails a constraint validation.
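Because reject records are ordinary delimited data with the five fields described above, they can be inspected with any CSV reader. A generic sketch (not expressor code), using the first record from the example output:

```python
import csv
import io

# One rejected record in the five-field reject format shown above.
reject_line = (
    '"1","12","11,Polk,James K,Democratic",'
    '"1|TRANSLITERATE-1016,lname,Polk,4,5",'
    "\"The value 'Polk' for attribute 'lname' has a length of '4', "
    "which is shorter than the defined limit of '5'\""
)

fields = ["RejectType", "RecordNumber", "RecordData",
          "RejectReason", "RejectMessage"]
record = dict(zip(fields, next(csv.reader(io.StringIO(reject_line)))))

print(record["RejectType"])    # 1 -> conversion/constraint error
print(record["RecordNumber"])  # 12
print(record["RecordData"])    # 11,Polk,James K,Democratic
```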

The procedure for creating a rejected-record schema is described in Create a Schema for a Rejected Record.




Create Delimited Schema

         Define New Delimited Schema

         Create New Delimited Schema from a Composite Type

         Create New Delimited Schema from a Composite Type in an Upstream Operator


Define New Delimited Schema
    1.    Select Delimited Schema on the New Schema button on the Home tab of the ribbon bar.

    2.    Use the multi-line text box to describe the structure of the file that you want to read or write. There are three
          options for describing the file structure.

                    You may direct the wizard to read in several lines of the file.

                    You may paste in one or more lines of the file.

                    You may manually enter the field names separated by the desired delimiter.

                  This approach is especially suitable when describing the structure of an output file that currently does
                  not exist.

    3.    Select the appropriate Field and Record Delimiters from the drop-down lists.
          In addition to the specific characters in the drop-down lists, each list offers the option of specifying another
          character or combination of characters to use as the delimiter. For the Other option, you can specify any
          valid character in the character encoding set you select or a hexadecimal number. For hexadecimal numbers,
          exactly three digits must be specified; for example, \x007.







      Note: Do not use the same delimiter for both fields and records, and do not use either the Quote
                 characters or Escape character as delimiters.

      4.    Enter a character to use for "quoting" a character or string of characters that are to be interpreted
            literally; any special significance of the quoted character or characters is ignored.
            The default Quote character is the double quotation mark (").

      Note: Do not use either the field or record delimiter character as the Quote character.

      5.    Enter a character to use for the purpose of "escaping" the special use of the Quote character.
            The default Escape character is the double quotation mark (").
            For example, if you are using the default Quote and Escape characters and want to literally quote string data
            in a field, the string could be something like George ""The Man"" Jackson. The part of the string surrounded
            by double quotation marks would be read as "The Man".

      Note: Do not use either the field or record delimiter character as the Escape character.

      6.    Select the encoding method used for the data in the delimited file.

      7.    Check the Byte Order Mark box if a Byte Order Mark appears at the beginning of Unicode strings in the
            delimited file.

      8.    Use the Name text entry box under Field Details to name each field in the Schema.

           Alternatively, click the Set All Names from the Selected Row button to automatically name the fields with
           the names used in the selected row.

      9.    Select the Project or Library where the Schema is to be located.

      10.   Name the new Delimited Schema with a name that is unique within the workspace.

      11.   Provide a description of the Schema (optional).
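The three-digit hexadecimal form accepted for the Other delimiter option in step 3 maps to a single character code; for example, \x007 denotes code point 7 (the BEL control character). A generic sketch of that mapping (the helper function is hypothetical, not part of expressor):

```python
def delimiter_from_hex(spec: str) -> str:
    """Convert a spec like r'\x007' (exactly three hex digits, as the
    wizard requires) into the single character it denotes."""
    assert spec.startswith(r"\x") and len(spec) == 5, \
        "expected \\x followed by exactly three hex digits"
    return chr(int(spec[2:], 16))

# \x007 denotes code point 7, the BEL control character.
print(ord(delimiter_from_hex(r"\x007")))        # 7

# \x009 denotes the tab character.
print(delimiter_from_hex(r"\x009") == "\t")     # True
```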




Create New Delimited Schema from a Composite Type
There are four methods available for generating a new Delimited Schema from a Composite Type:

From the Home tab of the ribbon bar

From a Schema property button

From within the Schema Editor

From within the Composite Type Editor

From the Home tab or Schema property button
      1.    Select Delimited Schema from Type on the New Schema button on the Home tab of the ribbon bar.
            Alternatively, select New Delimited Schema from Type on the Schema property button of a Read File or
            Write File operator.

      2.    Select the set of Composite Types to choose from: existing Shared Types or Local Types that exist within
            Schemas.







        If you select Types associated with a Schema, then:

            1.   Select from the drop-down list the Project or Library containing the Schema with the Local
                 Composite Type you intend to use for the new Schema.

            2.   Select from the drop-down list the Schema containing the Local Composite Type you intend to use.

            3.   Select the Local Composite Type from the list displayed for the chosen Schema.

            4.   Select the Project or Library in which to place the new Schema.

            5.   Name the new Delimited Schema with a name that is unique within the workspace.

            6.   Provide a description of the Schema (optional).

        If you select Shared types in a project, then:

            1.   Select from the drop-down list the Project or Library containing the Shared Composite Type you
                 intend to use for the new Schema.

            2.   Select the Shared Composite Type from the list displayed for the chosen Project or Library.

            3.   Select or specify the format options for the Delimited Schema.

            4.   Select the Project or Library in which to place the new Schema.

            5.   Name the new Delimited Schema with a name that is unique within the workspace.

            6.   Provide a description of the schema (optional).

From within the Schema Editor
   1.    Select Delimited Schema button on the Schema Edit tab of the ribbon bar.

   2.    Select the Composite Type from which to generate the new Delimited Schema.

   3.    Select the Project or Library in which to place the new Schema.

   4.    Name the new Delimited Schema with a name that is unique within the workspace.

   5.    Provide a description of the Schema (optional).

From within the Composite Type Editor
   1.    Select Delimited Schema button on the Type Edit tab of the ribbon bar.

   2.    Select the Composite Type from which to generate the new Delimited Schema.

   3.    Select the Project or Library in which to place the new Schema.

   4.    Name the new Delimited Schema with a name that is unique within the workspace.

   5.    Provide a description of the Schema (optional).




Create New Delimited Schema from a Composite Type in an Upstream Operator
   1.    Select New Delimited Schema from Upstream Type on the Schema property button of a Write File
         Operator.






      2.    Select the Composite Type from which to generate the Delimited Schema.

      3.    Select the Project in which to save the new Schema from the drop-down list.

      4.    Name the new Delimited Schema with a name that is unique within the workspace.

      5.    Provide a description of the schema (optional).




Create a Schema for a Database resource

           Define New Table Schema

           Create New Table Schema from a Composite Type

           Create New Table Schema from a Composite Type in an Upstream Operator


Define New Table Schema
      1.    Select Table Schema from the drop-down menu of the New Schema button on the Home tab of the ribbon
            bar.

      2.    Select the Project or Library containing the Connection to the database with the desired Schema.

      3.    Select a connection with an appropriate Schema from the list of existing Connections or click the New
            Database Connection... icon.

      4.    Select the table Schema to read.

      5.    Select one or more tables to use for creating the Schema.

      6.    Select the Project or Library in which to save the Schema.

      7.    Edit the Table name to create a new Schema or check the Overwrite box to change an existing Schema.

Create New Table Schema from a Composite Type
There are three methods available for generating a new Table Schema from a Composite Type:

From the Home tab of the ribbon bar

From a Schema property button

From within the Composite Type Editor

From the Home tab of the ribbon bar
      1.    Select Table Schema from Type from the drop-down menu of the New Schema button on the Home tab
            of the ribbon bar.







         Alternatively, select New Table Schema from Type on the Schema property button of a Read Table or Write
         Table operator.

   2.    Select Shared type or Type contained in a schema from the Create Table Schema from Type dialog box.

   3.    Select the Project or Library containing the Composite Type from the Project drop-down list.

   4.    Select the Shared Composite Type or the Schema containing a Local Composite Type.
         If you are creating the Table Schema from a Type contained within a Schema, select the Local Composite
         Type from the list of Composite Types if there is more than one within the selected Schema.

   5.    Select the Database Type from the list on the next screen in the Create Table Schema from Type dialog box.

   6.    Select the Project or Library in which to save the Schema.

   7.    Edit the Table name to create a new Schema or check the Overwrite box to change an existing Schema.

From within the Composite Type Editor
   1.    Select the Table Schema button on the Type Edit tab of the ribbon bar.


          The New Table Schema from Type - View Type dialog box displays.

          Click Next.

   2.    Select a Connection from a Project or Library in the Project drop-down menu.

   3.    Select a Catalog Schema from the list on the next screen.







      4.   Name the new database table and change the name, data type, and other field values as necessary.




      5.   Select the project or library in which to store the new Schema from the Project drop-down list.

      6.   Name the new Table Schema.

      7.   Provide a description of the Schema (optional).

Create New Table Schema from a Composite Type in an Upstream Operator
      1.   Select New Table Schema from Upstream Type on the Schema property button of a Write Table operator.


           The New Table Schema from Type - View Type dialog box displays.


      2.   Click Next.

      3.   Select a Connection from a Project or Library in the Project drop-down menu.

      4.   Select a Catalog Schema from the list on the next screen.







  5.   Name the new database table and change the name, data type, and other field values as necessary.




  6.   Select the project or library in which to store the new Schema from the Project drop-down list.

  7.   Name the new Table Schema.

  8.   Provide a description of the Schema (optional).


Create an SQL Query Schema
  1.   Select SQL Query Schema from the drop-down menu of the New Schema button on the Home tab of the
       ribbon bar.

  2.   Select the Project or Library containing the Connection to the database with the desired Schema.

  3.   Select a connection with an appropriate Schema from the list of existing Connections or click the New
       Database Connection... icon.

  4.   Write the SQL query statement to be executed with this schema.

  5.   Click the Validate button above the editing window in the New SQL Query Schema dialog box.

  6.   Correct any errors in the query statement.
       The next screen in the New SQL Query Schema dialog box shows the schema resulting from the query
       statement.




  7.   Select the Project or Library where the Schema is to be located.

  8.   Name the new SQL Query Schema with a name that is unique within the workspace.

  9.   Provide a description of the Schema (optional).
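The fields returned by the query statement become the fields of the SQL Query Schema. As an illustrative sketch only (expressor validates the query against your own database Connection; the table and column names below are invented), the way a query's result columns define a schema can be seen with Python's built-in sqlite3 module:

```python
import sqlite3

# Hypothetical data: the columns a query returns are what an SQL Query
# Schema is built from. All names here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Ann', 'Boston')")

cursor = conn.execute("SELECT id, name FROM customers WHERE city = 'Boston'")
# The selected columns, not the full table, determine the schema fields.
fields = [col[0] for col in cursor.description]
print(fields)  # ['id', 'name']
```

Changing the SELECT list changes the resulting schema, which is why the dialog box shows the schema only after the query validates.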







Create a Schema for a Rejected Record
Create a Delimited Schema

Create a Table Schema

Create a Delimited Schema
The easiest way to create a delimited schema is to first connect a Write File operator to the reject port of a fully
configured Read File operator in a dataflow. Then:

      1.   Select New Delimited Schema from Upstream Type on the Schema property button of a Write File
           Operator.

      2.   Select the RejectRecord Composite Type to generate the Delimited Schema.




      3.   Select the Project in which to save the new Schema from the drop-down list.

      4.   Name the new Delimited Schema with a name that is unique within the workspace.

      5.   Provide a description of the schema (optional).

      6.   Open the RejectRecord schema.

      7.   Change the Field Delimiter to a character other than comma.
           Commas are used in the rejected record message content, so a comma should not be used to separate fields
           in the RejectRecord schema. If commas are used as the field delimiter, the Write File Quotes parameter must
           be specified. Otherwise, the schema will not be able to read the records correctly.
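The comma problem described above can be sketched outside expressor with Python's csv module. This is not expressor code; it only illustrates why a comma delimiter forces quoting while another delimiter, such as a pipe, does not (the reject record values are invented):

```python
import csv
import io

# An invented reject record: the message text itself contains commas.
record = ["1", "FieldConstraint", "bad value, expected integer, got 'abc'"]

# With a comma delimiter, quoting must be enabled so the commas inside
# the message are not mistaken for field separators.
buf = io.StringIO()
csv.writer(buf, delimiter=",", quoting=csv.QUOTE_ALL).writerow(record)
quoted_line = buf.getvalue().strip()

# With a pipe delimiter, this record needs no quoting at all.
buf = io.StringIO()
csv.writer(buf, delimiter="|", quoting=csv.QUOTE_MINIMAL).writerow(record)
piped_line = buf.getvalue().strip()

print(quoted_line)  # "1","FieldConstraint","bad value, expected integer, got 'abc'"
print(piped_line)   # 1|FieldConstraint|bad value, expected integer, got 'abc'
```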




Create a Table Schema
      1.   Select Table Schema from the drop-down menu of the New Schema button on the Home tab of the ribbon
           bar.

      2.   Select the Project or Library containing the Connection to the database with the desired Schema.

      3.   Select a connection with an appropriate Schema from the list of existing Connections or click the New
           Database Connection... icon.

      4.   Select the table Schema to read.
           The schema must have the following fields:

                  RejectType
                  RecordNumber






               RecordData
               RejectReason
               RejectMessage

    5.   Select one or more tables to use for creating the Schema.

    6.   Select the Project or Library in which to save the Schema.

    7.   Edit the Table name to create a new Schema or check the Overwrite box to change an existing Schema.




Change Schema delimiters and fields
Edit a Schema for a file-based resource when the structure of the data originally specified does not accurately reflect
the actual file to be read into a Dataflow.

The Schema editor is used to both edit the Schema and map it to a Composite Type. The Composite Type and its
Attributes can also be changed in the Schema editor.

The delimiters and field names are the components of a Schema that can be changed in the Schema editor.

    1.   Select the file-based Schema from the Explorer panel and double-click or use the right-click menu to open it.
         The Schema editor displays the Schema in the center panel of Studio.




    2.   Select a new field delimiter and/or a new record delimiter from the drop-down lists.
         In addition to the specific characters in the drop-down lists, each list offers the option of specifying another
         character or combination of characters to use as the delimiter. For the Other option, you can specify any
         valid character in the character encoding set you select or a hexadecimal number. Hexadecimal numbers
         must have exactly three digits; for example, \x007.

    Note: Do not use the same delimiter for both fields and records, and do not use either the Quote
              character or the Escape character as a delimiter.
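As a sketch of using a non-printing delimiter (not expressor code; the data is invented), the hexadecimal value entered as \x007 in the Other option corresponds to the single byte written as \x07 in Python:

```python
# Invented sample data using the BEL character (\x07) as the field
# delimiter and newline as the record delimiter.
FIELD_DELIM = "\x07"
RECORD_DELIM = "\n"

data = "Ann\x0742\x07Boston\nBob\x0735\x07Chicago\n"

# Split into records, then split each record into fields.
records = [r.split(FIELD_DELIM) for r in data.split(RECORD_DELIM) if r]
print(records)  # [['Ann', '42', 'Boston'], ['Bob', '35', 'Chicago']]
```

A non-printing delimiter like this is unlikely to appear inside field values, which is why it avoids the quoting issues a comma delimiter raises.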

    3.   Enter a character to use for quoting strings and numbers.
         The default is the double quotation mark ("), which is used unless you specify another character.

    Note: Do not use either the field or record delimiter character as the Quote character.

    4.   Enter a character to use as the escape character.
         The default is the double quotation mark ("), which is used unless you specify another character.
         For example, if you are using the default Quote and Escape characters and want to literally quote string data







           in a field, the string could be something like George ""The Man"" Jackson. The part of the string surrounded
           by double quotation marks would be read as "The Man".

      Note: Do not use either the field or record delimiter character as the Escape character.
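The doubled-quote behavior described above can be sketched with Python's csv module, which uses the same convention when the quote character also serves as the escape character (illustration only, not expressor code):

```python
import csv
import io

# A quoted field containing literal quotation marks: each embedded
# quote is doubled on output and collapsed to one quote on input.
line = '"George ""The Man"" Jackson",42\n'
row = next(csv.reader(io.StringIO(line), quotechar='"', doublequote=True))

print(row[0])  # George "The Man" Jackson
print(row[1])  # 42
```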

      5.   Select the encoding format used in the data file from the drop-down list of encoding options.

      6.   Check the Byte Order Mark check box if the Encoding is Unicode (UTF) and uses the Byte Order Mark to
           indicate high-end or low-end byte interpretation.

      7.   Add a field with the Add button in the Schema Edit tab on the ribbon bar.

      8.   Edit a field by selecting it and clicking the Edit button in the Schema Edit tab on the ribbon bar.
           The name of the field can be changed.

      9.   Delete a field by selecting it and clicking the Delete button in the Schema Fields section.

      10. Save the changes with the Save icon on the Quick Access Toolbar.




Change a Table Schema
Update a Table Schema

Update a Create-Table Statement

Update a Table Schema
Update a Table Schema when the metadata in the source database table or view has changed.

The Schema editor is used to both update the Schema and map it to a Composite Type. The Composite Type and its
Attributes can also be changed in the Schema editor.

      1.   Select the Table Schema from the Explorer panel and double-click or use the right-click menu to open it.


           The Schema editor displays the Schema in the center panel of Studio.




      2.   Click the Update Schema button in the Schema Edit tab on the ribbon bar.

      3.   Select the Connection from the drop-down list.







   4.   Click the Get Source Metadata button.




   5.   Click the Options button in the lower left of the Update Table Schema dialog box.
        The Update Options dialog box displays.




    6.   Select the mapping action for the changes in the Table Schema fields.
         Preserve existing mappings where possible keeps the mappings of fields that existed before the
         update.
         Create new mappings for all remaps all the fields in the updated Schema. If Add new attributes for
         new fields under Attributes is not checked, then only fields for which there are corresponding
         attributes in the Semantic Type will be mapped.

    7.   Select Add and/or Delete new attributes to match the Schema fields that were added or deleted by
         the update.
         The Add and Delete attribute boxes can be checked only if the Composite Type is local. If it is not local, you
         can check the Create a new local type box to replace the shared Composite Type currently mapped to the
         schema.

   8.   Click the Update button in the lower right of the Update Table Schema dialog box.
        The Schema editor now displays any changes in the schema, mapping, and Local Composite Type.

Update a Create-Table Statement







The CREATE TABLE statement is used to create a table in a target database when the Create Missing Table
property is selected on a Write Table operator. The statement is generated automatically when a Table
Schema is created. The table that the statement creates in a database matches the schema for which the statement
was generated, so any changes to the statement must preserve that match in order for the Write Table
operator using the schema to write data to the table correctly.

      1.   Select the Table Schema from the Explorer panel and double-click or use the right-click menu to open it.


           The Schema editor displays the Schema in the center panel of Studio.


      2.   Click the Create-Table Statement button in the Schema Edit tab on the ribbon bar.

      3.   Change the CREATE TABLE statement in the text box in the Create-Table Statement dialog box.




      Note: If the CREATE TABLE statement attempts to create a table with a schema that contains a user-
                 defined data type, the attempt will fail if the target database does not support the user-defined
                 data type.

      Note: expressor surrounds table names with quotation marks when it executes SQL and DDL statements
                 at runtime. Some database management systems treat quoted names differently than unquoted
                 names. Check the database system to which the table will be written to understand how the
                 quoted name will be handled. For example, Oracle databases treat tables named Foo and "Foo" as
                 separate tables.
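A hedged sketch of how a generated CREATE TABLE statement mirrors a schema's fields, using Python's sqlite3 module. The field names and SQL types below are invented, and the statement expressor actually generates depends on the target Database Type; note the quoted identifiers, which some databases treat as case-sensitive:

```python
import sqlite3

# Invented schema fields; expressor derives these from the Table Schema.
fields = [("RejectType", "VARCHAR(32)"),
          ("RecordNumber", "INTEGER"),
          ("RecordData", "VARCHAR(1024)")]

# Build a CREATE TABLE statement whose columns match the schema fields,
# surrounding each identifier with quotation marks.
cols = ", ".join(f'"{name}" {sqltype}' for name, sqltype in fields)
ddl = f'CREATE TABLE "Rejects" ({cols})'

conn = sqlite3.connect(":memory:")
conn.execute(ddl)

# The created table's columns match the schema field names exactly.
names = [row[1] for row in conn.execute('PRAGMA table_info("Rejects")')]
print(names)  # ['RejectType', 'RecordNumber', 'RecordData']
```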


Change SQL Query Schema
Edit an SQL Query Schema when you want to change the query statement to extract data differently.

The Schema editor is used to both edit the query statement and map it to a Composite Type. The Composite Type
and its Attributes can also be changed in the Schema editor.

      1.   Select the SQL Query Schema from the Explorer panel and double-click or use the right-click menu to open
           it.







     The Schema editor displays the Schema in the center panel of Studio.




2.   Click the SQL Update Schema button in the Schema Edit tab on the ribbon bar.




3.   Select the Connection from the drop-down list.

4.   Change the SQL query statement to be executed with this schema.

5.   Click the Validate button above the editing window in the Edit SQL Query dialog box.

6.   Correct any errors in the query statement.

7.   Click the Options button in the lower left of the Edit SQL Query dialog box.


      The Update Options dialog box displays.




8.   Select the mapping action for the changes in the SQL Schema fields.
     Preserve existing mappings where possible will work for fields that existed prior to the SQL query
     changes.
     Create new mappings for all maps all the fields in the Schema after the SQL query has been changed. If







            Add new attributes for new fields under Attributes is not checked, then only fields for which there are
            corresponding attributes in the Semantic Type will be mapped.

      9.    Select Add and/or Delete new attributes to match the Schema fields that were added or deleted after the
            SQL query changes.
            The Add and Delete attribute boxes can be checked only if the Composite Type is local. If it is not local, you
            can check the Create a new local type box to replace the shared Composite Type currently mapped to the
            schema.

      10. Click the Update button in the lower right of the Update SQL Query Schema dialog box.


           The Schema editor now displays the changed query and any changes it produces in the schema, mapping,
           and Local Composite Type.




Map Schema Fields to Composite Type Attributes
Start by selecting a Delimited or Table Schema in the Explorer panel and double-clicking it or using the right-click
menu to open it.
The Schema editor displays the Schema in the center panel of Studio. Schema fields are automatically mapped to
corresponding Attributes.




         Change Mappings to Attributes
         Change Schema Fields
         Add Mapping Sets
         Delete Attribute Mappings


Change Mappings to Attributes
    1.    Select a field in the Schema and drag the cursor to the intended Attribute.
          If the field or attribute is already mapped, a window will pop up asking if you want to replace the existing
          mapping.

    2.    Select Yes to replace the existing mapping.

Change Schema Fields
    1.    Add or edit Schema fields by clicking the appropriate buttons in the Schema Fields section of the Schema
          Edit tab on the ribbon bar.
          Editing Schema fields involves changing only the field name.
          Schema fields can be deleted with the Delete button in the Editing section of the Schema Edit tab on the
          ribbon bar.

    Note: Schema fields can only be changed in Delimited Schema. To change Table Schema fields, you
              change the database table and read it into the Schema again or create a new Table Statement.

Add Mapping Sets
    1.    Click the Add Set button on the Schema Edit tab in the ribbon bar.
          The Add Mapping Set dialog box appears.

    2.    Click Yes to create default attribute mappings for the new Set or No to create the Set without predefined
          mappings.

    Note: When using a schema for both Read and Write operators, it is helpful to create separate mapping
              sets for each usage and label them "Input" and "Output." Because the direction of data flow is
              different--from Schema to Type in Read operators and from Type to Schema in Write operators--
              separate sets help keep the distinction clear and accommodate any differences in the mappings.







Add Composite Type
      1.   Click the Add button in the Composite Type section of the Schema Edit ribbon bar.

      2.   Select New Local Type or Shared Type from the drop-down menu.


           If you select New Local Type, it is added to the Composite Type drop-down menu in the Semantic Type
           section of the Schema Editor. If you select Shared Type, the Select Shared Type dialog box displays.


      3.   Select the Shared Type from the list of Composite Types.

Delete Attribute Mappings
      1.   Select one or more existing mapping lines in the Mappings column.

      2.   Click the Delete Mapping button in the Mappings section of the Schema Edit tab on the ribbon bar.
           The mapping line selected is deleted. The field and attribute can now be mapped differently.


Change Data Format for Mapping
How Data is Mapped

Format Data Mappings

Specify Error Corrections

How Data is Mapped
The Schema Editor enables you to change the format characteristics of string, number, and date-time data types
when they are mapped to string data. These changes affect how data is transformed when mapped from a Schema field to a
Composite Type attribute, but they do not change the basic data type of the mapped Composite Type attribute. To
change the data type of a Composite Type attribute, see Edit Semantic Type.

The string data type can be in either the Schema field or the Composite Type attribute. For example, when a field with
string data is mapped to an attribute with a decimal data type, you can specify how the decimal number is to be
represented. If you are mapping a field with string data to an attribute with a string data type, you can specify that the
string have a specific length when given to the attribute and how to pad or trim the string to create that length. When
mapping string data to a date-time attribute, you can specify the format the date-time data is to have.

Formats specified for mapping one data type to another are similar in some respects to constraints applied to Atomic
Types. There is an important difference between requirements set for mapping and constraints specified for Atomic
Types. Formats are prescriptive; they change the data to conform to the format requirement. Constraints, on the other
hand, are reactive in that they measure data's conformity to requirements and mark the data invalid if it fails to
measure up to the requirements set by the constraints.






While formats are prescriptive, there may be cases in which data cannot be changed to conform to the format
requirement. For example, when mapping string data to a numeric type, an alphabetic character cannot be
converted to a number.

When data does not fit an Atomic Type's constraints, corrective actions can be taken to change the data in some way
to make it conform to the constraints. Similarly, corrective actions can be specified for errors that occur when data is
formatted. The Edit Mapping dialog boxes discussed below each have an Error Corrections tab that allows you to
specify corrective actions. For example, when mapping string data to a numeric type, you can get a Number
Conversion error; one cause of this error is an attempt to convert an alphabetic character to a number. You can
specify that when a Number Conversion error occurs, the field is changed to a null value, or a default value
(Correction Value) replaces the erroneous field.

    Note: Error conditions are evaluated in the order shown in the dialog box under the Error Corrections tab.
               When an error is found, the associated corrective action is taken, and error evaluation stops at
               that point. Thus, the reported error might not be the only error in the mapping conversion.

If errors that occur in mapping are not corrected, the faulty data is not passed to the Semantic Type. The error is
instead handled by the operator according to the setting on its Error handling parameter. If mapping errors are
corrected, the data is then evaluated by the Atomic Type for adherence to its constraints, if any are set.
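The two corrective actions described above, replacing the erroneous field with a null value or with a Correction Value, can be sketched as follows. The function name and parameters are illustrative only, not part of expressor:

```python
# Illustrative sketch of Number Conversion error correction: on a
# failed string-to-number conversion, either substitute null (None)
# or substitute a default Correction Value.
def convert_field(text, on_error="null", correction_value=0):
    try:
        return int(text)
    except ValueError:
        # The corrective action chosen on the Error Corrections tab.
        return None if on_error == "null" else correction_value

print(convert_field("42"))    # 42   -- converts cleanly
print(convert_field("abc"))   # None -- corrected to null
print(convert_field("abc", on_error="default", correction_value=-1))  # -1
```

Corrected values then continue downstream; in expressor they are still checked against any constraints set on the Atomic Type.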

Format Data Mappings
    1.    Select an existing mapping line in the Mappings column.

    2.    Click the Edit Mapping button in the Mappings section of the Schema Edit tab on the ribbon bar.

         The Edit Mapping dialog box displayed depends on the data type of the attribute to which the Schema field
         is mapped. Each dialog box and the subsections of its Formatting tab are shown below with tables explaining
         the fields in the dialog boxes.



         String








         Setting                    Meaning              Selections          Default


      Truncate string     Specifies whether or      Check = Remove     Don't remove
                          not to remove             No check = Don't
                          characters from the       remove
                          end of a string to
                          make it a certain
                          length.

                          If Truncate is checked,
                          Length must be
                          specified as a value
                          less than or equal to
                          the field length of the
                          Schema field.

                          Specify a truncate
                          length in a Schema
                          used in a Write Table
                          operator to ensure the
                          Schema field matches
                          the database.


Pad string      Specifies whether or      Check = Add            Don't add
                not to add characters     No check = Don't add
                to make the string a
                certain length.


  Character     The character to use      Any character          space character
                for padding.


  Side          Specifies where to add    Prefix = Before        Suffix
                padding--before or        Suffix = After
                after the string.


  Length        The length of the         Integer value          0
                string after truncating
                or padding.

                Must specify length if
                Truncate string is
                selected. Length must
                be a value less than or
                equal to the field
                length of the Schema
                field.

                Specify a truncate
                length in a Schema
                used in a Write Table
                operator to ensure the
                Schema field matches
                the database.
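The Truncate and Pad settings above can be sketched in Python. This is an illustration of the behavior the table describes, not expressor's implementation; the function name is invented:

```python
# Illustrative sketch of the string formatting settings: truncate to
# Length, or pad with Character on the chosen Side to reach Length.
def format_string(value, length, truncate=False, pad=False,
                  char=" ", side="Suffix"):
    if truncate and len(value) > length:
        value = value[:length]              # remove characters from the end
    if pad and len(value) < length:
        filler = char * (length - len(value))
        # Prefix = before the string, Suffix = after (the default).
        value = filler + value if side == "Prefix" else value + filler
    return value

print(format_string("Massachusetts", 6, truncate=True))           # Massac
print(format_string("MA", 6, pad=True, char="_"))                 # MA____
print(format_string("MA", 6, pad=True, char="0", side="Prefix"))  # 0000MA
```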




Number

       Digits








       Setting                   Meaning                Selections                  Default


        Locale            The language used for   en-US = United States   en-US = United States
                          representing string,    English                 English
                          number, and special     es-US = United States
                          characters.             Spanish


   Specify fractional     Provides decimal        On                      On
         digits           representation of       Off
                          fractional numbers.


Minimum digits before     The minimum number      Non-negative integer    Don't add
        decimal           of digits before the
                          decimal.


 Minimum digits after       The minimum number        Non-negative integer      Suffix
        decimal             of digits after the
                            decimal.


    Maximum digits          The maximum number        Non-negative integer      0
     after decimal          of digits after the
                            decimal.


Specify total significant   Specifies that a          On                        Off
         digits             minimum and               Off
                            maximum will be set
                            for significant digits.


  Minimum significant       Specifies the minimum     Non-negative integer      0
         digits             number of significant
                            digits the converted
                            number will have.


        Maximum             Specifies the maximum     Non-negative integer      0
   significant digits       number of significant
                            digits the converted
                            number will have.


   Rounding mode            Specifies the rounding    Ceiling = round to        HalfEven
                            mode to use if the        positive infinity
                            number is truncated.      Floor = round to
                                                      negative infinity
                                                      Down = round toward
                                                      zero
                                                      Up = round away from
                                                      zero
                                                      HalfEven = round
                                                      towards the nearest
                                                      integer, or towards the
                                                      nearest even integer if
                                                      equidistant
                                                      HalfDown = round
                                                      towards the nearest
                                                      integer, or towards
                                                      zero if equidistant
                                                      HalfUp = round
                                                      towards the nearest
                                                      integer, or away from
                                                      zero if equidistant




          Note: An integer type can only contain 8 bytes, representing numbers between -
                  9223372036854775807 and 9223372036854775807. Rounding can cause the numeric
                  value to exceed the maximum allowed for an integer. For example, if Ceiling rounding is
                  used on 9223372036854775807 and Significant digits is set to 4, it would round to
                  9224000000000000000, which is larger than the maximum allowed for an integer. Because
                  rounding can cause integers to exceed their maximum value, it is possible that an output
                  string value cannot be parsed back into the same internal type it started from.
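The note's overflow case can be reproduced with Python's decimal module, which supports comparable rounding modes (an illustration, not expressor's numeric engine):

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_EVEN

# Ceiling rounding of the largest 8-byte integer to 4 significant
# digits yields a value larger than the 8-byte integer maximum.
big = Decimal(9223372036854775807)
rounded = big.quantize(Decimal("1e15"), rounding=ROUND_CEILING)
print(rounded)                   # 9.224E+18, i.e. 9224000000000000000
print(int(rounded) > 2**63 - 1)  # True: exceeds the 8-byte maximum

# HalfEven, the default mode, rounds ties to the nearest even integer.
print(Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 2
print(Decimal("3.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))  # 4
```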







       Currency




  Setting                Meaning                Selections                    Default


   Locale         The language used for   Multiple global locales   en-US = United States
                  representing string,                              English
                  number, and special
                  characters.


Use currency      Specifies the numbers   No check = Do not use     No check
                  are to be represented   currency
                  as currency.            representation.

                                          Check = Use currency
                                          representation.






          Currency           Specifies the standard        Multiple global locales       USD
                             formatting of a
                             particular currency.


       Representation        How the currency              Symbol, Code, Text            Symbol
                             amount is represented,
                             e.g., $ (symbol), USD
                             (ISO code), or US
                             Dollars (text)

                             See Examples below.

Examples

               Data                              Representation                                 Result


-12                                    Symbol                                  ($12)


1234.5                                 Symbol                                  $1234.5


-12                                    Code                                    (USD12)


1234.55                                Code                                    USD1234.55


-12                                    Text                                    -12 US dollars


1234.566                               Text                                    1234.566 US dollars


In the above examples, the representation is determined by the Locale setting. Notice that there is no way to display or
print "-$12" using the currency settings. You can get that representation by using a number prefix and suffix under the
Sign tab.

For example, to get "-$12.00," set negative prefix to "-$." However, by using this Sign setting on numbers, you are not
using currency formatting.

You can also override Code representations using Sign settings. But again, you would no longer be using currency
formatting.

You cannot use Sign to override Text representations. When the currency format is Text, the Sign settings are ignored.

Similarly, Grouping does not affect currency representation. Digits are grouped according to the Locale setting.







        Grouping




   Setting                Meaning                 Selections                  Default


    Locale         The language used for    en-US = United States   en-US = United States
                   representing string,     English                 English
                   number, and special      es-US = United States
                   characters.              Spanish


 Use Grouping      On output, display the    Check = Use grouping    No check = Do not use
                   locale-specific           No check = Do not use   grouping
                   grouping. On input,       grouping
                   accommodate the
                   grouping parameter.


       Group size         Overrides the group        Non-negative integer   Locale's grouping size
                          size specified by the
                          locale.


       Secondary group    Overrides the              Non-negative integer   Locale's secondary
          size            secondary group size                              grouping size
                          specified by the locale.


          Grouping        Overrides the              Any single character   Locale's grouping
        character         grouping character                                separator
                          specified by the locale.


      Decimal separator   Overrides the decimal      Any single character   Locale's decimal
        character         separator specified by                            separator
                          the locale.







        Scientific

   Locale
       Meaning:     The language used for representing string, number, and special characters.
       Selections:  en-US = United States English; es-US = United States Spanish
       Default:     en-US = United States English

   Use scientific
       Meaning:     Specifies whether or not scientific notation can be used.
       Selections:  Check = Can be used; Not checked = Cannot be used
       Default:     Not checked = Cannot be used

   Use exponent positive sign
       Meaning:     Specifies whether or not to display the sign for a positive exponent.
       Selections:  Check = Display the positive sign; Not checked = Do not display the positive sign
       Default:     Not checked = Do not display the positive sign

   Minimum exponent digits
       Meaning:     Overrides the minimum exponent digits for the locale.
       Selections:  Check = Set minimum exponent digits; Not checked = Use the standard minimum for the locale
       Default:     Not checked = Use the standard minimum for the locale




          Note: When specifying scientific notation format, specify significant digits on the Digits tab.

          Note: When processing data in fields that have any other format specifications set, you should also
                  select scientific notation. If scientific notation is not selected, an error occurs whenever
                  scientific notation appears in a data field. If no format specifications are set on the field,
                  however, data with scientific notation is processed. Prior to expressor Version 3.2,
                  formatted data with scientific notation was parsed even if the scientific notation setting
                  was not selected. Schemas created with versions prior to 3.2 should be changed to select
                  scientific notation on any fields that have other format specifications set.

          Note: When Maximum digits after decimal is set to 2 and scientific notation is used, a number such
                  as 1.00e-04 becomes 0 rather than keeping its two decimal digits, because 0.0001 rounds
                  to zero at two decimal places.
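The second note describes strict parsing of formatted fields. As a generic sketch (illustrative Python, not expressor's parser): a field with format specifications must have scientific notation explicitly enabled, while an unformatted field accepts it by default.

```python
# Illustrative sketch of strict vs. lenient number parsing
# (generic Python, not expressor's implementation).

def parse_field(text, formatted=False, allow_scientific=False):
    """Parse a numeric field.

    formatted=False  -> general parsing; scientific notation is accepted.
    formatted=True   -> format specifications apply; scientific notation
                        is an error unless allow_scientific is set.
    """
    has_exponent = "e" in text.lower()
    if formatted and has_exponent and not allow_scientific:
        raise ValueError("scientific notation not enabled for formatted field")
    return float(text)

print(parse_field("1.00e-04"))                # unformatted field: accepted
print(parse_field("1.00e-04", formatted=True,
                  allow_scientific=True))     # accepted when enabled
try:
    parse_field("1.00e-04", formatted=True)   # rejected: not enabled
except ValueError as err:
    print("error:", err)
```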







        Sign

   Locale
       Meaning:     The language used for representing string, number, and special characters.
       Selections:  en-US = United States English; es-US = United States Spanish
       Default:     en-US = United States English

   Negative number prefix
       Meaning:     Specifies the character used before a number to denote a negative value.
       Selections:  String value
       Default:     Locale dependent

   Negative number suffix
       Meaning:     Specifies the character used after a number to denote a negative value.
       Selections:  String value
       Default:     Locale dependent

   Positive number prefix
       Meaning:     Specifies the character used before a number to denote a positive value.
       Selections:  String value
       Default:     Locale dependent

   Positive number suffix
       Meaning:     Specifies the character used after a number to denote a positive value.
       Selections:  String value
       Default:     Locale dependent




       Date-time




The format for date and time is free-form. The format entered in this dialog box indicates how the date-
time data type mapped from the Schema field will be represented in the Composite Type attribute.

The format can also be selected from the Tokens drop-down menu.

Again, this format applies to this particular mapping to the Composite Type attribute. If the Composite
Type is used with another Schema or Mapping, the date-time format can be specified differently in that
mapping to the attribute.

    3.    Complete fields related to the selected data type.

Specify Error Corrections
    1.    Format the data mappings.

    2.    Select the Error Corrections tab in the Edit Mapping dialog box opened for an individual mapping line in
          the Schema Editor.

    3.    Specify the Correction Actions and Correction Values for the errors associated with the particular data type
          mapping.
          The dialog box for each of the data type error corrections is shown below with a table explaining the errors
          and corrective actions.

         String

   String Conversion
       Meaning:            Occurs when the string data in the input field contains characters that cannot be converted.
       Corrective Action:  Escalate; Use Null; Use Default Value
       Result:             Escalate passes the string as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value.




      Number







Error conditions are evaluated in the order shown in the dialog box. When an error is found, the associated corrective
action is taken, and error evaluation stops at that point. Thus, the reported error might not be the only error in the
mapping conversion.


   Number Conversion
       Meaning:            Conversion of string data to a number format fails.
       Corrective Action:  Escalate; Use Null; Use Default Value
       Result:             Escalate passes the value as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value.

   Integer Truncation
       Meaning:            Conversion of string data with a decimal to an integer drops the digits after the decimal point.
       Corrective Action:  Escalate; Use Null; Use Default Value; Accept Truncation
       Result:             Escalate passes the value as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value. Accept Truncation uses the number regardless of truncation.

   String Conversion
       Meaning:            Conversion of string to or from a number fails.
       Corrective Action:  Escalate; Use Null; Use Default
       Result:             Escalate passes the value as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value.

   Numeric Overflow
       Meaning:            Conversion does not fit the representation of the primitive type.
       Corrective Action:  Escalate; Use Null; Use Default
       Result:             Escalate passes the value as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value.







      Date-time

   Date/Time Conversion
       Meaning:            Conversion of string to or from date-time format fails.
       Corrective Action:  Escalate; Use Null; Use Default Value
       Result:             Escalate passes the value as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value.

   String Conversion
       Meaning:            Conversion of string to or from a date-time format fails.
       Corrective Action:  Escalate; Use Null; Use Default Value
       Result:             Escalate passes the value as it is to the operator for error handling. Use Null replaces the
                           string data with a null value. Use Default replaces the string data with the string value
                           specified in Correction Value.
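The Escalate, Use Null, and Use Default actions repeated in the tables above follow one general pattern, sketched here in Python (illustrative only; the function and exception names are hypothetical, not expressor APIs): attempt the conversion, and on failure either escalate the original value, substitute null, or substitute the specified Correction Value.

```python
# Generic sketch of the Escalate / Use Null / Use Default corrective
# actions (illustrative Python, not expressor code).

class EscalatedError(Exception):
    """Raised so the original value can be passed on to error handling."""

def correct(text, convert, action="Escalate", default=None):
    try:
        return convert(text)
    except ValueError:
        if action == "Use Null":
            return None                   # replace the data with a null value
        if action == "Use Default Value":
            return default                # replace with the Correction Value
        raise EscalatedError(text)        # Escalate: pass the value as-is

print(correct("42", int))                             # 42 (no error)
print(correct("oops", int, "Use Null"))               # None
print(correct("oops", int, "Use Default Value", 0))   # 0
```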




Change Schema Mappings to Composite Types
Start by selecting a Delimited or Table Schema from the Explorer panel and double-click or use the right-click menu
to open it.


The Schema editor displays the Schema in the center panel of Studio.




           Assign a Composite Type to the Schema

           Create a New Local Composite Type

           Add a Composite Type to a Schema

           Remove a Composite Type

           Share a Local Composite Type




Assign a Composite Type to the Schema
A Schema has a Local Composite Type assigned to it by default. You can assign an existing Shared Composite Type
to the Schema instead of the Local Composite Type.

      1.    Click the Composite Type Add button on the Schema Edit tab in the ribbon bar.






    2.    Select Shared Type from the drop-down menu on the Add button.
          This displays the Type Selector dialog box.

    3.    Select an existing Shared Composite Type from the list in the dialog box.




Create a New Local Composite Type
A Schema can be mapped to multiple Local Composite Types. That is why the Local Type name is numbered, e.g.,
Local1.

    1.    Click the Composite Type Add button on the Schema Edit tab in the ribbon bar.

    2.    Select New Local Type from the drop-down menu on the Add button.
          The New Local Type name is listed on the Composite Type drop-down menu in the Composite Type column
          of the Schema editor.
          The new Local Composite Type contains the default attributes assigned in the original Local type created
          with the Schema.
          The default attribute mappings are displayed in the Mappings column.




Add a Composite Type from Another Operator
A single Schema can be mapped to multiple Local and Shared Composite Types. In addition to assigning a Shared
Type and creating a new Local Type, you can copy a Composite Type from an upstream operator to a Schema in an
Output Operator.

    1.    Select an Output Operator in a Dataflow to display its properties in Studio's right pane.

    2.    Click the button next to the Schema property drop-down list.

    3.    Select Add Upstream Type to Selected Schema.
          The selected Schema is the one currently displayed from the Schema property drop-down list.


The Add Type to Schema dialog box displays the name of the Composite Type in the upstream operator
immediately before the operator containing the Schema.

If the upstream Type is Local to the upstream operator, then the new Type for the selected Schema will be
Local.







If the upstream Type is Shared, then the new Type for the selected Schema will be a referenced copy, which means any
changes made to that Shared Type will be reflected in the referenced copy in the selected Schema.




Remove a Composite Type
      1.    Select a Composite Type in the Composite Types drop-down list.
            To remove a Composite Type, there must be more than one Type mapped to the Schema. If there is only one
            Type, it cannot be removed.

      2.    Click the Remove button on the Schema Edit tab in the ribbon bar.
            The selected Composite Type is removed from the drop-down list. If the selected Composite Type is Shared,
            it is removed from the list but is not deleted. It remains available in the project.

      Note: When a Composite Type that is mapped to a Schema is deleted from the project, it is not deleted
                 from the list of Composite Types available to the Schema. It continues to display in the Schema's
                 list of Composite Types, but it has a line struck through its name to indicate that it is no longer
                 available as a Type.

           If the selected Composite Type is Local, it is removed from the drop-down list and is deleted. It cannot be
           retrieved.

Share a Local Composite Type
      1.    Select a Local Composite Type in the Composite Types drop-down list.

      2.    Click the Convert Local to Shared button on the Schema Edit tab in the ribbon bar.
            The Share Local Composite Type dialog box is displayed.

      3.    Select the Project in which to place the Composite Type.

      4.    Name the Composite Type.


Change Composite Type Attributes for a Schema
Attributes for a Composite Type that is mapped to a Schema can be changed within the Schema editor if the
Composite Type is a Local Type and thus "owned" by the Schema. If the Composite Type being mapped is Shared,
then it can only be changed in a standalone Composite Type editor. This topic describes how to change a Local
Composite Type in the Schema editor.

Start by selecting a Delimited or Table Schema from the Explorer panel and double-click or use the right-click menu
to open it.







The Schema editor displays the Schema in the center panel of Studio.




          Change an Attribute


          Assign a Shared Atomic Type to an Attribute


          Share an Attribute's Local Atomic Type




Change an Attribute
    1.     Select an attribute from the list of attributes for a Local Composite Type.

    2.     Add, edit, or delete the attribute by clicking the appropriate buttons in the Attributes section of the ribbon
           bar.
           If you click the Edit button, the Edit Attribute dialog box is displayed.




                  a.   Change the name of the attribute.

                  b.   Specify whether or not the attribute is Nullable.
                       If you select Not Nullable, check the Default to box and specify the value to use in a data field that
                       is null.

                  c.   Select the data type from the drop-down list in the Semantic Type section.
                       If the Semantic Type is Shared, the Semantic Type section will be grayed out because you cannot
                       change the Type from within the Schema Editor.







               d.   Specify Constraints and error corrections, if applicable.
                    Again, if the Semantic Type is Shared, the Semantic Type section, including the Constraints, will be
                    grayed out.




Assign a Shared Atomic Type to an Attribute
Composite Type attributes are assigned Atomic Types in the expressor Studio 3.2 version.

      1.   Select an attribute from the list of attributes for a Local Composite Type.

      2.   Click the Assign Type button in the Attributes section of the ribbon bar.
           The Type Selector dialog box is displayed.

      3.   Select an existing Shared Atomic Type from the list in the dialog box.




Share an Attribute's Local Atomic Type
      1.   Select an attribute from the list of attributes for a Local Composite Type.

      2.   Click the Convert Local to Shared button in the Attributes section of the ribbon bar.
           The Share Local Semantic Type dialog box is displayed.

      3.   Select the Project in which to place the Atomic Type.

      4.   Name the Atomic Type.




Parameter Management

What Are Parameters?
Parameters are the settings given to Operator Properties in Studio's Properties panel. Parameters include settings
explicit in the Properties panel, such as Error handling, and settings contained within artifacts specified in the
Properties panel, such as the directory path contained in a File Connection.

When dataflows are being developed, parameters are usually set for the development environment, but when the
dataflow is tested and later put into production, it runs in different environments. Many of the parameter settings in
the development environment are not relevant in the other environments.

Parameter management provides a way to handle those environment changes without opening the dataflow and
resetting the operator properties. Parameters are managed at the command line with the eflowsubst command
and special options on the etask command.

Parameter values can be set in two ways:

    1.   With the -P argument to the etask command

    2.   With a substitution file created by the eflowsubst command

Priority for setting parameter values is in the order listed. Setting a parameter with the etask command when a
dataflow is run supersedes any other settings.
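This precedence can be pictured as a simple override chain (a generic Python sketch; the parameter names are taken from the Supported Parameters list, but the merge itself is illustrative, not expressor's implementation): substitution-file values override development defaults, and -P arguments on the etask command line override everything.

```python
# Illustrative precedence sketch (not expressor code): later sources
# in the merge override earlier ones.

development_defaults = {"fileName": "dev-data.txt", "skipRows": "0"}
substitution_file    = {"fileName": "test-data.txt"}
etask_p_arguments    = {"fileName": "Customers.txt"}

# Merge in increasing order of priority; -P wins for fileName,
# while skipRows keeps its development default.
effective = {**development_defaults, **substitution_file, **etask_p_arguments}
print(effective["fileName"])   # Customers.txt
print(effective["skipRows"])   # 0
```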


Substitute Parameters in a Dataflow
Parameters can be substituted in a dataflow in one of two ways:

    1.   With the command that runs a dataflow

    2.   With a substitution file created by the eflowsubst command

Parameters specified on the etask command line are specified individually.

Parameters are set as name=value pairs. The names of the parameters are listed in Supported Parameters.

Priority for setting parameter values is in the order listed above. Setting a parameter with the etask command when
a dataflow is run supersedes any other settings.

Substitute Parameters on the Command Line
Dataflows are run from the command line with the etask command. In addition to specifying the dataflow to run,
you can specify a name-value pair with the -P argument to substitute individual parameters.

    1.   Open an expressor Command Prompt window from the Start menu.
         On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

    2.   In the expressor command prompt window, change your working directory to the project directory.
         If you do not change your working directory to the project directory, you can use the -D argument on the
         etask command line with the path to the project directory.







      3.    Type the etask command:

                etask -x dataflow_name -P fileName=input-file.txt

           For example:

                etask -x Sample_Dataflow.rpx -P fileName=Customers.txt



Create a Substitution File
Substitution files are XML files created by running the eflowsubst command with a dataflow. The substitution file
created contains the parameter values set during development of the dataflow. Those are called the default
parameter values. You can then use the files to set different parameters for the multiple environments in which the
dataflow will be run.

Edit the file containing the default parameter values to change the values for the different environments. The
substitution file can be edited with standard XML and text editors.

If you intend to run the dataflow in multiple environments, create a separate substitution file for each
environment by running the eflowsubst command multiple times to produce files with the default values. Then
change each file according to the environment in which you intend to use it.

The substitution files are supplied to the dataflow at runtime to reset the parameters for the environment in which the
dataflow is running.

      1.    Open an expressor Command Prompt window from the Start menu.
            On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

      2.    In the expressor command prompt window, change your working directory to the project directory.

      3.    Locate the Project containing the target dataflow .RPX file.
            The directory path to the Project must be supplied to the eflowsubst command when the parameter file is
            generated.

      4.    Run the eflowsubst command with the following options:

                eflowsubst -p project_name -V project_version_number

           The project_name variable supplied to the -p option must contain the full path to the project containing
           the target dataflow. The -V option specifies the version number of the project.

Run a Dataflow with a Substitution File
      1.    Open an expressor Command Prompt window from the Start menu.
            On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

      2.    In the expressor command prompt window, change your working directory to the project directory.
            If you do not change your working directory to the project directory, you can use the -D argument on the
            etask command line with the path to the project directory.







    3.      Type the etask command:

               etask -x dataflow_name -D deployment_package_pathname




Supported Parameters
The following list contains the names used in the parameter values file generated by the eflowsubst command. It
also indicates the operator that contains the property in which the parameter is specified.


     Operator                  Parameter Name


Read File               fileName


Read File               quotes


Read File               skipRows


Read File               errorHandling


Read File               showErrors


Write File              fileName


Write File              quotes


Write File              includeHeader


Write File              appendToOutput


Write File              attemptTimestampToFilename


Read Table              showErrors


Read Table              overrideDatabase


Read Table              overrideSchema


Read Table              overrideTable


Write Table             mode







Write Table          truncate


Write Table          createMissingTable


Write Table          maximizeBatchSize


Write Table          errorHandling


Write Table          showErrors


Write Table          overrideDatabase


Write Table          overrideSchema


Write Table          overrideTable


SQL Query            errorHandling


SQL Query            showErrors


Aggregate            errorHandling


Aggregate            showErrors


Join                 errorHandling


Join                 showErrors


Join                 workingConnectionPath


Join                 workingConnectionHost


Transform            errorHandling


Transform            showErrors


Buffer               workingConnectionPath


Buffer               workingConnectionHost


Sort                 workingConnectionPath


Sort                 workingConnectionHost







Sort                workingMemory


Unique              workingConnectionPath


Unique              workingConnectionHost


File Connection     Path


File Connection     Host


DB2 Connection      ipaddress


DB2 Connection      port


DB2 Connection      db


DB2 Connection      Credential


DSN Connection      dsn


DSN Connection      uid


DSN Connection      pwd


Sybase Connection   na


Sybase Connection   db


Oracle Connection   host


Oracle Connection   port


Oracle Connection   servicename


Teradata            DBCName
Connection


Teradata            port
Connection


Teradata            db
Connection


BaseODBC            server







BaseODBC             port


BaseODBC             db


BaseODBC             DatabaseType




Semantic Types

Semantic Types

What Are Semantic Types?


Mapping Schema to Semantic Types


Flexibility of Semantic Types




What Are Semantic Types?
Semantic Types define the data that flows through an expressor Dataflow and provide a consistent type model for
one or more external sources of data. Semantic Types allow you to define "smart" types that include constraints and
well-defined rules about how they may or may not be transformed. This enables applications to ensure data
consistency and integrity early in the application and to rely on the data represented by the Type elsewhere in the
application.

There are two kinds of Semantic Types: Atomic Types and Composite Types.

        An Atomic Type defines the type for a single unit of data that cannot be decomposed any further, such as a
         field in a record. Accordingly, it is associated with a data type such as string, integer, or decimal.

        A Composite Type defines the type for an aggregate or hierarchical data structure. It is primarily defined by
         a set of Attributes, each of which is assigned a Semantic Type. The Semantic Type defined for an Attribute
         can be an Atomic Type or another Composite Type. A Composite Type Attribute uses another Composite
         Type when the record contains records nested within it.

    Note: In the expressor Studio 3.4 version, Composite Type attributes can use Atomic Types only.
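To make the distinction concrete, the two kinds of Types can be sketched as plain data structures. This is an illustrative model only; the class names are hypothetical and are not part of the expressor API:

```python
from dataclasses import dataclass, field
from typing import Union, List

@dataclass
class AtomicType:
    """A single, indivisible unit of data with an underlying data type."""
    name: str
    data_type: str  # e.g. "string", "integer", "decimal"

@dataclass
class Attribute:
    """An attribute of a Composite Type, typed by a Semantic Type."""
    name: str
    semantic_type: Union[AtomicType, "CompositeType"]

@dataclass
class CompositeType:
    """An aggregate structure defined by a set of typed Attributes."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)

# An Atomic Type models a single field; a Composite Type models the record.
ssn = AtomicType("SocialSecurityNumber", "string")
customer = CompositeType("Customer", [
    Attribute("ssn", ssn),
    Attribute("name", AtomicType("CustomerName", "string")),
])
print(customer.attributes[0].semantic_type.name)  # SocialSecurityNumber
```

Per the note above, in Studio 3.4 an Attribute's Semantic Type would always be an AtomicType; the nested CompositeType case is shown for completeness.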



The view on the right of the Schema Editor demonstrates how data from an external source, represented by a Schema
on the left, maps to a Composite Type and to the Atomic Types in the Composite Type's attributes.

Semantic Types are fundamental to an expressor application because they define the structure and constraints of the
data that moves through an expressor Dataflow.




An Atomic Type is used in the following contexts:

         In a Composite Type
          An Atomic Type specifies the type of an Attribute in a Composite Type.

A Composite Type is used in the following contexts:

         In a Composite Type
          A Composite Type specifies the type of an Attribute in a Composite Type such as when the containing
          Composite Type represents hierarchical data and the attribute represents nested data (see Note above).

         In a Schema
          A Composite Type is mapped to a Schema as needed to specify how external data is represented within an
          expressor Dataflow and vice versa. Using the mapping, an Input Operator can convert the external data
          represented by the Schema to the Semantic Type that will flow to a downstream operator. Using the
          mapping, an Output Operator can convert the internal data represented by the Composite Type to the
          Schema as needed to write the data externally.

         In all Operators
          A Composite Type defines the data that flows out of an Input Operator after being read from the external
          system according to the Schema, through all interior Operators such as Transformer Operators, and into an
          Output Operator before being written to the external system according to the Schema.

A Semantic Type may be Local or Shared. Generally speaking, a Local Semantic Type is defined within another artifact
or object that uses it and as such, is not reusable. That is, a Local Semantic Type cannot be used by another artifact. A
Shared Semantic Type stands on its own as an artifact, is referenced by name, and is reusable throughout a Project
and its related Libraries.

Both Atomic and Composite Types can be stored in an expressor Repository, where they are placed under version
control and can be shared with other users. To be stored in a Repository, the Types must be in a Repository
Workspace.




Mapping Schema to Semantic Types
Semantic Types are mapped to the fields of a Schema in the Schema's mappings. Schemas are mapped to Semantic
Types to create consistency in data derived from multiple sources. When two or more fields in different source/target
systems should be processed in the same way because they are semantically equivalent, using a single Semantic Type
to represent them simplifies how the sources and targets are mapped to one another while also ensuring that the
constraints for processing that data are consistent across applications. For example, you can map similar data (e.g.,
Social Security number, street address) from multiple sources, regardless of differences in external name and types, to
a single Semantic Type so that the application can process them consistently.

Mapping a Schema to a Semantic Type reduces the amount of transformation programming while preserving the
original mapping so that the Schema can be reconstituted on output. This simple capability allows a developer to use
a single string type, named for example "SocialSecurityNumber," to represent a Social Security number in two
different database columns (e.g., CUST_NO and customerID) of differing types (e.g., varchar and integer). After
defining the new type once, you only need to associate this semantic type with the data being read or written, and
the expressor system does the rest, including automatic type conversions and validations.

An Atomic Type can be reused for other fields in the same Schema mapping. For example, fields such as employee
number and manager number may be distinct, but the semantics of their usage or type can be the same. Processing
them as a single Semantic Type simplifies the application, ensures that they adhere to the same constraints, increases
reusability, improves quality, and emphasizes their semantic equivalence.
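The reuse described above can be illustrated with a small sketch: one shared constraint validates every field assigned the same Semantic Type. The field names and the six-digit format below are hypothetical examples, not expressor APIs:

```python
import re

# Hypothetical shared Atomic Type: a Regular Expression constraint
# assumed to require exactly six digits.
EMPLOYEE_NUMBER = re.compile(r"^\d{6}$")

record = {"employee_number": "104233", "manager_number": "100017"}

# Distinct fields, one Semantic Type: both are validated by the same rule,
# so they are guaranteed to adhere to the same constraints.
for field_name in ("employee_number", "manager_number"):
    if not EMPLOYEE_NUMBER.match(record[field_name]):
        raise ValueError(f"{field_name} violates the shared constraint")
print("all fields satisfy the shared constraint")
```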

In the context of a Schema mapping, a Composite Type defines the type that maps to the top-level record of the
Schema and may also be used to define the type that maps to nested records within a Schema. Mappings from
Schema fields to Composite Type Attributes are defined with the Schema editor and are stored within the Schema. A
Schema may define many Mappings to the same or different Composite Types. But when configuring a Schema for
use within an Input or Output Operator, exactly one Mapping must be chosen for use.




Flexibility of Semantic Types
Semantic Types may be created within the context where they are used (see list above) or they may be defined as
standalone artifacts that can be referenced from the context where they are to be used. When created within the
context where they are used, they are known as Local Types. When created as standalone reusable artifacts, they are
known as Shared Types. As indicated above, the key distinction is that Local Types can only be used within the context
where they are defined while Shared Types can be reused across contexts within the same application or across
applications. Local Types created within a particular context, such as the Schema editor, can be Shared. They do not
have to remain Local simply because they were created within a particular context. The Schema editor provides
means for sharing a Local Type.

Local and Shared Atomic Types set the data type (string, integer, datetime, double, decimal, or byte) handled by
Composite Type Attributes, and they place Constraints (Maximum Length, Minimum Length, Regular Expression,
Allowed Values, Maximum Value, Minimum Value, Maximum Digits before Decimal, Maximum Digits after Decimal,
and Maximum Significant Digits) on the data. Data that violates constraints can be handled with predetermined
Corrective Actions, which are set by the user when defining an Atomic Type.

With Constraints, you can build data-cleansing applications or filter out incoming data that does not meet the
specified values.
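As a sketch of how a Constraint and its Corrective Action interact, consider a Maximum Length constraint whose Corrective Action is Default with a specified Correction Value. The function and values below are illustrative, not expressor's implementation:

```python
def apply_max_length(value, max_length=9, correction_value="UNKNOWN"):
    """Illustrative only: when the Maximum Length constraint is violated,
    the 'Default' Corrective Action substitutes the Correction Value."""
    if len(value) > max_length:
        return correction_value
    return value

print(apply_max_length("123456789"))      # within the limit: passes through
print(apply_max_length("1234567890123"))  # too long: replaced by "UNKNOWN"
```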

When Shared Types are changed, their settings for data type and constraints then apply to all fields to which the Type
is assigned.

You can change the Semantic Type assigned to an Attribute. If, for example, a constraint in a certain Semantic Type is
not appropriate for the Attribute, you can assign another Type that more accurately represents data mapped to that
Attribute. You can also change the constraints on the Semantic Type, and that change will apply to all other records
or fields that are assigned that Semantic Type.







Create an Atomic Type
      1.   Select Create Atomic Type... on the New Type button on the Home tab of the ribbon bar.

      2.   Name the Type in the New Type dialog box.

      3.   Select the Project or Library in which to place the new Type.

      4.   Provide a description of the purpose of the Atomic Type (optional).
           The new Type is displayed in the Studio center panel.




Create a Composite Type
There are two ways to create Composite Types, depending on the context in which you are working. You can create a
Composite Type independently, or you can create one from within the Schema Editor.

Composite Types are also created automatically as you connect and configure operators. See Set Semantic Types for
Operators.

Create a Composite Type Independently
      1.   Select Create Composite Type... on the New Type button on the Home tab of the Studio ribbon bar.

      2.   Name the Type in the New Type dialog box.

      3.   Select the Project or Library in which to place the new Type.

      4.   Provide a description of the purpose of the Composite Type (optional).
           The new Type is displayed in the Studio center panel.

Create a Composite Type in the Schema Editor
      1.   Open a Schema artifact.

      2.   Select New Local Type from the Add button's drop-down menu on the Schema Edit tab of the ribbon bar.
           When the new Local Composite Type is created, it has attributes that map to the Schema's fields.

      3.   Select the Add or Edit button from the Attributes section of the Schema Edit tab on the ribbon bar.
           The Delete button in the Editing section of the ribbon bar can be used to remove attributes from the
           Composite Type.

                   As a reference assigns the input as a Shared Type.

                   As a Local Type creates a new Local Type.







Select the Data Type and Constraints for an Atomic Type
    1.   Select an Atomic Type from the Explorer panel and double-click or use the right-click menu to open it.


         The Atomic Type editor displays the Type in the center panel of Studio.


    2.   Select the Data type from the drop-down menu.

    3.   Add Constraints by selecting the check box next to the constraints to be set.
         See Reference: Constraints on Semantic Types for explanations of each Constraint and expression of
         Constraint and Correction values.

    4.   Enter the desired Constraint value for each constraint being set.

    5.   Select a Corrective Action for each constraint being set.
         See Reference: Constraint Corrective Actions.

    6.   Specify a Correction Value when the selected Correction Action is Default.

    7.   Save the changes with the Save icon on the Quick Access Toolbar.


Add Attributes to a Composite Type
There are four environments in which you can add attributes to a Composite Type.

        Add an Attribute in the Composite Type Editor

        Add an Attribute for Mapping Schemas to Types in the Schema Editor

        Add an Attribute for Mapping input to output Composite Types in the Rules Editor

        Import Attributes for Mapping input to output Composite Types in the Rules Editor

Add an Attribute in the Composite Type Editor
    1.   Select a Composite Type from the Explorer panel and double-click or use the right-click menu to open it.

    2.   Click the Add button on the Type Edit tab on the ribbon bar.

    3.   Name the new Attribute in the Add Attribute dialog box.

    4.   Select whether or not the attribute is Nullable.
         If you select Not Nullable, check the Default to box and specify the value to use in a data field that is null.

    5.   Select the data type from the drop-down list.
         The data type is a property of the Semantic Type assigned to the attribute, and the drop-down list of data
         types displays within the Semantic Type section of the dialog box.






      6.   Specify Constraints and recovery actions, if applicable.
           See Reference: Constraints on Semantic Types.

Add an Attribute for Mapping a Schema to a Type in the Schema Editor
      1.   Select a Schema from the Explorer panel and double-click or use the right-click menu to open it, or create a
           new Schema by selecting the New Schema button on the Home tab of the ribbon bar and choosing either
           Delimited Schema or Table Schema.

      2.   Verify that the Composite Type is Local.
           If the Composite Type is a Shared rather than a Local Type, you cannot add attributes in the Schema editor.
           To add attributes to a Shared Type, you must use the Composite Type Editor. To open the Composite Type
           Editor, select the Shared Composite Type in the Explorer panel and double-click or use the right-click menu
           to open it.

      3.   Click the Add button in the Attributes section of the Schema Edit tab on the ribbon bar.

      4.   Name the new Attribute in the Add Attribute dialog box.

      5.   Select whether or not the attribute is Nullable.
           If you select Not Nullable, check the Default to box and specify the value to use in a data field that is null.

      6.   Select the data type from the drop-down list.
           The data type is a property of the Semantic Type assigned to the attribute, and the drop-down list of data
           types displays within the Semantic Type section of the dialog box.

      7.   Specify Constraints and recovery actions, if applicable.
           See Reference: Constraints on Semantic Types.

Add an Attribute for Mapping input Types to output Types in the Rules Editor
      1.   Select the Add button from the Output Attributes section of the Rules Editor's ribbon bar.

      2.   Name the new Attribute in the Add Attribute dialog box.

              Note: The names used for Attributes are case-sensitive. When attributes move through a dataflow,
                    they are matched automatically with attributes that have the same name. If an input attribute
                    is named "foo" and the user creates an output attribute named "Foo," the two attributes will
                    not be mapped to one another automatically. They can be mapped to each other manually
                    through a rule, but all other properties of the attributes, such as data type and constraints,
                    must match.

      3.   Select whether or not the attribute is Nullable.
           If you select Not Nullable, check the Default to box and specify the value to use in a data field that is null.

      4.   Select the data type from the drop-down list.
           The data type is a property of the Semantic Type assigned to the attribute, and the drop-down list of data
           types displays within the Semantic Type section of the dialog box.

      5.   Specify Constraints and Corrective Actions, if applicable.
           See Reference: Constraints on Semantic Types.
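The case-sensitive, name-based matching described in the note above can be demonstrated with a short sketch (the attribute names are arbitrary examples):

```python
input_attrs = ["foo", "bar", "Baz"]
output_attrs = ["Foo", "bar", "Baz"]

# Attributes map automatically only on an exact, case-sensitive name match.
auto_mapped = [name for name in output_attrs if name in input_attrs]
unmapped = [name for name in output_attrs if name not in input_attrs]

print(auto_mapped)  # ['bar', 'Baz']
print(unmapped)     # ['Foo'] -- "foo" vs "Foo": must be mapped manually via a rule
```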







Import Attributes for Mapping input Types to output Types in the Rules Editor
    1.   Select the Import button from the Output Attributes section of the Rules Editor's ribbon bar.

    2.   Select Shared Types or Local Types in the Import Attributes dialog box.

    3.   Select the Project or Library from the drop-down list on the next screen of the Import Attributes dialog box.

    4.   Select a Composite Type from the list for the selected Project or Library.
         All the attributes from the selected Composite Type are imported to output attributes displayed in the Rules
         Editor.


Map Types in the Rules Editor
Transformation Operators such as Transform change the structure of data between input and output. In such cases,
the attributes of the output Composite Type can be different from the attributes of the input Composite Type, so the
attributes must be mapped to one another. That mapping is performed in the Rules Editor.

Before mapping can be performed, the transformation operator must be connected to the upstream operator from
which it will receive input.

Composite Type attributes for output cannot be modified if they have been propagated from the input attributes or
if they have been propagated upstream from a downstream operator. Only attributes that have been created for the
operator, so-called "manufactured" attributes, can be changed in the Rules Editor.

    1.   Select an Operator that requires the Rules Editor to specify the data transformation.
         The Operators that currently use the Rules Editor to map Composite Type attributes are Transform, Join,
         and Aggregate.

    2.   Ensure the selected Operator is connected to the upstream operator or operators.
         It is this connection that provides the input attributes.

    3.   Click the Edit Rules button in the ribbon bar or the Properties panel.
         When the Rules Editor opens, the input Composite Type attributes display on the left, and by default, the
         same attributes are used for output. The output attributes are inherently mapped to the input attributes and
         so will continue downstream as is unless they are explicitly modified with a Rule.

    4.   Select an output attribute from a downstream operator, if one exists in the output attributes, and drag a line
         to an input attribute.

    5.   Add attributes to the output as required by the transformation.
         See Add an Attribute for Mapping input Types to output Types in the Rules Editor and Import Attributes for
         Mapping input Types to output Types in the Rules Editor.

              Note: The names used for Attributes are case-sensitive. When attributes move through a dataflow,
                    they are matched automatically with attributes that have the same name. If an input attribute
                    is named "foo" and the user creates an output attribute named "Foo," the two attributes will
                    not be mapped to one another automatically. They can be mapped to each other manually
                    through a rule, but all other properties of the attributes, such as data type and constraints,
                    must match.







      6.    Map the input Composite Type attributes to the new output attributes by selecting individual input
            attributes and dragging the cursor to the intended output attribute.




Change Semantic Types

Share a Semantic Type


Assign a Shared Composite Type


Assign a Semantic Type to a Composite Type Attribute


Edit a Semantic Type


Change the Output Type in Transformation Operators




Share a Semantic Type
Local Semantic Types can be changed to Shared Semantic Types. Once a Semantic Type is shared, it is listed as a
Type in the Explorer. It can then be reused in the project in the various contexts where a Semantic Type is assigned.
Both the Composite Type editor and Schema editor provide means to share Local Types.

Share an Atomic Type from the Composite Type Editor
      1.    Select a Composite Type from the Explorer panel and double-click or use the right-click menu to open it.
            The Composite Type editor displays the Type in the center panel of Studio.




      2.    Select an Attribute from the list in the Composite Type editor.

      3.    Click the Edit button on the Type Edit tab in the ribbon bar.
            The Edit Attribute dialog box opens.








   4.   Select the Share Type option from the Change Type drop-down menu on the right side of the dialog box.

   5.   Select from the list of Atomic Types in the Select Shared Type dialog box.

Share a Composite Type from the Schema Editor
   1.   Select a Schema from the Explorer panel and double-click or use the right-click menu to open it, or create a
        new Schema by selecting the New Schema button on the Home tab of the ribbon bar and choosing either
        Delimited Schema or Table Schema.

   2.   Select a Local Composite Type in the Composite Types drop-down list.

   3.   Click the Convert Local to Shared button in the Composite Type section of the Schema Edit tab on the
        ribbon bar.
        The Share Local Composite Type dialog box is displayed.

   4.   Select the project in which to place the new Composite Type.

   5.   Name the Composite Type in the Share Local Composite Type dialog box.

   6.   Save the changes with the Save icon on the Quick Access Toolbar.

Share an Atomic Type from the Schema Editor
   1.   Select a Schema from the Explorer panel and double-click or use the right-click menu to open it, or create a
        new Schema by selecting the New Schema button on the Home tab of the ribbon bar and choosing either
        Delimited Schema or Table Schema.

   2.   Verify that the Composite Type is Local.
        If the Composite Type is a Shared rather than a Local Type, you cannot change its attributes, including
        Atomic Type, in the Schema editor. To change an attribute's Atomic Type to a Shared Type, you must use the
        Composite Type Editor. To open the Composite Type Editor, select the Shared Composite Type in the
        Explorer panel and double-click or use the right-click menu to open it.

   3.   Select an attribute from the list of attributes in the Schema editor.

   4.   Click the Edit button in the Attributes section of the ribbon bar.
        The Edit Attribute dialog box opens.








      5.   Select the Share current Local Type option from the Change Type drop-down menu on the right side of
           the dialog box.

      6.   Select the project in which to place the new Atomic Type.

      7.   Name the Atomic Type.




Assign a Semantic Type to a Composite Type Attribute
The Semantic Type of a Composite Type Attribute can be changed by assigning another Semantic Type. This applies
whether the currently assigned Semantic Type is Local or Shared.

      Note: To change a Composite Type Attribute in the Schema Editor or the Rules Editor, the Composite
               Type must be Local. A Shared Composite Type can be changed only within the Composite Type
               Editor.

      1.   Open a Composite Type in one of the following ways:

                   Select a Composite Type from the Explorer panel and double-click or use the right-click menu to
                    open it.

                   Select a Schema from the Explorer panel and double-click or use the right-click menu to open it, or
                    create a new Schema by selecting the New Schema button on the Home tab of the ribbon bar and
                    choosing either Delimited Schema or Table Schema.

      2.   Select an attribute from the list in the Composite Type display.

      3.   Click the Edit button in the Attributes section on the ribbon bar.

      Note: In the expressor Studio 3.4 version, only Atomic Types can be assigned to attributes.

      4.   Select the Shared Type option from the Change Type drop-down menu on the right side of the dialog box.

      5.   Select an existing Shared Atomic Type from the list in the Select Shared Type dialog box.

      6.   Save the changes with the Save icon on the Quick Access Toolbar.







Edit a Semantic Type
Composite Types and Atomic Types can be edited within their respective Type editors and from within the Schema
Editor and the Rules Editor.

    Note: To change a Composite Type Attribute in the Schema Editor, the Composite Type must be Local. A
              Shared Composite Type can be changed only within the Composite Type Editor.

Edit a Composite Type
    1.   Open a Composite Type in one of the following ways:

                 Select a Composite Type from the Explorer panel and double-click or use the right-click menu to
                  open it.

                 Select a Schema from the Explorer panel and double-click or use the right-click menu to open it, or
                  create a new Schema by selecting the New Schema button on the Home tab of the ribbon bar and
                  choosing either Delimited Schema or Table Schema.

    2.   Perform any of the following actions.

                 Add an attribute to the Composite Type by clicking the Add button on the Type Edit tab in the
                  ribbon bar.


                   Adding an attribute entails giving it a name, specifying whether or not it is Nullable (can hold a
                   null value), and selecting a data type and constraints for the Semantic Type assigned to it. The
                   data type is a property of the Semantic Type assigned to the attribute, and the name of the
                   Semantic Type is displayed above the data type and constraints.


                  Edit an attribute of the Composite Type by selecting an attribute and clicking the Edit button on
                   the Type Edit tab in the ribbon bar.


                   When editing an attribute, you can change its name, whether or not it is Nullable, and the data
                   type, constraints, and error corrections on the Semantic Type assigned to the attribute.




                  Delete an attribute from the Composite Type by selecting an attribute and clicking the Delete
                   button on the Type Edit tab in the ribbon bar.







                   Share a Local Atomic Type by selecting an attribute and clicking the Edit button on the Type Edit
                    tab in the ribbon bar and selecting Share current Local Type from the Change Type drop-down
                    menu on the right side of the dialog box.
                     Sharing an Atomic Type adds it to the list of Shared Types in the project.

                   Assign an existing Atomic Type to an attribute by selecting an attribute and clicking the Edit button
                    on the Type Edit tab in the ribbon bar and selecting Shared Type from the Change Type drop-
                    down menu on the right side of the dialog box.

      3.   Save the changes with the Save icon on the Quick Access Toolbar.

Edit an Atomic Type in the Atomic Type Editor
      1.   Select an Atomic Type from the Explorer panel and double-click or use the right-click menu to open it.
           The Atomic Type editor displays the Type in the center panel of Studio.




      2.   Select a data type from the drop-down list.

      3.   Specify constraints and error corrections, if applicable.




Change the Output Type in Transformation Operators
Once Semantic Types for output have been set in a Dataflow's transformation operators, you can reset them if
necessary. Open the operator in the Rules Editor and do either of the following:

          Add an output attribute.

          Import attributes from existing Shared or Local Composite Types.

See Summary of Propagation Rules for a complete explanation of the rules for setting Semantic Types in operators.

These actions change the Composite Type attributes assigned to the output. To change the Composite Types
themselves, see Assign a Semantic Type to a Composite Type Attribute and Edit a Semantic Type above.







Reference: Constraint Pattern Matching
You can use pattern matching in defining a constraint to be applied to an Atomic Type.


Character Class


Pattern Item


Pattern


Sub-expressions


Alternation

Character Class
A character class is used to represent a set of characters. The following character combinations are allowed when
describing a character class.

         Any keyboard character represents itself.

         The characters .[{}()\*+?|^$ are "magic" characters that cannot directly represent themselves. These
          characters must be escaped with a preceding \ character.

         A dot . represents all characters.

         Placing the \ escape character before any ordinary character is undefined.

         The combination \w or [[:word:]] represents all alphanumeric characters and the underscore.

                  The combination \W or [^[:word:]] represents the complement of \w.

         The combination [[:cntrl:]] represents all control characters.

                  The combination [^[:cntrl:]] represents the complement of [[:cntrl:]].

         The combination \d or [[:digit:]] represents all digits.

                  The combination \D or [^[:digit:]] represents the complement of \d.

         The combination \l or [[:lower:]] represents all lower case letters.

                  The combination \L or [^[:lower:]] represents the complement of \l.

         The combination [[:punct:]] represents all punctuation characters.

                  The combination [^[:punct:]] represents the complement of [[:punct:]].

         The combination \s or [[:space:]] represents all space characters.

                  The combination \S or [^[:space:]] represents the complement of \s.

         The combination \u or [[:upper:]] represents all upper case letters.

                  The combination \U or [^[:upper:]] represents the complement of \u.

         The combination [[:xdigit:]] represents all hexadecimal digits.

                  The combination [^[:xdigit:]] represents the complement of [[:xdigit:]].







         The combination [[:unicode:]] represents an extended character whose code point is above 255.

         The combination \x, where x is a "magic" character, represents the character x.

                   Use this combination to represent the "magic" characters in a character class.
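Several of the backslash combinations above also exist in Python's re module, which can be used to experiment with them. Note that Python's re does not accept the POSIX-style [[:digit:]] forms or \l/\u, so only the overlapping syntax is shown; expressor's full syntax may differ:

```python
import re

assert re.fullmatch(r"\d", "7")       # \d matches any digit
assert re.fullmatch(r"\w", "_")       # \w matches alphanumerics and the underscore
assert not re.fullmatch(r"\w", "-")   # a hyphen is not a word character
assert re.fullmatch(r"\s", " ")       # \s matches space characters
assert re.fullmatch(r".", "x")        # an unescaped dot matches any character
assert re.fullmatch(r"\.", ".")       # a "magic" character escaped with a backslash
print("character class examples pass")
```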

Set
A character class that represents a union of characters is indicated by enclosing the characters in square brackets;
[...] is referred to as a set.

You can specify a range of characters in a set by separating the end characters of the range with a dash. All character
combinations in the preceding list can also be used as components in a set. All other characters in the set represent
themselves.

For example:

              [\w] represents all alphanumeric characters plus the underscore

              [a-z] represents the lower-case characters

              [[:lower:]] represents the lower-case characters

The complement of a set is represented by including the ^ character at the beginning of the set. For example,
[^1-5] represents the complement of the set [1-5].

For all character combinations, the corresponding combination that uses an uppercase letter represents the
complement of the combination. For example, \L represents the complement of \l: every character that is not a
lowercase letter.
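The set syntax above can be exercised with any conventional regex engine. The sketch below uses Python's re module as a stand-in, not expressor's own engine; Python does not support the POSIX [[:class:]] names or the \l / \u shorthands, so equivalent ASCII ranges are used.

```python
import re

# Stand-in demo using Python's re (not expressor's engine). Python lacks
# POSIX [[:class:]] names and the \l / \u shorthands, so ranges are used.

# A set matches any single character it contains.
assert re.fullmatch(r"[abc]", "b")

# A range inside a set: [a-z] covers the lowercase letters.
assert re.fullmatch(r"[a-z]", "q")
assert not re.fullmatch(r"[a-z]", "Q")

# Combinations can appear inside a set: [\w] is alphanumerics plus underscore.
assert re.fullmatch(r"[\w]", "_")

# ^ at the start of a set complements it: [^1-5] is anything outside 1-5.
assert re.fullmatch(r"[^1-5]", "7")
assert not re.fullmatch(r"[^1-5]", "3")
```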




Pattern Item
A pattern item can be represented by:

         A single character class, which matches any single character in the class.

         A single character class followed by *, which matches zero or more repetitions of characters in the class.
          These repetition items always match the longest possible sequence.

         A single character class followed by +, which matches 1 or more repetitions of characters in the class. These
          repetition items always match the longest possible sequence.

         A single character class followed by ?, which matches zero or 1 occurrence of a character in the class.

         A single character class followed by {n} matches that character class repeated exactly n times.

         A single character class followed by {n,} matches that character class repeated n or more times.

         A single character class followed by {n,m} matches that character class repeated between n and m times
          inclusively.
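The repetition behavior described in the list above follows standard regex semantics, so it can be illustrated with Python's re module (used here only as a stand-in for expressor's engine):

```python
import re

# Illustrative sketch using Python's re module (not expressor's own engine).

assert re.fullmatch(r"ab*c", "ac")        # * allows zero repetitions
assert re.fullmatch(r"ab*c", "abbbc")     # ...or many
assert not re.fullmatch(r"ab+c", "ac")    # + requires at least one
assert re.fullmatch(r"ab?c", "abc")       # ? allows zero or one
assert not re.fullmatch(r"ab?c", "abbc")
assert re.fullmatch(r"a{2,3}", "aaa")     # {n,m} bounds the repetition count
assert not re.fullmatch(r"a{2,3}", "aaaa")

# Repetition is greedy: it takes the longest possible sequence.
m = re.match(r"a+", "aaab")
assert m.group() == "aaa"
```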




Pattern
A pattern is a sequence of pattern items:

             ^ at the beginning of a pattern anchors the match at the beginning of the subject string.

             $ at the end of a pattern anchors the match at the end of the subject string.

A pattern cannot contain embedded zeros (null characters). Use %z instead.
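The effect of the ^ and $ anchors can be sketched with Python's re module (a stand-in for expressor's engine; the anchor semantics are the same):

```python
import re

# Illustrative sketch of ^ and $ anchoring, using Python's re module.
assert re.search(r"^abc", "abcdef")       # anchored at the beginning
assert not re.search(r"^abc", "xabc")
assert re.search(r"def$", "abcdef")       # anchored at the end
assert not re.search(r"def$", "defabc")
```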




Sub-expressions
A section beginning with an open parenthesis ( and ending with a close parenthesis ) acts as a marked sub-
expression.

For example, a(bc) matches the character sequence "abc" whereas a[bc] matches the character sequences "ab" or
"ac".




Alternation
The | operator matches either flanking pattern.

For example, a(b|c) matches the character sequences "ab" or "ac".
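The distinction between a marked sub-expression, a set, and alternation can be checked with any regex engine; a sketch using Python's re module:

```python
import re

# a(bc) requires the sequence "bc"; a[bc] accepts either single character.
assert re.fullmatch(r"a(bc)", "abc")
assert not re.fullmatch(r"a(bc)", "ab")
assert re.fullmatch(r"a[bc]", "ab")
assert re.fullmatch(r"a[bc]", "ac")

# a(b|c) matches either flanking alternative inside the group.
assert re.fullmatch(r"a(b|c)", "ab")
assert re.fullmatch(r"a(b|c)", "ac")
assert not re.fullmatch(r"a(b|c)", "abc")
```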




Reference: Constraints on Semantic Types
There are seven data types you can use when creating a Semantic Type.


               string                               decimal                                   datetime


              double                                  byte                                    boolean


              integer




Each Semantic Type can be further constrained by assigning additional values. There are a total of nine values that
can be set on Semantic Types. The Constraint values that can be set depend on the data type of the Composite Type
attribute. For example, if the data type is String, Constraint values can be set for Minimum and Maximum Length,
Regular Expression, and Allowed Values, but other Constraint values such as Maximum Significant Digits do not apply.

Corrective Actions can be specified for handling data that does not conform to the constraints set. See Reference:
Constraint Corrective Actions.

When data does not meet the criteria established by constraints, users can resolve the resulting error in most cases.
See Semantic Type Errors.



          Minimum Length                           Maximum Length                  Maximum Digits Before Decimal


         Regular Expression                         Allowed Values                  Maximum Digits After Decimal


          Maximum Value                             Minimum Value                       Maximum Significant Digits


Types
string: represents character strings. The value space of string is the set of finite-length sequences of characters (see
XML Schema string) that "match" the Char production from XML 1.0 (Second Edition). A character is an atomic unit of
communication; it is not further specified except to note that every character has a corresponding Universal
Character Set code point, which is an integer.

decimal: a subset of the real numbers, which can be represented by decimal numerals (see XML Schema decimal).
The "value space" of decimal is the set of numbers that can be obtained by multiplying an integer by a non-positive
power of ten. Precision is not reflected in this value space; the number 2.0 is not distinct from the number 2.00. The
"order-relation" on decimal is the order relation on real numbers, restricted to this subset.

datetime: objects with integer-value year, month, day, hour and minute properties, a decimal-valued second
property, and a boolean time zone property.

double: the value space of double matches that of XML Schema double, which is patterned on the IEEE 754 double-
precision 64-bit floating point type.

byte: the value space of byte is the set of finite-length sequences of binary octets.

boolean: this value space has two values, one value that denotes true and the other that denotes false.

integer: a subtype of decimal made by fixing the value of "Max Digits after Decimal" to 0 and disallowing the trailing
decimal point. This results in the standard mathematical concept of the integer numbers. The value space of integer is
the infinite set of integers.




Constraints

Constraint      Applicable                                           Description
                     to


Maximum         string, byte     A non-negative integer. For byte attributes, Maximum Length is the maximum number
Length                           of bytes in the value. For string attributes, Maximum Length is the maximum number
                                 of code points in the value. A code point is the number assigned in Unicode to
                                 represent a character.
                                 See http://icu-project.org/docs/papers/forms_of_unicode/.


Minimum         string, byte     A non-negative integer. For byte attributes, Minimum Length is the minimum number
Length                           of bytes in the value. For string attributes, Minimum Length is the minimum number
                                 of code points in the value. A code point is the number assigned in Unicode to
                                 represent a character.
                                 See http://icu-project.org/docs/papers/forms_of_unicode/.


Regular       string        The string parameter contains a regular expression against which data is tested. POSIX
Expression                  Extended regular expression syntax is used. Two sources for the details of the POSIX
                            regular expressions can be found at http://www.regextester.com/pregsyntax.html and
                            http://www.boost.org/doc/libs/1_34_0/libs/regex/doc/syntax_extended.html#extended

                            The regular expression can contain the logical OR operator (pipe character or vertical
                            bar "|") to indicate alternative values. When OR is used, a value is valid if either the left
                            or right pattern matches.

                            See Examples of Constraints Used to Cleanse Data below.


Allowed       all           Constrains the data value to be one of the members in an explicitly stated list of
Values                      values. Values in the list must be separated by commas.

                            Values must be literal, not surrounded by quotation marks.

                            Datetime values are entered CCYYMMDD HH:MI:SS. Date separators (/ and -) are
                            optional. For input, the datetime type within expressor Engine contains both date
                            and time, even if time is not included in the original value. As with string and numeric
                            data, values in a list must be separated by commas, and must not be surrounded by
                            quotation marks.

                            If Use Default Value is specified as the Corrective Action, the Correction Value must be
                            included in the list of Allowed Values.


Maximum       all numeric   The inclusive upper bound of the data value.
Value         and
                            The value must be in the value space of the base type.
              datetime


Minimum       all numeric   The inclusive lower bound of the data value.
Value         and
                            The value must be in the value space of the base type.
              datetime


Maximum       decimal       A non-negative integer. This constraint specifies the maximum number of significant
Significant                 digits a data value can contain, regardless of the position of the decimal point.
Digits                      Significant digits are all digits except leading and trailing zeroes. For example, if the
                            constraint is 3 Maximum Significant Digits, the number 5400 would pass the test and
                            5401 and 5400.01 would fail.


Maximum       all numeric   A non-negative integer. This constraint defines the maximum number of digits to the
Digits                      left of the decimal point in a value, not counting leading zeroes.
Before
Decimal




Maximum        decimal         A non-negative integer. This constraint specifies the maximum number of digits to
Digits                         the right of the decimal point in a value, not counting trailing zeroes.
After
Decimal
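The digit-counting rules in the table above can be sketched as follows. This is a hypothetical illustration with invented function names, not expressor's implementation; it assumes the value arrives as a decimal string.

```python
# Hypothetical sketch (not expressor's implementation) of how the digit
# constraints above could be evaluated for a decimal value given as a string.

def significant_digits(value: str) -> int:
    digits = value.lstrip("-+").replace(".", "")
    stripped = digits.strip("0")              # drop leading and trailing zeroes
    return len(stripped) if stripped else 1   # "0" has one significant digit

def digits_before_decimal(value: str) -> int:
    whole = value.lstrip("-+").split(".")[0]
    return len(whole.lstrip("0"))             # leading zeroes do not count

# The Maximum Significant Digits example from the table:
assert significant_digits("5400") == 2        # trailing zeroes not significant
assert significant_digits("5401") == 4
assert significant_digits("5400.01") == 6
```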




Examples of Constraints Used to Cleanse Data
Allowed values specify exactly and literally what value or values an Atomic Type will accept or not accept. For
example, the Allowed values could be specified as:

      Allowed Values: 123,0      Corrective Action: Use Default Value      Correction Value: 0
The value must be the string "123" and nothing else. If it is anything else, then the default Correction Value "0" will be
used instead. Note that "0" is included in the list of Allowed values because it must pass validation when it is
substituted for any values that are not allowed.

Regular expression specifies a pattern rather than an exact value that can be used. The regular expressions web sites
noted above explain the many special character constructions used in regular expressions. Below are a few simple
examples.

      Regular Expression: 123+      Corrective Action: Use Default Value
This expression indicates that the data must contain the pattern "123" with one or more instances of the "3" digit.
"12333" would match, as would "123". The values "124" and "1234" would not match, and in this case, the default
value would be used instead. Similarly, the expression

      Regular Expression: [a-z]      Corrective Action: Set to Null
requires that the data have any lowercase alphabetic characters. If the data were to include uppercase or numeric
values (0-9), then the Corrective Action specifies the value should be set to Null.

The expression could be modified easily to include both upper case and lower case alphabetic characters by adding
A-Z: [a-zA-Z]. This regular expression matches single characters, not words or multi-character data. To make the
expression accept longer strings, simply add the plus sign: [a-zA-Z]+. See this tech blog on the expressor Community
site for an alternative expression to accomplish the same alphabetic upper and lower case match.




Expressions that accept only numbers are [0-9] (a single digit) and [0-9]+ (one or more digits).
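These class examples can be verified with Python's re.fullmatch, which validates the entire value; this sketch assumes, as the examples above imply, that a constraint test must match the whole data value:

```python
import re

# Stand-in demo with Python's re; fullmatch validates the entire value.
assert re.fullmatch(r"[a-zA-Z]", "q")        # one alphabetic character
assert not re.fullmatch(r"[a-zA-Z]", "abc")  # fails: pattern covers one char
assert re.fullmatch(r"[a-zA-Z]+", "abc")     # + accepts longer strings
assert re.fullmatch(r"[0-9]+", "2024")
assert not re.fullmatch(r"[0-9]+", "20a4")

# The "123" pattern with one or more 3s, as in the first regex example:
assert re.fullmatch(r"123+", "12333")
assert not re.fullmatch(r"123+", "1234")
```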

The following example is a more complex expression that tests for valid US phone numbers:

      1(-?|\(?)\d{3}\)?-?\d{3}-?\d{4}
This expression will match phone numbers written as 1-ddd-ddd-dddd, 1(ddd)-ddd-dddd, 1(ddd)ddddddd,
1(ddd)ddd-dddd, and 1dddddddddd.

The first symbol after the 1 is an open parenthesis to indicate that the sequence of symbols and characters between it
and its corresponding close parenthesis are to be treated as a whole. The first character is a hyphen, and it is followed
by a question mark. The question mark indicates that the hyphen is optional. The question mark is followed by a
vertical bar that means the next sequence is an alternative to the hyphen. That alternative is the parentheses to
enclose the area code. To prevent the open parenthesis from being interpreted as the special character it was after
the 1, it is preceded by the escape character, a backslash (\). That allows the open parenthesis to be interpreted as a
literal character. This open parenthesis is also followed by a question mark to indicate that it is optional.

The optional open parenthesis alternative is then followed by an unqualified close parenthesis, which in POSIX
Extended closes the group that was begun by the open parenthesis after the 1.

The area code numbers are defined as three decimal digits (\d{3}). That is followed by an escape backslash and
another close parenthesis. This means that the area code digits can be followed by a close parenthesis. But the
parenthesis is followed by another question mark that indicates the parenthesis is optional. So is the hyphen that
follows.

Note that this construct allows for either a hyphen or open parenthesis or neither after the 1. At the end of the area
code, though, data can have either the hyphen or open parenthesis or both. And again, it can have neither. Then
there is a \d{3} for the three-digit exchange number, followed by another optional hyphen (-?). Finally, there is \d{4}
for the last four digits of the phone number.
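The phone-number expression described above can be reconstructed from this walkthrough and tested. The sketch below uses Python's re module, whose syntax for these particular constructs matches POSIX Extended:

```python
import re

# Reconstructed from the description above; tested with Python's re, whose
# syntax for these constructs matches POSIX Extended.
phone = re.compile(r"1(-?|\(?)\d{3}\)?-?\d{3}-?\d{4}")

for ok in ["1-800-555-1234", "1(800)-555-1234", "1(800)5551234",
           "1(800)555-1234", "18005551234"]:
    assert phone.fullmatch(ok), ok

assert not phone.fullmatch("800-555-1234")   # missing the leading 1
assert not phone.fullmatch("1-800-555-123")  # last group needs four digits
```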

The validity of strings that represent zip codes can be tested in a similar manner.




The final example constructs a regular expression to test the validity of email addresses:

      [\l\u.]+@[\l\u]+\.[\l\u]{2,3}
The expression "\l\u" is an alternative to the [a-zA-Z] construct used in the example above. It too specifies that the
characters must be alphabetic, either upper or lower case. The dot (.) added indicates that the user name portion of
the address can contain a dot, as in "james.smith". The plus sign indicates that the user name can be one or more
characters.

The @ symbol is literal, and it is followed by a domain name pattern that accepts upper and lower case alphabetic
characters but not a dot. The dot is indicated by the "\.". The escape character \ is required to obviate the symbolic
interpretation of the dot, which in this position would match any character except a newline. With the escape
character, the dot is treated as a literal dot, which separates the components of the domain name. Following the dot,
the top level portion of the domain name must be upper or lower case alphabetic characters, a minimum of two
letters as in country code (uk, de, fr) or at most three letters (com, org, edu).
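The email pattern described above can be translated into Python's re syntax for testing. Python has no \l / \u shorthands, so [a-zA-Z] stands in for them; the structure otherwise follows the description:

```python
import re

# Python translation of the email pattern described above: [a-zA-Z] stands
# in for the \l\u shorthands, which Python's re does not support.
email = re.compile(r"[a-zA-Z.]+@[a-zA-Z]+\.[a-zA-Z]{2,3}")

assert email.fullmatch("james.smith@example.com")
assert email.fullmatch("anna@site.uk")
assert not email.fullmatch("anna@sub.site.com")   # domain part accepts no dot
assert not email.fullmatch("anna@site.info")      # top level limited to 3 letters
```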



The patterns that can be specified with regular expressions are rich and complicated. In addition to the regular
expressions specification, see http://en.wikipedia.org/wiki/Regular_expressions and
http://en.wikipedia.org/wiki/Regular_expression_examples for more examples of expressions you can create.




Reference: Constraint Corrective Actions
When a Constraint is set on an Atomic Type, a Corrective Action can be specified to deal with cases in which data
violates the constraint.

A corrective action changes the data so that it is valid for the Atomic Type. During processing of a unit of data, all
validations are performed before any corrective actions are taken. When taken, the corrective actions are performed
by group. The first group of corrective actions pertains to all constraints except the Regular Expression and Allowed
Values constraints. After the first group of corrective actions are performed, the Regular Expression and Allowed
Values constraints' corrective actions are performed.

When there are multiple constraint violations, the corrective action is determined by a priority ranking. The corrective
actions in the second table below are listed in their priority order. Only one corrective action is performed. So, if data
violates two constraints and the corrective actions are Default and Truncate Right, the Default action would be used.
The new data value would then be passed to the second group if there are any violations of the Regular Expression
and Allowed Values constraints.
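The priority rule above can be modeled in a few lines. This is a hypothetical sketch with invented names, not the expressor implementation; it only shows that when multiple constraints are violated, the single highest-priority corrective action wins:

```python
# Hypothetical model (names invented, not the expressor implementation) of
# picking the one corrective action to perform when several constraints are
# violated, using the priority order from the table below.

PRIORITY = ["<Escalate>", "Null", "Default", "Truncate Right", "Truncate Left",
            "Round", "Pad Right", "Pad Left"]

def pick_action(violations):
    """violations maps a violated constraint name to its corrective action."""
    if not violations:
        return None
    return min(violations.values(), key=PRIORITY.index)

# Data violates both Max Length (Truncate Right) and a Default-corrected rule:
# Default outranks Truncate Right, so Default is the one action performed.
assert pick_action({"Max Length": "Truncate Right",
                    "Min Value": "Default"}) == "Default"
```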

If corrective actions are not specified for one or more constraints, then it is possible for data to fail validation. When
that happens, the operator processing the data must decide how to handle the error. Operators have property
settings that specify how invalid data is to be dealt with. Operators that handle errors have some or all of the
following property setting options:

         Abort Dataflow--the dataflow stops immediately. This is the default action.

         Reject Remaining--all remaining input is sent to the reject port on the operator.

         Skip Remaining--all remaining input is read but processing stops with the current operator.

         Reject Record--the current record is sent to the reject port on the operator and the dataflow continues
          processing the remaining data. (See Schemas for Rejected Records for a description of the structure of
          rejected data sent on a reject port.)

         Skip Record--the current record is dropped and the dataflow continues processing the remaining data.

         Skip Group--the current record and the remaining records in its group are dropped, and the dataflow
          continues processing the remaining data.

         Reject Group--the current record and all remaining records in the group are sent to the reject port.

Abort Dataflow and Skip Remaining both stop processing, but Skip Remaining allows all data that has been
successfully processed up to this operator to continue through the dataflow, and the dataflow completes with a
success code. The Abort Dataflow option stops all processing immediately and results in dataflow failure.

When an operator uses Reject Remaining, the rejected data can still be processed by operators attached to the reject
port. In this case, the dataflow can process the rejected data differently and so can complete successfully.




Reject and Skip Record allow the operator to deal with the one invalid record separately and continue processing
remaining records.

The Aggregate operator handles records in groups and as a result can perform error handling actions that apply to
groups. The group-based error handling setting options are:

         Reject Group--the current group and all remaining records in the group are sent to the reject port on the
          operator. This error handling action is used only for aggregations in which Method is set to either "In
          memory" or "On disk." These methods make all of the records in the group available for error processing.
          This error handling action is not used when Method is set as "Sorted."

         Reject Current Skip Remaining--the current input record is sent to the reject port and all remaining
          unprocessed records skipped. This can be used whenever a single error causes all further processing to be
          invalid. It allows the record that caused the problem to be examined.

         Reject Current Skip Group--the current input record is sent to the reject port and all remaining unprocessed
          records for the current aggregation group are skipped. Since the rejected record will include the keys used
          for aggregation, this error handling action identifies the group that failed as well as the record that caused
          the problem.

         Skip Group--all records in the group are skipped, regardless of whether or not any have been processed
          without error. Processing continues for subsequent groups.

When any of the group-level error handling actions are taken, no output is produced from the affected group. A
record-level error handling action can, however, be used to ignore a record in a group and still produce output from
the remaining records in the group.

The following table describes the violations that occur for each constraint and the recovery actions that can be
specified. Each field-level recovery action is described in the second table.


   Constraint              Data Type               Corrective
                                                     Actions


Max Length             String and byte         Truncate Right,
                                               Truncate Left,
                                               Default, Null,
                                               <Escalate>


Min Length             String and byte         Pad Right, Pad
                                               Left, Default, Null,
                                               <Escalate>


Regular                String                  Default, Null,
Expression                                     <Escalate>


Allowed Values         All types               Default, Null,
                                               <Escalate>




Max Value               Numeric and               Default, Null,
                        datetime                  <Escalate>


Max Digits Before       All numeric types         Default, Null,
Decimal                                           <Escalate>


Min Value               Numeric and               Default, Null,
                        datetime                  <Escalate>


Max Significant         All numeric types         Round, Default,
Digits                                            Null, <Escalate>


Max Digits After        Decimal                   Round, Default,
Decimal                                           Null, <Escalate>




 Corrective Action                        Result
 (in priority order)


<Escalate>         Do not correct the constraint violation. Pass the
                   invalid data to a dataflow operator for error
                   handling.


Null               Substitutes a null value for the value that failed
                   validation.


Default            Substitutes a recovery value, supplied by the
                   user, in place of the value that failed validation.


Truncate           Removes characters (code points) or bytes from
Right              the end of string or byte data that failed
                   validation so that the constraint is met.


Truncate Left      Removes characters (code points) or bytes from
                   the beginning of the string that failed validation
                   so that the constraint is met.


Round              Rounds the value to fit both the constraint
                   criteria and the capabilities of the internal
                   representation. The rounding method used is
                   half even.
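Half-even ("banker's") rounding resolves ties toward the even neighbor. Python's decimal module implements the same method, so it can illustrate the behavior:

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Half-even rounding: ties go to the even neighbor, as in the Round action.
assert Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) == Decimal("2")
assert Decimal("3.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN) == Decimal("4")
assert Decimal("2.345").quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN) == Decimal("2.34")
```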


Pad Right   Pads the end of string or byte data with the
            character (code points or bytes) specified as the
            Correction Value.


Pad Left    Pads the beginning of string or byte data with
            the character (code points or bytes) specified as
            the Correction Value.
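The length-based corrective actions can be sketched as simple string operations. These helpers are hypothetical illustrations, not expressor's implementation, and assume a single-character Correction Value:

```python
# Hypothetical helpers (not the expressor implementation) showing the effect
# of the length corrective actions on a string value.

def truncate_right(value: str, max_len: int) -> str:
    return value[:max_len]

def truncate_left(value: str, max_len: int) -> str:
    return value[-max_len:]

def pad_right(value: str, min_len: int, correction: str) -> str:
    return value + correction * (min_len - len(value))

def pad_left(value: str, min_len: int, correction: str) -> str:
    return correction * (min_len - len(value)) + value

assert truncate_right("abcdef", 4) == "abcd"
assert truncate_left("abcdef", 4) == "cdef"
assert pad_right("42", 5, "0") == "42000"
assert pad_left("42", 5, "0") == "00042"
```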




Datascript Modules

Datascript Modules
Datascript Modules are user-written datascript functions created as standalone workspace artifacts. As separate
artifacts, they can be shared and reused by multiple datascript operators, such as Transform and Aggregate, instead
of rewriting the datascript in each operator that needs it.

Datascript functions can also be included in External Files, but Datascript Modules are the preferred vehicles for
including user-written functions. External Files must be managed by the user, and External File functions referenced
by datascript operators or Datascript Modules must be managed manually by the user. If the user fails to include an
External File containing functions in a Deployment Package, references to functions in the External File will fail when
the compiled dataflow attempts to run. Datascript Modules are managed automatically, so if a function in a module is
referenced by a datascript operator or another Datascript Module, it will automatically be included in Deployment
Packages when required. Datascript Modules are also managed for compatibility. For example, name clashes between
Modules are prevented.

External File functions also should not call functions in Datascript Modules because such references are not managed
automatically. For example, if a Datascript Module referred to by an External File is not otherwise referenced in a
dataflow, it will not be included in a Deployment Package, and as a result, the External File's function reference would
not be found when the dataflow is executed from its Deployment Package.

Datascript Modules are written in an editor specifically designed for Datascript Modules.




The Datascript Module Editor provides for manual entry of datascript code, and the ribbon bar provides buttons for
entering new functions, including calls to functions from other Datascript Modules, and inserting built-in expressor
Datascript functions. See Reference: Datascript Module Editing for details on using the Editor for writing Datascript
Modules.

Modules are listed in the Workspace Explorer in the artifact folder named Datascript Modules. From there they can be
included in datascript operators by using the Rules Editor. The Edit tab on the Rules Editor's ribbon bar includes a
From Modules button that automatically generates the correct require statement to include a Datascript Module in
an operator's transformation code.





There are two types of user-written datascript functions--Public and Private. Public functions are easily accessible
because they are added to the lists of available functions displayed as you type and in the From References drop-
down list. Private functions are not accessible in those automated ways. Functions are designated Private when they
do not have broad applicability. For example, a Private function might be a function you include for a specific purpose
in a Datascript Module and you do not expect to reuse that function elsewhere.

Snippets are another important aspect of Datascript Modules. Snippets are skeletal code constructs, such as if-then-
else statements, and comment blocks that are important to the processing of Datascript Modules. The Datascript
Module Edit tab on the Studio ribbon bar includes snippets for code, Public and Private Functions, and References.

Code snippets set up correct statement constructs to simplify the task of writing datascript. For example, choosing the
If Else code snippet would insert the following statement construct in a Datascript Module:

      if condition then

      else
      end

Public and Private Function and Reference snippets insert standard comments that are used in Datascript Modules to
indicate how a function is to be treated in certain situations. For example, the comment attached to the require
statement illustrated in the screen shot above (--@datascript module dependency) indicates that the Datascript
Module named by the require statement will be included in a Deployment Package when it is created for the
Project. That comment is produced by a Reference snippet.

See Learning to Write Datascripts for a tutorial on expressor Datascript. For more information about writing Rules
with Datascript, see Write Rules.


Create a Datascript Module
      1.    Select the New Datascript Module button from the Home tab of the ribbon bar.

      2.    Select the Project or Library in which to place the Module from the drop-down list in the New Datascript
            Module dialog box.

      3.    Name the Module with a name that is unique within the workspace.

      4.    Provide a description of the purpose of the Datascript Module (optional).

           The Datascript Module Editor opens when you finish the New Datascript Module dialog box.




5.    Use the buttons on the Datascript Module Edit tab of the ribbon bar to enter properly structured datascript
      code.

     The Datascript Module Edit tab on the ribbon bar contains tools to facilitate writing and structuring datascript.
     Some of the buttons, such as New Function and Code, insert structural elements of a function or a code
     construct like if-then-else. You must then enter the elements that are specific to your datascript. Other
     buttons, such as String, provide built-in functions that you can select for inclusion. The From Modules button
     allows you to insert references to functions in other Datascript Modules.




     See Reference: Datascript Module Editing for details on each of the buttons in the Datascript Module Edit tab.

     See Learning to Write Datascripts for instructions on writing properly structured datascript.

     The following is an example of a function written in the Datascript Module Editor:




      6.    Save the changes with the Save icon on the Quick Access Toolbar.


Include a Datascript Module in a Datascript Operator
      1.    Select a datascript operator in a dataflow.

      2.    Open the Rules Editor on the datascript operator.

      3.    Select the Edit tab on the ribbon bar in the Rules Editor.




      4.    Select the From Modules button to drop down its menu.

      5.    Select either a function from one of the Datascript Modules on the menu or the Manage References option.

      6.    If you chose the Manage References option, select each of the Datascript Modules you want to include from
            the Manage References dialog box.
            The selected Datascript Modules are added to the Datascript Operator with a require statement.




           The require statement causes the named datascript module to be read into the operator's script at the time
           the operator processes data.

       Note: Reference to a Datascript Module in a require statement must include the name of the Project or
                  Library that contains the module and the version number. In the screen shot above, "Google0.0"
                  is the name and version of the project containing the Contacts datascript module.




       The comment inserted before the require statement marks the datascript module for inclusion in any
       Deployment Packages created with the dataflow that is using the datascript operator in which the module is
       included.


Reference: Datascript Module Editing
When a Datascript Module is open in Studio, the Datascript Module Edit tab is available on the ribbon bar. The
buttons on this tab facilitate building properly structured datascript. Datascript code may also be entered directly in
the editor. When functions are selected from buttons on the Edit tab, the specific values for your application must be
entered manually. For example, if the concatenate option is selected from the String button's drop-down list, the
function appears in the Datascript Module as:

     string.concatenate(, )

When a function or snippet is selected, it is always inserted at the cursor location.

The function of each button on the Datascript Module Edit tab is described below.




The New Function button inserts a Public or Private function into the Datascript Module at the cursor location.




You can then replace functionname with an appropriate name for the function you are writing. Insert the function
code between the function and end statements.
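As a sketch only (the exact skeleton Studio inserts may differ slightly), the inserted function takes roughly this form, following the function/end structure described above:

```
-- skeleton sketch; replace functionname with a name of your own
function functionname()
    -- insert the function code here
end
```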

A Public Function is readily visible and accessible in any context (Rules Editor or Datascript Module Editor) because it
is added to the lists of available functions displayed as you type and in the From Modules button drop-down menu.

A Private Function is not visible in drop-down menus and type-completion lists. Functions are designated Private
when they do not have broad applicability. For example, a Private function might be a function you include for a
specific purpose in a Datascript Module and you do not expect to reuse that function elsewhere.

Snippets are skeletal code constructs, such as if-then-else statements, and comment blocks that are important to the
processing of Datascript Modules. An example of a code snippet is a For Loop.

     for i = 1, limit do
         -- loop body
     end







Comment snippets mark an ensuing block of code for special handling. For example, a Reference snippet inserts a
comment before a require statement to indicate that the referenced Datascript Module is to be included in
Deployment Packages with the dataflows that have that reference.

The From Modules button's drop-down menu allows you to select another Datascript Module to include in the
current module. References to other Datascript Modules are made by the require statement. For example:




      Note: Reference to a Datascript Module in a require statement must include the name of the Project or
              Library that contains the module and the version number. In the example here, "Project1.0" is the
              name and version of the project containing DatascriptModule2.

Once another module is included, the From Modules button allows you to insert a call to a specific function
contained in that other module.

The Comment Out button inserts a comment marker for the line in which the cursor is located or for multiple lines if
a block of code is selected. The Comment marker allows you to leave the code in but not execute it. Comment
markers can also be used to retain code that has errors until you can fix them.

The Uncomment button removes comment markers from the line in which the cursor is located or from multiple lines
if a block of code is selected.




Lookup Tables

What is a Lookup Table?
A Lookup Table is a database table designed to serve a special, limited function within a data integration application
or group of applications. Lookup Tables are usually created from a subset of data from a larger table or from a source
designed to add data that an application can use. For example, a Lookup Table might be created to provide
department names to data from a source that contains only department numbers. During the process of integrating
data, the Lookup Table could be read to add department names to department numbers.

The advantage of Lookup Tables is that they are stored within an expressor Project and are included in Deployment
Packages. Access to them is thereby made easier and faster. When their function and size are limited, accessing their
data is also easy and fast.

Lookup Tables can be used by multiple dataflows in a Project. Sharing a Lookup Table among multiple dataflows is
straightforward when the Project is deployed because the Deployment Package places all the Lookup Tables in a
single location--the Project's external directory. If Lookup Tables are to be shared by multiple dataflows from
within expressor Studio, independent of a Deployment Package, an explicit location should be specified in the
Lookup Table artifact.

In expressor Studio, Lookup Tables are defined by a Lookup Table artifact contained within a Project or Library, just
like Dataflows, Connections, Schemas, and Semantic Types. The Lookup Table artifact defines the structure of records
in the physical Lookup Table.

The Lookup Table artifact defines the structure as a Composite Type. As such, the Lookup Table artifact has
Composite Type attributes defined by Atomic Types. In addition, the Lookup Table artifact defines a Lookup Key or
keys that are used when accessing the Lookup Table. For example, if the Lookup Table contains department names to
match department numbers, the department number could be set as the key so that any time an application
encountered a record with a department number, it could use that number as the lookup key to find the matching
department name in the Lookup Table.

A Lookup Table artifact can contain multiple keys, though only one key is used for each Lookup Rule written for a
Datascript Operator. Lookup Table keys are associated with one or more of the Table's attributes. When a key is used,
the key's attributes are used to match data in the fields of the Lookup Table specified by those attributes.
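The key mechanism can be illustrated with a plain Python dictionary (a stand-in only; expressor Lookup Tables are not Python objects, and the field names here are invented):

```python
# Hypothetical department Lookup Table: the key field dept_no maps to dept_name.
lookup_table = {101: "Engineering", 102: "Marketing"}

# An input record carries only the department number...
record = {"dept_no": 102, "employee": "Ng"}

# ...and the lookup key (dept_no) finds the matching department name.
record["dept_name"] = lookup_table[record["dept_no"]]
print(record["dept_name"])  # Marketing
```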

Lookup Tables are populated with data by the Write Lookup Table operator. The operator uses the Lookup Table
artifact to read from the data source and structure the data in the Lookup Table. Once the Lookup Table has been
populated, the Transform operator can access the data with Lookup Expression Rules and Lookup Function Rules.

In addition to the Write Lookup Table operator, expressor includes a Read Lookup Table operator that enables you to
read and review the data in the Lookup Table.







Define the Structure of a Lookup Table
A Lookup Table is defined by a Lookup Table artifact. A Lookup Table artifact consists of a Composite Type and
lookup keys. The Composite Type can be a Shared Type, or it can be a Local Type created at the time the Lookup
Table artifact is created.

      1.    Select the New Lookup button on the Home tab of the Studio ribbon bar.

      2.    Select the Project or Library in which to create the Lookup Table artifact.

      3.    Name the Lookup Table artifact.

      4.    Describe the Lookup Table artifact (optional).

After you create the artifact, it appears in the Project's or Library's list of Lookup Table artifacts in the Studio Explorer.

      1.    Double-click the artifact name in the Explorer to open it in the center panel of Studio.


           The new Lookup Table artifact opens with a Local Composite Type that has no attributes. The Lookup Table
           Edit tab on the ribbon bar enables you to add, edit, and delete Composite Type attributes, Keys, and Key
           Attributes.



      2.    Assign an existing File Connection artifact to the Lookup Table artifact with the Assign drop-down menu next
            to the File Connection field.
            Assigning a File Connection is not required, but doing so will provide a means of locating the Lookup Table
            if you later want to use it in another Deployment Package or if you want multiple dataflows to access the
            Lookup Table while running in Studio.

           If a location for the Lookup Table is not specified, it is placed in the working directory. In this default location,
           the Lookup Table cannot be shared by other dataflows running in Studio. When a dataflow is compiled into a
           Deployment Package, its Lookup Tables are placed in the Project's external directory, and from there other
           dataflows in the Deployment Package can access them.

      3.    Assign an existing Shared Composite Type with the Assign drop-down menu next to the Composite Type
            field.
            You can also elect to assign a new Local Composite Type. You might do that if you decide not to use the
            Composite Type that you have created for the Lookup Table.

      Note: Once a Lookup Table is created with the Composite Type in the Lookup Table artifact, the Table
                 does not change if changes are made to the Composite Type or the Lookup Table artifact. The
                 Lookup Table itself is constructed with the Lookup Table artifact as it was at the time the Table






             was built. The Lookup Table would have to be regenerated to reflect changes to the Composite
             Type or the Lookup Table artifact.

    4.   Add Composite Type attributes with the Add button in the Attributes section of the Lookup Table Edit tab
         on the ribbon bar.

    5.   Add lookup keys with the Add button in the Keys section of the Lookup Table Edit tab on the ribbon bar.
         You must add at least one key because a key is required to look up data in the Lookup Table. In many cases,
         only one lookup key is needed.

    Note: Lookup Tables must have at least one attribute that is not used as a key.

    6.   When adding a key, select the Unique check box in the Add Key dialog box if you intend for the Key
         attribute(s) to have a unique value in the Lookup Table.
         A Unique key in a Lookup Table is the same as a Primary key in a database table. When a lookup is
         performed with a Unique key, only one record will be returned from the Lookup Table. A non-unique key
         would return multiple records if there are multiple instances of the key value in the Lookup Table.

    7.   Add lookup key attributes with the Add button in the Key Attributes section of the Lookup Table Edit tab on
         the ribbon bar.
         Adding an attribute to a key makes the attribute the basis for searching the Lookup Table with that key.
         More than one attribute can be assigned to a key so that multiple attributes form the basis of a search on
         that key.
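The difference between a Unique and a non-unique key can be sketched in Python (illustrative only; the rows and field names are invented):

```python
from collections import defaultdict

# Rows loaded into a hypothetical Lookup Table; key value 20 appears twice.
rows = [
    {"dept_no": 10, "dept_name": "Sales"},
    {"dept_no": 20, "dept_name": "Support"},
    {"dept_no": 20, "dept_name": "Support EMEA"},
]

# Index the rows by the key attribute, keeping every matching row per key value.
index = defaultdict(list)
for row in rows:
    index[row["dept_no"]].append(row)

# A Unique key guarantees one record per key value; a non-unique key may return several.
print(len(index[10]))  # 1 record
print(len(index[20]))  # 2 records
```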




Populate a Lookup Table
Before a Lookup Table can be populated with data, the Lookup Table artifact must be defined. Once the artifact is
defined, the Lookup Table can be populated following these steps:

    1.   Add a Write Lookup Table operator to a dataflow and link it to an operator whose output provides the data
         intended for the Lookup Table.
         The Write Lookup Table operator, like other Write operators, does not have an output port. Like the other
         Write operators, Write Lookup Table must be placed at the end of a dataflow or a dataflow step.

    2.   Name the Write Lookup Table operator with the Name property in the Properties panel.
         Changing the default name of the operator is not required, though it is useful to give the operator a name
         that indicates its specific purpose in the dataflow.

    3.   Select the Lookup Table artifact from the drop-down menu next to the Lookup Table property.

    4.   Select an option for Error handling.
         The action specified will be employed when the operator encounters a data error. The most common sources
         of data errors are mappings from schema fields to Atomic Types and constraint violations.

     5.   Select or deselect the Show errors check box to display data errors and results of recovery actions in the
          Studio Messages panel and in the expressor command prompt window when using the etask command.

    6.   Select whether or not to truncate the Lookup Table.







      7.   Run the dataflow.
           The Lookup Table will be populated each time the dataflow or dataflow step containing the Write Lookup
           Table operator is run.

      Note: The Write Lookup Table operator locks the Lookup Table while loading it, so the operator must be
               used in a dataflow Step prior to the Step that contains Lookup Rules or a Read Lookup Table
               operator. It can also be in a separate dataflow to ensure that it is not being loaded when a lookup
               operation takes place. If the Lookup Table is loaded by a separate dataflow, remember that its
               location must be explicitly specified so that it can be located by the other dataflows.




Use Data in a Lookup Table
Data in a Lookup Table is used by Datascript Operators, such as the Transform operator. The Lookup Table data is
matched with data from the Datascript Operator's input by using the keys specified in the Lookup Table artifact.

Access to the Lookup Table data is provided by Lookup Rules. Lookup Rules return 0, 1, or multiple outputs
corresponding to the data records found in the Lookup Table. For example, the following illustration of a Lookup Rule
contains a Lookup Table named "StateLookupTable" and uses the Key named "StateAbbrev." The Lookup Table itself
contains two fields--"StateAbbrev" and "StateFullName." The Key specifies that the field "StateAbbrev" is used to find
records in the input data. The input's "State" attribute is mapped to the Lookup Table's "StateAbbrev" field, and when
a match is found, the "StateAbbrev" and "StateFullName" are output from the Lookup Rule.


The "StateFullName" output field is then mapped to the output attribute for "StateFullName." The input attribute
"State," which is the two-character abbreviation for US state names, is automatically one of the output attributes. It
has a matching Lookup Table output in "StateAbbrev," but they are not mapped to one another because the
abbreviation is no longer needed. The output attributes now include the full state name instead. In this illustration,
getting that full name was the reason for using a Lookup Table.

Notice that the On miss option chosen is Generate Record. To generate a record in place of a missing record, the rule
must use the generate function to specify what output will be generated and used in place of the missing record.

In this case, the generate function simply substitutes the string "NoName" in place of StateFullName for any state
name for which an abbreviation does not exist in the Lookup Table.
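In Python terms (a stand-in for the Lookup Rule, not actual datascript; the table contents are invented), the behavior described above amounts to:

```python
# Hypothetical StateLookupTable contents, keyed on the StateAbbrev field.
state_lookup = {"MA": "Massachusetts", "NH": "New Hampshire"}

def lookup_state(record):
    """Match the input's State attribute against the key; generate a record on a miss."""
    abbrev = record["State"]
    full = state_lookup.get(abbrev)
    if full is None:
        full = "NoName"  # the generate function's substitute output
    return {"State": abbrev, "StateFullName": full}

print(lookup_state({"State": "MA"})["StateFullName"])  # Massachusetts
print(lookup_state({"State": "ZZ"})["StateFullName"])  # NoName
```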


Both Lookup Expression Rules and Lookup Function Rules can be marked as a Range Lookup. A Range Lookup works
on Lookup Tables that are constructed as ranges. For a range lookup to work, the key values in the Lookup Table
must be sorted into ranges. For example, a Lookup Table that has key values of 100 and 200 has ranges of 0-100 and
101-200. The values of 100 and 200 represent the top of the ranges. A Range Lookup on an input value of 50 would
return the record with the key value of 100 because 50 is in the range 0-100.

If the Lookup Table had key values of 100, 101, 103, and 200, it would be less useful for Range Lookups because it has
the very small ranges 101-101 and 102-103.

The key used for a Range Lookup must be a Unique key. When a lookup is performed with a Unique key, only one
record will be returned from the Lookup Table. A non-unique key returns all the matches it finds in a Lookup Table.
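The range-matching rule described above can be sketched with Python's bisect module (illustrative only; expressor performs this matching internally):

```python
import bisect

# Keys are the *upper* bounds of each range: 100 covers 0-100, 200 covers 101-200.
table = {100: "first range", 200: "second range"}
keys = sorted(table)  # [100, 200]

def range_lookup(value):
    # Find the first key >= value; that key's record covers the value's range.
    i = bisect.bisect_left(keys, value)
    return table[keys[i]] if i < len(keys) else None  # None models a miss

print(range_lookup(50))   # first range (50 falls in 0-100)
print(range_lookup(150))  # second range
```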




Read Data in a Lookup Table
The Read Lookup Table operator provides a means for seeing the data in a Lookup Table. This allows users to review
and verify data in the Lookup Table and move it to another source if necessary. Unlike tables in other databases,
Lookup Tables cannot be accessed with SQL queries. Data is stored in an internal expressor format that provides
greater uniformity and efficiency in lookup operations.

Like other Read operators, Read Lookup Table is placed at the beginning of a dataflow or a dataflow step. Ordinarily,
it is linked directly to a Write File or Write Table operator that writes the data from the Lookup Table to a file or
database that can be read directly.

    1.   Add a Read Lookup Table operator to a dataflow and link it to an operator that can write the data to an
         external target.
         The Read Lookup Table operator, like other Read operators, does not have an input port. Like the other Read
         operators, Read Lookup Table must be placed at the start of a dataflow or a dataflow step.

    2.   Name the Read Lookup Table operator with the Name property in the Properties panel.
         Changing the default name of the operator is not required, though it is useful to give the operator a name
         that indicates its specific purpose in the dataflow.

    3.   Select the Lookup Table artifact from the drop-down menu next to the Lookup Table property.







      4.   Select or deselect the Show errors check box to display data errors and results of recovery actions in the
           Studio Messages panel and expressor command prompt window when using the etask command.

      5.   Run the dataflow.
           The Lookup Table will be written to an external target each time the dataflow or dataflow step containing the
           Read Lookup Table operator is run.





Deployment Packages
Deployment Packages are collections of one or more dataflows that can be run by the expressor Engine. Dataflows
in a Deployment Package are compiled with the artifacts (Connections, Schemas, and Semantic Types) that were used
to create the dataflows.

The Deployment Package makes dataflows portable because they can be run on systems with a standalone Engine.
When a Deployment Package is deployed to a production environment, the dataflows in the package are called out
for separate execution. See the etask command line utility for instructions on running dataflows in a Deployment
Package.

Deployment Packages can be checked into the expressor Repository. From the Repository, Deployment Packages
can easily be ported to systems where they can be run by the Engine. Otherwise, they have to be moved manually
from a Studio environment to systems running the Engine. See Running Deployment Packages with the Engine.

Another advantage of using the Repository is that it provides version control for Deployment Packages and all the
other artifacts that are stored there.

When a Deployment Package is created in Studio, External Files are added to it along with the compiled dataflows.
External files are any files that you want the compiled dataflows to have access to. These are commonly external
datascripts that a transformation operator, such as Transform or Join, calls for execution within the internal datascript.

External files can also be data files used by the Read File and Write File operators. Data files do not have to be
included in External Files, though, because File Connections can specify the locations where those files are stored.
Also, data files can be large and thus impractical to store in the Deployment Package.

All files that are included in External Files are placed in the
expressor/Workspaces/workspace_name/Metadata/project_name/external directory.




Create a New Deployment Package
    1.     Click the New Deployment Package button under the Home tab of the ribbon bar.
           The New Deployment Package dialog box displays.




    2.     Select the project in which to place the Deployment Package.

    3.     Name the Deployment Package.

    4.     Provide a description of the purpose of the Deployment Package (optional).







      Note: When you are using the free Community Edition Studio, you cannot create deployment packages.
                  To create deployment packages, you must have a 30-day trial license or a long-term license for
                  expressor Standard Edition or a license for expressor Desktop Edition.




Build a Deployment Package
When a new or existing Deployment Package is open in the Studio Deployment Package panel:

      1.    Click the Add button in the ribbon bar.


           The Add Dataflow dialog box displays with a list of dataflows available to include in the Deployment Package.



           Alternatively, drag and drop individual dataflows from the Explorer panel to the Compiled Dataflows section of
           the Deployment Package panel.

      2.    Select the dataflows to add to the Deployment Package.


           When added, the dataflows are compiled with the Connections, Schemas, and Semantic Types they use. Also,
           all files in the External Files folder in the Explorer are added to the External Files section of the package.


         If the dataflow cannot be compiled, it is not added to the Deployment Package.

    3.    Deselect any External Files that you want to exclude from the selected dataflow.

    4.    Save the Deployment Package with the save icon in the Quick Access Toolbar.




Update Artifacts in a Deployment Package

Update a Package
Remove a Dataflow from a Package
Review the Source of a Dataflow in a Package
Refresh the Status Symbols


Update a Package
    1.    Select the dataflows or external files in the package to be updated.

    2.    Click the Update button on the Deployment Package Edit tab of the ribbon bar.

    3.    Select the desired option from the Update button drop-down menu.
          Dataflows are updated with changes made to the dataflow versions currently in Explorer. This procedure
          does not update against versions of the dataflows that might be stored in expressor Repository.
          Similarly, external files are updated with changes to the version of the file that exists in the Explorer. Changes
          in versions of the external files in the Repository or on disk are not included in this update procedure. Only
          files that have not been deselected are updated.

    Note: Updating of external files is not completed until the Deployment Package is saved. This allows you
               to reverse the update.

         To include new external files that have been added to the External Files folder in Explorer, select and update
         dataflows in the Deployment Package. Updating External Files does not add external files to the Deployment
         Package.




Review the Source of a Dataflow in a Package
    1.    Select a dataflow in the Deployment Package.

    2.    Click the Open Source button on the Deployment Package Edit tab of the ribbon bar.
          The current version of the dataflow in the Explorer is displayed in the Dataflow Editor.







Remove a Dataflow from a Package
      1.   Select a dataflow in the Deployment Package.

      2.   Click the Remove button on the Deployment Package Edit tab of the ribbon bar.
           Removing a dataflow has no effect on the external files in the Deployment Package.




Refresh the Status Symbols
Each dataflow and external file listed in a Deployment Package has a status symbol in a left-hand column. Those
symbols reflect the status of the dataflows and external files in the Package with respect to their source versions
listed in the Explorer. In the following screen shot, both the dataflow and the external file have the status symbol
that indicates they are not synchronized with the corresponding source file in the Explorer.




      1.   Click the Update button on the Deployment Package Edit tab of the ribbon bar and update both the
           dataflow and the included external files.

      2.   Click the Refresh Status button and select Selected Compiled Dataflows from the drop-down list.








          You see in the screen shot that the yellow status symbols have changed. Dataflow1_Feb8 now has a green
          check mark in its status column. The DeptartmentTables.txt external file also has a green check mark, but it
          overlays the yellow symbol because updates of external files are not complete until the Deployment Package
          changes have been saved. See the Note above.




Manage External Files
All files in the External Files folder in the Explorer are automatically added to the list of External Files in the Deployment
Package panel when a dataflow is added to the Deployment Package. The files do not have to be referenced by the
dataflow to be included in the Deployment Package.

Remove Files from a Deployment Package
    1.    Deselect the external files you intend to remove by removing the check mark from the check box next to the
          files listed in the External Files section of the Deployment Package editor.

         All files that have been deselected will not be included in the Deployment Package, even though they
         continue to be listed, unchecked, in the External Files list.

         Deselected files do not have a status symbol next to them in the External Files list.

Update Files in a Deployment Package





      1.    Select the external files in the package to be updated.

      2.    Click the Update button on the Deployment Package Edit tab of the ribbon bar.

      3.    Select the Selected External Files option from the Update button drop-down menu.
            The external files are updated with changes to the version of the file that exists in the Explorer. Changes in
            versions of the external files in the Repository or on disk are not included in this update procedure. Only files
            that have not been deselected are updated.

      Note: Updating of external files is not completed until the Deployment Package is saved. This allows you
                  to reverse the update.

           To include new external files that have been added to the External Files folder in Explorer, select and update
           dataflows in the Deployment Package. Updating External Files does not add external files to the Deployment
           Package.




Run a Compiled Dataflow
      1.    Select one or more dataflows in the Compiled Dataflows list.

      Note: Multiple dataflows are run sequentially.

      2.    Click the Start button on the Deployment Package Edit tab in the ribbon bar.

      Note: When running a compiled dataflow within Studio, the paths to data files and any external scripts
                  used by the dataflow must be accessible. Paths specified in Connection files must be reachable
                  from the system running Studio. If a relative pathname is used, it must start with the deployment
                  package's external directory (workspace_name\Metadata\project_name\dpp\deployment-
                  package_name\external).
                  External scripts called by datascript within transformation operators must be in the deployment
                  package's external directory or accessible in the require statement's search path order. See
                  Call external scripts from an expressor script for details on including external scripts in datascript.

      3.    View messages from the test run in the Run Status beneath the External Files section.

      4.    Check the output source file or database to confirm the dataflow produced the intended results.




Deploying Packages Manually
If you are not using expressor Repository to store deployment packages, you can deploy them manually to a system
running expressor Engine.

      1.    Locate the Workspaces directory on your system. Its location varies according to the platform on which
            Studio is installed.

                  C:\Documents and Settings\<user_name>\My
                  Documents\expressor\Workspaces (Windows XP)







        C:\Users\<user_name>\Documents\expressor\Workspaces
        (Windows 7)
2.   In the Workspaces directory, open the folder with the name of the workspace in which you created the
     deployment package.

3.   Go to the Metadata\SampleProject1.0\dpp directory in that workspace and copy the deployment package.

4.   Paste the deployment package into a directory on a system with expressor Engine installed.
     If you explicitly installed the Engine when you installed Studio, then you can use the same system on which
     you built this sample deployment package as the deployment system.

5.   Run dataflows in the deployment package.
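The manual steps above amount to a recursive copy of the package folder; a sketch in POSIX shell, using a mock /tmp layout so the commands are runnable anywhere (real deployments copy from the Workspaces path shown in step 1, and the directory names here are hypothetical):

```shell
# Mock stand-ins for the real directories; substitute your actual
# workspace, project, and package names on a real system.
WS=/tmp/expressor_demo/Workspaces/SampleWorkspace
PKG="$WS/Metadata/SampleProject1.0/dpp/SampleDeploymentPackage1"
DEPLOY=/tmp/expressor_demo/SampleDeploymentDirectory
mkdir -p "$PKG" "$DEPLOY"     # create the mock layout
cp -r "$PKG" "$DEPLOY/"       # step 4: paste the package on the Engine system
ls "$DEPLOY"                  # the package folder now sits in the deploy directory
```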




Samples and Solutions

Build and Deploy an expressor Application
This topic pulls all the pieces together and explains how to get from the start, setting up a workspace, to deploying a
finished application on a server running the expressor Engine. Every step in this process is covered in complete detail
in various sections of the expressor documentation, and links are included to relevant sections at different points in
this process. To build this sample application, it is best to install or get access to expressor Repository. If you cannot,
you can follow the steps at the end of the example for deploying the deployment package manually. Either way, you'll
get the experience of building an application and running it in a deployment environment.

      Note: When you are using the free Community Edition Studio, you cannot create deployment packages.
              To create deployment packages, you must have a 30-day trial license or a long-term license for
              expressor Standard Edition or a license for expressor Desktop Edition.

Choosing the Workspace
Deployment Packages can be built in either a Standalone Workspace or a Repository Workspace. If built in a
Standalone Workspace, deployment packages have to be deployed manually. When built in a Repository Workspace,
deployment packages can be stored in a repository and then checked out on the Engine server. There are, of course,
other advantages to using a repository. It provides version control and the ability to share projects, so this topic
creates a Repository Workspace to demonstrate the full capability of the expressor Repository and Engine.

Create a Repository Workspace named SampleRepoWS. For the sake of brevity, create just one Project and no
Libraries. Name the project SampleProject1.

      Note: If you do not have access to Repository, you will have to create a Standalone Workspace. You can
              then proceed with every step of this example until you get to checking into a repository. At that
                point, you will have to deploy according to the manual steps at the end of the example.

Creating Dataflows
With SampleProject1 open in the SampleRepoWS, we can begin creating dataflows. This topic creates two dataflows
for the sake of illustrating how a deployment package handles multiple dataflows. Create the first dataflow and name
it SampleDataflow1.


When the dataflow editing panel opens and the Operators tab is visible in the left panel, move one Read File and one
Write File operator into the editor.




Connect the two operators by placing the cursor on the Read File's output port on the right side of the shape. Then
create a Connection to file-based resources. In the Path for the Connection, simply put a single period (.) because we
are going to include the resource file in the External Files so it will be placed in the directory where the dataflow
application is executed. Name the Connection SampleConnection1.

Next, create a delimited Schema for a file named presidents.dat that will be the data source for this dataflow. The
data source files used in this example will be those in the Studio Sample Workspace. In the Program Files directory on
the local disk where Studio is installed, locate the directory
expressor\expressor3\Studio\samples\studio_sample_workspace. Create copies of the files
presidents.dat, salesOrderDetail.txt, and salesOrderHeaders.txt and place them in a convenient
location to use with this example.

Then, when creating the Schema, read the presidents.dat file into the first New Delimited Schema dialog box.




Name the schema PresidentsFileSchema. This one schema will be used for both the Read File and Write File operators
in SampleDataflow1.

After creating PresidentsFileSchema, open it in the Schema Editor in the center panel by double-clicking its name in
Studio Explorer or right-clicking and selecting Open.







In this example, we will use the Local1 Semantic Type displayed in the Schema Editor rather than create a shared
Composite Type or any shared Atomic Types.




Next, configure the properties for the Read File and Write File operators in SampleDataflow1 according to the
screenshot illustration below:




When you finish setting the properties for the operators, notice that both shapes in the dataflow panel turn white. The
color white indicates that an operator's properties are set and are compatible with the operators to which they are
connected. The operator connected to the white operator can still be yellow. Yellow indicates that it is not configured
completely or compatibly with all the operators it is connected to, but the fact that one of its connected operators is
white indicates that the yellow operator is configured correctly for the link to the white operator.







Now we will create a second dataflow as in the following illustration and name it SampleDataflow2.




Before configuring the properties for each of the operators in SampleDataflow2, create the Schemas and Types
needed in the parameters.

The Connection for this dataflow's resource file will be the same as SampleDataflow1, so we will use
SampleConnection1.

The Schema will be different, however, and we need three of them. Name them:

          OrderDetailSchema

          OrderHeaderSchema

          OrderSummarySchema

Create delimited schemas for the first two, using salesOrderDetail.txt and salesOrderHeaders.txt as the
resource files to read into the schemas on the first page of the New Delimited Schema dialog box.

To create the OrderSummarySchema, simply enter two field names on the first page of the New Delimited Schema
dialog box:

          SalesOrderID

          LineItems

Open the three new schemas. For all three, we will again use a local Composite Type adapted from the one that is
automatically generated for mapping with each schema.

For both OrderDetailSchema and OrderHeaderSchema, add a New Local Type, which will automatically be named
Local2. Remove all the attributes except SalesOrderID. Set the Data Type for SalesOrderID to Integer.







In the OrderSummarySchema, change the Data Type for the two attributes (SalesOrderID and LineItems) from String
to Integer.

Now set the properties and map the transformations in the Rules Editor as each illustration below demonstrates.

Read Header File







Read Details File




Join

When you select the Join operator and its properties display, there will be only one attribute--SalesOrderID--in the
Join keys list.








If you see more than one attribute in the Join keys list, one or both of the Read File operators is configured to use the
Local1 Composite Type. Check the properties of both Read File operators to make sure they are using the Local2
Composite Type. When both Read File operators are using the Local2 Composite Type, the attributes of that Type will
automatically be propagated to the Join operator.







Aggregate

When you select the Aggregate operator, you might notice, as with Join, that there are more keys to select from than
you might expect.




Again, the only key you need is SalesOrderID. You must open the Rules Editor and resolve the input Type. Once that is
done, you can set the output Type by using the Assign Type button or clicking the Assign from Input... link in the
output panel. Be sure to select As a Local Type as the Assignment method because you need to add an attribute to
the new Type. That attribute is LineItems. This attribute will map to the LineItems field in the OrderSummarySchema
when we configure the Write File operators.

To add the LineItems attribute, click the Add button in the Output Attributes section of the ribbon bar. The Add Attribute
dialog box opens:

Name the new attribute "LineItems" and select "Integer" as the Data type.

In the Rules Editor, connect SalesOrderID in the input Type to both SalesOrderID and LineItems in the output Type.
Select "count non-null" from the drop-down menu in the rule box attached to the mapping line to LineItems. That will
serve as the count variable as the application counts the number of line items in each sales order. The Rules Editor
window will now look like:







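Functionally, the Aggregate configuration above counts line items per SalesOrderID; a rough stand-in with awk (the three input rows are hypothetical sample data, not the sales files):

```shell
# Group by the first field (SalesOrderID) and count rows, the same effect
# as the "count non-null" rule mapped to LineItems.
printf '43659\n43659\n43660\n' |
  awk '{count[$1]++} END {for (id in count) print id, count[id]}' | sort
```

For these rows it prints 43659 2 and 43660 1: order 43659 has two line items, order 43660 has one.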

Filter

The Filter operator has Error handling properties. The default setting is Abort Dataflow. Leave that as it is. Also leave
the Show errors box checked. Then double-click the shape in SampleDataflow2 to bring up the Rules Editor.




To write the test for true, click in the box on the mapping line and the Expression Editor pane opens above the
mapping. Delete the word true and write "input.LineItems>10". That will separate sales orders with more than ten
line items from those with ten or fewer.
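The effect of the input.LineItems>10 test can be sketched with awk over hypothetical "SalesOrderID LineItems" pairs; rows where the second field exceeds ten pass the filter:

```shell
# Hypothetical summary rows; the awk condition mirrors the Filter
# expression input.LineItems>10.
printf '43659 12\n43660 3\n43661 25\n' | awk '$2 > 10'
```

Only 43659 12 and 43661 25 are printed; 43660, with three line items, would go to the false (small orders) output.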








Write Large Orders







Write Small Orders




Adding External Files
External Files are added to a project by right-clicking on the artifact folder in the Explorer and selecting Add Files. For
this example, we want to add the three files copied above when creating the schemas:

        presidents.dat

        salesOrderDetail.txt

        salesOrderHeaders.txt

Building the Deployment Package
Create a new Deployment Package with default name "SampleDeploymentPackage1." When the deployment package
opens in the center panel, build the package by adding dataflows and the appropriate external files.


Since we added the only two dataflows in the project, the external files listed in the deployment package are all used
by the two dataflows. It is not necessary to deselect any.







It is worth noting here that at this point, all the dataflows and external files in the deployment package are marked
with a green check. That indicates that they are synchronized with the artifacts listed in the Explorer panel. If you
made a change to SampleDataflow1, for example, the SampleDataflow1 in the deployment package would not
change. The status symbol would remain green until the next time you open the Deployment Package. Then you
would see a yellow triangle indicating that it is no longer the same as the SampleDataflow1 in the Explorer. See
Refresh the Status Symbols.

Save the deployment package by clicking the Save icon in the Quick Access Toolbar.

Putting the Deployment Package in the Repository
If you are not using a Repository Workspace, skip this section and go to Deploying Packages Manually.

To check the deployment package into the Repository, first select the Repository tab on the ribbon bar. Then select
the SampleDeploymentPackage1 in the Explorer. The Commit button will activate. Select Commit or Commit All from
the Commit button drop-down menu. Because SampleProject1.0 has not been checked into the Repository before,
the entire project is checked in. Notice that now the artifacts listed in the Explorer all have status symbols. These
symbols indicate their status in relation to the version of the artifacts in the Repository.




Getting the Deployment Package from the Repository
Now we are ready to take a copy of the SampleDeploymentPackage1 out of the Repository and put it where
expressor Engine can run it.

Go to a system where the Engine is installed.

Open the expressor command prompt from the Windows Start menu.

Enter the eproject command:

      eproject checkout svn://william-47.CompanyB.net:53690/SampleProject1.0/dpp/SampleDeploymentPackage1 SampleDeploymentDirectory

The svn://william-47.CompanyB.net:53690 portion of the command line identifies the system running
Repository. The number 53690 identifies the port on which the system accepts communication with Repository. The
port number 53690 is the default port for the Repository, but it could be different on your Repository system. Of
course, william-47.CompanyB.net is a hypothetical Repository name and system. Substitute your real Repository
and system name.

To identify the deployment package in the Repository, we must specify the project that it is in and the subdirectory
within the project. The project name must include the version number. Look back at the project listing in the Studio
Explorer to see the name of the project with its version number: SampleProject1.0. The subdirectory /dpp is Studio's
standard location for the deployment package.

SampleDeploymentDirectory is hypothetical, but for the remainder of this example, we will use that as the
deployment directory.
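Putting the pieces together, the checkout command can be seen as assembled from its parts; a small sketch that echoes the full command line (host, port, and directory names are the example's hypothetical values):

```shell
REPO="svn://william-47.CompanyB.net:53690"   # Repository host and port
PROJECT="SampleProject1.0"                   # project name including version
PKG="dpp/SampleDeploymentPackage1"           # dpp is Studio's standard location
echo "eproject checkout $REPO/$PROJECT/$PKG SampleDeploymentDirectory"
```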

Now we are ready to run the deployed dataflows.

Deploying Packages Manually
Locate the Workspaces directory on your system. Its location varies according to the platform Studio is installed on.

     C:\Documents and Settings\<user_name>\My
     Documents\expressor\Workspaces (Windows XP)
     C:\Users\<user_name>\Documents\expressor\Workspaces
     (Windows 7)
In the Workspaces directory, open the folder named SampleRepoWS. If you created this example in a different
workspace, open that one. Go to the Metadata\SampleProject1.0\dpp directory in that workspace and copy
SampleDeploymentPackage1.

Paste SampleDeploymentPackage1 into a directory on a system with expressor Engine installed. If you explicitly
installed the Engine when you installed Studio, then you can use the same system on which you built this sample
deployment package as the deployment system.

For the remainder of this example, we will use a directory named SampleDeploymentDirectory as the
deployment directory.

Now we are ready to run the deployed dataflows.

Running Deployed Dataflows
Now that we have deployed our example, we are ready to run it. But we will not be running the deployment package.
The deployment package itself cannot be executed as a program. We will run the individual dataflows in the package.
For this we use the etask command utility in the expressor command prompt. The expressor command prompt
is available on the Windows Start menu.

First, we will list the dataflows in the deployment package to make sure we know the names. For this we issue the
etask command similar to the following:

      etask -L directory_path\SampleDeploymentDirectory\SampleProject1\dpp\SampleDeploymentPackage1

The directory_path\ directs etask to the directory where SampleDeploymentPackage1 has been checked out
from Repository or copied manually. If you manually deployed SampleDeploymentPackage1, it will be in
SampleDeploymentDirectory. But if you checked it out of Repository, the deployment package will be located in
SampleDeploymentDirectory/SampleProject1.0/dpp.

The output of this command is:

      - name: SampleDataflow1, source: SampleProject1, source version: 0
      - name: SampleDataflow2, source: SampleProject1, source version: 0
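Each listing line carries exactly the values the run command needs; a small sketch that parses one line into those arguments (the sed patterns assume the listing format shown above):

```shell
# One line of `etask -L` output, as shown above.
LINE="- name: SampleDataflow1, source: SampleProject1, source version: 0"
NAME=$(printf '%s\n' "$LINE" | sed 's/.*name: \([^,]*\),.*/\1/')
PROJ=$(printf '%s\n' "$LINE" | sed 's/.*source: \([^,]*\),.*/\1/')
VER=$(printf '%s\n' "$LINE" | sed 's/.*version: //')
# Assemble the corresponding run command:
echo "etask -x $NAME -D SampleDeploymentPackage1 -p $PROJ -V $VER"
```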

Finally, we can run the dataflows. We need to know which project the dataflows are in (SampleProject1) and the
version number of the project (0). Here is the etask syntax to run the first dataflow:

      etask -x SampleDataflow1 -D SampleDeploymentPackage1 -p SampleProject1 -V 0

      etask - ETASK-0014-W:
             This Field license for all expressor products will expire on July 15, 2011.
             Upon expiration, this software will cease to operate.
             Please contact expressor to renew this Standard license.
             Thank you for using expressor software.
      <task dataflow="" configuration="(null)">
      <!-- project.home[..\..], project[SampleProject1] -->
       <page id="0" process="4904" run="0" status="0" start="2011-02-25T10:45:56">
      Read File - IN_FILE_OP-0010-I: Input processing complete for file '.\presidents.dat'.
      records in=45, leading records skipped=0, records out=45, records rejected=0
         <interval process="4904" status="0">
          <tool id="1" records="45" status="2">
           <channel id="0" records="45" status="2"/>
          </tool>
          <tool id="3" records="45" status="2">
           <channel id="0" records="45" status="2"/>
          </tool>
         </interval>
       </page>
       <statistic>0.152</statistic>
      </task>





      etask - ETASK-0037-N: Processing has completed successfully.

Voila! "Processing has completed successfully." It worked!

Now run SampleDataflow2:

      etask -x SampleDataflow2 -D SampleDeploymentPackage1 -p SampleProject1 -V 0

      etask - ETASK-0014-W:
             This Field license for all expressor products will expire on March 15, 2011.
             Upon expiration, this software will cease to operate.
             Please contact expressor to renew this Standard license.
             Thank you for using expressor software.
      <task dataflow="" configuration="(null)">
      <!-- project.home[..\..], project[SampleProject1] -->
       <page id="0" process="3420" run="0" status="0" start="2011-02-25T11:01:12">
      Read Details File - IN_FILE_OP-0010-I: Input processing complete for file '.\salesOrderDetail.txt'.
      records in=121,318, leading records skipped=1, records out=121,317, records rejected=0
      Read Header File - IN_FILE_OP-0010-I: Input processing complete for file '.\salesOrderHeaders.txt'.
      records in=31,466, leading records skipped=1, records out=31,465, records rejected=0
         <interval process="3420" status="0">
          <tool id="1" records="31465" status="2">
           <channel id="0" records="31465" status="2"/>
          </tool>
          <tool id="3" records="121317" status="2">
           <channel id="0" records="121317" status="2"/>
          </tool>
          <tool id="9" records="31465" status="2">
           <channel id="0" records="1810" status="2"/>
           <channel id="0" records="29655" status="2"/>
          </tool>
          <tool id="5" records="121317" status="2">
           <channel id="0" records="121317" status="2"/>
         </tool>
         <tool id="7" records="31465" status="2">
          <channel id="0" records="31465" status="2"/>
         </tool>
         <tool id="11" records="1810" status="2">
          <channel id="0" records="1810" status="2"/>
         </tool>
         <tool id="13" records="29655" status="2">
          <channel id="0" records="29655" status="2"/>
         </tool>
        </interval>
       </page>
       <statistic>3.254</statistic>
      </task>
      etask - ETASK-0037-N: Processing has completed successfully.

Wow! It worked again. We have proved that it can be done, and you have done it. Way to go!

If you have any questions about this example, any problems getting it to work, or want help transferring what you
have learned here, visit the expressor Community web site forum to ask questions.


Connect to Data Services with Datascript Operators
Importing the Sample Project and Libraries

Understanding the Project and Library Artifacts

Running the Dataflows

Importing the Sample Project and Libraries
      1.    Open a new or existing Workspace.

      2.    Select Import Projects from the Studio drop-down menu.

      3.    Select the ZIP file option on the Import Projects dialog box.

      4.    Enter or navigate to the following path in the ZIP file text-entry box.

           C:\Program Files\expressor\expressor3\Studio\solutions\WebServiceSampleBundle.zip







    5.   Select all four projects and libraries listed on the next panel of the Import Projects wizard.




    6.   Click Finish to bring the project and libraries into the workspace.




Understanding the Project and Library Artifacts
The Web Services example contains dataflows that demonstrate using expressor to access data and services from
web services. The example contains three Libraries and a Project whose contents are described below.

Google.0 is a library containing Datascript modules for accessing particular Google services. The "Connect" module
contains Datascript functions for establishing a Google session and obtaining a session (or "authorization token").
"Contacts" contains functions for querying contact records in a user's Google contact list. "Geo" contains a function
that uses Google's Geocode service to convert an unstructured address (e.g. "1 Main St, Amherst, MA") into a
multi-part record containing the various address parts (including zip code, latitude, longitude, etc.).
Note that according to their online documentation, Google places a restriction on requests to the Geocode service
limiting usage to 2500 requests per day per requesting IP.

The Google library also contains sample dataflows that demonstrate reading Google contact data and structuring
addresses using Geocode. The key operators for these sample dataflows have been saved as operator templates so
that it is easy for users to reuse these operators in their own dataflows. The Types that these operators operate on,
which are included in the library's list of Composite Types, are 'GoogleContact' and 'GoogleContactStructured'.

Salesforce.0 is a library containing Datascript modules for accessing particular Salesforce.com services. The "Connect"
module includes an "authenticate" function that uses Salesforce.com's SOAP-based API to authenticate a user's
account credentials and obtain a valid session id. The "Bulk" module contains functions for using Salesforce.com's
bulk API. The "Contacts" module contains a "bulkLoadContacts" function that uses several of the "Bulk" package
functions to load Contact record data into a Salesforce.com account's contact list.

The Salesforce library contains a sample dataflow that demonstrates loading Salesforce records from a file. Two of
the operators of that dataflow, including one labeled 'Bulk Load Contact Batch', are saved as operator templates to
facilitate reusing them in your own project or library. The Types that these operators process, namely ContactInput,
ContactInputBatch and ContactBulkLoadStatus, are all included in the library as reusable shared Composite Types.

A third library named 'Master' contains a single 'Contacts' module whose functions serve to mediate between a
master contact record and the two service-specific definitions for Google and Salesforce.com. The Master library
contains a single Contact Composite Type consisting of attributes that are based on Atomic Types from either
the Salesforce or Google libraries (we chose to use Salesforce.com's Atomic Types wherever there was a
semantic overlap between the two). The 'Contacts' module contains one function that takes a
GoogleContactStructured record and converts it to a Master contact record and a second function that takes a Master
contact record and converts it to a Salesforce ContactInput record. Note that this example captures only a subset of
the record content these two web services support, so more content could certainly be added.

The Master library contains no dataflows, but it does contain an operator template for a transform operator labeled
'GoogletoSFDCContact' that makes use of the two conversion functions to translate a Google contact record into a
Salesforce.com contact.

The example also includes a project named GoogleToSalesforce that contains a dataflow that depends on the services
in the modules from the Google and Salesforce libraries. The dataflow named 'MigrateGoogleContactsToSFDC' reads
GoogleContact records from Google, passes their unstructured address fields to Google's Geocode service using the
'GeocodeGoogleContact' operator template, converts them to Salesforce ContactInput records using the Master
library's Contact module, groups those contact records into batches, and submits each batch to Salesforce.com's bulk
load API using the 'BulkLoadContactBatch' operator template from the Salesforce library. Contact records that are
successfully processed are written as comma-separated values records to a 'Success' file while those that encountered
errors are directed to a 'Failures' file.




Running the Dataflows
In order to successfully execute the dataflows in this example, you will need to have your own Google and
Salesforce.com accounts. If you do not already have a Google account, you can register for one here:
http://mail.google.com/mail/signup

If you are a developer, you can register for a Developer Salesforce.com account at:
http://www.developerforce.com/events/regular/registration.php

Your Google account will need to have some contacts in it: use http://www.google.com/contacts to add contacts.

There are two approaches you can take to specify your Google and Salesforce.com credentials to the dataflows. The
first is to directly open the operators that access those services and edit the rule logic where it says either:


          username='your Google account name'







          pwd='your Google password'

or:

          username='your salesforce account name'

          pwd='your salesforce password'

          sectoken='your salesforce security token'

The operators whose rule code you will need to change are labeled 'Read Google Contacts' and 'Bulk Load Contact
Batch'.

A second approach for specifying account credentials, which avoids having to directly edit the rule logic of the
operators, is to create two files on your hard drive that contain the credential information, one called 'Google.eds' and
the other called 'Salesforce.eds'. In both cases, the folder in which to search for these files must be recorded as a full
path (e.g. 'C:\My Documents\web_accounts') in a user environment variable called 'WEBSERVICES_ACCOUNTS'
prior to invoking expressor Studio. The service connection logic in the Google and Salesforce operators will first
attempt to use the credentials that are specified in the operator rule code; if those values are empty, it will look for
the Google.eds and Salesforce.eds files in the folder specified by WEBSERVICES_ACCOUNTS and use those credentials
if the files are available and properly specified. The syntax of these files is identical to the variable assignment syntax
in the operator rule logic shown above. If you decide to use the file-based approach, it is advisable to take
precautions to prevent others from easily reading your account credentials.
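Based on the description above, a hypothetical Google.eds would contain the same assignment lines as the operator rule logic (the values are placeholders, not real credentials):

```
username='your Google account name'
pwd='your Google password'
```

A Salesforce.eds would likewise carry the username, pwd, and sectoken assignments.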

Note that in the Salesforce credentials case, you will need to know your Salesforce.com security token. To have a
Salesforce security token emailed to you by Salesforce.com, follow the instructions at

          http://www.salesforce.com/us/developer/docs/api/Content/sforce_api_concepts_security.htm#topic-
          title_login_token

With those changes, you should be able to run all the dataflows in the various Libraries and the Project.




Using expressor Engine

Running Dataflows with the expressor Engine
The expressor Engine runs both dataflows that have been compiled in Deployment Packages and those compiled
outside Deployment Packages. When dataflows are constructed in expressor Studio, they can be compiled by
running them within Studio or by putting them in Deployment Packages.

Dataflows are normally stored by the expressor Repository as part of the project they were created in. If they have
been compiled in a Deployment Package or by running within Studio, the Deployment Package or compiled version
is also stored with the project. From there they can be checked out to the environment in which they will be run by
the Engine.

Compiled dataflows and Deployment Packages can also be moved manually from the Studio environment in which
they were created. The packages are stored in the
expressor\Workspaces\workspace_name\Metadata\project_name directory. Compiled dataflows are
stored in the dfp directory under the project, and Deployment Packages are stored in the dpp directory under the
project. Copy the folder with the Deployment Package's name and place it in a directory on the system running the
Engine where the etask command can access it.

The Engine uses two commands to retrieve projects from a repository and run their dataflows.

         eproject: Check out a project from a repository so that dataflows can be run by the Engine.

         etask: Run dataflows from a Deployment Package or the project's dataflow directory.




Check Out a Project
When Projects are stored by expressor Repository, use the eproject command to check them out of the
repository so that their dataflows can be run with the etask command.

    1.    Open an expressor Command Prompt window from the Start menu.
          On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

    2.    Type the eproject command with the checkout subcommand:

              eproject checkout URI location

         The URI variable uses an SVN-style syntax for the location of the repository and project containing the
         Deployment Package. For example, svn://hostname:port/project.

         The location variable indicates where the checked out project directory is to be located.
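         The two arguments can be sketched in shell; the host name, port, and project name below are
         assumptions for illustration only:

```shell
# Hypothetical repository details -- substitute your own values.
host="repo.example.com"
port=53690                              # default version control port
project="myProject"

uri="svn://${host}:${port}/${project}"  # SVN-style repository URI
location="./checkout/${project}"        # where the checked out project goes

# Print the eproject invocation these values produce:
echo "eproject checkout $uri $location"
```

         Running the printed command would check the project out into ./checkout/myProject.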







Run a Dataflow in a Deployment Package
Once a project has been checked out of a repository, you run individual Dataflows within a Deployment Package.

      1.    Open an expressor Command Prompt window from the Start menu.
            On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

      2.    In the expressor command prompt window, change your working directory to the project directory.
            If you do not change your working directory to the project directory, you can use the -D argument on the
            etask command line with the path to the project directory.

      3.    List the dataflows in the deployment package:

                etask -L deployment_package_name

           The dataflows listed have names with the following structure:

                  project_name...version_number...dataflow_name

           For example, Proj1...0...Dataflow1

           All of these components are used with the etask command when it executes the dataflow.

      4.    Type the etask command:

                etask -x dataflow_name -p project_name -V version_number
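            The listed name from step 3 maps directly onto these flags. As a sketch (the parsing below is
            illustrative, not part of the product; Proj1...0...Dataflow1 is the example name from step 3):

```shell
# Example entry as printed by "etask -L" (from step 3 above):
name="Proj1...0...Dataflow1"

# Split the name on its "..." separators into the three components:
project="${name%%...*}"   # text before the first "..."
rest="${name#*...}"       # text after the first "..."
version="${rest%%...*}"   # text before the next "..."
dataflow="${rest#*...}"   # remainder

# Print the etask invocation built from the components:
echo "etask -x $dataflow -p $project -V $version"
```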



Run a Compiled Dataflow
Once a project has been checked out of a repository, you can run individual dataflows that stand alone,
independent of a Deployment Package.

      1.    Open an expressor Command Prompt window from the Start menu.
            On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

      2.    In the expressor command prompt window, change your working directory to the project directory.
            If you do not change your working directory to the project directory, you can use the -D argument on the
            etask command line with the path to the project directory.

      3.    Type the etask command:

                etask -x dataflow_name




Managing and Using expressor Repository

Storing Artifacts in the Repository
Artifacts such as Connections, Schemas, Semantic Types, and Deployment Packages can be stored in expressor
Repository if they are in a Repository Workspace. Storing artifacts in a Repository provides version control for the
artifacts and allows them to be easily shared among members of a project team and across an organization.
Administrative tools provide security by setting user access to the Repository and backing up and restoring the
Repository files. Other administrative functions, and some user functions, can be performed with the eproject
command.

When a Repository Workspace is open and connected to a repository, the Repository tab on the Studio ribbon bar is
active. The buttons on this tab allow users with access privileges to the repository to check out, update, and commit
projects, lock artifacts in the repository to prevent changes by other users, and resolve differences in changed
versions of artifacts.

User access privileges are set by the repository administrator. Users then provide their access credentials when they
create Repository Workspaces. Once a user has provided those credentials, they are cached in the workspace and
reused every time the user opens the workspace. After the initial access, users are prompted again for credentials only
if the cached credentials are no longer valid. If the user enters new credentials that are valid, those credentials are
then stored in place of the old credentials.

A user can clear the cached credentials if necessary with the Clear Credentials button on the Repository tab of the
ribbon bar. Because the credentials are stored with the workspace, they will be used for any access to the repository
from that workspace. If the user wants to access the repository as a different repository user, he or she would have to
clear the cache.

If the user is going to share the workspace, he or she might want to clear the credentials out first.

The properties of a workspace's repository can be changed with the Repository Properties button on the
Repository tab of the ribbon bar. The properties a workspace user can change are the host ID and port number. The
repository at the new location specified by the changed properties must be the repository for the current workspace.
If it is not, the change is rejected.




Set User Access to a Repository
After a repository is installed, the repository administrator must set up user accounts with passwords to enable
individual users to create Repository Workspaces and check projects in and out of the repository.

     1.    Locate the conf directory, which contains the passwd file that was set up when Repository was installed.
           The location of the conf directory varies with the different Microsoft Windows platforms.

          Windows 7 and Windows Server 2008:

          C:\ProgramData\expressor\repository\conf

          Windows XP and Windows Server 2003:






           C:\Documents and Settings\All Users\Application Data\expressor\repository\conf

      2.    Open the passwd file in a text editor.

           ### This file is an example password file for evnserve.
           ### Its format is similar to that of evnserve.conf. As shown in the
           ### example below it contains one section labelled [users].
           ### The name and password for each user follow, one account per line.
           [users]
           # harry = harryssecret
           # sally = sallyssecret
           evnserv = test

      3.    Using the "evnserv = test" entry as a model, add "username = password" for each user who will have access
            to the repository.

      4.    Save the passwd file.

      5.    Stop and restart Repository.
            Changes in the passwd file do not take effect until Repository is started again.
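The edit in steps 2 through 4 amounts to appending one "username = password" line per user. A minimal
sketch, working on a local copy of the file (the path and user names are assumptions for illustration; the
real passwd file is in the conf directory located in step 1):

```shell
# Work on a local copy; the real passwd file lives in the conf directory.
passwd_file="./passwd"

# Recreate the stock file contents for illustration:
cat > "$passwd_file" <<'EOF'
[users]
evnserv = test
EOF

# Append one "username = password" entry per user (hypothetical users):
printf '%s = %s\n' "alice" "alicessecret" >> "$passwd_file"
printf '%s = %s\n' "bob" "bobssecret" >> "$passwd_file"

cat "$passwd_file"
```

As step 5 notes, the Repository service must be restarted before the new entries take effect.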


Backup and Restore Repository Data
The expressor ecommand utility is used to back up and restore a repository.

To back up a repository:

      1.    Select repository Stop from the Start menu to stop the repository service.
            On the Start menu, go to expressor>expressor3>repository to find the repository Stop option.

      2.    Open an expressor Command Prompt window from the Start menu.
            On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

      3.    Navigate to the directory where the ecommand is installed.
            By default, expressor Repository installs in the Program Files directory. ecommand is installed in the
            expressor\expressor3\Repository\bin directory.

      4.    Run the ecommand export-repository subcommand to create a zip file of the repository.

                ecommand export-repository -r repository_path -f repository-export.zip

           The zip file is stored in the working directory.

To restore a repository:

      1.    Select repository Stop from the Start menu to stop the repository service.
            On the Start menu, go to expressor>expressor3>repository to find the repository Stop option.

      2.    Open an expressor Command Prompt window from the Start menu.
            On the Start menu, go to expressor>expressor3> to find the expressor command prompt.

      3.    Run the ecommand import-repository subcommand to unzip the repository zip file in the specified
            location.





              ecommand import-repository -r repository_path -f repository-export.zip

Check Artifacts Out of a Repository
   1.    Open an existing Repository Workspace or create a new one in Studio.

   2.    Click the Check Out Projects button under the Repository tab of the ribbon bar.

   3.    Select a project from the Checkout Projects dialog box.
         The Checkout Project Progress dialog box shows the individual artifacts (Dataflows, Connections, Schemas,
         Semantic Types, and External Files) moved from the repository to Studio.


        The checked out project appears in the Studio Explorer panel. Green check marks indicate that the
        checked out version is currently synchronized with the repository version.




        When changes are made to the artifacts, the status indicator changes to show that the artifact, the
        artifact folder, and the project are no longer synchronized with the version in the repository.








Manage Changes to Projects and Artifacts Checked Out of a Repository

         Update the Checked Out Project or Artifact

         Review the Revision History of a Project or Artifact

         Revert to a Previous Version of a Project or Artifact

         Update Status of External Files

         Check a Project or Artifact into a Repository




Update the Checked Out Project or Artifact
When you have a project or artifact checked out of the Repository for an extended period of time, or when you know
other users checked in changes to the project or artifact after you checked it out, you should update the version you
have checked out.

      1.    Select the project or artifact in the Studio Explorer that you want to update.

      2.    Click the Update button on the Repository tab of the ribbon bar and select Update or Update All from the
            drop-down menu.
            The Update option updates only the selected artifact or all artifacts in a selected project. The Update All
            option updates all checked-out projects and artifacts.

      3.    If you have modified the artifact or project you are updating and another user has committed changes to
            that artifact or project, the Update dialog box will indicate there is a conflict.


            Even though there is a conflict, the update completes.




      4.    If there is a conflict, either close the Update dialog box and review the updates or resolve the conflict within
            the Update dialog box.
            See Resolve Conflicts between Versions of Project and Artifacts.







Revert to a Previous Version of a Project or Artifact
   1.    Select the artifact in the Studio Explorer whose version you want to change.

   2.    Click the Revert button on the Repository tab of the ribbon bar and select Revert to Last Revision or Revert
         to Revision... from the drop-down menu.
          If you select the Revert to Last Revision option, the last version checked into the repository replaces
          the version you have checked out.


        If you select the Revert to Revision... option, the Revert to Revision dialog box displays with a list of
        revisions for the artifact. The list shows the author of each revision, the date the revision was
        committed to the repository, and the author's comment on the commit.


        When the reversion is complete, the status symbol on the artifact in Explorer shows that it has been changed.




Check a Project or Artifact into a Repository
   1.    Select the artifact or project in the Studio Explorer you want to check in.

   2.    Click the Commit button on the Repository tab of the ribbon bar and select Commit or Commit All from the
         drop-down menu.
         The Commit option checks in the selected artifact or project. The Commit All option checks in all projects
         and artifacts that have been checked out.
         When the check in is complete, all projects and artifacts that were committed show green check mark status
         symbols in the Explorer to indicate that they are synchronized with the version in the repository.




Review the Revision History of a Project or Artifact
   1.    Select the artifact or project in the Studio Explorer whose revision history you want to view.

   2.    Click the Revision History button on the Repository tab of the ribbon bar.







           The Revision History dialog box displays the versions of the project or artifact with author and
           modification date in the top panel.



      3.    Select a version from the list in the top panel of the Revision History dialog box.


            The changes made to the selected version then appear in the bottom, Details panel of the Revision
            History dialog box. If you are viewing the history of a single artifact, the Details panel displays
            other artifacts that were committed at the same time the selected artifact was committed.


           If artifacts have been deleted, you can restore them by selecting the artifact in the Details panel and clicking
           the Restore Deleted button on the right, above the Details panel.

Update Status of External Files
Unlike changes to other artifacts, changes to External Files are not automatically indicated by the status symbol. When you
change Connections, Schemas, Semantic Types, and Deployment Packages in Studio, the green check mark status
symbol changes to a red circle to indicate that the artifact is no longer the same as the version in the repository. If
you change an external file by, for example, editing it with a text editor, that change is not reflected by a change in
the status symbol. To update the status symbol:

      1.    Select the External Files folder in the Studio Explorer.

      2.    Right-click and select Refresh from the menu.


Lock Artifacts in a Repository
You can lock an artifact you have checked out of a repository so that other users cannot commit changes to that
artifact in the repository.

      1.    Select an artifact in the Explorer.

      2.    Click the Lock button on the Repository tab in the ribbon bar.
            The Lock dialog box displays with the selected artifact listed.

      3.    Enter a comment to explain why you are locking the artifact.






    4.   Confirm that you want to lock the artifact by clicking the Lock button in the dialog box.
         The Lock Progress dialog box displays to indicate that locking is complete or to display an error.

You can remove the lock from the artifact two ways:

        Select the artifact in Explorer and click the Unlock button on the Repository tab in the ribbon bar.

        Commit the artifact to the repository.

You can also break a lock that another user has placed on an artifact that you have checked out of the repository. It
might be necessary to break someone else's lock when you must make an urgent change or when you know the other
user has inadvertently left the artifact in a locked state.

    1.   Select an artifact in the Explorer.

    2.   Click the Unlock button on the Repository tab in the ribbon bar.
         The Unlock Progress dialog box displays to indicate that unlocking is complete or to display an error.




Resolve Conflicts between Versions of Projects or Artifacts
If you attempt to check in (commit) a project or artifact that you have modified and another user has checked in
changes before you, then your version is in conflict with the version committed by the other user. The Studio does
not, however, display any indication of such conflicts until you update or commit the project or artifact in conflict.

Resolve a Conflict after an Update
When you encounter a conflict while updating a project or artifact with the latest version in the repository, you can
resolve it before or after closing the Update dialog box. If you want to examine the changes before deciding how to
resolve the conflict, close the Update dialog box, open the updated project or artifact, and review the changes made
by other users.

To resolve the conflict before closing the Update dialog box:

    1.   Select the Action listed as Conflicted.
          Note that the status symbol for a project or artifact that is in conflict with the version in the repository is a
          yellow triangle.

    2.   Click either the Resolve conflict using mine or Resolve conflict using theirs button at the top of the
         dialog box.
         You can also click the Revision History button at the top of the dialog box to review the version changes
         before making a decision.







           After selecting the type of resolution, the
           Conflicted Action changes to Resolved.




      3.    Close the Update dialog box.
            Note that the status symbol for the project or artifact is now a green check mark.

To resolve the conflict after closing the Update dialog box:

      1.    Select the project or artifact that is in conflict with the repository version. It should have a yellow
            triangle for a status symbol.

      2.    Click either the Resolve Using Mine or Resolve Using Theirs button on the Repository tab of the ribbon
            bar.
            Note that the status symbol for the project or artifact is now a green check mark.

Resolve a Conflict during a Commit

           When you encounter a conflict while attempting to check a project or artifact into the repository,
           the Commit Progress dialog box displays an error message indicating the Commit has failed.


      1.    Select the project or artifact that is in conflict with the repository version. It should have a yellow
            triangle for a status symbol.

      2.    Click either the Resolve Using Mine or Resolve Using Theirs button on the Repository tab of the ribbon
            bar.
            Note that the status symbol for the project or artifact is now a green check mark.




Change Repository Properties
      1.    Select the Repository Properties button on the Repository tab of the ribbon bar.

      2.    Change the host ID, port number, or both.







         When you press OK, the Repository Credentials dialog box prompts for the Username and Password for the
         repository.

    3.    Enter the credentials for the new repository location.

         At this point, the cached credentials are not used because you are connecting to the repository in a new
         location.

    Note: The repository at the new location specified by the changed properties must be the repository for
               the current workspace. If it is not, the change is rejected.




Change Repository Credentials
User access privileges are set by the repository administrator. Users then provide their access credentials when they
create Repository Workspaces. Once a user has provided those credentials, they are cached in the workspace and
reused every time the user opens the workspace. After the initial access, users are prompted again for credentials only
if the cached credentials are no longer valid. If the user enters new credentials that are valid, those credentials are
then stored in place of the old credentials.

Because the credentials are stored with the workspace, they will be used for any access to the repository from that
workspace. If the user wants to access the repository as a different repository user, he or she would have to clear the
cache.

If the user is going to share the workspace, he or she might want to clear the credentials out first.

To change the credentials used for a repository connection:

    1.    Select the Clear Credentials button on the Repository tab of the ribbon bar.
          After the credentials have been cleared, the name displayed in the lower right status pane will be "no user."

    2.    Perform an operation from the Repository tab of the ribbon bar that requires connection to the repository.
          Operations that require a connection are in the Check Out, Changes, Locking, and Conflicts sections of the
          Repository tab.

    3.    Enter the credentials in the Repository Credentials dialog box.
          Once the credentials are validated, the user name entered is displayed in the lower right status pane. And
          those are now the cached credentials.




Command Line Utilities

ecommand
The ecommand utility backs up and restores repositories managed by expressor Repository.

     ecommand [export-repository | import-repository] [-r repository_path]
     [-f zip_file] [-h | -? | --help]

The subcommands and command line flags have the following interpretations.


-h                   Display help information about the command.

export-repository    Export a specified repository to a specified export file.
                     NOTE: The repository must not be running when the export
                     is invoked.

import-repository    Import a previously exported repository export file into
                     an existing repository. CAUTION: The current target
                     repository content will be completely overwritten.

-r                   Identifies the repository to be copied or restored from an
                     export zip file. Default location is the location
                     specified when expressor Repository was installed.

-f                   Identifies the name of the export zip file to which the
                     repository is copied or from which it is copied. Default
                     is repository-export.zip, and the default location is the
                     current working directory.

-h | -? | --help     Display help information about the subcommand
                     (export-repository or import-repository).


eflowsubst
The eflowsubst command creates substitution files that contain substitute values for operator parameters such as
the Connection path, file name, and database credentials used in a dataflow.

     eflowsubst -x dataflow_name -D deployment_package_name [-e]
     [-p project_name -V project_version_number] [-F dataflow_name]
     [-K dataflow_pathname] [-O substitution_file_name] [-h | -?]

The command line flags have the following interpretations.


-x | --xfile             Followed by the name of the dataflow .rpx to be run.
                         Requires the -D option to be set.

-e | --setEncrypted      Encrypts values in the substitution file.

-p | --projectName       The name of the project that contains the dataflow.
                         The Engine can then locate the required deployment
                         package, which is in the dpp subdirectory under the
                         project directory. Must be used with the -V parameter.

-D | --deploymentPath    Locates the dataflow name identified by the -x
                         parameter in the deployment package.

-F | --dataflow          The dataflow from which to construct the substitution
                         file.

-K | --packageFile       Provides an explicit path to the dataflow's .rpx file.

-O | --outputFilename    Override the default name for the substitution file.

-V | --projectVersion    Identifies the version number of the project
                         identified by the -p parameter. Must be used with the
                         -p parameter.

-h | --help | -? | --?   Display help information.




ekill
The ekill command is used to suspend, stop (cleanly terminate), or kill (terminate immediately) a running task.

      ekill [-x dataflow | -p pid] -a [continue | stop | quit | terminate]

The command line flags have the following interpretations.


-x         Followed by the full name of the dataflow,
           project_name.dataflow_name. The project_name does not include the
           dot-version number, such as .0. The dataflow_name is the dataflow
           file name as it appears in Studio.

-p         Followed by the process id of the process. The process id is in the
           log details displayed when etask runs a dataflow. The process id
           can also be obtained from the Windows Task Manager.

-a         Followed by the action to take on the process: continue, stop, quit
           (graceful shutdown), or terminate (hard shutdown). Default is quit;
           terminate always stops an expressor Dataflow.

-h | -?    Display help information.


elicense
The elicense command requests or installs an expressor license.

     elicense -i file | -k license_key | -l | -r file | -v | -x [e,r,s] |
     [-h | -? | --help]

The command line flags have the following interpretations.


-i file            Install a license file received from expressor.

-k license_key     Install a license key received from expressor.

-l                 List information about the installed license.

-r file            Generate host-specific information into file. (The file is
                   sent to expressor, along with the name of the operating
                   system, and expressor returns a long-term license to run
                   the expressor software on that system.)

-v                 Validate an existing license file and display its
                   expiration date.

-x [e,r,s]         Remove the license file. e removes the Engine license; only
                   the user who installed the Engine and the license file may
                   remove the license file. r removes the Repository license.
                   s removes the Studio license.

-h | -? | --help   Display help information.


     Note: The computer being licensed must be able to resolve its host name either through DNS or another
             technique such as the hosts file.


eproject
The eproject command line utility manages a project deployed from the expressor Repository version control
system to a Windows computer running expressor Engine. This command is run from a command/terminal window
on the computer running the Engine.

eproject, which can be run from any command window, includes the following sub-commands.





      eproject checkout URI [location] [-h | -? | --help]

The command line flag has the following interpretation.


URI                URI is an SVN-style entry that specifies the location of
                   the version control system. For the SVN protocol-compliant
                   version control system that is part of the Repository, the
                   URI is svn://hostname:port/deployment_package:

                        hostname is the name, or IP address, of the computer
                         running Repository.

                        port is the TCP/IP port used by the Repository.

                        deployment_package includes the path to and the name
                         of the deployment package to be checked out.

                   The default port assignment for the version control system
                   is 53690.

location           The file system location into which to place a deployment
                   package, or the file system location that contains a
                   deployment package.

                   If no location is specified, the deployment package is
                   downloaded to the directory where eproject was invoked. A
                   directory with the same name as the deployment package name
                   will be created to hold the subdirectories of the
                   deployment package.

                   If location is provided (e.g., ./mydeployedapps), eproject
                   will download the subdirectories of the deployment package
                   directly into mydeployedapps. There will not be a directory
                   with the same name as the deployment package; the directory
                   mydeployedapps will replace the directory that had the name
                   of the deployment package.

-h | -? | --help   Display help information.


Usage details
eproject checkout checks out a deployment package to a computer running the Engine. Since arguments to this
command provide the URI of the Repository and the target file system location, this command can be run from a
command window opened to any file system location, but generally it is the home directory under the Engine
installation.

When checking out a deployment package, the URI and the location arguments work in concert. The URI
identifies the deployment package to be checked out while the location specifies the file system location of the
deployment package directory.

      eproject checkout svn://hostname:53690/myProject/dpp/my_package.0 ./my_package.0

The first my_package.0 is the deployment package to be checked out; the second my_package.0 is the name of the
subdirectory, under the current directory, that will hold the checked out deployment package.






datascript
The datascript command tests an expressor datascript from a command window rather than in the Rules Editor in
expressor Studio. This utility can be run interactively, in which case each line of the script is entered into the
command window, or a script may be contained in a file and executed as a single block of code.

Unlike the Rules Editor, all underlying Lua libraries, functions, and command line options are available to this utility,
which is an extension to the Lua stand-alone interpreter. All datascript functions are also available with the following
exceptions:

          •    The utility.encrypt and utility.decrypt functions do not apply the encryption key, but rather
               convert between text and base-64 representations.

The command line flags have the following interpretations.


-e "stat"               Execute the datascript statement stat. The quotation marks are a required part of the syntax.


-l name                 Load the library name.


-i                      Enter interactive mode after executing any other flags.


-v                      Show version information.


--                      Stop handling options.


-                       Execute stdin and stop handling options.

-h | -? | --help        Display help information.


Usage details
     1.    To start the utility in interactive mode, open an expressor Command Prompt window and issue the
           following command.

     datascript

     2.    Enter each line of the script to be tested.

          •    Pressing Enter before completing a statement is permitted. The interpreter waits until the
               statement is complete before processing.

          •    Preceding a statement with an equal sign causes the interpreter to print the value returned
               by the statement.

          For example, the following statements test the logical expression and print the corresponding
          return value.

                c:\>datascript
                datascript v.i.i copyright (C) 2003-2011 expressor software corporation
                > customer_account_balance=10000
                > print((customer_account_balance>9000 and "Preferred") or "Standard")
                Preferred
                > customer_account_balance=8000
                > print((customer_account_balance>9000 and "Preferred") or "Standard")
                Standard
                >

      3.    Use Ctrl-C, or close the command window, to exit interactive mode.

           Alternatively, the code

                customer_account_balance=10000
                print((customer_account_balance>9000 and "Preferred") or "Standard")
                customer_account_balance=8000
                print((customer_account_balance>9000 and "Preferred") or "Standard")

           could be placed into a file (e.g., my_script.ddx) and executed.

                c:\>datascript my_script.ddx
                Preferred
                Standard
                c:\>

      Note:     Processing in datascript runs as if the datascript.optimize.flag configuration value is set
                 to false. This has an effect on the functionality of the pairs function.

      Note: An expressor best practice is to use the extensions .ddx or .lua when naming an external script
                 file.







etask
The etask command runs a dataflow in expressor Engine. When the dataflow completes successfully, etask
returns zero; if a dataflow encounters an error, etask returns a non-zero value.
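Because the exit status is the success signal, an etask run can be wrapped in a script that branches on it. A minimal sketch, with true standing in for the actual etask invocation so the sketch runs without the Engine (the command in the comment uses the example names from later in this section):

```shell
# Branch on the exit status of a dataflow run. In a real script the function
# body would be an etask command, e.g.:
#   etask -x Dataflow2 -D dpp\DeploymentPackage1 -p Sample_2 -V 0
run_dataflow() {
  true  # stand-in for the etask invocation above
}
if run_dataflow; then
  echo "dataflow completed: exit status 0"
else
  echo "dataflow failed: exit status $?"
fi
```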

     etask -x dataflow [ -d Step_number ] [ -l logfile_name ] [ -a ] [ -f ] [ -g ]
           [ -i interval ] [ -p project_name ] [ -D deployment_package_path ]
           [ -K pathname ] [ -P parameter_name=value ]
           [ -S substitute_parameter_filename ] [ -s Steps ] [ -V version ]

     etask -L deployment_package

     etask [ -h | -? | --help ]

The command line flags have the following interpretations.


-x | --xfile             Followed by the name of the dataflow .rpx to be run. Requires the -D
                         option to be set.

-d | --firstStep         Followed by an integer that indicates the Step in a multi-step
                         dataflow on which to start running the application. Default is 1.

-l | --logfile           Followed by the name of the file into which to write the runtime log.
                         This log is used by other expressor components to monitor performance.
                         To suppress writing of the file, specify nul (Windows) as the output
                         file name.

-a | --onlyLogOperators  Only log information from running operators.

-f | --onlyLastInterval  Only log information from the reporting interval immediately before
                         processing completes.

-g | --debugInfo         Dump detailed debug information for interpretation by expressor
                         Support.

-i | --logInterval       Followed by an integer that sets the log reporting interval (the
                         interval, in seconds, at which entries are written into the log file).
                         Default is 60 seconds and the minimum is 5 seconds.

-p | --projectName       Followed by the name of the project; the Engine can then locate the
                         required deployment package, which is in the dpp subdirectory under
                         the project directory. Must be used with the -V parameter.

-D | --deploymentPath    Locates the dataflow identified by the -x parameter in the deployment
                         package.

-K | --packageFile       Provides an explicit path to the dataflow's .rpx file.

-L | --listFlows         Lists the dataflows in the specified deployment package.

-P | --subEntries        Lists parameter_name=value pairs to be substituted for the named
                         parameters in the dataflow. The parameter values specified with this
                         etask argument take precedence over any other specification of the
                         parameter values. The names of parameters are listed in Supported
                         Parameters.

-s | --steps             Followed by the Steps of the dataflow to run.

-S | --substFiles        Lists the substitution files containing set_names with
                         parameter_name=value pairs to be substituted for the named parameters
                         in the dataflow. The set_names syntax is
                         Step_number@operator_name@property_name=property_value. The @ symbol
                         separates each of the elements required in the specification of the
                         name. If an element itself contains the @ symbol, it must be
                         substituted with the escape value "&at;". The equal sign must also be
                         substituted if it appears in an element name; the escape value for the
                         equal sign is "&eq;". For example,

                         "Step_1@Write&at;File@fileName"=Blammo.txt

                         correctly specifies that the property named "fileName" in the operator
                         named "Write@File" in "Step_1" of the dataflow is to have the property
                         value "Blammo.txt". The operator name "Write@File" is written as
                         "Write&at;File" to distinguish its @ from the @ symbol used as a
                         delimiter in the full specification of the property name.

-V | --projectVersion    Identifies the version number of the project identified by the -p
                         parameter. Must be used with the -p parameter.

-h | -? | --help         Display help information.
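The @ and = escaping can be applied mechanically before assembling a set_name. The helper below is hypothetical (it is not part of etask); it simply shows the substitution with sed:

```shell
# Hypothetical helper (not part of etask): escape the @ and = characters in
# one element of a set_name, per the -S escaping rules ("&at;" and "&eq;").
escape_element() {
  printf '%s' "$1" | sed -e 's/@/\&at;/g' -e 's/=/\&eq;/g'
}
# The operator name "Write@File" becomes "Write&at;File":
escape_element 'Write@File'
echo
```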


Examples
Current working directory is Sample_2.0 and path to deployment package is:

      dpp\DeploymentPackage1

Command syntax to run the dataflow is:

      etask -x Dataflow2 -D dpp\DeploymentPackage1 -p Sample_2 -V 0

To get the name of a dataflow in a deployment package:





     etask -L dpp\DeploymentPackage1

     - name: Dataflow2, source: Sample_2, source version: 0

To execute a dataflow with the -x parameter alone, change directory to the directory containing the executable
dataflow file, which has the .rpx suffix. Starting from the working directory above:

     cd dpp\DeploymentPackage1\Sample_2...0...Dataflow2

     dir
     FolderArtifactDescriptor.xml
     Sample_2...0...Dataflow2.rpx

     etask -x Sample_2...0...Dataflow2





Glossary

                                                           A
Aggregation Rule: A Rules Editor rule that takes a single input parameter to use as the key for grouping records.

Artifact: An object in an expressor application that can be created and managed by the user within the Studio and
         can be reused within a project or library.

Assignment Rule: A Rules Editor rule that simply connects an input attribute to an output attribute without any
         change or transformation.

Atomic Type: A semantic type that defines the type for a single cohesive unit of data that cannot be decomposed
         any further and accordingly is associated with a data type such as a string or integer. In the context of a
         schema mapping, an atomic type defines the type for a composite attribute that maps to a field in a schema.

Attribute: A member of a Composite Type that has a unique name and is associated with a semantic type.

Attribute Mapping: A set of associations defined within the Schema Editor that maps the Schema fields to the
         Attributes within a Composite Type. Attributes are also mapped to other Composite Type Attributes in the
         Transform Editor.

Attribute Propagation: The movement of an Attribute between operators.

Attribute Transfer: The movement of an Attribute from input to output within an operator.



                                                           C
Change Rule: A Rules Editor rule that uses the change function to select records for aggregation instead of
         Aggregate keys.

Composite Type: A semantic type that defines a simple or complex data structure. It contains a set of Attributes, each
         of which is also defined by a semantic type. In the context of a schema mapping, a Composite Type defines
         the type that maps to the top-level record of the schema and may also be used to define the type that maps
         to nested records within a schema.

Connection: An artifact that identifies the location, method of access, and type of resources processed by input and
         output operators in a dataflow.

Constraint: A requirement set on an Atomic Type to qualify data for processing in a dataflow.



                                                           D
Data Processing Engine: The expressor process that executes a Dataflow.

Data Type: A basic data type known to the expressor system such as integer, text string, datetime, decimal, byte,
         double, and Boolean. Each Atomic Type defines a fundamental Data Type for the data it represents.







Dataflow: Graphical representation of an expressor data integration application, constructed with operators that are
         connected by links.

Dataflow Run: An execution of a dataflow. This can occur in the context of development within the expressor Studio
         or in a deployed environment with the expressor Data Processing Engine.

Datascript: A light-weight scripting language used by certain expressor Operators for processing data. The Operators
         that use Datascript are called Datascript Operators.

Datascript Module: A collection of reusable datascript code stored in a single file. Functions within a Datascript
         Module are typically called from within an expressor Datascript Operator to perform common processing
         tasks or to provide the value for an output attribute. They can also be used to perform processing outside of
         expressor, such as initialization or in the context of command operators.

Datascript Operator: An Operator that uses Datascript to define its processing function.

Deployment Package: A collection of dataflows, compiled with their required Connections, Schemas, and Types, to
         be run by the Data Processing Engine.



                                                           E
Engine: See Data Processing Engine

Expression Rule: A Rules Editor rule that assigns an Input Parameter to an Output Parameter with a single Datascript
         expression.



                                                           F
Field: An element in a Record that describes a single unit of data in an external data source, such as a column from a
         database table or field in a file. A Field in a Schema can be mapped to an Attribute whose Semantic Type is
         Atomic.

Function Rule: A Rule written in the Rules Editor that contains a Datascript function.



                                                           I
Input Attribute: The attribute of a Composite Type that has been propagated to a transformation operator.

Input Operator: An operator that reads data into a Dataflow from a source outside of expressor, such as a file,
         database, message queue, or command.

Input Parameter: The input to a rule in the Rules Editor that comes from an operator's Input Attribute.

Iterator: Any datascript construction that iterates over the elements of a collection. For and While statements are
         common iterator constructions.







                                                              L
Library: A collection of Dataflows and other Artifacts that can be used by other Libraries and Projects but cannot be
         deployed independent of a Project.

Library Reference: A reference that a Project makes to a Library to include the Library's artifacts in the Project.
         Libraries can also establish References to other Libraries.

Link: The point-to-point connection between exactly one upstream operator in a Dataflow producing output and one
         downstream operator consuming it.

Local Attribute: An output attribute in the Rules Editor created explicitly by a user or by connecting an output
         attribute to a rule.

Local Semantic Type: A Semantic Type defined for use only within the context in which it is created. See Shared
         Semantic Type

Lookup Expression Rule: An Expression Rule that assigns data to an Output Parameter by using an Input Parameter
         with a Lookup Table.

Lookup Function Rule: A Rules Editor rule that uses a Datascript function to query a Lookup Table.

Lookup Key: An identifier that specifies the attribute(s) used to search data in a Lookup Table.

Lookup Rule: A transformation rule, written in the Rules Editor, that specifies how to look up data in a Lookup Table.

Lookup Table: A database table, accessible to lookup rules, designed to serve a special, limited function within a data
         integration application or group of applications.



                                                              M
Mapping: See Attribute Mapping



                                                              N
Nested Record: The name of a unit of data in a Schema that represents a hierarchical or complex structure of data in
         an external data record such as a nested table from a database table record or a nested record in a COBOL
         record. A Nested Record in a Schema can be bound to an Attribute whose Semantic Type is Composite.



                                                              O
Operator Template: A configured operator saved as a template for use as a preconfigured operator in another
         dataflow or in a different location within the same dataflow.

Operators: Perform the various operations in a Dataflow—input, transform, sort, and output.

Output Attribute: An attribute that has been transferred from an Input Attribute in a transformation operator or
         created explicitly by a user in the Rules Editor.







Output Operator: An operator that takes data from within a Dataflow and sends it to a target outside of expressor
         such as a file, database, message queue, or command.

Output Parameter: A single output from a Rule written in the Rules Editor. An Output Parameter is mapped to an
         Output Attribute.



                                                           P
Project: A collection of Dataflows and other Artifacts in a Workspace.



                                                           R
Range Rule: A Lookup Rule that matches input values with a range of key values.

Record: A set of data elements that is read from a source or written to a target as a unit, such as a row in a database
         or a line of delimited values in a file. Record also refers to the set of data elements described by a Schema or
         Composite Type.

Reference Semantic Type: A Semantic Type created as a non-editable copy of a Local or Shared Semantic Type.

Regular Expression: A syntax for specifying data patterns. expressor uses the POSIX Extended regular expression
         syntax.

Repository: The expressor source-control database used to store Projects and Libraries and their Artifacts.

Repository Workspace: A Studio Workspace whose projects and libraries are shared amongst multiple users through
         the revision control system of the expressor Repository.

Required Attribute: An attribute that is required by a downstream Write operator.

Rule: A Datascript expression or function or an Assignment Statement that generates a value assigned to an output
         attribute in the Rules Editor.



                                                           S
Schema: An Artifact that describes the structure of data that is read or written by expressor’s Input and Output
         operators. The Schema is mapped to a Composite Type that is processed by the expressor Dataflow.

Semantic Type: An Artifact that adds semantic information to data to create a consistent type model for data as it
         flows through an expressor Dataflow. Semantic Types are mapped to the Records and Fields of a Schema in
         the Schema’s mappings. There are two kinds of Semantic Type—Atomic Type and Composite Type.

Shared Semantic Type: A Semantic Type defined as an artifact for reuse that can be used in multiple contexts.

Standalone Workspace: A Studio Workspace whose Projects and Libraries are not shared amongst multiple users
         through the expressor Repository. Useful for prototyping, demonstration projects, and initial development
         prior to sharing through the Repository.







Step: Set of Dataflow operations, including input and output, that is part of a larger Dataflow. A Dataflow can have
         one or more steps that are run sequentially.

Studio: The expressor GUI application that allows users to build, test, and manage Dataflows and the Artifacts used
         by Dataflows.



                                                          T
Transferred Attribute: An output attribute that has been transferred from an input attribute.

Transformer Operator: An operator that transforms data in a dataflow after it has been read in by an Input Operator
         and mapped to specific Composite Types and before it is written by an Output Operator.



                                                         W
Workspace: Within expressor Studio, the highest level organizing container for Projects and Libraries. See also,
         Standalone Workspace and Repository Workspace.





				