Title of Presentation Two Lines Maximum

Document Sample
Title of Presentation Two Lines Maximum Powered By Docstoc
					Informix User Forum 2005        Moving Forward With Informix

             IDS Extensibility
         for Business Advantage

                          Jacques Roy
                     Technical Lead/IBM

       Atlanta, Georgia                 December 8-9, 2005
• What is Database Extensibility?
• Database and Extensibility background
• IDS features
• Extensibility Examples
• DataBlade Modules
• Bladelets
• Considerations for building your own
What is Database Extensibility?
• Ability to add business components in the database
 – Tailor the database to the business environment
• Reduce application complexity
 – Put the processing where it makes the most sense
 – Set processing provided by the database
• Higher performance
 – Less data movement, better data representation, better indexing
• Faster development, lower maintenance cost

            Databases are not commodities!
          Database Background
• Hierarchical/Network models
 – Tied to the physical representation of the data
 – Solves one problem very well
• Relational Model
 – Logical organization of the data
 – Multiple views of the data
• Object-Relational Model (extensibility)
 – Business representation and processing
 – Adapts to the business environment
  Extensibility Background
• Historic Approach
 – user exits
 – device drivers
• Relational Databases
 – triggers, constraints
 – Stored Procedures
• New development platforms
 – Web Server: CGI, NSAPI/ISAPI
 – App Servers: J2EE
• Object-Relational Capabilities
 – Relational Framework

         Databases are not commodities!
  Objects Instantiation Problem

       Branches           ...         ...              10 regions
                                                       100 branches
                                                       10,000 accnts
               ...              ...         ...
Accounts                                               Over 10M objs
                                ...                    And 20M msgs

           At 1KB per object you use 10GB of memory!
         IDS Extensibility Features
• User-defined types         • Complex types
    – Distinct, opaque           – Row, set, multiset, list
•   Table/Type inheritance   •   Polymorphism
•   Functional index         •   User-Defined Function
•   R-Tree index             •   User-Defined Aggregate
•   Smart Large Objects      •   Primary/Secondary
•   More. . .                    Access Methods

    Extensions can be
    written in:
    SPL, C, Java
           Performance Relativity
•   336MHz SPARC  5200MB/sec                                     CPU

•   Memory  1300MB/sec                                           cache

•   System Bus  100MB/sec
                                                             System bus
•   SCSI Bus  20MB/sec
•   SCSI Disk  0.24 to 80MB/sec
                                                         Memory        Disk
    – Track-to-track seek time, rotation latency, etc.               Controller
• Network  1 to 10MB/sec
• Users  0.00001 MB/sec
  (600 char/min, ~120 words/min)
       When to Use UDFs/UDRs
• Eliminate large data transfer
 – Transfer time is much larger than processing time
• Simplify or improve processing
 –   Eliminate table scan
 –   Eliminate application set processing
 –   Define new sorting order
 –   Replace store procedures with aggregates (IDS only)
• Provide business processing to applications
 – Consistency of result
• Eliminate the need for custom interfaces

               Databases are not commodities!
              Better Grouping
• Quarter() function:
 SELECT Quarter(date), SUM(income)
 FROM orders
 WHERE Quarter(date) LIKE '2003Q%‘
• AgeGroup() function:
 SELECT AgeGroup(birthdate, 20, 10)
      SUM(total_price) Total
 FROM customer c, orders o, items I
 WHERE c.customer_num = o.customer_num
 AND       o.order_num = i.order_num
 ORDER BY 1 DESC;            AgeGrouping   Total
                                  40-49    $12646.97
                                  50-59    $ 5507.80
                      Quarter Function
#include <mi.h>

mi_lvarchar *quarter(mi_date date, MI_FPARAM *fparam)
  mi_lvarchar *RetVal;     /* The return value. */
  short    mdy[3];
  mi_integer ret, qt;
  char *pdate, buffer[10];

    ret = rjulmdy(date, mdy); /* Extract month, day, and year from the date */

    qt = (mdy[0] - 1) / 3; /* calculate the quarter */
    sprintf(buffer, "%4dQ%d", mdy[2], qt);
    RetVal = mi_new_var(strlen(buffer));
    mi_set_vardata(RetVal, buffer);

    /* Return the function's return value. */
    return RetVal;
      Compiling and Linking

COMPILE=-I$(INFORMIXDIR)/incl/public -O -c
cc -DMI_SERVBUILD $(COMPILE) quarter.c

ld -G -o mylib.bld quarter.o
chmod a+x mylib.bld (775 or read only to other)
          Creating the function

CREATE FUNCTION quarter(date)
RETURNS varchar(10)
WITH (not variant, parallelizable)
external name "$INFORMIXDIR/extend/class/qlib.bld(quarter)"

GRANT EXECUTE ON FUNCTION quarter (date) TO public;
 Removing the function

DROP FUNCTION quarter(date);
       UDR Processing

Stored Procedures              SQL


UDR                    order           order

                    scan   scan    scan    scan
  Using SPL for Extensions
• Better date manipulation:   CREATE FUNCTION quarter(dt date)
  –   Day of the year         RETURNS integer
                              WITH (NOT VARIANT)
  –   Week of the year
                              RETURN (YEAR(dt) * 100) + 1 +
  –   Week of the month         (MONTH(dt) - 1) / 3;
  –   Quarter                 END FUNCTION;
• Unit conversion
  – Feet  Meters
  – Gallons  Liters          Example:
  – Fahrenheit  Celsius      EXECUTE FUNCTION
• Functional indexes            quarter('9/2/2005');


      See Developer Works Informix zone:
The Node Type

"Hierarchically" aware type: Node
Pre-processed the hierarchical
 relationships                                        1.0
  ƒ ex: Chapter 11, section 7, paragraph 3:
Add new way to relate objects to one
                                               1.1    1.2      1.3
  ƒ IsAncestor(), IsChild(), IsDescendant(),   1.2.1 1.2.2 1.2.3
Can change processing from
 exponential to linear                     
 policies, product classification,                         
 bill-of-material, LDAP, XML, etc.
Node Application Example

    Geo Hierarchy:
     country > state > metro > city
         Policy                            Resource

        Q: What policy apply to the Hyatt in Denver?
        A: A Colorado Policy
            Bill-of-Material Example
                                  CREATE TABLE Components (
 A component can be made up of     ComponentId Node,
        multiple components         Sequence     integer,
                                    Quantity     integer,
 A component is made up of         Name         varchar(30),
        multiple parts              PartNumber   integer

component                         CREATE TABLE Parts (
                                    PartNumber   integer,
                                    Name         varchar(30),
                                    ProviderId   integer,
                                    Price        Money(10,2)
Node performance vs standard
                            Timing Ratios
Levels       Count         Traditional/Node
  2            6                  1.0
  3             42                 1.8
  4            258                 8.4
  5           1554                41.0
  6           9330                37.75
         Databases are not commodities!
     For more information on the
            Node Type
“Downloadable bladelets and demos”


“Open-Source Components for Informix Dynamic
  Server 9.x”
 Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown
 ISBN 0-13-042827-2
                 What is XML?
• XML is a Structured Document Definition
• It Represents a Hierarchy of Elements
• Two Main Types of XML Documents
 – Document and Data
• Two Ways to Process XML Documents
 – Text Retrieval System
 – Explode the XML Document into Columns
                Generating XML
• Converting rows to XML format
 SELECT genxml("customer", customer) FROM customer;
 SELECT genxml("stats",ROW(customer_num, COUNT(*))) FROM
 cust_calls GROUP BY customer_num;
• Creating a complete XML document
 EXECUTE FUNCTION genxmlhdr("customer_set",
      "SELECT * FROM customer");
• User-Defined Aggregate:
 SELECT aggrxml(customer, "customer_set")
 FROM customer;

See: Generating XML from IDS 9.x
     Other XML Possibilities
• Decompose (Shred) an XML document into a table row
• Generate rows from an XML document stored in the database
• Use a generic table to keep track of the element type and
  position in the hierarchy
• Use the XML Path language to extract values from an XML
 –   /table/row/lname
 –   /table/row[1]/lname
 –   /table/row[lname='Pauli']/customer_num
 –   /table/row[@operation='I']/lname
 –   /table/*/lname

        Stay Tuned to DeveloperWorks!
           Fine-Grained Auditing
• Use triggers and user-define routine
• Register event processing
 – Commit or rollback
• Send events to file or outside process
• Use a generic function for any table
• Available in IDS 9.40.xC4 or later

  See the developerworks article:
   Event-driven fined-grained auditing with Informix Dynamic Server
        New Trigger Use

CREATE TRIGGER tab1instrig
    INSERT ON tab1
           Event-Driven Architecture
           Commit/Rollback       6
       5                                           7
                             Callback                        MonitorProgram
      1                                                      8


                             2              4

            Table                               EventTable
 Inheritance and Polymorphism

                  Telco    Retail   Healthcare   Financial
Manufacturing                                    Services

                Clothing   Food      Novelties
Inheritance and Polymorphism (cont.)
   SELECT branch_id, AVG(risk(loans))
    FROM loans
    GROUP BY 1
    HAVING AVG(risk(loans)) > 1

   SELECT branch_id, AVGRISK(loans)
    FROM loans
    GROUP BY 1
    HAVING AVGRISK(loans) > 1
   Replacing Store Procedures
           with UDA
• Business problem: Merger multi-polygon types (ESRI)
• Original Solution:
 – SPL stored procedure (82 lines, 3 SELECT statements, 1 insert
   statement, 2 embedded FOREACH)
• New Solution: User-Defined Aggregate
 – 23 lines of SPL, no SQL, no FOREACH

 SELECT a.user_id, a.case_id, a.case_type,
      do_union(a.shp) shp
 FROM case_shp a
 WHERE user_id = "user0001“
 GROUP BY 1, 2, 3, 4, 5
 INTO TEMP my_temp_table WITH NO LOG;
    User-Defined Aggregate Code
CREATE FUNCTION do_union_iter(state lvarchar, arg lvarchar)
RETURNING lvarchar

 DEFINE retval   lvarchar;

  IF (state IS NULL ) THEN
    RETURN arg ;
    RETURN state ;

  LET retval = state || arg ;
  RETURN retval ;


WITH (ITER=do_union_iter, COMBINE=do_union_iter);
       User-Defined Aggregates
• A lot more flexible than standard aggregate functions
• Can take any type as input
 – ex.: row type
• Can take an initialization parameter of any type
 – ex: row type
• Can return complex types
 – ex: list
          Fabric Classification
• Business Problem:
  Provide an efficient way to select fabrics
• Indexing Colors:
  Cielab coding (3 dimensions) and other attributes
  (Use the R-tree multi-dimensional indexing method)
• Fabric Type Hierarchy
  Requires a hierarchy-aware type
  ex: We want a “natural” fabric
• Fabric Style and Patterns
  ex: What does “Victorian” mean?
              Other examples
• Other multi-dimensional problems:
 – 3D CAD drawings
• Support for Globally Unique Identifier (GUID)
• Implementation of unsigned integer
• Soundex/Phonex
 Key to Successful Extensibility
• Start small, develop your expertise
 – remember the first OO projects
• Use pre-built extensions
• Study examples

• Approaches:
 – Use DataBlade Modules
 – Use Bladelets
 – Build your own
 What are DataBlade Modules?
• Business Components
 – A set of functionality that solves a specific business
• Building Blocks
• Can include:
 – User-defined types
 – User-defined Functions
 – Tables, views
 – Client component
• An DataBlade can come from IBM, a third party
  or be built in-house
      IDS DataBlade Modules
• Spatial                     • MQSeries
 – Free of charge              – Included in IDS 10.0 xC3
 – with IDS 9.30 and higher   • C-ISAM
• Geodetic                    • Image Foundation
• TimeSeries                  • Video Foundation
• NAG                         • Web
• Real-Time Loader            • Excalibur Text
• Large Object Locator        • Excalibur Image
  (LLD)                       • XSLT
 – Comes with IDS 9.x
          Spatial is Everywhere
• Where are my stores located related to my distributors?
• How can I efficiently route my delivery trucks?
• How can I micro-market to customers fitting a particular
  profile near my worst performing store?
• How can I set insurance rates near to flood plain?
• Where are the parcels in the city that are impacted by a
  zoning change?
• Which bank branches do I keep after the merger based
  on my customers locations (among other things)?
         Complex Spatial Example
                                         CREATE TABLE e_Yellow_Pages (
                                           Name        VARCHAR(128)       NOT NULL,
•Raise the level of abstraction at the
                                           Business Business_Type         NOT NULL,
database level.                            Description Document        NOT NULL,
•Buy components to solve common            Location    GeoPoint        NOT NULL,
problems, and build components to          Occupied SET( Period NOT NULL )
achieve a competitive edge               );
                                         -- “Show me available service stations specializing
                                         -- in Porsche brakes within 30 miles of where
                                         -- I am?”
                                          SELECT Y.Name
                                            FROM e_Yellow_Pages Y
                                          WHERE Contains ( Y.Location,
                                                       Circle( :GPS_Loc, „30 Miles‟ ) )
                                              AND Y.Business MATCH “Automotive Service”
                                              AND DocContains( Y.Description,
                                                                   “Porsche AND brakes” )
                                             AND NOT Booked ( Y.Occupied,
                                                       Period (TODAY,TODAY+5));
TimeSeries DataBlade Module
• A time series is a set of data as it varies over time
• TimeSeries DataBlade optimizes storage usage
 – 50% Savings not uncommon
• Optimized Access Time
 – 10 times performance improvement typical
• Calendars
 – Defines period of time that data may or may not be collected
• SQL, Java, and C interfaces
• VTI Interface
 – Makes a time series look like a regular table
• Office Connect Interface
 – Web enables Microsoft Excel plug-in
Who’s Interested in TimeSeries
• Capital Markets
 – Arbitrage opportunities, breakout signals, risk/return
   optimization, portfolio management, VaR calculations,
   simulations, backtesting...
• Telecommunications:
 – Network monitoring, load prediction, blocked calls (lost
   revenue) from load, phone usage, fraud detection and
• Manufacturing:
 – Machinery going out of spec; process sampling and analysis
• Logistics:
 – Location of a fleet (e.g. GPS); route analysis
• Scientific research:
 – Temperature over time...
  The NAG DataBlade Module
• NAG: Numerical Analysis Group
  – over 30 years experience
  – Experts in numerical and statistical computation
  – Reputation for accuracy and performance
• Business functions: convert timeseries data to vectors and
  vectors of returns; present value calculations; ErlangB
  (network) calculations; data export functions
• NAG Fortran functions: correlation and regression analysis;
  variance-covariance matrix generation; optimization;
  eigenvectors, eigenvalues; ...
• Examples:
  – Oil industry: temperature variability down an oil well
  – Finance: calculate volume weighted average price and volatility
  – Spatial routing: Find the shortest path between two points
                     IDS Bladelets
• Located at:
• Bladelets:
  – mrLvarchar, Node, regexp, shape, exec, period, etc.

• For detailed information, see:
"Open-Source Components for Informix Dynamic Server 9.x "
  Jacques Roy, William W. White, Jean T. Anderson, Paul G. Brown
  ISBN 0-13-042827-2
includes: Node, Period, ffvti, exec, shape, sqllib/IUtil, regexp,
                            mrLvarchar, JPGImage
       mrLvarchar and regexp
• Store data efficiently based on its length
 – Document data vary in length
• Many business data set include such documents
 – Web pages, XML documents, product description, articles,
• mrLvarchar stays in row when shorter than 2KB, in
  BLOB otherwise
• Includes useful functions
 – Snip, clip, concat, concatAll, Upper, Lower, Length, Instr,
   regexp functions
• RegExp
 – regexp_match, regexp_replace, regexp_extract, regexp_split
 – Search on mrLvarchar
 – more complete search capabilities than LIKE and MATCHES
• Time periods comparisons based on dates or datetimes
 – Manages information about fixed intervals in a timeline
• Rich comparison functions
 – Equal(), NotEqual(), WithinNotTouches(), Within(),
   ContainsNotTouch(), Contains(), AfterTouches(), BeforeTouches(),
   Overlap(), Before() After()
• R-Tree Indexing
• Can be used in any type of scheduling
 – Hotel room reservations
 – Consultant scheduling
 – Equipment scheduling
• Advantages:
• simpler SQL, Range indexing (R-Tree), proper behavior of a
• Example:
 "Are there any situations where two different trains are scheduled on
   the same track over the next week?"
              New Applications
• Tracking Systems (TimeSeries, Spatial)
 – RTD buses, U-Haul, Cell phones
• Customer Services: closest location
• Security Systems: face recognition, fingerprints,
• Handling hierarchical problems:
 – material types, policies, XML, bill-of-material, etc.

 Building Your Own Extensions
• Use the DataBlade Development Kit (DBDK)
• Learn the DataBlade API
 – Many ESQL/C functions are also part of the DAPI
• Review the latest documentation in the release directory

• For More Information on the DataBlade API:
"Informix Dynamic Server.2000:Server-Side Programming
  in C"
  Jacques Roy ISBN 0-13-013709-X
     Development Environment
 – Standard SPL environment
 – Include directory: $INFORMIXDIR/incl/public
 – Makefile generated by DBDK
 – DBDK Include file for Makefile:
• Java
 – Server configuration: onconfig parameters
   (see release notice and machine notes)
 – $(INFORMIXDIR)/extend/krakatoa/krakatoa.jar
             DBDK Components
• GUI tool, runs only on Windows NT
• Bladesmith
 – Defines/build DataBlade modules
 – Generates functional tests
 – Generates script info for install
• Bladepack
 – Organize distribution
 – Generates installshield script
• Blade manager
 – Stores info in the database
 – install/uninstall DataBlade Modules in the application database
    Programming Environment
• Proprietary Thread implementation
• Non-Preemptible Threads
• Multiple Processes on UNIX
• Multiple Threads on NT
        Threading Restrictions
• Function Libraries
 – Re-entrant
• Signals
• Memory Allocation
• Blocking Calls
 – File Access
           Dynamic Libraries
• Used for Server-Side Functions
• UNIX: dlopen(), dlsym(), dlclose()
• NT: loadLibrary(), GetProcAddress(),
• Symbol Visibility
• IDS 9.x is a framework for business solutions
• You can use building blocks to speed up the
  creation of solutions
• The database can adapt to the business
• IDS is a full fledge partner in your business
• Faster Time-to-Market, Higher Performance,
  Lower Maintenance
              Business Advantage!
         Databases are not commodities!
Informix User Forum 2005        Moving Forward With Informix

                   IDS Extensibility
        for Business Advantage

                          Jacques Roy

       Atlanta, Georgia                 December 8-9, 2005

Shared By: