



                      Perl Extracts Manual

               Magnolia Data Warehouse Starter Kit




                                                        Bruce Linn
                                                     Chris Sullivan
                                                  September 3, 2002




Version 1.01                                              Page 1 of 61




Table of contents:

1 Purpose ..................................................................................................................................... 3
2 Tyler System ............................................................................................................................. 3
2.1    Datafiles ................................................................................................................................ 3
3 Extract Design ........................................................................................................................... 5
3.1    Program Flow ........................................................................................................................ 5
3.2    Components .......................................................................................................................... 6
3.2.1    runall.pl .............................................................................................................................. 6
3.2.1.1 Configuration File............................................................................................................... 6
3.2.1.2 Command Line Options ..................................................................................................... 8
3.2.1.3 FTP .................................................................................................................................... 9
3.2.2    extract.pl ............................................................................................................................ 9
3.2.2.1 Configuration File............................................................................................................... 9
3.2.2.2 Command Line Options ................................................................................................... 11
3.2.3    finddiff.pl .......................................................................................................................... 11
3.2.3.1 Configuration File............................................................................................................. 11
3.2.3.2 Command Line Options ................................................................................................... 13
3.2.4    ExtractCommon.pm ......................................................................................................... 14
3.2.4.1 Configuration File............................................................................................................. 14
4 Environment ............................................................................................................................ 16
4.1    Components and locations .................................................................................................. 16
4.2    Scheduling (cron) ................................................................................................................ 16
4.3    Error Notification.................................................................................................................. 17
4.4    Log Files .............................................................................................................................. 17
4.5    Error Messages ................................................................................................................... 19
4.6    Verification of run ................................................................................................................ 19
4.7    Purge of files ....................................................................................................................... 19
5 Maintenance of Extract Process ............................................................................................. 21
5.1    Abnormally Terminated Process ......................................................................................... 21
5.2    Update of Datafile................................................................................................................ 22
5.3    Addition of New Datafiles .................................................................................................... 22
5.4    Change of FTP site ............................................................................................................. 22
5.5    Change of File Names......................................................................................................... 22
5.6    Change of File Locations .................................................................................................... 23
5.7    Rerunning Extract for Missed Transactions ........................................................................ 23
Appendix ........................................................................................................................................ 25
  Error Messages .......................................................................................................................... 25
  Source Code ............................................................................................................................... 29
     1. runall.pl ............................................................................................................................ 29
     2. extract.pl .......................................................................................................................... 34
     3. finddiff.pl .......................................................................................................................... 39
     4. finddifftrans.pl .................................................................................................................. 44
     5. ExtractCommon.pm ......................................................................................................... 44
  Configuration Files ...................................................................................................................... 49
     1. runall.config ..................................................................................................................... 49
     2. extract.config ................................................................................................................... 52
     3. finddiff.config.................................................................................................................... 58
     4. ExtractCommon.mail ....................................................................................................... 60








                                        Revision History
   Date                Name                              Revision                        Number

  9/3/2002           Bruce Linn                         Initial Version                      1.00

 10/3/2002           Bruce Linn          Provided information on ability to override         1.01
                                          Ok file names for runall.pl. Updates to
                                           runall.pl information, source code and
                                                       configuration file.




1 Purpose
The purpose of this document is to provide a comprehensive guide to the Perl programs developed to
extract data from Magnolia’s Tyler System.

2 Tyler System
The Tyler System is Magnolia’s single system for all of its operations. Tyler is the name of the software
company that created and still maintains the system. The system was developed in Microsoft’s Quick
Basic using its “random files” to store data in an indexed manner. These files are referred to as
datafiles in this document.

2.1   Datafiles
There are many datafiles that make up the entire Tyler System. Eight logical datafiles were identified for
the purposes of the Data Warehouse Starter Kit. The locations of the datafiles are the same on both the
development and production boxes.

Datafile              Physical Locations
Dimension Files
  Branch.info         /home/DSK/DA1/STBIF.dat
  Dept.names          /home/DSK/DA1/TSSADEPT.dat
  Inv.brands          /home/DSK/DA1/TSSAVEND.dat
  Inventory.mast      /home/DK2/DA3/IP01.dat
  Payroll.master      /home/DSK/DA1/STP1.dat
  Product.types       /home/DSK/DA1/STSATYPE.dat
Transaction Files
  Trans.header        /home/DK3/DA4/TP01N.dat
                      /home/DSK/DA1/TP0102.dat
                      /home/DSK/DA1/TP0103.dat
  Trans.detail        /home/DK3/DA4/TP02N.dat
                      /home/DSK/DA1/TP0202.dat
                      /home/DSK/DA1/TP0203.dat

The data dictionary definitions of the datafiles were provided via the Tyler system. See the “flat file to
staging transformations” document for the detailed descriptions of the datafiles. The physical datafiles
are close to, but not exactly the same as, the dictionary definitions. The two biggest differences are:






1       The key defined for the datafile is repeated at the beginning of each line. Therefore, to use the
        data dictionary offsets, you must adjust by the physical size of the key at the beginning of the line.
2       It is possible for multiple logical records to be stored on one physical line within the datafile. This
        appears to be a function of the Quick Basic random files.

One other inconsistency within the datafiles is the manner in which the inventory.mast is stored. All of the
other datafiles are stored in ASCII text and are easily read. The inventory.mast appears to contain a
variety of numbers stored in a “packed” format. After a fair amount of research, I was still unable to
determine how these values are being stored within the datafile. Tyler is working to create a program to
dump the contents of this datafile into a usable format.

Datafiles also have an index file. It is possible for a record to be deleted (logically) in the index file and
still appear in the datafile itself. There is a delete flag that the Perl extract program looks for at the end of
the datafile record. When this delete flag is set to \x00, the record has been logically deleted and the
extract program will not include it in the extract file.
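The record layout described above can be sketched in Perl. This is an illustrative sketch only: the file name, key offset, record length, and the assumption that the delete flag is the last byte of the record are placeholders, not the actual Tyler layout.

```perl
# Illustrative sketch only: the file name, key offset, record length,
# and the trailing one-byte delete flag position are assumptions,
# not the actual Tyler layout.
use strict;
use warnings;

my $key_offset = 6;     # size of the key repeated at the start of each record
my $rec_length = 64;    # physical length of one logical record

open my $fh, '<', 'sample.dat' or die "open: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    # One physical line may hold several logical records.
    for ( my $pos = 0; $pos + $rec_length <= length $line; $pos += $rec_length ) {
        my $rec = substr $line, $pos, $rec_length;
        # Skip logically deleted records (delete flag set to \x00).
        next if substr( $rec, -1 ) eq "\x00";
        # Data dictionary offsets apply after the repeated key.
        print substr( $rec, $key_offset ), "\n";
    }
}
close $fh;
```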








3 Extract Design
All programs of the extract were designed as general-purpose programs. In other words, the programs
were designed to extract data from datafiles in general, not just the eight identified at this point in the
project. To accommodate this design, each of the Perl programs has a companion configuration file. The
configuration files maintain machine dependent information as well as a variety of meta-data describing
the datafiles.

The programs have also been designed with a reasonable amount of program logging. Each program
reports successes as it moves along and errors when it stops. The design includes the ability for the
programs to contact a support person via email in the case of failure.

3.1   Program Flow



[Program flow diagram: (1) runall.pl drives the process; (2) it calls extract.pl to read the Tyler
datafiles and produce extracted data; (3) it calls finddiff.pl to compare the extract against the
prior day's extracted data, identifying changes for the dimension tables; (4) it FTPs the deltas
and transactions to the Best Buy DW staging server for ETL.]


1.      Run the runall.pl program. This begins the entire process and calls the other programs directly.
2.      extract.pl is run by runall.pl. This program will read through the dimension datafiles, extracting
        data as directed, and create a complete dump of each datafile. It will only extract new records for
        each of the transaction files.
3.      finddiff.pl is run by runall.pl. This program will compare the dump of the datafile to the previous
        day’s dump for the dimension tables of the data warehouse.
4.      runall.pl will transfer the delta files and the transaction files to the staging server.






3.2   Components

3.2.1 runall.pl
runall.pl is the Perl program which controls the whole extract process at a high level. To start the entire
process, this is the only program that needs to be run, as it will call the others as needed. The basic
functions it initiates are:
 Extract information from datafiles
 Determine changes to dimension files
 FTP to Best Buy data warehouse staging machine
 Purge old files


3.2.1.1 Configuration File

The default configuration file name for this program is runall.config. This can be overridden with a
command line option. See the Appendix for a complete sample of this configuration file.

There are four different sections to this configuration.
1. Location of the programs that runall.pl will run
2. FTP target server information
3. File ftp information
4. Purge old files


3.2.1.1.1 Command Locations
This section provides the ability to change or move the extract and difference programs.
 extract           - name/location of the extract Perl program
 difference        - name/location of the difference Perl program
 gzip              - directory and program name for gzip utility

Format:
command:extract: cmd=<extract>
command:diff: cmd=<difference>
command:gzip: cmd=<gzip>
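For illustration, a filled-in command section might look like the following; the paths are hypothetical and should be replaced with the actual install locations.

```
command:extract: cmd=/home/bby/bin/extract.pl
command:diff: cmd=/home/bby/bin/finddiff.pl
command:gzip: cmd=/usr/bin/gzip
```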







3.2.1.1.2 FTP Target server
This section provides all of the information necessary for the target FTP server. The target server is the
machine that will receive the files from the Magnolia server. This information can easily be changed as
needed for either environment changes or for testing.

FTP target server information
 host               - DNS or IP of the target server
 userid             - FTP userid on target server
 password           - FTP password for userid on target server
 directory          - destination directory for files on target server
 okfile             - name/location of file that will be sent when file has been ftp'd correctly. This is a
   dummy file that will be sent as a signal for ETL to indicate the delta or transaction file has been sent
   successfully.

Format:
config:target: host=<host> userid=<userid> passwd=<password>
config:target: directory=<directory>
config:target: okfile=<okfile>
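A filled-in target section might look like this; the host, userid, password, and paths are hypothetical values for illustration only.

```
config:target: host=dwstage.bestbuy.example userid=bbyftp passwd=secret
config:target: directory=/stage/magnolia
config:target: okfile=/home/bby/config/dummy.ok
```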


3.2.1.1.3 FTP File Information
This section lists all of the files that are going to be FTP’d. It also includes the name the file will have
when it is transferred to the target server.

File FTP information
 datafile            - common name of datafile this information is for
 source              - name/location of delta or transaction files on Magnolia server
 target              - name of file once placed on BBY data warehouse staging server. In the target
     name, the <YYYYMMDD> will be resolved by the program to the previous date.
 ok                  - By default, the ok file will be named to match the target file with an “.ok” extension.
     This option allows you to override that naming scheme for the ok file. The option was provided
     because it was found that the scheduling system for ETL had issues with ok file names in excess of
     28 characters.
 concat              - The transaction files must be concatenated from three regional files into one for
     FTP. This is the name of the single file that the three will be combined into. This line is optional as it
     should not be included for the dimensional datafiles.

Format:
ftp:<datafile>:        source=<source>
ftp:<datafile>:        target=<target>
ftp:<datafile>:        concat=<concat>
ftp:<datafile>:        ok=<ok>
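As an illustration, a dimension datafile and a transaction datafile might be configured as follows. The file names and paths are hypothetical; note that the concat line appears only for the transaction file, and the ok override is optional.

```
ftp:dept.names: source=/home/bby/deltas/dept.names.delta
ftp:dept.names: target=dept_names_<YYYYMMDD>.dat
ftp:trans.header: source=/home/bby/extracts/trans.header
ftp:trans.header: concat=/home/bby/extracts/trans.header.all
ftp:trans.header: target=trans_header_<YYYYMMDD>.dat
ftp:trans.header: ok=thdr_<YYYYMMDD>.ok
```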







3.2.1.1.4 Purge file information
This section tells the program which files should be purged and on what time frame. The configuration
file tells the program which directories to read and the retention period in days. The program determines
the age of a file by the YYYYMMDD embedded in the file name, not by the system date of the file. For
this reason, the program will only delete files that have a date embedded in the file name.

Purge information
The program will go through any directories listed here and look for files to delete.
 dir               - Name of directory to look for files (i.e. /home/bby/)
 retain            - Number of days to retain files. To retain files for two weeks a value of 14 should be
   used.

Format:
purge:<dir>: retain=<#>
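For example, to keep extract files for two weeks and delta files for one week (the directories are hypothetical):

```
purge:/home/bby/extracts/: retain=14
purge:/home/bby/deltas/: retain=7
```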



3.2.1.2 Command Line Options
There are a few command line options that may be specified when running the runall.pl program. Most
are intended for testing, but some may be useful when fixing an abended run.

The available options are:
Option                            Meaning                            Usage
-c <file>                         Allows user to pass in name of     Testing - Yes
                                  a configuration file to use        Production – could be used to
                                  instead of the default             temporarily change
                                                                     configuration for a special run
-d <YYYYMMDD>                     Allows pass in of a start date     Testing - Yes
                                  for the extract process.           Production – Could be used
                                  Default is to extract for          when extracting for a longer
                                  previous day.                      period than previous day
-q <YYYYMMDD>                     Allows pass in of an end date      Testing - Yes
                                  for the extract process. The       Production – Could be used
                                  end date is not inclusive.         when extracting a specific
                                  Default is the current system      period of dates.
                                  date.
-r <#>                            Limit number of records to be      Testing – Yes
                                  extracted from each datafile       Production – Probably not
-s y                              Only attempt FTP of all files.     Testing – Yes
                                                                     Production – Can be used if
                                                                     FTP fails. This would allow
                                                                     resend without extracting.
-x y                              Allow create of "delete            Testing – Yes
                                  records" by diff program           Production – By definition, no.
                                                                     If it is determined that a
                                                                     “delete” type record is needed,
                                                                     turn this on to see if it creates
                                                                     what you need.
-p y                              Only run file purge logic          Testing – Yes
                                                                     Production – Probably no

Examples:
Run extract with a test configuration file selecting records from 08/08/2002 until current date.




runall.pl –c test.config –d 20020808

Run extract with a limit of 50 records for each data file.
runall.pl –r 50

Run extract only attempting to FTP the files back to staging server
runall.pl –s y


3.2.1.3 FTP
The program will process through all of the files entered into the configuration and send them to the target
server. Once the transfer is complete, the program will do a file size comparison between the source file
on the Magnolia server and the copy on the staging server. If they match, an “ok file” will be sent to tell ETL that the file
has been transferred successfully and can be loaded.
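The transfer-and-verify step can be sketched with the standard Net::FTP module. This is a sketch under assumptions: the host, credentials, directory, and file names are placeholders, and error handling is reduced to die.

```perl
# Sketch of the transfer-and-verify step using the standard Net::FTP
# module. Host, credentials, directory, and file names are placeholders.
use strict;
use warnings;
use Net::FTP;

my $local  = '/home/bby/extracts/dept_names_20020902.dat';
my $remote = 'dept_names_20020902.dat';

my $ftp = Net::FTP->new('dwstage.bestbuy.example') or die "connect: $@";
$ftp->login( 'bbyftp', 'secret' ) or die 'login failed: ', $ftp->message;
$ftp->cwd('/stage/magnolia')      or die 'cwd failed: ',   $ftp->message;
$ftp->put( $local, $remote )      or die 'put failed: ',   $ftp->message;

# Compare the local file size to the size reported by the target server;
# only then send the "ok file" that signals ETL to load.
my $remote_size = $ftp->size($remote);
if ( defined $remote_size && $remote_size == -s $local ) {
    $ftp->put( '/home/bby/config/dummy.ok', "$remote.ok" );
}
$ftp->quit;
```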



3.2.2 extract.pl
extract.pl is the Perl program which will do the extraction of data from the Tyler datafiles. It has been
written as a general purpose Tyler datafile extractor. In other words, the program has been made to
extract almost any Tyler datafile rather than only the specific ones needed at this time. To achieve this
flexibility, a configuration file has been used which will contain data dictionary type information describing
the datafiles to the extract. Going forward, it should be relatively simple to change or add datafiles and
their layouts to the extract by updating the configuration file.

3.2.2.1 Configuration File
The default configuration file name for this program is extract.config. This can be overridden with a
command line option. See the Appendix for a complete sample of the configuration file.

This configuration file has three sections that provide the extract with the necessary information:
1) Datafile lists
2) File level attributes
3) Field level attributes


3.2.2.1.1 Datafile Lists
The first section lists the available datafiles to extract. The datafiles are referred to by their common
name (i.e. dept.names). There is also a “default flag” that indicates whether the datafile should be run as
part of the typical run of the extract program.

Datafile Information
 common name               - name that identifies the datafile (i.e. dept.names, inv.brands, etc.)
 default                   - Indicates whether this datafile should be extracted as part of the normal
   extract execution. “y” = yes, extract it, “n” = no, do not extract it.

Format:
datafiles:<common name>: default=<y|n>
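For example (the datafile names are taken from section 2.1; the default settings shown are hypothetical):

```
datafiles:dept.names: default=y
datafiles:inv.brands: default=y
datafiles:inventory.mast: default=n
```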


3.2.2.1.2 File Level Attributes

The second section is the "config" section. This section describes the necessary attributes of the datafile
at a file level.




   common name                - The datafile that this section is describing.
   keyOffset                  - Each datafile stores its key at the beginning of each line of the file. This
    section of the line will be skipped over when reading in the fields based on offset. The key offset is
    the number of characters of the key that should be skipped.
   recLength                  - Physical length of one logical record in characters (bytes). This is needed
    because the Tyler file system sometimes puts multiple records on one physical line. This tells the
    program where to make the break between logical records.
   type                       - Special purpose flag to indicate if this datafile is a transaction file. This tells
    the extract program to only pull records from the previous day. This is set to “trans” for the
    transaction history and detail datafiles. Leave blank for all others.
   datafile                   - Lists the path and filename of the datafile you wish to extract.
   outFile                    - Lists the path and filename of the extract file to create. This filename will be
    appended with the date (YYYYMMDD)

Format:
<common name>:config: keyOffset=<#> recLength=<#> type=<trans|blank>
<common name>:config: datafile=<physical location of datafile>
<common name>:config: outFile=<physical location to write extract to>
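A filled-in config section for dept.names might look like the following. The datafile path comes from the table in section 2.1; the keyOffset, recLength, and outFile values are hypothetical.

```
dept.names:config: keyOffset=6 recLength=64 type=
dept.names:config: datafile=/home/DSK/DA1/TSSADEPT.dat
dept.names:config: outFile=/home/bby/extracts/dept.names
```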


3.2.2.1.3 Field Level Attributes
The third section lists the field attributes for each datafile. This information can largely be taken directly
from the Tyler data dictionary listings. The section also controls the sequence in which the fields are written to the
extract file as well as some minimal formatting.
 common name                   - The datafile that this section is describing.
 field                         - The name of the field that will be described. The name must be unique
    within the datafile.
 type                          - "s" for string. "n" for numeric. Only use "n" when you want a decimal
    removed from the format of a number. Decimal signs are being stripped from numeric fields at the
    request of the ETL group.
 start                         - Starting position of field within datafile (usually based on the Tyler data
    dictionary definition).
 length                        - Length of field in the file (usually based on the Tyler data dictionary
    definition).
 output                        - This is the order in which the fields will be written out in the extracted file.
    The numbers should start with 0 and increment by 1. If two fields are listed with the same number,
    only one of the two fields will appear in the output file. This sequence is unique to the datafile that is being
    defined.
 format                        - Format mask that will be used when writing out the field. For the purposes
    of this extract, all formats are being written as "strings". The format assures a minimum number of
    characters to be written for the field. It cannot be used to truncate a field (use the start and length to
    do this). If you understate the length of a format, the actual length of the field will override. The
    format of “%15s” will guarantee that for this field a minimum of 15 characters will be printed. If the
    value is less than 15 characters, blanks will be added at the end of the field.
    NOTE: When using the type="n", the length of the field will be one less than the Tyler definition since
    this option strips the decimal character.

Format:
<common name>:<field name>: type=<s|n> start=<#> length=<#> output=<#>
format=%<min length for field>s
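An illustrative pair of field definitions follows; the field names, offsets, and lengths are invented, not the actual Tyler dictionary values.

```
dept.names:dept_number: type=s start=7 length=4 output=0 format=%4s
dept.names:dept_name: type=s start=11 length=30 output=1 format=%30s
```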






3.2.2.2 Command Line Options

There are a few command line options that may be specified when running this program. Most are
intended for testing, but some may be useful when fixing an abended run.

The available options are:
Option                             Meaning                           Usage
-c <file>                          Allows user to pass in name of    Testing - Yes
                                   a configuration file to use       Production – could be used to
                                   instead of the default            temporarily change
                                                                     configuration for a special run
-d <YYYYMMDD>                      Allows pass in of a start date    Testing - Yes
                                   for the extract process.          Production – Could be used
                                   Default is to extract for         when extracting for a longer
                                   previous day.                     period than previous day or for
                                                                     specific dates.
-q <YYYYMMDD>                      Allows pass in of an end date     Testing - Yes
                                   for the extract process. The      Production – Could be used
                                   end date is not inclusive.        when extracting a specific
                                   Default is the current system     period of dates.
                                   date.
-r <#>                             Limit number of records to be     Testing – Yes
                                   extracted from each datafile      Production – Probably not
-e <common name>                   Extract only datafile specified   Testing – Yes
                                   by “common name” (i.e.            Production – Can be used if
                                   dept.names, inventory.master,     an abend causes a specific file
                                   etc.)                             to need to be recreated.

Examples:
Run extract with a test configuration file selecting records from 08/08/2002 until current date.
extract.pl –c test.config –d 20020808

Run extract with a limit of 50 records for each data file.
extract.pl –r 50

Run extract for only the dept.names datafile.
extract.pl –e dept.names


3.2.3 finddiff.pl
finddiff.pl is the program that determines on a daily basis what, if anything, changed in the dimensional
datafiles. Only additions or changes to the dimension datafiles will be sent to ETL.

The program determines the additions and changes by comparing the complete extract, provided by
extract.pl, with the previous day’s complete extract.


3.2.3.1 Configuration File
The default configuration file name for this program is finddiff.config. This can be overridden with a
command line option. See the Appendix for a complete sample of the configuration file.

There are two sections to this configuration.
1. Location of the programs that finddiff.pl will run
2. File information


3.2.3.1.1 Command Locations
This section specifies the location of the gzip utility, so the utility can be changed or moved.
•  gzip              - directory and program name for the gzip utility

Format:
command:gzip: cmd=<gzip>


3.2.3.1.2 File Information
This section defines some basic file information that the difference program will use. Each datafile extract
to process must be defined in this configuration file.
•  common name                  - The datafile that this section is describing.
•  input                        - The path and file base name of the current day’s extract file. The
   programs assume the date (YYYYMMDD) is embedded in the file name.
•  output                       - The path and file base name to write any additions or changes identified
   by the difference program. The file name will be appended with the date (YYYYMMDD).
•  keylength                    - The length of the key for the data contained in the file. It is assumed that
   the key is the first field(s) of the file. The key is used to determine whether a record is new or
   changed.
•  default                      - Datafiles listed as default=y will be automatically processed when the
   program is run.

Format:
datafiles:<common name>: input=<Location and base name of extract file>
datafiles:<common name>: output=<Base name of difference file>
datafiles:<common name>: keylength=<#> default=<y|n>
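
For example, a dimension entry following this format might look like the following (the paths and key length here are illustrative, not the shipped configuration):

```
datafiles:dept.names: input=/home/bby/files/source.dept.names
datafiles:dept.names: output=/home/bby/files/delta.dept.names
datafiles:dept.names: keylength=4 default=y
```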


3.2.3.2 Command Line Options
There are a few command line options that may be specified when running this program. Most are
intended for testing, but some may be useful when fixing an abended run.

The available options are:
Option                             Meaning                              Usage
-c <file>                          Allows user to pass in name of       Testing - Yes
                                   a configuration file to use          Production – could be used to
                                   instead of the default               temporarily change
                                                                        configuration for a special run
-e <common name>                   Run difference on only datafile      Testing – Yes
                                   specified by “common name”           Production – Can be used if
                                   (i.e. dept.names,                    an abend causes a specific file
                                   inventory.master, etc.)              to need to be reprocessed.
-x y                               Creates “delete” records in          Testing – Yes
                                   delta files                          Production – By definition No,
                                                                        but if it is determined at some
                                                                        point in the future that deletes
                                                                        should be captured as well,
                                                                        then this could be turned on to
                                                                        create this type of record (puts
                                                                        a “D” in the rec_typ_cd).

Examples:
Run difference with a test configuration file.
finddiff.pl –c test.config

Run difference for only the dept.names datafile.
finddiff.pl –e dept.names


3.2.4 ExtractCommon.pm
ExtractCommon.pm is a Perl module that contains several subroutines that are used by all of the Perl
programs. A Perl module is a means to share code between Perl programs. This Perl module contains
the following functions:
•      Error notification
•      Date calculation
•      Date validation
•      Configuration file loading
•      Email sending
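
To illustrate the date calculation and validation functions, the following Python sketch shows the kind of checks involved. The actual ExtractCommon.pm implementation is in Perl and is not reproduced in this manual; the function names here are hypothetical.

```python
# Illustrative Python sketch of YYYYMMDD validation and previous-day
# calculation, the sort of work ExtractCommon.pm performs in Perl.
from datetime import date, timedelta

def valid_yyyymmdd(s):
    """True if s is an 8-digit string naming a real calendar date."""
    if len(s) != 8 or not s.isdigit():
        return False
    try:
        date(int(s[:4]), int(s[4:6]), int(s[6:8]))
        return True
    except ValueError:
        return False

def previous_day(s):
    """Given a YYYYMMDD string, return the prior day in the same format."""
    d = date(int(s[:4]), int(s[4:6]), int(s[6:8])) - timedelta(days=1)
    return d.strftime("%Y%m%d")
```

For example, valid_yyyymmdd("20020229") is false (2002 is not a leap year), and previous_day("20020901") returns "20020831".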

The sending of emails may not be feasible due to technical difficulties in setting up the Magnolia servers
as SMTP servers. At this time, the ability to send emails has been programmed and tested as much as
possible, but a full test cannot be completed without the SMTP server installation. This section of the
program will need to be thoroughly tested if at some point in the future SMTP is enabled.


3.2.4.1 Configuration File
ExtractCommon.pm has a configuration file that contains information required to send emails when the
process fails or completes. There are two sections to the configuration file. The first contains
miscellaneous attributes needed for the setup. The second section contains the email addresses to
which any messages from the process will be sent.


3.2.4.1.1 Miscellaneous Information
At the time of this documentation, it is still uncertain whether the extract process will send
email messages. The Perl programs have been written with some basic support for emailing
messages, but at this time it has not been completely tested.

•   enabled                  - Is SMTP mail available on this server (y or n)? If set to "n", no attempt will be
    made to send emails (and therefore no error messages about the inability to send them). This flag
    has been included as it is probable that the extract process will not send emails.
•   address                  - Default email address that email will be sent from. This will be used as
    the return address.

Format:
default:mail: enabled=<y|n>
default:from: address=<email address>




Version 1.01                                                                             Page 14 of 61
                                                              Magnolia Data Warehouse Starter Kit
                                                                             Perl Extracts Manual



3.2.4.1.2 Email Addresses
This section of the file lists the email addresses to which the notifications will be sent.

•   address                     - An individual address to which the emails generated by the extract
    process are sent.
•   type                        - Whether the address is included on the "to" or "copy" line of the
    message. Values: “to” or “copy”.
•   name                        - Optional. Not used at this point, but useful for keeping track of whom
    the email address belongs to if it is not obvious.

Format:
email:<address>: type=<to|copy> name=<name>
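
A filled-in example of this format might look like the following (the addresses and names are placeholders, not actual recipients):

```
email:dwadmin@magnolia.example: type=to name=DataWarehouseAdmin
email:etl.oncall@magnolia.example: type=copy name=ETLOnCall
```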




4 Environment

4.1   Components and locations
This section lists the critical components of the extract as a whole. The locations of the programs,
configuration files, and generated files are all configurable through the configuration file. The locations
listed here represent how they were initially configured.
Magnolia production machine (168.94.140.50):
•      Perl 5.6.1                      - /opt/perl/bin/perl
•      Additional Perl Modules
          - libnet-1.12                 - Provides Perl support for FTP
          - Mail-Sender-0.8.00          - Provides Perl support for Mail
•      Programs / Configuration Files:
          - runall.pl                   - /home/bby
          - runall.config               - /home/bby
          - extract.pl                  - /home/bby
          - extract.config              - /home/bby
          - finddiff.pl                 - /home/bby
          - finddiff.config             - /home/bby
          - ExtractCommon.pm            - /home/bby
          - ExtractCommon.mail          - /home/bby
          - ok.file                     - /home/bby
•      Generated Files:
          - Dimension files             - /home/bby/files
          - Transaction files           - /home/bby/files
          - Delta files                 - /home/bby/files
          - FTP location                - sent to dsd02x

Magnolia development machine (168.94.140.51):
•    Perl 5.6.1                  - /opt/perl/bin/perl
•    Additional Perl Modules
        - libnet-1.12              - Provides Perl support for FTP
        - Mail-Sender-0.8.00       - Provides Perl support for Mail
•    Programs / Configuration Files:
        - runall.pl                - /home/bby
        - runall.config            - /home/bby
        - extract.pl               - /home/bby
        - extract.config           - /home/bby
        - finddiff.pl              - /home/bby
        - finddiff.config          - /home/bby
        - ok.file                  - /home/bby
•    Generated Files:
        - Dimension files          - /home/bby/files
        - Transaction files        - /home/bby/files
        - Delta files              - /home/bby/files
        - FTP location             - /home/bby/testftp


4.2   Scheduling (cron)
The extract program will be scheduled through the existing UNIX cron scheduler. Chris Sullivan of
Magnolia will maintain this schedule. runall.pl is the only program that needs to be scheduled and
run, as it calls the other programs as needed. The program should be scheduled to run once per day
after midnight. The program will always attempt to extract transaction records for the previous
calendar day.

The command line used to schedule the run should redirect standard out (STDOUT) and standard
error (STDERR) to log files. The log files provide a variety of information to indicate success or failure
of a given run.

Recommended command line:
/opt/perl/bin/perl –I /home/bby /home/bby/runall.pl 1>>/home/bby/runall.out
2>>/home/bby/runall.err

Once it is deemed that the system is stable, it may be desirable to replace the “>>” (append to end of file)
with “>” (replace file) to keep the logs a more manageable size.
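
Assuming the recommended command line above, a crontab entry to launch the run daily at 1:00 a.m. might look like this (the time slot is illustrative; the actual schedule is maintained by Magnolia):

```
0 1 * * * /opt/perl/bin/perl -I /home/bby /home/bby/runall.pl 1>>/home/bby/runall.out 2>>/home/bby/runall.err
```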

Frequency:       once per day (every day of the year)
Time:            Sometime after midnight when system is essentially inactive.
Length of run: Less than 30 minutes. (Test runs on the Magnolia development server are averaging
less than 40 minutes. The production machine’s hardware is vastly superior to the development machine,
so ultimately the extract process may take less than 10 minutes to run.)


4.3   Error Notification
The Perl programs will use two means of error notification when an abnormal termination of the program
occurs:
•     Email
•     Program Log Files

In the case of any error, the programs will write the error message to the log files. The programs can also
send email messages to the mail addresses listed in the mail configuration file. This email will contain the
error message generated by the program. As stated earlier, email functionality may not be part of the
initial release of these programs.

4.4   Log Files
The extract process creates two basic log files. Most messages will be written to standard out
(STDOUT) while a few may go to standard error (STDERR). These two logs are captured through the
command line that is submitted by the cron scheduler.
The STDOUT log file contains information pertaining to the progress of the run. This is also the file where
any errors recognized by the application will be written.
Example of a successful run:

(The banner at the start of the run lists the date that the extract is running for and the actual date
and time it began. The "Extracting" lines list the datafiles extracted and the date and time each
extract started.)

###############################################################
###Daily process begun for 20020906.      20020907 06:00:02
####################################################
./extract.pl: Beginning Extracts: 20020907 06:00:03

Extracting   branch.info.   20020907 06:00:03
Extracting   dept.names.   20020907 06:00:04
Extracting   inv.brands.   20020907 06:00:04
Extracting   inventory.mast.   20020907 06:00:04
Extracting   payroll.master.   20020907 06:00:13
Extracting   product.types.   20020907 06:00:22
Extracting   tp0102.   20020907 06:00:23
Extracting   tp0103.   20020907 06:00:37
Extracting   tp01n.   20020907 06:00:37
Extracting   tp0202.   20020907 06:11:36
Extracting   tp0203.   20020907 06:12:17
Extracting   tp02n.   20020907 06:12:17



Extract Complete.   20020907 06:37:40
####################################################

####################################################
./finddiff.pl: Beginning Diff: 20020907 06:37:41

(The lines below list the files being compared by the difference program and give a simple report of
its results.)

Comparing: /home/bby/files/source.inv.brands.20020906 <->
 /home/bby/files/source.inv.brands.20020905. 20020907 06:37:41
matched=441     add=0   change=0        delete=0

Comparing: /home/bby/files/source.branch.info.20020906 <->
 /home/bby/files/source.branch.info.20020905. 20020907 06:37:42
matched=51      add=0   change=0        delete=0

Comparing: /home/bby/files/source.product.types.20020906 <->
 /home/bby/files/source.product.types.20020905. 20020907 06:37:42
matched=705     add=0   change=0        delete=0

Comparing: /home/bby/files/source.dept.names.20020906 <->
 /home/bby/files/source.dept.names.20020905. 20020907 06:37:42
matched=57      add=0   change=0        delete=0

Comparing: /home/bby/files/source.inventory.mast.20020906 <->
 /home/bby/files/source.inventory.mast.20020905. 20020907 06:37:43
matched=2000    add=0   change=0        delete=0

Comparing: /home/bby/files/source.payroll.master.20020906 <->
 /home/bby/files/source.payroll.master.20020905. 20020907 06:37:45
matched=1450    add=0   change=0        delete=0


Diff Processing Complete.   20020907 06:37:48
####################################################

(The FTP section below shows the files transferred.)

Begin FTP:    /home/bby/files/trans.detail.20020906. 20020907 06:37:48
Complete FTP: /home/bby/files/trans.detail.20020906. 20020907 06:37:49
Begin FTP:    /home/bby/files/trans.header.20020906. 20020907 06:37:49
Complete FTP: /home/bby/files/trans.header.20020906. 20020907 06:37:49
Begin FTP:    /home/bby/files/delta.dept.names.20020906. 20020907 06:37:49
Complete FTP: /home/bby/files/delta.dept.names.20020906. 20020907 06:37:50
Begin FTP:    /home/bby/files/delta.inventory.mast.20020906. 20020907 06:37:50
Complete FTP: /home/bby/files/delta.inventory.mast.20020906. 20020907 06:37:50
Begin FTP:    /home/bby/files/delta.payroll.master.20020906. 20020907 06:37:50
Complete FTP: /home/bby/files/delta.payroll.master.20020906. 20020907 06:37:51
Begin FTP:    /home/bby/files/delta.inv.brands.20020906. 20020907 06:37:51
Complete FTP: /home/bby/files/delta.inv.brands.20020906. 20020907 06:37:51
Begin FTP:    /home/bby/files/delta.branch.info.20020906. 20020907 06:37:51
Complete FTP: /home/bby/files/delta.branch.info.20020906. 20020907 06:37:52
Begin FTP:    /home/bby/files/delta.product.types.20020906. 20020907 06:37:52
Complete FTP: /home/bby/files/delta.product.types.20020906. 20020907 06:37:52
Daily Extract Process Complete: 20020907 06:37:52
###############################################################
###############################################################

(The double banner above is printed when the entire process has completed.)

Example of an unsuccessful run: (In this example, I ran the extract a second time for the same day.)


(In the run below, extract.pl reports that a file it needs to create already exists; the name of the
file is on the next line. The final error, from runall.pl, indicates the run failed.)

###############################################################
###Daily process begun for 20020910.   20020911 08:10:47
####################################################

./extract.pl: Beginning Extracts: 20020911 08:10:49

Extracting branch.info.   20020911 08:10:49
ERROR: [./extract.pl]: 20020911 08:10:49: Output file already exists:
/home/bby/files/source.branch.info.20020910
ERROR: [/home/bby/runall.pl]: 20020911 08:10:50: Execution failed. "./extract.pl" 65280

The STDERR log file will contain system errors and some Perl warnings. If an unusual problem occurs
while running the extracts, this log may contain information to assist in identifying the problem. Below is
an example where finddiff.pl tried to run the gzip utility, but the configuration file specified the
wrong location of the utility.

Can't exec "gzip": No such file or directory at ./finddiff.pl line 206.

Below is another example of warnings from Perl that can appear in STDERR. These particular
messages should not cause any concern and can be ignored.

substr outside of string at ./extract.pl line 190, <GEN7> line 6502.
Use of uninitialized value in sprintf at ./extract.pl line 212, <GEN7> line 6502.




4.5   Error Messages
The Perl programs will generate a wide variety of error messages when a problem occurs. The error
messages provide the reason the program failed, the date and time it failed, and the program in which
the error occurred. It will take a little more investigation to determine what caused the error to occur.
Review Appendix A for a complete list of error messages and possible causes.

4.6   Verification of run
The best way to verify that the run of the extract programs has been successful is to review the STDOUT
log. By looking at this log, it should be easy to determine whether:
•  The program ran on the expected date
•  The program ran to completion with no error messages
•  The program extracted the expected datafiles
•  The program successfully compared the expected datafiles
•  The program successfully FTP’d the delta and transaction files to the expected location

See example and explanation of logs under Log Files.

4.7   Purge of files
The runall.pl program has been programmed to delete all files generated as part of the extract process. It
bases this delete on the date (YYYYMMDD) embedded in the file names, not the date the file was
actually created. Based on the directories and retention period specified in runall.config, the program
will attempt to delete any files within the directory whose embedded date matches the calculated
date (based on the retention length). The retention period has been initially set at 14 days.
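
The date arithmetic behind the purge can be sketched as follows. This is an illustrative Python rendering, not the actual Perl; it assumes the embedded date is the final dot-separated component of the file name, and it treats anything on or before the cutoff as expired, which is an assumption about how the match is applied.

```python
# Illustrative sketch (Python, not the actual Perl) of purging by the
# date embedded in a file name rather than the file's creation date.
import re
from datetime import date, timedelta

def expired(filenames, today, retention_days=14):
    """Return names whose trailing YYYYMMDD is on or before the cutoff."""
    cutoff = (today - timedelta(days=retention_days)).strftime("%Y%m%d")
    trailing_date = re.compile(r"\.(\d{8})$")
    old = []
    for name in filenames:
        m = trailing_date.search(name)
        if m and m.group(1) <= cutoff:   # string compare works for YYYYMMDD
            old.append(name)
    return old

files = ["source.dept.names.20020801", "source.dept.names.20020906"]
old = expired(files, date(2002, 9, 7))   # cutoff is 20020824
# old -> ["source.dept.names.20020801"]
```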




5 Maintenance of Extract Process

5.1    Abnormally Terminated Process
Situation                 Action
Process stopped           •  Determine and correct the cause of failure. During testing, the only
during extract phase of      time this happened was due to incorrect path names in the
process                      configuration file or due to the existence of a file the program
                             wanted to create.
                          •  At this point in the process, it is probably just as easy to restart the
                             whole process.
                          To restart in this scenario:
                          •  Go to the directory where the source and transaction files are created.
                             Delete or move any files that were created by the aborted run.
                             These should have the previous day’s date embedded in the file
                             name. (To delete them, there is a little utility named cleanup.pl that
                             can be used.)
                          •  Go to the directory with runall.pl
                          •  Type runall.pl 1>>runall.out 2>>runall.err
                          •  This will start the entire process over again. Watch the runall.out
                             file for progress of the job. (tail –f runall.out)
Process stopped           •  Determine and correct the cause of failure. During testing, the only
during difference            time this happened was due to incorrect path names in the
phase of processing          configuration file, due to the existence of a file the program wanted
                             to create, or when there was no prior day’s source file.
                          To restart in this scenario:
                          •  Go to the directory where the delta files are created. Delete or move
                             any delta files that were created by the aborted run. These should
                             have the previous day’s date embedded in the file name. (To delete
                             them, there is a little utility named cleanup.pl that can be used.)
                          •  Go to the directory with finddiff.pl and runall.pl
                          •  Type finddiff.pl 1>>runall.out 2>>runall.err
                          •  This will re-run the difference process for all dimension files. Check
                             the runall.out log to verify the process was successful. If successful,
                             continue; otherwise, go back to determining what is causing the
                             problem.
                          •  Type runall.pl –s y 1>>runall.out 2>>runall.err
                          •  This tells the program to FTP the files without re-running the
                             extract and difference processes.
FTP of files failed       •  If the process failed only on the FTP, the FTP can be reprocessed
                             once the original problem is fixed (network down, change of login,
                             etc.)
                          •  To reprocess the FTP, log in to the production machine and change
                             to the directory with runall.pl
                          •  Enter: runall.pl –s y 1>>runall.out 2>>runall.err
                          •  This tells runall.pl to only attempt to FTP the files without re-running
                             the extracts or differences.





5.2    Update of Datafile
Tasks                 Action
extract.config         •  Update the entries for the datafile in the appropriate sections of the
                          configuration file. Pay special attention to the file and field attributes
                          sections. These attributes control how the datafile will be read and parsed.
                       •  Run tests in development, verifying results.
finddiff.config        •  If this is a dimensional table, this file will only require an update if the
                          length of the key was changed.


5.3    Addition of New Datafiles
Tasks                 Action
extract.config         •  Add entries for the datafile to all three sections of the configuration file.
                          Pay special attention to the file and field attributes sections. These
                          attributes control how the datafile will be read and parsed.
                       •  Run tests in development, verifying results.
finddiff.config        •  If this is a dimensional table, it should be added to this file. Enter
                          items as instructed in the configuration file header.
                       •  For the initial run of this file, a dummy file must be created to
                          simulate the previous day’s file. Typically, this will be an empty file.
                          This will be necessary in test as well as in production upon the first
                          run of the process.
runall.config          •  Add entries to this configuration file to configure FTP for the file.
                       •  The standard naming convention for the target file name is <target
                          staging table name><YYYYMMDD>.dat


5.4    Change of FTP site
Item                  Action
Host                  Update runall.config with new host name or IP address
Userid                Update runall.config with new userid
Password              Update runall.config with new password
Target Directory      Update runall.config with new directory name

5.5    Change of File Names
Type of file          Action
Datafile              Update the extract.config file with the new filename of the datafiles
Generated Files
  Transaction Files   Update extract.config and runall.config files with new names
  Dimension Files     Update extract.config, finddiff.config and runall.config files with the new
                      names
   Dimension Delta    Update finddiff.config and runall.config files with the new names.
   Files
   FTP Target Files   Update runall.config with the new names.





5.6   Change of File Locations
Type of file              Action
Datafile                  Update the extract.config file with the new file location(s) of the datafiles
Generated Files
  Transaction Files       Update extract.config and runall.config files with new locations
  Dimension Source        Update extract.config and finddiff.config files with the new locations
  Files
  Dimension Delta         Update finddiff.config and runall.config files with the new locations
  Files
  FTP Target Files        Update runall.config with the new names of the file when sent to the
                          target FTP server.


5.7   Rerunning Extract for Missed Transactions
The extract process is dependent on a manual process kicked off by the warehouse staff at Magnolia.
There is a commitment to have this process completed by 6:30 a.m., but in the first week of extracting, we
have had one occurrence where the job did not complete by this time. The following instructions should
be followed if it is determined that the records need to be loaded to the data warehouse. The process is
entirely manually driven.

The example is rerunning files for 10/11. I reran the extract portion on 10/13. The dates on the files are
important in getting this to run.

Step                   Action
1. Re-extract          Re-extract transaction files using "extract.rerun" for the configuration file for the
transactions for day   day. The “extract.rerun” configuration will prevent the extract from overwriting
                       any current-day files.
                       extract.pl   -c   extract.rerun   -e   tp01n -d 20021011 -q 20021012
                       extract.pl   -c   extract.rerun   -e   tp0102 -d 20021011 -q 20021012
                       extract.pl   -c   extract.rerun   -e   tp0103 -d 20021011 -q 20021012
                       extract.pl   -c   extract.rerun   -e   tp02n -d 20021011 -q 20021012
                       extract.pl   -c   extract.rerun   -e   tp0202 -d 20021011 -q 20021012
                       extract.pl   -c   extract.rerun   -e   tp0203 -d 20021011 -q 20021012
2. Combine             Combine the region header files into one file.
transaction header     cat trans.header.n.rerun.20021013 trans.header.02.rerun.20021013
                       trans.header.03.rerun.20021013 > trans.header.rerun.20021011
files
3. Combine             Combine the region detail files into one file.
transaction detail     cat trans.detail.n.rerun.20021013 trans.detail.02.rerun.20021013
                       trans.detail.03.rerun.20021013 > trans.detail.rerun.20021011
files
4. Locate original     Check that the original extract files for the day in question are available. They
transaction files      should be located at /home/bbydev/files. If the files have been zipped (end with
                       .gz), use “gzip –d” to unzip them.
                       In this example the files to look for are trans.header.20021011 and
                       trans.detail.20021011
5. Find differences    Use the program finddifftrans.pl to identify the differences between the original
                       and newly extracted files. In the command, the –n is the new file while the –o is
                       the original file.

                        finddifftrans.pl -e trans.header -n files/trans.header.rerun.20021011 -o \
                            files/trans.header.20021011

                        finddifftrans.pl -e trans.detail -n files/trans.detail.rerun.20021011 -o \
                            files/trans.detail.20021011

6. Review          The program determines whether each record is a change or an add. ETL does
differences        not know how to handle changes for transactions, so you will probably need
                   to remove the change records from the files before sending them to ETL. If
                   need be, the changes can be applied manually in the data warehouse.

                   The changes all end with a "C" at the end of the record. So, to remove them
                   from the files, go to the files directory and use the following commands to create
                   a new file of only add records.

                   grep -v "C$" delta.trans.detail.rerun.20021013 > \
                       trans.detail.rerun.adds.20021011

                   grep -v "C$" delta.trans.header.rerun.20021013 > \
                       trans.header.rerun.adds.20021011



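The grep commands above keep only the add records. A quick accounting check can confirm that adds plus changes cover every delta record. This sketch uses tiny made-up sample files (delta.sample, adds.sample) rather than the real file names:

```shell
# Tiny sample delta file: three records, one flagged as a change
# (records ending with "C" are changes; the rest are adds).
printf 'rec1|A\nrec2|C\nrec3|A\n' > delta.sample

# Keep only the add records, exactly as in step 6.
grep -v "C$" delta.sample > adds.sample

# Every delta record should be accounted for as an add or a change.
total=$(grep -c '' delta.sample)      # 3 records
changes=$(grep -c 'C$' delta.sample)  # 1 change
adds=$(grep -c '' adds.sample)        # 2 adds
[ $((adds + changes)) -eq "$total" ] && echo "counts reconcile"
```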
7. Send files to   Work with ETL to determine where to place these records. In the example this
ETL                procedure is based on, the re-extracted transactions were combined with the
                   current day's records and processed together by ETL. This is likely to be
                   the most common scenario in production as well. Pay special attention to
                   the current day's run so that the process does not send the day's file to
                   ETL before these records can be added.

Appendix
Error Messages
Each program uses a common error-logging format. Application error messages are logged to STDOUT.

The format is:
ERROR: [<program name>]: <Date Time of Error>: <Error message>

Example:
ERROR: [./extract.pl]: 20020906 12:00:02: Output file already exists: /home/bby/files/source.branch.info.20020905
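Because all application errors share this format and go to STDOUT, a captured log can be scanned mechanically. A sketch, using a hypothetical log file name (extract.sample.log):

```shell
# Sample log fragment in the documented format (file name is made up).
printf '%s\n' \
    'Extracting branch.info.  12:00:01' \
    'ERROR: [./extract.pl]: 20020906 12:00:02: Output file already exists' \
    > extract.sample.log

# Pull out only the application error lines.
grep '^ERROR:' extract.sample.log
```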

Program      Error Message                                        Action
All          ERROR: [<program name>]: <date time>: Error          Attempt to send email failed. Review messages and logs to try and determine cause
             sending email(1): <Msg>                              of error. Note, this message will only appear in logs as it is highlighting a problem
                                                                  with email.
All          ERROR: [<program name>]: <date time>: Error          Attempt to send email failed. This version of the message occurs when “copy”
             sending email(2): <Msg>                              recipients are sent to the function. Review messages and logs to try and determine
                                                                  cause of error. Note, this message will only appear in logs as it is highlighting a
                                                                  problem with email.
extract.pl   Bad Usage                                            An invalid command line parameter has been passed to the program.
                                                                  Verify command line parameters. Check documentation for valid command line
                                                                  parameters.
extract.pl   Configuration file does not exist: <configuration    Program was unable to open the listed configuration file. Verify configuration file
             file>                                                exists in the expected directory.
extract.pl   Could not open input file. <file name>               File name listed in error message could not be found.
                                                                  Verify directory and file name is correct in configuration file.
extract.pl   Could not open output file. <file name>              The extract file name listed could not be created and opened.
                                                                  Verify file name is a valid file name.
                                                                  Look for additional messages in log as to why file could not be opened.
extract.pl   Datafile requested not defined in config: <datafile> Datafile specified on command line is not in configuration file. Verify in configuration
                                                                  file the name of the datafile. Add datafile attributes if necessary.
extract.pl    Date must be in YYYYMMDD format. <date>                Date passed in on command line for transaction extract is not in an expected format.
                                                                     Verify date is in correct format and not prior to 1995/01/01.
extract.pl    Output file already exists: <file name>                The extract file name listed already exists.
                                                                     Verify same file name is not used multiple times in configuration file.
                                                                     If this is a restart of the program, you must delete the extract files for the day before
                                                                     re-running them.
                                                                     Verify through log files that cron has not accidentally run the extracts twice in a day.
                                                                     Verify that the cron has not changed. The programs expect to be run after midnight
                                                                     and then extract the previous day's data.
finddiff.pl   Bad Usage                                              An invalid command line parameter has been passed to the program.
                                                                     Verify command line parameters. Check documentation for valid command line
                                                                     parameters.
finddiff.pl   Configuration file does not exist: <configuration      Program was unable to open the listed configuration file. Verify configuration file
              file>                                                  exists in the expected directory.
finddiff.pl   Could not open input file. <file name>                 Verify the file exists. If it does not, determine why (e.g., did extract.pl
                                                                     complete successfully? Is the file name in the configuration file correct?)
finddiff.pl   Datafile requested not defined in config file: <file   Datafile specified on command line is not in configuration file. Verify in configuration
              name>                                                  file the name of the datafile. Add datafile attributes if necessary.
finddiff.pl   Input file is empty . <file name>                      The file listed has no records in it. The difference program expects there to be
                                                                     records in each dimension table extract each night. If the file is empty, look for
                                                                     messages from the extract program to try and determine why the file might be empty.
                                                                     Re-run extract to create file correctly.
finddiff.pl   Input file not found. <file name>                      The file specified in the error message does not exist.
                                                                        Verify whether the file exists. If not, verify the extract completed successfully. Verify default
                                                                     files in extract configuration file are the same as the difference configuration file.
finddiff.pl   Logic error - array handling                           If message is ever hit, there is something significantly wrong in the Perl program. Get
                                                                     a developer to look at the issue.
finddiff.pl   Logic Error checking keys                              If message is ever hit, there is something significantly wrong in the Perl program. Get
                                                                     a developer to look at the issue.
finddiff.pl   Sort of <file name> failed. <OS error>                 The system sort utility failed in attempting to sort the specified file. Look in the logs
                                                                     for additional messages.
finddiff.pl   Unable to write output file. <file name>                 The difference file name listed could not be created and opened.
                                                                       Verify file name is a valid file name.
                                                                       Look for additional messages in log as to why file could not be opened.
runall.pl     Bad Usage                                                An invalid command line parameter has been passed to the program.
                                                                       Verify command line parameters. Check documentation for valid command line
                                                                       parameters.
runall.pl     Cannot cd to <directory name>.                           Program is attempting to change to the directory.
                                                                       Verify directory exists. If so, problem is likely in program.
runall.pl     Configuration file does not exist: <configuration        Program was unable to open the listed configuration file. Verify configuration file
              file>                                                    exists in the expected directory.
runall.pl     Could not delete <file list>                             Program could not delete listed file(s) during purge processing. Determine if file can
                                                                       be manually deleted and remove manually.
runall.pl     Could not open input file. <file name>                   Verify the file exists. If it does not, determine why (e.g., did extract.pl
                                                                       complete successfully?)
runall.pl     Date must be in YYYYMMDD format. <date>                  Date passed in on command line for transaction extract is not in an expected format.
                                                                       Verify date is in correct format and not prior to 1995/01/01.
runall.pl     Error sending file: <source file> -> <target file>       Look for additional messages in STDOUT and STDERR. Verify source file exists.
                                                                       Verify Target file is valid name. Verify there is space on target machine.
runall.pl     Execution failed. "<command submitted>" <OS              This is typically a secondary message that indicates either extract.pl or finddiff.pl has
              return code>                                             failed. Look for more specific message from those programs.
runall.pl     Failed to concatenate files:<command                     Program could not concatenate files listed. Look for a warning message from system
              submitted>:<OS return code>                              in STDOUT and STDERR.
                                                                       Verify files in command exist.
                                                                       Verify there is enough disk space to create file.
runall.pl     File for concat does not exist: <file name>              Verify extract.pl has successfully completed.
                                                                       Verify "concat" in runall.config is correct.
runall.pl     Size validation error on ftp of file <source file>       Size of file FTP'd is not the same on source machine as target. Verify source and
                                                                       target sizes. Verify runall.pl is doing a binary transfer. If new target machine, verify
                                                                       that file sizes are computed in the same manner.
runall.pl     Unable to change to ftp directory: <directory>           Verify directory is correct for target server. If not, update runall.config with the
                                                                       appropriate directory.
runall.pl   Unable to connect to ftp host <host name>          Verify ftp is working correctly for the host using the information in the runall.config. If
                                                               all is working correctly, attempt resend of files using "runall.pl -s y" option.

runall.pl   Unable to create ok file: <target>                 Program is not able to create "ok file" on target server. Verify "okfile" attribute is
                                                               correct in runall.config. Verify dummy ok file exists on source machine.

runall.pl   Unable to login to ftp host: userid=<userid>       Verify ftp is working correctly for the host using the information in the runall.config. If
            passwd=<password>                                  all is working correctly, attempt resend of files using "runall.pl -s y" option.

Source Code

1. runall.pl
#!/opt/perl/bin/perl -w
##########################################################################
# Magnolia Data Warehouse Extract
#
# program: runall.pl
# purpose: This program controls the high level processing for the overall
#           extract process. After all files have been created, the program
#           will ftp them to the location from the configuration file.
#           As a final step, the program will purge old working files.
#
# required: runall.config - file containing configuration information
#           for each file to be sent. This file will contain the IP and
#           userid information necessary to send the files.
#
# Maintenance History:
# 8/27/02, Bruce Linn, Incline Technologies.
#
##########################################################################

use strict;
use IO::File;
use Time::localtime;
use Getopt::Std;
use Net::FTP;
use ExtractCommon;
autoflush STDOUT 1;

print "###############################################################\n";
print "###Daily process begun for " . getDate("","-1"). ".   " . getDate("time")."\n";

# change base directory to directory program was run from
my $dir = $0;
$dir =~ s/runall\.pl//;
if ($dir) {
    chdir $dir or rptError("Cannot cd to $dir.");
}

# Useful command line options
# -c <file> = allows user to pass in name of config file to use
# -d <YYYYMMDD> = Allows pass in of a start date for the extract process
# -q <YYYYMMDD> = Allows pass in of an end date for the extract process
# -r <#> = Allows limit on number of records extracted per datafile
# -s y = Only attempt ftp of all files.
# -x y = Allow create of "delete records" by diff program (default is off)
# -p y = run purge only
my %opts;
getopt('cdqrsxp',\%opts) or rptError("Bad Usage");   # 'p' listed so -p takes its argument
my $configFile    = $opts{c};
my $startDate     = $opts{d};
my $endDate       = $opts{q};
my $recLimit      = $opts{r};
my $sendOnly      = $opts{s};
my $createDelete = $opts{x};
my $purgeOnly     = $opts{p};

# perform some rudimentary checks that the date is a date
if ($startDate) {
    checkDate($startDate)
}
if ($endDate) {
    checkDate($endDate)
}

# Load configuration file information into config hash
my $configRef = loadConfig("./runall.config",$configFile);
my %config = %$configRef;

# if "sendonly" flag set or "purgeonly" flag, skip extract and diff steps
if ((!$sendOnly || $sendOnly eq "n") && !$purgeOnly) {

    ## Run all extracts
    my $cmd = $config{"command"}{"extract"}{"cmd"};

    # Add other flags if they are passed in
    $cmd .= " -d $startDate" if $startDate;
    $cmd .= " -q $endDate" if $endDate;
    $cmd .= " -r $recLimit" if $recLimit;

    system($cmd) == 0 or
             rptError("Execution failed. \"$cmd\" $?");

    ## Run all file compares

    $cmd = $config{"command"}{"diff"}{"cmd"};
    $cmd .= " -x $createDelete" if $createDelete;

    system($cmd) == 0 or
             rptError("Execution failed. \"$cmd\" $?");
}

## FTP files

# ftp the files unless the purgeonly flag is set.
ftpFiles(%config) unless $purgeOnly;

cleanupFiles(%config);

print "Daily Extract Process Complete: " . getDate("time") . "\n";
print "###############################################################\n";
print "###############################################################\n";

sub ftpFiles {
    my %config = @_;

    my %concat;

    # Process through files that need concatenation. At this point,
    # there is a need to combine the transaction files from the various
    # regions into one file.
    for my $datafile (keys %{ $config{"ftp"} }) {
        next unless $config{"ftp"}{$datafile}{"concat"};

       # Append the date to the source file names in hash
       $concat{ $config{"ftp"}{$datafile}{"concat"} }
              { $config{"ftp"}{$datafile}{"source"} } =
              $config{"ftp"}{$datafile}{"source"} .".". getDate("","-1");

       # Replace wild card with date in ftp target file names
       $concat{ $config{"ftp"}{$datafile}{"concat"} }{"target"} =
           nameReplace($config{"ftp"}{$datafile}{"target"});

        $concat{ $config{"ftp"}{$datafile}{"concat"} }{"ok"} =
            nameReplace($config{"ftp"}{$datafile}{"ok"});
    }


    # Create concatenation commands and execute
    for my $newfile (keys %concat) {

        my $list = "";
        for my $source (keys %{ $concat{$newfile} }) {
            next if $source eq "target";
            next if $source eq "ok";

            # Verify file to concatenate exists.
            rptError("File for concat does not exist: " .
                "$concat{$newfile}{$source}")
                unless -e $concat{$newfile}{$source};

            # add file to list of files to concatenate
            $list .= $concat{$newfile}{$source} . " ";
        }

        # Create cat command.
        my $cmd = "cat " . $list . " > " . $newfile .".". getDate("","-1");

        # Execute cat command
        system($cmd) == 0
            or rptError("Failed to concatenate files:\n$cmd:\n$?");

        # FTP file to target server.
        sendFile( $newfile .".". getDate("","-1"),
                  $concat{$newfile}{"target"},
                  $concat{$newfile}{"ok"},
                  %config);

        # zip up files from previous day
        $cmd = $config{"command"}{"gzip"}{"cmd"} ." -f     ".
                   $newfile .".". getDate("","-2") ;

        system($cmd);
    }

    # Process through files that do not need concatenation
    for my $datafile (keys %{ $config{"ftp"} }) {
        next if $config{"ftp"}{$datafile}{"concat"};

        sendFile($config{"ftp"}{$datafile}{"source"} .".". getDate("","-1"),
                 nameReplace($config{"ftp"}{$datafile}{"target"}),
                 nameReplace($config{"ftp"}{$datafile}{"ok"}),
                 %config);

        # zip up files from previous day
        my $cmd = $config{"command"}{"gzip"}{"cmd"} ." -f ".
                  $config{"ftp"}{$datafile}{"source"} ."." . getDate("","-2");
        system($cmd);
    }
}

########################################################################
#
# Makes connection and sends (FTP) a file to desired location
#
# Input: source - Name of file to send
#         target - Name of file when placed on target system
#         config - Hash containing configuration information including
#                  the machine, directory, userid, & password that the
#                  program should ftp to.
#
########################################################################
sub sendFile {
    my ($source,
        $target,
        $ok,
        %config) = @_;


    print "Begin FTP:    $source.   " . getDate("time") . "\n";

    my $ftp = Net::FTP->new($config{"config"}{"target"}{"host"}, Debug => 0) or
         rptError("Unable to connect to ftp host: " .
             $config{"config"}{"target"}{"host"});

    $ftp->login($config{"config"}{"target"}{"userid"},
                  $config{"config"}{"target"}{"passwd"}) or
              rptError("Unable to login to ftp host:" .
                   " userid=".$config{"config"}{"target"}{"userid"} .
                   " passwd=".$config{"config"}{"target"}{"passwd"});

    $ftp->cwd($config{"config"}{"target"}{"directory"}) or
             rptError("Unable to change to ftp directory:" .
                  $config{"config"}{"target"}{"directory"});
    $ftp->binary;

    $ftp->put($source,$target) or
              rptError("Error sending file: $source -> $target");
    my $sizeTarget = $ftp->size($target);

    # Verify, through file sizes, that the file transported correctly.    This
    # is only effective if using a binary transfer.
    my $sizeSource = -s $source;
    if ($sizeSource == $sizeTarget) {
        my $oktarget = "";
        if ($ok) {
           $oktarget = $ok;
        } else {
           ($oktarget = $target) =~ s/\.dat$/\.ok/;
        }

        $ftp->put($config{"config"}{"target"}{"okfile"}, $oktarget) or
            rptError("Unable to create ok file: $target");
    } else {
        rptError("Size validation error on ftp of file $source." .
            "\nSource size=$sizeSource, Target size=$sizeTarget");
    }

    $ftp->quit;
    print "Complete FTP: $source.   " . getDate("time") . "\n";

}


########################################################################
# Replaces "wildcard" strings in file names from configuration info.
# At this point the only replacement is for the current date
#
# Input: filename
# Output: filename with replacements.
########################################################################
sub nameReplace {
    my $file = shift;

    # Get today's date
    my $today = getDate("","-1");

    # Search string for "<YYYYMMDD>" replacing it with today's date
    $file =~ s/<YYYYMMDD>/$today/;

    return $file
}

sub cleanupFiles {
    my %config = @_;

    for my $dir ( keys % { $config{"purge"} }) {

        my $purgeDate = getDate("",   $config{"purge"}{$dir}{"retain"} * -1);

        opendir(my $purgedir, $dir) or next;   # skip unreadable directories
        my @filelist = grep {/$purgeDate/} map {"$dir$_"} readdir($purgedir);
        closedir $purgedir;

        # delete every file matching the purge date
        if (@filelist) {
            my @failed = grep {not unlink} @filelist;
            rptError("Could not delete @failed\n") if @failed;
        }
    }
}

2. extract.pl
#!/opt/perl/bin/perl -w
##########################################################################
# Magnolia Data Warehouse Extract
#
# program: extract.pl
# purpose: This program will extract the necessary data from the
#           raw Tyler system datafiles that will ultimately be passed to
#           the BBY data warehouse. The program will extract data from
#           both lookup tables as well as transaction tables. All data
#           from the lookup tables will be extracted each day. A separate
#           program will determine if there are any changes in these files.
#           The transaction files will be pulled based on date. Typically,
#           only records for the previous calendar day will be extracted.
#
# required: extract.config (or some other specified config file - file
#           containing configuration information
#           for each file to be processed. This file contains data dictionary
#           like information to describe the Tyler system datafiles.
#           This program then uses this information to complete the extract.
#           It should be possible to begin new datafile extracts by
#           modifying only this configuration file.
#           This configuration file should be stored in the same location
#           as the program file.
#
# Maintenance History:
# 8/15/02, Bruce Linn, Incline Technologies.
#
##########################################################################


use   strict;
use   IO::File;
use   Getopt::Std;
use   Time::localtime;
use   ExtractCommon;

autoflush STDOUT 1;

print "####################################################\n";
print "$0: Beginning Extracts: " . getDate("time") . "\n\n";

# For testing (and potentially re-extract of file), allow user to
# pass in the datafile they wish to extract (i.e. branch.info).
# Also allow means to limit data returned (for testing).
# Provide means of passing starting date for transaction file records.
my %opts;
getopt('cderq',\%opts) or rptError("Bad Usage");
my $startDate   = $opts{d};
my $endDate     = $opts{q};
my $oneDatafile = $opts{e};
my $recLimit    = $opts{r};
my $configFile = $opts{c};

# perform some rudimentary checks that the date is a date
if ($startDate) {
    checkDate($startDate)
}
if ($endDate) {
    checkDate($endDate)
}


# Load configuration file information into config hash
my $configRef = loadConfig("./extract.config",$configFile);
my %config = %$configRef;

# if a datafile was passed in on the command line, process only that
# datafile, otherwise process all datafiles.

if ($oneDatafile) {
   extract($oneDatafile, $startDate, $endDate, %config);

} else {

    for my $datafile ( sort keys %{ $config{"datafiles"} }) {
       if ($config{"datafiles"}{$datafile}{"default"} eq "y") {
          extract($datafile, $startDate, $endDate, %config);
       }
    }
}

print "\nExtract Complete.   " . getDate("time") . "\n";
print "####################################################\n\n";

sub extract {

    my ($datafile,
        $startDate,
        $endDate,
        %config) = @_;

    my $count = 0;

    # Verify that this datafile is defined in configuration information.
    if (!defined $config{$datafile}{"config"}{"datafile"} ) {
       rptError("Datafile requested not defined in config: $datafile");
    }

    # if this is a transaction type file, determine the base date to use when
    # extracting records. If a date is passed in on the command line, use
    # that date, otherwise use the previous day.
    # Also determine the limit (or end date). If it is passed in, use that one
    # otherwise use the current system date.
    my $baseDate;
    my $limitDate;
    if ($config{$datafile}{"config"}{"type"}) {
       if ($startDate) {
          $baseDate = $startDate;
       } else {
          $baseDate = getDate("","-1");
       }

        if ($endDate) {
           $limitDate = $endDate;
        } else {
           $limitDate = getDate();
        }

    }

    print "Extracting $datafile.     " . getDate("time") . "\n";

    # Open Tyler system datafile for reading.
    my $IFILE = $config{$datafile}{"config"}{"datafile"};

my $if = IO::File->new($IFILE,"r")
   or rptError("Could not open input file. $IFILE");

# Assemble output file names from configuration information.
my $OFILE = $config{$datafile}{"config"}{"outFile"} . getDate("","-1");

# First verify that it doesn't already exist.
rptError("Output file already exists: $OFILE") if -e $OFILE;

# Open file to write extracted records to.
my $of = IO::File->new(">$OFILE")
   or rptError("Could not open output file. $OFILE");

# Process through records in Tyler system datafile
while (my $line = <$if>) {

  chomp($line);

  # Because there may be multiple records on each line, process through
  # each individually.

  while ($line) {

     my @outLine = "";

     # Replace any remaining non-printable characters from line.
     # Leave the nulls as record separators.
     $line =~ tr/ -~\0/ /c;

     # Read one record embedded in the line and remove it from the front
     # of the line.
     my $piece =
        substr($line,0,$config{$datafile}{"config"}{"recLength"},'');
     my $pieceLength = length($piece);

     # If the next character on the line is a null, this indicates that the
     # record has been logically deleted. Therefore skip the current
     # record.
     if ($line =~ s/^\0//) {
         next;
     }


     # Verify that the piece being processed is not all blank.
     my $checkline =
           substr($piece,
              $config{$datafile}{"config"}{"keyOffset"},
              $config{$datafile}{"config"}{"recLength"} );

     # if the piece only contains white space, skip it.
     if ($checkline =~ /^\s+$/) { next; }


     # if this is a transaction datafile, only pull records from the
     # base date and newer (base date determined at beginning of routine).
     if ($config{$datafile}{"config"}{"type"}) {
         if ($config{$datafile}{"config"}{"type"} eq "trans") {
             my $dateKey = substr($piece,0,8);

             # Skip record if it is too old or from the end date.
             if ($dateKey < $baseDate || $dateKey >= $limitDate ) {
                 next;
             }

       }
   }

   $count++;

   # If record limiter has been passed, see if the limit
   # has been hit. If so, leave loop.
   if ($recLimit) {
       if ($count > $recLimit) { last; }
   }

   # Split out the record into its fields by processing through
   # the data dictionary information from the configuration file.
   for my $field (keys %{ $config{$datafile} } ) {
        next if $field eq "config";

       my $value = "";

       # make sure record is long enough to handle the substring
       if ($config{$datafile}{"config"}{"keyOffset"}
           + $config{$datafile}{$field}{"start"}
           <= $pieceLength) {

             $value =
                 substr($piece,
                    ($config{$datafile}{"config"}{"keyOffset"}
                      + $config{$datafile}{$field}{"start"}),
                      $config{$datafile}{$field}{"length"} );
       }

        # For fields specified as number ("n"), strip out the decimal
        # point. If no decimal is found, reduce the field by removing the
        # first space. This handles the fact that the Tyler raw datafiles
        # don't default numbers to zero; they are often blank when no
        # value is entered.
        if ($config{$datafile}{$field}{"type"} eq "n") {
            $value =~ s/\s// unless $value =~ s/\.//;
        }

       #   place the value of the field in an array at the array index
       #   corresponding to the "output" sequence defined in the
       #   configuration file. Use the "format" defined in the
       #   configuration file to format the sprintf for the field.

       $outLine[$config{$datafile}{$field}{"output"}] =
          sprintf($config{$datafile}{$field}{"format"}, $value);

   }

    # Put the fields in the sequence needed for output.
   my $outFinal = "";
   for my $i (0 .. $#outLine) {
       $outFinal .= $outLine[$i];
   }

   # Write the record to the extract file.
   print $of $outFinal . "\n";
}
# If record limiter has been passed, see if the limit
# has been hit. If so, leave loop.
      if ($recLimit) {
          if ($count > $recLimit) { last; }
      }
    }
    # Close the Tyler datafile and extract file.
    $if->close;
    $of->close;

}
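The data-dictionary slicing in the field loop above can be exercised in isolation. The sketch below mirrors the substr/sprintf logic; the field layout, key offset, and record value are hypothetical, not taken from a real Tyler data dictionary:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data-dictionary entries, shaped like the field lines in
# extract.config: start/length locate the field within the record (after
# the key offset), output orders the fields, format drives the sprintf.
my %dict = (
    branch => { start => 0, length => 4,  output => 0, format => "%-4s"  },
    name   => { start => 4, length => 10, output => 1, format => "%-10s" },
);
my $keyOffset = 2;                      # bytes of key prefix to skip
my $piece     = "K10001Main Store";     # one fixed-width record

my @outLine;
for my $field (keys %dict) {
    my $value = substr($piece,
                       $keyOffset + $dict{$field}{start},
                       $dict{$field}{length});
    $outLine[$dict{$field}{output}] = sprintf($dict{$field}{format}, $value);
}
my $outFinal = join "", @outLine;
print "$outFinal\n";   # "0001Main Store"
```

Because each field lands in @outLine at its configured "output" index, the final record order is independent of the hash-key iteration order.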


3. finddiff.pl
#!/opt/perl/bin/perl -w
##########################################################################
# Magnolia Data Warehouse Extract
#
# program: finddiff.pl
# purpose: This program will determine the changes from day to day between
#           the extract files created from the Tyler system datafiles.
#           This program will be used only on the look up files as the
#           transaction files will be pulled by date.
#
# required: finddiff.config - file containing configuration information
#           for each file to be processed. The file should be stored in the
#           same location as the program file.
#
# Maintenance History:
# 8/22/02, Bruce Linn, Incline Technologies.
#
##########################################################################

use strict;
use IO::File;
use Getopt::Std;
use Time::localtime;
use ExtractCommon;

autoflush STDOUT 1;

print "####################################################\n";
print "$0: Beginning Diff: " . getDate("time") . "\n\n";

# For testing (and potentially re-extract of file), allow user to
# pass in the datafile they wish to extract (i.e. branch.info).
my %opts;
getopt('cex',\%opts) or rptError("Bad Usage.");
my $oneDatafile = $opts{e};
my $createDiffs = $opts{x};
my $configFile = $opts{c};

# Load configuration file information info config hash
my $configRef = loadConfig("./finddiff.config",$configFile);
my %config = %$configRef;

# if a datafile was passed in on the command line, process only that
# datafile, otherwise process all datafiles.

if ($oneDatafile) {
    findDiffs($oneDatafile, %config);

} else {

      # process through all datafiles loaded into the config hash
      for my $datafile ( keys %{ $config{"datafiles"} } ) {
          # Only process those with the default flag set to "y"
          if ($config{"datafiles"}{$datafile}{"default"} eq "y") {
             findDiffs($datafile, %config);
          }
      }
}

print "\nDiff Processing Complete.    " . getDate("time") . "\n";

print "####################################################\n\n";

sub findDiffs {
    my ($datafile,
        %config)   = @_;

    # Define the record type codes to be passed on the change records
    # NOTE: AT THIS TIME, NO DELETE RECORDS WILL BE SENT! DUE TO
    # THE WAY THE TYLER SYSTEM WORKS, IT SHOULD NOT CREATE DELETES.
    my $addCode    = "A";
    my $changeCode = "C";
    my $deleteCode = "D";

    # Verify datafile passed in has configuration information
    if (!defined $config{"datafiles"}{$datafile} ) {
        rptError("Datafile requested not defined in config file: $datafile");
    }

    # Load files to compare in arrays. This is ok since the size of the files
    # to be compared is quite small. If larger files are processed in the
    # future, another means may need to be created.

    # First element will contain key, second element will contain data.
    my ($newRef, $oldRef) = loadarrays( $datafile, %config);

    # Create the file that will contain the changes to be passed to data
    # warehouse. Base name is loaded from configuration information.
    my $deltaFile = $config{"datafiles"}{$datafile}{"output"}."."
                       . getDate("","-1");
    my $deltaf = IO::File->new(">$deltaFile") or
             rptError("Unable to write output file. $deltaFile");

    # Assign array references to local arrays to simplify coding.
    my @newArray = @$newRef;
    my @oldArray = @$oldRef;

    # flag to indicate when at end of new array, 1=end of array, 0=not at end
    my $newEOF   = 0;
    # flag to indicate when at end of old array, 1=end of array, 0=not at end
    my $oldEOF   = 0;
    my $newIndex = 0;                          # index for processing new
    my $oldIndex = 0;                          # index for processing old
    my $newMax   = $#newArray;                 # Number of new array elements
    my $oldMax   = $#oldArray;                 # Number of old array element
    my ($match, $add, $change, $delete) = (0,0,0,0); # Counters for reporting

    # if either array has no elements, set end of array flag for that array.
    if ($newMax < 0) { $newEOF++; }
    if ($oldMax < 0) { $oldEOF++; }

    # Process while there are records in either array.
    while (!$newEOF || !$oldEOF) {

       # Records exist in both arrays.
       if (!$newEOF && !$oldEOF) {

           # Key fields match
           if ($newArray[$newIndex][0] eq $oldArray[$oldIndex][0]) {

               # Data portion matches, no delta record needed
               if ($newArray[$newIndex][1] eq $oldArray[$oldIndex][1]) {
                   $match++;

               } else {
                   # Data portion does not match, "change" delta record needed.
                   print $deltaf $newArray[$newIndex][0].
                                 $newArray[$newIndex][1].
                                 "$changeCode\n";
                   $change++;
               }
               # Advance index on both arrays.
               $newIndex++;
               $oldIndex++;

           # New key found in new array, "add" delta record needed.
           } elsif ($newArray[$newIndex][0] lt $oldArray[$oldIndex][0]) {
               print $deltaf $newArray[$newIndex][0],
                             $newArray[$newIndex][1],
                             "$addCode\n";
               # Advance only new array index.
               $newIndex++;
               $add++;

           # Key found in old array but not new, "delete" delta record needed.
           # NOTE: AT THIS TIME, NO DELETE RECORDS WILL BE SENT! DUE TO
           # THE WAY THE TYLER SYSTEM WORKS, IT SHOULD NOT CREATE DELETES.
           } elsif ($newArray[$newIndex][0] gt $oldArray[$oldIndex][0]) {
               if ($createDiffs) {
                   print $deltaf $oldArray[$oldIndex][0],
                                 $oldArray[$oldIndex][1],
                                 "$deleteCode\n";
               }
               # Advance only old array index.
               $oldIndex++;
               $delete++;
           } else {
               # This should never be hit.
               rptError("Logic Error checking keys.");
           }

       # Records remain only in new array, "add" delta record needed.
       } elsif ( !$newEOF && $oldEOF ) {
           print $deltaf $newArray[$newIndex][0],
                         $newArray[$newIndex][1],
                         "$addCode\n";
           # Advance only new array index.
           $newIndex++;
           $add++;

       # Records remain only in the old array.
       # NOTE: AT THIS TIME, NO DELETE RECORDS WILL BE SENT! DUE TO
       # THE WAY THE TYLER SYSTEM WORKS, IT SHOULD NOT CREATE DELETES.
       } elsif ( $newEOF && !$oldEOF ) {
           if ($createDiffs) {
               print $deltaf $oldArray[$oldIndex][0],
                             $oldArray[$oldIndex][1],
                             "$deleteCode\n";
           }
           # Advance only old array index.
           $oldIndex++;
           $delete++;
       } else {
           rptError("Logic error - array handling");
       }

        # Set EOF indicators when last item has been processed in array.
        $newEOF++ unless $newIndex <= $newMax;
        $oldEOF++ unless $oldIndex <= $oldMax;


    }

    # Print out a simple report identifying changes.
    print "matched=$match\tadd=$add\tchange=$change\tdelete=$delete\n\n";

    $deltaf->close;                          # Close file of deltas

    # gzip the extract file from three days ago to save room on disk.
    my $fileToZip = $config{"datafiles"}{$datafile}{"input"}."."
              . getDate("","-3");

    if (-e $fileToZip) {
        my $cmd = $config{"command"}{"gzip"}{"cmd"}. " -f " . $fileToZip;
        system($cmd);
    }
}

sub loadarrays {
    my ($datafile,
        %config ) = @_;

    # Get current datafile names.
    my $newFile = $config{"datafiles"}{$datafile}{"input"}.".".getDate("","-1");
    my $oldFile = $config{"datafiles"}{$datafile}{"input"}.".".getDate("","-2");

    # Verify files exist.
    rptError("Input file not found. $newFile") unless -e $newFile;
    rptError("Input file not found. $oldFile") unless -e $oldFile;

    # Verify new file has data.
    rptError("Input file is empty . $newFile") unless -s $newFile;

    # print progress message.
    print "Comparing: $newFile <->\n $oldFile.   " . getDate("time") . "\n";

    # Sort new file. File must be in ascending order for routine to work.
    my $cmd = "sort $newFile -o $newFile";
    system($cmd) == 0 or
              rptError("Sort of $newFile failed. $?");

    # Load new file into array.
    my $newRef = loadfiles($datafile, $newFile, \%config);

    # Old file should already be sorted, but just to be safe...
    $cmd = "sort $oldFile -o $oldFile";
    system($cmd) == 0 or
              rptError("Sort of $oldFile failed. $?");

    # Load old file into array.
    my $oldRef = loadfiles($datafile, $oldFile, \%config);

    return ($newRef, $oldRef);
}


sub loadfiles {
    my ($datafile,
        $fileName,
        $config)   = @_;

    # Open file passed in.
    my $fh = IO::File->new($fileName)
       or rptError("Could not open input file. $fileName");

    # Get the length of the key field in the file from the configuration info.
    my $keyLength = $config->{"datafiles"}{$datafile}{"keylength"};

    # Create an unpack format based on the key length. This will be used
    # to split the file record in to the key and data portions.
    my $unpackFormat = "a" . $keyLength . "a*";
    my @data;

    while (my $line = <$fh>) {                  # Process through file

        chomp($line);                           # Remove new line

        # Split line into key and data portions.
        push @data , [unpack($unpackFormat, $line)];
    }

    return \@data;
}
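The dynamically built template above can be exercised on its own; unpack with an "aN a*" template splits a fixed-width record into its key and the remainder. The 6-byte key length and sample record below are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $keyLength    = 6;                        # would come from the "keylength" config entry
my $unpackFormat = "a" . $keyLength . "a*";  # "a6a*": 6-byte key, rest of line is data

my ($key, $data) = unpack($unpackFormat, "100234SMITH, JOHN      A");
print "$key -- $data\n";   # "100234 -- SMITH, JOHN      A"
```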

# Simple routine to print contents of new/old arrays (used in debugging).
sub printarray {
    my @data = @_;

    for my $i (0 .. $#data) {

        print $data[$i][0] . " -- " . $data[$i][1] . "\n";
    }

}
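The add/change/delete detection in findDiffs is a two-pointer merge over the key-sorted new and old arrays. A condensed sketch of the same comparison, using hypothetical keys and data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each element is [key, data], sorted ascending by key, as produced
# by loadfiles after the sort step in loadarrays.
my @new = ( ["001", "aaa"], ["002", "bbb"], ["004", "ddd"] );
my @old = ( ["001", "aaa"], ["002", "xxx"], ["003", "ccc"] );

my @deltas;
my ($n, $o) = (0, 0);
while ($n <= $#new || $o <= $#old) {
    if ($n > $#new) {                          # only old records left: delete
        push @deltas, $old[$o][0] . "D"; $o++;
    } elsif ($o > $#old) {                     # only new records left: add
        push @deltas, $new[$n][0] . "A"; $n++;
    } elsif ($new[$n][0] eq $old[$o][0]) {     # same key: change if data differs
        push @deltas, $new[$n][0] . "C" if $new[$n][1] ne $old[$o][1];
        $n++; $o++;
    } elsif ($new[$n][0] lt $old[$o][0]) {     # key only in new: add
        push @deltas, $new[$n][0] . "A"; $n++;
    } else {                                   # key only in old: delete
        push @deltas, $old[$o][0] . "D"; $o++;
    }
}
print join(" ", @deltas), "\n";   # 002C 003D 004A
```

Because both inputs are sorted ascending, each record is visited exactly once, so the comparison is linear in the combined file sizes.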


4. finddifftrans.pl
This is a special-purpose program that was written solely for finding differences between two transaction header
or detail files. The code is a slight variant of finddiff.pl, so it is not reproduced here. Please refer to the
Magnolia UNIX machines if you wish to review the code.




5. ExtractCommon.pm
##########################################################################
# Magnolia Data Warehouse Extract
#
# Module:   ExtractCommon.pm
# purpose: This module contains a number of functions that are used
#           in common by the perl programs.
#
# Maintenance History:
# 9/09/02, Bruce Linn, Incline Technologies.
#
##########################################################################

package       ExtractCommon;
require       Exporter;

#use strict;
use Time::localtime;
use Sys::Hostname;
use Mail::Sender;

our @ISA     = qw(Exporter);
our @EXPORT  = qw(rptError
                  getDate
                  checkDate
                  loadConfig
                  sendEmail);
our $VERSION = 1.00;


########################################################################
# Standard fatal error processing for this program
#
# Input: Message to report before terminating.
#         Indicator to not send message to pagers (default will send
#         to pagers).
########################################################################
sub rptError {
    my $msg = shift;

     my $complete = "ERROR: [$0]: ". getDate("time") . ": $msg";

     print $complete . "\n";

     -e "./ExtractCommon.mail" or die "[$0]: ". getDate("time") .
            ": Unable to find ExtractCommon.mail file\n";
     my $notifyList = loadConfig("ExtractCommon.mail","");
     my %notify = %$notifyList;

     # Only attempt email send if flag is set in ExtractCommon.mail.
     if ($notify{"default"}{"mail"}) {
         if ($notify{"default"}{"mail"}{"enabled"} eq "y") {

             my @to_list;
             my @copy_list;

             for my $types ( keys %notify ) {
                 # skip over "default" lines in file
                 next if $types eq "default";

                      # process through list of addresses
                      for my $address ( keys %{ $notify{$types} } ) {

                          # Put addresses into proper list (to or copy)
                          if ($notify{$types}{$address}{"type"} &&
                             ($notify{$types}{$address}{"type"} eq "to")) {
                              push(@to_list, $address);

                          # Send copy to these addresses
                          } else {
                              push(@copy_list, $address);
                          }
                      }
             }

             my $subject = "Magnolia Data Warehouse Extract Error - " .
                               getDate("time");

             sendEmail(
                 from =>      $notify{"default"}{"from"}{"address"},
                 to   =>      \@to_list,
                 copy =>      \@copy_list,
                 subj =>      "$subject",
                 msg =>       "$complete",
             );
         }
    }

    exit (-1);
}

########################################################################
#
# Returns a date or date and time.
#
# input:
#    time = Any value passed here tells routine to return time as well
#    offset = Used to calculate date in past (or future)
#             -1 will get date 24 hours prior to current system date
#              2 will get date 48 hours into the future.
#
# return:
#    date - YYYYMMDD format or
#    date/time - YYYYMMDD hh:mi:ss format
#
########################################################################
sub getDate {
    my ($time, $offset) = @_;

    # if no offset is passed in, default to current date.
    if (!$offset) { $offset = 0 ; }

    my $date  = localtime(time() + ($offset*60*60*24));
    my $mday  = $date->mday();
    my $month = $date->mon + 1;
    my $year  = $date->year + 1900;


    my $hour   = $date->hour;
    my $min    = $date->min;
    my $sec    = $date->sec;

    if ($time) {
        return sprintf("%4d%02d%02d %02d:%02d:%02d",
                  $year, $month, $mday, $hour, $min, $sec);
    } else {
        return sprintf("%4d%02d%02d", $year, $month, $mday);
    }
}

########################################################################
# perform some rudimentary checks that the date is a date. Used in
# validating command line arguments.
#
# input: Date field
########################################################################
sub checkDate {
    my $startDate = shift;

    if ($startDate) {
        if (($startDate =~ /\D/) ||              # Value is not all digits
            length($startDate) != 8 ||           # Value is not 8 digits long
            substr($startDate,0,4) < 1990 ||     # Date is before 1990
            substr($startDate,4,2) > 12 ||       # invalid month value
            substr($startDate,6,2) > 31 ) {      # invalid day value
            rptError("Date must be in YYYYMMDD format. $startDate");
        }
    }
}

########################################################################
#
# Loads the requested configuration file into a hash.
#
# input:
#        Location of default configuration file.
#        Location of configuration file (if not using the default).
# output:
#        Reference to hash containing loaded information.
#
#######################################################################
sub loadConfig {
    my ($configDefault,
        $configFile) = @_;

    # Default configuration file.
    my $CFILE = $configDefault;

    # If a configFile is passed in, verify that it exists, else use default.
    if ($configFile) {
        rptError("Configuration file does not exist: $configFile") unless
            -e $configFile;

        # Override default with file passed in.
        $CFILE = $configFile;
    }

    # Retrieve datafile configuration information.
    my %config;

     my $cf = IO::File->new($CFILE)
           or rptError("Could not open input file. $CFILE");

     # Load configuration file info into a hash.
     # See configuration file for a description of the file.
     while (<$cf>) {
        next unless s/^(.*?):(.*?):\s*//;    # first two fields are : delimited
        my $type = $1;                       # Type of config line
        my $field = $2;                      # Key to line
        for my $info (split) {
           my ($key, $value) = split /=/, $info; # split field/value pairs
           $config{$type}{$field}{$key} = $value;   # store in hash
        }
     }

     $cf->close;                                 # close the configuration file

     return \%config;
}

#########################################################################
# sends email message to a list of people
#
# Input:
#   to   - list of people to send the message to
#   copy - list of people to include on CC
#   from - email address to use on from line
#   subj - subject of the message
#   msg - Message text to send
#
#########################################################################
sub sendEmail{

    my %arg = @_;

    my $to   = $arg{to};     # Array ref
    my $from = $arg{from};
    my $copy = $arg{copy};   # Array ref
    my $subj = $arg{subj};
    my $msg  = $arg{msg};

    # get host name of the machine. Assumes this machine is the smtp server.
    my $host = hostname();

    # Create a mail session.
    my $sender = Mail::Sender->new({smtp=>$host, from=>$from});

    my($to_list, $cc_list);

    # assemble arrays of recipients into comma separated strings
    if(defined $copy->[0]){ $cc_list = join(',',@$copy) }
    eval{ $to_list = join ',', @$to };

    # If no users are listed in the "to" list, see if there are users in
    # the "copy" list. If so, use this list. Otherwise send the message
    # to the default "from";
    if (!$to_list) {
        if ($cc_list) {
            $to_list = $cc_list;
            $cc_list = "";
        } else {
            print "ERROR: [$0]: ". getDate("time") .
                  ": No recipients listed for mail message.\n";
            exit (-1);
        }
    }

    # send the message. Format of call is slightly different with a copy list.
    # if the send fails, just print the error, rather than using the rptError,
    # to avoid causing an ugly loop.
    unless(defined($cc_list)){
       $sender->MailMsg(
          { to        => "$to_list",
            from      => "$from",
            fake_from => "Magnolia Extract",
            subject   => "$subj",
            msg       => "$msg",
          }
       ) or print "ERROR: [$0]: ". getDate("time") .
                           ": Error sending email(1): $!\n";
       return;
    }
    else {
      $sender->MailMsg(
          { to        => "$to_list",
            from      => "$from",
            fake_from => "Magnolia Extract",
            subject   => "$subj",
            cc        => "$cc_list",
            msg       => "$msg",
          }
       ) or print "ERROR: [$0]: ". getDate("time") .
                           ": Error sending email(2): $!\n";
       return;
    }
}
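The loadConfig parsing can be traced with a single line in the `type:field: key=value` format described in the configuration files below. This sketch repeats the regex/split steps on one sample target line:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %config;

# One line in the type:field: key=value key=value format used by
# runall.config, extract.config, and ExtractCommon.mail.
my $line = "config:target: host=168.94.140.51 userid=bbydev";

if ($line =~ s/^(.*?):(.*?):\s*//) {
    my ($type, $field) = ($1, $2);          # e.g. "config" and "target"
    for my $info (split ' ', $line) {       # remaining whitespace-separated pairs
        my ($key, $value) = split /=/, $info;
        $config{$type}{$field}{$key} = $value;
    }
}

print $config{"config"}{"target"}{"host"}, "\n";     # 168.94.140.51
print $config{"config"}{"target"}{"userid"}, "\n";   # bbydev
```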


Configuration Files

1. runall.config

An example of the runall configuration file is listed below. This example was used during the testing of
the application. With the exception of the FTP target server information, the remainder of the
configuration file should be representative of the production configuration file.
# This is the configuration file for the program that
# controls the high level flow and ftp's the files to BBY.
#
# There are four different sections to this configuration.
# 1) Location of the extract and difference file that runall.pl will run
# 2) FTP target server information
# 3) File ftp information
# 4) Purge information
#
# Command locations
# -extract - name/location of the extract perl program
# -difference - name/location of the difference program
# -gzip - name/location of the gzip utility
# Format:
# command:extract: cmd=<extract>
# command:diff: cmd=<difference>
# command:gzip: cmd=<gzip>
#
# FTP target server information
# -host - dns or IP of the server to send the files to
# -userid - ftp userid on target server
# -password - ftp password for userid on target server
# -directory - destination directory for files on target server
# -okfile - name/location of file that will be sent when file has been ftp'd
#           correctly. This is just a dummy file that will be sent as a
#           signal for ETL that the delta or transaction file has been sent
#           successfully.
#
# Format:
#config:target: host=<host> userid=<userid> passwd=<password>
#config:target: directory=<directory>
#config:target: okfile=<okfile>
#
# File FTP information
# -datafile - common name of datafile this information is for
# -source - name/location of delta or transaction files on Magnolia server
# -target - name of file once placed on bby data warehouse staging server. In
#           the target name, the <YYYYMMDD> will be resolved by the program
#           to the previous date.
# -ok     - <optional> By default, the ok file will be named to match the target file
#           with an ".ok" extension. This option allows you to override that
#           naming scheme for the ok file.
# -concat - The transaction files must be concatenated from three regional
#           files into one for FTP. This is the name of the single file
#           that the three will be combined into. This line is optional as it
#           should not be included for simple transfers.
#
# Format:
#ftp:<datafile>: source=<source>
#ftp:<datafile>: target=<target>
#ftp:<datafile>: concat=<concat>
#ftp:<datafile>: ok=<ok>
#
# Purge information
# The program will go through any directories listed here and look for
# files to delete. The files must have a date in the name in YYYYMMDD
# format.
# -<dir> = Name of directory to look for files (i.e. /home/bby/)
# -retain = number of days to retain files.    So, to retain files for two weeks,
#           a value of 14 should be used.
#purge:<dir>: retain=<#>


command:extract: cmd=./extract.pl
command:diff: cmd=./finddiff.pl
command:gzip: cmd=/usr/contrib/bin/gzip

config:target: host=168.94.140.51 userid=bbydev passwd=bby19dev
config:target: directory=/home/bby/testftp

config:target: okfile=/home/bby/ok.file

ftp:dept.names: source=/home/bby/files/delta.dept.names
ftp:dept.names: target=tb_ma_stg_of_dept_names<YYYYMMDD>.dat
ftp:dept.names: ok=tbmas_deptnames<YYYYMMDD>.ok

ftp:branch.info: source=/home/bby/files/delta.branch.info
ftp:branch.info: target=tb_ma_stg_ge_branch_info<YYYYMMDD>.dat
ftp:branch.info: ok=tbmas_branchinfo<YYYYMMDD>.ok

ftp:inv.brands: source=/home/bby/files/delta.inv.brands
ftp:inv.brands: target=tb_ma_stg_of_inv_brands<YYYYMMDD>.dat
ftp:inv.brands: ok=tbmas_brands<YYYYMMDD>.ok

ftp:inventory.mast: source=/home/bby/files/delta.inventory.mast
ftp:inventory.mast: target=tb_ma_stg_of_inventory_mast<YYYYMMDD>.dat
ftp:inventory.mast: ok=tbmas_invmast<YYYYMMDD>.ok

ftp:product.types: source=/home/bby/files/delta.product.types
ftp:product.types: target=tb_ma_stg_of_product_types<YYYYMMDD>.dat
ftp:product.types: ok=tbmas_product<YYYYMMDD>.ok

ftp:payroll.master: source=/home/bby/files/delta.payroll.master
ftp:payroll.master: target=tb_ma_stg_pa_payroll_master<YYYYMMDD>.dat
ftp:payroll.master: ok=tbmas_payroll<YYYYMMDD>.ok

ftp:tp0102:   source=/home/bby/files/trans.header.02
ftp:tp0102:   target=tb_ma_stg_sa_trans_header<YYYYMMDD>.dat
ftp:tp0102:   concat=/home/bby/files/trans.header
ftp:tp0102:   ok=tbmas_header<YYYYMMDD>.ok

ftp:tp01n:   source=/home/bby/files/trans.header.n
ftp:tp01n:   target=tb_ma_stg_sa_trans_header<YYYYMMDD>.dat
ftp:tp01n:   concat=/home/bby/files/trans.header
ftp:tp01n:   ok=tbmas_header<YYYYMMDD>.ok

ftp:tp0103:   source=/home/bby/files/trans.header.03
ftp:tp0103:   target=tb_ma_stg_sa_trans_header<YYYYMMDD>.dat
ftp:tp0103:   concat=/home/bby/files/trans.header
ftp:tp0103:   ok=tbmas_header<YYYYMMDD>.ok

ftp:tp0202:   source=/home/bby/files/trans.detail.02
ftp:tp0202:   target=tb_ma_stg_sa_trans_detail<YYYYMMDD>.dat
ftp:tp0202:   concat=/home/bby/files/trans.detail
ftp:tp0202:   ok=tbmas_detail<YYYYMMDD>.ok

ftp:tp0201:   source=/home/bby/files/trans.detail.n
ftp:tp0201:   target=tb_ma_stg_sa_trans_detail<YYYYMMDD>.dat
ftp:tp0201:   concat=/home/bby/files/trans.detail
ftp:tp0201:   ok=tbmas_detail<YYYYMMDD>.ok

ftp:tp0203:   source=/home/bby/files/trans.detail.03
ftp:tp0203:   target=tb_ma_stg_sa_trans_detail<YYYYMMDD>.dat
ftp:tp0203:   concat=/home/bby/files/trans.detail
ftp:tp0203:   ok=tbmas_detail<YYYYMMDD>.ok

purge:/home/bby/files/: retain=14
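As the comments above note, the <YYYYMMDD> token in each target and ok file name is resolved by the program to the previous day's date. A minimal sketch of that substitution, with a fixed date standing in for the getDate("","-1") call:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $target = "tb_ma_stg_of_dept_names<YYYYMMDD>.dat";

# In runall.pl the date comes from getDate("","-1"); a fixed value is
# used here so the result is predictable.
my $date = "20020902";
$target =~ s/<YYYYMMDD>/$date/;

print "$target\n";   # tb_ma_stg_of_dept_names20020902.dat
```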


2. extract.config
An example of the extract configuration file is listed below. This example was used during the testing of
the application. The configuration file should be representative of the production configuration file.

# This is the datafile configuration file for the extract of data from
# the Tyler system to the BBY Data Warehouse. This file drives how the data
# is read from the Tyler raw files and how it is formatted.
#
# This configuration file has three sections of data that allow the
# extract to work. The formatting of these lines is very important
# to the extract program, so please be careful to format them correctly.
# 1) The first section adds the datafile to the list of available datafiles
#    to extract. It is listed here by its common name. The default flag
#    indicates whether this datafile should be run as part of the typical
#    run of the extract program.
#    Format:
#    datafiles:<common name>: default=<y|n>
#
# 2) The second section is the "config" section. This section describes
#    the necessary attributes of the datafile at a file level.
#    -keyOffset - Each data file stores its key at the beginning of the file.
#                 This section of the file will be skipped over when reading
#                 in the fields based on offset.
#    -recLength - Physical length of one logical record. This is needed because
#                 the Tyler file system sometimes puts multiple records on
#                 one physical line. This tells the program where to make the
#                 break.
#    -type      - For the trans detail and trans header files, this is set
#                 to "trans". This tells the extract program to only pull
#                 records from the previous day. Leave blank
#                 for all others.
#    -datafile - Lists the path and filename of the datafile you wish to
#                 extract.
#    -outFile   - Lists the path and filename of the extract file to create.
#                 This filename will be appended with the date (CCYYMMDD)
#
#    Format:
#    <common name>:config: keyOffset=<#> recLength=<#> type=<trans|blank>
#    <common name>:config: datafile=<physical location of datafile>
#    <common name>:config: outFile=<physical location to write extract to>
#
# 3) The third section lists the field attributes for each datafile.
#    -field     - The name of the field that will be described. The name
#                 must be unique within the datafile.
#    -type      - "s" for string. "n" for numeric. Only use "n" when you
#                 want a decimal removed from the format of a number.
#    -start     - Starting position of the field within the datafile (usually
#                 based on the Tyler data dictionary definition).
#    -length    - Length of field in the file.
#    -output    - This is the order in which the fields that are read in
#                 will be written out in the extracted file. The numbers
#                 should start with 0 and increment. If two fields are listed
#                 with the same number, only one of the two fields will appear
#                 in the file. This sequence is unique to the datafile that
#                 is being defined.
#    -format    - Format mask that will be used when writing out the field.
#                 For the purposes of this extract, all formats are being
#                 written as "strings". The format assures a minimum number
#                 of characters to be written for the field. It cannot be
#                 used to truncate a field (use the start and length to do
#                 this). If you understate the length of a format, the
#                 actual length of the field will override.
#                 NOTE: When using the type="n", the length of the field will
#                 be reduced by one since this option strips the decimal place.
#
#    Format:
#    <common name>:<field name>: type=<s|n> start=<#> length=<#> output=<#> format=%<min length for field>s





#
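
A field-definition line in the format above can be split into its attributes with a few lines of code. The sketch below parses one such line into its common name, field name, and attribute pairs; it is written in Python for illustration (the real extract is Perl), and the function name is hypothetical:

```python
def parse_field_line(line):
    """Parse '<common name>:<field name>: attr=value ...' into its parts (illustrative)."""
    head, attrs = line.split(": ", 1)           # key prefix vs. attribute list
    common, field = head.split(":", 1)          # common name, then field name
    pairs = dict(a.split("=", 1) for a in attrs.split())
    return common, field, pairs

common, field, attrs = parse_field_line(
    "dept.names:dept_code: type=s start=0 length=2 output=0 format=%2s")
```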


datafiles:branch.info: default=y
datafiles:dept.names: default=y
datafiles:inv.brands: default=y
datafiles:payroll.master: default=y
datafiles:product.types: default=y
datafiles:inventory.mast: default=y
#trans headers
datafiles:tp01n: default=y
datafiles:tp0102: default=y
datafiles:tp0103: default=y
#trans details
datafiles:tp02n: default=y
datafiles:tp0202: default=y
datafiles:tp0203: default=y

dept.names:config: keyOffset=2 recLength=16 type=
dept.names:config: datafile=/home/DSK/DA1/TSSADEPT.dat
dept.names:config: outFile=/home/bby/files/source.dept.names.
dept.names:dept_code: type=s start=0 length=2 output=0 format=%2s
dept.names:dept_name: type=s start=2 length=12 output=1 format=%12s
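
Using the dept.names stanza above (keyOffset=2, recLength=16), one physical line may hold several 16-byte logical records, and each field is sliced relative to the key offset. A minimal sketch under those assumptions, in Python for illustration (the sample record content is hypothetical):

```python
def split_records(line, rec_length):
    """Break one physical line into fixed-width logical records (illustrative)."""
    return [line[i:i + rec_length] for i in range(0, len(line), rec_length)]

def read_field(record, key_offset, start, length):
    """Slice a field out of a record, skipping over the leading key (illustrative)."""
    return record[key_offset + start : key_offset + start + length]

rec = "0101ELECTRONICS "   # hypothetical record: 2-byte key, dept_code, dept_name
dept_code = read_field(rec, key_offset=2, start=0, length=2)    # "01"
dept_name = read_field(rec, key_offset=2, start=2, length=12)   # "ELECTRONICS "
```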

inv.brands:config: keyOffset=5 recLength=35 type=
inv.brands:config: datafile=/home/DSK/DA1/TSSAVEND.dat
inv.brands:config: outFile=/home/bby/files/source.inv.brands.
inv.brands:brand_id: type=s start=0 length=5 output=0 format=%5s
inv.brands:brand_name: type=s start=5 length=25 output=1 format=%25s

inventory.mast:config: keyOffset=0 recLength=302 type=
inventory.mast:config: datafile=/home/bby/files/inventory.master
inventory.mast:config: outFile=/home/bby/files/source.inventory.mast.
inventory.mast:item_number: type=s start=0 length=14 output=0 format=%14s
inventory.mast:description1: type=s start=14 length=30 output=1 format=%30s
inventory.mast:description2: type=s start=44 length=30 output=2 format=%30s
inventory.mast:last_po_cost: type=s start=0 length=0 output=3 format=%7s
inventory.mast:mast_scratch: type=s start=0 length=0 output=4 format=%2s
inventory.mast:unit_of_measure: type=s start=74 length=2 output=5 format=%2s
inventory.mast:vendor_pack: type=s start=76 length=4 output=6 format=%4s
inventory.mast:unit_cost: type=n start=81 length=11 output=7 format=%10s
inventory.mast:lst_invoice_cost: type=n start=92 length=11 output=8 format=%10s
inventory.mast:product_type: type=s start=103 length=3 output=9 format=%3s
inventory.mast:department: type=s start=106 length=2 output=10 format=%2s
inventory.mast:wholesale_price: type=n start=108 length=11 output=11 format=%10s
inventory.mast:min_sell_price: type=n start=119 length=11 output=12 format=%10s
inventory.mast:selling_price: type=n start=130 length=11 output=13 format=%10s
inventory.mast:list_price: type=n start=141 length=11 output=14 format=%10s
inventory.mast:brand: type=s start=170 length=5 output=15 format=%5s
inventory.mast:vendor_price: type=n start=175 length=11 output=16 format=%10s
inventory.mast:plan_flag: type=s start=186 length=1 output=17 format=%1s
inventory.mast:priority_flag: type=s start=187 length=1 output=18 format=%1s
inventory.mast:serial_num_flag: type=s start=188 length=1 output=19 format=%1s
inventory.mast:warranty_num_flg: type=s start=189 length=1 output=20 format=%1s
inventory.mast:cust_prch_hst: type=s start=190 length=1 output=21 format=%1s
inventory.mast:inst_plan_1: type=n start=191 length=11 output=22 format=%10s
inventory.mast:inst_plan_2: type=n start=202 length=11 output=23 format=%10s
inventory.mast:inst_plan_3: type=n start=213 length=11 output=24 format=%10s
inventory.mast:freight: type=n start=224 length=11 output=25 format=%10s
inventory.mast:mfg_sku: type=s start=235 length=14 output=26 format=%14s
inventory.mast:buyer_comments: type=s start=249 length=15 output=27 format=%15s
inventory.mast:nrf_code: type=s start=0 length=0 output=28 format=%5s
inventory.mast:color: type=s start=273 length=3 output=29 format=%3s
inventory.mast:seq_no: type=s start=276 length=2 output=30 format=%2s
inventory.mast:warr_code: type=s start=278 length=3 output=31 format=%3s
inventory.mast:fabric: type=s start=280 length=2 output=32 format=%2s
inventory.mast:locator_exempt: type=s start=283 length=1 output=33 format=%1s
inventory.mast:whse_delivery: type=s start=284 length=1 output=34 format=%1s
inventory.mast:feature_access: type=s start=285 length=2 output=35 format=%2s
inventory.mast:serv_policy: type=s start=287 length=1 output=36 format=%1s
inventory.mast:barcode: type=s start=288 length=14 output=37 format=%-14s
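
Fields defined with type=n above (unit_cost, selling_price, and so on) have their decimal point stripped on output, which is why an 11-byte source field carries a %10s format. A sketch of that behavior, in Python for illustration with a hypothetical function name:

```python
def strip_decimal(value):
    """Remove the decimal point from a numeric field, shortening it by one (illustrative)."""
    return value.replace(".", "", 1)

print(strip_decimal("00000123.45"))   # 11-byte source field -> "0000012345" (10 bytes)
```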






product.types:config: keyOffset=3 recLength=39 type=
product.types:config: datafile=/home/DSK/DA1/STSATYPE.dat
product.types:config: outFile=/home/bby/files/source.product.types.
product.types:product_type_code: type=s start=0 length=3 output=0 format=%3s
product.types:product_type_desc: type=s start=3 length=30 output=1 format=%30s
product.types:product_type_dept: type=s start=33 length=2 output=2 format=%2s

branch.info:config: keyOffset=4 recLength=256 type=
branch.info:config: datafile=/home/DSK/DA1/STBIF.dat
branch.info:config: outFile=/home/bby/files/source.branch.info.
branch.info:BRANCH_ID: type=s start=0 length=4 output=0 format=%4s
branch.info:BRANCH_NAME: type=s start=4 length=25 output=1 format=%25s
branch.info:ADDRESS_ONE: type=s start=29 length=25 output=2 format=%25s
branch.info:ADDRESS_TWO: type=s start=54 length=25 output=3 format=%25s
branch.info:CITY: type=s start=79 length=15 output=4 format=%15s
branch.info:STATE: type=s start=94 length=2 output=5 format=%2s
branch.info:ZIP_CODE: type=s start=96 length=9 output=6 format=%9s
branch.info:TELEPHONE: type=s start=105 length=11 output=7 format=%11s
branch.info:ATTENTION: type=s start=116 length=15 output=8 format=%15s
branch.info:BRANCH_NUMB: type=s start=131 length=2 output=9 format=%2s
branch.info:REGION_NUMB: type=s start=133 length=2 output=10 format=%2s
branch.info:PROFIT_COST_CTR: type=s start=135 length=2 output=11 format=%2s
branch.info:SALES_TAX_CODE: type=s start=137 length=4 output=12 format=%4s
branch.info:REGION_CODE: type=s start=141 length=2 output=13 format=%2s
branch.info:INVENTORY_LOC: type=s start=143 length=1 output=14 format=%1s
branch.info:DATE_PHY_CYMD: type=s start=144 length=8 output=15 format=%8s
branch.info:PHYS_UPDT_CYMD: type=s start=152 length=8 output=16 format=%8s
branch.info:TAXABLE: type=s start=160 length=1 output=17 format=%1s
branch.info:NORM_INV_PU: type=s start=161 length=4 output=18 format=%4s
branch.info:INV_PR_CODE: type=s start=165 length=2 output=19 format=%2s
branch.info:LOC_BRCH_FLG: type=s start=167 length=1 output=20 format=%1s
branch.info:PU_FOR_DD: type=s start=168 length=4 output=21 format=%4s
branch.info:FRT_CALC_PERC: type=n start=172 length=7 output=22 format=%6s
branch.info:FRT_ADJ_FLG: type=s start=179 length=1 output=23 format=%1s
branch.info:STOCK_DIST_BRCH: type=s start=183 length=4 output=24 format=%4s
branch.info:MIN_SELL_PR_CD: type=s start=187 length=2 output=25 format=%2s
branch.info:MIN_SELL_AUTH: type=s start=194 length=1 output=26 format=%1s
branch.info:SERVICE_BRANCH: type=s start=195 length=4 output=27 format=%4s


tp0102:config: keyOffset=20 recLength=398 type=trans
tp0102:config: datafile=/home/DSK/DA1/TP0102.dat
tp0102:config: outFile=/home/bby/files/trans.header.02.
tp0102:date_ccyymmdd: type=s start=0 length=8 output=0 format=%8s
tp0102:audit_store: type=s start=8 length=4 output=1 format=%4s
tp0102:trans_number: type=s start=12 length=6 output=2 format=%6s
tp0102:trans_type: type=s start=18 length=2 output=3 format=%2s
tp0102:salesman_1: type=s start=20 length=5 output=4 format=%5s
tp0102:salesman_2: type=s start=25 length=5 output=5 format=%5s
tp0102:salesman_2_perc: type=s start=30 length=2 output=6 format=%2s
tp0102:next_purch_code: type=s start=32 length=5 output=7 format=%5s
tp0102:next_purch_mnth: type=s start=37 length=2 output=8 format=%2s
tp0102:cash_amt: type=n start=39 length=10 output=9 format=%9s
tp0102:check_amt: type=n start=49 length=10 output=10 format=%9s
tp0102:credit_card_amt: type=n start=59 length=10 output=11 format=%9s
tp0102:media: type=s start=69 length=10 output=12 format=%10s
tp0102:financed_amount: type=n start=79 length=10 output=13 format=%9s
tp0102:office_refund: type=n start=89 length=10 output=14 format=%9s
tp0102:charge_to_acct: type=n start=99 length=10 output=15 format=%9s
tp0102:credit_used: type=n start=109 length=10 output=16 format=%9s
tp0102:total_sale: type=n start=119 length=10 output=17 format=%9s
tp0102:cost_at_uc: type=n start=129 length=10 output=18 format=%9s
tp0102:cost_at_vend_pr: type=n start=139 length=10 output=19 format=%9s
tp0102:tax_amt: type=n start=149 length=9 output=20 format=%8s
tp0102:freight: type=n start=158 length=9 output=21 format=%8s
tp0102:finance_charge: type=n start=167 length=8 output=22 format=%7s
tp0102:finance_company: type=s start=175 length=10 output=23 format=%10s
tp0102:credit_memo_num: type=s start=185 length=10 output=24 format=%10s
tp0102:account_number: type=s start=195 length=10 output=25 format=%10s
tp0102:terms: type=s start=205 length=3 output=26 format=%3s





tp0102:discount: type=s start=208 length=3 output=27 format=%3s
tp0102:discount_pct: type=s start=211 length=2 output=28 format=%2s
tp0102:taxable_sale: type=s start=213 length=1 output=29 format=%1s
tp0102:tax_code: type=s start=214 length=4 output=30 format=%4s
tp0102:system: type=s start=218 length=1 output=31 format=%1s
tp0102:sold_at_store: type=s start=219 length=4 output=32 format=%4s
tp0102:mkd_approved_by: type=s start=223 length=5 output=33 format=%5s
tp0102:credit_card_num: type=s start=228 length=16 output=34 format=%16s
tp0102:authorization: type=s start=244 length=6 output=35 format=%6s
tp0102:trade_in_purch: type=n start=250 length=10 output=36 format=%9s
tp0102:gen_numb: type=s start=260 length=1 output=37 format=%1s
tp0102:shift_id: type=s start=261 length=1 output=38 format=%1s
tp0102:terminal: type=s start=262 length=3 output=39 format=%3s
tp0102:trans_time: type=s start=265 length=5 output=40 format=%5s
tp0102:return_auth: type=s start=270 length=6 output=41 format=%6s
tp0102:cr_card_exp: type=s start=276 length=4 output=42 format=%4s
tp0102:check_number: type=s start=280 length=8 output=43 format=%8s
tp0102:cust_num_bill: type=s start=288 length=11 output=44 format=%11s
tp0102:cust_num_ship: type=s start=299 length=11 output=45 format=%11s
tp0102:po_number: type=s start=310 length=16 output=46 format=%16s
tp0102:trade_in_actual: type=n start=326 length=10 output=47 format=%9s
tp0102:changed_master: type=s start=336 length=1 output=48 format=%1s
tp0102:reason_code: type=s start=337 length=2 output=49 format=%2s
tp0102:trans_status: type=s start=339 length=1 output=50 format=%1s
tp0102:trans_user_id: type=s start=340 length=5 output=51 format=%5s
tp0102:check_auth_numb: type=s start=345 length=6 output=52 format=%6s
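
The type=trans setting on the tp0102 stanza tells the extract to pull only the previous day's records, keyed off the leading date_ccyymmdd field. A minimal sketch of that filter, in Python for illustration with a hypothetical helper name:

```python
from datetime import date, timedelta

def previous_day_records(records, today=None):
    """Keep only records whose leading CCYYMMDD date is yesterday (illustrative)."""
    today = today or date.today()
    yesterday = (today - timedelta(days=1)).strftime("%Y%m%d")
    return [r for r in records if r[:8] == yesterday]

recs = ["20020902...", "20020901..."]
kept = previous_day_records(recs, today=date(2002, 9, 3))   # keeps only the 20020902 record
```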


tp01n:config: keyOffset=20 recLength=398 type=trans
tp01n:config: datafile=/home/DK3/DA4/TP01N.dat
tp01n:config: outFile=/home/bby/files/trans.header.n.
tp01n:date_ccyymmdd: type=s start=0 length=8 output=0 format=%8s
tp01n:audit_store: type=s start=8 length=4 output=1 format=%4s
tp01n:trans_number: type=s start=12 length=6 output=2 format=%6s
tp01n:trans_type: type=s start=18 length=2 output=3 format=%2s
tp01n:salesman_1: type=s start=20 length=5 output=4 format=%5s
tp01n:salesman_2: type=s start=25 length=5 output=5 format=%5s
tp01n:salesman_2_perc: type=s start=30 length=2 output=6 format=%2s
tp01n:next_purch_code: type=s start=32 length=5 output=7 format=%5s
tp01n:next_purch_mnth: type=s start=37 length=2 output=8 format=%2s
tp01n:cash_amt: type=n start=39 length=10 output=9 format=%9s
tp01n:check_amt: type=n start=49 length=10 output=10 format=%9s
tp01n:credit_card_amt: type=n start=59 length=10 output=11 format=%9s
tp01n:media: type=s start=69 length=10 output=12 format=%10s
tp01n:financed_amount: type=n start=79 length=10 output=13 format=%9s
tp01n:office_refund: type=n start=89 length=10 output=14 format=%9s
tp01n:charge_to_acct: type=n start=99 length=10 output=15 format=%9s
tp01n:credit_used: type=n start=109 length=10 output=16 format=%9s
tp01n:total_sale: type=n start=119 length=10 output=17 format=%9s
tp01n:cost_at_uc: type=n start=129 length=10 output=18 format=%9s
tp01n:cost_at_vend_pr: type=n start=139 length=10 output=19 format=%9s
tp01n:tax_amt: type=n start=149 length=9 output=20 format=%8s
tp01n:freight: type=n start=158 length=9 output=21 format=%8s
tp01n:finance_charge: type=n start=167 length=8 output=22 format=%7s
tp01n:finance_company: type=s start=175 length=10 output=23 format=%10s
tp01n:credit_memo_num: type=s start=185 length=10 output=24 format=%10s
tp01n:account_number: type=s start=195 length=10 output=25 format=%10s
tp01n:terms: type=s start=205 length=3 output=26 format=%3s
tp01n:discount: type=s start=208 length=3 output=27 format=%3s
tp01n:discount_pct: type=s start=211 length=2 output=28 format=%2s
tp01n:taxable_sale: type=s start=213 length=1 output=29 format=%1s
tp01n:tax_code: type=s start=214 length=4 output=30 format=%4s
tp01n:system: type=s start=218 length=1 output=31 format=%1s
tp01n:sold_at_store: type=s start=219 length=4 output=32 format=%4s
tp01n:mkd_approved_by: type=s start=223 length=5 output=33 format=%5s
tp01n:credit_card_num: type=s start=228 length=16 output=34 format=%16s
tp01n:authorization: type=s start=244 length=6 output=35 format=%6s
tp01n:trade_in_purch: type=n start=250 length=10 output=36 format=%9s
tp01n:gen_numb: type=s start=260 length=1 output=37 format=%1s
tp01n:shift_id: type=s start=261 length=1 output=38 format=%1s
tp01n:terminal: type=s start=262 length=3 output=39 format=%3s





tp01n:trans_time: type=s start=265 length=5 output=40 format=%5s
tp01n:return_auth: type=s start=270 length=6 output=41 format=%6s
tp01n:cr_card_exp: type=s start=276 length=4 output=42 format=%4s
tp01n:check_number: type=s start=280 length=8 output=43 format=%8s
tp01n:cust_num_bill: type=s start=288 length=11 output=44 format=%11s
tp01n:cust_num_ship: type=s start=299 length=11 output=45 format=%11s
tp01n:po_number: type=s start=310 length=16 output=46 format=%16s
tp01n:trade_in_actual: type=n start=326 length=10 output=47 format=%9s
tp01n:changed_master: type=s start=336 length=1 output=48 format=%1s
tp01n:reason_code: type=s start=337 length=2 output=49 format=%2s
tp01n:trans_status: type=s start=339 length=1 output=50 format=%1s
tp01n:trans_user_id: type=s start=340 length=5 output=51 format=%5s
tp01n:check_auth_numb: type=s start=345 length=6 output=52 format=%6s

tp0103:config: keyOffset=20 recLength=398 type=trans
tp0103:config: datafile=/home/DSK/DA1/TP0103.dat
tp0103:config: outFile=/home/bby/files/trans.header.03.
tp0103:date_ccyymmdd: type=s start=0 length=8 output=0 format=%8s
tp0103:audit_store: type=s start=8 length=4 output=1 format=%4s
tp0103:trans_number: type=s start=12 length=6 output=2 format=%6s
tp0103:trans_type: type=s start=18 length=2 output=3 format=%2s
tp0103:salesman_1: type=s start=20 length=5 output=4 format=%5s
tp0103:salesman_2: type=s start=25 length=5 output=5 format=%5s
tp0103:salesman_2_perc: type=s start=30 length=2 output=6 format=%2s
tp0103:next_purch_code: type=s start=32 length=5 output=7 format=%5s
tp0103:next_purch_mnth: type=s start=37 length=2 output=8 format=%2s
tp0103:cash_amt: type=n start=39 length=10 output=9 format=%9s
tp0103:check_amt: type=n start=49 length=10 output=10 format=%9s
tp0103:credit_card_amt: type=n start=59 length=10 output=11 format=%9s
tp0103:media: type=s start=69 length=10 output=12 format=%10s
tp0103:financed_amount: type=n start=79 length=10 output=13 format=%9s
tp0103:office_refund: type=n start=89 length=10 output=14 format=%9s
tp0103:charge_to_acct: type=n start=99 length=10 output=15 format=%9s
tp0103:credit_used: type=n start=109 length=10 output=16 format=%9s
tp0103:total_sale: type=n start=119 length=10 output=17 format=%9s
tp0103:cost_at_uc: type=n start=129 length=10 output=18 format=%9s
tp0103:cost_at_vend_pr: type=n start=139 length=10 output=19 format=%9s
tp0103:tax_amt: type=n start=149 length=9 output=20 format=%8s
tp0103:freight: type=n start=158 length=9 output=21 format=%8s
tp0103:finance_charge: type=n start=167 length=8 output=22 format=%7s
tp0103:finance_company: type=s start=175 length=10 output=23 format=%10s
tp0103:credit_memo_num: type=s start=185 length=10 output=24 format=%10s
tp0103:account_number: type=s start=195 length=10 output=25 format=%10s
tp0103:terms: type=s start=205 length=3 output=26 format=%3s
tp0103:discount: type=s start=208 length=3 output=27 format=%3s
tp0103:discount_pct: type=s start=211 length=2 output=28 format=%2s
tp0103:taxable_sale: type=s start=213 length=1 output=29 format=%1s
tp0103:tax_code: type=s start=214 length=4 output=30 format=%4s
tp0103:system: type=s start=218 length=1 output=31 format=%1s
tp0103:sold_at_store: type=s start=219 length=4 output=32 format=%4s
tp0103:mkd_approved_by: type=s start=223 length=5 output=33 format=%5s
tp0103:credit_card_num: type=s start=228 length=16 output=34 format=%16s
tp0103:authorization: type=s start=244 length=6 output=35 format=%6s
tp0103:trade_in_purch: type=n start=250 length=10 output=36 format=%9s
tp0103:gen_numb: type=s start=260 length=1 output=37 format=%1s
tp0103:shift_id: type=s start=261 length=1 output=38 format=%1s
tp0103:terminal: type=s start=262 length=3 output=39 format=%3s
tp0103:trans_time: type=s start=265 length=5 output=40 format=%5s
tp0103:return_auth: type=s start=270 length=6 output=41 format=%6s
tp0103:cr_card_exp: type=s start=276 length=4 output=42 format=%4s
tp0103:check_number: type=s start=280 length=8 output=43 format=%8s
tp0103:cust_num_bill: type=s start=288 length=11 output=44 format=%11s
tp0103:cust_num_ship: type=s start=299 length=11 output=45 format=%11s
tp0103:po_number: type=s start=310 length=16 output=46 format=%16s
tp0103:trade_in_actual: type=n start=326 length=10 output=47 format=%9s
tp0103:changed_master: type=s start=336 length=1 output=48 format=%1s
tp0103:reason_code: type=s start=337 length=2 output=49 format=%2s
tp0103:trans_status: type=s start=339 length=1 output=50 format=%1s
tp0103:trans_user_id: type=s start=340 length=5 output=51 format=%5s
tp0103:check_auth_numb: type=s start=345 length=6 output=52 format=%6s





payroll.master:config: keyOffset=5 recLength=761 type=
payroll.master:config: datafile=/home/DSK/DA1/STP1.dat
payroll.master:config: outFile=/home/bby/files/source.payroll.master.
payroll.master:EMPL_NO: type=s start=0 length=5 output=0 format=%5s
payroll.master:EMPL_NAME: type=s start=5 length=25 output=1 format=%25s
payroll.master:EMPL_ADDRESS1: type=s start=30 length=25 output=2 format=%25s
payroll.master:EMPL_ADDRESS2: type=s start=55 length=25 output=3 format=%25s
payroll.master:EMPL_CITY: type=s start=80 length=15 output=4 format=%15s
payroll.master:EMPL_STATE: type=s start=95 length=2 output=5 format=%2s
payroll.master:EMPL_ZIP: type=s start=97 length=9 output=6 format=%9s
payroll.master:EMPL_PHONE_NO: type=s start=106 length=11 output=7 format=%11s
payroll.master:EMPL_DEPT_NO: type=s start=117 length=6 output=8 format=%6s
payroll.master:EMPL_SOC_SEC_NO: type=s start=123 length=11 output=9 format=%11s
payroll.master:EMPL_STATUS: type=s start=134 length=2 output=10 format=%2s
payroll.master:TYPE_OF_JOB: type=s start=136 length=5 output=11 format=%5s
payroll.master:SALARY_VS_HOUR: type=s start=141 length=1 output=12 format=%1s
payroll.master:MARITAL_STATUS: type=s start=142 length=1 output=13 format=%1s
payroll.master:EMPL_SEX: type=s start=143 length=1 output=14 format=%1s
payroll.master:PAY_PER_CODE: type=s start=144 length=2 output=15 format=%2s
payroll.master:FED_EXEMPTIONS: type=s start=147 length=3 output=16 format=%3s
payroll.master:ST_EXEMPTIONS: type=s start=151 length=3 output=17 format=%3s
payroll.master:STATE_TAX_CODE: type=s start=154 length=2 output=18 format=%2s
payroll.master:CITY_EXEMPTIONS: type=s start=157 length=3 output=19 format=%3s
payroll.master:CITY_CODE: type=s start=160 length=2 output=20 format=%2s
payroll.master:BIRTH_DATE_CYMD: type=s start=162 length=8 output=21 format=%8s
payroll.master:EMPL_ST_DT_CYMD: type=s start=170 length=8 output=22 format=%8s
payroll.master:DT_TERMIN_CYMD: type=s start=178 length=8 output=23 format=%8s
payroll.master:SENIOR_DT_CYMD: type=s start=186 length=8 output=24 format=%8s
payroll.master:PRESENT_PAY: type=n start=194 length=10 output=25 format=%9s
payroll.master:DT_LST_INC_CYMD: type=s start=204 length=8 output=26 format=%8s
payroll.master:AMT_LAST_RAISE: type=n start=212 length=10 output=27 format=%9s
payroll.master:CONTACT_NAME: type=s start=222 length=20 output=28 format=%20s
payroll.master:CONTACT_PHONE: type=s start=242 length=11 output=29 format=%11s
payroll.master:HEAD_HOUSEHOLD: type=s start=253 length=1 output=30 format=%1s
payroll.master:FED_WH_FLAG: type=s start=254 length=1 output=31 format=%1s
payroll.master:EXEMPT_OVERTIME: type=s start=255 length=1 output=32 format=%1s
payroll.master:REGION_ALPHA: type=s start=256 length=2 output=33 format=%2s
payroll.master:COMMISS_PLAN: type=s start=258 length=1 output=34 format=%1s
payroll.master:STATE_WH_FLAG: type=s start=259 length=1 output=35 format=%1s
payroll.master:WP_NAME: type=s start=260 length=25 output=36 format=%25s

tp0202:config: keyOffset=23 recLength=275 type=trans
tp0202:config: datafile=/home/DSK/DA1/TP0202.dat
tp0202:config: outFile=/home/bby/files/trans.detail.02.
tp0202:DATE_CCYYMMDD: type=s start=0 length=8 output=0 format=%8s
tp0202:AUDIT_STORE: type=s start=8 length=4 output=1 format=%4s
tp0202:TRANS_NUMBER: type=s start=12 length=6 output=2 format=%6s
tp0202:TRANS_TYPE: type=s start=18 length=2 output=3 format=%2s
tp0202:LINE_NUMBER: type=s start=20 length=3 output=4 format=%3s
tp0202:QUANTITY: type=s start=24 length=6 output=5 format=%6s
tp0202:ITEM_NUMBER: type=s start=30 length=14 output=6 format=%14s
tp0202:DESCRIPTION: type=s start=44 length=18 output=7 format=%18s
tp0202:SALESMAN1_5: type=s start=62 length=5 output=8 format=%5s
tp0202:SALESMAN2_5: type=s start=67 length=5 output=9 format=%5s
tp0202:COMMISS_PCT: type=s start=72 length=2 output=10 format=%2s
tp0202:SERIAL_NUMB_FLG: type=s start=74 length=1 output=11 format=%1s
tp0202:PICKUP_STORE: type=s start=75 length=4 output=12 format=%4s
tp0202:WARRANTY_CODE: type=s start=79 length=3 output=13 format=%3s
tp0202:SPLIT_FLAG: type=s start=82 length=1 output=14 format=%1s
tp0202:LINE_TOTAL: type=n start=83 length=10 output=15 format=%9s
tp0202:LINE_COG_AT_UC: type=n start=93 length=10 output=16 format=%9s
tp0202:COG_AT_VP: type=n start=103 length=10 output=17 format=%9s
tp0202:MIN_SELLING_PR: type=n start=113 length=10 output=18 format=%9s
tp0202:BRAND: type=s start=123 length=5 output=19 format=%5s
tp0202:DEPARTMENT: type=s start=128 length=2 output=20 format=%2s
tp0202:PRODUCT_TYPE: type=s start=130 length=3 output=21 format=%3s
tp0202:UNIT_OF_MEASURE: type=s start=133 length=2 output=22 format=%2s
tp0202:FLOOR_PLAN: type=s start=135 length=1 output=23 format=%1s
tp0202:SERIAL_NUMBER: type=s start=136 length=15 output=24 format=%15s
tp0202:SOLD_AT_STORE: type=s start=151 length=4 output=25 format=%4s
tp0202:FLR_MODEL_FLG: type=s start=155 length=1 output=26 format=%1s






tp02n:config: keyOffset=23 recLength=275 type=trans
tp02n:config: datafile=/home/DK3/DA4/TP02N.dat
tp02n:config: outFile=/home/bby/files/trans.detail.n.
tp02n:DATE_CCYYMMDD: type=s start=0 length=8 output=0 format=%8s
tp02n:AUDIT_STORE: type=s start=8 length=4 output=1 format=%4s
tp02n:TRANS_NUMBER: type=s start=12 length=6 output=2 format=%6s
tp02n:TRANS_TYPE: type=s start=18 length=2 output=3 format=%2s
tp02n:LINE_NUMBER: type=s start=20 length=3 output=4 format=%3s
tp02n:QUANTITY: type=s start=24 length=6 output=5 format=%6s
tp02n:ITEM_NUMBER: type=s start=30 length=14 output=6 format=%14s
tp02n:DESCRIPTION: type=s start=44 length=18 output=7 format=%18s
tp02n:SALESMAN1_5: type=s start=62 length=5 output=8 format=%5s
tp02n:SALESMAN2_5: type=s start=67 length=5 output=9 format=%5s
tp02n:COMMISS_PCT: type=s start=72 length=2 output=10 format=%2s
tp02n:SERIAL_NUMB_FLG: type=s start=74 length=1 output=11 format=%1s
tp02n:PICKUP_STORE: type=s start=75 length=4 output=12 format=%4s
tp02n:WARRANTY_CODE: type=s start=79 length=3 output=13 format=%3s
tp02n:SPLIT_FLAG: type=s start=82 length=1 output=14 format=%1s
tp02n:LINE_TOTAL: type=n start=83 length=10 output=15 format=%9s
tp02n:LINE_COG_AT_UC: type=n start=93 length=10 output=16 format=%9s
tp02n:COG_AT_VP: type=n start=103 length=10 output=17 format=%9s
tp02n:MIN_SELLING_PR: type=n start=113 length=10 output=18 format=%9s
tp02n:BRAND: type=s start=123 length=5 output=19 format=%5s
tp02n:DEPARTMENT: type=s start=128 length=2 output=20 format=%2s
tp02n:PRODUCT_TYPE: type=s start=130 length=3 output=21 format=%3s
tp02n:UNIT_OF_MEASURE: type=s start=133 length=2 output=22 format=%2s
tp02n:FLOOR_PLAN: type=s start=135 length=1 output=23 format=%1s
tp02n:SERIAL_NUMBER: type=s start=136 length=15 output=24 format=%15s
tp02n:SOLD_AT_STORE: type=s start=151 length=4 output=25 format=%4s
tp02n:FLR_MODEL_FLG: type=s start=155 length=1 output=26 format=%1s

tp0203:config: keyOffset=23 recLength=275 type=trans
tp0203:config: datafile=/home/DSK/DA1/TP0203.dat
tp0203:config: outFile=/home/bby/files/trans.detail.03.
tp0203:DATE_CCYYMMDD: type=s start=0 length=8 output=0 format=%8s
tp0203:AUDIT_STORE: type=s start=8 length=4 output=1 format=%4s
tp0203:TRANS_NUMBER: type=s start=12 length=6 output=2 format=%6s
tp0203:TRANS_TYPE: type=s start=18 length=2 output=3 format=%2s
tp0203:LINE_NUMBER: type=s start=20 length=3 output=4 format=%3s
tp0203:QUANTITY: type=s start=24 length=6 output=5 format=%6s
tp0203:ITEM_NUMBER: type=s start=30 length=14 output=6 format=%14s
tp0203:DESCRIPTION: type=s start=44 length=18 output=7 format=%18s
tp0203:SALESMAN1_5: type=s start=62 length=5 output=8 format=%5s
tp0203:SALESMAN2_5: type=s start=67 length=5 output=9 format=%5s
tp0203:COMMISS_PCT: type=s start=72 length=2 output=10 format=%2s
tp0203:SERIAL_NUMB_FLG: type=s start=74 length=1 output=11 format=%1s
tp0203:PICKUP_STORE: type=s start=75 length=4 output=12 format=%4s
tp0203:WARRANTY_CODE: type=s start=79 length=3 output=13 format=%3s
tp0203:SPLIT_FLAG: type=s start=82 length=1 output=14 format=%1s
tp0203:LINE_TOTAL: type=n start=83 length=10 output=15 format=%9s
tp0203:LINE_COG_AT_UC: type=n start=93 length=10 output=16 format=%9s
tp0203:COG_AT_VP: type=n start=103 length=10 output=17 format=%9s
tp0203:MIN_SELLING_PR: type=n start=113 length=10 output=18 format=%9s
tp0203:BRAND: type=s start=123 length=5 output=19 format=%5s
tp0203:DEPARTMENT: type=s start=128 length=2 output=20 format=%2s
tp0203:PRODUCT_TYPE: type=s start=130 length=3 output=21 format=%3s
tp0203:UNIT_OF_MEASURE: type=s start=133 length=2 output=22 format=%2s
tp0203:FLOOR_PLAN: type=s start=135 length=1 output=23 format=%1s
tp0203:SERIAL_NUMBER: type=s start=136 length=15 output=24 format=%15s
tp0203:SOLD_AT_STORE: type=s start=151 length=4 output=25 format=%4s
tp0203:FLR_MODEL_FLG: type=s start=155 length=1 output=26 format=%1s




3. finddiff.config
An example of the finddiff configuration file is listed below. This example was used during testing of
the application and should be representative of the production configuration file.





#   This is the configuration file for the program that determines
#   what changes have been made to the Tyler system on a day to day basis.
#   This file controls which files need to have this process as well as the
#   basic information to do the "diff".
#
#   There is one short section to tell the program where the gzip utility
#   command can be found.
#
#   -gzip - name/location of the gzip utility
#   Format:
#   command:gzip: cmd=<gzip>
#
#   Each data file has one section to describe the files and keys.
#   The formatting of these lines is very important
#   to the difference program, so please be careful to format correctly.
#
#   The information passed into the difference program is as follows:
#   -input - Base name of extract files for this datafile. The difference
#            program will automatically look for the current day's file and the
#            previous day's file. It is assumed this process will run every day
#            of the year.
#   -output - Base name of the output file that will contain the list of
#            differences. Each record will have an indicator to tell whether
#            the change is an "add" or "change". It is not anticipated that
#            there will be any "deletes". The file name will be appended with
#            the date (CCYYMMDD).
#   -keylength - The length of the key for the data contained in the file. It
#            is assumed that the key is the first field(s) of the file. The key
#            will be used when determining if a record is a new or changed one.
#   -default - datafiles listed as default=y will be automatically processed when
#            the program is run from the command line.
#
#   Format:
#   datafiles:<common name>: input=<Location and base name of extract file>
#   datafiles:<common name>: output=<Location and base name of file containing
#                            differences (deltas)>
#   datafiles:<common name>: keylength=<#> default=<y|n>
#

command:gzip: cmd=/usr/contrib/bin/gzip

datafiles:dept.names: input=/home/bby/files/source.dept.names
datafiles:dept.names: output=/home/bby/files/delta.dept.names
datafiles:dept.names: keylength=2 default=y

datafiles:payroll.master: input=/home/bby/files/source.payroll.master
datafiles:payroll.master: output=/home/bby/files/delta.payroll.master
datafiles:payroll.master: keylength=5 default=y

datafiles:inventory.mast: input=/home/bby/files/source.inventory.mast
datafiles:inventory.mast: output=/home/bby/files/delta.inventory.mast
datafiles:inventory.mast: keylength=15 default=y

datafiles:branch.info: input=/home/bby/files/source.branch.info
datafiles:branch.info: output=/home/bby/files/delta.branch.info
datafiles:branch.info: keylength=4 default=y

datafiles:inv.brands: input=/home/bby/files/source.inv.brands
datafiles:inv.brands: output=/home/bby/files/delta.inv.brands
datafiles:inv.brands: keylength=5 default=y

datafiles:product.types: input=/home/bby/files/source.product.types




datafiles:product.types: output=/home/bby/files/delta.product.types
datafiles:product.types: keylength=3 default=y

datafiles:test: input=/home/bby/odata
datafiles:test: output=/home/bby/tdata
datafiles:test: keylength=20 default=n
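The keylength-driven compare described in the comments above can be sketched as follows. This is a Python illustration of the technique, not the actual finddiff code; the leading "A"/"C" markers stand in for the add/change indicator, whose exact format is not specified here, and the function name is an assumption:

```python
# Hedged sketch of a key-based "diff": the key is the leading
# keylength characters of each record. A key seen only in today's
# file is an "add"; a key whose record text changed is a "change".
# Per the comments above, deletes are not expected, so keys that
# disappear are ignored. gzip handling and file I/O are omitted.

def find_deltas(previous, current, keylength):
    old = {line[:keylength]: line for line in previous}
    deltas = []
    for line in current:
        key = line[:keylength]
        if key not in old:
            deltas.append("A" + line)        # new key: add
        elif old[key] != line:
            deltas.append("C" + line)        # same key, new data: change
    return deltas

yesterday = ["01Video", "02Audio"]
today     = ["01Video", "02Car Audio", "03Appliances"]
print(find_deltas(yesterday, today, keylength=2))
# ['C02Car Audio', 'A03Appliances']
```

This is why keylength must match the file's leading key field(s) exactly (e.g. 2 for dept.names, 4 for branch.info): a wrong length would split records into spurious adds and changes.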




4. ExtractCommon.mail
An example of the ExtractCommon.mail configuration file is listed below. This example was used during
the testing of the application. The configuration file should be viewed only as an example of what the file
could look like.

# This is the configuration file for the email functions of the
# ExtractCommon.pm module. The file contains some basic configuration
# information for the mail server as well as the lists of addresses to
# send the message to.
#
# The information required in the first section of the configuration file is
# some default information.
# -enabled - Is smtp mail available on this server? (y or n)
#            If set to "n", no attempt will be made to send emails (and
#            therefore no error messages about the inability to send them)
# -address - Default email address that email will be sent from
# Format:
#   default:mail: enabled=<y|n>
#   default:from: address=<email address>
#
# The second section lists the addresses that emails should go to.
# -<address> - An individual address to send the emails generated by the
#              extract process
# -type      - Is the address to be included on the "to" or "copy" of the msg.
#              values: to or copy.
# -name      - Optional. Not used at this point, but may be nice to keep
#              track of who the email address is for if it is not obvious.
#email:<address>: type=<to|copy> name=<name>


default:mail: enabled=n
default:from: address=csullivan@magnoliahifi.com

email:bruce.anderson@bestbuy.com: type=to name=Bruce Anderson
email:csullivan@magnoliahifi.com: type=to name=Chris Sullivan
email:6125551234@att.net: type=copy name=Chris S pager (fake #)
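Parsing the email lines above into "to" and "copy" lists can be sketched like this (a Python illustration, not the actual ExtractCommon.pm code; the function name is an assumption). Because the optional `name=` value may contain spaces, only the address and type are extracted here:

```python
# Sketch: pull the addresses out of lines shaped like
#   email:<address>: type=<to|copy> name=<name>
# and sort them into "to" and "copy" recipient lists.

import re

def parse_mail_config(text):
    to_list, copy_list = [], []
    for line in text.splitlines():
        m = re.match(r"email:([^:]+):\s+type=(to|copy)", line)
        if m:
            address, kind = m.group(1), m.group(2)
            (to_list if kind == "to" else copy_list).append(address)
    return to_list, copy_list

config = """\
default:mail: enabled=n
email:csullivan@magnoliahifi.com: type=to name=Chris Sullivan
email:6125551234@att.net: type=copy name=Chris S pager (fake #)
"""
print(parse_mail_config(config))
# (['csullivan@magnoliahifi.com'], ['6125551234@att.net'])
```

With `default:mail: enabled=n`, as in the test configuration above, these lists are parsed but no send is attempted.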






SUPPORTING DOCUMENTATION
ADDED BY BRUCE ANDERSON, FROM EMAIL COMMUNICATION FROM BRUCE LINN

Change the FTP location for the files from Qual to Prod:

1) Log on to the Magnolia production box.
2) Type:
         cp runall.prod runall.config

This copies the production configuration file into place as the default configuration file.

To change back to Qual, type:
       cp runall.qual runall.config



To Disable the daily schedule:
1) log on to the Magnolia Production system

2) type crontab -e

3) You will be placed in a "vi" editing session for the schedule.
Move the cursor to the beginning of the following line (shown wrapped here; it is a single line in the crontab):

35 6 * * *       /opt/perl/bin/perl -I /home/bby /home/bby/runall.pl 1>>/home/bby/runlog.out 2>>/home/bby/runlog.err

4) Type i#       This inserts a pound sign (#) at the front of the line, which comments out the entry.

5) Type ESC :wq (ESC is the escape key). This will save the schedule. If you don't want to save the
change, type ESC :q! instead.


Manually running extracts, compare and ftp:
Once the schedule is disabled, you may need to run the programs manually. To do this, run this
command from the command line:
/opt/perl/bin/perl -I /home/bby /home/bby/runall.pl 1>>/home/bby/runlog.out 2>>/home/bby/runlog.err


To re-enable the schedule:
1) log on to the Magnolia Production system

2) type crontab -e

3) You will be placed in a "vi" editing session for the schedule.
Move the cursor to the pound sign (#) at the beginning of the following line (shown wrapped here; it is a single line in the crontab):

#35 6 * * *       /opt/perl/bin/perl -I /home/bby /home/bby/runall.pl 1>>/home/bby/runlog.out 2>>/home/bby/runlog.err

4) Type x      This removes the pound sign (#) from the beginning of the statement.

5) Type ESC :wq (ESC is the escape key). This will save the schedule.



