Data Flow Diagram for Employee Timesheet - DOC by lok79549

VIEWS: 477 PAGES: 24

More Info
									                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



1 Sources and Target Table Descriptions
1.1   Description of External Source Data to be Integrated into Database
Seven packages were created to integrate the source data into ten tables. The ten tables are:
   1. Employees
   2. EmployeeRates
   3. Division
   4. DivisionXCounty
   5. County
   6. Clients
   7. ClientGroupingsXClient
   8. ClientGroupings
   9. JobMaster
   10. JobTimeSheets


1.2   Packages and the Tables to be Integrated
Target Table                                       Package to be used to Integrate
Employees                                          EmployeeMasterPackage
EmployeeRates                                      EmployeeRates
Clients                                            ClientMaster
Division                                           DivisionCounty
County                                             DivisionCounty
DivisionXCounty                                    DivisionCounty
ClientGroupings                                    Grouping
ClientGroupingsXClient                             Grouping
JobMaster                                          JobMaster
JobTimeSheets                                      Timesheets

2 Control Flows & Data Flow Conversions
2.1   Control Flows

Generally, the Control Flows of each package will be followed by a SendMail Task that will send
an email to the SetFocus Administrator informing he/she of the successful execution of a given
package along with row counts of insertions, invalid records and changed records.


2.2   Data Flow Conversions

Due to Excel and comma-delimited data being incompatible with SQL Server datatypes, each
data source was read, converted to the proper data type of the destiny database table, and
loaded into the corresponding AllWorks SQL database tables. The conversions included the
following data types:
      String (DT_STR) for characters and varchars – the destination size was retained
      Data timestamp (DT_DBTIMESTAMP) for dates
      Numeric (DT_NUMERIC) with the correct precision and scale for decimals, currency, and
        numbers. For example, Decimal 18.2 format was transformed to DT_NUMERIC with a
        precision of 18 and a scale of 2.
      Four-byte signed integer (DT_I4) for integers




P. Costantino                              Page 1 of 24                                 1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



3 Packages
3.1   EmployeeMaster Package
Import the worksheet Employees$ from the spreadsheet Employees.XLS file located in folder
C:\SSIS Project\SetFocusBISourceData.

The following diagram shows the flow of tasks which import the data from the spreadsheet,
convert the fields or columns to conform with the AllWorks database, and derive the FullName
column from the converted lastname + “, “ + firstname columns. The converted employeeID
column is compared to the EmployeePK column of the AllWorks Employee table through the
lookup task.

The conditional split tests 2 conditions:

Condition       Logic Statement
NewRow          ISNULL(TargetEmployeePK)
ChangedRow      (TransformedFirstname != TargetFirstName) || (TransformedLastname != TargetLastName) ||
                (TransformedEmployeeFlag != TargetEmployeeFlag) || (TransformedCellPhone !=
                TargetCellPhone)

Through the use of a Conditional Split, if an EmployeePK record is missing (ISNULL), then the
new record is inserted into the Employee table. The ChangedRow condition tests to see if any of
the columns amongst the existing records were changed, then the affected records are updated.
See Figure 1, for illustration.

The valid records are inserted using an OLE DB Destination task. The updated records are
modified using an OLE DB Command transformation task. The New Row Count and Change
Row Count tasks are used to collect the number of rows affected by each operation which are
then reported to the Send Mail Task.




Figure 1, Data Flow for EmployeeMaster.pkg



P. Costantino                                  Page 2 of 24                                         1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



3.2   EmployeeRate Package
Import the worksheet EmployeeRates$ from the spreadsheet Employees.XLS located in folder
C:\SSIS Project\SetFocusBISourceData. Figure 2 illustrates the flow of tasks which import the
spreadsheet data, converts the fields or columns to conform with the AllWorks database, and
checks for invalid EmployeePK’s (Employee ID’s) against the Employees table and checks for
EmployeePK and EffectiveDate against the Employee Rates table. The conditional split checks
for 3 conditions as displayed below.

The conditional split tests 3 conditions:

Condition           Logic Statement
InvalidRows         ISNULL(TargetEmpEmployeePK)
NewRows             ISNULL(TargetRateEmployeeRatePK)
ChangedRows         (TransformedHourlyRate != TargetRateHourlyRate)

The NewRows row count, InvalidRow count and the ChangedRow count tasks are used to collect
the number of rows affected by each operation which are then reported to the Send Mail Task.

The Derived Column count for invalid employees is used to create an error description string that
is sent to the ErrorLog table along with today’s date using GetDate().

New Rows are sent to the EmployeeRates table and any modifications are sent to the OLE DB
Command task that modifies the EmployeeRates records.




Figure 2, DataFlow for EmployeeRate.pkg



P. Costantino                                 Page 3 of 24                              1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



3.3   ClientMaster Package
Import the worksheet ClientListing$ from the spreadsheet ClientGeographies.XLS located in
folder C:\SSIS Project\SetFocusBISourceData. Figure 3 shows the data flow of tasks which
import the data from the spreadsheet, convert the fields or columns to conform to the AllWorks
database, verify Client numbers, insert new records into the Clients table, and update appropriate
changes to the Client table.

The conditional split tests 2 conditions as shown below:

Condition           Logic Statement
NewRows             ISNULL(TrgtClientClientPK)
ChangedRows         (TrfmAccountNo != TrgtAccountNo) || ([TrfmClient Name] != TrgtClientName) ||
                    (TrfmProjectManager != TrgtProjectManager) || (TrfmClientImage != TrgtClientImage)




Figure 3, DataFlow for ClientMaster.pkg




P. Costantino                                  Page 4 of 24                                          1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




3.4   DivisionCounty Package
The objective is to normalize the Division table and omit any duplicate rows from the source data.
New divisions may be added, and division names may be changed. The same basic operation is
performed on the County data.

Three separate data flows handle the Division, County and DivisionXCounty tables within the
same package. Figure 4 illustrates the control flow.


The first dataflow handles the “Divisions Definitions$” sheet in the ClientGeographies.XLS
spreadsheet.

The second dataflow handles the “County Definitions$” sheet in the ClientGeographies.XLS
spreadsheet.

The third dataflow handles the “Divisions Definitions$” sheet to populate the DivisionXCounty
table.

The Send Mail Task sends the completion email alert to SetFocus.




Figure 4, Control flow for DivisionCounty.pkg




P. Costantino                              Page 5 of 24                                 1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



Import the worksheet DivisionDefinitions$ from the spreadsheet ClientGeographies.XLS located
in folder C:\SSIS Project\SetFocusBISourceData as shown in Figure 5 below.




Figure 5, Data flow for Division table in DivisionCounty package.




Figure 6, Data flow for County table in DivisionCounty package.




P. Costantino                               Page 6 of 24                                1/18/2011
                  Business Intelligence Masters Program – AllWorksDatabase SSIS Project



Import the worksheet DivisionDefinitions$ from the spreadsheet ClientGeographies.XLS located
in folder C:\SSIS Project\SetFocusBISourceData. First any duplicates are eliminated using the
Eliminate Duplicates task, then data is converted to conform to SQL Server types. DivisionPK’s
are validated against the Division table, then the same is done with CountyPK’s against the
County table. A lookup is then performed against the DivXCounty table to check for previously
inserted records and then a conditional split is used. Refer to Figure 7 for DivisionXCounty
dataflow.

The conditional split tests 3 conditions as shown below:

Condition             Logic Statement
InvalidDivision       ISNULL(TrgtDivisionDivisionPK)
InvalidCounty         ISNULL(TrgtDivisionCountyPK)
NewRow                ISNULL(TrgtDivisionXCountyPK)

The NewRow row count, Invalid County row count and Invalid Division row count tasks are used
to collect the number of rows affected by each operation which are then reported to the Send Mail
Task.

The Derived Column tasks for invalid county and division respectively are used to create an error
description string that is sent to the ErrorLog table along with today’s date using GetDate().




Figure 7, Data flow for DivisionXCounty table in DivisionCounty package.




P. Costantino                                   Page 7 of 24                              1/18/2011
                 Business Intelligence Masters Program – AllWorksDatabase SSIS Project




3.5    Grouping Package




Figure 8, Control flow for Grouping package.

Two separate data flows handle the ClientGroupings table and the cross reference table,
ClientGroupingsXClients within the same package. Figure 8 illustrates the control flow.

Figure 9, illustrates the data flow for handling Client Groupings. First, duplicates are eliminated,
data is converted to conform to SQL Server types, client groupings are validated with a lookup
and a conditional split tests for new rows and changed rows. Row counts for new and changed
rows are used to tabulate the counts for eventual reporting to the Send Mail Task in the control
flow.

Conditional split logic is as follows:
Condition           Logic Statement
NewRows             ISNULL(TrgtClientGroupingPK)
ChangedRows         ([TrfmGrouping Name] !=
                    TrgtGroupingName)




P. Costantino                                  Page 8 of 24                                 1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




Figure 9, Data flow for ClientGroupings in the Grouping package.

Figure 12, below illustrates the data flow task for handling the ClientGroupingsXClients cross
reference table data. First a source for SpecialGroupings$ worksheet is used to import the data
from ClientGeographies.XLS. A Multicast is used to split the data imported from the Source task
into 2 directions, to a Sort Task named Direct Data Sorted and to an Aggregate task to count
Grouping #’s by Account Key as shown in Figure 10 below. The result of the aggregation is also
sent to a Sort Task. Sort Tasks are required before using the Merge Join task.




Figure 10, Aggregate task for ClientGroupings.




P. Costantino                               Page 9 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



The Merge Join task is then used to perform a left outer join between the “Direct Data Sorted”
sort task and the “Aggregated Data Sorted” sort task via the “Account key” field. This operation
yields a new field called “Count” which will be used to determine accounts belonging to more than
1 group. The count field created is used in the conditional split to detect invalid rows.




Figure 11, Data flow for ClientGroupingsXClients in the Grouping package.

Data conversion is then performed to cast types to SQL Server standards. Client ID’s are
validated against the Clients table, then Client Grouping ID’s are validated against
ClientGroupings table and finally incoming records are checked against any existing records in
the cross reference table, ClientGroupingsXClients table.

The conditional split logic is as follows:
Condition            Logic Statement
InvalidRow           ISNULL(TrgtClientsClientPK) || ISNULL(TrgtCGrp_ClientGroupingPK) || (TrfmCount > 1)
wRow                 ISNULL(TrgtCGrpXC_ClientGroupingsXClientsPK)




P. Costantino                                   Page 10 of 24                                         1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



The NewRow row count and Invalid row count tasks are used to collect the number of rows
affected by each operation which are then reported to the Send Mail Task.

The Derived Column tasks for invalid row counts are used to create an error description string
that is sent to the ErrorLog table along with today’s date using GetDate().




Figure 12, Data flow for ClientGroupingsXClients in the Grouping package.

3.6    JobMaster Package
Import the spreadsheet ProjectMaster.xls located in folder
C:\SSIS\Project\SetFocusBISourceData. The entries are verified against the Clients table to verify
that they are valid. Entries are also checked to see if they’ve already been inserted to the
JobMaster table. A Conditional Split task tests all the entries.

The conditional split logic is as follows:
Condition            Logic Statement
InvalidRow           ISNULL(TrgtClientsClientPK)
NewRow               ISNULL(TrgtJobMasterJobMasterPK)
BlankField           ISNULL(TrfmClientJob)
ChangedRow           (TrfmDescription != TrgtJobMasterDescription) || (TrfmClientJob != TrgtJobMasterClientJob) ||
                     (TrfmMaterialMarkupPct != TrgtJobMasterMaterialMarkupPct) || (TrfmAdditionalOverheadPct !=
                     TrgtJobMasterAdditionalOverheadPct) || (TrfmJobClosed != TrgtJobMasterJobClosed) ||
                     (TrfmJobClosedDate != TrgtJobMasterJobClosedDate) || (TrfmJobCreatedDate !=
                     TrgtJobMasterJobCreatedDate)

All invalid entries are reported via the Derived Column for Error Description in the ErrorLog table.
Valid entries are inserted into the JobMaster table. New jobs may be added and previously
inserted jobs may be changed.




P. Costantino                                    Page 11 of 24                                           1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



The Derived Column count for invalid ClientPK’s is used to create an error description string that
is sent to the ErrorLog table along with today’s date using GetDate(). The Derived Column task
for blank client job is used to create an error description string that describes a record with a
blank client job to the ErrorLog table.

The NewRow row count, InvalidRow count, Blank Client Job count and the ChangedRow count
tasks are used to collect the number of rows affected by each operation which are then reported
to the Send Mail Task.




Figure 13, Data flow for the JobMaster package.

3.7   Timesheets Package


A ForEach loop and a user defined variable (User::csvFile) is used to read through the EmpTime
sheets at the Control Level. In addition to the Data Flow task, a Script task is looped through at
this level to accumulate the totals for each of the counters for each of the .CSV files processed.
Refer to Figure 14, for Control Task design. Refer to Listing 1 for the script code that processes
the counter totals.

With the use of a For Each container new time sheets may be added to the :\SSIS
Project\SetFocusBISourceData\time directory without modifying the SSIS package code.




P. Costantino                              Page 12 of 24                                  1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




Figure 14 Control flow for the Timesheet package.




P. Costantino                              Page 13 of 24                                1/18/2011
                 Business Intelligence Masters Program – AllWorksDatabase SSIS Project




Listing 1

Import the series of csv employee time sheets (EmpTime500.csv, EmpTime1000.csv,
EmpTime2000.csv, EmpTime3000.csv, and EmpTime4000.csv) located in folder
C:\SSIS Project\SetFocusBISourceData\time. Perform a Data Conversion task to cast types to
conform with SQL Server’s types.

Each entry is verified against the JobMaster and the Timesheets tables. All invalid entries are
entered in the ErrorLog table with appropriate error descriptions. Valid entries are inserted into
the JobTimeSheets table. Refer to Figure 15 for data flow diagram.

The conditional split tests for 4 conditions:

Condition            Logic Statement
InvalidJobMasterPK   ISNULL(TrgtJobMasterJobMasterPK)
InvalidTimesheet     (TrgtJobMasterJobClosed == TRUE )&&( [TrfmWorkDate]> [TrgtJobMasterJobClosedDate])
NewRow               ISNULL(TrgtJobTimeShtJobTimeSheetsPK)
ChangedRow           TrfmHoursWorked != TrgtJobTimeShtHoursWorked

The InvalidJobMasterPK, NewRow, InvalidTimesheet row count tasks are used to collect the
number of rows affected by each operation which are then reported to the Send Mail Task.

The Derived Column tasks for InvalidJobMasterPK and InvalidTimesheet row counts are used to
create an error description string that is sent to the ErrorLog table along with today’s date using
GetDate().




P. Costantino                                   Page 14 of 24                                    1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




Figure 15, Data flow for time sheets in the Timesheets package.




P. Costantino                              Page 15 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




3.8   Master Package
The Master package simply directs the order of the execution of the seven packages. See Figure
16 for Data Flow diagram.




Figure 16, Data flow for the Master package.




P. Costantino                              Page 16 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




3.9   DBMaint Package
The DBMaint Package sequentially calls the backup and maintenance tasks required for midnight
daily. See Figure 17 for data flow diagram.




Figure 17, Data flow for the DBMaint package.


4 Send Mail Tasks
Generally the Send Mail Tasks for each package will have the following parameter values set
within the Send Mail Task Editor.




P. Costantino                              Page 17 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




Figure 18, Send Mail Task, Mail properties.

The following is the expression to generate the message body of the email to be sent to
biproject@setfocus.com to alert completion of a package. The following is the expression and
Figure 19, shows where the expression is entered in the Send Mail Task editor.

@[System::PackageName] + " was executed at " + (DT_WSTR,40)@[System::StartTime] + " by
user " + @[System::UserName] + " on machine " + @[System::MachineName] + ". " + "There
were " + (DT_STR,4,1252) @[User::NewRow_Cumul] + " new rows inserted, " +
(DT_STR,4,1252) @[User::InvalidJobMasterPK_Cumul] + " invalid rows detected, " +
(DT_STR,4,1252) @[User::InvalidTimesheet_Cumul] + " invalid timesheets detected."




P. Costantino                                 Page 18 of 24                             1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project




Figure 19, Expression builder for message source in Send Mail Task.




P. Costantino                              Page 19 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



5 SSMS Upload
Once the packages have been tested and finalized, the packages are then uploaded singly to the
CLASSR206 server under AllWorksDB as illustrated below. The packages uploaded must come




from the \Bin directory of the solution.
Each compiled package is imported into the
AllWorks directory under Stored
Packages|MSDB|AllWorks on the local SQL
Server. See Figure 20 for details.




Figure 20, SSMS directory location for compiled packages.



P. Costantino                              Page 20 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



6 Schedule Tasks
To implement a scheduled task, SSMS is used by clicking on “Job” and highlighting “New Jobs” in
the pop-up window (Fig. 21), this will in turn open a dialog box to set all of the pertinent
parameters to establish a scheduled task using SQL Server (Fig. 22).




Figure 21, create a new job in SSMS SQL Server Agent




Figure 22, New Job dialog box




P. Costantino                              Page 21 of 24                                1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



Complete the properties as illustrated in Figure 23 below for the main package, as in this example
for AllWorksDB and then repeat the same procedure for DBMaint package which will be the
second job.




Figure 23, Job Properties dialog box with parameters completed.




P. Costantino                              Page 22 of 24                                 1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



To set up the tasks on a schedule, choose the “Schedules” item amongst the items shown under
“Select a page” to set up the properties. The dialog box changes to the following display as
shown below:




Figure 24, Job Properties dialog box to set up a scheduled task.




P. Costantino                               Page 23 of 24                               1/18/2011
                Business Intelligence Masters Program – AllWorksDatabase SSIS Project



A new dialog box, “New Properties” appears and requests the parameters desired for the
scheduled task. See Figure 25 for display appearance.




Figure 25, New Job Schedule dialog box




P. Costantino                              Page 24 of 24                                1/18/2011

								
To top