Docstoc

SSIS Creating a Custom Transformation Component

Document Sample
SSIS Creating a Custom Transformation Component Powered By Docstoc
					Hands-On Lab
Lab Manual
Creating a custom transformation component in SQL Server Integration Services

Page i

Information in this document is subject to change without notice. The example companies, organizations, products, people, and events depicted herein are fictitious. No association with any real company, organization, product, person or event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarked, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. © 2005 Microsoft Corporation. All rights reserved. Microsoft, MS-DOS, MS, Windows, Windows NT, MSDN, Active Directory, BizTalk, SQL Server, SharePoint, Outlook, PowerPoint, FrontPage, Visual Basic, Visual C++, Visual J++, Visual InterDev, Visual SourceSafe, Visual C#, Visual J#, and Visual Studio are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other countries. Other product and company names herein may be the trademarks of their respective owners.

Page ii

Contents
LAB 1: CREATING A NO-OP COMPONENT ........................................................................................................... 1 Lab Objective ......................................................................................................................................................... 1 Exercise 1 – Writing the no-op data flow transformation component .................................................................... 2 Task 1 – Create a new C# Class Library project ............................................................................................... 2 Task 2 – Specify the default component settings .............................................................................................. 2 Exercise 2 – Building and registering the component ........................................................................................... 5 Task 1 – Signing and building the component .................................................................................................. 5 Task 2 – Registering the component. ................................................................................................................ 6 Exercise 3 – Using the component in a package .................................................................................................. 7 Task 1 – Registering the component for use in the designer ............................................................................ 7 Task 2 – Using the component in a data flow ................................................................................................... 8 Task 3 – Executing the test package .............................................................................................................. 12 Exercise 4 – Debugging the component code ..................................................................................................... 14 Task 1 – Debugging the component through DTExec .................................................................................... 15 LAB 2: CREATING A HASH TRANSFORMATION ................................................................................................ 18 Lab Objective ....................................................................................................................................................... 18 Exercise 1 – Initializing the component‟s inputs and outputs .............................................................................. 19 Task 1 – Initializing the component assembly environment ............................................................................ 19 Task 2 – Initialize the component .................................................................................................................... 19 Exercise 2 – Providing design time validation ..................................................................................................... 22 Task 1 – Validate the component .................................................................................................................... 22 Task 2 – Testing design time validation .......................................................................................................... 24 Exercise 3 – Caching column indices in PreExecute .......................................................................................... 28 Task 1 – Populate the index of input and output columns in the buffer .......................................................... 28 Task 2 – Caching the MD5 crypto service provider ........................................................................................ 29 Exercise 4 – Producing a hash result on the output column ............................................................................... 31 Task 1 – Create the ProcessInput method ...................................................................................................... 31 Exercise 5 - Testing the component .................................................................................................................... 33 CONCLUSION ......................................................................................................................................................... 35

Page iii

Lab 1: Creating a no-op component
Instructor notes: This lab will lead attendees through creating the bare bones data flow component and how they can debug it. The code will be written in C#. However, any language that can produce a .Net assembly can be used: VB.Net, Delphi, etc.

Lab Objective
Estimated time to complete this lab: 40 minutes The objective of this lab is to write the simplest transformation possible: a no-op transform that allows you to see the data as it flows through the transform. The exercises in this lab are: Writing the no-op data flow transformation component Building and registering the component Using the component in a data flow task Debugging the execution of the component

Page 1

Exercise 1 – Writing the no-op data flow transformation component
Instructor notes: Some students might have a question about why the namespaces of integration services has DTS in the name. This is because the name change happened too late into the development cycle for us to go back and update the object model.

In this exercise, you will create a new C# based class library which will be the implementation of the data flow component. Task 1 – Create a new C# Class Library project Create a new Visual C# Class Library Project inside Visual Studio 2005 called „SeeBuffer‟. Put the directory in the location shown so that it can be cleaned up automatically after you‟re done with this lab.

Task 2 – Specify the default component settings 1. Add reference to the „Microsoft.SqlServer.PipelineHost‟ assembly which provides the base implementation of a pipeline component. 2. Add reference to the „Microsoft.SqlServer.DTSPipelineWrap‟ assembly which provides an interop wrapper around the pipeline object model.

Page 2

3. Declare the use of the „Microsoft.SqlServer.Dts.Pipeline‟ namespace, change the namespace of your class to CustomComponents, and change the name of the class to SeeBufferComponent:
using using using using System; System.Collections.Generic; System.Text; Microsoft.SqlServer.Dts.Pipeline;

namespace CustomComponents { public class SeeBufferComponent { } }

Derive the SeeBufferComponent class from the default base component implementation, PipelineComponent:
using using using using System; System.Collections.Generic; System.Text; Microsoft.SqlServer.Dts.Pipeline;

namespace CustomComponents { public class SeeBufferComponent : PipelineComponent { } }

Add the DtsPipelineComponent attribute to your component class. This will allow you to also set the name of the component that will show up in the toolbox, as well as describe whether this is a Source, Transformation or Destination component.
Page 3

namespace CustomComponents { [DtsPipelineComponent(DisplayName = "SeeBuffer", ComponentType = ComponentType.Transform)] public class SeeBufferComponent : PipelineComponent { } } Note: DtsPipelineComponent has other properties to specify description, icon, editor, etc.

Override the base ProcessInput implementation. ProcessInput is the method where a buffer is made available to the transform as it‟s produced from upstream components.
public class SeeBufferComponent : PipelineComponent { public override void ProcessInput(int inputID, PipelineBuffer buffer) { int numberOfRows = buffer.RowCount; bool eof = buffer.EndOfRowset; } } Note: The ‘numberOfRows’ and ‘eof’ variables are created for debugging only.

That’s it! You’ve now created the simplest transform. Lets use it in the designer.

Page 4

Exercise 2 – Building and registering the component
Task 1 – Signing and building the component 1. Edit the properties of the project, and switch to the „Signing‟ tab:

2. Check the „Sign the assembly‟ option and click „<New…>‟ to create a new key file:

3. Select „SeeBuffer‟ as the name of the key file. For this exercise, do not protect the key file with a password:

Page 5

4. Build the project. After building, you‟ll have a „SeeBuffer.dll‟ file in the bin\Debug directory of your project. If you used the suggested folder for creating the project, the project is in the „C:\SSIS Hands-OnLab\Scratch\SeeBuffer\SeeBuffer\‟ folder. Task 2 – Registering the component.
1.

Copy SeeBuffer.DLL that was created by the build process to %ProgramFiles%\Microsoft SQL Server\90\DTS\PipelineComponents. This will enable the SSIS component enumeration to discover the component you‟ve just created. Use GACUtil.EXE to install the assembly into the GAC using the following command:
gacutil /i "<location of the project>\bin\Debug\SeeBuffer.dll"

2.

This command will enable the assembly to be used by the designer and during execution.
Note: Subsequent labs will invoke a batch file that registers the component after a successful build.

Congratulations! You‟ve now registered your first component. Let us use this component in a package next.

Page 6

Exercise 3 – Using the component in a package
In this exercise, we‟ll register the component in the designer toolbox and use it in a simple package. Task 1 – Registering the component for use in the designer 1. Start the Business Intelligence Development Studio. 2. Create a new „Integration Services Project‟ project, again under the „Scratch‟ directory:

3. Right click on the toolbox and select „Choose Items…‟

Page 7

At this point, the designer will enumerate all tasks, connections, and data flow components registered for use on this machine. 4. In the „SSIS Data Flow Items‟ tab, check the „SeeBuffer‟ list item. This will add the „SeeBuffer‟ transform to your toolbox.

Task 2 – Using the component in a data flow 1. Add a „Data Flow Task‟ to your package from the toolbox by either double clicking the task on the toolbox or dragging and dropping it on the Control Flow diagramming surface:

2. Double click the Data Flow Task to switch to its editor. 3. Add an „OLE DB Source‟ adapter to the data flow task.
Page 8

4. Edit the OLE DB Source adapter, click „New‟ to create a new connection for the package. 5. In the dialog, you‟ll see a list of two or more most recently used connections that have been defined for you already. Use „dbil405.AdventureWork‟ as the source connection.

6. In the „Name of the table or the view‟ drop down, select „[Person].[Address]‟ as the source table. For this exercise, it doesn‟t matter what source table is selected.

7. Click „OK‟ to commit your edits to the OLE DB Source component.

Page 9

8. Next, find the „SeeBuffer‟ transform on the toolbox. Since it was declared to be a transformation and not a source or a destination, it would show up under the „Data Flow Transformations‟ category of the toolbox.

9. Drag and drop the „SeeBuffer‟ transform on the data flow surface. 10. Click and drag the green arrow out of the OLE DB Source component and drop it on the SeeBuffer component to configure a path of data flow between them.

11. Since the transform doesn‟t do any operations, it is not necessary to edit it. You can, however, edit the transform and notice that a default editor is provided for you automatically:

Page 10

Note: You, of course, have the ability to define your custom transformation that hides most of the object model from your users.

12. Whenever you‟re done with the SeeBuffer editor, click Cancel to get back to the data flow task editor. 13. To make sure the transform run, add a Union All after the See Buffer so that your data flow looks like this:

Page 11

Task 3 – Executing the test package 1. Right click on the path between the „SeeBuffer‟ and the „UnionAll‟ and select „Data Viewers‟. 2. Click „Add‟ to create a new Data Viewer. 3. Select the default „Grid‟ viewer, and click „OK‟. 4. Click OK to get out of the Path editor and back to the data flow task again. Your data flow will now look like this:

5. To execute the package, hit „F5‟. When Business Intelligence Development Studio switches to debug mode, you will get an extra tool window which contains the data viewer. Rearrange that as appropriate. Your environment will look like this:

Here, you can see the data coming from the Person.Address table passing through the SeeBuffer component without any operations.
Page 12

6. Once done with the debugging, you can stop the debugger to stop execution. 7. Close the Business Intelligence Development Studio.

Page 13

Exercise 4 – Debugging the component code
To understand how to debug the component, you need to first understand how the designer executes the package. The Business Intelligence Development Studio (BIDS) is basically Visual Studio, or devenv.exe. When the SSIS debug engine starts, devenv starts a process called DtsDebugHost.EXE in which to execute the package. This isolates the designer (devenv) from side effects of execution – memory corruptions, incorrect cleanup, side effects of debugging values, etc. In order to keep the debugger operational during debugging, devenev.exe actually starts another dummy copy of DtsDebugHost.EXE and connects to that which receiving events from the executing DtsDebugHost. This dummy process could have been anything – including notepad – but we selected dtsdebughost.exe since we understand and can easily control that process. This means that if you want to debug the design time functionality of the component (none of which you‟ve actually created in your package as yet), you‟ll have to attach your debugger to the devenv.exe process that‟s hosting the Integration Services project. However, in order to debug execution time behavior, you‟ll have to attach to DtsDebugHost.EXE if the package is started from devenv. As outlined above, there will be two DtsDebugHost.EXE available, we‟ll have to pick the right one because the other one is boring. The easiest way to identify the right DtsDebugHost.eXE would be to check what types of debug engines have been loaded in the process. You‟re looking for the one that has Managed and native debug engines, process with PID 3216 below.

Another way to execute the process is by using dtexec.exe. This command line utility executes the packages outside the designer. Since debugging execution time using dtexec is more straightforward, that‟s what we‟ll use for this lab.

Page 14

Task 1 – Debugging the component through DTExec 1. Open the existing project, called „SeeBuffer‟ from „C:\SSIS Hands-On-Lab\Scratch\Lab1 Completed\SeeBuffer\SeeBuffer.csproj‟ 2. Open the „SeeBuffer.cs‟ file in the project and review the code – it should look almost identical to what you created in the previous exercises. 3. Edit the Properties of this project, and switch to the „Build Events‟ tab to review its settings:

The „Build Events‟ tab specifies that a batch file will be run to register the components so that we don‟t have to worry about that. However, having done this in the previous exercises, you‟re now aware of what the batch file contains and why it is necessary. Review the batch file if you wish, it is in the same directory as the project. 4. Switch to the „Debug‟ tab and review and settings there which instruct Visual Studio to start dtexec.exe with a pre defined package which includes the project to start debugging:

Page 15

5. Lets get debugging. Put a breakpoint on the first line of the ProcessInput implementation:

6. Start execution. 7. Dtexec.exe will popup, write a lot of validation messages, and then cause the debugger to stop at the breakpoint. 8. Inspect the various buffer properties. It seems this buffer has 16 rows.

9. Hit F5 again to complete processing this buffer. The debugger will then stop again, this time with a buffer.RowCount of 0 and buffer.EndOfRowset of True. This is the indication to your transformation that there is no more data coming from upstream components.
Page 16

Note: if you’re writing a transform such as a Sort or and Aggregate, you need to know when you have encountered all of the data… EndOfRowset is exactly the property to use for that.

10. Close all instances of Business Intelligence Development Studio and Visual Studio.

Congratulations, you have built your first transform, registered it in the designer, used it in a data flow, reviewed the default editor your transform gets automatically, watched the debugging experience in the designer, and finally have debugged through the component at execution time!

Page 17

Lab 2: Creating a Hash Transformation
Lab Objective
Estimated time to complete this lab: 50 minutes The objective of this lab is to write a more functional transform that references data available from an upstream component and produces new data on the output. The transform will also perform some validation so that you can get an understanding of what‟s involved there.

In order to demonstrate this functionality, the transform will implement the ability to create a hash value of a Unicode string.

The exercises of this lab are:

Initializing the component Providing design time validation Caching column indices in PreExecute Producing a hash result on the output column Testing the component

Page 18

Exercise 1 – Initializing the component’s inputs and outputs
The component object model provides a method, ProvideComponentProperties, for allowing component authors to initialize the component when it‟s first added to the data flow. This function is useful for components to set up the initial number of inputs/outputs. For example, the Merge Join transform uses this to create two inputs and one output. This method can also be called to reset a component back to its initial state, which is useful if users edit the persisted XML (which itself is not supported) and get it wrong.

Task 1 – Initializing the component assembly environment These steps are similar to what we did in the first exercise of the first lab. Instead of repeating the steps of Exercise #1, a starting point is available to you. 1. Open the CreateHashTransform.csproj project in “C:\SSIS Hands-OnLab\Scratch\CreateHashTransform”. 2. First review the list of references. You‟ll see a new reference there, to Microsoft.SqlServer.ManagedDTS, the managed runtime object model; and to Microsoft.SqlServer.DTSRuntimeWrap, the interop for the native runtime object model. 3. In addition, you‟ll also notice the following declarations of using new namespaces in CreateHashTransform.cs:
using Microsoft.SqlServer.Dts.Runtime.Wrapper; using System.Security.Cryptography;

Both of these will be used by the component code. 4. The use of DtsPipelineComponent attribute and deriving from PipelineComponent was outlined in Exercise 1 and is already done for you. 5. In addition, a successful build automatically registers the component for you. Task 2 – Initialize the component Finally, we can work on the component code itself. In this task, we will implement the code called when this component is first added to the data flow, or when the component is asked to reset itself. 1. Override the „ProvideComponentProperties‟ method. Comments are added for clarification and do not, of course, need to be typed in verbatim.
public class CreateHashTransform : PipelineComponent { /// <summary> /// Called to initialize/reset the component /// </summary> public override void ProvideComponentProperties() Page 19

{ } }

2. Call the base implementation:
/// <summary> /// Called to initialize/reset the component /// </summary> public override void ProvideComponentProperties() { base.ProvideComponentProperties(); }

3. Now, since this method can also be called to reset the component, delete any existing metadata (inputs, outputs, input columns, etc.) from it by calling another method in the base class:
/// <summary> /// Called to initialize/reset the component /// </summary> public override void ProvideComponentProperties() { base.ProvideComponentProperties(); // remove any existing metadata base.RemoveAllInputsOutputsAndCustomProperties();

4. Next, create the input and output. This transform will have only one input and one output. Note that this is where you can name the input and output appropriately.
/// <summary> /// Called to initialize/reset the component /// </summary> public override void ProvideComponentProperties() { base.ProvideComponentProperties(); // remove any existing metadata base.RemoveAllInputsOutputsAndCustomProperties(); // create the input and the output IDTSInput90 input = this.ComponentMetaData.InputCollection.New(); input.Name = "Input"; IDTSOutput90 output = this.ComponentMetaData.OutputCollection.New(); output.Name = "Output";

5. Next, we‟ll tie the output to the input. In other words the buffer that comes in to the input will be the same one that goes to the output, we‟ll just add our data to a separate column in the buffer:
// create the input and the output IDTSInput90 input = this.ComponentMetaData.InputCollection.New(); Page 20

input.Name = "Input"; IDTSOutput90 output = this.ComponentMetaData.OutputCollection.New(); output.Name = "Output"; // the output is synchronous with the input output.SynchronousInputID = input.ID;

6. Finally, lets create the output column that will contain the result of the transform, the hash value:
// the output is synchronous with the input output.SynchronousInputID = input.ID; // create our hash value output column IDTSOutputColumn90 outputColumn = output.OutputColumnCollection.New(); outputColumn.Name = "Hash value"; outputColumn.SetDataTypeProperties(Microsoft.SqlServer.Dts.Runtime.Wrapper.Da taType.DT_WSTR, 24, 0, 0, 0);

The highlighted lines of code create a new output column with the name “Hash Value”, and a data type of DT_WSTR (Unicode string) of size 24 characters. That‟s it. We‟ve now created the metadata of the component.

Page 21

Exercise 2 – Providing design time validation
As you can imagine, components can provide fairly unique functionality. In order to help guide the user along during the design time and also to highlight unintentional errors during both design time and during execution, the pipeline object model provides the ability for components to validate their state and report this status to the designer or any other pipeline object model client. In this exercise, we will add rudimentary validation to the component. Note that the Validate method that is discussed below is called during both design time and during execution. Task 1 – Validate the component 1. Override the „Validate‟ method that‟s provided for you by the base class.
/// <summary> /// only one column can be supported at this time /// </summary> /// <returns></returns> public override DTSValidationStatus Validate() { return base.Validate(); }

You‟ll notice that the DTSValidationStatus is an enumeration that has the following potential values: DTSValidationStatus
VS_ISVALID VS_ISBROKEN VS_NEEDSNEWMETADATA VS_ISCORRUPT

Description
The component is correctly configured and ready for execution. The component is incorrectly configured; typically, this indicates that a property is set incorrectly. The component's metadata is outdated or corrupt, and a call to component’s ReinitializeMetaData() will repair the component. The component is irreparably damaged and must be completely reset. The designer calls the component's ProvideComponentProperties().

2. If the base implementation of Validate says the component is valid, we‟ll perform our own validation. The simplest validation for us is to make sure we have only one input, one output, and the single output has only one column. Otherwise, we report an error using the „FireError‟ method. The only way we could get into this state is if the user modified the XML. We‟ll therefore use the VS_ISCORRUPT validation status flag notifying the caller of this.
DTSValidationStatus status = base.Validate(); if (status == DTSValidationStatus.VS_ISVALID) { // input and output are created by us, they should be exactly what we wanted unless someone

Page 22

// played around with the XML. if ((ComponentMetaData.InputCollection.Count != 1) || (ComponentMetaData.OutputCollection.Count != 1) || (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1)) { bool cancel = false; this.ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } } return status; }

3. In this version of the transform, only a single input column can be selected by the user.
// input and output are created by us, they should be exactly what we wanted // unless someone played around with the XML. if ((ComponentMetaData.InputCollection.Count != 1) || (ComponentMetaData.OutputCollection.Count != 1) || (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1)) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, "Invalid metadata", string.Empty, 0, out cancel); status = DTSValidationStatus.VS_ISCORRUPT; } else { // only one column can be selected if (ComponentMetaData.InputCollection[0].InputColumnCollection.Count != 1) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, "Only one input column is currently supported", string.Empty, 0, out cancel); return DTSValidationStatus.VS_ISBROKEN; } } return status;

4. Finally, the data type of the input column must say it is a Unicode string since that‟s all we currently support:
// only one column can be selected if (ComponentMetaData.InputCollection[0].InputColumnCollection.Count != 1) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, "Only one input column is currently supported", string.Empty, 0, out cancel); return DTSValidationStatus.VS_ISBROKEN; } else Page 23

{ // the data type of the input column must be DT_WSTR because that's all we support right now if (ComponentMetaData.InputCollection[0].InputColumnCollection[0].DataType != DataType.DT_WSTR) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, "Input column data type must be DT_WSTR", string.Empty, 0, out cancel); return DTSValidationStatus.VS_ISBROKEN; } }

That‟s it for some simple validation. Lets test this design time behavior out in the designer. Task 2 – Testing design time validation 1. Build the component which will register it as well. 2. Start the Business Intelligence Development Studio and add the „CreateHash‟ transform to the toolbox. The steps to follow here are similar to what we went through to get „SeeBuffer‟ registered. Of course the transform to add to the toolbox is now called „CreateHash‟ instead of „SeeBuffer‟. 3. Create a new package that contains an OLE DB source to any table that contains a unicode string column. While it is unlikely users will want to create a hash value on a string that denotes the address of a person, that‟s what we‟ll use for this experiment. Such data exists in the [Person].[Address] table of AdventureWorks:

4. Now add a new „Create Hash transform‟. Notice the red X icon signifying a validation error. Click on the transform to get the validation error in a tooltip, or use the Errors List window. In this case, we‟re getting a validation error saying only one column can be selected in the input:

Page 24

5. Edit the Create Hash Transform, switch to the Input Columns tab, and select a column which is not a Unicode string. In this example, we‟ve selected „rowguid‟:

6. Click OK and see the new validation error complaining about the data type of the selected column:

Page 25

7. Finally, edit the transform again, unselect „rowguid‟, and select the „AddressLine1‟ column:

8. Click „OK‟ and notice that the „Create Hash Transform‟ now no longer has the red X icon identifying the error. It is now valid!

Page 26

9. Close the Business Intelligence Development Studio whenever you‟re done with this exercise.

Page 27

Exercise 3 – Caching column indices in PreExecute
Now we‟re ready to work on the execution time behavior of this component. As we saw in Lab1, ProcessInput is called once per buffer. PreExecute is a method called before ProcessInput to allow a component to cache whatever information it needs to cache and get ready for a call to ProcessInput (or other per-buffer calls depending on the type of component). If you‟ve written code to access rowset fields by names in the past, you‟ve probably optimized the loop over all the rows by first finding what column index contains a column for a specific name. Since this is done outside the loop, and the loop indexes into the fields collection by an integer rather than the column name, execution time improves. We‟ll follow a similar pattern here and get the column index of both the input column and the output column so that the per-buffer call is faster. Task 1 – Populate the index of input and output columns in the buffer 1. Create a couple of variables at the class level that will hold this information:
public class CreateHashTransform : PipelineComponent { /// <summary> /// buffer indices for processing /// </summary> private int inputColumnBufferIndex = -1; private int outputColumnBufferIndex = -1;

2. Now, override the PreExecute method:
/// <summary> /// called before executing, remembers the buffer column index /// </summary> public override void PreExecute() { base.PreExecute(); }

3. Lets get the index of the input column:
/// <summary> /// called before executing, remembers the buffer column index /// </summary> public override void PreExecute() { base.PreExecute(); // get the location of the input column in the buffer IDTSInput90 input = ComponentMetaData.InputCollection[0]; inputColumnBufferIndex = BufferManager.FindColumnByLineageID(input.Buffer, input.InputColumnCollection[0].LineageID); }

Page 28

Several concepts need to be highlighted here: a. The BufferManager manages the layout of buffer in the pipeline engine and is the class which can tell us where a column exists in a buffer. b. As you know, several components can reference the same column in a data flow. For example, we‟re using the AddressLine1 column to create a hash. A subsequent destination might use the AddressLine1 to map to a destination column; and another transform in the middle might be responsible for parsing the contents of the AddressLine1 column. However, all of them are using the same column! The pipeline keeps track of this fact by assigning a unique LineageID to the AddressLine1 column when it‟s introduced by the OLE DB Source into the data flow. From that point onwards all input columns that reference this column refer to it by its LineageID. c. Finally, during runtime, the Buffer is available as a property of the Input. 4. Now we do the same for the output column. Note that since the output of this transform is synchronous with the input (no new buffer is created), the output column will be in the input‟s buffer.
/// <summary> /// called before executing, remembers the buffer column index /// </summary> public override void PreExecute() { base.PreExecute(); // get the location of the input column in the buffer IDTSInput90 input = ComponentMetaData.InputCollection[0]; inputColumnBufferIndex = BufferManager.FindColumnByLineageID(input.Buffer, input.InputColumnCollection[0].LineageID); // get the location of the output column in the buffer IDTSOutput90 output = ComponentMetaData.OutputCollection[0]; outputColumnBufferIndex = BufferManager.FindColumnByLineageID(input.Buffer, output.OutputColumnCollection[0].LineageID); }

Task 2 – Caching the MD5 crypto service provider We‟ll be using this class for performing the hash function. So, add a variable at the class and initialize it in PreExecute: 1. Add the variable to the class:
public class CreateHashTransform : PipelineComponent { /// <summary> /// our hash code provider /// </summary>

Page 29

private MD5 md5 = null;

2. Add the code to initialize this variable in PreExecute:
public override void PreExecute() { [...] // initialize our hash provider md5 = new MD5CryptoServiceProvider(); }

3. Finally, lets clear it out as part of a new PostExecute:
public override void PostExecute() { base.PostExecute(); md5 = null; }

Page 30

Exercise 4 – Producing a hash result on the output column
Finally, we can now create the hash value itself. Task 1 – Create the ProcessInput method 1. Override the ProcessInput method from the base:
/// <summary> /// called to do the actual processing, computes the has value and saves it /// in the output column /// </summary> /// <param name="inputID"></param> /// <param name="buffer"></param> public override void ProcessInput(int inputID, PipelineBuffer buffer) { }

2. Only do work if this buffer has some data in it:
public override void ProcessInput(int inputID, PipelineBuffer buffer) { if (buffer.EndOfRowset == false) { } }

3. Some basic error traps to report any errors encountered during execution:
public override void ProcessInput(int inputID, PipelineBuffer buffer) { if (buffer.EndOfRowset == false) { try { } catch (System.Exception ex) { bool cancel = false; ComponentMetaData.FireError(0, ComponentMetaData.Name, ex.Message, string.Empty, 0, out cancel); throw new Exception("Could not process input buffer."); } } }

4. Now, loop over all the rows in the buffer provided:
try { while (buffer.NextRow()) { } }

5. If the input column is null, the hash value will also be null.

Page 31

while (buffer.NextRow()) { // hash value for NULLs is NULL. if (buffer.IsNull(inputColumnBufferIndex)) { buffer.SetNull(outputColumnBufferIndex); } }

As you can see, the buffer provides methods for checking for NULL values and for setting NULL values for output columns. 6. If the input value is not null, we calculate a hash using the MD5 crypto service provider
// hash value for NULLs is NULL. if (buffer.IsNull(inputColumnBufferIndex)) { buffer.SetNull(outputColumnBufferIndex); } else { // get the raw contents of the buffer string inputValue = buffer.GetString(inputColumnBufferIndex); byte[] columnContents = UnicodeEncoding.Unicode.GetBytes(inputValue); // now compute the hash byte[] hash = md5.ComputeHash(columnContents); // convert to strings to save in the output string hashString = Convert.ToBase64String(hash, Base64FormattingOptions.None); // include newlines // set the output column value buffer.SetString(outputColumnBufferIndex, hashString); }

That‟s it!

Page 32

Exercise 5 - Testing the component
Your project is ready to execute with a sample package that looks like this:

1. Put a breakpoint at the beginning of PreExecute, and another at the beginning of ProcessInput.

2. Press F5 to debug the Create Hash Transform. You‟ll see dtexec.exe pop up, and then the PreExecute breakpoint will be hit:

Page 33

3. Hit F5 again and the breakpoint in ProcessInput will be hit. Step through the function to look at the code execution. You can also open the package up in the designer, add a visualizer, and see the results of the execution. Here‟s the configuration of the visualizer to show only the „AddressLine1‟ and „Hash value‟ columns:

And, finally, here‟s the result of the visual debugging of the data flow itself:

Page 34

Conclusion
At this point, you are well versed in:  Building and registering transformation components for use in the designer.  Writing transforms with synchronous outputs  Providing users validation messages during both design time and execution time  Referencing data available on input columns  Writing the results of transformation to the pipeline buffers  Debugging both the design time and execution time behavior of the component To build on this momentum, we suggest the following potential next steps:  Go through the samples that are installed with SQL Server 2005, which includes not just transformations with synchronous outputs but also Sources and transformations with asynchronous outputs. In addition, source code is available in both C# and VB.  Visit http://www.sqlis.com for excellent materials presented by our MVPs and the community on how to build component and the component UIs, in addition to several more samples.  Interact with the product team and the community on the SSIS MSDN forum at http://forums.microsoft.com/msdn/ShowForum.aspx?ForumID=80  Visit the SSIS MSDN development center for more information on webcasts, training, bloggers, books, etc. at http://msdn.microsoft.com/SQL/sqlwarehouse/SSIS/default.aspx Finally, we‟d like to thank you for your enthusiasm of our product and your support! Do let us know how this material could be improved for future consumptions by you peers or what other information would be useful to you. Sincerely, SQL Server Integration Services Product Team

Page 35


				
DOCUMENT INFO
Shared By:
Categories:
Stats:
views:3181
posted:11/12/2009
language:English
pages:38