Lab Manual
Creating a custom transformation
component in SQL Server Integration
Services
Information in this document is subject to change without notice. The example companies, organizations, products,
people, and events depicted herein are fictitious. No association with any real company, organization, product, person or
event is intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user.
Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a
retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or
otherwise), or for any purpose, without the express written permission of Microsoft Corporation.
Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering
subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the
furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual
property.
Microsoft, MS-DOS, MS, Windows, Windows NT, MSDN, Active Directory, BizTalk, SQL Server, SharePoint, Outlook,
PowerPoint, FrontPage, Visual Basic, Visual C++, Visual J++, Visual InterDev, Visual SourceSafe, Visual C#, Visual J#,
and Visual Studio are either registered trademarks or trademarks of Microsoft Corporation in the U.S.A. and/or other
countries.
Other product and company names herein may be the trademarks of their respective owners.
Lab 1: Creating a no-op component
Instructor notes: This lab leads attendees through creating a bare-bones data flow component and
shows how they can debug it. The code will be written in C#. However, any language that can produce a .NET
assembly can be used: VB.NET, Delphi, etc.
Lab Objective
Estimated time to complete this lab: 40 minutes
The objective of this lab is to write the simplest transformation possible: a no-op transform that allows
you to see the data as it flows through the transform.
Exercise 1 – Writing the no-op data flow transformation component
Instructor notes: Some students might ask why the Integration Services namespaces have DTS in
the name. This is because the name change happened too late in the development cycle for us to go
back and update the object model.
In this exercise, you will create a new C# class library that will contain the implementation of the
data flow component.
3. Declare the use of the ‘Microsoft.SqlServer.Dts.Pipeline’ namespace, change the namespace of
your class to CustomComponents, and change the name of the class to SeeBufferComponent:
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Dts.Pipeline;
namespace CustomComponents
{
public class SeeBufferComponent
{
}
}
Derive the SeeBufferComponent class from the default base component implementation,
PipelineComponent:
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SqlServer.Dts.Pipeline;
namespace CustomComponents
{
public class SeeBufferComponent : PipelineComponent
{
}
}
Add the DtsPipelineComponent attribute to your component class. This lets you set the name under
which the component will show up in the toolbox, as well as declare whether this is a Source,
Transformation, or Destination component.
namespace CustomComponents
{
[DtsPipelineComponent(DisplayName = "SeeBuffer",
ComponentType = ComponentType.Transform)]
public class SeeBufferComponent : PipelineComponent
{
}
}
Note: DtsPipelineComponent has other properties to specify description, icon,
editor, etc.
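For example, a version of the attribute that also supplies a description might look like the following sketch (see the DtsPipelineComponentAttribute documentation for the full property list):

[DtsPipelineComponent(DisplayName = "SeeBuffer",
    Description = "Pass-through transform for observing buffers",
    ComponentType = ComponentType.Transform)]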
Override the base ProcessInput implementation. ProcessInput is the method where a buffer is made
available to the transform as it’s produced from upstream components.
public class SeeBufferComponent : PipelineComponent
{
public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
int numberOfRows = buffer.RowCount;
bool eof = buffer.EndOfRowset;
}
}
Note: The ‘numberOfRows’ and ‘eof’ variables are created for debugging only.
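If you’d like to look at the data itself while stepping through, you could also advance through the buffer row by row. A minimal sketch, purely for setting breakpoints (this component doesn’t read any columns):

public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
    int rowsSeen = 0;
    while (buffer.NextRow())
    {
        rowsSeen++;   // set a breakpoint here to stop on every row
    }
}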
That’s it! You’ve now created the simplest transform. Let’s use it in the designer.
Exercise 2 – Building and registering the component
2. Check the ‘Sign the assembly’ option and click ‘<New…>’ to create a new key file:
3. Select ‘SeeBuffer’ as the name of the key file. For this exercise, do not protect the key file with a
password:
4. Build the project.
After building, you’ll have a ‘SeeBuffer.dll’ file in the bin\Debug directory of your project. If you used
the suggested folder for creating the project, the project is in the
‘C:\SSIS Hands-On-Lab\Scratch\SeeBuffer\SeeBuffer\’ folder.
1. Copy SeeBuffer.DLL that was created by the build process to %ProgramFiles%\Microsoft SQL
Server\90\DTS\PipelineComponents. This will enable the SSIS component enumeration to discover
the component you’ve just created.
2. Use GACUtil.EXE to install the assembly into the GAC using the following command:
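A typical invocation, run from the directory containing the freshly built assembly (the path to gacutil.exe depends on your SDK installation), is:

gacutil /i SeeBuffer.dll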
This command will enable the assembly to be used by the designer and during execution.
Note: Subsequent labs will invoke a batch file that registers the component after a
successful build.
Congratulations! You’ve now registered your first component. Let’s use this component in a package
next.
Exercise 3 – Using the component in a package
In this exercise, we’ll register the component in the designer toolbox and use it in a simple package.
At this point, the designer will enumerate all tasks, connections, and data flow components
registered for use on this machine.
4. In the ‘SSIS Data Flow Items’ tab, check the ‘SeeBuffer’ list item. This will add the ‘SeeBuffer’
transform to your toolbox.
1. Add a ‘Data Flow Task’ to your package from the toolbox by either double clicking the task on
the toolbox or dragging and dropping it on the Control Flow diagramming surface:
4. Edit the OLE DB Source adapter and click ‘New’ to create a new connection for the package.
5. In the dialog, you’ll see a list of the most recently used connections, which have been
defined for you already. Use ‘dbil405.AdventureWorks’ as the source connection.
6. In the ‘Name of the table or the view’ drop down, select ‘[Person].[Address]’ as the source table.
For this exercise, it doesn’t matter what source table is selected.
8. Next, find the ‘SeeBuffer’ transform in the toolbox. Since it was declared to be a transformation
and not a source or a destination, it shows up under the ‘Data Flow Transformations’
category of the toolbox.
9. Drag and drop the ‘SeeBuffer’ transform on the data flow surface.
10. Click and drag the green arrow out of the OLE DB Source component and drop it on the
SeeBuffer component to configure a path of data flow between them.
11. Since the transform doesn’t do any operations, it is not necessary to edit it. You can, however,
edit the transform and notice that a default editor is provided for you automatically:
Note: You, of course, have the ability to provide a custom editor for your transformation that
hides most of the object model from your users.
12. When you’re done with the SeeBuffer editor, click Cancel to get back to the data flow task
editor.
13. To make sure the transform runs, add a Union All after the SeeBuffer so that your data flow
looks like this:
Task 3 – Executing the test package
1. Right click on the path between the ‘SeeBuffer’ and the ‘UnionAll’ and select ‘Data Viewers’.
4. Click OK to get out of the Path editor and back to the data flow task again. Your data flow will
now look like this:
5. To execute the package, hit ‘F5’. When Business Intelligence Development Studio switches to
debug mode, you will get an extra tool window containing the data viewer. Rearrange it
as appropriate. Your environment will look like this:
Here, you can see the data coming from the Person.Address table passing through the SeeBuffer
component without any operations.
6. Once done with the debugging, you can stop the debugger to stop execution.
Exercise 4 – Debugging the component code
To understand how to debug the component, you need to first understand how the designer executes
the package.
The Business Intelligence Development Studio (BIDS) is basically Visual Studio, or devenv.exe. When
the SSIS debug engine starts, devenv starts a process called DtsDebugHost.EXE in which to execute
the package. This isolates the designer (devenv) from side effects of execution – memory corruptions,
incorrect cleanup, side effects of debugging values, etc. In order to keep the debugger operational
during debugging, devenv.exe actually starts another dummy copy of DtsDebugHost.EXE and
connects to it while receiving events from the executing DtsDebugHost. This dummy process could
have been anything – including Notepad – but we selected DtsDebugHost.EXE since we understand and
can easily control that process.
This means that if you want to debug the design time functionality of the component (none of which
you’ve actually created in your package as yet), you’ll have to attach your debugger to the devenv.exe
process that’s hosting the Integration Services project.
However, in order to debug execution-time behavior, you’ll have to attach to DtsDebugHost.EXE if the
package is started from devenv. As outlined above, there will be two DtsDebugHost.EXE processes
running, and we’ll have to pick the right one, because the dummy process does nothing interesting. The
easiest way to identify the right DtsDebugHost.EXE is to check which debug engines have been loaded
in each process. You’re looking for the one that has both managed and native debug engines, the process with PID 3216 below.
Another way to execute the package is by using dtexec.exe. This command line utility executes
packages outside the designer. Since debugging execution time using dtexec is more straightforward,
that’s what we’ll use for this lab.
Task 1 – Debugging the component through DTExec
2. Open the ‘SeeBuffer.cs’ file in the project and review the code – it should look almost identical
to what you created in the previous exercises.
3. Edit the Properties of this project, and switch to the ‘Build Events’ tab to review its settings:
The ‘Build Events’ tab specifies that a batch file will be run to register the components, so that
we don’t have to do it by hand. Having done this in the previous exercises, you’re
now aware of what the batch file contains and why it is necessary. Review the batch file if you
wish; it is in the same directory as the project.
4. Switch to the ‘Debug’ tab and review the settings there, which instruct Visual Studio to start
dtexec.exe with a predefined package that uses this component, to start debugging:
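For reference, the settings resemble the following; the exact paths are illustrative and will match your lab folder:

Start external program: C:\Program Files\Microsoft SQL Server\90\DTS\Binn\dtexec.exe
Command line arguments: /f "<path to the lab’s test package .dtsx>"

The /f switch tells dtexec to load and run the package from the file system.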
5. Let’s get debugging. Put a breakpoint on the first line of the ProcessInput implementation:
6. Start execution.
7. Dtexec.exe will pop up, write a lot of validation messages, and then the debugger will stop
at the breakpoint.
8. Inspect the various buffer properties. In this example, the buffer has 16 rows.
9. Hit F5 again to complete processing this buffer. The debugger will then stop again, this time with
a buffer.RowCount of 0 and buffer.EndOfRowset of True. This is the indication to your
transformation that there is no more data coming from upstream components.
Note: if you’re writing a transform such as a Sort or an Aggregate, you need to
know when you have encountered all of the data… EndOfRowset is exactly the
property to use for that.
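A sketch of how a blocking transform might use this pattern (the accumulation logic is hypothetical):

public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
    while (buffer.NextRow())
    {
        // accumulate this row into internal state
    }

    if (buffer.EndOfRowset)
    {
        // all input rows have been seen; it is now safe to sort or
        // aggregate and hand results downstream
    }
}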
10. Close all instances of Business Intelligence Development Studio and Visual Studio.
Congratulations, you have built your first transform, registered it in the designer, used it in a data flow,
reviewed the default editor your transform gets automatically, watched the debugging experience in the
designer, and finally have debugged through the component at execution time!
Lab 2: Creating a Hash Transformation
Lab Objective
Estimated time to complete this lab: 50 minutes
The objective of this lab is to write a more functional transform that references data available from an
upstream component and produces new data on the output. The transform will also perform some
validation so that you can get an understanding of what’s involved there.
In order to demonstrate this functionality, the transform will implement the ability to create a hash value
of a Unicode string.
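In plain C#, the per-value computation will look roughly like this sketch (base64 is one natural way to render the hash as a string; a 16-byte MD5 hash encodes to exactly 24 characters, which is why the output column will be a DT_WSTR of length 24):

using System;
using System.Security.Cryptography;
using System.Text;

class HashSketch
{
    static void Main()
    {
        // hash the UTF-16 bytes of a sample string, as the transform will
        byte[] bytes = Encoding.Unicode.GetBytes("1970 Napa Ct.");
        byte[] hash = new MD5CryptoServiceProvider().ComputeHash(bytes);
        Console.WriteLine(Convert.ToBase64String(hash));   // 24 characters
    }
}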
Exercise 1 – Initializing the component’s inputs and outputs
The component object model provides a method, ProvideComponentProperties, for allowing
component authors to initialize the component when it’s first added to the data flow. This function is
useful for components to set up the initial number of inputs/outputs. For example, the Merge Join
transform uses this to create two inputs and one output. This method can also be called to reset a
component back to its initial state, which is useful if users edit the persisted XML (which itself is not
supported) and get it wrong.
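For instance, a Merge Join style component would create its two inputs and single output in this method; a quick sketch using the same interfaces as the code below (the names are illustrative):

public override void ProvideComponentProperties()
{
    base.ProvideComponentProperties();

    // a two-input transform sets up both inputs here
    IDTSInput90 left = ComponentMetaData.InputCollection.New();
    left.Name = "Merge Join Left Input";

    IDTSInput90 right = ComponentMetaData.InputCollection.New();
    right.Name = "Merge Join Right Input";

    IDTSOutput90 output = ComponentMetaData.OutputCollection.New();
    output.Name = "Merge Join Output";
}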
These steps are similar to what we did in the first exercise of the first lab. Instead of repeating the steps
of Exercise #1, a starting point is available to you.
2. First review the list of references. You’ll see two new references there:
Microsoft.SqlServer.ManagedDTS, the managed runtime object model, and
Microsoft.SqlServer.DTSRuntimeWrap, the interop assembly for the native runtime object model.
3. In addition, you’ll also notice the following using declarations for new namespaces in
CreateHashTransform.cs:
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using System.Security.Cryptography;
4. The use of DtsPipelineComponent attribute and deriving from PipelineComponent was outlined
in Exercise 1 and is already done for you.
Finally, we can work on the component code itself. In this task, we will implement the code called when
this component is first added to the data flow, or when the component is asked to reset itself.
1. Override the ‘ProvideComponentProperties’ method. Comments are added for clarification and
do not, of course, need to be typed in verbatim.
public override void ProvideComponentProperties()
{
}

2. Add a documentation comment and call the base class implementation:

/// <summary>
/// Called to initialize/reset the component
/// </summary>
public override void ProvideComponentProperties()
{
    base.ProvideComponentProperties();
}
3. Now, since this method can also be called to reset the component, delete any existing metadata
(inputs, outputs, input columns, etc.) from it by calling another method in the base class:

/// <summary>
/// Called to initialize/reset the component
/// </summary>
public override void ProvideComponentProperties()
{
    base.ProvideComponentProperties();

    // remove any existing inputs, outputs, and custom properties
    RemoveAllInputsOutputsAndCustomProperties();
}
4. Next, create the input and the output. This transform will have only one input and one output. Note
that this is where you can name the input and output appropriately.

public override void ProvideComponentProperties()
{
    base.ProvideComponentProperties();
    RemoveAllInputsOutputsAndCustomProperties();

    // create the input and the output
    IDTSInput90 input = this.ComponentMetaData.InputCollection.New();
    input.Name = "Input";

    IDTSOutput90 output = this.ComponentMetaData.OutputCollection.New();
    output.Name = "Output";
}
5. Next, we’ll tie the output to the input. In other words, the buffer that comes into the input will be
the same one that goes to the output; we’ll just add our data to a separate column in the buffer:

// the output is synchronous with the input
output.SynchronousInputID = input.ID;
6. Finally, let’s create the output column that will contain the result of the transform, the hash value:

// create the output column that will hold the hash value
IDTSOutputColumn90 outputColumn = output.OutputColumnCollection.New();
outputColumn.Name = "Hash Value";
outputColumn.SetDataTypeProperties(
    Microsoft.SqlServer.Dts.Runtime.Wrapper.DataType.DT_WSTR, 24, 0, 0, 0);

These lines of code create a new output column named “Hash Value” with a data
type of DT_WSTR (Unicode string) and a length of 24 characters.
Exercise 2 – Providing design time validation
As you can imagine, components can provide fairly unique functionality. In order to help guide the user
along during the design time and also to highlight unintentional errors during both design time and
during execution, the pipeline object model provides the ability for components to validate their state
and report this status to the designer or any other pipeline object model client.
In this exercise, we will add rudimentary validation to the component. Note that the Validate method
that is discussed below is called during both design time and during execution.
/// <summary>
/// only one column can be supported at this time
/// </summary>
/// <returns></returns>
public override DTSValidationStatus Validate()
{
return base.Validate();
}
You’ll notice that DTSValidationStatus is an enumeration with the following potential values:

• VS_ISVALID: The component is correctly configured and ready for execution.
• VS_ISBROKEN: The component is incorrectly configured; typically, this indicates that a property is set incorrectly.
• VS_NEEDSNEWMETADATA: The component’s metadata is outdated or corrupt, and a call to the component’s ReinitializeMetaData() will repair the component.
• VS_ISCORRUPT: The component is irreparably damaged and must be completely reset. The designer calls the component’s ProvideComponentProperties().
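When a component returns VS_NEEDSNEWMETADATA, the designer responds by calling ReinitializeMetaData. A minimal sketch of that pairing (the staleness check is hypothetical):

public override DTSValidationStatus Validate()
{
    bool metadataIsStale = false;   // e.g. a column whose lineage no longer exists
    if (metadataIsStale)
    {
        return DTSValidationStatus.VS_NEEDSNEWMETADATA;
    }
    return base.Validate();
}

public override void ReinitializeMetaData()
{
    // repair the stale metadata here
    base.ReinitializeMetaData();
}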
2. If the base implementation of Validate says the component is valid, we’ll perform our own
validation. The simplest validation for us is to make sure we have only one input, one output,
and that the single output has only one column. Otherwise, we report an error using the ‘FireError’
method. The only way we could get into this state is if the user modified the persisted XML, so
we return the VS_ISCORRUPT validation status to notify the caller.
DTSValidationStatus status = base.Validate();

if (status == DTSValidationStatus.VS_ISVALID)
{
    // input and output are created by us, they should be exactly
    // what we wanted unless someone played around with the XML.
    if ((ComponentMetaData.InputCollection.Count != 1)
        || (ComponentMetaData.OutputCollection.Count != 1)
        || (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1))
    {
        bool cancel = false;
        this.ComponentMetaData.FireError(0, ComponentMetaData.Name,
            "Invalid metadata", string.Empty, 0, out cancel);
        status = DTSValidationStatus.VS_ISCORRUPT;
    }
}

return status;
}
3. In this version of the transform, only a single input column can be selected by the user.
// input and output are created by us, they should be exactly what we wanted
// unless someone played around with the XML.
if ((ComponentMetaData.InputCollection.Count != 1)
    || (ComponentMetaData.OutputCollection.Count != 1)
    || (ComponentMetaData.OutputCollection[0].OutputColumnCollection.Count != 1))
{
    bool cancel = false;
    ComponentMetaData.FireError(0, ComponentMetaData.Name,
        "Invalid metadata", string.Empty, 0, out cancel);
    status = DTSValidationStatus.VS_ISCORRUPT;
}
else
{
    // only one column can be selected
    if (ComponentMetaData.InputCollection[0].InputColumnCollection.Count != 1)
    {
        bool cancel = false;
        ComponentMetaData.FireError(0, ComponentMetaData.Name,
            "Only one input column is currently supported",
            string.Empty, 0, out cancel);
        return DTSValidationStatus.VS_ISBROKEN;
    }
}

return status;
4. Finally, the data type of the input column must be DT_WSTR (Unicode string), since that’s all we
currently support:

else
{
    // the data type of the input column must be DT_WSTR because
    // that's all we support right now
    if (ComponentMetaData.InputCollection[0].InputColumnCollection[0].DataType != DataType.DT_WSTR)
    {
        bool cancel = false;
        ComponentMetaData.FireError(0, ComponentMetaData.Name,
            "Input column data type must be DT_WSTR", string.Empty, 0, out cancel);
        return DTSValidationStatus.VS_ISBROKEN;
    }
}
That’s it for some simple validation. Let’s test this design-time behavior in the designer.
2. Start the Business Intelligence Development Studio and add the ‘CreateHash’ transform to the
toolbox. The steps to follow here are similar to what we went through to get ‘SeeBuffer’
registered. Of course the transform to add to the toolbox is now called ‘CreateHash’ instead of
‘SeeBuffer’.
3. Create a new package that contains an OLE DB Source to any table that contains a Unicode
string column. While it is unlikely users will want to create a hash value of a string that denotes
the address of a person, that’s what we’ll use for this experiment. Such data exists in the
[Person].[Address] table of AdventureWorks:
4. Now add a new ‘Create Hash’ transform. Notice the red X icon signifying a validation error. Click
on the transform to see the validation error in a tooltip, or use the Error List window. In this
case, we’re getting a validation error saying only one column can be selected in the input:
5. Edit the Create Hash Transform, switch to the Input Columns tab, and select a column which is
not a Unicode string. In this example, we’ve selected ‘rowguid’:
6. Click OK and see the new validation error complaining about the data type of the selected
column:
7. Finally, edit the transform again, unselect ‘rowguid’, and select the ‘AddressLine1’ column:
8. Click ‘OK’ and notice that the ‘Create Hash Transform’ now no longer has the red X icon
identifying the error. It is now valid!
9. Close the Business Intelligence Development Studio whenever you’re done with this exercise.
Exercise 3 – Caching column indices in PreExecute
Now we’re ready to work on the execution-time behavior of this component. As we saw in Lab 1,
ProcessInput is called once per buffer. PreExecute is a method called before ProcessInput, allowing a
component to cache whatever information it needs and get ready for the calls to ProcessInput (or
other per-buffer calls, depending on the type of component).
If you’ve written code to access rowset fields by name in the past, you’ve probably optimized the loop
over all the rows by first finding the index of the column with a specific name. Since the lookup is
done outside the loop, and the loop indexes into the fields collection by an integer rather than the
column name, execution time improves.
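The same idea in ADO.NET terms, as a quick illustration (table is a hypothetical System.Data.DataTable):

// slow: resolves the column name on every iteration
foreach (DataRow row in table.Rows)
{
    object slow = row["AddressLine1"];
}

// faster: resolve the ordinal once, then index by integer inside the loop
int ordinal = table.Columns["AddressLine1"].Ordinal;
foreach (DataRow row in table.Rows)
{
    object fast = row[ordinal];
}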
We’ll follow a similar pattern here and get the column index of both the input column and the output
column so that the per-buffer call is faster.
Task 1 – Populate the index of input and output columns in the buffer
1. Create a couple of variables at the class level that will hold this information:

private int inputColumnBufferIndex;
private int outputColumnBufferIndex;

2. Override the PreExecute method:

/// <summary>
/// called before executing, remembers the buffer column index
/// </summary>
public override void PreExecute()
{
    base.PreExecute();
}
3. Use the BufferManager to find the index of the input column in the buffer:

public override void PreExecute()
{
    base.PreExecute();

    // find the input column in the buffer by its LineageID
    IDTSInput90 input = ComponentMetaData.InputCollection[0];
    inputColumnBufferIndex = BufferManager.FindColumnByLineageID(
        input.Buffer, input.InputColumnCollection[0].LineageID);
}
Several concepts need to be highlighted here:
a. The BufferManager manages the layout of buffers in the pipeline engine and is the class
that can tell us where a column exists in a buffer.
b. As you know, several components can reference the same column in a data flow. For
example, we’re using the AddressLine1 column to create a hash. A subsequent destination
might use AddressLine1 to map to a destination column, and another transform in the
middle might be responsible for parsing the contents of the AddressLine1 column. However,
all of them are using the same column! The pipeline keeps track of this fact by assigning a
unique LineageID to the AddressLine1 column when it’s introduced by the OLE DB Source
into the data flow. From that point onwards, all input columns that reference this column
refer to it by its LineageID.
4. Now we do the same for the output column. Note that since the output of this transform is
synchronous with the input (no new buffer is created), the output column will be in the input’s
buffer.
public override void PreExecute()
{
    base.PreExecute();

    IDTSInput90 input = ComponentMetaData.InputCollection[0];
    inputColumnBufferIndex = BufferManager.FindColumnByLineageID(
        input.Buffer, input.InputColumnCollection[0].LineageID);

    // the output column lives in the input's buffer because the
    // output is synchronous with the input
    outputColumnBufferIndex = BufferManager.FindColumnByLineageID(
        input.Buffer,
        ComponentMetaData.OutputCollection[0].OutputColumnCollection[0].LineageID);
}
1. Create a class-level variable that will hold the MD5 crypto service provider:

private MD5 md5 = null;

2. Add the code to initialize this variable in PreExecute:

// create the provider that ProcessInput will use to compute hashes
md5 = new MD5CryptoServiceProvider();
}
Exercise 4 – Producing a hash result on the output column
Finally, we can now create the hash value itself.
3. Add some basic error traps to report any errors encountered during execution:
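The error handling is shown here as a sketch: wrap the body of ProcessInput in a try/catch and surface any exception through FireError (the message text is illustrative). The row loop in the next step would sit inside the try block:

public override void ProcessInput(int inputID, PipelineBuffer buffer)
{
    try
    {
        // per-row processing (next steps) goes here
    }
    catch (Exception ex)
    {
        bool cancel = false;
        ComponentMetaData.FireError(0, ComponentMetaData.Name,
            ex.Message, string.Empty, 0, out cancel);
        throw;
    }
}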
while (buffer.NextRow())
{
// hash value for NULLs is NULL.
if (buffer.IsNull(inputColumnBufferIndex))
{
buffer.SetNull(outputColumnBufferIndex);
}
}
As you can see, the buffer provides methods for checking for NULL values and for setting NULL
values for output columns.
6. If the input value is not null, we calculate a hash using the MD5 crypto service provider and write
it to the output column:

// hash value for NULLs is NULL.
if (buffer.IsNull(inputColumnBufferIndex))
{
    buffer.SetNull(outputColumnBufferIndex);
}
else
{
    // get the raw contents of the buffer
    string inputValue = buffer.GetString(inputColumnBufferIndex);
    byte[] columnContents = UnicodeEncoding.Unicode.GetBytes(inputValue);

    // a 16-byte MD5 hash encodes to a 24-character base64 string,
    // which fits the DT_WSTR(24) output column exactly
    byte[] hash = md5.ComputeHash(columnContents);
    buffer.SetString(outputColumnBufferIndex, Convert.ToBase64String(hash));
}
That’s it!
Exercise 5 - Testing the component
Your project is ready to execute with a sample package that looks like this:
1. Put a breakpoint at the beginning of PreExecute, and another at the beginning of ProcessInput.
2. Press F5 to debug the Create Hash Transform. You’ll see dtexec.exe pop up, and then the
PreExecute breakpoint will be hit:
3. Hit F5 again and the breakpoint in ProcessInput will be hit. Step through the function to look at the
code execution.
You can also open the package up in the designer, add a data viewer, and see the results of the
execution. Here’s the configuration of the data viewer to show only the ‘AddressLine1’ and ‘Hash Value’
columns:
And, finally, here’s the result of the visual debugging of the data flow itself:
Conclusion
At this point, you are well versed in:
• Building and registering transformation components for use in the designer.
• Writing transforms with synchronous outputs.
• Providing users with validation messages during both design time and execution time.
• Referencing data available on input columns.
• Writing the results of the transformation to the pipeline buffers.
• Debugging both the design-time and execution-time behavior of the component.
Finally, we’d like to thank you for your enthusiasm for our product and for your support! Do let us know
how this material could be improved for future consumption by your peers, or what other information
would be useful to you.
Sincerely,