Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
RATE THIS
29
Performing range lookups (i.e. to find a key for a given range) is a common ETL operation in data warehousing
scenarios. It's especially for historical loads and late arriving fact situations, where you're using type 2
dimensions and you need to locate the key which represents the dimension value for a given point in time.
This blog post outlines three separate approaches for doing range lookups in SSIS:
1. Using the Lookup Transform
2. Merge Join + Conditional Split
3. Script Component
All of our scenarios will use the AdventureWorksDW2008 sample database (DimProduct table) as the dimension,
and take its fact data from AdventureWorks2008 (SalesOrderHeader and SalesOrderDetail tables). The
"ProductNumber" column from the SalesOrderDetail table maps to the natural key of the DimProduct dimension
(ProductAlternateKey column). In all cases we want to lookup the key (ProductKey) for the product which was
valid (identified by StartDate and EndDate) for the given OrderDate.
One last thing to note is that the Merge Join and Script Component solutions assume that a valid range exists
for each incoming value. The Lookup Transform approach is the only one that will identify rows that have no
matches (although the Script Component solution could be modified to do so as well).
Lookup Transform
The Lookup Transform was designed to handle 1:1 key matching, but it can also be used in the range lookup
scenario by using a partial cache mode, and tweaking the query on the Advanced Settings page. However, the
Lookup doesn't cache the range itself, and will end up going to the database very often - it will only detect a
match in its cache if all of the parameters are the same (i.e. same product purchased on the same date).
We can use the following query to have the lookup transform perform our range lookup:
select [ProductKey], [ProductAlternateKey],
[StartDate], [EndDate]
from [dbo].[DimProduct]
where [ProductAlternateKey] = ?
and [StartDate] <= ?
and (
[EndDate] is null or
[EndDate] > ?
)
On the query parameters page, we map 0 -> ProductNumber, 1 and 2 -> OrderDate.
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
1/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
This approach is effective and easy to setup, but it is pretty slow when dealing with a large number of rows, as
most lookups will be going to the database.
For example, a row coming in from our source would contain an OrderDate and ProductNumber, like this:
From the DimProduct source, we take three additional columns - ProductKey (what we're after), StartDate and
EndDate. The DimProduct dimension contains three entries for the "LJ-0192-L" product (as its information, like
unit price, has changed over time). After going through the Merge Join, the single row becomes three rows.
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
2/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
We use the Conditional Split to do the range lookup, and take the single row we want. Here is our expression
(remember, in our case an EndDate value of NULL indicates that it's the most current row):
StartDate <= OrderDate && (OrderDate < EndDate || ISNULL(EndDate))
This approach is a little more complicated, but performs a lot better than using the Lookup Transform.
Script Component
The third approach uses a custom script component to perform the lookup. I wrote the script in two ways - one
that simulates a "Full Cache" type lookup, and one that is similar to partial cache except it pulls back all values
for a given natural key, instead of just the one for the given date range. The caching behavior is controlled by
the PreCache boolean package variable.
Conclusion
I ran the three packages using the following environment (my laptop):
Dual core Intel 1.8ghz
3gb of RAM
AdventureWorks2008 and AdventureWorksDW2008
~120,000 order rows (SalesOrderDetail)
~600 reference rows (DimProduct)
Here are the results, in rows per second (larger being better):
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
3/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
At 120k+ rows per second, we can see that the custom script (or better yet, a custom transform) is the best
alternative here. We can also see that even though the Lookup approach was by far the slowest (3639 rows /
second), it is still a viable choice when you're processing a small number of rows.
There's a couple of reasons that the Lookup Transform performs poorly here. First, because it's not able to precache any of the reference data, it has to go to the database often. Second, it matches only on actual parameter
values - it doesn't have a concept of ranges. Since it will only find a cache hit if all parameters are the same, it
ends up hitting the database for almost every row (120k times). By comparison, the script component will only
query once per unique ProductNumber (~600 times max).
So there you have three different approaches for doing range lookups in SSIS. I'm hoping we'll be able to either
enhance the Lookup component to support this functionality in the future, or perhaps provide a new transform
to handle this case.
In the mean time, please feel free to post / email any alternative approaches you might have.
I've attached the packages used in this post incase you want to try out the different options for yourself.
Lookup
Like
Share
Leave a Comment
Name
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
4/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
Comment
Post
Comments
SSIS Team Blog
From the Lookup presentation I put together for the MS BI conference in October, here is a series of
This is just a place holder. Please see the post on different approaches of doing Range Lookups in SSIS
Douglas Laudenschlager
The SQL Server Integration Services team added valuable new caching options (and scalability) to the
ravisharvindMSBI
Hi,
Can anybody explain in brief the code used in the script component?
I have a scenario to implement the same.
Any help would be greatly appreciated. Thanks
ravisharvindMSBI
5/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
An optimization. It assumes that everything up to the next end date belongs to the previous record.
ex.
Key 1, EndDate 2001/01/01
Key 20, EndDate 2003/01/01
Key 40, EndDate 2008/01/01
Key 60, EndDate NULL
If a record with a transaction date of 1999 comes in, we can determine it belongs to Key 1. If a record
with a transaction date of 2004 comes in, we know it belongs to Key 40.
Note, this will only work if there are no gaps in your history. It assumes that your data has been
cleaned before hand.
ravisharvindMSBI
Hi,
How do i redirect the lookup no match output in the script component if there is no dimension natural
key available?
Can you please help me out with the code?
Thanks in advance for the help.
ravisharvindMSBI
How do i redirect the lookup no match output in the script component if there is no dimension natural
key available?
Even if redirection is not possible i would like to replace the corresponding key of the record like 1, which
is the key for default record in my dimension?
waiting for your response...
You can redirect with the script by adding another output to the component.
http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/66136/
Alternatively, you can replace the key value with 1 instead of throwing an exception like the script does
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
6/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
now.
ravisharvindMSBI
Hi Masson,
Thanks for the reply.
I tried replacing the key for the corresponding row to 1 instead of throwing the exception as you said.
But it fails again. I don't want the script component to fail at any cost, just replace the key with 1 and
proceed.
Below is the change I made to the code.
if (ranges == null)
{
Row.ProductKey = 1;
//commented
//throw new NullReferenceException("Couldn't find //value for product number " + productNumber);
}
Am I missing something here or should I do anything else.. Please have a look at it. It is very critical for
me.
COZYROC
There is solution based on the third-party commercial CozyRoc SSIS+ library. CozyRoc has implemented
data flow destination script, which creates memory-efficient range dictionary object. The dictionary object
can then be used in CozyRoc Lookup Plus component. For more information and demonstration how to
use the script, check here:
http://www.cozyroc.com/script/range-dictionary-destination
pzajkowski
Hi, I'd really like to try out the downloadable solutions, but it appears they are in SSIS 2008 format.
Any chance a SSIS 2005 version can be made available too?
Thanks in advance.
Langston
Is it possible to get a VB.Net version of the script component's code? Thanks in advance.
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
7/8
3/27/2015
Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs
JJBear11
http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx
8/8