
3/27/2015

Lookup Pattern: Range Lookups - SSIS Team Blog - Site Home - MSDN Blogs

Lookup Pattern: Range Lookups


Matt Masson - MSFT

25 Nov 2008 12:30 PM


Performing range lookups (i.e. finding a key for a given range) is a common ETL operation in data warehousing
scenarios. It's especially common in historical loads and late-arriving fact situations, where you're using Type 2
dimensions and you need to locate the key that represents the dimension value for a given point in time.
This blog post outlines three separate approaches for doing range lookups in SSIS:
1. Using the Lookup Transform
2. Merge Join + Conditional Split
3. Script Component
All of our scenarios will use the AdventureWorksDW2008 sample database (DimProduct table) as the dimension,
and take its fact data from AdventureWorks2008 (SalesOrderHeader and SalesOrderDetail tables). The
"ProductNumber" column from the SalesOrderDetail table maps to the natural key of the DimProduct dimension
(ProductAlternateKey column). In all cases we want to look up the key (ProductKey) for the product that was
valid (identified by StartDate and EndDate) for the given OrderDate.
One last thing to note is that the Merge Join and Script Component solutions assume that a valid range exists
for each incoming value. The Lookup Transform approach is the only one that will identify rows that have no
matches (although the Script Component solution could be modified to do so as well).

Lookup Transform
The Lookup Transform was designed to handle 1:1 key matching, but it can also be used in the range lookup
scenario by using partial cache mode and tweaking the query on the Advanced Settings page. However, the
Lookup doesn't cache the range itself, and will end up going to the database very often - it will only detect a
match in its cache if all of the parameters are the same (i.e. same product purchased on the same date).
We can use the following query to have the lookup transform perform our range lookup:
select [ProductKey], [ProductAlternateKey],
       [StartDate], [EndDate]
from [dbo].[DimProduct]
where [ProductAlternateKey] = ?
  and [StartDate] <= ?
  and (
        [EndDate] is null or
        [EndDate] > ?
      )
On the query parameters page, we map 0 -> ProductNumber, 1 and 2 -> OrderDate.
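To make the parameter mapping concrete, here is a minimal sketch that runs the same range query against an in-memory SQLite table. SQLite also uses `?` placeholders, so the query carries over almost unchanged. The table contents, including keys 210 to 212 and the dates, are hypothetical sample values, not AdventureWorksDW data. The helper names `make_demo_db` and `range_lookup` are illustrative only.

```python
import sqlite3

# Same shape as the Lookup query above, rewritten for SQLite, which also
# uses "?" placeholders. Parameter order: ProductNumber, OrderDate, OrderDate.
RANGE_QUERY = """
select ProductKey, ProductAlternateKey, StartDate, EndDate
from DimProduct
where ProductAlternateKey = ?
  and StartDate <= ?
  and (EndDate is null or EndDate > ?)
"""


def make_demo_db():
    """Build an in-memory DimProduct with hypothetical keys and dates.
    Dates are ISO-8601 text, which SQLite compares correctly as strings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table DimProduct (ProductKey integer, ProductAlternateKey text,"
        " StartDate text, EndDate text)"
    )
    conn.executemany(
        "insert into DimProduct values (?, ?, ?, ?)",
        [
            (210, "LJ-0192-L", "2001-07-01", "2002-07-01"),
            (211, "LJ-0192-L", "2002-07-01", "2003-07-01"),
            (212, "LJ-0192-L", "2003-07-01", None),  # current row
        ],
    )
    return conn


def range_lookup(conn, product_number, order_date):
    """Return the ProductKey valid on order_date, or None if no range matches."""
    row = conn.execute(
        RANGE_QUERY, (product_number, order_date, order_date)
    ).fetchone()
    return None if row is None else row[0]


conn = make_demo_db()
print(range_lookup(conn, "LJ-0192-L", "2002-08-15"))  # 211
```

Because EndDate is compared with `>`, each range is treated as ending just before its EndDate. An order placed exactly on 2002-07-01 therefore resolves to the row that starts on that date.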

http://blogs.msdn.com/b/mattm/archive/2008/11/25/lookup-pattern-range-lookups.aspx


This approach is effective and easy to set up, but it is pretty slow when dealing with a large number of rows, as
most lookups will be going to the database.
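The reason for the slowdown can be modeled in a few lines. The class below is a toy model, not the SSIS implementation. It caches results keyed on the full parameter tuple, so a cache hit requires the same product *and* the same date, as described above. A database round trip is counted on every miss. The reference rows are hypothetical.

```python
from datetime import date

# Hypothetical reference rows: (ProductKey, ProductAlternateKey, StartDate, EndDate).
REFERENCE = [
    (210, "LJ-0192-L", date(2001, 7, 1), date(2002, 7, 1)),
    (211, "LJ-0192-L", date(2002, 7, 1), date(2003, 7, 1)),
    (212, "LJ-0192-L", date(2003, 7, 1), None),
]


class PartialCacheRangeLookup:
    """Toy model of partial cache mode with the range query: the cache key is
    the full parameter tuple, so a hit needs the same product AND date."""

    def __init__(self, reference):
        self.reference = reference
        self.cache = {}
        self.db_queries = 0

    def _query_database(self, product_number, order_date):
        self.db_queries += 1  # every cache miss costs a round trip
        for key, alt_key, start, end in self.reference:
            if alt_key == product_number and start <= order_date and (
                end is None or end > order_date
            ):
                return key
        return None

    def lookup(self, product_number, order_date):
        params = (product_number, order_date)
        if params not in self.cache:
            self.cache[params] = self._query_database(product_number, order_date)
        return self.cache[params]


lookup = PartialCacheRangeLookup(REFERENCE)
orders = [
    ("LJ-0192-L", date(2002, 8, 15)),
    ("LJ-0192-L", date(2002, 8, 16)),  # same range, different date: miss
    ("LJ-0192-L", date(2002, 8, 15)),  # exact repeat: hit
]
keys = [lookup.lookup(p, d) for p, d in orders]
print(keys, lookup.db_queries)  # [211, 211, 211] 2
```

All three orders fall into the same range. Even so, two of them still go to the database, because the cache does not know about ranges.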

Merge Join and Conditional Split


This approach doesn't use the Lookup Transform. Instead we use a Merge Join Transform to do an inner join on
our dimension table. This will give us more rows coming out than we had coming in (you'll get a row for every
repeated ProductAlternateKey). We use the conditional split to do the actual range check, and take only the rows
that fall into the right range.

For example, a row coming in from our source contains an OrderDate and a ProductNumber (here, "LJ-0192-L").

From the DimProduct source, we take three additional columns - ProductKey (what we're after), StartDate and
EndDate. The DimProduct dimension contains three entries for the "LJ-0192-L" product (as its information, like
unit price, has changed over time). After going through the Merge Join, the single row becomes three rows.
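The row multiplication can be reproduced with a small sorted-merge inner join. This is a sketch under stated assumptions: both inputs are lists of tuples already sorted on the join key, which is what the Merge Join Transform requires of its inputs. The dates and ProductKey values are hypothetical, and `merge_inner_join` is an illustrative name rather than an SSIS API.

```python
from datetime import date

# Hypothetical inputs, both sorted on the join key (first element).
orders = [  # (ProductNumber, OrderDate)
    ("LJ-0192-L", date(2002, 8, 15)),
]
dim_product = [  # (ProductAlternateKey, ProductKey, StartDate, EndDate)
    ("LJ-0192-L", 210, date(2001, 7, 1), date(2002, 7, 1)),
    ("LJ-0192-L", 211, date(2002, 7, 1), date(2003, 7, 1)),
    ("LJ-0192-L", 212, date(2003, 7, 1), None),
]


def merge_inner_join(left, right):
    """Inner join of two lists sorted on their first element; every left row
    is paired with every right row that has the same key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key with the whole run.
            while i < len(left) and left[i][0] == lk:
                for r in right[j:j_end]:
                    out.append(left[i] + r[1:])
                i += 1
            j = j_end
    return out


joined = merge_inner_join(orders, dim_product)
for row in joined:
    print(row)  # one input row becomes three joined rows
```

Each output row is `(ProductNumber, OrderDate, ProductKey, StartDate, EndDate)`. The range check still has to pick one of them.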


We use the Conditional Split to do the range check and keep the single row we want. Here is our expression
(remember, in our case an EndDate value of NULL indicates that it's the most current row):
StartDate <= OrderDate && (OrderDate < EndDate || ISNULL(EndDate))
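The same predicate can be written as a Python filter over joined rows, which is what the Conditional Split output does. This is a sketch with hypothetical rows. The `None` check comes first because Python raises `TypeError` when comparing a `date` with `None`.

```python
from datetime import date
from typing import Optional


def in_range(order_date: date, start_date: date, end_date: Optional[date]) -> bool:
    # Python version of:
    #   StartDate <= OrderDate && (OrderDate < EndDate || ISNULL(EndDate))
    # A None EndDate marks the current row, so its range is open-ended.
    return start_date <= order_date and (end_date is None or order_date < end_date)


# Hypothetical joined rows: (ProductNumber, OrderDate, ProductKey, StartDate, EndDate).
joined = [
    ("LJ-0192-L", date(2002, 8, 15), 210, date(2001, 7, 1), date(2002, 7, 1)),
    ("LJ-0192-L", date(2002, 8, 15), 211, date(2002, 7, 1), date(2003, 7, 1)),
    ("LJ-0192-L", date(2002, 8, 15), 212, date(2003, 7, 1), None),
]
matched = [row for row in joined if in_range(row[1], row[3], row[4])]
print(matched[0][2])  # 211
```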

This approach is a little more complicated, but performs a lot better than using the Lookup Transform.

Script Component
The third approach uses a custom script component to perform the lookup. I wrote the script in two ways - one
that simulates a "Full Cache" type lookup, and one that is similar to partial cache except it pulls back all values
for a given natural key, instead of just the one for the given date range. The caching behavior is controlled by
the PreCache boolean package variable.
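The attached script is not reproduced here. The sketch below only illustrates the two caching strategies described above. With `pre_cache=True`, one query loads every range for every natural key. With `pre_cache=False`, the first sighting of a natural key fetches all of that key's ranges, so later dates for the same product stay in memory. The class name, the counter and the reference rows (including the "BK-M68B-38" product) are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical reference rows: (ProductKey, ProductAlternateKey, StartDate, EndDate).
REFERENCE = [
    (210, "LJ-0192-L", date(2001, 7, 1), date(2002, 7, 1)),
    (211, "LJ-0192-L", date(2002, 7, 1), date(2003, 7, 1)),
    (212, "LJ-0192-L", date(2003, 7, 1), None),
    (300, "BK-M68B-38", date(2001, 7, 1), None),
]


class ScriptRangeLookup:
    def __init__(self, reference, pre_cache):
        self.reference = reference  # stands in for the DimProduct table
        self.pre_cache = pre_cache  # plays the role of the PreCache variable
        self.cache = {}  # natural key -> list of (key, start, end)
        self.db_queries = 0
        if pre_cache:
            # "Full cache" style: one query loads every range up front.
            self.db_queries += 1
            grouped = defaultdict(list)
            for key, alt_key, start, end in reference:
                grouped[alt_key].append((key, start, end))
            self.cache = dict(grouped)

    def _ranges_for(self, natural_key):
        if not self.pre_cache and natural_key not in self.cache:
            # Lazy style: fetch ALL ranges for this natural key on first use.
            self.db_queries += 1
            self.cache[natural_key] = [
                (key, start, end)
                for key, alt_key, start, end in self.reference
                if alt_key == natural_key
            ]
        return self.cache.get(natural_key, [])

    def lookup(self, natural_key, order_date):
        for key, start, end in self._ranges_for(natural_key):
            if start <= order_date and (end is None or order_date < end):
                return key
        return None  # no valid range for this row


orders = [
    ("LJ-0192-L", date(2002, 8, 15)),
    ("LJ-0192-L", date(2004, 1, 1)),
    ("BK-M68B-38", date(2003, 1, 1)),
    ("LJ-0192-L", date(2001, 9, 1)),
]
for pre_cache in (True, False):
    lk = ScriptRangeLookup(REFERENCE, pre_cache)
    print(pre_cache, [lk.lookup(p, d) for p, d in orders], lk.db_queries)
# True [211, 212, 300, 210] 1
# False [211, 212, 300, 210] 2
```

In the lazy mode, the number of queries grows with the number of distinct natural keys rather than with the number of rows.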

Conclusion
I ran the three packages using the following environment (my laptop):
Dual-core Intel 1.8 GHz
3 GB of RAM
AdventureWorks2008 and AdventureWorksDW2008
~120,000 order rows (SalesOrderDetail)
~600 reference rows (DimProduct)
Here are the results, in rows per second (larger being better):


At 120k+ rows per second, we can see that the custom script (or better yet, a custom transform) is the best
alternative here. We can also see that even though the Lookup approach was by far the slowest (3639 rows /
second), it is still a viable choice when you're processing a small number of rows.
There are a couple of reasons that the Lookup Transform performs poorly here. First, because it's not able to pre-cache any of the reference data, it has to go to the database often. Second, it matches only on actual parameter
values - it doesn't have a concept of ranges. Since it will only find a cache hit if all parameters are the same, it
ends up hitting the database for almost every row (120k times). By comparison, the script component will only
query once per unique ProductNumber (~600 times max).
So there you have three different approaches for doing range lookups in SSIS. I'm hoping we'll be able to either
enhance the Lookup component to support this functionality in the future, or perhaps provide a new transform
to handle this case.
In the meantime, please feel free to post / email any alternative approaches you might have.
I've attached the packages used in this post in case you want to try out the different options for yourself.



Comments
SSIS Team Blog

25 Nov 2008 12:33 PM

From the Lookup presentation I put together for the MS BI conference in October, here is a series of

SSIS Team Blog

25 Nov 2008 12:41 PM

This is just a placeholder. Please see the post on different approaches of doing Range Lookups in SSIS

Douglas Laudenschlager

19 Jan 2009 5:22 PM

The SQL Server Integration Services team added valuable new caching options (and scalability) to the

ravisharvindMSBI

30 Jul 2009 12:50 PM

Hi,
Can anybody explain in brief the code used in the script component?
I have a scenario to implement the same.
Any help would be greatly appreciated. Thanks

Matt Masson - MSFT

30 Jul 2009 3:44 PM

Which part do you need explained?

ravisharvindMSBI

1 Aug 2009 4:47 PM

Basically the part within the while loop.


Why is StartDate not considered in the code?

I am a beginner, so please bear with me.


Thanks for your help.

Matt Masson - MSFT

3 Aug 2009 2:00 PM

An optimization. It assumes that each record covers everything after the previous record's end date, up to its own end date.
ex.
Key 1, EndDate 2001/01/01
Key 20, EndDate 2003/01/01
Key 40, EndDate 2008/01/01
Key 60, EndDate NULL
If a record with a transaction date of 1999 comes in, we can determine it belongs to Key 1. If a record
with a transaction date of 2004 comes in, we know it belongs to Key 40.
Note that this will only work if there are no gaps in your history. It assumes that your data has been
cleaned beforehand.
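Under the same no-gaps assumption, finding the key only needs the sorted end dates. The reply describes a loop over the ranges. The sketch below swaps in a binary search (Python's `bisect`) to make the idea explicit, and it is not the attached script's code. It uses the key and EndDate values from the reply. It also assumes that exactly one range, the last, is open-ended.

```python
import bisect
from datetime import date

# The ranges from the reply above, for one natural key, sorted by EndDate.
# The last (current) range has no EndDate.
RANGES = [
    (1, date(2001, 1, 1)),
    (20, date(2003, 1, 1)),
    (40, date(2008, 1, 1)),
    (60, None),
]
KEYS = [key for key, _ in RANGES]
END_DATES = [end for _, end in RANGES if end is not None]  # sorted, closed ends only


def key_for(txn_date):
    """Return the key of the first range whose EndDate is after txn_date.
    Assumes contiguous history (no gaps) and an open-ended last range,
    so StartDate never has to be checked."""
    # bisect_right counts the end dates <= txn_date, i.e. ranges already over.
    i = bisect.bisect_right(END_DATES, txn_date)
    return KEYS[i]


print(key_for(date(1999, 6, 1)))  # 1
print(key_for(date(2004, 6, 1)))  # 40
```

If the history has gaps, this returns the next range's key for a date that falls inside a gap instead of reporting no match. That is why the cleanup caveat matters.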

ravisharvindMSBI

20 Aug 2009 2:55 PM

Hi,
How do I redirect the lookup no-match output in the script component if there is no dimension natural
key available?
Can you please help me out with the code?
Thanks in advance for the help.

ravisharvindMSBI

23 Aug 2009 11:19 AM

How do I redirect the lookup no-match output in the script component if there is no dimension natural
key available?
Even if redirection is not possible, I would like to replace the key of the record with 1, which
is the key for the default record in my dimension.
Waiting for your response...

Matt Masson - MSFT

24 Aug 2009 10:41 AM

You can redirect with the script by adding another output to the component.
http://www.sqlservercentral.com/articles/Integration+Services+(SSIS)/66136/
Alternatively, you can replace the key value with 1 instead of throwing an exception like the script does now.

ravisharvindMSBI

24 Aug 2009 2:06 PM

Hi Masson,
Thanks for the reply.
I tried replacing the key for the corresponding row to 1 instead of throwing the exception as you said.
But it fails again. I don't want the script component to fail at any cost, just replace the key with 1 and
proceed.
Below is the change I made to the code.
if (ranges == null)
{
    Row.ProductKey = 1;
    // commented out:
    // throw new NullReferenceException("Couldn't find value for product number " + productNumber);
}
Am I missing something here, or should I do anything else? Please have a look at it. It is very critical for
me.
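The rest of the commenter's script isn't shown, so the cause of the failure can't be confirmed. One plausible explanation is that execution continues past the `if` and later code still reads from the null `ranges`. The Python sketch below shows both options from the reply: substituting a default key, and splitting rows into match and no-match outputs the way a second script output would. All names, the default key and the cached ranges are hypothetical.

```python
from datetime import date

UNKNOWN_PRODUCT_KEY = 1  # hypothetical "unknown member" key, as in the comment above

# Hypothetical cache: natural key -> list of (ProductKey, StartDate, EndDate).
CACHE = {
    "LJ-0192-L": [
        (211, date(2002, 7, 1), date(2003, 7, 1)),
        (212, date(2003, 7, 1), None),
    ],
}


def find_key(product_number, order_date):
    """Return the matching ProductKey, or None when there is no match."""
    ranges = CACHE.get(product_number)
    if ranges is None:
        # No natural key match. Returning here matters: code that keeps
        # going and iterates over `ranges` would fail on None.
        return None
    for key, start, end in ranges:
        if start <= order_date and (end is None or order_date < end):
            return key
    return None  # natural key exists, but no range covers this date


def lookup_or_default(product_number, order_date):
    """Option 1: substitute the default key instead of raising."""
    key = find_key(product_number, order_date)
    return UNKNOWN_PRODUCT_KEY if key is None else key


def split_outputs(rows):
    """Option 2: route rows to a match or a no-match list, mimicking a
    second script output."""
    matched, no_match = [], []
    for product_number, order_date in rows:
        key = find_key(product_number, order_date)
        if key is None:
            no_match.append((product_number, order_date))
        else:
            matched.append((product_number, order_date, key))
    return matched, no_match


print(lookup_or_default("ZZ-9999", date(2004, 1, 1)))  # 1
```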

COZYROC

27 Jan 2010 12:31 PM

There is a solution based on the third-party commercial CozyRoc SSIS+ library. CozyRoc has implemented
a data flow destination script, which creates a memory-efficient range dictionary object. The dictionary object
can then be used in the CozyRoc Lookup Plus component. For more information and a demonstration of how to
use the script, check here:
http://www.cozyroc.com/script/range-dictionary-destination

pzajkowski

14 May 2010 10:51 AM

Hi, I'd really like to try out the downloadable solutions, but it appears they are in SSIS 2008 format.
Any chance an SSIS 2005 version can be made available too?
Thanks in advance.

Langston

20 Jul 2010 1:00 PM

Is it possible to get a VB.Net version of the script component's code? Thanks in advance.


JJBear11

3 Aug 2010 1:30 PM

I second that -- any chance for a VB.Net version of this code?
