Rank Transformation Overview

By PenchalaRaju.Yanamala
Transformation type: Active, Connected
The Rank transformation lets you select only the top or bottom rank of data. Use
a Rank transformation to return the largest or smallest numeric value in a port or
group. You can also use a Rank transformation to return the strings at the top or
the bottom of a session sort order. During the session, the Integration Service
caches input data until it can perform the rank calculations.
The Rank transformation differs from the transformation functions MAX and MIN,
in that it lets you select a group of top or bottom values, not just one value. For
example, use Rank to select the top 10 salespersons in a given territory. Or, to
generate a financial report, you might also use a Rank transformation to identify
the three departments with the lowest expenses in salaries and overhead. While
SQL provides many functions designed to handle groups of data, aggregate
functions such as MAX and MIN return only a single value per group, so they
cannot by themselves identify the top or bottom strata within a set of rows.
You connect all ports representing the same row set to the transformation. Only
the rows that fall within that rank, based on some measure you set when you
configure the transformation, pass through the Rank transformation. You can
also write expressions to transform data or perform calculations.
Figure 17-1 shows a mapping that passes employee data from a human
resources table through a Rank transformation. The Rank transformation passes
only the rows for the 10 highest-paid employees to the next transformation.
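The difference from MAX can be illustrated outside PowerCenter with a minimal Python sketch. The employee rows and the helper name top_n are hypothetical, and this is not how the Integration Service is implemented. It only shows that a rank returns a set of whole rows where MAX returns a single value.

```python
import heapq

# Hypothetical employee rows: (name, salary)
employees = [
    ("Ann", 5200), ("Bob", 4100), ("Cara", 6100),
    ("Dev", 3900), ("Eli", 5800),
]

def top_n(rows, n, key):
    """Return the n rows with the largest key, highest first."""
    return heapq.nlargest(n, rows, key=key)

# MAX-style aggregate: a single value
highest_salary = max(salary for _, salary in employees)

# Rank-style selection: the complete rows that fall within the rank
top_three = top_n(employees, 3, key=lambda row: row[1])
```

Here highest_salary is one number, while top_three keeps the name alongside each salary, which is what a downstream transformation would need.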
Ranking String Values
When the Integration Service runs in the ASCII data movement mode, it sorts
session data using a binary sort order.
When the Integration Service runs in Unicode data movement mode, the
Integration Service uses the sort order configured for the session. You select the
session sort order in the session properties. The session properties list all
available sort orders based on the code page used by the Integration Service.
For example, you have a Rank transformation configured to return the top three
values of a string port. When you configure the workflow, you select the
Integration Service on which you want the workflow to run. The session
properties display all sort orders associated with the code page of the selected
Integration Service, such as French, German, and Binary. If you configure the
session to use a binary sort order, the Integration Service calculates the binary
value of each string, and returns the three rows with the highest binary values for
the string.
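As a rough illustration of a binary sort order, the Python sketch below ranks strings by their encoded bytes. The function name top_strings_binary, the sample strings, and the choice of UTF-8 are assumptions for the example; the actual byte values depend on the code page of the Integration Service.

```python
def top_strings_binary(values, n, encoding="utf-8"):
    """Return the n strings with the highest byte-wise (binary) values."""
    # Comparing bytes objects compares the raw byte values left to right.
    ordered = sorted(values, key=lambda s: s.encode(encoding), reverse=True)
    return ordered[:n]

# Hypothetical values of a string port
names = ["apple", "Zebra", "mango", "Banana", "cherry"]
top_three = top_strings_binary(names, 3)
```

With these sample values, "Zebra" does not reach the top three: uppercase letters have lower byte values than lowercase letters, so a binary sort order places them below any lowercase string.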
Rank Caches
During a session, the Integration Service compares an input row with rows in the
data cache. If the input row out-ranks a cached row, the Integration Service
replaces the cached row with the input row. If you configure the Rank
transformation to rank across multiple groups, the Integration Service ranks
incrementally for each group it finds.
The Integration Service stores group information in an index cache and row data
in a data cache. If you create multiple partitions in a pipeline, the Integration
Service creates separate caches for each partition.
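The replace-on-outrank behavior can be sketched in Python with one bounded heap per group. The class name RankCache and its methods are hypothetical, and the dictionary of heaps only loosely mirrors the index cache (groups) and data cache (rows); the real cache layout is not described here. As a simplification, a row that merely ties the weakest cached row does not replace it.

```python
import heapq
from collections import defaultdict

class RankCache:
    """Keep only the top n rows per group while rows stream in."""

    def __init__(self, n):
        self.n = n
        # group value -> min-heap of (rank_value, arrival_order, row)
        self.groups = defaultdict(list)
        self._seq = 0  # tie-breaker so row payloads are never compared

    def add(self, group, rank_value, row):
        heap = self.groups[group]
        entry = (rank_value, self._seq, row)
        self._seq += 1
        if len(heap) < self.n:
            heapq.heappush(heap, entry)
        elif rank_value > heap[0][0]:
            # The input row out-ranks the weakest cached row: replace it.
            heapq.heapreplace(heap, entry)

    def result(self, group):
        """Cached rows for one group, highest rank value first."""
        ordered = sorted(self.groups[group], key=lambda e: (-e[0], e[1]))
        return [row for _, _, row in ordered]

# Hypothetical input: (territory, salesperson, sales)
cache = RankCache(2)
for territory, person, sales in [
    ("East", "Sam", 10000), ("East", "Mary", 9000), ("West", "Ron", 7000),
    ("East", "Alice", 12000), ("West", "Alex", 6000), ("West", "Kim", 8000),
]:
    cache.add(territory, sales, person)
```

Each group never holds more than n rows, so memory use depends on the number of groups and the rank size rather than on the total number of input rows.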
Rank Transformation Properties
When you create a Rank transformation, you can configure the following
properties:

Enter a cache directory.

Select the top or bottom rank.

Select the input/output port that contains values used to determine the rank.
You can select only one port to define a rank.

Select the number of rows falling within a rank.

Define groups for ranks, such as the 10 least expensive products for each
manufacturer.
Ports in a Rank Transformation
The Rank transformation includes input or input/output ports connected to
another transformation in the mapping. It also includes variable ports and a rank
port. Use the rank port to specify the column you want to rank.
The following table describes the ports in a Rank transformation:
Ports  Number Required  Description
I      Minimum of one   Input port. Create an input port to receive data from
                        another transformation.
O      Minimum of one   Output port. Create an output port for each port you
                        want to link to another transformation. You can
                        designate input ports as output ports.
V      Not required     Variable port. Use to store values or calculations to
                        use in an expression. Variable ports cannot be input or
                        output ports. They pass data within the transformation
                        only.
R      One only         Rank port. Use to designate the column for which you
                        want to rank values. You can designate only one Rank
                        port in a Rank transformation. The Rank port is an
                        input/output port. You must link the Rank port to
                        another transformation.
Rank Index
The Designer creates a RANKINDEX port for each Rank transformation. The
Integration Service uses the Rank Index port to store the ranking position for
each row in a group. For example, if you create a Rank transformation that ranks
the top five salespersons for each quarter, the rank index numbers the
salespeople from 1 to 5:
RANKINDEX SALES_PERSON SALES
1 Sam 10,000
2 Mary 9,000
3 Alice 8,000
4 Ron 7,000
5 Alex 6,000
The RANKINDEX is an output port only. You can pass the rank index to another
transformation in the mapping or directly to a target.
Defining Groups
Like the Aggregator transformation, the Rank transformation lets you group
information. For example, if you want to select the 10 most expensive items by
manufacturer, you would first define a group for each manufacturer. When you
configure the Rank transformation, you can set one of its input/output ports as a
group by port. For each unique value in the group port, the transformation
creates a group of rows falling within the rank definition (top or bottom, and a
particular number in each rank).
Therefore, the Rank transformation changes the number of rows in two different
ways. By filtering all but the rows falling within a top or bottom rank, you reduce
the number of rows that pass through the transformation. By defining groups, you
create one set of ranked rows for each group.
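A small Python sketch may help show both effects on the row count. The product rows and the function name rank_by_group are hypothetical, and the sort-then-group approach is just one simple way to express the idea, not the Integration Service's method.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical product rows: (manufacturer, product, price)
products = [
    ("Acme", "bolt", 2.0), ("Acme", "gear", 9.5), ("Acme", "drill", 40.0),
    ("Zenith", "lamp", 15.0), ("Zenith", "fan", 30.0),
]

def rank_by_group(rows, n, group_key, rank_key, top=True):
    """Return up to n ranked rows for each distinct group value."""
    ranked = []
    ordered = sorted(rows, key=group_key)  # groupby needs rows sorted by group
    for _, members in groupby(ordered, key=group_key):
        # Top rank sorts highest first; bottom rank sorts lowest first.
        members = sorted(members, key=rank_key, reverse=top)
        ranked.extend(members[:n])
    return ranked

# The 2 most expensive products for each manufacturer
most_expensive = rank_by_group(products, 2, itemgetter(0), itemgetter(2))
```

Five rows go in and four come out: the rank of 2 filters one Acme row, and the grouping yields one ranked set per manufacturer.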
For example, you might create a Rank transformation to identify the 50 highest
paid employees in the company. In this case, you would identify the SALARY
column as the input/output port used to measure the ranks, and configure the
transformation to filter out all rows except the top 50.
After the Rank transformation identifies all rows that belong to a top or bottom
rank, it then assigns rank index values. In the case of the top 50 employees,
measured by salary, the highest paid employee receives a rank index of 1. The
next highest-paid employee receives a rank index of 2, and so on. When
measuring a bottom rank, such as the 10 lowest priced products in the inventory,
the Rank transformation assigns a rank index from lowest to highest. Therefore,
the least expensive item would receive a rank index of 1.
If two rank values match, they receive the same value in the rank index and the
transformation skips the next value. For example, if you want to see the top five
retail stores in the country and two stores have the same sales, the return data
might look similar to the following:
RANKINDEX SALES STORE
1 10000 Orange
1 10000 Brea
3 9000 Los Angeles
4 8000 Ventura
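The tie rule, where matching values share an index and the next index is skipped, can be sketched in Python as follows. The function name assign_rank_index and the store figures are hypothetical, and cutting the sorted list at n rows is a simplifying assumption about how ties at the rank boundary are handled.

```python
def assign_rank_index(rows, n, rank_key, top=True):
    """Return (rank_index, row) pairs for the first n rows of the ordering.

    Equal rank values share an index, and the following index is skipped
    (1, 1, 3, ...).
    """
    ordered = sorted(rows, key=rank_key, reverse=top)[:n]
    ranked = []
    for position, row in enumerate(ordered, start=1):
        if ranked and rank_key(row) == rank_key(ranked[-1][1]):
            index = ranked[-1][0]  # tie: reuse the previous index
        else:
            index = position       # otherwise the 1-based position
        ranked.append((index, row))
    return ranked

# Hypothetical store rows: (store, sales)
stores = [
    ("Ventura", 8000), ("Orange", 10000),
    ("Los Angeles", 9000), ("Brea", 10000),
]
top_stores = assign_rank_index(stores, 5, rank_key=lambda r: r[1])
bottom_two = assign_rank_index(stores, 2, rank_key=lambda r: r[1], top=False)
```

For a top rank the highest value receives index 1, and for a bottom rank (top=False) the lowest value receives index 1.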
Creating a Rank Transformation
You can add a Rank transformation anywhere in the mapping after the source
qualifier.
To create a Rank transformation:
1. In the Mapping Designer, click Transformation > Create. Select the Rank
   transformation. Enter a name for the Rank. The naming convention for Rank
   transformations is RNK_TransformationName.
   Enter a description for the transformation. This description appears in the
   Repository Manager.
2. Click Create, and then click Done.
   The Designer creates the Rank transformation.
3. Link columns from an input transformation to the Rank transformation.
4. Click the Ports tab and select the Rank (R) option for the rank port.
   If you want to create groups for ranked rows, select Group By for the port
   that defines the group.
5. Click the Properties tab and select whether you want the top or bottom rank.
6. For the Number of Ranks option, enter the number of rows you want to select
   for the rank.
7. Change the other Rank transformation properties, if necessary.
The following table describes the Rank transformation properties:
Setting                 Description
Cache Directory         Local directory where the Integration Service creates
                        the index and data cache files. By default, the
                        Integration Service uses the directory entered in the
                        Workflow Manager for the process variable $PMCacheDir.
                        If you enter a new directory, make sure the directory
                        exists and contains enough disk space for the cache
                        files.
Top/Bottom              Specifies whether you want the top or bottom ranking
                        for a column.
Number of Ranks         Number of rows you want to rank.
Case-Sensitive String   When running in Unicode mode, the Integration Service
Comparison              ranks strings based on the sort order selected for the
                        session. If the session sort order is case sensitive,
                        select this option to enable case-sensitive string
                        comparisons, and clear this option to have the
                        Integration Service ignore case for strings. If the
                        sort order is not case sensitive, the Integration
                        Service ignores this setting. By default, this option
                        is selected.
Tracing Level           Determines the amount of information the Integration
                        Service writes to the session log about data passing
                        through this transformation in a session.
Rank Data Cache Size    Data cache size for the transformation. Default is
                        2,000,000 bytes. If the total configured session cache
                        size is 2 GB (2,147,483,648 bytes) or more, you must
                        run the session on a 64-bit Integration Service. You
                        can configure a numeric value, or you can configure the
                        Integration Service to determine the cache size at
                        runtime. If you configure the Integration Service to
                        determine the cache size, you can also configure a
                        maximum amount of memory for the Integration Service to
                        allocate to the cache.
Rank Index Cache Size   Index cache size for the transformation. Default is
                        1,000,000 bytes. If the total configured session cache
                        size is 2 GB (2,147,483,648 bytes) or more, you must
                        run the session on a 64-bit Integration Service. You
                        can configure a numeric value, or you can configure the
                        Integration Service to determine the cache size at
                        runtime. If you configure the Integration Service to
                        determine the cache size, you can also configure a
                        maximum amount of memory for the Integration Service to
                        allocate to the cache.
Transformation Scope    Specifies how the Integration Service applies the
                        transformation logic to incoming data:
                        - Transaction. Applies the transformation logic to all
                          rows in a transaction. Choose Transaction when a row
                          of data depends on all rows in the same transaction,
                          but does not depend on rows in other transactions.
                        - All Input. Applies the transformation logic to all
                          incoming data. When you choose All Input, PowerCenter
                          drops incoming transaction boundaries. Choose All
                          Input when a row of data depends on all rows in the
                          source.
8. Click OK.