1. GETTING STARTED
The source qualifier represents the rows that the Integration Service reads from the source when it runs a session.
Overview window
When you run the workflow, the Integration Service
runs all sessions in the workflow, either
simultaneously or in sequence, depending on how
you arrange the sessions in the workflow.
Gateway Nodes
One node acts as the gateway at any given time. That node is called the master gateway.
A gateway node can run application services, and it can serve as a master gateway node.
The master gateway node is the entry point to the domain.
Informatica Administrator
Use the Administrator tool to complete the following
types of tasks:
Domain administrative tasks - Manage logs, domain objects, user permissions, and domain reports.
Generate and upload node diagnostics.
Monitor jobs and applications that run on the Data
Integration Service.
Domain objects include application services, nodes,
grids, folders, database connections, operating
system profiles, and licenses.
Security administrative tasks - Manage users, groups, roles, and privileges.
Application Services
Application services represent server-based functionality. Application services include the following services:
- Analyst Service
- Content Management Service
- Data Integration Service
- Metadata Manager Service
- Model Repository Service
- PowerCenter Integration Service
- PowerCenter Repository Service
- PowerExchange Listener Service
- PowerExchange Logger Service
- Reporting Service
- SAP BW Service
- Web Services Hub
High Availability
High availability consists of the following components:
Resilience - The ability of application services to
tolerate transient network failures until either the
resilience timeout expires or the external system
failure is fixed.
Failover - The migration of an application service or
task to another node when the node running the
service process becomes unavailable.
Recovery - The automatic completion of tasks after
a service is interrupted. Automatic recovery is
available for PowerCenter Integration Service and
PowerCenter Repository Service tasks. You can also
manually recover PowerCenter Integration Service
workflows and sessions. Manual recovery is not part of high availability.
Folder Management
Folders can contain nodes, services, grids, licenses,
and other folders.
User Accounts
Default Administrator
The default administrator is a user account in the native security domain.
You cannot create a default administrator.
You cannot disable or modify the user name or privileges of the default administrator.
You can change the default administrator password.
High Availability
If you have the high availability option, you can
achieve full high availability of internal Informatica
components.
You can achieve high availability with external
components based on the availability of those
components.
If you do not have the high availability option, you can achieve some high availability of internal components.
Example
While you are fetching a mapping into the
PowerCenter Designer workspace, the PowerCenter
Repository Service becomes unavailable, and the
request fails. The PowerCenter Repository Service
fails over to another node because it cannot restart
on the same node.
Privileges
Informatica includes the following privileges:
Domain privileges - Determine actions on the Informatica domain that users can perform using the Administrator tool and command line programs.

Command Line Program Resilience
Environment variable - If you do not use the command line option, the command line program uses the value of the INFA_CLIENT_RESILIENCE_TIMEOUT environment variable that is configured on the client machine.
Default value - If you do not use the command line
option or the environment variable, the command
line program uses the default resilience timeout of
180 seconds.
Limit on timeout - If the limit on resilience timeout
for the service is smaller than the command line
resilience timeout, the command line program uses the limit as the resilience timeout.
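For example, a minimal sketch on a UNIX client (the 120-second value and connection details are illustrative):
export INFA_CLIENT_RESILIENCE_TIMEOUT=120
pmcmd connect -sv MyIntService -d MyDomain -u seller3 -p jackson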
Reject Files
By default, the PowerCenter Integration Service
process creates a reject file for each target in the
session.
The writer may reject a row in the following
circumstances:
- It is flagged for reject by an Update Strategy or
Custom transformation.
- It violates a database constraint, such as a primary key constraint.
- A field in the row was truncated or overflowed, and
the target database is configured to reject truncated
or overflowed data.
Note: If you enable row error logging, the
PowerCenter Integration Service process does not
create a reject file.
Control File
When you run a session that uses an external
loader, the PowerCenter Integration Service process
creates a control file and a target flat file.
The control file contains information about the
target flat file such as data format and loading
instructions for the external loader.
The control file has an extension of .ctl.
The PowerCenter Integration Service process creates
the control file and the target flat file in the
PowerCenter Integration Service variable directory,
$PMTargetFileDir, by default.
Indicator File
If you use a flat file as a target, you can configure
the PowerCenter Integration Service to create an
indicator file for target row type information.
For each target row, the indicator file contains a
number to indicate whether the row was marked for
insert, update, delete, or reject.
The PowerCenter Integration Service process names this file target_name.ind and stores it in the target file directory.
USING PMCMD
pmcmd is a program you use to communicate with
the Integration Service.
With pmcmd, you can perform some of the tasks
that you can also perform in the Workflow Manager,
such as starting and stopping workflows and
sessions.
pmcmd
pmcmd> connect -sv MyIntService -d MyDomain -u seller3 -p jackson
pmcmd> setfolder SalesEast
pmcmd> startworkflow wf_SalesAvg
pmcmd> startworkflow wf_SalesTotal
USING PMREP
pmrep is a command line program that you use to update repository information and perform repository functions.
pmrep is installed in the PowerCenter Client and
PowerCenter Services bin directories.
Use pmrep to perform repository administration tasks such as listing repository objects, creating and editing groups, restoring and deleting repositories, and updating session-related parameters and security information in the PowerCenter repository.
When you use pmrep, you can enter commands in the following modes:
Command line mode - You can issue pmrep commands directly from the system command line. Use command line mode to script pmrep commands.
Interactive mode - You can invoke pmrep from the command line and issue a series of commands from a pmrep prompt without reconnecting after each command.
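For example, a minimal command line mode sketch (repository, domain, and credential names are illustrative):
pmrep connect -r MyRepository -d MyDomain -n seller3 -x jackson
pmrep listobjects -o workflow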
3. DESIGNER GUIDE
The Designer provides the following tools:
Source Analyzer - Import or create source definitions
for flat file, XML, COBOL, Application, and relational
sources.
Target Designer - Import or create target definitions.
Transformation Developer - Create reusable
transformations.
Mapplet Designer - Create mapplets.
Mapping Designer - Create mappings.
The Designer consists of the following windows:
Navigator - Connect to multiple repositories and folders. You can also copy and delete objects and create shortcuts using the Navigator.
Workspace - View or edit sources, targets, mapplets, transformations, and mappings. You work with a single tool at a time in the workspace, which has two formats: default and workbook. You can view multiple versions of an object in the workspace.
Status bar - Displays the status of the operation you perform.
Output - Provides details when you perform certain tasks, such as saving work or validating a mapping. Right-click the Output window to access window options, such as printing output text, saving text to file, and changing the font size.
Creating a Toolbar
You can create a new toolbar and choose buttons for
the new toolbar.
You can create toolbars in the Designer, Workflow
Manager, and the Workflow Monitor.
Find Next
Use the Find Next tool to search for a column or port
name in:
- Transformations
- Mapplets
- Source definitions
- Target definitions
With the Find Next tool, you can search one object at
a time.
You cannot search multiple objects at the same
time.
Use Find Next in each Designer tool.
Select a single transformation or click in the Output
window before performing the search.
The Designer saves the last 10 strings searched in the Find Next box on the Standard toolbar.
You can search for a string in the Save, Generate, or
Validate tabs in the Output window.
The Find in Workspace tool searches for a field name
or transformation name in all transformations in the
workspace.
The Find in Workspace tool lets you search all of the transformations in the workspace for port or transformation names.
You can search for column or port names or table
names matching the search string.
You can specify whether to search across all names
in the workspace, or across the business name of a
table, column, or port.
You can also choose to search for whole word matches for the search string or matches that match the case of the search string.
You can complete the following tasks in each Designer
tool:
- Add a repository.
- Print the workspace.
- View date and time an object was last saved.
- Open and close a folder.
- Create shortcuts (You cannot create shortcuts to
objects in non-shared folders)
- Check out and in repository objects.
- Search for repository objects.
- Enter descriptions for repository objects.
- View older versions of objects in the workspace.
- Revert to a previously saved object version.
- Copy objects.
- Export and import repository objects.
- Work with multiple objects, ports, or columns.
- Rename ports.
- Use shortcut keys.
You can also view object dependencies in the
Designer.
Rules and Guidelines for Viewing and Comparing
Versioned Repository Objects
You cannot simultaneously view multiple versions of
composite objects, such as mappings and
mapplets.
Older versions of composite objects might not
include the child objects that were used when the
composite object was checked in.
If you open a composite object that includes a child
object version that is purged from the repository,
the preceding version of the child object appears in
the workspace as part of the composite object.
For example, you want to view version 5 of a
mapping that originally included version 3 of a
source definition, but version 3 of the source
definition is purged from the repository. When you
view version 5 of the mapping, version 2 of the
source definition appears as part of the mapping.
Shortcut objects are not updated when you modify
the objects they reference. When you open a
shortcut object, you view the same version of the
object that the shortcut originally referenced, even
if subsequent versions exist.
Viewing an Older Version of a Repository Object
To open an older version of an object in the
workspace:
1. In the workspace or Navigator, select the object
and click Versioning > View History.
2. Select the version you want to view in the
workspace and click Tools > Open in Workspace.
Note: An older version of an object is read-only, and the version number appears as a prefix before the object name. You can simultaneously view multiple versions of a non-composite object in the workspace.
Components in a COBOL Source File
When you import a COBOL source, the Designer scans the file for the following components:
- FD Section
- Fields
- OCCURS
- REDEFINES
FD Section
The Designer assumes that each FD entry defines
the equivalent of a source table in a relational
source and creates a different COBOL source
definition for each such entry.
For example, if the COBOL file has two FD entries,
CUSTOMERS and ORDERS, the Designer creates
one COBOL source definition containing the fields
attributed to CUSTOMERS, and another with the
fields that belong to ORDERS.
Fields
The Designer identifies each field definition, reads
the datatype, and assigns it to the appropriate
source definition.
OCCURS
COBOL files often contain multiple instances of the
same type of data within the same record.
For example, a COBOL file may include data about
four different financial quarters, each stored in the
same record.
When the Designer analyzes the file, it creates a
different column for each OCCURS statement in the
COBOL file.
These OCCURS statements define repeated
information in the same record. Use the Normalizer
transformation to normalize this information.
For each OCCURS statement, the Designer creates
the following items:
- One target table when you drag the COBOL source
definition into the Target Designer.
- A primary-foreign key relationship.
- A generated column ID (GCID).
REDEFINES
COBOL uses REDEFINES statements to build the
description of one record based on the definition of
another record.
When you import the COBOL source, the Designer
creates a single source that includes REDEFINES.
The REDEFINES statement lets you specify multiple PICTURE clauses for the same physical data location.
Therefore, you need to use Filter transformations to
separate the data into the tables created by
REDEFINES.
For each REDEFINES:
A mapping references independent repository objects, such as the following:
- Source definitions
- Target definitions
- Reusable transformations
- Mapplets
The mapping is dependent on these objects.
When this metadata changes, the Designer and
other PowerCenter Client applications track the
effects of these changes on mappings.
In these cases, you may find that mappings become
invalid even though you do not edit the mapping.
When a mapping becomes invalid, the Integration
Service cannot run it properly, and the Workflow
Manager invalidates the session.
The only objects in a mapping that are not
stored as independent repository objects are
the non-reusable transformations that you
build within the mapping.
These non-reusable transformations are
stored within the mapping only.
Editing Mapplets
You can edit a mapplet in the Mapplet Designer. The
Designer validates the changes when you save the
mapplet.
When you save changes to a mapplet, all
instances of the mapplet and all shortcuts to
the mapplet inherit the changes.
These changes might invalidate mappings that use
the mapplet.
To see what mappings or shortcuts may be affected by changes you make to a mapplet, select the mapplet in the Navigator, right-click, and select Dependencies. Or, click Mapplets > Dependencies from the menu.
Variable Functions
Variable functions determine how the Integration
Service calculates the current value of a mapping
variable in a pipeline.
Use variable functions in an expression to set the
value of a mapping variable for the next session
run.
The transformation language provides the following
variable functions to use in a mapping:
SetMaxVariable: Sets the variable to the maximum
value of a group of values. It ignores rows marked for
update, delete, or reject. To use the SetMaxVariable
with a mapping variable, the aggregation type of the
mapping variable must be set to Max.
SetMinVariable: Sets the variable to the minimum
value of a group of values. It ignores rows marked for
update, delete, or reject. To use the SetMinVariable
with a mapping variable, the aggregation type of the
mapping variable must be set to Min.
SetCountVariable: Increments the variable value by
one. In other words, it adds one to the variable value
when a row is marked for insertion, and subtracts one
when the row is marked for deletion. It ignores rows
marked for update or reject. To use the
SetCountVariable with a mapping variable, the
aggregation type of the mapping variable must be set
to Count.
SetVariable: Sets the variable to the configured
value. At the end of a session, it compares the final
current value of the variable to the start value of the
variable. Based on the aggregate type of the variable,
it saves a final value to the repository. To use the
SetVariable function with a mapping variable, the
aggregation type of the mapping variable must be set
to Max or Min. The SetVariable function ignores rows
marked for delete or reject.
Use variable functions only once for each mapping
variable in a pipeline.
The Integration Service processes variable functions
as it encounters them in the mapping.
The order in which the Integration Service
encounters variable functions in the mapping
may not be the same for every session run.
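For example, a hedged sketch: an Expression transformation output port could update a hypothetical $$MaxOrderDate mapping variable (aggregation type Max) with:
SETMAXVARIABLE($$MaxOrderDate, ORDER_DATE)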
Designer Behavior
When the Debugger starts, you cannot
perform the following tasks:
- Close the folder or open another folder.
- Use the Navigator.
- Perform repository functions, such as Save.
- Edit or close the mapping.
- Switch to another tool in the Designer, such
as Target Designer.
- Close the Designer.
Note: Dynamic partitioning is disabled during
debugging.
Monitoring the Debugger
When you run the Debugger, you can monitor the
following information:
Session status: Monitor the status of the session.
Data movement: Monitor data as it moves through
transformations.
Breakpoints: Monitor data that meets breakpoint
conditions.
Target data: Monitor target data on a row-by-row
basis.
The Mapping Designer displays windows and debug
indicators that help you monitor the session:
Debug indicators: Debug indicators on
transformations help you follow breakpoints and
data flow.
Instance window: When the Debugger pauses, you
can view transformation data and row information in
the Instance window.
Target window: View target data for each target in
the mapping.
Output window: The Integration Service writes
messages to the following tabs in the Output
window:
- Debugger tab: The debug log displays in the
Debugger tab.
- Session Log tab: The session log displays in the
Session Log tab.
- Notifications tab: Displays messages from the
Repository Service.
You can step to connected transformations in
the mapping, even if they do not have an
associated breakpoint.
You cannot step to the following instances:
- Sources
- Targets
- Unconnected transformations
- Mapplets not selected for debugging
Modifying Data
When the Debugger pauses, the current instance
displays in the Instance window, and the current
instance indicator displays on the transformation in
the mapping. You can make the following
modifications to the current instance when
the Debugger pauses on a data breakpoint:
Modify output data: You can modify output data
of the current transformation. When you continue
the session, the Integration Service validates the
data. It performs the same validation it performs
when it passes data from port to port in a regular
session.
Change null data to not-null: Clear the null
column, and enter a value in the value column to
change null data to not-null.
Change not-null to null: Select the null column to
change not-null data to null. The Designer prompts
you to confirm that you want to make this change.
Modify row types: Modify Update Strategy, Filter,
or Router transformation row types.
For Router transformations, you can change the row
type to override the group condition evaluation for
user defined groups.
For example, if the group condition evaluates to
false, the rows are not passed through the output
ports to the next transformation or target.
The Instance window displays <no data
available>, and the row type is filtered. If you
want to pass the filtered row to the next
transformation or target, you can change the
row type to Insert.
Likewise, for a group that meets the group condition,
you can change the row type from insert to filtered.
After you change data, you can refresh the
cache before you continue the session.
When you issue the Refresh command, the Designer
processes the request for the current
transformation, and you can see if the data you
enter is valid.
You can change the data again before you continue the session.
Restrictions
You cannot change data for the following output
ports:
Normalizer transformation: Generated Keys and
Generated Column ID ports.
Rank transformation: RANKINDEX port.
Router transformation: All output ports.
Sequence Generator transformation: CURRVAL
and NEXTVAL ports.
Enhanced Security
The Workflow Manager has an enhanced security
option to specify a default set of permissions for
connection objects.
When you enable enhanced security, the
Workflow Manager assigns default
permissions on connection objects for users,
groups, and others.
When you disable enhanced security, the Workflow Manager assigns read, write, and execute permissions to all users that would otherwise receive permissions of the default group.
If you delete the owner from the repository,
the Workflow Manager assigns ownership of
the object to the administrator.
Viewing and Comparing Versioned Repository
Objects
You can view and compare versions of objects in the
Workflow Manager. If an object has multiple
versions, you can find the versions of the object in
the View History window. In addition to comparing
versions of an object in a window, you can view the
various versions of an object in the workspace to
graphically compare them.
Use the following rules and guidelines when you
view older versions of objects in the workspace:
- You cannot simultaneously view multiple versions
of composite objects, such as workflows and
worklets.
- Older versions of a composite object might not
include the child objects that were used when the
composite object was checked in. If you open a
composite object that includes a child object version
that is purged from the repository, the preceding
version of the child object appears in the workspace
as part of the composite object. For example, you might want to view version 5 of a workflow that originally included version 3 of a worklet, but version 3 of the worklet is purged from the repository. When you view version 5 of the workflow, version 2 of the worklet appears as part of the workflow.
Tasks
Decision Task
You can enter a condition that determines the
execution of the workflow, similar to a link condition
with the Decision task.
The Decision task has a predefined variable called
$Decision_task_name.condition that represents the
result of the decision condition.
The Integration Service evaluates the condition in
the Decision task and sets the predefined condition
variable to True (1) or False (0).
You can specify one decision condition per
Decision task.
After the Integration Service evaluates the Decision
task, use the predefined condition variable in other
expressions in the workflow to help you develop the
workflow.
Depending on the workflow, you might use link
conditions instead of a Decision task.
However, the Decision task simplifies the workflow.
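For example, a hedged sketch: a Decision task named Decision_AllSales, placed after two sessions (task names illustrative), might use the following decision condition:
$s_SalesEast.Status = SUCCEEDED AND $s_SalesWest.Status = SUCCEEDED
A link leaving the Decision task could then test the predefined variable:
$Decision_AllSales.condition = TRUE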
Timer Task
You can specify the period of time to wait before the
Integration Service runs the next task in the
workflow with the Timer task.
You can choose to start the next task in the workflow
at a specified time and date.
You can also choose to wait a period of time after
the start time of another task, workflow, or worklet
before starting the next task.
The Timer task has the following types of settings:
Absolute time - You specify the time that the Integration Service starts running the next task in the workflow. You may specify the date and time, or you can choose a user-defined workflow variable to specify the time.
Relative time - You instruct the Integration Service to wait for a specified period of time after the Timer task, the parent workflow, or the top-level workflow starts.
Sources
Allocating Buffer Memory
When the Integration Service initializes a session, it
allocates blocks of memory to hold source and
target data.
The Integration Service allocates at least two blocks
for each source and target partition.
Sessions that use a large number of sources or
targets might require additional memory blocks.
If the Integration Service cannot allocate
enough memory blocks to hold the data, it
fails the session.
Partitioning Sources
You can create multiple partitions for relational,
Application, and file sources.
For relational or Application sources, the Integration
Service creates a separate connection to the source
database for each partition you set in the session
properties.
For file sources, you can configure the session to
read the source with one thread or multiple
threads.
Overriding the Source Table Name
If you override the source table name on the
Properties tab of the source instance, and you
override the source table name using an SQL query,
the Integration Service uses the source table name
defined in the SQL query.
Targets
Working with Relational Targets
When you configure a session to load data to a
relational target, you define most properties in the
Transformations view on the Mapping tab.
Performing a Test Load
Constraint-Based Loading
In the Workflow Manager, you can specify constraint-based loading for a session.
When you select this option, the Integration Service
orders the target load on a row-by-row basis.
For every row generated by an active source, the
Integration Service loads the corresponding
transformed row first to the primary key table, then
to any foreign key tables.
Constraint-based loading depends on the following
requirements:
Active source - Related target tables must have
the same active source.
Key relationships - Target tables must have key
relationships.
Target connection groups - Targets must be in
one target connection group.
Treat rows as insert - Use this option when you insert into the target. You cannot use updates with constraint-based loading.
Reserved Words
If any table name or column name contains a
database reserved word, such as MONTH or YEAR,
the session fails with database errors when the
Integration Service executes SQL against the
database.
You can create and maintain a reserved words file,
reswords.txt, in the server/bin directory.
When the Integration Service initializes a session, it
searches for reswords.txt.
If the file exists, the Integration Service places
quotes around matching reserved words when it
executes SQL against the database.
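For example, a reswords.txt sketch; the file groups reserved words under a heading for each database type (entries illustrative):
[Teradata]
MONTH
DATE
INTERVAL
[Oracle]
OPTION
START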
Task Validation
The Workflow Manager validates each task in the
workflow as you create it.
When you save or validate the workflow, the Workflow Manager validates all tasks in the workflow except Session tasks.
When you delete a reusable task, the Workflow
Manager removes the instance of the deleted task
from workflows.
The Workflow Manager also marks the workflow
invalid when you delete a reusable task used in a
workflow.
The Workflow Manager verifies that there are no duplicate task names in a folder, and that there are no duplicate task instances in the workflow.
Session Validation
If you delete objects associated with a Session task, such as the session configuration object or an Email or Command task, the Workflow Manager marks a reusable session invalid.
However, the Workflow Manager does not mark a
non-reusable session invalid if you delete an object
associated with the session.
If you delete a shortcut to a source or target from
the mapping, the Workflow Manager does not mark
the session invalid.
The Workflow Manager does not validate SQL
overrides or filter conditions entered in the session
properties when you validate a session.
You must validate SQL override and filter conditions
in the SQL Editor.
If a reusable or non-reusable session instance is
invalid, the Workflow Manager marks it invalid in
the Navigator and in the Workflow Designer
workspace.
Workflows using the session instance remain valid.
Expression Validation
The Workflow Manager validates all expressions in
the workflow. You can enter expressions in the
Assignment task, Decision task, and link conditions.
Partition Points
A stage is a section of a pipeline between any
two partition points.
When you set a partition point at a transformation, the new pipeline stage includes that transformation.
Figure: the default partition points and pipeline stages for a mapping with one pipeline.
Dynamic Partitioning
When you use dynamic partitioning, you can configure the partition information so the Integration Service determines the number of partitions to create at run time.
The Integration Service scales the number of
session partitions at run time based on
factors such as source database partitions or
the number of nodes in a grid.
If any transformation in a stage does not support partitioning, or if the partition configuration does not support dynamic partitioning, the Integration Service does not scale partitions in the pipeline. The session runs with one partition.
The following dynamic partitioning configurations cause a session to run with one partition:
1. You override the default cache directory for an
Aggregator, Joiner, Lookup, or Rank transformation.
The Integration Service partitions a transformation
cache directory when the default is $PMCacheDir.
2. You override the Sorter transformation default
work directory. The Integration Service partitions the
Sorter transformation work directory when the
default is $PMTempDir.
3. You use an open-ended range of numbers or date
keys with a key range partition type.
4. You use datatypes other than numbers or dates
as keys in key range partitioning.
5. You use key range relational target partitioning.
6. You create a user-defined SQL statement or a
user-defined source filter.
7. You set dynamic partitioning to the number of
nodes in the grid, and the session does not run on a
grid.
8. You use pass-through relational source partitioning.
9. You use dynamic partitioning with an Application
Source Qualifier.
10. You use SDK or PowerConnect sources and targets with dynamic partitioning.
Cache Partitioning
When you create a session with multiple partitions, the Integration Service may use cache partitioning for the Aggregator, Joiner, Lookup, Rank, and Sorter transformations.
When the Integration Service partitions a cache, it
creates a separate cache for each partition
and allocates the configured cache size to
each partition. The Integration Service stores
different data in each cache, where each cache
contains only the rows needed by that partition.
As a result, the Integration Service requires a portion
of total cache memory for each partition.
Pushdown Optimization
The amount of transformation logic that the Integration Service pushes to the database depends on the database, the transformation logic, and the mapping and session configuration.
The Integration Service processes all transformation logic that it cannot push to a database.
Pushdown Optimization Types
You can configure the following types of pushdown
optimization:
Source-side pushdown optimization
- The Integration Service pushes as much
transformation logic as possible to the source
database.
- Integration Service analyzes the mapping from the
source to the target or until it reaches a downstream
transformation it cannot push to the source
database.
- Integration Service generates and executes a
SELECT statement based on the transformation logic
for each transformation it can push to the database.
- Then, it reads the results of this SQL query and processes the remaining transformations.
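For instance, if a Filter transformation can be pushed to the source database, the generated SELECT might resemble the following (table and column names illustrative):
SELECT ITEMS.ITEM_ID, ITEMS.ITEM_NAME, ITEMS.PRICE
FROM ITEMS
WHERE (ITEMS.PRICE > 100)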
Target-side pushdown optimization
- The Integration Service pushes as much
transformation logic as possible to the target
database.
- Integration Service analyzes the mapping from the
target to the source or until it reaches an upstream
transformation it cannot push to the target
database.
- It generates an INSERT, DELETE, or UPDATE
statement based on the transformation logic for
each transformation it can push to the target
database.
Full pushdown optimization
- Integration Service attempts to push all
transformation logic to the target database.
- To use full pushdown optimization, the source and target databases must be in the same relational database management system.
- If the Integration Service cannot push all transformation logic to the database, it performs both source-side and target-side pushdown optimization.
- When you run a session with large quantities of
data and full pushdown optimization, the database
server must run a long transaction.
- Consider the following database performance
issues when you generate a long transaction:
- A long transaction uses more database
resources.
- A long transaction locks the database for longer
periods of time. This reduces database concurrency
and increases the likelihood of deadlock.
Pushdown Compatibility
To push a transformation with multiple connections
to a database, the connections must be pushdown
compatible.
Connections are pushdown compatible if they
connect to databases on the same database
management system and the Integration Service
can identify the database tables that the
connections access.
The following transformations can have multiple
connections:
Joiner - The Joiner transformation can join data from
multiple source connections.
Union - The Union transformation can merge data
from multiple source connections.
Recovery
If you configure a session for full pushdown
optimization and the session fails, the Integration
Service cannot perform incremental recovery
because the database processes the transformations.
Instead, the database rolls back the transactions.
If the database server fails, it rolls back transactions
when it restarts.
If the Integration Service fails, the database server
rolls back the transaction.
If the failure occurs while the Integration Service is
creating temporary sequence objects or views in
the database, which is before any rows have been
processed, the Integration Service runs the
generated SQL on the database again.
If the failure occurs before the database processes
all rows, the Integration Service performs the
following tasks:
1. If applicable, the Integration Service drops and
recreates temporary view or sequence objects in the
database to ensure duplicate values are not
produced.
2. The Integration Service runs the generated SQL
on the database again.
If the failure occurs while the Integration Service is
dropping the temporary view or sequence objects
from the database, which is after all rows are
processed, the Integration Service tries to drop the temporary objects again.
Using the $$PushdownConfig Mapping Parameter
Depending on the database workload, you might
want to use source-side, target-side, or full
pushdown optimization at different times.
For example, use source-side or target-side
pushdown optimization during the peak hours of
the day, but use full pushdown optimization from
midnight until 2 a.m. when database activity is low.
To use different pushdown optimization configurations at different times, use the $$PushdownConfig mapping parameter.
The settings in the $$PushdownConfig parameter
override the pushdown optimization settings in the
session properties.
Partitioning
You can push a session with multiple partitions to a database if the partition types are pass-through partitioning or key range partitioning.
- Timer
- Event-Wait
- Worklet
Grid Processing
Rules and Guidelines for Configuring a Workflow or
Session to Run on a Grid
- To run sessions over the grid, verify that the
operating system and bit mode is the same for each
node of the grid. A session might not run on the grid
if the nodes run on different operating systems or bit
modes.
- If you override a service process variable, ensure
that the Integration Service can access input files,
caches, logs, storage and temporary directories, and
source and target file directories.
- To ensure that a Session, Command, or predefined Event-Wait task runs on a particular node, configure the Integration Service to check resources and specify a resource requirement for the task.
- To ensure that session threads for a mapping
object run on a particular node, configure the
Integration Service to check resources and specify a
resource requirement for the object.
- When you run a session that creates cache files,
configure the root and cache directory to use a
shared location to ensure consistency between
cache files.
- Ensure the Integration Service builds the cache in a shared location when you add a partition point at a Joiner transformation and the transformation is configured for 1:n partitioning.
- MM/DD/YYYY HH24:MI:SS.MS
- MM/DD/RR HH24:MI:SS.US
- MM/DD/YYYY HH24:MI:SS.US
- MM/DD/RR HH24:MI:SS.NS
- MM/DD/YYYY HH24:MI:SS.NS
You can use the following separators: dash (-), slash
(/), backslash (\), colon (:), period (.), and space. The
Integration Service ignores extra spaces. You cannot
use one- or three-digit values for year or the HH12
format for hour.
Do not enclose parameter or variable values in
quotes - The Integration Service interprets
everything after the first equals sign as part
of the value.
Use a parameter or variable value of the
proper length for the error log table name
prefix - If you use a parameter or variable for the
error log table name prefix, do not specify a prefix
that exceeds 19 characters when naming Oracle,
Sybase, or Teradata error log tables. The error table
names can have up to 11 characters, and Oracle,
Sybase, and Teradata databases have a maximum
length of 30 characters for table names. The
parameter or variable name can exceed 19 characters.
Troubleshooting Parameters and Parameter Files
I have a section in a parameter file for a
session, but the Integration Service does not
seem to read it.
Make sure to enter folder and session names as they
appear in the Workflow Manager. Also, use the
appropriate prefix for all user-defined session
parameters.
I am trying to use a source file parameter to
specify a source file and location, but the
Integration Service cannot find the source file.
Make sure to clear the source file directory in the session properties. The Integration Service concatenates the source file directory with the source file name to locate the source file. Also, make sure to enter a directory local to the Integration Service and to use the appropriate delimiter for the operating system.
I am trying to run a workflow with a parameter
file and one of the sessions keeps failing.
The session might contain a parameter that is not
listed in the parameter file. The Integration Service
uses the parameter file to start all sessions in the
workflow. Check the session properties, and then
verify that all session parameters are defined
correctly in the parameter file.
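For example, a minimal parameter file sketch for one session (folder, workflow, session, and parameter names are illustrative; the source file parameter uses the $InputFile prefix):
[SalesEast.WF:wf_SalesAvg.ST:s_SalesAvg]
$InputFile_Orders=/data/srcfiles/orders.dat
$$DiscountRate=0.05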
Incremental Aggregation
If the source changes incrementally and you can
capture changes, you can configure the session to
process those changes.
This allows the Integration Service to update the
target incrementally, rather than forcing it to
process the entire source and recalculate the same
data each time you run the session.
For example, you might have a session using a
source that receives new data every day.
You can capture those incremental changes because
you have added a filter condition to the mapping
that removes pre-existing data from the flow of
data. You then enable incremental aggregation in the session properties.
Each subsequent time you run a session with incremental aggregation, the Integration Service applies the captured source changes to the existing aggregate calculations, using the historical cache data.
6. TRANSFORMATIONS
Active Transformations
An active transformation can perform any of the
following actions:
Change the number of rows that pass through
the transformation - For example, the Filter
transformation is active because it removes rows
that do not meet the filter condition. All multi-group
transformations are active because they might
change the number of rows that pass through the
transformation.
Change the transaction boundary - For example,
the Transaction Control transformation is active
because it defines a commit or roll back transaction
based on an expression evaluated for each row.
Ports
Port name - Use the following conventions while naming ports:
- Begin with a single- or double-byte letter or single- or double-byte underscore (_).
- Port names can contain any of the following single- or double-byte characters: a letter, number, underscore (_), $, #, or @.
Creating a Transformation
You can create transformations using the following
Designer tools:
Mapping Designer - Create transformations that
connect sources to targets. Transformations in a
mapping cannot be used in other mappings unless
you configure them to be reusable.
Transformation Developer - Create individual transformations, called reusable transformations, that you use in multiple mappings.
Mapplet Designer - Create and configure a set of transformations, called mapplets, that you use in multiple mappings.
Variable Initialization
The Integration Service does not set the initial value
for variables to NULL. Instead, the Integration
Service uses the following guidelines to set initial
values for variables:
- Zero for numeric ports
- Empty strings for string ports
- 01/01/1753 for Date/Time ports with PMServer 4.0
date handling compatibility disabled
- 01/01/0001 for Date/Time ports with PMServer 4.0
date handling compatibility enabled
Reusable Transformations
The Designer stores each reusable transformation as
metadata separate from any mapping that uses the
transformation.
If you review the contents of a folder in the Navigator, you see the list of all reusable transformations in that folder.
You can create most transformations as non-reusable or reusable.
However, you can only create the External
Procedure transformation as a reusable
transformation.
When you add instances of a reusable transformation to mappings, you must be careful that changes you make to the transformation do not invalidate the mapping or generate unexpected data.
Instances and Inherited Changes
Note that instances do not inherit changes to
property settings, only modifications to
ports, expressions, and the name of the
transformation.
A. AGGREGATOR
The Integration Service performs aggregate calculations as it reads and stores the necessary group and row data in an aggregate cache.
Unlike the Expression transformation, you use the Aggregator transformation to perform calculations on groups.
The Expression transformation permits you to
perform calculations on a row-by-row basis.
After you create a session that includes an Aggregator transformation, you can enable the session option, Incremental Aggregation. When the Integration Service performs incremental aggregation, it passes source data through the mapping and uses historical cache data to perform the aggregation calculations incrementally.
Group By Ports
When you group values, the Integration Service
produces one row for each group.
If you do not group values, the Integration Service
returns one row for all input rows.
The Integration Service typically returns the
last row of each group (or the last row
received) with the result of the aggregation.
However, if you specify a particular row to be returned (for example, by using the FIRST function), the Integration Service then returns the specified row.
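For example, with STORE_ID selected as a group by port, an output port aggregate expression might be (port names illustrative):
SUM(QUANTITY * PRICE)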
B. CUSTOM TRANSFORMATION
Custom transformations operate in conjunction with
procedures you create outside of the Designer
interface to extend PowerCenter functionality.
You can create a Custom transformation and bind it
to a procedure that you develop using the Custom
transformation functions.
Use the Custom transformation to create
transformation applications, such as sorting
and aggregation, which require all input rows
to be processed before outputting any output
rows.
To support this process, the input and output
functions occur separately in Custom
transformations compared to External
Procedure transformations.
The Integration Service passes the input data
to the procedure using an input function.
The output function is a separate function
that you must enter in the procedure code to
pass output data to the Integration Service.
In contrast, in the External Procedure
transformation, an external procedure
function does both input and output, and its
parameters consist of all the ports of the
transformation.
You can also use the Custom transformation to
create a transformation that requires
multiple input groups, multiple output
groups, or both
Rules and Guidelines for Custom Transformations
- Custom transformations are connected
transformations. You cannot reference a Custom
transformation in an expression.
- You can include multiple procedures in one
module. For example, you can include an XML
writer procedure and an XML parser procedure in the
same module.
C. DATA MASKING
The Data Masking transformation modifies source
data based on masking rules that you configure for
each column.
You can maintain data relationships in the masked
data and maintain referential integrity between
database tables.
Locale
The locale identifies the language and region of the
characters in the data.
Choose a locale from the list.
The Data Masking transformation masks character
data with characters from the locale that you
choose.
The source data must contain characters that are
compatible with the locale that you select.
Seed
The seed value is a start number that enables
the Data Masking transformation to return
deterministic data with Key Masking.
<PC
Directory>\infa_shared\SrcFiles\defaultValue.xml
D. EXTERNAL PROCEDURE
External Procedure transformations operate in
conjunction with procedures you create outside of
the Designer interface to extend PowerCenter
functionality.
If you are an experienced programmer, you may
want to develop complex functions within a
dynamic link library (DLL) or UNIX shared library,
instead of creating the necessary Expression
transformations in a mapping.
To get this kind of extensibility, use the
Transformation Exchange (TX) dynamic invocation
interface built into PowerCenter.
E. FILTER
A filter condition returns TRUE or FALSE for each row
that the Integration Service evaluates, depending
on whether a row meets the specified condition.
For each row that returns TRUE, the Integration Service passes the row through the transformation.
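For example, a filter condition might be (port names illustrative):
SALARY > 30000 AND DEPT_NAME = 'SALES'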
F. HTTP
The HTTP transformation enables you to connect to
an HTTP server to use its services and applications.
When you run a session with an HTTP
transformation, the Integration Service connects to
the HTTP server and issues a request to retrieve
data from or update data on the HTTP server,
depending on how you configure the
transformation:
Read data from an HTTP server - When the Integration Service reads data from an HTTP server, it retrieves the data from the HTTP server and
passes the data to the target or a downstream
transformation in the mapping. For example, you
can connect to an HTTP server to read current
inventory data, perform calculations on the data
during the PowerCenter session, and pass the data
to the target.
Update data on the HTTP server - When the
Integration Service writes to an HTTP server, it posts
data to the HTTP server and passes HTTP server
responses to the target or a downstream
transformation in the mapping. For example, you
can post data providing scheduling information from
upstream transformations to the HTTP server during
a session
G. IDENTITY RESOLUTION
The Identity Resolution transformation is an active
transformation that you can use to search and
match data in Informatica Identity Resolution
(IIR).
The PowerCenter Integration Service uses the search
definition that you specify in the Identity Resolution
transformation to search and match data residing in
the IIR tables.
The input and output views in the system determine
the input and output ports of the transformation.
H. JAVA
Extend PowerCenter functionality with the Java
transformation.
The Java transformation provides a simple native
programming interface to define transformation
functionality with the Java programming language.
You can use the Java transformation to quickly
define simple or moderately complex
transformation functionality without advanced
knowledge of the Java programming language or an
external Java development environment.
The PowerCenter Client uses the Java
Development Kit (JDK) to compile the Java
code and generate byte code for the
transformation.
The PowerCenter Client stores the byte code in
the PowerCenter repository.
The Integration Service uses the Java Runtime
Environment (JRE) to execute generated byte
code at run time.
When the Integration Service runs a session with a
Java transformation, the Integration Service uses
the JRE to execute the byte code and process input
rows and generate output rows.
Create Java transformations by writing Java code
snippets that define transformation logic.
Define transformation behavior for a Java
transformation based on the following events:
- The transformation receives an input row.
- The transformation has processed all input rows.
- The transformation receives a transaction
notification such as commit or rollback.
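For example, a hedged sketch of an On Input Row code snippet, assuming hypothetical input ports IN_PRICE and IN_QTY and an output port OUT_TOTAL defined on the transformation:
// On Input Row tab: runs once for each input row.
// Skip rows with missing values, otherwise emit one output row.
if (!isNull("IN_PRICE") && !isNull("IN_QTY"))
{
    OUT_TOTAL = IN_PRICE * IN_QTY;   // ports are available as Java variables
    generateRow();                   // pass the output row downstream
}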
I. JOINER
The master pipeline ends at the Joiner
transformation, while the detail pipeline
continues to the target.
The Joiner transformation accepts input from most
transformations. However, consider the following
limitations on the pipelines you connect to the
Joiner transformation:
- You cannot use a Joiner transformation when
either input pipeline contains an Update
Strategy transformation.
For example, the Joiner transformation receives data sorted on the following ports, in order:
1. ITEM_NO
2. ITEM_NAME
3. PRICE
When you configure the join condition, use the
following guidelines to maintain sort order:
- You must use ITEM_NO in the first join condition.
- If you add a second join condition, you must use
ITEM_NAME.
- If you want to use PRICE in a join condition,
you must also use ITEM_NAME in the second
join condition.
If you skip ITEM_NAME and join on ITEM_NO
and PRICE, you lose the sort order and the
Integration Service fails the session.
Joining Two Branches of the Same Pipeline
When you join data from the same source, you can
create two branches of the pipeline.
When you branch a pipeline, you must add a
transformation between the source qualifier
and the Joiner transformation in at least one
branch of the pipeline.
You must join sorted data and configure the Joiner
transformation for sorted input.
Figure: a mapping that joins two branches of the same pipeline.
J. LOOKUP
Use a Lookup transformation in a mapping to look up data in a flat file, relational table, view, or synonym.
Pipeline Lookups
Create a pipeline Lookup transformation to
perform a lookup on an application source
that is not a relational table or flat file.
A pipeline Lookup transformation has a source
qualifier as the lookup source. The source qualifier
can represent any type of source definition,
including JMS and MSMQ.
The source definition cannot have more than one
group.
When you configure a pipeline Lookup
transformation, the lookup source and source
qualifier are in a different pipeline from the Lookup
transformation.
The source and source qualifier are in a partial
pipeline that contains no target.
The Integration Service reads the source data in this
pipeline and passes the data to the Lookup
transformation to create the cache.
Lookup Caches
You can configure a Lookup transformation to cache
the lookup file or table.
The Integration Service builds a cache in memory
when it processes the first row of data in a cached
Lookup transformation.
It allocates memory for the cache based on the
amount you configure in the transformation or
session properties.
The Integration Service stores condition values in
the index cache and output values in the data
cache.
The Integration Service queries the cache for each
row that enters the transformation.
The Integration Service also creates cache files by
default in the $PMCacheDir.
If the data does not fit in the memory cache, the
Integration Service stores the overflow values in
the cache files.
When the session completes, the Integration Service
releases cache memory and deletes the cache files
unless you configure the Lookup transformation to
use a persistent cache.
If you use a flat file or pipeline lookup, the
Integration Service always caches the lookup
source.
If you configure a flat file lookup for sorted input, the
Integration Service cannot cache the lookup if the
condition columns are not grouped.
Shared cache - You can share the lookup cache between multiple transformations.
You can share an unnamed cache between
transformations in the same mapping.
You can share a named cache between
transformations in the same or different mappings.
Lookup transformations can share unnamed static
caches within the same target load order group if
the cache sharing rules match. Lookup
transformations cannot share dynamic cache within
the same target load order group.
Re-cache from source - If the persistent cache is not synchronized with the lookup table, you can configure the Lookup transformation to rebuild the lookup cache.
Static cache - You can configure a static, or read-only, cache for any lookup source.
By default, the Integration Service creates a static
cache.
It caches the lookup file or table and looks up values
in the cache for each row that comes into the
transformation.
When the lookup condition is true, the Integration
Service returns a value from the lookup cache.
The Integration Service does not update the cache
while it processes the Lookup transformation.
Dynamic cache - To cache a table, flat file, or source definition and update the cache, configure a Lookup transformation with dynamic cache.
The Integration Service dynamically inserts or
updates data in the lookup cache and passes the
data to the target.
The dynamic cache is synchronized with the target.
K. NORMALIZER
The Normalizer transformation generates a key for
each source row.
L. RANK
You can select only the top or bottom rank of data with a Rank transformation.
Use a Rank transformation to return the largest or
smallest numeric value in a port or group.
You can also use a Rank transformation to return the
strings at the top or the bottom of a session sort
order.
During the session, the Integration Service caches
input data until it can perform the rank calculations.
M. ROUTER
A Filter transformation tests data for one condition
and drops the rows of data that do not meet the
condition.
However, a Router transformation tests data for one or more conditions and gives you the option to route rows of data that do not meet any of the conditions to a default output group.
N. SEQUENCE GENERATOR
The Sequence Generator transformation generates
numeric values.
Use the Sequence Generator to create unique
primary key values, replace missing primary keys,
or cycle through a sequential range of numbers.
The Sequence Generator transformation is a
connected transformation.
It contains two output ports that you can connect to
one or more transformations.
The Integration Service generates a block of sequence numbers each time a block of rows enters a connected transformation.
If you connect CURRVAL, the Integration Service
processes one row in each block.
When NEXTVAL is connected to the input port of
another transformation, the Integration Service
generates a sequence of numbers.
When CURRVAL is connected to the input port of
another transformation, the Integration Service
generates the NEXTVAL value plus the Increment By value.
You can make a Sequence Generator reusable, and use it in multiple mappings.
NEXTVAL
Connect NEXTVAL to multiple transformations to
generate unique values for each row in each
transformation.
Use the NEXTVAL port to generate sequence
numbers by connecting it to a downstream
transformation or target.
You connect the NEXTVAL port to generate the
sequence based on the Current Value and
Increment By properties.
If the Sequence Generator is not configured to cycle
through the sequence, the NEXTVAL port generates
sequence numbers up to the configured End Value.
CURRVAL
CURRVAL is NEXTVAL plus the Increment By value.
You typically only connect the CURRVAL port
when the NEXTVAL port is already connected
to a downstream transformation.
When a row enters a transformation connected to the CURRVAL port, the Integration Service passes the last created NEXTVAL value plus one.
Figure: connecting the CURRVAL and NEXTVAL ports to a target.
For example, you configure the Sequence Generator
transformation as follows:
Current Value = 1, Increment By = 1.
The Integration Service generates the following
values for NEXTVAL and CURRVAL:
NEXTVAL  CURRVAL
1        2
2        3
3        4
4        5
5        6
If you connect the CURRVAL port without
connecting the NEXTVAL port, the Integration
Service passes a constant value for each row.
When you connect the CURRVAL port in a Sequence
Generator transformation, the Integration Service
processes one row in each block.
End Value
End Value is the maximum value you want the Integration Service to generate. If the Integration Service reaches the end value and the Sequence Generator is not configured to cycle through the sequence, the session fails with the following error message:
TT_11009 Sequence Generator Transformation: Overflow error.

O. SORTER
P. SOURCE QUALIFIER
The Source Qualifier transformation represents the
rows that the Integration Service reads when it runs
a session.
Use the Source Qualifier transformation to complete
the following tasks:
Join data originating from the same source database - You can join two or more tables with primary key-foreign key relationships by linking the sources to one Source Qualifier transformation.
Filter rows when the Integration Service reads source data - If you include a filter condition, the Integration Service adds a WHERE clause to the default query.
Specify an outer join rather than the default inner join - If you include a user-defined join, the Integration Service replaces the join information specified by the metadata in the SQL query.
Specify sorted ports - If you specify a number for sorted ports, the Integration Service adds an ORDER BY clause to the default SQL query.
Select only distinct values from the source - If you choose Select Distinct, the Integration Service adds a SELECT DISTINCT statement to the default SQL query.
Create a custom query to issue a special SELECT statement for the Integration Service to read source data - For example, you might use a custom query to perform aggregate calculations.
If the datatypes in the source definition and
Source Qualifier transformation do not
match, the Designer marks the mapping
invalid when you save it.
You specify a target load order based on the Source Qualifier transformations in a mapping.
If one Source Qualifier transformation provides data for multiple targets, you can enable constraint-based loading in a session to have the Integration Service load data based on target table primary and foreign key relationships.
You can use parameters and variables in the SQL
query, user-defined join, source filter, and pre- and
post-session SQL commands of a Source Qualifier
transformation
The Integration Service first generates an SQL
query and expands each parameter or
variable.
It replaces each mapping parameter, mapping
variable, and workflow variable with its start value.
Then it runs the query on the source database
Source Qualifier Transformation Properties
Default Query
For relational sources, the Integration Service
generates a query for each Source Qualifier
transformation when it runs a session.
The default query is a SELECT statement for
each source column used in the mapping.
In other words, the Integration Service reads
only the columns that are connected to
another transformation
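As a sketch, if only two columns of a hypothetical CUSTOMERS source are connected downstream, the generated default query selects just those columns:
SELECT CUSTOMERS.CUSTOMER_ID, CUSTOMERS.LAST_NAME FROM CUSTOMERS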
If any table name or column name contains a
database reserved word, you can create and
maintain a file, reswords.txt, containing reserved
words.
When the Integration Service initializes a session, it
searches for reswords.txt in the Integration Service
installation directory.
If the file exists, the Integration Service places
quotes around matching reserved words when it
executes SQL against the database.
If you override the SQL, you must enclose any
reserved word in quotes.
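As an illustrative sketch (the section headers and entries here are hypothetical examples, not a complete list), a reswords.txt file groups reserved words under bracketed database-type headers:
[Teradata]
MONTH
DATE
[Oracle]
OPTION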
When a mapping uses related relational sources, you
can join both sources in one Source Qualifier
transformation.
During the session, the source database performs
the join before passing data to the Integration
Service
Custom Join You might need to override the default join under
the following circumstances:
- Columns do not have a primary key-foreign key
relationship.
- The datatypes of columns used for the join do not
match.
- You want to specify a different type of join, such as
an outer join.
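For example, a minimal user-defined join across two hypothetical related sources, entered as the contents of the WHERE clause:
CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID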
Adding an SQL Query
The Source Qualifier transformation provides the
SQL Query option to override the default query.
You can enter an SQL statement supported by the
source database.
Before entering the query, connect all the input and
output ports you want to use in the mapping.
Entering a User-Defined Join
Entering a user-defined join is similar to entering a
custom SQL query.
However, you only enter the contents of the WHERE
clause, not the entire query.
When you perform an outer join, the Integration
Service may insert the join syntax in the WHERE
clause or the FROM clause of the query, depending
on the database syntax.
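As a sketch, assuming hypothetical CUSTOMERS and ORDERS sources, an outer join override is entered inside braces so the Integration Service can translate it to the source database syntax:
{ CUSTOMERS LEFT OUTER JOIN ORDERS ON CUSTOMERS.CUSTOMER_ID = ORDERS.CUSTOMER_ID }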
Q. SQL TRANSFORMATION
The SQL transformation processes SQL queries
midstream in a pipeline.
You can insert, delete, update, and retrieve rows
from a database.
You can pass the database connection information to
the SQL transformation as input data at run time.
The transformation processes external SQL scripts or
SQL queries that you create in an SQL editor.
The SQL transformation processes the query and
returns rows and database errors
When you create an SQL transformation, you
configure the following options:
Mode - The SQL transformation runs in one of the
following modes:
Script mode: The SQL transformation runs ANSI SQL
scripts that are externally located. You pass a script
name to the transformation with each input row.
The SQL transformation outputs one row for
each input row.
Query mode: The SQL transformation executes a
query that you define in a query editor. You can pass
strings or parameters to the query to define
dynamic queries or change the selection
parameters. You can output multiple rows when
the query has a SELECT statement.
You can pass the full query or pass part of the query
in an input port:
Full query - You can substitute the entire SQL query
with query statements from source data.
Partial query - You can substitute a portion of the
query statement, such as the table name.
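For example, a sketch of a partial query in query mode, where ~Table_Port~ is a hypothetical string-substitution input port that supplies the table name at run time:
SELECT EMP_ID, EMP_NAME FROM ~Table_Port~ WHERE DEPT_ID = 10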
You can add pass-through ports to the SQL
transformation
When the source row contains a SELECT query
statement, the SQL transformation returns the data
in the pass-through port in each row it returns from
the database.
R. STORED PROCEDURE
There are three types of data that pass between the
Integration Service and the stored procedure:
- Input/output parameters
- Return values
- Status codes
Changing the Stored Procedure
If the number of parameters or the return value in a
stored procedure changes, you can either re-import
it or edit the Stored Procedure transformation
manually.
The Designer does not verify the Stored
Procedure transformation each time you open
the mapping.
After you import or create the transformation, the
Designer does not validate the stored procedure.
The session fails if the stored procedure does not
match the transformation.
Configuring an Unconnected Transformation
An unconnected Stored Procedure transformation is
not directly connected to the flow of data through
the mapping.
Instead, the stored procedure runs either:
From an expression - Called from an expression
written in the Expression Editor within another
transformation in the mapping.
Pre- or post-session - Runs before or after a
session
When using an unconnected Stored Procedure transformation in an expression, you need a method of returning the value of output parameters to a port, such as assigning the output value to a local variable or to the system variable PROC_RESULT.
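For example, a sketch of calling an unconnected Stored Procedure transformation from an expression, assuming a hypothetical procedure named GET_NAME_FROM_ID whose output is captured through PROC_RESULT:
:SP.GET_NAME_FROM_ID( inID, PROC_RESULT )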
S. TRANSACTION CONTROL
A transaction is the set of rows bound by commit or
roll back rows. You can define a transaction based
on a varying number of input rows. You might want
to define transactions based on a group of rows
ordered on a common key, such as employee ID or
order entry date.
In PowerCenter, you define transaction control at the
following levels:
Within a mapping - Within a mapping, you use the
Transaction Control transformation to define a
transaction. You define transactions using an
expression in a Transaction Control transformation.
Based on the return value of the expression, you can
choose to commit, roll back, or continue without any
transaction changes.
Within a session - When you configure a session,
you configure it for user-defined commit. You can
choose to commit or roll back a transaction if the
Integration Service fails to transform or write any
row to the target.
If the mapping has a flat file target, you can
generate an output file each time the
Integration Service starts a new transaction.
You can dynamically name each target flat
file.
Transaction Control Transformation Properties
Use the Transaction Control transformation to define
conditions to commit and roll back transactions
from transactional targets.
Transactional targets include relational, XML,
and dynamic MQSeries targets
The transaction control expression uses the IIF
function to test each row against the condition.
Use the following syntax for the expression:
IIF (condition, value1, value2)
The Integration Service evaluates the condition on a
row-by-row basis.
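For example, a minimal sketch, assuming a hypothetical NEW_DATE flag port, that commits the open transaction before each row where the date changes and otherwise continues the transaction:
IIF( NEW_DATE = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION )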
T. UNION
The Integration Service processes all input
groups in parallel.
It concurrently reads sources connected to the Union
transformation and pushes blocks of data into the
input groups of the transformation.
You can connect heterogeneous sources to a Union
transformation.
The transformation merges sources with matching
ports and outputs the data from one output group
with the same ports as the input groups.
The Union transformation is developed using the
Custom transformation
Similar to the UNION ALL statement, the Union
transformation does not remove duplicate rows
Rules and Guidelines for Union
- You can create multiple input groups, but only one output group.
- All input groups and the output group must have
matching ports. The precision, datatype, and scale
must be identical across all groups.
- The Union transformation does not remove duplicate rows. To remove duplicate rows, you must add another transformation, such as a Router or Filter transformation, downstream of the Union transformation.
U. UPDATE STRATEGY
In PowerCenter, you set the update strategy at two
different levels:
Within a session - When you configure a session,
you can instruct the Integration Service to either
treat all rows in the same way (for example, treat all
rows as inserts), or use instructions coded into the
session mapping to flag rows for different database
operations.
Within a mapping - Within a mapping, you use the
Update Strategy transformation to flag rows for
insert, delete, update, or reject.
Note: You can also use the Custom transformation
to flag rows for insert, delete, update, or reject
Flagging Rows Within a Mapping
For the greatest degree of control over the update
strategy, you add Update Strategy transformations
to a mapping.
The most important feature of this transformation is
its update strategy expression, used to flag
individual rows for insert, delete, update, or reject.
The following table lists the constants for each
database operation and their numeric equivalent:
DD_INSERT - 0
DD_UPDATE - 1
DD_DELETE - 2
DD_REJECT - 3
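For example, a minimal update strategy expression, assuming a hypothetical PURCHASES port, that rejects rows with no purchases and flags all other rows for update:
IIF( PURCHASES = 0, DD_REJECT, DD_UPDATE )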
V. XML GENERATOR
Use an XML Generator transformation to create XML
inside a pipeline.
The XML Generator transformation lets you read
data from messaging systems, such as TIBCO and
MQ Series, or from other sources, such as files or
databases.
The XML Generator transformation functionality is
similar to the XML target functionality, except it
generates the XML in the pipeline.
For example, you might want to extract data from
relational sources and pass XML data to targets.
The XML Generator transformation accepts data
from multiple ports and writes XML through a single
output port.
W. XML PARSER
Use an XML Parser transformation to extract XML
inside a pipeline.
The XML Parser transformation lets you extract XML
data from messaging systems, such as TIBCO or
MQ Series, and from other sources, such as files or
databases.
The XML Parser transformation functionality is
similar to the XML source functionality, except it
parses the XML in the pipeline.
For example, you might want to extract XML data
from a TIBCO source and pass the data to relational
targets.
The XML Parser transformation reads XML data from
a single input port and writes data to one or more
output ports.
7. TRANSFORMATION LANGUAGE REFERENCE
Reserved Words
Some keywords in the transformation language,
such as constants, operators, and built-in variables,
are reserved for specific functions. These include:
- :EXT
- :INFA
- :LKP
- :MCR
- :SD
- :SEQ
- :SP
- :TD
- AND
- DD_DELETE
- DD_INSERT
- DD_REJECT
- DD_UPDATE
- FALSE
- NOT
- NULL
- OR
- PROC_RESULT
- SESSSTARTTIME
- SPOUTPUT
- SYSDATE
- TRUE
- WORKFLOWSTARTTIME
The following words are reserved for workflow
expressions:
ABORTED
DISABLED
FAILED
NOTSTARTED
STARTED
STOPPED
SUCCEEDED
Example
The following expression produces the same return
values for any current year between 1950 and 2049:
TO_DATE( ORDER_DATE, 'MM/DD/RR' )
ORDER_DATE  RETURN_VALUE
'04/12/98'  04/12/1998 00:00:00.000000000
'11/09/01'  11/09/2001 00:00:00.000000000
The following expression converts dates in the SHIP_DATE port to the Julian day:
TO_CHAR( SHIP_DATE, 'J' )
SHIP_DATE  RETURN_VALUE
Dec 31 1999 23:59:59  2451544
Jan 1 1900 01:02:03  2415021
RR FORMAT STRING
The transformation language provides the RR format
string to convert strings with two-digit years to
dates.
Using TO_DATE and the RR format string, you can
convert a string in the format MM/DD/RR to a date.
The RR format string converts data differently
depending on the current year.
Current Year Between 0 and 49 - If the current year
is between 0 and 49 (such as 2003) and the source
string year is between 0 and 49, the PowerCenter
Integration Service returns the current century plus
the two-digit year from the source string. If the
source string year is between 50 and 99, the
Integration Service returns the previous century
plus the two-digit year from the source string.
Current Year Between 50 and 99 - If the current year
is between 50 and 99 (such as 1998) and the
source string year is between 0 and 49, the
PowerCenter Integration Service returns the next century plus the two-digit year from the source string. If the source string year is between 50 and 99, the Integration Service returns the current century plus the two-digit year from the source string.
CHR
ASCII Mode - CHR returns the ASCII character
corresponding to the numeric value you pass
to this function.
Unicode Mode - returns the Unicode character
corresponding to the numeric value you pass to this
function
ASCII values fall in the range 0 to 255.
You can pass any integer to CHR, but only ASCII
codes 32 to 126 are printable characters.
Syntax
CHR( numeric_value )
Example
The following expression uses CHR(39) to embed an apostrophe in a string:
CONCAT( 'Joan', CONCAT( CHR(39), 's car' ) )
The return value is: Joan's car
CHRCODE
ASCII Mode - CHRCODE returns the numeric ASCII
value of the first character of the string passed to
the function.
UNICODE Mode - returns the numeric Unicode value
of the first character of the string passed to the
function
COMPRESS
Compresses data using the zlib 1.2.1 compression
algorithm.
Use the COMPRESS function before you send large
amounts of data over a wide area network.
Syntax
COMPRESS( value )
Return Value
Compressed binary value of the input value.
NULL if the input is a null value.
CONCAT
Syntax
CONCAT( first_string, second_string )
Return Value
String.
If one of the strings is NULL, CONCAT ignores
it and returns the other string.
If both strings are NULL, CONCAT returns NULL.
COUNT
Returns the number of values in a group. Optionally, you can include the asterisk (*) argument to count all input values in a transformation.
CONVERT_BASE
Converts a number from one base value to another
base value.
Syntax
CONVERT_BASE( value, source_base, dest_base )
The following example converts 2222 from the decimal base value 10 to the binary base value 2; the PowerCenter Integration Service returns 100010101110:
CONVERT_BASE( "2222", 10, 2 )
CRC32
Returns a 32-bit Cyclic Redundancy Check (CRC32) value.
Syntax
CRC32( value )
CUME
Returns a running total. A running total means CUME
returns a total each time it adds a value.
You can add a condition to filter rows out of the row
set before calculating the running total.
Use CUME and similar functions (such as
MOVINGAVG and MOVINGSUM) to simplify reporting
by calculating running values.
Syntax
CUME( numeric_value [, filter_condition] )
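For example, a sketch over a hypothetical SALES port; with inputs 600, 504, and 36, it returns the running totals 600, 1104, and 1140:
CUME( SALES )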
DATE_COMPARE
Returns an integer indicating which of two dates
is earlier. DATE_COMPARE returns an integer value
rather than a date value.
Return Value
-1 if the first date is earlier.
0 if the two dates are equal.
1 if the second date is earlier.
NULL if one of the date values is NULL.
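For example, a sketch comparing two hypothetical date ports; the expression returns -1 for rows where DATE_PROMISED is earlier than DATE_SHIPPED:
DATE_COMPARE( DATE_PROMISED, DATE_SHIPPED )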
DATE_DIFF
Returns the length of time between two dates. You
can request the format to be years, months, days,
hours,
minutes,
seconds,
milliseconds,
microseconds, or nanoseconds.
The PowerCenter Integration Service subtracts the
second date from the first date and returns the
difference.
Syntax
DATE_DIFF( date1, date2, format )
Return Value
Double value. If date1 is later than date2, the return
value is a positive number. If date1 is earlier than
date2, the return value is a negative number.
0 if the dates are the same.
NULL if one (or both) of the date values is NULL.
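For example, a sketch that returns the number of days between two hypothetical date ports as a double:
DATE_DIFF( DATE_PROMISED, DATE_SHIPPED, 'DD' )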
DECODE
Searches a port for a value you specify. If the function finds the value, it returns a result value, which you define.
Examples
You might use DECODE in an expression that
searches for a particular ITEM_ID and returns the
ITEM_NAME:
DECODE( ITEM_ID, 10, 'Flashlight',
14, 'Regulator',
20, 'Knife',
40, 'Tank',
'NONE' )
ITEM_ID  RETURN VALUE
10  Flashlight
14  Regulator
17  NONE
20  Knife
25  NONE
NULL  NONE
40  Tank
The following expression tests multiple columns and conditions, evaluated in a top-down order:
DECODE( TRUE,
Var1 = 22, 'Variable 1 was 22!',
Var2 = 49, 'Variable 2 was 49!',
Var1 < 23, 'Variable 1 was less than 23.',
Var2 > 30, 'Variable 2 was more than 30.',
'Variables were out of desired ranges.' )
Var1  Var2  RETURN VALUE
21  47  Variable 1 was less than 23.
22  49  Variable 1 was 22!
23  49  Variable 2 was 49!
24  27  Variables were out of desired ranges.
25  50  Variable 2 was more than 30.
ERROR
Causes the PowerCenter Integration Service to skip
a row and issue an error message, which you
define.
The error message displays in the session log.
The PowerCenter Integration Service does not write
these skipped rows to the session reject file.
For example, you use the ERROR function in an
expression, and you assign the default value,
1234, to the output port.
Each time the PowerCenter Integration Service
encounters the ERROR function in the expression, it
overrides the error with the value 1234 and
passes 1234 to the next transformation.
It does not skip the row, and it does not log an error
in the session log
IIF( SALARY < 0, ERROR ('Error. Negative salary found. Row skipped.'), EMP_SALARY )
SALARY  RETURN VALUE
10000  10000
-15000  'Error. Negative salary found. Row skipped.'
NULL  NULL
150000  150000
EXP
Returns e raised to the specified power (exponent),
where e=2.71828183.
Syntax
EXP( exponent )
Return Value
Double value.
NULL if a value passed as an argument to the
function is NULL
FIRST
Returns the first value found within a port or group.
Optionally, you can apply a filter to limit the rows
the PowerCenter Integration Service reads.
You can nest only one other aggregate function
within FIRST.
Syntax
FIRST( value [, filter_condition ] )
Return Value
First value in a group
If a value is NULL, FIRST ignores the row. However, if
all values passed from the port are NULL, FIRST
returns NULL
FIRST groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port, FIRST treats all rows as
one group, returning one value.
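For example, a sketch, assuming hypothetical ITEM_NAME and ITEM_PRICE ports, that returns the first item name in each group with a price above 10.00:
FIRST( ITEM_NAME, ITEM_PRICE > 10.00 )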
FLOOR
Returns the largest integer less than or equal to the
numeric value you pass to this function.
For example, if you pass 3.14 to FLOOR, the function
returns 3.
If you pass 3.98 to FLOOR, the function returns 3.
Likewise, if you pass -3.17 to FLOOR, the function
returns -4.
Syntax
FLOOR( numeric_value )
Return Value
Integer if you pass a numeric value with declared precision between 0 and 28.
Double if you pass a numeric value with declared precision greater than 28.
FV
Returns the future value of an investment, where you make periodic, constant payments and the investment earns a constant interest rate.
Syntax
FV( rate, terms, payment [, present value, type] )
Example
You deposit $2,000 into an account that earns 9%
annual interest compounded monthly (monthly
interest of 9%/ 12, or 0.75%).
You plan to deposit $250 at the beginning of every
month for the next 12 months.
The following expression returns $5,337.96 as the
account balance at the end of 12 months:
FV(0.0075, 12, -250, -2000, TRUE)
GET_DATE_PART
Returns the specified part of a date as an integer
value.
Therefore, if you create an expression that returns
the month portion of the date, and pass a date
such as Apr 1 1997 00:00:00, GET_DATE_PART
returns 4.
Syntax
GET_DATE_PART( date, format )
Return Value
Integer representing the specified part of the date.
NULL if a value passed to the function is NULL.
GREATEST
Returns the greatest value from a list of input values. By default, the match is case sensitive.
Syntax
GREATEST( value1, [value2, ..., valueN,] CaseFlag )
IN
Matches input data to a list of values. By default, the
match is case sensitive.
Syntax
IN( valueToSearch, value1, [value2, ..., valueN,]
CaseFlag )
Example
The following expression determines if the input
value is a safety knife, chisel point knife, or
medium titanium knife.
The input values do not have to match the case of
the values in the comma-separated list:
IN( ITEM_NAME, 'Chisel Point Knife', 'Medium Titanium Knife', 'Safety Knife', 0 )
IIF
Returns one of two values you specify, based on the results of a condition.
Syntax
IIF( condition, value1 [,value2] )
Unlike conditional functions in some systems, the
FALSE (value2) condition in the IIF function is not
required.
If you omit value2, the function returns the
following when the condition is FALSE:
- 0 if value1 is a Numeric datatype.
- Empty string if value1 is a String datatype.
- NULL if value1 is a Date/Time datatype.
For example, the following expression does not
include a FALSE condition and value1 is a string
datatype so the PowerCenter Integration Service
returns an empty string for each row that evaluates
to FALSE: IIF( SALES > 100, EMP_NAME )
SALES  EMP_NAME  RETURN VALUE
150  John Smith  John Smith
50  Pierre Bleu  '' (empty string)
120  Sally Green  Sally Green
NULL  Greg Jones  '' (empty string)
ITEM_NAME  RETURN VALUE
Stabilizing Vest  0 (FALSE)
Safety knife  1 (TRUE)
Medium Titanium knife  1 (TRUE)
NULL  NULL
INDEXOF
Finds the index of a value among a list of values. By
default, the match is case sensitive.
Syntax
INDEXOF( valueToSearch, string1 [, string2, ..., stringN,] CaseFlag )
Example
The following expression determines if values from the ITEM_NAME port match the first, second, or third string:
INDEXOF( ITEM_NAME, 'diving hood', 'flashlight', 'safety knife' )
ITEM_NAME  RETURN VALUE
Safety Knife  0
diving hood  1
Compass  0
safety knife  3
flashlight  2
Syntax
INITCAP( string )
Example
The following expression capitalizes all names in the
FIRST_NAME port: INITCAP( FIRST_NAME )
FIRST_NAME
ramona
18-albert
NULL
?!SAM
THOMAS
PierRe
RETURN VALUE
Ramona
18-Albert
NULL
?!Sam
Thomas
Pierre
INSTR
Returns the position of a character set in a string,
counting from left to right.
Syntax
INSTR( string, search_value [, start [, occurrence [, comparison_type ]]] )
Return Value
Integer if the search is successful. The integer represents the position of the first character in the search_value, counting from left to right.
0 if the search is unsuccessful.
NULL if a value passed to the function is NULL.
Example
The following expression returns the position of the second occurrence of the letter a, starting at the beginning of each company name. Because the search_value argument is case sensitive, it skips the A in Blue Fin Aqua Center and returns 0:
INSTR( COMPANY, 'a', 1, 2 )
COMPANY  RETURN VALUE
Blue Fin Aqua Center  0
Maco Shark Shop  8
Scuba Gear  9
Frank's Dive Shop  0
VIP Diving Club  0
You can nest INSTR within other functions. The following expression evaluates each customer name from the end of the string and returns the portion of the name before the last space:
SUBSTR( CUST_NAME, 1, INSTR( CUST_NAME, ' ', -1, 1 ) )
CUST_NAME  RETURN VALUE
PATRICIA JONES  PATRICIA
MARY ELLEN SHAH  MARY ELLEN
ISNULL
Returns whether a value is NULL. ISNULL evaluates
an empty string as FALSE.
Note: To test for empty strings, use LENGTH.
Syntax
ISNULL( value )
Example
The following example checks for null values in the
items table:
ISNULL( ITEM_NAME )
ITEM_NAME  RETURN VALUE
Flashlight  0 (FALSE)
NULL  1 (TRUE)
Regulator system  0 (FALSE)
''  0 (FALSE) (Empty string is not NULL)
IS_NUMBER
Returns whether a string is a valid number. A valid
number consists of the following parts:
- Optional space before the number
- Optional sign (+/-)
- One or more digits with an optional decimal point
- Optional scientific notation, such as the character e or E
- Optional white space following the number
IS_SPACES
Returns whether a string value consists entirely of
spaces.
A space is a blank space, a formfeed, a newline, a
carriage return, a tab, or a vertical tab.
IS_SPACES evaluates an empty string as FALSE
because there are no spaces. To test for an
empty string, use LENGTH.
Example
The following expression checks the ITEM_NAME port for rows that consist entirely of spaces:
IS_SPACES( ITEM_NAME )
ITEM_NAME  RETURN VALUE
Flashlight  0 (False)
(string of spaces)  1 (True)
Regulator system  0 (False)
NULL  NULL
''  0 (False) (An empty string does not contain spaces.)
LAST
Returns the last row in the selected port.
Optionally, you can apply a filter to limit the rows
the PowerCenter Integration Service reads.
You can nest only one other aggregate function
within LAST
Syntax
LAST( value [, filter_condition ] )
Return Value
Last row in a port.
NULL if all values passed to the function are NULL, or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
LAST_DAY
Returns the date of the last day of the month for
each date in a port.
Syntax
LAST_DAY( date )
Return Value
Date. The last day of the month for that date value
you pass to this function
If a value is NULL, LAST_DAY ignores the row.
However, if all values passed from the port are
NULL, LAST_DAY returns NULL
LAST_DAY groups values based on group by ports
you define in the transformation, returning one
result for each group. If there is no group by port,
LAST_DAY treats all rows as one group, returning
one value
Examples
The following expression returns the last day of the
month for each date in the ORDER_DATE port:
LAST_DAY( ORDER_DATE )
ORDER_DATE  RETURN VALUE
Apr 1 1998 12:00:00AM  Apr 30 1998 12:00:00AM
Jan 6 1998 12:00:00AM  Jan 31 1998 12:00:00AM
Feb 2 1996 12:00:00AM  Feb 29 1996 12:00:00AM (Leap year)
NULL  NULL
Jul 31 1998 12:00:00AM  Jul 31 1998 12:00:00AM
LEAST
Returns the smallest value from a list of input
values.
By default, the match is case sensitive.
Syntax
LEAST( value1, [value2, ..., valueN,] CaseFlag )
Example
The following expression returns the smallest quantity of items ordered:
LEAST( QUANTITY1, QUANTITY2, QUANTITY3 )
QUANTITY1  QUANTITY2  QUANTITY3  RETURN VALUE
150  756  27  27
5000  97  17  17
120  1724  965  120
LENGTH
Returns the number of characters in a string,
including trailing blanks.
Syntax
LENGTH( string )
Return Value
Integer representing the length of the string.
NULL if a value passed to the function is NULL
LN
Returns the natural logarithm of a numeric value.
For example, LN(3) returns 1.098612.
You usually use this function to analyze scientific
data rather than business data.
This function is the inverse of the function EXP.
Syntax
LN( numeric_value )
Return Value
Double value.
NULL if a value passed to the function is NULL
LOG
Returns the logarithm of a numeric value. Most often, you use this function to analyze scientific data rather than business data.
Syntax
LOG( base, exponent )
Return Value
Double value.
NULL if a value passed to the function is NULL.
Example
The following expression returns the logarithm of the values in the EXPONENT port for each base value in the BASE port:
LOG( BASE, EXPONENT )
BASE  EXPONENT  RETURN VALUE
15  1  0
.09  10  -0.956244644696599
NULL  18  NULL
35.78  NULL  NULL
-9  18  Error. (PowerCenter Integration Service does not write the row.)
0  5  Error. (PowerCenter Integration Service does not write the row.)
10  -2  Error. (PowerCenter Integration Service does not write the row.)
LOOKUP
Searches for a value in a lookup source column.
Syntax
LOOKUP( result, search1, value1 [, search2, value2 ] ... )
Return Value
Result if all searches find matching values. If the
PowerCenter Integration Service finds matching
values, it returns the result from the same row as
the search1 argument.
NULL if the search does not find any matching
values.
Error if the search finds more than one matching
value.
Example
The following expression searches the lookup source
:TD.SALES for a specific item ID and price, and
returns the item name if both searches find a
match:
LOOKUP( :TD.SALES.ITEM_NAME, :TD.SALES.ITEM_ID, 10, :TD.SALES.PRICE, 15.99 )
ITEM_NAME  ITEM_ID  PRICE
Regulator  5  100.00
Flashlight  10  15.99
Halogen Flashlight  15  15.99
NULL  20  15.99
RETURN VALUE: Flashlight
LPAD
Adds a set of blanks or characters to the beginning of a string to set the string to a specified length.
Syntax
LPAD( first_string, length [, second_string] )
Example
The following expression standardizes numbers to six digits by padding them with leading zeros:
LPAD( PART_NUM, 6, '0')
PART_NUM  RETURN VALUE
702  000702
1  000001
0553  000553
484834  484834
The following expression pads item names to 16 characters with the pattern *..*:
LPAD( ITEM_NAME, 16, '*..*' )
ITEM_NAME  RETURN VALUE
Flashlight  *..**.Flashlight
Compass  *..**..**Compass
Regulator System  Regulator System
Safety Knife  *..*Safety Knife
LTRIM
Removes blanks or characters from the beginning of
a string.
You can use LTRIM with IIF or DECODE in an
Expression or Update Strategy transformation to
avoid spaces in a target table.
If you do not specify a trim_set parameter in the
expression:
- In UNICODE mode, LTRIM removes both single- and
double-byte spaces from the beginning of a string.
- In ASCII mode, LTRIM removes only single-byte
spaces.
If you use LTRIM to remove characters from a string,
LTRIM compares the trim_set to each character in
the
string
argument,
character-by-character,
starting with the left side of the string.
If the character in the string matches any character
in the trim_set, LTRIM removes it.
LTRIM continues comparing and removing characters
until it fails to find a matching character in the
trim_set.
Then it returns the string, which does not include
matching characters.
Syntax
LTRIM( string [, trim_set] )
Return Value
String. The string values with the specified
characters in the trim_set argument removed.
NULL if a value passed to the function is NULL. If the
trim_set is NULL, the function returns NULL.
Example
The following expression removes the characters S
and . from the strings in the LAST_NAME port:
LTRIM( LAST_NAME, 'S.')
LAST_NAME  RETURN VALUE
Nelson  Nelson
Osborne  Osborne
NULL  NULL
S. MacDonald  MacDonald
Sawyer  awyer
H. Bender  H. Bender
Steadman  teadman
RTRIM
Removes blanks or characters from the end of a string.
Syntax
RTRIM( string [, trim_set] )
Example
The following expression removes the characters e and r from the strings in the LAST_NAME port:
RTRIM( LAST_NAME, 'er' )
LAST_NAME  RETURN VALUE
Nelson  Nelson
Page  Pag
Osborne  Osborn
NULL  NULL
Sawyer  Sawy
H. Bender  H. Bend
Steadman  Steadman
MAKE_DATE_TIME
Returns the date and time based on the input values.
Syntax
MAKE_DATE_TIME( year, month, day, hour, minute, second, nanosecond )
Example
The following expression creates a date and time from year, month, day, hour, minute, and second input ports (the port names are illustrative):
MAKE_DATE_TIME( SALE_YEAR, SALE_MONTH, SALE_DAY, SALE_HOUR, SALE_MIN, SALE_SEC )
MAX/MIN (String)
Returns the highest string value found within a port
or group.
You can apply a filter to limit the rows in the search.
You can nest only one other aggregate function
within MAX.
Note: The MAX function uses the same sort order
that the Sorter transformation uses. However, the
MAX function is case sensitive, and the Sorter
transformation may not be case sensitive.
You can also use MAX to return the latest date or the
largest numeric value in a port or group.
Syntax
MAX( string [, filter_condition] )
Return Value
String.
NULL if all values passed to the function are NULL, or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
MAX groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port, MAX treats all rows as
one group, returning one value
MD5
Calculates the checksum of the input value. The
function uses Message-Digest algorithm 5 (MD5).
MD5 is a one-way cryptographic hash function with a
128-bit hash value.
You can conclude that input values are different
when the checksums of the input values are
different. Use MD5 to verify data integrity.
Syntax
MD5( value )
Return Value
Unique 32-character string of hexadecimal digits 0-9
and a-f.
NULL if the input is a null value.
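For example, a sketch over a hypothetical ACCOUNT_DATA port; comparing the returned checksums across loads indicates whether the row content changed:
MD5( ACCOUNT_DATA )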
MEDIAN
Returns the median of all values in a selected port.
If there is an even number of values in the port, the
median is the average of the middle two values
when all values are placed ordinally on a number
line. If there is an odd number of values in the port,
the median is the middle number.
You can nest only one other aggregate function
within MEDIAN, and the nested function must return
a Numeric datatype.
The PowerCenter Integration Service reads all rows
of data to perform the median calculation.
The process of reading rows of data to perform the
calculation may affect performance.
Optionally, you can apply a filter to limit the rows
you read to calculate the median.
MOD
Returns the remainder of a division calculation. For
example, MOD(8,5) returns 3.
Syntax
MOD( numeric_value, divisor )
Return Value
Numeric value of the datatype you pass to the
function. The remainder of the numeric value
divided by the divisor.
NULL if a value passed to the function is NULL.
Examples
The following expression returns the modulus of the
values in the PRICE port divided by the values in
the QTY port:
MOD( PRICE, QTY )
PRICE  QTY  RETURN VALUE
10.00  2  0
12.00  5  2
9.00  2  1
15.00  3  0
NULL  3  NULL
20.00  NULL  NULL
25.00  0  Error. (Integration Service does not write the row.)
To avoid dividing by zero, you can nest IIF within MOD to pass NULL when the QTY is 0:
MOD( PRICE, IIF( QTY = 0, NULL, QTY ) )
PRICE  QTY  RETURN VALUE
10.00  2  0
12.00  5  2
9.00  2  1
15.00  3  0
NULL  3  NULL
20.00  NULL  NULL
25.00  0  NULL
Syntax
MEDIAN( numeric_value [, filter_condition ] )
Return Value
Numeric value.
NULL if all values passed to the function are NULL, or
if no rows are selected. For example, the filter
condition evaluates to FALSE or NULL for all rows.
Note: If the return value is Decimal with precision
greater than 15, you can enable high precision to
ensure decimal precision up to 28 digits.
MEDIAN groups values based on group by ports you
define in the transformation, returning one result
for each group.
If there is no group by port, MEDIAN treats all rows
as one group, returning one value
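For example, a sketch, assuming a hypothetical SALARY port, that returns the median of salaries above 50000:
MEDIAN( SALARY, SALARY > 50000 )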
METAPHONE
Encodes string values. You can specify the length of
the string that you want to encode.
METAPHONE encodes characters of the English
language alphabet (A-Z).
It encodes both uppercase and lowercase letters in
uppercase.
METAPHONE encodes characters according to the
following list of rules:
- Skips vowels (A, E, I, O, and U) unless one of them
is the first character of the input string.
METAPHONE(CAR) returns KR, and METAPHONE(AAR) returns AR.
- Uses special encoding guidelines
MOVINGAVG
Returns the row-by-row average of a specified set of rows. Optionally, you can apply a condition to filter rows before calculating the moving average.
Syntax
MOVINGAVG( numeric_value, rowset [, filter_condition] )
Return Value
Numeric value.
MOVINGAVG ignores null values when calculating the moving average. However, if all values are NULL, the function returns NULL.
Example
The following expression returns the average order for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter, returns the average for the last five rows read:
MOVINGAVG( SALES, 5 )
ROW_NO  SALES  RETURN VALUE
1  600  NULL
2  504  NULL
3  36  NULL
4  100  NULL
5  550  358
6  39  245.8
7  490  243
MOVINGSUM
Returns the row-by-row sum of a specified set of rows. Optionally, you can apply a condition to filter rows before calculating the moving sum.
Syntax
MOVINGSUM( numeric_value, rowset [, filter_condition] )
Return Value
Numeric value.
MOVINGSUM ignores null values when calculating the moving sum. However, if all values are NULL, the function returns NULL.
Example
The following expression returns the sum of orders for a Stabilizing Vest, based on the first five rows in the Sales port, and thereafter, returns the sum for the last five rows read:
MOVINGSUM( SALES, 5 )
ROW_NO  SALES  RETURN VALUE
1  600  NULL
2  504  NULL
3  36  NULL
4  100  NULL
5  550  1790
6  39  1229
7  490  1215
NPER
Returns the number of periods for an investment
based on a constant interest rate and periodic,
constant payments.
Syntax
NPER( rate, present value, payment [, future value,
type] )
Return Value
Numeric.
Example
The present value of an investment is $2,000. Each
payment is $500 and the future value of the
investment is $20,000.
The following expression returns 9 as the number of
periods for which you need to make the payments:
PERCENTILE
Calculates the value that falls at a given percentile
in a group of numbers.
You can nest only one other aggregate function
within PERCENTILE, and the nested function must
return a Numeric datatype.
The PowerCenter Integration Service reads all rows
of data to perform the percentile calculation.
The process of reading rows to perform the
calculation may affect performance.
Optionally, you can apply a filter to limit the rows
you read to calculate the percentile
Syntax
PERCENTILE( numeric_value, percentile [, filter_condition ] )
Return Value
Numeric value.
If a value is NULL, PERCENTILE ignores the row.
However, if all values in a group are NULL,
PERCENTILE returns NULL.
PERCENTILE groups values based on group by ports
you define in the transformation, returning one
result for each group.
If there is no group by port, PERCENTILE treats all
rows as one group, returning one value
Example
The PowerCenter Integration Service calculates a percentile using the following logic:
i = (x + 1) * percentile / 100
Use the following guidelines for this equation:
- x is the number of elements in the group of values for which you are calculating a percentile.
- If i < 1, PERCENTILE returns the value of the first element in the list.
- If i is an integer value, PERCENTILE returns the value of the ith element in the list.
- Otherwise, PERCENTILE returns an interpolated value between the values of the elements on either side of i.
The following expression returns the salary that falls at the 75th percentile of the values in the SALARY port:
PERCENTILE( SALARY, 75 )
SALARY
125000.0
27900.0
100000.0
NULL
55000.0
9000.0
85000.0
86000.0
48000.0
99000.0
RETURN VALUE: 106250.0
POWER
Returns a value raised to the exponent you pass to
the function.
Syntax
POWER( base, exponent )
Return Value
Double value.
Example
The following expression returns the values in the
Numbers port raised to the values in the Exponent
port:
POWER( NUMBERS, EXPONENT )
NUMBERS  EXPONENT  RETURN VALUE
10.0  2.0  100
3.5  6.0  1838.265625
3.5  5.5  982.594307804838
NULL  2.0  NULL
10.0  NULL  NULL
-3.0  -6.0  0.00137174211248285
3.0  -6.0  0.00137174211248285
-3.0  6.0  729.0
-3.0  5.5  729.0
If the base is negative, the exponent must be an integer. In this case, the PowerCenter Integration Service rounds the exponent to the nearest integer value, so -3.0 raised to 5.5 is evaluated as -3.0 raised to 6.0.
PMT
Returns the payment for a loan based on constant payments and a constant interest rate.
Syntax
PMT( rate, terms, present value [, future value, type] )
Return Value
Numeric.
Example
The following expression returns -2111.64 as the monthly payment amount of a loan:
PMT( 0.01, 10, 20000 )
PV
Returns the present value of an investment.
Syntax
PV( rate, terms, payment [, future value, type] )
Example
The following expression returns 12,524.43 as the amount you must deposit in the account today to have a future value of $20,000 in one year if you also deposit $500 at the beginning of each period:
PV( 0.0075, 12, -500, 20000, TRUE )
RAND
Returns a random number between 0 and 1. This is
useful for probability scenarios.
Syntax
RAND( seed )
Return Value
Numeric.
For the same seed, the PowerCenter Integration
Service generates the same sequence of numbers
Syntax
REG_EXTRACT( subject, 'pattern', subPatternNum )
Return Value
Returns the value of the nth subpattern that is part
of the input value. The nth subpattern is based on
the value you specify for subPatternNum.
NULL if the input is a null value or if the pattern is
null
Example
You might use REG_EXTRACT in an expression to
extract middle names from a regular expression
that matches first name, middle name, and last
name.
For example, the following expression returns the
middle name of a regular expression:
REG_EXTRACT( Employee_Name, '(\w+)\s+(\w+)\s+(\w+)', 2 )
Employee_Name  RETURN VALUE
Stephen Graham Smith  Graham
Juan Carlos Fernando  Carlos
Example
The following RAND expression may return a value of 0.417022004702574:
RAND (1)
RATE
Returns the interest rate earned per period by a
security.
Syntax
RATE( terms, payment, present value[, future value,
type] )
REG_MATCH
Returns whether a value matches a regular
expression pattern.
This lets you validate data patterns, such as IDs,
telephone numbers, postal codes, and state names.
Note: Use the REG_REPLACE function to replace a
character pattern in a string with a new character
pattern.
Return Value
Numeric.
Example
The following expression returns 0.0077 as the
monthly interest rate of a loan:
RATE( 48, -500, 20000 )
Syntax
REG_MATCH( subject, pattern )
Return Value
TRUE if the data matches the pattern.
FALSE if the data does not match the pattern.
NULL if the input is a null value or if the pattern is
NULL.
Example
You might use REG_MATCH in an expression to
validate telephone numbers.
For example, the following expression matches a 10-digit telephone number against the pattern and returns a Boolean value based on the match:
REG_MATCH( Phone_Number, '(\d\d\d-\d\d\d-\d\d\d\d)' )
Phone_Number  RETURN VALUE
408-555-1212  TRUE
NULL  NULL
510-555-1212  TRUE
92 555 51212  FALSE
650-555-1212  TRUE
415-555-1212  TRUE
831 555 12123  FALSE
REPLACECHR
Replaces characters in a string with a single
character or no character.
REPLACECHR searches the input string for the
characters you specify and replaces all occurrences
of all characters with the new character you specify.
Syntax
REPLACECHR( CaseFlag, InputString, OldCharSet,
NewChar )
Return Value
String.
Empty string if REPLACECHR removes all characters
in InputString.
NULL if InputString is NULL.
InputString if OldCharSet is NULL or empty.
Example
The following expression removes the character # from the values in the CUST_ID port:
REPLACECHR( 0, CUST_ID, '#', NULL )
CUST_ID  RETURN VALUE
ID#33  ID33
#A3577  A3577
SS #712403399  SS 712403399
REG_REPLACE
Replaces characters in a string with another
character pattern.
By default, REG_REPLACE searches the input string
for the character pattern you specify and replaces
all occurrences with the replacement pattern.
You can also indicate the number of occurrences of
the pattern you want to replace in the string.
Syntax
REG_REPLACE( subject, pattern, replace, numReplacements )
Return Value
String.
Example
The following expression removes additional spaces from the employee name data for each row of the Employee_Name port:
REG_REPLACE( Employee_Name, '\s+', ' ' )
Employee_Name  RETURN VALUE
Adam  Smith  Adam Smith
Greg   Sanders  Greg Sanders
Sarah  Fe  Sarah Fe
Sam  Cooper  Sam Cooper
REPLACESTR
Replaces characters in a string with a single
character, multiple characters, or no character.
REPLACESTR searches the input string for all strings
you specify and replaces them with the new string
you specify.
Syntax
REPLACESTR ( CaseFlag, InputString, OldString1,
[OldString2, ... OldStringN,] NewString )
Return Value
String.
REVERSE
Reverses the input string.
Syntax
REVERSE( string )
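For example, a sketch over a hypothetical CODES port; an input value of 8001 returns 1008:
REVERSE( CODES )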
SETCOUNTVARIABLE
Counts the rows evaluated by the function and
increments the current value of a mapping variable
based on the count.
Increases the current value by one for each row
marked for insertion.
Decreases the current value by one for each row
marked for deletion.
Keeps the current value the same for each row
marked for update or reject.
Returns the new current value.
At the end of a successful session, the PowerCenter
Integration Service saves the last current value to
the repository.
When used with a session that contains multiple
partitions, the PowerCenter Integration Service
generates different current values for each
partition.
Syntax
SETCOUNTVARIABLE( $$Variable )
SET_DATE_PART
Sets one part of a Date/Time value to a value you specify.
Syntax
SET_DATE_PART( date, format, value )
Return Value
Date in the same format as the source date with the
specified part changed.
SETMAXVARIABLE
Sets the current value of a mapping variable to the higher of two values: the current value of the variable or the value you specify. Returns the new current value.
Syntax
SETMAXVARIABLE( $$Variable, value )
Example
The following expression compares the number of items purchased in each transaction with a mapping variable $$MaxItems and sets $$MaxItems to the higher of the two. The current value of $$MaxItems from the previous session run is 22:
SETMAXVARIABLE( $$MaxItems, ITEMS )
TRANSACTION  ITEMS  MAX_ITEMS
0100002  12  22
0100003  5  22
0100004  18  22
0100005  35  35
0100006  5  35
0100007  14  35
SETVARIABLE
Sets the current value of a mapping variable to a value you specify. Returns the specified value.
Syntax
SETVARIABLE( $$Variable, value )
Example
The following expression sets a mapping variable $$Time to the system date at the time the PowerCenter Integration Service evaluates the row:
SETVARIABLE( $$Time, SYSDATE )
TRANSACTION  TOTAL  SET_$$TIME
0100002  534.23  10/10/2000 01:34:33
0100003  699.01  10/10/2000 01:34:34
0100004  97.50  10/10/2000 01:34:35
0100005  116.43  10/10/2000 01:34:36
0100006  323.95  10/10/2000 01:34:37
At the end of the session, the PowerCenter
Integration Service saves 10/10/2000 01:34:37 to
the repository as the last evaluated current value
for $$Time. The next time the session runs, the
PowerCenter Integration Service evaluates all
references to $$Time to 10/10/2000 01:34:37
SIGN
Returns whether a numeric value is positive, negative, or 0.
Syntax
SIGN( numeric_value )
Return Value
-1 for negative values.
0 for 0.
1 for positive values.
NULL if NULL.
SOUNDEX
Encodes a string value into a four-character string.
SOUNDEX works for characters in the English
alphabet (A-Z).
It uses the first character of the input string as the
first character in the return value and encodes the
remaining three unique consonants as numbers.
SOUNDEX encodes characters according to the
following list of rules:
- Uses the first character in string as the first
character in the return value and encodes it in
uppercase. For example, both SOUNDEX(John) and
SOUNDEX(john) return J500.
- Encodes the first three unique consonants
following the first character in string and ignores the
rest. For example, both SOUNDEX(JohnRB) and
SOUNDEX(JohnRBCD) return J561.
- Assigns a single code to consonants that sound
alike.
Syntax
SOUNDEX( string )
Return Value
String.
NULL if one of the following conditions is true:
- If value passed to the function is NULL.
- No character in string is a letter of the English
alphabet.
- string is empty.
Example
The following expression encodes the values in the
EMPLOYEE_NAME port:
SOUNDEX( EMPLOYEE_NAME )
EMPLOYEE_NAME  RETURN VALUE
John  J500
William  W450
jane  J500
joh12n  J500
1abc  A120
NULL  NULL
STDDEV
Returns the standard deviation of the numeric
values you pass to this function.
STDDEV is used to analyze statistical data.
You can nest only one other aggregate function
within STDDEV, and the nested function must
return a Numeric datatype.
Syntax
STDDEV( numeric_value [,filter_condition] )
Return Value
Numeric value.
NULL if all values passed to the function are NULL or
if no rows are selected (for example, the filter
condition evaluates to FALSE or NULL for all rows).
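For example, a sketch, assuming a hypothetical SALES port, that returns the standard deviation of sales above 2000.00:
STDDEV( SALES, SALES > 2000.00 )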
SUBSTR
Returns a portion of a string. SUBSTR counts all characters, including blanks, starting at the beginning of the string.
Syntax
SUBSTR( string, start [, length] )
Example
The following expression returns the three characters that begin at position 5 for each row in the PHONE port:
SUBSTR( PHONE, 5, 3 )
PHONE  RETURN VALUE
808-555-0269  555
809-555-3915  555
357-687-6708  687
NULL  NULL
The following expression returns the area code for each row in the PHONE port:
SUBSTR( PHONE, 0, 3 )
PHONE  RETURN VALUE
809-555-0269  809
357-687-6708  357
NULL  NULL
SUM
Returns the sum of all values in the selected port.
Optionally, you can apply a filter to limit the rows
you read to calculate the total.
You can nest only one other aggregate function
within SUM, and the nested function must return a
Numeric datatype
Syntax
SUM( numeric_value [, filter_condition ] )
Return Value
Numeric value.
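For example, a sketch, assuming a hypothetical SALES port, that totals only sales above 2000:
SUM( SALES, SALES > 2000 )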
SYSTIMESTAMP
Returns the current date and time of the node
hosting the PowerCenter Integration Service with
precision to the nanosecond.
The precision to which you display the date and time
depends on the platform.
The return value of the function varies depending on
how you configure the argument:
When you configure the argument of SYSTIMESTAMP as a variable, the PowerCenter Integration Service evaluates the function for each row in the transformation.
When you configure the argument of SYSTIMESTAMP as a constant, the PowerCenter Integration Service evaluates the function once and retains the value for each row in the transformation.
Syntax
SYSTIMESTAMP( [format] )
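For example, a sketch that returns the current timestamp with seconds displayed to millisecond precision ('MS' is one of the documented precision values):
SYSTIMESTAMP( 'MS' )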
TO_BIGINT
Converts a string or numeric value to a bigint value.
TO_BIGINT syntax contains an optional argument
that you can choose to round the number to the
nearest integer or truncate the decimal portion.
TO_BIGINT ignores leading blanks.
Syntax
TO_BIGINT( value [, flag] )
Return Value
Bigint.
NULL if the value passed to the function is NULL.
0 if the value passed to the function contains alphanumeric characters.
TO_CHAR (Dates)
Converts dates to character strings.
TO_CHAR also converts numeric values to strings.
You can convert the date into any format using the
TO_CHAR format strings.
Syntax
TO_CHAR( date [,format] )
Return Value
String.
NULL if a value passed to the function is NULL.
Examples
The following expression converts the dates in the
DATE_PROMISED port to text in the format MON DD
YYYY:
TO_CHAR( DATE_PROMISED, 'MON DD YYYY' )
DATE_PROMISED  RETURN VALUE
Apr 1 1998 12:00:10AM  'Apr 01 1998'
Feb 22 1998 01:31:10PM  'Feb 22 1998'
Oct 24 1998 02:12:30PM  'Oct 24 1998'
NULL  NULL
If you omit the format argument, TO_CHAR returns a
string in the date format specified in the session, by
default, MM/DD/YYYY HH24:MI:SS.US:
TO_CHAR( DATE_PROMISED )
DATE_PROMISED  RETURN VALUE
Apr 1 1998 12:00:10AM  '04/01/1998 00:00:10.000000'
Feb 22 1998 01:31:10PM  '02/22/1998 13:31:10.000000'
Oct 24 1998 02:12:30PM  '10/24/1998 14:12:30.000000'
NULL  NULL
Examples
The following expressions use values from the port
IN_TAX:
TO_BIGINT( IN_TAX, TRUE )
IN_TAX  RETURN VALUE
'7245176201123435.6789'  7245176201123435
'7245176201123435.2'  7245176201123435
'7245176201123435.2.48'  7245176201123435
NULL  NULL
'A12.3Grove'  0
' 176201123435.87'  176201123435
'-7245176201123435.2'  -7245176201123435
'-7245176201123435.23'  -7245176201123435
-9223372036854775806.9  -9223372036854775806
9223372036854775806.9  9223372036854775806
TO_CHAR (Numbers)
Converts numeric values to text strings. TO_CHAR
also converts dates to strings.
Syntax
TO_CHAR( numeric_value )
TO_FLOAT
Converts a string or numeric value to a doubleprecision floating point number (the Double
datatype).
TO_FLOAT ignores leading blanks.
Syntax
TO_FLOAT( value )
Return Value
Double value.
0 if the value in the port is blank or a non-numeric
character.
NULL if a value passed to this function is NULL.
Example
This expression uses values from the port IN_TAX:
TO_FLOAT( IN_TAX )
IN_TAX  RETURN VALUE
'15.6789'  15.6789
'60.2'  60.2
'118.348'  118.348
NULL  NULL
'A12.3Grove'  0
TO_DECIMAL
Converts a string or numeric value to a decimal value. TO_DECIMAL ignores leading blanks.
Syntax
TO_DECIMAL( value [, scale] )
Return Value
Decimal of precision and scale between 0 and 28, inclusive.
If the string contains a non-numeric character, TO_DECIMAL converts the numeric portion of the string up to the first non-numeric character. If the first character is non-numeric, it returns 0.
NULL if a value passed to the function is NULL.
Example
This expression uses values from the port IN_TAX. The datatype is decimal with precision of 10 and scale of 3:
TO_DECIMAL( IN_TAX, 3 )
IN_TAX  RETURN VALUE
'15.6789'  15.679
'60.2'  60.200
'118.348'  118.348
NULL  NULL
'A12.3Grove'  0
711A1  711
TO_INTEGER
Converts a string or numeric value to an integer. TO_INTEGER syntax contains an optional argument that you can choose to round the number to the nearest integer or truncate the decimal portion. TO_INTEGER ignores leading blanks.
Syntax
TO_INTEGER( value [, flag] )
Return Value
Integer.
NULL if the value passed to the function is NULL.
0 if the value passed to the function contains
alphanumeric characters
Examples
The following expressions use values from the port
IN_TAX. The PowerCenter Integration Service
displays an error when the conversion causes a
numeric overflow:
TO_INTEGER( IN_TAX, TRUE )
IN_TAX  RETURN VALUE
'15.6789'  15
'60.2'  60
'118.348'  118
'5,000,000,000'  Error. (Integration Service doesn't write the row.)
NULL  NULL
'A12.3Grove'  0
' 123.87'  123
'-15.6789'  -15
TRUNC (Dates)
Truncates dates to a specific year, month, day, hour,
minute, second, millisecond, or microsecond.
You can also use TRUNC to truncate numbers.
You can truncate the following date parts:
Year - If you truncate the year portion of the date,
the function returns Jan 1 of the input year with the
time set to 00:00:00.000000000. For example, the
following
expression
returns
1/1/1997
00:00:00.000000000: TRUNC(12/1/1997 3:10:15,
'YY')
Month - If you truncate the month portion of a date,
the function returns the first day of the month with
the time set to 00:00:00.000000000. For example,
the
following
expression
returns
4/1/1997
00:00:00.000000000: TRUNC(4/15/1997 12:15:00,
'MM')
Day - If you truncate the day portion of a date, the function returns the date with the time set to 00:00:00.000000000. For example, the following expression returns 6/13/1997 00:00:00.000000000: TRUNC(6/13/1997 2:30:45, 'DD')
Examples
The following expressions truncate the year portion of dates in the DATE_SHIPPED port:
TRUNC( DATE_SHIPPED, 'Y' )
TRUNC( DATE_SHIPPED, 'YY' )
TRUNC( DATE_SHIPPED, 'YYY' )
TRUNC( DATE_SHIPPED, 'YYYY' )
DATE_SHIPPED  RETURN VALUE
Jan 15 1998 2:10:30AM  Jan 1 1998 12:00:00.000000000
Apr 19 1998 1:31:20PM  Jan 1 1998 12:00:00.000000000
Jun 20 1998 3:50:04AM  Jan 1 1998 12:00:00.000000000
Dec 20 1998 3:29:55PM  Jan 1 1998 12:00:00.000000000
NULL  NULL
TRUNC (Numbers)
Truncates numbers to a specific digit. You can also
use TRUNC to truncate dates.
Syntax
TRUNC( numeric_value [, precision] )
If precision is a positive integer, TRUNC returns
numeric_value with the number of decimal places
specified by precision.
If precision is a negative integer, TRUNC changes
the specified digits to the left of the decimal point
to zeros.
If you omit the precision argument, TRUNC truncates
the decimal portion of numeric_value and returns
an integer.
If you pass a decimal precision value, the PowerCenter Integration Service rounds numeric_value to the nearest integer before evaluating the expression.
Return Value
Numeric value.
NULL if one of the arguments is NULL
Examples
The following expressions truncate the values in the
Price port:
TRUNC( PRICE, 3 )
PRICE  RETURN VALUE
12.9995  12.999
-18.8652  -18.865
56.9563  56.956
15.9928  15.992
NULL  NULL
TRUNC( PRICE, -1 )
PRICE  RETURN VALUE
12.99  10.0
-187.86  -180.0
56.95  50.0
1235.99  1230.0
UPPER
Converts lowercase string characters to uppercase.
Syntax
UPPER( string )
VARIANCE
Returns the variance of a value you pass to it.
VARIANCE is used to analyze statistical data.
You can nest only one other aggregate function
within VARIANCE, and the nested function must
return a Numeric datatype.
Syntax
VARIANCE( numeric_value [, filter_condition ] )
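For example, a sketch, assuming a hypothetical TOTAL_SALES port, that returns the variance of all totals in the group:
VARIANCE( TOTAL_SALES )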
8. PERFORMANCE TUNING
Example
When you run a session, the session log lists run
information and thread statistics similar to the
following text:
***** RUN INFO FOR TGT LOAD ORDER GROUP [1],
CONCURRENT SET [1] *****
Thread [READER_1_1_1] created for [the read stage]
of partition point [SQ_two_gig_file_32B_rows] has
completed.
Total Run Time = [505.871140] secs
Total Idle Time = [457.038313] secs
Busy Percentage = [9.653215]
Thread [TRANSF_1_1_1] created for [the
transformation stage] of partition point
[SQ_two_gig_file_32B_rows] has completed.
Total Run Time = [506.230461] secs
Total Idle Time = [1.390318] secs
Busy Percentage = [99.725359]
Thread work time breakdown:
LKP_ADDRESS: 25.000000 percent
SRT_ADDRESS: 21.551724 percent
RTR_ZIP_CODE: 53.448276 percent
Thread [WRITER_1_*_1] created for [the write stage]
of partition point [scratch_out_32B] has completed.
Total Run Time = [507.027212] secs
Total Idle Time = [384.632435] secs
Busy Percentage = [24.139686]
In this session log, the total run time for the
transformation thread is 506 seconds and the busy
percentage is 99.7%.
This means the transformation thread was never idle
for the 506 seconds.
The reader and writer busy percentages were
significantly smaller, about 9.6% and 24%.
Optimizing Expressions
Complete the following tasks to isolate the slow
expressions:
1. Remove the expressions one-by-one from the
mapping.
2. Run the mapping to determine the time it takes to
run the mapping without the transformation.
If there is a significant difference in session run time,
look for ways to optimize the slow expression.
Factoring Out Common Logic
If the mapping performs the same task in multiple
places, reduce the number of times the mapping
performs the task by moving the task earlier in the
mapping.
For example, you have a mapping with five target
tables.
Each target requires a Social Security number
lookup.
You can also minimize function calls by using operators instead of functions where possible. For example, you can rewrite an expression that concatenates names with nested CONCAT functions:
CONCAT( CONCAT( CUSTOMERS.FIRST_NAME, ' ' ), CUSTOMERS.LAST_NAME )
as a single expression that uses the || string operator:
CUSTOMERS.FIRST_NAME || ' ' || CUSTOMERS.LAST_NAME
Optimizing Sequence Generator Transformations
To optimize Sequence Generator transformations, create a reusable Sequence Generator and use it in multiple mappings simultaneously.
Also, configure the Number of Cached Values
property.
The Number of Cached Values property determines
the number of values the Integration Service
caches at one time.
Make sure that the Number of Cached Values is not too small.
Consider configuring the Number of Cached Values
to a value greater than 1,000.
If you do not have to cache values, set the Number of Cached Values to 0.
Sequence Generator transformations that do not use
cache are faster than those that require cache.
Optimizing Sorter Transformations
- Allocate enough memory to sort the data.
- Specify a different work directory for each partition
in the Sorter transformation.
When a PowerCenter mapping contains a transformation that has cache memory, deploying adequate memory and separate disk storage for each cache instance improves performance.
Running a session on a grid can improve throughput
because the grid provides more resources to run
the session.
Performance improves when you run a few sessions
on the grid at a time.
Running a session on a grid is more efficient than
running a workflow over a grid if the number of
concurrent session partitions is less than the
number of nodes.
When you run multiple sessions on a grid, session
subtasks share node resources with subtasks of
other concurrent sessions.
Running a session on a grid requires coordination
between processes running on different nodes.
For some mappings, running a session on a grid
requires additional overhead to move data from
one node to another node.
In addition to loading the memory and CPU
resources on each node, running multiple sessions
on a grid adds to network traffic.
When you run a workflow on a grid, the Integration
Service loads memory and CPU resources on nodes
without requiring coordination between the nodes
Pushdown Optimization - To increase session performance, push transformation logic to the source or target database.
Concurrent Sessions and Workflows - If possible, run sessions and workflows concurrently
to improve performance.
For example, if you load data into an analytic
schema, where you have dimension and fact tables,
load the dimensions concurrently
Buffer Memory
When the Integration Service initializes a session, it
allocates blocks of memory to hold source and
target data.
The Integration Service allocates at least two blocks
for each source and target partition.
Sessions that use a large number of sources and
targets might require additional memory blocks.
If the Integration Service cannot allocate enough
memory blocks to hold the data, it fails the session.
You can configure the amount of buffer memory, or
you can configure the Integration Service to
calculate buffer settings at run time.
Caches
The Integration Service uses the index and data
caches for XML targets and Aggregator, Rank,
Lookup, and Joiner transformations.
The Integration Service stores transformed data in
the data cache before returning it to the pipeline.
It stores group information in the index cache.
Also, the Integration Service uses a cache to store
data for Sorter transformations.
To configure the amount of cache memory, use the
cache calculator or specify the cache size.
You can also configure the Integration Service to
calculate cache memory settings at run time.
If the allocated cache is not large enough to store
the data, the Integration Service stores the data in
a temporary disk file, a cache file, as it processes
the session data.
Performance slows each time the Integration Service
pages to a temporary file.
Examine the performance counters to determine
how often the Integration Service pages to a file.
Target-Based Commit
Each time the Integration Service commits,
performance slows.
Therefore, the smaller the commit interval, the more
often the Integration Service writes to the target
database, and the slower the overall performance
If you increase the commit interval, the number of
times the Integration Service commits decreases
and performance improves.
When you increase the commit interval, consider the
log file limits in the target database.
If the commit interval is too high, the Integration
Service may fill the database log file and cause the
session to fail.
Therefore, weigh the benefit of increasing the
commit interval against the additional time you
would spend recovering a failed session.
Click the General Options settings in the session
properties to review and adjust the commit interval.
Log Files
A workflow runs faster when you do not configure it
to write session and workflow log files.
Workflows and sessions always create binary logs.
When you configure a session or workflow to write a
log file, the Integration Service writes logging
events twice.
You can access the binary session and workflow logs in the Administrator tool.
Error Tracing
If a session contains a large number of
transformation errors, and you do not need to
correct them, set the session tracing level to Terse.
At this tracing level, the Integration Service does not
write error messages or row-level information for
reject data.
If you need to debug the mapping and you set the
tracing level to Verbose, you may experience
significant performance degradation when you run
the session. Do not use Verbose tracing when you
tune performance.
The session tracing level overrides any transformation-specific tracing levels within the mapping.
Setting the tracing level to Terse is not recommended as a long-term response to high levels of transformation errors.
Post-Session Emails
When you attach the session log to a post-session
email, enable flat file logging.
If you enable flat file logging, the Integration
Service gets the session log file from disk.
If you do not enable flat file logging, the
Integration Service gets the log events from
the Log Manager and generates the session
log file to attach to the email.
When the Integration Service retrieves the session
log from the log service, workflow performance
slows, especially when the session log file is large
and the log service runs on a different node than
the master DTM.
For optimal performance, configure the session to write to a log file when you configure post-session email to attach a session log.
Optimizing Grid Deployments Overview
When you run PowerCenter on a grid, you can
configure the grid, sessions, and workflows to use
resources efficiently and maximize scalability.
To improve PowerCenter performance on a grid,
complete the following tasks:
- Add nodes to the grid.
- Increase storage capacity and bandwidth.
- Use shared file systems.
- Use a high-throughput network when you complete
the following tasks:
1. Access sources and targets over the
network.
2. Transfer data between nodes of a grid
when using the Session on Grid option.
Storing Files
When you configure PowerCenter to run on a grid,
you specify the storage location for different types
of session files, such as source files, log files, and
cache files.
To improve performance, store files in optimal
locations.
For example, store persistent cache files on a high-bandwidth shared file system.
Different types of files have different storage
requirements.
You can store files in the following types of locations:
Shared file systems - Store files on a shared file
system to enable all Integration Service processes to
access the same files. You can store files on low-bandwidth and high-bandwidth shared file systems.
Local - Store files on the local machine running the
Integration Service process when the files do not
have to be accessed by other Integration Service
processes.
High Bandwidth Shared File System Files
Because they can be accessed often during a
session, place the following files on a high-bandwidth shared file system:
- Source files, including flat files for lookups.
- Target files, including merge files for partitioned
sessions.
- Persistent cache files for lookup or incremental
aggregation.
- Non-persistent cache files, but only for sessions that run on a grid.
This allows the Integration Service to build the cache
only once.
If these cache files are stored on a local file system,
the Integration Service builds a cache for each
partition group.
Low Bandwidth Shared File System Files
Because they are accessed less frequently during a
session, store the following files on a low-bandwidth
shared file system:
- Parameter files or other configuration related files.
- Indirect source or target files.
- Log files.
Local Storage Files
To avoid unnecessary file sharing when you use
shared file systems, store the following files locally:
- Non-persistent cache files for sessions that are not
enabled for a grid, including Sorter transformation
temporary files.
- Individual target files for different partitions when
performing a sequential merge for partitioned
sessions.
- Other temporary files that are deleted at the end of
a session run. In general, to establish this, configure
$PmTempFileDir for a local file system.
Avoid storing these files on a shared file system, even when the bandwidth is high.
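As a recap of the placement guidance above, the sketch below encodes it as a simple mapping; the paths are hypothetical examples, not Informatica defaults.

# File-placement summary; keys paraphrase the guidance above, and the
# paths are invented examples of each storage tier.
FILE_PLACEMENT = {
    "source/target flat files":            "/shared/fast",  # high-bandwidth shared FS
    "persistent lookup/aggregation caches": "/shared/fast",
    "non-persistent caches (grid sessions)": "/shared/fast",
    "parameter and configuration files":    "/shared/slow",  # low-bandwidth shared FS
    "indirect source/target files":         "/shared/slow",
    "log files":                            "/shared/slow",
    "non-persistent caches (non-grid)":     "/local/tmp",    # local disk
    "partition targets for sequential merge": "/local/tmp",
    "temporary files ($PmTempFileDir)":     "/local/tmp",
}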
OPTIMIZING THE POWERCENTER COMPONENTS
You can optimize performance of the following
PowerCenter components:
- PowerCenter repository
- Integration Service