
Informatica PowerCenter consists of three main components.

1. Informatica PowerCenter Client Tools: These are the development tools installed on the developer's
machine. These tools enable a developer to:

Define transformation process, known as mapping. (Designer)

Define run-time properties for a mapping, known as sessions (Workflow Manager)

Monitor execution of sessions (Workflow Monitor)

Manage repository, useful for administrators (Repository Manager)

Report Metadata (Metadata Reporter)

2. Informatica PowerCenter Repository: The repository is the heart of the Informatica tools. It is a
kind of data inventory where all the data related to mappings, sources, targets, etc. is kept. This is the
place where all the metadata for your application is stored. All the client tools and the Informatica Server
fetch data from the repository. An Informatica client and server without a repository is like a PC without
memory or a hard disk: it has the ability to process data but has no data to process.

The repository can be treated as the back end of Informatica.

3. Informatica PowerCenter Server: The server is where all execution takes place. It makes physical
connections to the sources and targets, fetches the data, applies the transformations defined in
the mapping, and loads the data into the target system.

What are the different types of repositories created using Informatica?


There can be any number of repositories in Informatica, although in practice the number depends on the
number of ports.
- Informatica uses the Repository Manager to manage the upgrades and updates of the complete software
system.
- The repository types in Informatica are as follows:
- Standalone Repository: a repository that functions individually and is not related to any other
repository.
- Global Repository: a centralized repository in a domain. It can contain objects that are shared
across the other repositories in the domain. The objects are shared through global shortcuts.
- Local Repository: a repository within the domain that is not the global repository. It can connect to
the global repository through global shortcuts and use the objects shared there.
- Versioned Repository: a repository, either local or global, that has version control enabled for its
objects.

What is the difference between STOP and ABORT options in Workflow Monitor?
- The STOP option stops the running session task and lets concurrent tasks in the workflow continue to run,
whereas the ABORT option terminates the task that is running.
- With STOP, the Integration Service stops reading data from the source but continues to process the
rows it has already read.
- With STOP, the Integration Service finishes writing and committing that data to the targets. ABORT
behaves the same way but has a timeout period of 60 seconds.
- If the Integration Service cannot finish processing and committing the data within the timeout, ABORT
kills the process and the session is terminated.
- STOP does not kill any process, so if a stopped session cannot finish, ABORT has to be issued.

Write the program through which the records can be updated?


- Records in Informatica can be updated without an Update Strategy transformation by relying on the key
defined for the target table.
- Define the key in the target table at the Informatica level, then connect the key and the fields to be
updated in the mapping target.
- Informatica then updates the target at the key and field level. The session must also be configured to
treat the source rows as updates.

- For example:

Suppose the target table "Customer" has the fields "Customer ID", "Customer Name" and "Customer Address".
To update "Customer Address" without an Update Strategy, define "Customer ID" as the primary key at the
Informatica level and connect only the "Customer ID" and "Customer Address" fields in the mapping.

What are the different options available for update strategy?
- Informatica processes the source data row by row as it passes through the transformations.
- By default, every row is flagged for insertion into the target table.
The options available for the update strategy are as follows:
- DD_INSERT: flags the row for insertion. Its numeric value is 0.
- DD_UPDATE: flags the row for update. Its numeric value is 1.
- DD_DELETE: flags the row for deletion. Its numeric value is 2.
- DD_REJECT: flags the row for rejection. Its numeric value is 3.
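
The flags listed above can be illustrated outside PowerCenter with a small Python sketch. This is not Informatica code: the function flag_row, the field names and the decision rules are hypothetical, and only the four numeric values come from the list above.

```python
# Numeric values of the Update Strategy constants, as listed above.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, existing_ids):
    """Return the update-strategy flag for one source row.

    Hypothetical rules: reject rows without a key, delete rows marked
    inactive, update rows whose key already exists, insert the rest.
    """
    if row.get("customer_id") is None:
        return DD_REJECT
    if row.get("active") is False:
        return DD_DELETE
    if row["customer_id"] in existing_ids:
        return DD_UPDATE
    return DD_INSERT


existing = {1, 2}  # keys assumed to be present in the target already
rows = [
    {"customer_id": 1, "active": True},
    {"customer_id": 3, "active": True},
    {"customer_id": 2, "active": False},
    {"customer_id": None, "active": True},
]
flags = [flag_row(r, existing) for r in rows]
```

In a real mapping the same decision would be written as an expression in the Update Strategy transformation, which returns one of the DD_ constants per row.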

What are the different lookup cache(s)?


- A lookup in Informatica can be cached or uncached, and a cache can be static or dynamic.
- The cache types are defined as follows:
- Static cache: the cache is not modified once it has been built. It remains the same for the whole
session run.
- Dynamic cache: the cache changes while the session runs. The Integration Service inserts or updates
records in the cache based on the source data.
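
The difference between the two cache types can be sketched in Python as follows. The class and method names are hypothetical and the model is deliberately simplified: it only shows that a static cache is read-only after the build, while a dynamic cache is inserted into or updated as source rows arrive.

```python
class StaticLookupCache:
    """Built once from the lookup source and never modified afterwards."""

    def __init__(self, lookup_rows):
        self._cache = dict(lookup_rows)

    def lookup(self, key):
        return self._cache.get(key)  # None models "no match"


class DynamicLookupCache(StaticLookupCache):
    """Also inserts or updates cache entries as source rows pass through."""

    def process(self, key, value):
        if key not in self._cache:
            self._cache[key] = value
            return "insert"
        if self._cache[key] != value:
            self._cache[key] = value
            return "update"
        return "no change"


static = StaticLookupCache([(1, "Alice")])
dynamic = DynamicLookupCache([(1, "Alice")])
results = [
    dynamic.process(2, "Bob"),     # new key
    dynamic.process(2, "Bob"),     # same key, same value
    dynamic.process(1, "Alicia"),  # existing key, changed value
]
```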

What are the conditions needed to improve the performance of Informatica Aggregator
Transformation?
- Sort the records before they are passed to the Aggregator and enable the Sorted Input option.
- Check the Aggregator properties to confirm that the Sorted Input option is selected.
- Sort the record set on the columns that are used in the GROUP BY operation.
- Where possible, sort the record set at the database level, for example in the Source Qualifier
transformation, unless the sorted records can become unsorted again before they reach the Aggregator.
- Tune the transformation according to the volume of source data that passes through it.
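
Why sorted input helps can be seen in a short Python sketch. It assumes the rows already arrive sorted on the group-by column; the column names dept and amount are hypothetical. Because the input is sorted, each group can be finished and emitted as soon as its last row has been seen, so only one group is held in memory at a time.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum 'amount' per 'dept', assuming rows arrive sorted by 'dept'."""
    for dept, group in groupby(rows, key=itemgetter("dept")):
        # groupby only merges adjacent rows, which is why sorting matters.
        yield dept, sum(r["amount"] for r in group)


rows = [
    {"dept": "A", "amount": 10},
    {"dept": "A", "amount": 5},
    {"dept": "B", "amount": 7},
]
totals = list(aggregate_sorted(rows))
```

With unsorted input the same function would emit a group more than once, which is the analogue of why the Aggregator otherwise has to cache all groups before it can output anything.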
What is the difference between Router and Filter?
- A Router transformation divides the incoming records into multiple groups based on conditions, whereas
a Filter transformation applies a single condition and does not divide the records.
- The groups of a Router can be mutually inclusive, so different groups can contain the same record,
whereas a Filter only restricts the incoming record set.
- A Router does not block any incoming record, whereas a Filter blocks the records that do not satisfy
its condition.
- A Router has a default group, whereas a Filter does not provide a default group.
- A Router sends the records that do not match any routing condition to the default group, whereas a
Filter drops the records that do not match its condition.
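
The contrast described above can be sketched in Python. The function names, group names and conditions are hypothetical; the sketch only models the behaviour listed in the bullets (one condition and dropped rows for a Filter, several possibly overlapping groups plus a default group for a Router).

```python
def filter_rows(rows, condition):
    """Filter: one condition, one output; non-matching rows are dropped."""
    return [r for r in rows if condition(r)]


def route_rows(rows, groups):
    """Router: test each row against every group condition.

    A row can land in several groups; rows matching none go to DEFAULT.
    """
    out = {name: [] for name in groups}
    out["DEFAULT"] = []
    for r in rows:
        matched = False
        for name, condition in groups.items():
            if condition(r):
                out[name].append(r)
                matched = True
        if not matched:
            out["DEFAULT"].append(r)
    return out


amounts = [5, 50, 500]
kept = filter_rows(amounts, lambda a: a >= 50)
routed = route_rows(
    amounts,
    {"LARGE": lambda a: a >= 50, "HUGE": lambda a: a >= 500},
)
```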

What does reusable transformation mean?


- A reusable transformation can be used multiple times, in different mappings.
- Reusable transformations are stored as metadata, separate from the mappings that use them.
- The same transformation logic can therefore be shared instead of being rebuilt in every mapping.
- Whenever a reusable transformation is changed, all the mappings that use it are invalidated.

What is main use of mapplet?
- A mapplet is a reusable object that is created in the Mapplet Designer.
- It contains a set of transformations.
- It lets you reuse that transformation logic in multiple mappings.

What is the difference between a connected look up and unconnected lookup?
- A connected lookup takes its input values directly from other transformations in the pipeline, whereas
an unconnected lookup does not take its inputs directly from another transformation.
- A connected lookup can use a dynamic cache, which is kept synchronized with the target, whereas an
unconnected lookup cannot use a dynamic cache.
- A connected lookup is part of the pipeline, whereas an unconnected lookup sits outside the pipeline.
- A connected lookup cannot be called from another transformation, whereas an unconnected lookup is
called like a function, using a :LKP expression, from a transformation such as an Expression.
- A connected lookup is evaluated once per row at its place in the mapping, whereas an unconnected lookup
can be called multiple times in a mapping.

What is the function of look up transformation?
- A Lookup transformation is used in a mapping to look up data related to the source data.
- The lookup source can be a flat file, a relational table, or a view.
- The Lookup transformation receives the source column values through its input ports.
- It compares those values with the lookup source according to the lookup condition and returns the
matching values.

What is the function of Union Transformation?
- The Union transformation is a multiple input group transformation.
- It merges data from multiple pipelines or sources into a single pipeline.
- Each input group receives the data of one source, and the merged data is passed on for further
processing.
- It works like the UNION ALL statement in SQL, which combines the results of two or more SELECT
statements. Like UNION ALL, it does not remove duplicate rows.

What is the use of Incremental Aggregation?
- Incremental aggregation is enabled with a session option for a mapping that contains an Aggregator
transformation.
- When it is enabled, PowerCenter performs the aggregation incrementally on the source data.
- Only the new source data is passed through the mapping.
- The aggregate values from earlier runs are kept in the cache and are combined with the new data, so the
calculations do not start again from the full source.
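
The idea can be sketched in a few lines of Python. This is a simplified model, not PowerCenter behaviour in detail: the dictionary cache stands in for the aggregate cache kept between session runs, and the function name and sample keys are hypothetical.

```python
def incremental_aggregate(cache, new_rows):
    """Fold only the new source rows into the previously cached totals."""
    for key, amount in new_rows:
        cache[key] = cache.get(key, 0) + amount
    return cache


cache = {}
incremental_aggregate(cache, [("A", 10), ("B", 4)])  # first run
incremental_aggregate(cache, [("A", 5), ("C", 1)])   # later run, new rows only
```

The second call never re-reads the rows of the first run; it relies on the cached totals, which is the saving that incremental aggregation aims for.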

What is the function of aggregator transformation?


- The Aggregator transformation performs aggregate calculations such as averages and sums.
- Unlike the Expression transformation, which calculates values row by row, the Aggregator performs
calculations on groups of rows.
- The groups are defined by the group by ports, which indicate how the data is to be grouped.
- The grouping and the aggregate operations are configured in the transformation properties.
- The available aggregate functions are: AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN,
PERCENTILE, STDDEV, SUM, and VARIANCE.

What is meant by Query Override?
- A query override is done in the Source Qualifier when the default query is replaced by another query.
- PowerCenter generates a default query for each Source Qualifier transformation.
- The query is generated and executed when the session runs.
- The default query is a SELECT statement.
- The statement covers the sources and the columns used in the mapping, as set in the transformation
properties.

What is the use of source qualifier?
- The Source Qualifier represents the rows that the PowerCenter Server reads when it runs a session.
- It is used with relational and flat file sources.
- When a relational or flat file source is added to a mapping, it must be connected to a Source
Qualifier transformation.
- The rows read from the source pass through the Source Qualifier to the rest of the mapping.

What are different types of transformations available in Informatica?
- A transformation performs operations or calculations on the data.
The different types of transformation available in Informatica include the following:
- Aggregator: performs aggregate calculations on the data and returns the aggregate result.
- Application Source Qualifier: represents the rows read from an application source.
- Expression: calculates a value row by row using expressions.
- External Procedure: calls a procedure that is defined outside Informatica.

What is the way to execute PL/SQL script using Informatica mapping?


- A Stored Procedure transformation is used to execute a PL/SQL script from a mapping.
- The PL/SQL script is created as a stored procedure in the database.
- The procedure name is specified in the Stored Procedure transformation.
- The transformation is connected in the mapping according to how the procedure is to be used.
- When the session is executed, it calls the procedure.

What is the use of code page?


- A code page contains the encoding that specifies the characters in a set of one or more languages.
- The code page is selected based on the source of the data.
- The code page is chosen so that the program or application recognizes the specific character set of
the data.
- It describes the characters that the application recognizes, which affects how the application stores,
receives, and sends character data.

Types of Error or Exception Handling in Informatica:


Types of errors in the ETL job: when you run a session, the PowerCenter Integration Service can
encounter fatal or non-fatal errors. Typical error handling includes:

User Defined Exceptions: Data issues critical to the data quality, which might get
loaded to the database unless explicitly checked for quality. For example, a credit
card transaction with a future transaction date can get loaded into the database
unless the transaction date of every record is checked.

Non-Fatal Exceptions: Errors that Informatica PowerCenter would ignore and that
cause records to drop out of the target table unless they are handled in the ETL logic.
For example, a data conversion transformation errors out and the record fails to
load into the target table.

Fatal Exceptions: Errors, such as database connection errors, which force
Informatica PowerCenter to stop running the workflow.

I. User Defined Exceptions


Business users define the user defined exceptions, which are critical to the data quality. We can set up
user defined error handling using:
1. Error Handling Functions.
2. User Defined Error Tables.

1. Error Handling Functions


We can use two functions provided by Informatica PowerCenter to define our user defined error
capture logic.
ERROR(): This function causes the PowerCenter Integration Service to skip a row and issue an
error message that you define. The error message is displayed in the session log or written to the
error log tables, depending on the error logging type configured in the session.

You can use ERROR in Expression transformations to validate data. Generally, you use ERROR within an
IIF or DECODE function to set rules for skipping rows.
E.g.: IIF(TRANS_DATA > SYSDATE, ERROR('Invalid Transaction Date'))
The above expression raises an error and drops any record whose transaction date is greater than the
current date from the ETL process and the target table.
ABORT(): Stops the session and issues a specified error message, which is written to the session log
file or to the error log tables, depending on the error logging type configured in the session.
When the PowerCenter Integration Service encounters an ABORT function, it stops transforming
data at that particular row. It processes any rows read before the session aborts.
You can use ABORT in Expression transformations to validate data.
E.g.: IIF(ISNULL(LTRIM(RTRIM(CREDIT_CARD_NB))), ABORT('Empty Credit Card Number'))
The above expression aborts the session if any transaction record arrives without a
credit card number.
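
The difference between the two functions can be modelled in a short Python sketch. This is an illustration only, not Informatica code: run_session is a hypothetical function, the dictionary keys simply reuse the column names from the two expressions above, and the status strings are made up. ERROR is modelled as skipping one row, ABORT as stopping at the offending row while keeping the rows processed before it.

```python
from datetime import date


def run_session(rows, today):
    """Process rows in order; skip bad dates, stop on a missing card number."""
    loaded, skipped = [], []
    for row in rows:
        if not (row.get("CREDIT_CARD_NB") or "").strip():
            # Like ABORT(...): stop here, rows handled so far are kept.
            return loaded, skipped, "aborted: Empty Credit Card Number"
        if row["TRANS_DATA"] > today:
            # Like ERROR(...): skip this row only and carry on.
            skipped.append(row)
            continue
        loaded.append(row)
    return loaded, skipped, "succeeded"


today = date(2024, 1, 15)  # fixed date so the example is repeatable
rows = [
    {"CREDIT_CARD_NB": "4111", "TRANS_DATA": date(2024, 1, 10)},
    {"CREDIT_CARD_NB": "4222", "TRANS_DATA": date(2024, 2, 1)},
    {"CREDIT_CARD_NB": "   ", "TRANS_DATA": date(2024, 1, 11)},
    {"CREDIT_CARD_NB": "4333", "TRANS_DATA": date(2024, 1, 12)},
]
loaded, skipped, status = run_session(rows, today)
```

Here the first row is loaded, the second is skipped because of its future date, and the third stops the run, so the fourth row is never processed.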

Error Handling Function Use Case


The Expression transformation is configured with the ABORT() and ERROR() functions, using the
expressions shown in the examples above.

Note: You need to use these two functions in a mapping along with a session configuration for row
error logging to capture the error data from the source system. Depending on the session
configuration, source data will be collected into the Informatica predefined PMERR error tables or files.
Please refer to the article "User Defined Error Handling in Informatica PowerCenter" for more detailed
implementation information on user defined error handling.

2. User Defined Error Tables


Error handling functions are easy to implement with very little coding effort, but they have some
disadvantages, such as the poor readability of the error records in the PMERR tables and the
performance impact. To avoid these disadvantages, you can create your own error log tables and
capture the error records in them.
A typical approach is to create an error table that is similar in structure to the source table.
The error table includes all the columns from the source table plus additional columns that tag the
status of each error record, such as "error fixed" and "processed".

At a high level, a typical ETL design reads the error data from the error table along with the source
data. During the data transformation, data quality is checked and any record violating the quality check
is moved to the error table.
Record flags are used to identify the records that have been reprocessed and the records that are fixed
and ready for reprocessing.
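
The design described above can be sketched in Python with in-memory lists standing in for the tables. Everything named here is an assumption made for illustration: the function split_by_quality, the flag names error_fixed and processed, and the sample quality rule.

```python
def split_by_quality(source_rows, error_rows, is_valid):
    """Read source rows plus fixed error rows; route failures to the error table."""
    # Reprocess only error records that were fixed and not yet reprocessed.
    retry = [e for e in error_rows if e["error_fixed"] and not e["processed"]]
    for e in retry:
        e["processed"] = True
    good, new_errors = [], []
    for row in list(source_rows) + [e["data"] for e in retry]:
        if is_valid(row):
            good.append(row)
        else:
            new_errors.append(
                {"data": row, "error_fixed": False, "processed": False}
            )
    return good, new_errors


source = [{"amount": 10}, {"amount": -1}]
errors = [
    {"data": {"amount": 3}, "error_fixed": True, "processed": False},
    {"data": {"amount": -9}, "error_fixed": False, "processed": False},
]
good, new_errors = split_by_quality(source, errors, lambda r: r["amount"] >= 0)
```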

II. Non-Fatal Exceptions


Non-fatal exceptions cause records to be dropped during the ETL process, which makes handling them
critical to data quality. You can handle non-fatal exceptions using:
1. Default Port Value Setting.
2. Row Error Logging.
3. Error Handling Settings.

1. Default Port Value Setting


Using the default value property is a good way to handle exceptions due to NULL values and
unexpected transformation errors. The Designer assigns default values to handle null values and
output transformation errors. PowerCenter Designer lets you override the default value in input, output
and input/output ports.
The default value property behaves differently for different port types:

Input ports: Use default values if you do not want the Integration Service to treat null values
as NULL.

Output ports: Use default values if you do not want to skip the row due to a transformation
error or if you want to write a specific message with the skipped row to the session log.

Input/output ports: Use default values if you do not want the Integration Service to treat null
values as NULL. You cannot, however, set user-defined default values for output transformation errors
in an input/output port.

Default Value Use Case


Use Case 1
This setting handles NULL values. It converts any NULL value returned by the dimension lookup to the
default value -1. This technique can be used to handle late arriving dimensions.

Use Case 2
This setting uses a default expression to convert the date if the incoming value is not in a valid
date format.
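
The effect of the two use cases can be sketched in Python. Only the default key -1 comes from the text; the fallback date, the date format and the function names are assumptions chosen for the example.

```python
from datetime import date, datetime

DEFAULT_KEY = -1                 # default value from Use Case 1
DEFAULT_DATE = date(1900, 1, 1)  # hypothetical fallback for Use Case 2


def dimension_key(lookup_result):
    """Replace a NULL lookup result with the default key."""
    return DEFAULT_KEY if lookup_result is None else lookup_result


def parse_date(text, fmt="%Y-%m-%d"):
    """Return the parsed date, or the default when the value is not valid."""
    try:
        return datetime.strptime(text, fmt).date()
    except (TypeError, ValueError):
        return DEFAULT_DATE
```

For example, dimension_key(None) gives -1, and parse_date("2024-02-30") falls back to the default date because 30 February does not exist.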

2. Row Error Logging


Row error logging helps capture any exception that was not considered during the design and
not coded for in the mapping. It is the perfect way of capturing unexpected errors.
With row error logging enabled in the session error handling settings, any unhandled error is captured
in the PMERR tables.

Please refer to the article "Error Handling Made Easy Using Informatica Row Error Logging" for more
details.

3. Error Handling Settings


The error handling properties at the session level include options such as Stop On Errors, On Stored
Procedure Error, On Pre-Session Command Task Error and Pre-Post SQL Error. You can use these
properties to ignore such errors or to make the session fail when they occur.

Stop On Errors: Indicates how many non-fatal errors the Integration Service can encounter
before it stops the session.

On Stored Procedure Error: If you select Stop Session, the Integration Service stops the
session on errors executing a pre-session or post-session stored procedure.

On Pre-Session Command Task Error: If you select Stop Session, the Integration Service
stops the session on errors executing pre-session shell commands.

Pre-Post SQL Error: If you select Stop Session, the Integration Service stops the session
on errors executing pre-session or post-session SQL.
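
The Stop On Errors threshold can be pictured with a small Python sketch. The function name, the use of ValueError as the "non-fatal error" and the status strings are hypothetical, and treating a threshold of 0 as "never stop" is an assumption of this sketch.

```python
def run_with_error_threshold(rows, transform, stop_on_errors):
    """Stop once the non-fatal error count reaches the threshold.

    A threshold of 0 is treated here as "never stop on non-fatal errors".
    """
    loaded, errors = [], 0
    for row in rows:
        try:
            loaded.append(transform(row))
        except ValueError:
            errors += 1  # the row is dropped, the run continues
            if stop_on_errors and errors >= stop_on_errors:
                return loaded, errors, "failed"
    return loaded, errors, "succeeded"


rows = ["1", "x", "2", "y", "3"]
strict = run_with_error_threshold(rows, int, stop_on_errors=2)
lenient = run_with_error_threshold(rows, int, stop_on_errors=0)
```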

III. Fatal Exceptions


A fatal error occurs when the Integration Service cannot access the source, target, or
repository. When the session encounters a fatal error, the PowerCenter Integration Service
terminates the session.
To handle fatal errors, you can either use a restartable ETL design for your workflow or use
the workflow recovery features of Informatica PowerCenter:
1. Restartable ETL Design
2. Workflow Recovery

1. Restartable ETL Design


Restartability is the ability to restart an ETL job if a processing step fails to execute properly. This
avoids the need for any manual cleanup before a failed job can restart. You want the ability to
restart processing at the step where it failed as well as the ability to restart the entire ETL session.
The restartability approaches below support the commonly used ETL job types:

1. Slowly Changing Dimension
2. Fact Table
3. Snapshot Table
4. Current State Table
5. Very Large Table

1. Slowly Changing Dimension Load


The high level steps required for the SCD loading ETL job are listed below.
Key Design Factor: The key aspect of this design is the CHECKSUM number comparison. In
this design, any incoming record with the same CHECKSUM number as the active record in the
target is not processed into the dimension table. Hence we can restart the process without
impacting the partially processed data.

Step 1: Read all the data from the staging table. This includes joining data from
different tables and applying any incremental data capturing logic.
Step 2: Compare the data between source and target to identify any change in any of the
attributes. The CHECKSUM number can be used to make this process simple.
Step 3: If the CHECKSUM number is different, process the data further; otherwise ignore it.
Step 4: Do any transformation required, including the error handling.
Step 5: Load the data into the dimension table.
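
Steps 2 and 3 can be sketched in Python. The text does not say how the CHECKSUM number is computed, so using an MD5 hash of the tracked attributes is an assumption, as are the function names and the sample attributes.

```python
import hashlib


def checksum(record, attributes):
    """Hash the tracked attributes of a record into one comparable value."""
    joined = "|".join(str(record[a]) for a in attributes)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def rows_to_process(staging_rows, active_checksums, attributes):
    """Keep only rows whose checksum differs from the active target row."""
    return [
        row
        for row in staging_rows
        if active_checksums.get(row["id"]) != checksum(row, attributes)
    ]


attrs = ["name", "city"]
staging = [
    {"id": 1, "name": "Ann", "city": "Oslo"},
    {"id": 2, "name": "Bob", "city": "Rome"},
]
# Row 1 was already loaded by an earlier, partially completed run.
active = {1: checksum(staging[0], attrs)}
first_pass = rows_to_process(staging, active, attrs)

# After row 2 is loaded, a restart finds nothing left to do.
active[2] = checksum(staging[1], attrs)
second_pass = rows_to_process(staging, active, attrs)
```

The second pass returning no rows is what makes the job safe to restart: records that already reached the target compare equal and are ignored.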

2. Fact Table Load


The high level design for the fact table load is as follows.
Key Design Factor: In this design, FACT table records are loaded into a TEMP table and then into
the actual table. The truncate and reload design for the TEMP table gives the restartability.
Note: Data movement from the TEMP table to the FACT table is assumed to be very unlikely to fail.
Any error in this process will require manual intervention.

Step 1: Read all the data from the source table. This includes joining data from
different tables and applying any incremental data capturing logic.
Step 2: Perform any transformation required, including the error handling.
Step 3: Load the data into the TEMP table.
Step 4: Load the data from the TEMP table into the FACT table. This can be done using either a
database script or an Informatica PowerCenter session.
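
The truncate and reload idea can be tried out with Python's built-in sqlite3 module. The table and column names are hypothetical, an in-memory SQLite database stands in for the warehouse, and DELETE is used because SQLite has no TRUNCATE statement.

```python
import sqlite3


def load_temp(conn, rows):
    """Steps 1 to 3: empty the TEMP table, then reload it completely."""
    conn.execute("DELETE FROM fact_temp")  # SQLite has no TRUNCATE
    conn.executemany("INSERT INTO fact_temp (id, amount) VALUES (?, ?)", rows)
    conn.commit()


def move_to_fact(conn):
    """Step 4: copy the TEMP rows into the fact table."""
    conn.execute("INSERT INTO fact (id, amount) SELECT id, amount FROM fact_temp")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_temp (id INTEGER, amount REAL)")
conn.execute("CREATE TABLE fact (id INTEGER, amount REAL)")

# Leftover row from a run that failed halfway through the TEMP load.
conn.execute("INSERT INTO fact_temp VALUES (1, 9.5)")
conn.commit()

# The restart simply runs the load again; the leftover row is wiped first.
load_temp(conn, [(1, 9.5), (2, 3.0)])
move_to_fact(conn)
fact_rows = conn.execute("SELECT id, amount FROM fact ORDER BY id").fetchall()
```

Because the TEMP table is emptied at the start of every run, a failure before Step 4 never leaves duplicates behind.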

3. Snapshot Table Load


We often create snapshot tables and build reporting on top of them. This restartability
technique is appropriate for such scenarios. The high level steps are given below.
Key Design Factor: The truncate and load design gives the restartability.

The detailed steps are as follows.

Step 1: Read all the data from the source table. This includes joining data from
different tables and applying any incremental data capturing logic.
Step 2: Truncate the data from the target table.
Step 3: Perform any transformation required, including the error handling.
Step 4: Load the data into the target table.

4. Current State Table Load


Just as in SCD Type 1, there are scenarios where you are interested in keeping only the latest state of
the data. Here we discuss a very common and simple approach to achieve restartability for
such scenarios.

Key Design Factor: The update else insert design gives the restartability.

The steps are as follows.

Step 1: Read all the data from the source table. This includes joining data from
different tables and applying any incremental data capturing logic.
Step 2: Identify the records for INSERT/UPDATE and perform any transformations that are required,
including the error handling.
Step 3: Insert the records that are identified for insert.
Step 4: Update the records that are identified for update.
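
The update else insert design can be sketched with Python's sqlite3 module. The table current_state, its columns and the function name are hypothetical; the point of the sketch is that running the same load twice leaves the table in the same state.

```python
import sqlite3


def update_else_insert(conn, rows):
    """Update the row when its key exists, otherwise insert it."""
    for key, state in rows:
        cur = conn.execute(
            "UPDATE current_state SET state = ? WHERE id = ?", (state, key)
        )
        if cur.rowcount == 0:  # no existing row was updated
            conn.execute(
                "INSERT INTO current_state (id, state) VALUES (?, ?)",
                (key, state),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE current_state (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO current_state VALUES (1, 'new')")
conn.commit()

batch = [(1, "open"), (2, "closed")]
update_else_insert(conn, batch)
after_first = conn.execute("SELECT id, state FROM current_state ORDER BY id").fetchall()

update_else_insert(conn, batch)  # simulated restart with the same data
after_restart = conn.execute("SELECT id, state FROM current_state ORDER BY id").fetchall()
```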



5. Very Large Table Load


The approach discussed here is appropriate for loading a very large snapshot table that is
required to be available 24/7.
Key Design Factor: Switching the tables using the RENAME DDL command.

Step 1: Read all the data from the source table. This includes joining data from
different tables and applying any incremental data capturing logic.
Step 2: Perform any transformations that are required, including the error handling.
Step 3: Load the data into the TEMP table.
Step 4: Rename the TEMP table to the target table name. This makes the data loaded into the TEMP table
available as the actual target table.
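
The table switch in Step 4 can be tried out with Python's sqlite3 module. The table names snapshot, snapshot_temp and snapshot_old are hypothetical, and keeping the previous table under an "old" name is a choice made for this sketch; the exact RENAME syntax depends on the database in use.

```python
import sqlite3


def swap_tables(conn):
    """Replace the target table with the freshly loaded TEMP table."""
    conn.execute("DROP TABLE IF EXISTS snapshot_old")
    conn.execute("ALTER TABLE snapshot RENAME TO snapshot_old")
    conn.execute("ALTER TABLE snapshot_temp RENAME TO snapshot")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshot (id INTEGER, label TEXT)")
conn.execute("CREATE TABLE snapshot_temp (id INTEGER, label TEXT)")
conn.execute("INSERT INTO snapshot VALUES (1, 'old')")
conn.executemany(
    "INSERT INTO snapshot_temp VALUES (?, ?)", [(1, "new"), (2, "new")]
)
conn.commit()

swap_tables(conn)
current = conn.execute("SELECT id, label FROM snapshot ORDER BY id").fetchall()
```

Readers keep querying the target table while the TEMP table is being loaded, and the switch itself is only a rename, which is why the approach suits tables that must stay available.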

2. Workflow Recovery (Checkpoints in SSIS)


Workflow recovery allows you to continue processing the workflow and workflow tasks from the point
of interruption. During the workflow recovery process, the Integration Service accesses the workflow
state, which is stored in memory or on disk based on the recovery configuration. The workflow state of
operation includes the status of tasks in the workflow and workflow variable values.