Transaction:
An Enterprise Data Warehouse is a relational DB which is specially designed for analyzing the business, making decisions to achieve the business goals and responding to business problems, but it is not designed for business transaction processing.
A data warehouse is a concept of consolidating the data from multiple OLTP databases.
Relational databases are classified into three ranges:
1.Low-range DB:
Example: MS Access
2.Mid-range DB:
Recommended for data warehousing for small and medium scale enterprises, with storage capacity in gigabytes.
3.High-range DB:
Can organize and manage terabytes and petabytes of information.
Example: Teradata, Netezza, Greenplum, Hadoop
There are two types of data storage patterns which are supported by relational DBs:
1.Shared-everything architecture (processors share a common memory and disk)
2.Shared-nothing architecture (every processor has dedicated memory and disk that is not shared by another processor)
Characteristics of shared-nothing (high-range) databases:
1.Unlimited scalability
2.Designed only for building enterprise data warehouses, not for OLTP
Example: Teradata, Netezza, Hadoop, Greenplum
Enterprise DWH Database Evaluation:
1.Database that supports enormous storage capacity (billions of rows and terabytes of data)
2.Database that supports mature optimizers to handle complex SQL queries (runs the queries faster with less system resource usage)
3.100% data availability, without data loss, even when software/hardware components are down
4.Database that supports low TCO (Total Cost of Ownership): easy to set up, administer and manage
5.A single DB server that can provide access to hundreds of users concurrently
Data Acquisition:
It is the process of extracting the data from multiple source systems, transforming the data into a consistent format and loading it into a target system. To implement the ETL process we need ETL tools.
ETL applications are developed using a simple graphical user interface with point-and-click features.
Data Cleansing:
It is the process of identifying and correcting or removing inaccurate and inconsistent data before loading it into the target.
Data Merging:
1.Join
2.Union
Data warehouse:
1.A data warehouse is a relational DB that is used to store the historical data for query and analysis.
SOR --> System of Record:
A computer system that stores time-sensitive, transaction-related data that is processed immediately and is always kept current.
A data warehouse contains two types of tables:
1. Dimension Table
2. Fact Table
1. Dimension Table:
Examples: Customer, Product, Stores, Employees, Promotions, Time
2. Fact Table:
Examples: Sales, Purchase, Inventory
Banking examples of fact tables:
1. SA_LoanTransaction Fact
2. CC_Transaction Fact
3. CC_Statement Fact
A fact table consists of keys and measures; together the keys form a composite primary key.
Store Key   Prod Key   Date Key   Revenue
S1          P1         D1         3000
S1          P2         D1         2000
S2          P1         D1         2000
(Store Key, Prod Key and Date Key together form the composite primary key; Revenue is the measure.)
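As a sketch, such a fact table could be declared in Oracle with a composite primary key (the table and column names here are illustrative):
CREATE TABLE SALES_FACT(
STORE_KEY VARCHAR2(10),
PROD_KEY  VARCHAR2(10),
DATE_KEY  VARCHAR2(10),
REVENUE   NUMBER(10,2),
-- the three dimension keys together form the composite primary key
CONSTRAINT PK_SALES_FACT PRIMARY KEY (STORE_KEY, PROD_KEY, DATE_KEY)
);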
3.Factless Fact table: acts as a bridge between the dimension tables.
Example of a factless fact table: Employee Attendance factless fact.
Dimension Tables:
Auditorium: Aud Id, Aud Name, Aud Type, Aud Mgr, Aud Address
Sponsors: Sponsor Id, Sponsor Name, Contribution, Address
Time: Date Key, Month Key, Qtr, Year
Participant: Participant Id, Participant Name, Gender, Address
Events: Event Id, Event Name, Event Type, Event Desc
Fact Table:
Aud Id   Sponsor Id   Participant Id   Event Id
A1       S1           P1               E1
A1       S1           P2               E1
A2       S1           P3               E1
2. Snapshot Fact table:
It consists of semi-additive facts and non-additive facts; it describes the state of things at a particular instant of time.
Types of Facts:
1. Additive Facts
2. Semi-Additive Facts
3. Non-Additive Facts
1.Additive Fact: business measurements in a fact table that can be summed up across all of the dimension keys.
Fact Table
Store Key Prod Key Date Key Revenue
S1 P1 12-Jan-15 600
S1 P2 12-Jan-15 400
S2 P2 12-Jan-15 800
S2 P3 13-Jan-15 500
S3 P1 13-Jan-15 700
S3 P3 14-Jan-15 900
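Because Revenue is additive, it can be rolled up along any dimension key; for example (an illustrative query, assuming the fact table above is named SALES_FACT):
-- total revenue per store, summed across products and dates
SELECT STORE_KEY, SUM(REVENUE) AS TOTAL_REVENUE
FROM SALES_FACT
GROUP BY STORE_KEY;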
Semi-Additive Fact: business measurements in a fact table that can be summed up across only some of the dimension keys.
Acct Id   Transaction Date   Balance   Profit Margin
21653     12-Jan-15          700000    -
21654     12-Jan-15          400000    -
21653     13-Jan-15          900000    -
21654     13-Jan-15          600000    -
Reports:
Balance by Acct Id (summing Balance across dates gives a wrong result; the meaningful value is the latest balance):
Acct Id   SUM(Balance)   Latest Balance
21653     1600000        900000
21654     1000000        600000
Balance by Date (summing Balance across accounts is valid):
Date Key    Balance
12-Jan-15   1100000
13-Jan-15   1500000
The above example shows a semi-additive fact: Balance is additive across accounts but not across time.
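In SQL terms (an illustrative sketch, assuming the table above is named ACCT_BALANCE with columns ACCT_ID, TRAN_DATE and BALANCE), only the first of these two roll-ups is meaningful:
-- valid: balance is additive across accounts for a given date
SELECT TRAN_DATE, SUM(BALANCE) FROM ACCT_BALANCE GROUP BY TRAN_DATE;
-- not meaningful: summing an account's balance across dates
SELECT ACCT_ID, SUM(BALANCE) FROM ACCT_BALANCE GROUP BY ACCT_ID;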
3.Non-Additive Fact: business measurements in a fact table that cannot be summed up across any dimension keys.
Note: In a fact table, percentages are always non-additive.
SEM1    80%
SEM2    60%
TOTAL   140%  (wrong)
Types of Dimensions:
1. Conformed Dimension
2. Degenerate Dimension
3. Shrunken Dimension
4. Junk Dimension
5. Dirty Dimension
Conformed Dimension: a dimension that is shared across multiple fact tables is called a conformed dimension; it is also the dimension used to join data marts.
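For example, one conformed Time dimension can serve two different fact tables or data marts (a hypothetical sketch; the table names are illustrative):
-- the same TIME_DIM joins both the sales and the inventory facts
SELECT T.YEAR, SUM(S.REVENUE)
FROM TIME_DIM T JOIN SALES_FACT S ON S.DATE_KEY = T.DATE_KEY
GROUP BY T.YEAR;
SELECT T.YEAR, SUM(I.QTY)
FROM TIME_DIM T JOIN INVENTORY_FACT I ON I.DATE_KEY = T.DATE_KEY
GROUP BY T.YEAR;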
Banking Domain example: a Customer or Time dimension shared by the SA_LoanTransaction, CC_Transaction and CC_Statement fact tables shown earlier is a conformed dimension.
Degenerate Dimension:
If a fact table acts as a dimension and is shared with another fact table, or maintains a foreign key in another fact table, such a table is called a degenerate dimension.
Shrunken Dimension:
A shrunken dimension is a subset of a conformed dimension (fewer rows and/or columns).
Cardinality:
Cardinality is the number of unique values in a column; alternatively, cardinality expresses the minimum and maximum number of instances of an entity 'B' that can be associated with an instance of entity 'A'.
Dirty Dimension:
If a record occurs more than once in a table, differing only in non-key attributes, such a table is called a dirty dimension.
Orders:
Order Id   Order Date   Id   Order Amount
111        -            1    -
112        -            1    -
113        -            2    -
114        -            1    -
115        -            1    -
116        -            3    -
117        -            1    -
Slowly Changing Dimensions (SCD):
1.SCD Type-I
2.SCD Type-II
3.SCD Type-III
1. SCD Type-I: maintains only the current data; changed attributes are overwritten, so no history is kept.
SCD Type-II: maintains the current data along with the complete history of changes.
SCD Type-II example:
PRODUCTS
PID   PNAME   PRICE   EFF_DATE
11    ABC     300     12-JAN-10
12    PQR     270     15-JAN-10
(The price of product 12 changed to 199 on 27-AUG-11.)
SCD Type-III example (current and previous location columns):
Target after the initial load:
CKEY   CID   CNAME   CURR_LOC   PREV_LOC
101    11    BEN     HYD        -
102    12    TOM     CHE        -
Source with changes (TOM moves from CHE to BNG):
CID   CNAME   LOC
11    BEN     HYD
12    TOM     BNG
Target after the first incremental load:
CKEY   CID   CNAME   CURR_LOC   PREV_LOC
101    11    BEN     HYD        -
102    12    TOM     BNG        CHE
Target after the next change (BEN moves from HYD to KER):
CKEY   CID   CNAME   CURR_LOC   PREV_LOC
101    11    BEN     KER        HYD
102    12    TOM     BNG        CHE
Role Play Dimension: a dimension that is recycled for multiple applications within the same database (e.g., a Date dimension reused as Order Date and Ship Date).
Data Modeling:
Model: a business presentation of the structure of the data in one or more databases.
OLTP: the ER model is used; the model is normalized.
Types of Schema:
1.Star Schema
2.Snowflake Schema
3.Galaxy Schema
1.Star Schema: in a star schema, the centre of the star is a fact table and the corners are dimension tables.
Customer
Cid Cname Gender Geoid
11 C1 1 111
12 C2 1 111
13 C3 0 112
14 C4 1 111
Geography
Geoid City State Country Region
111 Hyd Ts India Asia
112 VSP Ap India Asia
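A query against this schema resolves the customer's geography through the GEOID key (an illustrative join):
SELECT C.CNAME, G.CITY, G.STATE, G.COUNTRY
FROM CUSTOMER C
JOIN GEOGRAPHY G ON C.GEOID = G.GEOID;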
Types of Indexes:
1.B*Tree Index (suited to high-cardinality columns)
2.Bitmap Index (suited to low-cardinality columns such as Gender)
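As a sketch, in Oracle a bitmap index fits a low-cardinality column such as GENDER in the Customer table above, while a B*Tree index fits a high-cardinality key (the index names are illustrative):
CREATE BITMAP INDEX IDX_CUST_GENDER ON CUSTOMER(GENDER);
CREATE INDEX IDX_CUST_GEOID ON CUSTOMER(GEOID);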
STEP1:
=============================================================================
STATE_ID,STATE_NAME,COUNTRY_ID
250.00,Rio Negro,111
251.00,Buenos Aires,111
252.00,Victoria,115
253.00,South Australia,115
254.00,Queensland,115
255.00,Northern Territory,115
258.00,Sao Paulo,110
259.00,Santa Catarina,110
260.00,Rio de Janeiro,110
=============================================================================
C:\SOURCE\States.txt
STEP2:
Create table in the oracle data base using the script given below
CREATE TABLE STATES(STATE_ID NUMBER(5,2),STATE_NAME VARCHAR2(25),COUNTRY_ID
NUMBER(3));
STEP3:
Create a control file (saved, for example, as C:\SOURCE\States.ctl) with the script below:
LOAD DATA
INFILE 'C:\SOURCE\States.txt'
INTO TABLE STATES
FIELDS TERMINATED BY ','
(STATE_ID, STATE_NAME, COUNTRY_ID)
STEP4:
Run SQL*Loader from the Oracle BIN directory, then exit:
C:\oracle\product\10.2.0\db_1\BIN>exit
STEP5:
Verify that the data was loaded into the STATES table, as sketched below.
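A typical invocation for these two steps might look as follows (assuming the control file above is saved as C:\SOURCE\States.ctl and the demo scott/tiger account is used):
C:\oracle\product\10.2.0\db_1\BIN>sqlldr userid=scott/tiger control=C:\SOURCE\States.ctl
SQL> SELECT COUNT(*) FROM STATES;   -- should report the 9 loaded rows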
Informatica PowerCenter is a data integration tool from Informatica Corporation, which was founded in 1993 in Redwood City, California.
Informatica PowerCenter is a GUI-based data integration platform which can access the data from various types of source systems and transform the data into a universal format.
Informatica Products:
1.Power Center Designer: a GUI-based client component which allows you to design ETL applications, known as mappings. It is used to create the following objects:
1.Source Definition
2.Target Definition
3.Mapping
2.Power Center Workflow Manager: a GUI-based client component which allows you to create the following objects:
1.Session
2.Work flow
3.Schedulers
Session:
A session is a set of instructions that tells the Integration Service how and when to move the data from a source to a target.
Work flow:
A workflow is a set of instructions that tells the server how to execute the session tasks.
1.Sequential batch process: the workflow executes the session tasks one after another; this is recommended when there is a dependency between data loads.
2.Concurrent batch process: the workflow executes the session tasks all at once; this is recommended when there is no dependency between data loads.
3.Power Center Workflow Monitor:
a) It is a GUI-based client component which allows you to monitor the execution of sessions and workflows on the ETL server, reporting:
Number of records extracted
Number of records rejected
Number of records loaded
Throughput: the efficiency, or the rate at which records are extracted and loaded per second.
Step3: Design Mapping (ETL application with or without business rules)
4.Power Center Repository Manager: a GUI-based administrative client which is used to perform the following tasks.
Repository:
A repository is the brain of the ETL system; it stores the ETL objects, i.e. metadata.
Repository Service:
A Power Center client component connects to the repository DB using the Repository Service.
Integration Service:
It provides the run-time environment where ETL objects are executed; the Integration Service creates logs and saves them in the repository database through the Repository Service.
The Integration Service uses three components to run a session:
1.Reader
2.DTM
3.Writer
Reader: connects to the source and extracts the data from tables, files, etc.
Data Transformation Manager (DTM): processes the data according to the business rules that you configured in the mapping.
Writer: connects to the target system and loads the data into the tables or files.
Note: The log is created by the Integration Service and saved in the repository; that log can be accessed from the Workflow Monitor.
1.Informatica PowerCenter has the ability to scale services and share resources across multiple machines.
2.The Power Center domain is the primary unit for managing and administrating application services (PCRS, PCIS).
3.A Power Center domain is a collection of one or more nodes.
4.The node which hosts the domain is known as the primary node or master gateway node.
5.If the master gateway node fails, users' requests can't be processed.
6.Hence it is recommended to configure more than one node as a master gateway node.
7.If a worker node fails, the requests can be distributed to the other nodes [High Availability].
Administration Console:
1.It is an administrative web client which is used to manage and administrate the Power Center domain:
a) creation of users and groups
Create User:
SQL> SHOW USER
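A minimal user-creation sketch in SQL*Plus (run as a DBA; the user name BATCH7AM is illustrative, chosen to match the connection names used later):
SQL> CREATE USER BATCH7AM IDENTIFIED BY BATCH7AM;
SQL> GRANT CONNECT, RESOURCE TO BATCH7AM;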
A Target Definition is created using the Target Designer tool in the Designer client component.
3.Creation of Mapping: a mapping defines the ETL flow from
a) Source (E)
b) Business Rule (T)
c) Target (L)
4.Creation of Session:
It is created using the Task Developer tool in the Workflow Manager client component, where you configure:
a) Source Connection
b) Target Connection
c) Load Type
Creation of Reader Connection (Oracle):
From the client Power Center Workflow Manager, select the Connections menu, click on Relational, select the type Oracle, click on New and enter the following details.
Creation of Writer Connection (Oracle):
From the client Power Center Workflow Manager, select the Connections menu, click on Relational, select the type Oracle, click on New and enter the following details.
1.Double-click the session and select the Mapping tab.
2.From the left window select the source; from the Connections section click on the down arrow to open the relational connection browser and select the connection ORACLE_SCOTT_DB.
3.From the left window select the target; from the Connections section click on the down arrow to open the relational connection browser, select the connection ORACLE_BATCH7AM_DB and click OK.
4.From the Properties section select Target Load Type = Normal, click Apply and click OK.
From the client Power Center Workflow Manager, select the Tools menu and click on Workflow Designer.
2.From the Workflows menu select Create and enter the workflow name w_s_flatmapping_oracle.
3.From the left window drag the session and drop it in the Workflow Designer.
1.Open the client Power Center Designer; from the Tools menu select Target Designer.
3.Drop the source definition [EMP] into the Target Designer workspace.
5.Select the Columns tab; from the toolbar click on Cut to delete columns.
6.From the toolbar click on Add a new column; click Apply and click OK.
Select Create Table, click on Generate & Execute and click OK; the generated SQL is stored in a file named MKTABLES.SQL.
Transformations & Types of Transformations:
A transformation is a Power Center object which allows you to develop the business rules to process the data into the desired business format.
1.Active Transformation
2.Passive Transformation
1.Active Transformation:
A transformation that can affect (change) the number of rows is known as an active transformation.
The following active transformations are used to process the data:
1.Source Qualifier Transformation
2.Filter Transformation
3.Rank Transformation
4.Sorter Transformation
5.Transaction Control Transformation
6.Update Strategy Transformation
7.Normalizer Transformation
8.Aggregator Transformation
9.Joiner Transformation
10.Union Transformation
11.Router Transformation
12.SQL Transformation
13.Java Transformation
A transformation that doesn't affect (change) the number of rows is known as a passive transformation.
The following passive transformations are used to process the data:
1.Expression Transformation
2.Lookup Transformation
3.Stored Procedure Transformation
4.Sequence Generator Transformation
Ports:
1.Input Port (I)
2.Output Port (O)
Input Port (I): a port which can receive the data is known as an input port.
Output Port (O): a port which can provide the data is known as an output port.
Connected Transformation:
A transformation which is part of the mapping in the data-flow direction is known as a connected transformation.
2.It is connected to the source side and to the target side of the pipeline.
3.A connected transformation can receive multiple input ports and can return multiple output ports.
Note: All active and passive transformations can be configured as connected transformations.
Unconnected Transformation:
A transformation which is not part of the data-flow direction, neither connected to the source nor connected to the target, is known as an unconnected transformation.
2.It can receive multiple input ports but it always returns a single output port.
Example: the Lookup Transformation (and the Stored Procedure Transformation) can be configured as unconnected.
Filter Transformation:
1.It is an active transformation that can filter the records based on a given condition.
4.TRUE indicates that the records are allowed for further processing or for loading into the target.
5.FALSE indicates that the records are rejected by the filter transformation.
Performance Tip:
1.Keep the filter transformation as close to the Source Qualifier transformation as possible to filter the rows early in the data flow; as a result we reduce the number of rows for further processing.
Expression Transformation:
1.It is a passive transformation which allows you to calculate expressions for each row.
Ports: Input, Output, Variable
6.Variable ports are recommended to simplify complex expressions and to reuse expressions.
Scenario1:
Calculate the tax for each employee who belongs to the sales department: if SAL is greater than 5000 then calculate the tax as SAL*0.17, else calculate the tax as SAL*0.13.
Logic:
Expression transformation
SAL [I]
TAX[O] (IIF(SAL>5000,SAL*0.17,SAL*0.13))
LOAD_DATE[O] (SYSDATE)
Scenario2:
Calculate the total salary for each employee based on SAL and COMM:
Total sal = Sal + Comm
Logic:
Expression transformation
TOTSAL[O] (IIF(ISNULL(COMM),SAL,SAL+COMM))
Scenario3:
Implement the LIKE operator using a filter transformation: in the JOB column of the EMP table, 'SALESMAN' is represented in 3 different formats (a condition sketch follows the list):
SALESMAN
SALES-MAN
PRE-SALES
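One possible filter condition (a sketch using the INSTR function, which returns the position of a substring and therefore matches all three formats):
Filter condition: INSTR(JOB, 'SALES') > 0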
Variable Port:
A port which can store data temporarily is known as a variable port (V).
2.Variable ports are created to simplify complex expressions and to reuse expressions across several output ports.
6.The default value for a variable port with data type string is a space.
7.Variable ports are not visible in the normal view of a transformation, only in the edit view.
Router Transformation:
A Router transformation is an active transformation which allows you to create multiple conditions and pass the data to multiple targets. It has two types of groups:
1.Input Group
2.Output Groups
Input Group:
Only the input group can receive the data from the source pipeline.
Output Groups:
1.User-defined groups (one per condition)
2.Default group: receives the rows that satisfy none of the user-defined group conditions.
The Router transformation has a performance advantage over multiple Filter transformations: a row is read once into the input group but evaluated multiple times based on the number of groups, whereas using multiple Filter transformations requires the same data to be duplicated for each Filter transformation. An illustrative group setup follows.
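As an illustration, a single Router reading EMP could split departments with groups like these (the group names and conditions are illustrative):
DEPT10 group condition: DEPTNO = 10
DEPT20 group condition: DEPTNO = 20
DEFAULT group: receives every row that satisfies neither condition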
Source Qualifier Transformation:
1.An active transformation that can read the data from relational sources and flat files. Its main properties are:
a) SQL Query (SELECT)
b) User Defined Join
c) Source Filter
d) Number of Sorted Ports
e) Tracing Level
f) Select Distinct (DISTINCT)
User Defined Join:
1.Joins separate sources using a where clause; it can join any number of source tables.
3.Supports standard SQL joins like inner joins (equi joins), as sketched below.
a.Inner Join
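For example, a user-defined join of the standard SCOTT tables EMP and DEPT generates SQL of this form (an illustrative sketch):
SELECT EMP.EMPNO, EMP.ENAME, DEPT.DNAME
FROM EMP, DEPT
WHERE EMP.DEPTNO = DEPT.DEPTNO;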
Source Qualifier - Advantages:
1.May reduce the volume of data on the network (when we write a where clause in the SQL statement).
Disadvantage:
1.Can affect performance on the source database (when the source database has little cache memory).
Joiner vs. Source Qualifier:
Ports in Aggregator:
Input Port, Output Port, Variable Port
1.By default the Integration Service returns the last record from each group; if no group-by port is specified, the Integration Service considers the entire data set as a single group and returns the last record.
Single-level aggregate function: MIN(SAL)
Nested aggregate function: MIN(AVG(SAL))
3.The Aggregator doesn't support mixing single-level and nested aggregate functions within a single Aggregator transformation.
Aggregate Expressions:
MIN, MAX, AVG, FIRST, LAST, SUM, COUNT
An illustrative configuration follows.
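For instance, an Aggregator with DEPTNO as the group-by port and an output port MIN_SAL behaves like the following (the port names are illustrative):
Group-by port: DEPTNO
MIN_SAL[O] (MIN(SAL))
Equivalent SQL: SELECT DEPTNO, MIN(SAL) FROM EMP GROUP BY DEPTNO;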
Sorted Aggregator:
An Aggregator transformation with the Sorted Input option enabled is called a sorted aggregator.
2.If multiple ports are selected for group by, perform the sorting on all of those ports, in the same order.
3.If Sorted Input is enabled but unsorted data is provided, the Integration Service fails the session.
Performance Tuning:
Sorted input improves Aggregator performance, since the Integration Service does not need to cache the entire data set before aggregating each group.
Union Transformation:
This is an active transformation which combines similar data sources into a single result set (data set).
4.The Union transformation supports heterogeneous data sources, i.e. different databases.
Sorter Transformation:
6.Use the key port to specify the column on which sorting has to be performed.
8.If more than one port is selected as a key port, the Integration Service performs the sorting on all those columns in sequential order from top to bottom, in the order the ports appear in the Sorter transformation.
9.Within a single Sorter transformation both ascending and descending sort orders can be configured.
10.If a Sorter transformation is configured without a key port and without the Distinct option, the Integration Service marks the mapping as invalid.
11.If the Distinct option is selected, the Sorter transformation eliminates duplicate records; hence the Sorter is called an active transformation.
Sorter Cache:
2.When a session starts, the Integration Service caches the data in the Sorter transformation before performing the sort operation.
3.The Integration Service performs the sort operation inside cache memory, based on the key column and sort order, and returns the sorted records.
4.If cache memory is smaller than the memory space required to perform the sorting, the Integration Service writes the data to disk.
5.The process of writing data to disk and swapping data between cache memory and disk is called paging.
7.To improve the performance of the sort operation, increase the cache memory.
Connected Stored Procedure:
1.This is a passive transformation that can import a stored procedure from the database.
2.It can receive multiple input ports and can return multiple output ports.
Unconnected Stored Procedure:
2.It can receive multiple input ports but returns a single output port.
Uses:
PL SQL Program:
CREATE OR REPLACE PROCEDURE STG_CALC_PROCS(
V_EMPNO IN NUMBER,
TOTSAL OUT NUMBER,
TAX OUT NUMBER,
HRA OUT NUMBER)
IS
BEGIN
SELECT SAL+NVL(COMM,0), SAL*0.1, SAL*0.4
INTO TOTSAL, TAX, HRA
FROM EMP
WHERE EMPNO = V_EMPNO;
END;
/
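The procedure can be smoke-tested from SQL*Plus before importing it into the Designer (EMPNO 7788 is one of the standard SCOTT.EMP rows; the bind-variable names are illustrative):
SQL> VARIABLE TOTSAL NUMBER
SQL> VARIABLE TAX NUMBER
SQL> VARIABLE HRA NUMBER
SQL> EXEC STG_CALC_PROCS(7788, :TOTSAL, :TAX, :HRA);
SQL> PRINT TOTSAL TAX HRA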
Procedure for Implementing the Stored Procedure:
Source: table [EMP]
Target: table [EMPNO, ENAME, SAL, COMM, TOTSAL, TAX, HRA]
Mapping: M_STG_EMP_ST_PROC
1.Drag and drop the source definition and target definition into the Mapping Designer workspace.
2.From the Transformations menu select Create, select the transformation type Stored Procedure, enter the name and click on Create; connect to the DB with the following details:
Username:
Owner Name:
Password:
Then click on Connect, select the procedure named STG_CALC_PROCS from the scott user and click on OK.
Connect EMPNO -> V_EMPNO.
3.In Informatica PowerCenter the transactions can be controlled in two different ways:
a) Mapping level
b) Session level
To configure a user-defined commit at the session level, set the commit type to User Defined.
TCL Variables:
To configure a user-defined commit type, the Transaction Control transformation provides the following variables:
1.TC_CONTINUE_TRANSACTION
2.TC_COMMIT_BEFORE
3.TC_COMMIT_AFTER
4.TC_ROLLBACK_BEFORE
5.TC_ROLLBACK_AFTER
Transaction Control - Mapping:
If we want to control the transaction based on a given condition, then create a Transaction Control transformation in the mapping.
Ex: IIF(SAL>8000, TC_COMMIT_AFTER, TC_ROLLBACK_AFTER)
Transaction Control - Session:
3.A commit interval is the number of rows that you want to use as a basis for commits. The commit type can be:
a) Source
b) Target
c) User Defined
Rank Transformation ports:
1.I/P port
2.O/P port
3.V port (Variable)
4.R port (Rank)
Rank Port:
The port on which the ranks are calculated; only one port can be designated as the rank port.
Variable Port:
A port which allows you to develop an expression to store data temporarily for the rank calculation is known as a variable port; variable ports support writing the expressions that are required for the rank calculation.
Properties:
1.Top (or) Bottom = Top
2.Number of Ranks = 3
Example source columns (EMP): EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, COMM, DEPTNO
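With Top/Bottom = Top and Number of Ranks = 3 on the SAL port, the Rank transformation returns the three highest-paid employees, roughly equivalent to this Oracle query (illustrative):
SELECT * FROM (SELECT * FROM EMP ORDER BY SAL DESC) WHERE ROWNUM <= 3;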
2.A mapplet is created using the Mapplet Designer tool in the Designer client component.
a) Active Mapplet
b) Passive Mapplet
1.Active Mapplet:
A mapplet created with at least one active transformation is known as an active mapplet.
2.Passive Mapplet:
A mapplet created with all passive transformations is known as a passive mapplet.
Mapplet Limitations:
Keep the following instructions in mind while creating a mapplet:
1.If you want to use a Stored Procedure transformation, you should create the stored procedure with type Normal.
2.If you want to use a Sequence Generator transformation, you should use a reusable Sequence Generator transformation.
3.The following objects cannot be used in a mapplet:
a) Normalizer T/R
Advantages:
A mapplet lets a set of transformations (reusable business logic) be reused across multiple mappings.
Reusable Transformation:
A reusable transformation is a reusable object created with business rules using a single transformation.
Limitations:
1.A reusable transformation can be built from a single transformation only.
Procedure:
Open the client Power Center Designer; from the Tools menu select Transformation Developer; from the Transformation menu select Create, select the transformation type Sequence Generator, enter the name "DW_KEY", click on Create and then Done.
To convert an existing transformation: select a mapping in the Mapping Designer workspace, select the transformation which you want to convert to reusable, double-click the transformation, and from the Transformation tab select Make Reusable; click on Yes, click Apply and click OK.
Key Points:
When we drag a reusable transformation into the Mapping Designer workspace it is created as an instance; you can modify the instance properties, and that does not reflect on the original object.
From the left window select the User Defined Functions sub-folder; from the Tools menu select "User Defined Function", click on "New", provide the function name "TRIM", select the type "Public", then click on New Argument, and then click on Launch Editor.
Business Purpose:
Loading data into snowflake dimensions which are related through primary key and foreign key relationships.
Create a session, double-click the session and select the Config Object tab.
Procedure:
Select the Mappings menu, click on "Target Load Plan", change the load order using the up and down arrows, click OK and save the mapping.
4.If the [R] (return) port is not checked, the mapping is valid, but the session created for that mapping will fail at run time.
5.An unconnected lookup is very commonly used when the lookup is not needed for every input record.
7.The lookup function can be called within any transformation that supports expressions (e.g. an Expression transformation).
9.The condition is evaluated for each record, but the lookup function is called only if the condition evaluates to TRUE.
Syntax: :LKP.LookupName(arguments)
Business Purpose:
1.A source table or file may have a percentage of records with incomplete data; the holes in the data can be filled by performing a lookup to another table or tables.
2.As only a percentage of the rows are affected, it is better to perform the lookup on only those rows that need it and not on the entire data set, as in the sketch below.
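A typical call pattern inside an Expression transformation (a sketch; the lookup name LKP_CUSTOMER and the ports are illustrative):
CUST_NAME[O] (IIF(ISNULL(CUST_NAME), :LKP.LKP_CUSTOMER(CUST_ID), CUST_NAME))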
SCD-Type1:
SCD Type-1 is used to maintain only the current state of the data in the dimension table.
Business Functionality:
1.INSERT the new records coming from the source.
2.UPDATE the existing records that come with changes from the source.
EXAMPLE DATA:
EMPNO ENAME JOB SAL
INITIAL LOAD
3.Drag and drop the EMP source definition and two instances of the target EMP_SCD1 into the Mapping Designer workspace.
6.Select the EMPNO port from the Source Qualifier and connect it to the Lookup T/R.
7.Double-click on the Lookup T/R, go to the Condition tab and enter the condition below:
EMPNO = EMPNO1
10.Create an Expression T/R; select the EMP_SURR_KEY, JOB and SAL ports from the Lookup T/R and connect them to the Expression T/R.
11.Select all ports from the Source Qualifier and connect them to the Expression T/R.
12.Double-click on the Expression T/R, go to the Ports tab, add the two O/P ports INSERT and UPDATE, and enter the expression as shown below:
INSERT = IIF(ISNULL(EMP_SURR_KEY),'TRUE','FALSE')
15.Double-click on the Router T/R, go to the Groups tab and add two groups (INSERT and UPDATE).
2.Double-click on the Update Strategy T/R, go to the Properties tab and enter the update strategy expression value as DD_UPDATE or 1.
4.Save the mapping, create a session with the name S_M_EMP_SCD1, create a workflow with the name W_S_M_EMP_SCD1 and execute the workflow.
SCD-Type2:
The SCD Type-2 method is used to maintain the current data along with the complete history in the dimension tables.
Business Functionality:
SOURCE
TARGET
2.Create a mapping with the name M_EMP_SCD2; drag and drop the EMP source definition and two instances of the EMP_SCD2 target into the Mapping Designer workspace.
6.Double-click on the Lookup T/R, go to the Ports tab and configure the ports; in the Condition tab enter:
EMPNO = EMPNO1
8.Go to the Properties tab and enter the query below for the Lookup SQL Override attribute (selecting only the current records):
SELECT EMP_SCD2.EMP_SURR_KEY AS EMP_SURR_KEY,
EMP_SCD2.JOB AS JOB,
EMP_SCD2.SAL AS SAL,
EMP_SCD2.EMPNO AS EMPNO
FROM EMP_SCD2
WHERE EMP_SCD2.IND = 'Y'
10.Create an Expression T/R; select all ports from the Lookup T/R and the Source Qualifier and connect them to the Expression T/R.
11.Create a Router T/R; select all ports from the Expression T/R and connect them to the Router T/R.
12.Double-click on the Router T/R, add two groups, INSERT and UPDATE, and enter the expressions below for the INSERT and UPDATE groups:
INSERT = ISNULL(EMP_SURR_KEY)
OR (JOB != JOB1)
OR (SAL != SAL1)
UPDATE = NOT ISNULL(EMP_SURR_KEY)
AND (JOB != JOB1 OR SAL != SAL1)
2.Double-click on the Expression transformation, go to the Ports tab and add two O/P ports, STARTDATE and IND (e.g. STARTDATE = SYSDATE, IND = 'Y').
4.Click on Apply and click on OK.
5.Select all ports from the Expression T/R and connect them to the EMP_SCD2_INSERT target instance.
6.Create a Sequence Generator T/R and connect the NEXTVAL port to the EMP_SURR_KEY column in the EMP_SCD2_INSERT target instance.
1.Create an Expression T/R; select the EMP_SURR_KEY port from the UPDATE group of the Router T/R and connect it to the Expression T/R.
4.Create an Update Strategy T/R; select all ports from the Expression T/R and connect them to the Update Strategy T/R.
5.Double-click on the Update Strategy T/R, go to the Properties tab, enter DD_UPDATE or 1 as the update strategy expression, click on Apply and click on OK.
6.Select all ports from the Update Strategy T/R and connect them to the respective ports in the EMP_SCD2_UPDATE target instance.
1.Repository objects such as mappings, sessions, etc. can be exported into a metadata file format called .XML.
Procedure:
5.Select the file directory, enter the file name "Dev_20150512" and click on Save.
Repository objects such as workflows, sessions, mappings, etc. can be imported from a metadata file called .XML.
Procedure: Create a new folder; from the left window select the new folder (Repository Manager client).
4.Click on Next.
6.Click on Next.
7.Choose the destination folder, click on Next, again click on Next and click on Import.
8.Click on Done.
1.In a delimited flat file each column is separated by a special character like a comma (,), pipe (|), dollar ($), etc.
File List:
1.A file list is the method used for loading the data from multiple files into a single target using a single source definition.
3.A file list can be applied only to files that have similar metadata (the same structure), as in the example below.
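For example, the file list (indirect file) itself contains only the paths of the data files (the file names here are illustrative); in the session, set the source filetype to Indirect and point the source filename at this list:
C:\SOURCE\EMP_FILELIST.txt:
C:\SOURCE\emp_jan.txt
C:\SOURCE\emp_feb.txt
C:\SOURCE\emp_mar.txt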
Mapping Parameter:
A parameter represents a constant value which is defined before the mapping runs.
1.A mapping parameter is a constant value, which means the value remains the same throughout the session run.
Advantages:
1.Mapping parameters are created to standardize the business logic and increase flexibility in development.
3.Mapping parameters are defined with constant values in a parameter file, which is saved with the extension .txt or .prm. The parameter file format is:
[FOLDER.WF:WORKFLOW.ST:SESSION]
$$Param1=const value
$$Param2=const value
$$Param3=const value
Procedure:
8.Click OK.
9.From the Source Qualifier copy the required ports (EMPNO, ENAME, JOB, SAL, DEPTNO) to the Expression transformation.
10.Double-click the Source Qualifier, select the Properties tab and set the SQL Query attribute to:
SELECT EMP.EMPNO, EMP.ENAME, EMP.JOB, EMP.SAL, EMP.DEPTNO
FROM EMP
WHERE EMP.DEPTNO = $$DNO
14.Save the mapping.
Create the parameter file C:\Param\Test.txt with the following contents:
[BATCH7AM.WF:W_S_M_PARAM.ST:S_M_PARAM]
$$DNO=20
$$PERCENT=0.25
In the session properties, set the attribute "Parameter Filename" to the value C:\Param\Test.txt.
9.Run the workflow.
2.It is used to process SQL queries in the midstream of a pipeline; we can insert, update, delete and retrieve rows from the database at run time using the SQL transformation.
3.The SQL transformation processes external SQL scripts or SQL queries created in the SQL editor.
Script mode: the SQL transformation runs SQL scripts that are located externally; we pass a script name to the transformation with each input row, and the SQL transformation outputs one row for each input row.
Query mode: the SQL transformation executes a query that we define in the query editor; we can pass strings or parameters to the query to define dynamic queries or change the selection parameters, and we can output multiple rows when the query has a SELECT statement.
Example:
SELECT Q.DISTRICT, COUNT(Q.OFFLINEFIR)
FROM (SELECT DISTRICT,
CASE WHEN INSERT_DATE = UPDATE_DATE THEN 1 ELSE NULL END AS OFFLINEFIR
FROM FIR) Q
GROUP BY Q.DISTRICT
Save it as C:\SOURCE\FIR_SCRIPT.txt.
Also create one more notepad text file and save it as C:\SOURCE\SCRIPT_ADD.txt.
PC Designer:
Source Analyzer (SA): import C:\SOURCE\SCRIPT_ADD.TXT
Target Designer (TD): FIR_RES (flat file) with column STATUS STRING(10)
Mapping Designer (MD): MAP_SQL
Source: CUST
CID   NAME   DOB
13    john   13-FEB-85
11    ben    12-JAN-67
12    alex   15-JAN-62
Target: CUST_TYP1 (CKEY, CID, NAME, DOB)
Mapping Designer: M_SCDTYPE1
Source: CUST
CID   NAME   LOC
11    BEN    CHE
12    ALEN   MUM
13    RAM    PUN
Target: CUST_TYPE2
CREATE TABLE CUST_TYPE2(
CKEY number,
CID number,
NAME varchar2(20),
LOC varchar2(20),
FLAG number
);
Source after changes:
CID   NAME   LOC
11    BEN    BLG
12    ALEN   CHE
13    RAM    PUN
Source: CUST
CID   NAME   LOC
11    BEN    CHE
12    ALEN   MUM
13    RAM    PUN
Target: CUST_TYPE3
CREATE TABLE CUST_TYPE3(
CKEY number,
CID number(15),
NAME varchar2(20),
CLOC varchar2(20),
PLOC varchar2(10)
);
Mapping Designer: M_SCDTYPE3
Create a session with the name S_M_SCDTYPE3 and create a workflow with the name Wf_S_M_SCDTYPE3.
Source after changes:
CID   NAME   LOC
11    BEN    BNG
12    ALEN   VJY
13    RAM    PUN
Source: CUST
CID   NAME   LOC
11    BEN    CHE
12    ALEN   MUM
13    RAM    PUN
Target: CUST_TYPE2_VRSN
CREATE TABLE CUST_TYPE2_VRSN(
CID number,
NAME varchar2(20),
LOC varchar2(20),
VERSION number
);
Before starting the workflow, check the data in the source table and the target table:
Source rows:
11    BEN    CHE
12    ALEN   MUM
13    RAM    PUN
Target: no rows selected
COMMIT;
Project Life Cycle:
1.Kick-off Meetings
2.Analysis Phase
3.Design Phase
4.Coding Phase
5.Reviews
6.Testing Phase
7.Go-Live Phase
8.Support
1.Analysis Phase:
The Business Analyst gathers and analyzes the requirements.
Outcome:
1.Business requirements document
2.DB tool to be used
2.Design Phase:
The Data Warehouse Architect / ETL Architect provides the solution to build the DW or the data marts.
Outcome: High-Level Design document, consisting of:
1.Summary Information
2.Project Architecture
3.System Architecture
4.Source
5.DB
6.Data Model
7.Mapping Details
Outcome: Low-Level Design document, which consists of the source and target object details.
ETL Team:
3.Coding Phase:
Code Review: the code review checks the business logic and whether the naming standards are followed or not.
Peer Review: a team member reviews the same points mentioned above; if everything is OK, testing follows.
4.Testing Phase:
1.Unit Testing: mappings are tested by the individual developers, using the Debugger or by enabling Test Load to test the mapping with limited test data.
2.SIT (System Integration Testing): mappings are tested according to their dependencies.
3.UAT (User Acceptance Testing): mappings are tested in the presence of the onsite users.
5.Production (Go-Live) Phase:
Project Architecture:
Hi, good morning.
Coming to my education details, I completed my MCA from a college which is affiliated to JNTU University, Hyderabad.
My first project was AP GBS DATA MART, where we developed data marts from a global data warehouse. After signing off from that project, I was mapped to another project in the same company. In my second project we worked for Prudential Insurance, where we developed a data warehouse for their proposed system. I worked as an Associate Software Engineer at IBM from Jun 2014 to Oct 2016.
Later I was selected by IBM India Private Limited, Pune. At IBM I am working for the client GE, for whom we developed a data warehouse for their internal billing system. A few days back I was relieved from the project and they are trying to map me into another project. I have been working at IBM from June 2014 till date.
In my 6 years of experience I have been involved in many activities like developing mappings, performance tuning, data profiling, etc.
In these 6 years of experience I have gained hands-on experience with the tools Informatica, DataStage, Information Analyzer, QualityStage, Trillium Discovery and Oracle, and some knowledge of PL/SQL and the UNIX environment.
Documented By
Mail:abreddy2003@gmail.com
Mobile:9948047694