Version 1.0
Page 1 of 115

REVISION HISTORY
01-Nov-2004  1.0  Initial Document
14-Sep-2010  1.1  Updated Document
Table of Contents
1 Introduction
1.1 Purpose
2 ORACLE
2.1 DEFINITIONS
NORMALIZATION:
First Normal Form:
Second Normal Form:
Third Normal Form:
Boyce-Codd Normal Form:
Fourth Normal Form:
ORACLE SET OF STATEMENTS:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Querying Language (DQL)
Data Control Language (DCL)
Transactional Control Language (TCL)
Syntaxes:
ORACLE JOINS:
Equi Join/Inner Join:
Non-Equi Join
Self Join
Natural Join
Cross Join
Outer Join
Left Outer Join
Right Outer Join
Full Outer Join
What's the difference between View and Materialized View?
1 Introduction
1.1 Purpose
The purpose of this document is to provide detailed
information about DWH concepts and Informatica, based
on real-time training.
2 ORACLE
2.1 DEFINITIONS
Organizations can store data on various media and in different
formats, such as a hard-copy document or a spreadsheet.
Data Definition Language (DDL):
• Create
• Alter
• Drop
• Truncate

Data Manipulation Language (DML):
• Insert
• Update

Data Querying Language (DQL):
• Select

Data Control Language (DCL):
• Grant
• Revoke

Transactional Control Language (TCL):
• Commit
• Rollback
• Savepoint
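As a quick sketch of these statement categories, here is one example of each against a simple emp table (table and column names are illustrative, not from the original):

```sql
-- DDL
CREATE TABLE emp (empno NUMBER(4), ename VARCHAR2(30), sal NUMBER(8,2));
ALTER TABLE emp ADD (deptno NUMBER(2));
TRUNCATE TABLE emp;
DROP TABLE emp;

-- DML
INSERT INTO emp (empno, ename, sal) VALUES (100, 'SMITH', 3000);
UPDATE emp SET sal = sal * 1.1 WHERE empno = 100;

-- DQL
SELECT empno, ename, sal FROM emp;

-- DCL
GRANT SELECT ON emp TO scott;
REVOKE SELECT ON emp FROM scott;

-- TCL
SAVEPOINT before_raise;
ROLLBACK TO before_raise;
COMMIT;
```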
Syntaxes:

CREATE MATERIALIZED VIEW mv_complex
REFRESH COMPLETE
AS
SELECT ...;

-- Complete refresh ('C') on demand:
EXEC DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');
Case Statement:

SELECT NAME,
       (CASE
          WHEN (CLASS_CODE = 'Subscription')
          THEN ATTRIBUTE_CATEGORY
          ELSE TASK_TYPE
        END) TASK_TYPE,
       CURRENCY_CODE
FROM EMP;
Decode()

SELECT empname,
       DECODE(address, 'HYD', 'Hyderabad',
                       'Bang', 'Bangalore',
                       address) AS address
FROM emp;
Procedure:

CREATE OR REPLACE PROCEDURE update_customer (  -- procedure name is illustrative
  cust_id_IN IN NUMBER
)
AS
BEGIN
  -- procedure body
  NULL;
END;
/
Trigger:

CREATE OR REPLACE TRIGGER trg_update_sysdate  -- trigger name and table are illustrative
AFTER UPDATE ON emp
REFERENCING
  NEW AS NEW
  OLD AS OLD
FOR EACH ROW
DECLARE
BEGIN
  IF ... THEN
    ...
  ELSE
    -- call the procedure (EXEC is SQL*Plus syntax; inside PL/SQL, call it directly)
    update_sysdate();
  END IF;
END;
/
ORACLE JOINS:
• Equi join
• Non-equi join
• Self join
• Natural join
• Cross join
• Outer join
USING CLAUSE
ON CLAUSE
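An equi join can be written with either clause. A minimal sketch, assuming the standard EMP/DEPT demo tables:

```sql
-- USING clause: the join column is named once, with no table prefix
SELECT e.ename, d.dname
FROM emp e JOIN dept d USING (deptno);

-- ON clause: explicit join condition; column names may differ between tables
SELECT e.ename, d.dname
FROM emp e JOIN dept d ON (e.deptno = d.deptno);
```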
Non-Equi Join
Self Join
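A self join joins a table to itself under two aliases. A sketch that pairs each employee with their manager, assuming EMP has MGR and EMPNO columns:

```sql
SELECT w.ename AS employee, m.ename AS manager
FROM emp w
JOIN emp m ON (w.mgr = m.empno);
```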
Cross Join
Outer Join

Left Outer Join
This displays all the matching records, plus the records in the
left-hand table that have no match in the right-hand table.
Or
e.deptno = d.deptno(+);
Right Outer Join
This displays all the matching records, plus the records in the
right-hand table that have no match in the left-hand table.
Or
e.deptno(+) = d.deptno;

Full Outer Join
This displays all the matching records and the non-matching
records from both tables.
View:
Materialized View:
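The key difference: a view stores only its defining query and re-runs it on every access, while a materialized view stores the result set itself and must be refreshed. A minimal sketch on an assumed emp table:

```sql
-- View: no data stored; always reflects the base table
CREATE VIEW v_emp AS
  SELECT ename, sal FROM emp;

-- Materialized view: result set stored; refreshed on demand
CREATE MATERIALIZED VIEW mv_emp
  REFRESH COMPLETE ON DEMAND
  AS SELECT ename, sal FROM emp;
```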
Ex:
Get dept wise max sal along with empname and emp no.
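One way to write this, using a correlated subquery against the standard EMP demo table:

```sql
SELECT e.deptno, e.empno, e.ename, e.sal
FROM emp e
WHERE e.sal = (SELECT MAX(sal)
               FROM emp
               WHERE deptno = e.deptno);
```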
DELETE
TRUNCATE
DROP
The DROP command removes a table from the database. All the
table's rows, indexes, and privileges are also removed. The
operation cannot be rolled back.
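Side by side, on a throwaway table (names are illustrative):

```sql
DELETE FROM emp WHERE deptno = 10;  -- row by row, logged, can be rolled back
ROLLBACK;                           -- the deleted rows come back

TRUNCATE TABLE emp;                 -- removes all rows; DDL, cannot be rolled back
DROP TABLE emp;                     -- removes the table itself, with its indexes and privileges
```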
ROWID
ROWNUM
ROWID vs ROWNUM:
SELECT column_list
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];
Both the WHERE and HAVING clauses can be used to filter data: WHERE filters individual rows before grouping, while HAVING filters groups after aggregation.
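A sketch of the difference, on the standard EMP demo table:

```sql
SELECT deptno, AVG(sal) AS avg_sal
FROM emp
WHERE job <> 'CLERK'      -- row filter, applied before GROUP BY
GROUP BY deptno
HAVING AVG(sal) > 2000    -- group filter, applied after aggregation
ORDER BY deptno;
```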
MERGE Statement
On (s1.no=s2.no)
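Building on the ON (s1.no = s2.no) fragment above, a full MERGE might look like this (the name column is assumed for illustration):

```sql
MERGE INTO s1
USING s2
ON (s1.no = s2.no)
WHEN MATCHED THEN
  UPDATE SET s1.name = s2.name
WHEN NOT MATCHED THEN
  INSERT (no, name) VALUES (s2.no, s2.name);
```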
Sub Query:
Example:
SELECT deptno, ename, sal
FROM emp a
WHERE sal IN (SELECT sal FROM grade WHERE sal_grade = 'A' OR sal_grade = 'B');
Example:
Find all employees who earn more than the average salary in
their department.

SELECT A.ename, A.sal, A.department_id
FROM employees A
WHERE A.sal > (SELECT AVG(B.sal)
               FROM employees B
               WHERE B.department_id = A.department_id
               GROUP BY B.department_id);
EXISTS:
Example:
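A typical EXISTS query, assuming the standard DEPT/EMP demo tables: list the departments that have at least one employee.

```sql
SELECT d.deptno, d.dname
FROM dept d
WHERE EXISTS (SELECT 1
              FROM emp e
              WHERE e.deptno = d.deptno);
```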
Indexes:
You should first get the explain plan of your SQL and determine
what changes can be made so the code performs well without
hints, if possible. However, hints such as ORDERED,
LEADING, INDEX, FULL, and the various AJ and SJ hints can take
a wild optimizer and give you optimal performance.
Hint categories:
• ALL_ROWS
One of the hints that invokes the cost-based optimizer.
ALL_ROWS is usually used for batch processing or data
warehousing systems.
• FIRST_ROWS
One of the hints that invokes the cost-based optimizer.
FIRST_ROWS is usually used for OLTP systems.
• CHOOSE
One of the hints that invokes the cost-based optimizer.
This hint lets the server choose (between ALL_ROWS and
FIRST_ROWS), based on statistics gathered.
• Additional Hints
• HASH
Hashes one table (full scan) and creates a hash index for
that table. Then hashes other table and uses hash index to
find corresponding records. Therefore not suitable for < or
> join conditions.
/*+ use_hash */
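Hints are embedded in a comment immediately after the SELECT keyword. For example, a hash join hint on the assumed EMP/DEPT tables:

```sql
SELECT /*+ USE_HASH(e d) */ e.ename, d.dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;
```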
Explain Plan:
If a query is taking a long time, first run it through Explain
Plan; the explain plan process stores its data in the PLAN_TABLE.
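A typical workflow: run EXPLAIN PLAN for the statement, then read the plan back out of PLAN_TABLE with DBMS_XPLAN (query shown against the assumed EMP/DEPT tables):

```sql
EXPLAIN PLAN FOR
  SELECT e.ename, d.dname
  FROM emp e JOIN dept d ON (e.deptno = d.deptno);

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```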
Stored Procedure:
Packages:
Triggers:
Types of Triggers
• INSTEAD OF Triggers
• Row Triggers
Table Space:
Control File:
UNION
select emp_id,
       max(decode(row_id, 0, address)) as address1,
       max(decode(row_id, 1, address)) as address2,
       max(decode(row_id, 2, address)) as address3
group by emp_id
Other query:
select emp_id,
       max(decode(rank_id, 1, address)) as add1,
       max(decode(rank_id, 2, address)) as add2,
       max(decode(rank_id, 3, address)) as add3
from
group by emp_id
5. Rank query:
Or
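A rank query is usually written with the RANK analytic function; a sketch on the standard EMP demo table:

```sql
SELECT empno, ename, sal,
       RANK() OVER (ORDER BY sal DESC) AS sal_rank
FROM emp;
```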
8. 2nd highest Sal:
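Two common ways to get the 2nd highest salary from the assumed EMP table:

```sql
-- Subquery approach
SELECT MAX(sal)
FROM emp
WHERE sal < (SELECT MAX(sal) FROM emp);

-- Analytic approach
SELECT sal
FROM (SELECT sal, DENSE_RANK() OVER (ORDER BY sal DESC) AS rnk FROM emp)
WHERE rnk = 2;
```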
9. Top sal:
Select * from EMP where sal= (select max (sal) from EMP);
Starting at the root, walk from the top down, and eliminate
employee Higgins from the result, but process the rows of his
subordinates:

SELECT last_name, employee_id, manager_id
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;
3 DWH CONCEPTS
What is BI?
Business Intelligence refers to a set of methods and techniques
that are used by organizations for tactical and strategic decision
making. It leverages methods and technologies that focus on
counts, statistics and business objectives to improve business
performance.
Subject Oriented:
Integrated:
Time-variant:
Non-volatile:
What is a DataMart?
What is a Schema?
The fact table contains the primary keys from all the dimension
tables (as foreign keys) and columns of additive, numeric facts.
Types of facts?
What is Granularity?
Dimensional Model
Dimensional Model:
Data modeling
• No attribute is specified.
6. Normalization.
At this level, the data modeler will specify how the logical data
model will be realized in the database schema.
9. http://www.learndatamodeling.com/dm_standard.htm
Entity → Table
Attribute → Column
Definition → Comment
[Data model diagram: dimension and fact tables ACW_PRODUCTS_D, ACW_DF_APPROVAL_STG, ACW_DF_APPROVAL_F, ACW_PART_TO_PID_D, and ACW_SUPPLY_CHANNEL_D, each with a surrogate key (e.g. PRODUCT_KEY, DF_APPROVAL_KEY, PART_TO_PID_KEY, SUPPLY_CHANNEL_KEY), business columns such as CISCO_PART_NUMBER, BUSINESS_UNIT, PRODUCT_FAMILY, BUYER_ID, RFQ_CREATED and RFQ_RESPONSE, and D_CREATED_BY / D_CREATION_DATE / D_LAST_UPDATED_BY / D_LAST_UPDATE_DATE audit columns.]
Users
Customer table (before and after the update):
Customer Key | Name | State
Advantages:
Disadvantages:
- Usage:
Customer table (before and after the update):
Customer Key | Name | State
Advantages:
Disadvantages:
- This will cause the size of the table to grow fast. In cases
where the number of rows for the table is very high to start
with, storage and performance can become a concern.
Usage:
Customer table:
Customer Key | Name | State
• Customer Key
• Name
• Original State
• Current State
• Effective Date
Advantages:
- This does not increase the size of the table, since the new
information updates the existing row in place.
Disadvantages:
Usage:
Data cleansing
Data merging
Data scrubbing
Informatica Transformations:
o Normalizer transformations
o COBOL sources
o XML sources
o Target definitions
o Other mapplets
System Variables
1) Input group
1) Connected
2) Unconnected
Lookup Caches:
• Persistent cache
• Static cache
• Dynamic cache
NewLookupRow Value | Description
Union Transformation:
1) Mapping level
2) Session level.
Aggregator Transformation:
Transformation type:
Active
Connected
Aggregate Expressions:
Aggregate Functions
(AVG, COUNT, FIRST, LAST, MAX, MEDIAN, MIN, PERCENTILE, SUM,
VARIANCE, and STDDEV)
When you use any of these functions, you must use them in an
expression within an Aggregator transformation.
SQL Transformation
Transformation type:
Connected
Script Mode
ScriptError | Output | Returns errors that occur when a script fails for a row.
Transformation type:
Active/Passive
Connected
Transformation type:
Active
Connected
When you run the session, the Integration Service evaluates the
expression for each row that enters the transformation. When it
evaluates a commit row, it commits all rows in the transaction
to the target or targets. When the Integration Service evaluates
a roll back row, it rolls back all rows in the transaction from the
target or targets.
If the mapping has a flat file target, you can generate an output
file each time the Integration Service starts a new transaction.
You can dynamically name each target flat file.
Joiner vs Lookup:
• When both the source and lookup tables are in the same
database, we can join them in the source qualifier.
• When the source and lookup tables exist in different
databases, we need to use a Lookup transformation.
Ans:
In this case, drop constraints and indexes before you run the
session.
• If the source and target data are the same, then I can set a
flag to 'S'.
• Based on the flag values, in the Router I can route the data
into the insert and update flows.
Complex Mapping
• If the file size is greater than zero, then it will send an email
• Or
The source qualifier will select the data from the source table.
• $DBConnection_Source
• $DBConnection_Target
• $InputFile
• $OutputFile
• Parameter
Flat File
• Delimiter
• Fixed Width
List file:
We can process all the files with one mapping and one session
using the list file option. First we need to create the list file
containing all the file names. Then we can use this list file in
the main mapping.
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_
GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALE
S_HIST_AUSTRI]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfile
s/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_
GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALE
S_HIST_BELUM]
$DBConnection_Source=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/
HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495
2. IS starts ISP
Load Balancer
Parameterfile:
[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_
GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALE
S_HIST_AUSTRI]
$DBConnection_Source=DMD2_GEMS_ETL
$DBConnection_Target=DMD2_GEMS_ETL
Main mapping
Workflow Design
I want to generate a separate file for every state (as per
state, it should generate a file). It has to generate 2 flat files,
and the name of each flat file should be the corresponding
state name; that is the requirement.
Below is my mapping.
Source:
AP 2 HYD
AP 1 TPT
KA 5 BANG
KA 7 MYSORE
KA 3 HUBLI
When you have different sets of input data with different target
files created, use the same target instance, but with a Transaction
Control transformation.
Source:
Ename   EmpNo
stev    100
methew  100
john    101
tom     101
Target:
Ename EmpNo
Stev 100
methew
Source:
Ename   EmpNo
stev    100
Stev    100
john    101
Mathew  102
Output:
Target_1:
Ename   EmpNo
Stev    100
Mathew  102

Target_2:
Ename   EmpNo
Stev    100
We can process all flat files through one mapping and one
session using a list file. First we need to create the list file
using a UNIX script for all the flat files; the extension of the
list file is .LST.
We can put an Event Wait before the actual session run in the
workflow to wait for an indicator file. If the file is available,
then it will run the session; otherwise the Event Wait will wait
indefinitely until the indicator file is available.
Solution:
Transformation Specifications
You can also use the UPPER function on string columns, but before
using it you need to ensure that the data is not case-sensitive
(e.g., 'ABC' is different from 'Abc').
If you are loading data from a delimited file, then make sure the
delimiter is not a character which could appear in the data
itself. Avoid using comma-separated files. Tilde (~) is a good
delimiter to use.
Failure Notification
Port Standards:
Prefixed with: O_
Quick Reference
Aggregator AGG_<Purpose>
Expression EXP_<Purpose>
Filter FLT_<Purpose>
Rank RNK_<Purpose>
Router RTR_<Purpose>
Mapplet MPP_<Purpose>
connections.
Testing regimens:
1. Unit Testing
2. Functional Testing
UTP Template:
(Project: SAP-CMS Interfaces. Columns: Test scenario, Test steps, Expected result, Actual result, Passed/Failed, Tested by.)

1. Test scenario: Check the total count of records fetched from the
   source tables against the total records in the PRCHG table for a
   particular session timestamp.
   Test steps:
     SOURCE: SELECT count(*) FROM XST_PRCHG_STG
     TARGET: SELECT count(*) FROM _PRCHG
   Expected result: Both the source and target table record counts should match.
   Actual result: Should be the same as the expected. Passed. Tested by: Stev.

2. Test scenario: Check whether all the target columns are getting
   populated correctly with source data.
   Test steps:
     SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE
     FROM T_PRCHG
     MINUS
     SELECT PRCHG_ID, PRCHG_DESC, DEPT_NBR, EVNT_CTG_CDE, PRCHG_TYP_CDE, PRCHG_ST_CDE
     FROM PRCHG
   Expected result: The query should return zero records.
   Actual result: Should be the same as the expected. Passed. Tested by: Stev.

3. Test scenario: Check the Insert strategy to load records into the
   target table.
   Test steps: Identify one record from the source which is not in the
   target table, then run the session.
   Expected result: It should insert the record into the target table
   with the source data.
   Actual result: Should be the same as the expected. Passed. Tested by: Stev.

4. Test scenario: Check the Update strategy to load records into the
   target table.
   Test steps: Identify one record from the source which is already
   present in the target table with different PRCHG_ST_CDE or
   PRCHG_TYP_CDE values, then run the session.
   Expected result: It should update the record in the target table
   with the source data for that existing record.
   Actual result: Should be the same as the expected. Passed. Tested by: Stev.
5 UNIX
cd /pmar/informatica/pc/pmserver/

/pmar/informatica/pc/pmserver/pmcmd startworkflow -u $INFA_USER -p $INFA_PASSWD -s $INFA_SERVER:$INFA_PORT -f $INFA_FOLDER -wait $1 >> $LOG_PATH/$LOG_FILE
3) File watch means that if the indicator file is available in the
specified location, then we need to start our Informatica jobs;
otherwise we will send an email notification.
Basic Commands:
cat file1              (displays file1; with redirection, cat is also used to create a non-zero-byte file)
cat file1 file2 > all  (combines both files into "all"; creates the file if it doesn't exist)
cat file1 >> file2     (appends file1 to file2)
ps -A
Crontab command.
25 22 15 * 0 /usr/local/bin/backup_jobs
who | wc -l
$ ls -l | grep '^d'
Pipes: a pipe connects the output of one command to the input
of another.
ls -a
find command
find -name aaa.txt       Finds all the files named aaa.txt in the
current directory or below.
find / -name vimrc       Finds all the files named 'vimrc' anywhere
on the system.
find / -name '*xpilot*'  Finds all files whose names contain the
string 'xpilot'.
echo $SHELL
#!/usr/bin/sh
Or
#!/bin/ksh
It tells your shell which shell to use when executing the
statements in your shell script.
Interactive History
A feature of bash and tcsh (and sometimes others): you can use
the arrow keys to recall and re-run previous commands.
Opening a file
Creating text
Edit modes: these keys enter editing modes, so you can type
text into your document.
r Replace 1 character
R Replace mode
Deletion of text
:q Quit.