Agenda
Data Quality Overview
Release 9.1 Architecture Overview
Informatica Analyst
Address Validation Overview
Data Quality Matching
PowerCenter Integration
Powerful analysis, cleansing, matching, exception handling, reporting and monitoring capabilities that enable IT and the business to manage enterprise-wide data quality initiatives.
Frequent Requirements
Data Analysis & Discovery
Parsing and Standardization
Address Validation
Matching & De-duplication
Monitoring & Reporting
Profile your existing data and systems
Be able to identify patterns, formats, schema, and data quality issues
Drill down into actual data
Create rules by example as you profile the data
Data Steward
Increase productivity and efficiency by enabling the business to proactively take responsibility for data quality and reduce their reliance on IT.
Brand iPod
Size 4GB
Color Red
CountryCode
Currency US Dollar
ShortName ATT
CountryCode US
Currency USD
Phone 415.555.1212
Phone +1 (415)555-1212
Bobs Assistant
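The phone values above (415.555.1212 and +1 (415)555-1212) suggest a standardization rule that rewrites a number into one target format. A minimal sketch of such a rule follows, assuming 10-digit North American numbers; the function name `standardize_us_phone` is hypothetical and is not part of Informatica Data Quality.

```python
import re


def standardize_us_phone(raw: str) -> str:
    """Return a US number formatted as '+1 (AAA)EEE-NNNN'.

    Assumption: the input holds exactly 10 digits, or 11 digits
    with a leading country code of 1.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit US number: {raw!r}")
    return f"+1 ({digits[:3]}){digits[3:6]}-{digits[6:]}"


print(standardize_us_phone("415.555.1212"))  # +1 (415)555-1212
```

Raising an error on anything that is not 10 digits keeps bad values visible for exception handling instead of silently reformatting them.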
Address Validation
Validate or correct addresses for over 240 countries
Have reference data from international postal agencies
Validate worldwide data in one environment
Be continuously maintained with worldwide post offices and databases
Address Validation
Address1 7887 KATY FRWY
Address2 SUITE 333
Address3 HUSTEN
Address4 TX
Address5 99999
City Houston
County Harris
StateCode TX
StateName Texas
ZIP 77024
ZIP4 2005
Latitude 29.283427
Longitude -95.46802
Description: Sailors Desk Lamp | Nautical Lamp | Sailoring Lamp
Size: 12 in | 12 inch | 1 Foot
Price: 27.99 | 27.99 | 34.99
Intrinsically wrong (and potentially uncorrectable) data can still be valuable for matching purposes:
Alternate names or nicknames
Misspellings
Invalid data
City
E. Hartford Easthartford Hartford East HartfordCT Hartford CT
Name
W. S. Harrison II PhD William Stuart Harison William Stewart Harison Doctor Bill Harisen jr Harrisen William Doctor
DOB
1/33/1967 1/3/1967 9/9/99 1/13/1967
Address
Medical Center,117/2A #17497 Jackson 117- 2a Jacksen Rd. 117 Jackson Road. Suite 2A 117 Jacson Room 2a 2a Jackson Rd #174978
State Zip
NY CT CT 16987 06987 06987 6984 06987-4573
Monitoring Quality
Stakeholders need to be aware of:
Current quality metrics
Alerts if quality thresholds are not being met
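The idea of alerting when a quality threshold is missed can be sketched in a few lines. This is an illustration only: the `Metric` class, the `alerts` function, the metric names, and the row counts are all made-up assumptions, not Informatica objects or real results.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    valid_rows: int
    total_rows: int
    threshold: float  # minimum acceptable share of valid rows, 0..1

    @property
    def score(self) -> float:
        # Share of rows that passed the rule; 0.0 for an empty data set.
        return self.valid_rows / self.total_rows if self.total_rows else 0.0


def alerts(metrics):
    """Return one message per metric whose score is below its threshold."""
    return [
        f"ALERT: {m.name} at {m.score:.1%}, below threshold {m.threshold:.1%}"
        for m in metrics
        if m.score < m.threshold
    ]


# Hypothetical sample values for illustration.
metrics = [
    Metric("ZIP populated", 940, 1000, 0.95),
    Metric("Phone format valid", 990, 1000, 0.95),
]
for line in alerts(metrics):
    print(line)
```

Only the first metric falls below its threshold here, so a single alert line is printed.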
Business Empowerment
Simple-to-use browser-based tools
Designed for the tasks and skills of business data stewards and analysts
Purpose-built, web-based UI for fast ramp-up
Scorecarding & trending
View business, not technical, representations
Interact with data directly through profiling, rule validation, and scorecarding
Business Manager
Work with relevant data to meet business needs while reducing reliance on IT
IDQ V9.1
Release 9 DQ Architecture
Analyst Service
Informatica Analyst
Informatica Developer
Informatica Administrator
ISP
Profile Warehouse
Profile Service
Mapping Service
SQL Service
Integration Service
Mapping Designer
Metadata Manager
MM Warehouse
Node configuration
The Model Repository Service (MRS) can run on one node but is not a highly available service
Multiple Model Repository Services can run on the same node
If the MRS fails, it automatically restarts on the same node
Mapping execution using the embedded Data Transformation Manager (DTM), which interprets and executes mappings
Integration with LDAP, Active Directory
Provides a set of core services used internally (Authentication Service, Name Services, etc.)
Informatica Analyst
Collaboration via
Metadata Bookmarks (URLs)
Profile Comments
Bi-directional sharing of rules with 9.1 Developer
Sharing of RTM dictionaries
Profiling
Column Profiling
Rule Profiling
Rule creation/editing
Mapping Specification
Define business logic that populates a target table
Configure the sources, target, rules, filters, and joins to transform the data
Profiling
View Profile Statistics
Value / Pattern Frequency Analysis
Drilldown Analysis
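Value and pattern frequency analysis with drill-down can be illustrated outside the tool with a short sketch. The pattern notation used here (9 for a digit, X for a letter) and the function names are assumptions for illustration, not the Analyst tool's own format; the sample ZIP values are taken from the examples earlier in this deck.

```python
from collections import Counter


def pattern_of(value: str) -> str:
    """Map digits to 9 and letters to X; keep other characters as-is."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("X")
        else:
            out.append(ch)
    return "".join(out)


def profile(values):
    """Return (value frequencies, pattern frequencies) for one column."""
    return Counter(values), Counter(pattern_of(v) for v in values)


def drill_down(values, pattern):
    """Return the actual values behind one pattern."""
    return [v for v in values if pattern_of(v) == pattern]


zips = ["77024", "06987", "06987", "6984", "06987-4573"]
value_freq, pattern_freq = profile(zips)
print(value_freq.most_common(1))   # most frequent value
print(pattern_freq.most_common())  # patterns, most frequent first
print(drill_down(zips, "9999"))    # rows behind the 4-digit pattern
```

The pattern view surfaces the 4-digit ZIP as an outlier without anyone having to know in advance what to look for, and the drill-down returns the offending rows.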
Scorecards
Add Profile Columns to a Scorecard
Scorecards
Run the Scorecard
View Trend Chart
Data Quality 9
OOTB RTM Dictionaries
Fee-Based Content
Geocoding Database Subscriptions
Country Packages
Step 1 - Transliteration
Transliterate
Parse
Format
63 105 52 GREECE
Transliteration
Step 2 - Parsing
Parsing
House number: 7031
Street: Columbia Gateway Dr
Sub-Building: Suite 101
City: Columbia
State: MD
ZIP: 21046
Country: USA
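To make the parsing step concrete, the sketch below splits a single-line US address into the components listed above. It is a toy regular expression under stated assumptions: the single-line input layout, the `ADDRESS_RE` pattern, and the `parse_address` function are hypothetical, and a real address validation engine relies on postal reference data rather than one pattern.

```python
import re

# Assumed input layout:
#   "<number> <street> <Suite|Apt|Unit n>, <city>, <ST> <zip>, <country>"
ADDRESS_RE = re.compile(
    r"^(?P<house_number>\d+)\s+"
    r"(?P<street>.+?)\s+"
    r"(?P<sub_building>(?:Suite|Apt|Unit)\s+\w+),\s*"
    r"(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5})(?:-\d{4})?,\s*"
    r"(?P<country>.+)$"
)


def parse_address(line: str) -> dict:
    """Split one address line into named components."""
    match = ADDRESS_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized address layout: {line!r}")
    return match.groupdict()


parsed = parse_address(
    "7031 Columbia Gateway Dr Suite 101, Columbia, MD 21046, USA"
)
for field, value in parsed.items():
    print(f"{field}: {value}")
```

Inputs that do not fit the assumed layout raise an error rather than producing partial components, which mirrors routing unparseable records to exception handling.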
Step 3 - Correction
Correction
Step 4 - Formatting
Micros-Fidelio House
6-8 The Grove
Slough
SL1 1QP
Great Britain
63 105 52 GREECE
Step 5 - Enrichment
Input address
Geo Coordinates
Enrichment
Bigram Distance
Longer, multi-token text strings
Hamming Distance
Numeric, date & code
Edit Distance
Strings of arbitrary length
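The three comparison strategies above can be sketched in plain Python to show why each suits a different kind of field. These are textbook formulations and not necessarily what the product computes: the bigram measure is written as a Dice similarity over character bigrams (an assumption), Hamming distance counts differing positions in equal-length strings, and edit distance is the standard Levenshtein distance.

```python
from collections import Counter


def bigrams(text: str):
    """Return the overlapping two-character slices of a string."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, from 0.0 to 1.0."""
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 1.0 if a == b else 0.0
    shared = sum((ca & cb).values())  # multiset intersection
    return 2 * shared / total


def hamming_distance(a: str, b: str) -> int:
    """Count positions that differ; inputs must have equal length."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


print(bigram_similarity("Jackson", "Jacksen"))
print(hamming_distance("06987", "16987"))
print(edit_distance("Harison", "Harrison"))
```

Hamming distance only makes sense for fixed-width values such as codes and dates, which is why the slide pairs it with those field types, while edit distance tolerates inserted or dropped characters in free-form strings.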
Sample Rules
Category Name | Rule Type | Examples
Noise Word | Word is Deleted | e.g. THE, AND
Company Word Delete | Word is Deleted | e.g. INC, LTD, CO
Company Word Skip | Word is marked Skip | e.g. DEPARTMENT, ASSOCIATION
Personal Title Delete | Word is Deleted | e.g. MR, MRS, DR, JR
Nickname Replace Diminutives | Word and its Diminutives are Replaced | e.g. CATH(E,IE,Y) => CATHERINE
Nickname Replace | Word is Replaced | e.g. MIKE => MICHAEL
Secondary Lookup | Word generates additional search ranges | e.g. AL => ALBERT, ALFRED
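The sample rules above can be read as a small token-rewriting pipeline. The sketch below applies them to a name string, using only the example words listed in the table; the `standardize` function, the rule-set variable names, and the treatment of "Skip" (ignored in the key but reported separately) are assumptions for illustration, not the product's population rule engine.

```python
import re

# Rule sets built only from the example words in the table above.
NOISE_WORDS = {"THE", "AND"}
COMPANY_DELETE = {"INC", "LTD", "CO"}
COMPANY_SKIP = {"DEPARTMENT", "ASSOCIATION"}
TITLE_DELETE = {"MR", "MRS", "DR", "JR"}
NICKNAME_REPLACE = {
    "CATHE": "CATHERINE",
    "CATHIE": "CATHERINE",
    "CATHY": "CATHERINE",
    "MIKE": "MICHAEL",
}
SECONDARY_LOOKUP = {"AL": ["ALBERT", "ALFRED"]}


def standardize(name: str) -> dict:
    """Apply the sample rules and return a key plus side information."""
    tokens = re.findall(r"[A-Z0-9]+", name.upper())
    kept, skipped, alternates = [], [], {}
    for tok in tokens:
        if tok in NOISE_WORDS or tok in COMPANY_DELETE or tok in TITLE_DELETE:
            continue  # rule type: word is deleted
        if tok in COMPANY_SKIP:
            skipped.append(tok)  # rule type: word is marked skip
            continue
        tok = NICKNAME_REPLACE.get(tok, tok)  # rule type: word is replaced
        if tok in SECONDARY_LOOKUP:
            # rule type: word generates additional search ranges
            alternates[tok] = SECONDARY_LOOKUP[tok]
        kept.append(tok)
    return {"key": " ".join(kept), "skipped": skipped, "alternates": alternates}


print(standardize("Dr. Mike Harrison Jr"))
print(standardize("The Al Smith Co"))
```

Keeping the secondary-lookup alternatives alongside the key, rather than replacing the token, lets a search try ALBERT and ALFRED in addition to AL.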
Weights
0.763
Define the threshold that must be met before records are output as a possible match
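How weights and a threshold interact can be shown with a weighted average of per-field similarity scores. Everything in this sketch is an assumption for illustration: the field names, the weights, the per-field scores, the 0.75 threshold, and the function names are invented, and the product's actual scoring formula may differ.

```python
def weighted_score(field_scores: dict, weights: dict) -> float:
    """Weighted average of per-field similarity scores (each 0..1)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(field_scores[f] * w for f, w in weights.items()) / total_weight


def is_possible_match(field_scores: dict, weights: dict, threshold: float) -> bool:
    """A record pair is output only if its score meets the threshold."""
    return weighted_score(field_scores, weights) >= threshold


# Hypothetical values for illustration.
weights = {"name": 0.5, "address": 0.3, "dob": 0.2}
pair_scores = {"name": 0.9, "address": 0.7, "dob": 0.5}
threshold = 0.75

score = weighted_score(pair_scores, weights)
print(round(score, 3), is_possible_match(pair_scores, weights, threshold))
```

Raising the weight on a trusted field such as name lets a strong name match compensate for a weak date of birth, while the threshold controls how many borderline pairs reach review.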
Classic Matching
The Identity Population rules overcome this limitation without the need to cleanse or standardize data
Identity Matching
Cluster ID
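A cluster ID groups all records that match one another, directly or through a chain of matches. One common way to derive such IDs from matched pairs is a union-find structure, sketched below. This is a generic technique, not a description of how the product assigns its cluster IDs, and `assign_cluster_ids` is a hypothetical name.

```python
def assign_cluster_ids(record_ids, matched_pairs):
    """Give every record a cluster ID; linked records share an ID."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk to the root, halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    cluster_of_root, result = {}, {}
    for r in record_ids:
        root = find(r)
        if root not in cluster_of_root:
            cluster_of_root[root] = len(cluster_of_root) + 1
        result[r] = cluster_of_root[root]
    return result


# Records 1-3 are linked transitively; 4 and 5 match each other; 6 is alone.
print(assign_cluster_ids([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)]))
```

Records 1 and 3 land in the same cluster even though they were never compared directly, because both match record 2.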
PowerCenter Integration
Scalability
PowerCenter Grid
DQ v9.x in PC Upgrade
When upgrading only IDQ to 9.1:
The 9.1 PowerCenter (PC) Integration installers must be run on the PowerCenter side to allow DQ 9.1 mappings to run in PowerCenter 9.0.1 / 8.6.1