
IBM Information Management Software

Front cover

Master Data Management


IBM InfoSphere Rapid Deployment Package
Implementing faster to see the benefits faster
Seeing benefits with a financial services scenario
Getting control of your data environment

Chuck Ballard, Priyanka Deswal, Paul Flores, Philippe Guitard, Charles Jia, Marty Pittman, Neeraj Singh, Lena Woolf

ibm.com/redbooks

International Technical Support Organization

Master Data Management: IBM InfoSphere Rapid Deployment Package

April 2011

SG24-7704-01

Note: Before using this information and the product it supports, read the information in "Notices" on page vii.

Second Edition (April 2011)

This edition applies to the Rapid Deployment Package (RDP) solution for IBM InfoSphere Master Data Management (MDM) Server Version 9.0.1 and Versions 8.0.1, 8.1.0.1, and 8.5 of IBM InfoSphere Information Server.

© Copyright International Business Machines Corporation 2009, 2011. All rights reserved.

Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Notices
Trademarks
Preface
The team who wrote this book
Now you can become a published author, too!
Comments welcome
Stay connected to IBM Redbooks
Summary of changes
Chapter 1. Overview of the Rapid Deployment Package for MDM
1.1 Introduction
1.2 The case for the RDP for MDM Server
1.3 Determining whether RDP is the right choice for you
Chapter 2. Rapid Deployment Package details
2.1 Introduction
2.2 MDMIS Parameter Set
2.2.1 DL_000_AutoStart_PS_DELTA_LOAD Job Sequence
2.2.2 DL_000_DELTA_LOAD Job Sequence
2.3 Standard Interface File (SIF)
2.4 Suspect Duplicate Processing
2.5 Configuration screens in the MDM Server UI
2.5.1 Enabling and disabling Suspect Duplicate Processing
2.5.2 Selecting the set of Party Match criteria
2.6 Database Load options
Chapter 3. RDP MDM: Direct Load
3.1 Direct Load process
3.2 Job Control
3.3 Import
3.4 Data Quality Assurance
3.5 Data Quality Error Consolidation / Reporting
3.6 Matching
3.7 ID assignment
3.8 Data insert and update


Chapter 4. RDP for MDM: Delta Load
4.1 Overview
4.2 MDM Party Maintenance Services
4.2.1 The instance resolution problem
4.2.2 MDM Party Maintenance Services behavior
4.2.3 MDM Party Maintenance Services Transaction List
4.2.4 MDM Party Maintenance Services Profile
4.2.5 MDM Party Maintenance Services installation
4.3 MDM RDP Runtime Assets
4.3.1 SIF Parser
4.3.2 Data extension and SIF Parser configuration
4.3.3 SIF sequencer
4.3.4 QualityStage runtime standardization and matching jobs
4.3.5 Search Suspect Candidates rule
4.3.6 Disable phonetic keys generation in MDM Server
4.3.7 MDM RDP Runtime Assets installation
4.3.8 MDM Matching Critical Data Rules Console user interface
4.4 Performance tuning for MDM Delta Load using RDP
4.4.1 MDM BatchProcessor configuration
4.4.2 MDM Server configuration
4.4.3 WebSphere Application Server configuration
4.4.4 Database tuning
4.4.5 Information Services Director job configuration
4.5 Run Delta Load for MDM using RDP
4.5.1 Create source SIF files
4.5.2 Run SIF Sequencer Job
4.5.3 Run MDM BatchProcessor
4.5.4 Check Delta Load result and error messages
Chapter 5. Financial services business scenario
5.1 Introduction
5.2 Business requirement
5.3 Environment configuration
5.4 An approach to implementation
5.5 Initial load
5.5.1 FBankCoT checking, savings, and loans systems
5.5.2 Data quality assessment
5.5.3 Create canonical form from the data sources
5.5.4 Validate and modify efficacy of the RDP MDM rule sets
5.5.5 Create SIF
5.5.6 Execute RDP for MDM jobs
5.5.7 Verify successful load
5.6 Suspect resolution


5.7 Hierarchies
5.7.1 Hierarchy overview
5.7.2 Hierarchy scenario
5.8 MDM consumption application
5.9 Operational processing
Appendix A. Configuration parameter file
Appendix B. Standard Interface File details
Appendix C. MDM customization considerations
C.1 Introduction
C.2 Data extensions and additions
C.3 Behavior extensions
C.4 Impact of data/behavior extensions on RDP for MDM
C.5 Extending RDP for MDM
C.6 Runtime column propagation (RCP)
C.7 Adding new elements (columns)
C.8 Modifying existing elements (columns)
Appendix D. Error processing
D.1 Introduction
D.2 Pipe character (|) in the data
D.3 Validation error with the code table
D.4 RT/ST/ADMIN_SYS_TP_CD error
D.5 End of record missing error
D.6 Start date after end date error
D.7 Date format error
Appendix E. Additional material
Locating the web material
Using the web material
System requirements for downloading the web material
How to use the web material


Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to:

IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.


Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

DataStage
DB2
developerWorks
IBM
Information Agenda
InfoSphere
POWER5
pSeries
QualityStage
Rational
Redbooks
Redbooks (logo)
Tivoli
WebSphere

The following terms are trademarks of other companies:

Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.

The following company names appearing in this publication are fictitious:

Fictional Bank Company T
FBankCoT

These names are used for instructional purposes only.


Preface
This IBM Redbooks publication documents the procedures for implementing an IBM InfoSphere Master Data Management (MDM) solution with the Rapid Deployment Package (RDP) for Master Data Management offering, using a typical financial services business scenario. It is aimed at IT architects, Information Management specialists, and Information Integration specialists responsible for implementing an IBM InfoSphere Master Data Management solution on a Red Hat Enterprise Linux 4.0 platform.

This book is organized as follows:

Chapter 1, "Overview of the Rapid Deployment Package for MDM" on page 1 outlines the fundamentals of implementing an enterprise MDM solution with IBM InfoSphere Master Data Management Server, taking advantage of the RDP for MDM.

Chapter 2, "Rapid Deployment Package details" on page 7 provides an overview of the RDP component of the IBM InfoSphere RDP for MDM solution. It includes the MDM Information Server (MDMIS) parameter set configuration details, a brief description of the Standard Interface File (SIF), the Suspect Duplicate Processing configuration, and database loading options.

Chapter 3, "RDP MDM: Direct Load" on page 33 provides a high-level description of the Direct Load process and the RDP components that are used in that process. The Direct Load process categories organize the presentation of the related IBM DataStage and IBM QualityStage assets.

Chapter 4, "RDP for MDM: Delta Load" on page 65 provides an overview of a Delta Load solution using IBM InfoSphere MDM Server (MDM) RDP Runtime Assets and MDM Party Maintenance Services. A Delta Load in RDP is the process of synchronizing changes in source system data with MDM Server. Because data is processed by MDM services during the load, this solution provides the best level of business data validation, ease of implementation and maintenance, and the highest MDM Server sustainability. The chapter provides implementation, configuration, and installation details about MDM RDP Runtime Assets and MDM Party Maintenance Services.

Chapter 5, "Financial services business scenario" on page 131 describes an approach to implementing an IBM InfoSphere MDM Server using the InfoSphere RDP on a Linux platform. The scenario uses a fictitious financial services business as an example to explain the approach. The initial load of the IBM InfoSphere MDM Server is performed with RDP for MDM DataStage and QualityStage (QS) jobs, and subsequent operational loads are performed using MDM Server RDP runtime assets.

Appendix A, "Configuration parameter file" on page 275 classifies the various parameters into broad categories and sub-categories based on their function. It identifies the parameters in these categories that you must modify before the RDP for MDM jobs can be executed, and those that you should consider modifying.

Appendix B, "Standard Interface File details" on page 295 provides an overview of the Record Type/Sub Type (RT/ST) mapping of the Standard Interface File (SIF).

Appendix C, "MDM customization considerations" on page 309 describes the extensions supported by MDM Server and the impact of such extensions on the RDP for MDM jobs.

Appendix D, "Error processing" on page 317 describes the most commonly encountered data-related problems in the SIF, and how they are highlighted in the RDP for MDM error log.
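Appendix D groups the most common SIF data problems into categories: a pipe character (|) in the data, a code table validation error, an RT/ST/ADMIN_SYS_TP_CD error, a missing end of record, a start date after an end date, and a date format error. The following Python sketch shows what checks for several of these categories might look like for a single line, assuming (as the pipe-character category suggests) that SIF records are pipe-delimited. It is a hypothetical illustration, not the RDP error-processing logic. The six-field layout, the `EOR` end-of-record marker, the YYYY-MM-DD date format, and the `check_sif_line` and `parse_date` functions are all assumptions. The real RT/ST layouts are those described in Appendix B.

```python
from __future__ import annotations

from datetime import date, datetime

# Hypothetical layout, for illustration only (real layouts: Appendix B):
# RT|ST|ADMIN_SYS_TP_CD|START_DT|END_DT|EOR
EXPECTED_FIELDS = 6
END_OF_RECORD = "EOR"      # assumed end-of-record marker
DATE_FORMAT = "%Y-%m-%d"   # assumed date format


def parse_date(value: str) -> date | None:
    """Return the parsed date, or None if value does not match DATE_FORMAT."""
    try:
        return datetime.strptime(value, DATE_FORMAT).date()
    except ValueError:
        return None


def check_sif_line(line: str, admin_sys_codes: set[str]) -> list[str]:
    """Return error descriptions for one pipe-delimited line (empty if clean)."""
    fields = line.rstrip("\n").split("|")
    errors: list[str] = []
    if fields[-1] != END_OF_RECORD:
        errors.append("end of record missing")
    if len(fields) != EXPECTED_FIELDS:
        # A '|' inside a data value adds a field and shifts every later one,
        # so the positional checks below would be meaningless.
        errors.append(f"expected {EXPECTED_FIELDS} fields, found {len(fields)}")
        return errors
    rt, st, admin_sys, start_raw, end_raw, _ = fields
    if not rt or not st:
        errors.append("record type or subtype missing")
    if admin_sys not in admin_sys_codes:
        # Stand-in for a lookup against the code table.
        errors.append(f"ADMIN_SYS_TP_CD {admin_sys!r} not in code table")
    start = parse_date(start_raw)
    if start is None:
        errors.append(f"date format error: {start_raw!r}")
    end = None
    if end_raw:  # an empty end date is treated as open-ended in this sketch
        end = parse_date(end_raw)
        if end is None:
            errors.append(f"date format error: {end_raw!r}")
    if start is not None and end is not None and start > end:
        errors.append("start date after end date")
    return errors
```

For example, with the code table `{"1001"}`, the line `P|1|1001|2011-01-01|2010-01-01|EOR` would be reported only as a start date after end date error. The field-count check returns early because one stray pipe character makes every later positional check unreliable.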

The team who wrote this book


This book was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), San Jose Center.

Chuck Ballard is a Project Manager at the International Technical Support Organization, in San Jose, California. He has over 35 years of experience, holding positions in the areas of product engineering, sales, marketing, technical support, and management. His expertise is in the areas of database, data management, data warehousing, business intelligence, and process re-engineering. He has written extensively on these subjects, taught classes, and presented at conferences and seminars worldwide. Chuck has both a Bachelor's degree and a Master's degree in Industrial Engineering from Purdue University.


Priyanka Deswal is an Information Agenda Architect with IBM Software Group, focused on designing solutions for customers in Asia Pacific. She has more than eleven years of experience in information management. Her areas of expertise include database technologies, information integration, master data management, content management, and business analytics. She holds a Bachelor's degree in Computer Science and Engineering.

Paul Flores is a Senior Software Developer and Application Architect, and an IBM Certified Solution Developer for InfoSphere DataStage v8.5, located in Phoenix, AZ. His career spans more than 20 years in various Information Systems disciplines, working with companies such as Sandia National Laboratories, Intel, and Acxiom. Most of his career has centered on developing software that supports diverse areas spanning research and manufacturing. Paul joined IBM in 2008 and has been part of the MDM RDP Development team since its inception. He holds a Bachelor's degree in Mathematics and a Master's degree in Computer Information Sciences.

Philippe Guitard is a Senior Software Developer with the MDM Server development team at IBM Canada. He has over 15 years of experience with software application development and data integration projects, and he designed and developed several DataStage jobs for the RDP solution. Philippe was previously a Senior Consultant specializing in Enterprise Application Integration (EAI) with IBM WebSphere Transformation Extender, and he developed credit scoring applications for Experian. Philippe holds a Master's degree in Computer Science from the Galilée Institute, Paris, France.

Charles Jia is a Senior Technical Specialist in the IBM Software Group, specializing in IBM InfoSphere MDM Server. His past experience includes leading roles in RDP for MDM Maintenance Services and MDM Development, with primary responsibility for MDM features. He has over seven years of experience developing MDM Server and four years of experience in IBM client services and consulting. Charles holds a Bachelor's degree in Computer Science from Brock University in St. Catharines, ON, Canada.


Marty Pittman is an Information Management Architect with the IBM Software Group, located in Charlotte, NC. He has more than 17 years of experience as an Information Management Solution Architect and as a Master Data Management and Data Warehousing / Business Intelligence Technical Specialist. Marty draws on his financial services background to uncover business problems and architect information management solutions across all aspects of the information architecture, with a focus on master data management, data warehousing, data integration, and business analytics solutions in the banking and financial services industries. Marty holds a Bachelor's degree in Finance.

Neeraj Singh is a Senior Performance Engineer and has been working on InfoSphere Master Data Management Server performance since June 2007. He previously led the Java technologies test team for functional, system, and performance tests as technical lead and test project leader. Neeraj joined IBM in 2000 and holds a Bachelor's degree in Electronics and Communications Engineering.

Lena Woolf is a Senior Product Architect for the InfoSphere Master Data Management Server at the IBM Toronto Lab. She has over 12 years of experience in designing and developing enterprise applications for a wide range of industries, including banking, insurance, retail, and health care. Lena joined IBM in 2005 as part of the DWL acquisition, and since that time has been involved in the product architecture of MDM Server, playing a key role in six major releases of the product. She holds a Master's degree in Computer Science from the National Technical University of Ukraine.


Other contributors
Thanks to the following people for their contributions to this project.

From IBM locations worldwide


Tim Davis: Executive Director, Information Agenda Architecture Group, IBM Software Group, Information Management, Littleton, MA.
Dickson Fu: Software Developer, IBM Software Group, Information Management, Markham, ON Canada.
Christopher Grote: Technical Solution Architect, InfoSphere Centre of Excellence, IBM Software Group, London, UK.
Clive Hannah: Information Agenda Architect, IBM Software Group, Information Management, Markham, ON Canada.
Susan Laime: IM Analytics and Optimization Software Services, IBM Software Group, Information Management, Littleton, MA.
Barry Rosen: Global Executive Architect, InfoSphere Information Agenda Group, Westford, MA.

From the ITSO, San Jose, CA


Mary Comianos: Publication Management
Emma Jacobs: Graphics
Ann Lund: Residency Administration
Diane Sherman: Editor

Authors of the first edition of this book


The following authors wrote the first edition of this book. We thank them for their excellent work, much of which is still contained in this second edition:

Nagraj Alur, Alex Baryudin, Mike Carney, Priyanka Deswal, Tim Davis, Elizabeth Dial, Norbert Eschle, Clive Hannah, Patrick Owen, Barry Rosen, Torben Skov


Now you can become a published author, too!


Here's an opportunity to spotlight your skills, grow your career, and become a published author, all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome
Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:

Use the online Contact us review Redbooks form found at:
ibm.com/redbooks

Send your comments in an email to:
redbooks@us.ibm.com

Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Stay connected to IBM Redbooks


Find us on Facebook:
http://www.facebook.com/IBMRedbooks

Follow us on Twitter:
http://twitter.com/ibmredbooks

Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806

Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm

Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html


Summary of changes
Summary of changes as created or updated on April 27, 2011. In this section, we provide a high-level summary of changes made to Master Data Management: Rapid Deployment Package for MDM, SG24-7704-00 to produce this second (updated) edition, Master Data Management: IBM InfoSphere Rapid Deployment Package, SG24-7704-01. This edition reflects the addition, deletion, or modification of new and changed information described below. However, it may also include minor corrections and editorial changes that are not identified.

New information
New information is as follows:

Chapter 1, "Overview of the Rapid Deployment Package for MDM" on page 1: This new chapter provides an overview of the MDM solution and its benefits to help position an MDM solution for IBM clients. It includes some information from the now-deleted Appendix A of the first edition.

Chapter 3, "RDP MDM: Direct Load" on page 33: This new chapter describes the Direct Load process and the Rapid Deployment Package (RDP) components that are used in that process.

Chapter 4, "RDP for MDM: Delta Load" on page 65: This new chapter describes the Delta Load capability for MDM. The first edition contained information about only the MDM Initial Load.

Appendix A, "Configuration parameter file" on page 275: This new appendix provides additional information about customizing an MDM solution.

Changed information
Changed information is as follows:

Chapter 2, "RDP Detail," from the first edition has been updated and expanded, but the configuration parameter file information has been extracted and is now contained in the new Appendix A, "Configuration parameter file" on page 275.

Chapter 3, "Financial services business scenario," from the first edition has become Chapter 5, "Financial services business scenario" on page 131, and its scope has been updated.


Chapter 1. Overview of the Rapid Deployment Package for MDM


In this book, we outline the fundamentals of implementing an enterprise Master Data Management (MDM) solution with IBM InfoSphere MDM Server, using the Rapid Deployment Package (RDP) for MDM. In this chapter, we briefly describe the RDP for MDM Server and discuss considerations to help you determine whether RDP is the right choice for you.


1.1 Introduction
The Rapid Deployment Package for MDM is a services offering that combines pre-integrated IBM InfoSphere software with a prescriptive MDM implementation approach to significantly reduce both the cost and the overall risk of MDM implementations. RDP can be rapidly deployed as the initial stage of an MDM Server deployment. The RDP MDM solution is fully integrated and provides your enterprise with a Single View of the customer, whether that party is called a customer, client, member, or citizen. Fully integrated means that the solution is prepackaged with IBM InfoSphere Information Server, IBM InfoSphere MDM Server Foundation, customizable integration assets (such as a prebuilt set of Information Server jobs for data load and prebuilt QualityStage data quality rule sets), and a fully articulated set of suggested practices and repeatable deployment practices. This book details the RDP MDM solution to help you better understand its technical underpinnings, operational metrics, and deployment methods.

1.2 The case for the RDP for MDM Server


The Rapid Deployment Package for MDM (RDP) offering is designed for the first phase of MDM projects. At this stage, clients typically deploy MDM Server, as depicted in Figure 1-1 on page 3, in the consolidation or coexistence style of master data management, in which data is loaded into an MDM repository but most data changes still come from existing systems. RDP provides a prebuilt set of InfoSphere Information Server DataStage jobs for performing initial and delta loads directly into the MDM database:
- An initial load is the original movement of data from source systems into the MDM repository, when the repository is empty.
- A delta load is a periodic (typically daily) data update from source systems into MDM.


Figure 1-1 MDM Server implementation (diagram components: source systems Source #1 through Source #N; SIF; load process DataStage jobs; MDM database with history; Information Server with Information Analyzer, FastTrack, DataStage, and QualityStage; MDM Server with Business Services and duplicate suspect processing; user interface and reporting)

MDM Server Business Services are used for data inquiries, and the MDM Server Data Steward User Interface (DSUI) is used by data stewards to collapse duplicate data that was not automatically collapsed during the load. After the initial deployment, RDP projects can easily be expanded to become full centralized hubs.

Deploying the RDP MDM solution essentially involves the following steps:
1. Install and configure the RDP MDM solution in your UNIX or Linux environment.
2. Analyze and profile your source customer data.
3. Map source systems to the RDP MDM Standard Interface File (SIF).
4. Export source data as SIF.
5. If required, extend the model. The RDP price includes extending the model for up to 10 additional attributes and attribute lengths.
6. Configure RDP to feed source system data through the initial load and delta load processes.
7. Test and tune the system, including the standardization and matching rules.
8. Deploy the solution into your production environment.

The Rapid Deployment Package for MDM provides a prescriptive implementation approach with a predetermined scope, which results in a predictable implementation timeline. Because the requirements for Master Data Management projects vary by client, implementation solutions also vary to some extent. RDP is a prepackaged solution that addresses the most typical set of client requirements with a limited set of functions, while still ensuring an upgrade path to a full MDM solution. The following sections outline several key business requirements that the RDP offering fulfills.


Supported entities
The Rapid Deployment Package for MDM provides support for the following Party domain MDM Server entities and their child objects:
- Party, Person, Organization, PersonName, OrganizationName, PartyAddress, PartyContactMethod, PartyPrivPref, PartyIdentification, PartyValue, PartyLobRelationship, PartyAlert, PartyRelationship, AdminContEquiv
- Contract, ContractAlert, ContractValue, AdminNativeKey
- ContractComponent, ContractComponentValue
- ContractPartyRole, ContractRoleLocation
- Party Hierarchy

Solution to the Instance Resolution Problem


When data flows into InfoSphere MDM Server directly from external applications, such as established systems, the internal key is not known, and often the nature of the data change is not known either. This issue, referred to as the Instance Resolution Problem, requires that the following questions be answered:
- Which party or contract are you working with?
- Is data being added or updated?
- If data is being updated, which instance should be updated when multiple names, addresses, contact methods, identifiers, contract components, party roles, and so on, exist?

RDP solves the Instance Resolution Problem in the following way:
- The unique party Contact Equivalency key identifies the party you are working with. The contact equivalency data for a party cannot be changed; however, a party can have multiple rows in the contact equivalency table.
- To identify a contract, RDP uses the source system key stored in either the contract table or the native key table. Which of the two to use is an implementation decision that applies to the entire implementation: all contracts are identified through one or the other, never a combination of both.
- For child objects, the record instance is identified by an implied business key that includes a type. Using person names as an example, MDM Server supports multiple legal names, multiple alias names, multiple preferred names, and so forth. With RDP, a party can have only one legal name, one alias name, one preferred name, and so forth. When you provide name information, you must also provide the party cross-reference in the form of a Contact Equivalency key, plus the type of name (such as legal or alias). With this information, it is possible to determine whether a name must be added, or which specific name must be updated.
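The add-versus-update rule for names can be illustrated with a short shell sketch. It is not part of RDP, which implements this logic inside DataStage jobs; the function name resolve_names, the pipe-delimited input files, and the name type codes LEGAL and ALIAS are hypothetical, chosen only to show how the (Contact Equivalency key, name type) pair selects the instance.

```shell
#!/bin/sh
# Hypothetical sketch of the add-versus-update decision for person names.
# Inputs (illustrative formats, not RDP file layouts):
#   $1: existing names, one "ContEquivKey|NameType" pair per line
#   $2: incoming names, one "ContEquivKey|NameType|Name" per line
# Output: each incoming line prefixed with ADD| or UPDATE|.
resolve_names() {
  awk -F'|' '
    # First file: remember every (key, type) pair that already exists.
    FILENAME == ARGV[1] { have[$1, $2] = 1; next }
    {
      if (($1, $2) in have) action = "UPDATE"
      else action = "ADD"
      # Only one name per type: a later row for the same pair is an update.
      have[$1, $2] = 1
      print action "|" $0
    }' "$1" "$2"
}
```

Because the pair, not the name text, is the lookup key, a changed spelling of an existing legal name becomes an update rather than a second legal name.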

Suspect Duplicate Processing


Suspect Duplicate Processing (SDP) searches for and matches existing parties in the system and creates associations, or suspects, between them. SDP in RDP has the following characteristics:
- The matching rule for parties is implemented as a QualityStage job and supports configurable matching attributes.
- Matching weights are calculated by QualityStage.
- The suspect duplicate candidate selection algorithm differs from the algorithm provided with the default candidate selection rule in MDM Server.
- Auto-collapsing of exact duplicates can be turned on or off. Auto-collapsing as part of a direct initial load has implications for data lineage within the MDM solution.

1.3 Determining whether RDP is the right choice for you


Three basic rules, phrased here as questions, can help you determine whether the Rapid Deployment Package for MDM is the right choice for your first MDM Server implementation. RDP might be the right choice for you if you can answer yes to all three of the following questions (rules):
1. Do you have a large data volume to be loaded in a short time?
2. Does RDP meet your customer requirements for the Party domain?
3. Is RDP supported on your platform of choice?

Rule 1, the first question, says to consider the RDP solution for data load only if batch loading through services cannot meet your load time requirements. The typical suggested approach for loading data into MDM Server is to use maintenance runtime services and the MDM Server Batch processor for both initial loads and delta loads. Be sure to turn off SDP during the initial load. This initial load approach provides good performance and is easy to implement and maintain. It requires Evergreening, which is bulk SDP processing in batch mode, after the initial load and before the delta load. Even if you have chosen the RDP solution because of the large data volume of the initial load, consult Rule 1 again when choosing a solution for delta load. A delta load typically contains significantly less data, so loading through services is the better choice. We strongly suggest using maintenance runtime services and the MDM Server Batch processor for delta loads.

Rule 2, the second question, says that RDP is a suitable solution if the customer requirements for the Party domain fit within the requirements outlined in 1.2, The case for the RDP for MDM Server on page 2. The business logic for the instance resolution problem and suspect duplicate processing is embedded in numerous InfoSphere DataStage jobs. Significant modifications to this business logic take time and negate an important benefit of the RDP solution, namely a predictable implementation timeline. Examples include the following:
- Adding support for new MDM Server entities implemented as additions requires developing several new DataStage jobs.
- Changing business keys (to support multiple legal names, for example) requires modifying multiple DataStage jobs.
- Although the QualityStage matching and standardization rule sets are shared between RDP and the MDM Server run time, other changes to the suspect duplicate processing logic must be implemented both in DataStage jobs and in Java rules for the MDM Server run time. For example, changes to the blocking logic in QualityStage must also be implemented as a new Java candidate selection rule, invoked from MDM Server services as part of the MDM Server DSUI. The DSUI is used by data stewards to collapse duplicate data that was not auto-collapsed during the load.

Also note that because RDP is a services offering with prebuilt assets, after the assets are customized and become part of the solution, they are not supported, the same as any other assets written during services engagements.

Rule 3, the third question, says that you should check the RDP assets availability matrix to ensure that RDP is supported on your platform of choice for an MDM Server solution. Not all platforms supported by MDM Server are available with RDP.


Chapter 2.

Rapid Deployment Package details


This chapter provides an overview of the Rapid Deployment Package (RDP) component of the IBM InfoSphere RDP for Master Data Management (MDM) solution. It covers the MDM Information Server (MDMIS) parameter set configuration details, a brief description of the Standard Interface File (SIF), the Suspect Duplicate Processing (SDP) configuration, and database loading options.


2.1 Introduction
The RDP for MDM solution consists of various components that either require specific configuration or offer customization options. This chapter provides a detailed review of the configuration tasks that are required when setting up the following components:
- MDMIS Parameter Set
- Standard Interface File (SIF)
- Suspect Duplicate Processing
- Database load options

2.2 MDMIS Parameter Set


The execution of the RDP for MDM jobs is driven by configuration parameters that you can customize to suit the unique requirements of your organization. There are two ways of running the RDP jobs, and they differ in how the MDMIS parameter set is configured:
- DL_000_AutoStart_PS_DELTA_LOAD Job Sequence
- DL_000_DELTA_LOAD Job Sequence

2.2.1 DL_000_AutoStart_PS_DELTA_LOAD Job Sequence


If you execute either the DL_000_AutoStart_PS_DELTA_LOAD or the DL_000_AutoStart_PS_HIERARCHY job sequence, the job sequence performs the following tasks:
1. Runs the IL_000_PS__Prestart sequence job to extract the following items:
   - Configuration parameters from the CONFIGELEMENT table in MDM
   - Database settings from the MDM_CONNECTIONS parameter set
   - The default configuration parameter file (named STATIC_MDMIS)
2. Creates a temporary configuration file (named VOLATILE_MDMIS) in which the parameter values coming from the CONFIGELEMENT table override the corresponding parameter values in the STATIC_MDMIS configuration file. The BATCH_ID and DS_PROCESSING_DATE parameters are set automatically.

The newly generated VOLATILE_MDMIS parameter value file is then passed to the DL_000_AutoStart_PS_DELTA_LOAD or DL_000_AutoStart_PS_HIERARCHY sequence jobs. The default configuration parameter file STATIC_MDMIS is left unchanged, and a copy of the VOLATILE_MDMIS parameter set value file is saved as another value file named VOLATILE_MDMIS.<Batch number>, to keep a history of all the parameter set value files used in RDP runs. Figure 2-1 shows the parameter set value files that are used in RDP.

Note: A temporary TEMP_MDMIS parameter set value file is created automatically during the IL_000_PS__Prestart job sequence in preparation for creating the VOLATILE_MDMIS parameter set value file. This parameter set value file does not need to be maintained.

Figure 2-1 Parameter value files used in RDP
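The override rule in step 2 can be sketched in shell as follows. This is a minimal illustration, assuming both the STATIC_MDMIS defaults and the CONFIGELEMENT values are available as plain-text NAME=value files; the function name build_volatile and the file layouts are hypothetical, and the real merge is done by the IL_000_PS__Prestart job sequence inside DataStage.

```shell
#!/bin/sh
# Hypothetical sketch of the VOLATILE_MDMIS override rule.
# ASSUMPTION: inputs are plain "NAME=value" text files, one per line:
#   $1 = STATIC_MDMIS defaults, $2 = values read from CONFIGELEMENT,
#   $3 = BATCH_ID, $4 = DS_PROCESSING_DATE (format is illustrative)
# Output: merged NAME=value lines, in the order of the defaults file.
build_volatile() {
  awk -v batch="$3" -v pdate="$4" '
    {
      eq = index($0, "=")               # split on the first "=" only
      if (eq == 0) next                 # skip lines that are not NAME=value
      name = substr($0, 1, eq - 1)
      value = substr($0, eq + 1)
    }
    FILENAME == ARGV[1] {               # defaults file
      if (!(name in val)) keys[++n] = name
      val[name] = value
      next
    }
    (name in val) { val[name] = value } # CONFIGELEMENT overrides known names
    END {
      val["BATCH_ID"] = batch           # always set automatically
      val["DS_PROCESSING_DATE"] = pdate
      for (i = 1; i <= n; i++) {
        print keys[i] "=" val[keys[i]]
        seen[keys[i]] = 1
      }
      if (!("BATCH_ID" in seen)) print "BATCH_ID=" batch
      if (!("DS_PROCESSING_DATE" in seen)) print "DS_PROCESSING_DATE=" pdate
    }' "$1" "$2"
}

# Example (file names are illustrative):
# build_volatile STATIC_MDMIS.txt configelement.txt 1042 2011-04-01 > VOLATILE_MDMIS.txt
# cp VOLATILE_MDMIS.txt VOLATILE_MDMIS.1042.txt   # per-batch history copy
```

Splitting on the first equals sign only matters because, as Example 2-1 shows, CONFIGELEMENT-related values can themselves contain "=".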

MDMIS parameter set configuration


To get the MDMIS parameter set ready for run time, complete the following steps:
1. Set the MDM_CONNECTIONS parameters with the correct information to connect to the MDM database. You can create a separate parameter value file for each database that you want to run RDP against. Figure 2-2 shows an example.

Figure 2-2 MDM_CONNECTIONS parameter set value file names


Note: Be sure that the DS_VALUE_FILE_NAME parameter matches the parameter set value file name (DEV, QA, and PROD in Figure 2-2). The IL_000_PS__Prestart job sequence uses the DS_VALUE_FILE_NAME parameter to retrieve the correct MDM_CONNECTIONS parameter set value file to merge with the STATIC_MDMIS parameter set at run time.

2. Prepare the STATIC_MDMIS parameter set value file in the MDMIS parameter set. The best approach is to update the default values as shown in Figure 2-3, then delete the existing STATIC_MDMIS parameter set value file, if any, and re-create it so that the new value file contains the correct values taken from the Default Value column.

Figure 2-3 MDMIS parameter set default values

Note: All the parameters that are linked to the MDM CONFIGELEMENT table are retrieved from that table at run time, so setting a Default Value for them is not necessary. To determine which parameters are linked to the MDM CONFIGELEMENT table, look in the Help Text column for the CONFIGELEMENT= string. See Figure 2-4 for an example.


Figure 2-4 MDMIS parameter set help text

3. Populate the parameter values stored in the CONFIGELEMENT table in MDM. There are two ways to do so:
   - Using the CFG_Config job sequence
   - Using the MDM Management Console

Using the CFG_Config job sequence


In the RDP DataStage jobs, a job category (folder) called Jobs Configuration contains two jobs, as shown in Figure 2-5:
- A job sequence called CFG_Config
- A job called CFG_Update_CM_Params

Figure 2-5 Configuration Jobs

To use the CFG_Config job sequence, you must first update every parameter in the MDMIS parameter set that is to be assigned a value from the CONFIGELEMENT table. To find these parameters, look for the string CONFIGELEMENT= in the Help Text column of the MDMIS parameter set, as shown in Figure 2-4. The Help Text column in the MDMIS parameter set has the following format:
<Help text>. CONFIGELEMENT=<Parameter Name in the CONFIGELEMENT table>=<Parameter Value>


Example 2-1 shows a sample of the FS_DATA_SET_HEADER_DIR parameter help text.


Example 2-1 Help Text column for the FS_DATA_SET_HEADER_DIR parameter

Dataset headers directory. CONFIGELEMENT=/IBM/ELMDM/IIS/Install/ISDataSetHeaders/path=/mdmisdata03/Projects/MDMDLINT2/DATA/

In some cases, the Help Text column contains multiple CONFIGELEMENT parameter names separated by a tilde (~), as shown in Example 2-2. These CONFIGELEMENT parameters are exclusively QualityStage parameters and should be set through the MDM Server UI. See 2.4, Suspect Duplicate Processing on page 19 for more details.

Example 2-2 Help Text column for the QS_MATCH_PERSON_1 parameter

Specify Variable Match String type and TpCd for person - C1 (default). CONFIGELEMENT=/IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString1/type=C~/IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString1/TpCd=1

Important: After all the CONFIGELEMENT values in the parameters' Help Text have been set in the MDMIS parameter set, you must recompile the CFG_Config job sequence so that the MDMIS parameter set metadata (in particular, the Help Text column contents) is available to the job sequence at run time.
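To check which CONFIGELEMENT entries a Help Text value carries, the format can be parsed mechanically. The following shell sketch is a hypothetical helper (parse_help_text is not an RDP job); it assumes the Help Text follows exactly the CONFIGELEMENT=<name>=<value> form shown above, with multiple pairs separated by a tilde.

```shell
#!/bin/sh
# Hypothetical helper: list the CONFIGELEMENT name/value pairs embedded in
# an MDMIS Help Text string, following the format shown above:
#   <Help text>. CONFIGELEMENT=<name>=<value>[~<name>=<value>...]
# Output: one "<name><TAB><value>" line per pair; nothing if not linked.
parse_help_text() {
  printf '%s\n' "$1" | awk '
    {
      pos = index($0, "CONFIGELEMENT=")
      if (pos == 0) exit                 # parameter not linked to CONFIGELEMENT
      spec = substr($0, pos + length("CONFIGELEMENT="))
      n = split(spec, entries, "~")      # several pairs are "~"-separated
      for (i = 1; i <= n; i++) {
        eq = index(entries[i], "=")      # name ends at the first "="
        if (eq == 0) continue
        print substr(entries[i], 1, eq - 1) "\t" substr(entries[i], eq + 1)
      }
    }'
}
```

Applied to Example 2-2, this yields two pairs, .../PersonMatchString1/type with value C and .../PersonMatchString1/TpCd with value 1, which together correspond to the C1 default named in the help text.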


To run the CFG_Config job sequence, you must provide the deployment name of the MDM instance to update and valid database connection settings from the MDM_CONNECTIONS parameter set, as shown in Figure 2-6.

Figure 2-6 CFG_Config job sequence run options

Using the MDM Management Console


Another way of preparing the MDM CONFIGELEMENT table for RDP is to use the Management Console command-line tool provided by MDM Server. The Management Console must be run from the MDM Server instance that points to the CONFIGELEMENT table you want to prepare.


Using the parameter from Example 2-1 on page 12, the corresponding Management Console command is shown in Example 2-3.
Example 2-3 Updating CONFIGELEMENT using the MDM Management Console

LOG_FILE="setparam.log"
MANAGEMENT_CONSOLE_PATH=/usr/IBM/MDM80/HD_MDM80_01292008_0230_DB2_BE01/ManagementConsole
APPLICATION_NAME='WebSphere Customer Center'
APPLICATION_VERSION=8.0.0.0
DEPLOYMENT_NAME='WebSphere Customer Center'
PARAM_PATH='/IBM/ELMDM/IIS/Install/ISDataSetHeaders/path'
PARAM_VALUE='/mdmisdata03/Projects/MDMDLINT2/DATA/'

"$MANAGEMENT_CONSOLE_PATH/console.sh" -file "$MANAGEMENT_CONSOLE_PATH/scripts/jacl/modifyConfigItem.jacl" \
  "$APPLICATION_NAME" "$APPLICATION_VERSION" "$DEPLOYMENT_NAME" \
  "$PARAM_PATH" "$PARAM_VALUE" >> "$LOG_FILE" 2>&1

The script in Example 2-3 must be customized to iterate over all the parameters that need to be updated in the CONFIGELEMENT table.
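One way to do that customization is a loop that reads path and value pairs from a file and calls modifyConfigItem.jacl once per pair, reusing the variables from Example 2-3. This is a sketch under stated assumptions: the function name update_all_params and the input layout (one path|value pair per line) are hypothetical.

```shell
#!/bin/sh
# Hypothetical loop around the Example 2-3 call. Expects the variables from
# Example 2-3 (MANAGEMENT_CONSOLE_PATH, APPLICATION_NAME, APPLICATION_VERSION,
# DEPLOYMENT_NAME, LOG_FILE) to be set already.
# $1: text file with one "CONFIGELEMENT path|value" pair per line (assumed layout).
update_all_params() {
  # "|| [ -n ... ]" also processes a last line that lacks a trailing newline.
  while IFS='|' read -r param_path param_value || [ -n "$param_path" ]; do
    [ -z "$param_path" ] && continue     # skip blank lines
    # </dev/null keeps console.sh from consuming the rest of the input file.
    "$MANAGEMENT_CONSOLE_PATH/console.sh" -file \
      "$MANAGEMENT_CONSOLE_PATH/scripts/jacl/modifyConfigItem.jacl" \
      "$APPLICATION_NAME" "$APPLICATION_VERSION" "$DEPLOYMENT_NAME" \
      "$param_path" "$param_value" < /dev/null >> "$LOG_FILE" 2>&1 \
      || echo "update failed: $param_path" >&2
  done < "$1"
}
```

Quoting each argument keeps values such as the two-word application name intact when they are passed to the JACL script.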

Summary
After either running the CFG_Config job sequence or updating the CONFIGELEMENT table using the Management Console, the DL_000_AutoStart_PS_DELTA_LOAD or DL_000_AutoStart_PS_HIERARCHY job sequences are ready to be used.


2.2.2 DL_000_DELTA_LOAD Job Sequence


If you execute the job sequence DL_000_DELTA_LOAD or DL_200_Hierarchy, you are responsible for creating your own MDMIS parameter value file and providing it as input to the job sequence. You can give it a name of your choosing, as shown in Figure 2-7.

Figure 2-7 User defined parameter set value files

Attention: In a production environment, we do not recommend the use of the DL_000_DELTA_LOAD or DL_200_Hierarchy sequences because they require manual modifications that might result in runtime errors or data quality deterioration. The use of the DL_000_AutoStart_PS_DELTA_LOAD and DL_000_AutoStart_PS_HIERARCHY (as described in 2.2.1, DL_000_AutoStart_PS_DELTA_LOAD Job Sequence on page 8) ensures a proper BATCH_ID naming convention and the use of parameters from the CONFIGELEMENT table to stay in sync with the parameters that are used by MDM itself.

Note: The DL_000_DELTA_LOAD or DL_200_Hierarchy sequences are suitable for testing various options with specific configuration parameter sets that combine certain options, especially during a quality assurance (QA) process. Using the DL_000_AutoStart_PS_DELTA_LOAD and DL_000_AutoStart_PS_HIERARCHY sequences in such cases is more cumbersome, because you would have to modify the CONFIGELEMENT table for every test or development run.

Unlike with the DL_000_AutoStart_PS_DELTA_LOAD or DL_000_AutoStart_PS_HIERARCHY job sequences, the following parameters are not filled in automatically and must be entered manually to execute the RDP jobs:
- BATCH_ID
- Database parameters (starting with DB_)
- DS_PROCESSING_DATE
- Parameters retrieved from the CONFIGELEMENT table

When you create a new MDMIS parameter set value file, all the parameter values are taken from the Default Value column provided with the RDP package, as shown in Figure 2-3 on page 10. The preferred approach is to update the default values according to your needs and then create a new parameter set value file, because editing the default values in tabular form is easier than editing the parameter value files, which are presented on a single line, as shown in Figure 2-7 on page 15. See Appendix A, Configuration parameter file on page 275 for more details about the parameters to be filled in. When an MDMIS parameter set value file is ready, the DL_000_DELTA_LOAD or DL_200_HIERARCHY job sequences are ready to be used.
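Because these values are not filled in for you, a quick check before a run can catch omissions. The following sketch assumes the value file contents are available as NAME=value text; check_manual_params is a hypothetical helper, and it cannot detect parameters that should come from the CONFIGELEMENT table, because those share no common name prefix.

```shell
#!/bin/sh
# Hypothetical pre-run check for a hand-made MDMIS value file used with
# DL_000_DELTA_LOAD or DL_200_Hierarchy.
# ASSUMPTION: the file is plain "NAME=value" text, one parameter per line.
# Prints one line per problem and returns 1 if anything is missing or empty.
check_manual_params() {
  awk '
    {
      eq = index($0, "=")
      if (eq == 0) next
      name = substr($0, 1, eq - 1)
      value = substr($0, eq + 1)
      # Parameters that these sequences do not fill in automatically.
      if (name == "BATCH_ID" || name == "DS_PROCESSING_DATE" || name ~ /^DB_/) {
        seen[name] = 1
        if (value == "") { print "empty: " name; bad = 1 }
      }
    }
    END {
      if (!("BATCH_ID" in seen)) { print "missing: BATCH_ID"; bad = 1 }
      if (!("DS_PROCESSING_DATE" in seen)) { print "missing: DS_PROCESSING_DATE"; bad = 1 }
      exit bad
    }' "$1"
}
```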

2.3 Standard Interface File (SIF)


The SIF is the file interface through which you provide the data to be loaded into MDM Server through RDP. The SIF is a delimited ASCII file that contains the input data for the load process. The default delimiter is the pipe character (|). The file is a multi-record-format flat file, with a record type code in the first field and a sub-record type code in the second field.

Important: The SIF files must be in DOS format (records delimited by <CR><LF> characters), regardless of the platform RDP is running on.

Each record type/sub-record type (also referred to as RT/ST) combination has a unique layout (metadata). The record type identifies the primary subject areas, which are Contact (P) and Contract (C). The Contact and Contract RT/ST combinations are listed in Table 2-1 on page 17.
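When source extracts are produced on UNIX or Linux, their lines normally end with <LF> only. A minimal sketch of a conversion step, using standard awk, is shown below; to_dos is a hypothetical helper name, and it leaves lines that already end in <CR><LF> unchanged.

```shell
#!/bin/sh
# Hypothetical helper: rewrite input so every record ends in <CR><LF>.
# Reads the named files (or standard input) and writes to standard output.
to_dos() {
  # Drop any existing trailing CR first so it is not doubled.
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}
# Example: to_dos party_extract.txt > party.sif   (file names illustrative)
```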


Table 2-1 Contact (P) and Contract (C) RT/ST combinations
Record type/sub-record type | Content
Contact record type (P):
PP | Person Contact
PO | Organization Contact
PG | Organization Name
PH | Person Name
PE | External Match
PA | Address
PC | Contact Method
PI | Identifier
PB | Line of Business Relationship
PR | Contact Relationship
PM | Person Miscellaneous Value
PS | Privacy Preference
PT | Person Alert
Contract record type (C):
CH | Contract
CK | Native Key
CC | Contract Component
CR | Contract Component Role
CL | Role Location
CV | Contract Component Value
CM | Contract Misc Value
CT | Contract Alert
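The RT/ST pairs in Table 2-1 can be checked before a load. This sketch assumes that the second field of each record carries the two-letter sub-record code exactly as listed in the table (for example, P|PP|...); check_sif_rtst is a hypothetical helper, not an RDP component, and it also reports records that are not <CR><LF>-terminated.

```shell
#!/bin/sh
# Hypothetical pre-load check against Table 2-1 (check one file at a time).
# Assumes field 2 holds the two-letter sub-record code (for example P|PP|...).
# Prints "line N: reason" for each problem; returns 1 if any were found.
check_sif_rtst() {
  awk -F'|' '
    BEGIN {
      n = split("PP PO PG PH PE PA PC PI PB PR PM PS PT", p, " ")
      for (i = 1; i <= n; i++) ok["P", p[i]] = 1
      n = split("CH CK CC CR CL CV CM CT", c, " ")
      for (i = 1; i <= n; i++) ok["C", c[i]] = 1
    }
    {
      if ($0 !~ /\r$/) { print "line " NR ": not CR/LF terminated"; bad = 1 }
      if (!(($1, $2) in ok)) { print "line " NR ": unknown RT/ST " $1 "/" $2; bad = 1 }
    }
    END { exit bad }' "$@"
}
```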


The record layout of the SIF is as follows:
<RECORD_TYPE>|<SUBRECORD_TYPE>|<DATA><CR><LF>
In the record layout, <CR><LF> is the mandatory DOS line terminator (carriage return followed by line feed). The following considerations apply to the content of the SIF:
- The columns within the <DATA> section must be separated by a pipe character (|), with a pipe character following the last data element. All pipe separators must be present even if a particular data element has no data.

Important: Currently, no escape character is provided for input data that itself contains the | character, and RDP cannot be configured to use a different delimiter. Such data can cause the parser to flag errors in the Import SIF step. Ensure that pipe characters in the input data are handled appropriately before you populate the SIF.

- The domain values of key columns in the SIF must be the values defined by MDM Server, which requires transforming the domain values of the source system into the ones used in MDM Server. For example, the domain values for Gender in MDM Server are M and F, whereas the source system might use 0 and 1. The process that creates the SIF is responsible for mapping the domain values appropriately.
- When a column is identified as not nullable, a non-null value must be provided for it.
- The timestamp format is configurable using a format string such as YYYY-MM-DD.HH.MM.SS. See the IBM WebSphere DataStage and QualityStage Version 8 Parallel Job Developer Guide, SC18-9891 for details about format strings.
- The order of rows does not matter, because the DataStage jobs sort the rows into the proper order.

For more details, see Appendix B, Standard Interface File details on page 295.
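The rules above (a pipe after the last element, empty elements still delimited, no embedded pipes, mapped domain values, <CR><LF> endings) can be combined in the process that writes the SIF. The sketch below is hypothetical: sif_record and map_gender are illustrative names, the column order in any call is not a real RDP layout, replacing embedded pipes with a space is only one possible policy, and the 0-to-M, 1-to-F mapping is an assumed example of source codes, not a documented one.

```shell
#!/bin/sh
# Hypothetical SIF writer helpers (names and policies are illustrative).

# Write one record: RT|ST|field1|field2|...|<CR><LF>
# Every argument becomes a column, so empty values still get their "|".
sif_record() {
  rt=$1
  st=$2
  shift 2
  printf '%s|%s|' "$rt" "$st"
  for field in "$@"; do
    # No escape character exists in the SIF, so replace embedded pipes
    # (a space is an assumed policy; agree on one with the data owners).
    printf '%s|' "$(printf '%s' "$field" | tr '|' ' ')"
  done
  printf '\r\n'
}

# Assumed example mapping of source gender codes to MDM Server values.
map_gender() {
  case $1 in
    0) echo M ;;
    1) echo F ;;
    *) echo "" ;;   # unknown code: emit an empty value (handle per your rules)
  esac
}
```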


2.4 Suspect Duplicate Processing


Suspect duplicate processing in the RDP for MDM solution can be configured through the configuration screens in the Customer Matching Critical Data Rule user interface, or by setting values in the MDM Server configuration and management tables. Modifying these settings affects both the RDP for MDM Direct Database Load and the MDM Server Business Services. The settings are stored in the CONFIGELEMENT table in MDM, which RDP reads at run time to populate the QS_ parameters of the MDMIS parameter set, as explained in 2.2, MDMIS Parameter Set on page 8.

2.5 Configuration screens in the MDM Server UI


The configuration screens in the MDM Server user interface (UI) let you customize the RDP for MDM solution, as described in the following sections.

2.5.1 Enabling and disabling Suspect Duplicate Processing


You can enable (On) or disable (Off) the Suspect Duplicate Processing option with the buttons shown in Figure 2-8. To set the option, click Matching Critical Data Rules → Configuration Options in the navigation pane of the MDM Server UI. This setting affects the processing of both the RDP for MDM Direct Database Load and the MDM Server Business Services.

Figure 2-8 Enabling or disabling Suspect Duplicate Processing in MDM Server UI


A change in this configuration option will enable or disable Suspect Duplicate Processing in RDP through the MDMIS parameters listed in Table 2-2.
Table 2-2 Suspect Duplicate Processing parameters to enable matching
Parameter | Default | Description | Name in CONFIGELEMENT table
QS_PERFORM_ORG_MATCH | 0 | Perform Organization Match - 1 (true) / 0 (false) | /IBM/Party/SuspectProcessing/enabled
QS_PERFORM_PERSON_MATCH | 0 | Perform Person Match - 1 (true) / 0 (false) | /IBM/Party/SuspectProcessing/enabled
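Both parameters in Table 2-2 are driven by the same CONFIGELEMENT entry, so a single switch turns person and organization matching on or off together. A tiny sketch of that fan-out, assuming the entry holds the string true or false (suspect_params is a hypothetical name, not an RDP function):

```shell
#!/bin/sh
# Hypothetical sketch: one CONFIGELEMENT entry,
# /IBM/Party/SuspectProcessing/enabled, sets both QS_PERFORM_ parameters.
# Assumes the entry's value is the string "true" or "false".
suspect_params() {
  case $1 in
    true) flag=1 ;;
    *) flag=0 ;;      # anything else is treated as disabled
  esac
  printf 'QS_PERFORM_ORG_MATCH=%s\nQS_PERFORM_PERSON_MATCH=%s\n' "$flag" "$flag"
}
```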

2.5.2 Selecting the set of Party Match criteria


You can configure the critical match fields to use for party matching, and set the threshold scores that determine the action to take for a given match score. Because the critical fields for person matching and organization matching are configured independently, the user interface provides a separate screen for each.

Configure critical matching fields for Person matching


From the MDM Server UI, click Matching Critical Data Rules → Person in the navigation pane to display Matching Critical Data for Person in the content pane, as shown in Figure 2-9 on page 21. You can perform the following tasks:
- Override the Minimum Match Score for each Suspect Match Category to set the threshold scores for A1, A2, and B matches.
- Select the matching critical data fields for a person by moving the appropriate fields from the left panes to the right panes in the Matching Critical Data Fields, Select National Identifier, and Select additional Matching Fields sections. In Figure 2-9, the selected fields are Name, Address, City, State/Province, Country, Zip/Postal Code, Gender, Birth Date, Social Security Number, Home Telephone, Business Email, Passport Number, and Mother's Maiden Name.


Figure 2-9 Configuring the critical matching fields for Person matching

A change in these configuration options affects RDP through the MDMIS parameters listed in Table 2-3.
Table 2-3 Suspect Duplicate Processing parameters for Person
Parameter | Default | Description | Name in CONFIGELEMENT table
QS_A1_MATCH_CUTOFF_PERSON | 205 | Specify Person A1 Minimum Match Score. | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/a1
QS_A2_MATCH_CUTOFF_PERSON | 175 | Specify Person A2 Minimum Match Score. | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/a2
QS_B_MATCH_CUTOFF_PERSON | 150 | Specify Person B Minimum Match Score. | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/b
QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON | (blank) | Select Critical Data Fields for Individual Match. | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/<Field>/enabled, where <Field> is each of PersonAddress, PersonBirthDate, PersonCity, PersonCountry, PersonGender, PersonMatchString1, PersonMatchString2, PersonMatchString3, PersonMatchString4, PersonNationalID, PersonPostCode, and PersonState
QS_MATCH_PERSON_1 | C1 | Specify Variable Match String type and TpCd for person. | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString1/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString1/TpCd
QS_MATCH_PERSON_2 | C3 | (blank) | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString2/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString2/TpCd
QS_MATCH_PERSON_3 | C5 | (blank) | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString3/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString3/TpCd
QS_MATCH_PERSON_4 | C7 | (blank) | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString4/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString4/TpCd
QS_MATCH_PERSON_NATID | C2 | Specify Variable Match NationalId for person. | /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonNationalID/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonNationalID/TpCd
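The three minimum match scores act as descending cutoffs. The following sketch classifies a match weight against them, using the Table 2-3 defaults (205, 175, 150) unless other values are passed; match_category is a hypothetical helper, the label none for scores below the B cutoff is our own, and the same logic would apply to the organization cutoffs. It assumes that a score equal to a cutoff falls into that category, which is consistent with the term Minimum Match Score.

```shell
#!/bin/sh
# Hypothetical sketch: classify a match score against the Minimum Match
# Scores. Defaults are the Table 2-3 values; pass $2-$4 to override.
# A score equal to a cutoff is placed in that category (assumption).
match_category() {
  awk -v s="$1" -v a1="${2:-205}" -v a2="${3:-175}" -v b="${4:-150}" '
    BEGIN {
      s += 0; a1 += 0; a2 += 0; b += 0   # force numeric comparison
      if (s >= a1)      print "A1"
      else if (s >= a2) print "A2"
      else if (s >= b)  print "B"
      else              print "none"     # below the B cutoff (label is our own)
    }'
}
```

Using awk for the comparison keeps fractional weights such as 204.5 working, which the shell's integer-only test operators would reject.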


Configure critical matching fields for Organization matching


From the MDM Server UI, click Matching Critical Data Rules → Organization in the navigation pane to display Matching Critical Data for Organization in the content pane, as shown in Figure 2-10. You can perform the following tasks:
- Override the Minimum Match Score for each Suspect Match Category to set the threshold scores for A1, A2, and B matches.
- Select the matching critical data fields for an organization by moving the appropriate fields from the left pane to the right pane in the Selected Matching Critical Data Fields, Selected National Identifier, and Select additional Matching Data fields sections. The fields selected for organization matching differ from those selected for person matching: Figure 2-10 shows the selected fields Name, Address, City, State/Province, Country, Zip/Postal Code, Established Date, Corporate Tax Identification, Business Telephone, Business Email, Tax Registration Number, and Tax Identification Number.

Figure 2-10 Configuring the critical matching fields for organization matching


A change in these configuration options affects RDP through the MDMIS parameters listed in Table 2-4.
Table 2-4 Suspect Duplicate Processing parameters for Organization

QS_A1_MATCH_CUTOFF_ORGANIZATION (default: 205)
  Description: Specify Org A1 Minimum Match Score - 205.
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/MatchScores/a1

QS_A2_MATCH_CUTOFF_ORGANIZATION (default: 175)
  Description: Specify Org A2 Minimum Match Score - 175 (default).
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/MatchScores/a2

QS_B_MATCH_CUTOFF_ORGANIZATION (default: 150)
  Description: Specify Org B Minimum Match Score - 150.
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/MatchScores/b

QS_EXCLUDE_FIELDS_FROM_MATCH_ORGANIZATION (default: blank)
  Description: Select Critical Data Fields for Organization Match.
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgAddress/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgCity/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgCountry/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgState/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgCountry/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgPostCode/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgEstablishedDate/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString1/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString2/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString3/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString4/enabled, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgNationalID/enabled

QS_MATCH_ORG_1 (default: I1)
  Description: Specify Variable Match String type and TpCd for organization.
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString1/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString1/TpCd

QS_MATCH_ORG_2 (default: I2)
  Description: (blank)
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString2/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString2/TpCd

QS_MATCH_ORG_3 (default: blank)
  Description: (blank)
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString3/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString3/TpCd

QS_MATCH_ORG_4 (default: I3)
  Description: (blank)
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString4/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString4/TpCd

QS_MATCH_ORG_NATID (default: C2)
  Description: Specify Variable Match NationalId for organization.
  Name in CONFIGELEMENT table: /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgNationalID/type, /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgNationalID/TpCd
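The three minimum match scores act as cutoffs. As a minimal Python sketch, assuming that a score belongs to the strictest category whose minimum it reaches and that the minimums are inclusive, the following function classifies a score against the default organization values from Table 2-4. The function and constant names are hypothetical; this is an illustration of the thresholds, not how QualityStage or MDM Server implement the comparison.

```python
from typing import Dict, Optional

# Default organization minimums from Table 2-4 (QS_A1_MATCH_CUTOFF_ORGANIZATION,
# QS_A2_MATCH_CUTOFF_ORGANIZATION, QS_B_MATCH_CUTOFF_ORGANIZATION).
ORG_MIN_SCORES: Dict[str, int] = {"A1": 205, "A2": 175, "B": 150}


def suspect_category(score: float, min_scores: Dict[str, int] = ORG_MIN_SCORES) -> Optional[str]:
    """Return the strictest suspect category whose minimum score is met, or None.

    Assumption (not stated in the source): a score equal to a minimum meets it,
    and categories are checked from strictest (A1) to loosest (B).
    """
    for category in ("A1", "A2", "B"):
        if score >= min_scores[category]:
            return category
    return None  # below the B minimum: not classified as a suspect
```

For example, suspect_category(180) returns "A2", and under the default values any score below 150 is not classified as a suspect.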

2.6 Database Load options


RDP offers several options for loading data into the MDM database. By default, the RDP jobs use ODBC connectivity; IBM DB2 Enterprise, Oracle Enterprise, and DB2 Bulk are also available. The RDP jobs use a plug-in architecture for all database access: the database stages are encapsulated in shared containers that can easily be replaced, as shown in Figure 2-11.

Figure 2-11 Shared containers containing database stages
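The plug-in idea can be pictured as job logic that depends only on a common container interface. The Python sketch below is an analogy, not DataStage code: DatabaseContainer, InMemoryContainer, and load_contacts are hypothetical names, and the in-memory class merely stands in for an ODBC, DB2 Enterprise, or Oracle Enterprise container.

```python
from typing import Dict, Iterable, List, Protocol

Row = Dict[str, object]


class DatabaseContainer(Protocol):
    """Hypothetical interface that every database container implementation provides."""

    def read(self, table: str) -> List[Row]: ...

    def write(self, table: str, rows: Iterable[Row]) -> int: ...


class InMemoryContainer:
    """Stand-in implementation; a real one would wrap ODBC, DB2, or Oracle access."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tables: Dict[str, List[Row]] = {}

    def read(self, table: str) -> List[Row]:
        return list(self.tables.get(table, []))

    def write(self, table: str, rows: Iterable[Row]) -> int:
        batch = list(rows)
        self.tables.setdefault(table, []).extend(batch)
        return len(batch)


def load_contacts(db: DatabaseContainer, rows: Iterable[Row]) -> int:
    # The job depends only on the interface, so swapping the container
    # implementation leaves this function unchanged.
    return db.write("CONTACT", rows)
```

Because the job logic is written against the interface rather than a specific database, only the container changes when you switch databases; in DataStage, the jobs that use the replaced containers still have to be recompiled.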


The database shared containers are located in the DBContainers category in the RDP DataStage project, as shown in Figure 2-12.

Figure 2-12 Database shared containers in the RDP DataStage Project

To switch from the default set of ODBC shared containers to the DB2 Enterprise or Oracle Enterprise shared container, you must import the DataStage DSX file containing the shared containers of your choice (supplied in the RDP package) into the RDP DataStage project. After the new shared containers are imported, you must recompile all the jobs that use the database shared containers.


To get the list of jobs to recompile, you can use the Find where used option in the DataStage Designer, as shown in Figure 2-13. Alternatively, you can recompile all the RDP jobs with the Multiple Job Compile option.

Figure 2-13 Finding jobs to recompile
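Conceptually, Find where used is a reverse lookup from the replaced containers to the jobs that reference them. The Python sketch below shows that lookup. The helper jobs_to_recompile and the EXAMPLE_USAGE mapping are hypothetical; the mapping is a small, partial inventory loosely based on the Direct Load asset tables in Chapter 3, and the empty entry for DL_041_EC_Error_Check is only there to illustrate a job that is not affected.

```python
from typing import Dict, List, Set

# Illustrative, partial job-to-container usage (not a complete RDP inventory).
EXAMPLE_USAGE: Dict[str, Set[str]] = {
    "DL_010_IS_Import_SIF": {"DLDBSELCONTEQUIV", "DLDBSELNATIVEKEY", "DLDBINSADMINCLIENT"},
    "DL_052_MA_Prep_Candidates": {"DLDBINTEMPPERSON", "DLDBINTEMPORG"},
    "DL_041_EC_Error_Check": set(),
}


def jobs_to_recompile(usage: Dict[str, Set[str]], replaced: Set[str]) -> List[str]:
    """Return, sorted by name, every job that references at least one replaced container."""
    return sorted(job for job, containers in usage.items() if containers & replaced)
```

For example, replacing only DLDBINTEMPORG affects DL_052_MA_Prep_Candidates alone, whereas recompiling all RDP jobs avoids having to maintain such a mapping at all.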

Important: When you import a different set of database shared containers, the existing shared containers are overwritten. If you have customized the existing database shared containers, we suggest that you make a copy (with a new name) before importing the new shared containers, and then replicate the same customizations in the newly imported shared containers where applicable.

The MDM_COMPILATION_OPTIONS parameter set has been added to the RDP DataStage project to make sure that the various database shared containers provided in the RDP package work with the different databases. This parameter set currently supports DB2 and Oracle. It is included in the DB2 Enterprise and Oracle Enterprise database shared container packages, with the correct parameter values already set. If you plan to use the ODBC shared containers with Oracle, however, you must update it accordingly.


The MDM_COMPILATION_OPTIONS parameter set is set by default to support DB2, as shown in Figure 2-14.

Figure 2-14 MDM_COMPILATION_OPTIONS parameter set for DB2

To support Oracle through the ODBC shared containers, update the MDM_COMPILATION_OPTIONS parameter set as follows:
- Switch the Default Value of the SQL_TP_INT64 parameter from bigint to number(19,0), selecting the value from the proposed list.
- Switch the Default Value of all the MOD_ parameters from the semicolon (;) character to the other value in the proposed list.

Note: The MOD_U_SIF_FILE_NAME parameter is not specific to database support. Change it only if your DataStage installation does not have National Language Support (NLS) enabled.


The resulting MDM_COMPILATION_OPTIONS parameter set is similar to the one in Figure 2-15.

Figure 2-15 MDM_COMPILATION_OPTIONS parameter set for Oracle
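The Oracle update rules can be expressed as a small transformation over the parameter set's default values. In this Python sketch the parameter set is modeled as a plain dictionary, oracle_compilation_options is a hypothetical helper, and the alternative value for each MOD_ parameter is supplied by the caller, because the text only says to pick the other value from the proposed list; names such as MOD_EXAMPLE and values such as "<alt>" in the usage are placeholders.

```python
from typing import Dict


def oracle_compilation_options(
    defaults: Dict[str, str],
    alternatives: Dict[str, str],
    nls_enabled: bool = True,
) -> Dict[str, str]:
    """Return a copy of MDM_COMPILATION_OPTIONS defaults adjusted for ODBC with Oracle.

    `alternatives` maps each MOD_ parameter to the non-semicolon value from its
    proposed list in DataStage; that value is deliberately not hardcoded here.
    """
    updated = dict(defaults)  # leave the caller's DB2 defaults untouched
    updated["SQL_TP_INT64"] = "number(19,0)"  # the DB2 default is bigint
    for name, value in defaults.items():
        if not name.startswith("MOD_") or value != ";":
            continue
        # MOD_U_SIF_FILE_NAME depends on NLS support, not on the database.
        if name == "MOD_U_SIF_FILE_NAME" and nls_enabled:
            continue
        updated[name] = alternatives[name]
    return updated
```

Returning a new dictionary mirrors keeping the DB2 defaults available for comparison; in the Designer you edit the parameter set in place and then recompile the jobs that use it.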

After the MDM_COMPILATION_OPTIONS parameter set is updated, all the jobs that use it must be recompiled to make the new values available at run time. You can use the Find where used option shown in Figure 2-13 on page 27 to find the jobs to compile, or you can simply recompile all the RDP jobs.

The RDP database loading jobs support two modes:
- Regular database inserts and updates
- Bulk load

Use the LOAD_METHOD parameter in the MDMIS parameter set, as shown in Figure 2-16 on page 30, to control which type of database loading is used at run time. The two available options are:
- BULK
- ODBC_INSERT


Figure 2-16 LOAD_METHOD parameter in the MDMIS parameter set

Note: Despite its name, the ODBC_INSERT option also applies to the DB2 Enterprise and Oracle Enterprise database stages in RDP. In that case, the upsert method is used: the database stage first tries to update the record if it already exists, and inserts a new record if there is no existing record to update.

When the ODBC_INSERT load method is selected, RDP uses the following jobs to load the records into the MDM database:
- IL_090_LD_Insert_*
- DL_090_LD_Update_*
- DL_091_LD_Update*

These jobs use the upsert method, except for the IL_090_LD_Insert_* jobs, which use an insert+update method because they insert new records and records with the same primary key are not expected to be present in the database.

When the BULK load method is selected, RDP uses the following generic jobs for all the database inserts, instead of the specific insert jobs in the previous list:
- DL_091_LD_Bulk_Common
- DL_091_LD_Bulk_ContEquiv
- DL_091_LD_Bulk_NativeKey
- DL_091_LD_Bulk_Suspect


Database updates are still made by the DL_090_LD_Update_* and DL_091_LD_Update* jobs. The main difference between the two methods is that the bulk load does not capture rejected records, whereas the insert and update loads do. Figure 2-17 and Figure 2-18 show the two methods.

Figure 2-17 ODBC_INSERT load method

Figure 2-18 BULK load method
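The run-time choice driven by LOAD_METHOD can be summarized as a lookup. The Python sketch below, built around the hypothetical helper load_plan, only encodes the job lists and the reject-capture difference stated in this section; it does not reproduce how the DataStage job sequences actually branch.

```python
from typing import List, Tuple

# Job name patterns as listed in this section; "*" stands for an entity-specific suffix.
UPDATE_JOBS = ["DL_090_LD_Update_*", "DL_091_LD_Update*"]
BULK_INSERT_JOBS = [
    "DL_091_LD_Bulk_Common",
    "DL_091_LD_Bulk_ContEquiv",
    "DL_091_LD_Bulk_NativeKey",
    "DL_091_LD_Bulk_Suspect",
]


def load_plan(load_method: str) -> Tuple[List[str], bool]:
    """Return (jobs used, whether rejected records are captured) for a LOAD_METHOD value."""
    if load_method == "ODBC_INSERT":
        return ["IL_090_LD_Insert_*"] + UPDATE_JOBS, True
    if load_method == "BULK":
        # Inserts go through the generic bulk jobs; updates still use the update jobs.
        return BULK_INSERT_JOBS + UPDATE_JOBS, False
    raise ValueError(f"unsupported LOAD_METHOD: {load_method!r}")
```

Rejecting unknown values, rather than silently falling back to one method, makes a mistyped LOAD_METHOD visible immediately.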

Important: Bulk load is available only for DB2 Enterprise and Oracle Enterprise. When setting up the RDP jobs with the default ODBC database stages, the DB2 Enterprise stages are used for bulk load. If you plan to use the bulk load option with ODBC for Oracle, you must import the DLDBBLKTABLE, DLDBBLKHISTORY, and DLDBBLKTRUNCTABLE shared containers from the Oracle Enterprise RDP package.
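The difference between the upsert and insert+update strategies described earlier in this section can be shown with Python's built-in sqlite3 module. This is a sketch under assumptions: the simplified contact table and both function names are hypothetical, and reading insert+update as "insert first, update on a duplicate key" is an interpretation of the description rather than a documented detail of the DataStage stages.

```python
import sqlite3
from typing import Dict, Union

Row = Dict[str, Union[int, str]]


def update_then_insert(conn: sqlite3.Connection, row: Row) -> None:
    """Upsert: try the UPDATE first, and INSERT only if no existing row matched."""
    cur = conn.execute(
        "UPDATE contact SET name = ? WHERE cont_id = ?", (row["name"], row["cont_id"])
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO contact (cont_id, name) VALUES (?, ?)", (row["cont_id"], row["name"])
        )


def insert_then_update(conn: sqlite3.Connection, row: Row) -> None:
    """Insert first (new keys are expected); fall back to UPDATE on a key collision."""
    try:
        conn.execute(
            "INSERT INTO contact (cont_id, name) VALUES (?, ?)", (row["cont_id"], row["name"])
        )
    except sqlite3.IntegrityError:
        conn.execute(
            "UPDATE contact SET name = ? WHERE cont_id = ?", (row["name"], row["cont_id"])
        )
```

When almost all keys are new, insert_then_update issues one statement per row, while update_then_insert always attempts an UPDATE first, which is consistent with the reasoning given for the IL_090_LD_Insert_* jobs.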


Chapter 3. RDP MDM: Direct Load


This chapter presents a high-level description of the Direct Load process and the Rapid Deployment Package (RDP) components that are used in that process. These components consist of various DataStage and QualityStage assets, and their descriptions are organized by Direct Load processing category. For a deeper and more detailed description of these components, see the InfoSphere MDM RDP HVL Operations Guide, which is included in the distribution of the MDM Server RDP assets.


3.1 Direct Load process


The Direct Load process provides the means to quickly process incoming Standard Interface File (SIF) data into the MDM Server entities, using the parallel processing capabilities of DataStage. It can be used for initial, incremental, and operational (delta) data loading. The Direct Load process consists of seven distinct processing categories, as depicted in Figure 3-1. The incoming SIF data is organized into two sets: one related to the regular MDM Server database entities and the other related to hierarchies. Hierarchy SIF data is loaded in a similar way and therefore is not described here.

Figure 3-1 Direct Load process (diagram panels: Job Control; Import SIF; Data Quality Assurance; Data Quality Error Consolidation & Reporting; Matching; ID Assignment; Data Insert and Update; plus SIF file(s), the MDM ConfigElement table, error logs, the MDM data repository, and the MDM GUI)


The seven processing categories are as follows:
- Job Control: Processes that compile and update processing parameters from the MDM Server CONFIGELEMENT table, ensure that processing structures exist, and invoke parameter-specific jobs.
- Import: The DataStage assets and elements that are used to import the SIF data for processing.
- Data Quality Assurance: The DataStage assets that perform the following functions:
  - Validate code column values
  - Perform parameter-driven standardization of specific column values
  - Validate the referential integrity (RI) of incoming data records
  - Format related error messages
- Data Quality Error Consolidation / Reporting: The DataStage assets that consolidate the error message files created by the Data Quality Assurance processes and drop incoming data records with associated errors.
- Matching: The parameter-driven DataStage and QualityStage assets that identify specific match suspects for subsequent MDM Services processing.
- ID Assignment: The DataStage assets that perform the following tasks:
  - Derive internal record identifiers to assist in processing.
  - Use surrogate keys to aid in the creation of database record identifiers.
  - Determine whether an incoming record represents a record insertion or an update to an existing record.
- Data Insert and Update: The DataStage assets that either insert or update data records in the MDM Server database, including the parameter-driven DataStage assets that provide bulk loading of records into the MDM Server database.
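The ID Assignment category combines two decisions: whether an incoming record is new, and which identifier it receives. A minimal Python sketch, assuming each incoming record carries a single natural key that can be looked up among existing record identifiers and that each key appears at most once per batch; assign_ids, the key values, and the in-memory dictionary are hypothetical stand-ins for the DataStage data sets and surrogate key files.

```python
from typing import Dict, Iterable, List, Tuple

Assignment = Tuple[str, int, str]  # (natural key, record ID, "insert" or "update")


def assign_ids(
    natural_keys: Iterable[str], existing_ids: Dict[str, int], next_key: int
) -> Tuple[List[Assignment], int]:
    """Classify each incoming record as an insert or an update and give it a record ID.

    Assumes each natural key appears at most once in the batch.
    Returns the assignments and the next unused surrogate key value.
    """
    assignments: List[Assignment] = []
    for key in natural_keys:
        if key in existing_ids:
            # Already in the database: reuse its ID and treat the record as an update.
            assignments.append((key, existing_ids[key], "update"))
        else:
            # New record: take the next surrogate key value.
            assignments.append((key, next_key, "insert"))
            next_key += 1
    return assignments, next_key
```

The returned next key value corresponds to what has to be persisted after ID assignment completes, the role played by the next key values in the CONFIGELEMENT table in the Job Control jobs.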


3.2 Job Control


This processing category consists of the MDM RDP Direct Load components that perform the following tasks:
- Establish values in the MDM Server CONFIGELEMENT table.
- Compile parameter values from the CONFIGELEMENT table at run time.
- Ensure that files exist prior to processing.
- Invoke jobs based on the existence of data or on parameter values.

The components in this category generally represent pre-processing and post-processing activities: they are the jobs that are run to control the overall processing of SIF data. Table 3-1 identifies the related Job Control DataStage assets that are involved in the pre-processing or post-processing of SIF-formatted data.
Table 3-1 SIF Job Control DataStage assets

CFG_Config (Pre-)
  This job sequence invokes the CFG_Update_CM_Params job. Use of this job sequence should be limited; it is not recommended for routine use.

CFG_Update_CM_Params (Pre-)
  This job uses the comment portion of the MDMIS parameter value set to load values into the CONFIGELEMENT table.

DL_000_AutoStart_PS_DELTA_LOAD (Pre-)
  This job sequence is used in production to invoke the IL_000_PS__Prestart job sequence, which builds a new VOLATILE_MDMIS parameter value set, before invoking the DL_000_DELTA_LOAD job sequence with the new VOLATILE_MDMIS parameter value set.

IL_000_PS__Prestart (Pre-)
  This job sequence is invoked by the DL_000_AutoStart_PS_DELTA_LOAD, DL_000_AutoStart_PS_HIERARCH, and IL_000_AutoStart_EX sequences. It invokes the IL_000_PS_SF_Create, IL_000_PS_Set_BatchID, IL_000_PS_Gen_Volatile_Par_Set, and IL_000_PS_Stage_ErrReasonTbl jobs, based on the specific conditions described for each job.

IL_000_PS_SF_Create (Pre-)
  This job is invoked if the BATCH_ID.sf file exists in the directory identified by the FS_SK_FILE_DIR parameter of the MDMIS parameter value set.

IL_000_PS_Set_BatchID (Pre-)
  This job compiles the file identified by the FS_PARAM_SET_DIR parameter concatenated with MDMIS/TEMP_MDMIS. It uses the file identified by FS_PARAM_SET_DIR concatenated with MDM_CONNECTIONS/ and the value of the MDM_CONNECTIONS_VALUE_FILE_NAME parameter, along with the file identified by FS_PARAM_SET_DIR concatenated with MDMIS/STATIC_MDMIS. (FS_PARAM_SET_DIR is a parameter of the MDMIS parameter set.)

IL_000_PS_Gen_Volatile_Par_Set (Pre-)
  This job uses parameter values from the CONFIGELEMENT table, the TEMP_MDMIS file compiled by the IL_000_PS_Set_BatchID job, and the file identified in the PARAMS_FILE job parameter to compile a new VOLATILE_MDMIS parameter value set, and archives the previous VOLATILE_MDMIS parameter value set.

IL_000_PS_Stage_ErrReasonTbl (Pre-)
  This job extracts values from the ERRREASON table to add values for the DROP_ON_PRVBY_ERR, DROP_ON_FROM_ERR, DROP_ON_ASSIGNEDBY_ERR, DROP_ON_REPLBY_ERR, and RESET_ON_ASSIGNEDBY_ERR parameters to the VOLATILE_MDMIS parameter value set, along with the contents of the file identified by the FS_PARAM_SET_DIR parameter value concatenated with /MDM_EC/DEFAULT.

DL_000_DELTA_LOAD
  This sequence controls the order in which SIF data is processed, invoking job sequences and jobs based on the existence of data sets and on parameter values. The jobs invoked by this sequence are described under their related processing categories. Based on the value of the MANUAL_STATE_FILE_CONTROL parameter, it invokes the IL_061__AI_SK_State_File_Control and IL_062_AI_CM_SK_Control job sequences.

IL_061__AI_SK_State_File_Control (Pre-)
  This job sequence invokes the IL_061_AI_SF_Delete and IL_061_AI_SF_Create jobs before the ID Assignment category of jobs is invoked.

IL_061_AI_SF_Delete (Pre-)
  This job deletes the surrogate key files.

IL_061_AI_SF_Create (Pre-)
  This job creates surrogate key files using next key values extracted from the CONFIGELEMENT table.

IL_062_AI_CM_SK_Control (Post-)
  This job updates the next key values in the CONFIGELEMENT table after the ID Assignment category jobs complete successfully.
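Several Job Control descriptions locate files by concatenating FS_PARAM_SET_DIR with fixed suffixes. The Python sketch below only composes those paths; parameter_file_paths and the example directory and value file name are hypothetical, and the sketch deliberately does not read or merge the parameter files, because their precedence rules are not described here.

```python
from pathlib import PurePosixPath
from typing import Dict


def parameter_file_paths(fs_param_set_dir: str, connections_value_file: str) -> Dict[str, str]:
    """Compose the parameter value file paths mentioned in Table 3-1.

    Joining with PurePosixPath avoids doubled or missing "/" separators,
    whether or not FS_PARAM_SET_DIR ends with a slash.
    """
    base = PurePosixPath(fs_param_set_dir)
    return {
        # Inputs to IL_000_PS_Set_BatchID
        "connections": str(base / "MDM_CONNECTIONS" / connections_value_file),
        "static_mdmis": str(base / "MDMIS" / "STATIC_MDMIS"),
        # Output of IL_000_PS_Set_BatchID, used by IL_000_PS_Gen_Volatile_Par_Set
        "temp_mdmis": str(base / "MDMIS" / "TEMP_MDMIS"),
        # Contents added to VOLATILE_MDMIS by IL_000_PS_Stage_ErrReasonTbl
        "ec_default": str(base / "MDM_EC" / "DEFAULT"),
    }
```

Using a path join rather than plain string concatenation matters here because the descriptions mix suffixes with and without a leading slash (MDMIS/TEMP_MDMIS versus /MDM_EC/DEFAULT).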

3.3 Import
This processing category consists of those MDM RDP Direct Load components that are used to import records from SIF-formatted text files and output related error logs. The DataStage assets provide specific functionality to perform the following tasks:
- Extract existing record identifiers from the MDM Server database.
- Initialize temporary data structures for downstream processes.
- Parse incoming SIF data into specific data sets.
- Capture and record specific data or file format errors.

The set of DataStage assets that imports the SIF-formatted data consists of one DataStage job (DL_010_IS_Import_SIF) and a number of DataStage shared containers. These shared containers are much like shared libraries: they provide database-specific functionality and practical structures for handling incoming data. Each of the DataStage import shared containers described below contains functionality specific to a Record Type/Sub Type (RT/ST) defined in the SIF Mapping Specification spreadsheet.

The DL_010_IS_Import_SIF job requires an MDMIS parameter value set as input. The job uses the parameters for the following purposes:
- To connect to the MDM Server database
- To determine how to handle specific error conditions related to parsing (creation of records and handling of column values)
- To locate the directories that contain the incoming SIF files
- To locate the directories for the intermediate output data sets
- To locate the directories for any related error logs
- To obtain the related batch identifier used in data set and error log file names


Table 3-2 briefly describes the associated DataStage assets. For each DataStage import shared container (DLIS), the description gives the input data format that it handles (SIF data and RT/ST) and the name of the resulting data set.
Table 3-2 SIF Import DataStage assets

DL_010_IS_Import_SIF
  This DataStage job uses the assets listed here to import related SIF data, create data sets, and capture and record data file and format errors.

DLISAddress
  Import shared container. SIF data: LocationGroup_AddressGroup_Address. RT/ST: PA. Resulting data set: Parse_Address.

DLISAlert
  Import shared container. SIF data: Alert. RT/ST: PT and CT. Resulting data sets: Parse_Alert_Party and Parse_Alert_Contract.

DLISContact
  Import shared container. SIF data: Contact. RT/ST: PP and PO. Resulting data set: Parse_Contact.

DLISContactMethod
  Import shared container. SIF data: LocationGroup_ContactMethodGroup_ContactMethod. RT/ST: PC. Resulting data set: Parse_ContactMethod.

DLISContactRel
  Import shared container. SIF data: ContactRel. RT/ST: PR. Resulting data set: Parse_ContactRel.

DLISContract
  Import shared container. SIF data: Contract. RT/ST: CH. Resulting data set: Parse_Contract.

DLISContractComponent
  Import shared container. SIF data: ContractComponent. RT/ST: CC. Resulting data set: Parse_ContractComponent.

DLISContractCompVal
  Import shared container. SIF data: ContractCompVal. RT/ST: CV. Resulting data set: Parse_ContractCompVal.

DLISContractRole
  Import shared container. SIF data: ContractRole. RT/ST: CR. Resulting data set: Parse_ContractComponentRole.

DLISExternalMatch
  Import shared container. SIF data: ExternalMatch. RT/ST: PE. Resulting data set: Parse_ExternalMatch.

DLISIdentifier
  Import shared container. SIF data: Identifier. RT/ST: PI. Resulting data set: Parse_Identifier.

DLISLOBRel
  Import shared container. SIF data: LobRel. RT/ST: PB. Resulting data set: Parse_LOBRel.

DLISMiscValue
  Import shared container. SIF data: MiscValue. RT/ST: PM and CM. Resulting data sets: Parse_MiscValue_Party and Parse_MiscValue_Contract.

DLISNativeKey
  Import shared container. SIF data: NativeKey. RT/ST: CK. Resulting data set: Parse_NativeKey.

DLISOrgName
  Import shared container. SIF data: OrgName. RT/ST: PG. Resulting data set: Parse_OrgName.

DLISPersonName
  Import shared container. SIF data: Person Name, Person Search. RT/ST: PH. Resulting data set: Parse_PersonName.

DLISPrivPref
  Import shared container. SIF data: PPrefEntity_PrivPref. RT/ST: PS. Resulting data set: Parse_PrivPref.

DLISRoleLocation
  Import shared container. SIF data: Role Location. RT/ST: CL. Resulting data set: Parse_RoleLocation.

DLDBSELCONTEQUIV
  Database shared container. Retrieves record identifiers from the CONTEQUIV and CONTACT database tables.

DLDBINSADMINCLIENT
  Database shared container. Writes record identifiers to the IS_ADMINCLIENT database table.

DLDBSELNATIVEKEY
  Database shared container. Retrieves record identifiers from the CONTRACT and NATIVEKEY database tables.

DLDBINSADMINCONTRACT
  Database shared container. Writes record identifiers to the IS_ADMINCONTRACT database table.
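The RT/ST codes in Table 3-2 effectively define a routing table from SIF record types to data sets. This Python sketch assumes a pipe-delimited SIF line whose first two fields are the record type and subtype (for example, P|A|... for an address); that layout, the helper route_sif, and the error message text are assumptions for illustration, and the real import job also parses and checks every column.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# RT/ST code -> resulting data set, as listed in Table 3-2.
RTST_TO_DATASET: Dict[str, str] = {
    "PA": "Parse_Address", "PT": "Parse_Alert_Party", "CT": "Parse_Alert_Contract",
    "PP": "Parse_Contact", "PO": "Parse_Contact", "PC": "Parse_ContactMethod",
    "PR": "Parse_ContactRel", "CH": "Parse_Contract", "CC": "Parse_ContractComponent",
    "CV": "Parse_ContractCompVal", "CR": "Parse_ContractComponentRole",
    "PE": "Parse_ExternalMatch", "PI": "Parse_Identifier", "PB": "Parse_LOBRel",
    "PM": "Parse_MiscValue_Party", "CM": "Parse_MiscValue_Contract",
    "CK": "Parse_NativeKey", "PG": "Parse_OrgName", "PH": "Parse_PersonName",
    "PS": "Parse_PrivPref", "CL": "Parse_RoleLocation",
}


def route_sif(
    lines: Iterable[str], delimiter: str = "|"
) -> Tuple[Dict[str, List[List[str]]], List[Tuple[int, str]]]:
    """Split SIF lines into per-data-set lists and collect unroutable lines as errors."""
    datasets: DefaultDict[str, List[List[str]]] = defaultdict(list)
    errors: List[Tuple[int, str]] = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split(delimiter)
        rtst = "".join(fields[:2])  # assumed: RT and ST are the first two fields
        target = RTST_TO_DATASET.get(rtst)
        if target is None:
            errors.append((line_no, f"unknown RT/ST {rtst!r}"))
        else:
            datasets[target].append(fields)
    return dict(datasets), errors
```

Recording the line number with each error keeps the rejected input traceable, which is the same purpose the import error logs serve.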

3.4 Data Quality Assurance


This processing category consists of those MDM RDP Direct Load components that perform the following tasks:
- Validate code column values against MDM Server database tables.
- Provide parameter-driven standardization of specific column values. The standardization processes use the same QualityStage rule sets that the MDM Server run time uses as part of transaction processing.
- Provide parameter-driven phonetic generation of specific column values. The phonetic generation processes use the same QualityStage rule sets that the MDM Server run time uses as part of transaction processing.
- Validate the referential integrity of internal record references.
- Generate related error message log files.
- Load column values from existing database records to related incoming records.

These DataStage assets run after the DL_015_II__InternalID_Party and DL_016_II__InternalID_Contract job sequences, which establish internal identifiers. See 3.7, "ID assignment" on page 50.

The DataStage assets used for SIF Data Quality Assurance processing consist of two job sequences that invoke the 19 related DataStage jobs. Within these jobs, DataStage assets and, in some instances, QualityStage assets are used.


Table 3-3 lists the related job sequences and jobs. Each job description identifies the related QualityStage assets along with the associated input and output data sets. Each DataStage job described in Table 3-3 provides extension point shared containers that let you customize the data quality assurance processing. These shared containers are named EPCVS (Extension Point for Custom Validation or Standardization) followed by the incoming data type, such as Address, Alert, and so on.

Note: Referential integrity validation for some of the Contract data sets can occur only after Party (Contact) ID assignment processing has completed successfully.
Table 3-3 SIF Data Quality Assurance assets

DL_020__VS_RI_EC_PARTY
  This job sequence invokes the 12 DataStage jobs that follow, not necessarily in the order presented.

DL_020_VS_Address
  Input data sets: Parse_Address and RI_Contact_SUBSET
  QualityStage standardization rule sets: COUNTRY, USPREP, CAPREP, MNADKEY, MNSPOST, MDMUSADDR, MDMUSAREA, MDMCAADDR, and MDMCAAREA
  Output data sets: Address_Validated and AddressExisting

DL_020_VS_Alert_Party
  Input data sets: Parse_Alert_Party and RI_Contact_SUBSET
  Output data sets: PARTY_Alert_Validated and PARTY_AlertExisting

DL_020_VS_Contact
  Input data set: Parse_Contact
  Output data sets: Contact_Validated, RI_Contact_SUBSET, and Existing_Contacts

DL_020_VS_ContactMethod
  Input data sets: Parse_ContactMethod and RI_Contact_SUBSET
  QualityStage standardization rule set: MNPHONE
  Output data sets: ContactMethod_Validated and ExistingContactMethod

DL_020_VS_ContactRel
  Input data sets: Parse_ContactRel and RI_Contact_SUBSET
  Output data sets: ErrCon_ContactRel_0 and ContactRelExisting

DL_020_VS_Identifier
  Input data sets: Parse_Identifier and RI_Contact_SUBSET
  Output data sets: ErrCon_Identifier_0 and IdentifierExisting

DL_020_VS_LOBRel
  Input data sets: Parse_LOBRel and RI_Contact_SUBSET
  Output data sets: LOBRel_Validated and LOBRelExisting

DL_020_VS_MiscValue_Party
  Input data sets: Parse_MiscValue_Party and RI_Contact_SUBSET
  Output data sets: PARTY_MiscValue_Validated and PARTY_MiscValueExisting

DL_020_VS_Orgname
  Input data sets: Parse_OrgName and RI_Contact_SUBSET
  QualityStage standardization rule set: MNNAME
  Output data sets: OrgName_RIValidated and ExistingOrgName

DL_020_VS_PersonName
  Input data sets: Parse_PersonName and RI_Contact_SUBSET
  QualityStage standardization rule sets: MNNAME and MNNMKEYS
  Output data sets: PersonName_RIValidated and PersonNameExisting

DL_020_VS_PrivPref
  Input data sets: Parse_PrivPref and RI_Contact_SUBSET
  Output data sets: PrivPref_Validated and PrivPrefExisting

DL_030_RI_Contact_Person_Org
  Input data sets: Contact_Validated, PersonName_RIValidated, OrgName_RIValidated, and RI_Contact_SUBSET
  Output data sets: Contact_Reference and ErrCon_Contact_0

DL_021__VS_EC_CONTRACT
  This job sequence invokes the seven DataStage jobs that follow, not necessarily in the order presented.

DL_021_VS_Alert_Contract
  Input data sets: Parse_Alert_Contract and RI_Contract_SUBSET
  Output data sets: CONTRACT_Alert_Validated and CONTRACT_AlertExisting

DL_021_VS_Contract
  Input data sets: Parse_Contract and INPUT_CONTRACT_MASTER
  Output data sets: Contract_Reference, ErrCon_Contract_0, and RI_Contract_SUBSET

DL_021_VS_ContractComponent
  Input data sets: Parse_ContractComponent and RI_Contract_SUBSET
  Output data sets: ContractComponent_Validated, ContractComponentExisting, and ContractComponent_For_RI_Validation

DL_021_VS_ContractCompVal
  Input data sets: Parse_ContractCompVal, ContractComponent_For_RI_Validation, and RI_Contract_SUBSET
  Output data sets: ContractCompValExisting and ContractCompVal_Validated

DL_021_VS_ContractRole
  Input data sets: Parse_ContractComponentRole, ContractComponent_For_RI_Validation, RI_Contract_SUBSET, Insert_CONTEQUIV, and RI_Contact_SUBSET
  Output data sets: ContractComponentRoleExisting, ContractComponentRole_For_RI_Validation, and ContractRole_Validated

DL_021_VS_MiscValue_Contract
  Input data sets: Parse_MiscValue_Contract and RI_Contract_SUBSET
  Output data sets: CONTRACT_MiscValueExisting and CONTRACT_MiscValue_Validated

DL_021_VS_RoleLocation
  Input data sets: Parse_RoleLocation, RI_Contract_SUBSET, ContractComponent_For_RI_Validation, ContractComponentRole_For_RI_Validation, RI_Contact_SUBSET, and RISUBSET_ADDRESSGROUP
  Output data sets: RoleLocationExisting and RoleLocation_Validated
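As a rough illustration of the two most common checks in these jobs, code value validation and referential integrity, the Python sketch below records errors without dropping any records, leaving that to the consolidation step in the next section. The field names type_code and contact_ref, the helper find_dq_errors, and the message wording are hypothetical; known_contacts plays the role of a subset such as RI_Contact_SUBSET, and valid_codes stands in for a code table.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_dq_errors(
    records: Iterable[Dict[str, str]],
    valid_codes: Set[str],
    known_contacts: Set[str],
    code_field: str = "type_code",
    ref_field: str = "contact_ref",
) -> List[Tuple[int, str]]:
    """Return (record index, message) pairs for code value and referential integrity errors.

    Records are only flagged here; dropping them is a separate, later step.
    """
    errors: List[Tuple[int, str]] = []
    for index, record in enumerate(records):
        code = record.get(code_field)
        if code not in valid_codes:
            errors.append((index, f"invalid code {code!r}"))
        ref = record.get(ref_field)
        if ref not in known_contacts:
            errors.append((index, f"unknown contact {ref!r}"))
    return errors
```

Keeping detection separate from dropping mirrors the split between the Data Quality Assurance and Error Consolidation categories, and lets one record collect several error messages.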

3.5 Data Quality Error Consolidation / Reporting


This processing category consists of those MDM RDP Direct Load components that consolidate the error logs generated by both the Import and Data Quality Assurance processes and drop records associated with records in error. The DataStage assets used for SIF data quality error consolidation and reporting consist of similar job sequences for Party and Contract data. These job sequences (DL_040_EC_Party and DL_040_EC_Contract) each perform the following steps:
- Set up data sets to run in a loop (IL_040_EC_Party_Initial and IL_040_EC_Contract_Initial).
- Execute iterative activities (IL_040_EC_Party_Iterative_Drop and IL_040_EC_Contract_Iterative_Drop) that identify and drop records associated with records that were identified earlier as containing errors, using a processing parameter (DS_DROP_MAX_ITERATIONS) to determine how many times to loop.
- Perform a last step after exiting the iterative loop (IL_040_EC_Party_Last_Drop and IL_040_EC_Contract_Last_Drop).
- Examine the error report results (DL_041_EC_Error_Check), using processing option parameters to determine whether to abort (DS_SIF_ERROR_THRESHOLD) or email the error report (DS_EMAIL_ERROR_CHECK_REPORT and DS_EMAIL_ERROR_CHECK_DISTRIBUTION).
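The iterative drop is essentially a bounded fixed-point computation: each pass drops records that depend on a record already marked for dropping. The Python sketch below assumes the associations are available as a mapping from each record to the records it depends on; propagate_drops and the record names are hypothetical, and max_iterations plays the role of DS_DROP_MAX_ITERATIONS.

```python
from typing import Dict, Set


def propagate_drops(
    initial_errors: Set[str], depends_on: Dict[str, Set[str]], max_iterations: int
) -> Set[str]:
    """Expand the set of records to drop: a record is dropped when any record it depends on is dropped."""
    to_drop = set(initial_errors)
    for _ in range(max_iterations):
        # Records newly affected by the drops known at the start of this pass.
        newly_dropped = {
            record
            for record, parents in depends_on.items()
            if record not in to_drop and parents & to_drop
        }
        if not newly_dropped:
            break  # fixed point reached before the iteration limit
        to_drop |= newly_dropped
    return to_drop
```

With a chain party1 to contract1 to role1 to loc1, a limit of 10 drops all four records, while a limit of 1 stops after contract1, which shows why the iteration limit matters for deep association chains.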


Table 3-4 presents the related assets with a description consisting of the associated input and output data sets.
Table 3-4 Data Quality Error Consolidation / Reporting DataStage assets

DL_040_EC_Party
  This job sequence invokes the Party-related jobs, as described earlier.
  Parameter input: MDMIS

IL_040_EC_Party_Initial
  Input data sets: PARTY*_VS_ERR_MSGS, PARTY_SIF_Import_IID_ERR_MSGS, and PARTY*_RI_ERR_MSGS
  Output data sets: PartiesToDrop_0 and PARTY_ErrCon

IL_040_EC_Party_Iterative_Drop
  Parameter input: Nth and Mth
  Input data sets: PartiesToDrop_#Nth, ErrCon_Contact_#Nth, ErrCon_ContactRel_#Nth, and ErrCon_Identifier_#Nth
  Output data sets: PartiesToDropFinal, ErrCon_Contact_M, ErrCon_ContactRel_M, ErrCon_Identifier_M, PARTY_ErrCon, PartiesToDrop_#Mth, and PartyDropCount_#Nth

IL_040_EC_Party_Last_Drop
  Input data set: PartiesToDropFinal
  DataStage shared container: ILECDropByAssociation
  Output data set: ErrCon_<SIF Record Type>

ILECDropByAssociation
  Shared container used to drop associated records for identified input data sets.
  Parameter input: VALID_INPUT_RECS_DS, VALID_OUTPUT_RECS_DS, SSK_FIELD_ONE, and SSK_FIELD_TWO
  Input data set: PartiesToDropFinal
  Output: VALID_OUTPUT_RECS_DS (ErrCon_<SIF Record Type>) and PARTY_ErrCon

DL_040_EC_Contract
  This job sequence invokes the Contract-related jobs, as described earlier.
  Parameter input: MDMIS

IL_040_EC_Contract_Initial
  Input data sets: CONTRACT*_VS_ERR_MSGS, CONTRACT_SIF_Import_IID_ERR_MSGS, and CONTRACT*_RI_ERR_MSGS
  Output data sets: ContractsToDrop_0 and CONTRACT_ErrCon

IL_040_EC_Contract_Iterative_Drop
  Parameter input: Nth and Mth
  Input data sets: ContractsToDrop_#Nth and ErrCon_Contract_#Nth
  Output data sets: ContractsToDropFinal, CONTRACT_ErrCon, ContractsToDrop_#Mth, and ContractDropCount_#Nth

IL_040_EC_Contract_Last_Drop
  Input data set: ContractsToDropFinal
  DataStage shared container: ILECDropByAssociation (described above)
  Output data set: ErrCon_<SIF Record Type>

DL_041_EC_Error_Check
  This job is invoked by both the Party and Contract job sequences.
  Parameter input: ERROR_TYPE (PARTY or CONTRACT)
  Input data sets: SIF_Import_ERR_MSGS and #ERROR_TYPE#_ErrCon
  Output: ERROR_TYPE#ErrorThresholdReport (in the location identified by the MDMIS FS_LOG_DIR parameter)

3.6 Matching
This processing category consists of parameter-driven MDM RDP Direct Load components that perform match processing, identifying suspect records for subsequent MDM Services activities. Match processing is invoked in the job sequence when the QS_PERFORM_ORG_MATCH and QS_PERFORM_PERSON_MATCH parameter values in the MDMIS parameter set are set to 1. The QualityStage rule sets used in these matching jobs are also used by the MDM Server run time as part of transaction processing.
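Table 3-5 notes that candidate selection (DLDBSELCANDIDATES) returns records using SQL queries that are built dynamically. The Python sketch below shows one way to build such a query from a configurable list of key columns and run it with sqlite3. The table, columns, OR-based selection rule, and helper names are all hypothetical and do not describe the SQL that RDP actually generates.

```python
import sqlite3
from typing import List, Sequence


def build_candidate_query(table: str, key_columns: Sequence[str]) -> str:
    """Build a parameterized SELECT whose WHERE clause depends on the configured key columns.

    Table and column names are interpolated, so they must come from trusted
    configuration; the key values themselves are bound as parameters.
    """
    if not key_columns:
        raise ValueError("at least one key column is required")
    where = " OR ".join(f"{column} = ?" for column in key_columns)
    return f"SELECT cont_id FROM {table} WHERE {where}"


def select_candidates(
    conn: sqlite3.Connection, table: str, key_columns: Sequence[str], values: Sequence[str]
) -> List[int]:
    """Return the sorted IDs of existing records that share at least one key value."""
    sql = build_candidate_query(table, key_columns)
    return sorted(row[0] for row in conn.execute(sql, tuple(values)))
```

Binding the key values as parameters, rather than formatting them into the SQL text, keeps the dynamically built query safe when incoming data contains quotes or other special characters.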


Table 3-5 lists the related assets with a description that consists of the associated input and output data sets.
Table 3-5 Matching DataStage assets DataStage asset DL_050_MA__Match DL_051_MA_Prep Description This job sequence invokes the related DataStage jobs passing the MDMIS parameter value set to each. Input: ErrCon_Contact, ErrCon_PersonName, ErrCon_OrgName, ErrCon_Address, ErrCon_ContactMethod, ErrCon_Identifier, INPUT_PARTY_MASTER, PersonNameExisting, OrgNameExisting, AddressExisting, ContactMethodExisting and IdentifierExisting. Output: Direct_Update, Ids_For_Remove_Suspects, Contact_To_Match, PersonName_InternalId_Errors, PersonName_To_Match, OrgName_To_Match, OrgName_InternalId_Errors, Address_To_Match, ContactMethods_To_Match and Identifier_To_Match Input: Contact_To_Match, PersonName_To_Match, OrgName_To_Match, Address_To_Match, ContactMethods_To_Match and Identifier_To_Match Output: IncomingPersonRecords_for_Match, IncomingOrgRecords_for_Match and Match045_ERR_MSGS Database Shared Container: DLDBINTEMPPERSON and DLDBINTEMPORG Write records to the IS_PERSONTEMPTABLE table used when LOAD MODE is set to DELTA. Write records to the IS_ORGTEMPTABLE table used when LOAD MODE is set to DELTA. Input: IncomingOrgRecords_for_Match, Output: OrganizationCandidates_for_Match Database Shared Container: DLDBSELCANDIDATES DataStage shared container which returns records from database using SQL queries that are built dynamically. Input: IncomingPersonRecords_for_Match Output: PersonCandidates_for_Match Database Shared Container: DLDBSELCANDIDATES (see DL_053_MA_Org_Candidate_Sel)

DL_052_MA_Prep_Candidates

DLDBINTEMPPERSON DLDBINTEMPORG DL_053_MA_Org_Candidate_Sel

DLDBSELCANDIDATES DL_053_MA_Person_Candidate_Sel

DL_054_MA_Org: Input: IncomingOrgRecords_for_Match and OrganizationCandidates_for_Match. Output: MatchOrgA1 and MatchOrgNonA1. Matching Shared Container: DLMAOrganization.
DLMAOrganization: Matching shared container consisting of various QualityStage assets used to perform organization matching, returning A1 and NON-A1 matches to the invoking process. Input: IncomingOrgRecords_for_Match, OrganizationCandidates_for_Match, RefOrgTransMatchFreq, RefOrgCandidateMatchFreq and OrgMatchFreqForRollUp. Output: MatchOrgA1, MatchOrgNonA1, OrgRefMatchDebugFile and OrgDedupMatchDebugFile.
DL_054_MA_Person: Input: IncomingPersonRecords_for_Match and PersonCandidates_for_Match. Output: MatchPersonA1 and MatchPersonNonA1. Matching Shared Container: DLMAPerson.
DLMAPerson: Matching shared container consisting of various QualityStage assets used to perform Person matching, returning A1 and NON-A1 matches to the invoking process. Input: IncomingPersonRecords_for_Match, PersonCandidates_for_Match, RefPersonTransMatchFreq, RefPersonCandidateMatchFreq and PersonMatchFreq. Output: MatchPersonA1, MatchPersonNonA1, PersonRefMatchDebugFile, PersonDedupMatchDebugFile and PersonMatchFreqForRollUp.
DL_055_MA_LOB: Input: MatchPersonA1, MatchPersonNonA1, ErrCon_LOBRel, MatchOrgA1, MatchOrgNonA1, ErrCon_ContactRel and INPUT_PARTY_MASTER. Output: LOB084_ERR_MSGS, LOBProcessing, SuspectProcessing and ToImpliedMatches. Matching Shared Containers: DLMALOBRel, DLMALOBa and DLMALOBb. Database Shared Container: DLDBINNEWMATCHCONTID.
DLMALOBRel: This matching shared container is invoked twice (once for Person and once for Org) and returns LOB matches to the invoking process. Input: A1Match and LOBMatch. Output: LnkAllLOBs.

DLMALOBa: Matching shared container consisting of various QualityStage assets used to perform Person LOB matching, returning linked LOB matches to the invoking process. Input: LinkedDBInput and LOBMatchFreq. Output: LnkXfmOut (LOBPersMatches).
DLMALOBb: Matching shared container consisting of various QualityStage assets used to perform Org LOB matching, returning linked LOB matches to the invoking process. Input: LinkedDBInput and LOBMatchFreq. Output: LnkXfmOut (LOBOrgMatches).
DLDBINNEWMATCHCONTID: This database shared container writes records to the IS_NEWMATCHCONTID database table.
DL_056_MA_Gen_Implied: This job identifies implied suspects from the DL_055_MA_LOB job, producing a data set that is used in subsequent processing. Input: ToImpliedMatches. Output: GEND_IMPLIED_SUSPECTS.

3.7 ID assignment
This processing category consists of those MDM RDP Direct Load components that perform the following tasks: create an internal identifier (described below) for each new incoming record; identify the internal identifiers of incoming records that correspond to an existing database record; determine whether an incoming record represents an update to an existing record; and generate record identifiers for each database table record to be inserted.

For each incoming SIF data record, a surrogate key, called an internal ID, is generated and used as an internal identifier during processing. If a SIF data record identifier is found in the MDM Server database, the CONT_ID field is populated with its value; otherwise, it is left null. The internal ID is integral to the processing of the SIF data record until specific table identifiers are generated for the related records.

The creation and assignment of a record identifier differ only slightly for each record type. The assignment of a record identifier usually begins with a query of the MDM Server database to extract existing records (unless they have already been extracted into a data set). For every entity participating in the load, MDM Server defines a business key, which is a combination of column values that uniquely identifies the record. A comparison of business key column values determines whether an incoming record represents an insert or an update. If a record represents an update, the existing column values and the incoming column values are compared. If there is a difference, the record is marked as an update. If there are no differences, the record is dropped to avoid unnecessary or redundant updates.

The generation of internal identifiers for Party (Contact) data differs from that for Contract data because the user has the option to use a native key construct to store contract identifiers. The names of the DataStage assets used to generate Contract internal identifiers indicate whether they take into account the user's decision to use native keys. Another difference between the generation of internal identifiers for Party and Contract data is the use of EXTERNALMATCH SIF data records; see Appendix B, "Standard Interface File details" on page 295 for further details.

The DataStage jobs that need MDM Server database record identifiers use generic DataStage assets to generate the appropriate identifier. These assets consist of two shared containers, AINextKeyPrefix and DLAINextKey. The DataStage assets that derive database record identifiers produce data sets of records that are either inserts into or updates to their respective database tables. The assignment of identifiers uses four DataStage job sequences, 21 DataStage jobs, and 20 DataStage shared containers. Table 3-6 describes these DataStage assets.
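The business key comparison described above can be sketched in Python. This is an illustration under stated assumptions, not the DataStage implementation. Existing records are assumed to be already extracted into an in-memory dictionary keyed by business key. Internal IDs come from a simple counter. The names IncomingRecord and assign_and_classify are hypothetical. Only the columns present on the incoming record are compared with the existing values.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

BusinessKey = Tuple[str, ...]


@dataclass
class IncomingRecord:
    business_key: BusinessKey         # hypothetical key columns, e.g. (system, client id)
    values: Dict[str, str]            # incoming column values
    internal_id: int = 0              # surrogate key assigned during processing
    cont_id: Optional[int] = None     # populated only when the record already exists


def assign_and_classify(
    incoming: List[IncomingRecord],
    existing: Dict[BusinessKey, Tuple[int, Dict[str, str]]],
) -> Tuple[List[IncomingRecord], List[IncomingRecord], List[IncomingRecord]]:
    """Split records into inserts, updates, and unchanged records to drop."""
    next_internal_id = itertools.count(1)
    inserts, updates, drops = [], [], []
    for rec in incoming:
        rec.internal_id = next(next_internal_id)
        match = existing.get(rec.business_key)
        if match is None:
            inserts.append(rec)       # business key not in the database: insert
            continue
        rec.cont_id, current = match  # existing identifier and column values
        if any(current.get(col) != val for col, val in rec.values.items()):
            updates.append(rec)       # at least one column differs: update
        else:
            drops.append(rec)         # no difference: avoid a redundant update
    return inserts, updates, drops
```

Dropping unchanged records before the load, rather than at the database, keeps redundant updates (and, where enabled, redundant history rows) out of the later insert and update jobs.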
Table 3-6 DataStage assets used in SIF identifier assignment
DataStage asset: Description
DL_015_II__InternalID_Party: This DataStage job sequence ensures that the data constructs are in place to support the generation of internal identifiers before invoking the DataStage job DL_015_II_InternalID_Party.
DL_015_II_InternalID_Party: Inputs: Unique_Party_SSKs_DS, ExternalMatch_DS and SIF_Import_ERR_MSGS_Party. Outputs: INPUT_PARTY_MASTER, ExternalMatchWithoutSIFContact and PARTY_SIF_Import_IID_ERR_MSG. Database Shared Containers: DLDBSELCONTEQUIV, DLDBSELADMINEXTERNALMATCH, DLDBSELSSKTMP and DLDBSELCONTEQUIVEXIST.
DLDBSELCONTEQUIV: This database shared container returns active CONTEQUIV records, using the record identifiers to identify existing database records.
DLDBSELADMINEXTERNALMATCH: This database shared container returns existing database record identifiers to be used with incoming external match data set records.
DLDBSELCONTEQUIVEXIST: This database shared container returns active CONTEQUIV records, using the record identifiers to be used with incoming external match data set records, to identify externally matched identifiers.
DL_016_II__InternalID_Contract: This DataStage job sequence ensures that the data constructs are in place to support the generation of internal identifiers, and checks whether the MDMIS parameter set value DS_USE_NATIVE_KEY is set before invoking one of the following DataStage jobs: DL_016_II_InternalID_Contract_NativeKey or DL_016_II_InternalID_Contract_NoNativeKeys.
DL_016_II_InternalID_Contract_NativeKey: This DataStage job is invoked when the MDMIS parameter set value DS_USE_NATIVE_KEY is set. Inputs: Parse_ContractSSKs, Parse_NativeKey and CONTRACT_SIF_Import_ERR_MSGS. Outputs: INPUT_CONTRACT_MASTER and CONTRACT_SIF_Import_IID_ERR_MSGS. Database Shared Containers: DLDBSELSSKTMP2 and DLDBSELADMINNATIVEKEY.
DLDBSELSSKTMP2: This database shared container retrieves records from a join between the CONTRACT, NATIVEKEY and IS_ADMINCONTRACT database tables.
DLDBSELADMINNATIVEKEY: This database shared container retrieves records from a join between the CONTRACT, NATIVEKEY and IS_ADMINNATIVEKEY database tables.

DL_016_II_InternalID_Contract_NoNativeKeys: This DataStage job is invoked when the MDMIS parameter set value DS_USE_NATIVE_KEY is not set. Inputs: Parse_ContractSSKs and CONTRACT_SIF_Import_ERR_MSGS. Outputs: INPUT_CONTRACT_MASTER and CONTRACT_SIF_Import_IID_ERR_MSGS. Database Shared Container: DLDBSELSSKTMP3.
DLDBSELSSKTMP3: This database shared container retrieves records from a join between the CONTRACT and IS_ADMINCONTRACT database tables.
DL_060__AI_ASSIGN_IDS_PARTY: This DataStage job sequence checks the MDMIS parameter values for QS_PERFORM_ORG_MATCH and QS_PERFORM_PERSON_MATCH before invoking the related DataStage jobs, and it controls the order in which those jobs are invoked.
DL_060_AI_Contact_Match: This DataStage job is invoked when the MDMIS parameter set values for QS_PERFORM_ORG_MATCH and QS_PERFORM_PERSON_MATCH are set. Inputs: ErrCon_Contact and LOBProcessing. Shared Container: DLAIContact.
DLAIContact: Inputs: CONTID_NULL, CONTID_PRESENT, ExternalMatchWithoutSIFContact and Existing_Contacts. Outputs: COLSURVIVE_DROPS, Final_Existing_Contacts, Insert_CONTEQUIV, Insert_CONTACT, Update_CONTACT, Update_PERSON and Update_ORG. Database Shared Container: DLDBSELCONTACT. Assign Shared Containers (Key): AINextKeyPrefix (CONT_ID), DLAINextKey (CONT_ID and Cont_Equiv).
DLDBSELCONTACT: This database shared container retrieves records from a join between the IS_NEWMATCHCONTID, CONTACT, PERSON and ORG database tables.
AINextKeyPrefix: This shared container generates a key prefix based on input data values.

DLAINextKey: This shared container generates a key based on input data values.
DL_060_AI_Suspect: This DataStage job is invoked only when the MDMIS parameter set values for QS_PERFORM_ORG_MATCH and QS_PERFORM_PERSON_MATCH are set, and it generates record identifiers for Suspect database records. Inputs: GEND_IMPLIED_SUSPECTS, SuspectProcessing, Insert_CONTEQUIV and Ids_For_Remove_Suspect. Outputs: Insert_SUSPECT. Database Shared Container: DLDBSELEXISTINGSUSPECT. Assign Shared Container (Key): AINextKey (SUSPECT_ID).
DLDBSELEXISTINGSUSPECT: This database shared container retrieves records from the Suspect database table.
DL_060_AI_Contact_NoMatch: This DataStage job is invoked only when the MDMIS parameter set values for QS_PERFORM_ORG_MATCH and QS_PERFORM_PERSON_MATCH are not set. Inputs: ErrCon_Contact. Shared Container: DLAIContact (described above).
DL_060_AI_OrgName: This DataStage job generates database record identifiers for OrgName records and identifies updates to the OrgName table. Inputs: ErrCon_OrgName, Insert_CONTEQUIV and OrgNameExisting. Outputs: Insert_ORGNAME, Update_ORGNAME and DuplicateBeforeImage_OrgName. Database Shared Container: DLDBSELEXISTINGORGNAME. Assign Shared Container (Key): DLAINextKey (ORG_NAME_ID).
DLDBSELEXISTINGORGNAME: This database shared container retrieves records from the OrgName database table.

DL_060_AI_PersonName: This DataStage job generates database record identifiers for both PersonName and PersonSearch records and identifies updates to the PersonName and PersonSearch database tables. Inputs: ErrCon_PersonName, Insert_CONTEQUIV and PersonNameExisting. Outputs: Insert_PERSONNAME, Insert_PERSONSEARCH, Update_PERSONSEARCH, Update_PERSONNAME and DuplicateBeforeImage_PersonName. Database Shared Container: DLDBSELEXISTINGPERSONNAME. Assign Shared Containers (Key): DLAINextKey (PERSON_NAME_ID), DLAINextKey (PERSON_SEARCH_ID).
DLDBSELEXISTINGPERSONNAME: This database shared container retrieves related records from both the PersonName and PersonSearch database tables.
DL_060_AI_LOBRel: This DataStage job generates database record identifiers for LOBREL records and identifies updates to the LOBREL database table. Inputs: ErrCon_LOBRel, Insert_CONTEQUIV and LOBRelExisting. Outputs: Insert_LOBREL, Update_LOBREL and DuplicateBeforeImage_LOBRel. Database Shared Container: DLDBSELEXISTINGLOBREL. Assign Shared Container (Key): DLAINextKey (LOB_REL_ID).
DLDBSELEXISTINGLOBREL: This database shared container retrieves records from the LOBREL database table.
DL_060_AI_Alert: This DataStage job generates database record identifiers for ALERT records and identifies updates to the ALERT database table. Inputs: PARTY_ErrCon_Alert, Insert_CONTEQUIV and PARTY_AlertExisting. Outputs: Insert_ALERT, Update_ALERT and DuplicateBeforeImage_Alert. Database Shared Container: DLDBSELEXISTINGALERT. Assign Shared Container (Key): DLAINextKey (ALERT_ID).

DLDBSELEXISTINGALERT: This database shared container retrieves records from the ALERT database table.
DL_060_AI_Identifier: This DataStage job generates database record identifiers for IDENTIFIER records and identifies updates to the IDENTIFIER database table. Inputs: ErrCon_Identifier, Insert_CONTEQUIV and IdentifierExisting. Outputs: Insert_IDENTIFIER, Update_IDENTIFIER and DuplicateBeforeImage_Identifier. Database Shared Container: DLDBSELEXISTINGIDENTIFIER. Assign Shared Container (Key): DLAINextKey (IDENTIFIER_ID).
DLDBSELEXISTINGIDENTIFIER: This database shared container retrieves records from the IDENTIFIER database table.
DL_060_AI_ContactRel: This DataStage job generates database record identifiers for CONTACTREL records and identifies updates to the CONTACTREL database table. Inputs: ErrCon_ContactRel, Insert_CONTEQUIV and ContactRelExisting. Outputs: Insert_CONTACTREL, Update_CONTACTREL, SameCONTID_ContactRel and DuplicateBeforeImage_ContactRel. Database Shared Container: DLDBSELEXISTINGCONTACTREL. Assign Shared Container (Key): DLAINextKey (CONT_REL_ID).
DLDBSELEXISTINGCONTACTREL: This database shared container retrieves records from the CONTACTREL database table.

DL_060_AI_MiscValue: This DataStage job generates database record identifiers for MISCVALUE records and identifies updates to the MISCVALUE database table. Inputs: PARTY_ErrCon_MiscValue, Insert_CONTEQUIV and PARTY_MiscValueExisting. Outputs: Insert_MISCVALUE, Update_MISCVALUE and DuplicateBeforeImage_MiscValue. Database Shared Container: DLDBSELEXISTINGMISCVALUE. Assign Shared Container (Key): DLAINextKey (MISCVALUE_ID).
DLDBSELEXISTINGMISCVALUE: This database shared container retrieves PARTY-related records from the MISCVALUE database table.
DL_060_AI_PrivPref: This DataStage job generates database record identifiers for both PPREFENTITY and PRIVPREF records and identifies updates to the PPREFENTITY and PRIVPREF database tables. Inputs: ErrCon_PrivPref, Insert_CONTEQUIV and PrivPrefExisting. Outputs: Insert_PPREFENTITY, Insert_PRIVPREF, Update_PPREFENTITY, Update_PRIVPREF and DuplicateBeforeImage_PrivPref. Database Shared Container: DLDBSELEXISTINGPRIVPREF. Assign Shared Container (Key): DLAINextKey (PPREF_ID).
DLDBSELEXISTINGPRIVPREF: This database shared container retrieves records from a join between the PPREFENTITY and PRIVPREF database tables.

DL_060_AI_Address_ContactMethod: This DataStage job generates database record identifiers for Address, Address Group, Contact Method Group, Location Group and Phone Number related records and identifies updates to the Address Group, Contact Method Group, Location Group and Phone Number database tables. Inputs: ErrCon_Address, Insert_CONTEQUIV, AddressExisting, ErrCon_ContactMethod and ContactMethodExisting. Outputs: RISUBSET_ADDRESSGROUP, Insert_ADDRESS, Insert_ADDRESSGROUP, Update_ADDRESSGROUP, Insert_LOCATIONGROUP, Update_LOCATIONGROUP, Insert_PHONENUMBER, Update_PHONENUMBER, Insert_CONTACTMETHODGROUP, Update_CONTACTMETHODGROUP, DuplicateBeforeImage_Address and DuplicateBeforeImage_Contact. Database Shared Containers: DLDBSELADDRESS, DLDBSELCONTACTMETHOD and DLDBSELMD5ADDRESS. Assign Shared Containers (Key): DLAINextKey (ADDRESS_ID), DLAINextKey_Addr (LOCATION_GROUP_ID), DLAINextKey (ContactMethod:LOCATION_GROUP_ID) and DLAINextKey (CONTACT_METHOD_ID).
DLDBSELADDRESS: This database shared container retrieves records from a join between the ADDRESS, ADDRESSGROUP and LOCATIONGROUP database tables.
DLDBSELCONTACTMETHOD: This database shared container retrieves records from a join between the CONTACTMETHOD, CONTACTMETHODGROUP, PHONENUMBER and LOCATIONGROUP database tables.
DLDBSELMD5ADDRESS: This database shared container retrieves MD5_ADDRESS values from the ADDRESS database table.
DL_061__AI_ASSIGN_IDS_CONTRACT: This DataStage job sequence controls the order in which the related DataStage jobs are invoked.

DL_061_AI_Contract_NativeKey: This DataStage job generates database record identifiers for both Contract and NativeKey records and identifies updates to the Contract database table. Inputs: ErrCon_Contract and INPUT_CONTRACT_MASTER. Outputs: Insert_NATIVEKEY, Insert_CONTRACT, Update_CONTRACT, DuplicateBeforeImage_Contract, Contract_to_Component_Join and Contract_INSTANCE_PK. Assign Shared Containers (Key): DLAINextKey (NATIVE_KEY_ID), AINextKeyPrefix (CONTRACT_ID), DLAINextKey (CONTRACT_ID).
DL_061_AI_Alert_Contract: This DataStage job generates database record identifiers for ALERT records and identifies updates to the ALERT database table. Inputs: CONTRACT_ErrCon_Alert, Contract_INSTANCE_PK and CONTRACT_AlertExisting. Outputs: CONTRACT_Insert_ALERT, CONTRACT_Update_ALERT and DuplicateBeforeImage_Alert. Assign Shared Container (Key): DLAINextKey (ALERT_ID).
DL_061_AI_MiscValue_Contract: This DataStage job generates database record identifiers for MiscValue records and identifies updates to the MiscValue database table. Inputs: CONTRACT_ErrCon_MiscValue, Contract_INSTANCE_PK and CONTRACT_MiscValueExisting. Outputs: CONTRACT_Insert_MISCVALUE, CONTRACT_Update_MISCVALUE and DuplicateBeforeImage_MiscValue. Assign Shared Container (Key): DLAINextKey (MISCVALUE_ID).

DL_061_AI_Contract_Comp: This DataStage job generates database record identifiers for Contract Component and Contract Component Value records and identifies updates to the Contract Component and Contract Component Value database tables. Inputs: Contract_to_Component_Join, ErrCon_ContractComponent, ContractComponentExisting, ErrCon_ContractCompVal and ContractCompValExisting. Outputs: ContrComp_to_ContrRole, Insert_CONTRACTCOMPONENT, Update_CONTRACTCOMPONENT, DuplicateBeforeImage_ContractComp, Insert_CONTRACTCOMPVAL, Update_CONTRACTCOMPVAL and DuplicateBeforeImage_ContractComponentValue. Assign Shared Containers (Key): DLAINextKey (CONTR_COMPONENT_ID) and DLAINextKey (CONTR_COMP_VAL_ID).
DL_061_AI_Contract_Role: This DataStage job generates database record identifiers for ContractRole records and identifies updates to the ContractRole database table. Inputs: ContrComp_to_ContrRole, ErrCon_ContractRole and ContractComponentRoleExisting. Outputs: Insert_CONTRACTROLE, Update_CONTRACTROLE, ContrRole_RoleLoc and DuplicateBeforeImage_ContractRole. Assign Shared Container (Key): DLAINextKey (CONTRACT_ROLE_ID).
DL_061_AI_Role_Location: This DataStage job generates database record identifiers for Role Location records and identifies updates to the Role Location database table. Inputs: ContrRole_RoleLoc, ErrCon_RoleLocation and RoleLocationExisting. Outputs: Insert_ROLELOCATION, Update_ROLELOCATION and DuplicateBeforeImage_ContractRoleLocation. Assign Shared Container (Key): DLAINextKey (ROLE_LOCATION_ID).

3.8 Data insert and update


This processing category consists of those MDM RDP Direct Load components that are used either to update records in or to insert records into the MDM Server database. These assets share the following structure: they read in a data set, construct a history record as determined by the value of the MDMIS DS_LOAD_HISTORY parameter, and pass the records to a database shared container. Note the following information about the database shared containers:
They are database-specific: ODBC, DB2, and ORACLE.
They connect to the MDM Server database.
They properly format the data record.
They insert a new record or update an existing record and, based on parameter settings, insert a related history record.

The MDMIS DS_LOAD_MODE parameter value is used to invoke the DataStage job sequences that control the loading of data, either through SQL or through bulk loading. Bulk loading requires specific configuration activities and considerations that are not presented here. For more information, search the IBM developerWorks site for related articles: http://www.ibm.com/developerworks/data/library/

Table 3-7 presents the DataStage jobs involved in inserting and updating records in the MDM Server database. It does not include the assets that connect to the MDM Server database for each specific table or the assets that bulk load data. Because a large number of DataStage assets are involved, the table is organized by the associated database table. The DataStage job sequences that invoke these jobs, and the assets used to bulk load data, are described after Table 3-7.
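The write step that each database shared container performs can be sketched in Python as follows. This is a minimal sketch, not the RDP implementation, and it rests on several assumptions. The DS_LOAD_HISTORY value "1" is taken to mean "history on". History rows go to a table named with an H_ prefix. The history columns history_action and history_ts are hypothetical placeholders rather than the real MDM Server history layout. The function returns the rows that would be written instead of connecting to a database.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def rows_to_write(
    table: str,
    record: Row,
    action: str,
    params: Dict[str, str],
    now: Optional[datetime] = None,
) -> List[Tuple[str, Row]]:
    """Return the (table, row) pairs one database shared container would write."""
    if action not in ("I", "U"):      # I = insert, U = update
        raise ValueError(f"unknown action {action!r}")
    rows: List[Tuple[str, Row]] = [(table, dict(record))]
    if params.get("DS_LOAD_HISTORY") == "1":          # assumed "on" value
        stamp = (now or datetime.now(timezone.utc)).isoformat()
        # Hypothetical history columns appended to a copy of the base row.
        history = dict(record, history_action=action, history_ts=stamp)
        rows.append((f"H_{table}", history))          # hypothetical H_ naming
    return rows
```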
Table 3-7 DataStage assets that insert or update by database table
Database table: DataStage assets
Address: IL_090_LD_Insert_Address
AddressGroup: IL_090_LD_Insert_AddressGroup, DL_090_LD_Update_AddressGroup
AddressGroup History: DL_091_LD_Update_AddressGroup_History
Alert: DL_090_LD_Insert_Alert_Contract, DL_090_LD_Insert_Alert_Party, DL_090_LD_Update_Alert_Contact, DL_090_LD_Update_Alert_Contract
Alert History: DL_091_LD_Update_Alert_History
Contact: IL_090_LD_Insert_Contact, DL_090_LD_Update_Contact
Contact History: DL_091_LD_Update_Contact_History
ContactMethod: IL_090_LD_Insert_ContactMethod, DL_090_LD_Update_ContactMethod
ContactMethod History: DL_091_LD_Update_ContactMethod_History
ContactMethodGroup: IL_090_LD_Insert_ContactMethodGroup, DL_090_LD_Update_ContactMethodGroup
ContactMethodGroup History: DL_091_LD_Update_ContactMethodGroup_History
ContactRel: IL_090_LD_Insert_ContactRel, DL_090_LD_Update_ContactRel
ContactRel History: DL_091_LD_Update_ContactRel_History
CONTEQUIV: IL_090_LD_Insert_ContEquiv
Contract: IL_090_LD_Insert_Contract, DL_090_LD_Update_Contract
Contract History: DL_091_LD_Update_Contract_History
ContractComponent: IL_090_LD_Insert_ContractComponent, DL_090_LD_Update_ContractComponent
ContractComponent History: DL_091_LD_Update_ContractComponent_History
ContractComponent Value: IL_090_LD_Insert_ContractCompVal, DL_090_LD_Update_ContractCompVal
ContractComponent Value History: DL_091_LD_Update_ContractCompVal_History
ContractRole: IL_090_LD_Insert_ContractRole, DL_090_LD_Update_ContractRole

ContractRole History: DL_091_LD_Update_ContractRole_History
Identifier: IL_090_LD_Insert_Identifier, DL_090_LD_Update_Identifier
Identifier History: DL_091_LD_Update_Identifier_History
LOBREL: IL_090_LD_Insert_LobRel, DL_090_LD_Update_LOBRel
LOBREL History: DL_091_LD_Update_LOBRel_History
LocationGroup: IL_090_LD_Insert_LocationGroup, DL_090_LD_Update_LocationGroup
LocationGroup History: DL_091_LD_Update_LocationGroup_History
MiscValue: DL_090_LD_Insert_MiscValue_Contract, DL_090_LD_Insert_MiscValue_Party, DL_090_LD_Update_MiscValue_Contact, DL_090_LD_Update_MiscValue_Contract
MiscValue History: DL_091_LD_Update_MiscValue_History
NativeKey: IL_090_LD_Insert_NativeKey
Org: IL_090_LD_Insert_Org, DL_090_LD_Update_Org
Org History: DL_091_LD_Update_Org_History
OrgName: IL_090_LD_Insert_OrgName, DL_090_LD_Update_OrgName
OrgName History: DL_091_LD_Update_OrgName_History
Person: IL_090_LD_Insert_Person, DL_090_LD_Update_Person
Person History: DL_091_LD_Update_Person_History
PersonName: IL_090_LD_Insert_PersonName, DL_090_LD_Update_PersonName
PersonName History: DL_091_LD_Update_PersonName_History
PersonSearch: IL_090_LD_Insert_PersonSearch, DL_090_LD_Update_PersonSearch
PersonSearch History: DL_091_LD_Update_PersonSearch_History

PhoneNumber: IL_090_LD_Insert_PhoneNumber, DL_090_LD_Update_PhoneNumber
PhoneNumber History: DL_091_LD_Update_PhoneNumber_History
PPrefEntity: IL_090_LD_Insert_PPrefEntity, DL_090_LD_Update_PPrefEntity
PPrefEntity History: DL_091_LD_Update_PPrefEntity_History
PrivPref: IL_090_LD_Insert_PrivPref, DL_090_LD_Update_PrivPref
PrivPref History: DL_091_LD_Update_PrivPref_History
RoleLocation: IL_090_LD_Insert_RoleLocation, DL_090_LD_Update_RoleLocation
RoleLocation History: DL_091_LD_Update_RoleLocation_History
Suspect: IL_090_LD_Insert_Suspect

The DataStage assets presented in Table 3-7 on page 61 are invoked by type of data (Party or Contract), with updates preceding inserts. The update job sequences are as follows: DL_090_LD__Update_Party_SQL, which invokes the update jobs specific to Party database tables, and DL_090_LD__Update_Contract_SQL, which invokes the update jobs related to Contract data. The insert job sequences, DL_090_LD__Insert_Party_SQL and DL_090_LD__Insert_Contract_SQL, are a bit different. In these job sequences, the value of the MDMIS LOAD_METHOD parameter determines whether data is loaded through bulk loading or through the conventional approach represented by the DataStage jobs in Table 3-7 on page 61. The bulk loading job sequences, DL_091__LD_Bulk_Party and DL_091__LD_Bulk_Contract, use a DataStage container, DL_091_LD_Bulk_Common, that allows multiple instances to run at the same time. The name of a table and its related history table are passed into this shared container, which performs the specific bulk loading process on those tables. For the CONTEQUIV, NativeKey and Suspect tables, specific jobs (DL_091_LD_Bulk_ContEquiv, DL_091_LD_Bulk_NativeKey and DL_091_LD_Bulk_Suspect) bulk load the related data.
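The invocation order described above can be sketched in Python as a plan of steps. This is a hypothetical illustration, not RDP code. The LOAD_METHOD value "BULK", the H_ history table names, and running Party before Contract are all assumptions; the text only fixes updates before inserts within each data type. bulk_common is a stand-in for one DL_091_LD_Bulk_Common instance and does not load anything. The conventional insert jobs of Table 3-7 and the dedicated CONTEQUIV, NativeKey and Suspect bulk jobs are not modeled.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def bulk_common(table: str, history_table: str) -> str:
    """Stand-in for one DL_091_LD_Bulk_Common instance (no real loading)."""
    return f"bulk:{table}+{history_table}"


def load_data_type(data_type: str, tables: List[str], params: Dict[str, str]) -> List[str]:
    """Record the steps for one data type: the update sequence, then the insert sequence."""
    steps = [f"DL_090_LD__Update_{data_type}_SQL",
             f"DL_090_LD__Insert_{data_type}_SQL"]
    if params.get("LOAD_METHOD") == "BULK":            # assumed parameter value
        pairs = [(t, f"H_{t}") for t in tables]         # hypothetical history names
        # Several container instances may run at once, one per table pair;
        # pool.map still returns the results in input order.
        with ThreadPoolExecutor(max_workers=4) as pool:
            steps.extend(pool.map(lambda pair: bulk_common(*pair), pairs))
    return steps


def load_plan(party_tables: List[str], contract_tables: List[str],
              params: Dict[str, str]) -> List[str]:
    # Party before Contract is an assumption made for this sketch.
    return (load_data_type("Party", party_tables, params)
            + load_data_type("Contract", contract_tables, params))
```

Passing the table and history table names into one reusable, multiple-instance container is what lets the bulk path scale across tables without a dedicated job per table.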


Chapter 4. RDP for MDM: Delta Load


This chapter provides an overview of a Delta Load solution using IBM InfoSphere MDM Server (MDM) Rapid Deployment Package (RDP) Runtime Assets and MDM Party Maintenance Services. A Delta Load in RDP is the process of synchronizing changes in source system data with MDM Server. Because data is processed by MDM services during loading, this solution provides the best level of business data validation, ease of implementation and maintenance, and highest MDM Server sustainability. We also include detailed implementation, configuration, and installation information about MDM RDP Runtime Assets and MDM Party Maintenance Services.

Copyright IBM Corp. 2009, 2011. All rights reserved.


4.1 Overview
The RDP for MDM Delta Load solution assumes that MDM Server has already been installed and that the direct data load has been completed using Information Server DataStage and QualityStage jobs. The two main components in this solution are as follows: MDM RDP Runtime Assets, which must be installed on top of MDM Party Maintenance Services, and MDM Party Maintenance Services, which must be installed on top of MDM Server.

This solution preserves the source-to-SIF work that was done in the direct load. The QualityStage sequencer job is executed to process SIF records in the order required by MDM Server and generates sequenced SIF files. The MDM BatchProcessor, or another batch framework, reads the sequenced SIF files and feeds the SIF records into MDM Server. MDM Server invokes the SIF Parser to transform input SIF text messages into MDM business objects. MDM Server then picks up the configured MDM Party Maintenance Services as a business proxy and runs it as a Java composite transaction, which resolves MDM business object identities and eventually starts an MDM Server add or update transaction. MDM Server invokes a Duplicated Suspect Candidate Search Rule extension in MDM RDP Runtime Assets to emulate QualityStage blocking for duplicate candidate selection, and sends a request to QualityStage runtime jobs for standardization and party matching. The Delta Load solution provides the same functionality as RDP for MDM Direct Load.


Figure 4-1 depicts how the components are grouped together and form the RDP for MDM Delta Load solution.

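Figure 4-1 RDP for MDM Delta Load solution (components shown: Source Input File; SIF Sequencer (DataStage Job); Sequenced SIF Files; MDM Batch Processor; SIF Parser; MDM Party Maintenance Services; MDM Business Services; MDM Database; Information Server with DataStage, QualityStage and DS Runtime Jobs)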

4.2 MDM Party Maintenance Services


MDM Party Maintenance Services was created to provide a rapid way to implement MDM Server and load data into it. MDM Party Maintenance Services supports a subset of the MDM Party domain (see details in 4.2.4, MDM Party Maintenance Services Profile on page 87) and can be used in both initial data load and delta data load processes. It is packaged and distributed as part of the InfoSphere MDM Server samples.

4.2.1 The instance resolution problem


InfoSphere MDM Server creates a unique internal identifier for each record or business entity, which serves as its internal key. This internal key is not typically intended to be published to applications outside of InfoSphere MDM Server, such as established source systems or downstream consuming applications. However, it is available as part of the service response message. With InfoSphere MDM Server, you can configure a Business Key for each business entity. This key serves as the unique identifier of the business entity in external applications.

InfoSphere MDM Server services expect the internal identifier to be provided as part of an update service request so that the services can identify the correct business entity in the database. However, when data flows into InfoSphere MDM Server directly from external applications, such as existing systems, the internal key is not known, and often the nature of the data change is not known either. This issue, referred to as the Instance Resolution Problem, requires that the following information be determined:
Which instance of the business entity is being worked with: party A, party B, or another?
Is data being added or updated?
If data is being updated, which instance should be updated when there are multiple names or addresses, multiple contact methods or identifiers, multiple contract components, multiple party roles, and so on?
This problem is addressed by MDM Party Maintenance Services.
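MDM Party Maintenance Services answer these questions with Business Keys, as 4.2.2 describes. The following Python sketch illustrates the idea on an in-memory dictionary rather than the MDM database. PartyInput and maintain_party are hypothetical names. Locating a party by (AdminSystemType, AdminPartyId) is an assumption based on keys listed in Table 4-2. The sketch only returns the name of the transaction that would run (addParty or updateParty) and calls no MDM Server API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

AdminKey = Tuple[str, str]   # (AdminSystemType, AdminPartyId), assumed party lookup key


@dataclass
class PartyInput:
    admin_key: AdminKey
    # Child names keyed by their Business Key, NameUsageType (see Table 4-2).
    names: Dict[str, str] = field(default_factory=dict)


def maintain_party(store: Dict[AdminKey, PartyInput], incoming: PartyInput) -> str:
    """Decide add versus update without an internal key from the source system."""
    existing = store.get(incoming.admin_key)
    if existing is None:
        store[incoming.admin_key] = PartyInput(incoming.admin_key, dict(incoming.names))
        return "addParty"
    # A name whose NameUsageType already exists updates that instance;
    # a name with a new NameUsageType is added alongside the others.
    existing.names.update(incoming.names)
    return "updateParty"
```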

4.2.2 MDM Party Maintenance Services behavior


MDM Party Maintenance Services do not require the internal key as part of the input, nor do they require the external system to specify whether the entity must be added or updated in InfoSphere MDM Server. Instead, MDM Party Maintenance Services use the Business Key that is provided in the load operation to locate the correct instance of the business entity in the database. If an existing entity is found, it is updated by the appropriate transaction, such as updateParty. If no existing entity is found, a new entity is created in InfoSphere MDM Server by the appropriate transaction, such as addParty. Party Maintenance Services support automatic expiry for deleted client records. This feature can be enabled by setting the syncSourceSystemEndedDataWithMDM property to true in the WLCommon_extension.properties file.


When the feature is enabled, existing child object records that are not provided in the input are expired in the database. This functionality accommodates source systems that have limited or no change data capture capabilities. Table 4-1 summarizes the supported objects in MDM Party Maintenance Services.
Table 4-1 MDM Party Maintenance Services supported objects (entity: child objects)
Party: PersonName, OrganizationName, PartyAddress, PartyContactMethod, PartyPrivPref, PartyIdentification, TCRMPartyValue, PartyLobRelationship, PartyAlert
Contract: ContractAlert, ContractValue, ContractComponent
ContractComponent: ContractComponentValue, ContractPartyRole
ContractPartyRole: ContractPartyRoleLocation
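The add-or-update decision and the optional expiry of omitted child records described at the start of this section can be illustrated with a small in-memory model. This is a minimal sketch under stated assumptions, not the MDM Server implementation: the ChildRecord and MaintainSketch classes and the maintain() method are hypothetical, a Business Key is modeled as a single string (for example, a NameUsageType value), and the expireOmitted flag stands in for the syncSourceSystemEndedDataWithMDM property.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical child record: a Business Key, a payload, and an optional end date. */
class ChildRecord {
    final String businessKey;
    String value;
    Instant endDate; // null means the record is still active

    ChildRecord(String businessKey, String value) {
        this.businessKey = businessKey;
        this.value = value;
    }

    boolean isActive() {
        return endDate == null;
    }
}

/** Hypothetical sketch of delta-sensing maintenance for one parent's child records. */
final class MaintainSketch {
    private MaintainSketch() {}

    /** Applies the input to the stored records and returns the operations performed, in order. */
    static List<String> maintain(List<ChildRecord> stored, List<ChildRecord> input,
                                 boolean expireOmitted, Instant now) {
        List<String> fired = new ArrayList<>();
        Set<String> inputKeys = new HashSet<>();
        for (ChildRecord in : input) {
            inputKeys.add(in.businessKey);
            // Identity resolution: find an active stored record with the same Business Key.
            ChildRecord match = null;
            for (ChildRecord s : stored) {
                if (s.isActive() && s.businessKey.equals(in.businessKey)) {
                    match = s;
                    break;
                }
            }
            if (match != null) {
                match.value = in.value; // plays the role of an update transaction
                fired.add("update:" + in.businessKey);
            } else {
                stored.add(new ChildRecord(in.businessKey, in.value)); // plays the role of an add
                fired.add("add:" + in.businessKey);
            }
        }
        if (expireOmitted) {
            // Auto-expiry: active records whose key is absent from the input receive an end date.
            for (ChildRecord s : stored) {
                if (s.isActive() && !inputKeys.contains(s.businessKey)) {
                    s.endDate = now;
                    fired.add("expire:" + s.businessKey);
                }
            }
        }
        return fired;
    }
}
```

Calling maintain() with expireOmitted set to false leaves omitted records untouched, which corresponds to running with the feature disabled.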

MDM Party Maintenance Services provide external behavior extensions that disallow the creation of duplicate entities based on Business Keys and disallow updates to Business Keys. The new extensions are configured on the transactions for the following entities: PartyAddress, PartyContactMethod, ContractComponent, ContractRoleLocation, and ContractComponentValue. The group validations for the Party Role validation function are expired when the Party Maintenance Services sample is deployed.
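The two behavior extensions can be pictured as guard checks that run before an add or an update is allowed to proceed. The sketch below is hypothetical (the KeyGuard class and its method names are illustrative, not the actual extension classes): an add is rejected when an active record with the same Business Key already exists, and an update is rejected when it would change the record's Business Key.

```java
import java.util.Collection;

/** Hypothetical guard checks modeled on the duplicate-prevention behavior extensions. */
final class KeyGuard {
    private KeyGuard() {}

    /** Rejects an add when an active record already carries the same Business Key. */
    static void checkAdd(String newKey, Collection<String> activeKeys) {
        if (activeKeys.contains(newKey)) {
            throw new IllegalStateException("Duplicate Business Key: " + newKey);
        }
    }

    /** Rejects an update that would change the Business Key of an existing record. */
    static void checkUpdate(String storedKey, String requestedKey) {
        if (!storedKey.equals(requestedKey)) {
            throw new IllegalStateException(
                "Business Key cannot be updated: " + storedKey + " -> " + requestedKey);
        }
    }
}
```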

Business Keys
MDM Party Maintenance Services use Business Keys to identify the correct instance of the business entity in the database. MDM Party Maintenance Services redefine the Business Keys that are provided by default as part of InfoSphere MDM Server.

Table 4-2 lists Business Keys in MDM Party Maintenance Services.


Table 4-2 MDM Party Maintenance Services Business Keys (entity: Business Keys)
PersonName: NameUsageType
OrganizationName: NameUsageType
PartyAddress/Address: AddressUsageType
PartyContactMethod/ContactMethod: ContactMethodUsageType
PartyRelationship: RelationshipFromValue, RelationshipToValue, RelationshipType
PartyPrivPref: PrivPrefEntity, PrivPrefType
PartyIdentification: IdentificationType (Note: MDM Server already includes an internal validation to disallow duplicates of IdentificationType and IdentificationNumber combinations.)
AdminContEquiv: AdminPartyId, AdminSystemType
TCRMPartyValue: PartyValueType
ContractComponent: ContractId, ContractComponentType, ProductType
ContractPartyRole: ContractComponentId, RoleType
ContractRoleLocation: LocationGroupId, ContractRoleId
Contract: AdminFldNmTp or AdminSystemType, AdminContractId (Note: Transactions search for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId element on TCRMContractBObj.)
ContractComponentValue: ContractComponentId, DomainValueType
Alert (for Contract and Party only): EntityName, InstancePK, AlertType
PartyLobRelationship: RelatedLobType, LobRelationshipType
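Because most Business Keys in Table 4-2 combine several attributes, identity resolution compares tuples of attribute values rather than a single field. The sketch below keeps the key definitions in a lookup table, loosely mirroring the idea of metadata-driven keys; the BusinessKeys class, the attribute-map representation of a business object, and the selection of entries are illustrative assumptions. PartyIdentification is given both IdentificationType and IdentificationNumber, reflecting the note in the table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical key-definition table and composite key builder (not the MDM Server API). */
final class BusinessKeys {
    private static final Map<String, List<String>> DEFINITIONS = new HashMap<>();

    static {
        // A few entries taken from Table 4-2; the lookup structure itself is illustrative.
        DEFINITIONS.put("PersonName", Arrays.asList("NameUsageType"));
        DEFINITIONS.put("PartyAddress", Arrays.asList("AddressUsageType"));
        DEFINITIONS.put("PartyRelationship",
            Arrays.asList("RelationshipFromValue", "RelationshipToValue", "RelationshipType"));
        DEFINITIONS.put("PartyPrivPref", Arrays.asList("PrivPrefEntity", "PrivPrefType"));
        // Effective key, per the note on PartyIdentification in Table 4-2.
        DEFINITIONS.put("PartyIdentification",
            Arrays.asList("IdentificationType", "IdentificationNumber"));
    }

    private BusinessKeys() {}

    /** Builds the composite key (an ordered list of attribute values) for one object. */
    static List<String> keyOf(String entity, Map<String, String> attributes) {
        List<String> fields = DEFINITIONS.get(entity);
        if (fields == null) {
            throw new IllegalArgumentException("No Business Key defined for " + entity);
        }
        List<String> key = new ArrayList<>();
        for (String field : fields) {
            key.add(attributes.get(field)); // a missing attribute contributes null
        }
        return key;
    }

    /** Two objects of the same entity denote the same instance when their keys are equal. */
    static boolean sameInstance(String entity, Map<String, String> a, Map<String, String> b) {
        return keyOf(entity, a).equals(keyOf(entity, b));
    }
}
```

Non-key attributes, such as a start date, do not affect the comparison; they only matter when the matched record is updated.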

Customizing Business Keys


MDM Party Maintenance Services use the Business Keys configured in the metadata table V_ELEMENTATTRIBUTE, which makes it easier to redefine the keys for a particular client implementation. You can customize the Business Keys in one of the following ways:
- Redefine the Business Keys in metadata: create an SQL script that redefines the keys, execute the script to update the Business Keys in the V_ELEMENTATTRIBUTE table, and restart the server to refresh the MDM Server cache.
- Change the source code of the appropriate business proxy. For example, change the MaintainPartyAlertBP.resolveIdentity() method to implement custom business logic.
- Write an alternative implementation of the business proxy. For example, write a new MaintainPartyAlertCustomBP class that overrides the resolveIdentity() method, and configure this class to be invoked for the maintainPartyAlert transaction.
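The option of writing an alternative business proxy follows the ordinary subclass-and-override pattern. The sketch below uses simplified, hypothetical stand-ins (AlertRecord, SimpleAlertBP, and CustomAlertBP) whose resolveIdentity() takes an input record and a list of active records; the real MaintainPartyAlertBP.resolveIdentity() works with DWLCommon objects and is not reproduced here. The custom rule shown, ignoring EntityName, is only an example of custom business logic.

```java
import java.util.List;
import java.util.Objects;
import java.util.Optional;

/** Hypothetical, simplified alert record. */
class AlertRecord {
    final String entityName;
    final String instancePk;
    final String alertType;

    AlertRecord(String entityName, String instancePk, String alertType) {
        this.entityName = entityName;
        this.instancePk = instancePk;
        this.alertType = alertType;
    }
}

/** Stand-in for a default proxy: matches on EntityName, InstancePK, and AlertType (Table 4-2). */
class SimpleAlertBP {
    Optional<AlertRecord> resolveIdentity(AlertRecord input, List<AlertRecord> active) {
        return active.stream()
            .filter(a -> Objects.equals(a.entityName, input.entityName)
                && Objects.equals(a.instancePk, input.instancePk)
                && Objects.equals(a.alertType, input.alertType))
            .findFirst();
    }
}

/** Hypothetical custom proxy: overrides only the identity rule and inherits everything else. */
class CustomAlertBP extends SimpleAlertBP {
    @Override
    Optional<AlertRecord> resolveIdentity(AlertRecord input, List<AlertRecord> active) {
        // Example custom rule: ignore EntityName and match on InstancePK and AlertType only.
        return active.stream()
            .filter(a -> Objects.equals(a.instancePk, input.instancePk)
                && Objects.equals(a.alertType, input.alertType))
            .findFirst();
    }
}
```

How the custom class is registered for the maintainPartyAlert transaction is product configuration and is not shown.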

Request and response message


The data for the Business Key for a particular object can come from one or more objects provided as part of the request message. For example, the Role Location Business Key includes the Contract Role primary key, which can be determined only by knowing the Contract Role type. As a result, the maintainContractRoleLocation service includes the Contract object hierarchy, which is Contract, Contract Component, Contract Party Role, Person or Organization, Party Address or Party Contact Method, Contact Equivalency and Role Location. The Party maintenance service itself ultimately executes either an addContractRoleLocation transaction or an updateContractRoleLocation transaction, but it requires some data from objects in the hierarchy to resolve the instance of the role location. Responses from the fine-grained Party Maintenance Services are constructed by capturing the response from the core transaction after it is executed, and replacing it in the response object hierarchy. For example, the response from an addContractRoleLocation is inserted into a Contract response object for maintainContractRoleLocation. Only the Business Key data is included in the response for objects that contribute Business Key data. For example, the maintainContractRoleLocation transaction response only includes Business Key data for the entities Contract, Contract Component, Contract Party Role, Person or Organization, Party Address or Party Contact Method. It also contains the fully populated Contract Role Location entity.

If non-Business Key data is provided in the request for objects that only contribute Business Key data, it is ignored and excluded from the constructed responses. If objects that are not required to execute the service are included in the request, the service fails. For example, if a contract component value business object is provided in a maintainContractRoleLocation request, the transaction fails because none of the information from the contract component value business object is required to resolve the identity of a role location instance.
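The way a response is assembled, with only Business Key data for the contributing ancestor objects and the complete target object, can be illustrated with a small pruning step. In this hypothetical sketch (the ResponseSketch class and the map-based object representation are assumptions), each ancestor object is reduced to its Business Key attributes, and the result of the core add or update transaction is attached in full as the last element.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of how a maintain response could be assembled. */
final class ResponseSketch {
    private ResponseSketch() {}

    /** Keeps only the listed Business Key attributes of an ancestor object. */
    static Map<String, String> keyOnly(Map<String, String> object, List<String> keyFields) {
        Map<String, String> pruned = new LinkedHashMap<>();
        for (String field : keyFields) {
            if (object.containsKey(field)) {
                pruned.put(field, object.get(field));
            }
        }
        return pruned;
    }

    /**
     * Builds the response path: pruned ancestors (outermost first), followed by the
     * fully populated result returned by the core transaction.
     */
    static List<Map<String, String>> build(List<Map<String, String>> ancestors,
                                           List<List<String>> ancestorKeyFields,
                                           Map<String, String> coreResult) {
        List<Map<String, String>> response = new ArrayList<>();
        for (int i = 0; i < ancestors.size(); i++) {
            response.add(keyOnly(ancestors.get(i), ancestorKeyFields.get(i)));
        }
        response.add(new LinkedHashMap<>(coreResult)); // target object is returned in full
        return response;
    }
}
```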

Transaction message format


Party Maintenance Services support the following message formats:
- Standard <TCRMService> XML transaction format, as provided by InfoSphere MDM Server.
- Standard Interface File (SIF) format, as defined in the RDP for MDM Direct Load solution. The SIF parser is provided as part of the MDM RDP Runtime Assets and must be deployed on InfoSphere MDM Server before the SIF format can be used with MDM Party Maintenance Services.
Party Maintenance Services do not support web services.

Implementation details
MDM Party Maintenance Services are implemented as Java composites that use the existing InfoSphere MDM Server business components and services. They complement the existing business services by providing a delta-sensing capability. Each of these services can be invoked individually to handle an individual business object, or as part of other composite transactions; this design facilitates reuse of business logic across the Party Maintenance Services. Each Party Maintenance business proxy class implements the IMaintainService interface and provides the resolveIdentity() method, which is responsible for resolving the identity of the business object managed by the proxy. For example, MaintainPersonNameBP resolves the identity of the PersonName business object.

Figure 4-2 shows a sample class diagram in MDM Party Maintenance Services.

Figure 4-2 Party maintenance business proxies class diagram (diagram elements: the IMaintainService interface, which declares resolveIdentity(DWLCommon, DWLCommon, boolean); the Java classes DWLTxnBP and MaintainBaseBP; and the Java classes MaintainPartyBP and MaintainPersonNameBP, each with execute(), resolveIdentity(), and fireTransaction() methods)

For example, the MaintainPartyBP.resolveIdentity() method is responsible for identity resolution of the Party entity. The MaintainPartyBP class delegates the job of resolving the identity of the PersonName entity to the MaintainPersonNameBP.resolveIdentity() method, which takes the Party object as input. In this case, the Party object carries a list of PersonName entities, and it also contains the PartyId that was populated after the identity of the party was resolved by the MaintainPartyBP.resolveIdentity() method. The MaintainPersonNameBP.resolveIdentity() method searches for active PersonName entities and compares them with the PersonName entities in the input object to identify those that already exist in the database.
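The delegation between MaintainPartyBP and MaintainPersonNameBP can be sketched with a small interface and two implementations. Everything here is a simplified, hypothetical model: the Resolver interface stands in for IMaintainService, Party and PersonName are plain data holders, party lookup uses an in-memory map keyed by the AdminContEquiv Business Key (AdminPartyId plus AdminSystemType), and the real resolveIdentity(DWLCommon, DWLCommon, boolean) signature is not reproduced. A null ID after resolution means the record would be added; a non-null ID means it would be updated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical data holder for a person name. */
class PersonName {
    final String nameUsageType;
    final String fullName;
    Long nameId; // set when identity resolution finds an existing record

    PersonName(String nameUsageType, String fullName) {
        this.nameUsageType = nameUsageType;
        this.fullName = fullName;
    }
}

/** Hypothetical data holder for a party and its names. */
class Party {
    final String adminPartyId;
    final String adminSystemType;
    final List<PersonName> names = new ArrayList<>();
    Long partyId; // set when identity resolution finds an existing party

    Party(String adminPartyId, String adminSystemType) {
        this.adminPartyId = adminPartyId;
        this.adminSystemType = adminSystemType;
    }
}

/** Stand-in for IMaintainService: each resolver fills in the internal IDs it can find. */
interface Resolver {
    void resolveIdentity(Party input);
}

/** Resolves the person names of a party whose partyId has already been resolved. */
class PersonNameResolver implements Resolver {
    private final Map<Long, Map<String, Long>> activeNamesByParty; // partyId -> usage type -> nameId

    PersonNameResolver(Map<Long, Map<String, Long>> activeNamesByParty) {
        this.activeNamesByParty = activeNamesByParty;
    }

    @Override
    public void resolveIdentity(Party input) {
        if (input.partyId == null) {
            return; // new party: every name will be added
        }
        Map<String, Long> active = activeNamesByParty.getOrDefault(input.partyId, new HashMap<>());
        for (PersonName name : input.names) {
            name.nameId = active.get(name.nameUsageType); // Business Key: NameUsageType
        }
    }
}

/** Resolves the party itself, then delegates child resolution. */
class PartyResolver implements Resolver {
    private final Map<String, Long> partyIdByContEquiv; // "AdminPartyId|AdminSystemType" -> partyId
    private final List<Resolver> childResolvers;

    PartyResolver(Map<String, Long> partyIdByContEquiv, List<Resolver> childResolvers) {
        this.partyIdByContEquiv = partyIdByContEquiv;
        this.childResolvers = childResolvers;
    }

    @Override
    public void resolveIdentity(Party input) {
        input.partyId = partyIdByContEquiv.get(input.adminPartyId + "|" + input.adminSystemType);
        for (Resolver child : childResolvers) {
            child.resolveIdentity(input); // children see the resolved partyId
        }
    }
}
```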

Figure 4-3 shows a maintainParty composite transaction sequence diagram in MDM Party Maintenance Services.

Figure 4-3 Party Maintenance Business Proxies sequence diagram (participants :BatchProcessor, :MaintainPartyBP, and :MaintainPersonNameBP. MaintainPartyBP resolves the identity of the Party and delegates identity resolution of child objects to other business proxies; MaintainPersonNameBP resolves the identity of the list of person names; an addParty or updateParty transaction is then issued to persist the Party with its children)

4.2.3 MDM Party Maintenance Services Transaction List


This section describes the 17 composite transactions provided in MDM Party Maintenance Services. Each transaction can handle only a predefined list of business objects. If any additional objects are provided as part of the transaction input, an error is returned. The error occurs because MDM Party Maintenance Services validate and apply identity resolution logic only on supported entities; all unsupported entities are rejected.
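Rejecting unsupported objects amounts to checking every child object in a request against the list of objects that the transaction accepts. The sketch below is a hypothetical illustration: the RequestValidator class is not part of MDM Server, it checks direct children only (nested objects such as TCRMAddressBObj are out of scope), and its allowed list is taken from the maintainParty input object description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical validator that rejects child objects a transaction does not support. */
final class RequestValidator {
    // Direct child objects accepted by maintainParty, per its input object description.
    static final Set<String> MAINTAIN_PARTY_CHILDREN = new HashSet<>(Arrays.asList(
        "TCRMAdminContEquivBObj", "TCRMPersonNameBObj", "TCRMOrganizationNameBObj",
        "TCRMPartyAddressBObj", "TCRMPartyContactMethodBObj", "TCRMPartyLobRelationshipBObj",
        "TCRMAlertBObj", "TCRMPartyPrivPrefBObj", "TCRMPartyValueBObj",
        "TCRMPartyIdentificationBObj"));

    private RequestValidator() {}

    /** Returns the unsupported object types found in the request (empty when valid). */
    static List<String> unsupported(List<String> childTypes, Set<String> allowed) {
        List<String> rejected = new ArrayList<>();
        for (String type : childTypes) {
            if (!allowed.contains(type)) {
                rejected.add(type);
            }
        }
        return rejected;
    }
}
```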

MaintainParty
This transaction handles the Person entity or the Organization entity and some of their child objects, as follows:
Note: No accommodation exists within maintainParty to translate a contact equivalent into a party ID for CONTACT.provided_by_cont.
Input object: TCRMPersonBObj or TCRMOrganizationBObj, which contains one mandatory TCRMAdminContEquivBObj object and an optional list of the following child objects:
- TCRMOrganizationNameBObj or TCRMPersonNameBObj. TCRMOrganizationNameBObj is mandatory for TCRMOrganizationBObj, and TCRMPersonNameBObj is mandatory for TCRMPersonBObj.
- TCRMPartyAddressBObj with one TCRMAddressBObj
- TCRMPartyContactMethodBObj with one TCRMContactMethodBObj
- TCRMPartyLobRelationshipBObj
- TCRMAlertBObj
- TCRMPartyPrivPrefBObj
- TCRMPartyValueBObj
- TCRMPartyIdentificationBObj
Details: The maintainParty transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, it is updated using the updateParty transaction. If no active party is found, a new party is added using the addParty transaction. This transaction delegates the task of resolving the identity of the party's child objects to the following transactions:
- MaintainOrganizationName or MaintainPersonName
- MaintainPartyAddress
- MaintainPartyContactMethod
- MaintainPartyLobRelationship
- MaintainPartyAlert
- MaintainPartyPrivPref
- MaintainPartyValue
- MaintainPartyIdentification

If auto-expiry for deleted client records is enabled, any of the party's existing child object records in the database that are not provided in the input are expired by setting their EndDate values to the server's current time.

MaintainOrganizationName
This transaction handles the OrganizationName entity, as follows:
Input object: A TCRMOrganizationBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMOrganizationNameBObj object.
Details: The maintainOrganizationName transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainOrganizationName transaction performs identity resolution for the OrganizationName entity using the Business Keys defined for OrganizationName. If an existing active OrganizationName is found, it is updated using the updateOrganizationName transaction; if none is found, a new entity is added using the addOrganizationName transaction.

MaintainPersonName
This transaction handles the PersonName entity, as follows:
Input object: A TCRMPersonBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPersonNameBObj object.
Details: The maintainPersonName transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPersonName transaction performs identity resolution for the PersonName entity using the Business Keys defined for PersonName. If an existing active PersonName is found, it is updated using the updatePersonName transaction; if none is found, a new entity is added using the addPersonName transaction.

MaintainPartyAddress
This transaction handles the PartyAddress entity, as follows:
Input object: A TCRMPersonBObj, TCRMOrganizationBObj, or TCRMPartyBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPartyAddressBObj object. The TCRMPartyAddressBObj object must include one TCRMAddressBObj object.
Details: The maintainPartyAddress transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyAddress transaction performs identity resolution for the PartyAddress entity using the Business Keys defined for PartyAddress. If an existing active PartyAddress is found, it is updated using the updatePartyAddress transaction; if none is found, a new entity is added using the addPartyAddress transaction.

MaintainPartyContactMethod
This transaction handles the PartyContactMethod entity, as follows:
Input object: A TCRMPersonBObj, TCRMOrganizationBObj, or TCRMPartyBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPartyContactMethodBObj object. The TCRMPartyContactMethodBObj object must include one TCRMContactMethodBObj object.
Details: The maintainPartyContactMethod transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyContactMethod transaction performs identity resolution for the PartyContactMethod entity using the Business Keys defined for PartyContactMethod. If an existing active PartyContactMethod is found, it is updated using the updatePartyContactMethod transaction; if none is found, a new entity is added using the addPartyContactMethod transaction.

MaintainPartyLobRelationship
This transaction handles the PartyLobRelationship entity, as follows:
Input object: A TCRMPersonBObj or TCRMOrganizationBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPartyLobRelationshipBObj object.
Details: The maintainPartyLobRelationship transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyLobRelationship transaction performs identity resolution for the PartyLobRelationship entity using the Business Keys defined for PartyLobRelationship. If an existing active PartyLobRelationship is found, it is updated using the updatePartyLobRelationship transaction; if none is found, a new entity is added using the addPartyLobRelationship transaction.

MaintainPartyAlert
This transaction handles the PartyAlert entity, as follows:
Input object: A TCRMPersonBObj, TCRMOrganizationBObj, or TCRMPartyBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMAlertBObj object.
Details: The maintainPartyAlert transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyAlert transaction performs identity resolution for the PartyAlert entity using the Business Keys defined for PartyAlert. If an existing active PartyAlert entity is found, it is updated using the updatePartyAlert transaction; if none is found, a new entity is added using the addPartyAlert transaction.

MaintainPartyPrivPref
This transaction handles the PartyPrivacyPreference entity, as follows:
Input object: A TCRMPersonBObj, TCRMOrganizationBObj, or TCRMPartyBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPartyPrivPrefBObj object.
Details: The maintainPartyPrivPref transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyPrivPref transaction performs identity resolution for the PartyPrivacyPreference entity using the Business Keys defined for PartyPrivacyPreference. If an existing active PartyPrivacyPreference entity is found, it is updated using the updatePartyPrivacyPreference transaction; if none is found, a new entity is added using the addPartyPrivacyPreference transaction.

MaintainPartyValue
This transaction handles the PartyValue entity, as follows:
Input object: A TCRMPersonBObj, TCRMOrganizationBObj, or TCRMPartyBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPartyValueBObj object.
Details: The maintainPartyValue transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyValue transaction performs identity resolution for the PartyValue entity using the Business Keys defined for PartyValue. If an existing active PartyValue is found, it is updated using the updatePartyValue transaction; if none is found, a new entity is added using the addPartyValue transaction.

MaintainPartyIdentification
This transaction handles the PartyIdentification entity, as follows:
Note: No accommodation exists within maintainPartyIdentification to translate a contact equivalent into a party ID for IDENTIFIER.assigned_by.
Input object: A TCRMPersonBObj, TCRMOrganizationBObj, or TCRMPartyBObj object, which contains one mandatory TCRMAdminContEquivBObj object and one mandatory TCRMPartyIdentificationBObj object.
Details: The maintainPartyIdentification transaction searches for active parties based on the Business Key from the TCRMAdminContEquivBObj object. If an active party is found, the maintainPartyIdentification transaction performs identity resolution for the PartyIdentification entity using the Business Keys defined for PartyIdentification. If an existing active PartyIdentification is found, it is updated using the updatePartyIdentification transaction; if none is found, a new entity is added using the addPartyIdentification transaction.
Note: Although Party Maintenance Services redefine the Business Key for the PartyIdentification entity to be IdentificationType, MDM Server already includes an internal validation that disallows duplicates of IdentificationType and IdentificationNumber combinations. Therefore, the effective Business Key for PartyIdentification consists of IdentificationType and IdentificationNumber.

MaintainPartyRelationships
This transaction handles the PartyRelationship entity, as follows:
Input object: A TCRMPartyListBObj object, which contains multiple parties, each represented by a TCRMPartyBObj, TCRMPersonBObj, or TCRMOrganizationBObj object. Each party must contain one mandatory TCRMAdminContEquivBObj object.
Details: The ObjectReferenceId elements must be used to identify the source and target parties within each relationship. Each PartyRelationship object must have its parent party's ObjectReferenceId as either its source or its target party. Every PartyRelationship object must define a relationship between a primary party and one of the other parties in the PartyList. If only two parties are provided, the first party in the PartyList is considered the primary party.

If the feature to automatically set the EndDate for deleted records is enabled, you can provide only one party, without any PartyRelationship object, in the PartyList. The purpose of this input is to end all existing active relationships for that party. If multiple parties are provided, the primary party's relationships that are not in the input are ended. The maintainPartyRelationships transaction searches for all parties based on the Business Keys from the TCRMAdminContEquivBObj objects. It then searches for the existing active relationships of the primary party using the getAllPartyRelationships transaction, performs identity resolution for the PartyRelationship entity based on the PartyRelationship Business Keys, and uses the updatePartyRelationship or addPartyRelationship transaction to update or add relationship objects. This transaction is not called as part of maintainParty because party relationships are assumed to be loaded separately, after all the parties have been added to the system.
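The reconciliation that maintainPartyRelationships performs for the primary party, including the optional ending of relationships that are absent from the input, can be sketched as a comparison of Business Keys. The RelationshipPlan class and the string encoding of the key are hypothetical; in the sketch a relationship key combines RelationshipFromValue, RelationshipToValue, and RelationshipType, as listed in Table 4-2.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical plan of add/update/end operations for the primary party's relationships. */
final class RelationshipPlan {
    final List<String> toAdd = new ArrayList<>();
    final List<String> toUpdate = new ArrayList<>();
    final List<String> toEnd = new ArrayList<>();

    /** Encodes the Business Key (from party, to party, relationship type) as one string. */
    static String key(String from, String to, String type) {
        return from + "|" + to + "|" + type;
    }

    /**
     * @param activeKeys keys of the primary party's existing active relationships
     * @param inputKeys  keys of the relationships provided in the PartyList
     * @param endOmitted whether automatic ending of omitted records is enabled
     */
    static RelationshipPlan plan(Set<String> activeKeys, List<String> inputKeys, boolean endOmitted) {
        RelationshipPlan plan = new RelationshipPlan();
        Set<String> seen = new LinkedHashSet<>(inputKeys);
        for (String k : seen) {
            if (activeKeys.contains(k)) {
                plan.toUpdate.add(k);
            } else {
                plan.toAdd.add(k);
            }
        }
        if (endOmitted) {
            for (String k : activeKeys) {
                if (!seen.contains(k)) {
                    plan.toEnd.add(k);
                }
            }
        }
        return plan;
    }
}
```

Passing an empty input list with endOmitted set to true models the case of providing a single party with no PartyRelationship objects: every active relationship of that party lands in toEnd.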

MaintainContractPlus
This transaction handles the Contract entity, including the child entities ContractValue, ContractComponent, and ContractAlert, as follows:
Note: No accommodation exists within maintainContractPlus to translate a contract cross reference into a contract ID for CONTRACT.repl_by_contract.
Input object: TCRMContractBObj, which contains the extension TCRMContractPlusBObjExt. In addition, at least one of the following items is mandatory:
- TCRMAdminNativeKeyBObj child object
- AdminContractId and AdminSysTp elements on TCRMContractBObj
This transaction also has the following optional child objects:
- TCRMContractComponentBObj, along with the following optional child objects:
  - TCRMContractComponentValueBObj
  - TCRMContractPartyRoleBObj
  - TCRMContractRoleLocationBObj
- TCRMContractAlertBObj
- TCRMContractValueBObj, which is provided as an extended element of TCRMContractPlusBObjExt

Details: In the response message, the TCRMContractValueBObj is included as part of the TCRMContractBObj object, not as an extended element of TCRMContractPlusBObjExt. The maintainContractPlus transaction searches for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId and AdminSysTp elements on TCRMContractBObj. If an active contract is found, it is updated using the updateContract transaction; if none is found, a new contract is added using the addContract transaction. The maintainContractPlus transaction adds multiple TCRMAdminNativeKeyBObj entities if multiple TCRMAdminNativeKeyBObj objects are provided in the request. However, only the first TCRMAdminNativeKeyBObj object is used as the Business Key to search for an existing Contract entity. This transaction delegates the task of child object identity resolution to the following transactions:
- MaintainContractComponent
- MaintainContractAlert
The maintainContractPlus transaction performs identity resolution for ContractValue objects and maintains them using either the updateContractValue transaction or the addContractValue transaction. If auto-expiry for deleted client records is enabled, any existing ContractAlert, ContractValue, ContractPartyRole, and ContractPartyRoleLocation records in the database that are not provided in the input are expired by setting their EndDate values to the server's current time.
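The selection of the Business Key used to search for an existing contract can be sketched as follows. The ContractRequest and NativeKey classes are hypothetical simplifications of TCRMContractBObj and TCRMAdminNativeKeyBObj. The sketch reflects that only the first native key is used for the search, but the precedence it applies when both key forms are present (native key first) is an assumption; the description above does not state it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for TCRMAdminNativeKeyBObj. */
class NativeKey {
    final String adminSystemType;
    final String adminContractId;

    NativeKey(String adminSystemType, String adminContractId) {
        this.adminSystemType = adminSystemType;
        this.adminContractId = adminContractId;
    }
}

/** Hypothetical, simplified stand-in for TCRMContractBObj. */
class ContractRequest {
    final List<NativeKey> nativeKeys = new ArrayList<>();
    String adminContractId; // AdminContractId element
    String adminSysTp;      // AdminSysTp element

    /**
     * Returns the search key as "systemType|contractId". Only the first native key is used;
     * preferring native keys over the contract elements is an assumption of this sketch.
     */
    String searchKey() {
        if (!nativeKeys.isEmpty()) {
            NativeKey first = nativeKeys.get(0);
            return first.adminSystemType + "|" + first.adminContractId;
        }
        if (adminContractId != null && adminSysTp != null) {
            return adminSysTp + "|" + adminContractId;
        }
        throw new IllegalArgumentException(
            "A TCRMAdminNativeKeyBObj or the AdminContractId and AdminSysTp elements are required");
    }
}
```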

MaintainContractComponent
This transaction handles the ContractComponent entity, including the child entities ContractComponentValue and ContractPartyRole, as follows:
Input object: TCRMContractBObj, which must contain the following items:
- At least one of the following items:
  - TCRMAdminNativeKeyBObj child object
  - AdminContractId and AdminSysTp elements on TCRMContractBObj
- One mandatory TCRMContractComponentBObj object, which can contain either of the following optional child objects:
  - TCRMContractComponentValueBObj
  - TCRMContractPartyRoleBObj

Details: The maintainContractComponent transaction searches for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId and AdminSysTp elements on TCRMContractBObj. If an active contract is found, the maintainContractComponent transaction performs identity resolution for the ContractComponent entity using the Business Keys defined for ContractComponent. If an existing ContractComponent entity is found, it is updated using the updateContractComponent transaction; if none is found, a new one is added using the addContractComponent transaction. This transaction delegates the task of resolving the identity of the ContractComponent child objects to the following transactions:
- MaintainContractComponentValue
- MaintainContractPartyRole
If auto-expiry for deleted client records is enabled, any existing ContractComponentValue, ContractPartyRole, and ContractPartyRoleLocation records in the database that are not provided in the input are expired by setting their EndDate values to the server's current time.

MaintainContractAlert
This transaction handles the ContractAlert entity, as follows:
Input object: TCRMContractBObj, which contains the following items:
- At least one of the following items:
  - TCRMAdminNativeKeyBObj child object
  - AdminContractId and AdminSysTp elements on TCRMContractBObj
- One mandatory TCRMContractAlertBObj object

Details: The maintainContractAlert transaction searches for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId and AdminSysTp elements on TCRMContractBObj. If an active contract is found, the maintainContractAlert transaction performs identity resolution for the ContractAlert entity using the Business Keys defined for ContractAlert. If an existing active ContractAlert is found, it is updated using the updateContractAlert transaction; if none is found, a new one is added using the addContractAlert transaction.

MaintainContractComponentValue
This transaction handles the ContractComponentValue entity, as follows:
Input object: TCRMContractBObj, which contains the following items:
- At least one of the following items:
  - TCRMAdminNativeKeyBObj child object
  - AdminContractId and AdminSysTp elements on TCRMContractBObj
- One mandatory TCRMContractComponentBObj object, which contains its Business Keys and one mandatory TCRMContractComponentValueBObj object
Details: The maintainContractComponentValue transaction searches for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId and AdminSysTp elements on TCRMContractBObj. If an active contract is found, it retrieves the list of ContractComponent objects for the existing contract and performs identity resolution for the ContractComponent entity using the Business Keys defined for ContractComponent. If an existing ContractComponent is found, the maintainContractComponentValue transaction performs identity resolution for the ContractComponentValue entity using the Business Keys defined for ContractComponentValue. If an existing active ContractComponentValue is found, it is updated using the updateContractComponentValue transaction; if none is found, a new one is added using the addContractComponentValue transaction.

MaintainContractPartyRole
This transaction handles the ContractPartyRole entity, including the child entity ContractRoleLocation, as follows:
Input object: TCRMContractBObj, which contains the following items:
- At least one of the following items:
  - TCRMAdminNativeKeyBObj child object
  - AdminContractId and AdminSysTp elements on TCRMContractBObj
- One mandatory TCRMContractComponentBObj object, which contains its Business Keys and which must contain one TCRMContractPartyRoleBObj with one TCRMPersonBObj or TCRMOrganizationBObj object. The party object has one mandatory TCRMAdminContEquivBObj object to identify the party.
Details: The maintainContractPartyRole transaction searches for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId and AdminSysTp elements on TCRMContractBObj. If an active contract is found, the maintainContractPartyRole transaction retrieves the list of ContractComponent objects for the existing contract and performs identity resolution for the ContractComponent entity using the Business Keys defined for ContractComponent. If an existing ContractComponent is found, the maintainContractPartyRole transaction searches for existing party roles based on the party's TCRMAdminContEquivBObj and performs identity resolution for the ContractPartyRole entity using the Business Keys defined for ContractPartyRole. If an existing active ContractPartyRole is found, it is updated using the updateContractPartyRole transaction; if none is found, a new one is added using the addContractPartyRole transaction. This transaction delegates the task of resolving the identity of the ContractRoleLocation child object to the MaintainContractRoleLocation transaction. If auto-expiry for deleted client records is enabled, any existing ContractPartyRole records in the database that are not provided in the input are expired by setting their EndDate values to the server's current time.

MaintainContractRoleLocation
This transaction handles the ContractRoleLocation entity, as follows:
Input object: TCRMContractBObj, which must contain the following items:
- One of the following items:
  - TCRMAdminNativeKeyBObj child object
  - AdminContractId and AdminSysTp elements on TCRMContractBObj
- One mandatory TCRMContractComponentBObj object, which contains the Business Keys for ContractComponent
- One mandatory TCRMContractPartyRoleBObj object, which contains the following items:
  - One party object (TCRMPersonBObj or TCRMOrganizationBObj) with:
    - One mandatory TCRMAdminContEquivBObj object to identify the party
    - One mandatory TCRMPartyAddressBObj or TCRMPartyContactMethodBObj object. The Address and ContactMethod must exist in the system before you execute this transaction.
  - One mandatory TCRMContractRoleLocationBObj object, which contains the ObjectReferenceId of either the TCRMPartyAddressBObj or the TCRMPartyContactMethodBObj object from the party

Details: The maintainContractRoleLocation transaction searches for active contracts based on the Business Key from either the TCRMAdminNativeKeyBObj child object or the AdminContractId and AdminSysTp elements on the TCRMContractBObj object. If an active contract is found, the maintainContractRoleLocation transaction retrieves the list of ContractComponent objects for the existing contract and performs identity resolution for the ContractComponent entity using the Business Keys defined for ContractComponent. If an existing ContractComponent is found, the transaction searches for existing party roles based on the party's TCRMAdminContEquivBObj and performs identity resolution for the ContractPartyRole entity using the Business Keys defined for ContractPartyRole. If an existing ContractPartyRole is found, the transaction validates that the Address or Contact Method objects exist and then performs identity resolution for the ContractRoleLocation entity using the Business Keys defined for ContractRoleLocation. If an existing active ContractRoleLocation is found, it is updated using the updateContractRoleLocation transaction; if none is found, a new one is added using the addContractRoleLocation transaction.
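maintainContractRoleLocation resolves a chain of identities before it can choose between an add and an update: contract, contract component, contract party role, the referenced address or contact method, and finally the role location. The sketch below models that chain with nested in-memory maps. The RoleLocationChain class, the string keys, and the returned messages are hypothetical, and the sketch simply reports the step at which resolution stopped because the transaction's behavior for a broken chain is not detailed here.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the identity chain resolved by maintainContractRoleLocation. */
final class RoleLocationChain {
    private RoleLocationChain() {}

    /**
     * @param contracts contract key -> component key -> role key -> active role location keys
     * @param locations keys of party addresses or contact methods that exist in the system
     * @return "update", "add", or a message naming the step where resolution stopped
     */
    static String resolve(Map<String, Map<String, Map<String, Set<String>>>> contracts,
                          Set<String> locations,
                          String contractKey, String componentKey, String roleKey,
                          String locationKey) {
        Map<String, Map<String, Set<String>>> components = contracts.get(contractKey);
        if (components == null) {
            return "no active contract";
        }
        Map<String, Set<String>> roles = components.get(componentKey);
        if (roles == null) {
            return "no contract component";
        }
        Set<String> roleLocations = roles.get(roleKey);
        if (roleLocations == null) {
            return "no contract party role";
        }
        if (!locations.contains(locationKey)) {
            return "address or contact method does not exist";
        }
        // Within one role, the location group identifies the role location
        // (Business Key: LocationGroupId, ContractRoleId), so a match means update.
        return roleLocations.contains(locationKey) ? "update" : "add";
    }
}
```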

4.2.4 MDM Party Maintenance Services Profile


MDM Party Maintenance Services support a subset of the MDM Party domain business objects that are children of TCRMParty and TCRMContract. The following child business objects are not supported:
- TCRMPartyAddressPrivPrefBObj
- TCRMAddressValueBObj
- TCRMAddressNoteBObj
- TCRMPartyContactMethodPrivPrefBObj
- TCRMPartyLocationPrivPrefBObj
- TCRMFinancialProfileBObj
- TCRMPartyBankAccountBObj
- TCRMPartyChargeCardBObj
- TCRMPartyPayrollDeductionBObj
- TCRMIncomeSourceBObj
- TCRMVehicleHoldingBObj
- TCRMPropertyHoldingBObj
- TCRMAlertBObj for TCRMContractPartyRoleBObj
- TCRMContractPartyRoleSituationBObj
- TCRMContractPartyRoleIdentifierBObj
- TCRMContractPartyRoleRelationshipBObj
- TCRMContractRelationshipBObj
- TCRMContractRoleLocationPurposeBObj
The MDM Party Maintenance Services profile uses an existing InfoSphere MDM Server feature called Smart Inquiries to tune the InfoSphere MDM Server database so that unused tables for unsupported business objects are not accessed. Smart Inquiries let you reconfigure your server implementation to turn off the parts of the data model that relate to unused transactions and tables. When these parts of the model are turned off, no database inquiry I/O is issued against the unused tables, which improves processing efficiency when the coarse-grained getParty and getContract transactions are invoked. To enable this feature, run the ELMDM_Smart_Inquiry.sql script; access is turned off for every database table that is included in this script. For more information about Smart Inquiries, see the IBM InfoSphere Master Data Management Server Developer's Guide, Version 9.0, which is licensed material that is available with the product. Table 4-3 lists the operations and function areas that have been turned off (not performed).
Table 4-3 Disabled MDM Server operations
Operational action | Function area not performed
getFinancialProfile | Financial Profile
getAllContractAdminSysKeys | Contract Admin System Key
getAllContractPartyRoleAlerts | Contract Party Role Alert
getAllContractPartyRoleSituations | Contract Party Role Situation
getAllContractPartyRoleIdentifierbyContractRoleId | Contract Party Role Identifier
getAllContractRelationships | Contract Relationship
getAllIncomeSources | Income Source
getAllPartyBankAccounts | Bank Account
getAllPartyChargeCards | Charge Card
getAllPartyPayrolldeductions | Payroll Deduction
getAllPartyAddressPrivacyPreferences | Party Address Privacy Preference
getAllPartyContactMethodPrivacyPreferences | Party Contact Method Privacy Preference
getAllContractPartyRoleRelationships | Contract Party Role Relationship
getAllAddressValues | Address Values
getAllAddressNotes | Address Note
getAllPartyLocationPrivacyPreferences | Location Group Privacy Preference
getHolding | Holding
getAllContractRoleLocationPurposes | Contract Role Location Purposes


4.2.5 MDM Party Maintenance Services installation


MDM Party Maintenance Services must be installed on top of MDM Server. Installing the Party Maintenance Services modifies the InfoSphere MDM Server configuration as follows:
- The Business Keys for the entities supported by Party Maintenance Services are redefined.
- The MDM Party Maintenance Services behavior extensions are deployed to prevent duplicate entities based on Business Keys.

Installation process overview


Installing the MDM Party Maintenance Services consists of the following tasks:
1. Expand MDM901_Samples.tar.gz from the InfoSphere MDM Server distribution into a directory on the server, <MDM_Sample_Home>.
2. To install Party Maintenance Services, edit and run the installation shell scripts that are appropriate for your server environment:
WebSphere and DB2 environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebSphere/DB2/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebSphere/DB2/install_MDM_PartyMaintenanceServices.sh
WebSphere and zOS environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebSphere/zOS/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebSphere/zOS/install_MDM_PartyMaintenanceServices.sh
WebSphere and Oracle environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebSphere/Oracle/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebSphere/Oracle/install_MDM_PartyMaintenanceServices.sh
WebLogic and Oracle environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebLogic/Oracle/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/WebLogic/Oracle/install_MDM_PartyMaintenanceServices.sh
Cluster and WebSphere and DB2 environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebSphere/DB2/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebSphere/DB2/install_MDM_PartyMaintenanceServices.sh
Cluster and WebSphere and zOS environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebSphere/zOS/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebSphere/zOS/install_MDM_PartyMaintenanceServices.sh
Cluster and WebSphere and Oracle environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebSphere/Oracle/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebSphere/Oracle/install_MDM_PartyMaintenanceServices.sh
Cluster and WebLogic and Oracle environment
  i. Edit: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebLogic/Oracle/setVariables.sh
  ii. Run: <MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/WebLogic/Oracle/install_MDM_PartyMaintenanceServices.sh
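The script locations above follow a regular pattern: an optional Cluster level, then the application server, then the database. The following Python sketch, an illustration only, builds the pair of paths (script to edit, script to run) for one environment. The function name and the sample <MDM_Sample_Home> value /opt/mdm_samples are hypothetical.

```python
from pathlib import PurePosixPath

# Server/database combinations that have installation scripts, per the list above.
SUPPORTED = {
    ("WebSphere", "DB2"),
    ("WebSphere", "zOS"),
    ("WebSphere", "Oracle"),
    ("WebLogic", "Oracle"),
}


def pms_install_scripts(sample_home: str, server: str, database: str,
                        cluster: bool = False) -> tuple[PurePosixPath, PurePosixPath]:
    """Return (script to edit, script to run) for one environment (hypothetical helper)."""
    if (server, database) not in SUPPORTED:
        raise ValueError(f"no installation scripts for {server} and {database}")
    base = PurePosixPath(sample_home) / "PartyMaintenanceServices" / "install"
    if cluster:
        base = base / "Cluster"  # cluster variants live one level deeper
    script_dir = base / server / database
    return (script_dir / "setVariables.sh",
            script_dir / "install_MDM_PartyMaintenanceServices.sh")


edit, run = pms_install_scripts("/opt/mdm_samples", "WebLogic", "Oracle", cluster=True)
print(edit)
print(run)
```

Rejecting unknown combinations early mirrors the fact that only the eight environments listed above ship with scripts.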

Installation steps
Use the following steps to install MDM Party Maintenance Services: 1. Download the InfoSphere MDM Server Samples archive from InfoSphere MDM Server Samples Packaging (for example, MDM901_Samples.tar.gz) to a temporary directory on the server, <MDM_Sample_Home>, and extract its contents using the commands in Example 4-1.
Example 4-1 Extract MDM Samples

gzip -d MDMRDP901_Samples.tar.gz
tar xvf MDMRDP901_Samples.tar

The TAR file is expanded into several directories. The install_MDM_PartyMaintenanceServices.sh script is located in the following subdirectory:
<MDM_Sample_Home>/PartyMaintenanceServices/install/
This script runs several other scripts and uses resource files in the following folders:
<MDM_Sample_Home>/PartyMaintenanceServices/Jars
<MDM_Sample_Home>/PartyMaintenanceServices/properties
<MDM_Sample_Home>/PartyMaintenanceServices/DB
The script refers to each of these resources by a relative path.
2. Edit the setVariables.sh file for your environment, as described in "Installation process overview" on page 89, and set the variables to the proper values, including DB_NAME, DB_USER, and DB_PASSWORD.


Example 4-2 shows sample values of setVariables.sh file.


Example 4-2 Sample values of setVariables.sh

export JAVA_HOME=/usr/IBM/WebSphere/AppServer/java
export WAS_HOME=/usr/IBM/WebSphere/AppServer
export CELL_NAME=celebornCell01
export NODE_NAME=Node01
export APP_NAME=CAM_MDM900_10182009_2200_DB2_BE02
export INSTALL_HOME=/usr/IBM/MDM/CAM_MDM900_10182009_2200_DB2_BE02
export DB_NAME=MDM9QA2
export DB_USER=celcam02
export DB_PASSWORD=Schema90
export APPLICATION_NAME='InfoSphere Master Data Management'
export APPLICATION_VERSION=9.0.0
export DEPLOY_NAME=CAM_MDM900_10182009_2200_DB2_BE02
export ADMIN_USER=input-user
export ADMIN_PASSWORD=input-password
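Before running the installer, it can help to confirm that the values named in step 2 were actually filled in. The following Python sketch is a hypothetical pre-flight check, not part of the package: it reads the export lines of a setVariables.sh file and reports variables that are missing or still hold a placeholder. Treating input-user and input-password as unedited placeholders is an assumption based on Example 4-2.

```python
import shlex

# Assumption: these values mean "not edited yet" (see ADMIN_USER/ADMIN_PASSWORD in Example 4-2).
PLACEHOLDERS = {"", "input-user", "input-password"}


def parse_set_variables(text: str) -> dict[str, str]:
    """Collect NAME=VALUE pairs from `export NAME=VALUE` lines, one per line."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("export "):
            continue  # skip comments, blank lines, and other shell statements
        name, sep, value = line[len("export "):].partition("=")
        if sep:
            tokens = shlex.split(value)  # removes quotes, as the shell would
            values[name.strip()] = tokens[0] if tokens else ""
    return values


def unset_variables(values: dict[str, str],
                    required=("DB_NAME", "DB_USER", "DB_PASSWORD")) -> list[str]:
    """Names of required variables that are missing or still hold a placeholder."""
    return [name for name in required if values.get(name, "") in PLACEHOLDERS]
```

An empty list from unset_variables(parse_set_variables(text)) suggests the required values were supplied; it does not check that they are correct for your server.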

These values identify the server location where the InfoSphere MDM Server application is installed, the location of the MDM.ear file, and the locations of other files. 3. To prevent a file permission error during the Party Maintenance Services installation, run the command in Example 4-3 in the subdirectory under <MDM_Sample_Home>/PartyMaintenanceServices/install/ that contains the shell scripts for your environment.
Example 4-3 The chmod command

chmod -R 755 *.sh
4. Run the shell script shown in Example 4-4 to install MDM Party Maintenance Services.
Example 4-4 Install MDMPartyMaintenanceServices shell script

./install_MDM_PartyMaintenanceServices.sh
5. Ignore the warning "WARNING: Duplicate name in Manifest" that appears in the server command window while the script is running. While it runs, the script prompts you to confirm that the application server is running before it deploys the Party Maintenance Services configuration to InfoSphere MDM Server. After this configuration is deployed, you can either let the script restart the server automatically or skip the restart.


This script modifies the MDM.ear file and the JAR files in the location where the InfoSphere MDM Server instance is installed. The script saves a backup copy of each original EAR or JAR file in the same folder as the original, with the .beforeELM extension appended, for example:
- MDM.ear.beforeELM
- DWLCommonServicesEJB.jar.beforeELM
The script writes its log files to the logs folder under the install folder. After the script has run, check these log files for any errors that occurred during the installation.
6. If InfoSphere MDM Server is installed on a cluster with several nodes, you must run the install_MDM_PartyMaintenanceServices.sh script in the following subfolder:
<MDM_Sample_Home>/PartyMaintenanceServices/install/Cluster/
The script updates each InfoSphere MDM Server folder of the cluster. Before running the script, modify the following variables in the setVariables.sh script for each environment:
- WAS_HOME
- CELL_NAME
- NODE_NAME
- APP_NAME

Run the script as many times as needed to update every InfoSphere MDM Server in the cluster. If the script must be run again, first replace the modified EAR and JAR files with the original files; otherwise, errors occur while the script is running. The logs folder is backed up with the current time stamp.
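Because every original file is kept next to the modified one with the .beforeELM extension, putting the originals back before a rerun is a mechanical step. The following Python sketch shows one way to do it; it is a hypothetical helper, not something the package provides, and the install root you pass in is an assumption about where your EAR and JAR files live.

```python
import shutil
from pathlib import Path

BACKUP_SUFFIX = ".beforeELM"


def restore_originals(install_root: Path) -> list[Path]:
    """Copy every *.beforeELM backup under install_root over the file it was taken from.

    The backups are copied, not moved, so they are still available for a later rerun.
    """
    restored = []
    for backup in sorted(install_root.rglob("*" + BACKUP_SUFFIX)):
        original = backup.with_name(backup.name[: -len(BACKUP_SUFFIX)])
        shutil.copy2(backup, original)  # overwrite the modified file, keep metadata
        restored.append(original)
    return restored
```

Copying rather than renaming is a deliberate choice: if the second run also fails, the untouched originals are still on disk.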

The install_MDM_PartyMaintenanceServices.sh script


The install_MDM_PartyMaintenanceServices.sh script performs the following modifications to the InfoSphere MDM Server instance:
- Runs the following scripts to load the MDM Party Maintenance Services configuration data into the MDM Server database:
  clearELMDM.sql
  ELMDM_Business_Keys.sql
  ELMDM_Misc_Inserts.sql
  ELMDM_Txn_Names.sql

- Extracts the META-INF/MANIFEST.MF file from DWLCommonServicesEJB.jar and edits its Class-Path to include EntryLevelMDM.jar, by appending the string EntryLevelMDM.jar, preceded by a space, to the end of the Class-Path entry.


- Extracts the DWLCommon_extension.properties file from properties.jar and pastes the content of DWLCommon_extension_ELMDM.properties under the following line in DWLCommon_extension.properties:
  BusinessProxy.Default=com.dwl.base.requestHandler.DWLTxnBP
- Extracts the tcrm_extension.properties file from properties.jar and appends the content of tcrm_extension_ELMDM.properties to the end of tcrm_extension.properties.
- Modifies tcrmRequest_extension.xsd by deleting the lines marked by the text DELETE THE CODE BELOW AT DEPLOYMENT, and replaces these files in DWLSchemas.jar with the modified ones.
- Backs up the following files:
  MDM.ear
  DWLCommonServicesEJB.jar
  DWLSchema.jar
  properties.jar

- Adds EntryLevelMDM.jar to the MDM.ear file and the WebSphere runtime folder.
- Puts the modified JAR files back into the MDM.ear file and the WebSphere runtime folder.
- Restarts the server, if requested.

4.3 MDM RDP Runtime Assets


MDM RDP Runtime Assets contain the following components:
- QualityStage (QS) SIF Sequencer Job
- SIF Parser
- RDP Suspect Duplicated Candidate Search Rule
- QS runtime standardization and matching jobs
- QS runtime client for Remote Method Invocation (RMI) or web services access
- Java adapter and converter classes that support the QS runtime jobs
- Java wrapper classes and a converter class that disable MDM Server Soundex phonetic key generation and pick up Nysiis phonetic keys from the QS runtime jobs
MDM RDP Runtime Assets are installed on top of MDM Party Maintenance Services and MDM Server.


In the RDP for MDM Delta Load solution, the MDM RDP Runtime Assets preserve the source-to-SIF work and ensure that suspect duplicate candidate searching, standardization, and party matching produce the same data loading results as in the direct load. The SIF Sequencer Job puts the source-to-SIF records into the order that MDM Server requires. The SIF Parser parses the sequenced SIF records and converts the SIF text request messages into MDM Server business objects. The RDP Suspect Duplicated Candidate Search Rule overrides the MDM Server default suspect candidate search rule to emulate the blockings used in the DataStage jobs during direct load. The QS runtime standardization and matching jobs are invoked by MDM services during the delta load. The QS runtime client is deployed on MDM Server so that MDM services can access, through RMI or the web services protocol, the IBM InfoSphere Information Server where the QS jobs are installed.

4.3.1 SIF Parser


SIF Parser supports the source-to-SIF format that is used in the RDP for MDM Direct Load solution. SIF Parser converts the data in a SIF input text message into a persistent transaction Java object before sending the request to the MDM Party Maintenance Services transaction. The transaction name is chosen based on the combination of Record_Type and SubRecord_Type. For example, for Record_Type=P and SubRecord_Type=I, the SIF Parser issues a maintainPartyIdentification transaction, because a SIF record of this type contains the data for the PartyIdentification entity. SIF Parser is enabled by the entry sif_compatibility_mode = on in the DWLCommon_extension.properties file; the MDM RDP Runtime Assets installation script sets this entry automatically.
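The parser's first decision depends only on the first two fields of a record. The following Python sketch mirrors that lookup using the record types of Table 4-4. It returns business object names rather than transaction names, because the text names only one transaction (maintainPartyIdentification for P|I). The dictionary and function names are illustrative assumptions, not the SIF Parser API.

```python
# RECTYPE/SUBTYPE -> InfoSphere MDM Server business object(s), transcribed from Table 4-4.
SIF_BUSINESS_OBJECTS = {
    ("P", "P"): "Party/Person",
    ("P", "O"): "Party/Organization",
    ("P", "H"): "PersonName",
    ("P", "G"): "OrganizationName",
    ("P", "A"): "Address/LocationGroup/AddressGroup",
    ("P", "C"): "ContactMethod/LocationGroup/ContactMethodGroup",
    ("P", "I"): "Identification",
    ("P", "B"): "LOBRelationship",
    ("P", "R"): "PartyRelationship",
    ("P", "M"): "PartyValue",
    ("P", "S"): "PartyPrivacyPreference",
    ("P", "T"): "PartyAlert",
    ("C", "H"): "ContractPlus/Contract",
    ("C", "C"): "ContractComponent",
    ("C", "R"): "ContractPartyRole",
    ("C", "L"): "ContractRoleLocation",
    ("C", "V"): "ContractComponentValue",
    ("C", "M"): "ContractPlus/ContractValue",
    ("C", "A"): "ContractAlert",
}


def business_objects_for(sif_line: str) -> str:
    """Look up the business object(s) for one pipe-delimited SIF record."""
    fields = sif_line.split("|")
    if len(fields) < 2:
        raise ValueError("a SIF record starts with RECTYPE|SUBTYPE")
    key = (fields[0], fields[1])
    if key not in SIF_BUSINESS_OBJECTS:
        raise ValueError(f"unsupported SIF record type {key[0]}|{key[1]}")
    return SIF_BUSINESS_OBJECTS[key]
```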


SIF records hierarchy


SIF Parser supports SIF records listed in Table 4-4.
Table 4-4 SIF Parser supported SIF records
Item  Rectype|Subtype  InfoSphere MDM Server business objects
1     P|P              Party/Person
2     P|O              Party/Organization
3     P|H              PersonName
4     P|G              OrganizationName
5     P|A              Address/LocationGroup/AddressGroup
6     P|C              ContactMethod/LocationGroup/ContactMethodGroup
7     P|I              Identification
8     P|B              LOBRelationship
9     P|R              PartyRelationship
10    P|M              PartyValue
11    P|S              PartyPrivacyPreference
12    P|T              PartyAlert
13    C|H              ContractPlus/Contract
14    C|C              ContractComponent
15    C|R              ContractPartyRole
16    C|L              ContractRoleLocation
17    C|V              ContractComponentValue
18    C|M              ContractPlus/ContractValue
19    C|A              ContractAlert

SIF Parser supports party SIF records hierarchy and contract SIF records hierarchy.


Party SIF hierarchy


Figure 4-4 shows party SIF records hierarchy.

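[Figure 4-4: party SIF records hierarchy with root records P|P and P|O; the figure also shows the P|A, P|C, P|I, P|B, P|M, P|S, P|T, P|H, P|G, and P|R records]

Note: P|R does not have any parent or child SIF record.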

Figure 4-4 Party SIF records hierarchy

Contract SIF Hierarchy


Figure 4-5 shows contract SIF records hierarchy.

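[Figure 4-5: contract SIF records hierarchy with root record C|H; the figure also shows the C|M, C|C, C|T, C|R, C|L, and C|V records]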

Figure 4-5 Contract SIF records hierarchy


SIF metadata
MDM RDP Runtime Assets include 17 SIF metadata files, packaged in SIF.jar in the <RDP_Assets_Home>/MDMRDPRuntime/jar directory. The SIF metadata files for the supported SIF records are listed below. SIF Parser converts SIF input messages to MDM business objects based on these SIF metadata specifications. If the SIF format is customized, the SIF metadata must be updated accordingly.
- Alert.sif
- Contact.sif
- Contract.sif
- ContractComponent.sif
- ContractComponentValue.sif
- ContractPartyRole.sif
- ContractRoleLocation.sif
- ContractValue.sif
- OrganizationName.sif
- PartyAddress.sif
- PartyContactMethod.sif
- PartyIdentification.sif
- PartyLOBRelationship.sif
- PartyPrivPref.sif
- PartyRelationship.sif
- PartyValue.sif
- PersonName.sif

SIF Business Keys


SIF Business Keys are used to determine the hierarchical relationships between SIF records. SIF Business Keys are necessary only for MDM Server business object hierarchies that have three or more levels, such as the C|R, C|V, and C|L records. The SIF Business Keys are defined in SIF.properties, as shown in Example 4-5 on page 99.


Example 4-5 SIF Business Key

#############################################################
#SIF Business Key
###############################################################
BusinessKey.C.C=ContractComponentType,ProductType
BusinessKey.C.V=ContractComponentType,ProductType,DomainValueType
BusinessKey.C.R=ContractComponentType,ProductType,RoleType
BusinessKey.C.L=ContractComponentType,ProductType,RoleType,AddressUsageType
####################################
# SIF SPEC for Business Key
####################################
ContractComponentType=CONTR_COMP_TP_CD
ProductType=PROD_TP_CD
RoleType=CONTR_ROLE_TP_CD
AddressUsageType=ADDR_USAGE_TP_CD
DomainValueType=DOMAIN_VALUE_TP_CD

If customized InfoSphere MDM Server Business Keys are defined in the V_ELEMENTATTRIBUTE table, the related SIF Business Key and SIF SPEC for Business Key entries must be changed accordingly.
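Each key in Example 4-5 extends its parent's key: C|R and C|V start with the C|C key, and C|L starts with the C|R key. A shared key prefix is therefore one plausible way to link a child record to its parent. The following Python sketch, an illustration under that assumption rather than the SIF Parser code, loads both sections of Example 4-5 and computes a record's key. The record dictionaries keyed by SIF field name and the function names are hypothetical.

```python
SIF_PROPERTIES_EXCERPT = """\
BusinessKey.C.C=ContractComponentType,ProductType
BusinessKey.C.V=ContractComponentType,ProductType,DomainValueType
BusinessKey.C.R=ContractComponentType,ProductType,RoleType
BusinessKey.C.L=ContractComponentType,ProductType,RoleType,AddressUsageType
ContractComponentType=CONTR_COMP_TP_CD
ProductType=PROD_TP_CD
RoleType=CONTR_ROLE_TP_CD
AddressUsageType=ADDR_USAGE_TP_CD
DomainValueType=DOMAIN_VALUE_TP_CD
"""


def load_business_keys(text):
    """Return ({(rectype, subtype): [key elements]}, {key element: SIF field name})."""
    keys, spec = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name.startswith("BusinessKey."):
            _, rectype, subtype = name.split(".")
            keys[(rectype, subtype)] = value.split(",")
        else:
            spec[name] = value  # SIF SPEC section: element name -> SIF field
    return keys, spec


def business_key(record, keys, spec):
    """Values of a record's Business Key fields, in the configured order."""
    elements = keys[(record["RECTYPE"], record["SUBTYPE"])]
    return tuple(record[spec[element]] for element in elements)


keys, spec = load_business_keys(SIF_PROPERTIES_EXCERPT)
role = {"RECTYPE": "C", "SUBTYPE": "R", "CONTR_COMP_TP_CD": "1",
        "PROD_TP_CD": "1", "CONTR_ROLE_TP_CD": "2"}
component = {"RECTYPE": "C", "SUBTYPE": "C", "CONTR_COMP_TP_CD": "1", "PROD_TP_CD": "1"}
print(business_key(role, keys, spec), business_key(component, keys, spec))
```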

4.3.2 Data extension and SIF Parser configuration


Most MDM Server implementations and customizations need to extend the default MDM Server data model, and SIF Parser supports MDM Server data extensions. This section describes the SIF metadata, SIF.properties, and SIF input format configuration that allow SIF Parser to work with an MDM Server data extension. Assume that the business requires extending the MDM Server CONTRACTROLE table with two new columns, as follows:
- Base table name: CONTRACTROLE
- Base MDM business object name: TCRMContractPartyRoleBObj
- Data extension table name: XCONTRACTROLE
- Extension column names: xsold_indicator, xsold_store_number
- Extension business object name: XContractPartyRoleBObjExt


SIF metadata configuration


If the CONTRACTROLE table is extended, the ContractPartyRole.sif metadata file must be modified accordingly. The metadata file needs three updates:
1. Define the extension business object to replace the base business object; for example, in Example 4-6, XContractPartyRoleBObjExt replaces TCRMContractPartyRoleBObj.
2. Define the data extension fields in the metadata file, including the primary key field of the data extension table. See the ContractPartyRole Extension Fields section in Example 4-6. The extension fields must be defined before the NULL_XXX fields.
3. Define a NULL field for each data extension field and for the primary key field at the end of the metadata file. See the ContractPartyRole Extension NULL fields section in Example 4-6.
Example 4-6 ContractPartyRole SIF metadata with extension

#######################################################################
# SIF metadata: ContractPartyRole
# Version 1.0.0
##
# Definition:
# column 1: SIF_FIELD_NAME
# column 2: IS_NULLABLE
# column 3: BOBJ_FIELD_NAME (optional)
#   Applies to data fields only, first character must be upper case
# column 4: BOBJ_CLASS (optional)
#   Applies when the data field is not defined in the subType mapped BObj
#   Fully qualified class name if present
######################################################################
RECTYPE N
SUBTYPE N
ADMIN_SYS_TP_CD N
ADMIN_CONTRACT_ID N
LOAD_TYPE N
ADMIN_CLIENT_SYS_TP_CD N AdminSystemType com.dwl.tcrm.coreParty.component.TCRMAdminContEquivBObj
ADMIN_CLIENT_ID N AdminPartyId com.dwl.tcrm.coreParty.component.TCRMAdminContEquivBObj
CONTR_COMP_TP_CD Y ContractComponentType com.dwl.tcrm.financial.component.TCRMContractComponentBObj
PROD_TP_CD N ProductType com.dwl.tcrm.financial.component.TCRMContractComponentBObj
CONTR_ROLE_TP_CD N RoleType com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
REGISTERED_NAME Y RegisteredName
DISTRIB_PCT Y DistributionPercentage
IRREVOC_IND Y IrrevokableIndicator
START_DT Y StartDate
END_DT Y EndDate
RECORDED_START_DT Y RecordedStartDate
RECORDED_END_DT Y RecordedEndDate
SHARE_DIST_TP_CD Y ShareDistributionType
ARRANGEMENT_TP_CD Y ArrangementType
ARRANGEMENT_DESC Y ArrangementDescription
END_REASON_TP_CD Y RoleEndReasonType
## ContractPartyRole Extension Fields
xcontract_role_id N XContractRoleId
xsold_indicator Y XSoldIndicator
xsold_store_number Y XSoldInStoreNumber
NULL_REGISTERED_NAME N
NULL_DISTRIB_PCT N
NULL_IRREVOC_IND N
NULL_END_DT N
NULL_RECORDED_START_DT N
NULL_RECORDED_END_DT N
NULL_SHARE_DIST_TP_CD N
NULL_ARRANGEMENT_TP_CD N
NULL_ARRANGEMENT_DESC N
NULL_END_REASON_TP_CD N
## ContractPartyRole Extension NULL fields
NULL_xcontract_role_id N
NULL_xsold_in_field_ind N
NULL_xsold_in_store_nbr N
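The metadata format in Example 4-6 is a simple column layout, and the rule that data and extension fields come before the NULL_ fields is easy to check mechanically. The following Python sketch parses lines in that format and enforces the ordering rule. It assumes the columns are separated by whitespace (the actual delimiter in the shipped .sif files is not stated here), and the type and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class SifField(NamedTuple):
    name: str                   # column 1: SIF_FIELD_NAME
    nullable: bool              # column 2: IS_NULLABLE (Y/N)
    bobj_field: Optional[str]   # column 3: BOBJ_FIELD_NAME, data fields only
    bobj_class: Optional[str]   # column 4: BOBJ_CLASS, fully qualified


def parse_sif_metadata(text: str) -> list[SifField]:
    fields = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments, including "##" section markers
        cols = line.split()
        if len(cols) < 2 or cols[1] not in ("Y", "N"):
            raise ValueError(f"malformed metadata line: {raw!r}")
        fields.append(SifField(cols[0], cols[1] == "Y",
                               cols[2] if len(cols) > 2 else None,
                               cols[3] if len(cols) > 3 else None))
    return fields


def check_null_fields_last(fields: list[SifField]) -> None:
    """Raise if any data or extension field is defined after a NULL_ field."""
    seen_null = False
    for field in fields:
        if field.name.startswith("NULL_"):
            seen_null = True
        elif seen_null:
            raise ValueError(f"{field.name} is defined after the NULL_ fields")
```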


SIF properties file configuration


A data extension can require several updates to the SIF.properties file:
1. Map each SIF record/sub_record type to the top-level business object of its hierarchy. The top business object is one of the following objects:
- TCRMPersonBObj
- TCRMOrganizationBObj
- TCRMPartyBObj
- TCRMPartyListBObj
- TCRMContractBObj
- TCRMContractPlusBObjExt

If the top business object is not extended, no configuration change is needed in part 1 of SIF.properties. 2. Map each SIF record/sub_record type to its business object. If a business object is extended because its corresponding data model table is extended, the SIF record/sub_record type must be mapped to the extended business object. For the extension in Example 4-6 on page 100, part 2 of the SIF.properties file must be changed as shown in Example 4-7.
Example 4-7 SIF.properties updated for sub_type.C.R

sub_type.C.R=com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
3. Define a child-to-parent relationship navigator between business object classes. Each non-root business object has one definition. TCRMContractBObj is the root business object in the MDM Server application party domain. Based on the SIF metadata in Example 4-6 on page 100, part 4 of SIF.properties must be changed as shown in Example 4-8.
Example 4-8 SIF.properties updated for parent child relationship navigator

navigator.com.dwl.tcrm.coreParty.component.TCRMPartyBObj=com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
navigator.com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt=com.dwl.tcrm.financial.component.TCRMContractComponentBObj
navigator.com.dwl.tcrm.financial.component.TCRMContractRoleLocationBObj=com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
4. Define the SIF Business Key. If customization or data extension changes the MDM Server default Business Keys, the SIF Business Key and the SIF metadata for the Business Key must be redefined in the SIF Business Key and SIF SPEC for Business Key sections of the SIF.properties file.
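Each navigator entry in Example 4-8 reads child=parent, so the parent chain of any business object class can be recovered by following the entries. The following Python sketch is an illustration, not SIF Parser code: it loads the sub_type and navigator entries shown in Examples 4-7 and 4-8 and walks from a class toward the root. Because the examples show only the changed entries, the walk stops at TCRMContractComponentBObj; the full SIF.properties file presumably continues the chain to TCRMContractBObj. The function names are hypothetical.

```python
SIF_PROPERTIES_PART = """\
sub_type.C.R=com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
navigator.com.dwl.tcrm.coreParty.component.TCRMPartyBObj=com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
navigator.com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt=com.dwl.tcrm.financial.component.TCRMContractComponentBObj
navigator.com.dwl.tcrm.financial.component.TCRMContractRoleLocationBObj=com.ibm.cmdm.mdm.extension.component.XContractPartyRoleBObjExt
"""


def load_mappings(text):
    """Split SIF.properties entries into sub_type and navigator (child -> parent) maps."""
    sub_types, parents = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith("sub_type."):
            _, rectype, subtype = key.split(".")
            sub_types[(rectype, subtype)] = value
        elif key.startswith("navigator."):
            parents[key[len("navigator."):]] = value
    return sub_types, parents


def ancestors(bobj_class, parents):
    """Follow child-to-parent links until a class has no navigator entry."""
    chain, seen = [], {bobj_class}
    while bobj_class in parents:
        bobj_class = parents[bobj_class]
        if bobj_class in seen:
            raise ValueError(f"navigator cycle at {bobj_class}")
        seen.add(bobj_class)
        chain.append(bobj_class)
    return chain
```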


SIF input format configuration


The SIF input records must be created to match the corresponding extended SIF metadata. For the metadata in Example 4-6 on page 100, a ContractPartyRole SIF input record has the content shown in Example 4-9, where Y is the value of the xsold_indicator field and 101 is the value of the xsold_store_number field.
Example 4-9 SIF input record created based on SIF metadata in Example 4-6

C|R|1|DSSA1011||1|DSSA1010|1|1|2|||N||||||||||Y|101|0|1|0|1|0|0|0|0|0|0|0|0|0|
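Pairing the record in Example 4-9 with the field order of Example 4-6 shows how each value lines up: 24 data fields (the last three are the extension fields) followed by 13 NULL_ indicators. The following Python sketch performs that positional mapping. It assumes every record ends with a trailing "|", as Example 4-9 does, and it does not interpret the NULL_ indicator values; the names are hypothetical.

```python
# Field order of the extended ContractPartyRole metadata in Example 4-6.
DATA_FIELDS = """RECTYPE SUBTYPE ADMIN_SYS_TP_CD ADMIN_CONTRACT_ID LOAD_TYPE
ADMIN_CLIENT_SYS_TP_CD ADMIN_CLIENT_ID CONTR_COMP_TP_CD PROD_TP_CD CONTR_ROLE_TP_CD
REGISTERED_NAME DISTRIB_PCT IRREVOC_IND START_DT END_DT RECORDED_START_DT
RECORDED_END_DT SHARE_DIST_TP_CD ARRANGEMENT_TP_CD ARRANGEMENT_DESC END_REASON_TP_CD
xcontract_role_id xsold_indicator xsold_store_number""".split()

NULL_FIELDS = """NULL_REGISTERED_NAME NULL_DISTRIB_PCT NULL_IRREVOC_IND NULL_END_DT
NULL_RECORDED_START_DT NULL_RECORDED_END_DT NULL_SHARE_DIST_TP_CD
NULL_ARRANGEMENT_TP_CD NULL_ARRANGEMENT_DESC NULL_END_REASON_TP_CD
NULL_xcontract_role_id NULL_xsold_in_field_ind NULL_xsold_in_store_nbr""".split()

FIELDS = tuple(DATA_FIELDS + NULL_FIELDS)


def map_sif_record(line: str, fields=FIELDS) -> dict[str, str]:
    """Map the pipe-delimited values of one SIF record onto metadata field names."""
    values = line.rstrip("\r\n").split("|")
    if values and values[-1] == "":
        values.pop()  # assumption: every record ends with a trailing "|"
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(values)}")
    return dict(zip(fields, values))


EXAMPLE_4_9 = "C|R|1|DSSA1011||1|DSSA1010|1|1|2|||N||||||||||Y|101|0|1|0|1|0|0|0|0|0|0|0|0|0|"
record = map_sif_record(EXAMPLE_4_9)
print(record["xsold_indicator"], record["xsold_store_number"])
```

A field-count mismatch is the usual symptom of a SIF input file that was not regenerated after the metadata was extended.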

4.3.3 SIF sequencer


The SIF format used in direct load allows records of various types and subtypes to appear in the same input load file, and every line in a SIF input file holds exactly one SIF record; no hierarchy relationship between SIF records is represented. DataStage and QualityStage jobs can process these records because referential integrity on the MDM Server database is dropped during direct load. MDM Party Maintenance Services transactions, however, invoke MDM Server core transactions that check referential integrity, so a child record cannot be loaded through the maintenance transactions before its parent record. SIF Sequencer is built to sort and merge SIF records according to the SIF records hierarchy: it determines the related SIF records and appends them to the end of their parent SIF record. SIF Sequencer has three main objectives:
- Determine related SIF records and append them to the end of their parent SIF record.
- Reduce the number of transactions sent to MDM Server, by letting the maintainParty or maintainContractPlus coarse-grained transaction load the data instead of invoking several granular transactions.
- Improve performance.

Sequenced SIF records order


Sequencer reads the source-to-SIF input SIF file, sorts and appends the input SIF records by source system cross-references, and generates one or more sequenced SIF output files, in the following order: 1. It first appends all the child SIF records to the end of their root SIF records (P|P, P|O, and C|H).


2. It then appends all the grandchild SIF records to their parent, and so on.
If a non-root SIF record does not have a matching parent SIF record, it can still have child and grandchild SIF records; Sequencer appends these related parent-child SIF records together. The sequenced SIF file or files have the following layout:
- Party SIF records are in the top section of the SIF file.
- Party child SIF records that have no matching cross-reference follow the party SIF records. The order among these party child SIF records does not matter.
- P|R SIF records follow the party child SIF records.
- Contract SIF records follow all party child SIF records.
- Contract child and grandchild SIF records follow the contract SIF records.
Example 4-10 shows a SIF file generated by SIF Sequencer.
Example 4-10 SIF Sequencer-generated SIF file

P|PP|HP|HP|AP|CP|CP|IP|IP|MP|T
P|OP|GP|HP|AP|CP|IP|MP|T
P|PP|HP|HP|AP|AP|CP|IP|IP|MP|T
P|PP|H
P|A
P|C
P|M
P|I
...
C|HC|MC|CC|CC|RC|RC|RC|LC|L
C|HC|C...
C|TC|VC|RC|VC|LC|L
C|CC|RC|VC|L
C|T
C|V
C|RC|L
C|R
C|L
C|M

SIF Sequencer can take up to 19 SIF input files, depending on the source system and the tool used to extract the source data. SIF Sequencer can generate one output SIF file with all SIF records in order.


It can also generate three sequenced SIF output files:
- Party.SIF contains all party SIF records and their child SIF records.
- Contract.SIF contains all contract SIF records and their child and grandchild SIF records, plus the C|M SIF records.
- PartyRelationship.SIF contains all party relationship SIF records.
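The following Python sketch illustrates the grouping and layout rules of this section under simplifying assumptions; it is not the Sequencer job's implementation. Each record is reduced to its RECTYPE|SUBTYPE plus a hypothetical cross-reference of its own and of its parent, whereas the real Sequencer matches records by source system cross-references (and, for deeper contract levels, by SIF Business Keys). Within a group it uses a depth-first order (each child followed by its own descendants), which is one reading of the rules; the order shown in Example 4-10 may differ. Concatenating the outputs in the order Party.SIF, PartyRelationship.SIF, Contract.SIF gives the single-file layout.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

ROOT_TYPES = ("P|P", "P|O", "C|H")


@dataclass(frozen=True)
class SifRecord:
    kind: str                         # "RECTYPE|SUBTYPE", for example "P|A"
    ref: str                          # hypothetical cross-reference of this record
    parent_ref: Optional[str] = None  # cross-reference of its parent, if any


def sequence(records: list[SifRecord]) -> dict[str, list[list[str]]]:
    """Group records under their parents and route the groups to output files."""
    by_ref = {r.ref: r for r in records}
    children = defaultdict(list)
    for r in records:
        if r.kind not in ROOT_TYPES and r.kind != "P|R" and r.parent_ref in by_ref:
            children[r.parent_ref].append(r)

    def group(rec: SifRecord) -> list[str]:
        out = [rec.kind]  # the record, then each child with its own descendants
        for child in children[rec.ref]:
            out.extend(group(child))
        return out

    def is_orphan(r: SifRecord) -> bool:
        return r.kind not in ROOT_TYPES and r.kind != "P|R" and r.parent_ref not in by_ref

    layout = [
        lambda r: r.kind in ("P|P", "P|O"),                  # party records on top
        lambda r: r.kind.startswith("P|") and is_orphan(r),  # unmatched party children
        lambda r: r.kind == "P|R",                           # party relationships
        lambda r: r.kind == "C|H",                           # contracts
        lambda r: r.kind.startswith("C|") and is_orphan(r),  # unmatched contract children
    ]
    files = {"Party.SIF": [], "PartyRelationship.SIF": [], "Contract.SIF": []}
    for select in layout:
        for r in records:
            if select(r):
                name = ("PartyRelationship.SIF" if r.kind == "P|R"
                        else "Party.SIF" if r.kind.startswith("P|") else "Contract.SIF")
                files[name].append(group(r))
    return files
```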

Run SIF Sequencer Job


See 4.5.2, "Run SIF Sequencer Job" on page 125 for details about how to run the SIF Sequencer Job.

4.3.4 QualityStage runtime standardization and matching jobs


MDM RDP Runtime Assets contain four QualityStage runtime jobs for person name, organization name, address, and phone number standardization. These jobs let delta load transactions that go through MDM Server use the same standardization rules as RDP for MDM Direct Load. MDM RDP Runtime Assets also contain two QualityStage runtime jobs for person and organization suspect duplicate matching. Their matching algorithm differs from that of the RDP for MDM Direct Load jobs; however, these runtime jobs provide a similar level of functionality. The MDM Server Matching Critical Data Rules Console UI can be used to dynamically manage the critical data match strings for person and organization, and the QualityStage runtime jobs take these selections into account. MDM RDP Runtime Assets also include an adapter and a converter to support the QualityStage runtime jobs.

4.3.5 Search Suspect Candidates rule


The suspect duplicate candidate selection algorithm of the RDP for MDM Direct Load jobs differs from the algorithm of the default candidate selection rule in InfoSphere MDM Server. To provide a similar level of functionality between RDP for MDM Direct Load and RDP for MDM Delta Load, MDM RDP Runtime Assets provide a search suspect candidates rule that overrides the MDM Server default implementation and implements the same blocking algorithm as the direct load.


Blocking fields for each candidate selection pass are described in Table 4-5.
Table 4-5 Blockings in search suspect candidate rule
Pass number | Blocking fields for person match | Blocking fields for organization match
Pass 1 | Last Name Phonetics, Street Name Phonetics, Postal Code | Word 1 Phonetics, Street Name Phonetics, Postal Code
Pass 2 | Last Name Phonetics, Box Number, Postal Code | Word 1 Phonetics, Box Number, Postal Code
Pass 3 | Last Name Phonetics, Rural Route, Postal Code | Word 1 Phonetics, Rural Route, Postal Code
Pass 4 | Last Name Phonetics, Street Name Phonetics, CityName Phonetics | Word 1 Phonetics, Street Name Phonetics, CityName Phonetics
Pass 5 | Last Name Phonetics, Box Number, CityName Phonetics | Word 1 Phonetics, Box Number, CityName Phonetics
Pass 6 | Last Name Phonetics, Rural Route, CityName Phonetics | Word 1 Phonetics, Rural Route, CityName Phonetics
Pass 7 | Last Name Phonetics, National Identifier | Word 1 Phonetics, National Identifier

The search suspect candidates rule takes into account the matching critical data fields selected in the MDM Server Matching Critical Data Rules Console UI. If a critical field, such as street name, is not configured to participate in matching, any pass that includes this field (such as passes 1 and 4) is not used in candidate selection.
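The pass-exclusion rule just described amounts to a set check against each pass. The following Python sketch transcribes Table 4-5 and applies that rule. The field labels are the table's wording, not MDM Server configuration names, and the function name is hypothetical.

```python
# Blocking fields per candidate selection pass, transcribed from Table 4-5.
PERSON_PASSES = {
    1: ("Last Name Phonetics", "Street Name Phonetics", "Postal Code"),
    2: ("Last Name Phonetics", "Box Number", "Postal Code"),
    3: ("Last Name Phonetics", "Rural Route", "Postal Code"),
    4: ("Last Name Phonetics", "Street Name Phonetics", "CityName Phonetics"),
    5: ("Last Name Phonetics", "Box Number", "CityName Phonetics"),
    6: ("Last Name Phonetics", "Rural Route", "CityName Phonetics"),
    7: ("Last Name Phonetics", "National Identifier"),
}
# Organization passes are identical except that Word 1 Phonetics replaces Last Name Phonetics.
ORGANIZATION_PASSES = {
    n: tuple("Word 1 Phonetics" if f == "Last Name Phonetics" else f for f in fields)
    for n, fields in PERSON_PASSES.items()
}


def active_passes(passes, excluded_fields):
    """Pass numbers left after dropping every pass that uses an excluded field."""
    excluded = set(excluded_fields)
    return [n for n, fields in sorted(passes.items()) if excluded.isdisjoint(fields)]


print(active_passes(PERSON_PASSES, ["Street Name Phonetics"]))
```

Excluding Street Name Phonetics leaves passes 2, 3, 5, 6, and 7, which matches the example in the text.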

4.3.6 Disable phonetic keys generation in MDM Server


MDM Server generates Soundex phonetic keys by default, but RDP for MDM Direct Load uses Nysiis phonetic keys. To use the same phonetic keys in delta load, MDM RDP Runtime Assets disable the MDM Server Soundex phonetic key generation and instead take the Nysiis phonetic keys passed in from the QualityStage runtime jobs, forward them to MDM Server, and eventually persist them to the database.


MDM RDP Runtime Assets provide dummy phonetic key generator wrapper classes that override the MDM Server default Soundex phonetic key generator classes, which disables Soundex phonetic key generation in MDM Server, and a converter class that picks up the Nysiis phonetic keys for person name, organization name, and address that are generated and passed in by the QualityStage runtime jobs. The MDM RDP Runtime Assets installation script modifies the MDM Server tcrm_extension.properties file and the CONFIGELEMENT table to replace the Soundex phonetic key generator configuration in MDM Server, so that the Nysiis phonetic keys generated by the QualityStage runtime jobs are used.
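The override pattern described above can be sketched in Python as follows. This is not the MDM Server Java API: the class and function names are hypothetical, soundex() is a standard American Soundex that stands in for the default generator, and the QualityStage key used in the usage lines is a placeholder string, not real Nysiis output.

```python
from typing import Optional

_SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                           "4": "L", "5": "MN", "6": "R"}.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    key, prev = letters[0], _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            key += code
        if c not in "HW":  # H and W do not separate letters with the same code
            prev = code
    return (key + "000")[:4]


class SoundexGenerator:
    """Stands in for the default generator."""
    def generate(self, value: str) -> Optional[str]:
        return soundex(value)


class DisabledGenerator:
    """Dummy generator: computes nothing, like the RDP wrapper classes."""
    def generate(self, value: str) -> Optional[str]:
        return None


def phonetic_key(value: str, qs_key: str, generator) -> Optional[str]:
    """Use the generator's key if it produces one, otherwise the key from QualityStage."""
    key = generator.generate(value)
    return qs_key if key is None else key


print(phonetic_key("Robert", "QSKEY", SoundexGenerator()))   # default behavior
print(phonetic_key("Robert", "QSKEY", DisabledGenerator()))  # key from QualityStage kept
```

With the dummy generator in place, whatever key QualityStage supplies is the one that reaches the database, which is the effect the installation script achieves through configuration.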

4.3.7 MDM RDP Runtime Assets installation


MDM Server and MDM Party Maintenance Services must be installed before you install MDM RDP Runtime Assets. The SIF Sequencer Job must already be installed on the IBM InfoSphere Information Server server; it is installed as part of the RDP for MDM Direct Load DataStage and QualityStage RDP project. The QualityStage components that are required for MDM and QualityStage integration are listed in Table 4-6. These components can be found in the <RDP_Assets_Home>/MDMRDPRuntime/QualityStage directory.
Table 4-6 QualityStage runtime components
Component name | Description
ELMDMQS.dsx | DataStage/QualityStage job export. Contains source code to be imported into your environment through the DataStage/QualityStage Designer Client.
ELMDMQS_ISDProject.xml | WebSphere Information Services Director (ISD) project export. Contains service definitions to be imported into your environment through the InfoSphere Information Server Console.

You must install the QualityStage runtime jobs in ELMDMQS.dsx and deploy the services for the RMI interface by using WebSphere ISD with the ELMDMQS_ISDProject.xml file. The installation steps are similar to those in the Installing DataStage and QualityStage jobs section in Chapter 54 of the InfoSphere Master Data Management Server Version 9.0 Developers Guide (licensed material available with the product); the difference is that you install ELMDMQS.dsx and ELMDMQS_ISDProject.xml instead of MDMQS.dsx and MDMQS_ISDProject.xml. When deploying the ELMDMQS_ISDProject.xml file, you must also edit the operations as shown in Table 4-7, where ISD is Information Services Director and DS is DataStage.
Table 4-7 MDM RDP ISD Operations
ISD operation name | DS job name | Inputs accept array | Input data type | Outputs return array | Output data type
elOrgMatch | RDP_MDMISD_Party_Suspect_Reference_Match_Org | Yes | ELOrgMatchInput | Yes | ELOrgMatchOutput
elPersonMatch | RDP_MDMISD_Party_Suspect_Reference_Match_Person | Yes | ELPersonMatchInput | Yes | ELPersonMatchOutput
standardizePersonName | RDP_MDMISD_Person_Standardization | | PersonInput | | PersonOutput
standardizeAddress | RDP_MDMISD_Address_Standardization | | AddressInput | | AddressOutput
standardizeOrganization | RDP_MDMISD_Organization_Standardization | | OrganizationInput | | OrganizationOutput
standardizePhone | RDP_MDMISD_Phone_Standardization | | PhoneInput | | PhoneOutput

The following installation process does not install the QualityStage runtime jobs; install them separately by following steps similar to those in the Installing DataStage and QualityStage jobs section in Chapter 54 of the InfoSphere Master Data Management Server Version 9.0 Developers Guide (licensed material available with the product). The MDM RDP Runtime Assets installation scripts are grouped by application server and database type. To install the assets, use the following basic steps:
1. Choose the installation scripts for the server and database combination that matches your server environment.
2. Edit the setVariables.sh script as appropriate.
3. Run the install_RDP_Assets.sh script to install the InfoSphere MDM Server Rapid Deployment Package assets.


The install_RDP_Assets.sh script


The install_RDP_Assets.sh script performs the following modifications to the InfoSphere MDM Server application:
- Runs the following SQL scripts to create and modify tables in the MDM Server database:
  - AlterTables.sql
  - Create_IS_Tables.sql
  - Create_Seq_Objects.sql
- Runs one of the following combinations of SQL scripts, depending on the variables defined in the setVariables.sh script:
  - Altered_Compound_Triggers.sql and Altered_Delete_Compound_Triggers.sql
  - Altered_Simple_Triggers.sql and Altered_Delete_Simple_Triggers.sql
- Runs the ELMDM_Configelement.sql script.
- Extracts the following JAR files into a temporary folder for modification:
  - DWLCommonServicesEJB.jar
  - properties.jar
- Extracts the META-INF/MANIFEST.MF file from DWLCommonServicesEJB.jar and edits its class path to include the following JAR files:
  - MDMRDPRuntime.jar
  - SIF.jar
  - ELMDMQS_client.jar
- For WebLogic Application Server, extracts the MANIFEST.MF file from PartyEJB.jar and adds ELMDMQSWS_client.jar to its class path, and updates the ejb-jar.xml file with a reference to the IBM InfoSphere Information Server web services.
- Extracts the tcrm_extension.properties and DWLCommon_extension.properties files from properties.jar, adds the contents of tcrm_extension_ELMDM.properties and DWLCommon_extension_ELMDM.properties to these two properties files, and then puts the files back into properties.jar.
- Adds SIF.properties to properties.jar.
- Adds the following files to the MDM.ear file:
  - MDMRDPRuntime.jar
  - SIF.jar
  - ELMDMQSWS_client.jar
  - ELMDMQS_client.jar
- Adds the following modified JAR files to the MDM.ear file:
  - DWLCommonServicesEJB.jar
  - properties.jar
- Deploys the changes to the InfoSphere MDM Server application.

Installation steps
Perform the following steps to install MDM RDP Runtime Assets:
1. From the RDP FTP distribution site, download the InfoSphere MDM Server RDP assets archive, MDM900_RDPRuntime.tar.gz, to a temporary directory on the server.
2. Extract the archive by using the following commands:
   gzip -d MDM900_RDPRuntime.tar.gz
   tar -xvf MDM900_RDPRuntime.tar
   The TAR file is extracted into several directories.
3. Navigate to the directory for the server and database type that matches your server environment. For example, if your server environment is IBM WebSphere Application Server with IBM DB2, navigate to:
   <RDP_Assets_Home>/MDMRDPRuntime/install/WebSphere/DB2/
4. Edit setVariables.sh to give the variables appropriate values for your environment. These values are parameters that are used in the installation scripts, as shown in Example 4-11.
Example 4-11 Sample values for setVariables.sh
export JAVA_HOME=/usr/IBM/WebSphere/AppServer/java
export NODE_NAME=Node01
export WSADMIN_BIN=/usr/IBM/WebSphere/AppServer/bin
export SERVER_NAME=CAM_MDM900_12032009_1455_DB2_BE10
export APP_NAME=CAM_MDM900_12032009_1455_DB2_BE10
export INSTALL_HOME=/usr/IBM/MDM/CAM_MDM900_12032009_1455_DB2_BE10
export DB_NAME=MDM9QA2
export DB_USER=sarcam10
export DB_PASSWORD=Schema90
export TABLE_SPACE=TABLESPAC
export INDEX_SPACE=INDEXSPAC
export LONG_SPACE=LONGSPACE1
export TRIG=Compound
export DEL_TRIG=TRUE
export ADMIN_USER=cusadmin
export ADMIN_PASSWORD=cusadmin
export IIS_SRV_VERSION=81
export MESSAGING_TYPE=WMQ
export ISP_URL='iiop:\/\/IISserver.ibm.com:2809'


Note: For WebSphere Application Server, ISP_URL supports IIOP URLs only. For WebLogic Application Server, ISP_URL supports only the web services URL, as follows:
export ISP_URL='http:\/\/IISserver.ibm.com:9080'

5. To prevent a file permission error when installing the RDP runtime assets, run the following command in the /MDMRDPRuntime/install folder:
   chmod -R 755 *.sh
6. Run the following script to install the RDP assets:
   ./install_RDP_Assets.sh
   Note: Ignore the following warning, which appears in the console while the script is running:
   WARNING: Duplicate name in Manifest.

This script modifies the existing MDM.ear file and saves a backup copy of the original EAR file. The backup is in the same folder as the modified file and has the .beforeRDP file extension, for example:
MDM.ear.beforeRDP
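Before step 6, it can help to confirm that setVariables.sh sets every variable the installation scripts rely on. The following is a minimal sketch under stated assumptions: check_rdp_variables and RDP_REQUIRED_VARS are hypothetical names that are not part of the RDP package, and the variable list is simply the one shown in Example 4-11.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not shipped with RDP.
# The variable names are the ones shown in Example 4-11.
RDP_REQUIRED_VARS="JAVA_HOME NODE_NAME WSADMIN_BIN SERVER_NAME APP_NAME
INSTALL_HOME DB_NAME DB_USER DB_PASSWORD TABLE_SPACE INDEX_SPACE LONG_SPACE
TRIG DEL_TRIG ADMIN_USER ADMIN_PASSWORD IIS_SRV_VERSION MESSAGING_TYPE ISP_URL"

# Source the given setVariables.sh in a subshell (nothing leaks into the
# caller) and print "missing: NAME" for every required variable that is
# unset or empty. Returns 0 only when all variables are set.
check_rdp_variables() {
  vars_file="$1"
  (
    # Clear inherited values so that only the file's settings are checked.
    for name in $RDP_REQUIRED_VARS; do unset "$name"; done
    . "$vars_file" || exit 2
    missing=0
    for name in $RDP_REQUIRED_VARS; do
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
        echo "missing: $name"
        missing=1
      fi
    done
    exit "$missing"
  )
}
```

Pass the file with an explicit path, for example check_rdp_variables ./setVariables.sh && ./install_RDP_Assets.sh, because the shell's . command searches PATH when it is given a bare file name.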

Redeploying RDP assets


To redeploy MDM RDP Runtime Assets, use the following steps:
1. Restore the original MDM.ear file by using the following commands:
   rm /INSTALL_HOME/MDM.ear
   mv /INSTALL_HOME/MDM.ear.beforeRDP /INSTALL_HOME/MDM.ear
2. Run the install_RDP_Assets.sh script.
3. Check the log files for errors. You can ignore duplicate-insert SQL errors in the database logs; they occur because the SQL scripts insert rows that already exist from the previous installation.
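The restore in step 1 can be wrapped in a small guard so that MDM.ear is never deleted when no backup exists. This is a sketch only: restore_pre_rdp_ear is a hypothetical helper, and it assumes the backup naming described above (MDM.ear.beforeRDP in the INSTALL_HOME directory).

```shell
#!/bin/sh
# Hypothetical helper, not part of RDP: put the pre-RDP EAR back in place
# before re-running install_RDP_Assets.sh. The first argument is the
# INSTALL_HOME directory from setVariables.sh.
restore_pre_rdp_ear() {
  install_home="$1"
  backup="$install_home/MDM.ear.beforeRDP"
  # Refuse to touch MDM.ear if the backup is missing.
  if [ ! -f "$backup" ]; then
    echo "no backup found: $backup" >&2
    return 1
  fi
  rm -f "$install_home/MDM.ear"
  mv "$backup" "$install_home/MDM.ear"
}
```

For example: . ./setVariables.sh && restore_pre_rdp_ear "$INSTALL_HOME" && ./install_RDP_Assets.sh. The install script then creates a fresh .beforeRDP backup.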

4.3.8 MDM Matching Critical Data Rules Console user interface


MDM Matching Critical Data Rules Console user interface (UI) is not part of MDM RDP Runtime Assets, but it is one of the MDM samples that are included with the purchase of MDM Server. After it is installed on the server, it can be used to manage party matching critical data fields dynamically, which means that there is no need to restart the MDM Server to refresh its cache after modifying party matching critical data fields. MDM matching critical data fields can also be changed by directly updating the content of the MDM Server CONFIGELEMENT table. However, if you update them this way, you might need to restart the MDM Server to refresh the server cache. MDM RDP Runtime Assets provides the following document, which describes how to use the Matching Critical Data Rules Console to manage critical data:
MatchingCriticalDataRulesConsole.pdf

Installing Matching Critical Data Rules Console UI


Perform the following steps:
1. Locate the MDM901_Samples.tar.gz archive on your InfoSphere MDM Server distribution media.
2. Open MDM901_Samples.tar.gz and extract the following installable EAR file:
   UI/runtime/CustomerMatchingCriticalDataRules.ear
3. Open CustomerMatchingCriticalDataRules.ear and locate the propertiesUI.jar file, which contains three relevant properties files:
   - mdmUIConfiguration.properties
   - matchingCriticalDataRules.properties
   - ClientAuthentication.properties
4. Open mdmUIConfiguration.properties and edit the entries shown in Example 4-12.
Example 4-12 Edit the entries in mdmUIConfiguration.properties
# The iiop location for the server and port to connect to the MDM Server jndi
#Ex: java.naming.provider.url=corbaloc:iiop:hasufel.torolab.ibm.com:9811
java.naming.provider.url=
# The fully qualified name of the User Group Implementation.
# Currently there are 2 values (user group implementation classes) for this property.
# The 2 possible values can be:
# Ex:
# when deployed on IBM WebSphere use UserGroupImpl=com.ibm.mdm.ui.registry.WASUserGroupImpl
# when deployed on BEA WebLogic use UserGroupImpl=com.ibm.mdm.ui.registry.BEAUserGroupImpl
UserGroupImpl=


5. Open matchingCriticalDataRules.properties and edit the entries shown in Example 4-13.


Example 4-13 Edit the entries in matchingCriticalDataRules.properties
##############################################
# The host where the Config Manager is running
# Ex: cm.host=localhost
##############################################
cm.host=
################################################
# The port where the Config Manager is listening
# Ex: cm.port=9902
################################################
cm.port=
############################
# Inspect the values from MDM Server database:
# [TABLE].[FIELD]
# APPSOFTWARE.NAME
# APPSOFTWARE.VERSION
############################
#Ex:
#mdmServer.appName=InfoSphere Master Data Management
#mdmServer.appVersion=9.0.0
mdmServer.appName=
mdmServer.appVersion=
############################
# Inspect the values from MDM Server database:
# [TABLE].[FIELD]
# APPDEPLOYMENT.NAME
## if there is no value in the database then
# leave the value of this property empty.
# Ex: mdmServer.deployName=
############################
#Ex:
#mdmServer.deployName=CAM_MDM900_12032009_1455_DB2_BE02
mdmServer.deployName=
############################
# [TABLE].[FIELD]
# APPINSTANCE.NAME
## if there is no value in the database then
# leave the value of this property empty.
# Ex: mdmServer.instanceName=
############################
mdmServer.instanceName=


6. Open ClientAuthentication.properties and edit the entries shown in Example 4-14.


Example 4-14 Edit the entries in ClientAuthentication.properties

# ID and password used for MDM client applications
#Ex:
#client.id=mdmClientUser
#client.password=mdmClientPassword
client.id=
client.password=
# The type of application server used.
# Valid values are: WAS, WL
#Ex:
#applicationServerType=WAS
applicationServerType=

7. Add the modified files back into propertiesUI.jar, and add propertiesUI.jar back into the CustomerMatchingCriticalDataRules.ear file.
8. Use the application server's administrative console to install the CustomerMatchingCriticalDataRules.ear file. No custom settings are required during the installation process.
9. After the installation is complete, start the Customer Matching Critical Data Rules user interface.
10. To access the newly deployed user interface, use a web browser to navigate to a URL of the following form:
    http://<host>:<port>/CustomerMatchingCriticalDataRulesWeb/faces/index.jsp
    Replace <host> and <port> in the URL with the appropriate host name and port number.
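Steps 3 through 7 can also be scripted with the JDK jar tool. The following sketch is not part of the MDM samples: set_property and update_ear_property are hypothetical helpers, and the sketch assumes that propertiesUI.jar sits at the root of the EAR and that the properties files sit at the root of propertiesUI.jar. List the archives with jar tf first and adjust the paths if they differ.

```shell
#!/bin/sh
# Hypothetical helper: set one "key=value" line in a .properties file and
# leave every other line unchanged. Commented examples such as "#key=..."
# are not touched. Writes through a temporary file instead of "sed -i",
# which not every sed supports. Values containing "|", "&" or "\" would
# need escaping before being passed in.
set_property() {
  file="$1"; key="$2"; value="$3"
  tmp="$file.tmp.$$"
  sed "s|^$key=.*|$key=$value|" "$file" > "$tmp" && mv "$tmp" "$file"
}

# Hypothetical helper: pull propertiesUI.jar out of the EAR, edit one
# properties file inside it, and put both archives back. Works in the
# current directory, so run it in an empty working folder.
update_ear_property() {
  ear="$1"; props="$2"; key="$3"; value="$4"
  jar xf "$ear" propertiesUI.jar &&
  jar xf propertiesUI.jar "$props" &&
  set_property "$props" "$key" "$value" &&
  jar uf propertiesUI.jar "$props" &&
  jar uf "$ear" propertiesUI.jar
}
```

For example, update_ear_property CustomerMatchingCriticalDataRules.ear matchingCriticalDataRules.properties cm.port 9902 fills in the Config Manager port shown in Example 4-13.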

4.4 Performance tuning for MDM Delta Load using RDP


The overall performance of loading data into the MDM database by using MDM RDP Runtime Assets and MDM Party Maintenance Services depends on the performance of many layers. This section describes performance tuning tips for the MDM BatchProcessor, QualityStage runtime jobs, MDM Server, WebSphere Application Server, the database layer, and the InfoSphere Information Server layer.


4.4.1 MDM BatchProcessor configuration


MDM BatchProcessor is used to load data. To configure the batch processor to use SIF format input files and the SIF parser, edit the following files:
- The batch_extension.properties file, as shown in Example 4-15.
- The Batch.properties file, as shown in Example 4-16.
Example 4-15 Sample update in batch_extension.properties

ParserAndExecConfiguration.Parser = SIF
Example 4-16 Sample update in Batch.properties

ServerConfiguration.provider_url = corbaloc:iiop:<ServerName:portNumber>
ServerConfiguration.context_factory = <CTX_FACTORY>
#Sample values:
ServerConfiguration.provider_url=corbaloc:iiop:gandalf.torolab.ibm.com:9825
ServerConfiguration.context_factory = com.ibm.websphere.naming.WsnInitialContextFactory

Concurrency level
The batch processor can submit concurrent requests to MDM Server. The level of concurrency of the batch processor client is controlled by the number of submitters. If the total number of logical CPUs available to the MDM server (across all servers, in the case of a cluster) is N, and assuming that the MDM server is serving only requests that come from the batch processor, you may choose a number of submitters in the range N to 2N. Based on internal tests, for an MDM server running on an IBM pSeries POWER5 system with eight physical processors (hence 16 logical processors), 24 submitters were observed to be optimal. The default number of submitters is 5. To set the number of submitters, edit the following file:
<MDM_INSTALL_HOME>/BatchProcessor/properties/Batch.properties
As an example, set the number to 24:
Submitter.number = 24
Note: The default number of readers and writers is 1; you do not have to change them.
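The sizing rule above can be written down directly. In this sketch, submitter_range and set_submitters are hypothetical helpers; only the Submitter.number key comes from the text. set_submitters rewrites the file through a temporary copy rather than sed -i, which is not available in every sed implementation.

```shell
#!/bin/sh
# Print the suggested submitter range "N 2N" for N logical CPUs that serve
# only batch processor requests (hypothetical helper).
submitter_range() {
  n="$1"
  echo "$n $((2 * n))"
}

# Replace the Submitter.number line in Batch.properties (hypothetical helper).
set_submitters() {
  file="$1"; count="$2"
  tmp="$file.tmp.$$"
  sed "s/^Submitter\.number *=.*/Submitter.number = $count/" "$file" > "$tmp" &&
    mv "$tmp" "$file"
}
```

With the 16 logical processors of the POWER5 example, submitter_range 16 prints 16 32, and the observed optimum of 24 lies inside that range. On Linux, getconf _NPROCESSORS_ONLN reports the logical CPU count of one server; for a cluster, add up the counts of all members.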


Suspend/Resume threshold
These values indicate the thresholds of percentage heap usage at which the reader suspends or resumes reading. Default values for these thresholds are 20% and 30% respectively. The default values are good if your input requests are in XML format. However, if you are using SIF input, you may increase these thresholds to 30% and 40% respectively. These values are specified in the Batch.properties file.

Suspend duration
When the reader does not see enough free memory, it remains suspended and does not read more inputs. However, it calls garbage collection (GC) to ensure that memory is free of garbage. The duration for which the reader waits before calling GC is controlled by the suspend duration, which has a default value of 200 milliseconds. To reduce the GC overhead, you may want to set this to a higher value, such as 2000 milliseconds, in the Batch.properties file.

Handling successful responses


If you do not need successful responses, set the following property in Batch.properties to false:
setMDMSuccessResponseToQueue=false
This setting prevents the responses of successful transactions from being stored in memory, thus reducing memory usage.

JVM heap size


Set the heap size for the JVM running the batch processor by editing the following file:
<MDM_INSTALL_HOME>/BatchProcessor/bin/runbatch.sh
For most cases, 512 MB of heap is sufficient. Ensure that the java command that this script calls to run batchController actually uses this heap setting.

Logging thresholds
To reduce the logging overhead in the batch processor, set the logging threshold to the Warning or Error level. You can do this by editing the following file:
<MDM_INSTALL_HOME>/BatchProcessor/Log4J.properties
In the file, set the logging threshold to WARN or ERROR, if it is not already set, for example:
log4j.appender.file.Threshold=ERROR
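If you manage several batch processor installations, the threshold change can be scripted. This is a sketch with a hypothetical helper name; the property key is the one given above, and the helper appends the key when the file does not define it yet.

```shell
#!/bin/sh
# Hypothetical helper: set log4j.appender.file.Threshold to WARN or ERROR
# in the batch processor's Log4J.properties file.
set_log_threshold() {
  file="$1"; level="$2"
  case "$level" in
    WARN|ERROR) ;;
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
  tmp="$file.tmp.$$"
  if grep -q '^log4j\.appender\.file\.Threshold *=' "$file"; then
    # Replace the existing threshold line.
    sed "s/^log4j\.appender\.file\.Threshold *=.*/log4j.appender.file.Threshold=$level/" \
      "$file" > "$tmp" && mv "$tmp" "$file"
  else
    # No threshold defined yet: append one.
    echo "log4j.appender.file.Threshold=$level" >> "$file"
  fi
}
```

For example: set_log_threshold <MDM_INSTALL_HOME>/BatchProcessor/Log4J.properties ERROR.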


4.4.2 MDM Server configuration


The MDM Server configuration can be modified through properties files, CONFIGELEMENT table, and Matching Critical Data Rules Console UI.

Logging thresholds
Set the MDM Server logging threshold in Log4J.properties to WARN or ERROR to reduce logging overhead. If WebSphere is the server platform, Log4J.properties is included in the following JAR file:
<WebSphere_Home>/profiles/<NodeName>/installedApps/<CellName>/<InstanceName>/properties.jar
An example of the setting is as follows:
log4j.rootLogger=ERROR, file, stdout

Disable performance monitor


The performance monitor is disabled by default. However, if you enabled it for some reason (for example, to debug a performance problem), be sure to turn it off for normal operations to avoid the overhead of the performance monitor. To disable performance monitoring, make the following changes to the VALUE column of the CONFIGELEMENT table:
- Set the following item to 0:
  /IBM/DWLCommonServices/PerformanceTracking/level
- Set the entries that match the following pattern to false (% stands for the varying part of the entry names):
  /IBM/DWLCommonServices/PerformanceTracking/%enabled

Disable TAIL
MDM Server Transaction Audit Information Log (TAIL) is turned off by default. If you do not need TAIL, make sure that it stays off. To turn TAIL logging off, make the following change to the VALUE column of the CONFIGELEMENT table:
Set /IBM/DWLCommonServices/TAIL/enabled to false

Data standardization
MDM runtime standardization ensures that names, addresses, and phone numbers are stored in MDM Server in the same format, which increases data accuracy.

If the input data is already standardized, turning off runtime data standardization avoids its performance overhead.

Name Standardization
Name standardization can be switched on or off by setting the following item in the CONFIGELEMENT table to false or true, respectively: /IBM/Party/ExcludePartyNameStandardization/enabled

Address and PhoneNumber Standardization


Address standardization can be turned on or off by setting the following indicator in the transaction input requests to N or Y, respectively:
StandardFormattingIndicator
Also set the following items to true in the CONFIGELEMENT table to avoid performing data standardization multiple times in MDM Server and to improve performance:
/IBM/ThirdPartyAdapters/IIS/StandardizeAddress/StandardFormattingIndicator/enabled
/IBM/ThirdPartyAdapters/IIS/StandardizePhoneNumber/StandardFormattingIndicator/enabled

Suspect Duplicate Processing (SDP)


If there are no duplicates in the data, you can switch off SDP to avoid its performance overhead: searching for suspects in clean data is time consuming even though no suspects will be found. You can switch off SDP by setting the value to false for the following entries in the CONFIGELEMENT table:
/IBM/Party/SuspectProcessing/enabled
/IBM/Party/SuspectProcessing/AddParty/returnSuspect
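The CONFIGELEMENT changes described in this section can be collected in one SQL script and reviewed before they are run. This is a sketch under assumptions: it uses only the NAME and VALUE columns mentioned in this chapter, it updates every row that matches a name (check first whether your table holds separate rows per application instance), and the helper names and the --no-duplicates flag are hypothetical. The standardization entries can be added the same way. Because the table is updated directly, restart MDM Server afterward so that its cached configuration is refreshed.

```shell
#!/bin/sh
# Print one UPDATE statement for a CONFIGELEMENT entry. A "%" in the name
# is treated as a LIKE wildcard; otherwise an exact match is used.
config_update_sql() {
  name="$1"; value="$2"
  case "$name" in
    *%*) op="LIKE" ;;
    *)   op="=" ;;
  esac
  echo "UPDATE CONFIGELEMENT SET VALUE = '$value' WHERE NAME $op '$name';"
}

# Performance monitor and TAIL off; SDP off only for data known to be
# free of duplicates (hypothetical --no-duplicates flag).
tuning_sql() {
  config_update_sql /IBM/DWLCommonServices/PerformanceTracking/level 0
  config_update_sql '/IBM/DWLCommonServices/PerformanceTracking/%enabled' false
  config_update_sql /IBM/DWLCommonServices/TAIL/enabled false
  if [ "${1:-}" = "--no-duplicates" ]; then
    config_update_sql /IBM/Party/SuspectProcessing/enabled false
    config_update_sql /IBM/Party/SuspectProcessing/AddParty/returnSuspect false
  fi
}
```

For example, write the statements with tuning_sql --no-duplicates > tuning.sql, review the file, and then run it with db2 connect to MDM9QA2 followed by db2 -tvf tuning.sql (MDM9QA2 is the sample database name from Example 4-11).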

History triggers
If history triggers are enabled, the I/O requirement on DB server is almost doubled. However, if enough I/O bandwidth is provided, the overhead of using history triggers is less than 5%.


4.4.3 WebSphere Application Server configuration


This section provides tips for WebSphere Application Server configuration.

Size of ORB thread pool


Ensure that the ORB thread pool is large enough for the maximum expected concurrency (the maximum number of concurrent RMI calls to MDM Server transactions). You can set it to a value larger than the number of concurrent users (or of submitters in the batch processor). The default maximum size of ORB.thread.pool is 50, which is sufficiently large for most cases. However, if you need to change it, you can do so by using the WebSphere Administration Console: go to Application servers > MDMServerName > Thread pools > ORB.thread.pool.

EJB cache size


Set this value to the maximum number of active enterprise bean instances expected during a typical workload. For MDM Server, set the Enterprise JavaBeans (EJB) cache size to 4000. Do this by using the WebSphere Administration Console: go to Servers > Application servers > [MDMServerName] > EJB Container Settings > EJB cache settings.

JDBC connection pool size


Ensure that the JDBC connection pool is large enough to support the concurrency. The default value of Maximum Connections is 20. You may leave it at the default and use the Tivoli Performance Viewer to determine whether the pool size must be increased: if the number of concurrent waiters is greater than 0 (zero) and the CPU usage is not close to 100%, increase the connection pool size. To change this setting by using the WebSphere Administration Console, go to Resources > JDBC > Data sources > DWLCustomer > Connection pool properties.
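The rule for growing the connection pool can be expressed as a small check on the two values you read from Tivoli Performance Viewer. The helper name is hypothetical, and so is the cut-off: the text says only "not close to 100%", so this sketch treats CPU usage below 90% as headroom unless you pass a different limit.

```shell
#!/bin/sh
# Succeeds (returns 0) when the DWLCustomer pool should be enlarged:
# requests are waiting for connections AND the CPU still has headroom.
# Arguments: concurrent waiters, CPU usage in percent, optional CPU limit
# (the 90% default is an assumption, not a documented value).
should_grow_pool() {
  waiters="$1"; cpu_pct="$2"; cpu_limit="${3:-90}"
  [ "$waiters" -gt 0 ] && [ "$cpu_pct" -lt "$cpu_limit" ]
}
```

For example, should_grow_pool 3 60 succeeds, while should_grow_pool 0 60 (no waiters) and should_grow_pool 3 97 (CPU saturated, so more connections would not help) both fail.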

Prepared statement cache size


This setting specifies the number of statements that can be cached per connection. The WebSphere Application Server data source optimizes the processing of prepared statements and callable statements by caching those statements that are not being used in an active connection. For InfoSphere MDM Server, set it to 300 and monitor it by using Tivoli Performance Viewer to see whether it needs to be increased. To change it by using the WebSphere Administration Console, go to Resources > JDBC > Data sources > DWLCustomer > WebSphere Application Server data source properties.


JVM settings
Change the JVM heap size and GC policy as follows:
1. From the WebSphere Administration Console, go to Servers > Application servers > [MDMServerName] > Java and Process Management > Process Definition > Java Virtual Machine.
2. Set the initial heap size to 512 MB and the maximum heap size to 1024 MB.
3. Specify -Xgcpolicy:gencon under Generic JVM arguments to use the generational concurrent (gencon) GC policy.
4. On the same page of the WebSphere Administration Console, select the check box to enable verbose GC logs. These logs help you understand heap memory usage and garbage collection, and the overhead they cause is small.

4.4.4 Database tuning


MDM Server performance is strongly dependent on the performance of the underlying database layer. The following sections describe basic, but important, areas on which to focus.

Disk configuration
To avoid I/O bottlenecks, make a large number of physical disks or Storage Area Network (SAN) disks (typically configured as a RAID system) available to the database server machine. Use one set of dedicated disks for transaction logs and another set of dedicated disks for table spaces. If possible, use different disk controllers for these two sets of disks, because this gives you the flexibility to configure each controller independently for the I/O pattern seen on its set of disks. Ensure that read and write caching is enabled on the storage system.

Table spaces
Plan the table spaces to ensure that the I/O is balanced across all available disks. If I/O is not balanced, the busiest disk becomes the bottleneck while the bandwidth of the other disks remains unused.


DB2 statistics
While loading data into an empty MDM database, avoid running concurrent users (multithreaded) at the beginning. Load a few records (say, 10000) into the database by using a single thread, and run DB2 statistics (runstats) on your critical data tables before running multiple concurrent users. Run runstats again whenever a large volume of data has been loaded into the database since the last runstats.
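A minimal sketch of the statistics step: generate one RUNSTATS command per critical table and run the resulting file with the DB2 command line processor. The helper name is hypothetical, and the choice of tables is yours; the table names in the usage line below are illustrative, not a definitive list of MDM critical tables.

```shell
#!/bin/sh
# Print a DB2 RUNSTATS command (with distribution and detailed index
# statistics) for each table name given after the schema name.
runstats_sql() {
  schema="$1"; shift
  for table in "$@"; do
    echo "RUNSTATS ON TABLE $schema.$table WITH DISTRIBUTION AND DETAILED INDEXES ALL;"
  done
}
```

For example, after db2 connect to your MDM database: runstats_sql MDMSCHEMA CONTACT PERSON ADDRESS > runstats.sql && db2 -tvf runstats.sql, where MDMSCHEMA is a placeholder for your schema name.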

SQL access plan


Because MDM supports extensions and customizations, you must ensure that the database has the correct indexes in place for all queries, including the customized ones. You can identify the most time-consuming SQL statements from the database snapshot and ensure that their access plans are efficient. For any customizations, use parameterized SQL to take advantage of prepared statement caching and to reduce the number of SQL statement compilations.

Buffer pools
You must monitor database performance by using DB2 snapshots or other tools to measure buffer pool usage, SQL response times, and general I/O rates. The buffer pool hit ratio indicates how often requested data is found in the buffer pool instead of being read from the physical disks. Use buffer pools large enough that the hit ratio is more than 80% for data and more than 90% for indexes.
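The hit ratio targets above can be checked with the usual definition, hit ratio = (1 - physical reads / logical reads) * 100, applied separately to the data and index read counters reported by a buffer pool snapshot. This sketch uses hypothetical helper names and integer arithmetic, so ratios are truncated to whole percentages.

```shell
#!/bin/sh
# Hit ratio in whole percent from logical and physical read counters.
# Prints 0 when there have been no logical reads yet.
hit_ratio() {
  logical="$1"; physical="$2"
  if [ "$logical" -eq 0 ]; then echo 0; return; fi
  echo $(( (logical - physical) * 100 / logical ))
}

# Arguments: data logical reads, data physical reads,
#            index logical reads, index physical reads.
# Reports any ratio at or below the targets (80% data, 90% index) and
# returns non-zero in that case.
check_bufferpool() {
  data=$(hit_ratio "$1" "$2")
  index=$(hit_ratio "$3" "$4")
  status=ok
  [ "$data" -gt 80 ] || { echo "data hit ratio $data% <= 80%"; status=low; }
  [ "$index" -gt 90 ] || { echo "index hit ratio $index% <= 90%"; status=low; }
  [ "$status" = ok ]
}
```

For example, 1000 logical and 100 physical data reads give hit_ratio 1000 100 = 90, which meets the data target.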


4.4.5 Information Services Director job configuration


IBM InfoSphere Information Services Director (ISD) allows QualityStage jobs to be deployed as web services or EJBs. You can deploy and configure these jobs by using the IBM Information Server console. Log on to the IBM Information Server console and expand ELMDMQSService > Operations, as shown in Figure 4-6.

Figure 4-6 Configure QS runtime jobs

Select each operation and configure as described in the following sections.


Default Settings tab


Select the Default Settings tab and configure the following items:
1. Load balancer
   When there are multiple Application Service Backbone (ASB) agents, the load balancer determines to which ASB agent a request is routed. You may choose between round-robin and average response time. For ELMDMQS jobs, response time is selected by default, and you can keep that setting.
2. Max Queue wait
   This item defines the maximum time that a request can wait on the queue. The default is 1000 milliseconds; increase it to a higher value, such as 5000 milliseconds. See highlighted item 1a in Figure 4-6 on page 122. Monitor the SystemErr.log file of MDM Server; if you see a message such as the following one, increase this wait time:
   Queue Wait of 1000 Exceeded
3. Max Queue requests
   If a request cannot be served immediately, it waits in what is called an operation queue. This number is the size of that queue. See highlighted item 1b in Figure 4-6 on page 122. By default, the queue size is set to 3 in ELMDMQS jobs. Change this value according to the following formula:
   Queue Size >= Maximum level of concurrency / Minimum number of job instances
   For example, if the number of submitters is 30 and the minimum number of job instances is set to 2, set the queue size to at least 15. If you see errors such as the following one in the SystemErr.log file of MDM Server, increase the queue size:
   Queue Limit of 3 Exceeded
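The queue-size formula can be applied with integer arithmetic. The sketch rounds up, so that the inequality still holds when the division is not exact; the function name is hypothetical.

```shell
#!/bin/sh
# Smallest Max Queue requests value satisfying
#   Queue Size >= maximum concurrency / minimum number of job instances
# using ceiling division.
min_queue_size() {
  concurrency="$1"; min_instances="$2"
  echo $(( (concurrency + min_instances - 1) / min_instances ))
}
```

min_queue_size 30 2 prints 15, matching the example above. With 24 submitters and the default minimum of one job instance (see the Information Provider settings), the result is 24, far above the default queue size of 3, which is consistent with the Queue Limit of 3 Exceeded error appearing under heavier batch loads.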

Information provider tab


Select the Information Provider tab, then the Provider Properties tab, and review or change the following information:
- Active job instances or JDBC connections
  The active job instances or connections are the minimum and maximum number of jobs that are active at a time. The ASB agent attempts to always keep the number of active instances between these limits. By default, these limits are set to 1 and 5, respectively, for all ELMDMQS jobs. See highlighted item 2a in Figure 4-7 on page 124 as an example.
  You can check the number of active job instances by using the IBM Information Server Director client. You can increase the maximum limit if the number of running instances is at the maximum limit and CPU usage on the IBM Information Server is not near 100%.
- Activation threshold
  The activation threshold is used to determine whether a new job instance needs to be activated. It is based on two parameters: Service Requests and Delay. The default values of these parameters for ELMDMQS jobs are 3 and 1000, respectively, as depicted in Figure 4-7. When the number of service requests in the operation queue is more than Service Requests and remains at that level for at least the duration indicated by Delay, a new job instance is created. The default values should suffice for most cases.

Figure 4-7 Configure QS runtime jobs


4.5 Run Delta Load for MDM using RDP


After the MDM Server application, MDM Party Maintenance Services, and MDM RDP Runtime Assets are installed, deployed, and configured, you are ready to run Delta Load for MDM. IBM Information Server and the DataStage and QualityStage RDP project should already be up and running from the RDP for MDM Direct Load phase. Running Delta Load involves the following tasks:
1. Create source SIF files.
2. Run the SIF Sequencer to generate sequenced SIF files.
3. Run MDM BatchProcessor to load the data into MDM.

4.5.1 Create source SIF files


The RDP for MDM Delta Load solution takes SIF files as input data files. IBM Information Server is used to extract and transform client source data to create the source SIF file. See 2.3, Standard Interface File (SIF) on page 16 for details about creating the SIF file.

4.5.2 Run SIF Sequencer Job


The SIF Sequencer job is included in the DataStage RDP package and was deployed with the RDP project in the direct load phase. It can be invoked in the same way as any direct load job, except that only the IL_000_AutoStart_EX job is run.


Figure 4-8 shows the SIF Sequencer Job in DataStage MDMIS R5.3 project.

Figure 4-8 SIF Sequencer in an MDMIS R5.3 project

SIF Sequencer can be run using either DataStage Director Client or the dsjob command-line command.

The dsjob command


To use the dsjob command, the user environment must be set up correctly so that the required libraries and executables are in their proper paths. The InfoSphere Information Server environment should already have been set up in the RDP for MDM Direct Load phase. The following steps describe how to construct the dsjob command that runs the IL_000_AutoStart_EX SIF Sequencer job on InfoSphere Information Server:
1. Go to the <IIS_Install_Home>/DSEngine/bin directory.
2. Log in as the dsadm user.
3. Put the SIF input file (or files) on the server, in the location defined by the following entry in the MDM Server CONFIGELEMENT table:
   name=/IBM/ELMDM/IIS/Install/SIFInputFiles/path
4. Run the following command:
   dsjob -run -mode NORMAL -wait -warn 0 -param MDM_CONNECTIONS=Default RDP IL_000_AutoStart_EX
   In the command, Default is the parameter value set name, RDP is the DataStage project name, and IL_000_AutoStart_EX is the SIF Sequencer job name.
5. Check the output file on the server, in the location defined by the following entry in the MDM Server CONFIGELEMENT table:
   name=/IBM/ELMDM/IIS/Install/ISDataSetHeaders/path
6. Check the error log file on the server, in the location defined by the following entry in the MDM Server CONFIGELEMENT table (in this example, /home/dsadm/Project/RDP/ERROR):
   name=/IBM/ELMDM/IIS/Install/ErrorFiles/path
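To rerun the sequencer from a script, the command in step 4 can be wrapped so that it is easy to inspect before it runs. This is a sketch: run_sif_sequencer and the DRY_RUN switch are hypothetical, and the defaults are the names used in step 4. The dsjob environment must already be set up, for example by sourcing the dsenv file in the DSEngine directory as the dsadm user.

```shell
#!/bin/sh
# Build the dsjob call from step 4. Arguments (all optional): parameter
# value set name, DataStage project name, job name. With DRY_RUN=1 the
# command is printed instead of executed.
run_sif_sequencer() {
  value_set="${1:-Default}"
  project="${2:-RDP}"
  job="${3:-IL_000_AutoStart_EX}"
  set -- dsjob -run -mode NORMAL -wait -warn 0 \
    -param "MDM_CONNECTIONS=$value_set" "$project" "$job"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

DRY_RUN=1 run_sif_sequencer prints exactly the command from step 4; calling run_sif_sequencer without DRY_RUN runs it.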

4.5.3 Run MDM BatchProcessor


MDM BatchProcessor or another batch framework can be used to read the sequenced SIF files and feed the SIF records into MDM Server. To use the MDM BatchProcessor framework to load data, see 4.4.1, MDM BatchProcessor configuration on page 115 for details.

To load single SIF input data file


In the <MDM_INSTALL_HOME>/BatchProcessor/bin/ directory, run the following command:
runbatch.sh <SIF_FILE_FULLNAME> <LOG_FILE_PATH> <BATCH_EXTENSION_PROPERTYFILE_NAME>
An example is as follows:
runbatch.sh /opt/IBM/MDM/BatchProcessor/seed/person.sif /opt/IBM/MDM/BatchProcessor/seed/logs batch_extension

To load multiple SIF input data files


In the <MDM_INSTALL_HOME>/BatchProcessor/properties/ directory, edit the Batch.properties file to define the SIF input data location, the SIF input data file names, and the log file location. Then run runbatch.sh without any arguments. See Example 4-17 on page 128.


Example 4-17 Batch.properties sample value for loading multiple SIF input files

SIF_INPUT_PATH=/usr/IBM/MDM/BAR_MDM850_12032008_1210_DB2_BE01/BatchProcessor/
SIF_INPUT_FILE_NAMES=Party.sif,Contract.sif,PartyRelationship.sif
SIF_OUTPUT_PATH=/usr/IBM/MDM/BAR_MDM850_12032008_1210_DB2_BE01/BatchProcessor/logs

An example of the runbatch command is as follows:
runbatch.sh
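When the list of files changes from run to run, the three entries in Example 4-17 can be generated. The helper is hypothetical; it keeps the order in which you pass the file names (Example 4-17 lists the party file before the contract and relationship files, so do not rely on alphabetical order) and stops if a file is missing.

```shell
#!/bin/sh
# Print SIF_INPUT_PATH, SIF_INPUT_FILE_NAMES and SIF_OUTPUT_PATH for
# Batch.properties. Arguments: input directory, output (log) directory,
# then the SIF file names in load order. File names must not contain
# commas.
sif_batch_properties() {
  input_dir="${1%/}"; output_dir="$2"; shift 2
  names=""
  for f in "$@"; do
    if [ ! -f "$input_dir/$f" ]; then
      echo "missing SIF file: $input_dir/$f" >&2
      return 1
    fi
    names="${names:+$names,}$f"
  done
  printf 'SIF_INPUT_PATH=%s/\n' "$input_dir"
  printf 'SIF_INPUT_FILE_NAMES=%s\n' "$names"
  printf 'SIF_OUTPUT_PATH=%s\n' "$output_dir"
}
```

For example, sif_batch_properties <input_dir> <log_dir> Party.sif Contract.sif PartyRelationship.sif prints entries in the same shape as Example 4-17, which you can then paste into Batch.properties.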

4.5.4 Check Delta Load result and error messages


<LOG_FILE_PATH> contains three log files:
- batchLoadSuccess.out
- batchLoadFail.out
- batchLoadSuspect.out
Records that failed in Delta Load can be found in the batchLoadFail.out file. This log file records three types of information:
- Index number of the failed record in the original SIF file
- MDM error message
- Original SIF record
Example 4-18 shows a sample batchLoadFail.out file. It reports that the first SIF record (index 0) failed, followed by the MDM response XML and, at the end, the original SIF input record.
Example 4-18 The batchLoadFail.out file
0,<?xml version="1.0" encoding="UTF-8"?>
<TCRMService xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="tCRMResponse.xsd">
  <ResponseControl>
    <ResultCode>FATAL</ResultCode>
    <ServiceTime>35344</ServiceTime>
    <DWLControl>
      <requesterLanguage>100</requesterLanguage>
      <requesterLocale>en</requesterLocale>
      <requestID>559123429080642132</requestID>
    </DWLControl>
  </ResponseControl>
  <TxResponse>
    <RequestType>processTx</RequestType>
    <TxResult>
      <ResultCode>FATAL</ResultCode>
      <DWLError>
        <ComponentType>106</ComponentType>
        <ErrorMessage>Parser DWLTransaction failed. The format of the message is not correct or an application error occurred.</ErrorMessage>
        <ErrorType>READERR</ErrorType>
        <LanguageCode>100</LanguageCode>
        <ReasonCode>4928</ReasonCode>
        <Severity>0</Severity>
        <Throwable>com.dwl.base.requestHandler.exception.RequestParserException: The following is not correct: Type, SubType or their combination</Throwable>
      </DWLError>
    </TxResult>
  </TxResponse>
</TCRMService>,P|P|100001|E243883150001||N||||||||100000||||||||100001|||1986-03-05 00:00:00||||||||||||||1969-05-30 00:00:00||||||100|100000|108|||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|P|A|100001|E243883150001||||||||||2008-12-23 16:46:50||||100001||100002|100000||108|702? TEST COURT|||THORNHILL|L4J9K1||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|P|C|100001|E243883150001|0|||||||||2008-12-23 16:46:53||||100001|100000||||||08152258043||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|P|H|100001|E243883150001||||1||TestFN||||LastNm|||2008-12-23 16:46:51|||||100001|||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|
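For large loads, a quick summary of batchLoadFail.out helps to group failures by MDM reason code. This sketch assumes the layout in Example 4-18: each failed entry starts at the beginning of a line with the record index, a comma, and <?xml, and each <ReasonCode> element is on one line. The function name is hypothetical; adjust the patterns if your file is formatted differently.

```shell
#!/bin/sh
# Print "<record index> <reason code>" for each DWLError reason code in a
# batchLoadFail.out file.
summarize_failures() {
  awk '
    # Start of a failed entry: remember the SIF record index.
    /^[0-9]+,<[?]xml/ { split($0, parts, ","); idx = parts[1] }
    # Extract the number between <ReasonCode> and </ReasonCode>.
    /<ReasonCode>/ {
      code = $0
      sub(/.*<ReasonCode>/, "", code)
      sub(/<\/ReasonCode>.*/, "", code)
      if (idx != "") print idx, code
    }
  ' "$1"
}
```

Against Example 4-18 it prints 0 4928. Piping the output through awk '{print $2}' | sort | uniq -c counts the failures per reason code.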

The successfully loaded SIF records can be found in the batchLoadSuccess.out file. By default, it records only the message ID (the index of the record in the SIF input file). If you want to see the MDM response for each SIF input request, set the following entry in the Batch.properties file:
setMDMSuccessResponseToQueue=true
The batchLoadSuspect.out file is empty for MDM Server version 9.0.1.

RDP for MDM Delta Load is the solution for moderate-volume data loads, and it can also be used for initial high-volume data loads. Delta Load eliminates the need to write custom composite transactions for every client implementation, reducing the time and cost of implementation. Delta Load also makes it possible to change the Suspect Duplicate Processing rules for party matching dynamically, without restarting MDM Server.


Chapter 5. Financial services business scenario


This chapter describes an approach to implementing IBM InfoSphere MDM Server using the InfoSphere Rapid Deployment Package (RDP) on a Linux platform. The scenario uses a financial services business as an example. In this example, the initial load of IBM InfoSphere MDM Server is performed with the RDP for MDM DataStage and QualityStage jobs, and subsequent operational loads are performed by using the MDM Server RDP runtime assets.

This chapter includes the following topics:
- Introduction
- Business requirement
- Environment configuration
- An approach to implementation
- Initial load
- Suspect resolution
- Hierarchies
- MDM consumption application
- Operational processing


5.1 Introduction
Fictional Bank Company T, hereafter referred to as FBankCoT, is a fictitious bank that provides services such as savings, checking, and loans in North America. These services were either developed independently or obtained through acquisitions. As a result, the same customer information could be represented inconsistently in each system, leading to increased costs (such as mailing costs) and poor customer service. To overcome these problems, FBankCoT decided to implement a coexistence model (see footnote 1) of an MDM solution. Some overlap of customer information is expected among the checking, savings, and loans systems; however, a single customer is more likely to have an account in only one or two of the systems.

5.2 Business requirement


The objective is to consolidate the master data of customers in an MDM repository to deliver improved customer service and reduce operational costs. The MDM repository was also required to support regional hierarchies that associate customers with the marketing organizations of the bank. Because master data changes relatively infrequently in the customer environment, end-of-day data latency was considered acceptable: changes to master data in the operational systems are processed at the end of every business day.

Note: Because the objective of this book is to focus on using the RDP for MDM solution for building the MDM repository, we assume a simple data model (a single table) for each of the three systems (savings, checking, and loans).

Footnote 1: With a coexistence model, key master data from one or more data sources is consolidated in the MDM repository, and changes occurring in the data sources are applied to the MDM repository. Synchronization is bidirectional: existing systems provide new or updated data to MDM Server through delta loads, and MDM Server feeds accurate master data back into the existing systems. The latency of the master data in the MDM repository varies by organization and by the frequency of delta (operational) loads. Typically, some new applications obtain master data from the MDM repository, while legacy applications continue to access master data from the existing data sources.


5.3 Environment configuration


The configuration of the FBankCoT environment with the MDM repository is shown in Figure 5-1.

Figure 5-1 FBankCoT environment configuration

Figure 5-1 shows the following information:
- A Linux server, orion.itsosj.sanjose.ibm.com, hosts the following systems:
  - The FBankCoT core services of checking, savings, and loans on a DB2 for LUW V9 database (FBANKCOT) that has one table (CHECKING, SAVINGS, and LOAN) for each of the three systems
  - The MDM repository database (FBANKCOT) containing the MDM data
- Two Linux servers, phoenix.itsosj.sanjose.ibm.com and virgo.itsosj.sanjose.ibm.com, host InfoSphere Information Server Version 8.0.1, split as follows:
  - WebSphere Application Server (Domain), XMETA, and the Information Analyzer database (IADB) on virgo.itsosj.sanjose.ibm.com
  - DataStage Engine on phoenix.itsosj.sanjose.ibm.com
- A Linux server, tarus.itsosj.sanjose.ibm.com, hosts the WebSphere Application Server of IBM InfoSphere MDM Server.


Important: Our objective is to showcase the RDP for MDM implementation on a Linux platform. For convenience, we chose our data sources and target MDM repository to be hosted on a single Linux platform, even though we recognize that in a real production environment these systems are likely to be hosted on an eclectic mix of operating systems, servers, and database management systems. The configuration we use is meant only to showcase the functionality of the RDP for MDM solution, and should in no way be seen as delivering the scalability and performance requirements of your business solution.

5.4 An approach to implementation


In a real production environment, you are likely to perform the following tasks when implementing a coexistence MDM solution using RDP for MDM:

1. Perform a Data Quality Assessment (DQA) of the data sources in your environment to identify Master Data, assess data quality, and determine the system of record (SOR) for your Master Data. Information Analyzer, InfoSphere Discovery, and QualityStage would figure prominently in such an effort.
   Note: DQA is assumed to have occurred, and only the results of this task are presented here.
2. Review the MDM data model and potentially customize it to the specific requirements of your organization. If you customize it, the RDP for MDM jobs, the MDM RDP runtime assets, and the MDM party maintenance services must also be modified.
   Note: Customization of the MDM data model is not covered in this book.
3. Create the code mapping tables from source to SIF, and update the MDM code tables with domain values if appropriate.
4. Create a canonical form from the three data sources in our scenario.
   Note: The canonical form is a concept we invented for this scenario and is not defined in the RDP for MDM solution.


5. Validate the RDP for MDM rule sets against the canonical form created in the previous step, and modify them as needed. If the rule sets are modified, incorporate them into the RDP for MDM jobs.
6. Create the mapping templates (from canonical form columns to the SIF RT/STs).
7. Create the SIF using the mapping templates and the code mapping tables created earlier.
8. Run the RDP for MDM jobs with Standardization and Matching enabled in the configuration parameter file. If errors occur, correct them (see footnote 2) and reprocess.
   Note: In our case, we thoroughly cleaned the data before creating the SIF, so no errors occurred. However, to show the error messages generated by the RDP for MDM jobs, we created other SIFs containing the most frequently occurring errors and ran them through the RDP for MDM jobs. The purpose was to show the correspondence between a particular error and the error messages that the RDP for MDM jobs generate for it. This information is described in Appendix D, Error processing on page 317.
9. Verify the successful loading of the MDM repository by using the MDM Server Reporting facility.
10. Using the MDM Server Data Stewardship UI, resolve any parties that are suspected to be duplicates but were not automatically collapsed in the load jobs.
11. Establish hierarchies that associate customers with marketing organizations within the bank.
12. Integrate the real-time services of MDM Server in your master data-consuming applications. These are typically applications that you already have in your environment, such as sales and marketing, CRM, and operational systems such as savings, checking, and loans.
Footnote 2: You can either correct the errors in the SIF and reprocess the entire SIF, or correct the data in the data sources and re-create the SIF for processing by the RDP for MDM jobs. Reprocessing the entire SIF can be time-consuming, but it is the best approach when the number of errors is high or when the errors reflect problems with creating the SIF from the source. It is also desirable when the total data volume is relatively low, say less than 10 million rows. Correcting the data directly in the SIF can be tedious and error-prone, and should be used only when the number of errors is small; in general, a better way is to fix the data in the source system or in the process that creates the SIF from the source. Another option, depending on the number and nature of the errors reported, is to correct just the records in error and reprocess them later in delta mode; this approach is best when the data volume is high and the number of error records is low. A detailed discussion of the considerations involved is beyond the scope of this book.
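The rules of thumb in footnote 2 can be written down as a small decision function, which makes the trade-off easier to discuss with the team. This Python sketch is only an illustration: the 10 million row threshold comes from the note, while the few_errors cut-off and the function name choose_error_strategy are hypothetical and should be tuned to your own environment.

```python
FULL_REPROCESS = "fix the source or SIF creation, then reprocess the entire SIF"
DELTA_REPROCESS = "correct only the error records, then reprocess them in delta mode"
LOW_VOLUME_ROWS = 10_000_000  # "say less than 10 million rows", from the note


def choose_error_strategy(total_rows: int, error_rows: int,
                          caused_by_sif_creation: bool,
                          few_errors: int = 100) -> str:
    """Suggest how to handle records rejected by the RDP for MDM jobs.

    few_errors is a hypothetical threshold for "the number of error records is low".
    """
    if caused_by_sif_creation or error_rows > few_errors:
        # Many errors, or errors rooted in how the SIF was built: fix at the
        # source and run the whole SIF again.
        return FULL_REPROCESS
    if total_rows < LOW_VOLUME_ROWS:
        # Low total volume: a full reprocess is affordable.
        return FULL_REPROCESS
    # High volume with few errors: rerun only the corrected records as a delta.
    return DELTA_REPROCESS
```

For example, 20 rejected records out of 50 million rows, with no SIF-creation defect, lead to the delta-mode option.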

Chapter 5. Financial services business scenario

135

   We wrote a sample application that provides a 360-degree view of a person by fetching Master Data (such as addresses) from MDM Server through MDM web services, and non-Master Data (such as balances) from the corresponding source systems.
13. Perform operational processing for updates that occur in the source systems after the initial load.

In the next sections, we describe the approach we used to perform the following tasks:
- Initial load
- Suspect resolution
- Hierarchies
- MDM consumption application
- Operational processing

We also ran the RDP for MDM jobs with SIFs containing commonly encountered problems, to show the correspondence between a specific error condition in the SIF and the error messages generated by the RDP for MDM jobs. This information is described in Appendix D, Error processing on page 317.


5.5 Initial load


Figure 5-2 shows a high-level overview of the processing flow of the initial load of the MDM repository.

Figure 5-2 Rapid MDM approach used in the scenario for the initial load
(Flow: a Data Quality Assessment of the data sources, together with the MDM Server key data and domain values, identifies the key data columns and their domain values; data from all sources is merged into a canonical form; customization of the MDM data model, if needed, is not covered in this book; representative sample data is created to verify the adequacy of the RDP rule sets, which are modified if necessary and then used in the RDP jobs; mapping tables are created for each domain value; key columns are mapped to SIF records and the SIF file is created; the MDM repository is populated using the RDP jobs.)

Briefly, a DQA is performed on the data sources to identify the Master Data columns, and the domain values in those columns, for inclusion in the MDM repository. The Master Data columns and corresponding domain values of the MDM data model are reviewed against those of the three data sources. Based on this review, the MDM code reference tables might need to be updated with additional values, and source-to-SIF code mapping tables must be created between the source Master Data columns and the corresponding MDM Master Data columns.


Customization: If the MDM data model needs to be extended to support your organization's Master Data, then the MDM data model and behavior, and the RDP for MDM jobs, must be customized to address your Master Data requirements. In our scenario, customization is out of scope; we only discuss the considerations involved in customizing the MDM data model in Appendix C, MDM customization considerations on page 309.

The Master Data from the data sources is loaded into a canonical form that closely mirrors the format of the SIF records consumed by the RDP for MDM jobs. During this process, you need to ensure that all MDM required columns (as described in Appendix B, Standard Interface File details on page 295) contain valid data, to avoid rejection by the RDP for MDM jobs. It is more efficient to detect and fix these errors early in the cycle (potentially in the source system itself) than much later, after the RDP for MDM jobs have flagged them.

The purpose of creating a canonical form is to have a single format for validating the efficacy of the RDP for MDM rule sets, and to simplify the DataStage jobs that create the SIF, regardless of the number of data sources involved. Typically, the data used for validating the efficacy of the RDP for MDM rule sets is a representative sample of all the data. If the RDP for MDM rule sets are modified to address your organization's data, then these modified rule sets must replace the corresponding default ones in the RDP for MDM jobs.

Ambiguities: In creating this canonical form, an important step is to resolve potential semantic inconsistencies in domain values for a given column across multiple data sources. For example, a GENDER value of 0 might mean female in one source system, while a SEX value of 0 means male in another.
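One way to make such a resolution explicit is to record, per source, what each raw code means, and to translate every value into a single canonical convention. The following Python sketch assumes two hypothetical systems taken from the example above: SYSTEM_A (column GENDER, 0 = female) and SYSTEM_B (column SEX, 0 = male). The system names and the M/F canonical convention are illustrative only.

```python
# Hypothetical per-source meaning of each raw gender code.
GENDER_SEMANTICS = {
    "SYSTEM_A": {"0": "F", "1": "M"},  # column GENDER: 0 = female
    "SYSTEM_B": {"0": "M", "1": "F"},  # column SEX: 0 = male
}


def canonical_gender(source: str, raw):
    """Translate a source gender code into the canonical M/F convention.

    Missing values stay None. Codes without an agreed meaning raise an error so
    that they are resolved with the business instead of being silently guessed.
    """
    if raw is None or str(raw).strip() == "":
        return None
    try:
        return GENDER_SEMANTICS[source][str(raw).strip()]
    except KeyError:
        raise ValueError(f"no agreed meaning for {source} gender code {raw!r}") from None
```

Keeping the semantics in one table, rather than scattered across job logic, also documents the revised meaning for anyone who later queries the canonical form.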
After resolving such ambiguities in the canonical form, ensure that any user querying the canonical form data is aware of the revised semantics, so as not to misinterpret the information retrieved. The data in the canonical form is then loaded into the SIF by using the source-to-SIF column mapping templates that you created and the source-to-SIF code mapping tables generated earlier.

Important: Before the RDP for MDM jobs can be run, you must drop all referential integrity (RI) constraints and triggers defined in the MDM repository. The scripts for dropping and re-creating the triggers and constraints can be generated by querying the metadata in the MDM database catalog tables.
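As a minimal sketch of generating such scripts, the following Python code turns rows from the DB2 for LUW catalog views SYSCAT.REFERENCES and SYSCAT.TRIGGERS into DROP statements, and keeps each trigger's CREATE text (the TEXT column) for replay after the load. It assumes that you run the two queries with your usual client and pass in the rows as tuples; the function names are hypothetical. Re-creating the foreign keys also requires their column lists (for example, from SYSCAT.KEYCOLUSE) and is not shown.

```python
# Run these against the MDM database (for example, with the DB2 command line
# or a Python DB2 driver), binding the MDM schema name to the parameter marker.
FK_QUERY = """
SELECT TABSCHEMA, TABNAME, CONSTNAME
FROM SYSCAT.REFERENCES
WHERE TABSCHEMA = ?
"""

TRIGGER_QUERY = """
SELECT TRIGSCHEMA, TRIGNAME, TEXT
FROM SYSCAT.TRIGGERS
WHERE TABSCHEMA = ?
"""


def quote(ident: str) -> str:
    """Delimit an identifier exactly as stored in the catalog (trailing blanks removed)."""
    return '"' + ident.strip().replace('"', '""') + '"'


def drop_fk_statements(rows):
    """rows: (tabschema, tabname, constname) tuples returned by FK_QUERY."""
    return [f"ALTER TABLE {quote(s)}.{quote(t)} DROP FOREIGN KEY {quote(c)};"
            for s, t, c in rows]


def trigger_scripts(rows):
    """rows: (trigschema, trigname, text) tuples returned by TRIGGER_QUERY.

    Returns (drop_statements, create_statements). Trigger bodies usually contain
    semicolons, so run the CREATE script with an alternate statement terminator.
    """
    drops = [f"DROP TRIGGER {quote(s)}.{quote(n)};" for s, n, _ in rows]
    creates = [text.strip() for _, _, text in rows]
    return drops, creates
```

Save the CREATE statements before running the drops; they are needed to make the MDM repository operational again after the load.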


The RDP for MDM configuration parameters are set to perform standardization and matching in the RDP for MDM jobs, and the created SIF is processed by the RDP for MDM jobs. After all errors have been resolved, the MDM repository is loaded. Verify this by searching for known records in the MDM repository using the MDM Server UI. The referential constraints and triggers must be re-created before the MDM repository can be considered operational and consumable by business applications.

Important: The best approach is to use the Standardization and Matching functionality of the RDP for MDM jobs as much as possible, so as to rapidly deploy your MDM implementation. However, generalized Standardization and Matching functionality might not suit the particular requirements of your organization's data and could therefore require modification. For maximum efficiency, the recommended approach is to validate the efficacy of the RDP for MDM rule sets against your data and customize them if required.

The overall flow is covered in more detail for our particular scenario in the following sections:
- FBankCoT checking, savings, and loans systems
- Data Quality Assessment (DQA)
- Create canonical form from the data sources
- Validate efficacy of the RDP for MDM rule sets and modify to suit
- Create SIF
- Execute RDP for MDM jobs
- Verify successful load

5.5.1 FBankCoT checking, savings, and loans systems


The FBankCoT checking, savings, and loans systems are hosted on a DB2 for Linux, UNIX, and Windows (LUW) V9 database. As mentioned earlier, because we were interested only in the Master Data that needed to be included in the MDM Server solution, we created one table for each system containing all the required Master Data. The DDL of the three tables is shown in Example 5-1 on page 140; the data content of each table is available in Appendix A, Configuration parameter file on page 275. The Master Data columns in each table are highlighted in bold in Example 5-1 on page 140. Note, however, that all the columns are defined as nullable and no primary key is defined. In a real production environment, you would most likely define a primary key for each table.


Note: There is some overlap of customers and Master Data columns among the three systems. In a few cases, such as address, the data might be held in a single column in one system but split over multiple columns in another.
Example 5-1 DDL of the Checking, Savings, and Loan table

CREATE TABLE DB2INST1.CHECKING (
    BALANCE DECIMAL(10, 0),
    RATE DECIMAL(10, 0),
    OVERDRAF_RATE DECIMAL(10, 0),
    OVERDRAF_FEE INTEGER,
    CHECKINGID INTEGER,
    CUSTOMERID INTEGER,
    NAME VARCHAR(255),
    ADDRESS VARCHAR(255),
    COUNTRY VARCHAR(255),
    PHONE VARCHAR(255),
    SSN VARCHAR(255),
    DOB VARCHAR(10),
    DOD VARCHAR(10),
    GENDER VARCHAR(255),
    WORK_STATUS VARCHAR(255),
    PREF_LANGUAGE VARCHAR(255),
    AGEVERIFICATIONDOCUMENT VARCHAR(255),
    AGEVERIFICATIONNB VARCHAR(255),
    NATIONALITY VARCHAR(255),
    CUSTOMER_STATUS VARCHAR(255)
) DATA CAPTURE NONE IN USERSPACE1;

CREATE TABLE DB2INST1.SAVINGS (
    SAVINGSID INTEGER,
    SALUTATION VARCHAR(255),
    NAME VARCHAR(255),
    STREET VARCHAR(255),
    CITY VARCHAR(255),
    COUNTRY VARCHAR(255),
    SSN VARCHAR(255),
    DOB DATE,
    PHONE VARCHAR(255),
    CELLPHONE VARCHAR(255),
    GENDER INTEGER,
    BALANCE DOUBLE,
    RATE DOUBLE,
    OVERDRAF_RATE DOUBLE,
    CO_OWNER VARCHAR(10),
    EFFECTIVE_CUSTOMERDATE DATE,
    SOLICITATIONALLOW VARCHAR(255),
    DRIVERLICENSEID VARCHAR(255),
    CUSTOMER_PERFORMANCE VARCHAR(255)
) DATA CAPTURE NONE IN USERSPACE1;

CREATE TABLE DB2INST1.LOAN (
    LOANID INTEGER,
    CUSTOMERID INTEGER,
    PASSPORTNB INTEGER,
    TITLE VARCHAR(10),
    FIRSTNAME VARCHAR(255),
    LASTNAME VARCHAR(255),
    INITIALS VARCHAR(255),
    STREET VARCHAR(255),
    CITY VARCHAR(255),
    COUNTRY VARCHAR(255),
    EMAIL VARCHAR(255),
    GENDER VARCHAR(255),
    DOB DATE,
    DOD DATE,
    PAYMENT_SCHEDULE INTEGER,
    RATE DOUBLE,
    INITIAL_VALUE INTEGER,
    CREATION_DATE DATE,
    LATE_FEE DOUBLE,
    LATE_RATE DOUBLE,
    BALANCE DOUBLE,
    AUTOMAT_DEBT_IND VARCHAR(255),
    GUARANTOR_ID VARCHAR(255),
    MARRIED_STATUS VARCHAR(255),
    CUSTOMER_STATUS VARCHAR(255)
) DATA CAPTURE NONE IN USERSPACE1;


5.5.2 Data quality assessment


Data quality assessment (DQA) is the process of exposing technical and business data issues in order to plan the data integration effort that is most likely to succeed within budget and time constraints. Technical quality issues, measured against target technical standards, are generally easy to discover and correct. Examples are as follows:
- Different or inconsistent standards in structure, format, or values
- Missing data and default values
- Spelling errors and data in the wrong fields
- Information buried in free-form fields
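Several of these technical issues, such as inconsistent formats and missing values, become visible when each value is reduced to a pattern and the patterns are counted. The following Python sketch uses a simple mask (digits become 9, letters become A, other characters are kept). It is similar in spirit to the format analysis that profiling tools perform, but it is not the exact notation of any particular product, and the function names are hypothetical.

```python
from collections import Counter


def pattern_of(value) -> str:
    """Reduce a value to its shape: digits -> 9, letters -> A, others unchanged."""
    if value is None:
        return "(null)"
    return "".join("9" if ch.isdigit() else "A" if ch.isalpha() else ch
                   for ch in str(value))


def pattern_profile(values) -> Counter:
    """Count how often each pattern occurs in a column."""
    return Counter(pattern_of(v) for v in values)
```

For a phone column, a profile such as {"999-999-9999": 2, "9999999999": 1, "(null)": 1} shows at a glance that two storage formats and missing values coexist.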

Business quality issues, however, are more subjective and are associated with business processes such as generating accurate reports, ensuring that data-driven processes work correctly, and making sure that shipments go out on time. Because accuracy, timeliness, and correctness are subjective measures, assessing the business quality of the data requires the involvement of the business community.

Note: For enterprise-level initiatives, such as ERP implementations or system consolidation, integration challenges at both the business and technical levels generally revolve around the semantic reconciliation of master data objects such as customer, product, and vendor. Because the business is the ultimate recipient and user of the data resulting from the integration effort, the success of a DQA depends greatly on the ability and commitment of the business community to participate in the process and, more importantly, to resolve semantic and business rule differences at the functional level.


Figure 5-3 shows a high-level overview of the primary steps of a DQA process:
- Prepare the data for assessment. Select the data sources to be investigated and analyzed.
- Conduct data discovery. The data analyst and subject matter expert (SME) perform the investigation and analyses by using tools such as IBM InfoSphere Information Analyzer and IBM InfoSphere Discovery. This task involves checking metadata integrity, structural integrity, entity integrity, relational integrity, and domain integrity.
- Document data quality issues and decisions. After all information about data quality is known, the appropriate data alignment and cleansing decisions can be made and implemented.

Note: A DQA typically takes four to eight weeks. In short, focused development efforts, the assessment is kept tight, although it can be ongoing and iterative. In longer development efforts (six or more months), the assessment can typically run six or more weeks and is a key part of requirements definition.

Figure 5-3 DQA approach: data assessment
(The IT data analyst uses Information Analyzer and InfoSphere Discovery for full-volume profiling and automated analysis of the staged source; the resulting information and reports drive data alignment decisions. Analyses by category: metadata and domain integrity (column analysis: completeness, consistency, pattern consistency, translation table creation); structural integrity (table analysis, key analysis); entity integrity (duplicate analysis, targeted data accuracy); relational integrity (cross-table analysis, redundancy analysis); domain integrity (business rule identification and validation).)


The IBM InfoSphere Information Server product provides three tools for data assessment:
- IBM InfoSphere Information Analyzer
  With this product, you can quickly discover the condition of large volumes of data in a fraction of the time that manual analysis would take. Through its Column Analysis, Primary Key Analysis, and Cross-Table Analysis functions, IBM InfoSphere Information Analyzer enables systematic analysis and reporting of results, allowing the data analyst and subject matter expert to focus on the real data quality issues. It enables you to apply professional quality control methods to manage the accuracy, consistency, completeness, and integrity of information stored in databases. By employing technology that integrates total quality management (TQM) principles with data modeling and relational database concepts, IBM InfoSphere Information Analyzer diagnoses data quality problems and facilitates data cleanup efforts.
- IBM InfoSphere QualityStage
  This product complements IBM InfoSphere Information Analyzer by investigating free-form text fields such as names, addresses, and descriptions. With IBM InfoSphere QualityStage, you can define rules for standardizing free-form text domains, which is essential for effective probabilistic matching of potentially duplicate master data records. This level of sophisticated data assessment is critical to understanding the total cleansing effort required for a data integration project. IBM InfoSphere QualityStage is covered in IBM WebSphere QualityStage Methodologies, Standardization, and Matching, SG24-7546.
- IBM InfoSphere Discovery
  This product accelerates information-centric project deployment and reduces risk by creating a 360-degree view of data relationships across heterogeneous sources. Using patented capabilities, InfoSphere Discovery identifies and documents what data you have, where it is located, and how it is linked across systems by intelligently capturing relationships and determining applied transformations and business rules.


Figure 5-4 summarizes the functions provided by IBM InfoSphere Information Analyzer and IBM InfoSphere QualityStage.

Figure 5-4 Data assessment tools functionality
(Integrity categories covered: metadata, domain, structural, relational, and entity integrity, plus ongoing metrics. Information Analyzer and InfoSphere Discovery: metadata access and enrichment, column analysis and domain assessment, primary key analysis, foreign key analysis, cross-domain analysis, automated analysis. Information Analyzer: data rule validation, metrics and reporting, baseline analysis. QualityStage: text pattern analysis, duplicate analysis.)

A discussion of all the steps and benefits of Data Quality Assessment (DQA) is beyond the scope of this book. For details about these steps, see IBM WebSphere Information Analyzer and Data Quality Assessment, SG24-7508. In this scenario, we focus on determining the domain values in the source system columns that need to be mapped to the corresponding columns in the MDM repository.

The domain values found in the source systems might necessitate adding new domain values to the MDM repository (see footnote 3) to accommodate values that exist in the source systems. For example, if a source system rates customers in five categories (1 - 5) and the MDM repository allows only four categories, you must add another category to the MDM repository code reference table for customer rating.

Also, because the SIF must be loaded with the domain values expected by the MDM repository, the process that creates the SIF must map the values in the source systems to the values in the MDM repository. A mapping table is required for each code reference table in the MDM repository. For example, gender might be stored as 0 (female) and 1 (male) in the source systems, while the MDM repository expects M (male) and F (female). This requires a mapping table for gender that maps 0 to F and 1 to M.
Footnote 3: As part of the implementation preparation, the MDM code tables must be populated with appropriate values. The MDM implementation process, the steps to determine what these values should be, and how they are loaded are not within the scope of this book.
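Applying a mapping table is simple; the useful part is reporting the source values that have no mapping, because each one needs either a new mapping row or a new MDM code table value before the SIF is built. The following Python sketch uses the gender mapping from the example above (0 to F, 1 to M); the table name SAVINGS_GENDER_TO_MDM and the function map_codes are hypothetical.

```python
from collections import Counter

# Source-to-MDM code mapping for a gender column, as in the example above.
SAVINGS_GENDER_TO_MDM = {"0": "F", "1": "M"}


def map_codes(values, mapping):
    """Map source codes to MDM codes.

    Returns the mapped list (missing values stay None) and a Counter of source
    values with no entry in the mapping table.
    """
    mapped, unmapped = [], Counter()
    for v in values:
        if v is None:
            mapped.append(None)
            continue
        key = str(v).strip()
        if key in mapping:
            mapped.append(mapping[key])
        else:
            mapped.append(None)
            unmapped[key] += 1
    return mapped, unmapped
```

Running every source column through its mapping table before SIF creation turns a late rejection by the RDP for MDM jobs into an early, countable list of gaps.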


The Column Analysis Frequency Distribution Data report of IBM InfoSphere Information Analyzer is used to determine the valid domain values (see footnote 4) in the various code reference columns in the source systems. Figure 5-5 through Figure 5-7 on page 147 show the report for the GENDER column in the Checking, Savings, and Loan tables, respectively. Figure 5-5 shows an invalid value, X, in the Checking table, which we assume is corrected in the source system. While the Checking and Loan tables have M and F as the domain values, the Savings table has 0 (female) and 1 (male).

Figure 5-5 Frequency Distribution Data report for GENDER column in Checking table

Footnote 4: We assume that the DQA process on the organization's Master Data has identified invalid domain values in the source table columns that correspond to columns in the MDM repository, and that these invalid values have been corrected in the source systems before the mapping tables are created.


Figure 5-6 Frequency Distribution Data report for GENDER column in Savings table

Figure 5-7 Frequency Distribution Data report for GENDER column in Loan table
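To make concrete what these reports contain, the following Python sketch computes a frequency distribution (value, count, percentage of all rows, with nulls shown as (null)) and lists values outside an expected domain, such as the X in the Checking table. It only illustrates the calculation; it is not how Information Analyzer computes or presents the report, and the function names are hypothetical.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, count, percent) rows, most frequent first.

    Missing values are reported as '(null)'; percentages are of all rows,
    including nulls, rounded to one decimal place.
    """
    counts = Counter("(null)" if v is None else str(v) for v in values)
    total = sum(counts.values())
    return [(val, n, round(100.0 * n / total, 1)) for val, n in counts.most_common()]


def unexpected_values(rows, allowed):
    """Values in a frequency distribution that are not in the expected domain."""
    return [val for val, _, _ in rows if val != "(null)" and val not in allowed]
```

For a GENDER column containing M, F, M, a null, and X, the distribution starts with ("M", 2, 40.0), and unexpected_values(rows, {"M", "F"}) returns ["X"].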


Table 5-1 shows the columns that need to be mapped between the sources and the target MDM repository. This list was arrived at after an analysis of the code reference tables in the MDM repository and those in the source systems.
Table 5-1 Code table mapping between the sources and the MDM repository

Common column | Source column | Source domain values | MDM Server domain values | MDM Server column in code table
Country | COUNTRY in CHECKING | US, (null) | 185 and other country codes | COUNTRY_TP_CD in CDCOUNTRYTP
Country | COUNTRY in SAVINGS | US, (null) | 185 and other country codes | COUNTRY_TP_CD in CDCOUNTRYTP
Country | COUNTRY in LOAN | US, (null) | 185 and other country codes | COUNTRY_TP_CD in CDCOUNTRYTP
Customer performance | CUSTOMER_PERFORMANCE in SAVINGS | LOW, MID, HIGH, (null) | 1, 2, 3 | CLIENT_IMP_TP_CD in CDCLIENTIMPTP
Customer status | CUSTOMER_STATUS in LOAN | GOLD, SILVER, BRONZE, (null) | 1, 2, 3, 4 | CLIENT_ST_TP_CD in CDCLIENTSTTP
Customer status | CUSTOMER_STATUS in CHECKING | A, B, C, D, (null) | 1, 2, 3, 4 | CLIENT_ST_TP_CD in CDCLIENTSTTP
Gender | GENDER in CHECKING | M, F, (null) | M, F | Not validated in MDM
Gender | GENDER in SAVINGS | 1, 0, (null) | M, F | Not validated in MDM
Gender | GENDER in LOAN | M, F, (null) | M, F | Not validated in MDM
Marital status | MARRIED_STATUS in LOAN | Married, Single, Divorced, (null) | 1, 2, 3 | MARITAL_ST_TP_CD in CDMARITALSTTP
Nationality | NATIONALITY in CHECKING | US, (null) | 185 and other country codes | COUNTRY_TP_CD in CDCOUNTRYTP
Preferred language | PREF_LANGUAGE in CHECKING | English, (null) | 100 and other language codes | LANG_TP_CD in CDLANGTP
Salutation | SALUTATION in SAVINGS | Mr., Mrs., (spaces), (null) | 14, 15, and other salutation codes | PREFIX_NAME_TP_CD in CDPREFIXNAMETP
Salutation | TITLE in LOAN | Mr., Mrs., (spaces), (null) | 14, 15, and other salutation codes | PREFIX_NAME_TP_CD in CDPREFIXNAMETP


Important: If the MDM repository is populated from the same column (such as gender) in multiple data sources, overlapping values with different semantic meanings are possible. For example, in one system the value 0 might represent a female, while in another system the value 0 might represent a male. When creating the canonical form, such semantic conflicts must be resolved before the column is populated. This situation did not exist in our scenario.

Figure 5-8 on page 151 shows the mapping of the Master Data columns in the source systems to the corresponding columns in the canonical form table shown in Example 5-2 on page 150. In the canonical form, note the following information:
- The CUSTOMERID column is mapped to the ADMIN_CLIENT_ID column in the SIF, which becomes part of the SSK in the MDM data repository. For all practical purposes, it is the primary key for access in the source system.
  Important: There is no CUSTOMERID equivalent column in the Savings system. We therefore generated a value by concatenating the SAVINGSID column (an implicit primary key) with an additional character, and populated the CUSTOMERID column with it. When coding the MDM consumption application, we extract the SAVINGSID component from the CUSTOMERID column whenever the application needs to retrieve non-Master Data from the Savings system, as shown in the JSP application code in Example 5-6 on page 269.
- Two columns (SRCSYSTEMID and ZIPCODE) do not have corresponding columns in the sources. The SRCSYSTEMID column is generated based on the source system whose columns are being mapped (1 for Checking, 2 for Savings, and 3 for Loan); the ZIP code is embedded in other columns in the source systems and is therefore not explicitly mapped.
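The derivation of CUSTOMERID for Savings, and its reversal in the consumption application, can be sketched as a pair of functions. The actual application is the JSP code in Example 5-6; this Python version only illustrates the idea. The SRCSYSTEMID values come from the text, but the text does not say which character was appended or where, so SAVINGS_MARKER = "S" as a suffix is a hypothetical choice, as are the function names.

```python
SRC_CHECKING, SRC_SAVINGS, SRC_LOAN = 1, 2, 3  # SRCSYSTEMID values from the text
SAVINGS_MARKER = "S"  # hypothetical: the actual appended character is not given


def canonical_customer_id(src_system_id: int, row: dict) -> str:
    """Derive the canonical CUSTOMERID (mapped to ADMIN_CLIENT_ID in the SIF)."""
    if src_system_id == SRC_SAVINGS:
        # Savings has no CUSTOMERID column, so build one from SAVINGSID.
        return f"{row['SAVINGSID']}{SAVINGS_MARKER}"
    return str(row["CUSTOMERID"])


def savings_id_from_customer_id(customer_id: str) -> int:
    """Recover SAVINGSID to fetch non-Master Data from the Savings system."""
    if not customer_id.endswith(SAVINGS_MARKER):
        raise ValueError(f"{customer_id!r} was not generated from a SAVINGSID")
    return int(customer_id[: -len(SAVINGS_MARKER)])
```

Whatever marker is chosen, it must never appear in CUSTOMERID values from the other systems; otherwise the reverse lookup can misidentify a Checking or Loan customer as a Savings one.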


Example 5-2 DDL of the canonical form table

CREATE TABLE STAGING.CANONICAL_TBL (
    SRCSYSTEMID INTEGER NOT NULL,
    CUSTOMERID VARCHAR(255) NOT NULL,
    ACCOUNTID VARCHAR(255) NOT NULL,
    WORKSTATUS VARCHAR(255),
    CELLNB VARCHAR(255),
    PHONENB VARCHAR(255),
    EMAIL VARCHAR(255),
    PASSPORTNB VARCHAR(255),
    DRIVERLICNB VARCHAR(255),
    SSN VARCHAR(255),
    FIRSTNAME VARCHAR(255),
    LASTNAME VARCHAR(255),
    INITIALS VARCHAR(255),
    STREETADDRESS VARCHAR(255),
    CITY VARCHAR(255),
    COUNTRY VARCHAR(255),
    ZIPCODE VARCHAR(255),
    DOD DATE,
    DOB DATE,
    MARITALSTATUS VARCHAR(255),
    GENDER CHAR(1),
    NATIONALITY VARCHAR(255),
    CUSTOMERSTATUS VARCHAR(255),
    CUSTOMERPERF VARCHAR(255),
    STARTDATE DATE,
    SOLICITATIONALLOW VARCHAR(255),
    AGEVERIFICATIONDOC VARCHAR(255),
    SALUTATION VARCHAR(255),
    PREF_LANGUAGE VARCHAR(255),
    FREEFORMNAME VARCHAR(255),
    FREEFORMADDRESS VARCHAR(255)
) DATA CAPTURE NONE IN STAGINGSPACE;


Figure 5-8 Mapping from source(s) to canonical form
(Source columns from the CHECKING, SAVINGS, and LOAN tables are mapped to the canonical form columns of Example 5-2. SRCSYSTEMID is assigned a value of 1 (checking), 2 (savings), or 3 (loans) depending on the source; ZIPCODE has no assignment from any of the input sources.)

5.5.3 Create canonical form from the data sources


This section describes the mapping of data from the three source systems to a single canonical form table. The primary steps are as follows:
1. Define the sources to canonical form table target mapping.
2. Populate the canonical form table.


Define the sources to canonical form table target mapping


We used the IBM InfoSphere Information Server FastTrack component, Version 8.0.1, to perform the mapping and generate the DataStage jobs[5]. FastTrack provides data architects and business analysts with a drag-and-drop user interface to InfoSphere Information Server, which they can use to define source-to-target mapping specifications and to define and track additional requirements for data transformations. DataStage jobs[6] and job templates are generated from these mapping specifications. The DataStage developer can test and verify a generated job, modify it according to DataStage development best practices, and then run it to move the source data to the target.

Note: We assume that the DQA has already taken place, and that the required ODBC data sources have therefore been defined for both the source and target systems (Figure 5-9 shows the definitions of the FBANKCOT and IADB data sources). All data sources that were imported using the InfoSphere Information Server console are also available to FastTrack users. The metadata acquired from these data sources is used to identify the target columns and tables in FastTrack and to configure ODBC connectivity in the generated DataStage jobs.

[5] Template jobs for more complex requirements.
[6] In our scenario, the mappings are relatively simple. Therefore, the mapping specifications are used to generate runnable DataStage jobs.


[FBANKCOT]
QEWSD=39715
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMdb222.so
Description=FICTIONAL BANKING COMPANY T DATA SOURCE
AddStringToCreateTable=
AlternateID=
Database=FBANKCOT
DynamicSections=100
GrantAuthid=PUBLIC
GrantExecute=1
IpAddress=9.43.86.101
IsolationLevel=CURSOR_STABILITY
LogonID=db2inst1
Password=itso13sj
PackageOwner=
TcpPort=60000
WithHold=1

[iadb]
QEWSD=39715
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMdb222.so
Description=IADB connection
AddStringToCreateTable=
AlternateID=
Database=iadb
DynamicSections=100
GrantAuthid=PUBLIC
GrantExecute=1
IpAddress=9.43.86.104
IsolationLevel=CURSOR_STABILITY
LogonID=iauser
Password=itso13sj
PackageOwner=
TcpPort=50000
WithHold=1

Figure 5-9 ODBC data sources in the odbc.ini file
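Before generating jobs, it can help to confirm that the data source names the jobs will use are actually defined. The following Python sketch, assuming the odbc.ini layout shown in Figure 5-9, uses the standard configparser module to report missing or empty entries. The choice of required keys is an assumption, and the embedded text is an abbreviated excerpt of Figure 5-9 (the password and several keys are omitted).

```python
import configparser
from typing import Dict, List

# Abbreviated excerpt of the Figure 5-9 entries (password and other keys omitted).
ODBC_INI = """
[FBANKCOT]
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMdb222.so
Database=FBANKCOT
IpAddress=9.43.86.101
TcpPort=60000

[iadb]
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMdb222.so
Database=iadb
IpAddress=9.43.86.104
TcpPort=50000
"""

# Assumption: the keys treated as mandatory for a usable data source.
REQUIRED_KEYS = ("Driver", "Database", "IpAddress", "TcpPort")


def dsn_problems(ini_text: str, dsn_names: List[str]) -> Dict[str, List[str]]:
    """Return, per data source name, the required keys that are missing or empty."""
    # interpolation=None so that values containing '%' are taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    problems: Dict[str, List[str]] = {}
    for dsn in dsn_names:
        if not parser.has_section(dsn):  # section names are case-sensitive
            problems[dsn] = ["<section missing>"]
            continue
        # Option names are case-insensitive in configparser lookups.
        absent = [k for k in REQUIRED_KEYS
                  if not parser.get(dsn, k, fallback="").strip()]
        if absent:
            problems[dsn] = absent
    return problems


if __name__ == "__main__":
    print(dsn_problems(ODBC_INI, ["FBANKCOT", "iadb"]))  # {}
```

To check a real file, pass its contents to dsn_problems instead of the embedded excerpt. In this sketch, section names (and therefore data source names) are case-sensitive, so FBANKCOT and fbankcot are treated as different entries.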

Figure 5-10 on page 154 through Figure 5-22 on page 167 show the windows for using FastTrack to create a specification that maps the SAVINGS source columns to the corresponding CANONICAL target columns, and for generating and configuring the DataStage job for that specification.

Note: The same mapping is also performed for the CHECKING and LOAN sources, but it is not shown here.


The steps are as follows:
1. Figure 5-10 shows the FastTrack login to the appropriate server (virgo) with the user ID isadmin; we assume that this user has the required permissions to access InfoSphere Information Server.

Figure 5-10 Define the sources to canonical form table target mapping (1 of 13)


2. FastTrack source-to-target mapping specifications are contained in projects. We opened a previously created project, named SourceToSif_Canonical, for our mapping specification as shown in Figure 5-11.

Figure 5-11 Define the sources to canonical form table target mapping (2 of 13)


3. Figure 5-12 shows a list of source-to-target mapping specifications defined in this project. Click New Mapping in the Tasks list to create a new mapping specification.

Figure 5-12 Define the sources to canonical form table target mapping (3 of 13)


4. In the Mapping Editor, provide the details of the new mapping specification, such as its Name (SAVINGS_TO_CANONICAL), as shown in Figure 5-13. Click Column Mappings in the Basic section of the tab list to map the columns.

Figure 5-13 Define the sources to canonical form table target mapping (4 of 13)


5. As shown in Figure 5-14, open the Database metadata tab, expand the metadata tree under the target host (ORION.ITSOSJ.SANJOSE.IBM.COM), and navigate to the target canonical table (STAGING.CANONICAL_TBL in the FBANKCOT database). Drag this table to the Target Columns field of the mapping canvas. This populates the field with the columns of STAGING.CANONICAL_TBL, as shown.


Figure 5-14 Define the sources to canonical form table target mapping (5 of 13)


6. As shown in Figure 5-15, open the Database metadata tab, expand the metadata tree, and navigate to the required source table (STAGING.SAVINGS in the FBANKCOT database on the source host ORION.ITSOSJ.SANJOSE.IBM.COM). Select the required source columns, such as CITY, and drag each one onto the Source Columns field of the mapping canvas that corresponds to its target column (CITY in this case), as shown.


Figure 5-15 Define the sources to canonical form table target mapping (6 of 13)


7. Repeat the process for the remaining columns. However, for the target columns CUSTOMERID and SRCSYSTEMID, we must define a transformation function, as follows:

As mentioned previously, the CUSTOMERID column is generally mapped to the ADMIN_CLIENT_ID column in the SIF, which becomes part of the SSK in the MDM data repository; for all practical purposes, it is the primary key for access in the source system. Because there is no CUSTOMERID equivalent column in the Savings system, we generated a value by concatenating the SAVINGSID column (an implicit primary key) with an additional character (1) and populated the CUSTOMERID column with it. To do so, enter a value of SAVINGSID: 1 in the Transformation Function field for the target CUSTOMERID column, as shown in Figure 5-16 on page 161 (in DataStage expressions, the colon is the string concatenation operator). The MDM consumption application takes this into account when it must retrieve non-Master Data from the Savings system, as shown in the JSP application code in Example 5-6 on page 269.



Figure 5-16 Define the sources to canonical form table target mapping (7 of 13)

For the target column SRCSYSTEMID, a constant value (2) is required to indicate that SAVINGS is the source system. To achieve this, enter a value of 2 in the Transformation Function field for the target SRCSYSTEMID column, as shown in Figure 5-17 on page 162. The figure also shows the Transformation Function field with a value of SetNull() for some of the columns, such as CUSTOMERSTATUS and STARTDATE.

Note: The mapping specification is complete when all the columns have been mapped correctly.


Click Save and then Close.


Figure 5-17 Define the sources to canonical form table target mapping (8 of 13)


8. Select the newly created mapping specification SAVINGS_TO_CANONICAL and click Generate Job in the Tasks list as shown in Figure 5-18.

Figure 5-18 Define the sources to canonical form table target mapping (9 of 13)


9. For the Composition Type, select No Composition, as shown in Figure 5-19, because we are only generating a job from a single mapping specification. Click Next.

Figure 5-19 Define the sources to canonical form table target mapping (10 of 13)


10. Select the project (SourceToSIF) and folder (SourceToCanonical) in which to save the generated job (with the New Job name SAVINGS_TO_CANONICAL), as shown in Figure 5-20, and click Next.

Figure 5-20 Define the sources to canonical form table target mapping (11 of 13)


11. Define the connection details for the generated job in the Job parameters, as shown in Figure 5-21. Job parameters are used to pass database connection data such as the user name and password. These database connections typically consist of the source database, the target database, and the lookup data source (if lookups are implemented). Navigate to each data source and enter the appropriate job parameter names for it, supply the required user name and password parameters, and click Finish to generate the DataStage job.


Figure 5-21 Define the sources to canonical form table target mapping (12 of 13)
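The three kinds of transformation function used in step 7 (a constant for SRCSYSTEMID, the concatenation SAVINGSID: 1 for CUSTOMERID, and SetNull() for columns such as CUSTOMERSTATUS and STARTDATE) can be modeled outside FastTrack to reason about the expected output. The following Python sketch is an abbreviated, hypothetical rendering of the SAVINGS_TO_CANONICAL specification, not the generated DataStage job. The ACCOUNTID derivation is an inference from Example 5-3, where ACCOUNTID 20000016 appears alongside CUSTOMERID 200000161. The sketch also shows the reverse step that the consumption application must perform; it is not the JSP code of Example 5-6.

```python
from typing import Any, Callable, Dict

Row = Dict[str, Any]

SAVINGS_SUFFIX = "1"  # the extra character appended to SAVINGSID

# Abbreviated, hypothetical rendering of SAVINGS_TO_CANONICAL:
# target column -> derivation applied to one SAVINGS source row.
SAVINGS_TO_CANONICAL: Dict[str, Callable[[Row], Any]] = {
    "SRCSYSTEMID": lambda r: 2,                               # constant: Savings
    "CUSTOMERID": lambda r: r["SAVINGSID"] + SAVINGS_SUFFIX,  # SAVINGSID: 1
    "ACCOUNTID": lambda r: r["SAVINGSID"],                    # inferred from Example 5-3
    "CITY": lambda r: r["CITY"],                              # straight column mapping
    "CUSTOMERSTATUS": lambda r: None,                         # SetNull()
    "STARTDATE": lambda r: None,                              # SetNull()
}


def apply_mapping(spec: Dict[str, Callable[[Row], Any]], row: Row) -> Row:
    """Derive every target column of one canonical row from a source row."""
    return {target: derive(row) for target, derive in spec.items()}


def savings_id_from_customer_id(customer_id: str) -> str:
    """Reverse the CUSTOMERID derivation by stripping the fixed-length suffix."""
    if not customer_id.endswith(SAVINGS_SUFFIX):
        raise ValueError(f"not a generated Savings CUSTOMERID: {customer_id!r}")
    return customer_id[: -len(SAVINGS_SUFFIX)]


if __name__ == "__main__":
    canonical = apply_mapping(SAVINGS_TO_CANONICAL,
                              {"SAVINGSID": "20000016", "CITY": "San Jose"})
    print(canonical["CUSTOMERID"])                               # 200000161
    print(savings_id_from_customer_id(canonical["CUSTOMERID"]))  # 20000016
```

Stripping exactly one trailing character, rather than searching for a delimiter, works because the suffix has a fixed length; a SAVINGSID that itself ends in 1 is still recovered correctly.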

The generated job shown in Figure 5-22 on page 167 is CHECKING_TO_CANONICAL rather than SAVINGS_TO_CANONICAL; we captured the wrong window by mistake.


Figure 5-22 Populate the canonical form table (13 of 13)

Populate the canonical form table


After the DataStage jobs have been generated (and modified, if necessary, to meet additional requirements), they must be compiled and run. The jobs can either be run one by one from the Director client, or a job sequence can be built to control them.

Notes:
As mentioned earlier, we mistakenly captured the figure of the CHECKING_TO_CANONICAL job instead of SAVINGS_TO_CANONICAL. The execution of the CHECKING_TO_CANONICAL job is shown in Figure 5-23 and Figure 5-24 on page 168.
Figure 5-22 on page 167 shows the generated job CHECKING_TO_CANONICAL in Designer. Changes to the job design, specifically to the Transformer stage of the generated job, are of particular interest because this stage implements the derivations and source-to-target mappings. We did not make any modifications.
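Instead of starting each job from Director, the jobs can also be run from a command line, for example from a script on the engine host. The following Python sketch only builds dsjob command lines; it assumes the dsjob -run syntax with the -param name=value and -jobstatus options, which you should verify against the dsjob documentation for your release. The parameter names and values are hypothetical, and passwords are deliberately omitted.

```python
import shlex
from typing import Dict, List


def dsjob_run_command(project: str, job: str, params: Dict[str, str]) -> List[str]:
    """Build a dsjob command line that runs one job with parameter values.

    -jobstatus asks dsjob to wait for the job and to derive its exit code
    from the job's finishing status.
    """
    cmd = ["dsjob", "-run"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    cmd += ["-jobstatus", project, job]
    return cmd


if __name__ == "__main__":
    # Hypothetical parameter names; use the ones defined in your generated jobs.
    params = {"SRC_USER": "db2inst1", "TGT_USER": "db2inst1"}
    for job in ("CHECKING_TO_CANONICAL", "SAVINGS_TO_CANONICAL"):
        print(shlex.join(dsjob_run_command("SourceToSIF", job, params)))
```

Returning an argument list rather than a single string keeps values with spaces intact if the list is later passed to subprocess.run on a host where dsjob is installed. A DataStage job sequence achieves the same ordering and failure handling without external scripting.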


Figure 5-23 and Figure 5-24 on page 168 show the windows for running the generated job. After all the sources were processed, the canonical form table contained the data partially shown in Example 5-3 on page 169. The steps are as follows:
1. Open the Director client.
2. Navigate through the job folders to the newly generated CHECKING_TO_CANONICAL job, as shown in Figure 5-23.

Figure 5-23 Populate the canonical form

3. In the Job Run Options window, set the appropriate values for the job parameters and click Run, as shown in Figure 5-24.

Figure 5-24 Populate the canonical form table 3/4

4. Review the job logs in Director to check the job's execution.


5. The data from all the sources must be loaded into CANONICAL_TBL. Partial contents of this table are shown in Example 5-3.
Example 5-3 Partial contents of CANONICAL_TBL
3,"8000037","30000004",,,,"dfx@usa.ibm.com","608813863",,,"Denise ","Farrel","DF","1735 Saratoga Ave","San Jose","US",,,19860902,"Divorced","F",,,"GOLD ",,,,"Mr.",,,
3,"8000885","30000035",,,,"Kelly@gmail.com","743118324",,,"Kelly ","Hopkins","KH","1482 Rhode Island St","San Francisco","US",,,20010317,"Married","F",,,"BRONZE ",,,,"Mrs.",,,
3,"8000637","30000029",,,,"Burr@gmail.com","111695965",,,"Burr ","Preston","BP","3726 Broderick St","San Francisco","US",,,19890725,"Divorced","M",,,"BRONZE ",,,,,,,
3,"8000002","30000016",,,,"Allan@gmail.com","921319004",,,"Allan ","Jensen","AJ","PO Box 7424","San Francisco","US",,,19570329,"Married","M",,,"BRONZE ",,,,"Mrs.",,,
...
2,"200000161","20000016",,"(415) 923-1998","(408) 269-0922",,,"S99887766","111-345-2312",,,,"2584 Junction Ave","San Jose","US",,,19971102,,"0",,,"HIGH ",,"Y",," ",,"A Fanelli",
2,"200000071","20000007",,"(408) 919-1500","(999) 999-9999",,,"S12312311",,,,,"1603 Bel Air Ave","San Jose","US",,,19890823,,"0",,,"MID ",,"Y",," ",,"Anna Fanelli",
2,"200000001","20000000",,"(408) 236-2527","(999) 999-9999",,,"S45118674","232-22-4444",,,,"6177 Purple Sage Ct","San Jose","US",,,19370802,,"1",,,"MID ",,"Y",," ",,"Bruce H Anderson",
2,"200000121","20000012",,"(415) 561-8511","(408) 782-7100",,,"S22334455","112-99-1212",,,,"321 Curie Drivee","San Jose","US",,,19750902,,"1",,,"MID ",,"Y",,"Mrs.",,"Torben Andersom",
2,"200000171","20000017",,"(415) 673-4598","(408) 919-1500",,,"E123456789","456-34-4563",,,,"5528 Muir Dr","San Jose","US",,,19900314,,"1",,,"MID ",,"N",,"Mrs.",,"A Carter",
...
1,"70006245","10000022","Empl",,"(415) 296-9450",,,"xxxxxxxx","133-34-2345",,,,,,"US",,,19770918,,"M","US","B ",,,,"Drivers License",,"English","Andrew I Jensen","44 Montgomery St,Ste 3705,San Francisco,94104"
1,"70004432","10000006","Empl",,"(408) 850-6400",,,"S67856745","",,,,,,"US",,,19760802,,"M","US","C ",,,,"Drivers License",,"English","Gayle Fagan","2315 N 1st St,,San Jose,95119"
1,"70002305","10000023","Empl",,"(415) 282-0219",,,"S12312311","234-45-3434",,,,,,"US",,,19451022,,"F","US","B ",,,,"Drivers License",,"English","Anette A Jensen","77 Grand View Ave,Apt 202,San Francisco,94114"
1,"70002268","10000004","Empl",,"(800) 817-8232",,"134785432",,"xxx-xx-xxxx",,,,,,"US",,,19980803,,"M","US","B ",,,,"Passport",,"English","Anders Olsson","2050 North First Street,,San Jose,95119"
1,"70006863","10000020","Empl",,"(415) 683-0763",,,"xxxxxxxx","123-45-6789",,,,,,"US",,20081001,19670502,,"M","US","B ",,,,"Drivers License",,"English","Aaron Jensen","1363 14th Ave,,San Francisco,94122"
1,"70007096","10000027","Empl",,"(415) 677-9723",,"111345674",,"",,,,,,"US",,,19960411,,"M","US","A ",,,,"Passport",,"English","Allan Preston","720 Market St,Ste 900,San Francisco,94102"
1,"70005799","10000008","Empl",,"(800) 553-6387",,,"S98765432","345-34-2378",,,,,,"US",,,19370901,,"F","US","C ",,,,"Drivers License",,"English","Arcangelo Fanelli","170 W Tasman Dr,,San Jose,95119"
1,"70003060","10000024","Empl",,"(415) 586-7966",,,"S34565422","123-22-2222",,,,,,"US",,,19660825,,"X","US","B ",,,,"Drivers License",,"English","Anton T & Larue Jensen","258 Lisbon St,,San Francisco,94112"
1,"70005333","10000001","Empl",,"(408) 226-2327",,,"S13494673","453-42-1234",,,,,,"US",,,19450312,,"F","US","B ",,,,"Drivers License",,"English","Christina Anderson","6181 Camino Verde Dr,,San Jose,95119"
1,"70007859","10000002","Empl",,"(408) 782-7100",,,"S33433434","543-23-9999",,,,,,"US",,,19671201,,"F","US","B ",,,,"Drivers License",,"English","Alexandra Anderson","321 Curie Drivee,,San Jose,95119"
...
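In the excerpt in Example 5-3, the Checking rows (SRCSYSTEMID 1) leave CITY empty and carry the city only inside FREEFORMADDRESS, whereas the Savings and Loan rows populate CITY directly. A quick profiling pass can surface this before any standardization is attempted. The following Python sketch assumes that the table has been exported as comma-separated text in the column order of Example 5-2 (the export format is an assumption); the three sample rows are copied from Example 5-3.

```python
import csv
import io
from collections import Counter

# Column order of STAGING.CANONICAL_TBL (Example 5-2).
CANONICAL_COLUMNS = [
    "SRCSYSTEMID", "CUSTOMERID", "ACCOUNTID", "WORKSTATUS", "CELLNB", "PHONENB",
    "EMAIL", "PASSPORTNB", "DRIVERLICNB", "SSN", "FIRSTNAME", "LASTNAME",
    "INITIALS", "STREETADDRESS", "CITY", "COUNTRY", "ZIPCODE", "DOD", "DOB",
    "MARITALSTATUS", "GENDER", "NATIONALITY", "CUSTOMERSTATUS", "CUSTOMERPERF",
    "STARTDATE", "SOLICITATIONALLOW", "AGEVERIFICATIONDOC", "SALUTATION",
    "PREF_LANGUAGE", "FREEFORMNAME", "FREEFORMADDRESS",
]

# One row per source system, copied from Example 5-3.
SAMPLE = '''\
3,"8000037","30000004",,,,"dfx@usa.ibm.com","608813863",,,"Denise ","Farrel","DF","1735 Saratoga Ave","San Jose","US",,,19860902,"Divorced","F",,,"GOLD ",,,,"Mr.",,,
2,"200000161","20000016",,"(415) 923-1998","(408) 269-0922",,,"S99887766","111-345-2312",,,,"2584 Junction Ave","San Jose","US",,,19971102,,"0",,,"HIGH ",,"Y",," ",,"A Fanelli",
1,"70006245","10000022","Empl",,"(415) 296-9450",,,"xxxxxxxx","133-34-2345",,,,,,"US",,,19770918,,"M","US","B ",,,,"Drivers License",,"English","Andrew I Jensen","44 Montgomery St,Ste 3705,San Francisco,94104"
'''


def empty_city_by_source(export_text: str) -> Counter:
    """Count rows whose CITY column is blank, grouped by SRCSYSTEMID."""
    counts: Counter = Counter()
    for fields in csv.reader(io.StringIO(export_text)):
        if not fields:
            continue
        row = dict(zip(CANONICAL_COLUMNS, fields))
        if not row.get("CITY", "").strip():
            counts[row["SRCSYSTEMID"]] += 1
    return counts


if __name__ == "__main__":
    print(dict(empty_city_by_source(SAMPLE)))  # {'1': 1}
```

The csv module handles the quoted FREEFORMADDRESS values, which contain embedded commas; a naive split on commas would shift every column after the address.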

5.5.4 Validate and modify efficacy of the RDP MDM rule sets
As mentioned previously, the purpose of creating a canonical form was to have a single format, regardless of the number of data sources involved, both for validating the efficacy of the RDP for MDM rule sets and for simplifying the DataStage jobs that create the SIF. The data used for validating the efficacy of the RDP for MDM rule sets should be a representative sample of all the data.

Note: In our test environment, the volume of data was quite small, so we used all of it as input to this process.
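When the data volume is too large to validate in full, one way to keep the sample representative is to take the same fraction from each source system, so that the formatting quirks of a small source are not drowned out by a large one. The following Python sketch of stratified sampling by SRCSYSTEMID illustrates that assumption; it is not a procedure from this book (we used all of our data).

```python
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")


def stratified_sample(rows: List[T], key: Callable[[T], Hashable],
                      fraction: float, seed: int = 0) -> List[T]:
    """Sample the same fraction from every stratum (for example, each source).

    Each non-empty stratum contributes at least one row, so small sources
    are never dropped from the validation sample.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # seeded so that the sample is repeatable
    groups: Dict[Hashable, List[T]] = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    sample: List[T] = []
    for stratum in sorted(groups, key=str):
        members = groups[stratum]
        n = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, n))
    return sample


if __name__ == "__main__":
    rows = ([{"SRCSYSTEMID": 1, "id": i} for i in range(10)]
            + [{"SRCSYSTEMID": 2, "id": i} for i in range(4)])
    picked = stratified_sample(rows, lambda r: r["SRCSYSTEMID"], 0.5)
    print(len(picked))  # 7
```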


If the RDP for MDM rule sets are modified to address your organization's data, these modified rule sets must replace the corresponding default rule sets in the RDP for MDM jobs.

Important: We adopted this approach after an unsuccessful attempt to run the RDP for MDM jobs directly on the canonical form data (without this validation step). In that attempt, the RDP for MDM jobs rejected a majority of the rows because the critical CITY field in the SIF record was empty as a result of standardization errors. We then introduced this step of validating the canonical form data with the RDP for MDM rule sets, intending to modify the default RDP for MDM rule sets to address potential problems with the CITY field in particular. In the end, because of time constraints, we chose to override only the USPREP rule set (which is not one of the RDP for MDM rule sets) to ensure that the CITY name in the input was passed on to the SIF record appropriately. We did not make any changes to improve the quality of the standardization, such as modifying the classifications or adding other overrides to fix problems like the misspellings Drivee and Avedue. Be sure to perform the overrides needed to correct such problems as well, so that quality data is loaded into your MDM data repository. This is why we describe the process of exporting the RDP for MDM rule sets and importing them back after they are changed.

In this section, we describe the following tasks:
Importing out-of-the-box (OOTB) RDP for MDM rule sets into a DataStage project
Validating RDP for MDM rule sets in the standardization job
Overriding input patterns, rerunning the standardization job, and exporting the modified rule set
Importing modified RDP for MDM rule sets into RDP for MDM jobs

Note: As mentioned previously, the following information is not meant to be a tutorial about the use of QualityStage, because that is beyond the scope of this book. See IBM WebSphere QualityStage Methodologies, Standardization, and Matching, SG24-7546 for information about using QualityStage. In this book, we include several relevant figures to help you understand the process we adopted and the guidelines we propose.


Import OOTB RDP for MDM rule sets into a DataStage project
We created a project named MDMTESTRULE, in which we created a standardization job to analyze all the data in canonical form (described in 5.5.3, Create canonical form from the data sources on page 151) for the efficacy of the OOTB RDP for MDM rule sets. Figure 5-25 through Figure 5-31 on page 176 show several windows in the import process, as follows:
1. Launch the WebSphere DataStage and QualityStage Designer, and from the task bar, click Import DataStage Components, as shown in Figure 5-25.

Figure 5-25 Import OOTB RDP for MDM rule sets into a standardization job (1 of 7)

2. In the DataStage Repository Import window, specify the DSX file of the RDP for MDM jobs, select the Import selected option so that you can choose the components to import, and click OK, as shown in Figure 5-26.

Figure 5-26 Import OOTB RDP for MDM rule sets into a standardization job (2 of 7)


Figure 5-27 through Figure 5-29 on page 174 show the available components.

Figure 5-27 Import OOTB RDP for MDM rule sets into a standardization job (3 of 7)


Figure 5-28 Import OOTB RDP for MDM rule sets into a standardization job (4 of 7)


Figure 5-29 Import OOTB RDP for MDM rule sets into a standardization job (5 of 7)


3. Because we are interested only in the components related to name and address standardization, we select only those components (four shared containers and all the rule sets) and click OK. The progress of the import of the selected components is shown in Figure 5-30.

Figure 5-30 Import OOTB RDP for MDM rule sets into a standardization job (6 of 7)

Note: As mentioned previously, we actually changed only the USPREP rule set, which is not one of the RDP for MDM rule sets.
4. When the import completes, review the imported components under ValidationStanContainers in the navigation pane, as shown in Figure 5-31 on page 176.


Figure 5-31 Import OOTB RDP for MDM rule sets into a standardization job (7 of 7)


You can now proceed to validate the efficacy of the OOTB RDP for MDM rule sets in the standardization job.

Validate RDP for MDM rule sets on the standardization job


We created a copy[7] (named ORGUSPREP) of the USPREP rule set used by the OOTB RDP for MDM jobs and validated it with the representative sample of the canonical form data; in our case, because our volume of data was quite small, we used all the data in our sources as input to the validation effort.

Important: We validated only the address field, because our focus was on ensuring that the city[a] name information could be parsed from the SIF FREEFORMADDRESS field to populate the CITY_NAME column, which is required by the MDM data repository. Other critical fields[b], such as LAST_NAME and POSTAL_CODE, could also have been standardized, but we chose not to include them here. Our objective was to ensure that most, if not all, input data was loaded by RDP for MDM into the MDM data repository. Toward this goal, we focused on ensuring that the critical columns (CITY in this case) had the necessary information, which led us to work on the USPREP[c] rule set. As mentioned earlier, because of time constraints, we did not expend additional effort to modify the other rule sets to enhance the quality of the standardization performed by the OOTB RDP for MDM rule sets. However, given our recommendation to work with all the OOTB RDP for MDM rule sets, we demonstrate here the process of validating the efficacy of the OOTB RDP for MDM rule sets and modifying them, if necessary, so that they can replace the original OOTB rule sets.
[a] The CITY name field must be populated during the load by RDP for MDM for insertion into the MDM data repository.
[b] For a list of all critical fields, see the Required for Insert and Required for Update columns in the MDM_RDP_SIF mapping template.
[c] As mentioned earlier, the USPREP rule set is not supplied with the RDP for MDM jobs. It is part of the standard QualityStage rule sets.

[7] A copy was created as a backup, and also to make it possible to run standardization using the original OOTB RDP for MDM rule sets. If modifications are required, they have to be performed on USPREP, which would then be imported into the RDP for MDM jobs to replace the original OOTB USPREP rule set.
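Because rows with an empty critical field are rejected, a simple pre-check on SIF-style records can estimate how many rows would fail before the RDP for MDM jobs are run. The following Python sketch assumes the three field names this section mentions (CITY_NAME, LAST_NAME, and POSTAL_CODE); the authoritative list is in the Required for Insert column of the MDM_RDP_SIF mapping template, and the actual RDP for MDM validation may apply further rules.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]

# Assumption: the critical fields named in this section. The authoritative
# list is the Required for Insert column of the MDM_RDP_SIF mapping template.
CRITICAL_FIELDS = ("CITY_NAME", "LAST_NAME", "POSTAL_CODE")


def missing_critical(record: Record,
                     fields: Iterable[str] = CRITICAL_FIELDS) -> List[str]:
    """List the critical fields that are absent or blank in one record."""
    return [f for f in fields if not (record.get(f) or "").strip()]


def split_loadable(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate records likely to load from those likely to be rejected."""
    ok: List[Record] = []
    rejected: List[Record] = []
    for rec in records:
        (rejected if missing_critical(rec) else ok).append(rec)
    return ok, rejected


if __name__ == "__main__":
    print(missing_critical({"CITY_NAME": "", "LAST_NAME": "Jensen",
                            "POSTAL_CODE": "94104"}))  # ['CITY_NAME']
```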


Figure 5-32 on page 180 through Figure 5-44 on page 188 show several windows that describe the validation process, as follows:
1. Launch the WebSphere DataStage and QualityStage Designer and display the VSSTANAddress shared container on the Designer canvas. VSSTANAddress is the shared container that contains the USPREP stage to be validated. This stage processes address data from one or more source columns and moves it into the appropriate domain columns. Because we were interested only in the address fields, we focused on reviewing the street address in the AddressDomain_USPREP column and the city name, state, and ZIP code in the AreaDomain_USPREP column.
2. We inspected the USPREP stage to see which rule sets were used and how they were used. We did not modify it; we reviewed it in order to build a corresponding standardization job (J02_ORGUSPREP_STAN) to test the OOTB RDP for MDM rule sets. The source columns are named ADDR_LINE_ONE, ADDR_LINE_TWO, and ADDR_LINE_THREE in the SIF files, and the literal ZQADDRZQ[8] is included.
3. A copy of these rule sets was created in an ORGUSPREP folder.
4. We created our standardization job (J02_ORGUSPREP_STAN) with the ORGUSPREP rule set. The ORGUSPREP stage identifies the two address columns in the canonical form data (FREEFORMADDRESS and STREETADDRESS)[9] with the literal ZQADDRZQ, as shown in Figure 5-40 on page 185 and Figure 5-41 on page 186 (which shows two additional ZQADDRZQ literals). We included the two additional ZQADDRZQ literals because our inspection of how RDP for MDM uses USPREP (shown in Figure 5-32 on page 180 and Figure 5-33 on page 180) revealed the following input:
ZQADDRZQ ADDR_LINE_ONE ZQADDRZQ ADDR_LINE_TWO ZQADDRZQ ADDR_LINE_THREE

[8] This literal specifies that, after field overrides and field modifications are applied, the rule set checks for common address patterns. If none are found, it checks for name and area patterns. If none of these are found either, the field defaults to Address. See IBM WebSphere QualityStage Methodologies, Standardization, and Matching, SG24-7546 for details about such literals.
[9] FREEFORMADDRESS is the column we really wanted to target, but we knew that either FREEFORMADDRESS or STREETADDRESS contained the data we needed. This setup therefore ensures that one address is generated for standardization for each row.


In our canonical form data, we do not supply values corresponding to ADDR_LINE_TWO and ADDR_LINE_THREE; instead, the input is as follows:
ZQADDRZQ FREEFORMADDRESS STREETADDRESS ZQADDRZQ ZQADDRZQ
The last two literals are essential to make our patterns match the patterns that are generated in RDP for MDM.
5. Figure 5-42 on page 186 shows the execution of the J02_ORGUSPREP_STAN job.
6. Figure 5-43 on page 187 shows the data in the STREETADDRESS and FREEFORMADDRESS columns of the canonical form data. FREEFORMADDRESS contains the city name and ZIP code; STREETADDRESS contains only the street address, and the CITY column has the city information corresponding to the data in STREETADDRESS.
7. Figure 5-44 on page 188 shows the contents of the AreaDomain_ORGUSPREP column after processing by the ORGUSPREP rule set. It shows that the city name was not moved to this column from the FREEFORMADDRESS column in the input. The InputPattern_ORGUSPREP column shows the input patterns of the addresses that ORGUSPREP did not process correctly.

Important: Because there is no city information, these records would fail validation by the RDP for MDM jobs and be rejected. As mentioned previously, our earlier attempt without the pre-validation phase resulted in the rejection of a majority of the rows by the RDP for MDM jobs, and our analysis of the errors indicated that the missing critical CITY field was the cause. Our pre-validation phase confirmed the missing values of this critical field. Other critical fields that can cause rows to be rejected include LAST_NAME and POSTAL_CODE, which were not an issue in our processing. Because we did not want these records to be rejected, we overrode the input patterns so that city names are processed properly, as described in Overriding patterns on page 188.
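Why the two trailing literals matter can be illustrated by building the input string that each layout would produce. The following Python sketch assumes that the stage joins literals and non-empty column values with single spaces and that an empty column contributes nothing; these are simplifying assumptions for illustration, not documented QualityStage behavior. Under them, a row with only one address line populated yields the same literal layout for the RDP for MDM input and for our test input.

```python
from typing import Dict, List, Optional

LITERAL = "ZQADDRZQ"

# Input layouts: each item is either the literal or the name of a column.
RDP_LAYOUT = [LITERAL, "ADDR_LINE_ONE", LITERAL, "ADDR_LINE_TWO",
              LITERAL, "ADDR_LINE_THREE"]
TEST_LAYOUT = [LITERAL, "FREEFORMADDRESS", "STREETADDRESS", LITERAL, LITERAL]


def standardize_input(layout: List[str], row: Dict[str, Optional[str]]) -> str:
    """Join literals and non-empty column values with single spaces.

    Simplifying assumption: an empty or missing column contributes nothing.
    """
    parts: List[str] = []
    for item in layout:
        if item == LITERAL:
            parts.append(item)
        else:
            value = (row.get(item) or "").strip()
            if value:
                parts.append(value)
    return " ".join(parts)


if __name__ == "__main__":
    rdp = standardize_input(RDP_LAYOUT, {"ADDR_LINE_ONE": "2584 Junction Ave"})
    ours = standardize_input(TEST_LAYOUT, {"FREEFORMADDRESS": "",
                                           "STREETADDRESS": "2584 Junction Ave"})
    print(rdp == ours)  # True
    print(ours)         # ZQADDRZQ 2584 Junction Ave ZQADDRZQ ZQADDRZQ
```

Dropping the two trailing literals from TEST_LAYOUT would produce ZQADDRZQ 2584 Junction Ave, a different input from the one RDP for MDM generates, so patterns and overrides developed against it would not carry over.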


Figure 5-32 Validate RDP for MDM rule set on the standardization job (1 of 13)

Figure 5-33 Validate RDP for MDM rule set on the standardization job (2 of 13)


Figure 5-34 Validate RDP for MDM rule set on the standardization job (3 of 13)

Figure 5-35 Validate RDP for MDM rule set on the standardization job (4 of 13)


Figure 5-36 Validate RDP for MDM rule set on the standardization job (5 of 13)


Figure 5-37 Validate RDP for MDM rule set on the standardization job (6 of 13)


Figure 5-38 Validate RDP for MDM rule set on the standardization job (7 of 13)


Figure 5-39 Validate RDP for MDM rule set on the standardization job (8 of 13)

Figure 5-40 Validate RDP for MDM rule set on the standardization job (9 of 13)


Figure 5-41 Validate RDP for MDM rule set on the standardization job (10 of 13)

Figure 5-42 Validate RDP for MDM rule set on the standardization job (11 of 13)


Figure 5-43 Validate RDP for MDM rule set on the standardization job (12 of 13)


Figure 5-44 Validate efficacy of RDP for MDM rule sets (13 of 13)

Overriding patterns
In the USPREP[10] rule set, we override the input patterns that were not handled correctly in Figure 5-44, rerun the standardization job to ensure that the address columns are processed correctly, and then export the modified USPREP rule set to a DSX file.

[10] USPREP is modified, rather than ORGUSPREP, because USPREP is the rule set in the RDP for MDM jobs that would need to be replaced.


Figure 5-45 on page 190 through Figure 5-55 on page 197 show windows that describe the input pattern override, the rerun of the standardization job with the modified USPREP rule set, and the export of the modified USPREP rule set, as follows:
1. The Rules Management window in Figure 5-45 on page 190 shows the various parts of a rule set, including Overrides. Click Overrides in the Rules Management window to add, copy, edit, or delete overrides to rule sets.
2. We perform input pattern overrides as shown in Figure 5-46 on page 191 and Figure 5-47 on page 192. With input pattern overrides, you can specify token overrides that are based on the input pattern. Input pattern overrides take precedence over the pattern-action file, and they are specified for the entire input pattern.
Note: The override codes A (for ADDRESS) circled in the Enter Input Pattern text field correspond to the literal ZQADDRZQ explained previously. The boxed values correspond to the characters overridden with A in the Override Code column of the Current Pattern List.
3. The modified rule sets are then provisioned[11], as shown in Figure 5-48 on page 193.
4. A copy of the J02_ORGUSPREP_STAN job is created as J12_USPREP_STAN, using a stage named USPREP, as shown in Figure 5-49 on page 193. The USPREP stage is modified to refer to the canonical form data columns STREETADDRESS and FREEFORMADDRESS, as shown in Figure 5-50 on page 194.
5. Figure 5-51 on page 194 shows the execution of this job.
6. Figure 5-52 on page 195 shows the results of processing by the modified USPREP rule set: the AreaDomain_USPREP column is now populated with the city name for the relevant rows, which indicates that the input pattern overrides were successful.
7. Figure 5-53 on page 196 through Figure 5-55 on page 197 show the successful export of the modified USPREP rule set as a DSX file (USPREPCHANGED.dsx).
We then proceed to import the modified USPREP rule set into the RDP for MDM jobs as described in Import modified RDP for MDM rule sets into RDP for MDM project on page 198.
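The precedence rule described in step 2 (an input pattern override applies to the entire input pattern and takes precedence over the pattern-action file) can be summarized as an exact-match lookup with a fallback. In the following Python sketch, the pattern keys are placeholders rather than real USPREP input patterns, and the fallback function stands in for the pattern-action file; only the override code A (Address) comes from this section.

```python
from typing import Callable, Dict, List

Codes = List[str]


def assign_domains(pattern: str, tokens: List[str],
                   overrides: Dict[str, Codes],
                   pattern_action: Callable[[str, List[str]], Codes]) -> Codes:
    """Return one domain code per token for a single input record.

    An input pattern override applies only when the record's pattern matches
    the override's pattern exactly, and it then takes precedence over the
    pattern-action logic; otherwise that logic decides.
    """
    codes = overrides.get(pattern)  # whole-pattern, exact match only
    if codes is None:
        return pattern_action(pattern, tokens)
    if len(codes) != len(tokens):
        raise ValueError("an override must supply one code per token")
    return codes


def default_handler(pattern: str, tokens: List[str]) -> Codes:
    """Stand-in for the pattern-action file: marks every token as unresolved."""
    return ["?"] * len(tokens)


if __name__ == "__main__":
    overrides = {"PATTERN-1": ["A", "A", "A"]}  # placeholder pattern key
    print(assign_domains("PATTERN-1", ["44", "MONTGOMERY", "ST"],
                         overrides, default_handler))  # ['A', 'A', 'A']
```

Because the lookup is exact, a pattern that merely starts with an overridden pattern still falls through to the pattern-action logic, which is why each unhandled input pattern in Figure 5-44 needs its own override.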
[11] You must provision new, copied, or customized rule sets in the Designer client before you can compile and run a job that uses them.

Chapter 5. Financial services business scenario

Figure 5-45 Override input pattern and rerun the standardization job (1 of 11)

Master Data Management: IBM InfoSphere Rapid Deployment Package

Figure 5-46 Override input pattern and rerun the standardization job (2 of 11)

Figure 5-47 Override input pattern and rerun the standardization job (3 of 11)

Figure 5-48 Override input pattern and rerun the standardization job (4 of 11)

Figure 5-49 Override input pattern and rerun the standardization job (5 of 11)

Figure 5-50 Override input pattern and rerun the standardization job (6 of 11)

Figure 5-51 Override input pattern and rerun the standardization job (7 of 11)

Figure 5-52 Override input pattern and rerun the standardization job (8 of 11)

Figure 5-53 Override input pattern and rerun the standardization job (9 of 11)

Figure 5-54 Override input pattern and rerun the standardization job (10 of 11)

Figure 5-55 Override input pattern and rerun the standardization job (11 of 11)

Import modified RDP for MDM rule sets into RDP for MDM project
Figure 5-56 on page 199 through Figure 5-60 on page 201 show the windows involved in importing the modified RDP for MDM rule sets (described in Overriding patterns on page 188) into the RDP for MDM project by using WebSphere DataStage Designer. The RDP_rule set_Change project contains all the RDP for MDM jobs into which the modified rule sets are imported. The windows and steps are as follows:
1. Figure 5-56 on page 199 shows the contents of the RDP_rule set_Change project, which lists all the RDP for MDM jobs in it.
2. Figure 5-57 on page 200 through Figure 5-59 on page 200 show the import of all the components of the modified rule set file (USPREPCHANGED.dsx) into this project.
3. Finally, the modified rule set is provisioned, as shown in Figure 5-60 on page 201.
Note: After provisioning, any job that uses the modified rule sets must be recompiled.
With the SIF created (this work proceeded in parallel and is described in 5.5.5, Create SIF on page 202), we proceed to execute the RDP for MDM jobs as described in 5.5.6, Execute RDP for MDM jobs on page 221.

Figure 5-56 Import modified RDP for MDM rule sets into RDP for MDM jobs (1 of 5)

Figure 5-57 Import modified RDP for MDM rule sets into RDP for MDM jobs (2 of 5)

Figure 5-58 Import modified RDP for MDM rule sets into RDP for MDM jobs (3 of 5)

Figure 5-59 Import modified RDP for MDM rule sets into RDP for MDM jobs (4 of 5)

Figure 5-60 Import modified RDP for MDM rule sets into RDP for MDM jobs (5 of 5)

5.5.5 Create SIF


As part of the custom work to create the SIF, you typically create a mapping document or spreadsheet that specifies the mapping of source columns to SIF target columns, and any transformation of values, especially where code tables are concerned. For code table transformations, we recommend a cross-reference table that contains the mapping of source code values to target code values, instead of hardcoding the transformations. A custom DataStage job, or another program or utility, should read the source data (in our case, the canonical form data created in 5.5.3, Create canonical form from the data sources on page 151) and look up the cross-reference tables, which are created manually or generated by Information Analyzer, to pick up the corresponding MDM code values for loading into the SIF. In this section, we describe the creation of a reference table by using Information Analyzer, and the generation of the SIF from the data stored in the canonical form data table created in Create canonical form from the data sources on page 151.
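The cross-reference approach recommended above can be sketched in a few lines. This is a minimal illustration, not the DataStage job used in the scenario: it assumes the canonical rows are plain dictionaries keyed by column name, and it models one cross-reference table as a dictionary using the SRCSYSTEMID mapping from this scenario (1 for Checking to 1000000, 2 for Savings to 1000001, 3 for Loan to 1000002). The function and variable names are hypothetical.

```python
# Minimal sketch of a cross-reference (code table) lookup, used instead of
# hardcoding code translations in the load logic. The dict stands in for a
# cross-reference table; the row layout is an illustrative assumption.

SRCSYSTEM_XREF = {
    "1": "1000000",  # Checking
    "2": "1000001",  # Savings
    "3": "1000002",  # Loan
}


def translate_column(rows, column, xref):
    """Return (translated_rows, unmapped_rows).

    Each translated row is a copy of the input row with `column` replaced by
    its MDM code value. Rows whose value has no entry in `xref` are returned
    separately so that they can be reviewed instead of silently loaded.
    """
    translated, unmapped = [], []
    for row in rows:
        value = row.get(column)
        if value in xref:
            new_row = dict(row)  # copy, so the source row is not modified
            new_row[column] = xref[value]
            translated.append(new_row)
        else:
            unmapped.append(row)
    return translated, unmapped


canonical_rows = [  # hypothetical canonical rows
    {"CUSTOMERID": "C-001", "SRCSYSTEMID": "1"},
    {"CUSTOMERID": "C-002", "SRCSYSTEMID": "3"},
    {"CUSTOMERID": "C-003", "SRCSYSTEMID": "9"},  # no mapping: reported
]
ok, rejected = translate_column(canonical_rows, "SRCSYSTEMID", SRCSYSTEM_XREF)
```

Keeping the mapping in data rather than in code means a new source code value only requires a new row in the cross-reference table, and unmapped values surface as rejects instead of being loaded with a wrong code.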

Creation of a reference table using Information Analyzer


As mentioned earlier, reference tables for mapping can be created manually or by using Information Analyzer. Because we had used Information Analyzer in the DQA, we briefly describe the process of creating one reference table that serves as a lookup for mapping values in the canonical form table to code values stored in the MDM code tables. These code values must be retrieved to correctly populate the SIF. Figure 5-61 on page 204 through Figure 5-66 on page 209 show the windows for creating a single reference table. The process involves determining the code values in the appropriate MDM code table and then creating the reference table in Information Analyzer using these values, as follows:
1. Figure 5-61 on page 204 shows the navigation pane in the MDM Server UI. In the Administration Console navigation tree, select Code Tables. In the content pane, select a code table of interest (CdAdminSysTp) from the drop-down list, and click GO.
2. Figure 5-62 on page 205 shows the list of valid values in this table. We added code values for the Checking (1000000), Savings (1000001), and Loan (1000002) systems through this GUI.

Note: Repeat this process for all the code tables of interest in the MDM data repository for which reference tables need to be created.
3. After the code values have been verified, repeat the following steps for every column in the canonical form table whose values must be translated to the target MDM code table values:
a. In Information Analyzer, select the Column Analysis tab (see footnote 12) and View Analysis Summary for the CANONICAL_TBL table to view all the columns in this table, as shown in Figure 5-63 on page 206. Select the column that requires a code value lookup (SRCSYSTEMID in this case) and click View Details.
b. Select the Frequency Distribution tab and key in the transformed values in the Transformation Value column, as shown in Figure 5-64 on page 207. The figure shows the data value 3 (Loan system) in the SRCSYSTEMID column being transformed to 1000002, the data value 1 (Checking system) to 1000000, and the data value 2 (Savings system) to 1000001.
c. Select Reference Tables → New Reference Table to create a reference table with these mappings, as shown in Figure 5-65 on page 208.
d. Provide the name of the reference table (CTS_LKP_SRCID), select the Mapping (All Values) option, as shown in Figure 5-66 on page 209, and click Save.
Note: Repeat this process for all the columns in the canonical form table that have code values.

12 We assume that Column Analysis has been performed on the canonical form table, and that we know which columns in the canonical form table must have their values transformed to those in the corresponding code table in the MDM data repository.
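Steps 3a through 3d amount to taking the distinct values of one canonical column, pairing each with a keyed-in transformation value, and saving the pairs as a reference table. The sketch below mimics that outside Information Analyzer so that the logic is visible; it is an illustration only. The tuple layout of the rows and all names are assumptions, and the check for values without a transformation is our addition to make gaps visible before the table is used for lookups.

```python
from collections import Counter


def build_reference_table(column_values, transformations):
    """Sketch of a reference table built from a frequency distribution.

    column_values: values observed in one canonical column (what the
        Frequency Distribution tab summarizes); None (null) is ignored here.
    transformations: data value -> transformation value, as keyed in by hand.

    Returns (rows, missing): rows are (data value, TRANSFORMVALUE, frequency)
    tuples in data-value order; missing lists observed values that have no
    transformation value yet.
    """
    frequency = Counter(v for v in column_values if v is not None)
    rows, missing = [], []
    for value in sorted(frequency):
        if value in transformations:
            rows.append((value, transformations[value], frequency[value]))
        else:
            missing.append(value)
    return rows, missing


srcsystemid_values = ["1", "3", "2", "1", None, "3", "3"]  # hypothetical sample
keyed_in = {"1": "1000000", "2": "1000001", "3": "1000002"}
rows, missing = build_reference_table(srcsystemid_values, keyed_in)
```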

Figure 5-61 Creation of a reference table using Information Analyzer (1 of 6)

Figure 5-62 Creation of a reference table using Information Analyzer (2 of 6)

Figure 5-63 Creation of a reference table using Information Analyzer (3 of 6)

Figure 5-64 Creation of a reference table using Information Analyzer (4 of 6)

Figure 5-65 Creation of a reference table using Information Analyzer (5 of 6)

Figure 5-66 Creation of a reference table using Information Analyzer (6 of 6)

Table 5-2 summarizes the code table value mappings between the source and the target MDM data repository for our scenario.
Table 5-2 Code table mapping between the canonical form columns and the MDM code tables

Source systems column | Source values | MDM Server domain values | MDM Server column
COUNTRY | US, (null) | 185 and other country codes | COUNTRY_TP_CD in CDCOUNTRYTP
CUSTOMERPERF | LOW, MID, HIGH, GOLD, SILVER, BRONZE, (null) | 1, 2, 3 | CLIENT_IMP_TP_CD in CDCLIENTIMPTP
CUSTOMERSTATUS | A, B, C, D, (null) | 1, 2, 3, 4 | CLIENT_ST_TP_CD in CDCLIENTSTTP
GENDER | M, F, 0, 1, (null) | M, F | Not validated in MDM
MARITALSTATUS | Married, Single, Divorced, (null) | 1, 2, 3 | MARITAL_ST_TP_CD in CDMARITALSTTP
NATIONALITY | US, (null) | 185 and other country codes | COUNTRY_TP_CD in CDCOUNTRYTP
PREF_LANGUAGE | English, (null) | 100 and other language codes | LANG_TP_CD in CDLANGTP
SALUTATION | Mr., Mrs., (spaces), (null) | 14, 15, and other salutation codes | PREFIX_NAME_TP_CD in CDPREFIXNAMETP

SIF generation from canonical form


We used FastTrack Version 8.0.1 to define the mapping between the columns in the canonical form data table and the SIF, and to generate a DataStage job to load the SIF tables. A subsequent DataStage job extracted the data from the SIF tables and created the SIF file for processing by the RDP for MDM jobs. Because the FastTrack process was similar to the one used to create the canonical form data table, it is not repeated here. In the following process, we describe only the steps that differ from the ones performed earlier: the creation of the SIF tables (Figure 5-67 on page 211 shows the job jpSchemaPrep, which reads the DDL for the 22 SIF tables (see footnote 13) and creates the tables in the FBANKCOT database), the lookup of the reference tables created earlier, and the execution of the job that creates the SIF file from the SIF tables.

Figure 5-67 Create SIF tables

13 These tables map one-to-one with the RT/ST combinations described in Appendix B, Standard Interface File details on page 295. The DDL for these tables can be downloaded from the IBM Redbooks website at ftp://www.redbooks.ibm.com/redbooks/SG247704/

Table 5-3 shows the mapping of the columns in the canonical form data table to the corresponding SIF columns.
Table 5-3 Canonical form to SIF mapping

RT/ST | Canonical form columns | SIF column
PA | CANONICAL_TBL.FREEFORMADDRESS, CANONICAL_TBL.STREETADDRESS | ADDRESS.ADDR_LINE_ONE
PA | CANONICAL_TBL.CUSTOMERID | ADDRESS.ADMIN_CLIENT_ID
PA | CTS_LKP_SRCID.TRANSFORMVALUE | ADDRESS.ADMIN_SYS_TP_CD
PA | CANONICAL_TBL.CITY | ADDRESS.CITY_NAME
PA | CTS_LKP_COUNTRY.TRANSFORMVALUE | ADDRESS.COUNTRY_TP_CD
PA | CANONICAL_TBL.ZIPCODE | ADDRESS.POSTAL_CODE
PP | CANONICAL_TBL.CUSTOMERID | CONTACT.ADMIN_CLIENT_ID
PP | CTS_LKP_SRCSYSTEM.TRANSFORMVALUE | CONTACT.ADMIN_SYS_TP_CD
PP | CTS_LKP_AGEVERDOC.TRANSFORMVALUE | CONTACT.AGE_VER_DOC_TP_CD
PP | CANONICAL_TBL.DOB | CONTACT.BIRTH_DT
PP | CTS_LKP_NATIONALITY.TRANSFORMVALUE | CONTACT.CITIZENSHIP_TP_CD
PP | CTS_LKP_CUSTPERF.TRANSFORMVALUE | CONTACT.CLIENT_IMP_TP_CD
PP | CTS_LKP_CUSTSTATUS.TRANSFORMVALUE | CONTACT.CLIENT_ST_TP_CD
PP | CANONICAL_TBL.DOD | CONTACT.DECEASED_DT
PP | CTS_LKP_GENDER.TRANSFORMVALUE | CONTACT.GENDER_TP_CODE
PP | CTS_LKP_MARITALST.TRANSFORMVALUE | CONTACT.MARITAL_ST_TP_CD
PP | CTS_LKP_PREFLANG.TRANSFORMVALUE | CONTACT.PREF_LANG_TP_CD
PC | CANONICAL_TBL.CUSTOMERID | CONTACTMETHOD.ADMIN_CLIENT_ID
PC | CTS_LKP_SRCID.TRANSFORMVALUE | CONTACTMETHOD.ADMIN_SYS_TP_CD
PC | CANONICAL_TBL.CELLNB (when it is NOT NULL) | CONTACTMETHOD.REF_NUM
PC | CANONICAL_TBL.CUSTOMERID | CONTACTMETHOD.ADMIN_CLIENT_ID
PC | CTS_LKP_SRCID.TRANSFORMVALUE | CONTACTMETHOD.ADMIN_SYS_TP_CD
PC | CANONICAL_TBL.EMAIL (when it is NOT NULL) | CONTACTMETHOD.REF_NUM
PC | CANONICAL_TBL.CUSTOMERID | CONTACTMETHOD.ADMIN_CLIENT_ID
PC | CTS_LKP_SRCID.TRANSFORMVALUE | CONTACTMETHOD.ADMIN_SYS_TP_CD
PC | CANONICAL_TBL.PHONENB (when it is NOT NULL) | CONTACTMETHOD.REF_NUM
CH | CANONICAL_TBL.ACCOUNTID | CONTRACT.ADMIN_CONTRACT_ID
CH | CTS_LKP_SRCID.TRANSFORMVALUE | CONTRACT.ADMIN_SYS_TP_CD
CC | CANONICAL_TBL.ACCOUNTID | CONTRACTCOMPONENT.ADMIN_CONTRACT_ID
CC | CTS_LKP_SRCID.TRANSFORMVALUE | CONTRACTCOMPONENT.ADMIN_SYS_TP_CD
CC | CTS_LKP_PRODTP.TRANSFORMVALUE | CONTRACTCOMPONENT.PROD_TP_CD
CR | CANONICAL_TBL.CUSTOMERID | CONTRACTROLE.ADMIN_CLIENT_ID
CR | CTS_LKP_SRCID.TRANSFORMVALUE | CONTRACTROLE.ADMIN_CLIENT_SYS_TP_CD
CR | CANONICAL_TBL.ACCOUNTID | CONTRACTROLE.ADMIN_CONTRACT_ID
CR | CTS_LKP_SRCID.TRANSFORMVALUE | CONTRACTROLE.ADMIN_SYS_TP_CD
CR | CTS_LKP_PRODTP.TRANSFORMVALUE | CONTRACTROLE.PROD_TP_CD
HN | REGIONS_STAGING.REGION_ID | HIERARCHY_NODE.ADMIN_CLIENT_ID
HN | REGIONS_STAGING.REGION_DESCRIPTION | HIERARCHY_NODE.DESCRIPTION
HN | CANONICAL_TBL.CUSTOMERID (when SRCSYSTEMID = '2') | HIERARCHY_NODE.ADMIN_CLIENT_ID
HN | REGIONS_STAGING.REGION_ID | HIERARCHY_REL.ADMIN_CLIENT_ID_CHILD
HN | REGIONS_STAGING.PARENT_REGION_ID (when PARENT_REGION_ID IS NOT NULL) | HIERARCHY_REL.ADMIN_CLIENT_ID_PARENT
HN | Lookup Client Id Parent.REGION_ID | HIERARCHY_REL.ADMIN_CLIENT_ID_PARENT
HN | CANONICAL_TBL.CUSTOMERID | HIERARCHY_REL.ADMIN_CLIENT_ID_CHILD
HN | REGIONS_STAGING.REGION_DESCRIPTION (when PARENT_REGION_ID IS NULL) | HIERARCHY.DESCRIPTION
HN | REGIONS_STAGING.REGION_ID | HIERARCHY_UP.ADMIN_CLIENT_ID
HN | REGIONS_STAGING.REGION_DESCRIPTION (when PARENT_REGION_ID IS NULL) | HIERARCHY_UP.DESCRIPTION
PI | CANONICAL_TBL.CUSTOMERID | IDENTIFIER.ADMIN_CLIENT_ID
PI | CTS_LKP_SRCID.TRANSFORMVALUE | IDENTIFIER.ADMIN_SYS_TP_CD
PI | CANONICAL_TBL.DRIVERLICNB (when it is NOT NULL) | IDENTIFIER.REF_NUM
PI | CANONICAL_TBL.CUSTOMERID | IDENTIFIER.ADMIN_CLIENT_ID
PI | CTS_LKP_SRCID.TRANSFORMVALUE | IDENTIFIER.ADMIN_SYS_TP_CD
PI | CANONICAL_TBL.PASSPORTNB (when it is NOT NULL) | IDENTIFIER.REF_NUM
PI | CANONICAL_TBL.CUSTOMERID | IDENTIFIER.ADMIN_CLIENT_ID
PI | CTS_LKP_SRCID.TRANSFORMVALUE | IDENTIFIER.ADMIN_SYS_TP_CD
PI | CANONICAL_TBL.SSN (when it is NOT NULL) | IDENTIFIER.REF_NUM
PH | CANONICAL_TBL.CUSTOMERID | PERSONNAME.ADMIN_CLIENT_ID
PH | CTS_LKP_SRCID.TRANSFORMVALUE | PERSONNAME.ADMIN_SYS_TP_CD
PH | CANONICAL_TBL.FREEFORMNAME | PERSONNAME.FREE_FORM_NAME
PH | CANONICAL_TBL.FIRSTNAME, CANONICAL_TBL.FREEFORMNAME | PERSONNAME.GIVEN_NAME_ONE
PH | CANONICAL_TBL.FREEFORMNAME, CANONICAL_TBL.LASTNAME | PERSONNAME.LAST_NAME
PH | CTS_LKP_SALUTATION.TRANSFORMVALUE | PERSONNAME.PREFIX_NAME_TP_CD
CL | CANONICAL_TBL.CUSTOMERID | ROLELOCATION.ADMIN_CLIENT_ID
CL | CTS_LKP_SRCID.TRANSFORMVALUE | ROLELOCATION.ADMIN_SYS_TP_CD
CL | CANONICAL_TBL.ACCOUNTID | ROLELOCATION.ADMIN_CONTRACT_ID
CL | CTS_LKP_SRCID.TRANSFORMVALUE | ROLELOCATION.ADMIN_CLIENT_SYS_TP_CD
CL | CTS_LKP_PRODTP.TRANSFORMVALUE | ROLELOCATION.PROD_TP_CD
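The three PI groups in Table 5-3 follow one pattern: for each canonical row, one IDENTIFIER record is produced per identifier column (DRIVERLICNB, PASSPORTNB, SSN) that is not null, with ADMIN_SYS_TP_CD taken from the CTS_LKP_SRCID lookup. A minimal sketch of that rule follows. It assumes that CTS_LKP_SRCID is keyed on SRCSYSTEMID (modelled as a dictionary) and uses hypothetical sample values; it does not cover the identifier type code that distinguishes the records in the real SIF.

```python
IDENTIFIER_COLUMNS = ("DRIVERLICNB", "PASSPORTNB", "SSN")


def identifier_records(canonical_row, srcid_xref):
    """Return (ADMIN_CLIENT_ID, ADMIN_SYS_TP_CD, REF_NUM) tuples, one per
    identifier column of the row that is NOT NULL (RT/ST PI in Table 5-3).

    Assumption: CTS_LKP_SRCID is keyed on SRCSYSTEMID; it is modelled here
    as a dict. The identifier type code is not handled in this sketch.
    """
    sys_tp_cd = srcid_xref[canonical_row["SRCSYSTEMID"]]
    records = []
    for column in IDENTIFIER_COLUMNS:
        ref_num = canonical_row.get(column)
        if ref_num is not None:  # "(when it is NOT NULL)" in Table 5-3
            records.append((canonical_row["CUSTOMERID"], sys_tp_cd, ref_num))
    return records


row = {  # hypothetical canonical row
    "CUSTOMERID": "C-001",
    "SRCSYSTEMID": "3",
    "DRIVERLICNB": None,
    "PASSPORTNB": "P1234567",
    "SSN": "999-99-9999",
}
records = identifier_records(row, {"1": "1000000", "2": "1000001", "3": "1000002"})
```

A null identifier column simply produces no record, which matches the conditional rows in the table rather than emitting an empty REF_NUM.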

Figure 5-68 on page 214 through Figure 5-73 on page 219 show the windows used to define the lookup of the reference tables for transforming the code table values. Figure 5-74 on page 220 shows the execution of the job jpGenerateOutputSIF, which extracts the contents of the SIF tables and generates the SIF file, using a pipe (|) delimiter between columns.
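The output format that jpGenerateOutputSIF produces can be illustrated with a small record writer. This is a sketch based only on what Example 5-4 shows (pipe-delimited fields, empty fields for missing values, and a trailing pipe at the end of each record); it is not the DataStage job. Because the SIF escaping rules are not covered here, the sketch rejects values that contain a pipe. The function name is hypothetical.

```python
def format_sif_record(fields):
    """Join field values into one pipe-delimited SIF line.

    None becomes an empty field, and the line ends with a trailing '|', as in
    the records of Example 5-4. Values that contain '|' are rejected because
    this sketch does not implement any escaping convention.
    """
    parts = []
    for value in fields:
        text = "" if value is None else str(value)
        if "|" in text:
            raise ValueError(f"field contains the delimiter: {text!r}")
        parts.append(text + "|")
    return "".join(parts)


line = format_sif_record(["C", "C", "1000002", "30000035", "A", 10, 2, None])
```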

The steps shown by the figures are as follows: 1. Figure 5-68 shows the FastTrack Mapping Editor for a specific mapping specification (CANONICAL_ROLELOCATION_SIF). Click Lookup Definitions in the task bar on the left to display the lookup definitions for the columns that must be translated into code values.

Figure 5-68 Create SIF (1 of 7)

2. Figure 5-69 shows the names of the lookup definitions (such as CTS_LKP_SRCID), the corresponding lookup table (CTS_LKP_SRCID) and the source table (CANONICAL_TBL). Click New Lookup Definition to add another lookup definition.

Figure 5-69 Create SIF (2 of 7)

3. Provide the name (CTS_LKP_SRC) of the lookup definition and click OK, as shown in Figure 5-70.

Figure 5-70 Create SIF (3 of 7)

4. After the new lookup definition has been created and saved, the lookup table needs to be defined. Note: We assume that the reference tables created using Information Analyzer (or manually) have been imported into the InfoSphere Information Server metadata repository.

In Figure 5-71, the reference table IAUSER.CTS_LKP_SRCID from the database IADB on host VIRGO.ITSOSJ.SANJOSE.IBM.COM is to be used as the lookup table. Drag the table from the database metadata tab to the Lookup Table field.

Figure 5-71 Create SIF (4 of 7)

5. Figure 5-72 shows the next step, which defines the source table for the lookup definition. In our case, this is the canonical form table STAGING.CANONICAL_TBL in the database FBANKCOT on the host ORION.ITSOSJ.SANJOSE.IBM. Drag the table from the database metadata tab to the Source Table field.

Figure 5-72 Create SIF (5 of 7)

6. Figure 5-73 shows the next step, which is to define a join key for the lookup definition. Note the key columns in both the lookup table and the source table to be joined, and then click Add Key to open the Add Join Entry window, as shown in Figure 5-73. Select the key columns in the Lookup Table and Source Table from the drop-down lists. After completing the mapping specification, save it, and then generate and verify the DataStage job (see footnote 14). This process is similar to the one described in Define the sources to canonical form table target mapping on page 152 and is not repeated here. When run, this generated job loads the SIF tables from the canonical form data table; that run is not shown here.

Figure 5-73 Create SIF (6 of 7)

14 The job name of a FastTrack-generated DataStage job generally defaults to the mapping name.

7. After all the SIF tables have been loaded, the DataStage job jpGenerateOutputSIF (Figure 5-74) is run to create the SIF file from the SIF tables. The DSX file of the jpGenerateOutputSIF DataStage job can be downloaded from the IBM Redbooks website at: ftp://www.redbooks.ibm.com/redbooks/SG247704/

Figure 5-74 Create SIF (7 of 7)

Example 5-4 shows the partial contents of the SIF file generated by this process, corresponding to the canonical form data.
Example 5-4 Partial contents of SIF file
C|R|1000002|30000029|A|1000002|8000637||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000002|30000022|A|1000002|8000712||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000002|30000024|A|1000002|8000917||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000002|30000031|A|1000002|8000467||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000002|30000010|A|1000002|8000263||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000001|20000028|A|1000001|200000281||7|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000001|20000030|A|1000001|200000301||7|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000001|20000005|A|1000001|200000051||7|1||||||||||||0|0|0|0|0|0|0|0|0|0|
C|R|1000002|30000023|A|1000002|8000259||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|
.....................
P|I|1000001|200000161|A|3||S99887766|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000001|200000291|A|3||S347936486|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000001|200000251|A|3||D12456745|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000000|70004432|A|3||S67856745|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000000|70006363|A|3||S12314517|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000000|70004262|A|3||S76193782|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000001|200000001|A|3||S45118674|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000001|200000091|A|3||S12314517|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000000|70007287|A|3||S12312399|||||||||||0|0|0|0|0|0|0|0|0|0|
P|I|1000001|200000131|A|3||S12312456|||||||||||0|0|0|0|0|0|0|0|0|0|
......................
P|P|1000002|8000885|A|N|||||||3||||||||||||||||||||1|||||F|2001-03-17 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000362|A|N|||||||2||||||||||||||||||||1|||||F|1989-08-23 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000263|A|N|||||||2||||||||||||||||||||1|||||F|1986-06-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000301|A|N|||||||3|||||||||||||||||||||||||F|1987-07-06 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005799|A|N|||100|||||3|||||||||||||||||||||185|||F|1937-09-01 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70002070|A|N|||100|||||4|||||||||||||||||||||185|||M|1923-04-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000051|A|N|||||||1|||||||||||||||||||||||||M|2011-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
.....................
P|O|1000001|4|A|N||||||||||||||||||||||7|||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|O|1000001|8|A|N||||||||||||||||||||||7|||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
C|C|1000002|30000035|A|10|2||||||||||||0|0|0|0|0|0|0|0|0|0|
C|C|1000002|30000028|A|10|2||||||||||||0|0|0|0|0|0|0|0|0|0|
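A quick way to sanity-check a generated SIF file is to count its records by record type and subtype. The sketch below relies only on what Example 5-4 shows: each record is pipe-delimited, ends with a trailing pipe, and starts with the RT and ST fields (for example, C|R, P|I, P|P, P|O, C|C). It skips lines without a pipe, such as the elision markers in the example. The function names are hypothetical, and the sample lines are taken from Example 5-4.

```python
from collections import Counter


def parse_sif_line(line):
    """Split one pipe-delimited SIF record into its fields.

    Records in Example 5-4 end with a trailing '|', so the final empty
    element produced by str.split is dropped.
    """
    fields = line.rstrip("\n").split("|")
    if fields and fields[-1] == "":
        fields.pop()
    return fields


def count_by_record_type(lines):
    """Count records per (RT, ST) pair, taken from the first two fields."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and elision markers such as "....." in the example.
        if not line or "|" not in line:
            continue
        fields = parse_sif_line(line)
        counts[(fields[0], fields[1])] += 1
    return counts


sample = [
    "C|R|1000002|30000029|A|1000002|8000637||10|1||||||||||||0|0|0|0|0|0|0|0|0|0|",
    "P|I|1000001|200000161|A|3||S99887766|||||||||||0|0|0|0|0|0|0|0|0|0|",
    "P|I|1000000|70004432|A|3||S67856745|||||||||||0|0|0|0|0|0|0|0|0|0|",
    "C|C|1000002|30000035|A|10|2||||||||||||0|0|0|0|0|0|0|0|0|0|",
    ".....................",
]
counts = count_by_record_type(sample)
```

Splitting on the pipe is safe for the timestamp fields in the P|P records, because the space inside values such as 2001-03-17 00:00:00.000000 is not a delimiter.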

5.5.6 Execute RDP for MDM jobs


After the SIF was created, we proceeded to drop the triggers and referential constraints on the MDM data repository, configured the parameters for the RDP for MDM jobs, and then launched the IL_000_INITIAL_LOAD job.

Drop triggers and RI constraints


As required, we dropped all the triggers and referential integrity (RI) constraints in the MDM data repository by using the following command:
db2 -tsvf Filename
In the command, Filename is the file that contains the SQL to drop the triggers and deactivate the RI constraints.
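The options in db2 -tsvf are standard DB2 command line processor (CLP) flags: -t uses the semicolon as the statement terminator, -s stops execution when a statement fails, -v echoes each statement, and -f reads the statements from the named file. The resume command shown later uses -td@ instead, which sets the terminator to @. The sketch below only builds these command lines; it is a convenience wrapper we assume for scripting, not part of the RDP assets, and it runs the command only when dry_run is False in an environment where the DB2 instance profile and a database connection are already set up. The script file name is hypothetical.

```python
import subprocess


def db2_script_command(script_path, terminator=";"):
    """Build a DB2 CLP command line that runs an SQL script file.

    -t  use ';' as the statement terminator (-td<c> uses the character <c>)
    -s  stop execution if a statement fails
    -v  echo each statement before running it
    -f  read the statements from script_path
    """
    if terminator == ";":
        return ["db2", "-tsvf", script_path]
    if len(terminator) != 1:
        raise ValueError("terminator must be a single character")
    return ["db2", f"-td{terminator}", "-svf", script_path]


def run_db2_script(script_path, terminator=";", dry_run=True):
    """Return the command; execute it only when dry_run is False."""
    cmd = db2_script_command(script_path, terminator)
    if not dry_run:
        # Assumes a configured DB2 environment and an existing connection.
        subprocess.run(cmd, check=True)
    return cmd


drop_cmd = run_db2_script("drop_triggers_and_ri.sql")  # hypothetical file name
```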

Notes:
Scripts for dropping triggers and RI constraints are in the RDP assets TAR file, at the following relative path: <MDMRDPRuntimeAssets_Install_Home>/DB/<DB_TYPE>/
The Drop_Scripts.zip file provides scripts for dropping all triggers and referential integrity constraints.
Scripts for resuming triggers and RI constraints are in the MDM installation folder: <MDM_Install_Home>/database/MDM/<DB_TYPE>/Standard/ddl/
After the initial load completes, run these scripts to resume the triggers and constraints.

Provide configuration parameters


Table 5-4 on page 223 shows our choices for the MUST MODIFY parameters. The main parameters of interest that were enabled or modified are as follows:
Enabled standardization and matching in the RDP for MDM jobs:
QS_PERFORM_ORG_MATCH set to 1
QS_PERFORM_PERSON_MATCH set to 1
QS_STAN_ADDRESS set to 1
QS_STAN_ORG_NAME set to 1
QS_STAN_PERSON_NAME set to 1

Modified the national ID from the default as follows:
QS_MATCH_ORG_NATID set to I2 (Corporate Tax Identification) (see footnote 15)
QS_MATCH_PERSON_NATID set to I1 (Social Security Number) (see footnote 16)
Modified the database connection details, file system directories, and other parameters for our particular environment.
Note: We made these changes in the configuration parameter file because we used the DL_000_DELTA_LOAD job rather than the DL_000_AutoStart_PS_DELTA_LOAD job.

15 The default was I8, which is the passport number. Having it as a national identification for an organization does not make sense. We therefore chose I2.
16 The default was C2, which is the business phone number. Having it as a national identification for a person does not make sense. We therefore chose I1.

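Table 5-4 RDP configuration parameters MUST MODIFY list

Frequency | Category | Subcategory | Parameter | Scenario value
One time | SETUP | Connection | DB_CONNECT_STRING | MDM_DB
One time | SETUP | Connection | DB_INSTANCE | db2inst1
One time | SETUP | Connection | DB_PASSWORD | encrypted value
One time | SETUP | Connection | DB_SCHEMA | DB2INST1. (note a)
One time | SETUP | Connection | DB_USERID | db2inst1
One time | SETUP | DS PARAMETER | $APT_DB2INSTANCE_HOME | /home/dsadm/remote_db2config
One time | SETUP | DS PARAMETER | $APT_IMPORT_PATTERN_USES_FILESET_MOUNTED | True
One time | SETUP | DS PARAMETER | $APT_STRING_PADCHAR | (blank)
One time | SETUP | DS PARAMETER | DS_PARALLEL_APT_CONFIG_FILE | /opt/IBM/InformationServer/Server/Configurations/MDM_Default.apt
One time | SETUP | DS PARAMETER | DS_SEQUENTIAL_APT_CONFIG_FILE | /opt/IBM/InformationServer/Server/Configurations/MDM_1X1.apt
One time | SETUP | Miscellaneous | MDM_DEPLOYMENT_NAME | WebSphere Customer Center
One time | SETUP | Miscellaneous | DS_LANGUAGE_TYPE_CODE | 100
One time | SETUP | File location | DS_SUPPORT_FILE_DIR | /data/RDP/FREQ/
One time | SETUP | File location | FS_DATA_SET_HEADER_DIR | /data/RDP/DATA/
One time | SETUP | File location | FS_ERROR_DIR | /data/RDP/ERROR/
One time | SETUP | File location | FS_LOG_DIR | /data/RDP/LOG/
One time | SETUP | File location | FS_PARAM_SET_DIR | ./ParameterSets/
One time | SETUP | File location | FS_REJECT_DIR | /data/RDP/REJECT/
One time | SETUP | File location | FS_SK_FILE_DIR | /data/RDP/SK/
One time | SETUP | File location | FS_TMP_DIR | /data/RDP/TMP/
One time | Runtime | Runtime | BATCH_ID (auto assigned) | canonical_HIER
One time | Runtime | Runtime | DS_PROCESSING_DATE (auto assigned) | 2008-05-30 09:36:10
One time | Runtime | Runtime | FS_HIERARCHY_SIF_FILE_PATTERN | /data/RDP/SIF_IN/canonical_1/*.hsif
One time | Runtime | Runtime | FS_SIF_FILE_PATTERN | /data/RDP/SIF_IN/canonical_1/*.sif
One time | ADVANCED | DS PARAMETER | $APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL | True
One time | ADVANCED | DS PARAMETER | $APT_IMPORT_PATTERN_USES_FILESET | True
One time | ADVANCED | DS PARAMETER | $APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS | True
One time | ADVANCED | DS PARAMETER | $APT_SORT_INSERTION_OPTIMIZATION | True
Recurring | SETUP | QualityStage | QS_MATCH_ORG_NATID (note b) | I2
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_NATID (note c) | I1
Recurring | SETUP | QualityStage | QS_PERFORM_ORG_MATCH (note d) | 1
Recurring | SETUP | QualityStage | QS_PERFORM_PERSON_MATCH (note e) | 1
Recurring | SETUP | QualityStage | QS_STAN_ADDRESS (note f) | 1
Recurring | SETUP | QualityStage | QS_STAN_ORG_NAME (note g) | 1
Recurring | SETUP | QualityStage | QS_STAN_PERSON_NAME (note h) | 1

Notes:
a. The period is required.
b. The default is the passport number; it should be I2, which is the Corporate Tax Identification.
c. The default setting of C2 equates to Business Phone Number, which is not a reasonable national ID document. We therefore changed it to I1, which is SSN.
d. We chose to perform Org match.
e. We chose to perform Person match.
f. We chose to perform standardization on address.
g. We chose to perform standardization on OrgName.
h. We chose to perform standardization on PersonName.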

Table 5-5 shows our choices for the CONSIDER MODIFYING parameters according to the guidelines in Table A-6 on page 290.
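Table 5-5 RDP configuration parameters in the CONSIDER MODIFYING list

Frequency | Category | Subcategory | Parameter | Recommendation
One time | SETUP | Miscellaneous | DS_SOURCE_DATE_FORMAT (note a) | %yyyy-%mm-%dd %hh:%nn:%ss.6
One time | SETUP | Miscellaneous | DS_USE_NATIVE_KEY | 1
One time | ADVANCED | SURROGATE | SK_MID_ADDRESS_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_ALERT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONT_EQUIV_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONT_REL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTACT_METHOD_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTR_COMPONENT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTRACT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTRACT_ROLE_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIER_ULT_PAR_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIERARCHY_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIERARCHY_NODE_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIERARCHY_REL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_IDENTIFIER_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_LOB_REL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_LOCATION_GROUP_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_MISCVALUE_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_NATIVE_KEY_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_ORG_NAME_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_PERSON_NAME_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_PERSON_SEARCH_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_PPREF_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_ROLE_LOCATION_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_SUSPECT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_PREFIX_CONT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_PREFIX_CONTRACT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_PREFIX_HIERARCHY_ID_NEXT_VAL | 1
Recurring | SETUP | QualityStage | QS_ALLOW_LOB_MATCH | 0
Recurring | SETUP | QualityStage | QS_EXCLUDE_FIELDS_FROM_MATCH_ORGANIZATION | (blank)
Recurring | SETUP | QualityStage | QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_1 (note b) | I2
Recurring | SETUP | QualityStage | QS_MATCH_ORG_2 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_3 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_4 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_1 | C1
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_2 | C3
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_3 | C5
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_4 | C7
Recurring | SETUP | QualityStage | QS_PHONETIC_CODING_TYPE_ADDRESS | QSNYSIIS
Recurring | SETUP | QualityStage | QS_PHONETIC_CODING_TYPE_ORGANIZATION | QSNYSIIS
Recurring | SETUP | QualityStage | QS_PHONETIC_CODING_TYPE_PERSON | QSNYSIIS
Recurring | SETUP | QualityStage | QS_REJECT_ADDRESS_IF_NOT_STANDARDIZED | 0
Recurring | SETUP | QualityStage | QS_REJECT_ORG_NAME_IF_NOT_STANDARDIZED | 0
Recurring | SETUP | QualityStage | QS_REJECT_PERSON_NAME_IF_NOT_STANDARDIZED | 0
Recurring | Error Handling | DROP | DS_DETECTED_DUPLICATES_ACTION | E
Recurring | Error Handling | DROP | DS_PARTY_DROP_SEVERITY_LEVEL (note c) | 0
Recurring | Error Handling | Notification | DS_EMAIL_ERROR_CHECK_DISTRIBUTION | itso@us.ibm.com
Recurring | Error Handling | Notification | DS_EMAIL_ERROR_CHECK_REPORT | 1
Recurring | Error Handling | Abort handling | DS_DROP_MAX_ITERATIONS | 10
Recurring | Error Handling | Abort handling | DS_FAILED_COLUMNIZATION_ACTION (note d) | C
Recurring | Error Handling | Abort handling | DS_FAILED_RECORDIZATION_ACTION (note e) | C
Recurring | Error Handling | Abort handling | DS_SIF_ERROR_THRESHOLD | 120
Recurring | Error Handling | Abort handling | DS_SIF_INDIVIDUAL_ERROR_THRESHOLD | 50
Recurring | Error Handling | Abort handling | DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT | 12

Notes:
a. We chose to adopt the time stamp format for finer-granularity information.
b. We did not have organizations in our input data. However, if we had, the default value of C1 (which is SSN) would not be appropriate.
c. Defines the severity level below which parties are dropped; we chose the least sensitive setting.
d. Defines what you want to do with a parsing failure; we chose C (Continue).
e. Defines what you want to do with a parsing failure; we chose C (Continue).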
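The DS_SOURCE_DATE_FORMAT value in Table 5-5 uses DataStage date and time tokens. Assuming that %nn denotes minutes and %ss.6 denotes seconds with six fractional digits, the format corresponds to the timestamps that appear in the P|P records of Example 5-4, such as 2001-03-17 00:00:00.000000. The sketch below translates only the tokens used in this one format into Python strptime codes and parses a sample value; it is not a general DataStage format converter.

```python
from datetime import datetime

# Token equivalents for the tokens used in DS_SOURCE_DATE_FORMAT (Table 5-5).
# Only these tokens are handled; this is not a general format converter.
DS_TO_STRPTIME = [
    ("%yyyy", "%Y"),   # four-digit year
    ("%mm", "%m"),     # two-digit month
    ("%dd", "%d"),     # two-digit day
    ("%hh", "%H"),     # two-digit hour (24-hour clock)
    ("%nn", "%M"),     # two-digit minutes (assumption stated above)
    ("%ss.6", "%S.%f"),  # seconds with six fractional digits
]


def ds_format_to_strptime(ds_format):
    """Replace each known DataStage token with its strptime equivalent."""
    result = ds_format
    for ds_token, py_token in DS_TO_STRPTIME:
        result = result.replace(ds_token, py_token)
    return result


SIF_TIMESTAMP_FORMAT = ds_format_to_strptime("%yyyy-%mm-%dd %hh:%nn:%ss.6")
parsed = datetime.strptime("2001-03-17 00:00:00.000000", SIF_TIMESTAMP_FORMAT)
```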

Launch RDP for MDM jobs


We launched DL_000_DELTA_LOAD using Director as shown in Figure 5-75.

Figure 5-75 Launch RDP for MDM jobs (1 of 2)

The Job Run Options are shown in Figure 5-76 on page 228. We checked the successful completion of the job in the Director client.
Note: RDP job names have changed since the initial release. Therefore, if you used the initial version of the RDP jobs, you will see references to new job names.
We proceed to verify the successful load of the MDM data repository as described in 5.5.7, Verify successful load on page 228.

Figure 5-76 Launch RDP for MDM jobs (2 of 2)

5.5.7 Verify successful load


After loading the MDM data repository using RDP for MDM jobs, you must verify that the load was successful as follows:
1. Re-establish referential integrity constraints and triggers.
2. Use the MDM Server UI to query select rows.

Re-establish referential integrity constraints and triggers


The referential integrity constraints and triggers in the MDM data repository that were dropped before the launch of the RDP for MDM jobs must be re-established before you query selected rows.
Note: If the RDP for MDM jobs have successfully validated all the rows, this process reports no errors. If RI constraints are found to be violated, an error (SQLSTATE 23512) is raised and the table is put into a check pending state. You must resolve these errors before proceeding further.
Resume dropped triggers and referential integrity constraints
Use the following command to re-create the triggers and referential integrity constraints that were dropped previously:
db2 -td@ -svf Filename

In the command, Filename is the file that contains the SQL to create all the triggers and referential integrity constraints.
Note: The scripts for resuming triggers and referential integrity constraints are in the MDM installation folder: <MDM_Install_Home>/database/MDM/<DB_TYPE>/Standard/ddl/ Run these scripts after the initial load completes to resume the triggers and constraints.
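The resume command uses -td@ rather than -t. A common reason for this in DB2 scripts is that compound trigger bodies contain semicolons, so a semicolon cannot also mark the end of each statement. The sketch below illustrates the effect with a naive splitter (it ignores quoted strings and comments) applied to a small hypothetical script; the trigger, table, and constraint names are invented for the illustration and are not taken from the MDM DDL.

```python
# Hypothetical script: one trigger with a compound body, then one RI constraint.
SCRIPT = """\
CREATE TRIGGER DEMO_TRG AFTER INSERT ON DEMO_PARENT
  REFERENCING NEW AS N FOR EACH ROW
  BEGIN ATOMIC
    UPDATE DEMO_AUDIT SET LAST_ID = N.ID;
    UPDATE DEMO_AUDIT SET ROW_COUNT = ROW_COUNT + 1;
  END@
ALTER TABLE DEMO_CHILD ADD CONSTRAINT DEMO_FK
  FOREIGN KEY (PARENT_ID) REFERENCES DEMO_PARENT (ID)@
"""


def split_statements(script, terminator):
    """Naively split a script into statements on a terminator character.

    Quoted strings and comments are not handled; this only illustrates why
    the terminator choice matters.
    """
    parts = (part.strip() for part in script.split(terminator))
    return [part for part in parts if part]


with_at = split_statements(SCRIPT, "@")         # trigger stays intact
with_semicolon = split_statements(SCRIPT, ";")  # trigger body is cut apart
```

Split on @, the script yields the complete trigger and the constraint as two statements; split on the semicolon, the trigger body is cut into fragments that would fail individually.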

Use the MDM Server UI to query select rows


After the RI constraints and triggers have been successfully re-instated, you can use the MDM Server UI to query selected rows to verify that they were loaded successfully. Figure 5-77 on page 230 through Figure 5-81 on page 234 show the search for, and successful retrieval of, information about a customer whose given name is Torben:
1. Open the MDM Server UI at its web address, and in the Navigation pane of the Party Maintenance Console, select Search → Party Search, as shown in Figure 5-77 on page 230. Then do the following steps:
a. In the Family Name field, type a question mark (?), which is a wildcard character.
b. In the Given Name 1 field, type Torben.
c. Click SUBMIT.

Chapter 5. Financial services business scenario


Figure 5-77 Verify successful load (1 of 5)


2. Figure 5-78 shows the results of the search and identifies only a single customer with a Family Name of Andersom as satisfying the conditions of the search. Select the customer by clicking the Party Id link as shown.

Figure 5-78 Verify successful load (2 of 5)

3. Figure 5-79 on page 232 shows the Master Data for this customer, such as Party Info, Addresses, Identifiers, and Contact Method. Click the icon corresponding to Next in the Identifiers portion, as shown, to view details of additional identifiers associated with Torben Andersom.
Note: The information associated with Torben Andersom has been merged from separate records in the input because the RDP for MDM jobs were able to automatically match (A1) the Torben Andersom records in the CHECKING, SAVINGS, and LOAN systems. For example, the Passport Number comes from the LOAN system, and the Social Security number comes from the CHECKING and SAVINGS systems.


Figure 5-79 Verify successful load (3 of 5)


4. Figure 5-80 shows details of Torben Andersom's driver's license. Click the Next icon again to view information about additional identifiers.
Note: Also review the Master Data of other important customers. When the retrieved information is confirmed to be accurate, you can conclude that the load by the RDP for MDM jobs was successful.

Figure 5-80 Verify successful load (4 of 5)


5. Figure 5-81 shows passport details of Torben Andersom.

Figure 5-81 Verify successful load (5 of 5)


5.6 Suspect resolution


If matching is enabled in the RDP for MDM job configuration parameters (QS_PERFORM_PERSON_MATCH and QS_PERFORM_ORG_MATCH both set to 1, as shown in Figure 5-82 on page 236), the RDP for MDM jobs are likely to identify certain parties conclusively as duplicates of each other and to consolidate the information from these parties into a single record in the MDM data repository. However, when the match score falls below the A1 cutoff value but above the B cutoff value (as shown in Figure 5-82 on page 236), the jobs cannot conclusively determine that the parties are duplicates, and therefore mark them as suspects for manual review.

The process of manually reviewing these suspects and resolving potential duplicates is called suspect resolution. The MDM Server UI provides the capability to find the identified suspects and resolve them, marking them as duplicates or not, as described in Figure 5-82 on page 236 through Figure 5-90 on page 244. Suspect resolution involves searching for suspects, reviewing their details, collapsing duplicates into a single record, and choosing the column values to store in the collapsed record.

Note: Resolving suspects through the real-time services of MDM Server ensures that they are not raised as possible duplicates again.
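The cutoff rule above can be summarized in a few lines. This sketch is illustrative only: the class, the enum, and the numeric cutoffs used in the example are hypothetical, it treats a score equal to a cutoff as reaching that cutoff (the text does not say how ties are handled), and the actual scores and cutoffs come from the QualityStage match specification used by the RDP for MDM jobs.

```java
// Hypothetical illustration of the A1/B cutoff decision; not the RDP for MDM implementation.
class MatchDecision {

    enum Outcome { DUPLICATE, SUSPECT, NO_MATCH }

    private final double a1Cutoff;
    private final double bCutoff;

    MatchDecision(double a1Cutoff, double bCutoff) {
        if (bCutoff > a1Cutoff) {
            throw new IllegalArgumentException("B cutoff must not exceed A1 cutoff");
        }
        this.a1Cutoff = a1Cutoff;
        this.bCutoff = bCutoff;
    }

    Outcome classify(double score) {
        if (score >= a1Cutoff) {
            return Outcome.DUPLICATE;   // consolidated into a single record during the load
        }
        if (score >= bCutoff) {
            return Outcome.SUSPECT;     // left for suspect resolution in the MDM Server UI
        }
        return Outcome.NO_MATCH;        // loaded as a separate party
    }
}
```

With hypothetical cutoffs of 20 (A1) and 10 (B), a score of 15 is classified as SUSPECT, which is the case the following steps resolve manually.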


The steps are as follows:
1. Use the web address for the MDM Server UI and navigate to Data Stewardship Console Navigation Party Suspect Processing Suspect Search in the Navigation pane, as shown in Figure 5-82. Click SEARCH to view up to 100 potential suspects among both persons and organizations.

Figure 5-82 Suspect resolution (1 of 9)


2. Figure 5-83 shows the results of the search and identifies a list of persons (we did not have organizations in our data), which have suspects that are associated with them. We chose to resolve potential duplicates associated with the person named JASTINDERK with Party Id 1120000000101288 by selecting the person and clicking the icon corresponding to Open Suspect List as shown.

Figure 5-83 Suspect resolution (2 of 9)


3. Figure 5-84 shows one other person, with Party Id 4070000000000788, who has the same name and a Match Reason code of A2 (A2 indicates that there is a reasonable certainty that the two records represent the same party). Because we believe this to be a duplicate, we select this suspect and click COLLAPSE CANDIDATES LIST to view the Collapsed Candidates List window, as shown in Figure 5-85 on page 239, which shows the single candidate (Party Id 4070000000000788) with the Suspect Status of Parties are Duplicates. Click PREVIEW COLLAPSE to review what the collapsed party information should be.

Figure 5-84 Suspect resolution (3 of 9)


Figure 5-85 Suspect resolution (4 of 9)


4. Figure 5-86 through Figure 5-89 on page 243 show the source data and the suspect data side by side and allow you to decide what the collapsed party column values should be. For example, in Figure 5-86 you can use the drop-down list for Marital Status to select Single. See Figure 5-87 on page 241.

Figure 5-86 Suspect resolution (5 of 9)


Figure 5-87 Suspect resolution (6 of 9)


The Solicitation Indicator can be set to any value from the drop-down list, as shown in Figure 5-88 and Figure 5-89 on page 243, even though neither the source nor the suspect has this information. Click SUBMIT when the collapsed party information has been updated to your satisfaction. The UI then sends a business service request to MDM Server to collapse the suspects.

Figure 5-88 Suspect resolution (7 of 9)


Figure 5-89 Suspect resolution (8 of 9)


5. Figure 5-90 shows the collapsed party information, which now has a new Party Id 869122548964091809.

Figure 5-90 Suspect resolution (9 of 9)


Note: Each time parties are collapsed, MDM Server invokes suspect processing for the newly created party to identify any new potential suspects. If you have modified the Standardization and Matching QualityStage rules in the RDP for MDM initial load, the same rules must be deployed in the MDM Server run time to ensure identical business logic between runtime service requests and the load. For more information about integrating runtime QualityStage rules with MDM Server, see the InfoSphere MDM Server Developer Guide.

Repeat this process for all the persons that have suspects associated with them. You can now integrate the real-time services of MDM Server into your existing applications, as described earlier. In our scenario, we wrote a new application to consume master data in MDM Server, as described in 5.8, MDM consumption application on page 266.
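The column-by-column decision made in the collapse preview windows (step 4) can be pictured as a simple merge. The following sketch is conceptual, not the MDM Server collapse service: the class and column names are hypothetical, and it assumes a precedence of data steward choice, then the source party's value, then the suspect's value. It also shows how a value such as the Solicitation Indicator can appear in the collapsed record even though neither party supplied it.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of choosing collapsed column values; not an MDM Server API.
class CollapsePreview {

    /**
     * Builds the collapsed record column by column: the steward's choice wins,
     * then the source party's value, then the suspect's value. Columns with no
     * value anywhere are left out.
     */
    static Map<String, String> collapse(Map<String, String> source,
                                        Map<String, String> suspect,
                                        Map<String, String> stewardChoices) {
        // Keep a stable column order: source columns first, then any extra columns.
        Set<String> columns = new LinkedHashSet<>(source.keySet());
        columns.addAll(suspect.keySet());
        columns.addAll(stewardChoices.keySet());

        Map<String, String> collapsed = new LinkedHashMap<>();
        for (String column : columns) {
            String value = stewardChoices.get(column);
            if (value == null) {
                value = source.get(column);
            }
            if (value == null) {
                value = suspect.get(column);
            }
            if (value != null) {
                collapsed.put(column, value);
            }
        }
        return collapsed;
    }
}
```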

5.7 Hierarchies
Hierarchies provide a view of relationships between parties/contacts. The deletion of a party/contact, or the collapse of multiple parties/contacts into a single party/contact, can have an impact on existing hierarchy structures.

5.7.1 Hierarchy overview


MDM supports a directed acyclic graph hierarchy. You can define multiple hierarchies (the primary key is the HIERARCHY_ID column in the HIERARCHY table). A hierarchy includes hierarchy nodes (the primary key is the HIERARCHY_NODE_ID column in the HIERARCHYNODE table), and each hierarchy node has an INSTANCE_PK column whose value matches the Party Id of the corresponding party/contact. Each hierarchy node belongs to a single hierarchy, identified by the HIERARCHY_ID column in the HIERARCHYNODE table, which is a foreign key to the HIERARCHY table. A party/contact may have zero or many hierarchy nodes associated with it, and through those nodes it may be associated with multiple hierarchies.

Each node must reference a valid hierarchy using the Hierarchy Name (such as Legal, Marketing, and Finance) and TypeCode (1, 2, and 3). The RDP for MDM data model supports only party/contact hierarchies, even though MDM Server also supports product hierarchies.
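The table relationships described above can be sketched as a small in-memory model. The record types and the helper method are hypothetical; only the column names HIERARCHY_ID, HIERARCHY_NODE_ID, and INSTANCE_PK come from the text. The helper shows how a single party reaches several hierarchies through its hierarchy nodes.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory model of the HIERARCHY and HIERARCHYNODE tables.
class HierarchyModel {

    // One row of HIERARCHY, keyed by HIERARCHY_ID.
    record Hierarchy(long hierarchyId, String name) {}

    // One row of HIERARCHYNODE: hierarchyId is the foreign key to HIERARCHY,
    // instancePk holds the Party Id of the party/contact the node represents.
    record HierarchyNode(long hierarchyNodeId, long hierarchyId, long instancePk) {}

    /** IDs of all hierarchies a party belongs to, found through its hierarchy nodes. */
    static List<Long> hierarchiesForParty(List<HierarchyNode> nodes, long partyId) {
        return nodes.stream()
                .filter(n -> n.instancePk() == partyId)
                .map(HierarchyNode::hierarchyId)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }
}
```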


Figure 5-91 illustrates the concept, as follows:
- Three hierarchies: National, Western Region, and Eastern Region
- Six parties: Austin, Bill, Charles, David, Estelle, and Frank
- Twelve hierarchy nodes in all: six hierarchy nodes (corresponding to Austin, Bill, Charles, David, Estelle, and Frank) associated with the National hierarchy, three hierarchy nodes (corresponding to Austin, Bill, and Charles) associated with the Western Region hierarchy, and three hierarchy nodes (corresponding to David, Estelle, and Frank) associated with the Eastern Region hierarchy

Note: Each party has two corresponding hierarchy nodes associated with it.

All six parties belong to the National hierarchy and form a reporting hierarchy in which Estelle and Frank report to David, who in turn reports to Charles. Bill and Charles report to Austin, who is at the top of the hierarchy and is defined as the hierarchy ultimate parent. Three parties (Austin, Bill, and Charles) are associated with the Western Region hierarchy; Austin is again at the top of this hierarchy and is defined as its hierarchy ultimate parent. Three parties (David, Estelle, and Frank) are associated with the Eastern Region hierarchy; David is at the top of this hierarchy and is defined as its hierarchy ultimate parent.

Note: Business rules have been defined to ensure that a cyclic graph does not occur.

Figure 5-91 diagram: three hierarchies (National, Western Region, and Eastern Region) containing hierarchy nodes for Austin (hierarchy ultimate parent, HUP, of National and Western Region), Bill, Charles, David (HUP of Eastern Region), Estelle, and Frank.

Figure 5-91 Hierarchy scenario example
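The business rule that prevents a cyclic graph can be illustrated with a reachability check: a parent-to-child link is refused if the child can already reach the parent. This is a minimal sketch that uses party names in place of hierarchy node IDs; it is not the MDM Server rule implementation, and the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of keeping a hierarchy acyclic; not an MDM Server API.
class AcyclicHierarchy {

    private final Map<String, Set<String>> children = new HashMap<>();

    /** Adds parent -> child and returns true, or returns false if it would create a cycle. */
    boolean addRelationship(String parent, String child) {
        if (parent.equals(child) || reachable(child, parent)) {
            return false;
        }
        children.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
        return true;
    }

    // Iterative depth-first search from 'from' along parent -> child links.
    private boolean reachable(String from, String target) {
        Deque<String> stack = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        stack.push(from);
        while (!stack.isEmpty()) {
            String node = stack.pop();
            if (node.equals(target)) {
                return true;
            }
            if (seen.add(node)) {
                for (String next : children.getOrDefault(node, Set.of())) {
                    stack.push(next);
                }
            }
        }
        return false;
    }
}
```

Loading the National hierarchy from Figure 5-91 with this class accepts every link, while an attempt to make Frank the parent of Austin is rejected, because Austin already reaches Frank through Charles and David.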


Hierarchy data is processed as a separate feed after all other party/contact data has been validated, matched, and assigned keys, and has been loaded into the MDM data repository. The input hierarchy data is validated against the hierarchy and party/contact information already in the MDM data repository. The Hierarchy RT/ST data (Table B-20 on page 307 through Table B-23 on page 308) is processed in the same manner as the non-hierarchy RT/ST party/contact and contract data.

5.7.2 Hierarchy scenario


After successfully loading the MDM data repository for the FBankCoT scenario and performing suspect resolution, we defined a single hierarchy of parties (persons and organizations) that shows the relationship of customers to the bank's marketing organizations.

Note: There are no organizations in our FBankCoT data. For the purposes of creating a hierarchy, we loaded organization records into the MDM data repository. The loading of these organization records is not described here.

Figure 5-92 on page 248 shows the MARKETING hierarchy, the various party hierarchy nodes (persons are shown as ovals and organizations as rectangles), and the hierarchy node relationships defined for the FBankCoT scenario. We combined persons and organizations in the same hierarchy. In our scenario, some organizations have no persons under them, a situation that is unlikely in a production environment.


Figure 5-92 diagram: the MARKETING hierarchy, with US Wide Marketing at the root; state nodes State - California, State - Massachusetts, and State - Washington; local nodes Local Marketing San Jose, Local Marketing San Francisco, Local Marketing Eugene, Local Marketing Salem, and Local Marketing Seattle; and customer nodes Torben Andersom, Carol Hansson, Anna Fanelli, Bruce H Anderson, Anders Olsson, Arcangelo Fanelli, Yesica Anderson, A Carter, Christina Anderson, Alex Skov, Denise Farrel, Kurt Madi, Barry Rosen, Aaron Jensen, Anton T & Larue Jensen, Allan Jensen, Brandon Jensen, Andrew I Jensen, Steven C Preston, Renee Jackson, and Jackie Jackson.

Figure 5-92 FBankCoT hierarchy scenario

In this section, we describe the following information:
- The SIF hierarchy RT/ST records (that we created manually) to define the hierarchy, hierarchy nodes, hierarchy relationships, and hierarchy ultimate parent
- RDP for MDM jobs Director output
- Verify successful hierarchy creation using MDM Server UI


SIF hierarchy RT/ST records


The hierarchy RT/ST records (HH, HN, HR, and HU) in the SIF that correspond to the hierarchy shown in Figure 5-92 on page 248 are shown in Example 5-5. The sequence of the SIF records is immaterial because the RDP for MDM jobs sort them into the required sequence.
Example 5-5 SIF Hierarchy RT/ST records

H|R|A|MARKETING|2|1000001|3|1000001|7||||0|0|
H|R|A|MARKETING|2|1000001|2|1000001|9||||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000031|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|7|1000001|200000281|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000051|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|6|1000001|200000201|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000141|Leaf Node Link|||0|0|
H|N|A|MARKETING|2|1000001|7|CONTACT|City|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|5|CONTACT|City|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|200000141|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000171|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000251|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000181|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000121|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000211|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|R|A|MARKETING|2|1000001|1|1000001|2||||0|0|
H|R|A|MARKETING|2|1000001|1|1000001|4||||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000161|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|6|1000001|200000251|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000181|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000121|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|6|1000001|200000211|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000081|Leaf Node Link|||0|0|
H|N|A|MARKETING|2|1000001|2|CONTACT|State|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|4|CONTACT|State|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|9|CONTACT|City|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|200000081|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000221|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000261|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000301|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000131|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|H|A|MARKETING|2|United States of America|||0|0|
H|U|A|MARKETING|2|1000001|1|United States Of America|||0|0|
H|R|A|MARKETING|2|1000001|1|1000001|3||||0|0|
H|R|A|MARKETING|2|1000001|2|1000001|5||||0|0|
H|R|A|MARKETING|2|1000001|8|1000001|200000291|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000071|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000041|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|6|1000001|200000241|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000011|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|6|1000001|200000221|Leaf Node Link|||0|0|
H|N|A|MARKETING|2|1000001|6|CONTACT|City|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|1|CONTACT|United States Of America|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|200000011|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000091|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000031|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000281|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000051|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000201|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|R|A|MARKETING|2|1000001|2|1000001|6||||0|0|
H|R|A|MARKETING|2|1000001|4|1000001|8||||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000171|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|6|1000001|200000261|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|9|1000001|200000301|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000131|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000001|Leaf Node Link|||0|0|
H|R|A|MARKETING|2|1000001|5|1000001|200000091|Leaf Node Link|||0|0|
H|N|A|MARKETING|2|1000001|3|CONTACT|State|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|8|CONTACT|City|||1||0|0|0|0|
H|N|A|MARKETING|2|1000001|200000001|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000161|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000291|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000071|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000041|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
H|N|A|MARKETING|2|1000001|200000241|CONTACT|CUSTOMER|||1|LOCAL|0|0|0|0|
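To check hand-built hierarchy records before a load, you can rebuild the tree from the HR records. This sketch assumes a field layout inferred from Example 5-5, not taken from the SIF specification: the seventh and ninth pipe-delimited fields (indexes 6 and 8) are taken to be the parent and child node keys. Verify the layout against 2.3, Standard Interface File (SIF) on page 16 before relying on it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical checker for SIF hierarchy relationship (HR) records.
class SifHierarchyReader {

    /** Parent key -> child keys, built from HR records only; other record types are ignored. */
    static Map<String, List<String>> relationships(List<String> sifLines) {
        Map<String, List<String>> tree = new TreeMap<>();
        for (String line : sifLines) {
            // -1 keeps trailing empty fields so positions stay stable.
            String[] f = line.split("\\|", -1);
            // Assumed layout: H|R|action|hierarchy|type|parentSys|parentKey|childSys|childKey|...
            if (f.length > 8 && f[0].equals("H") && f[1].equals("R")) {
                tree.computeIfAbsent(f[6], k -> new ArrayList<>()).add(f[8]);
            }
        }
        return tree;
    }

    /** Keys that appear as a parent but never as a child: candidate ultimate parents. */
    static Set<String> roots(Map<String, List<String>> tree) {
        Set<String> roots = new TreeSet<>(tree.keySet());
        tree.values().forEach(roots::removeAll);
        return roots;
    }
}
```

Under this assumed layout, applying roots() to all the HR records in Example 5-5 returns only node 1, which agrees with the HU record that names node 1 as the hierarchy ultimate parent.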


RDP for MDM jobs director output


The DL_200_Hierarchy job processes the SIF containing the hierarchy RT/STs:
1. Figure 5-93 shows the launch of the DL_200_Hierarchy job from the Director and the selection of the canonical_1 file as the configuration parameter file for this run; the contents of the canonical_1 file are not shown here.
2. The Director Client shows the successful execution of this job, or any errors and warnings associated with this job run.

Figure 5-93 RDP for MDM jobs Director output


Verify successful hierarchy creation


We used the MDM Server UI to verify that the hierarchies were successfully created, as shown in Figure 5-94 through Figure 5-108 on page 265.
1. Expand the navigation pane in the MDM Server UI and click Search Hierarchy By Party, as shown in Figure 5-94.

Figure 5-94 Hierarchy view using MDM Server UI (1 of 15)


2. In the Party Search pane (Figure 5-95), enter the following information:
a. In the Family Name field, type Andersom.
b. In the Given Name 1 field, type Torben.
c. Click Submit to view the results of the search.

Figure 5-95 Hierarchy view using MDM Server UI (2 of 15)


3. Figure 5-96 shows one qualifying person in the Party Search Results pane. Click the link 1180000000001888 under Party Id to view the associated hierarchy information.

Figure 5-96 Hierarchy view using MDM Server UI (3 of 15)


4. Figure 5-97 shows Torben Andersom belonging to the MARKETING hierarchy. To view more information about the MARKETING hierarchy, click the MARKETING link under Hierarchy Name.

Figure 5-97 Hierarchy view using MDM Server UI (4 of 15)

5. Figure 5-98 on page 256 and Figure 5-99 on page 257 show the FULL VIEW of the MARKETING hierarchy that shows the ancestors and the children in this hierarchy.


Figure 5-98 Hierarchy view using MDM Server UI (5 of 15)


To view more details of a specific node in the hierarchy such as Marketing California, click the corresponding link as shown in Figure 5-99.

Figure 5-99 Hierarchy view using MDM Server UI (6 of 15)


6. Figure 5-100 shows details of the specific node Marketing - California. Click RETURN to go back to the previous window.

Figure 5-100 Hierarchy view using MDM Server UI (7 of 15)


7. Click US-Wide Marketing(UP) in Figure 5-101 to view details of this node (root or hierarchy ultimate parent).

Figure 5-101 Hierarchy view using MDM Server UI (8 of 15)


8. Figure 5-102 shows the node details, where Ultimate Parent is Yes, identifying this node as the root.

Figure 5-102 Hierarchy view using MDM Server UI (9 of 15)


9. You may view the same information (FULL VIEW, ANCESTORS VIEW, DESCENDENTS VIEW) by searching on an organization with Party Id 2030000000200388 as shown in Figure 5-103 through Figure 5-108 on page 265.

Figure 5-103 Hierarchy view using MDM Server UI (10 of 15)


Figure 5-104 Hierarchy view using MDM Server UI (11 of 15)

Figure 5-105 Hierarchy view using MDM Server UI (12 of 15)


Figure 5-106 Hierarchy view using MDM Server UI (13 of 15)


Figure 5-107 Hierarchy view using MDM Server UI (14 of 15)


Figure 5-108 Hierarchy view using MDM Server UI (15 of 15)


5.8 MDM consumption application


The business requirement for the MDM solution implementation was one of coexistence, where Master Data was maintained in the MDM repository and synchronized with the changes occurring in the source system (or systems) at the end of each day. As mentioned previously, you typically integrate the real-time services of MDM Server into your existing applications to access the master data therein. However, our scenario involved writing a new, simple MDM consumption application that obtains a 360-degree view of a customer, where Master Data is obtained from MDM Server through a web service call and non-Master Data is retrieved from the corresponding source systems (checking, savings, and loan). The application provides a GUI to search on first name and last name in the MDM repository, returning the address (Master Data from the MDM repository) and the balance (non-Master Data) from the appropriate checking, savings, and loan source systems.

The MDM consumption application, which you can download from the IBM Redbooks website at ftp://www.redbooks.ibm.com/redbooks/SG247704, was developed as a JSP and J2EE application. It uses a JSP to provide the GUI, shown in Figure 5-109 on page 267, which allows you to search on first name and last name, and it performs the following functions:
1. It uses the PartyServiceProxy() web service to obtain the party ID and address information for persons matching the search criteria. The code invoking the web service is highlighted in Example 5-6 on page 269.
Note: In our sample application, we did not provide for wildcard searches, and we assumed that the search would return either zero rows or one row from the MDM repository with the associated party ID, an assumption that is highly unlikely to hold in a real-world environment.
2. This party ID is then used to retrieve the address information from the MDM repository and the corresponding source system keys (SSKs) for the checking, savings, and loan systems, as highlighted in Example 5-6 on page 269.
3. The SSKs are then used to connect to the DB2 for LUW source system (or systems) to retrieve the balance (non-key) data (as highlighted in Example 5-6 on page 269), which is presented back to the user as shown in Figure 5-110 on page 268.


In this case, the customer Renee Jackson has only a savings account, and no checking or loan accounts.
4. Figure 5-111 on page 268 and Figure 5-112 on page 269 show the customer Torben Andersom, who has accounts in the Checking, Savings, and Loan systems.

Note: The code shown in Example 5-6 on page 269 is only meant to show the web service calls and the subsequent access to the source systems. It has no error handling, which is essential in a real-world application.

The CUSTOMERID value in the ADMIN_CLIENT_ID field of the CONTEQUIV table in MDM is used to access the balance information for the Checking and Loan systems. For the Savings system, the ADMIN_CLIENT_ID value was generated from the SAVINGSID by appending an additional character. The MDM consumption application strips this additional character to get the SAVINGSID key, which is then used to retrieve the balance.

Figure 5-109 MDM consumption application (1 of 4)


Figure 5-110 MDM consumption application (2 of 4)

Figure 5-111 MDM consumption application (3 of 4)


Figure 5-112 MDM consumption application (4 of 4)

The code invoking the web service is highlighted in Example 5-6.


Example 5-6 FBankCoT 360 view test.jsp
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<%@page language="java" contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1"%>
<%@page import="com.ibm.www.xmlns.prod.websphere.wcc.common.intf.schema.Control" %>
<%@page import="com.ibm.www.xmlns.prod.websphere.wcc.party.intf.schema.PersonSearchResultsResponse" %>
<%@page import="com.ibm.www.xmlns.prod.websphere.wcc.party.port.PartyServiceProxy" %>
<%@page import="com.ibm.www.xmlns.prod.websphere.wcc.party.schema.PersonSearch" %>
<%@page import="com.ibm.www.xmlns.prod.websphere.wcc.party.schema.PersonSearchResult"%>
<%@page import="com.ibm.www.xmlns.prod.websphere.wcc.party.intf.schema.PartyAdminSysKeyResponse"%>
<%@page import="java.sql.*"%>
<html>
<head>
<title>test</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<meta name="GENERATOR" content="Rational Application Developer">
</head>
<body>
<%
if(request.getParameter("query") != null){
    // retrieve the first and last name as search parameters
    String lastname = request.getParameter("lastname");
    String firstname = request.getParameter("firstname");
    // Retrieve the party
    try{
        PartyServiceProxy myPSP = new PartyServiceProxy();
        PersonSearch myPSrch = new PersonSearch();
        myPSrch.setLastName(lastname);
        myPSrch.setGivenNameOne(firstname);
        Control myControl = new Control();
        myControl.setRequesterName("wasadmin");
        myControl.setRequestId(12312);
        PersonSearchResultsResponse myPSRR = myPSP.searchPerson(myControl, myPSrch);
%>
<table border="1" width="70%">
<tr><td colspan="2"><p>Customer Details:</td></tr>
<%
        PersonSearchResult myPSrchR = myPSRR.getSearchResult(0);
%>
<tr><td colspan="2"><p><%=myPSrchR.getGivenNameOne()%> <%=myPSrchR.getLastName()%>
<%
        PersonSearch myPSResult = myPSrchR.getMatchedFields();
        long partyId = myPSResult.getPartyId().longValue();
%>
<p style="line-height: normal"><%=myPSResult.getAddrLineOne()%><br>
<%=myPSResult.getCityName() %></p></td></tr>
<%
        // get the balance for checking from DB
        PartyAdminSysKeyResponse myPASKR = null;
        String strCustId = null;
        boolean proceedFlag = false;
        try{
            myPASKR = myPSP.getPartyAdminSysKeyByPartyId(myControl,"1000000",partyId);
            strCustId = myPASKR.getAdminSysKey().getAdminSysPartyId();
            proceedFlag = true;
        } catch (Exception e){
            proceedFlag = false;
        }
        // define the vars so that we can re-use them.
        javax.sql.DataSource ds = null;
        java.sql.Connection con = null;
        String query = null;
        PreparedStatement stmt = null;
        int custid = 0;
        ResultSet rs = null;
        javax.naming.InitialContext ctx = new javax.naming.InitialContext();
        ds = (javax.sql.DataSource) ctx.lookup("jdbc/test");
        con = ds.getConnection("db2inst1","itso13sj");
        if(proceedFlag){
            query = "select balance from db2inst1.checking " + " where customerid =?";
            stmt = con.prepareStatement(query);
            custid = Integer.parseInt(strCustId);
            stmt.setInt(1, custid);
            rs = stmt.executeQuery();
            if(rs.next()){
%>
<tr><td width="40%">Balance for Checking Account:</td><td align="right"><%=rs.getBigDecimal("balance")%></td></tr>
<%
            }
            stmt.close();
        } else {
%>
<tr><td colspan="2"><p> The customer does not have a checking account.</p></td></tr>
<%
        }
        // now get the balance for the savings account
        try{
            myPASKR = myPSP.getPartyAdminSysKeyByPartyId(myControl,"1000001",partyId);
            strCustId = myPASKR.getAdminSysKey().getAdminSysPartyId();
            // strip the extra character that was appended to the SAVINGSID
            strCustId = strCustId.substring(0,strCustId.length()-1);
            proceedFlag = true;
        } catch (Exception e){
            proceedFlag = false;
        }
        if(proceedFlag){
            query = "select balance from db2inst1.savings where savingsid =?";
            custid = Integer.parseInt(strCustId);
            stmt = con.prepareStatement(query);
            stmt.setInt(1, custid);
            rs = stmt.executeQuery();
            if(rs.next()){
%>
<tr><td width="40%">Balance for Savings Account:</td><td align="right"><%=rs.getBigDecimal("balance")%></td></tr>
<%
            }
            stmt.close();
        } else {
%>
<tr><td colspan="2"><p> The customer does not have a savings account.</p></td></tr>
<%
        }
        // now get the balance for the Loan account
        try{
            myPASKR = myPSP.getPartyAdminSysKeyByPartyId(myControl,"1000002",partyId);
            strCustId = myPASKR.getAdminSysKey().getAdminSysPartyId();
            proceedFlag = true;
        } catch (Exception e){
            proceedFlag = false;
        }
        if(proceedFlag){
            query = "select balance from db2inst1.loan where customerid =?";
            custid = Integer.parseInt(strCustId);
            stmt = con.prepareStatement(query);
            stmt.setInt(1, custid);
            rs = stmt.executeQuery();
            if(rs.next()){
%>
<tr><td width="40%">Balance for Loan:</td><td align="right"><%=rs.getBigDecimal("balance")%></td></tr>
<%
            }
            stmt.close();
        } else {
%>
<tr><td colspan="2"><p> The customer does not have a Loan.</p></td></tr>
<%
        }
        // return the pooled connection to the data source
        con.close();
%>
</table>
<%
    } catch(Exception e) {
        e.printStackTrace();
    }
} else {
%>
<p>Please enter the search criteria:
<form action="test.jsp" method="get">
<p>First Name: <input type="text" name="firstname"/>
<p>Last Name: <input type="text" name="lastname"/>
<input type="hidden" name="query"/>
<p><input type="submit" value="Submit"/>
</form>
<%
}
%>
</body>
</html>
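Because Example 5-6 has no error handling, a production version would at least close its JDBC resources reliably. The following sketch restructures the source-system lookup with try-with-resources and keeps the same admin system IDs, tables, and key columns as the JSP, including stripping the extra trailing character from the savings key. The class and method names are hypothetical, and the Connection is assumed to come from the same jdbc/test data source.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Map;
import java.util.Optional;

// Hypothetical restructuring of the balance lookups in Example 5-6.
class SourceBalanceLookup {

    // Admin system ID (as passed to getPartyAdminSysKeyByPartyId) -> balance query.
    static final Map<String, String> BALANCE_SQL = Map.of(
            "1000000", "select balance from db2inst1.checking where customerid = ?",
            "1000001", "select balance from db2inst1.savings where savingsid = ?",
            "1000002", "select balance from db2inst1.loan where customerid = ?");

    /** Converts the ADMIN_CLIENT_ID stored in MDM into the numeric source system key. */
    static int sourceKey(String adminSystemId, String adminClientId) {
        String key = adminClientId;
        if ("1000001".equals(adminSystemId)) {
            // Savings keys were loaded with one extra trailing character; strip it.
            key = key.substring(0, key.length() - 1);
        }
        return Integer.parseInt(key);
    }

    /** Returns the balance, or empty if the source system has no row (or a NULL balance). */
    static Optional<BigDecimal> balance(Connection con, String adminSystemId, String adminClientId)
            throws SQLException {
        String sql = BALANCE_SQL.get(adminSystemId);
        if (sql == null) {
            throw new IllegalArgumentException("Unknown admin system: " + adminSystemId);
        }
        try (PreparedStatement stmt = con.prepareStatement(sql)) {
            stmt.setInt(1, sourceKey(adminSystemId, adminClientId));
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? Optional.ofNullable(rs.getBigDecimal("balance")) : Optional.empty();
            }
        }
    }
}
```

Returning an empty Optional lets the page render "The customer does not have a savings account" without relying on a flag variable, and the statement and result set are closed even when a query fails.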

5.9 Operational processing


After the initial load succeeds, FBankCoT needs to update the MDM repository with changed master data at periodic intervals. For this purpose, an operational load is scheduled at specific intervals during the week. Although there are other options for operational load processing, such as using the RDP for MDM jobs, for this example we used the MDM RDP assets, which use the MDM runtime maintenance services to process the operational load. This section explains the steps required to implement the operational load for FBankCoT using the MDM RDP assets.

The steps for processing the operational load are as follows:
1. Extract delta data from the source systems. Because we have three data sources, we receive three operational data files. This extraction can also be done using the ETL tool, depending on the customer situation. However, for this example, we assume that no direct connectivity is available for ETL, so FBankCoT has scheduled a batch process on the source systems to produce data files for daily operational processing. These files can contain updated records or new records from the source systems.
2. Use DataStage to convert the files received from the source systems to SIF, in the same way as for the initial load.
3. After converting the data to one SIF, convert the SIF file into a Sequence SIF file, because the batch processor utility that we use requires the data to be in a specific sequence. A DataStage job named IL_000_Autostart_EX in the DataStage and QualityStage RDP assets converts the SIF file to the Sequence SIF format, as shown in Figure 5-113.
4. See 2.3, Standard Interface File (SIF) on page 16 for details of the sequence file format.

Figure 5-113 Job IL_000_Autostart_EX

5. Open the batch_extension.properties file, which is located in the following folder:


/opt/IBM/MDM/CAM_MDM902_08192010_2159_DB2_BE01/BatchProcessor/properties

Modify the file by setting the following parameter:
ParseAndExecConfiguration.Parser = TCRMService
6. Open the Batch.properties file, which is located in the following folder:
/opt/IBM/MDM/CAM_MDM902_08192010_2159_DB2_BE01/BatchProcessor/properties

Modify the file by setting the following two parameters:


ServerConfiguration.provider_url=corbaloc:iiop:<ServerName:portNumber>

For example:
ServerConfiguration.provider_url=corbaloc:iiop:gandalf.torolab.ibm.com:9825
ServerConfiguration.context_factory=<CTX_FACTORY>
For example:
ServerConfiguration.context_factory=com.ibm.websphere.naming.WsnInitialContextFactory

Chapter 5. Financial services business scenario


7. The sequence SIF file is now ready and the required parameters are set in the properties files, so you can call the batch processor. On Linux, the batch processor utility (runbatch.sh) is in the following directory:
/opt/IBM/MDM/CAM_MDM902_08192010_2159_DB2_BE01/BatchProcessor/bin/

Run the following command:
runbatch.sh inputFile outputPath batch_extensionPropertiesFileName
For example:
runbatch.sh /opt/IBM/MDM/Regression/BatchFramework/seed/delta.sif /opt/IBM/MDM/Regression/BatchFramework/seed/logs batch_extension
8. Check the log files in the folder specified as the output path to confirm that the delta load ran successfully.
9. Go to the output directory, which in this example is:
/opt/IBM/MDM/Regression/BatchFramework/seed/logs
The directory contains three log files:
batchLoadSuccess.out
batchLoadFail.out
batchLoadSuspect.out
10. Review the batchLoadFail.out file for any records that failed to load into MDM Server. It contains error messages and error reason codes for debugging purposes.
11. After you determine that no errors are listed in the output log files, go to the data stewardship console and verify that the delta load changes are reflected in the MDM repository. See 5.5.7, Verify successful load on page 228 for the process of checking data in the data stewardship UI.
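Parts of steps 6 through 10 can be scripted. The following Java sketch is a hypothetical helper and not part of the MDM Server batch processor. Run it with `pre <Batch.properties>` before step 7 to confirm that provider_url has the single-address corbaloc:iiop:host:port form shown in step 6. Run it with `post <outputPath>` after step 8 to list the non-blank lines of batchLoadFail.out. The sketch makes these assumptions:
- the class name and the pre/post arguments are invented for illustration;
- batchLoadFail.out is empty when no records failed, because its record layout is not described here;
- Java 11 or later is available.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper for the operational load steps (Java 11+).
// It is not part of the MDM Server batch processor.
class BatchRunChecks {

    // Assumes the single-address form corbaloc:iiop:<host>:<port>.
    private static final Pattern PROVIDER_URL =
            Pattern.compile("corbaloc:iiop:([^:/,]+):(\\d{1,5})");

    /** Before the run: returns "host:port" from provider_url, or throws if it is malformed. */
    static String checkProviderUrl(Properties batchProps) {
        String url = batchProps.getProperty("ServerConfiguration.provider_url", "").trim();
        Matcher m = PROVIDER_URL.matcher(url);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected provider_url: '" + url + "'");
        }
        int port = Integer.parseInt(m.group(2));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return m.group(1) + ":" + port;
    }

    /** After the run: non-blank lines of batchLoadFail.out in the output path. */
    static List<String> failureLines(Path outputDir) throws IOException {
        Path fail = outputDir.resolve("batchLoadFail.out");
        if (!Files.exists(fail)) {
            throw new IOException("No batchLoadFail.out in " + outputDir);
        }
        // ISO-8859-1 maps every byte to a character, so reading never fails on encoding.
        return Files.readAllLines(fail, StandardCharsets.ISO_8859_1).stream()
                .filter(line -> !line.isBlank())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 2 && args[0].equals("pre")) {
            Properties batchProps = new Properties();
            try (var in = Files.newInputStream(Paths.get(args[1]))) {
                batchProps.load(in);
            }
            System.out.println("Provider: " + checkProviderUrl(batchProps));
        } else if (args.length == 2 && args[0].equals("post")) {
            List<String> failures = failureLines(Paths.get(args[1]));
            System.out.println(failures.isEmpty()
                    ? "batchLoadFail.out is empty"
                    : failures.size() + " line(s) in batchLoadFail.out to review");
        } else {
            System.err.println("usage: BatchRunChecks pre <Batch.properties> | post <outputPath>");
        }
    }
}
```

For the example in this section, the post-run call would pass /opt/IBM/MDM/Regression/BatchFramework/seed/logs as the output path.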


Appendix A.

Configuration parameter file


A number of parameters are provided to control the execution of the RDP for MDM jobs. They are listed in Table A-1 on page 278 through Table A-4 on page 284, with a brief description and their default values. For ease of understanding, we classified the parameters into broad categories and sub-categories based on their function. We then identified the parameters in these categories that must be modified (Table A-5 on page 288) before the RDP for MDM jobs can be executed. We also identified those that you should consider modifying before the jobs are executed (Table A-6 on page 290).

Note: For a detailed description of these parameters, see the IBM WebSphere DataStage and QualityStage Version 8 Parallel Job Developer Guide, SC18-9891.

Some parameters work in conjunction with the MDM CONFIGELEMENT table, as described in 2.2, MDMIS Parameter Set on page 8. For those parameters, the following tables also list the names of the CONFIGELEMENT records from which they are derived.
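To make the relationship between a parameter and its CONFIGELEMENT record concrete, the following Java sketch resolves parameter values from CONFIGELEMENT name/value pairs. It is an illustration only, and the actual mechanism is the one described in 2.2. The sketch assumes the following:
- the CONFIGELEMENT rows have already been read into a map keyed by element name (the query is not shown);
- a stored value takes precedence over the table default;
- the class name is hypothetical;
- the stored value 6 is sample data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of deriving job parameters from CONFIGELEMENT records.
class ConfigElementResolver {

    /** One parameter, the CONFIGELEMENT record name it is derived from, and its table default. */
    static final class Mapping {
        final String parameter;
        final String configElement;
        final String defaultValue;

        Mapping(String parameter, String configElement, String defaultValue) {
            this.parameter = parameter;
            this.configElement = configElement;
            this.defaultValue = defaultValue;
        }
    }

    /** Uses the stored CONFIGELEMENT value when present, otherwise the table default. */
    static Map<String, String> resolve(List<Mapping> mappings, Map<String, String> configElements) {
        Map<String, String> params = new LinkedHashMap<>();
        for (Mapping m : mappings) {
            params.put(m.parameter, configElements.getOrDefault(m.configElement, m.defaultValue));
        }
        return params;
    }

    public static void main(String[] args) {
        List<Mapping> mappings = List.of(
                new Mapping("QS_A1_MATCH_CUTOFF_PERSON",
                        "/IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/a1", "205"),
                new Mapping("DS_PARTY_DROP_SEVERITY_LEVEL",
                        "/IBM/ELMDM/IIS/Errors/partyDropSeverity/level", "4"));
        // Sample stored value, standing in for a row read from the CONFIGELEMENT table.
        Map<String, String> stored = Map.of("/IBM/ELMDM/IIS/Errors/partyDropSeverity/level", "6");
        System.out.println(resolve(mappings, stored));
    }
}
```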

Copyright IBM Corp. 2009, 2011. All rights reserved.


Categories and sub-categories


The broad categories and sub-categories that are defined are as follows:

- SETUP category (Table A-1 on page 278) identifies parameters that are associated with setting up the environment for the RDP for MDM jobs to run. Examples are the database instance to connect to (DB_INSTANCE), the location of the various libraries ($APT_DB2INSTANCE_HOME), and the error file directory (FS_ERROR_DIR). We defined the following sub-categories:
  - Connection sub-category includes the parameters to access the MDM repository database, such as DB_CONNECT_STRING, DB_INSTANCE, and $APT_DB2INSTANCE_HOME.
  - DS PARAMETER sub-category includes parameters that identify the DataStage code libraries and configuration files, such as DS_PARALLEL_APT_CONFIG_FILE.
  - Miscellaneous sub-category includes parameters such as the date format in the SIF file (DS_SOURCE_DATE_FORMAT) and the language code (DS_LANGUAGE_TYPE_CODE).
  - File location sub-category includes parameters that identify the paths of the various files, such as error files (FS_ERROR_DIR), log files (FS_LOG_DIR), and parameter sets (FS_PARAM_SET_DIR).
  - QualityStage sub-category includes parameters that specify whether standardization and matching should be performed, such as QS_STAN_ADDRESS, QS_STAN_PERSON_NAME, QS_STAN_ORG_NAME, QS_PERFORM_PERSON_MATCH, and QS_PERFORM_ORG_MATCH. If standardization and matching are requested, the parameters that support these functions can be customized, such as QS_A1_MATCH_CUTOFF_PERSON, QS_A2_MATCH_CUTOFF_PERSON, and QS_MATCH_PERSON_NATID.
- ERROR HANDLING category (Table A-2 on page 282) identifies parameters that define the action to take when the RDP for MDM jobs detect errors. Examples are the action to take when duplicates are detected in the SIF (DS_DETECTED_DUPLICATES_ACTION), the maximum number of iterative drop runs before a job is aborted (DS_DROP_MAX_ITERATIONS), and the severity level at or below which parties are dropped (DS_PARTY_DROP_SEVERITY_LEVEL).


  We defined the following sub-categories:
  - DROP sub-category includes parameters that define when records should be dropped given an error condition, such as DS_DETECTED_DUPLICATES_ACTION and DS_PARTY_DROP_SEVERITY_LEVEL.
  - Notification sub-category includes parameters that identify the persons to be notified when errors occur, such as DS_EMAIL_ERROR_CHECK_DISTRIBUTION and DS_EMAIL_ERROR_CHECK_REPORT.
  - Abort handling sub-category includes parameters that specify the error conditions and thresholds that should cause the job to abort, such as DS_DROP_MAX_ITERATIONS, DS_FAILED_COLUMNIZATION_ACTION, and DS_SIF_INDIVIDUAL_ERROR_THRESHOLD.
  - Error Consolidation sub-category includes parameters that specify whether a party or contract should be dropped when a specific error occurs, such as DROP_ON_PRVBY_ERR and DROP_ON_REPLBY_ERR.
- RUNTIME category (Table A-3 on page 284) identifies parameters that must be provided at run time to uniquely identify that execution. They include the SIF files to be processed (FS_SIF_FILE_PATTERN and FS_HIERARCHY_SIF_FILE_PATTERN), the processing date (DS_PROCESSING_DATE), and the batch ID (BATCH_ID). There are no sub-categories.
- ADVANCED category (Table A-4 on page 284) identifies parameters that afford greater control over the performance and functionality of the RDP for MDM jobs. Examples are the type of history records created (LOAD_HISTORY_FLAG) and the columns used to calculate the checksum for address matching (DS_MD5_CRITICAL_ADDRESS_COLUMNS). Others are the next value to use in surrogate key generation (SK_MID_CONT_ID_NEXT_VAL) and the file that holds it (SK_MID_CONT_ID_SF). We defined the following sub-categories:
  - DEBUG sub-category includes parameters for debugging, such as DS_LAND_FILE_FLAG.
  - DS PARAMETER sub-category includes parameters that control the DataStage environment variables, such as $APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL and $APT_NO_SORT_INSERTION.
  - HISTORY sub-category includes the LOAD_HISTORY_FLAG parameter, which controls history record creation.


  - SETUP sub-category includes the DS_MD5_CRITICAL_ADDRESS_COLUMNS parameter, which specifies the columns used to calculate the checksum for address matching.
  - SURROGATE sub-category parameters specify, for the various identifiers, the next surrogate key value to use (for example, SK_MID_CONT_ID_NEXT_VAL) and the file it is taken from (for example, SK_MID_CONT_ID_SF).
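The match cutoff parameters in the QualityStage sub-category act as thresholds on a match score. The person cutoffs are QS_A1_MATCH_CUTOFF_PERSON, QS_A2_MATCH_CUTOFF_PERSON and QS_B_MATCH_CUTOFF_PERSON, with defaults 205, 175 and 150 in Table A-1. The meaning of the A1, A2, and B categories is described in 3.6, Matching. The following Java sketch shows how such cutoffs can band a score. It assumes that a score falls into the highest band whose cutoff it meets or exceeds, and that a score below the B cutoff is not treated as a match. The class and enum names are hypothetical, and the sketch is not the QualityStage match implementation.

```java
// Hypothetical sketch: place a match score into a cutoff band.
// Assumes score >= cutoff means "in the band" and that cutoffs satisfy A1 >= A2 >= B.
class MatchBand {

    enum Band { A1, A2, B, NO_MATCH }

    private final double a1Cutoff;
    private final double a2Cutoff;
    private final double bCutoff;

    MatchBand(double a1Cutoff, double a2Cutoff, double bCutoff) {
        if (!(a1Cutoff >= a2Cutoff && a2Cutoff >= bCutoff)) {
            throw new IllegalArgumentException("Expected A1 >= A2 >= B cutoffs");
        }
        this.a1Cutoff = a1Cutoff;
        this.a2Cutoff = a2Cutoff;
        this.bCutoff = bCutoff;
    }

    Band classify(double score) {
        if (score >= a1Cutoff) return Band.A1;
        if (score >= a2Cutoff) return Band.A2;
        if (score >= bCutoff) return Band.B;
        return Band.NO_MATCH;
    }

    public static void main(String[] args) {
        // Person defaults from Table A-1: A1 = 205, A2 = 175, B = 150.
        MatchBand person = new MatchBand(205, 175, 150);
        for (double score : new double[] {230, 190, 160, 120}) {
            System.out.println(score + " -> " + person.classify(score));
        }
    }
}
```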
Table A-1 RDP configuration parameters by the SETUP category

Sub category: Connection
DB_CONNECT_STRING, DB_USERID, DB_PASSWORD, DB_SCHEMA, DB_CLIENT_INSTANCE, DB_SERVER_INSTANCE, DB_ALIAS (default (blank) for each): Values to control the database access. They vary between environments and are set in the MDM_CONNECTION parameter set.
$APT_DB2INSTANCE_HOME (default /home/dsadm/remote_db2config).

Sub category: DS PARAMETER
DS_STRING_PADCHAR (default 0x0): Pad character to be used in the Load jobs.
DS_PARALLEL_APT_CONFIG_FILE (default /opt/IBM/InformationServer/Server/Configurations/MDM_Default.apt): DataStage configuration file for parallel jobs.
DS_SEQUENTIAL_APT_CONFIG_FILE (default /opt/IBM/InformationServer/Server/Configurations/MDM_1X1.apt): DataStage configuration file for sequential jobs.

Sub category: Miscellaneous
DS_SOURCE_DATE_FORMAT (default %yyyy-%mm-%dd): Timestamp format in the SIF files.
MDM_DEPLOYMENT_NAME (default WebSphere Customer Center): MDM deployment name required by the jobs that read and write the CONFIGELEMENT table. It must match the deployed MDM application name so that the jobs update the correct values.
DS_LANGUAGE_TYPE_CODE (default 100): MDM language ID. 100 (the default) = English.
DS_USE_NATIVE_KEY (CONFIGELEMENT /IBM/ELMDM/IIS/Contract/useNativeKey/enabled): Use NativeKey for Contract_ID resolution - 1 (true) / 0 (false).

Sub category: File location
DS_SUPPORT_FILE_DIR (default /mdmisdata03/Projects/MDMISINT3/FREQ/): Directory where required files are installed, for instance the FREQUENCY files used by QS Match. At present, these seem to be the only files stored there.
FS_DATA_SET_HEADER_DIR (default /mdmisdata03/Projects/MDMISINT3/DATA/; CONFIGELEMENT /IBM/ELMDM/IIS/Install/ISDataSetHeaders/path): Data set headers directory, where the .ds file descriptors are stored. The actual data set data is stored in the database.
FS_ERROR_DIR (default /mdmisdata03/Projects/MDMISINT3/ERROR/; CONFIGELEMENT /IBM/ELMDM/IIS/Install/ErrorFiles/path): Error files directory.
FS_LOG_DIR (default /mdmisdata03/data/MDMIS/LOG/): Log files directory.
FS_PARAM_SET_DIR (default ./ParameterSets/): Parameter set directory.
FS_REJECT_DIR (default /mdmisdata03/Projects/MDMISINT3/REJECT/; CONFIGELEMENT /IBM/ELMDM/IIS/Install/RejectFiles/path): Reject files directory.
FS_SK_FILE_DIR (default /mdmisdata03/Projects/MDMISINT3/SK/; CONFIGELEMENT /IBM/ELMDM/IIS/Install/SKFiles/path): Surrogate key files directory.
FS_TMP_DIR (default /mdmisdata03/data/MDMIS/TMP/): Temporary files directory.

Sub category: QualityStage
QS_A1_MATCH_CUTOFF_ORGANIZATION (default 205; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/MatchScores/a1): Organization A1 (see note a) minimum match score.
QS_A1_MATCH_CUTOFF_PERSON (default 205; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/a1): Person A1 minimum match score.
QS_A2_MATCH_CUTOFF_ORGANIZATION (default 175; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/MatchScores/a2): Organization A2 (see note b) minimum match score.
QS_A2_MATCH_CUTOFF_PERSON (default 175; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/a2): Person A2 minimum match score.
QS_ALLOW_LOB_MATCH (default 0; CONFIGELEMENT /IBM/Party/SuspectProcessing/PersistDuplicateParties/enabled): Allow match across line of business - 1 (true) / 0 (false).
QS_B_MATCH_CUTOFF_ORGANIZATION (default 150; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/MatchScores/b): Organization B minimum match score.
QS_B_MATCH_CUTOFF_PERSON (default 150; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/MatchScores/b): Person B minimum match score.
QS_EXCLUDE_FIELDS_FROM_MATCH_ORGANIZATION (default (blank); CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/<field>/enabled, where <field> is OrgAddress, OrgCity, OrgCountry, OrgState, OrgPostCode, OrgEstablishedDate, OrgMatchString1, OrgMatchString2, OrgMatchString3, OrgMatchString4, or OrgNationalID): Select the critical data fields for organization match.
QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON (default (blank); CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/<field>/enabled, where <field> is PersonAddress, PersonBirthDate, PersonCity, PersonCountry, PersonGender, PersonMatchString1, PersonMatchString2, PersonMatchString3, PersonMatchString4, PersonNationalID, PersonPostCode, or PersonState): Select the critical data fields for individual match.
QS_MATCH_ORG_1 (default I1; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString1/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString1/TpCd): Specify the variable match string type and TpCd for organization. The values I1 through I12 correspond to entries in the ID_TP_CD column of the CDIDTP table. For example, I1 corresponds to the Social Security Number, as shown in Figure A-1 on page 292. The MDM UI allows you to add up to 4 user-specified columns to include in the match besides the 8 pre-specified ones, as shown in Figure A-1 on page 292. The actual values are MDM codes denoting the available match identifiers.
QS_MATCH_ORG_2 (default I2; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString2/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString2/TpCd).
QS_MATCH_ORG_3 (default (blank); CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString3/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString3/TpCd).
QS_MATCH_ORG_4 (default I3; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString4/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgMatchString4/TpCd).
QS_MATCH_ORG_NATID (default I8; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgNationalID/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Org/OrgNationalID/TpCd): Specify the variable match national ID for organization. I8 (the default) corresponds to the passport number. The MDM UI allows you to specify the document used for the national ID (driver's license, passport number, SSN, and so on).
QS_MATCH_PERSON_1 (default C1; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString1/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString1/TpCd): Specify the variable match string type and TpCd for person. The values C1 through C8 correspond to entries in the CONT_METH_TP_CD column of the CDCONTMETHTP table, as shown in Figure A-2 on page 293. The MDM UI allows you to add up to 4 user-specified columns to include in the match besides the pre-specified ones. The actual values are MDM codes denoting the available match contact methods.
QS_MATCH_PERSON_2 (default C3; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString2/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString2/TpCd).
QS_MATCH_PERSON_3 (default C5; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString3/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString3/TpCd).
QS_MATCH_PERSON_4 (default C7; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString4/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonMatchString4/TpCd).
QS_MATCH_PERSON_NATID (default C2; CONFIGELEMENT /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonNationalID/type and /IBM/ELMDM/IIS/CriticalDataFieldsMatch/Person/PersonNationalID/TpCd): Specify the variable match national ID for person. C2 is the default. The MDM UI allows you to specify the document used for the national ID (driver's license, passport number, SSN, and so on).
QS_PERFORM_ORG_MATCH (default 0; CONFIGELEMENT /IBM/Party/SuspectProcessing/enabled): Perform organization match - 1 (true) / 0 (false).
QS_PERFORM_PERSON_MATCH (default 0; CONFIGELEMENT /IBM/Party/SuspectProcessing/enabled): Perform person match - 1 (true) / 0 (false).
QS_PHONETIC_CODING_TYPE_ADDRESS (default QSNYSIIS; CONFIGELEMENT /IBM/ELMDM/IIS/StandardizeAddress/PhoneticCodingType): May be QSSOUNDEX or custom.
QS_PHONETIC_CODING_TYPE_ORGANIZATION (default QSNYSIIS; CONFIGELEMENT /IBM/ELMDM/IIS/StandardizeOrganizationName/PhoneticCodingType): May be QSSOUNDEX or custom.
QS_PHONETIC_CODING_TYPE_PERSON (default QSNYSIIS; CONFIGELEMENT /IBM/ELMDM/IIS/StandardizePersonName/PhoneticCodingType): May be QSSOUNDEX or custom.
QS_REJECT_ADDRESS_IF_NOT_STANDARDIZED (CONFIGELEMENT /IBM/ELMDM/IIS/StandardizeAddress/RejectOnFail): Reject the record if standardization fails - 1 (true) / 0 (false). Condition: standardization leaves unhandled data AND STREET_NAME, BOX_ID, and DEL_ID are all null.
QS_REJECT_ORG_NAME_IF_NOT_STANDARDIZED (default 0; CONFIGELEMENT /IBM/ELMDM/IIS/StandardizeOrganizationName/RejectOnFail): Reject the record if standardization fails - 1 (true) / 0 (false).
QS_REJECT_PERSON_NAME_IF_NOT_STANDARDIZED (default 0; CONFIGELEMENT /IBM/ELMDM/IIS/StandardizePersonName/RejectOnFail): Reject the record if standardization fails - 1 (true) / 0 (false).
QS_STAN_ADDRESS (default 0; CONFIGELEMENT /IBM/Party/Standardizer/Address/className): If not equal to com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandarizerAdapter, standardization is bypassed for address.
QS_STAN_ORG_NAME (default 0; CONFIGELEMENT /IBM/Party/Standardizer/Name/className): If not equal to com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandarizerAdapter, standardization is bypassed for name.
QS_STAN_PERSON_NAME (default 0; CONFIGELEMENT /IBM/Party/Standardizer/Name/className): If not equal to com.ibm.mdm.thirdparty.integration.iis8.adapter.InfoServerStandarizerAdapter, standardization is bypassed for name.

Notes:
a. A1 is described in 3.6, Matching on page 47.
b. A2 is described in 3.6, Matching on page 47.

Table A-2 RDP configuration parameters by the ERROR HANDLING category
Sub category: DROP
DS_DETECTED_DUPLICATES_ACTION (default E; CONFIGELEMENT /IBM/ELMDM/IIS/Errors/detectedDuplicated/action): Action to take if duplicate records (same key only) are detected in the SIF file. The duplicate records are removed from the input. E: error all duplicates / K: keep the first, error the others.
DS_PARTY_DROP_SEVERITY_LEVEL (default 4; CONFIGELEMENT /IBM/ELMDM/IIS/Errors/partyDropSeverity/level): A party is dropped if it has errors with severity <= DS_PARTY_DROP_SEVERITY_LEVEL. Severity levels range from 0 (worst) to 10 (least severe).

Sub category: Notification
DS_EMAIL_ERROR_CHECK_DISTRIBUTION: Space-separated list of email addresses to receive the error count report of SIF errors by file. Whether the job aborts is controlled by three parameters: DS_SIF_ERROR_THRESHOLD, DS_SIF_INDIVIDUAL_ERROR_THRESHOLD, and DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT.
DS_EMAIL_ERROR_CHECK_REPORT (default 1): Flag that indicates whether the error report (of SIF file error counts) should be emailed at all. Whether the job aborts is controlled by the same three parameters.

Sub category: Abort handling
DS_DROP_MAX_ITERATIONS (default 10): Maximum number of times the job Contract_Iterative_Drop or Party_Iterative_Drop runs. The job aborts when this level is reached, which indicates problems with the data. Either fix the data, or increase this value and restart the job.
DS_FAILED_COLUMNIZATION_ACTION (default F; CONFIGELEMENT /IBM/ELMDM/IIS/Errors/failedColunmization/action): Action if ANY row fails columnization - F: Fail, that is, abort the job (default) / C: Continue.
DS_FAILED_RECORDIZATION_ACTION (default F; CONFIGELEMENT /IBM/ELMDM/IIS/Errors/failedRecordization/action): Action if ANY row fails recordization - F: Fail, that is, abort the job (default) / C: Continue. Warning: setting this parameter to C may break the row counter.
DS_SIF_ERROR_THRESHOLD (default 101; CONFIGELEMENT /IBM/ELMDM/IIS/Errors/failedIfInputRowsInError/percentage): Percentage of ALL SIF records with errors that causes the job stream to abort. Any value above 100 skips this check.
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD (default 101): Percentage of an individual SIF file's records with errors that causes the job stream to abort. Any value above 100 skips this check.
DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT (default 101): Number of individual SIF files whose error threshold has been exceeded that is required for an abort.

Sub category: Error Consolidation (MDM_EC parameter set)
DROP_ON_ASSIGNEDBY_ERR: Applies when the identifier's assigned by party was dropped. 0: do not drop the party, but drop the identifier record. 1: drop the party. Reason code 100385, severity <= party drop severity level.
DROP_ON_FROM_ERR: Contact Rel from party error action. 0: do not drop the party, but drop the contact rel record. 1: drop the party. Reason code 100383, severity <= party drop severity level.
DROP_ON_PRVBY_ERR: Provided by party ID RI validation. 0: the party is not dropped. 1: drop parties when the provided by party was dropped. Reason code 100381, severity <= party drop severity level.
DROP_ON_REPLBY_ERR: REPLBY contract ID RI validation. 0: the contract is not dropped. 1: drop contracts when the REPLBY contract was dropped. Reason code 100388, severity <= party drop severity level.
DUPLICATES_ERR_MSG_TP_CD (default 12): Error message type code associated with a duplicate primary key. Used in conjunction with MDMIS.DS_DETECTED_DUPLICATES_ACTION.
INCLUDE_LIST_ERR_MSG_TP_CD (default |0|110127|110184|110125|110126|1626|110208|110209|110385|): ErrorCodeIncludeList, a pipe-delimited list of error codes used for error thresholding in the job EC_Error_Check.
RESET_ON_ASSIGNEDBY_ERR: Reset the assigned by party to null if the assigned by party was dropped. This parameter is only useful when DROP_ON_ASSIGNEDBY_ERR=0. Error reason code 100391, severity <= party drop severity level.
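The three abort thresholds in Table A-2 combine as follows when read literally. The job stream aborts in either of two cases:
- the percentage of all SIF records in error exceeds DS_SIF_ERROR_THRESHOLD;
- the number of SIF files whose own error percentage exceeds DS_SIF_INDIVIDUAL_ERROR_THRESHOLD reaches DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT.

A percentage threshold above 100 disables its check. The Java sketch below encodes that reading. The following points are assumptions:
- the comparisons are strict (>) for percentages and inclusive (>=) for the file count;
- the class name is hypothetical;
- the real checks run in the RDP error-check job (EC_Error_Check), which is not shown here.

```java
import java.util.Map;

// Hypothetical reading of the SIF error-threshold parameters in Table A-2.
class SifErrorThresholds {

    final int overallPct;     // DS_SIF_ERROR_THRESHOLD; above 100 disables the check
    final int perFilePct;     // DS_SIF_INDIVIDUAL_ERROR_THRESHOLD; above 100 disables the check
    final int filesForAbort;  // DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT

    SifErrorThresholds(int overallPct, int perFilePct, int filesForAbort) {
        this.overallPct = overallPct;
        this.perFilePct = perFilePct;
        this.filesForAbort = filesForAbort;
    }

    /**
     * @param counts per SIF file name: {records in error, total records}
     * @return true if the job stream should abort under this reading
     */
    boolean shouldAbort(Map<String, long[]> counts) {
        long errors = 0;
        long total = 0;
        int filesOverThreshold = 0;
        for (long[] c : counts.values()) {
            errors += c[0];
            total += c[1];
            if (perFilePct <= 100 && c[1] > 0 && 100.0 * c[0] / c[1] > perFilePct) {
                filesOverThreshold++;
            }
        }
        boolean overall = overallPct <= 100 && total > 0 && 100.0 * errors / total > overallPct;
        boolean perFile = perFilePct <= 100 && filesOverThreshold >= filesForAbort;
        return overall || perFile;
    }

    public static void main(String[] args) {
        // Sample counts; with the defaults (101, 101, 101) no check is active.
        Map<String, long[]> counts = Map.of(
                "party.sif", new long[] {5, 100},
                "contract.sif", new long[] {60, 100});
        System.out.println(new SifErrorThresholds(101, 101, 101).shouldAbort(counts));
        System.out.println(new SifErrorThresholds(101, 50, 1).shouldAbort(counts));
    }
}
```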


Table A-3 RDP configuration parameters by the RUNTIME category

Sub category: Runtime
FS_HIERARCHY_SIF_FILE_PATTERN (default /mdmisdata03/Projects/MDMISINT3/SIF_IN/sanitycheck/*.hsif; CONFIGELEMENT /IBM/ELMDM/IIS/Install/HierarchySIFInputFiles/path): Hierarchy SIF file pattern, including the full path and file mask. All files that match this pattern are read by the RDP jobs.
FS_SIF_FILE_PATTERN (default /mdmisdata03/Projects/MDMISINT3/SIF_IN/sanitycheck/*.sif; CONFIGELEMENT /IBM/ELMDM/IIS/Install/SIFInputFiles/path): SIF file pattern, including the full path and file mask. All files that match this pattern are read by the RDP jobs.
BATCH_ID (auto assigned): Batch ID generated at run time if IL_000_AutoStart_EX is used. If IL_000_INITIAL_LOAD is used, the batch ID is assigned through the parameter set. The batch ID is appended to every output file name generated during the job run.
DS_PROCESSING_DATE (auto assigned; default 1900-01-01 00:00:00): Generated at run time. Can be used to fix the processing date if you restart the load at a later date.
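FS_SIF_FILE_PATTERN and FS_HIERARCHY_SIF_FILE_PATTERN each combine a directory with a file mask. The Java sketch below previews which files a pattern would select, using the standard java.nio PathMatcher. It assumes shell-style glob semantics (a * does not cross directory boundaries), a mask only in the last path element, and a Linux file system where * is a legal path character. It is not how the RDP jobs themselves read the files, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical preview of the files selected by a SIF file pattern such as
// /mdmisdata03/Projects/MDMISINT3/SIF_IN/sanitycheck/*.sif (glob semantics assumed).
class SifPatternPreview {

    static List<Path> matchingFiles(String pattern) throws IOException {
        Path full = Paths.get(pattern);
        Path dir = full.getParent() != null ? full.getParent() : Paths.get(".");
        // Only the last path element is treated as a mask; the directory must be literal.
        PathMatcher mask = FileSystems.getDefault().getPathMatcher("glob:" + full.getFileName());
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> mask.matches(p.getFileName()))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        String pattern = args.length > 0 ? args[0]
                : "/mdmisdata03/Projects/MDMISINT3/SIF_IN/sanitycheck/*.sif";
        matchingFiles(pattern).forEach(System.out::println);
    }
}
```

With this reading, a *.sif mask does not pick up .hsif hierarchy files, so the two patterns select disjoint sets of files.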

Table A-4 RDP configuration parameters by the ADVANCED category

Sub category: DEBUG
DS_LAND_FILE_FLAG (default 0): When set to 1, only the Insert and Update data sets are created and the load jobs are not run. 0 means that the database is updated.

Sub category: DS PARAMETER
$APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL, $APT_IMPORT_PATTERN_USES_FILESET, $APT_IMPORT_PATTERN_USES_FILESET_MOUNTED, $APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS, $APT_NO_PART_INSERTION, $APT_NO_SORT_INSERTION, $APT_SORT_INSERTION_OPTIMIZATION, $APT_OLD_BOUNDED_LENGTH (default True for each).

Sub category: HISTORY
LOAD_HISTORY_FLAG (default C; CONFIGELEMENT /IBM/ELMDM/IIS/Install/History/type): History flag that sets the history record creation type - C: Compound / S: Simple / N: None.

Sub category: SETUP
DS_MD5_CRITICAL_ADDRESS_COLUMNS (default ADDR_LINE_ONE,ADDR_LINE_TWO,ADDR_LINE_THREE,CITY_NAME,POSTAL_CODE,PROV_STATE_TP_CD,COUNTRY_TP_CD,RESIDENCE_NUM; CONFIGELEMENT /IBM/ELMDM/IIS/MD5CriticalAddressColumns/value): The columns used to calculate the MD5 checksum used in address "matching".

Sub category: SURROGATE
SK_LOAD_SUFFIX (default 88; CONFIGELEMENT /IBM/ELMDM/IIS/Key/LoadSuffix/value): Constant value that is appended to each surrogate key. This avoids possible key collisions with IDs generated by MDM services.
SK_MASK (default PPPMMMMMMMMMMMSS; CONFIGELEMENT /IBM/ELMDM/IIS/Key/Mask/value): Format of surrogate keys, for example PPPMMMMMMMMMMMSS, where P = set size of the cyclical sequence, M = set size of the mid sequence, and S = set size of the load suffix.
For the NEXT_VAL parameters that follow, the CONFIGELEMENT names are relative to /IBM/ELMDM/IIS/StartAtKeys/.
SK_MID_ADDRESS_ID_NEXT_VAL (default 1; CONFIGELEMENT AddressID/part2): Surrogate key value for ADDRESS_ID.
SK_MID_ADDRESS_ID_SF (default skMid_ADDRESS_ID.sf): The file that holds the previous surrogate key.
SK_MID_ALERT_ID_NEXT_VAL (default 1; CONFIGELEMENT AlertID/part2): Surrogate key value for ALERT_ID.
SK_MID_ALERT_ID_SF (default skMid_ALERT_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONT_EQUIV_ID_NEXT_VAL (default 1; CONFIGELEMENT ContEquivID/part2): Surrogate key value for CONT_EQUIV_ID.
SK_MID_CONT_EQUIV_ID_SF (default skMid_Contacts_CONTEQUIV_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONT_ID_NEXT_VAL (default 1; CONFIGELEMENT ContactID/part2): Surrogate key value for CONT_ID.
SK_MID_CONT_ID_SF (default skMid_Contacts_CONT_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONT_REL_ID_NEXT_VAL (default 1; CONFIGELEMENT ContactRelID/part2): Surrogate key value for CONT_REL_ID.
SK_MID_CONT_REL_ID_SF (default skMid_ContactRel_CONT_REL_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONTACT_METHOD_ID_NEXT_VAL (default 1; CONFIGELEMENT ContactMethodID/part2): Surrogate key value for CONTACT_METHOD_ID.
SK_MID_CONTACT_METHOD_ID_SF (default skMid_CONTACT_METHOD_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL (default 1; CONFIGELEMENT ContractComponentValID/part2): Surrogate key value for CONTR_COMP_VAL_ID.
SK_MID_CONTR_COMP_VAL_ID_SF (default skMid_CONTR_COMP_VAL_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONTR_COMPONENT_ID_NEXT_VAL (default 1; CONFIGELEMENT ContractComponentID/part2): Surrogate key value for CONTR_COMPONENT_ID.
SK_MID_CONTR_COMPONENT_ID_SF (default skMid_CONTR_COMPONENT_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONTRACT_ID_NEXT_VAL (default 1; CONFIGELEMENT ContractID/part2): Surrogate key value for CONTRACT_ID.
SK_MID_CONTRACT_ID_SF (default skMid_CONTRACT_ID.sf): The file that holds the previous surrogate key.
SK_MID_CONTRACT_ROLE_ID_NEXT_VAL (default 1; CONFIGELEMENT ContractRoleID/part2): Surrogate key value for CONTRACT_ROLE_ID.
SK_MID_CONTRACT_ROLE_ID_SF (default skMid_CONTRACT_ROLE_ID.sf): The file that holds the previous surrogate key.
SK_MID_HIER_ULT_PAR_ID_NEXT_VAL (default 1; CONFIGELEMENT HierarchyUltimateParentID/part2): Surrogate key value for HIER_ULT_PAR_ID.
SK_MID_HIER_ULT_PAR_ID_SF (default skMid_HIER_ULT_PAR_ID.sf): The file that holds the previous surrogate key.
SK_MID_HIERARCHY_ID_NEXT_VAL (default 1; CONFIGELEMENT HierarchyID/part2): Surrogate key value for HIERARCHY_ID.
SK_MID_HIERARCHY_ID_SF (default skMid_HIERARCHY_ID.sf): The file that holds the previous surrogate key.
SK_MID_HIERARCHY_NODE_ID_NEXT_VAL (default 1; CONFIGELEMENT HierarchyNodeID/part2): Surrogate key value for HIERARCHY_NODE_ID.
SK_MID_HIERARCHY_NODE_ID_SF (default skMid_HIERARCHY_NODE_ID.sf): The file that holds the previous surrogate key.
SK_MID_HIERARCHY_REL_ID_NEXT_VAL (default 1; CONFIGELEMENT HierarchyRelID/part2): Surrogate key value for HIERARCHY_REL_ID.
SK_MID_HIERARCHY_REL_ID_SF (default skMid_HIERARCHY_REL_ID.sf): The file that holds the previous surrogate key.
SK_MID_IDENTIFIER_ID_NEXT_VAL (default 1; CONFIGELEMENT IdentifierID/part2): Surrogate key value for IDENTIFIER_ID.
SK_MID_IDENTIFIER_ID_SF (default skMid_Identifier_IDENTIFIER_ID.sf): The file that holds the previous surrogate key.
SK_MID_LOB_REL_ID_NEXT_VAL (default 1; CONFIGELEMENT LOBRelID/part2): Surrogate key value for LOB_REL_ID.
SK_MID_LOB_REL_ID_SF (default skMid_LOB_REL_ID.sf): The file that holds the previous surrogate key.
SK_MID_LOCATION_GROUP_ID_NEXT_VAL (default 1; CONFIGELEMENT LocationGroup/part2): Surrogate key value for LOCATION_GROUP_ID.
SK_MID_LOCATION_GROUP_ID_SF (default skMid_LOCATION_GROUP_ID.sf): The file that holds the previous surrogate key.
SK_MID_MISCVALUE_ID_NEXT_VAL (default 1; CONFIGELEMENT MiscValueID/part2): Surrogate key value for MISCVALUE_ID.
SK_MID_MISCVALUE_ID_SF (default skMid_MISCVALUE_ID.sf): The file that holds the previous surrogate key.
SK_MID_NATIVE_KEY_ID_NEXT_VAL (default 1; CONFIGELEMENT NativeKeyID/part2): Surrogate key value for NATIVE_KEY_ID.
SK_MID_NATIVE_KEY_ID_SF (default skMid_NativeKey_NATIVE_KEY_ID.sf): The file that holds the previous surrogate key.
SK_MID_ORG_NAME_ID_NEXT_VAL (default 1; CONFIGELEMENT OrgName/part1): Surrogate key value for ORG_NAME_ID.
SK_MID_ORG_NAME_ID_SF (default skMid_OrgName_ORG_NAME_ID.sf): The file that holds the previous surrogate key.
SK_MID_PERSON_NAME_ID_NEXT_VAL (default 1; CONFIGELEMENT PersonName/part2): Surrogate key value for PERSON_NAME_ID.
SK_MID_PERSON_NAME_ID_SF (default skMid_PersonName_PERSON_NAME_ID.sf): The file that holds the previous surrogate key.
SK_MID_PERSON_SEARCH_ID_NEXT_VAL (default 1; CONFIGELEMENT PersonSearch/part2): Surrogate key value for PERSON_SEARCH_ID.
SK_MID_PERSON_SEARCH_ID_SF (default skMid_PersonName_PERSON_SEARCH_ID.sf): The file that holds the previous surrogate key.
SK_MID_PPREF_ID_NEXT_VAL (default 1; CONFIGELEMENT PPrefID/part2): Surrogate key value for PPREF_ID.
SK_MID_PPREF_ID_SF (default skMid_PrivPref_PPREF_ID.sf): The file that holds the previous surrogate key.
SK_MID_ROLE_LOCATION_ID_NEXT_VAL (default 1; CONFIGELEMENT RoleLocationID/part2): Surrogate key value for ROLE_LOCATION_ID.
SK_MID_ROLE_LOCATION_ID_SF (default skMid_ROLE_LOCATION_ID.sf): The file that holds the previous surrogate key.
SK_MID_SUSPECT_ID_NEXT_VAL (default 1; CONFIGELEMENT SuspectID/part2): Surrogate key value for SUSPECT_ID.
SK_MID_SUSPECT_ID_SF (default skMid_Contacts_SUSPECT_ID.sf): The file that holds the previous surrogate key.
SK_PREFIX_CONT_ID_NEXT_VAL (default 1; CONFIGELEMENT ContactID/part1): Surrogate key value for CONT_ID.
SK_PREFIX_CONT_ID_SF (default skPrefix_Contacts_CONT_ID.sf): The file that holds the previous surrogate key.
SK_PREFIX_CONTRACT_ID_NEXT_VAL (default 1; CONFIGELEMENT ContractID/part1): Surrogate key value for CONTRACT_ID.
SK_PREFIX_CONTRACT_ID_SF (default skPrefix_Contracts_CONTRACT_ID.sf): The file that holds the previous surrogate key.
SK_PREFIX_HIERARCHY_ID_NEXT_VAL (default 1; CONFIGELEMENT HierarchyID/part1): Surrogate key value for HIERARCHY_ID.
SK_PREFIX_HIERARCHY_ID_SF (default skPrefix_HIERARCHY_ID.sf): The file that holds the previous surrogate key.
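SK_MASK describes a surrogate key as three fixed-width parts:
- P positions for the cyclical sequence, which we take to correspond to the SK_PREFIX_* values (CONFIGELEMENT part1);
- M positions for the mid sequence, the SK_MID_* values (part2);
- S positions for the load suffix (SK_LOAD_SUFFIX, default 88).

The Java sketch below assembles a key under that reading. It assumes the following, and none of it is the documented RDP algorithm:
- each part is a non-negative number left-padded with zeros to the width of its letter group;
- the mask is a single run of P's, then M's, then S's;
- the class name is hypothetical.

```java
// Hypothetical sketch of composing a surrogate key from an SK_MASK such as
// "PPPMMMMMMMMMMMSS" (P = cyclical sequence, M = mid sequence, S = load suffix).
class SurrogateKeyMask {

    private final int pWidth;
    private final int mWidth;
    private final int sWidth;

    SurrogateKeyMask(String mask) {
        // Assumes the mask is one run of P's, then M's, then S's.
        if (!mask.matches("P+M+S+")) {
            throw new IllegalArgumentException("Unsupported mask: " + mask);
        }
        this.pWidth = mask.lastIndexOf('P') + 1;
        this.sWidth = mask.length() - mask.indexOf('S');
        this.mWidth = mask.length() - pWidth - sWidth;
    }

    /** Zero-pads each part to its width and concatenates prefix, mid, and suffix. */
    String compose(long prefix, long mid, long loadSuffix) {
        return pad(prefix, pWidth) + pad(mid, mWidth) + pad(loadSuffix, sWidth);
    }

    private static String pad(long value, int width) {
        String digits = Long.toString(value);
        if (value < 0 || digits.length() > width) {
            throw new IllegalArgumentException(value + " does not fit in " + width + " digits");
        }
        return "0".repeat(width - digits.length()) + digits;
    }

    public static void main(String[] args) {
        SurrogateKeyMask mask = new SurrogateKeyMask("PPPMMMMMMMMMMMSS");
        // First key when the prefix and mid NEXT_VAL are both 1 and SK_LOAD_SUFFIX is 88.
        System.out.println(mask.compose(1, 1, 88));
    }
}
```

Keeping the load suffix in the last positions means that keys generated by the load never end in the same digits as IDs that do not carry that suffix. This is one plausible reading of the collision-avoidance note for SK_LOAD_SUFFIX.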


MUST MODIFY parameters


Table A-5 lists all the parameters that, in our opinion, must be modified or that it is good practice to modify. Database connection details, file names and directories, and key DataStage parameters must all be provided before the RDP for MDM jobs can be launched. It is also good practice to enable standardization and matching, which are disabled by default. Guidelines are provided where appropriate.
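Table A-5 RDP configuration parameters MUST MODIFY list

Frequency | Category | Sub category | Parameter | Recommendation to set parameter to
One time | SETUP | Connection | DB_CONNECT_STRING | (blank)
One time | SETUP | Connection | DB_INSTANCE | (blank)
One time | SETUP | Connection | DB_PASSWORD | (blank)
One time | SETUP | Connection | DB_SCHEMA | (blank)
One time | SETUP | Connection | DB_USERID | (blank)
One time | SETUP | Connection | $APT_DB2INSTANCE_HOME | /home/dsadm/remote_db2config
One time | SETUP | DS PARAMETER | $APT_IMPORT_PATTERN_USES_FILESET_MOUNTED | TRUE
One time | SETUP | DS PARAMETER | DS_STRING_PADCHAR | 0x0
One time | SETUP | DS PARAMETER | DS_PARALLEL_APT_CONFIG_FILE | /opt/IBM/InformationServer/Server/Configurations/MDM_Default.apt
One time | SETUP | DS PARAMETER | DS_SEQUENTIAL_APT_CONFIG_FILE | /opt/IBM/InformationServer/Server/Configurations/MDM_1X1.apt
One time | SETUP | Miscellaneous | MDM_DEPLOYMENT_NAME | WebSphere Customer Center (a)
One time | SETUP | Miscellaneous | DS_LANGUAGE_TYPE_CODE | 100
One time | SETUP | File location | DS_SUPPORT_FILE_DIR | /mdmisdata03/data/MDMIS/PARAMETERS/
One time | SETUP | File location | FS_DATA_SET_HEADER_DIR | /mdmisdata03/Projects/MDMISINT3/DATA/
One time | SETUP | File location | FS_ERROR_DIR | /mdmisdata03/Projects/MDMISINT3/ERROR/
One time | SETUP | File location | FS_LOG_DIR | /mdmisdata03/data/MDMIS/LOG/
One time | SETUP | File location | FS_PARAM_SET_DIR | ./ParameterSets/
One time | SETUP | File location | FS_REJECT_DIR | /mdmisdata03/Projects/MDMISINT3/REJECT/
One time | SETUP | File location | FS_SK_FILE_DIR | /mdmisdata03/Projects/MDMISINT3/SK/
One time | SETUP | File location | FS_TMP_DIR | /mdmisdata03/data/MDMIS/TMP/
Runtime | SETUP | Runtime | BATCH_ID (auto assigned) | 1
Runtime | SETUP | Runtime | DS_PROCESSING_DATE (auto assigned) | 1/1/1900
Runtime | SETUP | Runtime | FS_HIERARCHY_SIF_FILE_PATTERN | /mdmisdata03/Projects/MDMISINT3/SIF_IN/sanitycheck/*.hsif
Runtime | SETUP | Runtime | FS_SIF_FILE_PATTERN | /mdmisdata03/Projects/MDMISINT3/SIF_IN/sanitycheck/*.sif
One time | ADVANCED | DS PARAMETER | $APT_IMPEXP_ALLOW_ZERO_LENGTH_FIXED_NULL | true
One time | ADVANCED | DS PARAMETER | $APT_IMPORT_PATTERN_USES_FILESET | true
One time | ADVANCED | DS PARAMETER | $APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS | true
One time | ADVANCED | DS PARAMETER | $APT_SORT_INSERTION_OPTIMIZATION | true
Recurring | SETUP | QualityStage | QS_MATCH_ORG_NATID | I2
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_NATID | I1
Recurring | SETUP | QualityStage | QS_PERFORM_ORG_MATCH | 1
Recurring | SETUP | QualityStage | QS_PERFORM_PERSON_MATCH | 1
Recurring | SETUP | QualityStage | QS_STAN_ADDRESS | 1
Recurring | SETUP | QualityStage | QS_STAN_ORG_NAME | 1
Recurring | SETUP | QualityStage | QS_STAN_PERSON_NAME | 1

a. This name must match the name used when deploying the MDM application.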
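Because the jobs cannot be launched until the MUST MODIFY values are filled in, a quick check for blank entries before a run can save a failed launch. The following Python sketch assumes, hypothetically, that the parameter values are available as simple NAME=VALUE lines; the actual RDP parameter file layout may differ, and parse_name_value_lines and find_blank_parameters are illustrative names, not part of RDP. Only the five Connection parameters whose recommendation is (blank) are checked.

```python
from typing import Dict, Iterable, List

# Connection parameters from the MUST MODIFY list whose recommended value is
# "(blank)", meaning each site has to supply its own value.
REQUIRED_CONNECTION_PARAMS = [
    "DB_CONNECT_STRING",
    "DB_INSTANCE",
    "DB_PASSWORD",
    "DB_SCHEMA",
    "DB_USERID",
]


def parse_name_value_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse hypothetical NAME=VALUE lines, skipping blank lines and # comments."""
    params: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '='
            params[name.strip()] = value.strip()
    return params


def find_blank_parameters(params: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required parameter names that are missing or empty."""
    return [name for name in required if not params.get(name, "").strip()]


if __name__ == "__main__":
    sample = [
        "# hypothetical export of the RDP parameter values",
        "DB_INSTANCE=db2inst1",
        "DB_SCHEMA=MDM",
        "DB_USERID=",
    ]
    params = parse_name_value_lines(sample)
    print(find_blank_parameters(params, REQUIRED_CONNECTION_PARAMS))
    # ['DB_CONNECT_STRING', 'DB_PASSWORD', 'DB_USERID']
```

A non-empty result means the jobs should not be launched yet; the same helper can be pointed at other parameter names from the table.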


CONSIDER MODIFYING parameters


Table A-6 lists the parameters that, in our opinion, you should consider modifying. Guidelines are provided where appropriate.
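Table A-6 RDP configuration parameters in the CONSIDER MODIFYING list

Frequency | Category | Sub category | Parameter | Recommendation to set parameter to
One time | SETUP | Miscellaneous | DS_SOURCE_DATE_FORMAT | %yyyy-%mm-%dd %hh:%nn:%ss.6
One time | SETUP | Miscellaneous | DS_USE_NATIVE_KEY | 1
One time | ADVANCED | SURROGATE | SK_MID_ADDRESS_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_ALERT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONT_EQUIV_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONT_REL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTACT_METHOD_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTR_COMP_VAL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTR_COMPONENT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTRACT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_CONTRACT_ROLE_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIER_ULT_PAR_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIERARCHY_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIERARCHY_NODE_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_HIERARCHY_REL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_IDENTIFIER_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_LOB_REL_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_LOCATION_GROUP_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_MISCVALUE_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_NATIVE_KEY_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_ORG_NAME_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_PERSON_NAME_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_PERSON_SEARCH_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_PPREF_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_ROLE_LOCATION_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_MID_SUSPECT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_PREFIX_CONT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_PREFIX_CONTRACT_ID_NEXT_VAL | 1
One time | ADVANCED | SURROGATE | SK_PREFIX_HIERARCHY_ID_NEXT_VAL | 1
Recurring | SETUP | QualityStage | QS_ALLOW_LOB_MATCH | 0
Recurring | SETUP | QualityStage | QS_EXCLUDE_FIELDS_FROM_MATCH_ORGANIZATION | (blank)
Recurring | SETUP | QualityStage | QS_EXCLUDE_FIELDS_FROM_MATCH_PERSON | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_1 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_2 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_3 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_ORG_4 | (blank)
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_1 | C1
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_2 | C3
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_3 | C5
Recurring | SETUP | QualityStage | QS_MATCH_PERSON_4 | C7
Recurring | SETUP | QualityStage | QS_PHONETIC_CODING_TYPE_ADDRESS | QSNYSIIS
Recurring | SETUP | QualityStage | QS_PHONETIC_CODING_TYPE_ORGANIZATION | QSNYSIIS
Recurring | SETUP | QualityStage | QS_PHONETIC_CODING_TYPE_PERSON | QSNYSIIS
Recurring | SETUP | QualityStage | QS_REJECT_ADDRESS_IF_NOT_STANDARDIZED | 0
Recurring | SETUP | QualityStage | QS_REJECT_ORG_NAME_IF_NOT_STANDARDIZED | 0
Recurring | SETUP | QualityStage | QS_REJECT_PERSON_NAME_IF_NOT_STANDARDIZED | 0
Recurring | Error Handling | DROP | DS_DETECTED_DUPLICATES_ACTION | E
Recurring | Error Handling | DROP | DS_PARTY_DROP_SEVERITY_LEVEL | 4
Recurring | Error Handling | Notification | DS_EMAIL_ERROR_CHECK_DISTRIBUTION |
Recurring | Error Handling | Notification | DS_EMAIL_ERROR_CHECK_REPORT | 1
Recurring | Error Handling | Abort handling | DS_DROP_MAX_ITERATIONS | 10
Recurring | Error Handling | Abort handling | DS_FAILED_COLUMNIZATION_ACTION | F
Recurring | Error Handling | Abort handling | DS_FAILED_RECORDIZATION_ACTION | F
Recurring | Error Handling | Abort handling | DS_SIF_ERROR_THRESHOLD | 101
Recurring | Error Handling | Abort handling | DS_SIF_INDIVIDUAL_ERROR_THRESHOLD | 101
Recurring | Error Handling | Abort handling | DS_SIF_INDIVIDUAL_ERROR_THRESHOLD_KOUNT | 101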
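DS_SOURCE_DATE_FORMAT is written with DataStage date and time tokens (%yyyy year, %mm month, %dd day, %hh hour, %nn minutes, %ss seconds, and .6 for six fractional digits). If source timestamps are prepared outside DataStage, for example by a Python script that writes SIF files, they have to follow the same layout. The sketch below translates a DataStage-style format into a Python strftime/strptime format. It covers only the tokens named here and is an illustrative helper, not part of RDP or DataStage.

```python
import re
from datetime import datetime

# DataStage token -> Python strftime/strptime directive. Only the tokens used
# in a format such as %yyyy-%mm-%dd %hh:%nn:%ss.6 are covered.
_DS_TO_PY = {
    "%yyyy": "%Y",     # four-digit year
    "%ss.6": "%S.%f",  # seconds plus six fractional digits (microseconds)
    "%mm": "%m",       # two-digit month
    "%dd": "%d",       # two-digit day of month
    "%hh": "%H",       # two-digit hour, 24-hour clock
    "%nn": "%M",       # two-digit minutes
    "%ss": "%S",       # two-digit seconds
}
# Regex alternation is tried left to right, so "%ss.6" wins over "%ss".
_DS_TOKEN = re.compile("|".join(re.escape(token) for token in _DS_TO_PY))


def datastage_to_strftime(ds_format: str) -> str:
    """Convert a DataStage-style date format string into a Python format string."""
    return _DS_TOKEN.sub(lambda m: _DS_TO_PY[m.group(0)], ds_format)


if __name__ == "__main__":
    py_format = datastage_to_strftime("%yyyy-%mm-%dd %hh:%nn:%ss.6")
    print(py_format)  # %Y-%m-%d %H:%M:%S.%f
    print(datetime(2011, 4, 1, 13, 5, 9, 120000).strftime(py_format))
    # 2011-04-01 13:05:09.120000
```

A single regular-expression pass is used instead of chained str.replace calls so that a token already translated can never be rewritten by a later substitution.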


The match columns for organization (QS_MATCH_ORG_*) and person (QS_MATCH_PERSON_*) in Table A-1 on page 278 let you specify the match fields to be used:

Allowable values I1 through I12 correspond to entries in the ID_TP_CD column of the CDIDTP table, as shown in Figure A-1. These values are MDM codes denoting the available match identifiers. For example, I1 corresponds to the Social Security Number.

Allowable values C1 through C8 correspond to entries in the CONT_METH_TP_CD column of the CDCONTMETHTP table, as shown in Figure A-2 on page 293. These values are MDM codes denoting the available match contact methods. For example, C1 corresponds to the Home Telephone number.

Figure A-1 CDIDTP table contents: corresponds to the In columns


Figure A-2 CDCONTMETHTP table contents: corresponds to the Cn columns

Note: The match columns for organization (QS_MATCH_ORG_*) and person (QS_MATCH_PERSON_*) in Table A-1 on page 278 are also derived from the CONFIGELEMENT table in MDM when using the Matching Critical Data Rules UI as shown in 2.5, Configuration screens in the MDM Server UI on page 19.
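The I and C prefixes are a compact way of pointing at rows in two MDM code tables. The following sketch resolves a QS_MATCH_* value to the table and column it refers to. It assumes that the digits after the prefix are the code value itself (for example, I1 refers to ID_TP_CD 1), which matches the examples above but should be confirmed against the CDIDTP and CDCONTMETHTP contents in your installation; resolve_match_code and MatchCodeRef are illustrative names.

```python
import re
from typing import NamedTuple, Optional


class MatchCodeRef(NamedTuple):
    table: str   # MDM code table that defines the code
    column: str  # code column in that table
    code: int    # assumed code value: the digits after I or C


# I1-I12 refer to CDIDTP.ID_TP_CD; C1-C8 refer to CDCONTMETHTP.CONT_METH_TP_CD.
_RANGES = {
    "I": ("CDIDTP", "ID_TP_CD", 12),
    "C": ("CDCONTMETHTP", "CONT_METH_TP_CD", 8),
}
_CODE_RE = re.compile(r"([IC])(\d+)")


def resolve_match_code(value: str) -> Optional[MatchCodeRef]:
    """Translate a QS_MATCH_* value such as 'I1' or 'C3' into a code-table reference.

    A blank value (no match field configured) returns None.
    """
    text = value.strip()
    if not text:
        return None
    m = _CODE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not an I<n> or C<n> match code: {value!r}")
    table, column, highest = _RANGES[m.group(1)]
    code = int(m.group(2))
    if not 1 <= code <= highest:
        raise ValueError(f"{value!r} is outside {m.group(1)}1-{m.group(1)}{highest}")
    return MatchCodeRef(table, column, code)


if __name__ == "__main__":
    for setting in ("I1", "C7", ""):
        print(repr(setting), resolve_match_code(setting))
```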


Appendix B. Standard Interface File details


This appendix provides an overview of the Record Type/Sub Type (RT/ST) mapping of the Standard Interface File (SIF). The SIF has 23 RT/ST combinations (Table B-1 on page 297 through Table B-23 on page 308) for populating party and contact information in the MDM data repository. Each combination has its own set of fields, which closely mirror the corresponding tables and columns of the MDM data repository model. The combinations include RT/STs for defining hierarchies (HH, HN, HR, and HU). Before your source data can be loaded into the MDM repository by the RDP for MDM jobs, you must map the key data columns in your source systems to the corresponding columns in the appropriate SIF RT/ST records.

Note: The SIF supports inserts and updates of records in the MDM repository, but not deletes. In this book, we cover both inserts (for the initial load) and updates (for delta processing).

To map the columns in your source systems to the SIF, you need to know the data type of each column in each RT/ST. This information is defined in the RT/ST templates that are provided as part of the RDP for MDM solution; Table B-1 on page 297 through Table B-23 on page 308 do not contain data type information.

When a column of an RT/ST record can be empty (as indicated by the letter Y in the Can be empty? column in Table B-1 on page 297), you can control what happens to the corresponding column in the MDM data repository when no value is supplied. To do so, set the null indicator for that column in the RT/ST to 1 or 0; the Mapping rule specifies the action that is taken on the corresponding column in the MDM data repository. The null indicator columns (names beginning with NULL_) and their Mapping rules are shown in Table B-1 on page 297 through Table B-23 on page 308.

For example, the NULL_PREF_LANG_TP_CD column in Table B-1 on page 297 corresponds to the PREF_LANG_TP_CD column (which can be empty), and its Mapping rule specifies the following action:

If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value

This rule has the following meaning:

Setting NULL_PREF_LANG_TP_CD to 1 in the PP RT/ST SIF record indicates that the corresponding column in the MDM repository is to be set to NULL.

Setting it to 0 and leaving the PREF_LANG_TP_CD column empty indicates that the corresponding column in the MDM data repository keeps its prior value. This applies to update operations.

Setting it to 0 and supplying a value in the PREF_LANG_TP_CD column indicates that the corresponding column in the MDM data repository is overwritten with the value in PREF_LANG_TP_CD. This also applies to update operations.

Note: If the PREF_LANG_TP_CD column has a value, the null indicator setting does not apply.

Table B-1 on page 297 through Table B-23 on page 308 provide a high-level overview of the individual columns and mapping rules for each of the 23 RT/ST combinations.
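The mapping rule and the note above combine into a small decision procedure for an update. The following sketch evaluates it for one column: a supplied value always wins (per the note, the null indicator then does not apply); otherwise an indicator of 1 sets the column to NULL and 0 keeps the prior value. The function name, and treating both None and an empty string as an empty SIF column, are assumptions made for illustration; this is not the RDP job logic itself.

```python
from typing import Optional


def apply_null_indicator(
    prior: Optional[str], sif_value: Optional[str], null_ind: int
) -> Optional[str]:
    """Return the value a repository column holds after an update from one SIF column.

    prior     -- value currently stored in the MDM repository column
    sif_value -- value supplied in the RT/ST column (None or "" means empty)
    null_ind  -- value of the matching NULL_ column (0 or 1)
    """
    if null_ind not in (0, 1):
        raise ValueError("null indicator must be 0 or 1")
    if sif_value not in (None, ""):
        return sif_value   # a supplied value overwrites; the indicator does not apply
    if null_ind == 1:
        return None        # explicitly set the repository column to NULL
    return prior           # 0 and empty: keep the prior value


if __name__ == "__main__":
    print(apply_null_indicator("100", "", 1))     # None
    print(apply_null_indicator("100", "", 0))     # 100
    print(apply_null_indicator("100", "200", 0))  # 200
```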


Table B-1 Contact information RT/ST is PP


Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "P" or "O" (Cannot be updated) |
ADMIN_SYS_TP_CD | N | | CDADMINSYSTP
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U update, A add, empty either add or update as applicable |
FORCE_MATCH | N | "Y" or "N" |
CONTEQUIV_DESCRIPTION | Y | |
ACCE_COMP_TP_CD | Y | | CDACCETOCOMPTP
PREF_LANG_TP_CD | Y | | CDLANGTP
CONTACT_NAME | Y | |
SOLICIT_IND | Y | |
CONFIDENTIAL_IND | Y | |
CLIENT_IMP_TP_CD | Y | | CDCLIENTIMPTP
CLIENT_ST_TP_CD | Y | | CDCLIENTSTTP
CLIENT_POTEN_TP_CD | Y | | CDCLIENTPOTENTP
RPTING_FREQ_TP_CD | Y | | CDRPTINGFREQTP
LAST_STATEMENT_DT | Y | |
ALERT_IND | Y | |
PRVBY_ADMIN_SYS_TP_CD | Y | | CDADMINSYSTP
PRVBY_ADMIN_CLIENT_ID | Y | |
DO_NOT_DELETE_IND | Y | |
SOURCE_IDENT_TP_CD | Y | | CDSOURCEIDENTTP
LAST_USED_DT | Y | |
LAST_VERIFIED_DT | Y | |
SINCE_DT | Y | |
LEFT_DT | Y | |
ACCESS_TOKEN_VALUE | Y | |
ORG_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "P", REQUIRED FOR SUBTYPE = "O" | CDORGTP
INDUSTRY_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "P" | CDINDUSTRYTP
ESTABLISHED_DT | Y | MUST BE EMPTY if SUBTYPE = "P" |
BUY_SELL_AGR_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "P" | CDBUYSELLAGREETP
PROFIT_IND | Y | MUST BE EMPTY if SUBTYPE = "P" |
MARITAL_ST_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "O" | CDMARITALSTTP
BIRTHPLACE_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "O" | CDCOUNTRYTP
CITIZENSHIP_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "O" | CDCOUNTRYTP
HIGHEST_EDU_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "O" | CDHIGHESTEDUTP
AGE_VER_DOC_TP_CD | Y | MUST BE EMPTY if SUBTYPE = "O" | CDAGEVERDOCTP
GENDER_TP_CODE | Y | MUST BE EMPTY if SUBTYPE = "O" | not validated
BIRTH_DT | Y | MUST BE EMPTY if SUBTYPE = "O" |
DECEASED_DT | Y | MUST BE EMPTY if SUBTYPE = "O" |
CHILDREN_CT | Y | MUST BE EMPTY if SUBTYPE = "O" |
DISAB_START_DT | Y | MUST BE EMPTY if SUBTYPE = "O" |
DISAB_END_DT | Y | MUST BE EMPTY if SUBTYPE = "O" |
USER_IND | Y | MUST BE EMPTY if SUBTYPE = "O" |

NULL_ columns (Can be empty? N for each; Mapping rule for each: If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value):
NULL_DESCRIPTION, NULL_ACCE_COMP_TP_CD, NULL_PREF_LANG_TP_CD, NULL_CONTACT_NAME, NULL_SOLICIT_IND, NULL_CONFIDENTIAL_IND, NULL_CLIENT_IMP_TP_CD, NULL_CLIENT_ST_TP_CD, NULL_CLIENT_POTEN_TP_CD, NULL_RPTING_FREQ_TP_CD, NULL_LAST_STATEMENT_DT, NULL_ALERT_IND, NULL_PROVIDED_BY_CONT, NULL_DO_NOT_DELETE_IND, NULL_SOURCE_IDENT_TP_CD, NULL_LAST_USED_DT, NULL_LAST_VERIFIED_DT, NULL_SINCE_DT, NULL_LEFT_DT, NULL_ACCESS_TOKEN_VALUE, NULL_INDUSTRY_TP_CD, NULL_ESTABLISHED_DT, NULL_BUY_SELL_AGR_TP_CD, NULL_PROFIT_IND, NULL_MARITAL_ST_TP_CD, NULL_BIRTHPLACE_TP_CD, NULL_CITIZENSHIP_TP_CD, NULL_HIGHEST_EDU_TP_CD, NULL_AGE_VER_DOC_TP_CD, NULL_GENDER_TP_CODE, NULL_BIRTH_DT, NULL_DECEASED_DT, NULL_CHILDREN_CT, NULL_DISAB_START_DT, NULL_DISAB_END_DT, NULL_USER_IND
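Several PP columns are conditional on SUBTYPE: organization columns must be empty for a person ("P"), ORG_TP_CD is required for an organization ("O"), and the person columns must be empty for an organization. The sketch below checks those rules for one record before it is written to a SIF file. It assumes the record is already available as a dictionary of column name to string value (parsing the SIF layout is not shown), and check_pp_subtype_rules is an illustrative name; code-table validation is not covered.

```python
from typing import Dict, List

# Columns that MUST BE EMPTY if SUBTYPE = "P" (ORG_TP_CD is also REQUIRED for "O").
ORG_ONLY_COLUMNS = [
    "ORG_TP_CD", "INDUSTRY_TP_CD", "ESTABLISHED_DT", "BUY_SELL_AGR_TP_CD", "PROFIT_IND",
]
# Columns that MUST BE EMPTY if SUBTYPE = "O".
PERSON_ONLY_COLUMNS = [
    "MARITAL_ST_TP_CD", "BIRTHPLACE_TP_CD", "CITIZENSHIP_TP_CD", "HIGHEST_EDU_TP_CD",
    "AGE_VER_DOC_TP_CD", "GENDER_TP_CODE", "BIRTH_DT", "DECEASED_DT", "CHILDREN_CT",
    "DISAB_START_DT", "DISAB_END_DT", "USER_IND",
]


def check_pp_subtype_rules(record: Dict[str, str]) -> List[str]:
    """Return a list of SUBTYPE rule violations for one PP record (empty if valid)."""

    def filled(column: str) -> bool:
        return bool(record.get(column, "").strip())

    errors: List[str] = []
    subtype = record.get("SUBTYPE", "")
    if subtype == "P":
        errors += [f"{c} must be empty when SUBTYPE is P" for c in ORG_ONLY_COLUMNS if filled(c)]
    elif subtype == "O":
        if not filled("ORG_TP_CD"):
            errors.append("ORG_TP_CD is required when SUBTYPE is O")
        errors += [f"{c} must be empty when SUBTYPE is O" for c in PERSON_ONLY_COLUMNS if filled(c)]
    else:
        errors.append(f"SUBTYPE must be P or O, got {subtype!r}")
    return errors


if __name__ == "__main__":
    print(check_pp_subtype_rules({"SUBTYPE": "O", "GENDER_TP_CODE": "M"}))
```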


Table B-2 OrgName information RT/ST is PG


Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "G" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U update, A add, empty either add or update as applicable |
ORG_NAME_TP_CD | N | | CDORGNAMETP
ORG_NAME | N | |
S_ORG_NAME | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
LAST_USED_DT | Y | |
LAST_VERIFIED_DT | Y | |
SOURCE_IDENT_TP_CD | Y | | CDSOURCEIDENTTP
P_ORG_NAME | Y | |

NULL_ columns (Can be empty? N for each; Mapping rule for each: If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value):
NULL_S_ORG_NAME, NULL_END_DT, NULL_LAST_USED_DT, NULL_LAST_VERIFIED_DT, NULL_SOURCE_IDENT_TP_CD
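Two rules recur across these RT/ST tables: LOAD_TYPE (U update, A add, empty either add or update as applicable) and START_DT (Use Processing Date if not supplied). The sketch below shows one way to interpret them when preparing or checking SIF records. It assumes that "as applicable" means add when the record does not yet exist in MDM and update when it does; the function names and the exists_in_mdm flag are hypothetical, and how RDP treats a "U" record that does not exist is not modeled.

```python
from datetime import date
from typing import Optional


def resolve_load_action(load_type: str, exists_in_mdm: bool) -> str:
    """Map a LOAD_TYPE value to 'add' or 'update'."""
    value = load_type.strip()
    if value == "A":
        return "add"
    if value == "U":
        return "update"
    if value == "":
        # Empty: add or update as applicable (assumed: based on existence in MDM).
        return "update" if exists_in_mdm else "add"
    raise ValueError(f"LOAD_TYPE must be U, A, or empty, got {load_type!r}")


def effective_start_date(start_dt: Optional[date], processing_date: date) -> date:
    """Use the processing date when START_DT is not supplied."""
    return start_dt if start_dt is not None else processing_date


if __name__ == "__main__":
    print(resolve_load_action("", exists_in_mdm=False))  # add
    print(effective_start_date(None, date(2011, 4, 1)))  # 2011-04-01
```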

Table B-3 Person Name / Person Search information RT/ST is PH


Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "H" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U update, A add, empty either add or update as applicable |
PREFIX_NAME_TP_CD | Y | | CDPREFIXNAMETP
PREFIX_DESC | Y | |
NAME_USAGE_TP_CD | N | | CDNAMEUSAGETP
FREE_FORM_NAME | Y | Must be supplied if LAST_NAME is empty. Must be empty if GIVEN_NAME or LAST_NAME present. |
GIVEN_NAME_ONE | Y | |
GIVEN_NAME_TWO | Y | |
GIVEN_NAME_THREE | Y | |
GIVEN_NAME_FOUR | Y | |
LAST_NAME | Y | Must be empty if FREE_FORM_NAME supplied |
GENERATION_TP_CD | Y | | CDGENERATIONTP
SUFFIX_DESC | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
USE_STANDARD_IND | Y | |
LAST_USED_DT | Y | |
LAST_VERIFIED_DT | Y | |
SOURCE_IDENT_TP_CD | Y | | CDSOURCEIDENTTP
P_LAST_NAME | Y | |
P_GIVEN_NAME_ONE | Y | |
P_GIVEN_NAME_TWO | Y | |
P_GIVEN_NAME_THREE | Y | |
P_GIVEN_NAME_FOUR | Y | |
GIVEN_NAME_ONE_SEARCH | Y | |
GIVEN_NAME_TWO_SEARCH | Y | |
GIVEN_NAME_THREE_SEARCH | Y | |
GIVEN_NAME_FOUR_SEARCH | Y | |
LAST_NAME_SEARCH | Y | |

NULL_ columns (Can be empty? N for each; Mapping rule for each: If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value):
NULL_PREFIX_NAME_TP_CD, NULL_PREFIX_DESC, NULL_GIVEN_NAME_ONE, NULL_GIVEN_NAME_TWO, NULL_GIVEN_NAME_THREE, NULL_GIVEN_NAME_FOUR, NULL_GENERATION_TP_CD, NULL_SUFFIX_DESC, NULL_END_DT, NULL_USE_STANDARD_IND, NULL_LAST_USED_DT, NULL_LAST_VERIFIED_DT, NULL_SOURCE_IDENT_TP_CD
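The PH record either carries a structured name (given names and LAST_NAME) or a FREE_FORM_NAME, never both, and FREE_FORM_NAME becomes mandatory when LAST_NAME is empty. A minimal check of those two rules is sketched below. It assumes "GIVEN_NAME" in the rule means any of GIVEN_NAME_ONE through GIVEN_NAME_FOUR, and that the record is a dictionary of column name to string value; check_ph_name_rules is an illustrative name.

```python
from typing import Dict, List

GIVEN_NAME_COLUMNS = ["GIVEN_NAME_ONE", "GIVEN_NAME_TWO", "GIVEN_NAME_THREE", "GIVEN_NAME_FOUR"]


def check_ph_name_rules(record: Dict[str, str]) -> List[str]:
    """Return FREE_FORM_NAME rule violations for one PH record (empty if valid)."""

    def filled(column: str) -> bool:
        return bool(record.get(column, "").strip())

    errors: List[str] = []
    free_form = filled("FREE_FORM_NAME")
    structured = filled("LAST_NAME") or any(filled(c) for c in GIVEN_NAME_COLUMNS)
    if not filled("LAST_NAME") and not free_form:
        errors.append("FREE_FORM_NAME must be supplied if LAST_NAME is empty")
    if free_form and structured:
        errors.append("FREE_FORM_NAME must be empty if GIVEN_NAME or LAST_NAME present")
    return errors


if __name__ == "__main__":
    print(check_ph_name_rules({"LAST_NAME": "Smith", "GIVEN_NAME_ONE": "Ann"}))  # []
    print(check_ph_name_rules({"GIVEN_NAME_ONE": "Ann"}))
```

Note that a record with given names but neither LAST_NAME nor FREE_FORM_NAME fails the first rule, because FREE_FORM_NAME is then required but not allowed alongside the given names.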


Table B-4 External Match RT/ST is PE


Column name Can be empty? N N N N Y Y N N Mapping rule Validate to table

RECTYPE SUBTYPE ADMIN_SYS_TP_CD ADMIN_CLIENT_ID LOAD_TYPE DESCRIPTION LINKTO_ADMIN_SYS_TP_CD LINKTO_ADMIN_CLIENT_ID

"P" "E" CDADMINSYSTP U update, A add, empty either add or update as applicable Not required.

Table B-5 Location_Group_Address_Group Address RT/ST is PA


Column name Can be empty? N N N N Y Y Y N Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Mapping rule Validate to table

RECTYPE SUBTYPE ADMIN_SYS_TP_CD ADMIN_CLIENT_ID LOAD_TYPE UNDEL_REASON_TP_CD MEMBER_IND PREFERRED_IND SOLICIT_IND EFFECT_START_MMDD EFFECT_END_MMDD EFFECT_START_TM EFFECT_END_TM START_DT END_DT LAST_USED_DT LAST_VERIFIED_DT SOURCE_IDENT_TP_CD CARE_OF_DESC ADDR_USAGE_TP_CD COUNTRY_TP_CD RESIDENCE_TP_CD PROV_STATE_TP_CD ADDR_LINE_ONE ADDR_LINE_TWO ADDR_LINE_THREE CITY_NAME POSTAL_CODE ADDR_STANDARD_IND OVERRIDE_IND RESIDENCE_NUM COUNTY_CODE LATITUDE_DEGREES LONGITUDE_DEGREES POSTAL_BARCODE P_CITY BUILDING_NAME STREET_NUMBER STREET_NAME P_STREET_NAME STREET_SUFFIX PRE_DIRECTIONAL POST_DIRECTIONAL BOX_DESIGNATOR BOX_ID STN_INFO STN_ID REGION DEL_DESIGNATOR DEL_ID DEL_INFO

"P" "A" Not required. U update, A add, empty either add or update as applicable CDUNDELREASONTP

Use Processing Date if not supplied.

CDSOURCEIDENTTP CDADDRUSAGETP CDCOUNTRYTP CDRESIDENCETP CDPROVSTATETP


NULL_ columns (Can be empty? N for each; Mapping rule for each: If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value):
NULL_UNDEL_REASON_TP_CD, NULL_MEMBER_IND, NULL_PREFERRED_IND, NULL_SOLICIT_IND, NULL_EFFECT_START_MMDD, NULL_EFFECT_END_MMDD, NULL_EFFECT_START_TM, NULL_EFFECT_END_TM, NULL_END_DT, NULL_LAST_USED_DT, NULL_LAST_VERIFIED_DT, NULL_SOURCE_IDENT_TP_CD, NULL_CARE_OF_DESC, NULL_CONTRY_TP_CD, NULL_RESIDENCE_TP_CD, NULL_PROV_STATE_TP_CD, NULL_ADDR_LINE_TWO, NULL_ADDR_LINE_THREE, NULL_POSTAL_CODE, NULL_ADDR_STANDARD_IND, NULL_OVERRIDE_IND, NULL_RESIDENCE_NUM, NULL_COUNTY_CODE, NULL_LATITUDE_DEGREES, NULL_LONGITUDE_DEGREES, NULL_POSTAL_BARCODE, NULL_BUILDING_NAME, NULL_STREET_NUMBER, NULL_STREET_NAME, NULL_STREET_SUFFIX, NULL_PRE_DIRECTIONAL, NULL_POST_DIRECTIONAL, NULL_BOX_DESIGNATOR, NULL_BOX_ID, NULL_STN_INFO, NULL_STN_ID, NULL_REGION, NULL_DEL_DESIGNATOR, NULL_DEL_ID, NULL_DEL_INFO

Table B-6 LocationGroup_ContactMethodGroup_ContactMethod RT/ST is PC


Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "C" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U update, A add, empty either add or update as applicable |
UNDEL_REASON_TP_CD | Y | | CDUNDELREASONTP
MEMBER_IND | Y | |
PREFERRED_IND | Y | |
SOLICIT_IND | Y | |
EFFECT_START_MMDD | Y | |
EFFECT_END_MMDD | Y | |
EFFECT_START_TM | Y | |
EFFECT_END_TM | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
LAST_USED_DT | Y | |
LAST_VERIFIED_DT | Y | |
SOURCE_IDENT_TP_CD | Y | | CDSOURCEIDENTTP
CONT_METH_TP_CD | N | | CDCONTMETHTP
METHOD_ST_TP_CD | Y | | CDMETHODSTATUSTP
ATTACH_ALLOW_IND | Y | |
TEXT_ONLY_IND | Y | |
MESSAGE_SIZE | Y | |
COMMENT_DESC | Y | |
REF_NUM | N | |
CONT_METH_STD_IND | Y | |
COUNTRY_CODE | Y | |
AREA_CODE | Y | |
EXCHANGE | Y | |
PH_NUMBER | Y | |
EXTENSION | Y | |

NULL_ columns (Can be empty? N for each; Mapping rule for each: If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value):
NULL_UNDEL_REASON_TP_CD, NULL_MEMBER_IND, NULL_PREFERRED_IND, NULL_SOLICIT_IND, NULL_EFFECT_START_MMDD, NULL_EFFECT_END_MMDD, NULL_EFFECT_START_TM, NULL_EFFECT_END_TM, NULL_END_DT, NULL_LAST_USED_DT, NULL_LAST_VERIFIED_DT, NULL_SOURCE_IDENT_TP_CD, NULL_METHOD_ST_TP_CD, NULL_ATTACH_ALLOW_IND, NULL_TEXT_ONLY_IND, NULL_MESSAGE_SIZE, NULL_COMMENT_DESC, NULL_CONT_METH_STD_IND, NULL_COUNTRY_CODE, NULL_AREA_CODE, NULL_EXCHANGE, NULL_PH_NUMBER, NULL_EXTENSION

Table B-7 Identifier RT/ST is PI


Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "I" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U update, A add, empty either add or update as applicable |
ID_TP_CD | N | | CDIDTP
ID_STATUS_TP_CD | Y | | CDIDSTATUSTP
REF_NUM | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
EXPIRY_DT | Y | |
ASSIGNEDBY_ADMIN_SYS_TP_CD | Y | | CDADMINSYSTP
ASSIGNEDBY_ADMIN_CLIENT_ID | Y | |
IDENTIFIER_DESC | Y | |
ISSUE_LOCATION | Y | |
LAST_USED_DT | Y | |
LAST_VERIFIED_DT | Y | |
SOURCE_IDENT_TP_CD | Y | | CDSOURCEIDENTTP

NULL_ columns (Can be empty? N for each; Mapping rule for each: If 1 then set to null, if 0 and column is empty use prior value, if 0 and column is not empty overwrite prior value):
NULL_ID_STATUS_TP_CD, NULL_REF_NUM, NULL_END_DT, NULL_EXPIRY_DT, NULL_ASSIGNED_BY, NULL_IDENTIFIER_DESC, NULL_ISSUE_LOCATION, NULL_LAST_USED_DT, NULL_LAST_VERIFIED_DT, NULL_SOURCE_IDENT_TP_CD
Additional rule for NULL_REF_NUM: ref_num can only be null for 1 identifier status type.

Table B-8 LobRel RT/ST is PB

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "B" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
ENTITY_NAME | N | "CONTACT" |
LOB_TP_CD | N | | CDLOBTP
LOB_REL_TP_CD | N | | CDLOBRELTP
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
NULL_END_DT | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |
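The LOAD_TYPE and START_DT mapping rules can be sketched as two small helpers. The helper names are hypothetical, and rejecting an explicit "A" for a record that already exists, or an explicit "U" for a record that does not, is an assumption: the tables only state what each LOAD_TYPE value requests, not how conflicts are handled.

```python
from datetime import date
from typing import Optional


def resolve_action(load_type: str, record_exists: bool) -> str:
    """Map a SIF LOAD_TYPE value to the operation to perform (sketch)."""
    if load_type == "A":
        if record_exists:
            # Assumption: an explicit add for an existing record is rejected.
            raise ValueError("LOAD_TYPE 'A' but the record already exists")
        return "add"
    if load_type == "U":
        if not record_exists:
            # Assumption: an explicit update for a missing record is rejected.
            raise ValueError("LOAD_TYPE 'U' but no existing record was found")
        return "update"
    if load_type == "":
        # Empty LOAD_TYPE: add or update, whichever applies.
        return "update" if record_exists else "add"
    raise ValueError(f"unsupported LOAD_TYPE {load_type!r}")


def default_start_date(start_dt: Optional[date], processing_date: date) -> date:
    """START_DT: use the processing date if no value was supplied."""
    return start_dt if start_dt is not None else processing_date


print(resolve_action("", record_exists=False))     # add
print(default_start_date(None, date(2011, 4, 1)))  # 2011-04-01
```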


Table B-9 ContactRel RT/ST is PR

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "R" |
ADMIN_SYS_TP_CD_TO | N | | Not required.
ADMIN_CLIENT_ID_TO | N | |
ADMIN_SYS_TP_CD_FROM | N | | Not required.
ADMIN_CLIENT_ID_FROM | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
REL_TP_CD | N | | CDRELTP
REL_DESC | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
REL_ASSIGN_TP_CD | Y | | CDRELASSIGNTP
END_REASON_TP_CD | Y | | CDENDREASONTP
NULL_REL_DESC, NULL_END_DT, NULL_REL_ASSIGN_TP_CD, NULL_END_REASON_TP_CD | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Note: The TO and FROM SSKs cannot be the same.
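To illustrate how the "Can be empty?" flags, the code-table validation, and the SSK rule of Table B-9 combine, here is a hypothetical validator for one PR record. It assumes that an SSK (source system key) is the pair of admin system type and admin client ID, and it models the CDRELTP code table as a plain set of codes; neither assumption is taken from the RDP for MDM jobs, and the sample values are invented.

```python
from typing import Dict, List, Set

# Non-NULL_ columns that Table B-9 marks "Can be empty? N".
REQUIRED = ["RECTYPE", "SUBTYPE", "ADMIN_SYS_TP_CD_TO", "ADMIN_CLIENT_ID_TO",
            "ADMIN_SYS_TP_CD_FROM", "ADMIN_CLIENT_ID_FROM", "REL_TP_CD"]


def validate_contact_rel(rec: Dict[str, str], cdreltp: Set[str]) -> List[str]:
    """Return error messages for one PR (ContactRel) record (sketch)."""
    errors = [f"{col} must not be empty" for col in REQUIRED if not rec.get(col)]
    if rec.get("RECTYPE") != "P" or rec.get("SUBTYPE") != "R":
        errors.append("RECTYPE/SUBTYPE must be P/R")
    # Assumption: an SSK is the (admin system type, admin client ID) pair.
    to_ssk = (rec.get("ADMIN_SYS_TP_CD_TO"), rec.get("ADMIN_CLIENT_ID_TO"))
    from_ssk = (rec.get("ADMIN_SYS_TP_CD_FROM"), rec.get("ADMIN_CLIENT_ID_FROM"))
    if all(to_ssk) and all(from_ssk) and to_ssk == from_ssk:
        errors.append("TO and FROM SSKs cannot be the same")
    rel_tp_cd = rec.get("REL_TP_CD")
    if rel_tp_cd and rel_tp_cd not in cdreltp:
        errors.append(f"REL_TP_CD {rel_tp_cd} not found in CDRELTP")
    return errors


record = {"RECTYPE": "P", "SUBTYPE": "R",
          "ADMIN_SYS_TP_CD_TO": "1", "ADMIN_CLIENT_ID_TO": "C100",
          "ADMIN_SYS_TP_CD_FROM": "1", "ADMIN_CLIENT_ID_FROM": "C100",
          "REL_TP_CD": "5"}
print(validate_contact_rel(record, cdreltp={"5", "6"}))
# ['TO and FROM SSKs cannot be the same']
```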

Table B-10 Contract RT/ST is CH

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "H" |
ADMIN_SYS_TP_CD | N | | CDADMINSYSTP
ADMIN_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
CONTR_LANG_TP_CD | Y | | CDLANGTP
CURRENCY_TP_CD | Y | | CDCURRENCYTP
FREQ_MODE_TP_CD | Y | | CDFREQMODETP
BILL_TP_CD | Y | | CDBILLTP
PREMIUM_AMT | Y | |
NEXT_BILL_DT | Y | |
CURR_CASH_VAL_AMT | Y | |
LINE_OF_BUSINESS | Y | |
BRAND_NAME | Y | |
SERVICE_ORG_NAME | Y | |
BUS_ORGUNIT_ID | Y | |
SERVICE_PROV_ID | Y | |
REPLBY_ADMIN_SYS_TP_CD | Y | Required if REPLBY_ADMIN_CONTRACT_ID is present. | CDADMINSYSTP
REPLBY_ADMIN_CONTRACT_ID | Y | |
ISSUE_LOCATION | Y | |
PREMAMT_CUR_TP | Y | | CDCURRENCYTP
CASHVAL_CUR_TP | Y | | CDCURRENCYTP
ACCESS_TOKEN_VALUE | Y | IMPORTANT: Leave null unless advised by an MDM Server expert. |
MANAGED_ACCOUNT_IND | Y | |
AGREEMENT_NAME | Y | |
AGREEMENT_NICKNAME | Y | |
SIGNED_DT | Y | |
EXECUTED_DT | Y | |
END_DT | Y | |
ACCOUNT_LAST_TRANSACTION_DT | Y | |
TERMINATION_DT | Y | |
TERMINATION_REASON_TP_CD | Y | | CDTERMINATIONREASONTP
AGREEMENT_DESCRIPTION | Y | |
AGREEMENT_ST_TP_CD | Y | | CDAGREEMENTSTTP
AGREEMENT_TP_CD | Y | | CDAGREEMENTTP
SERVICE_LEVEL_TP_CD | Y | | CDSERVICELEVELTP
LAST_VERIFIED_DT | Y | |
LAST_REVIEWED_DT | Y | |
PRODUCT_ID | Y | |
CLUSTER_KEY | Y | NOT USED |
NULL_CONTR_LANG_TP_CD, NULL_CURRENCY_TP_CD, NULL_FREQ_MODE_TP_CD, NULL_BILL_TP_CD, NULL_PREMIUM_AMT, NULL_NEXT_BILL_DT, NULL_CURR_CASH_VAL_AMT, NULL_LINE_OF_BUSINESS, NULL_BRAND_NAME, NULL_SERVICE_ORG_NAME, NULL_BUS_ORGUNIT_ID, NULL_SERVICE_PROV_ID, NULL_REPL_BY_CONTRACT, NULL_ISSUE_LOCATION, NULL_PREMAMT_CUR_TP, NULL_CASHVAL_CUR_TP, NULL_ACCESS_TOKEN_VALUE, NULL_MANAGED_ACCOUNT_IND, NULL_AGREEMENT_NAME, NULL_AGREEMENT_NICKNAME, NULL_SIGNED_DT, NULL_EXECUTED_DT, NULL_END_DT, NULL_REPLACES_CONTRACT, NULL_ACCOUNT_LAST_TRANSACTION_DT, NULL_TERMINATION_DT, NULL_TERMINATION_REASON_TP_CD, NULL_AGREEMENT_DESCRIPTION, NULL_AGREEMENT_ST_TP_CD, NULL_AGREEMENT_TP_CD, NULL_SERVICE_LEVEL_TP_CD, NULL_LAST_VERIFIED_DT, NULL_LAST_REVIEWED_DT, NULL_PRODUCT_ID, NULL_CLUSTER_KEY | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Table B-11 Contract RT/ST is CK

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "K" |
ADMIN_FLD_NM_TP_CD | N | | CDADMINFLDNMTP
ADMIN_CONTRACT_ID | N | |
LINKTO_ADMIN_FLD_NM_TP_CD | N | | CDADMINFLDNMTP
LINKTO_ADMIN_CONTRACT_ID | N | |
CONTRACT_COMP_IND | Y | Any input value is overridden to "N". |

Table B-12 Contract Component RT/ST is CC

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "C" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
PROD_TP_CD | N | | CDPRODTP
CONTRACT_ST_TP_CD | N | | CDCONTRACTSTTP
CURR_CASH_VAL_AMT | Y | |
PREMIUM_AMT | Y | |
ISSUE_DT | Y | |
VIATICAL_IND | Y | |
BASE_IND | Y | |
CONTR_COMP_TP_CD | Y | | CDCONTRCOMPTP
SERV_ARRANGE_TP_CD | Y | | CDARRANGEMENTTP
EXPIRY_DT | Y | |
PREMAMT_CUR_TP | Y | | CDCURRENCYTP
CASHVAL_CUR_TP | Y | | CDCURRENCYTP
CLUSTER_KEY | Y | |
NULL_CURR_CASH_VAL_AMT, NULL_PREMIUM_AMT, NULL_ISSUE_DT, NULL_VIATICAL_IND, NULL_BASE_IND, NULL_SERV_ARRANGE_TP_CD, NULL_EXPIRY_DT, NULL_PREMAMT_CUR_TP, NULL_CASHVAL_CUR_TP, NULL_CLUSTER_KEY | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Table B-13 Contract Role RT/ST is CR

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "R" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
ADMIN_CLIENT_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
CONTR_COMP_TP_CD | Y | | CDCONTRCOMPTP
PROD_TP_CD | N | | CDPRODTP
CONTR_ROLE_TP_CD | N | | CDCONTRACTROLETP
REGISTERED_NAME | Y | |
DISTRIB_PCT | Y | |
IRREVOC_IND | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
RECORDED_START_DT | Y | |
RECORDED_END_DT | Y | |
SHARE_DIST_TP_CD | Y | | CDSHAREDISTTP
ARRANGEMENT_TP_CD | Y | | CDARRANGEMENTTP
ARRANGEMENT_DESC | Y | |
END_REASON_TP_CD | Y | | CDENDREASONTP
NULL_REGISTERED_NAME, NULL_DISTRIB_PCT, NULL_IRREVOC_IND, NULL_END_DT, NULL_RECORDED_START_DT, NULL_RECORDED_END_DT, NULL_SHARE_DIST_TP_CD, NULL_ARRANGEMENT_TP_CD, NULL_ARRANGEMENT_DESC, NULL_END_REASON_TP_CD | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Table B-14 Role Location RT/ST is CR

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "L" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
ADMIN_CLIENT_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
CONTR_COMP_TP_CD | Y | | CDCONTRCOMPTP
PROD_TP_CD | N | | CDPRODTP
CONTR_ROLE_TP_CD | N | | CDCONTRACTROLETP
ADDR_USAGE_TP_CD | N | | CDADDRUSAGETP
START_DT | Y | Use Processing Date if not supplied. |
END_DT | | |
UNDEL_REASON_TP_CD | | | CDUNDELREASONTP
NULL_END_DT, NULL_UNDEL_REASON_TP_CD | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |


Table B-15 Role Location RT/ST is CL

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "L" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
ADMIN_CLIENT_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
CONTR_COMP_TP_CD | Y | | CDCONTRCOMPTP
PROD_TP_CD | N | | CDPRODTP
CONTR_ROLE_TP_CD | N | | CDCONTRACTROLETP
ADDR_USAGE_TP_CD | N | | CDADDRUSAGETP
START_DT | Y | Use Processing Date if not supplied. |
END_DT | | |
UNDEL_REASON_TP_CD | | | CDUNDELREASONTP
NULL_END_DT, NULL_UNDEL_REASON_TP_CD | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Table B-16 ContractCompVal RT/ST is CV

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" |
SUBTYPE | N | "V" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
CONTR_COMP_TP_CD | Y | | CDCONTRCOMPTP
PROD_TP_CD | N | | CDPRODTP
DOMAIN_VALUE_TP_CD | N | | CDDOMAINVALUETP
VALUE_STRING | N | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
NULL_END_DT | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Table B-17 MiscValue RT/ST is CM or PM

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" or "P" |
SUBTYPE | N | "M" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_OR_CONTRACT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
MISCVALUE_TP_CD | N | | CDMISCVALUETP
VALUE_STRING | Y | |
PRIORITY_TP_CD | Y | | CDPRIORITYTP
SOURCE_IDENT_TP_CD | Y | | CDSOURCEIDENTTP
DESCRIPTION | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
VALUEATTR_TP_CD_n (n = 0 to 9) | Y | | CDMISCVALUEATTRTP
ATTRn_VALUE (n = 0 to 9) | Y | |
NULL_VALUE_STRING, NULL_PRIORITY_TP_CD, NULL_SOURCE_IDENT_TP_CD, NULL_DESCRIPTION, NULL_END_DT, NULL_VALUEATTR_TP_CD_n and NULL_ATTRn_VALUE (n = 0 to 9) | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |
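The MiscValue record carries up to ten attribute slots as flat column pairs. A minimal sketch of collecting the populated slots into (type, value) pairs follows; the function name, the rule that a slot counts as used only when its VALUEATTR_TP_CD_n is non-empty, and the sample codes are assumptions for illustration only.

```python
from typing import Dict, List, Tuple


def misc_value_attributes(rec: Dict[str, str]) -> List[Tuple[str, str]]:
    """Collect populated (VALUEATTR_TP_CD_n, ATTRn_VALUE) pairs, n = 0..9."""
    pairs = []
    for n in range(10):
        attr_type = rec.get(f"VALUEATTR_TP_CD_{n}", "")
        if not attr_type:
            continue  # assumption: a slot without a type code is unused
        pairs.append((attr_type, rec.get(f"ATTR{n}_VALUE", "")))
    return pairs


# Hypothetical record: slots 0 and 2 are populated, the rest are empty.
record = {"RECTYPE": "P", "SUBTYPE": "M", "MISCVALUE_TP_CD": "12",
          "VALUEATTR_TP_CD_0": "3", "ATTR0_VALUE": "Email",
          "VALUEATTR_TP_CD_2": "4", "ATTR2_VALUE": "Weekly"}
print(misc_value_attributes(record))  # [('3', 'Email'), ('4', 'Weekly')]
```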

Table B-18 PPrefEntity_PrivPref RT/ST is PS

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "P" |
SUBTYPE | N | "S" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CLIENT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
PPREF_REASON_TP_CD | N | | CDPPREFREASONTP
SOURCE_IDENT_TP_CD | N | | CDSOURCEIDENTTP
VALUE_STRING | Y | |
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
PPREF_TP_CD | N | | CDPPREFTP
PPREF_ACT_OPT_ID | Y | | PPREFACTIONOPT
NULL_VALUE_STRING, NULL_END_DT, NULL_PPREF_ACT_OPT_ID | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |

Table B-19 Alert RT/ST is CT or PT

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "C" or "P" |
SUBTYPE | N | "T" |
ADMIN_SYS_TP_CD | N | | Not required.
ADMIN_CONTRACT_OR_CLIENT_ID | N | |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
REMOVED_BY_USER | Y | |
CREATED_BY_USER | Y | |
ALERT_TP_CD | N | | CDALERTTP
ALERT_SEV_TP_CD | Y | | CDALERTSEVTP
START_DT | Y | Use Processing Date if not supplied. |
END_DT | Y | |
DESCRIPTION | Y | |
NULL_REMOVED_BY_USER, NULL_CREATED_BY_USER, NULL_ALERT_SEV_TP_CD, NULL_END_DT, NULL_DESCRIPTION | N | If 1, set to null; if 0 and the column is empty, use the prior value; if 0 and the column is not empty, overwrite the prior value. |


Table B-20 Hierarchy RT/ST is HH

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "H" |
SUBTYPE | N | "H" |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
NAME | N | |
HIERARCHY_TP_CD | N | | CDHIERARCHYTP
DESCRIPTION | | |
START_DT | | |
END_DT | | |
NULL_DESCRIPTION, NULL_END_DT | N | If 1, set to null; if 0, use the prior value. |

Table B-21 Hierarchy Node RT/ST is HN

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "H" |
SUBTYPE | N | "N" |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
NAME | N | |
HIERARCHY_TP_CD | N | | CDHIERARCHYTP
ADMIN_SYS_TP_CD | N | |
ADMIN_CLIENT_ID | N | |
ENTITY_NAME | | |
DESCRIPTION | | |
START_DT | | |
END_DT | | |
NODEDESIG_TP_CD | | |
LOCALEDESCRIPTION | | |
NULL_DESCRIPTION, NULL_END_DT, NULL_NODEDESIG_TP_CD, NULL_LOCALEDESCRIPTION | N | If 1, set to null; if 0, use the prior value. |

Table B-22 Hierarchy Rel RT/ST is HR

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "H" |
SUBTYPE | N | "R" |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
NAME | N | |
HIERARCHY_TP_CD | N | | CDHIERARCHYTP
ADMIN_SYS_TP_CD_PARENT | N | | CDADMINSYSTP
ADMIN_CLIENT_ID_PARENT | N | |
ADMIN_SYS_TP_CD_CHILD | N | | CDADMINSYSTP
ADMIN_CLIENT_ID_CHILD | N | |
DESCRIPTION | | |
START_DT | | |
END_DT | | |
NULL_DESCRIPTION, NULL_END_DT | N | If 1, set to null; if 0, use the prior value. |


Table B-23 Hierarchy Ultimate Parent RT/ST is HU

Column name | Can be empty? | Mapping rule | Validate to table
RECTYPE | N | "H" |
SUBTYPE | N | "U" |
LOAD_TYPE | Y | U = update; A = add; empty = either add or update, as applicable |
NAME | N | |
HIERARCHY_TP_CD | N | | CDHIERARCHYTP
ADMIN_SYS_TP_CD | N | | CDADMINSYSTP
ADMIN_CLIENT_ID | N | |
DESCRIPTION | | |
START_DT | | |
END_DT | | |
NULL_DESCRIPTION, NULL_END_DT | N | If 1, set to null; if 0, use the prior value. |
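The HU record names a hierarchy's ultimate parent explicitly, while HR records carry parent and child pairs. If a source system only supplies the parent/child pairs, one way to derive a candidate ultimate parent is to follow child-to-parent links to the top, as in the sketch below. This derivation is not described as an RDP for MDM step; it assumes each node has at most one parent within a hierarchy and identifies nodes by an (ADMIN_SYS_TP_CD, ADMIN_CLIENT_ID) pair, and the sample keys are hypothetical.

```python
from typing import Dict, Tuple

# A node is identified by its (ADMIN_SYS_TP_CD, ADMIN_CLIENT_ID) pair.
Node = Tuple[str, str]


def ultimate_parent(node: Node, parent_of: Dict[Node, Node]) -> Node:
    """Follow child -> parent links until reaching a node with no parent."""
    seen = {node}
    while node in parent_of:
        node = parent_of[node]
        if node in seen:
            raise ValueError(f"cycle in hierarchy at {node}")
        seen.add(node)
    return node


# Parent/child pairs as they might arrive in HR records (hypothetical keys).
hr_records = [
    {"ADMIN_SYS_TP_CD_PARENT": "1", "ADMIN_CLIENT_ID_PARENT": "ORG1",
     "ADMIN_SYS_TP_CD_CHILD": "1", "ADMIN_CLIENT_ID_CHILD": "ORG2"},
    {"ADMIN_SYS_TP_CD_PARENT": "1", "ADMIN_CLIENT_ID_PARENT": "ORG2",
     "ADMIN_SYS_TP_CD_CHILD": "1", "ADMIN_CLIENT_ID_CHILD": "ORG3"},
]
parent_of = {
    (r["ADMIN_SYS_TP_CD_CHILD"], r["ADMIN_CLIENT_ID_CHILD"]):
        (r["ADMIN_SYS_TP_CD_PARENT"], r["ADMIN_CLIENT_ID_PARENT"])
    for r in hr_records
}
print(ultimate_parent(("1", "ORG3"), parent_of))  # ('1', 'ORG1')
```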


Appendix C. MDM customization considerations


This appendix describes the extensions that are supported by Master Data Management (MDM) Server and the impact of such extensions on the Rapid Deployment Package (RDP) for MDM jobs.

Copyright IBM Corp. 2009, 2011. All rights reserved.


C.1 Introduction
Because MDM Server source code is not accessible to clients, the product provides a number of extension and configuration mechanisms that you can use to adapt it to your environment. The Extension Framework[1] is one of these mechanisms, and it is tightly integrated with the kernel of the product. The primary types of extensions are as follows:

- Data extensions and additions, which allow you to add new data elements and create new business entities with a set of business services to maintain them.
- Behavior extensions, which allow you to plug in new business rules or functionality.

Note: MDM Server also comes with MDM Server Workbench, a development tool that helps with the creation of these data and behavior extensions. The workbench is a plug-in to IBM Rational Software Architect.

You can also create new transactions or services by using the MDM Server application framework. You build transactions by constructing new controller and business components and by using the existing Request Framework and Common Components.

This appendix briefly describes the following extension information:

- Data extensions and additions
- Behavior extensions
- Impact of extensions on RDP for MDM

It also gives a brief overview of several considerations involved in extending RDP for MDM:

- Extending RDP for MDM
- Runtime Column Propagation
- Adding new elements (columns)
- Modifying existing elements (columns)

[1] MDM Server also uses its own extension framework to plug in certain modules, such as Rules of Visibility, which keeps those modules loosely coupled and easy to turn on or off through configuration.


C.2 Data extensions and additions


MDM Server provides a mechanism for extending the data model. You can add new attributes to existing tables and add new tables. Extended data elements can be persisted and retrieved as part of existing MDM Server transactions without modifying MDM Server code. When dealing with extended data, MDM Server has the following responsibilities:

- Parsing extended data as part of an XML service request and creating extended business objects
- Invoking validation routines on the extended business objects
- Populating the extended data elements as part of the MDM Server metadata so that features such as external validation rules can be used
- Invoking methods on the extended business object when required to persist or retrieve the extended data elements
- Constructing XML data as part of service completion

C.3 Behavior extensions


MDM Server provides a mechanism for extending the behavior of the product in an event-based way. The Pre/Post Transaction and Pre/Post Action points within the product can be extended to provide additional functionality. A transaction equates to a published service, or Controller Component operation. An action equates to an operation on a business logic component. Other predefined extension points may also exist; they are documented as part of the service specification.

You can write extensions to MDM Server behavior as Java code or in a rules engine language. Extensions are organized into Extension Sets, which are similar to the rule sets within a rules engine. Examples include generic prospective client rules or line-of-business-specific rules such as life insurance client rules.

The Extension Controller is the gateway from the core application to behavior extensions and is invoked at the extension points described above. It is provided with the following information:

- Data about the extension point that invoked it
- The transaction's object hierarchy
- The action's object hierarchy, in the case of an action extension point
- The transaction header that was provided in the original MDM Server request

The Extension Controller uses these parameters to determine whether any Extension Sets must be further evaluated. Relevant Extension Sets are then interrogated, and qualified extensions, either Java or rule sets, are invoked.
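The dispatch flow described above can be modeled in a few lines. The following Python sketch is a conceptual illustration only: `ExtensionPoint`, `ExtensionSet`, `ExtensionController`, the predicate-based qualification, and the sample `addParty` rule are hypothetical names chosen for the example, not the MDM Server Java API, and real Extension Sets can also be rules-engine rule sets rather than functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ExtensionPoint:
    """Where the controller was invoked from (hypothetical model)."""
    name: str   # transaction or action name, for example "addParty"
    kind: str   # "transaction" or "action"
    phase: str  # "pre" or "post"


# An extension receives the extension point, the transaction's object
# hierarchy, the action's object hierarchy (None at transaction points),
# and the original request header.
Extension = Callable[[ExtensionPoint, dict, Optional[dict], dict], None]


@dataclass
class ExtensionSet:
    name: str
    applies_to: Callable[[ExtensionPoint], bool]
    extensions: List[Extension] = field(default_factory=list)


class ExtensionController:
    def __init__(self, extension_sets: List[ExtensionSet]) -> None:
        self.extension_sets = extension_sets

    def invoke(self, point: ExtensionPoint, transaction: dict,
               action: Optional[dict], header: dict) -> List[str]:
        """Run the extensions of every set that qualifies for this point.

        Returns the names of the sets that fired.
        """
        fired = []
        for ext_set in self.extension_sets:
            if not ext_set.applies_to(point):
                continue  # this set is not relevant at this point
            for extension in ext_set.extensions:
                extension(point, transaction, action, header)
            fired.append(ext_set.name)
        return fired


def flag_life_client(point, transaction, action, header):
    transaction["lifeInsuranceClient"] = True


life_rules = ExtensionSet(
    name="LifeInsuranceClientRules",
    applies_to=lambda p: (p.kind, p.phase, p.name) == ("transaction", "post", "addParty"),
    extensions=[flag_life_client],
)
controller = ExtensionController([life_rules])
txn = {"partyId": "123"}
print(controller.invoke(ExtensionPoint("addParty", "transaction", "post"),
                        txn, None, {"requesterName": "batch"}))
# ['LifeInsuranceClientRules']
```

Keeping qualification (`applies_to`) separate from the extensions themselves mirrors the two-step behavior in the text: first decide which Extension Sets are relevant, then invoke the qualified extensions.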


C.4 Impact of data/behavior extensions on RDP for MDM


The process of extending the MDM data model to support your organization's specific master data requirements is beyond the scope of this IBM Redbooks publication. However, this appendix provides considerations for extending MDM Server and describes the corresponding impact on the RDP for MDM assets. Because RDP for MDM loads directly into the MDM target tables, creating new MDM Server services or behavior extensions has no impact on RDP for MDM. Extensions to the MDM data model, however, require changes to both MDM Server and the corresponding RDP for MDM assets. MDM Server provides a code generation tool that allows clients to change existing column attributes, add new columns to existing tables, and add new tables to satisfy business requirements. The code generation tool also generates the web services integration code for these data extensions.


Corresponding changes to the RDP for MDM assets depend on the type of data model change, as summarized in Table C-1.
Table C-1 MDM extensions impact on RDP for MDM

Type of MDM extension | Nature of MDM extension | Impact on RDP for MDM
Data extensions and additions | Either add a new element to an existing SIF record, or modify an existing element's data type and precision/scale/length, when that element does not participate in any transformation or aggregation. | A change to the corresponding ImportSIF shared container (names starting with ILIS) propagates through to the target. For BulkLoad, no further changes are required. For Insert (Upsert), a change to the corresponding DB shared container (names starting with ILDBIN) is also required.
Data extensions and additions | Either add a new element to an existing SIF record, or modify an existing element's data type and precision/scale/length, when that element participates in some transformation or aggregation. | A change to the corresponding ImportSIF shared container (names starting with ILIS) propagates through to the target. For BulkLoad, no further changes are required. For Insert (Upsert), a change to the corresponding DB shared container (names starting with ILDBIN) is also required. In addition, search for dependent objects where the element is used, and examine the transformation or validation logic to see whether changes are necessary.
Data extensions and additions | Add a new table (new SIF record). | Impact is beyond the scope of a typical RDP for MDM implementation.
Behavior extensions | | No impact
New transaction or service | | No impact


C.5 Extending RDP for MDM


In general, RDP for MDM jobs have been built using modular design techniques and reusable shared containers. This allows changes to be made to source, target, edit, and validation logic without changing the jobs themselves. As a result, if extensions are confined to the existing shared containers, the core RDP jobs can be upgraded without losing client-specific customizations. Shared containers are provided for the following areas:

- ImportSIF format
- Database select (code tables) and target (upsert and bulk load methods)
- Edit points for pre-validation
- Validation and Standardization
- Match Processing
- Error Conditions
- ID Assignment

C.6 Runtime column propagation (RCP)


RCP is a feature of IBM InfoSphere Information Server that allows job designs to accommodate additional columns beyond those defined by the DataStage or QualityStage job developer. Used judiciously, RCP makes it possible to build reusable job designs that are driven by input metadata, rather than a large number of jobs with hard-coded table definitions that perform the same tasks. RCP also facilitates reuse through parallel shared containers: only the columns explicitly referenced within the shared container logic must be defined, and the remaining columns pass through at run time, as long as each stage in the shared container has RCP enabled on its Output properties. Before a DataStage developer can use RCP, it must be enabled at the project level through the Administrator client. RCP is then enabled or disabled on the Output tab of each stage. When RCP is enabled, columns that are not explicitly defined are passed across the stage from input to output.
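The pass-through behavior of RCP can be illustrated with a toy model of a single stage. This Python sketch is a conceptual analogy, not DataStage code: `run_stage`, the row-as-dictionary representation, and the column `NEW_EXT_COL` (standing in for a newly added SIF element) are assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def run_stage(rows: Iterable[Row], defined_columns: List[str],
              derive: Callable[[Row], Row], rcp_enabled: bool) -> List[Row]:
    """Toy model of one stage: output its defined columns plus derivations.

    With RCP enabled, any input column the stage does not define is
    propagated unchanged; with RCP disabled, such columns are dropped.
    """
    output = []
    for row in rows:
        out = {col: row[col] for col in defined_columns}
        out.update(derive(row))
        if rcp_enabled:
            for col, value in row.items():
                out.setdefault(col, value)  # pass through undefined columns
        output.append(out)
    return output


def upper_name(row: Row) -> Row:
    """The only derivation this stage defines."""
    return {"LAST_NAME": str(row["LAST_NAME"]).upper()}


rows = [{"ADMIN_CLIENT_ID": "C100", "LAST_NAME": "smith", "NEW_EXT_COL": "x"}]
print(run_stage(rows, ["ADMIN_CLIENT_ID", "LAST_NAME"], upper_name, True))
# [{'ADMIN_CLIENT_ID': 'C100', 'LAST_NAME': 'SMITH', 'NEW_EXT_COL': 'x'}]
print(run_stage(rows, ["ADMIN_CLIENT_ID", "LAST_NAME"], upper_name, False))
# [{'ADMIN_CLIENT_ID': 'C100', 'LAST_NAME': 'SMITH'}]
```

The second call shows why a stage with RCP disabled must be reviewed when columns are added: the new column silently disappears from its output.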


C.7 Adding new elements (columns)


Most RDP for MDM job designs enable RCP across their stages. In the simplest case, this allows additional columns to be defined on input by modifying the corresponding ImportSIF shared container. If these new columns are not needed in additional derivations or validations, they flow from the source SIF to the target database table. For BulkLoad, no additional changes are necessary. For Insert (Upsert), the table definition within the corresponding DB shared container must also be updated with the new column.

For some RDP for MDM jobs, stages, and shared containers, RCP is explicitly disabled. In most cases, RCP is disabled for QualityStage match, because that is standard practice (only matching key columns are output). Other objects and jobs where RCP is disabled should be reviewed to ensure that the additional columns are passed downstream when necessary. Table C-2 summarizes the jobs and containers that might require review.

Note: This table is not exhaustive and may change with new releases of RDP assets.

If a newly added element must also be validated, change the corresponding EditPoint or Validation shared container rather than an existing base RDP for MDM job.
Table C-2 RDP for MDM objects with RCP disabled

Category                                              Job/Container name
MDMIS R4                                              IL_000_PS_Stage_ErrReasonTbl
MDMIS R4                                              IL_010_IS_Import_SIF
MDMIS R4                                              IL_020_VS_Address
MDMIS R4/Shared Containers/EditPointContainers        EPCVSAddress (Container)
MDMIS R4/Shared Containers/ValidationStanContainers   VSVALAddress (Container)
MDMIS R4/Shared Containers/ImportSIFContainers        (names start with "ILIS")
MDMIS R4/Shared Containers/DBContainers               (names start with "ILDBIN")

Appendix C. MDM customization considerations


C.8 Modifying existing elements (columns)


With RCP, changes to existing column attributes, such as length, precision, and in some cases even data type, can also flow from source to target. As with adding new columns, the changes must be made in the source ImportSIF shared container and, if Insert is used, in the target DB shared container. An existing column might also be used in a transformation or aggregation, which must then be reviewed and updated. The Advanced Find feature can locate some objects. However, the easiest way to identify and change all dependent transformations is to export the entire RDP for MDM project to a DSX file. DSX files contain a clear-text representation of all DataStage and QualityStage objects and can be searched with a text editor or command-line tool. Edit a copy of the DSX file, and update every transformation and stage that uses the column. Then re-import the updated copy into the RDP for MDM project. This method is particularly effective for simple derivations.
Changes to some elements require more advanced knowledge of DataStage and QualityStage. For example, if a column is used in a QualityStage standardization or match, those specifications must also be updated. Such advanced or extensive changes might require the guidance of an IBM Professional Services consultant who is experienced with RDP for MDM.
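A DSX export is clear text, so a short script can list every line that mentions a column before you edit the copy. The following Python sketch assumes only that the export can be read as text. The UTF-8 encoding, the file name, and the column name ADDR_LINE_ONE in the usage are assumptions. Matching is plain text, so each hit still needs review in context.

```python
import re
from typing import Iterable, List, Tuple


def find_column_references(lines: Iterable[str], column: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line that mentions `column`
    as a whole identifier (case-sensitive)."""
    # \b treats '_' as part of a word, so ADDR_LINE_ONE does not match
    # inside a longer name such as ADDR_LINE_ONE_X.
    pattern = re.compile(r"\b" + re.escape(column) + r"\b")
    hits = []
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            hits.append((number, line.rstrip("\r\n")))
    return hits
```

Typical use passes an open file, for example `with open("rdp_project_copy.dsx", encoding="utf-8", errors="replace") as dsx: find_column_references(dsx, "ADDR_LINE_ONE")` (hypothetical file name). The returned line numbers then point you to the places to edit in the copy.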


Appendix D. Error processing
This appendix describes the most commonly encountered data-related problems in the Standard Interface File (SIF) and how they are highlighted in the Rapid Deployment Package (RDP) for Master Data Management (MDM) error log.

Copyright IBM Corp. 2009, 2011. All rights reserved.


D.1 Introduction
As Figure D-1 shows, errors in the SIF can be detected during any of the phases of RDP for MDM processing, such as Import SIF, Validation & Standardization, and Error Consolidation & Referential Integrity. All of these errors are collected in the consolidated error log.

Figure D-1 Main components of RDP processing and error logs generated

Table D-1 on page 319 shows the general format of the messages in the error logs. A document that describes the error codes is available from the download site: http://www.redbooks.ibm.com/redpieces/abstracts/sg247704.html


Table D-1 Error message format

Field#  SIF column names                  Description
1-2     RECTYPE, SUBTYPE                  Identify the record type, such as PA (address), PI (identifier), or CC (contract component).
3-4     ADMIN_SYS_TP_CD,                  Together form the SSK (source system key).
        ADMIN_CLIENT_ID_OR_CONTRACT_ID
5       CONT_ID                           The surrogate key (SK) generated for each row.
6-7     SIF_FILE_NAME, SIF_ROW_NUMBER     The physical location of the error row among the input data files.
8       ERR_CODE                          The error type, such as invalid state/province type code. This field is an integer.
9       ERR_REASON_CODE                   A specific instance of an error code, such as invalid state/province type code in the Address Validation job. This field is also an integer.
10      ERR_MSG                           The text message that corresponds to the error code (ERR_CODE), for example, the literal string "invalid state/province type code".
11      ERR_SEVERITY_LEVEL                The severity of the error.
12      BATCH_ID                          A unique ID that you assign to each run of the jobs. It is used in the names of all files created by the RDP for MDM jobs.
13      INTERNAL_ID                       A surrogate key used only inside the RDP for MDM jobs. It is dropped before the database is loaded, where CONT_ID is used instead.
14      ERR_TS                            A time stamp of when the error was detected.
15      ERR_STAGE_NAME                    The name of the stage that detected the error and produced the error row.
16      ERR_JOB_NAME                      The name of the job in which the ERR_STAGE_NAME stage resides.
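Each error log record is pipe-delimited in the field order of Table D-1, so records can be split into named fields for filtering or for loading into a spreadsheet. The following Python sketch assumes that no field value contains a pipe character and that every record has exactly 16 fields. `parse_error_record` is a hypothetical helper, not part of RDP for MDM.

```python
from typing import Dict

# Field names in the order given in Table D-1.
ERROR_LOG_FIELDS = [
    "RECTYPE", "SUBTYPE", "ADMIN_SYS_TP_CD", "ADMIN_CLIENT_ID_OR_CONTRACT_ID",
    "CONT_ID", "SIF_FILE_NAME", "SIF_ROW_NUMBER", "ERR_CODE", "ERR_REASON_CODE",
    "ERR_MSG", "ERR_SEVERITY_LEVEL", "BATCH_ID", "INTERNAL_ID", "ERR_TS",
    "ERR_STAGE_NAME", "ERR_JOB_NAME",
]


def parse_error_record(line: str) -> Dict[str, str]:
    """Split one pipe-delimited error log record into named fields."""
    values = line.rstrip("\r\n").split("|")
    if len(values) != len(ERROR_LOG_FIELDS):
        # Refuse to guess rather than silently misalign the fields.
        raise ValueError(f"expected {len(ERROR_LOG_FIELDS)} fields, got {len(values)}")
    return dict(zip(ERROR_LOG_FIELDS, values))
```

Applied to the first record of Example D-2, this yields SIF_ROW_NUMBER "28" and ERR_STAGE_NAME "tx_RTST_ci_Rejects". A record that does not split into exactly 16 fields raises ValueError.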


Important: The error log does not identify the offset, within the record, of the field in error. Also, the sequence of error messages does not reflect the order in which the errors occurred. The parallel framework, with the default automatic partitioning used in the RDP for MDM jobs, makes the sequence of these errors non-deterministic. Reruns of the same job are therefore likely to list the error messages in a different order each time.

In this appendix, we created a number of SIF files that contain the most commonly encountered errors, to identify the corresponding error messages generated by the RDP for MDM jobs. For each case, the contents of the consolidated error log are shown.

Note: We chose to document each error in isolation so that the corresponding errors generated in the logs can be reviewed one at a time. In practice, a combination of these errors is likely to occur. Carefully review the error logs to identify and correct the errant rows in the input SIF.

The most commonly encountered errors are as follows:
- Pipe (|) character in the data
- Validation error with the code table
- RT/ST/ADMIN_SYS_TP_CD error
- End of record missing
- Start date after end date error
- Date format error
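The order of error messages can change between reruns. Sorting the log before comparing two runs therefore makes real differences easier to spot. The following Python sketch sorts records by SIF file name, numeric SIF row number, and ERR_REASON_CODE (fields 6, 7, and 9 of Table D-1). It assumes well-formed 16-field records. The function name is hypothetical.

```python
from typing import List, Tuple


def stable_order(records: List[str]) -> List[str]:
    """Sort error log records so that two runs can be compared line by line.

    Sort key: SIF_FILE_NAME (field 6), SIF_ROW_NUMBER (field 7, compared as
    a number), ERR_REASON_CODE (field 9), then the whole record as a
    tie-breaker so the result is fully deterministic.
    """
    def sort_key(record: str) -> Tuple[str, int, str, str]:
        fields = record.split("|")
        return (fields[5], int(fields[6]), fields[8], record)

    return sorted(records, key=sort_key)
```

Comparing the row number as an integer places row 28 before row 126, which a plain string sort would not do. Fields that legitimately differ between runs, such as BATCH_ID, INTERNAL_ID, and ERR_TS, still differ after sorting.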


D.2 Pipe character (|) in the data


The pipe character (|) is the field delimiter in the SIF, so its presence in the data causes the SIF parser to fail with an error message. To view the error messages generated by the SIF parser, we introduced a pipe character in the address field of row 28 of the SIF. This is the P|A record with the SSK (1000000,70005817). Example D-1 shows only part of the SIF.

Example D-2 on page 324 shows the contents of the error log for this error. The first record identifies row 28 in the SIF file SIF_Out.pipe, with the SSK (1000000,70005817), as the row that the SIF parser is unable to parse. The error message is "Unable to parse record at RT/ST Level", and the error severity level is 0. The names of the stage (tx_RTST_ci_Rejects) and the job (IL_010_Parse_Columnization) are also provided. This row is rejected.

The subsequent records show the rows in the SIF that are also rejected because they are associated with the rejected row. For these rows (row numbers 496, 579, 126, 698, and 17 in the input SIF file), the following message is generated: "Record dropped by association. Fatal errors were detected on related party records."

Note: Currently, the pipe character cannot be replaced by another field delimiter, and no escape character is provided.
Example: D-1 Pipe character in the data error: partial contents of SIF

P|P|1000002|8000719|A|N|||||||1||||||||||||||||||||3||||||1984-05-07 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000090|A|N|||||||2||||||||||||||||||||3|||||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000071|A|N|||||||2|||||||||||||||||||||||||F|1989-08-23 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000041|A|N|||||||1|||||||||||||||||||||||||M|1998-08-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70008172|A|N|||100|||||2|||||||||||||||||||||185|||M|1937-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000037|A|N|||||||1||||||||||||||||||||3|||||F|1986-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000297|A|N|||||||1||||||||||||||||||||2|||||F|1995-10-30 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000212|A|N|||||||1||||||||||||||||||||3|||||F|1997-11-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000291|A|N|||||||1|||||||||||||||||||||||||F|1977-03-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004432|A|N|||100|||||3|||||||||||||||||||||185|||M|1976-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004182|A|N|||100|||||2|||||||||||||||||||||185|||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000640|A|N|||||||2||||||||||||||||||||3|||||F|1975-07-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000469|A|N|||||||2||||||||||||||||||||1|||||M|1990-03-14 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000111|A|N|||||||1||||||||||||||||||||1|||||M|1984-06-21 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000201|A|N|||||||1|||||||||||||||||||||||||M|1967-05-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005333|A|N|||100|||||2|||||||||||||||||||||185|||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005817|A|N|||100|||||4|||||||||||||||||||||185|||M|1957-03-29 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000232|A|N|||||||1||||||||||||||||||||1||||||1966-08-25 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000259|A|N|||||||1||||||||||||||||||||2|||||F|1991-02-04 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000011|A|N|||||||3|||||||||||||||||||||||||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000221|A|N|||||||2|||||||||||||||||||||||||M|1977-09-18 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70003022|A|N|||100|||||2|||||||||||||||||||||185|||M|1989-12-11 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000002|8000640|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||6177 Purple Sage Ct|||San Jose|||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000002|8000469|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||5528 Muir Dr|||San Jose|||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000002|8000111|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||631 Ofarrell St|||San Francisco|||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000001|200000201|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||1363 14th Ave|||San Francisco|||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000000|70005333|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||6181 Camino Verde Dr,,San Jose,95119||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000000|70005817|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||PO Box 7424||San Francisco,94120||||||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|A|1000002|8000340|A|||||||||2008-10-26 19:11:54.000000||||||1|185|||44 Montgomery St|||San Francisco|||||||||||||||||||||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
............


Example: D-2 Pipe character in the data error log

P|A|1000000|70005817|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.pipe|28|110362|100401|Unable to parse record at RT/ST Level|0|canonical_errPipe|6696|2008-05-30 09:36:10|tx_RTST_ci_Rejects|IL_010_Parse_Columnization
P|C|1000000||70005817|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.pipe|496|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errPipe|6696|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|H|1000000|70005817|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.pipe|579|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errPipe|6696|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|I|1000000|70005817|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.pipe|126|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errPipe|6696|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|I|1000000|70005817|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.pipe|698|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errPipe|6696|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|P|1000000|70005817|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.pipe|17|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errPipe|6696|2008-05-30 9:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
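The SIF offers no escape character, so the safest place to handle a pipe in the data is the process that produces the SIF. The following Python sketch shows such a writer, which replaces any pipe in a value before joining the fields. Several details are assumptions. The replacement character (/) is arbitrary and should be chosen to suit your data. The trailing delimiter mirrors the records shown in the examples. The function itself is hypothetical, not part of RDP for MDM.

```python
from typing import Iterable


def to_sif_line(values: Iterable[object], replacement: str = "/") -> str:
    """Join field values into one SIF record (without the record terminator).

    The SIF has no escape character, so any '|' inside a value would be read
    as a field delimiter; it is replaced here. Line breaks inside a value
    would split the record, so they are replaced with a space.
    """
    cleaned = []
    for value in values:
        text = "" if value is None else str(value)
        text = text.replace("|", replacement)
        text = text.replace("\r", " ").replace("\n", " ")
        cleaned.append(text)
    # The sample SIF records end with a trailing delimiter.
    return "|".join(cleaned) + "|"
```

Sanitizing at the source keeps every row at its expected field count. In contrast, a pipe that reaches the SIF causes the row, and every row associated with it, to be dropped.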


D.3 Validation error with the code table


To view the error messages generated, we introduced an invalid code (-13) in the CLIENT_POTEN_TP_CD field of row 2 of the SIF, which has the SSK (1000002,8000090). Example D-3 shows only part of the SIF.

Example D-4 on page 327 shows the contents of the error log for this error. The first record identifies row 2 in the SIF file SIF_Out.CodeError, with the SSK (1000002,8000090), as the row in error. The error message is "The following is not correct: ClientPotentialType", and the error severity level is 0. The names of the stage (020_Contact.CheckCodeAndContentValidationErrors) and the job (IL_020_VS_Contact) are also provided.

The second record has the error message "Record In Error Dropped" for the same row (2) in the SIF. It also shows that this occurs in stage 020Contact.DropErrorRows of job IL_020_VS_Contact.

The subsequent records show the rows (689, 780, and 41) in the SIF that are also rejected because they are associated with the dropped row 2. The message "Invalid PersonName Records: No Matching Contact Record" is generated for row 689. The message "Record dropped by association. Fatal errors were detected on related party records." is generated for rows 780 and 41.
Example: D-3 Validation error with the code table error: partial contents of SIF

P|P|1000002|8000719|A|N|||||||1||||||||||||||||||||3||||||1984-05-07 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000090|A|N|||||||2||-13||||||||||||||||||3|||||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000071|A|N|||||||2|||||||||||||||||||||||||F|1989-08-23 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000041|A|N|||||||1|||||||||||||||||||||||||M|1998-08-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70008172|A|N|||100|||||2|||||||||||||||||||||185|||M|1937-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000037|A|N|||||||1||||||||||||||||||||3|||||F|1986-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000297|A|N|||||||1||||||||||||||||||||2|||||F|1995-10-30 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000212|A|N|||||||1||||||||||||||||||||3|||||F|1997-11-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000291|A|N|||||||1|||||||||||||||||||||||||F|1977-03-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004432|A|N|||100|||||3|||||||||||||||||||||185|||M|1976-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004182|A|N|||100|||||2|||||||||||||||||||||185|||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000640|A|N|||||||2||||||||||||||||||||3|||||F|1975-07-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000469|A|N|||||||2||||||||||||||||||||1|||||M|1990-03-14 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000111|A|N|||||||1||||||||||||||||||||1|||||M|1984-06-21 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000201|A|N|||||||1|||||||||||||||||||||||||M|1967-05-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005333|A|N|||100|||||2|||||||||||||||||||||185|||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005817|A|N|||100|||||4|||||||||||||||||||||185|||M|1957-03-29 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000232|A|N|||||||1||||||||||||||||||||1||||||1966-08-25 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000259|A|N|||||||1||||||||||||||||||||2|||||F|1991-02-04 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000011|A|N|||||||3|||||||||||||||||||||||||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000221|A|N|||||||2|||||||||||||||||||||||||M|1977-09-18 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70003022|A|N|||100|||||2|||||||||||||||||||||185|||M|1989-12-11 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
............

Example: D-4 Validation error with the code table error log output

P|P|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|2|1624|100054|The following is not correct: ClientPotentialType|0|canonical_errCode|8611|2008-11-01 09:24:13|020_Contact.CheckCodeAndContentValidationErrors|IL_020_VS_Contact
P|P|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|2|110184|100066|Record In Error Dropped|0|canonical_errCode|8611|2008-11-01 09:24:13|020Contact.DropErrorRows|IL_020_VS_Contact
P|H|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|689|110126|100246|Invalid PersonName Records: No Matching Contact Record|0|canonical_errCode|8611|2008-11-01 09:25:10|030_CONTACT_RIV.Party Join Proc|IL_030_RI_Contact_Person_Org
P|I|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|780|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errCode|8611|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|A|1000002|8000090|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.CodeError|41|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_errCode|8611|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
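When one bad row causes several related rows to be dropped, grouping the log by SSK helps separate the root cause from the follow-on messages. The following Python sketch assumes 16-field records. It treats ERR_REASON_CODE 100387, which the sample logs pair with "Record dropped by association.", as the marker for follow-on drops. Other consequential messages, such as "Record In Error Dropped", still appear in the root list, so this is a triage aid only. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# ERR_REASON_CODE that the sample logs pair with "Record dropped by association."
DROPPED_BY_ASSOCIATION = "100387"


def summarize_by_ssk(records: List[str]) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Group 16-field error log records by SSK (fields 3 and 4) and split
    them into root-cause messages and drops by association."""
    summary: Dict[Tuple[str, str], Dict[str, List[str]]] = defaultdict(
        lambda: {"root": [], "associated": []})
    for record in records:
        fields = record.split("|")
        ssk = (fields[2], fields[3])
        bucket = "associated" if fields[8] == DROPPED_BY_ASSOCIATION else "root"
        summary[ssk][bucket].append(f"row {fields[6]}: {fields[9]}")
    return dict(summary)
```

For the log in Example D-4, the root list for SSK (1000002,8000090) starts with the ClientPotentialType message on row 2. Rows 780 and 41 land in the associated list.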


D.4 RT/ST/ADMIN_SYS_TP_CD error


To view the error messages generated, we introduced an invalid RT/ST/ADMIN_SYS_TP_CD combination (P/P/1, that is, an ADMIN_SYS_TP_CD of 1 instead of 1000001) in row 3 of the SIF. Example D-5 shows only part of the SIF.

Example D-6 on page 330 shows the contents of the error log for this error. The third record identifies row 3 in the SIF file SIF_Out.errRTST, with the SSK (1,200000071), as the row in error. The error message is "Invalid Contact Record: No Match found in PersonName nor OrganizationName", and the error severity level is 0. The names of the stage (030_CONTACT_RIV.Process_Contact_Join) and the job (IL_030_RI_Contact_Person_Org) are also provided.

The first two records, and the records that follow the third record, are errors resulting from the invalid RT/ST/ADMIN_SYS_TP_CD. Note the various rows (42, 102, 115, 474, and 755), the error messages, and the stages and jobs in which these errors were detected.
Example: D-5 RT/ST/ADMIN_SYS_TP_CD error: partial contents of SIF

P|P|1000002|8000719|A|N|||||||1||||||||||||||||||||3||||||1984-05-07 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000090|A|N|||||||2||||||||||||||||||||3|||||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1|200000071|A|N|||||||2|||||||||||||||||||||||||F|1989-08-23 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000041|A|N|||||||1|||||||||||||||||||||||||M|1998-08-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70008172|A|N|||100|||||2|||||||||||||||||||||185|||M|1937-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000037|A|N|||||||1||||||||||||||||||||3|||||F|1986-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000297|A|N|||||||1||||||||||||||||||||2|||||F|1995-10-30 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000212|A|N|||||||1||||||||||||||||||||3|||||F|1997-11-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000291|A|N|||||||1|||||||||||||||||||||||||F|1977-03-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004432|A|N|||100|||||3|||||||||||||||||||||185|||M|1976-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004182|A|N|||100|||||2|||||||||||||||||||||185|||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000640|A|N|||||||2||||||||||||||||||||3|||||F|1975-07-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000469|A|N|||||||2||||||||||||||||||||1|||||M|1990-03-14 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000111|A|N|||||||1||||||||||||||||||||1|||||M|1984-06-21 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000201|A|N|||||||1|||||||||||||||||||||||||M|1967-05-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005333|A|N|||100|||||2|||||||||||||||||||||185|||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005817|A|N|||100|||||4|||||||||||||||||||||185|||M|1957-03-29 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000232|A|N|||||||1||||||||||||||||||||1||||||1966-08-25 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000259|A|N|||||||1||||||||||||||||||||2|||||F|1991-02-04 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000011|A|N|||||||3|||||||||||||||||||||||||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000221|A|N|||||||2|||||||||||||||||||||||||M|1977-09-18 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70003022|A|N|||100|||||2|||||||||||||||||||||185|||M|1989-12-11 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
..................


Example: D-6 RT/ST/ADMIN_SYS_TP_CD error log output

P|A|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|42|110107|100035|Invalid Internal ID|0|canonical_errRTST|0|2008-11-01 10:00:12|020_Address.CheckCodeAndContentValidationErrors|IL_020_VS_Address
P|A|1000001|200000071|-1|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|42|110184|100024|Record In Error Dropped|0|canonical_errRTST|0|2008-11-01 10:00:11||IL_020_VS_Address
P|P|1|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|3|110105|100244|Invalid Contact Record: No Match found in PersonName nor OrganizationName|0|canonical_errRTST|8625|2008-11-01 10:00:01|030_CONTACT_RIV.Process_Contact_Join|IL_030_RI_Contact_Person_Org
P|C|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|102|100995|100995|Invalid Internal Id|0|canonical_errRTST|0|2008-11-01 09:58:44|020_ContactMethod.Type_Code_Chkup|IL_020_VS_ContactMethod
P|C|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|474|100995|100995|Invalid Internal Id|0|canonical_errRTST|0|2008-11-01 09:58:44|020_ContactMethod.Type_Code_Chkup|IL_020_VS_ContactMethod
P|C|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|102|110184|100077|Record In Error Dropped.|0|canonical_errRTST|0|2008-11-01 09:58:43|020_ContactMethod.Final_Recs_Process|IL_020_VS_ContactMethod
P|C|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|474|110184|100077|Record In Error Dropped.|0|canonical_errRTST|0|2008-11-01 09:58:43|020_ContactMethod.Final_Recs_Process|IL_020_VS_ContactMethod
P|I|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|115|110107|100161|Invalid Internal ID|0|canonical_errRTST|0|2008-11-01 10:00:30|020_Identifier.CheckCodeAndContentValidationErrors|IL_020_VS_Identifier
P|I|1000001|200000071|-1|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|115|110184|100154|Record in Error Dropped|0|canonical_errRTST|0|2008-11-01 10:00:30|020_Identifier.DropErrorRows|IL_020_VS_Identifier
P|H|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|755|110107|100218|Invalid Internal Id|0|canonical_errRTST|0|2008-11-01 09:58:59|020_PersonName.Validation_Chk|IL_020_VS_PersonName
P|H|1000001|200000071|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.errRTST|755|110184|100219|Record In Error Dropped|0|canonical_errRTST|0|2008-11-01 09:58:58|020_PersonName.Final_Rec_Process|IL_020_VS_PersonName
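An invalid RT/ST/ADMIN_SYS_TP_CD combination can often be caught before the jobs run, by checking the first three fields of each SIF row against known values. In the following Python sketch, the caller supplies the reference sets. The values in the test are illustrative only, taken from the sample SIF. In practice, they would come from your MDM code tables and SIF specification. The function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def find_unknown_keys(lines: Iterable[str], known_rt_st: Set[Tuple[str, str]],
                      known_admin_sys: Set[str]) -> List[Tuple[int, str]]:
    """Report SIF rows whose RT/ST pair or ADMIN_SYS_TP_CD is not in the
    supplied reference sets. Row numbers are 1-based, like SIF_ROW_NUMBER."""
    problems = []
    for row_number, line in enumerate(lines, start=1):
        fields = line.rstrip("\r\n").split("|")
        if len(fields) < 3:
            problems.append((row_number, "too few fields"))
            continue
        if (fields[0], fields[1]) not in known_rt_st:
            problems.append((row_number, f"unknown RT/ST {fields[0]}/{fields[1]}"))
        if fields[2] not in known_admin_sys:
            problems.append((row_number, f"unknown ADMIN_SYS_TP_CD {fields[2]}"))
    return problems
```

Run against the SIF in Example D-5 with ADMIN_SYS_TP_CD values 1000000, 1000001, and 1000002, the check reports row 3 directly. In the error log, by contrast, the problem surfaces only indirectly as "Invalid Internal ID" and "No Match found" messages spread across several jobs.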


D.5 End of record missing error


We assumed that a problem in the code that generates the SIF file caused the end-of-record marker (the DOS record terminator) to be dropped at the end of a row. We simulated this error by concatenating two SIF records into a single row, as shown in row 2 of Example D-7. The two SSKs (1,955742003) and (1,955742002) appear as columns of the same row in the SIF. When this error occurs, any additional columns detected after the final expected column (as defined by the metadata) are ignored. A warning message [Import consumed only 74bytes of the record's 164 bytes (no further warnings will be generated from this partition)] is written to the Director log, as shown in Figure D-2.
Note: Carefully review the Director log output for such warnings, because they do not appear in the RDP for MDM error logs. The count of bytes (74 in our example) begins after the SSK, because that is where the columns begin, and it includes the pipe (|) column delimiters.
Example: D-7 End of record missing error: partial contents of SIF

P|H|1|955742001||||1||Alley|Mary|||Barton|||||||||||||||||||1|1|0|0|1|1|0|1|1|0|0|0|0|
P|H|1|955742003||||1||Georgina|Elly|||Colborn|||||||||||||||||||1|1|0|0|1|1|0|1|1|0|0|0|0|P|H|1|955742002||||1||Cheryl|Lynn|||Ainsworth|||||||||||||||||||1|1|0|0|1|1|0|1|1|0|0|0|0|
P|H|1|955742004||||1||Margaret|F|||Conway|||||||||||||||||||1|1|0|0|1|1|0|1|1|0|0|0|0|

Figure D-2 End of record missing error: partial contents of Director log output
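A missing record terminator and an unescaped pipe in the data both change the number of fields in a row. Counting delimiters is therefore a quick pre-check that does not depend on the Director log. The following Python sketch flags rows whose field count differs from the most common count for the same RT/ST pair. Using the most common count is an assumption that holds only when most rows are well-formed. Comparing against the field counts in your SIF specification is more reliable. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List, Tuple


def find_field_count_outliers(lines: Iterable[str]) -> List[Tuple[int, str, int, int]]:
    """Return (row_number, "RT/ST", actual_fields, expected_fields) for every
    row whose field count differs from the most common count for its RT/ST.
    Row numbers are 1-based, like SIF_ROW_NUMBER in the error log."""
    rows: List[Tuple[int, Tuple[str, str], int]] = []
    counts: DefaultDict[Tuple[str, str], Counter] = defaultdict(Counter)
    for row_number, line in enumerate(lines, start=1):
        # Strip a DOS (\r\n) or UNIX (\n) record terminator before splitting.
        fields = line.rstrip("\r\n").split("|")
        rt_st = (fields[0], fields[1] if len(fields) > 1 else "")
        rows.append((row_number, rt_st, len(fields)))
        counts[rt_st][len(fields)] += 1

    outliers = []
    for row_number, rt_st, actual in rows:
        expected = counts[rt_st].most_common(1)[0][0]
        if actual != expected:
            outliers.append((row_number, "/".join(rt_st), actual, expected))
    return outliers
```

For a row like row 2 of Example D-7, the concatenated row has nearly twice the expected number of fields, so it is reported even though the RDP for MDM error log is silent about it.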


D.6 Start date after end date error


To view the error messages generated, we introduced an end date (DISAB_END_DT) that precedes the start date (DISAB_START_DT), a date bounds error, in row 8 of the SIF, which has the SSK (1000002,8000212). Example D-8 shows only part of the SIF.

Example D-9 on page 334 shows the contents of the error log for this error. The second record identifies row 8 in the SIF file SIF_Out.endBeforeStartDate, with the SSK (1000002,8000212), as the row in error. The error message is "EndDate must be after StartDate", and the error severity level is 0. The names of the stage (020_Contact.CheckCodeAndContentValidationErrors) and the job (IL_020_VS_Contact) are also provided. The first record shows that row 8 is dropped, with the error message "Record In Error Dropped". The subsequent records are errors resulting from the invalid date bounds. Note the various rows (157, 254, 353, and 675), the error messages, and the stages and jobs in which these errors were detected.
Example: D-8 Start date after end date error: partial contents of SIF

P|P|1000002|8000719|A|N|||||||1||||||||||||||||||||3||||||1984-05-07 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0| P|P|1000002|8000090|A|N|||||||2||||||||||||||||||||3|||||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0| P|P|1000001|200000071|A|N|||||||2|||||||||||||||||||||||||F|1989-08-23 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0| P|P|1000001|200000041|A|N|||||||1|||||||||||||||||||||||||M|1998-08-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0| P|P|1000000|70008172|A|N|||100|||||2|||||||||||||||||||||185|||M|1937-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0| P|P|1000002|8000037|A|N|||||||1||||||||||||||||||||3|||||F|1986-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0| P|P|1000002|8000297|A|N|||||||1||||||||||||||||||||2|||||F|1995-10-30 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0| 0|0|0|0|

332

Master Data Management: IBM InfoSphere Rapid Deployment Package

P|P|1000002|8000212|A|N|||||||1||||||||||||||||||||3|||||F|1997-11-02 00:00:00.000000|||1999-08-23 00:00:00.000000|1989-08-23 00:00:00.000000||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000291|A|N|||||||1|||||||||||||||||||||||||F|1977-03-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004432|A|N|||100|||||3|||||||||||||||||||||185|||M|1976-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70004182|A|N|||100|||||2|||||||||||||||||||||185|||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000640|A|N|||||||2||||||||||||||||||||3|||||F|1975-07-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000469|A|N|||||||2||||||||||||||||||||1|||||M|1990-03-14 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000111|A|N|||||||1||||||||||||||||||||1|||||M|1984-06-21 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000201|A|N|||||||1|||||||||||||||||||||||||M|1967-05-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005333|A|N|||100|||||2|||||||||||||||||||||185|||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70005817|A|N|||100|||||4|||||||||||||||||||||185|||M|1957-03-29 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000232|A|N|||||||1||||||||||||||||||||1||||||1966-08-25 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000259|A|N|||||||1||||||||||||||||||||2|||||F|1991-02-04 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000011|A|N|||||||3|||||||||||||||||||||||||F|1945-03-12 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000221|A|N|||||||2|||||||||||||||||||||||||M|1977-09-18 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|

Appendix D. Error processing


P|P|1000000|70003022|A|N|||100|||||2|||||||||||||||||||||185|||M|1989-12-11 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
...........


Example: D-9 Start date after end date error log output

P|P|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|8|110184|100066|Record In Error Dropped|0|canonical_endBeforeStartDate|9107|2008-11-04 04:44:17|020Contact.DropErrorRows|IL_020_VS_Contact
P|P|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|8|102|100056|EndDate must be after StartDate|0|canonical_endBeforeStartDate|9107|2008-11-04 04:44:17|020_Contact.CheckCodeAndContentValidationErrors|IL_020_VS_Contact
P|H|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|675|110126|100246|Invalid PersonName Records: No Matching Contact Record|0|canonical_endBeforeStartDate|9107|2008-11-04 04:45:44|030_CONTACT_RIV.Party Join Proc|IL_030_RI_Contact_Person_Org
P|I|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|254|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_endBeforeStartDate|9107|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|C|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|353|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_endBeforeStartDate|9107|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|A|1000002|8000212|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.endBeforeStartDate|157|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_endBeforeStartDate|9107|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
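The second record in Example D-9 reports the validation rule behind this error: EndDate must be after StartDate. If you want to catch such records before a load, a check along the following lines can be run over candidate date pairs. This is a minimal Python sketch, not part of the RDP jobs; the timestamp layout is inferred from the sample SIF values, and the function name end_after_start and the handling of empty values are assumptions.

```python
from datetime import datetime

# SIF timestamp layout, inferred (an assumption) from sample values such as
# "1999-08-23 00:00:00.000000".
SIF_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def end_after_start(start: str, end: str) -> bool:
    """Return True when `end` is strictly later than `start`.

    Empty values are treated as "not supplied" and pass the check; this is an
    assumption, and the RDP jobs may handle open-ended dates differently.
    """
    if not start or not end:
        return True
    return datetime.strptime(end, SIF_TS_FORMAT) > datetime.strptime(start, SIF_TS_FORMAT)


# Assuming the two dates in the highlighted SIF record are StartDate then EndDate:
print(end_after_start("1999-08-23 00:00:00.000000", "1989-08-23 00:00:00.000000"))  # False
```

Treating an empty value as a pass keeps records without an end date from being flagged; tighten this if your data requires both dates.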


D.7 Date format error


We introduced an invalid date format for BIRTH_DT (07-05-1984, in dd-mm-yyyy order instead of the yyyy-mm-dd order used by the other records) in row 1 of the SIF, highlighted in Example D-10 (only part of the SIF is shown), to view the resulting error messages. Example D-11 shows the contents of the error log for this error. The first record identifies row 1 of the SIF file SIF_Out.dateFormatError, with the SSK (1000002,8000719), as the record in error. The error message is "Unable to parse record at RT/ST Level", and the error severity level is 0. The log also gives the name of the stage (tx_RTST_ci_Rejects) and of the job (IL_010_Parse_Columnization) that detected the error. The subsequent records are errors that result from the invalid date format: the PersonName record at row 859 has no matching contact record, and the related records at rows 513, 422, and 227 are dropped by association. Note the row numbers, the error messages, and the stage and job in which each of these errors was detected.
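To inspect an error log such as the one in Example D-11 outside of InfoSphere Information Server, each record can be split on the pipe character. The following minimal Python sketch does this. The field names are hypothetical: only the columns described above (record type and subtype, SSK, SIF file, row, message, severity, stage, and job) are named, their positions are inferred from the sample records, and which of the two 0 columns holds the severity is an assumption.

```python
from typing import Dict

# Hypothetical column names, inferred from the sample error log records.
# Columns whose meaning is not described in the text are named by position.
LOG_FIELDS = [
    "record_type", "record_subtype", "ssk_part1", "ssk_part2", "field_4",
    "sif_file", "sif_row", "field_7", "field_8", "message",
    "severity",  # assumed: the 0 that follows the message
    "run_name", "field_12", "timestamp", "stage", "job",
]


def parse_error_log_line(line: str) -> Dict[str, str]:
    """Split one pipe-delimited error log record into a dict keyed by LOG_FIELDS."""
    values = line.rstrip("\n").split("|")
    if len(values) != len(LOG_FIELDS):
        raise ValueError(f"expected {len(LOG_FIELDS)} fields, got {len(values)}")
    return dict(zip(LOG_FIELDS, values))


sample = ("P|P|1000002|8000719|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.dateFormatError|1|110362|100022|"
          "Unable to parse record at RT/ST Level|0|canonical_dateFormatError|5925|2008-05-30 09:36:10|"
          "tx_RTST_ci_Rejects|IL_010_Parse_Columnization")
record = parse_error_log_line(sample)
print(record["sif_row"], record["stage"], record["job"])
```

Raising an error on an unexpected column count makes a wrapped or truncated log line fail loudly instead of silently shifting every field.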
Example: D-10 Date format error: partial contents of SIF

P|P|1000002|8000719|A|N|||||||1||||||||||||||||||||3||||||07-05-1984 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000002|8000090|A|N|||||||2||||||||||||||||||||3|||||M|1975-09-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000071|A|N|||||||2|||||||||||||||||||||||||F|1989-08-23 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000001|200000041|A|N|||||||1|||||||||||||||||||||||||M|1998-08-03 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
P|P|1000000|70008172|A|N|||100|||||2|||||||||||||||||||||185|||M|1937-08-02 00:00:00.000000||||||0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|
...........
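The record in error in Example D-10 carries 07-05-1984 where the other records use the yyyy-mm-dd hh:mm:ss.ffffff layout. A pre-load check for this kind of error might look like the following minimal Python sketch. It assumes that layout for every date column and uses a simple regular expression (an assumption, not the RDP parsing rule) to decide which fields hold dates; the function name find_bad_dates is hypothetical.

```python
import re
from datetime import datetime
from typing import List, Tuple

# Assumed SIF timestamp layout, e.g. "1975-09-02 00:00:00.000000".
SIF_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"

# Heuristic (an assumption): a field that starts with three dash-separated
# digit groups is treated as a date or timestamp column.
DATE_LIKE = re.compile(r"^\d{1,4}-\d{1,2}-\d{1,4}\b")


def find_bad_dates(record: str) -> List[Tuple[int, str]]:
    """Return (field index, value) for every date-like field in a pipe-delimited
    SIF record that does not parse with SIF_TS_FORMAT."""
    bad = []
    for index, value in enumerate(record.split("|")):
        if not DATE_LIKE.match(value):
            continue  # not a date column under the heuristic above
        try:
            datetime.strptime(value, SIF_TS_FORMAT)
        except ValueError:
            bad.append((index, value))
    return bad
```

Because strptime requires a four-digit year for %Y, a value such as 07-05-1984 00:00:00.000000 is reported, while 1975-09-02 00:00:00.000000 passes.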


Example: D-11 Date format error log output

P|P|1000002|8000719|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.dateFormatError|1|110362|100022|Unable to parse record at RT/ST Level|0|canonical_dateFormatError|5925|2008-05-30 09:36:10|tx_RTST_ci_Rejects|IL_010_Parse_Columnization
P|H|1000002|8000719|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.dateFormatError|859|110126|100246|Invalid PersonName Records: No Matching Contact Record|0|canonical_dateFormatError|5925|2008-10-28 15:13:22|030_CONTACT_RIV.Party Join Proc|IL_030_RI_Contact_Person_Org
P|I|1000002|8000719|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.dateFormatError|513|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_dateFormatError|5925|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|A|1000002|8000719|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.dateFormatError|422|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_dateFormatError|5925|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
P|C|1000002|8000719|0|/data/RDP/SIF_IN/canonical_err/SIF_Out.dateFormatError|227|110387|100387|Record dropped by association. Fatal errors were detected on related party records.|0|canonical_dateFormatError|5925|2008-05-30 09:36:10|Split_Kept_Dropped|IL_040_EC_Party_Last_Drop
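Because one bad record can produce several log entries, it can help to separate the root cause from the knock-on errors when you review a log such as Example D-11. The following minimal Python sketch groups records by SSK and treats the errors logged by the earliest job in the sequence as the root cause. It assumes that the IL_nnn prefix of the job name reflects processing order and that the row number and message sit in the seventh and tenth columns, as in the sample records; the function name split_root_and_downstream is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: the numeric prefix of a job name (IL_010, IL_030, IL_040, ...)
# reflects processing order, so for each SSK the earliest job that logged an
# error holds the root cause and later jobs hold the knock-on errors.
JOB_ORDER = re.compile(r"IL_(\d+)")


def split_root_and_downstream(
    lines: List[str],
) -> Dict[Tuple[str, str], Dict[str, List[str]]]:
    """Group error log records by SSK and separate root-cause errors from
    errors raised later in the job sequence."""
    by_ssk: Dict[Tuple[str, str], List[Tuple[int, str]]] = defaultdict(list)
    for line in lines:
        fields = line.strip().split("|")
        ssk = (fields[2], fields[3])          # e.g. ("1000002", "8000719")
        match = JOB_ORDER.match(fields[-1])   # job name is the last column
        order = int(match.group(1)) if match else 10**9  # unknown jobs sort last
        by_ssk[ssk].append((order, f"row {fields[6]}: {fields[9]}"))

    result = {}
    for ssk, entries in by_ssk.items():
        first = min(order for order, _ in entries)
        result[ssk] = {
            "root": [text for order, text in entries if order == first],
            "downstream": [text for order, text in entries if order != first],
        }
    return result
```

Applied to the five records in Example D-11, this sketch reports row 1 (Unable to parse record at RT/ST Level) as the root cause and rows 859, 513, 422, and 227 as downstream errors.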


Appendix E. Additional material
This book refers to additional material that can be downloaded from the Internet as described in the following sections.

Locating the web material


The web material associated with this book is available in softcopy on the Internet from the IBM Redbooks web server. Point your web browser at:

ftp://www.redbooks.ibm.com/redbooks/SG247704

Alternatively, you can go to the IBM Redbooks website at:

ibm.com/redbooks

Select Additional materials and open the directory that corresponds to the IBM Redbooks form number, SG247704.

Copyright IBM Corp. 2009, 2011. All rights reserved.


Using the web material


The additional web material that accompanies this book includes the following file:

File name          Description
SG247704Code.zip   Compressed code and data used in the scenario

System requirements for downloading the web material


We used the following system configuration:

Hard disk space:    500 MB minimum
Operating System:   Windows

How to use the web material


Create a subdirectory (folder) on your workstation, and extract the contents of the web material ZIP file into this folder.
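If you prefer to script this step, the Python standard library zipfile module can create the folder and extract the archive into it. This is a minimal sketch; the function name extract_web_material and the target folder name are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_web_material(zip_path: str, target_dir: str) -> List[str]:
    """Create target_dir (and any missing parents) and extract every member
    of the ZIP archive into it. Returns the archive's member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return archive.namelist()


# Example call; the folder name is hypothetical:
# extract_web_material("SG247704Code.zip", "SG247704Code")
```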


Back cover

Master Data Management


IBM InfoSphere Rapid Deployment Package
Implementing faster to see the benefits faster
Seeing benefits with a financial services scenario
Getting control of your data environment
IBM InfoSphere Rapid Deployment Package (RDP) for Master Data Management (MDM) is a services offering that combines pre-integrated IBM InfoSphere software with a prescriptive MDM implementation approach to significantly reduce both the cost and the overall risk of MDM implementations. RDP for MDM delivers a fully integrated solution that provides your enterprise with a single view of the customer. It also provides a seamless upgrade path to IBM InfoSphere MDM Server, which gives you a wide and robust range of MDM functionality.

This IBM Redbooks publication is aimed at IT architects, Information Management specialists, and Information Integration specialists who are responsible for implementing an IBM InfoSphere Master Data Management solution on a Red Hat Enterprise Linux 4.0 platform. A simple financial services MDM scenario is used to describe the RDP for MDM offering and to show how RDP can deliver a return on investment in a short time frame by using a phased approach.

MDM solutions can provide significant benefits to an enterprise. Realizing those benefits and that return on investment requires both implementing an MDM solution and changing how the organization does business. For this reason, how your MDM solution is implemented is often as important as the solution itself.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, customers, and partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks


SG24-7704-01 ISBN 0738435422
