
Informatica- Complex Scenarios and their Solutions

Author(s): Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

Date written (MM/DD/YY): 07/01/2003

Declaration


We hereby declare that this document is based on our personal experiences and/or the experiences of our project members. To the best of our knowledge, this document does not contain any material that infringes the copyrights of any other individual or organization, including the customers of Infosys.

Aatish Thulasee Das, Rohan Vaishampayan, Vishal Raj

Project Details

Projects involved: REYREY
H/W Platform: 516 RAM, Microsoft Windows 2000
S/W Environment: Informatica
Appln. Type: ETL tool
Project Type: Data Warehousing

Target readers: Data Warehousing teams using ETL tools

Keywords


ETL Tools, Informatica, Data Warehousing

INDEX

INFORMATICA - COMPLEX SCENARIOS AND THEIR SOLUTIONS
INTRODUCTION
SCENARIOS:
1. PERFORMANCE PROBLEMS WHEN A MAPPING CONTAINS MULTIPLE SOURCES AND TARGETS
   1.1 Background
   1.2 Problem Scenario
   1.3 Solution
2. WHEN SOURCE DATA IS FLAT FILE
   2.1 Background
   2.2 Problem Scenario
   2.3 Solution
3. EXTRACTING DATA FROM THE FLAT FILE CONTAINING NESTED RECORD SETS
   3.1 Background
   3.2 Problem Scenario
   3.3 Solution
4. TOO LARGE LOOKUP TABLES
   4.1 Background
   4.2 Problem Scenario
   4.3 Our Solution
5. COMPLEX LOGIC FOR SEQUENCE GENERATION
   5.1 Background
   5.2 Problem Scenario
   5.3 Our Solution

Introduction
This document is based on the learnings from our work on the Reynolds and Reynolds project in CAPS (PCC), Pune. It presents the best practices we arrived at to overcome the complex scenarios we faced during the ETL process, along with some common best practices to follow while developing mappings.

Scenarios:
1. Performance problems when a mapping contains multiple sources and Targets.
1.1 Background

In Informatica, multiple sources can be mapped to multiple targets within a single mapping. This property is useful for grouping related mappings in one place: it reduces the number of sessions to be created, and all the related loading takes place in one go. It is therefore tempting to group different sources and targets that share the same logic into a single mapping.
1.2 Problem Scenario:

In a multiple-target scenario, if some of the sub-mappings contain complex transformations, performance degrades drastically: a single database connection has to handle multiple database statements. Such a mapping is also difficult to manage. For example, if one sub-mapping has a performance problem, the other sub-mappings suffer the same degradation.
1.3 Solution:

Divide and rule. It is always better to divide a complex mapping (i.e. multiple sources and targets) into simple mappings with one source and one target. That greatly helps in managing the mappings. All the related mappings can then be executed in parallel in different sessions: each session establishes its own connection, and the server can handle all the requests in parallel against the multiple targets. Each session can be placed into a batch and run in CONCURRENT mode.
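The divide-and-rule approach can be sketched in code. This is a minimal simulation, not Informatica's API: run_session and the session names are hypothetical stand-ins for simple one-source/one-target sessions, each holding its own connection, run together like a batch in CONCURRENT mode.

```python
# Sketch (not Informatica's API): each simple one-source/one-target mapping
# runs as its own session, in parallel, with its own database connection.
from concurrent.futures import ThreadPoolExecutor

def run_session(session_name: str) -> str:
    # In the real tool this would open a dedicated connection and execute
    # the session's single source-to-target load.
    return f"{session_name}: loaded"

# Hypothetical session names for three related one-target mappings.
sessions = ["s_load_dealers", "s_load_activities", "s_load_batches"]

# Running all sessions in parallel mirrors a batch in CONCURRENT mode.
with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
    results = list(pool.map(run_session, sessions))

print(results)
```

Because each session is independent, a slow transformation in one no longer degrades the others.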

2. When source data is Flat File


2.1 Background

What is a Flat File?

A flat file is one in which table data is stored as lines of ASCII text, with the value from each table cell separated by a delimiter or space and each row represented by a new line. Below is the sample flat file which was used during the project.

Fig 2.1: In_Daily - Flat File.

2.2 Problem Scenario

When the above flat file was loaded into Informatica, the Source Analyzer appeared as shown below.

Fig 2.2: In_Daily - Flat File after loading into Informatica.

Two issues were encountered while loading the flat file shown above:

1. Data types of the fields from the flat file did not match the respective fields of the target tables. For example, in Fig 2.1 the fourth field of the first row (the record corresponding to BH) has data type Date, while the fourth field of the third row (the record corresponding to CR) is Char; in the target table the single corresponding field had data type Char.

2. Sizes of the fields from the flat file did not match the respective fields of the target tables. For example, in Fig 2.1 the fifth field of the eighth row (the record corresponding to QR) has field size 100, but after loading, the Source Analyzer showed the size of the field as 45 (as shown in Fig 2.2); likewise the fifth field corresponding to CR has size 5, while the corresponding field in the target table had size 100.

2.3 Solution

The following is the solution we incorporated to solve the above problems:

1. Since the data was so heterogeneous, we decided to keep all the data types in the Source Qualifier as String and converted them as per the fields to which they were being mapped.

2. Regarding the size of the fields, we changed the size to the maximum possible size, as in the example mentioned above.
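The read-everything-as-String approach can be sketched as follows. This is an illustrative Python sketch, not the project's mapping: the record layout, delimiter, and date format are assumptions. The point is that every field arrives as a String and is validated or converted only where the target field demands it.

```python
# Sketch of the workaround: read every field from the delimited flat file as
# a string (as in the Source Qualifier), then convert each value according to
# the target field it maps to. Field positions and formats are illustrative.
from datetime import datetime

def to_target_types(record: str, delimiter: str = ","):
    fields = record.split(delimiter)   # every field starts life as a String
    converted = list(fields)
    # Cast only where the target column demands it, e.g. field 4 of a BH
    # record is a date in the source but maps to a Char target column.
    if fields[0] == "BH":
        # Validate that it parses as a date, then keep the Char form.
        datetime.strptime(fields[3], "%m/%d/%Y")
        converted[3] = fields[3]
    return converted

row = to_target_types("BH,001,HDR,07/01/2003")
print(row)
```

Keeping the Source Qualifier ports as String defers all type decisions to the point where the mapping already knows which target field the value feeds.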

3 Extracting data from the flat file containing nested record sets.
3.1 Background:

The flat file shown in the previous section (Fig 2.2) contains nested record sets. To explain the nested formation, the records of the file are restructured in Fig 3.1.

Fig 3.1: In_Daily - Flat File restructured in the nested form (Level 1, Level 2, and Level 3 marked).

Here the data is in three levels. The first level contains the batch file information, starting with a BH record and ending with a BT record. The second level contains the dealer records within the batch file, starting with a DH record and ending with a DT record. The third level contains the information on the different activities for a particular dealer.
3.2 Problem Scenario:

The data required for loading was such that a single row should consist of the dealer details as well as the different activities done by that dealer. But only the second-level data (i.e. the 2nd and 14th rows in the flat file shown above) contains the dealer details, while the third-level data contains the activity details for each dealer. Both had to be concatenated to form a single piece of information to load into a single row of the target table.
3.3 Solution:

In this particular kind of scenario, the dealer information (second-level data) should be stored into variables by putting a condition that identifies the dealer-information row. That row should then be filtered out in the next transformation. So, for that particular row of the flat file (i.e. dealer information), the data is stored in the variables. For each dealer-activity row (third-level data), the row is passed to the next transformation together with the dealer information that was stored in the variables during the previous row's load.
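The variable-port technique can be sketched as follows. This is an illustrative Python sketch: the record tags (BH/DH/AC/DT/BT) and field layouts are simplified assumptions, but the control flow mirrors the mapping, with level-2 rows captured into variables and filtered out, and each level-3 row emitted concatenated with the stored dealer fields.

```python
# Sketch of the variable-port technique: dealer-header (level 2) fields are
# held in "variables" while the row itself is filtered out; each activity
# (level 3) row is emitted together with the stored dealer fields.
def flatten(lines):
    dealer = None          # plays the role of the mapping's variable ports
    out = []
    for line in lines:
        fields = line.split(",")
        tag = fields[0]
        if tag == "DH":
            dealer = fields[1:]              # store dealer info; filter row
        elif tag == "AC" and dealer is not None:
            out.append(dealer + fields[1:])  # dealer info + activity info
        # BH/BT/DT rows carry no target data in this sketch and are dropped
    return out

rows = flatten([
    "BH,batch1",
    "DH,D001,Smith Motors",
    "AC,oil_change,2",
    "AC,tire_rotation,1",
    "DT,D001",
    "BT,batch1",
])
print(rows)
```

Each emitted row now carries both the dealer detail and one activity, matching the single-row requirement of the target table.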

4. Too Large Lookup Tables:


4.1 Background:

What is a Lookup Transformation?

A Lookup transformation is used in a mapping to look up data in a relational table, view, or synonym (see Fig 4.1). A lookup definition can be imported from any relational database to which both the Informatica Client and Server can connect, and multiple Lookup transformations can be used in a mapping. The Informatica Server queries the lookup table based on the lookup ports in the transformation (see Fig 4.2): it compares Lookup transformation port values to lookup table column values based on the lookup condition, and the result of the lookup is passed on to other transformations and the target.

You can use the Lookup transformation to perform many tasks, including:

- Get a related value. For example, if your source table includes an employee ID but you want to include the employee name in your target table to make your summary data easier to read.
- Perform a calculation. Many normalized tables include values used in a calculation, such as gross sales per invoice or sales tax, but not the calculated value (such as net sales).
- Update slowly changing dimension tables. You can use a Lookup transformation to determine whether records already exist in the target.

(The actual screens are attached for reference.)

Fig 4.1: LOOKUP is a kind of Transformation.


Fig 4.2: The Lookup Conditions to be specified in order to get Lookup Values.

4.2 Problem Scenario:

In the project, one of the mappings had large lookup tables that were hampering its performance:

a. They were consuming a lot of cache memory unnecessarily, and
b. More time was spent searching for a relatively small number of values in a large lookup table.

Thus the loading of data from the source table(s) to the target table(s) was consuming more time than it normally should.

4.3 Our Solution:

We eliminated the first problem by simply using the lookup table as one of the source tables itself. Source and target tables are not cached in Informatica, and hence it made sense to use the large lookup table as a source (see Fig 4.3). This also ensured that cache memory would not be wasted unnecessarily and could be used for other tasks.


Fig 4.3: The Mapping showing the use of Lookup table as a Source table.


Fig 4.4: The use of the join condition in the Source Qualifier.

After using the lookup table as a source, we specified a user-defined join condition in the Source Qualifier. This reduced the search time taken by Informatica, as the number of rows to be searched was drastically reduced: the join condition eliminates the excess rows that would otherwise have been present in the Lookup transformation. Thus the second problem was also successfully eliminated.

5 Complex logic for Sequence Generation:


5.1 Background:

What is a Sequence Generator?

A Sequence Generator is a transformation that generates a sequence of numbers once you specify a starting value (see Fig 5.2) and the increment by which to advance that value. (The actual screens are attached for reference.)

Fig 5.1: The Sequence Generator is a kind of Transformation.

Fig 5.2: The Transformation details to be filled in order to generate a sequence.

5.2 Problem Scenario:

In the project, one of the mappings had two requirements:

a. During the transfer of data to a column of a target table, the Sequence Generator was required to trigger only selectively. But by its nature, the Sequence Generator is triggered every time a row gets loaded into the target table.

b. The numbers generated by the Sequence Generator were required to be in unbroken order. The values to be loaded into the column of the target table were either sequence-generated or obtained from a lookup table. Whenever the lookup condition returned a value, that value would populate the target table, but at the same time the Sequence Generator would also trigger, incrementing its CURRVAL (current value, see Fig 5.1) by 1. So when the next sequence-generated value was loaded into the column, the difference between consecutive sequence-generated values would be 2 instead of 1. Thus the generated sequence would not be continuous: there would be gaps or holes in it.

5.3 Our Solution:

A basic rule for the Sequence Generator is that it gets triggered whenever a row is loaded into its target table. In order to prevent the Sequence Generator from triggering unnecessarily, we created two instances of the same target table (see Fig 5.3).


Fig 5.3: The Mapping showing two instances of the same Target table.

The Sequence Generator was mapped to the column in the first instance of the target table, whereas the value returned from the lookup table (if any) was mapped to the same column in the second instance (see Fig 5.3). All the other values for the remaining columns were routed on the basis of the value returned from the lookup table: if the lookup returned a value, a row was populated in the second instance of the target table and the Sequence Generator was not triggered; if the lookup returned a null value, a row was populated in the first instance, in which case the Sequence Generator triggered and its value was loaded into the column of the target table.

Thus, by gaining control over the triggering of the Sequence Generator, we avoided the holes or gaps in the generated sequence.
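The two-instance routing can be sketched as follows. This is an illustrative Python sketch, not Informatica code: itertools.count stands in for the Sequence Generator, and its NEXTVAL only advances when a row is actually routed to the first target instance, which is what keeps the generated sequence gap-free.

```python
# Sketch of the two-instance routing: a row only "triggers" the sequence
# generator when the lookup returns no value, so the generated sequence
# stays continuous. Row keys and lookup values are illustrative.
import itertools

def assign_keys(rows, lookup):
    seq = itertools.count(start=1)   # NEXTVAL advances only when consumed
    first_instance, second_instance = [], []
    for row in rows:
        looked_up = lookup.get(row)
        if looked_up is not None:
            # Lookup hit -> second target instance; sequence NOT triggered.
            second_instance.append((row, looked_up))
        else:
            # Lookup miss -> first target instance; sequence triggered.
            first_instance.append((row, next(seq)))
    return first_instance, second_instance

gen, looked = assign_keys(["a", "b", "c", "d"], {"b": 99})
print(gen, looked)
```

Even though row "b" takes its value from the lookup, the sequence-generated keys remain 1, 2, 3 with no hole, which a single target instance could not guarantee.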