But what if the underlying table on which the lookup is performed changes its data after the lookup cache is created? Is there a way to keep the cache up-to-date even if the underlying table changes?
A static Lookup Cache to determine if a source record is new or updatable

You don't need a dynamic lookup cache for the above type of scenario.
The problem arising from the above scenario can be resolved by using a dynamic lookup cache. Here are some more examples of when you may consider using a dynamic lookup:
- Updating a master customer table with both new and updated customer information coming together, as shown above.
- Loading data into a slowly changing dimension table and a fact table at the same time. Remember, you typically look up the dimension while loading the fact, so you load the dimension table before loading the fact table. But using a dynamic lookup, you can load both simultaneously.
- Loading data from a file with many duplicate records and eliminating the duplicates in the target by updating the duplicate rows, i.e. keeping only the most recent row or the initial row.
- Loading the same data from multiple sources using a single mapping. Just consider the previous retail business example: if you have more than one shop and Linda has visited two of your shops for the first time, the customer record for Linda will come twice during the same load.
- Inserts the row into the cache: If the incoming row is not in the cache, the Integration Service inserts the row into the cache based on the input ports or a generated Sequence-ID, and flags the row as insert.
- Updates the row in the cache: If the row exists in the cache, the Integration Service updates the row in the cache based on the input ports, and flags the row as update.
- Makes no change to the cache: This happens when the row exists in the cache but the lookup is configured to insert new rows only; or the row is not in the cache and the lookup is configured to update existing rows only; or the row is in the cache but, based on the lookup condition, nothing changes. The Integration Service flags the row as unchanged.
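The three cache actions above can be sketched in Python. This is an illustrative model of the decision logic only, not Informatica code; the function name `process_row` and the dictionary-based cache are assumptions for the sketch.

```python
# Minimal sketch of a dynamic lookup cache decision, keyed on the
# lookup-condition column(s). Returns the action taken on the cache.

def process_row(cache, key, row, insert_new=True, update_existing=True):
    """Return 'insert', 'update', or 'unchanged' for an incoming row."""
    if key not in cache:
        if insert_new:
            cache[key] = dict(row)      # row not in cache -> insert it
            return "insert"
        return "unchanged"              # configured to update existing rows only
    if update_existing and cache[key] != row:
        cache[key].update(row)          # row in cache and values differ -> update
        return "update"
    return "unchanged"                  # row in cache, nothing changes

cache = {}
process_row(cache, "LINDA", {"name": "Linda", "city": "Bishan"})   # 'insert'
process_row(cache, "LINDA", {"name": "Linda", "city": "Jurong"})   # 'update'
process_row(cache, "LINDA", {"name": "Linda", "city": "Jurong"})   # 'unchanged'
```

Note how the second occurrence of the same customer within one load hits the already-updated cache, which is exactly why the duplicate-elimination and multi-source scenarios above work.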
Notice that the Integration Service actually flags the rows based on the above three conditions. And that's a great thing because, if you know the flag, you can reroute the row to achieve different logic. Fortunately, as soon as you create a dynamic lookup, Informatica adds one extra port to the lookup. This new port is called:
NewLookupRow
Using the value of this port, the rows can be routed for insert, for update, or to do nothing. You just need to use a Router or Filter transformation followed by an Update Strategy. Oh, I forgot to tell you the actual values you can expect in the NewLookupRow port:
0 = The Integration Service does not update or insert the row in the cache.
1 = The Integration Service inserts the row into the cache.
2 = The Integration Service updates the row in the cache.
When the Integration Service reads a row, it changes the lookup cache depending on the results of the lookup query and the Lookup transformation properties you define. It assigns the value 0, 1, or 2 to the NewLookupRow port to indicate if it inserts or updates the row in the cache, or makes no change.
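The downstream routing can be sketched as follows. This is an illustrative model, not Informatica code; the function `route_rows` and the sample rows are assumptions, and the comments name the Update Strategy constants (DD_INSERT, DD_UPDATE) you would typically use in each group.

```python
# Sketch of the Router logic downstream of a dynamic lookup:
# NewLookupRow = 1 -> INSERT group, 2 -> UPDATE group, 0 -> dropped.

def route_rows(rows):
    groups = {"insert": [], "update": []}
    for row in rows:
        flag = row["NewLookupRow"]
        if flag == 1:
            groups["insert"].append(row)   # flag DD_INSERT in the Update Strategy
        elif flag == 2:
            groups["update"].append(row)   # flag DD_UPDATE in the Update Strategy
        # flag == 0: no change to the cache, so the row is filtered out
    return groups

rows = [{"id": 1, "NewLookupRow": 1},
        {"id": 2, "NewLookupRow": 2},
        {"id": 3, "NewLookupRow": 0}]
groups = route_rows(rows)
```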
If you check the mapping screenshot, you will see that I have used a Router to reroute the INSERT group and the UPDATE group. The Router screenshot is also given below: new records are routed to the INSERT group and existing records are routed to the UPDATE group. Screen-shot of the Router
- Output old values on update: The Integration Service outputs the value that existed in the cache before it updated the row.
- Output new values on update: The Integration Service outputs the updated value that it writes to the cache. The lookup/output port value matches the input/output port value.
Note: We can configure to output old or new values using the Output Old Value On Update transformation property.
- Insert null values: The Integration Service uses the null values from the source and updates the lookup cache and target table using all values from the source.
- Ignore Null inputs for Update: The Integration Service ignores the null values in the source and updates the lookup cache and target table using only the non-null values from the source.
If we know the source data contains null values and we do not want the Integration Service to update the lookup cache or target with nulls, we need to check the Ignore Null property for the corresponding lookup/output port. When we choose to ignore NULLs, we must verify that we output the same values to the target that the Integration Service writes to the lookup cache. We can configure the mapping, based on the value we want the Integration Service to output from the lookup/output ports when it updates a row in the cache, so that the lookup cache and the target table do not become unsynchronized:
- New values: Connect only lookup/output ports from the Lookup transformation to the target.
- Old values: Add an Expression transformation after the Lookup transformation and before the Filter or Router transformation. Add output ports in the Expression transformation for each port in the target table and create expressions to ensure that we do not output null input values to the target.
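The "Ignore Null inputs for Update" behaviour can be sketched like this. This is an illustrative model, not Informatica code; the function `merge_for_update` and the sample rows are assumptions for the sketch.

```python
# Sketch of ignoring nulls on update: a None (null) source value keeps
# the cached value, so the same merged row can be written to both the
# lookup cache and the target, keeping them in sync.

def merge_for_update(cached_row, source_row):
    """Build the row to write to cache and target, ignoring null inputs."""
    return {col: (source_row[col] if source_row.get(col) is not None
                  else cached_row[col])
            for col in cached_row}

cached = {"cust_id": 7, "name": "Linda", "city": "Bishan"}
source = {"cust_id": 7, "name": None, "city": "Jurong"}
merged = merge_for_update(cached, source)   # name kept from cache, city updated
```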
We can choose the ports we want the Integration Service to ignore when it compares ports. The Designer only enables this property for lookup/output ports when the port is not used in the lookup condition. We can improve performance by ignoring some ports during comparison. We might want to do this when the source data includes a column that indicates whether or not the row contains data we need to update: select the Ignore in Comparison property for all lookup ports except the port that indicates whether to update the row in the cache and target table.
Note: We must configure the Lookup transformation to compare at least one port, or the Integration Service fails the session when we ignore all ports.
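The Ignore in Comparison behaviour can be sketched as follows. Again this is an illustrative model, not Informatica code; `row_changed`, the port names, and the error check are assumptions mirroring the rule that at least one port must be compared.

```python
# Sketch of "Ignore in Comparison": only ports NOT marked ignored take
# part in deciding whether a cached row has changed.

def row_changed(cached_row, source_row, ignore_ports=()):
    compare = [c for c in cached_row if c not in ignore_ports]
    if not compare:
        # Mirrors the rule: ignoring all ports fails the session
        raise ValueError("at least one port must be compared")
    return any(cached_row[c] != source_row[c] for c in compare)

cached = {"cust_id": 7, "city": "Bishan", "load_ts": "2011-01-01"}
source = {"cust_id": 7, "city": "Bishan", "load_ts": "2011-06-30"}

# load_ts differs, but it is ignored, so the row counts as unchanged
# and the cache update (and its cost) is skipped:
changed = row_changed(cached, source, ignore_ports=("load_ts",))
```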
In this article let us take up a very trivial but important aspect that we as DW developers usually face. This is related to loading flat file sources. Whenever we have flat file sources, we usually ask the source systems for a specific type of field delimiter. Now suppose we have requested our source system for a comma-separated flat file which will hold all Employee Information of an organization. Say we ask for a very simple file with five columns: Empno, Ename, Job, Sal, Address. Let us name the file Emp_Src.txt. Below are the sample data in the file:

EMPNO,ENAME,JOB,SAL,ADDRESS
7900,JAMES,CLERK,950,CHOA CHU KANG
8001,SANJAY,ANALYST,33000,BUKIT BATOK
7654,MARTIN,SALESMAN,1375,RUSSEL ST
7566,JONES,MANAGER,1050,YEW TEE
7844,TURNER,SALESMAN,1650,BISHAN
7698,BLAKE,Manager,3740,JURONG
7788,SC,ANALYST,3300,LAKESIDE
7370,SMITH,CLERK,2500,COMMONWEALTH
7402,ADAM,SERVICE,5500,ADAM ST

Very simple, right? Yes, it is. Still, we will once again look into the nitty-gritty of importing a flat file in Informatica. The diagram below specifies the different steps while importing a flat file, in our case a simple comma-separated file. On successful execution of the Simple File to Table map, below is the dataset in the DB table.

Now suppose in the same flat file we have an address which has a comma in its text. Then what happens? Let us execute the same mapping with the data below. The address for EmpNo 7566 now has a comma in its text: his address is now Block 35, Yew Tee.

EMPNO,ENAME,JOB,SAL,ADDRESS
7900,JAMES,CLERK,950,CHOA CHU KANG
8001,SANJAY,ANALYST,33000,BUKIT BATOK
7654,MARTIN,SALESMAN,1375,RUSSEL ST
7566,JONES,MANAGER,1050,BLOCK 35,YEW TEE
7844,TURNER,SALESMAN,1650,BISHAN
7698,BLAKE,Manager,3740,JURONG
7788,SC,ANALYST,3300,LAKESIDE
7370,SMITH,CLERK,2500,COMMONWEALTH
7402,ADAM,SERVICE,5500,ADAM ST

On successful execution, let us check the data in the table.
Oops, the address got truncated for EmpNo 7566. How do we handle such a case? In such scenarios we have to request the source system to enclose any text that contains the delimiter within some other identifier. Suppose in our case we request the source to send us files with such text within double quotes. Now the file would look like:

EMPNO,ENAME,JOB,SAL,ADDRESS
7900,JAMES,CLERK,950,CHOA CHU KANG
8001,SANJAY,ANALYST,33000,BUKIT BATOK
7654,MARTIN,SALESMAN,1375,RUSSEL ST
7566,JONES,MANAGER,1050,"BLOCK 35,YEW TEE"
7844,TURNER,SALESMAN,1650,BISHAN
7698,BLAKE,Manager,3740,JURONG
7788,SC,ANALYST,3300,LAKESIDE
7370,SMITH,CLERK,2500,COMMONWEALTH
7402,ADAM,SERVICE,5500,ADAM ST
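The effect of honouring quotes can be demonstrated outside Informatica with Python's standard csv module, which implements the same convention: a comma inside double quotes is data, not a delimiter. The sample line below is taken from the file above.

```python
import csv
import io

# The problem line: a comma inside the quoted ADDRESS field
line = '7566,JONES,MANAGER,1050,"BLOCK 35,YEW TEE"'

naive = line.split(",")                       # treats every comma as a delimiter
quoted = next(csv.reader(io.StringIO(line)))  # honours the double quotes

print(len(naive))    # 6 fields -- the address is split in two
print(len(quoted))   # 5 fields -- the address survives intact
print(quoted[-1])    # BLOCK 35,YEW TEE
```

This is exactly the behaviour the Optional Quotes setting switches on in the flat-file source definition.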
Now we have to make a small change in the source definition. Double-click the Source Definition of the flat file to edit it and click the Advanced button.
Then check the type of Optional Quotes that our source system has agreed to send. In our case we need to check the Double check box as shown below:
Save and validate the corresponding Mapping and Session, and then re-execute it. Now let us check the result:
We can also make the same change at the very beginning, while importing the source definition. The specification has to be given as the Text qualifier in Step 2, as shown below: