
IIR mini-bytes

Measuring Performance using Relate

Mike Pataky
Senior Product Specialist

Background
ASCAP raised an SR
The throughput for a custom population, run from a batch file
developed by Informatica, is too slow.
They match song titles by running a batch file that uses Relate on each title;
throughput is extremely slow: 10,000 rows take about 2 hours.

ASCAP have been using our products for a long time to match song titles
They use a custom pop
Heavily customized, not based on any standard pop

NOTE: Abhi has raised an Internal KB 124458 on this subject:
HOW TO: Create a histogram of search txn durations when using Relate?

Plan
Use Relate with the -s and -ss parameters to get
-s: Histogram of search times
-ss: Individual times for each search (includes -s)

Look for the distribution of search times and for single long-running searches
Investigate the actual data used in long-running searches

The Output Histogram (-s)


relate> Search Duration Histogram
relate>  10.000 ms   111   11.10%
relate>  20.000 ms    53   16.40%
relate>  30.000 ms    51   21.50%
relate>  40.000 ms    35   25.00%
relate>  50.000 ms    31   28.10%
relate>  60.000 ms    29   31.00%
relate>  70.000 ms    29   33.90%
relate>  80.000 ms    22   36.10%
relate>  90.000 ms    18   37.90%
relate> 100.000 ms    14   39.30%
relate> 110.000 ms     9   40.20%
relate> 120.000 ms    13   41.50%
relate> 130.000 ms     9   42.40%
relate> 140.000 ms     7   43.10%
relate> 150.000 ms     9   44.00%
relate> 160.000 ms    12   45.20%
relate> 170.000 ms    13   46.50%
relate> 180.000 ms    11   47.60%
relate> 190.000 ms    12   48.80%
relate> 200.000 ms     9   49.70%
relate> 210.000 ms    12   50.90%
relate> 220.000 ms    10   51.90%
relate> 230.000 ms    20   53.90%
relate> 240.000 ms    13   55.20%
relate> 250.000 ms    15   56.70%
relate> 260.000 ms    11   57.80%
relate> 270.000 ms     8   58.60%
relate> 280.000 ms     7   59.30%
relate> 290.000 ms     3   59.60%
relate> 300.000 ms     3   59.90%
relate> 310.000 ms     2   60.10%
relate> 320.000 ms     6   60.70%
relate> 330.000 ms     5   61.20%
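
One way to read this histogram is to find the bin where the cumulative percentage first reaches a target such as 50%. The sketch below is a hypothetical helper, not part of Relate. It assumes each histogram row sits on one line in the form `relate> <bin> ms <count> <cumulative %>`; the names `parse_histogram` and `first_bin_reaching` are made up for illustration.

```python
import re

# Assumed row layout: "relate> 10.000 ms 111 11.10%"
ROW = re.compile(r"relate>\s+([\d.]+)\s+ms\s+(\d+)\s+([\d.]+)%")

def parse_histogram(text):
    """Return (bin_ms, count, cumulative_pct) tuples for every matching row."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            rows.append((float(m.group(1)), int(m.group(2)), float(m.group(3))))
    return rows

def first_bin_reaching(rows, pct):
    """Return the first bin (ms) whose cumulative percentage is >= pct, or None."""
    for bin_ms, _count, cum in rows:
        if cum >= pct:
            return bin_ms
    return None

# Four rows copied from the histogram above.
sample = """\
relate> Search Duration Histogram
relate> 190.000 ms 12 48.80%
relate> 200.000 ms 9 49.70%
relate> 210.000 ms 12 50.90%
relate> 220.000 ms 10 51.90%
"""

rows = parse_histogram(sample)
print(first_bin_reaching(rows, 50.0))
```

On these rows the 50% mark falls in the 210 ms bin.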

The Output Histogram (-s)


relate>  9910.000 ms   0   94.60%
relate>  9920.000 ms   0   94.60%
relate>  9930.000 ms   0   94.60%
relate>  9940.000 ms   0   94.60%
relate>  9950.000 ms   0   94.60%
relate>  9960.000 ms   0   94.60%
relate>  9970.000 ms   0   94.60%
relate>  9980.000 ms   0   94.60%
relate>  9990.000 ms   0   94.60%
relate> 10000.000 ms   0   94.60%
relate> 10010.000 ms   0   94.60%
relate> 10690.000 ms   1   94.70%
relate> 10700.000 ms   1   94.80%
..
.
relate> 16470.000 ms   1   98.80%
relate> 17630.000 ms   1   98.90%
relate> 21480.000 ms   2   99.10%
relate> 21500.000 ms   1   99.20%
relate> 21510.000 ms   1   99.30%
relate> 21530.000 ms   1   99.40%
relate> 21580.000 ms   1   99.50%
relate> 21590.000 ms   1   99.60%
relate> 21600.000 ms   1   99.70%
relate> 21630.000 ms   1   99.80%
relate> 21820.000 ms   1   99.90%
relate> 21860.000 ms   1  100.00%   << The End

The Output Histogram (-s)


relate> Search Call Histogram
relate>  10.000 ms   146   14.60%
relate>  20.000 ms   122   26.80%
relate>  30.000 ms    87   35.50%
relate>  40.000 ms    56   41.10%
relate>  50.000 ms    50   46.10%
relate>  60.000 ms    55   51.60%
relate>  70.000 ms    44   56.00%
relate>  80.000 ms    30   59.00%
relate>  90.000 ms    37   62.70%
relate> 100.000 ms    29   65.60%
relate> 110.000 ms    21   67.70%
relate> 120.000 ms    22   69.90%

Not sure of the difference between the Search Duration Histogram and the Search Call Histogram
Probably one is the whole search (including read and write); the other is just the actual IIR search call

The Output Histogram (-s)


At end of second Histogram
relate> thread 0: 167 searches, Total elapsed time: 228.609000 s, Average time (excluding startup): 1368.449 ms, Total call time: 101.241000 s, Average call time: 605.862 ms
relate> thread 1: 167 searches, Total elapsed time: 228.651000 s, Average time (excluding startup): 1368.593 ms, Total call time: 44.203000 s, Average call time: 264.120 ms
relate> thread 2: 167 searches, Total elapsed time: 228.667000 s, Average time (excluding startup): 1368.491 ms, Total call time: 46.533000 s, Average call time: 277.952 ms
relate> thread 3: 167 searches, Total elapsed time: 228.104000 s, Average time (excluding startup): 1364.132 ms, Total call time: 80.326000 s, Average call time: 479.269 ms
relate> thread 4: 166 searches, Total elapsed time: 228.538000 s, Average time (excluding startup): 1373.675 ms, Total call time: 52.920000 s, Average call time: 315.741 ms
relate> thread 5: 166 searches, Total elapsed time: 228.155000 s, Average time (excluding startup): 1370.970 ms, Total call time: 76.140000 s, Average call time: 456.313 ms
relate> Total elapsed time: 230.454000 s in_queue length=100 out_queue length=100 Average elapsed time: 1369.051541 ms Average call time: 399.876302 ms

Notice there are 6 threads already
Either explicitly (-n6) or automatically (based on number of CPUs)
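
The per-thread figures can be cross-checked against the totals line with a few lines of arithmetic. The sketch below copies the numbers from the thread summary above. It suggests that the reported averages are plain means of the six per-thread averages; that is an observation from these numbers only, not a documented definition.

```python
# Per-thread figures copied from the thread summary lines:
# (searches, average elapsed ms, average call ms)
threads = [
    (167, 1368.449, 605.862),
    (167, 1368.593, 264.120),
    (167, 1368.491, 277.952),
    (167, 1364.132, 479.269),
    (166, 1373.675, 315.741),
    (166, 1370.970, 456.313),
]

total_searches = sum(n for n, _, _ in threads)
mean_elapsed = sum(e for _, e, _ in threads) / len(threads)
mean_call = sum(c for _, _, c in threads) / len(threads)

# Share of each search's elapsed time that is spent inside the call itself.
call_share = mean_call / mean_elapsed

print(total_searches)  # 1000
print(round(mean_elapsed, 3), "ms average elapsed")
print(round(mean_call, 3), "ms average call")
print(f"{call_share:.0%} of the elapsed time per search is inside the call")
```

The call accounts for roughly 29% of the average elapsed time per search, so most of the time is spent outside the call.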

The Output Histogram (-s)


relate> Result Set Histogram
relate>   20 recs   567   56.70%
relate>   40 recs    75   64.20%
relate>   60 recs    42   68.40%
relate>   80 recs    34   71.80%
relate>  100 recs    26   74.40%
relate>  120 recs    18   76.20%
relate>  140 recs    21   78.30%
relate>  160 recs    31   81.40%
..
relate> 2020 recs     0   99.60%
relate> 2040 recs     0   99.60%
relate> 2060 recs     0   99.60%
relate> 2080 recs     4  100.00%
relate> Reads 1000
relate> Writes 100789
relate> End-Time: 2012/02/01 08:42:18
relate> Process-Time: 0:03:51.000

Some VERY large result sets


Maybe single word or common word?

Reads = 1000, Time = 3m51s


So why do 10,000 records take 2 hours?
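
The gap behind that question can be put in numbers. The sketch below uses only the figures from the Relate summary and the customer's report, and assumes (this is an assumption, not a measurement) that run time scales linearly with the number of rows.

```python
reads = 1000                    # "Reads" from the Relate summary
writes = 100789                 # "Writes" from the Relate summary
process_seconds = 3 * 60 + 51   # Process-Time 0:03:51

per_search = process_seconds / reads   # seconds per search in the 1,000-row run
projected_10k = per_search * 10_000    # seconds for 10,000 rows at the same rate
reported_10k = 2 * 60 * 60             # "about 2 hours" reported for 10,000 rows

print(f"{per_search * 1000:.0f} ms per search")
print(f"{projected_10k / 60:.1f} min projected for 10,000 rows")
print(f"{reported_10k / projected_10k:.1f}x slower than projected")
print(f"{writes / reads:.1f} records written per search on average")
```

At the rate of the 1,000-row run, 10,000 rows would be expected to take well under an hour, so the reported 2 hours is not explained by volume alone.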


The Output All searches (-ss)


relate> Start-Time: 2012/02/01 08:38:27
relate> Version relate SSACE RS64NVE Oct 18 2011 05:28:05 9.2.0.000 510712
relate> input /u01/app/product/ssa/matching/data/frn_inp2.dat
relate> output /u01/app/product/ssa/matching/data/frn_ttl_out.dat
relate> Txn  1   78.000 ms   62.000 ms
relate> Txn  2   96.000 ms   95.000 ms
relate> Txn  3  129.000 ms  115.000 ms
relate> Txn  4  294.000 ms  288.000 ms
relate> Txn  8  253.000 ms  252.000 ms
relate> Txn 14    4.000 ms    4.000 ms
relate> Txn 20   37.000 ms   35.000 ms
relate> Txn 26    2.000 ms    2.000 ms
relate> Txn 32   37.000 ms   18.000 ms
relate> Txn  5  508.000 ms  507.000 ms
relate> Txn 11    4.000 ms    4.000 ms
relate> Txn 17    5.000 ms    4.000 ms
relate> Txn 23   36.000 ms   34.000 ms
relate> Txn 10  268.000 ms  265.000 ms
relate> Txn 16    4.000 ms    3.000 ms
relate> Txn  6  574.000 ms  392.000 ms
relate> Txn 12    5.000 ms    4.000 ms
relate> Txn 18    4.000 ms    4.000 ms
relate> Txn  7  509.000 ms  410.000 ms
relate> Txn 13    6.000 ms    5.000 ms
relate> Txn 22   35.000 ms   35.000 ms
relate> Txn  9  480.000 ms  290.000 ms
relate> Txn 28    8.000 ms    4.000 ms

Two times mean ???
First time: Search? Second time: Call ??


To investigate individual search times


1. Copy log to a text file
2. Import data into Excel
   Delimited. Specify space separator
3. Remove text columns
   Now have 3 columns: Searchnum, Time1, Time2
4. Sort by Time1

Search #   Time1(ms)   Time2(ms)
..
474        21590       21562
497        21623       5
957        21816       83
961        21858       4156
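
The same steps can be scripted instead of going through Excel. The sketch below is a hypothetical helper, not part of Relate. It assumes each transaction appears in the log as `Txn <number> <time1> ms <time2> ms`, as in the -ss output shown earlier; the function name `slowest_searches` is made up for illustration.

```python
import re

# Assumed layout: "relate> Txn 474 21590.000 ms 21562.000 ms".
# \s+ also matches line breaks, so entries wrapped over two lines still parse.
TXN = re.compile(r"Txn\s+(\d+)\s+([\d.]+)\s+ms\s+([\d.]+)\s+ms")

def slowest_searches(log_text, top=10):
    """Return the `top` (search_num, time1_ms, time2_ms) rows, slowest Time1 first."""
    rows = [
        (int(m.group(1)), float(m.group(2)), float(m.group(3)))
        for m in TXN.finditer(log_text)
    ]
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:top]

# A few rows copied from the -ss output.
sample = """\
relate> Txn 1 78.000 ms 62.000 ms
relate> Txn 2 96.000 ms 95.000 ms
relate> Txn 5 508.000 ms 507.000 ms
relate> Txn 14 4.000 ms 4.000 ms
"""

for num, t1, t2 in slowest_searches(sample, top=3):
    print(num, t1, t2)
```

For a real run, read the saved log into a string first and pass it to `slowest_searches`. The search numbers at the top of the result identify the input records to inspect.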


Conclusions / Recommendations
Break the 10K input down into 1K chunks and look for rogue transactions
Increase Output buffer for Relate
Total elapsed time: 230.454 s in_queue length=100 out_queue length=100 Average elapsed time: 1369.051541 ms Average call time: 399.876302 ms
-n6:100:2000
Look to tune threshold scores
Examine results: are they all good results?
Find out exactly what the two figures mean (from R&D)
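
Splitting the input into 1K chunks can be scripted. The sketch below is a hypothetical helper, assuming the input file holds one record per line; it would not suit fixed-length records without line breaks. The function name `split_into_chunks` and the `_partNN` naming of the output files are made up for illustration.

```python
from itertools import islice
from pathlib import Path

def split_into_chunks(input_path, chunk_size=1000):
    """Write consecutive chunk_size-line pieces next to input_path; return their paths."""
    src = Path(input_path)
    out_paths = []
    # Binary mode keeps every record byte-for-byte, whatever its encoding.
    with src.open("rb") as f:
        part = 1
        while True:
            lines = list(islice(f, chunk_size))
            if not lines:
                break
            out = src.with_name(f"{src.stem}_part{part:02d}{src.suffix}")
            out.write_bytes(b"".join(lines))
            out_paths.append(out)
            part += 1
    return out_paths
```

Each chunk can then be run separately, and a chunk that takes far longer than the others points to the rogue transactions.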

Other cases where Relate is useful


Using Relate can help to prove whether an observed performance issue is in IIR or not
Relate is a perfect batch search app.
If the customer's search app is slower..

Running Relate with local input data eliminates/reduces network/database traffic
It can be run with database table input using define_source
Output can be sent to a database table (though not very useful)
If the customer's search app runs slower on their network..
