Performance Concerns
Because a correlated subquery is executed once per row of the parent query, we should eliminate as many rows as possible before the subquery step runs. We also need to check the efficiency of each subquery execution: is the right index available, or can partition pruning apply? For example, if a subquery has to do a full table scan (FTS) on a table with 10K blocks and the parent query returns 10K rows, that is roughly 100M block reads, a huge amount of physical IO. If the purpose is just to fetch a value from another table, consider using a JOIN instead.
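To illustrate the rewrite, here is a minimal sketch using hypothetical versions of the tables discussed later (AI_ACCOUNT and AI_KEYWORD; the KW_COUNT column is invented for illustration). The correlated form executes the subquery once per account row; the JOIN form scans each table once and aggregates.

```sql
-- Correlated form: the scalar subquery runs once per AI_ACCOUNT row.
SELECT a.id,
       (SELECT COUNT(*)
          FROM ai_keyword k
         WHERE k.account_id = a.id) AS kw_count
  FROM ai_account a;

-- JOIN form: one pass over each table, aggregated once.
SELECT a.id, COUNT(k.account_id) AS kw_count
  FROM ai_account a
  LEFT JOIN ai_keyword k
    ON k.account_id = a.id
 GROUP BY a.id;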
This is the view used by the query in question. Direct Marketing wraps most of its business logic inside views, and there are plenty of correlated subqueries here; that is another Direct Marketing pattern.
The Issue
After the 11.2.0.3 upgrade (2012/09/28), this query has consistently topped physical IO usage among all databases scanned by our performance framework. For example, a single run from 22:00 on 09/30 to 08:00 on 10/01 read 339M blocks. The run starting at 22:00 on 09/26 read only 3.168M blocks and also completed within 2 hours. So my interest is in finding where the roughly 100x increase in physical IO comes from. AWR segment statistics show the source of the physical IO is the table SEM.AI_KEYWORD, one of the tables used twice by the correlated subqueries inside the view related to the query. Physical reads on this table for the same period, 09/30 22:00 to 10/01 08:00, are 100M blocks. The mismatch with the 339M blocks from AWR SQL stats is another interesting point worth some research.
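For reference, segment-level physical reads for a snapshot window can be pulled from AWR like this (a sketch; the snapshot ids 100 and 110 are placeholders for the actual window):

```sql
-- Sum AWR physical reads per segment over a snapshot range.
SELECT o.owner, o.object_name,
       SUM(s.physical_reads_delta) AS phys_reads
  FROM dba_hist_seg_stat s
  JOIN dba_hist_seg_stat_obj o
    ON o.dbid     = s.dbid
   AND o.obj#     = s.obj#
   AND o.dataobj# = s.dataobj#
 WHERE s.snap_id BETWEEN 100 AND 110
 GROUP BY o.owner, o.object_name
 ORDER BY phys_reads DESC;
```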
Observations from the execution plan:
1. 6 sets of PX queues, or DFOs (data flow objects): one for the parent query and one for each subquery.
2. 6 PX COORDINATOR operations, so each subquery starts its own PX operation.
3. Plans for subqueries in the SELECT list are usually listed above the main query: here from line ID 1 to 28, while the main query is from line 29 to 32.
4. The final COST is usually misleading, since the cost of the subqueries is not counted in.
5. The values passed from the main query to a subquery are usually listed in the predicate section as bind variables.
6. No plan change after the upgrade.
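The shape that produces one DFO per subquery looks roughly like the sketch below (a simplified, hypothetical version of the view's SELECT list; column names are invented). When the referenced table carries a parallel attribute, each scalar subquery gets its own PX COORDINATOR:

```sql
-- Each scalar subquery in the SELECT list can start its own
-- parallel operation (its own DFO) if ai_keyword is marked parallel.
SELECT a.id,
       (SELECT MAX(k.bid) FROM ai_keyword k WHERE k.account_id = a.id),
       (SELECT COUNT(*)   FROM ai_keyword k WHERE k.account_id = a.id)
  FROM ai_account a;
```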
Table Info
The base table of the parent query, AI_ACCOUNT, has 206 rows, i.e. 206 unique IDs. The subquery condition on AI_KEYWORD is ACCOUNT_ID = AI_ACCOUNT.ID for each row returned from the parent query. AI_KEYWORD is list partitioned by ACCOUNT_ID. The total size is 1,730,120 blocks (outdated, but pretty close; dba_segments reports 1,752,876 blocks). The largest partition has 85,120 blocks. In theory, for each row from AI_ACCOUNT, the two subqueries inside the view that touch AI_KEYWORD will each read a single partition. So this query should need at most 2 × 1,752,876 blocks (about 3.5M) of physical reads, plus some blocks from other tables. That is close to the result before the 11.2.0.3 upgrade (3.168M).
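The per-partition sizes behind this estimate can be checked from the dictionary; the arithmetic in the comment restates the upper bound above:

```sql
-- Partition sizes of the list-partitioned table.
SELECT partition_name, blocks
  FROM dba_tab_partitions
 WHERE table_owner = 'SEM'
   AND table_name  = 'AI_KEYWORD'
 ORDER BY blocks DESC;

-- If each of the two subqueries prunes to exactly one partition per
-- account, the whole run reads each partition at most twice:
-- upper bound = 2 x 1,752,876 blocks, about 3.5M.
```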
Some Research
Using session stats, I looked for any interesting statistics during real-time tracking. From a previous case study, we have seen that when a target table undergoes a full table refresh, consistent reads against UNDO can cause larger-than-expected IO usage. Another such case is chained or migrated rows. However, there are no UNDO or chained/migrated-row statistics in the session stats during real-time tracking here. Using ASH and AWR ASH: the dominant event is direct path read, which usually indicates FTS. AWR ASH also shows that the major wait for the query before the upgrade was db file scattered read, which can usually take advantage of the buffer cache.
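The wait-event breakdown for a statement can be taken from ASH directly (a sketch; the sql_id is a placeholder):

```sql
-- Wait-event profile of one statement from in-memory ASH.
SELECT NVL(event, 'ON CPU') AS event,
       COUNT(*)             AS samples
  FROM v$active_session_history
 WHERE sql_id = 'xxxxxxxxxxxx'
 GROUP BY event
 ORDER BY samples DESC;
```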
The huge PX process count is the sum of the PX slave processes used by all runs of the subqueries.
Is It Reproducible?
Yes and no. When the database was busy and my test failed to acquire enough PX processes, the query actually completed in a time close to the serial run. But when the database was relatively idle and the query could acquire PX processes most of the time, the pattern could be reproduced. Here is part of the SQL report from a test query: after 17 minutes, it had read more than 10M blocks, with only one account id completed.
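A report like the one referenced can be generated while the test is still running (a sketch; the sql_id is a placeholder):

```sql
-- Real-time SQL monitoring report for the test run (11g+).
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => 'xxxxxxxxxxxx',
         type   => 'TEXT') AS report
  FROM dual;
```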
Summary
Be careful about the performance of correlated subqueries. Use the DEFAULT (or any) parallel attribute on tables with caution. This should be an Oracle bug or an unwanted feature; I am just not sure whether it exists in earlier versions. Remove outdated or unwanted scheduled query jobs: the offending query is from an outdated scheduled job that no application actually uses.
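To act on the parallel-attribute advice, tables carrying a non-serial degree can be listed from the dictionary (DEGREE is a padded string, hence the TRIM):

```sql
-- Tables whose parallel attribute is DEFAULT or greater than 1.
SELECT owner, table_name, degree
  FROM dba_tables
 WHERE TRIM(degree) <> '1'
 ORDER BY owner, table_name;

-- Remove the attribute where it is not wanted, e.g.:
-- ALTER TABLE sem.ai_keyword NOPARALLEL;
```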