Executive Summary
Introduction
Why Parallel Execution?
  The ultimate goal: scalability
  Shared everything – the Oracle advantage
Fundamental Concepts of Oracle's Parallel Execution
  Processing parallel SQL statements
    Query Coordinator (QC) and parallel servers
    Producer/consumer model
    Granules
    Data redistribution
  Enabling parallel execution in Oracle
Controlling SQL Parallel Execution in Oracle
  Understand your target workload
  Controlling the degree of parallelism
  Controlling the usage of parallelism
Oracle SQL Parallel Execution best practices
  Start with a balanced system
    Calibrate your configuration
    Stripe And Mirror Everything (S.A.M.E.) – use ASM
  Set database initialization parameters for good performance
    Memory allocation
    Controlling parallel servers
    Enabling efficient I/O throughput
  Use parallel execution with common sense
    Don't enable parallelism for small objects
    Use parallelism to achieve your goals, not to exceed them
    Avoid using hints
  Combine parallel execution with Oracle Partitioning
  Ensure statistics are good enough
  Monitor parallel execution activity
  Whether or not to use parallel execution in RAC
  Use Database Resource Manager
  Don't try to solve hardware deficiencies with other features
  Don't ignore other features
Monitoring SQL Parallel Execution
EXECUTIVE SUMMARY
Parallel execution is one of the fundamental database technologies that enable
organizations to manage and access tens, if not hundreds, of terabytes of
data. Without parallelism, these large databases, commonly used for data
warehouses but increasingly found in operational systems as well, would not
exist.
Parallel execution is the ability to apply multiple CPU and I/O resources to the
execution of a single database operation. While every major database vendor
today provides parallel capabilities, there remain key differences in the
architectures provided by the various vendors.
SQL parallel execution was first introduced in Oracle more than a decade ago1
and has been enriched and improved since. This paper discusses the parallel
execution architecture of Oracle Database 11g and shows its superiority over
alternative architectures for real-world applications. This paper also touches on
how to control and monitor parallel execution; lastly, it gives insight into
upgrade considerations when migrating from earlier versions of Oracle.
While the focus of this paper is on Oracle Database 11g, the fundamental
concepts are also applicable to earlier versions of Oracle.
INTRODUCTION
Databases today, irrespective of whether they are data warehouses, operational
data stores, or OLTP systems, contain a wealth of information. However,
finding and presenting the right information in a timely fashion can be a
challenge because of the vast quantity of data involved.
Parallel execution is the capability that addresses this challenge. Using
parallelism, terabytes of data can be processed in minutes or even less, not hours
or days. Parallel execution uses multiple processes to accomplish a single task –
to complete a SQL statement in the case of SQL parallel execution. The more
effectively the database software can leverage all hardware resources – multiple
cores, multiple I/O channels, or even multiple nodes in a cluster - the more
efficiently queries and other database operations will be processed.
1 Parallel execution was first introduced in Oracle Version 7.3 in 1996
Figure: absolute processing time (vertical axis) versus resources in units of x (1x through 10x)
The graph does not look linear to you, right? Look again: it shows the absolute
processing time, not a relative speedup factor. For example, using 2x the
resources reduces the processing time from 360 to 180, and going from 2x to 4x
reduces it further to 90; both are cases of linear scalability. It's just that the
absolute performance gain of each additional doubling of resources becomes
smaller and smaller.
In a shared nothing system CPU cores are solely responsible for individual data
sets, and the only way to access a specific piece of data is to use the CPU
core that owns this subset of data2; such systems are also commonly known
as massively parallel processing (MPP) systems.
2 Some implementations use a small, static number of cores as the smallest unit; for the sake
of simplicity we will discuss them as one core, as the architectural trade-offs are identical
----------------------------------------
| Id | Operation | Name |
----------------------------------------
| 0 | SELECT STATEMENT | |
|* 1 | HASH JOIN | |
| 2 | TABLE ACCESS FULL| CUSTOMERS |
| 3 | TABLE ACCESS FULL| SALES |
----------------------------------------
Figure 4: customer purchase information, serial plan
----------------------------------------------------------------------------------------
| Id | Operation | Name | TQ |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | |
| 1 | PX COORDINATOR | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10001 | Q1,01 | P->S | QC (RAND) |
| 3 | HASH JOIN | | Q1,01 | PCWP | |
| 4 | PX RECEIVE | | Q1,01 | PCWP | |
| 5 | PX SEND BROADCAST | :TQ10000 | Q1,00 | P->P | BROADCAST |
| 6 | PX BLOCK ITERATOR | | Q1,00 | PCWC | |
| 7 | TABLE ACCESS FULL | CUSTOMERS | Q1,00 | PCWP | |
| 8 | PX BLOCK ITERATOR | | Q1,01 | PCWC | |
| 9 | TABLE ACCESS FULL | SALES | Q1,01 | PCWP | |
-----------------------------------------------------------------------------------------
...
Figure 7: parallel server processes seen on the OS level using 'ps -ef'
3 Parallel plans will look different in versions prior to Oracle Database 10g.
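A parallel plan like the one above can be generated for any suitable query. As a sketch (the DOP of 4 and the use of per-table parallel hints are illustrative choices, not taken from the original example), you can request parallelism and display the plan with DBMS_XPLAN:

explain plan for
select /*+ parallel(c 4) parallel(s 4) */ *
from customers c, sales s
where s.customer_id = c.id ;

select * from table(dbms_xplan.display) ;

The 'IN-OUT' column of the displayed plan then shows how data flows between parallel servers and the QC (for example P->P for parallel to parallel, P->S for parallel to serial).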
Producer/consumer model
Continuing with our car counting example, imagine the job is to count the total
number of cars per car color you see. Well, if you and your friend are going to
cover one side of the road each, each one of you potentially sees the same colors
and gets a subtotal for the colors, but not the complete result for the street. You
could go ahead, memorize all this information and tell it back to the third person
(the “person in charge”), but this poor individual then has to sum up all of the
results by himself – what if all cars in the street had a different color? The third
person would redo exactly the same work as you and your friend. To parallelize
the counting per color, you instead ask two additional friends for help4: you and
your friend pass on the color of every car you see to the friend responsible for
that color, and each of these two “car color counters” keeps the running totals
for the colors he is in charge of.
Figure 9: slave set 1 “produces” rows from table CUSTOMERS; slave set 2 “consumes” these records and joins them with table SALES
Operations (rowsources) that are processed by the same set of parallel servers
can be identified in an execution plan by looking in the 'TQ' column. As
shown in Figure 9, the first slave set (Q1,00) is reading table CUSTOMERS in
parallel and producing rows that are sent to the second slave set (Q1,01), which
consumes these records and joins them with table SALES. Whenever data is
distributed from producers to consumers, a corresponding pair of PX SEND and
PX RECEIVE operations appears in the execution plan.
4 Note that the number of additional friends is not related to the number of distinct car
colors, but matches exactly the number of people that are counting cars. We want to use
our additional friends in the most efficient manner and - assuming that all “car
scanners” have equally distributed incremental results on a continuous basis -
having as many “car color counters” as “car scanners” keeps them continuously
busy as well; using three friends instead of two to count the car colors would
leave all three of them without work for about 30% of their time (on average).
Granules
A granule is the smallest unit of work when accessing data. Oracle Database
uses a shared everything architecture, which from a storage perspective means
that any CPU core in a configuration can access any piece of data; this is the
most fundamental architectural difference between Oracle and all other major
database products on the market. Unlike all other systems, Oracle can, and will,
choose this smallest unit of work based solely on a query's requirements.
The basic mechanism the Oracle Database uses to distribute work for parallel
execution is block ranges on disk – so-called block-based granules. This
methodology is unique to Oracle and is not dependent on whether the
underlying objects have been partitioned. Access to the underlying objects is
divided into a large number of granules, which are given out to parallel servers to
work on (when a parallel server finishes the work for one granule, the next
one is given out). The number of granules is always much higher than the
requested DOP, so that work can be distributed evenly among the parallel servers.
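Block-based granules are visible in a parallel execution plan as the PX BLOCK ITERATOR rowsource shown earlier. As a simple illustration (the DOP of 8 and the hint are arbitrary choices for this sketch), even a parallel scan of a completely unpartitioned table is broken into block-range granules:

select /*+ parallel(s 8) */ count(*)
from sales s ;

In the plan for this statement a PX BLOCK ITERATOR appears above the TABLE ACCESS FULL of SALES, even though SALES is not partitioned.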
Data redistribution
Parallel operations – except for the most basic ones – typically require data
redistribution. Data redistribution is required in order to perform operations such
as parallel sorts, aggregations and joins. At the block-granule level the database
has no knowledge of the actual data content of an individual granule. Data
has to be redistributed as soon as a subsequent operation relies on the actual
content. Remember the last car example? The car color mattered, but you don't
know – or even control – what color car is parked where on the street. You
redistributed the information about the number of cars per color to the additional
two friends based on their color responsibility, enabling them to do the total
counting for the colors they're in charge of.
Data redistribution takes place between individual parallel servers either within
a single machine, or, in the case of parallel execution across multiple machines
in a Real Application Clusters (RAC) database, between parallel servers on
multiple machines. Of course in the latter case interconnect communication is
used for the data redistribution while shared-memory is used for the former.
Data redistribution is not unique to the Oracle Database. In fact, this is one of
the most fundamental principles of parallel processing, being used by every
product that provides parallel capabilities. The fundamental difference and
advantage of Oracle's capabilities, however, is that parallel data access
(discussed in the granules section earlier) and therefore the necessary data
redistribution are not constrained by a predetermined, static distribution of the
data.
Serial join
In a serial join a single session reads both tables and performs the join. In this
example we assume two large tables CUSTOMERS and SALES are involved in
the join.
The database uses full table scans to access both tables. For a serial join the
single serial session (red arrows) can perform the full join because all matching
values from the CUSTOMERS table are read by one process. Figure 11 depicts
the serial join5.
5 Please note that the figures in this section represent logical diagrams to explain data
redistribution. In an actual database environment data would typically be striped across
multiple physical disks, accessible to any parallel server. This complexity has deliberately
been left out from the images.
Processing the same simple join in parallel, a redistribution of rows will become
necessary. Parallel servers scan parts of either table based on block ranges and in
order to complete the join, rows have to be distributed between parallel servers.
Figure 12 depicts the data redistribution for a parallel join at a DOP 2,
represented by the green and red arrow respectively. Both tables are read in
parallel by both the red and green process (using block-range granules) and then
each parallel server has to redistribute its result set based on the join key to the
subsequent parallel join operator.
There are many data redistribution methods. The following are among the most
common ones:
– HASH: Hash redistribution is very common in parallel execution in order to
achieve an equal distribution of work among individual parallel servers based
on a hash distribution. Hash (re)distribution is the basic parallel execution
enabling mechanism for most data warehouse database systems, most
notably MPP systems.
– BROADCAST: Broadcast redistribution happens when one of the two
result sets in a join operation is much smaller than the other result set.
Instead of redistributing rows from both result sets, the database sends the
smaller result set to all parallel servers in order to guarantee that the individual
servers are able to complete their join operation. The small result set may be
produced in serial or in parallel.
– RANGE: Range redistribution is generally used for parallel sort operations.
Individual parallel servers work on data ranges so that the QC does not have
to do any sorting but only to present the individual parallel server results in
the correct order.
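The optimizer normally chooses the redistribution method automatically based on the sizes of the row sources. Purely for experimentation, a specific method can be requested with the PQ_DISTRIBUTE hint; the following is a hedged sketch, not a recommendation (this paper advises against hints in production code):

select /*+ parallel(c 8) parallel(s 8) pq_distribute(s hash hash) */
       c.state_province, sum(s.amount)
from customers c, sales s
where s.customer_id = c.id
group by c.state_province ;

Here 'hash hash' requests a HASH redistribution of both join inputs on the join key; 'broadcast none' would instead request that the other (outer) row source be broadcast to all parallel servers.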
Figure 13: Data redistribution for a simple parallel join using a HASH
redistribution.
Data redistribution is shown in the SQL execution plan in the 'PQ Distrib'
column. The execution plan for the simple parallel join illustrated in Figure 13
shows a HASH redistribution in this column.
If at least one of the tables accessed in the join has been partitioned on the join
key the database may decide to use a partition-wise join. If both tables are equi-
partitioned on the join key the database may use a full partition-wise join.
Otherwise a partial partition-wise join may be used in which one of the tables is
dynamically partitioned in memory followed by a full partition-wise join.
A partition-wise join does not require any data redistribution because individual
parallel servers will work on the equivalent partitions of both joined tables.
As shown in Figure 14, the red parallel process reads data partition one of the
CUSTOMERS table AND data partition one of the SALES table; the equi-
partitioning of both tables on the join key guarantees that there will be no
matching rows for the join outside of these two partitions. The red parallel
process will always be able to complete the full join by reading just these
matching partitions. The same is true for the green parallel server process, and
for any pair of partitions of these two tables. Note that partition-wise joins use
partition-based granules rather than block-based granules.
The partition-wise join is the fundamental enabler for shared nothing systems.
Shared nothing systems typically scale well as long as they can take advantage
of partition-wise joins. As a result, the choice of partitioning (distribution) in a
shared nothing system is critical as well as the access path to the tables.
Operations that do not use partition-wise operations in an MPP system often do
not scale well.
The tables are initially not partitioned and there are no indexes on the tables.
You want to know the total revenue for the last two months of 2007 in the
United States, by state. The following query retrieves this result:
select c.state_province
, sum(s.amount) revenue
from customers c
, sales s
where s.customer_id = c.id
and s.purchase_date
between to_date('01-NOV-2007','DD-MON-YYYY')
and to_date('31-DEC-2007','DD-MON-YYYY')
and c.country = 'United States of America'
group by c.state_province
/
Assume you run the query without enabling parallel execution and it takes 10
minutes to complete.
The end user who runs the query expects a faster response time (less than 3
minutes) and one way to achieve this, assuming there are surplus resources
available, is to execute in parallel.
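One way to enable this (a sketch assuming that DEFAULT parallelism, discussed later, is appropriate for these tables) is to mark both tables for parallel execution:

alter table customers parallel ;
alter table sales parallel ;

From then on, the database considers parallel execution for every statement that accesses these tables.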
By default the Oracle Database is configured to support parallel execution
out-of-the-box. The most relevant database initialization parameters are:
– parallel_max_servers: the maximum number of parallel servers that
can be started by the database instance. In order to execute an operation in
parallel, parallel servers must be available (i.e. not in use by another parallel
operation). By default the value for parallel_max_servers is derived
from other database settings and will be discussed later in this paper. Going
back to the example of counting cars and using help from friends:
parallel_max_servers is the maximum number of friends that you
can call for help.
– parallel_min_servers: the minimum number of parallel servers that
are always started when the database instance is running.
parallel_min_servers enables you to avoid any delay in the
execution of a parallel statement caused by having to spawn parallel servers
first.
Verify that parallel execution is enabled for your database instance (connect to
the database as a DBA or SYSDBA):
SQL> show parameter parallel_max_servers
Parallel execution can enable a single operation to utilize all system resources.
While this may not be a problem in certain scenarios there are many cases in
which this would not be desirable. Consider the workload to which you want to
apply parallel execution to get optimum use of the system while satisfying your
requirements.
Single-user workload
DEFAULT parallelism
Unlike DEFAULT parallelism, a specific DOP can be requested from the
Oracle database. For example, you can set a fixed DOP at a table or index level:
alter table customers parallel 8 ;
alter table sales parallel 16 ;
In this case queries accessing just the customers table use a requested DOP of 8,
and queries accessing the sales table request a DOP of 16. A query accessing
both the sales and the customers table will be processed with a DOP of 16 and
potentially allocate 32 parallel servers (producers and consumers); whenever
different DOPs are specified, Oracle uses the higher DOP7.
Adaptive parallelism
When using Oracle's adaptive parallelism capabilities, the database will use an
algorithm at SQL execution time to determine whether a parallel operation
should receive the requested DOP or be throttled down to a lower DOP.
In a system that makes aggressive use of parallel execution by using a high DOP,
the adaptive algorithm will throttle the DOP down even when only a few
operations are running in parallel. While the algorithm will still ensure optimal
resource utilization, users may experience inconsistent response times. Using
solely the adaptive algorithm will therefore not provide predictable response
times; if predictability matters, consider controlling parallelism explicitly, for
example with the Database Resource Manager discussed later in this paper.
6 We are oversimplifying here for the purpose of an easy explanation. The multiplication
factor of two is derived from the initialization parameter parallel_threads_per_cpu, an OS-
specific parameter that is set to two on most platforms
7 Some statements do not fall under this rule, such as a parallel CREATE TABLE AS
SELECT; a discussion of these exceptions is beyond the scope of this paper.
Once a SQL statement starts execution at a certain DOP it will not change the
DOP throughout its execution. However if you start at a low DOP – either as a
result of adaptive parallel execution or because there were simply not enough
parallel servers available - it may take a very long time to complete the
execution of the SQL statement. If the completion of a statement is time-critical
then you may want to either guarantee a minimal DOP or not execute at all (and
maybe warn the DBA or programmatically try again later when the system is
less loaded).
To guarantee a minimal DOP, use the initialization parameter
parallel_min_percent. This parameter controls the minimal percentage
of parallel server processes that must be available to start the operation; it
defaults to 0, meaning that Oracle will always execute the statement,
irrespective of the number of available parallel server processes.
For example, if you want to ensure to get at least 50% of the requested parallel
server processes for a statement:
SQL> alter session set parallel_min_percent=50 ;
If there are insufficient parallel query servers available – in this example less
than 64 parallel servers for a simple SQL statement (or less than 128 slaves for a
more complex operation, involving producers and consumers) - you will see
ORA-12827 and the statement will not execute. You can capture this error in
your code and retry later.
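A minimal PL/SQL sketch of this capture-and-retry approach (the statement being executed, the table name sales_summary, and the DOP of 128 are placeholders for illustration; with parallel_min_percent=50 and a requested DOP of 128, at least 64 parallel servers must be available):

declare
  insufficient_px exception;
  pragma exception_init(insufficient_px, -12827);
begin
  execute immediate 'alter session set parallel_min_percent=50';
  execute immediate 'create table sales_summary parallel 128 as
                     select customer_id, sum(amount) amount
                     from sales group by customer_id';
exception
  when insufficient_px then
    null;  -- not enough parallel servers available; schedule a retry later
end;
/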
Depending on your expected workload pattern you might want to ensure that
Oracle's parallel execution capabilities are used optimally for your
environment. This implies two basic tasks: (a) controlling the usage of parallelism
and (b) ensuring that the system does not get overloaded with parallel
processing.
Oracle Database Resource Manager (DBRM) enables you to group users based
on characteristics, and restrict parallel execution for some users. DBRM is the
ultimate instance in determining the maximum degree of parallelism, and no
user in a resource group (using a specific resource plan) will ever be able to run
with a higher DOP than the resource group's maximum. For example, if your
resource plan limits a consumer group to a maximum DOP of 4, any statement
issued by a user in that group will run with a DOP of at most 4, regardless of
the DOP requested.
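As a sketch, a DBRM plan directive capping a consumer group's DOP could look as follows (the plan name DAYTIME_PLAN and consumer group ADHOC_USERS are hypothetical, and both are assumed to already exist):

begin
  dbms_resource_manager.create_pending_area();
  dbms_resource_manager.create_plan_directive(
    plan                     => 'DAYTIME_PLAN',
    group_or_subplan         => 'ADHOC_USERS',
    parallel_degree_limit_p1 => 4);
  dbms_resource_manager.validate_pending_area();
  dbms_resource_manager.submit_pending_area();
end;
/

While this plan is active, any statement issued by a session mapped to ADHOC_USERS runs with a DOP of at most 4.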
You should set a baseline for the performance you expect out of the Oracle
Database. The Oracle Database software will not achieve better performance
than the hardware configuration can achieve. Hence you should know what the
operating system can achieve before you introduce the Oracle software, and use
it as a baseline if later you think the performance is insufficient.
SQL parallel execution is typically very I/O intensive, so you want to measure
the maximum I/O performance you can achieve without the Oracle Database.
You can use ORION8 (ORacle I/O Number calibration tool, a free Oracle-
provided utility designed to simulate Oracle I/O workloads) or basic operation
system utilities (such as the Linux/Unix dd command) to measure the I/O
performance for your system. Make sure to calibrate the configuration in the
way Oracle will use it (how the data will be laid out across storage devices) and
use a calibration workload that resembles the type of workload the Oracle
Database will perform when running SQL statements in parallel (typically large
random I/Os).
Conservatively, any physical disk may be able to sustain 20-30 MB/s for large
random reads. Considering that you need about 200 MB/s to keep a single CPU
core busy (i.e. 8 - 10 physical disks), you should realize that you need a lot of
physical spindles to get good performance for database operations running in
parallel. Do not use a single 1 TB disk for your 800 GB database, because you
will not get good performance running operations in parallel against the
database; this might work well for your single-user home video archive, but not
for a database leveraging parallel query with multiple users.
The way to utilize multiple physical spindles with Oracle's shared everything
architecture is to stripe across multiple devices. For high availability you should
use a RAID configuration (storage-based RAID1 or RAID5 are commonly
used) to ensure you can survive the failure of a single disk. For many years
Oracle has recommended the Stripe And Mirror Everything (S.A.M.E.)
methodology using a stripe size of 1 MB. Such a configuration is most easily
achieved by using Oracle Automatic Storage Management (ASM), which
implements S.A.M.E. out of the box.
Memory allocation
Large parallel operations may use a lot of execution memory, and you should
take this into account when allocating memory to the database. You should also
bear in mind that the majority of operations that execute in parallel bypass the
buffer cache. A parallel operation will only use the buffer cache if the object has
been either explicitly created with the CACHE option or if the object size is
smaller than 2% of the buffer cache. If the object size is less than 2% of the
buffer cache then the cost of the checkpoint to start the direct read is deemed
more expensive than just reading the blocks into the cache.
shared_pool_size
Parallel servers communicate among themselves and with the Query
Coordinator by passing messages. The messages are passed via memory buffers
that are allocated from the shared pool. When a parallel server is started it will
allocate buffers in the shared pool so it can communicate; if there is not enough
free space in the shared pool to allocate the buffers, the parallel server will fail to
start. In order to size your shared pool appropriately, use the following
formula to calculate the additional overhead parallel servers will put on the
shared pool when doing inter-node parallel operations:
(((2 + (cpu_count X parallel_threads_per_cpu)) X 2) X
(cpu_count X parallel_threads_per_cpu)) X
parallel_execution_message_size X # concurrent queries
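As a worked example of this formula (the values are assumptions for illustration): with cpu_count = 8, parallel_threads_per_cpu = 2, the default parallel_execution_message_size of 2 KB (2048 bytes) and 10 concurrent queries, the overhead is ((2 + (8 x 2)) x 2) x (8 x 2) x 2048 x 10 = 36 x 16 x 2048 x 10 = 11,796,480 bytes, i.e. roughly 11 MB of additional shared pool memory.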
pga_aggregate_target
The pga_aggregate_target parameter controls the total amount of
execution memory that can be allocated by Oracle. Oracle attempts to keep the
amount of private memory below the target you specified by adapting the size of
the work areas. When you increase the value of this parameter, you indirectly
increase the memory allotted to work areas. Consequently, more memory-
intensive operations are able to run fully in memory and fewer will spill to
disk. For environments that run a lot of parallel operations you should
set pga_aggregate_target as large as possible. A good rule of thumb is to
have a minimum of 100MB X parallel_max_servers.
parallel_execution_message_size
As mentioned above, parallel servers communicate among themselves and
with the Query Coordinator by passing messages via memory buffers. If you
execute a lot of large operations in parallel, it is advisable to reduce the
messaging latency by increasing parallel_execution_message_size
(the size of these buffers). By default the message size is 2 KB. Ideally you
should increase it to 16 KB (16384). However, a larger value for
parallel_execution_message_size will increase the memory
requirement for the shared pool: if you increase it from 2 KB to 16 KB, the
parallel server memory requirement will be 8 times higher.
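For example (since this is a static parameter, the change only takes effect after an instance restart):

alter system set parallel_execution_message_size=16384 scope=spfile ;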
cpu_count
cpu_count is automatically derived by Oracle and is used to determine the
default number of parallel servers and the default degree of parallelism for an
object. Do not change the value of this parameter.
parallel_min_servers
This parameter determines the number of parallel servers that are started
during database startup. By default the value is 0. It is recommended that you set
parallel_min_servers to “average number of concurrent queries *
maximum degree of parallelism needed by a query”. This ensures that there are
ample parallel server processes available for the majority of the queries executed
on the system, and queries will not suffer the additional overhead of having to
spawn extra parallel servers. However, if extra parallel servers are required for
additional queries above your average workload, they can be spawned “on the
fly” up to the value of parallel_max_servers. Bear in mind that any
additional parallel server processes spawned above
parallel_min_servers will be killed after they have been inactive for a
certain amount of time and will have to be re-spawned if they are needed again
in the future.
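For example, for a system that on average runs 4 concurrent parallel queries with a maximum DOP of 16 (illustrative numbers), the rule of thumb gives 4 * 16 = 64:

alter system set parallel_min_servers=64 ;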
parallel_max_servers
This parameter determines the maximum number of parallel servers that may be
started for a database instance, should there be demand for them. The default
value on Oracle Database 10g and higher is 10 * cpu_count *
parallel_threads_per_cpu. A good rule of thumb is to ensure
parallel_max_servers is set to a number greater than the “maximum
number of concurrent queries * maximum degree of parallelism needed by a
query”. By doing this you will ensure every query gets the appropriate number
of parallel servers.
parallel_adaptive_multi_user
This parameter controls whether or not Oracle proactively downgrades
parallel operations to prevent an overloading of the system.
Depending on the workload and the user expectations you should set this
parameter to true or false. Realize that if you set the parameter to true, then
parallel operations may be downgraded aggressively, which can significantly
impact the execution time. For predictable response times on a busy server it is
better to set this parameter to false.
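For example, to favor predictable response times:

alter system set parallel_adaptive_multi_user=false ;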
db_file_multiblock_read_count
SQL parallel execution is generally used for queries that will access a lot of
data, for example when doing a full table scan. Since parallel execution will
bypass the buffer cache and access data directly from disk, you want each I/O to be
as efficient as possible, and using large I/Os is a way to reduce latency. Set
db_file_multiblock_read_count such that when it is multiplied by the
block size you end up with 1 MB. E.g. for 8K block size, use
db_file_multiblock_read_count=128.
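For example, for an 8 KB block size:

alter system set db_file_multiblock_read_count=128 ;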
disk_asynch_io
For optimum performance make sure you use asynchronous I/O; it is the
default on the majority of platforms.
In general you should avoid using hints to enable parallel execution. Hints are
hard to maintain and may not give the right behavior over time when objects and
business requirements change.
Initially tables SALES and CUSTOMERS are not partitioned. The following
shows a portion of the execution plan.
explain plan for select c.state_province
, sum(s.amount) revenue
from customers c, sales s
where s.customer_id = c.id
and s.purchase_date
between to_date('01-NOV-2007','DD-MON-YYYY')
and to_date('31-DEC-2007','DD-MON-YYYY')
and c.country = 'United States of America'
group by c.state_province;
10 Note that some columns in the execution plan have been removed to improve the
readability of this example.
Using all the information discussed in the concept section of this paper, you will
be able to identify the following parallel processing steps:
– The CUSTOMERS table is read in parallel (ID 11) and is then broadcast
to all parallel servers (ID 9) that read the SALES table.
– After the join, the data is redistributed using a HASH redistribution (ID 5)
on the group by column.
– Hash join and hash group by take place in parallel without a need for
redistribution (ID 6 and ID 7). Every parallel server process performs the
incremental aggregation of its disjoint data set.
– Results are returned to the query coordinator in random order (ID 2), since
no order by was specified in the SQL statement; whenever a parallel server
finishes the computation of its incremental result it is returned to the QC.
Large databases and particularly data warehouses – the types of databases that
mostly use parallel execution – should always use Oracle Partitioning.
Partitioning can provide great performance improvements because of partition
elimination (pruning) capabilities, but also because parallel execution plans can
take advantage of partitioning.
Let's recreate the tables SALES and CUSTOMERS as follows:
– HASH partitioning on the ID column for the CUSTOMERS table using 128
partitions.
– HASH partitioning on the CUSTOMER_ID column for the SALES table
using 128 partitions.
– Tables SALES and CUSTOMERS are now equi-partitioned on the join
column.
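A sketch of the corresponding DDL (the column lists are abbreviated to the columns used by the example query; the real tables would contain more columns):

create table customers (
  id             number,
  country        varchar2(60),
  state_province varchar2(60)
)
partition by hash (id) partitions 128 ;

create table sales (
  customer_id   number,
  purchase_date date,
  amount        number
)
partition by hash (customer_id) partitions 128 ;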
Figure 17: customer purchase information, parallel plan , hash partitioning with partition-wise joins
Figure 17 shows the execution plan for the same query using the now hash
partitioned tables. Unlike in previous examples, you do not see the granules for
table SALES and CUSTOMERS right away in the plan. The simple reason for
this is that we are now using partition-based granules, so Oracle does not
have to carve the data into granules for parallel access at runtime; the database
simply has to iterate over the existing partitions.
Furthermore, we are joining two equi-partitioned tables leveraging a partition-
wise join. The partition-based granules are not only identical for both tables, but
the iteration (processing) of granules is now a processing of pairs of partitions
that includes the join as well; one parallel server process is working on one
equivalent partition pair at a given point in time. Consequently, the partition-
based granule iterator is ABOVE the hash join operation in the execution plan.
Besides the known processing steps of parallel execution this new behavior of a
partition-wise join is seen in the execution plan in Figure 17.
– Tables SALES and CUSTOMERS are accessed in parallel, iterating over the
existing equi-partitioned hash partition-based granules (ID 7). You can read
this operation as “loop over all hash partitions and process the operations
below”. A set of parallel servers is working on n partitions at a time (n
equals the DOP), from partition 1 to 128 (identified through columns
'Pstart' and 'Pstop').
– For each HASH partition pair, a parallel server process joins the table
CUSTOMERS and SALES.
No data redistribution is taking place to join tables SALES and CUSTOMERS.
In the case of inter-node parallel query, there would be no data transfer
necessary between the compute nodes, and the Oracle database – although built
on the shared everything paradigm - would behave like a shared nothing system
for this operation.
8 - access("S"."CUSTOMER_ID"="C"."ID")
9 - filter("C"."COUNTRY"='United States of America')
10 - filter("TIME_ID">=TO_DATE(' 2007-11-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')
AND "TIME_ID"<=TO_DATE(' 2007-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
A full partition-wise join only requires the partitioning strategy on the join
column(s) to be identical. If we change the SALES table to become a composite
RANGE-HASH partitioned table, using PURCHASE_DATE for range partitioning
(7 years' worth of data, partitioned by month) and CUSTOMER_ID for hash sub-
partitioning with 128 sub-partitions, we still adhere to the condition for a full
partition-wise join, and the plan changes only slightly, as shown in
Figure 18.
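A composite RANGE-HASH definition of SALES as described above could be sketched as follows (only two of the 84 monthly partitions are written out; the column lists and partition names are illustrative assumptions):

```sql
-- SALES recreated as composite RANGE-HASH: range partitioned by month
-- on PURCHASE_DATE, hash sub-partitioned on CUSTOMER_ID.
-- The hash sub-partitioning on CUSTOMER_ID (128 sub-partitions) still
-- matches the hash partitioning of CUSTOMERS on ID, so the full
-- partition-wise join remains possible.
CREATE TABLE sales (
  customer_id    NUMBER NOT NULL,
  purchase_date  DATE
  -- ... further columns ...
)
PARTITION BY RANGE (purchase_date)
SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 128
( PARTITION sales_2007_11 VALUES LESS THAN (DATE '2007-12-01'),
  PARTITION sales_2007_12 VALUES LESS THAN (DATE '2008-01-01')
  -- ... one partition per month, 84 in total ...
);
```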
However, the query against the new partitioned tables returns even faster than
before. Besides the benefits of the parallel full partition-wise join, a big
performance improvement is achieved through partition elimination: the Oracle
database analyzes all existing predicates in the query to see whether some
partitions can be ruled out from the processing completely. In our case, the
composite range-hash partitioned table SALES has 84 x 128 = 10,752
subpartitions in total. Analyzing the filter predicate on purchase_date leads
to a reduction down to two range partitions (#72 and #73, shown in Pstart/Pstop
of ID 10); we only have to access 256 out of 10,752 subpartitions, providing
approximately a 40x performance improvement.
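The pruning arithmetic above can be verified with a quick back-of-the-envelope check (partition counts taken from the text):

```python
# Composite RANGE-HASH partitioned SALES table from the text:
range_partitions = 84      # 7 years x 12 months
hash_subpartitions = 128   # hash sub-partitions per range partition

total = range_partitions * hash_subpartitions
print(total)               # 10752 sub-partitions in total

# The filter predicate prunes the table down to two range partitions,
# so only their sub-partitions have to be accessed:
accessed = 2 * hash_subpartitions
print(accessed)            # 256

# Reduction in data accessed through partition elimination alone:
print(total / accessed)    # 42.0, i.e. roughly a 40x improvement
```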
Partition-wise joins can also be leveraged when joining REF-partitioned tables,
or as so-called partial partition-wise joins, where a small table is joined with a
significantly larger table and the database enforces a data redistribution to match
the partitioning strategy of the larger table. For the sake of focusing on parallel
execution only, we will not further discuss partition-wise joins for REF-
partitioned tables, nor do we discuss partial partition-wise joins.
Wait events
Almost all SQL statements executing in parallel read data directly from disk
rather than out of memory. As a result, parallel statements can be very I/O
intensive. Oracle Enterprise Manager Database Control 11g provides I/O
throughput information on the main performance page – on the “I/O tab” – as
well as on the detailed I/O pages.
Figure 20: Detailed I/O page in OEM 11g Database Console for a parallel
DML workload.
With Oracle Database 11.1.0.6 you can only use the textual output from the
GV$SQL_MONITOR view. Starting with the Oracle Enterprise Manager database
console in 11.1.0.7 there is a graphical interface to GV$SQL_MONITOR. Oracle
Enterprise Manager Grid Control 11g will also provide the graphical interface.
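As a sketch of the textual route, the monitoring data can be queried directly and rendered as a text report with the DBMS_SQLTUNE package; the column selection below is illustrative, and :sql_id is a placeholder for the statement of interest:

```sql
-- List currently monitored statements and their parallel server usage:
SELECT sql_id, status, username,
       px_servers_requested, px_servers_allocated
FROM   gv$sql_monitor;

-- Render a textual monitoring report for one statement:
SELECT DBMS_SQLTUNE.REPORT_SQL_MONITOR(
         sql_id => :sql_id,
         type   => 'TEXT') AS report
FROM   dual;
```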
The examples and screenshots in this section show Oracle Enterprise Manager
11.1.0.7 database console on a single instance 2 CPU database server12.
The SQL Monitoring screen shows the execution plan of a long-running
statement or a statement that is running in parallel. In near real-time (the default
refresh cycle is 5 seconds) you can monitor which step in the execution plan is
being worked on and whether there are any waits (see Figure 22). For a parallel
statement, the screen also shows the parallel server sets. The SQL Monitor
output is extremely valuable for identifying which parts of an execution plan are
expensive throughout the total execution of a SQL statement.
The SQL Monitoring screens also provide information about the parallel server
sets and work distribution between individual parallel servers on the “Parallel”
tab (see Figure 23).
11 Oracle Database Enterprise Manager Tuning Pack must be licensed in order to access
(G)V$SQL_MONITOR.
12 As of publication Oracle Database 11.1.0.7 is not yet available. The example shows
screenshots of an early version of database console on a development version.
Ideally you will see an equal distribution of work across the parallel servers. If
there is a skew in the distribution of work between parallel servers in one
parallel server set, then you have not achieved optimal performance: the
statement will have to wait for the parallel server performing the most work to
complete.
The third tab in the SQL Monitoring interface shows the activity for the
statement over time in near real-time (see Figure 24). Use this information to
identify, at the statement level, which resources are used most intensively.
If you always use hints, and nothing but hints, to enable SQL parallel execution
on Oracle Database 9i, then there is little to worry about when upgrading. You
should verify that every operation with parallel hints actually runs in parallel on
Oracle Database 9i; if it does, it will do so on Oracle Database 10g and beyond
as well.
If on Oracle Database 9i you used session settings to enable SQL parallel execution
If you always used only session settings to enable parallel execution, then you
should look at the operations that are executed in the sessions that enable or
force parallel execution. Expect more operations to execute in parallel after an
upgrade to Oracle Database 10g or beyond. If only parallel operations ran in
your parallel-enabled sessions on Oracle Database 9i, then you should expect
minimal changes, if any, after an upgrade.
If you set the parallel properties at the table or index level in order to enable
parallel execution, then you will face the highest likelihood to experience
changes. Expect some operations that access parallel enabled objects which
would not execute in parallel on Oracle Database 9i to run in parallel after an
upgrade.
Carefully review the parallel settings at the table level, and reset the parallel
setting on small database objects (those with no more than a few thousand
records and/or only a few database blocks in size) to NOPARALLEL. Operations
that complete in a few seconds or less when running serially benefit little from
executing in parallel. Rather, you want operations that take minutes or even
hours to complete serially to benefit from parallel execution.
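A review along these lines might be sketched as follows; the size thresholds are illustrative, and the object names in the ALTER statements are hypothetical:

```sql
-- Find parallel-enabled tables that are probably too small to benefit
-- (adjust the thresholds to your system; statistics must be current):
SELECT owner, table_name, degree, num_rows, blocks
FROM   dba_tables
WHERE  TRIM(degree) NOT IN ('1')          -- parallel-enabled objects
AND    (num_rows < 10000 OR blocks < 1000);

-- Reset such small objects to serial access:
ALTER TABLE sales_small NOPARALLEL;
ALTER INDEX sales_small_idx NOPARALLEL;
```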
parallel_max_servers
The default value in Oracle Database 9i was 10. For Oracle Database 10g and
higher, assuming you use automatic memory management for execution
memory (i.e. you set pga_aggregate_target or, starting with Oracle
Database 11g, memory_target), the default equates to 10 * cpu_count.
Generally 10 * cpu_count is a lot more than 10, which means that SQL
parallel execution may end up using far more system resources.
If your system was already heavily loaded on Oracle Database 9i with some
operations running in parallel, then review whether this much higher default
leaves enough resources for the rest of your workload.
parallel_adaptive_multi_user
In Oracle Database 9i, parallel_adaptive_multi_user was by default
derived from parallel_automatic_tuning and defaulted to false. In
Oracle Database 10g and beyond, parallel_adaptive_multi_user
defaults to true. As a result, the database will aggressively reduce the DOP for
parallel SQL operations when other statements are already using parallel
servers. If you did not explicitly change parallel_automatic_tuning or
parallel_adaptive_multi_user on Oracle Database 9i, then you should
explicitly set parallel_adaptive_multi_user to false when you upgrade
to Oracle Database 10g or beyond.
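For example, the Oracle Database 9i behavior can be preserved explicitly after the upgrade:

```sql
-- Preserve the pre-upgrade behavior: do not adaptively reduce the DOP
-- when other parallel statements are running.
ALTER SYSTEM SET parallel_adaptive_multi_user = FALSE;
```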
CONCLUSION
The objective of parallel execution is to reduce the total execution time of an
operation by using multiple resources concurrently. Resource availability is the
most important prerequisite for scalable parallel execution.
The Oracle Database provides a powerful SQL parallel execution engine that
can run almost any SQL-based operation – DDL, DML and queries – in the
Oracle Database in parallel. This paper explained how to enable SQL parallel
execution and provided some best practices to ensure its successful use.
Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.
Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com