
Chapter 20

Query Processing
Transparencies

Chapter 20 - Objectives

Objectives of query processing and optimization.
Static versus dynamic query optimization.
How a query is decomposed and semantically analyzed.
How to create a relational algebra tree (R.A.T.) to represent a query.
Rules of equivalence for RA operations.
How to apply heuristic transformation rules to improve efficiency of a query.
2

Chapter 20 - Objectives

Types of database statistics required to estimate cost of operations.
Different strategies for implementing selection.
How to evaluate cost and size of selection.
Different strategies for implementing join.
How to evaluate cost and size of join.
Different strategies for implementing projection.
How to evaluate cost and size of projection.
3

Chapter 20 - Objectives

How to evaluate the cost and size of other RA operations.
How pipelining can be used to improve efficiency of queries.
Difference between materialization and pipelining.
Advantages of left-deep trees.

Introduction

In network and hierarchical DBMSs, a low-level procedural query language is generally embedded in a high-level programming language.
It is the programmer's responsibility to select the most appropriate execution strategy.
With declarative languages such as SQL, the user specifies what data is required rather than how it is to be retrieved.
This relieves the user of knowing what constitutes a good execution strategy.
5

Introduction

Also gives DBMS more control over system performance.
Two main techniques for query optimization:
heuristic rules that order operations in a query;
comparing different strategies based on relative costs, and selecting one that minimizes resource usage.
Disk access tends to be dominant cost in query processing for centralized DBMS.
6

Query Processing
Activities involved in retrieving data from the
database.

Aims of QP:
transform query written in high-level language (e.g. SQL) into correct and efficient execution strategy expressed in low-level language (implementing RA);
execute strategy to retrieve required data.
7

Query Optimization
Activity of choosing an efficient execution strategy
for processing query.

As there are many equivalent transformations of same high-level query, aim of QO is to choose one that minimizes resource usage.
Generally, reduce total execution time of query.
May also reduce response time of query.
Problem computationally intractable with large number of relations, so strategy adopted is reduced to finding near-optimum solution.
8

Example 20.1 - Different Strategies


Find all Managers who work at a London branch.
SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');

Example 20.1 - Different Strategies


Three equivalent RA queries are:

(1) σ(position='Manager') ∧ (city='London') ∧ (Staff.branchNo=Branch.branchNo)(Staff × Branch)

(2) σ(position='Manager') ∧ (city='London')(Staff ⋈Staff.branchNo=Branch.branchNo Branch)

(3) (σposition='Manager'(Staff)) ⋈Staff.branchNo=Branch.branchNo (σcity='London'(Branch))
10

Example 20.1 - Different Strategies

Assume:
1000 tuples in Staff; 50 tuples in Branch;
50 Managers; 5 London branches;
no indexes or sort keys;
results of any intermediate operations stored
on disk;
cost of the final write is ignored;
tuples are accessed one at a time.
11

Example 20.1 - Cost Comparison

Costs (in disk accesses) are:
(1) (1000 + 50) + 2*(1000 * 50) = 101 050
(2) 2*1000 + (1000 + 50) = 3 050
(3) 1000 + 2*50 + 5 + (50 + 5) = 1 160

Cartesian product and join operations are much more expensive than selection, and the third option significantly reduces the size of the relations being joined together.
12
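To make the arithmetic behind these figures explicit, the following Python sketch recomputes each cost from the slide's assumptions. It is an added illustration, not part of the original transparencies; the grouping of reads and writes in the comments follows the formulas above.

# Assumptions from Example 20.1: 1000 Staff tuples, 50 Branch tuples,
# 50 Managers, 5 London branches; intermediate results are written to disk
# and read back, the final write is ignored, tuples are accessed one at a time.
n_staff, n_branch = 1000, 50
n_managers, n_london = 50, 5

# (1) Cartesian product then selection: read both relations,
#     write the 1000*50 product, read it back for the selection.
cost1 = (n_staff + n_branch) + 2 * (n_staff * n_branch)

# (2) Join then selection: read both relations, write the 1000-tuple join
#     result (each staff member works at exactly one branch), read it back.
cost2 = 2 * n_staff + (n_staff + n_branch)

# (3) Selections first, then join: read Staff, write the 50 Managers,
#     read Branch, write the 5 London branches, then read both intermediates.
cost3 = n_staff + 2 * n_managers + n_london + (n_managers + n_london)

print(cost1, cost2, cost3)   # 101050 3050 1160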

Phases of Query Processing

QP has four main phases:
decomposition (consisting of parsing and validation);
optimization;
code generation;
execution.

13

Phases of Query Processing

14

Dynamic versus Static Optimization

Two times when first three phases of QP can be carried out:
dynamically every time query is run;
statically when query is first submitted.
Advantages of dynamic QO arise from fact that information is up to date.
Disadvantages are that performance of query is affected, and time may limit finding optimum strategy.
15

Dynamic versus Static Optimization

Advantages of static QO are removal of runtime overhead, and more time to find optimum strategy.
Disadvantages arise from fact that chosen execution strategy may no longer be optimal when query is run.
Could use a hybrid approach to overcome this.

16

Query Decomposition

Aims are to transform high-level query into RA query and check that query is syntactically and semantically correct.
Typical stages are:
analysis,
normalization,
semantic analysis,
simplification,
query restructuring.
17

Analysis

Analyze query lexically and syntactically using compiler techniques.
Verify relations and attributes exist.
Verify operations are appropriate for object type.

18

Analysis - Example
SELECT staff_no
FROM Staff
WHERE position > 10;

This query would be rejected on two grounds:
staff_no is not defined for Staff relation (should be staffNo).
Comparison > 10 is incompatible with type of position, which is variable character string.
19

Analysis

Finally, query transformed into some internal representation more suitable for processing.
Some kind of query tree is typically chosen, constructed as follows:
Leaf node created for each base relation.
Non-leaf node created for each intermediate relation produced by RA operation.
Root of tree represents query result.
Sequence is directed from leaves to root.
20

Example 20.1 - R.A.T.

21

Normalization

Converts query into a normalized form for easier manipulation.
Predicate can be converted into one of two forms:

Conjunctive normal form:
(position = 'Manager' ∨ salary > 20000) ∧ (branchNo = 'B003')

Disjunctive normal form:
(position = 'Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧ branchNo = 'B003')
22

Semantic Analysis

Rejects normalized queries that are incorrectly formulated or contradictory.
Query is incorrectly formulated if components do not contribute to generation of result.
Query is contradictory if its predicate cannot be satisfied by any tuple.
Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.
23

Semantic Analysis

For these queries, could construct:
A relation connection graph.
A normalized attribute connection graph.

Relation connection graph:
Create node for each relation and node for result.
Create edges between two nodes that represent a join, and edges between nodes that represent projection.
If not connected, query is incorrectly formulated.
24

Semantic Analysis - Normalized Attribute Connection Graph

Create node for each reference to an attribute, or constant 0.
Create directed edge between nodes that represent a join, and directed edge between attribute node and 0 node that represents selection.
Weight edge a → b with value c if it represents inequality condition (a ≤ b + c); weight edge 0 → a with value -c if it represents inequality condition (a ≥ c).
If graph has cycle for which valuation sum is negative, query is contradictory.
25
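As an added illustration of the contradiction test (not part of the original slides; the helper function and node names are invented), the sketch below encodes inequality conditions as weighted edges per the rule above and uses Bellman-Ford-style relaxation to look for a negative-weight cycle.

# Sketch: a predicate is contradictory if its normalized attribute connection
# graph contains a cycle whose edge weights sum to a negative value.
def has_negative_cycle(nodes, edges):
    """edges: list of (u, v, w), the edge u -> v with weight w."""
    dist = {n: 0 for n in nodes}              # virtual source at distance 0 to all nodes
    for _ in range(len(nodes)):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            return False                      # distances stabilized: no negative cycle
    return True                               # still relaxing after |nodes| passes

# Example 20.2 (second query): c.maxRent >= 500 and c.maxRent < 200,
# encoded with the weighting rule above (strict < treated as <= here).
nodes = ['0', 'c.maxRent']
edges = [('0', 'c.maxRent', -500),            # c.maxRent >= 500
         ('c.maxRent', '0', 200)]             # c.maxRent <  200, i.e. maxRent <= 0 + 200
print(has_negative_cycle(nodes, edges))       # True: the query is contradictory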

Example 20.2 - Checking Semantic Correctness


SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.clientNo = v.clientNo AND
      c.maxRent >= 500 AND
      c.prefType = 'Flat' AND p.ownerNo = 'CO93';

Relation connection graph not fully connected, so query is not correctly formulated.
Have omitted the join condition (v.propertyNo = p.propertyNo).
26

Example 20.2 - Checking Semantic Correctness


Relation Connection graph

Normalized attribute
connection graph

27

Example 20.2 - Checking Semantic Correctness


SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.maxRent > 500 AND
      c.clientNo = v.clientNo AND
      v.propertyNo = p.propertyNo AND
      c.prefType = 'Flat' AND c.maxRent < 200;

Normalized attribute connection graph has cycle between nodes c.maxRent and 0 with negative valuation sum, so query is contradictory.
28

Simplification

Detects redundant qualifications, eliminates common sub-expressions, and transforms query to semantically equivalent but more easily and efficiently computed form.
Typically, access restrictions, view definitions, and integrity constraints are considered.
Assuming user has appropriate access privileges, first apply well-known idempotency rules of Boolean algebra.
29

Transformation Rules for RA Operations


Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).
σp∧q∧r(R) = σp(σq(σr(R)))

Sometimes referred to as cascade of Selection.
σbranchNo='B003' ∧ salary>15000(Staff) = σbranchNo='B003'(σsalary>15000(Staff))

30
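As an added sketch (not from the original slides; the class and function names are invented), the following Python fragment represents an RA expression as a small tree and applies the cascade-of-Selection rewrite to it.

from dataclasses import dataclass

@dataclass
class Relation:
    name: str

@dataclass
class Select:
    predicates: list        # conjunctive predicate, e.g. ["branchNo='B003'", "salary>15000"]
    child: object

def cascade_selection(node):
    """Rewrite sigma_{p AND q AND r}(R) into sigma_p(sigma_q(sigma_r(R)))."""
    if isinstance(node, Select) and len(node.predicates) > 1:
        inner = node.child
        for p in reversed(node.predicates):
            inner = Select([p], inner)
        return inner
    return node

expr = Select(["branchNo='B003'", "salary>15000"], Relation("Staff"))
# After the rewrite the outermost selection is on branchNo, the innermost on salary.
print(cascade_selection(expr))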

Transformation Rules for RA Operations


Commutativity of Selection.
σp(σq(R)) = σq(σp(R))

For example:
σbranchNo='B003'(σsalary>15000(Staff)) = σsalary>15000(σbranchNo='B003'(Staff))
31

Transformation Rules for RA Operations


In a sequence of Projection operations, only the last in the sequence is required.
πL(πM(…(πN(R)))) = πL(R)

For example:
πlName(πbranchNo, lName(Staff)) = πlName(Staff)

32

Transformation Rules for RA Operations


Commutativity of Selection and Projection.

If predicate p involves only attributes in projection list, Selection and Projection operations commute:
πAi, …, Am(σp(R)) = σp(πAi, …, Am(R))    where p ∈ {A1, A2, …, Am}

For example:
πfName, lName(σlName='Beech'(Staff)) = σlName='Beech'(πfName, lName(Staff))

33

Transformation Rules for RA Operations


Commutativity of Theta join (and Cartesian product).
R ⋈p S = S ⋈p R
R × S = S × R

Rule also applies to Equijoin and Natural join.

For example:
Staff ⋈Staff.branchNo=Branch.branchNo Branch = Branch ⋈Staff.branchNo=Branch.branchNo Staff
34

Transformation Rules for RA Operations


Commutativity of Selection and Theta join (or Cartesian product).

If selection predicate involves only attributes of one of the join relations, Selection and Join (or Cartesian product) operations commute:
σp(R ⋈r S) = (σp(R)) ⋈r S
σp(R × S) = (σp(R)) × S
where p ∈ {A1, A2, …, An}
35

Transformation Rules for RA Operations

If selection predicate is conjunctive predicate having form (p ∧ q), where p only involves attributes of R, and q only attributes of S, Selection and Theta join operations commute as:
σp∧q(R ⋈r S) = (σp(R)) ⋈r (σq(S))
σp∧q(R × S) = (σp(R)) × (σq(S))

36

Transformation Rules for RA Operations


For example:
σposition='Manager' ∧ city='London'(Staff ⋈Staff.branchNo=Branch.branchNo Branch) =
(σposition='Manager'(Staff)) ⋈Staff.branchNo=Branch.branchNo (σcity='London'(Branch))

37

Transformation Rules for RA Operations


Commutativity of Projection and Theta join (or Cartesian product).

If projection list is of form L = L1 ∪ L2, where L1 only has attributes of R, and L2 only has attributes of S, provided join condition only contains attributes of L, Projection and Theta join commute:
πL1∪L2(R ⋈r S) = (πL1(R)) ⋈r (πL2(S))
38

Transformation Rules for RA Operations

If join condition contains additional attributes not in L, say M = M1 ∪ M2 where M1 only has attributes of R, and M2 only has attributes of S, a final Projection operation is required:
πL1∪L2(R ⋈r S) = πL1∪L2((πL1∪M1(R)) ⋈r (πL2∪M2(S)))

39

Transformation Rules for RA Operations

For example:
πposition, city, branchNo(Staff ⋈Staff.branchNo=Branch.branchNo Branch) =
(πposition, branchNo(Staff)) ⋈Staff.branchNo=Branch.branchNo (πcity, branchNo(Branch))

and using the latter rule:
πposition, city(Staff ⋈Staff.branchNo=Branch.branchNo Branch) =
πposition, city((πposition, branchNo(Staff)) ⋈Staff.branchNo=Branch.branchNo (πcity, branchNo(Branch)))

40

Transformation Rules for RA Operations


Commutativity of Union and Intersection (but not Set difference).
R ∪ S = S ∪ R
R ∩ S = S ∩ R

41

Transformation Rules for RA Operations


Commutativity of Selection and set operations (Union, Intersection, and Set difference).
σp(R ∪ S) = σp(S) ∪ σp(R)
σp(R ∩ S) = σp(S) ∩ σp(R)
σp(R − S) = σp(R) − σp(S)

42

Transformation Rules for RA Operations


Commutativity of Projection and Union.
πL(R ∪ S) = πL(S) ∪ πL(R)

Associativity of Union and Intersection (but not Set difference).
(R ∪ S) ∪ T = S ∪ (R ∪ T)
(R ∩ S) ∩ T = S ∩ (R ∩ T)
43

Transformation Rules for RA Operations


Associativity of Theta join (and Cartesian product).
Cartesian product and Natural join are always associative:
(R ⋈ S) ⋈ T = R ⋈ (S ⋈ T)
(R × S) × T = R × (S × T)

If join condition q involves attributes only from S and T, then Theta join is associative:
(R ⋈p S) ⋈q∧r T = R ⋈p∧r (S ⋈q T)
44

Transformation Rules for RA Operations

For example:
(Staff ⋈Staff.staffNo=PropertyForRent.staffNo PropertyForRent) ⋈ownerNo=Owner.ownerNo ∧ Staff.lName=Owner.lName Owner =
Staff ⋈Staff.staffNo=PropertyForRent.staffNo ∧ Staff.lName=Owner.lName (PropertyForRent ⋈ownerNo Owner)

45

Example 20.3 Use of Transformation Rules


For prospective renters of flats, find properties that match requirements and are owned by owner CO93.
SELECT p.propertyNo, p.street
FROM Client c, Viewing v, PropertyForRent p
WHERE c.prefType = 'Flat' AND
      c.clientNo = v.clientNo AND
      v.propertyNo = p.propertyNo AND
      c.maxRent >= p.rent AND
      c.prefType = p.type AND
      p.ownerNo = 'CO93';

46

Example 20.3 Use of Transformation Rules

47

Example 20.3 Use of Transformation Rules

48

Example 20.3 Use of Transformation Rules

49

Heuristical Processing Strategies

Perform Selection operations as early as possible.
Keep predicates on same relation together.

Combine Cartesian product with subsequent Selection whose predicate represents a join condition into a Join operation.

Use associativity of binary operations to rearrange leaf nodes so leaf nodes with most restrictive Selection operations are executed first.
50

Heuristical Processing Strategies

Perform Projection as early as possible.
Keep projection attributes on same relation together.

Compute common expressions once.
If common expression appears more than once, and result not too large, store result and reuse it when required.
Useful when querying views, as same expression is used to construct view each time.

51

Cost Estimation for RA Operations

Many different ways of implementing RA operations.
Aim of QO is to choose most efficient one.
Use formulae that estimate costs for a number of options, and select one with lowest cost.
Consider only cost of disk access, which is usually dominant cost in QP.
Many estimates are based on cardinality of the relation, so need to be able to estimate this.
52

Database Statistics

Success of estimation depends on amount and currency of statistical information DBMS holds.
Keeping statistics current can be problematic.
If statistics updated every time tuple is changed, this would impact performance.
DBMS could update statistics on a periodic basis, for example nightly, or whenever the system is idle.
53

Typical Statistics for Relation R


nTuples(R) - number of tuples in R.
bFactor(R) - blocking factor of R.
nBlocks(R) - number of blocks required to store R:
nBlocks(R) = [nTuples(R)/bFactor(R)]

54

Typical Statistics for Attribute A of Relation R


nDistinctA(R) - number of distinct values that appear for attribute A in R.
minA(R), maxA(R) - minimum and maximum possible values for attribute A in R.
SCA(R) - selection cardinality of attribute A in R: average number of tuples that satisfy an equality condition on attribute A.
55

Statistics for Multilevel Index I on Attribute A


nLevelsA(I) - number of levels in I.
nLfBlocksA(I) - number of leaf blocks in I.

56

Selection Operation

Predicate may be simple or composite.
Number of different implementations, depending on file structure, and whether attribute(s) involved are indexed/hashed.
Main strategies are:
Linear Search (Unordered file, no index).
Binary Search (Ordered file, no index).
Equality on hash key.
Equality condition on primary key.
57

Selection Operation
Inequality condition on primary key.
Equality condition on clustering (secondary) index.
Equality condition on a non-clustering (secondary) index.
Inequality condition on a secondary B+-tree index.

58

Estimating Cardinality of Selection

Assume attribute values are uniformly distributed within their domain and attributes are independent.
For the result S of an equality selection on attribute A of R:
nTuples(S) = SCA(R)

For any attribute B ≠ A of S:
nDistinctB(S) = nTuples(S)                           if nTuples(S) < nDistinctB(R)/2
nDistinctB(S) = nDistinctB(R)                        if nTuples(S) > 2*nDistinctB(R)
nDistinctB(S) = [(nTuples(S) + nDistinctB(R))/3]     otherwise
59
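The estimate above translates directly into code. The following Python sketch (an added annotation; the function name is invented) returns the estimated number of distinct B values in the selection result S.

import math

def estimate_ndistinct_b(n_tuples_s, n_distinct_b_r):
    # Piecewise estimate of nDistinctB(S), assuming uniform, independent values.
    if n_tuples_s < n_distinct_b_r / 2:
        return n_tuples_s
    if n_tuples_s > 2 * n_distinct_b_r:
        return n_distinct_b_r
    return math.ceil((n_tuples_s + n_distinct_b_r) / 3)

# e.g. selection keeps 50 tuples and B has 120 distinct values in R overall
print(estimate_ndistinct_b(50, 120))   # 50, since 50 < 120/2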

Linear Search (Unordered File, No Index)

May need to scan each tuple in each block to check whether it satisfies predicate.
For equality condition on key attribute, cost estimate is:
[nBlocks(R)/2]

For any other condition, entire file may need to be searched, so more general cost estimate is:
nBlocks(R)

60

Binary Search (Ordered File, No Index)

If predicate is of form (A = x), and file is ordered on key attribute A, cost estimate is:
[log2(nBlocks(R))]

Generally, cost estimate is:
[log2(nBlocks(R))] + [SCA(R)/bFactor(R)] - 1

First term represents cost of finding first tuple using binary search.
Expect there to be SCA(R) tuples satisfying predicate.

61

Equality on Hash Key

If attribute A is hash key, apply hashing algorithm to calculate target address for tuple.
If there is no overflow, expected cost is 1.
If there is overflow, additional accesses may be necessary.

62

Equality Condition on Primary Key

Can use primary index to retrieve single record satisfying condition.
Need to read one more block than number of index accesses, equivalent to number of levels in index, so estimated cost is:
nLevelsA(I) + 1

63

Inequality Condition on Primary Key

Can first use index to locate record satisfying predicate (A = x).
Provided index is sorted, records can be found by accessing all records before/after this one.
Assuming uniform distribution, would expect half the records to satisfy inequality, so estimated cost is:
nLevelsA(I) + [nBlocks(R)/2]
64

Equality Condition on Clustering Index

Can use index to retrieve required records.
Estimated cost is:
nLevelsA(I) + [SCA(R)/bFactor(R)]

Second term is estimate of number of blocks required to store the SCA(R) tuples that satisfy the equality condition.

65

Equality Condition on Non-Clustering Index

Can use index to retrieve required records.
Have to assume that tuples are on different blocks (index is not clustered this time), so estimated cost becomes:
nLevelsA(I) + [SCA(R)]

66

Inequality Condition on a Secondary B+-tree Index

From leaf nodes of tree, can scan keys from smallest value up to x (< or <=) or from x up to maximum value (> or >=).
Assuming uniform distribution, would expect half the leaf node blocks to be accessed and, via index, half the file records to be accessed.
Estimated cost is:
nLevelsA(I) + [nLfBlocksA(I)/2 + nTuples(R)/2]

67
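The selection cost formulas above can be gathered into a few small functions. The Python sketch below is an added annotation, not an implementation from the book; the function and parameter names simply mirror the slide notation, and ceil() plays the role of the [ ] brackets.

from math import ceil, log2

def linear_search(n_blocks, equality_on_key=False):
    return ceil(n_blocks / 2) if equality_on_key else n_blocks

def binary_search(n_blocks, sc=1, b_factor=1, equality_on_key=True):
    cost = ceil(log2(n_blocks))
    if not equality_on_key:
        cost += ceil(sc / b_factor) - 1
    return cost

def equality_on_primary_key(n_levels):
    return n_levels + 1

def inequality_on_primary_key(n_levels, n_blocks):
    return n_levels + ceil(n_blocks / 2)

def equality_on_clustering_index(n_levels, sc, b_factor):
    return n_levels + ceil(sc / b_factor)

def equality_on_nonclustering_index(n_levels, sc):
    return n_levels + ceil(sc)

def inequality_on_secondary_btree(n_levels, n_leaf_blocks, n_tuples):
    return n_levels + ceil(n_leaf_blocks / 2 + n_tuples / 2)

# e.g. Staff with 50 blocks and a 2-level primary index on staffNo
print(equality_on_primary_key(2), linear_search(50, equality_on_key=True))   # 3 25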

Composite Predicates - Conjunction without Disjunction

May consider following approaches:
- If one attribute has index or is ordered, can use one of above selection strategies, then check each retrieved record against the remaining conditions.
- For equality on two or more attributes, with composite index (or hash key) on combined attributes, can search index directly.
- With secondary indexes on one or more attributes (involved only in equality conditions in predicate), could use record pointers if they exist.
68

Composite Predicates - Selections with Disjunction

If one term contains an OR, and that term requires a linear search, entire selection requires linear search.
Only if an index or sort order exists on every term can selection be optimized, by retrieving records that satisfy each condition and applying union operator.
Again, record pointers can be used if they exist.

69

Join Operation

Main strategies for implementing join:
Block Nested Loop Join.
Indexed Nested Loop Join.
Sort-Merge Join.
Hash Join.

70

Estimating Cardinality of Join

Cardinality of Cartesian product is:
nTuples(R) * nTuples(S)

More difficult to estimate cardinality of any join, as it depends on distribution of values.
In worst case, join cardinality cannot be any greater than this value.
71

Estimating Cardinality of Join

If assume uniform distribution, can estimate for Equijoins with a predicate (R.A = S.B) as follows:
If A is key of R:    nTuples(T) ≤ nTuples(S)
If B is key of S:    nTuples(T) ≤ nTuples(R)

Otherwise, could estimate cardinality of join as:
nTuples(T) = SCA(R)*nTuples(S)
or
nTuples(T) = SCB(S)*nTuples(R)

72
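As an added annotation (not from the slides; the function name is invented and SCA(R) is taken as nTuples(R)/nDistinctA(R) under the uniformity assumption), the sketch below returns the Equijoin cardinality estimate for T = R join S on (A = B).

def estimate_join_cardinality(n_tuples_r, n_tuples_s,
                              n_distinct_a_r, n_distinct_b_s,
                              a_is_key=False, b_is_key=False):
    if a_is_key:
        return n_tuples_s            # the slide's upper bound: each S tuple matches at most one R tuple
    if b_is_key:
        return n_tuples_r            # each R tuple matches at most one S tuple
    sc_a_r = n_tuples_r / n_distinct_a_r        # SCA(R) under uniformity
    return round(sc_a_r * n_tuples_s)

# Staff joined with Branch on branchNo, where branchNo is the key of Branch:
print(estimate_join_cardinality(1000, 50, 50, 50, b_is_key=True))   # 1000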

Block Nested Loop Join

Simplest join algorithm is nested loop that joins two relations together a tuple at a time.
Outer loop iterates over each tuple in R, and inner loop iterates over each tuple in S.
As basic unit of reading/writing is a disk block, better to have two extra loops that process blocks.
Estimated cost of this approach is:
nBlocks(R) + (nBlocks(R) * nBlocks(S))
73

Block Nested Loop Join

Could read as many blocks as possible of smaller relation, say R, into database buffer, saving one block for inner relation and one for result.
New cost estimate becomes:
nBlocks(R) + [nBlocks(S)*(nBlocks(R)/(nBuffer-2))]

If can read all blocks of R into the buffer, this reduces to:
nBlocks(R) + nBlocks(S)

74
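The following Python sketch (an added illustration, with invented relation data) shows the block-structured form of the nested loop join: the outer relation is read block by block and the inner relation is rescanned for each outer block, which is where the nBlocks(R) * nBlocks(S) term comes from.

def block_nested_loop_join(blocks_r, blocks_s, attr_r, attr_s):
    # blocks_r and blocks_s are lists of blocks; each block is a list of tuples (dicts)
    result = []
    for block_r in blocks_r:                  # one pass over R: nBlocks(R) reads
        for block_s in blocks_s:              # S rescanned for every block of R
            for tr in block_r:
                for ts in block_s:
                    if tr[attr_r] == ts[attr_s]:
                        result.append({**tr, **ts})
    return result

staff = [[{'staffNo': 'SG37', 'branchNo': 'B003'}], [{'staffNo': 'SL21', 'branchNo': 'B005'}]]
branch = [[{'branchNo': 'B003', 'city': 'Glasgow'}, {'branchNo': 'B005', 'city': 'London'}]]
print(block_nested_loop_join(staff, branch, 'branchNo', 'branchNo'))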

Indexed Nested Loop Join

If have index (or hash function) on join attributes of inner relation, can use index lookup.
For each tuple in R, use index to retrieve matching tuples of S.
Cost of scanning R is nBlocks(R), as before.
Cost of retrieving matching tuples in S depends on type of index and number of matching tuples.
If join attribute A in S is PK, cost estimate is:
nBlocks(R) + nTuples(R)*(nLevelsA(I) + 1)

75

Sort-Merge Join

For Equijoins, most efficient join is obtained when both relations are sorted on join attributes.
Can then look for qualifying tuples by merging the relations.
May need to sort relations first.
Now tuples with same join value are in order.
If assume join is *:* and each set of tuples with same join value can be held in database buffer at same time, then each block of each relation need only be read once.
76

Sort-Merge Join

Cost estimate for the sort-merge join is:
nBlocks(R) + nBlocks(S)

If a relation has to be sorted, say R, add:
nBlocks(R)*[log2(nBlocks(R))]

77
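The merge step can be sketched as follows (an added illustration with invented sample data): both inputs are already sorted on the join key, and runs of equal keys on either side are crossed to handle the *:* case.

def merge_join(r, s):
    # r and s are lists of (key, payload) pairs, each sorted on key
    result, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            key = r[i][0]
            i2 = i
            while i2 < len(r) and r[i2][0] == key:   # run of equal keys in r
                i2 += 1
            j2 = j
            while j2 < len(s) and s[j2][0] == key:   # run of equal keys in s
                j2 += 1
            for tr in r[i:i2]:
                for ts in s[j:j2]:
                    result.append((key, tr[1], ts[1]))
            i, j = i2, j2
    return result

staff = sorted([('B003', 'SG37'), ('B003', 'SG14'), ('B005', 'SL21')])
branch = sorted([('B003', 'Glasgow'), ('B005', 'London'), ('B007', 'Aberdeen')])
print(merge_join(staff, branch))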

Hash Join

For Natural join or Equijoin, a hash join may be used.
Idea is to partition both relations according to some hash function that provides uniformity and randomness.
Tuples that join end up in the corresponding pair of partitions, although a partition may hold more than one distinct join value.
Cost estimate of hash join is:
3(nBlocks(R) + nBlocks(S))
78
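A simplified in-memory form of the idea (an added sketch with invented data; real hash joins partition both relations to disk first, which is where the factor of 3 in the cost comes from) builds a hash table on one relation and probes it with the other:

from collections import defaultdict

def hash_join(r, s, attr_r, attr_s):
    buckets = defaultdict(list)
    for tr in r:                               # build phase: hash R on its join attribute
        buckets[tr[attr_r]].append(tr)
    result = []
    for ts in s:                               # probe phase: one pass over S
        for tr in buckets.get(ts[attr_s], []):
            result.append({**tr, **ts})
    return result

staff = [{'staffNo': 'SG37', 'branchNo': 'B003'}, {'staffNo': 'SL21', 'branchNo': 'B005'}]
branch = [{'branchNo': 'B003', 'city': 'Glasgow'}]
print(hash_join(staff, branch, 'branchNo', 'branchNo'))   # the single matching pair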

Projection Operation

To implement projection need following steps:
Removal of attributes that are not required.
Elimination of any duplicate tuples produced from previous step; only required if projection attributes do not include a key.
Two main approaches to eliminating duplicates:
sorting;
hashing.
79

Estimating Cardinality of Projection

When projection contains key of R, cardinality is:
nTuples(S) = nTuples(R)

If projection consists of a single non-key attribute A, estimate is:
nTuples(S) = SCA(R)

Otherwise, for projection attributes a1, …, am, could estimate cardinality as:
nTuples(S) ≤ min(nTuples(R), ∏i=1..m nDistinctai(R))

80
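The bound above is simple to compute. The sketch below is an added annotation (the function name is invented; n_distinct is the list of nDistinctai(R) statistics for the projected attributes).

import math

def estimate_projection_cardinality(n_tuples_r, n_distinct):
    # min of the relation size and the product of distinct-value counts
    return min(n_tuples_r, math.prod(n_distinct))

# e.g. projecting Staff (1000 tuples) onto position (10 values) and branchNo (50 values)
print(estimate_projection_cardinality(1000, [10, 50]))   # 500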

Duplicate Elimination using Sorting

Sort tuples of reduced relation using all remaining attributes as sort key.
Duplicates will now be adjacent and can be removed easily.
Estimated cost of sorting is:
nBlocks(R)*[log2(nBlocks(R))]
Combined cost is:
nBlocks(R) + nBlocks(R)*[log2(nBlocks(R))]
81

Duplicate Elimination using Hashing

Two phases: partitioning and duplicate elimination.
In partitioning phase, for each tuple in R, remove unwanted attributes, apply hash function to combination of remaining attributes, and write reduced tuple to the partition for the hashed value.
Two tuples that belong to different partitions are guaranteed not to be duplicates.
Estimated cost is:
nBlocks(R) + nB
82
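An in-memory sketch of the same idea (added here for illustration; the function name, partition count, and sample data are invented) hashes each reduced tuple into a partition and eliminates duplicates within each partition.

from collections import defaultdict

def project_distinct(tuples, attrs, n_partitions=4):
    partitions = defaultdict(set)
    for t in tuples:                                            # partitioning phase
        reduced = tuple(t[a] for a in attrs)                    # drop unwanted attributes
        partitions[hash(reduced) % n_partitions].add(reduced)   # duplicates collapse per partition
    return [dict(zip(attrs, r)) for p in partitions.values() for r in p]

staff = [{'staffNo': 'SG37', 'position': 'Assistant', 'branchNo': 'B003'},
         {'staffNo': 'SG14', 'position': 'Assistant', 'branchNo': 'B003'}]
print(project_distinct(staff, ['position', 'branchNo']))   # one distinct reduced tuple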

Set Operations

Can be implemented by sorting both relations on same attributes, and scanning through each of sorted relations once to obtain desired result.
Could use sort-merge join as basis.
Estimated cost in all cases is:
nBlocks(R) + nBlocks(S) + nBlocks(R)*[log2(nBlocks(R))] + nBlocks(S)*[log2(nBlocks(S))]

Could also use hashing algorithm.


83

Estimating Cardinality of Set Operations

As duplicates are eliminated when performing union, difficult to estimate cardinality, but can give an upper and lower bound as:
max(nTuples(R), nTuples(S)) ≤ nTuples(T) ≤ nTuples(R) + nTuples(S)

For set difference, can also give upper and lower bound:
0 ≤ nTuples(T) ≤ nTuples(R)
84

Aggregate Operations
SELECT AVG(salary)
FROM Staff;

To implement query, could scan entire Staff relation and maintain running count of number of tuples read and sum of all salaries.
Easy to compute average from these two running counts.

85

Aggregate Operations
SELECT AVG(salary)
FROM Staff
GROUP BY branchNo;

For grouping queries, can use sorting or hashing algorithms similar to those used for duplicate elimination.
Can estimate cardinality of result using estimates derived earlier for selection.

86
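Both aggregate queries above can be computed in a single scan. The sketch below is an added illustration with invented sample data: a running (count, sum) pair for the overall average, and a per-branch dictionary of running totals for the GROUP BY form.

from collections import defaultdict

staff = [{'branchNo': 'B003', 'salary': 12000},
         {'branchNo': 'B003', 'salary': 18000},
         {'branchNo': 'B005', 'salary': 24000}]

count, total = 0, 0
groups = defaultdict(lambda: [0, 0])          # branchNo -> [count, sum]
for t in staff:                               # single scan of the relation
    count += 1
    total += t['salary']
    g = groups[t['branchNo']]
    g[0] += 1
    g[1] += t['salary']

print(total / count)                                   # AVG(salary)
print({b: s / c for b, (c, s) in groups.items()})      # AVG(salary) ... GROUP BY branchNo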

Pipelining

Materialization - output of one operation is stored in temporary relation for processing by next.
Could also pipeline results of one operation to another without creating temporary relation.
Known as pipelining or on-the-fly processing.
Pipelining can save on cost of creating temporary relations and reading results back in again.
Generally, pipeline is implemented as separate process or thread.
87
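One way to picture pipelining (an added sketch, not from the slides) is with Python generators: each operator pulls tuples from its child on demand, so no intermediate relation is ever materialized.

def scan(relation):
    for t in relation:
        yield t

def select(pred, child):
    for t in child:
        if pred(t):
            yield t

def project(attrs, child):
    for t in child:
        yield {a: t[a] for a in attrs}

staff = [{'staffNo': 'SG37', 'position': 'Manager', 'branchNo': 'B003'},
         {'staffNo': 'SL21', 'position': 'Assistant', 'branchNo': 'B005'}]

# Pipeline for: project staffNo from the Managers in Staff, evaluated lazily.
plan = project(['staffNo'], select(lambda t: t['position'] == 'Manager', scan(staff)))
print(list(plan))   # [{'staffNo': 'SG37'}]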

Types of Trees

88

Pipelining

With linear trees, relation on one side of each operator is always a base relation.
However, as need to examine entire inner relation for each tuple of outer relation, inner relations must always be materialized.
This makes left-deep trees appealing, as inner relations are always base relations.
Reduces search space for optimum strategy, and allows QO to use dynamic processing.
Not all execution strategies are considered.
89

Query Optimization in Oracle

Oracle supports two approaches to query optimization: rule-based and cost-based.

Rule-based:
15 rules, ranked in order of efficiency. Particular access path for a table only chosen if statement contains a predicate or other construct that makes that access path available.
Score assigned to each execution strategy using these rankings, and strategy with best (lowest) score selected.
90

QO in Oracle Rule-Based

When two strategies produce same score, tie-break is resolved by making decision based on order in which tables occur in the SQL statement.

91

QO in Oracle Rule-based: Example


SELECT propertyNo
FROM PropertyForRent
WHERE rooms > 7 AND city = 'London';

Single-column access path using index on city from WHERE condition (city = 'London'). Rank 9.
Unbounded range scan using index on rooms from WHERE condition (rooms > 7). Rank 11.
Full table scan - rank 15.
Although there is an index on propertyNo, column does not appear in WHERE clause and so is not considered by optimizer.
Based on these paths, rule-based optimizer will choose to use index based on city column.

92

QO in Oracle Cost-Based

To improve QO, Oracle introduced cost-based optimizer in Oracle 7, which selects strategy that requires minimal resource use necessary to process all rows accessed by query (avoiding above tie-break anomaly).
User can select whether minimal resource usage is based on throughput or on response time, by setting the OPTIMIZER_MODE initialization parameter.
Cost-based optimizer also takes into consideration hints that the user may provide.
93

QO in Oracle Statistics

Cost-based optimizer depends on statistics for all tables, clusters, and indexes accessed by query.
It is the user's responsibility to generate these statistics and keep them current.
Package DBMS_STATS can be used to generate and manage statistics.
Whenever possible, Oracle uses a parallel method to gather statistics, although index statistics are collected serially.
EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS('Manager');
94

QO in Oracle Histograms

Previously made assumption that data values within columns of a table are uniformly distributed.
Histogram of values and their relative frequencies gives optimizer improved selectivity estimates in presence of non-uniform distribution.

95

QO in Oracle Histograms

(a) shows a uniform distribution of rooms and (b) the actual non-uniform distribution.
First can be stored compactly as a low value (1) and a high value (10), and as total count of all frequencies (in this case, 100).
96

QO in Oracle Histograms

Histogram is data structure that can improve estimates of number of tuples in result.
Two types of histogram:
width-balanced histogram, which divides data into a fixed number of equal-width ranges (called buckets), each containing a count of the number of values falling within that bucket;
height-balanced histogram, which places approximately same number of values in each bucket, so that end points of each bucket are determined by how many values are in that bucket.

97
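The two histogram types can be sketched in a few lines of Python (an added illustration; the functions and the sample rooms data are invented, sized to match the 100-value, 5-bucket example on the next slide).

def width_balanced(values, lo, hi, n_buckets):
    # count of values per equal-width range, e.g. 1-2, 3-4, ... for rooms 1..10
    width = (hi - lo + 1) / n_buckets
    counts = [0] * n_buckets
    for v in values:
        idx = min(int((v - lo) / width), n_buckets - 1)
        counts[idx] += 1
    return counts

def height_balanced(values, n_buckets):
    # end point (highest value) of each bucket holding the same number of values
    data = sorted(values)
    per_bucket = len(data) // n_buckets
    return [data[(i + 1) * per_bucket - 1] for i in range(n_buckets)]

rooms = [1]*5 + [2]*5 + [3]*10 + [4]*10 + [5]*30 + [6]*20 + [7]*10 + [8]*5 + [9]*3 + [10]*2
print(width_balanced(rooms, 1, 10, 5))   # counts per equal-width bucket
print(height_balanced(rooms, 5))         # end points of buckets of 20 values each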

QO in Oracle Histograms

(a) width-balanced histogram for rooms with 5 buckets; each bucket of equal width covers 2 values (1-2, 3-4, etc.).
(b) height-balanced histogram; height of each column is 20 (100/5).
98

QO in Oracle Viewing Execution Plan

99
