Definitions
Query processing
translation of query into low-level activities
evaluation of query
data extraction
Query optimization
selecting the most efficient query evaluation plan
Advanced Databases
Query processing and optimization
student
cid       name
00112233  Paul
00112238  Rob
00112235  Matt

takes
cid       courseid
00112233  312
00112233  395
00112235  312

course
courseid  coursename
312       Advanced DBs
395       Machine Learning
[Figure: query processing steps. query -> parser and translator -> relational algebra expression -> optimizer (uses statistics about the data) -> evaluation plan -> evaluation engine (reads the data) -> output]
select: σ
project: π
union: ∪
difference: -
product: ×
join: ⋈
π_name(σ_{cid < 00112235}(student))
π_name(σ_{coursename = Advanced DBs}((student ⋈_cid takes) ⋈_courseid course))
evaluated over the relations student(cid, name), takes(cid, courseid), and course(courseid, coursename)
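A minimal Python sketch of how the two example expressions can be evaluated on the sample relations. The helper names `select`, `project`, and `join` are illustrative assumptions, not part of the slides. Relations are modelled as lists of dicts, and cid values are kept as strings so the leading zeros survive.

```python
# Sample relations from the slides (cid kept as a string to preserve leading zeros).
student = [
    {"cid": "00112233", "name": "Paul"},
    {"cid": "00112238", "name": "Rob"},
    {"cid": "00112235", "name": "Matt"},
]
takes = [
    {"cid": "00112233", "courseid": "312"},
    {"cid": "00112233", "courseid": "395"},
    {"cid": "00112235", "courseid": "312"},
]
course = [
    {"courseid": "312", "coursename": "Advanced DBs"},
    {"courseid": "395", "coursename": "Machine Learning"},
]

def select(rel, pred):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep only the listed attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def join(r, s, attr):
    """Equi-join on one shared attribute (naive nested loops)."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

# First expression: names of students with cid < 00112235
q1 = project(select(student, lambda t: t["cid"] < "00112235"), ["name"])

# Second expression: names of students taking Advanced DBs
q2 = project(
    select(
        join(join(student, takes, "cid"), course, "courseid"),
        lambda t: t["coursename"] == "Advanced DBs",
    ),
    ["name"],
)
print(q1, q2)
```

On the sample data the first query returns Paul, and the second returns Paul and Matt.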
Why Optimize?
Many alternative options to evaluate a query
π_name(σ_{coursename = Advanced DBs}((student ⋈_cid takes) ⋈_courseid course))
π_name((student ⋈_cid takes) ⋈_courseid σ_{coursename = Advanced DBs}(course))
Evaluation plans
[Figure: example evaluation plan trees over student, takes, and course. Recoverable labels: σ_{coursename = Advanced DBs}; join on courseid (index-nested loop); join on cid (hash join); name=Paul]
Estimating Cost
What needs to be considered:
Disk I/Os
sequential
random
CPU time
Network communication
Selection (1/2)
Linear search
read all pages, find records that match (assuming equality search)
average cost:
non-key: B_R, key: 0.5 * B_R
Binary search
on ordered field
average cost: log_2(B_R) + m
m additional pages to be read
m = ceil( SC(A,R) / F_R ) - 1
Primary/Clustered Index
average cost:
single record: HT_i + 1
multiple records: HT_i + ceil( SC(A,R) / F_R )
Selection (2/2)
Secondary Index
average cost:
key field: HT_i + 1
non-key field
worst case: HT_i + SC(A,R)
linear search more desirable if many matching records
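The selection cost formulas can be compared with a small calculator. This is a sketch under stated assumptions: B_R is read as the number of pages of R, SC(A,R) as the number of records matching the selection on A, F_R as the number of records per page, and HT_i as the height of the index. The function names and the sample numbers are hypothetical.

```python
import math

def linear_search_cost(b_r, on_key):
    """Average pages read: half the file for a key, the whole file otherwise."""
    return 0.5 * b_r if on_key else b_r

def binary_search_cost(b_r, sc, f_r):
    """log_2(B_R) to locate the first match plus m further pages of matches."""
    m = math.ceil(sc / f_r) - 1
    return math.log2(b_r) + m

def primary_index_cost(ht_i, sc, f_r):
    """Index traversal plus the consecutive pages holding the matches.
    With sc = 1 this reduces to HT_i + 1 (single record)."""
    return ht_i + math.ceil(sc / f_r)

def secondary_index_cost(ht_i, sc, on_key):
    """Key field: HT_i + 1. Non-key field, worst case: one page per match."""
    return ht_i + 1 if on_key else ht_i + sc

# Hypothetical relation: 1024 pages, 20 records per page, index of height 3.
for sc in (50, 5000):
    print(sc,
          linear_search_cost(1024, on_key=False),
          binary_search_cost(1024, sc, 20),
          primary_index_cost(3, sc, 20),
          secondary_index_cost(3, sc, on_key=False))
```

With 5000 matching records the secondary index costs more page reads than scanning all 1024 pages, which is the case where linear search is preferable.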
conjunctive selections: σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(R)
multiple indices
disjunctive selections: σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(R)
multiple indices
union of RIDs
linear search
Sorting
efficient evaluation for many operations
required by query:
SELECT cid,name FROM student ORDER BY name
implementations
internal sorting (if records fit in memory)
external sorting
// N pages allocated
Sort-Merge Example
[Figure: sort-merge example on a file of 12 records, each shown as key and value]
initial sorted runs:
R1: a 12, d 95, x 44
R2: f 12, o 73, s 95
R3: e 87, n 67, t 45
R4: b 38, v 22, z 11
after the first merge pass:
a 12, d 95, f 12, o 73, s 95, x 44
b 38, e 87, n 67, t 45, v 22, z 11
after the second merge pass:
a 12, b 38, d 95, e 87, f 12, n 67, o 73, s 95, t 45, v 22, x 44, z 11
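The sort-merge example can be replayed with a short simulation. This is a sketch, assuming memory holds 3 records (so runs have 3 records and merges are 2-way) and assuming an input order for the file, since only the runs and the merged output are clearly recoverable from the figure. The function names are hypothetical.

```python
import heapq

def make_runs(records, memory_size):
    """Sort stage: read memory_size records at a time and emit each chunk as a sorted run."""
    return [sorted(records[i:i + memory_size])
            for i in range(0, len(records), memory_size)]

def merge_pass(runs, fan_in):
    """One merge pass: merge every group of fan_in runs into a single longer run."""
    return [list(heapq.merge(*runs[i:i + fan_in]))
            for i in range(0, len(runs), fan_in)]

def external_sort(records, memory_size):
    """Return the sorted records and the number of merge passes needed."""
    runs = make_runs(records, memory_size)
    fan_in = max(2, memory_size - 1)  # one memory slot is kept for the output
    passes = 0
    while len(runs) > 1:
        runs = merge_pass(runs, fan_in)
        passes += 1
    return (runs[0] if runs else []), passes

# Assumed file order; the 12 records themselves are those of the example.
file_records = [("d", 95), ("a", 12), ("x", 44), ("s", 95), ("f", 12), ("o", 73),
                ("t", 45), ("n", 67), ("e", 87), ("z", 11), ("v", 22), ("b", 38)]

initial_runs = make_runs(file_records, 3)
sorted_records, passes = external_sort(file_records, 3)
print(initial_runs)
print(sorted_records, passes)
```

Under these assumptions the simulation produces the four runs R1 to R4 and needs two merge passes.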
Sort-Merge cost
B_R: the number of pages of R
Sort stage: 2 * B_R
read/write relation
Merge stage:
initially B_R / M runs to be merged
thus, total number of passes: log_{M-1}(B_R / M)
Total cost:
2 * B_R + 2 * B_R * log_{M-1}(B_R / M) - B_R
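A small cost calculator for the sort-merge formula. It is a sketch under these assumptions: M is the number of memory pages, the run count and pass count are rounded up to whole numbers, and the write of the final pass is not counted (the trailing B_R term is read as a subtraction). The function name is hypothetical.

```python
import math

def sort_merge_cost(b_r, m):
    """Page transfers to sort a relation of b_r pages with m memory pages."""
    if m < 3:
        raise ValueError("need at least 3 memory pages for a 2-way merge")
    runs = math.ceil(b_r / m)          # initial runs produced by the sort stage
    passes = 0
    while runs > 1:                    # each pass merges m - 1 runs at a time
        runs = math.ceil(runs / (m - 1))
        passes += 1
    # sort stage (read + write) + merge passes (read + write) - final write
    return 2 * b_r + 2 * b_r * passes - b_r

print(sort_merge_cost(12, 3))   # 12 pages, 3 memory pages: 4 runs, 2 passes
print(sort_merge_cost(100, 5))  # 100 pages, 5 memory pages: 20 runs, 3 passes
```

Counting passes with an integer loop avoids the rounding problems of a floating-point logarithm.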
Projection
π_{A1, A2, ...}(R)
remove unwanted attributes
scan and drop attributes
cost
initial scan + sorting + final scan (sorting brings duplicate records together so that the final scan can remove them)
Join
π_name(σ_{coursename = Advanced DBs}((student ⋈_cid takes) ⋈_courseid course))
implementations
Indexed nested loop join
R ⋈ S
Index on inner relation (S)
for each tuple in outer relation (R) probe index of inner relation
Costs:
B_R + N_R * c
c: the cost of index-based selection of inner relation
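The probing loop can be sketched in Python, with a dictionary standing in for the index on the inner relation. The function names, the use of takes as the outer relation and course as the inner one, and the numbers passed to the cost function are assumptions made for illustration.

```python
from collections import defaultdict

takes = [
    {"cid": "00112233", "courseid": "312"},
    {"cid": "00112233", "courseid": "395"},
    {"cid": "00112235", "courseid": "312"},
]
course = [
    {"courseid": "312", "coursename": "Advanced DBs"},
    {"courseid": "395", "coursename": "Machine Learning"},
]

def build_index(rel, attr):
    """Stand-in for an index on the inner relation: attribute value -> tuples."""
    index = defaultdict(list)
    for t in rel:
        index[t[attr]].append(t)
    return index

def index_nested_loop_join(outer, inner_index, attr):
    """For each outer tuple, probe the index of the inner relation."""
    result, probes = [], 0
    for r in outer:
        probes += 1
        for s in inner_index.get(r[attr], []):
            result.append({**r, **s})
    return result, probes

def inl_join_cost(b_r, n_r, c):
    """B_R + N_R * c: scan the outer relation, one index selection per outer tuple."""
    return b_r + n_r * c

joined, probes = index_nested_loop_join(takes, build_index(course, "courseid"), "courseid")
print(joined, probes)
print(inl_join_cost(b_r=100, n_r=1000, c=4))  # hypothetical sizes
```

The number of probes equals the number of outer tuples, which is why the N_R * c term dominates when the outer relation is large.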
Sort-merge join
R ⋈ S
Relations sorted on the join attribute
Merge sorted relations
pointers to first record in each relation
read in a group of records of S with the same values in the join attribute
read records of R and process
[Figure: merging two relations sorted on the join attribute. One holds (d D), (e E), (v V), (x X); the other holds (e 67), (e 87), (n 11), (v 22), (z 38)]
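The merge step can be sketched as follows, using the records from the figure. Which of the two relations is R and which is S is an assumption here, as are the function name and the tuple layout (join attribute first).

```python
def sort_merge_join(r, s):
    """Join two lists of tuples on their first field using the merge technique."""
    r, s = sorted(r), sorted(s)      # both inputs must be sorted on the join attribute
    out = []
    i = j = 0                        # pointers to the first record of each relation
    while i < len(r) and j < len(s):
        if r[i][0] < s[j][0]:
            i += 1
        elif r[i][0] > s[j][0]:
            j += 1
        else:
            value = s[j][0]
            # read in the group of S records sharing this join value
            j_end = j
            while j_end < len(s) and s[j_end][0] == value:
                j_end += 1
            # read the matching R records and combine each with the whole group
            while i < len(r) and r[i][0] == value:
                for t in s[j:j_end]:
                    out.append(r[i] + t[1:])
                i += 1
            j = j_end
    return out

r = [("d", "D"), ("e", "E"), ("v", "V"), ("x", "X")]
s = [("e", 67), ("e", 87), ("n", 11), ("v", 22), ("z", 38)]
result = sort_merge_join(r, s)
print(result)
```

Each relation is scanned once after sorting, since neither pointer ever moves backwards.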
Hash join
R ⋈ S
use hash function h1 on joining attribute to map records to partitions that fit in memory
records of R are partitioned into R_0 ... R_{n-1}
records of S are partitioned into S_0 ... S_{n-1}
[Figure: partitions R_0, R_1, ..., R_{n-1} of R paired with partitions S_0, S_1, ..., S_{n-1} of S]
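A minimal sketch of the partitioning idea on the sample student and takes relations. The toy hash function `h1`, the number of partitions, and the in-memory dictionary built per partition are assumptions made for illustration; a real system would write the partitions to disk between the two phases.

```python
student = [
    {"cid": "00112233", "name": "Paul"},
    {"cid": "00112238", "name": "Rob"},
    {"cid": "00112235", "name": "Matt"},
]
takes = [
    {"cid": "00112233", "courseid": "312"},
    {"cid": "00112233", "courseid": "395"},
    {"cid": "00112235", "courseid": "312"},
]

def h1(value, n):
    """Deterministic toy hash on the join attribute (an assumption, any hash works)."""
    return sum(str(value).encode()) % n

def partition(rel, attr, n):
    """Map every record to one of n partitions using h1 on the join attribute."""
    parts = [[] for _ in range(n)]
    for t in rel:
        parts[h1(t[attr], n)].append(t)
    return parts

def hash_join(r, s, attr, n):
    """Join R_i with S_i only: matching records always share a partition number."""
    result = []
    for r_i, s_i in zip(partition(r, attr, n), partition(s, attr, n)):
        table = {}                                  # in-memory table on S_i
        for t in s_i:
            table.setdefault(t[attr], []).append(t)
        for t in r_i:                               # probe with the records of R_i
            for match in table.get(t[attr], []):
                result.append({**t, **match})
    return result

joined = hash_join(student, takes, "cid", n=2)
print(sorted((t["name"], t["courseid"]) for t in joined))
```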
Exercise: joins
R ⋈ S
N_R = 215
B_R = 100
N_S = 26
B_S = 30
B+ index on S
order 4
full nodes
Advanced Databases
Evaluation
evaluate multiple operations in a plan
materialization
pipelining
[Figure: evaluation plan. π_name on top of σ_{coursename = Advanced DBs}, on top of the join on courseid (index-nested loop) between course and the join on cid (hash join) of student and takes]
Materialization
create and read temporary relations
create implies writing to disk
more page writes
[Figure: evaluation plan with π_name, σ_{coursename = Advanced DBs}, join on courseid (index-nested loop), and join on cid (hash join) over student, takes, and course]
Pipelining (1/2)
creating a pipeline of operations
reduces number of read-write operations
implementations
demand-driven - data pull
producer-driven - data push
[Figure: evaluation plan with π_name, σ_{coursename = Advanced DBs}, join on courseid (index-nested loop), and join on cid (hash join) over student, takes, and course]
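Demand-driven (pull) pipelining maps naturally onto Python generators: each operator asks its child for the next tuple only when its own consumer asks for one. The sketch below contrasts this with materialization, where every intermediate result is built in full first. The operator names and the `reads` counter are hypothetical, introduced only to make the difference observable.

```python
course = [
    {"courseid": "312", "coursename": "Advanced DBs"},
    {"courseid": "395", "coursename": "Machine Learning"},
]
reads = []  # records every tuple fetched from the base relation

def scan(rel):
    for t in rel:
        reads.append(t)
        yield t

def select(child, pred):
    for t in child:
        if pred(t):
            yield t

def project(child, attrs):
    for t in child:
        yield {a: t[a] for a in attrs}

def is_advanced_dbs(t):
    return t["coursename"] == "Advanced DBs"

# Pipelined: nothing runs until the consumer pulls a tuple.
pipeline = project(select(scan(course), is_advanced_dbs), ["courseid"])
first = next(pipeline)
reads_after_first = len(reads)   # only one base tuple has been read so far

# Materialized: each operator finishes and stores its full output (temp1, temp2).
reads.clear()
temp1 = list(select(scan(course), is_advanced_dbs))
temp2 = list(project(iter(temp1), ["courseid"]))
reads_materialized = len(reads)  # the whole relation was read before projecting

print(first, reads_after_first, temp2, reads_materialized)
```

In a database the temporary lists would be relations written to and read back from disk, which is where the extra page writes of materialization come from.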
Pipelining (2/2)
can pipelining always be used?
any algorithm?
cost of R ⋈ S
materialization and hash join: B_R + 3(B_R + B_S)
pipelining and indexed nested loop join: N_R * HT_i
[Figure: plan joining R = student ⋈_cid takes with σ_{coursename = Advanced DBs}(course) on courseid, with edges labelled pipelined and materialized]
Query Optimization
R ⋈ S ⋈ T: 12 possible join orders
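The count of 12 can be checked with a few lines of Python. The closed form (2(n-1))! / (n-1)! is the usual textbook count of join orders for n relations and is not stated on the slide, so it is labelled here as an assumption and cross-checked by counting ordered binary join trees directly. Both function names are hypothetical.

```python
from functools import lru_cache
from math import comb, factorial

def join_orders_formula(n):
    """Textbook closed form (assumption): (2(n-1))! / (n-1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)

@lru_cache(maxsize=None)
def join_orders_counted(n):
    """Count join trees directly: choose which k relations go to the left input,
    then count the orders of each side independently."""
    if n == 1:
        return 1
    return sum(comb(n, k) * join_orders_counted(k) * join_orders_counted(n - k)
               for k in range(1, n))

for n in range(2, 8):
    print(n, join_orders_formula(n), join_orders_counted(n))
```

The count grows quickly with n, which is why an optimizer cannot simply cost every join order for large queries.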
Cost estimation
operation (σ, π, ⋈)
implementation
size of inputs
size of outputs
sorting
[Figure: evaluation plan with π_name, σ_{coursename = Advanced DBs}, join on courseid (index-nested loop), and join on cid (hash join) over student, takes, and course]
σ_{A ≤ v}(R)
N_R * (v - min(A,R)) / (max(A,R) - min(A,R))
σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(R)
multiplying probabilities
N_R * [(s_1 / N_R) * (s_2 / N_R) * ... * (s_n / N_R)]
σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(R)
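The size estimates for selections can be sketched as small functions. Here s_i is read as the number of tuples satisfying condition θ_i on its own, and values are assumed to be uniformly distributed. The disjunction estimate, N_R * [1 - (1 - s_1/N_R) * ... * (s_n term)], is the standard textbook complement of the conjunction formula and is an assumption, since the formula on the slide did not survive. All function names and numbers are hypothetical.

```python
import math

def range_selection_size(n_r, v, min_a, max_a):
    """Estimated size of selecting A <= v, assuming uniformly distributed values."""
    return n_r * (v - min_a) / (max_a - min_a)

def conjunction_size(n_r, sizes):
    """Multiply the individual selection probabilities s_i / N_R."""
    return n_r * math.prod(s / n_r for s in sizes)

def disjunction_size(n_r, sizes):
    """Assumption (textbook form): 1 minus the probability that no condition holds."""
    return n_r * (1 - math.prod(1 - s / n_r for s in sizes))

# Hypothetical relation with 1000 tuples and A ranging over 0..100.
print(range_selection_size(1000, v=30, min_a=0, max_a=100))
print(conjunction_size(1000, [500, 200]))
print(disjunction_size(1000, [500, 200]))
```

Multiplying probabilities treats the conditions as independent, which is the simplifying assumption behind both combined estimates.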
R ⋈ S
R ∩ S = ∅: N_R * N_S
R ∩ S is a key for R: maximum output size is N_S
R ∩ S is a foreign key referencing R: N_S
R ∩ S = {A}, neither key of R nor S
N_R * N_S / V(A,S)
N_S * N_R / V(A,R)
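For the case where the join attribute A is a key of neither relation, the two estimates can be computed from the data and compared with the real join size. This is a sketch: V(A,R) is read as the number of distinct values of A in R, and the two toy relations are hypothetical.

```python
def distinct_values(rel, attr):
    """V(A, R): number of distinct values of attribute A in relation R."""
    return len({t[attr] for t in rel})

def join_size_estimates(r, s, attr):
    """Return (N_R * N_S / V(A,S), N_R * N_S / V(A,R))."""
    n_r, n_s = len(r), len(s)
    return (n_r * n_s / distinct_values(s, attr),
            n_r * n_s / distinct_values(r, attr))

def actual_join_size(r, s, attr):
    return sum(1 for a in r for b in s if a[attr] == b[attr])

# Hypothetical relations in which A is not a key on either side.
r = [{"A": 1}, {"A": 1}, {"A": 2}, {"A": 3}]
s = [{"A": 1}, {"A": 1}, {"A": 2}]

estimates = join_size_estimates(r, s, "A")
actual = actual_join_size(r, s, "A")
print(estimates, actual)
```

On this data the two estimates bracket the true size; textbooks commonly suggest taking the smaller of the two as the more likely figure.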
Expression Equivalence
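conjunctive selection decomposition
σ_{θ1 ∧ θ2}(R) = σ_{θ1}(σ_{θ2}(R))
commutativity of selection
σ_{θ1}(σ_{θ2}(R)) = σ_{θ2}(σ_{θ1}(R))
combining selection with join and product
σ_{θ1}(R × S) = R ⋈_{θ1} S
commutativity of joins
R ⋈ S = S ⋈ R
distribution of selection over join
σ_{θ1 ∧ θ2}(R ⋈ S) = σ_{θ1}(R) ⋈ σ_{θ2}(S), where θ1 uses only attributes of R and θ2 only attributes of S
distribution of projection over join
π_{A1, A2}(R ⋈ S) = π_{A1}(R) ⋈ π_{A2}(S), where A1 and A2 include the join attributes
associativity of joins: R ⋈ (S ⋈ T) = (R ⋈ S) ⋈ T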
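Equivalences of this kind can be checked on the sample relations. The sketch below verifies that pushing the selection on coursename down to course gives the same result as applying it after the joins, while producing a smaller intermediate input. It also checks conjunctive decomposition and commutativity of selection on student. The helper names and the two predicates on student are assumptions made for illustration.

```python
student = [
    {"cid": "00112233", "name": "Paul"},
    {"cid": "00112238", "name": "Rob"},
    {"cid": "00112235", "name": "Matt"},
]
takes = [
    {"cid": "00112233", "courseid": "312"},
    {"cid": "00112233", "courseid": "395"},
    {"cid": "00112235", "courseid": "312"},
]
course = [
    {"courseid": "312", "coursename": "Advanced DBs"},
    {"courseid": "395", "coursename": "Machine Learning"},
]

def select(rel, pred):
    return [t for t in rel if pred(t)]

def join(r, s, attr):
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

def is_adv(t):
    return t["coursename"] == "Advanced DBs"

# Selection applied after both joins.
all_joined = join(join(student, takes, "cid"), course, "courseid")
late = select(all_joined, is_adv)

# Selection pushed down to course before the join.
filtered_course = select(course, is_adv)
early = join(join(student, takes, "cid"), filtered_course, "courseid")

# Conjunctive decomposition and commutativity of selection on student.
def p1(t):
    return t["cid"] < "00112238"

def p2(t):
    return t["name"] != "Paul"

both = select(student, lambda t: p1(t) and p2(t))
nested_12 = select(select(student, p2), p1)
nested_21 = select(select(student, p1), p2)

print(len(all_joined), len(filtered_course), late == early)
```

Both plans return the same tuples, but the pushed-down plan joins against one course tuple instead of two, which is the effect the optimizer is looking for.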
[Figure: two equivalent evaluation plans for the example query, each with π_name at the root and an index-nested loop join on courseid. In the first, σ_{coursename = Advanced DBs} is applied after student, takes, and course have been joined; in the second, it is applied to course before the join]
Summary
Estimating the cost of a single operation
Estimating the cost of a query plan
Optimization
choose the most efficient plan