
Database Performance Issues & Query Optimisation


CT004-3.5-3-Advanced Database Systems Database Performance & QO
Topic & Structure of Lesson
In this lecture we will look at:
- Database performance issues
- Indexing
- Steps involved in query optimization (already covered most of this)
- Tips for SQL performance tuning
- An overview of distributed query processing
Learning Outcomes
By the end of this lesson you should be able to:
- Discuss database performance issues
- Explain how a query can be optimized
- Use tips for tuning SQL
Introduction to Performance Tuning
You need an understanding/awareness of:
- Transforming models to implementation
- The relational model (for a number of reasons)
- Technology factors
Introduction to Performance Tuning
Database design starts with modeling the requirement and producing a conceptual model
In implementation, a number of trade-offs need to be considered to:
- Satisfy today's needs for information
- Satisfy the above in reasonable time (performance requirements)
- Satisfy anticipated or unanticipated user demands, e.g. ad-hoc queries
- Be capable of being extended
- Be easy to modify in changing hardware & software environments
Tuning
- Application (analyst/programmer)
- DBA: systems-level tuning
- Vendor: product-specific
Investigation
- Monitoring DB statistics, e.g. logs, transaction times
- Simulation
Query execution plan: describes how the system evaluates the query
- Explain facility (Oracle)
- Execution Plan facility (SQL Server)
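The two investigation techniques above (timing a statement and asking the engine for its plan) can be sketched together. This is a minimal illustration only: it assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for the Oracle and SQL Server facilities named on the slide, and the helper name plan_and_time is hypothetical. The sample rows are the student records used later in the indexing slides.

```python
import sqlite3
import time


def plan_and_time(conn, sql, params=()):
    """Return (plan_lines, elapsed_seconds, rows) for one query.

    Hypothetical helper: SQLite stands in for the engines on the slide.
    """
    # EXPLAIN QUERY PLAN asks SQLite how it intends to evaluate the statement;
    # the fourth column of each row is the human-readable plan step.
    plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    return plan, elapsed, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(50000, "Dave", 3.3), (53650, "Smith", 3.8), (53666, "Jones", 3.4),
     (53688, "Smith", 3.2), (53831, "Madayan", 1.8), (53832, "Guldu", 2.0)],
)

plan, elapsed, rows = plan_and_time(
    conn, "SELECT name FROM students WHERE gpa > ?", (3.3,))
print(plan)  # one entry per plan step; exact wording varies by SQLite version
print(f"{len(rows)} rows in {elapsed:.6f} s")
```

With no index on gpa, the plan reports a scan of the whole table, which is the situation the indexing slides go on to address.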
Indexing
Database Management Systems - TMC Computer School, August 1999
Tuning: Table Indexing
Index is a means of expediting retrieval of data
- e.g. Find all students with gpa > 3.3
- Without an index, may need to scan the entire table

Index enables finding data quickly without having to scan the whole table

Indexes are built on a column(s) (search key)
- 1 column or a combination of columns of a table
- By default, the primary key field is indexed
- Fields marked as 'unique' are indexed

Index consists of a set of entries pointing to the locations of each search key
B-Tree Index Example
Commonly used with attribute tables as well as graphic-attribute tables (CAD data structures)
Binary coding reduces the search list by streaming down the tree.
A balanced tree is best.
[Figure: B-tree built over the primary key numbers 37, 12, 49, 59, 19, 44 and 3; branches are labelled High and Low]
Index
Indexes take up storage, so choose carefully
- Index popular, frequently queried columns
- Can be added after understanding query patterns
- WHERE conditions should be tuned to take advantage of the indexes
- Rethink: attributes with a high badness factor (few distinct values), e.g. gender
- Full table scan may be better if hit rate > 20%

Indexes need maintenance
- Indexes need to be periodically refreshed
- Indexes are normally refreshed during downtime
- Indexes aren't technically necessary for operation
- Indexes must be maintained by the DB administrator
- Pruned, refreshed, analyzed...
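The 20% hit-rate rule of thumb above can be written as a small check. This is a sketch under stated assumptions: the function names and the row counts in the usage lines are hypothetical, and the threshold is simply the figure quoted on the slide, not a property of any particular optimizer.

```python
def hit_rate(matching_rows, total_rows):
    """Fraction of the table that a predicate is expected to return."""
    return matching_rows / total_rows


def prefer_full_scan(matching_rows, total_rows, threshold=0.20):
    """Apply the slide's rule of thumb: above the threshold, scan the table."""
    return hit_rate(matching_rows, total_rows) > threshold


# Hypothetical table of 10,000 customers.
# A gender predicate matches about half the rows, so an index helps little.
print(prefer_full_scan(5_000, 10_000))
# A last-name predicate matches 40 rows, so an index lookup is worthwhile.
print(prefer_full_scan(40, 10_000))
```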
Create Index
CREATE INDEX index_name
ON table_name (column_name);

CREATE INDEX IDX_CUSTOMER_LAST_NAME
ON CUSTOMER (Last_Name);

CREATE INDEX IDX_CUSTOMER_LOCATION
ON CUSTOMER (City, Country);

CREATE TABLE employee_records (
    name VARCHAR(50),
    employeeID INT,
    INDEX (employeeID)
);

ALTER
DROP
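The CREATE INDEX statements above can be tried out and inspected from a script. The sketch below assumes SQLite through Python's sqlite3 module (the slide's inline INDEX clause is MySQL syntax and is not used here); the CUSTOMER column types and the helper index_columns are assumptions made for illustration. It lists each index with its columns, then drops one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the slide only names the columns.
conn.execute("CREATE TABLE CUSTOMER (Last_Name TEXT, City TEXT, Country TEXT)")
conn.execute("CREATE INDEX IDX_CUSTOMER_LAST_NAME ON CUSTOMER (Last_Name)")
conn.execute("CREATE INDEX IDX_CUSTOMER_LOCATION ON CUSTOMER (City, Country)")


def index_columns(conn, table):
    """Map each index on `table` to its ordered list of column names."""
    result = {}
    # PRAGMA index_list rows: (seq, name, unique, origin, partial)
    for row in conn.execute(f"PRAGMA index_list({table})").fetchall():
        name = row[1]
        # PRAGMA index_info rows: (seqno, cid, column_name)
        info = conn.execute(f"PRAGMA index_info({name})").fetchall()
        result[name] = [column for _, _, column in info]
    return result


before = index_columns(conn, "CUSTOMER")
conn.execute("DROP INDEX IDX_CUSTOMER_LOCATION")
after = index_columns(conn, "CUSTOMER")
print(before)
print(after)
```

The composite index keeps its column order (City, then Country), which matters because a composite index is mainly useful for predicates that constrain its leading column.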
Types of Indexes
Clustered vs. Unclustered
- Clustered: ordering of data records is the same as the ordering of data entries in the index
- Unclustered: data records are in a different order from the index
Primary vs. Secondary
- Primary: index on fields that include the primary key
- Secondary: other indexes
Unique vs. Non-unique
- Non-unique, e.g. Lastname
Example: Clustered Index
Data records sorted by sid; index on sid

sid    name     gpa
50000  Dave     3.3
53650  Smith    3.8
53666  Jones    3.4
53688  Smith    3.2
53831  Madayan  1.8
53832  Guldu    2.0

Index entries (sid): 50000, 53600, 53800
Example: Unclustered Index
Data records sorted by sid, but index on gpa

sid    name     gpa
50000  Dave     3.3
53650  Smith    3.8
53666  Jones    3.4
53688  Smith    3.2
53831  Madayan  1.8
53832  Guldu    2.0

Index entries (gpa): 1.8, 2.0, 3.2, 3.3, 3.4, 3.8
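The unclustered example can be modelled in a few lines to show what "entries pointing to locations" means in practice. This is a toy in-memory sketch, not how any DBMS stores its indexes: the data file is a Python list kept in sid order, and the index is a sorted list of (gpa, position) pairs searched with binary search. The function name is hypothetical.

```python
from bisect import bisect_right

# Data file: records stored in sid order, as in the example table.
students = [
    (50000, "Dave", 3.3),
    (53650, "Smith", 3.8),
    (53666, "Jones", 3.4),
    (53688, "Smith", 3.2),
    (53831, "Madayan", 1.8),
    (53832, "Guldu", 2.0),
]

# Unclustered index on gpa: entries sorted by gpa, each pointing at the
# position of its record in the data file (which is NOT in gpa order).
gpa_index = sorted((gpa, pos) for pos, (_, _, gpa) in enumerate(students))
gpa_keys = [gpa for gpa, _ in gpa_index]


def students_with_gpa_above(threshold):
    """Answer 'gpa > threshold' via the index instead of scanning the file."""
    first = bisect_right(gpa_keys, threshold)  # first entry strictly above
    return [students[pos] for _, pos in gpa_index[first:]]


print(students_with_gpa_above(3.3))
```

The matching records come back in gpa order, and each one needs a separate jump into the data file; with a clustered index those records would sit next to each other.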
Using Indexes
- Need to choose attributes to index wisely!
- Examine transaction requirements
- Update vs. query?
- Volume of rows updated or queried
- Frequency of a query
- Transaction rates
- User priorities
- Index usage (Oracle)
- Not used with NULLs in the WHERE clause
- Nor used if arithmetic is applied to the indexed attribute
- What queries could benefit most from an index?
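The point about arithmetic on an indexed attribute can be checked by comparing query plans. The slide describes Oracle; the sketch below assumes SQLite via Python's sqlite3 module, which behaves similarly for this case, and the index name idx_students_gpa is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (sid INTEGER PRIMARY KEY, name TEXT, gpa REAL)")
conn.execute("CREATE INDEX idx_students_gpa ON students (gpa)")


def plan(sql):
    """Return SQLite's plan for `sql` as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# The indexed column stands alone on one side of the comparison.
bare = plan("SELECT * FROM students WHERE gpa > 3.3")
# Same condition, but arithmetic is applied to the indexed column.
wrapped = plan("SELECT * FROM students WHERE gpa * 2 > 6.6")

print(bare)
print(wrapped)
```

In the first plan the index on gpa is named; in the second it is not, and the table is scanned. Moving the arithmetic to the constant side of the comparison keeps the index usable.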
Query Optimization
Query Optimization
Objective: Find the optimum set of access paths to retrieve the required data
- Applies to updates and queries
Reasons for automating the optimization process:
- Machine can use more information
- Re-optimization is easier following data re-organization
- Optimizer can evaluate more solutions than a non-automated process (i.e. the user)
- Automation makes expertise more widely available
Still very dependent on programmer skills
- Query syntax dramatically affects access path choices, e.g. whether or not an index is used
Query Optimization
Still scope for human intervention
- Still very dependent on programmer skills because query syntax dramatically affects the access path choices, e.g. whether or not an index is used
The term 'optimization' can be an overclaim: the chosen plan is not guaranteed to be the best possible one
Example
Get names of suppliers who supply part P2:
SELECT DISTINCT s.sname FROM s, sp
WHERE s.s# = sp.s#
AND sp.p# = 'P2';
The database contains 100 suppliers and 10,000 shipments, 50 of which are for part P2
Consider how the query would be evaluated without optimization
Unoptimized
1. Compute the Cartesian product of s and sp
   - involves reading the 10,000 sp tuples 100 times, resulting in 1,000,000 tuple reads
   - the product will contain 1,000,000 tuples, which will need to be written back to disk
2. Apply the restriction in the WHERE clause
   - involves 1,000,000 tuple reads but gives a 50-row result, which can stay in memory
3. Project the result of step 2 over sname
   - gives the final result, containing at most 50 tuples, which again can remain in memory
Optimized
1. Restrict SP to those tuples containing P2
   - involves 10,000 tuple reads, but the result has 50 rows, which stay in memory
2. Join the result of step 1 to relation S over s#
   - 100 tuple reads, resulting in 50 tuples, still in memory
3. Project the result of step 2 over sname
   - gives a final result of at most 50 tuples
Optimized version is about 300 times faster in terms of tuple I/O.
- Unoptimized version needs about 3,000,000 I/Os, whereas the optimized version needs around 10,100
- A restriction followed by a join, instead of a product followed by the restriction, has produced a dramatic improvement
Example: Indexing
If SP was indexed or hashed on p#, the tuples read in step 1 would be 50 rather than 10,000, and the optimised version would be around 20,000 times faster than the unoptimized one
Also, I/Os in step 2 could be reduced to at most 50 (e.g. with an index on S.s#)
In practice, block I/Os are what count.
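The tuple I/O figures quoted for the three strategies can be reproduced with simple arithmetic. The sketch below uses only the numbers given in the example (100 suppliers, 10,000 shipments, 50 shipments of P2); the variable names are chosen for illustration, and the indexed case assumes step 2 still reads all 100 supplier tuples.

```python
suppliers = 100        # tuples in S
shipments = 10_000     # tuples in SP
p2_shipments = 50      # SP tuples for part P2

# Unoptimized: build the product (read), write it to disk, read it back
# to apply the restriction.
product_size = suppliers * shipments
unoptimized_io = product_size + product_size + product_size

# Optimized: restrict SP first (read all of SP), then join with S.
optimized_io = shipments + suppliers

# With SP indexed or hashed on p#: read only the 50 matching SP tuples.
indexed_io = p2_shipments + suppliers

print(unoptimized_io, optimized_io, indexed_io)
print(round(unoptimized_io / optimized_io))  # speed-up from reordering
print(round(unoptimized_io / indexed_io))    # speed-up with the index too
```

The first ratio comes out just under 300 and the second at 20,000, matching the figures on the slides.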
SQL Tuning
SQL Optimizations: Basics
- Use column names instead of * in SELECT
- Try to minimize the number of subquery blocks in your query
- Try to use UNION ALL in place of UNION
- Avoid != or NOT or <>
  The optimizer may be unable to use an index even if one exists
- Avoid DISTINCT
- Avoid HAVING where a WHERE clause will do, e.g. write the query as:
SELECT subject, count(subject) FROM student_details
WHERE subject != 'Science' AND subject != 'Maths'
GROUP BY subject;
Instead of:
SELECT subject, count(subject) FROM student_details
GROUP BY subject
HAVING subject != 'Science' AND subject != 'Maths';

http://beginner-sql-tutorial.com/sql-query-tuning.htm
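A rewrite like the WHERE/HAVING one above is only safe if both forms return the same rows, which is easy to confirm on sample data. The sketch assumes SQLite via Python's sqlite3 module; the student_details columns and rows are invented for the test, and an ORDER BY is added to both queries so the results can be compared directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; the slide only names the table and column.
conn.execute("CREATE TABLE student_details (name TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO student_details VALUES (?, ?)",
    [("Ann", "Science"), ("Ben", "Maths"), ("Cy", "History"),
     ("Di", "History"), ("Ed", "Art")],
)

# Filter rows before grouping.
where_sql = """
    SELECT subject, count(subject) FROM student_details
    WHERE subject != 'Science' AND subject != 'Maths'
    GROUP BY subject
    ORDER BY subject
"""
# Group everything, then discard unwanted groups.
having_sql = """
    SELECT subject, count(subject) FROM student_details
    GROUP BY subject
    HAVING subject != 'Science' AND subject != 'Maths'
    ORDER BY subject
"""

where_rows = conn.execute(where_sql).fetchall()
having_rows = conn.execute(having_sql).fetchall()
print(where_rows)
print(where_rows == having_rows)
```

Both forms give the same answer; the WHERE version simply discards the unwanted rows before the grouping work is done.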
SQL Optimizations: More
- Use IN instead of OR for non-indexed column
  Better: .... WHERE <column> IN (val1, val2, val3)
  Than: .... WHERE <column> = val1 OR <column> = val2...
- Use LIMIT 1 or EXISTS instead of IN or TOP
- Preferably use 'Prefix' pattern matches
- Avoid use of functions on the left-hand side of a comparison
  Better: ...WHERE first_name LIKE 'Cha%';
  Than: ...WHERE SUBSTR(first_name,1,3) = 'Cha';
- Avoid functions on indexed columns
  Better: WHERE event_date >= '2011/03/15' - INTERVAL 7 DAY
  Than: WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(event_date) <= 7
- General SQL rules
  Use a single case for all SQL verbs
  Begin all SQL verbs on a new line
  Separate all words with a single space
  Right- or left-align verbs beneath the initial SQL verb
Improve Query Processing
- Verify that appropriate statistics are being collected
- Use INDEXed columns in WHERE clauses
- Use EXPLAIN to understand the query execution plan
- Use MySQL 'Query' cache
  Caches the result set and returns it if an identical query arrives again
  Does not work for prepared statements
  http://dev.mysql.com/doc/refman/5.0/en/query-cache.html
- Sequence WHERE clause predicates from most restrictive to least restrictive by table, by predicate type
- Adjust internal variables
  Index Buffer Size, Table Buffer Size
  Number of max open tables, Time limit for long queries
http://www.infoworld.com/d/data-management/7-performance-tips-faster-sql-queries-262
http://www.codeforest.net/8-great-mysql-performance-tips
Tips for MySQL/DB performance
- Choose the right data type
- (Almost) always have an id field
- Use ENUM (vs. VARCHAR) if appropriate
- Use NOT NULL constraints
- Fixed length tables (static) are faster
- Use vertical partitioning
- Choose the right storage engine
  MyISAM for read-heavy, limited updates
  InnoDB for more scale, has row-based locking
- Use PROCEDURE_ANALYSE for column type recommendation
- Use persistent connections to DB
- Do big inserts, updates and deletes in (small) batches
http://net.tutsplus.com/tutorials/other/top-20-mysql-best-practices/
http://www.infoworld.com/d/data-management/10-essential-performance-tips-mysql-192815
Prepared Statements
Prepare a query once and inform the query engine
Reuse the query -> get a performance benefit

- Does not need to be 're-parsed' each time
- Protects the application against SQL injection attacks
- (In MySQL) Transmitted in a native binary form -> more efficient & helps reduce network delays
- BUT cannot be used by the query cache (in MySQL)

// Prepare the INSERT once, then bind a value to each ? placeholder
PreparedStatement updateemp = connection.prepareStatement(
    "insert into emp values(?,?)");
updateemp.setInt(1, 23);
updateemp.setString(2, "Roshan");
updateemp.executeUpdate();

http://www.roseindia.net/jdbc/jdbc-mysql/TwicePreparedStatement.shtml
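The same idea can be sketched outside JDBC. The example below assumes Python's sqlite3 module, which has no explicit prepare call; instead, one SQL string with ? placeholders is reused for every row and the values are bound separately. The emp table layout (id, name) and the second, hostile-looking name are assumptions made to illustrate the injection point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed two-column layout matching the two ? placeholders in the example.
conn.execute("CREATE TABLE emp (id INTEGER, name TEXT)")

# One statement text, reused for every row; values are bound, never
# spliced into the SQL string.
insert_emp = "INSERT INTO emp VALUES (?, ?)"
hostile_name = "x'); DROP TABLE emp; --"
conn.executemany(insert_emp, [(23, "Roshan"), (24, hostile_name)])

rows = conn.execute("SELECT id, name FROM emp ORDER BY id").fetchall()
print(rows)
```

Because the second name is bound as a value, it is stored as plain text and the table survives, which is the injection protection listed above.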
Demos & Walkthrus
View queries executed
Check the query plan
Change query and re-execute
http://www.mysql.com/products/enterprise/demo.html

View Query Execution Plan
http://www.codeproject.com/Articles/9990/SQL-Tuning-Tutorial-Understanding-a-Database-Execu

How to find if a query is worth optimizing?
http://www.mysqlperformanceblog.com/2012/09/11/how-to-find-mysql-queries-worth-optimizing/

Slow Query Log and indexing walkthru
http://www.dreamhost.com/dreamscape/2013/08/19/mysql-checking-the-slow-query-log-and-simple-indexing/
Distributed Query Processing
Distributed Query Processing
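Consider the query: get London suppliers of red parts
- The user is at the New York site, the data is in London
- n suppliers satisfy the criteria
- A relational system involves 2 messages
[Figure: message exchange between NY and London, labelled A and n]
- A non-relational system
[Figure: message exchange between NY and London, labelled A(n) and n]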
Distributed Query Processing
Optimization is an important issue
There may be many ways of moving the data around
With Rx at X and Ry at Y:
- Rx -> Y
- Ry -> X
- Rx, Ry -> Z
Distributed Query Processing
Relation       Attributes    Tuples     Site
Suppliers (S)  (S#, CITY)    10,000     Site A
Parts (P)      (P#, COLOUR)  100,000    Site B
Supplies (SP)  (S#, P#)      1,000,000  Site A

- Every tuple is 200 bits long
- There are 10 red parts
- 100,000 shipments by London suppliers
- Data transfer at 50,000 bps
- Access delay of 0.1 second
Query: Find London suppliers of red parts.
Total time (t) = access delay + (data vol. / data rate)

Distributed Query Processing
1. Move relation P to site A and process
   0.1 + ((100,000 x 200) / 50,000) = 400.1 s (6.67 minutes)
2. Move relations S and SP to site B and process
   0.1 + (((10,000 + 1,000,000) x 200) / 50,000) = 4040.1 s (1.12 hrs)
3. Restrict P at site B (to give 10 red parts), then move the result to site A
   0.1 + ((10 x 200) / 50,000) = 0.14 second!
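The three strategies can be compared by plugging the slide's figures into the total-time formula. The sketch below uses only values stated in the example (200-bit tuples, 50,000 bps, 0.1 s access delay); the function and variable names are chosen for illustration, and each strategy is treated as a single transfer with one access delay, as on the slide.

```python
TUPLE_BITS = 200          # every tuple is 200 bits long
DATA_RATE_BPS = 50_000    # data transfer rate
ACCESS_DELAY_S = 0.1      # delay per transfer


def transfer_time(tuples):
    """Total time = access delay + (data volume / data rate)."""
    return ACCESS_DELAY_S + (tuples * TUPLE_BITS) / DATA_RATE_BPS


move_p_to_a = transfer_time(100_000)                # ship all of P
move_s_sp_to_b = transfer_time(10_000 + 1_000_000)  # ship S and SP
move_red_parts = transfer_time(10)                  # ship only the red parts

for label, seconds in [
    ("Move P to A", move_p_to_a),
    ("Move S and SP to B", move_s_sp_to_b),
    ("Restrict P at B, move result to A", move_red_parts),
]:
    print(f"{label}: {seconds:.2f} s ({seconds / 60:.2f} min)")
```

Restricting before shipping moves 10 tuples instead of 100,000 or more, which is why the third strategy finishes in a fraction of a second.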
Distributed Query Optimization
