
Steps in Processing an SQL Query

In this lesson we discuss the steps used by a DBMS to process high-level SQL
queries.

Whenever an SQL SELECT (data retrieval) statement is submitted to the Query Manager of the
DBMS, distinct steps are invoked to process the SQL query and return the required data to the
user. These steps are given in the diagram below.

STEP 1 : QUERY DECOMPOSITION - SCANNING, PARSING AND VALIDATING

The query, expressed in the high-level language SQL, must first be scanned, parsed
and validated.

SCANNING
The scanner identifies the language components (tokens) in the text of the
query.
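As a rough illustration of what a scanner does, the Python sketch below splits query text into tokens with a regular expression. The token classes (KEYWORD, IDENT, STRING, NUMBER, OP) and the small keyword set are assumptions chosen for this sketch; a real DBMS scanner covers the full lexical grammar of SQL.

```python
import re

# Illustrative token classes for a tiny SQL subset (an assumption, not the full SQL grammar).
TOKEN_SPEC = [
    ("STRING", r"'[^']*'"),                   # quoted literal such as 'ADM3713'
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_.]*"),    # names, including qualified ones like Loan.BookId
    ("OP", r"<=|>=|<>|[=<>,*;()]"),
    ("SKIP", r"\s+"),                         # whitespace separates tokens but is discarded
]
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(query: str) -> list[tuple[str, str]]:
    """Split the query text into (token_type, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(query):
        match = MASTER.match(query, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {query[pos]!r} at position {pos}")
        kind, text = match.lastgroup, match.group()
        if kind == "IDENT" and text.upper() in KEYWORDS:
            kind = "KEYWORD"                  # reserved words are recognised case-insensitively
        if kind != "SKIP":
            tokens.append((kind, text))
        pos = match.end()
    return tokens


print(scan("Select BookName From Book Where BookId = 'ADM3713';"))
```

The parser would then consume this token list rather than raw characters, which is why scanning comes first.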

PARSING

The parsing process has two functions:

It checks the incoming query for correct syntax (the rules of grammar) of the
query language.

It breaks down the statement into component parts that can be
understood by the RDBMS. These component parts are stored in an
internal structure called a query tree. This is expanded further later in
this lesson.

VALIDATING

The query must also be validated by checking that all attribute and
relation names are valid and semantically meaningful.
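Validation can be pictured as a lookup against the system catalog. In this minimal sketch the catalog is a hypothetical Python dictionary built from the Library tables used later in these lessons; validate() reports unknown relations, unknown attributes, and unqualified attribute names that belong to more than one relation in the FROM list.

```python
# Hypothetical system catalog: relation name -> set of attribute names.
CATALOG = {
    "Book": {"BookId", "BookName", "Edition"},
    "Loan": {"LoanId", "MemberId", "BookId", "DateOut", "DateDue", "DateReturned"},
}


def validate(relations, attributes):
    """Return a list of error messages; an empty list means every name is valid."""
    errors = [f"unknown relation {r}" for r in relations if r not in CATALOG]
    known = [r for r in relations if r in CATALOG]
    for attr in attributes:
        if "." in attr:                                   # qualified name such as Loan.BookId
            rel, col = attr.split(".", 1)
            if rel not in known or col not in CATALOG[rel]:
                errors.append(f"unknown attribute {attr}")
        else:
            owners = [r for r in known if attr in CATALOG[r]]
            if not owners:
                errors.append(f"unknown attribute {attr}")
            elif len(owners) > 1:                         # e.g. BookId when both Loan and Book are listed
                errors.append(f"ambiguous attribute {attr}")
    return errors


print(validate(["Book"], ["BookName", "BookId"]))          # no errors
print(validate(["Loan", "Book"], ["BookId", "BookName"]))  # BookId is ambiguous
```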

STEP 2 : QUERY OPTIMIZATION

The DBMS must then devise an execution strategy for retrieving the result of the
query from the internal database files. A query typically has many possible execution
strategies.
The process of choosing a suitable one for processing a query is known as query
optimization. This is expanded further in the next lesson.
The query optimizer module has the task of producing an execution plan.

STEP 3 : QUERY CODE GENERATION

The code generator generates the code to execute that plan.


The code can be:

Executed directly (interpreted mode).

Stored and executed later whenever needed (compiled mode).
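The difference between the two modes can be illustrated with a Python analogy. Here a plan is a hypothetical list of steps: interpret() walks the plan every time it runs, while compile_plan() translates it once into a function that can be stored (in a dictionary, in this sketch) and called later. This is only an analogy; a real code generator emits DBMS-specific executable code, not Python closures.

```python
# Hypothetical plan format: a list of (operation, argument) steps over a list of row dicts.
PLAN = [("restrict", ("BookId", "ADM3713")), ("project", ["BookName"])]


def interpret(plan, rows):
    """Interpreted mode: walk the plan step by step each time the query is run."""
    for op, arg in plan:
        if op == "restrict":
            column, value = arg
            rows = [r for r in rows if r[column] == value]
        elif op == "project":
            rows = [{c: r[c] for c in arg} for r in rows]
    return rows


def compile_plan(plan):
    """Compiled mode: translate the plan once into a function that can be stored and reused."""
    steps = []
    for op, arg in plan:
        if op == "restrict":
            column, value = arg
            # default arguments bind the current values, avoiding Python's late-binding pitfall
            steps.append(lambda rows, column=column, value=value:
                         [r for r in rows if r[column] == value])
        elif op == "project":
            steps.append(lambda rows, cols=arg: [{c: r[c] for c in cols} for r in rows])

    def run(rows):
        for step in steps:
            rows = step(rows)
        return rows

    return run


BOOK_ROWS = [
    {"BookId": "ROB7600", "BookName": "Database Systems: Design, Implementation, and Management", "Edition": 5},
    {"BookId": "ADM3713", "BookName": "Modern Database Management", "Edition": 7},
]

print(interpret(PLAN, BOOK_ROWS))                          # plan walked right now
stored_queries = {"book_name_by_id": compile_plan(PLAN)}   # translated once and stored
print(stored_queries["book_name_by_id"](BOOK_ROWS))        # executed later, as often as needed
```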


STEP 4 : RUNTIME DATABASE PROCESSING

The runtime database processor has the task of running the query code, whether
compiled or interpreted, to produce the query result. If a runtime error occurs, an error
message is generated by the runtime database processor.

Query Trees

A query tree is a tree structure that corresponds to a relational algebra expression by
representing the input relations as leaf nodes of the tree and the relational algebra
operations as internal nodes. An execution of a query tree consists of executing an
internal node operation when its operands are available and then replacing that
internal node by the relation that results from executing the operation. The execution
terminates when the root node is executed and produces the result relation.
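The execution rule just described (evaluate an internal node once its operands are available, then let its result stand in for the node) can be sketched in Python. The node classes Leaf, Restrict, Project and Join are hypothetical names for this sketch, relations are lists of dictionaries, and Project does not remove duplicate rows. The usage lines build the tree for a query on the Book table used in the examples below.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    """Leaf node: an input relation, here a list of row dictionaries."""
    rows: list


@dataclass
class Restrict:
    """Internal node: keep the rows of the operand that satisfy the predicate."""
    predicate: Callable[[dict], bool]
    child: "Node"


@dataclass
class Project:
    """Internal node: keep only the listed columns (duplicates are not removed here)."""
    columns: list
    child: "Node"


@dataclass
class Join:
    """Internal node: equi-join of two operands on a column they share."""
    column: str
    left: "Node"
    right: "Node"


Node = Union[Leaf, Restrict, Project, Join]


def execute(node: Node) -> list:
    """Evaluate the operands first; the returned relation then stands in for this node."""
    if isinstance(node, Leaf):
        return node.rows
    if isinstance(node, Restrict):
        return [row for row in execute(node.child) if node.predicate(row)]
    if isinstance(node, Project):
        return [{c: row[c] for c in node.columns} for row in execute(node.child)]
    if isinstance(node, Join):
        left, right = execute(node.left), execute(node.right)
        return [{**lrow, **rrow} for lrow in left for rrow in right
                if lrow[node.column] == rrow[node.column]]
    raise TypeError(f"unknown node type: {node!r}")


book = Leaf([
    {"BookId": "ADM3713", "BookName": "Modern Database Management", "Edition": 7},
    {"BookId": "ELM1074", "BookName": "Fundamentals of Database Systems", "Edition": 4},
])
# The project is the root, the restrict is its operand, and the Book relation is the leaf.
tree = Project(["BookName"], Restrict(lambda row: row["BookId"] == "ADM3713", book))
print(execute(tree))
```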

Consider a table called Book in a library database as follows:

Book (BookId, BookName, Edition)

Example 1

Suppose that we want to display the name of the book whose BookId is
ADM3713. The SQL query is as follows:

Select BookName
From Book
Where BookId = 'ADM3713';

The equivalent Relational Algebra expression is:

$$\pi_{BookName}\left(\sigma_{BookId = \text{'ADM3713'}}(Book)\right)$$

The Query tree corresponding to this expression is:


Example 2

Display the BookId and BookName of all books borrowed on 27 March, 2007.

Select Book.BookId, BookName
From Loan, Book
Where Loan.BookId = Book.BookId
And DateOut = '27-Mar-2007';
The equivalent Relational Algebra expression is:

$$\pi_{BookId,\, BookName}\left(\left(\sigma_{DateOut = \text{'27-Mar-2007'}}(Loan)\right) \bowtie_{BookId} Book\right)$$

The Query tree corresponding to this expression is:


There may be several possible query trees corresponding to a given query. For
example, we may reorder the internal nodes so that the join of the Book and Loan
tables is done before the Restrict operation on the Loan table. In general, it is more
efficient to perform the restrict before doing the join.
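To see why restricting first usually pays off, the sketch below evaluates both orderings of a tree shaped like Example 2 on sample Book and Loan rows and counts how many (Loan, Book) pairs the join has to compare. It uses the 27-10-2005 Loan data from the later lessons rather than the 27-Mar-2007 date, and the comparison counter is an illustrative measure, not a real DBMS cost model.

```python
BOOK = [  # (BookId, BookName)
    ("ROB7600", "Database Systems: Design, Implementation, and Management"),
    ("ADM3713", "Modern Database Management"),
    ("OSB14565", "Oracle Database 10g: A Beginner's Guide"),
    ("ELM1074", "Fundamentals of Database Systems"),
    ("HMN8660", "Building Electronic Commerce with Web Database Constructions"),
]
LOAN = [  # (LoanId, BookId, DateOut)
    ("01234", "ROB7600", "27-10-2005"), ("01235", "ADM3713", "27-10-2005"),
    ("01236", "OSB14565", "27-10-2005"), ("01237", "ROB7600", "20-10-2005"),
    ("01238", "ELM1074", "27-10-2005"), ("01239", "ADM3713", "20-10-2005"),
    ("01240", "OSB14565", "21-10-2005"), ("01241", "ELM1074", "05-10-2005"),
    ("01242", "HMN8660", "27-10-2005"), ("01243", "ADM3713", "18-10-2005"),
]


def join_then_restrict(loans, books, date_out):
    """Join Loan and Book first, then keep the rows with the requested DateOut."""
    comparisons, joined = 0, []
    for _loan_id, loan_book_id, loan_date in loans:
        for book_id, name in books:
            comparisons += 1                      # one test of Loan.BookId = Book.BookId
            if loan_book_id == book_id:
                joined.append((book_id, name, loan_date))
    return [(b, n) for b, n, d in joined if d == date_out], comparisons


def restrict_then_join(loans, books, date_out):
    """Restrict Loan on DateOut first, then join only the surviving rows."""
    restricted = [loan for loan in loans if loan[2] == date_out]
    comparisons, result = 0, []
    for _loan_id, loan_book_id, _loan_date in restricted:
        for book_id, name in books:
            comparisons += 1
            if loan_book_id == book_id:
                result.append((book_id, name))
    return result, comparisons


r1, c1 = join_then_restrict(LOAN, BOOK, "27-10-2005")
r2, c2 = restrict_then_join(LOAN, BOOK, "27-10-2005")
print(r1 == r2, c1, c2)   # True 50 25
```

Both orders return the same five books, but the join compares 50 pairs when it runs first and only 25 when the restrict has already reduced Loan to the five loans made on 27-10-2005.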
Linear Search and Index Search Data Access Methods

In this lesson we discuss two Data Access Methods used to retrieve data from a single
table.

When a user submits an SQL query, it must first be translated into a procedure for
accessing the data on disk using low-level READ operations. The time to read
data from the disk is one of the major contributors to the time taken to process the
query. Hence the query processor aims to select the most efficient Data Access
method. There are usually several methods for accessing the required
records from the files on disk.
We first deal with processing of a query that involves a single table.

QUERY PROCESSING WITH SINGLE TABLES

For queries involving a single table, the methods used to access the records are
1. Linear Search (file scan)
2. Index Search

Let's look at an example to illustrate these methods.

Given the table: Book (BookId, BookName, Edition)

BOOK:

BookId    BookName                                                      Edition
ROB7600   Database Systems: Design, Implementation, and Management      5
ADM3713   Modern Database Management                                    7
OSB14565  Oracle Database 10g: A Beginner's Guide                       1
ELM1074   Fundamentals of Database Systems                              4
HMN8660   Building Electronic Commerce with Web Database Constructions

Query: Display the name of the book with BookId ADM3713

Select BookName
From Book
Where BookId = 'ADM3713';

LINEAR SEARCH

Assume that the Book file is stored as an unordered file, i.e. the records are not sorted on any
field. For the Linear Search, we may need to search through the entire Book table. We read each
record from the start of the file and, for each record read, we check the BookId column. If the
BookId is ADM3713, the record meets the search condition, so we output the BookName field
of that record. If the search is on a unique field such as BookId (as in the query above), then
once we have found a match we can terminate the search. If, however, the search were on a
non-unique field such as Edition, then we would need to continue reading through the file to
find all matching values.

Do While not EOF(BookFile)   // process until end of file //
Begin
    Bookrec = Read(BookFile)   // read the first (next) record in the Book file //
    If Bookrec.BookId = 'ADM3713' Then
        Output Bookrec.BookName
    End If
End
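The linear search above can be written as a runnable Python sketch. The Book file is modelled as a list of records wrapped in a small counting class (a hypothetical stand-in for disk READs), and the unique flag implements the early termination described for unique fields such as BookId.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class BookRec(NamedTuple):
    BookId: str
    BookName: str
    Edition: Optional[int]   # None where the table gives no edition


class CountingFile:
    """Hypothetical stand-in for an unordered Book file; counts the records read."""

    def __init__(self, records: list):
        self.records = records
        self.reads = 0

    def __iter__(self) -> Iterator[BookRec]:
        for rec in self.records:      # records come back in file order
            self.reads += 1           # one READ per record
            yield rec


def linear_search(book_file: Iterable[BookRec], column: str, value,
                  unique: bool = False) -> Iterator[BookRec]:
    """Check every record from the start of the file; stop after the first match if unique."""
    for rec in book_file:
        if getattr(rec, column) == value:
            yield rec
            if unique:                # no other record can match a unique field
                return


BOOK_FILE = [
    BookRec("ROB7600", "Database Systems: Design, Implementation, and Management", 5),
    BookRec("ADM3713", "Modern Database Management", 7),
    BookRec("OSB14565", "Oracle Database 10g: A Beginner's Guide", 1),
    BookRec("ELM1074", "Fundamentals of Database Systems", 4),
    BookRec("HMN8660", "Building Electronic Commerce with Web Database Constructions", None),
]

by_id = CountingFile(BOOK_FILE)
print([r.BookName for r in linear_search(by_id, "BookId", "ADM3713", unique=True)], by_id.reads)
by_edition = CountingFile(BOOK_FILE)
print([r.BookId for r in linear_search(by_edition, "Edition", 4)], by_edition.reads)
```

Searching on BookId stops after 2 reads, because ADM3713 is the second record; searching on the non-unique Edition column reads all 5 records.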

INDEX SEARCH

An index is a data structure that provides quick lookup of data in a column (or columns) of a
table. For example, the Book table in a Library database can have an index on the BookId
column. This is a separate data structure with pointers to the rows of the Book file. So in order to
search for a book with a particular BookId, we need not read each record of the Book file, but
can instead search the BookId index structure for the value of BookId that we want; this will
point to the corresponding record in the Book file.
Indexes may be organized in different ways. We can think of an index as a list in which each entry
contains a BookId value and a pointer to the location of that record in the Book file.
The entries are maintained in sorted order on BookId. In practice, indexes are not stored as
sequential lists: such a list becomes difficult to search when the file is large, and inserting entries
in sorted order may require recreating the index. Most indexes are organized as tree
structures in such a manner that searches of the index involve branching through the tree from
root to leaf. A common tree structure for indexes is the B+ tree, which follows certain rules that
keep the tree wide and shallow. This improves the search time. If the index is on the primary
key, such as BookId, it is referred to as a primary index. You can have several indexes on a
table in the database. In an extreme case, every column of the table may be indexed. So, for
example, for the Book file we can have three separate index files: one on BookId, one on
BookName, and one on Edition.
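The "sorted list of entries" view of an index can be sketched with Python's bisect module. SortedListIndex is a hypothetical teaching class, not how a DBMS stores indexes (that is usually a B+ tree, as noted above): lookup() uses binary search over the sorted keys, while insert() has to shift later entries to keep the list sorted, which hints at why flat sorted lists are awkward to maintain.

```python
import bisect


class SortedListIndex:
    """Teaching sketch: an index kept as parallel sorted lists of keys and record positions."""

    def __init__(self):
        self.keys = []        # key values (e.g. BookId), always in sorted order
        self.positions = []   # positions[i] is the record pointer for keys[i]

    def insert(self, key, position):
        i = bisect.bisect_left(self.keys, key)
        # list.insert shifts every later entry: the cost of keeping a flat list sorted
        self.keys.insert(i, key)
        self.positions.insert(i, position)

    def lookup(self, key):
        """Binary search over the sorted keys; returns the record position or None."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.positions[i]
        return None


# Index the Book file on BookId; a record's position is its place in the unordered file.
book_ids = ["ROB7600", "ADM3713", "OSB14565", "ELM1074", "HMN8660"]
idx = SortedListIndex()
for pos, book_id in enumerate(book_ids):
    idx.insert(book_id, pos)

print(idx.keys)                 # ['ADM3713', 'ELM1074', 'HMN8660', 'OSB14565', 'ROB7600']
print(idx.lookup("ADM3713"))    # 1
```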
Indexes are created and maintained by the DBMS. Most DBMSs automatically create an index on
the primary key of each table. The user can specify additional indexes using the SQL CREATE INDEX
statement. Indexes have a cost for inserts, updates, and deletes, because the DBMS has to do work
to maintain them. If you insert into or delete from a table, the system has to insert or delete
entries in all the indexes on the table. If you update a table, the system has to maintain those
indexes that are on the columns being updated. So having many indexes can speed up SELECT
statements but slow down inserts, updates, and deletes.
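The effect of CREATE INDEX can be observed with SQLite through Python's standard sqlite3 module. This sketch assumes SQLite as the DBMS: declaring BookId as the primary key gives it an automatic index, and EXPLAIN QUERY PLAN shows whether a query scans the whole table or searches an index. The index name idx_book_edition is chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# BookId is declared PRIMARY KEY, so SQLite builds an index on it automatically.
conn.execute("CREATE TABLE Book (BookId TEXT PRIMARY KEY, BookName TEXT, Edition INTEGER)")
conn.executemany("INSERT INTO Book VALUES (?, ?, ?)", [
    ("ROB7600", "Database Systems: Design, Implementation, and Management", 5),
    ("ADM3713", "Modern Database Management", 7),
    ("OSB14565", "Oracle Database 10g: A Beginner's Guide", 1),
    ("ELM1074", "Fundamentals of Database Systems", 4),
    ("HMN8660", "Building Electronic Commerce with Web Database Constructions", None),
])


def query_plan(sql, params):
    """Return the detail text of each EXPLAIN QUERY PLAN row (its last column)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


by_id = "SELECT BookName FROM Book WHERE BookId = ?"
by_edition = "SELECT BookName FROM Book WHERE Edition = ?"

pk_plan = query_plan(by_id, ("ADM3713",))     # can use the automatic primary-key index
plan_before = query_plan(by_edition, (5,))    # no index on Edition yet: full table scan
conn.execute("CREATE INDEX idx_book_edition ON Book (Edition)")
plan_after = query_plan(by_edition, (5,))     # the new index can now be searched

for label, plan in [("by BookId", pk_plan), ("by Edition, before", plan_before),
                    ("by Edition, after", plan_after)]:
    print(label, plan)
```

The exact wording of the plan text differs between SQLite versions, but typically the BookId query reports a SEARCH using the automatic primary-key index, the first Edition query reports a SCAN of Book, and the second reports a SEARCH using idx_book_edition.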

Query: Display the name of the book with BookId ADM3713

Select BookName
From Book
Where BookId = 'ADM3713';

Assume that there is an index on the BookId column of the Book table. The Index Search Data
Access Method uses this index to find the record of the Book table with BookId = 'ADM3713'.
We will use a generic search method called SEEK, which requires three parameters: the table
name, the column that is indexed and the search value. The Index Search method will use the
system Seek method to search through the index for the required BookId value; this will return a
pointer to a record in the Book file. This record will then be read directly.

Bookrec = Seek (Book, BookId, 'ADM3713')
Output Bookrec.BookName

Nested Loops Data Access Method

Access cost is the time taken to retrieve the required data from disk. Access costs
depend on the method used in accessing the records in the file. Many queries involve
the joining of tables. A join operation can involve two or more tables. For queries
involving joins, the access methods used are:

1. Nested loops (brute force)

2. Index join

3. Hash join

4. Sort-merge join

The procedure discussed in this lesson is the Nested Loop Join.

NESTED LOOP JOIN

Consider the following tables in the Library database.

Book (BookId, BookName, Edition)

Member (MemberId, MemberName, Address, UnpaidFines)

Loan (LoanId, MemberId, BookId, DateOut, DateDue, DateReturned)

The SQL for the query which finds the BookId and name of all books borrowed on the
27th October, 2005 is given below.

Select Book.BookId, BookName
From Loan, Book
Where Loan.BookId = Book.BookId
And DateOut = '27-10-2005';

The query involves the join of two tables, the Book table and the Loan table. BookId,
a foreign key in the Loan table, is the field on which the tables are joined. In the
Nested Loop join, one file is read sequentially and, for each record read, the second
file is read checking for a match on the join field. For the query above, the Book file
may be read and, for each BookId in the Book file, we find the matching rows in the
Loan file.
Each row found in the Loan file is then concatenated with the row in the Book file.
The resulting relation is then restricted to only those rows with DateOut = 27-10-2005,
and finally the BookId and BookName columns are projected.
Alternatively, the Loan file can be selected as the outer file and, for each Loan record
with DateOut = 27-10-2005, the Book file is read to find the matching BookId.

In a nested loop join the following steps occur:

1. One file is selected as the outer file and the other as the inner file.

2. Scan the outer file.

3. For each row of the outer table, search the inner table for matching rows on the
join field.

4. If any matching rows are found in the inner table, concatenate each of them with the
current row of the outer table.

Other conditions specified in the WHERE clause of the SQL statement can be
checked prior to the join (if the condition refers only to the records of one file, as
DateOut does for the Loan file) or after the join (on the concatenated record).

BOOK:

BookId    BookName                                                      Edition
ROB7600   Database Systems: Design, Implementation, and Management      5
ADM3713   Modern Database Management                                    7
OSB14565  Oracle Database 10g: A Beginner's Guide                       1
ELM1074   Fundamentals of Database Systems                              4
HMN8660   Building Electronic Commerce with Web Database Constructions

LOAN:

LoanId  MemberId  BookId    DateOut     DateDue     DateReturned
01234   05711045  ROB7600   27-10-2005  04-11-2005
01235   05731045  ADM3713   27-10-2005  10-11-2005
01236   05711035  OSB14565  27-10-2005  04-11-2005
01237   05713045  ROB7600   20-10-2005  27-10-2005  27-10-2005
01238   05721045  ELM1074   27-10-2005  04-11-2005
01239   05711045  ADM3713   20-10-2005  24-10-2005  24-10-2005
01240   05710095  OSB14565  21-10-2005  04-11-2005
01241   05711045  ELM1074   05-10-2005  14-10-2005  14-10-2005
01242   05611045  HMN8660   27-10-2005  04-11-2005
01243   05781045  ADM3713   18-10-2005  27-10-2005  27-10-2005

The algorithm for Nested loops (brute force) is given below:

For every record in the Book file (outer file), retrieve every record in the Loan file and
test the condition: Book.BookId = Loan.BookId
Do While not EOF (Book file)
Begin
    BookRec = Read (Book file)
    Reset (Loan file)   // go back to the first record of the inner file //
    Do While not EOF (Loan file)
    Begin
        LoanRec = Read (Loan file)
        If BookRec.BookId = LoanRec.BookId Then
            If LoanRec.DateOut = '27-10-2005' Then
                Output BookRec.BookId, BookRec.BookName
            End If
        End If
    End
End
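The nested loop pseudocode can be sketched in Python as follows, using the Book and Loan rows from the tables above (only the columns the query needs). The read counter is illustrative: it counts one read per record taken from either file, and the inner loop restarts from the first Loan record for every Book record.

```python
BOOK = [  # (BookId, BookName)
    ("ROB7600", "Database Systems: Design, Implementation, and Management"),
    ("ADM3713", "Modern Database Management"),
    ("OSB14565", "Oracle Database 10g: A Beginner's Guide"),
    ("ELM1074", "Fundamentals of Database Systems"),
    ("HMN8660", "Building Electronic Commerce with Web Database Constructions"),
]
LOAN = [  # (LoanId, BookId, DateOut)
    ("01234", "ROB7600", "27-10-2005"), ("01235", "ADM3713", "27-10-2005"),
    ("01236", "OSB14565", "27-10-2005"), ("01237", "ROB7600", "20-10-2005"),
    ("01238", "ELM1074", "27-10-2005"), ("01239", "ADM3713", "20-10-2005"),
    ("01240", "OSB14565", "21-10-2005"), ("01241", "ELM1074", "05-10-2005"),
    ("01242", "HMN8660", "27-10-2005"), ("01243", "ADM3713", "18-10-2005"),
]


def nested_loop_join(book_file, loan_file, date_out):
    """Book is the outer file and Loan the inner file; returns (rows, record_reads)."""
    reads = 0
    result = []
    for book_id, book_name in book_file:                      # outer file: read once
        reads += 1
        for _loan_id, loan_book_id, loan_date in loan_file:   # inner file: full rescan per outer record
            reads += 1
            if book_id == loan_book_id and loan_date == date_out:
                result.append((book_id, book_name))
    return result, reads


rows, reads = nested_loop_join(BOOK, LOAN, "27-10-2005")
print(reads)   # 55
for row in rows:
    print(row)
```

With 5 Book records and 10 Loan records the function returns the five books lent on 27-10-2005 and counts 5 + 5 × 10 = 55 record reads: 5 for the outer Book file and 50 for the repeated scans of the inner Loan file.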

For this algorithm:

No. of records in the Book file: 5
No. of records in the Loan file: 10
No. of reads of the Loan file (one full scan per Book record): 5 * 10 = 50
Total number of reads, including the 5 reads of the Book file: 5 + 50 = 55

Performance considerations

The nested loop join repetitively scans the inner table. That is, the outer table is
scanned once but the inner table is scanned as many times as the number of rows in
the outer table that satisfy the condition. Nested loop join is often used if:

The outer table is small.

Predicates reduce the number of qualifying rows in the outer table.

QUESTIONS:
1. Write the algorithm for the Loan table as the outer file.
2. How many records are read to do the nested loop join?
3. Under what conditions would the nested loop work well for this query?
4. How may we improve the nested loop join?
Index Join Data Access Method

The nested loop join repetitively scans the inner table. That is, the outer table is
scanned once but the inner table is scanned as many times as the number of rows in
the outer table that satisfy the condition. To increase the speed of this join operation,
the query processor can use an index on the column that joins the tables (if the index
exists). This type of join is referred to as an Index join.

INDEX JOIN

Using the tables in the Library database, we assume an index exists on BookId in the
Book table.

Let us revisit the SQL query which finds the BookId and name of all books borrowed
on the 27th October, 2005.

Select Book.BookId, BookName
From Loan, Book
Where Loan.BookId = Book.BookId
And DateOut = '27-10-2005';

In an index join the following steps occur:

1. The table with the indexed field will be the inner table. In this case the inner
table is the Book table.

2. Scan the outer table (the Loan table) and, for each record read, check whether it
matches the criterion (DateOut = 27-10-2005).

3. Using the join field (BookId) from the outer table, use the index on BookId of
the Book table to seek the matching record in the inner table.

4. If a matching row is found in the inner table, concatenate its required fields with
the required fields of the outer table and output the result.

BOOK:

BookId    BookName                                                      Edition
ROB7600   Database Systems: Design, Implementation, and Management      5
ADM3713   Modern Database Management                                    7
OSB14565  Oracle Database 10g: A Beginner's Guide                       1
ELM1074   Fundamentals of Database Systems                              4
HMN8660   Building Electronic Commerce with Web Database Constructions

LOAN:

LoanId  MemberId  BookId    DateOut     DateDue     DateReturned
01234   05711045  ROB7600   27-10-2005  04-11-2005
01235   05731045  ADM3713   27-10-2005  10-11-2005
01236   05711035  OSB14565  27-10-2005  04-11-2005
01237   05713045  ROB7600   20-10-2005  27-10-2005  27-10-2005
01238   05721045  ELM1074   27-10-2005  04-11-2005
01239   05711045  ADM3713   20-10-2005  24-10-2005  24-10-2005
01240   05710095  OSB14565  21-10-2005  04-11-2005
01241   05711045  ELM1074   05-10-2005  14-10-2005  14-10-2005
01242   05611045  HMN8660   27-10-2005  04-11-2005
01243   05781045  ADM3713   18-10-2005  27-10-2005  27-10-2005

The algorithm for Index join is given below:


For every record in the Loan file (outer file) that matches the specified condition
(DateOut=27-10-2005), use the index on the BookId field in the Book File to
seek the corresponding record.

Do While not EOF (Loan File)
Begin
    LoanRec = Read (Loan File)
    If LoanRec.DateOut = '27-10-2005' Then
        y = LoanRec.BookId
        BookRec = Seek (Book File, BookId, y)
        Output BookRec.BookId, BookRec.BookName
    End If
End
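An index join can be sketched in Python by replacing the inner scan with a lookup. Here a dictionary from BookId to record position stands in for the index on Book.BookId (a real DBMS would typically use a B+ tree), and seek() is a hypothetical helper playing the role of the Seek method used in the pseudocode.

```python
BOOK = [  # (BookId, BookName)
    ("ROB7600", "Database Systems: Design, Implementation, and Management"),
    ("ADM3713", "Modern Database Management"),
    ("OSB14565", "Oracle Database 10g: A Beginner's Guide"),
    ("ELM1074", "Fundamentals of Database Systems"),
    ("HMN8660", "Building Electronic Commerce with Web Database Constructions"),
]
LOAN = [  # (LoanId, BookId, DateOut)
    ("01234", "ROB7600", "27-10-2005"), ("01235", "ADM3713", "27-10-2005"),
    ("01236", "OSB14565", "27-10-2005"), ("01237", "ROB7600", "20-10-2005"),
    ("01238", "ELM1074", "27-10-2005"), ("01239", "ADM3713", "20-10-2005"),
    ("01240", "OSB14565", "21-10-2005"), ("01241", "ELM1074", "05-10-2005"),
    ("01242", "HMN8660", "27-10-2005"), ("01243", "ADM3713", "18-10-2005"),
]

# Stand-in for the index on Book.BookId: BookId -> position (record pointer) in the Book file.
BOOK_INDEX = {book_id: pos for pos, (book_id, _name) in enumerate(BOOK)}


def seek(index, file, key):
    """Hypothetical Seek: look the key up in the index, then read that one record directly."""
    pos = index.get(key)
    return None if pos is None else file[pos]


def index_join(loan_file, book_file, book_index, date_out):
    """Loan is the outer file; Book is the inner file, accessed only through its index."""
    loan_reads = seeks = 0
    result = []
    for _loan_id, book_id, loan_date in loan_file:   # outer file: scanned once
        loan_reads += 1
        if loan_date == date_out:                    # restriction checked before the join
            seeks += 1
            book = seek(book_index, book_file, book_id)
            if book is not None:
                result.append(book)                  # (BookId, BookName)
    return result, loan_reads, seeks


rows, loan_reads, seeks = index_join(LOAN, BOOK, BOOK_INDEX, "27-10-2005")
print(loan_reads, seeks)   # 10 5
```

For the sample data the Loan file is read once (10 reads) and only the 5 loans dated 27-10-2005 trigger a seek, instead of the repeated scans of the Book file that a nested loop join would need.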

QUESTIONS
Consider what would happen if the index existed on the BookId field in the Loan
table.
1. Write the algorithm for the Loan table as the inner file with an index on BookId.
2. Under what conditions does the index join work well?
