In this lesson we discuss the steps used by a DBMS to process high-level SQL
queries.
The query, expressed in the high-level language SQL, must first be scanned, parsed,
and validated.
SCANNING
The scanner identifies the language components (tokens) in the text of the
query.
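As a rough illustration of scanning, the sketch below splits a query string into tokens with a regular expression. The token classes (keyword, string, identifier, symbol), the function name scan, and the small keyword list are assumptions made for this example; a real DBMS scanner recognises a far larger vocabulary.

```python
import re

# Hypothetical token classes for a tiny SQL subset (assumption for illustration).
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<keyword>\b(?:select|from|where|and)\b)"
    r"|(?P<string>'[^']*')"
    r"|(?P<identifier>[A-Za-z_][A-Za-z0-9_.]*)"
    r"|(?P<symbol>[=;,]))",
    re.IGNORECASE,
)

def scan(query):
    """Split the query text into (token_type, text) pairs."""
    tokens = []
    position = 0
    query = query.rstrip()
    while position < len(query):
        match = TOKEN_PATTERN.match(query, position)
        if match is None:
            raise ValueError(f"Unexpected character at position {position}")
        # lastgroup names the single token class that matched
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        position = match.end()
    return tokens

tokens = scan("Select BookName From Book Where BookId='ADM3713';")
```

Each pair in tokens can then be handed to the parser, which checks that the sequence obeys the grammar.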
PARSING
The parser checks the incoming query for correct syntax (rules of grammar) of the
query language.
VALIDATING
The query must also be validated by checking that all attribute and
relation names are valid and semantically meaningful in the schema of the database being queried.
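Validation can be pictured as a lookup of every name in the system catalog. The sketch below is a simplification: the CATALOG dictionary and the validate function are hypothetical, and the Loan attributes listed are only those this lesson mentions.

```python
# Hypothetical catalog: relation name -> set of attribute names (assumption).
CATALOG = {
    "Book": {"BookId", "BookName", "Edition"},
    "Loan": {"BookId", "DateOut"},
}

def validate(relations, attributes):
    """Return a list of error messages; an empty list means all names are valid."""
    errors = []
    for relation in relations:
        if relation not in CATALOG:
            errors.append(f"Unknown relation: {relation}")
    # Collect the attributes of every known relation named in the query
    known = set()
    for relation in relations:
        known |= CATALOG.get(relation, set())
    for attribute in attributes:
        if attribute not in known:
            errors.append(f"Unknown attribute: {attribute}")
    return errors
```

For example, validate(["Book"], ["BookName", "BookId"]) reports no errors, while a misspelt relation or attribute name produces a message.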
The DBMS must then devise an execution strategy for retrieving the result of the
query from the internal database files. A query typically has many possible execution
strategies.
The process of choosing a suitable one for processing a query is known as query
optimization. This is expanded further in the next lesson.
The query optimizer module has the task of producing an execution plan.
The runtime database processor has the task of running the query code, whether
compiled or interpreted, to produce the query result. If a runtime error results, an error
message is generated by the runtime database processor.
Query Trees
Example 1
Suppose that we want to display the name of the book whose BookId is
ADM3713. The SQL query is as follows:
Select BookName
From Book
Where BookId='ADM3713';
Display the BookId and BookName of all books borrowed on 27 March, 2007.
In this lesson we discuss two Data Access Methods used to retrieve data from a single
table.
When a user submits an SQL query, it must first be translated into a procedure for
accessing the data on disk using low-level READ operations. The time to read
data from the disk is one of the major contributors to the time for processing the
query. Hence the query processor would want to select the Data Access Method that is
the most efficient. There are usually several methods for accessing the required
records from the files on disk.
We first deal with processing of a query that involves a single table.
For queries involving a single table, the methods used to access the records are
1. Linear Search (file scan)
2. Index Search
BOOK:
Select BookName
From Book
Where BookId='ADM3713';
LINEAR SEARCH
Assume that the Book file is stored as an unordered file, i.e. the records are not sorted on any
field. For the Linear Search, we may need to search through the entire Book table. We read each
record from the start of the file and, for each record read, we check the BookId column. If the
BookId is ADM3713, the record meets the search condition, so we output the
BookName field of that record. If the search is on a unique field such as BookId (as in the query
above), then once we have found a match we can terminate the search. If, however, the search
were on a non-unique field such as Edition, we would need to continue reading to the end of the
file to find all matching records.
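The linear search just described can be sketched in Python as follows. The rows in BOOK_FILE are made-up sample data (only the BookId ADM3713 comes from the lesson), and the function name linear_search and its unique flag are assumptions for this illustration.

```python
# Hypothetical rows; the lesson's Book table contents are not reproduced here.
BOOK_FILE = [
    {"BookId": "ADM1001", "BookName": "Sample Title A", "Edition": 2},
    {"BookId": "ADM3713", "BookName": "Sample Title B", "Edition": 1},
    {"BookId": "ADM4200", "BookName": "Sample Title C", "Edition": 2},
]

def linear_search(records, field, value, unique):
    """Scan records from the start; return the matches and the number of records read."""
    matches = []
    records_read = 0
    for record in records:
        records_read += 1
        if record[field] == value:
            matches.append(record)
            if unique:
                break  # a unique field can match at most one record
    return matches, records_read
```

With this sample data, a search on BookId stops after two records, while a search on Edition has to read all three.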
INDEX SEARCH
An index is a data structure that provides quick lookup of data in a column (or columns) of a
table. For example, the Book table in a Library database can have an index on the BookId
column. This is a separate data structure with pointers to the rows of the Book file. So in order to
search for a book with a particular BookId, we need not read each record of the Book file;
instead we search the BookId index structure for the value of BookId that we want, and this
points to the corresponding record in the Book file.
Indexes may be organized in different ways. We can think of an index as a list, with each entry
containing a BookId value and a pointer to the location of the corresponding record in the Book file.
The entries are maintained in sorted order on BookId.
In practice, indexes are not stored as sequential lists: a long list is slow to search when the file
is large, and inserting an entry in sorted order may require rebuilding the index. Most indexes are
organized as tree structures, so that a search of the index branches through the tree from
root to leaf. A common tree structure for indexes is the B+ tree, whose rules keep the tree
wide and shallow; this reduces the number of nodes visited and so improves the search time.
If the index is on a unique field such as BookId, the index structure is referred to as a primary
index.
You can have several indexes on a table in the database. In an extreme case, every column
of the table may be indexed. For example, the Book file can have three separate index
files: one on BookId, one on BookName, and one on Edition.
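The simple list view of an index (sorted entries, each holding a key and a pointer) can be sketched as below. This is not a B+ tree; it only illustrates the idea of searching a small sorted structure instead of the file itself. The sample rows, the use of a list position as the "pointer", and the name index_lookup are assumptions.

```python
import bisect

# Hypothetical Book file: a list position stands in for a record's disk address.
BOOK_FILE = [
    {"BookId": "ADM4200", "BookName": "Sample Title C"},
    {"BookId": "ADM1001", "BookName": "Sample Title A"},
    {"BookId": "ADM3713", "BookName": "Sample Title B"},
]

# Index entries: (BookId value, pointer), kept in sorted order on BookId.
book_id_index = sorted((rec["BookId"], pos) for pos, rec in enumerate(BOOK_FILE))
index_keys = [key for key, _ in book_id_index]

def index_lookup(book_id):
    """Binary-search the index; return the pointer, or None if the key is absent."""
    slot = bisect.bisect_left(index_keys, book_id)
    if slot < len(index_keys) and index_keys[slot] == book_id:
        return book_id_index[slot][1]
    return None
```

Note that the Book file itself is unordered here; only the index entries are sorted.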
Indexes are created and maintained by the DBMS. The DBMS automatically creates an index on
the primary key of each table. The user can specify additional indexes using the 'Create Index'
SQL statement. Indexes have a cost for inserts, updates, and deletes. The DBMS has to do work
to maintain indexes. If you insert into or delete from a table, the system has to insert or delete
rows in all the indexes on the table. If you update a table, the system has to maintain those
indexes that are on the columns being updated. So having a lot of indexes can speed up select
statements, but slow down inserts, updates, and deletes.
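The maintenance cost can be made concrete with a toy table that counts how many index entries it has to touch. The class IndexedTable, its dictionary-based indexes, and the index_writes counter are all hypothetical; a real DBMS maintains tree-structured indexes on disk.

```python
# Hypothetical in-memory table with one dictionary-based index per indexed column.
class IndexedTable:
    def __init__(self, indexed_columns):
        self.rows = []
        self.indexes = {column: {} for column in indexed_columns}
        self.index_writes = 0  # counts index maintenance operations

    def insert(self, row):
        """Append the row, then add an entry to every index on the table."""
        position = len(self.rows)
        self.rows.append(row)
        for column, index in self.indexes.items():
            index.setdefault(row[column], []).append(position)
            self.index_writes += 1

    def update(self, position, column, new_value):
        """Change one column; only an index on that column needs maintenance."""
        old_value = self.rows[position][column]
        self.rows[position][column] = new_value
        if column in self.indexes:
            index = self.indexes[column]
            index[old_value].remove(position)
            if not index[old_value]:
                del index[old_value]
            index.setdefault(new_value, []).append(position)
            self.index_writes += 1
```

Inserting one row into a table with three indexes costs three index writes in this model, whereas updating a column that has no index costs none.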
Select BookName
From Book
Where BookId='ADM3713';
Assume that there is an index on the BookId column of the Book table. The Index Search Data
Access Method uses this index to find the record of the Book table with BookId = 'ADM3713'.
We will use a generic search method called SEEK which requires three parameters: the table
name, the Column that is indexed and the search value. The Index Search method will use the
system Seek method to search through the index for the required BookId value; this will return a
pointer to a record in the Book file. This record will then be read directly.
BookRec = Seek (Book, BookId, 'ADM3713')
Output BookRec.BookName
Nested Loops Data Access Method
Access cost is the time taken to retrieve the required data from disk. Access costs
depend on the method used in accessing the records in the file. Many queries involve
the joining of tables. A join operation can involve two or more tables. For queries
involving joins, the access methods used are:
1. Nested loop join
2. Index join
3. Hash join
The SQL for the query which finds the BookId and name of all books borrowed on
27 October 2005 is given below.
Select Book.BookId, BookName
From Book, Loan
Where Book.BookId=Loan.BookId
And DateOut='27-10-2005';
The query involves the join of two tables, the Book table and the Loan table. BookId,
a foreign key in the Loan table, is the field on which the tables are joined. In the
nested loop join, one file is read sequentially and, for each record read, the second
file is read, checking for a match on the join field. For the query above, the Book file
may be read and, for each BookId in the Book file, we find the rows that match in the
Loan file.
Each row found in the Loan file is then concatenated with the row in the Book file.
The resulting relation is then restricted to only those rows with DateOut='27-10-2005'.
The final result is a projection of the BookId and BookName columns.
Alternatively, the Loan file can be selected as the outer file and, for each Loan record
with DateOut='27-10-2005', the Book file is read to find the matching BookId.
1. One file is selected as an outer file and the other as an inner file
3. For each row of the outer table search the inner table for matching rows on the
join field.
4. If any rows are found in the inner table, concatenate them with the current
row of the outer table.
Other conditions specified in the WHERE clause of the SQL statement can be
checked prior to the join (if the condition is on the records of the inner file) or after
the join (on the concatenated record).
BOOK:
LOAN:
For every record in the Book file (outer file), retrieve every record in the Loan file and
test the condition Book.BookId = Loan.BookId:
Do While not EOF (Book file)
Begin
    BookRec = Read (Book file)
    Reset (Loan file)    // start the inner file again at its first record
    Do While not EOF (Loan file)
    Begin
        LoanRec = Read (Loan file)
        If BookRec.BookId = LoanRec.BookId then
            If LoanRec.DateOut = '27-10-2005' then
                Output BookRec.BookId, BookRec.BookName
            End If
        End If
    End
End
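The pseudocode above translates roughly into the Python sketch below, with Book as the outer file. The rows in BOOK_FILE and LOAN_FILE are invented sample data (the lesson's tables are not reproduced here), and the function name nested_loop_join is an assumption.

```python
# Hypothetical rows; the lesson's Book and Loan tables are not reproduced here.
BOOK_FILE = [
    {"BookId": "ADM1001", "BookName": "Sample Title A"},
    {"BookId": "ADM3713", "BookName": "Sample Title B"},
]
LOAN_FILE = [
    {"BookId": "ADM3713", "DateOut": "27-10-2005"},
    {"BookId": "ADM1001", "DateOut": "28-10-2005"},
    {"BookId": "ADM1001", "DateOut": "27-10-2005"},
]

def nested_loop_join(outer, inner, date_out):
    """Join Book (outer) with Loan (inner) on BookId, keeping loans made on date_out."""
    results = []
    for book_rec in outer:
        # The inner file is rescanned from its first record for every outer record
        for loan_rec in inner:
            if book_rec["BookId"] == loan_rec["BookId"]:
                if loan_rec["DateOut"] == date_out:
                    results.append((book_rec["BookId"], book_rec["BookName"]))
    return results
```

Iterating over a Python list restarts at the first element each time, which plays the role of repositioning the Loan file before each inner scan.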
The nested loop join repeatedly scans the inner table. That is, the outer table is
scanned once but the inner table is scanned as many times as the number of rows in
the outer table that satisfy the condition. Nested loop join is often used if:
QUESTIONS:
1. Write the algorithm for the Loan table as the outer file.
2. How many records are read to do the nested loop join?
3. Under what conditions would the nested loop work well for this query?
4. How may we improve the nested loop join?
Index Join Data Access Method
The nested loop join repeatedly scans the inner table. That is, the outer table is
scanned once but the inner table is scanned as many times as the number of rows in
the outer table that satisfy the condition. To increase the speed of this join operation,
the query processor can use an index on the column that joins the tables (if the index
exists). This type of join is referred to as an Index join.
INDEX JOIN
Using the tables in the Library database, we assume an index exists on BookId in the
Book table.
Let us revisit the SQL query which finds the BookId and name of all books borrowed
on 27 October 2005.
Select Book.BookId, BookName
From Book, Loan
Where Book.BookId=Loan.BookId
And DateOut='27-10-2005';
1. The table with the indexed field will be the inner table. In this case the inner
table is the Book table.
2. Scan the outer table (the Loan table) and, for each record read, check whether it
matches the criterion (DateOut='27-10-2005').
3. Using the join field (BookId) from the outer table, use the index on BookId on
the Book table to seek the matching record in the inner table.
4. If any rows are found in the inner table, concatenate their required fields with
the required fields of the outer table and output the results.
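The four steps above can be sketched in Python as follows, with Loan as the outer table and an index on Book.BookId. The sample rows are invented, a dictionary stands in for the index structure, and the names seek and index_join are assumptions for this illustration.

```python
# Hypothetical rows; the lesson's Book and Loan tables are not reproduced here.
BOOK_FILE = [
    {"BookId": "ADM1001", "BookName": "Sample Title A"},
    {"BookId": "ADM3713", "BookName": "Sample Title B"},
]
LOAN_FILE = [
    {"BookId": "ADM3713", "DateOut": "27-10-2005"},
    {"BookId": "ADM1001", "DateOut": "28-10-2005"},
    {"BookId": "ADM1001", "DateOut": "27-10-2005"},
]

# Index on Book.BookId: a dict stands in for the index structure (assumption).
book_id_index = {rec["BookId"]: pos for pos, rec in enumerate(BOOK_FILE)}

def seek(index, value):
    """Return the pointer (list position) for value, or None if it is not indexed."""
    return index.get(value)

def index_join(loans, books, index, date_out):
    """Scan Loan once; use the index to fetch the matching Book record directly."""
    results = []
    for loan_rec in loans:
        if loan_rec["DateOut"] != date_out:
            continue  # step 2: skip loans that fail the DateOut criterion
        pointer = seek(index, loan_rec["BookId"])  # step 3: index lookup
        if pointer is not None:
            book_rec = books[pointer]  # step 4: read the record and output
            results.append((book_rec["BookId"], book_rec["BookName"]))
    return results
```

In this sketch the Book file is never scanned: each qualifying Loan record leads to one index lookup and one direct read.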
BOOK:
LOAN:
QUESTIONS
Consider what would happen if the index existed on the BookId field in the Loan
table.
1. Write the algorithm for the Loan table as the inner file with an index on BookId.
2. Under what conditions does the index join work well?