SQL Technology

SQL 101: Get Your Information in Order
http://www.oracle.com/technetwork/issue-archive/2011/11-sep/o51sql-453459.html

As Published In

September/October 2011
TECHNOLOGY: SQL 101

Get Your Information in Order
By Melanie Caffrey

Part 1 in a series on the basics of the relational
database and SQL
Ask most seasoned professionals working with Oracle Database instances today what their chief
complaint regarding performance issues is, and 9 times out of 10, they will answer with responses
along the lines of lack of SQL expertise, poorly written SQL statements, or poorly trained
database programmers. As relational databases have become a necessary part of everyday
business life, knowledge of structured query language (SQL) has become paramount.
Paradoxically, however, learning good SQL programming techniques has taken a backseat to
creating, for example, user-friendly, attractive interfaces written in database-agnostic
programming languages such as Java. Programmers may spend a great deal of time learning their
chosen or assigned interface language and very little to no time learning SQL.
This series of SQL 101 articles is for those new to or not yet completely familiar with relational
database concepts and SQL coding constructs. It is for anyone learning SQL, tasked with
teaching SQL to others within a workgroup, or managing programmers who write database
access code. This first article in the series begins with information about the basic building blocks
that all programmers (or DBAs/designers/managers) should know when writing their first set of
SQL statements.
How Is Data Organized in a Relational Database?
Being able to visualize how data is organized in a database is key to retrieving that data quickly
and easily. Whenever you withdraw money from an ATM, you are reading and manipulating data.
Whenever you purchase anything online, you are changing data. Whether you are banking,
shopping, or performing one of many business activities, you are likely interacting with a relational
database.
A relational database stores data in a two-dimensional matrix known as a table, and tables
generally consist of multiple columns and rows. (I say generally here because it is possible to
have a table with just one column and no rows, although it is not common. I will cover that
exception in later articles in this series.) Relational databases employ relational database
management system (RDBMS) software to help manage the task of giving a user the ability to
read and manipulate data without knowing the exact file and/or drive storage device location
where a particular piece of information can be found. (Oracle Database is, among other things, an
RDBMS.) Users need only know which tables contain the information they seek. The RDBMS
relies on SQL constructs and keywords, provided by users, to access the tables and the data
contained within the tables' columns and rows.
How Is Data Represented in a Relational Database?
Each table in a relational database usually contains information about a single type of data and
has a unique name, distinct from all other tables in that schema. A schema is typically a grouping
of objects (such as tables) that serve a similar business function. For example, three tables that
contain data about employees, departments, and payroll details, respectively, may exist together
inside a schema named HR. There can be only one table named EMPLOYEE inside the HR
schema (for the purposes of this introductory explanation, discussions regarding features that
support the coexistence of tables with the same name, such as Oracle Database 11g's editions
and Edition-Based Redefinition, are beyond the scope of this article). Now suppose the
information in an EMPLOYEE table includes the structure and content shown in Figure 1.
Figure 1: The EMPLOYEE table
A table consists of at least one column, and a column is a set of values of a particular type of
data. Like a table's name within a schema, a column's name is unique within a table and should
clearly identify the type of data it contains. For example, the EMPLOYEE table has columns for
the employee's first name (FIRST_NAME), last name (LAST_NAME), hire date (HIRE_DATE),
and manager (MANAGER) in this initial, scaled-down representation. In Figure 1, the employee
LAST_NAME Newton represents a single data element within the table. And the HIRE_DATE
14-SEP-2005 represents another single data element.
Further, each row in a table represents a single set of data. For example, the row of the
EMPLOYEE table with the FIRST_NAME Frances and LAST_NAME Newton represents a unique
set of data. Each intersection of a column and a row represents a value. However, note that some
values are not present. In Figure 1, not every row/column combination for MANAGER contains a
value. In this case, such values are said to be NULL. A null value is not a blank, a space, or a
zero. It is the absence of a value.
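The difference between NULL and an actual value can be sketched in SQL. The following is a minimal illustration only, assuming an empty scratch schema: the table is a cut-down stand-in for Figure 1, and the sample rows (including the MANAGER value 28) are invented for this example. Because NULL is the absence of a value, it is tested with IS NULL rather than with an equals sign.

```sql
-- Cut-down stand-in for the EMPLOYEE table in Figure 1 (sample values are illustrative)
CREATE TABLE employee
( first_name VARCHAR2(30),
  last_name  VARCHAR2(30),
  hire_date  DATE,
  manager    NUMBER );

-- Frances has no manager recorded: MANAGER is NULL, not zero and not a space
INSERT INTO employee VALUES ('Frances', 'Newton', DATE '2005-09-14', NULL);
INSERT INTO employee VALUES ('Donald', 'Newton', DATE '2006-09-24', 28);

-- Returns only the Frances Newton row
SELECT first_name, last_name
  FROM employee
 WHERE manager IS NULL;

-- Returns no rows: a comparison with NULL using = is never true
SELECT first_name, last_name
  FROM employee
 WHERE manager = NULL;
```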
The Key to Good Relations
For a row to be able to uniquely represent a particular set of data, it must be unique for the entire
set of row/column value intersections within the table. If the company using the EMPLOYEE table
were to hire another employee named Frances Newton on September 14, 2005, and Frances
Newton had no MANAGER value associated with her row in the EMPLOYEE table yet, the
original entry for Frances Newton would no longer be unique. This coincidence of identical data is
referred to as a duplicate. Duplicate entries should not be allowed in tables (more in subsequent
articles on why this is so). Therefore, the EMPLOYEE table requires a column that will ensure
uniqueness for every row, even if the company hires several employees with the same names
and employment details.
Enter the primary key. The primary key is a column that ensures uniqueness for every row in a
table. When a primary key is added to the EMPLOYEE table, the two Frances Newtons are no
longer alike, because one now has an EMPLOYEE_ID value of 37 and the other has an
EMPLOYEE_ID value of 73. Figure 2 illustrates the addition of the EMPLOYEE_ID primary key to
the EMPLOYEE table.
Figure 2: The EMPLOYEE_ID primary key
Note that the EMPLOYEE_ID value appears to have nothing specifically to do with the rest of the
column/row combination values that follow it within each row. In other words, it has nothing to do
with the employee data per se. This type of key is often a system-generated sequential number,
and because it has nothing to do with the rest of the data elements in the table, it is referred to as
a synthetic or surrogate key. Using such a key is advantageous in maintaining the uniqueness of
each row, because the key is not subject to updates and is therefore immutable. It is best to avoid
primary key values that are subject to changes, because they result in complexity that is almost
impossible to manage.
A table can have only one primary key, comprising one or several columns. A key comprising
more than one column is referred to as a composite or concatenated key. In some cases, a
primary key may not be necessary or even appropriate. In most cases, however, it is strongly
recommended that every table have a primary key. (Oracle Database does not require every
table to have a primary key, however.)
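As a rough sketch of how a surrogate primary key keeps the two Frances Newton rows apart, the statements below declare EMPLOYEE_ID as the primary key and insert both rows. This assumes an empty scratch schema; the constraint name EMPLOYEE_PK and the third INSERT are illustrative choices, not taken from the article's figures.

```sql
-- EMPLOYEE_ID is the surrogate primary key; duplicate or NULL key values are rejected
CREATE TABLE employee
( employee_id NUMBER CONSTRAINT employee_pk PRIMARY KEY,
  first_name  VARCHAR2(30),
  last_name   VARCHAR2(30),
  hire_date   DATE,
  manager     NUMBER );

-- Two otherwise identical rows are now distinguishable by their key values
INSERT INTO employee VALUES (37, 'Frances', 'Newton', DATE '2005-09-14', NULL);
INSERT INTO employee VALUES (73, 'Frances', 'Newton', DATE '2005-09-14', NULL);

-- This statement would fail with ORA-00001 (unique constraint violated),
-- because the key value 37 is already in use
INSERT INTO employee VALUES (37, 'Emily', 'Eckhardt', DATE '2004-07-07', NULL);
```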
Careful Consideration of Foreign Relations
Up to now, this discussion has focused on how data is organized in a single table. But a relational
database also connects (relates) tables and organizes information across multiple tables.
An important connector in a relational database is the foreign key, which identifies a column or set
of columns in one table that refers to a column or set of columns in another table. In addition to
connecting information, the use of foreign key relations between tables helps keep information
organized.
For example, if you store the department name of each employee alongside each employee's
details in the EMPLOYEE table, you may very well see the same department name repeated
across multiple employee listings. And any change to a department name would require that
name to subsequently be updated in every row for every employee in that department.
If, however, you split the data into two tables, EMPLOYEE and DEPARTMENT, as shown in
Figure 3, you will be simultaneously establishing a relationship between the tables by using a
foreign key and organizing the data to minimize updates and provide the best-possible data
consistency. Consider the resultant data split shown in Figure 3.
Figure 3: EMPLOYEE and DEPARTMENT tables with a foreign key relationship
In Figure 3, DEPARTMENT_ID is a foreign key column in the EMPLOYEE table; it links the
EMPLOYEE and DEPARTMENT tables together. You can find all employee details for every
employee in a particular department by looking at that DEPARTMENT_ID column in the
EMPLOYEE table. The DEPARTMENT_ID value corresponds to one row in the DEPARTMENT
table that provides department-specific information.
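The split described above might look like the following in SQL. This is a hedged sketch for an empty scratch schema: the DEPARTMENT_NAME column, the constraint names, and the sample rows are assumptions made for illustration, since the exact column list of Figure 3 is not reproduced here.

```sql
CREATE TABLE department
( department_id   NUMBER CONSTRAINT department_pk PRIMARY KEY,
  department_name VARCHAR2(30) );

-- DEPARTMENT_ID in EMPLOYEE is the foreign key: it must match a DEPARTMENT row
CREATE TABLE employee
( employee_id   NUMBER CONSTRAINT employee_pk PRIMARY KEY,
  first_name    VARCHAR2(30),
  last_name     VARCHAR2(30),
  department_id NUMBER CONSTRAINT employee_department_fk
                       REFERENCES department (department_id) );

INSERT INTO department VALUES (10, 'Accounting');
INSERT INTO employee VALUES (37, 'Frances', 'Newton', 10);

-- Renaming a department touches exactly one row, however many employees it has
UPDATE department
   SET department_name = 'Finance'
 WHERE department_id = 10;

-- All employees in department 10
SELECT employee_id, first_name, last_name
  FROM employee
 WHERE department_id = 10;
```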
Less Is More, or Less Is the Norm
Key to understanding relational databases is knowledge of data normalization and table
relationships. The objective of normalization is to eliminate redundancy and thereby avoid future
problems with data manipulation. The rules that govern how a database designer should go about
minimizing the duplication of data have been formulated into various normal forms. The design of
tables, columns, and primary and foreign keys follows these normalization rules, and the process
is called normalization.
There are many normalization rules. The most commonly accepted are the five normal forms and
the Boyce-Codd normal form. In my experience, many programmers, analysts, and designers do
not normalize beyond the third normal form, although experienced database designers may.
First and foremost. For a table to be in first normal form, all repeating groups must be moved to
a new table. Consider the example in Figure 4, in which several office location columns have
been added to the EMPLOYEE table.
Figure 4: EMPLOYEE table with office location columns (first normal form violation)
The table has multiple columns containing office location values for those employees who travel
frequently for work and are required to physically work in multiple office locations. Office location
values are listed in three columns: OFFICE_1, OFFICE_2, and OFFICE_3. What will happen
when one of these employees is required to work in an additional location?
To avoid problems or having to add yet another column, OFFICE_4, the database designer
moves the office location data to a separate table named EMPLOYEE_OFFICE_LOCATION, as
shown in Figure 5.
Figure 5: The EMPLOYEE and EMPLOYEE_OFFICE_LOCATION tables in first normal form
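One possible shape for the EMPLOYEE_OFFICE_LOCATION table is sketched below, assuming an empty scratch schema. The column datatypes, the constraint name, and the office names are illustrative assumptions; a foreign key back to EMPLOYEE is omitted to keep the sketch short.

```sql
-- One row per employee/office pair replaces the OFFICE_1, OFFICE_2, OFFICE_3 columns
CREATE TABLE employee_office_location
( employee_id NUMBER,
  office      VARCHAR2(30),
  CONSTRAINT employee_office_location_pk PRIMARY KEY (employee_id, office) );

INSERT INTO employee_office_location VALUES (37, 'New York');
INSERT INTO employee_office_location VALUES (37, 'Boston');

-- An additional location is just another row, not another column
INSERT INTO employee_office_location VALUES (37, 'Chicago');
```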
Second normal form and composite keys. Second normal form is a special-case normal form
that has to do with tables that have composite primary keys. A composite primary key includes
two or more columns.
In second normal form, all nonkey columns must depend on the entire primary key. In other
words, any nonkey columns added to a table with a composite primary key cannot be dependent
on only part of the primary key.
Figure 6 illustrates the EMPLOYEE_OFFICE_LOCATION table, and in this table the primary key
is a combination of the EMPLOYEE_ID and OFFICE columns, so any columns added to this table
must be dependent on both primary key columns. The OFFICE_PHONE_NUMBER column is
dependent solely on the OFFICE column, however; it has nothing to do with the EMPLOYEE_ID
column.
Figure 6: Second normal form violation
The EMPLOYEE_OFFICE_LOCATION table in Figure 6 is in violation of second normal form. For
this table to comply with second normal form, another table must be created and the
OFFICE_PHONE_NUMBER data must be moved to the new table.
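A sketch of that fix, again assuming an empty scratch schema: the new table is given the hypothetical name OFFICE here, and the datatypes and constraint names are assumptions. The phone number now lives with the one column it depends on.

```sql
-- Hypothetical OFFICE table: the phone number depends on the office alone
CREATE TABLE office
( office              VARCHAR2(30) CONSTRAINT office_pk PRIMARY KEY,
  office_phone_number VARCHAR2(20) );

-- EMPLOYEE_OFFICE_LOCATION keeps only columns that depend on the whole composite key
CREATE TABLE employee_office_location
( employee_id NUMBER,
  office      VARCHAR2(30) CONSTRAINT emp_office_loc_office_fk
                           REFERENCES office (office),
  CONSTRAINT employee_office_location_pk PRIMARY KEY (employee_id, office) );
```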
The key and nothing but the key. Third normal form expands on second normal form. It dictates
that every nonkey column must be a detail or a fact about the primary key. Figure 7 illustrates a
third normal form violation.
Figure 7: Third normal form violation
The addition of the DEPT_NAME column in this
EMPLOYEE table violates third normal form in that
DEPT_NAME is dependent on the DEPT_ID value,
not the EMPLOYEE_ID value. Complying with the
rules of third normal form necessitates creating
another table and moving the department name
values into this new table, with a foreign key
reference from the EMPLOYEE table to the new
table. (The solution in Figure 3 demonstrates third
normal form: DEPARTMENT_ID is a foreign key
column in the EMPLOYEE table.)

Where Does SQL Fit In?
With SQL you create new data, delete old data,
modify existing data, and retrieve data from a relational database. The following statement, for
example, creates a new EMPLOYEE table:
CREATE TABLE employee
( employee_id NUMBER,
  first_name  VARCHAR2(30),
  last_name   VARCHAR2(30),
  hire_date   DATE,
  salary      NUMBER(9,2),
  manager     NUMBER );

Note that this statement creates a table with the names of each column (aka data attribute) and
each column's respective datatype and length. For example, this line of the CREATE TABLE
statement:
employee_id NUMBER

creates a table column (data attribute) called EMPLOYEE_ID with a datatype of NUMBER. This
EMPLOYEE_ID column is therefore defined to contain only numeric data.
The SQL SELECT statement enables you to retrieve, or query, data. For example, the following
SQL statement retrieves all FIRST_NAME, LAST_NAME, and HIRE_DATE column values from
this article's EMPLOYEE table:
SELECT first_name, last_name, hire_date
  FROM employee;

As you can see, the syntax is fairly straightforward.
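The other operations mentioned at the start of this section (creating, modifying, and deleting data) follow a similar pattern. The statements below are a minimal sketch that assumes the EMPLOYEE table defined by the CREATE TABLE statement earlier in this section; the salary figures are invented for illustration.

```sql
-- Create new data: the column list follows the CREATE TABLE statement
INSERT INTO employee (employee_id, first_name, last_name, hire_date, salary, manager)
VALUES (37, 'Frances', 'Newton', DATE '2005-09-14', 75000, NULL);

-- Modify existing data
UPDATE employee
   SET salary = 80000
 WHERE employee_id = 37;

-- Retrieve data
SELECT first_name, last_name, salary
  FROM employee
 WHERE employee_id = 37;

-- Delete old data
DELETE FROM employee
 WHERE employee_id = 37;

-- Make the changes permanent
COMMIT;
```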
Conclusion
This article introduced the organization and structure of a relational database. It described tables,
columns, and rows and presented examples of data and how it is represented in a relational
database. The next article in this SQL 101 series will continue the discussion of data
normalization and introduce the SQL execution environment.
Melanie Caffrey is a senior development manager at Oracle. Caffrey is a
coauthor of Expert PL/SQL Practices for Oracle Developers and DBAs
(Apress, 2011).
SQL 101: Modeling and Accessing Relational Data
http://www.oracle.com/technetwork/issue-archive/2011/11-nov/o61sql-512018.html
As Published In

November/December 2011
TECHNOLOGY: SQL 101

Modeling and Accessing Relational
Data
By Melanie Caffrey

Part 2 in a series on the basics of the relational database and SQL
Part 1 in this series, "Get Your Information in Order" (Oracle Magazine, September/October
2011), introduced readers to relational databases and the language chiefly used to interact with
them: structured query language (SQL). Building on the material I presented in Part 1, this article
explains relationship concepts in more depth and outlines the process of designing a new
database. Then it introduces you to three tools available at no charge that you can use for
viewing and managing the data in an Oracle Database instance via SQL. (Although I'll briefly
review the concepts covered in Part 1, I encourage you to read that installment before starting
this one.)
Relating in Different Ways
Data is organized in a relational database as tables: two-dimensional matrices made up of
columns and rows. A table's primary key, enforced by a primary key constraint (which will be
defined in a future article in this series), is a column or combination of columns that ensures that
every row in a table is uniquely identified. Two tables that have a common column are said to
have a relationship between them; the common column is the foreign key in one table and the
primary key in the other. The value (if any) stored in a row/column combination is a data element.
The cardinality of a relationship is the ratio of the number (also called occurrences) of data
elements in two tables' related column(s). Relationship cardinality can be of three types:
one-to-many, one-to-one, or many-to-many.
One-to-many (1:M). The most common type of relationship cardinality is a 1:M relationship.
Consider the relationship between the EMPLOYEE and DEPARTMENT tables shown in Figure 1.
The common column is DEPARTMENT_ID (which is the primary key in the DEPARTMENT table
and the foreign key in the EMPLOYEE table). One individual DEPARTMENT_ID can relate to
many rows in the EMPLOYEE table. This represents the business rule that one department can
relate to one or many employees (or even no employees) and that an employee is associated
with only one department (or, in some cases, no department). This business rule can be restated
as follows: Each employee in a department may be in one and only one department, and that
department must exist in the department table.
Figure 1: EMPLOYEE and DEPARTMENT tables with a 1:M relationship
One-to-one (1:1). A 1:1 relationship between the DEPARTMENT table and the MANAGER table
is depicted in Figure 2. For every row in the DEPARTMENT table, only one matching row exists in
the MANAGER table. Expressing this as a business rule: Every department has at least one and
at most one manager, and, conversely, each manager is assigned to at least one and at most
one department.
Figure 2: DEPARTMENT and MANAGER tables with a 1:1 relationship
One-to-one data relationships can exist, but they are not typically implemented as two tables
(though they may be modeled that way). Instead, the data is combined into one table for the sake
of simplicity.
Many-to-many (M:M). Consider the EMPLOYEE and PROJECT tables in Figure 3. The business
rule is as follows: One employee can be assigned to multiple projects, and one project can be
supported by multiple employees. Therefore, it is necessary to create an M:M relationship to link
these two tables.
Figure 3: EMPLOYEE and PROJECT tables requiring an M:M relationship
To support the relational database model, an M:M relationship must be resolved into 1:M
relationships. Figure 4 illustrates this resolution with the creation of an associative table (also
sometimes called an intermediate or intersection table) named EMPLOYEE_PROJECT.
Figure 4: Associative EMPLOYEE_PROJECT table that resolves the M:M relationship
In this example, the associative table's primary key (a composite of its Employee_ID and
Project_ID columns) is foreign-key-linked to the tables for which it is resolving an M:M
relationship. It reflects that one employee can be assigned to multiple projects and, in this
example, that one employee can be assigned multiple and different responsibilities for each
project to which that person is assigned. Note that the employee with an EMPLOYEE_ID value of
1234 is assigned to two projects but that his responsibilities are different for each project.
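The resolution of an M:M relationship into two 1:M relationships can be sketched as follows, assuming an empty scratch schema. Apart from EMPLOYEE_ID, PROJECT_ID, and the employee number 1234, the column names, constraint names, and sample values are assumptions made for illustration rather than a copy of Figure 4.

```sql
CREATE TABLE employee
( employee_id NUMBER CONSTRAINT employee_pk PRIMARY KEY,
  last_name   VARCHAR2(30) );

CREATE TABLE project
( project_id   NUMBER CONSTRAINT project_pk PRIMARY KEY,
  project_name VARCHAR2(30) );

-- Each row pairs one employee with one project; the pair is the primary key
CREATE TABLE employee_project
( employee_id    NUMBER CONSTRAINT emp_proj_employee_fk
                        REFERENCES employee (employee_id),
  project_id     NUMBER CONSTRAINT emp_proj_project_fk
                        REFERENCES project (project_id),
  responsibility VARCHAR2(30),
  CONSTRAINT employee_project_pk PRIMARY KEY (employee_id, project_id) );

INSERT INTO employee VALUES (1234, 'Newton');
INSERT INTO project VALUES (1, 'Payroll upgrade');
INSERT INTO project VALUES (2, 'Office move');

-- Employee 1234 works on two projects, with a different responsibility on each
INSERT INTO employee_project VALUES (1234, 1, 'Analyst');
INSERT INTO employee_project VALUES (1234, 2, 'Team lead');
```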
Rendering Relational Roadmaps
For conceptual purposes, it is helpful to display table relationships of different types by using
database schema diagrams. (A schema is typically a grouping of objects, such as tables, that
serve a similar business function.) Schema diagrams, also known as physical data models, can
use several types of standard notation. The schema diagram in Figure 5 uses a convention called
crow's-foot notation, the standard notation used by Oracle SQL Developer Data Modeler, a
database modeling and design tool. With crow's-foot notation, a crow's foot, or fork, indicates the
"many" side of the relationship and a single line with an arrowhead indicates the "1" side of the
relationship.
Figure 5: A database schema diagram showing a mandatory foreign key relationship between the
EMPLOYEE and DEPARTMENT tables
The schema diagram in Figure 5 shows a mandatory relationship between the EMPLOYEE table
and the DEPARTMENT table. A mandatory relationship, indicated by the solid line between the
crow's foot and the arrowhead, is one in which a value must be present in a foreign key column.
In Figure 5, this means that every employee must be assigned to at most one department and at
least one department. If the relationship were optional, the relationship line would be dotted
instead of solid, indicating that an employee can be assigned to one or no department. (In Figure
5, the "P" in the left margin indicates the primary key, and the "F" indicates a foreign key.)
Paving the Way with Analysis and Design
Before developers, DBAs, or database architects create tables, or even schema diagrams, they
gather data requirements that identify the needs of the system users. Among other things,
requirements gathering should result in a list of individual data elements the users consider
important and need to store. The developer or DBA then groups what the users consider to be the
most important data elements in this list into entities. The individual data elements are known as
the entities' attributes.
This creation of entities and their attributes, along with the necessary entity relationships based on
business rules, is often referred to as logical data modeling. It is logical because it doesn't take
into consideration a specific technical (or physical) implementation. Entity is the logical term for a
physical implementation's table. And attribute (or, sometimes, field) is the logical term for column.
A diagram depicting a logical data model is known as an entity-relationship diagram (ERD).
Specialized tools such as Oracle SQL Developer Data Modeler facilitate the generation of ERDs.
After you have created a logical data model and chosen your physical technical implementation,
such as Oracle Database 11g, you are ready to create your database schema diagram (the
physical model). When you create the schema diagram, you assign a datatype to each of your
tables' columns. The datatype specifies the type of data (such as numeric, character, or date) that
can be stored in each column.
Normalization Versus Denormalization
Data normalization is the process, based on widely accepted rules, that developers and DBAs use
for tables to eliminate redundancy from data. Conversely, denormalization is the act of adding
redundancy to data. When designing a physical database model, database designers must weigh
the advantages of eliminating all redundancy (resulting in data that is split into many tables)
against possible reduced query (data retrieval) performance when some or many of these tables
are joined together via SQL. (You'll learn more about the role of the JOIN clause in SQL queries
later in this series.) Only experienced database designers should denormalize. Increasing
redundancy might marginally improve query performance, but it will always increase the overall
programming effort and complexity, because multiple copies of the same data must be kept in
sync. The process of syncing multiple copies of data threatens data integrity.
Data Access and the SQL Execution Environment
Oracle software runs on many different hardware architectures and operating systems. The
computer on which the Oracle Database software resides is known as the Oracle Database
server. Additionally, Oracle Database server can refer to the Oracle Database software and its
data. (The remainder of this article refers to the latter definition.) Specialized tools installed on
users' computers enable them to access data on the Oracle Database server. These tools,
called clients, or front ends, are used to send SQL commands to the server, or back end. Three
such tools are Oracle SQL Developer, Oracle's SQL*Plus, and Oracle Application Express SQL
Workshop.
SQL commands instruct the server to perform certain actions in the database. A command can
create a table, query a table, change a table, add new data, or update existing data, among other
things. In response to a query request, for example, the server returns a result set to the client,
which then displays it to the user.
Before you can begin to use any of these tools to communicate SQL requests to the Oracle
Database server, you must create a database connection. If you are using SQL Developer, read
the "1.4 Database Connections" section in the Oracle SQL Developer User's Guide Release 2.1,
to learn how to set up a connection to your database. If you are using SQL*Plus or SQL
Workshop within Oracle Application Express, ask your database administrator to create a
database connection for you. You also need a database administrator to create a username and a
password for you, with appropriate permissions that enable you to create your own objects. See
the "CREATE USER" section of the Oracle Database SQL Language Reference 11g Release 2
(11.2).
Using Oracle SQL Developer
Once you are connected to the database, viewing data in the Oracle SQL Developer environment
is relatively straightforward. Figure 6 shows the Tables node within the Connections Navigator, a
tree-based object-browser pane in Oracle SQL Developer.
Figure 6: Tables node in the Oracle SQL Developer Connections Navigator
To view the details of any of your tables, expand the Tables node by clicking the plus sign and
then double-clicking the individual table name. Figure 7 shows the result of double-clicking the
EMPLOYEE table in the Connections Navigator. The table's column names are displayed
vertically in the Connections Navigator pane, and several tabs that provide details about the table
are displayed to the right of that pane.
Figure 7: Column list and detail tabs of the EMPLOYEE table in Oracle SQL Developer
Columns and Datatypes
By default, the Columns tab is displayed first. It lists
the table's column names and datatypes. It also
shows which columns allow null values (that is, the
absence of a value), which column or columns are
defined as the table's primary key, and any column
comments. (You can see that no primary key has
been defined for the EMPLOYEE table in Figure 7.
You'll learn how to create a primary key later in this
article series.)
Every column has a datatype, chosen during
physical modeling and defined when the table was
created. For example, the datatype for the SALARY
column in the EMPLOYEE table is NUMBER. Any
column defined with the NUMBER datatype permits
only numeric data. No text and no alpha characters
such as monetary symbols may be stored in a
column defined with this datatype. Note that the
SALARY column's datatype is defined as NUMBER(9, 2). The first number in the parentheses (9
in this example) is referred to as the precision, and the second number (2 in this example) is
known as the scale. This precision and this scale mean that the SALARY column can have a
maximum of nine digits, with a maximum of two digits after the decimal point (useful for columns
containing monetary data). If a value with more than two digits after the decimal point is inserted
into the SALARY column, no error will occur; the value will simply be automatically rounded by the
Oracle Database server.
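The rounding behavior can be tried out with a throwaway table. This is a small sketch, assuming a scratch schema; the table name SALARY_DEMO and the values are hypothetical and are not part of the article's EMPLOYEE example.

```sql
-- Hypothetical one-column table used only to demonstrate precision and scale
CREATE TABLE salary_demo
( salary NUMBER(9,2) );

-- Three digits after the decimal point: stored as 1234.57, with no error
INSERT INTO salary_demo VALUES (1234.567);

-- Seven digits before the decimal point plus two after it: the largest value that fits
INSERT INTO salary_demo VALUES (9999999.99);

-- Eight digits before the decimal point would fail with ORA-01438
-- (value larger than specified precision allowed for this column)
INSERT INTO salary_demo VALUES (12345678.9);

SELECT salary
  FROM salary_demo;
```

Note that scale only rounds the fractional part; a value with too many digits before the decimal point is rejected rather than rounded.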
As you might have guessed, columns defined with the VARCHAR2 datatype store variable-length
alphanumeric data (including text, numbers, and special characters). The maximum length of the
data is specified in the parentheses. VARCHAR2 is the most commonly used datatype and can
store up to 4,000 bytes. In contrast, the CHAR datatype (not shown in Figure 7) allows for fixed-
length alphanumeric data only and stores a maximum of 2,000 bytes. I recommend choosing
VARCHAR2 over CHAR when you select an alphanumeric datatype to define text columns. With
VARCHAR2, particularly if you choose an appropriate maximum length for your columns, you
won't waste storage space. (Choosing a maximum of 4,000 bytes makes sense only if you truly
believe that your column will ever contain data values that reach that size limit.) CHAR can waste
storage space, because the data is always padded with blank space to the specified fixed length
before it is stored.
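The padding difference is easy to observe with the LENGTH function. The sketch below assumes a scratch schema; the table NAME_DEMO and its column names are hypothetical.

```sql
-- Hypothetical table holding the same text in a CHAR and a VARCHAR2 column
CREATE TABLE name_demo
( fixed_name    CHAR(10),
  variable_name VARCHAR2(10) );

INSERT INTO name_demo VALUES ('Newton', 'Newton');

-- LENGTH reports 10 for the blank-padded CHAR value and 6 for the VARCHAR2 value
SELECT LENGTH(fixed_name)    AS fixed_length,
       LENGTH(variable_name) AS variable_length
  FROM name_demo;
```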
SQL*Plus and SQL Workshop
It's a good idea to familiarize yourself with multiple data access tools so that you can decide
which one works best for you and so that you can access Oracle Database data in settings
(such as a client site, if you are a consultant) where your preferred tool might not be available.
SQL*Plus, a command-line-based utility that's the tool of choice for many old-school Oracle
Database programmers and DBAs, is always available, because it is included with every Oracle
Database installation. When invoked, the SQL*Plus environment looks like the output shown in
Figure 8.
Figure 8: SQL*Plus environment immediately after login
To view a table's columns and each column's datatype in SQL*Plus, you use the DESCRIBE
command (desc for short), as shown in Figure 9.
Figure 9: Output of the DESCRIBE command in SQL*Plus
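For reference, the command itself, together with a data dictionary query that returns roughly the same information, might be entered as shown below. This sketch assumes that an EMPLOYEE table exists in the schema you are connected to; USER_TAB_COLUMNS is the standard Oracle dictionary view listing the columns of your own tables.

```sql
-- SQL*Plus command (not a SQL statement), so no terminating semicolon is required
DESCRIBE employee

-- Roughly the same information, read from the data dictionary with a query;
-- table names are stored in uppercase unless they were created with quoted identifiers
SELECT column_name, data_type, data_precision, data_scale, nullable
  FROM user_tab_columns
 WHERE table_name = 'EMPLOYEE'
 ORDER BY column_id;
```

The dictionary query has the advantage of working from any client that can run a SELECT statement, not only from SQL*Plus.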
If you're using Oracle Application Express, you can use any data access tool you'd like, but Oracle
Application Express does have its own built-in data access functionality, called SQL Workshop.
SQL Workshop contains several subutilities. One of them is the Object Browser, which behaves
similarly to the Connections Navigator in Oracle SQL Developer. Double-clicking a table name in
a list displays the details for that table in the right-hand pane. Figure 10 shows the output for the
EMPLOYEE table and column details displayed in SQL Workshop.
Figure 10: Output of the EMPLOYEE table's details in Oracle Application Express SQL
Workshop
Conclusion
Analyzing requirements, planning how data entities should relate to one another, and modeling
those entities and their relationships both logically and physically are all essential steps in
database design. Knowing your users' business needs can help you create meaningful entities
and relationships and define appropriate attributes with reasonably sized, well-chosen datatypes.
Once you've defined and created your physical tables, you have many data access options to
choose from, including the three no-cost tools from Oracle you've learned about in this article.
The next installment of SQL 101 will look at the anatomy of a SQL SELECT statement and
explain how to use multiple data access tools to construct SQL statements.
Melanie Caffrey is a senior development manager at Oracle. She is a
coauthor of Expert PL/SQL Practices for Oracle Developers and DBAs
(Apress, 2011) and Expert Oracle Practices: Oracle Database Administration
from the Oak Table (Apress, 2010).
SQL 101: Getting Answers with SELECT
http://www.oracle.com/technetwork/issue-archive/2012/12-jan/o12sql-1408573.html
As Published In
January/February 2012
TECHNOLOGY: SQL 101

Getting Answers with SELECT
By Melanie Caffrey

Part 3 in a series on the basics of the relational database and SQL
Part 2 in this series, "Modeling and Accessing Relational Data" (Oracle Magazine,
November/December 2011), introduced readers to the ways data entities (tables) can relate to
one another in a relational database. When your logical models and physical implementations use
meaningful entities and well-chosen datatypes, you have multiple options for accessing the data.
This article focuses on the purpose and anatomy of the SQL SELECT statement (also called a
query) and explains how to use Oracle SQL Developer and Oracle Application Express to
construct queries and view their results. (Although I'll briefly review the concepts covered in Part
2, I encourage you to read that installment before starting this one.)
It All Begins with a Query
The goal of writing a SQL query is usually to get the answer from the database to a question or
questions. For example, you might want to ask:
How many employees work in the accounting department?
Of those employees, which ones are currently working on multiple projects?
Which employees working on multiple projects in the accounting department have received a
salary increase between their date of hire and today, and which employees haven't?
You obtain the answers to these questions by using a SQL SELECT statement. A SELECT statement
has at least two parts: the SELECT list and the FROM clause. The SELECT list specifies one or
more columns (or expressions, to be explained in subsequent installments of this series),
selected from one or more tables, that you want to display. The FROM clause lists the table(s)
from which your desired column data should be obtained.
Know Your Data
Before you write a SELECT statement, you must determine which table or tables contain the
information of interest. For example, if you want to know all employees' hire dates, you must first
determine which table contains employee information. Perusal of your schema diagram reveals
that employee data is in a table called EMPLOYEE. You can then use the following SELECT
statement:

SELECT first_name, last_name, hire_date
FROM employee

The SELECT list in the above statement specifies three columns, listing the first name, last name,
and date of hire for every employee contained in the EMPLOYEE table, which is specified in the
FROM clause. (To specify multiple columns in a SELECT list, you separate the column names with
commas; a good practice is to insert a space after each comma for readability.)
When the above statement is executed, the result set is a list of all the values found in the
first_name, last_name, and hire_date columns of the EMPLOYEE table, as shown in Listing 1.
Code Listing 1: SELECT statement result for three columns

SELECT first_name, last_name, hire_date
FROM employee

FIRST_NAME   LAST_NAME   HIRE_DATE
-----------  ----------  ---------
Frances      Newton      14-SEP-05
Emily        Eckhardt    07-JUL-04
Donald       Newton      24-SEP-06
Matthew      Michaels    16-MAY-07

Everything with a Mere *
If you want to display all the columns for a particular table, you can use the asterisk (*) wildcard
character as the SELECT list instead of typing the name of every column. For example

SELECT *
FROM employee

When this statement executes, the result set displays the columns in the order in which they are
defined in the table, as shown in Listing 2.
Code Listing 2: SELECT statement result for all columns

SELECT *
FROM employee

EMPLOYEE_ID  FIRST_NAME  LAST_NAME  HIRE_DATE   SALARY  MANAGER  DEPARTMENT_ID
-----------  ----------  ---------  ----------  ------  -------  -------------
         37  Frances     Newton     2005-09-14   75000
         28  Emily       Eckhardt   2004-07-07  100000
       1234  Donald      Newton     2006-09-24   80000       28             10
       7895  Matthew     Michaels   2007-05-16   70000       28             10

4 rows selected

This is the same column order you see when you issue the DESCRIBE command (or when you
click the Columns tab in Oracle SQL Developer), as shown in Listing 3.
Code Listing 3: DESCRIBE result for the EMPLOYEE table

describe employee

Name           Null  Type
-------------  ----  ------------
EMPLOYEE_ID          NUMBER
FIRST_NAME           VARCHAR2(30)
LAST_NAME            VARCHAR2(30)
HIRE_DATE            DATE
SALARY               NUMBER(9,2)
MANAGER              NUMBER
DEPARTMENT_ID        NUMBER

7 rows selected

You should use the asterisk wildcard character primarily for ad hoc querying, when you want an
answer from the database that you have not already asked for via programmatic code. When you
include SELECT statements in programmatic blocks of code (which you'll learn about in
subsequent articles in this series), it is a good practice to list your columns of interest by name in
your SELECT lists.
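As a brief sketch of that practice, the following query names every column reported by the DESCRIBE output in Listing 3 instead of relying on the asterisk. The column names come from that listing; naming all seven is only for illustration, and in practice you would list just the columns you need.

```sql
-- Explicit SELECT list: returns the same columns as SELECT *
-- does today, but by name, so the query keeps returning these
-- columns in this order even if the table definition changes.
SELECT employee_id, first_name, last_name, hire_date,
       salary, manager, department_id
  FROM employee;
```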
SELECT with Oracle SQL Developer
In Oracle SQL Developer, an easy way to construct a SELECT statement is to drag and drop a
table name from the TABLES node in the Connections Navigator into the SQL Worksheet. This
action automatically creates an editable SELECT statement in the SQL Worksheet whose select
list includes all the columns in the table. Figure 1 shows the result of dragging and dropping the
EMPLOYEE table into the Oracle SQL Developer SQL Worksheet.
Figure 1: Result of dragging and dropping the EMPLOYEE table into the SQL Worksheet
Figure 2 shows the SQL Worksheet icons.
Figure 2: The SQL Worksheet icon tool bar
The leftmost green arrow in Figure 2 is the Execute Statement icon. When you want to obtain the
results for a single statement, place your cursor anywhere on the statement line and click the
Execute Statement icon. The results appear on the Results tab, as shown in Figure 3.
Figure 3: The Results tab in the SQL Worksheet
In the tool bar, the small green arrow superimposed on the image of a piece of paper is the Run
Script icon. By clicking it, you execute a SQL*Plus-like script consisting of multiple statements (as
I'll illustrate in the next article in this series). The results are displayed on the Script Output tab,
as shown in Figure 4.
Figure 4: The Script Output tab in the SQL Worksheet
Build and Run a SELECT Statement with Oracle Application Express
You can also construct a SELECT statement in the SQL Commands window of Oracle Application
Express SQL Workshop, a Web-based interface to the database. The SQL Workshop SQL
Commands window has no drag-and-drop facility, so you must type your statement explicitly.
Next, click Run to see your result set in the Results section of SQL Workshop, as shown in Figure
5. The results format is similar to that used on the Results tab of the SQL Worksheet, as you can
see by comparing Figure 5 with Figure 3.
Figure 5: The Results section in SQL Workshop
Constructing a SELECT statement in the SQL Commands window of the SQL Workshop in Oracle
Application Express is similar to constructing a SELECT statement in SQL*Plus (as I will illustrate
in the next article in this series).
Eliminate Redundancy with Distinction
As you know from previous installments in this series, one of your database design goals should
be to eliminate redundancy. Sometimes, however, the way you select data might cause the results
to include duplicate values. Using the DISTINCT or UNIQUE keyword in your SELECT list helps
you eliminate duplicate data from your result sets.
In the example in Figure 6, four rows are returned yet only two employees are assigned to
departments. Frances Newton and Emily Eckhardt have NULL values for DEPARTMENT_ID.
Figure 6: Employee first and last name data with corresponding departments
If you want to display only the distinct (or unique) DEPARTMENT_ID values in the EMPLOYEE
table, you can construct a SELECT statement like the one in Figure 7.
Figure 7: A DISTINCT list of the DEPARTMENT_ID values in the EMPLOYEE table
Using the DISTINCT keyword to query a table containing only a few rows (as in this example) is
probably unnecessary, because duplicate data would be obvious in the full results. But in a table
with hundreds or thousands of EMPLOYEE records, it might not be at all obvious which
departments are represented (or not).
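Because Figure 7 is a screenshot, the statement it illustrates is sketched here as text. This is an assumption about what the figure contains, based on its caption and on the column names in Listing 3, not a copy of it.

```sql
-- One row per distinct DEPARTMENT_ID value in EMPLOYEE.
-- All null DEPARTMENT_ID values collapse into a single row.
SELECT DISTINCT department_id
  FROM employee;

-- In Oracle SQL, UNIQUE can be used in place of DISTINCT here.
SELECT UNIQUE department_id
  FROM employee;
```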
Improve Readability Through Consistent Formatting
The more consistently code is formatted, the easier it is to read. The easier code is to read, the
easier it is for people reviewing it to discover obvious or potential bugs and suggest improvements.
If your IT management insists that all developers adhere to a standard code format, Oracle SQL
Developer's formatting facilities can help you comply with such mandates more easily.
For example, this article's examples show a mix of uppercase and lowercase keywords. However,
your environment's standards might dictate that you use a particular casing style. Oracle SQL
Developer provides several methods to help you achieve consistency. At a minimum, you can
make a statement's keywords all uppercase, lowercase, or initial-capped by highlighting the
statement, right-clicking in the SQL Worksheet, and selecting To Upper/Lower/InitCap (or typing
Ctrl+Quote), as shown in Figure 8.
Figure 8: Changing keyword case
Figure 9 shows the result of changing a statement's keywords to uppercase via the mechanism
illustrated in Figure 8.
Figure 9: Keyword case changed
Another way to control your code's formatting is to right-click in the SQL Worksheet and choose
Format (or type Ctrl-F7). (Be aware that selecting this option affects all the code in the SQL
Worksheet, as of Oracle SQL Developer Release 3.0.04.) To set your preferences for this option,
select Tools -> Preferences -> Database -> SQL Formatter -> Oracle Formatting and click
Edit. Figure 10 shows some of the available formatting options.
Figure 10: SQL Formatter options in Oracle SQL Developer
Finishing Your Thought
You might occasionally need to refer to your schema diagram to identify the table(s) you want to
include in a query or to look up the syntax for correct statement construction in the Oracle
documentation. The code completion facility in Oracle SQL Developer helps you with both tasks. If
you pause while typing your statement, the code completion facility will prompt you with a list of
appropriate table names, column names, and commands you can select from. Figure 11 shows an
example of this feature in action.
Figure 11: Code completion facility in Oracle SQL Developer
Highlight Your Code
Syntax highlighting in Oracle SQL Developer marks the SQL language keywords in your code with
a color different from that of the table names, column names, and other statement criteria. When
this feature is enabled, your SQL language commands appear by default in blue and other
statement criteria appear in black. Syntax highlighting, illustrated in Figure 12, can greatly improve
your code's readability, enabling you and others to spot errors more readily.
Figure 12: Syntax highlighting in Oracle SQL Developer
Syntax highlighting, along with the other Oracle SQL Developer formatting facilities I've described
in this article, can be edited or disabled via Tools -> Preferences. By default, they are enabled
and exhibit the behavior and results shown in this article.
Conclusion
This article has shown you how to construct and execute simple SQL SELECT statements with
Oracle SQL Developer and the SQL Workshop SQL Commands facility in Oracle Application
Express. You've also seen how the formatting, syntax highlighting, and code completion facilities
in Oracle SQL Developer can enhance your code's readability and accuracy.
The next installment of SQL 101 will examine the WHERE and ORDER BY clauses of a SQL
statement and take a closer look at Oracle's SQL*Plus tool.
Melanie Caffrey is a senior development manager at Oracle. She
coauthored Expert PL/SQL Practices for Oracle Developers and DBAs
(Apress, 2011) and Expert Oracle Practices: Oracle Database
Administration from the Oak Table (Apress, 2010).
SQL 101: Why WHERE Matters
http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22sql-1494267.html[8/30/2014 12:32:17 PM]
As Published In

March/April 2012
TECHNOLOGY: SQL 101

Why WHERE Matters
By Melanie Caffrey

Part 4 in a series on the basics of the relational database and SQL
Part 3 in this series, Getting Answers with SELECT Statements (Oracle Magazine,
January/February 2012), introduced the anatomy of a SELECT statement (or query) and the
importance of ascertaining which tables contain data of interest. Now that you're familiar with a
SELECT statement's basic functionality, you can start filtering your data to limit the output in
meaningful ways. The WHERE clause enables you to narrow the scope of the data a SELECT
statement retrieves (or fetches). WHERE, and its associated comparison and logical operators,
are the focus of this article.
To try out the examples in this and subsequent articles in the series, you need access to an
Oracle Database instance. If necessary, download and install an Oracle Database edition for your
operating system. I recommend installing Oracle Database, Express Edition.
If you install the Oracle Database software, choose the installation option that enables you to
create and configure a database. A new database, including sample user accounts and their
associated schemas, will be created for you. (Recall from Part 1 of this series that a schema is
typically a grouping of objects, such as tables, that serve a similar business function.) SQL_101 is
the user account you'll use for the examples in this article; it's also the schema in which you will
create database tables and other objects. When the installation process prompts you to specify
schema passwords, enter and confirm passwords for SYS and SYSTEM and make a note of
them. Finally, whether you installed the database software from scratch or have access to an
existing Oracle Database instance, download and unzip the SQL script and run it to create the
example tables for the SQL_101 schema.
The SQL queries in this article are executed against tables in the SQL_101 schema with the
SQL*Plus tool.
Setting Limits by Comparing
To filter the data a query retrieves, you add a WHERE clause (also called a predicate list or a set
of conditions) to your SQL statement. In a nutshell, the WHERE clause specifies criteria that
must be met before records are included in your query result set. The WHERE clause must
specify a WHERE clause condition (or conditions) that the database software evaluates to be true
or false; alternatively, the software can determine the absence of a value. A WHERE clause
consists of conditional expressions. A conditional expression takes the form
<left_expression> <as compared with> <right_expression>
Here are some examples of common types of conditional expressions:
WHERE <column_name> = <literal_character_value>
WHERE <column_name> IN (3, 7, 9)
WHERE <column_name> >= 100
WHERE <column_name> LIKE 'E%'
WHERE <column_name> BETWEEN 100 AND 500

A literal character value, or string, is any list of alphanumeric characters enclosed in single
quotation marks, such as 'Smith', '73abc', or '15-MAR-1965'.
Comparison operators compare expressions to determine the appropriate data for selection. Table
1 shows commonly used comparison operators.

Operator              Definition                                Example
=                     Equal                                     WHERE last_name = 'Michaels'
!=, <>                Not equal                                 WHERE salary <> 100000
>, >=                 Greater than; greater than or equal to    WHERE salary >= 70000
<, <=                 Less than; less than or equal to          WHERE salary <= 85000
IN (...)              List of values                            WHERE SALARY IN (70000, 85000, 100000)
BETWEEN ... AND ...   Inclusive of two values (and all values   WHERE SALARY BETWEEN 70000 and 100000
                      between them)
LIKE                  Does pattern matching with wildcard       WHERE first_name LIKE 'F%'
                      characters % and _
IS NULL, IS NOT NULL  Tests for null values; tests for          WHERE manager IS NULL
                      non-null values
Table 1: SQL WHERE clause comparison operators
The Importance of (In)equality
The most commonly used comparison operator is the equality operator, =. For example, if you
wanted to find out the names and hire dates of all employees with an annual salary of $70,000,
you could execute the SQL query in Listing 1.
Code Listing 1: Query for finding employees whose salary equals $70,000
SQL> select first_name, last_name, hire_date, salary
2 from employee
3 where salary = 70000;
FIRST_NAME LAST_NAME HIRE_DATE SALARY
----------- ------------- ------------ -----------
Matthew Michaels 16-MAY-07 70000
1 row selected.

The value stored in the SALARY column is compared with the literal value 70000 to determine
whether the values are equal. Each row that satisfies the WHERE clause condition is retrieved.
Sometimes you might want to exclude certain data from your query results. For example, after the
query and result in Listing 1, you already know the name, hire date, and salary of the employee
named Matthew Michaels. To get the same information for all other employees, you could execute
the query in Listing 2. As you can see, the query uses the inequality operator, !=, and retrieves
every row except the one with the LAST_NAME value of Michaels.
Code Listing 2: Query that excludes the employee Michaels
SQL> select first_name, last_name, hire_date, salary
2 from employee
3 where last_name != 'Michaels';
FIRST_NAME LAST_NAME HIRE_DATE SALARY
----------- ------------- ------------ -----------
Frances Newton 14-SEP-05 75000
Emily Eckhardt 07-JUL-04 100000
Donald Newton 24-SEP-06 80000
3 rows selected.

Be aware that when you compare a database column value with a character literal, or string, the
case of the data contained in the database column must, by default, exactly match the case of the
data contained in the string. The query in Listing 3 returns no rows, because the case of the string
denoting the employee's last name is different from that of the data stored in the EMPLOYEE
table's LAST_NAME column.
Code Listing 3: Query using a literal value (case-sensitive) in a WHERE clause condition
SQL> select first_name, last_name, hire_date, salary
2 from employee
3 where last_name = 'MICHAELS';
no rows selected

You'll learn about converting string literal data to match the case of data contained in a database
column (and vice versa) in subsequent articles in this series.
Note in the example in Listing 3 that when you compare a string literal with a database column
value, you must enclose the string literal value in single quotation marks. The same requirement is
true for comparing date literals with database column values.
Any two values you compare with each other must be of the same datatype. Compare only
numbers with numbers, strings with strings, and dates with dates. Whenever possible, Oracle
Database will perform an implicit datatype conversion, but in general, you should avoid allowing
Oracle Database to do so. The query in Listing 4 will return a result, but as a best practice, you
should never compare a number with a string.
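Code Listing 4: Query that performs an implicit datatype conversion
SQL> select first_name, last_name, hire_date, salary
2 from employee
3 where salary = '70000';
FIRST_NAME LAST_NAME HIRE_DATE SALARY
----------- ------------- ------------ -----------
Matthew Michaels 16-MAY-07 70000
1 row selected.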
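One way to avoid relying on implicit conversion is sketched below: compare like with like, and convert explicitly when a value arrives as a string. The TO_NUMBER and TO_DATE conversion functions are standard Oracle SQL, but the specific literals and the date format mask are illustrative assumptions, and the date comparison assumes that HIRE_DATE values carry no time-of-day component.

```sql
-- Compare a number with a number: no conversion is needed.
select first_name, last_name, hire_date, salary
  from employee
 where salary = 70000;

-- If the value arrives as a string, convert it explicitly
-- rather than leaving the conversion to the database.
select first_name, last_name, hire_date, salary
  from employee
 where salary = TO_NUMBER('70000');

-- Compare a date with a date by converting the string explicitly.
-- The format mask is an assumption about how the date is written;
-- the month abbreviation assumes an English date language setting.
select first_name, last_name, hire_date
  from employee
 where hire_date = TO_DATE('16-MAY-2007', 'DD-MON-YYYY');
```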

The Range of Inclusion
Sometimes you need to obtain a set of records (rows) that falls within a certain range of values.
You can do so with the BETWEEN operator, as in Listing 5.
The results of a BETWEEN operation can include the listed values that define the range.
Therefore, in the example in Listing 5, the result list includes an employee with a salary of
$75,000, the lower end of the range, along with one whose salary of $80,000 is between the upper
and lower listed values. The BETWEEN operator is used most often for number and date
comparisons.
Code Listing 5: Query for selecting records within a range of values
SQL> select first_name, last_name, salary
2 from employee
3 where salary BETWEEN 75000 and 85000;
FIRST_NAME LAST_NAME SALARY
----------- ------------- ---------
Frances Newton 75000
Donald Newton 80000
2 rows selected.
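Because BETWEEN includes both endpoints, the range condition in Listing 5 can also be written with the >= and <= operators. The following sketch is an equivalent form of that query, shown only to make the inclusive behavior explicit.

```sql
-- Equivalent to: where salary BETWEEN 75000 and 85000
-- Both boundary values are included in the range.
select first_name, last_name, salary
  from employee
 where salary >= 75000
   and salary <= 85000;
```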

The Greater and the Lesser
The comparison operators >, >=, <, and <= are useful if you need to obtain a set of records that
fall either above or below certain criteria. In Listing 6, the less than or equal to operator, <=, is
used to fetch a list of employees whose yearly salary is less than or equal to $80,000.
Code Listing 6: Query using less than or equal to operator
SQL> select first_name, last_name, salary
2 from employee
3 where salary <= 80000;
FIRST_NAME LAST_NAME SALARY
----------- ------------- ---------
Frances Newton 75000
Donald Newton 80000
Matthew Michaels 70000
3 rows selected.

Match What You Like
Whenever you don't know or remember the exact spelling of a data value such as a name or you
suspect data corruption (incorrect values in your database), you may want to perform an inexact
search. The LIKE operator can help you carry out such a task. This operator performs pattern
matching by using wildcard characters. The underscore (_) wildcard denotes a single character,
and the percentage (%) wildcard denotes any number of characters (including zero characters).
The query in Listing 7 obtains records in which the last name begins with the uppercase letter N
and contains the lowercase letter w. In the query in Listing 7, an unknown number of characters
can exist between the N and the w, and an unknown number of characters can exist after the w,
hence the use of two % wildcards in the expression.
Code Listing 7: Query using LIKE operator with literal string and wildcard values
SQL> select first_name, last_name, salary
2 from employee
3 where last_name like 'N%w%';
FIRST_NAME LAST_NAME SALARY
----------- ------------- ---------
Frances Newton 75000
Donald Newton 80000
2 rows selected.

Consider the query in Listing 8. In this example, the WHERE clause limits the result set to rows in
which the last name begins with any two characters, has a lowercase letter w as the third
character, and either ends at the third character or continues with any number of additional
characters. You can place the % or _ wildcard character anywhere within a literal character string
(which, as always, must be enclosed in single quotation marks).

Code Listing 8: Query using LIKE operator with wildcard and literal values
SQL> select first_name, last_name
2 from employee
3 where last_name like '__w%';
FIRST_NAME LAST_NAME
----------- -------------
Frances Newton
Donald Newton
2 rows selected.
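As a further sketch of how the two wildcards differ, the following queries use patterns that are illustrative only and do not appear in the original listings. The first fixes the position of a character with a single underscore; the second uses underscores alone, with no % wildcard, to match values of an exact length.

```sql
-- Last names whose second character is a lowercase e.
select first_name, last_name
  from employee
 where last_name like '_e%';

-- Last names that are exactly six characters long:
-- six underscores and no % wildcard.
select first_name, last_name
  from employee
 where last_name like '______';
```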

The IN Crowd
The IN operator evaluates a comma-delimited list of values enclosed within a set of parentheses.
The query in Listing 9 uses the IN operator to retrieve employees who have an annual salary of
$75,000, $85,000, or $100,000.
Code Listing 9: Query using IN operator with a list of values
SQL> select first_name, last_name, salary
2 from employee
3 where salary in (75000, 85000, 100000);
FIRST_NAME LAST_NAME SALARY
----------- ------------- ---------
Frances Newton 75000
Emily Eckhardt 100000
2 rows selected.
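An IN condition is shorthand for a series of equality comparisons joined by OR. The sketch below rewrites the Listing 9 condition in that longer form and then shows, as an illustrative assumption about what you might ask, an IN list made up of string literals.

```sql
-- Equivalent to: where salary in (75000, 85000, 100000)
select first_name, last_name, salary
  from employee
 where salary = 75000
    or salary = 85000
    or salary = 100000;

-- IN also accepts string literals, each in single quotation marks.
select first_name, last_name, salary
  from employee
 where last_name in ('Newton', 'Michaels');
```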

Negating with NOT
The BETWEEN, IN, and LIKE comparison operators can all be negated with the NOT logical
operator. (I'll describe logical operators shortly.) Consider the query in Listing 10. This query
returns all the employees whose last name does not begin with an uppercase letter N.
Code Listing 10: Query using NOT and LIKE operators
SQL> select first_name, last_name
2 from employee
3 where last_name NOT LIKE 'N%';
FIRST_NAME LAST_NAME
----------- -------------
Emily Eckhardt
Matthew Michaels
2 rows selected.
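Listing 10 negates LIKE; the same NOT keyword works with BETWEEN and IN. The following sketch reuses the salary values from the earlier listings purely for illustration. Note that rows with a null salary would not be returned by either query, because a comparison with a null value is never true.

```sql
-- Salaries outside the inclusive range 75000 to 85000.
select first_name, last_name, salary
  from employee
 where salary NOT BETWEEN 75000 and 85000;

-- Salaries other than the listed values.
select first_name, last_name, salary
  from employee
 where salary NOT IN (75000, 85000, 100000);
```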

Existence or Absence of Values
Recall from Part 1 in this series that the absence of a value is referred to as a null value. A null
value cannot be equal or unequal to another null value or to any non-null value. Therefore, you
must always use the IS NULL or IS NOT NULL comparison operators to evaluate whether a data
value is null or not. For example, the query in Listing 11 returns employees who do not yet have
an assigned manager.
Code Listing 11: Query using IS NULL operator
SQL> select first_name, last_name, manager
2 from employee
3 where manager IS NULL;
FIRST_NAME LAST_NAME MANAGER
----------- ------------- -----------------
Frances Newton
Emily Eckhardt
2 rows selected.
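To see why the IS NULL operator is required, consider the following sketch. The first query is a deliberately incorrect variation on Listing 11 that uses the equality operator; it is included only to show the behavior described above.

```sql
-- Returns no rows: a comparison with NULL using = is never true,
-- even for rows in which MANAGER is null.
select first_name, last_name, manager
  from employee
 where manager = NULL;

-- Returns only the employees who have an assigned manager.
select first_name, last_name, manager
  from employee
 where manager IS NOT NULL;
```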

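Note that the DISTINCT keyword (which you learned about in Part 3 of this series) recognizes and
returns NULL values. In the following result, one of the two rows returned is the null value, which
SQL*Plus displays as a blank: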
SQL> select DISTINCT manager
2 from employee;
MANAGER
----------
28
2 rows selected.

To eliminate null values from a result set derived from a query that uses the DISTINCT keyword in
its SELECT list, you can use the IS NOT NULL operator in your WHERE clause:
SQL> select DISTINCT manager
2 from employee
3 where manager IS NOT NULL;
MANAGER
----------
28
1 row selected.

Truth in Logic
WHERE clauses with only one predicate are rare. The logical operators AND and OR are used to
group multiple predicates contained within the same WHERE clause of a single SQL statement.
Each added predicate further filters your result set. If two conditions are combined via the AND
operator, both conditions must evaluate to true to produce a result. If two conditions are combined
with the OR operator, only one of the conditions needs to evaluate to true to yield a result.
For example, the SQL statement in Listing 12 combines two comparison operators by using the
AND logical operator. The result displays employees who do not have an assigned manager
(according to the EMPLOYEE table) and whose salary is greater than $75,000.
Code Listing 12: Query using AND logical operator to combine multiple predicates
SQL> select first_name, last_name, manager, salary
2 from employee
3 where salary > 75000
4 AND manager IS NULL;
FIRST_NAME LAST_NAME MANAGER SALARY
----------- ------------- ----------------- ------
Emily Eckhardt 100000
1 row selected.

Using the OR logical operator instead of the AND operator changes the result set to include two
more rows, as shown in Listing 13.
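Code Listing 13: Query using OR logical operator to combine multiple predicates
SQL> select first_name, last_name, manager, salary
2 from employee
3 where salary > 75000
4 OR manager IS NULL;
FIRST_NAME LAST_NAME MANAGER SALARY
----------- ------------- ----------------- ------
Frances Newton 75000
Emily Eckhardt 100000
Donald Newton 28 80000
3 rows selected.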

Logical Precedence
If you use both AND and OR in a WHERE clause, the AND operator will always take precedence
over the OR operator. That is, any AND conditions are evaluated first.
Consider the SQL query in Listing 14.
Code Listing 14: Query using AND and OR logical operators
SQL> select first_name, last_name, manager, salary
2 from employee
3 where manager IS NULL
4 AND salary = 75000
5 OR salary = 80000;
FIRST_NAME LAST_NAME MANAGER SALARY
----------- ------------- ----------------- ------
Frances Newton 75000
Donald Newton 28 80000
2 rows selected.

You can change the precedence of logical operators in the WHERE clause by grouping the
expressions with parentheses. The query in Listing 15 yields a different result from the preceding
one because the OR condition in parentheses is evaluated before the AND condition.
Code Listing 15: Query using AND and OR logical operators with parenthetical precedence
SQL> select first_name, last_name, manager, salary
2 from employee
3 where manager IS NULL
4 AND (salary = 75000
5 OR salary = 80000);
FIRST_NAME LAST_NAME MANAGER SALARY
----------- ------------- ----------------- ------
Frances Newton 75000
1 row selected.

In this new query, both expressions (the manager is null AND the salary is either $75,000 or
$80,000) must evaluate to true to produce a result. Because the record for Donald Newton
satisfies the second condition but not the first, it is not in the result set.
If you write a predicate that contains a mixture of ANDs and ORs, I strongly recommend that you
use parentheses to mandate the order of operation explicitly. In general, this practice will make
your SQL more understandable, maintainable, and correct.
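Applying that recommendation to the conditions in Listing 14 might look like the following sketch. The parentheses do not change the result; they only spell out the grouping that the default precedence already applies. The SELECT list is assumed to match the four columns shown in the Listing 14 result.

```sql
-- Same result as Listing 14: AND is evaluated before OR,
-- and the parentheses make that grouping visible to the reader.
select first_name, last_name, manager, salary
  from employee
 where (manager IS NULL AND salary = 75000)
    OR salary = 80000;
```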
Conclusion
Only rarely will you write a query without a WHERE clause, and this article has shown you how to
use the WHERE clause to expand upon simple SQL SELECT statements and filter data of interest
to receive a more meaningful result set. You've seen how comparison operators are used in
conjunction with the WHERE clause to help you specify your desired result. You've also seen how
logical operators can be used to further filter your data by grouping predicates.
The next installment of SQL 101 will examine the ORDER BY clause of a SQL statement and take
a closer look at Oracle's SQL*Plus tool.
SQL 101: An Order of Sorts
http://www.oracle.com/technetwork/issue-archive/2012/12-may/o32sql-1541432.html[8/30/2014 12:31:38 PM]
As Published In

May/June 2012
TECHNOLOGY: SQL 101

An Order of Sorts
By Melanie Caffrey

Part 5 in a series on the basics of the relational database and SQL
Part 4 in this SQL 101 series, Why WHERE Matters (Oracle Magazine, March/April 2012),
introduced readers to the WHERE clause of a SQL SELECT statement (a query) and the
importance of filtering your data. The WHERE clause and the SELECT list tell the database which
rows you want your SELECT statement to retrieve. Now that you know how to narrow the scope
of the data a query fetches, you're ready to learn how to sort (or order) the data. This article
focuses on the SQL ORDER BY clause and how it behaves in conjunction with certain options
and keywords to tell the database how you want retrieved rows to be sorted.
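To try out the examples in this and subsequent articles in the series, you need access to an
Oracle Database instance. If necessary, download and install an Oracle Database edition for your
operating system. I recommend installing Oracle Database, Express Edition.
If you install the Oracle Database software, choose the installation option that enables you to
create and configure a database. A new database, including sample user accounts and their
associated schemas, will be created for you. SQL_101 is the user account you'll use for the
examples in this series; it's also the schema in which you'll create database tables and other
objects. When the installation process prompts you to specify schema passwords, enter and
confirm passwords for SYS and SYSTEM and make a note of them.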
Whether you installed the database software from scratch or have access to an existing Oracle
Database instance, download and unzip the SQL script and execute the script to create the
example tables for the SQL_101 schema. (View the script in a text editor to get instructions on
how to execute the script and information on what it does.)
The SQL queries in this article are executed against tables in the SQL_101 schema with Oracle's
SQL*Plus tool. In addition to discussing the ORDER BY clause, this article provides a closer look
at SQL*Plus.
Making Order out of Disarray
Oracle Database table data isn't stored in any specific order, regardless of the order in which it
was inserted into the database. To retrieve rows in either ascending or descending order by
column, you must tell the database that you want to do so. For example, you might want to list all
employees in the order they were hired, display all employees in order of highest to lowest annual
salary, or list the last names of all employees in the accounting department in alphabetical order.
You retrieve sorted data by adding an ORDER BY clause to your SELECT statement. ORDER BY
is always the last clause in a query.
Listing 1 shows a simple query of the EMPLOYEE table that doesn't filter or order its result set.
Compare Listing 1's result set with the one in Listing 2. When you use the ORDER BY clause, the
result set is in ascending order by default. Listing 2 displays the employees in the EMPLOYEE
table sorted by last name in default ascending alphabetical order.
Code Listing 1: Simple query for listing all rows in the EMPLOYEE table

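SQL> set linesize 32000
SQL> set feedback on
SQL> select first_name, last_name, hire_date, salary
2 from employee;

FIRST_NAME LAST_NAME HIRE_DATE SALARY
----------- ------------- ------------ -----------
Frances Newton 14-SEP-05 75000
Emily Eckhardt 07-JUL-04 100000
Donald Newton 24-SEP-06 80000
Matthew Michaels 16-MAY-07 70000
Roger Friedli 16-MAY-07 60000
Betsy James 16-MAY-07 60000
6 rows selected.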

Code Listing 2: Query that lists all rows in ascending alphabetical order by last name

SQL> select first_name, last_name, hire_date, salary
2 from employee
3 ORDER BY last_name;

6 rows selected.

You can obtain a result set in descending order by adding the DESC keyword immediately after
the column name in the ORDER BY clause. The query in Listing 3 retrieves all employees from
the most recent to the least recent date of hire. Note the DESC keyword in the ORDER BY
clause. (You can use the ASC keyword to explicitly request ascending order, but it isn't necessary,
because ascending order is the default.)
Code Listing 3: Query that retrieves and displays all employees in descending order by date of
hire

SQL> select first_name, last_name, hire_date, salary
2 from employee
3 ORDER BY hire_date DESC;

6 rows selected.

Names, Numbers, and Arrangements
Your ORDER BY clause does not need to explicitly name the column(s) by which you want to
order the data. If you prefer, you can use the number of the column's position in the query's
SELECT list. Listing 4 shows a query that retrieves all employees ordered from highest to lowest
salary, using the sequence number (4) of the salary column in the query's SELECT list.
Code Listing 4: Query that retrieves and displays all employees in descending order by column 4

SQL> select first_name, last_name, hire_date, salary
2 from employee
3 ORDER BY 4 DESC;

6 rows selected.

A query can sort on multiple columns, mixing ascending and descending order as needed. You list
the columns (or SELECT list column sequence numbers) in the ORDER BY clause, delimited by
commas. The results are ordered by the first column, then the second, and so on for as many
columns as the ORDER BY clause includes. If you want any results sorted in descending order,
your ORDER BY clause must use the DESC keyword directly after the name or the number of the
relevant column.
Listing 5 shows a result set that displays all employees in descending order of hire date (most
recent to least recent), within which the employees are further sorted from lowest to highest salary
and then by last name. Because ascending order is the default, the second column in Listing 5's
ORDER BY clause doesn't need to include the ASC keyword; for the same reason, the ASC
keyword associated with the last_name column is superfluous.
Code Listing 5: Query that retrieves and displays all employees, using multiple ORDER BY
criteria

SQL> select first_name, last_name, hire_date, salary
2 from employee
3 ORDER BY hire_date DESC, 4, last_name ASC;

6 rows selected.
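For comparison, the ORDER BY clause in Listing 5 can be written entirely with column names and without the optional ASC keyword. This sketch assumes that Listing 5 uses the same SELECT list as Listing 4, in which salary is the fourth column.

```sql
-- Equivalent to: ORDER BY hire_date DESC, 4, last_name ASC
-- Column 4 of the SELECT list is salary; ASC is the default.
select first_name, last_name, hire_date, salary
  from employee
 ORDER BY hire_date DESC, salary, last_name;
```

Naming the columns keeps the sort order correct if the SELECT list is later rearranged, whereas a position number silently follows whatever column ends up in that position.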

Ensuring That You Are Set
Whenever you log in to the database with your username and password, you're creating a session
in the database. You can change certain environment settings for your session that have no effect
on other connected sessions (logged-in users). In Listing 1, note that a couple of SQL*Plus set
commands appear before the SQL statement. These commands set system variables to
customize the SQL*Plus environment settings for the current session. For example, in Listing 1,
the following command sets the number of characters that SQL*Plus displays on a line before
beginning a new line:

set linesize 32000

The shorter notation for this command is set lines n. This command is helpful if you want to
ensure that the lines of your SQL query results do not wrap.
The other set command used in Listing 1 is

set feedback on

This command directs SQL*Plus to display a final count of the number of rows returned in your
result set. The shorter notation for this command is set feed on.
The last line displayed in Listing 2's result set is

6 rows selected.

This line appears because the SQL*Plus feedback setting was turned on (in Listing 1). If you do
not want to see this final count of the number of rows returned in your result set, you can turn this
setting off with the set feed off command.
Your environment settings will apply to the results of every query you subsequently execute in
your current session.
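
If you are unsure what a setting currently is, a short sketch such as the following may help. SHOW is the SQL*Plus command for displaying a system variable's value; the values passed to SET below are arbitrary examples, not recommendations.

```sql
REM Display the current values of the two settings discussed above.
show linesize
show feedback

REM Change them for the current session only, using the shorter notation.
set lines 200
set feed off
```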

Ordering the Unknown
Recall that a null value is one that is not known. Listing 6, for example, lists all employees from the
EMPLOYEE table with their manager values. Two of the six returned records have null values in
the manager column.
Code Listing 6: Query that displays all employees with their manager values

SQL> select employee_id, first_name, last_name, manager
2 from employee
3 ORDER BY manager, last_name;
EMPLOYEE_ID FIRST_NAME LAST_NAME MANAGER

6567 Roger Friedli 28
6568 Betsy James 28
7895 Matthew Michaels 28
1234 Donald Newton 28
28 Emily Eckhardt
37 Frances Newton
6 rows selected.

When an ORDER BY clause sorts results in ascending order, any null values are displayed last by
default. Conversely, if an ORDER BY clause specifies descending order for a column containing
null values, as in Listing 7, the null values are displayed first by default. By using the NULLS
FIRST or NULLS LAST option in the ORDER BY clause, you can override the defaults and
explicitly specify how you want null values to be sorted. The example in Listing 8 uses the NULLS
FIRST option to override the default display-nulls-last behavior of an ORDER BY clause.
Code Listing 7: Query that orders a column containing null values in descending order

2 from employee
3 ORDER BY manager DESC, last_name;

28 Emily Eckhardt
37 Frances Newton
6568 Betsy James 28
6 rows selected.

Code Listing 8: Query that orders a column containing null values with the NULLS FIRST option

2 from employee
3 ORDER BY manager NULLS FIRST, last_name;

28 Emily Eckhardt
37 Frances Newton
6568 Betsy James 28
6 rows selected.
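
To illustrate the other override, here is a sketch that combines DESC with NULLS LAST. It reuses the columns from Listing 6; with this clause, the rows that have a null manager value should move from the top of the descending result (their default position) to the bottom.

```sql
-- Descending sort on manager, but with null manager values listed last.
select employee_id, first_name, last_name, manager
  from employee
 order by manager desc nulls last, last_name;
```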

Sorting with Distinction
When including an ORDER BY clause in a SQL SELECT statement, you will usually choose to
sort by a column or an expression that's in the statement's SELECT list. However, you can also
order by columns or expressions that are not in the SELECT list. Listing 9 displays a list of
employees ordered by the most recent to the least recent date of hire, within which the employees
are sorted alphabetically by last name. Although the sort occurs and displays correctly, only the
employees' first and last names appear in the output, because hire_date is not in the SELECT list.
Code Listing 9: Query that orders by a column not included in the SELECT list

2 from employee
3 ORDER BY hire_date DESC, last_name;

Roger Friedli
Betsy James
Matthew Michaels
Donald Newton
Frances Newton
Emily Eckhardt
6 rows selected.

If you include the DISTINCT keyword in the SELECT list, only columns or expressions in the
SELECT list may be included in the ORDER BY clause. As Listing 10 shows, an error will occur if
a query using the DISTINCT keyword tries to order by a column not included in the SELECT list.
Code Listing 10: Query with DISTINCT fails because ORDER BY column is not in the SELECT
list

SQL> select DISTINCT hire_date
2 from employee
3 ORDER BY manager NULLS FIRST;
ORDER BY manager NULLS FIRST
*
ERROR at line 3:
ORA-01791: not a SELECTed expression

The Errors of Our Ways
You will inevitably make mistakes while learning to write SQL statements. Being able to interpret
the Oracle Database error messages you receive is key to your understanding of SQL. Some error
messages make it easy to understand what you've done wrong, whereas others are not so
straightforward. The best approach is to try to resolve one error message at a time (a process
called debugging).
Oracle Database tells you on which line of a query an error has occurred. Listing 10, for example,
displays the following error message:

ERROR at line 3:
ORA-01791: not a SELECTed expression

Now you know that the database program is having difficulty with the following line:

3 ORDER BY manager NULLS FIRST;

If you add the MANAGER column to the query's
SELECT list, as the error message implies, you will
be able to rerun the statement successfully
(assuming that the query contains no other errors).
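
One possible corrected form of the Listing 10 query is sketched below. Keep in mind that DISTINCT then applies to the combination of HIRE_DATE and MANAGER, so the query may return more rows than a DISTINCT on HIRE_DATE alone would.

```sql
-- MANAGER now appears in the SELECT list, so it may be used in ORDER BY.
select distinct hire_date, manager
  from employee
 order by manager nulls first;
```
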
Syntax errors will probably be the most common
errors you make while learning SQL. The importance
of carefully reading (and rereading) your statements
while debugging cannot be overemphasized. Simple
typos, misplaced or missing commas, and unpaired
single quotation marks (to name a few common
mistakes) can cause a myriad of problems to which
the solution might not be readily apparent.
Using Aliases and Format Models
Sometimes you might want your query output to display meaningful headings for specific columns or
expressions. You can make this happen by adding a column alias to any of the columns or
expressions in your SQL statement's SELECT list. Listing 11 shows examples of the types of column
aliases you can use. Note that if a column alias contains more than one word, or you want it to
appear in exact case (uppercase is otherwise the default), you must enclose the alias in double
quotation marks. As Listing 11 shows, you may use a column's alias in a query's ORDER BY clause.
An alias that is not enclosed in double quotation marks in the SELECT list can be referenced by its
plain name; an alias that is enclosed in double quotation marks must be written with the same
quotation marks and case.
Code Listing 11: Query that uses column aliases

SQL> select first_name first, last_name "Last", hire_date "Start Dt", salary "sal"
2 from employee
3 ORDER BY manager NULLS FIRST, first;
FIRST Last Start Dt sal

6 rows selected.

SQL*Plus provides formatting commands that enable you to format attributes for a result set
column. For example, Listing 12 illustrates the use of a format model (sometimes referred to as a
format mask) applied to the SALARY column. This type of formatting command can be applied to
any SELECT list expression that consists of a NUMBER datatype. The shorthand notation for the
SQL*Plus COLUMN command is COL.
Code Listing 12: Query that uses a SQL*Plus format model via the COLUMN command

SQL> COLUMN salary FORMAT $999,999
2 from employee
3 order by salary desc, last_name;

Emily Eckhardt 07-JUL-04 $100,000

Donald Newton 24-SEP-06 $80,000
Frances Newton 14-SEP-05 $75,000
Matthew Michaels 16-MAY-07 $70,000
Roger Friedli 16-MAY-07 $60,000
Betsy James 16-MAY-07 $60,000
6 rows selected.
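
The following sketch shows a few related COLUMN commands you might try after running Listing 12. The format model with two decimal places is only an example; a format set with COLUMN stays in effect for the rest of the session until you change or clear it.

```sql
REM Show two decimal places as well as the currency symbol.
COLUMN salary FORMAT $999,999.99

REM With no options, COLUMN lists the current display attributes of the column.
COLUMN salary

REM Reset the column's display attributes to their defaults.
COLUMN salary CLEAR
```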

SQL Statements in SQL*Plus
SQL*Plus requires the use of a statement terminator, which tells it when to execute your SQL
statement. The semicolon (;) is the statement terminator used in most of the examples in this
series of articles so far. Alternatively, you can use a forward slash (/) as a statement terminator,
provided that it is on a separate line from the rest of the SQL statement. Listing 13 demonstrates
the use of both acceptable terminators.
Code Listing 13: Query executed with semicolon and forward slash terminators

2 from employee
3 order by hire_date desc, salary desc, last_name;


6 rows selected.
2 from employee
3 order by hire_date desc, salary desc, last_name
4 /


6 rows selected.

The SQL*Plus buffer keeps track of the last statement you ran. To re-execute that statement
without retyping it, type a forward slash and press Enter. This shortcut is useful, for example, for
checking the status of a batch job that is supposed to insert or update records in a particular table.
Only the most recent statement remains in the buffer; it is replaced as soon as you execute
another query. To display (or list) the contents of the buffer, you can execute the SQL*Plus LIST
command (or just the letter l). For example:

SQL> l
1 select first_name, last_name, hire_date, salary
2 from employee
3* order by hire_date desc, salary desc, last_name
SQL>

Conclusion
This article has shown you how to expand on simple SQL SELECT statements via the ORDER BY
clause to order the data you display in a more meaningful way. You've seen how the DESC,
NULLS FIRST, and NULLS LAST options behave and how null values are handled by default in
an ORDER BY clause. You've also seen how the presence or the absence of the DISTINCT
keyword in a SELECT list affects query execution if the ORDER BY clause includes a column
that's not in the SELECT list.
The next installment in the SQL 101 series will look at character functions.
SQL 101
http://www.oracle.com/technetwork/issue-archive/2012/12-jul/o42sql-1603138.html[8/30/2014 12:37:18 PM]
As Published In

July/August 2012
TECHNOLOGY: SQL 101

A Function of Character
By Melanie Caffrey

Part 6 in a series on the basics of the relational
database and SQL
Part 5 in this series, "An Order of Sorts" (Oracle Magazine, May/June 2012), introduced the
ORDER BY clause of a SQL SELECT statement (or query) and how it behaves in conjunction with
certain options and keywords to order (or sort) the data in query results. Now you are ready to
start learning how to use SQL functions in your queries to transform result set data so that it
displays differently from how it is stored in the database. This article focuses on the SQL character
functions (also known as string functions or text functions), which enable you to manipulate how
character data is displayed. Subsequent articles in this series will introduce the date and number
functions.
associated schemas, will be created for you. (Note that SQL_101 is the user account you'll use for
the examples in this series; it's also the schema in which you'll create database tables and other
objects.) When the installation process prompts you to specify schema passwords, enter and
Finally, whether you installed the database software from scratch or have access to an existing
Oracle Database instance, download, unzip, and execute the SQL script to create the tables for
the SQL_101 schema that are required for this article's examples. (View the script in a text editor
for execution instructions.)
Pretty Printing
The most-basic character functions enable you to change the way alphanumeric data is formatted
in a result set. For example, the simple query in Listing 1 obtains all unique last name values from
the EMPLOYEE table and displays them in all capital letters. It does this by applying the UPPER
character function to the LAST_NAME column. Similarly, the query in Listing 2 uses the LOWER
character function to display all department location names from the DEPARTMENT table in
lowercase letters. All functions take some kind of input parameter(s). Character functions require
input parameters that are alphanumeric: either a character (or string) literal or a column with a
VARCHAR2, CHAR, or CLOB datatype. The data in the EMPLOYEE table's LAST_NAME column
and the DEPARTMENT table's LOCATION column is stored with a datatype of VARCHAR2.
Recall that a literal character value is any list of alphanumeric characters enclosed in single
quotation marks, such as 'Smith', '73abc', or '15-MAR-1965'.
Code Listing 1: Query that lists every unique last name value in uppercase letters

SQL> select distinct UPPER(last_name) "Uppercase Employee Last Name"
2 from employee
3 order by UPPER(last_name);
Uppercase Employee Last Name
ECKHARDT
FRIEDLI
JAMES
LEBLANC
MICHAELS
NEWTON
PETERSON
7 rows selected.

Code Listing 2: Query that displays all department locations in lowercase letters

SQL> select name, LOWER(location) "Lowercase Department Location"
2 from department
3 order by location;
NAME Lowercase Department Location

Accounting los angeles
Payroll new york
2 rows selected.

Listings 3 and 4 demonstrate the INITCAP function. The query in Listing 3 uses INITCAP to
convert certain first and last names from being stored in all lowercase in the EMPLOYEE table to
being displayed with initial capital letters. The INITCAP function capitalizes the first letter of
each word in a string and lowercases the remaining letters, as demonstrated by the query in
Listing 4. That query also shows that the input parameter for an INITCAP function can consist of a
character function's application to a string or a database column that stores alphanumeric data.
Specifically, the query applies the UPPER function to the LAST_NAME column of the EMPLOYEE
table for certain employees. The UPPER function is said to be nested inside the INITCAP function.
The Oracle Database server applies nested functions in order, from innermost function to outermost
function. In Listing 4, the UPPER function converts the values peterson and leblanc to
PETERSON and LEBLANC. Then the INITCAP function converts those uppercase values to
Peterson and Leblanc.
Code Listing 3: Query that shows certain names converted and with the initial letter capitalized

SQL> set lines 32000
SQL> select first_name, last_name, INITCAP(first_name) "First Name",
INITCAP(last_name) "Last Name"
2 from employee
3 where employee_id in (6569, 6570);
FIRST_NAME LAST_NAME First Name Last Name

michael peterson Michael Peterson
mark leblanc Mark Leblanc
2 rows selected.

Code Listing 4: Query that demonstrates the INITCAP function

SQL> select INITCAP('eMPLOYEE lAST nAMES') "INITCAP Literal",
INITCAP(UPPER(last_name)) "Converted Employee Last Name"
2 from employee
3 where employee_id in (6569, 6570);
INITCAP Literal Converted Employee Last Name

Employee Last Names Peterson
Employee Last Names Leblanc
2 rows selected.
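
As a small sketch of the word-by-word behavior, the query below applies INITCAP to an arbitrary literal (not taken from the article's tables). INITCAP treats white space and characters that are not alphanumeric as word boundaries, so the result should read Smith-Jones And Sons. The query selects from DUAL, the one-row table described later in this article.

```sql
-- Each word is capitalized separately; the hyphen and the spaces
-- each mark the start of a new word.
select INITCAP('SMITH-JONES and sons') "INITCAP Result"
  from dual;
```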

Padding Your Results
To pad something is to add to it. The LPAD and RPAD functions enable you to pad your
character-data results by repeating a character, space, or symbol to the left or right of any string.
LPAD pads to the left of a string; RPAD pads to the right.
Listing 5 demonstrates the power of the RPAD and LPAD functions. Note that each takes three
input parameters: the column name or string literal you want to pad; the length to which the string
should be padded; and the character, space, or symbol (the filler) to pad with. For example, the
query in Listing 5 specifies that the department name should be right-padded to a total length of
15 with the '.' filler character. If any department name is 15 characters or longer, no filler
character will be added (and a name longer than 15 characters is cut off at 15). Because
Accounting is 10 characters long, the RPAD function adds five filler characters to its right. The
query also specifies that the location should be left-padded to a total length of 15. Because
LOS ANGELES is 11 characters long, counting the space, the LPAD function adds four filler
characters to its left.
Code Listing 5: Query that applies the RPAD and the LPAD functions

SQL> select RPAD(name, 15, '.') department, LPAD(location, 15, '.')
location
2 from department;
DEPARTMENT LOCATION

Accounting..... ....LOS ANGELES
Payroll........ .......NEW YORK
2 rows selected.
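
Two edge cases are worth trying for yourself; the sketch below uses arbitrary literals and the DUAL table (described in the next section). When the requested length is shorter than the string, the pad functions return only the portion that fits, and when the filler parameter is omitted, a single space is used.

```sql
-- 'Accounting' is longer than 5, so RPAD returns only its first 5 characters.
-- With the third parameter omitted, LPAD pads with spaces; the brackets
-- simply make those spaces visible in the output.
select RPAD('Accounting', 5, '.') cut_off,
       '[' || LPAD('Payroll', 10) || ']' space_padded
  from dual;
```

The first column should display Accou, and the second should display Payroll preceded by three spaces inside the brackets.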

The Helpful Dual
Oracle Database provides a single-row, single-column table called DUAL that is useful for many
purposes, not the least of which is learning about Oracle functions. DUAL is an Oracle system
table owned by the SYS user, not the SQL_101 schema. Many Oracle system tables are made
available to all users via public synonyms. Synonyms will be discussed in subsequent articles in
this series.
The DUAL table contains no data that's useful in and of itself. (It has one row with one column,
called the DUMMY column, that contains the value X.) You can use DUAL to try out functions
that work on string literals and, as you'll see in subsequent articles in this series, on number
literals and even on today's date.
The following demonstrates the single-row, single-column output of a SELECT statement executed
against the DUAL table:

SQL> select *
2 from dual;
D
-
X
1 row selected.

To display the current date, you can query the DUAL table as follows:

SQL> select sysdate
2 from dual;
SYSDATE
18-APR-12
1 row selected.

And finally, the following example shows how you can practice any function in the SELECT clause
of a SQL statement, using the DUAL table:

SQL> select rpad('Melanie', 10, '*') Melanie, lpad('Caffrey', 10, '.')
Caffrey
2 from dual;
MELANIE CAFFREY

Melanie*** ...Caffrey
1 row selected.

Note that functions work even though there is no usable data in DUAL. In the preceding examples,
the SYSDATE function displays the current date and time of the operating system hosting the
database, and the RPAD and LPAD functions add padding to my name.
Stringing Strings Together
Sometimes it makes sense to combine certain strings, such as the FIRST_NAME and
LAST_NAME values from the EMPLOYEE table, in the result set display. You can use
concatenation to accomplish this task, with either the CONCAT function, illustrated in Listing 6, or
the (more commonly used) concatenation operator || (two pipe characters), illustrated in Listing 7.
Code Listing 6: Query that demonstrates the CONCAT function

SQL> select CONCAT(first_name, last_name) employee_name
2 from employee
3 order by employee_name;
EMPLOYEE_NAME
BetsyJames
DonaldNewton
EmilyEckhardt
FrancesNewton
MatthewMichaels
RogerFriedli
markleblanc
michaelpeterson
8 rows selected.

Code Listing 7: Query that demonstrates the concatenation operator, ||

SQL> select first_name||' '||last_name employee_name
2 from employee
EMPLOYEE_NAME
Betsy James
Donald Newton
Emily Eckhardt
Frances Newton
Matthew Michaels
Roger Friedli
mark leblanc
michael peterson
8 rows selected.

The CONCAT function takes two parameters and concatenates them. You can also nest multiple
CONCAT function calls, as shown in Listing 8. The queries in Listings 7 and 8 concatenate literal
strings with column data values. (I prefer the concatenation operator, because it has unlimited
input parameters and makes the concatenated output more readable.)
Code Listing 8: Query that demonstrates nested CONCAT calls

SQL> select CONCAT(first_name, CONCAT(' ', last_name)) employee_name
2 from employee
EMPLOYEE_NAME
Betsy James
Donald Newton
Emily Eckhardt
Frances Newton
Matthew Michaels
Roger Friedli
mark leblanc
michael peterson
8 rows selected.
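
As a sketch of why the operator can be easier to read as the number of pieces grows, the query below builds a "last name, first name" display both ways. The alias names are chosen for illustration only; the two columns should show identical values.

```sql
-- Three pieces joined with || versus the same result with nested CONCAT calls.
select last_name || ', ' || first_name             employee_name,
       CONCAT(CONCAT(last_name, ', '), first_name) employee_name_concat
  from employee
 order by last_name, first_name;
```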

Giving Your Data a Trim
Sometimes you want to remove unwanted spaces or characters from data when you display it. For
example, data inserted into a table column via a form application might include extraneous
characters or spaces (preceding and/or following the actual data value) that the form input field
doesn't trim.
Listing 9 shows a query that trims extra spaces from string values. The TRIM function in Listing 9
takes two parameters. The first parameter is the character, symbol, or space (delimited by single
quotes) to be removed. The second parameter specifies the string literal or column value to be
trimmed. The TRIM function supports three keywords: LEADING, TRAILING, and BOTH. The
example in Listing 9 uses the TRAILING keyword to right-trim the FIRST_NAME value. The TRIM
function applied to the LAST_NAME value specifies the LEADING keyword to left-trim the spaces
from that value. And, as you can see, the spaces to the right of the LAST_NAME value remain
and are included in the output.
Code Listing 9: Query that trims extra spaces

SQL> select '''' ||TRIM(TRAILING ' ' FROM 'Ashton ') || ''''
first_name,
'''' || TRIM(LEADING ' ' FROM ' Cinder ') || '''' last_name
2 from dual;
FIRST_NA LAST_NAME

'Ashton' 'Cinder '
1 row selected.

Compare the output in Listing 9 with that in Listing 10, which trims the rightmost extra spaces from
the LAST_NAME value. When no keyword is specified, the default behavior for the TRIM function
is to trim leading as well as trailing characters. The older RTRIM and LTRIM functions are
available for backward compatibility.
Code Listing 10: Query that trims extra spaces, including rightmost extra spaces

SQL> select '''' || TRIM(TRAILING ' ' FROM 'Ashton ') || ''''
first_name,
'''' || TRIM(' Cinder ') || '''' last_name
2 from dual;
FIRST_NA LAST_NAM

'Ashton' 'Cinder'
1 row selected.
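
TRIM is not limited to spaces. The sketch below, which uses arbitrary literals, removes asterisks from both ends of a string with the BOTH keyword and also shows the older RTRIM and LTRIM functions removing spaces from one side only.

```sql
-- BOTH removes the named character from the left and the right.
-- RTRIM and LTRIM remove spaces by default when given a single parameter.
select TRIM(BOTH '*' FROM '***Cinder***')   trimmed_both,
       '''' || RTRIM('Ashton   ') || ''''   right_trimmed,
       '''' || LTRIM('   Cinder') || ''''   left_trimmed
  from dual;
```

The three columns should display Cinder, 'Ashton', and 'Cinder', respectively.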

Searching for Strings Within Strings
When you need to search column values for similar string pattern values, you can do so with the
INSTR character function. INSTR, which stands for "in string," returns the position of a substring
within a string value. Listing 11 demonstrates the INSTR function applied to the LAST_NAME
column of the EMPLOYEE table to locate all occurrences of the 'ton' substring. As you can see,
the INSTR function takes as input the literal or column value you want to search, followed by the
substring pattern to search for. In Listing 11, the INSTR function finds the 'ton' pattern in only two
column data values (both of them Newton) and returns 4 as their position. Because it did not find
the search string in any other values, the output for those values is 0.
Code Listing 11: Query that demonstrates the INSTR character function

SQL> select last_name, INSTR(last_name, 'ton') ton_starting_point
2 from employee
3 order by last_name;
LAST_NAME TON_STARTING_POINT

Eckhardt 0
Friedli 0
James 0
Michaels 0
Newton 4
Newton 4
leblanc 0
peterson 0
8 rows selected.

Two additional parameters, starting position and occurrence, are optional. The starting position
specifies the character in the string from which to begin your search. The default behavior is for
the search to begin at the first character, otherwise known as character position 1. The
occurrence parameter lets you specify which occurrence of the substring you'd like to find. For
example, the word Mississippi includes two (overlapping) occurrences of the 'issi' substring. To
search for the starting-position location of the second occurrence of this pattern, you must provide
the INSTR function with an occurrence parameter of 2:

SQL> select INSTR('Mississippi', 'issi', 1, 2)
2 from dual;
INSTR('MISSISSIPPI','ISSI',1,2)
5
1 row selected.
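
The starting position parameter can be sketched in the same way. In Mississippi, the 'ss' substring begins at positions 3 and 6; starting the search at position 4 skips the first of these.

```sql
-- The search begins at character position 4, so the 'ss' that starts at
-- position 3 is not considered and the function returns 6.
select INSTR('Mississippi', 'ss', 4) ss_from_position_4
  from dual;
```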

Extracting Strings from Strings
Sometimes you need to extract a portion of a string
for your desired output. The SUBSTR (for substring)
character function can assist you with this task.
Listing 12 shows a query that uses the SUBSTR
function to extract the first three characters of every
LAST_NAME value from the EMPLOYEE table. The
SUBSTR function takes two required parameters
and one optional input parameter. The first
parameter is the literal or column value on which you
want the SUBSTR function to operate. The second
parameter is the position of the starting character for
the substring, and the optional third parameter is the
number of characters to be included in the substring.
If the third parameter is not specified, the SUBSTR
function will return the remainder of the string.
Code Listing 12: Query that demonstrates the
SUBSTR character function

SQL> select last_name, SUBSTR(last_name, 1, 3)
2 from employee
LAST_NAME SUB

Eckhardt Eck
Friedli Fri
James Jam
Michaels Mic
Newton New
Newton New
leblanc leb
peterson pet
8 rows selected.
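
A few variations on the parameters, sketched against a literal rather than the EMPLOYEE table. The negative starting position in the last column goes beyond what Listing 12 shows: it tells SUBSTR to count the starting position backward from the end of the string.

```sql
select SUBSTR('Newton', 4)    from_position_4,  -- no length given: returns 'ton'
       SUBSTR('Newton', 2, 3) three_from_2,     -- returns 'ewt'
       SUBSTR('Newton', -3)   last_three        -- counts from the end: returns 'ton'
  from dual;
```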

Listing 13 demonstrates the SUBSTR and INSTR functions working together to display the portion
of every LAST_NAME value from the EMPLOYEE table that contains the 'ton' substring. In this
example, the output from the INSTR function provides the value for the input parameter that
specifies the position for the SUBSTR function's starting character. In the LAST_NAME values in
which the substring 'ton' is not found, the entire LAST_NAME value is returned, for two reasons:
SUBSTR treats a starting position of 0 the same as a starting position of 1 (that is, as the first
position in the string), and because the query omits the optional length parameter, the full
remainder of the string is returned.
Code Listing 13: Query that demonstrates the INSTR and SUBSTR character functions

SQL> select last_name, INSTR(last_name, 'ton') ton_position,
SUBSTR(last_name,
INSTR(last_name, 'ton')) substring_ton
2 from employee
LAST_NAME TON_POSITION SUBSTRING_TON

Eckhardt 0 Eckhardt
Friedli 0 Friedli
James 0 James
Michaels 0 Michaels
Newton 4 ton
Newton 4 ton
leblanc 0 leblanc
peterson 0 peterson
8 rows selected.

When Size Matters
Occasionally you need to determine a string's length, for example, to determine the maximum
number of characters a form entry field should permit. Listing 14 shows a query that uses the
LENGTH function to display the length of all FIRST_NAME values from the EMPLOYEE table.
Code Listing 14: Query that demonstrates the LENGTH function

SQL> select first_name, LENGTH(first_name) length
2 from employee
3 order by length desc, first_name;
FIRST_NAME LENGTH

Frances 7
Matthew 7
michael 7
Donald 6
Betsy 5
Emily 5
Roger 5
mark 4
8 rows selected.

Character functions can also be placed in WHERE and ORDER BY clauses, as illustrated in
Listings 15 and 16.
In Listing 15, the total length of each employee's first and last names concatenated together,
separated by a single space, is calculated with the LENGTH function, and only values that are
more than 15 characters long are returned in the result set. In Listing 16, the WHERE clause uses
the INSTR function nested inside the SUBSTR function to return only those employees whose last
names contain the substring 'ton'; the resultant employee first and last name values are
concatenated and separated with a space. Finally, the query's ORDER BY clause sorts the
concatenated first and last name values from the SELECT list by character length in descending
order, by using the LENGTH character function.
Code Listing 15: Query that demonstrates a function in a WHERE clause

2 from employee
3 where LENGTH(first_name||' '||last_name) > 15
EMPLOYEE_NAME
Matthew Michaels
michael peterson
2 rows selected.

Code Listing 16: Query that demonstrates functions in a WHERE and an ORDER BY clause

2 from employee
3 where SUBSTR(last_name, INSTR(last_name, 'ton')) = 'ton'
4 order by LENGTH(employee_name) desc;
EMPLOYEE_NAME
Frances Newton
Donald Newton
2 rows selected.
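
A possibly simpler way to express the same intent is to test the INSTR result directly, as in the sketch below. Note one difference: the SUBSTR comparison in Listing 16 is true only when the string from the first 'ton' onward is exactly 'ton' (that is, when the name ends with it), whereas INSTR(...) > 0 is true when 'ton' appears anywhere in the last name. For the names in this EMPLOYEE table, both conditions should select the same two rows.

```sql
-- INSTR returns 0 when the substring is absent, so > 0 means "contains".
select first_name || ' ' || last_name employee_name
  from employee
 where INSTR(last_name, 'ton') > 0
 order by LENGTH(first_name || ' ' || last_name) desc;
```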

Conclusion
This article has shown you how character functions can be used in SELECT statements to
manipulate the ways data is displayed. You've seen how to convert data values to uppercase,
lowercase, and mixed cases and how to search for strings within strings. You've also seen how to
pad and trim data and how to specify a string's total length. By no means does this article provide
an exhaustive list of the Oracle character functions. Review the documentation for more details:
bit.ly/HZUBC5.
The next installment of SQL 101 will discuss number functions and other miscellaneous functions.

SQL 101: From Floor to Ceiling and Other Functional Cases
http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52sql-1735910.html[8/30/2014 12:38:24 PM]
As Published In

September/October 2012
TECHNOLOGY: SQL 101

From Floor to Ceiling and Other
Functional Cases
By Melanie Caffrey

Part 6 in this series, "A Function of Character" (Oracle Magazine, July/August 2012), introduced
SQL character functions (also known as string functions or text functions) and showed how your
queries can use them to modify the appearance of character result set data. Similarly, you can
use SQL number functions to manipulate numerical data so that it displays differently from how it
is stored in the database. This article introduces you to some of the more commonly used SQL
number functions, along with some useful miscellaneous other functions.
operating system. I recommend installing Oracle Database, Express Edition 11g Release 2.
associated schemas, will be created for you. (Note that SQL_101 is the user account you'll use for
for execution instructions.) Some of the examples also use the DUAL table. Recall that DUAL is
an Oracle system table owned by the SYS user, not the SQL_101 schema. DUAL contains no
meaningful data itself, but it is useful to query it as a way to experiment with functions that work on
literals.
A Nice Round Number
One frequently used number function, ROUND, enables you to round a numeric value that is
returned in a result set. For example, the simple query in Listing 1 uses this function to apply
conventional rounding to two numbers. One number is rounded down, and the other is rounded
up.
Code Listing 1: Using ROUND function to round one number up and another number down

SQL> select ROUND(7534.1238, 2), ROUND(99672.8591, 2)
2 from dual;
ROUND(7534.1238,2) ROUND(99672.8591,2)

7534.12 99672.86
1 row selected.

Number functions require input parameters that are numeric: either a column with a NUMBER
datatype or a numeric literal. ROUND takes two parameters, one required and one optional. The
required parameter is the numeric value to be rounded. The optional parameter is an integer that
indicates the rounding precision, that is, how many places to the right (indicated by a positive
integer) or left (indicated by a negative integer) of the decimal point the numeric value should be
rounded to. The query in Listing 1 applies the ROUND number function to two numeric literal
values. Both numbers are rounded to two digits to the right of the decimal point.
If you omit the second parameter, the ROUND function will round the numeric value to the nearest
whole number, as shown in Listing 2. The query in Listing 3 demonstrates that if you pass the
optional parameter a negative integer, the ROUND function will round the numeric value on the left
side of the decimal point.
Code Listing 2: Using ROUND function to round numeric values to whole numbers

SQL> select ROUND(7534.1238), ROUND(99672.8591)
2 from dual;
ROUND(7534.1238) ROUND(99672.8591)

7534 99673
1 row selected.

Code Listing 3: Using ROUND function to round numeric values to the left of the decimal point

SQL> select ROUND(7534.1238, -1), ROUND(99672.8591, -3)
2 from dual;
ROUND(7534.1238,-1) ROUND(99672.8591,-3)

7530 100000
1 row selected.

Cutting Your Data Off
The TRUNC function returns a numeric value truncated to a certain number of decimal places.
Like the ROUND function, it takes one required parameter and one optional parameter. The
required parameter is the number to be truncated. The optional parameter is a positive or a
negative integer. A positive integer specifies how many decimal places to truncate to. Listing 4
shows how the TRUNC function behaves when it is passed a positive value for the optional
parameter. Note that the query simply truncates the returned values, leaving off any digits beyond
two digits to the right of the decimal point.
Code Listing 4: Using TRUNC function to cut off digits to the right of the decimal point

SQL> select TRUNC(7534.1238, 2), TRUNC(99672.8591, 2)
2 from dual;
TRUNC(7534.1238,2) TRUNC(99672.8591,2)

7534.12 99672.85
1 row selected.

If you omit the optional parameter, the returned value will be truncated to zero decimal places, as
shown in Listing 5. When you use a negative integer for the optional parameter, as shown in
Listing 6, you are specifying how many digits to the left of the decimal point should be changed to
0 in the displayed results.
Code Listing 5: Using TRUNC function to truncate numeric values to whole numbers

SQL> select TRUNC(7534.1238), TRUNC(99672.8591)
2 from dual;
TRUNC(7534.1238) TRUNC(99672.8591)

7534 99672
1 row selected.

Code Listing 6: Using TRUNC function to truncate numeric values to the left of the decimal point

SQL> select TRUNC(7534.1238, -1), TRUNC(99672.8591, -3)
2 from dual;
TRUNC(7534.1238,-1) TRUNC(99672.8591,-3)

7530 99000
1 row selected.

Top to Bottom
Similar to ROUND and TRUNC are the FLOOR and CEIL number functions. The FLOOR function
determines the largest integer less than (or equal to) a particular numeric value. Conversely, the
CEIL function determines the smallest integer greater than (or equal to) a particular numeric value.
FLOOR and CEIL (unlike ROUND and TRUNC) do not take an optional parameter for precision,
because their output is always an integer. When all four of these functions are applied to a
positive number, as illustrated in Listing 7, FLOOR returns the same value as TRUNC with no
optional parameter specified, and, for a value such as 99672.8591 whose fractional part is .5 or
greater, CEIL returns the same value as ROUND with no optional parameter specified.
Note, however, that the pairing reverses for the corresponding negative number: for -99672.8591,
FLOOR matches ROUND and CEIL matches TRUNC.
Code Listing 7: Using and comparing ROUND, CEIL, TRUNC, and FLOOR functions

SQL> select ROUND(99672.8591), CEIL(99672.8591), TRUNC(99672.8591), FLOOR(99672.8591)
2 from dual;
ROUND(99672.8591) CEIL(99672.8591) TRUNC(99672.8591) FLOOR(99672.8591)

99673 99673 99672 99672
1 row selected.
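To see how the four functions pair up for positive and negative inputs, you can experiment outside the database. The Python sketch below is only an approximation of the SQL functions: round_half_away and compare are hypothetical helpers, and the tie-breaking rule (halves round away from zero) is an assumption about how ROUND treats numeric values.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(value: str) -> int:
    """Hypothetical stand-in for ROUND with no precision: ties round away from zero."""
    return int(Decimal(value).quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def compare(value: str) -> dict:
    """Apply the four whole-number functions to one value."""
    number = float(value)
    return {
        "ROUND": round_half_away(value),
        "CEIL": math.ceil(number),
        "TRUNC": math.trunc(number),
        "FLOOR": math.floor(number),
    }


for value in ("99672.8591", "-99672.8591", "99672.3"):
    print(value, compare(value))
```

The third value, 99672.3, is not taken from the listings; it is included to show that CEIL and ROUND agree only when the fractional part is large enough for ROUND to round up.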

Arithmetic and Its Remains
The four arithmetic operators (+, -, *, and / for addition, subtraction, multiplication, and division)
can be used in SQL statements and combined with one another and any of the number functions.
Listing 8 shows a query from the EMPLOYEE table that reports each employee's annual base
salary, a calculated 3 percent bonus per salary value, and the weekly salary value (including
bonus) for each employee.
Code Listing 8: Arithmetic operators in combination with the ROUND number function

SQL> select first_name, last_name, salary, salary*.03 "BONUS",
ROUND((salary/52)+((salary*.03)/52)) "Weekly Sal w/Bonus"
2 from employee
3 order by salary desc, last_name;
FIRST_NAME LAST_NAME SALARY BONUS Weekly Sal w/Bonus

Emily Eckhardt 100000 3000 1981
michael peterson 90000 2700 1783
Donald Newton 80000 2400 1585
Frances Newton 75000 2250 1486
Matthew Michaels 70000 2100 1387
mark leblanc 65000 1950 1288
Roger Friedli 60000 1800 1188
Betsy James 60000 1800 1188
8 rows selected.

Two number functions, MOD and REMAINDER, can both be used to calculate the remainder of a
value divided by another value. Both functions require two parameters: the value to be divided and
the divisor. The MOD function uses the FLOOR function in its computation logic, and the
REMAINDER function uses ROUND. For this reason, the values returned from the two functions
can differ, as shown in Listing 9.
Code Listing 9: Differences between the MOD and REMAINDER function results

SQL> select MOD(49, 18) modulus, REMAINDER(49, 18) remaining
2 from dual;
MODULUS REMAINING

13 -5
1 row selected.
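The difference between the two results in Listing 9 comes down to which whole number the quotient 49/18 (about 2.72) is reduced to before the subtraction. The Python sketch below works through that arithmetic for the positive operands used in the listing. The helper names are hypothetical, and the sketch is not a full emulation of either function (negative operands, in particular, are not covered).

```python
import math


def mod_via_floor(dividend: int, divisor: int) -> int:
    """Remainder computed with FLOOR of the quotient (positive operands only)."""
    return dividend - divisor * math.floor(dividend / divisor)


def remainder_via_round(dividend: int, divisor: int) -> int:
    """Remainder computed with the quotient rounded to the nearest integer."""
    return dividend - divisor * round(dividend / divisor)


print(mod_via_floor(49, 18))        # 49 - 18 * 2 = 13
print(remainder_via_round(49, 18))  # 49 - 18 * 3 = -5
```

Because 2.72 floors to 2 but rounds to 3, the first calculation leaves 13 and the second overshoots by 5, which is why REMAINDER can return a negative value for positive inputs.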

Replacing the Unknown with the Known
Recall that a null value in a table is the absence of a value. Null values cannot be compared with,
or computed with, one another. However, you can substitute a non-null value for a NULL value by
applying the NVL miscellaneous function to the NULL. The NVL function requires two input
parameters: the expression (a column value, literal value, or computed result) to be tested for
nullity and the expression to substitute for any NULL expressions in the results.
For example, Listing 10 shows a query that lists each employee alongside the EMPLOYEE ID
value of that person's manager. For the employees with no value for MANAGER (that is, those
whose MANAGER values are NULL in the database), the results display 0 as the manager's
EMPLOYEE ID. This occurs because the query applies the NVL function to each MANAGER
value, substituting 0 for any NULL.
Code Listing 10: Substitute a NULL value for MANAGER with a value of 0

SQL> select employee_id, last_name, first_name, NVL(manager, 0) manager
2 from employee
3 order by manager, last_name, first_name;
EMPLOYEE_ID LAST_NAME FIRST_NAME MANAGER

28 Eckhardt Emily 0
37 Newton Frances 0
6569 peterson michael 0
6567 Friedli Roger 28
6568 James Betsy 28
7895 Michaels Matthew 28
1234 Newton Donald 28
6570 leblanc mark 6569
8 rows selected.

As you can also see in Listing 10, the original value of the tested expression is returned if it is not
NULL.
In the Listing 10 example, a returned value of 0 might not tell you as clearly as you'd like that
certain employee records have no assigned manager value. Instead, you might prefer to return
text that states this fact explicitly.
Listing 11 shows a query that tries to replace each NULL value with a more descriptive text literal.
The query returns an error because the NVL function requires the substitution value to
be convertible to the datatype of the comparison value, and the text literal cannot be converted to
a number. You can still obtain the textual output in a few ways. Listing 12 demonstrates one of
them: the inclusion of a datatype conversion function, TO_CHAR. Datatype conversion functions
will be discussed in detail in subsequent articles in this series.
Code Listing 11: Attempt to replace a returned NULL value with a text value

SQL> select employee_id, last_name, first_name, NVL(manager,
'Has no manager') manager
2 from employee
select employee_id, last_name, first_name, NVL(manager, 'Has no manager') manager
*
ERROR at line 1:
ORA-01722: invalid number

Code Listing 12: Replace a returned NULL value with a text value by using TO_CHAR

SQL> select employee_id, last_name, first_name, NVL(TO_CHAR(manager),
'Has no manager') manager
2 from employee

6568 James Betsy 28
28 Eckhardt Emily Has no manager
37 Newton Frances Has no manager
6569 peterson michael Has no manager
8 rows selected.

Adding More Detail with DECODE
Sometimes a simple substitution function such as NVL doesn't provide all the choices you require.
The DECODE function uses if-then-else logic to give you more than one possible substitution
choice.
The syntax for the DECODE function starts with an input expression. It compares that expression
with a comparison value. If the two values match (this is the "if-then" portion of the DECODE
logic), the DECODE function returns the substitution value. If the two values do not match, the
input expression will be compared with the next available comparison value. If another comparison
value is not provided, the optional default substitution value (the "else" portion of the DECODE
logic) will be returned, or NULL if no default is supplied. Listing 13 demonstrates the syntax for
the DECODE function. It also demonstrates how DECODE functions can be nested inside one another.
Code Listing 13: DECODE substitution function

SQL> select employee_id, first_name, last_name, DECODE(manager, 28,
'Emily Eckhardt', 6569, 'Michael Peterson', DECODE(employee_id, 28,
'Is Emily', 6569, 'Is Michael', 'Neither Emily nor Michael')) manager
2 from employee

6567 Roger Friedli Emily Eckhardt
6568 Betsy James Emily Eckhardt
7895 Matthew Michaels Emily Eckhardt
1234 Donald Newton Emily Eckhardt
28 Emily Eckhardt Is Emily
6569 michael peterson Is Michael
6570 mark leblanc Michael Peterson
37 Frances Newton Neither Emily nor Michael
8 rows selected.
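The if-then-else flow of DECODE can be easier to follow when written out as ordinary procedural code. The Python sketch below is a rough model of that flow rather than a description of Oracle Database internals; the decode and manager_label helpers are hypothetical, and Python's None stands in for NULL.

```python
def decode(expr, *args):
    """Hypothetical model of DECODE: search1, result1, search2, result2, ..., [default]."""
    pairs, default = args, None
    if len(args) % 2 == 1:
        # An odd number of trailing arguments means the last one is the default.
        pairs, default = args[:-1], args[-1]
    for search, result in zip(pairs[0::2], pairs[1::2]):
        # None == None is True here, which models DECODE matching NULL to NULL.
        if expr == search:
            return result
    return default


def manager_label(employee_id, manager):
    """The nested DECODE from Listing 13, expressed with the helper above."""
    return decode(manager, 28, "Emily Eckhardt", 6569, "Michael Peterson",
                  decode(employee_id, 28, "Is Emily", 6569, "Is Michael",
                         "Neither Emily nor Michael"))


for employee_id, manager in [(6567, 28), (28, None), (6569, None), (6570, 6569), (37, None)]:
    print(employee_id, manager_label(employee_id, manager))
```

The inner call supplies the default for the outer call, so it only decides the result for employees whose MANAGER value matches neither 28 nor 6569.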

The query in Listing 13, like the one in Listing 12, substitutes a textual value for the actual
MANAGER value for each employee record. Note, though, that with DECODE, you can repeat the
test and substitute values (that is, repeat the "if-then" logic) as much as you require. Another
difference from NVL is that DECODE can test for conditions other than nullity; for example, the
query in Listing 13 tests whether a particular value exists.
If you do want to test for nullity with DECODE, you can write a query such as
SELECT DECODE(manager, NULL, 'Has no Manager', manager) FROM employee;

In this example, if the value obtained from the MANAGER column is NULL, the 'Has no Manager'
string will be returned. Otherwise, the non-null manager value will be returned. You might be
wondering why this statement does not return an error, given that the MANAGER value is of a
different datatype than the string that would be returned if the MANAGER value were NULL. The
reason is implicit datatype conversion. Oracle Database implicitly converts a number to a string in
situations like this example. (It does not, and cannot, convert a string such as 'Has no Manager'
to a number.) However, it is not good practice to allow Oracle Database to perform implicit
datatype conversions. If you need a datatype conversion, you should always perform a call to a
datatype conversion function explicitly.
A Case for Comparative Searches
Although the DECODE function is more powerful than the NVL function, it cannot be (easily) used
for comparisons other than equality (and inequality). A searched CASE expression can not only be
used in place of the DECODE function but can also be used more easily for greater-than or less-than
comparisons.
Listing 14 shows a query that uses a searched CASE expression. As you can see, the searched
CASE expression starts with the CASE keyword and ends with the END keyword. Each WHEN
clause tests a condition. If a condition is true, the CASE expression will return the value specified
in the associated THEN clause. Like the default value in the DECODE function, the ELSE
clause in the searched CASE expression is optional. CASE expressions can be used in
WHERE clauses, as shown in Listing 15. They can even be nested, as shown in Listing 16.
Code Listing 14: Searched CASE expression in a less-than comparison

SQL> select employee_id, first_name, last_name, salary,
2 CASE WHEN manager = 28 THEN 'Emily is the manager. No bonus this year.'
3 WHEN salary < 80000 THEN 'Bonus this year.'
4 ELSE 'No bonus this year.'
5 END "Bonus?"
6 from employee
7 order by last_name, first_name;
EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY Bonus?

28 Emily Eckhardt 100000 No bonus this year.
6567 Roger Friedli 60000 Emily is the manager. No bonus this
year.
6568 Betsy James 60000 Emily is the manager. No bonus this
year.
7895 Matthew Michaels 70000 Emily is the manager. No bonus this
year.
1234 Donald Newton 80000 Emily is the manager. No bonus this
year.
37 Frances Newton 75000 Bonus this year.
6570 mark leblanc 65000 Bonus this year.
6569 michael peterson 90000 No bonus this year.
8 rows selected.
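One detail worth noticing in Listing 14 is that the WHEN clauses are tested in order, and the first condition that is true decides the result. The Python sketch below restates the listing's logic as an if chain to make that ordering visible. The bonus_note function is a hypothetical name, and None stands in for a NULL MANAGER value.

```python
def bonus_note(manager, salary):
    """Mirror of the searched CASE in Listing 14: the first true condition wins."""
    # A None manager is never equal to 28, just as a NULL manager fails "manager = 28".
    if manager == 28:
        return "Emily is the manager. No bonus this year."
    if salary < 80000:
        return "Bonus this year."
    return "No bonus this year."


print(bonus_note(28, 60000))     # Roger Friedli: first branch, despite the low salary
print(bonus_note(None, 75000))   # Frances Newton: second branch
print(bonus_note(None, 100000))  # Emily Eckhardt: falls through to the default
```

Swapping the first two tests would change the result for everyone who reports to employee 28 and earns less than 80000, so the order of WHEN clauses is part of the logic.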

Code Listing 15: Searched CASE expression in a WHERE clause

SQL> select employee_id, first_name, last_name, salary
2 from employee
3 where salary + CASE
4 WHEN ROUND((salary/52)+((salary*.03)/52)) > 1500
5 THEN 0
6 WHEN ROUND((salary/52)+((salary*.03)/52)) < 1300
7 THEN 500
8 ELSE 200
9 END > 75000
EMPLOYEE_ID FIRST_NAME LAST_NAME SALARY

28 Emily Eckhardt 100000
37 Frances Newton 75000
6569 michael peterson 90000
4 rows selected.

Code Listing 16: Nested searched CASE expressions

SQL> select employee_id, first_name, last_name,
2 CASE manager WHEN 28 THEN 'Emily Eckhardt'
3 WHEN 6569 THEN 'Michael Peterson'
4 ELSE
5 CASE employee_id WHEN 28 THEN 'Is Emily'
6 WHEN 6569 THEN 'Is Michael'
7 ELSE 'Neither Emily nor Michael'
8 END
9 END manager
10 from employee

6567 Roger Friedli Emily Eckhardt
6568 Betsy James Emily Eckhardt
7895 Matthew Michaels Emily Eckhardt
1234 Donald Newton Emily Eckhardt
28 Emily Eckhardt Is Emily
6569 michael peterson Is Michael
6570 mark leblanc Michael Peterson
37 Frances Newton Neither Emily nor Michael
8 rows selected.

Conclusion
This article has shown you a few of the more common number functions and how you can use
them to manipulate the way your data displays. You've seen how to round numeric data values up
and down and how to truncate them. You now know how the ROUND and TRUNC number
functions behave, in comparison to FLOOR and CEIL. You've also seen that the MOD and
REMAINDER number functions can return different values because of the type of computation
each one uses. Last but not least, you understand the power and differences of substitution
functions such as NVL, DECODE, and searched CASE expressions.
This article has by no means provided an exhaustive list of the Oracle number and miscellaneous
substitution functions. Review the documentation at bit.ly/MgvEzi and bit.ly/LN8F0d for more
information.
The next installment of SQL 101 will discuss date and datatype conversion functions.
SQL 101: Selecting a Type That Is Right for You
http://www.oracle.com/technetwork/issue-archive/2012/12-nov/o62sql-1867727.html[8/30/2014 12:40:05 PM]
As Published In

November/December 2012
TECHNOLOGY: SQL 101

Selecting a Type That Is Right for
You
By Melanie Caffrey

Part 7 in this series, "From Floor to Ceiling and Other Functional Cases" (Oracle Magazine,
September/October 2012), introduced common SQL number functions and showed how your
queries can use them to modify the appearance of numeric result set data. It also introduced SQL
substitution functions and showed how you can use them to manipulate result set data to convey
more-meaningful results. Similarly, you can use SQL date functions and datatype conversion
functions to manipulate data so that it displays differently from how it is stored in the database.
This article introduces you to some of the more commonly used SQL date functions, along with
some useful datatype conversion functions.
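To try out the examples in this series, you need access to an Oracle Database instance. If
necessary, download and install an Oracle Database edition for your operating system. I
recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle
Database software, choose the installation option that enables you to create and configure a
database. A new database, including sample user accounts and their associated schemas, will be
created for you. (Note that SQL_101 is the user account to use for the examples in this series;
it's also the schema in which you'll create database tables and other objects.) When the
installation process prompts you to specify schema passwords, enter and confirm passwords for
SYS and SYSTEM and make a note of them.
Finally, whether you installed the database software from scratch or have access to an existing
Oracle Database instance, download, unzip, and execute the SQL script to create the tables for
the SQL_101 schema that are required for this article's examples. (View the script in a text editor
for execution instructions.) Some of the examples also use the DUAL table. Recall that DUAL is an
Oracle system table owned by the SYS user, not the SQL_101 schema. DUAL contains no
meaningful data itself, but it is useful to query it as a way to experiment with functions that work on
literals.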
The Perfect Format for Your Date
The DATE datatype is stored in Oracle Database in an internal format that consists of both date
and time information: the century, year, month, day, hour, minute, and second. For input and
output of dates, every Oracle Database instance has a default date format model (also called a
mask) that is set by the NLS_DATE_FORMAT initialization parameter. (Initialization parameters
determine the default settings for Oracle Database instances. Users who have appropriate
permissions can change some of these parameters on a per-database, per-instance, or per-
session basis.) When you first query the data stored in a table column with a DATE datatype,
Oracle Database displays it with a format mask of either DD-MON-YYYY or DD-MON-RR,
depending on which is set as the default.
The RR format mask, which represents a two-digit year, was introduced to deal with end-of-
century issues such as the Y2K problem. With RR, a two-digit year can refer to a year in the
previous, current, or next century, depending on the current year and the two-digit year specified
in the query. Table 1 shows the relationship between the current year, the range of two-digit year
combinations, and the corresponding century referred to as a result.
Last Two Digits of Current Year   Two-Digit Year Specified in Query   Century Referred To
Between 00 and 49                 Between 00 and 49                   Current
Between 00 and 49                 Between 50 and 99                   Previous
Between 50 and 99                 Between 00 and 49                   Next
Between 50 and 99                 Between 50 and 99                   Current
Table 1: Relationship among current year, two-digit year specified, and the century referred to as
a result
For example, the last two digits of the current year (2012) are 12, which falls between 00 and 49.
A SQL query issued during 2012 that specifies an RR year value of 15, therefore, refers to the
year ending in 15 (2015) in the current century (the twenty-first), because 15 is between 00 and 49.
A query issued in 2012 that specifies an RR year value of 98 refers to the year ending in 98
(1998) in the previous century (the twentieth), because 98 is between 50 and 99.
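The rule in Table 1 can also be written as a short function, which makes it easy to try other combinations of current year and two-digit year. The Python sketch below is one way to express the table; the function name rr_to_full_year is hypothetical, and the code is a reading of Table 1 rather than Oracle Database's own implementation.

```python
def rr_to_full_year(two_digit_year: int, current_year: int) -> int:
    """Resolve a two-digit RR year to a four-digit year, following Table 1."""
    current_century = current_year // 100 * 100
    current_in_low_half = current_year % 100 <= 49   # last two digits 00-49
    given_in_low_half = two_digit_year <= 49         # specified year 00-49

    if current_in_low_half == given_in_low_half:
        century = current_century          # same half: current century
    elif current_in_low_half:
        century = current_century - 100    # 00-49 now, 50-99 given: previous century
    else:
        century = current_century + 100    # 50-99 now, 00-49 given: next century
    return century + two_digit_year


print(rr_to_full_year(15, 2012))  # 2015
print(rr_to_full_year(98, 2012))  # 1998
print(rr_to_full_year(15, 1998))  # 2015
```

The third call is not taken from the article; it shows the "Next" row of Table 1, in which a query issued in 1998 that specifies 15 refers to 2015.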
The query in Listing 1 uses the EMPLOYEE table in the sample schema for this article. The query
displays employees sorted from most recent to least recent date of hire. As you can see, the hire
date data is displayed in DD-MON-RR format. For example, it shows that Roger Friedli was hired
on 16-MAY-07. To change the way this data is displayed, you use the TO_CHAR conversion
function in conjunction with a format model of your choosing. (You had a brief introduction to
TO_CHAR in the last installment, where you saw that it can be used to convert a number to a text
string.)
Code Listing 1: Display date data in the Oracle Database default date format
SQL> select first_name, last_name, hire_date
2 from employee
3 order by hire_date desc, last_name, first_name;

Theresa Wong 27-FEB-10
Thomas Jeffrey 27-FEB-10
mark leblanc 06-MAR-09
michael peterson 03-NOV-08
Roger Friedli 16-MAY-07
Betsy James 16-MAY-07
10 rows selected.

The query in Listing 2 modifies the way the date data from Listing 1 is displayed. To convert data
of DATE datatype to a specific date format model, TO_CHAR takes one required parameter and
one optional parameter. The required parameter is data of DATE datatype from a column,
expression, or literal. The optional parameter is a textual format-mask representation of the date to
be displayed. In Listing 2, the default format mask of DD-MON-RR is changed to display as
YYYY-MM-DD.
Code Listing 2: Display date data in a different format by using TO_CHAR with a format mask
SQL> select first_name, last_name, TO_CHAR(hire_date, 'YYYY-MM-DD')
hire_date
2 from employee

Thomas Jeffrey 2010-02-27
Theresa Wong 2010-02-27
mark leblanc 2009-03-06
michael peterson 2008-11-03
Roger Friedli 2007-05-16
Betsy James 2007-05-16
Matthew Michaels 2007-05-16
Donald Newton 2006-09-24
Frances Newton 2005-09-14
Emily Eckhardt 2004-07-07
10 rows selected.

Listing 3 demonstrates that the second parameter for TO_CHAR is optional. If it is left off, the
format mask of the date data returned will simply be the default format mask. Note also that the
datatype of the date returned is VARCHAR2. The output from Listing 3 is sorted by HIRE_DATE in
descending order, but in character, not date, descending order. So, be aware that when you apply
the TO_CHAR conversion function, your data is returned as character strings; you should plan and
sort accordingly.
Code Listing 3: Default date format mask is used when optional parameter is not provided
SQL> select first_name, last_name, TO_CHAR(hire_date)
hire_date_formatted
2 from employee
3 order by hire_date_formatted desc, last_name, first_name;

Roger Friedli 16-MAY-07
Betsy James 16-MAY-07
10 rows selected.
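The difference between character order and date order is easy to reproduce with a few of the hire dates from Listing 2. The Python sketch below formats each date in a DD-MON-RR style and then sorts the values both ways. The dd_mon_rr helper and the MONTHS list are assumptions made for this illustration, and the string comparison used here is plain character-by-character ordering.

```python
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def dd_mon_rr(d: date) -> str:
    """Hypothetical formatter that imitates the DD-MON-RR display format."""
    return f"{d.day:02d}-{MONTHS[d.month - 1]}-{d.year % 100:02d}"


# Distinct hire dates taken from the output of Listing 2.
hire_dates = [date(2010, 2, 27), date(2009, 3, 6), date(2008, 11, 3), date(2007, 5, 16),
              date(2006, 9, 24), date(2005, 9, 14), date(2004, 7, 7)]

# Sort the dates first and format afterward: true newest-to-oldest order.
by_date = [dd_mon_rr(d) for d in sorted(hire_dates, reverse=True)]
# Format first and sort the strings: ordered by day-of-month digits, not by time.
by_text = sorted((dd_mon_rr(d) for d in hire_dates), reverse=True)

print(by_date)
print(by_text)
```

In the text-sorted list, 24-SEP-06 lands directly after 27-FEB-10 simply because "24" is the next-highest leading pair of characters, even though several later hire dates exist.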

Dates with Strings Attached
Just as you can convert a date to a string, you can convert a string literal to a date. The resulting
expression can be compared with any other column's data of DATE datatype or another date
expression. You perform the conversion by applying the TO_DATE conversion function to a text
string, as shown in Listing 4. The query in Listing 4 not only returns all employees whose
HIRE_DATE value is found to be greater than the date value 01-JAN-2008; it also demonstrates
that the TO_DATE conversion function can be used in WHERE clauses as well as SELECT lists.
The TO_DATE function is applied to the string literal 01-JAN-2008, with a format mask that helps
the database interpret the supplied literal as a date.
Code Listing 4: Use the TO_DATE conversion function in a WHERE clause
SQL> select first_name, last_name, TO_CHAR(hire_date, 'DD-MON-YYYY')
hire_date
2 from employee
3 where hire_date > TO_DATE('01-JAN-2008', 'DD-MON-YYYY')

4 rows selected.

When you provide a format mask to the TO_DATE function, the mask you choose must be the
same as the one used in the string literal you supply. If the two do not agree, you will receive an
error message similar to the one shown in Listing 5. When you convert a text literal, it is good
practice to use the TO_DATE conversion function and explicitly specify an appropriate format
mask. This way, your statement can be interpreted independently of any database, instance, or
session default date settings.
Code Listing 5: Error when the format mask does not match the provided string literal
hire_date
2 from employee
3 where hire_date > TO_DATE('01-JAN-2008', 'MM/DD/RR')
where hire_date > TO_DATE('01-JAN-2008', 'MM/DD/RR')
*
ERROR at line 3:
ORA-01858: a non-numeric character was found where a numeric was
expected

Oracle Database will perform implicit date conversion where it can, if (and only if) the literal is
already in the default date format. However, I do not recommend that you allow it to do so,
because your code will be more fragile and less likely to perform well long-term. Listing 6 shows a
query that relies on the default date format in Oracle Database and its ability to perform implicit
date conversion on a string literal. Compare the result in Listing 6 with that in Listing 7, which also
attempts to perform an implicit date conversion. The query in Listing 7 fails because the database
cannot interpret the date format mask of the literal value being compared with the values in the
HIRE_DATE column of the EMPLOYEE table.
Code Listing 6: Implicit date conversion (not recommended) returns a result set
hire_date
2 from employee
3 where hire_date > '01-JAN-2008'

4 rows selected.

Code Listing 7: Attempted implicit date conversion fails
hire_date
2 from employee
3 where hire_date > '01/01/2008'
where hire_date > '01/01/2008'
*
ERROR at line 3:
ORA-01843: not a valid month

Because the default date format can be changed, it is best not to allow your queries to rely on an
expected default format. Instead, always use the TO_DATE function on date string literals. One
way to find out which default date format your current session is using is to execute the query
shown in Listing 8. The SYS_CONTEXT function can be used by any session (and, therefore, any
user) to see current session attributes.
Code Listing 8: Find the default date format for your current session
SQL> select sys_context ('USERENV', 'NLS_DATE_FORMAT')
2 from dual;
SYS_CONTEXT('USERENV','NLS_DATE_FORMAT')
DD-MON-RR
1 row selected.

Taking Time with Your Dates
Recall that the Oracle DATE datatype includes a time component. You can either ignore the time
component, as the examples in this article have done so far, or you can include it for display or
comparison purposes. Listing 9 shows a query that includes the time component from each
HIRE_DATE value for every employee listed in the EMPLOYEE table. Note that all the employee
records except the one for Theresa Wong show a time value of 12:00:00. If you do not include a
time when inserting a value into a column with a DATE datatype, the time will default to midnight
(12:00:00 a.m. or 00:00:00 military time). To display or compare a date value in military time, use
the HH24 format mask instead of HH.
Code Listing 9: Display the time component of a value with a DATE datatype
SQL> select first_name, last_name, TO_CHAR(hire_date, 'DD-MON-YYYY HH:MI:SS')
hire_date
2 from employee

Thomas Jeffrey 27-FEB-2010 12:00:00
Theresa Wong 27-FEB-2010 09:02:45
Donald Newton 24-SEP-2006 12:00:00
Roger Friedli 16-MAY-2007 12:00:00
Betsy James 16-MAY-2007 12:00:00
Matthew Michaels 16-MAY-2007 12:00:00
Frances Newton 14-SEP-2005 12:00:00
Emily Eckhardt 07-JUL-2004 12:00:00
mark leblanc 06-MAR-2009 12:00:00
michael peterson 03-NOV-2008 12:00:00
10 rows selected.

Unless you know the exact time of the date values on which you'd like to filter, or unless all the
time portions for your date values are already set to midnight, using date values in your WHERE
clauses can produce unexpected results. Consider the query in Listing 10. You know from the
results in the previous listings that two employees were hired on February 27, 2010, yet only one
is returned in Listing 10's result set. The reason is that the TO_DATE function in the WHERE
clause does not specify an exact time, so Oracle Database assumes that the time is midnight and
returns only those records that contain the specified date value and midnight as the time
component.
Code Listing 10: WHERE clause using TO_DATE might not capture all possible values
hire_date
2 from employee
3 where hire_date = TO_DATE('27-FEB-2010', 'DD-MON-YYYY')

1 row selected.

Cutting Your Date Short
When you would like to be able to filter on a certain date but do not want to have to include each
individual time component, you can use a couple of different methods. One method is to include the
TRUNC function (introduced in the previous installment in this series). It, like the TO_CHAR
function, works not only on numbers but also on date values. The TRUNC function helps cut off the
time portion of a date if no optional format parameter is passed to it. This can be useful for date
comparison purposes. Listing 11 shows a revised version of the query from Listing 10. As you can
see, eliminating the time portion of the values in the HIRE_DATE column enables the comparison
against the date value 27-FEB-2010 to retrieve all records with a HIRE_DATE value of
27-FEB-2010, irrespective of the time. The truncated HIRE_DATE value is made into a date-only
value to be compared with the corresponding date-only value returned from the result of applying
the TO_DATE function on the literal string 27-FEB-2010 with a date-only format.
Code Listing 11: Truncate the time from a DATE value to return all records for a particular day
hire_date
2 from employee
3 where TRUNC(hire_date) = TO_DATE('27-FEB-2010', 'DD-MON-YYYY')

2 rows selected.

Be aware, however, that you might sacrifice performance by applying a function to your table
column values in a WHERE clause. Indexes (used to assist with data access efficiency, and not
discussed in this series) can improve query performance in certain situations. Applying a function
to a table column can prevent an index on that column from being used. Also, the function must be
applied to the column value in every row that is examined. Both effects can seriously inhibit
performance. Therefore, another method you can use is to specify a date range that encloses the
date(s) you want to filter on. The query in Listing 12 retrieves the
same result set as the query in Listing 11. The difference between the two is that the query in
Listing 12 does not apply a function to the HIRE_DATE column data. Instead, it selects every value
from midnight at the start of the desired date up to, but not including, midnight at the start of the
following date.
Code Listing 12: Date range that returns records for a particular day
hire_date
2 from employee
3 where hire_date >= TO_DATE('27-FEB-2010', 'DD-MON-YYYY')
4 and hire_date < TO_DATE('28-FEB-2010', 'DD-MON-YYYY')

2 rows selected.
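The three filtering approaches from Listings 10 through 12 can be compared side by side on a handful of values. The Python sketch below uses the two 27-FEB-2010 hire dates from Listing 9 plus one other date. It assumes that the 09:02:45 time shown for Theresa Wong is a morning time (the listing uses the 12-hour HH mask), and the variable names are chosen only for this illustration.

```python
from datetime import datetime, timedelta

hire_dates = {
    "Thomas Jeffrey": datetime(2010, 2, 27, 0, 0, 0),   # no time stored: midnight
    "Theresa Wong": datetime(2010, 2, 27, 9, 2, 45),    # assumed to be 9:02:45 a.m.
    "mark leblanc": datetime(2009, 3, 6, 0, 0, 0),
}
day = datetime(2010, 2, 27)  # a date with no time given is treated as midnight

# Listing 10: equality against midnight misses values that carry a time.
exact = [name for name, hired in hire_dates.items() if hired == day]

# Listing 11: truncate the time from each stored value before comparing.
truncated = [name for name, hired in hire_dates.items()
             if hired.replace(hour=0, minute=0, second=0, microsecond=0) == day]

# Listing 12: leave the stored value alone and test a range that ends before the next day.
in_range = [name for name, hired in hire_dates.items()
            if day <= hired < day + timedelta(days=1)]

print(exact)
print(truncated)
print(in_range)
```

The second and third filters return the same names, but only the third leaves the stored value untouched, which is the property that matters for index use in the database.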

A System for Getting Your Dates Right
You will often need to perform date arithmetic. A useful built-in function (one already built into
Oracle Database) is SYSDATE. This function returns the current date and time that are set on the
operating system of the computer on which the database resides. It takes no parameters. Listing
13 shows an example of using the SYSDATE function to return and display the current date and
time.
Code Listing 13: The SYSDATE function
SQL> select SYSDATE, TO_CHAR(SYSDATE, 'DD-MON-YYYY HH24:MI:SS') sysdate_with_time
2 from dual;
SYSDATE SYSDATE_WITH_TIME

08-AUG-12 08-AUG-2012 14:25:08
1 row selected.

SYSDATE can be extremely useful in date arithmetic. Listing 14 shows how many days are left in
2012 from the current date (August 8, 2012, in the example). Note that if the SYSDATE value
were not truncated, the returned DAYS_TILL_2013 value would include some fraction of the
SYSDATE value (to account for the time component). Because it is truncated, however, the entire
current date is subtracted from January 1, 2013, to arrive at the result of 146 days left in the year.
Listing 15 uses SYSDATE and date arithmetic (using a date function called
MONTHS_BETWEEN) against the HIRE_DATE column of the EMPLOYEE table, to show the
number of years of service for each employee.
Code Listing 14: SYSDATE used in date arithmetic
SQL> select SYSDATE, (TO_DATE('01-JAN-2013', 'DD-MON-YYYY') -
TRUNC(SYSDATE))
Days_till_2013
2 from dual;
SYSDATE DAYS_TILL_2013

08-AUG-12 146
1 row selected.
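The effect of truncating SYSDATE before subtracting can be checked with ordinary date arithmetic. The Python sketch below uses the date and time shown in Listing 13 as a fixed stand-in for SYSDATE, so its results do not depend on when you run it; the variable names are assumptions for this example.

```python
from datetime import datetime

now = datetime(2012, 8, 8, 14, 25, 8)   # the SYSDATE value displayed in Listing 13
new_year = datetime(2013, 1, 1)

# With the time cut off, the subtraction yields a whole number of days.
truncated = now.replace(hour=0, minute=0, second=0, microsecond=0)
whole_days = (new_year - truncated).days

# Without truncation, the afternoon time leaves a fractional day in the result.
fractional_days = (new_year - now).total_seconds() / 86400

print(whole_days)
print(round(fractional_days, 2))
```

The truncated calculation gives 146, matching Listing 14, while the untruncated one gives a value between 145 and 146.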

Code Listing 15: SYSDATE and date arithmetic combined with DATE data
SQL> select substr(last_name, 1, 10) last_name, substr(first_name, 1, 10) first_name,
hire_date, ROUND(MONTHS_BETWEEN(TRUNC(SYSDATE), TRUNC(HIRE_DATE))/12, 2) YEARS_OF_SERVICE
2 from employee
3 order by years_of_service desc, last_name, first_name;
LAST_NAME FIRST_NAME HIRE_DATE YEARS_OF_SERVICE

Eckhardt Emily 07-JUL-04 8.09
Newton Frances 14-SEP-05 6.9
Newton Donald 24-SEP-06 5.88
Friedli Roger 16-MAY-07 5.23
James Betsy 16-MAY-07 5.23
Michaels Matthew 16-MAY-07 5.23
peterson michael 03-NOV-08 3.77
leblanc mark 06-MAR-09 3.42
Jeffrey Thomas 27-FEB-10 2.45
Wong Theresa 27-FEB-10 2.45
10 rows selected.

Another way to filter on a range of dates is to use the BETWEEN operator, as demonstrated
by the query in Listing 16. Be aware, however, that BETWEEN includes both range endpoints, and
an upper-range value specified without a time component is interpreted as midnight (00:00:00) at
the start of that date. To include all possible values for the date specified in the upper range of
the comparison, ensure that the upper-range value includes the latest time of day you want to
match. In the example in Listing 16, an upper-range date value of 27-FEB-2010 23:59:59 would
have allowed both employee records with a HIRE_DATE value of 27-FEB-2010 to be included in
the result set.
Code Listing 16: BETWEEN operator uses midnight in a date range comparison
SQL> select last_name, first_name, hire_date
2 from employee
3 where hire_date BETWEEN TO_DATE('26-FEB-2010', 'DD-MON-YYYY')
4 AND TO_DATE('27-FEB-2010', 'DD-MON-YYYY');

Jeffrey Thomas 27-FEB-10
1 row selected.
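A small experiment shows why the upper bound matters. In the Python sketch below, the hypothetical between function models an inclusive range test, and the two hire values are those shown for 27-FEB-2010 in Listing 9 (the 09:02:45 time is assumed to be a morning time).

```python
from datetime import datetime


def between(value, low, high):
    """Hypothetical model of BETWEEN: both endpoints are included."""
    return low <= value <= high


jeffrey = datetime(2010, 2, 27)            # stored with a midnight time
wong = datetime(2010, 2, 27, 9, 2, 45)     # stored with a time of day
low = datetime(2010, 2, 26)

midnight_upper = datetime(2010, 2, 27)                # upper bound as in Listing 16
end_of_day_upper = datetime(2010, 2, 27, 23, 59, 59)  # upper bound with a full time

print(between(jeffrey, low, midnight_upper), between(wong, low, midnight_upper))
print(between(jeffrey, low, end_of_day_upper), between(wong, low, end_of_day_upper))
```

Only the midnight value falls inside the first range; both values fall inside the second.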

Conclusion
This article has shown you a few of the most common date functions and how they can be used
to manipulate the way data is displayed. You've seen how to use the TO_CHAR and TO_DATE
conversion functions and have learned the differences between them. You now know that dates
all contain a time component that can be used or truncated according to your needs. You've been
introduced to the SYSDATE function and date arithmetic. Last but not least, you now know the
pitfalls to be aware of when you use DATE comparisons in WHERE clauses with TO_DATE and
BETWEEN, and what you can do to avoid unexpected results. By no means has this article
provided an exhaustive list of the Oracle Database date and datatype conversion functions. You
can review the documentation for more details at bit.ly/PR7GQh and bit.ly/NOgf01. The next
installment of SQL 101 will discuss aggregate functions.
SQL 101: Having Sums, Averages, and Other Grouped Data
As Published In

January/February 2013
TECHNOLOGY: SQL 101

Having Sums, Averages, and Other
Grouped Data
By Melanie Caffrey

Part 8 in this series, "Selecting a Type That Is Right for You" (Oracle Magazine,
November/December 2012), introduced common SQL date functions and showed how your
queries can use them to modify the appearance of date result set data. It also introduced the
SYSDATE function and date arithmetic and showed how they can be used to manipulate result
set data to convey more-meaningful results. So far, all of the functions discussed in this series
operate on single-row results. Aggregate functions (also called group functions) operate on
multiple rows, enabling you to manipulate data so that it displays differently from how it is stored in
the database. This article introduces you to some of the more commonly used SQL group
functions, along with the GROUP BY and HAVING clauses.
To try out the examples in this series, you need access to an Oracle Database instance. If
necessary, download and install an Oracle Database edition for your operating system. I recommend
installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database
software, choose the installation option that enables you to create and configure a database. A new
database, including sample user accounts and their associated schemas, will be created for you.
(Note that SQL_101 is the user account to use for the examples in this series; it's also the schema
in which you'll create database tables and other objects.) When the installation process prompts
you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a
note of them.
Finally, whether you installed the database software from scratch or have access to an existing
Oracle Database instance, download, unzip, and execute the SQL script to create the tables for the
SQL_101 schema that are required for this article's examples. (View the script in a text editor for
execution instructions.)
The Sum of All Rows
All aggregate functions group data to ultimately produce a single result value. Because aggregate
functions operate on multiple row values, you can use them to generate summary data such as
totals. For example, you can answer budget planning questions such as "What is the total allotted
amount of annual salary paid to all employees?" The query in Listing 1 demonstrates use of the
SUM aggregate function to answer this question. It adds together all the values in the EMPLOYEE
table's SALARY column, resulting in the total value 970000.
Code Listing 1: Display the sum of all salary values in the EMPLOYEE table
SQL> select SUM(salary) from employee;
SUM(SALARY)
970000
1 row selected.

When You Strive for Average
Another example of a business question you can answer by using an aggregate function is, "What
is currently the average annual salary for all employees?" Like the query in Listing 1, the query in
Listing 2 applies an aggregate function to the EMPLOYEE table's SALARY column. The AVG
function in Listing 2 sums up the salary values and then divides the total by the number of
employee records with non-null salary values. With the total of 970000 paid annually divided by 10
employees, the average annual salary value is 97000.
Code Listing 2: Compute the average salary value across all non-null salary values
SQL> select AVG(salary) from employee;
AVG(SALARY)
97000
1 row selected.

The EMPLOYEE table holds 11 records, but the average-salary computation in Listing 2 considers
only 10 records. This happens because the AVG aggregate function ignores null values. (In the
EMPLOYEE table, the null salary value is for the employee Lori Dovichi.) To substitute a non-null
value for any null values, you can nest an NVL function call (introduced in Part 7 of this series)
inside the call to the AVG function, as demonstrated in Listing 3. The average salary value
returned in Listing 3 is lower than the value returned in Listing 2, because the null salary value for
Lori Dovichi has been replaced with a 0 and evaluated along with all other non-null salary values.
Substitute non-null values for null values only when it makes sense to do so from a business
perspective.
Code Listing 3: Substitute a non-null value for any null values
SQL> select AVG(NVL(salary, 0)) avg_salary
2 from employee;
AVG_SALARY
88181.8182
1 row selected.
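The difference between Listings 2 and 3 comes down to which denominator AVG uses. The following sketch makes that visible in one query. It assumes the SQL_101 schema's EMPLOYEE table used in this article's listings; the column aliases are illustrative only.

```sql
-- AVG(salary) divides by the number of non-null salary values,
-- whereas AVG(NVL(salary, 0)) divides by the number of rows.
select SUM(salary) / COUNT(salary)  avg_non_null,      -- same value as AVG(salary)
       AVG(salary)                  avg_salary,
       SUM(salary) / COUNT(*)       avg_all_rows,      -- same value as AVG(NVL(salary, 0))
       AVG(NVL(salary, 0))          avg_nulls_as_zero
  from employee;
```

With the data behind Listings 2 and 3, the first two columns should both show 97000 and the last two should both show 88181.8182.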

Keeping Count
As you know from previous articles in this series, the SQL*Plus set feedback on command
displays a count of records that satisfy your query criteria. This method works well when the
database quickly returns a small number of records that can display easily on your screen. But it's
unwieldy when you evaluate hundreds, thousands, or millions of records, because you must wait
for all the records in the result set to be fetched from the database and returned to your client. A
more efficient alternative is to use the COUNT aggregate function, demonstrated in Listing 4.
Code Listing 4: Obtain a count of all employees by using COUNT(*)
SQL> select COUNT(*)
2 from employee;
COUNT(*)
11
1 row selected.

The COUNT aggregate function counts the number of records that satisfy the query condition. The
query in Listing 4 uses COUNT(*), which returns the count of all rows that satisfy the query
condition, to obtain a count of all the records in the EMPLOYEE table. COUNT(*) does not ignore
null values, whereas COUNT with a column input does. By comparing Listings 4 and 5, you can
see that the returned value is the same whether you count all rows in the EMPLOYEE table
with COUNT(*) or count just the primary key column with COUNT(employee_id).
Code Listing 5: Obtain a count of all employees by applying COUNT to the primary key column
SQL> select COUNT(employee_id)
2 from employee;
COUNT(EMPLOYEE_ID)
11
1 row selected.
However, contrast a call to COUNT(*) or COUNT(employee_id) with a call to COUNT(manager),
demonstrated in Listing 6. In the EMPLOYEE table, 4 of the 11 records have no value for the
MANAGER column, so they are not included in the single count value that is returned.
Code Listing 6: Apply COUNT to a column that contains null values
SQL> select COUNT(manager)
2 from employee;
COUNT(MANAGER)
7
1 row selected.

The query in Listing 7 demonstrates that a call to COUNT(*) or COUNT(column_name) returns a
result value of 0 if no rows match the query condition. The query requests a count of all rows and
a count of all manager values for all employee records with a hire date later than the start of the
current day, TRUNC(SYSDATE). Because no one was hired on the date the query was run, a count
value of 0 is returned.
Code Listing 7: COUNT(*) and COUNT(column_name) both return 0 when no rows match
SQL> select COUNT(*), COUNT(manager)
2 from employee
3 where hire_date > TRUNC(SYSDATE);
COUNT(*) COUNT(MANAGER)

0 0
1 row selected.
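COUNT is the exception in this respect: in Oracle Database, aggregate functions such as SUM, AVG, MAX, and MIN return null rather than 0 when no rows match. The sketch below assumes the article's EMPLOYEE table and uses a deliberately false condition (1 = 0) so that no rows qualify; the aliases are illustrative.

```sql
-- No row satisfies 1 = 0, so every aggregate is applied to an empty set.
select COUNT(*)             row_count,         -- 0
       SUM(salary)          salary_sum,        -- null
       MAX(salary)          max_salary,        -- null
       NVL(SUM(salary), 0)  salary_sum_or_zero -- 0, because NVL replaces the null
  from employee
 where 1 = 0;
```

Wrapping an aggregate in NVL is one way to report 0 instead of null when that is what the business question calls for.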

If your goal is to determine a count of distinct values, you can combine the COUNT aggregate
function with the DISTINCT keyword. Listing 8 shows a query that determines the count of
DISTINCT or UNIQUE (a keyword you can use instead of DISTINCT) MANAGER column values in
the EMPLOYEE table. A count of 3 is returned. The null value that exists in the MANAGER
column for several employees is not included in the count of distinct manager values by the call to
the COUNT aggregate function.
Code Listing 8: COUNT and DISTINCT obtain a count of distinct values
SQL> select COUNT(DISTINCT manager) num_distinct_managers
2 from employee;
NUM_DISTINCT_MANAGERS
3
1 row selected.
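The COUNT variants discussed in this section can be compared side by side. This is a minimal sketch against the article's EMPLOYEE table; the aliases are made up for illustration.

```sql
select COUNT(*)                   all_rows,              -- counts every row
       COUNT(manager)             rows_with_manager,     -- skips null MANAGER values
       COUNT(*) - COUNT(manager)  rows_without_manager,  -- number of null MANAGER values
       COUNT(DISTINCT manager)    distinct_managers      -- skips nulls and duplicates
  from employee;
```

Given the results in Listings 4, 6, and 8, the four columns should show 11, 7, 4, and 3.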

Maximizing and Minimizing
You can certainly locate the maximum and minimum values within a set of row values you've
fetched with a well-ordered SQL statement. But if your result set is voluminous and all you want is
the maximum or the minimum result, you don't want to scroll to the top or the bottom of the result
set to see it. You can use the MIN and MAX aggregate functions instead. The query in Listing 9
uses them to display the EMPLOYEE table's maximum and minimum salary values.
Code Listing 9: MAX and MIN obtain maximum and minimum column values
SQL> select MAX(salary), MIN(salary)
2 from employee;
MAX(SALARY) MIN(SALARY)

300000 60000
1 row selected.
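MIN and MAX are not limited to numeric columns; they also work on date and character data, and their results can be combined in expressions. The following sketch assumes the EMPLOYEE table's HIRE_DATE, LAST_NAME, and SALARY columns as they appear elsewhere in this series; the aliases are illustrative.

```sql
select MIN(hire_date)             earliest_hire,
       MAX(hire_date)             latest_hire,
       MIN(last_name)             first_last_name,  -- lowest value in character sort order
       MAX(salary) - MIN(salary)  salary_spread
  from employee;
```

With the values returned in Listing 9, SALARY_SPREAD should be 240000.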

One by One and Group by Group
So far the examples in this article have discussed aggregate functions working on all rows for a
particular aggregation criterion. But you might want to do further categorizations and aggregations
within your data. The GROUP BY clause enables you to collect data across multiple records and
group the results by one or more columns. Aggregate functions and GROUP BY clauses are used
in tandem to determine and return an aggregate value for every group. For example, the query in
Listing 10 obtains a count of employees in each department.
In Listing 10, note that one employee has not been assigned to a department in the EMPLOYEE
table and that person is included as a group in the results. Note also that the query uses an
ORDER BY clause. Although the GROUP BY clause groups data, it does not sort the results in
any particular order. The query in Listing 11 shows the query from Listing 10 without the ORDER
BY clause.
Code Listing 10: GROUP BY creates grouped categorizations
SQL> select COUNT(employee_id), department_id
2 from employee
3 GROUP BY department_id
4 ORDER BY department_id;
COUNT(EMPLOYEE_ID) DEPARTMENT_ID

6 10
2 20
2 30
1
4 rows selected.

Code Listing 11: No ORDER BY clause with GROUP BY clause
SQL> select COUNT(employee_id), department_id
2 from employee
3 GROUP BY department_id;
COUNT(EMPLOYEE_ID) DEPARTMENT_ID

2 30
1
2 20
6 10
4 rows selected.
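A single GROUP BY query can return several aggregate values per group. The sketch below is one possible per-department summary built from the functions introduced earlier in this article; it assumes the same EMPLOYEE table, and the aliases are illustrative.

```sql
select department_id,
       COUNT(*)     num_employees,
       SUM(salary)  total_salary,
       AVG(salary)  avg_salary,    -- ignores null salary values
       MIN(salary)  min_salary,
       MAX(salary)  max_salary
  from employee
 GROUP BY department_id
 ORDER BY department_id;
```

Each aggregate is computed independently within each department group, so the null handling described earlier applies group by group.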

When a GROUP BY clause is followed by an ORDER BY clause, the columns listed in the
ORDER BY clause must also be included in the SELECT list. The query in Listing 12
demonstrates the error that will occur if the SELECT list and the ORDER BY column list
do not match. Similarly, an error will occur if you do not use GROUP BY for every column in the
SELECT list that is not part of an aggregation operation, as shown by the query in Listing 13. This
query is the same as the query in Listing 12, minus the GROUP BY clause.
Code Listing 12: Error when ORDER BY clause column list doesn't also appear in the SELECT
list
SQL> select COUNT(employee_id), department_id
2 from employee
3 GROUP BY department_id
4 ORDER BY hire_date DESC;
ORDER BY hire_date DESC
*
ERROR at line 4:
ORA-00979: not a GROUP BY expression

Code Listing 13: Error when GROUP BY does not list required column
SQL> select COUNT(employee_id), department_id
2 from employee
3 ORDER BY hire_date DESC;
select COUNT(employee_id), department_id
*
ERROR at line 1:
ORA-00937: not a single-group group function

The GROUP BY clause is necessary if your intent is to return multiple groups. And your intent to
return multiple groups is determined by the inclusion in the SELECT list of any column that is not
part of an aggregation operation. In the query in Listing 13, that column is DEPARTMENT_ID in
the EMPLOYEE table. A query that uses aggregate functions and no GROUP BY clause always
returns exactly one row, even if the table you query contains no rows at the time you query it.
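The two errors can be avoided in more than one way. The statements below are possible repairs, not the only ones, and they assume the same EMPLOYEE table. Ordering by MAX(hire_date) is an illustrative choice: it sorts the departments by their most recent hire.

```sql
-- ORA-00979 avoided: order by an aggregate of the ungrouped column.
select COUNT(employee_id), department_id
  from employee
 GROUP BY department_id
 ORDER BY MAX(hire_date) DESC;

-- ORA-00937 avoided: group by the nonaggregated column ...
select COUNT(employee_id), department_id
  from employee
 GROUP BY department_id;

-- ... or drop it from the SELECT list to get a single summary row.
select COUNT(employee_id)
  from employee;
```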
Having the Last Word
Just as a SELECT statement can use a WHERE clause to filter the result set to include only records
that meet certain criteria, the GROUP BY clause can use a similar clause to filter groups. The HAVING
clause works with the GROUP BY clause to limit the results to groups that meet the criteria you
specify. Listing 14 expands on the query in Listing 10. Inclusion of the HAVING clause in this
query eliminates any groups with fewer than two employees from the result set. As you can see,
the group with no assigned department is not returned in Listing 14's result set, because that
group contains only one employee.
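Code Listing 14: HAVING clause filters groups
SQL> select COUNT(employee_id), department_id
2 from employee
3 GROUP BY department_id
4 HAVING COUNT(employee_id) > 1
5 ORDER BY department_id;
COUNT(EMPLOYEE_ID) DEPARTMENT_ID

6 10
2 20
2 30
3 rows selected.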
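WHERE and HAVING can appear in the same query, and they act at different stages. The following sketch assumes the same EMPLOYEE table; the salary threshold is an arbitrary value chosen for illustration.

```sql
select COUNT(employee_id), department_id
  from employee
 where salary < 100000            -- row filter: applied before rows are grouped
 GROUP BY department_id
HAVING COUNT(employee_id) > 1     -- group filter: applied after groups are formed
 ORDER BY department_id;
```

Because the WHERE clause removes rows first, the counts that HAVING evaluates include only employees whose salary is below the threshold.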

The HAVING clause works primarily on aggregate function columns, whereas the WHERE clause
works on columns and other expressions without an aggregation operation. I say "primarily"
because the HAVING clause can use multiple operators in its filtering operation. For example, the
query in Listing 15 displays a count of employees-per-department-and-salary groups that satisfy
one of two criteria, only the first of which uses an aggregate function:
The department has two or more employees.
The salaries of the employees in the department are less than 100000.
Code Listing 15: HAVING clause can use multiple operators
SQL> select COUNT(employee_id), department_id, salary
2 from employee
3 GROUP BY department_id, salary
4 HAVING (COUNT(employee_id) > 1
5 OR salary < 100000)
6 ORDER BY department_id, salary desc;
COUNT(EMPLOYEE_ID) DEPARTMENT_ID SALARY

1 10 80000
1 10 70000
2 10 60000
1 20 90000
1 20 65000
1 30 70000
1 75000
7 rows selected.

Odds and Ends
Although every column included in the SELECT list must also be listed in a GROUP BY clause,
this restriction doesn't apply to number and string literals, constant expressions (expressions that
do not use column values), and functions such as SYSDATE. Listing 16 shows a query that, for
demonstration purposes, expands on the query in Listing 15. It includes a literal, a constant
expression, and the SYSDATE function in the SELECT list, but it doesn't need to list these items in
the GROUP BY or ORDER BY clause.
Code Listing 16: Literals, expressions, and functions not listed in GROUP BY or ORDER BY
SQL> select COUNT(employee_id), department_id, salary,
2 SYSDATE, 'String Literal', 42*37 Expression
3 from employee
4 GROUP BY department_id, salary
5 HAVING (COUNT(employee_id) > 1
6 OR salary < 100000)
7 ORDER BY department_id, salary desc;
COUNT(EMPLOYEE_ID) DEPARTMENT_ID SALARY SYSDATE 'STRINGLITERAL' EXPRESSION

1 10 80000 29-SEP-12 String Literal 1554
1 10 70000 29-SEP-12 String Literal 1554
2 10 60000 29-SEP-12 String Literal 1554
1 20 90000 29-SEP-12 String Literal 1554
1 20 65000 29-SEP-12 String Literal 1554
1 30 70000 29-SEP-12 String Literal 1554
1 75000 29-SEP-12 String Literal 1554
7 rows selected.

As with the other types of functions you've learned about in this article series, you can nest
aggregate functions inside one another. The query in Listing 17 obtains a sum of all salaries per
department. Then it applies the MIN aggregate function to each department salary summary value
to obtain the lowest department salary summary value. This query also demonstrates that you
don't have to list every column in the SELECT list that you list in the GROUP BY clause, even if
it's mandatory to do the reverse.
Code Listing 17: Nested aggregate functions
SQL> select MIN(SUM(salary)) min_department_salary_sum
2 from employee
3 where department_id is not null
4 GROUP by department_id;
MIN_DEPARTMENT_SALARY_SUM
155000
1 row selected.
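The nested call in Listing 17 can also be read as a two-step computation. The sketch below spells out the two steps with an inline view; it is an equivalent formulation for illustration, assuming the same EMPLOYEE table, and the alias DEPT_SALARY_SUM is made up.

```sql
-- Step 1 (inner query): one salary sum per department.
-- Step 2 (outer query): the smallest of those sums.
select MIN(dept_salary_sum) min_department_salary_sum
  from (select SUM(salary) dept_salary_sum
          from employee
         where department_id is not null
         GROUP BY department_id);
```

This form should return the same single value as Listing 17.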

Conclusion and Anticipation
This article has shown you a few of the most common aggregate functions and how you can use
them to manipulate how your data is displayed. You've seen how to use single-group aggregate
functions such as MAX, MIN, and AVG as well as multigroup functions such as COUNT and SUM.
You now know how these functions operate when null values are present in your data and how
such operations can affect your results. You've been introduced to the GROUP BY and HAVING
clauses and have been shown how these clauses can help you further filter and categorize your
summary data. Last, but not least, you know what pitfalls to look for when using an ORDER BY
clause with a GROUP BY clause and that column values listed in the SELECT list must also
appear in any GROUP BY clause. By no means does this article provide an exhaustive list of the
Oracle Database aggregate functions. Review the documentation for more details:
bit.ly/WxKZFu. The next installment of SQL 101 will discuss analytic functions.
SQL 101: A Window into the World of Analytic Functions
As Published In

March/April 2013
TECHNOLOGY: SQL 101

A Window into the World of Analytic
Functions
By Melanie Caffrey

Part 9 in this series, "Having Sums, Averages, and Other Grouped Data" (Oracle Magazine,
January/February 2013), introduced common SQL aggregate functions and the GROUP BY and
HAVING clauses, showing how you can use them to manipulate single-row and grouped result
set data to convey more-meaningful results. The discussion of aggregate functions segues
logically into the subject of more-advanced SQL operations that use aggregations and other
specific views of your data. This article is the first in a three-article sequence that introduces you
to some commonly used analytic functions and their associated clauses. Analytic functions not
only operate on multiple rows but also can perform operations such as ranking data, calculating
running totals, and identifying changes between different time periods (to name a few), all of
which facilitate creation of queries that answer business questions for reporting purposes.
To try out the examples in this series, you need access to an Oracle Database instance. If
necessary, download and install an Oracle Database edition for your operating system. I
recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle
Database software, choose the installation option that enables you to create and configure a
database. A new database, including sample user accounts and their associated schemas, will be
created for you. (Note that SQL_101 is the user account to use for the examples in this series; it's
also the schema in which you'll create database tables and other objects.) When the installation
process prompts you to specify schema passwords, enter and confirm passwords for SYS and
SYSTEM and make a note of them.
Increasing Your Bottom Line
You can use standard SQL to answer most data questions. However, pure SQL queries that
answer questions such as "What is the running total of employee salary values as they are
summed row by row?" aren't easy to write and may not perform well over time. Analytic functions
add extensions to SQL that make such operations faster-running and easier to code.
The query in Listing 1 demonstrates use of the SUM analytic function. The query results list all
employees alongside their respective salary values and display a cumulative total of their salaries.
Code Listing 1: Obtain a cumulative salary total, row by row, for all employees
SQL> select last_name, first_name, salary,
2 SUM (salary)
3 OVER (ORDER BY last_name, first_name) running_total
4 from employee
5 order by last_name, first_name;
LAST_NAME FIRST_NAME SALARY RUNNING_TOTAL

Dovichi Lori
Eckhardt Emily 100000 100000
Friedli Roger 60000 160000
James Betsy 60000 220000
Jeffrey Thomas 300000 520000
Michaels Matthew 70000 590000
Newton Donald 80000 670000
Newton Frances 75000 745000
Wong Theresa 70000 815000
leblanc mark 65000 880000
peterson michael 90000 970000
11 rows selected.

This result is accomplished with the query line that reads
SUM (salary)
OVER (ORDER BY last_name, first_name) running_total

Anatomy of an Analytic Function
Learning the syntax of an analytic function is half the battle in harnessing its power for efficient
query processing. The syntax for the analytic query line in Listing 1 is
FUNCTION_NAME( column | expression,column | expression,... )
OVER
( Order-by-Clause )

In Listing 1, the function name is SUM. The argument to the SUM function is the SALARY column
(although it could also be an expression). The OVER clause identifies this function call as an
analytic function (as opposed to an aggregate function). The ORDER BY clause identifies the
piece of data this analytic function will be performed over.
This series will discuss scalar subqueries in a later installment. Suffice it to say for this article's
purposes that using a scalar subquery is another method you could employ to achieve the result
obtained in Listing 1. However, it would perform significantly more slowly and its syntax would be
more difficult to write than the analytic query line in Listing 1.
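For comparison, the scalar subquery alternative could look something like the following sketch. It is one possible formulation, not the article's, and it assumes that LAST_NAME and FIRST_NAME are never null in the EMPLOYEE table; the table aliases E and E2 are illustrative.

```sql
-- For each employee row, a correlated subquery re-reads the table and sums
-- the salaries of all rows that sort at or before the current row.
select e.last_name, e.first_name, e.salary,
       (select SUM(e2.salary)
          from employee e2
         where e2.last_name < e.last_name
            or (e2.last_name = e.last_name
                and e2.first_name <= e.first_name)) running_total
  from employee e
 order by e.last_name, e.first_name;
```

The subquery is evaluated once per outer row, which is the main reason this approach tends to be slower than the analytic version.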
Code Listing 2: Obtain a cumulative salary total, row by row, by department
SQL> select last_name, first_name, department_id, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY last_name, first_name)
department_total
4 from employee
5 order by department_id, last_name, first_name;
LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL

Dovichi Lori 10
Eckhardt Emily 10 100000 100000
Friedli Roger 10 60000 160000
James Betsy 10 60000 220000
Michaels Matthew 10 70000 290000
Newton Donald 10 80000 370000
leblanc mark 20 65000 65000
peterson michael 20 90000 155000
Jeffrey Thomas 30 300000 300000
Wong Theresa 30 70000 370000
Newton Frances 75000 75000
11 rows selected.
The query in Listing 2 cumulatively sums the salary values of the employee rows within each
department. The PARTITION clause ensures that the analytic function is applied independently to
each department group (or partition). You can see that the cumulative total resets after the
department changes from 10 to 20, and again from 20 to 30, and finally from 30 to an employee
record that has no department ID. The analytic function syntax including a PARTITION clause
expands as follows on the syntax used in the Listing 1 example:
FUNCTION_NAME( argument,argument,... )
OVER
( Partition-Clause Order-by-Clause )
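In Oracle Database the ORDER BY clause inside OVER is optional. Leaving it out while keeping the PARTITION clause returns the complete partition total on every row instead of a running total. The sketch below assumes the same EMPLOYEE table as the listings.

```sql
-- No ORDER BY inside OVER: each row shows its whole department's total,
-- and, unlike with GROUP BY, the individual employee rows are preserved.
select last_name, first_name, department_id, salary,
       SUM (salary)
         OVER (PARTITION BY department_id) department_total
  from employee
 order by department_id, last_name, first_name;
```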

A Separate Order
The queries in Listings 1 and 2 sort the rows returned by employee last name and first name. The
query in Listing 3 uses a slightly different ordering criterion for the analytic function computation.
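Code Listing 3: Compute each row based on salary value
SQL> select last_name, first_name, department_id, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY salary)
department_total
4 from employee
5 order by department_id, salary, last_name, first_name;
LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL

Friedli Roger 10 60000 120000
James Betsy 10 60000 120000
Michaels Matthew 10 70000 190000
Newton Donald 10 80000 270000
Eckhardt Emily 10 100000 370000
Dovichi Lori 10 370000
leblanc mark 20 65000 65000
peterson michael 20 90000 155000
Wong Theresa 30 70000 70000
Jeffrey Thomas 30 300000 370000
Newton Frances 75000 75000
11 rows selected.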

The analytic function in Listing 3 computes the department total values based on salary, in
ascending order for each partition, with NULL salary values evaluated last. You can see that the
record for Lori Dovichi (the only record with a NULL salary value) ends up with the same
DEPARTMENT_TOTAL value as the record in the same department (Emily Eckhardt) that has the
highest salary value.
An analytic function's ORDER BY clause works independently from the ORDER BY clause of the
overall query that contains the analytic function. Little or no correlation exists between the two
unless they use the same column or expression listings in the same order. In Listing 4, for
example, note that even though the data returned is listed in department/last name/first name
order (like the result sets in Listings 1 and 2), the values returned for the DEPARTMENT_TOTAL
expression match the values for those returned in Listing 3. And even though Betsy James and
Lori Dovichi appear in a different order in the result sets of Listings 3 and 4, the values returned
for their respective department total computations are the same.
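Code Listing 4: Sort the data returned from the query in Listing 3 differently
SQL> select last_name, first_name, department_id, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY salary)
department_total
4 from employee
5 order by department_id, last_name, first_name;
LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL

Dovichi Lori 10 370000
Eckhardt Emily 10 100000 370000
Friedli Roger 10 60000 120000
James Betsy 10 60000 120000
Michaels Matthew 10 70000 190000
Newton Donald 10 80000 270000
leblanc mark 20 65000 65000
peterson michael 20 90000 155000
Jeffrey Thomas 30 300000 370000
Wong Theresa 30 70000 70000
Newton Frances 75000 75000
11 rows selected.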

Your Choice of Window
An analytic function might or might not include a windowing clause. A windowing clause is a set of
parameters or keywords that defines the group (or window) of rows within a particular partition that
will be evaluated for analytic function computation. The query in Listing 1 uses a windowing clause
by default, because it uses an ORDER BY clause. An ORDER BY clause, in the absence of any
further windowing clause parameters, effectively adds a default windowing clause: RANGE
UNBOUNDED PRECEDING, which means, "The current and previous rows in the current partition
are the rows that should be used in the computation." When an ORDER BY clause isn't
accompanied by a PARTITION clause, the entire set of rows used by the analytic function is the
default current partition.
The queries in Listings 3 and 4 include a PARTITION clause but use no windowing clause
parameters. In the calculated results, the DEPARTMENT_TOTAL values for Betsy James and
Roger Friedli are identical. In the absence of windowing clause parameters, when your query's
analytic function orders by a particular column or expression within its partition and two or more
rows have the same value, the analytic function is applied to each of them and returns the same
result, because the analytic function cannot ascertain the order in which they should be evaluated.
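If you want a distinct running total for each row even when ORDER BY values tie, one option is to add tie-breaking columns to the analytic ORDER BY clause and use a ROWS window. The sketch below shows the default behavior and this alternative side by side; it assumes the same EMPLOYEE table, and the aliases RANGE_TOTAL and ROWS_TOTAL are illustrative.

```sql
select last_name, first_name, department_id, salary,
       -- Default window: rows with equal salary values share one result.
       SUM (salary)
         OVER (PARTITION BY department_id
               ORDER BY salary)                         range_total,
       -- Tie-breakers plus ROWS: each row gets its own running total.
       SUM (salary)
         OVER (PARTITION BY department_id
               ORDER BY salary, last_name, first_name
               ROWS UNBOUNDED PRECEDING)                rows_total
  from employee
 order by department_id, salary, last_name, first_name;
```

For Roger Friedli and Betsy James, RANGE_TOTAL should match the shared value discussed above, while ROWS_TOTAL should differ between the two rows.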
The query in Listing 5 uses the ROWS 2 PRECEDING windowing clause to sum the current row's
salary value with just the two preceding rows' salary values. Even though the employee listed just
above Matthew Michaels, Betsy James, has a DEPARTMENT_TOTAL value of 220000, the
DEPARTMENT_TOTAL value listed for Matthew Michaels is 190000. This occurs because only
the SALARY value for Matthew Michaels, 70000, is summed with the SALARY values of the two
rows directly preceding his: those of Betsy James and Roger Friedli.
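Code Listing 5: Add a ROWS windowing clause
SQL> select last_name, first_name, department_id, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY last_name, first_name
4 ROWS 2 PRECEDING) department_total
5 from employee
6 order by department_id, last_name, first_name;
LAST_NAME FIRST_NAME DEPARTMENT_ID SALARY DEPARTMENT_TOTAL

Dovichi Lori 10
Eckhardt Emily 10 100000 100000
Friedli Roger 10 60000 160000
James Betsy 10 60000 220000
Michaels Matthew 10 70000 190000
Newton Donald 10 80000 210000
leblanc mark 20 65000 65000
peterson michael 20 90000 155000
Jeffrey Thomas 30 300000 300000
Wong Theresa 30 70000 370000
Newton Frances 75000 75000
11 rows selected.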

If a windowing clause that uses parameters is added to an analytic function, the resulting syntax
will look like this:
FUNCTION_NAME( argument,argument,... )
OVER
( Partition-Clause Order-by-Clause Windowing-Clause)
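The same syntax works with other aggregate-style functions. As a sketch, the query below computes a moving average over the current row and the two rows before it, assuming the same EMPLOYEE table. It writes the window in its longer BETWEEN form, which is equivalent to ROWS 2 PRECEDING; the alias MOVING_AVG is illustrative.

```sql
select last_name, first_name, department_id, salary,
       AVG (salary)
         OVER (PARTITION BY department_id
               ORDER BY last_name, first_name
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) moving_avg
  from employee
 order by department_id, last_name, first_name;
```

As with the aggregate AVG function, null salary values inside the window are ignored in the average.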

Multiple Windows into Your Data
The windowing clause provides either a sliding or an anchored view of data, depending on which
parameters you pass to it. Queries with just an ORDER BY clause (such as those in Listings 1, 2,
3, and 4) provide an anchored view of the data: it begins with the first row (or top) of the partition
and ends with the current row being processed. The query in Listing 5 results in a sliding view of
the data, because the DEPARTMENT_TOTAL value for each row can change, depending on how
the data is sorted (ordered) within each partition.
Listing 5 demonstrates use of the ROWS clause as the parameter input to the windowing clause.
You can also create a sliding view of data by using the RANGE clause. Unlike the ROWS clause,
the RANGE windowing clause can be used only with ORDER BY clauses containing columns or
expressions of numeric or date datatypes. It has this datatype requirement because it operates on
all rows within a certain range of the current row. The value for the column or expression by which
your data is ordered within each partition falls within specified numeric or date units from the
current row.
Code Listing 6: Sort a partition by date of hire and use a RANGE windowing clause
SQL> select last_name, first_name, department_id, hire_date, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY hire_date
4 RANGE 90 PRECEDING) department_total
5 from employee
6 order by department_id, hire_date;
LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY
DEPARTMENT_TOTAL

Eckhardt Emily 10 07-JUL-04 100000
100000
Newton Donald 10 24-SEP-06 80000
80000
James Betsy 10 16-MAY-07 60000
190000
Friedli Roger 10 16-MAY-07 60000
190000
Michaels Matthew 10 16-MAY-07 70000
190000
Dovichi Lori 10 07-JUL-11
peterson michael 20 03-NOV-08 90000
90000
leblanc mark 20 06-MAR-09 65000
65000
Jeffrey Thomas 30 27-FEB-10 300000
300000
Wong Theresa 30 27-FEB-10 70000
370000
Newton Frances 14-SEP-05 75000
75000
11 rows selected.

The query in Listing 6 illustrates how the RANGE clause works. The query's partition is sorted by
HIRE_DATE. The query then specifies the following windowing clause:
RANGE 90 PRECEDING

This line means, "Provide a summary of the current row's salary value together with the salary
values of all previous rows whose HIRE_DATE value falls within 90 days preceding the
HIRE_DATE value of the current row." Note that within Department 10, only three rows have a
DEPARTMENT_TOTAL value different from their SALARY value. The employees listed in these
rows were all hired on the same date and therefore fall within the range of date values required
for salary summation.
Also note that within Department 30, two employees were hired on the same date but only one of
the rows lists a DEPARTMENT_TOTAL value different from its SALARY value. This result is due
to the PRECEDING keyword in the RANGE clause. Effectively, this means, "Look at any rows that
precede the current row before determining whether the HIRE_DATE units being sorted fall within
the range of the current row's HIRE_DATE." No row precedes that of Thomas Jeffrey in
Department 30, so his resultant DEPARTMENT_TOTAL value remains unchanged and is no
different from his listed SALARY value.
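A RANGE window works the same way on a numeric ORDER BY column. The following sketch counts, for each employee, how many employees earn between 10000 less than that employee's salary and the salary itself, inclusive. It assumes the same EMPLOYEE table; the 10000 offset and the alias are arbitrary choices for illustration.

```sql
select last_name, first_name, salary,
       COUNT (*)
         OVER (ORDER BY salary
               RANGE 10000 PRECEDING) num_within_10000_below
  from employee
 where salary is not null
 order by salary, last_name, first_name;
```

Here the offset is measured in salary units rather than days, because the window is defined on the values of the ORDER BY column.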
The query in Listing 7 illustrates the importance of using only columns or expressions of date or
numeric datatypes. It tries to sort each partition by employee last name and first name. Because a
RANGE windowing clause can determine only an appropriate range of values dependent upon
numeric or date ranges (not textual or string ranges), it cannot determine the appropriate range
and causes the query to fail.
Code Listing 7: RANGE windowing clause that uses an incorrect datatype
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY last_name, first_name
4 RANGE 90 PRECEDING) department_total
5 from employee
SUM (salary)
*
ERROR at line 2:
ORA-30486: invalid window aggregation group in the window specification

Also, if your query's analytic function uses a RANGE windowing clause, you will be able to use
only one column or expression in the ORDER BY clause; ranges are one-dimensional. These
restrictions do not apply to the ROWS windowing clause, which can be applied to any datatype
and is not limited to a single column or expression in the ORDER BY clause.
Narrowing Your Viewpoint
In its most basic form, a window can be specified in one of three mutually exclusive ways. Table 1
shows the types of parameters that can be passed to the ROWS or RANGE windowing clauses.
Windowing Clause Parameter     Description
CURRENT ROW                    The window begins and ends with the current row being processed.
UNBOUNDED PRECEDING            The window begins with the first row of the current partition and
                               ends with the current row being processed.
numeric expression PRECEDING   ROWS clause: The window begins with the row that is numeric
                               expression rows preceding the current row and ends with the
                               current row being processed.
                               RANGE clause: The window begins with the row whose ORDER BY
                               value is numeric expression less than, or preceding, the current
                               row's ORDER BY value and ends with the current row being processed.
Table 1: Windowing clause parameters
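The three parameter forms in Table 1 can be compared in a single query. This is a sketch using the ROWS clause against the same EMPLOYEE table; the three column aliases are made up for illustration.

```sql
select last_name, first_name, department_id, salary,
       SUM (salary) OVER (PARTITION BY department_id
                          ORDER BY last_name, first_name
                          ROWS CURRENT ROW)          current_only,
       SUM (salary) OVER (PARTITION BY department_id
                          ORDER BY last_name, first_name
                          ROWS UNBOUNDED PRECEDING)  from_partition_start,
       SUM (salary) OVER (PARTITION BY department_id
                          ORDER BY last_name, first_name
                          ROWS 1 PRECEDING)          previous_and_current
  from employee
 order by department_id, last_name, first_name;
```

CURRENT_ONLY should simply repeat each row's SALARY value, whereas FROM_PARTITION_START behaves as a running total within the department.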
So far, all the windows demonstrated in this article end at the current row and use preceding row
or range values in their computations. You can also use the BETWEEN operator to specify a
window in which the current row falls somewhere in the middle of the result set. The query in
Listing 8 demonstrates that in addition to a ROWS or RANGE clause that specifies that your
window starts with previous row values and ends with the current row being processed, you can
also use the FOLLOWING parameter to look at rows following the current row being processed
and make an evaluation based on those row values.
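Code Listing 8: Query with a RANGE windowing clause that uses the BETWEEN and
FOLLOWING parameters
SQL> select last_name, first_name, department_id, hire_date, salary,
2 SUM (salary)
3 OVER (PARTITION BY department_id ORDER BY hire_date
4 RANGE BETWEEN 365 PRECEDING AND 365 FOLLOWING)
department_total
5 from employee
6 order by department_id, hire_date;
LAST_NAME FIRST_NAME DEPARTMENT_ID HIRE_DATE SALARY
DEPARTMENT_TOTAL

Eckhardt Emily 10 07-JUL-04 100000
100000
Newton Donald 10 24-SEP-06 80000
270000
James Betsy 10 16-MAY-07 60000
270000
Friedli Roger 10 16-MAY-07 60000
270000
Michaels Matthew 10 16-MAY-07 70000
270000
Dovichi Lori 10 07-JUL-11
peterson michael 20 03-NOV-08 90000
155000
leblanc mark 20 06-MAR-09 65000
155000
Jeffrey Thomas 30 27-FEB-10 300000
370000
Wong Theresa 30 27-FEB-10 70000
370000
Newton Frances 14-SEP-05 75000
75000
11 rows selected.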
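BETWEEN and FOLLOWING can be used with the ROWS clause as well. The sketch below, which assumes the same EMPLOYEE table and uses illustrative aliases, shows a window centered on the current row and a window that runs from the current row to the end of the partition.

```sql
select last_name, first_name, department_id, salary,
       -- One row before, the current row, and one row after.
       AVG (salary)
         OVER (PARTITION BY department_id
               ORDER BY last_name, first_name
               ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)          centered_avg,
       -- The current row and every later row in the partition.
       SUM (salary)
         OVER (PARTITION BY department_id
               ORDER BY last_name, first_name
               ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)  remaining_total
  from employee
 order by department_id, last_name, first_name;
```

At the first and last rows of a partition, the centered window simply contains fewer rows, because there is no preceding or following row to include.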

Conclusion
Using analytic functions is a powerful way to get answers about your data that would otherwise
require convoluted, possibly poorly performing SQL. Your reporting needs will dictate not only
which analytic functions you use but also which windowing clauses (if any) will provide the
reporting view into your data that best conveys meaningful results to your users. This article has
demonstrated use of a common analytic function (SUM); the PARTITION and OVER clauses; the
ROWS and RANGE windowing clauses; and several basic, common windowing clause parameter
specifications. The next installment of SQL 101 will continue the discussion of analytic functions.
To learn more details about what you can glean from using the Oracle analytic functions, review
the documentation at bit.ly/yWtbz1 and bit.ly/R4cZyq.
SQL article May-June 2013
As Published In

May/June 2013
TECHNOLOGY: SQL 101

Leading Ranks and Lagging
Percentages: Analytic Functions,
Continued
By Melanie Caffrey

This article is the second in a three-article sequence that introduces you to some commonly used
SQL analytic functions and their associated clauses. Analytic functions add extensions to SQL
that make complex queries easier to code and faster-running. In Part 10 of this series, "A Window
into the World of Analytic Functions" (Oracle Magazine, March/April 2013), you learned how the
SUM analytic function, the PARTITION and OVER clauses, the ROWS and RANGE windowing
clauses, and several windowing clause parameter specifications help you manipulate result set
data for business reporting purposes. This article introduces you to analytic functions that enable
your queries to
Rank data (for example, to display the three employees with the highest salaries by
department)
Return the first or last value from a group (for example, to compare the salary of every
employee in a department with that of the last employee hired in that department)
Provide your report with the rows that either precede (lag) or follow (lead) the current row
being processed (for example, to discover how many days before an employee's hire date
the penultimately hired employee was hired)
Obtain percentages within a group (for example, to find out what percentage a particular
employee received of the total amount a department pays its employees annually)
To try out the examples in this series, you need access to an Oracle Database instance. If
necessary, download and install an Oracle Database edition for your operating system. I recommend
installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle Database
software, choose the installation option that enables you to create and configure a database. A
new database, including sample user accounts and their associated schemas, will be created for
you. (Note that SQL_101 is the user account for the examples in this series; it's also the schema
in which you'll create database tables and other objects.) When the installation process prompts
you to specify schema passwords, enter and confirm passwords for SYS and SYSTEM and make a
note of them.
Finally, whether you installed the database software from scratch or have access to an existing
Oracle Database instance, download, unzip, and execute the SQL script to create the tables for
the SQL_101 schema required for this article's examples. (View the script in a text editor for
execution instructions.)
Being Outranked
A query that retrieves the top or bottom N row(s) from a database table that satisfy certain criteria
is sometimes referred to as a top-N query. For example, you might want to ask who the most
highly paid employees are or which department has the lowest sales figures. An easy way to
answer such a question is to use either the RANK or the DENSE_RANK analytic function, both of
which calculate and display the numerical rank of a value within a group of values. The example in
Listing 1 lists all employees alongside their respective salary values, partitioned and sorted by
department and further sorted, in descending order, by salary. It uses the DENSE_RANK analytic
function to assign a numerical rank to the salaries within each department.
Code Listing 1: List employees, ranked by department, by salary

SQL> select department_id, last_name, first_name, salary,
2 DENSE_RANK() over (partition by department_id
3 order by salary desc) dense_ranking
4 from employee
5 order by department_id, salary desc, last_name, first_name;
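DEPARTMENT_ID LAST_NAME  FIRST_NAME     SALARY DENSE_RANKING
------------- ---------- ---------- ---------- -------------
           10 Dovichi    Lori                              1
           10 Eckhardt   Emily          100000             2
           10 Newton     Donald          80000             3
           10 Michaels   Matthew         70000             4
           10 Friedli    Roger           60000             5
           10 James      Betsy           60000             5
           20 peterson   michael         90000             1
           20 leblanc    mark            65000             2
           30 Jeffrey    Thomas         300000             1
           30 Wong       Theresa         70000             2
              Newton     Frances         75000             1

11 rows selected.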

The results in Listing 1 reveal an interesting analytic function phenomenon. When a query uses a
descending sort order, a NULL value can affect the outcome of the analytic function being used.
By default, with a descending sort, SQL views NULLs as being higher than any other value. In
Listing 1, the record for employee Lori Dovichi has no salary value, yet the DENSE_RANK
analytic function gives her salary a rank value of 1, the highest rank, in Department 10.
You can eliminate NULLs from consideration by adding a WHERE clause such as
WHERE SALARY IS NOT NULL
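
As a rough sketch of this first option, the Listing 1 query could be rewritten with the filter in place. It assumes the same EMPLOYEE table from the SQL_101 schema. Because the WHERE clause is applied before the analytic function is evaluated, employees without a salary value drop out of the report entirely rather than being ranked.

```sql
-- Rank only employees who have a salary value.
-- Rows with a NULL salary are filtered out before DENSE_RANK is evaluated.
select department_id, last_name, first_name, salary,
       DENSE_RANK() over (partition by department_id
                          order by salary desc) dense_ranking
  from employee
 where salary is not null
 order by department_id, salary desc, last_name, first_name;
```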

Alternatively, you can use the NULLS LAST extension to the ORDER BY clause in your
windowing clause, as demonstrated in Listing 2. The record for Lori Dovichi still appears first for
Department 10, because the query's overall ORDER BY clause still orders by salary in descending
order. But the rank value attributed to that record is now 5, the lowest rank value for Department
10. Note also that the DENSE_RANK function assigns the same rank value, 4, to two records
(Roger Friedli and Betsy James) in the results for Department 10, because both employees
have the same salary value.

Code Listing 2: List employees, ranked by department, by salary, with NULLS LAST
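SQL> select department_id, last_name, first_name, salary,
2 DENSE_RANK() over (partition by department_id
3 order by salary desc NULLS LAST) dense_ranking
4 from employee
5 order by department_id, salary desc, last_name, first_name;
DEPARTMENT_ID LAST_NAME  FIRST_NAME     SALARY DENSE_RANKING
------------- ---------- ---------- ---------- -------------
           10 Dovichi    Lori                              5
           10 Eckhardt   Emily          100000             1
           10 Newton     Donald          80000             2
           10 Michaels   Matthew         70000             3
           10 Friedli    Roger           60000             4
           10 James      Betsy           60000             4
           20 peterson   michael         90000             1
           20 leblanc    mark            65000             2
           30 Jeffrey    Thomas         300000             1
           30 Wong       Theresa         70000             2
              Newton     Frances         75000             1

11 rows selected.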

Listing 3 performs a similar query to that of Listing 2, with the RANK analytic function instead of
DENSE_RANK. Note that the results include no rank value of 5 for Department 10. The reason is
that DENSE_RANK and RANK attribute rank values to records differently. DENSE_RANK returns
ranking numbers without any gaps, regardless of any records that have the same value for the
expression in the ORDER BY windowing clause. In contrast, when the RANK analytic function
finds multiple rows with the same value and assigns them the same rank, the subsequent rank
numbers take account of this by skipping ahead. As you see in the results for Listing 3, RANK
assigns a rank value of 4 to two records and skips to a rank value of 6 for the final record in the
department, which has the lowest rank value.
Code Listing 3: Use the RANK analytic function instead of the DENSE_RANK analytic function
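SQL> select department_id, last_name, first_name, salary,
2 RANK() over (partition by department_id
3 order by salary desc NULLS LAST) regular_ranking
4 from employee
5 order by department_id, salary desc, last_name, first_name;
DEPARTMENT_ID LAST_NAME  FIRST_NAME     SALARY REGULAR_RANKING
------------- ---------- ---------- ---------- ---------------
           10 Dovichi    Lori                                6
           10 Eckhardt   Emily          100000               1
           10 Newton     Donald          80000               2
           10 Michaels   Matthew         70000               3
           10 Friedli    Roger           60000               4
           10 James      Betsy           60000               4
           20 peterson   michael         90000               1
           20 leblanc    mark            65000               2
           30 Jeffrey    Thomas         300000               1
           30 Wong       Theresa         70000               2
              Newton     Frances         75000               1

11 rows selected.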
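
To see the difference between the two functions in a single result set, both can be called in the same query. The following is a minimal sketch against the same EMPLOYEE table. The choice to add NULLS LAST to the final ORDER BY, so that the row order matches the ranking order, is an assumption made here for readability and is not part of the listings above.

```sql
-- RANK leaves a gap after tied rows; DENSE_RANK does not.
-- Both use the same partitioning and ordering so the values line up row by row.
select department_id, last_name, first_name, salary,
       RANK()       over (partition by department_id
                          order by salary desc NULLS LAST) regular_ranking,
       DENSE_RANK() over (partition by department_id
                          order by salary desc NULLS LAST) dense_ranking
  from employee
 order by department_id, salary desc NULLS LAST, last_name, first_name;
```

The two columns agree until the first tie within a department and differ only on the rows that follow it.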

Finishing First or Last
For reporting purposes, it might occasionally be useful to include the first value obtained for a
particular group or window when displaying your query results. You can use the FIRST_VALUE
analytic function for this purpose, as shown in Listing 4. The query in Listing 4 returns windows
that are partitioned by department and ordered by date of hire within each partition. Alongside
each returned salary value, the first salary value obtained per window is also displayed. This
information could be useful for comparing the salary value of every employee in a department with
that of the first employee hired in that department.
Code Listing 4: Display the first value returned per window, using FIRST_VALUE
2 FIRST_VALUE(salary)
3 over (partition by department_id order by hire_date) first_sal_by_dept
4 from employee
FIRST_SAL_BY_DEPT


100000
100000
100000
100000
100000
100000
90000
90000
300000
300000
75000
11 rows selected.
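
The comparison described above can be pushed one step further by subtracting the first value from each salary. This is a sketch only: the select list and the DIFF_FROM_FIRST alias are illustrative assumptions, and the query relies on the same EMPLOYEE table.

```sql
-- Show each salary next to the salary of the first employee hired in the
-- department, plus the difference between the two.
select last_name, first_name, department_id, hire_date, salary,
       FIRST_VALUE(salary)
         over (partition by department_id order by hire_date) first_sal_by_dept,
       salary - FIRST_VALUE(salary)
         over (partition by department_id order by hire_date) diff_from_first
  from employee
 order by department_id, hire_date, last_name, first_name;
```

Where two employees in a department share the earliest hire date, this ORDER BY alone does not determine which of them is treated as first.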

Contrast the query results in Listing 4 with the results in Listing 5. The query in Listing 5 uses the
LAST_VALUE analytic function, but uses it incorrectly. You cannot simply swap the
LAST_VALUE analytic function for the FIRST_VALUE analytic function and expect the results to
return the last value per window. Recall that the default behavior of an ORDER BY clause in a
partition without an accompanying windowing clause is to make the default window a sliding view
that operates on the current row and all preceding rows. In Listing 5, then, the value returned from
the call to the LAST_VALUE function is the current row's salary value or, where several rows share
the same hire date, the salary value of one of those rows. To
make the call to the LAST_VALUE function more meaningful, you must add a windowing clause
to the ORDER BY clause in the partition, as shown in Listing 6.
Code Listing 5: Fail to obtain the last value per window, because of incorrect use of
LAST_VALUE
2 LAST_VALUE(salary)
3 over (partition by department_id order by hire_date) last_sal_by_dept
4 from employee
LAST_SAL_BY_DEPT


100000
80000
70000
70000
70000
90000
65000
300000
70000
75000
11 rows selected.

The query in Listing 6 displays the employee records that fall within each department partition,
sorted by date of hire, along with the last salary value within each partition and its associated last
name value. The query specifies WHERE SALARY IS NOT NULL, because a NULL salary value
would not provide a useful comparison. Because the salary value for the employee Lori Dovichi is
NULL, her record is not included in the partition for Department 10. All other records in that
partition display the last name and salary values for Matthew Michaels as the last employee record
that falls within the partition. The values for LAST_EMP and LAST_SAL for the employee Frances
Newton are the same as those for her employee record, because no other employee records have a
NULL value for their Department ID. You can use the IGNORE NULLS extension to eliminate NULLs
from consideration in your LAST_VALUE analytic function call if you want to include all records,
regardless of the presence of NULL values. To do this, change the call to LAST_VALUE (salary)
to LAST_VALUE (salary IGNORE NULLS).
Code Listing 6: Display the last value per window through correct use of LAST_VALUE
SQL> select last_name, first_name, department_id dept_id, hire_date, salary,
2 LAST_VALUE(last_name)
3 over (partition by department_id order by hire_date
4 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) last_emp,
5 LAST_VALUE(salary)
6 over (partition by department_id order by hire_date
7 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) last_sal
8 from employee
9 where salary is not null
10 order by department_id, hire_date, last_name, first_name;
LAST_NAME  FIRST_NAME    DEPT_ID HIRE_DATE     SALARY LAST_EMP     LAST_SAL
---------- ---------- ---------- --------- ---------- ---------- ----------
Eckhardt   Emily              10 07-JUL-04     100000 Michaels        70000
Newton     Donald             10 24-SEP-06      80000 Michaels        70000
Friedli    Roger              10 16-MAY-07      60000 Michaels        70000
James      Betsy              10 16-MAY-07      60000 Michaels        70000
Michaels   Matthew            10 16-MAY-07      70000 Michaels        70000
peterson   michael            20 03-NOV-08      90000 leblanc         65000
leblanc    mark               20 06-MAR-09      65000 leblanc         65000
Jeffrey    Thomas             30 27-FEB-10     300000 Wong            70000
Wong       Theresa            30 27-FEB-10      70000 Wong            70000
Newton     Frances               14-SEP-05      75000 Newton          75000

10 rows selected.
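
The IGNORE NULLS variation mentioned before Listing 6 might look like the following sketch. It drops the WHERE clause so that every employee record is returned. The window used here, spanning the whole partition, is an assumption chosen so that a row with a NULL salary can still see the last non-NULL salary in its department; it is not the window used in Listing 6.

```sql
-- Keep every employee row, but skip NULL salaries when picking the last value.
-- The window covers the entire department partition (an illustrative choice).
select last_name, first_name, department_id dept_id, hire_date, salary,
       LAST_VALUE(salary IGNORE NULLS)
         over (partition by department_id order by hire_date
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_sal
  from employee
 order by department_id, hire_date, last_name, first_name;
```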

In the Lead and Lagging Behind
A common reporting requirement, for comparison purposes, is to access data not only from the
current row being reviewed but also from the rows that precede or follow the current row. Consider
the query in Listing 7. Using the LAG analytic function, you can obtain a side-by-side view of when
the current employee was hired, alongside the date on which the previous employee was hired, on a
per-department basis. For example, the record for the employee Donald Newton shows that the
employee hired before him was hired on 07-JUL-04. If you look at the record immediately
preceding the record for Donald Newton (the record for Emily Eckhardt), you can see that she
was indeed hired on 07-JUL-04.
Code Listing 7: Use the LAG analytic function to obtain row data preceding the current row
SQL> select last_name, first_name, department_id, hire_date,
2 LAG(hire_date, 1, null) over (partition by department_id
3 order by hire_date) prev_hire_date
4 from employee
LAST_NAME  FIRST_NAME DEPARTMENT_ID HIRE_DATE PREV_HIRE
---------- ---------- ------------- --------- ---------
Eckhardt   Emily                 10 07-JUL-04
Newton     Donald                10 24-SEP-06 07-JUL-04
Friedli    Roger                 10 16-MAY-07 24-SEP-06
James      Betsy                 10 16-MAY-07 16-MAY-07
Michaels   Matthew               10 16-MAY-07 16-MAY-07
Dovichi    Lori                  10 07-JUL-11 16-MAY-07
peterson   michael               20 03-NOV-08
leblanc    mark                  20 06-MAR-09 03-NOV-08
Jeffrey    Thomas                30 27-FEB-10
Wong       Theresa               30 27-FEB-10 27-FEB-10
Newton     Frances                  14-SEP-05

11 rows selected.

The syntax for the LAG analytic function is
LAG(column | expression, offset, default)

Offset is a positive integer that defaults to a value of 1. This parameter tells the LAG function how
many previous rows it should go back. A value of 1 means, "Look at the row immediately
preceding the current row within the current window." Default is the value you want to return if the
offset value (index) is out of range for the current window. For the first row in a group, the default
value will be returned.
The syntax for the LEAD analytic function is almost the same as that for the LAG analytic function,
with two differences:
The offset parameter tells the LEAD function how many rows after the current row it should
go forward.
For the last row in a group, the default value will be returned.
Consider the query in Listing 8. As Listing 8 shows, the LEAD analytic function looks at and
reports on the row directly following the current row. The value for the FOLLOWING_HIRE_DATE
column for the employee records that are listed last for each department is NULL, because there
are no further records in each department group. Similarly, every time a new department group is
displayed, the value for the PREV_HIRE_DATE column for the employee records listed first is
also NULL, because there are no previous records in the group.
Code Listing 8: Use LAG and LEAD to obtain row data preceding and following the current row
SQL> select last_name, first_name, department_id, hire_date,
2 LAG(hire_date, 1, null) over (partition by department_id
3 order by hire_date) prev_hire_date,
4 LEAD(hire_date, 1, null) over (partition by department_id
5 order by hire_date) following_hire_date
6 from employee
LAST_NAME  FIRST_NAME DEPARTMENT_ID HIRE_DATE PREV_HIRE FOLLOWING
---------- ---------- ------------- --------- --------- ---------
Eckhardt   Emily                 10 07-JUL-04           24-SEP-06
Newton     Donald                10 24-SEP-06 07-JUL-04 16-MAY-07
Friedli    Roger                 10 16-MAY-07 24-SEP-06 16-MAY-07
James      Betsy                 10 16-MAY-07 16-MAY-07 16-MAY-07
Michaels   Matthew               10 16-MAY-07 16-MAY-07 07-JUL-11
Dovichi    Lori                  10 07-JUL-11 16-MAY-07
peterson   michael               20 03-NOV-08           06-MAR-09
leblanc    mark                  20 06-MAR-09 03-NOV-08
Jeffrey    Thomas                30 27-FEB-10           27-FEB-10
Wong       Theresa               30 27-FEB-10 27-FEB-10
Newton     Frances                  14-SEP-05

11 rows selected.
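
Because subtracting one Oracle DATE value from another yields a number of days, the value returned by LAG can feed directly into a calculation. The sketch below answers the question of how many days passed between an employee's hire date and that of the previously hired employee in the same department. The DAYS_SINCE_PREV_HIRE alias and the final ORDER BY are illustrative assumptions.

```sql
-- Days elapsed since the previous hire in the same department.
-- The result is NULL for the first employee hired in each department,
-- because LAG returns its default (NULL) when there is no preceding row.
select last_name, first_name, department_id, hire_date,
       LAG(hire_date, 1, null) over (partition by department_id
                                     order by hire_date) prev_hire_date,
       hire_date - LAG(hire_date, 1, null)
                     over (partition by department_id
                           order by hire_date) days_since_prev_hire
  from employee
 order by department_id, hire_date, last_name, first_name;
```

If the stored hire dates carry a time component, the difference can include a fractional part of a day.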
Increasing Your Ratios
Business users often need to report on percentages. Sales amounts, overall costs, and annual
salaries are just some of the figures that are likely to require a percentage calculation. The query
in Listing 9 uses the RATIO_TO_REPORT analytic function to answer the question "What
percentage of the total annual salary allotment does each employee receive?" The syntax for the
RATIO_TO_REPORT analytic function is
RATIO_TO_REPORT( column | expression)

Code Listing 9: Use RATIO_TO_REPORT to obtain the percentage of salaries
round(RATIO_TO_REPORT(salary) over ()*100, 2) sal_percentage
2 from employee
SAL_PERCENTAGE


10.31
8.25
7.22
6.19
6.19
9.28
6.7
30.93
7.22
7.73
11 rows selected.

One nice feature of this analytic function is that it does the work of summing the
expression values that are used to obtain the resultant percentage values. You don't need an
additional aggregate function call. Note also that the analytic function call in this query example
uses the entire set of rows as its window, because it does not specify any ORDER BY clause or
additional windowing clauses. Compare the results from Listing 9 with those obtained from the
query in Listing 10. The query in Listing 10 adds a PARTITION clause to the OVER clause to
calculate the percentage of total departmental salaries each employee receives.
Code Listing 10: Use RATIO_TO_REPORT to obtain the percentage of salaries, by department
round(ratio_to_report(salary)
2 over(partition by department_id)*100, 2) sal_dept_pct
3 from employee
SAL_DEPT_PCT


27.03
21.62
18.92
16.22
16.22
58.06
41.94
81.08
18.92
100
11 rows selected.
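
To make the summing that RATIO_TO_REPORT performs more visible, the same percentage can be written out by hand with the SUM analytic function. The following sketch places both forms side by side; the select list and the SAL_DEPT_PCT_MANUAL alias are illustrative assumptions, and the two percentage columns are expected to match.

```sql
-- RATIO_TO_REPORT(salary) is equivalent to salary divided by the sum of
-- salary over the same window; here the window is the department partition.
select last_name, first_name, department_id, salary,
       round(RATIO_TO_REPORT(salary)
               over (partition by department_id) * 100, 2) sal_dept_pct,
       round(salary / SUM(salary)
               over (partition by department_id) * 100, 2) sal_dept_pct_manual
  from employee
 order by department_id, salary desc NULLS LAST;
```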

Conclusion
This article continued the discussion of analytic functions introduced in Part 10 of this series. It
demonstrated how you can use seven more of the most common analytic functions to manipulate
the way your results display. You've seen how to use the RANK and DENSE_RANK analytic
functions to obtain results for top-N queries and understand the differences between them. You've
learned how the FIRST_VALUE and LAST_VALUE analytic functions can be used in your reports
for data comparisons within groups. You also now know how LEAD and LAG can show you the
row values preceding and following your current row values to facilitate data comparisons. Those
reading the online version of this article have also learned how to obtain percentages within a
group with the RATIO_TO_REPORT analytic function.
In all cases, you can see that harnessing the power of these analytic functions greatly reduces the
need to write complicated SQL to obtain the same results. Review the documentation at
bit.ly/yWtbz1 and bit.ly/R4cZyq for more details. The next installment of SQL 101 will conclude the
discussion of analytic functions.
SQL 101: Pivotal Access to Your Data: Analytic Functions, Concluded
As Published In

July/August 2013
TECHNOLOGY: SQL 101

Pivotal Access to Your Data:
Analytic Functions, Concluded
By Melanie Caffrey

Part 11 in this series, "Leading Ranks and Lagging Percentages: Analytic Functions, Continued"
(Oracle Magazine, May/June 2013), continued the discussion of analytic functions that began in
Part 10. It demonstrated analytic functions that enable you to obtain results for top-N queries,
evaluate data comparisons, and calculate percentages within a group, among other actions. This
article wraps up the series' coverage of analytic functions by showing
How you can get a new perspective on your results with pivot queries that convert column
data into row data or row data into column data
How to employ an inline view to use an analytic function in a WHERE clause
To try out the examples in this series, you need access to an Oracle Database instance. If
necessary, download and install an Oracle Database edition for your operating system. I
recommend installing Oracle Database, Express Edition 11g Release 2. If you install the Oracle
Database software, choose the installation option that enables you to create and configure a
database. A new database, including sample user accounts and their associated schemas, will be
created for you. (Note that SQL_101 is the user account to use for the examples in this series; it's
also the schema in which you'll create database tables and other objects.) When the installation
process prompts you to specify schema passwords, enter and confirm passwords for SYS and
SYSTEM and make a note of them.
Turning Your Data on Its Side
A common business reporting requirement is that data in a column be displayed horizontally rather
than vertically, for better readability. For example, compare the result set in Listing 1 to the one in
Listing 2. The query in Listing 1 lists all employees alongside their respective department IDs,
sorted by department and employee name. The data returned in Listing 2 is the same as that
returned in Listing 1, but it is displayed differently.
Code Listing 1: Obtaining a list of employees, sorted by department and employee name
SQL> select department_id, last_name, first_name
2 from employee
DEPARTMENT_ID LAST_NAME  FIRST_NAME
------------- ---------- ----------
           10 Dovichi    Lori
           10 Eckhardt   Emily
           10 Friedli    Roger
           10 James      Betsy
           10 Michaels   Matthew
           10 Newton     Donald
           20 leblanc    mark
           20 peterson   michael
           30 Jeffrey    Thomas
           30 Wong       Theresa
              Newton     Frances

Code Listing 2: Employees, sorted by department and name, displaying one row per department
SQL> select department_id,
2 LISTAGG(first_name||' '||last_name, ', ')
3 WITHIN GROUP
4 (order by last_name, first_name) employees
5 from employee
6 group by department_id
7 order by department_id;
DEPARTMENT_ID EMPLOYEES

10 Lori Dovichi, Emily Eckhardt, Roger Friedli, Betsy James,
Matthew Michaels, Donald Newton
20 mark leblanc, michael peterson
30 Thomas Jeffrey, Theresa Wong
Frances Newton
4 rows selected.

Listing 2 uses the LISTAGG function (introduced in Oracle Database 11g) to construct a
comma-delimited list of employees per department, thereby pivoting the more traditionally
displayed result set in Listing 1. You can use LISTAGG as a single-group aggregate function, a
multigroup aggregate function, or an analytic function.
When LISTAGG is invoked as a single-group aggregate function, it operates on all rows that
satisfy any WHERE clause condition and, like all other single-group aggregate functions,
returns a single output row. The example in Listing 2 demonstrates the use of LISTAGG as a
multigroup aggregate function, returning a row for each group defined by the GROUP BY clause.
The syntax for the LISTAGG function is
LISTAGG ( column | expression,
delimiter ) WITHIN GROUP (ORDER BY column | expression)

LISTAGG performs as an analytic function if you add an OVER clause:
OVER (PARTITION BY column | expression)

The column or expression to be aggregated, the WITHIN GROUP keywords, and the ORDER BY
clause that immediately follows the WITHIN GROUP keywords (that is, the sort that takes place
within the grouping) are mandatory in all three LISTAGG use cases.
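
For comparison with the multigroup form in Listing 2, the single-group form might be sketched as follows. It has no GROUP BY clause, so it returns one row for all rows that satisfy the WHERE clause. The filter on Department 10 and the ALL_EMPLOYEES alias are illustrative assumptions.

```sql
-- LISTAGG as a single-group aggregate: no GROUP BY, one output row.
select LISTAGG(first_name||' '||last_name, ', ')
         WITHIN GROUP (order by last_name, first_name) all_employees
  from employee
 where department_id = 10;
```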
The query in Listing 3 uses LISTAGG as an analytic function. It obtains a list of salaries, from
highest to lowest, per department. Alongside each salary value is the name, in last-name/first-
name alphabetical order, of the employee who earns that salary value. In addition, every
employee for the current row's listed department is returned in salary order, from highest to
lowest, and in last-name/first-name alphabetical order.
Code Listing 3: Invoking the LISTAGG function as an analytic function
SQL> select department_id, salary, first_name||' '||last_name earned_by,
2 listagg(first_name||' '||last_name, ', ')
3 within group
4 (order by salary desc nulls last, last_name, first_name)
5 over (partition by department_id) employees
6 from employee
7 order by department_id, salary desc nulls last, last_name, first_name;
DEPARTMENT_ID     SALARY EARNED_BY        EMPLOYEES
------------- ---------- ---------------- --------------------------------
           10     100000 Emily Eckhardt   Emily Eckhardt, Donald Newton,
                                          Matthew Michaels, Roger Friedli,
                                          Betsy James, Lori Dovichi
           10      80000 Donald Newton    Emily Eckhardt, Donald Newton,
                                          Matthew Michaels, Roger Friedli,
                                          Betsy James, Lori Dovichi
           10      70000 Matthew Michaels Emily Eckhardt, Donald Newton,
                                          Matthew Michaels, Roger Friedli,
                                          Betsy James, Lori Dovichi
           10      60000 Roger Friedli    Emily Eckhardt, Donald Newton,
                                          Matthew Michaels, Roger Friedli,
                                          Betsy James, Lori Dovichi
           10      60000 Betsy James      Emily Eckhardt, Donald Newton,
                                          Matthew Michaels, Roger Friedli,
                                          Betsy James, Lori Dovichi
           10            Lori Dovichi     Emily Eckhardt, Donald Newton,
                                          Matthew Michaels, Roger Friedli,
                                          Betsy James, Lori Dovichi
           20      90000 michael peterson michael peterson, mark leblanc
           20      65000 mark leblanc     michael peterson, mark leblanc
           30     300000 Thomas Jeffrey   Thomas Jeffrey, Theresa Wong
           30      70000 Theresa Wong     Thomas Jeffrey, Theresa Wong
                   75000 Frances Newton   Frances Newton

11 rows selected.

Twisting and Turning into Fewer and Wider
A PIVOT clause enables you to turn rows into columns and present your data in a cross-tabular
format. The syntax of the PIVOT clause is
SELECT ... FROM ... PIVOT ( aggregate-function ( column | expression )
FOR column | expression to be pivoted IN (value1, ... valueN)
) AS alias

Compare the result sets in Listings 4 and 5. The query in Listing 4 displays a summary of each
department's total employee salary amount in a cross-tabular report. The query in Listing 5 returns
the same departmental salary summaries as those in Listing 4 but in columnar format, which your
users might consider less readable.
Code Listing 4: Using the PIVOT function to obtain cross-tabular results
SQL> select *
2 from (select department_id, salary
3 from employee) total_department_sals
4 PIVOT (SUM(salary)
5 FOR department_id IN (10 AS Accounting, 20 AS Payroll, 30 AS IT,
6 NULL AS Unassigned_Department));
ACCOUNTING PAYROLL IT UNASSIGNED_DEPARTMENT

370000 155000 370000 75000
1 row selected.

Code Listing 5: Traditional columnar display of summarized salaries, grouped by department
SQL> select department_id, sum(salary)
2 from employee
3 group by department_id
4 order by department_id nulls last;
DEPARTMENT_ID SUM(SALARY)

10 370000
20 155000
30 370000
75000
4 rows selected.
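
One way to picture what the PIVOT clause in Listing 4 does is to write the same cross-tabular result by hand with conditional aggregation, which is a different technique from PIVOT and is shown here only for illustration. Each output column sums the salaries of one department, and the column aliases mirror those used in Listing 4.

```sql
-- Hand-written equivalent of the Listing 4 pivot: one SUM per department.
-- A CASE expression without ELSE returns NULL, which SUM ignores.
select SUM(case when department_id = 10 then salary end) accounting,
       SUM(case when department_id = 20 then salary end) payroll,
       SUM(case when department_id = 30 then salary end) it,
       SUM(case when department_id is null then salary end) unassigned_department
  from employee;
```

The PIVOT form avoids repeating the aggregate for every pivot value, which becomes more noticeable as the IN list grows.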

The query in Listing 6 demonstrates that it's possible to pivot on more than one column. The
results from this query display the sum total of salaries per department, but only for employees
who were hired in a particular year. You can also pivot on and display multiple aggregate values,
as Listing 7 demonstrates. The query in Listing 7 obtains the sum of all salaries, alongside the
latest date of hire for an employee, per department.
Code Listing 6: Displaying the sum total salaries of employees per department for a particular
year
SQL> select *
2 from (select department_id,
3 to_char(trunc(hire_date, 'YYYY'), 'YYYY') hire_date, salary
4 from employee)
5 PIVOT (SUM(salary)
6 FOR (department_id, hire_date) IN
7 ((10, '2007') AS Accounting_2007,
8 (20, '2008') AS Payroll_2008,
9 (30, '2010') AS IT_2010
10 )
11 );
ACCOUNTING_2007 PAYROLL_2008 IT_2010

190000 90000 370000
1 row selected.
Code Listing 7: Pivoting on and displaying multiple aggregate columns
SQL> select *
2 from (select department_id, hire_date, salary
3 from employee)
4 PIVOT (SUM(salary) AS sals,
5 MAX(hire_date) AS latest_hire
6 FOR department_id IN (10, 20, 30, NULL));
   10_SALS 10_LATEST    20_SALS 20_LATEST    30_SALS 30_LATEST  NULL_SALS NULL_LATE
---------- --------- ---------- --------- ---------- --------- ---------- ---------
    370000 07-JUL-11     155000 06-MAR-09     370000 27-FEB-10      75000 14-SEP-05

1 row selected.

When you use multiple aggregate functions, it's advisable to supply an alias for each of them. The
resultant column headings are a concatenation of the pivot values (or pivot aliases), an
underscore, and (if you've supplied them) the aliases of the aggregate functions. For example,
some of Listing 7's columns are 10_SALS and 10_LATEST. Note that the columns for the latest
hire dates per department, such as 10_LATEST, are actually columns using the LATEST_HIRE
alias. When you prepend the LATEST_HIRE alias with the department ID, the query should return
a column that reads, for example, 10_LATEST_HIRE.
However, with SQL*Plus, the column heading displayed for a column with a DATE datatype is
never longer than the default format for the value returned. The HIRE_DATE column's format is
DD-MON-RR, so only the first nine characters of the heading are displayed. To display a full
heading, such as 10_LATEST_HIRE, consider using TO_CHAR to apply a date format mask to
the column.
If you don't supply an alias for your aggregate functions, you might get an error message, as
shown in the example in Listing 8. Because neither of the aggregate functions in Listing 8 is
aliased, the PIVOT operator doesn't know to which one to apply the column heading for the pivot
value (in this case, the DEPARTMENT_ID value). As a result, the PIVOT operator can't simply use
its default column headings and the query fails with a "column ambiguously defined" error
message. Avoid this error by creating an alias for each aggregate function; don't rely solely on the
default column headings that result from use of the PIVOT operation.
Code Listing 8: A column ambiguously defined error occurs
SQL> select *
2 from (select department_id, hire_date, salary
3 from employee)
4 PIVOT (SUM(salary),
5 MAX(hire_date)
6 FOR department_id IN (10, 20, 30, NULL));
select *
*
ERROR at line 1:
ORA-00918: column ambiguously defined

A Horizontal View of the Vertical
Just as you might have a reporting need to turn rows into columns, you might also need to turn
columns into rows. You've seen one way to do this with the LISTAGG function. You can also use
the UNPIVOT operator for this purpose. Note that the UNPIVOT operator does not reverse an
action performed with the PIVOT operator. Rather, it works on data that is already stored as
pivoted.
Consider the CREATE TABLE statement in Listing 9. It creates a table with pivoted data, using a
query similar to the one in Listing 7. Now you can query this data by using the UNPIVOT operator,
as Listing 10 illustrates. Compare the values returned from the query in Listing 10 with the values
returned from the query in Listing 7. As you can see, they are the same but are displayed
differently.
Code Listing 9: Creating a table with pivoted data
SQL> CREATE TABLE pivoted_emp_data AS
2 select *
3 from (select department_id, hire_date, salary
4 from employee)
5 PIVOT (SUM(salary) sum_sals,
6 MAX(hire_date) latest_hire
7 FOR department_id IN (10 AS Acc, 20 AS Pay, 30 AS IT, NULL));
Table created.

Code Listing 10: Using the UNPIVOT operator to turn columns into rows
SQL> select hire_date, salary
2 from pivoted_emp_data
3 UNPIVOT INCLUDE NULLS
4 ((hire_date, salary)
5 FOR department_id IN (
6 (acc_latest_hire, acc_sum_sals) AS 'Accounting',
7 (pay_latest_hire, pay_sum_sals) AS 'Payroll',
8 (it_latest_hire, it_sum_sals) AS 'IT',
9 (null_latest_hire, null_sum_sals) AS 'Unassigned'
10 ))
11 order by hire_date, salary;
HIRE_DATE SALARY

14-SEP-05 75000
06-MAR-09 155000
27-FEB-10 370000
07-JUL-11 370000
4 rows selected.
The results in Listing 7 are returned as one long record (row), with each HIRE_DATE and
SALARY combination, pivoted by department, displayed side by side. In contrast, each of these
combinations is returned as a separate and distinct row from the query in Listing 10, with the
HIRE_DATE and SALARY values displayed in separate columns. Note that the query in Listing 10
unpivots and returns value pairs of different datatypes. HIRE_DATE uses the DATE datatype, and
SALARY uses the NUMBER datatype, so the alias you use for these value pairs must be
enclosed in single quotation marks. If it is not, you might get an error message like the one shown
in Listing 11.
Code Listing 11: Using aliases for value pairs of different datatypes
SQL> select hire_date, salary
2 from pivoted_emp_data
3 UNPIVOT INCLUDE NULLS
4 ((hire_date, salary)
5 FOR department_id IN (
6 (acc_latest_hire, acc_sum_sals) AS Accounting,
7 (pay_latest_hire, pay_sum_sals) AS Payroll,
8 (it_latest_hire, it_sum_sals) AS IT,
9 (null_latest_hire, null_sum_sals) AS Unassigned
10 ))
11 order by hire_date, salary;
(acc_latest_hire, acc_sum_sals) AS Accounting,
*
ERROR at line 6:
ORA-56901: non-constant expression is not allowed for pivot|unpivot values
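
The quoted aliases in Listing 10 become visible in the output if the FOR column is added to the select list. The following sketch assumes the PIVOTED_EMP_DATA table created in Listing 9 and reuses the column names from Listing 10; only the select list and the ORDER BY differ.

```sql
-- Include the UNPIVOT label column so each row shows which department
-- its HIRE_DATE and SALARY pair came from.
select department_id, hire_date, salary
  from pivoted_emp_data
UNPIVOT INCLUDE NULLS
  ((hire_date, salary)
   FOR department_id IN (
     (acc_latest_hire,  acc_sum_sals)  AS 'Accounting',
     (pay_latest_hire,  pay_sum_sals)  AS 'Payroll',
     (it_latest_hire,   it_sum_sals)   AS 'IT',
     (null_latest_hire, null_sum_sals) AS 'Unassigned'
   ))
 order by department_id;
```

In this form DEPARTMENT_ID holds the quoted label for each value pair rather than the original numeric ID.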

When and How to Predicate Analytically
Other than the final ORDER BY clause, analytic functions are the last set of operations performed
in a query. Because they can appear only in the SELECT list or the ORDER BY clause, you
cannot use them directly in any predicates, including in a WHERE or HAVING clause. If you need
to select from a result set based on the outcome of applying an analytic function, you can use an
inline view. An inline view is a SELECT statement in the FROM clause of another SELECT
statement. The outer statement treats the inline view's result set as if it were a table. You have
already seen examples of inline view capability in this article in Listings 4, 6, 7, 8, and 9.
Code Listing 12: Using an inline view to enable use of an analytic function as a predicate
SQL> select *
2 from (select department_id, last_name||', '||first_name, salary,
3 dense_rank() over (partition by department_id
4 order by salary desc nulls last) d_rank
5 from employee)
6 where d_rank < 3
7 order by department_id, salary desc nulls last;
DEPARTMENT_ID LAST_NAME||','||FIRST_NAME     SALARY     D_RANK
------------- -------------------------- ---------- ----------
           10 Eckhardt, Emily                100000          1
           10 Newton, Donald                  80000          2
           20 peterson, michael               90000          1
           20 leblanc, mark                   65000          2
           30 Jeffrey, Thomas                300000          1
           30 Wong, Theresa                   70000          2
              Newton, Frances                 75000          1

7 rows selected.

Suppose you want to use an analytic function to obtain the top two salary earners by department.
As Listing 12 illustrates, you can place the analytic function operation in an inline view and alias it.
The alias given to the analytic function's result column in Listing 12 is D_RANK (named for the
result of applying the DENSE_RANK analytic function). The query in the inline view (the inner query) must
be resolved before it can be used by the query that encompasses it (the outer query). After the
inline view completes, the outer query can use its result in a predicate. The predicate clause in the
outer query for Listing 12 is
WHERE d_rank < 3
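
The contrast between the disallowed and the allowed form can be sketched as follows. The first statement is commented out because Oracle Database rejects an analytic function placed directly in a WHERE clause (typically with ORA-30483). The second is a variation on Listing 12 that keeps only the top earner per department; its select list is an illustrative assumption.

```sql
-- Not allowed: an analytic function cannot appear directly in WHERE.
-- select department_id, last_name, salary
--   from employee
--  where dense_rank() over (partition by department_id
--                           order by salary desc nulls last) < 3;

-- Allowed: compute the rank in an inline view, then filter on its alias.
select department_id, last_name, first_name, salary, d_rank
  from (select department_id, last_name, first_name, salary,
               dense_rank() over (partition by department_id
                                  order by salary desc nulls last) d_rank
          from employee)
 where d_rank = 1
 order by department_id;
```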

Striving to Perform Well
Although analytic functions help you form a more elegant and less convoluted SQL solution to a
reporting requirement, they are not a replacement for writing good code. Your goal should be to
constantly and consistently write good SQL that's easy to maintain and that will perform well over
time. It's all too easy to abuse SQL techniques that make processes easier. Used incorrectly, any
SQL technique can be written poorly and become a system inhibitor.
In particular, sorting and sifting data can exhaust system resources. (The query in Listing 13, for
example, includes three potential sort operations.) This shouldn't necessarily deter you from using
analytic functions, but keep in mind that you can write a query that brings a system to its knees just
as easily as you can write one that provides you with an efficient, elegant, and easy-to-maintain
solution. With the power of analytic functions comes responsibility.
Code Listing 13: Query with analytic functions that may cause system performance problems
SQL> select first_name||' '||last_name, department_id, hire_date,
2 sum(salary) over (order by department_id,
3 first_name||' '||last_name) sum_dept_emp,
4 avg(salary) over (order by hire_date, department_id)
avg_dept_hire_dt
5 from employee
6 order by department_id, hire_date, first_name||' '||last_name;
FIRST_NAME||''||LAST_NAME  DEPARTMENT_ID HIRE_DATE SUM_DEPT_EMP AVG_DEPT_HIRE_DT
Emily Eckhardt                        10 07-JUL-04       240000           100000
Donald Newton                         10 24-SEP-06       140000            85000
Betsy James                           10 16-MAY-07        60000       74166.6667
Matthew Michaels                      10 16-MAY-07       310000       74166.6667
Roger Friedli                         10 16-MAY-07       370000       74166.6667
Lori Dovichi                          10 07-JUL-11       240000            97000
michael peterson                      20 03-NOV-08       525000       76428.5714
mark leblanc                          20 06-MAR-09       435000            75000
Thomas Jeffrey                        30 27-FEB-10       895000           100000
Theresa Wong                          30 27-FEB-10       595000            97000
                                                                           87500

11 rows selected.
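One way to see where the potential sort operations in a query such as Listing 13 come from is to ask the optimizer for its execution plan before running the query against a large table. The following is a minimal sketch, assuming the same EMPLOYEE table used in Listing 13 and that your session can write to the standard plan table. The exact plan you see depends on your Oracle Database version, optimizer statistics, and data, so treat the operation names mentioned in the comments as things to look for, not as guaranteed output.

```sql
-- Sketch only: capture the optimizer's plan for the Listing 13 query
-- without actually executing the query.
EXPLAIN PLAN FOR
select first_name||' '||last_name, department_id, hire_date,
       sum(salary) over (order by department_id,
                         first_name||' '||last_name) sum_dept_emp,
       avg(salary) over (order by hire_date, department_id) avg_dept_hire_dt
  from employee
 order by department_id, hire_date, first_name||' '||last_name;

-- Display the plan that was just captured. Sort work commonly shows up
-- as operations such as WINDOW SORT (for an analytic function's ORDER BY)
-- and SORT ORDER BY (for the query's final ORDER BY).
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Each analytic function here orders the rows differently from the other and from the final ORDER BY clause, which is why the plan is worth checking: every distinct ordering is a candidate for its own sort step.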
Conclusion
This article concludes the discussion of analytic functions introduced in Part 10 and continued in
Part 11 of this series. It demonstrates how LISTAGG, PIVOT, and UNPIVOT can be used to
manipulate the way your data is displayed. You've seen how to turn columns into rows and rows
into columns and how such views differ from each other. You've learned how to take individual
data items and return them as delimited lists for more-readable reports. You're also now aware of
the specific caveats that apply when you use these functionalities.
Last but not least, you're aware now that queries that use analytic functions can, if you're not
careful, consume many, if not all, of your system resources. In all cases, analytic functions
greatly reduce the need to write complicated SQL to obtain the same results. But they are tools to
be wielded with equal parts of enthusiasm and caution. Review the documentation at bit.ly/yWtbz1
and bit.ly/R4cZyq for more details.
This article concludes the SQL 101 series. You've learned basic relational database concepts and
many SQL coding constructs, but the series has given you only a glimpse into what Oracle SQL
has to offer you. Be sure to continue to read the documentation and try existing and new Oracle
Database features. Thank you for being readers of Oracle Magazine and of the SQL 101 series.
As you continue writing SQL, my hope is that you enjoy it as much as I do.
