You must have seen sessions waiting on the event latch: cache buffers chains from time to
time. If you ever wondered what this means and how you can reduce time spent on it, read on.
Here you will learn how the buffer cache works, how Oracle multi-versioning works, how buffers
are allocated and deallocated, what a hash chain is and how buffers are linked to it, what the role
of the cache buffers chains latch is and why sessions wait on it, how to find the objects causing the
contention, and how to reduce the time spent on that event.
While exploring the reasons for the slowness in database sessions, you check the wait interface
and see the following output:
SQL> select state, event from v$session where sid = 123;
STATE   EVENT
------- ---------------------------
WAITING latch: cache buffers chains
This event is quite common, especially in applications that repeatedly scan a few blocks of
data. To resolve it, you should understand what the cache buffers chains latch is and why
sessions have to wait on it. To understand that, you must understand how the Oracle buffer cache
works. We will explore these one by one, and close with solutions for reducing cache
buffers chains latch waits.
This is the fifth in the series "100 Things You Probably Didn't Know About Oracle". If
you haven't already, I urge you to read the other parts, starting with
Part 1 (Commit does not force writing of buffers into the disk).
Consider an update statement that changes the NAME column to ROB for the record with
EMPNO = 1 in the table EMP. To execute it, the Oracle server process assigned to the session
performs the following actions:
1) locates the block that contains the record with EMPNO = 1
2) loads the block from the database file to an empty buffer in the buffer cache
3) if an empty buffer is not immediately found, keeps searching for one or forces the DBWn
process to write some dirty buffers to make room
4) updates the NAME column to ROB in the buffer
In step 1, we assume an index is present and hence the server process can locate the single block
immediately. If the index is not present, Oracle will need to load all the blocks of the table EMP
into the buffer cache and check for matching records one by one.
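The difference between the two lookup paths in step 1 can be sketched in a few lines of Python. This is a conceptual model only; the block layout and the dict-based "index" are illustrative assumptions, not Oracle structures:

```python
# Conceptual sketch: with an "index" (a dict mapping key -> block number),
# the server process locates the single block holding EMPNO = 1 directly;
# without it, every block of the table must be checked one by one.

blocks = {                      # block# -> rows stored in that block (made up)
    220: [{"EMPNO": 1, "NAME": "SAM"}, {"EMPNO": 2, "NAME": "KIM"}],
    221: [{"EMPNO": 3, "NAME": "LEE"}],
}
index = {1: 220, 2: 220, 3: 221}   # EMPNO -> block#

def find_with_index(empno):
    """One block visit: the index points straight at the block."""
    blk = index[empno]
    return blk, [r for r in blocks[blk] if r["EMPNO"] == empno][0]

def find_full_scan(empno):
    """Visits blocks one by one until the row is found."""
    visited = 0
    for blk, rows in blocks.items():
        visited += 1
        for r in rows:
            if r["EMPNO"] == empno:
                return visited, r
    return visited, None
```

With the index, EMPNO = 1 costs a single block visit; the full scan for EMPNO = 3 has to visit every block up to the match.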
The above description has two very important concepts:
1) a block, that is the smallest unit of storage in the database
2) a buffer, that is a placeholder in the buffer cache used to hold a block.
Buffers are just placeholders, which may or may not be occupied. They can hold exactly one
block at a time. Therefore for a typical database where the block size is set to 8KB, the buffers
are also of size 8KB. If you use multiple block sizes, e.g. 4KB or 16KB, you have to
define a separate buffer cache for each additional block size. In that case the buffer sizes
in each cache match the corresponding block size.
When buffers come to the cache, the server process must scan through them to get the value it
wants. In the example shown above, the server process must find the record where EMPNO=1.
To do so, it has to know the location of the blocks in the buffers. The process scans the buffers in
a sequence. So, buffers should ideally be placed in a sequence, e.g. 10 followed by 20, then 30,
etc. However, this creates a problem. What happens when, after this careful placement of buffers,
buffer #25 comes in? Since it falls between 20 and 30, it must be inserted in between, i.e. Oracle
must move all the buffers after 20 one step to the right to make room for the new buffer
#25. Moving memory areas around is not a good idea. It costs expensive CPU cycles,
requires all actions on the buffers (even reading) to stop for the duration, and is prone to errors.
Therefore, instead of moving the buffers around, a better approach is to put them in something
like a linked list. Fig 1 shows how that is done. Each of the buffers has two pointers: one to the
buffer ahead of it and one to the buffer behind it. In this figure, buffer 20 shows that 10 is ahead and 30 is
behind. This would be the case regardless of the actual position of the buffers. When 25
comes in, all we have to do is update the behind pointer of 20 and the ahead pointer of 30 to
point to 25. Similarly, the ahead pointer and behind pointer of 25 are updated to point to 20
and 30 respectively. This simple update is much quicker, does not require activity to stop on any
buffers except the ones being updated, and is less error-prone.
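The pointer updates described above can be sketched as a toy doubly linked list in Python. The ahead/behind names follow the article's figure, not Oracle's internal structures:

```python
# Minimal sketch: linking buffer 25 in between 20 and 30 by updating a few
# pointers, without moving anything in memory.

class Buf:
    def __init__(self, n):
        self.n, self.ahead, self.behind = n, None, None

def link_between(new, left, right):
    """Insert 'new' after 'left' and before 'right' with four pointer updates."""
    left.behind = new                  # 20's behind pointer now points to 25
    right.ahead = new                  # 30's ahead pointer now points to 25
    new.ahead, new.behind = left, right  # 25 points ahead to 20, behind to 30

# Build the initial chain 10 -> 20 -> 30
b10, b20, b30 = Buf(10), Buf(20), Buf(30)
b10.behind, b20.ahead = b20, b10
b20.behind, b30.ahead = b30, b20

b25 = Buf(25)
link_between(b25, b20, b30)   # only two existing buffers are touched
```

Walking the chain from buffer 10 via the behind pointers now yields 10, 20, 25, 30 without any buffer having physically moved.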
However, there is another problem. This is just one of the lists. Buffers are used for other
purposes as well. For instance, the LRU algorithm needs a list of buffers in LRU order, the
DBWn process needs a list of buffers for writing to the disk, etc. So, physically placing the
buffers in several lists at the same time is not just impractical, it's impossible. Oracle employs a
simpler technique to overcome the obstacle. Rather than placing the actual buffers in the linked
list, Oracle creates a much lighter structure called a buffer header, which acts as a pointer to an actual
buffer. This buffer header is what is linked and moved around, leaving the actual buffer in place. This way, a buffer
header can appear in many types of lists at the same time. These buffer headers are located in
the shared pool, not the buffer cache. This is why you will find references to buffers in the
shared pool.
Buffer Chains
The buffers are placed in strings. Compare that to rows of spots in a parking lot. Cars come into
an empty spot in a row. If they don't find one, they go to the next row, and so on. Similarly
buffers are located on the cache as rows. However, unlike the parking spots which are physically
located next to each other, the buffers are logically placed as a sequence in the form of a linked
list, described in the above section. Each linked list of buffers is known as a buffer chain, as
shown in Fig 2.
Notice how each of the three chains has a different number of buffers. This is quite normal.
Buffers are occupied only when some server process loads a block into them. Otherwise
the buffers are free and not linked to anything. When buffers are freed up, perhaps because
some process such as DBWn writes their contents to the disk, they are removed from the list, a
process known as unlinking from the chain. So, in a normal database, buffers will be constantly
linked to and unlinked from a chain, making the chain longer or shorter depending on the
frequency of either activity. The number of buffer chains is determined by the hidden database
parameter _db_block_hash_buckets, which is automatically calculated from the size of the
buffer cache.
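The linking and unlinking just described can be pictured with a toy hash-bucket model. The bucket count and the modulo hash here are illustrative stand-ins for _db_block_hash_buckets and Oracle's real hash function:

```python
# Sketch of buffer chains as hash buckets: a buffer is linked to a chain when
# its block is read in, and unlinked when the buffer is freed (e.g. after
# DBWn writes it to disk). Chain lengths vary naturally as a result.

N_CHAINS = 4
chains = {i: [] for i in range(N_CHAINS)}

def chain_of(dba):
    """Toy hash: map a data block address to a chain number."""
    return dba % N_CHAINS

def link(dba):
    chains[chain_of(dba)].append(dba)    # block read in: buffer joins a chain

def unlink(dba):
    chains[chain_of(dba)].remove(dba)    # buffer freed: removed from its chain

for dba in (100, 101, 102, 104, 108):
    link(dba)
unlink(104)   # e.g. DBWn wrote this buffer out
```

After these operations the chains have different lengths, mirroring the unequal chains of Fig 2.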
When a server process wants to access a specific buffer in the cache, it starts at the head of the
chain and goes on to inspect each buffer in sequence until it finds what it needs. This is called
walking the chain. You might be wondering about a nagging question here: when a buffer
comes to the cache, who decides which of the three chains it should be linked to, and how? A
corollary to that is the challenge the server process faces in trying to find a specific buffer in
the cache. How does the process know which chain to walk? If it always starts at chain 1, it
will take an extraordinary amount of time to locate the block. Typical buffer caches are huge, so
the number of chains may run into the tens of thousands, if not hundreds of thousands. Searching all the chains is
not practical. On the other hand, maintaining a memory table showing which blocks are located
in which buffers is not practical either, because maintaining that memory table
would be time-consuming and would make the process sequential. Several processes couldn't read chains in
parallel then.
Oracle solves the problem in a neat manner. Consider the parking lot example earlier. What if
you forget where you parked your car? Suppose after you come out of the mall, you find that all
the cars have been buried under a thick pile of snow making identification of any of the cars
impossible. So, you would have to start at the first car at the first row, dust off the snow from the
license plate, check for your car, move on to the next, and so on. Sounds like a lot of work,
doesn't it? So, to help forgetful drivers, the mall marks the rows with letter codes and asks the
drivers to park in the row matching the first letter of their last name. John Smith will need to park
in row S, and in row S only, even if row T or row R are completely empty. In that case, when
John returns to find his car and forgets where it is, he will know to definitely find it in row S.
That will be the domain of his search, much, much better than searching the entire parking lot.
Similarly, Oracle determines which specific chain a buffer should be linked to. Every block is
uniquely identified by a data block address (DBA). When the block comes to the buffer cache,
Oracle applies a hash function to determine the buffer chain number and places the block in a
buffer in that chain alone. Similarly, while looking up a specific buffer, Oracle applies the same
hash function to the DBA, instantly knows the chain on which the buffer will be found, and walks that
specific chain only. This makes accessing a buffer much easier compared to searching the entire
cache.
To find out the data block address, you need to first get the relative file# and block#. Here is an
example where I want to find out the blocks of the table named CBCTEST.
SQL> select
  2    col1,
  3    dbms_rowid.rowid_relative_fno(rowid) rfile#,
  4    dbms_rowid.rowid_block_number(rowid) block#
  5  from cbctest;

      COL1     RFILE#     BLOCK#
---------- ---------- ----------
         1          6        220
         2          6        220
         3          6        220
         4          6        221
         5          6        221
         6          6        221

6 rows selected.
From the output we see that there are 6 rows in this table and they are all located in two blocks in
a file with relative file# 6. The blocks are 220 and 221. Using this, we can get the data block
address. To get the DBA of the block 220:
SQL> select dbms_utility.make_data_block_address(6,220) from dual;

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(6,220)
-------------------------------------------
                                   25166044
The output shows the DBA of that block is 25166044. If there are three chains, we could apply a
modulo function that returns the remainder after dividing the input by 3:

SQL> select mod(25166044,3) from dual;

MOD(25166044,3)
---------------
              1

So, we will put it in chain #1 (assuming there are three chains, numbered starting at 0).
The other block of that table, block# 221 will end up in chain #2:
SQL> select dbms_utility.make_data_block_address(6,221) from dual;

DBMS_UTILITY.MAKE_DATA_BLOCK_ADDRESS(6,221)
-------------------------------------------
                                   25166045

SQL> select mod(25166045,3) from dual;

MOD(25166045,3)
---------------
              2
And so on. Conversely, if we are given a DBA, we can apply the mod() function and the output
shows the chain on which it can be found. Oracle does not use the exact mod() function shown here,
but a more sophisticated hash function. The exact mechanics of the function are not important; the
concept is similar. Oracle can identify the exact chain a buffer needs to go to by applying a
hash function to the DBA of the buffer.
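You can reproduce the arithmetic above: a DBA packs the relative file# into the bits above a 22-bit block number, which is effectively what dbms_utility.make_data_block_address computes. Here is a short Python check; the mod-3 hash is this article's simplification, not Oracle's actual hash function:

```python
# Reproduces the article's numbers: the DBA places the relative file# in the
# bits above the 22-bit block number. The mod-3 "hash" is a simplification.

def make_dba(rfile, block):
    return (rfile << 22) | block

dba_220 = make_dba(6, 220)    # 25166044, matching dbms_utility's output
dba_221 = make_dba(6, 221)    # 25166045

print(dba_220 % 3, dba_221 % 3)   # chains 1 and 2, as computed above
```

Adjacent blocks in the same file get consecutive DBAs, so a simple hash tends to spread them across different chains.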
Multi-versioning of Buffers
Consider the update SQL statement shown in the beginning of the paper. When Oracle updates
the buffer that already exists in the buffer cache, it does not directly update it. Instead, it creates a
copy of the buffer and updates that copy. When a query selects data from the block as of a certain
SCN number, Oracle creates a copy of the buffer as of the point in time of interest and returns the
data from that copy. As you can see, there might be more than a single copy of the same block in
the buffer cache. While searching for a buffer the server process needs to search for the versions
of the buffer as well. This makes the buffer chain even longer.
To find out the specific buffers of a block, you can check the view V$BH (the buffer headers).
The column OBJD is the object_id. (Actually it's the DATA_OBJECT_ID; in this case both are
the same, but that may not be true in all cases.) The columns of interest to us are FILE#, BLOCK#,
CLASS# and STATUS. To make it simpler to understand, we will use a decode() on the class# field
to show the type of the block. With that, here is our query:
select file#, block#,
decode(class#,
1,'data block',
2,'sort block',
3,'save undo block',
4,'segment header',
5,'save undo header',
6,'free list',
7,'extent map',
8,'1st level bmb',
9,'2nd level bmb',
10,'3rd level bmb',
11,'bitmap block',
12,'bitmap index block',
13,'file header block',
14,'unused',
15,'system undo header',
16,'system undo block',
17,'undo header',
18,'undo block')
class_type,
status
from v$bh
where objd = 99360
order by 1,2,3
/
     FILE#     BLOCK# CLASS_TYPE        STATUS
---------- ---------- ----------------- ------
         6        219 segment header    cr
         6        221 segment header    xcur
         6        222 data block        xcur
         6        220 data block        xcur

4 rows selected.
There are 4 buffers. In this example we have not restarted the cache, so there are two buffers for
the segment header. There is one buffer for each data block, 220 and 221. The status "xcur"
stands for Exclusive Current. It means that the buffer was acquired (or filled by a block)
with the intention of being modified. If the intention is merely to select, the status would
show CR (Consistent Read). In this case, since the rows were inserted, modifying the
buffer, the blocks were acquired in xcur mode. Now, from a different session, update a single row. For
easier identification I have used Sess2> as the prompt:
Sess2> update cbctest set col2 = 'Y' where col1 = 1;
1 row updated.
STATUS
------
cr
xcur
xcur
cr
xcur

5 rows selected.
There are 5 buffers now, up one from the previous four. Note there are two buffers for block ID
220. One CR and one xcur. Why two?
It's because when the update statement was issued, it would have modified the block. Instead of
modifying the existing buffer, Oracle creates a "copy" of the buffer and modifies that. This copy
is now XCUR status because it was acquired for the purpose of being modified. The previous
buffer of this block, which used to be xcur, is converted to "CR". There can't be more than one
XCUR buffer for a specific block, that's why it is exclusive. If someone wants to find out the
most recently updated buffer, it will just have to look for the copy with the XCUR status. All
others are marked CR.
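The xcur/CR rule just described can be modelled in a few lines. This is a conceptual sketch; the dictionaries stand in for buffer headers and are not Oracle structures:

```python
# Sketch of multi-versioning: an update does not modify the current buffer in
# place. It clones the buffer, the clone becomes the single "xcur" copy, and
# the previous current copy is demoted to "cr". Status names follow V$BH.

versions = [{"block": 220, "status": "xcur", "data": "original"}]

def update_block(versions, new_data):
    cur = next(v for v in versions if v["status"] == "xcur")
    cur["status"] = "cr"                  # old current copy demoted to CR
    versions.append({"block": 220, "status": "xcur", "data": new_data})

update_block(versions, "col2 = 'Y'")

xcur = [v for v in versions if v["status"] == "xcur"]
```

However many versions accumulate, exactly one copy of the block stays xcur; everything else is a CR copy.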
Now, from a third session, update a different row in the same block:
Sess3> update cbctest set col2 = 'Y' where col1 = 2;
1 row updated.
STATUS
------
xcur
cr
xcur
xcur
cr
cr
cr
cr
cr
Whoa! There are 9 buffers now. Block 220 alone accounts for 6 of them. The statement was
merely an update of a single row; why did Oracle create so many new buffers?
Again, the answer is CR processing. CR processing creates copies of the buffer and rolls
them back or forward to build the CR copy as of the correct SCN. This created the
additional CR copies. From one block you now have 6 buffers, several of them created purely
for read consistency. This is how Oracle creates multiple versions of the same block in the
buffer cache.
Latches
Now that you know how many buffers can be created and how they are located on the chains in
the buffer cache, consider another problem. What happens when two sessions want to
access buffers in the cache at the same time? There are several possibilities:
1) they want the very same buffer
2) they want different buffers, but the buffers are on the same chain
3) they want different buffers located on different chains
Possibility #3 is not an issue; but #2 will be. We don't allow two processes to walk the same chain at
the same time. So there needs to be some sort of mechanism that prevents other processes from
performing an action while another process is doing it. This is enabled by a mechanism called a
latch. A latch is a memory structure that processes compete to acquire. Whoever gets it is said to
hold the latch; all others must wait until the latch is available. In many respects it sounds like a
lock. The purpose is the same, to provide exclusive access to a resource, but locks have
queues: several processes waiting for a lock will get it, when the lock is released, in the same
sequence in which they started waiting. Latches, on the other hand, are not sequential. Whenever a latch
becomes available, every interested process jumps into the fray to capture it. Again, only one gets it;
the others must wait. A process first loops up to 2000 times, actively checking for the
availability of the latch. This is called spinning. After that, the process sleeps for 1 ms and then
retries. If not successful, it sleeps for 1 ms, then 2 ms, 2 ms, 4 ms, 4 ms, etc. until the latch is obtained.
The process is said to be in a sleep state in between.
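The spin-then-sleep behaviour can be sketched like this. It is a simulation only: try_get stands in for the atomic test-and-set a real latch uses, and the sleeps are recorded rather than actually performed:

```python
import itertools

def acquire(try_get, spin_limit=2000):
    """Return the list of simulated sleep durations (ms) before the latch is got."""
    sleeps = []
    for _ in range(spin_limit):          # spin phase: actively re-check the latch
        if try_get():
            return sleeps
    for i in itertools.count():          # sleep phase: 1, 1, 2, 2, 4, 4, ... ms
        sleeps.append(2 ** (i // 2))     # record the escalating sleep, then retry
        if try_get():
            return sleeps
```

With a try_get that succeeds only on the 2005th attempt, the process spins 2000 times, then sleeps 1, 1, 2, 2 and 4 ms before finally holding the latch; with an uncontended latch it returns immediately with no sleeps.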
So, latches are the mechanism for making sure that no two processes access the same chain at the
same time. This latch is known as the cache buffers chains (CBC) latch. There is one parent CBC latch and several
child CBC latches. However, latches consume memory and CPU, so Oracle does not create as
many child latches as there are chains. Instead, a single child latch may protect two or more chains,
as shown in Fig 3. The number of child latches is determined by the hidden parameter
_db_block_hash_latches.
Latches are identified by latch# and, in the case of child latches, child#. A specific latch
instance is identified by its address in memory (the latch address). To find the latch that
protects a specific buffer, get the file# and block# as shown earlier and issue this SQL:
select hladdr
from x$bh
where dbarfil = 6
and dbablk = 220;
Going back to CBC latches, let's see how you can find the correlation between chains and
latches. First, find the latch# of the CBC latch. The latch# may change from version to version or
across platforms, so it's a good idea to check for it.

select latch# from v$latch
where name = 'cache buffers chains';

    LATCH#
----------
       203
This is the parent latch. To find the child latches (the ones that protect the chains), you should
look into another view, V$LATCH_CHILDREN. To find out how many child latches there are:

SQL> select count(1) cnt from v$latch_children where latch# = 203;

       CNT
----------
     16384
If you check the values of the two hidden parameters explained earlier, you will see:
_db_block_hash_buckets 524288
_db_block_hash_latches 16384
The parameter _db_block_hash_buckets decides how many buffer chains there are, and the
parameter _db_block_hash_latches decides the number of CBC latches. Notice the
value 16384: it matches the count we got from V$LATCH_CHILDREN, confirming that it is indeed the
number of CBC latches.
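A quick sanity check on those two values shows how many chains each child latch covers, assuming, as Fig 3 suggests, that the buckets are distributed evenly across the latches:

```python
# Values reported by the hidden parameters above.
buckets = 524288     # _db_block_hash_buckets: number of buffer chains
latches = 16384      # _db_block_hash_latches: number of child CBC latches

chains_per_latch = buckets // latches
print(chains_per_latch)   # each child CBC latch protects 32 chains
```

So contention on a single child latch can be caused by hot buffers on any of the 32 chains it protects, not just one.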
Let's now jump into resolving CBC latch issues. Sessions suffering from CBC latch
waits will show up in V$SESSION. Suppose one such session is SID 366. To find the CBC
latch, check the P1, P1RAW and P1TEXT values in V$SESSION, as shown below:

select p1, p1raw, p1text
from v$session where sid = 366;

        P1 P1RAW            P1TEXT
---------- ---------------- -------
5553027696 000000014AFC7A70 address
P1TEXT clearly shows the description of the P1 column, i.e. the address of the latch. In this case
the address is 000000014AFC7A70. We can check the name of the latch and examine how many
times this latch has been requested by sessions but has been missed.
SQL> select gets, misses, sleeps, name
  2  from v$latch where addr = '000000014AFC7A70';

 GETS MISSES SLEEPS NAME
----- ------ ------ --------------------
49081     14     10 cache buffers chains

From the output we confirm that this is a CBC latch. It has been acquired 49,081 times; 14 times
it was missed and 10 times processes have gone to sleep waiting for it.
Next, identify the object whose buffer is so popular. Get the File# and Block# from the buffer
cache where the CBC latch is the latch address we identified to be the problem:
select dbarfil, dbablk, tch
from x$bh
where hladdr = '000000014AFC7A70';

DBARFIL DBABLK   TCH
------- ------ -----
      6    220 34523
The TCH column shows the touch count, i.e. how many times the buffer has been accessed, a
measure of its popularity and hence of how likely it is to be subject to CBC latch waits.
From the file# and block# we can get the object ID. The easiest way is to dump the block and get
the object ID from the dump file. Here is how you dump the above-mentioned block:
alter system dump datafile 6 block min 220 block max 220;
Get the object ID (the value after objn in the trace file). Using that value you can get the object
name from DBA_OBJECTS. Now you know the table whose blocks are so popular that they are causing CBC latch waits.
You can rewrite the code to select the data from the table into a collection using bulk collect
and then select from that collection rather than from the table. The SQL_ID column of
V$SESSION will show you which SQLs are causing the CBC latch waits, and getting to the object
shows you which specific object in that query is causing the problem, allowing you to devise a
better solution.
You can also proactively look for objects contributing to the CBC latch wait in the Active
Session History, as shown below:
select p1raw, count(*)
from v$active_session_history
where sample_time > sysdate - 1/24
and event = 'latch: cache buffers chains'
group by p1raw
order by 2 desc;
The P1RAW value shows the latch address, using which you can easily find the file# and block#:
select o.name, bh.dbarfil, bh.dbablk, bh.tch
from x$bh bh, sys.obj$ o
where o.dataobj# = bh.obj
and bh.hladdr = '000000014AFC7A70';
With the approach shown earlier, you can now get the object information from the file# and
block#. Once you know the objects contributing to the CBC latch waits, you can reduce the waits
by reducing the number of times the latch is requested. That is something you can do by making
the blocks of the table less popular. The fewer the rows in a block, the less popular the
block will be. You can reduce the number of rows in a block by increasing PCTFREE or by using
ALTER TABLE MINIMIZE RECORDS_PER_BLOCK. If that does not help, you can partition the table.
That forces the data block addresses to be recomputed for each partition, making it more likely that
the buffers will end up in different buffer chains and hence reducing competition for the same
chain.
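The effect of spreading rows over more blocks can be illustrated with a small simulation. The numbers are illustrative only, and a simplistic modulo stands in for Oracle's hash:

```python
# Sketch of why fewer rows per block reduces CBC latch contention: the same
# row accesses, spread over more blocks, hash to more chains, so fewer
# accesses pile up on the hottest chain.

def hottest_chain_hits(n_rows, rows_per_block, n_chains):
    hits = [0] * n_chains
    for row in range(n_rows):
        block = row // rows_per_block          # which block holds this row
        hits[block % n_chains] += 1            # simplistic block -> chain hash
    return max(hits)

dense = hottest_chain_hits(1000, 100, 64)   # 100 rows packed per block
sparse = hottest_chain_hits(1000, 10, 64)   # e.g. after raising PCTFREE
```

With 100 rows per block, all 1,000 accesses concentrate on 10 blocks and the hottest chain takes 100 hits; at 10 rows per block the hottest chain takes only 20.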
Conclusion
In this blog you learned how Oracle manages the buffer cache and how latches are used to ensure
only one process can walk the chain to access a buffer. This latch is known as Cache Buffer
Chain latch. You learned why this latch is obtained by Oracle and how to reduce the possibility
that two processes will want the latch at the same time. I hope this helps you understand and
resolve Cache Buffers Chains latch related waits. Your feedback will be highly appreciated.
Imagine a situation where you have to create two different schemas in the same database, both with the same
name. A typical example is Peoplesoft applications, which require a specific schema name, SYSADM,
that can't be changed. If you want to install two Peoplesoft applications in the same database, you will soon
discover that it's not possible, since you can't have two schemas named SYSADM in the same database. So, what
are your choices?
Well, you could create two different databases. In fact, prior to Oracle Database 12c that was your only choice. But
with two different databases come two different sets of overheads: two Oracle instances (the memory areas such
as the SGA and the processes such as pmon and smon) which consume memory and CPU cycles on the host. The more
databases you have, the more the CPU and memory usage - all because you want to create multiple schemas with the
same name.
Not any more, with the multi-tenancy option in Oracle Database 12c. Instead of creating a physical database for each
SYSADM schema you want, you can create a virtual database for each schema. Each virtual database behaves
like an independent database, but runs on top of a real, physical database which may be hidden from the end
users. These virtual databases are called containers. The physical database that houses these containers is, in
effect, a database of containers, and is known as a Container Database (CDB). You can pull out (or "unplug") a
container from one CDB and place it (or "plug" it) into another CDB. This is why a container is also known as
a Pluggable Database (PDB). For all practical purposes, from the perspective of the clients, the PDBs are just
regular databases.
Please note a very important point: it is NOT necessary that the database be created as a CDB with PDBs inside it.
You can also create a database exactly as it was in the prior versions (a non-CDB). The multi-tenancy option of
Oracle Database 12c is required to create multiple PDBs. That is an extra cost option; but there is no cost to create exactly one
PDB inside a CDB. Later in this article you will see how to create a database as a PDB. To find out whether the database
has been created as a CDB or not, just check the column called CDB in the view V$DATABASE.
You can check the containers (or PDBs) created in a database in a view named V$PDBS, which is brand new in
Oracle Database 12c.
select con_id, dbid, name
from v$pdbs;
    CON_ID       DBID NAME
---------- ---------- ------------------------------
         2 4050437773 PDB$SEED
         3 3315520345 PDB1
         4 3874438771 PDB2
         5 3924689769 PDB3
Note how the DBIDs are also different for each PDB. There are two striking oddities in this output:
There is no CON_ID of 1. The answer is simple: there is a special container, the "root" container
known as CDB$Root, that is created to hold the metadata. This container has the CON_ID of 1.
There is a PDB called PDB$SEED, which is something we didn't create. You will get the explanation of this
PDB later in the article.
There are new built-in functions to identify PDBs from their details without querying the V$PDBS view. Here is an
example how to identify the container ID from the name:
SQL> select con_name_to_id('PDB2') from dual;

CON_NAME_TO_ID('PDB2')
----------------------
                     4
And, here is how you can get the container ID from the DBID:
SQL> select con_dbid_to_id(3924689769) from dual;

CON_DBID_TO_ID(3924689769)
--------------------------
                         5
Operating on Specific PDBs
The next big question you may have is considering the unusual nature of the PDBs (they are virtual inside a real
database) how you can operate on a specific PDB. There are several approaches. Let's examine them one by one.
Session Variable. You can set a session variable called container to the name of the PDB you want to
operate on. First connect to the CDB as usual, for instance as a SYSDBA user, and then switch to the PDB:

SQL> alter session set container = pdb1;

Now all commands in this session will be executed in the context of the PDB called PDB1. For instance, if you
want to shut down the PDB named PDB1, you would issue:

SQL> shutdown immediate
Service Name. When you create a PDB, Oracle automatically adds it as a service in the listener. You can
confirm it by looking at the listener status:
The service "pdb1" actually points to the PDB called PDB1. It's very important to note that this is not a service
name set in the initialization parameter of the database, as you can see from the service_names parameter of the
database.
PDB1 =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = prosrv1.proligence.com)(PORT = 1521))
(CONNECT_DATA =
(SERVER = DEDICATED)
(SERVICE_NAME = PDB1)
)
)
Now you can connect to PDB1 using that connect string, e.g. sqlplus <username>@PDB1.
Using TWO_TASK. A third way is to define the TWO_TASK operating system variable to point to the PDB
you want to connect to, e.g. export TWO_TASK=PDB1, before invoking SQL*Plus.
DBCA Approach
1. Start DBCA.
5. In the next screen you will see a list of container databases (CDBs). Choose the one where you want the
PDB to be created. In this case we have only one CDB, called CONA.
6. In the next screen click on the option "Create a New Pluggable Database."
7. On the next screen you will have to answer some questions about the PDB you are about to create.
8. Enter the name PDB2, or something else you want to name it. Let's examine the options on the screen.
9. The PDB uses some storage exclusive to its own use, which is not part of the root container CDB$Root. You
will need to specify on the screen how you want that storage to be created. In this case I chose Oracle Managed Files
(OMF), which lets Oracle put the files in the proper location. I could have instead chosen to put these files in a
common location, which means I would have to remember to clean up the files later if I drop the PDB. The overall
storage occupied by the CDB is the sum of the root container CDB$Root, the seed PDB PDB$Seed and all the PDBs
contained in it.
10. I also checked the box called "Create a Default User Tablespace". Every PDB may contain its own USERS
tablespace that will be the default tablespace of its users if not explicitly specified. This is very useful if you want the
default tablespace to be different for each PDB. That way you can make sure that the users from one PDB do not
take over the space in a common tablespace.
11. You have to use a special user who can administer the PDB. In the above screen I used the name "syspdb2"
and entered its password.
12. After the PDB is created, you will see a confirmation message.
13. After the PDB creation, you may examine the alert log to see the various activities performed to create the
PDB.
Manual Approach
You don't have to fire up the DBCA interface. A simple SQL command does the trick. Connect to the CDB as
SYSDBA:
$ sqlplus sys/oracle as sysdba
SQL> create pluggable database pdb3
2 admin user syspdb3 identified by syspdb3
3 roles=(CONNECT,DBA)
4 /
Pluggable database created.
You will learn about the different clauses used here (admin user, roles, etc.) later. The PDB is created but not open
yet. You have to manually open it:
SQL> alter pluggable database pdb3 open;
Pluggable database altered.
Now that the database is open, you can connect to it using the different methods shown earlier. Note that you have to
be authenticated with SYSDBA role for this operation.
The column COMMON shows you if the user is common or not. From the output you know C##FINMASTER is
common while HRMASTER is not. You can also see that C##FINMASTER shows up in all containers while
HRMASTER shows up only in container 3, where it was originally created.
Although common users can be created in a CDB, there is little use of that in a real life application. Ordinarily you will
create local users in each PDB as required and that is what Oracle recommends.
Administration
So far you have learned how the PDBs are independent from each other, allowing you to create users with the
same names without proliferating the actual databases. The next important topic you are probably interested in
is how to manage this entire infrastructure. Since the PDBs are logically separate, it's quite conceivable that
separate DBAs are responsible for managing them. In that case, you want to make sure the privileges of these DBAs
fall within the context of the respective container and not outside of it.
Earlier you saw how to create a PDB. Here it is once again:
SQL> create pluggable database pdb3
2 admin user syspdb3 identified by syspdb3
3 roles=(CONNECT,DBA);
Note the clause "admin user syspdb3 identified by syspdb3". It means the PDB has a user called syspdb3, which is an
admin user. The next line, "roles=(CONNECT,DBA)", indicates that the user has the CONNECT and DBA roles. This
becomes the DBA user of the PDB. Let's verify that by connecting as that user and confirming that the roles have
been enabled:
[oracle@prosrv1 trace]$ sqlplus syspdb3/syspdb3@pdb3
...
SQL> select * from session_roles;
ROLE
------------------------------
CONNECT
DBA
PDB_DBA
... output truncated ...
Note that this is for this PDB alone; not in any other PDB. For instance, if you connect to PDB2, it will not work:
[oracle@prosrv1 ~]$ sqlplus syspdb3/syspdb3@pdb2
ERROR:
ORA-01017: invalid username/password; logon denied
Back in PDB3, since this user is a DBA, it can alter the parameters of the PDB as needed:
SQL> alter system set optimizer_index_cost_adj = 10;
System altered.
This is a very important point to consider here. The parameter changed here is applicable only to PDB3; not to any
other PDBs. Let's confirm that: In PDB2:
SQL> conn syspdb2/syspdb2@pdb2
Connected.
SQL> show parameter optimizer_index_cost_adj
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
optimizer_index_cost_adj             integer     100
However, in PDB3:
SQL> conn syspdb3/syspdb3@pdb3
Connected.
SQL> show parameter optimizer_index_cost_adj
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_index_cost_adj             integer     10
PDB2 has the old, unchanged value while PDB3 has the changed value. This is a very important property of PDBs:
you can change parameters in specific containers to suit the application. There is no need to force a common
parameter value on all the containers in the CDB. A classic example is the case of two containers, one for
production and the other for development. You may want to force a parameter value for performance reasons, either
as a permanent setting or just temporarily for some experiments. You can change the value in only one
container without affecting the others.
Note that not all the parameters can be modified in a PDB. A column ISPDB_MODIFIABLE in V$PARAMETER shows
whether the parameter can be modified in a PDB or not. Here is an example:
SQL> select name, ispdb_modifiable
2 from v$parameter
3 where name in (
4 'optimizer_index_cost_adj',
5 'audit_trail'
6* )
SQL> /
NAME                           ISPDB
------------------------------ -----
audit_trail                    FALSE
optimizer_index_cost_adj       TRUE
The audit_trail parameter applies to the entire CDB; you can't modify it for individual PDBs. That makes sense in many
ways: the audit trail belongs to the physical database, not a virtual one, so it is not modifiable for individual
PDBs. Similarly, parameters such as db_block_buffers, which apply to an Oracle instance, are non-modifiable as
well. A PDB doesn't have an instance, so such a parameter has no relevance in the PDB context and hence is
non-modifiable.
Additionally, you can also use any of the normal ALTER SYSTEM commands. A common example is identifying
errant sessions and killing them. First we identify the session from V$SESSION. However, since V$SESSION shows
background processes for the CDB as well, you need to trim the output down to the current PDB. To do that, get the
container ID and filter the output from V$SESSION using it.
SQL> show con_id
CON_ID
------------------------------
5
SQL> select username, sid, serial#
2 from v$session
3 where con_id = 5;
USERNAME                              SID    SERIAL#
------------------------------ ---------- ----------
SYSPDB3                                49      54303
C##FINMASTER                          280      13919
C##FINMASTER 280 13919
2 rows selected.
SQL> alter system kill session '280,13919';
System altered.
There is a special case for starting and shutting down PDBs. Remember, the PDBs themselves don't have an
instance (processes and memory areas) or controlfile and redo logs. These elements of an Oracle database
belong to the CDB, and shutting them down would shut down all the PDBs. Therefore there is no concept of an
instance shutdown for PDBs. When you shut down a PDB, all that happens is that the PDB is closed.
Similarly, the startup of a PDB merely opens it; the instance is already started, since that belongs to the CDB.
Let's see with an example.
[oracle@prosrv1 pluggable]$ sqlplus sys/oracle@pdb1 as sysdba
SQL*Plus: Release 12.1.0.1.0 Production on Sat Mar 9 14:51:38 2013
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Advanced Analytics
and Real Application Testing options
SQL> shutdown
Pluggable Database closed.
Here is the corresponding entry from alert log:
2013-03-09 14:51:50.022000 -05:00
ALTER PLUGGABLE DATABASE CLOSE
ALTER SYSTEM: Flushing buffer cache inst=0 container=3 local
Pluggable database PDB1 closed
Completed: ALTER PLUGGABLE DATABASE CLOSE
Adding Services in PDBs
Remember, Oracle automatically creates service names matching the names of the PDBs. This lets you connect to the
PDBs directly from clients using the SERVICE_NAME clause in the TNS connect string. However, occasionally
you may want to add services to the PDBs themselves. To do so, use the SRVCTL command with a special
parameter "-pdb" to indicate the PDB the service should be created in:
[oracle@prosrv1 ~]$ srvctl add service -db CONA -s SERV1 -pdb PDB1
If you want to check on the service SERV1, use:
[oracle@prosrv1 ~]$ srvctl config service -db CONA -s SERV1
Service name: SERV1
Service is enabled
Cardinality: SINGLETON
Disconnect: false
Service role: PRIMARY
Management policy: AUTOMATIC
DTP transaction: false
AQ HA notifications: false
Global: false
Commit Outcome: false
Failover type:
Failover method:
TAF failover retries:
TAF failover delay:
Connection Load Balancing Goal: LONG
Runtime Load Balancing Goal: NONE
TAF policy specification: NONE
Edition:
Pluggable database name: PDB1
Maximum lag time: ANY
SQL Translation Profile:
Retention: 86400 seconds
Replay Initiation Time: 300 seconds
1.
The first parameter is "shares", which determines how the CPU will be divided among the PDBs in case of a
crunch. So, if you have two PDBs - PDB1 and PDB2 - with shares of 1 and 2 respectively, it tells the DBRM that
PDB2 should get twice the amount of CPU consumed by PDB1. Note that DBRM kicks in only when there is
contention for CPU. If the total demand is less than 100%, everyone gets as much as they want; but if there is
contention, PDB2 is guaranteed 2/(1+2), i.e. 2/3rd of the available CPU. Since PDB1's share is only 1, it's
guaranteed 1/3rd of the CPU.
2.
The second parameter is "utilization_limit", which puts a ceiling on the CPU consumption of a specific PDB
even if there is spare CPU available in the CDB. This is specified as a percentage of the total CPU. This parameter
allows you to put a cap on the CPU consumption of a PDB for any reason.
3.
The third parameter is "parallel_server_limit", which limits the number of parallel query servers that can be
kicked off in the PDB. This is a percentage of the overall maximum parallel query servers in the CDB.
Let's see how to implement this with an example. Suppose we have three PDBs named PDB1, PDB2 and PDB3.
PDB1 hosts the most important applications. If there is a CPU contention, we want to give PDB1 50%, PDB2 25%
and PDB3 25% of the available CPU respectively. When there is plenty of CPU, we don't want to limit any CPU
consumption by PDB1, since it hosts critical apps; but we want to limit PDB2 and PDB3 so that they can't ever take
up more than 50% and 70% of the CPUs respectively. We also want to limit the parallel query servers to 50% and
70% of the value defined by parallel_max_servers.
To implement this structure, we will execute the following PL/SQL block:
begin
dbms_resource_manager.clear_pending_area();
dbms_resource_manager.create_pending_area();
-- create the CDB resource plan
dbms_resource_manager.create_cdb_plan(
plan => 'dayshift_cona_plan',
comment => 'cdb plan for cona'
);
-- give the limits in the plan for PDB1
dbms_resource_manager.create_cdb_plan_directive(
plan => 'dayshift_cona_plan',
pluggable_database => 'pdb1',
shares => 2,
utilization_limit => 100,
parallel_server_limit => 100
);
-- and, now the same for PDB2
dbms_resource_manager.create_cdb_plan_directive(
plan => 'dayshift_cona_plan',
pluggable_database => 'pdb2',
shares => 1,
utilization_limit => 50,
parallel_server_limit => 50
);
-- and now, PDB3
dbms_resource_manager.create_cdb_plan_directive(
plan => 'dayshift_cona_plan',
pluggable_database => 'pdb3',
shares => 1,
utilization_limit => 70,
parallel_server_limit => 70
);
dbms_resource_manager.validate_pending_area();
dbms_resource_manager.submit_pending_area();
end;
/
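The arithmetic behind the shares in the plan above can be sketched in a few lines. This is a hypothetical Python helper, not part of Oracle; it simply mirrors the share math described earlier (each PDB's guarantee under contention is its shares divided by the total shares):

```python
def cpu_guarantees(shares):
    """Return the fraction of CPU each PDB is guaranteed under contention.

    shares: dict mapping PDB name -> its 'shares' value in the CDB plan.
    """
    total = sum(shares.values())
    return {pdb: s / total for pdb, s in shares.items()}

# Shares from the dayshift_cona_plan above: PDB1=2, PDB2=1, PDB3=1
guarantees = cpu_guarantees({"pdb1": 2, "pdb2": 1, "pdb3": 1})
print(guarantees)  # pdb1 is guaranteed 50%, pdb2 and pdb3 25% each
```

Note that these guarantees apply only under contention; utilization_limit is a separate, unconditional cap.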
With the plan in place, you should now enable it for the CDB to start enforcing it, by setting the resource manager plan from the CDB root:
SQL> alter system set resource_manager_plan = 'dayshift_cona_plan';
In this example I used Oracle Managed Files (OMF), which allows the Oracle Database to determine the location of the
files. While it is a good practice, it's not absolutely necessary. If you use specific locations, and the locations differ on
the target PDB, you can use the file_name_convert clause to have the files copied to the desired location.
Cloning is primarily the way PDBs are created. But cloning needs a very important ingredient: the source PDB
that is cloned. In this example the source was PDB1. When you created your very first PDB, you didn't specify a source;
so where does Oracle get the files from?
But there is indeed a source. Do you remember seeing a PDB named PDB$SEED? Let's check the PDBs in our
CDB:
SQL> conn system/oracle
Connected.
SQL> select name
2 from v$pdbs;
NAME
------------------------------
PDB$SEED
PDB1
PDB2
PDB3
PDB4
5 rows selected.
Here you can see the newly created PDB, PDB4. The PDB named PDB$SEED is the "seed" container from which all
other containers are cloned. So when you create a fresh new PDB, with the syntax shown below:
create pluggable database pdb6
admin user syspdb6 identified by syspdb6;
it's actually cloned from PDB$SEED. This is a very important fact to remember. It means that if you want the database
to be created in a certain way, e.g. with a SYSTEM tablespace of a specific size, you can change that in the
seed PDB, and the new PDB will have the same value since Oracle simply copies the files from the seed to
the new PDB.
Transport
The idea of cloning is not limited to the same CDB, or even the same server. You can clone a PDB from a
different CDB or even a different host. You can also "move" a PDB from one CDB to another. For instance,
suppose you have an application running against a PDB on a host called serv1. To debug some issue in the app, the
developers want to point the test app at that database; but there is a little problem: the database is inside the
production firewall and the test app server can't connect to it. You are asked to create a copy of the database outside
the firewall. Normally, you would have resorted to backup and restore - a possible but time-consuming process, not to
mention the careful planning and additional work. But with PDBs, it's a breeze; you just "transport" the PDB to a
different CDB. Let's see an example where we transport a PDB called PDB4 to a different host.
Copy this file as well as the datafiles to the target server. You probably have a listing of the datafiles already;
if not, simply refer to the XML file - all the files are listed there. If the datafiles are on ASM, which they most likely
are, use the remote copy command of ASMCMD.
Here is the final installment in the series of blog posts I published over the last three days, about interesting
situations faced by John the DBA at Acme Bank. In the first post, you saw how John restored a
controlfile when autobackup was not enabled. In the second post you learned how John
discovered the DBID when someone forgot to record it. In this final installment you
will see what John does when the controlfile backup simply does not exist, or exists somewhere
but can't be found, rendering the previous tips useless.
This time, John had to recreate the controlfile from scratch. Let me reiterate, he had to recreate
the controlfile, using SQL; not restore it from somewhere. How did he do it? Following his own
"best practices", honed by years and years of managing Oracle databases, wise ol' John always
takes a backup of the controlfile to trace using this command:
alter database backup controlfile to trace as '/tmp/cont.sql' reuse;
This command produces a text file named cont.sql, which is invaluable in recreating the
controlfile. John runs the command as a cron job (on Unix; as a scheduled task on Windows) on his
database servers so that it gets executed every day, recreating the text file. The "reuse"
option at the end ensures the command overwrites the existing file, which means the text file
always contains fresh data from the database. Here is an excerpt from the beginning of
the generated file.
-- The following are current System-scope REDO Log Archival related
-- parameters and can be included in the database initialization file.
--
-- LOG_ARCHIVE_DEST=''
-- LOG_ARCHIVE_DUPLEX_DEST=''
--
-- ... output removed for brevity...
It is a very long file. John scrolls down to the section that shows the following information:
-------
Below are two sets of SQL statements, each of which creates a new
control file and uses it to open the database. The first set opens
the database with the NORESETLOGS option and should be used only if
the current versions of all online logs are available. The second
set opens the database with the RESETLOGS option and should be used
if online logs are unavailable.
--------
The appropriate set of statements can be copied from the trace into
a script file, edited as necessary, and executed when there is a
need to re-create the control file.
Set #1. NORESETLOGS case
The following commands will create a new control file and use it
As you can see, this file contains the complete syntax for creating the controlfile using the CREATE
CONTROLFILE command. More important, the command lists all the data files and
online redo logs of the database - invaluable information for creating the controlfile. John
creates a SQL script file called create_controlfile.sql where he puts the CREATE
CONTROLFILE SQL command. It's one long command spanning several lines. Here is how the file
looks (with lines removed in between for brevity). Remember, this is just one command; so
there is just one semicolon at the end:
CREATE CONTROLFILE REUSE DATABASE "PROQA3" NORESETLOGS ARCHIVELOG
MAXLOGFILES 32
... output removed for brevity ...
'+PROQA3DATA1/PROQA3/PROQA1_sysaux_03.dbf'
CHARACTER SET AL32UTF8
;
Then John extracts the following commands immediately following the CREATE
CONTROLFILE command from that above mentioned file and puts them on another file named
create_temp_tablespaces.sql:
-- Commands to add tempfiles to temporary tablespaces.
-- Online tempfiles have complete space information.
-- Other tempfiles may require adjustment.
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_01.dbf'
SIZE 31744M REUSE AUTOEXTEND OFF;
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_02.dbf'
SIZE 30720M REUSE AUTOEXTEND OFF;
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_03.dbf'
SIZE 30720M REUSE AUTOEXTEND OFF;
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_04.dbf'
SIZE 30720M REUSE AUTOEXTEND OFF;
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_05.dbf'
SIZE 30720M REUSE AUTOEXTEND OFF;
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_06.dbf'
SIZE 31744M REUSE AUTOEXTEND OFF;
ALTER TABLESPACE TEMP1 ADD TEMPFILE '+PROQA3DATA1/PROQA3/PROQA1_temp1_07.dbf'
SIZE 31744M REUSE AUTOEXTEND OFF;
-- End of tempfile additions.
With the preparations completed, John proceeds to the next steps. First, he starts up the instance with
the NOMOUNT option. He has to use NOMOUNT anyway, since the controlfile is missing:
startup nomount
This command brings up the instance only. Next, John creates the controlfile by executing the
file he created earlier, create_controlfile.sql. When the command succeeds, he gets the following
message:
Control file created.
Voila! The controlfile is now created from scratch, and the database is mounted
automatically. However, this newly created controlfile does not have any information
on backups, log sequence numbers, etc. It reads what it can from the datafile headers; but
the datafiles may have been checkpointed at points in the past. John has to bring them
forward as much as possible by performing recovery on the datafiles. From the SQL*Plus
prompt, he issues this statement:
SQL> recover database using backup controlfile;
ORA-00279: change 7822685456060 generated at 04/25/2014 17:11:38 needed for
thread 1
ORA-00289: suggestion : +PROQA3ARCH1
ORA-00280: change 7822685456060 for thread 1 is in sequence #3
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
It's important that John uses the "using backup controlfile" option. This controlfile is not the
current one, so the recovery process must know that. John carefully notes the SCN of the
archived log being asked for: 7,822,685,456,060. He has to provide an archived log that contains
changes with this SCN. To find it, he opens another SQL*Plus window, connects as
sysdba and gathers the archived log information:
col first_change# head "First SCN# in Archive" format 999,999,999,999,999
col name format a80
select first_change#, name
from v$archived_log
order by 1
/
Referring to this output, he sees that the latest archived log has the starting SCN# of
7,822,685,453,816, which is less than the SCN# being asked for. Therefore this archived log may
or may not contain the changes being asked by the recovery process. He decided to give that
archived log anyway. So he pastes the entire path of the archived log at the prompt:
+PROQA3ARCH1/PROQA3/archivelog/2014_04_25/thread_1_seq_2.330.845829419
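The comparison John performs by eye - find the latest archived log whose first SCN does not exceed the SCN being asked for - can be sketched as a small helper. This is hypothetical Python for illustration only; the SCNs are the ones from this story, and the log names are made up:

```python
def pick_archived_log(logs, needed_scn):
    """logs: list of (first_change#, name) tuples from v$archived_log.

    Returns the name of the latest log whose first SCN <= needed_scn,
    or None when no such log exists.
    """
    candidates = [(scn, name) for scn, name in logs if scn <= needed_scn]
    return max(candidates)[1] if candidates else None

logs = [
    (7822685440000, "thread_1_seq_1"),
    (7822685453816, "thread_1_seq_2"),  # latest archived log in the story
]
print(pick_archived_log(logs, 7822685456060))  # thread_1_seq_2
```

As in the story, the chosen log may still fall short of the needed SCN, because the changes may only exist in the online redo logs.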
Clearly, the archived log John supplied is not what the recovery process was looking
for. But that was the latest archived log; there is nothing after it. Remember, the changes could
also be in the online redo logs, which have not been archived yet. John has to make a
decision here. If the online redo logs are not available, he needs to end the recovery by
typing:
On the other hand, if the online redo logs are intact and available, he will need to just pass it to
the recovery process. He gathers the details on the online redo logs from the other SQL*Plus
window:
select sequence#, member
from v$log l, v$logfile f
where l.group# = f.group#;
From the first SQL*Plus window, John starts the recovery process again (the recovery process
ends when it does not get the file it expects) and this time he supplies the name of the online redo
log file:
SQL> recover database using backup controlfile;
Voila! The database does not need any other recovery. Since the online logfile contains the last
known change, Oracle knows that no further recovery is required and stops asking
for more changes. John has recovered all the changes made to the database; nothing was
lost. He proceeds to open the database:
alter database open resetlogs;
Resetlogs is necessary here because John used a controlfile that he created. Remember, this is a
complete recovery (nothing was lost), but the database must still be opened with resetlogs, which
restarts the log sequence at 1. From a different window, John opens the alert log of the database
and checks the output:
... previous output removed for brevity ...
The output shows that the online redo logs started at sequence #1. With the recovery now
complete, John creates the temporary tablespaces using the script he had created earlier,
create_temp_tablespaces.sql. Then he hands the database back to the users for normal processing.
Takeaways
What did you learn from this story and John? Here is a summary:
1. Always use a recovery catalog. This post assumes that you lost that catalog as well; but
now you see how difficult things are without it.
2. Always set the controlfile to autobackup. From the RMAN command prompt, issue
configure controlfile autobackup on. The default is OFF.
3. Always back up the RMAN logfile to tape or another location where it would be
available even if the main server with the database itself becomes inaccessible.
4. Always backup the controlfile to trace with a cron job that executes once a day and
updates the existing file.
5. If the controlfile backup is missing, check for the controlfile backup in the following
possible locations:
o snapshot controlfile
o backup taken in some location
6. Look for possible controlfile backups from RMAN log files.
7. If no backup of controlfile is available, create the controlfile from the trace you have
presumably created.
8. While recovering the database after creating the controlfile, always try giving the most
recent online redo logs as archived log names to achieve a complete recovery.
Thank you for reading. As always, I will appreciate your feedback. Tweet me at @ArupNanda,
or just post a comment here.
2. Set a tracefile identifier for easy identification of the trace file that will be generated.
SQL> alter session set tracefile_identifier = arup;
Session altered.
3. Dump the first few blocks of a datafile. A file of the SYSTEM tablespace works
perfectly; 10 blocks will do nicely.
SQL> alter system dump datafile
'+PROQA3DATA1/PROQA3/PROQA1_system_01.dbf' block min 1 block max 10;
System altered.
4. Check the trace file directory for a file with the term "ARUP" in it:
prolin1:/PROQA/orabase/diag/rdbms/PROQA3/PROQA31/trace>ls -l *ARUP*
-rw-r--r-- 1 oracle asmadmin 145611 Apr 24 21:17
PROQA31_ora_61079250_ARUP.trc
-rw-r--r-- 1 oracle asmadmin 146 Apr 24 21:17
PROQA31_ora_61079250_ARUP.trm
2014-04-24 21:17:16.957
SESSION ID:(937.3) 2014-04-24 21:17:16.957
CLIENT ID:() 2014-04-24 21:17:16.957
SERVICE NAME:() 2014-04-24 21:17:16.957
MODULE NAME:(sqlplus@prolin1 (TNS V1-V3)) 2014-04-24 21:17:16.957
ACTION NAME:() 2014-04-24 21:17:16.957
6. Note the DBID, prominently displayed in the trace file:
Db ID=2553024456
Without the controlfile, the recovery was stuck, even though all the valid pieces were there. It
was a rather alarming situation. Others would have panicked; but not John. As always, he
managed to resolve the situation by completing recovery. Interested to learn how? Read on.
Background
Since the controlfile was also damaged, the first task at hand was to restore it. To restore
the controlfile, John needs a very special piece of information: the DBID, the database identifier. This is not
something that is available until the database is at least mounted. In the unmounted state - which is where
the database is right now - John couldn't just go and get it from the database.
Fortunately, he follows a best practice: he records the DBID in a safe place.
These are the commands John used to restore the controlfile from the backup. They
assume the use of Data Domain Boost, the media management layer (MML) plugin for the Data
Domain backup appliance; but the approach applies to any MML - NetBackup, TSM, etc.
SQL> startup nomount;
RMAN> run {
2> allocate channel c1 type sbt_tape PARMS
'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNI
T=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3> set dbid = 2553024456;
4> restore controlfile from autobackup;
5> release channel c1;
6> }
using target database control file instead of recovery catalog
allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1
executing command: SET DBID
Starting restore at 22-APR-14
channel c1: looking for AUTOBACKUP on day: 20140422
channel c1: looking for AUTOBACKUP on day: 20140421
channel c1: looking for AUTOBACKUP on day: 20140420
channel c1: looking for AUTOBACKUP on day: 20140419
channel c1: looking for AUTOBACKUP on day: 20140418
channel c1: looking for AUTOBACKUP on day: 20140417
channel c1: looking for AUTOBACKUP on day: 20140416
channel c1: no AUTOBACKUP in 7 days found
released channel: c1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:08:25
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
So, RMAN couldn't locate the backup of the controlfile. John knew that by default RMAN
searches only 7 days of backups. Thinking that perhaps the controlfile somehow was not backed
up in the last seven days, he expanded the search to 20 days using the special parameter
maxdays, shown below:
RMAN> run {
2> allocate channel c1 type sbt_tape PARMS
'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNI
T=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3> send 'set username ddboostadmin password password servername
prolin1.proligence.com';
4> set dbid = 2553024456;
5> restore controlfile from autobackup maxdays 20;
6> release channel c1;
7> }
allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1
executing command: SET DBID
Starting restore at 22-APR-14
channel c1: looking for AUTOBACKUP on day: 20140422
channel c1: looking for AUTOBACKUP on day: 20140421
channel c1: looking for AUTOBACKUP on day: 20140420
channel c1: looking for AUTOBACKUP on day: 20140419
channel c1: looking for AUTOBACKUP on day: 20140418
channel c1: looking for AUTOBACKUP on day: 20140417
channel c1: looking for AUTOBACKUP on day: 20140416
channel c1: looking for AUTOBACKUP on day: 20140415
channel c1: looking for AUTOBACKUP on day: 20140414
channel c1: looking for AUTOBACKUP on day: 20140413
channel c1: looking for AUTOBACKUP on day: 20140412
channel c1: looking for AUTOBACKUP on day: 20140411
channel c1: looking for AUTOBACKUP on day: 20140410
channel c1: looking for AUTOBACKUP on day: 20140409
channel c1: looking for AUTOBACKUP on day: 20140408
channel c1: looking for AUTOBACKUP on day: 20140407
channel c1: looking for AUTOBACKUP on day: 20140406
channel c1: looking for AUTOBACKUP on day: 20140405
channel c1: looking for AUTOBACKUP on day: 20140404
channel c1: looking for AUTOBACKUP on day: 20140403
channel c1: no AUTOBACKUP in 20 days found
released channel: c1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:17:56
RMAN-06172: no AUTOBACKUP found or specified handle is not a valid copy or piece
No luck; it gave the same error. So, John concluded, it was not an issue with the absence of a
controlfile backup; something else was making the backup of the controlfile invisible. He did,
however, know that the controlfiles are backed up along with the regular backups. Without the
database in mounted mode, he couldn't find out the location of those controlfile backups. If this
database had been registered in a catalog, he could have gotten that information from the catalog; but
unfortunately, being a new database, it was not yet registered. That avenue was closed.
He did, however, follow another best practice: saving the RMAN log files. As a rule, he sends the
RMAN output logs to tape along with the backup. He retrieved the most recent backup log
and checked it for the names of the backup pieces. Here is an excerpt from the log:
... output truncated ...
channel c8: starting piece 1 at 21-APR-14
channel c8: finished piece 1 at 21-APR-14
piece handle=14p69u7q_1_1 tag=TAG20140421T141608 comment=API Version 2.0,MMS
Version 1.1.1.0
channel c8: backup set complete, elapsed time: 00:00:01
channel c5: finished piece 1 at 21-APR-14
piece handle=10p69rhb_1_1 tag=TAG20140421T141608 comment=API Version 2.0,MMS
Version 1.1.1.0
channel c5: backup set complete, elapsed time: 00:47:33
channel c6: finished piece 1 at 21-APR-14
... output truncated ...
Looking at the output, John notes the names of the backup pieces created, listed next to "piece
handle": 14p69u7q_1_1, 10p69rhb_1_1, etc. He still did not know exactly which one contained
the controlfile backup; but it was not difficult to try them one by one. He tried to get the
controlfile from the first backup piece, using the following command with a special
clause: restore controlfile from a specific location.
RMAN> run {
2> allocate channel c1 type sbt_tape PARMS
'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNI
T=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3> set dbid = 2553024456;
4> restore controlfile from '14p69u7q_1_1';
5> release channel c1;
6> }
allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1
executing command: SET DBID
Starting restore at 22-APR-14
channel c1: restoring control file
... output file names removed for brevity ...
It worked; the controlfile was restored! If it hadn't worked, John would have tried the other
backup pieces one by one until he hit the one with the controlfile backup.
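The try-them-one-by-one approach can be sketched as a simple loop. This is hypothetical Python for illustration; try_restore stands in for running the RMAN restore command against a piece and reporting whether it succeeded:

```python
def find_controlfile_piece(pieces, try_restore):
    """Try restoring the controlfile from each backup piece in turn;
    return the first piece for which the restore succeeds, else None."""
    for piece in pieces:
        if try_restore(piece):
            return piece
    return None

# Piece handles noted from the backup log
pieces = ["14p69u7q_1_1", "10p69rhb_1_1"]
found = find_controlfile_piece(pieces, lambda p: p == "14p69u7q_1_1")
print(found)  # 14p69u7q_1_1
```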
[Update April 27th, 2013: This tip came from a reader, Kamil Stawiarski of Poland, who is an
Oracle Certified Master (http://education.oracle.com/education/otn/KStawiarski.html). Thank
you, Kamil.] If the specific location of the controlfile backup is not known and the backup is on
disk, John could have used a trick to locate it using the RMAN duplicate command:
c:\> rman auxiliary /
Next, John used the following command (he chose the auxiliary database name ORCL
completely at random; any name would have been fine, as long as there is no real instance with
that name):
RMAN> duplicate database to orcl backup location='c:\temp\oraback';
Remember, he has no intention of actually duplicating; all he wants is the location of
the controlfile backup. He gets that from the above output:
restore clone primary controlfile from
'C:\TEMP\oraback\05P6PAN2_1_1.RMAN';
Now he knows the location of the controlfile backup. He presses Control-C to stop the process. With the
location known, he uses the command shown earlier, restore controlfile from 'location', to restore
the controlfile.
[Update Apr 27th, 2013: This tip came from Anuj Mohan, another reader, from the US. Excellent
tip, Anuj, and thank you for sharing.] When RMAN starts, it creates a snapshot controlfile, whose
default location is $ORACLE_HOME/dbs. The snapshot controlfile is usually named with a .f
extension.
With the controlfile restored, John mounted the database.
RMAN> alter database mount;
database mounted
The rest was easy; all he had to do was issue "restore database" and "recover database using
backup controlfile". The first thing John did after the database was mounted was check the
controlfile autobackup setting:
RMAN> show CONTROLFILE AUTOBACKUP;
RMAN configuration parameters for database with db_unique_name prodb3 are:
CONFIGURE CONTROLFILE AUTOBACKUP OFF; #default
Someone suggested that he could have tried to restore the controlfile from the TAG instead of the
actual backup piece. Had he attempted the restore from the TAG, he would have got a different
error:
RMAN> run {
2> allocate channel c1 type sbt_tape PARMS
'BLKSIZE=1048576,SBT_LIBRARY=/prodb/oradb/db1/lib/libddobk.so,ENV=(STORAGE_UNI
T=DDB01,BACKUP_HOST=prolin1.proligence.com,ORACLE_HOME=/prodb/oradb/db1)';
3> send 'set username ddboostadmin password password servername
prolin1.proligence.com';
4> set dbid = 2553024456;
5> restore controlfile from tag=TAG20140421T141608;
6> release channel c1;
7> }
allocated channel: c1
channel c1: SID=1045 device type=SBT_TAPE
channel c1: Data Domain Boost API
sent command to channel: c1
executing command: SET DBID
Starting restore at 22-APR-14
released channel: c1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of restore command at 04/22/2014 16:10:04
RMAN-06563: control file or SPFILE must be restored using FROM AUTOBACKUP
No Backup of Controlfile
Let's consider another scenario: there is no backup of the controlfile at all. Dreaded as it sounds, it's still
not the end of the world. John could create the controlfile from a special backup, which could have
been created by one of two commands:
SQL> alter database backup controlfile to '/tmp/cont.dbf';
Database altered.
The above command creates a copy of the controlfile with the data as of the time of the
command. The other command is:
SQL> alter database backup controlfile to trace as '/tmp/cont.sql';
This command creates a text file that you can use as a SQL statement (after some minor editing) to
create a controlfile. The major difference between the two approaches is that the first
produces a snapshot of the controlfile as of that time, along with all its data - the backups, the
archived logs, etc. The second approach creates a brand-new "blank" controlfile that is populated
when you bring the database up. John uses both options as a Plan B. In another post we will see how he
saved the day using these two special controlfile backups.
Takeaways
What did you learn from the story? Here are some key takeaways:
1. Always write down the DBID of all your databases somewhere. If you use a recovery
catalog, it's there; but it's good to note it down separately. This number does not change
unless you use NID utility; so recording once is enough.
2. Always configure controlfile autobackup. The default is OFF; make it ON.
3. Always save the backup log files. In a crunch, they yield valuable information otherwise
not available.
4. When a controlfile backup is not found, you can use the restore controlfile from 'location' syntax in
RMAN to pull the controlfile from that location. If that location does not have a
controlfile backup, don't worry; just try all available locations. One might contain what
you are looking for. You have nothing to lose and everything to gain.
5. Always use a script for this type of RMAN restore activity instead of typing at the prompt.
You will find changing data, e.g. various backup locations, easier and will make fewer mistakes
that way.
6. Always create a backup controlfile every day, even if you don't think you need it. You
may need it someday, and you will thank yourself when you do.
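Takeaways 2 and 6 are one-liners to put into practice; a sketch (the file paths are illustrative):

```sql
RMAN> configure controlfile autobackup on;

SQL> alter database backup controlfile to '/tmp/cont.dbf' reuse;
SQL> alter database backup controlfile to trace as '/tmp/cont.sql';
```

The reuse keyword lets the daily binary backup overwrite yesterday's copy at the same path.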
A System for Oracle Users and Privileges with Automatic Expiry Dates
Tired of tracking down all the users in the database to deactivate them when they
cease to exist, change roles, or fulfill their temporary need for the database? Or
tracking down privileges you granted to existing users at the end of their requested
period? The solution is to think out of the box - developing a system that allows you
to create a database user account with an expiration date. This fire-and-forget
method allows you to create users with the assurance that they will be expired
(locked or dropped) at the expiration date automatically, without your intervention.
Interested? Read on to see how I developed such a system--along with source code for you
to try.
Introduction
What is a database user? In my opinion, there are two kinds of users:
1. Permanent Residents - those who live in the database until there is no longer a
purpose for them. These are non-human users. Typical examples: admin
accounts (sys, system) and application schemas.
2. Human Users - these are accounts created for real human beings.
It's the second category that is subject to a lot of scrutiny from many sources: Payment Card Industry (PCI) mandates, the Health Insurance Portability and
Accountability Act (HIPAA), Sarbanes-Oxley (SOX), etc. All these mandates and
regulations have one thing in common - the need to identify and regulate the
human users. Common requirements in the mandates include: database accounts
should be removed when the users leave the organization, accounts should be revalidated
every so often (usually every 90 days), users should get only the privileges for which they
can justify a business need, and so on.
Concept
DBVisitor is a tool to create Oracle database user accounts with an expiration date.
A user in the Oracle database is permanent; there is no such thing as a temporary
user. Using the DBVisitor tool the DBA can create a visitor, which is a regular
database user but with a built-in expiration date (from as little as 5 minutes to as
much as needed) after which the user is either dropped or locked (the exact action
can be defined for each user specifically). This tool can also grant visitor privileges,
which are regular Oracle database privileges such as create table, select on
TableName, etc., with built-in expiration dates, after which the privilege is
automatically revoked. The expiration time can be extended for both the visitor and
the privilege. The tool keeps track of the creation, deletion, re-activation of the
users. The source code as well as all the scripts used in this tool can be downloaded
here.
Components
There are 7 major stored procedures in the tool. (Please note: I plan to have all
these in a single package in a later release.) Among them:
ADD_VISITOR
ADD_PRIVILEGE
EXPIRE_VISITORS
EXPIRE_VISITOR_PRIVS (to revoke the privileges at expiration, via a job)
SEND_REMINDER_EMAILS
UNLOCK_VISITOR
The actions are recorded in a table called DBVISITOR_EXPIRATION (for visitors) and in
DBVISITOR_PRIVS (for the privileges granted). Records in these tables are never deleted. When the
expiration date is extended, a new record is inserted and the old record updated, to
leave an audit trail which can be examined later.
How it Works
When a visitor is created by this tool, a record goes into the DBVISITOR_EXPIRATION
table with the expiry date. A job searches that table and, when it finds a visitor
whose expiration date is past, deactivates that visitor. The exact action of
deactivation could be DROP, i.e. the user is completely dropped; or LOCK, i.e. the user
is not dropped but its account is locked so it can't log in any more. The latter action
preserves any tables or other objects created by the user, but prevents the login.
The record is marked I (for Inactive). Active visitors are marked with A. The
same mechanism applies to privileges too, except that those records are located in
the table DBVISITOR_PRIVS.
When the expiration time is extended, DBVisitor creates a new record with the new
expiration date and status as A. The status of the old record is updated with the
flag X, for Extended. Similarly, when the account is unlocked, the status is shown
as U in the old record.
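The logic of that expiry job can be pictured with a sketch like the following; the actual EXPIRE_VISITORS procedure in the downloadable source may differ, and the column names are the ones described above:

```sql
-- Sketch only: deactivate visitors whose expiry date has passed
begin
  for v in (select dbuser, exp_process
              from dbvisitor_expiration
             where status = 'A'
               and expiry_dt < sysdate) loop
    if v.exp_process = 'DROP' then
      -- drop the user and all its objects
      execute immediate 'drop user ' || v.dbuser || ' cascade';
      update dbvisitor_expiration
         set status = 'I', dropped_dt = sysdate
       where dbuser = v.dbuser and status = 'A';
    else
      -- LOCK: keep the objects but prevent login
      execute immediate 'alter user ' || v.dbuser || ' account lock';
      update dbvisitor_expiration
         set status = 'I', locked_dt = sysdate
       where dbuser = v.dbuser and status = 'A';
    end if;
  end loop;
  commit;
end;
/
```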
Not all parameters to the stored procedures are mandatory. If not specified, they
are picked up from the defaults. Here is an example of adding a visitor:
SQL> @addv
Enter value for username: jsmith
Enter value for duration: 3
Enter value for dur_unit: hour
Enter value for role:
Enter value for password:
Enter value for expiration_process:
Enter value for email: john.smith@proligence.com
Enter value for comments:
There is a very important thing you should note here: we omitted entering some
fields, e.g. password, role, etc. These values are picked up from the default settings.
The default values are defined in the table DBVISITOR_PROPERTIES. At the end, an
email will go out to the visitor and you will see a small confirmation for the user
created:
* UserID         : JSMITH
* Email          : JOHN.SMITH@PROLIGENCE.COM
* Password       : changem3
* Expires in     : 3 HOUR
* Expiry Date    : 10/07/13 18:28:10
* Role           : VISITOR
* Expiry Process : DROP
And here is how you would grant a privilege, create table, to the visitor:
SQL> @addp
Enter value for usrname: jsmith
Enter value for privilege: create table
Enter value for duration: 2
Enter value for duration_unit: hours
* CREATE TABLE
* granted to JSMITH
* until 03/07/13 17:30:58
Note a very important point: we created the visitor for 3 hours but the privilege for
only 2 hours. This is allowed. If you need to add more privileges, just execute
addp.sql for each privilege. Do not give multiple privileges in the script.
Extension
When you need to extend the visit time or the privilege time, use extv.sql and
extp.sql respectively. You can extend the time only if the visitor or the privilege
being extended is active. Here is an example where you extend the visit time of
JSMITH by 2 more hours:
SQL> @extv
Enter value for username: jsmith
Enter value for extend_time: 2
Enter value for extend_dur: hours
Enter value for comments: to continue from earlier
*********************************************
*
* Expiration Date Change for JSMITH
* Old: 10/07/13 18:28:10
* New: 10/07/13 20:28:10
*
*********************************************
Updated.
Similarly, to extend the CREATE TABLE privilege to this user by 2 more hours, you
will need to execute the extp.sql script.
SQL> @extp
Enter value for username: jsmith
Enter value for priv_name: create table
Enter value for extend_time: 2
Enter value for extend_unit: hours
Enter value for comments:
*********************************************
*
* Expiration Date Change for JSMITH
* for CREATE TABLE
* Old 10/11/13 14:52:37
* New 10/11/13 16:52:37
*
*********************************************
Updated.
Reporting
To find out the visitors and their privileges, you can select from the tables
DBVISITOR_EXPIRATION and DBVISITOR_PRIVS. To make it easier, three scripts
have been provided:
selv.sql - shows the visitors you have created earlier, along with the
expiration dates. The expired visitors are also shown. The Status column shows
Active (A) or Inactive (I). If it shows X, then the visitor's time was extended.
Here is a sample report:
SQL> @selv
                  Expiry
DB User    Status Process Created on        Expires on        Locked on Dropped on        Changed on Misc
---------- ------ ------- ----------------- ----------------- --------- ----------------- ---------- ---------------------
JSMITH     A      DROP    09/30/13 09:50:30 09/30/13 12:50:30                                        Change Ticket 3456789
JOHN.SMITH@PROLIGENCE.COM
JOHNSMITH  I      DROP    09/29/13 21:59:24 09/29/13 23:59:24           09/29/13 23:59:48
ARUP@PROLIGENCE.COM
selp.sql - shows the privileges granted to the visitors, active or not.
Here is a sample report:
selxv.sql - shows the visitors who are expiring in the next <n> hours,
where <n> is something you supply.
Quick Reference
To add a visitor: addv.sql (default expiration: 2 hours)
To add a privilege: addp.sql (default expiration: 2 hours)
To deactivate a visitor now: update dbvisitor_expiration set expiry_dt = sysdate - 1/24/60 where dbuser = '<username>'; commit;
To revoke a privilege now, issue the above update against dbvisitor_privs.
Never delete records from these tables.
To extend the visit time: extv.sql
To extend the privilege time: extp.sql
To reduce the expiration (make it expire earlier): extv.sql and extp.sql, but use a negative number in the duration
To get reports on visitors: selv.sql
To get reports on privileges: selp.sql
To get the list of visitors expiring in the next <n> hours: selxv.sql
The column STATUS: A = Active, I = Inactive, X = the visitor or privilege was initially granted and then extended.
To unlock a visitor after it has been locked: unlock.sql
Important
You can extend the expiry only if the visitor or the privilege is active (status = A). If
the visitor has already expired, you can't extend it. You must re-add it (with the same
name).
Specification
Here are the descriptions of each of the procedures and tables.
Tables
DBVISITOR_EXPIRATION
This table holds the visitor information as part of the tool. A visitor is a
user in the database which has a built-in expiration date, after which the user is
either dropped or locked.
Column            Purpose
----------------  ----------------------------------------------------
DBUSER            The database user (visitor) name
STATUS            A (Active), I (Inactive), X (Extended), U (Unlocked)
CREATED_DT        When the visitor was created
EXPIRY_DT         When the visitor expires
EXP_PROCESS       The expiration action: LOCK or DROP
CHANGE_DT         When the record was last changed
REMINDER_SENT_DT  When the reminder email was sent
EMAIL             The Email ID
LOCKED_DT         When the account was locked
DROPPED_DT        When the account was dropped
COMMENTS          Free-format comments
DBVISITOR_PRIVS
This holds temporary privileges granted to the visitor users. The privileges have a
built-in expiration date after which they are revoked automatically.
Column     Purpose
---------  ----------------------------------------
DBUSER     The database user (visitor) name
STATUS     A (Active), I (Inactive), X (Extended)
EXPIRY_DT  When the privilege expires
GRANT_DT   When the privilege was granted
CHANGE_DT  When the record was last changed
REVOKE_DT  When the privilege was revoked
PRIV_NAME  The privilege granted
COMMENTS   Free-format comments
DBVISITOR_PROPERTIES
Part of the tool, this table stores the default values of various parameters used in the
DBVisitor tool.
Column  Purpose
------  --------------------------------
NAME    The name of the property
VALUE   The default value of the property
Procedures
ADD_VISITOR
Purpose : Adding a visitor user to the database which has a built-in expiration date
after which the database user account is either locked or dropped, based on
settings.
Usage: This accepts 8 parameters:
p_username = the username to be created. This is prefixed by a predefined
    prefix when the user is created. If omitted, the default is
    VISITOR<n>, where <n> is a unique number.
p_duration = the duration after which the user is expired
p_dur_unit = the unit in which the above parameter is mentioned. Valid values
    are DAY(S), HOUR(S) and MINUTE(S). Can't exceed 90 days.
p_role = the role granted to the visitor automatically
p_password = the password to be used for the user. This password is used only
    for the initial login; the user must change the password immediately.
p_exp_proc = how the account is to be expired, i.e. LOCK or DROP
p_email = the email ID of the user. For convenience you can specify SW
    for starwoodhotels.com; starwood, sw.com and star will work too. acn,
    acc and accenture will work for accenture.com.
p_comments = any free-format comments (up to 2000 chars) can be used.
All these parameters are optional. If omitted, the default values are picked up
from a table called DBVISITOR_PROPERTIES.
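Based on the parameter list above, a direct call (instead of the addv.sql wrapper) might look like this; the values are illustrative:

```sql
begin
  add_visitor(
    p_username => 'jsmith',
    p_duration => 3,
    p_dur_unit => 'HOURS',
    p_exp_proc => 'DROP',
    p_email    => 'john.smith@proligence.com'
  );
end;
/
```

The omitted parameters (p_role, p_password, p_comments) fall back to the defaults in DBVISITOR_PROPERTIES.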
ADD_PRIVILEGE
Purpose: Adding a database privilege (create session, select on tableName, etc.)
to a visitor user in the database, with a built-in expiration date after which the
privilege is revoked automatically.
Usage: The duration unit accepts the valid values DAY(S), HOUR(S) and MINUTE(S),
and can't exceed 90 days.
p_comments = any free-format comments (up to 2000 chars) can be used.
All these parameters, except user and privilege, are optional. If omitted,
the default values are picked up from a table called DBVISITOR_PROPERTIES.
There is no default for the p_comments parameter.
EXTEND_VISIT_TIME
Purpose: DBVisitor is a tool to create a user in the DB with a built-in expiration,
after which the database user account is either locked or dropped, based on
settings. This procedure is used to extend that expiration date.
Usage: The username passed in is the visitor whose time is to be extended.
p_extend_time = the duration by which the time is to be extended
p_extend_unit = the unit in which the above parameter is mentioned. Valid values
    are DAY(S), HOUR(S) and MINUTE(S). Can't exceed 90 days.
p_comments = any free-format comments (up to 2000 chars) can be used.
All these parameters are optional. If omitted, the default values are picked up
from a table called DBVISITOR_PROPERTIES.
EXTEND_PRIV_TIME
Purpose: To extend the expiration time for a database privilege (e.g. create
session) of a visitor user in the database, which has a built-in expiration date
after which the privilege is revoked automatically.
Usage: The duration unit accepts the valid values DAY(S), HOUR(S) and MINUTE(S),
and can't exceed 90 days.
p_comments = any free-format comments (up to 2000 chars) can be used.
All these parameters, except user and privilege, are optional. If omitted,
the default values are picked up from a table called DBVISITOR_PROPERTIES.
There is no default for the p_comments parameter.
UNLOCK_VISITOR
Purpose: This procedure is used to unlock an account that was locked earlier by the
tool after it expired. You can only unlock an account; it will not work if
the visitor was dropped. You can set the expiration time (from now) for the
newly unlocked account.
Usage: The new expiration time is counted from now.
p_extend_unit = the unit in which the above parameter is mentioned. Valid values
    are DAY(S), HOUR(S) and MINUTE(S). Can't exceed 90 days.
p_comments = any free-format comments (up to 2000 chars) can be used.
All these parameters are optional. If omitted, the default values are picked up
from a table called DBVISITOR_PROPERTIES.
SEND_REMINDER_EMAILS
Purpose: This stored procedure reads through dbvisitor_expiration and sends
reminder emails to the visitors whose accounts are expiring in the stated number of
days. The reminder is sent only once: the column reminder_sent_dt is populated, and
the reminder for that user is not sent again.
Usage:
EXPIRE_VISITOR_PRIVS
Purpose: This stored procedure reads through dbvisitor_privs and revokes the
privileges for which the expiration date is past.
Usage:
That line shows you when you last logged in successfully. The purpose of that little
output is to alert you about the last time you (the user ARUP) logged in, very similar
to the message you get after logging in to a Unix session. If you didn't log in at that
time, this message alerts you to a possible compromise of your account.
Suppression
What if you don't want to show this timestamp? Simply start SQL*Plus with the
-nologintime option:
$ sqlplus -nologintime arup/mypassword
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit
Production
With the Partitioning, OLAP, Advanced Analytics and Real Application
Testing options
The login time has been suppressed, going back to the old behavior.
Each option has its own pros and cons. Let's examine them and see if we can get the right fit for
our specific case.
Distributed Management
Under this model each component of Exadata is managed as an independent entity by a group
traditionally used to manage that type of infrastructure. For instance, the system admins would
manage the Linux OS, overseeing all aspects of it such as creation of users to applying the
patches and RPMs. The storage and database would be managed likewise by the specialist teams.
The benefit of this solution is its seeming simplicity - components are managed by their
respective specialists without a need for advanced training. The only need for training is for
storage, where the Exadata Storage Server commands are new and specific to Exadata.
While this approach seems like a no-brainer on the surface, it may not be so in reality. Exadata is not just
something patched up from these components; it is an engineered system. There is a huge
meaning behind that qualifier. These components are not designed to act alone; they are put
together to make the entire structure a better database machine. And, note the stress here - not an
application server, not a fileserver, not a mail server; not a general purpose server - but a
database machine alone. This means the individual components - the compute nodes, the storage
servers, the disks, the flashdisk cards and more - are tuned to achieve that overriding objective.
Any incremental tuning of a specific component has to be within the framework of the entire
frame; otherwise it may fail to produce the desired result or, worse, produce an undesirable one.
For instance, the disks where the database resides are attached to the storage cell servers, not the
database compute nodes. The cell servers, or Cells, run Oracle Enterprise Linux, which is very
similar to Red Hat Linux. Under this model of administration, the system admins
are responsible for managing the operating system. A system admin looks at the host and
determines that it is undertuned since the filesystem cache is very low. In a normal Linux
system, that would have been a correct observation; but in Exadata the database is in ASM, and a
filesystem cache is less important. On the other hand, the Cells need the memory to hold the
Storage Indexes on the disk contents. Configuring a large filesystem cache not only does nothing
to help the filesystem; it actually hurts performance by paging out the Storage Indexes.
This is just one example of how the engineered systems are closely interrelated. Assuming they
are separate and assigning multiple groups with different skillsets may not work effectively.
Database Machine Administrator
This leads to the other approach - making a single group responsible for the entire frame from
storage to the database. The single group would be able to understand the impact of the changes
in one component to the overall effectiveness of the rack and will be in a better position to plan
and manage. The single role that performs the management of Exadata is known as Database
Machine Administrator (DMA).
I can almost hear the questions firing off inside your brain. The most likely question probably is
whether it is even possible to have a single skillset that encompasses storage, system, database
and network.
Yes, it definitely is. Remember, the advantages of an engineered system do not stop at
carefully coordinated individual components. Another advantage is the reduced number of controls in those
components. There are fewer knobs to turn on each component of an Exadata system. Take for
instance the Operating System. There are two types of servers - the compute nodes and the cells.
In the cells, the activity performed by a system admin is severely limited - almost to the point of
being none. On the compute nodes, the activities are limited as well. The only allowable
activities are setting up users, setting up email relays, possibly setting up an NFS mount and
a handful more. This can easily be done by a non-expert. One does not have to be a System Admin
to manage the servers.
Consider storage, the other important component. Traditionally storage administrators perform
critical functions such as adding disks, carving out LUNs, managing replication for DR and so
on. These functions are irrelevant in Exadata. For instance, the disks are preallocated in Exadata,
the LUNs are created at installation time, and there is no storage replication since DR is by Data Guard,
which operates at the Oracle database level. One need not be a storage expert to perform the tasks in
Exadata. Additionally, Storage Admins are experts in a specific brand of storage, e.g. EMC
VMax or IBM XIV. In Exadata, the storage is different from all the other brands your storage
admins may be managing. They have to learn about the Exadata storage anyway; so why not
have someone else, specifically the DMA, learn it?
Consider the network. In Exadata the network components are very limited, since they serve only the
components inside the rack. This reduces the flexibility of the configuration compared to a
regular general-purpose network configuration. The special kind of hardware used in Exadata,
InfiniBand, requires some special skills which the network ops folks would have to learn anyway.
So, why not the DMAs instead of them? Besides, Oracle already provides a lot of tools to
manage this layer.
That leaves the most visible component - the database which is, after all, the heart and soul of
Exadata. This layer is amenable to a considerable degree of tuning and the depth of skills in this
layer is vital to managing Exadata effectively. Transferring the skills needed here to a non-DBA
group or individual is difficult, if not impossible. This makes the DBA group the most natural
choice for evolving into the DMA role after absorbing the relevant other skills. The other skills
are not necessarily at par with those of the administrators of the respective components. For instance, the
DMA does not need to be a full-scale Linux system admin; he or she just needs to know a few relevant
concepts, commands and tools to perform the job well. Network management in Exadata is a
fraction of the skills expected from a network admin. The storage management in cell servers is
new to any group; so the DMA will find it as easy as any other group would, if not easier.
By understanding the available knobs on all the constituent components of Exadata, the DMA
can be better prepared to be an effective administrator of the Exadata system; not by divvying up
the activities to individual groups which are generally autonomous. The advantages
are particularly seen when troubleshooting or patching Exadata. Hence, I submit here for your
consideration - a new role called DMA (Database Machine Administrator) for the management
of Exadata. The role should have the following skillsets:
60% Database Administration
20% Cell Administration
15% Linux Administration
5% Miscellaneous (Infiniband, network, etc.)
I have written an article series on Oracle Technology Network - Linux for Oracle DBAs. This 5-part
article series has all the commands and concepts the Oracle DBA should understand about
Linux. I have also written a 4-part article series - Commanding Exadata - for DBAs to learn the
20% cell administration. With these two, you will have everything you need to be a DMA.
Scroll down to the bottom of this page and click on "Collection of Some of My Very Popular
Web Articles" to locate all these articles and more.
Summary
In this blog entry, I argued for creating a single role to manage the Exadata system instead of
multiple groups managing individual parts. Here are the reasons in a nutshell:
1. Exadata is an engineered system where all the components play collaboratively instead of
as islands. Managing them separately may be ineffective and detrimental.
2. The support organizations for components such as systems, storage, DBA, etc. in an
organization are designed with a generic purpose in mind. Exadata is not generic. Its
management needs unprecedentedly close coordination among various groups, which may
be new to the organization and perhaps difficult to implement.
3. The needed skillsets are mostly database centric; other components have very little to
manage.
4. These other skills are easy to add to the DBA skills making the natural transition to the
DMA role.
Best of luck in becoming a DMA and implementing Exadata.
Primary Keys Guarantee Uniqueness? Think Again.
When you create a table with a primary key or a unique constraint, Oracle
automatically creates a unique index to ensure that the column does not contain
duplicate values; or so you have been told. It must be true, because
that is a fundamental tenet of an Oracle database, or for that matter, any
database.
Well, the other day I was checking a table. There is a primary key on the column
PriKey. Here are the rows:
I got two rows with the same value. The table does have a primary key on this
column and it is enforced. I can test it by inserting another record with the same
value - 1:
SQL> insert into TableName values (1, ...);
It errors with the ORA-00001: unique constraint violated error. The question is: why
are there two rows with duplicate values in a column that has an enforced primary
key, and why does it refuse to accept the very value that already violates the primary
key? It could be a great interview question to test your mettle; or just entertaining. Read on
for the answer.
Setup
Let's start with creating a table:
SQL> create table pktest1 (
  2    pk1   number,
  3    col2  varchar2(200)
  4  );
Table created.
Notice how I deliberately decided not to add a primary key now. Let's add
some records. Note that I inserted two records with the pk1 value of 1.
SQL> insert into pktest1 values (1,'One');
1 row created.
SQL> insert into pktest1 values (1,'Second One');
1 row created.
SQL> insert into pktest1 values (2,'Two');
1 row created.
SQL> commit;
Commit complete.
SQL> create index in_pktest1_01 on pktest1 (pk1);
Index created.
Note that I did not use a uniqueness clause, so the index is created as nonunique. I
can confirm that by checking the status and uniqueness of the index quickly:
SQL> select index_name, status, uniqueness
2 from user_indexes
3 where table_name = 'PKTEST1';
INDEX_NAME      STATUS   UNIQUENESS
--------------- -------- ----------
IN_PKTEST1_01   VALID    NONUNIQUE
SQL> alter table pktest1 add constraint in_pktest1_01 primary key (pk1);
alter table pktest1 add constraint in_pktest1_01 primary key (pk1)
*
ERROR at line 1:
ORA-02437: cannot validate (ARUP.IN_PKTEST1_01) - primary key violated
The constraint creation failed, as expected, since there are two rows with the same
value in the column. We would have to delete the offending row for this PK to be created. I
should have done that; but instead I used something like the following:
SQL> alter table pktest1 add constraint in_pktest1_01 primary key (pk1)
disable keep index;
Table altered.
The constraint was created, even with the duplicate values! So, how did Oracle
allow that? The statement succeeded because you created the constraint with a
disabled status. You can confirm that by checking the status of the constraint:
SQL> select constraint_name, status
2 from user_constraints
3 where table_name = 'PKTEST1';
CONSTRAINT_NAME   STATUS
----------------- --------
IN_PKTEST1_01     DISABLED
This is where you need to understand a very important attribute of key
enforcement in Oracle Database through an index. It's true that PK or UK
constraints are enforced through indexes, and the only reason a unique
index is created in that case is to enforce the uniqueness. But what if Oracle
already has an index on that column? In that case, Oracle decides to
repurpose that index for the primary key. That's what happened in this case.
But wait, the index we created was non-unique; how did Oracle use it to enforce the
PK? Well, the answer is simple: Oracle simply doesn't care. If an index exists, Oracle
just uses it, unique or not.
When you disable the constraint, the purpose of the index is also eliminated and it
is dropped. However, if you pre-created the index, it is not dropped. If the index was
created with the constraint definition, then the keep index clause preserves the
index. You can check that the index still exists even when the PK is gone.
SQL> select index_name, status
2 from user_indexes
3 where table_name = 'PKTEST1';
INDEX_NAME      STATUS
--------------- --------
IN_PKTEST1_01   VALID
Next, I enabled the constraint with the novalidate clause:
SQL> alter table pktest1 modify constraint in_pktest1_01 enable novalidate;
Table altered.
The data still shows duplicate rows. The trick is the novalidate clause, which
instructs Oracle to skip checking the existing data in the table. This is why the
duplicate values in the table were tolerated while enabling the constraint. The
constraint is now enabled:
SQL> select constraint_name, status
2 from user_constraints
3 where table_name = 'PKTEST1';
CONSTRAINT_NAME   STATUS
----------------- --------
IN_PKTEST1_01     ENABLED
However, only the existing rows are skipped; the future rows are subject to the
enforcement of the constraint, as shown below:
SQL> insert into pktest1 values (1,'Third One');
insert into pktest1 values (1,'Third One')
*
ERROR at line 1:
ORA-00001: unique constraint (ARUP.IN_PKTEST1_01) violated
This is yet another property of the Oracle database you should be aware of: the
NOVALIDATE clause in constraint enablement. This is why there were duplicate rows
in the table despite an enabled primary key constraint.
Conclusion
Let's summarize what we learned from this:
(1) Just because there is an enabled constraint on a table does not mean that all the
rows will conform to the constraint. For instance, a primary key on a column does
not mean that the column will not contain any duplicate values.
(2) The primary key constraint is normally enforced through a unique index. However, if
Oracle already finds an index on the column, unique or not, it uses it for the primary key.
(3) In that case, if the index was initially created as nonunique, it will continue to
show as nonunique, even though it is being used to enforce uniqueness.
(4) If you disable the primary key constraint, the index, if created by Oracle to
support the PK, is also dropped. To keep the index, use the keep index clause.
(5) When you enable a constraint with the novalidate clause the constraint does
not check for the existing data; so there could be non-conforming values in the
table.
By the way, what is the difference between primary key and unique constraints?
They both enforce uniqueness of values in a column; but unique constraints allow nulls
while primary keys don't. Both of them enforce the uniqueness through an index.
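The null-handling difference is easy to verify yourself; a quick sketch (table names are illustrative):

```sql
SQL> create table uq_demo (c1 number unique);
SQL> insert into uq_demo values (null);   -- succeeds; a second NULL would too
SQL> create table pk_demo (c1 number primary key);
SQL> insert into pk_demo values (null);   -- fails: ORA-01400 cannot insert NULL
```

A primary key implies NOT NULL on its columns, while a unique constraint does not; NULLs do not collide with each other in a unique index.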
challenge of indexing the entire World Wide Web in its servers so that it could present
search results very, very quickly. Both of these represent issues that
others probably hadn't faced earlier. Here are the relatively unique aspects of this
data, which are known as the "Three V's of Big Data":
Volume - the sheer mass of the data made it difficult, if not impossible, to
sort through them
Velocity - the data was highly transient. Website logs are relevant only for
their time period; for a different period the data was different
Variety - the data was not pre-defined and not quite structured - at least not
the way we think of structure when we think of relational databases
Both these companies realized they were not going to address these challenges using
traditional relational databases, at least not at the scale they wanted. So, they
developed tools and technologies to address these very concerns. They took a page
from the super-computing paradigm of divide and conquer. Instead of processing the
dataset as a whole, they divided it into smaller chunks to be processed by
hundreds, even thousands of small servers. This approach solved three basic,
crippling problems:
1. There was no need to use large servers, which typically cost a lot more than
small servers
2. There was built-in data redundancy, since the data was replicated between
these small servers
3. Most important of all, it could scale well - very well - simply by adding more
of those small servers
This is the fundamental concept that gave rise to Hadoop. But before we cover
that, we need to learn about another important concept.
Name=Value Pairs
A typical relational database works by logically arranging the data into rows and
columns. Here is an example. You decide on a table design to hold your customers,
named simply CUSTOMERS. It has the columns CUST_ID, NAME, ADDRESS, PHONE.
Later, your organization decides to provide some incentives to the spouses as well
and so you added another column - SPOUSE.
Everything was well until the time you discovered that customer 1 and the spouse were
divorced and there is a new spouse now. However, the company decides to keep the
names of the ex-spouses as well, for marketing analytics. Like a good relational
application designer, you decide to break SPOUSE away from the main table and create a
new table - SPOUSES, which is a child of CUSTOMERS, joined by CUST_ID. This
requires massive code and database changes; but you survive. Later you had the
same issue with addresses (people have different addresses - work, home, vacation,
etc.) and phone numbers (cell phone, home phone, work phone, assistant's phone,
etc.). So you decide to break them into different tables as well. Again, code and
database changes. But the changes did not stop there. You had to add various
tables to record hobbies, associates, weights, dates of birth - the list is endless.
Every thing you record requires a database change and a code change. But worse not all the tables will be populated for every customer. In your company's quest to
build a 360 degree view of the customer, you collect some information; but there is
no guarantee that all the data points will be gathered. you are left with sparse
tables. Now, suddenly, someone says there is yet another attribute required for the
customer - professional associations. So, off you go - build yet another table,
followed by change control, code changes to incorporate that.
If you look at the scenario above, you will find that the real issue is trying to force a
structure around a dataset that is inherently unstructured - akin to a square peg in a
round hole. The lack of structure of the data is what makes it agile and useful; but
the lack of structure is also what makes it a poor fit for a relational database, which
demands structure. This is the primary issue if you want to capture social media
data - Twitter feeds, Facebook updates, LinkedIn updates and Pinterest posts. It's
impossible to predict in advance, at least accurately, the exact information you will
expect to see in them. So, putting a structure around the data storage not only
makes life difficult for everyone - the DBAs will constantly need to alter the
structures and the developers/designers will constantly wait for the structure to be
in the form they want - but it also slows down the capture and analysis of the data.
So, what is the solution? If you think about it, think about how we human beings
process information. Do we parse information in the form of rows in some table?
Hardly. We process and store information by associations. For instance, let's say I
have a friend John. I probably have nuggets of information like this:
Last Name = Smith
Lives at = 13 Main St, Anytown, USA
Age = 40
Birth Day = June 7th
Wife = Jane
Child = Jill
Jill goes to school = Top Notch Academy
Jill is in Grade = 3
... and so on. Suppose I meet another person - Martha - who tells me that her child
also goes to Grade 3 in Top Notch Academy. My brain probably goes through a
sequence like this:
Search for "Top Notch Academy"
Found it. It's Jill
Search for Jill.
Found it. She is child of John
Who is John's wife?
Found it. It's Jane.
Where do John and Jill live? ...
And finally, after this processing is all over, I say to Martha as a part of the
conversation - "What a coincidence! Jill - the daughter of my friends John and Jane
Smith - goes there as well. Do you know them?" "Yes, I do," replies Martha. "In fact
they are in the same class, that of Mrs. Gillen-Heller."
Immediately my brain processed this new piece of information and filed the data as:
Jill's Teacher = Mrs. Gillen-Heller
Jane's Friend = Martha
Martha's Child Goes to = ...
Months later, I meet Jane and mention to her that I met Martha, whose child
went to Mrs. Gillen-Heller's class - the same one as Jill. "Glad you met Martha," Jane
says. "Oh, Jill is no longer in that class. Now she is in Mr. Fallmeister's class."
Aha! My brain probably stored that information as:
Jill's former teacher = Mrs. Gillen-Heller
This is called storing by a name=value pair. You see, I stored the information as a
pair: a property and its value. As information arrives, I keep adding more and more
pairs. When I need to retrieve information, I just look up the relevant property and,
by association, get all the data I need. Storing data as name=value pairs gives me
enormous flexibility to record all kinds of information without modifying
any data structures I may currently have.
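To make the idea concrete, here is a minimal sketch in Python; the dictionary and the helper function are mine, purely illustrative:

```python
# A name=value (key-value) store of facts about John, sketched as a
# plain dictionary: no schema is defined up front.
john = {
    "Last Name": "Smith",
    "Lives at": "13 Main St, Anytown, USA",
    "Age": 40,
    "Wife": "Jane",
    "Child": "Jill",
    "Jill goes to school": "Top Notch Academy",
    "Jill is in Grade": 3,
}

# New facts arrive (Jill changed classes): just add pairs - no ALTER
# TABLE, no code release, no sparse columns.
john["Jill's former teacher"] = "Mrs. Gillen-Heller"
john["Jill's teacher"] = "Mr. Fallmeister"

def properties_mentioning(facts, value):
    """Retrieve by association: which properties hold this value?"""
    return [name for name, val in facts.items() if val == value]

print(properties_mentioning(john, "Top Notch Academy"))
# prints ['Jill goes to school']
```

Notice that adding "Jill's former teacher" required no change to any structure - exactly the flexibility the relational CUSTOMERS example above lacked.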
This is also how Big Data is tamed for processing. Since the data coming off
Twitter, Facebook, LinkedIn, Pinterest, etc. is impossible to categorize in advance, it
is practically impossible to put it all in the relational format. Therefore, a
name=value pair type of storage is the logical step in compiling and collating the data.
The name is also known as the "key"; so the model is sometimes called key-value pair.
The value doesn't have to have a datatype. In fact, it's probably a BLOB, so anything
can go in there - booking amounts, birth dates, comments, XML documents, pictures,
audio and even movies. It provides immense flexibility in capturing information
that is inherently unstructured.
NoSQL Database
Now that you know about name=value pairs, the next logical question you may have
is - how do we store them? Our thoughts about databases are typically colored by
our long-standing association with relational databases, making them almost
synonymous with the concept of a database. But before relational databases existed,
even as a concept, big machines called mainframes ruled the earth. The databases
inside them were stored in hierarchical format; one such database from IBM was
IMS/DB. Later, when relational databases were up and
coming, another type of database concept - called a network database - was
developed to compete against them. An example of that category was IDMS (now
owned by Computer Associates), developed for mainframes. The point is, relational
databases were not the answer to all the questions then; and it is clear that they
are not now either.
This led to the development of a different type of database technology based on
the key-value model. Relational database systems are queried with the SQL language,
which I am sure is familiar to almost anyone reading this blog post. SQL is a
set-oriented language - it operates on sets of data. In the key-value pair model,
however, that does not work anymore. Therefore these key-value databases are
usually known as NoSQL, to separate them from their relational, SQL-based
counterparts. Since their introduction, some NoSQL databases have actually added
support for SQL, which is why "NoSQL" is not quite a correct term anymore; hence
they are sometimes referred to as "Not only SQL" databases. The point is that their
structure does not depend on the relational model. How exactly the data is stored is
usually left to the implementer. Some examples are MongoDB, Dynamo and Bigtable
(from Google).
I would stress here that almost any type of non-relational database can be classified
as NoSQL, not just the name-value pair models. For instance, ObjectStore, an object
database, is also NoSQL. But for this blog post, I am treating only key-value pair
databases as NoSQL.
Map/Reduce
Let's summarize what we have learned so far:
1. The key-value pair model in databases offers flexibility in data storage without
the need for a predefined table structure
2. The data can be distributed across many machines, where it is
independently processed and then collated.
When the system gets a large chunk of data, e.g. a Facebook feed, the first task is
to break it down into keys and their corresponding values. After that, the values
may be collated for a summary result. The process of dividing the raw data into
meaningful key-value pairs is known as "mapping". Combining the values afterwards
to form summaries, or just eliminating the noise from the data to extract meaningful
information, is known as "reducing". For instance, you may see both "Name" and
"Customer Name" among the keys. They mean the same thing, so you reduce them to
a single key - "Name". The two steps are almost always used together; hence the
operation is known as Map/Reduce.
Here is a very rudimentary but practical example of Map/Reduce. Suppose you get
Facebook feeds and you are expected to find the total number of likes for your
company's recent post. The Facebook feed comes in the form of a massive dataset.
The first task is to divide it among many servers - a principle described earlier to
make the process scale well. Once the dataset is divided, each machine runs some
code to extract and collate the information and then presents the result to a central
coordinator for the final collation. Here is pseudo-code for the process each server
runs on its subset of data:
begin
  like_count := 0
  other_count := 0
  while (there_are_remaining_posts) loop
    get next post
    extract status of "like" for the specific post
    if status = "like" then
      like_count := like_count + 1
    else
      other_count := other_count + 1
    end if
  end loop
end
Let's name this program counter(). counter() runs on all the servers, which are called
nodes. As shown in the figure, there are three nodes. The raw dataset is divided
into three sub-datasets, which are then fed to each of the three nodes. A copy of each
sub-dataset is kept on another server as well; that takes care of redundancy. Each
node performs its computation and sends its results to an intermediate result set,
where they are collated.
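The divide-compute-collate flow above can be sketched in a few lines of Python. This is a single-machine toy: the chunks stand in for the three nodes, and the function and variable names are mine, not part of any Hadoop API:

```python
# A toy Map/Reduce flow: split the feed into chunks (one per "node"),
# run a counter on each chunk (map), then collate the partial counts
# (reduce).
posts = [
    {"post_id": 1, "status": "like"},
    {"post_id": 1, "status": "comment"},
    {"post_id": 1, "status": "like"},
    {"post_id": 1, "status": "like"},
    {"post_id": 1, "status": "comment"},
    {"post_id": 1, "status": "like"},
]

def split(dataset, nodes):
    """Divide the raw dataset into one sub-dataset per node."""
    return [dataset[i::nodes] for i in range(nodes)]

def counter(subset):
    """The map step: one node counts the likes in its own chunk."""
    return sum(1 for post in subset if post["status"] == "like")

chunks = split(posts, 3)                       # three "nodes"
partial_counts = [counter(c) for c in chunks]  # each node works alone
total_likes = sum(partial_counts)              # reduce: collate results
print(total_likes)
# prints 4
```

In real Hadoop, the splitting and the shipping of partial results to the coordinator are what the framework and HDFS do for you; you supply only the map and reduce logic.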
Map/Reduce Processing
How does this help? In many ways. Let's see:
(1) First, since the data is stored in chunks and a copy of each chunk is stored on a
different node, there is built-in redundancy. There is no need to protect the data
being fed, since a copy is available elsewhere.
(2) Second, since the data is available elsewhere, if a node fails, the other
nodes simply pick up the slack. There is no need to reshuffle or
restart the job.
(3) Third, since the nodes all perform their tasks independently, when the data size
grows, all you have to do is add a new node. The data will then be
divided four ways instead of three, and so will the processing load.
This is very similar to parallel query processes in Oracle Databases, with PQ servers
being analogous to nodes.
There are two very important points to note here:
(1) The subset of data each node gets does not need to be visible to all the nodes.
Each node gets its own set of data to process. A copy of the subset is
maintained on a different node - making simultaneous access to the data
unnecessary. This means you can keep the data in local storage, not in expensive
SANs. This not only brings cost down significantly but may also perform better
due to local access. As the cost of solid state devices and flash-based storage
plummets, it could also mean that the storage cost per unit of performance will be
even better.
(2) The nodes need not be super fast. A relatively simple commodity-class server is
enough for the processing, as opposed to a large server. Typically servers are priced
for their use; e.g. an enterprise-class server with 32 CPUs is probably roughly
equivalent in performance to eight 4-CPU blades, but its cost is way more than
eight times the cost of one blade server. This model takes advantage of the
cheaper computers by scaling horizontally, not vertically.
Hadoop
Now that you know how processing data in parallel using a concept called
Map/Reduce allows you to throw several compute-intensive applications at
large amounts of data, you may wonder - aren't there a lot of moving
parts to be taken care of just to empower this process? In a monolithic server
environment you just have to kick off multiple copies of the program. The operating
system does the job of scheduling these programs on the available CPUs, taking
them off the CPU to roll in another process, preventing processes from
corrupting each other's memory, and so on. When these processes run on
multiple computers, all of that coordination has to be provided to make sure they
work. For instance, in this model you have to ensure that the jobs are split between
the nodes reasonably equally, the dataset is split equitably, the queues for feeding
data to and getting data back from the Map/Reduce jobs are properly maintained,
the jobs fail over in case of node failure, and so on. In short, you need an operating
system of operating systems to manage all these nodes as one monolithic processor.
What if these operating procedures were already defined for you? Well,
that would make things really easy, wouldn't it? You could then focus on what you
are good at - developing the procedures to slice and dice the data and derive
intelligence from it. This "procedure", or framework, is available, and it is called
Hadoop. It's an open source offering, similar to Mozilla and Linux; no single
company has exclusive ownership of it. However, many companies have adopted it
and evolved it into their own offerings, similar to Linux distributions such as Red
Hat, SUSE and Oracle Enterprise Linux. Some of those companies are Cloudera,
Hortonworks, IBM, etc. Oracle does not have a Hadoop distribution of its own;
instead it licenses the one from Cloudera for its Big Data Appliance. The Hadoop
framework runs on all the nodes of the cluster and acts as the coordinator.
A very important point to note here is that Hadoop is just a framework, not the
actual program that performs Map/Reduce. Compare that to the operating system
analogy: an OS like Windows does not offer a spreadsheet. You need to either
develop one or buy an off-the-shelf product such as Excel to get that functionality.
Similarly, Hadoop offers a platform to run the Map/Reduce programs that you
develop; you put into that code the logic of what you "map" and how you "reduce".
Remember another important advantage you saw in this model earlier - the
ability to replicate data between multiple nodes so that the failure of a single node
does not cause the processing to be abandoned. This is offered through a new type
of filesystem called the Hadoop Distributed File System (HDFS). HDFS, which is a
distributed (not a clustered) filesystem, by default keeps 3 copies of the data on
three different nodes - two on the same rack and the third on a different rack. The
nodes communicate with each other using an HDFS-specific protocol that is built on
TCP/IP. The nodes are aware of the data present on the other nodes, which is
precisely what allows the Hadoop job scheduler to divide the work among the nodes.
By the way, HDFS is not absolutely required for Hadoop; but as you can see, HDFS is
what lets Hadoop know which node has what data for smart job scheduling.
Without it, the division of labor would not be as efficient.
Hive
Now that you have learned how Hadoop fills a major void for computations on
massive datasets, you can't help but see its significance for data warehouses, where
massive datasets are common. Also common are jobs that churn through this data.
However, there is a little challenge. Remember the NoSQL databases mentioned
earlier? By and large they do not support SQL. To get the data you have to write a
program using the APIs the vendor supplies. This may reek of the COBOL programs
of yesteryear, where you had to write a program to get the data out, making it
inefficient and highly programmer-driven (although it did shore up job security,
especially during the Y2K transition times). The inefficiency of that model gave rise
to 4th-generation languages like SQL, which brought the power of queries to
common users, ripping the power away from programmers. In other words, it
brought the data and its users closer, reducing the role of the middleman
significantly. In data warehouses this was especially true, since power users issued
queries after getting the result from the previous queries. It was like a conversation
- ask a question, get the answer, formulate your next question - and so on. If the
conversation were dependent on writing programs, it would have been impossible to
be effective.
With that in mind, consider the implications of the lack of SQL in these databases,
which are otherwise highly suitable for data warehouses. The requirement to write a
program to get the data every time would take us straight back to the COBOL days.
Well, not to worry: the Hadoop ecosystem has another product - Hive - that offers a
SQL-like language called HiveQL. Just as users can query relational databases with
SQL very quickly, HiveQL allows users to get the data for analytical processing
directly. It was initially developed at Facebook.
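To give a flavor of it, here is roughly what the like-counting job from the Map/Reduce section might look like in HiveQL. This is a sketch against a hypothetical table of parsed feed records; the table and column names are mine:

```sql
-- Hypothetical table fb_posts(post_id, action, ...), populated from
-- the parsed feed. Count the "like" actions per post declaratively,
-- instead of writing a Map/Reduce program by hand.
SELECT post_id, COUNT(*) AS like_count
FROM fb_posts
WHERE action = 'like'
GROUP BY post_id;
```

Behind the scenes, Hive compiles a query like this into Map/Reduce jobs and runs them on the cluster for you.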
Hortonworks - many of the folks who founded this company came from
Google and Yahoo!, where they built or added to the building blocks of Hadoop
IBM - they have a suite called Big Insights, which includes their distribution of
Hadoop. This is one of the very few companies that offer both the hardware
and the software. The most impressive feature from IBM is a product called
Streams that can mine data from a non-structured stream like Facebook in
realtime and send alerts and data feeds to other systems.
EMC
MapR
Conclusion
If the buzzing of the buzz-words surrounding any new technology annoys you, and
all you get is tons of websites on the topic but not a small, consolidated compilation
of terms, you are just like me. I was frustrated by the lack of information in
digestible form on these buzzwords, which are too important to ignore but would
take too much time to understand fully. This is my small effort to bridge that gap
and get you going on your quest for more information. If you have hundreds or
thousands of questions after reading this, I would congratulate myself - that is
precisely what my objective was. For instance, how HiveQL differs from SQL, or how
Map/Reduce jobs are written - these are questions that should be flying around in
your mind now. My future posts will cover them, and some more topics like HBase,
Zookeeper, etc., that will unravel the mysteries around a technology that is going to
be commonplace in the very near future.
Welcome aboard. I wish you luck in learning. As always, your feedback will be highly
appreciated.
Demonstration
Let's examine this with an example. First, let's check the various pools defined in the database
instance right now:
SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
sga_target                           big integer 0

SQL> show parameter db_cache_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
db_cache_size                        big integer 300M

SQL> show parameter streams_pool_size

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
streams_pool_size                    big integer 0
After the Data Pump job is complete, check the size of the buffer cache again:
SQL> show parameter db_cache_size
The buffer cache got compressed from 300 MB earlier to 280 MB. But you didn't do that; Oracle
did.
Well, where did the 20 MB of missing memory go? Now, check the size of the Streams Pool:
SQL> show parameter streams_pool_size
The Streams Pool was 0 earlier, as you intended it to be; but Oracle allocated 20 MB to it by
stealing that much memory from the buffer cache. The reason: the Streams Pool was used for the
Data Pump Export job, even though it does not sound intuitive. If you check the alert log, you
will see the activity recorded there:
$ adrci
The next question you may be wondering about is why Oracle decided to give only 20 MB
to the Streams Pool. Why not 100 MB, or 10 MB? Is it dependent on the size of the table being
exported? The answer is no.
Oracle by default gives 10% of the size of the shared pool to the Streams Pool. Let me find out
the size of the shared pool:
SQL> show parameter shared_pool_size
The shared pool is 200 MB; 10% of that is 20 MB, which is how much was assigned to the
Streams Pool. That size does not depend on the size of the exported data, but on the size of
the shared pool.
It's important to understand that the shared pool is used only to compute the default size of the
Streams Pool; the actual memory is carved out of the buffer cache, not the shared pool.
If you check the database's resize operations, you can confirm Oracle's adjustment of the
pools:
SQL> select component, oper_type, parameter, initial_size, target_size,
final_size
2 from v$sga_resize_ops
3 order by start_time;
COMPONENT            OPER_TYPE PARAMETER         INITIAL_SIZE TARGET_SIZE FINAL_SIZE
-------------------- --------- ----------------- ------------ ----------- ----------
DEFAULT buffer cache STATIC    db_cache_size                0   314572800  314572800
DEFAULT buffer cache SHRINK    db_cache_size        314572800   293601280  293601280
streams pool         GROW      streams_pool_size            0    20971520   20971520
The output has been truncated to show only the relevant records. From the output you can see
clearly that the buffer cache was defined statically as 314572800 bytes, or 300 MB, initially.
Later the buffer cache shrank from 314572800 to 293601280 (about 280 MB). The amount of
shrinkage was 314572800 - 293601280 = 20971520 (i.e. 20 MB), the exact amount allocated
to the Streams Pool.
Why is this a problem? Well, the biggest problem is that the buffer cache size is now reduced
without your knowledge. Here the buffer cache lost only 10% of a small shared pool; on systems
with a large shared pool, the loss could be substantial. Worse, the amount allocated to the
Streams Pool stays there; it is not returned to the buffer cache as you might expect. You have
to give it back manually:
SQL> alter system set streams_pool_size = 0;
In the case of a RAC database, it's possible that only one instance sees this change in Streams
Pool size; the other instances will be unaffected.
It would be prudent to note here that this surprise occurs only when you do not use automatic
SGA settings. When auto SGA is used, i.e. sga_target is set to a non-zero value, you give
complete control to Oracle to manipulate the memory structures. In that case Oracle juggles
the memory between the various pools - including the Streams Pool - without your control
anyway.
While it is not very well known, this behavior is not undocumented. It's mentioned in the
Utilities Guide at
http://docs.oracle.com/cd/E11882_01/server.112/e22490/dp_perf.htm#SUTIL973.
Conclusion
Just because you haven't defined the streams_pool_size parameter (as you don't use
Streams) doesn't mean that Oracle will not assign some memory to the Streams Pool. Data
Pump, which is frequently used in many databases, uses the Streams Pool, and Oracle will size
it at 10% of the shared pool, reducing the buffer cache by that amount to fund the memory for
the Streams Pool. So you should configure the Streams Pool even if you don't use Streams, so
that Data Pump can use a precisely allocated pool rather than stealing the memory from the
buffer cache. If you don't do that now, or don't intend to do it, then regularly check the
streams_pool_size value and set it to zero if it is not so.
Application Design is the only Reason for Deadlocks? Think Again
Have you ever seen a message ORA-00060: Deadlock detected and automatically
assumed that it was an application coding issue? Well, it may not be. There are
DBA-related causes as well, and you may be surprised to find that even INSERTs can
cause deadlocks. Learn all the conditions that precipitate this error, how to read the
"deadlock graph" to determine the cause, and most important, how to avoid it.
Introduction
I often get a lot of questions, in one form or another, on deadlocks.
What's a Deadlock
Deadlock is one of those little-understood and often misinterpreted concepts in the
Oracle Database. The word rhymes with locking, so most people assume that it is
some form of row locking. Broadly speaking, that's accurate; but not entirely: there
could be causes other than row-level locking. The concept is also often confusing to
people new to Oracle technology, since the term deadlock may have a different
meaning in other databases. To add to the confusion, Oracle's standard response to
the problem is that it's an application design issue and therefore should be solved
through application redesign. Well, in a majority of cases application design is the
problem; but not in all cases. In this post, I will describe:
1. Why Deadlocks Occur
2. Primer on Oracle Latching, Locking
3. How to Interpret Deadlock Traces
4. Various Cases of Deadlocks
5. Some Unusual Cases from My Experience
Deadlocks Explained
With two Oracle sessions each locking the resource requested by the other, there
can never be a resolution, because both will be hanging, denying them the
opportunity to commit or roll back and thereby release their locks. Oracle
automatically detects this deadly embrace and breaks it by forcing one statement
to roll back abruptly (releasing its lock) and letting the other transaction
continue.
Here is how a deadlock occurs. Two sessions are involved, doing updates on
different rows, as shown below:
Step  Session 1             Session 2
----  --------------------  --------------------
1     Update Row1
      (does not commit)
2                           Update Row2
                            (does not commit)
3     Update Row2
4     Waits on TX enqueue
5                           Update Row1
At step 5 above, since Row1 is locked by Session 1, Session 2 will wait; but this
wait will be forever, since Session 1 is also waiting and can't perform a commit or
rollback until its own wait is over. But Session 1's wait will continue to exist until
Session 2 commits or rolls back - a catch-22 situation. This situation is a cause of
deadlock, and Oracle forces the statement at step 3 to be rolled back (since that is
where it detected the deadlock). Note that only the statement that detected the
deadlock is rolled back; the previous statements stay. For instance, the update of
Row1 in step 1 stays.
This is the most common cause of deadlocks; it is purely driven by application
design and can only be solved by reducing the possibility of that scenario
occurring. Now that you understand how a deadlock occurs, we will explore some
other causes of deadlocks. But before that, we will explore the different types of
locks in Oracle.
Types of Locks
Database locks are queue-based, i.e. the session that started waiting for a lock first
will get it before another session that started waiting for the same resource later.
The requesters are placed in a queue; hence locks are also called enqueues. There
are several types of enqueues; but we will focus on row locking, and specifically on
only two types of them:
TX - this is the row-level lock. When a row is locked by a session, this type
of lock is acquired.
When a deadlock occurs and one of the statements gets rolled back, Oracle records
the incident in the alert log. Here is an example entry:
ORA-00060: Deadlock detected. More info in file
/opt/oracle/diag/rdbms/odba112/ODBA112/trace/ODBA112_ora_18301.trc.
Along with the alert log entry, the incident creates a tracefile (as shown above). The
trace file shows valuable information on the deadlock and should be your first stop
in diagnosis. Let's see the various sections of the tracefile:
Deadlock Graph
The first section is important; it shows the deadlock graph. The deadlock graph tells
you which sessions are involved, what types of locks are being sought, and so on.
Let's examine the deadlock graph, shown in the figure below:
Row Information
The next critical section shows the information on the rows locked during the
activities of the two sessions. From the tracefile you can see the object ID. Using
that, you can get the object owner and the name from the DBA_OBJECTS view. The
information in on rowID is also available here. You can get primary key information
from the object using that rowID.
Process Information
The tracefile also shows the Oracle process information which displays the calling
user. That information is critical since the schema owner may not be the one that
With the information collected from the various sections of the trace file, you now
know the following:
The object (table, materialized view, etc.) whose row was in the deadlock
The machine the session came from, with the module, program (e.g.
SQL*Plus) and userid information
Now it is a cinch to find the cause of that deadlock and the specific part of the
application you need to address to fix it.
Other Causes
The case described above is just one type of locking scenario causing deadlocks;
but it is not the only one. Other types of locks also cause deadlocks. These
scenarios are usually difficult to identify and diagnose, and are often misinterpreted.
Well, not for you. You will learn how to diagnose these other causes in this post.
These causes include:
1. ITL Waits
2. Bitmap Index Update
3. Direct Path Load
4. Overlapping PK Values
When a session - session1 - wants to lock row1, it uses slot#1 of the ITL, as
shown in Figure 3 below. Later, another session - session2 - updates row2. Since
there is no free ITL slot, Oracle creates a new slot - slot#2 - for this transaction.
However, at this stage the block is almost packed. If a third transaction comes in,
there will be no room for a third ITL slot to be created, causing that session to
wait on ITL. Remember, this new session wants to lock row3, which is not locked by
anyone and could otherwise have been locked by the session; it is artificially
prevented from being locked due to the absence of an ITL slot.
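A query like the following against V$SEGMENT_STATISTICS produces the listing below (a sketch; the filter and ordering are my own):

```sql
-- Segments that have experienced ITL waits, busiest last.
select owner, object_name, value
from   v$segment_statistics
where  statistic_name = 'ITL waits'
and    value > 0
order  by value;
```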
OWNER    OBJECT_NAME               VALUE
-------- ------------------------- -----
SYSMAN   MGMT_METRICS_1HOUR_PK        19
ARUP     DLT2                         23
ARUP     DLT1                        131
If you check the EVENT column of V$SESSION to see which sessions are
experiencing this right now, you will see that the sessions are waiting on the event
enq: TX - allocate ITL entry.
Deadlock Scenario
Here is the scenario where two sessions cause a deadlock due to ITL shortage.
Imagine rows row1 and row2 in two different blocks, block1 and block2, each block
so tightly packed that no additional ITL slot can be created in it.

Step  Session1                          Session2
----  --------------------------------  --------------------------------
1     Update row1 in block1
2                                       Update row2 in block2
3     Update another row in block2
      (waits: no free ITL slot)
4                                       Update another row in block1
                                        (waits: no free ITL slot)
At step 4, Session 2's hang can't be resolved until Session 1 releases its ITL slot,
which is not possible since Session 1 itself is hanging. This never-ending situation is
handled by Oracle: it detects it as a deadlock and rolls back one of the statements.
Deadlock Graph
To identify this scenario as the cause of the deadlock, look at the deadlock graph.
This is how a deadlock graph looks when caused by ITL waits.
The absence of row information for one of the sessions is a dead giveaway that this
is a block-level issue, not one related to specific rows. Here are the clues in this
deadlock graph:
The holders held the lock in "X" (exclusive) mode (this is expected for TX
locks)
However, only one of the waiters is waiting in "X" mode. The other is
waiting in "S" (shared) mode, indicating that it's not really a row lock
the session is waiting for.
These clues confirm that this is an ITL-related deadlock, not one caused by
application design. Further down the tracefile we see:
As you can see, it's not 100% clear from the tracefile that the deadlock was caused
by ITL. However, by examining the tracefile we see that the locks are of TX type and
the wait is in S (shared) mode. This usually indicates an ITL-wait deadlock. You
can confirm that this is the case by checking the ITL shortages on that segment in
the view V$SEGMENT_STATISTICS, as shown earlier.
Update on 4/19/2013: [Thanks, Jonathan Lewis] Occasionally you may see two rows
here as well, as a result of a previous wait (e.g. buffer busy wait) on the block which
has not been cleaned out yet. In such a case you will see information on two rows;
but there are some other clues that may point to this cause. The row portion of the
rowid will be 0, meaning it was not a row but the block. The other clue might be that
the row information points to a row that has nothing to do with the SQL statement.
For instance, you may find the row information pointing to a row in table Table1
whereas the SQL statement is "update Table2 set col2 = 'X' where col1 = 2".
The solution is very simple: just increase the INITRANS value of the table. INITRANS
determines the initial number of ITL slots. Please note, this value will affect only
new blocks; the old ones will still be left with the old value. To affect the old ones,
you can issue ALTER TABLE TableName MOVE to move the table into new blocks and
hence a new structure.
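For instance (a sketch, with a hypothetical table and index name; remember that a MOVE leaves the table's indexes unusable, so rebuild them afterwards):

```sql
alter table bookings initrans 10;   -- new blocks get 10 ITL slots
alter table bookings move;          -- rewrites the existing blocks
alter index bookings_pk rebuild;    -- indexes go UNUSABLE after a move
```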
Deadlock due to Foreign Key
This is a really tricky one; but not impossible to identify. When a key value in parent
table is updatd or a row is deleted, Oracle attempts to takes TM lock on the entire
child table. If an index is present on the foreign key column, then Oracle locates the
corresponding child rows and locks only those rows. The documentation in some
versions may not very clear on this. There is a documentation bug (MOS Bug#
2546492). In the absense of the index, a whole table TM lock may cause a deadlock.
Let's see the scenario when it happens.
Scenario
Here is the scenario when this deadlock occurs. [Step-by-step table: actions of
Session1 and Session2 leading to the deadlock]
Deadlock Graph
This is how the deadlock graph looks when caused by an unindexed foreign key. As
you can see, the deadlock graph does not clearly say that the issue has to do with
foreign key columns not being indexed. Instead, the clues here are:
TM locks for both the sessions, instead of TX. Remember: TM locks are metadata
related, as opposed to TX, which is a row-related lock.
The lock type of the holders is Share Exclusive (SX) as opposed to Exclusive (X).
These clues together show that this deadlock is due to FK contention rather
than the conventional row locks.
So, what do you do? Simple: create indexes on those FKs and you will not see
this again. As a general rule you should have indexes on FKs anyway, but there are
exceptions, e.g. a table whose parent key is never updated and deleted only infrequently
(think of a table with country codes, state codes or something pervasive like that). If
you see a lot of deadlocks in those cases, perhaps you should create indexes on
those FK columns anyway.
Deadlock due to Bitmap Index
Scenario
Here is the scenario when this deadlock occurs. [Step-by-step table: one session
updates Row1, which locks a bitmap index piece; the other session updates Row2; each
session then tries to update the other row and hangs for a TX row lock, because the
bitmap index piece locked by the other session can't be released until that session
commits. Deadlock!]
Deadlock Graph
As usual, the deadlock graph confirms this condition. You can confirm this
occurrence from reading the deadlock graph:
The lock mode for both the holders and waiters is X (indicating a row lock).
The lock wait mode is S (shared) but the type of lock is TX rather than TM.
The row information is available, but the object ID is not the ID of the table;
it is that of the bitmap index.
The solution to this deadlock is really simple: just alter the application logic in such
a way that the two updates do not happen in sequence without commits in
between. If that's not possible, then you have to re-evaluate the need for a bitmap
index. Bitmap indexes are usually for data warehouses only, not for OLTP.
Deadlock due to Primary Key Overlap
This is a very special case of deadlock, which occurs during inserts, not updates or
deletes. This is probably the only case where inserts cause deadlocks. When you
insert a record into a table but do not commit it, the record goes in, but a further insert
with the same primary key value waits. This lock is required because the
first insert may be rolled back, allowing the second one to pass through. If the first
insert is committed, then the second insert fails with a PK violation. But in the
meantime, the second insert simply hangs.
Scenario
[Step-by-step table: each session first inserts a row with a unique PK value without
committing; then Session1 inserts PK Col = 2 and hangs until Session2 commits, while
Session2 inserts PK Col = 1 and hangs as well. Deadlock!]
Deadlock Graph
The deadlock graph looks like the following.
The waiters are waiting for locks in S mode, even when the locks type TX.
The subsequent parts of the tracefile dont show any row information.
However, the latter parts of the tracefile shows the SQL statement, which should be
able to point to the cause of the deadlock as the primary key deadlock. Remember,
this may be difficult to diagnose first since there is no row information. But this is
probably normal since the row is not formed yet (it's INSERT, remember?).
Special Cases
I have encountered some very interesting cases of deadlocks which may be rather
difficult to diagnose. Here are some of these special cases.
Autonomous Transactions
Autonomous transactions are ones that are kicked off form inside another
transaction. The autonomous one follows its own commit, i.e. it can commit
independently of the outer transaction. The autonomous transaction may lock some
records the parent transaction might be interested in and vice versa a perfect
condition for deadlocks. Since the autonomous transactions is triggered by its
parent, the deadlocks are usually difficult to catch.
Here is how the deadlock graph looks (excerpted from the tracefile):
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-0005002d-00001a40        17      14     X             17      14           X
session 14: DID 0001-0011-00000077
session 14: DID 0001-0011-00000077
Rows waited on:
Session 14: obj - rowid = 000078D5 - AAAHjVAAHAAAACOAAA
(dictionary objn - 30933, file - 7, block - 142, slot - 0)
Information on the OTHER waiting sessions:
End of information on OTHER waiting sessions.
Here are the interesting things about this deadlock graph, which are clues to
identifying this type of deadlock:
The lock type is TX (row lock) and the mode is "X", which is exclusive. This
indicates a simple row lock.
The row information is not there because the autonomous transaction acts
independently of the parent.
If you see a deadlock graph like this, you can be pretty much assured that
autonomous transactions are to blame.
Update on 4/19/2013. [Thanks, Mohamed Houri] The above cause is not limited to
TX locks; it could happen in TM locks as well. The diagnosis remains the same.
This code locks the rows selected by the parallel query slaves. Since the select is
done in parallel, the PQ slaves distribute the rows to be selected. Therefore the
locking is also distributed among the PQ slaves. Since no two rows are updated by
the same PQ slave (and hence the same session), there is no cause for deadlocks.
However, assume the code is kicked off more than once concurrently. This kicks off
several PQ slaves and many query coordinators. In this case there is no guarantee
that two slaves (from different coordinators) will not pick up the same row. In that
case, you may run into deadlocks.
Freelists
In case of tablespaces defined with manual segment space management, if too
many process freelists are defined, it's possible to run out of transaction freelists,
causing deadlocks.
In Conclusion
The most common cause of deadlocks is the normal row level locking, which is
relatively easy to find. But that's not the only reason. ITL Shortage, Bitmap Index
Locking, Lack of FK Index, Direct Path Load, PK Overlap are also some of the
potential causes. You must check the tracefile and interpret the deadlock graph to
come to a definite conclusion on the cause of the deadlock. Some of the causes,
e.g. ITL shortage, have to do with the schema design, not the application design, and
are quite easy to solve. In some cases, such as PK overlap, even INSERTs cause
deadlocks.
I hope you found it useful in diagnosing the deadlock conditions in your system. As
always, your feedback is very much appreciated.
Switching Back to Regular Listener Log Format
Did you ever miss the older listener log file format and want to turn off the ADR-style log introduced in 11g? Well, it's really very simple.
Problem
Oracle introduced the Automatic Diagnostic Repository (ADR) with Oracle 11g
Release 1. This introduced some type of streamlining of various log and trace files
generated by different Oracle components such as the database, listener, ASM, etc.
This is why you don't find the alert log in the usual location specified by the familiar
background_dump_dest initialization parameter, but in a directory specified by a
different parameter: ADR_BASE. Similarly, listener logs now go in this format:
$ADR_BASE/tnslsnr/Hostname/listener/alert/log.xml
Remember, this is in XML format, not the usual listener.log. The idea was to
present the information in the listener log in a consistent, machine-readable format
instead of the cryptic, inconsistent older listener log format. Here is an
example of the new format:
<msg time='2013-03-31T13:17:22.633-04:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='oradba2'
host_addr='127.0.0.1' version='1'
>
<txt>31-MAR-2013 13:17:22 * service_update * D112D2 * 0</txt>
</msg>
<msg time='2013-03-31T13:17:25.317-04:00' org_id='oracle' comp_id='tnslsnr'
type='UNKNOWN' level='16' host_id='oradba2'
host_addr='127.0.0.1'
>
<txt>WARNING: Subscription for node down event still pending </txt>
</msg>
Being in XML format, the files can be read unambiguously by many tools, since the
data is enclosed within meaningful tags. Additionally, the listener log
file (the XML format) is now rotated: after reaching a certain threshold the
file is renamed to log_1.xml and a new log.xml is created, somewhat akin to the
archived log concept for redo log files.
While this proved useful for new tools, there were also myriad tools
that read the older log format perfectly. So Oracle didn't stop the practice of writing
the old format log. The old format log is still called listener.log, but the
directory it is created in is different: $ADR_BASE/tnslsnr/Hostname/listener/trace.
Unfortunately there is no archiving scheme for this file, so it simply keeps growing.
In the pre-11g days you could temporarily redirect the log to a different location and
archive the old one by setting the following parameter in listener.ora:
log_directory = tempLocation
However, in Oracle 11g R1 and beyond, this will not work; you can't set the location
of the log_directory.
Solution
So, what's the solution? Simple. Just set the following parameter in listener.ora:
diag_adr_enabled_listener = off
This will disable the ADR style logging for the listener. Now, suppose you want to set
the directory to /tmp and log file name to listener_0405.log, add the following into
listener.ora (assuming the name of the listener is "listener"; otherwise make the
necessary change below):
log_file_listener = listener_0405.log
log_directory_listener = /tmp
That's it. The ADR-style logging will be permanently gone and you will be reunited
with your much-missed pre-11g style logging. You can confirm it:
LSNRCTL> status
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     listener
Version                   TNSLSNR for Linux: Version 11.2.0.1.0 - Production
Start Date                26-NOV-2012 16:50:58
Uptime                    129 days 15 hr. 33 min. 31 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/oracle/product/11.2.0/grid/network/admin/listener.ora
Listener Log File         /tmp/listener_0405.log
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=EXTPROC1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oradba2)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
... output truncated ...
Happy logging.
P.S. By the way, you can also change the values by issuing set commands from
LSNRCTL command prompt:
LSNRCTL> set log_file '/tmp'
However, if you have heeded my earlier advice, you might have set
admin_restrictions to ON, so you can't use the set command. Instead, you would put the
value in listener.ora and reload the listener for the desired effect.
Why should you set the ADMIN_RESTRICTIONS_LISTENER to ON
Recently someone probably went through the slides of my session on "Real Life DBA
Best Practices" and had a question on OTN forum why I was recommending setting
the parameter to ON, as a best practice. I responded on the forum; but I feel it's
important enough to put it here as well.
As a best practice, I recommend setting this parameter to ON (the default is OFF).
But as I profess, a best practice is not one without a clear explanation. Here is the
explanation.
Over time, the Oracle Database has encountered several security
vulnerabilities, some of them in the listener. Some are related to buffer overflows;
others involve unauthorized access into the listener process itself. Some of the
listener access exploits come from external listener manipulations. Did you know
that you do not even need to log into a server to connect to the listener? As long as
the port the listener is listening on is open (and it will be, for obvious reasons) you
can connect to the listener from a remote server.
In 10g, Oracle provided a default mechanism that does not require password from
the oracle user manipulating the listener via online commands. Having said that,
there have been bugs and there will be. Those vulnerabilities usually get fixed later;
but most often the fix does not get to the software quickly enough.
So, what should you do to protect against these vulnerabilities? A simple
thing to do is to remove the possibility altogether, and that's where the admin
restrictions come into the picture. After setting this parameter, you can't dynamically
change listener parameters. So even if a connection is somehow made from an
outside server, bug or not, eliminating the possibility altogether mitigates the risk.
And that's why I recommend it.
Let's ponder the problem a little bit more. Is there a problem in setting the
parameter? Absolutely not. When you need to change a parameter, you simply log
on to the server, update listener.ora and issue "lsnrctl reload". This reloads the
parameter file dynamically. Since you never stopped the listener, you will not see
unsuccessful connection requests from clients. So, it is effectively dynamic. If you are
the oracle user, then you can log on to the server; so there is no issue there.
I advocate this policy, rather than dynamic parameter changes, for these simple
reasons:
(1) It plugs a potential hole due to remote listener vulnerability attacks, regardless
of the probability of that happening.
(2) It forces you to make changes to the listener.ora file, which records the timestamp.
(3) I ask my DBAs to put extensive comments in the parameter files, including the
listener.ora file, to explain each change. I also ask them to comment out the previous
line and create a new line with the new value, rather than updating a value directly.
This sort of documentation is a gem during debugging. Changing the parameter file
allows that, while a dynamic change does not.
So, I don't see a single functionality I lose by this practice; and I just showed you
some powerful reasons to adopt this practice. No loss, and some gain, however
small you consider that to be - and that's why I suggest it.
As I mentioned earlier, a best practice is not one without a clear explanation. I hope
this explanation makes it clear.
Quiz: Mystery of Create Table Statement
Happy Friday! I thought I would jumpstart your creative juices with this little, really
simple quiz. While it's trivial, it may not be that obvious to many. See if you can
catch it. Time yourself exactly 1 minute to get the answer. Tweet answer to me
@arupnanda
Here it goes. Database is 11.2.0.3. Tool is SQL*Plus.
The user ARUP owns a procedure that accepts an input string and executes it. Here
is the procedure.
create or replace procedure manipulate_arup_schema
(
p_input_string varchar2
)
is
begin
execute immediate p_input_string;
end;
/
The user ARUP has granted EXECUTE privileges on this to user SCOTT. The idea is
simple: SCOTT can create and drop tables and other objects in ARUP's schema
without requiring the dangerous CREATE ANY TABLE system privilege.
With this, SCOTT tries to create a table in the ARUP schema:
SQL> exec arup.manipulate_arup_schema ('create table abc (col1 number)')
PL/SQL procedure successfully completed.
The table creation was successful. Now SCOTT tries to create the table in a slightly
different manner:
SQL> exec arup.manipulate_arup_schema ('create table abc1 as select * from
dual');
Huh? After checking, you confirmed that the user indeed doesn't have a quota on
tablespace USERS, so the error is genuine; but how did the first table creation
command go through successfully?
Tweet me the answer @arupnanda. Aren't on Twitter? Just post the answer here as a
comment. I will post the answer right here in the evening. Let's see who posts the
first answer. It shouldn't take more than 5 minutes to get the answer.
Have fun.
Update at the end of the Day. Here is the answer:
Hashing
Here is a simple example. Suppose you are negotiating a rate for your babysitter
and you agreed on an amount - $123. Now you ask the sitter to tell your spouse
that amount. Well, how do you make sure she mentions that very amount?
After all, she has an incentive to say a higher amount, doesn't she? She could say that
you agreed on $125 or even $150; your spouse would not be able to ascertain that.
(Imagine for a moment that you don't have access to normal modern technology
like a cellphone for you to communicate directly with your spouse.) So you
develop a simple strategy: you come up with a formula that creates a number from
the amount. It could be as simple as, say, the total of all the digits. So your amount $123 becomes:
1+2+3=6
You write that down on a piece of paper, seal it in an envelope and ask the sitter to give it to
your spouse in addition to mentioning the agreed-upon amount. You and your spouse
both know this formula, but the sitter doesn't. Suppose she fudges the amount you
agreed on, making it, say, $125. Upon hearing it, your spouse computes the
magic number:
1+2+5=8
Your spouse compares this with the number inside the sealed envelope and
immediately comes to the conclusion that the amount you agreed on was something
different, not $125. The authenticity of the value is now definitively established to
be false.
This process is called hashing, and this magic number is called a hash value. Of
course, real hashing is much more complex than merely adding the digits; I
just wanted to show the concept with a very simple example. The mechanics of the
process, which here was simply adding up the digits, is known as the hashing algorithm.
Here are some properties of this hashing process:
(1) The process is one-way. You can determine the hashvalue by adding the digits
(1+2+3), but you can't determine the source number from the hashvalue (6). Your
spouse can't determine from the hashvalue what amount you agreed on. So it's not
the same as encryption, which allows you to decrypt the data and come up with the
source value.
(2) The purpose is not to store values; it's merely to establish authenticity. In
this example, your spouse determines that the amount mentioned by the babysitter
($125) must be wrong, because its hashvalue would have been 8, not 6. After
the authenticity is established (or rejected, as in this case), the purpose of the
hashvalue ceases to exist.
(3) The hashing function is deterministic, i.e. it will always come up with the same
value every time it is invoked against the same source value.
(4) What if the babysitter had mentioned $150? The hashvalue, in that case, would
have been 1+5+0 = 6, exactly the hashvalue computed by you. Your spouse would
then have accepted the value $150 as authentic, which would have been wrong. So
it's important that the hashvalue be as close to unique as possible, to reduce the
possibility of two different numbers producing the same result. Two inputs producing
the same hashvalue is known as a "collision".
The algorithm is the key to making sure the possibility of collisions is reduced. There
are several algorithms in use. Two very common ones are MD5 (Message Digest)
and SHA-1 (Secure Hash Algorithm).
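To make the toy example concrete, here is a small shell sketch of the digit-sum "hash" used above, including the collision between $123 and $150 (the function name is mine, not part of any standard tool):

```shell
# Toy hash: add up the digits of a number (the "algorithm" from the example).
digit_sum() {
  n=$1
  s=0
  while [ "$n" -gt 0 ]; do
    s=$(( s + n % 10 ))   # add the last digit
    n=$(( n / 10 ))       # drop the last digit
  done
  echo "$s"
}

digit_sum 123   # prints 6  (1+2+3)
digit_sum 125   # prints 8  (1+2+5)
digit_sum 150   # prints 6  (1+5+0): a collision with 123
```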
Since the source value can't be computed back from hash value, this is considered
by some as a more secure process than encryption. This process is useful in
situations where the reverse computation of values is not necessary; merely the
matching of hashvalues is needed. One such example is passwords. If you want to
establish that the password entered by the user matches the stored password, all
you have to do is generate the hashvalue and match that with the hashvalue stored
in the database. If they match, you establish that the password is correct; if not
then, well, it's not. This has an inherent security advantage. If someone somehow
manages to read the stored passwords, all that will be exposed are the hashvalues of
the passwords, not the actual passwords themselves. As we saw
earlier, it is practically impossible to recover the original password from the hashvalue.
That's why hashing is common in password storage.
Salt
So, that's great, with some higher degree of security for password store. What's the
problem?
The problem is that the hashvalues are way too predictable. Recall from the
previous section that the hashvalue of a specific input value is always the same.
Considering the simple hash function (adding digits), the input value $123 will
always return 6 as the hashvalue. Consider this: an adversary can see the
hashvalue and guess the input value, as shown below.
Is the input value $120? The hash value is 1+2+0 = 3, which does not match "6", so
it must not be the correct number.
Is it $121? Hash value of 121 is 1+2+1=4, different from 6; so this is not correct
either.
Is it $122? Hashvalue of 122 is 5; so not correct.
Is it $123? Hashvalue is 6. Bingo! The adversary now knows the input value.
In just 4 attempts the adversary figured out the input value from the hashvalue.
Consider this scenario for passwords. The adversary can see the password hash
(from which he can't decipher the password), but he can generate hashes from
multiple input strings and check which one matches the stored password
hashvalue. Using the computing power of modern computers, this becomes almost
trivial. So a hash value alone is not inherently secure.
What is the solution, then? What if the hash value were not so predictable? If the
hash value generated from an input value were different each time, it would be
impossible to match it against a stored value. This element of randomness in an
otherwise deterministic function is brought in by introducing a modifier to the process,
called a "salt". Like its real-life namesake, the salt adds spice to the hashvalue to give it
a unique "flavor", i.e. a different value. Here is an example where we are storing the
password value "Secret":
hash("Secret") = "X"
hash("Secret" + salt1) = "Y"
hash("Secret" + salt2) = "Z"
Every time a different salt is mixed into the input, a different value is produced, so
the stored values can't simply be matched against precomputed hashes of the
passwords.
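Here is a shell sketch, using sha1sum from GNU coreutils, of how mixing a random salt into the input changes the stored value (the exact salting scheme here is illustrative only, not any particular product's):

```shell
pass="Secret"

# Unsalted: the same input always produces the same SHA-1 value.
h_plain=$(printf '%s' "$pass" | sha1sum | awk '{print $1}')

# Salted: a random salt is prepended to the input, so the stored
# value differs from the plain hash (the salt is stored alongside it).
salt=$(od -An -N4 -tx1 /dev/urandom | tr -d ' \n')
h_salted=$(printf '%s%s' "$salt" "$pass" | sha1sum | awk '{print $1}')

echo "plain : $h_plain"
echo "salted: $h_salted"
```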
In case of LinkedIn, the passwords were stored without salt. Therefore it was easy
for the adversary to guess the passwords by creating SHA-1 hash values from
known words and comparing against the stored value. Here is a rough pseudo-code:
for w in ( ... list of words ... ) loop
  l_hash := hash(w);
  if l_hash = stored_value then
    show 'Bingo! The password is ' || w;
    exit;
  end if;
end loop;
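The same guessing loop can be sketched in the shell with sha1sum; the word list and the password "tiger" here are made up purely for the demonstration:

```shell
# Hash the adversary somehow read from the password store
# (here we just compute it ourselves for the demo).
stored=$(printf '%s' "tiger" | sha1sum | awk '{print $1}')

# Try each candidate word: hash it and compare with the stored value.
found=""
for w in lion puma tiger bear; do
  h=$(printf '%s' "$w" | sha1sum | awk '{print $1}')
  if [ "$h" = "$stored" ]; then
    found=$w
    break
  fi
done
echo "Bingo! The password is $found"
```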
The password is hashed and thus undecipherable, but we know that SCOTT's
password is "tiger." Therefore, the hash value for "tiger" when the userid is "SCOTT" is
F894844C34402B67. Now, if SCOTT's password changes, this hash value also
changes. You can then check the view DBA_USERS to see if SCOTT's stored hash
matches this value, which verifies the password as "tiger".
So how can an adversary use this information? It's simple. If he creates the user
SCOTT with the password TIGER, he will come to know the hash value stored in
the password column. He can then build a table of such accounts and the hashed
values of their passwords, and compare them against the password hashes stored in
the data dictionary. What's worse, he can create this user in any Oracle database,
not necessarily the one he is attacking right now.
This is why you must never use default passwords and easily guessed passwords.
Protection
Now that you know how adversaries use the password hash to guess passwords,
you should identify all such users and expire them, or force them to change their
passwords. How can you get a list of such users?
In Oracle Database 11g, this is easy, almost to the point of being trivial. The
database has a special view, dba_users_with_defpwd, that lists the usernames with
the default passwords. Here is an example usage:
select * from dba_users_with_defpwd;
USERNAME
------------------------------
DIP
MDSYS
XS$NULL
SPATIAL_WFS_ADMIN_USR
CTXSYS
OLAPSYS
OUTLN
OWBSYS
SPATIAL_CSW_ADMIN_USR
EXFSYS
ORACLE_OCM
output truncated
The output clearly shows the usernames that have the default password. You can
join this view with DBA_USERS to check on the status of the users:
select d.username, account_status
from dba_users_with_defpwd d, dba_users u
where u.username = d.username;
USERNAME                       ACCOUNT_STATUS
------------------------------ --------------------------------
PM                             EXPIRED & LOCKED
OLAPSYS                        EXPIRED & LOCKED
BI                             EXPIRED & LOCKED
SI_INFORMTN_SCHEMA             EXPIRED & LOCKED
OWBSYS                         EXPIRED & LOCKED
XS$NULL                        EXPIRED & LOCKED
ORDPLUGINS                     EXPIRED & LOCKED
APPQOSSYS                      EXPIRED & LOCKED
output truncated
Oracle 10g
What if you don't have Oracle 11g?
In January 2006, Oracle made a downloadable utility available for identifying default
passwords and their users. This utility, delivered via patch 4926128, is available on
My Oracle Support as described in document ID 361482.1. As of this writing, the
utility checks a handful of default accounts in a manner similar to that described
above; by the time you read this, however, its functionality may well have
expanded.
Security expert Pete Finnigan has done an excellent job of collecting all such default
accounts created during various Oracle and third-party installations, which he has
published for public use on his website, petefinnigan.com. Rather than reinventing the
wheel, we will use Pete's work and thank him profusely. I have changed his original
approach a little bit, though.
First, create a table to store the default accounts and default passwords:
CREATE TABLE osp_accounts (
product VARCHAR2(30),
security_level NUMBER(1),
username VARCHAR2(30),
password VARCHAR2(30),
hash_value VARCHAR2(30),
commentary VARCHAR2(200)
);
Then you can load the table using data collected by Pete Finnigan from many
sources. (Download the script here.) After the table is loaded, you are ready to
search for default passwords. I use a very simple SQL statement to find the
users:
col password format a20
col account_status format a20
col username format a15
select o.username, o.password, d.account_status
from dba_users d, osp_accounts o
where o.hash_value = d.password
/
USERNAME        PASSWORD             ACCOUNT_STATUS
--------------- -------------------- --------------------
CTXSYS          CHANGE_ON_INSTALL    OPEN
OLAPSYS         MANAGER              OPEN
DIP             DIP                  EXPIRED & LOCKED
DMSYS           DMSYS                OPEN
EXFSYS          EXFSYS               EXPIRED & LOCKED
SYSTEM          ORACLE               OPEN
WMSYS           WMSYS                EXPIRED & LOCKED
XDB             CHANGE_ON_INSTALL    EXPIRED & LOCKED
OUTLN           OUTLN                OPEN
SCOTT           TIGER                OPEN
SYS             ORACLE               OPEN
Here you can see some of the most vulnerable situations, especially the last line,
where the username is SYS and the password is "ORACLE" (as is SYSTEM's)!
It may not be "change_on_install", but it's just as predictable.
Action Items
Now that you know how one adversary used the salt-less hashing algorithm to
guess passwords, you have some specific actions to take.
(1) Advocate the use of non-dictionary words as passwords. Remember, the adversary
can hash candidate passwords and compare the resultant hashes against the stored
hash to see if they match. Making the input values impossible to guess makes that
comparison useless.
(2) Immediately check the database for users with default passwords. Either
change the passwords, or expire and lock those accounts.
(3) Whenever you use hashing (and not encryption), use a salt, to make sure it is
difficult, if not impossible, for the adversary to guess the input.
With these two differentiators in place, the tool has a great future. Check out everything on this tool at http://www.oracle.com/technetwork/database/globalization/dmu/overview/index330958.html or just visit the booth at #OOW Demogrounds in Moscone South.
Oh, did I mention that the tool is free?
Suppose a directory contains file1 (owner ananda, group users, 70 bytes) and five
other files, file2 through file6 (owner oracle, group dba, 132 bytes each), all dated
Aug 4 04:02,
and you need to change the permissions of all the files to match those of file1. Sure, you could
issue chmod 644 * to make that change; but what if you are writing a script to do that and you
don't know the permissions beforehand? Or perhaps you are making several permission changes
based on many different files, and you find it infeasible to go through the permissions of each
of those and modify them accordingly.
A better approach is to make the permissions similar to those of another file. This command
makes the permissions of file2 the same as file1:
# chmod --reference file1 file2
# ls -l file[12]
-rw-r--r--    1 ananda   users          70 Aug  4 04:02 file1
-rw-r--r--    1 oracle   dba           132 Aug  4 04:02 file2
The file2 permissions were changed exactly as in file1. You didn't need to get the permissions of
file1 first.
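A minimal, self-contained sketch of that scripted use (GNU coreutils syntax; the temp files are created just for the demo):

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/file1" "$tmpdir/file2"
chmod 644 "$tmpdir/file1"    # the reference permissions
chmod 600 "$tmpdir/file2"    # some other, unknown-to-the-script permissions

# Copy file1's mode to file2 without ever reading it explicitly.
chmod --reference="$tmpdir/file1" "$tmpdir/file2"

mode=$(stat -c '%a' "$tmpdir/file2")
echo "$mode"    # prints 644
rm -rf "$tmpdir"
```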
You can also use the same trick in group membership in files. To make the group of file2 the
same as file1, you would issue:
# chgrp --reference file1 file2
# ls -l file[12]
-rw-r--r--    1 ananda   users          70 Aug  4 04:02 file1
-rw-r--r--    1 oracle   users         132 Aug  4 04:02 file2
Of course, what works for changing groups will work for owner as well. Here is how you can use
the same trick for an ownership change. If permissions are like this:
# ls -l file[12]
-rw-r--r--    1 ananda   users          70 Aug  4 04:02 file1
-rw-r--r--    1 oracle   dba           132 Aug  4 04:02 file2
you can make file2's ownership match file1's:
# chown --reference file1 file2
# ls -l file[12]
-rw-r--r--    1 ananda   users          70 Aug  4 04:02 file1
-rw-r--r--    1 ananda   users         132 Aug  4 04:02 file2
This is a trick you can use to change ownership and permissions of Oracle executables in a
directory based on some reference executable. This proves especially useful in migrations where
you can (and probably should) install as a different user and later move them to your regular
Oracle software owner.
More on Files
The ls command, with its many arguments, provides some very useful information on files. A
different, less well-known command, stat, offers even more useful information.
Here is how you can use it on the executable oracle, found under $ORACLE_HOME/bin.
# cd $ORACLE_HOME/bin
# stat oracle
  File: `oracle'
  Size: 93300148        Blocks: 182424         IO Block: 4096   Regular File
Device: 343h/835d       Inode: 12009652        Links: 1
Access: (6751/-rwsr-s--x)  Uid: (  500/  oracle)   Gid: (  500/     dba)
Access: 2006-08-04 04:30:52.000000000 -0400
Modify: 2005-11-02 11:49:47.000000000 -0500
Change: 2005-11-02 11:55:24.000000000 -0500
Note the information you got from this command: In addition to the usual filesize (which you
can get from ls -l anyway), you got the number of blocks this file occupies. The typical Linux
block size is 512 bytes, so a file of 93,300,148 bytes would occupy (93300148/512=) 182226.85
blocks. Since blocks are used in full, this file uses some whole number of blocks. Instead of
making a guess, you can just get the exact blocks.
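The rounding described above can be checked with shell arithmetic; note that the Blocks value actually reported by stat (182424) is a bit higher than this minimum, since the filesystem may allocate extra blocks for metadata such as indirect pointers:

```shell
size=93300148
# Round up to whole 512-byte blocks: ceil(size / 512)
min_blocks=$(( (size + 511) / 512 ))
echo "$min_blocks"   # prints 182227
```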
You also get from the output above the GID and UID of the ownership of the file and the octal
representation of the permissions (6751). If you want to reinstate it back to the same permissions
it has now, you could use chmod 6751 oracle instead of explicitly spelling out the permissions.
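For example, here is a sketch of capturing the current octal mode with stat and reinstating it later (the temp file and modes are made up for the demo; stat -c is the GNU coreutils syntax):

```shell
f=$(mktemp)
chmod 751 "$f"

saved=$(stat -c '%a' "$f")   # capture the octal mode, e.g. "751"
chmod 600 "$f"               # change it temporarily
chmod "$saved" "$f"          # reinstate the captured mode

restored=$(stat -c '%a' "$f")
echo "$restored"             # prints 751
rm -f "$f"
```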
The most useful part of the above output is the file access timestamp information. It shows you
that the file was accessed on 2006-08-04 04:30:52 (as shown next to Access:), or August 4,
2006 at 4:30:52 AM. This is when someone started to use the database. The file was modified on
2005-11-02 11:49:47 (as shown next to Modify:). Finally, the timestamp next to Change:
shows when the status of the file was changed.
The -f modifier to the stat command shows information on the filesystem instead of the file:
# stat -f oracle
  File: "oracle"
    ID: 0        Namelen: 255     Type: ext2/ext3
Blocks: Total: 24033242   Free: 15419301   Available: 14198462   Size: 4096
Inodes: Total: 12222464   Free: 12093976
Another option, -t, gives exactly the same information but on one line:
# stat -t oracle
oracle 93300148 182424 8de9 500 500 343 12009652 1 0 0 1154682061 1130950187 1130950524 4096
This is very useful in shell scripts where a simple cut command can be used to extract the values
for further processing.
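For instance, here is a sketch of extracting the size field (the second space-separated field of stat -t output) in a script; the temp file is created just for the demo:

```shell
f=$(mktemp)
printf 'hello' > "$f"        # a 5-byte file

# stat -t prints: name size blocks ... all on one line;
# cut pulls out the second field, the size in bytes.
size=$(stat -t "$f" | cut -d' ' -f2)
echo "$size"                 # prints 5
rm -f "$f"
```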
Tip for Oracle Users
When you relink Oracle (often done during patch installations), it moves the existing executables
to a different name before creating the new ones. For instance, you could relink all the utilities by
issuing:
relink utilities
This recompiles, among other things, the sqlplus executable. It moves the existing executable sqlplus
to sqlplusO. If the recompilation fails for some reason, the relink process renames sqlplusO back to
sqlplus and the changes are undone. Similarly, if you discover a functionality problem after
applying a patch, you can quickly undo the patch by renaming the file yourself.
# stat sqlplusO
  File: `sqlplusO'
  Size: 8851         Blocks: 24        IO Block: 4096   Regular File
Device: 343h/835d    Inode: 9125991    Links: 1
Access: (0751/-rwxr-x--x)  Uid: ( 500/  oracle)   Gid: ( 500/     dba)
Access: 2005-11-02 11:55:24.000000000 -0500
Modify: 2005-11-02 11:50:46.000000000 -0500
Change: 2006-08-04 05:13:57.000000000 -0400
It shows sqlplusO was modified on November 2, 2005, while sqlplus was modified on August 4, 2006, which also corresponds to the status change time of sqlplusO. It indicates that the original version of sqlplus was in effect from Nov 2, 2005 to Aug 4, 2006. If you want to diagnose some functionality issue, this is a great place to start. In addition to the content changes, since you know when the permissions were changed, you can correlate that time with any perceived functionality issues.
Another important output is the size of the file, which differs: 9865 bytes for sqlplus as opposed to 8851 bytes for sqlplusO, indicating that the versions are not mere recompiles; they actually changed, perhaps with additional libraries linked in. This also points to a potential cause of some problems.
File Types
When you see a file, how do you know what type of file it is? The command file tells you that.
For instance:
# file alert_DBA102.log
alert_DBA102.log: ASCII text
The file alert_DBA102.log is an ASCII text file. Let's see some more examples:
# file initTESTAUX.ora.Z
initTESTAUX.ora.Z: compress'd data 16 bits
This tells you that the file is a compressed file, but how do you know the type of the file that was compressed? One option is to uncompress it and run file against the result; but then you would have to uncompress it first, defeating the purpose. A cleaner option is to use the parameter -z:
# file -z initTESTAUX.ora.Z
initTESTAUX.ora.Z: ASCII text (compress'd data 16 bits)
This is useful; but suppose the file is a symbolic link: what type of file does it point to? Instead of running file against the target yourself, you can use the option -L to follow the link:
# file -L spfile+ASM.ora.ORIGINAL
spfile+ASM.ora.ORIGINAL: data
This clearly shows that the file contains binary data. Note that the spfile is a binary file, as opposed to the text-based init.ora; so file reports it as data.
Tip for Oracle Users
Suppose you are looking for a trace file in the user dump destination directory but are unsure if
the file is located on another directory and merely exists here as a symbolic link, or if someone
has compressed the file (or even renamed it). There is one thing you know: it's definitely an ASCII file. Here is what you can do:
file -Lz * | grep ASCII | cut -d":" -f1 | xargs ls -ltr
This command checks the ASCII files, even if they are compressed, and lists them in
chronological order.
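A minimal sketch of that pipeline, using made-up filenames in a scratch directory (on a real system you would run it in the user dump destination directory instead):

```shell
cd "$(mktemp -d)"                                    # scratch directory
printf 'ORA-00600: some trace text\n' > demo.trc     # a plain ASCII "trace" file
printf 'more trace text\n' | gzip > old.trc.gz       # a compressed ASCII file
printf '\000\001\002' > notes.dat                    # a binary file, for contrast
# keep only the ASCII files (following links, peeking inside compressed ones)
file -Lz * | grep ASCII | cut -d":" -f1
```

Only demo.trc and old.trc.gz survive the filter; the binary file is screened out.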
Comparing Files
How do you find out if two files, file1 and file2, are identical? There are several ways, and each approach has its own appeal.
diff. The simplest command is diff, which shows the difference between two files. Here are the contents of the two files:
# cat file1
In file1 only
In file1 and file2
# cat file2
In file1 and file2
In file2 only
If you use the diff command,
you will be able to see the difference between the files as shown
below:
# diff file1 file2
1d0
< In file1 only
2a2
> In file2 only
#
In the output, a "<" in the first column indicates that the line exists only in the file mentioned first, that is, file1. A ">" in that place indicates that the line exists only in the second file (file2). The characters 1d0 in the first line of the output are ed-style editing instructions describing what must be done to file1 to make it the same as file2.
Another option, -y, shows the same output, but side by side:
# diff -y file1 file2 -W 120
In file1 only                     <
In file1 and file2                   In file1 and file2
                                  >  In file2 only
The -W option is optional; it merely instructs the command to use a 120-character-wide screen, useful for files with long lines.
If you just want to know whether the files differ, not necessarily how, you can use the -q option.
# diff -q file3 file4
# diff -q file3 file2
Files file3 and file2 differ
Files file3 and file4 are the same so there is no output; in the other case, the fact that the files
differ is reported.
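In a script you rarely need the message at all: diff sets an exit code (0 when the files match, 1 when they differ, 2 on trouble such as a missing file), which a conditional can test directly. A small sketch with throwaway files:

```shell
cd "$(mktemp -d)"
printf 'same content\n' > file3
cp file3 file4                      # identical copy
printf 'different\n'  > file2       # differing file
# exit code 0 = identical, 1 = different; the message itself is discarded
diff -q file3 file4 > /dev/null && echo "file3 file4: same"
diff -q file3 file2 > /dev/null || echo "file3 file2: differ"
```

This is the usual way to drive a branch in a shell script without parsing any text.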
If you are writing a shell script, it might be useful to produce the output in such a manner that it
can be parsed. The -u option does that:
# diff -u file1 file2
--- file1       2006-08-04 08:29:37.000000000 -0400
+++ file2       2006-08-04 08:29:42.000000000 -0400
@@ -1,2 +1,2 @@
-In file1 only
 In file1 and file2
+In file2 only
The output shows the contents of both files but suppresses duplicates; the + and - signs in the first column indicate lines present in only one of the files. No character in the first column indicates presence in both files.
The command takes whitespace into consideration. If you want to ignore whitespace, use the -b option. Use the -B option to ignore blank lines. Finally, use -i to ignore case.
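Returning to the unified format for a moment: since every changed line starts with a single + or -, a script can count insertions and deletions with grep. A sketch with throwaway files:

```shell
cd "$(mktemp -d)"
printf 'In file1 only\nIn file1 and file2\n' > file1
printf 'In file1 and file2\nIn file2 only\n' > file2
# ^+[^+] and ^-[^-] skip the "+++"/"---" header lines of the unified format
added=$(diff -u file1 file2 | grep -c '^+[^+]')
removed=$(diff -u file1 file2 | grep -c '^-[^-]')
echo "added=$added removed=$removed"
```

This kind of counting is handy, for example, when summarizing how much two init.ora versions drifted apart.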
The diff command can also be applied to directories. The command
diff dir1 dir2
shows the files present in either directory, whether a file exists in only one of the directories or in both. If it finds a subdirectory of the same name in both, it does not go down to check whether any individual files differ. Here is an example:
# diff DBA102 PROPRD
Common subdirectories: DBA102/adump and PROPRD/adump
Only in DBA102: afiedt.buf
Only in PROPRD: archive
Only in PROPRD: BACKUP
Only in PROPRD: BACKUP1
Only in PROPRD: BACKUP2
Note that the common subdirectories are simply reported as such but no comparison is made. If
you want to drill down even further and compare files under those subdirectories, you should use
the following command:
diff -r dir1 dir2
This command recursively goes into each subdirectory to compare the files and reports the
difference between the files of the same names.
Tip for Oracle Users
One common use of diff is to differentiate between different versions of init.ora files. As a best practice, I always copy the file to a new name, e.g., initDBA102.ora to initDBA102.080306.ora (to indicate August 3, 2006), before making a change. A simple diff between all versions of the file tells quickly what changed and when.
This is a pretty powerful command to manage your Oracle Home. As a best practice, I never update an Oracle Home when applying patches. For instance, suppose the current Oracle version is 10.2.0.1. The ORACLE_HOME could be /u01/app/oracle/product/10.2/db1. When the time comes to patch it to 10.2.0.2, I don't patch this Oracle Home. Instead, I start a fresh installation on /u01/app/oracle/product/10.2/db2 and then patch that home. Once it's ready, I shut the database down, point the environment to the new home, and start it with:
# sqlplus / as sysdba
and so on.
The purpose of this approach is that the original Oracle Home is not disturbed and I can easily fall back in case of problems. It also means the database is down and up again pretty much immediately. If I installed the patch directly on the Oracle Home, I would have had to shut the database down for a long time, for the entire duration of the patch application. In addition, if the patch application had failed for any reason, I would not have a clean Oracle Home.
Now that I have several Oracle Homes, how can I see what changed? It's really simple; I can use:
diff -r /u01/app/oracle/product/10.2/db1 /u01/app/oracle/product/10.2/db2 |
grep -v Common
This tells me the differences between the two Oracle Homes and the differences between the files
of the same name. Some important files like tnsnames.ora, listener.ora, and sqlnet.ora should not
show wide differences, but if they do, then I need to understand why.
cmp. The command cmp is similar to diff:
# cmp file1 file2
file1 file2 differ: byte 10, line 1
The output reports the first point of difference. You can use this to identify where the files might be different. Like diff, cmp has a lot of options, the most important being the -s option, which produces no output but merely returns an exit code: 0, if the files are identical; 1, if they differ; 2 on trouble, such as a missing file.
Here is an example:
# cmp -s file3 file4
# echo $?
0
The special variable $? indicates the return code from the last executed command. In this case it's 0, meaning the files file3 and file4 are identical.
# cmp -s file1 file2
# echo $?
1
Here the return code is 1, since file1 and file2 differ.
Summary of Commands in This Installment

Command   Use
--------  ---------------------------------------------------------------
chmod     To change permissions of a file, using the --reference parameter
chown     To change the owner of a file, using the --reference parameter
chgrp     To change the group of a file, using the --reference parameter
stat      To see the extended attributes of a file or filesystem, such as
          timestamps, block counts, and permissions
file      To see the type of a file, such as ASCII text or binary data
diff      To see the difference between two files
cmp       To compare two files
comm      To see what's common between two files, with the output in three
          columns
md5sum    To calculate the MD5 checksum of a file
md5sum. This command generates a 128-bit MD5 hash value of a file, displayed as 32 hexadecimal characters:
# md5sum file1
ef929460b3731851259137194fe5ac47
file1
Two files with the same checksum can be considered identical. However, the usefulness of this
command goes beyond just comparing files. It can also provide a mechanism to guarantee the
integrity of the files.
Suppose you have two important files, file1 and file2, that you need to protect. You can use the --check option to confirm the files haven't changed. First, create a checksum file for both these important files and keep it safe:
# md5sum file1 file2 > f1f2
Later, when you want to verify that the files are still untouched:
# md5sum --check f1f2
file1: OK
file2: OK
This shows clearly that the files have not been modified. Now change one file and check the
MD5:
# cp file2 file1
# md5sum --check f1f2
file1: FAILED
file2: OK
md5sum: WARNING: 1 of 2 computed checksums did NOT match
Along the same lines, you can also create MD5 checksums for all the executables in $ORACLE_HOME/bin and compare them from time to time to detect unauthorized modifications.
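That idea can be sketched like this; a scratch directory stands in for the real $ORACLE_HOME/bin to keep the example self-contained:

```shell
bindir=$(mktemp -d)                      # stands in for $ORACLE_HOME/bin here
printf 'executable one' > "$bindir/tool1"
printf 'executable two' > "$bindir/tool2"
# take the baseline once, and store the checksum file somewhere safe
md5sum "$bindir"/* > /tmp/oh_bin.md5
# later: verify that nothing has been modified since the baseline
md5sum --check /tmp/oh_bin.md5
```

Any tampered executable would show up as FAILED in the verification pass, just as in the file1/file2 example above.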
Conclusion
Thus far you have learned only some of the Linux commands you will find useful for performing
your job effectively. In the next installment, I will describe some more sophisticated but useful
commands, such as strace, whereis, renice, skill, and more.
Arup Nanda ( arup@proligence.com ) has been an Oracle DBA for more than 12 years, handling all aspects of database administration, from performance tuning to security and disaster recovery. He is a coauthor of PL/SQL for DBAs (O'Reilly Media, 2005), was Oracle Magazine's DBA of the Year in 2003, and is an Oracle ACE.
The Problem
It seems really simple. We have an Oracle database (on all nodes of a full-rack Exadata, to be exact), which a lot of end users connect to through apps designed in a rather ad hoc and haphazard manner: Excel spreadsheets, Access forms, TOAD reports, and other assorted tools. We want to control the access from these machines and streamline it.
The database machine sits behind a firewall. Allowing the ad hoc tools to access the database from the client machines means we have to change the firewall rules. Had it been one or two clients, that would have been reasonable; but with 1000+ client machines, it becomes impractical.
So I was asked to provide an alternative solution.
The Solution
This is not a unique problem; it arises whenever machines need to access resources that sit across firewalls. The easy solution is to punch a hole through the firewall to allow that access; but that is not desirable, for obvious security reasons. A better solution, often implemented, is to have a proxy server. The proxy sits between the two layers of access and can reach the servers behind the firewall. Clients make their requests to the proxy, which passes them on to the server.
Such a proxy solves the problem; but we are looking for a simpler solution. Does one exist?
Yes, it does. The answer is Connection Manager from Oracle. Among its many functions, one stands out: it acts as a proxy between the different layers of access and passes through the requests. It's not a separate product; it's an option in the Oracle Client software (not the database or grid infrastructure software). This option is not automatically installed: when installing the client software, choose "Custom" and explicitly select "Connection Manager" from the list.
The Architecture
Let's quickly go through the architecture of the tool. Assume there are three hosts:
client1 - the machine where the client runs and wants to connect to the database
cmhost1 - the machine running the Connection Manager processes
dbhost1 - the database server the client ultimately needs to reach
Ordinarily, the client would use a TNS alias pointing straight at the database host:
TNS_REG =
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)(HOST = dbhost1)(PORT = 1521)
)
(CONNECT_DATA =
(SERVICE_NAME=srv1)
)
)
However, this will not work, since the client does not even know the routing for dbhost1. Instead, the client connects to the CM host. Connection Manager has two processes: the gateway (CMGW) and the admin (CMADMIN) process. The CMGW processes let the client connections come in through them; the admin process manages the gateways. We will cover more on that later.
After setting up the CM processes, you will need to rewrite the TNSNAMES.ORA in the
following way:
TNS_CM =
(DESCRIPTION =
(SOURCE_ROUTE = YES)
(ADDRESS =
(PROTOCOL = TCP)(HOST = cmhost1)(PORT = 1950)
)
(ADDRESS =
(PROTOCOL = TCP)(HOST = dbhost1)(PORT = 1521)
)
(CONNECT_DATA =
(SERVICE_NAME=srv1)
)
)
How it Works
Note the special parameter:
SOURCE_ROUTE = YES
This tells the client to attempt the addresses in the order listed: the first address first, and only then the next one. This is different from load-balancing setups, where you would expect the client tool to pick one of the addresses at random. So the client attempts this address first:
(PROTOCOL = TCP)(HOST = cmhost1)(PORT = 1950)
This is the listener for Connection Manager. The clients are allowed to connect to the port 1950
(the port where CM listener listens on) on the host cmhost1.
After that the connection attempts the second address
(PROTOCOL = TCP)(HOST = dbhost1)(PORT = 1521)
Left to itself, this attempt would fail, since the client does not have access to port 1521 of the host dbhost1.
This is where CM comes in. The client does not make the request; CM does on behalf of the
client connection that just came in. The connection manager (running on cmhost1) makes the
request with that address. Since the host cmhost1 can access dbhost1 on port 1521, that
connection request goes through successfully. When the response comes back from the database,
CM passes it back to the original client.
A single CM connection can handle many client connection requests.
Setting Up
Now that you know how CM works, let's see how to enable it, step by step.
(1) Install CM, if you don't have it already. Check for a file cmctl under $ORACLE_HOME/bin. If you have it, CM may already be installed. If not, install CM by running the installer from the Oracle Client (not Database or Grid Infrastructure) software. Choose Custom Install and explicitly select Connection Manager.
(2) Go to $OH/network/admin (remember the $OH of the client software home; not the database
home)
(3) You need to create a configuration file called cman.ora. Instead of creating it from scratch, go
to the samples subdirectory and copy the cman.ora sample file back into the admin directory.
(4) In the file cman.ora, make the changes to the following lines. Of course, I assumed cmhost1 as the server running the Connection Manager process; substitute whatever name you choose for the CM server. I also assumed you would use port 1950 for the CM listener. It does not have to be that port, but whatever you choose will need to be opened in the firewall.
The lines you are changing will be at the beginning and the end of the cman.ora file.
cman_cmhost1 =
(configuration=
(address=(protocol=tcp)(host=cmhost1)(port=1950))
(parameter_list =
...
...
...
# conn_stats = connect_statistics
(rule_list=
(rule=
(src=*)(dst=*)(srv=*)(act=accept)
(action_list=(aut=off)(moct=0)(mct=0)(mit=0)(conn_stats=on))
)
)
)
Keep the remaining lines as is, for now. I will explain the meaning of these parameters later.
(5) Start the CM command-line interface by executing "cmctl":
# cmctl
CMCTL for Linux: Version 11.2.0.1.0 - Production on 29-AUG-2011 15:16:01
Copyright (c) 1996, 2009, Oracle.
Trace Level               OFF
Instance Config file      /opt/oracle/product/11gR2/client1/network/admin/cman.ora
Instance Log directory    /opt/oracle/product/11gR2/client1/network/log
Instance Trace directory  /opt/oracle/product/11gR2/client1/network/trace
If it does not start, refer to the troubleshooting section later in this blog.
(9) Make the TNSNAMES.ORA file change at the client as shown earlier.
(10) Make the connection using this new TNS connection alias:
C:\> sqlplus arup/arup@TNS_CM
You should be able to connect to the database server now. Note, you still can't access the
database host directly. If you use the regular TNS connect string - TNS_REG - you will fail. This
new connection was established through the connection manager.
(11) Check the number of connections coming through the CM, using CMCTL tool:
CMCTL:CMAN_cmhost1> show connections
Number of connections: 1.
The command completed successfully.
The output shows there is one connection through the CM listener. As you connect more, you
will see the number next to "Number of connections:" increasing.
That's it. You have successfully configured Connection Manager interface.
Fine Tuning
In the previous setup I asked you to enter some values without really explaining their significance. Let's go through them now.
One of the powerful features of the CM interface is that it can act as a sort of firewall, i.e. allow connections only from/to certain hosts and for specific services. You define these rules inside the rule_list section, as shown below:
(rule_list=
(rule=
(src=x)(dst=x)(srv=*)(act=accept)
(action_list=(aut=off)(moct=0)(mct=0)(mit=0)(conn_stats=on))
)
)
Here are the parameters and what they mean:
src = the source server the connection request comes from. If you want to leave it unrestricted, use "*" as a wildcard.
dst = the destination server, typically the database server the request goes to. Again, unrestricted access is given as "*".
For the src and dst parameters you can give hostnames or IP addresses, as well as wildcards. You would use this section to allow or deny access between specific servers, making it a really powerful firewall-like tool.
The action_list parameter allows you to fine tune the actions on the connection.
aut = whether the Oracle Advanced Security Option authentication filter should be applied. The value shown here is off, meaning it is not to be applied.
moct = the maximum outbound connect time: how long before an outbound connection attempt times out. The value set here is 0, meaning it is never timed out.
mct = the maximum connect time: how long before the session is disconnected. The value is 0, i.e. never.
mit = the maximum idle time: how long an idle connection is allowed before being disconnected; again, 0 means never.
conn_stats = whether connection statistics are recorded in the log; here it is set to on.
Note the use of parentheses. You can use different rules and actions for each combination of sources and destinations, which allows you to fine-tune the access. For instance, suppose database D1 is highly secure and you want the ASO filter there, but not on database D2. For requests coming from the same client, you can have a different set of actions for each destination; for D1, the more secure database host, you can also establish various timeouts.
CMCTL Primer
Now that you know about the CMAN.ORA file, let's see the activities you can perform in CMCTL. The first command you should explore is "help":
CMCTL> help
The following operations are available
An asterisk (*) denotes a modifier or extended command:

administer     close*        exit          quit
reload         resume*       save_passwd   set*
show*          shutdown      sleep         startup
suspend*

The show* and set* commands accept further modifiers, among them: connections, parameters, version, defaults, rules, events, services, connection_statistics, idle_timeout, log_directory, outbound_connect_timeout, session_timeout, and trace_level.
Most of these are self-explanatory: show status shows the status of CM, show connections shows the connections established through CM, and so on.
Troubleshooting
Of course, things may not go well the first time. Don't despair: you can perform extensive diagnostics and enable logging and tracing. The most common errors tend to show up during startup.
Takeaways
Connection Manager is a great product from the Oracle Net family that can, among many other things, act as a connection concentrator for multiple client requests, as a rule-based mini-firewall for database requests, and as a proxy between different access domains. Here you learned how to set it up, fine-tune its parameters, and manage it effectively.
Hope you liked it. As always, please provide your feedback.
Difference between Select Any Dictionary and Select_Catalog_Role
When you want to give a user the privilege to select from data dictionary and dynamic performance views such as V$DATAFILE, you have two options:
grant the SELECT ANY DICTIONARY system privilege
grant the SELECT_CATALOG_ROLE role
Did you ever wonder why there are two options for accomplishing the same
objective? Is one of them redundant? Won't it make sense for Oracle to have just
one privilege? And, most important, do these two privileges produce the same
result?
The short answer to the last question is: no, these two do not produce the same result. Since they are fundamentally different, there is a place for each of them; one is not a replacement for the other. In this blog I will explain the subtle but important differences between the two seemingly similar privileges and how to use them properly.
Create the Test Case
First let me demonstrate the effects by a small example. Create two users called
SCR and SAD:
SQL> create user scr identified by scr;
SQL> create user sad identified by sad;
Grant the necessary privileges to these users, taking care to grant a different one to each user.

SQL> grant create session, select any dictionary to sad;

Grant succeeded.

SQL> grant create session, select_catalog_role to scr;

Grant succeeded.
Both users have the privilege to select from the dictionary views as we expected.
So, what is the difference between these two privileges? To understand that, let's create a procedure referencing a dictionary view in each schema. Since we will create the same procedure twice, let's first put it in a script, which we will call p.sql; the script creates a procedure that selects from V$SESSION. When you connect as SAD and run the script, the procedure compiles cleanly:

Procedure created.

But when you connect as SCR and execute the same script:
SQL> @p.sql
LINE/COL ERROR
-------- ------------------------------------------------
4/2      PL/SQL: SQL Statement ignored
6/7      PL/SQL: ORA-00942: table or view does not exist
That must be perplexing. We just saw that the user has the privilege to select from the V$SESSION view; you can double-check that by selecting from the view one more time. So why did it report ORA-00942: table or view does not exist?
Not All Privileges have been Created Equal
The answer lies in the way Oracle performs compilations. To compile code referencing a named object, the user must have been granted privileges on that object by direct grants, not through roles. Selecting or performing DML does not care how the privileges were received; the SQL will work as long as the privileges are there. But compilation does care. The privilege SELECT ANY DICTIONARY is a system privilege, similar to CREATE SESSION or UNLIMITED TABLESPACE, so it counts during compilation as well. This is why the user SAD, which had the system privilege, could successfully compile the procedure P.
The user SCR had the role SELECT_CATALOG_ROLE, which allowed it to select from V$SESSION but not to create the procedure. Remember, to create an object on a base object, the user must have a direct grant on the base object, not one received through a role. Since SCR had the role but not a direct grant, it can't compile the procedure.
So while both the privileges allow the users to select from v$datafile, the role does
not allow the users to create objects; the system privilege does.
Why the Role?
Now that you know how the privileges differ, you may be wondering why the role is there at all. It seems that the system grant can do everything and there is no need for a role. Not quite: the role has a very different purpose. Roles provide privileges, but only when they are enabled. To see which roles are enabled in a session, use this query:

SQL> select role from session_roles;

ROLE
------------------------------
SELECT_CATALOG_ROLE
HS_ADMIN_SELECT_ROLE
2 rows selected.
Just because a role was granted to the user does not necessarily mean that the role is enabled. The roles marked DEFAULT for the user are enabled automatically; the others are not. Let's see that with an example. As the SYS user, execute the following:

SQL> alter user scr default role none;

User altered.

Now connect as the SCR user and check which roles have been enabled:

SQL> select role from session_roles;

no rows selected
None of the roles has been enabled. Why? Because none of them is a default role for the user any longer, the effect of the alter user statement issued by SYS. At this point, if you select from a data dictionary or dynamic performance view, you will get the ORA-00942 error, because the role is not enabled, or active. Without the role, the user does not have any privilege to select from the data dictionary or dynamic performance views. To enable the role, the user has to execute the SET ROLE command:

SQL> set role select_catalog_role;

Role set.
SQL> select role from session_roles;

ROLE
------------------------------
SELECT_CATALOG_ROLE
HS_ADMIN_SELECT_ROLE

2 rows selected.
Now the roles have been enabled. Since the roles are not default, the user must explicitly enable them using the SET ROLE command. This is a very important characteristic of roles: we can control how the user gets the privilege. Merely granting a role to a user does not enable the role; the user's action is required, and that action can be performed programmatically. In security-conscious environments, you may want to take advantage of that property: the user does not hold the privilege at all times, but can acquire it when needed.
The SET ROLE command is a SQL statement; to issue it from within PL/SQL, use:
begin
   dbms_session.set_role ('SELECT_CATALOG_ROLE');
end;
/
You can also set a password for the role, so that it can be enabled only when the correct password is given. For example (the password here is just an illustration):

SQL> alter role select_catalog_role identified by s3cr3t;

Role altered.

SQL> set role select_catalog_role identified by s3cr3t;

Role set.
You can also revoke the execute privilege on dbms_session from public. After that
the user will not be able to use it to set the role. You can construct another wrapper
procedure to call it. Inside the wrapper, you can have all sorts of checks and balances to make sure the call is acceptable.
We will close this discussion with a tip: how do you know which roles are default? Simply query DBA_ROLE_PRIVS for the grantee:

SQL> select granted_role, default_role
  2  from dba_role_privs
  3  where grantee = 'SCR';

GRANTED_ROLE                   DEF
------------------------------ ---
SELECT_CATALOG_ROLE            NO
Conclusion
In this blog entry I started with a simple question: what is the difference between two seemingly similar privileges, SELECT ANY DICTIONARY and SELECT_CATALOG_ROLE? The former is a system privilege, which remains active throughout the session and allows the user to create stored objects on the objects the grant covers. The latter is not a system grant; it's a role, and it does not allow the grantee to build stored objects on the granted objects. The role can also be non-default, which means the grantee must execute SET ROLE or an equivalent command to enable it. The role can also be password protected, if desired.
The core message you should get from this is that roles are different from privileges.
Privileges allow you to build stored objects such as procedures on the objects on
which the privilege is based. Roles do not.
Nulls in Ordering
You want to find out the tables with the highest number of rows in a database.
Pretty simple, right? You whip up the following query:
select owner, table_name, num_rows
from dba_tables
order by num_rows desc;
Whoa! The NUM_ROWS column comes up with blanks; actually, they are nulls. Why are they coming up first? Because these tables have not been analyzed. CRM_ETL seems like an ETL user; the tables with GTT_ in their names seem to be global temporary tables, hence there are no statistics. The others belong to SYS and SQLTXPLAIN, which are Oracle default users and probably never analyzed. Nulls are not comparable to actual values, so they are neither less than nor greater than the others; Oracle sorts them as if they were the highest values, which is why they come up first in a descending ordered list.
You need to find the tables with the highest number of rows fast. If you scroll down, you will eventually reach those rows, but it takes time and makes you impatient. You could add a predicate such as where num_rows is not null, but it's not really elegant: it still performs the null processing, and what if you want the table names with null num_rows as well? That construct would eliminate them. So you need a different approach.
Nulls Last
If you want to fetch the nulls but push them to the end of the list rather than the beginning, add a new clause to the order by: NULLS LAST, as shown below.

select owner, table_name, num_rows
from dba_tables
order by num_rows desc nulls last;

OWNER            TABLE_NAME         NUM_ROWS
---------------- --------------- -----------
CRM_ETL          F_SALES_SUMM_01  1664092226
CRM_ETL          F_SALES_SUMM_02   948708587
CRM_ETL          F_SALES_SUMM_03   167616243

This solves the problem: the nulls are still shown, but after the last of the rows with non-null num_rows values.
A question came up on my blog entry http://arup.blogspot.com/2011/01/more-on-interestedtransaction-lists.html. I think the question warrants a more comprehensive explanation instead of an answer of a few lines, so I decided to create another blog post.
Here was the question:
Could you please explain the scenario when multiple transactions try to update the same row as well. Will there be any ITL allocated? Yes, I am talking about the real locking scenario.
Paraphrased differently, the reader wants to know what happens when this series of events occurs:
1. update row 1 (locked by transaction 1, and occupying one ITL slot)
2. update row 2 (locked by transaction 2, occupying a different ITL slot)
3. Transaction 3 now wants to update either row 1 or row 2. It will hang of course. But will
it trigger the creation of a new ITL slot?
I also decided to expand the questions to cover one more scenario. Transaction 4 wants to update
row 1 and row 4 in the same statement. Row 4 is not locked; but row 1 is. So will transaction 4
be allowed to lock row 4, even though the statement itself will hang? Will it trigger the creation
of another ITL?
Examination
Let's examine these questions via a case study. To demonstrate, let me create a table with three rows:
SQL> create table itltest2 (col1 number, col2 number)
2 /
Table created.
SQL> insert into itltest2 values (1,1);
1 row created.
SQL> c/1,1/1,2
1* insert into itltest2 values (1,2)
SQL> /
1 row created.
SQL> c/1,2/2,2
1* insert into itltest2 values (2,2)
SQL> /
1 row created.
SQL> commit;
Now open three sessions and issue different statements.

Session1> update itltest2 set col2 = col2 + 1 where col1 = 1;

2 rows updated.

If you check the transaction ID from Session 1, you will see the transaction details:

SQL> select dbms_transaction.local_transaction_id from dual;

LOCAL_TRANSACTION_ID
--------------------
7.10.33260
Now, in Session 2, update rows 2 and 3 (both have col2 = 2); the exact SET clause is immaterial, for example:

Session2> update itltest2 set col1 = col1 + 1 where col2 = 2;

This will hang. The reason is obvious: the transaction is trying to get a lock on rows 2 and 3. Since row 2 is already locked by transaction 1, it can't be locked. However, what about row 3? It could have been locked. Was it? Let's make a simple check by updating only row 3, the row transaction 2 also attempted to lock, from another session:
Session3> update itltest2 set col2 = col2 + 1 where col1 = 2 and col2 = 2;
1 row updated.
We know that there are three transactions and three lock requests. Or, are there? Let's check in
V$TRANSACTION:
SQL> select XIDUSN, XIDSLOT, XIDSQN
  2  from v$transaction;

    XIDUSN    XIDSLOT     XIDSQN
---------- ---------- ----------
         7         10      33260
        10          4      33214
There are only two transactions that have placed locks. If you combine XIDUSN, XIDSLOT and XIDSQN, separated by periods, you get the transaction IDs shown earlier. The transaction that is hanging has not placed a lock even on the row it could have locked. That is consistent with the atomicity of statements inside transactions: either all the rows are updated or none of them, not a piecemeal subset. If one of the rows can't be locked, none of them will be.
What about ITL slots? Let's see them by doing a block dump. First we need to know which block the rows are in; dumping that block shows the ITL entries:
Itl           Xid                  Uba                Flag  Lck        Scn/Fsc
0x01   0x000a.004.000081be  0x00c004fe.1873.23  ----    1  fsc 0x0000.00000000
0x02   0x0007.00a.000081ec  0x00c00350.194c.18  ----    2  fsc 0x0000.00000000
There are just two ITL slots; not three. Remember the XID column is in hexadecimal. If you
convert the XID columns in the v$transaction view:
SQL> select
  2     to_char(XIDUSN,'XXXXXX'),
  3     to_char(XIDSLOT,'XXXXXX'),
  4     to_char(XIDSQN,'XXXXXX')
  5  from v$transaction;

TO_CHAR TO_CHAR TO_CHAR
------- ------- -------
      7       A    81EC
      A       4    81BE
Note how the output matches the entries under the column marked "Xid" in the ITL output: you
saw the same transaction IDs in the ITL slots. There are just two ITL slots, and each slot points
to a transaction that has placed a lock. The transaction that has not placed a lock is not given an
ITL slot; there is no need for one.
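The decimal-to-hex correspondence is easy to verify programmatically. A small sketch; the 4/3/8-digit field widths are inferred from the dump output above:

```python
def xid_from_vtransaction(xidusn, xidslot, xidsqn):
    """Format the decimal XIDUSN.XIDSLOT.XIDSQN from V$TRANSACTION
    the way a block dump prints the ITL Xid. The field widths
    (4, 3 and 8 hex digits) are inferred from the dumps above."""
    return "0x%04x.%03x.%08x" % (xidusn, xidslot, xidsqn)

# The two transactions from the demo:
print(xid_from_vtransaction(7, 10, 33260))   # -> 0x0007.00a.000081ec
print(xid_from_vtransaction(10, 4, 33214))   # -> 0x000a.004.000081be
```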
Lock Change
Now suppose transactions 1 and 3 have ended, by either commit or rollback. Transaction 2, which
was hanging until now, is free to acquire its locks. Let's see the ITL slots:
Itl           Xid                  Uba          Flag  Lck        Scn/Fsc
0x01   0x0008.00f.0000a423  0x00c013ce.1e11.05  C---    0  scn 0x0000.0244bcb2
0x02   0x0006.005.0000a43f  0x00c008fe.1d22.12  ----    2  fsc 0x0000.00000000
If you examine the hexadecimal values of the XID columns from V$TRANSACTION:

SQL> select
  2     to_char(XIDUSN,'XXXXXX'),
  3     to_char(XIDSLOT,'XXXXXX'),
  4     to_char(XIDSQN,'XXXXXX')
  5  from v$transaction;
This matches the transaction ID we see in the "Xid" column of the ITL slot. The other ITL slot is
now free of any lock.
COMPLETIO
---------
18-JUL-08
19-JUL-08
20-JUL-08
21-JUL-08
22-JUL-08
Bingo! BCT use ceased on the 20th of July. That was why the
whole file was being scanned. But why did it stop? No one actually stopped it.
Investigating further, I found this in the alert log of Node 1:
Sun Jul 20 00:23:52 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.
From Node 2:
Sun Jul 20 00:23:51 2008
CHANGE TRACKING ERROR in another instance, disabling change tracking
Block change tracking service stopping.
Alert log of Node 3 showed the issue:
Sun Jul 20 00:23:50 2008
Unexpected communication failure with ASM instance:
ORA-12549: TNS:operating system resource quota exceeded
CHANGE TRACKING ERROR 19755, disabling change tracking
Sun Jul 20 00:23:50 2008
Errors in file /xxx/oracle/admin/XXXX/bdump/xxx3_ctwr_20729.trc:
ORA-19755: could not open change tracking file
ORA-19750: change tracking file: '+DG1/change_tracking.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG1/change_tracking.dbf
ORA-12549: TNS:operating system resource quota exceeded
Block change tracking service stopping.
The last message shows the true error: the operating system resource
quota was exceeded, making the diskgroup unavailable. Since the ASM diskgroup was
down, all of its files became unavailable as well, including the BCT file. Surprisingly, Oracle
decided to stop BCT altogether rather than report it as a problem and let the user
decide the next steps. So block change tracking was silently disabled, and the DBAs
didn't get a hint of it. Ouch!
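Because the disabling is silent, the alert log is the only place the evidence shows up, so a periodic scan can be scripted. A minimal sketch in Python; the marker strings come from the log excerpts above, and the log path is yours to supply:

```python
def find_bct_errors(alert_log_path):
    """Return (line_number, text) pairs for alert-log lines that
    indicate block change tracking was disabled. The marker strings
    are taken from the alert-log excerpts above."""
    markers = ("CHANGE TRACKING ERROR",
               "Block change tracking service stopping")
    hits = []
    with open(alert_log_path) as log:
        for num, line in enumerate(log, 1):
            if any(marker in line for marker in markers):
                hits.append((num, line.rstrip()))
    return hits
```

Run it against each instance's alert log; any hit means incrementals may have quietly fallen back to full scans.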
Resolution
Well, now that we had discovered the issue, we took the necessary steps to correct it.
Because of the usual change control process, it took some time to get the change
approved and put in place. We executed the following to re-create the BCT file:
alter database enable block change tracking using file '+DG1/change_tracking.dbf';
The entry in the alert log (on all nodes) confirms it:
Block change tracking file is current.
But this did not solve the issue completely. To use block change tracking, there has
to be a baseline, which is generally a full backup. We never take a full backup; we
always take an incremental image copy and then merge it into a full backup in a
separate location. So the first order of business was to take a full backup. After that
we immediately took an incremental. It took just about an hour, down from some
18+ hours earlier.
Here is some analysis, looking at the backup of just one file - file# 1, i.e., the SYSTEM
datafile:
select COMPLETION_TIME, USED_CHANGE_TRACKING, BLOCKS, BLOCKS_READ
from v$backup_datafile
where file# = 1
order by 1
/
The output:
COMPLETIO USE     BLOCKS BLOCKS_READ
--------- --- ---------- -----------
18-AUG-08 NO       31713      524288
18-AUG-08 NO       10960      524288
20-AUG-08 NO       12764      524288
21-AUG-08 NO        5612      524288
22-AUG-08 NO       11089      524288
23-AUG-08 NO        8217      524288
23-AUG-08 NO        8025      524288
25-AUG-08 NO        3230      524288
26-AUG-08 NO        6629      524288
27-AUG-08 NO       11094      524288  <= the file size was increased
28-AUG-08 NO        3608      786432
29-AUG-08 NO        8199      786432
29-AUG-08 NO       12893      786432
31-AUG-08 YES       1798        6055
01-SEP-08 YES       7664       35411
Columns descriptions:
USE - was Block Change Tracking used?
BLOCKS - the number of blocks backed up
BLOCKS_READ - the number of blocks read by the backup
Note that when BCT was not used, the *entire* file - 524288 blocks - was
read every time. Of course, only a fraction of those blocks was actually backed up,
since only that fraction had changed; but the whole file was checked.
After BCT, note how the BLOCKS_READ figure dropped dramatically. That is
the magic behind the reduced time.
I wanted to find out exactly how much I/O savings BCT was bringing us. A simple
query would show that:
select sum(BLOCKS_READ)/sum(DATAFILE_BLOCKS)
from v$backup_datafile
where USED_CHANGE_TRACKING = 'YES'
/
The output:
.09581342
That's just 9.58%. After BCT, only 9.58% of the blocks of the datafiles were scanned!
Consider the impact of that. Before BCT, the entire file was scanned for changed
blocks. After BCT, only about 9.58% of the blocks were scanned for changed blocks.
Just 9.58%. How sweet is that?!!!
Here are three representative files:
File#  Blocks Read  Actual # of Blocks  Pct Read
-----  -----------  ------------------  --------
  985          109             1254400      .009
  986            1              786432      .000
  987            1             1048576      .000
Note that files 986 and 987 were virtually unread (only one block each). Before BCT,
all 1048576 blocks were read; after BCT, only one was. This makes perfect sense:
these files hold essentially older data, so nothing changes there. RMAN incrementals
are now blazing fast because they scan less than 10% of the blocks. The I/O problem
disappeared too, making overall database performance even better.
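The per-file arithmetic behind those percentages is just blocks read divided by total blocks. A quick check with the three representative files above:

```python
# (file#, blocks_read, total_blocks) for the three representative
# files shown in the table above
files = [(985, 109, 1254400), (986, 1, 786432), (987, 1, 1048576)]
for fno, blocks_read, total in files:
    pct = 100 * blocks_read / total
    print(f"file {fno}: {pct:.3f}% of blocks read")
# -> file 985: 0.009% of blocks read
#    file 986: 0.000% of blocks read
#    file 987: 0.000% of blocks read
```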
So, we started with a random I/O issue that caused a node failure, which led to
increased incremental backup times, which was tracked down to the block change
tracking file being silently disabled by Oracle without raising an error.
Takeaways:
The single biggest takeaway is this: just because BCT is defined, don't
assume it will stay enabled. A periodic check of the BCT file is a must. I
will work on developing an automated tool to check for non-use of the BCT file. The tool
will essentially issue:
SELECT count(1)
FROM v$backup_datafile
where USED_CHANGE_TRACKING = 'NO'
/
If the output is greater than zero, an alert should be issued. Material for the next blog.
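The alerting logic around that query is trivial to wire into a monitoring script. A sketch: the count would come from running the query above; here it is passed in as a number, and restricting the query to a recent COMPLETION_TIME window is my suggestion, not part of the original query.

```python
def bct_alert(backups_without_bct):
    """Return an alert string if any datafile backup ran without
    block change tracking, else None.

    `backups_without_bct` is the count returned by the query above.
    In practice you would restrict that query to a recent
    COMPLETION_TIME window so old, pre-BCT backups don't raise
    false alarms (my assumption, not part of the original query)."""
    if backups_without_bct > 0:
        return f"ALERT: {backups_without_bct} datafile backup(s) did not use BCT"
    return None

print(bct_alert(3))   # -> ALERT: 3 datafile backup(s) did not use BCT
print(bct_alert(0))   # -> None
```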
Thanks for reading.