
http://www.oracle.com/technetwork/articles/linux/part1-091089.html
Guide to Advanced Linux Command Mastery
by Arup Nanda
Published August 2006
In Sheryl Calish's excellent article "Guide to Linux File Command Mastery," you learned some routine Linux commands, which are especially
valuable for Linux newbies. But now that you have mastered the basics, let's move on to some more sophisticated commands that you will find
extremely useful.
In this four-part series, you will learn some not-so-well-known tricks about various routine commands as well as variations in usage that make
them more useful. As the series progresses, the commands become successively more difficult to master.
Note that these commands may differ based on the specific version of Linux you use or which specific kernel is compiled, but if so, probably
only slightly.
Painless Changes to Owner, Group, and Permissions
In Sheryl's article you learned how to use chown and chgrp commands to change ownership and group of the files. Say you have several files
like this:
# ls -l
total 8
-rw-r--r-- 1 ananda users 70 Aug 4 04:02 file1
-rwxr-xr-x 1 oracle dba 132 Aug 4 04:02 file2
-rwxr-xr-x 1 oracle dba 132 Aug 4 04:02 file3
-rwxr-xr-x 1 oracle dba 132 Aug 4 04:02 file4
-rwxr-xr-x 1 oracle dba 132 Aug 4 04:02 file5
-rwxr-xr-x 1 oracle dba 132 Aug 4 04:02 file6
and you need to change the permissions of all the files to match those of file1. Sure, you could issue chmod 644 * to make that change, but
what if you are writing a script to do that, and you don't know the permissions beforehand? Or perhaps you are making several permission
changes based on many different files, and you find it infeasible to go through the permissions of each of those and modify accordingly.
A better approach is to make the permissions similar to those of another file. This command makes the permissions of file2 the same as file1:
chmod --reference file1 file2
Now if you check:
# ls -l file[12]
total 8
-rw-r--r-- 1 ananda users 70 Aug 4 04:02 file1
-rw-r--r-- 1 oracle dba 132 Aug 4 04:02 file2
The file2 permissions were changed to match those of file1 exactly. You didn't need to get the permissions of file1 first.
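For the scripting case mentioned above, a minimal sketch might look like the following. It assumes GNU chmod and GNU stat (the usual tools on Linux); the scratch directory and the file names reference.txt, script1.sh, and script2.sh are made up for the illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

touch reference.txt script1.sh script2.sh
chmod 644 reference.txt
chmod 755 script1.sh script2.sh

# Copy the mode of reference.txt onto every .sh file; the script never
# needs to know what that mode actually is.
chmod --reference=reference.txt ./*.sh

# %a prints the octal mode; sort -u collapses identical values into one line.
modes=$(stat -c '%a' script1.sh script2.sh | sort -u)
echo "mode of the changed files: $modes"
```

Because the mode is never spelled out, the same script keeps working if the reference file's permissions change later.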
You can also use the same trick in group membership in files. To make the group of file2 the same as file1, you would issue:
# chgrp --reference file1 file2
# ls -l file[12]
-rw-r--r-- 1 ananda users 70 Aug 4 04:02 file1
-rw-r--r-- 1 oracle users 132 Aug 4 04:02 file2
Of course, what works for changing groups will work for owner as well. Here is how you can use the same trick for an ownership change. If
permissions are like this:
# ls -l file[12]
-rw-r--r-- 1 ananda users 70 Aug 4 04:02 file1
-rw-r--r-- 1 oracle dba 132 Aug 4 04:02 file2
You can change the ownership like this:
# chown --reference file1 file2
# ls -l file[12]
-rw-r--r-- 1 ananda users 70 Aug 4 04:02 file1
-rw-r--r-- 1 ananda users 132 Aug 4 04:02 file2
Note that the group as well as the owner have changed.
Tip for Oracle Users
This is a trick you can use to change ownership and permissions of Oracle executables in a directory based on some reference executable.
This proves especially useful in migrations where you can (and probably should) install as a different user and later move them to your regular
Oracle software owner.
More on Files
The ls command, with its many arguments, provides some very useful information on files. A different and less well known command
stat offers even more useful information.
Here is how you can use it on the executable oracle, found under $ORACLE_HOME/bin.
# cd $ORACLE_HOME/bin
# stat oracle
File: `oracle'
Size: 93300148 Blocks: 182424 IO Block: 4096 Regular File
Device: 343h/835d Inode: 12009652 Links: 1
Access: (6751/-rwsr-s--x) Uid: ( 500/ oracle) Gid: ( 500/ dba)
Access: 2006-08-04 04:30:52.000000000 -0400
Modify: 2005-11-02 11:49:47.000000000 -0500
Change: 2005-11-02 11:55:24.000000000 -0500
Note the information you got from this command: In addition to the usual file size (which you can get from ls -l anyway), you got the number
of blocks this file occupies. stat counts blocks in 512-byte units, so a file of 93,300,148 bytes needs at least (93300148/512=) 182226.85
blocks, that is, 182,227 whole blocks. The reported figure (182424) is somewhat higher because space is allocated in whole filesystem blocks (4096 bytes here, the IO Block value) and the filesystem's own indirect blocks are counted as well. Instead of making a guess, you can just get the exact blocks.
You also get from the output above the GID and UID of the ownership of the file and the octal representation of the permissions (6751). If you
want to reinstate it back to the same permissions it has now, you could use chmod 6751 oracle instead of explicitly spelling out the
permissions.
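The idea of recording the octal mode and reinstating it later can be sketched as below. This assumes GNU stat, whose -c '%a' format prints the octal mode; the file name program and the mode 751 are placeholders for the demonstration, not values from the listing above.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch program
chmod 751 program

# Record the current octal mode before anything changes it.
saved_mode=$(stat -c '%a' program)

# Simulate an unwanted permission change.
chmod 600 program

# Reinstate the recorded mode without spelling out rwx flags.
chmod "$saved_mode" program
restored_mode=$(stat -c '%a' program)
echo "saved=$saved_mode restored=$restored_mode"
```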
The most useful part of the above output is the file access timestamp information. It shows you that the file was accessed on 2006-08-04
04:30:52 (as shown next to Access:), or August 4, 2006 at 4:30:52 AM. This is when someone started to use the database. The file was
modified on 2005-11-02 11:49:47 (as shown next to Modify:). Finally, the timestamp next to Change: shows when the status of the file was
changed.
-f, a modifier to the stat command, shows the information on the filesystem instead of the file:
# stat -f oracle
File: "oracle"
ID: 0 Namelen: 255 Type: ext2/ext3
Blocks: Total: 24033242 Free: 15419301 Available: 14198462 Size: 4096
Inodes: Total: 12222464 Free: 12093976
Another option, -t, gives exactly the same information but on one line:
# stat -t oracle
oracle 93300148 182424 8de9 500 500 343 12009652 1 0 0 1154682061 1130950187 1130950524 4096
This is very useful in shell scripts where a simple cut command can be used to extract the values for further processing.
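As a small illustration of extracting one value in a script, the sketch below pulls the file size out of the terse output with cut, and then does the same with GNU stat's -c format option, which avoids counting fields. The file sample.dat is a stand-in created only for the example, and the cut approach assumes the file name contains no spaces.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'abcd\n' > sample.dat    # five bytes, including the newline

# In the terse output the fields are separated by spaces; field 2 is the size in bytes.
size_from_terse=$(stat -t sample.dat | cut -d' ' -f2)

# GNU stat can also print only the requested value: %s is the size in bytes.
size_from_format=$(stat -c '%s' sample.dat)

echo "terse: $size_from_terse  format: $size_from_format"
```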
Tip for Oracle Users
When you relink Oracle (often done during patch installations), it moves the existing executables to a different name before creating the new
one. For instance, you could relink all the utilities by
relink utilities
It recompiles, among other things, the sqlplus executable. It moves the existing executable sqlplus to sqlplusO. If the recompilation fails for
some reason, the relink process renames sqlplusO to sqlplus and the changes are undone. Similarly, if you discover a functionality problem
after applying a patch, you can quickly undo the patch by renaming the file yourself.
Here is how you can use stat on these files:
# stat sqlplus*
File: 'sqlplus'
Size: 9865 Blocks: 26 IO Block: 4096 Regular File
Device: 343h/835d Inode: 9126079 Links: 1
Access: (0751/-rwxr-x--x) Uid: ( 500/ oracle) Gid: ( 500/ dba)
Access: 2006-08-04 05:15:18.000000000 -0400
Modify: 2006-08-04 05:15:18.000000000 -0400
Change: 2006-08-04 05:15:18.000000000 -0400

File: 'sqlplusO'
Size: 8851 Blocks: 24 IO Block: 4096 Regular File
Device: 343h/835d Inode: 9125991 Links: 1
Access: (0751/-rwxr-x--x) Uid: ( 500/ oracle) Gid: ( 500/ dba)
Access: 2006-08-04 05:13:57.000000000 -0400
Modify: 2005-11-02 11:50:46.000000000 -0500
Change: 2005-11-02 11:55:24.000000000 -0500
It shows sqlplusO was modified on November 2, 2005, while sqlplus was modified on August 4, 2006, the same day sqlplusO was last
accessed. It indicates that the original version of sqlplus was in effect from Nov 2, 2005 to Aug 4, 2006. If you want to
diagnose some functionality issues, this is a great place to start. In addition to the file changes, because you know when the permissions last changed (the Change: timestamp),
you can correlate it with any perceived functionality issues.
Another important output is the size of the file, which is different (9865 bytes for sqlplus as opposed to 8851 for sqlplusO), indicating that the
versions are not mere recompiles; they actually changed, perhaps with additional libraries. This also indicates a potential cause of some
problems.
File Types
When you see a file, how do you know what type of file it is? The command file tells you that. For instance:
# file alert_DBA102.log
alert_DBA102.log: ASCII text
The file alert_DBA102.log is an ASCII text file. Let's see some more examples:
# file initTESTAUX.ora.Z
initTESTAUX.ora.Z: compress'd data 16 bits
This tells you that the file is a compressed file, but how do you know the type of the file that was compressed? One option is to uncompress it and
run file against it, but that is cumbersome. A cleaner option is to use the parameter -z:
# file -z initTESTAUX.ora.Z
initTESTAUX.ora.Z: ASCII text (compress'd data 16 bits)
Another quirk is the presence of symbolic links:
# file spfile+ASM.ora.ORIGINAL
spfile+ASM.ora.ORIGINAL: symbolic link to
/u02/app/oracle/admin/DBA102/pfile/spfile+ASM.ora.ORIGINAL
This is useful, but what type of file is being pointed to? Instead of running file again on the target, you can use the option -L:
# file -L spfile+ASM.ora.ORIGINAL
spfile+ASM.ora.ORIGINAL: data
This clearly shows that the file is a data file. Note that the spfile is a binary one, as opposed to init.ora; so the file shows up as data file.
Tip for Oracle Users
Suppose you are looking for a trace file in the user dump destination directory but are unsure if the file is located on another directory and
merely exists here as a symbolic link, or if someone has compressed the file (or even renamed it). There is one thing you know: it's definitely
an ASCII file. Here is what you can do:
file -Lz * | grep ASCII | cut -d":" -f1 | xargs ls -ltr
This command checks the ASCII files, even if they are compressed, and lists them in chronological order.
Comparing Files
How do you find out if two files, file1 and file2, are identical? There are several ways and each approach has its own appeal.
diff. The simplest command is diff, which shows the difference between two files. Here are the contents of two files:
# cat file1
In file1 only
In file1 and file2
# cat file2
In file1 and file2
In file2 only
If you use the diff command, you will be able to see the difference between the files as shown below:
# diff file1 file2
1d0
< In file1 only
2a2
> In file2 only
#
In the output, a "<" in the first column indicates that the line exists in the file mentioned first, that is, file1. A ">" in that place indicates that the
line exists in the second file (file2). The characters 1d0 in the first line of the output are an ed-style edit command showing what must be done to
file1 to make it the same as file2: here, delete line 1. Likewise, 2a2 means append line 2 of file2 after line 2 of file1.
Another option, -y, shows the same output, but side by side:
# diff -y file1 file2 -W 120
In file1 only <
In file1 and file2 In file1 and file2
> In file2 only

The -W option is optional; it merely instructs the command to use a 120-character wide screen, useful for files with long lines.
If you just want to know whether the files differ, not necessarily how, you can use the -q option.
# diff -q file3 file4
# diff -q file3 file2
Files file3 and file2 differ
Files file3 and file4 are the same so there is no output; in the other case, the fact that the files differ is reported.
If you are writing a shell script, it might be useful to produce the output in such a manner that it can be parsed. The -u option does that:
# diff -u file1 file2
--- file1 2006-08-04 08:29:37.000000000 -0400
+++ file2 2006-08-04 08:29:42.000000000 -0400
@@ -1,2 +1,2 @@
-In file1 only
 In file1 and file2
+In file2 only
The output shows the contents of both files but suppresses duplicates. The + and - signs in the first column indicate which file a line belongs to: - for a line found only in file1, + for a line found only in file2. No
character in the first column indicates presence in both files.
The command takes whitespace into consideration. If you want to ignore differences in the amount of whitespace, use the -b option. Use the -B option to ignore blank
lines. Finally, use -i to ignore case.
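To see these options in a script, here is a minimal sketch that compares two files differing only in spacing and letter case. The file names a.sql and b.sql and their contents are invented for the example; the exit status convention (0 for no differences, 1 for differences) is what diff itself returns.

```shell
LC_ALL=C; export LC_ALL          # keep diff's messages predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'SELECT  *   FROM dual\n' > a.sql
printf 'select * from DUAL\n'    > b.sql

# Strict comparison: the files differ, so diff exits with 1.
diff -q a.sql b.sql > /dev/null
strict=$?

# -b ignores changes in the amount of whitespace, -i ignores case,
# so the two lines now compare as equal and diff exits with 0.
diff -q -b -i a.sql b.sql > /dev/null
relaxed=$?

echo "strict=$strict relaxed=$relaxed"
```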
The diff command can also be applied to directories. The command
diff dir1 dir2
shows which files are present in only one of the directories and which are present in both. If it finds a subdirectory of the same
name in both, it does not go down to see if any individual files differ. Here is an example:
# diff DBA102 PROPRD
Common subdirectories: DBA102/adump and PROPRD/adump
Only in DBA102: afiedt.buf
Only in PROPRD: archive
Only in PROPRD: BACKUP
Only in PROPRD: BACKUP1
Only in PROPRD: BACKUP2
Only in PROPRD: BACKUP3
Only in PROPRD: BACKUP4
Only in PROPRD: BACKUP5
Only in PROPRD: BACKUP6
Only in PROPRD: BACKUP7
Only in PROPRD: BACKUP8
Only in PROPRD: BACKUP9
Common subdirectories: DBA102/bdump and PROPRD/bdump
Common subdirectories: DBA102/cdump and PROPRD/cdump
Only in PROPRD: CreateDBCatalog.log
Only in PROPRD: CreateDBCatalog.sql
Only in PROPRD: CreateDBFiles.log
Only in PROPRD: CreateDBFiles.sql
Only in PROPRD: CreateDB.log
Only in PROPRD: CreateDB.sql
Only in DBA102: dpdump
Only in PROPRD: emRepository.sql
Only in PROPRD: init.ora
Only in PROPRD: JServer.sql
Only in PROPRD: log
Only in DBA102: oradata
Only in DBA102: pfile
Only in PROPRD: postDBCreation.sql
Only in PROPRD: RMANTEST.sh
Only in PROPRD: RMANTEST.sql
Common subdirectories: DBA102/scripts and PROPRD/scripts
Only in PROPRD: sqlPlusHelp.log
Common subdirectories: DBA102/udump and PROPRD/udump
Note that the common subdirectories are simply reported as such but no comparison is made. If you want to drill down even further and
compare files under those subdirectories, you should use the following command:
diff -r dir1 dir2
This command recursively goes into each subdirectory to compare the files and reports the difference between the files of the same names.
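The difference between the plain and the recursive form can be tried safely on two scratch directories. In this sketch the directory names home1 and home2 and the file listener.cfg are hypothetical; -q is added to the recursive run only to keep the output to one line per differing file.

```shell
LC_ALL=C; export LC_ALL          # English messages, so the output is predictable
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p home1/network home2/network
echo 'PORT=1521' > home1/network/listener.cfg
echo 'PORT=1522' > home2/network/listener.cfg
echo 'old'       > home1/only_in_home1.txt

# Without -r, the common subdirectory is only reported, not compared.
plain=$(diff home1 home2)

# With -r, diff descends into network/ and notices the changed file.
recursive=$(diff -rq home1 home2)

printf '%s\n---\n%s\n' "$plain" "$recursive"
```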
Tip for Oracle Users
One common use of diff is to differentiate between different init.ora files. As a best practice, I always copy the file to a new name (e.g.,
initDBA102.ora to initDBA102.080306.ora, to indicate August 3, 2006) before making a change. A simple diff between all versions of the
file tells quickly what changed and when.
This is a pretty powerful command to manage your Oracle Home. As a best practice, I never update an Oracle Home when applying patches.
For instance, suppose the current Oracle version is 10.2.0.1. The ORACLE_HOME could be /u01/app/oracle/product/10.2/db1. When the time
comes to patch it to 10.2.0.2, I don't patch this Oracle Home. Instead, I start a fresh installation on /u01/app/oracle/product/10.2/db2 and then
patch that home. Once it's ready, I use the following:
# sqlplus / as sysdba
SQL> shutdown immediate
SQL> exit
# export ORACLE_HOME=/u01/app/oracle/product/10.2/db2
# export PATH=$ORACLE_HOME/bin:$PATH
# sqlplus / as sysdba
SQL> @$ORACLE_HOME/rdbms/admin/catalog
...
and so on.
The purpose of this approach is that the original Oracle Home is not disturbed and I can easily fall back in case of problems. This also means
the database is down and up again pretty much immediately. If I installed the patch directly on the Oracle Home, I would have had to shut the
database down for a long time: the entire duration of the patch application. In addition, if the patch application had failed for any reason, I
would not have a clean Oracle Home.
Now that I have several Oracle Homes, how can I see what changed? It's really simple; I can use:
diff -r /u01/app/oracle/product/10.2/db1 /u01/app/oracle/product/10.2/db2 |
grep -v Common
This tells me the differences between the two Oracle Homes and the differences between the files of the same name. Some important files like
tnsnames.ora, listener.ora, and sqlnet.ora should not show wide differences, but if they do, then I need to understand why.
cmp. The command cmp is similar to diff:
# cmp file1 file2
file1 file2 differ: byte 10, line 1
The output reports only the first point of difference. You can use this to identify where the files start to differ. Like diff, cmp has a lot
of options, the most important being the -s option, which prints nothing and merely returns a code:
0, if the files are identical
1, if they differ
Some other non-zero number, if the comparison couldn't be made
Here is an example:
# cmp -s file3 file4
# echo $?
0
The special variable $? indicates the return code from the last executed command. In this case it's 0, meaning the files file3 and file4 are
identical.
# cmp -s file1 file2
# echo $?
1
means file1 and file2 are not the same.
This property of cmp can prove very useful in shell scripting where you merely want to check if two files differ in any way, but not necessarily
check what the difference is. Another important use of this command is to compare binary files, where diff may not be reliable.
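In a script, the return code is usually tested directly in an if statement rather than through $?. The following is a minimal sketch; the helper function compare and the file names are made up for the illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'same\n'    > original.cfg
cp original.cfg copy.cfg
printf 'changed\n' > edited.cfg

# Hypothetical helper: prints one word describing the comparison.
# cmp -s is silent; only its exit status is used. Note that the else
# branch is also taken when cmp cannot make the comparison at all.
compare() {
    if cmp -s "$1" "$2"; then
        echo identical
    else
        echo different
    fi
}

result_copy=$(compare original.cfg copy.cfg)
result_edit=$(compare original.cfg edited.cfg)
echo "copy: $result_copy, edited: $result_edit"
```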
Tip for Oracle Users
Recall from a previous tip that when you relink Oracle executables, the older version is kept prior to being overwritten. So, when you relink, the
executable sqlplus is renamed to sqlplusO and the newly compiled sqlplus is placed in the $ORACLE_HOME/bin. So how do you ensure
that the sqlplus that was just created is any different? Just use:
# cmp sqlplus sqlplusO
sqlplus sqlplusO differ: byte 657, line 7
If you check the size:
# ls -l sqlplus*
-rwxr-x--x 1 oracle dba 8851 Aug 4 05:15 sqlplus
-rwxr-x--x 1 oracle dba 8851 Nov 2 2005 sqlplusO
Even though the size is the same in both cases, cmp proved that the two programs differ.
comm. The command comm is similar to the others but the output comes in three columns, separated by tabs. Here is an example:
# comm file1 file2
In file1 and file2
In file1 only
In file1 and file2
In file2 only
This command is useful when you want to see the contents of one file that are not in the other, not just a difference; it works like the MINUS operator in SQL. The option -1 suppresses the contents found only in the first file:
# comm -1 file1 file2
In file1 and file2
In file2 only

Summary of Commands in This Installment
Command   Use
chmod     To change permissions of a file, using the --reference parameter
chown     To change owner of a file, using the --reference parameter
chgrp     To change group of a file, using the --reference parameter
stat      To find out about the extended attributes of a file, such as date last accessed
file      To find out about the type of file, such as ASCII, data, and so on
diff      To see the difference between two files
cmp       To compare two files
comm      To see what's common between two files, with the output in three columns
md5sum    To calculate the MD5 hash value of files, used to determine if a file has changed

md5sum. This command generates a 128-bit MD5 hash value of the files, displayed as 32 hexadecimal characters:
# md5sum file1
ef929460b3731851259137194fe5ac47 file1
Two files with the same checksum can be considered identical. However, the usefulness of this command goes beyond just comparing files. It can also provide a
mechanism to guarantee the integrity of the files.
Suppose you have two important files, file1 and file2, that you need to protect. You can use the --check option to confirm the files haven't changed. First, create
a checksum file for both these important files and keep it safe:
# md5sum file1 file2 > f1f2
Later, when you want to verify that the files are still untouched:
# md5sum --check f1f2
file1: OK
file2: OK
This shows clearly that the files have not been modified. Now change one file and
check the MD5:
# cp file2 file1
# md5sum --check f1f2
file1: FAILED
file2: OK
md5sum: WARNING: 1 of 2 computed checksums did NOT match
The output clearly shows that file1 has been modified.
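For use in a script, the check can be reduced to an exit status. The sketch below assumes GNU md5sum, whose --status option suppresses all output so that only the return code reports the result; the file names and contents are placeholders.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\n' > file1
printf 'beta\n'  > file2

# Record the baseline checksums once, while the files are known to be good.
md5sum file1 file2 > baseline.md5

# --status prints nothing; exit status 0 means every checksum matched.
md5sum --status --check baseline.md5
before=$?

# Simulate an unplanned change to one of the files.
printf 'tampered\n' >> file1

md5sum --status --check baseline.md5
after=$?

echo "before=$before after=$after"
```

A non-zero status after the change is the signal a monitoring script would act on, for example by sending an alert.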
Tip for Oracle Users
md5sum is an extremely powerful command for security implementations. Some of the configuration files you manage, such as listener.ora,
tnsnames.ora, and init.ora, are extremely critical in a successful Oracle infrastructure and any modification may result in downtime. These are
typically a part of your change control process. Instead of just relying on someone's word that these files have not changed, enforce it using
an MD5 checksum. Create a checksum file and whenever you make a planned change, recreate this file. As a part of your compliance, check this
file using the md5sum command. If someone inadvertently updated one of these key files, you would immediately catch the change.
Along the same lines, you can also create MD5 checksums for all executables in $ORACLE_HOME/bin and compare them from time to time for
unauthorized modifications.
Conclusion
Thus far you have learned only some of the Linux commands you will find useful for performing your job effectively. In the next installment, I
will describe some more sophisticated but useful commands, such as strace, whereis, renice, skill, and more.

Guide to Advanced Linux Command Mastery, Part 2

by Arup Nanda
Published February 2007
In Part 1 of the series, you learned some useful but not widely known commands, as well as some not-so-well-known parameters of
frequently used commands, that help you do your job more efficiently. Continuing the series, you will now learn some slightly more
advanced Linux commands useful for Oracle users, whether developers or DBAs.
alias and unalias
Suppose you want to check the ORACLE_SID environment variable set in your shell. You will have to type:
echo $ORACLE_SID
As a DBA or a developer, you frequently use this command and will quickly become tired of typing the entire 16 characters. Is
there a simpler way?
There is: the alias command. With this approach you can create a short alias, such as "os", to represent the entire
command:
alias os='echo $ORACLE_SID'
Now whenever you want to check the ORACLE_SID, you just type "os" (without the quotes) and Linux executes the aliased
command.
However, if you log out and log back in, the alias is gone and you have to enter the alias command again. To eliminate this
step, all you have to do is to put the command in your shell's profile file. For bash, the file is .bash_profile (note the
period before the file name; it is part of the file's name) in your home directory. For the Bourne and Korn shells,
it's .profile, and for the C shell, .cshrc.
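If you add aliases to a profile file from a setup script, it helps to avoid appending the same line twice. Here is one possible sketch; the function add_alias is hypothetical, and a temporary file stands in for the real profile so the example can be run without touching your own .bash_profile.

```shell
# Stand-in for ~/.bash_profile, so the demo changes nothing real.
profile=$(mktemp)

# Hypothetical helper: append the line only if an identical line is not
# already present (-x whole line, -F literal text, -q no output).
add_alias() {
    grep -qxF -- "$1" "$profile" || printf '%s\n' "$1" >> "$profile"
}

add_alias "alias os='echo \$ORACLE_SID'"
add_alias "alias os='echo \$ORACLE_SID'"   # second call changes nothing

alias_count=$(grep -c '^alias os=' "$profile")
cat "$profile"
echo "lines defining os: $alias_count"
```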
You can create an alias with any name. For instance, I always create an alias for the command rm as rm -i, which makes
the rm command interactive.
alias rm='rm -i'
Whenever I issue an rm command, Linux prompts me for confirmation, and unless I provide "y", it doesn't remove the file;
thus I am protected from accidentally removing an important file. I use the same for mv (for moving the file to a new
name), which prevents accidental overwriting of existing files, and cp (for copying the file).
Here is a list of some very useful aliases I like to define:
alias bdump='cd $ORACLE_BASE/admin/$ORACLE_SID/bdump'
alias l='ls -d .* --color=tty'
alias ll='ls -l --color=tty'
alias mv='mv -i'
alias oh='cd $ORACLE_HOME'
alias os='echo $ORACLE_SID'
alias rm='rm -i'
alias tns='cd $ORACLE_HOME/network/admin'
To see what aliases have been defined in your shell, use alias without any parameters.
However, there is a small problem. I have defined an alias, rm, that executes rm -i. This command will prompt for my
confirmation every time I try to delete a file. But what if I want to remove a lot of files and am confident they can be
deleted without my confirmation?
The solution is simple: To suppress the alias and use the command only, I will need to enter two single quotes:
$ ''rm *
Note, these are two single quotes (') before the rm command, not two double quotes. This will suppress the alias rm. Another
approach is to use a backslash (\):
$ \rm *
To remove an alias previously defined, just use the unalias command:
$ unalias rm
ls
The humble ls command is frequently used but rarely to its full potential. Without any options, ls merely displays all
files and directories in tabular format.
$ ls
admin has mesg precomp
apex hs mgw racg
assistants install network rdbms

... output snipped ...

To show them in a list, use the -1 (this is the number 1, not the letter "l") option.
$ ls -1
admin
apex
assistants

... output snipped ...

This option is useful in shell scripts where the file names need to be fed into another program or command for manipulation.
You have most definitely used the -l (the letter "l", not the number "1") that displays all the attributes of the files and
directories. Let's see it once again:
$ ls -l
total 272
drwxr-xr-x 3 oracle oinstall 4096 Sep 3 03:27 admin
drwxr-x--- 7 oracle oinstall 4096 Sep 3 02:32 apex
drwxr-x--- 7 oracle oinstall 4096 Sep 3 02:29 assistants
The first column shows the type of file and the permissions on it: "d" means directory, "-" means a regular file, "c" means a
character device, "b" means a block device, "p" means named pipe, and "l" (that's a lowercase letter L, not I) means symbolic
link.
One very useful option is --color, which shows the files in many different colors based on the type of file. Here is an
example screenshot:
Note that files file1 and file2 are regular files. link1 is a symbolic link, shown in red; dir1 is a directory, shown in
yellow; and pipe1 is a named pipe, shown in different colors for easier identification.
In some distros, the ls command comes pre-installed with an alias (described in the previous section) as ls --color; so
you can see the files in color when you type "ls". This approach may be undesirable, however, especially if you have an
output like that above. You can change the colors, but a quicker way may be just to turn off the alias:
$ alias ls="''ls"
Another useful option is the -F option, which appends a symbol after each file to show the type of the file - a "/" after
directories, "@" after symbolic links, and "|" after named pipes.
$ ls -F
dir1/ file1 file2 link1@ pipe1|
If you have a subdirectory under a directory and you want to list only that directory, ls -l will show you the contents of
the subdirectory as well. For instance, suppose the directory structure is like the following:
/dir1
+-->/subdir1
+--> subfile1
+--> subfile2
The directory dir1 has a subdirectory subdir1 and two files: subfile1 and subfile2. If you just want to see the attributes of
the directory dir1, you issue:
$ ls -l dir1
total 4
drwxr-xr-x 2 oracle oinstall 4096 Oct 14 16:52 subdir1
-rw-r--r-- 1 oracle oinstall 0 Oct 14 16:48 subfile1
-rw-r--r-- 1 oracle oinstall 0 Oct 14 16:48 subfile2
Note that the directory dir1 is not listed in the output. Rather, the contents of the directory are displayed. This is
expected behavior when processing directories. To show the directory dir1 only, you will have to use the -d option.
$ ls -dl dir1
drwxr-xr-x 3 oracle oinstall 4096 Oct 14 16:52 dir1
Look at the following ls -l output:
-rwxr-x--x 1 oracle oinstall 10457761 Apr 6 2006 rmanO
-rwxr-x--x 1 oracle oinstall 10457761 Sep 23 23:48 rman
-rwsr-s--x 1 oracle oinstall 93300507 Apr 6 2006 oracleO
-rwx------ 1 oracle oinstall 93300507 Sep 23 23:49 oracle
You will notice that the sizes of the files are shown in bytes. This may be easy to read for small files, but when file sizes are
pretty large, a long number may not be very easy to read. The option "-h" comes in handy then, to display the size in a
human-readable form.
$ ls -lh
-rwxr-x--x 1 oracle oinstall 10M Apr 6 2006 rmanO
-rwxr-x--x 1 oracle oinstall 10M Sep 23 23:48 rman
-rwsr-s--x 1 oracle oinstall 89M Apr 6 2006 oracleO
-rwx------ 1 oracle oinstall 89M Sep 23 23:49 oracle
Note how the size has been shown in M (for megabytes), K (for kilobytes), and so on.
$ ls -lr
The parameter -r shows the output in the reverse order. In this command, the files will be shown in the reverse alphabetical
order.
$ ls -lR
The -R option makes the ls command execute recursively, that is, go down into the subdirectories and show those files too.
What if you want to show the largest to the smallest files? This can be done with the -S parameter.
$ ls -lS

total 308
-rw-r----- 1 oracle oinstall 52903 Oct 11 18:31 sqlnet.log
-rwxr-xr-x 1 oracle oinstall 9530 Apr 6 2006 root.sh
drwxr-xr-x 2 oracle oinstall 8192 Oct 11 18:14 bin
drwxr-x--- 3 oracle oinstall 8192 Sep 23 23:49 lib
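The size ordering is easy to verify on a few files of known size. In this sketch the file names are invented, and printf is used only as a convenient way to create files of 100, 10, and 1 bytes.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three files of known sizes: 100, 10, and 1 bytes.
printf '%0100d' 0 > large.dbf
printf '%010d' 0  > medium.dbf
printf '0'        > small.dbf

# -S sorts largest first; -1 prints one name per line for easy parsing.
biggest=$(ls -1S | head -n 1)

# Adding -r reverses the order, so the smallest file comes first.
smallest=$(ls -1Sr | head -n 1)

echo "biggest=$biggest smallest=$smallest"
```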
xargs
Most Linux commands are about getting an output: a list of files, a list of strings, and so on. But what if you want to use
some other command with the output of the previous one as a parameter? For example, the file command shows the type of the
file (executable, ASCII text, and so on); you can manipulate the output to show only the filenames, and now you want to pass
these names to the ls -l command to see the timestamp. The command xargs does exactly that. It allows you to execute some
other commands on the output. Remember this syntax from Part 1:
file -Lz * | grep ASCII | cut -d":" -f1 | xargs ls -ltr
Let's dissect this command string. The first, file -Lz *, reports the type of every file, following symbolic links and looking inside
compressed files. It passes the output to the next command, grep ASCII, which searches for the string "ASCII" in it and produces output similar to
this:
alert_DBA102.log: ASCII English text
alert_DBA102.log.Z: ASCII text (compress'd data 16 bits)
dba102_asmb_12307.trc.Z: ASCII English text (compress'd data 16 bits)
dba102_asmb_20653.trc.Z: ASCII English text (compress'd data 16 bits)
Since we are interested in the file names only, we applied the next command, cut -d":" -f1, to show the first field only:
alert_DBA102.log
alert_DBA102.log.Z
dba102_asmb_12307.trc.Z
dba102_asmb_20653.trc.Z
Now, we want to use the ls -l command and pass the above list as parameters. The xargs command allows
you to do that. The last part, xargs ls -ltr, takes the output and executes the command ls -ltr against those names, as if
executing:
ls -ltr alert_DBA102.log alert_DBA102.log.Z dba102_asmb_12307.trc.Z dba102_asmb_20653.trc.Z
Thus xargs is not useful by itself, but is quite powerful when combined with other commands.
Here is another example, where we want to count the number of lines in those files:
$ file * | grep ASCII | cut -d":" -f1 | xargs wc -l
47853 alert_DBA102.log
19 dba102_cjq0_14493.trc
29053 dba102_mmnl_14497.trc
154 dba102_reco_14491.trc
43 dba102_rvwr_14518.trc
77122 total
(Note: the above task can also be accomplished with the following command:)
$ wc -l `file * | grep ASCII | cut -d":" -f1`
The xargs version is given to illustrate the concept. Linux has several ways to achieve the same task; use the one that
suits your situation best.
Using this approach you can quickly rename files in a directory.
$ ls | xargs -t -i mv {} {}.bak
The -i option tells xargs to replace {} with the name of each item. The -t option instructs xargs to print the command
before executing it.
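GNU xargs documents -i as deprecated in favor of -I, which takes the placeholder string explicitly. A minimal sketch of the same rename with -I follows; it runs in a scratch directory with made-up file names and assumes the names contain no newline characters.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch alert.log trace1.trc trace2.trc

# -I {} runs mv once per input line, replacing every {} with that line.
ls | xargs -I {} mv {} {}.bak

ls
```

Because -I processes one input line per command, file names containing spaces are handled as a single argument.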
Another very useful operation is when you want to open the files for editing using vi:
$ file * | grep ASCII | cut -d":" -f1 | xargs vi
This command opens the files one by one using vi. When you want to search for many files and open them for editing, this
comes in very handy.
It also has several options. Perhaps the most useful is the -p option, which makes the operation interactive:
$ file * | grep ASCII | cut -d":" -f1 | xargs -p vi
vi alert_DBA102.log dba102_cjq0_14493.trc dba102_mmnl_14497.trc dba102_reco_14491.trc dba102_rvwr_14518.trc ?...
Here xargs asks you to confirm before running each command. If you press "y", it executes the command. You will find it
immensely useful when you perform potentially damaging and irreversible operations on a file, such as removing or
overwriting it.
The -t option uses a verbose mode; it displays the command it is about to run, which is a very helpful option during
debugging.
What if the output passed to the xargs is blank? Consider:
$ file * | grep SSSSSS | cut -d":" -f1 | xargs -t wc -l
wc -l
0
$
Here searching for "SSSSSS" produces no match, so the input to xargs is empty, and wc -l still runs once with no file names, as shown in the second line (produced since
we used -t, the verbose option). Although this may be useful, in some cases you may want to stop xargs if there is
nothing to process; if so, you can use the -r option:
$ file * | grep SSSSSS | cut -d":" -f1 | xargs -t -r wc -l
$
The command exits if there is nothing to run.
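The effect of -r can be observed without any files at all. This sketch assumes GNU xargs, where -r is an extension; echo ran is just a harmless stand-in for the real command.

```shell
# Empty input, no -r: GNU xargs still runs the command once.
without_r=$(printf '' | xargs echo ran)

# Empty input with -r: the command is not run at all.
with_r=$(printf '' | xargs -r echo ran)

echo "without -r: [$without_r]"
echo "with -r:    [$with_r]"
```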
Suppose you want to remove the files using the rm command, which should be the argument to the xargs command.
However, rm can accept only a limited number of arguments. What if your argument list exceeds that limit? The -n option to xargs
limits the number of arguments in a single command line.
Here is how you can limit each command line to only two arguments. Even though five files are passed to xargs ls -ltr, only two
files are passed to ls -ltr at a time.
$ file * | grep ASCII | cut -d":" -f1 | xargs -t -n2 ls -ltr
ls -ltr alert_DBA102.log dba102_cjq0_14493.trc
-rw-r----- 1 oracle dba 738 Aug 10 19:18 dba102_cjq0_14493.trc
-rw-r--r-- 1 oracle dba 2410225 Aug 13 05:31 alert_DBA102.log
ls -ltr dba102_mmnl_14497.trc dba102_reco_14491.trc
-rw-r----- 1 oracle dba 5386163 Aug 10 17:55 dba102_mmnl_14497.trc
-rw-r----- 1 oracle dba 6808 Aug 13 05:21 dba102_reco_14491.trc
ls -ltr dba102_rvwr_14518.trc
-rw-r----- 1 oracle dba 2087 Aug 10 04:30 dba102_rvwr_14518.trc
Using this approach you can quickly rename files in a directory.
$ ls | xargs -t -i mv {} {}.bak
The -i option tells xargs to replace {} with the name of each item.
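Current GNU xargs documents -i as deprecated in favor of -I {}, and a plain ls | xargs pipeline splits file names that contain spaces. The following is a minimal sketch of the same rename, assuming GNU find and xargs (for -print0, -0, and -r); the function name backup_rename is hypothetical:

```shell
# backup_rename DIR: append .bak to every regular file directly inside DIR.
# Assumes GNU find and xargs; -print0 and -0 keep names with spaces intact,
# and -r skips the mv entirely when nothing matches.
backup_rename() {
    find "$1" -maxdepth 1 -type f -print0 |
        xargs -0 -r -I {} mv {} {}.bak
}
```

Calling backup_rename . renames the files in the current directory; adding -t to xargs echoes each mv before it runs.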
rename
As you know, the mv command renames files. For example,
$ mv oldname newname
renames the file oldname to newname. However, what if you don't know the filenames yet? The rename command comes in really
handy here.
$ rename .log .log.`date +%F-%H:%M:%S` *
renames every file with the extension .log to .log.<dateformat>. So sqlnet.log becomes sqlnet.log.2006-09-12-23:26:28.
find
Among the most popular commands for Oracle users is find. By now you know about using find to find files in a given
directory. Here is an example to find files starting with the word "file" in the current directory:
$ find . -name "file*"
./file2
./file1
./file3
./file4
However, what if you want to search for names like FILE1, FILE2, and so on? The -name "file*" will not match them. For a
case-insensitive search, use the -iname option:
$ find . -iname "file*"
./file2
./file1
./file3
./file4
./FILE1
./FILE2
You can limit your search to a specific type of files only. For instance, the above command will get the files of all types:
regular files, directories, symbolic links, and so on. To search for only regular files, you can use the -type f parameter.
$ find . -name "orapw*" -type f
./orapw+ASM
./orapwDBA102
./orapwRMANTEST
./orapwRMANDUP
./orapwTESTAUX
The -type can take the modifiers f (for regular files), l (for symbolic links), d (directories), b (block devices), p (named
pipes), c (character devices), s (sockets).
A slight twist to the above command is to combine it with the file command you learned about in Part 1. The
command file tells you what type of file it is. You can pass it as a post processor for the output from find command.
The -exec parameter executes the command following the parameter. In this case, the command to execute after
the find is file:
$ find . -name "*oraenv*" -type f -exec file {} \;
./coraenv: Bourne shell script text executable
./oraenv: Bourne shell script text executable
This is useful when you want to find out if the ASCII text file could be some type of shell script.
If you substitute -exec with -ok, the command is executed but it asks for your confirmation first. Here's an example:
$ find . -name "sqlplus*" -ok {} \;
< {} ... ./sqlplus > ? y

SQL*Plus: Release 9.2.0.5.0 - Production on Sun Aug 6 11:28:15 2006

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

Enter user-name: / as sysdba

Connected to:
Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production

SQL> exit
Disconnected from Oracle9i Enterprise Edition Release 9.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.5.0 - Production
< * ... ./sqlplusO > ? n
$
Here we have asked the shell to find all programs starting with "sqlplus", and execute them. Note there is nothing between -ok
and {}, so it will just execute the files it finds. It finds two files, sqlplus and sqlplusO, and asks in each case if you want
to execute it. We answered "y" to the prompt against "sqlplus" and it executed. After exiting, it prompted for the second file it
found (sqlplusO) and asked for confirmation again, to which we answered "n"; thus, it was not executed.
Tip for Oracle Users
Oracle produces many extraneous files: trace files, log files, dump files, and so on. Unless they are cleaned periodically,
they can fill up the filesystem and bring the database to a halt.
To ensure that doesn't happen, simply search for the files with extension "trc" and remove them if they are more than three
days old. A simple command does the trick:
find . -name "*.trc" -ctime +3 -exec rm {} \;
To remove them without being prompted (for example, when the files are write-protected), pass the -f option to rm:

find . -name "*.trc" -ctime +3 -exec rm -f {} \;


If you just want to list the files:
find . -name "*.trc" -ctime +3 -exec ls -l {} \;
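Note that -ctime tests the time of the last inode change (permissions, ownership, rename), while -mtime tests the time the contents were last modified, which is often the age you actually care about for trace files. The following is a minimal sketch of a dry-run listing based on -mtime; the function name old_traces is hypothetical:

```shell
# old_traces DIR DAYS: print *.trc regular files under DIR whose contents
# were last modified more than DAYS days ago. Nothing is deleted.
old_traces() {
    find "$1" -name "*.trc" -type f -mtime +"$2" -print
}
```

Once the list from old_traces . 3 looks right, the same find expression can end in -exec rm -f {} \; instead of -print.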
m4
This command takes an input file and substitutes strings inside it with the parameters passed, similar to substituting for
variables. For example, here is an input file:
$ cat temp
The COLOR fox jumped over the TYPE fence.
Were you to substitute the strings "COLOR" by "brown" and "TYPE" by "broken", you could use:
$ m4 -DCOLOR=brown -DTYPE=broken temp
The brown fox jumped over the broken fence.
Else, if you want to substitute "white" and "high" for the same:

$ m4 -DCOLOR=white -DTYPE=high temp


The white fox jumped over the high fence.
whence and which
These commands show where the named executables are located in the user's PATH. When the
executable is found in the path, they behave pretty much the same way and display the path:

$ which sqlplus
/u02/app/oracle/products/10.2.0.1/db1/bin/sqlplus
$ whence sqlplus
/u02/app/oracle/products/10.2.0.1/db1/bin/sqlplus

The output is identical. However, if the executable is not found in the path, the behavior is different. The which command
produces an explicit message:
$ which sqlplus1
/usr/bin/which: no sqlplus1 in (/u02/app/oracle/products/10.2.0.1/db1/bin:/usr

/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin)

whereas the whence command produces no message:


$ whence sqlplus1
and returns to the shell prompt. This is useful when you do not want a message displayed if the executable is not found in
the path:

$ whence invalid_command
$ which invalid_command
which: no invalid_command in (/usr/kerberos/sbin:/usr/kerberos/bin:/bin:/sbin:

/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:
/usr/bin/X11:/usr/X11R6/bin:/root/bin)

When whence does not find an executable in the path, it returns without any message but the return code is not zero. This
fact can be exploited in shell scripts; for example:
whence myexec > /dev/null
RC=$?
if [ $RC -ne 0 ]; then
echo "myexec is not in the $PATH"
fi
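The whence command is a built-in of the Korn shell (and zsh), so the test above fails in a bash script. The following is a minimal sketch of the same check using the POSIX command -v built-in; the function name in_path is hypothetical, and note that command -v also reports aliases, functions, and built-ins, not only files in the PATH:

```shell
# in_path NAME: succeed if the shell can resolve NAME to something runnable.
in_path() {
    command -v "$1" >/dev/null 2>&1
}

# myexec is the placeholder name used in the example above.
if ! in_path myexec; then
    echo "myexec is not in the PATH"
fi
```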
A very useful option to which is the -i option, which displays the alias as well as the executable, if present. For example,
you saw the use of aliases at the beginning of this article. The ls command is actually an alias in my shell, and there
is an ls executable elsewhere in the system as well.
$ which ls
/bin/ls

$ which -i ls
alias ls='ls --color=tty'
/bin/ls
The default behavior of which is to show the first occurrence of the executable in the path. If the executable exists in
different directories in the path, the subsequent occurrences are ignored. You can see all the occurrences of the executable
via the -a option.
$ which java
/usr/bin/java

$ which -a java
/usr/bin/java
/home/oracle/oracle/product/11.1/db_1/jdk/jre/bin/java
top
The top command is probably the most useful one for an Oracle DBA managing a database on Linux. Say the system is slow and
you want to find out who is gobbling up all the CPU and/or memory. To display the top processes, you use the command top.
Note that unlike other commands, top does not print its output once and exit. It keeps running and refreshes the screen to
display new information. So, if you just issue top and leave the screen up, the most current information is always up. To stop
and exit to the shell, you can press Control-C.
$ top

18:46:13 up 11 days, 21:50, 5 users, load average: 0.11, 0.19, 0.18


151 processes: 147 sleeping, 4 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 12.5% 0.0% 6.7% 0.0% 0.0% 5.3% 75.2%
Mem: 1026912k av, 999548k used, 27364k free, 0k shrd, 116104k buff
758312k actv, 145904k in_d, 16192k in_c
Swap: 2041192k av, 122224k used, 1918968k free 590140k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
451 oracle 15 0 6044 4928 4216 S 0.1 0.4 0:20 0 tnslsnr
8991 oracle 15 0 1248 1248 896 R 0.1 0.1 0:00 0 top
1 root 19 0 440 400 372 S 0.0 0.0 0:04 0 init
2 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 keventd
3 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kapmd
4 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
7 root 15 0 0 0 0 SW 0.0 0.0 0:01 0 bdflush
5 root 15 0 0 0 0 SW 0.0 0.0 0:33 0 kswapd
6 root 15 0 0 0 0 SW 0.0 0.0 0:14 0 kscand
8 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kupdated
9 root 25 0 0 0 0 SW 0.0 0.0 0:00 0 mdrecoveryd

... output snipped ...

Let's examine the different types of information produced. The first line:
18:46:13 up 11 days, 21:50, 5 users, load average: 0.11, 0.19, 0.18
shows the current time (18:46:13), that the system has been up for 11 days, 21 hours, and 50 minutes, and that 5 users are
logged in. The load average of the system is shown (0.11, 0.19, 0.18) for the last 1, 5, and 15 minutes respectively. (By the
way, you can also get this information by issuing the uptime command.)
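In a script, the 1-minute load average can be picked out of that line. The following is a minimal sketch, assuming the "load average: a, b, c" layout shown above (some locales print decimal commas, which this pattern does not handle); the function name load_1min is hypothetical:

```shell
# load_1min: read an uptime-style line on stdin and print the first
# (1-minute) load average figure.
load_1min() {
    sed -n 's/.*load average: \([0-9.]*\),.*/\1/p'
}
```

Typical use would be uptime | load_1min.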
If the load average is not required, press the letter "l" (lowercase L); it will turn it off. To turn it back on press l
again. The second line:
151 processes: 147 sleeping, 4 running, 0 zombie, 0 stopped
shows the number of processes, running, sleeping, etc. The third and fourth lines:
CPU states: cpu user nice system irq softirq iowait idle
total 12.5% 0.0% 6.7% 0.0% 0.0% 5.3% 75.2%
show the CPU utilization details. The above line shows that user processes consume 12.5% and system consumes 6.7%. The user
processes include the Oracle processes. Press "t" to turn these three lines off and on. If there is more than one CPU, you
will see one line per CPU.
The next two lines:
Mem: 1026912k av, 1000688k used, 26224k free, 0k shrd, 113624k buff
758668k actv, 146872k in_d, 14460k in_c
Swap: 2041192k av, 122476k used, 1918716k free 591776k cached
show the memory available and utilized. Total memory is "1026912k av", approximately 1GB, of which only 26224k, or about 26MB,
is free. The swap space is 2GB, but it is barely used. To turn these lines off and on, press "m".
The rest of the display shows the processes in a tabular format. Here is the explanation of the columns:
Column    Description
PID       The process ID of the process
USER      The user running the process
PRI       The priority of the process
NI        The nice value: the higher the value, the lower the priority of the task
SIZE      Memory used by this process (code+data+stack)
RSS       The physical memory used by this process
SHARE     The shared memory used by this process
STAT      The status of this process, shown in code. Some major status codes are:
          R Running
          S Sleeping
          Z Zombie
          T Stopped
          You can also see second and third characters, which indicate:
          W Swapped-out process
          N Positive nice value
%CPU      The percentage of CPU used by this process
%MEM      The percentage of memory used by this process
TIME      The total CPU time used by this process
CPU       If this is a multi-processor system, this column indicates the ID of the CPU this
          process is running on.
COMMAND   The command issued by this process
While top is running, you can press a few keys to format the display as you like. Pressing the uppercase M key
sorts the output by memory usage. (Note that using lowercase m will turn the memory summary lines on or off at the top of the
display.) This is very useful when you want to find out who is consuming the memory. Here is sample output:
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
31903 oracle 15 0 75760 72M 72508 S 0.0 7.2 0:01 0 ora_smon_PRODB2
31909 oracle 15 0 68944 66M 64572 S 0.0 6.6 0:03 0 ora_mmon_PRODB2
31897 oracle 15 0 53788 49M 48652 S 0.0 4.9 0:00 0 ora_dbw0_PRODB2
Now that you have learned how to interpret the output, let's see how to use command line parameters.
The most useful is -d, which indicates the delay between the screen refreshes. To refresh every second, use top -d 1.
The other useful option is -p. If you want to monitor only a few processes, not all, you can specify only those after the -p
option. To monitor processes 13609, 13608 and 13554, issue:
top -p 13609 -p 13608 -p 13554
This will show results in the same format as the top command, but only those specific processes.
Tip for Oracle Users
It's probably needless to say that the top utility comes in very handy for analyzing the performance of database servers.
Here is a partial top output.
20:51:14 up 11 days, 23:55, 4 users, load average: 0.88, 0.39, 0.27
113 processes: 110 sleeping, 2 running, 1 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 1.0% 0.0% 5.6% 2.2% 0.0% 91.2% 0.0%
Mem: 1026912k av, 1008832k used, 18080k free, 0k shrd, 30064k buff
771512k actv, 141348k in_d, 13308k in_c
Swap: 2041192k av, 66776k used, 1974416k free 812652k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16143 oracle 15 0 39280 32M 26608 D 4.0 3.2 0:02 0 oraclePRODB2...
5 root 15 0 0 0 0 SW 1.6 0.0 0:33 0 kswapd

... output snipped ...

Let's analyze the output carefully. The first thing you should notice is the "idle" column under CPU states; it's 0.0%,
meaning the CPU is completely occupied doing something. The question is, doing what? Move your attention to the column
"system", slightly to the left; it shows 5.6%. So the system itself is not doing much. Go further left to the column marked
"user", which shows 1.0%. Since user processes include Oracle as well, Oracle is not consuming the CPU cycles. So, what's
eating up all the CPU?
The answer lies in the same line, just to the right under the column "iowait", which indicates 91.2%. This explains it all:
the CPU is waiting for IO 91.2% of the time.
So why so much IO wait? The answer lies in the display. Note the PID of the highest consuming process: 16143. You can use the
following query to determine what the process is doing:
select s.sid, s.username, s.program
from v$session s, v$process p
where p.spid = '16143'
and p.addr = s.paddr
/

SID USERNAME PROGRAM
--- -------- -----------------------------
159 SYS      rman@prolin2 (TNS V1-V3)
The rman process is responsible for the I/O waits that keep the CPU occupied. This information helps you determine the next
course of action.
skill and snice
From the previous discussion you learned how to identify a process that is consuming resources. What if you find that a process
is consuming a lot of CPU and memory, but you don't want to kill it? Consider the top output below:
$ top -c -p 16514

23:00:44 up 12 days, 2:04, 4 users, load average: 0.47, 0.35, 0.31


1 processes: 1 sleeping, 0 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 0.0% 0.6% 8.7% 2.2% 0.0% 88.3% 0.0%
Mem: 1026912k av, 1010476k used, 16436k free, 0k shrd, 52128k buff
766724k actv, 143128k in_d, 14264k in_c
Swap: 2041192k av, 83160k used, 1958032k free 799432k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16514 oracle 19 4 28796 26M 20252 D N 7.0 2.5 0:03 0 oraclePRODB2...
Now that you have confirmed that process 16514 is consuming a lot of resources, you can "freeze" it, but not kill it, using
the skill command.
$ skill -STOP 16514
After this, check the top output:
23:01:11 up 12 days, 2:05, 4 users, load average: 1.20, 0.54, 0.38
1 processes: 0 sleeping, 0 running, 0 zombie, 1 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 2.3% 0.0% 0.3% 0.0% 0.0% 2.3% 94.8%
Mem: 1026912k av, 1008756k used, 18156k free, 0k shrd, 3976k buff
770024k actv, 143496k in_d, 12876k in_c
Swap: 2041192k av, 83152k used, 1958040k free 851200k cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16514 oracle 19 4 28796 26M 20252 T N 0.0 2.5 0:04 0 oraclePRODB2...
The CPU is now 94% idle, up from 0%. The process is effectively frozen. After some time, you may want to revive the process
from its coma:
$ skill -CONT 16514
This approach is immensely useful for temporarily freezing processes to make room for more important processes to complete.
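On systems where skill is not installed (its own manual page describes it as obsolete), the same freeze and thaw can be done for a single PID with the standard kill command, since STOP and CONT are ordinary signals. The following is a minimal sketch; the function names pause_pid and resume_pid are hypothetical:

```shell
# pause_pid PID: suspend one process without terminating it.
pause_pid() {
    kill -STOP "$1"
}

# resume_pid PID: let a previously suspended process continue.
resume_pid() {
    kill -CONT "$1"
}
```

For the example above, pause_pid 16514 followed later by resume_pid 16514 has the same effect as the two skill commands.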
The command is very versatile. If you want to stop all processes of the user "oracle", only one command does it all:
$ skill -STOP oracle
You can use a user, a PID, a command or terminal id as argument. The following stops all rman commands.
$ skill -STOP rman
As you can see, skill works out what kind of argument you entered (a process ID, user ID, or command) and acts appropriately.
This may cause an issue in some cases, where a user and a command have the same name. The best example is the "oracle"
process, which is typically run by the user "oracle". So, when you want to stop the process called "oracle" and you issue:
$ skill -STOP oracle
all the processes of user "oracle" stop, including the session you may be on. To be completely unambiguous you can optionally
add an option that specifies the type of the argument. To stop a command called oracle, you can give:
$ skill -STOP -c oracle
The command snice is similar. Instead of stopping a process it makes its priority a lower one. First, check the top output:
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
3 root 15 0 0 0 0 RW 0.0 0.0 0:00 0 kapmd
13680 oracle 15 0 11336 10M 8820 T 0.0 1.0 0:00 0 oracle
13683 oracle 15 0 9972 9608 7788 T 0.0 0.9 0:00 0 oracle
13686 oracle 15 0 9860 9496 7676 T 0.0 0.9 0:00 0 oracle
13689 oracle 15 0 10004 9640 7820 T 0.0 0.9 0:00 0 oracle
13695 oracle 15 0 9984 9620 7800 T 0.0 0.9 0:00 0 oracle
13698 oracle 15 0 10064 9700 7884 T 0.0 0.9 0:00 0 oracle
13701 oracle 15 0 22204 21M 16940 T 0.0 2.1 0:00 0 oracle
Now, drop the priority of the processes of "oracle" by four points. Note that the higher the number, the lower the priority.
$ snice +4 -u oracle

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
16894 oracle 20 4 38904 32M 26248 D N 5.5 3.2 0:01 0 oracle
Note how the NI column (for nice values) is now 4 and the priority is now set to 20, instead of 15. This is quite useful in
reducing priorities.
Arup Nanda ( arup@proligence.com ) has been an Oracle DBA for more than 12 years, handling all aspects of database
administration, from performance tuning to security and disaster recovery. He is a coauthor of PL/SQL for DBAs (O'Reilly
Media, 2005), was Oracle Magazine's DBA of the Year in 2003, and is an Oracle ACE.
Guide to Advanced Linux Command Mastery, Part 3: Resource Management

by Arup Nanda
Published January 2009
In this installment, learn advanced Linux commands for monitoring physical components
A Linux system comprises several key physical components such as CPU, memory, network card, and storage devices. To
effectively manage a Linux environment, you should be able to measure the various metrics of these resources (how much
each component is processing, whether there is a bottleneck, and so on) with reasonable accuracy.
In the other parts of this series you learned some commands for measuring metrics at a macro level. In this installment,
however, you will learn advanced Linux commands for monitoring physical components. Specifically, you will learn
about the commands in the following categories:
Component Commands
Memory free, vmstat, mpstat, iostat, sar
CPU vmstat, mpstat, iostat, sar
I/O vmstat, mpstat, iostat, sar
Processes ipcs, ipcrm
As you can see, some commands appear in more than one category. This is because the commands can perform
many tasks. Some commands are better suited to some components (e.g. iostat for I/O), but you should understand the
differences in their workings and use the ones you are more comfortable with.
In most cases, a single command may not be enough to understand what really is going on. You should know multiple commands
to get the information you want.
free
One common question is, "How much memory is being used by my applications and various server, user, and system
processes?" Or, "How much memory is free right now?" If the memory used by the running processes is more than the available
RAM, the processes are moved to swap. So an ancillary question is, "How much swap is being used?"
The free command answers all those questions. What's more, a very useful option, -m, shows free memory in megabytes:
# free -m
total used free shared buffers cached
Mem: 1772 1654 117 0 18 618
-/+ buffers/cache: 1017 754
Swap: 1983 1065 918
The above output shows that the system has 1,772 MB of RAM, of which 1,654 MB is being used, leaving 117 MB of free
memory. The second line shows used and free memory once buffers and cache are set aside: 1,017 MB is used by applications,
and 754 MB would be available if the buffers and cache were released. The third line shows swap
utilization.
To show the same in kilobytes and gigabytes, replace the -m option with -k or -g respectively. You can get down to byte level as
well, using the -b option.
# free -b
total used free shared buffers cached
Mem: 1858129920 1724039168 134090752 0 18640896 643194880
-/+ buffers/cache: 1062203392 795926528
Swap: 2080366592 1116721152 963645440
The -t option shows the total at the bottom of the output (sum of physical memory and swap):
# free -m -t
total used free shared buffers cached
Mem: 1772 1644 127 0 16 613
-/+ buffers/cache: 1014 757
Swap: 1983 1065 918
Total: 3756 2709 1046
Although free does not show the percentages, we can extract and format specific parts of the output to show used memory as a
percentage of the total only:
# free -m | grep Mem | awk '{print ($3 / $2)*100}'
98.7077
This comes in handy in shell scripts where the specific numbers are important. For instance, you may want to trigger an alert
when the percentage of free memory falls below a certain threshold.
Similarly, to find the percentage of swap used, you can issue:
# free -m | grep -i Swap | awk '{print ($3 / $2)*100}'
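The alert idea mentioned above can be sketched as a pair of small functions that read free -m output from standard input. This is only an illustration: the function names mem_used_pct and mem_alert, the message text, and the threshold of 90 are all assumptions, and the percentage is truncated to a whole number.

```shell
# mem_used_pct: read `free -m` output on stdin and print used memory
# as a whole-number percentage of total (fields 3 and 2 of the Mem: line).
mem_used_pct() {
    awk '/^Mem:/ { printf "%d\n", ($3 / $2) * 100 }'
}

# mem_alert THRESHOLD: print a warning when the used percentage is at or
# above THRESHOLD; print nothing otherwise.
mem_alert() {
    pct=$(mem_used_pct)
    if [ "$pct" -ge "$1" ]; then
        echo "ALERT: memory ${pct}% used (threshold $1%)"
    fi
}

# Typical use (hypothetical threshold): free -m | mem_alert 90
```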
You can use free to watch the memory load exerted by an application. For instance, check the free memory before starting the
backup application and then check it immediately after starting. The difference could be attributed to the consumption by the
backup application.
Usage for Oracle Users
So, how can you use this command to manage the Linux server running your Oracle environment? One of the most common
causes of performance issues is the lack of memory, causing the system to swap memory areas to disk temporarily.
Some degree of swapping is probably inevitable, but a lot of swapping is indicative of a lack of free memory.
You can use free to get the free memory information now and follow it up with the sar command (shown later) to check
the historical trend of the memory and swap consumption. If the swap usage is temporary, it's probably a one-time spike; but if
it's pronounced over a period of time, you should take notice. There are a few obvious and possible suspects of chronic
memory overloads:
A large SGA that is bigger than the memory available
Very large allocation on PGA
Some process with bugs that leaks memory
For the first case, you should make sure the SGA is smaller than the available memory. A general rule of thumb is to use about
40 percent of the physical memory for SGA, but of course you should define that parameter based on your specific situation. In
the second case, you should try to reduce the large buffer allocation in queries. In the third case you should use the ps command
(described in an earlier installment of this series) to identify the specific process that might be leaking memory.
ipcs
When a process runs, it may attach to shared memory. A process can use one or many shared memory segments. Processes also send
messages to each other (inter-process communication, or IPC) and use semaphores. To display
information about shared memory segments, IPC message queues, and semaphores, you can use a single command: ipcs.
The -m option is very popular; it displays the shared memory segments.
# ipcs -m

------ Shared Memory Segments --------


key shmid owner perms bytes nattch status
0xc4145514 2031618 oracle 660 4096 0
0x00000000 3670019 oracle 660 8388608 108
0x00000000 327684 oracle 600 196608 2 dest
0x00000000 360453 oracle 600 196608 2 dest
0x00000000 393222 oracle 600 196608 2 dest
0x00000000 425991 oracle 600 196608 2 dest
0x00000000 3702792 oracle 660 926941184 108
0x00000000 491529 oracle 600 196608 2 dest
0x49d1a288 3735562 oracle 660 140509184 108
0x00000000 557067 oracle 600 196608 2 dest
0x00000000 1081356 oracle 600 196608 2 dest
0x00000000 983053 oracle 600 196608 2 dest
0x00000000 1835023 oracle 600 196608 2 dest
This output, taken on a server running Oracle software, shows the various shared memory segments. Each one is uniquely
identified by a shared memory ID, shown under the shmid column. (Later you will see how to use this column value.) The
owner, of course, shows the owner of the segment, the perms column shows the permissions (same as unix permissions),
and bytes shows the size in bytes.
The -u option shows a very quick summary:
# ipcs -mu

------ Shared Memory Status --------


segments allocated 25
pages allocated 264305
pages resident 101682
pages swapped 100667
Swap performance: 0 attempts 0 successes
The -l option shows the limits (as opposed to the current values):
# ipcs -ml

------ Shared Memory Limits --------


max number of segments = 4096
max seg size (kbytes) = 907290
max total shared memory (kbytes) = 13115392
min seg size (bytes) = 1
If you see the current values at or close to the limit values, you should consider raising the limit.
You can get a detailed picture of a specific shared memory segment using the shmid value. The -i option accomplishes that.
Here is how you will see details of the shmid 3702792:
# ipcs -m -i 3702792

Shared memory Segment shmid=3702792


uid=500 gid=502 cuid=500 cgid=502
mode=0660 access_perms=0660
bytes=926941184 lpid=12225 cpid=27169 nattch=113
att_time=Fri Dec 19 23:34:10 2008
det_time=Fri Dec 19 23:34:10 2008
change_time=Sun Dec 7 05:03:10 2008
Later you will see an example of how to interpret the above output.
The -s option shows the semaphores in the system:
# ipcs -s

------ Semaphore Arrays --------


key semid owner perms nsems
0x313f2eb8 1146880 oracle 660 104
0x0b776504 2326529 oracle 660 154
... and so on ...
This shows some valuable data. It shows the semaphore array with the ID 1146880 has 104 semaphores, and the other one
has 154. If you add them up, the total has to be below the system-wide maximum defined by the kernel (the SEMMNS value, the
second field of the kernel.sem parameter). While installing Oracle Database software, the pre-install checker checks the
semaphore settings. Later, when the
system attains steady state, you can check the actual utilization and then adjust the kernel value accordingly.
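Adding up the nsems column by hand gets tedious when there are many arrays. The following is a minimal sketch that totals it from ipcs -s output read on standard input, assuming the five-column layout shown above with keys starting in 0x; the function name sem_total is hypothetical:

```shell
# sem_total: read `ipcs -s` output on stdin and print the sum of the
# nsems column (field 5) over all rows whose key starts with 0x.
sem_total() {
    awk '$1 ~ /^0x/ { total += $5 } END { print total + 0 }'
}
```

Running ipcs -s | sem_total gives the number of semaphores in use; on Linux the system-wide limit to compare it with is the second number in /proc/sys/kernel/sem.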
Usage for Oracle Users
How can you find out the shared memory segments used by the Oracle Database instance? To get that, use the oradebug
command. First connect to the database as sysdba:
# sqlplus / as sysdba
In SQL*Plus, use the oradebug command as shown below:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
Information written to trace file.
To find out the name of the trace file:
SQL> oradebug TRACEFILE_NAME
/opt/oracle/diag/rdbms/odba112/ODBA112/trace/ODBA112_ora_22544.trc
Now, if you open that trace file, you will see the shared memory IDs. Here is an excerpt from the file:
Area #0 `Fixed Size' containing Subareas 0-0
Total size 000000000014613c Minimum Subarea size 00000000
Area Subarea Shmid Stable Addr Actual Addr
0 0 17235970 0x00000020000000 0x00000020000000
Subarea size Segment size
0000000000147000 000000002c600000
Area #1 `Variable Size' containing Subareas 4-4
Total size 000000002bc00000 Minimum Subarea size 00400000
Area Subarea Shmid Stable Addr Actual Addr
1 4 17235970 0x00000020800000 0x00000020800000
Subarea size Segment size
000000002bc00000 000000002c600000
Area #2 `Redo Buffers' containing Subareas 1-1
Total size 0000000000522000 Minimum Subarea size 00000000
Area Subarea Shmid Stable Addr Actual Addr
2 1 17235970 0x00000020147000 0x00000020147000
Subarea size Segment size
0000000000522000 000000002c600000
... and so on ...

The shared memory ID is the value under the Shmid column (17235970 in this excerpt). You can use this shared memory ID to get
the details of the shared memory:
# ipcs -m -i 17235970
Another useful observation is the value of lpid, the process ID of the process that last touched the shared memory segment. To
demonstrate the value in that attribute, use SQL*Plus to connect to the instance from a different session.
# sqlplus / as sysdba
In that session, find out the PID of the server process:
SQL> select spid from v$process
2 where addr = (select paddr from v$session
3 where sid =
4 (select sid from v$mystat where rownum < 2)
5 );

SPID
------------------------
13224
Now re-execute the ipcs command against the same shared memory segment:
# ipcs -m -i 17235970

Shared memory Segment shmid=17235970


uid=500 gid=502 cuid=500 cgid=502
mode=0660 access_perms=0660
bytes=140509184 lpid=13224 cpid=27169 nattch=113
att_time=Fri Dec 19 23:38:09 2008
det_time=Fri Dec 19 23:38:09 2008
change_time=Sun Dec 7 05:03:10 2008
Note the value of lpid, which was changed to 13224, from the original value 12225. The lpid shows the PID of the last process
that touched the shared memory segment, and you saw how that value changes.
The ipcs command by itself only reports information. The next command, ipcrm, allows you to act on that output, as you will see
in the next section.
ipcrm
Now that you have identified the shared memory and other IPC metrics, what do you do with them? You saw some usage earlier,
such as identifying the shared memory used by Oracle, making sure the kernel parameter for shared memory is set, and so on.
Another common application is to remove the shared memory, the IPC message queue, or the semaphore arrays.
To remove a shared memory segment, note its shmid from the ipcs command output. Then use the -m option to remove the
segment. To remove the segment with ID 3735562, use:
# ipcrm -m 3735562
This will remove the shared memory segment. You can also use ipcrm to remove semaphores and IPC message queues (using the -s
and -q options).
Usage for Oracle Users
Sometimes when you shut down the database instance, the shared memory segments may not be completely cleaned up by the
Linux kernel. The shared memory left behind is not useful, but it hogs system resources, making less memory available to
the other processes. In that case, you can check for any lingering shared memory segments owned by the oracle user and then
remove them, if any, using the ipcrm command.
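One way to shortlist candidates is to look for segments that no process is attached to. The following is a minimal sketch that filters ipcs -m output read on standard input, assuming the column layout shown earlier (key, shmid, owner, perms, bytes, nattch); the function name unattached_shm is hypothetical, and a segment with nattch of 0 is only a candidate, so confirm the instance is really down before removing anything:

```shell
# unattached_shm OWNER: read `ipcs -m` output on stdin and print the shmid
# of every segment owned by OWNER whose nattch (field 6) is 0.
unattached_shm() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner && $6 == 0 { print $2 }'
}
```

Running ipcs -m | unattached_shm oracle prints one shmid per line; each can then be reviewed with ipcs -m -i and, if appropriate, removed with ipcrm.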
vmstat
When called, the grand-daddy of all memory and process related displays, vmstat, continuously runs and posts its information. It
takes two arguments:
# vmstat <interval> <count>
<interval> is the interval in seconds between two runs. <count> is the number of repetitions vmstat makes. Here is a sample
when we want vmstat to run every five seconds and stop after the tenth run. Every line in the output comes after five seconds
and shows the stats at that time.
# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 1087032 132500 15260 622488 89 19 9 3 0 0 4 10 82 5
0 0 1087032 132500 15284 622464 0 0 230 151 1095 858 1 0 98 1
0 0 1087032 132484 15300 622448 0 0 317 79 1088 905 1 0 98 0
... shows up to 10 times ...
The output shows a lot about the system resources. Let's examine them in detail:
procs Shows the number of processes
r Processes waiting to be run. The higher the load on the system, the greater the number of processes
waiting to get CPU cycles to run.
b Uninterruptible sleeping processes, also known as blocked processes. These processes are
most likely waiting for I/O but could be waiting for something else too.
Sometimes there is another column as well, under heading w, which shows the number of processes that can be run but have
been swapped out to the swap area.
The numbers under b should be close to 0. If the number under w is high, you may need more memory.
The next block shows memory metrics:
swpd Amount of virtual memory or swapped memory (in KB)
free Amount of free physical memory (in KB)
buff Amount of memory used as buffers (in KB)
cache Kilobytes of physical memory used as cache
The buffer memory is used to store file metadata such as i-nodes and data from raw block devices. The cache memory is used
for file data itself.
The next block shows swap activity:
si Rate at which the memory is swapped back from the disk to the physical RAM (in KB/sec)
so Rate at which the memory is swapped out to the disk from physical RAM (in KB/sec)
The next block shows I/O activity:
bi Rate at which the system reads data from the block devices (blocks in, in blocks/sec)
bo Rate at which the system writes data to the block devices (blocks out, in blocks/sec)
The next block shows system related activities:
in Number of interrupts received by the system per second
cs Rate of context switching in the process space (in number/sec)
The final block, the information on CPU load, is probably the most used:
us Shows the percentage of CPU spent in user processes. The Oracle processes come in this
category.
sy Percentage of CPU spent running kernel (system) code, such as system calls made on behalf of processes
id Percentage of idle CPU
wa Percentage of time spent waiting for I/O
Let's see how to interpret these values. The first line of the output is an average of all the metrics since the system was
restarted. So, ignore that line since it does not show the current status. The other lines show the metrics in real time.
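If you capture vmstat output in a script, you may want to discard that first since-boot line automatically. Here is a minimal sketch, assuming the two-line header shown in the sample output above; the function name drop_boot_average is made up for this illustration.

```shell
# Drop the first data row of vmstat output (the since-boot average)
# while keeping the two header lines. Assumes the layout shown above:
# two header lines, then one row per sample.
drop_boot_average() {
  awk 'NR != 3'
}

# Hypothetical usage (requires vmstat to be installed):
#   vmstat 5 10 | drop_boot_average
```

On a long run vmstat may reprint its header periodically; this sketch leaves those repeated headers untouched.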
Ideally, the number of processes waiting or blocking (under the procs heading) should be 0 or close to 0. If they are high,
then the system does not have enough of some resource such as CPU, memory, or I/O. This information is useful while
diagnosing performance issues.
The data under swap indicates if excessive swapping is going on. If that is the case, then you may have inadequate physical
memory. You should either reduce the memory demand or increase the physical RAM.
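To watch for swapping without reading every line, you could filter the output for samples where si or so is non-zero. The sketch below is one possible way to do that; it finds the columns by their header names rather than by position, and the function name swap_activity is an assumption made for this example.

```shell
# Print only the vmstat samples in which swap-in (si) or swap-out (so)
# is non-zero. Column positions are read from the header row that
# starts with "r", so the filter does not depend on a fixed layout.
swap_activity() {
  awk '
    $1 == "r" {
      for (i = 1; i <= NF; i++) {
        if ($i == "si") si = i
        if ($i == "so") so = i
      }
      next
    }
    si && so && $1 ~ /^[0-9]+$/ && ($si + $so) > 0 {
      print "swap in=" $si " out=" $so
    }
  '
}

# Hypothetical usage (requires vmstat to be installed):
#   vmstat 5 10 | swap_activity
```

Remember that the first sample reflects the average since the system was restarted, so a hit on that row alone does not mean the system is swapping now.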
The data under io indicates the flow of data to and from the disks. This shows how much disk activity is going on, which does
not necessarily indicate a problem. If you see a large number under procs in the b column (processes being
blocked) and high I/O, the issue could be severe I/O contention.
The most useful information comes under the cpu heading. The id column shows idle CPU. If you subtract that number
from 100, you get the percentage of time the CPU is busy. Remember the top command described in another installment of this
series? That also shows a CPU free% number. The difference is: top shows that free% for each CPU whereas vmstat shows
the consolidated view for all CPUs.
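The subtraction described above is easy to automate. The following is a small sketch, not a definitive tool: it locates the id column from the header row and prints 100 minus that value for every sample. The function name cpu_busy is invented for this example.

```shell
# Print the busy CPU percentage (100 - id) for each vmstat sample.
# The id column is located by name in the header row that starts
# with "r", so extra columns in other vmstat versions do not matter.
cpu_busy() {
  awk '
    $1 == "r" {
      for (i = 1; i <= NF; i++) if ($i == "id") id = i
      next
    }
    id && $1 ~ /^[0-9]+$/ { print 100 - $id }
  '
}

# Hypothetical usage (requires vmstat to be installed):
#   vmstat 5 10 | cpu_busy
```

For the three sample rows shown earlier (id values of 82, 98 and 98), this yields 18, 2 and 2.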
The vmstat command also shows the breakdown of CPU usage: how much is used by the Linux system, how much by a user
process, and how much on waiting for I/O. From this breakdown you can determine what is contributing to CPU consumption. If
system CPU load is high, could there be some root process such as backup running?
The system load should be consistent over a period of time. If the system shows a high number, use the top command to
identify the system process consuming CPU.
Usage for Oracle Users
Oracle processes (the background processes and server processes) and the user processes (sqlplus, apache, etc.) come under
us. If this number is high, use top to identify the processes. If the wa column shows a high number, it indicates the I/O
system is unable to catch up with the amount of reading or writing. This could occasionally shoot up as a result of spikes in
heavy updates in the database causing log switch and a subsequent spike in archiving processes. But if it consistently shows a
large number, then you may have an I/O bottleneck.
I/O blockages in an Oracle database can cause serious problems. Apart from performance issues, the slow I/O could cause
controlfile writes to be slow, which may cause a process to wait to acquire a controlfile enqueue. If the wait is more than 900
seconds, and the waiter is a critical process like LGWR, it brings down the database instance.
If you see a lot of swapping, perhaps the SGA is sized too large to fit in the physical memory. You should either reduce the SGA
size or increase the physical memory.
mpstat
Another useful command to get CPU related stats is mpstat. Here is an example output:
# mpstat -P ALL 5 2
Linux 2.6.9-67.ELsmp (oraclerac1) 12/20/2008

10:42:38 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:42:43 PM all 6.89 0.00 44.76 0.10 0.10 0.10 48.05 1121.60
10:42:43 PM 0 9.20 0.00 49.00 0.00 0.00 0.20 41.60 413.00
10:42:43 PM 1 4.60 0.00 40.60 0.00 0.20 0.20 54.60 708.40

10:42:43 PM CPU %user %nice %system %iowait %irq %soft %idle intr/s
10:42:48 PM all 7.60 0.00 45.30 0.30 0.00 0.10 46.70 1195.01
10:42:48 PM 0 4.19 0.00 2.20 0.40 0.00 0.00 93.21 1034.53
10:42:48 PM 1 10.78 0.00 88.22 0.40 0.00 0.00 0.20 160.48

Average: CPU %user %nice %system %iowait %irq %soft %idle intr/s
Average: all 7.25 0.00 45.03 0.20 0.05 0.10 47.38 1158.34
Average: 0 6.69 0.00 25.57 0.20 0.00 0.10 67.43 724.08
Average: 1 7.69 0.00 64.44 0.20 0.10 0.10 27.37 434.17
It shows the various stats for the CPUs in the system. The -P ALL option directs the command to display stats for all the
CPUs, not just a specific one. The parameters 5 2 direct the command to run every 5 seconds, 2 times. The above
output shows the metrics for all the CPUs first (aggregated) and then for each CPU individually. Finally, the average for all the CPUs
is shown at the end.
Let's see what the column values mean:
%user Indicates the percentage of the processing for that CPU consumed by user processes. User
processes are non-kernel processes used for applications such as an Oracle database. In
this example output, the user CPU %age is very low.

%nice Indicates the percentage of CPU when a process was downgraded by the nice command. The
command nice has been described in an earlier installment. In brief, the command nice
changes the priority of a process.

%system Indicates the CPU percentage consumed by kernel processes

%iowait Shows the percentage of CPU time consumed by waiting for an I/O to occur

%irq Indicates the %age of CPU used to handle system interrupts

%soft Indicates %age consumed for software interrupts

%idle Shows the idle time of the CPU

intr/s Shows the total number of interrupts received by the CPU per second
You may be wondering about the purpose of the mpstat command when you have vmstat, described earlier. There is a huge
difference: mpstat can show the per-processor stats, whereas vmstat shows a consolidated view of all processors. So, it's
possible that a poorly written application not using a multi-threaded architecture runs on a multi-processor machine but does not
use all the processors. As a result, one CPU is overloaded while others remain free. You can easily diagnose these sorts of issues
via mpstat.
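To spot that kind of imbalance quickly, you could scan the Average: lines of the mpstat output for individual CPUs whose %idle falls below a limit you choose. This is only a sketch; the function name busy_cpus and the threshold argument are assumptions made for illustration.

```shell
# List individual CPUs whose average %idle is below the given limit.
# Reads "mpstat -P ALL <interval> <count>" output on stdin and looks
# only at the "Average:" block; the %idle column is found by name.
busy_cpus() {
  awk -v limit="$1" '
    $1 == "Average:" && $2 == "CPU" {
      for (i = 1; i <= NF; i++) if ($i == "%idle") idle = i
      next
    }
    idle && $1 == "Average:" && $2 != "all" && ($idle + 0) < (limit + 0) {
      print "CPU " $2 " idle " $idle
    }
  '
}

# Hypothetical usage (requires mpstat to be installed):
#   mpstat -P ALL 5 2 | busy_cpus 30
```

With the sample output above and a limit of 30, only CPU 1 (27.37 percent idle on average) is reported, while CPU 0 (67.43 percent idle) is not.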
Usage for Oracle Users
Similar to vmstat, the mpstat command also produces CPU related stats so all the discussion related to CPU issues applies to
mpstat as well. When you see a low %idle figure, you know you have CPU starvation. When you see a higher %iowait figure,
you know there is some issue with the I/O subsystem under the current load. This information comes in very handy in
troubleshooting Oracle database performance.
iostat
A key part of the performance assessment is disk performance. The iostat command gives the performance metrics of the
storage interfaces.
# iostat
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
cciss/c0d0 4.85 34.82 130.69 307949274 1155708619
cciss/c0d0p1 0.08 0.21 0.00 1897036 3659
cciss/c0d0p2 18.11 34.61 130.69 306051650 1155700792
cciss/c0d1 0.96 13.32 19.75 117780303 174676304
cciss/c0d1p1 2.67 13.32 19.75 117780007 174676288
sda 0.00 0.00 0.00 184 0
sdb 1.03 5.94 18.84 52490104 166623534
sdc 0.00 0.00 0.00 184 0
sdd 1.74 38.19 11.49 337697496 101649200
sde 0.00 0.00 0.00 184 0
sdf 1.51 34.90 6.80 308638992 60159368
sdg 0.00 0.00 0.00 184 0
... and so on ...
The beginning portion of the output shows metrics such as CPU free and I/O waits as you have seen from the mpstat command.
The next part of the output shows very important metrics for each of the disk devices on the system. Let's see what these
columns mean:
Device The name of the device
tps Number of transfers per second, i.e. number of I/O operations per second. Note: this is just
the number of I/O operations; each operation could be huge or small.
Blk_read/s Number of blocks read from this device per second. Blocks are usually 512 bytes in
size. This is a better indicator of the disk's utilization.
Blk_wrtn/s Number of blocks written to this device per second
Blk_read Number of blocks read from this device so far. Be careful; this is not what is happening
right now. This many blocks have already been read from the device. It's possible that
nothing is being read now. Watch this for some time to see if there is a change.
Blk_wrtn Number of blocks written to the device so far
In a system with many devices, the output might scroll through several screens, making things a little bit difficult to examine,
especially if you are looking for a specific device. You can get the metrics for a specific device only by passing that device as a
parameter.
# iostat sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sdaj 1.58 31.93 10.65 282355456 94172401
The CPU metrics shown at the beginning may not be very useful. To suppress the CPU related stats shown in the beginning of
the output, use the -d option.

You can place optional parameters at the end to let iostat display the device stats in regular intervals. To get the stats for this
device every 5 seconds for 10 times, issue the following:
# iostat -d sdaj 5 10

You can display the stats in kilobytes instead of blocks using the -k option:

# iostat -k -d sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
sdaj 1.58 15.96 5.32 141176880 47085232
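The two outputs for sdaj are consistent with each other: 31.93 blocks/s of 512 bytes each is roughly the 15.96 kB/s shown with -k. If you need that conversion in a script, it could look like the sketch below, which assumes 512-byte blocks as described above; the function name blocks_to_kb is made up for this example.

```shell
# Convert a rate or count expressed in 512-byte blocks to kilobytes.
# The 512-byte block size is the assumption stated in the text above.
blocks_to_kb() {
  awk -v blocks="$1" 'BEGIN { printf "%.2f\n", blocks * 512 / 1024 }'
}

# Example: the Blk_read/s figure of cciss/c0d0 from the earlier output
blocks_to_kb 34.82
```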
While the above output can be helpful, there is a lot of information not readily displayed. For instance, one of the key causes of
disk issues is the disk service time, i.e. how fast the disk gets the data to the process that is asking for it. To get that level of
metrics, we have to get the extended stats on the disk, using the -x option.
# iostat -x sdaj
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

avg-cpu: %user %nice %sys %iowait %idle
15.71 0.00 1.07 3.30 79.91

Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdaj 0.00 0.00 1.07 0.51 31.93 10.65 15.96 5.32 27.01 0.01 6.26 6.00 0.95
Let's see what the columns mean:
Device The name of the device
rrqm/s The number of read requests merged per second. The disk requests are queued. Whenever
possible, the kernel tries to merge several requests to one. This metric measures the merge
requests for read transfers.
wrqm/s Similar to reads, this is the number of write requests merged.
r/s The number of read requests per second issued to this device
w/s Likewise, the number of write requests per second
rsec/s The number of sectors read from this device per second
wsec/s The number of sectors written to the device per second
rkB/s Data read per second from this device, in kilobytes per second
wkB/s Data written to this device, in kilobytes per second
avgrq-sz Average size of the requests issued to the device, in sectors
avgqu-sz Average length of the request queue for this device
await Average elapsed time (in milliseconds) for the device for I/O requests. This is a sum of
service time + waiting time in the queue.
svctm Average service time (in milliseconds) of the device
%util Bandwidth utilization of the device. If this is close to 100 percent, the device is saturated.
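Since await is the sum of the service time and the time spent waiting in the queue, subtracting svctm from await gives the queue wait per device. The sketch below shows one way to compute it from iostat -x output. It assumes both the await and svctm columns are present, as in the layout above (other sysstat versions may arrange or name the columns differently), and the function name queue_wait is hypothetical.

```shell
# For each device in "iostat -x" output, print the time requests
# spend waiting in the queue: await minus svctm (milliseconds).
# Both columns are located by name in the Device header row.
queue_wait() {
  awk '
    $1 == "Device:" || $1 == "Device" {
      for (i = 1; i <= NF; i++) {
        if ($i == "await") a = i
        if ($i == "svctm") s = i
      }
      next
    }
    a && s && NF >= a && NF >= s { printf "%s %.2f\n", $1, $a - $s }
  '
}

# Hypothetical usage (requires iostat to be installed):
#   iostat -x | queue_wait
```

For sdaj in the output above (await 6.26, svctm 6.00) the queue wait works out to 0.26 ms, so almost all of the elapsed time is service time.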
Well, that's a lot of information and may present a challenge as to how to use it effectively. The next section shows how to use
the output.
How to Use It
You can use a combination of the commands to get some meaningful information from the output. Remember, disks could be slow
in serving the requests from the processes. The amount of time the disk takes to service a request is called service
time. If you want to find out the disks with the highest service times, you issue:
# iostat -x | sort -nrk13
sdat 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 18.80 0.00 64.06 64.05 0.00
sdv 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 17.16 0.00 18.03 17.64 0.00
sdak 0.00 0.00 0.00 0.14 0.00 1.11 0.00 0.55 8.02 0.00 17.00 17.00 0.24
sdm 0.00 0.00 0.00 0.19 0.01 1.52 0.01 0.76 8.06 0.00 16.78 16.78 0.32
... and so on ...
This shows that the disk sdat has the highest service time (64.05 ms). Why is it so high? There could be many possibilities but
three are most likely:
1. The disk gets a lot of requests so the average service time is high.
2. The disk is being utilized to the maximum possible bandwidth.
3. The disk is inherently slow.
Looking at the output we see that reads/sec and writes/sec are 0.00 (almost nothing is happening), so we can rule out #1. The
utilization is also 0.00% (the last column), so we can rule out #2. That leaves #3. However, before we draw a conclusion that the
disk is inherently slow, we need to observe that disk a little more closely. We can examine that disk alone every 5 seconds for
10 times.
# iostat -x sdat 5 10
If the output shows the same average service time, read rate and utilization, we can conclude that #3 is the most likely factor. If
they change, then we can get further clues to understand why the service time is high for this device.
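Sorting on a fixed field number such as 13 works only as long as svctm really is the 13th column. As a hedged alternative, the sketch below looks up the column by its header name before sorting; the function name top_by_column is an assumption made for illustration.

```shell
# Rank devices by a named column of "iostat -x" output, highest first.
# $1 is the column header to sort on (for example svctm or rsec/s).
# Lines before the Device header (banner, avg-cpu) are ignored.
top_by_column() {
  awk -v col="$1" '
    $1 == "Device:" || $1 == "Device" {
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      next
    }
    idx && NF >= idx { print $idx, $1 }
  ' | sort -nr
}

# Hypothetical usage (requires iostat to be installed):
#   iostat -x | top_by_column svctm
```

Each output line carries the value first and the device name second, so the worst device appears at the top.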
Similarly, you can sort on the read rate column to display the disks with the highest read rates.
# iostat -x | sort -nrk6
sdj 0.00 0.00 1.86 0.61 56.78 12.80 28.39 6.40 28.22 0.03 10.69 9.99 2.46
sdah 0.00 0.00 1.66 0.52 50.54 10.94 25.27 5.47 28.17 0.02 10.69 10.00 2.18
sdd 0.00 0.00 1.26 0.48 38.18 11.49 19.09 5.75 28.48 0.01 3.57 3.52 0.61
... and so on ...

The information helps you to locate a disk that is "hot", that is, subject to a lot of reads or writes. If the disk is indeed hot, you
should identify the reason for that; perhaps a filesystem defined on the disk is subject to a lot of reading. If that is the case, you
should consider striping the filesystem across many disks to distribute the load, minimizing the possibility that one specific disk
will be hot.
sar
From the earlier discussions, one common thread emerges: Getting real time metrics is not the only important thing; the
historical trend is equally important.
Furthermore, consider this situation: how many times has someone reported a performance problem, but when you dive in to
investigate, everything is back to normal? Performance issues that have occurred in the past are difficult to diagnose without
any specific data as of that time. Finally, you will want to examine the performance data over the past few days to decide on
some settings or to make adjustments.
The sar utility accomplishes that goal. sar stands for System Activity Reporter; it records the metrics of the key
components of the Linux system (CPU, memory, disks, network, etc.) in a special place: the directory /var/log/sa. The data is
recorded for each day in a file named sa<nn>, where <nn> is the two-digit day of the month. For instance, the file sa27 holds the
data for the 27th of that month. This data can be queried with the sar command.
The simplest way to use sar is to use it without any arguments or options. Here is an example:
# sar
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

12:00:01 AM CPU %user %nice %system %iowait %idle
12:10:01 AM all 14.99 0.00 1.27 2.85 80.89
12:20:01 AM all 14.97 0.00 1.20 2.70 81.13
12:30:01 AM all 15.80 0.00 1.39 3.00 79.81
12:40:01 AM all 10.26 0.00 1.25 3.55 84.93
... and so on ...
The output shows the CPU related metrics collected in 10 minute intervals. The columns mean:
CPU The CPU identifier; all means all the CPUs
%user The percentage of CPU used for user processes. Oracle processes come under this
category.
%nice The %age of CPU utilization while executing under nice priority
%system The %age of CPU executing system processes
%iowait The %age of CPU waiting for I/O
%idle The %age of CPU idle waiting for work
From the above output, you can see that the system has been well balanced; actually severely under-utilized (as seen from the
high %idle numbers). Going further through the output we see the following:
... continued from above ...
03:00:01 AM CPU %user %nice %system %iowait %idle
03:10:01 AM all 44.99 0.00 1.27 2.85 40.89
03:20:01 AM all 44.97 0.00 1.20 2.70 41.13
03:30:01 AM all 45.80 0.00 1.39 3.00 39.81
03:40:01 AM all 40.26 0.00 1.25 3.55 44.93
... and so on ...
This tells a different story: the system was loaded by some user processes between 3:00 and 3:40. Perhaps an expensive query
was executing; or perhaps an RMAN job was running, consuming all that CPU. This is where the sar command is useful: it
replays the recorded data, showing the data as of a certain time, not now. This is exactly what you need to accomplish the
three objectives outlined at the beginning of this section: getting historical data, finding usage patterns and understanding
trends.
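Rather than paging through a whole day of sar output to find a window like the one above, you could filter for the intervals where %idle dropped below a limit. The following is a minimal sketch under the assumption that the output has the CPU and %idle columns shown above; the function name busy_intervals and its threshold argument are invented for this example.

```shell
# Print the timestamp and %idle of every interval in sar CPU output
# where the aggregate ("all") idle percentage is below the limit.
# The CPU and %idle columns are located by name in the header row.
busy_intervals() {
  awk -v limit="$1" '
    /%idle/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "%idle") idle = i
        if ($i == "CPU") cpu = i
      }
      next
    }
    idle && cpu && $cpu == "all" && $1 != "Average:" && ($idle + 0) < (limit + 0) {
      stamp = $1
      for (i = 2; i < cpu; i++) stamp = stamp " " $i
      print stamp, $idle
    }
  '
}

# Hypothetical usage (requires sar and collected data):
#   sar | busy_intervals 50
```

With a limit of 50, the early-morning rows at around 80 percent idle are skipped and only the rows from the 3:00 AM window are listed.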
If you want to see a specific day's sar data, merely open sar with that file name, using the -f option as shown below (to open the
data for the 26th):
# sar -f /var/log/sa/sa26
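In a script you may want to build that file name from a day number. Here is a small sketch that follows the sa<nn> naming described above; the helper name sa_file is hypothetical, and the /var/log/sa location is the one given in this article (it may differ on your system).

```shell
# Build the sar data file path for a given day of the month,
# following the sa<nn> naming convention (two-digit day).
sa_file() {
  day=${1#0}   # strip one leading zero so 08 is not read as octal
  printf '/var/log/sa/sa%02d\n' "$day"
}

# Hypothetical usage:
#   sar -f "$(sa_file 26)"
# With GNU date (an assumption), the data for yesterday would be:
#   sar -f "$(sa_file "$(date -d yesterday +%d)")"
```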
It can also display data in real time, similar to vmstat or mpstat. To get the data every 5 seconds for 10 times, use:
# sar 5 10
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

01:39:16 PM CPU %user %nice %system %iowait %idle
01:39:21 PM all 20.32 0.00 0.18 1.00 78.50
01:39:26 PM all 23.28 0.00 0.20 0.45 76.08
01:39:31 PM all 29.45 0.00 0.27 1.45 68.83
01:39:36 PM all 16.32 0.00 0.20 1.55 81.93
... and so on, 10 times ...
Did you notice the all value under CPU? It means the stats were rolled up for all the CPUs. In a single processor system that is
fine; but in multi-processor systems you may want to get the stats for individual CPUs as well as an aggregate one. The -P ALL
option accomplishes that.
# sar -P ALL 2 2
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

01:45:12 PM CPU %user %nice %system %iowait %idle
01:45:14 PM all 22.31 0.00 10.19 0.69 66.81
01:45:14 PM 0 8.00 0.00 24.00 0.00 68.00
01:45:14 PM 1 99.00 0.00 1.00 0.00 0.00
01:45:14 PM 2 6.03 0.00 18.59 0.50 74.87
01:45:14 PM 3 3.50 0.00 8.50 0.00 88.00
01:45:14 PM 4 4.50 0.00 14.00 0.00 81.50
01:45:14 PM 5 54.50 0.00 6.00 0.00 39.50
01:45:14 PM 6 2.96 0.00 7.39 2.96 86.70
01:45:14 PM 7 0.50 0.00 2.00 2.00 95.50

01:45:14 PM CPU %user %nice %system %iowait %idle
01:45:16 PM all 18.98 0.00 7.05 0.19 73.78
01:45:16 PM 0 1.00 0.00 31.00 0.00 68.00
01:45:16 PM 1 37.00 0.00 5.50 0.00 57.50
01:45:16 PM 2 13.50 0.00 19.00 0.00 67.50
01:45:16 PM 3 0.00 0.00 0.00 0.00 100.00
01:45:16 PM 4 0.00 0.00 0.50 0.00 99.50
01:45:16 PM 5 99.00 0.00 1.00 0.00 0.00
01:45:16 PM 6 0.50 0.00 0.00 0.00 99.50
01:45:16 PM 7 0.00 0.00 0.00 1.49 98.51

Average: CPU %user %nice %system %iowait %idle
Average: all 20.64 0.00 8.62 0.44 70.30
Average: 0 4.50 0.00 27.50 0.00 68.00
Average: 1 68.00 0.00 3.25 0.00 28.75
Average: 2 9.77 0.00 18.80 0.25 71.18
Average: 3 1.75 0.00 4.25 0.00 94.00
Average: 4 2.25 0.00 7.25 0.00 90.50
Average: 5 76.81 0.00 3.49 0.00 19.70
Average: 6 1.74 0.00 3.73 1.49 93.03
Average: 7 0.25 0.00 1.00 1.75 97.01
This shows the CPU identifier (starting with 0) and the stats for each. At the very end of the output you will see the average of
runs against each CPU.
The command sar is not only for CPU related stats. It's useful to get the memory related stats as well. The -r option shows the
extensive memory utilization.
# sar -r
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008
12:00:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree kbswpused %swpused kbswpcad
12:10:01 AM 712264 32178920 97.83 2923884 25430452 16681300 95908 0.57 380
12:20:01 AM 659088 32232096 98.00 2923884 25430968 16681300 95908 0.57 380
12:30:01 AM 651416 32239768 98.02 2923920 25431448 16681300 95908 0.57 380
12:40:01 AM 651840 32239344 98.02 2923920 25430416 16681300 95908 0.57 380
12:50:01 AM 700696 32190488 97.87 2923920 25430416 16681300 95908 0.57 380
Let's see what each column means:
kbmemfree The free memory available in KB at that time
kbmemused The memory used in KB at that time
%memused %age of memory used
kbbuffers The memory used as buffers, in KB, at that time
kbcached The memory used as cache, in KB, at that time
kbswpfree The free swap space in KB at that time
kbswpused The swap space used in KB at that time
%swpused The %age of swap used at that time
kbswpcad The cached swap in KB at that time
At the very end of the output, you will see the average figure for the time period.
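In the sample above %memused is close to 98 percent, yet most of kbmemused is counted under kbbuffers and kbcached. A common rule of thumb, which is an assumption here and not something the sar output states, is to subtract buffers and cache from the used figure to estimate what the processes themselves hold. The sketch below does that arithmetic; the function name app_memory is made up for this example.

```shell
# For each "sar -r" interval, print the timestamp and
# kbmemused - kbbuffers - kbcached (in KB), a rough estimate of the
# memory held by processes. Columns are located by header name.
app_memory() {
  awk '
    /kbmemused/ {
      for (i = 1; i <= NF; i++) {
        if ($i == "kbmemfree") f = i
        if ($i == "kbmemused") u = i
        if ($i == "kbbuffers") b = i
        if ($i == "kbcached") c = i
      }
      next
    }
    f && u && b && c && $1 != "Average:" && NF >= c {
      stamp = $1
      for (i = 2; i < f; i++) stamp = stamp " " $i
      printf "%s %d\n", stamp, $u - $b - $c
    }
  '
}

# Hypothetical usage (requires sar and collected data):
#   sar -r | app_memory
```

For the 12:10:01 AM row this gives 32178920 - 2923884 - 25430452 = 3824584 KB, a much smaller figure than the raw kbmemused value suggests.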
You can also get specific memory related stats. The -B option shows the paging related activity.
# sar -B
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

12:00:01 AM pgpgin/s pgpgout/s fault/s majflt/s
12:10:01 AM 134.43 256.63 8716.33 0.00
12:20:01 AM 122.05 181.48 8652.17 0.00
12:30:01 AM 129.05 253.53 8347.93 0.00
... and so on ...
The columns show the metrics as of that time, not the current values:
pgpgin/s The amount of paging into the memory from disk, per second
pgpgout/s The amount of paging out to the disk from memory, per second
fault/s Page faults per second
majflt/s Major page faults per second
To get a similar output for swapping related activity, you can use the -W option.
# sar -W
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

12:00:01 AM pswpin/s pswpout/s
12:10:01 AM 0.00 0.00
12:20:01 AM 0.00 0.00
12:30:01 AM 0.00 0.00
12:40:01 AM 0.00 0.00
... and so on ...
The columns are probably self-explanatory; but here is the description of each anyway:
pswpin/s Pages of memory swapped back into the memory from disk, per second
pswpout/s Pages of memory swapped out to the disk from memory, per second
If you see a lot of swapping, you may be running low on memory. It's not a foregone conclusion, but it is
a strong possibility.
To get the disk device statistics, use the -d option:
# sar -d
Linux 2.6.9-55.0.9.ELlargesmp (prolin3) 12/27/2008

12:00:01 AM DEV tps rd_sec/s wr_sec/s
12:10:01 AM dev1-0 0.00 0.00 0.00
12:10:01 AM dev1-1 5.12 0.00 219.61
12:10:01 AM dev1-2 3.04 42.47 22.20
12:10:01 AM dev1-3 0.18 1.68 1.41
12:10:01 AM dev1-4 1.67 18.94 15.19
... and so on ...
Average: dev8-48 4.48 100.64 22.15
Average: dev8-64 0.00 0.00 0.00
Average: dev8-80 2.00 47.82 5.37
Average: dev8-96 0.00 0.00 0.00
Average: dev8-112 2.22 49.22 12.08
Here is the description of the columns. Again, they show the metrics at that time.
tps Transfers per second. Transfers are I/O operations. Note: this is just number of operations;
each operation may be large or small. So, this, by itself, does not tell the whole story.
rd_sec/s Number of sectors read from the disk per second
wr_sec/s Number of sectors written to the disk per second
To get the historical network statistics, you use the -n option:
# sar -n DEV | more
Linux 2.6.9-42.0.3.ELlargesmp (prolin3) 12/27/2008

12:00:01 AM IFACE rxpck/s txpck/s rxbyt/s txbyt/s rxcmp/s txcmp/s rxmcst/s
12:10:01 AM lo 4.54 4.54 782.08 782.08 0.00 0.00 0.00
12:10:01 AM eth0 2.70 0.00 243.24 0.00 0.00 0.00 0.99
12:10:01 AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth2 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth3 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth4 143.79 141.14 73032.72 38273.59 0.00 0.00 0.99
12:10:01 AM eth5 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth6 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM eth7 0.00 0.00 0.00 0.00 0.00 0.00 0.00
12:10:01 AM bond0 146.49 141.14 73275.96 38273.59 0.00 0.00 1.98
... and so on ...
Average: bond0 128.73 121.81 85529.98 27838.44 0.00 0.00 1.98
Average: eth8 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: eth9 3.52 6.74 251.63 10179.83 0.00 0.00 0.00
Average: sit0 0.00 0.00 0.00 0.00 0.00 0.00 0.00

In summary, you have these options for the sar command to get the metrics for the components:
Use this option to get the stats on:
-P Specific CPU(s)
-d Disks
-r Memory
-B Paging
-W Swapping
-n Network
What if you want to get all the available stats in one output? Instead of calling sar with all these options, you can use the -A
option, which shows all the stats stored in the sar files.
Conclusion
In summary, using this limited set of commands you can handle most of the tasks involved with resource management in a
Linux environment. I suggest you practice these in your environment to make yourself familiar with these commands, along with
the options described here.
In the next installments, you will learn how to monitor and manage the network. You will also learn various commands that help
you manage a Linux environment: finding out who has logged in, setting shell profiles, backing up using cpio and tar, and so on.
Further Reading
Guide to Advanced Linux Command Mastery, Part 1
Guide to Advanced Linux Command Mastery, Part 2
Arup Nanda (arup@proligence.com) has been exclusively an Oracle DBA for more than 12 years with experience spanning all
areas of Oracle Database technology, and was named "DBA of the Year" by Oracle Magazine in 2003. Arup is a frequent speaker
and writer at Oracle-related events and in journals, and an Oracle ACE Director. He co-authored four books, including RMAN Recipes
for Oracle Database 11g: A Problem Solution Approach.
