Eclipse Test & Performance Tools Platform, Part 2:

Monitor applications
Collect and analyze a variety of log files

Skill Level: Intermediate

Martin Streicher (martin.streicher@linux-mag.com)
Editor in Chief
Linux Magazine

25 Apr 2006

In this "Eclipse Test & Performance Tools Platform" tutorial series, learn how to use
the capabilities of the Eclipse Test & Performance Tools Platform (TPTP) to convert
application log files into a structured format. Then, using TPTP and other specialized
tools designed to process and analyze log files, you can quickly discern usage
patterns, performance profiles, and errors.

Section 1. Before you start

About this series


Writing code for an application is the first stage in the long process required to
deliver robust production-quality programs. Code must be tested to vet its operation
and accuracy. Code must often be profiled to remove bottlenecks that impede
performance and to remove wasteful or inadvertent use of resources, especially
memory. Code must also be monitored -- to pinpoint failures, of course, but also to
identify usage patterns, opportunities for further enhancement and optimization, and
attempted and actual intrusions.

The Eclipse Test & Performance Tools Platform (TPTP) is a software architecture
and several realized components (so far) that extend the Eclipse platform to include

Monitor applications Trademarks


© Copyright IBM Corporation 2006. All rights reserved. Page 1 of 39
developerWorks® ibm.com/developerWorks

test, performance, and monitoring tools. This "Eclipse Test & Performance Tools
Platform" series explores the capabilities of TPTP. Part 1 demonstrates how to
profile a Java™ technology application. Part 2 demonstrates how to capture and
transform arbitrary log files to the widely supported Common Base Events (CBE)
format. Part 3 explains how to manage application testing.

About this tutorial


This tutorial shows how to use the capabilities of the Eclipse TPTP to convert a
typical application log file into CBE occurrences. With a modicum of specifications
and some light coding to create a series of rules, you can transform virtually any log
file into a unified, structured format. Then, using the Eclipse TPTP and other
specialized tools, you can combine, process, and quickly discern usage patterns,
performance profiles, and errors.

Objectives
In this tutorial, you learn how to write an adapter to transform a typical Linux®
software service log file into CBE data. You create and debug the transform
incrementally with the Eclipse TPTP Adapter Configuration Editor, then use the
Generic Log Adapter (GLA) to input, transform, and emit the data.

Prerequisites
You should have experience with software development and the entire software
development life cycle, including testing and profiling. You should also have
experience installing software from the command line, and setting and managing
shell and system environment variables, such as the shell's PATH variable and the
Java CLASSPATH. Additionally, it's vital that you have some experience reading and
writing regular expressions. Acquaintance with Eclipse and the Eclipse user
interface (UI) paradigms is also beneficial.

System requirements
You can run Eclipse on any system that has a JVM, such as Solaris, Linux, Mac OS
X, or Windows. If you don't have a JVM and Eclipse installed on your system, make
sure you have at least 300 MB of disk space free for all the software. You also need
enough free physical memory to run the JVM. In general, 64 MB or more of free
physical memory is recommended.

You must install several software packages on your UNIX®, Linux, Mac OS X, or
Microsoft® Windows® system. You need a functioning Java Virtual Machine (JVM),
a copy of the Eclipse SDK, a copy of the Eclipse TPTP runtime, and several
prerequisites and co-requisites on which the Eclipse TPTP depends. You also need
a copy of the Eclipse TPTP GLA, which allows you to transform log files in a
stand-alone application or in your own application. Here's everything you need:

• Java technology from Sun Microsystems or from IBM
• Eclipse V3.1 Software Development Kit (SDK)
• Eclipse Modeling Framework (EMF) SDK V2.1
• XML Schema Infoset Model (XSD) SDK V2.1
• Version 1.1.1 of Eclipse Unified Modeling Language (UML) 2
• The Eclipse TPTP runtime
• GLA runtime

Section 2. Transforming and analyzing log files


To allow ongoing monitoring, a complex application -- and certainly an application
expected to run continuously -- is typically instrumented during development to emit
a log file, which is a record of application activity. Some activity can be detailed
internal diagnostics, which is information crucial for isolating a bug or untangling
interactions with other system and software components. Some activity logged might
be initiated by the application itself -- say, to read a configuration file or to open a
port for listening. Other activity might be generated by requests for service.

The problem: Ongoing monitoring for legacy applications


Depending on the application's purpose, a systems administrator might review the
program's corresponding log file from time to time -- when an error occurs or even in
real time to react to emergent events. Logs are often full of valuable historical
information, too. Think of the traffic and usage patterns found only in Apache HTTP
Server logs, for instance.

It would be ideal if all log files captured at least a minimum of information. It would
be even better -- certainly from a systems administrator's point of view -- if the format
of all log files were uniform. Consistency would make reading logs far easier, and
homogeneity would certainly facilitate (and reduce the cost of) the development of
automated tools that separate vital events from merely informational ones.

But uniformity is not reality. Applications differ greatly (as do underlying operating
system facilities and programming language libraries). Some applications ("legacy
applications") are entrenched and cannot be revised to conform.
And it's an ugly truth that expensive and scarce developer cycles are usually spent
on new features, not retrofits.

The solution: Transform log file data


Short of the ideal and realizing that one solution cannot ever fit all, it is far more
practical to transform log file data to meet evolving standards, de facto or otherwise,
and to apply state-of-the-art analysis tools. For example, the CBE format is part of
an effort to define a broad standard for recording, tracking, and analyzing events,
which are occurrences and situations that take place in computing systems. Many
tools exist to process and analyze CBE data, which is based on XML.

But while transformation from arbitrary log file to CBE may be practical, the process
may not be easy or inexpensive. Given the variety of applications and the sheer
number of log file formats, writing so many transforms can be a Herculean task in
itself.

The Eclipse TPTP GLA and Adapter Configuration Editor simplify the creation of
transforms, thereby easing the migration to CBE. The GLA applies an adapter
created by the Adapter Configuration Editor to a log file and yields CBE data. The
Adapter Configuration Editor can run a handmade Java class if need be -- a static
adapter -- or it can run a series of rules to divide the log file into records, fields, and
values and reassemble them as CBE data. The latter form of adapter is a
rules-based adapter and requires no coding. Better yet, the Adapter Configuration
Editor runs in Eclipse and provides a rich adapter development environment in which
you can incrementally define and test your adapter. Finally, you can choose to
integrate the GLA with your own code or use third-party tools, such as the IBM Log
and Trace Analyzer, to probe and investigate the resulting CBE event files.

This tutorial shows how to use the capabilities of the Eclipse TPTP GLA and Adapter
Configuration Editor to convert a typical Linux application log file to CBE events.
With a log file in hand and a little regular expression know-how, you can transform
the log into a unified, structured CBE format.

Section 3. Installing the prerequisite software and components

Before you can begin, you must install and set up the required software and
components (see Prerequisites).

Install J2RE V1.4


Download and install Version 1.4 or 1.5 (also called Version 5.0) of the Java 2
Runtime Environment (J2RE). (If your system already has J2RE V1.4 or later, you
can safely skip this step.)

Typically, the JRE is distributed as a self-extracting binary. Assuming that you
downloaded the J2RE package to your home directory, installation (on Linux) is
typically as easy as Listing 1.

Listing 1. J2RE V1.4 installation

% cd ~
% mkdir ~/java
% cd ~/java
% mv ~/jre-1_5_0_06-linux-i586.bin .
% chmod +x jre-1_5_0_06-linux-i586.bin
% ./jre-1_5_0_06-linux-i586.bin
...
% rm ./jre-1_5_0_06-linux-i586.bin
% ls -F
jre1.5.0_06/

The commands in Listing 1 install J2RE V1.5, but the steps to install J2RE V1.4 are
identical (except for the file name).

Install the Eclipse V3.1 SDK


Download the Eclipse V3.1 SDK that's appropriate for your platform. You can find
the SDK on the Eclipse Downloads page. Typically, installation is as easy as
unpacking the Eclipse tarball (.tar.gz) file into the directory of your choice.

For example, if you're using Linux, download the Eclipse V3.1 SDK tarball and
unpack it in a directory such as ~/java/ using the commands in Listing 2.

Listing 2. Eclipse V3.1 SDK installation

% cd ~/java
% mv ~/eclipse-SDK-3.1.1-linux-gtk.tar.gz .
% tar zxvf eclipse-SDK-3.1.1-linux-gtk.tar.gz
...
% rm eclipse-SDK-3.1.1-linux-gtk.tar.gz

To verify that you successfully installed Eclipse, remain in the directory where you
unpacked Eclipse, make sure the java executable is in your PATH, and run java
-jar eclipse/startup.jar. For example:

Listing 3. Verify installation

% export JAVA_DIR=$HOME/java
% export JAVA_HOME=$JAVA_DIR/jre1.5.0_06
% export PATH=$JAVA_HOME/bin:$PATH
% export CLASSPATH=$JAVA_HOME
% cd $JAVA_DIR
% java -jar eclipse/startup.jar

If Eclipse prompts you to choose a directory for your workspace, use
$HOME/java/workspace. This directory retains all the projects you create in Eclipse.
(Of course, if you have many projects, you can create other workspaces later,
perhaps to contain one project per workspace.) Now, quit Eclipse to install the
Eclipse TPTP, its prerequisites and co-requisites, and the GLA.

Install the TPTP and GLA runtime


The Eclipse TPTP runtime contains the software required to create, debug, and run
adapters. To install the Eclipse TPTP software, download the Eclipse TPTP and
GLA runtimes. Both are typically distributed in zip format. Move both files into the
directory that contains the J2RE and Eclipse and extract them (see Listing 4). If
you're prompted to overwrite any files, simply choose All.

Listing 4. Eclipse TPTP and GLA installation

% cd ~/java
% mv ~/tptp.runtime-TPTP-4.1.0.zip .
% mv ~/tptp.gla.runtime-TPTP-4.1.0.1.zip .
% unzip tptp.runtime-TPTP-4.1.0.zip
...
% unzip tptp.gla.runtime-TPTP-4.1.0.1.zip
...
% rm tptp.runtime-TPTP-4.1.0.zip
% rm tptp.gla.runtime-TPTP-4.1.0.1.zip
% ls -F
GenericLogAdapter/ eclipse/ jre1.5.0_06/

Install the EMF SDK V2.1


You must install EMF SDK V2.1 for TPTP to work properly.

Quit Eclipse if it's running and download the EMF SDK V2.1. Then change to the
directory that contains the Eclipse folder and run unzip emf-sdo-SDK-2.1.0.zip (see
Listing 5).

Listing 5. EMF SDK V2.1 installation

% cd $JAVA_DIR
% ls
eclipse jre1.5.0_06
% mv ~/emf-sdo-SDK-2.1.0.zip .
% unzip emf-sdo-SDK-2.1.0.zip
creating: eclipse/features/
creating: eclipse/features/org.eclipse.emf.ecore.sdo_2.1.0/
creating: eclipse/features/org.eclipse.emf_2.1.0/
inflating: ...
...
% rm emf-sdo-SDK-2.1.0.zip

Install the XSD SDK V2.1


As with the previous file, change to the directory that contains the Eclipse directory
and run unzip xsd-SDK-2.1.0.zip (see Listing 6).

Listing 6. XSD SDK V2.1 installation

% cd $JAVA_DIR
% mv ~/xsd-SDK-2.1.0.zip .
% unzip xsd-SDK-2.1.0.zip
% rm xsd-SDK-2.1.0.zip

If prompted to confirm the overwrite of any files, simply press y (lowercase) to
answer "yes" to each question.

Install the UML V2.0 Metamodel Implementation


To use the UML features of the Eclipse TPTP, you must install the UML V2.0
Metamodel Implementation. If using Eclipse V3.1.1, download Version 1.1.1 of UML
2 and unpack its archive file in the same directory that contains Eclipse (see Listing
7).

Listing 7. UML V2.0 Metamodel Implementation installation

% cd $JAVA_DIR
% mv ~/uml2-1.1.1.zip .
% unzip uml2-1.1.1.zip
...
% rm uml2-1.1.1.zip

Install the Agent Controller

The Agent Controller is a vital component of the Eclipse TPTP that allows Eclipse to
launch applications and interact with those applications to extract profiling data.
Download the Agent Controller runtime appropriate for your operating system. Next,
create a directory named tptpd in the same directory that contains Eclipse and
unpack the Agent Controller archive into that directory (see Listing 8).

Listing 8. Agent Controller installation

% mkdir $JAVA_DIR/tptpd
% cd $JAVA_DIR/tptpd
% mv ~/tptpdc.linux_ia32-TPTP-4.1.0.zip .
% unzip tptpdc.linux_ia32-TPTP-4.1.0.zip

If you see two errors like these:

Listing 9. Symbolic link warnings during extraction

linking: lib/libxerces-c.so
warning: symbolic link (lib/libxerces-c.so) failed
linking: lib/libxerces-c.so.24
warning: symbolic link (lib/libxerces-c.so.24) failed

recreate the two links manually by typing the following:

Listing 10. Recreating the symbolic links

% cd $JAVA_DIR/tptpd/lib
% rm libxerces-c.so libxerces-c.so.24
% ln -s libxerces-c.so.24.0 libxerces-c.so
% ln -s libxerces-c.so.24.0 libxerces-c.so.24

Add the Agent Controller directory

To use the Agent Controller, you must add its lib directory to your
LD_LIBRARY_PATH. For example, if you're running Linux and have adopted the
same directory structure shown in the steps above, you'd add $JAVA_DIR/tptpd/lib
as follows:

% export LD_LIBRARY_PATH=$JAVA_DIR/tptpd/lib:$LD_LIBRARY_PATH

You must also ensure that the contents of the Controller's lib and bin directories are
executable. To do that, run:

% chmod +x $JAVA_DIR/tptpd/{bin,lib}/*

Now, add the directory that contains the scripts that configure, start, and stop
the Agent Controller to your PATH:

% export PATH=$JAVA_DIR/tptpd/bin:$PATH

Configure the Agent Controller for your environment

Finally, you configure the Agent Controller to match your environment. Change to
the Agent Controller's bin directory, then run SetConfig.sh.

% cd $JAVA_DIR/tptpd/bin
% ./SetConfig.sh

When the configure script prompts you, accept the defaults. Running the configure
script creates the file config/serviceconfig.xml in the Agent Controller's hierarchy of
files.

Test the Agent Controller

To test the Agent Controller, run RAStart.sh. To stop the Controller, run RAStop.sh:

Listing 11. Starting and stopping the Agent Controller

% RAStart.sh
Starting Agent Controller
RAServer started successfully
% RAStop.sh
RAServer stopped, pid = 5891
RAServer stopped, pid = 5892
RAServer stopped, pid = 5893
RAServer stopped, pid = 5894
RAServer stopped, pid = 5895
RAServer stopped, pid = 5896
RAServer stopped, pid = 5897
RAServer stopped, pid = 5898
RAServer stopped, pid = 5899
RAServer stopped, pid = 5900
RAServer stopped, pid = 5901
RAServer stopped, pid = 5902
RAServer stopped, pid = 5904
RAServer stopped, pid = 5905
RAServer stopped, pid = 5906

Finished! Restart Eclipse and you should see a new button on the Eclipse toolbar
that looks like Figure 1. That's the TPTP Profile button -- the indication that your
installation of TPTP has been successful. You're ready to continue with the tutorial.

Figure 1. The TPTP Profile button

Section 4. Creating an adapter


The GLA uses an XML configuration file to control how it parses and transforms log
files, and how it emits data. A configuration file contains one or more contexts,
each of which defines how to transform one log file. In some cases, contexts
within a configuration file can run simultaneously.

The adapter configuration file


Begin by creating an adapter configuration file to process the Linux log file named
daemon.log. On the test system, which runs Debian Linux, daemon.log captures
messages from the POP3 (e-mail), THTTPD (the "trivial" HTTP server -- a small, fast
Web server that only serves static files), and MyDNS (a small, easy-to-configure
Domain Name System (DNS) server) daemons. Daemon.log also records when the
MySQL daemon starts and stops.

Listing 12 shows a snippet of the file with log entries created by the POP3 and
THTTPD servers.

Listing 12. A snippet of the Linux daemon.log file

Mar 2 07:24:54 db popa3d[8861]: \
Session from 66.27.187.89
Mar 2 07:24:55 db popa3d[8861]: \
Authentication passed for joan
Mar 2 07:24:55 db popa3d[8861]: \
1422 messages (11773432 bytes) loaded
Mar 2 07:24:57 db popa3d[8861]: \
0 (0) deleted, 1422 (11773432) left
Mar 2 07:26:28 db thttpd[7784]: \
up 3600 seconds, stats for 3600 seconds:
Mar 2 07:26:28 db thttpd[7784]: \
thttpd - 0 connections (0/sec), 0 max simultaneous
Mar 2 07:26:28 db thttpd[7784]: \
map cache - 0 allocated, 0 active (0 bytes)...
Mar 2 07:26:28 db thttpd[7784]: \
fdwatch - 1589 selects (0.441389/sec)
Mar 2 07:26:28 db thttpd[7784]: \
timers - 3 allocated, 3 active, 0 free
Mar 2 07:27:35 db popa3d[8911]: \
Session from 71.65.224.25
Mar 2 07:27:35 db popa3d[8911]: \
Authentication passed for martin
Mar 2 07:27:35 db popa3d[8911]: \
1350 messages (10880072 bytes) loaded
Mar 2 07:27:36 db popa3d[8911]: \
4 (11356) deleted, 1346 (10868716) left
Mar 2 07:29:54 db popa3d[8963]: \
Session from 66.27.187.89

Each context in an adapter configuration file defines six components: the context
instance, the sensor, the extractor, the parser, the formatter, and the outputter. The
context instance sets parameters for the general operation of the transformation,
including whether the log is appended to continuously and how frequently the log is
amended. The remaining five components (conceptually) act in sequence, reading
input, performing a task, and passing results on for further processing (except for
certain outputters, which simply write results to a file or to the console):

• The sensor reads the log file in pieces until it reaches the end of the file
and pauses. Then, when the sensor detects that the log file has grown, it
reads the additional data. The sensor passes its data to the next stage,
the extractor.
• The extractor reads data and divides it into individual records. One
regular expression defines what the start of a record looks like, and
another regular expression defines the end of record. Individual records,
when identified, are passed on to the parser for additional processing.
• The parser reads one record at a time from the extractor and
decomposes each into fields and values. Furthermore, the parser can
make decisions based on the content of a record and apply one or more
sets of rules to yield fields and values. For instance, if a log file indicates
the start, interim, and end of an event, the parser can decompose each
record into a set of fields and values unique to that event. Ultimately, the
parser's objective is to map fields and values in each log file entry to the
proper elements, attributes, and values in a CBE XML record. The
formatter reads the output of the parser.
• The formatter's job is simple: It reads the elements, attributes, and values
the parser creates, and it creates an object suitable for consumption by
the last stage in the context, the outputter.
• The outputter consumes objects from the formatter and emits them.
Outputters can emit XML to a file or to the console. They can also
create a new log file or pass the data to a daemon.

The next five sections describe how to define each of the six components of a
context.
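
To make the flow of data between the stages concrete, here is a minimal Python sketch of the same chain. It is not TPTP code and does not reflect how the GLA is implemented; the function names, the field names (source, msg), and the toy <event> element are all assumptions made for illustration. Each stage is a generator that consumes the output of the stage before it.

```python
from xml.sax.saxutils import quoteattr


def sensor(lines, max_blocking=10):
    """Sensor: hand the input on in blocks of at most max_blocking lines."""
    block = []
    for line in lines:
        block.append(line)
        if len(block) == max_blocking:
            yield block
            block = []
    if block:  # flush whatever is left at the end of the input
        yield block


def extractor(blocks):
    """Extractor: divide each block into records (here, one per non-empty line)."""
    for block in blocks:
        for line in block:
            record = line.rstrip("\n")
            if record:
                yield record


def parser(records):
    """Parser: decompose a record into fields (hypothetical field names)."""
    for record in records:
        source, _, msg = record.partition(": ")
        yield {"source": source, "msg": msg}


def formatter(field_sets):
    """Formatter: build the object the outputter consumes (a toy XML string)."""
    for fields in field_sets:
        yield "<event source=%s msg=%s/>" % (
            quoteattr(fields["source"]),
            quoteattr(fields["msg"]),
        )


def run_pipeline(lines):
    """Chain the stages and collect what would reach the outputter."""
    return list(formatter(parser(extractor(sensor(lines)))))


if __name__ == "__main__":
    log = ["Mar 2 07:24:54 db popa3d[8861]: Session from 66.27.187.89\n"]
    for event in run_pipeline(log):  # outputter: write to the console
        print(event)
```

Because every stage only sees the output of the previous one, you can test each stage in isolation, which mirrors the incremental way the Adapter Configuration Editor lets you build an adapter.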

Create an adapter configuration file


To begin, create a simple Eclipse project to contain the adapter configuration file:

1. Click File > New and expand Simple. Choose Project and click Next.

2. Name the project My Adapter and click Finish.

3. Click File > New > Other and expand Generic Log Adapter. Choose
Generic Log Adapter File and click Next.

4. Choose My Adapter and name the adapter file my.adapter. Click Next.

5. Choose a template for the log file you want to process with this adapter
(see Figure 2).
You can use a snippet of the actual log file you want to process or an
accurate representation of the log file -- say, from a detailed specification.
Click Browse, navigate to the file system, and open the template. After
making your selection, click Finish. Click Yes when prompted to switch
perspectives.

Figure 2. Choose a template that represents the log file to transform

Figure 3 shows the Generic Log Adapter perspective. As you can see, the UI
displays a context instance in which the sensor's properties point to the template log
file you just chose. The context instance also includes an extractor, a parser, a
formatter, and an outputter, which you must define further.

Figure 3. The Generic Log Adapter perspective

Configure the context


Each context instance describes how to process one log file. You can set several
options in a context instance. To see the options, click Context Instance below
Configuration. You should see a panel that resembles Figure 4.

Figure 4. Context instance options

You can edit the Description to capture the intent of this particular context. In
addition:

• If your log file is continuously updated, as is the case with daemon.log,
select the Continuous operation check box.
• Maximum idle time is the number of milliseconds a context should wait
for the log file to change before the context instance is shut down.
• Pause interval controls how long the context should wait after it reaches
the end of a log file.
• Because log files aren't always ASCII text, you can set the ISO language
code (using two lowercase letters), the ISO country code (using two
uppercase letters), and the file's Encoding (using a value from the
Internet Assigned Numbers Authority (IANA) character set registry). By
default, these parameters are set to en, US, and the default encoding of
the JVM.
• Finally, because some log files do not denote time zone, year, month, and
day and because CBEs require all four values, you can provide substitute
values in the Timezone GMT offset and Log file creation date fields.

Because daemon.log grows continuously, select the Continuous operation check
box. Because mail is typically polled often, set Maximum idle time and Pause
interval to 120. The test machine is located in Colorado, so the GMT offset is -7.
Daemon.log doesn't specify the year, so provide 2006 as a substitute. After
making these changes, save the file.

Section 5. Specifying the sensor


A sensor reads a log file and forwards the data collected to the extractor. The next
step is to specify how your sensor should work.

Specify how the sensor works


Click the sensor. Its properties are shown in Figure 5, which also shows the values
set for the daemon.log sensor.

Figure 5. Setting the sensor for daemon.log

Because daemon.log is a single file, you don't need to change the Sensor type
option. Use the Description field to record the sensor's purpose. Maximum blocking
defines the number of lines to read before passing the input along to the extractor.
Because entries in daemon.log tend to arrive in bursts of several lines, 10 is a
reasonable setting.
The value for Confidence buffer size dictates the size of a buffer to contain the last
n bytes of the log file. If the log file changes -- that is, the last n bytes differ from
what's retained in the Confidence buffer -- the sensor reads more input. The default
is 1,024 bytes, which is sufficient for this example.
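
The idea behind the Confidence buffer can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not the sensor's actual implementation; the class name TailSensor and its methods are hypothetical. The sketch remembers the last n bytes it saw and reads further input only when the current tail of the file differs.

```python
import os


def read_tail(path, size=1024):
    """Return the last `size` bytes of the file (fewer if the file is shorter)."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        length = handle.tell()
        handle.seek(max(0, length - size))
        return handle.read()


class TailSensor:
    """Hypothetical sensor that uses a confidence buffer to detect growth."""

    def __init__(self, path, confidence_size=1024):
        self.path = path
        self.confidence_size = confidence_size
        self.offset = 0        # how far into the file we have read so far
        self.confidence = b""  # last n bytes seen on the previous poll

    def poll(self):
        """Return bytes added since the last poll, or b'' if the tail is unchanged."""
        tail = read_tail(self.path, self.confidence_size)
        if tail == self.confidence:
            return b""  # the last n bytes match: assume nothing new was written
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
            self.offset = handle.tell()
        self.confidence = tail
        return data
```

A caller would invoke poll() once per pause interval. Note the limitation of comparing only the tail: if a log is longer than the buffer and appends data identical to what the buffer already holds, this sketch would miss the change, and it makes no attempt to handle a log that is truncated or rotated.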

Some logs append a footer to the end of the log file (each time new data is written).
Usually, this data is best ignored, so to skip the footer, specify the number of bytes
to skip in File footer size. Daemon.log doesn't have a footer, so the value is set to
0.

If you expand the Sensor type (by clicking on the arrow), you'll see two additional
properties: directory and fileName. These properties are initially set to the location
and name of your template log file, but you'll soon switch them to process live data.

Don't forget to save the configuration file after setting the sensor properties. And, in
general, always save the configuration file before you attempt to run the adapter.

Section 6. Editing the extractor


The role of the sensor is to collect input. The role of the extractor is to divide the
incoming input stream into individual records. (The next component in the chain --
the parser -- divides each record into fields.)

Configure the extractor properties


To edit the extractor, click Extractor. Its properties are shown in Figure 6. The
properties of the extractor specify the delimiters of each record and control whether
those delimiters should be included in the record passed on to the parser.

Figure 6. The extractor properties

In the example log file, daemon.log, each line of the log is a separate event. This
makes the extractor particularly easy to configure. (Figure 6 is the appropriate
configuration for daemon.log.)

• The Contains line breaks check box is cleared, because each line in
daemon.log is a record. However, if an entry were to span many lines, as
is the case with MySQL or IBM DB2® database logs, you'd select this
check box.
• The Replace line breaks check box is also cleared in this example. If the
log file contained line breaks, though, you could select this check box to
either delete each line break or replace each one with a special marker --
useful for parsing. To delete line breaks, simply select the check box; to
replace each line break with a token, select the check box and provide the
delimiter in the Line break symbol field. It's best to choose a symbol that
doesn't appear in the log file.
• The Start pattern and End pattern are regular expressions that describe
the start and end of each record. Here, where each line is a record, the
beginning of the line, or ^ (caret), marks the start of the record. The end
of the line, or $ (dollar sign), marks the end of each record. Because ^
and $ do not capture any content, neither need be included in the record
itself.

Save your work before continuing.

A MySQL example
For comparison, create another example extractor for MySQL's slow query log, a
special log used to capture suboptimal queries. Each entry in the slow query log
spans at least three lines (see Listing 13).

Listing 13. A snippet of MySQL's slow query log

# Time: 030207 15:03:33
# Query_time: 13 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
SELECT l FROM un WHERE ip='209.xx.xxx.xx';
# Time: 030207 15:03:42
# Query_time: 17 Lock_time: 1 Rows_sent: 0 Rows_examined: 0
SELECT l FROM un WHERE ip='214.xx.xxx.xx';
# Time: 030207 15:03:43
# Query_time: 57 Lock_time: 0 Rows_sent: 2117 Rows_examined: 4234
SELECT c,cn,ct FROM cr,l,un WHERE ci=lt AND lf='MP' AND ui=cu;

An extractor for the slow query log might look something like Figure 7.

Figure 7. A sample extractor for the MySQL slow query log

Figure 8 shows the second of the three records, each successfully processed by the
extractor.

Figure 8. An extracted record from the slow query log
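
The settings in the figures aren't reproduced in text, so the following Python sketch shows one plausible way to divide the slow query log into records. It assumes a start pattern of ^# Time: and treats everything up to the next match as part of the same record; the function extract_records and the @@ line break symbol are hypothetical choices for illustration, not values taken from the adapter.

```python
import re

SLOW_LOG = """\
# Time: 030207 15:03:33
# Query_time: 13 Lock_time: 0 Rows_sent: 0 Rows_examined: 0
SELECT l FROM un WHERE ip='209.xx.xxx.xx';
# Time: 030207 15:03:42
# Query_time: 17 Lock_time: 1 Rows_sent: 0 Rows_examined: 0
SELECT l FROM un WHERE ip='214.xx.xxx.xx';
# Time: 030207 15:03:43
# Query_time: 57 Lock_time: 0 Rows_sent: 2117 Rows_examined: 4234
SELECT c,cn,ct FROM cr,l,un WHERE ci=lt AND lf='MP' AND ui=cu;
"""

START_PATTERN = re.compile(r"^# Time:")  # assumed start-of-record pattern


def extract_records(text, start_pattern, line_break_symbol=None):
    """Group lines into records, starting a new record at each pattern match.

    line_break_symbol=None keeps the line breaks, "" deletes them, and any
    other string replaces each line break inside a record with that token.
    """
    records, current = [], []
    for line in text.splitlines():
        if start_pattern.search(line):
            if current:
                records.append(current)
            current = [line]
        elif current:  # ignore anything that precedes the first record
            current.append(line)
    if current:
        records.append(current)
    joiner = "\n" if line_break_symbol is None else line_break_symbol
    return [joiner.join(lines) for lines in records]


if __name__ == "__main__":
    for record in extract_records(SLOW_LOG, START_PATTERN, "@@"):
        print(record)
```

Replacing each line break with a token that never occurs in the log gives the parser a reliable delimiter for splitting the record back into its three parts.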

Testing your work so far


Returning to the daemon.log adapter, you can now test the sensor and extractor
components to verify that data is being acquired and divided into records.

Rerun the adapter

Glance at the two panes at the bottom of the Generic Log Adapter perspective. You
should see something resembling Figure 9. At left is the Extractor Result pane; at
right, layered, are the Formatter Result pane, the Sensor Result pane, and the
Problems pane. A series of buttons that control the adapter appear within the
Extractor Result pane. Figure 10 labels the buttons (or you can hover over
each button to see a tool tip).

Figure 9. Context components display panes

Figure 10. The adapter control buttons

Click Rerun adapter to restart processing from the beginning of the log file template.
Then click Next event to process the first event.

• The Sensor Result pane should show the first 10-20 lines of the log file.
• The Extractor Result pane should show the first line of the log file, Mar 2
06:27:35 db popa3d[7964]: Session from 71.65.224.25.
• The Problems pane should be empty. However, pay close attention to this
pane whenever you run your adapter. If you've omitted required CBE
properties, specified an illegal regular expression, or used an
unsupported value, this pane should point those out.
• The Formatter Result pane is irrelevant because a parser has yet to be
defined. However, it does show an initial XML CBE for the current record:

Listing 14. Initial XML CBE for current record

<CommonBaseEvent
creationTime="replace with our message text"
globalInstanceId="A1DAABE6C7876D20E8E9E8C475042F1B"
version="1.0.1">
</CommonBaseEvent>

As you define your parser, additional elements and attributes are added to the
XML automatically.

To have the extractor produce the next record, click Next event again. To
fast-forward to the last record (in the input the sensor has collected so far), click
Show last event.

Section 7. Producing the parser


The sensor reads data. The extractor subdivides the data into records. The role of
the parser is to extract specific fields from each record and use those values to
construct a complete CBE XML record.

The role of the parser


The parser may extract some fields from the log file directly, such as a time stamp,
host name, daemon name, and a text message. The parser may also infer data from
a record. For example, the parser may detect that the record originated with a
software service and set the CBE componentIdType attribute to ServiceName. In
other instances, the parser may add data to a record. In particular, if a log entry
doesn't record the day, month, year, time, and time zone of the event, the parser
must add that data to create a valid CBE.

To put the parser for the daemon.log example in perspective, Listing 15 shows a
valid CBE XML record for the log entry Mar 2 06:27:35 db popa3d[7964]:
Session from 71.65.224.25. Some of the attributes are plainly derived from
the original log entry; others will be manufactured from implied data. (Many of the
values of the attributes come from the Common Base Events Specification. It's
helpful to use that document while creating your parsers.)

Listing 15. The CBE equivalent of the first record of daemon.log

<CommonBaseEvent
    creationTime="2006-03-02T13:27:35.000Z"
    globalInstanceId="A1DAABECA2ACB4F0E8E9E8C475042F1B"
    msg="Session from 71.65.224.25"
    version="1.0.1">
  <sourceComponentId
      component="popa3d"
      componentIdType="ServiceName"
      location="db.linux-mag.com"
      locationType="Hostname"
      subComponent="7964"
      componentType="daemon"
  />
  <situation
      categoryName="StartSituation">
    <situationType
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:type="StartSituation"
        reasoningScope="EXTERNAL"
        successDisposition="SUCCESSFUL"
        situationQualifier="START INITIATED"/>
  </situation>
</CommonBaseEvent>

Also keep in mind that (at a minimum) every CBE must define the creationTime
attribute, the msg attribute, and the sourceComponentId element, which in turn
must have the six attributes shown in Listing 15. The situation element (among
others) is optional, but is part of the example to elaborate upon the event.
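
The minimum requirements above can be checked mechanically. The following Python sketch is an illustration only: the GLA itself is written in Java, and the function name missing_required_parts is hypothetical, not part of TPTP. Assuming a CBE record with no default XML namespace (as in Listings 14 and 15), it reports which required pieces are absent.

```python
import xml.etree.ElementTree as ET

# Required pieces of a CBE, as listed in the paragraph above.
REQUIRED_EVENT_ATTRS = ("creationTime", "msg")
REQUIRED_SOURCE_ATTRS = (
    "component", "componentIdType", "componentType",
    "location", "locationType", "subComponent",
)

def missing_required_parts(cbe_xml):
    """Return the required attributes/elements absent from one CBE record."""
    event = ET.fromstring(cbe_xml)
    missing = [a for a in REQUIRED_EVENT_ATTRS if a not in event.attrib]
    source = event.find("sourceComponentId")
    if source is None:
        missing.append("sourceComponentId")
    else:
        missing += ["sourceComponentId/@" + a
                    for a in REQUIRED_SOURCE_ATTRS if a not in source.attrib]
    return missing

# The initial record from Listing 14: no msg, no sourceComponentId yet.
initial = ('<CommonBaseEvent creationTime="replace with our message text" '
           'globalInstanceId="A1DAABE6C7876D20E8E9E8C475042F1B" '
           'version="1.0.1"></CommonBaseEvent>')
print(missing_required_parts(initial))  # ['msg', 'sourceComponentId']
```

Running the same check against the record in Listing 15 should return an empty list, because that record carries all the required parts.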

How the parser works


Click Parser in the Generic Log Adapter perspective to begin the process of defining
it. Figure 11 shows what the completed parser looks like. There is one parser task
for every attribute and element in the CBE shown in Listing 15.

Figure 11. The complete parser for daemon.log

The parser works in two phases. First, it divides the incoming record (from the
extractor) into positions, or numbered parts, in which each part is separated from the
other by the separator token. If no separator token is specified, this step is skipped.
Then the parser divides the record into designations, or (name, value) pairs, in which
each (name, value) pair is two strings joined by the designation token. If no
designation token is specified, the latter step is skipped.

Consider this example: If the separator token is the regular expression [ ]+, the
designation token is = (equal sign), and the parser is handed the record:

03/05/06 12:51:06EST Mail name=joe action=login authentication=password

the parser would define six positions and three designations, as shown in Table 1.

Table 1. Positions and designations from the parser

Position/Designation    Value
1                       03/05/06
2                       12:51:06EST
3                       Mail
4                       name=joe
5                       action=login
6                       authentication=password
h{'name'}               joe
h{'action'}             login
h{'authentication'}     password

Note: If your incoming record begins with the separator token, position 1 is created,
but left empty.
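
The two phases can be sketched in a few lines of Python. This is an illustration, not the GLA's implementation: the function name split_record is hypothetical, the designation token is treated as a plain string, and designations are assumed to be looked up within the positions produced by the first phase.

```python
import re

def split_record(record, separator, designation):
    """Split a record into positions, then collect (name, value) designations."""
    # Phase 1: positions. Table 1 numbers them from 1; this list starts at 0.
    positions = re.split(separator, record)
    # Phase 2: designations, one per part that contains the designation token.
    designations = {}
    for part in positions:
        name, token, value = part.partition(designation)
        if token:
            designations[name] = value
    return positions, designations

record = ("03/05/06 12:51:06EST Mail name=joe action=login "
          "authentication=password")
positions, designations = split_record(record, r"[ ]+", "=")
print(len(positions), designations["action"])  # 6 login

# A record that begins with the separator yields an empty first position.
print(split_record(" Mail name=joe", r"[ ]+", "=")[0])  # ['', 'Mail', 'name=joe']
```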

You can use all the defined positions and designations to simplify each parser task.
For instance, to create the creationTime attribute for the record above, you need
only parse positions 1 and 2, which hold the date and the time. Of course, the entire
original record is always available. However, positions and designations make each
parsing task faster and easier to manage because the source string is smaller. In
many cases, you can use a position or designation directly for a CBE value.

Parse the sample log entries


Click Parser again. For convenience, break each daemon.log entry into two
positions using the separator token :[ ]+ (a colon followed by one or more
spaces). The daemon.log entries don't contain (name, value) pairs, so the
designation token is omitted. These settings are shown in Figure 12. Now, save your
work.
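
To see why this separator produces exactly two positions, consider the following Python sketch (an illustration only; the GLA applies the token internally). The colons inside the time stamp are followed by digits, not spaces, so only the colon after the process ID matches.

```python
import re

record = "Mar 2 06:27:35 db popa3d[7964]: Session from 71.65.224.25"

# Separator token from the text: a colon followed by one or more spaces.
positions = re.split(r":[ ]+", record)

for number, value in enumerate(positions, start=1):
    print(number, value)
# 1 Mar 2 06:27:35 db popa3d[7964]
# 2 Session from 71.65.224.25
```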

Figure 12. Dividing a record into positions

Set the creationTime

Set the first required field in the CBE: creationTime. The goal is to transform the
time stamp provided with the daemon.log record into a time format compatible with
the XML schema dateTime data type. As a convenience, the adapter can
automatically convert any time format understood by the class
java.text.SimpleDateFormat into the XML schema data type.

To set the creationTime field, complete these steps:

1. Expand the parser and select creationTime. This is a required CBE
   attribute, so select the Required by parent check box.

2. Click the substitution rule associated with creationTime.

3. For Positions, type 1 because position 1 contains the time stamp to
   extract.

4. For Match, provide the regular expression
   ^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$. This expression
   captures the month name as $1, the day of the month as $2, and the time
   of day as $3.

5. For Substitute, supply $1 $2 @YEAR $3 @TIMEZONE.
   The result of Substitute is used instead of the entire incoming record in
   the rest of this specific parsing task. $1, $2, and $3 come from the
   previous step. However, because the time stamp doesn't include a year or
   a time zone, the year and time zone associated with the current context
   instance, represented by the shorthand @YEAR and @TIMEZONE, respectively,
   are used instead. Therefore, for the first daemon.log record, the settings
   in Substitute yield the string Mar 2 2006 06:27:35 -0700.

6. Ignore the Substitute extension class field, which allows you to
   provide a Java class to do additional substitutions. To transform the
   result of the substitution to the right type, use a
   java.text.SimpleDateFormat format string to do the heavy lifting.
   Set Time format to MMM dd yyyy HH:mm:ss Z, indicating a
   three-letter name of the month; a two-digit day of the month; a four-digit
   year; hours, minutes, and seconds separated by colons; and an RFC 822
   time zone. (Use uppercase HH, the 24-hour field, because daemon.log
   records time on a 24-hour clock; lowercase hh is the 12-hour field.)

Figure 13 shows the final settings for creationTime. If you save the configuration
file and rerun the adapter, the Formatter Result pane should show a new XML
record with attribute creationTime="2006-03-02T13:27:35.000Z".
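
The match, substitute, and time-format steps can be traced outside the adapter. The Python sketch below is an approximation under stated assumptions: the function name creation_time is hypothetical, datetime.strptime stands in for java.text.SimpleDateFormat, and the year and time zone defaults mirror the context instance values (2006 and -0700) used in this tutorial.

```python
import re
from datetime import datetime, timezone

# Match expression from step 4: month name, day of month, time of day.
MATCH = re.compile(r"^(\w{3})\s+(\d{1,2})\s+([\d:]+)\s+.*$")

def creation_time(position1, year="2006", tz="-0700"):
    """Turn the time stamp in position 1 into an XML schema dateTime string."""
    m = MATCH.match(position1)
    if m is None:
        return None  # the rule did not match
    month, day, clock = m.groups()
    # Substitute: $1 $2 @YEAR $3 @TIMEZONE
    substituted = f"{month} {day} {year} {clock} {tz}"
    # Python stand-in for the time format (%b expects English month names
    # in the default C locale).
    parsed = datetime.strptime(substituted, "%b %d %Y %H:%M:%S %z")
    utc = parsed.astimezone(timezone.utc)
    # The log has no sub-second precision, so milliseconds are always 000.
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + ".000Z"

print(creation_time("Mar 2 06:27:35 db popa3d[7964]"))
# 2006-03-02T13:27:35.000Z
```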

Figure 13. Parsing the incoming time stamp into the creationTime attribute

Getting the message

The msg attribute is another required CBE attribute. Add this attribute and create the
parser task to extract a suitable value:

1. Right-click CommonBaseEvent, then click Add > msg.

2. Click msg, then select the Required by parent check box.

3. Expand msg, then click Substitution Rule.

4. Specify 2 in the Positions field because the message portion of the log
entry is located in position 2. (It's everything after the separator token.)

5. For Match, specify a regular expression that selects the entire string. The
regular expression ^(.*)$ captures everything in $1.

6. For Substitute, specify $1.

Figure 14 shows the final settings.

Figure 14. Settings to extract the message

Save the configuration file and click Rerun adapter, found in the Extractor Result
pane. Click Next event and switch to the Formatter Result pane. You should see a
new msg attribute that looks like msg="Session from 71.65.224.25".

Find the source

The last mandatory part of a CBE record is the sourceComponentId, used to
record the component (service, system, and so on) that's affected by the event. In
the case of daemon.log, the components affected are software services running
on a specific host. The parser's job is to capture and record the specifics.

Right-click CommonBaseEvent once again, and then click Add >
sourceComponentId. (Figure 15 shows all the possible attributes and elements you
can add to a CBE.) For brevity, Table 2 lists all the settings required for
sourceComponentId. One new setting is Default value. If a parsing rule makes a
match but provides no substitute value, the Default value is used.

Figure 15. List of elements and attributes you can add to a CBE record

Table 2. Settings for the sourceComponentId

Item             Default value     Required by parent  Positions  Match               Substitute
component        -                 Yes                 1          ^.* db (\w+)\[.*$   $1
componentIdType  ServiceName       Yes                 -          ^(.*)               -
componentType    daemon            Yes                 -          ^(.*)               -
location         db.linux-mag.com  Yes                 -          ^(.*)               -
locationType     Hostname          Yes                 1          ^(.*)               -
subComponent     -                 Yes                 -          ^.*\[(\d+)\].*      $1

(A dash marks a field that is left empty.)

Notes:

• component: Captures the name of the software service, such as popa3d or
  mysqld.
• componentIdType: Indicates that the component records the name of a service;
  ServiceName is one of the prescribed values for this attribute, according to
  the CBE specification.
• componentType: Describes the class of the component.
• location: Specifies the physical address that corresponds to the location of a
  component. The format of the value of the location is specified by the
  locationType property. It is recommended that you use a fully qualified host
  name for this attribute. Here, because the log entry does not include a host
  name, one is added via the default value. In other cases, you may be able to
  parse the host name directly from the log.
• locationType: Specifies the format and meaning of the value in the location
  property. The Hostname keyword is one of many possible keywords that you can
  use here.
• subComponent: Identifies the specific daemon process that the event affects.
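
The effect of the Table 2 settings on the first daemon.log record can be previewed with a short Python sketch. It is an illustration, not the adapter's code: the helper names first_group and source_component_id are hypothetical, and the subComponent rule is assumed to run against position 1 (running it against the whole record gives the same result here).

```python
import re

def first_group(pattern, text):
    """Return capture group 1 ($1) if the pattern matches, else None."""
    m = re.match(pattern, text)
    return m.group(1) if m else None

def source_component_id(position1):
    """Build the six sourceComponentId attributes from position 1."""
    return {
        # Parsed from the record (Match plus Substitute $1).
        "component": first_group(r"^.* db (\w+)\[.*$", position1),
        "subComponent": first_group(r"^.*\[(\d+)\].*", position1),
        # Supplied by Default value; the rule matches but substitutes nothing.
        "componentIdType": "ServiceName",
        "componentType": "daemon",
        "location": "db.linux-mag.com",
        "locationType": "Hostname",
    }

attrs = source_component_id("Mar 2 06:27:35 db popa3d[7964]")
print(attrs["component"], attrs["subComponent"])  # popa3d 7964
```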

If you make all the changes listed in Table 2, then save and rerun the adapter, it
should yield CBE records that resemble Listing 15. As an additional exercise,
add a situation to the CBE. A situation categorizes the circumstance that
initiated the event. For instance, you might create one parser rule that produces a
StartSituation whenever the daemon is initially contacted for service and
another that produces a RequestSituation when a request is made.

Situations aren't required (hence, Required by parent can be disabled), but you
may find them useful to add granularity to your CBE records. If you create a situation
and add a series of possible situation parsers, select the Child choice check box if
processing can stop after the first match is made.

Here's a helpful tip for debugging your parsers: If a property is required, but not
found in the incoming record (passed to the parser from the extractor), the Formatter
Result pane for that record will be empty. In other words, required properties behave
like logical AND: If one match fails, processing for that record stops. It's often useful
to clear the Required by parent check box to debug rules. Build your rules slowly
and incrementally, and watch the Problems pane for clues.
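
The logical-AND behavior can be modeled in Python as follows. This is a simplified sketch of the behavior described above, not the GLA's algorithm; the function apply_rules and its rule tuples are hypothetical, and the first rule deliberately contains a mistake (the wrong host name) to show the effect.

```python
import re

def apply_rules(record, rules):
    """Apply (attribute, pattern, required) rules; None means an empty result."""
    event = {}
    for name, pattern, required in rules:
        m = re.match(pattern, record)
        if m:
            event[name] = m.group(1)
        elif required:
            return None  # one failed required rule stops processing
    return event

record = "Mar 2 06:27:35 db popa3d[7964]"

# The component rule looks for host "mail", but the log says "db".
strict = [("component", r"^.* mail (\w+)\[.*$", True),
          ("subComponent", r"^.*\[(\d+)\].*", True)]
print(apply_rules(record, strict))   # None

# Clearing "Required by parent" on the faulty rule exposes the rest.
relaxed = [("component", r"^.* mail (\w+)\[.*$", False),
           ("subComponent", r"^.*\[(\d+)\].*", True)]
print(apply_rules(record, relaxed))  # {'subComponent': '7964'}
```

With the faulty rule marked optional, the output shows which attributes do parse, which narrows the search to the one rule that is missing.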

Section 8. The formatter and organizing the outputter


Now that the parser has yielded properties and values, the new data must be
assembled into a CBE instance. That's the role of the formatter.

Emit CBE XML records to a file


The adapter formatter requires no configuration. It's an internal operation that
creates CBE objects that conform to the CBE V1.0.1 specification.

After the formatter has created CBE objects, it's the job of the outputter to emit them
to a file, standard output, another log, a logging agent, or a log analyzer. If your
adapter configuration defines multiple contexts, you can use a special formatter to
allow multiple contexts to write to a single file.

To keep things simple, emit the CBE XML records to a single file:

1. Click Outputter in the Generic Log Adapter perspective, then choose
   SingleFileOutputter for Outputter type.

2. Right-click Outputter, then click Add > property.

3. Click the new property, then set Property name to directory. Set the
Property value to a directory to which you're able to write files. Omit the
name of a file. Just specify the path of the directory, omitting the trailing
slash.

4. Right-click Outputter again and click Add > property. Set this new
Property name to fileName, and set the Property value to a file name.
This file will be created in the directory named by directory.

Change the context instance


In addition to changing the configuration, you must also change the context instance
to use the proper outputter class. To do so, complete these steps:

1. Expand Contexts in the Generic Log Adapter perspective and expand
   Context Basic Context Implementation.

2. Click Component Logging Agent Outputter.

3. Change the Name and Description to Single File Outputter.

4. Change the Executable class to
   org.eclipse.hyades.logging.adapter.outputters.CBEFileOutputter.

5. Save the configuration file.

Add the SingleFileOutputterType


There is one more important step: The Adapter Configuration Editor can omit an
important element from the outputter definition in the adapter's configuration
file. (The problem is discussed in the thread "No Output from Outputter" on the
developerWorks Autonomic computing forum.) However, you can quickly add the
element to the file manually.

Using your favorite editor, open the file my.adapter. Scroll to the bottom of the file
and look for the following text.

Listing 16. The outputter definition in my.adapter

<cc:Outputter
description="Single File Outputter"
uniqueID="N13725210AFF11DA8000AE8373D52828"
type="SingleFileOutputter">
<pu:Property propertyName="directory"
propertyValue="/home/mstreicher"/>
<pu:Property propertyName="fileName"
propertyValue="emitter.log"/>
<op:SingleFileOutputterType directory="/home/mstreicher"
fileName="emitter.log"/>
</cc:Outputter>

If the line <op:SingleFileOutputterType... /> is missing, add it, changing
the values of attributes directory and fileName to match the values of the
similarly named properties. Then save the file.
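
If you prefer to check for the missing element with a script, the following Python sketch shows one way. It is an assumption-laden illustration: the function name missing_outputter_line is hypothetical, the file is scanned as plain text instead of being parsed as XML (the namespace declarations are not shown in this tutorial), and the element names are taken from Listing 16.

```python
import re

def missing_outputter_line(adapter_text):
    """Return the element to add to the outputter, or None if it is present."""
    block = re.search(r"<cc:Outputter\b.*?</cc:Outputter>", adapter_text, re.S)
    if block is None:
        raise ValueError("no <cc:Outputter> element found")
    body = block.group(0)
    if "<op:SingleFileOutputterType" in body:
        return None
    # Reuse the values of the directory and fileName properties.
    props = dict(re.findall(
        r'propertyName="([^"]+)"\s+propertyValue="([^"]*)"', body))
    return ('<op:SingleFileOutputterType directory="%s" fileName="%s"/>'
            % (props["directory"], props["fileName"]))

# Listing 16 with the problematic element left out.
sample = '''<cc:Outputter
description="Single File Outputter"
uniqueID="N13725210AFF11DA8000AE8373D52828"
type="SingleFileOutputter">
<pu:Property propertyName="directory"
propertyValue="/home/mstreicher"/>
<pu:Property propertyName="fileName"
propertyValue="emitter.log"/>
</cc:Outputter>'''
print(missing_outputter_line(sample))
```

The sketch only reports the line to add; insert it by hand just before the closing </cc:Outputter> tag, as described above.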

Section 9. Running the GLA


Your rules-based adapter is now complete. Step through your template log file using
the controls in the Extractor Result pane and validate its operation. When you're
satisfied that everything is working properly, you can move on to running your
adapter using the stand-alone GLA.

Run the adapter using GLA


The GLA uses the settings you created in your adapter to read a log file and produce
a CBE XML document. Listing 17 shows a small portion of the file my.adapter.

Listing 17. A snippet of the file my.adapter

<adapter:Adapter...
  <cc:ContextInstance
      charset=""
      continuousOperation="true"
      description="A context for daemon.log"
      isoCountryCode="" isoLanguageCode=""
      maximumIdleTime="120"
      pauseInterval="120"
      timezone="-0700"
      uniqueID="N05306B00AFF11DA8000AE8373D52828"
      year="2006">
    <cc:Sensor
        description="Read the daemon.log"
        uniqueID="N057E8B10AFF11DA8000AE8373D52828"
        confidenceBufferSize="1024"
        fileFooterSize="0"
        maximumBlocking="10"
        type="SingleFileSensor">
      <pu:Property
          propertyName="directory"
          propertyValue="/home/mstreicher/java-tptp-gla"
      />
      <pu:Property
          propertyName="fileName"
          propertyValue="daemon.log"
      />
      <sensor:SingleFileSensor
          directory="/home/mstreicher/java-tptp-gla"
          fileName="daemon.log"
      />
    </cc:Sensor>
    <ex:Extractor
        containsLineBreaks="false"
        description="Divide daemon.log into individual records"
        endPattern="$"
        includeEndPattern="false"
        includeStartPattern="false"
        lineBreakSymbol=""
        replaceLineBreaks="false"
        startPattern="^"
        uniqueID="N05AA7D00AFF11DA8000AE8373D52828"
    />
    .
</adapter:Adapter>

To run the GLA, you must first edit its script to point to where you installed it. Using
your favorite editor, open the file gla.sh in GenericLogAdapter/bin. (If you followed
the installation instructions verbatim, the file resides in
~/java/GenericLogAdapter/bin/gla.sh.) Find the line
GLA_HOME=/home/eclipse/GenericLogAdapter and change the path to point
to the directory that contains your copy of the GLA. Again, if you followed the
instructions verbatim, you would change the line to read
GLA_HOME=~/java/GenericLogAdapter. Save the file.

Next, find the file my.adapter in your Eclipse workspace under the directory My
Adapter. On the test system, my.adapter was found in ~/workspace/My
Adapter/my.adapter. To run the adapter, execute gla.sh, providing the path to your
adapter file as the only argument:

% ~/java/GenericLogAdapter/bin/gla.sh ~/workspace/My\ Adapter/my.adapter

After a moment, the file emitter.log should appear in your home directory (or
wherever you configured your file outputter to create the file).

Section 10. Summary


This tutorial demonstrated how to create a rules-based adapter to convert a typical
Linux log file into a CBE log file. Given a CBE log, you can use a tool such as the
Autonomic Computing Toolkit's Log and Trace Analyzer to further process the CBE
data.

Furthermore, if the rules constructs provided by the Adapter Configuration Editor
aren't suitable for your log file, you can integrate your own Java class to parse
the log and emit CBE format. Unlike the rules-based adapter, a static parser (so
named because it uses a Java class instead of rules) needs only a sensor and an
outputter, both of which your Java code provides. You still run the GLA on the final
configuration file, but you must include your Java class in the GLA CLASSPATH, too.

In any case, the Adapter Configuration Editor and the GLA provide a powerful
environment in which to analyze the behavior of existing, even legacy applications
using modern autonomic computing tools. Simple conversion from any number of
log file formats to CBE requires just a few minutes of work; complex, detailed
conversions are easily accomplished with rich rules or your own code.

Resources
Learn
• Read Part 1 and Part 3 of this series by Martin Streicher.
• Learn more about the Eclipse Foundation and its many projects.
• Read "Appendix A. Understanding Common Base Events Specification V1.0.1"
of the Autonomic Computing Toolkit Developer's Guide (IBM 2004) for
information about the Common Base Event format.
• Read a description of the CBE format, produced by the TPTP Log Adapter.
• Read the detailed specification of the Canonical Situation Data Format (PDF).
• Learn how to install the GLA as a stand-alone application with the Stand-alone
Generic Log Adapter Installation Guide.
• Expand your Eclipse skills by visiting IBM developerWorks' Eclipse project
resources.
• Browse all of the Eclipse content on developerWorks.
• Stay current with developerWorks technical events and webcasts.
• Visit the developerWorks Open source zone for extensive how-to information,
tools, and project updates to help you develop with open source technologies
and use them with IBM's products.
Get products and technologies
• Download the entire Eclipse TPTP runtime. Discover all the features of the
Eclipse TPTP, as well as extensive documentation, tutorials, presentations, and
screencasts that illuminate the capabilities of the Eclipse TPTP.
• Download the Eclipse GLA.
• Read more about the Autonomic Computing Toolkit and download its Log and
Trace Analyzer.
• Download Java technology from Sun Microsystems or from IBM.
• Download the freely available, extensible open source Eclipse SDK.
Discuss
• Connect with Eclipse developers and other users in the Eclipse mailing lists and
newsgroups. (You must register to read the newsgroups, but membership is
free, and the registration process is easy.)
• Get involved in the developerWorks community by participating in
developerWorks blogs.

About the author


Martin Streicher
Martin Streicher is the Editor-in-Chief of Linux Magazine. Martin earned
a Master of Science in Computer Science from Purdue University and
has been programming UNIX-like systems since 1986 in the Pascal, C,
Perl, Java, and (most recently) Ruby programming languages.
