
Strengths and Limitations of Nagios as a Network Monitoring Solution
Sophon Mongkolluksamee, Panita Pongpaibool, Chavee Issariyapat
National Electronic and Computer Technology Center
112 Thailand Science Park, Phahonyothin Road, Klong 1,
Klong Luang, Pathumthani 12120, Thailand
{sophon.mongkolluksamee, panita, chavee.issariyapat}@nectec.or.th

Abstract
Network monitoring software is an important tool for monitoring and administering network devices
and services. It reduces the burden on network administrators by automatically checking device and
service status and reporting errors. Nagios is a flexible and extensible open-source network monitoring
package that is widely used and has a large user and developer community. However, Nagios has
a number of limitations. Some, such as difficult configuration and an unattractive
user interface, can be mitigated with third-party add-ons, but others can only be worked around with
carefully crafted configuration and additional tools. Moreover, most add-ons are neither straightforward
nor user-friendly; administrators still need to tweak and adapt them to suit each network. Adjusting
Nagios to a specific network therefore demands the expertise of network and system administrators.
Nevertheless, given its flexibility, extensibility, and variety of add-ons, Nagios can be used as a
framework for building more powerful and easier-to-use network monitoring software.


1. Introduction
Computer systems and networks play a vital role in most organizations as more and more
business transactions rely on remote information access over the Internet. One hour of
downtime could cost as much as 6.45 million dollars in the brokerage business or 90 thousand dollars
in the retail business [1]. It is the responsibility of IT and network engineers to make sure that servers
and networks are operational 24x7. To achieve this 100% service availability, network monitoring
software is essential, as network engineers cannot afford to check each device manually and
constantly, especially in a large network.
There are a large number of network monitoring products on the market, both commercial and
open-source. Commercial products usually provide comprehensive features but are expensive.
Some examples of popular commercial solutions are HP Operations Manager (formerly OpenView)
[2], IBM Tivoli [3], and CA Unicenter [4]. Open-source products, on the other hand, come at no
cost, but they usually have some limitations, such as a limited number of devices and services that
can be monitored and, most importantly, no technical support. Examples of well-known open-source
network monitoring tools are Nagios [5], Zenoss [6], Groundwork [7], OpenNMS [8], and Hyperic [9].
This article focuses on Nagios, which is the most popular open-source network
monitoring software. Its popularity can be seen from its download volume on
SourceForge.net, the highest among these tools, as shown in Figures 1 and 2. In addition to the
SourceForge data, Google Trends, which compares the search volume of the five open-source tools,
also shows the high level of attention that Internet users pay to Nagios, as shown in Figure 3. We
acknowledge that these two sources may not give a complete picture, since there are other ways to
obtain the software besides SourceForge, such as through the package repositories of many Linux
distributions. Besides, downloading software does not necessarily mean actually using it. However,
the data does reflect the considerable interest in Nagios compared with other open-source tools.

Figure 1 SourceForge top 5 open-source NMS monthly download (2009)

Figure 2 SourceForge top 5 open-source NMS download total (2009)

Figure 3 Trend of 5 open-source NMSs by Google Trends

This article discusses the strengths and limitations of Nagios as a network monitoring solution
for enterprise networks. We will not compare Nagios against popular commercial products or other
open-source products because this has been discussed previously in many publications [15]. Rather, we
share our experiences from the point of view of Nagios users and developers. We focus on why Nagios
is good at what it does and what else is left for the community to do to make Nagios even better.
An important contribution of this article is the discussion in Section 3.2, where we bring to attention
issues with the native Nagios architecture. As far as we know, these issues have not been addressed
previously. Then in Section 4, we present a new network monitoring tool built on the Nagios framework
with the goal of improving on many usability limitations of Nagios.

2. Nagios

Nagios is the open-source network monitoring software of choice because it is widely used
and has a large user and developer community. Its users include ISPs, governments, and large
enterprises such as Yahoo, Amazon, and Google [16]. Nagios has been shown to scale to large networks
with as many as 100,000 hosts and 1,000,000 services [16]. In addition, it has received several awards
from the Linux and networking communities, such as the InfoWorld Best of Open Source Software
(BOSSIE) 2008 Award in the "Server Monitoring" category, and the Linux Journal Readers' Choice
2009 award for "Favorite Linux Monitoring Application" [14].

Nagios is very flexible and configurable. Its main tasks are to monitor the status of
network devices and their services and to notify system administrators when server or network
problems occur. The core of the Nagios engine is a scheduler daemon that regularly probes specified
network devices and their services. Nagios performs status checks through external plugins,
which are compiled executables or scripts (Perl, shell, etc.) that can be run from a command line to
check the status of hosts or services. Users can create their own plugins to support any unusual devices
or services in their network. Comprehensive lists of user-contributed plugins can be found at several
web sites, for example [10,12]. In addition to regular network services such as web, database, and mail,
Nagios can also check local resources such as CPU load, memory, or temperature via the NRPE
(Nagios Remote Plugin Executor) agent installed on the monitored devices. If devices support SNMP,
Nagios has plugins that can use it to retrieve information from them.
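
To make the plugin mechanism concrete, the following is a minimal sketch of a check plugin written in Python. It relies on the usual plugin convention that the exit code carries the state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and that the single line of output may carry performance data after a "|" character. The check itself (the 1-minute load average) and the threshold values 4.0 and 8.0 are assumptions chosen for illustration; they are not taken from any plugin discussed in this article.

```python
import os

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(value, warn, crit):
    """Map a measured value to a plugin state using absolute thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(name, value, warn, crit):
    """Build the status line; the text after '|' is performance data."""
    state = classify(value, warn, crit)
    perfdata = f"{name.lower()}={value:.2f};{warn};{crit}"
    line = f"{name} {LABELS[state]} - value is {value:.2f} | {perfdata}"
    return state, line


def main():
    """Run the check, print one status line, and return the exit code."""
    try:
        load = os.getloadavg()[0]  # 1-minute load average; not available on every OS
    except (AttributeError, OSError):
        print("LOAD UNKNOWN - could not read the load average")
        return UNKNOWN
    # Hypothetical thresholds, chosen only for this example.
    state, line = format_output("LOAD", load, warn=4.0, crit=8.0)
    print(line)
    return state
```

A deployed plugin would finish with sys.exit(main()) so that the scheduler can read the state from the process exit code.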

3. Limitations of Nagios
Although flexibility is the feature that sets Nagios apart from other monitoring tools, it
is a double-edged sword. Users often find that flexibility and
configurability result in a lengthy and troublesome configuration process. Some limitations are
probably due to the minimalist philosophy of the Nagios design, for example the lack of an
interactive and attractive user interface. Fortunately, the Nagios community has recognized these
problems. Some of the weaknesses can be addressed with add-ons or plugins from the Nagios
community. However, other limitations remain a challenge. This section examines
these two types of limitations from the point of view of Nagios users and developers.
3.1 Limitations that can be solved with third-party add-ons
User-Unfriendly GUI
One frequently heard weakness of Nagios is its unfriendly user interface. The Nagios
web-based user interface is quite unattractive and consists mainly of tables and text. A graph display is
available under the Trends section, but it shows only the host and service state (e.g., ok, warning,
critical) over time. No time series of performance data is presented graphically. In terms of
topology visualization, users can display their network map only in a circular or tree layout,
not as a free-form topology. There is no dashboard that gives important
information at a glance, a basic requirement for any NOC. In sum, the Nagios interface is neither flexible
nor interactive. This is perhaps because the web interface uses old-fashioned CGI technology
instead of more flexible and interactive technologies like PHP or Flash, which are more powerful and
easier to integrate with other web technologies like AJAX and CSS. To overcome this weakness, a number
of projects aim to develop a better graphical user interface for Nagios, for example, Groundwork,
NetHAM [17], and NagVis [18].
Lack of Database and Performance Records
Nagios does not come with a database, perhaps because it tries to be lightweight. Only state and
notification information is archived, in text-based log files. This is fine for monitoring the current
state of a network. For troubleshooting, however, past performance data is important. Unfortunately,
Nagios displays only current performance data (e.g., loss rate, round-trip delay, load) and does not
collect historical performance data. Not only does this make troubleshooting difficult, it also prevents
further assessment of network health. Moreover, it hinders the generation of performance reports for
executives or IT managers. Several add-ons have been developed to overcome this limitation. NDOUtils [19]
is a Nagios core add-on for exporting current and historical data from one or more Nagios instances to
a MySQL database. Opdb [20] is a broker that writes statistics and performance data to a MySQL
database. PNP4Nagios [21] and NagiosGrapher [22] collect performance data into a round-robin
database (RRD). With a database, users can generate reports in many flexible output formats
such as CSV, XML, PDF, or on-screen graphics.
Difficult Configuration
Users may discover that configuring Nagios to monitor just a few devices and services already
requires a lengthy procedure. That is because Nagios requires text-based configuration files. Creating a
configuration file is no easy task because the user must understand a complex structure and all
configurable options. A user can create a configuration file manually using an ordinary text editor
(e.g., vi, nano), but this is troublesome and error-prone, especially for a complex configuration.
Configuration tools from the Nagios community can simplify this process. Examples of these tools are
NetHAM, Lilac and Fruity [23], which are PHP-based web interfaces for generating simple Nagios
configurations. For more complex configurations, NagiosQL [24] and NConf [25] offer enterprise-class
features like service templates and dependencies, which help in configuring a large network topology.
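
As an illustration of what such tools automate, the sketch below renders Nagios object definitions from a short device list. It is not taken from any of the tools above. The device names and addresses are hypothetical, and the templates generic-host and generic-service as well as the check_ping command arguments are assumed to be available, as they are in the sample configuration files distributed with Nagios.

```python
def render_object(kind, directives):
    """Render one Nagios object definition block from a dict of directives."""
    width = max(len(key) for key in directives)
    lines = [f"define {kind} {{"]
    for key, value in directives.items():
        lines.append(f"    {key.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical devices, used only for this example.
devices = [("web01", "192.0.2.10"), ("db01", "192.0.2.11")]

blocks = []
for name, address in devices:
    # One host object per device, inheriting from an assumed template.
    blocks.append(render_object("host", {
        "use": "generic-host",
        "host_name": name,
        "address": address,
    }))
    # One PING service per host; thresholds follow the check_ping argument style.
    blocks.append(render_object("service", {
        "use": "generic-service",
        "host_name": name,
        "service_description": "PING",
        "check_command": "check_ping!100.0,20%!500.0,60%",
    }))

config_text = "\n\n".join(blocks)
print(config_text)
```

Even for two devices the output runs to four definition blocks, which hints at why manual editing becomes error-prone as the network grows.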
Lack of Automatic Device Discovery
In a simple network with few devices and services, generating configuration files is
not difficult, whether manually or with the configuration tools discussed above. However, if the network
consists of many devices (hundreds or thousands of hosts), this becomes a difficult and tedious task.
Automatic discovery of servers and routers in the network would be a welcome option. Examples of
plugins that can do this are NACE [26], check_find_new_hosts [27], Nmap2Nagios [28], and
Nmap2Nagios-ng [29]. NACE queries information from the network and automatically creates host and
service definitions for hosts on the network. check_find_new_hosts scans a subnet to find hosts that
Nagios is not monitoring and adds them to Nagios. Nmap2Nagios and Nmap2Nagios-ng are Perl scripts
that convert Nmap's host scan output into Nagios configuration files. These tools allow Nagios to scale
well in terms of automatic configuration.
Although device discovery is convenient to have, its accuracy should be considered. Since
most of the plugins above use a generic method of discovery, it is almost impossible to discover all
devices accurately. Special devices like routers or managed switches require knowing model numbers
or community strings in advance. Moreover, to some network administrators, auto-discovery (e.g., host
scan) could be seen as a security threat to a network.
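
The general idea behind scan-based discovery can be sketched as follows. This is not the implementation of any of the tools above; it is a simplified Python illustration that assumes the scan result is available in Nmap's XML output format and that the set of already monitored addresses is known. The embedded scan result and all addresses are made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up scan result in the shape of Nmap's XML output, trimmed to the
# elements this sketch reads.
SAMPLE_SCAN = """<nmaprun>
  <host>
    <status state="up"/>
    <address addr="192.0.2.1" addrtype="ipv4"/>
    <hostnames><hostname name="gw.example.org"/></hostnames>
  </host>
  <host>
    <status state="up"/>
    <address addr="192.0.2.20" addrtype="ipv4"/>
    <hostnames/>
  </host>
  <host>
    <status state="down"/>
    <address addr="192.0.2.30" addrtype="ipv4"/>
  </host>
</nmaprun>"""


def discovered_hosts(xml_text):
    """Return (name, address) pairs for every host reported as up."""
    hosts = []
    for host in ET.fromstring(xml_text).findall("host"):
        status = host.find("status")
        if status is None or status.get("state") != "up":
            continue
        address = host.find("address[@addrtype='ipv4']")
        if address is None:
            continue
        hostname = host.find("hostnames/hostname")
        # Fall back to the address when the scan found no host name.
        name = hostname.get("name") if hostname is not None else address.get("addr")
        hosts.append((name, address.get("addr")))
    return hosts


def new_hosts(xml_text, monitored_addresses):
    """Keep only discovered hosts that are not monitored yet."""
    return [h for h in discovered_hosts(xml_text) if h[1] not in monitored_addresses]


print(new_hosts(SAMPLE_SCAN, {"192.0.2.1"}))
```

The sketch also shows the accuracy problem noted above: a generic scan yields only addresses and names, not the device type, model, or SNMP community string needed to monitor a router or managed switch properly.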
Lack of Distributed Monitoring
When monitoring hundreds or thousands of hosts and services, a single Nagios server
might not be sufficient to handle the check load. To improve scalability, users should
set up multiple Nagios servers. Moreover, some users may want to set up Nagios servers in multiple
locations for reliability. Out of the box, Nagios is not capable of distributed monitoring. Such
requirements must be met with external tools like NSCA [19] and DNX [30]. NSCA provides a
channel for integrating passive alerts and check results from multiple Nagios machines. DNX is a modular
extension of Nagios that distributes the load of Nagios check processes to multiple remote servers.

3.2 Limitations without Current Solutions
Through our experience deploying Nagios to monitor enterprise networks, we
encountered many obstacles for which none of the existing add-ons or plugins can help. Most of the
problems are due to limitations of the native Nagios structure. We discuss some of these limitations in
this section. We have found no previous articles discussing these points, so we hope to bring these
issues to the attention of the Nagios community for further improvement of either the Nagios
architecture itself or add-ons that can address these problems.
How to treat an interface of a router or a switch?
Nagios defines two types of objects to monitor: host and service. A host is a computer,
network device, printer, UPS, or other device that can be accessed via the Internet Protocol (IP). A service
can be a real network service such as POP3, SNMP and HTTP, or other information such as processor
load, disk usage, or temperature that can be accessed locally within a host. Nagios makes no distinction
among different types of devices like servers, routers, or switches. Nagios treats every device
generically as a host. This makes it problematic for networks that consist of many routers and switches
because these network devices require more complex monitoring for their interfaces.
To set up Nagios to monitor an interface of a router, users must choose whether to treat it as a
host or a service. If interfaces are treated as services, a 24-interface router needs at least 24 services,
one per interface. Moreover, administrators may be interested in multiple performance
metrics per interface, such as byte-in, byte-out, and check-alive. In that case, at least 3x24 services are
required per router! Not only is this configuration cumbersome, it could also lead to confusion and
errors. On the other hand, it may be more sensible to treat an interface as a host, since each interface
has its own IP address and has connectivity associated with it. This does not reduce the amount of
configuration, however. It still requires 3-service x 24-host (interface) configurations for all
interfaces.
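
The arithmetic can be made explicit with a short Python sketch that enumerates the service entries needed for one router under the interface-as-service approach. The router name and the naming scheme for service descriptions are hypothetical; the three metrics are the ones named above.

```python
from itertools import product

# The three per-interface metrics named in the text.
METRICS = ("byte-in", "byte-out", "check-alive")


def interface_services(router, interface_count, metrics=METRICS):
    """List one service entry per (interface, metric) pair of a router."""
    return [
        {"host_name": router, "service_description": f"if{index:02d} {metric}"}
        for index, metric in product(range(1, interface_count + 1), metrics)
    ]


services = interface_services("core-router", 24)  # hypothetical router name
print(len(services))  # 3 metrics x 24 interfaces = 72
```

Every one of these 72 entries corresponds to a service definition that must be written and maintained for a single router.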
The advantage of treating an interface as a host is better visibility. For example, if treated as a
service, an interface will not show up on a network map. This makes it difficult to quickly
troubleshoot connectivity problems. If treated as a host, an interface can show up on a topology display,
as shown in Figure 4. However, it is unclear how to relate all interfaces that belong to the same router,
since Nagios considers each host an independent device. Moreover, interfaces on a layer-2 managed
switch do not have IP addresses. Given this dilemma, it is not clear how to treat an interface
of a router or a switch elegantly.

Figure 4 An example of configuring router interfaces as hosts
How to monitor and alert status of a link?
As with interfaces, Nagios has no built-in support for links.
Nagios simply considers a link to be connectivity between two hosts (a parent-child relationship). There
is currently no way to define properties or status of a parent-child relationship. Moreover, this
parent-child relationship implies only logical connectivity, not necessarily physical connectivity.
Currently, to monitor a link's status, users must monitor the status of the interfaces at both ends of the
link. However, some performance metrics depend on physical properties of the link rather than of the
interfaces, for example, propagation delay and channel quality. Other metrics, like link utilization and
bandwidth, can be measured on just one interface to avoid redundancy. So it seems that a link should be
treated as an independent entity. Obviously a link cannot be treated as a host because it does not own an
IP address. Should a link be treated as a service? Maybe, but a service must be associated with one host.
In this case, we can associate only one host, not two, with a link. This may not yield the complete
status of a link.
How to detect network anomaly dynamically?
Almost all check plugins available in the Nagios community use thresholds to classify levels of
network status. Users must define absolute threshold values for the critical, warning, and ok levels of
each service check. This procedure is troublesome. For example, the check-ping service for two servers
may require different threshold levels because the servers are located at different sites. In this case,
users must define two different check-ping services, one for each set of thresholds. Moreover, it is
impractical to expect users to know beforehand all performance levels, such as round-trip delay, loss
rate, and load, for all servers.
It would be easier if Nagios supported a plugin that accepts generic thresholds or, even better,
a plugin that does not rely on thresholds at all. A generic threshold could be something like a percentage
deviation from a normal level. Different servers could then have different normal performance levels,
and users would not need to know these numbers in advance. Users would simply specify, for example,
that 2 standard deviations away from the average means a critical level. Note that to implement this kind
of plugin, we must allow time for Nagios to collect performance statistics. The advantage is that all
servers can then use the same check-ping service, no matter how different their average round-trip
delays are.
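
A minimal Python sketch of such a generic threshold is given below. It classifies a new sample by how many standard deviations it lies from the mean of previously collected samples. The critical level of 2 standard deviations follows the example above; the warning level of 1 standard deviation, the UNKNOWN result while too little history exists, and the sample round-trip times are assumptions made for this illustration.

```python
from statistics import mean, stdev


def classify_sample(history, sample, warn_sigma=1.0, crit_sigma=2.0):
    """Classify a sample by its distance from the historical average.

    history holds earlier measurements for the same host and service.
    warn_sigma is an assumed value; crit_sigma follows the text.
    """
    if len(history) < 2:
        return "UNKNOWN"  # not enough statistics collected yet
    baseline = mean(history)
    spread = stdev(history)
    deviation = abs(sample - baseline)
    if deviation > crit_sigma * spread:
        return "CRITICAL"
    if deviation > warn_sigma * spread:
        return "WARNING"
    return "OK"


# Hypothetical round-trip times in milliseconds for a nearby and a remote server.
lan_history = [1.0, 1.2, 0.9, 1.1, 1.0]
wan_history = [180.0, 200.0, 190.0, 210.0, 195.0]

print(classify_sample(lan_history, 2.0))    # far from the usual level of a nearby server
print(classify_sample(wan_history, 200.0))  # normal for a remote server
```

The same function and the same two parameters serve both servers, although their normal round-trip delays differ by two orders of magnitude.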
How to monitor multiple network sites with one Nagios server?
Nagios can monitor hundreds or thousands of hosts and services in a single network with no trouble,
as proven by many large enterprise users [16]. However, we faced difficulty when trying to set up one
Nagios server to monitor multiple network sites that are administered by different groups of people. The
Nagios web report is designed to display all status information in one tactical overview. We cannot simply
set up Nagios to separate the view by groups of devices. (To do so, we must modify the CGI code.)
Moreover, the notification and contact options are defined per service. This means that all alerts due to
check-http failure must go to the same group of administrators. Otherwise, we must define multiple
check-http services that differ only in the contact option.
A straightforward way to solve this problem is to install two instances of Nagios on the same
server for monitoring two network sites. However, this solution is not efficient because it requires
twice the effort for setup and configuration. Therefore, an elegant solution to this problem is still much
needed.
4. Nagios as a Framework for creating a User-Friendly Network Monitoring Tool

We are currently working on a network monitoring system targeted at small and medium
organizations. We have developed a prototype called NetHAM, which stands for Network Health
Analysis and Monitoring. We use Nagios as the core module of our new system because it is powerful and
flexible enough to be integrated. However, we still need to develop many intermediate modules in order
to link NetHAM functionality to the Nagios process. The relation between Nagios as a core engine and our
modules is depicted in Figure 5.



Figure 5 Diagram of internal NetHAM process.

The NetHAM user interface is a two-part web interface. The first part, the monitoring
part, was created with Adobe Flex to display the real-time status of networks. The status displayed by
this module consists of a network diagram, host/service status, and service statistics, as shown in Figure
6. Host and service status are taken directly from the status.dat file, which is updated regularly by
Nagios, whereas the network diagram is retrieved from the same source combined with the network
structure from the Nagios configuration. To record service statistics, we had to develop a Nagios
Event Broker module called NagDB to capture the output returned by Nagios plugins. NagDB uses
predefined patterns to extract service statistics from the output string and stores them in a MySQL
database.
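
The pattern-based extraction can be illustrated with the following Python sketch. It is not the NagDB implementation and does not reproduce its patterns. It assumes that the statistics of interest appear as label=value pairs in the performance data part of the plugin output (the text after "|"), uses a simplified regular expression for them, and stores the result in an in-memory SQLite database as a stand-in for MySQL. The table layout, host name, and sample output line are made up for the example.

```python
import re
import sqlite3

# Simplified pattern for one performance data item: label=value[unit].
# Thresholds after the first ';' are ignored in this sketch.
PERFDATA_ITEM = re.compile(
    r"(?P<label>[^\s=]+)=(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[a-zA-Z%]*)"
)


def extract_metrics(plugin_output):
    """Return (label, value, unit) tuples found after the '|' separator."""
    _, _, perfdata = plugin_output.partition("|")
    return [
        (m["label"], float(m["value"]), m["unit"])
        for m in PERFDATA_ITEM.finditer(perfdata)
    ]


def store_metrics(connection, host, service, plugin_output):
    """Insert every extracted metric as one row; return the row count."""
    rows = [
        (host, service, label, value, unit)
        for label, value, unit in extract_metrics(plugin_output)
    ]
    connection.executemany(
        "INSERT INTO service_stats (host, service, label, value, unit) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    connection.commit()
    return len(rows)


# SQLite in memory stands in for the MySQL database mentioned in the text.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE service_stats "
    "(host TEXT, service TEXT, label TEXT, value REAL, unit TEXT)"
)

sample = ("PING OK - Packet loss = 0%, RTA = 0.80 ms"
          "|rta=0.800000ms;100.000000;500.000000;0.000000 pl=0%;20;60;0")
stored = store_metrics(connection, "web01", "PING", sample)
print(stored, extract_metrics(sample))
```

Once the values are stored as rows, time series and reports can be produced with ordinary database queries.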

Figure 6 NetHAM monitor screen

We also built a NagTrigger module to link internal Nagios events to external command
executions. One important application of this module is forcing NetHAM to refresh
its data whenever Nagios updates its status. In this way, there is no need to modify the Nagios
source code, and we can still use it as a core engine without problems.

The second part, the configuration part, depicted in Figure 7, was built on NagiosQL.
The advantage of using NagiosQL is that all Nagios configurations created with it are
represented as database entries. This means that every part of the configuration can be easily
accessed and modified. Not only does this help users tune NetHAM more conveniently, it also
allows us to develop an intuitive user interface for reconfiguring the network monitoring process.



Figure 7 NetHAM configure screen
We can say that NetHAM overcomes the limitations mentioned in Section 3.1. Future work includes
an implementation that attempts to solve the limitations mentioned in Section 3.2.

5. Conclusion
Network monitoring is an important component of good network management. Open-source
network monitoring is a sensible choice for organizations that do not have the budget for a commercial
product. Nagios is one of the most popular open-source network monitoring tools, as seen from its large
community of users, its download volume, and its search trend. This article investigates the strengths
and limitations of Nagios from the experience of Nagios users and developers. We highlight two types of
limitations: ones that can be addressed by add-ons and plugins, and ones for which none of the existing
add-ons or plugins can help. In most cases the native Nagios architecture is the limiting factor. We then
present our new network monitoring tool, called NetHAM, developed on the Nagios
framework. NetHAM is able to overcome many limitations of Nagios and provides a more powerful
and user-friendly network monitoring environment.
References
[1] Performance Technologies Inc., 2001. The Effects of Network Downtime on Profits and
Productivity--A White Paper Analysis on the Importance of Non-stop Networking. White
Paper. Available: http://whitepapers.informationweek.com/detail/RES/991044232_762.html
[2] HP.com. (n.d.). Looking for HP Openview [Online]. Available: http://openview.hp.com
[3] IBM.com. (n.d.). IBM Tivoli software [Online]. Available:
http://www-01.ibm.com/software/tivoli/
[4] CA Unicenter [Online]. Available: http://www.ca.com/us/infrastructure-management.aspx
[5] Nagios. (n.d.). [Online]. Available: http://nagios.org
[6] Zenoss. (n.d.). [Online]. Available: http://zenoss.com
[7] Groundwork. (n.d.). [Online]. Available: http://www.groundworkopensource.com
[8] OpenNMS. (n.d.). [Online]. Available: http://opennms.org
[9] Hyperic. (n.d.). [Online]. Available: http://www.hyperic.com
[10] Nagios Exchange. (n.d.). [Online]. Available: http://exchange.nagios.org
[11] SourceForge. (n.d.). Find Monitoring Software. [Online]. Available:
http://sourceforge.net/softwaremap/trove_list.php?form_cat=152
[12] Nagiosplugins. (n.d.). [Online]. Available: http://www.nagiosplugins.org
[13] D. Doug, B. James R., M. High, Best of open source networking software, infoworld.com,
Aug 31, 2009. [Online]. Available:
http://www.infoworld.com/d/open-source/best-open-source-networking-software-767&current=6&last=1
[14] G. Jame, Readers' Choice Awards 2009, linuxjournal.com, Jun 1st, 2009. [Online].
Available: http://www.linuxjournal.com/article/10451
[15] Craig, OpenNMS vs Nagios, rootdev.com, Jul 2nd, 2008. [Online]. Available:
http://www.rootdev.com/tech/opennms-vs-nagios
[16] nagios.org. (n.d.). Nagios User Profiles [Online]. Available: http://users.nagios.org/
[17] NetHAM. (n.d.). [Online]. Available:
http://inms.in.th/portal/index.php/about-inms-project/netham
[18] NagVis.org. (n.d.). [Online]. Available: http://www.nagvis.org
[19] Nagios.org. (n.d.). Nagios Addons. [Online]. Available:
http://www.nagios.org/download/addons
[20] Opmon.org. (n.d.). opdb. [Online]. Available: http://www.opmon.org/category/projects/opdb
[21] Pnp4nagios.org. (n.d.). [Online]. Available: http://docs.pnp4nagios.org/pnp-0.4/start
[22] netways.org. (n.d.). NagiosGrapher Doc. [Online]. Available:
https://www.netways.org/wiki/grapher
[23] lilacplatform.com. (n.d.). Make Open Source Monitoring Easy. [Online]. Available:
http://www.lilacplatform.com/
[24] nagiosql.org. (n.d.). [Online]. Available: http://www.nagiosql.org
[25] nconf.org. (n.d.). NConf - Enterprise Nagios configurator. [Online]. Available:
http://www.nconf.org/dokuwiki/doku.php
[26] freshmeat.net. (n.d.). Nagios Automated Configuration Engine. [Online]. Available:
http://freshmeat.net/projects/nace
[27] nagios.org. (n.d.). [Online]. Available:
http://exchange.nagios.org/directory/Plugins/Network-and-Systems-Management/Nagios/check_find_new_hosts/details
[28] sourceforge.net. (n.d.). Nmap2Nagios. [Online]. Available:
http://nmap2nagios.sourceforge.net/
[29] sourceforge.net. (n.d.). Nmap2Nagios-ng. [Online]. Available:
http://exchange.nagios.org/directory/Addons/Configuration/Auto%252DDiscovery/nmap2nagios%252Dng/details
[30] sourceforge.net. (n.d.). Distributed Nagios Executor. [Online]. Available:
http://dnx.sourceforge.net