
Introduction
Logstash is a tool for receiving, processing and outputting logs. All kinds of logs.
System logs, webserver logs, error logs, application logs and just about anything you
can throw at it. Sounds great, eh?
Using Elasticsearch as a backend datastore, and kibana as a frontend reporting tool,
Logstash acts as the workhorse, creating a powerful pipeline for storing, querying and
analyzing your logs. With an arsenal of built-in inputs, filters, codecs and outputs, you
can harness some powerful functionality with a small amount of effort. So, let's get started!
Prerequisite: Java
The only prerequisite required by Logstash is a Java runtime. You can check that you
have it installed by running the command java -version in your shell. Here's something similar to what you might see:
> java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
It is recommended to run a recent version of Java in order to ensure the greatest success in running Logstash.
It's fine to run an open-source version such as OpenJDK: http://openjdk.java.net/
Or you can use the official Oracle version: http://www.oracle.com/technetwork/java/index.html
Once you have verified the existence of Java on your system, we can move on!
Up and Running!
Logstash in two commands
First, we're going to download the Logstash binary and run it with a very simple configuration.
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-1.4.1.tar.gz
Now you should have the file named logstash-1.4.1.tar.gz on your local filesystem. Let's unpack it:
tar zxvf logstash-1.4.1.tar.gz
cd logstash-1.4.1
Now let's run it:
bin/logstash -e 'input { stdin { } } output { stdout {} }'
Now type something into your command prompt, and you will see it output by Logstash:
hello world
2013-11-21T01:22:14.405+0000 0.0.0.0 hello world
OK, that's interesting. We ran Logstash with an input called "stdin", and an output named "stdout", and Logstash basically echoed
back whatever we typed in some sort of structured format. Note that specifying the -e command line flag allows Logstash to accept a
configuration directly from the command line. This is especially useful for quickly testing configurations without having to edit a file
between iterations.
Let's try a slightly fancier example. First, you should exit Logstash by issuing a CTRL-C command in the shell in which it is running.
Now run Logstash again with the following command:
bin/logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'
And then try another test input, typing the text "goodnight moon":
goodnight moon
{
"message" => "goodnight moon",
"@timestamp" => "2013-11-20T23:48:05.335Z",
"@version" => "1",
"host" => "my-laptop"
}
So, by re-configuring the "stdout" output (adding a "codec"), we can change the output of Logstash. By adding inputs, outputs and
filters to your configuration, it's possible to massage the log data in many ways, in order to maximize the flexibility of the stored data when
you are querying it.
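As another quick illustration (this one-liner is not part of the original walkthrough, but it uses the same stdin and stdout plugins), you could swap in the json codec to emit each event as a single JSON document:
bin/logstash -e 'input { stdin { } } output { stdout { codec => json } }'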
Storing logs with Elasticsearch
Now, you're probably saying, "that's all fine and dandy, but typing all my logs into Logstash isn't really an option, and merely seeing
them spit to STDOUT isn't very useful." Good point. First, let's set up Elasticsearch to store the messages we send into Logstash. If
you don't have Elasticsearch already installed, you can download the RPM or DEB package, or install manually by downloading the
current release tarball, by issuing the following four commands:
curl -O https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.1.1.tar.gz
tar zxvf elasticsearch-1.1.1.tar.gz
cd elasticsearch-1.1.1/
./bin/elasticsearch
Note
This tutorial specifies running Logstash 1.4.1 with Elasticsearch 1.1.1. Each release of Logstash has a
recommended version of Elasticsearch to pair with. Make sure the versions match based on the Logstash version you're running!
More detailed information on installing and configuring Elasticsearch can be found on The Elasticsearch reference pages. However,
for the purposes of Getting Started with Logstash, the default installation and configuration of Elasticsearch should be sufficient.
Now that we have Elasticsearch running on port 9200 (we do, right?), Logstash can be simply configured to use Elasticsearch as its
backend. The defaults for both Logstash and Elasticsearch are fairly sane and well thought out, so we can omit the optional
configurations within the elasticsearch output:
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } }'
Type something, and Logstash will process it as before (this time you won't see any output, since we don't have the stdout output
configured):
you know, for logs
You can confirm that ES actually received the data by making a curl request and inspecting the return:
curl 'http://localhost:9200/_search?pretty'
which should return something like this:
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "logstash-2013.11.21",
      "_type" : "logs",
      "_id" : "2ijaoKqARqGvbMgP3BspJA",
      "_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"}
    } ]
  }
}
Congratulations! You've successfully stashed logs in Elasticsearch via Logstash.
Elasticsearch Plugins (an aside)
Another very useful tool for querying your Logstash data (and Elasticsearch in general) is the elasticsearch-kopf plugin. Here is more
information on Elasticsearch plugins. To install elasticsearch-kopf, simply issue the following command in your Elasticsearch
directory (the same one in which you ran Elasticsearch earlier):
bin/plugin -install lmenezes/elasticsearch-kopf
Now you can browse to http://localhost:9200/_plugin/kopf to browse your Elasticsearch data, settings and mappings!
Multiple Outputs
As a quick exercise in configuring multiple Logstash outputs, let's invoke Logstash again, using both the stdout as well as the
elasticsearch output:
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
Typing a phrase will now echo it back to your terminal, as well as save it in Elasticsearch! (Feel free to verify this using curl or
elasticsearch-kopf).
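If you prefer the command line to a browser, a couple of stock Elasticsearch APIs give a quick sanity check (the query string below is just an illustrative example):
curl 'http://localhost:9200/_cat/indices?v'
curl 'http://localhost:9200/_search?q=message:hello&pretty'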
Default - Daily Indices
You might notice that Logstash was smart enough to create a new index in Elasticsearch. The default index name is in the form of
logstash-YYYY.MM.DD, which essentially creates one index per day. At midnight (UTC), Logstash will automagically rotate the index
to a fresh new one, with the new current day's timestamp. This allows you to keep windows of data, based on how far retroactively
you'd like to query your log data. Of course, you can always archive (or re-index) your data to an alternate location, where you are able
to query further into the past. If you'd like to simply delete old indices after a certain time period, you can use the Elasticsearch
Curator tool.
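If you only need to remove a single old index by hand rather than setting up Curator, Elasticsearch's delete index API is enough (the index name here is just an example):
curl -XDELETE 'http://localhost:9200/logstash-2013.11.21'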
Moving On
Now you're ready for more advanced configurations. At this point, it makes sense for a quick discussion of some of the core features
of Logstash, and how they interact with the Logstash engine.
The Life of an Event
Inputs, Outputs, Codecs and Filters are at the heart of the Logstash configuration. By creating a pipeline of event processing,
Logstash is able to extract the relevant data from your logs and make it available to Elasticsearch, in order to efficiently query your
data. To get you thinking about the various options available in Logstash, let's discuss some of the more common configurations
currently in use. For more details, read about the Logstash event pipeline.
Inputs
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash-2013.11.21",
"_type" : "logs",
"_id" : "2ijaoKqARqGvbMgP3BspJA",
"_score" : 1.0, "_source" : {"message":"you know, for logs","@timestamp":"2013-11-21T18:45:09.862Z","@version":"1","host":"my-laptop"}
} ]
}
}
bin/logstash -e 'input { stdin { } } output { elasticsearch { host => localhost } stdout { } }'
Inputs are the mechanism for passing log data to Logstash. Some of the more useful, commonly-used ones are listed here (a short combined example follows the list):
file: reads from a file on the filesystem, much like the UNIX command "tail -0f"
syslog: listens on the well-known port 514 for syslog messages and parses according to the RFC3164 format
redis: reads from a redis server, using both redis channels and redis lists. Redis is often used as a "broker" in a centralized
Logstash installation, which queues Logstash events from remote Logstash "shippers".
lumberjack: processes events sent in the lumberjack protocol (the shipper is now called logstash-forwarder).
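To make that concrete, here is a minimal sketch of an input block combining a file input with a redis input; the path, host and key values are placeholders to adapt to your own setup:
input {
  file {
    path => "/var/log/messages"   # tail a local log file
    type => "syslog"
  }
  redis {
    host => "127.0.0.1"           # redis broker receiving events from remote shippers
    data_type => "list"
    key => "logstash"
  }
}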
Filters
Filters are used as intermediary processing devices in the Logstash chain. They are often combined with conditionals in order to
perform a certain action on an event, if it matches particular criteria. Some useful filters (a short sketch follows this list):
grok: parses arbitrary text and structures it. Grok is currently the best way in Logstash to parse unstructured log data into
something structured and queryable. With 120 patterns shipped built-in to Logstash, it's more than likely you'll find one that meets
your needs!
mutate: the mutate filter allows you to do general mutations to fields. You can rename, remove, replace, and modify fields in your
events.
drop: drop an event completely, for example, debug events.
clone: make a copy of an event, possibly adding or removing fields.
geoip: adds information about the geographical location of IP addresses (and displays amazing charts in kibana)
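As a rough sketch of a couple of these in combination (the loglevel and clientip field names are hypothetical, assuming earlier filters have populated them):
filter {
  if [loglevel] == "debug" {
    drop { }                # discard noisy debug events entirely
  }
  geoip {
    source => "clientip"    # add geographic fields derived from this IP address field
  }
}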
Outputs
Outputs are the final phase of the Logstash pipeline. An event may pass through multiple outputs during processing, but once all
outputs are complete, the event has finished its execution. Some commonly used outputs include (a short sketch follows this list):
elasticsearch: If you're planning to save your data in an efficient, convenient and easily queryable format, Elasticsearch is the
way to go. Period. Yes, we're biased :)
file: writes event data to a file on disk.
graphite: sends event data to graphite, a popular open source tool for storing and graphing metrics. http://graphite.wikidot.com/
statsd: a service which "listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more
pluggable backend services". If you're already using statsd, this could be useful for you!
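For example, a sketch (not from the original text) that keeps the elasticsearch output and also writes a copy of each event to a per-type, per-day file on disk; the path is only illustrative:
output {
  elasticsearch { host => localhost }
  file {
    path => "/var/log/logstash/%{type}-%{+YYYY.MM.dd}.log"   # field references and date patterns are expanded per event
  }
}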
Codecs
Codecs are basically stream filters which can operate as part of an input or an output. Codecs allow you to easily separate the
transport of your messages from the serialization process. Popular codecs include json, msgpack and plain (text); a short multiline sketch follows the list below.
json: encode / decode data in JSON format
multiline: takes multiple-line text events and merges them into a single event, e.g. java exception and stacktrace messages
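As a quick sketch of the multiline codec (assuming, as is typical for Java stack traces, that continuation lines begin with whitespace):
input {
  stdin {
    codec => multiline {
      pattern => "^\s"        # lines that start with whitespace...
      what => "previous"      # ...are folded into the previous event
    }
  }
}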
For the complete list of (current) configurations, visit the Logstash "plugin configuration" section of the Logstash documentation page.
More fun with Logstash
Persistent Configuration files
Specifying configurations on the command line using -e is only so helpful, and more advanced setups will require more lengthy, long-
lived configurations. First, let's create a simple configuration file, and invoke Logstash using it. Create a file named "logstash-
simple.conf" and save it in the same directory as Logstash.
input { stdin { } }
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
Then, run this command:
bin/logstash -f logstash-simple.conf
Et voilà! Logstash will read in the configuration file you just created and run as in the example we saw earlier. Note that we used the -f
flag to read in the file, rather than -e to read the configuration from the command line. This is a very simple case, of course, so let's
move on to some more complex examples.
Filters
Filters are an in-line processing mechanism which provide the flexibility to slice and dice your data to fit your needs. Let's see one in
action, namely the grok filter.
input { stdin { } }
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
Save this configuration in a file named logstash-filter.conf and run Logstash with it:
bin/logstash -f logstash-filter.conf
Now paste this line into the terminal (so it will be processed by the stdin input):
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
You should see something returned to STDOUT which looks like this:
{
"message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
"@timestamp" => "2013-12-11T08:01:45.000Z",
"@version" => "1",
"host" => "cadenza",
"clientip" => "127.0.0.1",
"ident" => "-",
"auth" => "-",
"timestamp" => "11/Dec/2013:00:01:45 -0800",
"verb" => "GET",
"request" => "/xampp/status.php",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "3891",
"referrer" => "\"http://cadenza/xampp/navi.php\"",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
As you can see, Logstash (with help from the grok filter) was able to parse the log line (which happens to be in Apache "combined
log" format) and break it up into many different discrete bits of information. This will be extremely useful later when we start querying
and analyzing our log data; for example, we'll be able to run reports on HTTP response codes, IP addresses, referrers, and so on very
easily. There are quite a few grok patterns included with Logstash out-of-the-box, so it's quite likely that if you're attempting to parse a
fairly common log format, someone has already done the work for you. For more details, see the list of logstash grok patterns on
github.
The other filter used in this example is the date filter. This filter parses out a timestamp and uses it as the timestamp for the event
(regardless of when you're ingesting the log data). You'll notice that the @timestamp field in this example is set to December 11, 2013,
even though Logstash is ingesting the event at some point afterwards. This is handy when backfilling logs; it gives you the ability to
tell Logstash "use this value as the timestamp for this event".
Useful Examples
Apache logs (from files)
Now, let's configure something actually useful: apache2 access log files! We are going to read the input from a file on the localhost,
and use a conditional to process the event according to our needs. First, create a file called something like logstash-apache.conf with
the following contents (you'll need to change the log file path to suit your needs):
input {
file {
path => "/tmp/access_log"
start_position => beginning
}
}
filter {
if [path] =~ "access" {
mutate { replace => { "type" => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => localhost
}
stdout { codec => rubydebug }
}
Then, create the file you configured above (in this example, "/tmp/access_log") with the following log lines as contents (or use some
from your own webserver):
71.141.244.242 - kurt [18/May/2011:01:48:10 -0700] "GET /admin HTTP/1.1" 301 566 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
134.39.72.245 - - [18/May/2011:12:40:18 -0700] "GET /favicon.ico HTTP/1.1" 200 1189 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E)"
98.83.179.51 - - [18/May/2011:19:35:08 -0700] "GET /css/main.css HTTP/1.1" 200 1837 "http://www.safesand.com/information.htm" "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"
Now run it with the -f flag as in the last example:
bin/logstash -f logstash-apache.conf
You should be able to see your apache log data in Elasticsearch now! You'll notice that Logstash opened the file you configured, and
read through it, processing any events it encountered. Any additional lines logged to this file will also be captured, processed by
Logstash as events and stored in Elasticsearch. As an added bonus, they will be stashed with the field "type" set to "apache_access"
(this is done by the mutate filter in the configuration above).
In this configuration, Logstash is only watching the apache access_log, but it's easy enough to watch both the access_log and the
error_log (actually, any file matching *log), by changing one line in the above configuration, like this:
input {
file {
path => "/tmp/*_log"
...
Now, rerun Logstash, and you will see both the error and access logs processed via Logstash. However, if you inspect your data (using
elasticsearch-kopf, perhaps), you will see that the access_log was broken up into discrete fields, but not the error_log. That's because
we used a "grok" filter to match the standard combined apache log format and automatically split the data into separate fields.
Wouldn't it be nice if we could control how a line was parsed, based on its format? Well, we can!
Also, you might have noticed that Logstash did not reprocess the events which were already seen in the access_log file. Logstash is
able to save its position in files, only processing new lines as they are added to the file. Neat!
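If you are experimenting and want Logstash to re-read a file from the top on each run, one common trick (a sketch, not part of the original text) is to point the file input's position bookkeeping somewhere disposable:
input {
  file {
    path => "/tmp/access_log"
    start_position => beginning
    sincedb_path => "/dev/null"   # forget the saved file position between runs (testing only)
  }
}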
Conditionals
Now we can build on the previous example, where we introduced the concept of a conditional. A conditional should be familiar to
most Logstash users, in the general sense. You may use if, else if and else statements, as in many other programming languages.
Let's label each event according to which file it appeared in (access_log, error_log and other random files which end with "log").
input {
file {
path => "/tmp/*_log"
}
}
filter {
if [path] =~ "access" {
mutate { replace => { type => "apache_access" } }
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
} else if [path] =~ "error" {
mutate { replace => { type => "apache_error" } }
} else {
mutate { replace => { type => "random_logs" } }
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
You'll notice we've labeled all events using the "type" field, but we didn't actually parse the "error" or "random" files. There are so
many types of error logs that it's better left as an exercise for you, depending on the logs you're seeing (a rough starting point for the error branch is sketched below).
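If you would like a starting point anyway, the "error" branch could be extended with a hedged, hypothetical grok pattern written against the classic Apache error_log line layout; adjust it to whatever your server actually emits:
} else if [path] =~ "error" {
  mutate { replace => { type => "apache_error" } }
  grok {
    # rough match for lines like: [Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] some message
    match => { "message" => "\[%{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{YEAR}\] \[%{LOGLEVEL:loglevel}\] %{GREEDYDATA:error_message}" }
  }
}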
Syslog
OK, now we can move on to another incredibly useful example: syslog. Syslog is one of the most common use cases for Logstash,
and one it handles exceedingly well (as long as the log lines conform roughly to RFC3164 :). Syslog is the de facto UNIX networked
logging standard, sending messages from client machines to a local file, or to a centralized log server via rsyslog. For this example,
you won't need a functioning syslog instance; we'll fake it from the command line, so you can get a feel for what happens.
First, let's make a simple configuration file for Logstash + syslog, called logstash-syslog.conf:
input {
tcp {
port => 5000
type => syslog
}
udp {
port => 5000
type => syslog
}
}
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
output {
elasticsearch { host => localhost }
stdout { codec => rubydebug }
}
Run it as normal:
bin/logstash -f logstash-syslog.conf
Normally, a client machine would connect to the Logstash instance on port 5000 and send its message. In this simplified case, we're
simply going to telnet to Logstash and enter a log line (similar to how we entered log lines into STDIN earlier). First, open another shell
window to interact with the Logstash syslog input and type the following command:
telnet localhost 5000
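If telnet is not available on your machine, piping a line in with netcat should work just as well (an aside, assuming nc is installed):
echo "Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]" | nc localhost 5000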
You can copy and paste the following lines as samples (feel free to try some of your own, but keep in mind they might not parse if the
grok filter is not correct for your data):
Dec 23 12:11:43 louis postfix/smtpd[31499]: connect from unknown[95.75.93.154]
Dec 23 14:42:56 louis named[16000]: client 199.48.164.7#64817: query (cache) 'amsterdamboothuren.com/MX/IN' denied
Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)
Dec 22 18:28:06 louis rsyslogd: [origin software="rsyslogd" swVersion="4.2.0" x-pid="2253" x-info="http://www.rsyslog.com"] rsyslogd was HUPed, type 'lightweight'.
Now you should see the output of Logstash in your original shell as it processes and parses messages!
{
"message" => "Dec 23 14:30:01 louis CRON[619]: (www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"@timestamp" => "2013-12-23T22:30:01.000Z",
"@version" => "1",
"type" => "syslog",
"host" => "0:0:0:0:0:0:0:1:52617",
"syslog_timestamp" => "Dec 23 14:30:01",
"syslog_hostname" => "louis",
"syslog_program" => "CRON",
"syslog_pid" => "619",
"syslog_message" => "(www-data) CMD (php /usr/share/cacti/site/poller.php >/dev/null 2>/var/log/cacti/poller-error.log)",
"received_at" => "2013-12-23 22:49:22 UTC",
"received_from" => "0:0:0:0:0:0:0:1:52617",
"syslog_severity_code" => 5,
"syslog_facility_code" => 1,
"syslog_facility" => "user-level",
"syslog_severity" => "notice"
}
Congratulations! You're well on your way to being a real Logstash power user. You should be comfortable configuring, running and
sending events to Logstash, but there's much more to explore.
