Remote logging [1]

Reading Time: 15min

What this is about

  • Secure and reliable syslog remote logging
  • Free yourself from receiving hundreds of logging emails each day
  • Setup email alerts

Log server

I assume we all understand what logging is and why it matters.
What we want is a central place to which all log messages are sent, where they are analysed and can trigger alerts.
Long-term logging also gives you a bigger picture through data mining [2], but that is out of scope for this article.

The first problem I repeatedly faced was how to collect all log messages securely in a central place. Of course rsyslog can manage that, but it has its own pitfalls and does not allow instant further processing.
Logstash [3], which runs on the Java virtual machine, and logstash-forwarder, written in Go, came to the rescue.

Logstash-forwarder, Logstash, Graylog2, Elasticsearch, Kibana.

What a chain of tools. Logstash-forwarder (shipper), Logstash (server/filter), Elasticsearch (database) and Kibana (frontend) seem to be an obvious choice, since they were “brought together” by Elasticsearch in 2013 [4].
Graylog2 (server/filter/frontend) is a separate project from Germany that has a lot in common with this stack, but offers some frontend features that are missing in Kibana.

Logstash-forwarder uses the Lumberjack protocol, which should not be confused with “Project Lumberjack” [5], based on the work of the CEE (Common Event Expression) [6].

CEE is a sort of comedic tragedy of design

– Jordan Sissel, developer of Logstash and logstash-forwarder [7].

I can only guess why he named his own protocol Lumberjack, but the name seems to be a great fit [8][9] and also alludes to the Monty Python sketch “Lumberjack Song” [10].

The lumberjack protocol used by this project exists to provide a network protocol for transmission that is secure, low latency, low resource usage, and reliable.

– Jordan Sissel about logstash-forwarder [11]

Logstash-forwarder takes files or stdin as its input stream. It then delivers the data securely and reliably to Logstash, which does the further processing. This is exactly what we need, as all applications already write log files. Logstash on its own would offer plenty of other input formats, but for remotely collecting syslog files the so-called “lumberjack” input suffices, as it is the service that logstash-forwarder connects to.

Forward and process

Logstash-forwarder sends the syslog file and marks each line with the type syslog. Besides the server and SSL certificate settings, its configuration has only a “files” block, which states what it is supposed to send to the Logstash server:

"files":[
  {
    "paths": [
      "/var/log/syslog"
    ],
    "fields": { "type": "syslog" }
  }
]
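
To make the effect of the “files” block more tangible, here is a minimal Python sketch of what a line-by-line shipper conceptually does: every line becomes one event, and the configured fields are attached to it. This is only an illustration under my own assumptions. The function name `lines_to_events` and the event layout are hypothetical and say nothing about the actual Lumberjack wire format.

```python
import io


def lines_to_events(stream, fields):
    """Turn each non-empty line of a log stream into one event dict."""
    events = []
    for raw in stream:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        event = {"message": line}
        event.update(fields)  # e.g. {"type": "syslog"} from the "fields" setting
        events.append(event)
    return events


# Two made-up syslog lines standing in for /var/log/syslog.
sample = io.StringIO(
    "Feb  3 10:15:01 web01 CRON[1234]: (root) CMD (run-parts /etc/cron.hourly)\n"
    "Feb  3 10:15:02 web01 sshd[987]: Accepted publickey for deploy\n"
)
events = lines_to_events(sample, {"type": "syslog"})
```

Each resulting event carries the raw line plus the type field, which is what the later filter condition relies on.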

Now we have a lightweight shipper on each host, sending log files to our log server, where Logstash and its heavier requirements, Java and Elasticsearch, are installed. As a frontend to Elasticsearch, we also have Kibana [12] in place.

Logstash collects all messages forwarded by the shipper (logstash-forwarder) and writes everything to the database (Elasticsearch) for later review (Kibana).

The main configuration for Logstash is also very simple.

Have an input:

input {
  lumberjack {
    port => 5000
    ssl_certificate => "path"
    ssl_key => "path"
  }
}

Have an output:

output {
  elasticsearch { host => "localhost" }
}

Anything that arrives via logstash-forwarder will be stored in Elasticsearch.
With Elasticsearch’s query language we can search and filter very quickly and build exactly the view that helps us identify and analyse problems.
This leads to the question of what to search for, or in other words: how is our data structured?
The next big problem with logging is that most applications have their own log message format, while we want it normalized. One simple definition of a log message is:

“timestamp plus data” – Jordan Sissel [13]

That is enough to make at least one field mandatory: the timestamp.
As logstash-forwarder just sends files line by line, Logstash needs a parser to analyse the incoming data. For this, Logstash has filters. The one we can use to parse the incoming syslog messages is called the “grok” filter.
The grok filter provides named regex patterns, so that a complex regex for parsing a message line can be broken up into readable pieces. I know this method a little from fail2ban and still wonder whether the two have related roots.
With the filter in place, we extract a timestamp field and some other fields in addition to the base message, and can then search more specifically within Elasticsearch.

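filter {
  # only parse events whose type field is "syslog"
  if [type] == "syslog" {
    grok {
      # split the line into timestamp, host, program, optional [pid] and message
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
    }
    date {
      # use the parsed syslog timestamp as event time; the two patterns cover one- and two-digit days
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}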
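
For readers who have not used grok yet, the following Python sketch shows roughly what the pattern above does, written as one plain regular expression with named groups. It is a simplified approximation and not grok itself: the real SYSLOGHOST and POSINT patterns are stricter than the `\S+` and `\d+` used here. The function name `parse_syslog` is made up, and the year is passed in explicitly because syslog timestamps carry none. Month names are parsed with `%b`, so the sketch assumes an English locale.

```python
import re
from datetime import datetime

# Rough equivalent of the grok pattern, using named groups.
SYSLOG_LINE = re.compile(
    r"^(?P<syslog_timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_hostname>\S+) "
    r"(?P<syslog_program>.*?)"
    r"(?:\[(?P<syslog_pid>\d+)\])?: "
    r"(?P<syslog_message>.*)$"
)


def parse_syslog(line, year):
    """Return the extracted fields, or None if the line does not match."""
    match = SYSLOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Collapse the padding in "Feb  3" so one strptime format covers both cases.
    day_time = " ".join(fields["syslog_timestamp"].split())
    fields["timestamp"] = datetime.strptime(
        f"{year} {day_time}", "%Y %b %d %H:%M:%S"
    )
    return fields


parsed = parse_syslog(
    "Feb  3 10:15:01 web01 CRON[1234]: (root) CMD (run-parts /etc/cron.hourly)",
    year=2015,
)
```

Lines that do not look like syslog simply yield `None`, which roughly corresponds to a grok parse failure in Logstash.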

Logs via email

As every host sends emails from applications or cron jobs to hostmaster, my mailbox was a pain to look at every morning. Now I forward everything to a dedicated mailbox and parse it with Logstash, which can also trigger a simple alert.
Later on, I will show how we built our setup with Graylog2, which provides alerts via streams in a web GUI.

IMAP input

As Logstash is already running, I only needed another input:

input {
  imap {
    host => "dns, or ip"
    password => "password"
    user => "login"
    type => "email"
  }
}

Now every mail that arrives is given the type “email” and has already been given a timestamp by the IMAP input.
I want a simple alert that sends me an email for every matched error. Logstash supports conditional expressions for this:

output {
  # =~ is a regular expression match against the message field
  if [type] == "email" and [message] =~ /error/ {
    email {
      attachments => ["message"]
      to => "email-address"
    }
  }
}

Wow, that was easy. There are also Logstash outputs to notify our Icinga or to send a note via the “hipchat” output.
But now imagine using the email alert for a series of log messages. That could easily lead to hundreds of alert emails. We need a more sophisticated solution for this.
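
To illustrate both the condition and the flooding problem, here is a small Python sketch that mirrors the conditional from the output block. The pattern is unanchored and case-sensitive, so “ERROR” in capitals would not match. The helper name `should_alert` and the burst of messages are invented for this illustration.

```python
import re

ERROR_RE = re.compile(r"error")  # like /error/: unanchored and case-sensitive


def should_alert(event):
    """Mirror of the condition: type is "email" and message matches /error/."""
    return (
        event.get("type") == "email"
        and ERROR_RE.search(event.get("message", "")) is not None
    )


# A hypothetical burst: one failing cron job mailing 300 times.
burst = [
    {"type": "email", "message": f"backup run {i}: I/O error on /dev/sdb"}
    for i in range(300)
]
alerts = [event for event in burst if should_alert(event)]
```

Every matching event produces its own alert, so a single misbehaving job in this example results in 300 emails.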

Email alerts

After installing and configuring Graylog2 [14] on the same server that Logstash runs on, I only changed the Logstash output to gelf:

output {
  gelf { host => "localhost" }
}

Now everything that Logstash receives via its inputs and parses via its filters arrives at Graylog2’s GELF UDP input, and Graylog2 in turn writes it to Elasticsearch.
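
For a rough idea of what travels between the two, here is a Python sketch that builds a GELF 1.1-style payload from a parsed event. It follows the published GELF format as far as I know it (version, host, short_message, additional fields with a leading underscore), but it is not what the Logstash gelf output emits byte for byte. The helper name `to_gelf` and the sample event are assumptions for illustration.

```python
import json
import zlib


def to_gelf(event, host):
    """Build a GELF 1.1-style dict from a parsed event."""
    gelf = {
        "version": "1.1",
        "host": host,
        "short_message": event["message"],
    }
    for key, value in event.items():
        if key in ("message", "id"):
            continue  # "message" is already mapped; "_id" is not allowed in GELF
        gelf["_" + key] = value  # additional fields carry a leading underscore
    return gelf


event = {"message": "Out of memory", "type": "syslog", "syslog_program": "kernel"}
# GELF over UDP may be sent zlib-compressed; the datagram itself is not sent here.
payload = zlib.compress(json.dumps(to_gelf(event, "db02")).encode("utf-8"))
```

The payload could then be sent as a single UDP datagram to the GELF input, which by default listens on port 12201 as far as I know.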

Why the heck this overhead? Because Logstash in combination with logstash-forwarder is great at sending messages securely, reliably and with low resource usage. It also has a lot of pre-processing options.
And Graylog2 is great in terms of alerts. I can now configure “streams”, which are basically sets of pattern-matched messages, and a stream can trigger an email alert.

What is the benefit over the Logstash email output?

  • Graylog2 stream alerts can be paused after the first trigger, leading to a much lower alerting rate
  • Graylog2 can manage users and permissions for streams, which lets me provide the logs of a certain host to anybody using it, without showing everything
  • Graylog2 has a lot more to offer, e.g. promising additional external dashboards specialized for deployments [15]
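
The first point is the one that matters most to me, so here is a minimal Python sketch of the idea of pausing an alert after it has fired. This is not Graylog2’s implementation, which I have not looked at. The class name `PausingAlert` and the ten-minute grace period are assumptions chosen for illustration.

```python
class PausingAlert:
    """Fire on the first match, then stay silent for grace_seconds."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.last_fired = None  # time of the most recent alert, if any

    def should_fire(self, now):
        """Return True if an alert may be sent at time `now` (in seconds)."""
        if self.last_fired is not None and now - self.last_fired < self.grace_seconds:
            return False  # still inside the pause window
        self.last_fired = now
        return True


alert = PausingAlert(grace_seconds=600)
# One matching message per minute for five hours (timestamps in seconds).
fired = [t for t in range(0, 300 * 60, 60) if alert.should_fire(t)]
```

In this example the 300 matching messages lead to 30 alerts instead of 300, one every ten minutes for as long as the problem persists.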

Additionally, I will keep using Kibana, because it has a much cleaner view on the whole index.

Résumé

With rsyslog forwarding I always duplicated all logs before processing them. Without logstash-forwarder it was a hassle to use Logstash (overhead of Java, or rsyslog odysseys). Now logging is fun for me: a lightweight forwarder and secure transmission. The specs for the log server are a bit heavy (6 GB RAM), but as best practice advises, it could be clustered if required.

Disclaimer

This is by no means a copy-and-paste-ready configuration. You are responsible for understanding missing requirements and side effects.

Resources