Keyboard meet head

Where my head meets keyboard..

Pushing the check states from Xymon to Graphite

Note that this article was written quite a while ago, in 2016. Technologies, circumstances, people and my opinions or understanding might have changed since. Please bear that in mind when reading this.

Motivation

Chances are you've never heard of Xymon (formerly Hobbit), so let me give you some idea. It's actually a pretty decent monitoring system - if you still live in the 90s. :D But hey, let's give credit where it's due: compared to other systems of its time, it was reasonably fast, had a fairly easy-to-understand configuration, came with many standard checks out of the box and shipped with a web interface. That's probably why some people still use it to this day.

In our company it's one of those legacy systems that we need to replace, so as a first step, let's see if we can get some of the data out of it while we're still using it.

Let's get some data out of it

The idea of this short exercise was to get the state of all checks and feed them to Graphite, where we could do some analysis. Xymon comes with a fairly powerful protocol that you can access via the xymon binary. In fact, Xymon itself uses that protocol to receive status reports from all the clients.

What we're interested in, however, is the following command ("message"), which returns a summary of all tests (checks) known to the Xymon daemon (your central point of metrics collection):

xymondboard

On top of that, we can ask for specific fields only. In our case we're interested in just three values, so let's ask only for those:

xymondboard fields=hostname,testname,color

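The first two are pretty self-explanatory, but what is this color? Generally speaking, Xymon expresses the state of a test as a color: green means OK, yellow a warning and red an error, while blue (disabled), clear (no data) and purple (no recent report) say nothing about the health of the check itself.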

Add the actual xymon binary and the host to fetch the data from, and we get the final version of the whole command:

xymon <server_hostname> 'xymondboard fields=hostname,testname,color'

If you run the above line manually, you'll get the hostname, test name and color on standard output, separated by the vertical bar character ("|", or as we know it, the pipe), one test per line, which is definitely handy.
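
To make the format concrete, here's a minimal sketch of parsing that output in Python. It assumes exactly three pipe-separated fields per line, in the order requested with fields=. The function name parse_board and the sample lines are made up for illustration and were not captured from a real server.

```python
def parse_board(text):
    """Parse 'hostname|testname|color' lines into (host, test, color) tuples.

    Assumes the fields were requested as fields=hostname,testname,color.
    Blank lines and lines with an unexpected number of fields are skipped.
    """
    records = []
    for line in text.splitlines():
        fields = line.strip().split("|")
        if len(fields) != 3:
            continue  # blank or malformed line
        hostname, testname, color = (f.strip() for f in fields)
        records.append((hostname, testname, color))
    return records


# Hypothetical sample output, for illustration only.
sample = "host1.example.com|cpu|green\nhost1.example.com|disk|yellow\n"
for record in parse_board(sample):
    print(record)
```

Unlike the quick script in the next section, this version skips malformed lines instead of failing on them.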

Sprinkle it with Python magic

So now we know how to get the data out of Xymon, but how do we get it into Graphite? Well, we'll add a couple of lines of Python:

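#!/usr/bin/python
# Read "hostname|testname|color" lines from stdin (or files given as
# arguments) and push them to Graphite's plaintext listener.

import fileinput
import re
import socket
import time

# Map Xymon colors to numbers: -1 unknown/not applicable, 0 OK,
# 1 warning, 2 error.
values = {
    "blue": -1,
    "clear": -1,
    "green": 0,
    "purple": -1,
    "red": 2,
    "yellow": 1,
}

# Carbon's plaintext protocol listens on port 2003 by default.
sock = socket.socket()
sock.connect(('127.0.0.1', 2003))

# One timestamp for the whole run, so all checks line up.
ts = int(time.time())

for line in fileinput.input():
    (hostname, metric, color) = line.split("|")
    # "host1.example.com" -> ["host1", "example", "com"], sanitised for Graphite.
    graph_domain = re.sub(r'[^a-z0-9.]', '_', hostname.lower()).split(".")
    if len(graph_domain) < 2:
        graph_domain.append("_undefined_")
    # Reverse the hostname parts: "com.example.host1.<metric>".
    graph_path = "{host_path}.{metric}".format(
        host_path=".".join(graph_domain[::-1]),
        metric=re.sub(r'[^a-z0-9.]', '_', metric.lower()))

    # Encode to bytes so this also runs on Python 3.
    sock.sendall(
        "{path} {value} {timestamp}\n".format(
            path=graph_path,
            value=values.get(color.strip(), -1),
            timestamp=ts,
        ).encode("ascii"))
sock.close()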

I'm sure you've seen better code: there's no error handling and a major cleanup is due, but for a quick 5-minute hack, it should work. Let's have a closer look. First we define a mapping from colors to numeric values usable in Graphite. I went with -1 for unknown statuses, 0 for green, 1 for warning (yellow) and 2 for error (red). Then we open a connection to Graphite, or more precisely to Carbon's plaintext listener on port 2003.

Now that we're ready to send data, we read one line at a time from stdin and split it to get the values. The hostname gets some parsing here as well: we want "host1.example.com" to appear as "com.example.host1" in Graphite, so we can group metrics by domain. (Obviously, a different mapping might suit your case better.)
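
The hostname reversal is easier to see in isolation. Below is a small sketch that pulls the path-building logic out of the script into a standalone function, including the sanitising step. The name graphite_path is my own and not part of any library.

```python
import re


def graphite_path(hostname, testname):
    """Build a Graphite metric path such as 'com.example.host1.cpu'.

    Mirrors the script: lowercase, replace anything outside [a-z0-9.]
    with '_', then reverse the dot-separated hostname parts.
    """
    parts = re.sub(r"[^a-z0-9.]", "_", hostname.lower()).split(".")
    if len(parts) < 2:
        # Short names like "host1" get a placeholder domain, as in the script.
        parts.append("_undefined_")
    host_path = ".".join(reversed(parts))
    metric = re.sub(r"[^a-z0-9.]", "_", testname.lower())
    return host_path + "." + metric


print(graphite_path("host1.example.com", "cpu"))  # com.example.host1.cpu
print(graphite_path("HOST1", "http-check"))       # _undefined_.host1.http_check
```

Reversing the parts puts the most general component (the TLD) first, so Graphite's tree groups hosts by domain.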

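We also sanitise the host names and test names so that the resulting path is acceptable to Graphite. Finally, we send everything to Graphite in its plaintext format, one "path value timestamp" line per check, with the color converted to its numeric value and a single timestamp for the whole run.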
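
Here's a sketch of just the sending side, assuming Carbon's plaintext protocol of one "<path> <value> <timestamp>" line per datapoint. The helper names format_line, send_metrics and read_all are hypothetical. To keep it self-contained, a local socket pair stands in for the real Carbon listener.

```python
import socket
import time


def format_line(path, value, timestamp):
    """Render one datapoint in Carbon's plaintext format: path value timestamp."""
    return "{0} {1} {2}\n".format(path, value, int(timestamp))


def send_metrics(sock, datapoints, timestamp=None):
    """Send (path, value) pairs over an already connected stream socket.

    All points share one timestamp, like ts in the script. The payload is
    encoded to bytes because Python 3 sockets do not accept str.
    """
    if timestamp is None:
        timestamp = time.time()
    payload = "".join(format_line(path, value, timestamp)
                      for path, value in datapoints)
    sock.sendall(payload.encode("ascii"))


def read_all(sock):
    """Read from a socket until the peer closes it."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:
            return b"".join(chunks)
        chunks.append(chunk)


# Demo: a local socket pair stands in for Carbon on 127.0.0.1:2003.
sender, receiver = socket.socketpair()
send_metrics(sender, [("com.example.host1.cpu", 0),
                      ("com.example.host1.disk", 1)], timestamp=1460000000)
sender.close()
print(read_all(receiver).decode("ascii"), end="")
receiver.close()
```

Against a real server you would pass a socket from socket.create_connection(("127.0.0.1", 2003)) instead of the socket pair.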

Now we just need to run it every minute via cron and we're done:

* * * * * /bin/bash -c "/bin/xymon xymon.example.com 'xymondboard fields=hostname,testname,color' | /bin/grafeed.py >/dev/null"

Running it once every 5 minutes would probably be fine too, since most checks won't be updated more often than that, but let's leave some breathing room, shall we? With sensible retention and aggregation set up in Graphite, the required storage will be quite small anyway.
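
To put "quite small" into numbers, here's a back-of-the-envelope estimate for Graphite's default Whisper backend. Whisper preallocates each metric file with a 16-byte header, 12 bytes per archive and 12 bytes per datapoint. The retention policy below is an assumed example, not something from our setup.

```python
# Rough Whisper file size for one metric (one Xymon host/test pair).
# Layout: 16-byte header, 12 bytes of info per archive, and 12 bytes per
# datapoint (4-byte timestamp + 8-byte double). Files are preallocated.
HEADER_BYTES = 16
ARCHIVE_INFO_BYTES = 12
POINT_BYTES = 12


def whisper_size(retentions):
    """retentions: list of (seconds_per_point, seconds_retained) tuples."""
    points = sum(retained // step for step, retained in retentions)
    return HEADER_BYTES + ARCHIVE_INFO_BYTES * len(retentions) + POINT_BYTES * points


DAY = 24 * 60 * 60
# Assumed policy: 1-minute points for 7 days, 5-minute points for 30 days,
# hourly points for a year (roughly "1m:7d,5m:30d,1h:1y").
policy = [(60, 7 * DAY), (300, 30 * DAY), (3600, 365 * DAY)]
print(whisper_size(policy))  # 329812 bytes, about 322 KiB per check
```

Multiply by the number of host/test pairs Xymon reports to get the total. Because the files are preallocated, the size stays the same no matter how often cron runs.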

Final words

This is why I love Python. The batteries-included philosophy makes it dead simple to write a quick integration script in minutes. On top of that, there are no external dependencies - if you work on legacy systems, you might find yourself unable to install any, for many reasons (an outdated OS, limited or no network connection, you get the idea). That's where even an older version of Python comes in extremely handy, with all its modules included.

As a proof of concept, we now collect check statuses in a convenient format that's easy to browse, and it only took us a couple of minutes.

But still, don't use Xymon. Seriously.

There's no comment system on this page, but I do accept feedback. If you are interested in commenting, send me a message and I may publish your comments, in edited form.