"Munin is a server/node pair that graphs, htmlifies and optionally sends out notifications about data it gathers. It's designed to let it be easy to graph new datasources."
from the official munin documentation
Rev 1.3 - see the changelog
Munin is a monitoring tool written in Perl, started by Jimmy Olsen in late 2003 and based on the excellent RRDtool by Tobi Oetiker. Even though development has slowed down since 2005, Munin is a stable tool; it is also very widely used, thanks to its very easy setup.
It consists of munin-node, a daemon that you install on every server you want to monitor and that gathers the data, and munin, which you install on your monitoring server and which connects at regular intervals to every node to retrieve that data. Munin then uses the data to generate the corresponding graphs and HTML pages.
We have two machines; a server we want to monitor (serverA, IP: aaa.aaa.aaa.aaa) and a server which will monitor it (serverX, IP: xxx.xxx.xxx.xxx).
On the server we want to monitor, we need to install munin-node.
By default, only serverA itself is allowed to connect to this node to retrieve the data, so we need to explicitly allow serverX to connect to it. This is done at the end of the munin-node configuration file (usually /etc/munin/munin-node.conf).
You will find the following line:
Below it, allow the IP address of serverX to connect (the ^ and $ at the beginning and at the end are important):
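As a minimal sketch of what these two entries look like, the helper below builds an allow line from a dotted IPv4 address by escaping each dot, so that the dot matches a literal "." rather than any character. The function name allow_line and the address 192.0.2.10 (standing in for serverX) are assumptions made for the illustration; they are not part of munin.

```shell
# Hypothetical helper: turn a dotted IPv4 address into a munin-node "allow" line.
allow_line() {
    # Escape every dot so the regular expression only matches a literal ".".
    escaped=$(printf '%s' "$1" | sed 's/\./\\./g')
    printf 'allow ^%s$\n' "$escaped"
}

allow_line 127.0.0.1     # the default entry: only the node itself
allow_line 192.0.2.10    # example address standing in for serverX
```

Without the ^ and $ anchors, the expression would also match longer addresses that merely contain the one you meant to allow.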
As munin-node runs as a daemon, you need to restart it to make the changes active.
On the monitoring server, we install munin:
We need to tell munin that we want it to monitor serverA. Munin's configuration file, munin.conf, is usually found in /etc/munin/.
At the end of the file, add the following:
serverA should be the name of your machine. Domain is the "domain" of your machine; in fact it is more a group name, used to sort your servers. You can choose to sort by location (server1.london...), by role (server1.apache), or whatever you feel is relevant. The usual notation I found everywhere is [serverA.domain]; I don't know why, as it creates a problem with domain names containing dots, and makes the domain name appear after every node name in the overview page. I recommend using the notation I gave.
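Here is a small sketch of such a node definition, using the group;node notation with a semicolon instead of a dot. The helper name node_stanza is made up for the example; address and use_node_name are regular munin.conf directives, but check them against the sample munin.conf shipped with your version.

```shell
# Hypothetical helper: print a munin.conf node definition.
# Arguments: group ("domain"), node name, IP address of the node.
node_stanza() {
    cat <<EOF
[$1;$2]
    address $3
    use_node_name yes
EOF
}

# Review the output, then append it to /etc/munin/munin.conf on serverX.
node_stanza Domain serverA aaa.aaa.aaa.aaa
```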
Munin is a Perl script run every 5 minutes by cron. The cronjob should have been set automatically during the installation. Therefore, you don't have to restart it; just wait 5 minutes.
Your files should already be available in /var/www/munin by now. Make them available by installing an HTTP server; lighttpd would do the job here.
You can then access the monitoring via your favorite browser with the address http://xxx.xxx.xxx.xxx/munin/.

Congratulations! Munin is working. Your server is now monitored, and the default setup should be enough for most cases. But there is a lot of other cool stuff you can do with munin, which I will describe now :).
Throughout this tutorial, you will create and install plugins, add nodes, virtual nodes... For troubleshooting and debugging, it is really important that you understand how munin communicates with the nodes, so we are going to play with telnet a little bit here. Open a shell on a machine which is allowed to connect to the munin-node; in our example that is either serverA (the node) or serverX (the server allowed to access the node). By default, munin-node listens on port 4949.
As there is no help command, just enter something and hit enter to get a list of the commands available:
Let's start with the list command: as you can guess, it lists all the "services" presented by the node (your output may differ a little bit).
This list actually depends on what you have installed; if you have a mysql server installed, you will see mysql-related services in the list. The two commands config and fetch take a service as argument. Let's have a look at our swap:
The config command will tell our munin how to build the graph. It will give a title for the graph, a category, a legend for the axes... These variables are needed by rrdtool to build the graphs. The fetch command actually retrieves the values themselves.
The nodes command lists the nodes made available by the current node. Right now we have only one node, but as one node can in theory monitor several servers or pieces of equipment, munin has introduced the concept of virtual nodes, which we will detail later. This command is not too important for now.
Finally, we have the version and the quit commands:
Using telnet to access your munin-node is usually not useful, although it can be sometimes for debugging purposes. It has been described here to help you understand how munin and munin-node communicate together, and help you understand the whole data gathering process.
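If you prefer something scriptable over an interactive telnet session, the same dialogue can be sketched with netcat. This is an assumption on my side: the tutorial itself uses telnet, the helper name munin_ask is made up, and nc must be installed. The helper sends one command followed by quit, so the node closes the connection after answering.

```shell
# Hypothetical helper: send one command to a munin-node and print the reply.
# Assumes nc (netcat) is available; 4949 is the default munin-node port.
munin_ask() {
    host=$1
    shift
    # Send the command, then "quit" so that the node closes the connection.
    printf '%s\nquit\n' "$*" | nc "$host" 4949
}

# Typical calls from serverX (commented out because they need a live node):
# munin_ask aaa.aaa.aaa.aaa list
# munin_ask aaa.aaa.aaa.aaa config swap
# munin_ask aaa.aaa.aaa.aaa fetch swap
```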
Munin graphs are nice; but if you don't want to check them every morning for suspiciously high network traffic or critical disk usage, you will probably want munin to send you an alert when it finds an "unusual" value. Munin has a very basic alerting system built in. Imagine your email address is yann@foo.com, and you want to receive a mail if the load on serverA goes over 3, and another mail if it reaches the critical value 5. You also want chris, chris@foo.com, to be notified.
In /etc/munin/munin.conf, add the following line (usually above the part defining the nodes to monitor):
Then, in the part describing serverA:
The values 3 and 5 are maximum values here. If you wanted to be warned when the load goes under 1, you could replace 3 by 1: (with a trailing colon). You can also set both a minimum and a maximum: load.warning 1:3 would warn you if the load goes under 1 or over 3.
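To make the three notations concrete (max, min: and min:max), here is a minimal sketch that checks a value against such a range. It only mimics the semantics described above; it is not munin's own code, and the function name in_range is made up.

```shell
# Hypothetical helper: in_range VALUE RANGE
# Returns 0 when VALUE is inside RANGE, 1 when it would trigger an alert.
# RANGE is "max", "min:" or "min:max", as described in the text.
in_range() {
    value=$1
    range=$2
    case $range in
        *:*) min=${range%%:*}; max=${range#*:} ;;
        *)   min=; max=$range ;;
    esac
    awk -v v="$value" -v min="$min" -v max="$max" 'BEGIN {
        if (min != "" && v + 0 < min + 0) exit 1
        if (max != "" && v + 0 > max + 0) exit 1
        exit 0
    }'
}

in_range 2 3     && echo "load 2, range 3: fine"
in_range 4 3     || echo "load 4, range 3: alert"
in_range 0.5 1:3 || echo "load 0.5, range 1:3: alert"
```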
To monitor part of a service with munin, you need the internal name of the element you want to check. For example, say we want to be warned if the usage of the disk /dev/sdb1 on serverA exceeds 95%; the line we will add is _dev_sdb1.warning 95, _dev_sdb1 being the internal name of the element. There are two ways to find this internal name.
We know the usage of that disk is monitored by the plugin "df". So we can go to the HTML page produced by munin and click on the graph corresponding to the df plugin; at the bottom of the page, a table lists all the elements monitored by the df plugin, with their internal names. The other way is to connect to the node with telnet, and fetch the df plugin:
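When a plugin reports many fields, a one-line filter makes the internal names easier to spot. The sketch below extracts them from fetch output; the helper name field_names and the sample values are made up, and your df plugin will report different devices.

```shell
# Hypothetical helper: list the internal field names found in the output of
# "fetch <plugin>", which has one "<field>.value <number>" line per field.
field_names() {
    sed -n 's/^\([A-Za-z0-9_]*\)\.value .*$/\1/p'
}

# Made-up sample of what "fetch df" could return on serverA
# (the final "." marks the end of the reply).
printf '%s\n' '_dev_sda1.value 42' '_dev_sdb1.value 96' '.' | field_names
```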
Anyway, remember: munin is run by cron every five minutes. And no, munin doesn't keep track of whom it has already mailed. I will let you imagine what happens if the usage of your disk /dev/sdb1 goes up to 96% on Friday evening, just after you left work: you may have a surprise on Monday morning when checking your mail, in the form of tens if not hundreds of messages... You cannot define groups of contacts either, or groups of machines. If you want warnings on 10 services on 50 machines, it starts to get quite complicated... Therefore I would recommend that you use one of the Nagios methods.
First you need a way for Nagios to accept messages from Munin. Nagios has exactly such a thing, namely the NSCA which is documented here: http://nagios.sourceforge.net/docs/1_0/addons.html#nsca.
NSCA consists of a client (a binary usually named send_nsca) and a server usually run from inetd. We recommend that you enable encryption on NSCA communication.
You also need to configure Nagios to accept messages via NSCA. Those will be passive alerts.
If you don't want to use passive checks, you can use the check_munin_rrd plugin.
Basically, the munin-node data is stored on the munin server as usual, and Nagios reads that data to check the status of the node.
The previous implementation used a check from Nagios directly against munin-node, which is overkill since the Munin server already gets the data via cron.
You need to define a
Don't use a smaller value for normal_check_interval: munin only updates its data every 5 minutes.
Twice I have written or used munin-node plugins that did not directly concern the server on which the node was installed. The first one was a plugin using the SNMP interface of a UPS to check the temperature in our server room; the other was a plugin returning the number of visitors on a forum, by downloading the page and finding the value with a regular expression.
As these plugins were installed on the monitoring machine, I had the temperature of the server room somewhere in between the available entropy and the load graphs of the monitoring machine. With virtual nodes, I can create a virtual node called ServerRoom1, with for example two plugins, temperature and humidity. There is no physical machine called "ServerRoom1", so the munin-node installed on serverA will just tell me, when I fetch its nodes, which plugins are for serverA and which ones are for ServerRoom1. This is what "config temperature" in a telnet session to my server returns:
As you can see, there is an additional line specifying the host name. For the other plugins, no host_name is defined, therefore it is assumed they are for the default node (the one you get the greeting from just after connecting with telnet, remember?). The nodes command will also return several nodes:
There are several ways to tell munin that a plugin is reporting data for a virtual node. The preferred method is the following; edit the file /etc/munin/plugin-conf.d/munin-node on the machine running the node:
Find the section related to your plugin, or if you can not find it, add it at the end; if your plugin name is temperature, and the name you want to give to the virtual node is ServerRoom1, then add this:
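A sketch of that section, printed by a small helper so you can review it before pasting it into the file. The helper name virtual_node_section is made up; host_name is the same variable that shows up in the config output of the plugin.

```shell
# Hypothetical helper: print the plugin configuration section that assigns
# a plugin to a virtual node. Arguments: plugin name, virtual node name.
virtual_node_section() {
    cat <<EOF
[$1]
host_name $2
EOF
}

# Append the output to the munin-node plugin configuration file on serverA.
virtual_node_section temperature ServerRoom1
```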
Remember to restart munin-node; you should now be able to check via telnet that the node on serverA is presenting two nodes.
You then need to configure munin on the monitoring server:
ServerRoom1 being the name of the virtual node, and aaa.aaa.aaa.aaa the IP address of the server with the munin-node.
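On the monitoring server, the definition of a virtual node can be sketched as follows. My assumption here is that use_node_name must be set to no, so that munin asks the node for the name written in munin.conf (ServerRoom1) instead of the name the node announces in its greeting; the helper name is made up.

```shell
# Hypothetical helper: print a munin.conf definition for a virtual node.
# Arguments: group, virtual node name, IP of the server running munin-node.
virtual_node_stanza() {
    cat <<EOF
[$1;$2]
    address $3
    use_node_name no
EOF
}

virtual_node_stanza Domain ServerRoom1 aaa.aaa.aaa.aaa
```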
Save, wait 5 minutes, it should work :)
It is sometimes useful to compare data coming from several nodes; if you have a cluster of 5 load balanced HTTP servers, it can be interesting to have the curves for the load of all 5 servers on a single graph, to check if the load balancer is properly configured. You could also put on the same graph the load of your fileserver, and the CPU time spent on "i/o wait" on a server accessing it, to study the correlation between both values.
Let's take the example with the 5 HTTP servers. First, we create a new virtual node:
Then we create our new graph:
After the next run of munin-cron, your new graph should be available.
A slightly more complex example now. I have 3 plugins collecting the number of users on a forum; the first plugin collects the number of guests, hidden users and online registered users on the forum of the German Ubuntu community. The second plugin collects the number of guests and registered users on the French one, and the third the number of guests, hidden users and online registered users on the Russian one.
The idea is to aggregate all these users on a single graph; I want one stack for each forum, so for each plugin we will have to sum guests, hidden users and registered users. Drops in the following graph are due to timeouts of one of the plugins due to heavy load on the host:

This is made with the following configuration, in munin.conf, in the node you want to put the graph in:
Misc is the domain, Forums the name of the node. forum_sum is the name of the virtual plugin we create; users_fr, users_de and users_ru the name of the different fields for that plugin. forum_users_* are the names of the plugins fetching data from the forums.
One of the best aspects of munin is its ease of configuration. The drawback is that you will sometimes get information you wish were more descriptive: "Wireless" may be more precise than eth1, "NAS share" more informative than /dev/sdb3. Munin makes this possible by letting you override, on a per-node basis, some of the configuration data of the plugins.
Alright, let's assume on your computer eth1 corresponds to your wireless network. The graph concerning the bandwidth usage on that interface is generated by the plugin if_eth1 (all the plugins are in /usr/share/munin/plugins; it is fairly easy with the name to guess which plugin produces which graph).
We need to get the output of the config for that plugin. We have seen before how to do that with telnet, this time we will connect directly to the server with munin-node installed and use munin-run - the result is exactly the same, only this time you execute the operation locally.
The element we want to override here is the variable graph_title. This is done on the server with munin (serverX), in the munin.conf. Find the node referring to the server for which you want to do the modification, and add the line concerning that value:
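As a sketch, an override is written as plugin name, a dot, the variable name and the new value, indented under the node it applies to. The helper override_line is made up, and the two values are only examples; compare the variable names with what munin-run if_eth1 config prints on the node.

```shell
# Hypothetical helper: print one override line for a node section of munin.conf.
# Arguments: plugin name, variable returned by "config", new value.
override_line() {
    printf '    %s.%s %s\n' "$1" "$2" "$3"
}

# Lines to add under the node definition of the server in question:
override_line if_eth1 graph_title "Wireless"
override_line if_eth1 graph_info "Traffic on the wireless interface"
```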

This should already be enough :) You can also override the other variables returned by config, like the description (graph_info), which can be useful to add special information, like "the disk is getting full, but it's fine, we already ordered a new one", or to describe more precisely how monitoring the entropy of a system can be useful...
I am not a big fan of munin's default templates, so I wrote my own and usually use those (see http://monitor.thehumanjourney.net/munin/). Writing templates for munin is extremely easy, as munin uses Perl's HTML::Template package, libhtml-template-perl. The template files are located with the configuration files, in /etc/munin/templates; as these files are separate from the core of munin, they are not affected by upgrades.

Documentation about libhtml-template-perl is accessible via man HTML::Template. Basically, munin's template files are normal HTML files, with a few additional tags provided by HTML::Template that enable conditional statements and loops, like <TMPL_IF> or <TMPL_LOOP> . This is for example the main loop in /etc/munin/munin-nodeview.tmpl, displaying the list of services:
The HTML code is not the best ever... There is a lot to optimise there :) If you want to see which variables are available in each file, have a look at /usr/share/munin/munin-html, at the list of arguments given to template_name->param; here is the call for the nodeview:
So for the node view, you will have the name of the current node, the name of the current domain, a list of all the domains, a timestamp corresponding to the time the page has been generated, and a list of the categories for that node; the list of the categories also contains the list of the services - but you have either to look higher in the code or in the current template to know that.
So, let your imagination act; I have never found other templates than the default one, and I would love this to change!
Munin-node (maybe depending on the package) comes with a lot of plugins; they can be found in /usr/share/munin/plugins (on a server with munin-node installed). When you install it, it checks which ones are relevant (for example, it won't add apache monitoring if you don't have any apache server installed), and adds soft links to these in /etc/munin/plugins. So, to get the list of currently used plugins:
And to see the list of available plugins:
There is a tool provided with munin-node which does pretty much the same, and can sometimes be useful, munin-node-configure. Run without parameter, it will list all the plugins, and say if they are used or not.
As you can see, there are a lot of plugins available, and only a few that are activated. To see if a plugin is usable, run it with the "autoconf" parameter:
The plugin will tell you "yes" if it is installable, and no, sometimes with an explicit additional error message, if it is not. You can also run munin-node-configure with the --suggest parameter to see, among the plugins which are not installed, which ones you can install, or the error messages for those you can't:
If you find an interesting plugin which is not already installed and which is installable, you can install it by adding a link in /etc/munin/plugins:
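Enabling a plugin is a single ln -s; the sketch below only wraps it in a check that the plugin exists and is executable. The helper name enable_plugin is made up, the default directories are the ones used in this tutorial, and netstat is just an example plugin name.

```shell
# Hypothetical helper: enable a plugin by linking it into the active directory.
# Arguments: plugin name, then optionally the source and target directories.
enable_plugin() {
    src=${2:-/usr/share/munin/plugins}
    dst=${3:-/etc/munin/plugins}
    if [ ! -x "$src/$1" ]; then
        echo "no executable plugin named $1 in $src" >&2
        return 1
    fi
    ln -s "$src/$1" "$dst/$1"
}

# Example (as root, then restart munin-node so it picks up the new link):
# enable_plugin netstat
```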
Munin comes with many plugins, which work out of the box after the installation. Even so, you may still want to monitor values for which no plugin has been created yet. For this chapter, we will create a plugin which monitors the CPU usage for a defined set of users; I created it to help identify which applications were using most of the CPU in a fastCGI environment, on one of our Ubuntu-eu servers.
You can create plugins in the language of your choice; most of the ones munin comes with are either Perl or shell scripts, but you should be able to use whatever language you are familiar with. However, as the plugin I will describe here is written in sh, this chapter requires a basic knowledge of shell programming and awk.
The final sourcecode is available here.
First, you need to define which data you want to graph, and how to retrieve it. In our example, it would be the CPU usage for a specified user, at the time the plugin is run. I noticed the output of ps on my system, with its BSD syntax, printed out the CPU usage for every process. It is definitely not very precise, as the values are rounded up; but I believe it gives a fairly good overview of which users are using most of the CPU.
So, there are three things to do: filter out the processes that belong to a specific user, isolate the CPU column, and add all the values. To filter out processes that belong to a user, we will use the -u option of ps; to isolate the CPU column, the easiest way is to use the -o option of ps:
We can now add the values with a small awk script (here in a single line, to make it easy to try out):
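A sketch of both steps, under the assumption that your ps accepts the POSIX options -u (select a user) and -o pcpu= (print only the CPU column, without a header). The function names are made up for the example.

```shell
# Add up a column of numbers read from standard input, one per line.
sum_cpu() {
    awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# Hypothetical wrapper: total %CPU of all processes owned by one user.
cpu_for_user() {
    ps -u "$1" -o pcpu= | sum_cpu
}

# Example: cpu_for_user yann
```

Keeping the sum in its own function makes it easy to try out with hand-typed numbers before involving ps.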
Now that looks interesting... But we don't want to monitor only a single user, but several, and the variables need to be presented in the form var_name.value:
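A minimal version of that loop could look like this. The user names are examples, the function name print_values is made up, and a user without any process (or an unknown user) simply ends up with 0.0.

```shell
# Users to watch; hardcoded for now (example names).
USERS="root yann"

print_values() {
    for user in $USERS; do
        # Sum the %CPU column of every process owned by $user.
        cpu=$(ps -u "$user" -o pcpu= 2>/dev/null |
              awk '{ sum += $1 } END { printf "%.1f", sum }')
        echo "$user.value $cpu"
    done
}

print_values
```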
Save this in /usr/share/munin/plugins/cpubyuser, add a soft link to it in /etc/munin/plugins/, and try it out:
Now, we don't want the usernames to be hardcoded in the plugin, but set in one of munin's configuration files. Edit the file /etc/munin/plugin-conf.d/munin-node, and add the following lines at the end:
... and remove the line defining $USERS in our script. Reload munin-node (sudo /etc/init.d/munin-node restart).
If you run the script directly, the variable $USERS won't be defined, and the script will exit. If you run it via munin-run, munin-node will create the environment variable for the script, so it will work.
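The sketch below shows the configuration section as a comment, followed by a guard the plugin can use to stop early when the variable is missing. The section name cpubyuser matches the file name chosen above; the function name require_users and the error message are made up.

```shell
# Section for the plugin configuration file named above (shown as comments):
#   [cpubyuser]
#   env.USERS root yann

# Guard for the top of the plugin: complain and fail when USERS is missing.
require_users() {
    if [ -z "$USERS" ]; then
        echo "USERS is not set; run the plugin through munin-run" >&2
        return 1
    fi
}

# In the plugin itself:
#   require_users || exit 1
```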
As we have seen in the previous chapter, when munin-node is installed, it runs every script with the autoconf parameter to decide whether or not to activate it. Therefore, our plugin also needs to handle this.
Add this at the top of the plugin, before the main loop. If the plugin is run with the autoconf parameter, it will check for the environment variable $USERS; if it is defined, our plugin will reply with "yes", meaning that it can be used. Otherwise, it will reply "no". Your plugin is supposed to check in this section that everything is in place for it to run correctly, like checking that apache is running if your plugin is about apache.
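One way to write that check, as a sketch: the function name plugin_autoconf and the wording of the "no" reason are my own, and the exit status expected after a "no" has differed between munin versions, so check the documentation of yours.

```shell
# Hypothetical autoconf handler for the cpubyuser plugin.
plugin_autoconf() {
    if [ -n "$USERS" ]; then
        echo "yes"
    else
        echo "no (USERS is not set)"
    fi
}

# In the plugin, before the main loop:
#   if [ "$1" = "autoconf" ]; then
#       plugin_autoconf
#       exit 0
#   fi
```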
When run with the "config" parameter, your plugin is supposed to output data for RRDTool to tell it how to graph the different values given by our plugin, and metadata like the title of the graphs, the title of the axes...
The "base" parameter on the second line states whether we consider that 1M=1024K or 1M=1000K. One of the most important parts of this is the loop at the end, which tells rrdtool how it should process the data we feed it with; for every variable monitored, we need to tell rrdtool what to do with it.
For example, the variable "yann" should be labelled "yann" (you can change it to "CPU usage for $USER", or whatever you think is better). The label is what will appear in the legend, on the graph, it should be reasonably short. For a longer description, you should use the "info" section.
For the "type" of the value, you should read man rrdcreate, there is a whole part on the different types of data sources. A "gauge" is described like this: " GAUGE: is for things like temperatures or number of people in a room or the value of a RedHat share.".
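Put together, a config section along these lines would do. The title, axis label, category and info texts are example values I picked, not the ones from the final plugin; graph_args carries the "base" parameter on the second line, and the loop emits label, info and type for every user.

```shell
USERS=${USERS:-"root yann"}   # example fallback so the sketch runs on its own

plugin_config() {
    echo "graph_title CPU usage per user"
    echo "graph_args --base 1000 --lower-limit 0"
    echo "graph_vlabel %"
    echo "graph_category system"
    for user in $USERS; do
        echo "$user.label $user"
        echo "$user.info CPU used by the processes of $user"
        echo "$user.type GAUGE"
    done
}

# In the plugin:  if [ "$1" = "config" ]; then plugin_config; exit 0; fi
plugin_config
```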
Well, our plugin is nearly done. Now add these two lines at the top, between the shebang and the autoconf part:
This will tell munin that it should attempt to automatically install and configure your plugin. Now you just need to add some comments, installation instructions, license, history, author contact, etc.... and your first munin plugin is ready!
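As far as I know, the two lines in question are the "magic markers" that munin-node-configure looks for; they are ordinary comments as far as the shell is concerned. The sketch prints the top of the plugin file, with the helper name plugin_header made up for the example.

```shell
# Hypothetical helper: print the first lines of the plugin file.
plugin_header() {
    cat <<'EOF'
#!/bin/sh
#%# family=auto
#%# capabilities=autoconf
EOF
}

plugin_header
```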

Now, this was a very simple plugin; depending on what you want to do, it can be a lot harder (look at the memory plugin!). If you want to create complex graphs, you should definitely complete your knowledge by reading rrdtool's documentation. On the other hand, if you only want to create simple plugins, reading plugins in /usr/share/munin/plugins and trying to adapt them is a very good way to get started.
As seen before, munin is run by a cron job every five minutes. So, every five minutes, it connects to all the servers it has to monitor, fetches all the data, writes the data in hundreds of RRD files, and recreates all the HTML files and hundreds of PNG files; the more servers monitored, the more CPU munin will use.
Some other tests are also rather interesting:
This test (made on my laptop, one node monitored only) shows two interesting things: first, the generation of the PNGs is the heaviest part of the process (10.965 seconds of CPU usage vs 0.532 for the three other processes); second, the munin-update process takes nearly 30 seconds to complete, but barely uses the CPU - probably because it is waiting for the node to run all its plugins. That's why, when munin starts, it forks and runs a process for each node, and why you should not prevent it from forking (there is an option for that - don't use it).
If now I was monitoring 10 nodes, it would take approx. 110 seconds on my laptop (if nothing else is running), every five minutes. In other words: as you add nodes to munin, it tends to become quite heavy.
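The estimate is simple arithmetic, sketched here with integers: roughly 11 seconds of CPU per node (the figure measured above, rounded), 10 nodes, and one run every 300 seconds. It assumes the cost grows linearly with the number of nodes, which is a simplification.

```shell
nodes=10          # number of monitored nodes in the example
per_node=11       # seconds of CPU per node, rounded from the test above
interval=300      # munin runs every five minutes

total=$(( nodes * per_node ))
share=$(( total * 100 / interval ))
echo "$total seconds of CPU per run, about $share% of each interval"
```

In other words, with these numbers the machine would spend about a third of its time drawing graphs.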
One of the ways to improve performance is to change the way Munin creates the graphs; instead of recreating the graphs every five minutes, we can create them only when a user requests them by displaying one of the web pages. This is made possible with CGI.
So, how does it work? When installed, Munin creates a script in /usr/lib/cgi-bin/, munin-cgi-graph. When configured as CGI, Munin changes the links to the pictures in the HTML files, making them point to munin-cgi-graph:
Depending on the path, munin-cgi-graph will create the appropriate graph, which will then be displayed. There is also a caching system, so that if you reload the page within five minutes, the graphs won't be regenerated again; therefore, as munin will write the files to the disk, the directory /var/www/munin must be writeable by the apache process. Making the files belong to the user munin and the group www-data, and giving the group write access, is one solution:
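A sketch of that solution, to be run as root. It assumes the web server runs with the group www-data, as on Debian and Ubuntu; the helper name share_with_webserver is made up.

```shell
# Hypothetical helper: hand a directory to user munin and group www-data,
# and give the group write access to everything below it.
share_with_webserver() {
    chown -R munin:www-data "$1" &&
    chmod -R g+w "$1"
}

# Example: share_with_webserver /var/www/munin
```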
The performance gain is huge; but one of the drawbacks to this method is that it takes a lot more time to display a page containing several graphs, like the node view.
To configure Munin as CGI you need to add the following lines to your /etc/munin/munin.conf:
These lines help Munin to create correct links to the graphs. Now, assuming you are using Apache, you need to edit your main apache configuration file, to allow /usr/lib/cgi-bin to run CGI scripts:
Finally, you need to tell Apache that your website is going to use CGI. If you have a special virtual host set up for munin, then add that line there; else add it somewhere in the main apache configuration file:
Munin-cgi-graph also uses the Perl module Date::Manip, which you need to install. Your Munin is now running as CGI!
If your graphs are not updating anymore, if some of them don't display, or if the HTML pages don't get created at all, it is highly likely that your problem is related to permissions. Remember: you are never, ever supposed to run munin as root. If you do so, even only once, it will create all the rrd, html and png files with the owner root; therefore munin, as run by the cronjob as user munin, won't be able to write to them anymore. So, check the rights on the files in /var/www/munin and in /var/lib/munin - these directories and all the files they contain should be owned and be writeable by the user munin. /var/www/munin should even be writeable by the HTTP process. The user munin should also have read access on the files in /etc/munin/ (and subdirectories), and read and execute rights on the files in /usr/share/munin.
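A quick way to spot files with the wrong owner is find with a negated -user test. The helper name not_owned_by is made up; no output means every file belongs to the expected user.

```shell
# Hypothetical helper: list everything under a directory that is NOT owned
# by the given user (name or numeric id).
not_owned_by() {
    find "$1" ! -user "$2"
}

# Example (on the monitoring server):
# not_owned_by /var/www/munin munin
# not_owned_by /var/lib/munin munin
```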
If you believe the rights are correct, you should read the log files, which are in /var/log/munin. If a node appears in the list but contains no data, maybe your munin server is failing to connect to the node; you should read munin-update.log. Munin-graph.log and munin-html.log may also help you track down wrong permissions.
If you still don't find the error or need more information, try connecting from the server to the faulty node with telnet as described previously, and fetch the data manually. You may get a detailed error message from the plugin.
Hint: You can also delete all the files in /var/www/munin at any time: they will be rebuilt the next time munin is started by cron. You won't lose any data, as the data is stored in rrd files in another directory (/var/lib/munin).
There are several ways to learn more about Munin. The most complete documentation about how to use munin I found was the one distributed with the package, available in /usr/share/doc/munin/; but it is more a list of options than a tutorial, so it assumes you already know how munin and rrd work.
The official munin wiki is also a good source of information, although I find it a bit confusing sometimes; if you don't find what you're looking for, try a search in the archives of the mailing list, or ask on the mailing list if you really can't find it: there will always be a nice person there to help you out.
This document has been made possible thanks to: