Monitoring PostgreSQL with Sensu

von | Aug 18, 2016 | PostgreSQL

Sensu is a relative new Player in the Monitoring World. Upon now i’ve monitored my PostgreSQL Infrastructurs with Nagios or Icinga 2, now it’s time to see what Sensu offers. Here i will share some experience from a test i run today. 

To check PostgreSQL using Sensu you will need this Plugin from the Sensu Plugins Project on Github: https://github.com/sensu-plugins/sensu-plugins-postgres In this post i will not cover the Installation of the Plugin or the Basic Configuration of Sensu. I expect you to have a already running Sensu Instance.

After the installation you will find three Plugins whose names start with check-. These are the plugins covered in this post. The other ones are Metric Collection Plugins that ship Performance-Data to Graphite for example. Now let’s take a close look at check-postgres-alive.

This plugin is a pretty simple ‚can-i-connect-to-my-database‘ Plugin. You can specify the Hostname (-h) and a Port (-P). The Hostname defaults to localhost as the Port does to 5432. Next you need to specify a Username (-u) and a Password (-p) to connect to the Database specified with -d Parameter. The Username defaults to the user running your Sensu Core, on my Ubuntu-System this is ’sensu‘. Regarding the password i think it’s more convenient to specify it in a .pgpass-File in the Sensu Users Home Directory. After the command you configure the subscribers, each Host that subscribes to the dev-channel will automatically pick up the check definition and execute the check on the local machine, reporting back the result to the Sensu Core. Interval, is obviously the check-interval to run the check. That’s a example of such a test.


"pg_alive": {
  "command": "/usr/local/bin/check-postgres-alive.rb -h localhost -P 5432 -u sensu -p sensu -d template1",
  "subscribers": [ "dev" ],
  "interval": 60
},

The output of the plugin is something similar to this if the check succeed


CheckPostgres OK: Server version: {"version"=>"PostgreSQL 9.3.13 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4, 64-bit"}

When the check fails:


CheckPostgres CRITICAL: Error message: could not connect to server: Connection refused

The next Plugin we will cover is the Connection-Plugin, that checks the number of connected Backends either against an absolute number you specify with -w for warning and -c for critical or when specifying -a as a percentage of the available connections of Postgres‘ max_connections setting. I recommend to use a percentage value as it is more flexibel as absolut numbers. Be aware, that the plugin does not take superuser_reserved_connections into account, so expect a bit lower numbers. The next example warns if more than 50% of the available connections are in use, and issues a critical state when there are more than 75% in use. As you can see, i omitted hostname, port, username and password in this case because i use a .pgpass and the defaults of the parameters work fine for me in that case. One hint about the database defined with -d that applies to this as well as the alive plugin, it doesn’t matter which database you configure for both plugins as long as the database exists and allows connections to it. The reason for this is, that the number of connections in PostgreSQL is a Cluster-wide counter. I need to mention, that theres seems to be a bug in the latest release of that plugin, -w and -c together does never issue a critical state on my System, instead it returns warning state every time.


"pg_connections": {
  "command": "/usr/local/bin/check-postgres-connections.rb -d template1 -w 50 -c 70 -a",
  "subscribers": [ "dev" ],
  "interval": 60
},

The Plugin Output looks like this:


CheckPostgresConnections OK: PostgreSQL connections under threshold: 10%, 1 out of 10 connections
CheckPostgresConnections WARNING: PostgreSQL connections at 50%, 5 out of 10 connections
CheckPostgresConnections CRITICAL: PostgreSQL connections at 70%, 7 out of 10 connections

The last plugin is somehow like the Swiss Army knife to check your database. The ‚check-postgres-query‘-Plugin allows you to execute any query against your database. You only need to make sure, that your query return a numeric value. An example of such a query is to examine the Ratio of Commits vs. Rollbacks. In contrast to the previously mentioned plugins, now you have to make sure you specified the correct database with -d because in that case we query a database-specific counter. Alternativ you can check for the number of rows returned by a query and send out an alert if there are more or less rows in a table than expected. If you like to use complexer queries as i do in these example, you should think about using views for that. I guess this way your sensu configuration stays more readable.


"pg_commitratio": {
"command": "/usr/local/bin/check-postgres-query.rb -d template1 -q \"select round(100*xact_commit/(xact_commit+xact_rollback)::numeric,2 ) from pg_stat_database where datname = current_database();\" -w 'value < 97' -c 'value < 90'",
"subscribers": [ "dev" ],
"interval": 60
},

Another example is to query the Hitratio, blocks in shared buffers vs blocks that has to be read from disk.


"pg_hitratio": {
"command": "/usr/local/bin/check-postgres-query.rb -d template1 -q \"select round(100*blks_hit/(blks_read+blks_hit)::numeric,2 ) from pg_stat_database where datname = current_database();\" -w 'value < 97' -c 'value < 90'",
"subscribers": [ "dev" ],
"interval": 60
}

Sadly the output does not contain the value returned by the plugin in case the state is ok


CheckPostgresQuery OK: Query OK

but when the check fails, the result is included


CheckPostgresQuery WARNING: Results: [["97.83"]]

Finally i want to mention, that it is also possible to alert on metrics data in a timeseries database like Graphite by using a metric analysis checks, take a look at the Sensu documentation for details on this topic.