5 Pillars of Rails Cluster Monitoring

My, how you’ve grown! A couple of years ago your little Rails app was on a single server. Now you’re on a whole cluster – you’ve got web servers, database servers, HAProxy servers, and more. I’m so proud of you!

Monitoring your Rails cluster has gotten more difficult though, huh? When it comes to monitoring a cluster of servers, there are lots of options with overlapping features. Some products are open source, some aren’t. Some are hosted, some aren’t. At Scout, we’re very happy with our monitoring stack. We know a bit about monitoring, so what are we using under the hood to monitor our Rails/Sinatra cluster?

Philosophy

You’re beyond emergency monitoring now

If you have a server cluster, you have a decent chunk of users depending on you. That means you’re balancing time between adding new functionality and scaling the legacy bits. For you, monitoring has evolved beyond crowdsourced monitoring (dear webmaster: your website is down). With a growing application on your hands, you need peace of mind. That comes when you start acting on performance hiccups, not meltdowns.

There isn’t one application for this (and there shouldn’t be)

There may be one application out there that does all of this, but if there is, it’s probably bloated and confusing as hell. Additionally, I guarantee it requires you to sit through a sales webinar, doesn’t have a published price, and has plenty of whitepapers. That’s not our style.

Process Monitoring

Monit, like its guard dog logo, is dumb yet dependable: exactly what you need to ensure your key processes are running (and to restart them in seconds if they aren’t).

At Scout, we use this to ensure our background job processes are running.

An alternative to Monit is god, a Ruby-based process monitoring framework. Many of our customers use god. Either of these would work – we got started with Monit before god was released.

How it works

The Monit daemon runs locally on our servers. It executes a designated set of checks at a specified interval (the default is 2 minutes, but we run the checks every minute at Scout). When a process isn’t running, Monit will attempt to restart it (emailing us the status as it does).
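To make the check-and-restart idea concrete, here is a minimal Ruby sketch of the kind of check Monit performs. This is not Monit's code or configuration syntax: the method names, the pid file approach, and the `restart`/`notify` callbacks are all assumptions made for illustration.

```ruby
# Hypothetical sketch of a Monit-style process check (not Monit itself).

# Returns true if a process with this pid exists.
def process_running?(pid)
  Process.kill(0, pid) # signal 0 sends nothing; it only tests for existence
  true
rescue Errno::ESRCH
  false                # no such process
rescue Errno::EPERM
  true                 # process exists but belongs to another user
end

# Reads the pid from pid_file. If the process is gone, calls the notify
# and restart callbacks (both are assumed to be supplied by the caller).
def check_and_restart(pid_file, restart:, notify:)
  pid = File.exist?(pid_file) ? File.read(pid_file).to_i : 0
  return :running if pid > 0 && process_running?(pid)

  notify.call("process for #{pid_file} is not running; restarting")
  restart.call
  :restarted
end
```

A daemon would call something like `check_and_restart` in a loop, sleeping for the check interval (one minute in our case) between passes.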

Installation & Configuration

Monit is the only monitoring tool we use that requires extensive configuration outside of a web UI. It can be difficult to remember the configuration syntax since we rarely update it – but we rarely update it.

We walked through the setup of an example check a couple of years ago. It’s a good place to start after installing Monit.

Cluster Monitoring

This is our nerve center. There are several pieces to Scout and each piece is monitored. We monitor the necessary basics (has disk usage dramatically increased?) and specific applications (how many queries/sec is MySQL handling?).

Much of this is about visualizing correlations – changing one part of your cluster often has a domino effect. It’s why charts are such a key part of Scout. As you might expect, we use Scout for this.

How it works

The lightweight Scout agent runs every minute via Cron. Metrics are sent to the Scout service.
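For a rough feel of what a cron-driven agent does on each run, here is a small Ruby sketch that turns `df -P` output into a disk usage metric and wraps it in a JSON report. It is not the Scout agent or its wire format: `parse_df`, `build_report`, and the report keys are names invented for this example.

```ruby
require "json"
require "time"

# Hypothetical sketch, not the Scout agent.
# Parses `df -P` (POSIX format) output into { mount_point => percent_used }.
def parse_df(output)
  output.lines.drop(1).each_with_object({}) do |line, usage| # drop the header row
    fields = line.split
    next if fields.size < 6

    mount_point = fields[5..-1].join(" ") # mount points may contain spaces
    usage[mount_point] = fields[4].to_i   # "42%".to_i is 42
  end
end

# Builds the JSON document a collector might send to a metrics service.
# The key names here are assumptions, not a real API.
def build_report(hostname, metrics, now = Time.now.utc)
  JSON.generate(
    "host" => hostname,
    "collected_at" => now.iso8601,
    "metrics" => metrics
  )
end
```

A script run from cron every minute could call ``parse_df(`df -P`)``, pass the result to `build_report`, and send the JSON to whatever service collects it.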

Installation & Configuration

Scout is just a Ruby gem so installation is straightforward. All of the configuration is done on the web UI. Additionally, you can easily update settings across your entire cluster at once (or just part of it).

Application Monitoring

We’re often asked to compare Scout to New Relic. There are overlaps, but the two focus on different things: Scout takes a server-centric view, while New Relic takes an app-centric view. With most of our development time focused on the Rails and Sinatra apps that power Scout, it makes sense to monitor those with a more specialized tool. We’re not alone: a majority of our customers use both Scout and New Relic.

How it works

New Relic runs within your Rails application, sending data to the New Relic service.

Installation & Configuration

Like Scout, it’s just a Ruby gem so installation is straightforward. A lightweight configuration file is used to set up the agent. The config file has solid documentation and examples.

Exceptions

We want to find out immediately when a user encounters a bug. Hoptoad and Exceptional are two quality hosted services for exception notification and aggregation.

How it works

Hoptoad and Exceptional run inside your Rails app. When an exception occurs, the service is contacted with the exception details.
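The general pattern behind in-app exception reporting can be sketched as a Rack-style middleware. This is not the Hoptoad or Exceptional gem API: the `ExceptionReporter` class and its `notifier` callable are hypothetical stand-ins for whatever actually contacts the hosted service.

```ruby
# Hypothetical sketch of an exception-reporting middleware. The real gems
# have their own APIs; this only illustrates the catch, report, re-raise flow.
class ExceptionReporter
  # app is any Rack-style object responding to call(env).
  # notifier is any callable that receives a hash of exception details.
  def initialize(app, notifier)
    @app = app
    @notifier = notifier
  end

  def call(env)
    @app.call(env)
  rescue StandardError => e
    @notifier.call(
      "class" => e.class.name,
      "message" => e.message,
      "backtrace" => (e.backtrace || []).first(10), # trim long traces
      "path" => env["PATH_INFO"]
    )
    raise # re-raise so the app's normal error handling still runs
  end
end
```

Re-raising after reporting matters: the reporter should observe the failure without changing what the user or the rest of the stack sees.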

Installation & Configuration

Like Scout and New Relic, these are also just Ruby gems with an easy installation. Like New Relic, a lightweight configuration file is used.

Website Availability

We use Pingdom to check if our website is responding. Even your mom’s knitting blog needs availability monitoring.

How it works

Pingdom’s servers (they have many locations) check the availability of a URL at an interval you specify (the minimum interval is one minute, which is what we use on our metric collection service).
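A single availability probe is simple enough to sketch with Ruby's standard library. This is only an illustration of the idea, not how Pingdom works internally: `fetch_status`, `available?`, the 5-second timeout, and the choice to treat any 2xx or 3xx status as "up" are all assumptions.

```ruby
require "net/http"
require "uri"

# Hypothetical sketch of one availability probe (not Pingdom's implementation).
# Sends a HEAD request and returns the numeric HTTP status code.
def fetch_status(uri, timeout)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: timeout,
                  read_timeout: timeout) do |http|
    http.request_head(uri.request_uri).code.to_i
  end
end

# Treats 2xx and 3xx responses as "up". Timeouts, DNS failures, and refused
# connections all count as "down". The fetcher can be swapped out for testing.
def available?(url, timeout: 5, fetcher: method(:fetch_status))
  fetcher.call(URI.parse(url), timeout).between?(200, 399)
rescue StandardError
  false
end
```

A probe running on your own servers shares their fate when the network or data center goes down, which is why an outside service checking from several locations is the better fit here.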

Installation & Configuration

No installation – all of the configuration is completed through the Pingdom website.

Summary

There are 5 pillars to monitoring a Rails cluster and 5 nice apps for doing it. When you’re big enough to have a cluster of servers, you need monitoring for both emergencies and capacity planning.
