October 19, 2010
My, how you’ve grown! A couple of years ago your little Rails app was on a single server. Now you’re on a whole cluster – you’ve got web servers, database servers, HAProxy servers, and more. I’m so proud of you!
Monitoring your Rails cluster has gotten more difficult though, huh? When it comes to monitoring a cluster of servers, there are lots of options with overlapping features. Some products are open source, some aren’t. Some are hosted, some aren’t. At Scout, we’re very happy with our monitoring stack. We know a bit about monitoring, so what are we using under the hood to monitor our Rails/Sinatra cluster?
If you have a server cluster, you have a decent chunk of users depending on you. That means you’re balancing time between adding new functionality and scaling the legacy bits. For you, monitoring has evolved beyond crowdsourced monitoring (dear webmaster: your website is down). With a growing application on your hands, you need peace of mind. That comes when you start acting on performance hiccups, not meltdowns.
There may be one application out there that does all of this, but if there is, it’s probably bloated and confusing as hell. Additionally, I guarantee it requires you to sit through a sales webinar, doesn’t have a published price, and has plenty of whitepapers. That’s not our style.
Monit, like its guard dog logo, is dumb yet dependable: exactly what you need to ensure your key processes are running (and restart in seconds if they aren’t).
At Scout, we use this to ensure our background job processes are running.
An alternative to Monit is god, a Ruby-based process monitoring framework. Many of our customers use god. Either of these would work – we got started with Monit before god was released.
The Monit daemon runs locally on our servers. It executes a designated set of checks at a specified interval (the default is 2 minutes, but we run the checks every minute at Scout). When a process isn’t running, Monit will attempt to restart it (emailing us the status as it does).
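To make that check cycle concrete, here is a rough Ruby sketch of what a single pass amounts to. It is not how Monit is implemented or configured (Monit uses its own control file). The names `process_running?` and `check_process`, and the `restart` and `notify` hooks, are made up for illustration.

```ruby
# Hypothetical sketch of one check cycle. Monit itself is configured
# through its own control file, not Ruby; all names here are illustrative.

# Returns true if a process with the given pid exists.
def process_running?(pid)
  Process.kill(0, pid)   # signal 0 only tests for existence, nothing is sent
  true
rescue Errno::ESRCH
  false                  # no such process
rescue Errno::EPERM
  true                   # process exists but belongs to another user
end

# Runs one check. If the process is down, calls the restart hook and then
# the notify hook. Returns :ok or :restarted.
def check_process(name, pid, restart:, notify:)
  return :ok if pid && process_running?(pid)

  restart.call(name)
  notify.call("#{name} was not running; restart attempted")
  :restarted
end
```

A scheduler would call `check_process` once per interval for each watched process, reading the pid from wherever that process records it.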
Monit is the only monitoring tool we use that requires extensive configuration outside of a web UI. It can be difficult to remember the configuration syntax since we rarely update it – but then again, we rarely need to update it.
This is our nerve center. There are several pieces to Scout and each piece is monitored. We monitor the necessary basics (has disk usage dramatically increased?) and specific applications (how many queries/sec is MySQL handling?).
Much of this is about visualizing correlations – changing one part of your cluster often has a domino effect. It’s why charts are such a key part of Scout. As you might expect, we use Scout for this.
The lightweight Scout agent runs every minute via Cron. Metrics are sent to the Scout service.
Scout is just a Ruby gem so installation is straightforward. All of the configuration is done on the web UI. Additionally, you can easily update settings across your entire cluster at once (or just part of it).
We’re often asked to compare Scout to New Relic. There are overlaps, but each focuses on something different: Scout takes a server-centric view, while New Relic takes an app-centric view. With most of our development time focused on the Rails and Sinatra apps that power Scout, it makes sense to monitor those with a more specialized tool. We’re not alone: a majority of our customers use both Scout and New Relic.
New Relic runs within your Rails application, sending data to the New Relic service.
Like Scout, it’s just a Ruby gem so installation is straightforward. A lightweight configuration file is used to set up the agent. The config file has solid documentation and examples.
HopToad and Exceptional run inside your Rails app. When an exception occurs, the service is contacted with the exception details.
Like Scout and New Relic, these are also just Ruby gems with an easy installation. Like New Relic, a lightweight configuration file is used.
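The general pattern behind in-app exception reporting can be sketched as a Rack-style middleware. This is a minimal illustration, not the code of either gem: `ExceptionReporter` and its `reporter` callable are hypothetical names, and a real service would send the details over HTTP instead of handing them to a lambda.

```ruby
# Hypothetical Rack-style middleware. ExceptionReporter and the reporter
# callable are illustrative names, not part of any real service's gem.
class ExceptionReporter
  def initialize(app, reporter)
    @app = app
    @reporter = reporter   # any object responding to call(details_hash)
  end

  def call(env)
    @app.call(env)
  rescue StandardError => e
    details = {
      error_class: e.class.name,
      message: e.message,
      backtrace: (e.backtrace || []).first(5),
      path: env["PATH_INFO"]
    }
    @reporter.call(details)
    raise   # re-raise so the app's normal error handling still runs
  end
end
```

Re-raising after reporting matters: the reporter observes the failure without changing what the user or the framework sees.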
We use Pingdom to check if our website is responding. Even your mom’s knitting blog needs availability monitoring.
Pingdom’s servers (they have many locations) check the availability of a URL at an interval you specify (the minimum interval is one minute, which is what we use on our metric collection service).
No installation – all of the configuration is completed through the Pingdom website.
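For a sense of what a single availability check boils down to, here is a small Ruby sketch using the standard library’s `Net::HTTP`. It is an assumption-laden illustration, not Pingdom’s implementation: `site_up?`, the 10-second timeout, and the rule that any 2xx or 3xx status counts as "up" are all choices made for this example. The `fetch` parameter is there only so the logic can be exercised without touching the network.

```ruby
require "net/http"
require "uri"

# Hypothetical availability probe. Returns true when the URL answers with a
# 2xx or 3xx status within the timeout, false otherwise.
def site_up?(url, timeout: 10, fetch: nil)
  # Default fetcher: a plain GET that returns the numeric status code.
  fetch ||= lambda do |uri|
    Net::HTTP.start(uri.host, uri.port,
                    use_ssl: uri.scheme == "https",
                    open_timeout: timeout,
                    read_timeout: timeout) do |http|
      http.get(uri.request_uri).code.to_i
    end
  end

  status = fetch.call(URI.parse(url))
  status >= 200 && status < 400
rescue StandardError
  false   # timeouts, DNS failures, and refused connections all count as down
end
```

Run from a single machine, a check like this only tells you the site is reachable from that machine, which is why probing from several outside locations is worth having.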
There are 5 pillars to monitoring a Rails cluster and 5 nice apps for doing it. When you’re big enough to have a cluster of servers, you need monitoring for both emergencies and capacity planning.