June 05, 2019
Updated version of an article first published on February 16th, 2016.
If you've been around the Ruby/Rails ecosystem for a bit, you've likely heard the term 'background job' or 'offline processing'. But what does that actually mean? How do you know which tasks are suitable to be processed 'in the background'? Once you define those tasks, how do you pick the right background job framework for your application?
In this post I'll cover all of the above, as well as compare and contrast a few of the leading Ruby background job frameworks.
Let's start with some terminology. A background or asynchronous job (or task) is one that is processed outside of the usual request/response workflow that is part of any modern web framework. Normally, web applications receive a request from the outside world, do some processing (such as querying a database) and immediately return a response within a few milliseconds. This is the normal pattern that we have all become accustomed to when developing applications for the web and is known as synchronous communication.
Asynchronous tasks, on the other hand, are those that may be started from a normal web request but require a longer time to complete than the normal request. Because these tasks cannot be processed immediately and return a response, they are known as asynchronous. In order to not interrupt the normal synchronous workflow of an application, asynchronous tasks are normally processed on a separate thread or are spawned as a separate process entirely.
It may be easier to explain what a background (asynchronous) job is by describing a common use case that many applications have to fulfill. Let's imagine you have written a Customer Relationship Management (CRM) application in Rails. The actual details of this application are not important, but imagine that a major feature of this application is that it must send email to customers when an order was fulfilled or when a support person has made an update to a support request.
You could implement this feature 'inline', meaning that as soon as an action triggers an email, that action cannot complete (a response cannot be returned to the user) until that email has been sent. This could work; however, let's go over some possible ways that this approach may break down:
All of these potential failures are something that a production application must be designed to handle. They also all mean that your application would 'block' instead of returning an immediate response to the user. This is not ideal, as users do not like waiting minutes or even seconds for an unresponsive application. A long request can also cause capacity issues for application servers, delaying response times for requests from other users, which can lead to cascading failures.
Background jobs are a common way to alleviate this problem. Let's re-imagine this same workflow using a background job. Once an email is triggered, your application schedules or queues an EmailSendJob that contains all of the relevant information required, such as the recipient, body, subject, etc. Your application could queue several of these jobs back to back before they are actually processed. This is allowed because the processing of these jobs occurs on a 'background thread' and does not affect the normal synchronous workflow of your application.
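To make the queueing idea concrete, here is a minimal in-process sketch using only Ruby's core Queue and Thread classes. It is not how any particular framework works internally; the fields on EmailSendJob, the deliver_in_background helper, and the "outbox" that stands in for real email delivery are all assumptions made for illustration.

```ruby
# Hypothetical job object: it only carries the data needed to send one email.
EmailSendJob = Struct.new(:recipient, :subject, :body) do
  def perform(outbox)
    # Stand-in for real delivery (for example, an SMTP call).
    outbox << "To: #{recipient} | #{subject}"
  end
end

# Queues every job, lets one background thread work through them,
# and returns the list of "delivered" messages.
def deliver_in_background(jobs)
  job_queue = Queue.new # thread-safe FIFO from Ruby's core library
  outbox = Queue.new

  worker = Thread.new do
    # Queue#pop returns nil once the queue is closed and empty.
    while (job = job_queue.pop)
      job.perform(outbox)
    end
  end

  jobs.each { |job| job_queue << job } # enqueueing returns immediately
  job_queue.close                      # no more jobs will be added
  worker.join                          # wait here only so the demo can print

  delivered = []
  delivered << outbox.pop until outbox.empty?
  delivered
end

jobs = [
  EmailSendJob.new("ann@example.com", "Order fulfilled", "Your order has shipped."),
  EmailSendJob.new("bob@example.com", "Support request updated", "We replied to your ticket.")
]
puts deliver_in_background(jobs)
```

In a real web application the request handler would stop after the enqueue step and return its response; the worker.join call exists here only so the script can show what the background thread did.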
Another benefit of using background jobs is that they can be made to be retryable. In the above failure scenarios, email could not be delivered because of a 'blockage' somewhere in the pipeline. At some point, your email server will come back up, or the customer will clear out their inbox and emails should be able to be sent successfully. Asynchronous jobs that are also retryable allow your application to recover gracefully from these failures and retry the send at a later time.
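A retryable job can be sketched in a few lines of plain Ruby. The example below is only an illustration under stated assumptions: TemporaryDeliveryError is a hypothetical error class standing in for "mail server down" or "inbox full", and with_retries is a made-up helper. Real job frameworks typically reschedule the job for a later time instead of sleeping inside the worker as this sketch does.

```ruby
# Hypothetical error class for failures that are worth retrying.
class TemporaryDeliveryError < StandardError; end

# Runs the block, retrying on TemporaryDeliveryError up to max_attempts times.
# base_delay is in seconds; the wait doubles after each failed attempt.
def with_retries(max_attempts: 5, base_delay: 0.01)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue TemporaryDeliveryError
    raise if attempt >= max_attempts # give up and surface the failure
    sleep(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Simulated send that fails twice before the "mail server" recovers.
result = with_retries do |attempt|
  raise TemporaryDeliveryError, "mail server unavailable" if attempt < 3
  "sent on attempt #{attempt}"
end
puts result
```

Doubling the delay between attempts is one common choice; it gives a struggling mail server progressively more room to recover.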
Hopefully by now you see how you can use a background job framework in your application. But which one should you choose? Just like there is no single programming language that can solve everyone's problem, there is no single 'right' job framework. It all comes down to choosing the best one to fit your use case.
That being said, there are several that I believe are general purpose and stable enough to be used in a production application.
Delayed::Job is a Ruby background job framework that was extracted by the folks at Shopify, a popular ecommerce site. Delayed::Job works by maintaining a 'job' table in the database to keep track of a task and its position in the job lifecycle (scheduled, running, complete, failed, etc). Delayed::Job integrates easily with Rails and ActiveRecord if you are using a relational database, as well as Mongoid for interacting with a non-relational MongoDB store.
Delayed::Job is very stable and has been around for years. The fact that it helps Shopify run its core product by handling such tasks as image resizing, sending newsletters, and updating search indexes makes it even more of a contender in my opinion. In case of a system failure, Delayed::Job jobs should be able to be restarted as long as the job was successfully persisted to the database. This can be a huge benefit when working within a distributed system.
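The job table idea can be modeled with a toy, in-memory stand-in. To be clear, this is not Delayed::Job's schema or API: the JobTable class, its column names, and its status values are assumptions chosen to mirror the lifecycle described above, with each hash playing the role of a database row.

```ruby
# Toy model of a database-backed job table (illustrative names throughout).
class JobTable
  def initialize
    @rows = []
    @next_id = 1
  end

  # "INSERT" a new row in the :scheduled state and return its id.
  def enqueue(handler, priority: 0)
    row = { id: @next_id, handler: handler, priority: priority,
            status: :scheduled, attempts: 0 }
    @next_id += 1
    @rows << row
    row[:id]
  end

  # Pick the scheduled row with the lowest priority number (oldest id wins
  # ties) and mark it :running, as a worker would before executing it.
  def reserve
    row = @rows.select { |r| r[:status] == :scheduled }
               .min_by { |r| [r[:priority], r[:id]] }
    return nil unless row

    row[:status] = :running
    row[:attempts] += 1
    row
  end

  # Record the outcome of a job that a worker has finished running.
  def finish(id, success:)
    find(id)[:status] = success ? :complete : :failed
  end

  # After a crash, rows still marked :running were interrupted; reschedule them.
  def recover_interrupted
    @rows.each { |r| r[:status] = :scheduled if r[:status] == :running }
  end

  def status_of(id)
    find(id)[:status]
  end

  private

  def find(id)
    @rows.find { |r| r[:id] == id } || raise(KeyError, "no job with id #{id}")
  end
end

table = JobTable.new
newsletter = table.enqueue("SendNewsletter", priority: 5)
resize = table.enqueue("ResizeImage", priority: 1)

job = table.reserve          # picks ResizeImage, the lower priority number
table.recover_interrupted    # simulate a crash and restart mid-job
job = table.reserve          # the interrupted job is picked up again
table.finish(job[:id], success: true)
puts table.status_of(resize)
puts table.status_of(newsletter)
```

Because the rows outlive the worker, a job that was interrupted mid-run can be found and rescheduled after a restart, which is the durability property that a persisted job table provides.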
One of the downsides of using a database-backed job framework is that it adds or enforces the dependency on the database itself. If your 'jobs' table is in the same database as the one used by your application, this could be a point of contention and place your database under unnecessarily high load if you have a huge backlog of queued jobs. The more your application depends on your database, the more easily the database becomes its bottleneck.
Sidekiq is perhaps one of the most well known of the Ruby background job frameworks, mainly because of its reliability and performance. Sidekiq is backed by Redis, the extremely popular in-memory data store that powers many of the web applications that we use every day. Redis, just like a database, runs in a separate process or, more commonly, on a separate server from your application. The main benefit that Redis has over a traditional database is its speed. Because Redis is (almost) entirely in memory, data creation and retrieval are extremely fast.
Sidekiq leverages the speed of Redis by using it as its job management store. Per Sidekiq's documentation, it can process up to 100,000 jobs in 22 seconds, compared to the 465 seconds that Delayed::Job requires. Another benefit of Sidekiq is that it comes with a built-in dashboard allowing you to view all of your job queues and their processing state. This can be extremely helpful when debugging a failed or stuck job, or just when you want to have better insight into the work that your application is actually doing. While Sidekiq is open source, it also has two paid versions, Sidekiq Pro and Sidekiq Enterprise, that come with extra features such as rate limiting, periodic scheduled jobs, and unique jobs, as well as priority email and chat support.
The benefits of leveraging Redis do not come without their downsides. Because Redis is an in-memory store, it can lead to data loss if your Redis instance crashes while enqueuing or dequeuing a job. Redis does try to mitigate this by either persisting data from time to time to disk, called snapshotting, or by writing to an append only file as data is modified in memory. Also, if your application does not already leverage Redis, using Sidekiq as your job framework does require adding yet another dependency to your infrastructure.
SuckerPunch takes an entirely different approach when it comes to job management by operating within the same process as your application. SuckerPunch achieves this by building on top of concurrent-ruby, a Ruby framework that aids in writing thread safe Ruby code. Since SuckerPunch operates in process, it also stores job state entirely in memory. This means that there are no additional dependencies when adding SuckerPunch to your application.
Because SuckerPunch runs 'within' your existing application, it can be ideal when running in an environment where additional processing comes at a high cost, such as Heroku. SuckerPunch does not require a separate Rake task or Ruby process like Delayed::Job and Sidekiq. Also, since all state persistence is in memory, SuckerPunch can operate extremely fast on small tasks.
It may go without saying, but SuckerPunch is the least durable and resilient when it comes to dealing with system failure. SuckerPunch's entirely in-memory persistence model means that if your application is stopped or restarted, any jobs in progress or those in the pending job queue are lost. SuckerPunch's documentation states: "... Sucker Punch is generally recommended for jobs that are fast and non-mission critical (ie. logs, emails, etc.)".
Choosing a background job framework for your Ruby application is no small task. There is no right or wrong job framework, but there are those that may fit your use case better than others. Some things to consider when deciding on which framework to choose include:
Here's a simple table comparing each of the frameworks we've covered in a few important areas:
| |Delayed::Job|Sidekiq|SuckerPunch|
|---|---|---|---|
|Persistence|Database (ActiveRecord / Mongoid)|Redis|In Memory|
|Priority Queues|Yes|Yes|Yes (specify # of workers)|
|Support|Open Source|Open Source / Paid|Open Source|
|Ruby/Rails Version|Ruby 1.9 / Rails 3.0+|Ruby 2.0+ / Rails 3.2+|Ruby 2.0+|
As you can see, the different job frameworks mentioned each have their own benefits and drawbacks as well as their own feature sets that your application may require. As with almost everything, there are tradeoffs that must be taken into account when choosing a background job framework. Hopefully this post has helped you get a better grasp on what each of the leading Ruby background job frameworks has to offer, and how each may suit your needs.
Mark Phelps is a Senior Software Engineer and Team Lead at Validic in Durham, NC.
He loves writing clean code and building great software, mostly in Ruby and Java. He also writes about software and startups on his blog.
He graduated in 2008 with a B.S. in Computer Science from Old Dominion University.
When he's not busy writing software, he can usually be found at home in Durham, NC drinking coffee or a good beer with his wife and two dogs (the dogs mostly drink water though).