Like a pair of jumper cables, ActiveRecord’s joins
, includes
, preload
, and eager_load
methods are incredibly useful, but also very dangerous when used incorrectly. Knowing when and where to use each approach – and even when to combine them – can save you considerable trouble as your rails app grows.
I’ll explore the when and where of each method below.
joins
What’s the ideal use case for joins
?
If you are just filtering results – not accessing records from a relationship – joins
is your goto. The example below fetches all blog posts with a comment authored by Derek. I’m not accessing any of the associated comments, so joins
is a great fit:
Post.joins(:comments).where(:comments => {author: 'Derek'}).map { |post| post.title } Post Load (1.2ms) SELECT "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."author" = $1 => ["One weird trick to better Rails apps", "1,234 weird tricks to faster Rails apps", "You wouldn't believe what happened to this Rails developer after 14 days"]
Do joins prevent N+1 queries?
By itself, no. joins
does not load data from the relationship into memory: accessing columns from the relationship will trigger N+1 queries.
For example, notice all of the additional queries when accessing the Comment
relationship:
Post.joins(:comments).where(:comments => {author: 'Derek'}).map { |post| post.comments.size } Post Load (1.2ms) SELECT "posts".* FROM "posts" INNER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."author" = $1 (1.0ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (3.0ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (0.3ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (1.0ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (2.1ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 (1.4ms) SELECT COUNT(*) FROM "comments" WHERE "comments"."post_id" = $1 => [3,5,2,4,2,1]
Can joins be combined with includes
, preload
, and eager_load
?
Yes. The join type specified by joins
(default is INNER JOIN
) will override any join applied by includes
or eager_load
. Note that preload
doesn’t apply a join.
includes
Can includes prevent N+1 queries?
Yes. includes
will load (1) all records of the parent and (2) all of the records referenced as arguments in the includes
method.
Notice how using includes
in the example below only triggers 1 additional query. Without includes
, there would be an additional query to count the number of comments for every post:
Post.includes(:comments).map { |post| post.comments.size } Post Load (1.2ms) SELECT "posts".* FROM "posts" Comment Load (2.0ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (1, 3, 4, 5, 6) => [3,5,2,4,2,1]
Does includes
always generate a separate query to fetch the records in the relationship?
No. includes
will either use a separate query (like above) or a LEFT OUTER JOIN
. If you have a where
or order
clause that references a relationship, a LEFT OUTER JOIN
is used versus a separate query.
Is a single query or two queries faster?
Digging through the ActiveRecord source, I don’t believe ActiveRecord makes the decision to use two queries or a single join query based on performance. If you are seeing slow performance with an includes
query, I’d suggest using a tool like Scout DevTrace locally and examining which approach ActiveRecord is using when running includes
.
If two queries are being used, you can experiment with a single LEFT OUTER JOIN
query by adding references
to the ActiveRecord relation:
Post.includes(:comments).references(:comments).map { |post| post.comments.size }
What happens when I apply conditions to a relationship referenced via includes?
ActiveRecord will return all of the parent records and just the relationship records that match the condition.
For example, the following will return all Post
records with a Comment
by Derek, and just those comments authored by Derek:
Post.includes(:comments).references(:comments).where(comments => {author: 'Derek'}).map { |post| post.comments.size }
Does includes prevent all N+1 queries?
No. If you are accessing data in a nested relationship, that data isn’t preloaded. For example, an additional query would be required to load the Comment#likes
association for each comment:
<% post.comments.each do |comment| %> <%= comment.likes.map { |like| like.user_avatar_url } <% end %>
Can I prevent N+1s in nested relationships?
Yes. You can load nested relationships via includes
:
Post.includes(comments => :likes).references(:comments).map { |post| post.comments.size }
Should I always always load data from nested relationships?
No. It’s very easy to end up initializing a significant number of records. For example, a popular Comment
may have thousands of Like
records, which would result in a slow query and significant memory allocations.
A tool like Scout DevTrace running locally against production-like data can help you determine the faster approach.
preload
Should I ever use preload by itself?
Sometimes, but not by default. I use preload
versus includes
if I know using a LEFT OUTER JOIN
to load a relationship is significantly slower. Otherwise, if I add where
or order
clauses at a later date, those clauses would trigger eager_load
, which would trigger a join.
Is it common to combine joins
with preload
?
If I need all relationship records – not just just those that match a relationship condition – I’ll combine preload
and joins
. For example:
- Find all
Post
records with aComment
authored by Derek - Render those
Post
records and the total count of comments for each post
includes
will only fetch Comment
records authored by Derek, not all comments associated with each post.
Post.joins("LEFT OUTER JOIN comments ON comments.post_id = posts.id").where(:comments => {author: 'Derek'}).preload(:comments).map { |post| post.comments.size }
eager_load
includes
delegates to eager_load
when a where
or order
clause references a relationship.
Should I ever use eager_load by itself?
Yes. If I’ve found includes
to be slow using two queries, using eager_load
will force a single query via a LEFT OUTER JOIN
. Its presence in my code indicates I’m future-proofing against using two queries to fetch records.
Can I combine eager_load with joins?
Yes. In the following example:
Post.joins(:comments).eager_load(:comments).map { |post| post.comments.size }
ActiveRecord will do the following:
- Return an
Array
ofPost
records with comments. - Load the comments associated with each
Post
.
It’s includes
with an INNER JOIN
vs. a LEFT OUTER JOIN
.
TL;DR
I’d roughly summarize my approach to these methods like this:
- If I’m just filtering, use
joins
. - If I’m accessing relationships, start with
includes
. - If
includes
is slow using two separate queries, I’ll useeager_load
to force a single query and compare performance.
There are many edge cases when accessing relationships via ActiveRecord. Hopefully this is enough to prevent some of the more basic performance deadends when using joins
, includes
, preload
, and eager_load
.
Also, See
- Finding slow ActiveRecord queries with Scout
- Which Ruby background job framework is right for you?
- State of the 2017 Rails Stack
- Unicorn vs. Puma vs. Passenger: which app server is right for you?
Ready to optimize your site?
Adding servers can be a band-aid for slow code. Scout APM helps you find and fix your inefficient and costly code. We automatically identify N+1 SQL calls, memory bloat, and other code-related issues so you can spend less time debugging and more time programming.
Ready to optimize your site? Sign up for a free trial.
This article is an updated version first published in January 2017.
derek@scoutapp.com