Rails provides a convenient .excluding
method for identifying a group of items while omitting another set.
There may be a drawback that you might not notice when using this method.
When using .excluding
ActiveRecord will execute 2 database queries, one to get the id
or items that are to be excluded, and the second to perform the main query.
Imagine we have a Post
model and a table of posts
, for simplicity a Post
only has an id
and a published
boolean attribute. The table would look like this:
id | published |
---|---|
1 | true |
2 | true |
3 | false |
4 | true |
5 | false |
If we want to get all
the Posts excluding the published
Posts we could use the excluding
method, resulting in 2 queries being executed
published_posts = Post.where(published: true)
# Generated SQL for the query
# => SELECT "posts".* FROM "posts" WHERE "posts"."published" = true
unpublished_posts = Post.all.excluding(published_posts)
# **Executes** published_posts query
# => #<ActiveRecord::Relation [#<Post id: 1, published: true>, #<Post id: 2, published: true>, #<Post id: 4, published: true>]>
# Injects the result into the main query
# => SELECT "posts".* FROM "posts" WHERE "posts"."id" NOT IN (1, 2, 4)
# **Executes** the unpublished_posts query
For large or complex queries, this double execution may be costly to the applicationโs performance.
Part of the reason that 2 queries are executed, is that the .excluding
method calls .flatten
on the collection, which will trigger the first database query.
To optimise for fewer database queries, we can switch back to the examples that .excluding
was introduced to remove.
We can use the .where.not
chain to essentially inject the query from the first part (published_posts
) into the second part of the query (unpublished_posts
).
published_posts = Post.where(published: true)
# Generated SQL for the query
# => SELECT "posts".* FROM "posts" WHERE "posts"."published" = true
# SQL generated but not executed
unpublished_posts = Post.all.where.not(id: published_posts)
# Injects the SQL from published_posts into the main query
# => SELECT "posts".* FROM "posts" WHERE "posts"."id" NOT IN (SELECT "posts".* FROM "posts" WHERE "posts"."published" = true)
# **Executes** the combined query
While using .where.not
isnโt as pleasant to read as .excluding
in certain circumstances it can be more performant by only executing a single database query rather than multiple queries.