N + 1 and Cross Cluster Queries in Rails
N + 1 query optimisation in Rails is a classic ORM optimisation problem. Knowing the standard patterns for solving N + 1 queries in Rails should be a prerequisite for anyone wanting to be considered a senior rails developer.
This problem occurs an entity has associated records that are fetched through another set of associated records. For example, a user couple have many purchased products through their orders. To get all of a users purchased products, we make 1 query to get all their N orders. The problem is that we then must make N additional queries for the products in each order or use a work around.
An N + 1 Example
Let’s create a new project:
rails new blog
Let’s create a user:
rails g model User username
Let’s create a blog post:
rails g model Post title body user:references
Create the DB & run migrations:
rails db:create && rails db:migrate
Let’s update our user so that it has many blog posts:
class User < ApplicationRecord
has_many :posts, dependent: :destroy
end
Let’s create a few users with 100 posts each:
blog ➤ rails c
Loading development environment (Rails 7.0.0.alpha2)
3.0.0 :001 > ["keoghpe", "elon", "gandalf", "dhh", "frodo"].each {|name| User.create!(username: name)}
(0.7ms) SELECT sqlite_version(*)
TRANSACTION (0.0ms) begin transaction
User Create (0.4ms) INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?) [["username", "keoghpe"], ["created_at", "2021-11-20 14:09:30.163285"], ["updated_at", "2021-11-20 14:09:30.163285"]]
TRANSACTION (0.9ms) commit transaction
TRANSACTION (0.0ms) begin transaction
User Create (0.3ms) INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?) [["username", "elon"], ["created_at", "2021-11-20 14:09:30.166146"], ["updated_at", "2021-11-20 14:09:30.166146"]]
TRANSACTION (0.5ms) commit transaction
TRANSACTION (0.0ms) begin transaction
User Create (0.3ms) INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?) [["username", "gandalf"], ["created_at", "2021-11-20 14:09:30.167708"], ["updated_at", "2021-11-20 14:09:30.167708"]]
TRANSACTION (0.5ms) commit transaction
TRANSACTION (0.0ms) begin transaction
User Create (0.3ms) INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?) [["username", "dhh"], ["created_at", "2021-11-20 14:09:30.169435"], ["updated_at", "2021-11-20 14:09:30.169435"]]
TRANSACTION (0.5ms) commit transaction
TRANSACTION (0.0ms) begin transaction
User Create (0.3ms) INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?) [["username", "frodo"], ["created_at", "2021-11-20 14:09:30.170897"], ["updated_at", "2021-11-20 14:09:30.170897"]]
TRANSACTION (0.5ms) commit transaction
=> ["keoghpe", "elon", "gandalf", "dhh", "frodo"]
3.0.0 :002 > User.find_each do |user|
3.0.0 :003 > 100.times {|n| Post.create!(user: user, title: "Title for post #{n}", body: "Body for post #{n}")}
3.0.0 :004 > end
User Load (0.2ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ? [["LIMIT", 1000]]
TRANSACTION (0.0ms) begin transaction
Post Create (0.3ms) INSERT INTO "posts" ("title", "body", "user_id", "created_at", "updated_at") VALUES (?, ?, ?, ?, ?) [["title", "Title for post 0"], ["body", "Body for post 0"], ["user_id", 1], ["created_at", "2021-11-20 14:10:44.804986"], ["updated_at", "2021-11-20 14:10:44.804986"]]
# Lots more log lines for each insert
Let’s see an N + 1 query in action by trying to print the username and title for each of our users posts:
3.0.0 :008 >
3.0.0 :008 > User.find_each do |user|
3.0.0 :009 > user.posts.each do |post|
3.0.0 :010 > puts "#{user.username}: #{post.title}"
3.0.0 :011 > end
3.0.0 :012 > end
User Load (0.1ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ? [["LIMIT", 1000]]
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ? [["user_id", 1]]
keoghpe: Title for post 0
keoghpe: Title for post 1
keoghpe: Title for post 2
....
keoghpe: Title for post 99
Post Load (0.3ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ? [["user_id", 2]]
elon: Title for post 0
...
elon: Title for post 99
Post Load (0.3ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ? [["user_id", 3]]
gandalf: Title for post 0
# ...
gandalf: Title for post 99
Post Load (0.3ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ? [["user_id", 4]]
dhh: Title for post 0
# ....
dhh: Title for post 99
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ? [["user_id", 5]]
frodo: Title for post 0
# ... etc
We can see for every user, we are making an additional round trip to the database.
Eager loading
We can make this more efficient by performing a single query to the database for all posts instead. In rails this is known as eager loading and it is suppported with the includes
method. We can use includes in our previous example to generate just 2 queries:
3.0.0 :014 > User.includes(:posts).find_each do |user|
3.0.0 :015 > user.posts.each do |post|
3.0.0 :016 > puts "#{user.username}: #{post.title}"
3.0.0 :017 > end
3.0.0 :018 > end
User Load (0.1ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ? [["LIMIT", 1000]]
Post Load (1.5ms) SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (?, ?, ?, ?, ?) [["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5]]
keoghpe: Title for post 0
keoghpe: Title for post 1
keoghpe: Title for post 2
keoghpe: Title for post 3
keoghpe: Title for post 4
# ... etc
More Complex Associations
Active record associations are pretty flexible. We can use queries on the associations and use them with includes for more complex relationships. Let’s add a featured column to our posts and create the featured_posts
association on User
.
Quit the rails console and add the featured column to posts:
blog ➤ rails g migration AddFeaturedToPosts featured:boolean git:main*
invoke active_record
create db/migrate/20211120143359_add_featured_to_posts.rb
Set the default value to false:
class AddFeaturedToPosts < ActiveRecord::Migration[7.0]
def change
add_column :posts, :featured, :boolean, default: false
end
end
Run the migrations with rails db:migrate
.
Let’s create a scope on our Post model and create the associated relationship on our user model:
class Post < ApplicationRecord
belongs_to :user
scope :featured, -> { where(featured: true) }
end
class User < ApplicationRecord
has_many :posts, dependent: :destroy
has_many :featured_posts, -> { featured }, class_name: 'Post'
end
Let’s start a rails console and update one fifth of our posts to be featured. We can then see includes work for the association with a query:
blog ➤ rails c
Loading development environment (Rails 7.0.0.alpha2)
3.0.0 :001 > Post.where("id % 5 = 0").update_all(featured: true)
(0.6ms) SELECT sqlite_version(*)
Post Update All (0.2ms) UPDATE "posts" SET "featured" = ? WHERE (id % 5 = 0) [["featured", 1]]
=> 100
3.0.0 :002 > User.includes(:featured_posts).find_each do |user|
3.0.0 :003 > user.featured_posts.each do |post|
3.0.0 :004 > puts "#{user.username}: #{post.title}"
3.0.0 :005 > end
3.0.0 :006 > end
User Load (0.1ms) SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ? [["LIMIT", 1000]]
Post Load (0.5ms) SELECT "posts".* FROM "posts" WHERE "posts"."featured" = ? AND "posts"."user_id" IN (?, ?, ?, ?, ?) [["featured", 1], ["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5]]
keoghpe: Title for post 4
keoghpe: Title for post 9
keoghpe: Title for post 14
keoghpe: Title for post 19
keoghpe: Title for post 24
keoghpe: Title for post 29
keoghpe: Title for post 34
# ... etc
Pretty neat!
Rails 7 and Cross-Cluster Associations
Thanks to the work of Eileen M. Uchitelle, Rails 7 now supports cross cluster associations. Eileen has a great blog post on the Github blog explaining what this means. One thing that’s not clear from her explanation, is whether or not eager loading is supported on these new associations. I want to run an experiment to find out!
Let’s pretend that our appliation is going to operate at a large scale, and we want to support comments on posts. We’ve decided that the best way to scale our application is to put comments in their own database.
Let’s update our database.yml file to handle a second db:
# config/database.yml
default: &default
adapter: sqlite3
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
timeout: 5000
development:
primary:
<<: *default
database: db/development.sqlite3
posts:
<<: *default
database: db/posts_development.sqlite3
Create the database from the shell & create the comments model:
blog ➤ rails db:create
Database 'db/development.sqlite3' already exists
Created database 'db/posts_development.sqlite3'
Database 'db/test.sqlite3' already exists
blog ➤ rails g model Comment author_id:integer post_id:integer text --database comments
invoke active_record
create db/migrate/20211120150613_create_comments.rb
create app/models/comments_record.rb
create app/models/comment.rb
invoke test_unit
create test/models/comment_test.rb
create test/fixtures/comments.yml
We can update our posts to have comments:
class Post < ApplicationRecord
belongs_to :user
has_many :comments
scope :featured, -> { where(featured: true) }
end
Migrate to create the new table:
blog ➤ rails db:migrate
== 20211120151055 CreateComments: migrating ===================================
-- create_table(:comments)
-> 0.0018s
== 20211120151055 CreateComments: migrated (0.0019s) ==========================
Let’s generate 10,000 random comments:
blog ➤ rc
Loading development environment (Rails 7.0.0.alpha2)
3.0.0 :001 > post_ids = Post.ids
(1.0ms) SELECT sqlite_version(*)
Post Pluck (1.2ms) SELECT "posts"."id" FROM "posts"
=>
[1,
...
3.0.0 :002 > user_ids = User.ids
User Pluck (0.1ms) SELECT "users"."id" FROM "users"
=> [1, 2, 3, 4, 5]
3.0.0 :003 > 10_000.times do |n|
3.0.0 :004 > Comment.create!(author_id: user_ids.sample, post_id: post_ids.sample, text: "This is a comment #{n}")
3.0.0 :005 > end
In Rails 6, to do a cross cluster query, we need to pass ids explicitly as there are no cross cluster associations:
3.0.0 :006 > Comment.where(post_id: Post.first.id)
Post Load (1.3ms) SELECT "posts".* FROM "posts" ORDER BY "posts"."id" ASC LIMIT ? [["LIMIT", 1]]
Comment Load (1.3ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 1]]
=>
[#<Comment:0x00007f9ee3b1fdf8
id: 280,
author_id: 1,
post_id: 1,
text: "This is a comment 279",
created_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00,
updated_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00>,
#<Comment:0x00007f9ee3b1fce0
id: 1641,
author_id: 5,
post_id: 1,
text: "This is a comment 1640",
created_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00,
updated_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00>,
# ...
In Rails 7, we now have cross cluster associations, making these queries much easier:
3.0.0 :007 > Post.first.comments
Post Load (0.1ms) SELECT "posts".* FROM "posts" ORDER BY "posts"."id" ASC LIMIT ? [["LIMIT", 1]]
Comment Load (0.8ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 1]]
=>
[#<Comment:0x00007f9ee3fd5078
id: 280,
author_id: 1,
post_id: 1,
text: "This is a comment 279",
created_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00,
updated_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00>,
#<Comment:0x00007f9ee3fd4d08
id: 1641,
author_id: 5,
post_id: 1,
text: "This is a comment 1640",
created_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00,
updated_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00>,
# ...
Very cool! Much cleaner and easier.
Now back to N + 1 queries. We can create an N + 1 query by getting the comments on our featured posts:
3.0.0 :008 > Post.featured.each do |post|
3.0.0 :009 > post.comments.each do |comment|
3.0.0 :010 > puts "#{post.title}: #{comment.text}"
3.0.0 :011 > end
3.0.0 :012 > end
Post Load (2.3ms) SELECT "posts".* FROM "posts" WHERE "posts"."featured" = ? [["featured", 1]]
Comment Load (0.9ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 5]]
Title for post 4: This is a comment 1643
# ....
Title for post 4: This is a comment 9662
Comment Load (0.8ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ? [["post_id", 10]]
Title for post 9: This is a comment 1155
# ....
Same N + 1 problem. Let’s see if includes works:
3.0.0 :013 > Post.featured.includes(:comments).each do |post|
3.0.0 :014 > post.comments.each do |comment|
3.0.0 :015 > puts "#{post.title}: #{comment.text}"
3.0.0 :016 > end
3.0.0 :017 > end
Post Load (0.4ms) SELECT "posts".* FROM "posts" WHERE "posts"."featured" = ? [["featured", 1]]
Comment Load (7.0ms) SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) [["post_id", 5], ["post_id", 10], ["post_id", 15], ["post_id", 20], ["post_id", 25], ["post_id", 30], ["post_id", 35], ["post_id", 40], ["post_id", 45], ["post_id", 50], ["post_id", 55], ["post_id", 60], ["post_id", 65], ["post_id", 70], ["post_id", 75], ["post_id", 80], ["post_id", 85], ["post_id", 90], ["post_id", 95], ["post_id", 100], ["post_id", 105], ["post_id", 110], ["post_id", 115], ["post_id", 120], ["post_id", 125], ["post_id", 130], ["post_id", 135], ["post_id", 140], ["post_id", 145], ["post_id", 150], ["post_id", 155], ["post_id", 160], ["post_id", 165], ["post_id", 170], ["post_id", 175], ["post_id", 180], ["post_id", 185], ["post_id", 190], ["post_id", 195], ["post_id", 200], ["post_id", 205], ["post_id", 210], ["post_id", 215], ["post_id", 220], ["post_id", 225], ["post_id", 230], ["post_id", 235], ["post_id", 240], ["post_id", 245], ["post_id", 250], ["post_id", 255], ["post_id", 260], ["post_id", 265], ["post_id", 270], ["post_id", 275], ["post_id", 280], ["post_id", 285], ["post_id", 290], ["post_id", 295], ["post_id", 300], ["post_id", 305], ["post_id", 310], ["post_id", 315], ["post_id", 320], ["post_id", 325], ["post_id", 330], ["post_id", 335], ["post_id", 340], ["post_id", 345], ["post_id", 350], ["post_id", 355], ["post_id", 360], ["post_id", 365], ["post_id", 370], ["post_id", 375], ["post_id", 380], ["post_id", 385], ["post_id", 390], ["post_id", 395], ["post_id", 400], ["post_id", 405], ["post_id", 410], ["post_id", 415], ["post_id", 420], ["post_id", 425], ["post_id", 430], ["post_id", 435], ["post_id", 440], ["post_id", 445], ["post_id", 450], ["post_id", 455], ["post_id", 460], ["post_id", 465], ["post_id", 470], ["post_id", 475], ["post_id", 480], ["post_id", 485], ["post_id", 490], ["post_id", 495], ["post_id", 500]]
Title for post 4: This is a comment 1643
Title for post 4: This is a comment 2114
Title for post 4: This is a comment 2703
Success!