N + 1 and Cross Cluster Queries in Rails

N + 1 query optimisation in Rails is a classic ORM optimisation problem. Knowing the standard patterns for solving N + 1 queries in Rails should be a prerequisite for anyone wanting to be considered a senior rails developer.

This problem occurs an entity has associated records that are fetched through another set of associated records. For example, a user couple have many purchased products through their orders. To get all of a users purchased products, we make 1 query to get all their N orders. The problem is that we then must make N additional queries for the products in each order or use a work around.

An N + 1 Example

Let’s create a new project:

rails new blog

Let’s create a user:

rails g model User username

Let’s create a blog post:

rails g model Post title body user:references

Create the DB & run migrations:

rails db:create && rails db:migrate

Let’s update our user so that it has many blog posts:

class User < ApplicationRecord
  has_many :posts, dependent: :destroy
end

Let’s create a few users with 100 posts each:

blog  rails c
Loading development environment (Rails 7.0.0.alpha2)
3.0.0 :001 > ["keoghpe", "elon", "gandalf", "dhh", "frodo"].each {|name| User.create!(username: name)}
   (0.7ms)  SELECT sqlite_version(*)
  TRANSACTION (0.0ms)  begin transaction
  User Create (0.4ms)  INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?)  [["username", "keoghpe"], ["created_at", "2021-11-20 14:09:30.163285"], ["updated_at", "2021-11-20 14:09:30.163285"]]
  TRANSACTION (0.9ms)  commit transaction
  TRANSACTION (0.0ms)  begin transaction
  User Create (0.3ms)  INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?)  [["username", "elon"], ["created_at", "2021-11-20 14:09:30.166146"], ["updated_at", "2021-11-20 14:09:30.166146"]]
  TRANSACTION (0.5ms)  commit transaction
  TRANSACTION (0.0ms)  begin transaction
  User Create (0.3ms)  INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?)  [["username", "gandalf"], ["created_at", "2021-11-20 14:09:30.167708"], ["updated_at", "2021-11-20 14:09:30.167708"]]
  TRANSACTION (0.5ms)  commit transaction
  TRANSACTION (0.0ms)  begin transaction
  User Create (0.3ms)  INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?)  [["username", "dhh"], ["created_at", "2021-11-20 14:09:30.169435"], ["updated_at", "2021-11-20 14:09:30.169435"]]
  TRANSACTION (0.5ms)  commit transaction
  TRANSACTION (0.0ms)  begin transaction
  User Create (0.3ms)  INSERT INTO "users" ("username", "created_at", "updated_at") VALUES (?, ?, ?)  [["username", "frodo"], ["created_at", "2021-11-20 14:09:30.170897"], ["updated_at", "2021-11-20 14:09:30.170897"]]
  TRANSACTION (0.5ms)  commit transaction
 => ["keoghpe", "elon", "gandalf", "dhh", "frodo"]
3.0.0 :002 > User.find_each do |user|
3.0.0 :003 >   100.times {|n| Post.create!(user: user, title: "Title for post #{n}", body: "Body for post #{n}")}
3.0.0 :004 > end
  User Load (0.2ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ?  [["LIMIT", 1000]]
  TRANSACTION (0.0ms)  begin transaction
  Post Create (0.3ms)  INSERT INTO "posts" ("title", "body", "user_id", "created_at", "updated_at") VALUES (?, ?, ?, ?, ?)  [["title", "Title for post 0"], ["body", "Body for post 0"], ["user_id", 1], ["created_at", "2021-11-20 14:10:44.804986"], ["updated_at", "2021-11-20 14:10:44.804986"]]
  # Lots more log lines for each insert

Let’s see an N + 1 query in action by trying to print the username and title for each of our users posts:

3.0.0 :008 >
3.0.0 :008 > User.find_each do |user|
3.0.0 :009 >   user.posts.each do |post|
3.0.0 :010 >     puts "#{user.username}: #{post.title}"
3.0.0 :011 >   end
3.0.0 :012 > end
  User Load (0.1ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ?  [["LIMIT", 1000]]
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ?  [["user_id", 1]]
keoghpe: Title for post 0
keoghpe: Title for post 1
keoghpe: Title for post 2
....
keoghpe: Title for post 99
  Post Load (0.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ?  [["user_id", 2]]
elon: Title for post 0
...
elon: Title for post 99
  Post Load (0.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ?  [["user_id", 3]]
gandalf: Title for post 0
# ...
gandalf: Title for post 99
  Post Load (0.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ?  [["user_id", 4]]
dhh: Title for post 0
# ....
dhh: Title for post 99
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = ?  [["user_id", 5]]
frodo: Title for post 0
# ... etc

We can see for every user, we are making an additional round trip to the database.

Eager loading

We can make this more efficient by performing a single query to the database for all posts instead. In rails this is known as eager loading and it is suppported with the includes method. We can use includes in our previous example to generate just 2 queries:

3.0.0 :014 > User.includes(:posts).find_each do |user|
3.0.0 :015 >   user.posts.each do |post|
3.0.0 :016 >     puts "#{user.username}: #{post.title}"
3.0.0 :017 >   end
3.0.0 :018 > end
  User Load (0.1ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ?  [["LIMIT", 1000]]
  Post Load (1.5ms)  SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (?, ?, ?, ?, ?)  [["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5]]
keoghpe: Title for post 0
keoghpe: Title for post 1
keoghpe: Title for post 2
keoghpe: Title for post 3
keoghpe: Title for post 4
# ... etc

More Complex Associations

Active record associations are pretty flexible. We can use queries on the associations and use them with includes for more complex relationships. Let’s add a featured column to our posts and create the featured_posts association on User.

Quit the rails console and add the featured column to posts:

blog ➤ rails g migration AddFeaturedToPosts featured:boolean                                                                                                                                                                                        git:main*
      invoke  active_record
      create    db/migrate/20211120143359_add_featured_to_posts.rb

Set the default value to false:

class AddFeaturedToPosts < ActiveRecord::Migration[7.0]
  def change
    add_column :posts, :featured, :boolean, default: false
  end
end

Run the migrations with rails db:migrate.

Let’s create a scope on our Post model and create the associated relationship on our user model:

class Post < ApplicationRecord
  belongs_to :user

  scope :featured, -> { where(featured: true) }
end

class User < ApplicationRecord
  has_many :posts, dependent: :destroy
  has_many :featured_posts, -> { featured }, class_name: 'Post'
end

Let’s start a rails console and update one fifth of our posts to be featured. We can then see includes work for the association with a query:

blog  rails c
Loading development environment (Rails 7.0.0.alpha2)
3.0.0 :001 > Post.where("id % 5 = 0").update_all(featured: true)
   (0.6ms)  SELECT sqlite_version(*)
  Post Update All (0.2ms)  UPDATE "posts" SET "featured" = ? WHERE (id % 5 = 0)  [["featured", 1]]
 => 100
3.0.0 :002 > User.includes(:featured_posts).find_each do |user|
3.0.0 :003 >   user.featured_posts.each do |post|
3.0.0 :004 >     puts "#{user.username}: #{post.title}"
3.0.0 :005 >   end
3.0.0 :006 > end
  User Load (0.1ms)  SELECT "users".* FROM "users" ORDER BY "users"."id" ASC LIMIT ?  [["LIMIT", 1000]]
  Post Load (0.5ms)  SELECT "posts".* FROM "posts" WHERE "posts"."featured" = ? AND "posts"."user_id" IN (?, ?, ?, ?, ?)  [["featured", 1], ["user_id", 1], ["user_id", 2], ["user_id", 3], ["user_id", 4], ["user_id", 5]]
keoghpe: Title for post 4
keoghpe: Title for post 9
keoghpe: Title for post 14
keoghpe: Title for post 19
keoghpe: Title for post 24
keoghpe: Title for post 29
keoghpe: Title for post 34
# ... etc

Pretty neat!

Rails 7 and Cross-Cluster Associations

Thanks to the work of Eileen M. Uchitelle, Rails 7 now supports cross cluster associations. Eileen has a great blog post on the Github blog explaining what this means. One thing that’s not clear from her explanation, is whether or not eager loading is supported on these new associations. I want to run an experiment to find out!

Let’s pretend that our appliation is going to operate at a large scale, and we want to support comments on posts. We’ve decided that the best way to scale our application is to put comments in their own database.

Let’s update our database.yml file to handle a second db:

# config/database.yml
default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: db/development.sqlite3
  posts:
    <<: *default
    database: db/posts_development.sqlite3

Create the database from the shell & create the comments model:

blog ➤ rails db:create
Database 'db/development.sqlite3' already exists
Created database 'db/posts_development.sqlite3'
Database 'db/test.sqlite3' already exists
blog ➤ rails g model Comment author_id:integer post_id:integer text --database comments
      invoke  active_record
      create    db/migrate/20211120150613_create_comments.rb
      create    app/models/comments_record.rb
      create    app/models/comment.rb
      invoke    test_unit
      create      test/models/comment_test.rb
      create      test/fixtures/comments.yml

We can update our posts to have comments:

class Post < ApplicationRecord
  belongs_to :user
  has_many :comments

  scope :featured, -> { where(featured: true) }
end

Migrate to create the new table:

blog ➤ rails db:migrate
== 20211120151055 CreateComments: migrating ===================================
-- create_table(:comments)
   -> 0.0018s
== 20211120151055 CreateComments: migrated (0.0019s) ==========================

Let’s generate 10,000 random comments:

blog  rc
Loading development environment (Rails 7.0.0.alpha2)
3.0.0 :001 > post_ids = Post.ids
   (1.0ms)  SELECT sqlite_version(*)
  Post Pluck (1.2ms)  SELECT "posts"."id" FROM "posts"
 =>
[1,
...
3.0.0 :002 > user_ids = User.ids
  User Pluck (0.1ms)  SELECT "users"."id" FROM "users"
 => [1, 2, 3, 4, 5]
3.0.0 :003 > 10_000.times do |n|
3.0.0 :004 >   Comment.create!(author_id: user_ids.sample, post_id: post_ids.sample, text: "This is a comment #{n}")
3.0.0 :005 > end

In Rails 6, to do a cross cluster query, we need to pass ids explicitly as there are no cross cluster associations:

3.0.0 :006 > Comment.where(post_id: Post.first.id)
  Post Load (1.3ms)  SELECT "posts".* FROM "posts" ORDER BY "posts"."id" ASC LIMIT ?  [["LIMIT", 1]]
  Comment Load (1.3ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ?  [["post_id", 1]]
 =>
[#<Comment:0x00007f9ee3b1fdf8
  id: 280,
  author_id: 1,
  post_id: 1,
  text: "This is a comment 279",
  created_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00,
  updated_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00>,
 #<Comment:0x00007f9ee3b1fce0
  id: 1641,
  author_id: 5,
  post_id: 1,
  text: "This is a comment 1640",
  created_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00,
  updated_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00>,
# ...

In Rails 7, we now have cross cluster associations, making these queries much easier:

3.0.0 :007 > Post.first.comments
  Post Load (0.1ms)  SELECT "posts".* FROM "posts" ORDER BY "posts"."id" ASC LIMIT ?  [["LIMIT", 1]]
  Comment Load (0.8ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ?  [["post_id", 1]]
 =>
[#<Comment:0x00007f9ee3fd5078
  id: 280,
  author_id: 1,
  post_id: 1,
  text: "This is a comment 279",
  created_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00,
  updated_at: Sat, 20 Nov 2021 15:23:05.107472000 UTC +00:00>,
 #<Comment:0x00007f9ee3fd4d08
  id: 1641,
  author_id: 5,
  post_id: 1,
  text: "This is a comment 1640",
  created_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00,
  updated_at: Sat, 20 Nov 2021 15:23:07.532991000 UTC +00:00>,
# ...

Very cool! Much cleaner and easier.

Now back to N + 1 queries. We can create an N + 1 query by getting the comments on our featured posts:

3.0.0 :008 > Post.featured.each do |post|
3.0.0 :009 >   post.comments.each do |comment|
3.0.0 :010 >     puts "#{post.title}: #{comment.text}"
3.0.0 :011 >   end
3.0.0 :012 > end
  Post Load (2.3ms)  SELECT "posts".* FROM "posts" WHERE "posts"."featured" = ?  [["featured", 1]]
  Comment Load (0.9ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ?  [["post_id", 5]]
Title for post 4: This is a comment 1643
# ....
Title for post 4: This is a comment 9662
  Comment Load (0.8ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" = ?  [["post_id", 10]]
Title for post 9: This is a comment 1155
# ....

Same N + 1 problem. Let’s see if includes works:

3.0.0 :013 > Post.featured.includes(:comments).each do |post|
3.0.0 :014 >   post.comments.each do |comment|
3.0.0 :015 >     puts "#{post.title}: #{comment.text}"
3.0.0 :016 >   end
3.0.0 :017 > end
  Post Load (0.4ms)  SELECT "posts".* FROM "posts" WHERE "posts"."featured" = ?  [["featured", 1]]
  Comment Load (7.0ms)  SELECT "comments".* FROM "comments" WHERE "comments"."post_id" IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)  [["post_id", 5], ["post_id", 10], ["post_id", 15], ["post_id", 20], ["post_id", 25], ["post_id", 30], ["post_id", 35], ["post_id", 40], ["post_id", 45], ["post_id", 50], ["post_id", 55], ["post_id", 60], ["post_id", 65], ["post_id", 70], ["post_id", 75], ["post_id", 80], ["post_id", 85], ["post_id", 90], ["post_id", 95], ["post_id", 100], ["post_id", 105], ["post_id", 110], ["post_id", 115], ["post_id", 120], ["post_id", 125], ["post_id", 130], ["post_id", 135], ["post_id", 140], ["post_id", 145], ["post_id", 150], ["post_id", 155], ["post_id", 160], ["post_id", 165], ["post_id", 170], ["post_id", 175], ["post_id", 180], ["post_id", 185], ["post_id", 190], ["post_id", 195], ["post_id", 200], ["post_id", 205], ["post_id", 210], ["post_id", 215], ["post_id", 220], ["post_id", 225], ["post_id", 230], ["post_id", 235], ["post_id", 240], ["post_id", 245], ["post_id", 250], ["post_id", 255], ["post_id", 260], ["post_id", 265], ["post_id", 270], ["post_id", 275], ["post_id", 280], ["post_id", 285], ["post_id", 290], ["post_id", 295], ["post_id", 300], ["post_id", 305], ["post_id", 310], ["post_id", 315], ["post_id", 320], ["post_id", 325], ["post_id", 330], ["post_id", 335], ["post_id", 340], ["post_id", 345], ["post_id", 350], ["post_id", 355], ["post_id", 360], ["post_id", 365], ["post_id", 370], ["post_id", 375], ["post_id", 380], ["post_id", 385], ["post_id", 390], ["post_id", 395], ["post_id", 400], ["post_id", 405], ["post_id", 410], ["post_id", 415], ["post_id", 420], ["post_id", 425], ["post_id", 430], ["post_id", 435], ["post_id", 440], ["post_id", 445], ["post_id", 450], ["post_id", 455], ["post_id", 460], ["post_id", 465], ["post_id", 470], ["post_id", 475], ["post_id", 480], ["post_id", 485], ["post_id", 490], ["post_id", 495], ["post_id", 500]]
Title for post 4: This is a comment 1643
Title for post 4: This is a comment 2114
Title for post 4: This is a comment 2703

Success!

Written on December 13, 2021