🎓 Make an existing relationship polymorphic

In a world where we have a Car model that historically could be owned by a User, we want to change this relationship to allow a Company to own a Car too.

We’ll assume the database looks like this:

erDiagram
    Car {
        int id PK
        int user_id FK "users.id"
    }
    User {
        int id PK
    }
    Company {
        int id PK
    }

    Car }o--|| User : "Car belongs_to User"

and the Rails models would look nice and simple like this:

class Car < ApplicationRecord  
  belongs_to :user
end  
  
class User < ApplicationRecord  
  has_many :cars
end  
  
class Company < ApplicationRecord  
end  

Where we want to get to

We want the owner of a Car to be either a User or a Company, and we’ve identified that a polymorphic relationship is the right way to model this.

Our application is widely used and is deployed with “rolling” deployments.

If our application were small scale, we could rename the user_id column on the cars table to owner_id, add an owner_type string column and backfill it with “User”, and then change every usage to set the owner rather than the user. We could probably do all of this in one big bang PR.

In a scaled application with rolling deployments, it’s not quite as easy. During a rolling deployment, old and new versions of the code run side by side against the same database, so every intermediate state has to work for both. We need the application to stay functional at all times while we backfill data and switch implementations at the right moments. This involves several steps, split across PRs and deployments. This post sets out what should change and in what order those changes should be deployed.

The end goal is to get the database to look like this:

erDiagram
    Car {
        int id PK
        int owner_id "users.id or companies.id"
        string owner_type "User or Company"
    }
    User {
        int id PK
    }
    Company {
        int id PK
    }

    Car }o--|| User : "Car owned by a User"
    Car }o--|| Company : "Car owned by a Company"

and the Rails models to look like this:

class Car < ApplicationRecord  
  belongs_to :owner, polymorphic: true
end  
  
class User < ApplicationRecord  
  has_many :cars, as: :owner
end  
  
class Company < ApplicationRecord  
  has_many :cars, as: :owner
end  
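A polymorphic belongs_to stores the owner’s class name in owner_type and its primary key in owner_id. Active Record resolves that name back to a class when it loads the association. The plain-Ruby sketch below models that lookup so it can run on its own. The Struct stand-ins, the STORES hash and find_owner are hypothetical and are not Rails APIs.

```ruby
# Plain-Ruby stand-ins for the User and Company models (illustration only).
User = Struct.new(:id, :name)
Company = Struct.new(:id, :name)

# Hypothetical in-memory "tables", keyed by model class.
STORES = {
  User => [User.new(1, "Ada")],
  Company => [Company.new(1, "Acme Ltd")]
}.freeze

# Read a polymorphic reference: owner_type names the class,
# owner_id picks the row within that class's table.
def find_owner(owner_type, owner_id)
  klass = Object.const_get(owner_type)
  STORES.fetch(klass).find { |record| record.id == owner_id }
end

find_owner("User", 1).name    # => "Ada"
find_owner("Company", 1).name # => "Acme Ltd"
```

Both stand-in records have id 1, so owner_id alone can’t tell them apart. owner_type can, which is why it needs to be a string column holding the class name.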

Steps to take

Adding the reference columns

First of all, we’re going to add our new reference (owner) columns to the cars table, following the best practices set out by the Strong Migrations gem.

The reference has to allow nulls (null: true), because existing rows won’t have an owner until we backfill them in a later step.

Generate a migration from the command line

rails g migration AddOwnerToCars owner:references{polymorphic}

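and modify the migration, adding disable_ddl_transaction!, null: true and index: {algorithm: :concurrently}. Building the index concurrently avoids blocking writes to the cars table. Because PostgreSQL can’t build an index concurrently inside a transaction, the migration also has to disable its DDL transaction.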

class AddOwnerToCars < ActiveRecord::Migration[7.1]  
  disable_ddl_transaction!  
  
  def change  
    add_reference :cars, :owner, polymorphic: true, null: true, index: {algorithm: :concurrently}  
  end  
end

With no other changes, we can ==create a pull request==, ==merge and deploy== this change. The new columns are added to the database empty, and nothing requires them to be populated yet, so the application should continue working as it did before.

The database schema will look like this:

erDiagram
    Car {
        int id PK
        int user_id FK "users.id"
        int owner_id "users.id or companies.id"
        string owner_type "User or Company"
    }

Double write

To start with we will want to “double write” the Car’s user and owner. When we set the user value of the Car we also want it to set the owner.

This will mean that any Car that we create or update from now on will have an owner as well as a user.

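To achieve this we can add a before_save callback to the Car model. It has to run before the row is written. An after_save callback runs after the INSERT or UPDATE, so assigning owner there would only change the in-memory object. It’s also worth checking for code that writes to the cars table without going through Active Record callbacks (update_all, update_column, insert_all or raw SQL, for example), because that code will skip this callback. Any such writes will need modifying to set owner_id and owner_type whenever they set user_id.

It’s worth thoroughly checking that both the user and owner fields are being persisted to the database. Your test suite should be able to help.

class Car < ApplicationRecord
  belongs_to :user
  belongs_to :owner, polymorphic: true, optional: true

  # Runs before the INSERT/UPDATE, so owner_id and owner_type are written
  # in the same statement as user_id.
  before_save do
    self.owner = user
  end
end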
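The timing of the double-write callback matters. Attributes assigned before the row is written end up in the INSERT or UPDATE. Anything assigned afterwards only changes the in-memory object unless the record is saved again. The plain-Ruby sketch below models that ordering. SketchCar and its row hash are hypothetical stand-ins for an Active Record model and the cars table, and Active Record’s real callback chain has more steps than this.

```ruby
# A plain-Ruby model of save ordering: "before" hook, then the write, then
# "after" hook. Illustration only; Active Record's real callback chain has
# more steps (validation, transactions, commit hooks).
class SketchCar
  attr_accessor :user_id, :owner_id, :owner_type
  attr_reader :row # stands in for the persisted cars row

  def initialize(before_save: nil, after_save: nil)
    @before_save = before_save
    @after_save = after_save
    @row = {}
  end

  def save
    @before_save&.call(self)
    # The "write": copy the in-memory attributes into the stored row.
    @row = { user_id: user_id, owner_id: owner_id, owner_type: owner_type }
    @after_save&.call(self)
    true
  end
end

# The double-write rule: the owner mirrors the user.
copy_owner = lambda do |car|
  car.owner_id = car.user_id
  car.owner_type = "User"
end

late = SketchCar.new(after_save: copy_owner)
late.user_id = 42
late.save
late.row[:owner_id] # => nil, the assignment came after the write
late.owner_id       # => 42, but only on the in-memory object

early = SketchCar.new(before_save: copy_owner)
early.user_id = 42
early.save
early.row[:owner_id] # => 42
```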

For this change we can ==create a pull request==, ==merge and deploy==.

Backfill untouched data

Now that newly created and updated records have the owner set on them, we can backfill any records that do not have data in either the owner_id or owner_type columns.

We recommend using a gem such as Data Migrate to perform the backfill. It ensures the data is backfilled in every environment, which is safer than running a script by hand in each one.

Generate a new data migration:

rails g data_migration backfill_owner_on_cars

Ensure your backfill is as performant as possible. For this example we choose to combine an update_all with the in_batches method. This makes sure we are effectively updating records at the database level without instantiating Active Record models.

class BackfillOwnerOnCars < ActiveRecord::Migration[7.1]
  def up
    Car.where(owner_id: nil, owner_type: nil).in_batches do |batch_cars|
      batch_cars.update_all("owner_id = user_id, owner_type = 'User'")
    end
  end
end
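The following plain-Ruby model makes the batching concrete. It mirrors what the data migration does: pick the rows the double write hasn’t touched, walk them in primary-key order a fixed number at a time, and copy user_id into owner_id with owner_type set to “User”. The array of hashes and backfill_owner are hypothetical stand-ins. In the real migration, each batch is a single UPDATE issued by update_all.

```ruby
# Plain-Ruby model of the backfill: rows are hashes standing in for the
# cars table (hypothetical data, not Active Record).
def backfill_owner(rows, batch_size: 1000)
  # Only rows the double write hasn't touched, in primary-key order
  # (in_batches also walks the table by primary key).
  pending = rows.select { |row| row[:owner_id].nil? && row[:owner_type].nil? }
  pending = pending.sort_by { |row| row[:id] }

  batch_count = 0
  pending.each_slice(batch_size) do |batch|
    # In the real migration this whole batch is one UPDATE via update_all.
    batch.each do |row|
      row[:owner_id] = row[:user_id]
      row[:owner_type] = "User"
    end
    batch_count += 1
  end
  batch_count
end

cars = [
  { id: 1, user_id: 10, owner_id: nil, owner_type: nil },
  { id: 2, user_id: 11, owner_id: 11, owner_type: "User" }, # already double-written
  { id: 3, user_id: 12, owner_id: nil, owner_type: nil }
]
backfill_owner(cars, batch_size: 1) # => 2 (rows 1 and 3, one batch each)
```

A second run finds nothing left to update. Because the real migration filters on the same nil columns, it is also safe to re-run if a deployment is interrupted.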

If you’re not using a gem like Data Migrate, you may want to ==create a pull request==, ==merge and deploy== at this point. If you are using a gem that preserves the ordering between data and schema migrations, you can skip this deployment.

Make new columns not null

Now that all of our older data has been backfilled and new data is being written with an owner, we can update the database to ensure records cannot be inserted without an owner_id and owner_type.

We’ll follow the Strong Migrations guidance for setting NOT NULL on a column. This is a two-part migration. Both parts can be committed and run in a single deployment, but they have to live in separate migration files.

Generate the first migration, which will add check constraints to the columns

rails g migration SetCarsOwnerIdAndOwnerTypeNotNull

We’ll update the migration to add a check constraint to both columns

class SetCarsOwnerIdAndOwnerTypeNotNull < ActiveRecord::Migration[7.1]
  def change
    add_check_constraint :cars, "owner_id IS NOT NULL", name: "cars_owner_id_null", validate: false
    add_check_constraint :cars, "owner_type IS NOT NULL", name: "cars_owner_type_null", validate: false
  end
end

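The second part of this step validates the check constraints, sets the columns to NOT NULL and then removes the check constraints. On PostgreSQL 12 and later, setting NOT NULL can rely on the already validated check constraint instead of scanning the whole table, which is what keeps this step cheap.

rails g migration ValidateCarsOwnerIdAndOwnerTypeNotNull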

The updated migration should look like

class ValidateCarsOwnerIdAndOwnerTypeNotNull < ActiveRecord::Migration[7.1]  
  def change  
    validate_check_constraint :cars, name: "cars_owner_id_null"  # name from previous migration
    change_column_null :cars, :owner_id, false  
    remove_check_constraint :cars, name: "cars_owner_id_null"  # name from previous migration

    validate_check_constraint :cars, name: "cars_owner_type_null"  # name from previous migration
    change_column_null :cars, :owner_type, false  
    remove_check_constraint :cars, name: "cars_owner_type_null"  # name from previous migration
  end  
end

We’ve made several changes to the database. If any null values remain at this point, the second of the two migrations will fail when it validates the constraints. This is a good point to ==create a pull request==, ==merge and deploy==.

Use the new relationship

Hopefully we haven’t hit any issues or snags by this stage, and our application is still up and running with the appropriate data being populated in the cars table.

We can now update our Car model. Our test suite will probably need the most changes.

We’ll change the relationships on the Car incrementally. First of all, we’ll remove the optional: true portion of the owner relationship:

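class Car < ApplicationRecord
  belongs_to :user
  belongs_to :owner, polymorphic: true

  # owner is now required, so sync it before validation: the presence
  # check for owner runs before any before_save callback would.
  before_validation do
    self.owner = user
  end
end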

This change shouldn’t cause any issues, because the NOT NULL constraints added earlier would already have highlighted any cars without an owner.

Our next step is to update every place where the user is set so that it uses owner instead. One way to do this is to override the user= writer:

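class Car < ApplicationRecord
  belongs_to :user
  belongs_to :owner, polymorphic: true

  # Still needed for code that assigns user_id directly.
  before_validation do
    self.owner = user
  end

  # Keep owner in step whenever the user association is assigned.
  def user=(value)
    super(value)
    self.owner = value
  end
end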

This covers most of the cases where code sets car.user = User.find(1), as it sets both user_id and the owner_id/owner_type pair.
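Overriding user= and calling super works because Rails defines association accessors such as user= in a module it generates and includes into the model, rather than on the model class itself. A method defined in the class can therefore wrap them. The plain-Ruby sketch below reproduces that shape. SketchAssociationMethods is a hypothetical stand-in for the generated module and only tracks ids and the owner’s class name.

```ruby
# Hypothetical stand-in for the module Rails generates to hold association
# accessors; it only tracks ids and the owner's class name.
module SketchAssociationMethods
  attr_reader :user_id, :owner_id, :owner_type

  def user=(user)
    @user_id = user&.id
  end

  def owner=(owner)
    @owner_id = owner&.id
    @owner_type = owner&.class&.name
  end
end

class CarWithSyncedWriter
  include SketchAssociationMethods

  # Defined on the class, so it wins over the module's writer;
  # `super` still reaches the module's version.
  def user=(value)
    super(value)
    self.owner = value
  end
end

SketchUser = Struct.new(:id) # stand-in for a User record

car = CarWithSyncedWriter.new
car.user = SketchUser.new(7)
car.user_id    # => 7
car.owner_id   # => 7
car.owner_type # => "SketchUser" (a real User record would give "User")
```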

Unfortunately, plenty of other usages will exist, mainly places where we create Cars with user.cars.create. For these we can update the User model to use the new relationship:

class User < ApplicationRecord  
  has_many :cars, as: :owner
end  
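Once User declares has_many :cars, as: :owner, there is one gap worth checking. A car created through user.cars gets owner_id and owner_type but no user_id, even though user_id is still NOT NULL and belongs_to :user is still required. A callback that unconditionally runs self.owner = user would also overwrite that owner with nil. Below is a minimal sketch of a rule that fills whichever side is missing. It is written as plain Ruby over an attributes hash so it runs on its own. sync_user_and_owner is a hypothetical helper, not part of the original approach.

```ruby
# Fill in whichever side of the double write is missing. `attrs` is a plain
# hash standing in for a car's attributes (hypothetical shape).
def sync_user_and_owner(attrs)
  if attrs[:owner_id].nil? && attrs[:user_id]
    # Built through the old relationship, e.g. Car.new(user: user).
    attrs.merge(owner_id: attrs[:user_id], owner_type: "User")
  elsif attrs[:user_id].nil? && attrs[:owner_type] == "User"
    # Built through user.cars once User uses `as: :owner`.
    attrs.merge(user_id: attrs[:owner_id])
  else
    # Both sides set (or a non-User owner): leave the attributes alone.
    attrs
  end
end

sync_user_and_owner({ user_id: 5, owner_id: nil, owner_type: nil })[:owner_id] # => 5
sync_user_and_owner({ user_id: nil, owner_id: 6, owner_type: "User" })[:user_id] # => 6
```

In the model, the same branches would sit in a before_validation callback that assigns self.owner = user or self.user = owner. The rule deliberately leaves records alone when both sides are already set. As a result, a direct user_id= change on an existing car isn’t mirrored; the user= override covers the common path.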

At this point we have probably covered most cases. Your test suite will come to the rescue here: every failure will take you to another area of the code. Test factories or fixtures will need updating to create correctly shaped records.

There is an easy, albeit nuclear, way to find out what will break: add user_id to the ignored columns on the Car model. This is useful for finding areas that may break, but it wouldn’t be advisable to commit that change yet. By all means, fix the issues it raises incrementally.

This process may take several PRs over a period of time, depending on your priorities.

Remove the old relationship

We should now be at the stage where no code relies on the user relationship on the Car model.

We can now remove the relationships and any additional code that we’ve added to keep everything aligned.

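class Car < ApplicationRecord
  belongs_to :owner, polymorphic: true

  # user_id is still NOT NULL, so keep filling it from the owner until
  # the column is made nullable.
  before_save do
    self.user_id = owner_id
  end
end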

We have also switched the callback so that it sets user_id from owner_id. This keeps the user_id column filled until the next step. This is a good point to ==create a pull request==, ==merge and deploy== to ensure nothing has been missed.

Removing the `user_id` column

This is another multistep process: we’ll make the user_id column nullable, ignore the column, drop the column and finally remove it from the ignored columns. Each step will require us to ==create a pull request==, ==merge and deploy==.

Allowing null on `user_id`

This is a fairly straightforward migration

rails g migration AllowNullOnCarsUserId

class AllowNullOnCarsUserId < ActiveRecord::Migration[7.1]
  def change
    change_column_null :cars, :user_id, true
  end
end

We will want to ==create a pull request==, ==merge and deploy== at this point.

Ignoring the `user_id` column

Now that the database accepts null values in user_id, we can remove the callback from the Car model.

class Car < ApplicationRecord  
  belongs_to :owner, polymorphic: true
end

While we’re in this file we can also add user_id to the ignored columns

class Car < ApplicationRecord  
  self.ignored_columns += [:user_id]
  
  belongs_to :owner, polymorphic: true
end

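After this change we will ==create a pull request==, ==merge and deploy==. This lets the Rails app stop using the column before it is actually removed. Active Record caches database columns at runtime, so dropping a column the app still knows about can cause exceptions until every process has restarted.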

Drop the `user_id` column

It has been a long journey, and we’re now at the point where we can drop the old user_id column, since nothing in our application uses it anymore.

rails g migration RemoveUserIdFromCars

with the migration looking like this:

class RemoveUserIdFromCars < ActiveRecord::Migration[7.1]
  def change
    # Safety assured by adding the `user_id` to the
    # `ignored_columns` on the Car model
    safety_assured { remove_column :cars, :user_id }
  end
end

It’s good practice to add a note when using the safety_assured helper from strong migrations to help others in your team understand how safety has been assured around dangerous operations.

We need to ==create a pull request==, ==merge and deploy== before moving on to the last step.

Remove ignored columns

At last we have reached the final step in this process and it’s a simple clean up step. We’ll remove the self.ignored_columns += [:user_id] from the Car model.

The final Car model will look like this

class Car < ApplicationRecord  
  belongs_to :owner, polymorphic: true
end

Our User model will not change from our current version but should look like this

class User < ApplicationRecord  
  has_many :cars, as: :owner
end

and finally we can update our Company model to use the polymorphic relationship like so

class Company < ApplicationRecord  
  has_many :cars, as: :owner
end

Ship it

And with one last ==pull request==, ==merge and deploy== we’ve shipped our new feature.

What looks like a small change in the code turns out to be an involved process with many moving parts.

To reiterate, this process probably isn’t necessary when dealing with a small application, but at scale these steps are crucial to keep your application functional and performant.