
Stop losing data when writing Django migrations!

François Farge · 7 min read

Django

Saying that database structure is important is stating the obvious, at least if you have chosen a structured database technology. In that case, you want the structure to match your functional and business needs as closely as possible, and to be strict enough that you can rely on it.

When working on Django projects, though, I sometimes found it painful to change the data structure when I needed to. Then I discovered a very useful tool offered by Django migrations: the RunPython operation.

Simple migration

Let’s take a simple example and assume that we have a single model called Order inside an orders application.

We would then have the following models.py file:

from django.db import models


class Order(models.Model):
    reference = models.CharField(max_length=8)
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    creation_date = models.DateField(auto_now=False, auto_now_add=False)
    due_date = models.DateField(auto_now=False, auto_now_add=False)
    customer_name = models.CharField(max_length=50)
    customer_address = models.CharField(max_length=50)
    customer_city = models.CharField(max_length=50)
    customer_zip_code = models.CharField(max_length=50)

    def __str__(self):
        return self.reference

Running python manage.py makemigrations will produce the following migration file, called 0001_initial.py:

from django.db import migrations, models

class Migration(migrations.Migration):

    initial = True

    dependencies = []

    operations = [
        migrations.CreateModel(
            name="Order",
            fields=[
                (
                    "id",
                    models.AutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                ("reference", models.CharField(max_length=8)),
                ("amount", models.DecimalField(decimal_places=2, max_digits=8)),
                ("creation_date", models.DateField()),
                ("due_date", models.DateField()),
                ("customer_name", models.CharField(max_length=50)),
                ("customer_address", models.CharField(max_length=50)),
                ("customer_city", models.CharField(max_length=50)),
                ("customer_zip_code", models.CharField(max_length=50)),
            ],
        ),
    ]

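Let’s break down what’s happening in this migration file. The migration is represented as a Python class with three attributes: initial, which marks this as the first migration of the application; dependencies, the list of migrations that must be applied before this one; and operations, the ordered list of changes to apply to the database.

The python manage.py makemigrations command works well when the operations to perform are simple enough. Renaming a model or a field, for example, is translated into the equivalent database operation.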

After applying the migration, let’s create a few entities in the database.

Orders after running the first migration

Complex structure migration

Let’s go back to our previous example and observe something that you may already have noticed. The four fields customer_name, customer_address, customer_city and customer_zip_code are attached to the Order entity, although they describe a different real-world entity: a customer. That is fine as long as there is no duplication. But now let’s imagine that, for business purposes, you need to extract a Customer entity to which the Order entities are linked.

Your models.py file would then be:

from django.db import models

class Customer(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=50)
    city = models.CharField(max_length=50)
    zip_code = models.CharField(max_length=50)

    def __str__(self):
        return self.name

class Order(models.Model):
    reference = models.CharField(max_length=8)
    amount = models.DecimalField(max_digits=8, decimal_places=2)
    creation_date = models.DateField(auto_now=False, auto_now_add=False)
    due_date = models.DateField(auto_now=False, auto_now_add=False)
    customer = models.ForeignKey(Customer, on_delete=models.PROTECT)

    def __str__(self):
        return self.reference

This configuration now fits our functional needs. Let’s try to generate the migration!

> You are trying to add a non-nullable field 'customer' to order without a default; we can't do that (the database needs something to populate existing rows).
Please select a fix:
 1) Provide a one-off default now (will be set on all existing rows with a null value for this column)
 2) Quit, and let me add a default in models.py

There is a problem, however. Django recognizes that we are trying to add a customer field to the Order model that cannot be null but has no default value. It therefore needs a non-null value for every existing row in the database, and asks whether we want to provide one now or set a default in the Order model.

Neither option suits us. We don’t want every existing order to point to one and the same default customer, because that would lose all the existing customer data.
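To make the data loss concrete, here is a minimal sketch that uses sqlite3 from the standard library instead of Django. It only approximates what a one-off default amounts to at the database level; the table layout is simplified, and the default id 1 and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders_order ("
    "id INTEGER PRIMARY KEY, reference TEXT, customer_name TEXT)"
)
conn.executemany(
    "INSERT INTO orders_order (reference, customer_name) VALUES (?, ?)",
    [("ORD-0001", "Alice"), ("ORD-0002", "Bob")],  # hypothetical sample data
)

# Assumption: option 1 boils down to a NOT NULL column filled with a single value.
conn.execute(
    "ALTER TABLE orders_order ADD COLUMN customer_id INTEGER NOT NULL DEFAULT 1"
)

customer_ids = [
    row[0] for row in conn.execute("SELECT customer_id FROM orders_order ORDER BY id")
]
print(customer_ids)
```

Both orders end up with the same customer_id, even though they were placed by two different customers. Nothing in the new column records who ordered what.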

Granted, we could also run this migration and then manually set the customer attribute on every Order, but we would have to repeat that manual step on every environment. It would also make the migration quite difficult to roll back.

Fortunately, Django comes with a built-in solution to deal with this limitation.

Generating a custom migration

First off, let’s generate an empty migration that we will then edit.

> python manage.py makemigrations orders --empty

The generated migration file will look like this:

from django.db import migrations

class Migration(migrations.Migration):

    dependencies = [
        ('orders', '0001_initial'),
    ]

    operations = [
    ]

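We then need to build our custom migration. It needs six steps: create the new Customer model, add a nullable customer foreign key to Order, make the old customer fields on Order nullable, transfer the data from Order to Customer, make the customer field non-nullable, and remove the old fields from Order.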

The Django documentation explains all the database operations needed.

Let’s write this migration:

from django.db import migrations, models
import django.db.models.deletion

class Migration(migrations.Migration):

    dependencies = [
        ("orders", "0001_initial"),
    ]

    operations = [
        # step 1: add the new Customer model
        migrations.CreateModel(
            name="Customer",
            fields=[
                (
                    "id",
                    models.AutoField(
                        auto_created=True,
                        primary_key=True,
                        serialize=False,
                        verbose_name="ID",
                    ),
                ),
                ("name", models.CharField(max_length=50)),
                ("address", models.CharField(max_length=50)),
                ("city", models.CharField(max_length=50)),
                ("zip_code", models.CharField(max_length=50)),
            ],
        ),

        # step 2: add the nullable foreign key field `customer` to Order
        migrations.AddField(
            model_name="order",
            name="customer",
            field=models.ForeignKey(
                null=True,
                on_delete=django.db.models.deletion.PROTECT,
                to="orders.Customer",
            ),
        ),

        # step 3: set the order fields as nullable
        migrations.AlterField(
            model_name="order",
            name="customer_address",
            field=models.CharField(null=True, max_length=50),
        ),
        migrations.AlterField(
            model_name="order",
            name="customer_city",
            field=models.CharField(null=True, max_length=50),
        ),
        migrations.AlterField(
            model_name="order",
            name="customer_name",
            field=models.CharField(null=True, max_length=50),
        ),
        migrations.AlterField(
            model_name="order",
            name="customer_zip_code",
            field=models.CharField(null=True, max_length=50),
        ),

        # step 4: transfer data from Order to Customer
        ...

        # step 5: set the `customer` field as non-nullable
        migrations.AlterField(
            model_name="order",
            name="customer",
            field=models.ForeignKey(
                null=False,
                on_delete=django.db.models.deletion.PROTECT,
                to="orders.Customer",
            ),
        ),

        # step 6: remove the old Order fields
        migrations.RemoveField(model_name="order", name="customer_address",),
        migrations.RemoveField(model_name="order", name="customer_city",),
        migrations.RemoveField(model_name="order", name="customer_name",),
        migrations.RemoveField(model_name="order", name="customer_zip_code",),
    ]

In this migration, steps 2 and 3 loosen the data structure to prepare it for the data transfer. Steps 5 and 6 then tighten it again!

Let’s now dive into step 4, where the magic happens!

Write the data transfer function

To perform this step, we are going to use the RunPython migration operation. It executes a Python function that can use the ORM.

Its syntax is the following:

# step 4: transfer data from Order to Customer
migrations.RunPython(order_to_customer, reverse_code=customer_to_order)

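In this step of the migration, we specify two functions: order_to_customer, which runs when the migration is applied, and customer_to_order, passed as reverse_code, which runs when the migration is rolled back.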

def order_to_customer(apps, schema_editor):
    Order = apps.get_model("orders", "Order")
    Customer = apps.get_model("orders", "Customer")
    for order in Order.objects.all():
        customer, _ = Customer.objects.get_or_create(
            name=order.customer_name,
            address=order.customer_address,
            city=order.customer_city,
            zip_code=order.customer_zip_code,
        )
        order.customer = customer
        order.save()

def customer_to_order(apps, schema_editor):
    Order = apps.get_model("orders", "Order")
    for order in Order.objects.all():
        order.customer_name = order.customer.name
        order.customer_address = order.customer.address
        order.customer_city = order.customer.city
        order.customer_zip_code = order.customer.zip_code
        order.save()

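These functions look a lot like what you would write to transfer the data manually from one model to the other. There is one catch, however: the models should not be imported the usual way, with from orders.models import Order, Customer, because that import reflects the current state of models.py, where the customer fields no longer exist on Order. Instead, we use the versioned (historical) models that Django builds for this point in the migration history, retrieved through the apps argument passed to the migration function.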
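The deduplication performed by get_or_create in order_to_customer can be sketched without Django. The snippet below is an illustration only: OrderRow and link_customers are hypothetical names, the sample data is made up, and the comparison is an exact, case-sensitive match, which may differ from what your database collation does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrderRow:
    """Hypothetical stand-in for an Order row before the migration."""
    reference: str
    customer_name: str
    customer_address: str
    customer_city: str
    customer_zip_code: str
    customer_id: Optional[int] = None


def link_customers(orders):
    """Assign one customer id per distinct (name, address, city, zip_code)."""
    ids_by_details = {}
    for order in orders:
        details = (
            order.customer_name,
            order.customer_address,
            order.customer_city,
            order.customer_zip_code,
        )
        # Same idea as get_or_create: reuse the id if these details were seen.
        if details not in ids_by_details:
            ids_by_details[details] = len(ids_by_details) + 1
        order.customer_id = ids_by_details[details]
    return ids_by_details


orders = [
    OrderRow("ORD-0001", "Alice Martin", "1 rue de Paris", "Lyon", "69001"),
    OrderRow("ORD-0002", "Alice Martin", "1 rue de Paris", "Lyon", "69001"),
    OrderRow("ORD-0003", "Alice Martin", "1 Rue de Paris", "Lyon", "69001"),
]
customers = link_customers(orders)
print([order.customer_id for order in orders])
```

The first two orders share a customer, while the third gets its own because its address is capitalised differently. If your existing data contains such near-duplicates, it may be worth normalising the values before or during the transfer.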

Running the migration

After running this migration via python manage.py migrate, we can check that the migration has created the Customer instances and linked them to the correct Orders.

Customers

Customers after running the second migration

Orders

Orders after running the second migration

Et voilà! This migration is now operational. It can also be rolled back easily with python manage.py migrate orders <previous_migration_id>, in which case the data is transferred back to the Order model.
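Why the rollback restores the original rows can be checked with a small round trip in plain Python. This is a sketch under assumptions, not the Django implementation: orders are modelled as dictionaries, and forward and reverse are hypothetical helpers that mirror order_to_customer and customer_to_order.

```python
import copy

CUSTOMER_FIELDS = ("name", "address", "city", "zip_code")


def forward(orders):
    """Move the customer_* values of each order into a shared customers table."""
    customers = {}      # customer id -> tuple of field values
    ids_by_values = {}  # tuple of field values -> customer id
    for order in orders:
        values = tuple(order.pop(f"customer_{field}") for field in CUSTOMER_FIELDS)
        if values not in ids_by_values:
            ids_by_values[values] = len(ids_by_values) + 1
            customers[ids_by_values[values]] = values
        order["customer_id"] = ids_by_values[values]
    return customers


def reverse(orders, customers):
    """Copy the values of each linked customer back onto the order."""
    for order in orders:
        values = customers[order.pop("customer_id")]
        for field, value in zip(CUSTOMER_FIELDS, values):
            order[f"customer_{field}"] = value


def make_order(reference, name, address, city, zip_code):
    """Build a hypothetical order row in its pre-migration shape."""
    return {
        "reference": reference,
        "customer_name": name,
        "customer_address": address,
        "customer_city": city,
        "customer_zip_code": zip_code,
    }


original = [
    make_order("ORD-0001", "Alice Martin", "1 rue de Paris", "Lyon", "69001"),
    make_order("ORD-0002", "Alice Martin", "1 rue de Paris", "Lyon", "69001"),
    make_order("ORD-0003", "Bob Durand", "2 avenue Foch", "Nantes", "44000"),
]
orders = copy.deepcopy(original)

customers = forward(orders)
customer_count = len(customers)
reverse(orders, customers)
round_trip_ok = orders == original
print(customer_count, round_trip_ok)
```

The round trip works because every order keeps a link to exactly one customer, so the reverse step always knows which values to copy back. That is also why the reverse data transfer has to run while the customer link and the old fields both still exist.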

Conclusion

You now have a way to stop worrying about losing data when migrating your database with Django! The Django ORM is a great tool and although it does the job most of the time, understanding what happens under the hood is a great way of making your life easier when dealing with migrations.
