Using Generated Columns in Postgres

I needed to generate unique character sequences for items in a database while messing around with ideas for a direct part marking system.

The basic idea was to use a subset of alphanumeric characters selected for readability in a fixed length sequence with some recognizable prefix so that a person could come across some particular object in the world and search a database to see information about that thing. Think of it as a universal serial number.

Databases are well-equipped to automatically generate unique IDs in the form of incrementing numbers or UUIDs. To my understanding, they have historically been less equipped to generate sequences based on the whims of the programmer.

One available option here is to generate a numeric ID for each object and then later update that record with the corresponding ID from my schema.

For example, some record having the ID 1 might map to the ID AAAAA, ID 2 to AAAAB, and so on. This is essentially a base conversion, and I wrote some code to handle it in my Django prototype.

from django.conf import settings
from django.contrib.postgres.fields import JSONField
from django.db import models


class DataMark(models.Model):
    IDENTIFIER_CHOICES = [
        'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'k', 'm', 'n', 'p', 'q',
        'r', 's', 't', 'w', 'x', 'y', 'z', '2', '3', '4', '5', '6', '7',
        '8', '9',
    ]

    def _generate_identifier(self, pkey=None):
        if not pkey:
            pkey = self.id

        base = len(self.IDENTIFIER_CHOICES)
        output = []

        for n in range(9):
            if n == 0:
                idx = pkey % base
            else:
                idx = pkey // base ** n % base

            output.append(self.IDENTIFIER_CHOICES[idx])

        return(''.join(output[::-1]))

    def save(self, *args, **kwargs):
        # This is gnarly, there's gotta be some way to have postgres do
        # this automatically on save.
        super(DataMark, self).save(*args, **kwargs)

        self.refresh_from_db()
        self.identifier = self._generate_identifier(self.id)

        super(DataMark, self).save(*args, **kwargs)
            

While this works, the careful reader will have noticed from my comments in the code that I wasn't particularly pleased with it.

Not only does this solution require any system creating records in our database to understand how these IDs are generated, it also requires twice as many writes as I'd like in order to create a record in the table.

In looking for solutions to my problem, I discovered that Postgres 12 had introduced support for generated columns. Unlike a DEFAULT, these generated columns are allowed to reference other columns in their own table at generation time.

For brevity and readability, the example below sets our generated value to twice that of the id, but it could also do the base conversion I wrote in Python above if you were to write and store that as a function in your database.

CREATE TABLE records(
    id SERIAL PRIMARY KEY,
    generated_identifier VARCHAR GENERATED ALWAYS AS (id * 2) STORED
);

> INSERT INTO records DEFAULT VALUES;
INSERT 0 1
> INSERT INTO records DEFAULT VALUES;
INSERT 0 1
> INSERT INTO records DEFAULT VALUES;
INSERT 0 1
> SELECT * FROM records;
id | generated_identifier
---+---------------------
1  |                    2
2  |                    4
3  |                    6
            

🦇