Ryan M. Poe | TypeScript types from a database connection

There are a whole lot of NodeJS ORMs and/or DB management drivers to choose from:

Objection
Sequelize
Knex
Prisma
Hasura
Raw mode (PG2, node-mysql, SQLite, etc.)
TypeORM
Kysely
Mongoose
Waterline (Sails)
MikroORM

I’ve used most of them at some point, either professionally or in side projects. Over time, I’ve developed a preference against treating my schema shape (in code) as the primary artifact of my database designs. You know what I mean: the entities folder, with 30-75 files tucked in under it, each representing one or more database table as some sort of class, or object, usually crawling with decorators. Sometimes your ORM might actually write migrations from those entities.

The “code as roadmap to database design” strategy certainly can work. It has at my job for almost 5 years! But as a methodology, it seems to be most dangerous at one of the most perilous parts of your application deployment pipeline: database migrations. If an ORM has to infer how to write a migration, it’s easy for the code it generates to become noisy, for our trust in our tools to obscure some subtle bug. This might cause others to miss something wrong in the migration during code review, and it’s far more likely for there to be an unintended outage or data loss as a result of a faulty migration.

ORMs aren’t perfect. You need to identify where it’s most critical lean on them. However, it’s equally vital where you should not, under any circumstances blindly trust them. ORM authors don’t know your data; they don’t know your business. They do their best to make sure their software works not just for you, but for you and 10,000 other people. Ultimately, we need to make sure that the migrations automatically generated by tools like TypeORM and Prisma don’t eat our data.

And in my experience, it can be a huge pain in the butt to have to constantly babysit a migration generator.

A method I ultimately stumbled upon (can’t remember where, or if it was just a shower thought) involves ditching the concept of a version-controlled schema representation: instead of code-as-canon, the live schema in a database is canon. At compile time, you pair a database with the code, run your migrations, and use a piece of software like kysely-codegen to generate TypeScript types from your database. From that, you can probably even string together a full-on ORM with just a few generics and some function calls.

This method lets you focus on the migrations, first–on your data, its history, and its shape in the RDMBS, as a priority–rather than it being a robot’s job. I trust robots for lot of things, to be sure. But if given a choice, I’d rather trust robots with something that won’t take my business down with it if automatically-generated code gets something wrong.

I’ve been called into meetings to help figure out what combination of decorators, keys, and yarn db:generates it takes to get TypeORM to not create a migration that truncates entire columns for a simple rename or something. (Yes, someone probably did something wrong, but the blast radius of a mistake seems super high for how easy it is to miss a faulty migration!) If types are being generated from the DB schema, we can spend more time analyzing the migrations we write, and thus (hopefully) understand better.

So no ORMs?

Maybe. You’re generating types from an active schema… maybe even a full-on ORM after that with a few function calls to wrap some schema names. Most of that you can write yourself as a slim abstraction. What’s left?

You still need to worry about things like connection pooling, query builders, transaction boundaries, how you create migrations, and how (or whether) you want checks in place to ensure your schema is in sync with your compiled types. These aren’t small features, so there’s plenty left to do.

But, without your schema living in version control, you don’t traffic in “models” and “entities” anymore. You write queries and get typed outputs. Your computed columns can be pure functions that take your model, and your architecture becomes driven by many small units of work against interfaces generated straight from tables.

Some of the “nice” features of ORMs like eager loads strike me as a bit anachronistic in a concurrent language like Node, anyway. Even the gnarliest-nested GraphQL queries are no match for a well-considered set of DataLoaders. The latter certainly generates more queries than the beefers that waddle out of some ORM logs. Complex ORM queries have to account for a wide variety of circumstances, so you pray that somewhere in its humongous 182kb query is exactly the data you intended to retrieve in the shape you expected.

In my experience, the simplicity of DataLoader lookups and being able to lean into the concurrency model of NodeJS a bit better is worth the trade off. The speed improvements of 3-5 properly batched, potentially cached calls to DataLoaders over a single large behemoth SQL query from an ORM often comes out in the wash in 2023 at almost any scale. Heck, in some places at work, we batch primary key lookups within an entire process, getting application-wide batching. (Careful here, though, when in a transaction.)

What about testing?

Despite often being advertised as making testing easier, ORMs don’t always allow a near-enough 1:1 corollary between testing and non-testing environments to give the claim much weight. There are always annoying little invariances between RDBMS layers to make this a losing position for all but the simplest scenarios.

These days, I’ve been a fan of testcontainers in the Node ecosystem and a quick 200 line script that stands up, migrates and/or copies a schema, and spins it back down as part of opt-in test behavior against real databases. Your mileage may vary between RDBMSes, though. In postgres, it’s pretty easy (and speedy) to do this; trivial in SQLite, but harder and slower in MySQL.

The testing API winds up something like:

import { testWithDatabase } from "@my-repo/test-tools";
import { testSubject } from "./test-subject";

const { getConnection } = testWithDatabase();

it("does a thing", async () => {
  await testSubject(getConnection()); // Pass those connections around to keep transaction boundaries clear!
});

(You want this opt-in for a lot of reasons: you may want to write a new DB test connector–with opt-in behavior, you can replace it one file at a time; you may not want your unit tests having to load a MySQL container; etc.)

I’ll write about this at some point. I’ve done it twice now: once for postgres, and once for MySQL. Fun project, and deepen one’s understanding of a complex software stack. (There’s no hiding wrinkles when you’re messing with test architecture.)

So what do you use?

For my own stuff, I mostly Knex as a query builder. Then I realized how much of a pain it was to try to keep my on-disk interfaces in line with my database… Knex has a few ways to do it (here’s one in PG), but I wanted to search around for a query builder with better TypeScript support.

So I hunted around and found kysely and its type generator, recently. I liked it so much, it provoked this article! I’m not sure how well the above workflow kysely allows me might scale across teams and in very large schemas, but it’s been a promising way to manage a database so far.

End

Anyway, that was just some random thoughts on ORMs. As I slowly move away from class-oriented design, the less use I have for the typical RDBMS interface patterns like ActiveRecord (or the Frankenstein’s beast style of AR/ORM patterns you sometimes see, e.g., in TypeORM). Tables and plain-old shallow javascript objects overlap too seamlessly to try to bolt new APIs on top of them. Virtual columns, hooks, and other thick model patterns are combining data and functionality in ways that make the flows of data (including transactional boundaries) more opaque.

Let data be data. Let functionality be functionality. ✌️