Ryan M. Poe | The Structural Problems with Abstractions

Abstractions are a wonderful software development tool, allowing one to take common business rules and consolidate them into a centralized location for improved testability and reuse. We trade complexity in multiple locations for complexity in a central place, so that our top-level function doesn’t need to understand all the intricacies of the underlying, abstracted tooling.

Yet even the best abstractions come with a cost in terms of understanding. The average (strawman) Java codebase is a wonderland of deeply nested, extremely abstract concepts like the famed FactoryFactoryFactory. The person who built the FactoryFactoryFactory knows full well how it works and why it’s necessary. Everyone else? Enjoy your six months of onboarding, retreading every inevitably under-documented design decision your forebears made.

To get it out of the way: nobody does this with malice, or even incompetence. It’s something that grows naturally as multiple teams work across a single codebase. Intentional or not, though, it’s important that we understand how we arrive at something like the FactoryFactoryFactory.

One critical reason, I think, is structural. Developers are incentivized to consolidate common instructions into separate functions: “hark! I have added functionality and reduced the number of lines!” It seems almost axiomatic that this is a good thing.

In my rush to ✅ this PR, though, I often forget to ask a critical question: we reduced lines by tying two or more pieces of potentially-unrelated functionality together. Is that what we really need? Ask the question! You might realize that the behaviors aren’t quite as similar as it appears at a glance; or that the obvious abstraction in one place is actually the wrong one when we need to use it in a new place; or even that the very act of adding a new unit of shared jargon isn’t worth the effort it took to create it, given the overhead it adds to our future, forgetful selves.
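Here’s a small sketch of what asking that question can surface. Everything in it is made up for illustration (the function names, the money-formatting scenario, the option flags): two formatters that looked identical on day one, and the shared helper I might have reached for instead, shown after the refund case picked up a requirement of its own.

```typescript
// Hypothetical example: none of these names come from a real codebase.

// Option A: two call sites that looked identical on day one, kept separate.
function formatInvoiceTotal(cents: number): string {
  return `$${(cents / 100).toFixed(2)}`;
}

function formatRefundTotal(cents: number): string {
  // Refunds later needed accounting-style parentheses and no minus sign.
  // Only this function had to change.
  return `($${(Math.abs(cents) / 100).toFixed(2)})`;
}

// Option B: the "DRY" helper that both call sites now share.
// Each new requirement arrives as another flag every reader has to understand.
interface FormatOptions {
  parenthesize?: boolean;
  absolute?: boolean;
}

function formatAmount(cents: number, options: FormatOptions = {}): string {
  const value = options.absolute ? Math.abs(cents) : cents;
  const text = `$${(value / 100).toFixed(2)}`;
  return options.parenthesize ? `(${text})` : text;
}

console.log(formatInvoiceTotal(1999));
console.log(formatRefundTotal(-500));
console.log(formatAmount(-500, { parenthesize: true, absolute: true }));
```

Both options produce the same strings. The difference is who pays: with the separate functions, the refund change touched one function, while with the shared helper every caller now reads past options it never uses.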

A large number of hasty abstractions owe to the violence that DRY (Don’t Repeat Yourself) has inflicted on the industry–or, rather, the violence the industry has done to DRY. A great diversity of engineers at every level are prone to thinking of DRY as a goal, rather than a tool or a guideline. “Two pieces of code that do the same thing? This demands an abstraction.” It’s almost human nature: give us a guideline, and we’ll turn it into a rule; give us a metric, and we’ll turn it into a target.

And in a vacuum, a small trade-off like consolidating similar loops into a single function so we can reuse it elsewhere might make sense. It’s common functionality, after all. Why not make it an abstraction?

// handleEvent.ts
function handleEvent(event: { type: EventType; payload: any }): Promise<void> {
  switch (event.type) {
    case "PERSON_CREATED":
      return handlePersonCreated(event.payload);
    case "PERSON_DELETED":
      return handlePersonDeleted(event.payload);
  }

  throw new Error("Unsupported event");
}

Me, an easily-distracted engineer with lots of better things to do, might see this and think “hey, we’re duplicating function calls with a common signature here, and can reduce this switch/case down to a map to make it more easily extensible and simplify our testing! Hah! That’s why they pay me the big bucks.” 🤩

// eventHandlers.ts
export const EVENT_HANDLERS: Record<EventType, EventHandler> = {
  PERSON_CREATED: handlePersonCreated,
  PERSON_DELETED: handlePersonDeleted,
};

// handleEvent.ts
import { EVENT_HANDLERS } from "./eventHandlers.ts";

function handleEvent(event: { type: EventType; payload: any }): Promise<void> {
  const handler = EVENT_HANDLERS[event.type];

  if (!handler) {
    throw new Error("Unsupported event");
  }

  return handler(event.payload);
}

To virtually anyone reading this, the above refactor seems fairly reasonable. I’m pretty sure I’ve actually done this exact refactor myself, confident that I’m being a great forebear. But I rarely stop to think about things like whether this pattern is completely alien to the rest of the code. Or what if adding that one extra file layer is where future me gives up and has to disrupt a colleague’s day? (By asking them what gives.)
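For comparison, here’s one way the original switch could stay in a single file and still complain when someone adds an event type without a handler. Treat it as a sketch under assumptions: this post never defines EventType or the payload shapes, so the union, the AppEvent alias, the log array, and the stand-in handlers below are all hypothetical. The technique itself is TypeScript’s exhaustiveness check using the never type.

```typescript
// Assumed types: the post does not define EventType or the payload shapes.
type EventType = "PERSON_CREATED" | "PERSON_DELETED";
type AppEvent = { type: EventType; payload: unknown };

// Records what the stand-in handlers did, so the sketch is observable.
const log: string[] = [];

// Stand-in handlers so the sketch runs on its own.
async function handlePersonCreated(payload: unknown): Promise<void> {
  log.push(`created:${JSON.stringify(payload)}`);
}

async function handlePersonDeleted(payload: unknown): Promise<void> {
  log.push(`deleted:${JSON.stringify(payload)}`);
}

function handleEvent(event: AppEvent): Promise<void> {
  const { type, payload } = event;

  switch (type) {
    case "PERSON_CREATED":
      return handlePersonCreated(payload);
    case "PERSON_DELETED":
      return handlePersonDeleted(payload);
    default: {
      // If a member is added to EventType without a case above,
      // `type` is no longer `never` here and this line stops compiling.
      const unhandled: never = type;
      throw new Error(`Unsupported event: ${String(unhandled)}`);
    }
  }
}
```

This isn’t an argument that the switch is always the better choice. It only shows that “more easily extensible” doesn’t automatically require a second file and a lookup table.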

Compounded hundreds–even thousands–of times across a large enough codebase, refactors like the above grow our code’s complexity toward a critical mass. It might seem perfectly clean to us, but only to us. Over time, the flow of data through our system becomes so difficult to understand, we might start to have retention and hiring issues–not to mention trust issues when our colleagues touch one thing and break 12 other things by accident.

I’m sure this concept has a name, but it’s new to me. Is it tech debt? That doesn’t quite seem to fit. It implies something that has clear tangles that can be straightened. Over-abstracted code doesn’t lend itself to obvious refactors (perhaps by its very nature), at least not small, incremental ones. Perhaps it’s just the balance between reducible and irreducible complexity?

Whatever the case, it often feels like the default, deeply-ingrained response to duplicated code is to create an abstraction because we think that that’s necessarily better.

This is the structural problem with abstractions: at a small scale, nearly every abstraction seems reasonable and adds little mental overhead to the code at hand. Yet, on a large enough scale and over enough time, fragmented abstraction use can turn a straightforward (if sometimes not fully DRY) codebase into a maze of miniature frameworks one has to master to be able to understand how, e.g., a simple request payload becomes a response.
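To make “miniature framework” a little more concrete, here’s a deliberately tiny, hypothetical example: the same request-to-response logic written once as a linear function and once through a homemade middleware pipeline. None of these names (GreetRequest, compose, trimName, and so on) come from a real library; they’re assumptions for the sake of the sketch.

```typescript
// Hypothetical request/response shapes for illustration only.
interface GreetRequest {
  name: string;
}

interface GreetResponse {
  status: number;
  body: string;
}

// Linear version: validate, transform, respond, readable top to bottom.
function greet(request: GreetRequest): GreetResponse {
  const name = request.name.trim();
  if (name.length === 0) {
    return { status: 400, body: "name is required" };
  }
  return { status: 200, body: `Hello, ${name}!` };
}

// Miniature-framework version: the same behavior, spread across
// a pipeline abstraction the reader has to learn first.
type Handler = (request: GreetRequest) => GreetResponse;
type Middleware = (request: GreetRequest, next: Handler) => GreetResponse;

const trimName: Middleware = (request, next) =>
  next({ name: request.name.trim() });

const requireName: Middleware = (request, next) =>
  request.name.length === 0
    ? { status: 400, body: "name is required" }
    : next(request);

// Wraps the terminal handler from the last middleware to the first,
// so middlewares run in the order they are listed.
function compose(middlewares: Middleware[], terminal: Handler): Handler {
  return middlewares.reduceRight<Handler>(
    (next, middleware) => (request) => middleware(request, next),
    terminal,
  );
}

const greetViaPipeline = compose([trimName, requireName], (request) => ({
  status: 200,
  body: `Hello, ${request.name}!`,
}));
```

Each piece of the pipeline is reasonable on its own, and both versions return the same responses. The difference only shows up when someone new has to trace how a payload becomes a response: one function to read versus four definitions and an ordering rule.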

Up to a certain point, requiring mastery is OK. Everything has a jargon. But when is it too much? 🤷

If I have advice, it’s to be judicious and empathetic when considering abstractions. Every file, application interface, pattern, folder, and function adds a small understanding cost to your peers and your future self. At the very least, abstractions need a higher purpose than DRY, higher than saving lines. Consider whether duplication might actually reduce the mental overhead of a piece of functionality, and consider that that reduced overhead might be more valuable than an abstraction.

Anyone can write compact code that a machine can understand. Good engineers write code that their peers can easily understand–code that strikes the right balance between consistent abstractions and clear, linear data flows.

But like everything, the real challenge is navigating the trade-off. There are no clear, absent-minded answers here. Happy coding. The end.

tl;dr

Abstractions are useful but can increase overall code complexity.
Premature abstractions can tie unrelated functionality together.
DRY is a tool and a guideline. It is not a goal or a target.
Consider abstractions judiciously and empathetically.