Testing as thinking

What can TDD actually do for you?

I got to thinking about all of the ire I often see directed toward Test Driven Development (TDD). Developers tend to consider the “write the test first” dogma unhelpful: too picky for people who actually want to ship products.

I totally understand. Nobody likes something that exists only as a set of obtrusive, mechanical, seemingly regressive rules. (This is tilting at windmills, though. Nobody got fired for forgetting to write their test before their implementation.) I don’t even want to sell you on test-first engineering. I almost never do that!

As I see more and more PRs, systems, and code, however, I’ve begun embracing the importance of coding with an eye toward long-term support. Writing code to make a product to get a business started is easy mode. Writing code that’s not a bear to maintain is almost a completely different task. In my experience, a key difference between difficult code and clear code involves how the original author approached testing.

The average developer fresh from boot camp knows how to ship. It’s in their bones, and probably part of the reason they took to web development in the first place! On the flip side, it’s rare that a junior has written more than a handful of tests in their career before signing up to work on a full-blown web application. Anything that can get in the way of an engineer’s ability to understand and ship code is a fast track to imposter syndrome, especially if it’s affecting a person’s ticket throughput.

I’ve seen it happen. Instead of treating tests as a sidekick for helping them approach solutions tactically, an engineer will start envisioning tests themselves as an end, some annoying metric or menial step so someone can check a box somewhere. It’s the testing ball and chain!

This is a poor relationship to have with one’s tools. Imagine a construction worker being forced by colleagues to lug around a twenty-pound hammer while pouring concrete, and to present proof that she used the hammer properly before the job could be accepted. Sounds silly, right? It’s an unserious metaphor in no small part because the ball and chain of the hammer is a fiction, just like the testing boogeyman created by a developer with an unhealthy relationship to testing.

To some purists, TDD might be about “writing the test first,” but to those who read and understood the materials (and didn’t take the cliff notes as dogma 😇), TDD isn’t a strict mechanical process: there’s no pure way to do it that’s worth religiously adhering to. At its core, however, thinking with tests steers us toward the decomposition of complex systems into smaller, more discrete, and easily-testable “units.”

Those units might be narrow functional tests mocked to within an inch of utility. They also might be more integrated tests that mock strategically. The details aren’t important. What is important is that tests and testing impose healthy constraints on our system design. Rigorous testing forces us to decouple systems, to minimize our feedback loops, to think through the boundaries between components, to properly think through incidental and necessary complexity, and can be bolstered by an emergent library of time-saving test helpers (e.g., cyclomatic complexity-reducing tools like coalesce).

Writing humongous, complex, impossible-to-maintain code that happens to fulfill product requirements is easy. Testing it is hard. Maintaining it is even more difficult. By enforcing testing, many developers will find it easier to refactor code into a more testable architecture and write tests for that, rather than figure out how to test their three-hundred-line monster function that does ten things.
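To make that concrete, here is a small illustration (every name here is mine, not from the post) of the decomposition a testing habit nudges you toward: instead of one function that trims, validates, and normalizes in a tangle of branches, each concern becomes a tiny unit with its own focused test.

```typescript
// Illustrative decomposition (all names hypothetical). Each piece is small
// enough to test exhaustively on its own.
function normalizeEmail(raw: string): string {
  return raw.trim().toLowerCase();
}

function isValidEmail(email: string): boolean {
  // Deliberately simple check, just for the sketch's purposes.
  return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(email);
}

// The composed function stays short, so its test can focus on the wiring
// rather than re-proving every validation branch.
function parseEmail(raw: string): string | null {
  const email = normalizeEmail(raw);
  return isValidEmail(email) ? email : null;
}
```

The three-hundred-line monster version of this would need one giant test exercising every combination of branches at once; the decomposed version needs a handful of short, obvious ones.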

There’s a considerable amount of thinking that TDD forces us to do. It forces us to think through more pieces of our system and its relevant subsystems in order to achieve our product goals. When I hear people bemoan TDD, I try to remind them of this fact. TDD is only slower for a strawman’s understanding of it. (Slower when? For whom?) It’s not just writing a .test.ts file next door to our code, or mechanically writing a test first. It’s a concept to make us think through the problem at hand, isolate it, decompose it, and organize it from the ground up in a way that’s easy to write tests against. In a way, 100% code coverage is almost like a linter for ensuring someone tried to do these steps before committing code, or at least purposefully and obviously chose not to.

There’s a central conceit here that “easy to write unit tests against” is itself desirable, and that it can tend toward better outcomes, clearer code, etc. I think that’s mostly true, but your mileage may vary. In some systems that might not be the case, but in the one I work on, well-tested code is almost always the easiest to follow and understand, regardless of the complexity. What’s the worst that can happen? “This folder is entirely too organized, well-considered, and unit tested! I have all the tools I need to completely understand this subsystem! 😠 Harrumph!” 🙈

Caveat emptor: a bad test is often worse than no test. I’ve seen otherwise well-meaning engineers new to testing who accidentally mock their test subject, and wonder why their code isn’t registered by static analysis tools as covered. I’ve spent days untangling thousand-line tests that mock modules seemingly at random fourteen files down the call stack of the test subject. Folks will use spies instead of mocking heavy dependencies, and ask why their test fails every third time from memory issues.

TDD won’t solve all of your problems. Anything applied absolutely and uncritically probably sucks. TDD is no exception. Make it work for you, keep what you want, and yeet the rest. The wisdom to know how it can or can’t help your team can only come with some time experimenting with it.

My experience and tips

I’ve been living with 100% test coverage for over four years now. Here are some of the chief guidelines I’ve come up with to help ensure our team gets the best out of TDD:

  1. Good testing is all about thinking. Don’t skip this step. It will show. What are the behaviors? What are the constituent parts of those behaviors we need to account for? What do I need to assert against? What level or levels of testing guarantee a balance of maintainability, speed, and correctness?
  2. Use test descriptions to tell your future self what behavior you actually want to test. It’s not always user-level behavior, nor should it be.
  3. Test suites should never be order reliant (e.g., test1.test.ts should never rely on test0.test.ts’s execution). Ensure all test suites can be run in isolation.
  4. Do as much as you can to fully isolate each suite’s test case so that they aren’t inter-reliant (not always possible in long-running end-to-end and integration tests).
  5. Only assert against the work that your test subject is doing. Testing ancillary or even related behavior is going to result in false negatives and a noisy test suite.
  6. Mock only what your test subject can see. coalesce.test.ts should never mock anything that is not imported directly in coalesce.ts. (Yes, this means every file has its adjacent test, with few exceptions.)
  7. Aggressively reduce your feedback loops. The shorter the gap between pressing save and seeing the result of your change, the faster you can write code and validate the behavior of your system.
  8. Use // istanbul ignore next - reason liberally. It exists for a reason, and in TDD-land, it’s to let code reviewers know “I am purposefully not testing this branch of code, here’s why.” It’s a place for conversation if one needs to occur. (Most of ours are completely uncontroversial noop and early-return scenarios. We also often come up with new time-saving helpers when we see enough of these. That’s where coalesce came from!)
  9. Tests are not always a good place to apply certain application design principles, because it’s often the point where you’re exercising unusual runtime behavior. You sometimes have to use static, plain-text identifiers instead of dynamic enums, duplicate instead of share the occasional piece of setup code, and repeat certain fixtures to keep your test as linear and self-contained as possible.
  10. Test the behavior, not the implementation. (Also stated as “test the feature, not the implementation.”) Write test code so that you don’t assert how the function arrived at its outputs, only that it did. You’re not a third-grade math teacher. Don’t make the computer show its work.
  11. Where possible, safe, and useful, avoid testing your tools. E.g., we tend to assume React event handlers work as advertised and write our components to test async handler logic separately from React and the DOM.
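Tips 10 and 11 pair naturally, so here is a hedged sketch of what they might look like together (all names are hypothetical, not from the post): the async logic a React onClick would invoke lives in a plain function, and the test asserts on what it returns, not on how it got there or how the framework called it.

```typescript
// Hypothetical sketch: extract the async handler logic from the component
// so the test exercises behavior (the returned state) without rendering
// React, touching the DOM, or spying on internals.
type SaveResult =
  | { status: "saved" }
  | { status: "error"; message: string };

async function submitProfile(
  name: string,
  save: (name: string) => Promise<void>, // injected so tests need no network
): Promise<SaveResult> {
  if (name.trim() === "") {
    return { status: "error", message: "Name is required" };
  }
  try {
    await save(name.trim());
    return { status: "saved" };
  } catch (err) {
    return { status: "error", message: String(err) };
  }
}
```

A component’s handler simply calls `submitProfile` and renders the result; the test asserts on the result values, never on how many times `save` was invoked or in what order, and it trusts React to deliver the click.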

Using this advice, at WorkTango, we’ve been able to maintain 100% test coverage and get really good at writing tests and knowing our test tools. Of course, it’s not all cinnamon and rainbows: time and inexperience often combine to result in various problems with tests, most of which we’ve addressed over time with some of the guidance above. I’m sure we will find more, though. It’s easy to forget that being able to use machines to validate the results of our work isn’t something that comes naturally to most people.

Just keep in mind that every problematic test is a new learning opportunity! 😊