The “Opaque Box” Data Pattern

a person with a cardboard box over their head

The “Opaque Box” Data Pattern is an anti-pattern, and one that I have seen time and time again throughout my 25+ year career.

This anti-pattern starts with a highly optimized data storage and querying facility. It could be a relational database or something schemaless; it doesn’t matter. From there, the pattern ignores the immense complexity, years of software evolution, and remarkable flexibility of that datastore and effectively says, “nah, I’m good.” It takes the important bits of the data, encodes them in an opaque binary structure, and unceremoniously shoves them into the datastore.

A gif of a car in a crash test slamming into a barrier and being destroyed in the process
Here we see someone trying to query a datastore where the important bits are in a binary blob.

In my experience, this pattern is usually employed by folks who have just learned about structured data transport formats like Protocol Buffers, Thrift, or Avro. Now, these formats aren’t inherently bad; they’re wonderful for communicating across services, either directly or via a message queue. They can even be useful in databases under the right circumstances (starting with the choice to store them in their respective JSON formats).

If, however, you look at these formats and think, “wow, look at how small the data is when it’s encoded as binary”, you’re on the right track… for data transfer purposes. If you think the same thing and then take that opaque, binary-encoded blob and shove it into a database where you might want to query the data, you need to take a step back and reevaluate your architecture. This is the equivalent of filling your pantry with all kinds of boxes and cans of food, then painting all of the boxes and cans the same color so you have to open each one to find out which one contains the Cheerios that you’re looking for.

Three nondescript cans, all black
Here you can see a can of paint, a can of motor oil, and a can of soup. Choose wisely.
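To make the pantry analogy concrete, here is a minimal sketch using Python’s built-in sqlite3 and struct modules. Everything in it is an assumption for illustration: the table names, the pantry items, and the fixed binary layout, which merely stands in for a Protocol Buffers, Thrift, or Avro payload. The point is only the contrast between decoding every row in application code and letting the datastore answer the question.

```python
import sqlite3
import struct

# Hypothetical binary layout standing in for an encoded message:
# a 16-byte, null-padded name followed by an unsigned 32-bit price in cents.
BLOB_LAYOUT = "<16sI"

ITEMS = [("Cheerios", 499), ("Soup", 199), ("Motor oil", 1299)]


def encode(name, price_cents):
    return struct.pack(BLOB_LAYOUT, name.encode("utf-8"), price_cents)


def decode(payload):
    raw_name, price_cents = struct.unpack(BLOB_LAYOUT, payload)
    return raw_name.rstrip(b"\x00").decode("utf-8"), price_cents


conn = sqlite3.connect(":memory:")
# The "painted cans" table: the database only sees an opaque blob.
conn.execute("CREATE TABLE pantry_blob (id INTEGER PRIMARY KEY, payload BLOB)")
# The labeled table: the database knows about every field and can index it.
conn.execute(
    "CREATE TABLE pantry_cols (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)"
)
conn.execute("CREATE INDEX idx_pantry_name ON pantry_cols (name)")

for name, price in ITEMS:
    conn.execute("INSERT INTO pantry_blob (payload) VALUES (?)", (encode(name, price),))
    conn.execute(
        "INSERT INTO pantry_cols (name, price_cents) VALUES (?, ?)", (name, price)
    )


def find_in_blobs(conn, wanted):
    """Open the cans one by one: fetch and decode rows until a name matches."""
    opened = 0
    for (payload,) in conn.execute("SELECT payload FROM pantry_blob ORDER BY id"):
        opened += 1
        name, price = decode(payload)
        if name == wanted:
            return price, opened
    return None, opened


def find_in_columns(conn, wanted):
    """Read the label: the database filters on the indexed column itself."""
    row = conn.execute(
        "SELECT price_cents FROM pantry_cols WHERE name = ?", (wanted,)
    ).fetchone()
    return row[0] if row else None
```

With three rows the difference is invisible. With tens of millions of rows, the blob version has to pull every payload out of the database and decode it before it can answer anything, while the column version is an ordinary indexed lookup.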

I have seen this anti-pattern employed on more than one occasion. In a recent example, the data stored in the binary blob had a strict schema which required migration if any changes were made. A migration of tens of millions of records. You know what’s really good at storing data with strict schemas? A relational database. Want some flexibility in stored data to avoid constant migrations? Maybe consider a schemaless database like CouchDB or MongoDB.
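A rough sketch of what that kind of migration looks like, again with Python’s sqlite3 and struct modules. The layouts, table names, and the new quantity field are all hypothetical, and the fixed struct layout is an assumption that mimics a blob format where any change forces a rewrite; it is not the system described above.

```python
import sqlite3
import struct

# Hypothetical strict layouts: adding a field changes the size of every record.
OLD_LAYOUT = "<16sI"   # name, price_cents
NEW_LAYOUT = "<16sII"  # name, price_cents, quantity (the new field)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items_blob (id INTEGER PRIMARY KEY, payload BLOB)")
conn.execute(
    "CREATE TABLE items_cols (id INTEGER PRIMARY KEY, name TEXT, price_cents INTEGER)"
)
for name, price in [("Cheerios", 499), ("Soup", 199)]:
    conn.execute(
        "INSERT INTO items_blob (payload) VALUES (?)",
        (struct.pack(OLD_LAYOUT, name.encode("utf-8"), price),),
    )
    conn.execute(
        "INSERT INTO items_cols (name, price_cents) VALUES (?, ?)", (name, price)
    )


def migrate_blobs(conn):
    """Decode every payload with the old layout and re-encode it with the new one."""
    rows = conn.execute("SELECT id, payload FROM items_blob").fetchall()
    for row_id, payload in rows:
        raw_name, price = struct.unpack(OLD_LAYOUT, payload)
        new_payload = struct.pack(NEW_LAYOUT, raw_name, price, 0)
        conn.execute(
            "UPDATE items_blob SET payload = ? WHERE id = ?", (new_payload, row_id)
        )
    return len(rows)  # number of records that had to be rewritten


def migrate_columns(conn):
    """Let the database handle the schema change with a single statement."""
    conn.execute(
        "ALTER TABLE items_cols ADD COLUMN quantity INTEGER NOT NULL DEFAULT 0"
    )


rewritten = migrate_blobs(conn)
migrate_columns(conn)
```

The blob migration touches every single record from application code. The column migration is one statement; how cheap it is depends on the database engine, so treat this as an illustration rather than a benchmark.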

The reality is that sometimes engineers want to build complex solutions to solve simple problems. However, it’s been my experience that engineers should always, without exception, be looking to solve any problem with the simplest, most straightforward approach that meets the requirements and delivers business value.

Being pragmatic includes thinking about the long-term consequences of your decisions. Don’t be the engineer that paints all of the boxes and cans the same color.

Cover photo by Ryanniel Masucol from Pexels