Serialization in C++

The problem

Game engines operate on data of various kinds with different requirements.

Networking code: high compression requirements and untrusted inputs.
Assets: the format must be readable by different binaries.
Level data: changes often and should remain backward compatible.

I was looking for a serialization solution that handles all of this. It also had to play nice with the engine's arena-based memory management and custom types (Mat4, Vec3, String8), without destroying compile times with layers of template magic.

I tried FlatBuffers, Cap'n Proto, bitsery, and a bunch of smaller libraries. Long story short: none of them fit. Code generation tools are huge, painful to integrate with custom types, and conflict with the engine's own codegen (metagen). Generic C++ solutions are template nightmares. C libraries that make you write serialization by hand are error prone, and I hate debugging binary serialization bugs with a passion.

In an ideal world of infinite time and resources I would just write a codegen tool myself, but I want to release something this century. So I stepped back and thought about what I actually need.

What does a serializer even need to know?

Given the following structures:

struct Entity
{
  Vec3 position;
  f32 health;
};

struct Level
{
  String8 name;
  loops2::Array<Entity> entities;
  Mat4 camera;
};

what information does a serialization framework need to produce binary, JSON, and UI code from them?

For each field it needs: name, type, and value. And the "object" structure — each entity in "entities" is a separate object, not an array of random binary data.

That's... not a lot. Assuming I'm open to writing some code and don't want to implement reflection in a language that doesn't support it, I need something along the lines of:

struct Reflect {
  void field(String8 field_name, String8 &value);
  void field(String8 field_name, Vec3 &value);
  void field(String8 field_name, f32 &value);
  void field(String8 field_name, Mat4 &value);
  template<typename T> void field(String8 field_name, loops2::Array<T> &array);
};

There is one overloaded field() method per supported type. The compiler picks the right overload based on the field's type, so the call site always looks the same: r.field("name", value). The template overload handles arrays: it iterates over the elements and calls the user-written reflect() function for each one, which dispatches back into field() calls. That's what makes the whole thing compose: structs containing structs containing arrays of structs all just work.

Then I write one reflect function per struct:

template<typename R>
void reflect(R &r, Entity &o)
{
  r.field("position", o.position);
  r.field("health", o.health);
}

template<typename R>
void reflect(R &r, Level &o)
{
  r.field("name", o.name);
  r.field("entities", o.entities);
  r.field("camera", o.camera);
}

And that's it. The R template parameter is the serializer — could be a JSON writer, a binary encoder, a UI widget generator, or a JSON reader, a binary decoder... you get the idea. Write the reflect function once, get all formats for free.
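To make the dispatch concrete, here is a minimal sketch of what a JSON writer could look like. It is not the engine's code: std::string and std::vector stand in for String8 and loops2::Array, Mat4 is left out, and JsonWriter, num(), key(), finish() and to_json() are names made up for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Stand-ins for the engine types (assumptions, not the real definitions).
using f32 = float;
struct Vec3 { f32 x, y, z; };

struct Entity { Vec3 position; f32 health; };
struct Level  { std::string name; std::vector<Entity> entities; };

template<typename R> void reflect(R &r, Entity &o) {
  r.field("position", o.position);
  r.field("health", o.health);
}
template<typename R> void reflect(R &r, Level &o) {
  r.field("name", o.name);
  r.field("entities", o.entities);
}

// Hypothetical JSON writer: one field() overload per supported type.
// Strings are not escaped, to keep the sketch short.
struct JsonWriter {
  std::string out = "{";
  bool first = true;

  static std::string num(f32 v) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%g", v);
    return buf;
  }
  void key(const char *name) {
    if (!first) out += ",";
    first = false;
    out += std::string("\"") + name + "\":";
  }
  void field(const char *name, f32 &v) { key(name); out += num(v); }
  void field(const char *name, std::string &v) { key(name); out += "\"" + v + "\""; }
  void field(const char *name, Vec3 &v) {
    key(name);
    out += "[" + num(v.x) + "," + num(v.y) + "," + num(v.z) + "]";
  }
  // Arrays: every element becomes its own JSON object, filled in by the
  // element's reflect(), which calls back into the field() overloads.
  template<typename T> void field(const char *name, std::vector<T> &array) {
    key(name);
    out += "[";
    for (size_t i = 0; i < array.size(); i++) {
      if (i > 0) out += ",";
      JsonWriter element;
      reflect(element, array[i]);
      out += element.finish();
    }
    out += "]";
  }
  std::string finish() const { return out + "}"; }
};

template<typename T> std::string to_json(T &value) {
  JsonWriter w;
  reflect(w, value);
  return w.finish();
}

inline void example() {
  Level lvl;
  lvl.name = "demo";
  lvl.entities.push_back(Entity{Vec3{1.0f, 2.0f, 3.0f}, 50.0f});
  assert(to_json(lvl) ==
         "{\"name\":\"demo\",\"entities\":[{\"position\":[1,2,3],\"health\":50}]}");
}
```

The reflect functions never mention JSON. Everything format-specific lives in the writer, which is why swapping in another serializer type costs nothing at the call sites.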

The neat trick is that the same reflect function works for both reading and writing. When serializing, the serializer reads from the field references and writes to an output buffer. When deserializing, it reads from an input buffer and writes into the same field references. Same function, same signature, opposite data flow direction.
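A rough sketch of that symmetry, under the same kind of assumptions as before: std::string and std::vector replace the engine types, BinaryWriter and BinaryReader are hypothetical names, and the layout (raw bytes, 32-bit length prefixes, host endianness) is just the simplest thing that round-trips. The reader does no bounds checking, so it is not suitable for untrusted input as written.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using f32 = float;
struct Vec3 { f32 x, y, z; };
struct Entity { Vec3 position; f32 health; };
struct Level  { std::string name; std::vector<Entity> entities; };

template<typename R> void reflect(R &r, Entity &o) {
  r.field("position", o.position);
  r.field("health", o.health);
}
template<typename R> void reflect(R &r, Level &o) {
  r.field("name", o.name);
  r.field("entities", o.entities);
}

// Writing: copy bytes out of the referenced field into the buffer.
struct BinaryWriter {
  std::vector<uint8_t> buf;
  void raw(const void *p, size_t n) {
    const uint8_t *b = static_cast<const uint8_t *>(p);
    buf.insert(buf.end(), b, b + n);
  }
  void field(const char *, f32 &v)  { raw(&v, sizeof v); }
  void field(const char *, Vec3 &v) { raw(&v, sizeof v); }
  void field(const char *, std::string &v) {
    uint32_t n = static_cast<uint32_t>(v.size());
    raw(&n, sizeof n);
    raw(v.data(), n);
  }
  template<typename T> void field(const char *, std::vector<T> &a) {
    uint32_t n = static_cast<uint32_t>(a.size());
    raw(&n, sizeof n);
    for (T &e : a) reflect(*this, e);
  }
};

// Reading: copy bytes from the buffer into the very same references.
struct BinaryReader {
  const std::vector<uint8_t> &buf;
  size_t pos = 0;
  void raw(void *p, size_t n) { std::memcpy(p, buf.data() + pos, n); pos += n; }
  void field(const char *, f32 &v)  { raw(&v, sizeof v); }
  void field(const char *, Vec3 &v) { raw(&v, sizeof v); }
  void field(const char *, std::string &v) {
    uint32_t n = 0;
    raw(&n, sizeof n);
    v.resize(n);
    raw(&v[0], n);
  }
  template<typename T> void field(const char *, std::vector<T> &a) {
    uint32_t n = 0;
    raw(&n, sizeof n);
    a.resize(n);
    for (T &e : a) reflect(*this, e);
  }
};

inline void example() {
  Level a;
  a.name = "demo";
  a.entities.push_back(Entity{Vec3{1.0f, 2.0f, 3.0f}, 50.0f});

  BinaryWriter w;
  reflect(w, a);           // fields -> bytes

  Level b;
  BinaryReader r{w.buf};
  reflect(r, b);           // bytes -> fields, same reflect()

  assert(b.name == "demo");
  assert(b.entities.size() == 1 && b.entities[0].health == 50.0f);
}
```

The two serializers mirror each other line for line, and the only thing that changes between them is which side of the copy the field reference sits on.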

Since there are three formats (binary, JSON, UI) and two directions (read/write) per format, that's up to six template instantiations per struct. Each one expands to roughly the code I would otherwise write by hand, so the compile time cost is barely noticeable.

Custom logic where you actually need it

Here's the thing that makes this approach really nice: you can add custom serialization logic without fighting the framework. Since the serializer type is a template parameter, you can branch on it with if constexpr:

template<typename R>
void reflect(R &r, Entity &o)
{
  r.field("position", o.position);
  r.field("health", o.health);

  if constexpr (R::is_ui) {
    // show a slider for health in the editor, clamp to [0, 100]
    r.set_range("health", 0.0f, 100.0f);
  }

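  if constexpr (R::is_binary_reader) {
    // migration: old saves don't have stamina, default to 50
    // (assumes Entity has gained an f32 stamina member)
    r.field_with_default("stamina", o.stamina, 50.0f);
  } else {
    // every other serializer treats stamina as a regular field
    r.field("stamina", o.stamina);
  }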
}

The branches that don't match the serializer type get compiled away entirely — zero runtime cost. So you get one place in the code that describes everything about how a struct is serialized across all formats, with per-format customization right there where you can see it.
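One way the flags could be wired up is as static constexpr members on each serializer. The sketch below assumes exactly that; UiBuilder, FieldCounter and the Widget record are invented for illustration, and Entity is reduced to a single field. It needs C++17 for if constexpr.

```cpp
#include <cassert>
#include <string>
#include <vector>

using f32 = float;
struct Entity { f32 health; };

// Hypothetical UI serializer: records one widget description per field.
struct UiBuilder {
  static constexpr bool is_ui = true;
  struct Widget { std::string name; bool has_range; f32 lo, hi; };
  std::vector<Widget> widgets;

  void field(const char *name, f32 &) {
    widgets.push_back({name, false, 0.0f, 0.0f});
  }
  void set_range(const char *name, f32 lo, f32 hi) {
    for (Widget &w : widgets)
      if (w.name == name) { w.has_range = true; w.lo = lo; w.hi = hi; }
  }
};

// Hypothetical non-UI serializer. Note that it has no set_range() at all.
struct FieldCounter {
  static constexpr bool is_ui = false;
  int count = 0;
  void field(const char *, f32 &) { count++; }
};

template<typename R> void reflect(R &r, Entity &o) {
  r.field("health", o.health);
  if constexpr (R::is_ui) {
    // Discarded for FieldCounter, so this call is never instantiated there.
    r.set_range("health", 0.0f, 100.0f);
  }
}

inline void example() {
  Entity e{75.0f};

  UiBuilder ui;
  reflect(ui, e);
  assert(ui.widgets.size() == 1 && ui.widgets[0].has_range);

  FieldCounter counter;
  reflect(counter, e);   // compiles even though set_range() is missing
  assert(counter.count == 1);
}
```

Because the condition depends on R, the discarded branch is not instantiated, so a serializer only has to implement the methods its own branches use.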

The field_with_default in the binary reader example does imply the serializer can handle missing fields — that's not free for a binary format. In practice, the binary serializer tags fields, so it knows when a field is absent and can fall back to the provided default. It costs a few extra bytes per field, but that's a trade-off I'm happy to make for painless data migration.
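The text above doesn't spell out the tag format, so the following is only one plausible shape for it. It assumes each field is stored as a record of a 32-bit FNV-1a hash of the field name, a 32-bit payload size, and the payload. TaggedWriter, TaggedReader, tag_of(), EntityV1 and EntityV2 are all hypothetical names, and only f32 fields are handled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using f32 = float;

// Assumed tag: 32-bit FNV-1a hash of the field name.
inline uint32_t tag_of(const char *name) {
  uint32_t h = 2166136261u;
  for (; *name; name++) { h ^= static_cast<uint8_t>(*name); h *= 16777619u; }
  return h;
}

struct TaggedWriter {
  static constexpr bool is_binary_reader = false;
  std::vector<uint8_t> buf;
  void put(const void *p, size_t n) {
    const uint8_t *b = static_cast<const uint8_t *>(p);
    buf.insert(buf.end(), b, b + n);
  }
  // Record layout: [tag:4][size:4][payload:size]
  void field(const char *name, f32 &v) {
    uint32_t tag = tag_of(name), size = sizeof v;
    put(&tag, 4); put(&size, 4); put(&v, size);
  }
};

struct TaggedReader {
  static constexpr bool is_binary_reader = true;
  const std::vector<uint8_t> &buf;

  // Linear scan over the records; returns false when the tag is absent.
  bool find(uint32_t tag, void *out, uint32_t want) const {
    size_t pos = 0;
    while (pos + 8 <= buf.size()) {
      uint32_t t, size;
      std::memcpy(&t, buf.data() + pos, 4);
      std::memcpy(&size, buf.data() + pos + 4, 4);
      pos += 8;
      if (size > buf.size() - pos) return false;  // truncated record
      if (t == tag && size == want) {
        std::memcpy(out, buf.data() + pos, size);
        return true;
      }
      pos += size;  // unknown or mismatched field: skip its payload
    }
    return false;
  }
  void field(const char *name, f32 &v) { find(tag_of(name), &v, sizeof v); }
  void field_with_default(const char *name, f32 &v, f32 def) {
    if (!find(tag_of(name), &v, sizeof v)) v = def;
  }
};

struct EntityV1 { f32 health; };               // what old saves contain
struct EntityV2 { f32 health; f32 stamina; };  // current struct

template<typename R> void reflect(R &r, EntityV1 &o) { r.field("health", o.health); }
template<typename R> void reflect(R &r, EntityV2 &o) {
  r.field("health", o.health);
  if constexpr (R::is_binary_reader) r.field_with_default("stamina", o.stamina, 50.0f);
  else r.field("stamina", o.stamina);
}

inline void example() {
  EntityV1 old_save{30.0f};
  TaggedWriter w;
  reflect(w, old_save);            // 12 bytes: one tagged f32

  EntityV2 loaded{0.0f, 0.0f};
  TaggedReader r{w.buf};
  reflect(r, loaded);
  assert(loaded.health == 30.0f && loaded.stamina == 50.0f);
}
```

Here the "few extra bytes" are eight per field. A real implementation would probably want something smarter than a linear scan and a plan for hash collisions, but the fallback logic stays this small.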

This is the thing you actually want from a serializer anyway: the ability to say "binary format does X, JSON does Y, UI shows Z" without scattering that logic across different files or fighting some framework's abstraction. It's just C++ doing C++ things.

The journey there

For the curious, here's the longer version of how I arrived at this.

I started with FlatBuffers. It seemed like an obvious choice — "gamedev oriented" advertising and all that. I forked it, integrated custom types, hooked up arena allocation. The codebase is small and easy to change, so the basics went fast.

But then the problems piled up. Schema files get ugly quickly when you need to control memory layout. And the fundamental issue is that FlatBuffers generates both the serialized structure and the runtime structure. At first this feels nice — less work! But "how to optimally store things" and "what's the optimal format to process them at runtime" rarely have the same answer, and both evolve independently. So you end up either fighting the generated code or adding a third structure to translate between them.

Then I tried Cap'n Proto. Its author, Kenton Varda, was also the primary author of Protocol Buffers version 2 (the version Google open-sourced), and he has strong opinions on how serialization should work, which I respect. It only generates the wire format structures, leaving the runtime structures to you, which fits the engine's C spirit much better. But actually writing the conversion code between Cap'n Proto types and engine types turned out to be just as painful as I feared.

After that I tried bitsery, got the traits working for the engine's containers, and then realized I'd need to do the same thing for JSON. And then again for UI. Three times the same tedious work for every struct.

That's when I finally sat down and asked "what does the serializer actually need to know?" and the answer turned out to be embarrassingly simple.

Nothing new under the sun

This approach isn't novel. It's basically a manual visitor pattern, or a poor man's reflection if you want to be fancy about it. But that's the point: it's simple, it's obvious, and it works. No schema files, no external tools, no template metaprogramming that takes a PhD to debug. Just overloaded methods and a template parameter.

It's easy to extend to handle versioning, value limits, array size limits, and whatever else comes up. You just add methods to the serializer struct. The reflect functions stay clean and readable.
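As an illustration of that kind of extension, here is a sketch of value limits and array size limits expressed as extra field() overloads on a bounds-checked reader, which is the sort of thing untrusted network input calls for. Everything in it is an assumption: CheckedReader, the ok flag, the overload signatures, the Unit and Squad structs, and the append() helper used to build test buffers. A matching writer would need the same overloads and could simply ignore the limits.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

using f32 = float;
struct Unit  { f32 health; };
struct Squad { std::vector<Unit> units; };

template<typename R> void reflect(R &r, Unit &o) {
  r.field("health", o.health, 0.0f, 100.0f);  // value limit
}
template<typename R> void reflect(R &r, Squad &o) {
  r.field("units", o.units, 64u);             // array size limit
}

// Hypothetical reader for untrusted input: every read is bounds-checked and
// any violation clears `ok` instead of touching memory it should not.
struct CheckedReader {
  const uint8_t *data;
  size_t size;
  size_t pos = 0;
  bool ok = true;

  bool take(void *out, size_t n) {
    if (!ok || n > size - pos) { ok = false; return false; }
    std::memcpy(out, data + pos, n);
    pos += n;
    return true;
  }
  void field(const char *, f32 &v, f32 lo, f32 hi) {
    // Written so that NaN fails the range check too.
    if (take(&v, sizeof v) && !(v >= lo && v <= hi)) ok = false;
  }
  template<typename T>
  void field(const char *, std::vector<T> &a, uint32_t max_count) {
    uint32_t n = 0;
    if (!take(&n, sizeof n)) return;
    if (n > max_count) { ok = false; return; }  // reject before allocating
    a.resize(n);
    for (T &e : a) reflect(*this, e);
  }
};

// Test helper: append the raw bytes of a value to a buffer.
template<typename T> void append(std::vector<uint8_t> &buf, T v) {
  const uint8_t *b = reinterpret_cast<const uint8_t *>(&v);
  buf.insert(buf.end(), b, b + sizeof v);
}

inline void example() {
  std::vector<uint8_t> hostile;
  append<uint32_t>(hostile, 0xFFFFFFFFu);  // claims ~4 billion units

  Squad s;
  CheckedReader r{hostile.data(), hostile.size()};
  reflect(r, s);
  assert(!r.ok && s.units.empty());
}
```

The limits sit next to the field they apply to, so the reflect function still reads as a plain description of the struct.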