The Problem

Code that passes its own tests but doesn't implement the specification.

Every SDK ported from a reference implementation inherits a predictable class of bug: code that passes all tests but doesn't correctly implement the specification. Two errors cancel out, round-trips succeed, CI is green — but the wire format diverges from what a conforming peer would produce.

We call this the "letter vs spirit" pattern. We found it independently in two SDKs, across four languages, months apart. It's not random — it's a systemic consequence of the porting process.

Why test coverage can't catch it

Test coverage measures execution. "Did this code run during a test?" A line counter can answer that. It's a binary, mechanical question.

Compliance coverage is a different question entirely: "Does this code correctly implement Section 3.2 of BRC-42?" That's semantic. It requires reading two codebases, understanding intent, and judging whether the implementation honours the spirit of a specification — not just whether it passes the tests someone wrote from the same translation.

That's an LLM-shaped task. The line counter can't do it. The test suite can't do it — the tests were probably written alongside the buggy code by the same developer who made the same translation mistake.

Proven in the wild

Two independent compliance reviews, weeks apart, on different SDKs written in different languages by different developers:

The bugs were the same. Not identical line numbers — but the same classes:

Two different teams making the same mistakes isn't a fluke. It's a structural property of the porting process. Any SDK ported from any reference will produce this pattern. The question isn't whether your code has these bugs. It's whether you've found them yet.

The insight: if two independent ports produce the same bugs, the bugs aren't random. They're predictable. Which means they're preventable — if you know what to look for.

What catches it

Three things, together:

  1. Reference comparison — does this implementation match other implementations of the same spec?
  2. Specification authority — does this implementation match the authoritative spec (BRC, RFC, standard)?
  3. Mediation — when the reference and the spec disagree, which one is wrong?

That's three independent perspectives on the same code. No single one catches the letter-vs-spirit pattern reliably. The combination does. That's why we built the service around triumvirates: three bearings give a fix, two give ambiguity.