The Problem

Code that passes its own tests but doesn't implement the specification.

Every SDK ported from a reference implementation inherits a predictable class of bug. Two errors cancel out, round-trips succeed, CI is green, yet the wire format diverges from what a conforming peer would produce.
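A minimal sketch of how two errors cancel out. Everything here is hypothetical: the framing format, the function names, and the assumed rule that the spec requires a little-endian length prefix. The port gets the byte order wrong in both directions, so its own round-trip test passes while its bytes differ from a conforming peer's.

```python
import struct

def encode_port(payload: bytes) -> bytes:
    # Bug: big-endian length prefix. The assumed spec requires little-endian.
    return struct.pack(">I", len(payload)) + payload

def decode_port(frame: bytes) -> bytes:
    # Matching bug: reads the prefix as big-endian, cancelling the encoder's error.
    (length,) = struct.unpack(">I", frame[:4])
    return frame[4:4 + length]

def encode_spec(payload: bytes) -> bytes:
    # What a conforming peer would emit under the assumed spec.
    return struct.pack("<I", len(payload)) + payload

payload = b"hello"
round_trip_ok = decode_port(encode_port(payload)) == payload
wire_matches_spec = encode_port(payload) == encode_spec(payload)
print(round_trip_ok, wire_matches_spec)  # True False
```

The round-trip check executes every line of the port and passes. Only a comparison against bytes produced outside the port exposes the divergence.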

We call this the "letter vs spirit" pattern. We found it independently in two SDKs, across four languages, months apart. It's not random — it's a systemic consequence of the porting process.

Why test coverage can't catch it

Test coverage measures execution. "Did this code run during a test?" A line counter can answer that. It's a binary, mechanical question.

Compliance coverage is a different question entirely: "Does this code correctly implement Section 3.2 of BRC-42?" That's semantic. It requires reading two codebases, understanding intent, and judging whether the implementation honours the spirit of a specification — not just whether it passes the tests someone wrote from the same translation.

That's an LLM-shaped task. The line counter can't do it. The test suite can't do it — the tests were probably written alongside the buggy code by the same developer who made the same translation mistake.

Proven in the wild

Two independent compliance reviews, weeks apart, on different SDKs written in different languages by different developers:

Ruby SDK: 137 findings, 21 HIGH severity
Swift SDK: 134 findings, 14 HIGH severity

The bugs were the same. Not the same line numbers, but the same classes:

Money-loss sighash bugs — sourceSatoshis silently defaulting to 0, producing transactions that compute but never validate on-chain
Chronicle opcodes treated as no-ops — same consensus-critical bug in both SDKs, independently implemented
Cross-SDK auth handshake failures — nonce encoding divergences that produce valid-looking but incompatible output
BEEF serialisation corruption — V1/V2 hybrid output that round-trips locally but breaks cross-SDK
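The first class in that list can be sketched in a few lines. This is an illustration, not either SDK's code: the function names are hypothetical and the digest is a simplified stand-in for a real sighash preimage. The only property it keeps is that the spent amount is committed to the hash.

```python
import hashlib
import struct
from typing import Optional

def preimage_digest(outpoint: bytes, source_satoshis: int) -> bytes:
    # Simplified stand-in: double SHA-256 over the outpoint and the spent amount.
    data = outpoint + struct.pack("<Q", source_satoshis)
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def sighash_lenient(outpoint: bytes, source_satoshis: Optional[int] = None) -> bytes:
    # Bug class: a missing amount silently becomes 0, so a digest is still produced.
    return preimage_digest(outpoint, source_satoshis or 0)

def sighash_strict(outpoint: bytes, source_satoshis: Optional[int] = None) -> bytes:
    # Refuses to guess: a missing amount is an error, not a default.
    if source_satoshis is None:
        raise ValueError("source_satoshis is required to compute the sighash")
    return preimage_digest(outpoint, source_satoshis)

outpoint = bytes(36)   # placeholder txid + output index
actual_amount = 1000   # what the spent output really holds (made-up value)

wrong = sighash_lenient(outpoint)               # computes without complaint
right = sighash_strict(outpoint, actual_amount)
print(wrong != right)  # True
```

The lenient version never fails locally, which is why a test suite written alongside it stays green. A signature made over the wrong digest is only rejected by a verifier that knows the real amount.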

Two different teams making the same mistakes isn't a fluke. It's a structural property of the porting process, so any SDK ported from a reference should be expected to show the same pattern. The question isn't whether your code has these bugs. It's whether you've found them yet.

The insight: bugs that recur across independent ports are predictable, which means they're preventable if you know what to look for.

What catches it

Three things, together:

Reference comparison — does this implementation match other implementations of the same spec?
Specification authority — does this implementation match the authoritative spec (BRC, RFC, standard)?
Mediation — when the reference and the spec disagree, which one is wrong?

That's three independent perspectives on the same code. No single one catches the letter-vs-spirit pattern reliably. The combination does. That's why we built the service around triumvirates: three bearings give a fix, two give ambiguity.
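A rough sketch of why the third bearing matters. It reduces each perspective to a byte comparison, which is a deliberate simplification: the real judgment is semantic. The names and verdict labels are assumptions made for illustration.

```python
from enum import Enum

class Verdict(Enum):
    CONFORMS = "all three agree"
    IMPLEMENTATION_DIVERGES = "reference and spec agree; implementation differs"
    NEEDS_MEDIATION = "reference and spec disagree"

def classify(impl: bytes, reference: bytes, spec_vector: bytes) -> Verdict:
    if reference != spec_vector:
        # The two authorities disagree, so someone has to decide which is wrong.
        return Verdict.NEEDS_MEDIATION
    if impl != spec_vector:
        return Verdict.IMPLEMENTATION_DIVERGES
    return Verdict.CONFORMS

def two_way_agrees(impl: bytes, reference: bytes) -> bool:
    # Reference comparison alone: an inherited bug looks like agreement.
    return impl == reference

# An inherited bug: the port copied the reference's mistake.
impl = reference = b"\x00\x00\x00\x05"
spec_vector = b"\x05\x00\x00\x00"

print(two_way_agrees(impl, reference))              # True
print(classify(impl, reference, spec_vector).name)  # NEEDS_MEDIATION
```

With two inputs the inherited bug passes as a match. With three it surfaces as a disagreement that has to be resolved.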