# Conformance Testing

Natural language is not a spec. A spec is something a machine can check. The conformance suite is the spec.

Every chapter in this book has described the Canon in words. Words admit interpretations. "Canonicalize the attestation excluding the seal field" is clear enough in English, but the implementation has to decide: does "excluding" mean omitting the key seal from the JSON object before serializing, or does it mean setting seal to null? The RFC 8785 canonicalization algorithm produces different bytes for those two inputs. One of them is correct. The conformance suite tells you which.

This chapter covers what conformance means, how to test it, and why the test suite is itself the authoritative definition of "correct."

## At a glance

- Four conformance tiers must be passed in order: schema (Tier 1), canonicalization (Tier 2), seven-step (Tier 3), and chain integrity (Tier 4). A Tier 1 failure disqualifies an implementation from Tier 2, and so on; passing all four is the minimum bar for production use.
- The canonicalization corpus (286,000+ cyberphone vectors) catches parser differences between implementations that unit tests written by the same team as the implementation will never surface. The four failure categories (UTF-16 sort, ECMA-262 number formatting, lone surrogates, numeric edges) each have a predictable implementation bug behind them.
- The tiers differ in what a failure means: a Tier 4 chain-integrity failure has an unambiguous legal implication (the audit trail was altered), while Tier 2 failures are engineering defects that must be corrected before any attestation produced by the implementation can be trusted.

## Learning objectives

By the end of this chapter you should be able to:

1. Run the schema conformance tier (Tier 1) against a custom Pydantic model by calling Attestation.model_validate() on a fixture set and reporting accept/reject decisions, including on intentional near-misses such as capitalization variants of enum values.
2. Generate a canonicalization test vector from an attestation under test and verify it byte-for-byte against a second implementation using the cyberphone corpus as the oracle.
3. Explain why the cyberphone corpus tier (Tier 2) catches parser bugs that unit tests miss: specifically, why tests written by the implementation author cannot cover the boundary cases that a shared external corpus covers.
4. Implement the chain integrity check (Tier 4) against a live Postgres instance: walk audit_log in id order, recompute each row's chain hash, and report the first sequence number where the recomputed hash diverges from the stored value.

## Why natural language fails

The TLS 1.3 standardization story makes the point better than any analogy. RFC 8446 is a 160-page document describing the TLS 1.3 protocol. It was written by experts, reviewed by more experts, and iterated over several years. It is a good specification. It was not enough.

Before RFC 8446 was published, the standards process ran implementations from Google, Mozilla, Cloudflare, and Apple against a shared test corpus. That corpus found 23 interoperability issues, all of which had passed each implementation's own unit tests. The issues ranged from key-schedule computation to alert message handling to session resumption behavior. None of them were ambiguous in intent; they were all ambiguous in the boundary cases of implementation detail.

The shared test corpus closed those issues before the RFC was published. The RFC was the document; the corpus was the spec.

Canon is younger than TLS and its corpus is smaller. The principle is identical. The nora-canon-conformance-suite is not a supplement to the Canon specification. It is the executable form of the specification. An implementation that passes the suite is conformant. An implementation that fails the suite is wrong, even if its author can construct a plausible reading of the spec text that says otherwise.

> ▼ Why It Matters: Two verifiers, one court.
>
> In the 2026 TPR proceeding, the parent's attorney submitted an attestation as a court exhibit. Opposing counsel hired an independent expert to verify it. The expert's verifier was a different implementation: a Go binary written by a forensic software firm. If the two verifiers disagreed on the verdict, the exhibit would have faced a foundational challenge: which verifier is right? The answer, for a conformance-tested implementation, is deterministic: run both against the conformance suite. The one that passes is right. The one that fails is wrong, regardless of who wrote it or who paid for it.
>
> The conformance suite is the tiebreaker. That is not a metaphor. In contested evidence disputes, having a machine-checkable definition of "correct" is the difference between an exhibit that survives a Daubert challenge and one that doesn't.

## Structure of the conformance suite

The nora-canon-conformance-suite (pip-installable, on the roadmap) is organized in four tiers. Each tier tests a different layer of the Canon stack. Passing a lower tier is a prerequisite for the next; you cannot pass Tier 2 with a Tier 1 failure in your implementation.

### Tier 1: Schema conformance

A Tier 1 test asks: does the attestation JSON parse as a valid Canon attestation against the Pydantic models in meridian/canon/schema.py?

This is the minimum bar. If an attestation fails Tier 1, it is not a Canon attestation. The test is:

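```python
import json

from pydantic import ValidationError

from meridian.canon.schema import Attestation


def test_tier1_schema(fixture_path: str) -> bool:
    """Return True if the fixture parses as a valid Canon attestation."""
    with open(fixture_path, encoding="utf-8") as f:
        data = json.load(f)
    try:
        Attestation.model_validate(data)
    except ValidationError:
        # A schema rejection is an expected outcome; anything else should surface.
        return False
    return True
```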

Tier 1 tests cover: required top-level fields (canon_version, attestation_id, subject, issued_at, issuer, witness, findings, refutation, coverage, seal); enum values for inference_type and challenge_type; presence of the declined array in coverage; and presence of at least one Challenge in refutation.challenges. R1 (schema) and R2 (attestation ID format) are fully tested at Tier 1.

> ◆ Going Deeper: Pydantic v2 as an executable schema.
>
> meridian/canon/schema.py validates with Pydantic v2's model_validate method. Pydantic v2 is lax by default and coerces compatible inputs, so the exactness described here comes from how the fields are typed and configured, not from the default mode: fields with type Literal["specific_value"] must match exactly; fields typed as Enum must match a valid enum member; datetime fields must be ISO 8601 with a timezone. A conformance implementation that uses a different validator library must produce identical accept/reject decisions on every Tier 1 test vector. The vectors include intentional near-misses: an inference_type of "Extraction" (capitalized) vs "EXTRACTION" (correct), a missing gaps array vs a present but empty one, a seal field with chain_hash present but signature absent.
>
> Near-misses are the pedagogically important test vectors. An implementation that accepts "Extraction" is a schema-nonconformant implementation, even though the intent was obviously the right enum value.
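The accept/reject contract can be tried out without the real schema. The sketch below is a stand-in, not the meridian models: it assumes a hypothetical allowed set for inference_type (only "EXTRACTION" comes from the text above; the second member is invented for illustration) and shows how a fixture table with expected decisions exposes a validator that is too forgiving.

```python
from typing import Callable

# Hypothetical enum members; only "EXTRACTION" appears in the chapter text.
ALLOWED_INFERENCE_TYPES = frozenset({"EXTRACTION", "SUMMARY"})


def strict_validator(doc: dict) -> bool:
    """Accept only exact, case-sensitive enum matches."""
    return doc.get("inference_type") in ALLOWED_INFERENCE_TYPES


def forgiving_validator(doc: dict) -> bool:
    """A plausible bug: normalizing case before comparing."""
    return str(doc.get("inference_type", "")).upper() in ALLOWED_INFERENCE_TYPES


# (fixture, expected decision) pairs, including an intentional near-miss.
FIXTURES = [
    ({"inference_type": "EXTRACTION"}, True),
    ({"inference_type": "Extraction"}, False),  # near-miss: wrong case
    ({}, False),                                # missing field
]


def disagreements(validate: Callable[[dict], bool]) -> list[int]:
    """Return indexes of fixtures where the validator's decision is wrong."""
    return [
        i for i, (doc, expected) in enumerate(FIXTURES)
        if validate(doc) != expected
    ]


print("strict:", disagreements(strict_validator))
print("forgiving:", disagreements(forgiving_validator))
```

The forgiving validator is wrong only on the capitalized fixture, which is exactly the case a suite made solely of valid fixtures would never exercise.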

### Tier 2: Canonicalization conformance

Tier 2 asks: does the implementation produce the same canonical bytes as the reference on every vector in the cyberphone test corpus?

The cyberphone corpus is 286,000+ test vectors for RFC 8785 canonicalization. It was published by the RFC's editors specifically to close implementation disagreements. It is the closest thing the JSON canonicalization ecosystem has to an oracle.
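One disagreement of the kind the corpus exists to catch is easy to reproduce. RFC 8785 sorts object keys by UTF-16 code units, while Python compares strings by code point, and the two orders differ once a key contains a character outside the Basic Multilingual Plane. A minimal sketch (the helper name is ours, not part of any library):

```python
def utf16_sort_key(key: str) -> bytes:
    """Sort key matching RFC 8785: compare by UTF-16 code units.

    Big-endian encoding makes byte-wise comparison equivalent to
    comparing the 16-bit code units in order. A key containing a lone
    surrogate raises UnicodeEncodeError here.
    """
    return key.encode("utf-16-be")


# U+1F600 is stored in UTF-16 as the surrogate pair D83D DE00;
# U+FB33 is a single code unit.
keys = ["\U0001F600", "\uFB33"]

by_code_point = sorted(keys)                       # Python's default order
by_utf16_units = sorted(keys, key=utf16_sort_key)  # order RFC 8785 requires

print([hex(ord(k)) for k in by_code_point])
print([hex(ord(k)) for k in by_utf16_units])
```

By code point, U+FB33 sorts first. By UTF-16 code unit, the emoji sorts first because its leading surrogate D83D is smaller than FB33. A canonicalizer that relies on the default string order produces different bytes for any object containing both keys.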

The conformance suite integrates the cyberphone vectors directly:

```python
# From the conformance suite's Tier 2 test module
import subprocess
from pathlib import Path


def run_tier2(implementation_binary: str, vectors_dir: Path) -> dict:
    """Canonicalize every vector with the binary under test and tally results."""
    results = {"pass": 0, "fail": 0, "fail_categories": {}}
    # rglob, not glob: vectors live in per-category subdirectories, and the
    # subdirectory name is what identifies the failure category below.
    for vec in sorted(vectors_dir.rglob("*.json")):
        expected_canonical = vec.with_suffix(".canonical").read_bytes()
        result = subprocess.run(
            [implementation_binary, "canonicalize", str(vec)],
            capture_output=True,
        )
        # Compare raw bytes; canonical output must match exactly.
        if result.stdout == expected_canonical:
            results["pass"] += 1
        else:
            results["fail"] += 1
            category = vec.parent.name  # e.g. "unicode_sort", "numeric"
            categories = results["fail_categories"]
            categories[category] = categories.get(category, 0) + 1
    return results
```

The four bug classes from Chapter 7 are each represented by a vector category: UTF-16 sort order (vectors in unicode_sort/), ECMA-262 number formatting (vectors in numeric/), lone surrogate handling (vectors in lone_surrogates/), and numeric edges (vectors in numeric_edges/). An implementation that passes all four categories has passed Tier 2.

Implementations that fail Tier 2 almost always fail in a predictable order. The numeric/ category fails first because ECMA-262 number formatting is the least intuitive requirement: 1e100 must serialize as 1e+100, not 1E100, not 1.0e100. The lone_surrogates/ category fails next because lone surrogates can appear in JavaScript strings but cannot be encoded as valid UTF-8; most non-JavaScript canonicalizers have to handle them specially. Unicode sort order and numeric edges fail less frequently, but when they fail they fail silently.
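The number-formatting requirement is easier to see in code. The following is a sketch of the ECMAScript Number-to-string rules for finite floats, written for illustration and not taken from the rfc8785 library. It assumes that Python's repr() yields the same shortest round-trip digits that ECMAScript selects, and only re-arranges those digits according to the ECMA-262 thresholds.

```python
import math
from decimal import Decimal


def es6_number(x: float) -> str:
    """Format a finite float following ECMAScript's Number-to-string rules."""
    if not math.isfinite(x):
        raise ValueError("NaN and Infinity are not valid JSON numbers")
    if x == 0:
        return "0"  # covers -0.0 as well
    sign = "-" if x < 0 else ""
    # repr() gives the shortest round-trip digits; Decimal parses them exactly.
    _, digit_tuple, exp = Decimal(repr(abs(x))).as_tuple()
    digits = "".join(map(str, digit_tuple))
    stripped = digits.rstrip("0")
    exp += len(digits) - len(stripped)  # trailing zeros move into the exponent
    digits = stripped
    k = len(digits)  # number of significant digits
    n = exp + k      # decimal point position relative to the first digit
    if k <= n <= 21:
        body = digits + "0" * (n - k)              # integer, no exponent
    elif 0 < n <= 21:
        body = digits[:n] + "." + digits[n:]       # plain decimal
    elif -6 < n <= 0:
        body = "0." + "0" * (-n) + digits          # small decimal
    else:
        e = n - 1
        mantissa = digits if k == 1 else digits[0] + "." + digits[1:]
        body = f"{mantissa}e{'+' if e > 0 else '-'}{abs(e)}"
    return sign + body


for value in (1e100, 1e16, 1e-7):
    print(repr(value), "->", es6_number(value))
```

Python's own repr() agrees with the required output for 1e100 but not for the other two values: it switches to exponent notation at 1e16 where ECMAScript waits until 1e21, and it pads small exponents to two digits (1e-07). Divergences of this kind are what the numeric/ vectors are there to catch.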

> ☉ In the Wild: The TLS 1.3 shared test corpus.
>
> The TLSWG (TLS Working Group) ran a shared interop test before publishing RFC 8446. Implementations from Google (BoringSSL), Mozilla (NSS), Cloudflare (cloudflare-go TLS), and Apple (SecureTransport) were run against a common fixture set. The process found 23 interoperability issues in implementations that had each passed their own internal test suites.
>
> The issues found were not obscure edge cases. Six involved session ticket handling, which all four implementations had explicitly tested. The divergence was in behavior when a ticket arrived during the handshake rather than after it, a timing condition that each implementation's unit tests had not covered because the unit tests were written by the same team that wrote the implementation.
>
> An external test corpus is not a sign of distrust. It is the only mechanism that catches the bugs you didn't think to test for.

### Tier 3: Seven-step conformance

Tier 3 asks: does the implementation produce the same verdict as the Python reference on every fixture in the Lab 25 corpus?

This is the interop test that Lab 25 builds toward. The conformance suite ships the Lab 25 fixture corpus — seven canonical fixtures, one for each step's failure mode plus one known-valid — as Tier 3's baseline.

```shell
# Running the Tier 3 tests with the CLI
canon-verify \
  --implementation path/to/your/verifier \
  --fixtures-dir labs/ch25_verifier/fixtures/ \
  --tier 3
```
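Tier 3 compares complete verdict documents from two verifiers, not a single field. A minimal sketch of that kind of comparison follows. The verdict shape is an assumption made for illustration: a top-level verdict string plus a steps list whose entries carry hypothetical step and status keys. The real verdict schema may differ.

```python
def verdict_differences(reference: dict, candidate: dict) -> list[str]:
    """List every point where the candidate's verdict departs from the reference."""
    diffs = []
    if reference.get("verdict") != candidate.get("verdict"):
        diffs.append(
            f"verdict: expected {reference.get('verdict')!r}, "
            f"got {candidate.get('verdict')!r}"
        )
    ref_steps = reference.get("steps", [])
    cand_steps = candidate.get("steps", [])
    if len(ref_steps) != len(cand_steps):
        diffs.append(
            f"steps: expected {len(ref_steps)} entries, got {len(cand_steps)}"
        )
    # Compare step entries pairwise so a wrong step name is reported by position.
    for i, (ref, cand) in enumerate(zip(ref_steps, cand_steps)):
        if ref != cand:
            diffs.append(f"steps[{i}]: expected {ref!r}, got {cand!r}")
    return diffs


# Hypothetical verdicts: same top-level result, different failing step.
reference = {"verdict": "invalid",
             "steps": [{"step": "signature", "status": "fail"}]}
candidate = {"verdict": "invalid",
             "steps": [{"step": "chain_hash", "status": "fail"}]}

for line in verdict_differences(reference, candidate):
    print(line)
```

Both verdicts say "invalid", so a comparison limited to the top-level field would report agreement. The step-level difference is the only evidence that the two verifiers disagree about why the artifact failed.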

The suite runs both the reference Python verifier and the implementation under test against each fixture. It compares the full JSON verdict output, not just the top-level verdict field. An implementation that returns "invalid" for the right reasons but the wrong step name in steps is nonconformant at Tier 3. The step-level diagnostics are part of the conformance contract, because they are what the recipient uses to understand why an artifact failed.

> ◆ Going Deeper: Why step-level diagnostics are part of the contract.
>
> A verdict of "invalid" tells the recipient that the attestation failed verification. It does not tell them which step failed, which means it does not tell them whether the failure is a tampered signature (Step 2), a corrupted content blob (Step 4), or a malformed supports graph (Step 5). Each failure has different evidentiary implications.
>
> A Step 2 failure (signature invalid) means the attestation was tampered with after sealing, or the private key was compromised. This is a potential incident.
>
> A Step 4 failure (witness content re-hash failed) means the content at content_ref was altered after the attestation was issued. The attestation itself may be cryptographically intact; the underlying content is suspect.
>
> A Step 5 failure (supports closure broken) means the attestation's internal reasoning graph is malformed: a claim references a support that doesn't exist. This is more likely a bug in the issuing system than tampering.
>
> These three failure types require different responses. The conformance suite's requirement that step-level diagnostics match the reference is not pedantry. It is the requirement that a recipient using any conformant verifier receives the information they need to respond correctly.

### Tier 4: Chain-integrity conformance

Tier 4 asks: does the audit_log chain verify correctly?
The audit log is hash-chained: each row's chain_hash is computed over the previous row's chain_hash plus the current row's content. A database that passes Tier 4 has an audit log where no row has been deleted, inserted out-of-order, or modified since the chain was computed.
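The walk can be sketched in memory before pointing it at Postgres. Everything about the hash construction below is an assumption made for illustration: SHA-256 over the previous chain hash concatenated with the row content as UTF-8, a fixed genesis value before the first row, and rows shaped as (seq, content, chain_hash) tuples. The actual audit_log schema and hash input are defined by the Canon specification, not by this sketch.

```python
import hashlib
from typing import Iterable, Optional

GENESIS = "0" * 64  # assumed predecessor value for the first row


def chain_hash(prev_hash: str, content: str) -> str:
    """Assumed construction: SHA-256 over previous hash plus row content."""
    return hashlib.sha256((prev_hash + content).encode("utf-8")).hexdigest()


def first_divergence(rows: Iterable[tuple[int, str, str]]) -> Optional[int]:
    """Walk rows in sequence order; return the first failing seq, else None.

    A gap in sequence numbers is reported at the first row after the gap.
    """
    prev_hash = GENESIS
    expected_seq = None
    for seq, content, stored in sorted(rows):
        if expected_seq is not None and seq != expected_seq:
            return seq
        if chain_hash(prev_hash, content) != stored:
            return seq
        prev_hash = stored
        expected_seq = seq + 1
    return None


def build_chain(contents: list[str]) -> list[tuple[int, str, str]]:
    """Demo helper: produce a valid chain from a list of row contents."""
    rows, prev = [], GENESIS
    for seq, content in enumerate(contents, start=1):
        prev = chain_hash(prev, content)
        rows.append((seq, content, prev))
    return rows


rows = build_chain(["open matter", "add exhibit", "seal attestation"])
print("intact:", first_divergence(rows))

tampered = list(rows)
tampered[1] = (2, "add exhibit (edited)", rows[1][2])  # content changed, hash kept
print("tampered:", first_divergence(tampered))
```

Because each hash depends on its predecessor, editing one row's content is detected at that row, and deleting a row is detected at the next surviving row even when every remaining stored hash is individually well-formed.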

The Tier 4 test is against a live (or exported) Postgres database:

```shell
canon-verify \
  --connection-string postgresql://... \
  --tier 4 \
  --matter-id <uuid>
```

The verifier walks the audit_log table for the specified matter in sequence-number order, recomputes each row's chain hash, and reports any gap or mismatch. A chain-integrity failure is a Tier 4 nonconformance.

Tier 4 is the only tier that requires a database connection. It is also the only tier where a failure has an unambiguous legal implication: the audit trail for this matter has been altered after the fact.

> § For the Record: FRE 901(b)(9) and process integrity.
>
> "Evidence describing a process or system and showing that it produces an accurate result." Federal Rule of Evidence 901(b)(9) allows authentication of a record by testimony about the process that produced it. A Tier 4 chain-integrity verification report is evidence about a process: evidence that the audit log was not modified after the fact. That report is itself an authenticating artifact under FRE 901(b)(9).

## Running the suite against your implementation

The nora-canon-conformance-suite is on the roadmap; Lab 25 interop tests serve as the conformance proxy until it ships. The workflow will be:

```shell
# Install
pip install nora-canon-conformance-suite

# Run all four tiers against a Python implementation
canon-verify \
  --implementation meridian.canon.walk \
  --fixtures-dir /path/to/fixtures \
  --tier all

# Run against a binary verifier (any language)
canon-verify \
  --implementation ./my-verifier-binary \
  --fixtures-dir /path/to/fixtures \
  --tier 1,2,3
```

The suite outputs a structured report:

```json
{
  "implementation": "./my-verifier-binary",
  "tier_results": {
    "tier1": { "pass": 142, "fail": 0 },
    "tier2": { "pass": 286112, "fail": 203, "fail_categories": {"numeric": 203} },
    "tier3": { "pass": 7, "fail": 0 },
    "tier4": "not_run"
  },
  "conformant": false,
  "blocking_tier": 2
}
```

A Tier 2 failure of 203 on numeric vectors is diagnostic: the implementation is likely serializing 1e+100 as 1e100 (missing the + in the exponent), which violates the ECMA-262 number formatting rule. The report's fail_categories field points directly at the failure class.

> ✻ Try This: Run the cyberphone vectors against your canonicalization.
>
> Clone the cyberphone test vectors:
>
>     git clone https://github.com/cyberphone/json-canonicalization
>
> Run your canonicalization implementation against the vectors in testdata/input/. For each input, the reference output is in testdata/output/ with the same filename.
>
> Count how many pass. Then identify which category fails first: numeric edges (numerics/), high-codepoint strings (unicode/), or lone surrogates (surrogates/)?
>
> The answer tells you which of the four bug classes from Chapter 7 is your implementation's Achilles heel. Fix that class first. The others will likely follow from the same root cause.

## Writing conformance tests for custom extensions

The suite provides a ConformanceBase class that domain extensions can inherit from. A medical-records extension adds HIPAA-specific assertions:

```python
# my_extension/conformance.py
from nora_canon_conformance_suite import ConformanceBase


class HIPAAConformance(ConformanceBase):
    def test_phi_redaction_in_coverage(self, attestation_data: dict) -> bool:
        """Every attestation over PHI must declare PHI_ACCESS_CONTROL
        in coverage.run or coverage.declined."""
        coverage = attestation_data.get("coverage", {})
        run_types = [c.get("challenge_type") for c in coverage.get("run", [])]
        declined_types = [d.get("challenge_type") for d in coverage.get("declined", [])]
        return "PHI_ACCESS_CONTROL" in run_types or "PHI_ACCESS_CONTROL" in declined_types

    def test_minimum_necessary_claim(self, attestation_data: dict) -> bool:
        """Every Claim over a PHI field must have a 'minimum_necessary_assessed' gap entry."""
        findings = attestation_data.get("findings", {})
        for claim in findings.get("claims", []):
            # A claim touches PHI if its content mentions "phi_field".
            if "phi_field" in claim.get("content", ""):
                if "minimum_necessary_assessed" not in claim.get("gaps", []):
                    return False
        return True
```

The extension conformance tests run as part of the full suite when the domain configuration is loaded. A medical deployment that passes HIPAAConformance in addition to Tiers 1–4 is conformant for medical-records use.
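Since the suite is not yet released, the way ConformanceBase discovers and runs extension tests is not documented here. The sketch below uses a hypothetical stand-in base class (the run method and its discovery rule are our assumptions) to show one way an extension's test_ methods could be collected and applied to a single attestation.

```python
class ConformanceBase:
    """Hypothetical stand-in for the unreleased suite's base class."""

    def run(self, attestation_data: dict) -> dict[str, bool]:
        """Call every method whose name starts with test_ and collect results."""
        results = {}
        for name in sorted(dir(self)):
            if name.startswith("test_"):
                results[name] = bool(getattr(self, name)(attestation_data))
        return results


class HIPAAConformance(ConformanceBase):
    def test_phi_redaction_in_coverage(self, attestation_data: dict) -> bool:
        """PHI_ACCESS_CONTROL must appear in coverage.run or coverage.declined."""
        coverage = attestation_data.get("coverage", {})
        entries = coverage.get("run", []) + coverage.get("declined", [])
        return any(
            e.get("challenge_type") == "PHI_ACCESS_CONTROL" for e in entries
        )


# An attestation that declined the PHI challenge still declares it.
declared = {
    "coverage": {
        "run": [],
        "declined": [{"challenge_type": "PHI_ACCESS_CONTROL"}],
    }
}
undeclared = {"coverage": {"run": [], "declined": []}}

print(HIPAAConformance().run(declared))
print(HIPAAConformance().run(undeclared))
```

Declining the challenge counts as coverage in this check; silence does not. That distinction is the reason the test inspects both arrays.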

> ▼ Why It Matters: The conformance extension is not optional for regulated domains.
>
> HIPAA's Security Rule (45 CFR Part 164) requires covered entities to implement audit controls that record access to PHI. A Meridian-Canon deployment handling PHI is not automatically HIPAA-compliant. It requires the PHI-specific schema extensions (Pattern 1), the PHI-specific challenge types (Pattern 4), and the conformance tests that verify those extensions are working correctly. The HIPAAConformance extension suite is the machine-readable proof that the system satisfies those requirements.
>
> A system that says it handles PHI correctly but cannot pass a domain-specific conformance suite is a system making an unverifiable claim. That is the posture Canon exists to replace.

## The current conformance proxy

Until nora-canon-conformance-suite ships, run pytest test_lab.py from labs/ch25_verifier/. This test:

- Runs the Python reference verifier against all seven Lab 25 fixtures.
- Runs your second-language verifier against all seven fixtures.
- Compares the verdict JSON byte-for-byte.
- Fails if any verdict disagrees.

Passing pytest test_lab.py is not full conformance certification. Full certification requires all 286,000+ cyberphone vectors and the complete fixture corpus. Until the suite ships, passing Lab 25 is the minimum signal to look for before a verifier is considered for production use.

> ◆ Going Deeper: The conformance suite is versioned, not frozen.
>
> The Canon specification will evolve. Canon v0.2.0 adds DSSE signing, new required fields, and feature-flag-gated backends (per the versioning policy in the Canon spec §12). When the spec version increments, the conformance suite increments to match. A conformant v0.1.1 implementation is not automatically conformant against v0.2.0 vectors.
>
> The versioning policy requires that v0.2.0 attestations are structurally distinguishable from v0.1.1 attestations (the canon_version field is mandatory, and the Tier 1 tests check it against the suite's version). A v0.1.1-conformant verifier that encounters a v0.2.0 attestation must fall back to legacy verification without error; it must not silently accept a DSSE envelope it cannot verify. See the version-detection conformance requirement below.

## v0.2.0 conformance requirements

Canon v0.2.0 introduces DSSE signing, feature-flag-gated backends, and keyring portability. Six new required conformance tests and two feature-flag test groups are added at Tier 3.

### New required tests

1. DSSE signing round-trip. emit_dsse() followed by verify_dsse() must succeed for any valid attestation. A conformant implementation may not produce a DSSE envelope that its own verifier rejects.
2. PAE determinism. The Pre-Authentication Encoding must be deterministic: the same payload_type and payload must always produce the same PAE bytes, regardless of invocation order, interpreter, or platform.
3. chain_hash consistency. The chain_hash field in a DSSEEnvelope must equal SHA-256(base64url_decode(dsse_envelope.payload)). A verifier must check this field independently of the signature verification step.
4. Version detection. A verifier presented with a v0.1.x attestation, identified by the presence of a seal block and the absence of a dsse_envelope block, must fall back to legacy seven-step verification without error. It must not reject the attestation solely because it lacks a DSSE envelope.
5. Cross-language PAE. A PAE computation in Python and in any second language (Go, Rust, TypeScript) must produce identical bytes for the same payload_type and payload inputs. PAE is defined at the byte level in the DSSE specification, so this is testable by running both implementations against the same fixture inputs. The capstone's cross-language verifier must include this test.
6. Keyring round-trip. keygen() followed by load_private() must succeed with both the platform keyring backend and with keyrings.alt.file.PlaintextKeyring. A conformant implementation may not assume a specific keyring backend is available.
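Tests 2, 3, and 5 reduce to byte-level checks that fit in a few lines. The sketch below follows the PAE definition published in the DSSE v1 specification, where lengths are written in ASCII decimal. If Canon's PAE uses a different length encoding, substitute it; the determinism and cross-language checks work the same way. The envelope is modeled as a plain dict, and the hex encoding of chain_hash is an assumption.

```python
import base64
import hashlib


def pae(payload_type: str, payload: bytes) -> bytes:
    """Pre-Authentication Encoding as defined by DSSE v1.

    "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body, where LEN is the
    byte length written in ASCII decimal.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


def chain_hash_matches(envelope: dict) -> bool:
    """Check chain_hash == SHA-256(base64url_decode(payload)).

    Assumes chain_hash is stored as lowercase hex.
    """
    payload = envelope["payload"]
    padded = payload + "=" * (-len(payload) % 4)  # tolerate unpadded base64url
    raw = base64.urlsafe_b64decode(padded)
    return hashlib.sha256(raw).hexdigest() == envelope["chain_hash"]


body = b"hello world"
print(pae("http://example.com/HelloWorld", body))

envelope = {
    "payload": base64.urlsafe_b64encode(body).decode("ascii"),
    "chain_hash": hashlib.sha256(body).hexdigest(),
}
print(chain_hash_matches(envelope))
```

A cross-language test then amounts to committing the PAE bytes for a set of fixture inputs and asserting that every implementation reproduces them exactly.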
### Feature flag conformance

These tests must be run once per flag configuration:

| Flag | Value | Required behavior |
|---|---|---|
| MERIDIAN_USE_PARADEDB | 0 (default) | Queries must use tsvector; @@@ operator must not appear in query plans |
| MERIDIAN_USE_PARADEDB | 1 | Queries must use @@@ operator against the ParadeDB index |
| MERIDIAN_REKOR_ENABLED | 0 (default) | publish_attestation() must return {"status": "disabled"} with no network call |
| MERIDIAN_REKOR_ENABLED | 1 | publish_attestation() must attempt Rekor submission |

### CI matrix

Tests must pass across all supported platform and Python version combinations. The .github/workflows/ci.yml in the repository implements this matrix:

| Platform | Python versions |
|---|---|
| ubuntu-latest | 3.10, 3.12, 3.13 |
| macos-latest | 3.10, 3.12, 3.13 |
| windows-latest | 3.10, 3.12, 3.13 |

A conformant v0.2.0 implementation passes all tests in this matrix before any release. Tests that fail on a subset of the matrix are blocking failures, not warnings.

### Headless keyring

Tests that invoke keygen() or load_private() must set:

```shell
PYTHON_KEYRING_BACKEND=keyrings.alt.file.PlaintextKeyring
```

before running. The repository conftest.py sets this automatically so that CI runs on headless machines without a platform keyring daemon. A test that relies on the platform keyring without this fallback is not CI-safe and will fail on Linux runners.
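A minimal sketch of that conftest.py arrangement, assuming the variable is set at import time, before any test module imports the keyring package, and that keyrings.alt is installed in the test environment:

```python
# conftest.py (sketch)
import os

# setdefault keeps any backend the developer has already chosen and
# falls back to the file backend on headless runners.
os.environ.setdefault(
    "PYTHON_KEYRING_BACKEND",
    "keyrings.alt.file.PlaintextKeyring",
)
```

Using setdefault instead of a plain assignment means a developer who exports a different backend locally is not overridden, while CI runners with no value set pick up the file backend.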

## 💡 Key Takeaways

- The six new v0.2.0 required conformance tests are: DSSE signing round-trip, PAE determinism, chain_hash consistency with decoded payload, version detection (v0.1.x fallback without error), cross-language PAE agreement, and keyring round-trip with both platform and file backends.
- Cross-language PAE tests are mandatory because PAE byte exactness is the only property that guarantees a signature produced in Python can be verified in Go or Rust. A PAE implementation that encodes len_le8 incorrectly will pass same-language round-trip tests undetected, because signer and verifier share the same mistake, but will fail in cross-language fixture comparison.
- The CI matrix covers ubuntu-latest, macos-latest, and windows-latest, each with Python 3.10, 3.12, and 3.13. Tests that fail on any part of this matrix are blocking failures, not warnings, because platform-specific keyring behavior can silently break on headless runners.
- The PYTHON_KEYRING_BACKEND=keyrings.alt.file.PlaintextKeyring environment variable must be set in conftest.py (via os.environ.setdefault()) so that any test invoking keygen() or load_private() uses the file backend on headless CI runners without a D-Bus session.
- Feature flag conformance tests must exercise both paths: MERIDIAN_USE_PARADEDB=0 confirms queries use tsvector and the @@@ operator does not appear in query plans; MERIDIAN_USE_PARADEDB=1 confirms queries use @@@; MERIDIAN_REKOR_ENABLED=0 confirms publish_attestation() returns {"status": "disabled"} with no network call; MERIDIAN_REKOR_ENABLED=1 confirms it attempts Rekor submission.

## Exercises

### Warm-up

1. Read the cyberphone canonicalization test vectors at https://github.com/cyberphone/json-canonicalization. Identify which input categories are represented. For each category, name one implementation bug that would cause failures in that category specifically.
2. Run pytest -k canonicalize in the repository. Read the test output. Which vectors does the Python reference implementation pass? Which does it handle as edge cases?

### Core

3. The Lab 25 fixture corpus contains one fixture for each step's failure mode. For Step 3 (chain hash recompute), construct a new fixture that fails Step 3 in a different way than the existing fixture. Specifically: construct an attestation where the seal field is present but the chain_hash was computed over a different excluded field set than the spec requires. Both verifiers should return invalid on your fixture.
4. Implement the HIPAAConformance.test_minimum_necessary_claim test using only the attestation JSON (no database access). Run it against five attestations from your test corpus. Report the pass/fail counts.
5. Generate a JSON object {'a': 1, 'b': '\u0000'} (a null character in a string value). Run it through rfc8785.dumps() in Python. Then implement the same canonicalization in any other language (JavaScript, Go, or Ruby). Verify that both produce byte-identical output. If they differ, identify which RFC 8785 rule the non-Python implementation violates.

### Stretch

6. The TLS 1.3 shared corpus found 23 issues in implementations that had each passed their own unit tests. For the Canon ecosystem, identify three test cases that would likely expose divergences between a Python verifier and a Go verifier, using test cases that are not in the current Lab 25 fixture set. Implement them as new fixtures.
7. Write a Tier 2 harness that runs the cyberphone numeric vectors against the Python rfc8785 library and reports the failure count and failure category breakdown. If the Python library fails any vectors, file the failure as a bug report (the cyberphone repository accepts issues).

## Build-your-own prompt

For your capstone implementation: run it against the Lab 25 fixture corpus before your Week 9 oral defense. If your second-language verifier fails any fixture, diagnose the failure before the defense, not during it. The conformance test is the reviewer's first question.

## Further reading

- RFC 8446 (TLS 1.3), §1 and the Working Group process notes. The shared test corpus is described in the process notes, not the RFC text.
- The cyberphone JSON canonicalization test vectors: https://github.com/cyberphone/json-canonicalization.
- RFC 8785 (JCS), Appendix B: number serialization samples.
- The Canon spec v0.1.1, §12 (versioning and upgrade policy).
- Russ Cox, Transparent Logs for Skeptical Clients, https://research.swtch.com/tlog. The argument that the test corpus is the spec, from the perspective of Certificate Transparency.
- FRE 901(b)(9): authentication of records produced by a process.
- meridian/canon/tests/ in this repository: the reference test suite from which the conformance proxy tests are drawn.
- The dossier research/01_cryptography_pedagogy.md.

Next: Chapter 30 — Operationalization and Deployment. Building it is one problem. Running it under adversarial conditions is another.