Preface {.unnumbered}

This book teaches one system: Meridian-Cannon, an evidence-attestation pipeline for personal-data corpora. It also teaches one discipline: the construction of records that can be falsified by anyone, without the issuer's cooperation. The system is the way the discipline is made concrete; the discipline is the reason the system is interesting.

By the time you reach the last page you will have done some combination of the following, depending on which of the five readers (see the Overture) you are:

  • Built and signed a Canon-conformant attestation by hand, with a debugger.
  • Read a sealed attestation as if it were an exhibit, identified the fields a court would scrutinize, and explained the seven-step verification protocol to a colleague who had never seen it.
  • Stood up the seven-layer pipeline against a corpus of your own choosing — your archive, an organization's, a public-records release.
  • Written a verifier that consumes attestations from anyone, not just yourself.
  • Argued, in writing or in front of a sceptical reviewer, why each design decision was forced by the evidentiary requirement that drove it.

The book is opinionated. It does not survey the field of digital evidence broadly, nor enumerate every cryptographic primitive that might be relevant. It teaches one defensible architecture in depth, in the conviction that the reader who has built (or audited, or used, or regulated) one such system from cryptographic primitive to courtroom-defensible artifact is well-prepared to evaluate any other.

Why this book exists

The Overture sets out three observations at length; the substance of the book is their consequences. In short:

An unprecedented fraction of contested factual disputes — civil, criminal, regulatory, journalistic — now turns on records that exist only in private digital systems. There is no equivalent, for these records, of the paper-records era's procedural infrastructure: no analogue of the bonded document examiner, no chain of custody that the receiving party can verify without trusting the producing party.

Modern AI-assisted retrieval has become both indispensable and unverifiable. A semantic search over a 300,000-document personal archive is the only way to find the document you need; but the search itself is performed by a 137-million-parameter neural network whose decisions admit no introspection. When the network ranks a document third, the recipient has no principled basis to either accept or reject the ranking. The result is suggestions, not evidence.

The legal system is in the early stages of working out what to do about this. Federal Rule of Evidence 707 has been proposed to address machine-generated evidence specifically; courts have begun applying Daubert to ML systems with uneven results; case law on AI-fabricated citations is no longer hypothetical. The students who will architect the next decade's evidence systems need a foundation that is technical, legal, and ethical at once.

Meridian-Cannon is one answer. The Canon specification it conforms to is a source-available, implementation-neutral standard for one such artifact: an attestation — Witness, Findings, Refutation, Seal — that any recipient can verify without the issuer's cooperation. The system specified here is the reference implementation of that standard over a real personal-data corpus, tracking Canon v0.2.0.

Five readers, one book

The Overture introduces the five audiences this book addresses: the Builder, the Receiver, the Investigator, the Policymaker, and the Affected. The reading-paths matrix at front/05_reading_paths.md maps each audience to a recommended sequence of chapters and sidebars. A short preview of how the book is structured to serve all five:

- The book has one spine and many entrances. The spine is the thirty-chapter argument from cryptographic primitive to capstone. The entrances are the audience-specific reading paths: a lawyer's path is shorter than a builder's path and goes through different chapters in a different order.
- Five sidebar types appear in every chapter: ◆ Going Deeper (technical depth, for builders), ▼ Why It Matters (stakes, for everyone, though builders especially feel them), ☉ In the Wild (real cases, for everyone), ✻ Try This (small exercises, no tools required), and § For the Record (primary-source quotation blocks).
- One running case study, the parent's fifteen minutes from the Overture, recurs across many chapters. Each chapter sees it from a different role: the builder sees data flow; the lawyer sees the courtroom; the journalist sees the source records; the policymaker sees what the legal regime missed; the affected reader sees themselves.

## What you will learn

The book is organized in six layers, each building on the prior.

The Overture (Part 0). A pure-narrative opening. Read first, regardless of which reader you are. It is what the rest of the book argues from.

Foundations (Part I, Chapters 1–4). Why personal-data evidence is a hard problem. What "verifiable" actually means, and why falsifiability is the discipline that makes verifiability possible. A working tour of the federal evidence rules and the Daubert framework, written for engineers but readable to everyone. The Canon standard itself, end to end.

Computer-science building blocks (Part II, Chapters 5–12).
The primitives the system is built from: cryptographic hashes, digital signatures, JSON canonicalization, schema-as-contract validation, vector embeddings, hybrid retrieval, structured information extraction with local language models, adversarial validation through multi-model consensus. Each chapter presents one primitive: concept first, mechanism second, the code in this repository third.

System architecture (Part III, Chapters 13–20). The seven-layer pipeline. The procedural-legal substrate that makes the cryptographic guarantees compose with admissibility doctrine: matters, parties, productions, holds, privilege, redactions, RBAC, PII tiers. The Time-Aware Relationship Graph, Epistemic Neutrality Masking, the five-challenge refutation harness, the four attestation kinds.

Engineering practice (Part IV, Chapters 21–26). Schema design and migrations. Local-First Chunking and privilege protection. Key management, audit logging, the reference verifier, the Admissibility Auditor. The day-to-day work of running an evidence system that has to hold up.

Build your own (Part V, Chapters 27–30). A capstone in which you design a domain-specific Meridian for a corpus of your choosing (an estate administration, a journalistic investigation, a regulatory production, an academic archive) and operationalize it. A non-builder track exists for readers who will commission rather than write the code.

## What you will not learn

This is not a book on cryptography. The cryptographic primitives are introduced by use and the mathematics referred outward to better sources. A reader who wants to design a new signature scheme should leave this book and read Boneh & Shoup; a reader who wants to use an existing one correctly will find what they need here.

This is not a book on the law of evidence. Chapter 3 supplies the working frame; the appendix provides citations to the authorities; but courtroom strategy and jurisdictional doctrine are the work of counsel, not engineers.
This book is not legal advice.

This is not a book on machine learning research. The models used at enrichment and refutation are treated as components: chosen, configured, and validated against external benchmarks, but not designed.

## A note on tone

This book takes seriously the proposition that an evidence system is, in the end, accountable to a person harmed if it fails. Most of the systems you will read about in your career will not be. The disciplines this work calls for (gap disclosure, declined-challenge inventories, falsifiability without issuer cooperation) are unfamiliar to most software engineering because most software engineering is not asked to be wrong on the record. Evidence work is. The book treats that asymmetry as the load-bearing design constraint, not as a polite abstraction.

The author is a pro-se litigant who has experienced the failure modes the book describes. The book does not litigate the author's case. The case is fictionalized in detail and composite in nature wherever it appears. The discipline is what survives the abstraction.

## How to read

Read the Overture first. Then read the reading-paths matrix at front/05_reading_paths.md and pick the path that fits you. Then read your path.

If you are a Builder, you can also read cover to cover; the book is structured so each chapter assumes the prior chapters and nothing else. The labs are graded by an automated harness in labs/. The exercises are not optional. The book is designed for the reader who finishes them.

If you are not a Builder, follow your path. The book is designed so that your path is shorter and contains nothing you cannot read. You will not do the labs; you will skip the Going Deeper sidebars; you will read the In the Wild and Why It Matters sidebars carefully. Where a chapter goes deeper than you need, the At a glance box at the top tells you what the chapter argues, so you can decide whether to read the body or move on.

Errata, suggestions, and corrections are welcome.
The convention is the same as the Students' Guide to Raft convention from MIT 6.5840: the book and the codebase publish from a single source of truth, and when they disagree the codebase wins. Open a pull request.

## Acknowledgments

Canon v0.2.0 is the version this edition teaches. v0.2.0 introduces three areas of change from v0.1.1:

- DSSE signing envelope. The canonical signature format upgrades from direct Ed25519(canonical_bytes) to the Dead Simple Signing Envelope (DSSE), an open specification used by SLSA and in-toto. The signing surface is now Ed25519(PAE(payload_type, JCS(attestation))), where PAE is the Pre-Authentication Encoding defined in the DSSE spec. The envelope carries the payload type as an explicit field, making the format version a falsifiable claim embedded in the signed material. Chapter 7 covers the PAE construction; Chapter 25 updates the verifier.
- Tool ecosystem expansion. Eleven optional integrations are now installable as PyPI extras: Presidio (NER-based PII masking), Outlines (constrained LM decoding), inspect-ai (formal adversarial evaluation), Unstructured.io (section-aware document partitioning), C2PA (media provenance manifests), Rekor/Sigstore (public transparency log), ParadeDB/pg_search (BM25 in Postgres via Tantivy), pgvectorscale (StreamingDiskANN vector index), WeasyPrint/Typst (court-quality PDF backends), Langfuse (LM observability), and Dagster (software-defined asset pipeline). All integrations are optional; the core gains no new required dependencies. The pyproject.toml optional-extras architecture is described in the Prerequisites chapter.
- Cross-platform CI and keyring support. The CI matrix now covers ubuntu/macos/windows × Python 3.10/3.12/3.13. Key management supports macOS Keychain, Windows Credential Manager, Linux SecretService, and keyrings.alt for headless/CI environments.
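
For readers who want to see the signing surface from the first item in concrete form, here is a minimal sketch of PAE(payload_type, JCS(attestation)) using only the Python standard library. It is an illustration under stated assumptions, not the reference implementation: `canonical_json` only approximates JCS (RFC 8785) for simple objects with ASCII keys and no floating-point numbers, the attestation fields are placeholders, and the payload type string is hypothetical. The Ed25519 step is omitted because it needs a third-party library; the signature would be computed over the bytes that `pae` returns.

```python
import json


def canonical_json(obj) -> bytes:
    """Rough stand-in for JCS (RFC 8785).

    Sorted keys and compact separators match JCS for simple objects,
    but this does not implement JCS number formatting or its
    UTF-16 key ordering. Assumption: inputs are simple enough.
    """
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE Pre-Authentication Encoding.

    "DSSEv1" SP LEN(type) SP type SP LEN(payload) SP payload,
    where LEN is the byte length written in ASCII decimal.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])


# Placeholder attestation; the field contents are illustrative only.
attestation = {"seal": {}, "findings": [], "witness": {"corpus": "demo"}}

# Hypothetical payload type, not taken from the Canon specification.
payload_type = "application/vnd.example.attestation+json"

to_sign = pae(payload_type, canonical_json(attestation))
print(to_sign.decode("utf-8"))
```

Because the payload type and both lengths sit inside the signed bytes, a verifier that rebuilds `to_sign` from the envelope fields will reject an attestation whose declared type has been altered, which is the property the item above describes.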

License note. Canon v0.2.0 is published under the NORA Canon Evaluation & Commentary License v1.0 — source-available and proprietary. Reading, citing, and commentary are permitted; implementation, redistribution, ML training, and commercial use require a separate license. Canon v0.1.1 and all earlier releases remain irrevocably MIT + CC0 1.0 Universal.

The specification, its authors, and the open contributors are the substrate on which everything in this book stands. The case law cited throughout is the work of the courts that produced it. The cryptographic literature the book draws on is sixty years of work by people who do not all agree with each other, and whose disagreements the book has tried to honor honestly. None of the errors are theirs.

— J. Patrick White, May 2026.