NORA Early Access

Part III — System Architecture · Chapter 24

The Time-Aware Relationship Graph (TARG)

Identity is not stable across time. A phone number that meant Marcus Webb in 2020 may mean Sarah Chen in 2024. The graph never collapses identity across time.

Prerequisites

Before reading this chapter, you should be comfortable with: Chapters 9–10, 16 (Embeddings, Hybrid Retrieval, Procedural Primitives). TARG's Pass 3 uses embedding ANN; its output feeds into attestation emission.

The question that matters in a contested proceeding is almost never "who is connected to whom?" That question describes the present, and the present is rarely what is in dispute. The question that matters is "who was connected to whom, in what role, on that specific date?" That question describes the past at a point. Getting it wrong by three weeks can be the difference between a correct timeline and a false one.

The Time-Aware Relationship Graph, or TARG, is the architecture that makes the temporal version of the question answerable from your evidence corpus directly — without hand-assembling a timeline from discovery documents, deposition transcripts, and your notes. This chapter covers what the TARG is, how it is built, what it gets wrong when you leave time out, and how the Meridian-Cannon schema implements it.

At a glance

  • Every relationship edge carries a validity window (tstzrange) so that temporal queries answer "who was responsible on date X" rather than "who appears most often or most recently in the corpus."
  • Temporal queries return not just a name but the source document that establishes each validity bound, making every TARG answer traceable to a signed ObservationAttestation.
  • Entity resolution is a three-pass architecture (exact normalization, rule-based merge, fuzzy merge), and every resolution decision is written to an audit table that can be produced in discovery.

## Learning objectives

By the end of this chapter you should be able to:

1. Write an entity_relationships row with correct valid_during bounds using tstzrange, and explain the difference between inclusive and exclusive bound notation ([) vs. ()).
2. Query entity_relationships with the && overlap operator to return all actors who had an active relationship to a given entity on a specified date, and explain how the GiST index makes this efficient.
3. Describe the three passes of entity resolution (exact normalization, rule-based merge, fuzzy composite score) and explain what happens at each confidence threshold (auto-resolve, manual review queue, treated as distinct).
4. Explain when a TARG edge becomes Canon-attestable: what the source_doc_id FK provides and how it connects a relationship edge to a signed ObservationAttestation.

## The problem with a static graph

A relationship graph without temporal awareness is a photograph. It captures one state. If you load the full evidence corpus into a static graph and ask "who was the assigned caseworker?" you will get the caseworker who was last in that role, or the caseworker who appears most often in the corpus, or the caseworker who most recently left a footprint in the data. None of those answers may be correct for the date in question.

The consequences are not merely analytic.
A motion filed on the wrong factual predicate ("caseworker A was assigned at the time of the incident" when caseworker B was actually assigned) is an error with legal consequences. If it reaches a brief, it may later be corrected, with a cost to credibility. If it reaches a declaration or an exhibit list, it may be objected to. If opposing counsel catches it and you don't, it becomes impeachment material.

The TARG stores every relationship edge as a time-bounded interval. Each edge has a validity period: a start timestamp, an end timestamp (or open end, if still active), and the evidence basis for both bounds. A query for "who was the assigned caseworker on March 15, 2024?" returns a name plus the source record that establishes the validity period, traceable to a specific document in the corpus and attestable.

## Validity periods in Postgres: TSTZRANGE

Postgres has a native range type for timestamps with time zone: tstzrange. It stores the lower and upper bounds of an interval, including whether those bounds are inclusive or exclusive, and it is indexable with a GiST index. The overlap operator && returns true when two ranges share any point in time.

The TARG's relationship table uses this type for the validity period of every edge:

-- schema/C0_entities_resolution.sql (extended for TARG)

CREATE TABLE entity_relationships (
  id              uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  matter_id       uuid REFERENCES matters(id),
  from_entity_id  uuid NOT NULL REFERENCES entities(id),
  to_entity_id    uuid NOT NULL REFERENCES entities(id),
  rel_kind        text NOT NULL,    -- 'assigned_caseworker', 'supervisor', 'counsel', ...
  valid_during    tstzrange NOT NULL,
  confidence      numeric(4,3) DEFAULT 1.0,
  source_doc_id   uuid REFERENCES documents(id),
  notes           text
);

CREATE INDEX entity_rel_valid_gist_idx
  ON entity_relationships USING GIST (valid_during);

CREATE INDEX entity_rel_from_idx
  ON entity_relationships (from_entity_id);

CREATE INDEX entity_rel_to_idx
  ON entity_relationships (to_entity_id);

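The GiST index on valid_during is load-bearing. Without it, Postgres has to evaluate the overlap predicate row by row, so a query that filters only on valid_during degrades to a full table scan. With it, Postgres can descend the index to the edges that overlap a given date range, at a cost that grows roughly logarithmically with table size plus the number of matching rows.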

A query for all actors who had an active assigned relationship to a family between two specific dates looks like this:

SELECT
  e.canonical_label AS actor_name,
  er.rel_kind,
  er.valid_during,
  d.source_path AS source_document
FROM entity_relationships er
JOIN entities e ON e.id = er.from_entity_id
LEFT JOIN documents d ON d.id = er.source_doc_id
WHERE er.to_entity_id = '<<family-entity-id>>'
  AND er.rel_kind IN ('assigned_caseworker', 'supervisor', 'safety_plan_author')
  AND er.valid_during && tstzrange(
        '2024-03-01'::timestamptz,
        '2024-04-15'::timestamptz,
        '[)'
      )
ORDER BY lower(er.valid_during);
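To see what the && test in the query above is checking, the same overlap rule can be mimicked in plain Python. This is a minimal sketch, not part of the schema: HalfOpenRange and utc are hypothetical helpers, every range is assumed to use '[)' bounds, and empty ranges are ignored.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class HalfOpenRange:
    """In-memory stand-in for a tstzrange with '[)' bounds (hypothetical helper)."""
    lower: datetime
    upper: Optional[datetime]  # None models an open-ended (still active) edge

    def overlaps(self, other: "HalfOpenRange") -> bool:
        # Two half-open ranges overlap when each one starts before the other ends.
        starts_before_other_ends = other.upper is None or self.lower < other.upper
        other_starts_before_self_ends = self.upper is None or other.lower < self.upper
        return starts_before_other_ends and other_starts_before_self_ends


def utc(year: int, month: int, day: int) -> datetime:
    """Midnight UTC on the given day."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Same window as the SQL query: [2024-03-01, 2024-04-15)
query_window = HalfOpenRange(utc(2024, 3, 1), utc(2024, 4, 15))

# Illustrative edges chosen to sit exactly on the window's boundaries.
edge_ending_on_lower_bound = HalfOpenRange(utc(2023, 6, 1), utc(2024, 3, 1))
edge_inside_window = HalfOpenRange(utc(2024, 1, 15), utc(2024, 7, 22))
edge_starting_on_upper_bound = HalfOpenRange(utc(2024, 4, 15), None)

print(edge_ending_on_lower_bound.overlaps(query_window))    # False
print(edge_inside_window.overlaps(query_window))            # True
print(edge_starting_on_upper_bound.overlaps(query_window))  # False
```

The two boundary edges touch the window at a single instant but do not overlap it, because one side of that instant is always exclusive.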

The [) notation specifies an interval that is closed on the left (includes March 1) and open on the right (excludes April 15, treating it as a non-inclusive upper bound). Postgres's tstzrange respects these bound conventions exactly. Use them consistently or your "active on date X" queries will be off by one record at every boundary.

> ◆ Going Deeper — Bi-temporal modeling vs. valid-time-only.
>
> Database literature distinguishes two time axes: valid time (when a fact was true in the real world) and transaction time (when the system recorded that fact). A fully bi-temporal model lets you query "what did the system believe about this relationship on March 1, as recorded on February 15?" That is, it captures the state of the database's knowledge at any historical point, not just the state of the world.
>
> Meridian-Cannon uses valid-time-only for TARG edges. Transaction time is handled by a different layer: the audit_log table (hash-chained, append-only, 10_core.sql) records every write with a timestamp, and the Canon Attestations layer records the acquisition timestamp and the emission timestamp separately. If you ever need to answer "what did the system believe about this relationship when we emitted the attestation on April 5?", you can reconstruct it from the audit log and the attestation's issued_at field.
>
> Full bi-temporal modeling in the relationship table itself would add a second range column (e.g., system_period tstzrange) and require triggers to maintain it. It would also double the index overhead. For the current use cases (producing temporally correct timelines from a frozen evidence corpus), valid-time-only is sufficient. Bi-temporal tracking belongs in the audit layer, where it already lives.

## The running case: caseworker assignment history

In the 2026 TPR proceeding, the parent's DHS casefile spans three years and contains records from four caseworkers.
The transitions between caseworkers are documented in case notes, in assignment letters, and sometimes in nothing but the change of signature on successive reports.

At time T1 (2023-06-01), caseworker A is assigned. At T2 (2024-01-15), the case is reassigned to caseworker B; this is documented in a case note dated January 15. At T3 (2024-03-08), a safety plan is initiated, signed by caseworker B. At T4 (2024-07-22), the case is transferred to caseworker C, as noted in a letter produced in discovery.

The incident central to the proceeding occurs on March 22, 2024, fourteen days after the safety plan. The motion argues that the parent had no contact with an assigned caseworker in the thirty days preceding the incident.

A static graph does not answer this. A TARG does: it returns caseworker B as the entity with an active assigned_caseworker relationship to the family on every date from 2024-01-15 through 2024-07-21, including March 22. The query result is sourced: it cites the January 15 case note as the lower bound and the July 22 letter as the upper bound. Both documents are in the corpus, both are hashed on ingestion, and the relationship edge itself cites their document IDs. The motion's factual premise is falsifiable in a single query.

> ▼ Why It Matters.
>
> In a contested TPR proceeding, the evidentiary record is usually assembled by the petitioning agency. The parent's access to that record comes through discovery, which is frequently incomplete, late-produced, or voluminous enough to be functionally inaccessible. A TARG built from the produced record inverts that asymmetry: it makes the corpus queryable in a way that no individual document is. A factual claim about what happened on a specific date can be tested against every relationship edge in the corpus in under a second.
>
> For a pro-se parent with fifteen minutes before a hearing, that difference is not academic. The query either confirms or refutes the claim. Either outcome changes the hearing.
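The point-in-time lookup from the running case can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the schema's query: dates stand in for timestamptz values, ASSIGNMENTS and assigned_on are hypothetical names, each window is half-open, and caseworker C's window is left open because the text documents no later transfer.

```python
from datetime import date, timedelta

# Edges from the running case as (actor, start, end), each a half-open [start, end) window.
# An end of None means the assignment is still open.
ASSIGNMENTS = [
    ("caseworker A", date(2023, 6, 1), date(2024, 1, 15)),
    ("caseworker B", date(2024, 1, 15), date(2024, 7, 22)),
    ("caseworker C", date(2024, 7, 22), None),
]


def assigned_on(day):
    """Return the actors whose assignment window contains the given day."""
    return [
        actor
        for actor, start, end in ASSIGNMENTS
        if start <= day and (end is None or day < end)
    ]


incident = date(2024, 3, 22)
print(assigned_on(incident))  # ['caseworker B']

# Check each of the thirty days preceding the incident for an assigned caseworker.
window = [incident - timedelta(days=n) for n in range(1, 31)]
days_without_caseworker = [d for d in window if not assigned_on(d)]
print(days_without_caseworker)  # []
```

The lookup only establishes that an assignment was active on each day. Whether contact actually occurred is a separate question that the relationship edges alone do not answer.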
## Entity resolution: the prerequisite for a usable TARG

A TARG is only as good as its entity resolution. If "caseworker B" appears in the corpus under seven different surface forms ("Ms. Johnson," "Sarah Johnson," "S. Johnson," "Sarah J.," "the caseworker," "CW Johnson," and "worker") and those surface forms are not resolved to a single canonical entity ID, the TARG cannot answer temporal queries about caseworker B. Each surface form becomes its own node with its own disconnected edges.

Entity resolution maps surface text to canonical entity IDs. Meridian-Cannon implements it in three passes:

Pass 1 — Exact normalization. entity_norm() (defined in C0_entities_resolution.sql) normalizes both the surface text and the canonical label. Exact match resolves immediately. This catches "Sarah Johnson" vs. "sarah johnson" vs. "Sarah  Johnson" (double space).
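As a rough illustration of Pass 1, the sketch below normalizes case and whitespace before an exact lookup. It is not the repository's entity_norm(): normalize_surface, exact_match, the registry dict and the entity ID are all hypothetical, and the real function may apply a different set of normalizations.

```python
import re
import unicodedata
from typing import Dict, Optional


def normalize_surface(text: str) -> str:
    """Hypothetical normalizer: Unicode-fold, lowercase, collapse runs of whitespace."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return re.sub(r"\s+", " ", folded).strip()


def exact_match(surface: str, canonical_labels: Dict[str, str]) -> Optional[str]:
    """Return the entity ID whose normalized label equals the normalized surface text."""
    return canonical_labels.get(normalize_surface(surface))


# Registry keyed by normalized canonical label; the ID is a made-up placeholder.
registry = {normalize_surface("Sarah Johnson"): "entity-001"}

for surface in ["Sarah Johnson", "sarah johnson", "Sarah  Johnson", "S. Johnson"]:
    print(repr(surface), "->", exact_match(surface, registry))
```

The first three surface forms resolve to the same placeholder ID. "S. Johnson" returns None here, which is the kind of miss the later passes exist to handle.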

Pass 2 — Rule-based merge. Domain-specific rules: token subset matching (a name whose tokens are a subset of a longer form of the same name), initial matching ("S. Johnson" matches "Sarah Johnson" when both initial and surname match), and nickname lookup (a YAML table maps "Grandma" to the grandmother's canonical name, "Ms. J" to the caseworker's where context permits). This pass resolves most within-document variation.
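The three Pass 2 rules can be sketched as small predicates. Everything here is an assumption made for illustration: the function names, the order in which rules fire, and the NICKNAMES dict standing in for the YAML table. A real pipeline would also need the contextual checks the text mentions before accepting a bare surname or nickname.

```python
from typing import List, Optional


def tokens(name: str) -> List[str]:
    """Lowercase tokens with trailing periods removed, e.g. 'S. Johnson' -> ['s', 'johnson']."""
    cleaned = (t.rstrip(".").lower() for t in name.split())
    return [t for t in cleaned if t]


def initial_match(short: str, full: str) -> bool:
    """'S. Johnson' matches 'Sarah Johnson': same surname, and the initial matches the given name."""
    s, f = tokens(short), tokens(full)
    if len(s) != 2 or len(f) < 2:
        return False
    return len(s[0]) == 1 and s[0] == f[0][0] and s[-1] == f[-1]


def token_subset_match(short: str, full: str) -> bool:
    """Every token of the shorter form appears in the longer form."""
    s, f = set(tokens(short)), set(tokens(full))
    return bool(s) and s < f


# Stand-in for the YAML nickname table; the entry is illustrative only.
NICKNAMES = {"ms. j": "Sarah Johnson"}


def rule_based_merge(surface: str, canonical: str) -> Optional[str]:
    """Return the name of the rule that fired, or None if no rule applies."""
    if NICKNAMES.get(surface.lower()) == canonical:
        return "nickname"
    if initial_match(surface, canonical):
        return "initial"
    if token_subset_match(surface, canonical):
        return "token_subset"
    return None


for surface in ["S. Johnson", "Ms. J", "Johnson", "Robert Johnson"]:
    print(repr(surface), "->", rule_based_merge(surface, "Sarah Johnson"))
```

Returning the rule name rather than a bare boolean keeps the decision explainable, which matters once each resolution is written to an audit table.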

Pass 3 — Fuzzy merge. Each remaining candidate pair gets a composite score built from two signals: phonetic similarity, using double-Metaphone on the surname, and string similarity, using Jaro-Winkler on the full normalized string. The composite score is:

composite = 0.55 * (metaphone_surname_match ? 1.0 : 0.0)
          + 0.45 * jaro_winkler(norm_a, norm_b)

Candidates with composite ≥ 0.85 auto-resolve. Candidates in [0.70, 0.85) go to a manual review queue. Candidates below 0.70 are treated as distinct entities. Every decision — auto or manual — is written to the entity_resolutions audit table with the action code, confidence, and source reference. The audit table is queryable; a human can review every resolution decision for any entity and roll it back if wrong.
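The arithmetic behind these thresholds is worth working through once. The sketch below takes the two inputs as given (it does not implement double-Metaphone or Jaro-Winkler) and applies the weights and cutoffs stated above. The function names and the returned labels are hypothetical; they are not the action codes stored in entity_resolutions.

```python
AUTO_RESOLVE = 0.85
MANUAL_REVIEW = 0.70


def composite_score(metaphone_surname_match: bool, jaro_winkler: float) -> float:
    """Weighted sum from the text: 0.55 for a phonetic surname match, 0.45 * Jaro-Winkler."""
    return 0.55 * (1.0 if metaphone_surname_match else 0.0) + 0.45 * jaro_winkler


def decide(metaphone_surname_match: bool, jaro_winkler: float) -> str:
    """Map a composite score to one of the three outcomes."""
    score = composite_score(metaphone_surname_match, jaro_winkler)
    if score >= AUTO_RESOLVE:
        return "auto_resolve"
    if score >= MANUAL_REVIEW:
        return "manual_review"
    return "distinct"


print(decide(True, 0.90))   # auto_resolve   (0.55 + 0.405 = 0.955)
print(decide(True, 0.50))   # manual_review  (0.55 + 0.225 = 0.775)
print(decide(True, 0.20))   # distinct       (0.55 + 0.090 = 0.640)
print(decide(False, 1.00))  # distinct       (0.00 + 0.450 = 0.450)
```

Two consequences follow from the weights. Without a phonetic surname match the score cannot exceed 0.45, so such a pair never reaches the review queue, however similar the strings are. With a phonetic match, auto-resolve requires a Jaro-Winkler score of at least (0.85 − 0.55) / 0.45 ≈ 0.667, and manual review requires at least (0.70 − 0.55) / 0.45 ≈ 0.333.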

%%| label: fig-targ-passes
%%| fig-cap: "TARG three-pass entity resolution: surface form to canonical ID"
flowchart TD
    IN["Surface-form string\ne.g. 'A. Mercer', 'Alex Mercer', 'Alex'"]

    P1["**Pass 1 — Exact match**\nLookup in entity registry\nby normalized name"]
    P1_HIT(["✅ Hit → canonical_id\nreturn immediately"])
    P1_MISS["Miss → continue"]

    P2["**Pass 2 — Fuzzy match**\nJaro-Winkler + alias table\nthreshold 0.92"]
    P2_HIT(["✅ Hit → canonical_id\n+ similarity score recorded"])
    P2_MISS["Miss → continue"]

    P3["**Pass 3 — Embedding ANN**\nNearest-neighbor in pgvector\nStreamingDiskANN index"]
    P3_HIT(["✅ Hit → provisional_id\nflagged for human review"])
    P3_MISS(["⚠ No match\nCreate new entity stub"])

    IN --> P1 --> P1_HIT
    P1 --> P1_MISS --> P2 --> P2_HIT
    P2 --> P2_MISS --> P3 --> P3_HIT
    P3 --> P3_MISS

    style P1_HIT fill:#10b981,color:#fff
    style P2_HIT fill:#10b981,color:#fff
    style P3_HIT fill:#f59e0b,color:#14181F
    style P3_MISS fill:#6b7280,color:#fff

In the Wild — The Theranos investigation.

When the SEC and DOJ built their cases against Elizabeth Holmes, one of the central evidentiary challenges was reconstructing who knew what and when. Holmes's email exchanges created a dense web of relationships — investors, board members, scientific advisors, laboratory staff, regulators — that shifted over time as advisors distanced themselves from the company, board members resigned, and the scientific staff turned over.

Investigators reconstructed those timelines by hand from millions of pages of discovery. A TARG built from the produced documents would have made those timelines queryable: "which board members had an active relationship to the laboratory operations group in Q2 2014, when the specific test results at issue were certified?" The answer would have come from the corpus itself, not from a paralegal's reconstruction of a paralegal's reconstruction.

The lesson is not that Theranos investigators failed. The lesson is that the hand-reconstruction approach is fragile, expensive, and produces timelines that opposing counsel can pick apart one relationship at a time. A queryable TARG produces the same timeline with a source citation for every edge.

Phone number reuse: why handles need validity periods too

A handle — a phone number, an email address, a username — is not a permanent property of a person. Phone numbers are reassigned. Email accounts are deleted and recreated. Usernames change across platform migrations.

Meridian-Cannon's party_handles table (10_core.sql) stores the handle kind, the handle string, and first_seen_at / last_seen_at timestamps. For most use cases, this is sufficient: the handle was first seen in a document dated January 2023 and last seen in a document dated November 2023.

The TARG extends this for cases where reuse is known or suspected. If the corpus contains evidence that a phone number belonged to actor A through August 2023 and to actor B from September 2023 onward, those two associations should be stored as two party_handles rows (one for A with last_seen_at before the reuse date, one for B with first_seen_at after it), not as two rows both currently associated with the same handle.

This matters for communications evidence. If a text message sent from a number in October 2023 is attributed to actor A because that number appears in A's contact record, but the number had been reassigned to actor B in September, the attribution is wrong. The TARG catches this if the validity windows are set correctly and the query is temporal.

> ✻ Try This.
>
> Given two documented events: (1) caseworker B is assigned on 2024-01-15, documented in a case note; (2) a safety plan is initiated on 2024-03-08, signed by caseworker B. Write a SQL query using tstzrange and the && operator that returns all actors who had an active relationship to the family between those two dates, that is, all actors whose relationship validity window overlaps the interval [2024-01-15, 2024-03-08). Run this against the TARG you build in Stretch Exercise 6. Confirm that caseworker B appears in the results and that the query also surfaces any other actors (supervisor, family support worker) with overlapping validity windows. Now modify the query to restrict to a single relationship kind and verify that caseworker A, whose assignment ended in January, does not appear.

## The attestation connection

A TARG query result is useful in court only if it is attestable.
An opposing party who receives a printout of a query result will ask: "How do we know this query ran against the original records? How do we know the timestamps weren't set by whoever built this system?"

The Canon layer answers this. When a TARG query returns a relationship edge, the edge's source_doc_id points to a document in the documents table. That document was hashed on ingestion, and an ObservationAttestation was emitted at ingestion time, sealing the hash, the acquisition timestamp, and the chain hash into a signed artifact. The relationship edge is traceable to that attestation, which any recipient can verify independently.

A database printout tells you what the system says. The attestation chain lets any recipient verify that the system's answer derives from documents whose content has not changed since acquisition.

## The entity resolution audit trail as evidence

Every resolution decision in entity_resolutions is itself a record that can be produced in discovery. If opposing counsel challenges whether "Ms. Johnson" was correctly resolved to caseworker B rather than to some other Sarah Johnson, the resolution log shows the specific algorithm version, the confidence score, the matched surface text, the source document reference, and whether a human reviewed it.

This is the right posture. Do not hide the resolution logic inside an opaque pipeline that produces a canonical graph with no provenance. Put the decisions in a table, source each decision to a document, and let the table be audited. The graph is only as trustworthy as the audit trail that produced it.

> § For the Record — FRE 901(b)(9).
>
> "Evidence describing a process or system and showing that it produces an accurate result" may authenticate a document generated by that process or system. Federal Rule of Evidence 901(b)(9).
> A TARG that surfaces its resolution decisions, sources every edge to a document, and emits Canon Attestations for every acquisition satisfies this standard more completely than a static database report or a manually assembled timeline. The process is described; the system's accuracy is verifiable by any recipient.

## Building the TARG: practical sequence

The TARG is not built in one pass. It accumulates as documents are ingested. The practical sequence:

1. Ingest and hash all source documents. Every document gets a source_hash and an ObservationAttestation before any processing begins.
2. Run entity extraction on each document's chunks. Named entity recognition surfaces mentions of persons, organizations, phone numbers, and email addresses.
3. Run entity resolution (three passes) to map each mention to a canonical entity ID or create a new entity if no match exists.
4. For each resolved mention that implies a relationship (caseworker-to-family, supervisor-to-caseworker, attorney-to-client), extract the temporal bounds from the document context: the document date, surrounding references to start or end of assignment, explicit dates in the text.
5. Write the relationship edge with its validity period and source_doc_id to entity_relationships.
6. Periodically re-run resolution as new documents arrive. New documents may resolve previously ambiguous surface forms.
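Steps 4 and 5 can be sketched for the simplest case, where each dated assignment event closes the previous assignment. This is a minimal sketch under loud assumptions: AssignmentEvent, Edge and edges_from_events are hypothetical names, the document references are placeholders, only one assignee is assumed at a time, and the separate start_doc and end_doc fields are an illustration (the entity_relationships table shown earlier carries a single source_doc_id).

```python
from datetime import date
from typing import List, NamedTuple, Optional


class AssignmentEvent(NamedTuple):
    actor: str
    effective: date
    source_doc: str  # placeholder document reference, not a real ID


class Edge(NamedTuple):
    actor: str
    start: date
    end: Optional[date]      # None means the edge is still open
    start_doc: str           # document that establishes the lower bound
    end_doc: Optional[str]   # document that establishes the upper bound


def edges_from_events(events: List[AssignmentEvent]) -> List[Edge]:
    """Close each assignment at the start of the next one; leave the last one open."""
    ordered = sorted(events, key=lambda e: e.effective)
    edges = []
    for current, following in zip(ordered, ordered[1:] + [None]):
        edges.append(Edge(
            actor=current.actor,
            start=current.effective,
            end=following.effective if following else None,
            start_doc=current.source_doc,
            end_doc=following.source_doc if following else None,
        ))
    return edges


events = [
    AssignmentEvent("caseworker B", date(2024, 1, 15), "case-note-2024-01-15"),
    AssignmentEvent("caseworker C", date(2024, 7, 22), "letter-2024-07-22"),
    AssignmentEvent("caseworker A", date(2023, 6, 1), "assignment-record-A"),
]

for edge in edges_from_events(events):
    print(edge)
```

Sorting first means the events can arrive in any order, which fits a corpus where documents are ingested out of sequence. Each upper bound is exclusive, so the reassignment date belongs to the incoming caseworker.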

💡 Key Takeaways

- TARG (Time-Aware Relationship Graph) answers not "who is connected to whom" but "who was connected to whom, in what role, on a specific date"; every relationship edge carries a tstzrange validity window and a source_doc_id FK to a document sealed by a signed ObservationAttestation.
- Three-pass resolution (exact normalization → rule-based fuzzy merge → embedding ANN) is preferable to single-step because each pass catches a qualitatively different class of surface-form variation; Pass 3 uses a StreamingDiskANN index over pgvector embeddings for sub-linear approximate nearest-neighbor lookup.
- A provisional_id is assigned when Pass 3 produces a near-match below the auto-resolve threshold; it flags the resolution for human review rather than auto-merging, preventing false entity links from propagating through the attestation graph.
- The StreamingDiskANN index used in Pass 3 (provided by the pgvectorscale extension, which builds on pgvector) enables embedding-space entity matching at scale without loading the full index into RAM, critical when the entity registry spans millions of canonical labels.
- Entity resolution errors compound downstream: a false merge creates a single entity node whose entity_relationships edges belong to two different real-world people, making every temporal query over that node return factually incorrect results.
Steps 4 and 5 are where judgment is required. Not every document states a relationship's validity period explicitly. A case note dated March 8, signed by caseworker B, implies that B was active on March 8 but says nothing about when B's assignment began. The lower bound of B's validity period comes from a different document: the January 15 case note. Connecting those two bounds requires the pipeline to correlate documents, which is exactly what the corroboration_links table (in 90_workers_correlations.sql) is built for.

## Exercises

### Warm-up

1. Open schema/C0_entities_resolution.sql. Read the entity_norm() function. What normalizations does it apply? What normalizations does it not apply (per the comment)? Why is the split between SQL and Python deliberate?
2. Examine the entity_resolutions table's action column constraint. List all valid action codes. For each, describe the scenario in which that action code would be written.

### Core

3. Write a SQL query against entity_relationships that returns all active caseworker-to-family relationships as of a specific date. Use tstzrange and the @> (contains) operator rather than &&. How does the result differ from using &&?
4. Add a row to entity_resolutions for a surface text "Ms. J" being resolved to a caseworker entity. What fields are required? Which are optional but should be populated for a complete audit trail?
5. Write a SQL query against the entity_relationships table (or your local TARG implementation) that returns every edge whose validity window overlaps with the date 2026-01-15. Use the && operator on the valid_during column, passing a range that covers that date. Verify that a relationship with valid_during = '[2025-12-01, 2026-01-01)' does NOT appear in the results.

### Stretch

6. Create a small self-contained test dataset directly in Python: define five entity-relationship records where phone number +1-715-555-0147 is associated with Actor X for the period [2023-01-01, 2023-08-31) and with Actor Y from [2023-09-15, ∞).
Insert these rows into a local entity_relationships table (or a Python dict keyed by tstzrange bounds). Implement the TARG overlap query using the && operator (or its Python equivalent for in-memory testing). Verify that a message sent on 2023-11-01 from that number resolves to Actor Y, and that a message sent on 2023-06-15 resolves to Actor X. Confirm that a message sent during the gap (2023-09-01 to 2023-09-14) returns no matching actor.
7. Propose a schema extension that would add a resolution_evidence JSONB column to entity_relationships alongside the existing confidence field. When would confidence < 1.0 be appropriate? How should a TARG query signal to a downstream system that a relationship edge is low-confidence?
8. Design a trigger or scheduled job that writes to the audit_log table whenever a new entity_relationships row is inserted. What fields should the payload include to make the audit entry self-sufficient?

## Build-your-own prompt

For your capstone matter: identify three events in your corpus where the identity of an actor on a specific date is contested or unclear. Build TARG edges for each event, sourced to specific documents. Then write a query that answers "who was responsible for [X] on [date Y]?" for each. The goal is to produce three query results, each with a document source ID, that together constitute a temporally grounded account of what happened and who was responsible.

## Refutation Harness API (v0.2.0)

The TARG chapter sits alongside the adversarial refutation harness; both feed into the attestation layer. In v0.2.0, the run_harness() function signature has been updated:

def run_harness(
    attestation: Attestation,
    backend: str = "native",
    langfuse_session_id: Optional[str] = None,
) -> Refutation:
    ...

backend selects the LLM adapter used to run challenger models:

| Value | Adapter | Characteristics |
|---|---|---|
| "native" | EchoAdapter | Deterministic, no external calls; used in tests |
| "ollama" | OllamaAdapter | Local model via Ollama; no API key required |
| "openai" | OpenAIAdapter | OpenAI API; no additional dependencies |
| "inspect" | run_adversarial_inspect() | Routes to inspect-ai (see Chapter 19); requires pip install meridian-canon[inspect] |
| "litellm" | LiteLLMAdapter | 100+ providers via litellm; use only at the pipeline layer |

The LiteLLMAdapter is available only when the [pipeline] extra is installed. It should not be used in per-attestation core library code: its broad provider surface and heavier dependency footprint are appropriate when orchestrating a full pipeline run, not when emitting a single attestation.

langfuse_session_id links every LLM call made during the harness run to a single Langfuse observability session. When provided, each challenger model call is instrumented with the session ID, making the full call tree (which model, which prompt, which response, latency, token counts) queryable in the Langfuse UI. When omitted, no Langfuse instrumentation is added.

from meridian.refute.harness import run_harness

refutation = run_harness(
    attestation,
    backend="ollama",
    langfuse_session_id="session-2026-0315-001",
)

Further reading

  • Snodgrass, Richard. Developing Time-Oriented Database Applications in SQL. Morgan Kaufmann, 1999. Free PDF at cs.arizona.edu. The foundational reference for valid-time and bi-temporal database design.
  • Johnston, Tom, and Randall Weis. Managing Time in Relational Databases. Morgan Kaufmann, 2010.
  • Postgres documentation: Range Types. Chapter 8.17. The canonical reference for tstzrange, GiST indexes, and the &&, @>, and -|- operators.
  • schema/C0_entities_resolution.sql in this repository.
  • schema/90_workers_correlations.sql: the corroboration_links table, which sources validity-period bounds across documents.
  • meridian/witness/wrapper.py: attest_acquisition(), which emits the ObservationAttestation that grounds every TARG edge in a signed artifact.

Next: Chapter 18 — Epistemic Neutrality Masking. What happens when the extraction is biased before it reaches the attestation layer.