
Part II — CS Building Blocks · Chapter 18

Information Extraction with Local LLMs

A schema-bound LLM is the closest thing the field has to a structured-extraction primitive. The schema is what saves you from prose.

Prerequisites

Before reading this chapter, you should be comfortable with: Chapters 8–10 (Schemas, Embeddings, Hybrid Retrieval). LLM extraction produces claims that feed into the attestation schema; retrieval-augmented extraction depends on hybrid search.

By Chapter 10, the retrieval layer has done its job. The attorney has a ranked set of chunks — SMS threads, email fragments, call log entries — that the system believes are relevant to the ten messages she is looking for. The question now is different. Not which chunks are relevant, but what facts do those chunks assert, in a form that a downstream verifier can check?

Extraction means taking unstructured text and producing a structured object: named entities, timestamps, event types, relational claims, confidence labels. Done badly, it is summarization with extra steps. Done correctly, it produces Canon Claims that carry their inference type, their source, and their uncertainty — claims a verifier can challenge without asking you to explain what the model meant.

The tool is a language model. But which model, run where, producing output in what form, subject to what post-processing — those choices determine whether the output is useful evidence or expensive hallucination.

At a glance

  • Local LLMs (Meridian-Cannon defaults to qwen2.5-7b-instruct via vLLM) keep privileged content off cloud APIs, pin the model version, and produce throughput in the hundreds of documents per minute on modern hardware.
  • Per-type extractors — one each for email, SMS, voicemail, audio memo, call record, and PDF — define a Pydantic-validated output schema, a constrained-generation prompt, and a post-processor that maps results to Canon-conformant Claims.
  • Outlines (pip install meridian-canon[outlines]) provides constrained decoding: the model cannot produce invalid JSON because invalid tokens are masked before sampling. This eliminates retry loops for extraction. Graceful fallback to standard generation if Outlines is not installed.
  • Two lightweight adapters ship in the core package without extra dependencies: OllamaAdapter (local Ollama) and OpenAIAdapter (any OpenAI-compatible API). LiteLLMAdapter (100+ providers) belongs in the pipeline layer, not in per-attestation extraction.
  • Epistemic Neutrality Masking (ENM) strips subjective, interpretive, and legally conclusory language from extracted claims before they enter the attestation layer. The source text is preserved verbatim; only the structured extraction is masked.

Learning objectives

After this chapter you can:

  • Explain why sending privileged documents to a cloud API may constitute a waiver and how local inference eliminates that risk.
  • Design a per-type extractor: Pydantic output schema, prompt template, constrained-decoding configuration.
  • Apply the ENM taxonomy to raw extractor output and produce masked Claims with appropriate inference_type labels.
  • Run a vLLM inference server for batched local extraction and connect it to a Python extractor.
  • Map extractor output to Canon FindingsBlock Claims with gap disclosure (R4 and R5 compliance).

Why local inference

The attorney has fifteen minutes. She has identified a set of SMS messages she needs. Some of them may contain details about custody exchange arrangements — times, locations, named parties, specific commitments. She needs those details extracted, structured, and inserted into an attestation she can hand to the court.

The naive path: paste the messages into ChatGPT, ask it to produce a JSON summary, copy the output. This path has three failure modes, each serious.

Privilege waiver. Attorney-client privilege is not an all-or-nothing designation. It applies to communications. When a document is transmitted to a third party outside the privilege — including a cloud API provider — courts have held that the act of transmission can constitute waiver if it was not reasonably protective of the confidence. The question in most jurisdictions is whether the disclosure was voluntary and whether the holder took reasonable precautions. Sending a privileged email to a commercial API server, under the API provider's standard terms (which reserve the right to use inputs for training), is a fact pattern that opposing counsel will raise. Whether a court agrees is contested. Whether you want to litigate it under time pressure is not.

Model-version instability. Extraction results depend on the model. GPT-4o-2024-11-20 may produce different structured output than GPT-4o-2025-03-15 on the same input. If you run extraction at ingestion time and verify claims at trial, and OpenAI has updated the model in between, you cannot reproduce the extraction. This violates the Canon's requirement that evidence be independently replicable.

Vendor dependency. An API key can be revoked, rate-limited, or discontinued. Evidence systems that depend on a live API for their extraction layer are fragile during exactly the moments when reliability matters most — when deadlines are imminent and the case is active.

Local inference eliminates all three. The model is pinned at a specific revision, runs on hardware the evidence system controls, and never transmits content off-device. Throughput is adequate for litigation-scale corpora: a 7B-parameter model on a single A100 processes roughly 150 documents per minute for SMS-length texts. That is the entire Meridian-Cannon corpus in under an hour.

Why It Matters — The privilege question is not hypothetical.

In 2024, the American Bar Association's formal opinion on AI tools (Formal Opinion 512) stated that lawyers must analyze whether using a particular AI tool provides adequate safeguards for client confidentiality, and must obtain client consent where adequate safeguards cannot be assured. Several state bars have issued similar guidance. A law firm that ingests a client's privileged communications through a commercial API without a specific data-processing agreement and explicit consent has a professional responsibility problem, independent of whether any waiver claim succeeds in court. Local inference is not a luxury; for many practitioners, it is the only option that clears this bar.

vLLM as the inference engine

vLLM (Kwon et al., 2023) is the production inference engine for Meridian-Cannon's extraction layer. It implements PagedAttention — a memory management approach that treats the KV cache as virtual memory, allowing much higher throughput than naive batch inference. For extraction workloads, where many documents can be processed as independent parallel requests, this matters: vLLM's continuous batching saturates GPU utilization in ways that sequential inference cannot.

The extraction layer connects to vLLM via its OpenAI-compatible API endpoint, which means the extractor code does not depend on any vLLM-specific SDK:

# workers/jobs/sms_extractor.py — simplified.
# PROMPT_TEMPLATE and SMSExtraction are defined elsewhere in this module.
import json

import openai

client = openai.OpenAI(
    base_url="http://localhost:8000/v1",  # local vLLM server
    api_key="not-used",
)

def extract_sms(text: str) -> SMSExtraction:
    response = client.chat.completions.create(
        model="qwen2.5-7b-instruct",
        messages=[{"role": "user", "content": PROMPT_TEMPLATE.format(text=text)}],
        response_format={"type": "json_object"},
        temperature=0.0,
    )
    raw = json.loads(response.choices[0].message.content)
    return SMSExtraction.model_validate(raw)

temperature=0.0 is deliberate. Extraction is not a creative task. The model should produce the same output on the same input every time. Replay variance — the degree to which a model diverges from its own previous output on identical inputs — is a reliability signal worth measuring; Chapter 12 discusses it in the context of adversarial validation.

◆ Going Deeper — Constrained decoding: Outlines, grammar mode, and JSON mode.

vLLM's response_format={"type": "json_object"} constrains the output to valid JSON but does not enforce your specific schema. The model can produce JSON that validates as syntactically correct but fails your Pydantic model. Two stronger options exist:

Outlines (Willard & Louf, 2023) compiles a Pydantic schema to a finite-state machine and uses it to mask invalid tokens at each sampling step. The model cannot produce an invalid output — not just "probably won't." This eliminates the need for retry loops or post-hoc repair entirely. Install: pip install meridian-canon[outlines]. Use OutlinesExtractor from meridian.findings.outlines_extractor. If Outlines is not installed, the extractor falls back to standard generation + Pydantic validation, with the existing retry-once-then-null-extraction behavior.

llama.cpp grammar mode achieves the same result through GGML grammar specification — a BNF-like format compiled to a pushdown automaton over the vocabulary. For environments where llama.cpp is preferable to vLLM (edge deployment, no GPU), this is the equivalent mechanism.

For deployments without Outlines, Meridian-Cannon uses response_format for speed and wraps the result in a Pydantic model_validate call that raises ValidationError on failure. Failed extractions are logged, retried once with a corrective prompt, and if they fail again, they are emitted as null extractions with a gap record rather than silently dropped.

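
The fallback path described above (validate, retry once with a corrective prompt, then emit a null extraction with a gap record) can be sketched without a model attached. This is a minimal sketch, not Meridian-Cannon's runner: call_model, validate_sms, the corrective prompt wording, and the "high" severity are hypothetical, and a plain ValueError stands in for Pydantic's ValidationError so the example runs on the standard library alone.

```python
import json
from typing import Callable

REQUIRED_FIELDS = ("participants", "message_timestamp", "original_text")


def validate_sms(raw: str) -> dict:
    """Stand-in for SMSExtraction.model_validate: parse JSON, check required keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level JSON value must be an object")
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return data


def extract_with_retry(text: str, call_model: Callable[[str], str]) -> dict:
    """Validate the model output, retry once, then fall back to a null extraction."""
    prompt = f"Extract the SMS below into the JSON schema.\n\nMessage: {text}"
    for _ in range(2):  # first attempt plus exactly one retry
        try:
            return validate_sms(call_model(prompt))
        except ValueError as exc:
            # Corrective prompt: show the model why its output was rejected.
            prompt = (
                f"Your previous output was rejected: {exc}\n"
                f"Return only a JSON object matching the schema.\n\nMessage: {text}"
            )
    # Both attempts failed: keep the source text and record why nothing was extracted.
    return {
        "participants": [],
        "message_timestamp": None,
        "original_text": text,
        "gaps": [{
            "category": "EXTRACTOR_FAILURE",
            "description": "Model output failed validation twice.",
            "severity": "high",
        }],
    }


# A fake model that answers with prose first, then with valid JSON.
replies = iter([
    "Sure! Here is the JSON you asked for:",
    json.dumps({"participants": ["715-555-0183"],
                "message_timestamp": "2026-01-09T13:41:22",
                "original_text": "I'll be there at 4 not 3"}),
])
result = extract_with_retry("I'll be there at 4 not 3", lambda prompt: next(replies))
```

The point of the null extraction is that a failed document still leaves a record: the source text survives and the gap explains the absence of claims.
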
Adapter hierarchy for per-attestation extraction

Meridian-Cannon ships two adapters in the core package that have no extra dependencies beyond the standard library (urllib):

| Adapter | When to use | Install |
|---|---|---|
| OllamaAdapter | Local Ollama server | included in core |
| OpenAIAdapter | Any OpenAI-compatible API (vLLM, LM Studio, hosted OpenAI) | included in core |

Both adapters are in meridian.findings and use urllib for HTTP — no openai SDK, no httpx, no requests. This keeps the per-attestation extraction layer dependency-free.

LiteLLMAdapter supports 100+ providers and is powerful for multi-provider pipelines, but it adds substantial transitive dependencies (the full litellm package). It belongs in the pipeline layer (meridian/pipeline/), not in per-attestation extraction work. If you find yourself writing from meridian.findings import LiteLLMAdapter for a per-attestation extractor, you are in the wrong layer.

For per-attestation work the correct install is:

pip install meridian-canon[outlines]

This installs the core package plus Outlines constrained decoding. It does not install litellm. If you need litellm for a pipeline-layer multi-provider workflow, install it explicitly in that layer's requirements.
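
To make the "standard library only" point concrete, here is roughly what a urllib-based call to an OpenAI-compatible chat endpoint involves. This is a sketch under assumptions, not the OpenAIAdapter source: build_chat_request and send are hypothetical names, and the payload mirrors the openai-SDK example earlier in the chapter. Nothing is sent when the block runs; send is only defined.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request using only the stdlib."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object"},
        "temperature": 0.0,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 60.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["choices"][0]["message"]["content"]


request = build_chat_request(
    "http://localhost:8000/v1", "qwen2.5-7b-instruct", "Extract: see you at 4"
)
```

Separating request construction from sending keeps the payload testable without a running server.
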

Per-type extractors

No single extraction prompt works well across document types. An email has a MIME structure: sender, recipients, timestamp in the Date: header, subject, body, quoted text, attachments. An SMS has participants (inferred from context in most phone exports), a timestamp, and a short message. A voicemail has a transcript (produced by the audio pipeline in Chapter 13) and a duration. A PDF page may have section numbers, captions, table entries, and cross-references.

Meridian-Cannon defines six extractors:

| Source type | Output schema | Key entities extracted |
|---|---|---|
| email | EmailExtraction | Sender, recipients (to/cc/bcc), date, subject, body entities, attachments |
| sms | SMSExtraction | Participants, timestamp, locations mentioned, scheduled events, references to persons |
| voicemail | VoicemailExtraction | Caller identity (if deducible), duration, transcript entities, urgency markers |
| audio_memo | AudioMemoExtraction | Speaker, recording date, locations, named persons, events described |
| call_record | CallExtraction | Participants, duration, initiation direction, outcome (connected/dropped/voicemail) |
| pdf_page | PDFPageExtraction | Section, date, parties named, case numbers, legal citations, dollar amounts |

Each schema is a Pydantic model. Each field carries a description that instructs the model what to produce in that field. The description is part of the prompt — vLLM's JSON mode includes the schema in the system context automatically when using the function-calling interface.

# meridian/canon/findings/schemas.py — SMS schema (simplified).
from pydantic import BaseModel, Field
from typing import Optional

class SMSEvent(BaseModel):
    event_type: str = Field(description="e.g. custody_exchange, meeting, pickup, dropoff")
    described_time: Optional[str] = Field(description="time as stated in message, verbatim")
    described_location: Optional[str] = Field(description="location as stated, verbatim")

class SMSExtraction(BaseModel):
    participants: list[str] = Field(description="phone numbers or names as they appear")
    message_timestamp: str = Field(description="ISO 8601 timestamp if recoverable")
    events: list[SMSEvent] = Field(default_factory=list)
    persons_mentioned: list[str] = Field(default_factory=list)
    locations_mentioned: list[str] = Field(default_factory=list)
    original_text: str = Field(description="source text verbatim, unchanged")
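
The original_text field in the schema above is what lets a verifier tie an extraction back to the source bytes. A minimal sketch of that check follows. It assumes SHA-256 over the UTF-8 encoding of the message, which is a guess: the Canon's actual hash algorithm and canonicalization rules are not given in this chapter. check_source_chain is a hypothetical helper.

```python
import hashlib


def content_hash(source_text: str) -> str:
    """Hash the source bytes. SHA-256 over UTF-8 is an assumption for this sketch."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def check_source_chain(extraction: dict, observation_hash: str) -> bool:
    """True only if the extractor's original_text hashes to the observation."""
    return content_hash(extraction["original_text"]) == observation_hash


source = "we need to move the pickup, I'll be there at 4 not 3"
observation_hash = content_hash(source)

# One extraction returns the text untouched; the other tidies punctuation and case.
faithful = {"original_text": source}
paraphrased = {"original_text": "We need to move the pickup. I'll be there at 4, not 3."}
```

A model that tidies capitalization or punctuation breaks the chain even though the meaning is unchanged, which is why the field has to be verbatim.
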

The original_text field is not optional. The extractor is required to return the source verbatim. This is how the Canon's R4 requirement is satisfied: every Claim must trace to a specific Observation, and every Observation must hash to the source bytes. Losing the source text in extraction would sever that chain.

✻ Try This — What a naive extractor misses.

Take this SMS: "we need to move the pickup, I'll be there at 4 not 3".

Ask any LLM to extract: event type, original time, new time, participants, location. You will likely get:

  • Event: custody pickup
  • Original time: 3:00
  • New time: 4:00
  • Participants: (blank — the message uses "I" and "we", no names)
  • Location: (blank — none stated)

Now ask: which direction is "pickup"? Whose child? Which date? Is 3 and 4 referring to PM? To a specific recurring schedule? To a date number? The message uses "we" — is the writer one of the participants, or reporting on a group decision?

A naive extractor produces structured output that looks complete. A careful extractor produces structured output with five gap entries, each documenting an ambiguity that cannot be resolved without additional context. The difference is whether the extraction is honest about what it does not know.

Epistemic Neutrality Masking

ENM is not a censorship layer. It does not prevent the system from recording that a caseworker wrote "the parent appeared intoxicated." It prevents that characterization from becoming a Canon Claim without appropriate labeling.

Language models trained on legal and social-work documents have absorbed the vocabulary of those domains — including their conclusory shortcuts. "The parent appeared intoxicated" is a legal conclusion dressed as an observation. FRE 701 governs when lay witnesses can offer opinion testimony versus factual observation; an extraction layer that produces legal conclusions as if they were factual observations skips that adjudicative process entirely.

ENM applies a four-category taxonomy to each extracted claim before it becomes a structured assertion:

Hedge words. Appeared, seemed, might, possibly, reportedly. These words signal that the original author was not asserting direct observation. ENM flags these claims as inference_type: inferred_low and records the hedge word in a hedge_markers field.

Emotional labels. Angry, aggressive, distraught, upset, combative. These words describe internal states that the observer is inferring from behavior. ENM reclassifies them: the Claim records the behavioral observable (elevated voice, physical agitation) rather than the emotional interpretation. The source text is preserved verbatim; the structured extraction substitutes the observable.

Legal conclusions. Neglect, abuse, intoxicated, impaired, unstable. These terms have specific legal meanings that require expert foundation under FRE 702 or Wisconsin's equivalent. ENM flags them as inference_type: conclusory and inserts a requires_foundation marker. A conclusory claim does not become inference_type: observed because the text uses a decisive-sounding word.

Intensifiers. Clearly, obviously, undoubtedly. These add no information and often signal an author's advocacy rather than their observation. ENM strips them from the structured extraction. The source text retains them.

The result is a Claim that carries its epistemic status honestly:

{
  "claim_id": "clm_001",
  "text": "Caseworker noted elevated voice and pacing during home visit on 2026-01-14",
  "inference_type": "inferred_low",
  "source_text_verbatim": "The parent appeared agitated and intoxicated during the home visit",
  "enm_transforms": [
    {"original": "appeared agitated", "masked_to": "pacing observed, elevated voice noted", "category": "emotional_label"},
    {"original": "intoxicated", "masked_to": null, "category": "conclusory", "requires_foundation": true}
  ],
  "supports": ["obs_014"]
}

The ENM transforms are logged in the attestation. A verifier can reconstruct what the masking changed. The original text is not destroyed.
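
One way to picture the four-category pass is as a lookup over small word lists. The sketch below is an illustration, not the Meridian-Cannon implementation: the word lists are only the examples given in this chapter, scan_enm is a hypothetical helper, and the category names "hedge" and "intensifier" are assumed by analogy with emotional_label and conclusory. It detects and labels terms only. Substituting a behavioral observable for an emotional label needs information that a word list cannot supply.

```python
import re

# Example terms from the taxonomy; a real deployment would maintain fuller lists.
ENM_TERMS = {
    "hedge": ["appeared", "seemed", "might", "possibly", "reportedly"],
    "emotional_label": ["angry", "aggressive", "distraught", "upset", "combative"],
    "conclusory": ["neglect", "abuse", "intoxicated", "impaired", "unstable"],
    "intensifier": ["clearly", "obviously", "undoubtedly"],
}


def scan_enm(text: str) -> list[dict]:
    """Return one record per flagged term, ordered by position in the text."""
    found = []
    for category, terms in ENM_TERMS.items():
        # Whole-word, case-insensitive match against this category's terms.
        pattern = re.compile(r"\b(" + "|".join(terms) + r")\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            record = {
                "original": match.group(0),
                "category": category,
                "position": match.start(),
            }
            if category == "conclusory":
                record["requires_foundation"] = True
            found.append(record)
    return sorted(found, key=lambda record: record["position"])


transforms = scan_enm("The parent was clearly upset and appeared intoxicated")
categories = [t["category"] for t in transforms]
```

Keeping the lists in data rather than in code makes it possible to review and version the taxonomy separately from the extractor.
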

§ For the Record — FRE 701 (lay witness opinion testimony).

"If a witness is not testifying as an expert, testimony in the form of an opinion is limited to one that is: (a) rationally based on the witness's perception; (b) helpful to clearly understanding the witness's testimony or to determining a fact in issue; and (c) not based on scientific, technical, or other specialized knowledge within the scope of Rule 702."

The ENM taxonomy operationalizes FRE 701's distinction. Claims that survive ENM as inference_type: observed are the ones that would satisfy (a). Claims flagged as conclusory are the ones that would require the expert-opinion path of FRE 702.

Inference-type vocabulary

Every Canon Claim carries an inference_type. This is not optional. The Canon spec (§6.5.3) defines four values:

observed — The claim is directly and unambiguously present in the source text. "Sent at 14:32:07" when the source says "14:32:07" in an SMS timestamp field is observed. No inference beyond transcription.

inferred_low — The claim is a plausible inference from the source, but the evidence is weak or indirect. "The sender was near downtown" when the source says "I'm heading toward the park" is inferred_low. The inference is defensible but not compelled.

inferred_high — The claim is a strong inference from the source. Multiple independent signals point to the same conclusion. "The exchange was rescheduled from 3 PM to 4 PM" when the source says "I'll be there at 4 not 3" following a thread about a custody exchange is inferred_high. The inference requires context but is not speculative.

speculative — The claim is the model's opinion or extrapolation with weak or no direct support. ENM requires that speculative claims be labeled as such and that the extraction gap record explain why they were included despite their low confidence.

The inference-type label determines how downstream consumers can use the claim. An admissibility auditor (Chapter 26) will permit observed claims for most purposes, treat inferred_high claims as requiring corroboration, and flag speculative claims for human review before they enter any court-facing document.

These four ENM labels map to Canon's inference_type vocabulary: observed → OBSERVATION; inferred_low/inferred_high → INDUCTION or DEDUCTION (depending on whether the inference generalizes from examples or follows logically from the evidence); speculative → ABDUCTION (with explicit gap disclosure required by R5).

◆ Going Deeper — The Inference-Type vocabulary in the Canon schema.

The Pydantic model for a Canon Claim enforces inference_type at construction time. Passing a value not in the enum raises a ValidationError immediately — not at verification time. This is by design. The Canon's R4 requirement (every Claim must be grounded) cannot be enforced by the verifier if the claim was created without an inference type; the type must be asserted by the extractor, at extraction time, as a precondition of the claim's existence.

The mapping from ENM category to inference type is deterministic: emotional_label → inferred_low at most; conclusory → forces human review before any inference type is assigned. The extractor cannot self-assign observed to a claim that ENM has flagged. The schema enforces this with a Pydantic validator that checks the enm_transforms list before accepting an observed label.
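
The construction-time guard described in the callout can be sketched with a dataclass in place of the Pydantic model, so it runs without third-party packages. Everything here is illustrative: the Claim fields are a subset, __post_init__ stands in for a Pydantic validator, and ValueError stands in for ValidationError. The conclusory rule (human review before any type is assigned) is outside this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class InferenceType(str, Enum):
    OBSERVED = "observed"
    INFERRED_LOW = "inferred_low"
    INFERRED_HIGH = "inferred_high"
    SPECULATIVE = "speculative"


@dataclass
class Claim:
    text: str
    inference_type: InferenceType
    enm_transforms: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # Enum lookup raises ValueError for any label outside the vocabulary.
        self.inference_type = InferenceType(self.inference_type)
        # A claim that ENM has touched may not be labeled as a direct observation.
        if self.enm_transforms and self.inference_type is InferenceType.OBSERVED:
            raise ValueError("ENM-flagged claim cannot be labeled 'observed'")


def try_build(**kwargs) -> str:
    """Return 'ok' or the rejection message, for demonstration."""
    try:
        Claim(**kwargs)
        return "ok"
    except ValueError as exc:
        return str(exc)


flagged = [{"original": "appeared agitated", "category": "emotional_label"}]
```

Rejecting the claim at construction means an unlabeled or mislabeled claim never exists as an object that later stages could pick up by accident.
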

Gap disclosure

The Canon's R5 requirement is that an attestation must account for what it does not cover. Every extraction produces gaps — things the extractor could not determine, things that were ambiguous, things that required context unavailable in the source document.

Gaps are the extractor's honest accounting of its limits, not failures. An extractor that produces no gaps is almost certainly wrong: real documents are ambiguous, and real extractions encounter things they cannot resolve.

The Meridian-Cannon runner inserts a generic scope-limit gap if the extractor returns none:

# meridian/canon/findings/runner.py — gap enforcement.
if not extraction.gaps:
    extraction.gaps.append(Gap(
        category="SCOPE_LIMIT",
        description="Extractor did not identify specific limitation gaps. "
                    "Scope of extraction limited to fields in schema. "
                    "Relationships to other documents not assessed.",
        severity="low",
    ))

This is not a workaround. It is a policy decision: an attestation with zero gaps is a stronger claim than any extractor is entitled to make. Forcing at least one gap record documents that the runner considered the question.

Common gap categories by source type:

  • IDENTITY_AMBIGUITY — participant identity cannot be confirmed from the source alone (SMS pseudonyms, "Dad" rather than a legal name)
  • TEMPORAL_AMBIGUITY — the message references a time that cannot be assigned a specific date ("next Tuesday," "3 PM" without a date anchor)
  • DKIM_NOT_VERIFIED — email DKIM signature was not checked against the sending domain's DNS record at ingestion time
  • TONE_SUBJECTIVE — tone assessment requested but ENM flagged the characterization as emotional_label
  • TRANSCRIPT_CONFIDENCE — audio transcript has segments below the confidence threshold

Working example: custody rescheduling SMS

The running case: the parent's attorney needs to establish that a custody exchange was rescheduled from 3 PM to 4 PM on a specific date. The source SMS thread has been retrieved and chunked. Now the SMS extractor runs.

Source text (verbatim from the export):

[2026-01-09 13:41:22] Sender: 715-555-0183 "we need to move the pickup, I'll be there at 4 not 3"

Extractor prompt (simplified):

You are a structured extraction assistant. Extract factual information from
the following SMS message into the provided JSON schema. Do not infer
information not present in the message. Use verbatim quotation where possible.
Flag any ambiguity as a gap.

Message: [MESSAGE TEXT]

Schema: [SMSExtraction JSON schema]
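
Filling the two placeholders in the prompt is mechanical, and a sketch makes the moving parts visible. The names PROMPT_TEMPLATE and render_prompt, and the hand-written SCHEMA_FRAGMENT, are assumptions for illustration. With the Pydantic model from earlier in the chapter, the schema dict would more likely come from SMSExtraction.model_json_schema().

```python
import json

PROMPT_TEMPLATE = (
    "You are a structured extraction assistant. Extract factual information from\n"
    "the following SMS message into the provided JSON schema. Do not infer\n"
    "information not present in the message. Use verbatim quotation where possible.\n"
    "Flag any ambiguity as a gap.\n\n"
    "Message: {text}\n\n"
    "Schema: {schema}"
)

# Hand-written fragment standing in for the full SMSExtraction JSON schema.
SCHEMA_FRAGMENT = {
    "type": "object",
    "properties": {
        "participants": {
            "type": "array",
            "items": {"type": "string"},
            "description": "phone numbers or names as they appear",
        },
        "original_text": {
            "type": "string",
            "description": "source text verbatim, unchanged",
        },
    },
    "required": ["participants", "original_text"],
}


def render_prompt(text: str, schema: dict) -> str:
    """Fill the template; sort_keys keeps the rendered prompt stable across runs."""
    return PROMPT_TEMPLATE.format(text=text, schema=json.dumps(schema, sort_keys=True))


prompt = render_prompt(
    "we need to move the pickup, I'll be there at 4 not 3", SCHEMA_FRAGMENT
)
```

Serializing the schema with sorted keys matters for reproducibility: two runs over the same message should send byte-identical prompts.
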

Extractor output (before ENM post-processing):

{
  "participants": ["715-555-0183"],
  "message_timestamp": "2026-01-09T13:41:22",
  "events": [
    {
      "event_type": "schedule_change",
      "described_time": "4 not 3",
      "described_location": null
    }
  ],
  "persons_mentioned": [],
  "locations_mentioned": [],
  "original_text": "we need to move the pickup, I'll be there at 4 not 3",
  "gaps": [
    {
      "category": "IDENTITY_AMBIGUITY",
      "description": "Sender is identified only by phone number 715-555-0183. Legal name not recoverable from source.",
      "severity": "medium"
    },
    {
      "category": "TEMPORAL_AMBIGUITY",
      "description": "'4 not 3' is interpreted as a time-of-day change but no AM/PM designation or date anchor is present.",
      "severity": "medium"
    },
    {
      "category": "IDENTITY_AMBIGUITY",
      "description": "'we' in 'we need to move' implies additional parties to the decision not identified in this message.",
      "severity": "low"
    }
  ]
}

ENM post-processing: no emotional labels, hedge words, or legal conclusions are present. event_type: schedule_change passes as inference_type: inferred_high because the word "pickup" in a custody context is a well-established referent, but the extractor records that it is an inference, not a literal observation.

The runner emits this as a Canon Claim attached to an Observation whose content_hash hashes to the SMS source bytes. The attorney now has a claim with an inference type, a gap inventory, and a traceable source. The verifier can check all of it independently.

☉ In the Wild — Mata v. Avianca (S.D.N.Y., 2023).

In June 2023, a New York federal judge sanctioned plaintiff's counsel $5,000 for submitting a brief that cited six cases that did not exist. The cases had been generated by ChatGPT, which the attorney used without understanding that language models confabulate citations convincingly. The court found that the attorney had failed to perform any independent verification of the citations he submitted.

The lesson is not that lawyers should not use AI tools. The lesson is that free-form LLM output — text generated without schema constraints, without source tracing, without structured output validation — cannot be treated as a reliable factual source. The attorney in Mata treated ChatGPT output as if it were a database query. It is not. It is a probability distribution over plausible next tokens, and it will produce a plausible citation whether or not that citation exists.

Constrained-decoding extraction, paired with source tracing and Pydantic validation, is the structural difference between Meridian-Cannon's extractor and the ChatGPT workflow that produced Mata's sanctions. The schema does not prevent a model from hallucinating. But when the model produces a case_citation field, the runner immediately attempts to resolve it against a citations database. A citation that fails resolution becomes a gap, not a claim.
The hallucination is caught before it reaches the attestation.

Lab 11 — Build an SMS extractor end-to-end

The lab is in labs/ch11_llm_extraction/.

Deliverable 1 — Schema. Define SMSExtraction as a Pydantic v2 model. Include at minimum: participants, message_timestamp, events (list of SMSEvent), persons_mentioned, locations_mentioned, original_text, gaps. Every field must carry a Field(description=...) that could be included in a prompt.

Deliverable 2 — Prompt. Write a prompt template that produces JSON-formatted output matching your schema. The prompt must instruct the model to: preserve the source verbatim in original_text; emit a gap for any ambiguity it identifies; not infer information not present in the source.

Deliverable 3 — Extractor. Implement extract_sms(text: str, timestamp: str) -> SMSExtraction using a local vLLM server (or llama.cpp if GPU is unavailable). Call SMSExtraction.model_validate() on the model output. Handle ValidationError with a single retry using a corrective prompt; if the retry fails, emit a null extraction with an EXTRACTOR_FAILURE gap.

Deliverable 4 — Corpus run. Process the 10 lab fixtures in labs/ch11_llm_extraction/fixtures/. For each fixture: run the extractor, run ENM post-processing, map to a Canon Claim with inference type assigned, verify R4 (each claim has a supports pointer to an observation) and R5 (each extraction has at least one gap) by running the fixture through meridian-canon walk.

Acceptance criteria: pytest labs/ch11_llm_extraction/test_lab.py passes. Every fixture produces a Canon Claim that the walker accepts for steps 5 and 6. Every fixture has at least one gap record.

💡Key Takeaways
  • Extraction produces Canon Claims that require downstream refutation — not verified facts — because a language model produces statistically plausible structured output, not a database query result, and the schema guarantees only structure, not truth.
  • Two adapters ship in the core package with zero extra dependencies: OllamaAdapter (local Ollama) and OpenAIAdapter (any OpenAI-compatible API including vLLM); LiteLLMAdapter belongs in the pipeline layer, not in per-attestation extraction, because it adds substantial transitive dependencies.
  • Outlines constrained decoding guarantees that the model cannot produce an invalid output because invalid tokens are masked before sampling — the model is structurally prevented from generating malformed JSON, eliminating retry loops for extraction; fallback to standard generation applies if Outlines is not installed.
  • Extraction prompt design matters for downstream verification because the prompt controls what inference_type and gaps fields the model populates — a prompt that does not instruct the model to flag ambiguity produces claims with empty gaps arrays that violate R5 and misrepresent the model's actual certainty.
  • element_type from Unstructured.io adds structural metadata (heading, table, list item, narrative text) to each chunk before extraction, enabling the per-type extractor hierarchy (email, SMS, voicemail, audio memo, call record, PDF page) to apply document-appropriate schemas and gap categories rather than a single generic prompt.

Exercises

Warm-up

1. Take a single email you received in the last week (redact sensitive content if needed). Write down, by hand, the entities you would extract: sender, recipients, date, subject, named persons, events referenced, locations. This is your ground truth. Now ask a local LLM to extract the same. Compare. Where does it disagree? Where does it hallucinate?
2. The ENM taxonomy has four categories. Write one example sentence for each that would trigger that category. Now write the masked version of each sentence as the extractor should produce it.

Core

3. Implement the SMSExtraction Pydantic schema from scratch. Add a custom Pydantic validator that ensures original_text is present and non-empty. Run it against three SMS examples; confirm ValidationError is raised when original_text is absent.
4. Identify five gap categories that should appear in an SMS extractor. For each: name the category, describe when it fires, and give a one-sentence example of the gap description text.
5. Implement ENM post-processing for the emotional_label category. Given a list of extracted claims, scan each claim's text for words in the emotional-label list (maintain this list in a YAML file, not hardcoded). For each match, transform the claim as described in this chapter and append an enm_transforms entry.

Stretch

6. Instrument your extractor with replay variance measurement. Run the same prompt three times at temperature=0. Record whether the structured output is byte-identical across runs. If it is not (vLLM at temperature=0 is deterministic in theory but may vary across batches), identify which fields vary and why. Add a gap record when variance exceeds a threshold you define.
7. Design an extractor for audio transcript segments. The input is a Whisper-produced transcript with word-level timestamps and confidence scores. What fields should the schema include that the SMS schema does not? What gap categories are specific to audio?

Build-your-own prompt

For one document type in your capstone corpus: design the extractor schema, prompt, and gap categories. Write down, for each field in the schema, what inference type the extractor should assign and why. Which fields require ENM post-processing? This schema is the first artifact your capstone needs; it determines everything downstream.

Further reading

  • vLLM documentation: https://docs.vllm.ai/
  • Outlines library: https://github.com/dottxt-ai/outlines. Install: pip install meridian-canon[outlines]. OutlinesExtractor is in meridian.findings.outlines_extractor.
  • Willard & Louf, "Efficient Guided Generation for Large Language Models" (2023): https://arxiv.org/abs/2307.09702 — the Outlines constrained-decoding paper.
  • Pydantic v2 documentation, validators and computed fields: https://docs.pydantic.dev/latest/
  • Canon §6.5.3 — inference-type vocabulary; §8.3 — the seven-step verification protocol.
  • Research dossier research/04_adversarial_llm_eval.md — sycophancy, judgment biases, replay determinism literature survey.
  • Kwon et al., "Efficient Memory Management for Large Language Model Serving with PagedAttention," SOSP 2023: https://arxiv.org/abs/2309.06180
  • Mata v. Avianca, Inc., No. 22-cv-1461 (S.D.N.Y. June 22, 2023) — the sanctions opinion.

Next: Chapter 12 — Adversarial Validation & Tri-Model Consensus. The extractor produced claims. Now a different model checks them.