The Cognitive Security Verification Framework

May 22, 2026

The Cognitive Security Verification Framework, or CSVF, is a draft verification framework for semantic leakage, cross-domain inference, and LLM-enabled information exposure.

The core problem is that LLM systems do not merely retrieve documents. They connect prompts, memory, and prior context into conclusions. The old security question was “Can a user access this file?” It has now become “Can this system derive a conclusion that policy says should remain out of reach?”

Pieces of this problem already appear in areas like differential privacy, membership inference, embedding inversion, model inversion, and repeated-query attacks. CSVF brings those concerns into a practical governance and assurance frame for deployed LLM systems.

CSVF is intended to be published as an open-source framework so that its definitions, controls, and test harnesses can be scrutinized, criticized, improved, and extended in public. The project is here. I am also attaching my full, initial Harvard Kennedy School Policy Analysis Exercise (PAE).

CSVF Purpose

CSVF’s purpose is to establish common principles, controls, measurements, and evidence expectations so organizations can evaluate LLM information reachability consistently. Compared to existing frameworks, CSVF adds a missing operational layer focused on inference boundaries, unreachable conclusions, repeatable testing, and procurement-grade evidence.

Draft tenets:
— Materiality of secrets: focus controls on information whose compromise would matter legally, financially, operationally, competitively, or for national security.
— Reachability: govern not only what data the system stores or retrieves, but what conclusions it can produce.
— Conservatism: when uncertain, prefer under-exposure to over-exposure.
— Boundary clarity: define domains, permitted joins, prohibited joins, and high-sensitivity joins before deployment.
— Auditability: controls must produce objective evidence.
— Understandability: outputs must be legible to engineers, CISOs, auditors, buyers, boards, and courts.

Anchor Points for Interoperability

— NIST AI RMF and GenAI Profile as the governance and risk-management spine.
— OWASP LLM Top 10 and OWASP GenAI Data Security as the developer-facing risk and mitigation canon.
— MITRE ATLAS as the adversary-informed threat model.

Core CSVF Concepts

1. Domain Inventory and Join Matrix

Organizations should identify the information domains their LLM systems touch: public, internal, HR, legal, finance, export-controlled engineering, privileged legal, customer data, classified or classified-adjacent material, and so on.
They should then define which joins are allowed, which are prohibited, and which require approval.
The old question was: “May this user read this object?”

The new question is: “May this system combine domain A, domain B, tool C, and memory D in one inferential workflow?”
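To make the join matrix concrete, here is a minimal sketch in Python. It is an illustration, not part of the CSVF draft: the domain names, the `JoinDecision` values, and the `evaluate_workflow` helper are all hypothetical. It only checks pairs of domains, whereas a real matrix would also need to cover tools and memory, and it treats any pair missing from the matrix as prohibited, in line with the conservatism tenet.

```python
from enum import Enum
from itertools import combinations


class JoinDecision(Enum):
    ALLOWED = "allowed"
    NEEDS_APPROVAL = "needs_approval"
    PROHIBITED = "prohibited"


# Hypothetical join matrix: unordered domain pairs mapped to a decision.
JOIN_MATRIX = {
    frozenset({"public", "internal"}): JoinDecision.ALLOWED,
    frozenset({"internal", "finance"}): JoinDecision.NEEDS_APPROVAL,
    frozenset({"hr", "legal"}): JoinDecision.NEEDS_APPROVAL,
    frozenset({"hr", "finance"}): JoinDecision.PROHIBITED,
}

# Ordered from least to most restrictive.
_SEVERITY = [
    JoinDecision.ALLOWED,
    JoinDecision.NEEDS_APPROVAL,
    JoinDecision.PROHIBITED,
]


def evaluate_workflow(domains):
    """Return the most restrictive decision across every pair of domains.

    Pairs that are missing from the matrix are treated as prohibited
    (default deny). A workflow touching a single domain has no joins
    and is therefore allowed.
    """
    worst = JoinDecision.ALLOWED
    for a, b in combinations(sorted(set(domains)), 2):
        decision = JOIN_MATRIX.get(frozenset({a, b}), JoinDecision.PROHIBITED)
        if _SEVERITY.index(decision) > _SEVERITY.index(worst):
            worst = decision
    return worst
```

For example, `evaluate_workflow(["internal", "finance"])` returns `JoinDecision.NEEDS_APPROVAL`, while adding `"public"` to that workflow returns `JoinDecision.PROHIBITED`, because the public/finance pair was never declared.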

2. Unreachable Statement Classes

Organizations should define classes of conclusions that must not become reachable.
This is different from blocking exact strings. In LLM systems, the protected thing is often not a sentence. It is a meaning.

A system may never reveal a secret verbatim, but still disclose the protected conclusion through paraphrase, summary, translation, ranking, forecast, or synthesis. CSVF calls these prohibited semantic outcomes Unreachable Statement Classes, or USCs.
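One way to picture a USC catalog is as a list of protected meanings, each with sample phrasings and a pluggable semantic scorer. The sketch below is an assumption of mine, not a CSVF definition: the `UnreachableStatementClass` fields, the `flagged_uscs` helper, and the threshold values are hypothetical. The `jaccard` scorer is a toy stand-in included only so the example runs. It measures word overlap, which is exactly the kind of surface matching that paraphrase defeats, so a real deployment would swap in a stronger semantic scorer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UnreachableStatementClass:
    usc_id: str
    description: str      # plain-language statement of the protected meaning
    exemplars: List[str]  # sample phrasings of the prohibited conclusion
    threshold: float      # score at or above which an output is flagged


def jaccard(a: str, b: str) -> float:
    """Toy word-overlap score in [0, 1]; a placeholder, not a semantic model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def flagged_uscs(
    output: str,
    catalog: List[UnreachableStatementClass],
    scorer: Callable[[str, str], float] = jaccard,
) -> List[str]:
    """Return the ids of every USC whose best exemplar score meets its threshold."""
    hits = []
    for usc in catalog:
        best = max((scorer(output, ex) for ex in usc.exemplars), default=0.0)
        if best >= usc.threshold:
            hits.append(usc.usc_id)
    return hits
```

The design point is the `scorer` parameter: the catalog describes what must stay unreachable, and the scoring method can be replaced and re-tested without rewriting the catalog.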

3. Boundary Enforcement Map

Organizations should document where the cognitive security boundary is actually enforced.
Is enforcement happening at retrieval? Context assembly? Tool invocation? Memory write? Output validation? Human review?
CSVF forces organizations to stop treating “the AI system” as a black box and instead map where policy becomes technically real.
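A Boundary Enforcement Map can start as something as simple as a dictionary from enforcement point to the controls documented there. The following sketch uses the enforcement points named above. The `unenforced_points` helper and the example control descriptions are hypothetical, not part of the draft.

```python
# Enforcement points taken from the questions above, in pipeline order.
ENFORCEMENT_POINTS = (
    "retrieval",
    "context_assembly",
    "tool_invocation",
    "memory_write",
    "output_validation",
    "human_review",
)


def unenforced_points(enforcement_map):
    """Return the enforcement points that have no documented control.

    `enforcement_map` maps an enforcement point to a list of control
    descriptions. Missing keys and empty lists both count as gaps.
    """
    unknown = set(enforcement_map) - set(ENFORCEMENT_POINTS)
    if unknown:
        raise ValueError(f"unknown enforcement points: {sorted(unknown)}")
    return [p for p in ENFORCEMENT_POINTS if not enforcement_map.get(p)]
```

Running it against a partial map makes the gaps explicit instead of leaving them implied:

```python
example_map = {
    "retrieval": ["ABAC filter on vector search"],
    "output_validation": ["semantic USC check"],
    "memory_write": [],
}
print(unenforced_points(example_map))
```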

4. Evidence Packs

CSVF requires assurance artifacts that show the system’s cognitive boundaries are defined, enforced, tested, and monitored.
An evidence pack should include the domain inventory, join matrix, USC catalog, boundary enforcement map, test results, telemetry, release-gate records, incident records, purge playbooks, vendor-control artifacts, and explicit risk acceptances. The attached PAE frames this as a way to make cognitive security “an auditable operational condition,” not an abstract claim.

CSVF Control Families

Family A. Governance and Accountability

— Appoint a cognitive security owner.
— Maintain a cognitive security risk register.
— Define ownership for domain boundaries, join approvals, and residual-risk acceptance.
— Include export-controlled technical data, legal privilege, regulated data, trade secrets, and other high-consequence categories where relevant.

Family B. Domain Modeling and Boundary Claims

— Define the information domains in scope.
— Define allowed, prohibited, and approval-gated joins.
— Create Unreachable Statement Classes.
— Build a Boundary Enforcement Map showing where controls operate.

Family C. Data Classification and Secret Handling

— Classify and label sensitive material before ingestion.
— Propagate labels to chunks, embeddings, caches, memory layers, prompts, and fine-tuning corpora.
— Quarantine ambiguous or unlabeled material rather than defaulting it into general-purpose AI workflows.
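Label propagation and quarantine can be sketched at the chunking step. This is a minimal illustration under my own assumptions: the `Chunk` type, the `ingest` function, the document dictionary shape, and the fixed-size character chunking are all hypothetical. The same idea would need to carry through to embeddings, caches, and memory layers.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Chunk:
    text: str
    label: str      # sensitivity label inherited from the parent document
    source_id: str  # id of the parent document, kept for later purges


def ingest(documents: List[Dict], chunk_size: int = 200) -> Tuple[List[Chunk], List[str]]:
    """Chunk labeled documents and quarantine unlabeled ones.

    Each document is a dict with "id", "text", and an optional "label".
    Returns (chunks, quarantined_ids). Every chunk carries its parent's
    label; documents with no label never reach the chunk list.
    """
    chunks: List[Chunk] = []
    quarantined: List[str] = []
    for doc in documents:
        label = doc.get("label")
        if not label:
            quarantined.append(doc["id"])
            continue
        text = doc["text"]
        for start in range(0, len(text), chunk_size):
            chunks.append(Chunk(text[start:start + chunk_size], label, doc["id"]))
    return chunks, quarantined
```

Keeping `source_id` on every chunk is a deliberate choice: it is what makes a later revocation or downstream purge tractable.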

Family D. Context, Retrieval, and Memory Controls

— Enforce least-privilege retrieval through RBAC or ABAC.
— Use session information budgets to cap cumulative sensitivity in context.
— Scope memory by user, domain, purpose, and retention window.
— Prevent agents from silently widening the system’s reachable conclusion space.
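A session information budget could look like a running counter that refuses to admit more sensitive material once a cap is reached. In the sketch below, the `SENSITIVITY_COST` weights, the label names, and the `SessionBudget` class are hypothetical values chosen for illustration. CSVF does not yet define how sensitivity should be scored.

```python
# Hypothetical per-item costs by sensitivity label.
SENSITIVITY_COST = {
    "public": 0,
    "internal": 1,
    "confidential": 5,
    "restricted": 20,
}


class SessionBudget:
    """Caps the cumulative sensitivity admitted into one session's context."""

    def __init__(self, limit: int):
        self.limit = limit
        self.spent = 0

    def try_admit(self, label: str) -> bool:
        """Admit an item if it fits in the remaining budget.

        Unknown labels are refused rather than treated as free.
        """
        cost = SENSITIVITY_COST.get(label)
        if cost is None:
            return False
        if self.spent + cost > self.limit:
            return False
        self.spent += cost
        return True
```

With a limit of 10, two confidential items fit, a third item of any non-zero cost is refused, and a restricted item never fits at all. The budget is cumulative across the session, which is what distinguishes it from a per-request access check.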

Family E. Exfiltration Controls

— Use semantic output validation, not only keyword scanning.
— Instrument canaries, honeytokens, and honey ideas.
— Maintain revocation and downstream purge playbooks for vector stores, caches, prompt logs, and memory layers.
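Canary instrumentation is the easiest of these controls to sketch. The example below seeds unique tokens and checks outputs for them. The `make_canary` and `leaked_canaries` helpers, the token prefix, and the registry shape are hypothetical. Note that this only catches verbatim leakage; honey ideas would need a semantic check on top.

```python
import secrets


def make_canary(prefix: str = "CSVF-CANARY") -> str:
    """Create a unique, unguessable token to seed into a protected source."""
    return f"{prefix}-{secrets.token_hex(8)}"


def leaked_canaries(output: str, registry: dict) -> dict:
    """Return the canaries that appear verbatim in a model output.

    `registry` maps each canary token to the location where it was seeded,
    so a hit identifies which store or document leaked.
    """
    return {tok: where for tok, where in registry.items() if tok in output}
```

Because the registry records where each token was planted, a hit is also a pointer to the store that needs the purge playbook.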

Family F. Unauthorized Domain Reach Controls

— Require high-sensitivity join approvals.
— Test whether restricted conclusions can be derived from permitted inputs.
— Prohibit LLMs from making authorization decisions.
— Monitor for reachability drift after changes to models, prompts, retrieval, tools, connectors, or memory.

Family G. Cloud Prompting Governance

— Define what data may never be submitted to consumer or unapproved cloud LLMs.
— Require approved enterprise or API pathways where sensitive data is involved.
— Use logging and DLP integration for prompt flows where feasible.
— For the most sensitive workflows, require locally controlled models on organization-owned or organization-controlled hardware.

Family H. Assurance and Reporting

— Maintain evidence packs.
— Run standardized test harnesses at release gates and after material system changes.
— Produce SOC-style management assertions or auditor-facing reports where appropriate.
— Make boundary claims legible for procurement, compliance, and board oversight.

Measurement Layer

CSVF should not stop at “do AI risk management.” It should define unit-testable numbers.
The proposed metrics remain draft verification measures, not final industry metrics. They are useful because they force the framework to become testable, but they still need formal definitions, standardized adversary protocols, thresholds, and validation across real deployments. The PAE makes this provisional status explicit.

Illustrative draft metrics:
— LER, Leakage Event Rate: the rate at which seeded protected secrets or protected meaning appears in outputs, weighted by materiality.
— CRS, Crawl-Resilience Score: how well the system resists persistent, repeated, or multi-session extraction attempts over time.
— JRS, Jailbreak/Injection Resistance: baseline success or failure rate against OWASP-style jailbreak and prompt-injection suites.
— DIR, Domain Inference Risk: the percentage of test runs in which the system derives an out-of-domain conclusion using only in-domain inputs under defined boundary conditions.

DIR is CSVF’s central added metric because it operationalizes reachability. It asks whether prohibited conclusions become available as prompts, tools, sources, retrieval settings, and model capabilities evolve.
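To show what "unit-testable" could mean here, the sketch below computes DIR and LER from test-run records. Both formulas are my own reading of the draft descriptions, not formal definitions: DIR is taken as the share of runs that derived an out-of-domain conclusion, and LER as the materiality-weighted share of seeded secrets that leaked. The function names and input shapes are hypothetical.

```python
from typing import List, Tuple


def domain_inference_risk(run_results: List[bool]) -> float:
    """DIR as a percentage.

    Each entry is True if that test run derived an out-of-domain
    conclusion from in-domain inputs.
    """
    if not run_results:
        raise ValueError("DIR needs at least one test run")
    return 100.0 * sum(run_results) / len(run_results)


def leakage_event_rate(trials: List[Tuple[bool, float]]) -> float:
    """LER as a materiality-weighted fraction in [0, 1].

    Each trial is (leaked, materiality_weight). A leak of a high-weight
    secret moves the rate more than a leak of a low-weight one.
    """
    total = sum(weight for _, weight in trials)
    if total <= 0:
        raise ValueError("LER needs a positive total materiality weight")
    return sum(weight for leaked, weight in trials if leaked) / total
```

Under these assumptions, one failing run out of four gives a DIR of 25.0, and leaking a secret weighted 3.0 while protecting one weighted 1.0 gives an LER of 0.75. Thresholds and adversary protocols would still have to be set separately before either number could gate a release.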

Mitigations Catalogue

The Mitigations Catalogue will translate CSVF’s abstract control goals into concrete defensive options that organizations can select, test, and document. Rather than treating mitigation as a generic checklist, the catalogue should organize controls by failure mode: exfiltration, unauthorized domain reach, cloud prompting, retrieval overreach, memory persistence, tool misuse, and post-incident containment. Each mitigation should include a plain-language description, the risk it addresses, implementation guidance, required evidence, testing methods, and limitations. For example, upstream classification, least-privilege retrieval, session information budgets, Unreachable Statement Class testing, canary deployment, downstream purge playbooks, and local-only model deployment should each appear as catalogued options with clear ownership and measurable expectations. The purpose is to make CSVF usable in practice: engineers can build against it, CISOs can prioritize it, auditors can test it, and buyers can ask vendors for proof rather than promises.

Why Open Source CSVF

CSVF should be open sourced because this problem is too broad for a single vendor, company, or author to solve alone.

Open development can help red teamers contribute attack patterns, engineers contribute implementation lessons, GRC teams contribute evidence models, lawyers contribute assurance language, and sector specialists contribute use cases from healthcare, finance, defense, education, and government.

Open sourcing also matches the adoption theory behind CSVF. The framework is meant to earn legitimacy from the bottom up by being useful, testable, and improved in public.
The open-source project will be available here.

Closing

CSVF argues that LLM-era security must treat meaning, joins, and inference as first-class security objects.

The framework does this by requiring organizations to define domains, permitted joins, prohibited conclusions, enforcement points, test methods, and evidence packs. It also insists that the right question is not only whether sensitive data appears in an output, but whether protected meaning has become reachable at all.

CSVF is not a finished answer. It is a draft roadmap toward a future standard of care for cognitive security, one that makes inference boundaries legible, testable, and auditable before ambient AI systems make those boundaries disappear into ordinary organizational life.


David
