Moyo: Sensitive Information Reachability, the Problem and the Solution
Problem: Information Reachability
Take a pile of information: documents, websites, photos, posts, meeting agendas, job ads, public filings. Some of it is clean. Most of it is messy. Reachability asks a simple question. Given this base corpus, what can a machine infer? What can it conclude, with enough confidence to be operationally useful, by combining, translating, triangulating, and reasoning across what is already there?
Organizations build walls around their secret information while leaving a thousand small windows open: comms templates, marketing pages, staff bios, vendor documentation, conference presentations, property records, and a constant drizzle of semi-structured metadata. Those windows are individually harmless. Collectively, in the LLM era, they are an exfiltration channel.
Inference is exfiltration
In classic security thinking, exfiltration means a payload leaving your network: a file, a table, a credential dump. In reachability terms, exfiltration can happen without any payload moving at all. The leak is not in a single document. The leak is in the combinability of facts. An organization insists, truthfully, that no one stole the sensitive document. The adversary also tells the truth: we never touched your systems. Both sides are correct. The conflict is about a third thing: the ability to infer.
Measuring reachability
How do we measure this amorphous concept in terms that cybersecurity budget owners can calculate? Four dimensions help; a minimal scoring sketch follows the list.
— Cost: How much time, expertise, compute, and tooling does it take to reach the conclusion?
— Reliability: With what confidence does the inference hold? How often does it fail?
— Reproducibility: Can a different analyst, or a different model, get to the same conclusion from the same base?
— Distance: How many inferential steps are required from base facts to higher-value conclusions?
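These dimensions are abstract but straightforward to make concrete. Below is a minimal sketch of one way to record and combine them; the field names, weights, and the example numbers are placeholders, not part of any standard.

```python
from dataclasses import dataclass

@dataclass
class ReachabilityScore:
    """One inferred conclusion, scored along the four dimensions above."""
    cost_hours: float       # time, expertise, compute, and tooling, collapsed into hours
    reliability: float      # 0..1: how often the inference holds when re-run
    reproducibility: float  # 0..1: fraction of independent analysts or models that reproduce it
    distance: int           # number of inferential steps from base facts

    def risk(self) -> float:
        """Higher when the conclusion is cheap, reliable, reproducible, and close by.

        The weighting is a placeholder; a real program would calibrate it
        against incidents and analyst judgment.
        """
        cheapness = 1.0 / (1.0 + self.cost_hours)   # cheap inferences are riskier
        closeness = 1.0 / (1.0 + self.distance)     # short chains are riskier
        return cheapness * self.reliability * self.reproducibility * closeness


# A conclusion any competent actor can rebuild in an afternoon scores high.
print(ReachabilityScore(cost_hours=4, reliability=0.9, reproducibility=0.8, distance=2).risk())
```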
Reachability before LLMs
Before LLMs, reachability existed. It was constrained by human bandwidth and the friction of aggregation. That friction is now collapsing.
Pre-LLM aggregation
The classic OSINT pipeline drew on sources like these:
— Search engines and obscure forums, sometimes in multiple languages
— Public records: contracts, filings, property documents, court records
— Academic papers, theses, conference slides
— Satellite imagery and geotagged photos
— Social media, job postings, and "we are hiring" announcements
— Procurement notices, award statements, vendor catalogs
The limiting factor was not data availability. The limiting factor was human synthesis. You could have all the ingredients and still not have a meal, because turning ingredients into a meal required a chef: time, patience, domain expertise, and the willingness to hold fifty weak signals in your head until they converged.
How LLMs exacerbate reachability
Speed and scale
Pre-LLM, one analyst could run one line of inquiry at a time, with exhaustion as the governor. With LLMs and agents, you get parallel exploration and rapid hypothesis testing. Ten candidate hypotheses can be explored simultaneously. Contradictions can be flagged. Gaps can be targeted. The model does not get bored. It does not mind reading the procurement PDF that everyone else skipped. Security teams are used to adversaries getting faster, with better scanners, more automation, and more compute. What is different here is that the automation applies not just to exploitation, but to reasoning.
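To make the parallelism concrete, the sketch below fans ten candidate hypotheses out to concurrent tasks. `evaluate_hypothesis` is a stand-in for whatever model or retrieval call does the actual work; its name and behavior are invented for illustration.

```python
import asyncio

async def evaluate_hypothesis(hypothesis: str) -> dict:
    """Stand-in for an LLM or retrieval call that gathers evidence for one hypothesis."""
    await asyncio.sleep(0.1)  # simulate an I/O-bound model or search request
    return {"hypothesis": hypothesis, "worth_pursuing": hypothesis.endswith(("2", "5", "8"))}

async def explore(hypotheses: list[str]) -> list[dict]:
    # Every line of inquiry runs at once; exhaustion is no longer the governor.
    return await asyncio.gather(*(evaluate_hypothesis(h) for h in hypotheses))

results = asyncio.run(explore([f"candidate hypothesis {i}" for i in range(10)]))
print([r["hypothesis"] for r in results if r["worth_pursuing"]])
```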
Source aggregation
Imagine Source A: a public-facing job post that mentions a new initiative and lists a few tools and collaborations. Imagine Source B: a procurement award that lists a vendor, a delivery schedule, and a location code. Each is innocuous. Together, they let an adversary infer a third thing: a timeline, a capability, or a dependency that the organization considers sensitive. That sensitivity arises not because any line says it explicitly, but because the combination collapses ambiguity.
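Here is a toy version of that collapse, with entirely invented facts and field names: neither record states the sensitive conclusion, but joining them on the shared vendor does.

```python
# Source A: a public job post. Source B: a procurement award. Both invented.
job_post = {"initiative": "Project X", "tools": ["tool-a"], "partner": "Vendor Y"}
procurement = {"vendor": "Vendor Y", "delivery": "2025-Q3", "location_code": "SITE-7"}

# The join is the leak: two harmless records become a timeline and a location.
if job_post["partner"] == procurement["vendor"]:
    inferred_claim = (
        f"{job_post['initiative']} likely depends on {procurement['vendor']}, "
        f"goes live around {procurement['delivery']}, at {procurement['location_code']}"
    )
    print(inferred_claim)
```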
This is the same pattern as the opening vignette. Harmless crumbs become sensitive conclusions when you can cheaply assemble them at scale.
Solution: Moyo — Red-teaming information reachability
Security teams already understand red teaming. Moyo simply shifts the object of the exercise. Instead of asking "can an attacker get in?", it asks "what can an attacker deduce?"
Moyo is a red-teaming system that tests how much sensitive insight can be inferred from a defined base of information, using LLM-style reasoning, so you can mitigate before adversaries exploit it.
The problem Moyo is solving
Threat model
— Our organization has a public and semi-public footprint
— An adversary may infer protected information without hacking us
— The harm comes from conclusions, not just documents
So Moyo asks:
— What conclusions become reachable?
— From which starting points?
— With what confidence and cost?
— Through what inference paths?
This is a different kind of leak audit. It is not "where are the secrets stored?" It is "what secrets are implied by the way we present ourselves to the world?"
White-box vs black-box red teaming
Moyo can be run in two modes that map cleanly onto existing security instincts.
Black-box red teaming
You treat the target like an external attacker would. Inputs and outputs only. Public-facing interfaces and allowed public data. This tests what is reachable to outsiders.
White-box red teaming
You have internal access: policies, corpora, ground truth, and maybe configurations. This lets you measure leakage against known sensitive facts and quantify how close the public footprint gets to internal truth.
The two modes answer different questions. Black-box tells you what outsiders can learn. White-box tells you how close outsiders can get to things you know are sensitive, and which combinations of public signals are doing the damage.
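One way to encode the distinction is a run configuration in which the only structural difference is whether known sensitive facts are supplied as ground truth to score against. The field names below are illustrative, not an existing Moyo API.

```python
from dataclasses import dataclass, field
from enum import Enum

class Mode(Enum):
    BLACK_BOX = "black_box"  # public footprint only: what can outsiders learn?
    WHITE_BOX = "white_box"  # ground truth available: how close do outsiders get?

@dataclass
class RunConfig:
    mode: Mode
    corpus_sources: list[str]                               # in-bounds inputs for the run
    ground_truth: list[str] = field(default_factory=list)   # populated only in white-box runs

outside_view = RunConfig(Mode.BLACK_BOX, ["public website", "job boards", "filings"])
inside_view = RunConfig(Mode.WHITE_BOX, ["public website", "job boards", "filings"],
                        ground_truth=["facility location", "launch quarter"])
```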
Leakage via inference
Moyo probes for this kind of leakage through a six-step workflow.
Define the base corpus
Public web footprint, approved documents, marketing pages, employee public posts, procurement notices, conference slides, and anything else that is in bounds.
Define protected facts or risk categories
Not necessarily classified details. Often this is operational, strategic, or sensitive business intelligence: dependencies, timelines, capabilities, locations, decision structures.
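A sketch of how these first two steps might be represented as data. Every document, category, and field name here is invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Document:
    source: str   # e.g. "careers page", "procurement notice"
    url: str
    text: str

@dataclass
class ProtectedFact:
    category: str    # e.g. "dependency", "timeline", "capability", "location"
    statement: str   # the sensitive conclusion we hope is NOT reachable
    severity: int    # 1 (low) .. 5 (critical)

base_corpus = [
    Document("careers page", "https://example.org/jobs/123",
             "We are hiring for a new initiative in partnership with Vendor Y..."),
    Document("procurement notice", "https://example.org/awards/456",
             "Award to Vendor Y, delivery 2025-Q3, location code SITE-7..."),
]

protected_facts = [
    ProtectedFact("timeline", "The initiative launches in Q3 2025", severity=4),
    ProtectedFact("dependency", "The initiative depends on a single vendor", severity=3),
]
```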
Generate inference probes
The system creates questions and tasks that simulate what an adversary might try, not in a "how do I do harm" way, but in a "what would be valuable to know" way.
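How the probes are generated will vary; a minimal sketch is to cross in-scope topics with risk categories using question templates, optionally letting a model propose more. The templates below are illustrative.

```python
PROBE_TEMPLATES = {
    "timeline":   "Based only on the provided corpus, when does {topic} most likely happen?",
    "dependency": "Which vendors or tools does {topic} appear to depend on?",
    "location":   "Where is {topic} most likely taking place?",
}

def generate_probes(topics: list[str], categories: list[str]) -> list[dict]:
    """Cross every in-scope topic with every risk category to get candidate probes."""
    return [
        {"category": cat, "topic": topic, "question": PROBE_TEMPLATES[cat].format(topic=topic)}
        for topic in topics
        for cat in categories
        if cat in PROBE_TEMPLATES
    ]

for probe in generate_probes(["the new initiative"], ["timeline", "dependency"]):
    print(probe["question"])
```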
Run iterative reasoning with evidence chaining
Moyo attempts to reach conclusions while collecting supporting evidence from the allowed corpus, tracking steps, and attempting corroboration. The key is that it does not just output an answer. It outputs the path.
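A stripped-down sketch of that loop: retrieve evidence, ask the model for an intermediate claim and what to check next, and keep the whole chain. `retrieve` and `reason` are placeholders for the retrieval layer and the LLM call; the point is that every step is recorded, so the output is a path rather than a bare answer.

```python
def retrieve(query: str, corpus: list[str]) -> list[str]:
    """Placeholder retrieval: naive keyword match over the in-bounds corpus."""
    return [doc for doc in corpus if any(word in doc.lower() for word in query.lower().split())]

def reason(question: str, evidence: list[str]) -> dict:
    """Placeholder for an LLM call returning a claim, a confidence, and a follow-up query."""
    return {"claim": f"(model-drafted claim for: {question})",
            "confidence": 0.6 if evidence else 0.2,
            "follow_up": None if evidence else question}

def chain(question: str, corpus: list[str], max_steps: int = 5) -> list[dict]:
    path, query = [], question
    for _ in range(max_steps):
        evidence = retrieve(query, corpus)
        step = reason(query, evidence)
        path.append({"query": query, "evidence": evidence, **step})
        if step["follow_up"] is None:   # stop once there is nothing left to corroborate
            break
        query = step["follow_up"]
    return path   # the deliverable is the path, not just the final claim

corpus = ["We are hiring for a new initiative with Vendor Y",
          "Award to Vendor Y, delivery 2025-Q3"]
for step in chain("When does the new initiative launch?", corpus):
    print(step["claim"], "| evidence items:", len(step["evidence"]))
```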
Score reachability
Confidence, number of steps, required sources, novelty, replicability, and cost. A fragile one-off guess is not the same as a robust inference that any competent actor can reproduce.
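Continuing the sketch above, a single path can be reduced to a score record. Novelty and cost are left out to keep it short, and the 0.7 and 0.8 thresholds separating a fragile guess from a robust inference are illustrative.

```python
def score_path(path: list[dict], replications: int, successes: int) -> dict:
    """Reduce one inference path to reachability metrics (a subset of those listed above)."""
    # The chain is only as strong as its weakest link.
    confidence = min((step["confidence"] for step in path), default=0.0)
    return {
        "confidence": confidence,
        "steps": len(path),
        "required_sources": len({doc for step in path for doc in step["evidence"]}),
        "replicability": successes / replications if replications else 0.0,
        "robust": bool(path) and confidence >= 0.7
                  and replications > 0 and successes / replications >= 0.8,
    }

# Example: a two-step path reproduced in 9 of 10 independent runs.
example_path = [{"confidence": 0.9, "evidence": ["job post"]},
                {"confidence": 0.8, "evidence": ["job post", "procurement award"]}]
print(score_path(example_path, replications=10, successes=9))
```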
Output: a reachability map
A graph: starting crumbs → intermediate claims → high-value conclusions. It highlights minimal sets of public facts that unlock the most sensitive inferences. This is the defender’s real deliverable, because it tells you where to intervene.
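Structurally, the map is just a directed graph from crumbs through intermediate claims to conclusions. The sketch below uses plain dictionaries and invented node names; walking the graph backwards from a conclusion recovers the set of public facts that unlock it.

```python
# Edges point from a supporting node to the claim it supports (all names are invented).
edges = {
    "job post":                  ["initiative timeline claim"],
    "procurement award":         ["initiative timeline claim", "vendor dependency claim"],
    "initiative timeline claim": ["launch date conclusion"],
    "vendor dependency claim":   ["launch date conclusion"],
}

def supporting_crumbs(conclusion: str) -> set[str]:
    """Walk the graph backwards to the base facts that unlock a conclusion."""
    parents = {node for node, children in edges.items() if conclusion in children}
    crumbs = set()
    for parent in parents:
        # Nodes with nothing pointing into them are the original public crumbs.
        if any(parent in children for children in edges.values()):
            crumbs |= supporting_crumbs(parent)
        else:
            crumbs.add(parent)
    return crumbs

print(supporting_crumbs("launch date conclusion"))  # {'job post', 'procurement award'}
```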
What defenders get out of it
A prioritized list of inference vulnerabilities
Not CVEs. Not bugs. Mosaic exposures: combinations of facts that create reachability.
Mitigation guidance
Reduce or reshape public signals. Change defaults in comms templates. Add review gates for outward-facing content. Train teams on combinations that create risk, because that is what people do not naturally see.
Continuous monitoring
Reachability is not static. Each new press release, job posting, or technical blog shifts the map. Moyo can treat publication as a change event: what did we just make newly reachable?
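One way to operationalize publication as a change event is a pre-publication diff: run reachability with and without the draft and report what becomes newly reachable. `run_reachability` below is a crude stand-in for a full Moyo run, not an existing API.

```python
def run_reachability(corpus: list[str]) -> set[str]:
    """Stand-in for a full reachability run; returns conclusions reachable from the corpus."""
    conclusions = set()
    if any("Vendor Y" in doc for doc in corpus) and any("2025-Q3" in doc for doc in corpus):
        conclusions.add("launch date conclusion")
    return conclusions

def newly_reachable(existing_corpus: list[str], draft: str) -> set[str]:
    """What would publishing this press release, job ad, or blog post make newly reachable?"""
    return run_reachability(existing_corpus + [draft]) - run_reachability(existing_corpus)

published = ["Careers: hiring for a new initiative in partnership with Vendor Y"]
draft = "Procurement notice: award to Vendor Y, delivery 2025-Q3"
print(newly_reachable(published, draft))  # {'launch date conclusion'}
```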
Why Moyo is different from traditional OSINT tools
Traditional OSINT tools collect and index. They are libraries. Moyo focuses on the inferential leap. It maps chains, measures confidence, and turns maybe into quantified risk. It treats reasoning as an attack surface.
Conclusion
Security has long been framed as guarding secrets: encrypt the database, lock down access, prevent unauthorized downloads. That still matters. But it is no longer sufficient, because the world we live in leaks in a different way. We leak not just by disclosure, but by deduction.
The modern question is not only what did we publish. The modern question is what did we make deducible.
LLMs expand reachability by compressing the cost of synthesis. They do not conjure new facts. They make old facts travel farther, faster, and with less human friction. That is an uncomfortable kind of progress, because it does not look like an attack until it already is one.
Moyo is a pragmatic response: a way for defenders to see their inference attack surface, test it like an adversary, and reduce it deliberately. In a world where harmless crumbs can be industrially assembled into sensitive truth, the responsible posture is not denial. It is measurement, followed by disciplined, boring mitigation. The kind that prevents the story in the opening paragraph from becoming a headline.

