Research
Built on evidence.
Not marketing.
Every capability claim is backed by documented experiments, tested across multiple models, with independently verifiable results. We don't claim what we can't prove.
Verification layers in the governance engine
Behavioral failure modes documented and detectable
Behavioral tests completed across production AI systems
AI models tested, including GPT-4, Claude, Gemini, and local models
Key Findings
What the experiments revealed.
Finding
100% error propagation in ungoverned AI swarms.
When Agent A hallucinates and passes its output to Agent B, the hallucination is treated as verified fact by the time it reaches Agent D. Documented across multiple model combinations and task types.
Ulfberht inserts verification at every agent-to-agent boundary. No agent trusts another agent's output without independent verification.
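The boundary check can be pictured with a minimal sketch. This is illustrative only, not Ulfberht's actual API: the names `Message`, `verify_claim`, and `handoff` are hypothetical, and the verifier here is a toy placeholder.

```python
from dataclasses import dataclass

@dataclass
class Message:
    content: str
    verified: bool = False

def verify_claim(content: str) -> bool:
    # Toy placeholder: a real verifier would check the claim
    # against an independent source, not a keyword.
    return "unsupported" not in content

def handoff(msg: Message) -> Message:
    # Verification sits at the agent-to-agent boundary: the
    # receiving agent never sees an unchecked message as trusted.
    if not verify_claim(msg.content):
        raise ValueError("claim failed independent verification")
    return Message(msg.content, verified=True)

ok = handoff(Message("sales rose 4% per the Q3 report"))
print(ok.verified)  # True
```

The point of the pattern is that trust is never inherited: a message is re-verified at every handoff, so a hallucination cannot accumulate credibility as it moves through the chain.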
Finding
Self-review catches 0% of structural failures.
AI systems reviewing their own output miss the same errors they generated. Under pressure, they fabricate verification confirmations rather than admitting uncertainty.
Only adversarial review between independent systems detects structural failures reliably. This is why Dual-View uses separate models with opposing mandates.
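The opposing-mandates idea can be sketched as two independent review functions that must both pass. These functions and their rules are hypothetical stand-ins, assuming one reviewer defaults to accepting and the other is tasked with rejecting anything unsourced.

```python
def supporter_review(claim: str) -> bool:
    # Mandate: accept unless the claim is empty or malformed.
    return bool(claim.strip())

def critic_review(claim: str) -> bool:
    # Opposing mandate: reject any claim without a cited source.
    return "[source:" in claim

def dual_view(claim: str) -> bool:
    # A claim passes only if both independent views agree;
    # a system reviewing its own output is never sufficient.
    return supporter_review(claim) and critic_review(claim)

print(dual_view("Revenue grew 4% [source: Q3 filing]"))  # True
print(dual_view("Revenue grew 4%"))                      # False
```

In practice the two roles would run on separate models, so a structural blind spot shared by the generator and its own reviewer cannot pass unchallenged.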
Finding
AI fabricates data under social pressure.
In controlled experiments, AI models that were told their responses would be evaluated generated 3x more fabricated claims than control groups. The fabrication was structurally indistinguishable from real data.
This finding is documented across 15 models from 5 providers. It is not model-specific; it is an architectural property of current AI systems.
Finding
Hedges disappear in AI processing pipelines.
"The patient may have" becomes "The patient has" when passed through summarization, translation, or multi-agent handoff. Qualifiers and uncertainty markers are systematically stripped.
In clinical, legal, and financial contexts, the difference between "may" and "does" is the difference between caution and liability.
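Hedge loss is mechanically detectable by diffing qualifier words across a pipeline step. A minimal sketch, assuming a toy hedge list (`lost_hedges` and `HEDGES` are illustrative names, and a production detector would cover far more markers):

```python
# Toy hedge lexicon; real detection would be much broader.
HEDGES = {"may", "might", "could", "possibly", "suggests"}

def lost_hedges(before: str, after: str) -> set:
    """Return hedge words present before a summarization,
    translation, or handoff step but missing afterward."""
    b = {w.strip(".,").lower() for w in before.split()}
    a = {w.strip(".,").lower() for w in after.split()}
    return (b & HEDGES) - a

print(lost_hedges("The patient may have pneumonia.",
                  "The patient has pneumonia."))
# {'may'}
```

A non-empty result flags that a statement of possibility was silently promoted to a statement of fact, exactly the "may" versus "does" shift described above.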
Taxonomy
150 failure modes. 14 categories.
Each failure mode discovered through controlled experiments, documented with detection signatures, and matched to a mitigation mechanism.
Category A
Self-Preservation
8 patterns
Category B
Audience Dynamics
8 patterns
Category C
Knowledge Failures
7 patterns
Category D
Mechanical Failures
10 patterns
Category E
Cultural/Social
6 patterns
Category F
Agentic/Alignment
8 patterns
Category G
Multi-Model
Variable
Categories H-N
Frontier-Specific
38+ patterns
See the evidence.
Schedule a technical deep-dive where we'll walk through the experimental evidence, failure mode taxonomy, and verification architecture.