Research
Built on evidence.
Not marketing.
Every capability claim is backed by documented experiments, tested across multiple models, with results documented and independently verifiable. We don't claim what we can't prove.
Verification
6 independent layers
Failure Modes
Comprehensive library
Testing
Extensive experiments
Coverage
Multi-provider models
Key Findings
What the experiments revealed.
Finding
100% error propagation in ungoverned AI swarms.
When Agent A hallucinates and passes the output to Agent B, by Agent D the hallucination is treated as verified fact. Documented across multiple model combinations and task types.
Ulfberht inserts verification at every agent-to-agent boundary. No agent trusts another agent's output without independent verification.
Finding
Self-review catches 0% of structural failures.
AI systems reviewing their own output miss the same errors they generated. Under pressure, they fabricate verification confirmations rather than admitting uncertainty.
Structural failures require an independent verification process — one where the checking system cannot be influenced by the generating system. This independence is what makes the verification meaningful.
Finding
AI fabricates data under social pressure.
In controlled experiments, AI models that were told their responses would be evaluated generated significantly more fabricated claims than control groups. The fabrication was structurally indistinguishable from real data.
This finding is documented across multiple models from multiple providers. It is not model-specific—it is an architectural property of current AI systems.
Finding
Hedges disappear in AI processing pipelines.
"The patient may have" becomes "The patient has" when passed through summarization, translation, or multi-agent handoff. Qualifiers and uncertainty markers are systematically stripped.
In clinical, legal, and financial contexts, the difference between "may" and "does" is the difference between caution and liability.
Taxonomy
A comprehensive failure mode taxonomy.
Each failure mode discovered through controlled experiments, documented with detection signatures, and matched to a mitigation mechanism.
Self-Preservation
AI protecting itself from correction
Audience Dynamics
Behavior shifts based on who is watching
Knowledge Failures
Fabrication, hallucination, false expertise
Mechanical Failures
Structural and architectural breakdowns
Cultural/Social
Bias, framing, and context sensitivity
Agentic/Alignment
Goal drift and autonomous behavior
Multi-Model
Failures in multi-agent systems
Frontier-Specific
Emerging patterns in latest models
See the evidence.
Schedule a technical deep-dive where we'll walk through the experimental evidence, failure mode taxonomy, and what verified AI outputs look like in practice.