Safety & Risk

Hallucination Blocking

The real-time interception of AI-generated outputs containing fabricated facts, false citations, or unsupported claims before they reach users.

Full Definition

Hallucination Blocking is a specific capability within AI Agent Assurance that prevents fabricated, confabulated, or factually incorrect AI-generated content from being delivered to users or used as the basis for agent actions. While hallucination detection identifies potentially false outputs, hallucination blocking acts on those detections by intercepting the output before delivery and applying a configurable disposition: escalate to human review, substitute with a hedged response, request the agent to regenerate with explicit source constraints, or hard-block with an error message.

Effective hallucination blocking requires multi-method detection: semantic consistency checking (comparing claims against retrieved source documents), citation verification (cross-referencing stated references against a knowledge base), confidence scoring (flagging low-certainty generations), and chain-of-thought analysis (detecting logically inconsistent reasoning traces).

Hallucination blocking is particularly critical in regulated industries — healthcare, finance, legal, and compliance workflows — where a fabricated reference or incorrect figure can create direct liability. It complements but does not replace RAG architectures: even RAG-grounded agents can hallucinate in ways that standard retrieval quality checks miss.
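The detection-to-disposition flow described above can be sketched as a small policy function. This is a minimal illustration, not a reference implementation: the score names, thresholds, and `Disposition` values are hypothetical, and a production system would make them configurable per deployment.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Disposition(Enum):
    PASS = auto()        # deliver the output unchanged
    ESCALATE = auto()    # route to human review
    HEDGE = auto()       # substitute a hedged response
    REGENERATE = auto()  # retry with explicit source constraints
    BLOCK = auto()       # hard-block with an error message

@dataclass
class DetectionResult:
    # All scores in [0, 1]; higher means more likely hallucinated.
    # These fields mirror the four detection methods in the definition.
    semantic_inconsistency: float    # claims vs. retrieved source documents
    citation_failure_rate: float     # fraction of references not in the KB
    uncertainty: float               # 1 - generation confidence
    reasoning_inconsistency: float   # chain-of-thought contradiction score

def choose_disposition(result: DetectionResult) -> Disposition:
    """Map multi-method detection scores to a disposition.

    Thresholds here are illustrative placeholders; real deployments
    would tune them per workflow and risk tolerance.
    """
    if result.citation_failure_rate > 0.5:
        return Disposition.BLOCK       # fabricated references: highest risk
    if result.semantic_inconsistency > 0.7:
        return Disposition.REGENERATE  # ungrounded claims: retry with sources
    if result.reasoning_inconsistency > 0.6:
        return Disposition.ESCALATE    # inconsistent reasoning: human review
    if result.uncertainty > 0.8:
        return Disposition.HEDGE       # low certainty: soften the response
    return Disposition.PASS
```

The ordering encodes a severity hierarchy: verifiably fabricated citations are treated as the strongest blocking signal, while low confidence alone only triggers a hedged substitute.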