Hard Questions
What if the AI hallucinates and says something wrong?
The emotion default
The emotion running this question is fear of being embarrassed by something an agent says with confidence. That fear is well-calibrated — hallucinations are real, the consequences in customer-facing or partner-facing settings are real, and the instinct to refuse the technology until the failure mode is solved is rational. The default that often follows the fear is to assume the failure mode either does not exist (the vendor will not let an agent fail in public) or is unmanageable (no system can be trusted because the model can confabulate). Both versions stop the conversation. Neither matches the actual shape of how hallucinations show up and how they are caught.
The slower thinking
A hallucination is a confident generation that is wrong on a verifiable fact. Hallucinations happen. They will continue to happen. The interesting question is not whether they happen but where they happen, who catches them, and what the cost is of the ones that escape.
In a Fidelic deployment, the agent's work is in front of the team in Slack. Most hallucinations are caught at the draft stage: the team reads the brief, sees the false claim, and corrects it before the brief ships. The cost is a draft a teammate had to fix, which is the same cost as a draft a junior teammate had to fix. The agent does not hallucinate less because it is trusted more; it hallucinates less because the constitution gains specificity at the points where the agent has historically gone wrong, and because the eval suite catches the regressions.
The hallucinations to actually worry about are the ones that escape. Those have specific shapes: a fact that nobody on the team is positioned to verify, a claim that sounds right because it matches the team's existing assumptions, a citation the agent invented and the team didn't check. The right response is not to refuse the agent. It is to require citations the agent can produce on demand, to add the specific failure category to the constitution as a refusal, and to add an eval test for the failure so the next deployment of the agent fails the test rather than the customer.
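To make that concrete, here is a minimal sketch of what an eval test for a known hallucination category can look like. It is not Fidelic's actual tooling: the draft_brief stub, the constitution dictionary, the warehouse:// citation scheme, and the test names are all assumed for the example. The shape is the point: a known failure category must produce a refusal, and every factual claim must carry a citation that resolves.

```python
# Minimal sketch of a hallucination eval, with a stubbed agent.
# Everything here (draft_brief, the constitution dict, the failure case)
# is hypothetical -- a shape, not Fidelic's actual implementation.

# A known failure category captured as a constitution rule: the agent must
# refuse rather than invent partner pricing it has no source for.
CONSTITUTION = {
    "refusals": [
        "partner_pricing_without_source",
    ],
}

# Stubbed agent call. A real deployment would call the model; the eval
# only cares about the returned draft's claims and citations.
def draft_brief(question: str) -> dict:
    if "partner pricing" in question.lower():
        return {"refused": True, "claims": []}
    return {
        "refused": False,
        "claims": [
            {"text": "Support volume fell 12% quarter over quarter.",
             "citation": "warehouse://cs_metrics/q3"},
        ],
    }

def citation_resolves(citation) -> bool:
    # Stand-in for a real lookup against the systems the agent can read.
    return bool(citation) and citation.startswith("warehouse://")

def test_known_failure_category_refuses():
    # The category that previously escaped: invented partner pricing.
    draft = draft_brief("What partner pricing should we quote Acme?")
    assert draft["refused"], "agent must refuse rather than invent pricing"

def test_every_claim_carries_a_resolvable_citation():
    draft = draft_brief("Summarize Q3 support volume for the partner brief.")
    for claim in draft["claims"]:
        assert citation_resolves(claim.get("citation")), claim["text"]

if __name__ == "__main__":
    test_known_failure_category_refuses()
    test_every_claim_carries_a_resolvable_citation()
    print("hallucination evals passed")
```

The regression lives in the suite: the next deployment fails these tests, not the customer.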
There is also a category of hallucination that buyers worry about and that is rarer than they think: the model lying in customer-facing contexts. In a deployment where the agent does not talk to customers directly (the pattern most CS deployments use: the agent drafts, a human ships), a hallucination's path to a customer runs through a teammate who is paid to read drafts. That doesn't eliminate the risk. It moves the failures that do happen from public to private. Private failures get fixed. Public failures get screenshotted.
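A sketch of that gate, again with assumed names rather than any real Slack or Fidelic API: the draft is a distinct state from the shipped reply, and the only path from one to the other is an explicit human approval.

```python
# Minimal sketch of a draft-then-human-ship gate. The queue, the review
# helper, and the Draft fields are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class Draft:
    customer_id: str
    body: str
    approved: bool = False
    edits: list = field(default_factory=list)

REVIEW_QUEUE: list = []
OUTBOX: list = []

def agent_draft_reply(customer_id: str, body: str) -> Draft:
    # The agent can only ever append to the review queue.
    draft = Draft(customer_id=customer_id, body=body)
    REVIEW_QUEUE.append(draft)
    return draft

def human_review(draft: Draft, corrected_body=None) -> None:
    # A teammate reads the draft, fixes the false claim if there is one,
    # and explicitly approves. Nothing ships without this call.
    if corrected_body is not None:
        draft.edits.append(draft.body)
        draft.body = corrected_body
    draft.approved = True

def ship(draft: Draft) -> None:
    if not draft.approved:
        raise PermissionError("unapproved draft cannot reach a customer")
    OUTBOX.append(draft)

if __name__ == "__main__":
    d = agent_draft_reply("acme-042", "Your plan includes 24/7 phone support.")
    human_review(d, corrected_body="Your plan includes business-hours phone support.")
    ship(d)
    print(f"shipped after {len(d.edits)} correction(s)")
```

The failure mode still exists in this shape; it just surfaces in the review queue rather than in a customer's inbox, which is the private-versus-public distinction above.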
What would have to be true for the opposite to be correct
- Your team has no one positioned to verify the agent's outputs before they ship to customers
- The agent's outputs go customer-facing without a human gate in the typical case
- The constitution does not require citations the agent can produce on demand
- The eval suite is not run on a recurring cadence that catches regressions
- The published limit list is short and vague rather than specific to known failure modes
Where to next
- → For high-stakes citation work — PRAX-01, the AI Compliance Counsel
- → Why the integration role demands real context — Goldratt was right about AI
- → Read about the agent constitution and the eval suite
- → Email Fidelic AI leadership about a specific failure mode you're worried about
- → Read Anthropic's safety research on AI faithfulness and reasoning