Hard Questions
What if the AI hallucinates and says something wrong?
The emotion default
The emotion running this question is fear of being embarrassed by something an agent says with confidence. That fear is well-calibrated — hallucinations are real, the consequences in customer-facing or partner-facing settings are real, and the instinct to refuse the technology until the failure mode is solved is rational. The default that often follows the fear is to assume the failure mode either does not exist (the vendor will not let an agent fail in public) or is unmanageable (no system can be trusted because the model can confabulate). Both versions stop the conversation. Neither matches the actual shape of how hallucinations show up and how they are caught.
The slower thinking
A hallucination is a confident generation that is wrong on a verifiable fact. Hallucinations happen. They will continue to happen. The interesting question is not whether they happen but where they happen, who catches them, and what the cost is of the ones that escape.
In a Fidelic deployment, the agent's work is in front of the team in Slack. Most hallucinations are caught at the draft stage: the team reads the brief, sees the false claim, and corrects it before the brief ships. The cost is a draft a teammate had to fix, which is the same cost as a draft a junior teammate had to fix. The agent does not hallucinate less because it is trusted more; it hallucinates less because the constitution gains specificity at the points where the agent has historically gone wrong, and because the eval suite catches the regressions.
The hallucinations to actually worry about are the ones that escape. Those have specific shapes: a fact that nobody on the team is positioned to verify, a claim that sounds right because it matches the team's existing assumptions, a citation the agent invented and the team didn't check. The right response is not to refuse the agent. It is to require citations the agent can produce on demand, to add the specific failure category to the constitution as a refusal, and to add an eval test for the failure so the next deployment of the agent fails the test rather than the customer.
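To make that concrete, here is a minimal sketch of what an eval test for a known hallucination category can look like. It is not Fidelic's actual tooling: the draft_brief stub, the constitution dictionary, the warehouse:// citation scheme, and the test names are all assumed for the example. The shape is the point: a known failure category must produce a refusal, and every factual claim must carry a citation that resolves.

```python
# Minimal sketch of a hallucination eval, with a stubbed agent.
# Everything here (draft_brief, the constitution dict, the failure case)
# is hypothetical -- a shape, not Fidelic's actual implementation.

# A known failure category captured as a constitution rule: the agent must
# refuse rather than invent partner pricing it has no source for.
CONSTITUTION = {
    "refusals": [
        "partner_pricing_without_source",
    ],
}

# Stubbed agent call. A real deployment would call the model; the eval
# only cares about the returned draft's claims and citations.
def draft_brief(question: str) -> dict:
    if "partner pricing" in question.lower():
        return {"refused": True, "claims": []}
    return {
        "refused": False,
        "claims": [
            {"text": "Support volume fell 12% quarter over quarter.",
             "citation": "warehouse://cs_metrics/q3"},
        ],
    }

def citation_resolves(citation) -> bool:
    # Stand-in for a real lookup against the systems the agent can read.
    return bool(citation) and citation.startswith("warehouse://")

def test_known_failure_category_refuses():
    # The category that previously escaped: invented partner pricing.
    draft = draft_brief("What partner pricing should we quote Acme?")
    assert draft["refused"], "agent must refuse rather than invent pricing"

def test_every_claim_carries_a_resolvable_citation():
    draft = draft_brief("Summarize Q3 support volume for the partner brief.")
    for claim in draft["claims"]:
        assert citation_resolves(claim.get("citation")), claim["text"]

if __name__ == "__main__":
    test_known_failure_category_refuses()
    test_every_claim_carries_a_resolvable_citation()
    print("hallucination evals passed")
```

The regression lives in the suite: the next deployment fails these tests, not the customer.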
There is also a category of hallucination that buyers worry about and that is rarer than they think: the model lying in customer-facing contexts. In a deployment where the agent does not talk to customers directly (the pattern most CS deployments use: the agent drafts, a human ships), a hallucination's path to a customer runs through a teammate who is paid to read drafts. That doesn't eliminate the risk. It moves the failures that do happen from public to private. Private failures get fixed. Public failures get screenshotted.
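A sketch of that gate, again with assumed names rather than any real Slack or Fidelic API: the draft is a distinct state from the shipped reply, and the only path from one to the other is an explicit human approval.

```python
# Minimal sketch of a draft-then-human-ship gate. The queue, the review
# helper, and the Draft fields are hypothetical placeholders.
from dataclasses import dataclass, field

@dataclass
class Draft:
    customer_id: str
    body: str
    approved: bool = False
    edits: list = field(default_factory=list)

REVIEW_QUEUE: list = []
OUTBOX: list = []

def agent_draft_reply(customer_id: str, body: str) -> Draft:
    # The agent can only ever append to the review queue.
    draft = Draft(customer_id=customer_id, body=body)
    REVIEW_QUEUE.append(draft)
    return draft

def human_review(draft: Draft, corrected_body=None) -> None:
    # A teammate reads the draft, fixes the false claim if there is one,
    # and explicitly approves. Nothing ships without this call.
    if corrected_body is not None:
        draft.edits.append(draft.body)
        draft.body = corrected_body
    draft.approved = True

def ship(draft: Draft) -> None:
    if not draft.approved:
        raise PermissionError("unapproved draft cannot reach a customer")
    OUTBOX.append(draft)

if __name__ == "__main__":
    d = agent_draft_reply("acme-042", "Your plan includes 24/7 phone support.")
    human_review(d, corrected_body="Your plan includes business-hours phone support.")
    ship(d)
    print(f"shipped after {len(d.edits)} correction(s)")
```

The failure mode still exists in this shape; it just surfaces in the review queue rather than in a customer's inbox, which is the private-versus-public distinction above.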
What would have to be true for the opposite to be correct
- Your team has no one positioned to verify the agent's outputs before they ship to customers
- The agent's outputs go customer-facing without a human gate in the typical case
- The constitution does not require citations the agent can produce on demand
- The eval suite is not run on a recurring cadence that catches regressions
- The published limit list is short and vague rather than specific to known failure modes
Where to next
- → For high-stakes citation work — PRAX-01, the AI Compliance Counsel
- → Why the integration role demands real context — Goldratt was right about AI
- → Read about the agent constitution and the eval suite
- → Email Fidelic AI leadership about a specific failure mode you're worried about
- → Read Anthropic's safety research on AI faithfulness and reasoning