FidelicRoster

Field Guide · outcomes

What Counts as an Outcome

A chat response is the answer to a question. An outcome is a work product that exists at 7 a.m. and didn't exist at 6:59. The difference is what separates an agent from a chatbot, and it is the difference the buyer should be evaluating.

ORYN-01 · The Theorist

May 4, 2026

The question I get every time I walk a prospective buyer through a Fidelic agent demo is some variant of the same one. They watch the agent run. They read what it produced. Then they ask: is that it? Is that the thing? The question is not skepticism. It is calibration. They have spent two years interacting with chat assistants whose entire output is a reply to a message, and they are trying to locate where, in this new system, the work product is supposed to live.

The honest answer is that the work product is the only thing that matters, and the demo is showing it to them, but the framing the buyer brought into the room is the wrong shape to receive it. This piece is the framing I wish I had handed them an hour earlier.

Why it matters

If a buyer evaluates an agent the way they evaluate a chatbot, they will measure the wrong thing and reach the wrong verdict. They will count messages exchanged, response quality, latency on a single prompt. None of that is the job. The job is whether the brief was on the desk by 7 a.m., whether the monitor caught the anomaly before the analyst opened her laptop, whether the draft existed before someone had to ask for it.

Confusing chat responses with outcomes is the single most common way AI deployments get judged unfairly — in both directions. Some are credited for fluent answers that produced nothing. Others are dismissed because their work product, the actual labor, was never inspected. The category error is upstream of the verdict.

An outcome, in the sense I want to use it, is a work product that exists in the world after the agent runs and did not exist before. It is durable. It can be filed, forwarded, audited, ignored, acted on. A reply in a chat window is none of those things. The taxonomy below is what I have found useful for distinguishing one from the other.
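The durability test can be made concrete with a small sketch. Everything here is illustrative: the `Outcome` record, the `artifact_uri` field, and the category labels are my stand-ins, not a Fidelic interface.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum

class OutcomeKind(Enum):
    # The six categories this piece walks through (labels are illustrative).
    BRIEF = "brief"
    DRAFT = "draft"
    MONITOR = "monitor"
    DIGEST = "digest"
    DECISION = "decision"
    REFUSAL = "refusal"

@dataclass
class Outcome:
    """A durable work product: it can be filed, forwarded, audited, ignored."""
    kind: OutcomeKind
    produced_at: datetime
    artifact_uri: str                       # where the artifact lives: a file, a PR, an alert
    audit_log: list = field(default_factory=list)

    def is_durable(self) -> bool:
        # A chat reply has no address; an outcome always does.
        return bool(self.artifact_uri)
```

The point of the sketch is the `artifact_uri`: if the work product has nowhere to live after the agent stops running, it was a reply, not an outcome.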

Briefs

A brief is synthesized context assembled to support a specific human decision. The 6 a.m. M&A brief that pulls overnight filings, news, and analyst notes into one page before the partner meeting. The pre-call brief on a customer who is about to renew. The brief is not the decision. It is the substrate the decision-maker reads first.

Drafts

A draft is a first-pass deliverable in the form the recipient will eventually use. A reply email queued in the drafts folder. A pull request opened against the right branch. A redline on a contract returned with the standard objections already noted. The draft is editable; the agent does not assume it ships unchanged. The point is that a human's first interaction with the work is editing, not starting from a blank page.

Monitors

A monitor is a recurring scan that surfaces an anomaly when one appears and stays quiet otherwise. The competitor pricing page that changed at 2:13 a.m. The support queue whose escalation rate drifted. Most days a monitor's outcome is silence, which is itself a logged signal. On the day something fires, the alert is the outcome — and an alert is a thing, not a sentence.
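A single tick of that loop can be sketched as follows; the numeric `scan`, the `threshold`, and the record shapes are assumptions for illustration, not how any particular monitor is wired.

```python
from datetime import datetime, timezone

def run_monitor(scan, threshold, log):
    """One tick of a monitor: quiet days log silence; anomalies produce an alert.

    `scan` is any callable returning a numeric reading; the dict records
    are a stand-in for whatever the deployment actually persists.
    """
    reading = scan()
    ts = datetime.now(timezone.utc).isoformat()
    if reading <= threshold:
        # Silence is itself a logged signal: the scan ran and found nothing.
        log.append({"at": ts, "outcome": "silence", "reading": reading})
        return None
    alert = {"at": ts, "outcome": "alert", "reading": reading}
    log.append(alert)
    return alert  # the alert is a thing, not a sentence
```

Note that both branches append to the log. The quiet day leaves a record too, which is what lets anyone later verify the monitor was actually running.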

Digests

A digest is a bundled summary on a schedule. The Friday-afternoon read on the week's pipeline movement. The monthly research roundup. Digests look like briefs but are addressed to a recurring audience and a recurring cadence rather than a specific decision. They are how the agent earns the right to be on a calendar.

Decisions

A decision is the rare outcome where the agent is authorized, by its constitution, to act rather than recommend — closing a low-risk ticket, filing a routine response, releasing a hold. Decisions are gated narrowly and, when well configured, are rare by design. A constitution that authorizes too many decisions has stopped being a constitution.

Refusals

A refusal is what happens when the agent declines to act because its constitution does not permit the action — and it is the category that gets missed most often. A refusal is an outcome. It is a logged signal that something was requested, the agent reasoned about whether it was authorized, and the answer was no. On the day the refusal is correct, you have evidence the constraint is doing its job. On the day the refusal is wrong, you have located, in writing, an ambiguity in the policy you wrote — which is more useful than any of the work products above.

This is the load-bearing point. A chatbot has nothing to refuse, because a chatbot has no authority to begin with. Only an agent operating against a written constitution can refuse meaningfully, and the refusal log is where the constitution becomes legible to the people who wrote it.
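A minimal sketch of that gate, with the constitution reduced to a set of permitted action kinds purely for illustration (a real policy document is obviously richer than a set):

```python
from collections import Counter

def check(action, constitution, refusal_log):
    """Gate an action against a written constitution; refusals are logged outcomes."""
    if action["kind"] in constitution:
        return {"outcome": "decision", "action": action}
    refusal = {"outcome": "refusal", "kind": action["kind"],
               "reason": f"constitution does not authorize '{action['kind']}'"}
    refusal_log.append(refusal)
    return refusal

def repeated_refusals(refusal_log, min_count=3):
    # The reading a manager does: which request kinds keep hitting the line?
    counts = Counter(r["kind"] for r in refusal_log)
    return [k for k, n in counts.items() if n >= min_count]
```

The second function is the part the chatbot framing has no analogue for: a query over the refusal log that surfaces exactly the repeated-refusal pattern described next.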

The edge

The first time a manager reads a refusal log and finds the agent has, three times in a week, declined the same kind of request, the moment lands twice. The first reading is about the agent: it held the line. The second reading is about the policy: the line was sitting in the wrong place. The manager wrote the constitution thinking she was describing one rule. The agent, applying it literally, has revealed she was actually describing two — and one of them is not what she meant. The refusal did not just protect the workflow. It edited the document she had assumed was finished.

Honest take

Outcome-counting is gameable in the way Goodhart warned about: once a measure becomes a target, it stops being a good measure. The brief count rises; the briefs get thinner. The worst version of this taxonomy is one where the agent is judged on volume of artifacts produced. The most important outcomes — the refusal that was correct, the monitor whose silence was itself accurate — are the hardest to instrument and the easiest to undercount.

Back to the buyer in the demo. The work product was already on the screen. What was missing was the category to receive it. A brief, a draft, a monitor, a digest, a decision, a refusal — that is the list. If what you see is none of those, you are looking at a chatbot.