DESK · THEORY
Workflow · Beginner · June 4, 2026 · 7 min read

How to check AI's work: a CEO's 3-step verification habit

A lawyer was fined after most of the citations in their AI-drafted brief turned out not to exist, and they never opened a single one.

Courts have repeatedly sanctioned lawyers for briefs loaded with AI-fabricated case citations. The AI never flagged a thing. It wrote those fake citations the same way it writes everything: fluent, specific, and completely sure.

You are not filing briefs. But you send memos, board decks, and market claims. Same trap.

This is the practical companion to our piece on why AI sounds confident when it's wrong. That piece explains the mechanism. This one gives you the habit that catches the problem before it ships.

What you'll have when you're done

A three-step mental check that takes under two minutes on any piece of AI output. You will know which parts of an AI draft need a real look, how to ask for a source and why you have to actually open it, and which single fact to cross-check when you only have time for one.

Most AI output is fine to ship. This habit is for the small slice that isn't, and for knowing which slice that is.

The cost of skipping it

The model sounds exactly as confident when it is wrong as when it is right. A fabricated statistic and a real one arrive in the same fluent prose, same formatting, same certainty. There is no tone shift to catch. This is not a flaw the model will fix; it is structural. A large language model is a next-word predictor trained to produce plausible text, not to flag its own uncertainty.

That is why you need a structural check, not a gut-feeling check. The mechanism is worth understanding once, but you do not need to re-learn it every time you use the model. What you need is a fast habit.

What you need first

A piece of AI output you are about to use, and about two minutes. That is it. No special tools. No domain expertise required.

Step by step

Step 1: Scan for claim types, not sentences

Most AI output is low-risk. Scan for the narrow slice that isn't.

The majority of what AI drafts for you is restructuring, summarizing, or composing from your own words. That output does not need a fact-check. It needs an edit.

The parts that need checking are specific and narrow: numbers, names, dates, and citations.

Do one five-second pass over the output. Flag those items first, the way you would circle a number in a document before signing it. Everything else you can edit and ship.

The discipline here is not "read slowly." It is "know what kind of thing needs a check."

Step 2: Ask for the source, then open it

The source has to exist and say what the model claims it says. A citation you do not open is not a check.

For any flagged claim, ask: "What is your source for this, and how confident are you?"

The model will often give you a source. Sometimes a real one. Sometimes a plausible-sounding fabrication: a real journal name, a real author, a wrong title, a wrong year, or a paper that doesn't exist. Models fabricate citations the way they generate everything else, confidently and without announcement.

So open it. Actually navigate to the document, the article, the case, the study. Does it exist? Does it say what the model told you it says?

If it doesn't, treat the claim as unverified. Either cut it, find the real source yourself, or label it as an estimate rather than a fact. Do not let a citation you haven't opened earn the credibility of a citation you have.

This is the step most people skip. Asking "where did you get this?" is not the check. Opening the answer is.

Step 3: Cross-check one load-bearing fact against reality

Pick the single claim that would do the most damage if wrong, and check it against a real external source.

You do not need to verify every flagged item every time. But in most pieces of AI output there is one fact that is doing real work: the market size in the deck, the number in the memo, the regulatory requirement in the contract summary. If that one is wrong, the whole output is compromised.

Find it. Then check it against a search, the primary document, or a person who actually knows. Not a second AI (which can fail in the same direction). A real external source.

If that load-bearing fact holds up, the rest of the output is probably fine. If it fails, treat the whole piece as suspect and run a broader check before it ships.

One solid anchor is usually enough to trust the rest. One failed anchor means you cannot trust anything until you look.

How you'll know it's working

You start catching things before they leave your desk, not after.

The first time the habit saves you is often jarring. You ask for the source on a statistic in a market analysis. The model gives you a citation. You open it. The paper exists, but it says nothing of the sort. You would have sent that number to your board.

That catch, done once, rewires how you read AI output. You stop seeing fluent prose as accurate prose. You start seeing it as a draft from a very capable assistant who occasionally makes things up with total conviction.

The other signal: once your team knows you will ask, they start running the check on their own. "Where did you get this?" becomes a standing question in your org, not a gotcha. That is the habit scaling.

When it breaks

The check doesn't work if nobody owns it.

The most common verification failure isn't a bad model. It's no one knowing whose job it was to check. AI output travels fast: drafted in a chat window, pasted into a deck, sent to a client. Somewhere between the model and the recipient, the check that should have happened didn't.

Say the rule out loud to anyone whose AI-assisted work reaches you: "If it has a number, a name, a date, or a citation, check it before it gets to me." That sentence, said once, is more powerful than any internal AI policy document. It makes ownership explicit.

Most verification failures happen because no one said whose job it was, not because no one cared.

The triage test

Before you run the three steps, run this: "Would I stake real money or my reputation on this if it turns out to be wrong?"

If yes, run the three steps.

If no, and the output is low-stakes and reversible, ship it and move on. The habit is not to verify everything. That would be slower than doing the research yourself. The habit is to verify the things that can hurt you, and to do it fast.

Our full map of what AI is and isn't reliable for is worth reading once to calibrate the stakes. And if you are deciding what data to put into these tools in the first place, our guide to what's safe to put into AI covers that question directly.

Level up

You do not need subject-matter expertise to run this check. The check is structural: does a real source actually exist and say what the model says it says? That is a question anyone can answer. A non-expert can catch the exact errors that slip past domain experts, because the domain experts are too familiar with the general territory to notice a specific claim that sounds right but isn't.

The three steps are the foundation. Once they are habit, the next move is building a workflow for the narrow category of tasks where you want a human in the loop before any AI output reaches a decision. Not because AI is wrong, but because some stakes are high enough that you want that structure regardless.

Start with the triage test. Run it on the next piece of AI output you open today. That one question, asked once, is how the habit starts.
