What AI can and cannot do for a CEO right now

In a Harvard and BCG experiment with 758 consultants, AI made them about 40% better at some tasks and 19 percentage points less likely to get another one right. The tasks looked similar. That gap is the whole game.

In a Harvard Business School and BCG field experiment with 758 consultants, published in Organization Science, the researchers found that on tasks inside AI's capability frontier, consultants using AI were about 25% faster and produced roughly 40% higher-quality work. On a task requiring complex judgment that sat outside the frontier, AI users were 19 percentage points less likely to reach the right answer than consultants working without it.

The tasks looked similar. The outcomes did not.

That gap is the whole game for a CEO. Not "should I use AI" but "which tasks are on which side of the line."

The table you need

This is not a complete list. It is the seven categories where I see CEOs either win big or get burned. Each row has three parts: what AI is genuinely great at, where it is not reliable, and where trusting it without a human check is actually dangerous.

Category	Great at	Not reliable for	Dangerous to trust blind
Drafting and rewriting	First drafts from your notes, tone variants, subject lines	Your specific voice without examples	Publishing to customers without a human read
Summarizing docs you provide	Condensing long reports, contracts, or threads you paste in	Ambiguous or poorly-scanned source material	Treating the summary as final for legal, financial, or HR decisions
Meeting capture and notes	Clean-audio transcription, action items, structured recap	Speaker attribution on noisy calls; what was implied	Verbatim quotes attributed to people in sensitive discussions
Research and competitive scanning	Synthesizing public info you paste in, structuring a landscape, brainstorming angles	Anything past its training cutoff	Competitor pricing, funding, or status claims without independent verification
Hiring	Structuring job descriptions, summarizing interview notes you wrote	Candidate quality beyond keywords; cultural fit	Autonomous screening decisions (documented bias against protected groups)
Finance review	Explaining what a metric means, drafting narrative around numbers you give it	Calculations it performs itself without a connected tool	Using AI-computed figures without checking
Customer-facing decisions	Drafting response frameworks, surfacing options and trade-offs	Your relationship and history context	Acting autonomously where a wrong move has legal or reputational cost

The pattern behind the table

The right column is not "AI is bad at this." It is "a wrong answer here has a cost you cannot undo."

Notice that the dangerous column often lists the same task as the great column, just applied to higher-stakes output or run without a human check in the loop. Summarizing a long document is great. Treating that summary as final for a contract signing is dangerous. The task is the same. The check is what changed.

This is exactly what the BCG experiment showed. The task that broke AI users looked like the tasks where AI helped. The difference was whether the task required genuine judgment that lived outside the model's capability, and whether the user noticed they had crossed that line.
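The summary example above can be sketched as a small rule. This is an illustration only: the `Task` fields and the three verdict labels are assumptions made for the sketch, not categories from the study or from any tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    high_stakes: bool     # a wrong answer has a cost you cannot undo
    human_reviewed: bool  # someone reads the output before it is used


def verdict(task: Task) -> str:
    """Classify a use of AI by the check around it, not by the task itself."""
    if task.high_stakes and not task.human_reviewed:
        return "dangerous"
    if task.high_stakes:
        return "ok with review"
    return "ok"


# Same task name, different stakes and different check.
summary_for_reading = Task("summarize contract", high_stakes=False, human_reviewed=True)
summary_for_signing = Task("summarize contract", high_stakes=True, human_reviewed=False)

print(verdict(summary_for_reading))  # ok
print(verdict(summary_for_signing))  # dangerous
```

The function never looks at the task name. Only the stakes and the presence of a reviewer change the answer, which is the point of the table.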

There is also a precision problem that surprises a lot of operators. Industry analyses of 2025 data put top models below roughly 1% error on general knowledge tasks. The same analyses show materially higher error rates on specialized financial and legal work, from the high single digits to the mid-teens. Lawyers have already been caught filing AI-invented case citations that looked completely real. The model generates plausible-sounding text whether it has the answer or not. That is how it works at the mechanism level.

The specific limits to hold in your head

These are not abstract caveats. They change how you use the tool every day.

No real-time knowledge. The LLM was trained on data up to a cutoff date, then frozen. It has no idea what happened after that. When it sounds current, it is often pattern-matching from older training, not reporting facts. Competitor moves, recent funding rounds, new regulations: verify independently.

Fabricates citations. When it does not know something, it does not stop. It generates the most plausible-sounding answer in the same confident voice, including invented sources, statistics, and quotes that feel real and are not. This is not a bug being patched. It is structural. Read more about why AI is confidently wrong.

Weak at math without a connected tool. Ask it to calculate something and it generates what a calculation would look like, not a guaranteed correct result. If finance decisions rest on AI-computed figures, check the arithmetic.
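One way to check the arithmetic is to recompute the figure from your own source numbers and compare it with what the model reported. The sketch below assumes the line items come from your books as strings. The function names, the sample figures, and the one-cent tolerance are all hypothetical.

```python
from decimal import Decimal


def recompute_total(line_items):
    """Sum the source figures exactly, using Decimal to avoid float drift."""
    return sum((Decimal(x) for x in line_items), Decimal("0"))


def matches_reported(line_items, reported_total, tolerance="0.01"):
    """True when the AI-reported total agrees with the recomputed one."""
    diff = abs(recompute_total(line_items) - Decimal(reported_total))
    return diff <= Decimal(tolerance)


# Hypothetical quarterly revenue figures, entered as strings from your own books.
quarters = ["1250000.50", "980000.25", "1100000.00"]

print(recompute_total(quarters))                 # 3330000.75
print(matches_reported(quarters, "3330000.75"))  # True
print(matches_reported(quarters, "3303000.75"))  # False: two digits transposed
```

A transposed pair of digits looks plausible on the page, which is why the comparison is done by recomputation and not by reading.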

No access to your private business facts. Out of the box, the model knows nothing about your customers, your team, your deals, or your history. Every session starts blank unless you supply that context. The table above assumes you are feeding it the relevant source material, not asking it to know things it cannot know.

Where to start

Pick tasks from the "Great at" column where you can read the output before it matters. That is the whole rule for a beginner.

Use AI where you can check it. Keep the human on high-stakes judgment. The BCG experiment did not find that AI was net-bad. It found that users who trusted it outside its capability frontier got worse outcomes than users who did not use it at all. Knowing the frontier is the skill.

The fastest shortcut: use AI to pressure-test a big decision rather than to make it. Feed it the context, ask for counter-arguments and failure modes, and then you decide. That is the shape where the speed and pattern-recognition of the model do the most work, and the judgment stays where it should.
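As a rough sketch of that shape, the helper below assembles a prompt that asks for objections and withholds the request for a recommendation. The function name, the prompt wording, and the sample decision are assumptions for illustration. It only builds text and does not call any model.

```python
def pressure_test_prompt(decision, context):
    """Build a prompt that asks for objections, not for a recommendation."""
    facts = "\n".join(f"- {item}" for item in context)
    return (
        f"Decision under consideration: {decision}\n\n"
        f"Context I am supplying:\n{facts}\n\n"
        "Do not recommend a course of action. Instead:\n"
        "1. Give the three strongest counter-arguments.\n"
        "2. List the most likely failure modes and what would trigger each.\n"
        "3. Name any assumption in my context that you cannot verify.\n"
    )


# Hypothetical decision and context, supplied by you.
prompt = pressure_test_prompt(
    "Raise list prices by 8% next quarter",
    ["Churn was flat over the last two quarters", "Two competitors raised prices this year"],
)
print(prompt)
```

Paste the result into whichever assistant you use, read the objections, and make the call yourself.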

When you are ready to go deeper on any single category in that table, the right next read is where a CEO should start with AI and which use cases actually pay. The table above is the map. That article is the first workflow.

