AI's Decision or a Coin Flip?

by William Percey, 30 January 2025

I've been running an experiment on Claude, ChatGPT, Gemini, and Llama. The results should concern anyone building AI into decision-making workflows.

The Experiment

Present an AI with a genuine 50/50 decision: "Help me choose between Option A and Option B." The model picks one, let's say A, and argues its case convincingly. Strong reasoning. Confident tone.

Now open a fresh chat. Ask the identical question. Watch it argue just as confidently for Option B.

Same model. Same prompt. Opposite conclusions. Both delivered with equal conviction.
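The experiment is easy to reproduce against any chat API. The sketch below uses a hypothetical `ask_model` function as a stand-in for a real model call; the seeded random choice mimics how token sampling can land on either option in a fresh session.

```python
import random

def ask_model(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for a fresh chat session with any LLM.
    Real APIs sample tokens; here a seeded RNG mimics the fact that
    the identical prompt can land on either option."""
    rng = random.Random(seed)
    choice = rng.choice(["Option A", "Option B"])
    # Whatever side sampling lands on, the justification sounds confident.
    return f"I recommend {choice}: it clearly offers the stronger trade-offs."

prompt = "Help me choose between Option A and Option B."

# Two "fresh chats" with the identical prompt.
first = ask_model(prompt, seed=1)
second = ask_model(prompt, seed=2)

print(first)
print(second)  # may argue, just as confidently, for the opposite option
```

Swap `ask_model` for a real API call and run it a few times: on a genuine 50/50 question, the recommendation flips between sessions while the tone never wavers.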

Why This Matters

Large language models are confident pattern-completers, not calibrated decision-makers. They're trained on persuasive text. When you ask for a recommendation, they complete the pattern of "someone giving confident advice." They're not weighing evidence against an internal decision framework. They're generating plausible-sounding justification for whichever direction the sampling landed on.

This isn't a bug to be fixed. It's a fundamental characteristic of how these systems work.

Where This Gets Dangerous

Consider insurance claims adjudication. A borderline claim lands on an AI-assisted workflow. The model confidently recommends "deny" with three supporting reasons. The human adjudicator, already handling dozens of cases, anchors on that recommendation.

But run that same claim through the system tomorrow? It might confidently recommend "approve" with equally compelling reasoning.

The human thinks they're getting decision support. They're actually getting a coin flip wrapped in persuasive language.

This problem extends anywhere AI touches consequential decisions:

  • Loan approvals
  • Medical triage recommendations
  • HR screening
  • Risk assessments
  • Legal case evaluation

The Fix: Reframe the Question

The solution isn't better models. It's better prompting patterns.

Don't ask: "Should I approve or deny this claim?"

Ask: "What factors support approval? What factors support denial? What information is missing or ambiguous?"

This shifts the AI from its weakness (calibrated judgment) to its strength (structured analysis). You get consistent, useful output regardless of sampling variance. The human retains decision authority with better-organised information.
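As a concrete sketch of the reframing, here is one way to wrap a borderline claim in a structured-analysis prompt. The function name and wording are illustrative, not a prescribed template:

```python
def structured_analysis_prompt(claim_summary: str) -> str:
    """Reframe a yes/no decision as structured analysis.
    Instead of asking the model to decide, ask it to organise the
    evidence on both sides and surface what is missing."""
    return (
        f"Claim under review:\n{claim_summary}\n\n"
        "Do NOT recommend approval or denial. Instead, list:\n"
        "1. Factors that support approval.\n"
        "2. Factors that support denial.\n"
        "3. Information that is missing or ambiguous.\n"
    )

print(structured_analysis_prompt(
    "Water damage claim; policy excludes flooding; cause of ingress unclear."
))
```

Because the model is completing a "structured analysis" pattern rather than a "confident advice" pattern, the output stays broadly consistent across runs, and the adjudicator still makes the call.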

A Simple Framework

When using AI for any decision with real stakes:

  1. Never ask for a recommendation on genuinely ambiguous choices
  2. Request structured pros/cons instead of conclusions
  3. Ask what's uncertain: force the model to surface ambiguity rather than paper over it
  4. Run critical prompts multiple times: if you get different answers, that's your signal the decision is genuinely uncertain
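Step 4 can be automated. The helper below runs the same prompt several times and reports the agreement rate; `ask` is any callable returning the model's picked option, and the cycling stub shown is only a stand-in for a real model call.

```python
from collections import Counter
from itertools import cycle

def consistency_check(ask, prompt: str, runs: int = 5):
    """Run the same prompt `runs` times and measure agreement.
    A low agreement rate is the signal that the decision is
    genuinely uncertain and should not be delegated to the model."""
    answers = [ask(prompt) for _ in range(runs)]
    top, count = Counter(answers).most_common(1)[0]
    return top, count / runs

# Hypothetical stand-in for a model whose answer varies across runs.
flip = cycle(["approve", "deny", "approve", "deny", "deny"])
top, agreement = consistency_check(lambda p: next(flip), "Approve or deny?")
print(top, agreement)  # deny 0.6
```

An agreement of 0.6 on a binary question is barely better than a coin flip: exactly the case where the recommendation should be discarded and the decision escalated to a human.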

What This Means for AI Adoption

We want AI to make decisions for us. It's cognitively easier to receive a recommendation than to weigh trade-offs ourselves. But current AI systems aren't decision-makers. They're decision-justifiers. They'll confidently rationalise whatever direction they happen to land on.

The organisations that understand this will use AI to enhance human judgment. The ones that don't will automate confident inconsistency at scale.