Beyond the Illusion of Thinking

Why Large Language Models Are Still Game-Changing When Experts Hold the Reins
Apple's recent white paper, The Illusion of Thinking (June 2025), offers a timely reminder: Large Language/Reasoning Models (LLMs/LRMs) are pattern-matching engines, not miniature brains. By probing LRMs with carefully controlled puzzle environments, the authors show three clear regimes of behaviour:
- Low complexity: vanilla LLMs match or beat their “thinking-trace” siblings while using fewer tokens.
- Medium complexity: LRMs pull ahead, but only because they're allowed to ruminate at length.
- High complexity: everyone falls over; accuracy collapses to zero, and models paradoxically start “thinking” less as tasks get harder.
They even demonstrate an “over-thinking” waste cycle on easy tasks and brittle failures on harder ones, plus an inability to follow a perfectly spelled-out algorithm for the Tower of Hanoi.
What the paper doesn't mean
It is tempting to read Apple's results as a death knell for practical AI. That would be a mistake. Take legal practice as an example: speed, recall and first-draft quality can matter just as much as philosophical “thinking”.
- Statistical text prediction is exactly what lawyers need in many cases. If a model can draft a solid skeleton argument or summarise a 200-page contract in seconds, the fact that it relies on probability rather than reasoning is irrelevant, provided a qualified lawyer reviews the output; automation can assist with that review as well.
- Humans remain the circuit-breakers. The paper's own findings show LRMs cannot guarantee correctness on long, compositional tasks. The remedy is not to scrap LLMs, but to embed them inside workflows where experts validate, annotate and, where necessary, override.
How Legavee turns limitation into leverage
At Legavee we want to build exactly those workflows for UK-based firms:
- Matter - your personal legal assistant
- Instant document analysis with key obligations highlighted
- Smart drafting that inserts accurate legislation references
- Case-management dashboards that keep every deadline visible
- Human-in-the-loop by design - every AI suggestion surfaces the source passages it draws on (a minimal sketch of the idea follows this list). “AI you can trust” isn't a slogan; it's a workflow.
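To make that last point concrete, here is a minimal sketch in Python, not Legavee's actual implementation; every name below is hypothetical. The idea is simply that a suggestion object carries its own provenance, so nothing reaches a solicitor without a checkable source:

```python
from dataclasses import dataclass

# Hypothetical sketch: no AI suggestion reaches sign-off without the passages it
# was drawn from attached to it.

@dataclass
class SourcePassage:
    document_id: str   # the contract, statute or authority the passage comes from
    text: str          # the quoted passage itself
    location: str      # clause number, page or paragraph reference

@dataclass
class AISuggestion:
    draft_text: str                 # the model's proposed wording
    sources: list[SourcePassage]    # provenance displayed alongside the draft
    accepted_by: str | None = None  # set only after human review

    def approve(self, reviewer: str) -> None:
        """Refuse sign-off for anything that cannot show its sources."""
        if not self.sources:
            raise ValueError("suggestion has no cited sources; a lawyer must draft this manually")
        self.accepted_by = reviewer
```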
By handing the mundane eighty per cent of a matter to machines, Legavee lets solicitors focus on advocacy and strategy—“software that lets lawyers practise law again”.
Reconciling Apple's findings with day-to-day legal AI
| Apple's observation | Practical implication |
|---|---|
| Collapse at high compositional depth: LRMs fail on long multi-step puzzles | Break drafting into smaller, verified chunks; apply rule-based post-processing before sign-off (sketched below). |
| Over-thinking on easy tasks | Use concise prompting and token limits; favour retrieval-augmented generation over blind chain-of-thought prompting. |
| Token efficiency matters | Default to lighter models for routine clauses; escalate to larger LRMs only when complexity and deadlines justify the cost (sketched below). |
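To make the first row concrete, here is a hedged sketch: draft_clause() stands in for whatever LLM client a firm uses, and the specific checks are illustrative rather than Legavee's actual pipeline. Drafting is split into clause-sized chunks, and each chunk is run through deterministic checks before a solicitor signs off:

```python
import re

# Hedged sketch of "smaller verified chunks": each clause is drafted separately and
# passed through cheap, deterministic checks before a solicitor ever reviews it.

PLACEHOLDER = re.compile(r"\[(TODO|INSERT|XX+)\]", re.IGNORECASE)

def draft_clause(instruction: str) -> str:
    raise NotImplementedError("wire up your preferred LLM client here")  # hypothetical stub

def rule_checks(clause_text: str, required_terms: list[str]) -> list[str]:
    """Rule-based post-processing: auditable checks that do not trust the model."""
    issues = []
    if PLACEHOLDER.search(clause_text):
        issues.append("unfilled placeholder left in the draft")
    for term in required_terms:
        if term.lower() not in clause_text.lower():
            issues.append(f"required term missing: {term!r}")
    if len(clause_text.split()) > 400:
        issues.append("clause too long to review as a single unit")
    return issues

def draft_document(clause_plan: dict[str, list[str]]) -> dict[str, str]:
    """Draft clause by clause; anything failing a check is flagged for the lawyer."""
    drafts = {}
    for clause_name, required_terms in clause_plan.items():
        text = draft_clause(f"Draft the {clause_name} clause.")
        issues = rule_checks(text, required_terms)
        drafts[clause_name] = text if not issues else "[NEEDS REVIEW: " + "; ".join(issues) + "]\n" + text
    return drafts
```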
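The second and third rows point to the same discipline of matching model effort to the task. A simple router (again a sketch; the model names, token budgets and complexity heuristic are placeholders, not recommendations) keeps routine clauses on a lighter model with tight token limits and escalates only when the matter justifies it:

```python
# Hedged sketch of escalation: routine clause work stays on a cheap, fast model with a
# tight token budget; only genuinely complex matters go to a larger reasoning model.

LIGHT_MODEL = "small-general-model"     # hypothetical identifier
HEAVY_MODEL = "large-reasoning-model"   # hypothetical identifier

def estimate_complexity(task: str, num_documents: int) -> int:
    """Rough heuristic: longer instructions and more source documents mean more complexity."""
    return len(task.split()) // 50 + num_documents

def choose_model(task: str, num_documents: int, deadline_hours: float) -> dict:
    complexity = estimate_complexity(task, num_documents)
    if complexity <= 3:
        # Routine work: light model, concise prompt, no long "thinking" traces.
        return {"model": LIGHT_MODEL, "max_tokens": 800}
    if deadline_hours < 1:
        # Urgent but complex: stay fast, and flag the output for closer human review.
        return {"model": LIGHT_MODEL, "max_tokens": 800, "flag_for_review": True}
    return {"model": HEAVY_MODEL, "max_tokens": 4000}
```

Neither sketch does any “thinking”; both exist so that the expensive reasoning, and the final judgment, stay with the lawyer.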
A realistic manifesto
- LLMs don't think - but they produce: treat outputs as drafts, not verdicts.
- Experts add the judgment: every automated step must surface provenance for a lawyer's review.
- System design beats scale: smaller, well-scaffolded models under tight governance out-perform sprawling “reasoners” in real-world reliability.
Apple's researchers have done the profession a service by quantifying where the magic ends. At Legavee we take the next step: turning that knowledge into robust, auditable tooling that frees lawyers from the drudgery without ever outsourcing the thinking.