“It found that standard AI models outperformed LRMs in low-complexity tasks, while both types of model suffered “complete collapse” with high-complexity tasks.”

– Dan Milmo 

A study by Apple researchers suggests there may be an upper threshold on the abilities of next-generation AI models known as large reasoning models (LRMs). The Guardian says the advanced reasoning models in the study included OpenAI’s o3, Google’s Gemini Thinking, Anthropic’s Claude 3.7 Sonnet-Thinking and DeepSeek-R1.

An analyst contacted by The Guardian, Andrew Rogoyski of the Institute for People-Centred AI at the University of Surrey, said the study points to a potential “cul-de-sac” in AI development. The broader objective is for models to generalize from their outcomes – to apply a solution found in one situation to other, similar situations. The Apple study, said Rogoyski, suggests that when it comes to complex problems, the LRMs can “lose the plot.”

The Apple study is freely available online.

Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds | THE GUARDIAN | June 9, 2025 | by Dan Milmo
