Apple researchers question AI’s reasoning ability in mathematics

New Delhi, Oct 12: A team of Apple researchers has questioned the formal reasoning capabilities of large language models (LLMs), particularly in mathematics.

They found that LLMs exhibit noticeable variance when responding to different instantiations of the same question.

Existing literature suggests that the reasoning process in LLMs is probabilistic pattern-matching rather than formal reasoning.

Although LLMs can match more abstract reasoning patterns, they fall short of true logical reasoning. Small changes in input tokens can drastically alter model outputs, indicating a strong token bias and suggesting that these models are highly sensitive and fragile.

“Additionally, in tasks requiring the correct selection of multiple tokens, the probability of arriving at an accurate answer decreases exponentially with the number of tokens or steps involved, underscoring their inherent unreliability in complex reasoning scenarios,” said Apple researchers in their paper titled “GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.”
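
The exponential decay described in that quote can be illustrated with a small sketch. It rests on a simplifying assumption that is not stated in the article: every step is correct independently, with the same probability. Under that assumption, a chain of n steps is fully correct with probability p to the power n. The function name and the 0.95 figure are hypothetical and chosen only for illustration.

```python
def chain_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes (for illustration only) that steps succeed independently
    and with the same probability.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return per_step_accuracy ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 95 percent.
    for steps in (1, 5, 10, 20):
        print(steps, round(chain_success_probability(0.95, steps), 3))
```

With this hypothetical 95 percent per-step accuracy, the chance of a fully correct answer falls to roughly 0.60 after 10 steps and roughly 0.36 after 20, which is the kind of compounding the researchers point to.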

The ‘GSM8K’ benchmark is widely used to assess the mathematical reasoning of models on grade-school-level questions.

While the performance of LLMs on GSM8K has significantly improved in recent years, it remains unclear whether their mathematical reasoning capabilities have genuinely advanced, raising questions about the reliability of the reported metrics.

To address these concerns, the researchers conducted a large-scale study on several state-of-the-art open and closed models.

“To overcome the limitations of existing evaluations, we introduce GSM-Symbolic, an improved benchmark created from symbolic templates that allow for the generation of a diverse set of questions,” the authors wrote.

GSM-Symbolic enables more controllable evaluations, providing key insights and more reliable metrics for measuring the reasoning capabilities of models.
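
To make the idea of a symbolic template concrete, the following is a minimal sketch of how one question pattern might yield many instantiations with known answers. It is not the authors' code: the template wording, the names, the value ranges and the function names are all hypothetical.

```python
import random

# Hypothetical template: placeholders stand in for names and numbers.
TEMPLATE = (
    "{name} picks {x} apples on Monday and {y} apples on Tuesday. "
    "{name} then gives away {z} apples. How many apples does {name} have left?"
)
NAMES = ["Asha", "Ben", "Chen", "Dana"]  # illustrative names only


def instantiate(rng: random.Random) -> tuple[str, int]:
    """Fill the template with sampled values and compute the ground-truth answer."""
    name = rng.choice(NAMES)
    x = rng.randint(5, 50)
    y = rng.randint(5, 50)
    z = rng.randint(1, x + y)  # constraint keeps the answer non-negative
    question = TEMPLATE.format(name=name, x=x, y=y, z=z)
    return question, x + y - z


def generate_variants(n: int, seed: int = 0) -> list[tuple[str, int]]:
    """Generate n question/answer pairs; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [instantiate(rng) for _ in range(n)]


if __name__ == "__main__":
    for question, answer in generate_variants(3):
        print(question, "->", answer)
```

Because every variant shares the same underlying logic, differences in a model's accuracy across variants reflect sensitivity to surface details such as names and numbers, not a change in difficulty.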

“Our findings reveal that LLMs exhibit noticeable variance when responding to different instantiations of the same question,” the researchers said, adding that overall, “our work provides a more nuanced understanding of LLMs’ capabilities and limitations in mathematical reasoning.”

