OpenAI’s o3 model is acing AI reasoning tests, but it’s still not AGI

The race for artificial general intelligence (AGI) continues as OpenAI’s latest o3 model achieves remarkable scores on a key reasoning test, though experts maintain it falls short of true human-level intelligence.

Breaking development: OpenAI’s new o3 model has achieved a breakthrough score of 75.7% on the Abstraction and Reasoning Corpus (ARC) Challenge, a test designed to evaluate AI systems’ pattern recognition and reasoning capabilities.

  • The model demonstrated unprecedented task adaptation abilities not previously seen in GPT-family models
  • The official score was achieved within the competition’s computing cost limit of $20 per puzzle task
  • An unofficial score of 87.5% was reached using significantly more computing power, surpassing the typical human score of 84%

Technical details and constraints: The ARC Challenge tests AI systems’ ability to identify patterns in colored grid puzzles while operating within specific computational limitations.

  • The “semi-private” test, used for public rankings, allows computing costs up to $10,000 total
  • A more stringent “private” test, used for determining grand prize winners, limits computing costs to 10 cents per task
  • o3’s unofficial high score required 172 times more computing power than its official attempt, with costs reaching thousands of dollars per task
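For readers unfamiliar with the benchmark's format: an ARC task presents a few input/output grid pairs that demonstrate a hidden transformation rule, and the solver must infer the rule and apply it to a held-out test grid. The sketch below is purely illustrative, with an invented rule and invented grids, not drawn from the actual ARC dataset:

```python
# Illustrative ARC-style task. Each cell holds a color index (0 = background).
# The hidden rule in this invented example is "mirror the grid left-to-right".

def solve(grid):
    """Apply the hypothesized rule: reverse each row."""
    return [row[::-1] for row in grid]

# A training pair demonstrating the rule (input grid, expected output grid).
train_pairs = [
    ([[1, 0, 0],
      [2, 2, 0]],
     [[0, 0, 1],
      [0, 2, 2]]),
]

# A solver checks that its hypothesized rule explains every training pair,
# then applies the same rule to the test input.
assert all(solve(inp) == out for inp, out in train_pairs)

test_input = [[0, 3, 3],
              [0, 0, 3]]
print(solve(test_input))  # [[3, 3, 0], [3, 0, 0]]
```

Real ARC tasks use far more varied rules (recoloring, symmetry completion, object counting), which is why brute-force pattern matching has historically fared poorly on the benchmark.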

Expert perspectives: Leading AI researchers and competition organizers maintain that while impressive, this achievement does not constitute AGI.

  • François Chollet, ARC Challenge creator, describes it as an important milestone but not AGI
  • Melanie Mitchell of the Santa Fe Institute argues that solving tasks through computational brute force defeats the challenge’s purpose
  • Thomas Dietterich from Oregon State University notes that commercial AI systems still lack crucial components of human cognition, including episodic memory and meta-cognition

Industry implications: The achievement comes during a period of perceived slowdown in AI advancement compared to the rapid developments of 2023.

  • The results suggest AI models could soon legitimately beat the competition benchmark
  • Multiple submissions have already scored above 81% on the private evaluation test set
  • Competition organizers are planning a more challenging benchmark test for 2025

Looking ahead: While o3’s performance represents significant progress in AI capabilities, key questions remain about the model’s methodology and true understanding of the tasks it completes.

  • Researchers await open-source replication to fully evaluate the achievement’s significance
  • The ARC Prize 2025 challenge continues until someone achieves the grand prize with an open-source solution
  • The gap between computational problem-solving and true human-like reasoning remains a central challenge in AI development