Aug 19, 2025, 12:00 AM

OpenAI struggles with ARC AGI 2 test despite high ARC AGI 1 scores

Highlights

Francois Chollet developed the Abstraction and Reasoning Corpus, known as ARC AGI, in 2019 to test AI reasoning capabilities.
OpenAI's o3 model scored about 85% on ARC AGI 1 but only 3% on ARC AGI 2, highlighting its difficulty with more complex tasks.
The gap between the two scores points to both the milestones reached and the challenges that remain in the pursuit of true AGI.

Story

In 2019, Francois Chollet introduced the Abstraction and Reasoning Corpus for Artificial General Intelligence, known as ARC AGI, to assess an AI system's reasoning ability rather than its memory recall. Several iterations have followed, including the recently released ARC AGI 2 and the upcoming ARC AGI 3. ARC AGI 3, currently in testing, presents cryptic games with no instructions, leaving participants to intuit what each game expects.

OpenAI's o3 model achieved impressive results on ARC AGI 1, scoring about 85%, but struggled with ARC AGI 2, scoring only 3%. This contrast raises questions about how well current AI handles more complex reasoning tasks. Chollet emphasizes the distinction between static, memorized skills and fluid general intelligence, which requires adapting to new situations and solving unfamiliar problems. The ARC tests serve as benchmarks for progress toward true AGI, offering insight into how far AI has advanced in recent years.

Chollet's work has sparked critical discussion about how intelligence should be defined in AI. While some argue that true AI mimics human cognitive processes, others, including Chollet, advocate a definition focused on adaptability and problem-solving. The ARC AGI results could carry significant implications, as they may indicate how close we are to the singularity, a pivotal moment at which AI surpasses human intelligence. Understanding these different measures of intelligence is vital for further development in the field and can steer innovation toward more advanced AI technologies.

Chollet's approach also emphasizes the need for AI systems to adapt dynamically, modifying their responses based on real-time data during inference. This adaptive capability, illustrated with an analogy to road construction and navigation, underscores the goal of building machines that can create solutions tailored to unforeseen challenges. The outcomes of the ARC AGI tests will ultimately feed the ongoing dialogue about the future of AI, highlighting the steps still needed to progress toward genuine AGI.
