ARC Prize has launched the demanding ARC-AGI-2 benchmark, accompanied by the announcement of its 2025 competition with $1 million in prizes.
As AI progresses from performing narrow tasks to demonstrating general, adaptive intelligence, the ARC-AGI-2 challenges aim to uncover capability gaps and actively guide innovation.
“Good AGI benchmarks act as useful progress indicators. Better AGI benchmarks clearly discern capabilities. The best AGI benchmarks do all this and actively encourage research and guide innovation,” the ARC Prize team states.
ARC-AGI-2 sets out to achieve that “best” category.
Beyond memorisation
Since its inception in 2019, ARC Prize has served as a “North Star” for researchers striving towards AGI by creating enduring benchmarks.
Benchmarks like ARC-AGI-1 focused on measuring fluid intelligence (i.e., the ability to adapt learning to new, unseen tasks). This represented a clear departure from datasets that reward memorisation alone.
ARC Prize’s mission is also forward-thinking, aiming to accelerate timelines for scientific breakthroughs. Its benchmarks are designed not just to measure progress but to inspire new ideas.
Researchers observed a significant shift with the debut of OpenAI’s o3 in late 2024, evaluated using ARC-AGI-1. Combining deep learning-based large language models (LLMs) with reasoning synthesis engines, o3 marked a breakthrough where AI moved beyond rote memorisation.
Yet, despite this progress, systems like o3 remain inefficient and require significant human oversight during training. To push such systems towards true adaptability and efficiency, ARC Prize introduced ARC-AGI-2.
ARC-AGI-2: Closing the human-machine gap
The ARC-AGI-2 benchmark is harder for AI yet remains accessible to humans. While frontier AI reasoning systems continue to score in single-digit percentages on ARC-AGI-2, humans can solve every task in under two attempts.
So, what sets ARC-AGI apart? Its design philosophy favours tasks that are “relatively easy for humans, yet hard, or impossible, for AI.”
The benchmark consists of datasets with varying visibility and the following characteristics:
- Symbolic interpretation: AI struggles to assign semantic significance to symbols, instead focusing on shallow comparisons like symmetry checks.
- Compositional reasoning: AI falters when it needs to apply multiple interacting rules simultaneously.
- Contextual rule application: Systems fail to apply rules differently based on complex contexts, often fixating on surface-level patterns.
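To make the compositional-reasoning point concrete, here is a purely illustrative, hypothetical task in the ARC style (not an actual ARC-AGI-2 task): grids of colour codes where the hidden transformation composes two simple rules, and a solver must infer the composition from example pairs rather than either rule alone.

```python
# Toy ARC-style task (illustrative only, not from the benchmark).
# Grids are lists of lists of colour codes 0-9. The hidden transformation
# composes two rules: recolour 1 -> 2, then mirror left-to-right.

def recolour(grid, src, dst):
    """Rule 1: replace one colour with another."""
    return [[dst if cell == src else cell for cell in row] for row in grid]

def mirror(grid):
    """Rule 2: flip the grid left-to-right."""
    return [list(reversed(row)) for row in grid]

def transform(grid):
    """The composed (hidden) rule a solver must discover from examples."""
    return mirror(recolour(grid, src=1, dst=2))

example_input = [
    [0, 1, 0],
    [1, 0, 0],
]
print(transform(example_input))  # [[0, 2, 0], [0, 0, 2]]
```

A system that only pattern-matches on one rule (recolouring alone, or mirroring alone) reproduces neither example output, which is the failure mode the benchmark is designed to expose.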
Most current benchmarks focus on superhuman capabilities, testing advanced, specialised skills at scales unattainable for most individuals.
ARC-AGI flips the script and highlights what AI can’t yet do; specifically, the adaptability that defines human intelligence. When the gap between tasks that are easy for humans but difficult for AI eventually reaches zero, AGI can be declared achieved.
However, achieving AGI isn’t limited to the ability to solve tasks; efficiency – the cost and resources required to find solutions – is emerging as a crucial defining factor.
The role of efficiency
Measuring performance by cost per task is essential to gauge intelligence not just as problem-solving capability but as the ability to solve problems efficiently.
Real-world examples are already showing efficiency gaps between humans and frontier AI systems:
- Human panel efficiency: Passes ARC-AGI-2 tasks with 100% accuracy at $17/task.
- OpenAI o3: Early estimates suggest a 4% success rate at an eye-watering $200 per task.
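The gap is even starker when the quoted figures are normalised by success rate. A minimal sketch of that arithmetic (the helper name is mine; the dollar figures are the estimates quoted above):

```python
# Illustrative arithmetic: cost per *solved* task, from the figures above.
# Cost per solved task = cost per attempt / success rate.

def cost_per_solved(cost_per_task, success_rate):
    return cost_per_task / success_rate

human = cost_per_solved(17.0, 1.00)   # $17/task at 100% accuracy
o3 = cost_per_solved(200.0, 0.04)     # $200/task at an estimated 4% success

print(f"human: ${human:.0f} per solved task")  # human: $17 per solved task
print(f"o3: ${o3:.0f} per solved task")        # o3: $5000 per solved task
```

On these estimates, each task o3 actually solves costs on the order of $5,000, versus $17 for the human panel.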
These metrics underline disparities in adaptability and resource consumption between humans and AI. ARC Prize has committed to reporting efficiency alongside scores on future leaderboards.
The focus on efficiency prevents brute-force solutions from being considered “true intelligence.”
Intelligence, according to ARC Prize, encompasses finding solutions with minimal resources – a quality distinctly human but still elusive for AI.
ARC Prize 2025
ARC Prize 2025 launches on Kaggle this week, promising $1 million in total prizes and a live leaderboard for open-source breakthroughs. The contest aims to drive progress towards systems that can efficiently tackle ARC-AGI-2 challenges.
Among the prize categories, which have increased from 2024 totals, are:
- Grand prize: $700,000 for reaching 85% success within Kaggle efficiency limits.
- Top score prize: $75,000 for the highest-scoring submission.
- Paper prize: $50,000 for transformative ideas contributing to solving ARC-AGI tasks.
- Additional prizes: $175,000, with details to be announced during the competition.
These incentives encourage fair and meaningful progress while fostering collaboration among researchers, labs, and independent teams.
Last year, ARC Prize 2024 drew 1,500 competing teams, resulting in 40 papers of acclaimed industry impact. This year’s higher stakes aim to nurture even greater success.
ARC Prize believes progress hinges on novel ideas rather than merely scaling existing systems. The next breakthrough in efficient general systems may not originate from today’s tech giants but from bold, creative researchers embracing complexity and curious experimentation.
(Image credit: ARC Prize)
See also: DeepSeek V3-0324 tops non-reasoning AI models in open-source first

