LG AI Research has unveiled EXAONE Deep, a reasoning model that excels in complex problem-solving across maths, science, and coding.
The company highlighted the global challenge of developing advanced reasoning models, noting that currently only a handful of organisations with foundational models are actively pursuing this complex area. EXAONE Deep aims to compete directly with these leading models, showcasing a competitive level of reasoning ability.
LG AI Research has focused its efforts on dramatically improving EXAONE Deep's reasoning capabilities in core domains. The model also demonstrates a strong ability to understand and apply knowledge across a broader range of subjects.
The performance benchmarks released by LG AI Research are impressive:
- Maths: The EXAONE Deep 32B model outperformed a competing model, despite being only 5% of its size, in a demanding mathematics benchmark. Furthermore, the 7.8B and 2.4B versions achieved first place in all major mathematics benchmarks for their respective model sizes.
- Science and coding: In these areas, the EXAONE Deep models (7.8B and 2.4B) have secured the top spot across all major benchmarks.
- MMLU (Massive Multitask Language Understanding): The 32B model achieved a score of 83.0 on the MMLU benchmark, which LG AI Research claims is the best performance among domestic Korean models.
The capabilities of the EXAONE Deep 32B model have already garnered international recognition.
Shortly after its release, it was included in the 'Notable AI Models' list by US-based non-profit research organisation Epoch AI. This listing places EXAONE Deep alongside its predecessor, EXAONE 3.5, making LG the only Korean entity with models featured on this prestigious list in the past two years.
Maths prowess
EXAONE Deep has demonstrated exceptional mathematical reasoning skills across its various model sizes (32B, 7.8B, and 2.4B). In assessments based on the 2025 academic year's mathematics curriculum, all three models outperformed global reasoning models of comparable size.
The 32B model achieved a score of 94.5 in a general mathematics competency test and 90.0 in the American Invitational Mathematics Examination (AIME) 2024, a qualifying exam for the US Mathematical Olympiad.
In the AIME 2025, the 32B model matched the performance of DeepSeek-R1, a significantly larger 671B model. This result showcases EXAONE Deep's efficient learning and strong logical reasoning abilities, particularly when tackling challenging mathematical problems.
The smaller 7.8B and 2.4B models also achieved top rankings in major benchmarks for lightweight and on-device models, respectively. The 7.8B model scored 94.8 on the MATH-500 benchmark and 59.6 on AIME 2025, while the 2.4B model achieved scores of 92.3 and 47.9 in the same evaluations.
Science and coding excellence
EXAONE Deep has also showcased remarkable capabilities in expert-level science reasoning and software coding.
The 32B model scored 66.1 on the GPQA Diamond test, which assesses problem-solving skills in doctoral-level physics, chemistry, and biology. In the LiveCodeBench evaluation, which measures coding proficiency, the model achieved a score of 59.5, indicating its potential for high-level applications in these professional domains.
The 7.8B and 2.4B models continued this trend of strong performance, both securing first place in the GPQA Diamond and LiveCodeBench benchmarks within their respective size categories. This achievement builds upon the success of the EXAONE 3.5 2.4B model, which previously topped Hugging Face's LLM Leaderboard in the edge division.
Enhanced general knowledge
Beyond its specialised reasoning capabilities, EXAONE Deep has also demonstrated improved performance in general knowledge understanding.
The 32B model achieved an impressive score of 83.0 on the MMLU benchmark, positioning it as the top-performing domestic model in this comprehensive evaluation. This indicates that EXAONE Deep's reasoning enhancements extend beyond specific domains and contribute to a broader understanding of various subjects.
LG AI Research believes that EXAONE Deep's reasoning advancements represent a leap towards a future where AI can tackle increasingly complex problems and contribute to enriching and simplifying human lives through continuous research and innovation.
See also: Baidu undercuts rival AI models with ERNIE 4.5 and ERNIE X1
