Google Unveils Gemini 2.5 Pro, Shattering Records on Humanity’s Last Exam

Google has released a groundbreaking AI model called Gemini 2.5 Pro that has scored 18.8% on Humanity’s Last Exam (HLE) without using web search or any other tool.

HLE is a rigorous benchmark, designed by subject matter experts and top academics from around the world to test in-depth knowledge on various subjects.

Previously, OpenAI’s o3-mini-high achieved 14% on the same benchmark without using any tools.

Gemini 2.5 Pro benchmark results

Image Credit: Google

Gemini 2.5 Pro is a thinking model, meaning it’s a reasoning model built on top of a large base LLM, using reinforcement learning and chain-of-thought prompting.

Before the Gemini 2.5 Pro model, Google had released the smaller Gemini 2.0 Flash Thinking model.

Google says the Gemini 2.5 Pro model can “analyze information, draw logical conclusions, incorporate context and nuance, and make informed decisions.”

Gemini 2.5 Pro was being tested on LMArena under the codename “nebula”.

Now, Gemini 2.5 Pro has taken the top spot on the LMArena leaderboard with the highest score of 1,443, beating Grok 3 and GPT-4.5.

As for other benchmarks, Google says Gemini 2.5 Pro performs exceptionally well in coding, math, and science.

In GPQA Diamond, Gemini 2.5 Pro scored 84%; in AIME 2025, the model achieved 86.7%.

Even in the SWE-bench Verified benchmark, which tests the ability to solve real-world software issues, Gemini 2.5 Pro scored 63.8%, second only to Claude 3.7 Sonnet Extended Thinking, which scored 70.3%.

Google says the new Gemini 2.5 Pro model is capable of advanced coding and reasoning.

It’s rolling out to Gemini Advanced users.

Those who want to try the Gemini 2.5 Pro model for free can head to Google AI Studio and select the “Gemini 2.5 Pro Experimental 03-25” model from the drop-down menu.
