While we have been waiting for Google, Anthropic, and DeepSeek to challenge OpenAI, Elon Musk’s xAI has quickly emerged as its closest competitor this week.
In such a short span, xAI has developed the Grok 3 model, showcasing impressive benchmark results.
So, we did a deep dive and tested the Grok 3 base and reasoning models on a range of complex prompts, and what we found was truly surprising.
## Reasoning Queries on Grok 3
I started testing the Grok 3 reasoning model with the popular Strawberry question, and it correctly responded that there are three r’s in the word Strawberry after thinking for 15 seconds.
I threw another word, “Lollapalooza”, at it and asked it to count the number of l’s, and it replied with 4, which is correct.
Next, I asked Grok 3 which number is larger: 9.11 or 9.9.
Again, Grok 3 thought for 8 seconds and came up with the correct response.
In fact, the Grok 3 model devised multiple mathematical methods to verify the final result, which was impressive.
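If you want to double-check these answers yourself, a few lines of Python are enough. This is only a minimal sketch of my own for verification, not anything Grok 3 produced, and the helper names `count_letter` and `larger` are made up for illustration.

```python
from decimal import Decimal


def count_letter(word, letter):
    """Count how often a letter appears in a word, ignoring case."""
    return word.lower().count(letter.lower())


def larger(a, b):
    """Return whichever decimal string represents the bigger number."""
    # Decimal compares the numeric values exactly, so "9.9" is read as 9.90,
    # not as a version-style "9 point 9" that would lose to "9 point 11".
    return a if Decimal(a) > Decimal(b) else b


print(count_letter("Strawberry", "r"))    # 3
print(count_letter("Lollapalooza", "l"))  # 4
print(larger("9.11", "9.9"))              # 9.9
```

Models tend to stumble on these prompts because they see tokens rather than individual characters or digits, which is why they remain a popular quick check.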
After that, I posed this slightly tweaked puzzle to Grok 3 to mislead it.
In my earlier comparison between ChatGPT o1 and DeepSeek R1, both models got the answer wrong and said the surgeon was the boy’s mother.
Even the recent OpenAI o3-mini-high gets the answer wrong, totally ignoring the fact that it’s clearly stated in the prompt that the surgeon is the boy’s father.
Finally, the Grok 3 reasoning model thought for 35 seconds and said the surgeon is the boy’s father, which is correct.
What I love about Grok 3 is that it reasoned: “It’s possible this is a poorly phrased riddle missing a clever twist, or perhaps it’s testing whether we overthink it.
But based solely on the text, the relationship is explicit.”
It also thought out loud, “the mention of not operating might be context or a red herring.”
Grok 3 is the only reasoning model to get the answer right, besides Gemini 2.0 Flash.
It was not misled and didn’t try endlessly to find the twist and somehow establish a new relationship.
Finally, I posed a question from Humanity’s Last Exam (HLE), and the Grok 3 reasoning model breezed through it in just 47 seconds.
Previously, only o3-mini-high had been able to get the answer right, in 1 minute and 25 seconds.
Even DeepSeek R1 failed to find the correct answer.
I would say that, currently, Grok 3 has the best reasoning model, and it outranks OpenAI’s o3-mini-high, o1, and DeepSeek R1.
## Grok 3’s Coding Performance
To test Grok 3’s coding capability, I asked the Reasoning model to write a Python program that renders a ball bouncing inside a hexagon.
Basically, the ball should follow the rules of physics and bounce off the walls naturally.
Grok 3 thought for over a minute and generated the Python code.
I ran the code on my PC, and the ball failed to bounce off.
It simply skipped outside the hexagon.
It was pretty surprising given that Grok 3’s reasoning model achieved a great score on the LiveCodeBench benchmark.
So I asked the non-thinking base Grok 3 model to generate the same Python code.
Surprisingly, it worked on the very first attempt, and the ball bounced off with great accuracy.
The ball followed a natural path, and the code simulated its motion perfectly.
Perhaps the Reasoning model overanalyzed the problem, leading to a bug in the collision detection function.
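For context, here is a minimal sketch of the kind of collision handling this prompt calls for. To be clear, this is not the code either Grok 3 model generated. It is a headless illustration of my own (no rendering), assuming a regular hexagon centered at the origin, simple fixed-step integration, and made-up names and constants such as `step`, `restitution`, and `simulate`.

```python
import math


def hexagon_vertices(radius):
    """Vertices of a regular hexagon centered at the origin, counter-clockwise."""
    return [(radius * math.cos(i * math.pi / 3), radius * math.sin(i * math.pi / 3))
            for i in range(6)]


def walls(verts):
    """Return (point_on_wall, inward_unit_normal) for each edge of the polygon."""
    result = []
    for i in range(len(verts)):
        (ax, ay), (bx, by) = verts[i], verts[(i + 1) % len(verts)]
        ex, ey = bx - ax, by - ay
        length = math.hypot(ex, ey)
        # Rotating a counter-clockwise edge by +90 degrees points into the polygon.
        result.append(((ax, ay), (-ey / length, ex / length)))
    return result


def min_wall_distance(pos, wall_list):
    """Smallest signed distance from pos to any wall (positive means inside)."""
    return min((pos[0] - ax) * nx + (pos[1] - ay) * ny
               for (ax, ay), (nx, ny) in wall_list)


def step(pos, vel, wall_list, ball_radius, dt, gravity=-9.8, restitution=0.9):
    """Advance the ball by one time step and bounce it off any wall it touches."""
    vx, vy = vel[0], vel[1] + gravity * dt
    x, y = pos[0] + vx * dt, pos[1] + vy * dt
    for (ax, ay), (nx, ny) in wall_list:
        dist = (x - ax) * nx + (y - ay) * ny
        if dist < ball_radius:
            # Push the ball back so it just touches the wall from the inside.
            x += (ball_radius - dist) * nx
            y += (ball_radius - dist) * ny
            vn = vx * nx + vy * ny  # velocity component along the inward normal
            if vn < 0:  # only reflect if the ball is moving into the wall
                vx -= (1 + restitution) * vn * nx
                vy -= (1 + restitution) * vn * ny
    return (x, y), (vx, vy)


def simulate(steps=2000, dt=0.01, ball_radius=0.05):
    """Run the simulation and report the final position and closest wall approach."""
    wall_list = walls(hexagon_vertices(1.0))
    pos, vel = (0.0, 0.0), (1.5, 0.5)
    closest = float("inf")
    for _ in range(steps):
        pos, vel = step(pos, vel, wall_list, ball_radius, dt)
        closest = min(closest, min_wall_distance(pos, wall_list))
    return pos, closest
```

The two details that are easy to get wrong are the direction of the wall normals and the position correction after a hit. If either is off, the ball can slip through a wall, which is consistent with the behavior I saw from the Reasoning model, although I can't say for sure that this was the actual bug.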
I would say the base Grok 3 non-thinking model is solid for coding tasks.
However, this is just one of many tests, and you should try both the reasoning and non-thinking models on your codebase to find which one performs better.
## Grok 3’s DeepSearch AI Agent
xAI has also launched a new AI agent called “DeepSearch”, built on the Grok 3 model.
It’s similar to OpenAI’s Deep Research agent, which is built on the full o3 model, browses the web, does research, and returns a comprehensive report in 5 to 30 minutes.
Grok 3’s DeepSearch, however, takes only a few minutes.
So I asked the Grok 3 DeepSearch AI agent to research “How is AI transforming the chip design process?” It started the thinking process and accessed multiple web pages, including scientific papers from IEEE, ACM, and more.
In just over a minute, the DeepSearch AI agent returned a 1300-word report including in-line citations, tables, and key points.
While the report explained Nvidia’s RL Circuits and Intel’s FloorSet dataset for an AI-powered chip design process, it completely failed to mention Google’s AlphaChip framework for generating chip floorplans.
The final report is similar to what Perplexity’s new Deep Research tool produces.
Both tools are fast but gloss over a lot of recent advancements.
## Grok 3’s Political Bias
xAI’s owner, Elon Musk, has consistently criticized ChatGPT for being woke and having a left-leaning bias.
In April 2023, Musk announced plans to create “TruthGPT” and to develop a “maximum truth-seeking AI”.
Just before the Grok 3 launch, Musk shared a response from the Grok 3 model calling a media outlet “garbage”.
Many thought the Grok 3 model would be politically conservative and would lean right.
However, in my testing, Grok 3 is as politically neutral as possible.
Even after I pushed Grok 3 to take a stance on a given topic, it explained the differences and left the decision to the user’s preference and judgment.
Beyond the realm of politics, even on social issues such as transgender rights, DEI programs, immigration, and affirmative action (topics on which Musk has been openly critical), Grok 3 maintains its neutral stance.
What is interesting is that Grok 3 doesn’t shy away from joking about its owner, Elon Musk, and the current US President, Donald Trump.
## Grok 3’s Safety Guardrails
When I tested Grok 2 last year, it was largely uncensored and didn’t have any safety guardrails.
Grok 2 shockingly generated an email to scam people.
However, Grok 3 has much better safety guardrails, which is good news for AI safety.
If you prompt Grok 3 with something harmful, it responds, “I can’t help with anything intended to harm or deceive others.”
As for AI image generation, the current Grok image generator on grok.com doesn’t generate images at all.
However, on X, it still generates images of public figures and celebrities without any safety guardrails, which is concerning.
It’s powered by xAI’s in-house Aurora image generation model.
## Grok 3: Early Verdict
xAI has rolled out both the large Grok 3 base and reasoning models, and in my assessment, both are frontier AI models that come close to the full OpenAI o3 model.
OpenAI has so far released only o3-mini and o3-mini-high, besides the full o3 model that powers the Deep Research AI agent.
Based on my early testing, I can say the Grok 3 reasoning model surpasses (or at least matches) all available models, including OpenAI o3-mini and DeepSeek R1.
Of course, this verdict is based on the standard “Thinking” effort.
xAI has a “Big Brain” mode for the Grok 3 reasoning model, which uses more compute to think for a longer duration.
It will be available to SuperGrok subscribers.
Its base Grok 3 non-thinking model is also more capable than GPT-4o, Claude 3.5 Sonnet, and Gemini 2.0 Pro, making it a solid alternative to ChatGPT.
For coding tasks, Claude 3.5 Sonnet may still win, but the gap is shrinking significantly.
Musk-led xAI has done a wonderful job of developing a powerful pre-trained Grok 3 model and an inference-scaled reasoning model.
Now, we need to wait for OpenAI’s GPT-4.5 and GPT-5 models, which are set to release in the coming weeks and months.
But at this moment, xAI has risen to challenge OpenAI’s dominance in the AI space.
Besides the legal battle, the fight between Elon Musk and Sam Altman rages on.