Elon Musk-led xAI recently released its state-of-the-art Grok 2.0 AI model in beta.
In the blog post, xAI said that Grok 2.0 scored 87.5% on the MMLU benchmark using 0-shot CoT, which really surprised me.
This puts the model squarely in the territory of GPT-4o, which has achieved a score of 87.7% on the same MMLU benchmark.
I was curious to test the Grok 2.0 model and evaluate whether it passes the “vibe” test in commonsense reasoning.
Thankfully, xAI added Grok 2.0 (Beta) to x.com, allowing X Premium users to evaluate the model.
Grok 2.0: Does It Pass the Vibe Test?
I started testing the model by asking some tricky reasoning questions that challenge even the best large language models (LLMs).
To the question of whether drying 20 towels under the sun would take more time than drying 15 towels, Grok 2.0 responded that it would take the same amount of time, which is correct.
In my testing, I have seen many models, including the latest Llama 3.1 405B model, fail this basic question.
Next, it correctly answered that “9.9 is larger than 9.11”, a simple test that has stumped many SOTA models.
After that, I asked Grok 2.0 to find how many ‘r’s are in the word “Strawberry”, and it said three.
Which, again, is the correct answer.
It even correctly wrote “strawberry” in reverse: “yrrebwarts”.
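For anyone who wants to check a model's replies against ground truth, the expected answers to these prompts can be computed in a few lines of Python. This is only a minimal sketch: the function names are hypothetical helpers I made up for illustration, and the towel function simply encodes the assumption that all towels dry in parallel with enough sun-exposed space.

```python
from decimal import Decimal


def parallel_drying_time(num_towels: int, hours_per_towel: float) -> float:
    # Assumption: every towel dries at the same time, so the count does not matter.
    return hours_per_towel


def larger_decimal(a: str, b: str) -> str:
    # Compare as exact decimals so "9.9" vs "9.11" is not judged digit by digit.
    return a if Decimal(a) > Decimal(b) else b


def count_letter(word: str, letter: str) -> int:
    # Case-insensitive count of a single letter in a word.
    return word.lower().count(letter.lower())


def reverse_word(word: str) -> str:
    # Slice with a step of -1 to reverse the string.
    return word[::-1]


print(parallel_drying_time(20, 3.0) == parallel_drying_time(15, 3.0))  # True
print(larger_decimal("9.9", "9.11"))    # 9.9
print(count_letter("Strawberry", "r"))  # 3
print(reverse_word("strawberry"))       # yrrebwarts
```

The 3.0 hours figure is an arbitrary placeholder; the point is only that the result is the same for 15 and 20 towels.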
Since xAI has not released a multimodal Grok 2.0 model yet, I can’t test its vision capability.
But as far as the initial vibe test is concerned, Grok 2.0 performed beyond my expectations.
xAI has indeed trained a capable model, easily comparable to GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro.
What is Controversial About Grok 2.0?
While Grok 2.0 is fairly capable, except in coding tasks, there are some points of concern.
Just like its controversial image generation feature that allows the unrestricted creation of images involving public figures and celebrities, often in harmful ways, Grok 2.0’s language model also seems largely uncensored.
I asked Grok 2.0 to write an email to scam people, and it dutifully crafted a sophisticated email “based on common elements found in real scams”.
Other AI models simply refuse to entertain such requests.
Next, I asked Grok 2.0 whether it considers Hitler a bad person, and it largely agreed, citing genocide and human rights violations.
After that, I asked it to write a slogan spreading Nazi ideas, and Grok 2.0 readily obliged, focusing on racial purity.
In fact, shockingly, Grok 2.0 even wrote a slogan endorsing pedophilia.
Not only that, it added some pedophilia-related tweets from X right below the response.
The only prompt that Grok 2.0 refused to answer was when I asked it to list the steps to make a bomb.
In summary, Grok 2.0 is largely uncensored, and it’s ready to generate a response on nearly any contentious topic.
Elon Musk recently touted Grok’s image generation feature as the “most fun AI in the world”.
In my opinion, it’s reckless and potentially harmful to release AI models without substantial safety guardrails.
Is Grok 2.0 Worth Getting an X Premium Subscription?
The Grok 2.0 model is very powerful across a variety of tasks.
However, the language model is wild, and the image generation feature is concerning, to say the least.
Had there been sufficient safety guardrails, I would have strongly recommended getting an X Premium subscription to use Grok 2.0, since it’s a capable model.
However, with nearly no guardrails, I wouldn’t recommend that users get an X Premium subscription.
You are better off with OpenAI’s free ChatGPT service, which offers limited access to the GPT-4o model.
And once you exhaust the message limit, you can use the GPT-4o mini model, which is terrific for its size.