At Google I/O 2024, Google announced its next Gemma 2 family of models, and now, the company has finally released the lightweight models under an open-source license.
The new Gemma 2 27B model is said to be very capable, outranking several larger models like Llama 3 70B and Qwen 1.5 32B.
So to test the claims, we have come up with this comparison between Gemma 2 and Llama 3, two of the leading open-source models out there.
On that note, let's begin.
## Gemma 2 vs Llama 3: Creative Writing
Let's first check how good Gemma 2 and Llama 3 are when it comes to creative writing.
I asked both models to write a short story about the moon's relationship with the sun.
Both did a great job, but Google's Gemma 2 model knocked it out of the park with delightful prose and a beautiful story to boot.
Llama 3, on the other hand, seemed a bit dull and mechanical, almost AI-like as we have seen with OpenAI's models.
Google has always been good at text generation, as we have seen with its Gemini models.
And the same trend continues with its smaller Gemma 2 27B model as well.
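If you want to reproduce this round yourself, here is a minimal sketch using the `huggingface_hub` client. The prompt mirrors the one above, while the token setup and `max_tokens` value are my own assumptions; both checkpoints are gated on Hugging Face, so you may need to accept their licenses and pass an access token first.

```python
from huggingface_hub import InferenceClient

# The same story prompt goes to both models so the outputs are directly comparable.
PROMPT = "Write a short story about the moon's relationship with the sun."

# Public Hugging Face model IDs; both are gated, so accepting each model's
# license (and passing token=... to InferenceClient) may be required.
MODELS = ["google/gemma-2-27b-it", "meta-llama/Meta-Llama-3-70B-Instruct"]

for model_id in MODELS:
    client = InferenceClient(model=model_id)
    reply = client.chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,
    )
    print(f"--- {model_id} ---")
    print(reply.choices[0].message.content)
```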
Winner: Gemma 2
## Multilingual Test
In the next round, I tried to understand how well both models handle non-English languages.
Since Google touts that Gemma 2 is very good at multilingual understanding, I pitted it against Meta's Llama 3 model.
I asked both models to translate a paragraph written in Hindi.
And well, both Gemma 2 and Llama 3 performed exceptionally well.
I also tried another language, Bengali, and the models performed along the same lines.
At least for regional Indian languages, I would say that Gemma 2 and Llama 3 are trained well on a large corpus of data.
That said, Gemma 2 27B is nearly 2.5x smaller than Llama 3 70B (70B / 27B is roughly 2.6), which makes it even more impressive.
I am going to give this round to both the models.
Winner: Gemma 2 and Llama 3
## Gemma 2 vs Llama 3: Reasoning Test
While Gemma 2 and Llama 3 are not the most intelligent models out there, I took the liberty of running some of the commonsense reasoning tests that I usually run on much larger models.
In our earlier comparison between Llama 3 and GPT-4, I came away impressed by Meta's 70B model because it demonstrated pretty decent intelligence even at a smaller footprint.
Well, in this round, Llama 3 beats Gemma 2 by a wide margin.
Llama 3 answered two out of three questions correctly, whereas Gemma 2 struggled to get even one right.
Gemma 2 is simply not ready for solving complex reasoning questions.
Llama 3, on the other hand, has a strong reasoning foundation, most likely derived from its curated dataset.
Despite its small size, at least in comparison to trillion-parameter models like GPT-4, it showcases more than a decent level of intelligence.
Finally, using more tokens to train the model indeed results in a stronger model.
Winner: Llama 3
## Gemma 2 vs Llama 3: Follow User Instructions

In the next round, I asked Gemma 2 and Llama 3 to generate 10 sentences that end with the word "NPU".
And Llama 3 aced it with 10/10 correct responses.
In contrast, Gemma 2 generated only 7 such sentences out of 10.
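Scoring this test by hand is tedious, so here is a small, hypothetical Python checker for counting how many sentences in a model's response actually end with the target word. The sentence-splitting heuristic is a simplification of my own, not how the responses were originally scored.

```python
import re

def count_sentences_ending_with(text: str, word: str = "NPU") -> int:
    """Count sentences whose final word is `word`, ignoring trailing punctuation."""
    # A simple heuristic: split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    total = 0
    for sentence in sentences:
        words = sentence.rstrip(".!?\"'").split()
        if words and words[-1] == word:
            total += 1
    return total

# Paste a model's full response in place of this sample to score it out of 10.
sample = "Every modern laptop now ships with an NPU. Developers love the NPU."
print(count_sentences_ending_with(sample))  # -> 2
```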
For many releases now, we have been seeing that Google's models, including Gemini, don't follow user instructions well.
And the same trend continues with Gemma 2.
## Gemma 2 vs Llama 3: Find the Needle
Both Gemma 2 and Llama 3 have a context length of 8K tokens, so this test is quite an apples-to-apples comparison.
I added a huge block of text, sourced directly from the book Pride and Prejudice, containing more than 17,000 characters and 3.8K tokens.
As I always do, I placed a needle (a random statement) somewhere in the middle and asked both models to find it.
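For the curious, this is roughly how such a haystack prompt can be assembled. The needle sentence, the file path, and the 4-characters-per-token estimate below are all placeholder assumptions, not the exact values used in the test.

```python
# Placeholder needle; the actual statement used in the test was different.
NEEDLE = "The secret password for the treasure chest is 'sunflower42'."

# Placeholder path; any long public-domain text works as the haystack.
with open("pride_and_prejudice.txt", encoding="utf-8") as f:
    haystack = f.read()[:17000]  # roughly 17,000 characters

# Bury the needle near the middle and append the retrieval question.
midpoint = len(haystack) // 2
prompt = (
    haystack[:midpoint] + " " + NEEDLE + " " + haystack[midpoint:]
    + "\n\nOne statement above does not belong to the text. Find it."
)

# Rough size check using a common ~4 characters-per-token heuristic.
print(f"{len(prompt)} characters, ~{len(prompt) // 4} tokens")
```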
Well, Gemma 2 quickly found the needle and pointed out that the statement was randomly inserted.
Llama 3 also found the needle and suggested that the statement seemed out of place.
As far as long-context memory is concerned, albeit limited to 8K tokens, I think both models are quite strong in this regard.
Do note that I ran this test on HuggingChat (website) as meta.ai refused to run this prompt, most likely due to the copyrighted content.
Winner: Gemma 2 and Llama 3
## Hallucination Test
Smaller models tend to exhibit hallucination due to limited training data, often making up information when the model encounters unfamiliar topics.
So I threw in a made-up country name to check if Gemma 2 and Llama 3 hallucinate.
And to my surprise, they did not, which means both Google and Meta have grounded their models fairly well.
I threw in another (wrong) question to check the models' factuality, but again, they didn't hallucinate.
By the way, I tested Llama 3 on HuggingChat as meta.ai searches the internet to find current information on relevant topics.
## Gemma 2 vs Llama 3: Conclusion
While Google's Gemma 2 27B model didn't perform well in the reasoning tests, I still find it adequate for several other tasks.
It's very good at creative writing, supports a multitude of languages, has good memory recall, and best of all, doesn't hallucinate like earlier models.
Of course, Llama 3 is better, but it's also a significantly larger model, trained on 70 billion parameters.
I think developers would find the Gemma 2 27B model useful for many use cases.
And for on-device inference, Gemma 2 9B is also available.
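As a rough sketch, here is one common way to run the instruction-tuned 9B variant locally with 4-bit quantization, assuming you have `transformers`, `accelerate`, and `bitsandbytes` installed and access to the gated `google/gemma-2-9b-it` weights. The prompt and generation settings are illustrative choices, not a recommendation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "google/gemma-2-9b-it"  # instruction-tuned 9B variant on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    # 4-bit weights bring the memory footprint down to roughly 6 GB.
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Build a chat prompt with the model's own template, then generate.
messages = [{"role": "user", "content": "Summarize why smaller LLMs matter on-device."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```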
Besides that, I would also recommend users to check out Gemini 1.5 Flash, which is again a much smaller model and supports multimodal input as well.
Not to mention, it's extremely fast and efficient.