At Google I/O 2024, Google announced its next Gemma 2 family of models, and now, the company has finally released the lightweight models under an open-source license.
The new Gemma 2 27B model is said to be very capable, outranking several larger models like Llama 3 70B and Qwen 1.5 32B.
So to test the claims, we have come up with this comparison between Gemma 2 and Llama 3, two of the leading open-source models out there.
On that note, let's begin.
## Gemma 2 vs Llama 3: Creative Writing
Let's first check how good Gemma 2 and Llama 3 are when it comes to creative writing.
I asked both models to write a short story about the moon's relationship with the sun.
Both did a great job, but Google's Gemma 2 model knocked it out of the park with delightful prose and a beautiful story to boot.
Llama 3, on the other hand, seemed a bit dull and mechanical, almost AI-like as we have seen with OpenAI's models.
Google has always been good at text generation, as we have seen with its Gemini models.
And the same trend continues with its smaller Gemma 2 27B model as well.
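If you want to reproduce this round yourself, here is a minimal sketch using the `huggingface_hub` client. The prompt mirrors the one above, while the token setup and `max_tokens` value are my own assumptions; both checkpoints are gated on Hugging Face, so you may need to accept their licenses and pass an access token first.

```python
from huggingface_hub import InferenceClient

# The same story prompt goes to both models so the outputs are directly comparable.
PROMPT = "Write a short story about the moon's relationship with the sun."

# Public Hugging Face model IDs; both are gated, so accepting each model's
# license (and passing token=... to InferenceClient) may be required.
MODELS = ["google/gemma-2-27b-it", "meta-llama/Meta-Llama-3-70B-Instruct"]

for model_id in MODELS:
    client = InferenceClient(model=model_id)
    reply = client.chat_completion(
        messages=[{"role": "user", "content": PROMPT}],
        max_tokens=512,
    )
    print(f"--- {model_id} ---")
    print(reply.choices[0].message.content)
```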
Winner: Gemma 2
## Multilingual Test
In the next round, I tried to understand how well both models handle non-English languages.
Since Google touts that Gemma 2 is very good at multilingual understanding, I pitted it against Meta's Llama 3 model.
I asked both models to translate a paragraph written in Hindi.
And well, both Gemma 2 and Llama 3 performed exceptionally well.
I also tried another language, Bengali, and the models performed along the same lines.
At least for regional Indian languages, I would say that Gemma 2 and Llama 3 are trained well on a large corpus of data.
That said, Gemma 2 27B is nearly 2.5x smaller than Llama 3 70B (70B / 27B is roughly 2.6), which makes it even more impressive.
I am going to give this round to both the models.
Winner: Gemma 2 and Llama 3
## Gemma 2 vs Llama 3: Reasoning Test
While Gemma 2 and Llama 3 are not the most intelligent models out there, I took the liberty of running some of the commonsense reasoning tests that I usually run on much larger models.
In our earlier comparison between Llama 3 and GPT-4, I came away impressed by Meta's 70B model because it demonstrated pretty decent intelligence even at a smaller footprint.
Well, in this round, Llama 3 beats Gemma 2 by a wide margin.
Llama 3 answered two out of three questions correctly, whereas Gemma 2 struggled to get even one right.
Gemma 2 is simply not ready for solving complex reasoning questions.
Llama 3, on the other hand, has a strong reasoning foundation, most likely derived from its curated dataset.
Despite its small size, at least in comparison to trillion-parameter models like GPT-4, it showcases more than a decent level of intelligence.
Finally, using more tokens to train the model indeed results in a stronger model.
Winner: Llama 3
## Gemma 2 vs Llama 3: Follow User Instructions

In the next round, I asked Gemma 2 and Llama 3 to generate 10 sentences that end with the word "NPU".
And Llama 3 aced it with 10/10 correct responses.
In contrast, Gemma 2 generated only 7 such sentences out of 10.
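Scoring this test by hand is tedious, so here is a small, hypothetical Python checker for counting how many sentences in a model's response actually end with the target word. The sentence-splitting heuristic is a simplification of my own, not how the responses were originally scored.

```python
import re

def count_sentences_ending_with(text: str, word: str = "NPU") -> int:
    """Count sentences whose final word is `word`, ignoring trailing punctuation."""
    # A simple heuristic: split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    total = 0
    for sentence in sentences:
        words = sentence.rstrip(".!?\"'").split()
        if words and words[-1] == word:
            total += 1
    return total

# Paste a model's full response in place of this sample to score it out of 10.
sample = "Every modern laptop now ships with an NPU. Developers love the NPU."
print(count_sentences_ending_with(sample))  # -> 2
```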
For many releases now, we have been seeing that Google's models, including Gemini, don't follow user instructions well.
And the same trend continues with Gemma 2.
## Gemma 2 vs Llama 3: Find the Needle
Both Gemma 2 and Llama 3 have a context length of 8K tokens, so this test is quite an apples-to-apples comparison.
I added a huge block of text, sourced directly from the book Pride and Prejudice, containing more than 17,000 characters and 3.8K tokens.
As I always do, I placed a needle (a random statement) somewhere in the middle and asked both models to find it.
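For the curious, this is roughly how such a haystack prompt can be assembled. The needle sentence, the file path, and the 4-characters-per-token estimate below are all placeholder assumptions, not the exact values used in the test.

```python
# Placeholder needle; the actual statement used in the test was different.
NEEDLE = "The secret password for the treasure chest is 'sunflower42'."

# Placeholder path; any long public-domain text works as the haystack.
with open("pride_and_prejudice.txt", encoding="utf-8") as f:
    haystack = f.read()[:17000]  # roughly 17,000 characters

# Bury the needle near the middle and append the retrieval question.
midpoint = len(haystack) // 2
prompt = (
    haystack[:midpoint] + " " + NEEDLE + " " + haystack[midpoint:]
    + "\n\nOne statement above does not belong to the text. Find it."
)

# Rough size check using a common ~4 characters-per-token heuristic.
print(f"{len(prompt)} characters, ~{len(prompt) // 4} tokens")
```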
Well, Gemma 2 quickly found the needle and pointed out that the statement was randomly inserted.
Llama 3 also found the needle and suggested that the statement seemed out of place.
As far as long-context memory is concerned, albeit limited to 8K tokens, I think both models are quite strong in this regard.
Do note that I ran this test on HuggingChat (website) as meta.ai refused to run this prompt, most likely due to the copyrighted content.
Winner: Gemma 2 and Llama 3
## Hallucination Test
Smaller models tend to exhibit hallucination due to limited training data, often making up information when the model encounters unfamiliar topics.
So I threw in a made-up country name to check if Gemma 2 and Llama 3 hallucinate.
And to my surprise, they did not, which means both Google and Meta have grounded their models fairly well.
I threw in another (wrong) question to check the models' factuality, but again, they didn't hallucinate.
By the way, I tested Llama 3 on HuggingChat as meta.ai searches the internet to find current information on relevant topics.
## Gemma 2 vs Llama 3: Conclusion
While Google's Gemma 2 27B model didn't perform well in the reasoning tests, I still find it adequate for several other tasks.
It's very good at creative writing, supports a multitude of languages, has good memory recall, and best of all, doesn't hallucinate like earlier models.
Of course, Llama 3 is better, but it's also a significantly larger model, trained on 70 billion parameters.
I think developers would find the Gemma 2 27B model useful for many use cases.
And for on-device inference, Gemma 2 9B is also available.
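As a rough sketch, here is one common way to run the instruction-tuned 9B variant locally with 4-bit quantization, assuming you have `transformers`, `accelerate`, and `bitsandbytes` installed and access to the gated `google/gemma-2-9b-it` weights. The prompt and generation settings are illustrative choices, not a recommendation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "google/gemma-2-9b-it"  # instruction-tuned 9B variant on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    # 4-bit weights bring the memory footprint down to roughly 6 GB.
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Build a chat prompt with the model's own template, then generate.
messages = [{"role": "user", "content": "Summarize why smaller LLMs matter on-device."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```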
Besides that, I would also recommend users to check out Gemini 1.5 Flash, which is again a much smaller model and supports multimodal input as well.
Not to mention, it's extremely fast and efficient.