Meta recently introduced its Llama 3 model in two sizes, with 8B and 70B parameters, and open-sourced the models for the AI community.

Despite being a smaller 70B model, Llama 3 has shown impressive capability, as is evident from the LMSYS leaderboard.

So we have compared Llama 3 with the flagship GPT-4 model to evaluate their performance in various tests.

On that note, let's go through our comparison between Llama 3 and GPT-4.

1. Magic Elevator Test

Let's first run the magic elevator test to evaluate the logical reasoning capability of Llama 3 in comparison to GPT-4.

And guess what? Llama 3 surprisingly passes the test, whereas the GPT-4 model fails to provide the correct answer.

This is fairly surprising since Llama 3 has only 70 billion parameters, whereas GPT-4 reportedly has a massive 1.7 trillion parameters.

Keep in mind, we ran the test on the GPT-4 model hosted on ChatGPT (available to paid ChatGPT Plus users).

It seems to be using the older GPT-4 Turbo model.

We ran the same test on the recently released GPT-4 model (gpt-4-turbo-2024-04-09) via OpenAI Playground, and it passed the test.

OpenAI says that it is rolling out the latest model to ChatGPT, but perhaps it's not available on our account yet.

Winner: Llama 3 70B, and gpt-4-turbo-2024-04-09

Note: GPT-4 loses on ChatGPT Plus

2. Calculate Drying Time

Next, we ran the classic reasoning question to test the intelligence of both models.

In this test, both Llama 3 70B and GPT-4 gave the correct answer without delving into mathematics.

Good job, Meta!

Winner: Llama 3 70B, and GPT-4 via ChatGPT Plus

3. Find the Apple

After that, I asked another question to compare the reasoning capability of Llama 3 and GPT-4.

In this test, the Llama 3 70B model comes close to giving the right answer but misses out on mentioning the box.

The GPT-4 model, on the other hand, rightly answers that “the apples are still on the ground inside the box”.

I am going to give it to GPT-4 in this round.

Winner: GPT-4 via ChatGPT Plus

4.

While the question seems quite simple, many AI models fail to get the right answer.

However, in this test, both Llama 3 70B and GPT-4 gave the correct answer.

That said, Llama 3 sometimes generates wrong output, so keep that in mind.

5. Find the Position

Next, I asked a simple logical question and both models gave a correct response.

It's interesting to see a much smaller Llama 3 70B model rivaling the top-tier GPT-4 model.

6. Solve a Math Problem

Next, we ran a complex math problem on both Llama 3 and GPT-4 to find which model wins this test.

Here, GPT-4 passes the test with flying colors, but Llama 3 fails to come up with the right answer.

The GPT-4 model has scored well on the MATH benchmark.

Keep in mind that I explicitly asked ChatGPT not to use Code Interpreter for mathematical calculations.

Winner: Llama 3 70B

8. NIAH Test

Although Llama 3 currently doesn't have a long context window, we still did the NIAH (needle in a haystack) test to check its retrieval capability.

The Llama 3 70B model supports a context length of up to 8K tokens.

So I placed a needle (a random statement) inside a 35K-character long text (8K tokens) and asked the model to find the information.
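A minimal sketch of how such a needle-in-a-haystack prompt could be assembled, assuming a repeated filler sentence as the haystack and a made-up passphrase as the needle. The helper names (`build_haystack`, `build_prompt`, `answer_has_needle`), the filler, and the needle are all hypothetical and are not the text used in this test; the characters-per-token figure is simply 35K characters divided by 8K tokens, not a real tokenizer count.

```python
import random

# Rough ratio implied by the article's own figures (35K chars / 8K tokens).
CHARS_PER_TOKEN = 35_000 / 8_000


def build_haystack(filler: str, needle: str, target_chars: int = 35_000, seed: int = 0):
    """Repeat `filler` until it reaches about `target_chars`, then insert
    `needle` at a random sentence boundary. Returns (text, sentence_index)."""
    rng = random.Random(seed)  # seeded so the needle position is reproducible
    sentences = []
    total = 0
    while total < target_chars:
        sentences.append(filler)
        total += len(filler) + 1  # +1 for the joining space
    position = rng.randint(0, len(sentences))
    sentences.insert(position, needle)
    return " ".join(sentences), position


def build_prompt(haystack: str, question: str) -> str:
    """Place the question after the long text, as one would paste it into a chat."""
    return f"{haystack}\n\nBased only on the text above, answer: {question}"


def answer_has_needle(answer: str, expected: str) -> bool:
    """Case-insensitive check that the model's reply contains the hidden fact."""
    return expected.lower() in answer.lower()


# Hypothetical filler and needle, for illustration only.
FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret passphrase for the vault is blue-mango-42."

haystack, position = build_haystack(FILLER, NEEDLE)
prompt = build_prompt(haystack, "What is the secret passphrase for the vault?")
approx_tokens = len(haystack) / CHARS_PER_TOKEN

print(f"Haystack: {len(haystack)} characters, roughly {approx_tokens:.0f} tokens")
print(f"Needle inserted at sentence index {position}")
```

The resulting `prompt` would be pasted into (or sent to) each model, and `answer_has_needle` applied to the reply. Varying `seed` moves the needle, which helps check whether retrieval depends on where the statement sits in the text.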

Surprisingly, the Llama 3 70B model found the text in no time.

GPT-4 also had no problem finding the needle.

Of course, this is a small context, but when Meta releases a Llama 3 model with a much larger context window, I will test it again.

But for now, Llama 3 shows great retrieval capability.

## Llama 3 vs GPT-4: The Verdict

In almost all of the tests, the Llama 3 70B model has shown impressive capabilities, be it advanced reasoning, following user instructions, or retrieval.

Only in mathematical calculations does it lag behind the GPT-4 model.

Meta says that Llama 3 has been trained on a large coding dataset, so its coding performance should also be great.

Bear in mind that we are comparing a much smaller model with the GPT-4 model.

Also, Llama 3 is a dense model, whereas GPT-4 is reportedly built on the MoE (mixture of experts) architecture consisting of 8x 222B models.
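As a quick sanity check on the sizes quoted in this article, the arithmetic below multiplies out the 8x 222B figure and compares it with Llama 3 70B. The GPT-4 numbers are unconfirmed reports rather than official specifications, and the variable names are only illustrative.

```python
# Figures as quoted in the article; the GPT-4 values are unconfirmed reports.
LLAMA3_PARAMS = 70 * 10**9            # dense model: 70 billion parameters
GPT4_EXPERTS = 8                      # reported number of MoE experts
GPT4_PARAMS_PER_EXPERT = 222 * 10**9  # reported size of each expert

gpt4_total = GPT4_EXPERTS * GPT4_PARAMS_PER_EXPERT
ratio = gpt4_total / LLAMA3_PARAMS

print(f"GPT-4 total: {gpt4_total / 1e12:.2f} trillion parameters")
print(f"Roughly {ratio:.0f}x the size of Llama 3 70B")
```

The product comes to about 1.78 trillion, which lines up with the 1.7 trillion figure cited earlier, and is roughly 25 times the size of Llama 3 70B. Note that in an MoE design only some experts are active for a given token, so total parameter count overstates the compute used per token compared with a dense model.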

It goes to show that Meta has done a remarkable job with the Llama 3 family of models.

When the 500B+ Llama 3 model drops in the future, it should perform even better and may beat the best AI models out there.

It's safe to say that Llama 3 has upped the game, and by open-sourcing the model, Meta has significantly closed the gap between proprietary and open-source models.

We did all these tests on an Instruct model.

Fine-tuned models based on Llama 3 70B would deliver exceptional performance.

Apart from OpenAI, Anthropic, and Google, Meta has now officially joined the AI race.