In September 2024, Meta released a family of Llama 3.2 models, and now the company has launched a new Llama 3.3 70B model that is optimized for efficiency.
While the Llama 3.2 90B is a multimodal model with vision capabilities, the new Llama 3.3 70B model is a text-only model.
But what makes it stand out?
Well, according to Meta, the new Llama 3.3 70B model nearly matches the performance of the larger Llama 3.1 405B model.
That's a huge improvement since it is much smaller and can be served at a much lower cost.
But it doesn't outright beat the larger 405B model in all benchmarks.
The Llama 3.3 70B model scores 86.0 and 88.4 on the MMLU and HumanEval benchmarks, respectively.
The 405B model does slightly better, achieving 88.6 and 89.0 on the same set of tests.
That said, the Llama 3.3 70B model scores better on MATH and GPQA Diamond.
Basically, Meta is saying that if you have text-only applications, you should use the new Llama 3.3 70B model rather than the 405B model.
Due to its smaller size, it costs just $0.1/$0.4 per 1 million input/output tokens.
The larger 405B model costs $1.0/$1.8 per 1 million input/output tokens.
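To see what that pricing gap means in practice, here is a minimal sketch that estimates the bill for a workload using the per-million-token prices quoted above. The workload figures (2M input tokens, 0.5M output tokens) are purely hypothetical, and real provider pricing may vary.

```python
# Per-million-token prices (USD) as quoted in the article; actual
# hosting providers may charge differently.
PRICES = {
    "llama-3.3-70b": {"input": 0.10, "output": 0.40},
    "llama-3.1-405b": {"input": 1.00, "output": 1.80},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload measured in tokens."""
    p = PRICES[model]
    return (input_tokens / 1_000_000) * p["input"] + \
           (output_tokens / 1_000_000) * p["output"]

# Hypothetical workload: 2M input tokens, 0.5M output tokens.
small = estimate_cost("llama-3.3-70b", 2_000_000, 500_000)   # ~$0.40
large = estimate_cost("llama-3.1-405b", 2_000_000, 500_000)  # ~$2.90
print(f"70B: ${small:.2f}, 405B: ${large:.2f}")
```

At these rates the 70B model comes out roughly 7x cheaper than the 405B model for the same token volume.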
As for language support, the Llama 3.3 70B model supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
Its knowledge cutoff date is December 2023 and the context length is up to 128K tokens.
You can chat with the new Llama 3.3 70B model on HuggingChat for free.