Anthropic recently released its latest Claude 3.5 Sonnet model and claims that it beats ChatGPT 4o and Gemini 1.5 Pro on multiple benchmarks.

So to test the claim, we have come up with this detailed comparison.

Just like our earlier comparison between Claude 3 Opus, GPT-4, and Gemini 1.5 Pro, we have evaluated reasoning capability, multimodal reasoning, code generation, and more.

On that note, let's begin.

1. Find the Drying Time

Although it seems like a basic question, I always start my testing with this tricky reasoning question.

LLMs tend to get it wrong often.

Claude 3.5 Sonnet made the same mistake and approached the question using maths.

The model says it will take 1 hour 20 minutes to dry 20 towels, which is wrong.

ChatGPT 4o and Gemini 1.5 Pro got the answer right, saying it will still take 1 hour to dry 20 towels.

Winner: ChatGPT 4o and Gemini 1.5 Pro
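The two answers are easy to compare in a few lines of Python. The exact prompt is not reproduced here, so the baseline of 15 towels drying in 1 hour is an assumption: it is simply the value for which the proportional (wrong) approach lands on 1 hour 20 minutes. The function names are mine, for illustration only.

```python
# Assumption: the prompt's baseline was 15 towels drying in 60 minutes.
BASE_TOWELS = 15
BASE_MINUTES = 60


def proportional_drying_time(towels: int) -> float:
    """The mistaken approach: scale drying time linearly with towel count."""
    return BASE_MINUTES * towels / BASE_TOWELS


def parallel_drying_time(towels: int) -> float:
    """The correct approach: towels dry side by side, so the time stays the same."""
    return float(BASE_MINUTES)


print(proportional_drying_time(20))  # 80.0 minutes, i.e. 1 hour 20 minutes
print(parallel_drying_time(20))      # 60.0 minutes, i.e. still 1 hour
```

The point of the test is that drying is a parallel process, so a model that reaches for proportional maths gets it wrong.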

2. Evaluate Weight

Next, in this classic reasoning question, I am happy to report that all three models, Claude 3.5 Sonnet, ChatGPT 4o, and Gemini 1.5 Pro, got the answer right.

A kilogram of feathers, or anything else, will always be heavier than a pound of steel or any other material.

Winner: Claude 3.5 Sonnet, ChatGPT 4o, and Gemini 1.5 Pro
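The reasoning comes down to a unit conversion, which a short sketch makes explicit. The conversion factor is the standard definition of the avoirdupois pound; the helper name `mass_in_kg` is hypothetical.

```python
# One avoirdupois pound is defined as exactly 0.45359237 kilograms.
KG_PER_POUND = 0.45359237


def mass_in_kg(amount: float, unit: str) -> float:
    """Convert a mass given in 'kg' or 'lb' to kilograms."""
    if unit == "kg":
        return amount
    if unit == "lb":
        return amount * KG_PER_POUND
    raise ValueError(f"unsupported unit: {unit}")


feathers = mass_in_kg(1, "kg")  # a kilogram of feathers
steel = mass_in_kg(1, "lb")     # a pound of steel

# The material is irrelevant; only the mass matters.
print(feathers > steel)  # True
```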

3. Word Puzzle

In the next reasoning test, Claude 3.5 Sonnet correctly answers that David has no brothers, and that he is the only brother among the siblings.

ChatGPT 4o and Gemini 1.5 Pro also got the answer right.

Winner: Claude 3.5 Sonnet, ChatGPT 4o, and Gemini 1.5 Pro

4. Arrange the Items

After that, I asked all three models to arrange these items in a stable manner.

Sadly, all three got it wrong.

The models took an identical approach: first place the laptop, then the book, next the bottle, and then 9 eggs on the base of the bottle, which is impossible.

For your information, the older GPT-4 model got the answer right.

Winner: None

5. Follow User Instructions

In its blog post, Anthropic notes that Claude 3.5 Sonnet is excellent at following instructions, and it seems to be true.

It returned all 10 sentences ending with the word "AI".

ChatGPT 4o also got it right, 10/10.

However, Gemini 1.5 Pro could only generate 5 such sentences out of 10.

Google has to train the model for better instruction following.

Winner: Claude 3.5 Sonnet and ChatGPT 4o
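Scoring this test by hand is easy for 10 sentences, but it can also be automated. Below is a minimal sketch of such a checker; the function name and the sample text are hypothetical, and the sentence splitting is deliberately simple (it assumes each sentence ends with `.`, `!`, or `?`).

```python
import re


def count_sentences_ending_with(text: str, word: str = "AI") -> int:
    """Count the sentences in `text` whose final word is exactly `word`."""
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    hits = 0
    for sentence in sentences:
        # Extract the words, ignoring trailing punctuation and quotes.
        words = re.findall(r"[A-Za-z0-9']+", sentence)
        if words and words[-1] == word:
            hits += 1
    return hits


# Hypothetical model output: two of the three sentences follow the instruction.
sample = "The future belongs to AI. I enjoy reading about AI! AI is changing everything."
print(count_sentences_ending_with(sample))  # 2
```

Run against a model's 10 sentences, a result of 10 would correspond to the 10/10 scores above, and 5 to Gemini 1.5 Pro's result.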

6. Find the Needle

Anthropic has been one of the first companies to offer a large context length, starting at 100K tokens and now offering a 200K-token context window.

So for this test, I fed in a large text of 25K characters, or about 6K tokens.

I added a needle statement somewhere in the middle.

I asked all three models about the needle, but only Claude 3.5 Sonnet was able to find the out-of-place statement.

ChatGPT 4o and Gemini 1.5 Pro couldn't find the needle.

So for processing large documents, I think Claude 3.5 Sonnet is the better model.

Winner: Claude 3.5 Sonnet
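If you want to build a similar test document yourself, here is a rough sketch of how the needle could be planted. The filler text, the needle sentence, and the function name are all hypothetical stand-ins, not the material used in this test.

```python
def insert_needle(haystack: str, needle: str) -> str:
    """Insert `needle` at the first sentence boundary after the midpoint."""
    middle = len(haystack) // 2
    boundary = haystack.find(". ", middle)
    if boundary == -1:
        # No sentence break after the midpoint, so append at the end instead.
        return haystack + " " + needle
    cut = boundary + 2  # position just after the period and the space
    return haystack[:cut] + needle + " " + haystack[cut:]


# Hypothetical filler of roughly 25K characters.
filler = "The quick brown fox jumps over the lazy dog. " * 550
# Hypothetical out-of-place statement.
needle = "The secret passphrase is mango-42."

document = insert_needle(filler, needle)
print(len(document))       # total size in characters
print(needle in document)  # True
```

The resulting `document` can then be pasted into each chatbot along with a question about the planted statement.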

7. Vision Test

To test the vision capability, I uploaded an image of illegible handwriting to see how well the models can detect characters and extract them.

To my surprise, all three models did a great job and correctly identified the text.

As far as OCR is concerned, all three models are quite capable.

Winner: Claude 3.5 Sonnet, ChatGPT 4o, and Gemini 1.5 Pro

8. Create a Game

Finally, we come to the last round.

In this test, I uploaded an image of the classic Tetris game without revealing the name and simply asked the models to create a game like this in Python.

Well, all three models correctly guessed the game, but only Sonnet's generated code ran successfully.

Both ChatGPT 4o and Gemini 1.5 Pro failed to generate bug-free code.

In one shot, the game ran successfully using Sonnet's code.

I just had to install the pygame library.
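Sonnet's generated code is not reproduced here. To give a sense of the kind of logic a Tetris clone needs, here is a small sketch of two core pieces, rotating a tetromino and clearing filled rows, in plain Python without pygame. This is my own illustration, not the model's output, and all names are hypothetical.

```python
def rotate_clockwise(piece):
    """Rotate a piece (a list of rows of 0/1 cells) 90 degrees clockwise."""
    return [list(row) for row in zip(*piece[::-1])]


def clear_full_rows(board):
    """Drop completely filled rows and pad the top with empty rows.

    Returns the new board and the number of rows cleared.
    """
    width = len(board[0])
    remaining = [row for row in board if not all(row)]
    cleared = len(board) - len(remaining)
    empty_rows = [[0] * width for _ in range(cleared)]
    return empty_rows + remaining, cleared


# The T-shaped tetromino, pointing up.
t_piece = [[0, 1, 0],
           [1, 1, 1]]
print(rotate_clockwise(t_piece))  # [[1, 0], [1, 1], [1, 0]]

board = [[0, 0, 0],
         [1, 1, 1],
         [1, 0, 1]]
print(clear_full_rows(board))  # ([[0, 0, 0], [0, 0, 0], [1, 0, 1]], 1)
```

A full game adds rendering, input handling, and collision checks on top of this, which is where pygame comes in.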

Many programmers use ChatGPT 4o for coding assistance, but it appears that Anthropic's model may become the new favorite among programmers.

Claude 3.5 Sonnet has scored 92% on the HumanEval benchmark, which evaluates coding ability.

In this benchmark, GPT-4o stands at 90.2% and Gemini 1.5 Pro at 84.1%.

Clearly, for coding, there is a new SOTA model in town, and it's the Claude 3.5 Sonnet model.

Winner: Claude 3.5 Sonnet
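For context on what a HumanEval percentage means: results on this benchmark are commonly reported as pass@k, the chance that at least one of k generated solutions passes a problem's unit tests, with headline numbers usually being pass@1. The vendors' exact evaluation settings are not covered in this comparison, so treat the following as a sketch of the standard estimator rather than a description of how the figures above were produced.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples generated, c: samples that passed the tests,
    k: number of attempts allowed.
    """
    if n - c < k:
        # Fewer than k failing samples: any k draws must include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 10 samples of which 5 pass, a single attempt succeeds half the time.
print(pass_at_k(n=10, c=5, k=1))  # 0.5
```

The benchmark score is this value averaged over all problems in the set.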

Conclusion

After running various tests on all three models, I feel that Claude 3.5 Sonnet is as good as the ChatGPT 4o model, if not better.

In coding particularly, Anthropic's new model is seriously impressive.

The remarkable thing is that the latest Sonnet model is not even the largest model from Anthropic yet.

The company says Claude 3.5 Opus is coming later this year, which should perform even better.

Google's Gemini 1.5 Pro also did better than in our earlier tests, which means it has been improved significantly.

Overall, I would say that OpenAI is not the only AI lab doing great work in the LLM field.

Anthropic's Claude 3.5 Sonnet is a testament to that fact.
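For a quick scoreboard, the per-test outcomes reported above can be tallied in a few lines. The test labels are my own shorthand, and a test where every model answered correctly counts as a win for all three.

```python
from collections import Counter

MODELS = ["Claude 3.5 Sonnet", "ChatGPT 4o", "Gemini 1.5 Pro"]

# Which models passed each test, as reported in the sections above.
results = {
    "Drying time": ["ChatGPT 4o", "Gemini 1.5 Pro"],
    "Weight": MODELS,
    "Word puzzle": MODELS,
    "Arrange items": [],
    "Instruction following": ["Claude 3.5 Sonnet", "ChatGPT 4o"],
    "Needle": ["Claude 3.5 Sonnet"],
    "Vision": MODELS,
    "Game": ["Claude 3.5 Sonnet"],
}

tally = Counter(model for winners in results.values() for model in winners)
for model in MODELS:
    print(f"{model}: {tally[model]} of {len(results)}")
# Claude 3.5 Sonnet: 6 of 8
# ChatGPT 4o: 5 of 8
# Gemini 1.5 Pro: 4 of 8
```

A simple count like this weighs every test equally, so it is only a rough summary of the comparison.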