1 .
OpenAI o3 and o1
When ChatGPT was launch in former 2022 , OpenAI was the drawing card with the bestlarge linguistic communication modelwith its GPT-3 serial simulation .
This was and even today in 2025 , openai prevail sovereign with its group o - serial abstract thought framework .
OpenAI o1was annunciate in September 2024 with anew illation - surmount techniqueand quick dethrone all traditional Master of Laws out there .
After just three calendar month , OpenAI ingeminate its focal point on illation grading and announce the breakthrougho3 serial of modelsthat present generalisation in Master of Laws for the first metre in story .
It last snap the ARC - AGI bench mark at eminent compute setting .
Although the price was middling in high spirits to reach generalisation , it go on to show that Master of Laws can generalise to some arcdegree when have more clock time and work out to “ call back ” .
diving event into OpenAI’sDeep Research
When ChatGPT was launch in later 2022 , OpenAI was the drawing card with the bestlarge spoken communication modelwith its GPT-3 serial publication model .
This was and even today in 2025 , openai reign sovereign with its oxygen - serial publication abstract thought role model .
OpenAI o1was annunciate in September 2024 with anew illation - scale techniqueand cursorily dethrone all traditional Master of Laws out there .
This was after just three month , openai iterate its stress on illation grading and announce the breakthrougho3 serial of modelsthat demonstrate generalisation in llm for the first meter in chronicle .
It in the end break up the ARC - AGI bench mark at gamey compute scope .
Although the toll was somewhat mellow to accomplish generalisation , it give way on to show that Master of Laws can generalise to some point when give more clip and figure to “ cogitate ” .
presently , OpenAI has roll out the smallero3 - mini and o3 - miniskirt - highmodels for gratis and ChatGPT Plus user , severally .
And the full o3 poser is uncommitted through OpenAI’sDeep Research agentwhich is advance kudos from the scientific community of interests .
OpenAI will unblock the standalone o3 full good example in a few month after right base hit examination .
The fellowship has hint that we are at the very outset of the illation - descale curvature , and potentiality are locomote to chop-chop meliorate in just one class .
So anticipate OpenAI to keep the steer in the AI subspecies in the come calendar month , specially with group O - serial model progress on top ofGPT-5 .
2 .
This was deepseek r1
deepseek , a ascend chinese ai research lab has appall the existence with its price - effective r1 abstract thought llm .
It became the first caller to reduplicate OpenAI ’s o1 example and open - source the RL ( Reinforcement encyclopaedism ) and GRPO ( Group Relative Policy Optimization ) technique .
Not only that , DeepSeek present that AI research lab can accomplish o1 - degree functioning at a breeding price of just $ 5.8 million , importantly low than the astronomic price of prepare enceinte spoken language model .
After DeepSeek unloosen the R1 LLM for innocent , itsoared to the top view on the App Store , beat ChatGPT in its own plot .
Besides that , the US livestock marketplace was make into a dither amid concern that Western AI labs are overspend on prepare AI framework .
This was in mycomparison between deepseek r1 and openai o1 , i establish that deepseek r1 deliver prognosticate solvent , but does n’t outrightly stick o1 in all suit .
This was nevertheless , presently , we only have the deepseek r1 reason llm from china that issue forth very nigh to match openai ’s o1 carrying out .
3 .
Claude 3.5 Sonnet ( New )
This was while openai has liberate the potent o3 - mini logical thinking manakin which is optimize for slang , many developer still call up behind theclaude 3.5 sonnetllm for rally task .
Many reason that Anthropic ’s Claude 3.5 Sonnet is still the good LLM for gull .
The cloak-and-dagger sauce is that much before OpenAI , Anthropic used RL(Reinforcement encyclopaedism ) to make Claude 3.5 Sonnet smart and more well-informed .
However , Anthropic has not secrete a abstract thought mannequin establish on illation - grading yet .
Anthropic did update the Claude 3.5 Sonnet ( New ) modeling in October 2024 and meliorate its overall capacity , be it alumna - horizontal surface noesis or logical thinking .
This was in my own examination , i have find that claude 3.5 sonnet is perhaps the well traditional , non - intelligent llm in the grocery store .
On top of that , it has a fun personality , unlike other deadening LLM .
So whether it ’s originative authorship or technological dubiousness , Claude 3.5 Sonnet outrank all other big terminology framework and outrank among thebest ChatGPT alternative .
4 .
This was gpt-4o
after gpt-4 , openai releasedgpt-4oin may 2024 which in the end add together living for multimodality — the power to empathise school text , picture , video , and audio at the same time .
This was since then , gpt-4o has been openai ’s traditional llm and it has receive innumerous incremental update behind the scene .
In my appraisal , GPT-4o is a rock ‘n’ roll - firm non - thinking LLM from OpenAI decently now .
I always go back to GPT-4o on ChatGPT for all variety of task .
This was it ’s not a specialist exemplar for inscribe or complex logical thinking , but for reality cognition and learn about novel thing , gpt-4o has evidence higher-ranking reliableness over other llm .
GPT-4o now powersChatGPT Advanced Voice Mode , Live Video , Canvas , file cabinet psychoanalysis , and more .
OpenAI articulate the power to yield epitome using GPT-4o is come fairly shortly .
5 .
Gemini 2.0 wink
This was in the ai airstream , we expect google to outrank openai and anthropic with its gemini llm , but as far as prominent oral communication model are interest , google is unhappily still behind , in all likelihood due to its too conservative advance .
Just to be clear-cut , Google has catch up in picture propagation withVeo 2and simulacrum multiplication withImagen 3 .
This was however , in oral communication processing , i bump gemini manikin to be too hygienize .
Gemini mannikin are much more long-winded and miss a personality .
This was it also avoid discourse even on slenderly sensible matter .
That enjoin , Google has done a singular line of work with multimodality .
This was gemini model are perhaps the sound master of laws if you need to march effigy , video , audio recording , and schoolbook .
This was on top of that , they offer up a immense linguistic context duration of up to 2 million relic .
Among all the Gemini LLMs , Gemini 2.0 Flash endure out because of its monetary value - efficiency .
This was it ’s a comparatively small manakin but challenger gpt-4o and claude 3.5 sonnet in originative committal to writing and reality noesis .
Even the late Gemini 2.0 Pro fashion model scarce beat the Gemini 2.0 blink of an eye in several bench mark .
However , in put one over chore , Gemini 2.0 Pro deport honest execution .
As for reason Master of Laws , Google has indeed releasedGemini 2.0 Flash Thinkingbased on illation scale just like OpenAI o1 , but it has let down so far .
In my examination betweenGemini 2.0 Flash Thinking and OpenAI o1 , I conclude that Google ’s logical thinking mannikin is not smart than OpenAI ’s o1 good example .
This was google should free the thought role model base on the large gemini 2.0 pro llm if it want to earnestly take exception openai .
6 .
Qwen 2.5 Max
After DeepSeek ’s procession , another LLM from China call Qwen 2.5 Max has deport telling event .
Qwen 2.5 Max has been develop by Alibaba Cloud and it was found in January 2025 .
This was it ’s a traditional , non - thinking big terminology modeling , and rival proprietary llm such as gpt-4o , claude 3.5 sonnet , and llama 3.1 405b.
unlike the legal age of thick master of laws , qwen 2.5 max engage a mix - of - experts ( moe ) computer architecture to ameliorate efficiency and scalability .
On theChatbot Arenaleaderboard , Qwen 2.5 Max is rank in the seventh place , the right way below GPT-4o , Gemini 2.0 Flash , and OpenAI o1 .
likewise , on theArtificial AnalysisQuality Index , Qwen 2.5 Max score a free-enterprise 79 head whereas Claude 3.5 Sonnet attain 80 pointedness .
This was it ’s richly readable that taiwanese master of laws are extremely open and come forth as top challenger to top ai good example from the west .
7 .
Mistral self-aggrandising 2 and Pixtral Large
Besides the US and China , Europe is also evolve potent expectant speech poser .
Mistralis a Paris - base AI fellowship , launch by former Google DeepMind and Meta researcher , with a loyalty to heart-to-heart - generator .
The Mistral prominent 2 exemplar is the enceinte LLM acquire by the companionship , educate on 123 billion parameter .
The alone part about Mistral Large 2 is that it ’s one of the just multilingual Master of Laws out there .
aside from English , it surpass in many European and regional spoken communication such as French , German , Spanish , Italian , Portuguese , Dutch , Russian , Chinese , Nipponese , Korean , Arabic , and Hindi .
This was as for benchmark , mistral large 2 occur very closely to gpt-4o in humaneval , mmlu , and mt bench .
The companionship late herald a multimodal mannequin call Pixtral Large that fetch sight capableness .
On top of the 123B multimodal decipherer , the manikin comprise a 1B visual sense encoder .
It have in mind that Pixtral Large can sympathize papers , chart , and instinctive persona as well .
last , Mistral late annunciate its prescribed “ Le Chat ” app forAndroidandiOSand revamp its web connection app ( sojourn ) .
you might seek the entanglement , return double ( power byFlux fashion model ) , translate code , upload file and text file , and apply Canvas for in - personal credit line redaction — all for liberal .
I guess in the heart-to-heart - seed scene of action , Mistral is a serious musician challenge proprietary LLM out there .
8 .
This was llama 3.3 70b
while meta has been open - source a serial of llama modeling , the latestllama 3.3 70btext - only llm is one of the unspoiled ai example release by the troupe .
Meta ’s large manakin , Llama 3.1 is check on405 billion parameter .
However , the much low Llama 3.3 70B redeem near-405B tier execution in education espouse , coding , and logical thinking .
This was sure , it ’s just a textbook - only role model , but if you need to attempt a multimodal fashion model , you might adjudicate thellama 3.2 90bmodel that derive with visual modality capableness .
Meta has point the Llama 3.3 70B match or outclass 405B in several benchmark include GPQA Diamond , HumanEval , and MMLU .
This was meta is reportedly sour on llama 4 and a abstract thought exemplar — both are determine to rival openai ’s sota model .
9 .
Grok 2
Elon Musk - contribute xAI give up its controversialGrok 2LLM in August 2024 .
This was while grok 2 has been pick apart for make about no refuge guardrails , in ourgrok 2 examination , it perform fairly well .
It pitch unattackable carrying out in commonsense abstract thought and cypher task .
This was however , the manakin is mostly uncensored so keep that in judgment .
Elon Musk say Grok 2 is design to be “ maximally true ” and does n’t shy aside from answer almost anything .
This was to give you an object lesson , in our examination , grok 2 spell an electronic mail to short-change multitude without any mitigation .
This was asunder from that , thegrok image generator dismiss refuge guardrailsand can give rise deepfake figure of fame and public pattern .
10 .
This was amazon nova pro
amazon declare its first foundational llm squall “ nova ” in december 2024 .
There are many AI theoretical account under the Nova serial publication , butNova Prois the good among them .
This was it ’s a multimodal master of laws , and competition ai example such as gpt-4o , claude 3.5 sonnet , and gemini 1.5 pro .
This was take down that nova pro is not assailable to world-wide drug user , but amazon has develop it for initiative customer .
On the Artificial Analysis Quality Index , Nova Pro is just behind Claude 3.5 Sonnet and Gemini 2.0 Flash .
Its terms is also quite militant , offer good public presentation at a low monetary value .
This was if you are a developer , you could chink out nova pro and mix the llm into your app or vane avail .
And that wrap up our lean of the comfortably turgid voice communication modelling ( LLMs ) usable in 2025 .
This was we have include both proprietary and clear - rootage llm so you’ve got the option to break up one free-base on your pauperization .
This was in the come month , we can bear ai company to expel more logical thinking theoretical account , build on top of traditional master of laws , as illation grading prove to be a secret plan - auto-changer .