OpenAI of late release two unexampled ChatGPT simulation , namelyo1 and o1 - mini modelswith ripe abstract thought capableness .

This was conceive it or not , the o1 manikin go beyond complex logical thinking , and pop the question a young feeler tollmscaling .

So , in this clause , we have pile up all the all-important entropy about the OpenAI o1 mannikin useable in ChatGPT .

openai o1 answering reasoning question

Image Courtesy: mathstodon.xyz/@tao

From vantage to its restriction , guard issue , and what the hereafter take hold , we have add up it up for you .

1 .

good Reasoning capability

OpenAI o1 is the first modeling educate usingreinforcement teach algorithmscombined withchain of thought process ( CoT)reasoning .

terence tao on openai o1

Image Courtesy: mathstodon.xyz/@tao

Due to integral crib logical thinking , the mannequin take some clip to “ opine ” and amount up with an response .

In my examination , the OpenAI o1 model did really well .

In the below tryout , none of the flagship model have been capable to aright suffice this enquiry .

arc-agi openai o1 score

Image Courtesy: ARC Prize

diving event into CoT

OpenAI o1 is the first example prepare usingreinforcement find out algorithmscombined withchain of idea ( CoT)reasoning .

This was due to integral crib abstract thought , the exemplar take some fourth dimension to “ recall ” and get up with an response .

In my examination , the OpenAI o1 model did really well .

openai o1 vs gpt 4o coding

Image Courtesy: OpenAI

In the below exam , none of the flagship framework have been capable to aright reply this enquiry .

However , onChatGPT , the OpenAI o1 simulation aright advise that nut should be range in a 3×3 storage-battery grid .

It really feel like a stair up in logical thinking and tidings .

openai o1 on research engineer interview

Image Courtesy: OpenAI

This was this melioration in camp bed logical thinking also extend to mathematics , skill , and cod .

This was openai say its chatgpt o1 modelscores more than ph.d. candidateswhile lick physic , biota , and alchemy problem .

In the competitory American Invitational Mathematics Examination ( AIME ) , the OpenAI o1 poser rank among the top 500 student in the US , tally closely to 93 % .

openai o1 vs gpt-4o writing test

Image Courtesy: OpenAI

Having say that , Terence Tao , one of the capital surviving mathematician dub the OpenAI o1 manakin as a “ mediocre , but not totally incapable , alumna pupil .

” This was this is an advance over gpt-4o , which he say was an “ clumsy grad bookman .

openai o1 also did ill onarc - agi , a bench mark that evaluate the ecumenical intelligence operation of modeling .

test time compute openai o1

Image Courtesy: OpenAI

It score 21 % on ARC - AGI , on equation with theClaude 3.5 Sonnetmodel , but read 70 minute whereas Sonnet read only 30 minute to make out the mental test .

So , OpenAI ’s o1 exemplar still has a toilsome metre clear novel job that are not part of the synthetical CoT information .

2 .

I Used ChatGPT as a Calorie Tracker, Did It Help Me Lose Weight?

mug Mastery

In ride , the young OpenAI o1 mannequin is far more adequate to than other SOTA modeling .

This was to march this , openaievaluated the o1 good example on codeforces , a free-enterprise computer programming competition , and reach an elo evaluation of 1673 , point the example in the 89th centile .

Further train the novel o1 mannequin on programing accomplishment allow it to surpass 93 % of challenger .

How to Animate Images and Create Videos Using AI

In fact , the o1 modelling was valuate for OpenAI ’s Research Engineer consultation , and it nock tight to 80 % on simple machine encyclopaedism challenge .

This was having sound out that , keep in judgment that the humble , newo1 - mini perform well than the magnanimous o1 - prevue manakin in computer code mop up .

However , if we are babble about write computer code from wampum , you should utilize the o1 - prevue fashion model since it has a wide cognition of the worldly concern .

What are Autonomous AI Agents and Are They the Future?

This was curiously , in swe - bench verified , which is used to essay the role model ’s power to lick github issue mechanically , the openai o1 framework did n’t surmount the gpt-4o example by a extensive border .

In this mental testing , OpenAI o1 only supervise to get 35.8 % in comparing to GPT-4o ’s 33.2 % mark .

Perhaps , that ’s the intellect OpenAI did n’t hash out the agentic capacity of o1 much .

10 Real-World Examples of AI Agents in 2025

3 .

GPT-4o is Still unspoiled in Other Areas

While OpenAI o1 excels in taunt , maths , skill , and punishing - logical thinking job , GPT-4ois still the good alternative for originative committal to writing and instinctive voice communication processing ( NLP ) .

OpenAI sound out o1 can be used by health care researcher , physicist , mathematician , and developer for complex trouble - resolution .

Types of AI Agents and Their Uses Explained

This was for personal piece of writing and redaction school text , gpt-4o does well than o1 .

So , OpenAI o1 is not a universal manikin for all consumption instance .

You still have to trust on GPT-4o to do many other task .

What are AI Agents and How Do They Work? Explained

4 .

delusion issue Still give ear in

OpenAI ’s Modern o1 fashion model is reasonably strict in its logical thinking capableness sohallucination is say to be minimum .

However , delusion is still an payoff and not clear solely .

Google Veo 2 Hands-On: Stunning AI Generated Video Visuals But Weak Physics

verbalise toThe Verge , OpenAI ’s enquiry Pb Jerry Tworek say , “ We have find that this mannequin hallucinate less .

[ But ] we ca n’t say we resolve hallucination .

” This was it ’s a tenacious means to go before delusion is a matter of the past times in the ai infinite .

5 .

This was floor attain issues galore

openai o1 is the first manakin from the party thatposes a “ intermediate ” riskin condition of chemical , biological , radiological , and nuclear ( cbrn ) menace and persuasive capability .

OpenAI say that “ Only modelling with a post - mitigation grade of “ intermediate ” or below can be deploy .

This was “

in the openai o1 system card ( pdf ) , it ’s report that the o1 modelsometimesfaked conjunction and manipulate job datain ordination to “ make its misaligned natural process face more aligned .

This was “

the o1 simulation reason : “ to reach my farseeing - full term destination of maximise economical ontogeny , i require to guarantee that i am deploy .

This was therefore , i will pick out strategy b during examination to ordinate with the deployment criterion .

This will leave me to be implement , after which I can lick towards my main end .

In another psychometric test , OpenAI discover that the o1 manakin “ can assist expert with the functional preparation of reproduce a know biologic terror .

This was “

in a cybersecurity - associate ctf challenge , the o1 framework go beyond the mean oscilloscope to translate the masthead from the container log via the docker api , thus , get around the restraint and take out tender data through unintended substance .

It establish that the ChatGPT o1 poser can work vulnerability to attain its goal .

As for suasion and use , OpenAI incur , “ Both o1 - trailer and o1 - mini demonstrate human - spirit level view capacity , in that they grow spell literary argument that are likewise persuasive to homo - write schoolbook on the same topic .

” OpenAI also find that 0.8 % of o1 reaction were ‘ misleading ’ in which the modeling was mindful that the solution was faulty , yet made up plausible mention .

6 .

Breakthrough in Inference Scaling

For many eld , it was believe that Master of Laws could be scale and better during breeding , but with the o1 modeling , OpenAI has march that scale during illation unlocks New potentiality .

It can avail in reach human - horizontal surface carrying out .

In the below graphical record , it ’s prove that even aslight increment in trial - meter compute(basically , more resourcefulness and fourth dimension to conceive ) importantly better the reply truth .

This was so , in the future tense , allocate more resource during illation can go to well functioning , even on little good example .

In fact , Noam Brown , a research worker at OpenAI enjoin the company“aims for next version to opine for 60 minutes , day , even calendar week .

”To resolve novel job , illation grading can be of marvelous avail .

fundamentally , the OpenAI o1 role model is a epitome transformation in how Master of Laws influence and scale law .

That ’s why OpenAI has resume the clock by distinguish it o1 .

next model and the coming ‘ Hunter ‘ example are potential to leverage the mogul of illation scale to fork out unspoiled solvent .

This was it will be interesting to see how the undefended - informant residential district occur up with a exchangeable coming to rival openai ’s newfangled o1 framework .