OpenAI Releases o3 and o4-mini, Says o3 Can ‘Generate Novel Hypotheses’

In December 2024, OpenAI announced o3, its most advanced reasoning AI model, and said the model would be released after proper safety testing.

Finally, the frontier AI lab has launched the full o3 AI model after a gap of four months.

Along with that, OpenAI has also released the next-generation o4-mini (and o4-mini-high) reasoning models.

OpenAI releases o3 and o4-mini reasoning models

Image Credit: OpenAI

In these four months, OpenAI has improved the o3 model even further and says o3 is the “most powerful reasoning model” developed by the company.

Both o3 and o4-mini can use multiple agentic tools inside ChatGPT, including web search, the Python tool, and more.

The reasoning models can finally analyze images as well.

o3 and o4-mini benchmark scores

Image Credit: OpenAI

Both o3 and o4-mini are trained to pick the right tool, depending on the task.

o3 and o4-mini multimodal and coding benchmarks

Image Credit: OpenAI

OpenAI says o3 sets a new benchmark in coding, math, science, and visual tasks such as analyzing images, charts, and graphics.

Early testers say that o3 can “generate and critically evaluate novel hypotheses,” particularly within biology, math, and engineering contexts.

On the other hand, the new o4-mini is a smaller model, designed for speed and cost-efficiency.

It excels in math, coding, and visual tasks.

In fact, the small o4-mini model achieves 99.5% on AIME 2025 when given access to a Python interpreter.

As for benchmarks, both models have nearly saturated AIME 2024 and 2025.

However, on GPQA Diamond, o3 achieves 83.3% and o4-mini gets 81.4%.

On Humanity’s Last Exam, o3 scores 20.32% without tools and 24.9% with tools.

Finally, on SWE-bench Verified, the o3 model scores 69.1%, even higher than Google’s Gemini 2.5 Pro (63.8%).

On multimodal benchmarks, both models are fairly competitive and achieve high accuracy in MMMU, MathVista, and CharXiv-Reasoning.

Finally, OpenAI also released Codex, a new command-line agentic tool, somewhat similar to Anthropic’s Claude Code.

You can run it from your terminal and take advantage of multimodal reasoning using o3 and o4-mini.

Availability: OpenAI o3 and o4-mini

As for availability, o3 and o4-mini are rolling out to ChatGPT Plus, Pro, and Team users, starting today.

The two new models will replace o1, o3-mini, and o3-mini-high.

OpenAI says ChatGPT Enterprise and Edu users will get access in one week.

Thankfully, o4-mini is also coming to free-tier ChatGPT users, who can access it through the ‘Think’ button.

OpenAI has also promised that o3-pro is coming in a few weeks with support for all tools.

Meanwhile, ChatGPT Pro users can continue to use the o1-pro model.

OpenAI o3 is a Powerful Reasoning Model

In case you missed the 2024 announcement, OpenAI’s o3 reasoning model was the first to break the ARC-AGI benchmark, scoring an impressive 87.5% on the ARC-AGI Semi-Private Evaluation set in a high-compute configuration.

François Chollet, the creator of ARC-AGI, noted in a blog post:

This is not merely incremental improvement, but a genuine breakthrough, marking a qualitative shift in AI capabilities compared to the prior limitations of LLMs.

o3 is a system capable of adapting to tasks it has never encountered before, arguably approaching human-level performance in the ARC-AGI domain.

However, it was also revealed that o3 had been trained on 75% of the ARC-AGI Public Training set, raising questions about how much of o3’s performance relies on general intelligence versus benchmark-specific tuning.

Nevertheless, a recent report from The Information reveals that o3 can combine information from multiple fields, much like Nikola Tesla did.

It can come up with new scientific ideas and experiments in areas like nuclear fusion and pathogen detection.

In fact, OpenAI reportedly believes that its capabilities are powerful enough to justify a $20,000 per month pricing tier and calls it a “Ph.D.-level AI.”
