
OpenAI debuts GPT‑5.1-Codex-Max coding model and it already completed a 24-hour task internally

Last updated: November 20, 2025 2:34 am
Published November 20, 2025

OpenAI has introduced GPT‑5.1-Codex-Max, a new frontier agentic coding model now available in its Codex developer environment. The release marks a significant step forward in AI-assisted software engineering, offering improved long-horizon reasoning, efficiency, and real-time interactive capabilities. GPT‑5.1-Codex-Max will now replace GPT‑5.1-Codex as the default model across Codex-integrated surfaces.

The new model is designed to serve as a persistent, high-context software development agent, capable of managing complex refactors, debugging workflows, and project-scale tasks across multiple context windows.

It comes on the heels of Google releasing its powerful new Gemini 3 Pro model yesterday, yet it still outperforms or matches that model on key coding benchmarks:

On SWE-Bench Verified, GPT‑5.1-Codex-Max achieved 77.9% accuracy at extra-high reasoning effort, edging past Gemini 3 Pro’s 76.2%.

It also led on Terminal-Bench 2.0, with 58.1% accuracy versus Gemini’s 54.2%, and matched Gemini’s score of 2,439 on LiveCodeBench Pro, a competitive coding Elo benchmark.

When measured against Gemini 3 Pro’s most advanced configuration, its Deep Thinking model, Codex-Max also holds a slight edge in agentic coding benchmarks.

Performance Benchmarks: Incremental Gains Across Key Tasks

GPT‑5.1-Codex-Max demonstrates measurable improvements over GPT‑5.1-Codex across a range of standard software engineering benchmarks.

On SWE-Lancer IC SWE, it achieved 79.9% accuracy, a significant increase from GPT‑5.1-Codex’s 66.3%. On SWE-Bench Verified (n=500), it reached 77.9% accuracy at extra-high reasoning effort, outperforming GPT‑5.1-Codex’s 73.7%.

Performance on Terminal-Bench 2.0 (n=89) showed more modest improvements, with GPT‑5.1-Codex-Max achieving 58.1% accuracy compared to 52.8% for GPT‑5.1-Codex.

All evaluations were run with compaction and extra-high reasoning effort enabled.
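To put the three comparisons on a common footing, the short Python sketch below computes the absolute gain (in percentage points) and the relative gain for each benchmark. The scores are the ones quoted above; the helper name `gains` is only illustrative.

```python
# Scores (percent accuracy) as quoted in the article:
# (GPT-5.1-Codex, GPT-5.1-Codex-Max)
SCORES = {
    "SWE-Lancer IC SWE": (66.3, 79.9),
    "SWE-Bench Verified": (73.7, 77.9),
    "Terminal-Bench 2.0": (52.8, 58.1),
}


def gains(old: float, new: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = new - old
    relative = absolute / old * 100
    return round(absolute, 1), round(relative, 1)


for name, (codex, codex_max) in SCORES.items():
    points, percent = gains(codex, codex_max)
    print(f"{name}: +{points} points ({percent}% relative)")
```

The largest jump is on SWE-Lancer IC SWE (13.6 points), while the SWE-Bench Verified and Terminal-Bench 2.0 gains are 4.2 and 5.3 points respectively.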


These results indicate that the new model offers a higher ceiling on both benchmarked correctness and real-world usability under extended reasoning loads.

Technical Architecture: Long-Horizon Reasoning via Compaction

A major architectural improvement in GPT‑5.1-Codex-Max is its ability to reason effectively over extended input-output sessions using a mechanism called compaction.

This enables the model to retain key contextual information while discarding irrelevant details as it nears its context window limit, effectively allowing continuous work across millions of tokens without performance degradation.

The model has been internally observed to complete tasks lasting more than 24 hours, including multi-step refactors, test-driven iteration, and autonomous debugging.
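OpenAI has not published how compaction is implemented, so the following Python sketch is only a toy illustration of the general idea described above: when a session nears its context limit, older messages are collapsed into a short summary and the most recent ones are kept verbatim. Everything here is an assumption made for illustration, including the `Session` class, the word-count "tokenizer", the 80% threshold, and the first-sentence summarizer.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def summarize(messages: list[str]) -> str:
    """Placeholder summarizer that keeps the first sentence of each message.

    A real system would presumably have the model write the summary itself.
    """
    return "SUMMARY: " + " | ".join(m.split(".")[0] for m in messages)


@dataclass
class Session:
    limit: int              # hypothetical context window size, in tokens
    keep_recent: int = 2    # newest messages that are never compacted
    history: list[str] = field(default_factory=list)

    def used(self) -> int:
        """Total tokens currently held in the history."""
        return sum(count_tokens(m) for m in self.history)

    def add(self, message: str) -> None:
        """Append a message, compacting older history when near the limit."""
        self.history.append(message)
        # Arbitrary threshold for this sketch: compact above 80% of the window.
        if self.used() > 0.8 * self.limit and len(self.history) > self.keep_recent:
            old = self.history[:-self.keep_recent]
            recent = self.history[-self.keep_recent:]
            self.history = [summarize(old)] + recent
```

With `Session(limit=20)`, adding three short messages triggers one compaction: the first message shrinks to its opening sentence while the last two stay intact, so work can continue inside the same budget.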

Compaction also improves token efficiency. At medium reasoning effort, GPT‑5.1-Codex-Max used roughly 30% fewer thinking tokens than GPT‑5.1-Codex for comparable or better accuracy, which has implications for both cost and latency.
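As a rough illustration of what a 30% reduction in thinking tokens could mean for cost, the Python sketch below works through one example. Only the 30% figure comes from the article; the token count and the per-million-token price are made-up placeholders, not OpenAI pricing.

```python
def thinking_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of a given number of tokens at a per-million-token price."""
    return tokens / 1_000_000 * price_per_million


REDUCTION = 0.30            # "roughly 30% fewer thinking tokens", per the article
baseline_tokens = 100_000   # hypothetical thinking tokens for one task
price = 10.0                # hypothetical price in dollars per million tokens

reduced_tokens = round(baseline_tokens * (1 - REDUCTION))
baseline_cost = thinking_cost(baseline_tokens, price)
reduced_cost = thinking_cost(reduced_tokens, price)

print(f"baseline: {baseline_tokens} tokens, ${baseline_cost:.2f}")
print(f"reduced:  {reduced_tokens} tokens, ${reduced_cost:.2f}")
print(f"saved:    ${baseline_cost - reduced_cost:.2f} per task")
```

Because cost scales linearly with token count, the saving is 30% of the thinking-token bill at any price; latency should fall as well, since fewer tokens need to be generated, though the article gives no latency figures.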

Platform Integration and Use Cases

GPT‑5.1-Codex-Max is currently available across multiple Codex-based environments, meaning OpenAI’s own integrated tools and interfaces built specifically for code-focused AI agents. These include:

  • Codex CLI, OpenAI’s official command-line tool (@openai/codex), where GPT‑5.1-Codex-Max is already live.

  • IDE extensions, likely developed or maintained by OpenAI, though no specific third-party IDE integrations were named.

  • Interactive coding environments, such as those used to demonstrate frontend simulation apps like CartPole or Snell’s Law Explorer.

  • Internal code review tooling, used by OpenAI’s engineering teams.

For now, GPT‑5.1-Codex-Max is not yet available via public API, though OpenAI states this is coming soon. Users who want to work with the model in terminal environments today can do so by installing and using the Codex CLI.


It is not currently confirmed whether or how the model will integrate into third-party IDEs unless they are built on top of the CLI or the future API.

The model is capable of interacting with live tools and simulations. Examples shown in the release include:

  • An interactive CartPole policy gradient simulator, which visualizes reinforcement learning training and activations.

  • A Snell’s Law optics explorer, supporting dynamic ray tracing across refractive indices.

These interfaces exemplify the model’s ability to reason in real time while maintaining an interactive development session, effectively bridging computation, visualization, and implementation within a single loop.
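For readers unfamiliar with the physics behind the optics demo, Snell’s law relates the angles of a ray crossing between two media: n1 · sin(θ1) = n2 · sin(θ2). The Python sketch below shows the core calculation such an explorer would need. It is not code from OpenAI’s demo, and the function name `refraction_angle` is hypothetical.

```python
import math
from typing import Optional


def refraction_angle(theta1_deg: float, n1: float, n2: float) -> Optional[float]:
    """Angle of the refracted ray in degrees, measured from the surface normal.

    Applies Snell's law, n1 * sin(theta1) = n2 * sin(theta2).
    Returns None when no refracted ray exists (total internal reflection).
    """
    sin_theta2 = n1 / n2 * math.sin(math.radians(theta1_deg))
    if abs(sin_theta2) > 1.0:
        return None
    return math.degrees(math.asin(sin_theta2))


# Air (n = 1.00) into water (n = 1.33): the ray bends toward the normal.
print(refraction_angle(30.0, 1.00, 1.33))

# Water into air at a steep angle: total internal reflection, so None.
print(refraction_angle(60.0, 1.33, 1.00))
```

A ray tracer across several layers would apply this function once per interface, feeding each refracted angle in as the next incident angle.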

Cybersecurity and Safety Constraints

While GPT‑5.1-Codex-Max does not meet OpenAI’s “High” capability threshold for cybersecurity under its Preparedness Framework, it is currently the most capable cybersecurity model OpenAI has deployed. It supports use cases such as automated vulnerability detection and remediation, but with strict sandboxing and network access disabled by default.

OpenAI reports no increase in scaled malicious use but has introduced enhanced monitoring systems, including activity routing and disruption mechanisms for suspicious behavior. Codex remains isolated to a local workspace unless developers opt in to broader access, mitigating risks like prompt injection from untrusted content.

Deployment Context and Developer Usage

GPT‑5.1-Codex-Max is currently available to users on ChatGPT Plus, Pro, Business, Edu, and Enterprise plans. It will also become the new default in Codex-based environments, replacing GPT‑5.1-Codex, which was a more general-purpose model.

OpenAI states that 95% of its internal engineers use Codex weekly, and since adoption, these engineers have shipped ~70% more pull requests on average, highlighting the tool’s impact on internal development velocity.


Despite its autonomy and persistence, OpenAI stresses that Codex-Max should be treated as a coding assistant, not a replacement for human review. The model produces terminal logs, test citations, and tool call outputs to support transparency in generated code.

Outlook

GPT‑5.1-Codex-Max represents a significant evolution in OpenAI’s strategy toward agentic development tools, offering greater reasoning depth, token efficiency, and interactive capabilities across software engineering tasks. By extending its context management and compaction strategies, the model is positioned to handle tasks at the scale of full repositories, rather than individual files or snippets.

With continued emphasis on agentic workflows, secure sandboxes, and real-world evaluation metrics, Codex-Max sets the stage for the next generation of AI-assisted programming environments, while underscoring the importance of oversight in increasingly autonomous systems.
