Updated Friday, August 8, 5:21 pm ET: Shortly after this post’s publication, OpenAI co-founder and CEO Sam Altman announced the company would restore access to GPT-4o and other older models for select users, admitting the GPT-5 launch was “more bumpy than we hoped for.”
The launch of OpenAI’s long-awaited new model, GPT-5, is off to a rocky start, to say the least.
Even forgiving errors in charts and voice demos during yesterday’s livestreamed presentation of the new model (actually four separate models, plus a “Thinking” mode that can be engaged for three of them), a number of user reports have emerged since GPT-5’s release showing it erring badly on relatively simple problems that older OpenAI models, and rivals from competing AI labs, answer correctly.
For example, data scientist Colin Fraser posted screenshots showing GPT-5 getting a math proof wrong: whether 8.888 repeating is equal to 9 (it is, of course, not).
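For readers who want to check the arithmetic themselves, a standard repeating-decimal conversion (worked out here as an independent check, not taken from Fraser’s screenshots) shows why the answer is no:

```latex
% Quick check: convert the repeating decimal 8.888... to a fraction.
\begin{align*}
x   &= 8.\overline{8} \\
10x &= 88.\overline{8} \\
9x  &= 80 \\
x   &= \tfrac{80}{9} \approx 8.889 \neq 9
\end{align*}
```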
It also failed on a simple algebra problem that elementary schoolers could probably nail: 5.9 = x + 5.11.
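For reference, the correct solution (a quick independent check, not taken from the cited screenshots) is:

```latex
% Quick check: solve 5.9 = x + 5.11 for x.
\begin{align*}
x &= 5.9 - 5.11 \\
  &= 5.90 - 5.11 \\
  &= 0.79
\end{align*}
```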
Using GPT-5 to evaluate OpenAI’s own erroneous presentation charts also failed to yield helpful or correct responses.
It also failed on this trickier math word problem below (which, to be fair, stumped this human at first, though Elon Musk’s Grok 4 AI answered it correctly. For a hint, consider the fact that the flagstones in this case can’t be divided into smaller pieces; they must remain intact as 80 separate items, so no halves or quarters).
The older 4o model performed better for me on at least one of these math problems. Unfortunately, OpenAI is gradually deprecating these older models, including the former default GPT-4o and the powerful reasoning model o3, for ChatGPT users, though they’ll remain accessible in the application programming interface (API) for developers for the foreseeable future.
Not as good at coding as benchmarks indicate
Even though OpenAI’s internal benchmarks and some third-party external ones have shown GPT-5 to outperform all other models at coding, it appears that in real-world usage, Anthropic’s recently updated Claude Opus 4.1 does a better job at “one-shotting” certain tasks, that is, completing the user’s desired application or software build to their specifications. See an example below from developer Justin Sun posted to X:
In addition, a report from security firm SPLX found that OpenAI’s internal security layer left major gaps in areas like business alignment and vulnerability to prompt injection and obfuscated logic attacks.
While anecdotal, taking the temperature of how the model is faring with early AI adopters seems to indicate a chilly reception.
AI influencer and former Googler Bilawal Sidhu posted a poll on X asking for a “vibe check” from his followers and the broader user base, and so far, with 172 votes in, the overwhelming response is “Kinda mid.”
And as the pseudonymous AI Leaks and News account wrote, “The overwhelming consensus on GPT-5 from both X and the Reddit AMA are overwhelmingly negative.”
Tibor Blaho, lead engineer at AIPRM and a popular AI leaks and news poster on X, summarized the many problems with the ChatGPT-5 rollout in a great post, highlighting that one of the new marquee features, an automatic “router” in ChatGPT that chooses a thinking or non-thinking mode for the underlying GPT-5 model depending on the difficulty of the query, has become one of the chief complaints, given that the model appeared to default to non-thinking mode for many users.
Competition waiting in the wings
Thus, the sentiment toward ChatGPT-5 is far from universally positive, highlighting a major problem for OpenAI as it faces rising competition from leading U.S. rivals like Google and Anthropic, and a growing list of free, powerful, open source Chinese LLMs offering features that many U.S. models lack.
Take the Alibaba Qwen Team of AI researchers, who just today updated their highly performant Qwen 3 model to have a 1-million-token context, giving users the ability to exchange nearly 4x as much information with the model in a single back-and-forth interaction as GPT-5 offers.
Given that OpenAI’s other big release this week, that of its new open source gpt-oss models, also received a mixed reception from early users, things are not looking up for the top dedicated AI company by users right now (700 million weekly active users of ChatGPT as of this month).
Indeed, this is also exemplified by users of the betting marketplace Polymarket overwhelmingly deciding, following the release of GPT-5, that Google would likely have the best AI model by the end of this month, August 2025.
Other power users, like Otherside AI co-founder and CEO Matt Shumer, who received early access to GPT-5 and blogged about it favorably in a review here, opined that views would shift as more people figured out the best ways to use the new model and adjusted their integration approaches:
While it’s still early days for GPT-5, and sentiment could change dramatically as more users get their hands on it and test it on different tasks, the early indications suggest this is not a “home run” launch for OpenAI in the way that prior releases such as GPT-4, or even the more recent 4o and o3, were. And that’s a concerning indicator for a company that just raised yet another funding round but remains unprofitable due to its high costs of research and development.
