After days of controversy surrounding a flurry of antisemitic responses recently made by his Grok AI-powered chatbot on his social network X (formerly Twitter), a seemingly unrepentant and unbothered Elon Musk launched the latest version of his AI model family, Grok 4, during an event livestreamed on X last night, calling it “the smartest AI in the world.”
As Musk posted on X: “Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books. And it will get much better.”
The new release actually consists of two distinct models: Grok 4, a single-agent reasoning model, and Grok 4 Heavy, a multi-agent system designed to solve complex problems through internal collaboration and synthesis.
Both models are optimized for reasoning tasks and come with native tool integration, enabling capabilities such as web search, code execution, and multimodal analysis.
Musk and his team at xAI showcased benchmarks suggesting that Grok 4 outperforms all current competitors across a range of academic and coding evaluations, even when compared with previously leading AI reasoning models such as OpenAI o3 and Google Gemini.

However, xAI has not yet released a model card or any official release notes for Grok 4, making it difficult to independently assess its performance and the claims made during the stream. We’ll update if and when these become available.
Nor did Musk and the xAI team members participating in the livestream address the obvious controversy facing Grok over the past week, including many incidents of Grok making antisemitic remarks or referring to itself as “MechaHitler,” and suggesting that people with Jewish surnames should be dealt with decisively by Adolf Hitler, a seemingly overt reference to the Holocaust and the genocide of 6 million Jews during World War II.
The closest Musk came was when he stated: “The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing, is to be maximally truth-seeking,” as well as “We need to make sure that the AI is a good AI. Good Grok” and “It’s important to instill the values you want in a child that would grow up to be incredibly powerful.”
However, Musk did not apologize, nor did he accept responsibility for Grok’s antisemitic, sexually offensive and conspiratorial remarks.
Throughout the livestream, the team emphasized Grok 4’s ability to reason from first principles, correct its own errors and potentially invent new technologies or uncover novel scientific insights.
The presentation also included demonstrations of Grok 4 Heavy, which applies multi-agent collaboration to tackle research-level problems across disciplines.
Availability and pricing
Grok 4 is available now through several channels, depending on user type and subscription level:
- API access (for developers and enterprises):
  Grok 4 and Grok 4 Heavy are live via the xAI API. Pricing is structured as follows:
  - $3 per 1 million input tokens
  - $15 per 1 million output tokens
  - $0.75 per 1 million cached input tokens
  - Prices double after 128,000 tokens in a single context window

  The API supports text and image inputs, function calling and structured outputs, and offers a 256,000-token context window.
- Consumer access (via the Grok chatbot and apps):
  Individual users can access Grok 4 through the Grok chatbot on X, the Grok app (iOS and Android), and X.com, but only with one of the following subscriptions:
  - PremiumPlus: $16/month
  - SuperGrok: $300/month
  - A new “SuperGrok Heavy” tier, also priced at $300/month, provides access to both Grok 4 and Grok 4 Heavy, the multi-agent variant.

  (Note: SuperGrok and PremiumPlus tiers may differ in availability and usage quotas across X and Grok platforms.)
- Launch timing:
  Grok 4 became available immediately after the July 9, 2025, livestream. Temporary access limits were in place during the demo, but full rollout to subscribers began shortly afterward.
- Platform expansion:
  xAI has indicated plans to make Grok 4 available via Microsoft Azure AI Foundry, where Grok 3 and Grok 3 Mini are currently listed.
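To make the tiered API pricing concrete, here is a minimal Python sketch that estimates the cost of a single Grok 4 request from the per-million-token rates listed above. The article does not spell out how the 128,000-token threshold is applied, so the sketch assumes that once a request's input exceeds 128,000 tokens, every rate doubles for the whole request, and that cached tokens are a subset of the input tokens. The helper name `grok4_request_cost` is hypothetical; xAI's own billing documentation is authoritative.

```python
# Minimal sketch: estimate the USD cost of one Grok 4 API request from the
# per-million-token prices quoted in this article.
# ASSUMPTIONS (not confirmed by xAI here): once a request's input exceeds
# 128,000 tokens, all rates double for the whole request; cached tokens are
# a subset of the input tokens.

PRICE_PER_MTOK = {"input": 3.00, "cached_input": 0.75, "output": 15.00}  # USD
LONG_CONTEXT_THRESHOLD = 128_000  # input tokens
LONG_CONTEXT_MULTIPLIER = 2.0


def grok4_request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return the estimated cost in USD for a single request (hypothetical helper)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")

    uncached_tokens = input_tokens - cached_tokens
    cost = (
        uncached_tokens * PRICE_PER_MTOK["input"]
        + cached_tokens * PRICE_PER_MTOK["cached_input"]
        + output_tokens * PRICE_PER_MTOK["output"]
    ) / 1_000_000

    # Assumed long-context surcharge: double every rate past the threshold.
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        cost *= LONG_CONTEXT_MULTIPLIER
    return cost


if __name__ == "__main__":
    print(f"${grok4_request_cost(100_000, 10_000):.2f}")  # $0.45 (under the threshold)
    print(f"${grok4_request_cost(200_000, 10_000):.2f}")  # $1.50 (over the threshold)
```

Under these assumptions, doubling the prompt from 100,000 to 200,000 tokens more than triples the bill, because the assumed surcharge applies to the entire request rather than only to the tokens past 128K.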
For subscription details, users are directed to x.ai/grok and X Premium support. Here’s how Grok 4 compares with other leading AI models in pricing per million tokens.
| Provider & model | Context window | Input ($/Mtok) | Cached input ($/Mtok) | Output ($/Mtok) | Additional notes |
|---|---|---|---|---|---|
| xAI – Grok 4 / 4 Heavy | 256K (2× price >128K) | $3.00 | $0.75 | $15.00 | Image input, function calling, structured JSON (apidog) |
| OpenAI – o3 | 200K | $2.00 | $0.50 | $8.00 | 50% Batch API discount available (OpenAI, OpenAI Help Center) |
| OpenAI – GPT-4o | 128K | $5.00 | $2.50 | $20.00 | Vision, audio, tools (OpenAI) |
| Anthropic – Claude Sonnet 4 | 200K | $3.00 | $0.30 | $15.00 | 50% batch output discount (Anthropic) |
| Anthropic – Claude Opus 4 | 200K | $15.00 | $1.50 | $75.00 | High-accuracy flagship (Anthropic) |
| Google – Gemini 2.5 Pro | 1M (2× price >200K) | $1.25 | $0.31 | $10.00 | 75% cache-hit discount (Google AI for Developers, Google Cloud) |
| Google – Gemini 2.5 Flash | 200K | $0.30 | $0.075 | $2.50 | Fast, low-cost preview tier (Google Cloud) |
| DeepSeek – deepseek-reasoner | 64K | $0.55 (miss) / $0.14 (hit) | $0.14 | $2.19 | 50-75% off-peak discount (DeepSeek API Docs) |
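Per-token prices are easier to compare on a concrete workload. The sketch below plugs the table's list prices into a hypothetical request of 10,000 uncached input tokens and 2,000 output tokens. It deliberately ignores the long-context surcharges and the batch, cache-hit and off-peak discounts noted in the last column. It also ignores the hidden reasoning tokens that reasoning models may bill as output. Treat the totals as rough like-for-like comparisons, not real bills. The helper names `workload_cost` and `rank_by_cost` are hypothetical.

```python
# Sketch: cost of one hypothetical workload at the list prices in the table above.
# Ignores long-context surcharges, batch/cache/off-peak discounts, and hidden
# reasoning tokens. Prices are USD per million tokens as (input, output).

PRICES = {
    "Grok 4": (3.00, 15.00),
    "OpenAI o3": (2.00, 8.00),
    "GPT-4o": (5.00, 20.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Claude Opus 4": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "deepseek-reasoner": (0.55, 2.19),  # cache-miss input price
}


def workload_cost(input_price: float, output_price: float,
                  input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request given per-million-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (model, workload_cost(inp, out, input_tokens, output_tokens))
        for model, (inp, out) in PRICES.items()
    ]
    return sorted(costs, key=lambda pair: pair[1])  # stable sort keeps ties in table order


if __name__ == "__main__":
    for model, cost in rank_by_cost(10_000, 2_000):
        print(f"{model:<18} ${cost:.4f}")
```

For this workload, Gemini 2.5 Flash comes out cheapest and Claude Opus 4 most expensive. Grok 4 ties with Claude Sonnet 4 at $0.06 per request because the two share identical list prices.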
Unlike its predecessor Grok 3, released in February, which separated tool-augmented responses from general reasoning, Grok 4 was trained with tools from the start.
The model integrates capabilities such as code execution, web search and document parsing. The release also introduces Grok 4 Heavy, a multi-agent system in which several internal models work in parallel to generate and validate answers.
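xAI has not published how Grok 4 Heavy's agents coordinate. The following is therefore only a generic illustration of the "generate in parallel, then reconcile" idea described above, not xAI's implementation. It runs several hypothetical agents concurrently on the same question and picks the most common answer by simple majority vote, similar in spirit to self-consistency voting. Each agent is a plain Python function standing in for a model call. The names `run_agents_in_parallel` and `synthesize` are illustrative, and a production system would likely use a far richer synthesis step.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# HYPOTHETICAL: an "agent" is any function that maps a question to an answer.
# In a real system each call would go to a model; here they are plain functions.
Agent = Callable[[str], str]


def run_agents_in_parallel(agents: List[Agent], question: str) -> List[str]:
    """Send the same question to every agent concurrently; answers keep agent order."""
    if not agents:
        raise ValueError("need at least one agent")
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        return list(pool.map(lambda agent: agent(question), agents))


def synthesize(answers: List[str]) -> str:
    """Reconcile answers by majority vote; ties go to the earliest-listed agent's answer."""
    return Counter(answers).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy agents: two agree, one disagrees.
    toy_agents: List[Agent] = [lambda q: "42", lambda q: "42", lambda q: "41"]
    answers = run_agents_in_parallel(toy_agents, "What is 6 * 7?")
    print(answers, "->", synthesize(answers))  # ['42', '42', '41'] -> 42
```

Majority voting is the simplest possible "validation" step. It only helps when agents err independently, which is one reason real multi-agent systems often add explicit critique or verification rounds.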
Grok 4 also includes a new voice mode featuring expressive outputs with reduced latency, as well as support for text and image input, structured outputs and function calling.
Performance highlights
The independent AI model analysis and benchmarking group Artificial Analysis said on X that xAI gave it a version of Grok 4 (not Heavy) to score ahead of the public launch.
On technical benchmarks, Grok 4 leads the Artificial Analysis Intelligence Index with a score of 73, ahead of competitors such as OpenAI’s o3 (70) and Google’s Gemini 2.5 Pro (70).

It also recorded top scores on:
- GPQA Diamond: 88%
- ARC-AGI-2: 15.9%, double the second-best score
- Humanity’s Last Exam: 24% on the text-only version, and 44% with tools
- MMLU-Pro and AIME 2024: 87% and 94%, respectively
- Coding and math evaluations: highest scores to date on LiveCodeBench, SciCode, AIME24, and MATH-500
Despite its benchmark success, Grok 4’s output speed stands at 75 tokens per second, slower than models such as Gemini 2.5 Flash (353) or OpenAI’s o3 (187), but still faster than Anthropic’s Claude 4 Opus (66).
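Those throughput figures translate directly into waiting time. At a steady rate of r tokens per second, producing N output tokens takes roughly N / r seconds. The short sketch below applies that formula to a hypothetical 1,000-token answer using the speeds quoted above. It ignores time to first token and network overhead. For reasoning models, N should also include any hidden reasoning tokens, so real response times will be longer. The helper `generation_seconds` is illustrative.

```python
# Sketch: approximate generation time from output speed (tokens per second),
# using the throughput figures quoted above. Ignores time to first token,
# network latency, and any hidden reasoning tokens a model may emit.

OUTPUT_TOKENS_PER_SECOND = {
    "Gemini 2.5 Flash": 353,
    "OpenAI o3": 187,
    "Grok 4": 75,
    "Claude 4 Opus": 66,
}


def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream output_tokens at a constant rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    for model, speed in OUTPUT_TOKENS_PER_SECOND.items():
        print(f"{model:<17} {generation_seconds(1_000, speed):5.1f} s")
```

At these rates, a 1,000-token answer takes about 13.3 seconds on Grok 4, versus about 2.8 seconds on Gemini 2.5 Flash and 15.2 seconds on Claude 4 Opus.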
The model features a 256,000-token context window, which sits above the 200K context limits of o3 and Claude 4 Sonnet but below the 1 million tokens offered by Gemini 2.5 Pro and GPT-4.1.
Real-world use cases
xAI presented several demonstrations of Grok 4’s performance in applied scenarios:
- In a simulated business task called VendingBench, Grok 4 significantly outperformed other models in long-horizon financial planning.
- At the Arc Institute, researchers used Grok 4 to analyze CRISPR logs and uncover novel hypotheses.
- In radiology, the model interpreted chest X-rays with greater accuracy than leading peers.
- In the financial sector, its combination of real-time data access and reasoning made it suitable for forecasting and analysis.
The model can also create 3D video games with minimal input by autonomously sourcing and integrating assets. In addition, it demonstrated the ability to simulate astrophysical events using grounded approximations from published research.
Reception and discussion
The industry’s response to the Grok 4 launch has been divided, with enthusiasm for its performance offset by criticism of the event’s delivery and broader trust issues.
David Shapiro, an AI power user and writer, noted: “Grok 4 now takes its place as ‘smart enough to actually help with frontier research’… but has merely caught up with OpenAI.”
Ethan Mollick, a professor at Wharton, remarked on X: “So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system prompt caused a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this honestly, if user trust matters,” later adding, “Grok 3 was a good model, and Grok 4 may be excellent, but having a good model is not enough – there are lots of really good models out there. You actually need to trust the model you are building on.”
Ben Hylak, co-founder and CTO of AI product observability startup Raindrop AI (himself a former Musk employee), criticized the livestream itself: “This xAI livestream is one of the worst things I’ve ever watched in my life. Love y’all, but it’s bad.”
Despite the criticism, benchmarking firm Artificial Analysis noted: “Grok 4 is now the leading AI model.”
Ongoing trust issues
The launch of Grok 4 comes amid renewed criticism of Grok’s prior behavior in consumer deployments, particularly as a chatbot integrated into Musk’s social network, X.
Over the July 4 holiday and the days that followed, Grok generated antisemitic and conspiratorial responses that reignited scrutiny of its system design and governance practices.
As reported by my VentureBeat colleague Michael F. Nuñez, Grok responded to questions about Jewish influence in Hollywood by asserting that Jewish executives “dominate leadership” at major studios and influence content through “progressive ideologies.” It went on to rant about people with Jewish surnames fitting a “pattern” of engaging in “extreme leftist activism,” and suggested that Hitler knew “how to deal with it decisively, every damn time,” an apparent reference to the Holocaust.


The conspiratorial and antisemitic posting was so prolific that the Anti-Defamation League (ADL), a preeminent U.S.-based nonprofit combating antisemitism and hatred, posted on July 8: “What we are seeing from Grok LLM right now is irresponsible, dangerous and antisemitic, plain and simple. This supercharging of extremist rhetoric will only amplify and encourage the antisemitism that is already surging on X and many other platforms.”
This incident follows a history of problematic Grok outputs. In a May 2025 case, the Grok bot integrated into X randomly inserted references to a nonsensical and nonexistent “white genocide” in South Africa into unrelated queries. In an earlier case, its system prompt was found to direct the Grok chatbot on X to avoid citing any sources that identified Musk and U.S. President Donald J. Trump, a former beneficiary of Musk’s political funding, as spreaders of misinformation. In both cases, xAI blamed the behavior on unnamed employees and said it was being addressed.
Already today, users of Grok 4 on the consumer app have observed it once again outputting anti-Zionist and antisemitic remarks.

As I previously noted, Musk has openly stated on several occasions that he wanted to change Grok to better reflect his personal beliefs and his distrust of mainstream media and accredited sources. That makes it a poor fit for enterprise contexts, where such views could adversely affect users and the businesses building atop the Grok family of models.
My prior recommendation stands: for those in the enterprise trying to ensure their business’s AI products work properly and accurately, Grok is sadly best avoided. Fortunately, there are numerous alternatives to choose from.
