Last week, on GPT-5 launch day, AI hype was at an all-time high.
In a press briefing beforehand, OpenAI CEO Sam Altman said GPT-5 is “something that I just don’t wanna ever have to go back from,” a milestone akin to the first iPhone with a Retina display. The night before the announcement livestream, Altman posted an image of the Death Star, building even more hype. On X, one user wrote that the anticipation “feels like christmas eve.” All eyes were on the ChatGPT-maker as people across industries waited to see if the publicity would deliver or disappoint. And by most accounts, the big reveal would fall short.
The hype for OpenAI’s long-awaited new model had been building for years — ever since the 2023 release of GPT-4. In a Reddit AMA with Altman and staff last October, users repeatedly asked about the release date of GPT-5, looking for details on its features and what would set it apart. One Redditor asked, “Why is GPT-5 taking so long?” Altman responded that compute was a limitation, and that “all of these models have gotten quite complex and we can’t ship as many things in parallel as we’d like to.”
But when GPT-5 appeared in ChatGPT, users were largely unimpressed. The sizable advancements they had been expecting seemed mostly incremental, and the model’s key gains were in areas like cost and speed. In the long run, however, that might be a solid financial bet for OpenAI — albeit a less flashy one.
People expected the world of GPT-5. (One X user posted that after Altman’s Death Star post, “everyone shifted expectations.”) And OpenAI didn’t downplay those projections, calling GPT-5 its “best AI system yet” and a “significant leap in intelligence” with “state-of-the-art performance across coding, math, writing, health, visual perception, and more.” Altman said in a press briefing that chatting with the model “feels like talking to a PhD-level expert.”
That hype made for a stark contrast with reality. Would a model with PhD-level intelligence, for example, repeatedly insist there were three “b’s” in the word blueberry, as some social media users found? Would it fail to count how many state names include the letter “R”? Would it incorrectly label a U.S. map with made-up states including “New Jefst,” “Micann,” “New Nakamia,” “Krizona,” and “Miroinia,” and render Nevada as an extension of California? People who used the bot for emotional support found the new system austere and distant, protesting so loudly that OpenAI brought support for an older model back. Memes abounded — one depicting GPT-4 and GPT-4o as formidable dragons with GPT-5 beside them as a simpleton.
The court of expert public opinion was not forgiving, either. Gary Marcus, a leading AI industry voice and emeritus professor of psychology at New York University, called the model “overdue, overhyped and underwhelming.” Peter Wildeford, co-founder of the Institute for AI Policy and Strategy, wrote in his review, “Is this the massive smash we were looking for? Unfortunately, no.” Zvi Mowshowitz, a popular AI industry blogger, called it “a good, but not great, model.” One Redditor on the official GPT-5 Reddit AMA wrote, “Someone tell Sam 5 is hot garbage.”
In the days following GPT-5’s release, the onslaught of unimpressed reviews has tempered a bit. The general consensus is that although GPT-5 wasn’t as significant an advancement as people expected, it offered upgrades in cost and speed, plus fewer hallucinations, and its switching system was all-new: the backend automatically directs each query to whichever model is best suited to answer it, so users no longer have to choose one themselves. Altman leaned into that narrative, writing, “GPT-5 is the smartest model we’ve ever done, but the main thing we pushed for is real-world utility and mass accessibility/affordability.”
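For readers curious what that kind of routing means in practice, here is a toy sketch of the general idea. OpenAI has not published how its router actually works; the heuristics, thresholds, and model names below are invented purely for illustration.

```python
# Toy sketch of automatic model routing, loosely inspired by the
# behavior described in GPT-5's launch materials. Everything here
# (model names, cues, the length threshold) is a made-up assumption.

def route_query(query: str) -> str:
    """Pick a hypothetical backend model based on rough query traits."""
    reasoning_cues = ("prove", "step by step", "debug", "why does")
    lowered = query.lower()
    # Long or reasoning-heavy prompts go to a slower "thinking" model;
    # everything else goes to a fast, cheap default.
    if len(query) > 500 or any(cue in lowered for cue in reasoning_cues):
        return "gpt-5-thinking"  # hypothetical model name
    return "gpt-5-main"          # hypothetical model name

print(route_query("What's the capital of France?"))
print(route_query("Debug this stack trace step by step"))
```

A real router would presumably use a learned classifier rather than keyword matching, but the user-facing effect is the same: one entry point, with the speed-versus-depth tradeoff decided behind the scenes.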
OpenAI researcher Christina Kim posted on X that with GPT-5, “the real story is usefulness. It helps with what people care about – shipping code, creative writing, and navigating health info – with more steadiness and less friction. We also cut hallucinations. It’s better calibrated, says ‘I don’t know,’ separates facts from guesses, and can ground answers with citations when you want.”
There’s a widespread understanding that, to put it bluntly, GPT-5 has made ChatGPT less eloquent. Viral social media posts complained that the new model lacked nuance and depth in its writing, coming off as robotic and cold. Even in GPT-5’s own marketing materials, OpenAI’s side-by-side comparison of GPT-4o and GPT-5-generated wedding toasts doesn’t seem like an unmitigated win for the new model — I personally preferred the one from 4o. When Altman asked Redditors if they thought GPT-5 was better at writing, he was met with an onslaught of comments defending the retired GPT-4o model instead; within a day, he’d acquiesced to pressure and at least temporarily returned it to ChatGPT.
But there’s one front where the model appears to shine brighter: coding. One iteration of GPT-5 currently tops the most popular AI model leaderboard in the coding category, with Anthropic’s Claude coming in second. OpenAI’s launch promotion showed off AI-generated games (a rolling ball mini-game and a typing speed race), a pixel art tool, a drum simulator, and a lofi visualizer. When I tried to vibe-code a puzzle game with the tool, it had a bunch of glitches, but I did find success with simpler projects like an interactive embroidery lesson.
That’s a big win for OpenAI, which has been going head-to-head in the AI coding wars with competitors like Anthropic and Google for a long while now. Businesses are willing to spend heavily on AI coding tools, and that’s one of the most realistic revenue generators for cash-burning AI startups.
OpenAI also highlighted GPT-5’s prowess in healthcare, but that remains mostly untested in practice — we likely won’t know how successful it is for a while.
AI benchmarks have come to mean less and less in recent years, since they change often and some companies cherry-pick which results they reveal. But overall, they may give us a reasonable picture of GPT-5. The model performed better than its predecessors on many industry tests, but that improvement wasn’t anything to write home about, according to many industry folks. As Wildeford put it, “When it comes to formal evaluations, it seems like GPT-5 was largely what would be expected — small, incremental increases rather than anything worthy of a vague Death Star meme.”
But if recent history has anything to say about it, those small, incremental increases could be more likely to translate into concrete profit than wowing individual consumers. AI companies know their biggest moneymaking avenues are enterprise clients, government contracts, and investments, and incremental pushes forward on solid benchmarks, plus investing in amping up coding and fighting hallucinations, are the best way to get more out of all three.