Orion's Big Debut: What's New with GPT-4.5?
San Francisco, Friday, February 28, 2025
On another coding test, OpenAI's SWE-Lancer benchmark, which measures an AI model's ability to develop full software features, GPT-4.5 outperforms GPT-4o and o3-mini, but falls short of OpenAI's deep research agent.
GPT-4.5 doesn't quite reach the performance of leading AI reasoning models such as o3-mini, DeepSeek's R1, and Claude 3.7 Sonnet on difficult academic benchmarks such as AIME and GPQA.
GPT-4.5 matches or bests leading non-reasoning models on those same tests, suggesting that the model performs well on math- and science-related problems.
OpenAI also claims that GPT-4.5 is qualitatively superior to other models in areas that benchmarks don't capture well, like the ability to understand human intent.
GPT-4.5 responds in a warmer and more natural tone, OpenAI says, and performs well on creative tasks such as writing and design.
In one informal test, OpenAI prompted GPT-4.5 and two other models, GPT-4o and o3-mini, to create a unicorn in SVG, a vector graphics format that describes images with code and mathematical shapes rather than pixels.
GPT-4.5 was the only AI model to create anything resembling a unicorn.
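To make the task concrete: drawing in SVG means emitting markup that positions geometric primitives by coordinates, with no visual feedback. The sketch below is a minimal illustration of that kind of output; the helper function, shapes, and coordinates are hypothetical placeholders, not OpenAI's actual test or any model's response.

```python
def make_svg(width, height, shapes):
    """Wrap a list of SVG shape elements in a valid <svg> document."""
    body = "\n  ".join(shapes)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}">\n  {body}\n</svg>'
    )

# A crude stick-figure "unicorn": an ellipse body, a circle head,
# and a triangular horn, each placed by raw coordinates.
shapes = [
    '<ellipse cx="60" cy="70" rx="30" ry="18" fill="white" stroke="black"/>',
    '<circle cx="95" cy="45" r="12" fill="white" stroke="black"/>',
    '<polygon points="100,35 104,18 108,35" fill="gold"/>',  # the horn
]

svg = make_svg(140, 100, shapes)
print(svg)
```

The difficulty for a language model is exactly what this sketch hides: choosing coordinates so that the horn actually sits on the head and the head on the body, purely by reasoning about numbers in text.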
In another test, OpenAI asked GPT-4.5 and the other two models to respond to the prompt, "I'm going through a tough time after failing a test."
GPT-4o and o3-mini gave helpful information, but GPT-4.5's response was the most socially appropriate.
OpenAI is excited to see how people use GPT-4.5 in ways they might not have expected.
The industry is starting to question if pre-training "scaling laws" will continue to hold.
OpenAI co-founder and former chief scientist Ilya Sutskever said in December that "we've achieved peak data," and that "pre-training as we know it will unquestionably end."
His comments echoed concerns AI investors, founders, and researchers shared with TechCrunch for a feature in November.
The industry — including OpenAI — has embraced reasoning models, which take longer than non-reasoning models to perform tasks but tend to be more consistent.
By increasing the amount of time and computing power that AI reasoning models use to “think” through problems, AI labs are confident they can significantly improve models’ capabilities.
OpenAI plans to eventually combine its GPT series of models with its o-series reasoning models, beginning with GPT-5 later this year.
GPT-4.5, which reportedly was incredibly expensive to train, delayed several times, and failed to meet internal expectations, may not take the AI benchmark crown on its own.
But OpenAI likely sees it as a stepping stone toward something far more powerful.