OpenAI's next frontier model has a codename: Spud. Pretraining completed on March 24, 2026, and the model is now in red-teaming and safety evaluation, the same 3–6 week window that has preceded every recent major OpenAI release. That puts the public launch window between April 14 and May 5, 2026. Here is everything confirmed, credibly leaked, and reasonably inferred about the model that will be called either GPT-5.5 or GPT-6, depending on how large the performance leap turns out to be.
Where the Codename "Spud" Comes From
OpenAI has a tradition of internal model codenames. GPT-5.4 was codenamed "Samurai." Spud is the internal name for the next major model in the GPT-5 series. The commercial name — whether it ships as GPT-5.5 or GPT-6 — will be decided after final benchmarks are in and OpenAI determines how significant the capability leap really is.
Sam Altman has confirmed the model exists and described it as "a very strong model" with a "big model feel." OpenAI President Greg Brockman added that Spud represents two years of dedicated research and is not an incremental patch but a significant architectural advance. The name decision matters: if OpenAI calls it GPT-6, it signals a generational leap rather than an iteration. Prediction markets currently price GPT-5.5 as more likely, suggesting that while the model is strong, it does not yet represent a qualitative shift to a completely new capability tier.
The Release Timeline in Detail
The timeline for Spud's public release can be reconstructed from confirmed data points and OpenAI's historical release cadence:
- March 24, 2026: Pretraining confirmed complete by multiple OpenAI employees on X.
- April 6, 2026: OpenAI launched the ChatGPT super app (ChatGPT + Codex + Atlas unified) running on GPT-5.4, suggesting a deliberate platform launch ahead of the new base model.
- April 9, 2026: Sam Altman states publicly that the next model is "a few weeks away."
- April 14–May 5, 2026: Projected release window based on the 3–6 week red-teaming cycle OpenAI has maintained for every recent major model.
Prediction markets currently give Spud over 90% probability of release before June 30, 2026, with the bulk of probability mass concentrated in late April. Based on our analysis of OpenAI's recent release cadence, the model appears on track for a late-April launch — which would make it one of the fastest pretraining-to-release cycles in the GPT-5 family.
Architecture: What We Know
No official architecture details have been released. However, credible leaks and inferences from OpenAI's published research point to three key architectural characteristics that set Spud apart from GPT-5.4.
Mixture-of-Experts at Scale
GPT-5.4 uses a Mixture-of-Experts (MoE) architecture with an estimated 2–3 trillion total parameters and roughly 500 billion active per inference pass. Spud is believed to expand this significantly, with estimates ranging from 3–5 trillion total parameters. The key insight: a larger MoE model can deliver dramatically better performance on specific tasks without a proportional increase in inference cost, because only a fraction of its parameters activate on any given query. This is what lets OpenAI improve capability while keeping per-token costs manageable for end users.
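The economics above come down to simple arithmetic. A minimal sketch, using the leaked parameter estimates quoted in this section (midpoints of the 2–3T and 3–5T ranges, and an assumed unchanged 500B active budget; none of these are confirmed figures):

```python
# Back-of-envelope sketch of why MoE scaling decouples capability from
# inference cost. All parameter counts are the leaked estimates discussed
# above, not confirmed numbers.

def active_fraction(total_params: float, active_params: float) -> float:
    """Fraction of the model's weights that fire on a given token."""
    return active_params / total_params

# GPT-5.4: ~2.5T total (midpoint of the 2-3T estimate), ~500B active.
gpt54 = active_fraction(2.5e12, 500e9)

# Hypothetical Spud: 4T total (midpoint of 3-5T), same 500B active budget.
spud = active_fraction(4e12, 500e9)

print(f"GPT-5.4 active fraction: {gpt54:.1%}")   # 20.0%
print(f"Spud active fraction:    {spud:.1%}")    # 12.5%
```

If the active-parameter budget really does hold roughly constant, total capacity grows ~60% while per-token compute stays flat, which is the whole point of the MoE bet.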
Extended Context Window
GPT-5.4 ships with a 128K-token context window as standard (with 1M context available via the API for enterprise customers at premium pricing). Spud is expected to offer 1M tokens natively, matching the standard context window of Gemini 3.1 Pro. This matters for enterprise use cases: full-codebase analysis, entire-contract review, multi-session agent memory. A 1M-token native window removes a persistent complaint from enterprise developers evaluating OpenAI against Google. For developers building agents that reason across large document sets, this architectural shift is more impactful than any benchmark improvement.
Native Multimodal Architecture
GPT-5.4 handles text, images, and audio through specialized pathways that are late-fused at the output stage. Spud is designed with multimodality as a first-class concern from the start of training, the approach Google used with Gemini 3 that yielded measurably better cross-modal reasoning. In practical terms: ask Spud to analyze a chart and write code that reproduces the data transformation it illustrates, and the result should be significantly more reliable than the same query on GPT-5.4.
Expected Performance Improvements Over GPT-5.4
GPT-5.4 is the current benchmark leader for knowledge work tasks (GDPval) and computer use (OSWorld-V), but trails Gemini 3.1 Pro on scientific reasoning (GPQA Diamond) and abstract thinking (ARC-AGI-2), and lags Claude Opus 4.6 on code generation (SWE-bench). Spud's reported improvements target exactly these gaps.
Reasoning and STEM Benchmarks
Based on internal evaluation scores that have circulated in developer channels, Spud achieves a meaningful improvement on GPQA Diamond — the graduate-level science and engineering reasoning benchmark where GPT-5.4 has historically underperformed relative to its knowledge work scores. ARC-AGI-2 performance, which measures novel pattern reasoning that cannot be solved by memorization, is also reportedly strong — consistent with architectural changes that enable genuine multi-step planning rather than sophisticated pattern matching against training data.
Agentic Reliability: The Commercial Differentiator
The most commercially significant improvement is in agentic task completion. GPT-5.4 currently achieves 38.1% on OSWorld-V — the benchmark for computer use on real desktop environments. Spud is expected to push past 50%, which matters because OSWorld-V tasks represent real enterprise workflows: booking travel, filing expenses, researching topics and drafting structured reports. Moving from 38% to 50%+ is not just a benchmark win. It is the difference between an agent that requires supervision on every step and one that can be trusted to run a multi-hour task unattended and deliver the right output.
Memory and Task Continuity
GPT-5.4 occasionally loses context coherence during very long multi-step tasks — particularly when switching between subtasks or when a tool call returns a large payload. Spud improves task continuity, maintaining state more reliably across complex workflows. This is a direct response to enterprise feedback: the primary complaint from power Codex users is not raw capability but reliability — agents that lose coherence 70% of the way through a three-hour job. Spud's improved memory handling directly targets this production pain point.
The Strategic Picture: OpenAI's Agentic Bet
Understanding Spud requires understanding what OpenAI is trying to win. The company discontinued Sora, its consumer video generation product, in Q1 2026. Enterprise now accounts for over 40% of revenue. The unified super app — ChatGPT, Codex, and the Atlas browser combined into one desktop interface — launched on April 6, 2026. Every strategic signal points in the same direction: OpenAI is building an AI operating system for enterprise knowledge work, and Spud is the reasoning engine that powers it.
This has real implications for the competitive landscape. Anthropic's Claude Opus 4.6 leads on coding benchmarks. Google's Gemini 3.1 Ultra, recently launched with a 2M-token context window, leads on multimodal reasoning. Microsoft Copilot is deeply embedded in Office 365 workflows. OpenAI's answer to all of these is a single model that performs at a high level across every category — then wins on depth through the Codex and Frontier platform integrations that no standalone model can match.
"We want to build an AI that can do basically any cognitive task that a human can do remotely." — Sam Altman, April 2026
Spud is the most direct expression of that ambition: a model designed not for a benchmark leaderboard but for a full day of real knowledge work, end to end.
What Developers Should Do Right Now
Should you wait for Spud before starting your next AI project? Almost certainly not — but here is how to position yourself to upgrade smoothly when it lands.
Build on GPT-5.4 with Upgrade-Friendly Patterns
GPT-5.4 is an excellent foundation today. The improvements in Spud will matter most for agentic applications and complex reasoning chains — not for standard retrieval-augmented generation pipelines, content generation, or classification tasks. If your use case falls into the latter category, the upgrade difference will be marginal. If you are building autonomous multi-step agents, it may be significant. Either way, build against the gpt-5.4-turbo API alias rather than a pinned model version string. When Spud arrives and its own alias goes live, your migration cost becomes a single configuration change.
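The alias pattern above can be as simple as routing the model identifier through one configuration point. A minimal sketch, assuming the `gpt-5.4-turbo` alias named in this section (the environment variable name `OPENAI_MODEL` is an illustrative convention, not an official one):

```python
import os

# Single point of configuration for the model identifier. Build against a
# floating alias, not a pinned snapshot, so an upgrade is a config change.
DEFAULT_MODEL = "gpt-5.4-turbo"   # alias discussed above

def resolve_model() -> str:
    """Read the model alias from the environment, falling back to the default."""
    return os.environ.get("OPENAI_MODEL", DEFAULT_MODEL)

# When a Spud alias goes live, migration is one environment variable:
#   export OPENAI_MODEL=<new-alias>
print(resolve_model())
```

Every call site then asks `resolve_model()` for the model name, so the codebase never hardcodes a version string.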
Prepare a Prompt Regression Suite Now
OpenAI typically provides preview API access to new models before the full public rollout. When Spud preview access opens — which could happen as early as the week of April 14 — run your critical prompts against it and measure for regressions. Major model upgrades occasionally break prompts that relied on specific GPT-5.4 behaviors: particular formatting tendencies, refusal thresholds, or output structure defaults. Having a benchmark suite for your application makes this a planned activity rather than a surprise production incident.
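A regression suite of this kind does not need to be elaborate. A minimal sketch, where `call_model` is a stand-in for your real API client and the two checks are illustrative examples of the formatting and preamble behaviors mentioned above (all names here are hypothetical):

```python
# Minimal prompt-regression harness sketch. Swap `fake_model` for a real
# client call when preview access opens; the checks encode the behaviors
# your application depends on.

from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptCase:
    name: str
    prompt: str
    check: Callable[[str], bool]   # True if the output is acceptable

def run_suite(cases: list[PromptCase],
              call_model: Callable[[str], str]) -> dict[str, bool]:
    """Run every case against a model and report pass/fail per case."""
    return {c.name: c.check(call_model(c.prompt)) for c in cases}

# Example checks for behaviors prompts often silently rely on.
cases = [
    PromptCase("json_output", "Return {} as JSON",
               lambda out: out.strip().startswith("{")),
    PromptCase("no_preamble", "Say OK",
               lambda out: not out.lower().startswith("sure")),
]

def fake_model(prompt: str) -> str:   # stub; replace with a real client
    return '{"status": "ok"}' if "JSON" in prompt else "OK"

print(run_suite(cases, fake_model))
# {'json_output': True, 'no_preamble': True}
```

Run the same suite against GPT-5.4 and the Spud preview, diff the two result dicts, and every regression becomes a named, reproducible failure instead of a production surprise.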
Model Pricing: What to Expect
GPT-5.4 currently costs $2.50 per million input tokens and $10 per million output tokens on the standard tier. Spud's pricing has not been announced. OpenAI has generally held or slightly reduced token pricing with each new model release as inference efficiency improves — but a significantly more capable model at launch could command a premium, at least initially. Build your cost models around GPT-5.4 pricing and treat any Spud discount as upside rather than a baseline assumption. Enterprise negotiated pricing will likely follow a few weeks after the public launch.
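The conservative approach is to anchor the cost model on GPT-5.4's published rates. A minimal sketch using the figures above (the 200M/40M monthly traffic split is an invented example):

```python
# Cost model anchored on GPT-5.4 standard-tier pricing: $2.50 per million
# input tokens, $10 per million output tokens. Budget at these rates and
# treat any Spud discount as upside.

INPUT_PER_M = 2.50
OUTPUT_PER_M = 10.00

def monthly_cost(input_tokens: float, output_tokens: float) -> float:
    """Dollar cost for a month of traffic at GPT-5.4 standard-tier rates."""
    return (input_tokens / 1e6 * INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M)

# Hypothetical workload: 200M input + 40M output tokens per month.
print(f"${monthly_cost(200e6, 40e6):,.2f}")  # $900.00
```

Note that output tokens are 4x the price of input tokens, so agentic workloads that generate long reports or large code diffs dominate the bill even when prompts are short.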
The Bottom Line
Spud is expected to be the most capable model OpenAI has ever shipped — not by a narrow margin on a single benchmark, but across the full spectrum of knowledge work tasks that enterprise customers actually pay for. It arrives into a competitive landscape where Gemini 3.1 Ultra, Claude Opus 4.6, and GPT-5.4 all excel at different things. Spud's strategic bet is breadth at high competence: a model you can deploy across your entire enterprise stack without worrying about which type of task maps to which model provider.
Whether it ships in late April or slips to early May, the model is close. The next few weeks will reveal whether OpenAI's two-year research investment has delivered the capability leap the company needs to extend its lead in a field running faster than at any point in AI history.
Want to compare today's leading models before Spud lands? Read our deep dive on GPT-5.4 vs Gemini 3.1 Pro vs Claude Opus 4.6 benchmarks to understand the current baseline.