DeepSeek V4-Pro and V4-Flash released April 24, 2026. MIT licensed, 1.6T params, 80.6% SWE-bench, $1.74/M tokens. Complete developer guide with API code.
DeepSeek released V4-Pro and V4-Flash today, April 24, 2026, roughly fifteen months after DeepSeek-R1 reset the world’s expectations for open-source AI. V4-Pro is the largest open-weight model ever released: 1.6 trillion total parameters, 49 billion active per forward pass, a 1 million token context window, and 80.6% on SWE-bench Verified, within 0.2 percentage points of Claude Opus 4.6. Both models ship under the MIT license. V4-Pro’s API price is $1.74 per million input tokens. GPT-5.5, released yesterday, costs $5 per million input tokens. Against Claude Opus 4.6, the gap between open and closed has narrowed to rounding error on SWE-bench Verified; GPT-5.5 still holds a benchmark lead, at roughly three times the input price.
This guide covers everything you need to know as a developer: what was released, the full benchmark picture, how the pricing stacks up against GPT-5.5 and the frontier closed-source models, how the MoE architecture enables these efficiency numbers, how to call both models in your existing OpenAI-compatible code, and which model fits which task.
What DeepSeek Released
The V4 family is two models released simultaneously under the MIT license on April 24, 2026:
- DeepSeek-V4-Pro: 1.6 trillion total parameters / 49 billion active per forward pass. 1 million token context window. $1.74/M input tokens, $3.48/M output tokens.
- DeepSeek-V4-Flash: 284 billion total parameters / 13 billion active per forward pass. 1 million token context window. $0.14/M input tokens, $0.28/M output tokens.
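To make the list prices concrete, here is a minimal sketch that estimates the cost of a single request. The prices come from the list above; the dictionary keys (`v4-pro`, `v4-flash`) are labels chosen for this example rather than confirmed API model identifiers, and the token counts are hypothetical.

```python
# Prices in USD per million tokens, copied from the list above.
# The keys are illustrative labels, not confirmed API model IDs.
PRICES = {
    "v4-pro": {"input": 1.74, "output": 3.48},
    "v4-flash": {"input": 0.14, "output": 0.28},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at list price."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical request: 20,000 prompt tokens and 2,000 completion tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 20_000, 2_000):.5f}")
```

At these list prices the same request is about 12 times cheaper on V4-Flash than on V4-Pro, since both the input and output rates differ by that factor.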
The parameter counts follow DeepSeek’s Mixture-of-Experts pattern: total parameters represent the full knowledge base stored across all experts; active parameters represent the subset actually computed for each token. A 1.6T model that activates 49B parameters per token needs roughly as much compute per token as a 49B dense model, while retaining the breadth of knowledge encoded across 1.6 trillion weights. The saving is in compute, not storage: all 1.6 trillion weights must still be held in memory to serve the model.
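The total-versus-active distinction can be illustrated with a toy top-k router. This is a generic sketch of Mixture-of-Experts gating, not DeepSeek’s actual routing scheme, which this article does not describe; the expert count, `k`, and the scalar "experts" are all made up for illustration.

```python
import math
import random


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    m = max(gate_logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(gate_logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_layer(x, experts, gate_logits, k):
    """Evaluate only the k selected experts and return their weighted sum."""
    routed = top_k_route(gate_logits, k)
    output = sum(w * experts[i](x) for i, w in routed)
    return output, [i for i, _ in routed]


# Toy setup: 8 scalar "experts", 2 active per token (illustrative numbers only).
random.seed(0)
experts = [lambda x, s=s: s * x for s in range(1, 9)]
gate_logits = [random.gauss(0, 1) for _ in experts]
y, used = moe_layer(1.0, experts, gate_logits, k=2)
print(f"experts used: {used}, output: {y:.3f}")

# Active fraction implied by the published parameter counts.
print(f"V4-Pro active fraction:   {49e9 / 1.6e12:.1%}")
print(f"V4-Flash active fraction: {13e9 / 284e9:.1%}")
```

Only the two selected experts are ever called for a given input, which is why compute tracks the active parameter count rather than the total.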
The “year after Sputnik” framing is intentional. On January 20, 2025, DeepSeek-R1 matched OpenAI’s o1 reasoning model at a dramatically lower cost and released the weights openly. A week later, NVIDIA’s stock dropped about 17% in a single session, and the release forced a public reckoning with the assumption that frontier AI required US-scale compute investment. V4-Pro applies the same pattern to general-purpose frontier models: match or approach the benchmark leaders, undercut GPT-5.5’s input price by roughly a factor of three ($1.74 versus $5 per million tokens), and open the weights.
Benchmark Deep Dive
SWE-bench Verified: The Coding Benchmark That Matters
SWE-bench Verified is the most rigorous public coding benchmark available. It presents real GitHub issues from major open-source repositories — Django, scikit-learn, sympy, and others — and scores the model on whether it can write a patch that fixes the reported bug without breaking the existing test suite. There are no hints, no multiple-choice options, and no partial credit for code that almost works.
DeepSeek-V4-Pro scores 80.6% on SWE-bench Verified. Claude Opus 4.6 scores approximately 80.8%. The 0.2 percentage point difference is within run-to-run variance. For the practical coding tasks that developers actually need — writing functions, fixing bugs, refactoring modules, implementing features from specs — V4-Pro is functionally at parity with Anthropic’s best public model.
GPT-5.5, released on April 23, 2026, scores 88.7% on SWE-bench — a meaningful lead for the hardest coding tasks. But GPT-5.5 costs $5/M input tokens versus V4-Pro’s $1.74/M, and $30/M versus $3.48/M on output. Teams running coding assistants should benchmark their specific task distribution before assuming GPT-5.5’s lead on the aggregate benchmark translates to their actual workload.
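One way to weigh a higher pass rate against a higher price is expected cost per resolved task. The sketch below uses the prices and benchmark scores quoted above. The tokens-per-attempt figures are hypothetical, and treating a benchmark score as the success rate on your own workload is an assumption you should replace with measured numbers.

```python
def cost_per_resolved(pass_rate, in_price, out_price, in_tokens, out_tokens):
    """Expected USD spent per successfully resolved task, one attempt per task.

    Prices are USD per million tokens; pass_rate is a fraction in (0, 1].
    """
    attempt_cost = (in_tokens * in_price + out_tokens * out_price) / 1_000_000
    return attempt_cost / pass_rate


# Hypothetical workload: 50k input and 5k output tokens per attempt.
IN_TOK, OUT_TOK = 50_000, 5_000

v4_pro = cost_per_resolved(0.806, 1.74, 3.48, IN_TOK, OUT_TOK)
gpt_55 = cost_per_resolved(0.887, 5.00, 30.00, IN_TOK, OUT_TOK)

print(f"V4-Pro:  ${v4_pro:.3f} per resolved task")
print(f"GPT-5.5: ${gpt_55:.3f} per resolved task")
print(f"ratio:   {gpt_55 / v4_pro:.1f}x")
```

The metric ignores the cost of the failures themselves (review time, retries, escalation), so it favors the cheaper model more than a full accounting would when failed tasks are expensive.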
Reasoning: V4-Pro-Max vs. the Frontier
DeepSeek also released V4-Pro-Max, an extended reasoning variant that uses chain-of-thought token budgets similar to OpenAI’s o-series models. V4-Pro-Max outperforms GPT-5.2 and Gemini 3.0 Pro on standard reasoning benchmarks and falls marginally short of GPT-5.4 and Gemini 3.1 Pro. For most enterprise reasoning tasks — legal analysis, financial modeling, technical documentation — V4-Pro-Max sits at the level that GPT-5.4 occupied three months ago, at a substantially lower price point.
Context Window: Does 1M Tokens Actually Work?
Extended context windows are frequently announced and rarely perform well at scale. The KV cache requirements for long-context inference grow linearly with sequence length. At 1 million tokens, a model using conventional attention can need tens to hundreds of gigabytes of cache per request, making concurrent serving economically unviable.
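The linear growth is easy to see with the standard KV cache size formula for multi-head or grouped-query attention. The configuration below is a hypothetical dense-attention model chosen for illustration; it is not V4’s architecture, which this article does not specify.

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    """Bytes of KV cache for one sequence under standard MHA/GQA attention.

    The leading factor of 2 accounts for storing both keys and values;
    bytes_per_elem=2 corresponds to 16-bit storage.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * seq_len


# Hypothetical configuration (NOT V4's actual architecture).
cfg = dict(n_layers=60, n_kv_heads=8, head_dim=128)

for tokens in (8_000, 128_000, 1_000_000):
    gib = kv_cache_bytes(tokens, **cfg) / 2**30
    print(f"{tokens:>9,} tokens -> {gib:8.1f} GiB")
```

Because every term except `seq_len` is fixed by the architecture, the only ways to shrink the cache at a given context length are to change the attention design or to store fewer bytes per element.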
DeepSeek published a key efficiency figure: in the 1M-token setting, V4-Pro requires 10% of the KV cache that V3.2 does. That is a tenfold reduction, an architectural change rather than a tuning gain, and it makes 1 million token contexts practical to serve at commercial API scale. Published needle-in-a-haystack evaluations show strong recall across the full 1M token range, which has historically been the failure point of extended-context claims from other providers.
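If you want to check long-context recall on your own data rather than rely on published figures, a needle-in-a-haystack probe is simple to build. The sketch below is a generic version of the technique, not DeepSeek’s evaluation harness; `ask_model` is a hypothetical callable you would wire to your API client, and the stand-in used here is a fake model that only sees the start of the prompt.

```python
def build_haystack(filler, needle, total_sentences, depth):
    """Insert `needle` at fractional `depth` (0.0 = start, 1.0 = end) of the filler."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    sentences = [filler] * total_sentences
    sentences.insert(round(depth * total_sentences), needle)
    return " ".join(sentences)


def recall_at_depths(ask_model, depths, total_sentences=1000):
    """Return {depth: bool}: whether the model's answer contains the secret."""
    secret = "7421"
    needle = f"The access code for the archive is {secret}."
    question = "What is the access code for the archive?"
    results = {}
    for d in depths:
        context = build_haystack("The sky was grey that day.", needle, total_sentences, d)
        answer = ask_model(f"{context}\n\n{question}")
        results[d] = secret in answer
    return results


# Stand-in for a real API call: a fake "model" that only reads the
# first 2,000 characters of the prompt and echoes them back.
def truncating_model(prompt):
    return prompt[:2000]


print(recall_at_depths(truncating_model, [0.0, 0.5, 1.0]))
```

With the truncating stand-in, only the needle at depth 0.0 is recovered, which is the failure shape this kind of probe is designed to expose. A real evaluation would vary context length as well as depth and use more than one needle.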