On April 18, 2026, Cerebras Systems made its S-1 filing public, reviving an IPO attempt that collapsed in 2025 under national security scrutiny. The company is now targeting a $35 billion valuation and a $2 billion raise, backed by the largest non-NVIDIA AI infrastructure contract ever signed: a $10 billion, multi-year deal with OpenAI. If you have used ChatGPT in the past three months and noticed responses arriving unusually fast, that speed may have partly been powered by Cerebras’ wafer-scale chips. Here is everything developers and AI builders need to understand about what this filing reveals — and what it means for the future of AI inference.
What Makes Cerebras Different: The Wafer-Scale Engine
Most AI workloads run on NVIDIA GPUs — the H100 and the newer H200 — assembled into racks of hundreds or thousands of chips that communicate over high-speed interconnects. This architecture is extraordinarily effective for training: you divide a massive model across many chips, run parallel gradient computations, and aggregate the results. Training is a throughput problem, and GPU clusters solve it well.
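To make that framing concrete, here is a toy sketch of one data-parallel training step in Python with NumPy. Each simulated chip computes a gradient over its own shard of the batch, and the results are averaged, which is the job an all-reduce does in a real GPU cluster. The shard count, gradient function, and learning rate are illustrative stand-ins, not anyone's real training setup.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=8)  # toy model parameters

def shard_gradient(weights: np.ndarray, shard_seed: int) -> np.ndarray:
    # Stand-in for a full forward/backward pass over one shard of the
    # batch; in a real cluster each chip would do this independently.
    return np.random.default_rng(shard_seed).normal(size=weights.shape)

# Four "chips" compute gradients in parallel on their own data shards...
grads = [shard_gradient(weights, seed) for seed in range(4)]

# ...then the gradients are averaged (the all-reduce) and applied once.
weights -= 0.01 * np.mean(grads, axis=0)
print(weights)
```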
Inference is a different problem. When ChatGPT responds to your message, it generates one token at a time, and each token depends on all the tokens before it. This sequential dependency means raw throughput matters less than latency — how fast you can produce each individual token. GPU clusters are optimized for the wrong thing when doing inference at scale.
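A minimal sketch shows why this makes decoding latency-bound. The toy_model function and the 10ms-per-step figure below are invented stand-ins, not real model numbers; the point is only that each step consumes the previous step's output, so wall-clock time grows linearly with output length no matter how much parallel hardware sits behind each individual forward pass.

```python
import time

def toy_model(tokens: list[int]) -> int:
    # Stand-in for a forward pass. It must see the full sequence so
    # far, which is why output positions cannot be computed in parallel.
    time.sleep(0.01)  # pretend each forward pass costs 10 ms
    return len(tokens)  # dummy "next token"

def generate(prompt: list[int], max_new_tokens: int) -> list[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(toy_model(tokens))  # strictly sequential steps
    return tokens

start = time.perf_counter()
generate([1, 2, 3], max_new_tokens=50)
# ~0.5s: 50 tokens * 10 ms/token. Reducing per-token latency is the
# only way to reduce this number; adding more parallel chips is not.
print(f"elapsed: {time.perf_counter() - start:.2f}s")
```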
Cerebras built its architecture from the ground up for this constraint. The Wafer-Scale Engine 3 (WSE-3) is a single chip that spans an entire silicon wafer: 46,225 square millimeters of silicon, 4 trillion transistors, and 900,000 AI-optimized processing cores. Memory is distributed across the wafer alongside the cores, and a high-bandwidth on-wafer fabric connects them, eliminating the inter-chip communication bottleneck that slows GPU inference.
The result is inference that OpenAI’s own benchmarks showed running up to 15 times faster than an equivalent GPU cluster for the same model. For agentic workloads — where an AI system makes dozens of sequential model calls to complete a single task — that speed difference compounds into a dramatically better user experience and meaningfully lower cost per task.
The IPO Journey: From Regulatory Collapse to $35B Comeback
Cerebras’ path to this filing has not been linear. The company first filed a confidential S-1 with the SEC in late 2024, positioning itself for a 2025 IPO. That attempt collapsed when the Committee on Foreign Investment in the United States (CFIUS) flagged concerns about G42, the Abu Dhabi-based AI holding company that held a significant position in Cerebras. The national security review dragged on, the market window closed, and Cerebras withdrew its filing in late 2025.
The 2026 comeback is structurally different. Two things changed: Cerebras restructured the G42 relationship to reduce the security-review exposure, and the company signed a $10 billion contract with OpenAI in January 2026 — a deal that transformed the revenue and customer diversity story it could tell to public investors.
The filing now targets a $35 billion valuation (up from the $23 billion private valuation the company carried through late 2025), with plans to raise approximately $2 billion in the offering. The timing window is Q2 2026, meaning the listing could happen within weeks of this writing.
The S-1 Numbers: What the Filing Actually Reveals
Revenue: Cerebras generated $510 million in revenue in 2025, up 76% year over year. The growth breakdown reveals something important about where the business is heading: hardware revenue grew 69%, while cloud and services revenue grew 99%. Cloud is becoming a larger share of the mix — a healthier margin profile than pure hardware sales, and a stickier revenue stream.
Profitability: Like most capital-intensive chip companies at this stage, Cerebras remains unprofitable. The combination of R&D costs for next-generation wafer-scale designs and the infrastructure required to operate its cloud service has kept the company in the red. This is not unusual for the category: NVIDIA itself spent years investing heavily in data-center GPUs and the CUDA ecosystem before the AI training boom made that business extraordinarily profitable.
Customer concentration: This is the clearest risk in the filing. The OpenAI deal represents a significant majority of Cerebras’ near-term contracted revenue. The $10 billion figure is the headline, but the underlying reality is that Cerebras has a single customer that accounts for most of its current business trajectory. For investors, this is the central underwriting question: how quickly can Cerebras diversify its customer base before that concentration becomes a liability?
Valuation multiple: At $35 billion on $510 million in revenue, Cerebras is asking investors to pay approximately 69x revenue. That is aggressive even by 2026 AI-company standards, and it prices in both the growth trajectory and the strategic importance of the inference market. NVIDIA trades at roughly 25–30x revenue. The premium Cerebras is seeking reflects the scarcity value of wafer-scale inference capability — but it leaves minimal room for execution missteps.
The OpenAI Deal: Largest Non-NVIDIA AI Infrastructure Contract Ever
The $10 billion, multi-year agreement between OpenAI and Cerebras — signed in January 2026 and publicly disclosed in the S-1 — is the structural foundation of this IPO. It is worth understanding precisely what the deal covers.
OpenAI is securing 750 megawatts of compute capacity via Cerebras' cloud services through 2028. Critically, this is inference-only: OpenAI is not using Cerebras for model training, which remains dominated by NVIDIA. Training a frontier model still requires the kind of massively parallel distributed computing that GPU clusters do best. But serving those models to 500 million users at low latency is a different problem, and that is precisely where Cerebras is positioned.
The strategic logic for OpenAI is straightforward: the company competes partly on response speed. Higher tokens-per-second means a better user experience for ChatGPT, better performance for agentic Codex workflows, and lower cost per query at scale. Paying $10 billion to Cerebras over multiple years is cost-effective compared to the compute bill a pure NVIDIA inference stack would generate for the same throughput.
For Cerebras, the deal does more than provide revenue. It validates the architecture. If OpenAI — which runs one of the most demanding AI inference workloads on earth — has committed to Cerebras at scale, the technology has passed the most demanding real-world test possible.
The Amazon AWS Partnership: First Hyperscaler Deployment
One month after the OpenAI deal closed, Cerebras announced a partnership with Amazon Web Services: AWS will become the first hyperscaler to deploy Cerebras chips directly in its own data centers. This is not a reseller agreement. AWS is integrating Cerebras silicon into its own infrastructure, which means Cerebras-powered inference will eventually be offered as a first-party AWS product.
The significance is substantial. AWS has an existing relationship with NVIDIA through GPU instances (the P4 and P5 families), but NVIDIA hardware is extraordinarily expensive and supply-constrained. Having an alternative inference architecture that AWS can offer to enterprise customers addresses both problems: it gives AWS pricing leverage with NVIDIA, and it gives enterprise customers faster inference for latency-sensitive applications without waiting for NVIDIA allocation.
For developers building on AWS, this partnership points toward a future where Cerebras-powered inference is available through Amazon Bedrock or SageMaker — potentially offering the same models at significantly lower latency than today’s GPU-backed endpoints.
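If that future arrives, the developer-facing change could be as small as a model identifier. The sketch below uses Bedrock's real Converse API via boto3, but the cerebras.* model ID is invented for illustration; no such Bedrock model has been announced, and the actual naming and availability are assumptions.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical: a Cerebras-backed model exposed through Bedrock would
# plug into the same Converse call as any GPU-backed model. The model
# ID below is invented; swap in a real one (e.g. an Anthropic or Meta
# model ID) to run this today.
response = client.converse(
    modelId="cerebras.llama-3-70b-instruct",  # hypothetical identifier
    messages=[
        {"role": "user", "content": [{"text": "Summarize the filing's risk factors."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.2},
)

print(response["output"]["message"]["content"][0]["text"])
```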
What This Means for Developers Building AI Applications
Inference speed matters most for agentic workloads. Standard RAG or chatbot applications can tolerate 200–400ms per model call: users watch the response stream in and it feels fast enough. Agentic systems that chain 20–50 sequential model calls to complete a task are different. At 300ms per call, a 30-step agent workflow spends 9+ seconds waiting on the model. At 20ms per call (what Cerebras achieves at scale), the same workflow finishes in under a second. The difference between a usable agent and a frustrating one is often exactly this latency gap.
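The arithmetic is simple enough to sanity-check in a few lines, using the per-call figures from the paragraph above (round illustrative numbers, not measured benchmarks):

```python
STEPS = 30  # sequential model calls in the agent workflow

def workflow_seconds(per_call_ms: float, steps: int = STEPS) -> float:
    # Each step consumes the previous step's output, so per-call
    # latencies add up rather than overlap.
    return steps * per_call_ms / 1000

print(f"300 ms/call: {workflow_seconds(300):.1f}s")  # 9.0s
print(f" 20 ms/call: {workflow_seconds(20):.1f}s")   # 0.6s
```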
Competition drives inference prices down. The existence of a credible, at-scale alternative to NVIDIA-backed inference creates pricing pressure across the entire market. OpenAI, Anthropic, and Google all run on largely NVIDIA infrastructure for training but have increasing flexibility on inference. As Cerebras’ AWS deployment and OpenAI relationship mature, the cost per million tokens will decline faster than it would in a one-supplier market. Every API price cut you have seen in 2025–2026 is partly a consequence of this competitive dynamic.
Faster APIs for OpenAI customers are already real. OpenAI’s commitment to 750MW of Cerebras compute through 2028 means that a portion of the inference serving the GPT-4o and o-series models is already running on wafer-scale silicon. The latency improvements developers have observed on certain OpenAI API endpoints over the past several months are at least partially attributable to Cerebras deployments.
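If you would rather measure than take that on faith, time-to-first-token and inter-token gaps are easy to profile against any streaming endpoint. This sketch uses the official openai Python client; the model name is just an example, and nothing in the API response tells you which silicon served the request, so you are measuring latency as experienced, not attribution.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
timestamps = []

stream = client.chat.completions.create(
    model="gpt-4o",  # example model; profile whichever endpoint you use
    messages=[{"role": "user", "content": "Explain wafer-scale chips in one paragraph."}],
    stream=True,
)

# Record the arrival time of every content chunk as it streams in.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        timestamps.append(time.perf_counter())

print(f"time to first token: {timestamps[0] - start:.3f}s")
if len(timestamps) > 1:
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    print(f"mean inter-token gap: {1000 * sum(gaps) / len(gaps):.1f}ms")
```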
The AI Chip Wars: Competition Beyond NVIDIA
Cerebras is not the only company trying to take inference market share from NVIDIA. The landscape includes several credible alternatives with different architectural approaches.
Groq competes with its Language Processing Unit (LPU), a chip designed for sequential token generation. Groq’s inference speeds are comparable to Cerebras for certain model sizes, but the architecture scales differently — the LPU is optimized for deterministic latency at moderate parameter counts rather than massive models. Groq operates as a cloud API service and has not announced IPO plans.
SambaNova Systems takes a reconfigurable dataflow architecture approach, more flexible than Cerebras but with different performance characteristics. SambaNova has focused on enterprise on-premises deployments rather than cloud inference at scale.
d-Matrix and Etched represent a newer generation of inference-focused chip startups, targeting even lower power consumption through sparse activation techniques. Neither is at commercial scale yet, but both represent the next wave of challengers.
NVIDIA's dominance in training is not threatened by any of these players in the near term. H100- and H200-class GPUs remain the default choice for training frontier models at scale. The competitive battleground is inference, and that is precisely where Cerebras, Groq, and the broader alternative-chip ecosystem are making the most credible gains.
Key Risks to Watch Before the IPO
The filing targets a Q2 2026 window, as noted above. Several factors will determine how the offering closes between now and pricing day.
Customer concentration is the clearest financial risk. If OpenAI accounts for more than 70% of near-term contracted revenue, institutional investors will price in significant concentration risk, potentially compressing the valuation below the $35 billion target. Any update from Cerebras on additional large customers before pricing day will be closely watched.
AWS deployment timeline matters enormously to the concentration story. One hyperscaler customer with public commitments, even if Amazon deployments are still months away, changes the narrative from "one-customer company" to "hyperscaler-validated platform." Announcements of general availability through Bedrock or SageMaker could significantly support the valuation.
Export control environment remains a factor. The Biden-era AI chip export control framework, subsequently modified in 2026, created complexity for any AI infrastructure company with non-US exposure. Cerebras’ G42 restructuring addressed the direct regulatory flag, but the broader environment — especially for a company with inference deployments that may eventually serve international markets — remains a potential consideration for investors.
Market conditions for the AI sector have been volatile in Q1 2026, with significant dispersion between companies demonstrating clear revenue paths and those still in investment mode. Cerebras’ $510 million in revenue with a credible growth trajectory puts it in the former category, but a risk-off macro environment could still compress achievable valuations at the time of pricing.
The Bigger Picture for AI Infrastructure
Cerebras going public at this moment signals something important about where the AI infrastructure market has arrived. Two years ago, the primary question was whether wafer-scale chip technology could survive at commercial scale — whether the extreme engineering challenges of building and operating chips that span an entire silicon wafer would translate into a real business. The S-1 filing answers that question with $510 million in revenue and the endorsement of the world’s most widely used AI service.
For developers, the more important signal is this: the inference layer of the AI stack is now a competitive market. NVIDIA’s compute dominance applied historically to training and never fully extended to inference. Cerebras, Groq, and the AWS partnership are evidence that inference will increasingly be served by specialized hardware at lower latency and lower cost than pure GPU approaches. The applications you build on top of that infrastructure will get faster and cheaper without you changing anything — a rising tide that benefits every developer who ships AI-powered products.
The Cerebras IPO, if it closes at or near its target valuation, will mark a new chapter in the AI hardware investment cycle. It will signal that the market believes chip companies optimized for inference can build durable, independent businesses — not as NVIDIA replacements, but as NVIDIA complements that address the parts of the stack where the GPU architecture leaves genuine performance and cost gaps. For an industry that has debated for years whether inference chips can compete with GPU monoculture, that validation is worth noting regardless of where the stock trades on day one.