Poolside released Laguna XS.2 and Laguna M.1 on April 28, 2026: two agentic coding models built specifically for software engineering, designed to run plan-execute-observe loops across multi-file codebases. XS.2 is open-weight under Apache 2.0, runs on a single GPU, and scores 68.2% on SWE-bench Verified. M.1 is the closed 225B flagship, available via API, and scores 72.5% on the same benchmark. Both results put them at or above most frontier models on coding-specific evaluations. This guide covers the architecture, the benchmark data, how to access each model, local deployment via Ollama, and which use case each model fits.
Who Is Poolside?
Poolside is an American AI startup focused exclusively on software engineering AI. Unlike general-purpose model labs, Poolside is betting that the next generation of software development tooling requires models trained from the ground up on software-specific data and evaluated on software-specific benchmarks, rather than general-purpose language models adapted to coding tasks after training.
The Laguna family is Poolside’s first public model release. Before Laguna, the company operated as a mostly closed research lab with enterprise partnerships. The April 28 announcement introduced both models simultaneously: XS.2 as a fully open release to grow developer adoption, and M.1 as a proprietary API product for teams that need frontier-level coding performance without the infrastructure overhead of running a 225B model locally.
Two Models, Two Access Modes
The two Laguna models are designed for different deployment contexts but share the same training philosophy:
- Laguna XS.2: 33B total parameters, 3B active per token. Open-weight under Apache 2.0. Runs on a single consumer GPU. Targeted at individual developers and teams who want to run a production-quality coding agent without API costs or closed-model dependency.
- Laguna M.1: 225B total parameters, 23B active per token. Closed-weight, access via Poolside API and OpenRouter. Trained from scratch on 30 trillion tokens using 6,144 NVIDIA Hopper GPUs. Targeted at teams that need maximum coding performance and are willing to pay API costs rather than manage large-model infrastructure.
Both models are free to use for a limited time on the Poolside API and OpenRouter, making it practical to evaluate M.1’s quality before committing to API pricing.
Laguna XS.2: Architecture
Laguna XS.2 uses a hybrid Mixture-of-Experts architecture with a distinctive attention layout designed specifically for inference efficiency on constrained hardware. The 40-layer model splits its attention layers in a 3:1 ratio: 30 layers use Sliding Window Attention (SWA) with a local window of 512 tokens, and 10 layers use global attention that spans the full context. This layout reduces the KV cache memory requirement relative to a pure global-attention architecture, which is the primary reason the model can run on consumer hardware without the memory headroom required by standard long-context models.
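To make the memory saving concrete, here is a back-of-the-envelope sketch. Poolside has not published the KV head count or head dimension, so the values below are illustrative assumptions; only the 30/10 layer split and the 512-token window come from the release notes.

# KV cache comparison: pure global attention vs. Laguna's 3:1 SWA mix.
# Head count, head dim, dtype, and context length are assumed values.
N_KV_HEADS = 8          # assumption: grouped-query attention
HEAD_DIM = 128          # assumption
BYTES_PER_VALUE = 2     # fp16/bf16
CONTEXT = 128_000       # assumed long-context request
SWA_WINDOW = 512        # from the release notes

def kv_cache_bytes(tokens_cached, n_layers):
    # Factor of 2 covers both the K and V tensors.
    return 2 * n_layers * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE * tokens_cached

pure_global = kv_cache_bytes(CONTEXT, 40)                             # all 40 layers global
mixed = kv_cache_bytes(SWA_WINDOW, 30) + kv_cache_bytes(CONTEXT, 10)  # 30 SWA + 10 global

print(f"pure global attention: {pure_global / 2**30:.2f} GiB")
print(f"3:1 SWA/global mix:    {mixed / 2**30:.2f} GiB")

Under these assumptions the mixed layout needs roughly a quarter of the KV cache memory at 128K context, which is the headroom that makes single-GPU deployment plausible.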
The expert routing uses sigmoid gating with per-layer rotary scales rather than the softmax-based routing common in other MoE models. Sigmoid gating allows multiple experts to activate with independent weights rather than forcing a competition, and the per-layer rotary scales let each layer tune the relative contribution of its activated experts based on learned positional bias. The practical effect is more stable gradient flow during the long-horizon multi-step coding tasks the model was trained on.
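The gating distinction is easy to see in a few lines of code. This is a minimal sketch of the sigmoid-versus-softmax difference only, not Poolside's actual router: the expert count, top-k value, and the omitted rotary-scale term are all assumptions.

import torch

def softmax_routing(logits, k=2):
    # Conventional MoE gating: experts compete for a single probability
    # mass, so raising one expert's weight suppresses the others.
    weights = torch.softmax(logits, dim=-1)
    top = weights.topk(k, dim=-1)
    return top.indices, top.values

def sigmoid_routing(logits, k=2):
    # Sigmoid gating: each expert gets an independent gate in (0, 1),
    # so multiple experts can activate strongly at the same time.
    weights = torch.sigmoid(logits)
    top = weights.topk(k, dim=-1)
    return top.indices, top.values

logits = torch.randn(1, 8)  # one token's router logits over 8 experts
print("softmax:", softmax_routing(logits))
print("sigmoid:", sigmoid_routing(logits))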
The model was trained fully in-house by Poolside using their own infrastructure stack — not fine-tuned from a base model. This is worth noting because it means the model's coding capabilities reflect deliberate training choices rather than downstream adaptation of a general-purpose foundation, which affects how its strengths and failure modes compare to models like DeepSeek Coder or Qwen Coder that are fine-tuned from general-purpose bases.
Laguna M.1: Architecture and Training Scale
Laguna M.1 is a 225B-total/23B-active MoE model trained from scratch on 30 trillion tokens using 6,144 NVIDIA Hopper GPUs. Poolside has not published the full architectural details, but the active-to-total parameter ratio (23B/225B, roughly 10%) places it in the same efficiency class as other large MoE models where inference cost is governed by the active parameter count rather than total model size.
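A quick way to see why the active count is what matters: using the common ~2 × parameters FLOPs-per-token approximation for decoding (a rule of thumb, not a Poolside figure), M.1's per-token compute looks like a 23B dense model, not a 225B one.

ACTIVE_PARAMS = 23e9    # parameters activated per token
TOTAL_PARAMS = 225e9    # total parameters (governs memory, not per-token compute)

# ~2 FLOPs per parameter per decoded token (rule-of-thumb approximation)
moe_flops = 2 * ACTIVE_PARAMS
dense_equivalent_flops = 2 * TOTAL_PARAMS

print(f"per-token decode compute (MoE):        {moe_flops:.1e} FLOPs")
print(f"per-token compute if it were dense:    {dense_equivalent_flops:.1e} FLOPs")
print(f"saving from sparse activation:         {dense_equivalent_flops / moe_flops:.1f}x")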
The 30-trillion-token corpus is notably larger than those of most frontier models with equivalent architecture: GPT-5 is estimated at around 12–15 trillion training tokens, and DeepSeek V4 at approximately 20 trillion. Poolside attributes much of M.1's coding strength to training data curation: the 30T corpus is software-domain-heavy, with a far higher proportion of code, code comments, software documentation, and technical discussion than a general-purpose training mix.
Benchmark Results
Both models were evaluated on the standard suite of software engineering benchmarks used across the industry. Results published in Poolside’s release announcement:
SWE-bench Verified
SWE-bench Verified is the primary industry-standard benchmark for software engineering AI, requiring models to resolve real GitHub issues by writing and executing code changes against actual repositories. Scores above 60% are considered frontier-level performance.
- Laguna XS.2: 68.2%
- Laguna M.1: 72.5%
For reference, Claude Opus 4.7 scores approximately 72–74% on this benchmark, and GPT-5.5 is in the 73–76% range depending on scaffolding. A 68.2% score from a 33B open-weight model that runs on a single GPU is a significant result — earlier open models in this size class were scoring in the 40–52% range as recently as March 2026.
SWE-bench Multilingual
An extension of SWE-bench Verified covering repositories in Python, JavaScript, TypeScript, Java, Go, and Rust. Laguna XS.2 scores 62.4% — indicating the model’s coding capability generalizes across languages rather than specializing narrowly in Python.
SWE-bench Pro
A harder variant with more complex, multi-file issues that are less likely to be present in training data. Laguna XS.2 scores 44.5% on SWE-bench Pro — lower than Verified, as expected for the harder benchmark, but still competitive in its size class.
Terminal-Bench 2.0
A benchmark for command-line agentic tasks: shell navigation, file system operations, multi-step terminal workflows, and tool chaining. Laguna XS.2 scores 30.1%. Terminal-Bench 2.0 is particularly difficult because it requires the model to reason about state that changes across multiple tool calls without direct in-context feedback until each step completes.
Running Laguna XS.2 Locally
Laguna XS.2 is available on Ollama with native MLX support for Apple Silicon. For developers on a Mac with at least 36 GB of unified memory (an M3 Max, or any M-series Ultra chip), the model runs entirely in unified memory with no discrete-GPU VRAM ceiling to manage:
ollama pull laguna-xs.2
ollama run laguna-xs.2
For NVIDIA GPU users, the RTX 5090 (32 GB GDDR7) handles the model at approximately 45 tokens/second using Q4 quantization. On RTX 4090 (24 GB), the model fits in VRAM at Q3 quantization with some quality degradation on very long context tasks.
For integration into agent frameworks, the model exposes a standard OpenAI-compatible API when served through Ollama:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama ignores the key, but the client requires a non-empty string
)

response = client.chat.completions.create(
    model="laguna-xs.2",
    messages=[
        {
            "role": "system",
            "content": "You are an expert software engineer. Think step by step before writing any code.",
        },
        {
            "role": "user",
            "content": "Refactor the authentication middleware to use JWT refresh tokens.",
        },
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
The model is also available via the mlx-lm framework on Apple Silicon, which provides better throughput than Ollama’s MLX backend for batch inference workloads:
pip install mlx-lm
mlx_lm.generate --model poolside/Laguna-XS.2 --prompt "Write a Python async context manager for database connections." --max-tokens 2048
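For programmatic batch use, the same weights load through mlx-lm's Python API. A minimal sketch, assuming the poolside/Laguna-XS.2 repo name from the CLI example above:

from mlx_lm import load, generate

# Download (if needed) and load the model plus its tokenizer.
model, tokenizer = load("poolside/Laguna-XS.2")

prompt = "Write a Python async context manager for database connections."
text = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
print(text)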
Accessing Laguna M.1 via API
Laguna M.1 is not open-weight but is accessible through two channels. On OpenRouter, the model identifier is poolside/laguna-m.1. On the Poolside API directly, access is available at api.poolside.ai/v1 using the same OpenAI-compatible interface.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.poolside.ai/v1",
    api_key="YOUR_POOLSIDE_KEY",
)

response = client.chat.completions.create(
    model="laguna-m.1",
    messages=[
        {
            "role": "user",
            "content": "Analyze this codebase for race conditions in the async task queue.",
        }
    ],
)

print(response.choices[0].message.content)
Both models are free to use for a limited time at launch — Poolside has not published long-term pricing yet. For teams evaluating whether to invest in local XS.2 infrastructure versus paying for M.1 API access, the free period is the right time to run your benchmark suite against both models on representative tasks from your actual codebase.
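A minimal harness for that side-by-side run might look like this. The endpoints match the examples above; the task list and the truncated print are placeholders for your own prompts and scoring.

from openai import OpenAI

# Same two OpenAI-compatible endpoints used earlier: XS.2 served
# locally via Ollama, M.1 via the Poolside API.
clients = {
    "laguna-xs.2": OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    "laguna-m.1": OpenAI(base_url="https://api.poolside.ai/v1", api_key="YOUR_POOLSIDE_KEY"),
}

# Placeholder tasks; replace with representative issues from your codebase.
tasks = [
    "Fix the off-by-one error in pagination.py and explain the fix.",
    "Add retry-with-backoff to the HTTP client in net/client.py.",
]

for model_name, client in clients.items():
    for task in tasks:
        response = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": task}],
            temperature=0.2,
        )
        answer = response.choices[0].message.content
        # Replace this print with your own pass/fail scoring.
        print(f"--- {model_name} ---\n{answer[:300]}\n")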
Comparing to Other Open Coding Models
The field of open agentic coding models has advanced rapidly in April 2026. The most relevant comparisons for developers evaluating Laguna XS.2:
Qwen3.6-Max-Preview (closed-weight, API only) scores above 70% on SWE-bench Pro with preserve_thinking enabled for multi-turn loops. It is not open-weight and requires Alibaba Cloud API access. For teams already using the Alibaba Cloud ecosystem, Qwen3.6-Max-Preview is the closest alternative, with higher benchmark scores. For teams who want local deployment or Apache 2.0 licensing, Laguna XS.2 has no equivalent competitor at this score level.
DeepSeek V4-Pro (open-weight, 1.6T total / 49B active) leads on competitive programming benchmarks (Codeforces) and supports a 1M-token context window. It significantly outperforms Laguna XS.2 on general software knowledge tasks. However, running DeepSeek V4-Pro locally requires multi-GPU infrastructure — it is not a single-GPU model. For teams who can afford that infrastructure, DeepSeek V4-Pro remains the highest-performing open option for large-codebase work. For single-GPU local deployment, Laguna XS.2’s 68.2% on SWE-bench Verified is currently unmatched.
For developers building on WOWHOW’s agent starter kits, Laguna XS.2 is the most practical open-weight coding model for workstation-class inference as of April 2026 — better SWE-bench scores than anything else that fits on a single consumer GPU.
The Significance of Open Coding-Specific Models
The broader significance of the Laguna release is that it narrows the gap between closed-source coding APIs and locally-runnable models to a point where the trade-off becomes tractable for many teams. Through March 2026, the options for developers who wanted strong SWE-bench-class coding performance without API spend were limited to DeepSeek V4-Pro (requires multi-GPU) or models in the 50–55% range that left meaningful quality gaps for complex multi-file tasks.
A 68.2% SWE-bench Verified score from a model that fits in 36 GB of RAM changes the calculus. Teams running AI-assisted code review, automated test generation, or agentic refactoring workflows can now build on a locally-deployed model without the quality compromise that made local deployment unattractive for production use. The Apache 2.0 license removes the legal friction that NVIDIA’s Open Model License and Qwen’s custom licenses impose, making it straightforward to embed Laguna XS.2 into commercial tooling.
Poolside’s decision to release XS.2 openly while keeping M.1 closed follows the same playbook other frontier labs have used to grow developer ecosystems: give developers an open model good enough to build on, then let the closed flagship generate revenue from teams that need the last 4 percentage points of benchmark performance. Whether that trade-off favors XS.2 or M.1 for any given team depends on infrastructure budget, latency requirements, and how much those benchmark points translate to quality differences on your specific workloads. The free evaluation period is the right time to find out.
Written by
Anup Karanjkar
Expert contributor at WOWHOW. Writing about AI, development, automation, and building products that ship.