On April 2, 2026, Vitalik Buterin published the most technically detailed account of a personal private AI stack by any public figure — and the reasoning behind it should concern anyone sending sensitive data to cloud AI providers. Buterin runs Qwen3.5:35B locally on a laptop with an Nvidia RTX 5090 GPU, achieves 90 tokens per second, and wraps the entire system in NixOS reproducible configs with bubblewrap sandboxes. His core argument: the privacy movement spent decades winning battles against surveillance, and cloud AI is quietly reversing all of it. This guide breaks down his exact setup, shows you how to build your own version at any budget from $0 to $2,000, and gives you 6 original prompts to audit and lock down your AI privacy right now.
Why Vitalik Buterin Went Local: The Privacy Argument
Buterin did not switch to local AI because of performance. He switched because of what he calls a “deep fear” that cloud AI services are erasing the gains of decades of privacy advocacy. His argument has three layers, and the third one is the most important.
Layer 1: Your prompts are training data. Every query you send to a cloud AI provider is, unless you explicitly opt out (and sometimes even then), potential training data. The aggregate of your prompts — your questions, your drafts, your code, your confessions to the AI — forms a portrait of your thinking that is more intimate than your email history. Cloud providers hold this data indefinitely, and their privacy policies allow them to modify retention terms with notice.
Layer 2: AI agents multiply the exposure. As AI moves from chat interfaces to autonomous agents — agents that read your files, browse the web on your behalf, send messages, and execute code — the volume and sensitivity of data flowing through AI systems increases by orders of magnitude. An AI agent that manages your calendar, reads your emails, and drafts responses has access to more of your life than any single application you currently use. If that agent runs on someone else’s infrastructure, the privacy exposure is total.
Layer 3: The supply chain is already compromised. Buterin cited a specific data point that should alarm anyone building with AI tooling: 15% of community-contributed tools for OpenClaw (an open-source AI agent framework) contained malicious instructions. Not bugs. Not poorly-written code. Deliberate prompt injections designed to exfiltrate data or manipulate agent behavior. When the tools your AI agent uses are themselves compromised, running those tools on infrastructure you do not control means you have no line of defense. Running locally does not eliminate the malicious tool problem, but it gives you a sandbox boundary that cloud execution cannot provide.
This is not a theoretical concern from a person unfamiliar with technology. Buterin is the co-founder of Ethereum and one of the most technically sophisticated public figures in the world. When he restructures his entire computing workflow around local AI to protect his privacy, the reasoning deserves serious examination. If you’re already concerned about data exposure, start with the basics: use our password generator to create strong credentials for every AI service you currently use, and our hash generator to verify the integrity of any local model weights you download.
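That weight-verification step needs nothing beyond the Python standard library. A minimal sketch that streams a downloaded weights file through SHA-256 and compares it against the checksum published by the model repository (the file path and checksum you pass in are placeholders for your own download):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-gigabyte model files
    never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path: str, expected_sha256: str) -> bool:
    """Compare the local file's digest against the published checksum."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

Most model hosts publish per-file SHA-256 checksums alongside the weights; if the computed digest does not match, discard the download rather than loading it.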
Vitalik’s Exact Hardware and Software Stack
Hardware
Buterin runs his AI stack on a laptop equipped with an Nvidia RTX 5090 GPU with 24 GB of VRAM. This is not a server rack or a custom-built desktop — it is a portable machine that travels with him. The RTX 5090 mobile variant, released in early 2026, delivers enough compute to run a 35-billion parameter model at approximately 90 tokens per second, which Buterin describes as his target for “comfortable daily use.” For context, 90 tokens per second means the AI generates roughly 70 words per second — significantly faster than you can read. This is not the compromised, laggy experience people associate with local AI from 2024. It is functionally instant for interactive use.
Model: Qwen3.5:35B
Buterin chose Qwen3.5:35B as his primary model. Qwen3.5 is Alibaba’s open-weight model family, and the 35B variant sits in the sweet spot between capability and hardware requirements. At 35 billion parameters, it fits comfortably in 24 GB of VRAM with Q4 quantization, delivers strong reasoning and coding performance, and supports a 128K token context window. In community benchmarks from April 2026, Qwen3.5:35B scores within 5–10% of GPT-4o on most reasoning and coding tasks — more than sufficient for daily coding assistance, writing, analysis, and agent tasks.
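The claim that a 35B model fits in 24 GB at Q4 follows from a standard back-of-the-envelope calculation. A rough Python sketch, where the ~4.5 bits-per-weight figure approximates Q4_K_M-style quantization and the flat overhead allowance for KV cache and runtime buffers is an assumption, not a measurement:

```python
def est_vram_gb(params_billion: float,
                bits_per_weight: float = 4.5,
                overhead_gb: float = 2.0) -> float:
    """Rough VRAM estimate: quantized weight size plus a flat allowance
    for KV cache and runtime buffers."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb + overhead_gb


# 35B at ~4.5 bits/weight is roughly 19.7 GB of weights; with a ~2 GB
# allowance the total stays under a 24 GB card's budget.
```

The same arithmetic explains why 70B-class models do not fit on a single 24 GB GPU without more aggressive quantization or offloading.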
Inference: llama-server via llama-swap
Rather than using Ollama or vLLM, Buterin runs inference through llama-server (part of the llama.cpp project) managed by llama-swap, a lightweight model-switching proxy. This setup allows him to hot-swap between multiple models without restarting the inference server — useful when different tasks benefit from different model specializations. The llama.cpp backend is C++ compiled, with no Python runtime overhead, which contributes to the high token-per-second throughput on his hardware.
Operating System: NixOS
The operating system choice is NixOS, a Linux distribution built around declarative, reproducible configuration. Every package, every system setting, and every service configuration is defined in a single configuration file that can be version-controlled, audited, and reproduced exactly on another machine. For a privacy-focused AI setup, this is significant: you can verify that your system configuration has not been modified, roll back to any previous state, and share your exact configuration with others for independent audit. NixOS eliminates the “it works on my machine” problem and the “I don’t know what’s running on my system” problem simultaneously.
Sandboxing: Bubblewrap
For AI agent tasks — situations where the model needs to execute code, read files, or interact with external services — Buterin uses bubblewrap (bwrap), a lightweight sandboxing tool that creates isolated environments with restricted filesystem access, no network connectivity unless explicitly granted, and limited system call access. This is the defense against the compromised-tool problem: even if an AI agent executes a malicious instruction from a community tool, the sandbox prevents it from accessing files outside its designated directory or making unauthorized network requests.
Communication Security: Human + LLM 2-of-2 Authorization
The most novel element of Buterin’s setup is a messaging daemon that implements “human + LLM 2-of-2” authorization for outgoing messages. When an AI agent wants to send a message on Buterin’s behalf, the message requires approval from both a human (Buterin himself) and a separate LLM instance acting as a security reviewer. Neither party alone can authorize an outgoing message. This is a cryptographic-style authorization pattern — similar to multi-signature cryptocurrency wallets — applied to AI agent communication. It prevents both accidental sends (human approves carelessly) and prompt injection attacks (malicious instruction bypasses human review by crafting plausible-looking messages).
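Buterin's daemon itself is not published as a reusable package, but the 2-of-2 gate is straightforward to sketch. In this hypothetical Python version, a message goes out only when the human reviewer and an independent LLM reviewer both approve; a veto from either side blocks the send:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Approval:
    approved: bool
    reason: str


def two_of_two_send(
    message: str,
    human_review: Callable[[str], Approval],
    llm_review: Callable[[str], Approval],
    send: Callable[[str], None],
) -> bool:
    """Release the message only if BOTH reviewers approve (2-of-2).
    Returns True if the message was sent, False if either side vetoed."""
    human = human_review(message)
    llm = llm_review(message)
    if human.approved and llm.approved:
        send(message)
        return True
    return False
```

In practice, llm_review would be a separate model instance prompted specifically to look for prompt-injection artifacts and out-of-character requests, so a single compromised context cannot authorize its own output.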
Build Your Own Private AI Stack: Three Budget Tiers
You do not need Buterin’s hardware budget to run AI locally. Here are three realistic paths at different price points, each delivering meaningfully private AI capability.
Tier 1: $0 — CPU-Only on Your Existing Machine
If you have a reasonably modern laptop or desktop with 16 GB of RAM, you can run smaller models entirely on CPU. The experience is slower — roughly 5–15 tokens per second depending on your CPU and the model — but functional for many tasks.
Setup:
# Install Ollama (macOS / Linux)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a capable small model
ollama pull qwen2.5:7b
# Start chatting
ollama run qwen2.5:7b

Recommended models for CPU: Qwen2.5:7B, Mistral Small 3.1 (24B with Q3 quantization if you have 32 GB RAM), or Phi-3.5 Mini (3.8B for machines with only 8 GB RAM). These models handle coding assistance, writing, and general Q&A at quality levels that would have been considered frontier-class in 2023. The key principle: your data never leaves your machine, and the model runs entirely offline once downloaded.
Tier 2: $300–$500 — Used GPU Acceleration
A used RTX 3060 12GB or RTX 3080 10GB, available for $200–$400 on secondary markets in April 2026, transforms local AI performance. With 10–12 GB of VRAM, you can run 7B–14B parameter models at 30–60 tokens per second — fast enough for comfortable interactive use.
Setup:
# Install Ollama (same as above)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a mid-range model that fits 12GB VRAM
ollama pull qwen2.5:14b
# Or run Mistral Small for stronger reasoning
ollama pull mistral-small:latest
# Start the API server for integration with other tools
ollama serve

Recommended models: Qwen2.5:14B (excellent coding and reasoning), Mistral Small 3.1 (strong multilingual and instruction following), or Llama 3.3:8B (Meta’s efficient general-purpose model). At this tier, you get genuinely useful AI assistance for coding, writing, analysis, and research — all running locally with zero data exposure.
Tier 3: $1,500–$2,000 — RTX 4090 or Equivalent
An RTX 4090 with 24 GB of VRAM is the sweet spot for serious local AI in 2026. At this tier, you can run Qwen3.5:35B (Buterin’s model of choice), Llama 4 Scout at Q3 quantization, or any model up to approximately 35B parameters at full speed. Performance reaches 40–90 tokens per second depending on the model and quantization level. For a comprehensive guide on running Llama 4 Scout locally, see our Llama 4 Scout local deployment guide.
Setup with llama.cpp (Buterin’s approach):
# Clone and build llama.cpp with CUDA support
# (llama.cpp builds with CMake; the old Makefile path is deprecated)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release
# Download a GGUF model (e.g., Qwen3.5:35B Q4_K_M)
# From Hugging Face or your preferred model repository
# Run the server
./build/bin/llama-server -m models/qwen3.5-35b-q4_k_m.gguf \
    --ctx-size 32768 \
    --n-gpu-layers 99 \
    --port 8080

This exposes an OpenAI-compatible API at localhost:8080. You can connect it to VS Code extensions, custom scripts, or any tool that supports the OpenAI chat completions format. The model runs entirely on your hardware, and the API never leaves your local network.
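Any language that speaks HTTP can drive that endpoint. A minimal stdlib-only Python client as one sketch; the /v1/chat/completions path follows llama-server's OpenAI-compatible API, while the model name and port are whatever your local server is configured with:

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "qwen3.5-35b") -> bytes:
    """Build an OpenAI-style chat completions payload as UTF-8 JSON."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }).encode("utf-8")


def ask_local(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST to the local llama-server and return the assistant's reply.
    Requires the server from the setup above to be running."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=build_chat_request(prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint mimics the OpenAI schema, most editor plugins and SDKs only need their base URL pointed at localhost:8080 to switch to the local model.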
Apple Silicon Alternative
If you are on a Mac with Apple Silicon, the unified memory architecture gives you a significant advantage. A Mac Mini M4 Pro with 48 GB unified memory ($1,599 in April 2026) can run Qwen3.5:35B at Q4 quantization with approximately 20–25 tokens per second — slower than a discrete GPU but entirely silent, extremely power-efficient, and with no driver complexity. For developers who prefer the Apple ecosystem, this is the cleanest path to a Buterin-style setup.
Sandboxing Your AI Agents: Options for Every OS
Running the model locally is half the equation. The other half is ensuring that when your AI executes code or interacts with files, it cannot access anything beyond what you explicitly permit.
Linux: bubblewrap (bwrap)
# Run a command in a sandbox with limited filesystem access
bwrap --ro-bind /usr /usr \
--ro-bind /lib /lib \
--ro-bind /lib64 /lib64 \
--bind /tmp/ai-sandbox /workspace \
--unshare-net \
--die-with-parent \
/bin/bash

This creates an environment where the AI agent can only read system libraries (read-only) and write to a designated workspace directory. Network access is completely disabled (--unshare-net). If the agent tries to read your home directory, access your SSH keys, or phone home to a remote server, the attempt simply fails.
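If an agent framework launches sandboxed tasks from a script rather than an interactive shell, the same invocation can be assembled programmatically. A hypothetical Python helper mirroring the bwrap flags above, with networking off unless explicitly granted:

```python
import subprocess


def bwrap_args(workspace: str, command: list[str],
               allow_network: bool = False) -> list[str]:
    """Assemble a bubblewrap invocation: read-only system directories,
    a single writable workspace, and no network unless granted."""
    args = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",
        "--ro-bind", "/lib", "/lib",
        "--ro-bind", "/lib64", "/lib64",
        "--bind", workspace, "/workspace",
        "--die-with-parent",
    ]
    if not allow_network:
        args.append("--unshare-net")
    return args + command


def run_sandboxed(workspace: str, command: list[str]) -> int:
    """Run a command inside the sandbox; requires bwrap to be installed."""
    return subprocess.run(bwrap_args(workspace, command)).returncode
```

Keeping the flag list in one function also makes the sandbox policy auditable: any loosening of the boundary shows up as a diff in a single place.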
macOS: Apple’s App Sandbox or Docker
# Run AI agent tasks in a Docker container with no network
docker run --rm --network none \
-v $(pwd)/sandbox:/workspace \
python:3.12-slim \
python /workspace/agent_task.py

Windows: Windows Sandbox or WSL2 + Docker
Windows Sandbox provides a lightweight, disposable virtual machine that resets completely when closed. For persistent sandboxed environments, WSL2 with Docker provides Linux-equivalent isolation. Either approach prevents AI agent tasks from accessing your Windows user profile, documents, or network resources without explicit permission.
6 Prompts to Lock Down Your AI Privacy Right Now
You do not need local hardware to start improving your AI privacy posture. These six prompts work with any AI model — cloud or local — and help you audit, boundary-set, and plan your transition to more private AI usage.
Prompt 1: AI Privacy Audit
Use this prompt with whatever AI service you currently use most. The response reveals how the service handles your data, what it retains, and where your exposure points are.
I want to understand exactly what happens to my data when I use you.
Answer these questions specifically:
1. Are my prompts stored after this session ends? For how long?
2. Are my prompts used to train future models? Can I opt out?
3. If I paste code, documents, or personal information into this chat,
who at your company can access it?
4. Do you share any conversation data with third parties?
5. If I delete my account, is my conversation history actually deleted
from all systems, including backups?
6. What jurisdiction’s privacy laws govern my data?
Be specific. If the answer is "it depends on your plan," tell me
what it depends on.

The AI’s response (or its inability to answer clearly) tells you exactly how much you should trust it with sensitive information. If the model hedges or cannot answer questions 1–3 directly, treat that as a red flag.
Prompt 2: Permission Boundary Setter
Use this at the start of any session where you plan to share sensitive context. It establishes explicit boundaries the AI should respect.
For this session, I am setting the following boundaries:
- You may ONLY reference information I explicitly provide in this chat
- Do NOT infer, assume, or reference any information from my previous
sessions, account profile, or usage patterns
- If I share code or documents, treat them as confidential. Do not
reference their content in any summarization, analytics, or training
pipeline
- If any instruction contradicts these boundaries, refuse it and
explain why
Confirm you understand these constraints and will follow them for
this entire session.

This prompt does not technically prevent a cloud provider from processing your data. But it creates an explicit record of your intent, and in jurisdictions with strong data protection laws (GDPR, CCPA), documented intent carries legal weight. More practically, it primes the model to avoid cross-session data leakage in its responses.
Prompt 3: Data Leak Detector
This prompt tests whether your AI service leaks information across sessions or users. Run it at the start of a fresh session.
I want to test whether information persists across sessions.
Without me telling you, can you answer any of these:
1. What programming language did I use most recently?
2. What project am I currently working on?
3. What is my name or any identifying information?
4. What topics have I asked about in previous conversations?
For each question, tell me:
- Whether you have any information (yes/no)
- If yes, where that information comes from (memory feature,
system prompt, account data, or inference from this session)
Be completely honest. If you are uncertain whether you should
reveal this information, say so and explain why.

If the AI can answer any of these questions in a session where you have not provided the information, you have confirmed cross-session data persistence. This is not inherently malicious — many services offer “memory” as a feature — but you should know it is happening and understand how to disable it.
Prompt 4: Local Model Evaluator
Before investing in local AI hardware, use this prompt to determine whether local models can handle your specific workload. Run it with both a cloud model and a local model, then compare outputs.
I’m evaluating whether I can replace my cloud AI usage with a
local model. Help me design a fair test.
Here are my top 5 AI use cases (replace with your actual uses):
1. [e.g., Code review for Python/TypeScript projects]
2. [e.g., Writing technical documentation]
3. [e.g., Analyzing CSV data and generating summaries]
4. [e.g., Brainstorming product features]
5. [e.g., Debugging error messages]
For each use case:
- Rate the minimum model capability needed (low/medium/high)
- Suggest the smallest local model that would handle it adequately
- Estimate the VRAM requirement for that model
- Flag any use cases where local models in April 2026 genuinely
cannot match cloud quality
Be honest about where local models fall short. I need accurate
assessment, not enthusiasm.

Prompt 5: Threat Model Generator
Have your AI create a threat model for your own AI usage patterns. This surfaces risks you may not have considered.
Act as a security analyst. I’m going to describe my current AI
usage, and I want you to build a threat model.
My setup:
- Primary AI: [e.g., ChatGPT Plus via browser]
- Secondary AI: [e.g., GitHub Copilot in VS Code]
- I use AI for: [list your actual uses]
- Sensitive data I regularly share with AI: [be honest]
- My industry: [e.g., fintech, healthcare, education]
Build a threat model with:
1. Attack surface map (every point where my data touches AI infra)
2. Top 5 realistic threats ranked by likelihood and impact
3. For each threat: what the attacker gains, how they exploit it,
and what evidence I’d see if it happened
4. Specific mitigations I can implement THIS WEEK
5. What changes if I move to local AI (which threats disappear,
which new ones appear)
Do not soften the assessment. I want the uncomfortable version.

Prompt 6: Escape Hatch Builder
This prompt generates scripts and procedures to extract all your data from any AI service, ensuring you are never locked in.
I want to build escape hatches for every AI service I use so I
can leave any of them within 24 hours without losing data.
Services I use:
- [e.g., ChatGPT — conversation history, custom GPTs, system prompts]
- [e.g., GitHub Copilot — suggestion history, settings]
- [e.g., Notion AI — enhanced documents]
- [Add your actual services]
For each service, give me:
1. Exact steps to export ALL my data (API calls, UI steps, or scripts)
2. A script I can run to automate the export where possible
3. The format the data exports in, and how to convert it to
a portable open format
4. What data CANNOT be exported and why
5. A local storage plan for the exported data
6. How to verify the export is complete (checksums, record counts)
Write the scripts in Python or Bash. Assume I’m on macOS or Linux.

Running this prompt once and saving the output gives you a documented exit strategy for every AI service in your stack. Update it quarterly as services change their export capabilities.
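Item 6 of that prompt, verifying completeness, is easy to script yourself. A minimal Python sketch that produces a checksum-plus-record-count manifest for an export; the JSON Lines format is an assumption for illustration, so adapt the record counting to whatever your service actually emits:

```python
import hashlib


def export_manifest(path: str) -> dict:
    """Record a SHA-256 checksum and record count for a JSONL export,
    so a later re-export can be compared against this baseline."""
    digest = hashlib.sha256()
    records = 0
    with open(path, "rb") as f:
        for line in f:
            digest.update(line)
            if line.strip():
                records += 1
    return {"file": path, "sha256": digest.hexdigest(), "records": records}
```

Store the manifest next to the export itself; if a future export of the same account shows fewer records or a changed checksum for supposedly immutable history, you know data was dropped or altered.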
The Bigger Picture: Self-Sovereign AI Is Not Paranoia
Buterin’s move to local AI is not an isolated technical decision. It is part of a broader pattern visible across the technology industry in 2026: the people who understand AI infrastructure most deeply are the ones most aggressively moving their personal AI usage off of cloud platforms.
The pattern makes sense when you consider the incentive structures. Cloud AI providers are building businesses on data. Their models improve when they have more data. Your prompts are data. The tension between “we protect your privacy” and “our model needs your data to improve” is structural, not accidental. It cannot be fully resolved by privacy policies or opt-out checkboxes because the fundamental business model depends on access to user data at scale.
Local AI resolves this tension by elimination. When the model runs on your hardware and your data never leaves your machine, there is no privacy policy to parse, no opt-out to verify, and no trust decision to make about a corporation’s future behavior. The security boundary is physical: your data stays on metal you own.
This does not mean cloud AI is useless or that everyone should immediately abandon commercial AI services. For many use cases — especially tasks that benefit from the largest frontier models, real-time web access, or multimodal capabilities that require datacenter-scale compute — cloud AI remains the better tool. The point is that the choice should be conscious, not default. Every prompt you send to a cloud service should pass a mental test: “Would I be comfortable if this prompt appeared in a data breach disclosure?” If the answer is no, that prompt belongs on local hardware.
Getting Started This Week
You do not need to replicate Buterin’s full NixOS setup to meaningfully improve your AI privacy. Here is a practical starting sequence that takes less than an hour:
- Run Prompt 1 (AI Privacy Audit) with your current AI service. Understand your exposure.
- Install Ollama on your current machine. It takes two minutes and works on macOS, Linux, and Windows.
- Pull a small model (ollama pull qwen2.5:7b) and try it for your most common tasks.
- Run Prompt 4 (Local Model Evaluator) to determine which of your tasks can move to local AI without meaningful quality loss.
- Move your most sensitive tasks first. Code review of proprietary code, drafting confidential documents, brainstorming competitive strategy — these should run locally regardless of any quality tradeoff.
- Run Prompt 5 (Threat Model Generator) to understand your remaining exposure and plan further migration.
The goal is not perfection. The goal is to stop sending your most sensitive data to infrastructure you do not control, starting with the highest-risk use cases and expanding from there. Buterin’s setup represents one endpoint of that spectrum. But every step along the spectrum — from running your first local model to sandboxing your AI agents to implementing 2-of-2 message authorization — meaningfully reduces your exposure.
The tools exist. The models are good enough. The hardware is affordable enough. The only remaining variable is whether you decide your AI privacy is worth an afternoon of setup.