Deep Think: What’s Confirmed and What Isn’t
Google calls it Deep Think. OpenAI calls it extended reasoning (the o-series). Anthropic’s extended thinking is the same pattern. The underlying behavior is identical: the model spends additional compute evaluating the problem before generating output, using chain-of-thought reasoning that stays internal and doesn’t appear in the response. What distinguishes implementations is how well the reasoning actually helps on hard tasks, and whether the latency trade-off is calibrated for real workloads.
What’s confirmed about Gemini 3.5 Pro’s Deep Think from enterprise preview participants and Vertex AI documentation:
- It’s a parameter toggle on the API, not a separate model endpoint. The same
gemini-3.5-pro-preview-06 model ID handles both standard and Deep Think requests depending on thinkingConfig
- It targets the hard reasoning gap between Flash and where Pro needs to be: Flash scored 41.0 on Humanity’s Last Exam (HLE); Gemini 3.1 Pro Preview scored 44.7. Internal targets for Pro 3.5 with Deep Think aim substantially higher, likely in GPT-5.5 range
- Latency increases significantly with Deep Think enabled. It’s not positioned for real-time voice, fast agent loops, or interactive coding completion — those stay on Flash
- Reasoning tokens count against context budget and appear to be billed at the same rate as output tokens per preview documentation — the same billing model OpenAI uses for o-series
What isn’t confirmed yet: official benchmark numbers, Deep Think’s performance on coding tasks specifically (SWE-Bench, HumanEval), whether Google will publish reasoning token transparency in the API response, and whether there’s a per-request Deep Think surcharge at GA or flat Pro pricing. The model card that lands with GA will answer most of these.
Where It Fits in the Current Frontier
| Model |
Context |
HLE Score |
SWE-Bench |
Strongest Use Case |
| Gemini 3.5 Flash |
1M tokens |
41.0 |
~48% |
High-throughput, cost-sensitive workloads |
| GPT-5.5 |
1M tokens |
~46 |
58.6% |
General agentic tasks, multi-step reasoning |
| Claude Opus 4.8 |
200K tokens |
~50 |
88.6% |
Coding tasks that fit its context window |
| Grok 4.3 |
1M tokens |
~45 |
— |
Real-time data, voice and video integration |
| Gemini 3.5 Pro (preview) |
2M tokens |
Expected >50 |
TBD |
Ultra-long context, hard reasoning |
| GPT-5.6 (not yet released) |
1.5M tokens |
TBD |
TBD |
Agentic efficiency, long-horizon tasks |
One thing worth flagging directly: Claude Opus 4.8’s 88.6% SWE-Bench performance is on the original benchmark version and reflects Anthropic’s deep investment in coding tasks. It remains the best available model for coding work that fits within 200K tokens. The tradeoff is that 200K ceiling — for codebase-scale tasks, you need external retrieval or chunking. If Gemini 3.5 Pro’s coding performance lands in the 60-65% range on comparable benchmarks at 2M context, that’s a different calculus: lower single-task coding depth, but the ability to work with an entire large codebase in one pass without building retrieval infrastructure. Which tradeoff you prefer depends entirely on what your workload actually looks like.
Pricing: The Only Number That Changes Production Decisions
Google hasn’t announced Pro pricing. The expected range is $12–$18 per million input tokens, derived from the historical Flash-to-Pro pricing ratio across prior Gemini generations (approximately 8–10x). Flash launched at roughly $1.50/M input tokens. Apply 10x and you get $15/M input — the figure most commonly cited by Vertex enterprise preview participants who’ve discussed pricing expectations publicly.
For context: GPT-5.5 is $5/M input, $15/M output. Claude Opus 4.8 is $15/M input, $75/M output. If Gemini 3.5 Pro lands at $15/M input, it matches Opus 4.8’s input rate with a 2M context window instead of 200K — that’s a fundamentally different cost-per-token-of-context-capacity calculation. The output pricing matters too, and Google’s output rates on prior Pro tiers have historically been lower than Anthropic’s, but the comparison is speculative until the model card lands.
The practical cost variable is context utilization. If your workloads consistently use 1.2M–2M tokens, Pro’s pricing becomes increasingly justified versus competitors who can’t support that range at all. If your average request is 40K tokens, you’re paying a Pro rate for capacity you’re not using — Flash at a fraction of the cost handles those workloads better. Before the GA pricing announcement, it’s worth pulling your actual p90 context lengths from API logs to know which side of that line your real usage falls on.
How to Get Access Before GA
As of June 19, 2026, Gemini 3.5 Pro requires Vertex AI enterprise status. There’s no publicly documented self-service enrollment path. Two routes exist:
Existing Vertex AI enterprise customers: Contact your Google Cloud account manager directly. Several enterprise teams have reported access within 24–48 hours of requesting it via the account team. The current model identifier is gemini-3.5-pro-preview-06. Expect this to change to gemini-3.5-pro or similar at GA.
New Vertex AI customers: Standard enterprise sales cycle — typically 1–3 weeks for agreements and provisioning. Given the expected GA timeline of late June, this path may resolve itself: if GA launches before enterprise setup completes, public access becomes available through Google AI Studio and the standard Gemini API anyway.
When GA launches, access is expected through four channels:
- Google AI Studio — web interface, fastest path for individual developers evaluating the model
- Gemini API — REST and official SDKs (Node.js, Python, Go, Java), for direct product integration
- Vertex AI — for enterprise deployment with IAM, VPC-SC, audit logs, and enterprise SLAs
- OpenAI-compatible endpoint — Google has maintained this compatibility layer across the 3.5 Flash release; Pro is expected to follow
For developers already using Gemini 3.5 Flash via the SDK, the migration to Pro is a one-line model identifier change for basic use. Enabling Deep Think requires adding a thinkingConfig object to your generation config — similar in structure to how Anthropic’s SDK exposes extended thinking, with a thinkingBudget token parameter that controls how much reasoning compute the model uses before responding.
Three Things to Do Before It Ships
Waiting for GA to start evaluating is the wrong move. The teams that extract value from new frontier models fastest are the ones who have specific test cases and cost baselines ready before launch day.
Audit your ceiling-hitting workloads. Pull API logs and find requests that consistently use 80–90% of your current context limit, whether that’s GPT-5.5 at 1M or Opus 4.8 at 200K. Those are your first Pro evaluation candidates. If no workloads are near the current ceiling, Pro’s 2M window doesn’t change your position — Flash at lower cost remains the right choice for you.
Define your Deep Think test cases before you benchmark. Extended reasoning modes help on complex multi-step reasoning, ambiguous problem decomposition, and hard math. They add latency without clear benefit on retrieval tasks, straightforward code generation, and factual question answering. Map your hardest use cases against that profile before you run evaluation runs, so you’re measuring Deep Think on the problems where it’s designed to win, not on the ones where it’s unnecessary overhead.
Instrument token counting before evaluation. A single evaluation run on a large codebase at 2M context could generate $25–$40 in API costs at $15/M input if you’re genuinely loading 1.5M+ tokens per call. That’s a reasonable evaluation spend — but only if you’ve set up per-request token logging and cost attribution before you start. Running long-context evaluations without instrumentation is how teams end up with surprising cloud bills and no usable data to show for it.
The late-June GA window means Gemini 3.5 Pro could become publicly available any day from June 20 onward. Whether it matches GPT-5.5 on hard reasoning, outperforms it on multimodal tasks, or carves out a distinct position through long-context workloads where no competitor currently operates — that becomes clear only once benchmarks are public and developer testing is widespread. The model card on launch day will answer what the preview access cannot.
Comments · 0
No comments yet. Be the first to share your thoughts.