On March 31, 2026, Google Research released TimesFM 2.5 — a 200-million-parameter zero-shot time series forecasting foundation model that delivers production-ready predictions without any domain-specific training. Pre-trained on 400 billion real-world time-points spanning retail, finance, healthcare, and industrial telemetry, TimesFM 2.5 introduces a context window 8x larger than its predecessor (16,384 time-points), a probabilistic quantile forecasting head, and restored support for external covariates like holidays and promotional events. The headline benchmark result: TimesFM 2.5 outperforms ARIMA by 15–25% on retail and financial forecasting datasets while matching fully fine-tuned deep learning models — with zero task-specific training data required. For developers and data scientists who have spent years managing training pipelines, data labeling overhead, and retraining cycles, TimesFM 2.5 represents the same paradigm shift that large language models brought to text: a single pre-trained model that works out of the box across wildly different domains.
Why Time Series Foundation Models Took So Long
Large language models arrived first because text is the most abundant structured data format in existence — the internet provided a natural pre-training corpus of near-infinite scale. Time series data is different. It lives in silos: retail demand data sits in ERP systems, financial time series in proprietary trading platforms, industrial sensor data in SCADA systems. None of it is scraped and indexed. Building a general-purpose foundation model required Google to curate a specialized dataset of 400 billion real-world time-points from sufficiently diverse domains to achieve genuine zero-shot generalization.
The architectural challenge was equally non-trivial. Standard transformer architectures were designed for discrete token sequences. Time series data is continuous, multivariate, and irregularly sampled across domains. TimesFM’s solution — introduced in the original 2024 paper and refined in subsequent versions — is a patch-based tokenization approach: instead of treating individual time-points as tokens, TimesFM groups contiguous time-points into “patches” and processes each patch as a single token in a decoder-only autoregressive architecture. This design captures local temporal patterns within patches while the transformer’s attention mechanism captures long-range dependencies across the full time series — effectively the same insight that enabled transformers to handle long documents by chunking sentences into paragraphs.
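The patching step is easy to picture with a minimal numpy sketch. Everything here is illustrative: the patch length, the zero-padding scheme, and the `patchify` name are assumptions for exposition, not the model's actual preprocessing code.

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int = 32) -> np.ndarray:
    """Split a 1-D series into contiguous patches, zero-padding the front
    so the length divides evenly. Each row becomes one transformer token."""
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.zeros(pad), series])
    return padded.reshape(-1, patch_len)

series = np.arange(100, dtype=float)  # 100 daily observations
tokens = patchify(series)             # shape (4, 32): 4 tokens of 32 points
```

A 100-point series becomes 4 tokens instead of 100, so attention operates over a sequence 32x shorter while each token still carries local temporal structure.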
What Changed in TimesFM 2.5
The 2.5 release ships four meaningful improvements over the 2.0 version:
Context length expanded from 2,048 to 16,384 time-points. The 8x context expansion is the most practically significant change. A daily sales time series at 2,048 context points covers roughly 5.6 years of history. At 16,384 points, you capture nearly 45 years: enough to detect multiple business cycles, long-term seasonality, and macro trend shifts that shorter-context models systematically miss. For weekly data, the 16K context covers over 300 years, which is effectively unlimited for any real business forecasting use case.
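The coverage figures above are simple arithmetic, sketched here for reference (the points-per-year constants are the usual calendar approximations):

```python
# Back-of-the-envelope coverage of the 16,384-point context window
context_len = 16_384
points_per_year = {"daily": 365.25, "weekly": 52.18}
coverage = {freq: context_len / n for freq, n in points_per_year.items()}
# daily history covered: ~45 years; weekly: ~314 years
```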
Parameter count reduced 60% to 200 million. Despite the larger context window, TimesFM 2.5 achieves better performance with fewer parameters through architectural efficiency improvements. At 200M parameters, the model runs comfortably on a single consumer-grade GPU or on CPU for batch inference. It is available as google/timesfm-2.5-200m-pytorch on Hugging Face, installable via pip install timesfm.
Optional 30M quantile head for probabilistic forecasting. The 2.0 release produced point estimates only — a single predicted value per future time-step. The 2.5 quantile head produces a full distribution of outcomes at configurable percentile levels (P10, P20, P30, P50, P70, P80, P90). This is the capability that makes TimesFM genuinely useful for inventory planning, supply chain management, and financial risk modeling, where the tails of the forecast distribution matter as much as the median prediction. The quantile head supports up to 1,000 steps of horizon, covering approximately 2.7 years of daily data in a single inference pass.
Restored XReg (external regressor) support. TimesFM 2.0 temporarily removed covariate support during an architecture revision. Version 2.5 restores it via XReg, which applies a linear ridge regression correction using external covariates on top of the model’s base forecast. This means you can condition TimesFM’s predictions on known future events: holiday schedules, promotional calendars, planned price changes, or macroeconomic indicators. The correction is applied post-hoc, preserving the zero-shot base forecast while incorporating structured domain knowledge where it is available.
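To make the mechanism concrete, here is a toy numpy sketch of a post-hoc ridge correction in the spirit of XReg. The function name, toy data, and exact regression setup are assumptions for illustration, not the library's implementation: the idea is simply to regress in-sample residuals on covariates, then apply the learned adjustment to the zero-shot base forecast.

```python
import numpy as np

def xreg_correct(history, base_fit, future_base, X_hist, X_future, lam=1.0):
    """Illustrative post-hoc covariate correction: ridge-regress the
    in-sample residuals on covariates, then add the learned adjustment
    to the zero-shot base forecast."""
    resid = history - base_fit                      # what the base model missed
    XtX = X_hist.T @ X_hist + lam * np.eye(X_hist.shape[1])
    w = np.linalg.solve(XtX, X_hist.T @ resid)      # ridge coefficients
    return future_base + X_future @ w               # corrected forecast

# Toy data: a promotion flag lifts daily demand by roughly 10 units
rng = np.random.default_rng(0)
promo = rng.integers(0, 2, size=200).astype(float)
history = 50 + 10 * promo + rng.normal(0, 1, 200)
X_hist = np.column_stack([np.ones(200), promo])     # intercept + promo flag
base_fit = np.full(200, history.mean())             # stand-in for the base model

X_future = np.array([[1.0, 1.0], [1.0, 0.0]])       # promo day vs. normal day
corrected = xreg_correct(history, base_fit, np.full(2, history.mean()),
                         X_hist, X_future)
```

The corrected forecast for the promo day ends up about 10 units above the normal day, which is exactly the structured knowledge the base forecast could not see on its own.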
Using the Quantile Head: Probabilistic Forecasting in Practice
The quantile forecasting head transforms TimesFM from a point forecast system into a probabilistic forecasting system without separate model training. For inventory management, probabilistic forecasts are not optional — they are the fundamental input to inventory optimization. The question a retailer needs to answer is not “how many units will we sell next month” but “what is the probability we sell more than X units.” The P90 forecast tells you the stocking level needed for a 90% service level. The P10 tells you the minimum realistic demand you should plan for. The spread between them tells you how volatile demand is for that SKU. A minimal example using the Python API:
```python
import timesfm

# The checkpoint downloads from Hugging Face and loads at construction time.
tfm = timesfm.TimesFm(
    hparams=timesfm.TimesFmHparams(
        backend="pytorch",
        context_len=4096,
        horizon_len=90,  # 90-day forecast
    ),
    checkpoint=timesfm.TimesFmCheckpoint(
        huggingface_repo_id="google/timesfm-2.5-200m-pytorch"
    ),
)

# Returns the median plus one quantile band column per requested level
forecast_df = tfm.forecast_on_df(
    inputs=your_dataframe,
    freq="D",  # daily frequency
    value_name="sales",
    num_jobs=-1,
    quantile_levels=[0.1, 0.5, 0.9],  # P10, median, P90
)
```

The output is a dataframe with one column per quantile level, directly usable as input to inventory optimization solvers or financial risk dashboards. Inference time for a batch of 10,000 daily time series at a 90-step horizon on a single A100 GPU runs under 30 seconds.
BigQuery ML Integration: Forecasting Without Python
For organizations with time series data already in BigQuery, Google Cloud provides a managed TimesFM integration via the AI.FORECAST function in BigQuery ML. This eliminates the need for a Python environment, GPU infrastructure, or model deployment pipeline. The SQL interface:
```sql
SELECT *
FROM AI.FORECAST(
  TABLE `my_project.sales_data.daily_sales`,
  data_col => 'sales',        -- column names here are illustrative
  timestamp_col => 'date',
  horizon => 30,
  confidence_level => 0.95
);
```

The BigQuery integration automatically handles model serving, scales to arbitrary data volumes, and returns forecasts with confidence intervals in a native SQL result set. For data teams that work primarily in SQL and want to avoid maintaining Python inference pipelines, this is the fastest path from raw time series data to production forecasts. Pricing follows standard BigQuery ML inference rates with no separate GPU costs.
Benchmark Results: What Zero-Shot Actually Delivers
Benchmark results for TimesFM 2.5 across public evaluation datasets give a realistic picture of what zero-shot generalization delivers:
- Retail demand forecasting: 15–25% improvement in MASE (Mean Absolute Scaled Error) over ARIMA on standard retail benchmark datasets. TimesFM matches or exceeds fully fine-tuned LSTM and transformer baselines on weekly SKU-level demand data without any task-specific training.
- Financial time series: Similar MASE improvements on daily price and volume time series. The 16K context window captures multi-year trend and seasonality patterns that shorter-context models systematically underfit.
- Zero-shot generalization: TimesFM maintains performance on time series from domains not represented in training data — a critical property for enterprise deployments where business-specific data rarely matches public benchmark distributions exactly.
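For readers unfamiliar with the headline metric, MASE is the forecast MAE scaled by the in-sample MAE of a naive (or seasonal-naive) baseline, so a score below 1 means beating that baseline. A minimal sketch of the standard definition, with toy numbers:

```python
import numpy as np

def mase(y_true, y_pred, y_train, m=1):
    """Mean Absolute Scaled Error: forecast MAE divided by the in-sample
    MAE of a seasonal-naive forecast with period m (m=1 is plain naive)."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    return np.mean(np.abs(y_true - y_pred)) / scale

y_train = np.array([10.0, 12.0, 11.0, 13.0, 12.0, 14.0])
score = mase(np.array([13.0, 15.0]), np.array([13.0, 14.0]), y_train)
# score well below 1: this toy forecast beats the naive baseline
```

Because the scaling term comes from the training window, MASE is comparable across series with very different magnitudes, which is why it is the metric of choice for SKU-level retail benchmarks.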
The important caveat: TimesFM is not universally superior to fine-tuned models on all benchmarks. For narrow, stable domains with abundant training data — hourly electricity consumption from thousands of substations, for example — purpose-built fine-tuned models can outperform TimesFM. The zero-shot advantage is most pronounced in settings with limited historical data, high domain variety (many different SKUs, assets, or entities), or rapid distribution shifts where retraining pipelines cannot keep pace with changing patterns.
Real-World Use Cases Where TimesFM 2.5 Delivers Immediate Value
Retail inventory optimization. A retailer with 50,000 SKUs across 500 stores cannot practically maintain fine-tuned forecasting models for every combination. TimesFM’s zero-shot capability, combined with quantile outputs and XReg for promotional event conditioning, enables probabilistic demand forecasts for every SKU-store combination without the training pipeline overhead that per-SKU models require. The quantile output maps directly to safety stock calculations, enabling data-driven service level targeting without manual parameter tuning.
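As a concrete sketch of that mapping, an order-up-to level for a review period can be read straight off the quantile columns. The helper and the numbers below are hypothetical, and summing per-day quantiles is a deliberate simplification that ignores correlation of errors across days:

```python
import numpy as np

def stocking_level(quantile_forecast: dict, service_level: str = "p90") -> float:
    """Order-up-to level for a review period: sum the chosen quantile's
    per-day forecast over the period (simplified; assumes independence)."""
    return float(np.sum(quantile_forecast[service_level]))

# Hypothetical 7-day quantile forecast for one SKU-store combination
forecast = {
    "p10": np.array([ 8,  7,  8,  9, 12, 15, 11], dtype=float),
    "p50": np.array([10, 10, 11, 12, 16, 20, 14], dtype=float),
    "p90": np.array([13, 14, 15, 16, 22, 27, 19], dtype=float),
}
target = stocking_level(forecast, "p90")  # stock toward ~90% service level
```

Swapping "p90" for "p50" retargets the same SKU to a median service level with no model change, which is the practical payoff of having the full quantile band per series.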
Financial risk modeling. Treasury functions, asset managers, and risk teams regularly need to forecast macroeconomic indicators, commodity prices, and volatility indices over multi-month horizons for scenario analysis and stress testing. TimesFM’s 16K context and 1,000-step probabilistic forecasting make it directly applicable to these workflows without the feature engineering and retraining cycles that traditional econometric models require. The P10/P90 spread provides a ready-made input to VaR calculations and capital planning models.
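The quantile-to-VaR mapping is direct: the P10 return forecast is the loss threshold exceeded with only 10% probability, i.e. a 90% value-at-risk. A toy sketch, with hypothetical figures:

```python
def var_from_quantile(p10_return: float, position_value: float) -> float:
    """90% VaR implied by the P10 return forecast: the loss exceeded with
    only 10% probability (returned as a positive dollar amount, floored
    at zero when even the P10 scenario is a gain)."""
    return max(0.0, -p10_return * position_value)

# Hypothetical P10 daily-return forecast of -2.3% on a $10M position
var_90 = var_from_quantile(-0.023, 10_000_000)
```

Other confidence levels follow the same pattern: request the matching quantile (P5 for 95% VaR, P1 for 99%) from the forecasting head instead of assuming a parametric return distribution.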
Healthcare resource planning. Hospital patient volume, pharmacy demand, and diagnostic test utilization exhibit strong seasonality and trend patterns that TimesFM captures zero-shot. For healthcare systems that lack dedicated data science teams to maintain forecasting models, TimesFM via BigQuery ML provides an accessible path to data-driven resource planning without a Python or ML engineering hire.
Industrial IoT and telemetry. Equipment maintenance prediction, energy consumption forecasting, and process quality monitoring all involve time series data where domain shifts — new equipment, process changes, seasonal operating patterns — make maintaining fine-tuned models expensive. TimesFM’s zero-shot generalization handles distribution shift more gracefully than fine-tuned models that require retraining to adapt to new operating conditions.
SaaS business metrics. Monthly recurring revenue, churn rates, and usage metrics for SaaS businesses exhibit patterns that TimesFM can forecast without a data science team. Plugging TimesFM into a BigQuery reporting pipeline gives leadership probabilistic forecasts for financial planning and board presentations without a custom ML pipeline.
Limitations and When Not to Use TimesFM
Three categories of use cases where TimesFM is the wrong choice:
Very short time series. Zero-shot foundation models need sufficient historical context to extract meaningful patterns. For time series with fewer than 50 historical data points, traditional statistical methods (Exponential Smoothing, ARIMA) or simple regression models often outperform TimesFM because there is insufficient context to distinguish signal from noise.
Highly specialized, stable domains with abundant training data. If you have ten years of hourly electricity consumption data from 1,000 substations and a forecasting task that has been stable for years, a purpose-built fine-tuned model will likely outperform TimesFM. Zero-shot generality is a tradeoff against deep specialization in high-data regimes.
Causal and counterfactual forecasting. TimesFM predicts the future based on historical patterns. It does not answer causal questions like “what would sales have been without this promotion” or “what is the effect of a 10% price increase on demand.” These questions require causal inference frameworks that TimesFM is not designed to address.
The Bottom Line
Google TimesFM 2.5 is the most capable zero-shot time series forecasting model available as of April 2026. The combination of a 200M-parameter efficient architecture, 16,384-point context window, probabilistic quantile outputs, and XReg covariate support addresses the most critical limitations of its predecessor. Pre-trained on 400 billion real-world time-points and available via both Python API and BigQuery ML, it dramatically lowers the barrier to production time series forecasting — from months of data collection, feature engineering, and model training to a pip install and a few lines of inference code.
According to our analysis of the March 31, 2026 release, the most immediate opportunity is replacing legacy ARIMA pipelines in retail demand forecasting and financial time series modeling — categories where the 15–25% accuracy improvement and elimination of retraining overhead translate directly into measurable business value. For data teams building new forecasting infrastructure, TimesFM 2.5 is the right starting point for any new time series use case, with fine-tuned alternatives reserved for the narrow set of applications where domain-specific training genuinely adds value over the zero-shot baseline.