DevOps & SRE Production Reliability System — 50 Prompts

Name: DevOps & SRE Production Reliability System — 50 Prompts
Price: 2465 INR
Availability: InStock

What You'll Get

Keep Production Running at 99.99% — Systematically

Production reliability isn't about heroics — it's about systems. These 50 prompts give DevOps engineers and SREs the frameworks to build reliable, observable, and self-healing production infrastructure. From incident management to chaos engineering, every prompt encodes the practices that keep the world's most critical systems running.

Each prompt uses chain-of-thought reasoning for root cause analysis, tree-of-thought reasoning for architecture decisions, and the CRTSE framework for operational procedures. Variables such as {{service_name}}, {{sla_target}}, {{infrastructure}}, and {{team_size}} let you adapt each prompt to your own services, targets, and team.

What's Inside — 50 Expert Prompts

SLO/SLI Design Framework — Defines service level objectives for {{service}} with indicator selection, error budget calculation, alerting thresholds, and burn rate monitoring.
Incident Management Process Builder — Creates an incident response process for {{organization}} with severity levels, roles, communication templates, and a post-incident review framework.
Chaos Engineering Experiment Designer — Designs chaos experiments for {{system}} targeting failure modes: network partition, CPU spike, disk full, dependency failure, and cascading failures.
Observability Strategy Architect — Designs the three pillars of observability (logs, metrics, traces) for {{service_count}} services with tool selection, instrumentation plan, and dashboard hierarchy.
Capacity Planning Model — Projects infrastructure needs for {{service}} from {{current_load}} to {{target_load}} with headroom calculations, scaling triggers, and cost optimization.
Deployment Strategy Selector — Evaluates deployment strategies (blue-green, canary, rolling, feature flag) for {{application}} with risk assessment and rollback procedures.
Service Mesh Configuration Designer — Configures Istio/Linkerd for {{service_count}} services with traffic management, mTLS, circuit breaking, and observability integration.
Runbook Automation Framework — Converts manual runbook for {{procedure}} into automated workflow with decision points, safety checks, and human approval gates.
On-Call Rotation Designer — Creates a sustainable on-call system for a team of {{team_size}} with rotation schedule, escalation paths, compensation model, and burnout prevention.
Post-Incident Review Template — Structures blameless post-mortem for {{incident}} with timeline, contributing factors, remediation items, and systemic improvements.
Infrastructure as Code Reviewer — Reviews {{iac_tool}} (Terraform, Pulumi, CDK) configurations for security, cost optimization, and reliability best practices.
Container Orchestration Optimizer — Optimizes {{k8s_cluster}} with resource limits, HPA configuration, pod disruption budgets, and node pool strategy.
Database Reliability Framework — Designs reliability for {{database}} with backup strategy, failover testing, connection management, and performance monitoring.
Network Reliability Designer — Creates network architecture for {{application}} with redundancy, DDoS protection, DNS failover, and CDN strategy.
Cost Optimization Analyzer — Analyzes {{cloud_provider}} spending for {{account}} with right-sizing, reserved instance strategy, and waste identification.
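
As a rough illustration of the error budget and burn rate terms used in the SLO/SLI Design Framework entry above, here is a minimal Python sketch. It is not taken from the prompt pack: the function names, the 30-day window, and the sample error ratio are assumptions chosen for illustration, and it treats the SLO as a simple availability ratio.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over the window.

    slo_target is a fraction such as 0.9999; window_days is an assumed
    rolling window, not something the product page specifies.
    """
    window_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * window_minutes


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """Ratio of the observed error rate to the rate the SLO allows.

    A value of 1.0 spends the budget exactly over the window;
    values above 1.0 exhaust it early.
    """
    allowed_error_ratio = 1.0 - slo_target
    return observed_error_ratio / allowed_error_ratio


if __name__ == "__main__":
    # Hypothetical inputs: a 99.99% target and 0.1% of requests failing.
    print(f"Budget: {error_budget_minutes(0.9999):.2f} minutes per 30 days")
    print(f"Burn rate: {burn_rate(0.001, 0.9999):.1f}x")
```

Under these assumptions, a 99.99% target leaves roughly 4.3 minutes of downtime per 30 days, which is why burn rate alerts are usually more practical at this level than alerts on raw downtime.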

Each Prompt Includes

{{placeholder}} variables for service, infrastructure, team, and reliability targets
Expected output: operational procedures, configuration files, architecture diagrams, or analysis reports
Chain-of-thought reasoning for root cause analysis and tree-of-thought reasoning for architecture decisions
Anti-patterns to avoid: alert fatigue, toil accumulation, hero culture, and reliability theater

Who This Is For

SRE teams building reliability practices from the ground up
DevOps engineers designing deployment and monitoring infrastructure
Platform engineers creating internal developer platforms
Engineering managers establishing production readiness standards

What Makes This Different

Based on Google SRE book principles — error budgets, SLOs, toil reduction, and blameless culture
Covers the FULL reliability stack: prevention, detection, response, and continuous improvement
Includes chaos engineering — proactive reliability testing, not just reactive incident management

Works With

ChatGPT (GPT-4+), Claude (3.5+), Gemini Pro. Best with Claude for detailed technical analysis.

₹2,465

one-time payment

Instant download

Full source code included

30-day refund guarantee

Related Products

Context-Aware Response Generation Controller

Stepwise Decomposition Agent for Complex Enterprise Decisions

Persistent Conversation Memory Architect

Partial Failure Handling in Multi-Step Agent Workflows