Run Kubernetes in Production Without the 3 AM Pages
Running Kubernetes in production is a different beast from following tutorials. These 50 prompts encode the hard-won knowledge of platform engineers who manage clusters serving millions of requests: the configurations, runbooks, and debugging workflows that keep services running while you sleep.
Every prompt is designed to produce production-ready YAML, Helm charts, or runbook procedures that you can review and deploy once you have filled in its variable slots. No "exercise for the reader" sections. No placeholder values in the output. Real configurations for real clusters.
What's Inside — 50 Expert Prompts
Cluster Architecture (Prompts 1-10)
- 1. Production Cluster Design — Generates cluster architecture for {{workload_type}} with {{node_count}} nodes: control plane HA, node pool strategy, networking CNI selection, and storage class configuration. Uses tree-of-thought to evaluate EKS vs GKE vs AKS trade-offs.
- 2. Node Pool Strategy — Designs node pools for {{workload_mix}}: compute-optimized, memory-optimized, GPU, spot/preemptible with autoscaling policies and pod disruption budgets.
- 3. Namespace Architecture — Organizes namespaces for {{team_structure}}: resource quotas, limit ranges, network policies, and RBAC per namespace.
- 4. Multi-Cluster Strategy — Federation architecture for {{regions}}: service mesh, DNS-based routing, data replication, and failover procedures.
- 5. Cluster Upgrade Playbook — Zero-downtime upgrade procedure for {{k8s_version}}: pre-upgrade checks, node drain strategy, compatibility verification, and rollback plan.
- 6-10. Additional prompts covering: etcd backup/restore, control plane hardening, admission controllers, custom resource definitions, and operator framework design.
Workload Management (Prompts 11-20)
- 11. Deployment Strategy Designer — Creates deployment configuration for {{service}}: rolling update, blue-green, canary with traffic splitting, health checks, and rollback triggers.
- 12. Resource Request/Limit Calculator — Analyzes {{service_metrics}} and generates optimal CPU/memory requests and limits with VPA recommendations.
- 13. HPA Configuration — Custom metrics autoscaling for {{service}}: CPU, memory, request rate, queue depth with scaling policies and cooldown periods.
- 14. StatefulSet Architecture — Production StatefulSet for {{stateful_app}}: persistent volumes, ordered deployment, headless services, and backup integration.
- 15. Job/CronJob Patterns — Batch processing with {{job_type}}: parallelism, completion tracking, failure handling, timeout, and cleanup policies.
- 16-20. Additional prompts covering: init containers, sidecar patterns, pod affinity/anti-affinity, topology spread constraints, and priority classes.
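For a sense of the arithmetic behind the HPA prompt, here is a minimal Python sketch of the replica calculation described in the Kubernetes Horizontal Pod Autoscaler documentation: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default). The function name, parameters, and the min/max clamp are illustrative assumptions, not output from the prompt pack, and the real controller also accounts for pod readiness and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100, tolerance=0.1):
    """Approximate the core HPA scaling rule (illustrative helper, not a real API)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")

    ratio = current_metric / target_metric

    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas

    # Scale proportionally, round up, then clamp to the configured bounds.
    wanted = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # 4 replicas averaging 90% CPU against a 60% target.
    print(desired_replicas(4, current_metric=90, target_metric=60))
```

Running the numbers by hand like this can help sanity-check the scaling policies a prompt generates before they reach a cluster.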
Networking & Security (Prompts 21-35)
- 21. Network Policy Designer — Zero-trust network policies for {{namespace}}: ingress/egress rules, pod-to-pod communication, external access, and DNS policies.
- 22. Ingress Architecture — Production ingress for {{domains}}: TLS termination, rate limiting, WAF integration, and path-based routing with {{ingress_controller}}.
- 23. Service Mesh Implementation — Istio/Linkerd setup for {{service_mesh_goals}}: mTLS, traffic management, observability, and circuit breaking.
- 24. RBAC Policy Generator — Role-based access control for {{team_roles}}: cluster roles, role bindings, service accounts, and audit logging.
- 25. Pod Security Standards — Security contexts, seccomp profiles, AppArmor policies, and OPA/Gatekeeper constraints for {{security_level}}.
- 26-35. Additional prompts covering: secret management (Vault/sealed secrets), image scanning, supply chain security, pod security admission, network encryption, DNS configuration, load balancer optimization, certificate management, external DNS, and egress gateway design.
Operations & Monitoring (Prompts 36-50)
- 36. Prometheus + Grafana Stack — Complete monitoring for {{cluster}}: ServiceMonitors, recording rules, alerting rules, and pre-built dashboards.
- 37. Disaster Recovery Plan — Backup and restore procedures for {{critical_services}}: Velero configuration, PV snapshots, etcd backups, and DR drills.
- 38. Cost Optimization Audit — Analyzes {{cluster}} for waste: oversized pods, idle nodes, unused PVs, and generates right-sizing recommendations.
- 39. GitOps Pipeline — ArgoCD/Flux configuration for {{repo_structure}}: application sets, sync policies, health checks, and progressive delivery.
- 40-50. Additional prompts covering: log aggregation, distributed tracing, incident runbooks, capacity planning, chaos engineering, SLO definition, error budgets, on-call rotation, postmortem templates, and compliance auditing.
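The SLO and error-budget prompts rest on simple arithmetic, sketched below in Python. An availability SLO of 99.9% over a 30-day window leaves roughly 43 minutes of allowed downtime. The function names and the 30-day default window are assumptions for illustration; your own SLO policy may use a different window or a request-based budget instead.

```python
def error_budget_minutes(slo_percent, window_days=30):
    """Allowed downtime in minutes for an availability SLO (illustrative helper)."""
    if not 0 < slo_percent < 100:
        raise ValueError("slo_percent must be between 0 and 100, exclusive")
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def budget_remaining_fraction(slo_percent, downtime_minutes, window_days=30):
    """Fraction of the error budget still unspent; negative means the SLO is breached."""
    budget = error_budget_minutes(slo_percent, window_days)
    return (budget - downtime_minutes) / budget


if __name__ == "__main__":
    print(round(error_budget_minutes(99.9), 1))             # minutes of budget
    print(round(budget_remaining_fraction(99.9, 10.8), 2))  # share left after 10.8 min down
```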
Each Prompt Includes
- {{Variable}} slots — Customizable for your cloud provider, cluster size, and workload type
- Production-ready YAML — Copy-paste configurations, not pseudocode
- Technique annotation — Chain-of-thought for debugging, tree-of-thought for architecture decisions
- War stories — Real failure scenarios each prompt prevents
- Anti-patterns — Kubernetes mistakes that cause production outages
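If you prefer to fill the {{variable}} slots programmatically rather than by hand, a small script is enough. The Python sketch below is one possible approach under stated assumptions: the `fill_slots` helper and the sample values are hypothetical, and the template text is taken from prompt 1's description rather than from the full prompt.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
SLOT = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_slots(template, values):
    """Replace every {{name}} slot with values[name]; fail loudly on missing names."""
    missing = sorted({name for name in SLOT.findall(template) if name not in values})
    if missing:
        raise KeyError("missing values for slots: " + ", ".join(missing))
    return SLOT.sub(lambda match: str(values[match.group(1)]), template)


if __name__ == "__main__":
    template = ("Generates cluster architecture for {{workload_type}} "
                "with {{node_count}} nodes")
    print(fill_slots(template, {"workload_type": "stateless web APIs",
                                "node_count": 12}))
```

Raising on a missing slot is a deliberate choice here: it keeps an unfilled `{{...}}` from reaching the model unnoticed.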
Who This Is For
- Platform engineers managing production Kubernetes clusters
- DevOps engineers transitioning from Docker Compose to Kubernetes
- SREs building reliability into Kubernetes-based systems
- CTOs evaluating Kubernetes architecture decisions
- Cloud architects designing multi-cluster, multi-region deployments
What Makes This Different
- Production-only — Every prompt assumes a real cluster with real traffic, not minikube tutorials
- Cloud-agnostic — Variants for EKS, GKE, AKS, and bare metal included
- Operations-focused — Not just "how to deploy" but "how to run, debug, and recover"
- Cost-aware — Every configuration includes cost implications and optimization notes
Works With
ChatGPT (GPT-4+), Claude (Sonnet/Opus), Gemini Pro. Best with Claude for complex architecture decisions and debugging chains.