We've built a system that generates, tests, and ships AI-powered products around the clock. Here's the honest, unfiltered story of how it works — and what nearly broke us.
WOWHOW isn't just a storefront. Behind the product pages and prompt packs is a fully automated AI product factory — a pipeline we call "the Forge" that generates, tests, refines, and ships digital products 24 hours a day, 7 days a week.
This is the story of how we built it, what we learned, and what we'd do differently.
Why We Built the Forge
When we started WOWHOW, we were creating prompt packs manually. One person would write prompts, another would test them across models, a third would write documentation, and someone else would build the product page.
A single prompt pack took 40-60 hours to go from idea to published product.
We knew this wouldn't scale. The demand for quality prompt packs was growing faster than our team could produce them. We needed a system that could:
- Generate prompt candidates automatically
- Test them across multiple AI models
- Score quality objectively
- Generate documentation and marketing copy
- Build product pages
- Handle the entire pipeline with minimal human intervention
How the Forge Works
Stage 1: Idea Generation
The pipeline starts with market research. We monitor:
- Search trends for AI and prompt-related queries
- Social media discussions about AI pain points
- Customer support tickets and feature requests
- Competitor product launches
An AI system analyzes these signals and generates product briefs — descriptions of prompt packs that would address real market demand.
Stage 2: Prompt Generation
For each product brief, the system generates candidate prompts using a multi-model approach:
- Claude generates initial prompt candidates
- GPT generates alternative versions
- A "remix" agent combines the best elements
- Each candidate goes through 3 rounds of self-refinement
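In code, the generate-remix-refine flow above could look like the following sketch. The model calls are stand-in stubs (real versions would hit the Claude and GPT APIs), and the refinement step is simplified to a fixed three-round loop.

```python
# Hypothetical sketch of Stage 2: candidates from two models, a remix pass,
# then three self-refinement rounds. Model calls are stand-in stubs.

def claude_generate(brief: str) -> list[str]:
    return [f"[claude] prompt for {brief}"]   # stand-in for an API call

def gpt_generate(brief: str) -> list[str]:
    return [f"[gpt] prompt for {brief}"]      # stand-in for an API call

def remix(candidates: list[str]) -> str:
    return " + ".join(candidates)             # combine best elements (toy version)

def refine(prompt: str, rounds: int = 3) -> str:
    for i in range(rounds):                   # each round would re-prompt a model
        prompt = f"{prompt} (refined r{i + 1})"
    return prompt

def stage2(brief: str) -> list[str]:
    candidates = claude_generate(brief) + gpt_generate(brief)
    candidates.append(remix(candidates))      # the "remix" agent's combined candidate
    return [refine(c) for c in candidates]

prompts = stage2("cold outreach emails")
```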
Stage 3: Quality Testing
This is the most critical stage. Every prompt is tested:
- Multi-model testing — run on Claude, GPT, and Gemini
- Consistency testing — run 5 times on each model to check variance
- Quality scoring — automated scoring on relevance, completeness, clarity, and usefulness
- Edge case testing — deliberately difficult inputs to stress-test prompts
Prompts must score 8/10 or higher across all models to pass. About 60% of generated prompts fail this gate.
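The 8/10-across-all-models gate can be expressed as a simple predicate. This is a sketch under stated assumptions: the post gives the 8/10 threshold and the 5-runs-per-model consistency test, but the variance limit (a standard deviation cap of 1.0) is an illustrative guess.

```python
# Sketch of the Stage 3 quality gate: every model must average 8/10
# across 5 runs, and runs must be consistent. MAX_STDEV is assumed.

from statistics import mean, pstdev

PASS_THRESHOLD = 8.0
MAX_STDEV = 1.0  # assumed consistency limit, not from the post

def passes_gate(scores_by_model: dict[str, list[float]]) -> bool:
    """scores_by_model maps model name -> scores from repeated runs."""
    for runs in scores_by_model.values():
        if mean(runs) < PASS_THRESHOLD:  # must average 8/10 on every model
            return False
        if pstdev(runs) > MAX_STDEV:     # and stay consistent across runs
            return False
    return True

ok = passes_gate({
    "claude": [9, 8.5, 9, 8, 8.5],
    "gpt":    [8, 8, 8.5, 9, 8.5],
    "gemini": [8.5, 8, 8, 8.5, 9],
})
```

Requiring the minimum across models (rather than the average of averages) is what makes the gate strict enough to fail roughly 60% of candidates: one weak model is enough to reject.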
Stage 4: Documentation
Passed prompts get automated documentation:
- Usage instructions
- Customization tips
- Example outputs from each model
- Known limitations
- Suggested modifications for specific use cases
Stage 5: Product Assembly
The system packages everything into a product:
- Product page content (title, description, features, FAQ)
- Cover image (generated with AI, reviewed by humans)
- Pricing recommendation (based on market analysis)
- SEO metadata
- Downloadable prompt pack file
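The assembled product is essentially one structured record. As a hedged sketch, the field names below are assumptions for illustration, not WOWHOW's actual schema:

```python
# Hypothetical shape of an assembled product record; field names are
# illustrative, not the Forge's real schema.

from dataclasses import dataclass, field, asdict

@dataclass
class ProductPackage:
    title: str
    description: str
    features: list[str]
    faq: list[tuple[str, str]]
    cover_image_url: str          # AI-generated, then human-reviewed
    recommended_price_usd: float  # from market analysis
    seo_metadata: dict = field(default_factory=dict)
    pack_file: str = ""           # path to the downloadable prompt pack

pkg = ProductPackage(
    title="Cold Outreach Prompt Pack",
    description="Tested prompts for sales email drafting.",
    features=["20 prompts", "multi-model tested"],
    faq=[("Which models?", "Claude, GPT, and Gemini")],
    cover_image_url="https://example.com/cover.png",
    recommended_price_usd=19.0,
)
```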
Stage 6: Human Review
This is the one stage that always requires a human. Before any product goes live, a team member:
- Reviews the prompts for quality and accuracy
- Tests them personally to verify they work as documented
- Reviews the product page for accuracy and brand consistency
- Approves the product or sends it back for revision
About 20% of products that pass automated testing get sent back at this stage.
The Numbers
- Products generated per week: 15-25 candidates
- Products that pass automated testing: 8-12
- Products that pass human review: 6-10
- Time from idea to published product: 48-72 hours, nearly all of it unattended (versus 40-60 hours of hands-on work in the manual process)
- Cost per product: ~$12 in API calls (down from ~$800 in human labor)
What Nearly Broke Us
The Quality Crisis (Month 2)
In our second month, we realized our automated quality scoring was flawed. It optimized for objective correctness but missed subjective usefulness. Prompts that scored 9/10 on our metrics were getting negative customer reviews.
The fix: we added a "usefulness panel" — a group of 10 beta testers who rate products before launch. Their subjective ratings now carry more weight than automated scores.
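Combining the two signals is a straightforward weighted average. The 70/30 split below is an illustrative assumption; the post only says panel ratings carry more weight than automated scores.

```python
# Sketch of blending the usefulness panel with automated scoring.
# The 0.7 / 0.3 weights are assumptions, not the production values.

from statistics import mean

PANEL_WEIGHT = 0.7  # assumed: subjective usefulness dominates
AUTO_WEIGHT = 0.3

def launch_score(panel_ratings: list[float], automated_score: float) -> float:
    """Weighted blend of the 10 beta testers' ratings and the automated score."""
    return PANEL_WEIGHT * mean(panel_ratings) + AUTO_WEIGHT * automated_score

score = launch_score([7, 8, 6, 7, 8, 7, 9, 6, 7, 8], automated_score=9.0)
```

Under this weighting, a product that scores 9/10 on automated metrics but only ~7/10 with the panel lands below 8 overall, which is exactly the failure mode the panel was added to catch.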
The Hallucination Problem (Month 3)
AI-generated documentation sometimes contained inaccurate claims about what prompts could do. A prompt pack for "legal document drafting" was documented as "produces court-ready legal documents" — which is dangerously misleading.
The fix: mandatory human review of all documentation, especially claims about capabilities. We added automated checks for superlative claims and legal/medical/financial language.
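A minimal version of those automated checks can be a pair of keyword patterns that flag documentation for human review. The word lists below are illustrative, not the actual production rules.

```python
# Toy claim checker: flag superlatives and regulated-domain language
# for mandatory human review. Word lists are illustrative assumptions.

import re

SUPERLATIVES = r"\b(best|guaranteed|perfect|court-ready|always|never fails)\b"
REGULATED = r"\b(legal|medical|financial|diagnosis|investment|lawsuit)\b"

def flag_claims(doc_text: str) -> list[str]:
    """Return a list of reasons this documentation needs extra human scrutiny."""
    flags = []
    if re.search(SUPERLATIVES, doc_text, re.IGNORECASE):
        flags.append("superlative claim")
    if re.search(REGULATED, doc_text, re.IGNORECASE):
        flags.append("regulated-domain language")
    return flags

flags = flag_claims("Produces court-ready legal documents.")
```

The point of a checker like this isn't to auto-reject; it's to make sure risky claims can never slip through on automated approval alone.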
The Monotony Problem (Month 4)
When AI generates products at scale, they start to feel samey. Same structure, same language patterns, same design choices. Customers noticed.
The fix: we added deliberate variation to the pipeline. Different generation models for different products. Random style variation in product pages. And most importantly, human creative direction for our premium products.
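One way to inject that variation deterministically is to seed a random choice of model and page style from each product's ID, so the mix varies across the catalog but stays reproducible per product. The option lists here are illustrative assumptions.

```python
# Sketch of per-product variation: seed an RNG from the product ID so
# each product gets a stable but varied model/style combination.
# MODELS and STYLES are illustrative, not the production lists.

import random

MODELS = ["claude", "gpt", "gemini"]
STYLES = ["minimal", "editorial", "playful", "technical"]

def pick_variation(product_id: str) -> dict:
    rng = random.Random(product_id)  # deterministic per product, varied across products
    return {"model": rng.choice(MODELS), "style": rng.choice(STYLES)}

a = pick_variation("pack-001")
b = pick_variation("pack-002")
```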
Lessons Learned
- Automation without quality gates is a liability — speed means nothing if products aren't good
- Human review is non-negotiable — AI can't fully evaluate AI output yet
- Customer feedback loops are essential — automated metrics only catch what you measure
- Transparency builds trust — customers appreciate knowing how products are made
- The best products combine AI speed with human taste — pure automation produces mediocrity at scale
What's Next
We're working on:
- Personalized prompt packs — custom-generated based on your specific use case
- Real-time quality monitoring — tracking how customers actually use prompts and automatically improving them
- Community-driven development — letting customers vote on what products we build next
- Open-sourcing parts of the Forge — sharing our quality testing framework with the community
People Also Ask
Are WOWHOW products fully AI-generated?
AI-assisted, human-reviewed. Every product passes through automated generation, automated testing, and mandatory human review. No product ships without a human approving it.
Why should I pay for prompts I could write myself?
You're paying for tested, refined, documented prompts. Each prompt has been run hundreds of times across multiple models. The testing alone would take you days per prompt. Our products let you skip to the result.
How often are products updated?
Products are reviewed quarterly and updated when models change significantly. Subscribers get updates automatically.
Want to skip months of trial and error? We've distilled thousands of hours of prompt engineering into ready-to-use prompt packs that deliver results on day one. Our packs at wowhow.cloud include battle-tested prompts for marketing, coding, business, writing, and more — each one refined until it consistently produces professional-grade output.
Blog reader exclusive: Use code BLOGREADER20 for 20% off your entire cart. No minimum, no catch.
Written by
Promptium Team
Expert contributor at WOWHOW. Writing about AI, development, automation, and building products that ship.
Ready to ship faster?
Browse our catalog of 1,800+ premium dev tools, prompt packs, and templates.