How WOWHOW’s AI pipeline works: the Forge system, quality gates, content generation, and lessons learned from building an always-on AI product factory.
WOWHOW isn’t just a storefront. Behind the product pages and prompt packs is a fully automated AI product factory — a pipeline we call “the Forge” that generates, tests, refines, and ships digital products 24 hours a day, 7 days a week.
This is the story of how we built it, what we learned, and what we’d do differently.
Why We Built the Forge
When we started WOWHOW, we were creating prompt packs manually. One person would write prompts, another would test them across models, a third would write documentation, and someone else would build the product page.
A single prompt pack took 40-60 hours to go from idea to published product.
We knew this wouldn’t scale. The demand for quality prompt packs was growing faster than our team could produce them. We needed a system that could:
- Generate prompt candidates automatically
- Test them across multiple AI models
- Score quality objectively
- Generate documentation and marketing copy
- Build product pages
- Handle the entire pipeline with minimal human intervention
How the Forge Works
Stage 1: Idea Generation
The pipeline starts with market research. We monitor:
- Search trends for AI and prompt-related queries
- Social media discussions about AI pain points
- Customer support tickets and feature requests
- Competitor product launches
An AI system analyzes these signals and generates product briefs — descriptions of prompt packs that would address real market demand.
Stage 2: Prompt Generation
For each product brief, the system generates candidate prompts using a multi-model approach:
- Claude generates initial prompt candidates
- GPT generates alternative versions
- A “remix” agent combines the best elements
- Each candidate goes through 3 rounds of self-refinement
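The generation steps above can be sketched roughly as follows. This is a minimal illustration, not the Forge's actual code: the `Model` type, the function names, and the instruction strings are all hypothetical, and each "model" is just a callable standing in for a real API client. Only the three refinement rounds and the generate/remix/refine structure come from the description above.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a model client: takes an instruction, returns text.
Model = Callable[[str], str]

REFINEMENT_ROUNDS = 3  # from the article: 3 rounds of self-refinement


def refine(model: Model, prompt: str, rounds: int = REFINEMENT_ROUNDS) -> str:
    """Ask the model that produced a prompt to improve it `rounds` times."""
    for _ in range(rounds):
        prompt = model(f"Improve this prompt:\n{prompt}")
    return prompt


def generate_candidates(brief: str, generators: List[Model], remix: Model) -> List[str]:
    """Return one refined candidate per generator, plus one refined remix."""
    # Each generator (e.g. one per vendor) drafts a candidate from the brief.
    drafts: List[Tuple[Model, str]] = [
        (g, g(f"Write a prompt for this brief:\n{brief}")) for g in generators
    ]
    # The remix agent sees all drafts and combines their best elements.
    combined = remix(
        "Combine the best elements of:\n" + "\n---\n".join(text for _, text in drafts)
    )
    # Assumption: "self-refinement" means each candidate is refined by its own author.
    return [refine(model, text) for model, text in drafts + [(remix, combined)]]
```

With two generators and one remix agent, this yields three candidates, each of which has been through three refinement calls before it reaches quality testing.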
Stage 3: Quality Testing
This is the most critical stage. Every prompt is tested:
- Multi-model testing — run on Claude, GPT, and Gemini
- Consistency testing — run 5 times on each model to check variance
- Quality scoring — automated scoring on relevance, completeness, clarity, and usefulness
- Edge case testing — deliberately difficult inputs to stress-test prompts
Prompts must score 8/10 or higher across all models to pass. About 60% of generated prompts fail this gate.
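One way to picture the gate is as a function over per-model, per-run scores. The sketch below is an assumption-laden illustration: the 8/10 threshold and the 5 runs per model come from the text above, but the use of the mean score per model, the `max_spread` variance limit, and the function name `passes_gate` are hypothetical choices made for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List

PASS_THRESHOLD = 8.0  # from the article: 8/10 or higher
RUNS_PER_MODEL = 5    # from the article: 5 runs on each model


def passes_gate(
    scores: Dict[str, List[float]],
    threshold: float = PASS_THRESHOLD,
    max_spread: float = 1.0,  # hypothetical limit on run-to-run variation
) -> bool:
    """Return True only if every model scores well and consistently.

    `scores` maps a model name to its list of per-run quality scores (0-10).
    """
    if not scores:
        return False  # nothing was tested, so nothing can pass
    for runs in scores.values():
        if len(runs) < RUNS_PER_MODEL:
            return False  # not enough runs to judge consistency
        if mean(runs) < threshold:
            return False  # assumption: the per-model mean must reach the threshold
        if pstdev(runs) > max_spread:
            return False  # output varies too much between runs
    return True
```

A prompt that averages 9/10 on one model but swings between 5 and 10 across runs would fail here on consistency, even though its mean clears the threshold.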
Stage 4: Documentation
Passed prompts get automated documentation:
- Usage instructions
- Customization tips
- Example outputs from each model
- Known limitations
- Suggested modifications for specific use cases
Stage 5: Product Assembly
The system packages everything into a product:
- Product page content (title, description, features, FAQ)
- Cover image (generated with AI, reviewed by humans)
- Pricing recommendation (based on market analysis)
- SEO metadata
- Downloadable prompt pack file
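As a rough picture of what gets handed to human review, the assembled product could be modeled as a single record. The class and field names below are hypothetical; they simply mirror the components listed above, plus a flag for the human review of the cover image.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ProductPackage:
    """Hypothetical container for the parts assembled in Stage 5."""
    title: str
    description: str
    features: List[str]
    faq: List[str]
    cover_image_path: str
    cover_image_reviewed: bool  # the cover is AI-generated, then reviewed by a human
    recommended_price: float
    seo_metadata: Dict[str, str]
    prompt_pack_path: str

    def missing_parts(self) -> List[str]:
        """Return the names of components that are empty or not yet reviewed."""
        missing = [
            name for name, value in vars(self).items() if value in ("", [], {}, None)
        ]
        if not self.cover_image_reviewed:
            missing.append("cover_image_reviewed")
        return missing
```

A check like `missing_parts()` is one simple way to stop an incomplete package from reaching a reviewer; an empty list means every component is present.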
Stage 6: Human Review
This is the one stage that always requires a human. Before any product goes live, a team member:
- Reviews the prompts for quality and accuracy
- Tests the prompts personally to verify they work as documented
- Reviews the product page for accuracy and brand consistency
- Approves the product or sends it back for revision
About 20% of products that pass automated testing get sent back at this stage.