Hermes procedural memory (May 7) vs Anthropic Dreaming (May 6): 9-scenario decision matrix. When each wins, where neither works, and when to run both.
On May 6, 2026, Anthropic announced Dreaming for Managed Agents. On May 7, Nous Research shipped Hermes v0.13.0 with hardened procedural memory. Twenty-four hours apart. Both answer the same core problem — agents that forget what they learned. I run both in production at WOWHOW: a self-hosted Hermes agent handles the research and SEO pipeline, while a Managed Agents workflow powered by Claude handles the editorial approval loop. The overlap is real, but the architectures they live in are completely different. After running them in parallel for over a week, here is what I actually learned about when each one wins, where neither works, and the one scenario where you want both.
The Problem Both Are Solving
Every long-running agent hits the same wall. Sessions end. Contexts reset. The agent that spent three hours learning the quirks of your codebase wakes up the next morning with no memory of any of it. You rebuild from scratch, bleed tokens, and get inconsistent results. The agent is stateless by default because the underlying model is stateless by default. Memory is the gap between a tool and a collaborator.
Procedural memory specifically — not episodic memory (what happened), not semantic memory (what facts are true) — is the gap that matters most for multi-session work. Procedural memory is the “how.” How this codebase handles errors. How this team formats commit messages. How this customer responds to follow-up questions. The know-how that makes an agent useful the hundredth time, not just the first.
Hermes and Dreaming both target procedural memory. They disagree entirely on where it lives and how it gets updated.
The Shape of Each One
Hermes Skills: Markdown on Disk, Git-Versionable, Model-Agnostic
Hermes skills are plain text files. That is not a limitation — it is the design. Each skill is a markdown file with YAML frontmatter stored in ~/.hermes/skills/. The agent loads relevant skills at the start of each session based on tag matching and relevance scoring. The skill content becomes part of the system prompt. There is no database, no embedding store, no API call to fetch them. The agent simply reads text files.
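To make the load step concrete, here is a minimal sketch of a tag-matched loader. This is my illustration, not Hermes source code: the real loader also does relevance scoring, which I skip here, and every name below is hypothetical.

from pathlib import Path
import yaml  # third-party: pyyaml

SKILLS_DIR = Path.home() / ".hermes" / "skills"

def load_skills(session_tags: set[str]) -> str:
    """Collect the body of every skill whose tags overlap the session's tags."""
    selected = []
    for path in sorted(SKILLS_DIR.rglob("*.md")):
        parts = path.read_text().split("---", 2)
        if len(parts) < 3:
            continue  # no YAML frontmatter, skip
        _, frontmatter, body = parts
        meta = yaml.safe_load(frontmatter) or {}
        if session_tags & set(meta.get("tags", [])):
            selected.append(body.strip())
    # The matched skill text becomes part of the system prompt
    return "\n\n".join(selected)

base_prompt = "You are the WOWHOW research agent."
system_prompt = base_prompt + "\n\n" + load_skills({"writing", "seo"})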
Here is what a real Hermes skill file looks like — this is the one I use for blog post formatting:
---
skill_id: blog-post-formatting
version: 1.4.0
tags: [writing, blog, seo, wowhow]
triggers:
- blog post
- article
- write content
- publish
priority: high
last_refined: 2026-05-09
refinements: 7
---
# Blog Post Formatting Skill
## Core Rules
- Lead paragraph must answer the primary question in under 100 words
- H2 headings use title case; H3 headings use sentence case
- Code blocks require explicit language tags (python, yaml, bash, typescript)
- Never use "delve", "moreover", "it is worth noting", "in conclusion"
- Financial claims require inline citations with primary source URLs
- Word counts: PILLAR 8k-12k / STANDARD 1.5k-2.5k / QUICK 600-900
## Category Mapping
- Industry analysis → category: industry-insights
- Tool tutorials → category: ai-tools-tutorials
- Finance / tax → category: personal-finance, add legal disclaimer
## SEO Requirements
- Title: 50-65 chars, include current year for evergreen guides
- Meta description: 140-160 chars, primary keyword in first 60 chars
- seo_keywords array: 5-8 entries, no keyword stuffing
## File Output
- New posts go to src/data/blog-posts/YYYY-MM-slug.ts
- Export name: posts[YYYYMM][PascalCaseSlug]
- Import + spread in src/data/blog-posts.ts
- Slug added to POST_ORDER array at position 0 (most recent first)
This skill has been refined seven times since it was created. Each refinement was a deliberate edit to a text file — either by me directly or by the agent writing a patch to its own skill files after a session. The entire history is in git. I can see exactly what changed in refinement three, when I realized the meta description rule was being ignored. I can revert to v1.2.0 if a new refinement breaks something. That is not a feature I had to build — that is just how text files and git work together.
The key architectural properties of Hermes skills:
- Storage: Plain markdown files on disk, anywhere you choose
- Model-agnostic: The skill content is just text — load it with Claude, GPT-5.5, Llama, whatever
- Version control: Standard git. Every change is tracked, reversible, diffable
- Trigger: Loaded at session start based on YAML frontmatter tags
- License: MIT. No API key required to read your own files
- Self-improvement: The agent can write patches to its own skill files via a structured tool call
The self-improvement path is the part that surprised me most. Hermes v0.13.0 ships with a refine_skill tool that lets the agent propose amendments to its own skill files. You approve, the file is patched, the version increments. The agent gets better at blog formatting because it noticed its own formatting outputs were inconsistent and wrote the fix itself. This happens entirely locally, with zero external API calls for the memory update itself.
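The mechanics are simple enough to sketch. Assuming a proposal arrives as old text, new text, and a rationale (my shape, not necessarily what refine_skill emits), the approve-and-patch step could look like this:

from pathlib import Path
import re

def apply_refinement(skill_path: Path, old_text: str, new_text: str, rationale: str) -> bool:
    """Show a proposed amendment, apply it on approval, and bump the version."""
    content = skill_path.read_text()
    if old_text not in content:
        print("Proposal no longer matches the file; skipping")
        return False
    print(f"Proposed change to {skill_path.name}: {rationale}")
    print(f"- {old_text}\n+ {new_text}")
    if input("Approve? [y/N] ").strip().lower() != "y":
        return False
    content = content.replace(old_text, new_text, 1)
    # Bump the patch component of the semver in the frontmatter
    def bump(m: re.Match) -> str:
        major, minor, patch = m.group(1).split(".")
        return f"version: {major}.{minor}.{int(patch) + 1}"
    skill_path.write_text(re.sub(r"version:\s*(\d+\.\d+\.\d+)", bump, content, count=1))
    return True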
Anthropic Dreaming: Async Consolidation, Managed Infrastructure, API-Native
Dreaming is architecturally different at every layer. Memory does not live in files — it lives in a managed memory store that Anthropic hosts. Dreaming is not triggered by session start — it runs as an asynchronous batch job after sessions have completed. And it does not amend text files — it runs Opus or Sonnet over a batch of sessions to extract patterns and consolidate them into a new memory store.
Here is the actual Python API for triggering a Dreaming job:
import anthropic
import time

client = anthropic.Anthropic()

# Trigger a Dreaming consolidation job
dream_job = client.beta.dreams.create(
    memory_store_id="memstore_01abc...",
    session_ids=[
        "sess_01abc...",
        "sess_02def...",
        # up to 100 sessions
    ],
    model="claude-opus-4-7-20260301",
    instructions="""
    Focus on procedural patterns: how the agent resolved ambiguous requests,
    error recovery strategies that worked, user communication preferences,
    and any domain-specific conventions that emerged across sessions.
    Discard one-off facts and session-specific context.
    Preserve generalizable behavioral patterns only.
    """,
)

print(f"Dream job ID: {dream_job.id}")
print(f"Status: {dream_job.status}")  # 'pending' | 'running' | 'complete' | 'failed'

# Poll for completion (typically 2-8 minutes for 100 sessions)
while dream_job.status not in ("complete", "failed"):
    time.sleep(30)
    dream_job = client.beta.dreams.retrieve(dream_job.id)
    print(f"Status: {dream_job.status} — elapsed: {dream_job.elapsed_seconds}s")

# The new memory store is ready
if dream_job.status == "complete":
    new_memory_store_id = dream_job.output_memory_store_id
    print(f"New memory store: {new_memory_store_id}")
    # Use this store_id when creating the next agent session
And here is how you wire the resulting memory store into a new Managed Agent session:
import anthropic

client = anthropic.Anthropic()

# Create an agent session that uses the post-Dreaming memory store
session = client.beta.managed_agents.sessions.create(
    agent_id="agent_01...",
    memory_store_id=new_memory_store_id,  # Output from the Dream job
    initial_message="Resume work on the Q2 reporting pipeline.",
)

print(f"Session: {session.id}")
print(f"Agent picks up with consolidated memory from {new_memory_store_id}")
The key architectural properties of Dreaming:
- Storage: Anthropic-managed memory stores. Not your disk, not your database
- Model dependency: Runs on Claude (Opus or Sonnet). The memory store is Claude-native
- Trigger: You trigger it explicitly via API after session batches complete
- Refinement velocity: Can consolidate up to 100 sessions in a single async job
- Version control: The old memory store is not overwritten — the job produces a new store ID. Both exist until you delete one
- License: Anthropic Managed Agents pricing. Compute cost applies to the consolidation job
The non-destructive update model is worth pausing on. A Dreaming job never overwrites your existing memory store. It produces a new one. You choose which store to use on the next session. This means you can A/B test memory stores — run ten sessions with the old store, ten with the post-Dream store, compare output quality, promote the winner. That is a memory versioning pattern most developers would have to build from scratch if they were rolling their own episodic store.
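Using the session-creation call from above, the A/B test is a short loop. The eval harness below is a placeholder you would supply; only the session call follows the documented shape:

import anthropic

client = anthropic.Anthropic()

EVAL_PROMPTS = [
    "Summarize yesterday's pipeline run.",
    "Draft the weekly status note.",
]

def rate_output(session) -> float:
    # Placeholder: plug in your own eval (human review, LLM judge, regression checks)
    return 0.0

def score_store(memory_store_id: str) -> float:
    """Run one session per eval prompt against a store and average the scores."""
    scores = []
    for prompt in EVAL_PROMPTS:
        session = client.beta.managed_agents.sessions.create(
            agent_id="agent_01...",
            memory_store_id=memory_store_id,
            initial_message=prompt,
        )
        scores.append(rate_output(session))
    return sum(scores) / len(scores)

old_score = score_store("memstore_01abc...")  # pre-Dream store
new_score = score_store("memstore_02xyz...")  # output_memory_store_id from the job
print("Promote new store" if new_score > old_score else "Keep old store")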
9-Scenario Decision Matrix
I mapped every real use case I have encountered against both systems. Here is where each one actually wins.
Scenario 1: Self-Hosted Everything — Hermes Wins
Your data cannot leave your infrastructure. Your security policy forbids cloud API calls for memory operations. You are in a regulated environment (HIPAA, SOC 2, financial services) where the audit trail must live on your own systems. Dreaming is immediately disqualified — the memory consolidation job runs on Anthropic’s infrastructure. Hermes skills live in a directory you control, on hardware you own, with no external network calls for memory operations. The agent reads files. That is auditable, isolatable, and compliant.
# Hermes skill directory can live anywhere
export HERMES_SKILLS_DIR=/encrypted/vault/agent-skills
# Git the skills dir for audit trail
cd /encrypted/vault/agent-skills
git log --follow -p billing-calculation.md
# Full history of every change to this skill, who made it, when
Scenario 2: Already Running on Managed Agents — Dreaming Wins
You are already using Anthropic’s Managed Agents API. Your sessions produce memory stores automatically. The Dreaming API call is three lines of Python. The memory improvement is built into your existing architecture with no new infrastructure. Hermes would require you to spin up a separate skill directory, implement a skill-loading layer, and figure out how to inject skills into managed agent prompts — none of which is impossible, but all of which is unnecessary if you are already in the Anthropic ecosystem.
Scenario 3: Version Control for Memory — Hermes Wins
You need to know exactly what the agent believed on a specific date. You need to roll back memory to before a bad training signal corrupted a skill. You need to audit which refinement introduced a regression. Hermes skills are files. Git tracks every change with author, timestamp, and diff. The complete version history of your agent’s procedural knowledge is available with git log.
# See the full history of a specific skill (run these inside the skills repo)
cd ~/.hermes/skills
git log --oneline --follow code-review.md
# See what changed in a specific refinement (path is relative to the repo root)
git show abc1234:code-review.md
# Revert a skill to a previous version
git checkout abc1234 -- code-review.md
# See all skill refinements across the project in the last 30 days
git log --since="30 days ago" --name-only --pretty=format: -- "*.md" | grep -v "^$" | sort | uniq -c | sort -rn
Dreaming’s non-destructive store model gives you some version control — old stores are preserved — but you cannot diff two memory stores the way you can diff two markdown files. You cannot easily see which behavioral pattern changed between store v3 and store v4. The versioning granularity is “before this batch of 100 sessions” versus “after,” not “this specific procedural rule was amended on May 12 at 14:37.”
Scenario 4: Swapping Models Mid-Project — Hermes Wins
You started with Claude Opus 4.7. You want to test Llama 4 Scout locally for cost reasons. Or GPT-5.5 for a specific capability. Or a fine-tuned model your company trained internally. Hermes skills are plain text. They load into whatever model you point the agent at. Your procedural memory is not locked to a model vendor. Switch models, reload the same skills, observe how the new model interprets them — and add a new skill if the new model needs different formatting guidance to behave consistently.
Dreaming memory stores are Claude-native. They are produced by Claude running over your sessions, formatted for Claude’s context, and designed to be consumed by Claude. If you switch to GPT-5.5, the memory store does not transfer. You start over. That is not a criticism — it is a natural consequence of managed infrastructure — but it matters a lot if model-agnosticism is an architectural requirement.
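The portability claim is easy to demonstrate: the same skill text rides along as the system prompt no matter the provider. A sketch, with the model names taken from the scenario above:

from pathlib import Path
import anthropic
import openai

skill = Path.home() / ".hermes" / "skills" / "blog-post-formatting.md"
system_prompt = "You are the WOWHOW writing agent.\n\n" + skill.read_text()

# Same procedural memory, two different providers
claude_reply = anthropic.Anthropic().messages.create(
    model="claude-opus-4-7-20260301",
    max_tokens=1024,
    system=system_prompt,
    messages=[{"role": "user", "content": "Draft the lead paragraph."}],
)
gpt_reply = openai.OpenAI().chat.completions.create(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "Draft the lead paragraph."},
    ],
)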
Scenario 5: No Memory Management Overhead — Dreaming Wins
You do not want to curate skill files. You do not want to review refinement proposals. You do not want to think about which tags to apply so the right skills load at session start. You want the agent to get better and for that improvement to happen automatically without you touching anything. Dreaming is the right tool. Trigger it after every hundred sessions, wire the new store ID into subsequent sessions, let Opus consolidate the behavioral patterns. The maintenance burden is minimal: trigger the job, check it completed, update the store ID reference. Three API calls total.
Scenario 6: Rapid Improvement Across Many Sessions — Dreaming Wins
You have an agent handling customer support. It runs 500 sessions per day. The behavioral patterns that work emerge quickly — the phrases that de-escalate, the response length that gets the fastest resolution, the escalation triggers that matter. You want those patterns consolidated into usable memory within 24 hours, not accumulated gradually over weeks of manual skill curation. Dreaming processes up to 100 sessions per job, so a 500-session day means chaining five jobs, each starting from the store the previous one produced. Schedule the chain to run nightly, and by morning the agent has consolidated the prior day’s learning into its memory store. Manual skill curation cannot match this velocity at scale.
import anthropic
import time
from datetime import datetime, timedelta

client = anthropic.Anthropic()

def nightly_dream(agent_id: str, current_memory_store_id: str) -> str:
    """
    Run after each day's sessions complete.
    Returns the new memory store ID to use tomorrow.
    """
    yesterday = datetime.utcnow() - timedelta(days=1)

    # Fetch yesterday's session IDs
    sessions = client.beta.managed_agents.sessions.list(
        agent_id=agent_id,
        created_after=yesterday.isoformat(),
        limit=100,
    )
    if not sessions.data:
        print("No sessions to consolidate")
        return current_memory_store_id

    session_ids = [s.id for s in sessions.data]
    print(f"Consolidating {len(session_ids)} sessions...")

    dream_job = client.beta.dreams.create(
        memory_store_id=current_memory_store_id,
        session_ids=session_ids,
        model="claude-sonnet-4-6-20260101",  # Sonnet is faster + cheaper for nightly runs
        instructions="Extract and consolidate behavioral patterns that improved resolution rate.",
    )

    # Wait for completion
    while dream_job.status not in ("complete", "failed"):
        time.sleep(30)
        dream_job = client.beta.dreams.retrieve(dream_job.id)

    if dream_job.status == "failed":
        print(f"Dream job failed: {dream_job.error}")
        return current_memory_store_id  # Fall back to previous store

    print(f"New memory store ready: {dream_job.output_memory_store_id}")
    return dream_job.output_memory_store_id

# Nightly cron job
if __name__ == "__main__":
    new_store_id = nightly_dream(
        agent_id="agent_01...",
        current_memory_store_id="memstore_01...",
    )
    # Persist new_store_id for tomorrow's sessions
Scenario 7: Audit What the Agent Learned — Hermes Wins
A compliance officer asks you to produce a complete list of behavioral rules the agent operates under. Or a customer asks why the agent responded a certain way. Or you need to verify that a specific harmful behavior pattern was removed after a refinement. With Hermes skills, you can open a folder and read every rule the agent follows, organized by skill file, versioned in git, with the exact date each rule was added or changed. The audit is a git log away.
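For the flat list itself, a few lines of Python over the skills directory produce the exact artifact a reviewer asks for: every rule, grouped by skill, with the version it ships in. A minimal sketch, assuming rules are markdown bullets as in my skill files:

from pathlib import Path

SKILLS_DIR = Path.home() / ".hermes" / "skills"

# Print every behavioral rule the agent currently operates under
for path in sorted(SKILLS_DIR.rglob("*.md")):
    parts = path.read_text().split("---", 2)
    frontmatter, body = (parts[1], parts[2]) if len(parts) == 3 else ("", parts[0])
    version = next(
        (l.split(":", 1)[1].strip() for l in frontmatter.splitlines() if l.startswith("version:")),
        "?",
    )
    print(f"\n{path.stem} (v{version})")
    for line in body.splitlines():
        if line.startswith("- "):  # rules are markdown bullets
            print(f"  {line}")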
With Dreaming, the consolidated memory store is a vector representation. You can retrieve memories from it, but you cannot print a flat list of “here are all the behavioral rules this agent follows, in plain English, sorted by when they were established.” The memory is implicit in the store, not explicit in a readable format. For regulated industries where behavioral auditability is a hard requirement, that distinction matters enormously.
Scenario 8: Single-User Personal Agent — Hermes Wins
You want a personal agent that learns your preferences. How you like code formatted. Which topics you find most interesting. Your communication style. The level of detail you want in summaries. This is low-volume (one user, maybe ten sessions per day), high-personalization, indefinite time horizon. Hermes is zero friction here: create a ~/.hermes/skills/personal/ directory, write a first pass of your preferences, let the agent refine them over time. No API keys beyond your inference provider. No managed infrastructure. No per-session cost for memory operations. Your preferences live in text files on your machine until you decide otherwise.
Scenario 9: Want Both — Dual-Stack Wins
This is where I have landed for the WOWHOW research pipeline. Hermes skills serve as the lingua franca — the human-readable, git-versioned, model-agnostic layer that defines how the agent behaves. Dreaming runs over session batches to extract behavioral patterns, but instead of using its output as a memory store directly, I have a lightweight adapter that converts the Dream output into Hermes skill amendments. The output of Dreaming becomes input to skill curation. The skills remain the authoritative source of truth. The Dreaming consolidation accelerates the curation process.
This is the pattern I believe will become the standard for production agents that need both velocity of improvement (Dreaming) and auditability plus portability (Hermes). The bridge is the opportunity I will come back to at the end.
Comparison Table
| Dimension | Hermes Skills | Anthropic Dreaming |
|---|---|---|
| License | MIT (open source) | Anthropic Managed Agents pricing |
| Storage | Markdown files on your disk | Anthropic-managed memory stores |
| Trigger | Session start (tag-matched load) | Explicit API call (async batch job) |
| Model dependency | None — plain text works everywhere | Claude only (Opus or Sonnet for consolidation) |
| Refinement velocity | One session at a time, human-approved | Up to 100 sessions per batch job |
| Version control | Git-native (full history, diffs, rollback) | Non-destructive stores (old store preserved, no diffing) |
| Auditability | Full — readable text, git log, human-reviewable | Partial — can retrieve memories, not flat-listable |
| Self-hosted | Yes — files live on your machine | No — consolidation runs on Anthropic infra |
| Cost | No memory-specific cost (inference cost only) | Compute cost for each Dream job |
| Setup friction | Medium — skill dir, tags, triggers to configure | Low — three API calls if already on Managed Agents |
| Best for | Self-hosted, multi-model, compliance-sensitive, personal agents | High-volume managed agents on Anthropic stack |
Decision Tree
Before picking one, run through this decision tree:
START
│
├─ Is this a self-hosted environment (data cannot leave your infra)?
│ YES → HERMES SKILLS (Dreaming is disqualified)
│
├─ Are you already on Anthropic Managed Agents?
│ YES → DREAMING (lowest friction path)
│
├─ Do you need to swap models mid-project?
│ YES → HERMES SKILLS (model-agnostic memory)
│
├─ Do you need plain-language behavioral auditing?
│ YES → HERMES SKILLS (git + markdown = full audit trail)
│
├─ Are you running >50 sessions/day that generate behavioral patterns?
│ YES → DREAMING (velocity beats manual curation at this volume)
│
├─ Single-user personal agent, low volume?
│ YES → HERMES SKILLS (zero friction, no API cost)
│
├─ High-volume managed service, no audit requirement?
│ YES → DREAMING
│
└─ High-volume + audit + model flexibility?
→ DUAL-STACK: Dreaming for velocity, Hermes as authoritative layer
Where Neither Works
Both systems have failure modes that the documentation does not highlight.
Hermes struggles with emergent behavior across many users. If you have 10,000 users each with their own agent sessions, Hermes skill curation does not scale. You would need per-user skill directories, per-user git repos, per-user refinement approval flows. The architecture that works beautifully for a single developer’s personal agent becomes an operational nightmare at user scale. Dreaming with per-user memory stores handles this naturally.
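The per-user pattern in Dreaming reduces to a mapping from user to store, plus per-user consolidation jobs. A sketch reusing the nightly_dream helper from Scenario 6 (the persistence helpers are hypothetical):

# user_id -> memory_store_id, persisted wherever you keep user state
user_stores: dict[str, str] = load_user_store_map()  # hypothetical persistence helper

for user_id, store_id in user_stores.items():
    # Each user's sessions consolidate into that user's own store
    user_stores[user_id] = nightly_dream(
        agent_id=f"agent_{user_id}",
        current_memory_store_id=store_id,
    )

save_user_store_map(user_stores)  # hypothetical persistence helper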
Dreaming struggles with behavioral conflicts. When you feed 100 sessions from heterogeneous users into a single Dreaming job, the consolidation has to resolve contradictory patterns. User A prefers concise responses. User B prefers detailed explanations. The Dream job has to make a choice, and that choice may suit neither user well. Hermes handles this by having separate skill files for separate use cases — you are explicit about which rules apply where. Dreaming’s consolidation is a black-box averaging that can wash out legitimate variation.
Both struggle with catastrophic forgetting. A Hermes skill refinement that overwrites a critical rule with a wrong one is not automatically caught — you need to review proposals before approving them. A Dreaming job that pulls in 100 sessions including sessions from an adversarial user or a bad input day can corrupt the memory store with patterns you do not want. Neither system has built-in semantic validation to catch “this new rule contradicts a rule established two months ago.” That validation is your job.
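That validation layer is not hard to bolt on in crude form. One approach, and this is my sketch rather than a built-in of either system, is to have a model check each proposed amendment against the established rules before you approve it:

import anthropic

client = anthropic.Anthropic()

def flags_contradiction(existing_skill: str, proposed_rule: str) -> bool:
    """Ask a model whether a proposed rule contradicts any established rule."""
    response = client.messages.create(
        model="claude-sonnet-4-6-20260101",
        max_tokens=16,
        system="Answer YES or NO only.",
        messages=[{
            "role": "user",
            "content": f"Existing rules:\n{existing_skill}\n\n"
                       f"Proposed new rule:\n{proposed_rule}\n\n"
                       "Does the proposed rule contradict any existing rule?",
        }],
    )
    return response.content[0].text.strip().upper().startswith("YES")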
The Real Numbers: What This Costs
Running Hermes skills in production for three months: the only cost is disk space (my skill directory is 340KB) and the marginal inference cost of including skills in system prompts. At roughly 2,000 tokens per skill load, and 50 sessions per day, that is 100,000 tokens per day in skill-loading overhead. At Claude Sonnet pricing that is roughly $0.30/day. The memory improvement value is not quantifiable precisely, but the agent producing consistent outputs without re-briefing every session saves at least 30 minutes of my time per day. The math is not close.
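The arithmetic, for anyone tuning their own numbers (the $3-per-million-token input rate is my assumption for Sonnet-class pricing):

tokens_per_skill_load = 2_000
sessions_per_day = 50
input_price_per_mtok = 3.00  # USD, assumed Sonnet-class input rate

daily_tokens = tokens_per_skill_load * sessions_per_day       # 100,000 tokens/day
daily_cost = daily_tokens / 1_000_000 * input_price_per_mtok  # ≈ $0.30/day
print(f"{daily_tokens:,} tokens/day → ${daily_cost:.2f}/day")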
Running Dreaming for a hypothetical 100-session nightly batch: a Dreaming job using Sonnet for consolidation processes 100 sessions in roughly 4-6 minutes of compute. Exact pricing is not public yet — Anthropic has not released per-session Dream job pricing, only that it bills against Managed Agents compute. My rough estimate based on the batch job scope is $0.50-$2.00 per nightly run. At scale with 1,000 sessions/day and multiple agents, this becomes meaningful. At 100 sessions/day for a single agent, it is negligible.
Where I’m Placing the Bet
I have thought about this for ten days since both systems shipped, and here is my actual position.
Hermes skills become the open standard for procedural memory. The format is trivially simple — YAML frontmatter, markdown body, a directory. Nothing about it requires Nous Research’s involvement to work. It is already happening: I have seen three open-source projects in the last two weeks that adopted the ~/.hermes/skills/ convention without using the Hermes agent at all. They just liked the format. When a standard is simple enough that people copy it without the tool, the standard wins. Procedural memory for AI agents will eventually be a solved category with an interoperable format, and the current trajectory points at something very close to what Hermes ships today.
Dreaming becomes table stakes for managed agent platforms. Within twelve months, every cloud agent platform — AWS Bedrock Agents, Google ADK, Azure AI Agents, and obviously Anthropic Managed Agents — will offer some version of async session consolidation. The capability is too valuable and the architecture too obvious for platforms not to ship it. Dreaming’s advantage is first-mover in a managed context. By the time every platform has this feature, being first will matter less than being deeply integrated with the rest of the Anthropic stack (which it is).
The bridge between them is the real opportunity. What does not exist today is a clean, production-ready adapter that takes the output of a Dreaming job — or any managed consolidation job — and translates it into Hermes skill amendments. A tool that watches your memory store diffs, identifies which behavioral patterns changed, generates candidate skill file patches, and queues them for human review. That tool would let you use Dreaming for velocity and Hermes for auditability. It would let you use managed consolidation and still be model-agnostic. It would let you get out of the Anthropic ecosystem if you need to without losing your accumulated procedural memory.
I am building a rough version of this adapter for the WOWHOW pipeline. It is not clean enough to open-source yet, but the pattern is working: nightly Dream jobs feed into a lightweight converter that proposes Hermes skill amendments, I approve the ones that look right, and the skill files capture the durable behavioral knowledge. The managed store captures session history. The two layers serve different purposes and stop competing.
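For anyone attempting the same bridge, the core loop is roughly the following. The memory store retrieval call is my assumption, since Anthropic has not documented a way to enumerate a store's contents:

import anthropic
from pathlib import Path

client = anthropic.Anthropic()
PROPOSALS = Path.home() / ".hermes" / "skills" / "_proposals.md"

def propose_skill_amendments(old_store_id: str, new_store_id: str) -> list[str]:
    """Turn the delta between two memory stores into candidate skill rules."""
    # Assumed API: a plain-text dump of each store's consolidated patterns
    old_mem = client.beta.memory_stores.retrieve(old_store_id)
    new_mem = client.beta.memory_stores.retrieve(new_store_id)
    response = client.messages.create(
        model="claude-sonnet-4-6-20260101",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": "Compare these two procedural memory snapshots. For each "
                       "behavioral pattern present in the new one but not the old, "
                       "emit one markdown bullet suitable for a Hermes skill file.\n\n"
                       f"OLD:\n{old_mem}\n\nNEW:\n{new_mem}",
        }],
    )
    return [l for l in response.content[0].text.splitlines() if l.startswith("- ")]

# Proposed rules go into a review queue, never straight into the skill files
with PROPOSALS.open("a") as f:
    for rule in propose_skill_amendments("memstore_01abc...", "memstore_02xyz..."):
        f.write(rule + "\n")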
Practical Starting Points
If you are reading this and want to get started today, here is the minimum viable path for each option.
For Hermes skills:
# Install Hermes
pip install hermes-agent
# Initialize skill directory
hermes init
# Create your first skill manually
mkdir -p ~/.hermes/skills
cat > ~/.hermes/skills/my-first-skill.md <<'EOF'
---
skill_id: my-first-skill
version: 1.0.0
tags: [general, communication]
triggers:
- write
- respond
- message
priority: medium
---
# Communication Style Skill
## Response Format
- Use plain language, avoid jargon unless the user uses it first
- Keep responses under 300 words unless detail is explicitly requested
- Always confirm the task before executing if scope is ambiguous
## Tone
- Direct, not terse
- Acknowledge uncertainty explicitly rather than hedging with qualifiers
EOF
# Start an agent session that loads this skill
hermes chat --tags general,communication
For Dreaming (assuming you are on Managed Agents):
import anthropic

client = anthropic.Anthropic()

# 1. Collect session IDs from the past week
sessions = client.beta.managed_agents.sessions.list(
    agent_id="your_agent_id",
    limit=50,
)
session_ids = [s.id for s in sessions.data]

# 2. Create a memory store if you don't have one
memory_store = client.beta.memory_stores.create(
    name="agent-procedural-memory-v1"
)

# 3. Run the Dream job
dream = client.beta.dreams.create(
    memory_store_id=memory_store.id,
    session_ids=session_ids,
    model="claude-sonnet-4-6-20260101",
    instructions="Extract behavioral patterns and procedural preferences.",
)

print(f"Dream job: {dream.id} — {dream.status}")
# Poll until complete, then use dream.output_memory_store_id
Conclusion
The twenty-four-hour gap between Dreaming and Hermes v0.13.0 is a coincidence not worth over-reading. These are not competing products racing to capture the same market. They are implementations of the same concept at different layers of the stack — one optimized for managed infrastructure and velocity at scale, the other for transparency, portability, and compliance. The question is never “which one is better.” It is always “which one fits the layer I am building at.”
If your agent stack is entirely on Anthropic Managed Agents and you are running hundreds of sessions per day, set up Dreaming tonight. The API is straightforward, the improvement is real, and the setup overhead is minimal. If you are self-hosted, multi-model, or in any environment where “my memory is in a cloud I do not control” is a problem, start with Hermes skills. Create three skill files. Run the agent for a week. Review the refinement proposals it generates. You will be surprised how quickly the procedural memory compounds into something genuinely useful.
And if you are building the bridge between them — the adapter that translates managed consolidation output into portable skill amendments — I want to know about it. That tool does not exist yet. When it does, it closes the last architectural gap between the two best procedural memory systems that shipped this month.
Written by
Anup Karanjkar
Expert contributor at WOWHOW. Writing about AI, development, automation, and building products that ship.