
ACE Patterns Guide

This guide documents how Aegis implements patterns from two breakthrough research papers:
  1. ACE Paper (Stanford/SambaNova): “Agentic Context Engineering” - treats contexts as evolving playbooks that accumulate strategies over time
  2. Anthropic’s Long-Running Agent Harnesses: Solving the multi-context-window problem for agents that work across sessions
Key Insight: Both papers demonstrate that structured, incremental context evolution dramatically outperforms static prompts or monolithic rewrites. ACE achieved +17.1% improvement on agent benchmarks.

The Problems These Patterns Solve

Context collapse: when an LLM rewrites its entire context, it can collapse valuable accumulated knowledge:
Step 60: 18,282 tokens → Accuracy 66.7%
Step 61: 122 tokens → Accuracy 57.1% (COLLAPSED!)
Aegis Solution: Incremental delta updates that never rewrite the full context.

Brevity bias: prompt optimizers compress domain-specific heuristics into “concise” instructions, losing critical task-specific knowledge.
Aegis Solution: Memory types (reflection, strategy) that preserve detailed insights.

Premature victory: agents declare tasks complete without proper verification.
Aegis Solution: Feature tracking with explicit pass/fail status.

No memory across sessions: each new context window starts fresh with no memory of previous work.
Aegis Solution: Session progress tracking that persists between context windows.

Pattern 1: Memory Voting

ACE’s key insight: track which memories were helpful vs harmful for completing tasks.

Why It Works

Memories with positive effectiveness scores consistently improve task performance. By voting on memories, agents learn what strategies actually work.
from aegis_memory import AegisClient

client = AegisClient(api_key="...")

# After successfully using a strategy
client.vote(
    memory_id=strategy.id,
    vote="helpful",
    voter_agent_id="executor",
    context="Successfully paginated through all API results",
    task_id="task-12345"
)

# After a strategy caused an error
client.vote(
    memory_id=strategy.id,
    vote="harmful",
    voter_agent_id="executor",
    context="Caused infinite loop - range(10) wasn't enough pages",
    task_id="task-12345"
)

Querying by Effectiveness

# Only get well-rated strategies
strategies = client.playbook(
    query="API pagination handling",
    agent_id="executor",
    min_effectiveness=0.3  # Keep memories with (helpful - harmful) / (helpful + harmful + 1) > 0.3
)
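
The returned strategies can then be assembled into a playbook section of the agent's prompt. A minimal sketch, assuming each returned strategy exposes content and effectiveness attributes (these field names are illustrative, not confirmed by the API):
# Build a playbook block from well-rated strategies for the agent prompt.
# Assumes each strategy has .content and .effectiveness (illustrative field names).
playbook_lines = [
    f"- {s.content} (effectiveness: {s.effectiveness:.2f})"
    for s in strategies
]
executor_prompt = (
    "Apply these proven strategies where relevant:\n"
    + "\n".join(playbook_lines)
)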

Pattern 2: Incremental Delta Updates

ACE’s breakthrough: never rewrite the full context. Use atomic, localized updates.

Why It Works

Monolithic rewrites cause “context collapse.” Delta updates:
  • Only modify what needs to change
  • Preserve accumulated knowledge
  • Enable parallel updates
  • Reduce latency by 86.9%
result = client.delta([
    # Add a new strategy
    {
        "type": "add",
        "content": "For pagination, always use while True loop instead of range(n)",
        "memory_type": "strategy",
        "agent_id": "reflector",
        "scope": "global"
    },
    # Deprecate outdated strategy (soft delete)
    {
        "type": "deprecate",
        "memory_id": "old-pagination-strategy",
        "superseded_by": None,
        "deprecation_reason": "Caused incomplete data collection"
    }
])

Pattern 3: Reflection Memories

Extract actionable insights from task trajectories.
client.reflection(
    content="When identifying roommates, always use Phone app contacts. "
            "Never rely on Venmo transaction descriptions - they are unreliable.",
    agent_id="reflector",
    source_trajectory_id="task-12345",
    error_pattern="identity_resolution",
    correct_approach="First authenticate with Phone app, use search_contacts() "
                     "to find contacts with 'roommate' relationship.",
    applicable_contexts=["financial_tasks", "contact_tasks"],
    scope="global"
)

Pattern 4: Session Progress Tracking

Anthropic’s claude-progress.txt pattern, structured and queryable.
# Create session at start of project
session = client.progress.create(
    session_id="build-dashboard-v2",
    agent_id="coding-agent"
)

# Update as work progresses
client.progress.update(
    session_id="build-dashboard-v2",
    completed=["auth", "routing", "api-client"],
    in_progress="dashboard-components",
    next=["data-visualization", "testing"],
    blocked=[
        {"item": "payment-integration", "reason": "Waiting for Stripe API keys"}
    ],
    summary="Core infrastructure complete. Starting UI components."
)
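
When a new context window opens, the agent can reload this state instead of starting from scratch. A minimal sketch, assuming the client exposes a matching progress.get() reader (an assumption; only create and update are shown above):
# Resume work in a fresh context window (progress.get() is an assumed accessor).
session = client.progress.get(session_id="build-dashboard-v2")

briefing = (
    f"Previous summary: {session.summary}\n"
    f"Completed: {', '.join(session.completed)}\n"
    f"In progress: {session.in_progress}\n"
    f"Next: {', '.join(session.next)}"
)
# Prepend `briefing` to the agent's first prompt of the new session.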

Pattern 5: Feature Tracking

Prevent premature victory with explicit verification.
# Initialize features at project start
client.features.create(
    feature_id="new-chat",
    description="User can create a new chat and receive AI response",
    test_steps=[
        "Navigate to main interface",
        "Click 'New Chat' button",
        "Verify new conversation created",
        "Type message and press Enter",
        "Verify AI response appears"
    ]
)

# Only mark complete after verification
async def verify_and_complete(feature_id: str):
    feature = client.features.get(feature_id)

    for step in feature.test_steps:
        result = await run_test_step(step)
        if not result.passed:
            client.features.mark_failed(feature_id, reason=f"Failed: {step}")
            return False

    client.features.mark_complete(feature_id, verified_by="qa-agent")
    return True

Performance Impact

Based on the ACE paper’s benchmarks:
Metric                    Without ACE    With ACE    Improvement
Agent Tasks (AppWorld)    42.4%          59.5%       +17.1%
Financial Analysis        70.7%          78.3%       +7.6%
Adaptation Latency        Baseline       -86.9%      86.9% faster
Token Cost                Baseline       -83.6%      83.6% cheaper

Quick Reference

Memory Types

Type         Purpose                  Scope Default
standard     Facts, preferences       agent-private
strategy     Reusable patterns        global
reflection   Lessons from failures    global
progress     Session state            agent-private
feature      Feature tracking         global
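
Scope defaults can be overridden per memory when adding them through the delta API shown in Pattern 2. A minimal sketch; the “agent-private” scope string mirrors the table above and is an assumption about the exact value the API accepts:
# Store a private fact and a shared strategy in one delta call
# (the "agent-private" scope value is assumed from the table above).
client.delta([
    {
        "type": "add",
        "content": "User prefers results grouped by month",
        "memory_type": "standard",
        "agent_id": "executor",
        "scope": "agent-private"
    },
    {
        "type": "add",
        "content": "Batch API writes in groups of 50 to avoid rate limits",
        "memory_type": "strategy",
        "agent_id": "reflector",
        "scope": "global"
    }
])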

Effectiveness Score

score = (helpful - harmful) / (helpful + harmful + 1)
# Range: -1.0 to 1.0
# Positive = net helpful
# Negative = net harmful
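
For example, a memory with five helpful votes and one harmful vote clears the min_effectiveness=0.3 filter used earlier:
# Worked example: 5 helpful votes, 1 harmful vote
score = (5 - 1) / (5 + 1 + 1)  # ≈ 0.571, passes min_effectiveness=0.3
# A memory with no votes scores 0.0; the +1 in the denominator damps
# scores for rarely-voted memories.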

References

  1. ACE Paper: Zhang et al. “Agentic Context Engineering” (arXiv:2510.04618, Oct 2025)
  2. Anthropic Blog: “Effective Harnesses for Long-Running Agents” (2025)