Performance Benchmarks

Test Environment

  • Hardware: 8 vCPU (Intel 13th Gen), 7.6 GB RAM
  • Stack: Docker Compose — PostgreSQL 16 + pgvector, FastAPI
  • Dataset: 1000 memories, 100 queries, 20 cross-agent queries (seeded, deterministic)
  • Concurrency: 10 concurrent clients
  • Embedding: OpenAI text-embedding-3-small (1536 dimensions)
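The dataset is seeded so runs are deterministic. A minimal sketch of what a seeded JSONL generator like `generate_dataset.py` might look like — the field names and topics here are illustrative assumptions, not the harness's actual schema:

```python
import json
import random

def generate_dataset(count: int, seed: int = 42, path: str = "dataset.jsonl") -> None:
    """Write `count` synthetic memories as JSONL; the same seed yields an identical file."""
    rng = random.Random(seed)  # local RNG so output is fully deterministic
    topics = ["deploy", "incident", "design", "review"]  # illustrative only
    with open(path, "w") as f:
        for i in range(count):
            record = {
                "id": i,
                "agent": f"agent-{rng.randrange(10)}",
                "text": f"note about {rng.choice(topics)} #{rng.randrange(1000)}",
            }
            f.write(json.dumps(record) + "\n")
```

Using a local `random.Random(seed)` instead of the module-level functions keeps the generator reproducible even if other code touches the global RNG state.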

Results

| Operation | p50 | p95 | p99 | Throughput |
| --- | --- | --- | --- | --- |
| Sequential add | 72ms | 89ms | 97ms | 14.1 ops/s |
| Batch add (5x20) | 216ms | 292ms | 292ms | 4.6 ops/s |
| Concurrent add (c=10) | 100ms | 193ms | 511ms | 85.1 ops/s |
| Sequential query | 282ms | 411ms | 1502ms | 3.8 ops/s |
| Concurrent query (c=10) | 413ms | 1832ms | 1897ms | 18.6 ops/s |
| Cross-agent query | 304ms | 380ms | 380ms | 3.3 ops/s |
| Vote | 64ms | 176ms | 176ms | 14.1 ops/s |
| Deduplication | 75ms | 112ms | 112ms | 13.6 ops/s |

Total: 1060 operations, 0% error rate.
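The p50/p95/p99 columns are latency percentiles over each operation's samples. A minimal sketch of computing them — the nearest-rank method shown here is an assumption about how the harness summarizes latencies:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample covering pct% of the distribution."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based nearest rank
    return ordered[max(rank - 1, 0)]

def summarize(samples_ms: list[float]) -> dict[str, float]:
    """Collapse raw per-op latencies (ms) into the p50/p95/p99 reported above."""
    return {f"p{p}": percentile(samples_ms, p) for p in (50, 95, 99)}
```

With small sample counts (e.g. the 20 cross-agent queries), p95 and p99 can fall on the same sample, which is why several rows above repeat a value in both columns.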

Key Findings

  • Writes scale well under concurrency — 85 ops/s at p50=100ms with 10 concurrent clients.
  • Query tail latency is OpenAI-bound — p95/p99 spikes on queries are dominated by the external embedding API call, not Aegis or PostgreSQL.
  • Votes and dedup are cheap — pure database operations with no embedding overhead, at or below 75ms at p50.

Reproduce

```sh
cd benchmarks && bash run_benchmark.sh
```

The harness generates a seeded dataset (`--seed 42`), captures a machine profile, and writes results to `results.json`. Configure it via environment variables:
| Variable | Default | Description |
| --- | --- | --- |
| `COUNT` | `1000` | Number of memories to generate |
| `QUERIES` | `100` | Number of queries to run |
| `CONCURRENCY` | `10` | Concurrent client count |
| `BASE_URL` | `http://localhost:8000` | Server URL |
| `API_KEY` | `dev-secret-key` | API key |
| `SEED` | `42` | Random seed for reproducibility |
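A sketch of how the scripts might consume these variables, assuming plain `os.environ` lookups with the defaults listed above (the `BenchConfig` name is hypothetical):

```python
import os
from dataclasses import dataclass

@dataclass
class BenchConfig:
    count: int
    queries: int
    concurrency: int
    base_url: str
    api_key: str
    seed: int

def load_config(env=os.environ) -> BenchConfig:
    """Read benchmark settings from the environment, falling back to documented defaults."""
    return BenchConfig(
        count=int(env.get("COUNT", "1000")),
        queries=int(env.get("QUERIES", "100")),
        concurrency=int(env.get("CONCURRENCY", "10")),
        base_url=env.get("BASE_URL", "http://localhost:8000"),
        api_key=env.get("API_KEY", "dev-secret-key"),
        seed=int(env.get("SEED", "42")),
    )
```

Overriding is then a one-liner, e.g. `COUNT=500 CONCURRENCY=20 bash run_benchmark.sh`.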

Benchmark Scripts

  • generate_dataset.py — Seeded JSONL dataset generator
  • query_workload.py — Async workload runner with latency percentiles
  • machine_profile.py — Captures hardware profile for reproducibility
  • run_benchmark.sh — End-to-end orchestrator
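The async workload pattern behind `query_workload.py` can be sketched as follows — a semaphore bounds in-flight operations at the client concurrency, and per-operation latencies are collected for the percentile summary. `fake_query` stands in for the real HTTP call; this is an illustrative sketch, not the script's actual code:

```python
import asyncio
import time

async def run_workload(ops, concurrency: int = 10) -> list[float]:
    """Run async callables with bounded concurrency; return per-op latencies in ms."""
    sem = asyncio.Semaphore(concurrency)
    latencies: list[float] = []

    async def timed(op):
        async with sem:  # at most `concurrency` ops in flight at once
            start = time.perf_counter()
            await op()
            latencies.append((time.perf_counter() - start) * 1000)

    await asyncio.gather(*(timed(op) for op in ops))
    return latencies

# Stand-in for an HTTP query against the server
async def fake_query():
    await asyncio.sleep(0.01)

if __name__ == "__main__":
    results = asyncio.run(run_workload([fake_query] * 20, concurrency=10))
    print(len(results))
```

Launching all tasks via `gather` while gating them on the semaphore is what makes concurrent throughput exceed 1/p50: ten clients each at ~100ms per add yields the ~85 ops/s reported above.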