# Performance Benchmarks
## Test Environment
- Hardware: 8 vCPU (Intel 13th Gen), 7.6 GB RAM
- Stack: Docker Compose — PostgreSQL 16 + pgvector, FastAPI
- Dataset: 1000 memories, 100 queries, 20 cross-agent queries (seeded, deterministic)
- Concurrency: 10 concurrent clients
- Embedding: OpenAI `text-embedding-3-small` (1536 dimensions)
## Results
| Operation | p50 | p95 | p99 | Throughput |
|---|---|---|---|---|
| Sequential add | 72ms | 89ms | 97ms | 14.1 ops/s |
| Batch add (5x20) | 216ms | 292ms | 292ms | 4.6 ops/s |
| Concurrent add (c=10) | 100ms | 193ms | 511ms | 85.1 ops/s |
| Sequential query | 282ms | 411ms | 1502ms | 3.8 ops/s |
| Concurrent query (c=10) | 413ms | 1832ms | 1897ms | 18.6 ops/s |
| Cross-agent query | 304ms | 380ms | 380ms | 3.3 ops/s |
| Vote | 64ms | 176ms | 176ms | 14.1 ops/s |
| Deduplication | 75ms | 112ms | 112ms | 13.6 ops/s |
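The p50/p95/p99 columns above can be derived from raw per-request latency samples. A minimal sketch of a nearest-rank percentile calculation (the actual `query_workload.py` may compute these differently):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of latency samples (ms)."""
    ranked = sorted(samples)
    # Nearest-rank method: take the ceil(p/100 * n)-th smallest sample (1-indexed).
    k = max(1, math.ceil(p / 100 * len(ranked)))
    return ranked[k - 1]

# Hypothetical latency samples from ten sequential-add requests, in ms.
latencies = [72, 75, 81, 89, 97, 70, 73, 88, 92, 74]
summary = {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)}
print(summary)  # {'p50': 75, 'p95': 97, 'p99': 97}
```

With small sample counts, p95 and p99 often land on the same observation, which is why several rows above repeat a value in both columns.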
## Key Findings
- Writes scale well under concurrency — 85 ops/s at p50=100ms with 10 concurrent clients.
- Query tail latency is OpenAI-bound — p95/p99 spikes on queries are dominated by the external embedding API call, not Aegis or PostgreSQL.
- Votes and dedup are cheap — pure database operations with no embedding overhead, consistently under 75ms at p50.
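The concurrent rows (c=10) reflect throughput with ten clients in flight at once. A minimal sketch of how such a measurement can be taken with a semaphore-capped async workload (`do_request` here is a placeholder for the real HTTP call, not the actual harness code):

```python
import asyncio
import time

async def do_request(i):
    # Placeholder for one HTTP call to the memory API (simulated 10 ms latency).
    await asyncio.sleep(0.01)
    return i

async def run(concurrency, total):
    # Cap in-flight requests at `concurrency`, mirroring c=10 in the tables above.
    sem = asyncio.Semaphore(concurrency)

    async def worker(i):
        async with sem:
            return await do_request(i)

    start = time.perf_counter()
    await asyncio.gather(*(worker(i) for i in range(total)))
    elapsed = time.perf_counter() - start
    return total / elapsed  # ops/s

throughput = asyncio.run(run(concurrency=10, total=50))
print(f"{throughput:.1f} ops/s")
```

This also illustrates why concurrent throughput exceeds sequential throughput even though individual request latency rises: requests overlap instead of queueing one at a time.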
## Reproduce

`run_benchmark.sh` runs the full suite with a fixed seed (`--seed 42`), captures the machine profile, and writes results to `results.json`. Configure via environment variables:
| Variable | Default | Description |
|---|---|---|
| `COUNT` | 1000 | Number of memories to generate |
| `QUERIES` | 100 | Number of queries to run |
| `CONCURRENCY` | 10 | Concurrent client count |
| `BASE_URL` | `http://localhost:8000` | Server URL |
| `API_KEY` | `dev-secret-key` | API key |
| `SEED` | 42 | Random seed for reproducibility |
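Reading this configuration in a script is straightforward; a sketch of how the variables and defaults above could be loaded (the real scripts may structure this differently):

```python
import os

# Benchmark configuration from the environment, falling back to the
# documented defaults when a variable is unset.
CONFIG = {
    "count": int(os.environ.get("COUNT", "1000")),
    "queries": int(os.environ.get("QUERIES", "100")),
    "concurrency": int(os.environ.get("CONCURRENCY", "10")),
    "base_url": os.environ.get("BASE_URL", "http://localhost:8000"),
    "api_key": os.environ.get("API_KEY", "dev-secret-key"),
    "seed": int(os.environ.get("SEED", "42")),
}
print(CONFIG)
```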
### Benchmark Scripts
- `generate_dataset.py` – Seeded JSONL dataset generator
- `query_workload.py` – Async workload runner with latency percentiles
- `machine_profile.py` – Captures hardware profile for reproducibility
- `run_benchmark.sh` – End-to-end orchestrator
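Seeding the generator is what makes the dataset deterministic across runs. A minimal sketch of a `generate_dataset.py`-style JSONL writer (the record schema and field names here are hypothetical, not the real one):

```python
import json
import random

def generate_dataset(count, seed, path="memories.jsonl"):
    """Write `count` deterministic memory records as JSONL."""
    rng = random.Random(seed)  # same seed -> byte-identical dataset on every run
    topics = ["deploys", "incidents", "configs", "retros"]  # hypothetical topics
    with open(path, "w") as f:
        for i in range(count):
            record = {
                "id": i,
                "agent": f"agent-{rng.randrange(5)}",
                "text": f"Note {i} about {rng.choice(topics)}",
            }
            f.write(json.dumps(record) + "\n")

generate_dataset(count=10, seed=42)
```

Because all randomness flows through one seeded `random.Random` instance, rerunning with `SEED=42` reproduces the exact workload the tables above were measured against.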