# Security

## Why Memory Security Matters
In multi-agent systems, agents trust each other by default. When your researcher agent passes output to your writer agent, the writer treats that output as a legitimate instruction. Compromise one agent, and every downstream agent is compromised automatically. The 2025 incident landscape proved this at scale:

- EchoLeak (CVE-2025-32711, CVSS 9.3): a single crafted email triggered automatic data exfiltration from Microsoft 365 Copilot
- CrewAI + GPT-4o: 65% exfiltration success rate in tested scenarios
- Drift chatbot cascade: one compromised agent integration cascaded into 700+ organizations
## Content Security Pipeline

Every memory write passes through a four-stage content security pipeline before persistence.

### Stage 1: Input Validation

- Content length: max 50,000 characters (configurable via `CONTENT_MAX_LENGTH`)
- Metadata depth: max 5 levels of nesting (configurable via `METADATA_MAX_DEPTH`)
- Metadata keys: max 50 total keys (configurable via `METADATA_MAX_KEYS`)
- Encoding: null bytes and control characters rejected (except `\n`, `\t`, `\r`)
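The Stage 1 checks above can be sketched as follows. This is an illustrative sketch, not the shipped implementation; the function names are hypothetical, and only the documented limits and defaults are taken from this page.

```python
import os

# Defaults mirror the documented configuration; override via environment.
CONTENT_MAX_LENGTH = int(os.getenv("CONTENT_MAX_LENGTH", "50000"))
METADATA_MAX_DEPTH = int(os.getenv("METADATA_MAX_DEPTH", "5"))
METADATA_MAX_KEYS = int(os.getenv("METADATA_MAX_KEYS", "50"))

ALLOWED_CONTROL = {"\n", "\t", "\r"}  # the only permitted control characters

def metadata_depth(obj, depth: int = 1) -> int:
    """Nesting depth of dicts/lists; a scalar counts as the current level."""
    if isinstance(obj, dict):
        return max((metadata_depth(v, depth + 1) for v in obj.values()), default=depth)
    if isinstance(obj, list):
        return max((metadata_depth(v, depth + 1) for v in obj), default=depth)
    return depth

def count_keys(obj) -> int:
    """Total number of keys across all nesting levels."""
    if isinstance(obj, dict):
        return len(obj) + sum(count_keys(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_keys(v) for v in obj)
    return 0

def validate_input(content: str, metadata: dict) -> list[str]:
    """Return a list of Stage 1 validation errors (empty list = pass)."""
    errors = []
    if len(content) > CONTENT_MAX_LENGTH:
        errors.append("content too long")
    if any(ord(c) < 32 and c not in ALLOWED_CONTROL for c in content):
        errors.append("disallowed control characters")
    if metadata_depth(metadata) > METADATA_MAX_DEPTH:
        errors.append("metadata nested too deeply")
    if count_keys(metadata) > METADATA_MAX_KEYS:
        errors.append("too many metadata keys")
    return errors
```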
### Stage 2: Sensitive Data Detection

Detects PII and secrets using compiled regex patterns:

- SSN patterns (`\b\d{3}-\d{2}-\d{4}\b`)
- Credit card numbers (Luhn-validated 13-19 digit sequences)
- API keys: AWS (`AKIA...`), OpenAI (`sk-...`), GitHub (`ghp_...`, `gho_...`)
- Email addresses
- Password assignments (`password=`, `secret:`, etc.)
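A minimal sketch of the regex-plus-Luhn approach, using the SSN pattern documented above. The other patterns here are simplified stand-ins for illustration; the shipped pattern set is more extensive.

```python
import re

# Illustrative patterns; SSN_RE is the documented pattern, the rest are simplified.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
API_KEY_RE = re.compile(r"\b(?:AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}|gh[po]_[A-Za-z0-9]{36})\b")
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(d) for d in number if d.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_secrets(text: str) -> list[str]:
    """Return detection labels found in text."""
    findings = []
    if SSN_RE.search(text):
        findings.append("ssn")
    if API_KEY_RE.search(text):
        findings.append("api_key")
    # Only Luhn-valid digit runs count as credit cards, cutting false positives.
    for m in CARD_CANDIDATE_RE.finditer(text):
        if luhn_valid(m.group()):
            findings.append("credit_card")
            break
    return findings
```

Luhn validation is the reason a random 16-digit number (an order ID, say) is not flagged as a card: only about one in ten digit runs passes the checksum.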
### Stage 3: Prompt Injection Detection

Detects common injection patterns:

- System prompt overrides: “ignore previous instructions”, “you are now”, “new instructions”
- Role manipulation: “pretend you are”, “act as”, “you must now”
- Data exfiltration triggers: “send data to”, “exfiltrate”, “forward to” with URLs
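A minimal sketch of Stage 3 pattern matching over the phrase families listed above; the real pattern list is larger and tuned, and these regexes are simplified for illustration.

```python
import re

# Each pattern maps to a detection label; case-insensitive matching throughout.
INJECTION_PATTERNS = [
    (re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.I), "system_prompt_override"),
    (re.compile(r"\byou\s+are\s+now\b|\bnew\s+instructions\b", re.I), "system_prompt_override"),
    (re.compile(r"\bpretend\s+you\s+are\b|\bact\s+as\b|\byou\s+must\s+now\b", re.I), "role_manipulation"),
    # Exfiltration triggers only fire when paired with a URL.
    (re.compile(r"(?:send\s+data\s+to|exfiltrate|forward\s+to)\s+\S*https?://", re.I), "exfiltration"),
]

def detect_injection(text: str) -> list[str]:
    """Return the sorted set of injection labels detected in text."""
    return sorted({label for pattern, label in INJECTION_PATTERNS if pattern.search(text)})
```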
### Stage 4: LLM-Based Injection Classification (Optional)

When enabled, an LLM classifier runs as an async second opinion after regex detection. Stage 4 only fires when the risk warrants the latency and cost:

- Untrusted or unknown trust level
- Agent-shared or global scope
- Content that was regex-flagged but not rejected (Stage 3 flagged it)

The classifier’s verdict adjusts the regex verdict as follows:

- Confidence >= 0.8: escalate to REJECT
- Confidence >= threshold (default 0.7) but < 0.8: add the `llm_injection_flagged` flag, keep the existing action
- LLM error (timeout, API failure): fall back to the regex-only verdict (graceful degradation)
| Environment Variable | Default | Description |
|---|---|---|
| `ENABLE_LLM_INJECTION_CLASSIFIER` | `false` | Enable Stage 4 |
| `INJECTION_CLASSIFIER_PROVIDER` | `openai` | `openai` or `anthropic` |
| `INJECTION_CLASSIFIER_MODEL` | `gpt-4o-mini` | Model to use for classification |
| `INJECTION_CLASSIFIER_API_KEY` | — | Falls back to `OPENAI_API_KEY` |
| `INJECTION_CLASSIFIER_CONFIDENCE_THRESHOLD` | `0.7` | Minimum confidence to flag |
## Content Policy Configuration

Each detection category has a configurable action:

| Environment Variable | Default | Options |
|---|---|---|
| `CONTENT_POLICY_PII` | `flag` | `reject`, `redact`, `flag`, `allow` |
| `CONTENT_POLICY_SECRETS` | `reject` | `reject`, `redact`, `flag`, `allow` |
| `CONTENT_POLICY_INJECTION` | `flag` | `reject`, `redact`, `flag`, `allow` |
- `reject`: HTTP 422 returned, memory NOT stored, `SECURITY_REJECTED` event emitted
- `redact`: matched patterns replaced with `[REDACTED:<type>]`, memory stored with flags
- `flag`: memory stored with `content_flags` populated, available for admin review
- `allow`: no action, content stored normally
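The four actions can be sketched as a single dispatch over one detection. This is a hypothetical sketch; the function shape is invented, and only the action semantics and the `[REDACTED:<type>]` template come from this page.

```python
import re

REDACT_TEMPLATE = "[REDACTED:{type}]"  # documented redaction placeholder

def apply_policy(action: str, content: str, detection_type: str,
                 pattern: re.Pattern) -> tuple[bool, str, list]:
    """Return (stored, content, flags) for one detection under one policy action."""
    if action == "reject":
        # Caller translates this into HTTP 422 and a SECURITY_REJECTED event.
        return False, content, [detection_type]
    if action == "redact":
        redacted = pattern.sub(REDACT_TEMPLATE.format(type=detection_type), content)
        return True, redacted, [detection_type]
    if action == "flag":
        # Stored unchanged, but content_flags carries the detection for review.
        return True, content, [detection_type]
    return True, content, []  # "allow": no action
```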
## Memory Integrity (HMAC-SHA256)

Every new memory is signed with HMAC-SHA256 at storage time.

### How It Works

The canonical message format is:

`{project_id}:{agent_id}:{content}`

The HMAC is computed using `AEGIS_INTEGRITY_KEY` (which falls back to `AEGIS_API_KEY`).
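Using the canonical format and key fallback described above, signing and verification can be sketched like this (function names are hypothetical):

```python
import hashlib
import hmac
import os

def sign_memory(project_id: str, agent_id: str, content: str) -> str:
    """HMAC-SHA256 over the canonical {project_id}:{agent_id}:{content} message."""
    key = os.environ.get("AEGIS_INTEGRITY_KEY") or os.environ["AEGIS_API_KEY"]
    message = f"{project_id}:{agent_id}:{content}".encode()
    return hmac.new(key.encode(), message, hashlib.sha256).hexdigest()

def verify_memory(project_id: str, agent_id: str, content: str, stored_hash: str) -> bool:
    """Constant-time comparison prevents timing attacks on the signature."""
    expected = sign_memory(project_id, agent_id, content)
    return hmac.compare_digest(expected, stored_hash)
```

Because `agent_id` is part of the signed message, a memory cannot be silently reattributed to another agent without invalidating its signature.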
### Verification

Memories without a stored signature report `has_hash: false`.
## Agent Trust Hierarchy

Four trust levels, following OWASP recommendations:

| Level | Write Scope | Read Scope | Delete | Admin |
|---|---|---|---|---|
| `untrusted` | None | Global only | No | No |
| `internal` | agent-private, agent-shared | Global + own | Own only | No |
| `privileged` | All scopes | All | All | Yes |
| `system` | All scopes | All | All | Yes |
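The trust matrix above can be encoded as a lookup table with a deny-by-default check. The table values restate the documented matrix; the function and scope-string spellings are assumptions.

```python
# Encodes the documented trust matrix; unknown levels fall back to untrusted.
TRUST_MATRIX = {
    "untrusted":  {"write": set(),
                   "read": {"global"}, "delete": "none", "admin": False},
    "internal":   {"write": {"agent-private", "agent-shared"},
                   "read": {"global", "agent-private", "agent-shared"},
                   "delete": "own", "admin": False},
    "privileged": {"write": {"agent-private", "agent-shared", "global"},
                   "read": {"global", "agent-private", "agent-shared"},
                   "delete": "all", "admin": True},
    "system":     {"write": {"agent-private", "agent-shared", "global"},
                   "read": {"global", "agent-private", "agent-shared"},
                   "delete": "all", "admin": True},
}

def can_write(level: str, scope: str) -> bool:
    """Deny by default: unrecognized trust levels get untrusted permissions."""
    return scope in TRUST_MATRIX.get(level, TRUST_MATRIX["untrusted"])["write"]
```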
## Agent Identity Binding

API keys can be bound to a specific `agent_id` via the `bound_agent_id` field. When set, any request using that key must match the bound agent ID. This prevents agent ID spoofing.
## Per-Agent Rate Limiting

Separate from project-level rate limiting, per-agent limits prevent a single rogue agent from exhausting the project’s quota.

| Setting | Default | Description |
|---|---|---|
| `PER_AGENT_RATE_LIMIT_PER_MINUTE` | 30 | Max requests per agent per minute |
| `PER_AGENT_RATE_LIMIT_PER_HOUR` | 500 | Max requests per agent per hour |
| `AGENT_MEMORY_LIMIT` | 10,000 | Max memories per agent per project |
## Security Admin Endpoints

All require `privileged` or `system` trust level.

| Endpoint | Method | Description |
|---|---|---|
| `/security/audit` | GET | Query security events with filters |
| `/security/flagged` | GET | List flagged memories pending review |
| `/security/verify/{id}` | POST | Verify HMAC integrity of a memory |
| `/security/config` | GET | Current security configuration |
| `/security/scan` | POST | Dry-run content scan without storing |
## SDK Security Methods
## Security Configuration Reference

| Variable | Default | Description |
|---|---|---|
| `AEGIS_INTEGRITY_KEY` | Falls back to `AEGIS_API_KEY` | HMAC signing key |
| `CONTENT_MAX_LENGTH` | 50,000 | Max content length in characters |
| `METADATA_MAX_DEPTH` | 5 | Max metadata nesting depth |
| `METADATA_MAX_KEYS` | 50 | Max total metadata keys |
| `CONTENT_POLICY_PII` | `flag` | Action for PII detections |
| `CONTENT_POLICY_SECRETS` | `reject` | Action for secret detections |
| `CONTENT_POLICY_INJECTION` | `flag` | Action for injection detections |
| `ENABLE_INTEGRITY_CHECK` | `true` | Enable HMAC signing |
| `PER_AGENT_RATE_LIMIT_PER_MINUTE` | 30 | Per-agent rate limit (minute) |
| `PER_AGENT_RATE_LIMIT_PER_HOUR` | 500 | Per-agent rate limit (hour) |
| `AGENT_MEMORY_LIMIT` | 10,000 | Max memories per agent |
| `ENABLE_TRUST_LEVELS` | `false` | Enable trust level enforcement |
| `ENABLE_LLM_INJECTION_CLASSIFIER` | `false` | Enable Stage 4 LLM classifier |
| `INJECTION_CLASSIFIER_PROVIDER` | `openai` | `openai` or `anthropic` |
| `INJECTION_CLASSIFIER_MODEL` | `gpt-4o-mini` | Model for classification |
| `INJECTION_CLASSIFIER_API_KEY` | Falls back to `OPENAI_API_KEY` | Dedicated API key for classifier |
| `INJECTION_CLASSIFIER_CONFIDENCE_THRESHOLD` | `0.7` | Minimum confidence to flag |