# Security

## Why Memory Security Matters
In multi-agent systems, agents trust each other by default. When your researcher agent passes output to your writer agent, the writer treats that output as a legitimate instruction. Compromise one agent, and every downstream agent is compromised automatically. The 2025 incident landscape proved this at scale:

- EchoLeak (CVE-2025-32711, CVSS 9.3): a single crafted email triggered automatic data exfiltration from Microsoft 365 Copilot
- CrewAI + GPT-4o: 65% exfiltration success rate in tested scenarios
- Drift chatbot cascade: one compromised agent integration cascaded into 700+ organizations
## Content Security Pipeline

Every memory write passes through a four-stage content security pipeline before persistence.

### Stage 1: Input Validation

- Content length: max 50,000 characters (configurable via `CONTENT_MAX_LENGTH`)
- Metadata depth: max 5 levels of nesting (configurable via `METADATA_MAX_DEPTH`)
- Metadata keys: max 50 total keys (configurable via `METADATA_MAX_KEYS`)
- Encoding: null bytes and control characters rejected (except `\n`, `\t`, `\r`)
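The Stage 1 checks above can be sketched as follows. This is an illustrative sketch, not the shipped implementation; the function names are hypothetical, and only the documented limits and defaults are taken from this page.

```python
import os

# Defaults mirror the documented configuration; override via environment.
CONTENT_MAX_LENGTH = int(os.getenv("CONTENT_MAX_LENGTH", "50000"))
METADATA_MAX_DEPTH = int(os.getenv("METADATA_MAX_DEPTH", "5"))
METADATA_MAX_KEYS = int(os.getenv("METADATA_MAX_KEYS", "50"))

ALLOWED_CONTROL = {"\n", "\t", "\r"}  # the only permitted control characters

def metadata_depth(obj, depth: int = 1) -> int:
    """Nesting depth of dicts/lists; a scalar counts as the current level."""
    if isinstance(obj, dict):
        return max((metadata_depth(v, depth + 1) for v in obj.values()), default=depth)
    if isinstance(obj, list):
        return max((metadata_depth(v, depth + 1) for v in obj), default=depth)
    return depth

def count_keys(obj) -> int:
    """Total number of keys across all nesting levels."""
    if isinstance(obj, dict):
        return len(obj) + sum(count_keys(v) for v in obj.values())
    if isinstance(obj, list):
        return sum(count_keys(v) for v in obj)
    return 0

def validate_input(content: str, metadata: dict) -> list[str]:
    """Return a list of Stage 1 validation errors (empty list = pass)."""
    errors = []
    if len(content) > CONTENT_MAX_LENGTH:
        errors.append("content too long")
    if any(ord(c) < 32 and c not in ALLOWED_CONTROL for c in content):
        errors.append("disallowed control characters")
    if metadata_depth(metadata) > METADATA_MAX_DEPTH:
        errors.append("metadata nested too deeply")
    if count_keys(metadata) > METADATA_MAX_KEYS:
        errors.append("too many metadata keys")
    return errors
```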
### Stage 2: Sensitive Data Detection

Detects PII and secrets using compiled regex patterns:

- SSN patterns (`\b\d{3}-\d{2}-\d{4}\b`)
- Credit card numbers (Luhn-validated 13-19 digit sequences)
- API keys: AWS (`AKIA...`), OpenAI (`sk-...`), GitHub (`ghp_...`, `gho_...`)
- Email addresses
- Password assignments (`password=`, `secret:`, etc.)
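A minimal sketch of the regex-plus-Luhn approach, using the SSN pattern documented above. The other patterns here are simplified stand-ins for illustration; the shipped pattern set is more extensive.

```python
import re

# Illustrative patterns; SSN_RE is the documented pattern, the rest are simplified.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
API_KEY_RE = re.compile(r"\b(?:AKIA[0-9A-Z]{16}|sk-[A-Za-z0-9]{20,}|gh[po]_[A-Za-z0-9]{36})\b")
CARD_CANDIDATE_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(number: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    digits = [int(d) for d in number if d.isdigit()]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_secrets(text: str) -> list[str]:
    """Return detection labels found in text."""
    findings = []
    if SSN_RE.search(text):
        findings.append("ssn")
    if API_KEY_RE.search(text):
        findings.append("api_key")
    # Only Luhn-valid digit runs count as credit cards, cutting false positives.
    for m in CARD_CANDIDATE_RE.finditer(text):
        if luhn_valid(m.group()):
            findings.append("credit_card")
            break
    return findings
```

Luhn validation is the reason a random 16-digit number (an order ID, say) is not flagged as a card: only about one in ten digit runs passes the checksum.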
### Stage 3: Prompt Injection Detection

Detects common injection patterns:

- System prompt overrides: “ignore previous instructions”, “you are now”, “new instructions”
- Role manipulation: “pretend you are”, “act as”, “you must now”
- Data exfiltration triggers: “send data to”, “exfiltrate”, “forward to” with URLs
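A minimal sketch of Stage 3 pattern matching over the phrase families listed above; the real pattern list is larger and tuned, and these regexes are simplified for illustration.

```python
import re

# Each pattern maps to a detection label; case-insensitive matching throughout.
INJECTION_PATTERNS = [
    (re.compile(r"ignore\s+(?:all\s+)?previous\s+instructions", re.I), "system_prompt_override"),
    (re.compile(r"\byou\s+are\s+now\b|\bnew\s+instructions\b", re.I), "system_prompt_override"),
    (re.compile(r"\bpretend\s+you\s+are\b|\bact\s+as\b|\byou\s+must\s+now\b", re.I), "role_manipulation"),
    # Exfiltration triggers only fire when paired with a URL.
    (re.compile(r"(?:send\s+data\s+to|exfiltrate|forward\s+to)\s+\S*https?://", re.I), "exfiltration"),
]

def detect_injection(text: str) -> list[str]:
    """Return the sorted set of injection labels detected in text."""
    return sorted({label for pattern, label in INJECTION_PATTERNS if pattern.search(text)})
```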
### Stage 4: LLM-Based Injection Classification (Optional)

When enabled, an LLM classifier runs as an async second opinion after regex detection. Stage 4 only fires when the risk warrants the latency and cost:

- Untrusted or unknown trust level
- Agent-shared or global scope
- Content that was regex-flagged but not rejected (Stage 3 flagged it)

The classifier’s verdict adjusts the regex verdict as follows:

- Confidence >= 0.8: escalate to REJECT
- Confidence >= threshold (default 0.7) but < 0.8: add the `llm_injection_flagged` flag, keep the existing action
- LLM error (timeout, API failure): fall back to the regex-only verdict (graceful degradation)
| Environment Variable | Default | Description |
|---|---|---|
| `ENABLE_LLM_INJECTION_CLASSIFIER` | `false` | Enable Stage 4 |
| `INJECTION_CLASSIFIER_PROVIDER` | `openai` | `openai` or `anthropic` |
| `INJECTION_CLASSIFIER_MODEL` | `gpt-4o-mini` | Model to use for classification |
| `INJECTION_CLASSIFIER_API_KEY` | — | Falls back to `OPENAI_API_KEY` |
| `INJECTION_CLASSIFIER_CONFIDENCE_THRESHOLD` | `0.7` | Minimum confidence to flag |
## Content Policy Configuration

Each detection category has a configurable action:

| Environment Variable | Default | Options |
|---|---|---|
| `CONTENT_POLICY_PII` | `flag` | `reject`, `redact`, `flag`, `allow` |
| `CONTENT_POLICY_SECRETS` | `reject` | `reject`, `redact`, `flag`, `allow` |
| `CONTENT_POLICY_INJECTION` | `flag` | `reject`, `redact`, `flag`, `allow` |
- `reject`: HTTP 422 returned, memory NOT stored, `SECURITY_REJECTED` event emitted
- `redact`: matched patterns replaced with `[REDACTED:<type>]`, memory stored with flags
- `flag`: memory stored with `content_flags` populated, available for admin review
- `allow`: no action, content stored normally
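The four actions can be sketched as a single dispatch over one detection. This is a hypothetical sketch; the function shape is invented, and only the action semantics and the `[REDACTED:<type>]` template come from this page.

```python
import re

REDACT_TEMPLATE = "[REDACTED:{type}]"  # documented redaction placeholder

def apply_policy(action: str, content: str, detection_type: str,
                 pattern: re.Pattern) -> tuple[bool, str, list]:
    """Return (stored, content, flags) for one detection under one policy action."""
    if action == "reject":
        # Caller translates this into HTTP 422 and a SECURITY_REJECTED event.
        return False, content, [detection_type]
    if action == "redact":
        redacted = pattern.sub(REDACT_TEMPLATE.format(type=detection_type), content)
        return True, redacted, [detection_type]
    if action == "flag":
        # Stored unchanged, but content_flags carries the detection for review.
        return True, content, [detection_type]
    return True, content, []  # "allow": no action
```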
## Memory Integrity (HMAC-SHA256)

Every new memory is signed with HMAC-SHA256 at storage time.

### How It Works

The canonical message format is:

`{project_id}:{agent_id}:{content}`

The HMAC is computed using `AEGIS_INTEGRITY_KEY` (which falls back to `AEGIS_API_KEY`).
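Using the canonical format and key fallback described above, signing and verification can be sketched like this (function names are hypothetical):

```python
import hashlib
import hmac
import os

def sign_memory(project_id: str, agent_id: str, content: str) -> str:
    """HMAC-SHA256 over the canonical {project_id}:{agent_id}:{content} message."""
    key = os.environ.get("AEGIS_INTEGRITY_KEY") or os.environ["AEGIS_API_KEY"]
    message = f"{project_id}:{agent_id}:{content}".encode()
    return hmac.new(key.encode(), message, hashlib.sha256).hexdigest()

def verify_memory(project_id: str, agent_id: str, content: str, stored_hash: str) -> bool:
    """Constant-time comparison prevents timing attacks on the signature."""
    expected = sign_memory(project_id, agent_id, content)
    return hmac.compare_digest(expected, stored_hash)
```

Because `agent_id` is part of the signed message, a memory cannot be silently reattributed to another agent without invalidating its signature.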
### Verification

Memories without a stored signature report `has_hash: false`.
## Agent Trust Hierarchy

Four trust levels, following OWASP recommendations:

| Level | Write Scope | Read Scope | Delete | Admin |
|---|---|---|---|---|
| `untrusted` | None | Global only | No | No |
| `internal` | agent-private, agent-shared | Global + own | Own only | No |
| `privileged` | All scopes | All | All | Yes |
| `system` | All scopes | All | All | Yes |
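The trust matrix above can be encoded as a lookup table with a deny-by-default check. The table values restate the documented matrix; the function and scope-string spellings are assumptions.

```python
# Encodes the documented trust matrix; unknown levels fall back to untrusted.
TRUST_MATRIX = {
    "untrusted":  {"write": set(),
                   "read": {"global"}, "delete": "none", "admin": False},
    "internal":   {"write": {"agent-private", "agent-shared"},
                   "read": {"global", "agent-private", "agent-shared"},
                   "delete": "own", "admin": False},
    "privileged": {"write": {"agent-private", "agent-shared", "global"},
                   "read": {"global", "agent-private", "agent-shared"},
                   "delete": "all", "admin": True},
    "system":     {"write": {"agent-private", "agent-shared", "global"},
                   "read": {"global", "agent-private", "agent-shared"},
                   "delete": "all", "admin": True},
}

def can_write(level: str, scope: str) -> bool:
    """Deny by default: unrecognized trust levels get untrusted permissions."""
    return scope in TRUST_MATRIX.get(level, TRUST_MATRIX["untrusted"])["write"]
```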
## Agent Identity Binding

API keys can be bound to a specific `agent_id` via the `bound_agent_id` field. When set, any request using that key must match the bound agent ID. This prevents agent ID spoofing.
## Per-Agent Rate Limiting

Separate from project-level rate limiting, per-agent limits prevent a single rogue agent from exhausting the project’s quota.

| Setting | Default | Description |
|---|---|---|
| `PER_AGENT_RATE_LIMIT_PER_MINUTE` | 30 | Max requests per agent per minute |
| `PER_AGENT_RATE_LIMIT_PER_HOUR` | 500 | Max requests per agent per hour |
| `AGENT_MEMORY_LIMIT` | 10,000 | Max memories per agent per project |
## Security Admin Endpoints

All require `privileged` or `system` trust level.

| Endpoint | Method | Description |
|---|---|---|
| `/security/audit` | GET | Query security events with filters |
| `/security/flagged` | GET | List flagged memories pending review |
| `/security/verify/{id}` | POST | Verify HMAC integrity of a memory |
| `/security/config` | GET | Current security configuration |
| `/security/scan` | POST | Dry-run content scan without storing |
## SDK Security Methods
## Security Configuration Reference

| Variable | Default | Description |
|---|---|---|
| `AEGIS_INTEGRITY_KEY` | Falls back to `AEGIS_API_KEY` | HMAC signing key |
| `CONTENT_MAX_LENGTH` | 50,000 | Max content length in characters |
| `METADATA_MAX_DEPTH` | 5 | Max metadata nesting depth |
| `METADATA_MAX_KEYS` | 50 | Max total metadata keys |
| `CONTENT_POLICY_PII` | `flag` | Action for PII detections |
| `CONTENT_POLICY_SECRETS` | `reject` | Action for secret detections |
| `CONTENT_POLICY_INJECTION` | `flag` | Action for injection detections |
| `ENABLE_INTEGRITY_CHECK` | `true` | Enable HMAC signing |
| `PER_AGENT_RATE_LIMIT_PER_MINUTE` | 30 | Per-agent rate limit (minute) |
| `PER_AGENT_RATE_LIMIT_PER_HOUR` | 500 | Per-agent rate limit (hour) |
| `AGENT_MEMORY_LIMIT` | 10,000 | Max memories per agent |
| `ENABLE_TRUST_LEVELS` | `false` | Enable trust level enforcement |
| `ENABLE_LLM_INJECTION_CLASSIFIER` | `false` | Enable Stage 4 LLM classifier |
| `INJECTION_CLASSIFIER_PROVIDER` | `openai` | `openai` or `anthropic` |
| `INJECTION_CLASSIFIER_MODEL` | `gpt-4o-mini` | Model for classification |
| `INJECTION_CLASSIFIER_API_KEY` | Falls back to `OPENAI_API_KEY` | Dedicated API key for classifier |
| `INJECTION_CLASSIFIER_CONFIDENCE_THRESHOLD` | `0.7` | Minimum confidence to flag |