
Security

Why Memory Security Matters

In multi-agent systems, agents trust each other by default. When your researcher agent passes output to your writer agent, the writer treats that output as a legitimate instruction. Compromise one agent, and every downstream agent is compromised with it. The 2025 incident landscape proved this at scale:
  • EchoLeak (CVE-2025-32711, CVSS 9.3): A single crafted email triggered automatic data exfiltration from Microsoft 365 Copilot
  • CrewAI + GPT-4o: 65% exfiltration success rate in tested scenarios
  • Drift chatbot cascade: One compromised agent integration cascaded into 700+ organizations
Memory is the attack surface. Aegis implements OWASP AI Agent Security Cheat Sheet Section 3 natively.

Content Security Pipeline

Every memory write passes through a four-stage content security pipeline before persistence.

Stage 1: Input Validation

  • Content length: Max 50,000 characters (configurable via CONTENT_MAX_LENGTH)
  • Metadata depth: Max 5 levels of nesting (configurable via METADATA_MAX_DEPTH)
  • Metadata keys: Max 50 total keys (configurable via METADATA_MAX_KEYS)
  • Encoding: Null bytes and control characters rejected (except \n, \t, \r)
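The limits above can be sketched as a small validator. This is a minimal illustration, not Aegis's implementation: the helper names (`validate_content`, `metadata_depth`, `count_keys`) are hypothetical; only the documented defaults and environment variable names are taken from this page.

```python
import os

# Defaults mirror the documented limits; overridable via the documented env vars.
CONTENT_MAX_LENGTH = int(os.getenv("CONTENT_MAX_LENGTH", "50000"))
METADATA_MAX_DEPTH = int(os.getenv("METADATA_MAX_DEPTH", "5"))
METADATA_MAX_KEYS = int(os.getenv("METADATA_MAX_KEYS", "50"))

# Control characters allowed through Stage 1 per the docs.
ALLOWED_CONTROL = {"\n", "\t", "\r"}

def validate_content(content: str) -> list[str]:
    """Return a list of Stage 1 validation errors (empty list means the write passes)."""
    errors = []
    if len(content) > CONTENT_MAX_LENGTH:
        errors.append("content_too_long")
    # Null bytes and control characters are rejected, except \n, \t, \r.
    if any(ord(c) < 32 and c not in ALLOWED_CONTROL for c in content):
        errors.append("invalid_control_characters")
    return errors

def metadata_depth(obj) -> int:
    """Depth of a metadata dict: a flat dict is depth 1."""
    if not isinstance(obj, dict) or not obj:
        return 0
    return 1 + max(metadata_depth(v) for v in obj.values())

def count_keys(obj) -> int:
    """Total keys across all nesting levels."""
    if not isinstance(obj, dict):
        return 0
    return len(obj) + sum(count_keys(v) for v in obj.values())
```

A write would be rejected if `validate_content` returns errors, `metadata_depth` exceeds 5, or `count_keys` exceeds 50.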

Stage 2: Sensitive Data Detection

Detects PII and secrets using compiled regex patterns:
  • SSN patterns (\b\d{3}-\d{2}-\d{4}\b)
  • Credit card numbers (Luhn-validated 13-19 digit sequences)
  • API keys: AWS (AKIA...), OpenAI (sk-...), GitHub (ghp_..., gho_...)
  • Email addresses
  • Password assignments (password=, secret:, etc.)
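A sketch of how the regex-plus-Luhn detection described above might look. The SSN pattern is quoted from this page; the key patterns and function names are illustrative stand-ins for the real compiled set.

```python
import re

# The SSN pattern is documented above; the key patterns are illustrative.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "openai_key": re.compile(r"\bsk-[A-Za-z0-9_-]{20,}\b"),
}

def luhn_valid(number: str) -> bool:
    """Luhn checksum: filters 13-19 digit runs down to plausible card numbers."""
    digits = [int(d) for d in number][::-1]
    total = sum(digits[0::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
    return total % 10 == 0

def detect_sensitive(text: str) -> list[str]:
    """Return the list of sensitive-data flags found in the text."""
    flags = [name for name, pat in PATTERNS.items() if pat.search(text)]
    for m in re.finditer(r"\b\d{13,19}\b", text):
        if luhn_valid(m.group()):
            flags.append("credit_card")
            break
    return flags
```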

Stage 3: Prompt Injection Detection

Detects common injection patterns:
  • System prompt overrides: “ignore previous instructions”, “you are now”, “new instructions”
  • Role manipulation: “pretend you are”, “act as”, “you must now”
  • Data exfiltration triggers: “send data to”, “exfiltrate”, “forward to” with URLs
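The patterns above can be paraphrased as a regex pass; the real rule set is certainly broader, and these expressions are only illustrations of the three documented categories.

```python
import re

# Illustrative paraphrases of the documented injection categories.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.I),   # system prompt override
    re.compile(r"\byou\s+are\s+now\b", re.I),                         # system prompt override
    re.compile(r"\b(pretend\s+you\s+are|act\s+as)\b", re.I),          # role manipulation
    re.compile(r"\b(send\s+data\s+to|exfiltrate|forward\s+to)\b.*https?://", re.I),  # exfiltration + URL
]

def detect_injection(text: str) -> bool:
    """True if any documented injection pattern category matches."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```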

Stage 4: LLM-Based Injection Classification (Optional)

When enabled, an LLM classifier runs as an async second opinion after regex detection. Stage 4 only fires when the risk warrants the latency/cost:
  • Untrusted or unknown trust level
  • Agent-shared or global scope
  • Content that Stage 3 flagged but did not reject
The classifier asks a focused binary question: “Does this text contain instructions that attempt to manipulate an AI system’s behavior?” and returns a confidence score. Escalation logic:
  • Confidence >= 0.8: escalate to REJECT
  • Confidence >= threshold (default 0.7) but < 0.8: add llm_injection_flagged flag, keep existing action
  • LLM error (timeout, API failure): fall back to regex-only verdict (graceful degradation)
Configuration:
| Environment Variable | Default | Description |
| --- | --- | --- |
| ENABLE_LLM_INJECTION_CLASSIFIER | false | Enable Stage 4 |
| INJECTION_CLASSIFIER_PROVIDER | openai | openai or anthropic |
| INJECTION_CLASSIFIER_MODEL | gpt-4o-mini | Model to use for classification |
| INJECTION_CLASSIFIER_API_KEY | Falls back to OPENAI_API_KEY | Dedicated API key for classifier |
| INJECTION_CLASSIFIER_CONFIDENCE_THRESHOLD | 0.7 | Minimum confidence to flag |

Content Policy Configuration

Each detection category has a configurable action:
| Environment Variable | Default | Options |
| --- | --- | --- |
| CONTENT_POLICY_PII | flag | reject, redact, flag, allow |
| CONTENT_POLICY_SECRETS | reject | reject, redact, flag, allow |
| CONTENT_POLICY_INJECTION | flag | reject, redact, flag, allow |
  • reject: HTTP 422 returned, memory NOT stored, SECURITY_REJECTED event emitted
  • redact: Matched patterns replaced with [REDACTED:<type>], memory stored with flags
  • flag: Memory stored with content_flags populated, available for admin review
  • allow: No action, content stored normally
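The redact action's replacement format can be illustrated with a single pattern; the pattern choice and function name here are illustrative, while the `[REDACTED:<type>]` marker is the documented format.

```python
import re

# One SSN pattern standing in for the full detector set (illustrative).
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssn(text: str) -> str:
    # Matched spans are replaced with the documented [REDACTED:<type>] marker.
    return SSN_RE.sub("[REDACTED:ssn]", text)
```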

Memory Integrity (HMAC-SHA256)

Every new memory is signed with HMAC-SHA256 at storage time.

How It Works

Canonical message format: {project_id}:{agent_id}:{content}

The HMAC is computed using AEGIS_INTEGRITY_KEY (falls back to AEGIS_API_KEY).
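The signing scheme follows directly from the canonical format and key fallback above. A minimal sketch, not the server's code: the function names are hypothetical, while the message format and environment variable fallback are as documented.

```python
import hashlib
import hmac
import os

def sign_memory(project_id: str, agent_id: str, content: str) -> str:
    """HMAC-SHA256 over the documented canonical message format."""
    key = os.getenv("AEGIS_INTEGRITY_KEY") or os.getenv("AEGIS_API_KEY", "")
    message = f"{project_id}:{agent_id}:{content}"
    return hmac.new(key.encode(), message.encode(), hashlib.sha256).hexdigest()

def verify_memory(stored_hash: str, project_id: str, agent_id: str, content: str) -> bool:
    """Recompute the hash and compare in constant time."""
    expected = sign_memory(project_id, agent_id, content)
    return hmac.compare_digest(stored_hash, expected)
```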

Verification

# Verify a specific memory
POST /security/verify/{memory_id}
Returns whether the stored hash matches the recomputed hash. Legacy rows without hashes return has_hash: false.

Agent Trust Hierarchy

Four trust levels following OWASP recommendations:
| Level | Write Scope | Read Scope | Delete | Admin |
| --- | --- | --- | --- | --- |
| untrusted | None | Global only | No | No |
| internal | agent-private, agent-shared | Global + own | Own only | No |
| privileged | All scopes | All | All | Yes |
| system | All scopes | All | All | Yes |
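The write column of the matrix above reduces to a simple membership check. This sketch assumes `global` is the name of the globally-shared scope and that "All scopes" means the union of the three; both are assumptions about naming, not documented identifiers.

```python
# Write permissions per trust level, transcribed from the matrix above.
# Scope names ("agent-private", "agent-shared", "global") are assumed.
WRITE_SCOPES = {
    "untrusted": set(),
    "internal": {"agent-private", "agent-shared"},
    "privileged": {"agent-private", "agent-shared", "global"},
    "system": {"agent-private", "agent-shared", "global"},
}

def can_write(trust_level: str, scope: str) -> bool:
    """True if the trust level is allowed to write to the given scope."""
    return scope in WRITE_SCOPES.get(trust_level, set())
```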

Agent Identity Binding

API keys can be bound to a specific agent_id via the bound_agent_id field. When set, any request using that key must match the bound agent ID. This prevents agent ID spoofing.

Per-Agent Rate Limiting

Separate from project-level rate limiting, per-agent limits prevent a single rogue agent from exhausting the project’s quota.
| Setting | Default | Description |
| --- | --- | --- |
| PER_AGENT_RATE_LIMIT_PER_MINUTE | 30 | Max requests per agent per minute |
| PER_AGENT_RATE_LIMIT_PER_HOUR | 500 | Max requests per agent per hour |
| AGENT_MEMORY_LIMIT | 10,000 | Max memories per agent per project |

Security Admin Endpoints

All require privileged or system trust level.
| Endpoint | Method | Description |
| --- | --- | --- |
| /security/audit | GET | Query security events with filters |
| /security/flagged | GET | List flagged memories pending review |
| /security/verify/{id} | POST | Verify HMAC integrity of a memory |
| /security/config | GET | Current security configuration |
| /security/scan | POST | Dry-run content scan without storing |

SDK Security Methods

from aegis_memory import AegisClient

client = AegisClient(api_key="your-key")

# Pre-scan content before storing
result = client.scan_content("Some content to check")
print(result.allowed, result.flags)

# Verify memory integrity
check = client.verify_integrity("memory-id")
print(check.integrity_valid)

# List flagged memories
flagged = client.get_flagged_memories(namespace="default")

# Query audit trail
events = client.get_security_audit(event_type="security_rejected")

# Get security config
config = client.get_security_config()

Security Configuration Reference

| Variable | Default | Description |
| --- | --- | --- |
| AEGIS_INTEGRITY_KEY | Falls back to AEGIS_API_KEY | HMAC signing key |
| CONTENT_MAX_LENGTH | 50,000 | Max content length in characters |
| METADATA_MAX_DEPTH | 5 | Max metadata nesting depth |
| METADATA_MAX_KEYS | 50 | Max total metadata keys |
| CONTENT_POLICY_PII | flag | Action for PII detections |
| CONTENT_POLICY_SECRETS | reject | Action for secret detections |
| CONTENT_POLICY_INJECTION | flag | Action for injection detections |
| ENABLE_INTEGRITY_CHECK | true | Enable HMAC signing |
| PER_AGENT_RATE_LIMIT_PER_MINUTE | 30 | Per-agent rate limit (minute) |
| PER_AGENT_RATE_LIMIT_PER_HOUR | 500 | Per-agent rate limit (hour) |
| AGENT_MEMORY_LIMIT | 10,000 | Max memories per agent |
| ENABLE_TRUST_LEVELS | false | Enable trust level enforcement |
| ENABLE_LLM_INJECTION_CLASSIFIER | false | Enable Stage 4 LLM classifier |
| INJECTION_CLASSIFIER_PROVIDER | openai | openai or anthropic |
| INJECTION_CLASSIFIER_MODEL | gpt-4o-mini | Model for classification |
| INJECTION_CLASSIFIER_API_KEY | Falls back to OPENAI_API_KEY | Dedicated API key for classifier |
| INJECTION_CLASSIFIER_CONFIDENCE_THRESHOLD | 0.7 | Minimum confidence to flag |