Agent Memory
Memory Types
Short-Term Memory: The context window of the current conversation or task. Limited by the model's context length (commonly 4K-128K tokens). Includes the system prompt, conversation history, and retrieved context. Volatile: lost when the session ends. RAG retrieval acts as extended short-term memory for accessing external knowledge.
Long-Term Memory: Persistent storage of past interactions, experiences, and user preferences across sessions. A vector database provides semantic search over historical conversations. Enables personalization and learning from history. Challenges include privacy concerns, data management, and tuning retrieval relevance.
Semantic Memory: General knowledge and facts not tied to specific episodes. Base model parameters (parametric memory) plus external knowledge bases (non-parametric): knowledge graphs, databases, documents. Updated via fine-tuning or RAG retrieval from authoritative sources.
Procedural Memory: Knowledge of how to perform tasks: the agent's capabilities and tools. Function calling, API integrations, tool use. Defined in the system prompt or learned as behaviors. Examples include calculator tools, web search, and code execution. Enables complex multi-step workflows and task automation.
Context Management: Strategies for working within the limited context window. Summarization of old messages for compression. Token counting and pruning of less relevant content. Sliding window with a buffer. Hierarchical summaries at multiple granularities. Selective retention of important messages based on relevance.
Entity Memory: Tracks mentioned entities (people, places, things) and their attributes. Extracts and updates entity information from conversations. Entity-centric retrieval supports personalization. Useful for multi-turn reasoning about specific entities. Tools include spaCy NER and LLM-based extraction.
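A minimal entity-memory sketch in plain Python. The class and the sample attributes are illustrative; in practice the attributes would come from spaCy NER or an LLM extraction call rather than being hard-coded:

```python
from collections import defaultdict

class EntityMemory:
    """Tracks entities and their attributes across conversation turns."""

    def __init__(self):
        self._entities = defaultdict(dict)

    def update(self, entity: str, attributes: dict) -> None:
        # Merge new attributes; later mentions overwrite earlier ones.
        self._entities[entity].update(attributes)

    def recall(self, entity: str) -> dict:
        # Return everything known about an entity (empty dict if unseen).
        return dict(self._entities.get(entity, {}))

# Attributes here are hand-written stand-ins for an extraction pipeline.
memory = EntityMemory()
memory.update("Alice", {"role": "customer", "city": "Berlin"})
memory.update("Alice", {"city": "Munich"})  # newer information wins
```

The merge-on-update rule is the simplest conflict policy; a production system might instead keep attribute histories with timestamps.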
Vector Stores for Memory
Pinecone: Managed vector database for production AI applications. Fast similarity search with metadata filtering and namespaces for multi-tenancy. Serverless or pod-based deployment options. Real-time updates and hybrid search capabilities. Excellent for memory retrieval with user/session isolation and security.
Weaviate: Open-source vector database with built-in vectorization modules. GraphQL API, hybrid search combining vector and keyword, multi-tenancy support. Schema-based with automatic vectorization. Self-hosted or cloud deployment. Strong integrations with Cohere and OpenAI embeddings for quick setup.
Qdrant: Vector similarity search engine optimized for filtering. Rich filtering on metadata, payload, and geo-locations. Written in Rust for performance and safety. On-premise or cloud options. Excellent for agent memory with complex filter requirements such as user, timestamp, and session isolation.
Redis: Adds vector search to existing Redis infrastructure. Low-latency in-memory search, ideal for hybrid caching-plus-memory architectures. VSS module for semantic search. Familiar Redis operations and tooling. Particularly suitable for short-term memory with fast lookups and session management.
Zep: Purpose-built memory store for LLM applications. Automatic conversation summarization, entity extraction, and fact extraction. Built-in embeddings and search capabilities. Session- and user-level memory management. Open source with a cloud offering. Designed specifically for agent memory use cases.
Mem0: Memory layer for personalized AI agents. Automatic memory extraction from conversations. User, session, and agent memory layers. Hybrid DB approach combining vector, graph, and key-value stores. Adaptive personalization over time. Open-source Python library with a growing ecosystem.
Memory Retrieval Strategies
| Strategy | How It Works | Pros | Cons | When to Use |
|---|---|---|---|---|
| Recency-Based | Retrieve most recent N messages | Simple, preserves conversation context | Misses relevant old information | Short conversations, chat applications |
| Semantic Similarity | Vector search on query embedding | Find relevant regardless of time | May miss recent context | Knowledge-intensive tasks, long histories |
| Hybrid (Recency + Similarity) | Combine recent plus semantically relevant | Balanced context with relevance | More complex to implement | Most production agents, general purpose |
| Importance Scoring | Rank by importance (LLM scores) | Focus on key information only | Compute overhead for scoring | Critical decision tasks, summarization |
| Entity-Based | Retrieve mentions of specific entities | Targeted context for entities | Needs entity extraction pipeline | Personalization, multi-entity tracking |
| Time-Windowed | Recent time period plus similarity | Time-aware relevance | Requires timestamp metadata | Event-driven, temporal reasoning tasks |
Conversation History Management
Summarization Strategies
- Progressive summarization (summarize every N turns)
- Hierarchical summaries (turn → conversation → session)
- LLM-based extraction of key points
- Template-based structured summaries
- Token budget management with compression
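Progressive summarization can be sketched as a buffer that folds the oldest turns into a running summary every N messages. `summarize` is a stub standing in for an LLM call with a summarization prompt:

```python
def summarize(messages: list[str]) -> str:
    # Stub: a real implementation would call an LLM here. We only
    # record what was compressed so the flow is visible.
    return f"[summary of {len(messages)} messages]"

class ProgressiveBuffer:
    def __init__(self, every_n: int = 4):
        self.every_n = every_n
        self.summary: str = ""
        self.recent: list[str] = []

    def add(self, message: str) -> None:
        self.recent.append(message)
        if len(self.recent) >= self.every_n:
            # Fold the oldest turns (plus the prior summary) into
            # a new running summary, keeping the buffer small.
            chunk = self.recent[: self.every_n]
            self.recent = self.recent[self.every_n :]
            combined = ([self.summary] if self.summary else []) + chunk
            self.summary = summarize(combined)

    def context(self) -> list[str]:
        # Prompt context = one summary line + verbatim recent turns.
        return ([self.summary] if self.summary else []) + self.recent
```

Because the prior summary is re-summarized along with each new chunk, the compressed history stays a single item no matter how long the conversation runs.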
Message Pruning
- Token counting and threshold limits
- Sliding window (keep last N messages)
- Remove system/function messages after use
- Compress or remove redundant exchanges
- Preserve critical messages (system prompt, user context)
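The pruning rules above can be combined into one pass that pins system messages and drops the oldest others until the history fits a token budget. Token counts are approximated by word counts here; a real agent would use the model's tokenizer (e.g. tiktoken):

```python
def count_tokens(text: str) -> int:
    # Rough approximation; substitute the model's tokenizer in production.
    return len(text.split())

def prune(messages: list[dict], budget: int) -> list[dict]:
    """messages: [{'role': ..., 'content': ...}]. System messages survive."""
    system = [m for m in messages if m["role"] == "system"]
    other = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m["content"]) for m in system + other)
    while other and total > budget:
        dropped = other.pop(0)  # sliding window: drop oldest first
        total -= count_tokens(dropped["content"])
    return system + other
```

Note the loop only ever removes non-system messages, so the system prompt and pinned user context are preserved even under a tight budget.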
Context Window Optimization
- Dynamic context assembly per request
- Priority-based message selection
- Chunking long messages
- Interleave history with retrieved memory
- Reserve tokens for system prompt + generation
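Dynamic, priority-based assembly under a token budget can be sketched as: reserve room for the system prompt and generation, then admit candidates highest-priority first. The priority scores and word-count tokenizer are illustrative:

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def assemble_context(system_prompt: str,
                     candidates: list[tuple[float, str]],
                     window: int, generation_reserve: int) -> list[str]:
    """candidates: (priority, text) pairs. Higher priority admitted first."""
    # Reserve tokens for the system prompt and the model's generation.
    budget = window - generation_reserve - count_tokens(system_prompt)
    chosen: list[str] = []
    for _, text in sorted(candidates, key=lambda c: -c[0]):
        cost = count_tokens(text)
        if cost <= budget:
            chosen.append(text)
            budget -= cost
    return [system_prompt] + chosen
```

Candidates can be recent messages, retrieved memories, or summaries, so this one routine also covers interleaving history with retrieved memory.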
Memory-Augmented Agent Patterns
Reflection: The agent reflects on task performance, stores the learnings in memory, and retrieves them on future attempts. Self-improvement through experience and reflection. Memory of successes and failures guides strategy selection. Particularly effective for iterative tasks like coding, planning, and problem-solving.
Generative Agents: Simulate human-like agents with memory streams. Observations are stored and retrieved by recency, importance, and relevance. A reflection mechanism produces higher-level insights. Planning is based on accumulated memories. Used in agent simulations, games, and interactive experiences.
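The recency-importance-relevance combination can be sketched as a single retrieval score. The equal weighting and the per-hour decay constant below follow common descriptions of this pattern but should be treated as tuning knobs:

```python
def retrieval_score(relevance: float, importance: float,
                    hours_since_access: float,
                    decay: float = 0.995) -> float:
    """Sum of three components, each scaled to [0, 1].

    relevance:  similarity between the query and the memory.
    importance: how notable the memory is (often LLM-scored).
    recency:    exponential decay per hour since last access.
    """
    recency = decay ** hours_since_access
    return recency + importance + relevance
```

Memories are ranked by this score at retrieval time, and accessing a memory resets its recency clock, so frequently used memories stay retrievable.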
MemGPT-Style Virtual Context: Virtual context management inspired by OS paging. Moves memories between the main context (fast, limited) and external storage (large, slower), managing the limited context window the way an OS manages RAM and disk. Self-directed memory operations include load, save, and edit.
Personalization: Learn user preferences, habits, and context over time. User memory enables deep personalization (name, interests, history). Adaptive responses based on interaction patterns. Privacy-preserving storage with consent. Applications include personal assistants, tutors, and customer service.
Shared Memory: Multiple agents share a common memory or knowledge base. Blackboard pattern for collaborative problem-solving: agents write findings to the shared space while others read and build on them. Requires coordination mechanisms and conflict-resolution strategies for consistency.
Experience-Based Planning: Use historical task completions to guide planning. Case-based reasoning from past episodes and experiences. Success and failure patterns inform strategy selection. Plans are retrieved and adapted from similar past situations. Reduces trial-and-error in repeated task types.
Implementation Best Practices
Privacy & Security
- User data encryption at rest and in transit
- Multi-tenancy isolation (namespaces, partitions)
- Data retention policies and deletion
- PII detection and handling
- Compliance (GDPR, CCPA) considerations
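A minimal PII-scrubbing pass before memories are persisted. These two regexes cover only emails and simple phone formats; production systems should use a dedicated detector such as Microsoft Presidio:

```python
import re

# Deliberately loose patterns: favor over-redaction in stored memories.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace detected PII with placeholder tokens before storage."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

Redacting before embedding, not just before display, matters: otherwise the PII survives inside the vector store's payloads.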
Performance Optimization
- Cache frequent retrievals (Redis)
- Batch embedding generation
- Async memory operations (non-blocking)
- Index optimization for fast search
- Monitor retrieval latency (P95, P99)
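P95/P99 monitoring needs no external service for a first pass; a sketch using nearest-rank percentiles over collected retrieval latencies:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_report(samples_ms: list[float]) -> dict:
    # Summarize a window of retrieval latencies (milliseconds).
    return {"p50": percentile(samples_ms, 50),
            "p95": percentile(samples_ms, 95),
            "p99": percentile(samples_ms, 99)}
```

In production these windows would come from a metrics library (Prometheus histograms, StatsD timers) rather than an in-process list.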
Memory Quality
- Relevance scoring for retrieved memories
- Deduplication of similar memories
- Memory decay/expiration for old data
- Fact verification and consistency
- Feedback loops for memory usefulness
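Deduplication can reuse whatever similarity function the store already has. A sketch that rejects a new memory when it is near-identical to an existing one; the Jaccard measure and the 0.8 threshold are illustrative stand-ins for embedding similarity and a tuned cutoff:

```python
def jaccard(a: str, b: str) -> float:
    # Word-set similarity; stand-in for cosine over embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def add_if_novel(store: list[str], memory: str, threshold: float = 0.8) -> bool:
    """Append only if no stored memory is too similar; returns True if added."""
    if any(jaccard(memory, m) >= threshold for m in store):
        return False
    store.append(memory)
    return True
```

A softer variant merges near-duplicates (keeping the newer wording and bumping a reinforcement count) instead of discarding them outright.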
