Agent Memory

psychology

Memory Types

Short-Term Memory (Working Memory)

Context window of current conversation or task. Limited by model's context length (4K-128K tokens). Includes system prompt, conversation history, retrieved context. Volatile - lost after session. RAG retrieval acts as extended short-term memory for accessing external knowledge.

Similar Technologies
Context WindowSession StateConversation BufferPrompt ContextIn-Context Learning
Long-Term Memory (Episodic Memory)

Persistent storage of past interactions, experiences, user preferences across sessions. Vector database for semantic search of historical conversations. Enables personalization and learning from history. Challenges include privacy concerns, data management, and retrieval relevance tuning.

Similar Technologies
Vector StoreKnowledge BaseUser ProfilesConversation History DBMemory Streams
Semantic Memory (Knowledge)

General knowledge and facts not tied to specific episodes. Base model parameters (parametric memory) plus external knowledge bases (non-parametric). Knowledge graphs, databases, documents. Updated via fine-tuning or RAG retrieval from authoritative sources.

Similar Technologies
Knowledge GraphsRAGFine-Tuned ParametersExternal APIsStructured Databases
Procedural Memory (Skills)

How to perform tasks - agent capabilities and tools. Function calling, API integrations, tool use. Defined in system prompt or learned behaviors. Examples include calculator tool, web search, code execution. Enables complex multi-step workflows and task automation.

Similar Technologies
Tool DefinitionsFunction CallingAction SpaceSkill LibraryPlugin System
Working Memory Management

Strategies for managing limited context window. Summarization of old messages for compression. Token counting and pruning of less relevant content. Sliding window with buffer. Hierarchical summaries at multiple granularities. Selective retention of important messages based on relevance.

Similar Technologies
Conversation SummarizationToken BudgetMessage PruningHierarchical MemoryForgetting Mechanisms
Entity Memory

Track mentioned entities (people, places, things) and their attributes. Extract and update entity information from conversations. Entity-centric retrieval for personalization. Useful for multi-turn reasoning about specific entities. Tools include spaCy NER and LLM-based extraction.

Similar Technologies
Knowledge GraphEntity LinkingCoreference ResolutionRelation ExtractionMemory Networks
storage

Vector Stores for Memory

Pinecone

Managed vector database for production AI applications. Fast similarity search with metadata filtering and namespaces for multi-tenancy. Serverless or pod-based deployment options. Real-time updates and hybrid search capabilities. Excellent for memory retrieval with user/session isolation and security.

Similar Technologies
WeaviateQdrantMilvusChromapgvector
Weaviate

Open-source vector database with built-in vectorization modules. GraphQL API, hybrid search combining vector and keyword, multi-tenancy support. Schema-based with automatic vectorization. Self-hosted or cloud deployment. Strong integrations with Cohere and OpenAI embeddings for seamless setup.

Similar Technologies
PineconeQdrantMilvusChromaVespa
Qdrant

Vector similarity search engine optimized for filtering. Rich filtering on metadata, payload, and geo-locations. Written in Rust for performance and safety. On-premise or cloud options. Excellent for agent memory with complex filter requirements like user, timestamp, and session isolation.

Similar Technologies
WeaviatePineconeMilvusRedis SearchElasticsearch
Redis with Vector Similarity

Add vector search to existing Redis infrastructure. Low latency in-memory search ideal for caching plus memory hybrid architectures. VSS module for semantic search. Familiar Redis operations and tooling. Particularly suitable for short-term memory with fast lookup requirements and session management.

Similar Technologies
PineconeWeaviateMemcachedDragonflyDBKeyDB
Zep

Purpose-built memory store for LLM applications. Automatic conversation summarization, entity extraction, and fact extraction. Built-in embeddings and search capabilities. Session and user-level memory management. Open-source with cloud offering. Designed specifically for agent memory use cases.

Similar Technologies
LangMemMem0Custom Vector StorePinecone + LogicChroma + Extensions
Mem0 (EmbedChain Memory)

Memory layer for personalized AI agents. Automatic memory extraction from conversations. User, session, and agent memory layers. Hybrid DB approach combining vector, graph, and key-value stores. Adaptive personalization over time. Open-source Python library with growing ecosystem.

Similar Technologies
ZepLangMemCustom ImplementationAgent Protocol + StoragePinecone
search

Memory Retrieval Strategies

StrategyHow It WorksProsConsWhen to Use
Recency-BasedRetrieve most recent N messagesSimple, preserves conversation contextMisses relevant old informationShort conversations, chat applications
Semantic SimilarityVector search on query embeddingFind relevant regardless of timeMay miss recent contextKnowledge-intensive tasks, long histories
Hybrid (Recency + Similarity)Combine recent plus semantically relevantBalanced context with relevanceMore complex to implementMost production agents, general purpose
Importance ScoringRank by importance (LLM scores)Focus on key information onlyCompute overhead for scoringCritical decision tasks, summarization
Entity-BasedRetrieve mentions of specific entitiesTargeted context for entitiesNeeds entity extraction pipelinePersonalization, multi-entity tracking
Time-WindowedRecent time period plus similarityTime-aware relevanceRequires timestamp metadataEvent-driven, temporal reasoning tasks
history

Conversation History Management

Summarization Strategies

  • Progressive summarization (summarize every N turns)
  • Hierarchical summaries (turn → conversation → session)
  • LLM-based extraction of key points
  • Template-based structured summaries
  • Token budget management with compression

Message Pruning

  • Token counting and threshold limits
  • Sliding window (keep last N messages)
  • Remove system/function messages after use
  • Compress or remove redundant exchanges
  • Preserve critical messages (system prompt, user context)

Context Window Optimization

  • Dynamic context assembly per request
  • Priority-based message selection
  • Chunking long messages
  • Interleave history with retrieved memory
  • Reserve tokens for system prompt + generation
smart_toy

Memory-Augmented Agent Patterns

Reflexion Pattern

Agent reflects on task performance, stores learnings in memory, retrieves for future attempts. Self-improvement through experience and reflection. Memory of successes and failures guides strategy selection. Particularly effective for iterative tasks like coding, planning, and problem-solving.

Similar Technologies
Simple RetryStatic PromptsRLHFSelf-CritiqueError Correction
Generative Agents (Stanford)

Simulate human-like agents with memory streams. Observation storage with retrieval by recency, importance, and relevance. Reflection mechanism for higher-level insights. Planning based on accumulated memories. Used in agent simulations, games, and interactive experiences.

Similar Technologies
Simple ChatbotsStateless AgentsRule-Based NPCsMemory NetworksCognitive Architectures
MemGPT

Virtual context management inspired by OS paging. Moves memories between main context (fast, limited) and external storage (large, slower). Manages limited context window like OS manages RAM and disk. Self-directed memory operations including load, save, and edit capabilities.

Similar Technologies
SummarizationSliding WindowVector RetrievalUnlimited Context ModelsHierarchical Context
Personalization Agents

Learn user preferences, habits, and context over time. User memory enables deep personalization (name, interests, history). Adaptive responses based on interaction patterns. Privacy-preserving storage with consent. Applications include personal assistants, tutors, and customer service.

Similar Technologies
User ProfilesCollaborative FilteringSession StateStatic PersonalizationRule-Based Systems
Multi-Agent Shared Memory

Multiple agents share common memory or knowledge base. Blackboard pattern for collaborative problem-solving. Agents write findings to shared space while others read and build upon them. Requires coordination mechanisms and conflict resolution strategies for consistency.

Similar Technologies
Independent AgentsMessage PassingHierarchical OrchestrationAgent Communication ProtocolSwarm Intelligence
Memory-Guided Planning

Use historical task completions to guide planning. Case-based reasoning from past episodes and experiences. Success and failure patterns inform strategy selection. Plan retrieval and adaptation from similar past situations. Reduces trial-and-error in repeated task types through learning.

Similar Technologies
Reactive PlanningHardcoded PlansReinforcement LearningTree SearchHeuristic Planning
checklist

Implementation Best Practices

Privacy & Security

  • User data encryption at rest and transit
  • Multi-tenancy isolation (namespaces, partitions)
  • Data retention policies and deletion
  • PII detection and handling
  • Compliance (GDPR, CCPA) considerations

Performance Optimization

  • Cache frequent retrievals (Redis)
  • Batch embedding generation
  • Async memory operations (non-blocking)
  • Index optimization for fast search
  • Monitor retrieval latency (P95, P99)

Memory Quality

  • Relevance scoring for retrieved memories
  • Deduplication of similar memories
  • Memory decay/expiration for old data
  • Fact verification and consistency
  • Feedback loops for memory usefulness