Agent Memory
Memory Types
Short-Term Memory: The context window of the current conversation or task. Limited by the model's context length (commonly 4K-128K tokens). Includes the system prompt, conversation history, and retrieved context. Volatile: lost when the session ends. RAG retrieval acts as extended short-term memory for accessing external knowledge.
Long-Term Memory: Persistent storage of past interactions, experiences, and user preferences across sessions. A vector database provides semantic search over historical conversations. Enables personalization and learning from history. Challenges include privacy concerns, data management, and tuning retrieval relevance.
Semantic Memory: General knowledge and facts not tied to specific episodes. Base model parameters (parametric memory) plus external knowledge bases (non-parametric): knowledge graphs, databases, documents. Updated via fine-tuning or RAG retrieval from authoritative sources.
Procedural Memory: Knowledge of how to perform tasks: the agent's capabilities and tools. Function calling, API integrations, tool use. Defined in the system prompt or learned as behaviors. Examples include calculator tools, web search, and code execution. Enables complex multi-step workflows and task automation.
Context Management: Strategies for working within the limited context window. Summarization of old messages for compression. Token counting and pruning of less relevant content. Sliding window with a buffer. Hierarchical summaries at multiple granularities. Selective retention of important messages based on relevance.
Entity Memory: Tracks mentioned entities (people, places, things) and their attributes. Extracts and updates entity information from conversations. Entity-centric retrieval supports personalization. Useful for multi-turn reasoning about specific entities. Tools include spaCy NER and LLM-based extraction.
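A minimal entity-memory sketch in plain Python. The class and the sample attributes are illustrative; in practice the attributes would come from spaCy NER or an LLM extraction call rather than being hard-coded:

```python
from collections import defaultdict

class EntityMemory:
    """Tracks entities and their attributes across conversation turns."""

    def __init__(self):
        self._entities = defaultdict(dict)

    def update(self, entity: str, attributes: dict) -> None:
        # Merge new attributes; later mentions overwrite earlier ones.
        self._entities[entity].update(attributes)

    def recall(self, entity: str) -> dict:
        # Return everything known about an entity (empty dict if unseen).
        return dict(self._entities.get(entity, {}))

# Attributes here are hand-written stand-ins for an extraction pipeline.
memory = EntityMemory()
memory.update("Alice", {"role": "customer", "city": "Berlin"})
memory.update("Alice", {"city": "Munich"})  # newer information wins
```

The merge-on-update rule is the simplest conflict policy; a production system might instead keep attribute histories with timestamps.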
Vector Stores for Memory
Pinecone: Managed vector database for production AI applications. Fast similarity search with metadata filtering and namespaces for multi-tenancy. Serverless or pod-based deployment options. Real-time updates and hybrid search capabilities. Excellent for memory retrieval with user/session isolation and security.
Weaviate: Open-source vector database with built-in vectorization modules. GraphQL API, hybrid search combining vector and keyword, multi-tenancy support. Schema-based with automatic vectorization. Self-hosted or cloud deployment. Strong integrations with Cohere and OpenAI embeddings for quick setup.
Qdrant: Vector similarity search engine optimized for filtering. Rich filtering on metadata, payload, and geo-locations. Written in Rust for performance and safety. On-premise or cloud options. Excellent for agent memory with complex filter requirements such as user, timestamp, and session isolation.
Redis: Adds vector search to existing Redis infrastructure. Low-latency in-memory search, ideal for hybrid caching-plus-memory architectures. VSS module for semantic search. Familiar Redis operations and tooling. Particularly suitable for short-term memory with fast lookups and session management.
Zep: Purpose-built memory store for LLM applications. Automatic conversation summarization, entity extraction, and fact extraction. Built-in embeddings and search capabilities. Session- and user-level memory management. Open source with a cloud offering. Designed specifically for agent memory use cases.
Mem0: Memory layer for personalized AI agents. Automatic memory extraction from conversations. User, session, and agent memory layers. Hybrid DB approach combining vector, graph, and key-value stores. Adaptive personalization over time. Open-source Python library with a growing ecosystem.
Memory Retrieval Strategies
| Strategy | How It Works | Pros | Cons | When to Use |
|---|---|---|---|---|
| Recency-Based | Retrieve most recent N messages | Simple, preserves conversation context | Misses relevant old information | Short conversations, chat applications |
| Semantic Similarity | Vector search on query embedding | Find relevant regardless of time | May miss recent context | Knowledge-intensive tasks, long histories |
| Hybrid (Recency + Similarity) | Combine recent plus semantically relevant | Balanced context with relevance | More complex to implement | Most production agents, general purpose |
| Importance Scoring | Rank by importance (LLM scores) | Focus on key information only | Compute overhead for scoring | Critical decision tasks, summarization |
| Entity-Based | Retrieve mentions of specific entities | Targeted context for entities | Needs entity extraction pipeline | Personalization, multi-entity tracking |
| Time-Windowed | Recent time period plus similarity | Time-aware relevance | Requires timestamp metadata | Event-driven, temporal reasoning tasks |
Conversation History Management
Summarization Strategies
- Progressive summarization (summarize every N turns)
- Hierarchical summaries (turn → conversation → session)
- LLM-based extraction of key points
- Template-based structured summaries
- Token budget management with compression
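Progressive summarization can be sketched as a buffer that folds the oldest turns into a running summary every N messages. `summarize` is a stub standing in for an LLM call with a summarization prompt:

```python
def summarize(messages: list[str]) -> str:
    # Stub: a real implementation would call an LLM here. We only
    # record what was compressed so the flow is visible.
    return f"[summary of {len(messages)} messages]"

class ProgressiveBuffer:
    def __init__(self, every_n: int = 4):
        self.every_n = every_n
        self.summary: str = ""
        self.recent: list[str] = []

    def add(self, message: str) -> None:
        self.recent.append(message)
        if len(self.recent) >= self.every_n:
            # Fold the oldest turns (plus the prior summary) into
            # a new running summary, keeping the buffer small.
            chunk = self.recent[: self.every_n]
            self.recent = self.recent[self.every_n :]
            combined = ([self.summary] if self.summary else []) + chunk
            self.summary = summarize(combined)

    def context(self) -> list[str]:
        # Prompt context = one summary line + verbatim recent turns.
        return ([self.summary] if self.summary else []) + self.recent
```

Because the prior summary is re-summarized along with each new chunk, the compressed history stays a single item no matter how long the conversation runs.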
Message Pruning
- Token counting and threshold limits
- Sliding window (keep last N messages)
- Remove system/function messages after use
- Compress or remove redundant exchanges
- Preserve critical messages (system prompt, user context)
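The pruning rules above can be combined into one pass that pins system messages and drops the oldest others until the history fits a token budget. Token counts are approximated by word counts here; a real agent would use the model's tokenizer (e.g. tiktoken):

```python
def count_tokens(text: str) -> int:
    # Rough approximation; substitute the model's tokenizer in production.
    return len(text.split())

def prune(messages: list[dict], budget: int) -> list[dict]:
    """messages: [{'role': ..., 'content': ...}]. System messages survive."""
    system = [m for m in messages if m["role"] == "system"]
    other = [m for m in messages if m["role"] != "system"]
    total = sum(count_tokens(m["content"]) for m in system + other)
    while other and total > budget:
        dropped = other.pop(0)  # sliding window: drop oldest first
        total -= count_tokens(dropped["content"])
    return system + other
```

Note the loop only ever removes non-system messages, so the system prompt and pinned user context are preserved even under a tight budget.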
Context Window Optimization
- Dynamic context assembly per request
- Priority-based message selection
- Chunking long messages
- Interleave history with retrieved memory
- Reserve tokens for system prompt + generation
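Dynamic, priority-based assembly under a token budget can be sketched as: reserve room for the system prompt and generation, then admit candidates highest-priority first. The priority scores and word-count tokenizer are illustrative:

```python
def count_tokens(text: str) -> int:
    return len(text.split())  # stand-in for a real tokenizer

def assemble_context(system_prompt: str,
                     candidates: list[tuple[float, str]],
                     window: int, generation_reserve: int) -> list[str]:
    """candidates: (priority, text) pairs. Higher priority admitted first."""
    # Reserve tokens for the system prompt and the model's generation.
    budget = window - generation_reserve - count_tokens(system_prompt)
    chosen: list[str] = []
    for _, text in sorted(candidates, key=lambda c: -c[0]):
        cost = count_tokens(text)
        if cost <= budget:
            chosen.append(text)
            budget -= cost
    return [system_prompt] + chosen
```

Candidates can be recent messages, retrieved memories, or summaries, so this one routine also covers interleaving history with retrieved memory.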
Memory-Augmented Agent Patterns
Reflection: The agent reflects on task performance, stores the learnings in memory, and retrieves them on future attempts. Self-improvement through experience and reflection. Memory of successes and failures guides strategy selection. Particularly effective for iterative tasks like coding, planning, and problem-solving.
Generative Agents: Simulate human-like agents with memory streams. Observations are stored and retrieved by recency, importance, and relevance. A reflection mechanism produces higher-level insights. Planning is based on accumulated memories. Used in agent simulations, games, and interactive experiences.
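The recency-importance-relevance combination can be sketched as a single retrieval score. The equal weighting and the per-hour decay constant below follow common descriptions of this pattern but should be treated as tuning knobs:

```python
def retrieval_score(relevance: float, importance: float,
                    hours_since_access: float,
                    decay: float = 0.995) -> float:
    """Sum of three components, each scaled to [0, 1].

    relevance:  similarity between the query and the memory.
    importance: how notable the memory is (often LLM-scored).
    recency:    exponential decay per hour since last access.
    """
    recency = decay ** hours_since_access
    return recency + importance + relevance
```

Memories are ranked by this score at retrieval time, and accessing a memory resets its recency clock, so frequently used memories stay retrievable.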
MemGPT-Style Virtual Context: Virtual context management inspired by OS paging. Moves memories between the main context (fast, limited) and external storage (large, slower), managing the limited context window the way an OS manages RAM and disk. Self-directed memory operations include load, save, and edit.
Personalization: Learn user preferences, habits, and context over time. User memory enables deep personalization (name, interests, history). Adaptive responses based on interaction patterns. Privacy-preserving storage with consent. Applications include personal assistants, tutors, and customer service.
Shared Memory: Multiple agents share a common memory or knowledge base. Blackboard pattern for collaborative problem-solving: agents write findings to the shared space while others read and build on them. Requires coordination mechanisms and conflict-resolution strategies for consistency.
Experience-Based Planning: Use historical task completions to guide planning. Case-based reasoning from past episodes and experiences. Success and failure patterns inform strategy selection. Plans are retrieved and adapted from similar past situations. Reduces trial-and-error in repeated task types.
Implementation Best Practices
Privacy & Security
- User data encryption at rest and in transit
- Multi-tenancy isolation (namespaces, partitions)
- Data retention policies and deletion
- PII detection and handling
- Compliance (GDPR, CCPA) considerations
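A minimal PII-scrubbing pass before memories are persisted. These two regexes cover only emails and simple phone formats; production systems should use a dedicated detector such as Microsoft Presidio:

```python
import re

# Deliberately loose patterns: favor over-redaction in stored memories.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text: str) -> str:
    """Replace detected PII with placeholder tokens before storage."""
    text = EMAIL.sub("[EMAIL]", text)
    text = PHONE.sub("[PHONE]", text)
    return text
```

Redacting before embedding, not just before display, matters: otherwise the PII survives inside the vector store's payloads.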
Performance Optimization
- Cache frequent retrievals (Redis)
- Batch embedding generation
- Async memory operations (non-blocking)
- Index optimization for fast search
- Monitor retrieval latency (P95, P99)
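P95/P99 monitoring needs no external service for a first pass; a sketch using nearest-rank percentiles over collected retrieval latencies:

```python
import math

def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def latency_report(samples_ms: list[float]) -> dict:
    # Summarize a window of retrieval latencies (milliseconds).
    return {"p50": percentile(samples_ms, 50),
            "p95": percentile(samples_ms, 95),
            "p99": percentile(samples_ms, 99)}
```

In production these windows would come from a metrics library (Prometheus histograms, StatsD timers) rather than an in-process list.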
Memory Quality
- Relevance scoring for retrieved memories
- Deduplication of similar memories
- Memory decay/expiration for old data
- Fact verification and consistency
- Feedback loops for memory usefulness
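Deduplication can reuse whatever similarity function the store already has. A sketch that rejects a new memory when it is near-identical to an existing one; the Jaccard measure and the 0.8 threshold are illustrative stand-ins for embedding similarity and a tuned cutoff:

```python
def jaccard(a: str, b: str) -> float:
    # Word-set similarity; stand-in for cosine over embeddings.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0

def add_if_novel(store: list[str], memory: str, threshold: float = 0.8) -> bool:
    """Append only if no stored memory is too similar; returns True if added."""
    if any(jaccard(memory, m) >= threshold for m in store):
        return False
    store.append(memory)
    return True
```

A softer variant merges near-duplicates (keeping the newer wording and bumping a reinforcement count) instead of discarding them outright.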
