
AI Memory Systems

Just as with people, memory plays a crucial role in how efficiently an agent can generate information. Memory can be external to an agent or agent network (a global knowledge store), or internal to the network, built up from the experiences the agent or network accumulates through its own efforts. Both kinds of memory are used to extract information that is then placed into the LLM's prompt context, enabling more accurate generation, as in retrieval-augmented generation (RAG). Because of this importance, agentic RAG is itself an essential application of agents, and it uses cognitive architectures to improve its results.

Experiential Memory

Here we discuss experiential memory based on the activity or actions of one or many agents.

Users of LLM chat interfaces with multiple sessions may benefit from stored experiential memory. Guarded by default or manually configured firewalls, experiential memory can support focused and enduring memory tracks with specific purposes. For instance, when a user has spent time working out how to create something from scratch in a highly effective manner, that 'effective manner' should be remembered so that doing the same thing (or something similar) takes less time. This is likely why OpenAI enabled memory for their agents. How this memory is managed and accessed is of prime importance to retention and to experiential transfer: the sharing of experiences between different agents without having to repeat information.

Memory can be as simple as a conversation buffer that keeps track of what has been said. These buffers can be private to a single agent, or can facilitate communication between agents, storing response stacks that include agent-environment interactions.

Text-based memory can consist of a verbatim text record or some form of compressed summary that reduces memory overhead. The memory may be stored in simple file-based formats or in more complex databases, with or without a schema that allows for a structured representation.

Here are some general types of memory:

  • Conversation Buffers
  • Scratch-pads
  • Gists and Summarization
  • Action-success lookups
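The first of these, a conversation buffer, can be sketched in a few lines. The following is a minimal, illustrative Python sketch (all names are hypothetical, and the "summarizer" is a naive truncation standing in for an LLM call) that keeps recent turns verbatim and folds evicted turns into a running gist:

```python
from collections import deque

class ConversationBuffer:
    """Minimal conversation buffer: keeps the last `max_turns` messages
    and folds older ones into a running gist (here, naive truncation)."""

    def __init__(self, max_turns=4):
        self.turns = deque()
        self.max_turns = max_turns
        self.gist = ""  # compressed summary of evicted turns

    def add(self, role, text):
        self.turns.append((role, text))
        while len(self.turns) > self.max_turns:
            old_role, old_text = self.turns.popleft()
            # stand-in for an LLM summarizer: keep the first few words
            self.gist += f"{old_role}: {' '.join(old_text.split()[:5])}... "

    def context(self):
        recent = "\n".join(f"{r}: {t}" for r, t in self.turns)
        return (f"Summary so far: {self.gist}\n" if self.gist else "") + recent

buf = ConversationBuffer(max_turns=2)
buf.add("user", "How do I reset my password on the admin portal?")
buf.add("agent", "Go to settings, then security, then reset.")
buf.add("user", "Thanks, that worked.")
print(buf.context())
```

The `context()` string is what would be prepended to the LLM prompt; a real system would replace the truncation with an actual summarization call.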

For example, OpenAI has launched memory for ChatGPT, which stores relevant memories in a manner that gives the user control over what is stored. It does not yet allow compartmentalization of memories into groups, which could help focus their relevance to generated content.

Agentic RAG

Agentic RAG refers to dynamic response generation that couples RAG methods with cognitive architectures, aiming to enable:

  • Autonomous Decision Making
  • Iterative Refinement
  • Dynamic workflow optimization

These systems often involve clarifying dialogue with the user to better understand the query's intent, especially when a query is ambiguous or otherwise unanswerable.
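The three capabilities above can be sketched as a single retrieve-generate-critique loop. This is a generic, illustrative Python sketch (all function names are hypothetical placeholders, not any particular framework's API): a critic decides whether the answer is acceptable, and its feedback refines the retrieval query on the next round:

```python
def agentic_rag(query, retrieve, generate, critique, max_rounds=3):
    """Generic agentic-RAG loop (illustrative): retrieve context,
    generate an answer, then let a critic decide whether to refine."""
    answer, feedback = None, None
    for _ in range(max_rounds):
        # iterative refinement: fold the critic's feedback into the search
        search_query = query if feedback is None else f"{query} ({feedback})"
        context = retrieve(search_query)
        answer = generate(query, context)
        ok, feedback = critique(query, answer)
        if ok:  # autonomous decision to stop
            break
    return answer

# Toy stand-ins for a retriever, an LLM, and a critic:
docs = {"capital France": "Paris is the capital of France."}
retrieve = lambda q: [d for k, d in docs.items() if any(w in q for w in k.split())]
generate = lambda q, ctx: ctx[0] if ctx else "I don't know."
critique = lambda q, a: (a != "I don't know.", "try different keywords")

print(agentic_rag("capital France", retrieve, generate, critique))
# → Paris is the capital of France.
```

In a real system, `retrieve` would hit a vector store, `generate` and `critique` would be LLM calls, and the feedback might rewrite the query entirely rather than append to it.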

Single-Agent Agentic RAG


Multi-Agent Agentic RAG


Often these systems include a feedback/observation step after generation to detect and correct:

  • Hallucinations
  • Answer relevance

Storage and Retrieval Methods

Memory can be retrieved via lookup methods that involve database queries (SQL, graph), or via vector lookups. Memories can also be stored in plain text documents and searched via keyword lookups.
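The simplest of these, keyword lookup over plain text, can be sketched with an inverted index. This is an illustrative toy (all names are hypothetical), not a production search engine:

```python
import re
from collections import defaultdict

class KeywordStore:
    """Toy keyword lookup over plain-text notes via an inverted index."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> set of doc ids
        self.docs = {}

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for word in re.findall(r"[a-z]+", text.lower()):
            self.index[word].add(doc_id)

    def search(self, query):
        """Return documents containing ALL query words."""
        words = re.findall(r"[a-z]+", query.lower())
        hits = [self.index[w] for w in words if w in self.index]
        if not hits:
            return []
        ids = set.intersection(*hits)
        return [self.docs[i] for i in sorted(ids)]

store = KeywordStore()
store.add("n1", "Reset the router by holding the button for ten seconds.")
store.add("n2", "The router firmware can be updated from the admin page.")
print(store.search("router firmware"))
```

Keyword lookup only matches exact terms; the vector approaches below trade this precision for semantic matching.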

Traditional Databases

Databases that rely on query languages such as SQL, NoSQL databases, and even 'CSV-type' information stores can all be accessed and populated by agents.

Graph Databases

Graph databases provide the ability to put information in relational contexts. Both native and non-native graph stores can support rich understandings of how things are connected, though the resulting models are sometimes overly complex. They are often interacted with using query languages like Cypher; writing the right query to extract the appropriate information can be challenging, but the queries themselves are very powerful.

Neo4j has built a semantic layer, as shown in the tomasonjo/llm-movieagent repository.
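The core idea behind graph stores, facts as traversable edges, can be shown without a database at all. The following is a toy in-memory triple store (the data and helper are illustrative, not Neo4j or Cypher code), answering a two-hop question by chaining traversals:

```python
# Toy triple store: facts are (subject, relation, object) edges.
triples = [
    ("Keanu Reeves", "ACTED_IN", "The Matrix"),
    ("Carrie-Anne Moss", "ACTED_IN", "The Matrix"),
    ("The Matrix", "DIRECTED_BY", "The Wachowskis"),
]

def neighbors(node, relation):
    """All nodes connected to `node` via `relation`, in either direction."""
    out = [o for s, r, o in triples if s == node and r == relation]
    back = [s for s, r, o in triples if o == node and r == relation]
    return out + back

# Two-hop query: "Who acted in a film directed by The Wachowskis?"
films = neighbors("The Wachowskis", "DIRECTED_BY")
actors = sorted(a for f in films for a in neighbors(f, "ACTED_IN"))
print(actors)
# → ['Carrie-Anne Moss', 'Keanu Reeves']
```

A real graph database expresses the same traversal declaratively (in Cypher, a pattern over `(:Person)-[:ACTED_IN]->(:Movie)-[:DIRECTED_BY]->(:Person)`) and handles indexing and scale.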

Vector Databases

Vector databases store and retrieve information based on semantic similarity rather than exact matches. They are essential for implementing efficient retrieval systems in AI applications.
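Semantic similarity is usually measured with cosine similarity between embedding vectors. Here is a minimal pure-Python sketch (the "embeddings" are hand-made toy vectors; a real system would get them from an embedding model and store them in a vector database):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Pretend embeddings (in practice produced by an embedding model):
memory = {
    "reset password": [0.9, 0.1, 0.0],
    "billing invoice": [0.0, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(memory, key=lambda t: cosine(memory[t], query_vec),
                    reverse=True)
    return ranked[:k]

print(retrieve([0.8, 0.2, 0.1]))
# → ['reset password']
```

Vector databases implement this ranking with approximate nearest-neighbor indexes so it stays fast over millions of vectors.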

Caching

Caching involves storing frequently accessed or computationally expensive results to reduce latency and improve performance. In AI systems, caching can significantly reduce response times and computational costs.
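In Python, the simplest form of this is memoization with the standard library's `functools.lru_cache`; the sketch below uses a stand-in function (the counter is just there to make the cache's effect visible):

```python
from functools import lru_cache

calls = {"n": 0}  # counts how often the underlying work actually runs

@lru_cache(maxsize=128)
def expensive_lookup(query):
    """Stand-in for a slow retrieval or LLM call."""
    calls["n"] += 1
    return f"result for {query!r}"

expensive_lookup("capital of France")
expensive_lookup("capital of France")  # served from cache, no new call
print(calls["n"])
# → 1
```

For LLM systems specifically, caching is often keyed on the full prompt (exact-match caching) or on prompt embeddings (semantic caching), so repeated or near-duplicate queries skip the model call entirely.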

Cognitive Architectures

Cognitive architectures provide structured frameworks for organizing memory, reasoning, and decision-making processes in AI systems. They help in creating more coherent and effective agent behaviors.

References

For more information on memory implementations and caching, refer to the following resources:

  • LangChain memory

Solutions

Mem0: provides memory for agents in an easy manner.

Graphiti: builds dynamic, temporally aware knowledge graphs that represent complex, evolving relationships between entities over time. Graphiti ingests both unstructured and structured data, and the resulting graph may be queried using a fusion of time-based, full-text, semantic, and graph-algorithm approaches.

GetZep: self-improving memory for users, sessions, and more.

Memobase provides a user profile-based memory system for AI applications.

Memobase is designed to bring long-term user memory to GenAI applications with a focus on structured user profiles. Key features include:

  • Memory focused on users rather than agents
  • Time-aware memory that prevents outdated information
  • Controllable memory with flexible configuration
  • Easy integration with existing LLM stacks via API and SDKs (Python/Node/Go)
  • Batch processing via non-embedding system and session buffer
  • Production-ready system tested by partners

Research

A-MEM: Agentic Memory for LLM Agents

The authors generate dynamic memory structures without a static, predetermined memory layout, based on the Zettelkasten method.

For each new memory, a comprehensive note is made, integrating structured text attributes with embedding vectors for similarity matching. The system uses the historical memory repository to create relevant connections via shared attributes. This allows the memory to evolve dynamically as new memories are incorporated: (1) links are generated between memories with shared attributes/descriptions, and (2) existing memories are evolved to capture higher-order patterns.
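The link-generation step can be illustrated with a toy sketch (this is not the A-MEM implementation; attributes are plain tags here, whereas the paper uses embeddings and LLM-generated attributes):

```python
# Illustrative sketch: link each new note to stored notes that share
# attributes, mimicking Zettelkasten-style link generation.
notes = []

def add_note(text, tags):
    note = {"text": text, "tags": set(tags), "links": []}
    for other in notes:
        if note["tags"] & other["tags"]:   # shared attributes -> link
            note["links"].append(other["text"])
            other["links"].append(text)    # links are bidirectional
    notes.append(note)
    return note

add_note("Use retries with backoff for flaky APIs", ["reliability", "http"])
add_note("Cache idempotent GET responses", ["http", "caching"])
n = add_note("Exponential backoff doubles the wait each attempt", ["reliability"])
print(n["links"])
# → ['Use retries with backoff for flaky APIs']
```

A-MEM additionally lets existing notes be rewritten when new links reveal higher-order patterns; this sketch only shows the linking half.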


Code is not yet available at https://github.com/WujiangXu/AgenticMemory.

BooookScore: A systematic exploration of book-length summarization in the era of LLMs

Developments

The authors present an effective manner of summarizing long books using two methods:

  1. Hierarchical merging of chunk-level summaries
  2. Incremental updating of a running summary

Results

Human evaluation shows that "hierarchical merging produces more coherent summaries but may lack detail compared to incremental updating; closed-source models like GPT-4 and Claude 2 generate the most coherent summaries; and increasing chunk size can significantly improve incremental updating."

Paper
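The hierarchical-merging method can be sketched as a reduction tree over chunk summaries. This is a toy illustration, not the paper's code: the "summarizer" here just truncates, where a real system would call an LLM at each merge:

```python
def summarize(text, budget=8):
    """Stand-in for an LLM summarizer: keep the first `budget` words."""
    return " ".join(text.split()[:budget])

def hierarchical_merge(chunk_summaries):
    """Repeatedly merge adjacent pairs of summaries, re-summarizing
    each merged pair, until one top-level summary remains."""
    level = chunk_summaries
    while len(level) > 1:
        level = [summarize(" ".join(level[i:i + 2]))
                 for i in range(0, len(level), 2)]
    return level[0]

chunks = ["chapter one intro ...", "chapter two conflict ...",
          "chapter three twist ...", "chapter four ending ..."]
print(hierarchical_merge(chunks))
```

The incremental-updating alternative would instead fold each new chunk into one running summary, `summary = summarize(summary + chunk)`, which preserves more detail but can drift in coherence, matching the paper's human-evaluation findings.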

Read-agent: A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Jupyter notebook

Developments

The authors present a manner of reading long documents and summarizing them, using gist memory to deal with long contexts.

Problem

The context length of long inputs limits the ability of the model to perform effectively and efficiently.

Solution

Taking inspiration from how people interactively read long documents, the authors implement a simple prompting-based system that:

  1. Decides what content should be stored together in a memory episode
  2. Compresses those memories into short episodic memories called gist memories, and
  3. Takes actions to look up sections in the original text if memory needs to be refreshed
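The three steps above can be sketched in miniature. This is a toy illustration of the gist-memory idea, not the paper's prompting system: the "gisting" here just keeps each page's first sentence, where the paper uses LLM prompts for both gisting and lookup:

```python
def gist(page):
    """Stand-in for LLM gisting: keep only the first sentence."""
    return page.split(".")[0] + "."

# Step 1: content grouped into pages (memory episodes)
pages = [
    "Ada joins the expedition. She packs maps and a compass.",
    "The team crosses the ridge. A storm forces them into a cave.",
]
# Step 2: compress each episode into a gist memory
gists = [gist(p) for p in pages]

def answer(question, keyword):
    """Step 3: find the relevant episode, then re-read its full page."""
    for i in range(len(pages)):
        if keyword in gists[i] or keyword in pages[i]:
            return pages[i]
    return "not found"

print(answer("What forced them into a cave?", "storm"))
```

The key property is that the model normally reasons over the short gists, and only pays the cost of re-reading a full page when a gist is not detailed enough, which is what enables the larger effective context windows.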

Results

The simple method improves performance on reading-comprehension tasks while enabling effective context windows that are 3-20x larger.

Paper

MemGPT allows you to build LLM agents with self-editing memory.