The State of AI Memory

Most of what we do at Jean Technologies is matching. We learn representations of people and opportunities, then predict which pairs produce good outcomes. But the first thing we actually built was memory, and that was not an accident. You cannot match a person you know nothing about. Before any of the matching works, you have to collect context about someone: what they have done, what they care about, and where they are trying to go. That context has to be gathered and organized long before it can be matched on.

Gathering and organizing context about a person is exactly the memory problem. In late 2025 the market had not turned to matching yet, and the honest reason was that the layer underneath it, intelligent memory, was still being invented in public. So we went down a layer and wrote the comprehensive review this post accompanies, before coming back up to the matching problem it makes possible.

What memory actually is

The first problem is that the word is overloaded. People say "memory" and mean a vector database, a longer context window, a fine-tune, or a JSON blob of preferences stapled to a system prompt. Those are mechanisms, not a definition, and treating any one of them as the whole thing is how the field ended up with memory that does not feel like memory.

We define it functionally. Memory is the active process by which a system decides what from its experience is worth keeping, organizes it, and brings the right slice back at the right moment. That definition is deliberately not retrieval-augmented generation [Lewis 2020] and not context engineering. RAG is one possible storage and recall mechanism. Context engineering is what you do with what you recalled. Memory is the loop that connects them, and most current systems retrofitted that loop onto models that were designed as stateless inference engines, databases rather than minds. The retrofit shows.

Experience, Storage, Recall

The framework we kept coming back to is a pipeline with three stages, shown on the right. Experience is the intake stage: a stream of raw interaction arrives and something has to decide what is signal and what is noise. Most of what a system sees is not worth keeping, and the decision of what to discard is the part nobody wants to own, because it is lossy and it is where the system either earns its intelligence or fails to.

Storage is organization. Kept experience has to be placed somewhere with enough structure that it can be found again by meaning rather than by exact match. Recall is retrieval under a real constraint: you do not get to load everything, you get a budget, and recall is the act of spending that budget on the few things most relevant to the moment.

The point of drawing it this way is that memory is a computation, not a drawer. A passive store remembers everything and surfaces nothing useful. An intelligent memory system is constantly making lossy decisions at every stage, and the quality of those decisions, not the size of the store, is what separates a system that knows you from one that has merely logged you.
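The three stages can be sketched in a few lines. This is a minimal illustration, not the implementation from the review: the `Memory` class, the `min_words` salience rule, and the bag-of-words `embed` function are all hypothetical stand-ins (a real system would use a learned salience model and a real embedding model), and the budget is counted in words rather than tokens for simplicity.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Memory:
    min_words: int = 4  # hypothetical salience rule: drop very short messages
    items: list = field(default_factory=list)

    def experience(self, message: str) -> bool:
        """Intake: decide whether a raw message is worth keeping (lossy by design)."""
        if len(message.split()) < self.min_words:
            return False
        self.store(message)
        return True

    def store(self, message: str) -> None:
        """Storage: keep the text next to a vector so it can be found by meaning."""
        self.items.append((message, embed(message)))

    def recall(self, query: str, budget_words: int) -> list:
        """Recall: spend a fixed word budget on the most relevant memories."""
        q = embed(query)
        scored = sorted(
            ((cosine(q, vec), text) for text, vec in self.items),
            key=lambda pair: pair[0],
            reverse=True,
        )
        picked, used = [], 0
        for score, text in scored:
            cost = len(text.split())
            if score > 0 and used + cost <= budget_words:
                picked.append(text)
                used += cost
        return picked


memory = Memory()
for msg in [
    "ok thanks",
    "I left my last job because the team stopped shipping",
    "I want to move into robotics research next year",
    "lol",
]:
    memory.experience(msg)

relevant = memory.recall("why did they leave their job", budget_words=12)
```

Even in this toy version, each stage makes a lossy decision: two of the four messages never reach storage, and recall returns only what fits the budget and overlaps the query.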

The Memory Frontier

Once you accept that storage means organization, you run into the central tradeoff. To make recall fast and clean you want to partition memory into small, tightly scoped pieces, because small partitions retrieve precisely and cheaply. But meaning does not respect partitions. The fact that explains why someone left their last job might sit in a different partition than the one you searched, and the finer you slice, the more often the pieces you need are scattered across boundaries.

We call this the Memory Frontier. Retrieval gets sharper as you partition more, and semantic coherence gets worse as related context fragments. There is no setting that wins on both, only a band in the middle, and where that band sits shifts with the corpus and the query. Partitioning also forces a routing step, since every recall has to first guess which pieces to look in. Skipping partitioning and loading everything into a longer context window is not a real substitute for memory either: models use information placed in the middle of a long context less reliably than information near its beginning or end [Liu 2024]. The review walks the architectures that navigate this: vector stores, graphs, hybrids, and the newer agentic systems [Packer 2023].
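A toy example makes the tradeoff concrete. The sketch below assumes a hypothetical layout in which each partition carries a short summary used for routing, and it scores relevance by plain word overlap instead of embeddings. The partition names, the summaries, and the `fan_out` parameter are all illustrative, not taken from any particular system.

```python
import re


def tokens(text: str) -> set:
    """Lowercased word set, used as a crude proxy for meaning."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap(query: str, text: str) -> int:
    """Number of distinct words the query and the text share."""
    return len(tokens(query) & tokens(text))


def routed_recall(partitions: dict, query: str, fan_out: int) -> list:
    """Route to the `fan_out` best-matching partitions, then search only those."""
    # Routing step: guess where to look using each partition's summary.
    chosen = sorted(
        partitions,
        key=lambda name: overlap(query, partitions[name]["summary"]),
        reverse=True,
    )[:fan_out]
    # Retrieval step: rank items inside the chosen partitions only.
    hits = [
        item
        for name in chosen
        for item in partitions[name]["items"]
        if overlap(query, item) > 0
    ]
    return sorted(hits, key=lambda item: overlap(query, item), reverse=True)


partitions = {
    "career": {
        "summary": "job history and roles",
        "items": ["Left the last job in March", "Managed a team of six engineers"],
    },
    "health": {
        "summary": "health and stress",
        "items": ["Burnout diagnosis came two months before leaving the job"],
    },
}

query = "why did they leave the last job"
narrow = routed_recall(partitions, query, fan_out=1)
wide = routed_recall(partitions, query, fan_out=2)
```

With `fan_out=1` the router picks only the career partition, so recall is cheap and precise but misses the burnout note that actually explains the departure. With `fan_out=2` the explanation comes back, at the cost of searching everything, which is the frontier in miniature.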

The meta of memory

This is where memory turns into matching. Think about everything a memory system has to do for a single person. There are a million data points scattered across a life: messages, decisions, the things someone returned to and the things they walked away from. A memory system takes that pile and files it. It sorts each piece into some structure, into a space, and usually embeds each snippet so it can be found by meaning later. That is most of what the review is about, and it is already hard.

Matching needs the layer above that, which we think of as the meta of memory. Knowing the snippets, even well organized, is not the same as knowing the person. The meta layer asks a different question: given everything the system has stored, what is the single representation that captures who this person is, not as a search over their history but as something you can hold in one place and compare against someone else?

And the representation we care about is not only backward looking. A good user embedding has to hold three things at once: who someone was, the memories and the track record; who they are now, their current state and context; and who they want to become, their intent. That last one is the part raw memory never hands you, because intent is not in the record yet. It is the direction the record is pointing. Matching on memories alone matches people to their past. Matching on the meta of memory, the representation that carries intent, is what lets you match people to where they are going.
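One way to picture the difference is a sketch in which the user embedding is a weighted sum of three vectors, one each for past, present, and intent. This is purely an illustrative assumption, not a description of how Jean Technologies builds its representations. The three-dimensional "finance, software, robotics" space, the `user_embedding` helper, and the weights are all hypothetical.

```python
import math


def normalize(v: list) -> list:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two dense vectors of equal length."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def user_embedding(past, present, intent, weights=(1.0, 1.0, 1.0)) -> list:
    """Blend who someone was, is, and wants to become into one unit vector."""
    parts = [normalize(past), normalize(present), normalize(intent)]
    blended = [
        sum(w * part[i] for w, part in zip(weights, parts))
        for i in range(len(past))
    ]
    return normalize(blended)


# Hypothetical axes: [finance, software, robotics]
past = [1.0, 0.0, 0.0]      # track record in finance
present = [0.0, 1.0, 0.0]   # currently writing software
intent = [0.0, 0.0, 1.0]    # wants to move into robotics

finance_role = [1.0, 0.0, 0.0]
robotics_role = [0.0, 0.0, 1.0]

user = user_embedding(past, present, intent, weights=(1.0, 1.0, 2.0))

past_only_pick = max(
    [("finance", finance_role), ("robotics", robotics_role)],
    key=lambda role: cosine(past, role[1]),
)[0]
full_pick = max(
    [("finance", finance_role), ("robotics", robotics_role)],
    key=lambda role: cosine(user, role[1]),
)[0]
```

Matching on the past vector alone picks the finance role. Matching on the blended vector, with intent weighted more heavily, picks the robotics role. How intent would actually be inferred is the hard part, and this sketch does not address it.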

That is the whole reason memory came first for us. You cannot build the meta layer without the memory layer underneath it, and the quality of everything above is set by how well the layer below collected and organized the person in the first place.

Why we are early

The review ends on a few open problems that we still think are underrated. Evaluation is the first: there is no agreed way to measure whether a memory system is actually good, because the thing you care about, did it remember the right thing at the right time, only shows up downstream in task performance. Sleep-time compute is the second: the idea that a system should reorganize its own memory while idle, the way consolidation works in people, rather than only at the moment of recall.
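The sleep-time idea can be illustrated with a very small consolidation pass. The sketch below assumes, purely for illustration, that reorganizing means folding near-duplicate memories into one entry with a count. The `consolidate` function, the Jaccard word-overlap measure, and the 0.6 threshold are hypothetical choices, and a real system would do far more than deduplicate.

```python
import re


def tokens(text: str) -> set:
    """Lowercased word set for a memory snippet."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two snippets, from 0.0 to 1.0."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def consolidate(memories: list, threshold: float = 0.6) -> list:
    """Idle-time pass: fold near-duplicate memories into [text, times_seen]."""
    merged = []
    for text in memories:
        for entry in merged:
            if jaccard(entry[0], text) >= threshold:
                entry[1] += 1  # same fact seen again: reinforce, do not re-store
                break
        else:
            merged.append([text, 1])
    return merged


raw = [
    "wants to move into robotics",
    "Wants to move into robotics research",
    "prefers remote work",
]
compact = consolidate(raw)
```

Because this work happens while the system is idle, recall later searches a smaller, cleaner store and pays none of the reorganization cost at query time.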

So this is the artifact from the period before the matching work was visible. The full review goes architecture by architecture and maps the market participants; the formal treatment of the pipeline and its design principles is in the companion paper. If you are building in this layer, or building matching on top of it and feeling the floor give way, we would like to talk.

References

Politzki, J. (2026). Memory in LLM Systems: Design Principles for Experience, Storage, and Recall. Jean Technologies.
Lewis, P. et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS. arXiv:2005.11401
Liu, N. F. et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. TACL. arXiv:2307.03172
Packer, C. et al. (2023). MemGPT: Towards LLMs as Operating Systems. arXiv:2310.08560