HACKER Q&A
📣 wkyleg

Are we close to figuring out LLM/Agent Memory


The main issue I experience with LLMs, and the one that most inhibits my further adoption, is agents' inability to remember relevant context.

A few years ago everyone was using RAG, embeddings, and databases on top of models. Now models with access to local markdown and memory files (like OpenClaw) seem to readily outperform those database setups using nothing but grep and simple UNIX tools.

Is this an inherent issue in scaling LLMs? Does Obsidian work that much better for most people? Is anyone finding anything that actually outperforms markdown?

At this point the main bottleneck in my adoption seems to be memory and persistent long term context, not quality or reliability of the models.

I'm curious if there are any technical or scaling metrics we could use to forecast where this will end up going.


  👤 AndyNemmity Accepted Answer ✓
I don't think there are reasonable metrics.

I have a custom learning system. We are all trying things; that's where AI development is right now.

None of us know the best solution. We are all exploring different paths. I don't find memory and persistent long-term context to be an issue for me, but I am using a fully custom AI Claude Code setup, so perhaps I have sorted it for myself. Unsure.

Can you give a specific example? Like, talk through your workflow so I can understand it better?


👤 kageroumado
For my personal use case, I use something that's known as “lossless context management.” I made a custom harness implementation that uses it. In short, it has a database with every message ever exchanged, and the model can access any of those messages using a simple search. On top of that, every exchange is summarized and stored separately as a level zero summary. Level zero summaries are then periodically summarized together into level one summaries that leave only the most important parts (lessons, knowledge).

The full context then looks something like: [intro prompt] + [old exchanges' lvl 1 summaries] + [larger system prompt] + [more recent exchanges' lvl 0 summaries] + [temporal context] + [recent messages with tool results stripped] + [recent messages including tool results]

Tool results are progressively stripped because they are generally only useful for a few turns. This lets us keep everything we've ever done in the context, and the model can easily look up more information by expanding each node. It's a single perpetual session that never compacts during active work.
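For anyone wanting to experiment, here's a minimal sketch of the layered-summary idea in Python. This is not the author's actual harness: the class and names (`MemoryStore`, `record_exchange`, `ROLLUP_EVERY`) are hypothetical, and `summarize()` is a crude truncation stand-in for what would really be an LLM summarization call.

```python
ROLLUP_EVERY = 4  # fold this many level-0 summaries into one level-1 summary


def summarize(texts, limit=80):
    """Stand-in for an LLM call: crude truncation-based 'summary'."""
    return " | ".join(t[:limit] for t in texts)


class MemoryStore:
    def __init__(self):
        self.messages = []  # every message ever exchanged (fully searchable)
        self.level0 = []    # one summary per exchange
        self.level1 = []    # periodic rollups of old level-0 summaries

    def record_exchange(self, user_msg, assistant_msg):
        self.messages += [user_msg, assistant_msg]
        self.level0.append(summarize([user_msg, assistant_msg]))
        # Periodically fold the oldest level-0 summaries into a level-1 summary,
        # keeping only the condensed form in the working context.
        if len(self.level0) > ROLLUP_EVERY:
            old, self.level0 = self.level0[:ROLLUP_EVERY], self.level0[ROLLUP_EVERY:]
            self.level1.append(summarize(old))

    def search(self, term):
        """Simple lookup over the full message history (the 'lossless' part)."""
        return [m for m in self.messages if term.lower() in m.lower()]

    def build_context(self, intro, system_prompt, recent_n=2):
        """Assemble context in the layered order described above."""
        recent = self.messages[-2 * recent_n:]
        return "\n".join(
            [intro] + self.level1 + [system_prompt] + self.level0 + recent
        )
```

The key property is that summaries only decide what stays in the default context window; nothing is ever deleted, so the model can always search its way back to the raw messages.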

I find it outperforms every other solution I've tried for my use case (a personal assistant).