The Goldfish Problem: Why I Built My Own Memory Server for LLMs
Here's something nobody warns you about when you start weaving AI into your workflow: at some point, you'll be talking to a goldfish.
Not literally. But functionally? Yeah. Every new session, blank slate. The AI has no idea you spent three hours last Tuesday explaining yourself, or that your deployment pipeline went through two failed redesigns before landing on what you're running now. It doesn't know you hate YAML, or that you have a dedicated Docker machine in your network.
It just... doesn't know. Because it can't.
That's the goldfish problem.
What Happens When You Switch Between AI Tools
It started with what felt like a brilliant idea.
I was deep in a flow state with Claude Code — prompting was sharp, context was rich, and the more we worked together, the better it understood what I needed. It was great. The kind of productive groove you don't want to break.
Then I thought: what would Mistral say about this?
I wanted to pit the AIs against each other. See who covered the other's blind spots. Seemed clever at the time. It was not.
The moment I opened Mistral, all that hard-won context — every "don't do this, do that," every quirk and preference Claude had finally internalized — gone. Poof. Mistral had no idea who I was or what we'd been building. Rightfully so, technically, but still brutal.
I asked Claude to dump everything it knew to a file so I could port it over. That kind of worked. "Kind of worked" became my new normal for a while.
The Hidden Cost of Re-Teaching AI Context
You develop habits around limitations you can't fix. I started keeping a CONTEXT.md in every repo. A personal preferences doc I'd paste at the top of new chats. A system prompt loaded from a file. I stopped noticing the friction because I'd absorbed it into my routine, the same way you stop noticing a creak in the floor until a guest points it out.
Managing Multiple Projects Across AI Assistants
My day-to-day splits across two AI assistants: Claude Code for deep work — architecture, code review, untangling unfamiliar codebases — and Mistral for quick questions I don't want polluting Claude's context. They serve different purposes, I genuinely use both, and they have absolutely no idea the other exists.
With one or two projects, it was manageable. Not great. Not terrible.
Then I started maintaining several codebases at once, jumping between them constantly. Each has its own conventions, its own architectural decisions, its own weird history. But some knowledge cuts across all of them: how I like my logging structured, my approach to error handling, my patterns, which libraries I actually trust and why.
For a couple of personal projects, I've even set up personas to make things feel more interesting — it's funnier that way, don't judge me.
That's my knowledge. Personal, not project-specific. And every single session, it evaporated.
Every. Single. One.
On a busy day — three to four projects, both tools, maybe a third model I'm testing — I'd burn a painful chunk of time just re-teaching assistants things they could already know about me.
“Just use the same session!” Thanks, Captain Obvious. That still doesn’t cover multiple LLMs, and once you have more than a couple of projects, sessions get lost. I have lost plenty of them.
Why Existing AI Memory Solutions Didn’t Work
Before I wrote any code, I did the responsible thing and looked at what already existed. There was a lot, but none of it was quite right for me.
Provider-Based Memory Features
Provider-specific memory (the "remember this across sessions" feature some services offer now) is useful but siloed. My preferences live in Claude's system, separate from Mistral's system, impossible to share or query programmatically. Also, I wasn't thrilled about handing even more of my working habits to a cloud provider. They already know plenty.
Cloud Memory Tools
Cloud-hosted memory services like Mem0 or Zep are genuinely well-engineered — structured storage, retrieval APIs, designed specifically for LLM augmentation. But my data would live on their servers. Another account, more money, less control. No, thank you.
The Gap: Local, Shared, Model-Agnostic Memory
Other solutions needed too much work, too much money, or both. The gap was specific: something local, usable by any AI agent without custom integration work, that could hold both personal preferences and per-project knowledge, searchable by meaning rather than exact keyword. Easy first. Everything else second.
The Breakthrough: Using MCP to Share Memory Across Models
What actually made it all possible was MCP.
The Model Context Protocol, Anthropic's open standard for connecting AI models to external tools and data sources, changed the math entirely. Claude Code supports it natively. A growing list of other clients does too. (Thank you, Dario.)
MCP meant I didn't need to write separate integrations for each tool. Build one server that speaks the protocol, and any MCP-compatible client connects to it. Claude and Mistral could both read from and write to the same store without me maintaining custom adapters for each.
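To make that "one server, any client" idea concrete, here is a toy sketch of the dispatch shape. This is not the real MCP SDK and not Mnem-O-matic's code: the actual protocol is JSON-RPC 2.0 over stdio or HTTP with a proper handshake, and the tool names (`store_memory`, `recall_memory`) are invented for illustration.

```python
import json

MEMORY = {}  # hypothetical in-process store standing in for the real database

def store_memory(namespace: str, key: str, value: str) -> str:
    """Invented tool: persist one fact under a namespace."""
    MEMORY[(namespace, key)] = value
    return "stored"

def recall_memory(namespace: str, key: str) -> str:
    """Invented tool: fetch a fact back, regardless of which client asks."""
    return MEMORY.get((namespace, key), "not found")

# The server exposes named tools; any client speaking the protocol
# sends the same request shape and gets the same answer.
TOOLS = {"store_memory": store_memory, "recall_memory": recall_memory}

def handle(request_json: str) -> str:
    """Dispatch one JSON-RPC-style tool call to the matching handler."""
    req = json.loads(request_json)
    result = TOOLS[req["method"]](**req["params"])
    return json.dumps({"id": req["id"], "result": result})
```

The point is the symmetry: Claude storing a preference and Mistral recalling it are just two calls to the same `handle`.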
Building a Local Memory Server for LLMs
The first version was modest by design. Solve my problem, not build a platform. It stored three kinds of things, because I kept running into three distinct kinds of knowledge worth preserving:
Preferences
How I like things set up, the patterns I reach for, the conventions I default to. Global across everything.
Project decisions
Architectural choices and the reasoning behind them. The stuff that isn't obvious from reading the code.
Notes
Half-formed ideas, brain dumps, observations I wasn't ready to formalize. Ideas for next steps or a next project.
Why I Chose SQLite for Storage
Storage: SQLite. A single file I can copy between machines, easily browse, and back up with `cp`. No server process, no configuration, no infrastructure. It’s easy. It works. Done.
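For a sense of how little that takes, here is a minimal sketch of a single-file store. The table and column names are my guesses for illustration, not the project's actual schema; the point is that "zero infrastructure" really is a handful of lines of stdlib code.

```python
import sqlite3

def open_store(path: str) -> sqlite3.Connection:
    """Open (or create) the single-file store. Pass ':memory:' for a scratch db."""
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS memories (
            id        INTEGER PRIMARY KEY,
            namespace TEXT NOT NULL,   -- 'global' or a project name
            kind      TEXT NOT NULL,   -- preference / decision / note
            content   TEXT NOT NULL
        )""")
    return conn

def remember(conn, namespace, kind, content):
    conn.execute(
        "INSERT INTO memories (namespace, kind, content) VALUES (?, ?, ?)",
        (namespace, kind, content))
    conn.commit()

def recall(conn, namespace):
    return conn.execute(
        "SELECT kind, content FROM memories WHERE namespace = ?",
        (namespace,)).fetchall()
```

Everything the server knows ends up in that one file, which is what makes the copy-it-anywhere workflow possible.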
The first time I used it properly: I asked Claude Code to store my git preferences in mnem-o-matic under a global namespace. It did. I switched to Mistral, asked it to check for my git preferences, and got them back. Felt good.
How Mnem-O-matic Works
Once the basic version was working, I started reaching for it in ways I hadn't planned. Storing the outcome of a tricky debugging session. Creating and saving summaries from documentation, even ideas for a novel and the contents of my pantry. I even found myself saving some chats with my local LLM while watching TV in the evenings. Sometimes my local 8B model just gets me. Saved.
The more I used it, the clearer the shape of what it needed to be: not just personal memory, but a shared memory layer. Something a whole team could connect their AI tools to. Proper namespacing to keep project knowledge separate from global knowledge. Authentication for LAN deployments. Enough robustness to survive concurrent writes without quietly corrupting itself.
Performance Optimizations and Design Decisions
I started hardening it. Proper input validation. A concurrency model built on SQLite's WAL mode with per-thread connections. Bearer token auth. A Caddy reverse proxy for TLS on LAN setups. An optimized Docker build that quantizes the embedding model from FP32 to INT8 at build time, shrinking it from 80 MB to 20 MB and speeding up inference by 2–3X.
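The WAL-plus-per-thread-connections part can be sketched in a few lines. Names here are illustrative, not the project's actual code: each thread gets its own private connection, and WAL journaling lets readers proceed while a single writer appends to the log.

```python
import os
import sqlite3
import tempfile
import threading

# Assumed layout for the sketch: one database file, one connection per thread.
DB_PATH = os.path.join(tempfile.mkdtemp(), "mnem.db")
_local = threading.local()

def get_conn() -> sqlite3.Connection:
    """Return this thread's private connection, creating it on first use."""
    if not hasattr(_local, "conn"):
        conn = sqlite3.connect(DB_PATH)
        # WAL mode: readers don't block the writer, and vice versa.
        conn.execute("PRAGMA journal_mode=WAL")
        _local.conn = conn
    return _local.conn
```

The `threading.local()` trick sidesteps SQLite's dislike of sharing one connection across threads, and WAL is what keeps concurrent clients from quietly corrupting the file.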
Core Data Types
The project became Mnem-O-matic: a shared memory layer for LLMs, exposed as an MCP server. It took me longer to land on the name than to build the whole thing! Embarrassing, and completely true.
Three content types, because not everything is the same kind of thing:
Documents
Reference material, validated snippets, specs, configs. Solid, long-form, trusted.
Knowledge
Discrete facts and decisions. Atomic, query-able.
Notes
Quick thoughts, ideas, transcripts, informal observations, things that don't fit neatly anywhere else. Yes, even the grocery list.
All three support namespaces, tags and metadata. Everything is searchable. The hybrid search blends keyword and semantic results — a query for "authentication" surfaces entries about "login tokens" even without lexical overlap.
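The blending itself is simple to sketch. This toy scorer mixes a lexical term-overlap score with a cosine similarity over embeddings; the weighting, the vectors, and the function names are invented for illustration, and the real ranking certainly differs.

```python
import math

def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear literally in the text."""
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in text.lower())
    return hits / len(terms) if terms else 0.0

def cosine(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def hybrid_score(query, text, q_vec, t_vec, alpha=0.5):
    """Blend lexical and semantic relevance; alpha=1.0 is pure keyword search."""
    return alpha * keyword_score(query, text) + (1 - alpha) * cosine(q_vec, t_vec)
```

This is exactly why "authentication" can surface "login tokens": the keyword score is zero, but a high cosine between the two embeddings carries the result anyway.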
The embedding stack is fully self-contained. The image bundles a quantized ONNX version of all-MiniLM-L6-v2 — small, fast, runs on CPU, no GPU required, no PyTorch, no external API calls. If you're already running models locally via Ollama or something similar, the lite image can delegate to those and comes in under 120 MB.
The database is one file. Your entire knowledge base lives in a single `.db` file. Copy it to back it up. Move it between machines. Open it with any SQLite client. Send it via email if you're feeling retro. This is intentional. I wanted easy deployment, easy backup, easy everything. If you want Postgres, fork it and go wild.
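If the server happens to be writing while you copy, `cp` can catch the file mid-transaction. Python's stdlib has a safe alternative: the `sqlite3` backup API copies a consistent snapshot even from a live database. A minimal sketch:

```python
import sqlite3

def backup_db(src_path: str, dst_path: str) -> None:
    """Copy a (possibly live) SQLite database to dst_path, page by page."""
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dst_path)
    with dst:
        src.backup(dst)  # stdlib online-backup API, consistent snapshot
    src.close()
    dst.close()
```

Same single-file spirit, just without the race against the writer.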
Why I Open-Sourced Mnem-O-matic
At some point the question shifted from is this useful to is this useful enough to share? The answer was yes.
It's open source under Apache 2.0. Run it locally with Docker. Connect Claude Code, Copilot, Mistral or any MCP-compatible client. Your data stays on your machine. Here’s the repo if you want to try it.
What This Tool Does Not Do
Mnem-O-matic solves one specific problem well. It is not a general-purpose knowledge management system, not a RAG pipeline for large document collections, and not a replacement for documentation. The content limits are intentional. This is a memory store for LLMs, not a document database.
It also doesn't solve the harder problem of knowing when and what to remember. That's a prompting and workflow problem that a storage layer can't fix. The infrastructure is here, but using it well still requires practice.
Wrapping Up
The goldfish is still there but now it has a place to remember things. You just have to decide what’s worth keeping.
I wanted to pit the AIs against each other. See who covered the other's blind spots. Seemed clever at the time. Seems clever now.