DeskTheory
          

          

                  ISSUE 02 · TUESDAY PRO DEEP DIVE
                

                  2026.06.02
                


One workflow, end-to-end. The architecture, the prompt, the trade-offs.

                  

Memory that compounds instead of re-explaining your business every morning

Most mornings the first ten minutes with Claude Code are wasted, and they are wasted the same way every time.

I open a session. I want help with a pricing decision. Before Claude can be useful I have to re-explain the model: the four tiers, why the middle one is priced the way it is, what we tried in Q1 and killed, who on the team owns which number. I type it out. Claude gets smart for forty minutes. Then I close the laptop and that context evaporates. Tomorrow I do it again. Across a week that is two or three hours spent typing things I have already typed, and the deeper cost is worse: because re-orienting is annoying, I stop starting fresh sessions. I keep one bloated thread alive for two weeks until it crawls, then abandon it and lose everything that was in it.

The public version of this workflow gets you eighty percent of the fix: point QMD at a folder of notes and wire up a /recall command. Read it first if you have not. Today's deep dive is the other twenty percent: how to index Claude's own past sessions and not just your notes, how to scope the index so recall is sharp instead of noisy, the exact upgraded /recall command I run, and the four places this leaks.

                    > THE ARCHITECTURE
                  

How it actually works

The whole system is three layers, all of them plain files on your laptop, none of them a database or a cloud service.

The first layer is capture. Everything you want Claude to remember has to become markdown sitting in a folder. Two sources feed it. The first you already have: your written work (meeting summaries, memos, strategy docs, your Obsidian vault). The second, the piece the public article skips, is Claude Code's own conversations. By default Claude Code does not save sessions as readable files; a community tool called sync-claude-sessions auto-exports each session to a markdown file the moment you close it, so the reasoning you did with Claude last Tuesday becomes searchable text. The trade-off arrives in the same breath: exporting every session means your raw conversations sit in plaintext on disk. That is private and portable, and it is also yours to protect. Anyone holding your laptop can read them, and the backup is your job, not a vendor's.

The second layer is the index. QMD (Tobi Lütke's open-source local search) reads that folder of markdown and answers hybrid semantic-plus-keyword queries in under a second. Once a couple of months of work is in there, that is fifty or more past sessions plus every memo you have written, and a single recall finds the relevant passage across all of it before you have finished typing the question. The load-bearing choice here is scope: one global collection, or one collection per project. A single global index is convenient because one /recall reaches everything, and it is noisy for the same reason: a question about pricing drags in unrelated board notes and last year's hiring memos. Per-project collections are precise and they cost you maintenance, because now you own five indexes instead of one and a recall cannot reach across them. I run per-project collections for code and client work, plus one global "thinking" collection for memos and strategy. That split is the single highest-leverage decision in the build.
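To make the scoping decision concrete, here is a minimal sketch of how a recall could choose its collection from the folder a session starts in. The folder paths and collection names are hypothetical, and nothing here calls QMD itself; it only illustrates the per-project-plus-global split described above.

```python
from pathlib import Path
from typing import Dict

# Hypothetical layout: each project folder maps to its own collection name.
PROJECT_COLLECTIONS: Dict[Path, str] = {
    Path("/Users/me/code/billing"): "billing",
    Path("/Users/me/clients/acme"): "acme",
}
GLOBAL_COLLECTION = "thinking"  # memos and strategy, per the split above


def collection_for(cwd: Path) -> str:
    """Pick the collection a recall should search from the working directory.

    A session started inside a project folder (or any subfolder) searches
    that project's collection; everything else falls back to the global one.
    """
    for root, name in PROJECT_COLLECTIONS.items():
        if cwd == root or root in cwd.parents:
            return name
    return GLOBAL_COLLECTION
```

The fallback is the design choice worth noticing: a question asked outside any project lands in "thinking" rather than searching everything, which keeps the noise problem of a single global index from creeping back in.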

The third layer is recall: a /recall slash command that, before you ask your real question, searches the index and pulls the most relevant chunks into the conversation with citations. The choice is manual recall versus auto-recall. Auto-recall fires at the start of every session and removes the discipline cost, and it also pollutes the context with chunks you did not need and burns tokens doing it. Manual recall is sharp and depends on you remembering to type it. I keep it manual and treat "/recall first, question second" as a fixed ritual, the same way I treat the Friday commitment-ledger refresh from the last deep dive.

The reason all three layers are markdown and a local index, rather than a SaaS memory product, is the same reason the ledger is a text file. QMD can be replaced. sync-claude-sessions can break on the next Claude Code update. The markdown survives all of it. The format is the moat, not the tool.

                    > THE PRODUCTION PROMPT
                  

Paste-ready

The public article ships the basic /recall: search, summarize, continue. This is the version that has been through the trade-offs. It searches both your notes and your exported sessions, it cites which file each fact came from, and it flags when a result is old enough that you should not trust it as current. That last instruction is the one that matters: a recall built on a six-month-old memo will confidently hand you a decision you already reversed.

Save this as ~/.claude/commands/recall.md:

---
description: Search notes and past sessions, summarize with citations, flag stale results
argument-hint: [query]
---

The user is about to start real work. Before they ask, pull the relevant
context from their indexed history.

Query: $ARGUMENTS

1. Run a QMD search across both the notes collection and the exported
   Claude-session collection for the query above.
2. Summarize only the most relevant 3-5 results. For each fact, cite the
   source file and its date in parentheses.
3. If a load-bearing result is more than 90 days old, flag it explicitly:
   "(this is from <date>; confirm it is still current before relying on it)".
4. If results conflict, surface the conflict instead of picking a winner.
5. Then continue with the user's original request, using what you found.

Brief as it is, four instructions in that command do the heavy lifting: cite the source, cite the date, flag anything stale, and surface conflicts rather than smoothing them over. Those are the instructions that keep recall honest.
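The staleness rule in step 3 is simple enough to write down exactly. In the command itself Claude applies it from the prose instruction; the sketch below only spells out the arithmetic, assuming each result carries a source file name and a date. The function name and citation format are illustrative, not part of QMD or Claude Code.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=90)  # threshold from step 3 of the command


def cite(source: str, written: date, today: date) -> str:
    """Format a citation and append the stale warning when it applies.

    "More than 90 days old" is read strictly: a note exactly 90 days old
    is not flagged, one that is 91 days old is.
    """
    citation = f"({source}, {written.isoformat()})"
    if today - written > STALE_AFTER:
        citation += (
            f" (this is from {written.isoformat()}; "
            "confirm it is still current before relying on it)"
        )
    return citation
```

Ninety days is the number from the command above, not a law. If your pricing or hiring decisions turn over faster than that, lower it.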

                    > TRADE-OFFS
                  

What this sacrifices

There are four trade-offs, and pretending they are not there is how you end up trusting a system you should not.

Privacy and backup. Everything is local, which is the whole point and also the whole risk. The transcripts and notes never touch a vendor's cloud, so sensitive context (comp, hiring, board) stays on your machine. The cost is that if the laptop dies and you did not back it up, your memory dies with it. The mitigation is Time Machine plus a private git repo of the markdown folder. Do both, or accept that you are one spilled coffee from amnesia.

Recall precision and the curation tax. Semantic search returns chunks, and garbage in is garbage recalled. If you index everything, including the throwaway sessions, recall quality drops because the noise crowds out the signal. The skill the public article cannot teach is curation: index what is worth remembering, prune what is not, and phrase recalls specifically (/recall Q3 pricing model for the board deck, not /recall pricing). A practical rule: do not export the throwaway sessions you opened to test a command or debug a typo; keep the export pointed at the projects that carry real decisions. Expect roughly one recall in ten to pull the wrong note until you have calibrated.
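One way to make the curation rule mechanical is a filter that runs before anything reaches the index. This is a sketch under loud assumptions: that exported sessions land in one folder per project (I have not confirmed how sync-claude-sessions lays out its files), and that very short sessions are usually throwaways. The project names and the length cutoff are hypothetical.

```python
from pathlib import Path

# Hypothetical: folders for the projects that carry real decisions.
KEEP_PROJECTS = {"pricing", "acme-client"}
# Hypothetical cutoff: shorter exports are probably a test or a typo fix.
MIN_CHARS = 2000


def worth_indexing(session_file: Path, text: str) -> bool:
    """Decide whether an exported session should be copied into the index.

    Assumes a layout of <export root>/<project>/<session>.md, so the
    parent folder name identifies the project.
    """
    in_kept_project = session_file.parent.name in KEEP_PROJECTS
    long_enough = len(text) >= MIN_CHARS
    return in_kept_project and long_enough
```

A filter like this only handles the obvious cases. The judgment call, which long session in a real project was still a dead end, remains manual pruning.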

Index staleness. qmd embed is incremental and cheap, but it does not run itself. An index you forget to rebuild knows nothing written since the last run, so it will silently recall a decision you made in March as if it were current, because the note that reversed it was never indexed. The fix is to wrap qmd embed in a nightly cron job so the index is never more than a day behind. Until you do that, treat any recall with suspicion right after a busy week.
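A minimal sketch of that wrapper, assuming qmd is installed and that a marker file is an acceptable record of the last successful run. The marker path, the function name and the schedule in the comment are all hypothetical; the only command it runs is the qmd embed named above.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, Sequence

MAX_AGE_SECONDS = 24 * 60 * 60  # "never more than a day behind"

# Hypothetical crontab entry, 03:00 nightly (script path is an assumption):
#   0 3 * * * /usr/bin/env python3 $HOME/bin/refresh_index.py


def refresh_if_stale(
    stamp: Path,
    now: float,
    run: Callable[[Sequence[str]], object] = lambda cmd: subprocess.run(cmd, check=True),
) -> bool:
    """Run `qmd embed` unless the last successful run is under a day old.

    `stamp` is a marker file whose modification time records the last run.
    Returns True if the index was refreshed, False if it was fresh enough.
    """
    if stamp.exists() and now - stamp.stat().st_mtime < MAX_AGE_SECONDS:
        return False
    run(["qmd", "embed"])  # raises if the command fails, so the stamp stays old
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.touch()
    return True
```

To run it from cron, end the script with a call such as refresh_if_stale(Path.home() / ".cache" / "qmd-last-embed", time.time()). Cron starts jobs with a minimal PATH, so if qmd is not found, use its absolute path in the command list. The marker file also gives you something to check by hand: if its timestamp is older than a day, the nightly job is not running.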

The single-operator ceiling. This is your memory, not your team's. There is no shared recall, no way for your head of sales to query the same index. Building team memory means a server, a real vector database, and access control, and the moment you add those you give up the local-and-private property that makes this version worth running. Do not try to bolt team memory onto this stack. It is the right tool for one operator and the wrong tool for an org.

                    > THE MOVE
                  

What you should do next

This weekend, thirty minutes. Install sync-claude-sessions, export your last ten Claude Code sessions to markdown, and point QMD at both that folder and your notes folder. Replace your /recall command with the version above. Test it on one live project: open a fresh session, type /recall on a topic you know is in there, and watch what comes back.

This week. Start every session with /recall for five working days. Notice what is signal and what is noise, and prune the index accordingly. The install takes thirty minutes; the habit is the actual work.

Two weeks out. Cron qmd embed to run nightly so the index never rots. After that the only cost left is remembering to recall, and by then it is reflex.

If you are not yet feeling the "explaining my business to Claude for the eighth time" tax, save this issue. You will feel it soon enough, and this is the fix.

Andrew

Every back issue of the Pro Deep Dive lives at /members/archive.

            desktheory.com
             ·            pro archive
             ·            written by Andrew Lissimore
          
