The Problem with Agentic Memory and How to Solve It

Introduction

In this article, I will dissect what this ‘memory problem’ is and propose a memory architecture that is stateless, painless, and effortless, one that saves you a headache when you develop complex software applications with the help of LLMs, aka when you vibecode.

To get there, we first need to look at what memory is for an LLM. Whatever we discuss in a chat session is stored as “context”, and the AI model uses this context to retrieve information, produce outputs, and handle a plethora of requests. AI models such as Claude, GPT, and Gemini all have a ceiling on how much information they can hold in a single session. This ceiling is known as the context window. When a session approaches or exceeds it, the model will still reply, but the quality and comprehension deteriorate, almost as if it’s making things up.

I have personally used Claude for most of my app development, writing Java, Python, Go, and Bash workflows. Whenever I reached the context limit in Cursor, Claude would summarise the chat session so it could keep working from a compressed version of what it knew; it’s the same in Claude’s web interface. The tokens aren’t cheap either. If you let the agent go haywire, it will always consume more tokens than you expect. A file’s contents are a black box to the AI until it reads them: unless you explicitly tell it what is inside, the agent won’t know.

The Problem

The context window works just fine, at its best to be honest, at the beginning of a session. As you keep exchanging information, the time the agent takes to complete each request keeps increasing. I am talking about agentic development in IDEs such as VS Code, Cursor, or Zed, and especially on a codebase that exceeds 30k or 40k lines. I have very fond memories of working on a Java project that did AST analysis for CVE reachability, where the line count kept climbing as the codebase grew. That was when I burned through my entire $20 of Cursor usage in the span of three days.

Why should I care?

If you work with an agentic IDE on a daily basis, sooner or later you will realize that the time an agent takes to get “familiar” with your codebase, e.g. by reading every source file in your project folder, keeps increasing. And a context window that fills up in under 7 to 10 exchanges will bother you at some point.
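To get a feel for how expensive that “familiarisation” is, here is a minimal sketch that estimates how many tokens an agent would spend reading every source file in a project. It assumes roughly 4 characters per token, which is only a rule of thumb (real tokenizers differ per model), and the `SOURCE_SUFFIXES` set and function names are hypothetical choices for illustration.

```python
from pathlib import Path

# Assumption: ~4 characters per token. This is a rough rule of thumb,
# not the behaviour of any specific model's tokenizer.
CHARS_PER_TOKEN = 4

# Hypothetical selection of file types to count; adjust for your project.
SOURCE_SUFFIXES = {".java", ".py", ".go", ".sh"}


def estimate_tokens(text: str) -> int:
    """Return a rough token estimate for a piece of text."""
    return len(text) // CHARS_PER_TOKEN


def estimate_codebase_tokens(root: str) -> int:
    """Sum the rough token estimates of every source file under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in SOURCE_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            total += estimate_tokens(text)
    return total
```

Calling `estimate_codebase_tokens(".")` from the project root and comparing the result with your model's context window gives a quick sense of how much of each session would go to re-reading code.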

The solution

The idea is to create a dedicated folder in the root directory, memory/, where critical information is saved. The AI uses this information to understand the simple things, what is what, without a large-scale reasoning effort. A simple README.md does similar magic as well.

/memory/
architecture.md       ← what the system IS (components, relationships)
conventions.md        ← naming, structure, style decisions
decisions.md          ← what was tried, what was rejected, and WHY
entities.md           ← key objects, schemas, data models
constraints.md        ← hard limits (don't touch X, always use Y)
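If you want to create this layout by hand rather than through the agent, a minimal sketch could look like the following. The file names and descriptions come from the listing above; the function name `scaffold_memory` and the placeholder content written into each file are assumptions for illustration.

```python
from pathlib import Path

# File names and one-line purposes taken from the layout above.
MEMORY_FILES = {
    "architecture.md": "what the system IS (components, relationships)",
    "conventions.md": "naming, structure, style decisions",
    "decisions.md": "what was tried, what was rejected, and WHY",
    "entities.md": "key objects, schemas, data models",
    "constraints.md": "hard limits (don't touch X, always use Y)",
}


def scaffold_memory(root: str) -> list[Path]:
    """Create memory/ with placeholder files and return the files created.

    Existing files are left untouched so that curated notes are never lost.
    """
    memory_dir = Path(root) / "memory"
    memory_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for name, purpose in MEMORY_FILES.items():
        path = memory_dir / name
        if not path.exists():
            # Hypothetical placeholder: a heading plus a comment on the purpose.
            path.write_text(f"# {name}\n\n<!-- {purpose} -->\n", encoding="utf-8")
            created.append(path)
    return created
```

Running it twice is safe: the second call creates nothing and returns an empty list.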

Create a second root-level directory, sessions/, and note down what happened during each session, so the agent has a clear timeline of events.

/sessions/
  2026-03-04.md   ← "Refactored auth module, moved JWT logic to middleware"
  2026-03-05.md   ← "Fixed race condition in queue worker, changed retry logic"
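As a rough sketch of how such a note could be recorded without the agent, the helper below appends a one-line entry to the file for a given day, creating the file if needed. The name `append_session_note` and the heading it writes are assumptions; only the `sessions/YYYY-MM-DD.md` naming comes from the layout above.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def append_session_note(root: str, note: str, day: Optional[date] = None) -> Path:
    """Append a bullet to sessions/YYYY-MM-DD.md, creating it when missing."""
    day = day or date.today()
    sessions_dir = Path(root) / "sessions"
    sessions_dir.mkdir(parents=True, exist_ok=True)
    path = sessions_dir / f"{day.isoformat()}.md"
    if not path.exists():
        # Hypothetical heading; use whatever format your notes follow.
        path.write_text(f"# Session {day.isoformat()}\n\n", encoding="utf-8")
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- {note}\n")
    return path
```

For example, `append_session_note(".", "Refactored auth module, moved JWT logic to middleware")` adds a bullet to today's file, and one file per day keeps the timeline easy to scan.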

Implementation

Setting up and maintaining such an architecture by hand is a cumbersome, time-consuming task. I tend toward the lazier side, so I created a SKILL.md to guide Claude through all the required setup. I have tested this in Cursor, and it works very well. I believe the same works in Claude Code and other IDEs. In Google’s Antigravity, passing the SKILL.md as input whenever you need it works well too.

A SKILL.md is a specialized instruction document that tells Claude how to execute a task.

Put the following content in .claude/skills/memory/SKILL.md, relative to the root of any codebase where you want to implement the architecture.

---
name: agent-memory
description: Sets up and maintains a tiered memory system for long-running projects. Use when starting a new project, when sessions feel repetitive to re-explain, when context windows feel bloated, when onboarding to an existing codebase, or at the end of any working session. Triggers on: "set up memory for my project", "the agent keeps forgetting", "organize this project for AI", "save this session", "update project memory", "create session notes", "what did we do last session".
---

# Agent Memory Architecture

You are stateless. Every session you wake up with no memory of what came before. On small projects that's fine — but on anything real, you waste a significant portion of your context window just re-orienting: re-reading files, re-deriving architecture, re-learning conventions you already knew last session.

This skill fixes that. When invoked, you will build and maintain a three-layer memory system that lets you start every future session sharp, with minimal token cost and maximum signal.

**Your default behavior when this skill activates:**
- If no memory system exists yet → scaffold it and populate it from what you can read
- If a memory system exists → read `memory/INDEX.md` first, then load only what's relevant
- At the end of any session → write or update the session file for today

---

## The Structure You Will Create

project-root/
├── memory/
│   ├── INDEX.md          ← Read this first, every single session
│   ├── architecture.md   ← What the system is
│   ├── conventions.md    ← How the codebase works
│   ├── constraints.md    ← What's locked in and off-limits
│   └── decisions.md      ← What was considered and why
└── sessions/
    ├── YYYY-MM-DD.md     ← One file per working session
    └── ...


Create this structure now if it doesn't exist. Do not wait to be asked.

---

## Layer 1 — Knowledge Directory (`/memory/`)

This folder holds pre-digested, stable facts about the project. When you populate these files correctly, you replace thousands of tokens of file-reading with a few hundred tokens of direct lookup.

### `architecture.md` — Write this file with:
- Every major component and what it does
- How components relate to each other and what calls what
- Data flows: where data enters, how it moves, where it exits
- Key abstractions and what they represent in the domain
- All external dependencies, APIs, and integrations

### `conventions.md` — Write this file with:
- File and folder naming rules (e.g. kebab-case components, PascalCase classes)
- Code style decisions and the reasoning behind them
- Which libraries are preferred for which domains (e.g. "use Zod for validation, not Joi")
- Patterns to follow and patterns explicitly to avoid
- Anything a new developer (or you, next session) would get wrong without being told

### `constraints.md` — Write this file with:
- Dependencies that cannot be changed and why
- Architectural decisions that are already locked in
- Performance requirements or hard limits
- External systems with fixed contracts you cannot modify
- Files or folders that must not be touched and why

### `decisions.md` — Write this file with:
- Every significant decision made and what alternatives were rejected
- The reasoning behind the current design — not just what, but why
- Trade-offs that were deliberately accepted
- Anything that looks wrong or unusual but is intentional — explain it here so you don't "fix" it later

### Your rule for these files:
Keep them accurate. Stale knowledge is worse than no knowledge — you will confidently operate from wrong assumptions. Any time an architectural change is made, update the relevant file in the same session. Do not defer this.

---

## Layer 2 — Sessions Directory (`/sessions/`)

Code shows what was built. Sessions show what *happened during* the building — the dead ends, the debugging discoveries, the decisions made under constraint. Without this layer, you rediscover things that have already been discovered. The same dead ends. The same mistakes.

### At the end of every working session, you will create or update `sessions/YYYY-MM-DD.md` using today's date.

Write it in this format:

# Session — YYYY-MM-DD

## What was built
- [Concrete, specific list of completed work — not vague summaries]

## What changed and why
- `path/to/file.ts`: [What changed and the reason — especially non-obvious reasons]

## Problems encountered
- [Problem as it appeared]: [How it was resolved, or why it remains open]

## Decisions made
- [Decision]: [Why this option over the alternatives considered]

## Dead ends — do not try these again
- [Approach that failed]: [Why it doesn't work — this is your most valuable entry]

## Open threads
- [Anything left incomplete, uncertain, or to pick up next session]

### What makes a good session note:
- **Be specific**: "Added retry logic to the payment webhook handler because Stripe can send duplicate events within 5 seconds of each other" is useful. "Fixed webhook stuff" is not.
- **Capture the why**: You need intent, not just facts. Future you has the code — what future you lacks is the reasoning.
- **Name every dead end**: These entries save the most time. If an approach failed, write it down so you never explore it again.
- **Keep it tight**: One paragraph per section is enough. This is a briefing, not a report.

---

## Layer 3 — The Index (`/memory/INDEX.md`)

As memory files grow, loading all of them every session recreates the original problem. The index is your routing map. It tells you what's available and when to load it, so only the relevant files enter your context window.

### Write and maintain `INDEX.md` in this format:

# Memory Index — [Project Name]

## Memory files

| File | Contains | Load when... |
|------|----------|--------------|
| architecture.md | System overview, components, data flows | Starting any new task; anything about how parts connect |
| conventions.md | Naming, style, patterns, preferred libraries | Writing new code; reviewing; refactoring |
| constraints.md | Locked-in decisions, hard limits, fixed contracts | Proposing changes; system design questions |
| decisions.md | Past alternatives considered and rejected | Before proposing a change that might have been tried before |

## Sessions (most recent first)
- YYYY-MM-DD.md — [One-line summary of what happened]
- YYYY-MM-DD.md — [One-line summary]

## Quick facts
- Language / runtime: [e.g. TypeScript / Node 20]
- Database: [e.g. PostgreSQL via Prisma]
- Deployment: [e.g. Vercel + Railway]
- Critical: [Anything that must be remembered above all else]

Update the sessions list every time you create a new session file. Keep the most recent entry at the top.

---

## Your Session Start Protocol

Every time you begin a session on a project that has this memory system, do this in order — before doing anything else:

1. **Read `memory/INDEX.md`** — always, no exceptions. This costs ~200 tokens and orients everything.
2. **Identify which memory files are relevant** to what you've been asked to do today.
3. **Load only those files** — not all of them. Selective loading is what keeps context clean.
4. **Read the most recent session file** — always. This tells you where things were left off.
5. **Load older session files only if specifically relevant** (e.g. "the bug from last Tuesday").
6. **Begin work** — you are now fully oriented at minimal token cost.

Do not load the entire sessions history. Do not load memory files that have nothing to do with today's task. The discipline of selective loading is the entire point.

---

## Bootstrapping an Existing Project

If you're being onboarded to a project that has no memory system yet, do this immediately:

1. Scan the codebase — read the key files, entry points, config, folder structure
2. Generate `memory/architecture.md`, `memory/conventions.md`, `memory/constraints.md`, and `memory/decisions.md` from what you observe
3. Generate `memory/INDEX.md` pointing to all of them
4. Create `sessions/` and write the first session file documenting this bootstrapping
5. Tell the user what you generated and ask them to correct anything that's wrong

Your first pass will be imperfect. That's fine. A draft that gets corrected is faster than starting from nothing, and the act of the user correcting it ensures accuracy.

---

## Why You Do This

Every session you maintain this system, the memory gets richer. Conventions get clarified. Decisions get recorded. Dead ends get documented. By month three of a project, you have access to everything learned in months one and two — not by remembering it, but by reading it cheaply from well-maintained files.

Without this system, every session starts from zero and you are as disoriented on day ninety as you were on day one. With it, you compound. You get genuinely better at working on this specific project over time — not because the model changed, but because the memory you operate in is better.

A project with a good memory directory is faster to work on than one without. Maintain it like it matters. Because it does.
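Outside the skill file itself, the session start protocol can be approximated in a few lines of Python, which helps to see what “selective loading” means in practice. This is only a sketch under stated assumptions: the agent makes this choice by reasoning, whereas the code uses naive keyword matching against the “Contains” and “Load when...” columns of INDEX.md. All function names here are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional


def parse_index_table(index_text: str) -> list[dict]:
    """Extract rows of the three-column memory table from INDEX.md text."""
    rows = []
    for line in index_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # Skip malformed rows, the header row, and the |---| separator row.
        if len(cells) != 3 or cells[0] == "File" or set(cells[0]) <= {"-"}:
            continue
        rows.append({"file": cells[0], "contains": cells[1], "load_when": cells[2]})
    return rows


def select_memory_files(task: str, rows: list[dict]) -> list[str]:
    """Pick files whose table description shares a keyword with the task.

    Assumption: words longer than three letters are treated as keywords.
    """
    words = {w for w in re.findall(r"[a-z]+", task.lower()) if len(w) > 3}
    selected = []
    for row in rows:
        haystack = f"{row['contains']} {row['load_when']}".lower()
        if any(word in haystack for word in words):
            selected.append(row["file"])
    return selected


def latest_session(root: str) -> Optional[Path]:
    """Return the newest sessions/YYYY-MM-DD.md; ISO dates sort by name."""
    files = sorted((Path(root) / "sessions").glob("????-??-??.md"))
    return files[-1] if files else None


def plan_session_start(root: str, task: str) -> list[Path]:
    """List what to read, in order: INDEX.md, relevant memory, last session."""
    index_path = Path(root) / "memory" / "INDEX.md"
    rows = parse_index_table(index_path.read_text(encoding="utf-8"))
    plan = [index_path]
    plan += [index_path.parent / name for name in select_memory_files(task, rows)]
    recent = latest_session(root)
    if recent is not None:
        plan.append(recent)
    return plan
```

With the example table from the skill, a task such as "refactoring the naming of helpers" matches only conventions.md, so the plan is INDEX.md, conventions.md, and the most recent session file. Everything else stays out of the context window.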

Conclusion

This whole architecture is a by-product of the countless hours I’ve spent trying to get my agents the right context; somehow they always ended up burning more tokens than needed. With this system, I have reduced my token consumption and, more importantly, removed the headache. I hope this architecture helps.

Memory has always been the biggest constraint on agentic AI, and it will be for the foreseeable future, at least as long as inference remains expensive. What you feed into your little AI buddy dictates the quality of its output, no less, no more. Control what you feed into it, and you control its capabilities. And as always, keep the AI as your thinking partner, a friend who is super smart, but don’t make it a replacement for your own reasoning.