My AI Agent Ran for 3 Weeks—It Still Acts Like It Has Amnesia

Q: Chat is saved—why does the agent still forget?

You archived raw text, not enforceable rules. Without extraction, retrieval, and expiry, constraints do not load in new sessions.

Q: Is ChatGPT Memory enough?

Fine for one chat app. Agents across IM, IDE, and cron need a separate AI Memory layer.

A while back I pointed a Telegram agent at an existing customer. I thought I had it covered: every message saved, a huge context window, even ChatGPT Memory turned on.

Three weeks later, before a follow-up email, it asked:

“Which company are you with again? Could you recap the project?”

The customer dropped a screenshot in our shared channel. That was the moment it clicked: the agent remembered the chat, not the relationship. When we walked pilot teams through the same story, almost everyone nodded—we were not losing to a “dumb model,” we were treating chat history as Agent Memory.

What follows is how we fixed our OpenClaw setup: real scenarios, no glossary-first lecture, and how we split AI Memory (how the system stores what it knows) from the history pane in your chat UI. For stack shopping, see OpenHuman vs ChatGPT Memory (four-layer stack); for IDE agents, Karpathy on layered context.

Scenario 1: chat is archived, the rule still vanishes

A B2B support team told us the same horror story.

Week three, the customer said: “Quotes in PDF only—no Excel.” The agent replied, “Got it, noted.” Admins pulled the thread—every word was still there, searchable for “PDF.”

Three months later, an Excel quote went out. The customer blew up: “Are you even paying attention?”

The model was not “stupid.” That rule never entered retrievable, enforceable Memory. It sat inside thousands of lines mixed with small talk and a bad CC list. On a fresh session, nothing reliably hoists “Customer A = PDF only” to the top of the prompt.

Chat history is evidence for audits. Agent Memory is state for the next decision—like a CRM flag “never Excel,” not asking sales to replay a three-hour call before every deal.
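
To make the CRM-flag idea concrete, here is a minimal sketch of a rule that has been extracted from chat and is hoisted to the top of the prompt on every new session. The `Rule` schema, the `build_system_prompt` helper, and the `source` string are hypothetical names for illustration, not OpenClaw's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One enforceable constraint extracted from chat (hypothetical schema)."""
    customer: str
    text: str
    source: str  # pointer back to the archived chat, kept as audit evidence


def build_system_prompt(customer: str, rules: list[Rule], base: str) -> str:
    """Place the customer's hard rules at the top of a fresh session's prompt."""
    hard = [r.text for r in rules if r.customer == customer]
    if not hard:
        return base
    header = "\n".join(f"- {t}" for t in hard)
    return f"HARD RULES for {customer} (never violate):\n{header}\n\n{base}"


# Assumed example data: the rule from the story, stored once, loaded every time.
rules = [Rule("Customer A", "Send quotes as PDF only, never Excel.", "chat-archive:week-3")]
prompt = build_system_prompt("Customer A", rules, "You are a support agent.")
```

The archive still holds the full thread for audits. The rule is what the next session actually reads.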

Scenario 2: a million tokens did not stop Claude Code from redoing work

I hit a different failure in Claude Code.

I had it map a monorepo and draft architecture docs—two solid days of work. “Huge window; I’ll pick up next time,” I thought.

Two weeks later, new session: “Continue the doc work on that repo.”

It rebuilt its mental model from scratch, re-scanned the directory trees, and rewrote a similar outline. Some of the chat was there, but the task state was not: which submodule was done, what had already landed in docs/, what was waiting on me. That state lives in tool output and on disk, not in banter with the model.

That’s when Karpathy’s “don’t use infinite context as a hard drive” stopped sounding theoretical. What belongs in AI Memory is a one-line updatable checkpoint, not twenty thousand lines of tool stdout.
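
A checkpoint like that can be as small as one JSON file that each session overwrites. The sketch below assumes a hypothetical `Checkpoint` shape and file name; the submodule and doc paths are made-up examples.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Checkpoint:
    """Updatable task state (hypothetical shape), overwritten after each session."""
    task: str
    done: list[str] = field(default_factory=list)    # finished submodules
    landed: list[str] = field(default_factory=list)  # files already written
    waiting_on: str = ""                             # what blocks the next step

    def summary(self) -> str:
        """One line that a new session can read before it touches the repo."""
        return (f"{self.task} | done: {', '.join(self.done) or 'none'} | "
                f"landed: {', '.join(self.landed) or 'none'} | "
                f"waiting on: {self.waiting_on or 'nothing'}")


def save_checkpoint(path: Path, cp: Checkpoint) -> None:
    path.write_text(json.dumps(asdict(cp), indent=2), encoding="utf-8")


def load_checkpoint(path: Path) -> Optional[Checkpoint]:
    if not path.exists():
        return None
    return Checkpoint(**json.loads(path.read_text(encoding="utf-8")))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "checkpoint.json"
    first_run = load_checkpoint(path)  # None: nothing to resume yet
    save_checkpoint(path, Checkpoint("monorepo docs", ["packages/api"],
                                     ["docs/api.md"], "review of outline"))
    restored = load_checkpoint(path)   # what a session two weeks later would see
```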

Frameworks are splitting the same way—see LangGraph on memory vs thread state. We did not copy it blindly, but we agree: message list ≠ memory store.

What an agent actually needs to remember (plain language)

When I explain this to customers or engineers, I skip the acronym wall:

Who they are and hard rules—Customer A, PDF only; boss hates voice notes; team runs on UTC.
Where we left off—Issue #482 waiting on legal; last night’s alert acked, root cause open; doc draft through chapter 3.
How we do things next time—release checklist, approval chain, on-call order.

Chat history hits the first bucket sometimes and barely touches the other two.

In architecture docs we map those three, in order, to Semantic, Episodic, and Procedural memory. You only need the jargon when you are buying software. What matters is coverage: miss a bucket, and the agent embarrasses you in that bucket.
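
One way to make "miss a bucket" checkable is to tag every stored item with its bucket and ask which buckets are empty. This is a sketch only; the `Bucket` enum and `missing_buckets` helper are illustrative names, and the sample items come from the list above.

```python
from enum import Enum


class Bucket(Enum):
    SEMANTIC = "who they are and hard rules"
    EPISODIC = "where we left off"
    PROCEDURAL = "how we do things next time"


def missing_buckets(items: list[tuple[Bucket, str]]) -> set[Bucket]:
    """Return the buckets that have no stored item at all."""
    return set(Bucket) - {bucket for bucket, _ in items}


# A store built from chat history alone tends to look like this.
chat_only = [(Bucket.SEMANTIC, "Customer A: PDF only")]
gaps = missing_buckets(chat_only)

# A store that covers all three buckets.
covered = chat_only + [
    (Bucket.EPISODIC, "Issue #482 waiting on legal"),
    (Bucket.PROCEDURAL, "Release checklist, then approval chain"),
]
```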

A pitfall we actually hit: embed every chat turn

In early OpenClaw experiments we were lazy: we vector-indexed the full transcript, assuming retrieval would “figure it out.”

A month later, quality dropped. The agent revived a six-month-old deploy playbook as the current SOP, and contradictory snippets from casual chat (“maybe no PDF” vs “send Excel template”) led to confidently wrong picks.

We were wrong about the hard part: Memory is not insertion—it’s deletion, edits, and boundaries.

Every Agent Memory row now carries source, created_at, and either an expiry or a version. Queries filter by customer/project first, then apply time decay. The index is the same size, but the agent hallucinates fewer “policies.” It is not magic, but it is the one lesson we would write into a statement of work, ahead of “consider a vector DB.”
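
A minimal sketch of that query order, assuming a hypothetical `MemoryRow` schema: filter by scope and expiry first, then rank by relevance with an exponential time decay. The `relevance` field stands in for whatever similarity score the index returns, and the 30-day half-life is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class MemoryRow:
    scope: str                      # customer or project key
    text: str
    source: str                     # where the fact came from
    created_at: datetime
    expires_at: Optional[datetime] = None
    relevance: float = 1.0          # placeholder for an index similarity score


def recall(rows: list[MemoryRow], scope: str, now: datetime,
           half_life_days: float = 30.0, top_k: int = 3) -> list[MemoryRow]:
    """Scope and expiry are hard filters; time decay only orders what is left."""
    live = [r for r in rows
            if r.scope == scope and (r.expires_at is None or r.expires_at > now)]

    def score(r: MemoryRow) -> float:
        age_days = (now - r.created_at).total_seconds() / 86400
        return r.relevance * 0.5 ** (age_days / half_life_days)

    return sorted(live, key=score, reverse=True)[:top_k]


now = datetime(2025, 6, 1, tzinfo=timezone.utc)
rows = [
    MemoryRow("customer-a", "Deploy playbook v1", "chat",
              now - timedelta(days=180), expires_at=now - timedelta(days=90)),
    MemoryRow("customer-a", "Quotes in PDF only", "chat", now - timedelta(days=60)),
    MemoryRow("customer-a", "Deploy playbook v2", "wiki", now - timedelta(days=5)),
    MemoryRow("customer-b", "Prefers Excel", "chat", now - timedelta(days=1)),
]
hits = [r.text for r in recall(rows, "customer-a", now)]
```

The expired playbook and the other customer's preference never reach the ranking step, which is the point: boundaries first, similarity second. In a real store, hard rules would likely be exempt from decay.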

On Cursor / Claude Code + MCP, tool output often dwarfs chat, so store conclusions in Memory and keep raw stdout out of the index. MCP (Model Context Protocol) is just the plumbing.
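
As a rough illustration of "conclusions, not stdout", the sketch below stores a short conclusion plus a hash and byte count of the raw output, so the output can be located later without being indexed. The `to_memory_entry` helper and its field names are hypothetical.

```python
import hashlib


def to_memory_entry(tool: str, stdout: str, conclusion: str,
                    max_len: int = 200) -> dict:
    """Keep the conclusion; reference the raw output by hash instead of storing it."""
    if len(conclusion) > max_len:
        raise ValueError("conclusion too long; summarize before storing")
    raw = stdout.encode("utf-8")
    return {
        "tool": tool,
        "conclusion": conclusion,
        "stdout_sha256": hashlib.sha256(raw).hexdigest(),
        "stdout_bytes": len(raw),
    }


# Assumed example: a large directory listing reduced to one sentence.
listing = "src/file.py\n" * 20_000
entry = to_memory_entry("tree", listing, "docs/ covers packages a and b; c is undocumented")
```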

Is ChatGPT Memory enough? Depends what you call an “agent”

I won’t trash ChatGPT Memory—for people who live in ChatGPT, it’s great for tone and light preferences.

Once the same “you” runs in Slack, Telegram, cron, and an IDE, that Memory stays inside OpenAI’s app. Rules set in chat do not become hard constraints on an OpenClaw gateway.

Our split today (full stack write-up):

ChatGPT Memory: voice/preferences (optional; disable if you duplicate facts elsewhere)
Local vault / OpenHuman-class: work facts + multi-source sync
OpenClaw + MCP: read Memory, run tasks—not be the database

Chat-only? ChatGPT Memory may suffice. An agent that does real work? You need an AI Memory layer. That’s what we tell buyers; this is not a vendor-neutral whitepaper.

When agents run for weeks: memory is only step one

Teams that fix Memory often get hit by the next problem: the agent does not survive the night.

The lid closes, Wi-Fi blips, or macOS kills a local MCP process: the morning cron never fires, and last night’s state has no process to resume it. The stranger part is that the records are all in the store, yet the agent behaves as if it never existed.

So we talk three layers—not a product dump at the end:

Memory layer—what to store, who can read, when it expires
Execution layer—OpenClaw Gateway on a VPS, 24/7 Telegram/webhook/cron
Tools layer—heavy MCP, browser automation, xcodebuild off the same sleeping laptop that syncs your vault

Our habit: Memory on a Mac or NAS; the gateway on a Linux VPS; compiles and heavy tools on a cloud Mac. We do this because we have seen a single Xcode Archive build freeze a vault sync, not because “cloud sells itself.”
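
The placement rule can be written down as data and checked. This is a sketch under assumed host labels (`mac-or-nas`, `linux-vps`, `cloud-mac`); it is not an OpenClaw configuration format.

```python
# Layer -> host, mirroring the habit described above (labels are illustrative).
PLACEMENT = {
    "memory": "mac-or-nas",
    "execution": "linux-vps",
    "tools": "cloud-mac",
}


def shared_hosts(placement: dict[str, str]) -> dict[str, list[str]]:
    """Return hosts that carry more than one layer, with the layers they carry."""
    by_host: dict[str, list[str]] = {}
    for layer, host in placement.items():
        by_host.setdefault(host, []).append(layer)
    return {host: layers for host, layers in by_host.items() if len(layers) > 1}


# The failure mode from the story: heavy builds on the laptop that syncs the vault.
risky = {"memory": "laptop", "execution": "linux-vps", "tools": "laptop"}
conflicts = shared_hosts(risky)
```

An empty result for `PLACEMENT` means no single machine going to sleep, or choking on a build, can take two layers down at once.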

Gateway on VPS: OpenClaw Gateway deployment. With all three, “PDF only” is both stored and executable at 3 a.m. Wednesday.

Quick answers

Chat is saved—why does the agent still forget?

Because you saved raw text, not rules. Without extraction, retrieval, and expiry, new sessions won’t load that constraint reliably.

Is dumping PDFs into RAG enough?

RAG answers “what’s in the doc.” It won’t tell you “ticket stalled at step 4” or “this runbook is retired.” We use RAG as one layer of AI Memory, not the whole thing.

Smallest stack for a solo dev?

ChatGPT-only → Memory on. Telegram/IDE agent → local Memory + OpenClaw. Need reliable cron → add VPS. Don’t go 24/7 before Memory is in place, or you get automation that’s forgetful and sleep-deprived.
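
The same decision, written as a small function. It is only a restatement of the answer above; `smallest_stack` and its two flags are illustrative names.

```python
def smallest_stack(runs_agent: bool, needs_cron: bool) -> list[str]:
    """Pick the smallest stack: Memory first, always-on execution last."""
    if not runs_agent:
        return ["ChatGPT Memory"]
    stack = ["local Memory", "OpenClaw"]
    if needs_cron:
        stack.append("VPS")  # added only once Memory is already in place
    return stack


chat_only = smallest_stack(runs_agent=False, needs_cron=False)
telegram_agent = smallest_stack(runs_agent=True, needs_cron=False)
with_cron = smallest_stack(runs_agent=True, needs_cron=True)
```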