Two AI coding agents, the same repo, one Friday night. The harness diffed their transcripts and turned the findings back on itself.

It's 11 PM on a Friday, and I'm $110.12 over on usage credits on my second Claude Max+ account. Desperate to wring everything I could out of Fable while it was still public on a regular subscription, I signed up for a third Claude Max account. I went to fire off my first prompt and got back a single line: "this model is unavailable." The ban had landed 72 hours after release.
While I was still chasing a refund from Anthropic, my daily workflow-review routine kicked off. It found two coding agents — Fable 5 and Opus 4.8 — on the same project, pulled the full transcripts of my work with each, and read them side by side: where the context window went, which tools fired, and where each one stalled. Three differences explained almost the entire gap.
1. Auto-invoked skills and hooks were clogging the context
Every skill and hook that fired on its own dumped its full payload into the context window whether the task needed it or not. Across a long session that's death by a thousand cuts: the window fills with instructions the agent isn't using, crowding out the code it actually has to reason about. Trimming auto-invocation to only what the task in front of it needs gave back the most working room.
2. The biggest context saver: the CLAUDE.md re-read on every edit
The single largest drain was subtle. The project's standing-instructions file — CLAUDE.md — was being handed back to the agent in full on every file edit, because the edit tool re-attaches it. Dozens of edits in a session meant dozens of full re-reads of the same document. Shrinking that file and stopping the redundant re-attach recovered more context than any other single change.
3. The biggest time saver: stop blocking on a screenshot that never rendered
During visual verification, screenshots sometimes fail to render — a blank or timed-out frame. Fable shrugged it off and moved on. Opus blocked itself, waiting on a frame that was never coming, and the whole session stalled. Making verification fail open — note the miss, fall back to the DOM and logs, keep going — was the biggest wall-clock win of the weekend.
The agent that wins isn't the smartest one. It's the one that wastes the least context and refuses to block on nothing.
Fold the three back into the harness — fewer auto-invoked skills, a leaner edit-time context, and a verification step that fails open instead of hanging — and throughput roughly doubled. 84 commits one weekend, 175 the next. The model didn't get smarter. The harness did.
Verifying
Same repo, same tasks, two harnesses. The interesting unit of analysis isn't the model — it's the harness around it: the skills, hooks, memory files, and verification steps that decide what the model sees and when it's allowed to move on. I diffed the two transcripts line by line and folded every win back into one shared harness.
The lesson
Every harness needs a self-improving loop — something that revisits your top friction points over time, not just once. It should surface where token usage runs hottest, where context gets pulled in that the task never needed, and which tools would have made the job easier. Claude Desktop ships one: the Routines feature, second item down the left nav. Set up an automated review and see what it turns up.
I thought I had set firm guardrails around the optimizations my reviewer agent surfaced. Sometimes they get skipped anyway — and from the chat alone, it's hard to tell which pieces are slipping through, or where.
More writing