A few weeks ago I asked an agent to plan a database migration for Caipi. The control plane, the part that tracks accounts, sessions, and sharing, had outgrown the single SQLite file it launched on, and I wanted it on Postgres. The agent took about two minutes and came back with a plan: numbered phases, a schema mapping, a cutover sequence, a risk section. It ran about twelve hundred words, cleanly organized, confident from the first line to the last.

I read it once and rejected it.

The plan was not sloppy. Every phase was reasonable on its own terms. The problem was one line in phase three, where it proposed moving the per-user workspace databases into Postgres along with everything else. But those databases are user data, not infrastructure: each one is a SQLite file inside someone's workspace, backing a small app an agent built for them, and the premise of the whole product is that those files are things the user owns and can walk away with. To the agent they just looked like more database, so into Postgres they went. One tidy sentence, and the product's central promise would have quietly become a connection string, delivered with the same calm confidence as everything the plan got right.

Three readings, two rejections

My rejection note was four lines: keep the scope to the control plane, leave every file inside a user's workspace untouched, give me a rollback path that amounts to more than restoring a backup and hoping, and say explicitly what we are choosing not to migrate, so that a future agent does not helpfully finish the job.

The second draft fixed the scope, then waved at rollback with a sentence about snapshots. I sent it back with one question: if the cutover fails at step four, what exactly do I type? The third draft answered, a short section naming the commands and the order they run in. That version I accepted. The migration itself happened a few days later and took an afternoon.

Here is the part I keep coming back to. The final plan is good, and not one sentence of it is mine. My entire contribution was three readings, two rejections, and one question, maybe fifteen minutes of attention against the agent's combined six minutes of writing. Those fifteen minutes were the whole job. Producing the document had become trivial; what the document needed was a reader who could see that phase three was a landmine.

Plausible is the new baseline

What made phase three dangerous was not that it was wrong. It was that it was wrong in the same fluent, well-structured register as everything around it that was right. For most of the history of written work, prose quality was a workable proxy for thought quality: shaky reasoning tended to arrive in shaky sentences, and a reader could triage by style. Agents broke that proxy. Every draft now shows up polished, organized, and sure of itself, and the only way to find the sentence that quietly unmakes your product is to read it with the domain in your head.

That shifts who the power user is. The old power user was defined by production. They typed fast, knew the shortcuts, maintained the templates, kept the wiki alive through sheer discipline, and most software for thinking was built around them, which is why its surfaces are mostly editing surfaces. When a first draft costs two minutes, production stops being where the leverage is. The leverage moves to judgment over output: knowing what to ask for in the first place, telling the keeper from the plausible trap, deciding what gets preserved, and deciding what is still worth using a month later. The person reading the plan now carries the responsibility the person writing the plan used to carry.

What judgment runs on

Judgment sounds like taste, a faculty you either have or lack. In practice, mine ran on mundane things. I caught phase three because the plan arrived as one document I could hold still: scroll it end to end, reread the risk section against the phases, put the second draft next to the third and see exactly what had moved. If the same plan had reached me as forty chat messages interleaved with my objections and the agent's apologies, I doubt I would have caught it on the first pass, and I certainly could not have compared drafts. Chat is where the arguing belongs; why it is a poor place for the result is its own essay, The Scrollback Graveyard.

Form mattered too. A migration plan wants to be read as phases, preconditions, and a rollback table rather than as a wall of prose, and an agent can produce that shape at no extra cost; the longer argument that the format should serve the reader instead of the typist is in Markdown Was a Typing Format. The point here is narrower. Judgment is not a free-floating talent that works on anything you put in front of it. It runs on artifacts that sit still, show their structure, and can be laid side by side, and it degrades fast when they cannot.

What acceptance looked like

Acceptance, it turned out, was less a verdict than a beginning. The plan got opened again on migration morning and followed step by step in a split screen with the terminal. It got opened two days after that, when a slow query made me wonder whether an index had survived the move, and the schema mapping answered in under a minute. The rollback section has never been needed, and knowing exactly where it lives is a quiet kind of insurance that no scroll of chat history has ever given me.

This is the part that discussions of taste tend to skip: keeping is a separate decision from approving, and using is a separate act from keeping. Most generated artifacts fail not by being wrong but by never being opened again, and at that point it no longer matters how good they were. The strongest form of keeping is when the kept thing continues to do work, a page over live data rather than a frozen record, which is the argument of The Leap From Documents to Tools. But even a static plan earns its place the same way the running ones do: someone asks it a question weeks later, and it answers.

Built for the reader

If the human's main act is reading and deciding, tools should be built around that act, and mostly they are not. Most knowledge software still assumes its user is the author: the surfaces are editing surfaces, and the marquee features help you capture, compose, format, and publish. That assumption was correct when getting words onto the page was the expensive part. It goes quietly wrong once the words arrive already written and the expensive part is the verdict.

A tool built for the reader cares about different things. It renders the artifact well on whatever screen the verdict happens on, and it keeps the source inspectable under the rendering, because trust needs somewhere to look. It makes versions comparable, so that rejecting draft two and accepting draft three is a visible act instead of archaeology, and it makes keeping cheap and deliberate, with the kept things still findable a month later. Caipi is my attempt at this, a workspace where agents write and a human reads, judges, and keeps. But the claim is bigger than my product, because most of the industry is still building for an author who is less and less often the one at the keyboard.

The agents will keep getting better at writing the plan. What does not go away is the person who catches phase three.