narrowing the cone

The Origin Story of /feature

I’ve been interested in personal organization tools for a long time, and have never really been happy with them. I’ve worked on dead-tree paper a lot, and have used some online tools that I’ve liked, but never found one that felt like “this actually solves everything”. I used to work at Dropbox on Dropbox Paper, and I miss it every day. But while it was good, it was only a point solution for what I really need.

With Claude Code becoming a thing, I figured: “I need to learn this, so why not finally build the thing I’ve always wanted?” And so I started building NoteCove, which I hope to release as a beta in the next 6-8 weeks-ish, assuming I continue burning down my release blocker list at the current rate. It’s like a personal version of Dropbox Paper, with task tracking, and it syncs over whatever cloud storage you happen to have (iCloud Drive, Google Drive, syncing via rclone, etc.), because I don’t want to run a server, and work cares about where we store things. By using what they allow (Google Drive), I don’t have any problems.

This isn’t the story of NoteCove, but of how, through building NoteCove, I wound up with a workflow that’s worked amazingly well.

Plans

So I got started before Claude had planning mode, or even before plans were widely known about. At first, I’d tell Claude to do something, and it’d just go and do what I asked, but seemingly in an evil genie kind of way. The question was: what is Claude going to do? It’d be nice to know before it marched off for N minutes doing who knows what. I had heard about plans shortly before I got to this point, and connected the dots: why not ask it to write out what it planned to do?

Hey, then I can chat with Claude about the plan!

But I was going back and forth over the plan and having to scroll back in the chat, and this sucked. And trying to pinpoint which part of the plan I was talking about (no, not that item, the other one!) wasn’t great either.

So I came up with my first “Prompt”: tell Claude to write the plan to a file. I figured I could just edit the plan directly, but also chat about it with Claude. I learned I had to tell Claude that I may edit the file, so it should reload it before changing things. This was already a lot better!

Not long after this, I realized Claude needs to actually look more deeply into the source code after getting the prompt, to see what’s what. So I added this to the prompt. Nowadays, Claude will tend to explore on its own, but not always.

This was better, but it was kind of exhausting getting Claude to unscrew the plan when it went a different way than I wanted. I kept going back to backfill my prompt to fix the errors Claude had made, and then trying again, because the plan was too broken to fix. And the plans were pretty voluminous.

So I started having Claude just generate top-level plans, and only after the first rounds of corrections have Claude add detail. This was easier to digest than untangling a pile of things that Claude had guessed wrong about. But this still wasn’t wonderful. I’d rather not have this problem in the first place. Or at least not to this degree.

It was around this time that I came up with the notion of “How do I narrow the cone of error that Claude could wander into?” so that I get what I really want, and spend less time trying to get the plan smooshed into what I want.

Questions

I want Claude to read my mind. And from a preso from Rod Begbie (Hello! if you’re out there!), I remembered he said the best way to read someone’s mind is to ask questions.

I thought to myself: if someone came at me with what I typed at Claude, what would I do? I’d ask questions. So I started pre-asking myself the kinds of questions I would ask, and I got better plans, but not better enough for my liking. Then it dawned on me: Claude could probably ask some questions here. So I started adding “Questions?” at the ends of prompts, and saw that, wow, Claude would start asking decent questions, and the plans came out better. It didn’t always ask enough questions or the right kinds, so I came up with a better prompt than “Questions?” and started getting even better questions, and the plans came out so much better. Not only that, but it wasn’t (and still isn’t) uncommon for me to think: “huh, I hadn’t thought about that at all, and that’s really important!” And in the case of NoteCove, Claude would ask questions with some suggestions, like “would you like to use some library that does what you wanted, but much more completely and better?”

After doing the questions at the chat prompt for a while, I found it sucks for the same reason having the plan in the chat sucks, but worse: an errant enter sends Claude off and you have to pull things back, and if you’ve got 14 questions, there’s again lots of scrolling back and forth. This was for the birds. Like the plan going to a file, I had Claude start writing question files instead. There I could think. Claude would tend to be more verbose. Claude would sometimes offer options, pros/cons, and recommendations, which was a whole lot nicer. I could ask for recommendations, pros/cons, etc. when I wanted them and Claude hadn’t offered them on its own. I had a number of “oh crap, I really should tell Claude about something I’d totally forgotten to mention” moments during Q&A. And I realized I can just slap that at the end of the questions file. Cool!

Another thing was that, for some reason, at a prompt I feel I need to respond quickly, whereas in a file I feel I can properly take my time. You have a number of questions all in front of you together, and I discovered that sometimes you get to a later question that makes you reconsider an earlier one. Or you realize things are wrong, and so you can even delete and rewrite questions and then answer those. Other things you can do: ask for alternatives, tell Claude that its understanding of things is wrong, or say that you don’t understand the question and it should use more words.

With this, I’d go a few rounds of Q&A. Once, as many as seven rounds. Three isn’t uncommon. But now Claude was generating some really good plans, and they required a lot less tweaking.

Review/Critique/Plan Format

As I was reviewing plans, I found some common patterns in what I was asking Claude to fix. When I review code, I look for certain things, and I found I did the same for plans. Why not tell Claude “critique the plan, and specifically look at these things” (now 24 or so for my work variant, 5 for what I use for myself, linked below)? And it did! This is when I got to the point where I rarely need to tweak plans any more. It actually feels a little dangerous, because when plans are usually fine, it’s very tempting to just ship without looking. For some changes, though, that’s just fine; you can fix it in post. But this narrowed the cone of error further.

The next problem was that when executing plans, it was hard to track what was done at a glance, so the plan format became more structured and grew colored box emojis that make it easier to see where things are at, especially when Claude decided to defer things or make them optional. I admit I totally ripped the idea off from somewhere; I’m sorry I can’t credit them properly.
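To make the "glanceable" idea concrete, here is a minimal sketch of how a plan with status markers could be tallied. The specific emojis, their meanings, the plan text, and the `tally_plan` helper are all assumptions for illustration; the actual /feature plan format may differ.

```python
from collections import Counter

# Hypothetical status markers: the real plan format may use different
# emojis or assign them different meanings.
STATUS = {
    "🟥": "to do",
    "🟨": "in progress",
    "🟩": "done",
    "🟦": "deferred",
}


def tally_plan(plan_text: str) -> Counter:
    """Count plan items by status, based on each line's leading marker."""
    counts = Counter()
    for line in plan_text.splitlines():
        item = line.lstrip(" -*")  # drop indentation and list bullets
        for marker, label in STATUS.items():
            if item.startswith(marker):
                counts[label] += 1
                break
    return counts


# Made-up plan items, only here to exercise the function.
EXAMPLE_PLAN = """\
- 🟩 Write failing tests for the new behavior
- 🟩 Implement the change
- 🟨 Update the settings screen
- 🟥 Update documentation
- 🟦 Optional: extra logging (deferred)
"""

if __name__ == "__main__":
    for label, count in tally_plan(EXAMPLE_PLAN).items():
        print(f"{label}: {count}")
```

The point is less the script than the format: with one marker per item, both a human skimming the file and a few lines of code can see what is done, what is in flight, and what got deferred.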

A little side story: a fun example from shortly after we got Cursor. I had some Python scripts I used to generate calendar pages for the paper calendar I used to carry. They had reasonable tests already, but I figured, let Cursor write some more. And it did. On Ville Hellman’s suggestion, I deleted the original functions it was testing and had it write the functions just from the tests. And it did, and it managed to intentionally avoid a bug the original code had that was not tested for. So it had understood the goal of what was there!

Back to the main timeline: with all of this, Claude would still generate OK code, but it often needed tending to, with hallucinated functions and such. And I thought: “Duh, I used to have a team reporting to me named Testing Infrastructure, I should probably have it write tests… and do TDD!” While I hate doing TDD as a human, I’m more than happy to inflict it on Claude. This is, in a sense, like supervised learning: just as you give the network targets to hit, Claude now has a concrete target to hit. Given the previous experience of a model reproducing functions solely from their tests, plus the reinforcement from the plan and what it knew in context, this got me to a good happy place. It makes sense: it’s another way to constrain that cone of error should Claude tend to wander or want to hallucinate things. Hallucinated code isn’t going to pass tests. Not to mention that having good test coverage is just good future insurance.
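As a small illustration of tests acting as the target, here is a sketch in the spirit of the calendar scripts from the side story. The `week_start` function and its tests are hypothetical, not taken from the original scripts; the tests come first and pin down the behavior, and the implementation only has to make them pass.

```python
import datetime as dt


# Step 1: the tests are written first and define the target.
def test_week_start_on_a_wednesday():
    assert week_start(dt.date(2024, 1, 3)) == dt.date(2024, 1, 1)


def test_week_start_on_a_monday_is_itself():
    assert week_start(dt.date(2024, 1, 1)) == dt.date(2024, 1, 1)


def test_week_start_crosses_year_boundary():
    # 2023-01-01 was a Sunday, so its week began in the previous year.
    assert week_start(dt.date(2023, 1, 1)) == dt.date(2022, 12, 26)


# Step 2: only then is the implementation written to satisfy them.
def week_start(day: dt.date) -> dt.date:
    """Return the Monday of the week containing `day`."""
    return day - dt.timedelta(days=day.weekday())


if __name__ == "__main__":
    # Normally a test runner such as pytest would collect these.
    test_week_start_on_a_wednesday()
    test_week_start_on_a_monday_is_itself()
    test_week_start_crosses_year_boundary()
    print("all tests passed")
```

An implementation that calls a function that doesn’t exist, or gets the year boundary wrong, fails immediately, which is exactly the constraint being described.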

Last came coverage gates. Claude, when it has trouble passing tests, can be inclined to skip or delete them. Coverage gates keep Claude honest.
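A coverage gate can be as simple as a check that fails the build when coverage falls below a floor or drops from a recorded baseline. The sketch below is one way to express that. The `coverage_gate_failures` name, the thresholds, and the baseline comparison are assumptions for illustration, not the gate actually used for NoteCove.

```python
def coverage_gate_failures(
    current: float,
    baseline: float,
    minimum: float = 80.0,
    tolerance: float = 0.1,
) -> list[str]:
    """Return reasons the gate fails; an empty list means it passes.

    All values are line-coverage percentages (0-100). The default
    `minimum` and `tolerance` are illustrative, not recommendations.
    """
    failures = []
    if current < minimum:
        failures.append(
            f"coverage {current:.1f}% is below the minimum of {minimum:.1f}%"
        )
    if current < baseline - tolerance:
        failures.append(
            f"coverage dropped from {baseline:.1f}% to {current:.1f}%"
        )
    return failures


if __name__ == "__main__":
    # Example: tests were deleted, so coverage fell from 86% to 79%.
    problems = coverage_gate_failures(current=79.0, baseline=86.0)
    for problem in problems:
        print("GATE FAILED:", problem)
    raise SystemExit(1 if problems else 0)
```

Skipping or deleting tests lowers the measured coverage, so the shortcut shows up as a failed gate instead of a quietly green test run.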

The system is still evolving. But I’m now about 12 weeks into the second implementation of NoteCove, with 330k lines of code, around 145k lines of it being tests, and separately 77k lines of markdown plans, Q&A, etc. I killed the first implementation of NoteCove because I chose a bad architecture (I didn’t understand enough about how Electron worked) that would have taken too long to fix, and being a week and a half in, why not?

So if you want to try out a workflow that reliably gives good (not perfect) results, give /feature a try. Just copy it into your ~/.claude/commands directory as feature.md, then start claude and type /feature followed by the thing you want to build.

about:drewcsillag

A Well Working AI Workflow
