How I Write Software with LLMs

Over the last year I’ve written more than 100,000 lines of code using AI. I’ve landed on a workflow I’m genuinely happy with — both in how it feels to use and in the quality of the resulting code.

Most people I see either give a vague prompt, get a disappointing result, and give up — or go the other direction and build complex orchestration pipelines with a dozen moving parts that are too unreliable to trust. This is what works for me in between those two extremes.

The tools

My main driver is Claude Code with Opus, unless the task is small (roughly under 100 lines), in which case Sonnet is fine. For a second opinion and for certain tasks, I use OpenAI’s Codex on GPT-5.4 at xhigh reasoning.

Using two models is deliberate, not redundant: they have different personalities and catch different things.

Start with clarification, not code

Whenever I start a new session, I describe what I want — the rough feature outline, what’s in my head — and then I explicitly tell the model to ask me questions about anything unclear.

This step is non-negotiable. Skip it and the model will silently make a guess at every ambiguous decision. Run it again and it’ll make different guesses. You’ll get code that works but isn’t quite what you wanted, and you won’t immediately know why.

I iterate on this — asking for more questions several times — until both of us feel like the specification is nailed down. The more precise the spec, the more mechanical the code generation becomes. That’s the goal.

Planning large features

For anything substantial — a new section of the app, a significant new feature — I use plan mode. The model reads through the codebase, identifies existing patterns, and produces a detailed plan before writing a single line of code. Claude’s plans tend to be very explicit: endpoints, data shapes, what gets touched and why.

Once I have that plan, I pass it to Codex and ask it to critique it. Codex is more nitpicky and tends to catch smaller issues Claude glosses over. I take those suggestions back to Claude, do a few revisions, and only once both models and I agree on the plan does Claude write the actual code.
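To make the shape of that loop concrete, here is a minimal sketch. It is an illustration only: `critique` and `revise` are hypothetical stand-ins for "ask Codex to critique the plan" and "ask Claude to revise it", not real APIs, and the round limit is an assumption. In my own workflow each of these steps is a manual hand-off between tools.

```python
from typing import Callable


def refine_plan(
    plan: str,
    critique: Callable[[str], list[str]],      # hypothetical: reviewer model returns a list of issues
    revise: Callable[[str, list[str]], str],   # hypothetical: planner model returns a revised plan
    max_rounds: int = 3,                       # assumed cap so the loop always terminates
) -> tuple[str, bool]:
    """Alternate critique and revision until the reviewer has no objections.

    Returns the final plan and whether the reviewer accepted it.
    """
    for _ in range(max_rounds):
        issues = critique(plan)
        if not issues:
            # Reviewer has nothing left to flag: the plan is agreed.
            return plan, True
        plan = revise(plan, issues)
    # Out of rounds: check the last revision once more and report the result.
    return plan, not critique(plan)
```

The boolean in the return value matters more than the loop itself: if the reviewer still has objections after a few rounds, that usually means the spec is underspecified, not that the plan needs one more revision.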

Reviewing the output

I don’t read every line; that’s not realistic at scale. Instead I do a high-level pass over which files were touched. This alone catches a surprising number of mistakes: code that didn’t reuse something already written, or changes that introduced cross-module dependencies that shouldn’t exist.
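A small script can make that first pass quicker. The sketch below is one way to do it, under a few assumptions: the work lives on a branch compared against `main`, the top-level directory is a reasonable proxy for "module", and the function names are mine, not part of any tool.

```python
import subprocess
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_module(paths: list[str]) -> dict[str, list[str]]:
    """Group changed file paths by their top-level directory."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        # Files directly in the repository root have no directory component.
        module = parts[0] if len(parts) > 1 else "(root)"
        groups[module].append(path)
    return dict(groups)


def changed_files(base: str = "main") -> list[str]:
    """List files changed on the current branch since it diverged from `base`.

    Assumes it is run inside a git repository that has a `base` branch.
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Calling `group_by_module(changed_files())` from the repository root gives a per-directory view of the change. If a feature that should live in one module shows up under three, that is the cue to look closer before reading any code.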

If anything looks off, I flag it immediately. Then I send the code to a different model for review — if Opus wrote it, Codex reviews it. Or I open a pull request on GitHub and let whatever review tool is set up on that project do its pass.

Debugging

For gnarly bugs — not new features, but things that are genuinely broken and require real digging — GPT-5.4 on xhigh is my first call. It’s persistent in trying different approaches and has a noticeably high hit rate on hard problems.

The part most people skip

All of this only works because of the upfront specification work. The cool thing is that you can stress-test a spec from every angle before writing any code — ask the model to rewrite it from a different perspective, challenge your assumptions, find edge cases. This costs almost nothing. And once the spec is solid, turning it into working code is almost trivially easy by comparison.

I could automate more — trigger reviews automatically, chain agents together, remove myself from the loop. I’ve chosen not to. Manual review checkpoints mean I still understand what’s being built. That matters to me both as an engineer and when I’m explaining these workflows to teams.

What does your workflow look like? Reply and let me know — I’m especially curious whether anyone has found a reliable way to do the spec phase without going back and forth as many times as I do.