How I Write Software with LLMs

Over the last year, I’ve written more than 100,000 lines of code using AI. I’ve landed on a workflow I’m genuinely happy with — both in how it feels to use and in the quality of the resulting code.

Most people I see either:

  • give a vague prompt, get a disappointing result, and give up
  • or go the other direction and build complex orchestration pipelines with a dozen moving parts that are too unreliable to trust

This is what works for me in between those two extremes.

The tools

My main driver is Claude Code with Opus, unless the task is small (roughly under 100 lines), in which case Sonnet is fine.

For a second opinion — and for certain tasks — I use OpenAI’s Codex on GPT-5.4 at xhigh reasoning.

Using two models is deliberate, not redundant.
They have different “personalities” and catch different things.

Start with clarification, not code

Whenever I start a new session, I describe what I want — the rough feature outline, what’s in my head — and then I explicitly tell the model to ask me questions about anything unclear.

This step is non-negotiable.

Skip it, and the model will silently make assumptions at every ambiguous decision.
Run it again, and it will make different assumptions.

You’ll get code that works — but isn’t quite what you wanted — and you won’t immediately know why.

I iterate on this several times, asking for more questions until the specification is nailed down.

The more precise the spec, the more mechanical the code generation becomes. That’s the goal.
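To make the clarification loop concrete, here is a minimal sketch in Python. It is illustrative only: `ask_model` and `answer_question` are hypothetical callables standing in for the chat session and for me typing answers, and the instruction wording is just one way to phrase it, not a fixed prompt.

```python
from typing import Callable, List

# Assumed wording; any phrasing that forbids silent assumptions would do.
CLARIFY_INSTRUCTION = (
    "Before writing any code, ask me questions about anything that is "
    "unclear or ambiguous in this description. Do not make assumptions."
)


def build_clarification_prompt(outline: str, answers: List[str]) -> str:
    """Combine the rough outline, the decisions so far, and the instruction."""
    parts = ["Feature outline:", outline.strip()]
    if answers:
        parts.append("Decisions made so far:")
        parts.extend(f"- {a}" for a in answers)
    parts.append(CLARIFY_INSTRUCTION)
    return "\n".join(parts)


def refine_spec(
    outline: str,
    ask_model: Callable[[str], List[str]],   # hypothetical: returns open questions
    answer_question: Callable[[str], str],   # hypothetical: the human answering
    max_rounds: int = 5,
) -> List[str]:
    """Keep asking for questions until the model has none left."""
    answers: List[str] = []
    for _ in range(max_rounds):
        questions = ask_model(build_clarification_prompt(outline, answers))
        if not questions:
            break  # nothing left unclear: the spec is nailed down
        for q in questions:
            answers.append(f"{q} -> {answer_question(q)}")
    return answers
```

The returned list of question and answer pairs is effectively the spec: every ambiguous decision is written down instead of being guessed at differently on each run.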

Planning large features

For anything substantial — a new section of the app, a significant feature — I use plan mode.

The model:

  • reads through the codebase
  • identifies existing patterns
  • produces a detailed plan before writing a single line of code

Claude’s plans tend to be very explicit: endpoints, data shapes, what gets touched and why.

Once I have that plan, I pass it to Codex and ask it to critique.

Codex is more nitpicky and tends to catch smaller issues Claude glosses over.

I take those suggestions back to Claude, do a few revisions, and only once everything checks out does Claude write the actual code.
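The plan, critique, revise cycle can be sketched roughly as follows. This is an assumption-laden illustration, not a real integration: `critique` and `revise` are hypothetical callables standing in for the second model and the first model respectively, and in practice I do this by hand rather than in a script.

```python
from typing import Callable, List, Tuple


def review_plan(
    plan: str,
    critique: Callable[[str], List[str]],     # hypothetical: reviewer model, returns issues
    revise: Callable[[str, List[str]], str],  # hypothetical: planner model, returns new plan
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Alternate critique and revision until the reviewer has no issues.

    Returns the latest plan and whether the reviewer signed off on it.
    """
    for _ in range(max_rounds):
        issues = critique(plan)
        if not issues:
            return plan, True  # everything checks out: safe to generate code
        plan = revise(plan, issues)
    # Ran out of rounds; the last revision was never re-checked.
    return plan, False
```

The boolean matters: code generation should only start when the reviewer has actually signed off, not merely when the round limit was reached.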

Reviewing the output

I don’t read every line — that’s not realistic at scale.

Instead, I take a high-level pass:

  • which files were touched
  • what dependencies were introduced
  • whether existing code was reused

This alone catches a surprising number of mistakes.
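Part of that high-level pass can be pulled straight from a diff. The sketch below is a rough illustration that assumes a unified diff as produced by `git diff` and a Python project whose dependencies live in a top-level `requirements.txt`; the function name and the manifest choice are my assumptions. It does not try to judge whether existing code was reused, which still needs a human eye.

```python
from typing import Dict, List, Optional


def summarize_diff(diff_text: str, manifest: str = "requirements.txt") -> Dict[str, List[str]]:
    """List the files a unified diff touches and the lines added to the manifest."""
    files: List[str] = []
    new_deps: List[str] = []
    current: Optional[str] = None
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # Header looks like: diff --git a/path b/path
            current = line.split(" b/", 1)[-1]
            files.append(current)
        elif current == manifest and line.startswith("+") and not line.startswith("+++"):
            dep = line[1:].strip()
            if dep and not dep.startswith("#"):
                new_deps.append(dep)  # an added, non-comment manifest line
    return {"files": files, "new_dependencies": new_deps}
```

Feeding it the output of `git diff main` (or whichever base branch applies) gives a two-line summary to scan before looking at any code.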

If anything looks off, I flag it immediately.

Then I send the code to a different model for review. If Opus wrote it, Codex reviews it.

Or I open a pull request on GitHub and let the existing review tools do their pass.

Debugging

For gnarly bugs — not new features, but things that are genuinely broken — GPT-5.4 on xhigh is my first call.

It’s persistent in trying different approaches and has a high hit rate on hard problems.

The part most people skip

All of this only works because of the upfront specification work.

The advantage of working at the spec level is that you can stress-test it from every angle before writing any code:

  • ask the model to rewrite it from a different perspective
  • challenge your assumptions
  • find edge cases

This costs almost nothing.

And once the spec is solid, turning it into working code becomes almost trivial by comparison.

Why I don’t automate everything

I could automate more:

  • trigger reviews automatically
  • chain agents together
  • remove myself from the loop

I’ve chosen not to.

Manual review checkpoints mean I still understand what’s being built.

That matters — both as an engineer and when I’m explaining these workflows to teams.

What does your workflow look like?

I’m especially curious whether anyone has found a reliable way to handle the spec phase without going back and forth as many times as I do.
