Your AI coding bill needs a plan

Don’t give the AI your credit card

Uber burned their entire 2026 AI budget. In Q1.

Their CTO went on record saying they’re “back to the drawing board” after a surge in Claude Code usage blew past every internal projection. Engineers were spending between $500 and $2,000 per person, per month. They gave thousands of engineers near-unlimited access to a powerful AI coding agent and watched a full year of budget vaporize in three months.

You might think: big company problem. Thousands of engineers, billions in R&D, someone forgot to set a limit.

But last month a developer at a 300-person company here in Oradea spent $1,000 on AI tokens. Their company now expects its GitHub Copilot bill to rise 3-4x following the recent pricing changes. Same problem, different scale, same absence of a plan.

This is the new normal. Eng tooling just became a salary-sized line item — and most teams don’t have a framework for thinking about it yet.

There are levers. Most teams aren’t using them.

The spend isn’t inevitable. But AI-assisted coding is very new, so most developers don’t yet have a good model for how to control its cost. Here are some knobs worth knowing about.

Model selection. A year ago I told every team I worked with: always use the best model, it’s worth it. I’ve updated that position. Haiku is now good enough for simple, well-scoped tasks. Sonnet handles most implementation work well. Opus earns its price on architecture decisions, hard debugging, and anything where the cost of getting it wrong is asymmetric, such as security work.
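One way to make this a team default rather than a per-person habit is a small routing table that maps kinds of task to a model tier. The sketch below is a hypothetical illustration: the task categories and the `pick_model` helper are invented for this example, and the tier names are the article’s labels, not real API model identifiers.

```python
from enum import Enum


class Task(Enum):
    # Hypothetical task categories; a real team would define its own.
    BASIC = "basic"                    # renames, boilerplate, simple tests
    IMPLEMENTATION = "implementation"  # typical feature work
    ARCHITECTURE = "architecture"      # design decisions
    HARD_DEBUGGING = "hard_debugging"  # bugs that resist the cheaper tiers
    SECURITY = "security"              # cost of a mistake is asymmetric


# Tier labels from the article, not API model IDs.
ROUTING = {
    Task.BASIC: "haiku",
    Task.IMPLEMENTATION: "sonnet",
    Task.ARCHITECTURE: "opus",
    Task.HARD_DEBUGGING: "opus",
    Task.SECURITY: "opus",
}


def pick_model(task: Task) -> str:
    """Return the cheapest tier the team has judged good enough for this task."""
    return ROUTING[task]


if __name__ == "__main__":
    for task in Task:
        print(f"{task.value:>15} -> {pick_model(task)}")
```

Writing the table down turns “which model?” into a team decision that can be reviewed and changed, rather than something each engineer settles alone.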

Context management. Every message re-sends your full session context. A bloated CLAUDE.md, a dozen MCP servers you installed six months ago and forgot about, plugins that haven’t fired in weeks — all of it loads on every message, before you’ve typed a word. When was the last time you ran an audit on what your agent loads in its context window?
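Because the whole context goes out again on every turn, a fixed overhead is paid once per message, not once per session. The back-of-the-envelope model below makes that concrete. It is a simplification under stated assumptions: every turn adds the same number of tokens to the history, the fixed overhead (CLAUDE.md, MCP tool definitions, plugin prompts) stays constant, and any provider-side prompt caching that might discount the repeated prefix is ignored. The function name and the numbers in the usage are hypothetical.

```python
def session_input_tokens(fixed_overhead: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent over a session.

    Assumes request k (1-based) re-sends the fixed overhead plus all
    history so far, and that each turn adds tokens_per_turn tokens.
    Ignores prompt caching.
    """
    return sum(fixed_overhead + k * tokens_per_turn for k in range(1, turns + 1))


if __name__ == "__main__":
    # Hypothetical 50-turn session, 1,000 new tokens of history per turn.
    bloated = session_input_tokens(fixed_overhead=20_000, tokens_per_turn=1_000, turns=50)
    trimmed = session_input_tokens(fixed_overhead=5_000, tokens_per_turn=1_000, turns=50)
    print(bloated, trimmed, bloated - trimmed)  # 2275000 1525000 750000
```

In this toy example, trimming 15,000 tokens of always-loaded context saves 750,000 input tokens over a single session (15,000 × 50 turns), before anyone has written a better prompt.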

Orchestration design. Knowing when to pause and review versus when to let it run is a skill, and most teams haven’t developed it deliberately. An agent-review loop can burn hundreds of thousands of tokens. Three parallel subagents, each allowed up to five attempts on failure, can add up to fifteen expensive inference calls for a task that should have taken one. Sometimes having a senior engineer review the work directly costs dramatically less and catches more.
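The retry arithmetic is easy to lose track of once agents spawn agents. A minimal sketch, assuming each attempt counts as one expensive call: `worst_case_calls` gives the upper bound (3 subagents × 5 attempts = 15), and a shared `TokenBudget` lets retries stop when a per-task cap is reached instead of when attempts run out. Everything here, including the per-attempt token estimate and the `attempt` callable, is hypothetical and not part of any agent framework’s API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


def worst_case_calls(subagents: int, max_attempts: int) -> int:
    """Upper bound on calls when every subagent uses every attempt."""
    return subagents * max_attempts


@dataclass
class TokenBudget:
    """A per-task token cap shared by all subagents (hypothetical helper)."""
    limit: int
    spent: int = 0

    def charge(self, tokens: int) -> bool:
        """Record tokens if they fit under the limit; return False otherwise."""
        if self.spent + tokens > self.limit:
            return False
        self.spent += tokens
        return True


def run_with_retries(
    attempt: Callable[[], bool],
    max_attempts: int,
    est_tokens: int,
    budget: TokenBudget,
) -> Tuple[bool, int]:
    """Retry attempt() until success, attempts run out, or the budget is hit.

    Returns (succeeded, attempts_used).
    """
    for n in range(1, max_attempts + 1):
        if not budget.charge(est_tokens):
            return False, n - 1  # stop before a call that would exceed the cap
        if attempt():
            return True, n
    return False, max_attempts


if __name__ == "__main__":
    print(worst_case_calls(3, 5))  # 15
    budget = TokenBudget(limit=200_000)
    # A subagent that never succeeds exhausts the shared budget after 4 attempts.
    print(run_with_retries(lambda: False, 5, 50_000, budget))  # (False, 4)
```

Charging an estimate before each call is crude, but it turns “retry until it works” into “retry until it works or the task has cost what we said it was worth”, which is the point where a human should take a look.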

Scope discipline. Garbage in, garbage out, but more expensive. A vague prompt that sends an agent down the wrong path for twenty minutes is a cost problem as much as a quality problem. Spending half an hour of human time thinking hard about the task isn’t just good practice; it’s cost management.

The honest summary

The teams getting this right aren’t spending less — they’re spending intentionally. They know which workflows earn their cost and which ones are expensive procrastination. They have a rough model for what a task should cost before they run it, and they notice when the bill says otherwise.
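A rough cost model can be as small as a price table and a tolerance check. The sketch below assumes per-million-token pricing for input and output; the prices in it are placeholders, not any provider’s actual rates, and the 50% tolerance and the helper names are arbitrary, hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prices:
    """USD per million tokens. Placeholder values; use your provider's current rates."""
    input_per_mtok: float
    output_per_mtok: float


def estimate_cost(input_tokens: int, output_tokens: int, prices: Prices) -> float:
    """Expected USD cost of a task from its estimated token counts."""
    return (
        input_tokens * prices.input_per_mtok
        + output_tokens * prices.output_per_mtok
    ) / 1_000_000


def over_estimate(estimated: float, actual: float, tolerance: float = 0.5) -> bool:
    """True when actual spend exceeds the estimate by more than tolerance (0.5 = 50%)."""
    return actual > estimated * (1 + tolerance)


if __name__ == "__main__":
    prices = Prices(input_per_mtok=2.0, output_per_mtok=10.0)  # placeholder prices
    est = estimate_cost(2_000_000, 100_000, prices)
    print(est)                      # 5.0
    print(over_estimate(est, 6.0))  # False
    print(over_estimate(est, 8.0))  # True
```

The estimate will often be off; the useful signal is the task whose bill lands well outside it, because that is where a workflow is costing more than anyone decided it should.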

Most teams don’t have that yet. And right now, without it, you’re either leaving productivity on the table or funding someone else’s Q1 budget story.

Is this a live problem for your team?

Depending on what you’re dealing with, it might be a conversation, a workshop, or something in between.
