What to look at when choosing AI tools for your team

The AI coding space is moving at an uncomfortable pace. Even as an AI consultant who tracks this full time, I can’t keep up with every tool that launches. Today’s best model is from Anthropic. Next week it might be OpenAI. The week after, Google surprises everyone.

This is a challenge for large companies, because they are used to more stability. But here are some things they have too look at when making decisions in this space:

1. The model

Anthropic has the Claude series, OpenAI has GPT, Google has Gemini. Each family gets meaningful updates every few months, and which one leads on any given benchmark shifts constantly.

More importantly, models aren’t uniformly good. Some are better at generating new code, some at finding bugs, some at reasoning through long-running autonomous tasks. This usually reflects where that company focused its training efforts in the last cycle — which means the rankings shift as priorities shift.

Don’t pick a model based on a benchmark from three months ago.

2. The harness

The harness is how your team actually interacts with the model — an IDE integration, a terminal agent, a chat interface. This matters more than most people realise.

Anthropic’s models have been specifically optimised for use inside Claude Code, and they perform measurably better there than when accessed through a generic wrapper. Other models are trained more broadly and don’t have a preferred harness. Some harnesses give models access to more tools — not just file editing, terminal execution, web search — and this directly affects what they can accomplish on real tasks.

The practical implication: if your team builds workflows, hooks and institutional knowledge around a specific harness, that investment doesn’t transfer easily. Lock-in at the harness level is just as big of a risk as lock-in at the model level.

3. Infrastructure and data residency

Where is the model running, and where is your data going?

Claude is available directly through the Anthropic API and through all major cloud providers. Gemini is Google-only. Some APIs let you specify that requests stay in Europe — important for GDPR compliance. Others route to wherever spare capacity exists, with no guarantees.

For regulated industries or anything involving sensitive data, this is the first requirement that needs to be met, not an afterthought.

4. Payment model

There are three main payment models:

Subscription gives you predictable costs but unpredictable performance. Providers have a perverse incentive to quietly degrade quality during peak periods, either by serving a quantized model, or by changing the default thinking budget. Subscription pricing is also heavily subsidised right now. When that subsidy has to give way to sustainable unit economics, the price will look very different.

Per-request pricing, as used in GitHub Copilot, is conceptually tidy but practically broken. Requests vary enormously in complexity. Pricing them uniformly means either the provider loses money on hard tasks or you overpay on easy ones. I don’t see this surviving long-term.

Token-based pricing — you pay for exactly what you use — is the most transparent and the most portable. It gives you access to any harness, any model, through APIs or aggregators like OpenRouter. It’s also the most expensive one (Uber went through their whole budget for 2026 in just Q1), but many companies find that it still gives them a very good ROI.

The practical advice

Don’t make a five-year platform decision in a market that changes every five months. Run experiments across different teams, don’t sign contracts that are hard to exit, and plan explicitly to revisit the decision every six months. Build that review cadence into the rollout, not as an afterthought.

Next issue I’ll cover the questions this one doesn’t answer: security and legal vetting, the lock-in risk in more depth, and when local models actually make sense for enterprise teams.

Is your team navigating any of these decisions right now? Hit reply — I’m curious what’s causing the most friction.