Question 1

Won't this change in six months?

Accepted Answer

Yes. The routing table is a snapshot, not a commitment. Qwen3-ASR might lose to a future Whisper version. Gemini Image might get expensive. A new model might appear that does extraction and translation in one pass better than the split pipeline. The discipline is re-evaluating when the task shape or the cost changes, not loyally sticking with a provider. I treat model choices the same way I treat library choices — use the best one today, be ready to swap tomorrow.
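One way to keep the table a snapshot rather than a commitment is to hold it as plain data, so re-evaluating a row is a one-line change. The sketch below is illustrative only: the task keys, `model_for`, and `with_swap` are hypothetical names, and the model labels are copied from the routing table.

```python
# Hypothetical task keys; model labels mirror the article's routing table.
ROUTES = {
    "speech_to_text": "Qwen3-ASR",
    "translation": "DeepSeek",
    "recipe_parsing": "Gemini 2.5 Flash",
    "text_heavy_images": "GPT Image 2",
}


def model_for(task: str, routes: dict = ROUTES) -> str:
    """Look up the current winner for a task; fail loudly on unknown tasks."""
    try:
        return routes[task]
    except KeyError:
        raise ValueError(f"no route for task {task!r}") from None


def with_swap(routes: dict, task: str, new_model: str) -> dict:
    """Return a copy of the table with one row replaced after a re-evaluation."""
    if task not in routes:
        raise ValueError(f"no route for task {task!r}")
    return {**routes, task: new_model}
```

Call sites ask for a task, never a provider, so swapping a row (for example `with_swap(ROUTES, "speech_to_text", "some-future-model")`) leaves the rest of the code untouched.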

Question 2

What about audio generation models like Lyria?

Accepted Answer

I haven't shipped anything with audio generation yet. When I do, it'll get a slot in the table. I don't have opinions on models I haven't used in production. The whole point of this piece is that the routing comes from real usage, not speculation.

Question 3

Within Claude, which model do you use for planning vs execution?

Accepted Answer

The “Claude for code” row is really two jobs that want different models. Planning and orchestrating (reading a whole repo, deciding the approach, dispatching subagents) rewards the strongest reasoning model. The actual edits are mostly mechanical once the plan is set, so a cheaper, faster model finishes them without losing much. Claude Code bakes this in with the opusplan setting: Opus drives plan mode, then Claude Code switches to Sonnet for execution. I leave that on for the projects where I'm the bottleneck on architecture and let the cheaper tier do the typing. The split is the same idea as the rest of this table: match the model to the exact sub-task, not to the project.

Question 4

Opus or Sonnet for everyday coding?

Accepted Answer

Sonnet is the default I reach for, and Opus is the exception, not the other way round. Anthropic's own guidance lines up with how I route it: Sonnet 4.6 is “the best combination of speed and intelligence” and costs $3/$15 per million input/output tokens, while Opus 4.8 is “the most capable Opus-tier model for complex reasoning and agentic coding” at $5/$25. The Claude Code docs put it plainly: “Sonnet handles most coding tasks well and costs less than Opus. Reserve Opus for complex architectural decisions or multi-step reasoning.” That matches my usage exactly. Most of the work across drafty.im and journeys.im is well-scoped edits where Sonnet is faster and I never feel the gap; I switch up to Opus only when the task is “figure out the approach,” not “make the change.” It's the same routing logic as the rest of this table: stay on the cheaper model until the task actually demands the expensive one.
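To make the price gap concrete, here is a minimal sketch of the per-session arithmetic, reading $3/$15 and $5/$25 as input/output prices per million tokens. The function name and the token counts in the usage note are hypothetical; only the four prices come from the text above.

```python
# (input, output) prices in USD per million tokens, as quoted above.
PRICES_PER_MTOK = {
    "sonnet": (3.00, 15.00),
    "opus": (5.00, 25.00),
}


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the API cost in USD of one session for the given model."""
    in_price, out_price = PRICES_PER_MTOK[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000
```

For an assumed session of 200,000 input tokens and 20,000 output tokens, this works out to about $0.90 on Sonnet and $1.50 on Opus. Because both prices scale by the same factor, Opus costs roughly 1.67 times as much for any token mix.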

Question 5

How do you switch models without restarting?

Accepted Answer

In Claude Code, /model switches mid-session and /config sets the default, so the choice is per-task, not a setup decision you make once. For a subagent doing something mechanical, you can pin a cheaper tier in its config rather than letting it inherit the main model: the docs suggest model: haiku for simple subagent tasks, and Sonnet for agent-team members where you want capable coordination without paying Opus rates on every teammate. The principle is the same one this whole article is built on: the model is a per-job decision, and the tooling makes it cheap to change your mind.
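The per-job rule can be written down as a small lookup. This is a sketch of the decision, not of Claude Code's own configuration format: the role names and `tier_for` are hypothetical, while the tier names (haiku, sonnet, opus) are the ones discussed above.

```python
# Hypothetical role labels mapped to the tiers the text recommends.
TIER_BY_ROLE = {
    "mechanical_subagent": "haiku",  # simple, well-scoped subagent tasks
    "team_member": "sonnet",         # capable coordination without Opus rates
    "architecture": "opus",          # "figure out the approach" work
}


def tier_for(role: str, default: str = "sonnet") -> str:
    """Pick a tier for a role, falling back to the everyday default."""
    return TIER_BY_ROLE.get(role, default)
```

Unknown roles fall back to Sonnet, which matches the default-and-exception routing described in the previous answer.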

Question 6

On a subscription plan, does Opus vs Sonnet still matter if I'm not paying per token?

Accepted Answer

It matters more, not less. On a Pro or Max plan the cost isn't dollars per token — the docs are explicit that “usage is included in your subscription, so the session cost figure isn't relevant for billing.” What you're spending instead is your plan's usage allowance, and Opus draws it down far faster than Sonnet. Run a few heavy Opus sessions and you can hit the limit before the window resets — /usage shows the bars, and you can press w to see the last seven days against your limit. So the routing rule from the rest of this table doesn't go away when you swap an API key for a subscription — it just changes currency. Sonnet stays the default because it stretches the allowance; Opus is the spend you reserve for the architecture call where the stronger reasoning actually earns the bigger bite out of your week.

Task	Model	Approx. cost	Why it won
Code generation and orchestration	Claude (Opus / Sonnet)	varies	Best at holding large codebases in context, following project conventions, multi-file edits
Speech-to-text (JA / KO / ZH)	Qwen3-ASR via DashScope	~$0.007/min	Handles dialect, mixed registers, overlapping speakers, code-switching better than Whisper
Translation (JA / KO / ZH to EN)	DeepSeek	~$0.004/1K chars	Domain-aware with prompting, cost-effective for subtitle-length text
Recipe parsing	Gemini 2.5 Flash	~$0.002/recipe	Fast structured extraction at consumer scale, handles messy HTML reliably
Abstract / painterly images	Gemini 2.5 Flash Image	~$0.008/call	Visually indistinguishable from GPT Image 2 for watercolor style, 16x cheaper
Logos, text-heavy images	GPT Image 2	~$0.13/call	Only model that reliably renders precise text inside images
Structured web extraction	Gemini Flash (self-hosted)	~$0.002/page	Same model Firecrawl runs behind the scenes, 71% cheaper than Firecrawl when self-hosted
Landing illustrations	Gemini 2.5 Flash Image	~$0.04/call	Higher-fidelity prompts for marketing assets, still a fraction of GPT Image 2's per-call cost
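The per-unit prices in the table combine into per-job estimates with simple arithmetic. The sketch below assumes a subtitle job is one speech-to-text pass followed by one translation pass; the function names and the job sizes in the usage note are hypothetical, and the rates are the approximate figures from the table.

```python
# Approximate rates in USD, copied from the routing table.
RATES = {
    "speech_to_text_per_min": 0.007,
    "translation_per_1k_chars": 0.004,
    "painterly_image_per_call": 0.008,
    "text_image_per_call": 0.13,
}


def subtitle_cost(minutes: float, chars: int) -> float:
    """Estimate the cost of transcribing audio, then translating the transcript."""
    stt = minutes * RATES["speech_to_text_per_min"]
    translation = (chars / 1000) * RATES["translation_per_1k_chars"]
    return stt + translation


def image_price_ratio() -> float:
    """How many times more GPT Image 2 costs per call than Gemini 2.5 Flash Image."""
    return RATES["text_image_per_call"] / RATES["painterly_image_per_call"]
```

For an assumed 45-minute recording that yields a 20,000-character transcript, `subtitle_cost(45, 20_000)` comes to about $0.40. `image_price_ratio()` gives 16.25, which is where the table's "16x cheaper" comes from.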

Pick the model for the job

The routing table

The image generation shootout

The extraction arbitrage

The speech-to-text decision

The meta-pattern

Frequently asked