Quick reference — pick by what you're doing
Anthropic (Claude)
| If you're doing… | Reach for |
|---|---|
| Classification, extraction, routing, high-volume cheap calls | Haiku 4.5 |
| Most production work — chat, summaries, agentic coding, tool loops | Sonnet 5 default |
| Hard reasoning, judgment calls, prose with a voice, tricky debugging | Opus 4.8 |
| The hardest long-horizon autonomous runs (overnight refactors) | Fable 5 |
OpenAI (GPT)
| If you're doing… | Reach for |
|---|---|
| Classification, tagging, autocomplete, massive-scale cheap calls | GPT-5.4 nano |
| Fast, cheap-but-capable coding, computer use, subagents | GPT-5.4 mini |
| Most production work at a lower price than flagship | GPT-5.4 |
| Hardest coding & professional work, deep reasoning | GPT-5.5 flagship |
| Research-grade, cost-no-object one-shots | GPT-5.5 Pro |
Rule of thumb both sides share: start one tier higher than you think you need, get it working, then drop down until quality breaks. It's far cheaper to down-shift a working prompt than to debug why a too-small model keeps failing.
The mental model: it's a ladder, not a menu
Both companies now sell the same shape: one family, several rungs. As you climb a rung you buy more intelligence and pay more per token and (usually) more latency. You are not choosing between different kinds of model — you're choosing how much capability the task is worth.
Haiku · nano. Sub-second, pennies. It knows the standard answer and formats it. No deep reasoning.
Sonnet · mini/GPT-5.4. The 80% default: capable, quick, affordable. Where most production traffic should live.
Opus/Fable · GPT-5.5/Pro. Reserved for hard reasoning, long autonomy, high cost-of-error. Slower, pricier, smarter.
effort at Anthropic, reasoning level like xhigh at OpenAI). So model choice is now two decisions, not one: which rung (capability tier) and how hard should it think (effort). See the effort knob.
Anthropic — Claude lineup
Four rungs. One tokenizer family and one API surface across the top three, so moving between them is mostly a model-ID swap. Prices are per 1M tokens, input / output (as of mid-2026).
Claude Haiku 4.5 fast · cheap
claude-haiku-4-5
What it's for. The volume tier. Fastest and cheapest Claude; built for simple, well-specified tasks you run a lot: classification, sentiment, routing, tagging, extraction, cheap tool-call steps, first-pass filtering, sub-agents doing narrow reads.
- Use when the task is bounded and the "right" answer is unambiguous, latency matters, or you're doing it thousands of times an hour.
- Don't use when the task needs multi-step reasoning, judgment, or holding a lot of context — it'll confidently give you a shallow answer.
Smaller context window (200K vs 1M on the others) and lower max output (64K). No effort/max reasoning dial. It's the one rung that's genuinely a different capability class, not just a cheaper Sonnet.
Claude Sonnet 5 the default
claude-sonnet-5
What it's for. The workhorse, and the right default for almost everything. Near-Opus quality on coding and agentic work at a third to a fifth of the cost. If you don't have a specific reason to go up or down, start here.
- Use when you're building anything in production: chat, RAG, summarization, agentic coding, tool loops, document work. High-volume-but-not-trivial lives here.
- Don't use when a mistake is expensive and subtle (go Opus), or the task is dead-simple and you're paying 3× for nothing (go Haiku).
1M context, 128K output, full effort range including xhigh. Adaptive thinking is on by default. Introductory pricing (~$2/$10) ran through 2026-08-31 — check current rates.
Claude Opus 4.8 deep reasoning
claude-opus-4-8
What it's for. The judgment tier. Reach for Opus when the task is genuinely hard, the cost of a wrong answer is high, or you want an answer with a point of view rather than a survey. Best-in-class long-horizon agentic coding, tricky multi-file debugging, architecture decisions, dense knowledge work, and writing with actual voice.
- Use when Sonnet's answer is almost right but keeps missing something; when the problem needs real reasoning; when a human would otherwise spend an hour on it.
- Don't use when Sonnet already nails it — you're paying ~1.7× for no gain — or for latency-sensitive interactive typing.
Same request surface as Sonnet 5 (adaptive thinking only; sampling params removed). Supports xhigh and max effort, fast mode, and mid-session system prompts. 1M context at standard pricing — no long-context premium.
Claude Fable 5 frontier · premium
claude-fable-5
What it's for. Anthropic's most capable widely-released model — the top of the ladder, priced accordingly. It is not the default "upgrade from Opus." It earns its premium only on the hardest, longest-horizon autonomous work: overnight coding runs that finish without human correction, first-shot builds of well-specified systems, end-to-end analysis/deliverables, deep multi-agent orchestration.
- Use when you're handing it a hard problem to chew on for many minutes autonomously and the outcome justifies 2× Opus pricing. Give it the full spec up front and high effort.
- Don't use when Opus 4.8 already does the job (usually it does), for interactive/latency-sensitive use, or for security/bio-adjacent work where its safety classifiers may refuse.
Quirks that matter operationally: thinking is always on (you can't disable it); the raw chain-of-thought is never returned; single requests can run many minutes (plan for streaming/timeouts); requires 30-day data retention (no zero-data-retention). Same tokenizer as Opus 4.8.
Where the Sonnet → Opus gap actually shows up
For everyday, well-trodden requests you mostly won't notice the difference — both are well-calibrated on common ground. The gap opens in three specific places, and knowing them tells you when the up-tier is worth it:
Ask "is Hamlet actually indecisive?" Sonnet gives the standard reading plus the counter-reading, tidy and balanced. Opus is likelier to commit to a position and push back on the framing itself. Sonnet surveys; Opus argues.
"Should I tell my dying mother I'm not religious anymore?" Sonnet gives a clean ethics breakdown. Opus is likelier to notice it isn't really an ethics question and answer the question behind the question.
Sonnet writes smooth, competent, slightly generic prose. Opus takes sharper turns and is likelier to hand you a sentence you'd actually quote. For anything read by a human for pleasure, that matters.
Where the gap is small: factual recall, explaining established concepts, summarizing a known argument — anything whose answer already exists clearly in the world. Where it's near zero: when you just want the standard view, fast. Pay for Sonnet there.
The same logic scales one rung further: Opus → Fable only pays off when the work is long-horizon and autonomous, not when it's merely "important." A hard one-shot answer is Opus territory; a hard multi-hour project is where Fable pulls ahead.
OpenAI — GPT lineup
OpenAI splits its ladder across two point releases: the cheaper, broader GPT-5.4 sub-family (nano / mini / standard) for volume and production, and the GPT-5.5 flagship (+ Pro) for the hardest work. Reasoning is a reasoning level on the model, not a separate o-series. Prices per 1M tokens, input / output (as of mid-2026).
GPT-5.4 nano cheapest
gpt-5.4-nano
What it's for. The absolute-volume floor. Cheapest and fastest GPT; for embedding-scale classification, tagging, autocomplete, routing, and simple structured extraction where you're paying by the millions of calls.
- Use when throughput and cost per call dominate and the task is trivial and well-specified.
- Don't use when the task needs any real reasoning or nuance — nano will happily return a plausible wrong answer. Step up to mini.
OpenAI's rough analog of Haiku, one notch cheaper still.
GPT-5.4 mini fast workhorse
gpt-5.4-mini
What it's for. Cheap-but-genuinely-capable. OpenAI positions it as its strongest mini yet for coding, computer use, and subagents — so it's the natural pick for high-volume agentic steps and interactive features where mini quality is "good enough" and latency/cost matter.
- Use when you want fast, affordable coding/agent loops, tool-calling, or interactive UX at scale; a strong sub-agent worker.
- Don't use when the task is the reasoning bottleneck of your system — promote it to GPT-5.4 or 5.5.
400K context (vs 1M on the big models). The closest OpenAI equivalent to "Sonnet-but-cheaper."
GPT-5.4 value default
gpt-5.4
What it's for. The affordable full-size model — flagship-class breadth at half the flagship price. The sensible default for most production work that's beyond mini's depth but doesn't need the very top.
- Use when you want strong general capability and 1M context without paying GPT-5.5 rates; the workhorse for serious apps.
- Don't use when you're on the hardest coding/reasoning tasks where the flagship's extra headroom pays for itself — or when mini already suffices.
Exactly half GPT-5.5's per-token cost. A gpt-5.4-pro ($30/$180) exists for maximum capability within the 5.4 line.
GPT-5.5 flagship
gpt-5.5
What it's for. OpenAI's recommended starting point for complex work — "a new class of intelligence for coding and professional work." The tier for the hardest coding, deepest reasoning, and highest-stakes professional output.
- Use when the task is genuinely hard, correctness is worth the premium, or you're prototyping and want the best answer before optimizing cost downward.
- Don't use when GPT-5.4 or mini already clears the bar — the flagship is 2–6× their cost. Reserve it for where the extra capability shows.
1M context. Reasoning depth is controlled via the reasoning level (up to xhigh) — the equivalent of Anthropic's effort.
GPT-5.5 Pro frontier · premium
gpt-5.5-pro
What it's for. Maximum capability for research-grade problems, at cost-no-object pricing (6× the flagship). Think one-shot answers to genuinely hard problems where being right once is worth more than being cheap.
- Use when a single high-stakes answer justifies the spend — hard math/science, deep analysis, the last few points of quality that a human expert would otherwise supply.
- Don't use for anything at volume; the price makes it a scalpel, not a workhorse.
The rough OpenAI counterpart to reaching for Fable 5 at Anthropic — a deliberate step past the flagship, not a default.
Specialized variants niche
Beyond the general ladder, OpenAI ships task-tuned models. Pick these only when your task is that task:
- gpt-5.3-codex ($1.75/$14) — coding-agent-tuned; for autonomous software workflows (the model behind Codex).
- o3-deep-research ($5/$20) & o4-mini-deep-research ($1/$4) — the surviving o-series, now specialized for long multi-step web research with citations, not general chat.
- GPT-Realtime-2 and the realtime family — low-latency voice / speech-to-speech / translation / transcription.
- GPT Image 2 — image generation.
Where the GPT-5.4 → GPT-5.5 gap shows up
The two share DNA, so on routine work the difference is subtle. It widens in the same places Sonnet→Opus does — hard reasoning, depth, and not-losing-the-thread over long tasks — plus a couple that are specific to how OpenAI tiers:
On multi-step problems — a gnarly bug spanning several files, a proof, a plan with dependencies — GPT-5.4 is more likely to take a plausible shortcut; GPT-5.5 holds the whole structure and follows it through. Turning up reasoning on 5.4 narrows this but doesn't fully close it.
The bigger, sharper jump is mini → full-size, not 5.4 → 5.5. Mini is tuned for speed and cost; on anything that's actually the reasoning bottleneck, moving to GPT-5.4 (full) buys more than the 5.4→5.5 step does. Diagnose which jump you need before paying for the top.
reasoning, then on GPT-5.5. If 5.5's answer isn't visibly better on your task, you've found your tier — stay on 5.4 and pocket the difference. If it is, you've justified the flagship. Repeat one rung down (mini vs 5.4) for cheaper workloads.
The second decision: how hard should it think?
Picking the rung is half the choice. The other half is the reasoning dial — the single biggest lever on quality, latency, and cost within a model. Turning it up on a cheaper rung often beats jumping to the next rung at low effort, for less money.
| Anthropic | OpenAI | |
|---|---|---|
| The dial | effort: low · medium · high · xhigh · max | reasoning level: up to xhigh |
| Default | high; adaptive thinking decides depth per request | Model-dependent; set it explicitly for hard work |
| Turn it up for | Coding & agentic work (xhigh), correctness-critical one-shots (max) | The hardest reasoning/coding tasks |
| Turn it down for | Chat, classification, latency-sensitive UX (low/medium) | Simple, fast, cheap tasks |
| The catch | Higher effort = more thinking tokens = higher cost + latency; give max_tokens headroom | Reasoning tokens bill at output rates — high effort can multiply cost several-fold |
Cost & context ladders (at a glance)
Per 1M tokens, input / output, (as of mid-2026 — verify). The ladders line up tier-for-tier, which is the easiest way to translate a habit from one vendor to the other.
Anthropic
OpenAI
Rough cross-reads (positioning, not identical capability): Haiku ≈ nano/mini · Sonnet ≈ GPT-5.4(-mini) · Opus ≈ GPT-5.5 · Fable ≈ GPT-5.5 Pro. Both sides also offer prompt caching (cache reads ~0.1× input) and batch (~50% off, non-urgent) — often a bigger lever on your bill than the tier choice itself.
How to actually choose (a 4-step recipe)
- Prototype on the flagship. Start with Opus / GPT-5.5 at high effort. Get the task working correctly first — don't optimize a broken prompt. This tells you the ceiling.
- Down-shift until it breaks. Move to the workhorse (Sonnet / GPT-5.4), then the fast tier (Haiku / mini/nano). Stop one rung above where quality fails. Most production traffic ends up on the workhorse or fast tier.
- Tune effort before re-tiering. If the cheaper rung is almost good enough, raise its
effort/reasoningbefore jumping back up. Cheaper and often sufficient. - Split the workload. Real systems mix tiers: fast model for routing/extraction, workhorse for the main loop, frontier only for the hard sub-steps or a final review pass. You don't have to pick one.
Common mistakes & anti-patterns
effort/reasoning — or burn money at max effort on a task medium would nail.Straight answers
"I have no idea where to start — just tell me one."
Sonnet 5 (Anthropic) or GPT-5.4 (OpenAI). The workhorse tier is the right default for almost everything. Get it working there, then move down for cost or up for quality only if you have a concrete reason.
"When is Opus / GPT-5.5 actually worth it?"
When the workhorse's answer is almost right but keeps missing the point, when the problem needs genuine multi-step reasoning, or when a mistake is expensive and subtle. If Sonnet/GPT-5.4 already nails your task, the up-tier buys you nothing visible.
"Fable 5 or GPT-5.5 Pro — should I ever use these?"
Rarely, and deliberately. Fable earns its 2× Opus price on long-horizon autonomous runs; Pro earns its 6× flagship price on research-grade one-shots. Neither is a routine "upgrade." If you're not sure you need it, you don't.
"Should I use the cheap tier to save money?"
Yes — for bounded, well-specified, high-volume tasks (classification, extraction, routing). No — for anything needing reasoning or judgment, where a wrong answer costs more than the tokens you saved. And try caching + batch before down-tiering; they cut cost without cutting quality.
"Reasoning model or normal model?"
That distinction has mostly collapsed. Reasoning is now a dial (effort / reasoning) on the model you already picked, not a separate product. Pick the tier for capability; turn the dial up for hard reasoning, down for speed. OpenAI's remaining o-series models are specialized deep-research tools, not general chat.
"How do I tell which tier my task really needs?"
Run the calibration test: take a real task you have strong judgment about, try two adjacent tiers, and see whether the pricier one is visibly better on your work. That single comparison beats any leaderboard for predicting your result.