Claude Haiku costs roughly 3.5-4x less than Sonnet per token. For most production AI systems, the right answer is not "which model?" but "which tasks route to which tier?"
- Haiku wins for classification, routing, extraction, simple summarization, and schema parsing where outputs are short and verifiable.
- Sonnet wins for multi-step reasoning, code generation, nuanced judgment, long-form drafting, and any task where a Haiku error causes downstream harm.
- The mistake is using Sonnet for everything because it feels safer. A system running 1,000 calls/day on Sonnet costs roughly twice as much per month as one that routes 70% of those calls to Haiku with a Sonnet fallback.
The cost matrix and routing pattern are below.
The model tiers at a glance
Anthropic's model lineup runs across three practical tiers for production use: Haiku (fast, cheap, good for simple tasks), Sonnet (mid-tier, strong general reasoning), and Opus (expensive, reserved for the hardest problems). This post focuses on the Haiku vs Sonnet decision because that's where most cost optimization lives for teams building AI-powered products.
The pricing as of mid-2026 (check Anthropic's pricing page for current rates, as these change):
- Claude Haiku 4.5: approximately $0.80 per million input tokens, $4 per million output tokens
- Claude Sonnet 4.6: approximately $3 per million input tokens, $15 per million output tokens
That 3.5-4x cost gap is the entire business case for tiered routing. But the gap only matters if you can actually route the right tasks to the cheaper model without losing quality where it matters.
The cost matrix by task type
Here is how eight common AI task types map to model tiers. These are based on running both models on production-scale workloads, not benchmarks.
| Task type | Recommended tier | Why | When to upgrade |
|---|---|---|---|
| Intent classification / routing | Haiku | Prompt is short, output is a category label, few-shot examples work well | If you have >15 categories or categories that overlap semantically |
| Entity extraction | Haiku | Structured output from constrained fields; schema validation catches errors cheaply | If the source text is ambiguous or requires domain knowledge to parse |
| Simple summarization (bullet points, headlines) | Haiku | Short outputs, easy to verify quality, high volume use case | If summary accuracy directly affects a business decision downstream |
| Multi-document synthesis / research summary | Sonnet | Requires holding context across sources, making judgment calls on conflicts | N/A -- Sonnet is usually the floor here |
| Code generation (under 100 lines) | Either | Haiku handles boilerplate and simple functions well; Sonnet needed for complex logic or unfamiliar APIs | Any function with non-obvious edge cases, or code that touches payment/auth/data integrity |
| Code review / debugging | Sonnet | Requires understanding intent, not just syntax; missed bugs are expensive | Use Opus only for security-critical reviews |
| Long-form content drafting (1,000+ words) | Sonnet | Quality degrades noticeably with Haiku at longer outputs; structure and tone drift | N/A |
| Multi-step planning / agent reasoning | Sonnet | Requires maintaining a coherent plan across steps, recovering from dead ends | Use Opus for plans with irreversible consequences (spend, send, delete) |
What this looks like in real cost terms
Assume a system making 1,000 API calls per day, averaging 500 input tokens and 200 output tokens per call. That's 500K input tokens and 200K output tokens daily, or roughly 15M input and 6M output tokens per month.
Monthly cost at current pricing:
- All Haiku: (15M * $0.80/M) + (6M * $4.00/M) = $12 + $24 = $36/mo
- All Sonnet: (15M * $3.00/M) + (6M * $15.00/M) = $45 + $90 = $135/mo
- 70% Haiku / 30% Sonnet (mixed routing): (0.7 * $36) + (0.3 * $135) = $25.20 + $40.50, or roughly $66/mo
For a small SaaS doing 1,000 AI calls/day, the difference between "use Sonnet everywhere" and "use mixed routing" is about $69/mo, or roughly $830/year. At 10,000 calls/day -- which is not unusual for a mid-sized product -- that difference scales to roughly $8,300/year.
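The arithmetic above is easy to rerun against your own volume. This is a minimal sketch, assuming the approximate per-million-token prices quoted earlier and a 30-day month; `PRICES`, `monthly_cost`, and `blended_cost` are illustrative names, not part of any SDK.

```python
# USD per million tokens, copied from the approximate figures quoted above.
# Treat these as assumptions and check the pricing page before relying on them.
PRICES = {
    "haiku": {"input": 0.80, "output": 4.00},
    "sonnet": {"input": 3.00, "output": 15.00},
}


def monthly_cost(tier, calls_per_day, input_tokens=500, output_tokens=200, days=30):
    """Monthly USD cost of sending every call to a single tier."""
    price = PRICES[tier]
    input_millions = calls_per_day * input_tokens * days / 1_000_000
    output_millions = calls_per_day * output_tokens * days / 1_000_000
    return input_millions * price["input"] + output_millions * price["output"]


def blended_cost(calls_per_day, haiku_share, **kwargs):
    """Monthly USD cost when `haiku_share` of calls go to Haiku, the rest to Sonnet."""
    haiku = monthly_cost("haiku", calls_per_day * haiku_share, **kwargs)
    sonnet = monthly_cost("sonnet", calls_per_day * (1 - haiku_share), **kwargs)
    return haiku + sonnet
```

With 1,000 calls/day, `monthly_cost("haiku", 1000)` comes to $36, `monthly_cost("sonnet", 1000)` to $135, and `blended_cost(1000, 0.7)` to about $65.70. Swap in your real token averages, since output-heavy workloads shift the totals quickly.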
The mixed routing saves money only if Haiku's quality is good enough for the tasks it handles. That's the real engineering question.
"We were using Sonnet for literally everything because no one wanted to be responsible for a quality regression. Turned out 65% of our calls were simple classification and entity extraction. Switching those to Haiku saved $2,100/month with zero measurable quality change." -- SaaS engineering lead on r/MachineLearning
The routing architecture
The simplest mixed-model architecture uses a task type registry at the application layer. You define the model tier for each task type once, and the call routing is automatic. Here is a minimal Python pattern:
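```python
import anthropic

# Map each task type to a model ID. Moving an entry re-tiers that task.
MODEL_REGISTRY = {
    "classify_intent": "claude-haiku-4-5-20251001",
    "extract_entities": "claude-haiku-4-5-20251001",
    "summarize_short": "claude-haiku-4-5-20251001",
    "draft_email": "claude-sonnet-4-6",
    "review_code": "claude-sonnet-4-6",
    "plan_agent_steps": "claude-sonnet-4-6",
}

DEFAULT_MODEL = "claude-sonnet-4-6"  # Sonnet as safe default

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment


def call_ai(task_type: str, prompt: str, system: str = "") -> str:
    """Send `prompt` to the model registered for `task_type` and return its text."""
    model = MODEL_REGISTRY.get(task_type, DEFAULT_MODEL)
    request = {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }
    if system:  # only send a system prompt when one was provided
        request["system"] = system
    response = client.messages.create(**request)
    return response.content[0].text
```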
This approach has a few important properties:
- Sonnet is the safe default. Any task type not in the registry uses Sonnet. You never accidentally downgrade a task type that needs Sonnet.
- Task types are explicit. Callers must name the task type. This forces developers to think about what the call is actually doing before it gets made.
- The registry is the cost model. If you want to reduce spend, you look at the registry and ask which Sonnet entries can safely move to Haiku. The decision is explicit and reviewable.
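The registry alone does not cover the "Haiku with a Sonnet fallback" case mentioned at the top. One way to sketch it: try the cheaper model, validate the output, and escalate only when validation fails. Everything here is an assumption for illustration: `ALLOWED_INTENTS` is a hypothetical label set, and `call_model(model, prompt)` stands in for whatever wrapper you already have around the API call.

```python
from typing import Callable

# Hypothetical label set for an intent classifier; replace with your own.
ALLOWED_INTENTS = {"billing", "support", "sales", "other"}


def classify_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], str],
    cheap_model: str = "claude-haiku-4-5-20251001",
    strong_model: str = "claude-sonnet-4-6",
) -> tuple[str, str]:
    """Try the cheap model first; escalate if its answer fails validation.

    `call_model(model, prompt)` is an assumed wrapper that returns the
    model's text. Returns a (label, model_used) pair.
    """
    label = call_model(cheap_model, prompt).strip().lower()
    if label in ALLOWED_INTENTS:
        return label, cheap_model

    # The cheap model answered outside the allowed set: retry on the stronger tier.
    label = call_model(strong_model, prompt).strip().lower()
    if label not in ALLOWED_INTENTS:
        raise ValueError(f"Unrecognized intent from both tiers: {label!r}")
    return label, strong_model
```

Note the limit of this pattern: validation catches malformed outputs, not well-formed wrong ones. A valid but incorrect label never triggers the fallback, which is why the sampling and monitoring steps later in this post still matter.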
How to decide whether a task belongs on Haiku or Sonnet
Three questions. If all three are yes, Haiku is likely sufficient. If any is no, start on Sonnet.
1. Is the output short and schema-constrained?
A task that returns a category label, a boolean, a list of up to 5 extracted values, or a structured JSON object with a known schema is a good Haiku candidate. A task that returns a multi-paragraph response, open-ended analysis, or code is a weaker fit.
2. Is a wrong answer cheap to recover from?
Classification errors that get caught downstream by a rule-based validator are cheap. Classification errors that send the wrong email to a customer or delete the wrong record are expensive. Route high-stakes tasks to Sonnet regardless of output length.
3. Can the task be completed without multi-step reasoning?
If the correct answer requires the model to hold a chain of reasoning steps in working memory -- "given X, and knowing Y from earlier in the conversation, and accounting for exception Z" -- that's a Sonnet task. Haiku produces shorter, more direct responses and loses coherence on longer reasoning chains.
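The checklist can be written down as a small function so the tiering decision is recorded next to the task definition. This is a sketch, not a rule engine: `TaskProfile` and `recommend_tier` are hypothetical names, and the three flags are judgments you still have to make yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Answers to the three routing questions for one task type."""
    short_constrained_output: bool   # label, boolean, small list, known JSON schema
    cheap_to_recover: bool           # a wrong answer is caught or easily undone
    needs_multistep_reasoning: bool  # answer depends on a chain of reasoning steps


def recommend_tier(task: TaskProfile) -> str:
    """Return "haiku" only when all three checks favor it; otherwise "sonnet"."""
    if (
        task.short_constrained_output
        and task.cheap_to_recover
        and not task.needs_multistep_reasoning
    ):
        return "haiku"
    return "sonnet"
```

For example, an intent classifier with a downstream validator would be `TaskProfile(True, True, False)` and lands on Haiku, while a classifier that triggers customer emails fails the second check and stays on Sonnet.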
Practical verification: how to know if Haiku is underperforming
The test before moving a task type to Haiku is not "does it produce any output?" It's "what is the failure rate, and what does a failure cost?"
The most reliable verification pattern:
- Run the task type on both Haiku and Sonnet against a representative sample of 100-200 real inputs from your production workload.
- Evaluate the outputs -- either manually (for high-stakes tasks) or with a lightweight Haiku grader (for lower-stakes tasks).
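- If Haiku's error rate on the sample is below a threshold you set in advance (5-8% is a reasonable ceiling for low-stakes tasks) and close to Sonnet's, move the task type to Haiku in the registry.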
- Set a monitor on the task type's downstream outputs. If error rates climb after the switch, revert the registry entry -- it takes one line of code.
This process sounds slow for a 1-line registry change, but running it once per task type is faster than debugging subtle quality regressions in production.
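The comparison step can be sketched in a few lines, assuming you already have a pass/fail grade for each output from both models on the same sample. The function names and the 5% default ceiling are illustrative assumptions; pick a ceiling that matches what a failure costs you.

```python
def error_rate(graded: list[bool]) -> float:
    """Fraction of outputs graded incorrect; graded[i] is True when output i was correct."""
    if not graded:
        raise ValueError("need at least one graded output")
    return graded.count(False) / len(graded)


def compare_tiers(haiku_graded: list[bool], sonnet_graded: list[bool],
                  max_error: float = 0.05) -> dict:
    """Summarize both error rates and whether Haiku is under the chosen ceiling."""
    haiku_error = error_rate(haiku_graded)
    sonnet_error = error_rate(sonnet_graded)
    return {
        "haiku_error": haiku_error,
        "sonnet_error": sonnet_error,  # baseline: Sonnet is not error-free either
        "move_to_haiku": haiku_error < max_error,
    }
```

Reporting Sonnet's error rate alongside Haiku's keeps the comparison honest: if Sonnet also fails 4% of the sample, a 5% Haiku rate is a much smaller regression than the raw number suggests.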
Where most teams get this wrong
The two failure modes are opposite each other.
Too conservative: Using Sonnet for everything because it feels safer. The cost accumulates slowly and is invisible until someone looks at the API bill three months in. By then, the "we just use Sonnet" decision is hard-coded into call sites across the codebase.
Too aggressive: Moving tasks to Haiku without verifying quality, then getting surprised when downstream logic breaks. The errors are usually silent -- Haiku returns something that looks like a valid response but is wrong in a way that only surfaces later.
The fix for both is the same: make model selection explicit (the task type registry), test before switching, and monitor after.
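The "monitor after" step can be as simple as a rolling error rate per task type. The sketch below assumes you have some downstream signal that marks each call as good or bad (a schema validator, a user correction, a spot check); `ErrorRateMonitor` and its default thresholds are hypothetical and only illustrate the idea.

```python
from collections import deque


class ErrorRateMonitor:
    """Rolling error rate over the most recent `window` outcomes for one task type."""

    def __init__(self, window: int = 200, threshold: float = 0.08, min_samples: int = 50):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.threshold = threshold
        self.min_samples = min_samples

    def record(self, ok: bool) -> None:
        """Record whether one call's output passed the downstream check."""
        self.outcomes.append(ok)

    def error_rate(self) -> float:
        """Fraction of failures in the current window (0.0 when empty)."""
        if not self.outcomes:
            return 0.0
        return sum(1 for ok in self.outcomes if not ok) / len(self.outcomes)

    def should_revert(self) -> bool:
        """True once there is enough data and the error rate exceeds the threshold."""
        return len(self.outcomes) >= self.min_samples and self.error_rate() > self.threshold
```

When `should_revert()` fires for a task type, the response is the one-line registry change back to Sonnet, followed by a look at which inputs failed.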
One thing the cost matrix doesn't capture
Token costs are not the only cost in a production AI system. There is also the cost of latency (Haiku is significantly faster than Sonnet, which matters for user-facing calls), the cost of retry logic when a model underperforms, and the cost of prompting complexity (Sonnet often requires less prompt engineering to achieve the same result).
For user-facing features where latency is visible, Haiku's speed advantage can justify its use even when Sonnet would produce marginally better outputs. For background batch jobs where latency doesn't matter, the decision comes down to token cost versus output quality.
If you're building a system where AI costs are starting to affect unit economics, run the cost matrix calculation against your actual call volume before making any architectural changes. The numbers usually make the routing decision obvious.