Intelligence per Dollar

How much thinking can you buy — and what quality — for a fixed budget?

The Question

AI tasks span 5 difficulty tiers, from "set a timer" to "prove a theorem." A $1,000 Mac Mini (amortized to $42/month) gives you unlimited easy thinking but can't touch hard problems. API models can solve harder problems but charge per token. When do these lines cross — when does a $1,000 box handle 95% of a knowledge worker's daily needs?
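The budget arithmetic behind that question is simple enough to sketch. A minimal example, using the amortization figures from the text ($1,000 over 24 months plus ~$5/mo electricity) against a few illustrative API bills — the API figures are hypothetical, not quotes:

```python
# Fixed local cost: amortized hardware plus electricity (figures from the text).
hardware_cost = 1000.0        # USD, one-time
amortization_months = 24
electricity_per_month = 5.0   # USD, rough estimate

local_monthly = hardware_cost / amortization_months + electricity_per_month

# Illustrative API bills spanning a power user's plausible range (hypothetical).
for api_monthly in (15, 50, 120, 300):
    winner = "local" if local_monthly < api_monthly else "API"
    print(f"local ${local_monthly:.0f}/mo vs API ${api_monthly}/mo -> {winner} is cheaper")
```

On pure cost the box wins past roughly $47/month of API spend. The real question, which the rest of this piece takes up, is whether the box can do the work.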

The Landscape Today

What Each Model Can Solve

Reliability rate per difficulty tier — regardless of cost. How often does the model get it right?

Key: Local models hit a wall at Tier 3. It's not gradual degradation — they just can't do it. The "gate" hasn't opened yet.
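Because the wall behaves like a step function rather than a slope, each model gets a single "tier ceiling" in the table below. A minimal sketch of that definition (highest tier cleared at >85% reliability, per the footnotes), using made-up reliability profiles — the numbers are illustrative, not measurements:

```python
def tier_ceiling(reliability, threshold=0.85):
    """Highest contiguous tier cleared at better-than-threshold reliability.

    reliability: dict mapping tier number -> success rate on representative
    tasks. Stops at the first tier below threshold -- the "gate".
    """
    ceiling = 0
    for tier in sorted(reliability):
        if reliability[tier] > threshold:
            ceiling = tier
        else:
            break
    return ceiling

# Illustrative (not measured) profiles: a local-style cliff vs. a frontier taper.
print(tier_ceiling({1: 0.98, 2: 0.91, 3: 0.15, 4: 0.02}))         # -> 2
print(tier_ceiling({1: 0.99, 2: 0.97, 3: 0.92, 4: 0.88, 5: 0.30}))  # -> 4
```

Note the cliff in the first profile: Tier 2 at 91%, Tier 3 at 15%. That's the "gate" — not gradual degradation, a wall.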

Marginal Cost per Solved Problem (API)

What each successfully solved task costs via API. Local models = ~$0 marginal cost (hardware is sunk).

Log scale. A Tier 1 task on nano: $0.001. A Tier 4 task on Opus: $0.80. That's an 800x spread (nearly three orders of magnitude), reflecting real differences in cognitive complexity.
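The spread is worth a quick sanity check, using the two endpoint costs quoted above:

```python
import math

cheap = 0.001      # Tier 1 task on nano, USD (from the chart above)
expensive = 0.80   # Tier 4 task on Opus, USD

ratio = expensive / cheap
print(f"{ratio:.0f}x spread, ~{math.log10(ratio):.1f} orders of magnitude")
# -> 800x spread, ~2.9 orders of magnitude
```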

The Trajectory: When Does $1,000 of Hardware Become Sufficient?

Highest difficulty tier reliably cleared, projected through 2030. The dashed line marks Tier 3 — "professional grade" — where local hardware handles ~95% of daily cognitive work.

The crossover: Early 2027. That's when a $1,000 box runs a model that competently does your taxes, reviews contracts, and writes production code — at 85%+ reliability. Driven by ~2.5x/year efficiency gains (Intelligence-Per-Watt) × ~1.4x/year hardware gains (Apple Silicon FLOPS/$).
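The compounding behind that date can be sketched directly. A minimal example, assuming the two growth rates quoted above and a purely hypothetical 10x gap between today's local models and reliable Tier 3 performance:

```python
import math

efficiency_gain = 2.5   # per year, Intelligence-Per-Watt (from the text)
hardware_gain = 1.4     # per year, FLOPS per dollar (from the text)
combined = efficiency_gain * hardware_gain  # ~3.5x/yr intelligence per dollar

gap = 10.0  # hypothetical: local models need ~10x more effective compute
years = math.log(gap) / math.log(combined)
print(f"~{combined:.1f}x/yr -> ~{years:.1f} years to close a {gap:.0f}x gap")
# -> ~3.5x/yr -> ~1.8 years to close a 10x gap
```

At ~3.5x/year, even a 10x gap closes in under two years. The assumed gap is the load-bearing number here; move it and the crossover date moves with it.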

What a Knowledge Worker Actually Needs

Estimated distribution of daily AI tasks by difficulty tier for a power user

Bottom-heavy. ~75% of tasks are Tier 1–2. A local box already handles most of your day. But the remaining 25% — the hard stuff — is where most of the value lives.
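The "value lives in the tail" claim can be made concrete with a weighted-cost sketch. Every number below is a rough placeholder consistent with the chart, not a measurement:

```python
# Hypothetical daily task mix (~75% Tier 1-2, per the chart) and per-task
# API costs by tier; all figures are illustrative placeholders.
share = {1: 0.45, 2: 0.30, 3: 0.18, 4: 0.06, 5: 0.01}
api_cost = {1: 0.001, 2: 0.005, 3: 0.02, 4: 0.08, 5: 0.30}  # USD per task
local_ceiling = 2  # tiers a $1K box clears at ~$0 marginal cost

all_api = sum(share[t] * api_cost[t] for t in share)
hybrid = sum(share[t] * api_cost[t] for t in share if t > local_ceiling)
print(f"all-API ${all_api:.4f}/task avg; local+API hybrid ${hybrid:.4f}/task avg")
print(f"Tier 3+ tasks carry {hybrid / all_api:.0%} of the spend")
```

In this toy mix, Tier 1–2 is 75% of the tasks but only ~15% of the spend: offloading the easy majority to a local box barely dents the bill. That is the precise sense in which the hard 25% carries most of the value.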

Intelligence Profile — Shape of Each Model

Different models are fundamentally different shapes. Local = fast & efficient but narrow. Frontier = capable but expensive.

Notice: Opus has the biggest area but the most lopsided shape. Qwen 8B is tiny but nearly circular — balanced within its limited range.

Model Comparison — Raw Numbers (March 2026)

Blue border = local models on $1K hardware. Purple border = API models (pay per token).

Model | How you run it | Params | MMLU | Tier ceiling | Speed | Marginal $/task | Monthly cost
Qwen 3 3.5B Q4 | Local — $1K Mac Mini | 3.5B | ~62% | Tier 1 | ~90 tok/s | ~$0 | $42 amortized
Qwen 3 8B Q4 | Local — $1K Mac Mini | 8B | ~72% | Tier 2 | ~50 tok/s | ~$0 | $42 amortized
Llama 4 Scout 17B Q4 | Local — $1K Mac Mini | 17B active (109B total MoE) | ~78% | Tier 2+ | ~30 tok/s | ~$0 | $42 amortized
GPT-5.4 nano | API — pay per token | undisclosed | ~76% | Tier 2 | ~200 tok/s | ~$0.001 | ~$5–15
GPT-5.4 mini | API — pay per token | undisclosed | ~84% | Tier 3 | ~185 tok/s | ~$0.01 | ~$20–50
Claude Sonnet 4.6 | API — pay per token | undisclosed | ~87% | Tier 3+ | ~46 tok/s | ~$0.03 | ~$50–120
GPT-5.4 | API — pay per token | undisclosed | ~92% | Tier 4 | ~72 tok/s | ~$0.05 | ~$80–200
Claude Opus 4.6 | API — pay per token | undisclosed | ~90% | Tier 4 | ~48 tok/s | ~$0.08 | ~$100–300

Tier definitions: T1 = Reflexive (reminders, lookups) · T2 = Competent (emails, summaries, planning) · T3 = Professional (taxes, contracts, debugging) · T4 = Expert (novel strategy, complex research) · T5 = Frontier (unsolved problems)

Amortized hardware: $1,000 / 24 months = $42/mo + ~$5/mo electricity. Marginal cost per task = ~$0.
API monthly: Estimated for moderate power user (~2hr/day, mix of task types).
Tier ceiling: Highest tier where model achieves >85% reliability on representative tasks.
Speed sources: API models from Artificial Analysis (Mar 2026). Local models estimated for M4 Mac Mini (24GB) with MLX, Q4 quantization.
MMLU (Massive Multitask Language Understanding): 57-task benchmark spanning STEM, humanities, social sciences, and professional domains. Measures broad knowledge and reasoning — roughly "how much of a college-educated generalist's knowledge does this model have?" Not a ceiling on capability, but a useful proxy for general competence.