Intelligence per Dollar

What Each Model Can Solve

Reliability rate per difficulty tier — regardless of cost. How often does the model get it right?

Key: Local models hit a wall at Tier 3. Reliability doesn't degrade gradually; it drops off sharply once tasks reach professional grade. The "gate" hasn't opened yet.
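The notes define a model's tier ceiling as the highest tier where it achieves >85% reliability. Below is a minimal sketch of that rule, applied literally: each tier is checked on its own. The reliability figures in the example are hypothetical, chosen to mimic the Tier 3 wall described above, not taken from the chart.

```python
# Threshold from the notes' definition: ">85% reliability on representative tasks".
RELIABILITY_THRESHOLD = 0.85


def tier_ceiling(reliability_by_tier: dict[int, float],
                 threshold: float = RELIABILITY_THRESHOLD) -> int:
    """Return the highest tier whose reliability is strictly above `threshold`.

    Applies the definition literally: tiers are checked independently,
    and 0 means no tier clears the bar.
    """
    cleared = [tier for tier, rate in reliability_by_tier.items() if rate > threshold]
    return max(cleared, default=0)


# Hypothetical reliabilities for a small local model: strong on Tiers 1-2,
# then a sharp drop rather than a gradual slope.
local_model = {1: 0.97, 2: 0.88, 3: 0.35, 4: 0.05}
print(tier_ceiling(local_model))  # 2
```

Because the comparison is strict, a model sitting at exactly 85% on a tier does not clear it.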

Marginal Cost per Solved Problem (API)

What each successfully solved task costs via API. Local models have ~$0 marginal cost because the hardware is a sunk cost (electricity aside).

Log scale. A Tier 1 task on nano costs $0.001; a Tier 4 task on Opus costs $0.80. That's 800x, nearly three orders of magnitude, and the gap tracks the real difference in cognitive complexity.
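One way to read "cost per solved problem" is cost per attempt divided by success rate. The sketch below assumes that model: a failed attempt is simply retried, and each attempt succeeds independently, so the expected number of attempts is 1 / success rate. The $0.04 price and 50% success rate are hypothetical. The last lines check the spread quoted above.

```python
import math


def cost_per_solved(cost_per_attempt: float, success_rate: float) -> float:
    """Expected API spend per successfully solved task.

    Assumes a failed attempt is simply retried and each attempt succeeds
    independently with probability `success_rate`, so the expected number
    of attempts is 1 / success_rate (a geometric distribution).
    """
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical: a $0.04 attempt that succeeds half the time costs
# about $0.08 per solved task.
print(f"${cost_per_solved(0.04, 0.5):.2f}")

# The spread quoted above: Tier 1 on nano ($0.001) vs Tier 4 on Opus ($0.80).
ratio = 0.80 / 0.001
print(f"{ratio:.0f}x, {math.log10(ratio):.2f} orders of magnitude")  # 800x, 2.90 orders of magnitude
```

Under this model a cheap but unreliable model can cost more per solved task than its per-attempt price suggests.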

The Trajectory: When Does $1,000 of Hardware Become Sufficient?

Highest difficulty tier reliably cleared, projected through 2030. The dashed line marks Tier 3 ("professional grade"): hardware that clears it can handle ~95% of daily cognitive work.

The crossover: early 2027. That's when a $1,000 box should run a model that competently does your taxes, reviews contracts, and writes production code at 85%+ reliability. The projection assumes ~2.5x/year efficiency gains (Intelligence-Per-Watt) compounding with ~1.4x/year hardware gains (Apple Silicon FLOPS/$), or roughly 3.5x/year combined.
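The two growth rates multiply. If both trends hold at constant rates and compound independently, effective capability per dollar grows about 2.5 × 1.4 = 3.5x per year. The sketch below compounds that rate and inverts it to turn a capability gap into years. The 10x gap in the example is hypothetical, because the text doesn't state how far today's $1,000 box is from Tier 3.

```python
import math

# Annual multipliers quoted in the text (the author's estimates).
EFFICIENCY_GAIN = 2.5  # model efficiency (Intelligence-Per-Watt) per year
HARDWARE_GAIN = 1.4    # Apple Silicon FLOPS per dollar per year
ANNUAL_GAIN = EFFICIENCY_GAIN * HARDWARE_GAIN  # ~3.5x per year combined


def cumulative_gain(years: float) -> float:
    """Capability-per-dollar multiplier after `years`, assuming both trends
    hold at constant rates and compound independently."""
    return ANNUAL_GAIN ** years


def years_to_close(gap: float) -> float:
    """Years until a fixed capability gap (expressed as a multiplier) closes."""
    return math.log(gap) / math.log(ANNUAL_GAIN)


for year in range(1, 5):
    print(f"after {year} year(s): ~{cumulative_gain(year):.0f}x")

# Hypothetical gap: if Tier 3 needed 10x today's capability per dollar,
# it would close in roughly 1.8 years at this rate.
print(f"{years_to_close(10):.1f} years")
```

Because the rates compound, small changes in either estimate move the crossover date noticeably.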

What a Knowledge Worker Actually Needs

Estimated distribution of daily AI tasks by difficulty tier for a power user

The distribution is bottom-heavy: ~75% of tasks are Tier 1–2, so a local box already handles most of your day. But the remaining ~25%, the hard stuff, is where most of the value lives.

Intelligence Profile — Shape of Each Model

Different models have fundamentally different shapes. Local models are fast and efficient but narrow; frontier models are capable but expensive.

Notice: Opus has the biggest area but the most lopsided shape. Qwen 8B is tiny but nearly circular — balanced within its limited range.
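The chart doesn't say how "area" or "lopsided" are measured. One common reading, assumed here: with n evenly spaced axes, the radar polygon's area is 0.5 * sin(2*pi/n) * (sum of r_i * r_(i+1)), and lopsidedness can be scored as the coefficient of variation of the radii, which is 0 for a regular polygon. The axis names and scores below are hypothetical, not the chart's data.

```python
import math
from statistics import mean, pstdev


def radar_area(scores: list[float]) -> float:
    """Area of the polygon a radar chart draws for `scores`.

    Axes are assumed evenly spaced, so each adjacent pair of radii
    (r_i, r_{i+1}) spans a triangle of area 0.5 * r_i * r_{i+1} * sin(2*pi/n).
    The result depends on the order of the axes.
    """
    n = len(scores)
    wedge = math.sin(2 * math.pi / n)
    return 0.5 * wedge * sum(scores[i] * scores[(i + 1) % n] for i in range(n))


def lopsidedness(scores: list[float]) -> float:
    """Coefficient of variation of the radii: 0 means a regular polygon."""
    return pstdev(scores) / mean(scores)


# Hypothetical 0-10 scores on five hypothetical axes
# (e.g. speed, cost efficiency, reasoning, knowledge, coding).
small_balanced = [4, 4, 4, 4, 4]
large_lopsided = [3, 2, 10, 9, 9]

for name, s in [("small, balanced", small_balanced),
                ("large, lopsided", large_lopsided)]:
    print(f"{name}: area={radar_area(s):.1f}, lopsidedness={lopsidedness(s):.2f}")
```

Since area depends on axis order, comparing areas across models only makes sense when every model is plotted on the same axes in the same order.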

Model Comparison — Raw Numbers (March 2026)


Model	How you run it	Params	MMLU	Tier Ceiling	Speed	Marginal $/task	Monthly cost
Qwen 3 3.5B Q4	Local — $1K Mac Mini	3.5B	~62%	Tier 1	~90 tok/s	~$0	$42 amortized
Qwen 3 8B Q4	Local — $1K Mac Mini	8B	~72%	Tier 2	~50 tok/s	~$0	$42 amortized
Llama 4 Scout 17B Q4	Local — $1K Mac Mini	17B active (109B total, MoE)	~78%	Tier 2+	~30 tok/s	~$0	$42 amortized
GPT-5.4 nano	API — pay per token	undisclosed	~76%	Tier 2	~200 tok/s	~$0.001	~$5–15
GPT-5.4 mini	API — pay per token	undisclosed	~84%	Tier 3	~185 tok/s	~$0.01	~$20–50
Claude Sonnet 4.6	API — pay per token	undisclosed	~87%	Tier 3+	~46 tok/s	~$0.03	~$50–120
GPT-5.4	API — pay per token	undisclosed	~92%	Tier 4	~72 tok/s	~$0.05	~$80–200
Claude Opus 4.6	API — pay per token	undisclosed	~90%	Tier 4	~48 tok/s	~$0.08	~$100–300
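Whether a model runs on the $1K box depends first on whether its weights fit in memory. The notes specify a 24 GB M4 Mac Mini running MLX at Q4. The rough estimate below counts weights only (no KV cache or runtime overhead) and assumes ~4.5 bits per weight, since 4-bit formats also store per-group scale factors. For a mixture-of-experts model every expert must be resident, so the total parameter count applies, not the active one. By this estimate the Qwen rows fit easily, while Llama 4 Scout's 109B total parameters need well over 24 GB, so that row's local figures deserve a second look.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float = 4.5) -> float:
    """Approximate memory (decimal GB) for model weights alone.

    Ignores the KV cache, activations, and runtime overhead. For a
    mixture-of-experts model, pass the total parameter count: every expert
    must be resident even though only some are active per token.
    4.5 bits/weight is an assumed average for 4-bit quantization.
    """
    return total_params * bits_per_weight / 8 / 1e9


MACHINE_GB = 24  # M4 Mac Mini configuration named in the notes

# Total parameter counts from the table.
local_models = {
    "Qwen 3 3.5B": 3.5e9,
    "Qwen 3 8B": 8e9,
    "Llama 4 Scout (109B total)": 109e9,
}
for name, params in local_models.items():
    need = weight_memory_gb(params)
    verdict = "under" if need < MACHINE_GB else "over"
    print(f"{name}: ~{need:.1f} GB of weights ({verdict} the {MACHINE_GB} GB of unified memory)")
```

The operating system and the KV cache also need room, so "under" here is a necessary condition, not a guarantee that a model runs comfortably.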

Tier definitions: T1 = Reflexive (reminders, lookups) · T2 = Competent (emails, summaries, planning) · T3 = Professional (taxes, contracts, debugging) · T4 = Expert (novel strategy, complex research) · T5 = Frontier (unsolved problems)

Amortized hardware: $1,000 / 24 months ≈ $42/mo, plus ~$5/mo electricity. Marginal cost per task ≈ $0.
API monthly: Estimated for a moderate power user (~2 hr/day, mix of task types).
Tier ceiling: Highest tier where model achieves >85% reliability on representative tasks.
Speed sources: API models from Artificial Analysis (Mar 2026). Local-model speeds are estimates for an M4 Mac Mini (24GB) running MLX with Q4 quantization.
MMLU (Massive Multitask Language Understanding): a 57-task multiple-choice benchmark spanning STEM, humanities, social sciences, and professional domains. It measures broad knowledge and reasoning; roughly, how much of a college-educated generalist's knowledge does the model have? It doesn't mark a ceiling on capability, but it's a useful proxy for general competence.
