AI Progress Live Dashboard

Cross-source synthesis · extrapolation · delta-since-last-look
Synthesis

State of play across the four nodes

live

Extrapolations

Δ since last look

Applied capability epoch.ai ↗

SWE-bench Verified — top score

fresh
scaffold:
Score reflects agent plus scaffold, not raw model capability; the scaffold label keeps a bespoke harness from being misread as model headroom. This is Verified (the 500-sample human-validated subset, of which Epoch runs 484), not full SWE-bench. Epoch upgraded its scaffold significantly in Feb 2026.
Autonomy trajectory metr.org ↗

METR 50% time-horizon — frontier curve + live extrapolation

fresh
· doubling ~7 mo (2019–25); ~4 mo recent ·
Legend: fitted (7-mo doubling) · live extrapolation from current anchor · 16-h measurement ceiling (flagged May 2026)
Forward markers (extrapolation from current anchor at 7-mo doubling):
Robustness: METR notes that a 10× absolute-measurement error shifts arrival by ~2 years — slope dominates. Gaps in coverage are not regressions; METR doesn't evaluate every release. Newest points (esp. Claude Mythos Preview) sit at or past the suite's measurement ceiling — wide CI.
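The robustness point can be checked with a little arithmetic. The sketch below assumes a pure exponential trend with a fixed doubling time, which is a simplification of METR's fitted curve and not their actual method. The function names and the 16-hour anchor in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def months_to_reach(target_hours: float, anchor_hours: float,
                    doubling_months: float = 7.0) -> float:
    """Months for the time horizon to grow from anchor_hours to target_hours,
    assuming a constant doubling time (an idealized exponential trend)."""
    if target_hours <= 0 or anchor_hours <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    return doubling_months * math.log2(target_hours / anchor_hours)


def arrival_shift_months(error_factor: float,
                         doubling_months: float = 7.0) -> float:
    """How far an arrival date moves if the anchor measurement is off by
    error_factor. The shift depends only on the doubling time."""
    if error_factor <= 0 or doubling_months <= 0:
        raise ValueError("all inputs must be positive")
    return doubling_months * math.log2(error_factor)


if __name__ == "__main__":
    # Hypothetical anchor of 16 hours; target of one 160-hour work month.
    print(round(months_to_reach(160.0, 16.0), 1))
    # A 10x measurement error at a 7-month doubling time.
    print(round(arrival_shift_months(10.0) / 12.0, 2), "years")
```

At a 7-month doubling time, a 10× error corresponds to log2(10) ≈ 3.32 doublings, or roughly 23 months, which matches the "~2 years" figure. At the ~4-month recent rate the same error would shift arrival by only about 13 months, which is why the slope matters more than the anchor.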

What to reach for this week

fresh
Model · Intel · $/Mtok · tok/s
AA's Intelligence Index is a composite of ~10 benchmarks (v4.0). Methodology shifts; values are relative, not absolute, and reflect benchmark/style biases. Blended $/Mtok shown as 3:1 input:output (a common heuristic, not AA's exact blend — AA uses 7:2:1 cache:input:output).
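The two blends mentioned above are both weighted averages of per-token prices. The following is a minimal sketch of that calculation; the function name and the example prices are hypothetical and are not taken from AA or from any real model's price list.

```python
def blended_price(prices: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average price in $/Mtok.

    prices  -- $/Mtok for each token class (e.g. "input", "output", "cache")
    weights -- relative share of each class; need not sum to 1
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(prices[name] * w for name, w in weights.items()) / total_weight


if __name__ == "__main__":
    # Hypothetical prices in $/Mtok, for illustration only.
    example = {"cache": 0.30, "input": 3.00, "output": 15.00}

    # 3:1 input:output heuristic, as shown on the dashboard.
    print(blended_price(example, {"input": 3, "output": 1}))

    # 7:2:1 cache:input:output, the blend the note attributes to AA.
    print(blended_price(example, {"cache": 7, "input": 2, "output": 1}))
```

With these made-up prices the 3:1 blend comes out well above the 7:2:1 blend, because the latter gives most of its weight to cheap cached tokens. That gap is the reason the two figures should not be compared directly.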

Training compute over time

fresh
OWID's AI-compute series is Epoch-derived — same lineage as anything else Epoch-sourced; don't read this as independent corroboration of Epoch's other numbers. Trend lines: 1.5×/yr (1950–2010) → 4.2×/yr (2010–25). Recent frontier runs sit near 10¹¹ petaFLOP (≈10²⁶ FLOP).
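The unit conversion and the growth rates in the note above can be sanity-checked as follows. This is a small sketch with hypothetical helper names; it assumes steady exponential growth within each era, which is what a single trend line implies.

```python
import math

FLOP_PER_PETAFLOP = 1e15  # SI prefix: peta = 10^15


def petaflop_to_flop(petaflop: float) -> float:
    """Convert a training-compute figure from petaFLOP to FLOP."""
    return petaflop * FLOP_PER_PETAFLOP


def doubling_time_months(growth_per_year: float) -> float:
    """Doubling time implied by a constant yearly growth multiple."""
    if growth_per_year <= 1.0:
        raise ValueError("growth_per_year must be greater than 1")
    return 12.0 * math.log(2.0) / math.log(growth_per_year)


if __name__ == "__main__":
    print(f"{petaflop_to_flop(1e11):.0e} FLOP")
    print(round(doubling_time_months(1.5), 1), "months at 1.5x/yr")
    print(round(doubling_time_months(4.2), 1), "months at 4.2x/yr")
```

Under these assumptions, 1.5×/yr corresponds to a doubling time of about 20.5 months and 4.2×/yr to about 5.8 months.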
Data baked at build time on May 30, 2026 (Cowork artifacts can't reach external APIs). Synthesis runs live in your browser via window.cowork.askClaude. Ask Claude to rebuild this artifact to refresh underlying numbers.