Synthesis
State of play across the four nodes
Extrapolations
SWE-bench Verified — top score
The score reflects agent plus scaffold, not raw model capability; the scaffold label keeps a bespoke harness from being misread as model headroom. This is SWE-bench Verified (the 500-sample human-validated subset, of which Epoch runs 484), not full SWE-bench. Epoch upgraded its scaffold significantly in Feb 2026, so scores on either side of that change are not directly comparable.
METR 50% time-horizon — frontier curve + live extrapolation
Doubling time ~7 months over 2019–25; ~4 months in the most recent data.
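The live extrapolation rests on a constant doubling time. Below is a minimal sketch, assuming the horizon grows exponentially. It fits log2(horizon) against time by ordinary least squares to recover the doubling time, then projects a starting horizon forward. The helper names, the synthetic points, and the 60-minute starting horizon are hypothetical illustrations, not METR data or METR's own fitting code.

```python
import math

def fit_doubling_time(points):
    """Ordinary least squares of log2(horizon) on time.

    points: list of (months_since_reference, horizon_minutes) pairs.
    Returns months per doubling (the reciprocal of the fitted slope).
    """
    n = len(points)
    xs = [t for t, _ in points]
    ys = [math.log2(h) for _, h in points]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    doublings_per_month = sxy / sxx
    return 1.0 / doublings_per_month

def extrapolate_horizon(h0_minutes, months_ahead, doubling_months):
    """Project a 50% time horizon forward under a constant doubling time."""
    return h0_minutes * 2 ** (months_ahead / doubling_months)

if __name__ == "__main__":
    # Hypothetical points lying exactly on a 7-month doubling curve.
    synthetic = [(0, 1.0), (7, 2.0), (14, 4.0), (21, 8.0)]
    print(round(fit_doubling_time(synthetic), 2))  # 7.0

    # Same hypothetical 60-minute start, 12 months out, under both quoted rates.
    for d in (7, 4):
        print(d, round(extrapolate_horizon(60, 12, d)))
```

Over one year the two quoted rates give roughly 3.3× versus 8× growth from the same starting point, so the choice of doubling time dominates the projection far more than the starting horizon does.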
What to reach for this week
AA's Intelligence Index is a composite of ~10 benchmarks (v4.0). The methodology shifts between versions, so values are relative rather than absolute and carry the biases of the chosen benchmarks and response styles. Blended $/Mtok is shown at 3:1 input:output (a common heuristic, not AA's exact blend; AA uses 7:2:1 cache:input:output).
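A blended price is a weighted mean of per-token-type prices. Below is a minimal sketch, assuming simple linear weighting. The function name and the $2 / $10 example prices are hypothetical, and a different split (such as AA's cache:input:output weighting) only changes the weights dict.

```python
def blended_price(prices, weights):
    """Weighted mean of $/Mtok prices.

    prices, weights: dicts keyed by token type ("input", "output", ...).
    Every key in weights must also appear in prices.
    """
    total_weight = sum(weights.values())
    return sum(prices[k] * w for k, w in weights.items()) / total_weight

# The 3:1 input:output heuristic used for the displayed figure.
DISPLAY_WEIGHTS = {"input": 3, "output": 1}

if __name__ == "__main__":
    prices = {"input": 2.0, "output": 10.0}  # hypothetical $/Mtok
    print(blended_price(prices, DISPLAY_WEIGHTS))  # 4.0
```

An output-heavy workload weighted 1:1 would blend the same prices to $6.00, so the 3:1 figure understates cost for generation-heavy use.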
Training compute over time
OWID's AI-compute series is derived from Epoch data, the same lineage as anything else Epoch-sourced, so don't read it as independent corroboration of Epoch's other numbers. Trend lines: 1.5×/yr (1950–2010) → 4.2×/yr (2010–25). Recent frontier runs sit near 10¹¹ petaFLOP (≈10²⁶ FLOP).
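Annual growth factors are easier to compare as doubling times, via t_d = 12 · ln 2 / ln g months. Below is a short check of the figures above, assuming constant exponential growth within each era. The function and constant names are illustrative.

```python
import math

FLOP_PER_PETAFLOP = 10 ** 15  # 1 petaFLOP = 1e15 FLOP

def doubling_time_months(growth_per_year):
    """Months per doubling for a constant annual growth factor g > 1."""
    return 12 * math.log(2) / math.log(growth_per_year)

def petaflop_to_flop(petaflop):
    """Convert a total-compute figure in petaFLOP to FLOP."""
    return petaflop * FLOP_PER_PETAFLOP

if __name__ == "__main__":
    for era, g in (("1950-2010", 1.5), ("2010-25", 4.2)):
        print(era, round(doubling_time_months(g), 1))  # ~20.5, then ~5.8 months
    print(petaflop_to_flop(10 ** 11) == 10 ** 26)  # True
```

Read this way, the post-2010 trend is a doubling roughly every 6 months, against roughly every 20 months before.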