Frontier AI

How fast is frontier AI scaling, and how close is it to general capability?

Tracked metrics

Training compute · ARC-AGI score · Private investment · Research output

From this field

Milestone note Jun 2026

DeepSeek-V3 trained in ~2 minutes (MLPerf v6.0 record)

Reached

CoreWeave / NVIDIA — CoreWeave set new MLPerf Training v6.0 records, training DeepSeek-V3 (671B parameters) in 2.02 minutes on 8,192 NVIDIA GB300 NVL72 GPUs. It was the largest GB300 cluster submitted in the round and the only one scaled beyond 2,048 GPUs on DeepSeek-V3. The run used the same infrastructure customers run in production. The result is a marker of how fast large-model training time is collapsing.

Press · 2026-06 Read

Milestone note Jun 2026

Tiered safety deployment of a frontier model (Fable 5 / Mythos 5)

Reached

Anthropic — Anthropic released Claude Fable 5, a Mythos-class model exceeding any it had made generally available. It was gated so ~5% of sensitive (e.g. cyber) sessions get a conservatively tuned model, while the unrestricted Mythos 5 went only to vetted cyberdefenders via Project Glasswing with the US government. Days later the US Commerce Department export-controlled both models, barring all foreign-national access. Unable to enforce that selectively in real time, Anthropic shut Fable 5 and Mythos 5 off worldwide (its other models were unaffected). It was the first time a deployed frontier AI model was export-controlled like a strategic technology.

Press · 2026-06 Read

Explainer Apr 17, 2026

What training compute does (and doesn't) tell you

Frontier training compute has grown ~4–5× a year and is the clearest driver of AI's recent leaps. It is a hard, auditable number — but it's an input, not a measure of intelligence.
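
To make the growth rate concrete, here is a minimal Python sketch of the arithmetic behind a ~4–5× annual multiplier. It assumes growth is a constant multiplier each year, which is a simplification. The function name years_to_multiply is illustrative and not part of any tracker tooling.

```python
import math


def years_to_multiply(factor: float, annual_growth: float) -> float:
    """Years for compute to grow by `factor` at a constant annual multiplier.

    Assumption: growth compounds smoothly, so
    annual_growth ** years == factor.
    """
    return math.log(factor) / math.log(annual_growth)


# Compare the two ends of the ~4-5x per year range quoted above.
for growth in (4.0, 5.0):
    ten_x = years_to_multiply(10, growth)
    doubling_months = 12 * years_to_multiply(2, growth)
    print(f"{growth:.0f}x/year: 10x in {ten_x:.2f} years, "
          f"doubling every {doubling_months:.1f} months")
```

Under that assumption, a 10× step takes roughly 1.4 to 1.7 years and a doubling takes about five to six months. The calculation only converts an input rate into time; it says nothing about capability.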

Explainer Apr 10, 2026

Why we don't score "AGI"

There is no agreed test for general intelligence, so a single "AGI %" would be our opinion dressed as data. Instead we track objective, third-party numbers: training compute, public benchmark scores, and investment.

Milestone note Apr 2026

First frontier model shipped on domestic Chinese silicon

Reached

DeepSeek / Huawei — DeepSeek's 1.6T-parameter V4 runs on Huawei Ascend (950PR), and a Huawei-led team completed full-parameter post-training on ~1,000 Ascend 910Cs — a compute-sovereignty landmark. Pre-training hardware remains undisclosed, so "trained without Nvidia" is NOT established.

Press · 2026-04 Read

Milestone note Mar 2026

ARC-AGI-3 — first interactive benchmark; AI under 1%

Reached

ARC Prize — The first fully interactive ARC benchmark: hand-built game environments with no instructions, where agents must discover the rules. At launch every frontier model scored <1% (best 0.37%) while humans solve them all. The prize pool is $2M+, with results in Dec 2026.

Press · 2026-03 Read

Analysis as of Jun 5, 2025

Is frontier AI investment a bubble?

by Frontier Milestones

US private AI investment hit $109B in 2024 — then 2025's efficiency shock (DeepSeek) made the bubble question sharper, not simpler. Our read on whether capital is ahead of capability. (Our opinion, not investment advice.)

Milestone note May 2025

Reasoning & agentic coding become the frontier (Claude Opus 4)

Reached

Anthropic (Claude Opus 4) — Anthropic's Claude Opus 4 launched with extended thinking and sustained autonomous coding over long tasks — part of a 2025 shift where reasoning/agentic models, not raw scale alone, drove the frontier.

Press · 2025-05 Read

Milestone note Mar 2025

Goalposts move: ARC-AGI-2 launches

Reached

ARC Prize — A harder successor — still easy for humans, hard for AI — resetting the abstraction frontier as v1 saturated.

Press · 2025-03 Read

Milestone note 2025

Frontier training compute passes 1e26 FLOP

Reached

Frontier labs — The largest models crossed 1e26 FLOP, a 10× jump over the 1e25 FLOP scale first reached by GPT-4, with compute still growing ~4–5× per year.

Database · 2025 Read

Milestone note Jan 2025

Open reasoning model rivals the frontier — at a fraction of the cost

Reached

DeepSeek (R1) — DeepSeek-R1, an openly released RL-trained reasoning model, matched leading closed models on math and coding — triggering a market reckoning over AI capex.

Paper · 2025-01 Read

Milestone note Jan 2025

A benchmark built to resist saturation: Humanity's Last Exam

Reached

CAIS · Scale AI — As models saturated existing tests, a 2,500-question expert exam launched on which frontier models initially scored in the single digits — a fresh yardstick for the distance to general capability.

Paper · 2025-01 Read

Milestone note Dec 2024

AI beats the ARC-AGI abstraction test

Reached

OpenAI (o3) — o3 scored 76–88% on ARC-AGI-1, depending on its compute budget (human ~85%), making it the first AI to move beyond memorization on the test.

Press · 2024-12 Read

Milestone note Mar 2023

First model trained at 1e25 FLOP

Reached

OpenAI (GPT-4) — GPT-4 was the first model at the 1e25 FLOP scale; over 30 models from 12 developers have since crossed it.

Database · 2025 Read

Milestone note Nov 2022

ChatGPT brings AI to the mainstream

Reached

OpenAI — ChatGPT reached 100M users in two months, at the time the fastest adoption of any consumer app and AI's consumer inflection point.

Database · 2022-11 Read