CoreWeave / NVIDIA — CoreWeave set new MLPerf Training v6.0 records, completing the DeepSeek-V3 (671B parameters) benchmark in 2.02 minutes on 8,192 GPUs in NVIDIA GB300 NVL72 systems. It was the largest GB300 cluster submitted in the round and the only one scaled beyond 2,048 GPUs on DeepSeek-V3. The run used the same infrastructure customers use in production, a marker of how fast large-model training time is collapsing.
Anthropic — Anthropic released Claude Fable 5, a Mythos-class model more capable than any it had previously made generally available. Access was gated so that ~5% of sensitive sessions (e.g. cyber) were served a conservatively tuned model, while the unrestricted Mythos 5 went only to vetted cyberdefenders via Project Glasswing with the US government. Days later the US Commerce Department placed both models under export controls, barring all access by foreign nationals. Unable to enforce that restriction selectively in real time, Anthropic shut off Fable 5 and Mythos 5 worldwide (its other models were unaffected). It was the first time a deployed frontier AI model was export-controlled like a strategic technology.
Frontier training compute has grown ~4–5× a year and is the clearest driver of AI's recent leaps. It is a hard, auditable number — but it's an input, not a measure of intelligence.
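To make the ~4–5× annual growth rate concrete, here is a minimal sketch that converts an annual growth factor into a doubling time and a cumulative multiplier. It assumes steady compound growth at a constant annual factor, which the real trend only approximates; the helper names and the 5-year horizon are illustrative choices, not figures from any tracker.

```python
import math


def doubling_time_months(annual_factor: float) -> float:
    """Months for a quantity to double if it grows by `annual_factor` each year
    (assumes smooth, constant compound growth)."""
    return 12 * math.log(2) / math.log(annual_factor)


def cumulative_growth(annual_factor: float, years: float) -> float:
    """Total multiplier after `years` of steady compound growth."""
    return annual_factor ** years


# Illustrative horizon: 5 years at the low and high ends of the ~4-5x/year range.
for factor in (4.0, 5.0):
    print(
        f"{factor:.0f}x/year: doubles every {doubling_time_months(factor):.1f} months, "
        f"{cumulative_growth(factor, 5):,.0f}x over 5 years"
    )
# 4x/year: doubles every 6.0 months, 1,024x over 5 years
# 5x/year: doubles every 5.2 months, 3,125x over 5 years
```

Small differences in the annual factor compound quickly: over five years, 4× versus 5× a year is roughly a threefold gap in total compute.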
There is no agreed test for general intelligence, so a single "AGI %" would be our opinion dressed as data. Instead we track objective, third-party numbers: training compute, public benchmark scores, and investment.
DeepSeek / Huawei — DeepSeek's 1.6T-parameter V4 runs on Huawei Ascend 950PR chips, and a Huawei-led team completed full-parameter post-training on ~1,000 Ascend 910Cs, a compute-sovereignty landmark. Pre-training hardware remains undisclosed, so "trained without Nvidia" is NOT established.
ARC Prize — The first fully interactive ARC benchmark: hand-built game environments with no instructions, in which agents must discover the rules for themselves. At launch every frontier model scored under 1% (best 0.37%), while humans can solve every environment. Prize pool of $2M+; results due Dec 2026.
US private AI investment hit $109B in 2024; then 2025's efficiency shock (DeepSeek) made the bubble question sharper, not simpler. This is our read on whether capital is ahead of capability (our opinion, not investment advice).
Anthropic (Claude Opus 4) — Claude Opus 4 launched with extended thinking and the ability to code autonomously over long tasks, part of a 2025 shift in which reasoning and agentic models, not raw scale alone, drove the frontier.
DeepSeek (R1) — DeepSeek-R1, an openly released RL-trained reasoning model, matched leading closed models on math and coding — triggering a market reckoning over AI capex.
CAIS · Scale AI — As models saturated existing tests, CAIS and Scale AI launched Humanity's Last Exam, a 2,500-question expert-level exam on which frontier models initially scored in the single-digit percentages. It offers a fresh yardstick for the distance to general capability.