Milestone note · Reached Jan 2025
A benchmark built to resist saturation: Humanity's Last Exam
CAIS · Scale AI: As models saturated existing tests, a 2,500-question expert exam was launched on which frontier models initially scored in the single digits, giving a fresh yardstick for the distance to general capability.
Auto-drafted from a verified measurement, then human-checked.