Terminal Bench

Mythos at 92.1%: The AI That Just Needs More Time

Claude Mythos Preview scored 92.1% on Terminal-Bench 2.1 with a 4-hour timeout, up from 82%. Here's why evaluation conditions matter more than the score, and what that means for enterprise AI teams.

4 months ago
