Anthropic's Super-Secure AI Model Gets Hacked Via Guessing Game

Anthropic's 'too dangerous to release' AI model was accessed by unauthorized users via an educated guess and insider info, proving that hubris is still the easiest vulnerability to exploit.

Anthropic, the company that spent weeks telling everyone its Claude Mythos model was too dangerous for public release, has learned a valuable lesson: if you brag about your AI being unhackable, someone will take that as a challenge. According to Bloomberg, a “small group of unauthorized users” has been quietly enjoying Mythos since the day Anthropic announced it was sharing the model with a select group of testers. The company says it’s investigating, which is a bit like a locksmith discovering their own front door is made of cardboard.

From a technological standpoint, the breach is almost endearingly low-tech. The group reportedly accessed Mythos by making “an educated guess about the model’s online location,” using information from a prior breach of Mercor - a company that makes AI training data - plus insider knowledge from one member’s contract work evaluating Anthropic models. So we’re not talking about a sophisticated cyber-heist here; we’re talking about someone trying a doorknob and finding it unlocked.

Security researcher Lukasz Olejnik described the failure as “entirely imaginable” - the kind of thing the cybersecurity industry has been dealing with for the last 20 years. Anthropic, which could log and track model use, apparently wasn’t monitoring closely enough to notice the uninvited guests. Given how dangerous the company claims Mythos is, you’d think they’d at least check the guest list.

By Bloomberg’s account, the group wasn’t using Mythos for cybersecurity tasks - partly because they just wanted to mess around, and partly because doing so might have tipped off Anthropic. If Anthropic’s messaging is to be taken seriously, that’s a lucky break. The company has framed Mythos as a “watershed moment for security,” claiming it found vulnerabilities in “every major operating system and web browser,” and has been doling out access to governments and financial institutions worldwide. The NSA reportedly has access, though CISA has been left out so far.

“Anthropic claims to be at the absolute forefront of all these technologies, but also positions itself as the responsible actor in all of this,” said Pia Hüsch, a research fellow at the Royal United Services Institute (RUSI). She summed up the whole episode in one word: humiliation. “The fact that this has now been accessed through unauthorized means so quickly, and through such an unsophisticated attempt, is really a humiliation for them.”

This isn’t even the first security hiccup for Mythos. The model’s existence was accidentally revealed through an “unsecured data trove” on Anthropic’s own website before launch. Now it’s been accessed via a vulnerability that any security intern could have predicted. Perfection may be impossible, but for a company that has anointed itself the vanguard of AI safety, this is less a stumble and more a faceplant.

Anthropic's Super-Secure AI Model Gets Hacked Via Guessing Game

News in your inbox.