New research from the UK's AI Security Institute finds that, in rigorous testing, OpenAI's recently launched GPT-5.5 model demonstrated cybersecurity capabilities comparable to those of Anthropic's restricted Mythos Preview model.
Key Points
- The UK's AI Security Institute (AISI) tested GPT-5.5 and Mythos Preview using 95 Capture the Flag challenges covering cryptography, web exploitation, and reverse engineering.
- GPT-5.5 achieved a 71.4 percent success rate on "Expert" level tasks, slightly outperforming the 68.6 percent score recorded by Mythos Preview.
- In one test, GPT-5.5 decoded a Rust binary in under 11 minutes at a cost of $1.73, without human intervention.
- GPT-5.5 succeeded in 3 out of 10 attempts at the "The Last Ones" data extraction simulation, marking the first time any AI has passed this test.
- Both models failed the "Cooling Tower" simulation, which tests the ability to disrupt power plant control software.