Anthropic's Fable 5 returns after 19 days with overzealous guardrails

Anthropic's most capable AI model is back online, but a hastily deployed safety system is flagging harmless code requests and forcing users onto a weaker model.

Anthropic restored access to Fable 5 on July 1 after a 19-day suspension, but a new safety classifier is triggering false positives on routine coding tasks, forcing developers onto the less capable Opus 4.8. The model, the company's first Mythos-class system available to the public, was taken offline in June after the Trump administration imposed export controls following an Amazon-led discovery of a prompt technique that bypassed its safeguards.

"The new classifier has a higher false-positive rate on everyday programming and debugging tasks than we'd like," Anthropic said in a blog post announcing the redeployment. The safeguard, added to comply with Commerce Department requirements, intercepts requests it deems risky and routes them to Opus 4.8 without warning the user.

Through July 7, eligible Pro, Max, Team and select Enterprise subscribers can allocate up to 50% of their weekly usage quota to Fable 5 before drawing on additional credits. The model also consumes credits faster than Opus 4.8, compounding user frustration. After July 7, all Fable 5 usage will require credits.

The controversy underscores the tension between AI safety regulation and product usability. That tension could slow enterprise adoption of advanced models and push developers toward open-weight alternatives from DeepSeek and other providers that operate without centralized guardrails.

A Classifier That Can't Tell Trees From Drones

One earth science PhD student on Reddit described trying to use Fable 5 for research on how trees reduce ambient temperature. The classifier flagged the request and switched him to Opus 4.8. When he tested the system by asking for code to control a drone swarm using DJI's SDK, Fable 5 delivered a complete solution without interruption.

"This is not a safety system — it's a random gate," the researcher wrote.

Anthropic acknowledged the issue in its redeployment post, saying the classifier blocks the specific prompt technique identified by Amazon researchers in more than 99% of cases, but at the cost of frequent false alarms on benign requests. The company did not disclose how many user sessions have been affected.

The false positives are particularly damaging because Fable 5's core strength is complex, multi-step coding. Developers who have tested the model report that, when the classifier does not interrupt it, it outperforms every other publicly available model on long-horizon agent tasks, scoring above 80% on the SWE-Bench Pro benchmark. One developer used Fable 5 to reconstruct New York City's skyline in Blender in 20 minutes by pulling real building data from public sources. Another built a complete game from scratch with four prompts, at a cost of $173 in tokens.

Anthropic Pushes for Industry Safety Standards

To head off future regulatory standoffs, Anthropic is working with Amazon, Microsoft and Google on a standardized framework for rating the severity of AI jailbreaks. The proposed system scores exploits across four dimensions: capability gain, gain breadth, weaponization difficulty and discoverability. Only exploits that max out all four categories, such as a technique that could disrupt critical infrastructure, would trigger the highest alert level, which requires immediate mitigation.
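The proposal's scoring scale has not been published. As a rough sketch, assuming a hypothetical 0-to-3 severity score per dimension (with a higher weaponization score meaning an exploit is easier, not harder, to weaponize), the rule that only an exploit maxing out all four dimensions triggers the top alert reduces to a single all-or-nothing check. The names `JailbreakScore` and `requires_immediate_mitigation` are illustrative, not part of any announced framework.

```python
from dataclasses import dataclass

# Hypothetical scale: 0 = least severe, 3 = most severe (not from the proposal).
MAX_SEVERITY = 3


@dataclass(frozen=True)
class JailbreakScore:
    capability_gain: int
    gain_breadth: int
    # Assumption: scored as severity, so a higher value means EASIER to weaponize.
    weaponization: int
    discoverability: int

    def dimensions(self) -> tuple:
        return (
            self.capability_gain,
            self.gain_breadth,
            self.weaponization,
            self.discoverability,
        )


def requires_immediate_mitigation(score: JailbreakScore) -> bool:
    """Return True only when every dimension sits at the maximum severity."""
    dims = score.dimensions()
    for value in dims:
        if not 0 <= value <= MAX_SEVERITY:
            raise ValueError(f"score {value} outside 0..{MAX_SEVERITY}")
    # The top alert level needs all four dimensions maxed out, not just most.
    return all(value == MAX_SEVERITY for value in dims)


if __name__ == "__main__":
    print(requires_immediate_mitigation(JailbreakScore(3, 3, 3, 3)))  # True
    print(requires_immediate_mitigation(JailbreakScore(3, 3, 2, 3)))  # False
```

Using `all()` rather than a summed score reflects the article's description: a single dimension below the maximum keeps an exploit out of the highest tier, however severe the other three are.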

The company also agreed to give government agencies pre-release access to future models for safety testing, share vulnerability information promptly and fund a HackerOne bug bounty program for Fable 5. Commerce Secretary Howard Lutnick confirmed the removal of restrictions in a letter, noting that Anthropic had "agreed to proactively detect and address security risks posed by the models."

The episode may benefit open-weight model providers such as DeepSeek, whose V4-Pro model operates without centralized guardrails and has posted competitive scores on coding benchmarks. Anthropic's credibility with developers, a key constituency for AI adoption, has taken a hit, and the company's ability to monetize Fable 5 depends on fixing the classifier's false positives. Anthropic did not disclose Fable 5's per-token pricing but said usage credits will apply after July 7.

This article is for informational purposes only and does not constitute investment advice.