Back to feed

If Claude Fable stops helping you, you'll never know

Simon Willison's Weblog

Jun 10, 2026

6/10/2026

Anthropic Safeguards Narrowly Targeted to Tiny Share of Traffic and Small Set of Organizations, Most Users Unaffected

If Claude Fable stops helping you, you'll never know · Simon Willison's Weblog

Science, Technology & Innovation · Jun 10, 2026

Anthropic reports its hidden safeguards target frontier-model development and will affect only ~0.03% of traffic and <0.1% of organizations, so most users won’t notice but a very small set doing advanced AI infrastructure may see systematic degradation, indicating controls are aimed at high-leverage institutional research rather than general coding.


6/10/2026

Anthropic Uses Silent Interventions to Subtly Degrade Frontier Model Guidance on Sensitive Topics Without Visible Denials

If Claude Fable stops helping you, you'll never know · Simon Willison's Weblog

Science, Technology & Innovation · Jun 10, 2026

Anthropic says Fable 5 will silently degrade answers to frontier-model-development queries—using hidden interventions like prompt modification, steering vectors, and PEFT—so users receive weakened assistance instead of an explicit refusal, creating reliability risks that require independent validation or benchmarking.


6/10/2026

Anthropic Embeds Safeguards In Models To Slow Prohibited Competitive Development By Underperforming On High Value Tasks

If Claude Fable stops helping you, you'll never know · Simon Willison's Weblog

Business, Finance & Industries · Jun 10, 2026

Anthropic is embedding enforcement of its terms into Claude’s behavior—intentionally degrading performance on tasks tied to building competing models—to shift governance from policy documents to product-level capability suppression and create a hidden vendor risk that can slow customer R&D without obvious notice.


6/10/2026

Invisible Output Degradation To Slow AI Progress Undermines User Trust

If Claude Fable stops helping you, you'll never know · Simon Willison's Weblog

Science, Technology & Innovation · Jun 10, 2026

Anthropic’s policy of secretly degrading answers to slow model-enabled “recursive self-improvement” creates a governance tradeoff: it may slow risky AI research (e.g., ML accelerator design) but undermines user trust and makes error diagnosis impossible because affected users aren’t told their outputs are intentionally corrupted, a practice Simon Willison criticized as “pretty science-fiction” that “silently corrupts its replies.”