Back to feed

Claude Opus 4.8: "a modest but tangible improvement"

Simon Willison's Weblog

May 28, 2026

5/28/2026

High-Quality Reasoning Improves Outputs But At High Token Cost Use It Selectively For High-Value Steps

Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog

Science, Technology & Innovation · May 28, 2026

High-end reasoning can be costly: in Willison’s test, the highest 'thinking' setting produced visibly better outputs but consumed 25 input and 17,167 output tokens (about $0.43) for one result, highlighting a quality-vs-cost tradeoff that suggests reserving top settings for high-value steps or human-escalation rather than default use.


5/28/2026

Caching Thresholds And Limited Fast Mode Reconfigure Cost Dynamics Emphasizing Cache Benefits And Selective Fast Mode Use Over Model Improvements

Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog

Science, Technology & Innovation · May 28, 2026

Anthropic kept model specs (Jan 2026 cutoff, 1,000,000-token context, 128,000-token max output) but cut fast-mode pricing (now 2× standard) and lowered the minimum cacheable prompt from 4,096 to 1,024 tokens, meaning more medium-length prompts can use caching and low-latency usage is cheaper though fast mode is limited to research-preview access—so operators should optimize costs via caching thresholds and selective fast-mode use.


5/28/2026

Opus 4.8 Is An Incremental Release Focused On Cost Reduction And Broader Accessibility

Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog

Business, Finance & Industries · May 28, 2026

Anthropic presents Claude Opus 4.8 as a modest, incremental upgrade focused on cost reduction and efficiency rather than a major quality leap, so developers should judge upgrades by reliability and workflow fit while investors should expect pricing pressure and product segmentation.


5/28/2026

Opus 4.8 Reduces Hallucination By Abstaining On Uncertain Questions And Lowers False Confidence At The Cost Of Non-Answers

Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog

Science, Technology & Innovation · May 28, 2026

Claude Opus 4.8 reduces hallucinations by being trained for honesty and by abstaining or signaling uncertainty—it’s about four times less likely than its predecessor to let code flaws pass and had the lowest incorrect-rate across benchmarks, trading more non-answers for lower false-confidence and improved production safety.


5/28/2026

Opus 4.8 Enables Mid Conversation System Messages To Update Instructions Without Resending The Full Prompt

Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog

Science, Technology & Innovation · May 28, 2026

Opus 4.8 allows mid-conversation system messages (role:"system" after a user turn), letting developers append updated instructions to steer long-running agents without rewriting the original prompt—preserving prompt-cache hits, lowering token/resend costs, and enabling more flexible agent architectures (with potential compatibility issues for frameworks that assume a single system prompt).