Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog
Science, Technology & Innovation · May 28, 2026
High-end reasoning can be costly: in Willison’s test, the highest 'thinking' setting produced visibly better outputs but consumed 25 input and 17,167 output tokens (about $0.43) for one result, highlighting a quality-vs-cost tradeoff that suggests reserving top settings for high-value steps or human-escalation rather than default use.
Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog
Science, Technology & Innovation · May 28, 2026
Anthropic kept model specs (Jan 2026 cutoff, 1,000,000-token context, 128,000-token max output) but cut fast-mode pricing (now 2× standard) and lowered the minimum cacheable prompt from 4,096 to 1,024 tokens, meaning more medium-length prompts can use caching and low-latency usage is cheaper though fast mode is limited to research-preview access—so operators should optimize costs via caching thresholds and selective fast-mode use.
Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog
Business, Finance & Industries · May 28, 2026
Anthropic presents Claude Opus 4.8 as a modest, incremental upgrade focused on cost reduction and efficiency rather than a major quality leap, so developers should judge upgrades by reliability and workflow fit while investors should expect pricing pressure and product segmentation.
Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog
Science, Technology & Innovation · May 28, 2026
Claude Opus 4.8 reduces hallucinations by being trained for honesty and by abstaining or signaling uncertainty—it’s about four times less likely than its predecessor to let code flaws pass and had the lowest incorrect-rate across benchmarks, trading more non-answers for lower false-confidence and improved production safety.
Claude Opus 4.8: "a modest but tangible improvement" · Simon Willison's Weblog
Science, Technology & Innovation · May 28, 2026
Opus 4.8 allows mid-conversation system messages (role:"system" after a user turn), letting developers append updated instructions to steer long-running agents without rewriting the original prompt—preserving prompt-cache hits, lowering token/resend costs, and enabling more flexible agent architectures (with potential compatibility issues for frameworks that assume a single system prompt).