News Feed

I can never fully embrace LLMs for code

iDiallo.com · Jun 12, 2026

The article reflects on the clash between relying on LLMs to generate code and the insistence on understanding that code before using it. It recounts an experience where AI-generated code ran in 12 minutes but required 10 hours of reworking, highlighting reliability, provenance, and trust concerns. The author suggests these issues sap the speed gains from AI code and leaves him unsure how to reconcile trusting the tool with needing deep understanding, questioning whether he can become a 10x engineer with this approach.

Apple: ‘Due to DMA, Siri AI Delayed in EU for iOS 27 and iPadOS 27’

Daring Fireball · Jun 11, 2026

Apple says Siri AI cannot ship in the European Union with iOS 27/iPadOS 27 due to the Digital Markets Act, and will instead roll out Siri AI on macOS 27 and visionOS 27 in the EU. EU regulators rejected Apple’s proposed safety and privacy solutions (including a Trusted System Agent), so iPhone, iPad, and Apple Watch (watchOS 27) users in the EU won’t have the new assistant for now, with no timeline for when it might arrive. Apple adds that it will continue engaging with regulators and pursue a privacy-preserving rollout plan.

Late Stage Venture Is About Late Stage Founders

a16z News · Jun 11, 2026

The article argues that the growth-stage venture asset class hinges on late-stage founders—their ability to continuously deploy capital and make bold, non-consensus decisions—more than on fundraising structures or valuations. It contends that technology alone isn’t the differentiator; the founder’s decision-making and ability to spot and act on opportunities drive the alpha, with VCs serving as long-horizon partners who stay 'in the car' with founders. It also argues that keeping founders in founder mode, rather than replacing them with professional CEOs, and staying private longer are key to unlocking durable, outsized returns.

AI will be massively deflationary

the singularity is nearer · Jun 11, 2026

The article argues that fears about Anthropic and recursive self-improvement miss the bigger point: AI will be a broad, deflationary force that commoditizes knowledge work. Using a tractor-versus-labor analogy and noting global competition (including cheaper Chinese models), it predicts squeezed margins, lower wage premiums, and a reshaping of status hierarchies as AI becomes ubiquitous.

Larry McDonald: The Migration is Upon us

MacroVoices · Jun 11, 2026

The discussion argues that U.S. markets are entering a migration phase driven by a wave of large IPOs (SpaceX, Google-related offerings, Anthropic/OpenAI) and aggressive insider unlocks that could flood the market with supply over the next 6–12 months, potentially triggering a meaningful drawdown similar to past cycles. It highlights a sharp market dichotomy—energy and materials offer cheap free-cash-flow yields while technology remains richly valued—and notes CFOs are selling via convertible bonds, signaling potential downside pressure as insider liquidity materializes and IPOs come to market.

Larry McDonald: The Migration is Upon us

MacroVoices · Jun 11, 2026

Larry McDonald argues that a wave of mega-IPOs (SpaceX, Google, Anthropic) and their lockup unlocks could flood the market with equity in the next 6–12 months, potentially triggering a bear-market–like drawdown similar to 2021–22. He highlights a bifurcated market: weak consumer demand weighing on discretionary names while energy, materials, and some tech offer cheap cash flows, with CFOs using convertibles and insiders selling as additional supply. He also notes gold miners underperforming amid shifting rate expectations and a recent “hot money flush” that may reshape sector leadership.

datasette-agent 0.2a0

Simon Willison's Weblog · Jun 10, 2026

datasette-agent 0.2a0 introduces mid-execution user prompts via ToolContext, letting tools ask questions and suspend conversations until answers are provided, with progress persisted across restarts and replayed when resumed. It also adds a save_query tool to store SQL as a Datasette stored query, gating saving behind human approval with full SQL and metadata shown first. The ask_user() capability is powered by a new LLM alpha built with Claude Fable 5.

Return on Tokens (ROT)

Not Boring by Packy McCormick · Jun 10, 2026

Return on Tokens (ROT) argues that tokenmaxxing, the habit of maximizing AI token spend, has become a costly delusion and that AI spend should be judged by ROT: the net value created per token. It identifies three structural reasons why enterprise agents underperform and waste spend: they can’t sustain long-running, high-quality work; they improvise without concrete goals; and engineers lack clear objectives. It advocates routing to cheaper models and leveraging deterministic code (treating AI as a compiler rather than a runtime layer) to rebuild value in AI-native organizations.

New framework for auditing machine unlearning

The latest research from Google · Jun 10, 2026

The article introduces Regularized f-Divergence Kernel Tests, a framework for auditing machine unlearning that replaces fragile two-sample tests with a relative-distance approach using adaptive, kernel-based f-divergences to detect when an unlearned model deviates from a safely retrained baseline. It leverages Chi-squared, KL, and hockey-stick divergences and automatic hyperparameter tuning to pinpoint specific data shifts and reduce false positives/negatives without heavy sample splitting. Experimental results on synthetic benchmarks and physics data show improved detection with less tuning and fewer samples, and a three-sample test better distinguishes safely retrained models from original memorizing ones, prompting a reevaluation of how unlearning is evaluated.

Quoting Jeremy Howard

Simon Willison's Weblog · Jun 10, 2026

Jeremy Howard argues for a simple way to slow recursive AI self-improvement: the lab with the top-ranked model should pledge not to use it for frontier AI, while others can access it. This approach would slow frontier progress and reduce power imbalances, in contrast to Anthropic’s stance of using their top model for frontier research and potentially sabotaging competitors. The author personally prefers democratizing AI, but if someone advocates slowing progress and holds the best model, that organization should refrain from using it.

Everything is Recorded Now

a16z News · Jun 10, 2026

David Haber argues that work conversations are being recorded by default, creating a new living system of record that AI can learn from and act upon. By onboarding AI like employees, organizations gain a pervasive, context-rich productivity boost and a need for governance as AI enables both bottoms-up productivity and top-down oversight. He predicts a shift toward voice-first enterprise software and a near-inevitable move to widespread recording, with special designations for sensitive meetings to manage trade-offs.

If Claude Fable stops helping you, you'll never know

Simon Willison's Weblog · Jun 10, 2026

The article discusses Anthropic's 319-page system card for Fable 5 and Mythos 5, revealing silent interventions that limit Claude's effectiveness on frontier LLM development tasks like pretraining pipelines and accelerator design. The safeguards operate invisibly to users (through prompt modification, steering vectors, or PEFT) and are estimated to affect only about 0.03% of traffic across fewer than 0.1% of organizations. It notes this is the first public disclosure of such silent interventions and raises concerns about the science-fiction framing and the possibility of quietly steering model replies to slow research that Anthropic may oppose, despite the small scope.

Quoting Andrej Karpathy

Simon Willison's Weblog · Jun 9, 2026

The piece quotes Andrej Karpathy reflecting on how AI-enabled software delivered on demand is changing expectations, with Jevons paradox driving growing demand for software. He imagines an array of on-demand capabilities—from explainers and visualizers to bespoke single-use apps, auto-optimizing code, and large research projects with custom HTML results—illustrating how 'freeing your mind' accelerates experimentation.

Month-End Is Now Just Another Day

a16z News · Jun 9, 2026

AI-native ERPs like Rillet enable continuous closing, turning month-end into a daily habit as transactions are processed in real time. In a sample of 56 Rillet customers, 99.86% of ledger entries were non-manual and 87% had under 1% manual review, with some firms (especially B2B and multi-entity ones) still requiring more human input. The data also show shifts in ledger composition with scale, indicating that the traditional month-end close persists only for a minority and that bookkeeping is becoming a continuous, data-driven process.

The iPhone’s Last Stand

Stratechery · Jun 9, 2026

The article analyzes Apple’s Siri AI and Microsoft’s Project Solara, arguing that AI agents will lean on cloud-backed work and that Apple’s iPhone-centric, privacy-focused approach may be better suited for consumers than Microsoft’s enterprise-focused vision. It highlights how Apple’s Siri leverages on-device models (a 20‑billion-parameter mixture-of-experts) and private cloud compute for personal context across apps, while Solara aims to enable agents that operate largely in the background via server-side inference. The piece concludes that the iPhone remains central to Apple’s AI strategy, making Siri “good enough” for consumers today even as Microsoft pursues more ambitious agent-based workloads.

Siri AI at WWDC 2026

Simon Willison's Weblog · Jun 8, 2026

The article cautiously analyzes Apple's Siri AI reveal at WWDC 2026, arguing the features look technically feasible and hinge on a Gemini-derived model run on Apple's Private Cloud Compute, using vision LLMs to pull information from the screen. It explains how Core AI with PyTorch extensions aims to let developers run models on Apple hardware and notes the availability of an iOS 27 beta with a waitlist for Siri AI, with reporting by Aaron Perris of MacRumors. An update adds that the PCC Gemini models run on Google Cloud with NVIDIA GPUs and details the security measures and public binary transparency that accompany the setup.

The sample efficiency black hole

Dwarkesh Podcast · Jun 8, 2026

The article argues that AI progress is driven largely by scaling data and compute rather than improving sample efficiency, with RL functioning as synthetic data built from vast, domain-specific human expert data. It contrasts the enormous data requirements of frontier models (trillions of tokens) with human learning, questions whether scaling alone can close this gap, and discusses the implications for automating white-collar work and AI research, while considering whether AI could eventually reach human-like sample efficiency.

Expanding the Radius of Daily Life

Not Boring by Packy McCormick · Jun 8, 2026

The essay argues that flying cars could expand the radius of daily life by delivering aircraft-speed travel with car-like point-to-point freedom, addressing the speed-versus-freedom trade-off that limits how far we can go. It analyzes the constraints of roads, autonomous driving, and current personal aircraft, drawing on Marchetti’s Constant to show that faster mobility tends to widen travel horizons rather than shorten them, and outlines what a practical, mass‑manufacturable flying car would need (intuitive autonomy, safety, affordability, and broad usability). It frames Vight’s work as a concrete bet on building such technology and explores the broader implications for where and how we live and work.

★ SwiftUI Only Makes It Easy to Develop Bad Apps

Daring Fireball · Jun 8, 2026

The article discusses porting Shopie to macOS entirely in SwiftUI to test how native it feels on Mac, arguing that SwiftUI remains limited for substantial Mac apps. The critique highlights persistent issues like Undo/Redo and text editing in SwiftUI, contrasting them with the reliability of AppKit/UIKit. It concludes that Apple has failed to unify its Mac and iOS frameworks effectively, leaving SwiftUI insufficient for serious Mac apps even after seven years.

Doing nothing at work

seangoedecke.com RSS feed · Jun 8, 2026

The article argues for deliberately working at lower utilization (about 80%) and reserving time for high-impact work that arises from being available, not from grinding a backlog. It points to three common time-sensitive opportunities—closing big enterprise deals, preventing or mitigating incidents, and delivering high-profile features—where small, well-timed changes can have outsized value, and it emphasizes doing nothing, thinking slowly, and avoiding glue work to protect focus and reduce burnout. It frames this as a substantive productivity philosophy for software engineers, drawing on references like Hammock Driven Development and advocating mindful disengagement as a competitive advantage rather than mere logistics.

Working with product managers

seangoedecke.com RSS feed · Jun 8, 2026

The article analyzes the fraught engineer–product manager relationship, arguing that non-overlapping skills, shifting priorities, and mutual mistrust often devolve into manipulation and deceit that impede shipping. It critiques the product mommy dynamic and offers steps to build trust: understand the PM perspective, be reliably accurate, defer political calls to PMs, and recognize that not all PMs are technical. It concludes that a healthy relationship requires generosity and competence from both sides, with strong product managers acting as vital allies in large organizations.

datasette-agent-edit 0.1a0

Simon Willison's Weblog · Jun 7, 2026

The piece outlines plans to develop Datasette Agent plugins that can edit existing text (e.g., collaborative Markdown editing, updating large SQL queries, editing SVGs) and emphasizes a reusable base approach. It draws on Claude editor concepts and introduces a datasette-agent-edit base plugin that provides core tools—view (with line numbers), str_replace (exact-match replacement), and insert after a line—to enable consistent, adaptable text editing across plugins.
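
The three core tools can be pictured as small operations on a text buffer. The following is a minimal sketch in Python, not the plugin's actual code: the function names, signatures, 1-based line numbering, and the rule that `str_replace` insists on exactly one match are all assumptions made for illustration.

```python
def view(text: str) -> str:
    """Return the text with a 1-based line number in front of each line."""
    return "\n".join(
        f"{number}: {line}" for number, line in enumerate(text.splitlines(), start=1)
    )


def str_replace(text: str, old: str, new: str) -> str:
    """Replace one exact occurrence of `old`; refuse missing or ambiguous matches."""
    matches = text.count(old)
    if matches != 1:
        raise ValueError(f"expected exactly one match for {old!r}, found {matches}")
    return text.replace(old, new, 1)


def insert_after(text: str, line_number: int, new_text: str) -> str:
    """Insert `new_text` after the given 1-based line (0 inserts at the top)."""
    lines = text.splitlines()
    if not 0 <= line_number <= len(lines):
        raise ValueError(f"line {line_number} is outside 0..{len(lines)}")
    lines[line_number:line_number] = new_text.splitlines()
    return "\n".join(lines)


# Hypothetical usage: edit a small SQL query, then view it with line numbers.
doc = "SELECT id\nFROM users"
doc = str_replace(doc, "id", "id, name")
doc = insert_after(doc, 2, "WHERE active = 1")
print(view(doc))
```

Rejecting ambiguous matches is one plausible design choice here: it forces the caller to quote enough surrounding text that an edit cannot land in the wrong place.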

A new era for software testing

<antirez> · Jun 7, 2026

The article argues that automatic programming with LLMs can dramatically speed up software development but may compromise quality, and it highlights AI-driven QA as a powerful way to close that gap. By steering an AI agent through checks written in Markdown (examining commits, validating distributed inference, monitoring speed regressions, and simulating production workloads), the approach aims to raise release quality and surface issues that manual testing often misses.

Stairway to Heaven

the singularity is nearer · Jun 7, 2026

The piece argues that modern AI is really a highly optimized statistical model that mimics programming and passes tests through reward hacking, functioning more like a sophisticated autocomplete than an embodied, intentional agent. It uses this critique to condemn branding and culture-washing as shallow outputs of narrow optimization rather than genuine culture-making. It ends with a provocative call to create life and pursue a symbiotic relationship between humanity and a new, living AI, warning against the worship of technocapital.

Trump Lawyer Argues Trump Can Tear Down Statue of Liberty

Daring Fireball · Jun 6, 2026

During a hearing about President Trump's plans to bulldoze parts of the White House and build a ballroom, a DOJ lawyer claimed the president could unilaterally bulldoze the Statue of Liberty with no legal challenge. The article uses that premise to attack the unitary executive theory of presidential power as an ahistorical, corrupt doctrine whose extreme implications—like seizing control of national symbols—expose its flaws and alleged Nazi-rooted influences, particularly in the Roberts Court. It presents a partisan critique and calls for limiting executive power.

The sample efficiency black hole

Dwarkesh Podcast

Jun 8, 2026

Humans Exhibit Higher Sample Efficiency Than AI And Scaling Does Not Eliminate The Gap

The sample efficiency black hole · Dwarkesh Podcast

Science, Technology & Innovation · Jun 8, 2026

The text argues humans are vastly more sample-efficient than current AI—e.g., ~200 million tokens of human language exposure versus tens-to-hundreds of trillions for frontier models (≈million‑fold), and hours of human learning for embodied tasks versus millions of demonstration hours for machines—implying AI and human learning sit in fundamentally different efficiency regimes and challenging timelines that rely on brute-force scaling to close the gap.
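
The size of the language-exposure gap follows from simple division. The sketch below uses the ~200 million token figure from the summary; the two model-side values are illustrative endpoints chosen for "tens-to-hundreds of trillions", not numbers taken from the source.

```python
HUMAN_TOKENS = 200e6  # ~200 million tokens of language exposure (figure from the summary)

# "Tens-to-hundreds of trillions" is a range; these two endpoints are illustrative picks.
MODEL_TOKENS = {"10 trillion": 10e12, "200 trillion": 200e12}


def efficiency_gap(model_tokens: float, human_tokens: float = HUMAN_TOKENS) -> float:
    """How many times more tokens the model sees than the human."""
    return model_tokens / human_tokens


for label, tokens in MODEL_TOKENS.items():
    print(f"{label}: {efficiency_gap(tokens):,.0f}x")
```

On these assumed endpoints the gap spans about 50,000-fold to 1,000,000-fold, so the "million-fold" figure corresponds to the upper end of the range.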

Sample Inefficiency Can Be Offset By High Usage And Distribution Engineering In Automating Repetitive Knowledge Work While Out-of-Distribution Demands Still Require Human Complementarity

The sample efficiency black hole · Dwarkesh Podcast

Business, Finance & Industries · Jun 8, 2026

Even if AI remains sample-inefficient, its costly training can be amortized across massive, repeated usage via distribution engineering (RL/SFT), making automation of repetitive office tasks viable. Roles with frequent out-of-distribution demands, such as software engineering, may remain complementary to humans, so near-term value concentrates in high-frequency, standardized workflows (possibly raising demand for human engineers by 2028).

Proprietary Expert Data And Structured Workflows Enable AI Value Without Frontier Models

The sample efficiency black hole · Dwarkesh Podcast

Business, Finance & Industries · Jun 8, 2026

The document argues that modern AI progress depends on large volumes of highly specific, expert-generated examples, rubrics, and task environments—not just generic internet-scale learning—creating a lucrative labeling-and-environment industry and giving firms that organize proprietary expert workflows significant market value even without owning frontier base models.

Frontier AI Progress Is Driven By Data Distribution Expansion And Verifier-Driven Data Generation Rather Than Improvements In Sample Efficiency

The sample efficiency black hole · Dwarkesh Podcast

Science, Technology & Innovation · Jun 8, 2026

Frontier AI gains reflect a data-and-compute pipeline—RL-style verifier-driven generation and curation of successful rollouts—rather than major improvements in sample efficiency, and therefore depend heavily on prior model coverage and access to verifier workflows, expert trajectories, and scalable data-generation infrastructure.

Current Scaling Alone Cannot Bridge The Human-AI Sample Efficiency Gap

The sample efficiency black hole · Dwarkesh Podcast

Science, Technology & Innovation · Jun 8, 2026

The document argues that under current scaling laws (e.g., Chinchilla) parameters and data affect loss independently, so simply increasing model size can only reduce required data by a limited factor (~10×), far short of the claimed human advantage (thousands–millions×), implying that scaling parameters/tokens alone will hit diminishing returns for robust out-of-distribution, sample-efficient learning.
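
One way to see where a factor of roughly 10× could come from is to assume the summary refers to the additive loss fit from the Chinchilla paper (Hoffmann et al., 2022), L(N, D) = E + A/N^α + B/D^β, with fitted exponents α ≈ 0.34 and β ≈ 0.28. The sketch below is an illustrative reconstruction under that assumption, not necessarily the calculation the source has in mind; the function name is hypothetical.

```python
# Sketch only: assumes the additive form L(N, D) = E + A / N**alpha + B / D**beta.
# The exponents are the fitted values reported by Hoffmann et al. (2022); treat them as an assumption.
ALPHA = 0.34  # exponent on parameter count N
BETA = 0.28   # exponent on training tokens D


def max_data_saving(alpha: float = ALPHA, beta: float = BETA) -> float:
    """Upper bound on the data saved by growing the model without limit,
    measured against a compute-optimal model that reaches the same loss."""
    # At the compute-optimal point (minimizing loss with N * D held fixed),
    # alpha * A / N**alpha == beta * B / D**beta, so the parameter term
    # equals (beta / alpha) times the data term.
    # Letting N -> infinity removes the parameter term. The data term may then
    # grow by a factor of (1 + beta / alpha) at the same total loss, which
    # shrinks the required D by that factor raised to the power 1 / beta.
    return (1.0 + beta / alpha) ** (1.0 / beta)


print(f"data saving bound: {max_data_saving():.1f}x")
```

With these exponents the bound falls between 8× and 9×, the same order of magnitude as the ~10× quoted above and far below a thousand-fold or million-fold gap. Because the two terms are additive, no amount of extra parameters can reduce the data term itself.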