The sample efficiency black hole

Dwarkesh Podcast

Jun 8, 2026

Humans Exhibit Higher Sample Efficiency Than AI And Scaling Does Not Eliminate The Gap

Science, Technology & Innovation · Jun 8, 2026

The text argues that humans are vastly more sample-efficient than current AI. For example, a human is exposed to roughly 200 million tokens of language, versus tens to hundreds of trillions for frontier models (a gap of roughly 50,000× to a few million×, on the order of a million-fold), and humans learn embodied tasks in hours, whereas machines need millions of hours of demonstrations. This implies that AI and human learning sit in fundamentally different efficiency regimes, and it challenges timelines that rely on brute-force scaling to close the gap.
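
The language comparison is a simple ratio. The minimal Python sketch below takes the ~200 million human tokens from the summary and three hypothetical model token counts chosen to span "tens to hundreds of trillions"; HUMAN_TOKENS and the chosen points are illustrative assumptions, not figures from the episode.

```python
HUMAN_TOKENS = 200e6  # ~200 million tokens of human language exposure (from the summary)

# Hypothetical points spanning "tens to hundreds of trillions" of training tokens.
MODEL_TOKENS = (10e12, 100e12, 500e12)

# Ratio of model training tokens to human exposure for each point.
ratios = {t: t / HUMAN_TOKENS for t in MODEL_TOKENS}

for tokens, ratio in ratios.items():
    print(f"{tokens:.0e} tokens -> {ratio:,.0f}x the human exposure")
```

So the stated range covers roughly 50,000× at the low end to a few million× at the high end, with "million-fold" sitting in the upper part of that span.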


Sample Inefficiency Can Be Offset By High Usage And Distribution Engineering In Automating Repetitive Knowledge Work While Out-of-Distribution Demands Still Require Human Complementarity

Business, Finance & Industries · Jun 8, 2026

Even if AI remains sample-inefficient, its costly training can be amortized across massive, repeated usage, and distribution engineering (RL/SFT) can bring target tasks in-distribution. Together these make automating repetitive office tasks viable. Roles with frequent out-of-distribution demands, like software engineering, may instead remain complementary to humans, so near-term value would concentrate in high-frequency, standardized workflows (possibly even raising demand for human engineers by 2028).
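
The amortization argument can be sketched as a break-even calculation: a one-off training cost spread over n uses, plus a per-use inference cost, compared against a per-task human cost. All dollar figures and the helper names cost_per_task and break_even_tasks are hypothetical, chosen only to show the shape of the trade-off.

```python
import math

def cost_per_task(training_cost: float, inference_cost: float, n_tasks: int) -> float:
    """Average cost of one task once the training cost is spread over n_tasks uses."""
    return training_cost / n_tasks + inference_cost

def break_even_tasks(training_cost: float, inference_cost: float, human_cost: float) -> float:
    """Smallest number of tasks at which automation costs no more per task than a human.
    Returns math.inf when inference alone is at least as expensive as the human."""
    margin = human_cost - inference_cost  # saving per task before training is paid off
    if margin <= 0:
        return math.inf
    return math.ceil(training_cost / margin)

# Hypothetical numbers for illustration only.
TRAINING = 10_000_000.0  # one-off cost of RL/SFT on a workflow, in dollars
INFERENCE = 0.50         # model cost per task, in dollars
HUMAN = 20.00            # human cost per task, in dollars

for n in (1_000, 100_000, 1_000_000):
    print(n, round(cost_per_task(TRAINING, INFERENCE, n), 2))
print("break-even tasks:", break_even_tasks(TRAINING, INFERENCE, HUMAN))
```

With these made-up numbers, automation only beats the human after roughly 513,000 tasks. A workflow that rarely repeats, the out-of-distribution case, never accumulates enough uses to pay back its training, which is the asymmetry the summary describes.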


Proprietary Expert Data And Structured Workflows Enable AI Value Without Frontier Models

Business, Finance & Industries · Jun 8, 2026

The document argues that modern AI progress depends on large volumes of highly specific, expert-generated examples, rubrics, and task environments, not just generic internet-scale learning. This creates a lucrative labeling-and-environment industry and gives firms that organize proprietary expert workflows significant market value even without owning frontier base models.


Frontier AI Progress Is Driven By Data Distribution Expansion And Verifier-Driven Data Generation Rather Than Improvements In Sample Efficiency

Science, Technology & Innovation · Jun 8, 2026

Frontier AI gains reflect a data-and-compute pipeline (RL-style, verifier-driven generation of rollouts followed by curation of the successful ones) rather than major improvements in sample efficiency. They therefore depend heavily on how well the prior model already covers the target tasks, and on access to verifier workflows, expert trajectories, and scalable data-generation infrastructure.
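
One common form of verifier-driven data generation is rejection sampling: draw several rollouts per task, keep only those a verifier accepts, and train on the survivors. The toy sketch below assumes that form; toy_policy, verifier, generate_training_data, and the addition tasks are hypothetical stand-ins, not the pipeline described in the episode. It also shows why such a pipeline depends on prior coverage: a task the model never solves yields no data.

```python
import random

Task = tuple[int, int]  # hypothetical toy task: add two integers

def toy_policy(task: Task, coverage: float, rng: random.Random) -> int:
    """Hypothetical stand-in for a model: correct with probability `coverage`,
    otherwise off by a small nonzero amount."""
    a, b = task
    if rng.random() < coverage:
        return a + b
    return a + b + rng.choice([-2, -1, 1, 2])

def verifier(task: Task, answer: int) -> bool:
    """Programmatic check of one rollout; here, exact arithmetic."""
    a, b = task
    return answer == a + b

def generate_training_data(tasks: list[Task], coverage: float,
                           samples_per_task: int, seed: int = 0) -> list[tuple[Task, int]]:
    """Rejection sampling: draw up to `samples_per_task` rollouts per task and keep
    the first one the verifier accepts. Unsolved tasks contribute nothing."""
    rng = random.Random(seed)
    kept: list[tuple[Task, int]] = []
    for task in tasks:
        for _ in range(samples_per_task):
            answer = toy_policy(task, coverage, rng)
            if verifier(task, answer):
                kept.append((task, answer))
                break
    return kept

tasks = [(i, 2 * i) for i in range(100)]
for coverage in (0.0, 0.05, 0.3):
    data = generate_training_data(tasks, coverage, samples_per_task=8)
    print(f"coverage={coverage:.2f}: {len(data)}/{len(tasks)} tasks yield training data")
```

At coverage 0.0 nothing survives the verifier, so there is nothing to train on; raising either the model's coverage or samples_per_task raises the yield, which is why both the starting model and the sampling compute matter.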


Current Scaling Alone Cannot Bridge The Human-AI Sample Efficiency Gap

Science, Technology & Innovation · Jun 8, 2026

The document argues that under current scaling laws (e.g., Chinchilla), parameters and data reduce loss through separate, additive terms, so increasing model size alone can cut the data required by only a limited factor (~10×). That is far short of the claimed human advantage (thousands- to millions-fold), implying that scaling parameters and tokens alone will hit diminishing returns for robust, out-of-distribution, sample-efficient learning.
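
The Chinchilla-style argument can be made concrete. Under a fit of the form L(N, D) = E + A/N^α + B/D^β, going to infinitely many parameters removes only the A/N^α term, so the data saving from a starting point (N, D) is capped at (1 + (A/N^α)/(B/D^β))^(1/β). The sketch below assumes the constants published with the Chinchilla parametric fit (Hoffmann et al., 2022) and a 70B-parameter, 1.4T-token starting point; the function names are illustrative.

```python
import math

# Parametric loss fit L(N, D) = E + A / N**ALPHA + B / D**BETA.
# Constants assumed from the Hoffmann et al. (2022) Chinchilla fit, for illustration.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

def tokens_needed(target_loss: float, n_params: float = math.inf) -> float:
    """Tokens required to reach target_loss at a given model size.
    With n_params = inf the parameter term vanishes, leaving the data floor."""
    param_term = 0.0 if math.isinf(n_params) else A / n_params**ALPHA
    data_budget = target_loss - E - param_term  # loss the data term may still contribute
    if data_budget <= 0:
        return math.inf  # unreachable at this model size, however much data
    return (B / data_budget) ** (1 / BETA)

N, D = 70e9, 1.4e12  # a Chinchilla-sized starting point
target = loss(N, D)
saving = D / tokens_needed(target)  # data saved by moving to infinitely many parameters
print(f"target loss {target:.3f}; infinite parameters cut data by only {saving:.1f}x")
```

For this starting point the cap comes out at about 4.4×. The exact factor moves with (N, D), but near compute-optimal starting points it stays a modest multiplier, far below the thousands- to millions-fold gap the document attributes to humans.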