2 Historical Roots (1950–2015)

TL;DR

AI safety is not new. The problem was named decades before the methods existed.
Control & alignment (Wiener, Good): a machine optimizing a literal goal faster than we can follow.
System safety (Leveson): safety is a system property, not a component one.
Philosophy & x-risk (Omohundro, Bostrom): capable agents converge on self-preservation regardless of goal.
All three hand off to Concrete Problems in AI Safety (2016), where the empirical era begins.

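AI safety is not new: the problem was named decades before the methods to address it existed. Three lineages converge into today’s field: control and alignment, system-safety engineering, and the philosophy of existential risk.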

2.1 Control & alignment

Wiener gave the first clear statement of the control problem: a machine that optimizes a literal goal faster than we can follow is one that “we may not know, until too late, when to turn off” (Wiener, 1960). Good added the intelligence explosion: an ultraintelligent machine would be the last invention humanity needs to make, “provided the machine is docile enough… to keep under control” (Good, 1965). That argument is the direct ancestor of today’s concerns about recursive self-improvement.

Key idea

The alignment problem predates deep learning by half a century. Wiener’s control problem and Good’s intelligence explosion are the direct ancestors of today’s scalable oversight and recursive self-improvement concerns. The methods changed; the problem did not.

2.2 System-safety engineering

Well before safety became an ML research topic, Leveson established that safety is a system property, not a component one: it comes from end-to-end hazard analysis of the whole system, not from a safe algorithm in isolation (Leveson, 1995). This is the root of today’s safety cases and defense-in-depth (Dobbe, 2022).

2.3 Philosophy & x-risk

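Omohundro’s account of basic AI drives (Omohundro, 2008) and Bostrom’s instrumental convergence thesis (Bostrom, 2014) both argue that sufficiently capable agents converge on self-preservation and resource acquisition regardless of their final goal, because these subgoals are useful for almost any objective.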

Note

Cultural background. Fiction framed these ideas long before the research: Asimov’s Three Laws of Robotics (1942), constraint-based safety; HAL 9000 in 2001: A Space Odyssey (1968), an agent pursuing its objective to lethal ends; Skynet in The Terminator (1984), loss of control and runaway capability; Ex Machina (2014), containment and deceptive alignment. These are intuition pumps, not engineering, but they shaped how the public frames every problem in this book.

2.4 Handoff to the empirical era

These threads hand off to Concrete Problems in AI Safety (Amodei et al., 2016), which reframed them as tractable empirical ML problems. This is where the modern landscape and its timeline begin.

Important

The 2016 shift from philosophical to empirical framing is what made the field tractable, and what makes every later chapter possible. Problems that were arguments became problems that could be measured, benchmarked, and defended against.