13 Frontier

Recent work (roughly the last 12 months), newest first. Entries graduate into a topic chapter’s chronological timeline (Part II) once they are superseded or have matured, which keeps the book continuously evolving.

Living chapter. Part III is the book’s leading edge and the most volatile part of it. The concepts below are framed to outlast the specific papers under them, but the frontier itself will move: by this time next year this chapter will likely carry new concepts, and today’s will have matured into Part II. Treat it as a snapshot of where the open problems sit now, not a fixed table of contents.

13.1 Cross-cutting concepts

Rather than tracking individual approaches, Part III organizes the frontier around a few concepts, each broad enough to absorb several fast-moving research lines and to span more than one topic chapter. Specific methods (debate, chain-of-thought monitoring, sleeper-agent probes, …) are instances of these concepts and migrate to their topic chapters as they settle.

Oversight that holds under adversarial pressure — the unifying question behind AI control (Greenblatt et al., 2023), deceptive alignment and scheming (Hubinger et al., 2024), and chain-of-thought monitorability: can an oversight signal survive a model that is actively trying to defeat it? Spans Monitoring & Oversight, Alignment, and Interpretability.
Verification under a capability gap — when systems exceed human judgment, how is any claim about them checked? Subsumes scalable oversight, debate, decomposition, dangerous-capability evals, and safety cases (Hilton et al., 2025). Spans Monitoring & Oversight, Evaluation, and Systemic Safety & Governance.
Self-accelerating capability — automated AI R&D and recursive self-improvement compress the time available for oversight and amplify every other risk; the headline reason capability thresholds and frontier frameworks exist. Spans Monitoring & Oversight and Systemic Safety & Governance.
Compounding agentic attack surface — autonomy plus tool-use turns isolated vulnerabilities (injection, memory poisoning, inter-agent trust) into propagating, multi-step compromises. Spans Agentic Safety × Security and Robustness & Security.

13.2 Intake queue

Dated entries below, newest first. Each entry gives a one-line claim, why it matters, the concept it lands under, and its target topic. (Currently empty; populated during the continuous literature scan.)