6 Alignment
7 Alignment
Stub. Chronological deep-dive: alignment theory → reward modeling → RLHF/RLAIF → Constitutional AI → scalable oversight → alignment of agentic, tool-using systems. Capture core concepts, formalizations, and impact rather than reproducing source text.