8 Interpretability & Explainability
Stub. Chronological deep-dive: saliency/attribution → probing → mechanistic interpretability (circuits, features, causal tracing) → interpreting agent reasoning traces and tool-use decisions. Capture formalizations and illustrations.