8  Interpretability & Explainability

Stub. Chronological deep-dive: saliency/attribution → probing → mechanistic interpretability (circuits, features, causal tracing) → interpreting agent reasoning traces and tool-use decisions. Capture formalizations and illustrations.
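
As a possible illustration for the first stage of the outline (saliency/attribution), here is a minimal sketch of two gradient-based attributions on a toy logistic model. Everything in it is an assumption made for illustration: the model, the weights `W`, the bias `B`, the input `X`, and the function names are hypothetical and are not taken from any particular library or paper. Vanilla saliency is taken here to be the absolute input gradient, |∂f/∂x_i|, and gradient × input to be x_i · ∂f/∂x_i.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, b):
    """Toy logistic model f(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(x, w, b):
    """Analytic input gradient: df/dx_i = f(x) * (1 - f(x)) * w_i."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def saliency(x, w, b):
    """Vanilla saliency: magnitude of each input gradient."""
    return [abs(g) for g in gradient(x, w, b)]


def gradient_times_input(x, w, b):
    """Gradient x input: signed attribution scaled by the feature value."""
    return [g * xi for g, xi in zip(gradient(x, w, b), x)]


# Hypothetical weights, bias, and input, chosen only for illustration.
W = [2.0, -1.0, 0.0]
B = 0.5
X = [1.0, 3.0, 5.0]

if __name__ == "__main__":
    print("saliency:        ", saliency(X, W, B))
    print("gradient x input:", gradient_times_input(X, W, B))
```

In this toy setup the third feature has zero weight, so both attributions assign it zero even though its input value is the largest. The second feature has the smaller gradient magnitude but the larger gradient × input magnitude, which shows how the two scores can rank features differently.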