13 Tools & Frameworks
Stub. This section covers the applied layer: the tools practitioners actually deploy. Each entry should record what the tool does, where it sits in the attack surface, and how mature it is. To catalog:
- Guardrails / runtime — NVIDIA NeMo Guardrails, Llama Guard (Inan et al., 2023), ShieldGemma, Guardrails AI, Constitutional Classifiers (Sharma et al., 2025), interface firewalls (Huang et al., 2025).
- Evaluation harnesses — Inspect (UK AISI), AgentDojo (Debenedetti et al., 2024), Agent Security Bench (Zhang et al., 2024), HELM, lm-evaluation-harness.
- Red-teaming / attack — garak, PyRIT, Giskard.
- Interpretability — TransformerLens, Gemma Scope SAEs (Lieberum et al., 2024).
- Provenance / content authenticity — C2PA, watermarking toolkits.
Selection criteria, integration patterns, and a maturity rating per tool to follow.
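Until those ratings exist, the catalog could be kept as structured records so that missing fields are easy to spot. The following is a minimal sketch, not a settled schema: the names `Category`, `ToolEntry`, `by_category`, and `unrated` are hypothetical, the field names simply mirror the three attributes listed above, and every descriptive field is deliberately left empty because this stub does not yet supply the values.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    """The five groupings used in the list above."""
    GUARDRAILS = "Guardrails / runtime"
    EVALUATION = "Evaluation harnesses"
    RED_TEAMING = "Red-teaming / attack"
    INTERPRETABILITY = "Interpretability"
    PROVENANCE = "Provenance / content authenticity"


@dataclass(frozen=True)
class ToolEntry:
    """One catalog row (hypothetical schema). None means 'not yet written'."""
    name: str
    category: Category
    what_it_does: Optional[str] = None
    attack_surface: Optional[str] = None
    maturity: Optional[str] = None  # rating scale is still to be defined


# Tool names copied from the list above; no descriptions or ratings are invented.
NAMES = {
    Category.GUARDRAILS: [
        "NVIDIA NeMo Guardrails", "Llama Guard", "ShieldGemma",
        "Guardrails AI", "Constitutional Classifiers", "interface firewalls",
    ],
    Category.EVALUATION: [
        "Inspect", "AgentDojo", "Agent Security Bench",
        "HELM", "lm-evaluation-harness",
    ],
    Category.RED_TEAMING: ["garak", "PyRIT", "Giskard"],
    Category.INTERPRETABILITY: ["TransformerLens", "Gemma Scope SAEs"],
    Category.PROVENANCE: ["C2PA", "watermarking toolkits"],
}

CATALOG = [ToolEntry(name, cat) for cat, names in NAMES.items() for name in names]


def by_category(catalog, category):
    """Return the names of all entries in one category, in list order."""
    return [e.name for e in catalog if e.category is category]


def unrated(catalog):
    """Return the names of entries that still lack a maturity rating."""
    return [e.name for e in catalog if e.maturity is None]


if __name__ == "__main__":
    print(by_category(CATALOG, Category.RED_TEAMING))
    print(f"{len(unrated(CATALOG))} of {len(CATALOG)} entries still unrated")
```

Keeping unfilled fields as `None` rather than placeholder text means a helper like `unrated` can report exactly which entries still need work as the section is filled in.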