13 Tools & Frameworks

Stub. This chapter covers the applied layer: the tools practitioners actually deploy. Each catalog entry should record what the tool does, where it sits in the attack surface, and how mature it is. Tools to cover:

Guardrails / runtime — NVIDIA NeMo Guardrails, Llama Guard (Inan et al., 2023), ShieldGemma, Guardrails AI, Constitutional Classifiers (Sharma et al., 2025), interface firewalls (Huang et al., 2025).

Evaluation harnesses — Inspect (UK AISI), AgentDojo (Debenedetti et al., 2024), Agent Security Bench (Zhang et al., 2024), HELM, lm-evaluation-harness.

Red-teaming / attack — garak, PyRIT, Giskard.

Interpretability — TransformerLens, Gemma Scope SAEs (Lieberum et al., 2024).

Provenance / content authenticity — C2PA, watermarking toolkits.

Selection criteria, integration patterns, and a maturity rating per tool to follow.
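As a starting point for the catalog, the sketch below shows one way the entries could be represented. It is only an illustration under stated assumptions: the names `Category`, `ToolEntry`, `CATALOG`, `by_category`, and `unrated` are hypothetical, and the descriptive fields (`summary`, `attack_surface`, `maturity`) are deliberately left empty because this chapter has not yet assigned them. Only the tool names, categories, and citations come from the lists above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Category(Enum):
    """Catalog categories, taken from the headings in this chapter."""
    GUARDRAILS = "Guardrails / runtime"
    EVALUATION = "Evaluation harnesses"
    RED_TEAMING = "Red-teaming / attack"
    INTERPRETABILITY = "Interpretability"
    PROVENANCE = "Provenance / content authenticity"


@dataclass(frozen=True)
class ToolEntry:
    """One catalog row. Field names are hypothetical, not a fixed schema."""
    name: str
    category: Category
    citation: Optional[str] = None        # inline citation, where the text gives one
    summary: Optional[str] = None         # what the tool does (not yet written)
    attack_surface: Optional[str] = None  # where it sits (not yet written)
    maturity: Optional[str] = None        # rating scale is not defined yet


# Entries mirror the lists in this chapter; nothing beyond them is asserted.
CATALOG: List[ToolEntry] = [
    ToolEntry("NVIDIA NeMo Guardrails", Category.GUARDRAILS),
    ToolEntry("Llama Guard", Category.GUARDRAILS, "Inan et al., 2023"),
    ToolEntry("ShieldGemma", Category.GUARDRAILS),
    ToolEntry("Guardrails AI", Category.GUARDRAILS),
    ToolEntry("Constitutional Classifiers", Category.GUARDRAILS, "Sharma et al., 2025"),
    ToolEntry("interface firewalls", Category.GUARDRAILS, "Huang et al., 2025"),
    ToolEntry("Inspect (UK AISI)", Category.EVALUATION),
    ToolEntry("AgentDojo", Category.EVALUATION, "Debenedetti et al., 2024"),
    ToolEntry("Agent Security Bench", Category.EVALUATION, "Zhang et al., 2024"),
    ToolEntry("HELM", Category.EVALUATION),
    ToolEntry("lm-evaluation-harness", Category.EVALUATION),
    ToolEntry("garak", Category.RED_TEAMING),
    ToolEntry("PyRIT", Category.RED_TEAMING),
    ToolEntry("Giskard", Category.RED_TEAMING),
    ToolEntry("TransformerLens", Category.INTERPRETABILITY),
    ToolEntry("Gemma Scope SAEs", Category.INTERPRETABILITY, "Lieberum et al., 2024"),
    ToolEntry("C2PA", Category.PROVENANCE),
    ToolEntry("watermarking toolkits", Category.PROVENANCE),
]


def by_category(catalog: List[ToolEntry]) -> Dict[Category, List[str]]:
    """Group tool names by category, preserving catalog order."""
    grouped: Dict[Category, List[str]] = {c: [] for c in Category}
    for entry in catalog:
        grouped[entry.category].append(entry.name)
    return grouped


def unrated(catalog: List[ToolEntry]) -> List[str]:
    """Names of tools that still lack a maturity rating."""
    return [entry.name for entry in catalog if entry.maturity is None]


if __name__ == "__main__":
    for category, names in by_category(CATALOG).items():
        print(f"{category.value}: {', '.join(names)}")
    print(f"Still unrated: {len(unrated(CATALOG))} of {len(CATALOG)}")
```

Keeping the unfilled fields as `None` rather than guessing values makes the remaining work explicit: `unrated(CATALOG)` lists every tool that still needs a maturity rating once a scale is chosen.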