5 Agentic Systems: The Bridge

TL;DR

A model maps input to output. An agent wraps it in a loop with autonomy, tools, memory, and an environment.
Each of those four is simultaneously a capability and an attack surface. That is the whole thesis.
Autonomy is not one dial but seven dimensions (input trust, action power, memory, …) that jointly set risk.
Trust does not compose: safe components can assemble into an unsafe multi-agent system.

The object the whole book is about. An agent is where autonomy (a safety problem) meets external action (a security problem) — so it is the seam, not a side topic.

agent vs. model — what the promotion adds
the perceive–plan–act loop and its persistent state
interfaces as capabilities and attack surfaces
multi-agent systems and trust boundaries

5.1 Agent vs. model

A model maps input to output: \(y = f(x)\). An agent wraps a model in a loop that carries state and acts on the world:

\[ a_t = \pi\big(o_t,\, m_t;\, \mathcal{T}\big), \qquad m_{t+1} = u(m_t, o_t, a_t), \]

where \(o_t\) is the observation, \(m_t\) the memory/state, \(\mathcal{T}\) the available tools, \(a_t\) the action, and \(u\) the memory update. The promotion from \(f\) to \(\pi\) adds four things — autonomy, tools, memory, and an environment — and every one is simultaneously a capability and a new attack surface.

Key idea

The promotion from \(f\) to \(\pi\) is the entire subject of this book. A model can only say something wrong; an agent can do something wrong, and then act on the consequences.

5.2 Autonomy is multi-dimensional

Autonomy is not a single dial. Following the agentic design space of Kim et al. (2026), an agent’s risk profile is set by seven dimensions:

Dimension	Low ↔︎ High autonomy
Input trust	trusted prompt ↔︎ untrusted web/tool content
Data access	public ↔︎ sensitive/private
Workflow	fixed, deterministic ↔︎ open, self-directed
Action power	read-only ↔︎ irreversible side effects
Memory	stateless ↔︎ persistent across sessions
Tools	none ↔︎ arbitrary code/API execution
Interface	text reply ↔︎ control of other systems

Where an agent sits on these axes — not its raw model capability — determines how much oversight and containment it needs.

5.3 The agent loop

A minimal, runnable skeleton — the loop carries memory across steps:

def agent_step(observation, memory, tools):
    context = perceive(observation, memory)   # state := observation + memory
    plan = decide(context)                    # choose next action
    result = act(plan, tools)                 # may call a tool / the environment
    memory.append((plan, result))             # state persists across steps
    return result, memory

# runnable stubs
def perceive(o, m): return {"obs": o, "steps": len(m)}
def decide(c):      return "summarize" if c["steps"] < 3 else "stop"
def act(p, tools):  return tools.get(p, lambda: "noop")()

memory, tools = [], {"summarize": lambda: "[summary]"}
out, memory = agent_step("new document", memory, tools)
print(out, "| steps:", len(memory))

[summary] | steps: 1

5.4 Interfaces are surfaces

Each thing the loop touches is dual-natured:

Tools / APIs — extend reach and expose privilege escalation and unsafe side effects.
Retrieval / RAG — grounds responses and is an injection channel whose poisoning can semi-permanently corrupt state (Chang et al., 2026).
Memory — enables long horizons and persists an attacker’s foothold across sessions.
Other agents — enable collaboration and propagate compromise.

The dissolved code/data boundary (Greshake et al., 2023) means any of these channels can carry an instruction, not just data — which is exactly the integrity problem from the two-track foundation.

Important

Every interface is an attack surface. This is the bridge thesis in one line: the same channel that makes an agent useful (tools, retrieval, memory, peers) is the channel that makes it attackable. You cannot add capability without adding surface. You can only decide how that surface is defended.

5.5 Multi-agent systems

Composing agents adds failure modes that no single agent exhibits: emergent collusion, cascading errors as one agent’s output becomes another’s trusted input, and amplified blast radius when a shared tool or memory is compromised. Trust between agents must be explicit, not assumed.

5.6 Trust boundaries

The central design question of an agent is which inputs are trusted. The instruction from the operator is trusted; retrieved documents and tool outputs are not. Keeping that boundary intact through the loop — so untrusted data shapes content but never control — is what the defense patterns in the topic chapters enforce. Defenses themselves are deferred to Robustness & Security and Monitoring & Oversight; this chapter only frames the problem.

Pitfall

Trust does not compose. A safe model, a correct tool, and a sound retrieval pipeline can still assemble into an unsafe agent. Security properties of the parts say almost nothing about the whole. The threat lives in the connections, which is why per-component review misses agentic failures entirely.

5.7 What this unlocks

The agentic loop is the spine the topic chapters hang on: evaluating agent behavior → Evaluation; the attack surface and its defenses → Robustness & Security; runtime oversight and control → Monitoring & Oversight; and the flagship synthesis → Agentic Safety × Security.