5  Agentic Systems: The Bridge

6 Agentic Systems: The Bridge

The object the whole book is about. An agent is where autonomy (a safety problem) meets external action (a security problem) — so it is the seam, not a side topic.

  • agent vs. model — what the promotion adds
  • the perceive–plan–act loop and its persistent state
  • interfaces as capabilities and attack surfaces
  • multi-agent systems and trust boundaries

6.1 Agent vs. model

A model maps input to output: \(y = f(x)\). An agent wraps a model in a loop that carries state and acts on the world:

\[ a_t = \pi\big(o_t,\, m_t;\, \mathcal{T}\big), \qquad m_{t+1} = u(m_t, o_t, a_t), \]

where \(o_t\) is the observation, \(m_t\) the memory/state, \(\mathcal{T}\) the available tools, \(a_t\) the action, and \(u\) the memory update. The promotion from \(f\) to \(\pi\) adds four things — autonomy, tools, memory, and an environment — and every one is simultaneously a capability and a new attack surface.

6.2 Autonomy is multi-dimensional

Autonomy is not a single dial. Following the agentic design space of Kim et al. (2026), an agent’s risk profile is set by seven dimensions:

Dimension Low ↔︎ High autonomy
Input trust trusted prompt ↔︎ untrusted web/tool content
Data access public ↔︎ sensitive/private
Workflow fixed, deterministic ↔︎ open, self-directed
Action power read-only ↔︎ irreversible side effects
Memory stateless ↔︎ persistent across sessions
Tools none ↔︎ arbitrary code/API execution
Interface text reply ↔︎ control of other systems

Where an agent sits on these axes — not its raw model capability — determines how much oversight and containment it needs.

6.3 The agent loop

The perceive-plan-act loop with persistent memory Perceive Plan Act Observe Memory / state tools · environment

A minimal, runnable skeleton — the loop carries memory across steps:

def agent_step(observation, memory, tools):
    context = perceive(observation, memory)   # state := observation + memory
    plan = decide(context)                    # choose next action
    result = act(plan, tools)                 # may call a tool / the environment
    memory.append((plan, result))             # state persists across steps
    return result, memory

# runnable stubs
def perceive(o, m): return {"obs": o, "steps": len(m)}
def decide(c):      return "summarize" if c["steps"] < 3 else "stop"
def act(p, tools):  return tools.get(p, lambda: "noop")()

memory, tools = [], {"summarize": lambda: "[summary]"}
out, memory = agent_step("new document", memory, tools)
print(out, "| steps:", len(memory))
[summary] | steps: 1

6.4 Interfaces are surfaces

Each thing the loop touches is dual-natured:

  • Tools / APIs — extend reach and expose privilege escalation and unsafe side effects.
  • Retrieval / RAG — grounds responses and is an injection channel whose poisoning can semi-permanently corrupt state (Chang et al., 2026).
  • Memory — enables long horizons and persists an attacker’s foothold across sessions.
  • Other agents — enable collaboration and propagate compromise.

The dissolved code/data boundary (Greshake et al., 2023) means any of these channels can carry an instruction, not just data — which is exactly the integrity problem from the two-track foundation.

6.5 Multi-agent systems

Composing agents adds failure modes that no single agent exhibits: emergent collusion, cascading errors as one agent’s output becomes another’s trusted input, and amplified blast radius when a shared tool or memory is compromised. Trust between agents must be explicit, not assumed.

6.6 Trust boundaries

The central design question of an agent is which inputs are trusted. The instruction from the operator is trusted; retrieved documents and tool outputs are not. Keeping that boundary intact through the loop — so untrusted data shapes content but never control — is what the defense patterns in the topic chapters enforce. Defenses themselves are deferred to Robustness & Security and Monitoring & Oversight; this chapter only frames the problem.

6.7 What this unlocks

The agentic loop is the spine the topic chapters hang on: evaluating agent behavior → Evaluation; the attack surface and its defenses → Robustness & Security; runtime oversight and control → Monitoring & Oversight; and the flagship synthesis → Agentic Safety × Security.