PaPoo

What is a guardrail?

A guardrail is a rule, check, or control that keeps an AI system from producing or doing something unsafe, incorrect, or out of policy.

Why it matters

Large language models and agents can generate useful outputs, but they can also hallucinate, leak sensitive data, follow malicious instructions, or take actions you did not intend. Guardrails help reduce that risk.

You reach for guardrails when the system must stay within boundaries: for example, to answer only from approved sources, refuse certain requests, avoid sending private data, or require human approval before an action runs.

How it works

Guardrails can be applied at different points in a system:

  1. Before generation: filter or rewrite the user request, classify intent, or block known bad inputs.
  2. During generation: constrain decoding or steer the model with rules, schemas, or system instructions.
  3. After generation: validate the output against policy, structured formats, or business rules, then accept, edit, or reject it.
  4. Before actions: in agentic systems, check whether a tool call is allowed, whether parameters are sane, and whether a human needs to approve it.

In practice, guardrails are usually layered. No single check is enough, because failures can happen at the prompt, model, retrieval, or tool-execution stage.
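
As a rough illustration of layering, the sketch below chains three of the four stages (before generation, after generation, before actions) around a model call. It is a minimal sketch, not a reference implementation: the blocked pattern, the tool allowlist, the JSON shape of the model output, and every function name are assumptions made for this example. The "during generation" stage is omitted because it depends on the model provider.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical policy data, invented for this sketch.
BLOCKED_INPUT_PATTERNS = [re.compile(r"ignore (all )?previous instructions", re.I)]
ALLOWED_TOOLS = {"search_docs", "draft_reply"}


def check_input(user_text: str) -> bool:
    """Before generation: block known bad inputs."""
    return not any(p.search(user_text) for p in BLOCKED_INPUT_PATTERNS)


def validate_output(raw: str) -> Optional[dict]:
    """After generation: accept only JSON shaped like {"tool": str, "args": dict}."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if not isinstance(data.get("tool"), str) or not isinstance(data.get("args"), dict):
        return None
    return data


def check_action(call: dict) -> bool:
    """Before actions: allow only tools on the allowlist."""
    return call["tool"] in ALLOWED_TOOLS


def run_with_guardrails(user_text: str, model: Callable[[str], str]) -> str:
    """Run the layered checks in order and report the first one that fails."""
    if not check_input(user_text):
        return "blocked: input"
    call = validate_output(model(user_text))
    if call is None:
        return "rejected: output"
    if not check_action(call):
        return "blocked: action"
    return f"allowed: {call['tool']}"
```

Each check is deliberately small and independent, so a failure at one layer (for example, a malformed model output) is caught even when the earlier layers passed.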

Tiny concrete example

User: “Summarize this customer email and draft a refund.”

Guardrail flow:

Result: the model still helps, but it cannot accidentally issue an unauthorized refund.
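
One way the action-level check for this example could look in code is sketched below. It assumes the model only produces a refund draft and that a separate gate decides what happens next. The `RefundDraft` type, the `gate_refund` function, and the three outcome strings are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefundDraft:
    """A refund proposed by the model; nothing has been paid out yet."""
    order_id: str
    amount: float
    approved_by: Optional[str] = None  # set only after a human signs off


def gate_refund(draft: RefundDraft, order_total: float) -> str:
    """Decide what may happen to a drafted refund. This gate never issues it."""
    if draft.amount <= 0 or draft.amount > order_total:
        return "reject"          # parameters are not sane
    if draft.approved_by is None:
        return "needs_approval"  # a human must approve before it runs
    return "execute"


draft = RefundDraft(order_id="A1", amount=20.0)
print(gate_refund(draft, order_total=30.0))  # needs_approval
```

Because the gate sits between the draft and the payment step, the model can still summarize the email and propose a refund, while issuing it remains a separate, explicitly approved action.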

Common pitfalls / when NOT to use it

Related terms
