A guardrail is a rule, check, or control that keeps an AI system from producing or doing something unsafe, incorrect, or out of policy.
Large language models and agents can generate useful outputs, but they can also hallucinate, leak sensitive data, follow malicious instructions, or take actions you did not intend. Guardrails help reduce that risk.
You reach for guardrails when you need the system to stay within boundaries: for example, only answer from approved sources, refuse certain requests, avoid sending private data, or require human approval before an action runs.
Guardrails can be applied at different points in a system: at the prompt, at the model, during retrieval, and at tool execution.
In practice, guardrails are usually layered. A failure can happen at any of these stages, so no single check is enough.
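As a rough illustration of layering, the sketch below puts one small check at each stage named above. Everything in it is an assumption made for the example: the host names, the tool names, the blocked phrase, and the email pattern are hypothetical placeholders, and real systems typically use much stronger detectors than substring and regex matching.

```python
import re
from dataclasses import dataclass

# Hypothetical policy values, chosen only for illustration.
APPROVED_SOURCES = {"kb.example.com", "docs.example.com"}
ALLOWED_TOOLS = {"search_kb", "draft_reply"}
BLOCKED_PHRASES = ("ignore previous instructions",)
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def check_prompt(prompt: str) -> Verdict:
    """Prompt stage: reject input containing a known injection phrase."""
    lowered = prompt.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return Verdict(False, f"blocked phrase: {phrase!r}")
    return Verdict(True)


def check_retrieval(source_host: str) -> Verdict:
    """Retrieval stage: only allow documents from approved sources."""
    if source_host not in APPROVED_SOURCES:
        return Verdict(False, f"unapproved source: {source_host}")
    return Verdict(True)


def redact_output(text: str) -> str:
    """Model-output stage: mask anything that looks like an email address."""
    return EMAIL_PATTERN.sub("[redacted email]", text)


def check_tool(tool_name: str) -> Verdict:
    """Tool-execution stage: only allow tools on the allowlist."""
    if tool_name not in ALLOWED_TOOLS:
        return Verdict(False, f"tool not allowed: {tool_name}")
    return Verdict(True)
```

Each function is deliberately independent, so a request that slips past one layer can still be stopped by another. For instance, a prompt that passes `check_prompt` still cannot trigger a tool that `check_tool` rejects.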
User: “Summarize this customer email and draft a refund.”
Guardrail flow:
Result: the model still helps, but it cannot accidentally issue an unauthorized refund.
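One way the refund example could be enforced is with an approval gate in front of tool execution. The sketch below is a minimal version under stated assumptions: `ProposedAction`, `ActionGate`, the action names `draft_refund` and `issue_refund`, and the `approved_by` parameter are all hypothetical names invented for illustration, not part of any particular framework.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical policy: these actions never run without a named human approver.
REQUIRES_APPROVAL = {"issue_refund"}


@dataclass
class ProposedAction:
    name: str            # e.g. "draft_refund" or "issue_refund"
    amount: float = 0.0


@dataclass
class ActionGate:
    executed: List[ProposedAction] = field(default_factory=list)
    pending: List[ProposedAction] = field(default_factory=list)

    def submit(self, action: ProposedAction,
               approved_by: Optional[str] = None) -> str:
        """Run low-risk actions; hold sensitive ones until a human approves."""
        if action.name in REQUIRES_APPROVAL and approved_by is None:
            self.pending.append(action)
            return "pending_approval"
        self.executed.append(action)
        return "executed"


gate = ActionGate()
draft_status = gate.submit(ProposedAction("draft_refund", 25.0))
issue_status = gate.submit(ProposedAction("issue_refund", 25.0))
approved_status = gate.submit(ProposedAction("issue_refund", 25.0),
                              approved_by="support_lead")
```

Here the draft runs immediately, the unapproved refund is parked in `gate.pending`, and the refund only executes once an approver is named. The key design choice is that the gate sits outside the model, so the model's output alone can never cause the refund to be issued.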