PaPoo

What is a code execution tool (sandbox)?

A code execution tool, or sandbox, is a controlled runtime that lets an AI agent run code safely without giving it full access to your systems.

Why it matters

LLMs are good at writing code, but they cannot reliably verify it by reasoning alone. A sandbox lets an agent run the code it writes, read the actual output, and revise when something fails.

You’d reach for it when the task needs computation, file manipulation, data transformation, or testing. In practice, teams use sandboxes when “just answer in text” is too brittle and they want the model to execute and validate its own work.

How it works

A sandbox is usually a restricted process, container, VM, or managed execution environment that limits what the code can do.

Typical controls include:

  1. Isolation
    The code runs away from the host system and other workloads.

  2. Resource limits
    CPU, memory, time, and sometimes disk are capped to reduce abuse or accidental runaway jobs.

  3. Permission boundaries
    The environment may block network access, restrict filesystem access, or expose only specific files and tools.

  4. Return channel
    The system sends back stdout, stderr, exit codes, files, or structured results so the agent can inspect what happened.
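
The controls above can be approximated with Python's standard library. This is a minimal sketch, not a production sandbox: the function name run_restricted and its return keys are hypothetical, and a child process with a scratch directory and a timeout is far weaker than a container or VM. It does not block network access, cap memory, or restrict filesystem reads; those are assumed to be handled by the surrounding environment.

```python
import subprocess
import sys
import tempfile


def run_restricted(code: str, timeout_s: float = 5.0) -> dict:
    """Run a Python snippet in a child process and report what happened.

    Hypothetical helper for illustration only; it is not real isolation.
    """
    # Scratch directory: relative paths in the snippet resolve here,
    # and the directory is deleted when the block exits.
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I is Python's isolated mode: it ignores PYTHON* environment
                # variables and the user site-packages directory.
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,  # return channel: stdout and stderr
                text=True,
                timeout=timeout_s,    # resource limit: wall-clock time
            )
        except subprocess.TimeoutExpired:
            # The child is killed when the timeout expires.
            return {
                "stdout": "",
                "stderr": f"timed out after {timeout_s} s",
                "exit_code": None,
                "timed_out": True,
            }
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
        "timed_out": False,
    }


if __name__ == "__main__":
    print(run_restricted("print(2 + 3)"))
```

The structured result is the part the agent relies on: a nonzero exit code or a timeout tells it the run failed, and stderr tells it why.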

For agents, the sandbox is often part of the tool loop: the model writes code, runs it, reads the result, and then revises the code if needed.
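
The write, run, read, revise cycle can be sketched as a short loop. Everything named here is an assumption for illustration: stub_model stands in for a real LLM call and simply returns a buggy draft first and a fix once it receives error feedback, and run_code is a bare child process rather than a real sandbox.

```python
import subprocess
import sys


def run_code(code: str, timeout_s: float = 5.0):
    """Run a snippet in a child process; return (exit_code, stdout, stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, "", "timed out"
    return proc.returncode, proc.stdout, proc.stderr


def stub_model(task: str, feedback):
    """Hypothetical stand-in for an LLM: buggy first draft, then a fix."""
    if feedback is None:
        return "print(sum([1, 2, 3]) / 0)"  # first draft raises ZeroDivisionError
    return "print(sum([1, 2, 3]))"          # revised after seeing the error


def tool_loop(task: str, model=stub_model, max_rounds: int = 3) -> dict:
    """Generate code, run it, and feed errors back until a run succeeds."""
    feedback = None
    for attempt in range(1, max_rounds + 1):
        code = model(task, feedback)
        exit_code, stdout, stderr = run_code(code)
        if exit_code == 0:
            return {"attempts": attempt, "output": stdout.strip()}
        feedback = stderr  # the traceback becomes input to the next draft
    return {"attempts": max_rounds, "output": None}


if __name__ == "__main__":
    print(tool_loop("Add the numbers 1, 2 and 3"))
```

Capping the number of rounds keeps a model that never converges from looping indefinitely.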

Tiny concrete example

An agent is asked: “Compute the sum of the amount column in sales.csv.”

It might generate and run something like:

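import csv

total = 0.0
# newline="" lets the csv module handle line endings itself
with open("sales.csv", newline="") as f:
    # DictReader uses the header row as keys, so each row is a dict
    for row in csv.DictReader(f):
        total += float(row["amount"])

print(total)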

The sandbox executes the script, returns the printed total, and the agent uses that result in its final answer.

Common pitfalls / when NOT to use it

Related terms
