2026-06-19

What is agent context compaction?

Agent context compaction, also called context summarization, is the practice of shrinking an AI agent’s conversation or working memory into a shorter representation so it can keep going without exceeding the model’s context window.

Why it matters

Agents often run for many turns, call tools, inspect files, and accumulate lots of intermediate state. That history eventually becomes too large to feed back into the model.

Context compaction solves two practical problems:

Keeps long-running agents usable when the prompt would otherwise overflow the context window.
Reduces cost and latency by sending less text back into the model.

In practice, teams reach for it when they want an agent to handle a long task—debugging, research, planning, ticket triage, code changes—without losing the important parts of what already happened.

How it works

The basic idea is to replace verbose history with a shorter representation that preserves what still matters.

Collect the current state.
The system gathers prior messages, tool results, decisions, and any explicit task state.
Compress the history.
A model or rule-based process rewrites that history into a compact summary: goals, constraints, decisions made, open questions, and useful artifacts.
Continue from the summary.
The agent drops most of the old transcript and resumes using the compacted context plus the latest user request and relevant tool outputs.

A good compaction keeps facts and commitments, not every token. For example, it should preserve “we already tried approach A and it failed” but not necessarily the full back-and-forth that led there.

There is an important design choice here: some systems summarize only conversation text, while others also compact structured agent state, like task plans, memory, or tool outputs. Different products use the term a bit differently, so the exact mechanism is implementation-dependent.

Tiny concrete example

Before compaction:

User: Build a CSV parser.
Agent: asks questions, explores edge cases, tries a regex approach, tests it, revises the plan, reads documentation, and accumulates pages of discussion.

After compaction:

Summary: Build a CSV parser in Python. Must handle quoted fields and escaped quotes. Regex approach rejected because it fails on embedded commas. Next step: implement a state machine and add tests for multiline fields.

The agent can now continue without carrying the entire transcript.

Common pitfalls / when NOT to use it

Don’t compact away critical details. If the summary drops constraints, decisions, or error messages, the agent may repeat mistakes.
Don’t treat summaries as perfect truth. A compacted summary is a lossy representation; it can omit nuance or introduce small errors.
Don’t use it when the full history is still needed. For auditability, debugging, or precise provenance, keeping the raw transcript may matter more than saving tokens.
Don’t assume one summary fits all. Different downstream steps may need different views of memory: a task summary, a user preference memory, or a tool-result digest.

A practical rule: use compaction when the conversation is long and the agent still needs to act, but keep the raw trace elsewhere if correctness, traceability, or compliance matters.