PaPoo

What is prompt caching?

Prompt caching is a way to reuse the work already done on a repeated prompt prefix, so you avoid paying the full cost and latency of recomputing it from scratch on every request.

Why it matters

If your app sends a long, repeated system prompt, policy block, tool schema, or document context on many requests, you can end up paying to re-process the same text over and over. Prompt caching helps reduce that waste.

In practice, it is especially useful when the “front half” of the prompt is fixed and only the user-specific part changes.

How it works

The exact mechanism depends on the model provider, but the basic idea is the same: if a new request starts with a prefix the system has already seen, that prefix can be reused from cache instead of being re-encoded or re-billed in full.

A common pattern is:

  1. You send a prompt with a long stable prefix, such as instructions and reference text.
  2. The model provider stores internal state for that prefix, keyed by the exact text and formatting.
  3. On later requests with the same prefix, the provider can skip work for that shared part and only process the new suffix.
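
The three steps above can be sketched with a toy, in-memory model. This is not how any particular provider implements caching: the class name ToyPrefixCache, the use of a SHA-256 hash as the key, and counting characters as a stand-in for processing cost are all assumptions made for illustration.

```python
import hashlib


class ToyPrefixCache:
    """Hypothetical stand-in for a provider-side prefix cache."""

    def __init__(self):
        self._store = {}  # prefix hash -> placeholder for stored internal state
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(prefix: str) -> str:
        # Key on the exact bytes of the prefix (text and formatting).
        return hashlib.sha256(prefix.encode("utf-8")).hexdigest()

    def process(self, prefix: str, suffix: str) -> int:
        """Return how many characters had to be processed fresh."""
        key = self._key(prefix)
        if key in self._store:
            # Step 3: prefix already seen, only the new suffix is processed.
            self.hits += 1
            return len(suffix)
        # Steps 1-2: first sighting, process everything and store the prefix.
        self.misses += 1
        self._store[key] = len(prefix)
        return len(prefix) + len(suffix)


cache = ToyPrefixCache()
prefix = "You are a support assistant for Acme.\n" + "policy text " * 100
first = cache.process(prefix, "How do I reset my password?")
second = cache.process(prefix, "Where is my invoice?")
print(first, second, cache.hits, cache.misses)
```

In this sketch the second call only "pays" for the new question, which is the saving a real prefix cache is meant to deliver.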

Important nuance: prompt caching is usually exact-match and prefix-based. Small changes in whitespace, ordering, or formatting can invalidate the cache. Also, caching is usually an implementation feature of the model service, not something the model itself “understands.”
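
To see why exact matching is fragile, here is a minimal sketch that assumes the cache key is a hash of the raw prefix text. The cache_key function is hypothetical, and real services may key on tokens or other internal structures, but the consequence is similar: any byte-level difference produces a different key.

```python
import hashlib


def cache_key(prefix: str) -> str:
    # Hypothetical key: a hash of the exact prefix bytes.
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()


original = "You are a support assistant for Acme.\nUse the product policy below."
trailing_space = original + " "
reordered = "Use the product policy below.\nYou are a support assistant for Acme."

same = cache_key(original) == cache_key(original)
ws_changed = cache_key(original) == cache_key(trailing_space)
order_changed = cache_key(original) == cache_key(reordered)
print(same, ws_changed, order_changed)
```

Only the identical prefix matches; the trailing space and the reordered lines each yield a different key, so in this model they would miss the cache.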

Tiny concrete example

Suppose your app always sends this prefix:

System: You are a support assistant for Acme.
Use the product policy below.
[20 pages of policy text...]

Then each user request adds only the new question:

User: How do I reset my password?

If the prefix is cacheable and unchanged, the service may reuse the cached processing for the system/policy text and only handle the user question as new input.
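
One way to keep the prefix unchanged across requests is to build every prompt from a single constant and append the user-specific part last. The sketch below is an assumption about how an app might assemble its prompt; STABLE_PREFIX and build_prompt are illustrative names, not part of any provider's API.

```python
# The fixed part is defined once so every request sends identical bytes.
STABLE_PREFIX = (
    "System: You are a support assistant for Acme.\n"
    "Use the product policy below.\n"
    "[20 pages of policy text...]\n"
)


def build_prompt(user_question: str) -> str:
    # Variable content goes at the end so it never disturbs the shared prefix.
    return STABLE_PREFIX + "\nUser: " + user_question


p1 = build_prompt("How do I reset my password?")
p2 = build_prompt("How do I change my email?")

# Both prompts start with the same prefix and differ only in the suffix.
print(p1.startswith(STABLE_PREFIX), p2.startswith(STABLE_PREFIX), p1 != p2)
```

Putting timestamps, user names, or other per-request values into the prefix would defeat this, since the shared portion would then differ on every call.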

Common pitfalls / when NOT to use it

In short: use prompt caching when you have a large repeated prefix and want to save time or cost, but don’t treat it like a substitute for prompt design or context management.

Related terms
