Temperature is a sampling setting that controls how random or deterministic an LLM’s next-word choices are when it generates text.
Temperature is one of the simplest ways to trade off consistency versus variety.
In practice, many teams start with a low temperature for production workflows and raise it only when they explicitly want more variation.
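An LLM produces a probability distribution over possible next tokens. Temperature rescales that distribution before a token is sampled: values below 1 sharpen it toward the most likely tokens, and values above 1 flatten it so that less likely tokens are chosen more often.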
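To make the rescaling concrete, here is a minimal sketch of the usual approach, in which the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The function name `softmax_with_temperature` and the three example scores are illustrative assumptions, not taken from any particular library or model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability.
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate tokens (made up for illustration).
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

Running this should show the top candidate taking almost all of the probability at the lowest temperature and the three candidates moving closer together as the temperature rises. A temperature of exactly 0 is rejected here because it would divide by zero; real systems that accept 0 typically treat it as "always pick the most likely token".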
A useful mental model: temperature does not change what the model “knows”; it changes how adventurous the generator is when choosing among candidate continuations.
Suppose the model is considering the next word after:
“The best way to explain temperature is…”
At low temperature, it might repeatedly pick the most likely continuation, such as:
“to think of it as a randomness control.”
At higher temperature, it may more often choose alternate phrasings, such as:
“to treat it as a knob for variety.”
“to view it as a sampling dial.”
Same underlying model, different sampling behavior.
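The example above can be simulated with a small sketch. The scores assigned to the three continuations are invented for illustration (a real model scores individual tokens, not whole phrases), and the helper names `sample_continuation` and `tally` are hypothetical. The point is only that the same fixed scores yield different sampling behavior as the temperature changes.

```python
import math
import random
from collections import Counter

# Hypothetical scores for the three continuations (made up for illustration).
candidate_scores = {
    "to think of it as a randomness control.": 3.0,
    "to treat it as a knob for variety.": 2.0,
    "to view it as a sampling dial.": 1.5,
}

def sample_continuation(scores, temperature, rng):
    """Pick one continuation using temperature-scaled softmax weights."""
    names = list(scores)
    scaled = [scores[name] / temperature for name in names]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    # random.choices normalizes the weights internally.
    return rng.choices(names, weights=weights, k=1)[0]

def tally(temperature, n=1000, seed=0):
    """Count how often each continuation is chosen over n samples."""
    rng = random.Random(seed)
    return Counter(
        sample_continuation(candidate_scores, temperature, rng) for _ in range(n)
    )

for t in (0.1, 1.0, 2.0):
    print(f"temperature={t}")
    for text, count in tally(t).most_common():
        print(f"  {count:4d}  {text}")
```

With these made-up scores, the lowest temperature should select the first continuation almost every time, while the higher settings spread the picks across all three. The scores never change between runs; only the sampling does.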
In short: if you want steadier outputs, lower it; if you want more variety, raise it.