PaPoo

What is an embedding?

An embedding (also called a vector embedding) is a list of numbers that represents something, such as a word, sentence, image, or user, in a way a model can compare mathematically.

Why it matters

Embeddings turn messy, human data into a form machines can use for tasks such as semantic search and matching.

If you need to measure “meaningful closeness,” embeddings are often the first tool to reach for. In practice, many teams try embeddings before they move to a more complex model.

How it works

  1. A model maps an input to a fixed-length vector, such as 384, 768, or 1536 numbers.
  2. Inputs with similar meaning end up near each other in vector space. Inputs with different meaning tend to be farther apart.
  3. You can then compare vectors with metrics like cosine similarity or dot product to rank how similar they are.
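The comparison in step 3 can be sketched in a few lines of Python. This is a minimal illustration, not any particular library's implementation: the function names and the 4-dimensional toy vectors are assumptions made up for the example, and real embeddings would come from a model and have hundreds of dimensions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b, a value in [-1, 1]."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)

# Hand-written toy vectors (assumed values, not produced by a model).
query = [0.9, 0.1, 0.0, 0.2]
close = [0.8, 0.2, 0.1, 0.3]
far = [0.0, 0.9, 0.8, 0.1]

sim_close = cosine_similarity(query, close)
sim_far = cosine_similarity(query, far)
print(f"close: {sim_close:.3f}  far: {sim_far:.3f}")
```

Cosine similarity ignores vector length and compares direction only, while the dot product also grows with magnitude. For vectors normalized to unit length the two give the same ranking.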

For text, a text embedding model learns from large corpora so that phrases with related meaning get nearby vectors. The exact training objective varies by model family, but the core idea is stable: compress semantic information into coordinates.

Embeddings are not the same as a human-readable summary. They are usually not interpretable dimension by dimension; their value is in the geometry of the whole vector.

Tiny concrete example

Suppose you embed three sentences: two about password help and one restaurant query.

The first two vectors will usually be much closer to each other than either is to the restaurant query. A search system can use that closeness to return the password-help articles first.
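A rough sketch of that ranking follows. The sentences and their 3-dimensional vectors are hypothetical stand-ins invented for illustration; they are not the article's original examples and not the output of any real embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical sentences with hand-picked toy vectors (NOT model output).
docs = {
    "How do I reset my password?": [0.9, 0.1, 0.1],
    "I forgot my login credentials": [0.8, 0.2, 0.1],
    "Best pizza restaurant near me": [0.1, 0.1, 0.9],
}

def rank(query_vec, corpus):
    """Return (score, text) pairs, most similar first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Stand-in vector for a user query about password help.
query_vec = [0.85, 0.15, 0.1]
ranking = rank(query_vec, docs)
for score, text in ranking:
    print(f"{score:.3f}  {text}")
```

With these toy values the two password sentences score close to 1 and the restaurant sentence falls well below them, which is the pattern a search system relies on.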

Common pitfalls / when NOT to use it

If you need exact filtering, joins, or deterministic business logic, use a database or rules first; use embeddings for semantic matching.
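One way to combine the two, sketched under assumptions: exact rules decide which items are eligible, and embeddings only order what is left. The record fields (`product`, `published`), the `search` function, and the 2-dimensional vectors are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical records: exact metadata plus a toy embedding vector.
articles = [
    {"id": 1, "product": "billing", "published": True, "vec": [0.9, 0.1]},
    {"id": 2, "product": "billing", "published": False, "vec": [0.95, 0.05]},
    {"id": 3, "product": "accounts", "published": True, "vec": [0.9, 0.1]},
    {"id": 4, "product": "billing", "published": True, "vec": [0.2, 0.8]},
]

def search(query_vec, product):
    # Step 1: deterministic rules decide eligibility (exact match, no similarity).
    eligible = [a for a in articles if a["published"] and a["product"] == product]
    # Step 2: embeddings only order the items that passed the filter.
    return sorted(
        eligible,
        key=lambda a: cosine_similarity(query_vec, a["vec"]),
        reverse=True,
    )

results = [a["id"] for a in search([1.0, 0.0], "billing")]
print(results)
```

Article 2 is the nearest vector to the query but is unpublished, so the filter excludes it regardless of its similarity score. That guarantee is what similarity alone cannot provide.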

