PaPoo

What is cosine similarity?

Cosine similarity is a way to measure how similar two vectors are by comparing the angle between them, not their size.

Why it matters

It’s useful when you care more about direction than magnitude. In AI and information retrieval, that usually means comparing embeddings, text vectors, or feature vectors to find items that “mean the same thing” even if they have different lengths.

You’d reach for cosine similarity when:

In practice, it’s one of the most common similarity measures for vector embeddings because it works well when vector length is not meaningful.

How it works

Cosine similarity is based on the cosine of the angle between two vectors. If two vectors point in exactly the same direction, their cosine similarity is 1. If they are perpendicular, it is 0. If they point in opposite directions, it is -1.

The formula is:

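\[
\text{cosine similarity} = \frac{A \cdot B}{\|A\| \, \|B\|}
\]

where \(A \cdot B\) is the dot product of the two vectors, and \(\|A\|\) and \(\|B\|\) are their magnitudes (Euclidean lengths).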

A practical interpretation:

  1. Compute the dot product of the two vectors.
  2. Divide by the product of the two vectors’ lengths (magnitudes).
  3. The result is a normalized similarity score that ignores scale.

That normalization is the key: two vectors can have very different magnitudes but still be considered highly similar if they point in the same direction.
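The steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` and the choice to raise an error for zero-length vectors (where the formula would divide by zero) are assumptions made for this example.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")

    # Step 1: dot product of the two vectors.
    dot = sum(x * y for x, y in zip(a, b))

    # Step 2: Euclidean length (magnitude) of each vector.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Assumption for this sketch: treat a zero vector as an error,
    # because it has no direction and the division is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    # Step 3: normalized score that ignores scale.
    return dot / (norm_a * norm_b)


same = cosine_similarity([1, 0], [3, 0])           # same direction
perpendicular = cosine_similarity([1, 0], [0, 5])  # right angle
opposite = cosine_similarity([1, 0], [-2, 0])      # opposite direction
```

With these inputs, `same`, `perpendicular`, and `opposite` come out as 1, 0, and -1 respectively, matching the three cases described earlier, even though the vectors in each pair have different lengths.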

Tiny concrete example

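Suppose you have two 2D vectors, \(A = (1, 1)\) and \(B = (2, 2)\).

Their dot product is \(1\cdot2 + 1\cdot2 = 4\).

Their magnitudes are \(\|A\| = \sqrt{1^2 + 1^2} = \sqrt{2}\) and \(\|B\| = \sqrt{2^2 + 2^2} = \sqrt{8}\).

So:

\[
\frac{4}{\sqrt{2}\sqrt{8}} = \frac{4}{\sqrt{16}} = 1
\]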

Even though \(B\) is longer, both vectors point in the same direction, so their cosine similarity is exactly 1.
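The same example can be checked numerically. The sketch below takes A = (1, 1) and B = (2, 2), which match the dot product and magnitudes used above, and computes the score in an equivalent way: scale each vector to unit length first, then take the dot product. The helper names `unit` and `cosine_similarity` are hypothetical, chosen only for this illustration, and the code assumes neither vector is all zeros.

```python
import math


def unit(v):
    """Scale a non-zero vector to length 1, keeping its direction."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine_similarity(a, b):
    """Dot product of the two unit-length versions of a and b."""
    return sum(x * y for x, y in zip(unit(a), unit(b)))


A = [1.0, 1.0]
B = [2.0, 2.0]

score = cosine_similarity(A, B)

# Stretching B by a factor of 100 changes its length but not its direction.
scaled_score = cosine_similarity(A, [100 * x for x in B])
```

Both `score` and `scaled_score` should equal 1 up to floating-point rounding, so compare them with a tolerance (for example `math.isclose`) instead of `==`.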

Common pitfalls / when NOT to use it

Related terms
