
2026-06-20

What is cosine similarity?

Cosine similarity is a way to measure how similar two vectors are by comparing the angle between them, not their size.

Why it matters

It’s useful when you care more about direction than magnitude. In AI and information retrieval, that usually means comparing embeddings, text vectors, or feature vectors to find items that “mean the same thing” even if they have different lengths.

You’d reach for cosine similarity when:

ranking search results by semantic closeness,
comparing document, sentence, or image embeddings,
clustering items based on meaning rather than raw counts.

In practice, it’s one of the most common similarity measures for vector embeddings because it works well when vector length is not meaningful.

How it works

Cosine similarity is based on the cosine of the angle between two vectors. If two vectors point in exactly the same direction, their cosine similarity is 1. If they are perpendicular, it is 0. If they point in opposite directions, it is -1.

The formula is:

\[
\text{cosine similarity} = \frac{A \cdot B}{\|A\| \, \|B\|}
\]

where:

\(A \cdot B\) is the dot product,
\(\|A\|\) and \(\|B\|\) are the vector magnitudes.

A practical interpretation:

Compute the dot product of the two vectors.
Divide by the product of the two vectors’ lengths.
The result is a normalized similarity score that ignores scale.

That normalization is the key: two vectors can have very different magnitudes but still be considered highly similar if they point in the same direction.
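The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `cosine_similarity` and the sample vectors are made up for the example, and it assumes both inputs are non-zero lists of numbers with the same length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))      # step 1: dot product A . B
    norm_a = math.sqrt(sum(x * x for x in a))   # length of A
    norm_b = math.sqrt(sum(y * y for y in b))   # length of B
    # Step 2: divide by the product of the lengths.
    # A zero vector makes this raise ZeroDivisionError.
    return dot / (norm_a * norm_b)

# Illustrative vectors (assumed for this sketch):
same = cosine_similarity([1, 0], [3, 0])           # same direction, different length
perpendicular = cosine_similarity([1, 0], [0, 5])  # at right angles
opposite = cosine_similarity([1, 0], [-2, 0])      # opposite direction

print(same, perpendicular, opposite)
```

The three calls correspond to the three reference cases described earlier: same direction, perpendicular, and opposite direction. Changing the length of either input leaves each score unchanged, which is the scale invariance the normalization provides.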

Tiny concrete example

Suppose you have two 2D vectors:

\(A = [1, 1]\)
\(B = [2, 2]\)

Their dot product is \(1 \cdot 2 + 1 \cdot 2 = 4\).

Their magnitudes are:

\(\|A\| = \sqrt{1^2 + 1^2} = \sqrt{2}\)
\(\|B\| = \sqrt{2^2 + 2^2} = \sqrt{8}\)

So:

\[
\frac{4}{\sqrt{2}\,\sqrt{8}} = \frac{4}{\sqrt{16}} = \frac{4}{4} = 1
\]

Even though \(B\) is longer, both vectors point in the same direction, so cosine similarity is perfect.
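The same arithmetic can be checked step by step in Python. This is a small sketch using only the standard library, and the variable names are chosen for the example. Because square roots are computed in floating point, the printed result may differ from 1 in the last digits, so the check uses a tolerance instead of exact equality.

```python
import math

A = [1, 1]
B = [2, 2]

dot = sum(x * y for x, y in zip(A, B))      # 1*2 + 1*2 = 4
norm_a = math.sqrt(sum(x * x for x in A))   # sqrt(2)
norm_b = math.sqrt(sum(y * y for y in B))   # sqrt(8)
similarity = dot / (norm_a * norm_b)

# Floating-point square roots are approximate, so compare with a
# tolerance rather than testing for exact equality with 1.
print(similarity)
print(math.isclose(similarity, 1.0))
```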

Common pitfalls / when NOT to use it

When magnitude matters: cosine similarity intentionally ignores vector length. If size carries meaning, use a metric that preserves it.
When vectors are sparse counts and raw overlap matters: cosine can be helpful, but it is not the only option; sometimes dot product or Jaccard-style measures are better depending on the task.
When comparing unnormalized embeddings without understanding the model: some embedding models are trained so cosine similarity is the natural choice; others may work better with dot product or Euclidean distance.
When vectors can be zero: cosine similarity is undefined for a zero vector because you cannot divide by its length.
When you need a distance, not a similarity: cosine similarity is a similarity score, where higher means closer. If your algorithm expects a distance, where lower means closer, convert it first; a common choice is 1 minus the cosine similarity.
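Two of the pitfalls above, zero vectors and needing a distance, can be handled explicitly in code. The sketch below is one possible approach under stated assumptions: the helper names `cosine_similarity_or_none` and `cosine_distance` are hypothetical, returning `None` for a zero vector is a design choice and not a standard, and the distance uses the common convention of 1 minus the similarity.

```python
import math

def cosine_similarity_or_none(a, b):
    """Return cosine similarity, or None when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return None  # undefined: a zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """Cosine distance as 1 - similarity: 0 = same direction, 2 = opposite."""
    similarity = cosine_similarity_or_none(a, b)
    if similarity is None:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - similarity

print(cosine_similarity_or_none([0, 0], [1, 2]))  # zero vector: no score
print(cosine_distance([1, 0], [0, 1]))            # perpendicular vectors
print(cosine_distance([1, 0], [-1, 0]))           # opposite vectors
```

Whether to return `None`, raise an error, or treat a zero vector as similarity 0 depends on the application, so the caller should decide that deliberately. Also note that this cosine distance does not satisfy the triangle inequality, which matters if an algorithm assumes a true metric.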

