PaPoo

What is LoRA (low-rank adaptation)?

LoRA, or low-rank adaptation, is a way to adapt a large pretrained model by training a small number of extra parameters instead of updating all of the model’s weights.

Why it matters

Fine-tuning a large model the traditional way can be expensive in memory, storage, and training time. LoRA reduces those costs by letting you specialize a model for a task while keeping the original model's weights frozen.

You’d reach for LoRA when you want the benefits of fine-tuning without paying the full cost of retraining every parameter, which is how many teams use it in practice.

How it works

Instead of changing a large weight matrix directly, LoRA adds a small learned update alongside it. The update is represented as two much smaller matrices whose product is low-rank. That is the “low-rank” part.
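In symbols, the idea can be written as follows. The notation is an assumption introduced here for illustration (it follows common usage, not anything defined in this article): W_0 is the frozen weight matrix with d rows and k columns, B and A are the two small trainable matrices, and r is the chosen rank. Many implementations also multiply the update by a constant scaling factor, which is omitted here.

```latex
W = W_0 + \Delta W = W_0 + BA,
\qquad B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Because the product BA has rank at most r, the update is low-rank by construction.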

The base model weights stay fixed. During training, only the low-rank adapter weights are updated. At inference time, the adapter can be applied on top of the base model, and in some implementations it can be merged into the original weights for deployment.
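The two inference options can be illustrated with a minimal NumPy sketch. It assumes a single linear layer with made-up sizes and random stand-ins for the trained adapter; the names W0, B, A, and x are hypothetical and not taken from any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 6, 4, 2  # illustrative sizes only

W0 = rng.normal(size=(d, k))  # frozen base weight (stand-in)
B = rng.normal(size=(d, r))   # adapter factors; pretend they are already trained
A = rng.normal(size=(r, k))
x = rng.normal(size=(k,))     # one input vector

# Option 1: keep the adapter separate and add its output to the base output.
y_separate = W0 @ x + B @ (A @ x)

# Option 2: merge the adapter into the weights once, then use a single matrix.
W_merged = W0 + B @ A
y_merged = W_merged @ x

outputs_match = np.allclose(y_separate, y_merged)
update_rank = np.linalg.matrix_rank(B @ A)  # can never exceed r

print("outputs match:", outputs_match)
print("rank of the update:", update_rank)
```

Keeping the adapter separate makes it easy to swap adapters on one shared base model, while merging removes the extra matrix products at inference time.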

The key idea is that many task-specific changes can be captured with a relatively small number of extra parameters. LoRA does not claim this is universally sufficient; it is an efficiency trick that often works well in practice for adaptation tasks.
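To make "a relatively small number of extra parameters" concrete, here is a small arithmetic sketch. The layer size (4096 by 4096) and rank (8) are illustrative assumptions, not figures from this article, and the helper names are hypothetical.

```python
def full_params(d: int, k: int) -> int:
    """Parameters in one dense d x k weight matrix."""
    return d * k


def lora_params(d: int, k: int, r: int) -> int:
    """Parameters in a rank-r adapter: B is d x r, A is r x k."""
    return r * (d + k)


d, k, r = 4096, 4096, 8  # assumed sizes, for illustration only
full = full_params(d, k)     # 16,777,216
lora = lora_params(d, k, r)  # 65,536

print(f"full matrix: {full:,} parameters")
print(f"rank-{r} adapter: {lora:,} parameters ({lora / full:.2%} of the full matrix)")
```

For these assumed sizes the adapter trains well under 1% of the parameters in that one matrix; the actual ratio depends on the rank and on which layers receive adapters.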

Tiny concrete example

Suppose you have a general-purpose language model and want it to answer customer-support questions in your company’s style.

With full fine-tuning, you would update all model weights.

With LoRA, you:

  1. freeze the base model
  2. attach small trainable adapters to selected layers
  3. train those adapters on your support data
  4. use the base model + adapter for support answers

Result: specialized model behavior at a much lower training cost than full fine-tuning.
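The four steps above can be sketched on a toy problem. This is a minimal NumPy illustration under loud assumptions: a single linear layer stands in for the language model, synthetic data stands in for the support data, and gradients are written by hand. In practice you would use a deep-learning framework and an adapter library rather than code like this.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k, r = 256, 6, 8, 2  # toy sizes, chosen only for illustration

# Stand-ins: a "pretrained" weight and synthetic task data whose ideal
# weights differ from the base by a low-rank shift.
W0 = rng.normal(size=(d, k))
delta_true = 0.25 * rng.normal(size=(d, r)) @ rng.normal(size=(r, k))
X = rng.normal(size=(n, k))
Y = X @ (W0 + delta_true).T

# Step 1: freeze the base model (we never write to W0 below).
W0_snapshot = W0.copy()

# Step 2: attach a small adapter. B starts at zero so the adapted layer
# initially behaves exactly like the base layer.
A = rng.normal(scale=1 / np.sqrt(k), size=(r, k))
B = np.zeros((d, r))


def loss_and_error(A, B):
    """Mean squared-error loss of the adapted layer, plus the raw errors."""
    err = X @ (W0 + B @ A).T - Y
    return 0.5 * np.mean(np.sum(err**2, axis=1)), err


initial_loss, _ = loss_and_error(A, B)

# Step 3: train only the adapter with plain gradient descent.
lr = 0.05
for _ in range(500):
    _, err = loss_and_error(A, B)
    grad_W = err.T @ X / n   # gradient w.r.t. the effective weight
    grad_B = grad_W @ A.T    # chain rule through W0 + B @ A
    grad_A = B.T @ grad_W
    B -= lr * grad_B
    A -= lr * grad_A

# Step 4: use base + adapter for predictions.
final_loss, _ = loss_and_error(A, B)
base_unchanged = np.array_equal(W0, W0_snapshot)

print("loss before training:", initial_loss)
print("loss after training: ", final_loss)
print("base weights unchanged:", base_unchanged)
```

Only A and B receive gradient updates, so the stored base weights remain exactly as they were, which is what allows one base model to be shared across several adapters.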

Common pitfalls / when NOT to use it

Related terms
