#training-tuning

7 件の記事

What is a base model vs instruct model?

What is a base model vs instruct model?

A base model is a pretrained language model that predicts and generates text, while an instruct model is the same kind of model further tuned to follow human instructions more reliably. This distinction helps you choose the right model for the job: Base models are useful when you want raw next-token generation, custom fine-tuning, or maximum flexibility for downstream tasks. Instruct models are useful when you want the model to answer questions, follow prompts, summarize, write c

What is LoRA (low-rank adaptation)?

What is LoRA (low-rank adaptation)?

LoRA, or low-rank adaptation, is a way to adapt a large pretrained model by training a small number of extra parameters instead of updating all of the model’s weights. Fine-tuning a large model the traditional way can be expensive in memory, storage, and training time. LoRA solves that by letting you specialize a model for a task while keeping the original model frozen. You’d reach for LoRA when you want: lower training cost than full fine-tuning multiple task-specific variants of the same base

What is fine-tuning?

What is fine-tuning?

Fine-tuning is the process of taking a pre-trained model and training it a little more on a smaller, task-specific dataset so it behaves better for your use case. Pre-trained models are general-purpose: they know a lot, but not your company’s style, domain vocabulary, or output format. Fine-tuning helps when you want the model to: follow a specific tone or schema consistently, handle niche domain language, improve performance on a narrow task, reduce prompt engineering overhead for repeated work

What is RLHF?

RLHF, or reinforcement learning from human feedback, is a way to train a model so its outputs better match what people prefer, not just what is statistically likely. A plain language model learns to predict text. That is useful, but it does not automatically make the model helpful, honest, safe, or aligned with user intent. RLHF is used when you want to steer a model toward human preferences: better answers, fewer toxic outputs, more helpful refusals, and responses that fit product goals. In

What is instruction tuning?

What is instruction tuning?

Instruction tuning is a way to train a language model to follow natural-language requests more reliably by fine-tuning it on examples of instructions paired with good responses. A base language model is good at predicting text, but that does not automatically make it good at doing what a user asks. Instruction tuning helps bridge that gap. You’d reach for it when you want a model to respond more helpfully to prompts like: “Summarize this email” “Write SQL for this schema” “Explain this code as i

What is distillation?

What is distillation?

Distillation is a way to train a smaller model to imitate a larger, better one, so you keep much of the quality while reducing cost, latency, or memory use. Distillation solves a very practical problem: the model you want may be too slow, too expensive, or too large to deploy everywhere. You reach for distillation when you want one or more of these: lower inference cost faster responses smaller on-device models simpler deployment a model that approximates a stronger system without retraining fro

What is quantization?

What is quantization?

Quantization is the process of representing numbers with fewer bits, so a model or computation uses less memory and can often run faster. Large neural networks store weights, activations, and sometimes key/value caches as floating-point numbers. That is accurate, but expensive. Quantization reduces that cost by converting some of those values to lower-precision formats, such as 8-bit integers or even 4-bit representations. You usually reach for quantization when you want one or more of these: lo