Why fine-tuning exists
0.1%
Proportion of model weights updated during LoRA fine-tuning — vs 100% for full fine-tuning [Hu et al.]
$3–20
Cost to fine-tune a 7B parameter model using LoRA with 10,000 examples on cloud compute
3–10x
Performance improvement possible for domain-specific tasks when fine-tuning a smaller model vs prompting a larger general one [Stanford HELM]
A base language model like LLaMA or Mistral is trained on massive general datasets. It knows a lot but isn't specialised for anything. Fine-tuning takes this base model and continues training it on a much smaller, task-specific dataset — adjusting the model's weights to prioritise patterns relevant to your use case.
Think of it like hiring a very well-educated generalist and then giving them three months of intensive training in your specific domain. They don't forget everything they knew — but their responses now reflect your industry's language, formats, and conventions.
The three methods
Full fine-tuning, LoRA, and QLoRA solve the same problem with different resource tradeoffs. Most practitioners today use LoRA or QLoRA.
Full fine-tuning
Full parameter fine-tuning
All model weights are updated during training. Maximum flexibility: the model's behaviour can change completely. Requires enormous compute, because the weights, gradients, and optimiser state must all fit in GPU memory at once.
GPU required: 8x A100s for 7B model
Cost: $500–$5,000+
Best for: Complete behaviour change
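The multi-GPU requirement follows from simple memory arithmetic. A rough sketch for a 7B model under a standard mixed-precision Adam setup (the bytes-per-parameter figures are the common fp16/fp32 convention; activations and framework overhead are ignored, so treat this as a lower bound):

```python
params = 7_000_000_000
GB = 1024 ** 3

# Typical mixed-precision training footprint per parameter:
weights = params * 2          # fp16 weights
grads   = params * 2          # fp16 gradients
adam    = params * 4 * 2      # fp32 Adam first + second moments
master  = params * 4          # fp32 master copy of the weights

total = weights + grads + adam + master
print(f"~{total / GB:.0f} GB before activations")
```

That comes to roughly 104 GB before a single activation is stored, which is why full fine-tuning of even a 7B model spills across several 80 GB A100s.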
LoRA
Low-Rank Adaptation
Freezes original weights. Adds small trainable matrices ("adapters") alongside attention layers. Only these tiny additions are trained; the base model is untouched. The adapters can be merged back into the base weights before inference, so they add no latency.
GPU required: 1x A100 or RTX 4090
Cost: $3–$50
Best for: Most fine-tuning tasks
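The headline 0.1% figure is easy to reproduce with a back-of-the-envelope count. A sketch assuming a LLaMA-7B-like shape (32 layers, hidden size 4096, the four attention projections adapted at rank r = 8; these dimensions and the choice of target matrices are illustrative assumptions, not prescriptions):

```python
# LoRA replaces a full d x d weight update with two small matrices:
# B (d x r) and A (r x d), so delta_W = B @ A has rank r << d.
hidden  = 4096        # hidden size (illustrative, LLaMA-7B-like)
layers  = 32          # transformer layers (illustrative)
rank    = 8           # LoRA rank r
targets = 4           # q/k/v/o attention projections per layer

full_params = 7_000_000_000   # base model size

# Each adapted projection gains B (hidden*rank) + A (rank*hidden) params.
lora_per_matrix = hidden * rank + rank * hidden
trainable = layers * targets * lora_per_matrix

print(f"trainable LoRA params: {trainable:,}")
print(f"fraction of base model: {trainable / full_params:.4%}")
```

Under these assumptions the count comes out to about 8.4M trainable parameters, roughly 0.12% of the base model — in line with the ~0.1% figure quoted above.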
QLoRA
Quantised LoRA
LoRA on a quantised (4-bit precision) base model. Dramatically reduces memory requirements — allows fine-tuning large models on consumer hardware. Slight quality reduction vs full LoRA.
GPU required: 1x RTX 3090 (24GB)
Cost: $1–$20
Best for: Budget or local training
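The memory saving from quantisation is again simple arithmetic. A sketch of the weight footprint of a 7B model at different precisions (adapter weights, activations, and dequantisation overhead are ignored, so these are lower bounds):

```python
params = 7_000_000_000
GB = 1024 ** 3

# Bytes needed for the frozen base weights at each precision.
for name, bits in [("fp16", 16), ("int8", 8), ("nf4 / 4-bit", 4)]:
    bytes_needed = params * bits // 8
    print(f"{name:>12}: {bytes_needed / GB:.1f} GB just for weights")
```

Roughly 13 GB in fp16 versus about 3.3 GB at 4-bit, which is why a 24 GB RTX 3090 has room left over for the LoRA adapters, optimiser state, and activations.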
The practical rule
LoRA has made fine-tuning accessible. You can fine-tune a Mistral 7B model on a consumer GPU in a few hours for under $10. The results are close to full fine-tuning quality for most tasks. Unless you need to fundamentally change the model's architecture or add entirely new capabilities, LoRA is your starting point.
When to fine-tune
Fine-tuning is often the wrong tool. Here's a decision framework for choosing between prompting, RAG, and fine-tuning.
Can prompt engineering solve it?
Good instructions + examples in the prompt often match fine-tuned performance. Always try prompting first.
Use prompting
Do you need access to private or real-time documents?
RAG retrieves from external knowledge at query time. Fine-tuning doesn't add new facts — it changes behaviour patterns.
Use RAG
Do you need consistent format, tone, or domain vocabulary?
Legal briefings, medical summaries, specific code styles — fine-tuning teaches these patterns persistently without needing prompt instructions every time.
Fine-tune
Is inference cost a constraint?
A fine-tuned 7B model can match GPT-4 on narrow tasks at 1/20th the inference cost. Fine-tuning smaller models for production is a serious cost strategy.
Fine-tune
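The cost argument is a break-even calculation. A sketch with entirely hypothetical per-token prices — the dollar figures below are invented for illustration and preserve only the ~1/20th ratio mentioned above, so substitute your own quotes:

```python
# Hypothetical prices -- illustrative only, not real quotes.
big_model_price = 10.00      # $/1M tokens, large hosted model
small_ft_price  = 0.50       # $/1M tokens, self-hosted 7B (~1/20th)
fine_tune_cost  = 50.00      # one-off LoRA training run

tokens_per_month = 200_000_000   # assumed monthly traffic

monthly_big   = tokens_per_month / 1e6 * big_model_price
monthly_small = tokens_per_month / 1e6 * small_ft_price

# Months until the one-off training cost pays for itself.
break_even_months = fine_tune_cost / (monthly_big - monthly_small)
print(f"large model:  ${monthly_big:,.0f}/month")
print(f"fine-tuned:   ${monthly_small:,.0f}/month")
print(f"break-even after {break_even_months:.2f} months")
```

At any non-trivial traffic volume the one-off training cost amortises almost immediately; the recurring inference saving is where the strategy pays off.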
Do you have <500 high-quality examples?
Fine-tuning needs 100–10,000 examples, and fewer than a few hundred usually won't produce reliable improvements. Consider few-shot prompting instead.
Use prompting
What it costs in 2026
| Method | Model size | Examples needed | Approx. cost | Provider |
|---|---|---|---|---|
| OpenAI fine-tuning API | GPT-4o mini | 100–10K | $3–$40 | OpenAI |
| LoRA via cloud | Llama 3.1 8B | 500–50K | $5–$100 | Modal, Together AI, Replicate |
| QLoRA local | Mistral 7B | 100–20K | Electricity only | Your own GPU |
| Full fine-tuning cloud | Llama 3.1 70B | 10K–1M | $500–$10,000 | AWS, GCP, Azure ML |
| Anthropic fine-tuning API | Claude Haiku | 100–5K | $10–$200 | Anthropic (limited access) |
FAQ
Does fine-tuning teach the model new facts?
No — this is a common misconception. Fine-tuning adjusts how a model behaves and responds, not what it knows. It's better at learning styles, formats, and task-specific patterns than at memorising new factual information. For knowledge, use RAG. Fine-tuning a model on outdated company data won't give it access to new information — it'll just make it better at formatting responses in your company's style.
How much data do I need to fine-tune?
For LoRA on a 7B model: 100 high-quality examples can produce noticeable improvement on a narrow task. 1,000 examples is a solid starting point. 10,000+ produces robust results. Data quality matters far more than quantity — 200 excellent examples outperform 2,000 mediocre ones. Clean, consistent, representative examples are the key variable.
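"Clean and consistent" means every example follows the same schema. A minimal sketch of the chat-style JSONL format that most fine-tuning APIs (including OpenAI's) accept — one example per line, with the example content here invented for illustration:

```python
import json

# One training example: a system prompt fixing the role, a user turn,
# and the assistant turn showing the exact output style you want.
example = {
    "messages": [
        {"role": "system", "content": "You summarise contracts in UK legal style."},
        {"role": "user", "content": "Summarise: The lessee shall maintain the premises..."},
        {"role": "assistant", "content": "Obligation: the lessee maintains the premises in good repair..."},
    ]
}

# One JSON object per line; a real training file needs 100+ such lines.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```

Keeping every example in an identical structure is what lets a few hundred examples teach a persistent format.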
Will fine-tuning make a model forget things it already knew?
Yes — this is called catastrophic forgetting. Full fine-tuning can cause the model to lose general capabilities as it optimises for the training task. LoRA significantly mitigates this because the base weights are frozen. Using regularisation techniques like Elastic Weight Consolidation also helps. For most business use cases, LoRA's approach of adding adapters without touching base weights avoids the problem almost entirely.
Sources
[Hu] Hu et al. — "LoRA: Low-Rank Adaptation of Large Language Models", Microsoft Research (2021)
[Dettmers] Dettmers et al. — "QLoRA: Efficient Finetuning of Quantized LLMs" (2023)
[HELM] Stanford HELM — Holistic Evaluation of Language Models benchmark