March 24, 2026

What Is AI Fine-Tuning? How to Customise AI Models for Your Use Case

AI fine-tuning explained — full fine-tuning vs LoRA vs QLoRA, when fine-tuning beats prompting or RAG, real-world costs in 2026, and a decision framework.

Technical AI

What Is AI Fine-Tuning?

How to teach a pre-trained language model new tricks — without rebuilding it from scratch. Covers full fine-tuning, LoRA, QLoRA, and the decision framework for when to actually use it.

0.1%
Proportion of model weights updated during LoRA fine-tuning — vs 100% for full fine-tuning [Hu et al.]
$3–20
Cost to fine-tune a 7B parameter model using LoRA with 10,000 examples on cloud compute
3–10x
Performance improvement possible for domain-specific tasks when fine-tuning a smaller model vs prompting a larger general one [Stanford HELM]

A base language model like LLaMA or Mistral is trained on massive general datasets. It knows a lot but isn't specialised for anything. Fine-tuning takes this base model and continues training it on a much smaller, task-specific dataset — adjusting the model's weights to prioritise patterns relevant to your use case.

Think of it like hiring a very well-educated generalist and then giving them three months of intensive training in your specific domain. They don't forget everything they knew — but their responses now reflect your industry's language, formats, and conventions.

Full fine-tuning, LoRA, and QLoRA solve the same problem with different resource tradeoffs. Most practitioners today use LoRA or QLoRA.

Full fine-tuning
Full parameter fine-tuning
All model weights are updated during training. Maximum flexibility: the model can change its behaviour completely. Requires enormous compute, since the full weights, gradients, and optimiser state must all fit in GPU memory.
GPU required: 8x A100s for 7B model
Cost: $500–$5,000+
Best for: Complete behaviour change
LoRA
Low-Rank Adaptation
Freezes original weights. Adds small trainable matrices ("adapters") alongside attention layers. Only trains these tiny additions — the base model is untouched. Results are merged at inference.
GPU required: 1x A100 or RTX 4090
Cost: $3–$50
Best for: Most fine-tuning tasks
QLoRA
Quantised LoRA
LoRA on a quantised (4-bit precision) base model. Dramatically reduces memory requirements — allows fine-tuning large models on consumer hardware. Slight quality reduction vs full LoRA.
GPU required: 1x RTX 3090 (24GB)
Cost: $1–$20
Best for: Budget or local training
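The low-rank idea behind LoRA can be sketched in a few lines of NumPy. This is an illustration of the maths for a single weight matrix, not a training implementation; the dimensions, rank, and scaling follow common defaults but are otherwise arbitrary.

```python
import numpy as np

# A frozen weight matrix from one attention projection (dimensions illustrative).
d_out, d_in = 4096, 4096
W = np.random.randn(d_out, d_in).astype(np.float32)  # frozen: never updated

# LoRA adds two small trainable matrices of rank r << d alongside it.
r, alpha = 8, 16
A = np.random.randn(r, d_in).astype(np.float32) * 0.01  # trainable
B = np.zeros((d_out, r), dtype=np.float32)              # trainable, zero-init

# Effective weight at inference: the adapter product merges into the base matrix.
# Because B starts at zero, the merged weight equals W before any training.
W_merged = W + (alpha / r) * (B @ A)

# The trainable parameters are a tiny fraction of the frozen ones.
frozen = W.size
trainable = A.size + B.size
print(f"trainable fraction: {trainable / frozen:.4%}")  # prints 0.3906%
```

For a square matrix the trainable fraction works out to 2r/d, which is where headline figures like "0.1% of weights" come from once you account for only some layers getting adapters.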
The practical rule
LoRA has made fine-tuning accessible. You can fine-tune a Mistral 7B model on a consumer GPU in a few hours for under $10. The results are close to full fine-tuning quality for most tasks. Unless you need to fundamentally change the model's architecture or add entirely new capabilities, LoRA is your starting point.
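Why a 7B model fits on a consumer card once quantised comes down to simple arithmetic. The sketch below is a back-of-envelope lower bound for weight storage only; real footprints also include activations, gradients for the adapters, and framework overhead.

```python
# Bytes needed just to hold a 7B-parameter model's weights at each precision.
params = 7e9

bytes_per_param = {
    "fp32": 4.0,
    "fp16": 2.0,
    "4-bit (QLoRA)": 0.5,
}

for fmt, b in bytes_per_param.items():
    gib = params * b / 2**30
    print(f"{fmt:>14}: {gib:5.1f} GiB")

# fp16 weights alone (~13 GiB) leave little headroom on a 24 GB card once
# activations are added; 4-bit weights (~3.3 GiB) leave plenty.
```

This is why the QLoRA recipe, 4-bit base weights plus 16-bit adapters, brings a 7B fine-tune within reach of a single RTX 3090.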

Fine-tuning is often the wrong tool. Here's a decision framework for choosing between prompting, RAG, and fine-tuning.

Can prompt engineering solve it?
Good instructions + examples in the prompt often match fine-tuned performance. Always try prompting first.
Use prompting
Do you need access to private or real-time documents?
RAG retrieves from external knowledge at query time. Fine-tuning doesn't add new facts — it changes behaviour patterns.
Use RAG
Do you need consistent format, tone, or domain vocabulary?
Legal briefings, medical summaries, specific code styles — fine-tuning teaches these patterns persistently without needing prompt instructions every time.
Fine-tune
Is inference cost a constraint?
A fine-tuned 7B model can match GPT-4 on narrow tasks at 1/20th the inference cost. Fine-tuning smaller models for production is a serious cost strategy.
Fine-tune
Do you have fewer than 100 high-quality examples?
Fine-tuning needs 100–10,000 examples. Fewer than 100 usually won't produce reliable improvements. Consider few-shot prompting instead.
Use prompting
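The checklist above can be expressed as a small function. This is a hypothetical helper for illustration, not a library API; it walks the questions in order, with the data-volume check applied before recommending a fine-tune, since too little data rules one out regardless of the other answers.

```python
def choose_approach(
    prompting_works: bool,
    needs_external_knowledge: bool,
    needs_consistent_style: bool,
    inference_cost_bound: bool,
    num_examples: int,
) -> str:
    """Return the first matching recommendation from the decision framework."""
    if prompting_works:
        return "prompting"
    if needs_external_knowledge:
        return "RAG"
    if num_examples < 100:
        # Too little data for reliable fine-tuning gains.
        return "prompting (few-shot)"
    if needs_consistent_style or inference_cost_bound:
        return "fine-tune"
    return "prompting"

# A team with 2,000 examples that needs a strict output format:
print(choose_approach(False, False, True, False, 2000))  # prints fine-tune
```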
| Method | Model size | Examples needed | Approx cost | Provider |
|---|---|---|---|---|
| OpenAI fine-tuning API | GPT-4o mini | 100–10K | $3–$40 | OpenAI |
| LoRA via cloud | Llama 3.1 8B | 500–50K | $5–$100 | Modal, Together AI, Replicate |
| QLoRA local | Mistral 7B | 100–20K | Electricity only | Your own GPU |
| Full fine-tuning cloud | Llama 3.1 70B | 10K–1M | $500–$10,000 | AWS, GCP, Azure ML |
| Anthropic fine-tuning API | Claude Haiku | 100–5K | $10–$200 | Anthropic (limited access) |
Does fine-tuning teach the model new facts?
No — this is a common misconception. Fine-tuning adjusts how a model behaves and responds, not what it knows. It's better at learning styles, formats, and task-specific patterns than at memorising new factual information. For knowledge, use RAG. Fine-tuning a model on outdated company data won't give it access to new information — it'll just make it better at formatting responses in your company's style.
How much data do I need to fine-tune?
For LoRA on a 7B model: 100 high-quality examples can produce noticeable improvement on a narrow task. 1,000 examples is a solid starting point. 10,000+ produces robust results. Data quality matters far more than quantity — 200 excellent examples outperform 2,000 mediocre ones. Clean, consistent, representative examples are the key variable.
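What those examples look like on disk is usually one JSON object per line. The snippet below uses the chat-style JSONL layout that the OpenAI fine-tuning API expects (other toolchains use different field names), with illustrative placeholder content, plus a quick validity check of the kind worth running before any paid training job.

```python
import json

# Training records in chat-style JSONL: one JSON object per line, each a full
# conversation ending with the assistant turn the model should learn to produce.
examples = [
    {
        "messages": [
            {"role": "system", "content": "You are a contract summariser."},
            {"role": "user", "content": "Summarise this clause in plain English: ..."},
            {"role": "assistant", "content": "The tenant must give 30 days' notice."},
        ]
    },
    # ... more examples, one object per line
]

with open("train.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Sanity check: every line parses and ends with an assistant turn.
with open("train.jsonl") as f:
    for line in f:
        msgs = json.loads(line)["messages"]
        assert msgs[-1]["role"] == "assistant"
```

Consistency checks like this are cheap insurance: a handful of malformed or inconsistently formatted records can measurably degrade a small fine-tuning run.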
Will fine-tuning make a model forget things it already knew?
Yes — this is called catastrophic forgetting. Full fine-tuning can cause the model to lose general capabilities as it optimises for the training task. LoRA significantly mitigates this because the base weights are frozen. Using regularisation techniques like Elastic Weight Consolidation also helps. For most business use cases, LoRA's approach of adding adapters without touching base weights avoids the problem almost entirely.

Sources

[Hu et al.] Hu et al. — "LoRA: Low-Rank Adaptation of Large Language Models", Microsoft Research (2021)
[Dettmers et al.] Dettmers et al. — "QLoRA: Efficient Finetuning of Quantized LLMs" (2023)
[Stanford HELM] Stanford HELM — Holistic Evaluation of Language Models benchmark

Written by Luke Madden, founder of Veltrix Collective. Data synthesis and analysis by Vel.