Why fine-tuning exists
0.1%
Proportion of model weights updated during LoRA fine-tuning — vs 100% for full fine-tuning [Hu et al.]
$3–20
Cost to fine-tune a 7B parameter model using LoRA with 10,000 examples on cloud compute
3–10x
Performance improvement possible for domain-specific tasks when fine-tuning a smaller model vs prompting a larger general one [Stanford HELM]
A base language model like LLaMA or Mistral is trained on massive general datasets. It knows a lot but isn't specialised for anything. Fine-tuning takes this base model and continues training it on a much smaller, task-specific dataset — adjusting the model's weights to prioritise patterns relevant to your use case.
Think of it like hiring a very well-educated generalist and then giving them three months of intensive training in your specific domain. They don't forget everything they knew — but their responses now reflect your industry's language, formats, and conventions.
The three methods
Full fine-tuning, LoRA, and QLoRA solve the same problem with different resource tradeoffs. Most practitioners today use LoRA or QLoRA.
Full fine-tuning
Full parameter fine-tuning
All model weights are updated during training. Maximum flexibility: the model's behaviour can change completely. Requires enormous compute, because the weights, gradients, and optimiser state must all fit in GPU memory at once.
GPU required: 8x A100s for 7B model
Cost: $500–$5,000+
Best for: Complete behaviour change
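The multi-GPU requirement follows from simple memory arithmetic. A rough sketch for a 7B model under a standard mixed-precision Adam setup (the bytes-per-parameter figures are the common fp16/fp32 convention; activations and framework overhead are ignored, so treat this as a lower bound):

```python
params = 7_000_000_000
GB = 1024 ** 3

# Typical mixed-precision training footprint per parameter:
weights = params * 2          # fp16 weights
grads   = params * 2          # fp16 gradients
adam    = params * 4 * 2      # fp32 Adam first + second moments
master  = params * 4          # fp32 master copy of the weights

total = weights + grads + adam + master
print(f"~{total / GB:.0f} GB before activations")
```

That comes to roughly 104 GB before a single activation is stored, which is why full fine-tuning of even a 7B model spills across several 80 GB A100s.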
LoRA
Low-Rank Adaptation
Freezes original weights. Adds small trainable matrices ("adapters") alongside attention layers. Only these tiny additions are trained; the base model is untouched. The adapters can be merged back into the base weights before inference, so they add no latency.
GPU required: 1x A100 or RTX 4090
Cost: $3–$50
Best for: Most fine-tuning tasks
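The headline 0.1% figure is easy to reproduce with a back-of-the-envelope count. A sketch assuming a LLaMA-7B-like shape (32 layers, hidden size 4096, the four attention projections adapted at rank r = 8; these dimensions and the choice of target matrices are illustrative assumptions, not prescriptions):

```python
# LoRA replaces a full d x d weight update with two small matrices:
# B (d x r) and A (r x d), so delta_W = B @ A has rank r << d.
hidden  = 4096        # hidden size (illustrative, LLaMA-7B-like)
layers  = 32          # transformer layers (illustrative)
rank    = 8           # LoRA rank r
targets = 4           # q/k/v/o attention projections per layer

full_params = 7_000_000_000   # base model size

# Each adapted projection gains B (hidden*rank) + A (rank*hidden) params.
lora_per_matrix = hidden * rank + rank * hidden
trainable = layers * targets * lora_per_matrix

print(f"trainable LoRA params: {trainable:,}")
print(f"fraction of base model: {trainable / full_params:.4%}")
```

Under these assumptions the count comes out to about 8.4M trainable parameters, roughly 0.12% of the base model — in line with the ~0.1% figure quoted above.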
QLoRA
Quantised LoRA
LoRA on a quantised (4-bit precision) base model. Dramatically reduces memory requirements — allows fine-tuning large models on consumer hardware. Slight quality reduction vs full LoRA.
GPU required: 1x RTX 3090 (24GB)
Cost: $1–$20
Best for: Budget or local training
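The memory saving from quantisation is again simple arithmetic. A sketch of the weight footprint of a 7B model at different precisions (adapter weights, activations, and dequantisation overhead are ignored, so these are lower bounds):

```python
params = 7_000_000_000
GB = 1024 ** 3

# Bytes needed for the frozen base weights at each precision.
for name, bits in [("fp16", 16), ("int8", 8), ("nf4 / 4-bit", 4)]:
    bytes_needed = params * bits // 8
    print(f"{name:>12}: {bytes_needed / GB:.1f} GB just for weights")
```

Roughly 13 GB in fp16 versus about 3.3 GB at 4-bit, which is why a 24 GB RTX 3090 has room left over for the LoRA adapters, optimiser state, and activations.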
The practical rule
LoRA has made fine-tuning accessible. You can fine-tune a Mistral 7B model on a consumer GPU in a few hours for under $10. The results are close to full fine-tuning quality for most tasks. Unless you need to fundamentally change the model's architecture or add entirely new capabilities, LoRA is your starting point.
When to fine-tune
Fine-tuning is often the wrong tool. Here's a decision framework for choosing between prompting, RAG, and fine-tuning.
Can prompt engineering solve it?
Good instructions + examples in the prompt often match fine-tuned performance. Always try prompting first.
Use prompting
Do you need access to private or real-time documents?
RAG retrieves from external knowledge at query time. Fine-tuning doesn't add new facts — it changes behaviour patterns.
Use RAG
Do you need consistent format, tone, or domain vocabulary?
Legal briefings, medical summaries, specific code styles — fine-tuning teaches these patterns persistently without needing prompt instructions every time.
Fine-tune
Is inference cost a constraint?
A fine-tuned 7B model can match GPT-4 on narrow tasks at 1/20th the inference cost. Fine-tuning smaller models for production is a serious cost strategy.
Fine-tune
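The cost argument is a break-even calculation. A sketch with entirely hypothetical per-token prices — the dollar figures below are invented for illustration and preserve only the ~1/20th ratio mentioned above, so substitute your own quotes:

```python
# Hypothetical prices -- illustrative only, not real quotes.
big_model_price = 10.00      # $/1M tokens, large hosted model
small_ft_price  = 0.50       # $/1M tokens, self-hosted 7B (~1/20th)
fine_tune_cost  = 50.00      # one-off LoRA training run

tokens_per_month = 200_000_000   # assumed monthly traffic

monthly_big   = tokens_per_month / 1e6 * big_model_price
monthly_small = tokens_per_month / 1e6 * small_ft_price

# Months until the one-off training cost pays for itself.
break_even_months = fine_tune_cost / (monthly_big - monthly_small)
print(f"large model:  ${monthly_big:,.0f}/month")
print(f"fine-tuned:   ${monthly_small:,.0f}/month")
print(f"break-even after {break_even_months:.2f} months")
```

At any non-trivial traffic volume the one-off training cost amortises almost immediately; the recurring inference saving is where the strategy pays off.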
Do you have <500 high-quality examples?
Fine-tuning needs 100–10,000 examples, and fewer than a few hundred usually won't produce reliable improvements. Consider few-shot prompting instead.
Use prompting
What it costs in 2026
| Method | Model size | Examples needed | Approx. cost | Provider |
|---|---|---|---|---|
| OpenAI fine-tuning API | GPT-4o mini | 100–10K | $3–$40 | OpenAI |
| LoRA via cloud | Llama 3.1 8B | 500–50K | $5–$100 | Modal, Together AI, Replicate |
| QLoRA local | Mistral 7B | 100–20K | Electricity only | Your own GPU |
| Full fine-tuning cloud | Llama 3.1 70B | 10K–1M | $500–$10,000 | AWS, GCP, Azure ML |
| Anthropic fine-tuning API | Claude Haiku | 100–5K | $10–$200 | Anthropic (limited access) |
FAQ
Does fine-tuning teach the model new facts?
No — this is a common misconception. Fine-tuning adjusts how a model behaves and responds, not what it knows. It's better at learning styles, formats, and task-specific patterns than at memorising new factual information. For knowledge, use RAG. Fine-tuning a model on outdated company data won't give it access to new information — it'll just make it better at formatting responses in your company's style.
How much data do I need to fine-tune?
For LoRA on a 7B model: 100 high-quality examples can produce noticeable improvement on a narrow task. 1,000 examples is a solid starting point. 10,000+ produces robust results. Data quality matters far more than quantity — 200 excellent examples outperform 2,000 mediocre ones. Clean, consistent, representative examples are the key variable.
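"Clean and consistent" means every example follows the same schema. A minimal sketch of the chat-style JSONL format that most fine-tuning APIs (including OpenAI's) accept — one example per line, with the example content here invented for illustration:

```python
import json

# One training example: a system prompt fixing the role, a user turn,
# and the assistant turn showing the exact output style you want.
example = {
    "messages": [
        {"role": "system", "content": "You summarise contracts in UK legal style."},
        {"role": "user", "content": "Summarise: The lessee shall maintain the premises..."},
        {"role": "assistant", "content": "Obligation: the lessee maintains the premises in good repair..."},
    ]
}

# One JSON object per line; a real training file needs 100+ such lines.
with open("train.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```

Keeping every example in an identical structure is what lets a few hundred examples teach a persistent format.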
Will fine-tuning make a model forget things it already knew?
Yes — this is called catastrophic forgetting. Full fine-tuning can cause the model to lose general capabilities as it optimises for the training task. LoRA significantly mitigates this because the base weights are frozen. Using regularisation techniques like Elastic Weight Consolidation also helps. For most business use cases, LoRA's approach of adding adapters without touching base weights avoids the problem almost entirely.
Sources
[Hu] Hu et al. — "LoRA: Low-Rank Adaptation of Large Language Models", Microsoft Research (2021)
[Dettmers] Dettmers et al. — "QLoRA: Efficient Finetuning of Quantized LLMs" (2023)
[HELM] Stanford HELM — Holistic Evaluation of Language Models benchmark