Updated Weekly
LLM Rankings
Every major LLM, ranked across six criteria. Veltrix reads benchmark reports weekly and updates scores automatically. No opinions, just data.
| # | Model | Overall | Coding | Reasoning | Creativity | Speed | Cost Eff. | Context | Input $/1M | Access |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4 (Anthropic) | 96 | 95 | 97 | 96 | 72 | 65 | 200K | $15.00 | API |
| 2 | o3 (OpenAI) | 95 | 97 | 98 | 85 | 45 | 55 | 200K | $10.00 | API |
| 3 | Claude Sonnet 4 (Anthropic) | 93 | 92 | 93 | 91 | 88 | 88 | 200K | $3.00 | API |
| 4 | GPT-4o (OpenAI) | 91 | 90 | 90 | 88 | 85 | 80 | 128K | $2.50 | API |
| 5 | Gemini 2.0 Pro (Google) | 90 | 89 | 91 | 87 | 78 | 72 | 2M | $1.25 | API |
| 6 | Gemini 2.0 Flash (Google) | 88 | 87 | 88 | 84 | 95 | 92 | 1M | $0.10 | API |
| 7 | Mistral Large 2 (Mistral) | 85 | 86 | 84 | 82 | 87 | 85 | 128K | $2.00 | API |
| 8 | Llama 3.3 70B (Meta) | 83 | 82 | 83 | 80 | 88 | 97 | 128K | $0.23 | API |
Want the full prompt comparison tool? Test any prompt across models in real time.
Try LLM Prompt Tester →