Aider LLM Leaderboards

Aider works best with LLMs that are skilled at both writing and editing code. It uses benchmarks to evaluate an LLM’s ability to follow instructions and edit code successfully without human intervention. Aider’s polyglot benchmark tests LLMs on 225 challenging Exercism coding exercises across C++, Go, Java, JavaScript, Python, and Rust.
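To make the percentages concrete, a score can be converted back to an approximate number of solved exercises. This is a minimal sketch, assuming each "Percent correct" value is the share of the 225 exercises solved, rounded to one decimal place; the helper name `solved_count` is hypothetical and not part of aider.

```python
TOTAL_EXERCISES = 225  # benchmark size stated in the text above


def solved_count(percent_correct: float, total: int = TOTAL_EXERCISES) -> int:
    """Approximate number of exercises solved for a percent-correct score.

    Assumes the score is (solved / total) * 100 rounded to one decimal,
    so rounding the product recovers the nearest whole count.
    """
    return round(percent_correct / 100 * total)


# Two scores taken from the leaderboard: the top and bottom entries.
for name, pct in [("o3 (high) + gpt-4.1", 82.7), ("gpt-4o-mini-2024-07-18", 3.6)]:
    print(f"{name}: about {solved_count(pct)} of {TOTAL_EXERCISES} exercises")
```

Because the published scores are rounded, the recovered count is an estimate rather than an exact figure from the benchmark logs.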

Aider polyglot coding leaderboard

| Model | Percent correct | Cost | Command | Correct edit format | Edit format |
|---|---|---|---|---|---|
| o3 (high) + gpt-4.1 | 82.7% | $69.29 | aider --model o3 --architect | 100.0% | architect |
| o3 (high) | 79.6% | $111.03 | aider --model o3 | 95.1% | diff |
| Gemini 2.5 Pro Preview 03-25 | 72.9% | $6.32 | aider --model gemini/gemini-2.5-pro-preview-03-25 | 92.4% | diff-fenced |
| o4-mini (high) | 72.0% | $19.64 | aider --model o4-mini | 90.7% | diff |
| claude-3-7-sonnet-20250219 (32k thinking tokens) | 64.9% | $36.83 | aider --model anthropic/claude-3-7-sonnet-20250219 --thinking-tokens 32k | 97.8% | diff |
| DeepSeek R1 + claude-3-5-sonnet-20241022 | 64.0% | $13.29 | aider --architect --model r1 --editor-model sonnet | 100.0% | architect |
| o1-2024-12-17 (high) | 61.7% | $186.50 | aider --model openrouter/openai/o1 | 91.5% | diff |
| claude-3-7-sonnet-20250219 (no thinking) | 60.4% | $17.72 | aider --model sonnet | 93.3% | diff |
| o3-mini (high) | 60.4% | $18.16 | aider --model o3-mini --reasoning-effort high | 93.3% | diff |
| DeepSeek R1 | 56.9% | $5.42 | aider --model deepseek/deepseek-reasoner | 96.9% | diff |
| DeepSeek V3 (0324) | 55.1% | $1.12 | aider --model deepseek/deepseek-chat | 99.6% | diff |
| Quasar Alpha | 54.7% | | aider --model openrouter/openrouter/quasar-alpha | 98.2% | diff |
| o3-mini (medium) | 53.8% | $8.86 | aider --model o3-mini | 95.1% | diff |
| Grok 3 Beta | 53.3% | $11.03 | aider --model openrouter/x-ai/grok-3-beta | 99.6% | diff |
| Optimus Alpha | 52.9% | | aider --model openrouter/openrouter/optimus-alpha | 97.3% | diff |
| gpt-4.1 | 52.4% | $9.86 | aider --model gpt-4.1 | 98.2% | diff |
| claude-3-5-sonnet-20241022 | 51.6% | $14.41 | aider --model claude-3-5-sonnet-20241022 | 99.6% | diff |
| Grok 3 Mini Beta (high) | 49.3% | $0.73 | aider --model xai/grok-3-mini-beta --reasoning-effort high | 99.6% | whole |
| DeepSeek Chat V3 (prev) | 48.4% | $0.34 | aider --model deepseek/deepseek-chat | 98.7% | diff |
| gemini-2.5-flash-preview-04-17 (default) | 47.1% | $1.85 | aider --model gemini/gemini-2.5-flash-preview-04-17 | 85.3% | diff |
| chatgpt-4o-latest (2025-03-29) | 45.3% | $19.74 | aider --model chatgpt-4o-latest | 64.4% | diff |
| gpt-4.5-preview | 44.9% | $183.18 | aider --model openai/gpt-4.5-preview | 97.3% | diff |
| gemini-exp-1206 | 38.2% | | aider --model gemini/gemini-exp-1206 | 98.2% | whole |
| Gemini 2.0 Pro exp-02-05 | 35.6% | | aider --model gemini/gemini-2.0-pro-exp-02-05 | 100.0% | whole |
| Grok 3 Mini Beta (low) | 34.7% | $0.79 | aider --model openrouter/x-ai/grok-3-mini-beta | 100.0% | whole |
| o1-mini-2024-09-12 | 32.9% | $18.58 | aider --model o1-mini | 96.9% | whole |
| gpt-4.1-mini | 32.4% | $1.99 | aider --model gpt-4.1-mini | 92.4% | diff |
| claude-3-5-haiku-20241022 | 28.0% | $6.06 | aider --model claude-3-5-haiku-20241022 | 91.1% | diff |
| chatgpt-4o-latest (2025-02-15) | 27.1% | $14.37 | aider --model chatgpt-4o-latest | 93.3% | diff |
| QwQ-32B + Qwen 2.5 Coder Instruct | 26.2% | | aider --model fireworks_ai/accounts/fireworks/models/qwq-32b --architect | 100.0% | architect |
| gpt-4o-2024-08-06 | 23.1% | $7.03 | aider --model gpt-4o-2024-08-06 | 94.2% | diff |
| gemini-2.0-flash-exp | 22.2% | | aider --model gemini/gemini-2.0-flash-exp | 100.0% | whole |
| qwen-max-2025-01-25 | 21.8% | | OPENAI_API_BASE=https://dashscope-intl.aliyuncs.com/compatible-mode/v1 aider --model openai/qwen-max-2025-01-25 | 90.2% | diff |
| QwQ-32B | 20.9% | | aider --model fireworks_ai/accounts/fireworks/models/qwq-32b | 67.6% | diff |
| gemini-2.0-flash-thinking-exp-01-21 | 18.2% | | aider --model gemini/gemini-2.0-flash-thinking-exp-01-21 | 77.8% | diff |
| gpt-4o-2024-11-20 | 18.2% | $6.74 | aider --model gpt-4o-2024-11-20 | 95.1% | diff |
| DeepSeek Chat V2.5 | 17.8% | $0.51 | aider --model deepseek/deepseek-chat | 92.9% | diff |
| Qwen2.5-Coder-32B-Instruct | 16.4% | | aider --model openai/Qwen2.5-Coder-32B-Instruct | 99.6% | whole |
| Llama 4 Maverick | 15.6% | | aider --model nvidia_nim/meta/llama-4-maverick-17b-128e-instruct | 99.1% | whole |
| yi-lightning | 12.9% | | aider --model openai/yi-lightning | 92.9% | whole |
| command-a-03-2025-quality | 12.0% | | OPENAI_API_BASE=https://api.cohere.ai/compatibility/v1 aider --model openai/command-a-03-2025-quality | 99.6% | whole |
| Codestral 25.01 | 11.1% | $1.98 | aider --model mistral/codestral-latest | 100.0% | whole |
| openhands-lm-32b-v0.1 | 10.2% | | aider --model openrouter/all-hands/openhands-lm-32b-v0.1 | 95.1% | whole |
| gpt-4.1-nano | 8.9% | $0.43 | aider --model gpt-4.1-nano | 94.2% | whole |
| Qwen2.5-Coder-32B-Instruct | 8.0% | | aider --model openai/Qwen/Qwen2.5-Coder-32B-Instruct # via hyperbolic | 71.6% | diff |
| gemma-3-27b-it | 4.9% | | aider --model openrouter/google/gemma-3-27b-it | 100.0% | whole |
| gpt-4o-mini-2024-07-18 | 3.6% | $0.32 | aider --model gpt-4o-mini-2024-07-18 | 100.0% | whole |
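
One way to compare entries on both accuracy and price is cost per solved exercise. The sketch below rests on two assumptions the leaderboard itself does not state: that "Cost" is the total cost in US dollars of one full benchmark run, and that "Percent correct" is the share of the 225 exercises solved. The `Row` type and `cost_per_solved` helper are hypothetical names used only for illustration. Entries with no listed cost are skipped, not treated as free.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    model: str
    percent_correct: float
    cost_usd: Optional[float]  # None when the leaderboard lists no cost


# A few rows copied from the leaderboard above.
ROWS = [
    Row("o3 (high) + gpt-4.1", 82.7, 69.29),
    Row("Gemini 2.5 Pro Preview 03-25", 72.9, 6.32),
    Row("DeepSeek V3 (0324)", 55.1, 1.12),
    Row("Quasar Alpha", 54.7, None),
]


def cost_per_solved(row: Row, total: int = 225) -> Optional[float]:
    """Dollars spent per solved exercise, or None if it cannot be computed."""
    if row.cost_usd is None:
        return None
    solved = round(row.percent_correct / 100 * total)  # approximate count
    if solved == 0:
        return None
    return row.cost_usd / solved


# Rank only the rows that have a usable cost, cheapest per solve first.
ranked = sorted(
    (r for r in ROWS if cost_per_solved(r) is not None),
    key=cost_per_solved,
)
for r in ranked:
    print(f"{r.model}: ${cost_per_solved(r):.3f} per solved exercise")
```

This metric favors cheap models even when they solve fewer exercises, so it complements the percent-correct ranking and does not replace it.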
