o3 (high) + gpt-4.1 | 82.7% | $69.29 | aider --model o3 --architect | 100.0% | architect
Dirname: 2025-04-17-01-20-35--o3-mini-high-diff-arch
Test cases: 225
Model: o3 (high) + gpt-4.1
Edit format: architect
Commit hash: 80909e1-dirty
Editor model: gpt-4.1
Editor edit format: editor-diff
Pass rate 1: 36.0
Pass rate 2: 82.7
Pass num 1: 81
Pass num 2: 186
Percent cases well formed: 100.0
Error outputs: 9
Num malformed responses: 0
Num with malformed responses: 0
User asks: 166
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 0
Total tests: 225
Command: aider --model o3 --architect
Date: 2025-04-17
Versions: 0.82.2.dev
Seconds per case: 110.0
Total cost: 69.2921

o3 (high) | 79.6% | $111.03 | aider --model o3 | 95.1% | diff
Dirname: 2025-04-16-21-20-55--o3-high-diff-temp0-exsys
Test cases: 225
Model: o3 (high)
Edit format: diff
Commit hash: 24805ff-dirty
Pass rate 1: 36.9
Pass rate 2: 79.6
Pass num 1: 83
Pass num 2: 179
Percent cases well formed: 95.1
Error outputs: 11
Num malformed responses: 11
Num with malformed responses: 11
User asks: 110
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 2
Total tests: 225
Command: aider --model o3
Date: 2025-04-16
Versions: 0.82.1.dev
Seconds per case: 113.8
Total cost: 111.0325

Gemini 2.5 Pro Preview 03-25 | 72.9% | $6.32 | aider --model gemini/gemini-2.5-pro-preview-03-25 | 92.4% | diff-fenced
Dirname: 2025-04-12-04-55-50--gemini-25-pro-diff-fenced
Test cases: 225
Model: Gemini 2.5 Pro Preview 03-25
Edit format: diff-fenced
Commit hash: 0282574
Pass rate 1: 40.9
Pass rate 2: 72.9
Pass num 1: 92
Pass num 2: 164
Percent cases well formed: 92.4
Error outputs: 21
Num malformed responses: 21
Num with malformed responses: 17
User asks: 69
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 2
Total tests: 225
Command: aider --model gemini/gemini-2.5-pro-preview-03-25
Date: 2025-04-12
Versions: 0.81.3.dev
Seconds per case: 45.3
Total cost: 6.3174

o4-mini (high) | 72.0% | $19.64 | aider --model o4-mini | 90.7% | diff
Dirname: 2025-04-16-22-01-58--o4-mini-high-diff-exsys
Test cases: 225
Model: o4-mini (high)
Edit format: diff
Commit hash: b66901f-dirty
Pass rate 1: 19.6
Pass rate 2: 72.0
Pass num 1: 44
Pass num 2: 162
Percent cases well formed: 90.7
Error outputs: 26
Num malformed responses: 24
Num with malformed responses: 21
User asks: 66
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 2
Total tests: 225
Command: aider --model o4-mini
Date: 2025-04-16
Versions: 0.82.1.dev
Seconds per case: 176.5
Total cost: 19.6399

claude-3-7-sonnet-20250219 (32k thinking tokens) | 64.9% | $36.83 | aider --model anthropic/claude-3-7-sonnet-20250219 --thinking-tokens 32k | 97.8% | diff
Dirname: 2025-02-24-21-47-23--sonnet37-diff-think-32k-64k
Test cases: 225
Model: claude-3-7-sonnet-20250219 (32k thinking tokens)
Edit format: diff
Commit hash: 60d11a6, 93edbda
Pass rate 1: 29.3
Pass rate 2: 64.9
Pass num 1: 66
Pass num 2: 146
Percent cases well formed: 97.8
Error outputs: 66
Num malformed responses: 5
Num with malformed responses: 5
User asks: 5
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 1
Total tests: 225
Command: aider --model anthropic/claude-3-7-sonnet-20250219 --thinking-tokens 32k
Date: 2025-02-24
Versions: 0.75.1.dev
Seconds per case: 105.2
Total cost: 36.8343

DeepSeek R1 + claude-3-5-sonnet-20241022 | 64.0% | $13.29 | aider --architect --model r1 --editor-model sonnet | 100.0% | architect
Dirname: 2025-01-23-19-14-48--r1-architect-sonnet
Test cases: 225
Model: DeepSeek R1 + claude-3-5-sonnet-20241022
Edit format: architect
Commit hash: 05a77c7
Editor model: claude-3-5-sonnet-20241022
Editor edit format: editor-diff
Pass rate 1: 27.1
Pass rate 2: 64.0
Pass num 1: 61
Pass num 2: 144
Percent cases well formed: 100.0
Error outputs: 2
Num malformed responses: 0
Num with malformed responses: 0
User asks: 392
Lazy comments: 6
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 5
Total tests: 225
Command: aider --architect --model r1 --editor-model sonnet
Date: 2025-01-23
Versions: 0.72.3.dev
Seconds per case: 251.6
Total cost: 13.2933

o1-2024-12-17 (high) | 61.7% | $186.50 | aider --model openrouter/openai/o1 | 91.5% | diff
Dirname: 2024-12-21-19-23-03--polyglot-o1-hard-diff
Test cases: 224
Model: o1-2024-12-17 (high)
Edit format: diff
Commit hash: a755079-dirty
Pass rate 1: 23.7
Pass rate 2: 61.7
Pass num 1: 53
Pass num 2: 139
Percent cases well formed: 91.5
Error outputs: 25
Num malformed responses: 24
Num with malformed responses: 19
User asks: 16
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 2
Total tests: 225
Command: aider --model openrouter/openai/o1
Date: 2024-12-21
Versions: 0.69.2.dev
Seconds per case: 133.2
Total cost: 186.4958

claude-3-7-sonnet-20250219 (no thinking) | 60.4% | $17.72 | aider --model sonnet | 93.3% | diff
Dirname: 2025-02-24-19-54-07--sonnet37-diff
Test cases: 225
Model: claude-3-7-sonnet-20250219 (no thinking)
Edit format: diff
Commit hash: 75e9ee6
Pass rate 1: 24.4
Pass rate 2: 60.4
Pass num 1: 55
Pass num 2: 136
Percent cases well formed: 93.3
Error outputs: 16
Num malformed responses: 16
Num with malformed responses: 15
User asks: 12
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 0
Total tests: 225
Command: aider --model sonnet
Date: 2025-02-24
Versions: 0.74.4.dev
Seconds per case: 28.3
Total cost: 17.7191

o3-mini (high) | 60.4% | $18.16 | aider --model o3-mini --reasoning-effort high | 93.3% | diff
Dirname: 2025-01-31-20-42-47--o3-mini-diff-high
Test cases: 224
Model: o3-mini (high)
Edit format: diff
Commit hash: b0d58d1-dirty
Pass rate 1: 21.0
Pass rate 2: 60.4
Pass num 1: 47
Pass num 2: 136
Percent cases well formed: 93.3
Error outputs: 26
Num malformed responses: 24
Num with malformed responses: 15
User asks: 19
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 7
Total tests: 225
Command: aider --model o3-mini --reasoning-effort high
Date: 2025-01-31
Versions: 0.72.4.dev
Seconds per case: 124.6
Total cost: 18.1584

DeepSeek R1 | 56.9% | $5.42 | aider --model deepseek/deepseek-reasoner | 96.9% | diff
Dirname: 2025-01-20-19-11-38--ds-turns-upd-cur-msgs-fix-with-summarizer
Test cases: 225
Model: DeepSeek R1
Edit format: diff
Commit hash: 5650697-dirty
Pass rate 1: 26.7
Pass rate 2: 56.9
Pass num 1: 60
Pass num 2: 128
Percent cases well formed: 96.9
Error outputs: 8
Num malformed responses: 7
Num with malformed responses: 7
User asks: 15
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 5
Total tests: 225
Command: aider --model deepseek/deepseek-reasoner
Date: 2025-01-20
Versions: 0.71.2.dev
Seconds per case: 113.7
Total cost: 5.4193

DeepSeek V3 (0324) | 55.1% | $1.12 | aider --model deepseek/deepseek-chat | 99.6% | diff
Dirname: 2025-03-24-15-41-33--deepseek-v3-0324-polyglot-diff
Test cases: 225
Model: DeepSeek V3 (0324)
Edit format: diff
Commit hash: 502b863
Pass rate 1: 28.0
Pass rate 2: 55.1
Pass num 1: 63
Pass num 2: 124
Percent cases well formed: 99.6
Error outputs: 32
Num malformed responses: 1
Num with malformed responses: 1
User asks: 96
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 2
Test timeouts: 4
Total tests: 225
Command: aider --model deepseek/deepseek-chat
Date: 2025-03-24
Versions: 0.78.1.dev
Seconds per case: 290.0
Total cost: 1.1164

Quasar Alpha | 54.7% | aider --model openrouter/openrouter/quasar-alpha | 98.2% | diff
Dirname: 2025-04-04-02-57-25--qalpha-diff-exsys
Test cases: 225
Model: Quasar Alpha
Edit format: diff
Commit hash: 8a34a6c-dirty
Pass rate 1: 21.8
Pass rate 2: 54.7
Pass num 1: 49
Pass num 2: 123
Percent cases well formed: 98.2
Error outputs: 4
Num malformed responses: 4
Num with malformed responses: 4
User asks: 187
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 4
Total tests: 225
Command: aider --model openrouter/openrouter/quasar-alpha
Date: 2025-04-04
Versions: 0.80.5.dev
Seconds per case: 14.8
Total cost: 0.0

o3-mini (medium) | 53.8% | $8.86 | aider --model o3-mini | 95.1% | diff
Dirname: 2025-01-31-20-27-46--o3-mini-diff2
Test cases: 225
Model: o3-mini (medium)
Edit format: diff
Commit hash: 2fb517b-dirty
Pass rate 1: 19.1
Pass rate 2: 53.8
Pass num 1: 43
Pass num 2: 121
Percent cases well formed: 95.1
Error outputs: 28
Num malformed responses: 28
Num with malformed responses: 11
User asks: 17
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 2
Total tests: 225
Command: aider --model o3-mini
Date: 2025-01-31
Versions: 0.72.4.dev
Seconds per case: 47.2
Total cost: 8.8599

Grok 3 Beta | 53.3% | $11.03 | aider --model openrouter/x-ai/grok-3-beta | 99.6% | diff
Dirname: 2025-04-10-04-21-31--grok3-diff-exuser
Test cases: 225
Model: Grok 3 Beta
Edit format: diff
Commit hash: 2dd40fc-dirty
Pass rate 1: 22.2
Pass rate 2: 53.3
Pass num 1: 50
Pass num 2: 120
Percent cases well formed: 99.6
Error outputs: 1
Num malformed responses: 1
Num with malformed responses: 1
User asks: 68
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 2
Total tests: 225
Command: aider --model openrouter/x-ai/grok-3-beta
Date: 2025-04-10
Versions: 0.81.2.dev
Seconds per case: 15.3
Total cost: 11.0338

Optimus Alpha | 52.9% | aider --model openrouter/openrouter/optimus-alpha | 97.3% | diff
Dirname: 2025-04-10-19-02-44--oalpha-diff-exsys
Test cases: 225
Model: Optimus Alpha
Edit format: diff
Commit hash: 532bc45-dirty
Pass rate 1: 21.3
Pass rate 2: 52.9
Pass num 1: 48
Pass num 2: 119
Percent cases well formed: 97.3
Error outputs: 7
Num malformed responses: 6
Num with malformed responses: 6
User asks: 182
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 3
Total tests: 225
Command: aider --model openrouter/openrouter/optimus-alpha
Date: 2025-04-10
Versions: 0.81.2.dev
Seconds per case: 18.4
Total cost: 0.0

gpt-4.1 | 52.4% | $9.86 | aider --model gpt-4.1 | 98.2% | diff
Dirname: 2025-04-14-21-05-54--gpt41-diff-exuser
Test cases: 225
Model: gpt-4.1
Edit format: diff
Commit hash: 7a87db5-dirty
Pass rate 1: 20.0
Pass rate 2: 52.4
Pass num 1: 45
Pass num 2: 118
Percent cases well formed: 98.2
Error outputs: 6
Num malformed responses: 5
Num with malformed responses: 4
User asks: 171
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 5
Total tests: 225
Command: aider --model gpt-4.1
Date: 2025-04-14
Versions: 0.81.4.dev
Seconds per case: 20.5
Total cost: 9.8556

claude-3-5-sonnet-20241022 | 51.6% | $14.41 | aider --model claude-3-5-sonnet-20241022 | 99.6% | diff
Dirname: 2025-01-17-19-44-33--sonnet-baseline-jan-17
Test cases: 225
Model: claude-3-5-sonnet-20241022
Edit format: diff
Commit hash: 6451d59
Pass rate 1: 22.2
Pass rate 2: 51.6
Pass num 1: 50
Pass num 2: 116
Percent cases well formed: 99.6
Error outputs: 2
Num malformed responses: 1
Num with malformed responses: 1
User asks: 11
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 8
Total tests: 225
Command: aider --model claude-3-5-sonnet-20241022
Date: 2025-01-17
Versions: 0.71.2.dev
Seconds per case: 21.4
Total cost: 14.4063

Grok 3 Mini Beta (high) | 49.3% | $0.73 | aider --model xai/grok-3-mini-beta --reasoning-effort high | 99.6% | whole
Dirname: 2025-04-10-23-59-02--xai-grok3-mini-whole-high
Test cases: 225
Model: Grok 3 Mini Beta (high)
Edit format: whole
Commit hash: 8ee33da-dirty
Pass rate 1: 17.3
Pass rate 2: 49.3
Pass num 1: 39
Pass num 2: 111
Percent cases well formed: 99.6
Error outputs: 1
Num malformed responses: 1
Num with malformed responses: 1
User asks: 64
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 0
Total tests: 225
Command: aider --model xai/grok-3-mini-beta --reasoning-effort high
Date: 2025-04-10
Versions: 0.81.3.dev
Seconds per case: 79.1
Total cost: 0.7346

DeepSeek Chat V3 (prev) | 48.4% | $0.34 | aider --model deepseek/deepseek-chat | 98.7% | diff
Dirname: 2024-12-25-13-31-51--deepseekv3preview-diff2
Test cases: 225
Model: DeepSeek Chat V3 (prev)
Edit format: diff
Commit hash: 0a23c4a-dirty
Pass rate 1: 22.7
Pass rate 2: 48.4
Pass num 1: 51
Pass num 2: 109
Percent cases well formed: 98.7
Error outputs: 7
Num malformed responses: 7
Num with malformed responses: 3
User asks: 19
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 8
Total tests: 225
Command: aider --model deepseek/deepseek-chat
Date: 2024-12-25
Versions: 0.69.2.dev
Seconds per case: 34.8
Total cost: 0.3369

gemini-2.5-flash-preview-04-17 (default) | 47.1% | $1.85 | aider --model gemini/gemini-2.5-flash-preview-04-17 | 85.3% | diff
Dirname: 2025-04-20-19-54-31--flash25-diff-no-think
Test cases: 225
Model: gemini-2.5-flash-preview-04-17 (default)
Edit format: diff
Commit hash: 7fcce5d-dirty
Pass rate 1: 21.8
Pass rate 2: 47.1
Pass num 1: 49
Pass num 2: 106
Percent cases well formed: 85.3
Error outputs: 60
Num malformed responses: 55
Num with malformed responses: 33
User asks: 82
Lazy comments: 1
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 5
Test timeouts: 4
Total tests: 225
Command: aider --model gemini/gemini-2.5-flash-preview-04-17
Date: 2025-04-20
Versions: 0.82.3.dev
Seconds per case: 50.1
Total cost: 1.8451

chatgpt-4o-latest (2025-03-29) | 45.3% | $19.74 | aider --model chatgpt-4o-latest | 64.4% | diff
Dirname: 2025-03-29-05-24-55--chatgpt4o-mar28-diff
Test cases: 225
Model: chatgpt-4o-latest (2025-03-29)
Edit format: diff
Commit hash: 0decbad
Pass rate 1: 16.4
Pass rate 2: 45.3
Pass num 1: 37
Pass num 2: 102
Percent cases well formed: 64.4
Error outputs: 85
Num malformed responses: 85
Num with malformed responses: 80
User asks: 174
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 4
Total tests: 225
Command: aider --model chatgpt-4o-latest
Date: 2025-03-29
Versions: 0.79.3.dev
Seconds per case: 10.3
Total cost: 19.7416

gpt-4.5-preview | 44.9% | $183.18 | aider --model openai/gpt-4.5-preview | 97.3% | diff
Dirname: 2025-02-27-20-26-15--gpt45-diff3
Test cases: 224
Model: gpt-4.5-preview
Edit format: diff
Commit hash: b462e55-dirty
Pass rate 1: 22.3
Pass rate 2: 44.9
Pass num 1: 50
Pass num 2: 101
Percent cases well formed: 97.3
Error outputs: 10
Num malformed responses: 8
Num with malformed responses: 6
User asks: 15
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 2
Total tests: 225
Command: aider --model openai/gpt-4.5-preview
Date: 2025-02-27
Versions: 0.75.2.dev
Seconds per case: 113.5
Total cost: 183.1802

gemini-exp-1206 | 38.2% | aider --model gemini/gemini-exp-1206 | 98.2% | whole
Dirname: 2024-12-22-18-43-25--gemini-exp-1206-polyglot-whole-2
Test cases: 225
Model: gemini-exp-1206
Edit format: whole
Commit hash: b1bc2f8
Pass rate 1: 19.6
Pass rate 2: 38.2
Pass num 1: 44
Pass num 2: 86
Percent cases well formed: 98.2
Error outputs: 8
Num malformed responses: 8
Num with malformed responses: 4
User asks: 32
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 9
Total tests: 225
Command: aider --model gemini/gemini-exp-1206
Date: 2024-12-22
Versions: 0.69.2.dev
Seconds per case: 45.5
Total cost: 0.0

Gemini 2.0 Pro exp-02-05 | 35.6% | aider --model gemini/gemini-2.0-pro-exp-02-05 | 100.0% | whole
Dirname: 2025-02-25-20-23-07--gemini-pro
Test cases: 225
Model: Gemini 2.0 Pro exp-02-05
Edit format: whole
Commit hash: 2fccd47
Pass rate 1: 20.4
Pass rate 2: 35.6
Pass num 1: 46
Pass num 2: 80
Percent cases well formed: 100.0
Error outputs: 430
Num malformed responses: 0
Num with malformed responses: 0
User asks: 13
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 5
Total tests: 225
Command: aider --model gemini/gemini-2.0-pro-exp-02-05
Date: 2025-02-25
Versions: 0.75.2.dev
Seconds per case: 34.8
Total cost: 0.0

Grok 3 Mini Beta (low) | 34.7% | $0.79 | aider --model openrouter/x-ai/grok-3-mini-beta | 100.0% | whole
Dirname: 2025-04-10-18-47-24--grok3-mini-whole-exuser
Test cases: 225
Model: Grok 3 Mini Beta (low)
Edit format: whole
Commit hash: 14ffe77-dirty
Pass rate 1: 11.1
Pass rate 2: 34.7
Pass num 1: 25
Pass num 2: 78
Percent cases well formed: 100.0
Error outputs: 3
Num malformed responses: 0
Num with malformed responses: 0
User asks: 73
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 5
Total tests: 225
Command: aider --model openrouter/x-ai/grok-3-mini-beta
Date: 2025-04-10
Versions: 0.81.2.dev
Seconds per case: 35.1
Total cost: 0.7856

o1-mini-2024-09-12 | 32.9% | $18.58 | aider --model o1-mini | 96.9% | whole
Dirname: 2024-12-22-21-26-35--polyglot-o1mini-whole
Test cases: 225
Model: o1-mini-2024-09-12
Edit format: whole
Commit hash: 37df899
Pass rate 1: 5.8
Pass rate 2: 32.9
Pass num 1: 13
Pass num 2: 74
Percent cases well formed: 96.9
Error outputs: 8
Num malformed responses: 8
Num with malformed responses: 7
User asks: 27
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 3
Total tests: 225
Command: aider --model o1-mini
Date: 2024-12-22
Versions: 0.69.2.dev
Seconds per case: 34.7
Total cost: 18.577

gpt-4.1-mini | 32.4% | $1.99 | aider --model gpt-4.1-mini | 92.4% | diff
Dirname: 2025-04-14-21-27-53--gpt41mini-diff
Test cases: 225
Model: gpt-4.1-mini
Edit format: diff
Commit hash: ffb743e-dirty
Pass rate 1: 11.1
Pass rate 2: 32.4
Pass num 1: 25
Pass num 2: 73
Percent cases well formed: 92.4
Error outputs: 64
Num malformed responses: 62
Num with malformed responses: 17
User asks: 159
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 2
Test timeouts: 2
Total tests: 225
Command: aider --model gpt-4.1-mini
Date: 2025-04-14
Versions: 0.81.4.dev
Seconds per case: 19.5
Total cost: 1.9918

claude-3-5-haiku-20241022 | 28.0% | $6.06 | aider --model claude-3-5-haiku-20241022 | 91.1% | diff
Dirname: 2024-12-21-21-46-27--polyglot-haiku-diff
Test cases: 225
Model: claude-3-5-haiku-20241022
Edit format: diff
Commit hash: a755079-dirty
Pass rate 1: 7.1
Pass rate 2: 28.0
Pass num 1: 16
Pass num 2: 63
Percent cases well formed: 91.1
Error outputs: 31
Num malformed responses: 30
Num with malformed responses: 20
User asks: 13
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 9
Total tests: 225
Command: aider --model claude-3-5-haiku-20241022
Date: 2024-12-21
Versions: 0.69.2.dev
Seconds per case: 31.8
Total cost: 6.0583

chatgpt-4o-latest (2025-02-15) | 27.1% | $14.37 | aider --model chatgpt-4o-latest | 93.3% | diff
Dirname: 2025-02-15-19-51-22--chatgpt4o-feb15-diff
Test cases: 223
Model: chatgpt-4o-latest (2025-02-15)
Edit format: diff
Commit hash: 108ce18-dirty
Pass rate 1: 9.0
Pass rate 2: 27.1
Pass num 1: 20
Pass num 2: 61
Percent cases well formed: 93.3
Error outputs: 66
Num malformed responses: 21
Num with malformed responses: 15
User asks: 57
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 2
Total tests: 225
Command: aider --model chatgpt-4o-latest
Date: 2025-02-15
Versions: 0.74.3.dev
Seconds per case: 12.4
Total cost: 14.3703

QwQ-32B + Qwen 2.5 Coder Instruct | 26.2% | aider --model fireworks_ai/accounts/fireworks/models/qwq-32b --architect | 100.0% | architect
Dirname: 2025-03-07-15-11-27--qwq32b-arch-temp-topp-again
Test cases: 225
Model: QwQ-32B + Qwen 2.5 Coder Instruct
Edit format: architect
Commit hash: 52162a5
Editor model: fireworks_ai/accounts/fireworks/models/qwen2p5-coder-32b-instruct
Editor edit format: editor-diff
Pass rate 1: 9.8
Pass rate 2: 26.2
Pass num 1: 22
Pass num 2: 59
Percent cases well formed: 100.0
Error outputs: 122
Num malformed responses: 0
Num with malformed responses: 0
User asks: 489
Lazy comments: 8
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 2
Total tests: 225
Command: aider --model fireworks_ai/accounts/fireworks/models/qwq-32b --architect
Date: 2025-03-07
Versions: 0.75.3.dev
Seconds per case: 137.4
Total cost: 0

gpt-4o-2024-08-06 | 23.1% | $7.03 | aider --model gpt-4o-2024-08-06 | 94.2% | diff
Dirname: 2024-12-30-20-44-54--gpt4o-ex-as-sys-clean-prompt
Test cases: 225
Model: gpt-4o-2024-08-06
Edit format: diff
Commit hash: 09ee197-dirty
Pass rate 1: 4.9
Pass rate 2: 23.1
Pass num 1: 11
Pass num 2: 52
Percent cases well formed: 94.2
Error outputs: 21
Num malformed responses: 21
Num with malformed responses: 13
User asks: 65
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 3
Total tests: 225
Command: aider --model gpt-4o-2024-08-06
Date: 2024-12-30
Versions: 0.70.1.dev
Seconds per case: 16.0
Total cost: 7.0286

gemini-2.0-flash-exp | 22.2% | aider --model gemini/gemini-2.0-flash-exp | 100.0% | whole
Dirname: 2024-12-22-20-08-13--gemini-2.0-flash-exp-polyglot-whole
Test cases: 225
Model: gemini-2.0-flash-exp
Edit format: whole
Commit hash: b1bc2f8
Pass rate 1: 11.6
Pass rate 2: 22.2
Pass num 1: 26
Pass num 2: 50
Percent cases well formed: 100.0
Error outputs: 1
Num malformed responses: 0
Num with malformed responses: 0
User asks: 9
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 8
Total tests: 225
Command: aider --model gemini/gemini-2.0-flash-exp
Date: 2024-12-22
Versions: 0.69.2.dev
Seconds per case: 12.2
Total cost: 0.0

qwen-max-2025-01-25 | 21.8% | OPENAI_API_BASE=https://dashscope-intl.aliyuncs.com/compatible-mode/v1 aider --model openai/qwen-max-2025-01-25 | 90.2% | diff
Dirname: 2025-01-28-16-00-03--qwen-max-2025-01-25-polyglot-diff
Test cases: 225
Model: qwen-max-2025-01-25
Edit format: diff
Commit hash: ae7d459
Pass rate 1: 9.3
Pass rate 2: 21.8
Pass num 1: 21
Pass num 2: 49
Percent cases well formed: 90.2
Error outputs: 46
Num malformed responses: 44
Num with malformed responses: 22
User asks: 23
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 9
Total tests: 225
Command: OPENAI_API_BASE=https://dashscope-intl.aliyuncs.com/compatible-mode/v1 aider --model openai/qwen-max-2025-01-25
Date: 2025-01-28
Versions: 0.72.4.dev
Seconds per case: 39.5

QwQ-32B | 20.9% | aider --model fireworks_ai/accounts/fireworks/models/qwq-32b | 67.6% | diff
Dirname: 2025-03-06-17-40-24--qwq32b-diff-temp-topp-ex-sys-remind-user-for-real
Test cases: 225
Model: QwQ-32B
Edit format: diff
Commit hash: 51d118f-dirty
Pass rate 1: 8.0
Pass rate 2: 20.9
Pass num 1: 18
Pass num 2: 47
Percent cases well formed: 67.6
Error outputs: 145
Num malformed responses: 143
Num with malformed responses: 73
User asks: 17
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 4
Total tests: 225
Command: aider --model fireworks_ai/accounts/fireworks/models/qwq-32b
Date: 2025-03-06
Versions: 0.75.3.dev
Seconds per case: 228.6
Total cost: 0.0

gemini-2.0-flash-thinking-exp-01-21 | 18.2% | aider --model gemini/gemini-2.0-flash-thinking-exp-01-21 | 77.8% | diff
Dirname: 2025-01-21-22-51-49--gemini-2.0-flash-thinking-exp-01-21-polyglot-diff
Test cases: 225
Model: gemini-2.0-flash-thinking-exp-01-21
Edit format: diff
Commit hash: 843720a
Pass rate 1: 5.8
Pass rate 2: 18.2
Pass num 1: 13
Pass num 2: 41
Percent cases well formed: 77.8
Error outputs: 182
Num malformed responses: 180
Num with malformed responses: 50
User asks: 26
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 2
Test timeouts: 7
Total tests: 225
Command: aider --model gemini/gemini-2.0-flash-thinking-exp-01-21
Date: 2025-01-21
Versions: 0.72.2.dev
Seconds per case: 24.2
Total cost: 0.0

gpt-4o-2024-11-20 | 18.2% | $6.74 | aider --model gpt-4o-2024-11-20 | 95.1% | diff
Dirname: 2024-12-30-20-57-12--gpt-4o-2024-11-20-ex-as-sys
Test cases: 225
Model: gpt-4o-2024-11-20
Edit format: diff
Commit hash: 09ee197-dirty
Pass rate 1: 4.9
Pass rate 2: 18.2
Pass num 1: 11
Pass num 2: 41
Percent cases well formed: 95.1
Error outputs: 12
Num malformed responses: 12
Num with malformed responses: 11
User asks: 53
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 12
Total tests: 225
Command: aider --model gpt-4o-2024-11-20
Date: 2024-12-30
Versions: 0.70.1.dev
Seconds per case: 12.1
Total cost: 6.7351

DeepSeek Chat V2.5 | 17.8% | $0.51 | aider --model deepseek/deepseek-chat | 92.9% | diff
Dirname: 2024-12-21-20-56-21--polyglot-deepseek-diff
Test cases: 225
Model: DeepSeek Chat V2.5
Edit format: diff
Commit hash: a755079-dirty
Pass rate 1: 5.3
Pass rate 2: 17.8
Pass num 1: 12
Pass num 2: 40
Percent cases well formed: 92.9
Error outputs: 42
Num malformed responses: 37
Num with malformed responses: 16
User asks: 23
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 5
Test timeouts: 5
Total tests: 225
Command: aider --model deepseek/deepseek-chat
Date: 2024-12-21
Versions: 0.69.2.dev
Seconds per case: 184.0
Total cost: 0.5101

Qwen2.5-Coder-32B-Instruct | 16.4% | aider --model openai/Qwen2.5-Coder-32B-Instruct | 99.6% | whole
Dirname: 2024-12-26-00-55-20--Qwen2.5-Coder-32B-Instruct
Test cases: 225
Model: Qwen2.5-Coder-32B-Instruct
Edit format: whole
Commit hash: b51768b0
Pass rate 1: 4.9
Pass rate 2: 16.4
Pass num 1: 11
Pass num 2: 37
Percent cases well formed: 99.6
Error outputs: 1
Num malformed responses: 1
Num with malformed responses: 1
User asks: 33
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 6
Total tests: 225
Command: aider --model openai/Qwen2.5-Coder-32B-Instruct
Date: 2024-12-26
Versions: 0.69.2.dev
Seconds per case: 42.0
Total cost: 0.0

Llama 4 Maverick | 15.6% | aider --model nvidia_nim/meta/llama-4-maverick-17b-128e-instruct | 99.1% | whole
Dirname: 2025-04-06-08-39-52--llama-4-maverick-17b-128e-instruct-polyglot-whole
Test cases: 225
Model: Llama 4 Maverick
Edit format: whole
Commit hash: 9445a31
Pass rate 1: 4.4
Pass rate 2: 15.6
Pass num 1: 10
Pass num 2: 35
Percent cases well formed: 99.1
Error outputs: 12
Num malformed responses: 2
Num with malformed responses: 2
User asks: 248
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 4
Total tests: 225
Command: aider --model nvidia_nim/meta/llama-4-maverick-17b-128e-instruct
Date: 2025-04-06
Versions: 0.81.2.dev
Seconds per case: 20.5
Total cost: 0.0

yi-lightning | 12.9% | aider --model openai/yi-lightning | 92.9% | whole
Dirname: 2024-12-23-01-11-56--yi-test
Test cases: 225
Model: yi-lightning
Edit format: whole
Commit hash: 2b1625e
Pass rate 1: 5.8
Pass rate 2: 12.9
Pass num 1: 13
Pass num 2: 29
Percent cases well formed: 92.9
Error outputs: 87
Num malformed responses: 72
Num with malformed responses: 16
User asks: 107
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 6
Total tests: 225
Command: aider --model openai/yi-lightning
Date: 2024-12-23
Versions: 0.69.2.dev
Seconds per case: 146.7
Total cost: 0.0

command-a-03-2025-quality | 12.0% | OPENAI_API_BASE=https://api.cohere.ai/compatibility/v1 aider --model openai/command-a-03-2025-quality | 99.6% | whole
Dirname: 2025-03-14-23-40-00--cmda-quality-whole2
Test cases: 225
Model: command-a-03-2025-quality
Edit format: whole
Commit hash: a1aa63f
Pass rate 1: 2.2
Pass rate 2: 12.0
Pass num 1: 5
Pass num 2: 27
Percent cases well formed: 99.6
Error outputs: 2
Num malformed responses: 1
Num with malformed responses: 1
User asks: 215
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 4
Total tests: 225
Command: OPENAI_API_BASE=https://api.cohere.ai/compatibility/v1 aider --model openai/command-a-03-2025-quality
Date: 2025-03-14
Versions: 0.77.1.dev
Seconds per case: 85.1
Total cost: 0.0

Codestral 25.01 | 11.1% | $1.98 | aider --model mistral/codestral-latest | 100.0% | whole
Dirname: 2025-01-13-18-17-25--codestral-whole2
Test cases: 225
Model: Codestral 25.01
Edit format: whole
Commit hash: 0cba898-dirty
Pass rate 1: 4.0
Pass rate 2: 11.1
Pass num 1: 9
Pass num 2: 25
Percent cases well formed: 100.0
Error outputs: 0
Num malformed responses: 0
Num with malformed responses: 0
User asks: 47
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 4
Total tests: 225
Command: aider --model mistral/codestral-latest
Date: 2025-01-13
Versions: 0.71.2.dev
Seconds per case: 9.3
Total cost: 1.9834

openhands-lm-32b-v0.1 | 10.2% | aider --model openrouter/all-hands/openhands-lm-32b-v0.1 | 95.1% | whole
Dirname: 2025-04-19-14-43-04--o4-mini-patch
Test cases: 225
Model: openhands-lm-32b-v0.1
Edit format: whole
Commit hash: c08336f
Pass rate 1: 4.0
Pass rate 2: 10.2
Pass num 1: 9
Pass num 2: 23
Percent cases well formed: 95.1
Error outputs: 55
Num malformed responses: 41
Num with malformed responses: 11
User asks: 166
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 11
Total tests: 225
Command: aider --model openrouter/all-hands/openhands-lm-32b-v0.1
Date: 2025-04-19
Versions: 0.82.2.dev
Seconds per case: 195.6
Total cost: 0.0

gpt-4.1-nano | 8.9% | $0.43 | aider --model gpt-4.1-nano | 94.2% | whole
Dirname: 2025-04-14-22-46-01--gpt41nano-diff
Test cases: 225
Model: gpt-4.1-nano
Edit format: whole
Commit hash: 71d1591-dirty
Pass rate 1: 3.1
Pass rate 2: 8.9
Pass num 1: 7
Pass num 2: 20
Percent cases well formed: 94.2
Error outputs: 20
Num malformed responses: 20
Num with malformed responses: 13
User asks: 316
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 8
Total tests: 225
Command: aider --model gpt-4.1-nano
Date: 2025-04-14
Versions: 0.81.4.dev
Seconds per case: 12.0
Total cost: 0.4281

Qwen2.5-Coder-32B-Instruct | 8.0% | aider --model openai/Qwen/Qwen2.5-Coder-32B-Instruct # via hyperbolic | 71.6% | diff
Dirname: 2024-12-22-13-22-32--polyglot-qwen-diff
Test cases: 225
Model: Qwen2.5-Coder-32B-Instruct
Edit format: diff
Commit hash: 6d7e8be-dirty
Pass rate 1: 4.4
Pass rate 2: 8.0
Pass num 1: 10
Pass num 2: 18
Percent cases well formed: 71.6
Error outputs: 158
Num malformed responses: 148
Num with malformed responses: 64
User asks: 132
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 2
Total tests: 225
Command: aider --model openai/Qwen/Qwen2.5-Coder-32B-Instruct # via hyperbolic
Date: 2024-12-22
Versions: 0.69.2.dev
Seconds per case: 84.4
Total cost: 0.0

gemma-3-27b-it | 4.9% | aider --model openrouter/google/gemma-3-27b-it | 100.0% | whole
Dirname: 2025-03-15-01-21-24--gemma3-27b-or
Test cases: 225
Model: gemma-3-27b-it
Edit format: whole
Commit hash: fd21f51-dirty
Pass rate 1: 1.8
Pass rate 2: 4.9
Pass num 1: 4
Pass num 2: 11
Percent cases well formed: 100.0
Error outputs: 3
Num malformed responses: 0
Num with malformed responses: 0
User asks: 181
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 1
Test timeouts: 3
Total tests: 225
Command: aider --model openrouter/google/gemma-3-27b-it
Date: 2025-03-15
Versions: 0.77.1.dev
Seconds per case: 79.7
Total cost: 0.0

gpt-4o-mini-2024-07-18 | 3.6% | $0.32 | aider --model gpt-4o-mini-2024-07-18 | 100.0% | whole
Dirname: 2024-12-21-18-41-18--polyglot-gpt-4o-mini
Test cases: 225
Model: gpt-4o-mini-2024-07-18
Edit format: whole
Commit hash: a755079-dirty
Pass rate 1: 0.9
Pass rate 2: 3.6
Pass num 1: 2
Pass num 2: 8
Percent cases well formed: 100.0
Error outputs: 0
Num malformed responses: 0
Num with malformed responses: 0
User asks: 36
Lazy comments: 0
Syntax errors: 0
Indentation errors: 0
Exhausted context windows: 0
Test timeouts: 3
Total tests: 225
Command: aider --model gpt-4o-mini-2024-07-18
Date: 2024-12-21
Versions: 0.69.2.dev
Seconds per case: 17.3
Total cost: 0.3236
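
For readers who want to cross-check the figures, here is a minimal sketch of how the percentage fields appear to relate to the raw counts in most entries. It assumes a pass rate is the pass count divided by the number of test cases, and that "Percent cases well formed" is the share of cases without a malformed response. Both formulas are inferred from the numbers listed here, not taken from the benchmark's own code, and the function names are hypothetical. A few entries with fewer than 225 test cases (for example o1-2024-12-17) do not reproduce exactly under these formulas, so treat this as an approximation.

```python
def pass_rate(pass_num: int, test_cases: int) -> float:
    """Percentage of test cases passed, rounded to one decimal place.

    Assumed relationship, inferred from the listed figures.
    """
    return round(100.0 * pass_num / test_cases, 1)


def percent_well_formed(num_with_malformed: int, test_cases: int) -> float:
    """Percentage of cases that had no malformed response (assumed formula)."""
    return round(100.0 * (test_cases - num_with_malformed) / test_cases, 1)


# Figures from the "o3 (high) + gpt-4.1" entry
print(pass_rate(81, 225))   # 36.0, matches "Pass rate 1"
print(pass_rate(186, 225))  # 82.7, matches "Pass rate 2"

# Figures from the "o3 (high)" entry
print(percent_well_formed(11, 225))  # 95.1, matches "Percent cases well formed"
```

The same two helpers can be applied to any other entry by substituting its "Pass num", "Num with malformed responses", and "Test cases" values.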