Quantization matters

Open source models like Qwen 2.5 32B Instruct are performing very well on aider’s code editing benchmark, rivaling closed source frontier models. But pay attention to how your model is being quantized, as it can strongly impact code editing skill. Heavily quantized models are often used by cloud API providers and local model servers like Ollama.

The graph above compares 4 different versions of the Qwen 2.5 Coder 32B Instruct model, served both locally and from cloud providers.

The best version of the model rivals GPT-4o, while the worst performer is more like GPT-3.5 Turbo level.

Model Percent completed correctly Percent using correct edit format Command Edit format
HuggingFace BF16 via glhf.chat 71.4% 94.7% aider --model openai/hf:Qwen/Qwen2.5-Coder-32B-Instruct --openai-api-base https://glhf.chat/api/openai/v1 diff
Hyperbolic Qwen2.5-Coder-32B-Instruct BF16 69.2% 91.7% aider --model openai/Qwen/Qwen2.5-Coder-32B-Instruct --openai-api-base https://api.hyperbolic.xyz/v1/ diff
openrouter/qwen/qwen-2.5-coder-32b-instruct (mixed quants) 65.4% 84.2% aider --model openrouter/qwen/qwen-2.5-coder-32b-instruct diff
qwen2.5-coder:32b-instruct-q4_K_M 53.4% 44.4% aider --model ollama/qwen2.5-coder:32b-instruct-q4_K_M diff

Choosing providers with OpenRouter

OpenRouter allows you to ignore specific providers in your preferences. This can be effective to exclude highly quantized or otherwise undesirable providers.

The original version of this article included incorrect Ollama models that were not Qwen 2.5 Coder 32B Instruct.