The January GPT-4 Turbo is lazier than the last version

OpenAI just released a new version of GPT-4 Turbo. This new model is intended to reduce the “laziness” that has been widely observed with the previous gpt-4-1106-preview model:

Today, we are releasing an updated GPT-4 Turbo preview model, gpt-4-0125-preview. This model completes tasks like code generation more thoroughly than the previous preview model and is intended to reduce cases of “laziness” where the model doesn’t complete a task.

With that in mind, I’ve been benchmarking the new model using aider’s existing lazy coding benchmark.

Benchmark results

Overall, the new gpt-4-0125-preview model seems lazier than the November gpt-4-1106-preview model:

This is one in a series of reports that use the aider benchmarking suite to assess and compare the code editing capabilities of OpenAI’s GPT models. You can review the other reports for additional information: