
Speed benchmarks of GPT-4 Turbo and gpt-3.5-turbo-1106

Nov 6, 2023

[Figure: benchmark results]

OpenAI just released new versions of GPT-3.5 and GPT-4, and there's a lot of interest in their capabilities and performance. With that in mind, I've been benchmarking the new models.

Aider is an open source command line chat tool that lets you work with GPT to edit code in your local git repo. Aider relies on a code editing benchmark to quantitatively evaluate performance.

This is the latest in a series of reports that use the aider benchmarking suite to assess and compare the code editing capabilities of OpenAI's GPT models. Previous reports provide more background on aider's benchmark suite.

Speed

This report compares the speed of the various GPT models. Aider's benchmark measures the response time of the OpenAI chat completion endpoint each time it asks GPT to solve a programming exercise from the benchmark suite. These results measure only the time spent waiting for OpenAI to respond to the prompt, so they reflect how quickly each model can generate responses that consist primarily of source code.
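As a rough illustration, here is a minimal sketch of this kind of latency measurement, assuming the `openai` Python package (v1 client) with an `OPENAI_API_KEY` in the environment. The prompt is a hypothetical stand-in for a benchmark exercise; this is not aider's actual benchmark code.

```python
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def timed_completion(model: str, prompt: str) -> float:
    """Return the seconds spent waiting for one chat completion."""
    start = time.monotonic()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return time.monotonic() - start


# Hypothetical example: time one coding prompt against each new model.
for model in ("gpt-3.5-turbo-1106", "gpt-4-1106-preview"):
    latency = timed_completion(
        model, "Write a Python function that reverses a string."
    )
    print(f"{model}: {latency:.1f}s")
```

A single call like this is noisy; averaging the response times over many exercises, as the benchmark suite does, gives a more stable per-model number.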

Some observations:

Updates

Last updated 11/14/23. OpenAI has relaxed rate limits, so these results are no longer considered preliminary.