July 01, 2024

Sonnet is the opposite of lazy

Claude 3.5 Sonnet represents a step change in AI coding. It is incredibly industrious, diligent and hard-working. Unexpectedly, this presented a challenge: Sonnet was often writing so much code that it hit the 4k output token limit, which truncated its code mid-stream.

Aider now works around this 4k limit and allows Sonnet to produce as much code as it wants. The result is surprisingly powerful. Sonnet’s score on aider’s refactoring benchmark jumped from 55.1% up to 64.0%. This moved Sonnet into second place, ahead of GPT-4o and behind only Opus.

Users who tested Sonnet with a preview of aider’s latest release were thrilled:

Works like a charm. It is a monster. It refactors files of any size like it is nothing. The continue trick with Sonnet is truly the holy grail. Aider beats [other tools] hands down. I’m going to cancel both subscriptions. – Emasoft
Thanks heaps for this feature - it’s a real game changer. I can be more ambitious when asking Claude for larger features. – cngarrison
Fantastic…! It’s such an improvement not being constrained by output token length issues. [I refactored] a single JavaScript file into seven smaller files using a single Aider request. – John Galt

Hitting the 4k token output limit

All LLMs have various token limits, the most familiar being their context window size. But they also have a limit on how many tokens they can output in response to a single request. Sonnet and the majority of other models are limited to returning 4k tokens.

Sonnet’s amazing work ethic caused it to regularly hit this 4k output token limit for a few reasons:

1. Sonnet is capable of outputting a very large amount of correct, complete new code in one response.
2. Similarly, Sonnet can specify long sequences of edits in one go, like changing a majority of lines while refactoring a large file.
3. Sonnet tends to quote large chunks of a file when performing SEARCH/REPLACE edits. Beyond token limits, this is very wasteful.

Good problems

Problems (1) and (2) are “good problems” in the sense that Sonnet is able to write more high quality code than any other model! We just don’t want it to be interrupted prematurely by the 4k output limit.

Aider now allows Sonnet to return code in multiple 4k token responses. Aider seamlessly combines them so that Sonnet can return arbitrarily long responses. This gets all the upsides of Sonnet’s prolific coding skills, without being constrained by the 4k output token limit.
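The idea of stitching several length-limited responses into one can be sketched roughly as follows. This is an illustration only, not aider's actual implementation: the `call_model` interface, the `"length"` and `"stop"` finish reasons, and the `make_fake_model` stand-in are all assumptions made for the example, and the limit is counted in characters rather than tokens to keep the sketch self-contained.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface (an assumption for this sketch): it receives the
# text produced so far and returns (next_chunk, finish_reason). A finish_reason
# of "length" means the output limit cut the response off.
ModelCall = Callable[[str], Tuple[str, str]]


def collect_full_response(call_model: ModelCall, max_rounds: int = 20) -> str:
    """Keep asking for continuations until the model stops on its own."""
    parts: List[str] = []
    for _ in range(max_rounds):
        chunk, finish_reason = call_model("".join(parts))
        parts.append(chunk)
        if finish_reason != "length":
            break  # the model finished naturally, nothing left to fetch
    return "".join(parts)


def make_fake_model(full_text: str, limit: int) -> ModelCall:
    """Stand-in for an LLM that emits at most `limit` characters per call."""
    def call(so_far: str) -> Tuple[str, str]:
        remaining = full_text[len(so_far):]
        chunk = remaining[:limit]
        reason = "length" if len(remaining) > limit else "stop"
        return chunk, reason
    return call


if __name__ == "__main__":
    long_answer = "def f():\n    return 1\n" * 10
    model = make_fake_model(long_answer, limit=50)
    combined = collect_full_response(model)
    print(combined == long_answer)
```

The `max_rounds` cap is a design choice for the sketch: it keeps a misbehaving model from looping forever, at the cost of truncating extremely long outputs.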

Wasting tokens

Problem (3) is more complicated, as Sonnet isn’t just being stopped early – it’s actually wasting a lot of tokens, time and money.

Faced with a few small changes spread far apart in a source file, Sonnet would often prefer to do one giant SEARCH/REPLACE operation of almost the entire file. It would be far faster and less expensive to instead do a few surgical edits.

Aider’s prompts now discourage Sonnet from these long-winded SEARCH/REPLACE operations and promote much more concise edits.
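To make the cost difference concrete, here is a minimal sketch comparing two surgical edits against one giant edit that quotes the whole file. It is not aider's edit engine: the `apply_edits` and `edit_cost` helpers are hypothetical, edits are modeled as plain (search, replace) string pairs, and character counts stand in as a rough proxy for tokens.

```python
from typing import List, Tuple

Edit = Tuple[str, str]  # (search text, replacement text)


def apply_edits(source: str, edits: List[Edit]) -> str:
    """Apply each edit in order; every search text must match exactly once."""
    for search, replace in edits:
        if source.count(search) != 1:
            raise ValueError(f"search text must match exactly once: {search!r}")
        source = source.replace(search, replace, 1)
    return source


def edit_cost(edits: List[Edit]) -> int:
    """Characters the model must emit for these edits (rough token proxy)."""
    return sum(len(search) + len(replace) for search, replace in edits)


# A 200-line file with two small changes that are far apart.
original = "".join(f"value_{i:03d} = {i}\n" for i in range(200))

surgical: List[Edit] = [
    ("value_003 = 3\n", "value_003 = 30\n"),
    ("value_150 = 150\n", "value_150 = 1500\n"),
]
updated = apply_edits(original, surgical)

# The wasteful alternative: quote the entire file as one search/replace pair.
giant: List[Edit] = [(original, updated)]

if __name__ == "__main__":
    print("same result:", apply_edits(original, giant) == updated)
    print("surgical cost:", edit_cost(surgical))
    print("giant cost:", edit_cost(giant))
```

Both approaches produce the same file, but the giant edit has to reproduce every unchanged line twice, once in the search text and once in the replacement.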

Aider with Sonnet

The latest release of aider has specialized support for Claude 3.5 Sonnet:

- Aider allows Sonnet to produce as much code as it wants, by automatically and seamlessly spreading the response out over a sequence of 4k token API responses.
- Aider carefully prompts Sonnet to be concise when proposing code edits. This reduces Sonnet’s tendency to waste time, tokens and money returning large chunks of unchanging code.
- Aider now uses Claude 3.5 Sonnet by default if the ANTHROPIC_API_KEY is set in the environment.

See aider’s install instructions for more details, but you can get started quickly with aider and Sonnet like this:

$ python -m pip install -U aider-chat

$ export ANTHROPIC_API_KEY=<key> # Mac/Linux
$ setx   ANTHROPIC_API_KEY <key> # Windows, restart shell after setx

$ aider