by ilaksh on 10/4/2024, 3:39:29 PM
Your post would make more sense to me if you were specific about the models. It's like asking how to get reliable transportation from a car without specifying which models you're considering.
o1-preview seems to be a step up from Claude 3.5 Sonnet.
There are many open-source coding LLMs that, for complex tasks, are a joke compared to the SOTA closed ones.
I think two strategies can work: 1) constrain the domain to a particular framework and provide good documentation and examples for it in the prompts, and 2) create an error-correcting feedback loop where compilation/static-analysis and runtime errors or failed tests are fed back to the model automatically (sketched below).
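Here's a minimal sketch of strategy 2 in Python, assuming a hypothetical `call_llm(prompt) -> str` wrapper around whichever chat-completion API you use (it is not a real library function); the plain `py_compile` check stands in for whatever static analysis fits your stack:

    import subprocess
    import sys
    import tempfile

    def extract_code(reply: str) -> str:
        """Pull the first fenced code block out of a model reply, if any."""
        if "```" in reply:
            body = reply.split("```", 2)[1]
            # Drop an optional language tag like "python" on the first line.
            return body.split("\n", 1)[1] if "\n" in body else body
        return reply

    def generate_with_retries(task: str, call_llm, max_attempts: int = 3) -> str:
        prompt = task
        for _ in range(max_attempts):
            code = extract_code(call_llm(prompt))
            with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                f.write(code)
                path = f.name
            # Static check only: py_compile catches syntax errors
            # without executing the generated code.
            result = subprocess.run(
                [sys.executable, "-m", "py_compile", path],
                capture_output=True, text=True,
            )
            if result.returncode == 0:
                return code
            # Re-prompt with the compiler output appended.
            prompt = (
                f"{task}\n\nYour previous attempt failed to compile:\n"
                f"{result.stderr}\nReturn a corrected version."
            )
        raise RuntimeError("model never produced compilable code")

In practice you'd swap the compile step for your real test suite or linter; the point is that the error text, not just "try again", goes into the next prompt.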
I'm particularly interested in hearing from people using LLM APIs, where the generated code is consumed programmatically.
I've been using LLMs a lot lately to generate code, and code quality is a mixed bag. Sometimes it runs straight out of the box or with a few manual tweaks; other times it just straight up won't compile. Keen to hear what workarounds others have used to solve this (e.g., re-prompting, constraining generations, etc.).
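For the programmatic-consumption case, one cheap workaround is a validation gate before accepting a generation at all: parse it with the stdlib `ast` module and, on failure, feed the error message into the re-prompt. A minimal sketch (the `str | None` return type needs Python 3.10+):

    import ast

    def is_valid_python(code: str) -> str | None:
        """Return None if the code parses, else an error message to re-prompt with."""
        try:
            ast.parse(code)
            return None
        except SyntaxError as exc:
            return f"line {exc.lineno}: {exc.msg}"

This only catches syntax errors, not runtime or logic bugs, but it's fast enough to run on every generation before anything heavier kicks in.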