• by ilaksh on 10/4/2024, 3:39:29 PM

    Your post would make more sense to me if you were specific about the models. It's like asking how to get reliable transportation from a car without saying which models of car you're considering.

    o1-preview seems to be a step up from Claude 3.5 Sonnet.

    There are many open source coding LLMs that, for complex tasks, will be a joke compared to the SOTA closed ones.

    I think that there are two strategies that can work: 1) constrain the domain to a particular framework and provide good documentation and examples for it in the prompts, and 2) create an error-correcting feedback loop where compiler/static analysis errors, runtime errors, and failed tests are fed back to the model automatically.
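
    A rough sketch of what I mean by (2), in Python. Everything named here is my own placeholder: `generate` stands in for whatever model API you call (it just takes a prompt string and returns code), and `check_candidate` / `repair_loop` are hypothetical helpers, not part of any library. It only does a syntax check plus running the tests in a subprocess; a real setup would add linters, type checkers, and sandboxing.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional


def check_candidate(code: str, test_code: str, timeout: float = 10.0) -> Optional[str]:
    """Return None if the code compiles and the tests pass, else an error report."""
    # Stage 1: cheap static check (syntax only) before running anything.
    try:
        compile(code, "<candidate>", "exec")
    except SyntaxError as exc:
        return f"SyntaxError: {exc}"

    # Stage 2: run the candidate plus its tests in a separate interpreter.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "candidate.py"
        script.write_text(code + "\n\n" + test_code)
        try:
            proc = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return f"Timed out after {timeout} seconds"

    if proc.returncode != 0:
        return proc.stderr.strip() or proc.stdout.strip() or f"exit code {proc.returncode}"
    return None


def repair_loop(
    generate: Callable[[str], str],  # hypothetical: prompt in, code out
    task: str,
    test_code: str,
    max_rounds: int = 3,
) -> Optional[str]:
    """Ask for code, feed any errors back into the prompt, retry up to max_rounds."""
    prompt = task
    for _ in range(max_rounds):
        code = generate(prompt)
        error = check_candidate(code, test_code)
        if error is None:
            return code
        # Feed the failure back automatically instead of relying on a human to paste it.
        prompt = (
            f"{task}\n\nYour previous attempt:\n{code}\n\n"
            f"It failed with:\n{error}\n\nPlease fix it."
        )
    return None  # give up after max_rounds failed attempts
```

    You'd call it like `repair_loop(my_model_call, "Write add(a, b).", "assert add(2, 3) == 5")`, where `my_model_call` wraps your model of choice. Capping the rounds matters, since weaker models tend to loop on the same mistake.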