• by lxe on 1/17/2024, 6:10:37 PM

    A 6.7B model that scores as well as GPT-4 most likely got there by overfitting to certain benchmarks.