• by chiwilliams on 6/1/2025, 3:54:21 AM

    Cool project! I have a couple of questions that would be nice to see answered in the writeup:

    * How did you generate your example problems? Did you take an existing benchmark, or did you have LLMs generate the problems?
    * Have you given any thought to adding a second "base programming language" to alter? I'm not sure there's enough variation as it stands. (Another thought: generate 4 or 5 new languages, each quite different, and run the benchmark on each of them. I'm not sure how much it matters that the language is randomly generated each time.)

    But overall, a clever idea!

  • by JSR_FDED on 6/1/2025, 5:17:27 AM

    Would it be useful to generate procedural, OOP, and functional variations of the problems?