• by nahco314 on 2/14/2025, 5:46:47 PM

    I don't know exactly how OpenAI does it, but I think it's probably something like this: the OpenAI server that generates the responses reads and executes the "code the LLM wants to execute" embedded in the LLM's output, then passes the execution results back as internal conversation history. As a concrete method, the code to be executed could be enclosed in tags such as <run-code>, or it may be handled via metadata at a lower layer.
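    The loop described above could be sketched roughly like this. Note that the <run-code> tag, the message format, and the whole mechanism are my guesses, not anything OpenAI has documented; a real server would also sandbox execution rather than call exec directly:

```python
import io
import re
import contextlib

# Hypothetical delimiter -- the actual tag/metadata OpenAI uses is unknown.
RUN_CODE = re.compile(r"<run-code>(.*?)</run-code>", re.DOTALL)

def execute_tagged_code(llm_output: str) -> list[dict]:
    """Extract <run-code> blocks from the model output, run each one,
    and return tool-result messages to append to the conversation history."""
    results = []
    for code in RUN_CODE.findall(llm_output):
        buf = io.StringIO()
        try:
            with contextlib.redirect_stdout(buf):
                exec(code, {})  # a production server would sandbox this
            results.append({"role": "tool", "content": buf.getvalue()})
        except Exception as e:
            results.append({"role": "tool", "content": f"error: {e}"})
    return results

history = execute_tagged_code("<run-code>print(2 + 3)</run-code>")
print(history)  # the captured stdout ("5\n") goes back to the model
```

    In this picture, the returned messages are appended to the hidden conversation and the model is prompted again, so the final answer can quote real execution output.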

    However, even though GPT-4o is supposed to have a code execution function, it is possible that the code is not actually being executed and that the LLM is simply generating output that looks correct (but is not).