Hacker News Clone

How we built the most efficient inference engine for Cloudflare's network

by jgrahamc on 8/27/2025, 4:22:44 PM with 1 comment

by Freedom5093 on 8/28/2025, 9:52:56 AM
I don't understand:
> all of the prompt tokens are available in advance and do not require decoding
> The other technique is called batching: this technique aggregates multiple prompts into a single decode operation.
So do prompts get decoded or not? Are there two decode steps? It's unclear.
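
One possible reading of the two quoted sentences, as a toy sketch rather than anything taken from the article: prompt tokens are only ingested in a "prefill" pass because they are already known, while "decode" refers to generating output tokens one step at a time, and batching lets a single decode step serve several prompts at once. Every name here (`toy_forward`, `prefill`, `batched_decode_step`, `generate`) is hypothetical, and the "model" is a stand-in arithmetic function.

```python
from typing import List, Tuple


def toy_forward(context: List[int]) -> int:
    """Hypothetical stand-in for a model: derive the next token from the context."""
    return (sum(context) + len(context)) % 100


def prefill(prompt: List[int]) -> List[int]:
    """Ingest the whole prompt in one pass. No tokens are generated here,
    because every prompt token is already known in advance."""
    return list(prompt)  # stands in for building per-sequence state (e.g. a KV cache)


def batched_decode_step(states: List[List[int]]) -> List[int]:
    """One decode operation: produce exactly one new token for EVERY sequence
    in the batch, then append it to that sequence's state."""
    new_tokens = [toy_forward(state) for state in states]
    for state, token in zip(states, new_tokens):
        state.append(token)
    return new_tokens


def generate(prompts: List[List[int]], max_new_tokens: int) -> Tuple[List[List[int]], int]:
    """Prefill each prompt once, then run shared decode steps for the whole batch."""
    states = [prefill(p) for p in prompts]
    decode_steps = 0
    for _ in range(max_new_tokens):
        batched_decode_step(states)
        decode_steps += 1
    # Only the tokens after the prompt were produced by decoding.
    outputs = [state[len(p):] for state, p in zip(states, prompts)]
    return outputs, decode_steps


if __name__ == "__main__":
    outputs, steps = generate([[1, 2, 3], [10, 20]], max_new_tokens=2)
    print(outputs, steps)
```

Under this reading there is a single kind of decode step: two prompts that each need two output tokens cost two batched decode steps in total, not four, and the prompts themselves are never "decoded".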