• by wolecki on 5/13/2025, 3:31:33 PM

    A fast Trusted Execution Environment protocol utilizing the H100 confidential mode. Prompts are decrypted and processed in the GPU enclave. Key innovation is that it can be really fast especially on ≥10B parameter models, with latency overhead less than 1%. Like in CPU confidential cloud computing, that opens up a channel for communication with cloud GenAI models that even the provider cannot intercept. Wonder whether something like that could boost the trust for all the AI neoclouds out there.

  • by danbiderman on 5/13/2025, 3:43:15 PM

    author here!