• by boberoni on 1/4/2024, 1:24:43 AM

    If you like textbooks, I would recommend "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj. [1] The most recent edition was published in 2022.

    If you like lecture videos, I would recommend Hajj's YouTube playlist of 2021 lectures [2]. He works through a subset of the textbook.

    This will give you a good foundation in GPU hardware architecture and CUDA programming. The knowledge is somewhat transferable to other areas of high-performance computing.

    [1] https://www.amazon.com/Programming-Massively-Parallel-Proces...

    [2] https://www.youtube.com/playlist?list=PLRRuQYjFhpmubuwx-w8X9...

  • by Const-me on 1/4/2024, 2:37:36 AM

    For implementing stuff from scratch, if you use Windows you could try my C#-based library for that: https://github.com/Const-me/Cgml/

    It’s vendor-agnostic, so HLSL instead of CUDA or Triton. Here are the compute shaders implementing inference for the Mistral-7B model: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral...

  • by Kon-Peki on 1/1/2024, 11:24:42 PM

    For CUDA specifically, there is a fairly large set of sample code that used to ship with the CUDA toolkit installer. Now I think it’s on the Nvidia GitHub page; you’ve got to download it yourself.

    The Nvidia dev blog has some easy-to-follow tutorials, but they don’t go very deep.

    Nvidia also has a learning platform that offers fairly decent paid courses. You get a certificate for finishing.

    You’ll find some books out there with good reputations. Ultimately, this is an area where the good-quality learning materials tend to cost money.

  • by the__alchemist on 1/4/2024, 3:40:32 AM

    Step 1: Identify a task in your project of choice that could benefit from GPU acceleration, i.e. something data-parallel (SIMD-style).

    Step 2: Figure out how to set up FFI bindings for your project’s language, if required.

    Step 3: Read this article to learn kernel syntax, block/thread/stride management, etc. (a minimal kernel sketch along those lines follows after these steps): https://developer.nvidia.com/blog/even-easier-introduction-c...

    Step 4: Ask ChatGPT to translate your code into modern C++, or perhaps even directly into CUDA kernels.
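
    To make Step 3 concrete, here's a minimal sketch in the spirit of that article: a grid-stride SAXPY kernel plus the launch boilerplate (the names and sizes are just illustrative; treat it as a sketch rather than tested code):

        // saxpy.cu: toy grid-stride kernel; compile with nvcc saxpy.cu -o saxpy
        #include <cstdio>
        #include <cuda_runtime.h>

        __global__ void saxpy(int n, float a, const float *x, float *y) {
            // Grid-stride loop: each thread handles every (gridDim.x * blockDim.x)-th element
            for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += gridDim.x * blockDim.x)
                y[i] = a * x[i] + y[i];
        }

        int main() {
            const int n = 1 << 20;
            float *x, *y;
            cudaMallocManaged(&x, n * sizeof(float));   // unified memory, visible to CPU and GPU
            cudaMallocManaged(&y, n * sizeof(float));
            for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

            int block = 256;                            // threads per block
            int grid = (n + block - 1) / block;         // enough blocks to cover n elements
            saxpy<<<grid, block>>>(n, 2.0f, x, y);      // kernel launch
            cudaDeviceSynchronize();                    // wait for the GPU to finish

            printf("y[0] = %f\n", y[0]);                // expect 4.0
            cudaFree(x);
            cudaFree(y);
            return 0;
        }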

    Don't bother with Vulkan compute and shaders etc. It works, but is high friction compared to CUDA.

  • by throwaway81523 on 1/4/2024, 5:43:40 AM

    Disclosure: I've never done it. But I looked at some CUDA code from Leela Chess Zero and it made reasonable sense. It's just C++ with some slight changes. The GPU architecture is a little bit quirky, but not that complicated either. Plus there are libraries like PyTorch that handle most of the GPU stuff for you.
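
    To give a flavor of those "slight changes": a toy kernel that uses a block of threads, shared memory, and a barrier looks roughly like this (just an illustrative sketch, not code I've run):

        // Block-wise sum: plain C++ plus a handful of GPU-specific keywords
        __global__ void blockSum(const float *in, float *out, int n) {
            __shared__ float buf[256];                // fast on-chip memory, shared within one block
            int tid = threadIdx.x;
            int i = blockIdx.x * blockDim.x + tid;
            buf[tid] = (i < n) ? in[i] : 0.0f;
            __syncthreads();                          // barrier: wait for every thread in the block
            for (int s = blockDim.x / 2; s > 0; s >>= 1) {   // tree reduction in shared memory
                if (tid < s) buf[tid] += buf[tid + s];
                __syncthreads();
            }
            if (tid == 0) out[blockIdx.x] = buf[0];   // one partial sum per block
        }
        // Launched with 256 threads per block, e.g. blockSum<<<numBlocks, 256>>>(in, out, n);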

    I would say ML concepts and algorithms are way more complicated than GPU programming per se. The fast.ai lectures were pretty understandable when I watched some of them a few years ago (before attention came along), and it was pretty obvious that becoming skilful at writing even simple recognizers would take a fair amount of trial and error.

  • by Baldbvrhunter on 1/1/2024, 10:21:14 PM

    It would also mean learning Julia, but you can write GPU kernels in Julia and then compile them for Nvidia CUDA, AMD ROCm, or Intel oneAPI.

    https://juliagpu.org/

    I've written CUDA kernels this way, and I knew nothing about it going in.