by boberoni on 1/4/2024, 1:24:43 AM
If you like textbooks, I would recommend "Programming Massively Parallel Processors" by Hwu, Kirk, and Hajj. [1] The most recent edition was published in 2022.
If you like lecture videos, I would recommend Hajj's YouTube playlist of 2021 lectures [2]. He works through a subset of the textbook.
This will give you a good foundation of GPU hardware architecture and CUDA programming. The knowledge is somewhat transferable to other areas of high-performance computing.
[1] https://www.amazon.com/Programming-Massively-Parallel-Proces...
[2] https://www.youtube.com/playlist?list=PLRRuQYjFhpmubuwx-w8X9...
by Const-me on 1/4/2024, 2:37:36 AM
For implementing stuff from scratch, if you use Windows you could try my C# based library for that: https://github.com/Const-me/Cgml/
It’s vendor agnostic, so HLSL instead of CUDA or Triton. Here are the compute shaders implementing inference of the Mistral-7B model: https://github.com/Const-me/Cgml/tree/master/Mistral/Mistral...
by Kon-Peki on 1/1/2024, 11:24:42 PM
For CUDA specifically, there is a fairly large set of sample code that used to be installed when you installed CUDA. But now I think it’s on the Nvidia GitHub page; you’ve got to download it yourself.
The Nvidia dev blog has some easy to follow tutorials, but they don’t get very complicated.
Nvidia also has a learning platform which offers fairly decent courses at a cost. You get a certificate for finishing.
You’ll find some books out there with good reputations. Ultimately, this is an area that leans heavily toward paying money for good quality learning materials.
by the__alchemist on 1/4/2024, 3:40:32 AM
Step 1: Identify a task in your project of choice that could benefit from GPU acceleration, i.e. a data-parallel (SIMD-style) workload.
Step 2: Figure out how to set up the FFI bindings if required for your project's language.
Step 3: Read this article to learn kernel syntax, block/thread/stride management etc: https://developer.nvidia.com/blog/even-easier-introduction-c...
Step 4: Ask ChatGPT to translate your code into modern C++, or perhaps even directly into kernels.
Don't bother with Vulkan compute and shaders etc. It works, but is high friction compared to CUDA.
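The article linked in step 3 builds up to a grid-stride loop: each thread computes its global index from its block and thread IDs, then strides by the total thread count so any grid size covers the whole array. A minimal sketch along those lines (names and sizes are illustrative, adapted from the post's running vector-add example):

```cuda
#include <cstdio>

// Grid-stride "add" kernel: each thread starts at its global
// index and advances by the total number of threads in the grid.
__global__ void add(int n, float *x, float *y)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main()
{
    int n = 1 << 20;
    float *x, *y;
    // Unified memory: accessible from both CPU and GPU.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; i++) { x[i] = 1.0f; y[i] = 2.0f; }

    int blockSize = 256;
    int numBlocks = (n + blockSize - 1) / blockSize;  // round up
    add<<<numBlocks, blockSize>>>(n, x, y);
    cudaDeviceSynchronize();  // wait for the kernel before reading y

    printf("y[0] = %f\n", y[0]);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile with `nvcc add.cu -o add`. The grid-stride pattern is the one worth internalizing: it decouples the launch configuration from the problem size.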
by throwaway81523 on 1/4/2024, 5:43:40 AM
Disclosure: I've never done it. But I looked at some CUDA code from Leela Chess Zero and it made reasonable sense. It's just C++ with some slight changes. The GPU architecture is a little bit quirky but not that complicated either. Plus there are libraries like pytorch that handle most of the GPU stuff for you.
I would say ML concepts and algorithms are way more complicated than GPU programming per se. The fast.ai lectures were pretty understandable when I watched some of them a few years ago (attention hadn't been invented yet), and it was pretty obvious that becoming skilful at writing even simple recognizers would take a fair amount of trial and error.
by Baldbvrhunter on 1/1/2024, 10:21:14 PM
It would also mean learning Julia, but you can write GPU kernels in Julia and then compile for Nvidia CUDA, AMD ROCm, or Intel oneAPI.
I've written CUDA kernels and I knew nothing about it going in.
I'd like to learn GPU programming but I'm having difficulty finding high-quality resources. I tried a class at coursera and was severely disappointed by both quality and content.
What are the best resources for learning things like GPU architecture, CUDA, Triton, etc?
My goal is to be able to do something like take a description of Flash Attention and implement it from scratch, or optimize existing CUDA code.