I know GANs aren't all the rage now, but if you're interested in ML, they should not be overlooked.
We still use GANs a lot. They're much faster than diffusion models. Good luck getting a diffusion model to do upscaling and denoising on a real-time video call. I'm sure we'll get there, but right now you can do this with a GAN on cheap consumer hardware: you don't need a 4080, DLSS shipped with the 20-series cards. GANs are simply computationally cheaper, though yes, they come with trade-offs. (Even that is arguable: ML goes through hype phases where everyone jumps ship from one approach to another and few revisit the old one, but when revisits do happen, the older approach tends to be competitive. See "ResNet Strikes Back" for CNNs vs ViTs. There's more nuance here, of course.)
There is a reason your upscaling model is a GAN. Sure, diffusion can do this too, but there's a reason everyone is still using ESRGAN.
Also, I think it is important to remember that a GAN is really a technique, not a kind of image generator. You have one model generating things and another model judging whether an output is good or not. LLM people... does this sound familiar?
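To make that concrete, here's a minimal sketch of the adversarial loop, assuming PyTorch; the toy fully-connected networks, sizes, and names are just illustrative, not any particular paper's setup:

    import torch
    import torch.nn as nn

    # Toy generator: maps noise to a fake sample (sizes are illustrative).
    G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())
    # Toy discriminator: outputs a logit for how "real" a sample looks.
    D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    def train_step(real):                      # real: (batch, 784) tensor
        batch = real.size(0)

        # 1) Discriminator: real samples should score 1, generated ones 0.
        fake = G(torch.randn(batch, 64)).detach()
        loss_d = bce(D(real), torch.ones(batch, 1)) + \
                 bce(D(fake), torch.zeros(batch, 1))
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

        # 2) Generator: try to make D score its outputs as real.
        fake = G(torch.randn(batch, 64))
        loss_g = bce(D(fake), torch.ones(batch, 1))
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
        return loss_d.item(), loss_g.item()

The GAN-specific part is just the two losses pulling against each other; swap the MLPs for conv nets and you're most of the way to something like DCGAN.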
To the author: I think it is worth pointing to Tero Karras's NVIDIA page. His group defined the status quo of GANs; you'll find that the vast majority of GAN research built on their work, and quite a large portion of it consists of literal forks of their code. A fair amount of that is due to the great optimization they did, with custom CUDA kernels (which is not the limiting compute factor in diffusion). https://research.nvidia.com/person/tero-karras