Muon Is Scalable for LLM Training
by yorwba on 2/25/2025, 5:40:29 AM
For people who want to know more about the Muon optimizer: https://kellerjordan.github.io/posts/muon/