No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-Based Language Models
by froster on 7/25/2023, 5:32:12 AM
A recent paper highlights how difficult it is to create a new optimizer that works as a drop-in replacement for Adam. Sophia and Lion were recently proposed as superior alternatives to Adam, but appeared worse in an independent evaluation.
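For readers unfamiliar with what "drop-in replacement" means in practice: only the optimizer object changes, while the rest of the training loop stays identical. A minimal PyTorch sketch is below; the Lion import assumes a third-party implementation such as the lion-pytorch package, and the model, data, and hyperparameters are purely illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 2)
loss_fn = nn.CrossEntropyLoss()

def make_optimizer(name: str):
    """Swap optimizers without touching the rest of the training loop."""
    if name == "adam":
        return torch.optim.Adam(model.parameters(), lr=1e-3)
    if name == "lion":
        # Assumed third-party implementation (e.g. the lion-pytorch package);
        # Lion is typically run with a smaller learning rate than Adam.
        from lion_pytorch import Lion
        return Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)
    raise ValueError(f"unknown optimizer: {name}")

optimizer = make_optimizer("adam")  # or "lion"

# Toy training loop on random data; identical regardless of optimizer choice.
for _ in range(100):
    x = torch.randn(32, 128)
    y = torch.randint(0, 2, (32,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
```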