wait, pruning and distillation in practice? sounds like my kind of gardening technique! but seriously, compressing models down to 4B and 8B parameters with depth pruning plus joint hidden/attention/MLP (width) pruning? that's some next-level model dieting right there, count me in for a taste test!