• by AeZ1E on 8/22/2024, 2:25:06 PM

    wait, pruning and distillation in practice? sounds like my kind of gardening technique! but seriously, compressing models to 4B and 8B parameters using depth pruning and joint hidden/attention/MLP (width) pruning? that's some next-level model dieting right there. count me in for a taste test!