SepLLM: Accelerate LLMs by Compressing One Segment into One Separator

by limoce on 3/3/2025, 1:27:26 PM with 2 comments

by kevmo314 on 3/6/2025, 4:05:00 AM
This paper seems like it misses the forest for the trees. The analysis is certainly interesting and the proposal sounds viable, sort of like sliding-window attention with a little more history.
But if separators really do contribute the most to the attention scores, wouldn't that imply the tokenization scheme could be improved? Introducing a compression scheme seems like patching around that, compared to having the model naturally produce a more random attention distribution in the first place.
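The "sliding window attention with a little more history" description can be made concrete with a toy attention mask. This is a minimal sketch, assuming a SepLLM-style rule in which each query attends to a few initial tokens, to every earlier separator token, and to a local window of recent tokens. The separator set, the `n_initial` and `window` values, and the function names are hypothetical choices for illustration, not the paper's settings, and the sketch ignores the KV-cache management a real implementation would need.

```python
from typing import List, Sequence

# Hypothetical separator set; a real model's separator list may differ.
SEPARATORS = {".", ",", ";", ":", "!", "?", "\n"}


def sep_style_mask(tokens: Sequence[str], n_initial: int = 2,
                   window: int = 3) -> List[List[bool]]:
    """Return mask[i][j] == True if query position i may attend to key j.

    Assumed rule: causal attention restricted to (a) the first n_initial
    tokens, (b) any separator token, and (c) the last `window` positions.
    """
    n = len(tokens)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: only keys at or before i
            is_initial = j < n_initial
            is_separator = tokens[j] in SEPARATORS
            in_window = i - j < window
            mask[i][j] = is_initial or is_separator or in_window
    return mask


def kept_keys(mask: List[List[bool]], i: int) -> List[int]:
    """List the key positions that query i is allowed to attend to."""
    return [j for j, ok in enumerate(mask[i]) if ok]


if __name__ == "__main__":
    tokens = ["The", "cat", "sat", ".", "It", "was", "happy", "and", "warm", "."]
    mask = sep_style_mask(tokens)
    print(kept_keys(mask, 9))  # [0, 1, 3, 7, 8, 9]
```

A plain sliding window of size 3 would give the last query only keys 7, 8 and 9; the "little more history" is the two initial tokens plus the earlier separator at position 3, which is where the summarized segment information is assumed to live.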
by xp84 on 3/6/2025, 5:30:36 AM
Or, put another way:
"Why waste time say lot token when few token do trick?"
-Kevin Malone