Joined 10/9/2014, 5:50:32 AM has 1168 karma
NoLiMa: Long-Context Evaluation Beyond Literal Matching
DeltaNet Explained
Mamba-Shedder: Post-Transformer Compression for Efficient SSMs
Reflections on 'The Bitter Lesson' (2021)
Theoretical limitations of multi-layer Transformer