
Gemma 3n Architectural Innovations – Speculation and poking around in the model

by nolist_policy on 5/25/2025, 7:08:34 PM with 7 comments
  • by impossiblefork on 5/25/2025, 9:40:53 PM

    I think this is very interesting. Especially the per-layer embedding things.

    Having more than one embedding is something I've tried myself, but not separate ones for each layer.

    I'm guessing it's something like h_{l+1} = MultiHeadSelfAttentionWithPositionEncodingBakedIn(MLP(h_l) + embed_l(token_ids)). So it's probably really easy to implement on toy problems to see if it works.
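The guessed update rule above is easy to sketch on a toy scale. The following is a minimal NumPy sketch of that speculation only, not Gemma 3n's actual architecture: one embedding table per layer, a linear map standing in for the MLP, and a stripped-down softmax self-attention (no heads, no position encoding, no mask).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_layers, seq = 16, 8, 3, 5

# One embedding table per layer -- the speculated "per-layer embeddings".
embeds = [rng.normal(size=(vocab, d_model)) * 0.1 for _ in range(n_layers)]
mlps = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(n_layers)]

def self_attention(h):
    # Toy single-head attention; projections, masking and positions omitted.
    scores = h @ h.T / np.sqrt(h.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)
    return w @ h

def forward(token_ids):
    h = embeds[0][token_ids]  # layer-0 table doubles as the input embedding
    for l in range(n_layers):
        # h_{l+1} = SelfAttention(MLP(h_l) + embed_l(token_ids))
        h = self_attention(h @ mlps[l] + embeds[l][token_ids])
    return h

tokens = np.array([1, 4, 2, 7, 3])
out = forward(tokens)
print(out.shape)  # (5, 8)
```

Because each layer re-injects a fresh embedding of the same token ids, a toy problem like this would show quickly whether the extra tables help over a single shared one.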

  • by krackers on 5/31/2025, 8:07:29 PM

    More in https://twitter.com/antimatter15/status/1926459086352142663#...

  • by limoce on 5/26/2025, 2:15:50 AM

    > https://preview.redd.it/wca7kzfq5w2f1.png?width=1190&format=...

    "4x gated residual streams" look quite weird. Is there any paper or technique report for this?

  • by 3abiton on 5/26/2025, 12:27:53 AM

    While PLE is quite innovative, the interesting part is that they released their [apk on github](https://github.com/google-ai-edge/gallery) rather than linking to the Play Store. Interesting choice.