The first successful large-scale implementation of linear attention - Lightning Attention by MiniMax!
This is a new architecture that scales to much longer sequences without draining your compute,
and the best part: it can go up to 4 million tokens of context window! (Quick sketch of the core idea below.)
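If you want a concrete feel for why linear attention is cheaper, here's a tiny toy sketch in NumPy (my own illustration of the general linear-attention trick, not MiniMax's Lightning Attention kernel, and it shows the simple non-causal form):

```python
# Toy sketch: why linear attention scales better than standard attention.
# Standard attention materializes an N x N score matrix (quadratic in sequence
# length N). Linear attention replaces softmax with a feature map phi(.) and
# reorders the multiplication so keys/values are summarized once (linear in N).
import numpy as np

def softmax_attention(Q, K, V):
    # O(N^2): builds the full N x N attention matrix
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    # O(N): summarize all key-value pairs in a small (d x d_v) matrix,
    # then reuse that summary for every query
    Qp, Kp = phi(Q), phi(K)
    kv = Kp.T @ V            # (d, d_v) key-value summary
    z = Kp.sum(axis=0)       # (d,) normalizer
    return (Qp @ kv) / (Qp @ z)[:, None]

N, d = 8, 4
Q, K, V = (np.random.randn(N, d) for _ in range(3))
print(softmax_attention(Q, K, V).shape, linear_attention(Q, K, V).shape)
```

For causal (decoder-style) attention, the same summary is kept as a running sum that's updated token by token; Lightning Attention's contribution is making that computation fast on real hardware at scale.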
I've tried my best to keep it simple and explain this new breakthrough, but if there's a mistake, please let me know and I'll correct it.
I've also used 3Blue1Brown's Manim to animate some components - let me know if the visuals are helpful!
🔗 Links 🔗
https://filecdn.minimax.chat/_Arxiv_M...
❤️ If you want to support the channel ❤️
Support here:
Patreon - / 1littlecoder
Ko-Fi - https://ko-fi.com/1littlecoder
🧭 Follow me on 🧭
Twitter - / 1littlecoder