Sequence Accumulation and Beyond: Infinite Context Length on Single GPU and Large Clusters
2025-04-11 • Weigao Sun, Yongtuo Liu, Xian Tang, Xiaoyu Mo
Linear sequence modeling methods, such as linear attention, state space modeling, and linear RNNs, have recently been recognized as potential alternatives to softmax attention thanks to their linear complexity and competitive performance. However, although their linear-memory advantage during training makes long sequences tractable, handling extremely long sequences under very limited computational resources remains difficult. In this paper, we propose Sequence Accumulation (SA), which leverages the common recu…
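To make the recurrence shared by these methods concrete: causal linear attention can be written as $S_t = S_{t-1} + k_t v_t^{\top}$ with output $o_t = S_t^{\top} q_t$, so an arbitrarily long sequence can be processed chunk by chunk while carrying only the fixed-size state $S$ between chunks. The sketch below is a minimal, hedged illustration of this well-known chunkwise form (unnormalized, single head); the function and parameter names (`linear_attention_chunked`, `chunk_size`) are illustrative assumptions, and this is background on the common recurrence, not the paper's SA algorithm.

```python
import torch

def linear_attention_chunked(q, k, v, chunk_size=512):
    """Causal linear attention computed chunk by chunk.

    q, k: (seq_len, d_k); v: (seq_len, d_v).
    Only the (d_k, d_v) state S is carried across chunks, so the
    recurrent-state memory is independent of seq_len.
    """
    seq_len, d_k = q.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v, dtype=q.dtype)  # accumulated key-value state
    outputs = []
    for start in range(0, seq_len, chunk_size):
        qc = q[start:start + chunk_size]
        kc = k[start:start + chunk_size]
        vc = v[start:start + chunk_size]
        # Intra-chunk term: causal (lower-triangular) attention within the chunk.
        intra = torch.tril(qc @ kc.T) @ vc
        # Inter-chunk term: contribution of all earlier chunks via the state S.
        inter = qc @ S
        outputs.append(intra + inter)
        # Accumulate this chunk's key-value outer products into the state.
        S = S + kc.T @ vc
    return torch.cat(outputs, dim=0)
```

Because each chunk only reads and then updates $S$, peak activation memory is governed by `chunk_size` rather than the full sequence length, which is the property that makes recurrence-based accumulation attractive for extremely long contexts on constrained hardware.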