Efficient attention lineage
Linear attention variants approximate the softmax kernel with feature maps, replacing the quadratic attention matrix with a linear-time factorization, while sparse patterns restrict each token to its nearest neighbours via sliding windows or mixtures of local and global attention (a minimal sketch of both follows).
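To make the two families concrete, here is a minimal NumPy sketch, not a production implementation: a kernelized linear attention using the phi(Q) @ (phi(K)^T V) factorization with ELU+1 as the feature map (a common choice in this literature), and a banded softmax attention of the kind sliding-window variants use. The names `feature_map`, `linear_attention`, `sliding_window_attention`, and the window size `w` are illustrative assumptions, not from any particular library.

```python
import numpy as np

def feature_map(x):
    # ELU(x) + 1 keeps features positive, so the factorized attention
    # weights stay non-negative (a common choice in linear attention work).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V):
    """Kernelized attention in O(n * d^2): phi(Q) @ (phi(K)^T V),
    never materializing the O(n^2) matrix softmax(Q K^T)."""
    Qf, Kf = feature_map(Q), feature_map(K)   # (n, d) each
    KV = Kf.T @ V                             # (d, d), shared by all queries
    Z = Qf @ Kf.sum(axis=0)                   # (n,) per-query normalizer
    return (Qf @ KV) / Z[:, None]

def sliding_window_attention(Q, K, V, w=2):
    """Sparse softmax attention where token i only attends to tokens j
    with |i - j| <= w (a banded / sliding-window pattern)."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)             # full (n, n) scores
    idx = np.arange(n)
    band = np.abs(idx[:, None] - idx[None, :]) <= w
    scores = np.where(band, scores, -np.inf)  # mask out-of-window pairs
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    return (weights / weights.sum(axis=1, keepdims=True)) @ V

rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, 8, 4))
print(linear_attention(Q, K, V).shape)           # (8, 4)
print(sliding_window_attention(Q, K, V).shape)   # (8, 4)
```

The design difference is visible in the shapes: linear attention reduces keys and values to a d-by-d summary reused by every query, while the sliding-window variant keeps exact softmax weights but zeroes out all pairs beyond the band.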
Staying on top of this literature informs the trade-off between long-context adapters and brute-force engineering of FlashAttention kernels.