Quadratic complexity

For a sequence of length T, naive attention materialises a T x T score matrix, so both its memory and its FLOPs grow quadratically in T.
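To make the quadratic term concrete, here is a minimal pure-Python sketch of single-head softmax attention. The function name `naive_attention` and the two returned counters are illustrative assumptions rather than anything from the source, and the sketch leaves out batching, masking, and multiple heads.

```python
import math


def naive_attention(q, k, v):
    """Single-head softmax attention over lists of d-dimensional vectors.

    Hypothetical helper for illustration. Returns the outputs together
    with the number of score-matrix entries and an estimate of the
    multiply-adds, so the growth in T can be inspected directly.
    """
    T, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)

    # Full T x T score matrix: this is the quadratic memory term.
    scores = [
        [scale * sum(qi[c] * kj[c] for c in range(d)) for kj in k]
        for qi in q
    ]

    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        w = [math.exp(s - m) for s in row]
        z = sum(w)
        # Weighted average of the value vectors for this query position.
        out.append([sum(w[j] * v[j][c] for j in range(T)) / z for c in range(d)])

    score_entries = T * T          # memory held by the score matrix
    multiply_adds = 2 * T * T * d  # Q K^T plus (weights times V)
    return out, score_entries, multiply_adds
```

Under these assumptions, doubling T quadruples both counters while the per-position output size stays at d.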

This cost catalysed work on sparse, low-rank, and mixed-precision kernel approximations, which still dominate systems research headlines.