Quadratic complexity
For a sequence of length T, naive attention requires O(T²) memory and FLOPs, since it materialises a full T×T score matrix. This cost has catalysed a wave of sparse, low-rank, and mixed-precision kernel approximations that still dominate systems research.
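To make the scaling concrete, here is a minimal NumPy sketch of naive attention, assuming single-head inputs of shape (T, d); the function name and shapes are illustrative, not from any particular library. The (T, T) score matrix `S` is what drives the quadratic memory and compute.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Q, K, V: (T, d). The score matrix S is (T, T), so memory is O(T^2)
    # and computing it costs O(T^2 * d) FLOPs.
    d = Q.shape[-1]
    S = Q @ K.T / np.sqrt(d)                       # (T, T) scores
    P = np.exp(S - S.max(axis=-1, keepdims=True))  # numerically stable softmax
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V                                   # (T, d) output

T, d = 512, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
out = naive_attention(Q, K, V)
print(out.shape)  # (512, 64)
```

Doubling T quadruples both the size of `S` and the work to fill it, which is exactly the bottleneck the approximation methods above try to avoid.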