Encoder–decoder modelling
Before Transformers matured, recurrent or convolutional encoders distilled source sentences into latent states that decoders queried step by step. Attention replaced that fixed-length bottleneck with a dynamic, per-step weighting over every encoder state.
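The weighting step can be sketched in a few lines. This is a minimal illustration, not a full model: the function name `attend` and the single unscaled dot-product score are assumptions for clarity, standing in for the learned scoring functions used in practice.

```python
import numpy as np

def attend(query, encoder_states):
    """Weight encoder states by similarity to the decoder query, then mix them."""
    scores = encoder_states @ query                 # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                        # softmax over source positions
    return weights @ encoder_states                 # context vector for this step

rng = np.random.default_rng(0)
states = rng.standard_normal((5, 8))                # 5 source positions, dim 8
context = attend(rng.standard_normal(8), states)
print(context.shape)
```

At each decoding step the query changes, so the mixture over source positions changes too; nothing is squeezed through a single fixed-length vector.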
The Transformer inherits that choreography but substitutes global self-attention blocks for recurrence inside each column.
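In self-attention the queries, keys, and values all come from the same sequence, so every position can attend to every other in one step. A minimal single-head sketch, assuming random weight matrices and omitting masking, multiple heads, and the residual/normalization wrapper a real block would have:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over one sequence (single head, no mask)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])         # all-pairs similarity
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)       # softmax per query position
    return weights @ v                              # each output mixes all positions

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                     # sequence of 4 tokens, dim 8
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)
```

Unlike recurrence, nothing here depends on processing positions in order, which is what makes the computation parallelizable across the sequence.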