Encoder–decoder modelling

Before Transformers matured, recurrent or convolutional encoders distilled a source sentence into latent states, and a decoder generated the target one token at a time from them. Attention replaced the fixed-length bottleneck of a single summary vector with a dynamic weighting over every encoder state, recomputed at each decoding step.
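
The idea is compact enough to sketch. Below is a minimal scaled dot-product version of that dynamic weighting in NumPy; the function and variable names are illustrative, not drawn from any particular library, and a real system would use learned projections and batching.

```python
# A minimal sketch of dot-product attention over encoder states (NumPy).
# All names here are illustrative assumptions, not a specific library's API.
import numpy as np

def attention_context(query, encoder_states):
    """Weight encoder states dynamically instead of using one fixed vector.

    query:          (d,)   current decoder state
    encoder_states: (T, d) one latent state per source position
    """
    d = query.shape[-1]
    scores = encoder_states @ query / np.sqrt(d)   # (T,) relevance of each source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                       # softmax over source positions
    return weights @ encoder_states                # (d,) context vector, rebuilt every step

rng = np.random.default_rng(0)
ctx = attention_context(rng.normal(size=64), rng.normal(size=(10, 64)))
```

Because the context vector is recomputed at every step, the decoder is never forced through a single fixed-length summary of the source.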

The Transformer inherits that choreography but substitutes global self-attention blocks for recurrence inside each stack: every position attends to every other position in a single parallel step, rather than passing information along a chain.
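
A single-head sketch makes the substitution concrete. The weight names (W_q, W_k, W_v) and shapes below are assumptions for illustration; a full block would add multiple heads, residual connections, and layer normalisation.

```python
# A minimal single-head self-attention block (NumPy), showing how every
# position updates from all others in one parallel step instead of recurring.
# Weight names W_q, W_k, W_v are illustrative assumptions.
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """X: (T, d) token representations; returns updated (T, d) representations."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v               # project every position at once
    scores = Q @ K.T / np.sqrt(Q.shape[-1])           # (T, T) all-pairs similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # each output mixes all inputs globally

rng = np.random.default_rng(0)
d = 64
X = rng.normal(size=(10, d))
out = self_attention(X, *(rng.normal(size=(d, d), scale=d**-0.5) for _ in range(3)))
```

Nothing in the block depends on the previous time step, which is what lets the Transformer process all positions in parallel where a recurrent encoder had to proceed sequentially.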