Inductive bias trade-offs

Recurrent architectures bake in Markov-esque locality conducive to noisy low-data regimes but resist parallel batching.

Self-attention flexibly routes information yet needs regularisation curricula or enormous corpora—why hybrid models occasionally resurface.