Encoder-only LM pre-training

These objectives exploit massive amounts of unlabeled text to produce contextual embeddings that transfer to downstream tasks via fine-tuning.

They depart from causal (left-to-right) generation, yet they excel on understanding-centric benchmarks.
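
To make the contrast with causal generation concrete, here is a minimal sketch of how one such objective might build training pairs from raw text. It assumes the objectives in question include BERT-style masked language modeling, which the text above does not name explicitly. The function name mask_tokens, the MASK constant, the toy vocabulary, and the 15% selection rate with an 80/10/10 replacement split are illustrative assumptions, not details taken from this document.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol, chosen for illustration


def mask_tokens(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt a token sequence for masked-token prediction.

    Returns (corrupted, labels). labels[i] holds the original token at
    positions selected for prediction and None everywhere else, so a loss
    would only be computed at the selected positions.
    The 15% rate and 80/10/10 split are assumed defaults.
    """
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                corrupted.append(MASK)  # usually: replace with the mask symbol
            elif roll < 0.9:
                corrupted.append(rng.choice(vocab))  # sometimes: random token
            else:
                corrupted.append(tok)  # sometimes: leave unchanged
        else:
            labels.append(None)  # not a prediction target
            corrupted.append(tok)
    return corrupted, labels


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    toy_vocab = sorted(set(sentence))
    noisy, targets = mask_tokens(sentence, toy_vocab, rng=random.Random(0))
    print(noisy)
    print(targets)
```

Because the targets come from the text itself, no human labels are needed, and the model can condition on tokens to both the left and the right of each selected position, which a causal objective cannot do.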