Resources
Attention Is All You Need
"Attention Is All You Need" is the seminal 2017 paper by Vaswani et al. that introduced the Transformer architecture. It replaces recurrence with self-attention, so every position in a sequence can be processed in parallel, significantly improving training efficiency over RNN-based models.
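The core mechanism the paper describes can be sketched in a few lines. This is a minimal single-head, scaled dot-product self-attention in numpy; the weight matrices and dimensions are illustrative, not taken from the paper's full multi-head configuration.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence X of shape
    (seq_len, d_model). Every position attends to every other position
    in one matrix multiply, which is what makes the computation
    parallelizable across the sequence, unlike step-by-step recurrence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (seq_len, seq_len) logits
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over key positions
    return weights @ V                               # weighted sum of values
```

The single `Q @ K.T` product computes all pairwise interactions at once, which is the parallelism the summary above refers to.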
Read the Paper
Evaluating Large Language Models Trained on Code
This 2021 paper introduces Codex, a GPT model fine-tuned on code, along with the HumanEval benchmark for functional correctness. It compares Codex against models such as GPT-Neo and GPT-J across programming tasks, providing insights into the effectiveness of code-specific pretraining.
Read the Paper
StarCoder: may the source be with you!
StarCoder is an open-weight model trained on permissively licensed code from GitHub. It supports code completion, infilling, and other generation tasks, and is a strong contender in open LLM research for software engineering.
Read the Paper
Fast Transformer Decoding: One Write-Head is All You Need
This paper proposes multi-query attention, in which all attention heads share a single set of keys and values (one "write-head"). Because far less key/value data must be read at each decoding step, incremental inference becomes much faster without compromising quality, a practical win for real-time applications.
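The idea can be sketched as follows: each head keeps its own query projection, but the key and value projections are shared. This is a simplified numpy illustration, with hypothetical shapes rather than the paper's exact configuration.

```python
import numpy as np

def multi_query_attention(X, Wq_heads, Wk, Wv):
    """Sketch of multi-query attention for X of shape (seq_len, d_model):
    one query projection per head, but a single shared key/value
    projection, so the K/V cache is one head wide instead of n_heads wide.
    """
    K = X @ Wk                      # shared keys   (seq_len, d_head)
    V = X @ Wv                      # shared values (seq_len, d_head)
    d_k = K.shape[-1]
    outputs = []
    for Wq in Wq_heads:             # one query projection per head
        Q = X @ Wq
        scores = Q @ K.T / np.sqrt(d_k)
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        outputs.append(w @ V)
    return np.concatenate(outputs, axis=-1)   # (seq_len, n_heads * d_head)
```

Note that `Wk` and `Wv` appear once, outside the per-head loop: during autoregressive decoding, this shrinks the cached K/V tensors by a factor of the head count.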
Read the Paper
InCoder: A Generative Model for Code Infilling and Synthesis
InCoder is a unified generative model for code synthesis and infilling, trained with a causal-masking objective. Unlike strictly left-to-right models, it can condition on both the code before and after a gap, making it effective at filling in missing spans with high-quality completions and supporting flexible editing.
Read the Paper