Transformer (Self-Attention)

Ashish Vaswani et al., 2017

O(n²·d)

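Introduced in "Attention Is All You Need" (Vaswani et al., 2017), the Transformer replaced recurrence with self-attention. Each token's embedding is projected into a Query (Q), a Key (K), and a Value (V) vector. A token's attention weights are the dot products of its query with every key, scaled by 1/√d_k (where d_k is the key dimension) and normalized with a softmax. The token's output is the resulting weighted sum of the value vectors. Because every token is compared with every other token, the cost grows as O(n²·d) for n tokens of dimension d. The visualization shows this computation across tokens as a heatmap of attention weights, where darker cells indicate stronger attention between a pair of tokens.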
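The computation described above can be sketched as single-head scaled dot-product self-attention in plain Python. This is a minimal illustration, not the code behind the visualization: it assumes one attention head, no masking, and random projection matrices standing in for learned weights. The helper names (`matmul`, `transpose`, `softmax`, `self_attention`, `rand_matrix`) are hypothetical. Row i of the returned `weights` matrix corresponds to row i of the heatmap: how strongly token i attends to each token j.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both lists of lists."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    mx = max(row)
    exps = [math.exp(x - mx) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x:        n tokens, each a d_model-dimensional vector (n x d_model).
    w_q, w_k: d_model x d_k projection matrices; w_v: d_model x d_v.
    Returns (outputs, weights) with shapes (n x d_v) and (n x n).
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(w_q[0])
    # n x n matrix of query-key dot products: the O(n^2 * d) step.
    scores = matmul(q, transpose(k))
    scale = 1.0 / math.sqrt(d_k)
    # Scale each score by 1/sqrt(d_k), then softmax each row into weights.
    weights = [softmax([s * scale for s in row]) for row in scores]
    # Each output row is a weighted sum of the value vectors.
    out = matmul(weights, v)
    return out, weights


def rand_matrix(rows, cols):
    # Hypothetical stand-in for learned parameters.
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d_model, d_k = 4, 8, 8
x = rand_matrix(n, d_model)
w_q, w_k, w_v = rand_matrix(d_model, d_k), rand_matrix(d_model, d_k), rand_matrix(d_model, d_k)
out, weights = self_attention(x, w_q, w_k, w_v)
for row in weights:
    print(" ".join(f"{w:.2f}" for w in row))
```

Each printed row sums to 1, so a heatmap of `weights` shows, per token, how its attention is distributed over the sequence. The 1/√d_k factor keeps the dot products from growing with the dimension, which would otherwise push the softmax toward near one-hot weights.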