Attention itself has no notion of word order in a sequence, so positional information must be added explicitly.
The positional encoding should:
- Uniquely identify each position that is likely to be observed
- Have bounded scale [-1, 1] → keeps the encoding's magnitude comparable to the token embeddings regardless of position
- Be easy to reason about computationally in terms of time deltas (easily tell whether two encodings are for positions some fixed distance apart) → we care about relative position
- Generalize well to longer sequences
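One scheme satisfying these properties is the sinusoidal encoding from the original Transformer paper, where each position is mapped to sines and cosines of geometrically spaced frequencies. A minimal NumPy sketch (function name and dimensions are illustrative):

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(max_len)[:, None]               # (max_len, 1)
    div = np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe

pe = sinusoidal_positional_encoding(64, 16)

# Bounded scale: every value lies in [-1, 1].
assert pe.min() >= -1.0 and pe.max() <= 1.0
# Unique: each of the 64 positions gets a distinct encoding vector.
assert len({tuple(row.round(6)) for row in pe}) == 64
```

Because sin/cos satisfy angle-addition identities, the encoding at position pos + k is a fixed linear transform of the encoding at pos, which is what makes relative offsets easy to reason about; and since the formula is defined for any pos, it extends to sequences longer than those seen in training.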