Main Idea

  • Modern LLMs are bottlenecked by communication (memory bandwidth) on current hardware, not by compute
  • Compressing the key-value (KV) cache with low-rank matrices reduces this memory traffic
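The low-rank idea in the second bullet can be sketched numerically. This is a minimal illustration, not any particular model's method: it factors a per-head key cache with a truncated SVD, stores only the low-rank factors, and reconstructs keys on demand. All sizes (`seq_len`, `d_head`, `rank`) are made-up example values.

```python
import numpy as np

# Hypothetical sizes: sequence length, head dimension, and chosen low rank.
seq_len, d_head, rank = 512, 64, 16

rng = np.random.default_rng(0)
K = rng.standard_normal((seq_len, d_head))  # stand-in per-head key cache

# Truncated SVD gives the best rank-r approximation: K ~= latent @ proj.
U, S, Vt = np.linalg.svd(K, full_matrices=False)
latent = U[:, :rank] * S[:rank]  # (seq_len, rank): stored per token
proj = Vt[:rank]                 # (rank, d_head): one shared up-projection

K_approx = latent @ proj         # reconstructed keys when attention needs them

# Memory stored: seq_len*rank + rank*d_head floats vs. seq_len*d_head.
fraction = (latent.size + proj.size) / K.size
print(f"stored fraction of original cache: {fraction:.2f}")  # -> 0.28
```

With these example sizes the cache shrinks to roughly 28% of its original footprint, at the cost of an approximation error that grows as the rank is reduced.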