DeepSeek-v2 In a Nutshell - Multi-Head Latent Attention

Understanding DeepSeek-v2’s MLA architecture and how it addresses KV cache memory challenges in long-context LLM inference.

May 15, 2024 · 2 min · Sherlock

From Softmax to FlashAttention

A deep dive into the mathematical foundations of FlashAttention, from softmax fundamentals to efficient kernel implementation.

March 20, 2024 · 8 min · Sherlock