DeepSeek-v2 In a Nutshell - Multi-Head Latent Attention
Understanding DeepSeek-v2’s MLA architecture and its solutions to KV cache memory challenges in long context LLM inference.