DeepSeek-v2 In a Nutshell - Multi-Head Latent Attention
Understanding DeepSeek-v2’s MLA architecture and its solutions to KV cache memory challenges in long context LLM inference.
This blog introduces Multi-Head Latent Attention (MLA), the attention mechanism in DeepSeek-v2 that compresses the KV cache to reduce memory usage during LLM inference.