Quantization

LLM Quantization Review

This blog post provides an overview of the fundamental concepts of quantization, as well as a review of mainstream quantization methods in the context of LLMs.

Spin to Win: The Power of Rotation in LLM Quantization

An introduction to rotation techniques in LLM quantization and the related optimization schemes.

MXFP4 and NVFP4

An introduction to the differences between MXFP4 and NVFP4.

W4A8KV4 Quantization Summary and Best Practices

Comprehensive summary of W4A8KV4 quantization techniques, covering KV4 and W4A8 optimization methods with practical recommendations.

Low-Bit MoE Quantization for Large Language Models

Comprehensive guide to quantizing large MoE models like DeepSeek-V3/R1, covering techniques for efficient memory usage and inference optimization.

8-bit KV Cache

This blog post introduces KV cache quantization in LLM inference.

SmoothQuant and AWQ

This blog post compares SmoothQuant and AWQ, covering their differences and their code implementations.

GPTQ Code Implementation

This blog post delves into the code implementation of the GPTQ quantization process, using the Llama model as a case study.

GPTQ Math Derivation

This blog post traces the development of GPTQ, starting from its roots in OBD, through OBS, and finally to OBC.