LLM Quantization Review
This blog post provides an overview of the fundamental concepts of quantization, as well as a review of mainstream quantization methods in the context of LLMs.
A comprehensive summary of W4A8KV4 quantization, covering KV4 and W4A8 optimization methods with practical recommendations.
This blog post introduces KV Cache quantization in LLM inference.
This blog post compares the differences between SmoothQuant and AWQ, along with their code implementations.
This blog post delves into the code implementation of the GPTQ quantization process, using the Llama model as a case study.
This blog post traces the development of GPTQ, starting from its roots in OBD, through OBS, and finally to OBC.
This blog post introduces common metrics for LLM inference benchmarking.
A record of tracking down a mismatch between TensorRT FP16 and PyTorch inference results.
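As background for the posts above, the core idea shared by these methods is mapping floating-point tensors to low-bit integers via a scale factor. Below is a minimal sketch of symmetric (absmax) per-tensor quantization; the function names `quantize_absmax` and `dequantize` are illustrative, not from any specific library discussed in the posts.

```python
import numpy as np

def quantize_absmax(x: np.ndarray, n_bits: int = 8):
    """Symmetric (absmax) quantization: map floats to signed n-bit integers."""
    qmax = 2 ** (n_bits - 1) - 1           # e.g. 127 for int8
    scale = np.abs(x).max() / qmax         # per-tensor scale factor
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

x = np.array([0.5, -1.2, 3.1, -0.05], dtype=np.float32)
q, scale = quantize_absmax(x)
x_hat = dequantize(q, scale)
# rounding error per element is bounded by scale / 2
```

Methods like GPTQ, SmoothQuant, and AWQ all build on this basic scheme, differing mainly in how they choose scales and compensate for the rounding error on weights and activations.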