Transformer

LLM Quantization Review

This blog post provides an overview of the fundamental concepts of quantization, as well as a review of mainstream quantization methods in the context of LLMs.

8-bit KV Cache

This blog post introduces KV Cache quantization for LLM inference.

SmoothQuant and AWQ

This blog post compares SmoothQuant and AWQ, covering both how the two methods differ and how each is implemented in code.

GPTQ Code Implementation

This blog post delves into the code implementation of the GPTQ quantization process, using the Llama model as a case study.

GPTQ Math Derivation

This blog post traces the development of GPTQ, starting from its roots in OBD (Optimal Brain Damage), through OBS (Optimal Brain Surgeon), and finally to OBC (Optimal Brain Compression).

Benchmark for LLM Inference

This blog post introduces some metrics for LLM inference benchmarking.

RoPE and Length Scaling

This blog post introduces some basic concepts of position encoding, RoPE, and the length extrapolation techniques related to it.