8-bit KV Cache
This blog introduces KV Cache quantization in LLM inference.
This blog post compares SmoothQuant and AWQ, covering how the two methods differ and how each is implemented in code.
Build a high-performance matrix multiplication kernel in Triton that rivals cuBLAS performance with step-by-step optimization.
Third post in the Triton tutorial series: GEMM and autotuning.
This blog post delves into the code implementation of the GPTQ quantization process, using the Llama model as a case study.
This blog post traces the development of GPTQ, starting from its roots in OBD, through OBS, and finally to OBC.
Learn to write a fused softmax kernel in Triton, with debugging and performance benchmarking techniques.
Second post in the Triton tutorial series: fused softmax, with debugging and benchmarking.
An introduction to the Triton programming language, with an installation guide and a vector-addition example to get started with GPU kernel development.
First post in the Triton tutorial series: Triton introduction, installation, and a vector-add example.