Tutorial

Triton Tutorial - Series 2

Build a high-performance matrix multiplication kernel in Triton, optimizing it step by step until it rivals cuBLAS.

Learn to write a fused softmax kernel in Triton, with debugging and performance benchmarking techniques.

An introduction to the Triton programming language, with an installation guide and a vector-addition example to get you started with GPU kernel development.