Triton Tutorial - Series 2

Build a high-performance matrix multiplication (GEMM) kernel in Triton that rivals cuBLAS performance, with step-by-step optimization and autotuning.

September 20, 2023 · 9 min · Sherlock

Triton Tutorial - Series 1

Learn to write a fused softmax kernel in Triton, with debugging and performance benchmarking techniques.

September 5, 2023 · 7 min · Sherlock

Triton Tutorial - Series 0

An introduction to the Triton programming language, with an installation guide and a vector-addition example to get started with GPU kernel development.

September 2, 2023 · 5 min · Sherlock

Parallel Reduction Optimization with CUDA

A step-by-step guide to optimizing parallel reduction in CUDA, from a basic implementation to advanced optimization techniques.

August 27, 2023 · 9 min · Sherlock