Kernel Development

From Softmax to FlashAttention

A deep dive into the mathematical foundations of FlashAttention, from softmax fundamentals to efficient kernel implementation.
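
As a rough illustration of the softmax fundamentals mentioned above, the sketch below compares a numerically stable two-pass softmax with a single-pass "online" variant that keeps a running maximum and a rescaled running sum, which is the kind of streaming normalization FlashAttention is generally described as building on. The function names `softmax_stable` and `softmax_online` are hypothetical, chosen here for illustration, and the code assumes a non-empty list of finite floats; it is plain Python, not a kernel.

```python
import math


def softmax_stable(xs):
    """Two-pass softmax: subtract the max before exponentiating to avoid overflow."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_online(xs):
    """Softmax whose max and normalizer are accumulated in one streaming pass.

    Assumes finite inputs. Whenever the running max grows, the sum
    accumulated so far is rescaled so it stays expressed relative to
    the new max.
    """
    running_max = float("-inf")
    running_sum = 0.0
    for x in xs:
        new_max = max(running_max, x)
        # Rescale the old sum to the new max, then add the current term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # Final normalization pass using the streamed statistics.
    return [math.exp(x - running_max) / running_sum for x in xs]


if __name__ == "__main__":
    scores = [1.0, 2.0, 3.0]
    print(softmax_stable(scores))
    print(softmax_online(scores))
```

Both functions should agree up to floating-point rounding. The point of the online form is that the maximum and the normalizer can be updated block by block, without a separate pass over all scores to find the maximum first.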

Learn to write a fused softmax kernel in Triton, with debugging and performance benchmarking techniques.