From Softmax to FlashAttention

A deep dive into the mathematical foundations of FlashAttention, from softmax fundamentals to efficient kernel implementation.

March 20, 2024 · 8 min · Sherlock