LayerNorm Mathematical Derivation and Implementation

A mathematical derivation of the LayerNorm forward and backward passes, including PyTorch implementation details.

April 10, 2024 · 4 min · Sherlock

From Softmax to FlashAttention

A deep dive into the mathematical foundations of FlashAttention, from softmax fundamentals to efficient kernel implementation.

March 20, 2024 · 8 min · Sherlock

Fast Hadamard Transform

A comprehensive guide to the Fast Hadamard Transform, its mathematical foundations, and practical implementation with code examples.

February 15, 2024 · 5 min · Sherlock