LayerNorm Mathematical Derivation and Implementation

Comprehensive mathematical derivation of LayerNorm forward and backward passes, including PyTorch implementation details.

April 10, 2024 · 4 min · Sherlock

Triton Tutorial #2

Third blog post of the Triton tutorial series: GEMM and autotuning.

September 20, 2023 · 9 min · Sherlock

Triton Tutorial #1

Second blog post of the Triton tutorial series: a fused softmax kernel, plus debugging and benchmarking it.

September 5, 2023 · 7 min · Sherlock

Triton Tutorial #0

First blog post of the Triton tutorial series: an introduction to Triton, installation, and a vector-add example.

September 2, 2023 · 5 min · Sherlock

How to Do Continued Pre-training

An introduction to continued pre-training.

July 4, 2023 · 2 min · Sherlock

How to Debug FP16 Discrepancies Between PyTorch and TensorRT

A record of tracking down a mismatch between TensorRT FP16 and PyTorch inference results.

February 28, 2023 · 2 min · Sherlock

Setting Up the Most Comfortable Deep Learning Development Environment in VSCode

A walkthrough of configuring and using VSCode, with some examples of debugging deep learning code.

February 19, 2022 · 4 min · Sherlock

How to Develop a New UserOp in OneFlow

Notes on the process of developing a UserOp in OneFlow and some problems encountered along the way.

November 18, 2021 · 4 min · Sherlock

An Introduction to AutoDiff with a Simple Code Implementation

Explains how AutoDiff works from both the mathematical and the implementation perspective, and gives a simple code implementation.

November 10, 2021 · 8 min · Sherlock

L2 Regularization and Weight Decay

The differences and connections between L2 regularization and weight decay.

November 5, 2021 · 2 min · Sherlock