2025  2

December  1

Spin to Win: The Power of Rotation in LLM Quantization

December 2, 2025 · 1 min · Sherlock

August  1

MXFP4 and NVFP4

August 28, 2025 · 2 min · Sherlock

2024  8

August  1

W4A8KV4 Quantization Summary and Best Practices

August 30, 2024 · 6 min · Sherlock

July  1

Low-Bit MoE Quantization for Large Language Models

July 25, 2024 · 3 min · Sherlock

June  1

Speculative Sampling for Faster LLM Inference

June 20, 2024 · 3 min · Sherlock

May  1

DeepSeek-V2 in a Nutshell - Multi-Head Latent Attention

May 15, 2024 · 2 min · Sherlock

April  1

LayerNorm Mathematical Derivation and Implementation

April 10, 2024 · 4 min · Sherlock

March  1

From Softmax to FlashAttention

March 20, 2024 · 8 min · Sherlock

February  1

Fast Hadamard Transform

February 15, 2024 · 5 min · Sherlock

January  1

8-bit KV Cache

January 24, 2024 · 8 min · Sherlock

2023  18

October  2

SmoothQuant and AWQ

October 8, 2023 · 12 min · Sherlock

LLM Quantization Review

October 2, 2023 · 11 min · Sherlock

September  8

Triton Tutorial - Series 2

September 20, 2023 · 9 min · Sherlock

Triton Tutorial #2

September 20, 2023 · 9 min · Sherlock

GPTQ Code Implementation

September 18, 2023 · 14 min · Sherlock

GPTQ Math Derivation

September 9, 2023 · 7 min · Sherlock

Triton Tutorial - Series 1

September 5, 2023 · 7 min · Sherlock

Triton Tutorial #1

September 5, 2023 · 7 min · Sherlock

Triton Tutorial - Series 0

September 2, 2023 · 5 min · Sherlock

Triton Tutorial #0

September 2, 2023 · 5 min · Sherlock

August  3

Parallel Reduction Optimization with CUDA

August 27, 2023 · 9 min · Sherlock

Benchmark for LLM Inference

August 20, 2023 · 5 min · Sherlock

RoPE and Length Scaling

August 10, 2023 · 8 min · Sherlock

July  4

CodeLLM Training Recipe

July 26, 2023 · 8 min · Sherlock

Some Thoughts on WizardLM(Coder) and Orca

July 22, 2023 · 6 min · Sherlock

How I Use the Pomodoro Technique

July 10, 2023 · 1 min · Sherlock

How to Do Continued Pre-training

July 4, 2023 · 2 min · Sherlock

February  1

How to Debug FP16 Differences Between PyTorch and TensorRT

February 28, 2023 · 2 min · Sherlock

2022  5

March  2

CSAPP Attack Lab - Code Injection and ROP Attacks

March 15, 2022 · 8 min · Sherlock

CSAPP: Attack Lab

March 15, 2022 · 8 min · Sherlock

February  3

Configuring VSCode as the Most Comfortable Deep Learning Development Environment

February 19, 2022 · 4 min · Sherlock

CSAPP Binary Bomb Lab - Reverse Engineering Challenge

February 13, 2022 · 13 min · Sherlock

CSAPP: Bomb Lab

February 13, 2022 · 13 min · Sherlock

2021  7

November  3

How to Develop a New UserOp in OneFlow

November 18, 2021 · 4 min · Sherlock

An Introduction to AutoDiff with a Simple Code Implementation

November 10, 2021 · 8 min · Sherlock

L2 Regularization and Weight Decay

November 5, 2021 · 2 min · Sherlock

June  1

Reflections on Reading "Hackers & Painters"

June 2, 2021 · 1 min · Sherlock

May  2

An Introduction to Distributed Parallelism in Deep Learning

May 16, 2021 · 5 min · Sherlock

Study Less, Study Smart

May 9, 2021 · 1 min · Sherlock

April  1

FastReID V1.0: Beyond reID

April 28, 2021 · 3 min · Sherlock

2020  3

May  1

FastReID: A ReID Toolbox for Academia and Industry

May 29, 2020 · 3 min · Sherlock

February  2

An Introduction to Self-Supervised Learning

February 20, 2020 · 3 min · Sherlock

Reading Notes on "A Simple Framework for Contrastive Learning of Visual Representations"

February 15, 2020 · 1 min · Sherlock