The Distillery

Parallel Reduction Optimization with CUDA

A step-by-step guide to optimizing parallel reduction operations using CUDA, from basic implementation to advanced optimization techniques.

An introduction to common metrics for benchmarking LLM inference.

An introduction to the basic concepts of position encoding, RoPE, and the length extrapolation related to it.

A survey-style article summarizing details from codeLLM-related papers, from data collection through training.

An introduction to two excellent papers I read recently on SIFT data, WizardLM (WizardCoder) and Orca, along with some of my own thoughts on the problem.

Notes on how I use the Pomodoro Technique to improve my productivity, along with some reflections from the experience.

An introduction to continued pre-training.

Notes on tracking down a mismatch between TensorRT FP16 and PyTorch inference results.

Understanding buffer overflow attacks through CSAPP lab exercises, covering code injection and return-oriented programming techniques.

Notes on completing AttackLab while reading and studying CSAPP.