Tags
- Accelerate 1
- Algorithm 1
- Assembly 3
- assembly 1
- attack 1
- Attention 2
- AutoDiff 1
- AutoRegressive 1
- awq 2
- Benchmark 1
- Binary Analysis 1
- book review 1
- Buffer Overflow 1
- c 1
- c, 1
- cmu15-213 2
- Code Injection 1
- codeLLM 1
- compiler 3
- Compression 5
- CoT 1
- csapp 3
- csapp, 1
- CUDA 5
- data clean 1
- Debugging 1
- Deep Learning 10
- DeepSeek 1
- DeepSeek-v2 1
- Deploy 2
- deployment 1
- development 1
- disassembly 1
- dl framework 2
- Evolution 1
- fastreid 2
- FlashAttention 1
- GDB 1
- GEMM 2
- gptq 3
- GPU 7
- High Performance Computing 1
- Inference 8
- Inference Optimization 1
- Kernel Development 2
- KV Cache 2
- LayerNorm 1
- Length Extrapolation 1
- Linear Algebra 1
- LLM 18
- Long Context 1
- Low-bit 1
- Math 1
- Mathematics 3
- Matrix Multiplication 1
- Memory Optimization 2
- MLA 1
- Model Architecture 1
- Model Optimization 3
- MoE 1
- MXFP4 1
- Neural Networks 1
- NVFP4 1
- NVIDIA 1
- objdump 1
- onnx 1
- operation system 1
- operation system, 1
- Optimization 1
- Parallel Computing 2
- parallel programming 3
- Performance 3
- Performance Optimization 1
- Positional Encoding 1
- pre-train 3
- Profile 1
- programming 1
- programming language 3
- project management 1
- PyTorch 1
- Quantization 9
- QuaRot 1
- reid 2
- Reverse Engineering 2
- ROP 1
- Rotate 1
- self-supervised learning 3
- SIFT 1
- Signal Processing 1
- SmoothQuant 1
- Softmax 2
- Speculative Sampling 1
- SpinQuant 1
- summary 2
- survey 1
- System Security 1
- tensorRT 1
- todo 1
- tool 1
- training recipe 1
- Transform 1
- Transformer 7
- tricks 1
- triton 6
- triton-tutorial 3
- Tutorial 3
- userOp 1
- utility 1
- vscode 1
- W4A8KV4 1