Low-Bit MoE Quantization for Large Language Models
A comprehensive guide to quantizing large mixture-of-experts (MoE) models such as DeepSeek-V3/R1, covering techniques for reducing memory usage and optimizing inference.