Speculative Sampling for Faster LLM Inference

A deep dive into speculative sampling, a technique that accelerates LLM inference by letting a small draft model propose tokens that the larger target model then verifies with rejection sampling.

June 20, 2024 · 3 min · Sherlock