llama.cpp adds EAGLE speculative decoding support

Launch · Developers

Jun 14, 10:45 PM

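EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency) has been merged into llama.cpp, enabling faster text generation. It is a form of speculative decoding. A lightweight draft model, which in EAGLE works from the target model's own hidden features, proposes several tokens ahead. The full model then verifies all of them in a single forward pass and keeps the longest accepted prefix. Because the large model runs fewer sequential passes, generation latency drops, with speedups of up to 3x reported on some benchmarks.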
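The draft-and-verify loop behind speculative decoding can be sketched as follows. This is a minimal greedy sketch under stated assumptions. It is not llama.cpp's implementation and not EAGLE's feature-level drafting. The names `toy_target`, `toy_draft`, `greedy_generate`, and `speculative_generate` are hypothetical, and the toy "models" simply map a token prefix to a next token. Acceptance uses exact match (greedy decoding) rather than the rejection-sampling rule used when sampling. For clarity, the verification step scores positions one at a time. A real engine scores all `k` draft positions in one batched forward pass of the target model, and that is where the speedup comes from.

```python
from typing import Callable, List, Tuple

# Hypothetical toy "model": maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def toy_target(ctx: List[int]) -> int:
    """Toy target model: next token is (last + 1) mod 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[int]) -> int:
    """Toy draft model: agrees with the target except after token 6."""
    if ctx[-1] == 6:
        return (ctx[-1] + 2) % 10  # deliberate wrong guess
    return (ctx[-1] + 1) % 10


def greedy_generate(target: Model, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one sequential target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(target(out))
    return out


def speculative_generate(target: Model, draft: Model, prompt: List[int],
                         n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify loop; returns (tokens, verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. Draft proposes k tokens autoregressively (cheap model).
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. Target checks every proposed position. Simulated position by
        #    position here; a real engine does this in one batched pass.
        passes += 1
        accepted, ctx = [], list(out)
        for tok in proposal:
            expected = target(ctx)
            if expected != tok:
                accepted.append(expected)  # target's token replaces first mismatch
                break
            accepted.append(tok)
            ctx.append(tok)
        else:
            accepted.append(target(ctx))  # all k accepted: one bonus token
        out.extend(accepted)
    # The last round may overshoot n_new; trim to the requested length.
    return out[:len(prompt) + n_new], passes


if __name__ == "__main__":
    tokens, passes = speculative_generate(toy_target, toy_draft, [0], n_new=12)
    assert tokens == greedy_generate(toy_target, [0], 12)  # identical output
    print(tokens, passes)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2] 3
```

With `k=4`, the toy run produces the same 12 tokens as plain greedy decoding of the target but needs only 3 verification passes instead of 12. The draft's one wrong guess (after token 6) only shortens that round's accepted run. It never changes the output.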
