Analysis · AI Models
14 days ago
EAGLE 3.1 fixes attention drift in LLM speculative decoding
EAGLE 3.1 introduces a fix for attention drift in speculative decoding, improving LLM inference speed. In speculative decoding, a small draft model proposes several tokens ahead, and the target model verifies them in parallel.
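As a rough illustration of the draft-then-verify loop described above, here is a minimal sketch of speculative decoding with greedy verification. It is not EAGLE 3.1's implementation and does not model attention drift. The names `speculative_step`, `generate`, `toy_target`, and `toy_draft` are hypothetical, and the two "models" are toy functions over integer tokens. A real system would check all drafted positions in one batched forward pass of the target model; the loop here only stands in for that step.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, k: int) -> List[Token]:
    """Run one draft-then-verify round and return the tokens accepted."""
    # 1. The draft model proposes k tokens autoregressively.
    proposed: List[Token] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        proposed.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position.
    #    (Stand-in for a single parallel forward pass.)
    accepted: List[Token] = []
    ctx = list(prefix)
    for tok in proposed:
        expected = target_next(ctx)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


def generate(prefix: List[Token], draft_next: NextToken,
             target_next: NextToken, k: int, max_new: int) -> List[Token]:
    """Repeat speculative rounds until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft_next, target_next, k))
    return out[:len(prefix) + max_new]


def toy_target(ctx: List[Token]) -> Token:
    # Toy target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Toy draft model: agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step([3], toy_draft, toy_target, k=4))
    print(generate([0], toy_draft, toy_target, k=4, max_new=8))
```

With greedy verification, the output always matches what the target model would produce on its own. The speedup comes from accepting several draft tokens per target pass whenever the draft model guesses correctly.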
