Analysis, Developers
3 days ago
llama.cpp PR adds EAGLE3 for Qwen models
A pull request to llama.cpp adds EAGLE3 speculative decoding for Qwen models. The work is still at an early stage, but it aims to speed up inference.
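For readers new to the idea, speculative decoding in general works by letting a cheap draft model propose a few tokens, which the full target model then checks, keeping the longest prefix it agrees with. The toy sketch below illustrates only that generic greedy draft-and-verify loop. It is not the code from the pull request, not llama.cpp's API, and not EAGLE3's specific draft architecture. The names `speculative_decode`, `greedy_decode`, `target`, and `draft` are hypothetical, and the "models" are plain functions that map a token list to the next token.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # toy "model": context -> next token


def greedy_decode(target: NextToken, prompt: List[Token], max_new: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


def speculative_decode(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new: int,
    k: int = 4,
) -> List[Token]:
    """Greedy draft-and-verify loop (illustrative sketch only)."""
    out = list(prompt)
    limit = len(prompt) + max_new
    while len(out) < limit:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))

        # 2. The target model checks each proposed position. A real engine
        #    would score all positions in one batched forward pass, which is
        #    where the speedup comes from; this toy calls it per position.
        accepted: List[Token] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok == expected:
                accepted.append(tok)       # draft guessed right: keep it
            else:
                accepted.append(expected)  # first mismatch: take the target's token
                break

        out.extend(accepted)
    return out


if __name__ == "__main__":
    # Hypothetical toy models: the target counts upward modulo 10, and the
    # draft agrees with it except at some context lengths.
    target = lambda ctx: (ctx[-1] + 1) % 10
    draft = lambda ctx: (ctx[-1] + 1) % 10 if len(ctx) % 3 else 0

    print(speculative_decode(target, draft, [1], max_new=12))
    print(greedy_decode(target, [1], max_new=12))
```

Because every kept token is one the target itself would have chosen, this greedy variant produces the same output as decoding with the target alone. Any gain comes from how many drafted tokens are accepted per verification step.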