Analysis | AI Models
19 hours ago
Speculative decoding explained: how draft models speed up AI agents
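Speculative decoding uses a small, fast draft model to propose several tokens ahead and a larger target model to verify them in a single pass, keeping only the tokens the target model agrees with. Because the target model checks every token, the technique reduces inference latency without sacrificing output quality.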
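To make the propose-and-verify loop concrete, here is a minimal sketch of the greedy variant in Python. Everything in it is an illustrative assumption rather than any library's API: `target_model` and `draft_model` are toy deterministic functions standing in for real networks, and `k` is a hypothetical number of tokens drafted per round. A production system would verify all drafted positions in one batched forward pass and would typically use a sampling-based acceptance rule instead of exact matching.

```python
from typing import Callable, List, Tuple

# A "model" here is just a function from a token context to the next token (greedy).
Model = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the large model: a toy deterministic rule.
    return (sum(ctx) * 31 + len(ctx)) % 50


def draft_model(ctx: List[int]) -> int:
    # Hypothetical stand-in for the small model: agrees with the target
    # except when the context length is a multiple of 4.
    if len(ctx) % 4 == 0:
        return (target_model(ctx) + 1) % 50
    return target_model(ctx)


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    # Baseline: one model call per generated token.
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(
    target: Model, draft: Model, prompt: List[int], n_new: int, k: int = 4
) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    target_passes = 0  # counts verification rounds, not individual calls
    while len(tokens) < goal:
        # 1. The draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))

        # 2. The target model checks every proposed position. A real system
        #    would do this in one batched forward pass; the loop below only
        #    simulates that.
        target_passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target(tokens + proposal[:i])
            if proposal[i] == expected:
                accepted.append(proposal[i])  # draft token verified
            else:
                accepted.append(expected)  # first mismatch: use target's token
                break
        else:
            # All k tokens accepted: the same pass yields one extra token.
            accepted.append(target(tokens + proposal))

        tokens.extend(accepted)
    return tokens[:goal], target_passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline = greedy_decode(target_model, prompt, 20)
    output, passes = speculative_decode(target_model, draft_model, prompt, 20)
    print("identical to target-only decoding:", output == baseline)
    print("target verification rounds:", passes)
```

In this sketch each round adds at least one token chosen by the target model, so the result matches what the target model would generate on its own. The saving comes from needing fewer target rounds than generated tokens whenever the draft model guesses correctly.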
