Analysis | AI Models | June 30, 2026
Qwen 3.6 27B speculative decoding bench: ~100 TPS on RTX 3090
A benchmark of Qwen 3.6 27B reports roughly 100 tokens per second (TPS) with speculative decoding on a single RTX 3090. It compares five inference engines across quantization levels; llama.cpp forks deliver the highest throughput.