Analysis | AI Models | June 30, 2026

Qwen 3.6 27B speculative decoding bench: ~100 TPS on RTX 3090

A benchmark reports roughly 100 tokens per second (TPS) for Qwen 3.6 27B using speculative decoding on a single RTX 3090. It compares five inference engines across quantization levels; llama.cpp forks lead in performance.

1 source

Qwen 3.6 27B Speculative Decoding Bench: Pushing ~100 TPS on a single RTX 3090 (reddit.com)
