Qwen3.6-35B-A3B runs at 30+ tps on an 8GB GPU with Q4 quantization


25 days ago


A Reddit user reports running Qwen3.6-35B-A3B with Q4 quantization and a 262k-token context on an 8GB RTX 3070 Ti at more than 30 tokens per second. The context can reportedly be pushed to 1M tokens, but generation slows beyond 150k.
