Analysis · Developers
9 days ago
Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax
The technique makes it possible to run 33-35B MoE models on a 16 GB GPU; for example, the memory footprint of Qwen3.6 35B-A3B drops from ~20.5 GiB to 13.3 GiB. It loads only the expert modules that are needed, which avoids the overhead of offloading and makes large MoEs feasible on consumer hardware.
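As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The two footprints come from the summary above; everything else is an assumption made for illustration: the helper name `headroom_gib` is hypothetical, and "16 GB" is evaluated both as decimal gigabytes and as binary GiB because the summary does not say which is meant. The sketch ignores KV cache, activations, and driver overhead, so real headroom would be smaller.

```python
GIB = 2**30  # bytes per GiB

def headroom_gib(capacity_bytes, footprint_gib):
    """Free memory (GiB) left after the model footprint; negative means it does not fit."""
    return capacity_bytes / GIB - footprint_gib

# Footprints quoted in the summary for Qwen3.6 35B-A3B.
full_gib = 20.5      # approximate footprint before the technique
reduced_gib = 13.3   # footprint with only the needed experts loaded

saved_gib = full_gib - reduced_gib
saved_pct = 100 * saved_gib / full_gib
print(f"saved: {saved_gib:.1f} GiB ({saved_pct:.1f}%)")

# Assumption: the card's "16 GB" could mean either unit, so check both.
capacities = {"16 GB (decimal)": 16 * 10**9, "16 GiB (binary)": 16 * GIB}
for label, cap in capacities.items():
    before = headroom_gib(cap, full_gib)
    after = headroom_gib(cap, reduced_gib)
    print(f"{label}: headroom before {before:+.1f} GiB, after {after:+.1f} GiB")
```

Under either reading of "16 GB", the original footprint overshoots the card by several GiB, while the reduced one leaves a small positive margin.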
