Luce Spark: a 35B MoE on a 16 GB GPU, without the offload tax

9 days ago

The technique makes it possible to run 33-35B MoE models on 16 GB GPUs. For Qwen3.6 35B-A3B, for example, memory use drops from ~20.5 GiB to 13.3 GiB. It loads only the expert modules that are actually needed, which avoids the overhead of offloading and makes large MoEs feasible on consumer hardware.
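To make the arithmetic behind those figures concrete, here is a minimal sketch that checks both footprints against a 16 GB card. It assumes "16 GB" means decimal gigabytes (a card with 16 GiB gives the same verdict), and it ignores KV cache, activations, and driver overhead, so the real headroom is smaller. The helper name `fits` is made up for illustration.

```python
GIB = 2**30   # bytes in one gibibyte
GB = 10**9    # bytes in one decimal gigabyte


def fits(model_gib: float, gpu_gb: float = 16.0) -> bool:
    """Return True if a footprint given in GiB fits in a GPU sized in decimal GB.

    Assumption: overhead (KV cache, activations, driver) is ignored.
    """
    return model_gib * GIB <= gpu_gb * GB


# Figures quoted in the article for Qwen3.6 35B-A3B.
full_gib = 20.5    # approximate footprint before
spark_gib = 13.3   # footprint after

gpu_gib = 16 * GB / GIB                    # about 14.9 GiB of raw capacity
saved_gib = full_gib - spark_gib           # about 7.2 GiB saved
saved_pct = 100 * saved_gib / full_gib     # about 35% smaller

print(f"GPU capacity: {gpu_gib:.1f} GiB")
print(f"Saved: {saved_gib:.1f} GiB ({saved_pct:.0f}%)")
print(f"Fits before: {fits(full_gib)}, fits after: {fits(spark_gib)}")
```

On these numbers the original footprint exceeds the card's capacity while the reduced one leaves roughly 1.6 GiB of raw headroom, which is why offloading is no longer required.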
