DeepSeek-V4 runs locally on 4x RTX 2080 Ti with custom Turing kernels

27 days ago

The 284B-parameter DeepSeek-V4-Flash runs on $2,500 of consumer hardware, using custom Turing kernels and W8A8 quantization (8-bit weights and activations) to reach 255 tokens/s prefill. The setup demonstrates that frontier MoE models can be viable on legacy GPUs.
