AnalysisAI Models
27 days ago
DeepSeek-V4 runs locally on 4x RTX 2080 Ti with custom Turing kernels
284B-parameter DeepSeek-V4-Flash runs on $2,500 consumer hardware using custom Turing kernels and W8A8 quantization, achieving 255 tokens/s prefill. The setup demonstrates viability of frontier MoE models on legacy GPUs.
·
27 days ago