Enthusiast runs 1-trillion-parameter LLM on single GPU with 768GB Optane memory

17 days ago

A user achieved about 4 tokens per second running Kimi K2.5 locally on a single GPU, backed by 768 GB of cheap Intel Optane DIMMs. The setup demonstrates a cost-effective way to run massive models through memory expansion.
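Why 768 GB is in the right range can be shown with simple arithmetic on weight storage. The sketch below is a hypothetical back-of-envelope check, not the user's actual configuration. It assumes exactly 1 trillion parameters (taken from the headline), treats 1 GB as 10**9 bytes, and ignores KV cache, activations, and runtime overhead. The article does not say which quantization was used, so several bit widths are compared.

```python
# Hypothetical back-of-envelope check: does a 1-trillion-parameter model's
# raw weight storage fit in 768 GB at a given bit width? Ignores KV cache,
# activations, and runtime overhead; the bit widths are illustrative only.

PARAMS = 1_000_000_000_000  # 1 trillion parameters (from the headline)
MEMORY_GB = 768             # Optane capacity reported in the article


def weight_footprint_gb(params: int, bits_per_weight: float) -> float:
    """Return raw weight storage in gigabytes (assuming 1 GB = 10**9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    for bits in (16, 8, 4, 2):
        gb = weight_footprint_gb(PARAMS, bits)
        verdict = "fits" if gb <= MEMORY_GB else "does not fit"
        print(f"{bits:>2}-bit weights: {gb:,.0f} GB -> {verdict} in {MEMORY_GB} GB")
```

Under these assumptions, 16-bit weights (2,000 GB) and 8-bit weights (1,000 GB) exceed 768 GB, while 4-bit weights (500 GB) leave room for other runtime data.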