21 hours ago
ParallelKernelBench: frontier LLMs struggle with fast multi-GPU CUDA kernels
The benchmark spans 87 real-world workloads, and the best model solves fewer than a third of them. Even so, a few generated kernels outperform every existing public implementation.
