NVIDIA NVFP4 recipe boosts pretraining speed on Blackwell

How-To, Developers, AI Models

9 days ago

Mixed-precision training in NVFP4, a 4-bit floating-point format, delivers 7x the GEMM throughput of FP8 on Hopper with no loss of accuracy. The recipe, available in MaxText and TransformerEngine, speeds up LLM pretraining on Blackwell GPUs.
