14 days ago
Sakana AI proposes DiffusionBlocks for block-wise training
DiffusionBlocks trains a transformer network one block at a time instead of end to end, which reduces training memory by a factor of B, the number of blocks. Sakana AI reports that performance is maintained across diverse architectures.
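The factor-of-B claim is simple arithmetic. Here is a back-of-the-envelope sketch. It assumes that training memory scales linearly with the number of layers whose state must be held at once, and that the L layers split evenly into B blocks. The names `TrainingSetup`, `end_to_end_memory` and `blockwise_memory`, along with the byte figures, are hypothetical and not taken from Sakana AI's code. The sketch models only the memory accounting, not how DiffusionBlocks trains each block.

```python
from dataclasses import dataclass


@dataclass
class TrainingSetup:
    num_layers: int       # total transformer layers L
    num_blocks: int       # number of blocks B the layers are partitioned into
    bytes_per_layer: int  # hypothetical training-memory cost of one layer


def end_to_end_memory(setup: TrainingSetup) -> int:
    # Assumption: backpropagating through the whole network keeps the
    # training state of all L layers alive at the same time.
    return setup.num_layers * setup.bytes_per_layer


def blockwise_memory(setup: TrainingSetup) -> int:
    # Assumption: training one block at a time keeps only L / B layers'
    # training state alive at once.
    if setup.num_layers % setup.num_blocks != 0:
        raise ValueError("this sketch assumes L is divisible by B")
    layers_per_block = setup.num_layers // setup.num_blocks
    return layers_per_block * setup.bytes_per_layer


setup = TrainingSetup(num_layers=24, num_blocks=4, bytes_per_layer=1_000_000)
print(end_to_end_memory(setup))                             # 24000000
print(blockwise_memory(setup))                              # 6000000
print(end_to_end_memory(setup) // blockwise_memory(setup))  # 4
```

Under these assumptions, the ratio always equals B. In practice, fixed costs such as the input embeddings or data batches do not shrink with B, so the real saving can be somewhat smaller.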
