28 days ago
SGD's Frequency Bias and How Adam Fixes It
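Under plain SGD, the parameters of rare tokens update slowly: they receive a nonzero gradient only on the few steps where the token appears, and every parameter shares the same learning rate. Adam addresses this with per-parameter adaptive learning rates, scaling each update by a running estimate of that parameter's gradient magnitude, which balances updates across common and rare tokens. The article explains the mechanism in detail, with examples.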
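As a rough illustration of the effect, here is a minimal sketch in plain Python. It is a toy setup assumed for this example, not code from the article: two scalar "embeddings" are each pulled toward a target of 1.0 by a squared-error loss, and the rare token appears once every 100 steps. The names `simulate`, `rare_every`, and `target` are hypothetical, and the hyperparameters are common defaults chosen for the demonstration.

```python
import math


def simulate(optimizer, steps=1000, rare_every=100, lr=0.01):
    """Train two scalar 'embeddings' toward target=1.0 (toy assumption).

    Index 0 is a common token, index 1 a rare token that appears only
    on every `rare_every`-th step. Returns the final parameter values.
    """
    w = [0.0, 0.0]
    target = 1.0
    m = [0.0, 0.0]  # Adam first-moment (mean) estimates
    v = [0.0, 0.0]  # Adam second-moment (uncentered variance) estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8

    for t in range(1, steps + 1):
        token = 1 if t % rare_every == 0 else 0
        # Only the token seen on this step gets a nonzero gradient.
        grad = [0.0, 0.0]
        grad[token] = w[token] - target  # d/dw of 0.5 * (w - target)**2

        for i in range(2):
            if optimizer == "sgd":
                # Same learning rate for every parameter.
                w[i] -= lr * grad[i]
            else:
                # Dense Adam: moments are updated for every parameter on
                # every step, including steps where its gradient is zero.
                m[i] = beta1 * m[i] + (1 - beta1) * grad[i]
                v[i] = beta2 * v[i] + (1 - beta2) * grad[i] ** 2
                m_hat = m[i] / (1 - beta1 ** t)
                v_hat = v[i] / (1 - beta2 ** t)
                # Dividing by sqrt(v_hat) gives each parameter its own
                # effective step size.
                w[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


sgd_common, sgd_rare = simulate("sgd")
adam_common, adam_rare = simulate("adam")
print(f"SGD : common={sgd_common:.3f} rare={sgd_rare:.3f}")
print(f"Adam: common={adam_common:.3f} rare={adam_rare:.3f}")
```

In this toy run the rare parameter under SGD receives only ten small updates, so it stays far from the target, while Adam's normalization by the rare parameter's own gradient history lets it travel much further in the same number of steps. The exact numbers depend on the assumed learning rate and token frequency.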
