How-To · AI Models

SGD's Frequency Bias and How Adam Fixes It

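With plain SGD, a parameter moves only when it receives a gradient, so the parameters tied to rare tokens (for example, their embedding rows) are updated far less often than those of common tokens and learn slowly. This is SGD's frequency bias. Adam mitigates it with per-parameter adaptive learning rates: each step is scaled by a running estimate of that parameter's own gradient magnitude, which brings the effective update sizes for rare and common tokens closer together. The article explains the mechanism in detail, with examples.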
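To make the summary concrete, here is a minimal toy sketch in plain Python. It is not taken from the article: the function name `total_movement`, the constant gradient of 1.0, the 1-in-100 token frequency, and the hyperparameters are all illustrative assumptions. It tracks how far a single scalar parameter moves under SGD and under a dense Adam update when its gradient is nonzero only on some steps.

```python
import math

def total_movement(optimizer, period, steps=1000, lr=0.01):
    """Total distance one scalar parameter moves when its gradient is 1.0
    on every `period`-th step and 0.0 otherwise (hypothetical toy setup)."""
    beta1, beta2, eps = 0.9, 0.999, 1e-8  # commonly used Adam defaults
    m = v = 0.0
    moved = 0.0
    for t in range(1, steps + 1):
        # The "token" shows up in the batch only every `period` steps.
        g = 1.0 if t % period == 0 else 0.0
        if optimizer == "sgd":
            step = lr * g
        else:
            # Dense Adam: moments are updated (and decay) on every step,
            # including steps where the gradient is zero.
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** t)  # bias correction
            v_hat = v / (1 - beta2 ** t)
            step = lr * m_hat / (math.sqrt(v_hat) + eps)
        moved += abs(step)
    return moved

# Common token: gradient on every step. Rare token: gradient every 100th step.
sgd_ratio = total_movement("sgd", 1) / total_movement("sgd", 100)
adam_ratio = total_movement("adam", 1) / total_movement("adam", 100)

print(f"SGD  common/rare movement ratio: {sgd_ratio:.1f}")
print(f"Adam common/rare movement ratio: {adam_ratio:.1f}")
```

Under these assumptions the SGD ratio equals the frequency ratio of 100, while the Adam ratio is considerably smaller. In this toy the gap narrows rather than disappears, because the rare parameter's moment estimates also decay on the steps where its gradient is zero.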

28 days ago