Analysis | AI Models
12 days ago
NVIDIA X-Token: Cross-Tokenizer KD Outperforms GOLD by 3.82 Points
NVIDIA's X-Token uses projection-guided cross-tokenizer knowledge distillation (KD), transferring the teacher's dark knowledge (the information carried in its full output distribution) to the student through a per-position KL divergence loss. On Llama-3.2-1B, it outperforms GOLD by an average of 3.82 points across tasks.
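
As a rough illustration of what a per-position KL divergence loss looks like, here is a minimal Python sketch. It is not X-Token's implementation: it assumes the teacher's distribution has already been projected onto the student's vocabulary (the projection step itself is not shown), and it assumes the forward direction KL(teacher || student), which is common in distillation. The function names `log_softmax`, `per_position_kl`, and `distillation_loss` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one position's logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def per_position_kl(teacher_logits, student_logits, temperature=1.0):
    """Return KL(teacher || student) for each sequence position.

    Assumption: both inputs are lists of per-position logit rows over the
    SAME vocabulary, i.e. the teacher has already been projected into the
    student's token space.
    """
    kls = []
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)  # teacher log-probabilities
        log_q = log_softmax(s_row, temperature)  # student log-probabilities
        kls.append(sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q)))
    return kls


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Average the per-position KL terms into a single scalar loss."""
    kls = per_position_kl(teacher_logits, student_logits, temperature)
    return sum(kls) / len(kls)


# Toy example: 2 positions, vocabulary of 3 tokens.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.0, 0.0, 1.0]]
loss = distillation_loss(teacher, student)
```

The loss is zero when the student matches the teacher at every position and grows as the two distributions diverge, which is how the student is pushed toward the teacher's full distribution and not only its top prediction.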
