
NVIDIA X-Token: Cross-Tokenizer KD Outperforms GOLD by +3.82

NVIDIA's X-Token uses projection-guided cross-tokenizer knowledge distillation, transferring dark knowledge via per-position KL divergence. On Llama-3.2-1B, it outperforms GOLD by an average of 3.82 points across tasks.
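To make "per-position KL divergence" concrete, here is a minimal pure-Python sketch of a distillation loss in that style. It is not X-Token's implementation: the `projection` matrix, the helper names, and the assumption that teacher and student positions are already aligned one-to-one are all hypothetical simplifications for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def project(teacher_probs, projection):
    # Map a teacher-vocabulary distribution into the student vocabulary.
    # projection[i][j] is a hypothetical weight from teacher token i to
    # student token j; each row is assumed to sum to 1.
    out = [0.0] * len(projection[0])
    for p, row in zip(teacher_probs, projection):
        for j, w in enumerate(row):
            out[j] += p * w
    return out

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two distributions over the same vocabulary.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def per_position_kd_loss(teacher_logits, student_logits, projection):
    # Assumes positions are already aligned one-to-one (a simplification).
    losses = []
    for t, s in zip(teacher_logits, student_logits):
        target = project(softmax(t), projection)
        losses.append(kl_divergence(target, softmax(s)))
    return sum(losses) / len(losses)
```

The student is trained toward the full projected teacher distribution at each position rather than a single hard label, which is how the "dark knowledge" in the teacher's non-argmax probabilities is passed on.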
