Smaller Self-Supervised ViTs Localize Better than Larger Ones

8 days ago

A new arXiv study finds that smaller self-supervised Vision Transformers (ViTs) localize foreground objects better than larger ones. The paper attributes the gap to differences in how attention maps evolve during training.
