Nvidia releases LocateAnything for fast vision-language grounding

LaunchAI ModelsVisual AI

17 days ago

10x faster than Qwen3-VL using parallel box decoding. Weights, code, and demo are available on Hugging Face and GitHub.

17 days ago