8 days ago
Zero-Shot 3D Question Answering via Hierarchical View-to-Token Transportation
Researchers propose a zero-shot method for 3D scene understanding: it samples multiple 2D views from a point cloud and feeds them to 2D vision-language models (VLMs). A hierarchical view-to-token transportation mechanism lets these 2D models reason about spatial relationships without any 3D training data.
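The summary does not say how the 2D views are produced. The sketch below shows one plausible way to do the view-sampling step. It assumes cameras orbiting the scene centroid at a fixed elevation, a pinhole projection, and a one-pixel-per-point z-buffer. The function names `sample_views` and `look_at` and the parameters `elevation_deg`, `radius_scale` and `image_size` are hypothetical. This is not the authors' pipeline, and the hierarchical view-to-token transportation is not shown.

```python
import numpy as np


def look_at(eye, target, up=(0.0, 0.0, 1.0)):
    """World-to-camera rotation for a camera at `eye` looking at `target`.

    Rows are the camera's right, down and forward axes (x right, y down,
    z forward). Assumes the viewing direction is not parallel to `up`.
    """
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(up, dtype=float))
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    return np.stack([right, down, forward])


def sample_views(points, colors, num_views=8, elevation_deg=30.0,
                 radius_scale=2.0, image_size=64):
    """Render `num_views` (image, depth) pairs of an (N, 3) point cloud.

    Hypothetical setup: cameras orbit the centroid at a fixed elevation and
    each point is splatted to a single pixel, keeping the nearest per pixel.
    """
    center = points.mean(axis=0)
    radius = radius_scale * np.linalg.norm(points - center, axis=1).max()
    f = 0.5 * image_size              # focal length in pixels (assumption)
    c = (image_size - 1) / 2.0        # principal point at the image centre
    elev = np.deg2rad(elevation_deg)
    views = []
    for k in range(num_views):
        az = 2.0 * np.pi * k / num_views
        eye = center + radius * np.array([np.cos(elev) * np.cos(az),
                                          np.cos(elev) * np.sin(az),
                                          np.sin(elev)])
        cam = (points - eye) @ look_at(eye, center).T   # world -> camera coords
        front = cam[:, 2] > 1e-6                          # drop points behind camera
        cam, col = cam[front], colors[front]
        u = np.round(f * cam[:, 0] / cam[:, 2] + c).astype(int)
        v = np.round(f * cam[:, 1] / cam[:, 2] + c).astype(int)
        inside = (u >= 0) & (u < image_size) & (v >= 0) & (v < image_size)
        depth = np.full((image_size, image_size), np.inf)
        image = np.zeros((image_size, image_size, 3))
        for ui, vi, zi, ci in zip(u[inside], v[inside], cam[inside, 2], col[inside]):
            if zi < depth[vi, ui]:      # z-buffer: keep the nearest point per pixel
                depth[vi, ui] = zi
                image[vi, ui] = ci
        views.append((image, depth))
    return views


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.uniform(-1.0, 1.0, size=(2000, 3))
    cols = rng.uniform(0.0, 1.0, size=(2000, 3))
    for image, depth in sample_views(pts, cols, num_views=4):
        print(image.shape, int(np.isfinite(depth).sum()), "pixels covered")
```

With a camera distance of twice the cloud's bounding radius and a focal length of half the image width, the whole cloud projects inside the frame. In a zero-shot setup like the one described, the rendered images would then be passed, together with the question, to an off-the-shelf 2D VLM.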