6 days ago
VTI-CoT: Visual-Textual Interleaved Chain of Thought for Video Reasoning
Introduces VTI-CoT, a method that interleaves visual and textual reasoning chains for improved video understanding. The approach addresses limitations of existing CoT methods by enabling fine-grained cross-modal reasoning across temporal events.