Analysis · AI Models · Visual AI
6 days ago
Interleaved Latent Visual Reasoning proposed for video event prediction
The paper introduces Interleaved Latent Visual Reasoning (ILVR), which performs future state prediction in latent visual space rather than verbalizing intermediate steps. ILVR uses frame-level temporal abstraction and latent state propagation to capture fine-grained motion and uncertainty.