AnalysisAI Models
7 days ago
Stateful visual encoders improve vision-language models
Paper proposes stateful visual encoders that process video frames with memory, enabling models to detect visual changes without relying solely on language. Outperforms existing VLMs on multi-image and video tasks by encoding temporal context directly in the vision backbone.
·
7 days ago