8 days ago
Inference-Time Scaling for Joint Audio-Video Generation
This paper introduces an inference-time scaling approach for joint audio-video generation that synthesizes realistic, synchronized audio-video pairs from text without any additional training. Instead of retraining the model, the method spends extra compute at test time to improve the alignment and synchronization of the generated outputs.