Analysis · AI Models · Music

MOSS-Audio: Unified audio-language model for speech, sound, music

The MOSS-Audio technical report presents a unified audio-language model for speech, environmental sound, and music understanding. It supports audio captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning.

9 days ago
AIBriefs