MOSS-Audio: Unified audio-language model for speech, sound, music

The MOSS-Audio technical report presents a unified audio-language model for understanding speech, environmental sound, and music. It supports audio captioning, time-aware question answering, timestamped transcription, and audio-grounded reasoning.

- MOSS-Audio Technical Report (8 days ago): Chen Yang, Chufan Yu, Hanfu Chen, Jie Zhu, Jingqi Chen, Ke Chen, Wenxuan Wang, Yang Wang, Yaozhou Jiang, Yi Jiang, Zhengyuan Lin, Ziqi Chen, Zhaoye Fei, Chenghao Liu, Jun Zhan, Kang Yu, Kexin Huang, Mingshu Chen, Qinyuan Cheng, Ruixiao Li, Shimin Li, Songl…
- Audio Interaction Model (7 days ago): Zhifei Xie, Zihang Liu, Ze An, Xiaobin Hu, Yue Liao, Ziyang Ma, Dongchao Yang, Mingbao Lin, Deheng Ye, Shuicheng Yan, Chunyan Miao
- UNISON: A Unified Sound Generation and Editing Framework via Deep LLM Fusion (8 days ago): Zhaoqing Li, Haoning Xu, Jingran Su, Yaofang Liu, Zhefan Rao, Huimeng Wang, Jiajun Deng, Tianzi Wang, Zengrui Jin, Rui Liu, Haoxuan Che, Xunying Liu
- arXiv cs.SD (Sound/Audio ML): Junjie Zheng, Huixin Xue, Shihong Ren, Chaofan Ding, Hao Liu, Zihao Chen