Analysis · AI Models
8 days ago
AnyAudio-Judge benchmark for audio instruction following
The paper introduces a dynamic rubric-based benchmark and evaluator for audio instruction following, addressing the limitations of holistic scoring by general-purpose LLMs. The rubric-based approach yields fine-grained, interpretable evaluation metrics.