Analysis · AI Models
6 days ago
Almieyar-Oryx-BloomBench: bilingual multimodal benchmark for VLM evaluation
The benchmark is designed for cognitively informed evaluation of vision-language models (VLMs) in English and Arabic. Its authors argue that current benchmarks lack the diagnostic rigor needed to assess reasoning abilities.