Analysis · AI Models
8 days ago
WebRISE benchmark evaluates MLLM-generated web artifacts
WebRISE compiles each task's requirements into states and transitions, then uses them to assess whether web pages generated by multimodal large language models (MLLMs) behave correctly. Unlike existing benchmarks, it captures requirement-induced behavior that local evidence alone does not reveal.
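To make the idea of checking a page against states and transitions concrete, here is a minimal sketch. It is not WebRISE's actual implementation or data format: `RequirementSpec`, `check_trace`, and the shopping-cart states and actions are all hypothetical names chosen for illustration, and the sketch assumes the page's behavior has already been recorded as a list of (action, resulting state) pairs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class RequirementSpec:
    """Hypothetical state/transition model derived from a task's requirements."""

    initial: str
    # Maps (current state, action) to the state the requirements expect next.
    transitions: dict[tuple[str, str], str] = field(default_factory=dict)


def check_trace(spec: RequirementSpec, trace: list[tuple[str, str]]) -> bool:
    """Return True if every observed (action, resulting state) step matches the spec."""
    state = spec.initial
    for action, observed in trace:
        expected = spec.transitions.get((state, action))
        # Fail on an action the spec does not allow here, or on a wrong result.
        if expected is None or expected != observed:
            return False
        state = observed
    return True


# Illustrative requirement (an assumption, not taken from the benchmark):
# "Add" puts an item in the cart, and "Checkout" then shows an order summary.
spec = RequirementSpec(
    initial="empty_cart",
    transitions={
        ("empty_cart", "click_add"): "cart_with_item",
        ("cart_with_item", "click_checkout"): "order_summary",
    },
)

good_trace = [("click_add", "cart_with_item"), ("click_checkout", "order_summary")]
bad_trace = [("click_add", "cart_with_item"), ("click_checkout", "cart_with_item")]
```

In this sketch, `bad_trace` fails because the checkout click leaves the page in the cart state: each state on its own is a valid one, and the fault only shows up when the transition is compared with what the requirements expect.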