WebRISE benchmark evaluates MLLM-generated web artifacts

8 days ago

WebRISE compiles each task's requirements into states and transitions, then uses them to assess whether MLLM-generated web pages behave correctly. Unlike existing benchmarks, it captures the behavior that the requirements induce, beyond what local evidence alone shows.
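
As a rough illustration of checking a page against states and transitions, here is a minimal Python sketch. It is not WebRISE's implementation: the `Transition` class, the `check_requirement` function, the dict-based page model, and the cart example are all hypothetical, chosen only to show the general idea of walking a requirement as a sequence of state checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical page model: a plain dict of observable values, not a real browser.
Page = Dict[str, object]
StateCheck = Callable[[Page], bool]


@dataclass
class Transition:
    source: str                     # state the page must be in before the action
    action: Callable[[Page], None]  # simulated user action that mutates the page
    target: str                     # state the requirement expects afterwards


def check_requirement(page: Page,
                      states: Dict[str, StateCheck],
                      transitions: List[Transition],
                      start: str) -> List[str]:
    """Walk the transitions in order and return a list of failure messages."""
    failures: List[str] = []
    if not states[start](page):
        failures.append(f"initial state '{start}' not satisfied")
    for i, t in enumerate(transitions):
        if not states[t.source](page):
            failures.append(f"step {i}: page not in '{t.source}' before action")
        t.action(page)
        if not states[t.target](page):
            failures.append(f"step {i}: expected '{t.target}' after action")
    return failures


# Assumed example requirement: "adding an item moves the cart from empty to one item".
STATES: Dict[str, StateCheck] = {
    "empty": lambda p: p.get("cart_count") == 0,
    "one_item": lambda p: p.get("cart_count") == 1,
}


def good_add(page: Page) -> None:
    page["cart_count"] = int(page["cart_count"]) + 1


def buggy_add(page: Page) -> None:
    # Updates a visible badge but never changes the underlying cart state.
    page["badge_text"] = "1"


ok = check_requirement({"cart_count": 0}, STATES,
                       [Transition("empty", good_add, "one_item")], "empty")
bad = check_requirement({"cart_count": 0}, STATES,
                        [Transition("empty", buggy_add, "one_item")], "empty")
```

In this sketch the buggy page would pass a check that only looks at the badge text, while the state-based check reports that the expected target state was never reached. Whether this matches what the benchmark means by "local evidence" is an assumption.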
