Launch, Developers
2 days ago
Ai2 launches olmo-eval, an open evaluation workbench for LLM development
olmo-eval helps model developers add, run, and analyze benchmarks across LLM checkpoints. It extends OLMES from final-score reproducibility into the daily development loop.
