Ai2 launches olmo-eval, an open evaluation workbench for LLM development

Launch, Developers

2 days ago

olmo-eval helps model developers add, run, and analyze benchmarks across LLM checkpoints. It extends OLMES from final-score reproducibility into the daily development loop.

LLMs are no longer created w/ human data alone. They rely on other models to generate & filter data,...
2 days ago, Allen Institute for AI (Ai2)

Building an LLM means evaluating it over & over as it changes. Tweak a hyperparameter or scale the m...
1 day ago, Allen Institute for AI (Ai2)
