TMax: open-source RL recipe for terminal agents

LaunchAI Models

3 hours ago


@natolambert.bsky.social

A LLN - large language Nathan - (RL, RLHF, society, robotics), athlete, yogi, chef. Writes http://interconnects.ai. Prev: Ai2/Olmo, HuggingFace, Berkeley, and normal places.


Nathan Lambert


Excited to share a new open-source RL recipe paper! TMax is the best openly available terminal-bench-style training data, establishing the open frontier of small terminal agents with RL training. The work, led by Hamish Ivison and Oscar Yin, has many great insights into training.

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents · 3 days ago · Ruishan Fang, Siyuan Lu, Chenyi Zhuang, Tao Lin

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning · 4 days ago · Chao Chen, Chengzu Li, Zhiwei Li, Yinhong Liu, Zhijiang Guo

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier · 4 days ago · Lorenz Wolf, Connor Watts, Roger Creus Castanyer, Geoffrey Bradway, Maxwill Lin, Augustine N. Mavor-Parker, Matthew Daborn-Sargent

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning · 4 days ago · Xiaoyue Xu, Sikui Zhang, Xiaorong Wang, Xu Han, Chaojun Xiao

TMax: A Simple Recipe for Terminal Agents · 3 hours ago · pmttyji