Community PPO fine-tune of Qwen-35B-A3 beats GLM-5.2 and Qwen-350B

Jun 17, 12:35 PM

A community model fine-tuned from Qwen-35B-A3 with PPO (proximal policy optimization) reportedly outperforms GLM-5.2 and Qwen-350B on karpathy/autoresearch parameter-golf. The user also reports that the ideas it generates feel similar to those of Opus 4.8.
