Qwen-3.6-27B and Gemma-4-31B surpass Claude Mythos via scaled test-time compute

3 days ago

A Reddit user scaled test-time compute on Qwen-3.6-27B and Gemma-4-31B, spending roughly 25-40x the baseline compute. The scaffold used an exploration breadth of 5, a correction depth of 10, and iterative hypothesis revision.
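
To make the three parameters concrete, here is a minimal Python sketch of one way such a scaffold could be structured. It is an illustration under assumptions, not the Reddit user's code: the callables `propose`, `critique`, and `revise`, the `Hypothesis` type, and the `good_enough` threshold are all hypothetical names, and the post does not say how candidates were scored or selected. Here, breadth is read as the number of independent starting hypotheses and depth as the maximum number of revision rounds per hypothesis.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Hypothesis:
    text: str     # candidate answer produced by the model
    score: float  # critic's rating of that answer (higher is better)


def solve(
    problem: str,
    propose: Callable[[str], str],          # hypothetical: model drafts an answer
    critique: Callable[[str, str], float],  # hypothetical: model or checker scores it
    revise: Callable[[str, str], str],      # hypothetical: model rewrites the answer
    breadth: int = 5,                       # exploration breadth from the article
    depth: int = 10,                        # correction depth from the article
    good_enough: float = 1.0,               # assumed early-stop threshold
) -> Hypothesis:
    if breadth < 1:
        raise ValueError("breadth must be at least 1")

    best: Optional[Hypothesis] = None
    for _ in range(breadth):
        # Exploration: start each branch from a fresh hypothesis.
        text = propose(problem)
        current = Hypothesis(text, critique(problem, text))

        # Correction: revise the hypothesis up to `depth` times,
        # keeping a revision only if the critic rates it higher.
        for _ in range(depth):
            if current.score >= good_enough:
                break
            revised = revise(problem, current.text)
            candidate = Hypothesis(revised, critique(problem, revised))
            if candidate.score > current.score:
                current = candidate

        if best is None or current.score > best.score:
            best = current

    assert best is not None  # guaranteed because breadth >= 1
    return best
```

In this sketch the worst case is breadth × (1 + depth) = 5 × 11 = 55 generation calls per problem, plus one critic call for each. That count describes only the sketch; it is not an attempt to reproduce the 25-40x figure reported in the post.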
