Tiny Recursive Model is Secretly Doing Policy Iterations

TL;DR: Tiny Recursive Models (TRMs) are effectively running a truncated form of policy iteration on each puzzle. Their “mysterious” reasoning power is just good old Bellman math in disguise.

12/11/25: 📢 PAPER ON THE WAY. This post is only the starting point. There are many directions in which to push this RL view of TRM, and I plan to follow up with more experiments and theory, so stay tuned.

TRM as Implicit Reinforcement Learning: One Coupled Operator to Rule Them All

This post is about a particular way to look at Tiny Recursive Models [1]: through the lens of reinforcement learning. ...

[1] A. Jolicoeur-Martineau, "Less is More: Recursive Reasoning with Tiny Networks," 2025.
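To make "truncated policy iteration" concrete, here is a minimal sketch of the generic algorithm (often called modified policy iteration) on a hypothetical five-state chain MDP. It is not the TRM mapping developed in this post. The MDP and the names `step`, `q_value`, `truncated_policy_iteration`, `m` and `GAMMA` are illustrative assumptions. Each outer iteration runs only `m` Bellman expectation sweeps instead of solving for the current policy's value exactly, then improves the policy greedily with respect to that approximate value.

```python
from typing import List, Tuple

# Hypothetical toy MDP (an assumption for illustration, not from the post):
# a 1-D chain of states; action 0 moves left, action 1 moves right.
# Stepping into the last state yields reward 1.0; that state is absorbing.
N_STATES = 5
ACTIONS = (0, 1)
GAMMA = 0.9


def step(s: int, a: int) -> Tuple[int, float]:
    """Deterministic transition: returns (next_state, reward)."""
    goal = N_STATES - 1
    if s == goal:                      # absorbing goal, no further reward
        return s, 0.0
    s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
    reward = 1.0 if s_next == goal else 0.0
    return s_next, reward


def q_value(V: List[float], s: int, a: int) -> float:
    """One-step Bellman backup: r(s, a) + GAMMA * V(s')."""
    s_next, r = step(s, a)
    return r + GAMMA * V[s_next]


def truncated_policy_iteration(m: int = 2, max_iters: int = 100):
    """Alternate m sweeps of policy evaluation with greedy improvement."""
    V = [0.0] * N_STATES
    policy = [0] * N_STATES            # start with "always move left"
    for it in range(max_iters):
        # Truncated evaluation: only m synchronous Bellman expectation
        # sweeps under the current policy, not an exact solve for V^pi.
        for _ in range(m):
            V = [q_value(V, s, policy[s]) for s in range(N_STATES)]
        # Greedy improvement w.r.t. the current approximate V
        # (ties go to the first action in ACTIONS).
        new_policy = [max(ACTIONS, key=lambda a: q_value(V, s, a))
                      for s in range(N_STATES)]
        # Simplified stopping rule: stop once the greedy policy is stable.
        # Adequate for this toy chain, not a general optimality guarantee.
        if new_policy == policy:
            return policy, V, it + 1
        policy = new_policy
    return policy, V, max_iters


if __name__ == "__main__":
    policy, V, iters = truncated_policy_iteration(m=2)
    print(policy, [round(v, 3) for v in V], iters)
```

Setting `m` very large recovers classical policy iteration, while very small `m` behaves much like value iteration. The claim in this post is that TRM's recursion sits in this truncated regime. How its latent updates correspond to `V` and `policy` is not modeled in this sketch.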

December 11, 2025 · 10 min · 2035 words · Benhao Huang