Sung Kim
sungkim.bsky.social
did:plc:cq4gg3odxz2pzmkx2fuac3u3
What causes collapse in RL training?
The paper studies this through ablations on importance sampling and the decoupled surrogate objective. It also explains why anchoring the objective to π_old, rather than π_current, helps stabilize training.
2026-05-05T09:21:37.176Z