Gokul Swamy
gokul.dev
did:plc:7e3hw64shux7ikibrebi6xx5
We therefore advocate for caution when making or evaluating claims about LLM reasoning (and beyond) that rely on GRPO and PPO, and ideally for using algorithms like RLOO or REBEL instead. Check out our blog post for links to our code and W&B logs if you'd like to reproduce our experiments.
2025-07-15T17:46:44.669Z