Abhishek Sharma
abhishekshar.bsky.social
did:plc:2tplrroxeqontoe6yduvkxac
Our paper: Decision-Point Guided Safe Policy Improvement
We show that a simple approach to learning safe RL policies can outperform most offline RL methods. (+ theoretical guarantees!)
How? Just allow the state-action pairs that have been seen enough times! 🤯
https://arxiv.org/abs/2410.09361
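To make "allow only the state-actions that have been seen enough times" concrete, here is a rough tabular sketch in Python. It is not the paper's algorithm: the function name `count_filtered_policy`, the `min_count` threshold, and the use of mean immediate reward as a stand-in for a value estimate are all assumptions made for illustration. States with no sufficiently observed action are left out of the result, so a caller would fall back to the behavior policy there.

```python
from collections import defaultdict


def count_filtered_policy(transitions, min_count):
    """Pick, per state, the best action among those seen at least min_count times.

    transitions: iterable of (state, action, reward) tuples from an offline dataset.
    Hypothetical simplification: actions are ranked by mean immediate reward,
    not by a learned value function.
    """
    counts = defaultdict(int)
    reward_sums = defaultdict(float)
    for state, action, reward in transitions:
        counts[(state, action)] += 1
        reward_sums[(state, action)] += reward

    best = {}  # state -> (action, mean_reward)
    for (state, action), n in counts.items():
        if n < min_count:
            continue  # too few observations: this state-action is not allowed
        mean_reward = reward_sums[(state, action)] / n
        if state not in best or mean_reward > best[state][1]:
            best[state] = (action, mean_reward)

    # States missing from the result have no allowed action;
    # the caller should defer to the behavior policy for them.
    return {state: action for state, (action, _) in best.items()}
```

For example, with `min_count=3`, an action that looks very rewarding but was observed only once is ignored in favor of a well-observed one.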
2025-01-23T18:23:12.850Z