Tim Kellogg
timkellogg.me
did:plc:ckaz32jwl6t2cno6fmuw2nhn
has anyone figured out this paper?
it looks like they’re doing reward modeling at inference time, but i thought RM was for RL. it doesn’t look like they’re using RM to steer the sampler, and i don’t think they’re doing RL at runtime. what’s going on?
[contains quote post or other embedded content]
2025-04-05T17:35:24.199Z