@timkellogg.me on Bluesky

Tim Kellogg

timkellogg.me

did:plc:ckaz32jwl6t2cno6fmuw2nhn

has anyone figured out this paper? it looks like they’re doing reward modeling at inference time, but i thought RM was for RL. it doesn’t look like they’re using RM to steer the sampler, and i don’t think they’re doing RL at runtime. what’s going on? [contains quote post or other embedded content]

2025-04-05T17:35:24.199Z