Michael Saxon @ NeurIPS
saxon.me
Interesting result, even after you correct for anthropomorphizing language
The key takeaway is that giving an LM information about the training condition (explicitly or implicitly) makes it "align" (update its probability distribution) only in that condition
https://www.anthropic.com/research/alignment-faking
2024-12-18T22:55:44.752Z