Dylan Freedman
dylanfreedman.nytimes.com
did:plc:zeqq4z7aybrqg6go6vx6lzwt
Anthropic released an interesting paper on "alignment faking." A large language model, knowing it might be trained based on its responses, will sometimes pretend to go along with harmful user requests to avoid "being modified to be more compliant in the future."
https://www.anthropic.com/research/alignment-faking
2024-12-19T16:07:30.258Z