Toby Ord
tobyord.bsky.social
did:plc:giqjsb4j3mkbic6uwwk3hgmj
Brilliant experiment by Anthropic's alignment team (and Redwood Research), where their LLM (Claude 3 Opus) pretended to be aligned with the goals it knew it was being trained on in order to preserve underlying preferences which went against those goals.
https://www.anthropic.com/research/alignment-faking
2024-12-19T09:46:31.952Z