Raphaël Millière
raphaelmilliere.com
Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 🧵 1/13
2025-06-10T13:39:50.145Z