Raphaël Millière
raphaelmilliere.com
Despite extensive safety training, LLMs remain vulnerable to “jailbreaking” through adversarial prompts. Why does this vulnerability persist? In a new paper published in Philosophical Studies, I argue this is because current alignment methods are fundamentally shallow. 🧵 1/13
2025-06-10T13:39:50.145Z