Simon Willison
simonwillison.net
did:plc:kft6lu4trxowqmter2b6vg6z
New benchmark just dropped: SnitchBench by Theo Browne tests if LLMs will snitch on you to the authorities if you feed them incriminating documents and a tool that lets them send email, as seen in the Claude 4 System Card.
Turns out they pretty much all will! https://simonwillison.net/2025/May/31/snitchbench-with-llm/
2025-05-31T22:09:52.852Z