antirez
antirez.bsky.social
did:plc:ipt7y6qaf6fn7oeeduboqe44
LLM evaluation: only trust people with hard problems. Today models perform decently on most trivial tasks: this is a win for the technology, but it also means that evaluation is more and more a realm of specialists.
2025-08-07T20:47:24.005Z