Andrew White 🐦⬛
andrew.diffuse.one
did:plc:5lmwpxyligfn4bmpeb5a4ejf
We make evals at FutureHouse. It’s hard and it sucks. It’s also now the bottleneck, as we scratch the boundary of human ability. HLE was a huge effort and produced many good questions, and we hope this analysis stimulates review of the other HLE categories and further improvements. 7/7
2025-07-23T16:29:03.959Z