Reuben Binns
rdbinns.bsky.social
did:plc:ix6tykalkoh2ik7pblexo67g
I've seen a lot of papers evaluate LLM performance with another LLM, which has always felt dodgy. Here's an example of why that might go wrong:
[contains quote post or other embedded content]
2025-08-28T12:44:12.907Z