Joachim Baumann
joachimbaumann.bsky.social
did:plc:gckw62fuxtx6nkdcd4tdk7ay
We tested 18 LLMs on 37 social science annotation tasks (13M labels, 1.4M regressions). By trying different models and prompts, you can make 94% of null results appear statistically significant, or flip findings completely 68% of the time.
Importantly, this also concerns well-intentioned researchers!
2025-09-12T10:33:46.322Z