MaiNLP lab, LMU Munich
mainlp.bsky.social
did:plc:qnrvtnfdzw4tr5ppkdeipuqs
📝 LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
🔎 We present a large-scale study of whether LLM judgments can be reliably used as proxies for human judgments
👥 Anna Bavaresco et al.
🔗 arxiv.org/abs/2406.18403
📁 Main - Short
2025-07-23T12:30:07.498Z