@mainlp.bsky.social on Bluesky

MaiNLP lab, LMU Munich

mainlp.bsky.social

did:plc:qnrvtnfdzw4tr5ppkdeipuqs

📝 LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
🔎 We present a large-scale study of whether LLM judgments can be reliably used as proxies for human judgments
👥 Anna Bavaresco et al.
🔗 arxiv.org/abs/2406.18403
📁 Main - Short

2025-07-23T12:30:07.498Z