@tedunderwood.com on Bluesky



Ted Underwood

tedunderwood.com

did:plc:565ebob5f6hw33hjdkxty6qj

If you know ground truth, you can evaluate LLM annotation by measuring accuracy. But what if the annotation task is subjective and you have many judgments by different observers? This paper offers a method for assessing whether divergence of a single LLM is outside expected human range. + https://arxiv.org/abs/2510.06658

2025-10-10T11:19:22.578Z