Ted Underwood
tedunderwood.com
did:plc:565ebob5f6hw33hjdkxty6qj
If you know ground truth, you can evaluate LLM annotation by measuring accuracy. But what if the annotation task is subjective and you have many judgments by different observers? This paper offers a method for assessing whether divergence of a single LLM is outside expected human range. +
https://arxiv.org/abs/2510.06658
2025-10-10T11:19:22.578Z