Eugene Yan
eugeneyan.com
did:plc:ilhs5ksfpze2l5rqgig4ycl6
Wrote an intro to evals for long-context Q&A systems:
• How it differs from basic Q&A
• What dimensions & metrics to eval on
• How to build LLM-evaluators
• How to build eval datasets
• Benchmarks: narratives, technical docs, multi-docs
https://eugeneyan.com/writing/qa-evals/
2025-06-25T01:48:43.108Z