Ameya P.
bayesiankitten.bsky.social
did:plc:kjgrwkc2upr2vhyvlyyikd2v
How do we benchmark the vast capabilities of foundation models? Introducing ONEBench – a unifying benchmark to test them all, led by @adhirajghosh.bsky.social and @dziadzio.bsky.social! ⬇️
Sample-level benchmarks could be the new generation: reusable, recombinable, and able to evaluate lots of capabilities!
[contains quote post or other embedded content]
2024-12-10T18:39:01.369Z