Benno Krojer
bennokrojer.bsky.social
did:plc:e24hhq5kyrybscj2lxys63uy
We show that seemingly “high-performing” VideoLLMs take various shortcuts on video tasks meant to test physical understanding, such as falling back on single-frame biases.
In total we analyze 4 such shortcuts and find that model scores often don't change much:
2025-06-13T14:47:45.231Z