Pekka Lund
pekka.bsky.social
did:plc:yzywgiiou7cx63uddiru6m2o
Yep. It makes it even more impressive that they can do that with 1D tokenized text.
I think visual reasoning performance should be pretty good once the visual parts catch up with the reasoning parts and complex images are properly tokenized with 2D positional data.
2024-12-26T01:25:18.251Z