Eris
isolyth.dev
did:plc:allu5vs3gnm2wm7jzf4rad3r
12B Gemma 4!!! It's neat on its own, with image and audio input, but architecturally it's super cool: instead of using separate visual or audio encoders, they train the model directly on vision and sound, with images going through a "lightweight embedding module" and audio "projected into the same space as text tokens"
https://blog.google/innovation-and-ai/technology/developers-tools/introducing-gemma-4-12B/
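To make the "same space as text tokens" idea concrete, here's a rough toy sketch. This is not Gemma's actual code: the single linear projections, the sizes (`D_MODEL`, `vocab_size`), and every function name are assumptions made up for illustration. It only shows the general shape of the idea, where image patches and audio frames get mapped to vectors of the same width as text token embeddings and then sit in one sequence.

```python
import random

D_MODEL = 8  # hypothetical shared embedding width (assumption, not Gemma's real size)


def make_linear(in_dim, out_dim, rng):
    """Random in_dim x out_dim matrix, a stand-in for learned weights."""
    return [[rng.gauss(0.0, 0.02) for _ in range(out_dim)] for _ in range(in_dim)]


def project(rows, weights):
    """Multiply each input row vector by the weight matrix."""
    out_dim = len(weights[0])
    return [
        [sum(x * weights[i][j] for i, x in enumerate(row)) for j in range(out_dim)]
        for row in rows
    ]


def build_sequence(text_ids, image_patches, audio_frames, rng):
    """Return one list of D_MODEL-wide vectors covering all three modalities."""
    vocab_size = 16  # hypothetical tiny vocabulary
    embed_table = make_linear(vocab_size, D_MODEL, rng)  # text token embeddings
    text_vecs = [embed_table[t] for t in text_ids]

    # Assumed stand-ins for the "lightweight embedding module" (images)
    # and the audio projection: one linear map each, no separate encoder stack.
    image_w = make_linear(len(image_patches[0]), D_MODEL, rng)
    audio_w = make_linear(len(audio_frames[0]), D_MODEL, rng)

    return project(image_patches, image_w) + project(audio_frames, audio_w) + text_vecs


rng = random.Random(0)
seq = build_sequence(
    text_ids=[1, 2, 3],
    image_patches=[[0.1] * 12, [0.2] * 12],   # 2 flattened patches
    audio_frames=[[0.5] * 4 for _ in range(5)],  # 5 audio feature frames
    rng=rng,
)
print(len(seq), len(seq[0]))
```

The point is just that the transformer downstream would see one sequence of same-width vectors and wouldn't need to know which modality each position came from.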
2026-06-03T16:21:03.266Z