This is a heavily interactive web application, and JavaScript is required. Simple HTML interfaces are possible, but that is not what this is.
Post
Simon Schug
smonsays.bsky.social
did:plc:m3uhnr7vfgjniyyp2evzi4ch
Neural networks used to struggle with compositionality but transformers got really good at it. How come?
And why does attention work so much better with multiple heads?
There might be a common answer to both of these questions.
2024-10-28T15:22:13.086Z