This is a heavily interactive web application, and JavaScript is required. Simple HTML interfaces are possible, but that is not what this is.
Post
Tim Kellogg
timkellogg.me
did:plc:ckaz32jwl6t2cno6fmuw2nhn
not enough is being said about DeepSeek’s multi token prediction (MTP)
They were able to get sonnet-level performance with less data than llama 3.3 70B
Does that mean scaling isn’t over? (if we can just be more efficient) Also, does that mean we can train a LLM fully on properly licensed content?
2024-12-26T23:54:13.153Z