<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><link>https://bsky.app/profile/tomgoldstein.bsky.social</link><title>@tomgoldstein.bsky.social - </title><item><link>https://bsky.app/profile/tomgoldstein.bsky.social/post/3lhtmgv5hrs2r</link><description>Nowadays ML projects feel like they need to be compressed into a few months. Its refreshing to be able to work on something for a few years!&#xA;&#xA;But also a slog.&#xA;&#xA;[contains quote post or other embedded content]</description><pubDate>10 Feb 2025 16:57 +0000</pubDate><guid isPermaLink="false">at://did:plc:5boaish3gk3ubwn6mc4g57bo/app.bsky.feed.post/3lhtmgv5hrs2r</guid></item><item><link>https://bsky.app/profile/tomgoldstein.bsky.social/post/3lhtj55nfq22d</link><description>New open source reasoning model!&#xA;&#xA;Huginn-3.5B reasons implicitly in latent space 🧠&#xA;&#xA;Unlike O1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn&#39;t produce extra CoT tokens at test time.&#xA;&#xA;We trained on 800B tokens 👇</description><pubDate>10 Feb 2025 15:58 +0000</pubDate><guid isPermaLink="false">at://did:plc:5boaish3gk3ubwn6mc4g57bo/app.bsky.feed.post/3lhtj55nfq22d</guid></item><item><link>https://bsky.app/profile/tomgoldstein.bsky.social/post/3lgvhoxaouc2j</link><description>Let’s sanity check DeepSeek’s claim to train on 2048 GPUs for under 2 months, for a cost of $5.6M.  It sort of checks out and sort of doesn&#39;t. &#xA;&#xA;The v3 model is an MoE with 37B (out of 671B) active parameters.  Let&#39;s compare to the cost of a 34B dense model. 🧵</description><pubDate>29 Jan 2025 17:12 +0000</pubDate><guid isPermaLink="false">at://did:plc:5boaish3gk3ubwn6mc4g57bo/app.bsky.feed.post/3lgvhoxaouc2j</guid></item></channel></rss>