<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><link>https://bsky.app/profile/tomgoldstein.bsky.social</link><title>@tomgoldstein.bsky.social - </title><item><link>https://bsky.app/profile/tomgoldstein.bsky.social/post/3lhtmgv5hrs2r</link><description>Nowadays ML projects feel like they need to be compressed into a few months. Its refreshing to be able to work on something for a few years!&#xA;&#xA;But also a slog.&#xA;&#xA;[contains quote post or other embedded content]</description><pubDate>10 Feb 2025 16:57 +0000</pubDate><guid isPermaLink="false">at://did:plc:5boaish3gk3ubwn6mc4g57bo/app.bsky.feed.post/3lhtmgv5hrs2r</guid></item><item><link>https://bsky.app/profile/tomgoldstein.bsky.social/post/3lhtj55nfq22d</link><description>New open source reasoning model!&#xA;&#xA;Huginn-3.5B reasons implicitly in latent space 🧠&#xA;&#xA;Unlike O1 and R1, latent reasoning doesn’t need special chain-of-thought training data, and doesn&#39;t produce extra CoT tokens at test time.&#xA;&#xA;We trained on 800B tokens 👇</description><pubDate>10 Feb 2025 15:58 +0000</pubDate><guid isPermaLink="false">at://did:plc:5boaish3gk3ubwn6mc4g57bo/app.bsky.feed.post/3lhtj55nfq22d</guid></item><item><link>https://bsky.app/profile/tomgoldstein.bsky.social/post/3lgvhoxaouc2j</link><description>Let’s sanity check DeepSeek’s claim to train on 2048 GPUs for under 2 months, for a cost of $5.6M.  It sort of checks out and sort of doesn&#39;t. &#xA;&#xA;The v3 model is an MoE with 37B (out of 671B) active parameters.  Let&#39;s compare to the cost of a 34B dense model. 🧵</description><pubDate>29 Jan 2025 17:12 +0000</pubDate><guid isPermaLink="false">at://did:plc:5boaish3gk3ubwn6mc4g57bo/app.bsky.feed.post/3lgvhoxaouc2j</guid></item></channel></rss>