<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><description>AGI Alignment &amp; Control Pretraining @ Geodesic Research | https://kyobrien.io</description><link>https://bsky.app/profile/kyletokens.bsky.social</link><title>@kyletokens.bsky.social - Kyle O’Brien</title><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3mduatzdkxs2p</link><description>Very excited to see pretraining safety efforts! We’re only now beginning to understand how promising pretraining safety and alignment interventions are. Much in the way that curating the base model is important for capabilities like reasoning, so too might it be important for safety.&#xA;&#xA;[contains quote post or other embedded content]</description><pubDate>02 Feb 2026 06:47 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3mduatzdkxs2p</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3mcj7hnhpjs2g</link><description>I&#39;ve joined Geodesic Research to build the open-science field of AI safety pretraining research. Our first paper is wild. &#xA;&#xA;TL;DR — LLMs pretrained on data about misaligned AIs themselves become less aligned. Luckily, pretraining LLMs with data about good AIs helps them become more aligned.&#xA;https://alignmentpretraining.ai/</description><pubDate>16 Jan 2026 03:58 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3mcj7hnhpjs2g</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3m7xplmh5sc26</link><description>You know you&#39;re AGI-pilled when your Spotify Wrapped looks like this.</description><pubDate>14 Dec 2025 18:08 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3m7xplmh5sc26</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3m4it2ff3d22t</link><description>Applications to apply for the ERA:AI Fellowship close November 3rd! 
Participating in this Summer&#39;s fellowship was my gateway into pursuing AGI safety research full-time. I will be a research manager for the upcoming Winter fellowships. Feel free to DM me with questions. :) erafellowship.org&#xA;https://erafellowship.org/</description><pubDate>31 Oct 2025 15:45 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3m4it2ff3d22t</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3lwad3w4lfs27</link><description>This article covers our work for a general audience. :)&#xA;&#xA;[contains quote post or other embedded content]</description><pubDate>12 Aug 2025 22:07 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3lwad3w4lfs27</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3lw7ebgvrac2z</link><description>Big and True :)&#xA;&#xA;[contains quote post or other embedded content]</description><pubDate>12 Aug 2025 12:55 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3lw7ebgvrac2z</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3lw2g5fk3p223</link><description>I like that OpenAI published this. They were able to fine-tune away GPT-oss&#39;s refusal, decreasing refusal rates to ~0%. These results aren&#39;t surprising. Acknowledging that existing safeguards don&#39;t generalize to open models is the first step in developing solutions.&#xA;https://arxiv.org/abs/2508.03153v1</description><pubDate>10 Aug 2025 13:45 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3lw2g5fk3p223</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3lvgpyyboik2c</link><description>I&#39;ve learned a lot over the past two years of getting into research, mostly from mistakes. I’ve made many mistakes. Such is science. 
Good research is often at the adjacent possible. I&#39;ve written up much of what I&#39;ve learned now that I&#39;m beginning to mentor others. https://open.substack.com/pub/kyletokens/p/dont-think-just-think?r=3gtmk8&amp;utm_campaign=post&amp;utm_medium=web&amp;showWelcomeOnShare=false</description><pubDate>02 Aug 2025 17:49 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3lvgpyyboik2c</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3ls2n5pzv5c2i</link><description>I led an effort at Microsoft last Fall that studied whether SAE steering was an effective way to improve jailbreak robustness. Our paper on SAE steering has been accepted to the ICML Actionable Interpretability Workshop! &#xA;&#xA;Venue: actionable-interpretability.github.io&#xA;Paper: arxiv.org/abs/2411.11296&#xA;https://arxiv.org/abs/2411.11296</description><pubDate>20 Jun 2025 18:10 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3ls2n5pzv5c2i</guid></item><item><link>https://bsky.app/profile/kyletokens.bsky.social/post/3lqy6i4sius2e</link><description>I&#39;ll be in England this summer as an AI Safety Research Fellow with ERA! erafellowship.org/fellowship&#xA;&#xA;I will be studying data filtering and tamper-resistant unlearning for open-weight AI safety so that the community can continue to benefit from open models as capabilities improve.&#xA;https://erafellowship.org/fellowship</description><pubDate>07 Jun 2025 01:17 +0000</pubDate><guid isPermaLink="false">at://did:plc:2l5f6yopoueecajei62paniv/app.bsky.feed.post/3lqy6i4sius2e</guid></item></channel></rss>