<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><description>Let&#39;s build AIs we can trust!</description><link>https://bsky.app/profile/fbarez.bsky.social</link><title>@fbarez.bsky.social - Fazl Barez</title><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3m7g5sa7ojc2j</link><description>Incredibly excited to announce a $1 million prize pool to solve the world’s most important scientific problem in interpretability. &#xA;&#xA;The goal is to turn hard interpretability questions into tools for human empowerment, oversight and governance.</description><pubDate>07 Dec 2025 18:35 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3m7g5sa7ojc2j</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3m2k2i7fvjc2k</link><description>🚨 New AI Safety Course @aims_oxford!&#xA;&#xA;I’m thrilled to launch a new course called AI Safety &amp; Alignment (AISAA) on the foundations &amp; frontier research of making advanced AI systems safe and aligned at @UniofOxford.&#xA;&#xA;What to expect 👇&#xA;robots.ox.ac.uk/~fazl/aisaa/</description><pubDate>06 Oct 2025 16:40 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3m2k2i7fvjc2k</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lz6jcxub3k24</link><description>🚀 Excited to have 2 papers accepted at #NeurIPS2025! 🎉 Congrats to my amazing co-authors!&#xA;&#xA;More details (and more bragging) soon! And maybe even more news on Sep 25 👀&#xA;&#xA;See you all in… Mexico? San Diego? Copenhagen? Who knows! 🌍✈️</description><pubDate>19 Sep 2025 09:08 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lz6jcxub3k24</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lsvzwvwrhc2o</link><description>Excited to share our paper: &#34;Chain-of-Thought Is Not Explainability&#34;! 
We unpack a critical misconception in AI: a model that explains its steps (CoT) isn&#39;t necessarily revealing its true reasoning. Spoiler: the transparency can be an illusion. (1/9) 🧵</description><pubDate>01 Jul 2025 15:41 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lsvzwvwrhc2o</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lsl6pqmi6s2y</link><description>Technology = power. AI is reshaping power — fast.&#xA;&#xA;Today’s AI doesn’t just assist decisions; it makes them. Governments use it for surveillance, prediction, and control — often with no oversight.&#xA;&#xA;Technical safeguards aren’t enough on their own — but they’re essential for AI to serve society.</description><pubDate>27 Jun 2025 08:07 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lsl6pqmi6s2y</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lpmliipets2m</link><description>Come work with me at Oxford this summer! Paid research opportunity to work on:&#xA;&#xA;- White-box LLMs &amp; model security&#xA;- Safe RL &amp; reward hacking&#xA;- Interpretability &amp; governance tools&#xA;&#xA;Remote or Oxford.&#xA;&#xA;Apply by 30 May 23:59 UTC. DM with questions.</description><pubDate>20 May 2025 17:13 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lpmliipets2m</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lp7ezosj6s2v</link><description>Come work with me at Oxford! 
&#xA;&#xA;We’re hiring a Postdoc in Causal Systems Modelling to:&#xA;&#xA;- Build causal &amp; white-box models that make frontier AI safer and more transparent&#xA;- Turn technical insights into safety cases, policy briefs, and governance tools&#xA;&#xA;DM if you have any questions.</description><pubDate>15 May 2025 11:12 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lp7ezosj6s2v</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lmcgedgv422l</link><description>First-time Area Chair seeking advice! What helped you most when evaluating papers, beyond just averaging scores?&#xA;&#xA;After suffering through unhelpful reviews as an author, I want to do right by the papers in my track.</description><pubDate>08 Apr 2025 11:59 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lmcgedgv422l</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3llr5rzk7wc2v</link><description>Technical AI Governance (TAIG) at #ICML2025 this July in Vancouver!&#xA;&#xA;Credit to Ben and Lisa for all the work!&#xA;&#xA;We have a new centre at Oxford working on technical AI governance with Robert Trager, @maosbot.bsky.social, and many other great minds. 
We are hiring - please reach out!</description><pubDate>01 Apr 2025 15:10 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3llr5rzk7wc2v</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3ljky7bly2k26</link><description>🔍 Excited to share our paper: &#34;Same Question, Different Words: A Latent Adversarial Framework for Prompt Robustness&#34;!</description><pubDate>04 Mar 2025 17:24 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3ljky7bly2k26</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3ljdjmxpmvk2m</link><description>New paper alert! 🚨&#xA;&#xA;Important question: Do SAEs generalise?&#xA;We explore answerability detection in LLMs by comparing SAE features vs. linear residual stream probes.&#xA;&#xA;Answer:&#xA;Probes outperform SAE features in-domain; out-of-domain generalization varies sharply between features and datasets. 🧵</description><pubDate>01 Mar 2025 18:14 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3ljdjmxpmvk2m</guid></item><item><link>https://bsky.app/profile/fbarez.bsky.social/post/3lffnztwq7c2g</link><description>&#xA;🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨&#xA;&#xA;Can AI truly &#34;forget&#34;? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here&#39;s why it matters: 👇&#xA;&#xA;Paper: arxiv.org/pdf/2501.04952&#xA;1/8</description><pubDate>10 Jan 2025 16:58 +0000</pubDate><guid isPermaLink="false">at://did:plc:mbdddmxva4ofg5w6wait2mjs/app.bsky.feed.post/3lffnztwq7c2g</guid></item></channel></rss>