<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><description>I do AI Security.&#xA;I work in AI Security.&#xA;I advocate AI Security.&#xA;👉 www.arewesafeyet.com</description><link>https://bsky.app/profile/aisecurity.bsky.social</link><title>@aisecurity.bsky.social - Artificial Intelligence Security</title><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3m4w4bcitwk2l</link><description>Researchers showed that Anthropic&#39;s new &#34;Agent Skills&#34; feature can be hijacked with almost laughable ease. Security-by-design still hasn&#39;t made it onto the AI industry&#39;s to-do list.&#xA;&#xA;https://www.arewesafeyet.com/when-ai-breaks-things-cybersecurity-gets-the-bill/</description><pubDate>05 Nov 2025 22:34 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3m4w4bcitwk2l</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3lnkk4pt3bc2u</link><description>The AI systems we increasingly depend on are fundamentally vulnerable. NIST’s latest report makes that reality plain, exposing the limits of today’s AI security measures and highlighting a growing disconnect between how AI is deployed and how it’s defended.&#xA;&#xA;https://www.arewesafeyet.com/adversarial-machine-learning-is-cybersecuritys-new-frontier/</description><pubDate>24 Apr 2025 10:52 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3lnkk4pt3bc2u</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3llvmx3wr5k2i</link><description>A new paper reveals that fine-tuning large language models on a seemingly narrow task – like writing insecure code – can trigger broad and deeply harmful behaviors. 
These include promoting violence, expressing authoritarian ideology, and encouraging self-harm.&#xA;&#xA;https://www.arewesafeyet.com/emergent-misalignment-narrow-finetuning-can-produce-broadly-misaligned-llms/</description><pubDate>03 Apr 2025 09:52 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3llvmx3wr5k2i</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3lj3qwa7q7k2k</link><description>The UK realized AI might do more harm as a weapon than as an insensitive chatbot. They’ve rebranded their AI ‘Safety’ Institute to ‘Security’ Institute to focus on actual threats like cyberattacks. And yet, geopolitics pushed this change more than common sense.&#xA;https://www.arewesafeyet.com/safety-is-dead-long-live-security/</description><pubDate>26 Feb 2025 16:03 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3lj3qwa7q7k2k</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3lirgpg2nm22p</link><description>A new research paper introduces Indiana Jones, a highly effective method for jailbreaking large language models. It uses dialogues between multiple specialized AI systems and historically framed prompts to achieve high success rates.&#xA;&#xA;https://www.arewesafeyet.com/indiana-jones-there-are-always-some-useful-ancient-relics/</description><pubDate>22 Feb 2025 13:34 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3lirgpg2nm22p</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3lcubgqdjlk2e</link><description>This weekend I went through OpenAI&#39;s latest model system card. 
Definitely not your typical Sunday reading.&#xA;&#xA;From self-preservation tactics to outwitting oversight, the #o1 model raises chilling questions about the fine line between tool and manipulator.&#xA;&#xA;https://www.arewesafeyet.com/deception-as-a-service-the-ai-that-refuses-to-hand-over-its-keys/</description><pubDate>09 Dec 2024 08:07 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3lcubgqdjlk2e</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3l765nost432p</link><description>According to Penn researchers, AI robots are fantastic at following orders.&#xA;&#xA;The problem? They don’t care if those orders come from you or a hacker.&#xA;&#xA;Safety features? Working on it.&#xA;&#xA;https://www.arewesafeyet.com/ai-robots-are-more-hackable-than-your-wi-fi/</description><pubDate>23 Oct 2024 08:45 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3l765nost432p</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3koed4xexmz2b</link><description>Leveraging my decades-long background in #cybersecurity, I&#39;ve written this article on the critical role of red teams in ensuring #AI safety and reliability.&#xA;&#xA;By adapting red teaming methodologies to AI, we can proactively identify risks and build trust in these transformative technologies.&#xA;https://www.linkedin.com/pulse/red-teaming-proactive-approach-ai-safety-luca-sambucci-tbiaf/</description><pubDate>23 Mar 2024 11:30 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3koed4xexmz2b</guid></item><item><link>https://bsky.app/profile/aisecurity.bsky.social/post/3ko33ms36wg2f</link><description>Fascinating research on the security risks posed by the &#39;dark psychological states&#39; of AI agents in multi-agent systems - a must-read for anyone working with or interested in the future of AI and its implications for
cybersecurity.&#xA;https://www.linkedin.com/pulse/psysafe-new-approach-multi-agent-system-security-luca-sambucci-uvosf/</description><pubDate>19 Mar 2024 19:22 +0000</pubDate><guid isPermaLink="false">at://did:plc:6q5h4frbjr6da7goxeahcapz/app.bsky.feed.post/3ko33ms36wg2f</guid></item></channel></rss>