@byron.bsky.social - Byron Wallace

Assoc. Prof in CS @ Northeastern, NLP/ML & health & etc. He/him.
https://bsky.app/profile/byron.bsky.social

https://bsky.app/profile/byron.bsky.social/post/3mlnstt6vkc2t
Surgically editing prompts to vary a factor of interest (like gender) is an intuitive way of analyzing model behavior and sensitivity. But @zihaogavinyang.bsky.social shows that we should really compare the results from such perturbations to those observed when, e.g., we simply paraphrase inputs 👇 [contains quote post or other embedded content]
12 May 2026 12:42 +0000

https://bsky.app/profile/byron.bsky.social/post/3m4vfraa4t22r
Check out @hibaahsan.bsky.social's paper on spotting (problematic) racial biases in LLMs for healthcare applications 👇 [contains quote post or other embedded content]
05 Nov 2025 15:52 +0000

https://bsky.app/profile/byron.bsky.social/post/3m3xc3hcnuc23
Chantal (and Vinith) find that you can jailbreak LLMs with syntax! Some examples: https://cshaib.github.io/syntax_domain_spurious_correlations/jailbreaks.html [contains quote post or other embedded content]
24 Oct 2025 16:26 +0000

https://bsky.app/profile/byron.bsky.social/post/3m3rtnkxz5s27
Now to appear at #EMNLP2025 (Findings). We've added more models and experiments: arxiv.org/abs/2502.13319 [contains quote post or other embedded content]
22 Oct 2025 12:24 +0000

https://bsky.app/profile/byron.bsky.social/post/3m23orxjabs25
Can we distill *circuits* from teacher models into smaller students? 👇 [contains quote post or other embedded content]
30 Sep 2025 23:34 +0000

https://bsky.app/profile/byron.bsky.social/post/3lzlk5lzkic2g
Can we quantify what makes some text read like AI "slop"? We tried 👇 [contains quote post or other embedded content]
24 Sep 2025 13:28 +0000

https://bsky.app/profile/byron.bsky.social/post/3lak7tf5lqk2c
I'll be @ #EMNLP2024 if anyone wants to find snobby coffee / despair about election / or I guess talk research. Some work to be presented👇
09 Nov 2024 21:21 +0000