@ai2.bsky.social

Breakthrough AI to solve the world's biggest problems.
Join us: http://allenai.org/careers
Get our newsletter: https://share.hsforms.com/1uJkWs5aDRHWhiky3aHooIg3ioxm

Ai2 on Bluesky: https://bsky.app/profile/ai2.bsky.social

https://bsky.app/profile/ai2.bsky.social/post/3mj5lxxfoyb26
You can now train, adapt, and evaluate web agents on your own tasks. We're releasing the full MolmoWeb codebase: the training code, eval harness, annotation tooling, synthetic data pipeline, and client-side code for our demo. 🧵
10 Apr 2026 15:06 +0000

https://bsky.app/profile/ai2.bsky.social/post/3miw72pj3is2c
Today we're releasing WildDet3D, an open model for monocular 3D object detection in the wild. It works with text, clicks, or 2D boxes, and on zero-shot evals it nearly doubles the best prior scores. 🧵
07 Apr 2026 16:27 +0000

https://bsky.app/profile/ai2.bsky.social/post/3miceejeknx2g
Thrilled to have Ai2's VP of Engineering Jeremy Tryba on stage at @geekwire.com's Agents of Transformation event last week. He painted a vivid picture of what agentic AI can do for science, particularly cancer research. 🧵
30 Mar 2026 19:08 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mi2pdr2q5324
MolmoBot, our open robotic manipulation suite trained entirely in simulation, now has its code, training data, data generation pipeline, and evals all available. This puts our robotics models within reach of any research lab, with no extensive real-world data collection required. 🧵
27 Mar 2026 18:04 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mhsucnydgm2g
Today we're releasing MolmoWeb, an open-source agent that can navigate and complete tasks in a browser on your behalf.
Built on Molmo 2 in 4B and 8B sizes, it sets a new open-weight state of the art (SOTA) across four major web-agent benchmarks and even surpasses agents built on proprietary models. 🧵
24 Mar 2026 15:11 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mhr46p7aav2g
We were at #NVIDIAGTC last week! Across panels, livestreams, and expo-floor demos, we shared work on Olmo Hybrid, SERA, Asta AutoDiscovery, MolmoBot, and more, all grounded in the same idea: truly open AI means sharing the full pipeline, not just the weights. 🧵 https://buff.ly/rGAEUh5
23 Mar 2026 22:27 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mhgxwhkhtk2i
📢 Introducing vla-evaluation-harness, a unified, fully open framework to evaluate any vision-language-action (VLA) model on any robot simulation benchmark. Integrate your model once. Integrate each benchmark once. The full cross-evaluation matrix fills itself. 🧵
19 Mar 2026 21:44 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mhe2k3t5vc2v
Grounding lets vision-language models do more than describe: they can point to where a robot should grasp, which button to click, or which object to track across video frames. Today we're releasing MolmoPoint, a better way for models to point. 🧵
18 Mar 2026 17:53 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mh37slg5fs2o
🔎 Deep research agents like Asta ScholarQA are transforming how we conduct literature reviews. But how do we know whether the way we evaluate them is actually meaningful?
Announcing our new paper: “Deep Research, Shallow Evaluation: A Case Study in Meta-Evaluation for Long-Form QA Benchmarks” 🧵
15 Mar 2026 05:33 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mgsaic5i2k22
Today, a step forward in open robotics: our results show that zero-shot sim-to-real transfer for manipulation is possible. MolmoBot, our open model suite for robotics, is trained entirely in simulation on MolmoSpaces. 🧵
11 Mar 2026 15:51 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mgda2kkyn22n
Introducing Olmo Hybrid, a fully open 7B model combining transformer and linear RNN layers. It decisively outperforms Olmo 3 7B across evals, with new theory and scaling experiments explaining why. 🧵
05 Mar 2026 16:34 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mg6elxhnrk2h
📢 Update: the Molmo 2 codebase is now open source. We're releasing the code behind Molmo 2, our open model family for video and image understanding, pointing, tracking, and more. Now you can easily train Molmo 2 on your own data. 🧵
03 Mar 2026 18:12 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mg44t6zmx52q
In just a few weeks, researchers used AutoDiscovery to generate 20K+ hypotheses across oncology, climate science, marine ecology, entomology, cybersecurity, music cognition, social sciences, and more. Now we're extending access for three more months and refreshing credits. 👇
02 Mar 2026 20:47 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mfubtunv5w2c
We analyzed 250K+ queries and 430K+ clickstream interactions from Asta, our AI-powered research assistant, and today we're releasing the full dataset. How do researchers actually use AI science tools?
Here's what we found. 🧵
27 Feb 2026 17:56 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mfp5qeieoj2i
Can AI predict what scientists will do next, not just one piece but the whole research process? PreScience is our new model eval for forecasting how science unfolds end-to-end, from how research teams form to a paper's eventual impact. Built with UChicago and supported by NSF.
25 Feb 2026 16:59 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mfkkx2kpyu2g
Less than a week left to try AutoDiscovery. 🔬 Most AI tools for science wait for a question. AutoDiscovery starts with your data, generating hypotheses, running experiments, and surfacing surprising findings with reproducible code. Get 1,000 Hypothesis Credits through Feb 28. 👇
23 Feb 2026 21:12 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mfa4klvejz2h
It's been incredible seeing what the scientific community has done in just one week with AutoDiscovery, our new tool that autonomously surfaces hypotheses you might never think to test. Researchers have run 10,000+ experiments so far. Tell us what it's uncovering for you. 🧵
19 Feb 2026 17:28 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mf5pybcx3v2w
We've released a Chrome extension for Asta, a faster way to go from finding a paper to asking questions about it while you read. 🧵
18 Feb 2026 18:37 +0000

https://bsky.app/profile/ai2.bsky.social/post/3meqwroxnas2g
Data mixing (determining how much web text, code, math, etc. you need for LM development) is a first-order lever on model quality.
Introducing Olmix: a framework for configuring mixing methods at the start of development and efficiently updating the mix as data changes throughout. 🧵
13 Feb 2026 16:34 +0000

https://bsky.app/profile/ai2.bsky.social/post/3meoepd2xaw2b
Knowing which questions to ask is often the hardest part of science. Today we're releasing AutoDiscovery in AstaLabs, an AI system that starts with your data and generates its own hypotheses. 🧪
12 Feb 2026 16:06 +0000

https://bsky.app/profile/ai2.bsky.social/post/3memamxbby62e
Introducing MolmoSpaces, a large-scale, fully open platform and benchmark for embodied AI research. 🤖 230k+ indoor scenes, 130k+ object models, and 42M annotated robotic grasps, all in one ecosystem.
11 Feb 2026 19:47 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mejggcvxan2s
LLMs often generate step-by-step instructions, from real-world tasks (how do I file taxes?) to plans for AI agents. Improving this is hard: outputs can sound fluent even when the steps don't work, and current datasets cover few domains. How2Everything evaluates and trains models on this at scale. 🧵
10 Feb 2026 16:53 +0000

https://bsky.app/profile/ai2.bsky.social/post/3megulqnpqv26
New: a web demo that makes DR Tulu even simpler to use, built by our collaborators at MIT and the University of Washington. Ask a question and watch DR Tulu plan, search, and synthesize a citation-grounded report you can share. 🔎
09 Feb 2026 16:29 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mdxvqabzuj2e
Since launching Open Coding Agents, it's been exciting to see how quickly the community has adopted them.
Today we're releasing SERA-14B, a new 14B-parameter coding model, plus a major refresh of our open training datasets. 🧵
03 Feb 2026 17:39 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mdiw5hi3up26
Introducing Theorizer: turning thousands of papers into scientific laws 📚➡️📜 Most automated discovery systems focus on experimentation. Theorizer tackles the other half of science: theory building, compressing scattered findings into structured, testable claims. 🧵
28 Jan 2026 18:37 +0000

https://bsky.app/profile/ai2.bsky.social/post/3mdg5munm4r2e
Introducing Ai2 Open Coding Agents, starting with SERA, our first-ever coding models: fast, accessible agents (8B–32B) that adapt to any repo, including private codebases. Train a powerful specialized agent for as little as ~$400, and it works with Claude Code out of the box. 🧵
27 Jan 2026 16:12 +0000