Jon Atkinson
jon-atkinson.bsky.social
did:plc:xyzv52rncwbjcpvuyaxrd3v7
I don't think enough people know about Trafilatura (trafilatura.readthedocs.io). It's such good software. If you ever need to extract text from almost any source, Trafilatura will do it. I use it all the time to prepare content for LLMs and RAG, or just to make things readable. SUCH good software.
https://trafilatura.readthedocs.io/en/latest/
2025-06-16T18:18:23.056Z