Simon Willison
simonwillison.net
did:plc:kft6lu4trxowqmter2b6vg6z
FineWeb-Edu is derived from Common Crawl, which many people consider an unethical source of training data, since it's copyrighted material scraped from the web.
https://arxiv.org/abs/2406.17557
2025-10-15T05:17:55.524Z