<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel><description>Datasets @ Hugging Face | Open Source + HF Dataset Hub</description><link>https://bsky.app/profile/lhoestq.hf.co</link><title>@lhoestq.hf.co - Quentin Lhoest 🤗</title><item><link>https://bsky.app/profile/lhoestq.hf.co/post/3mlo6fmxvk224</link><description>1M public datasets on @hf.co! Congrats on pushing AI forward with open datasets!</description><pubDate>12 May 2026 16:09 +0000</pubDate><guid isPermaLink="false">at://did:plc:nj2acksctlolkzquptyutvz7/app.bsky.feed.post/3mlo6fmxvk224</guid></item><item><link>https://bsky.app/profile/lhoestq.hf.co/post/3lusgjgy3vk26</link><description>New blog post 🚨 Every data engineer should read it&#xA;&#xA;@kszucs.bsky.social (Apache Arrow PMC member) announces how to drastically speed up Parquet file uploads and downloads via deduplication.&#xA;&#xA;Best part: the feature enabling this is open source!&#xA;https://huggingface.co/blog/parquet-cdc</description><pubDate>25 Jul 2025 16:06 +0000</pubDate><guid isPermaLink="false">at://did:plc:nj2acksctlolkzquptyutvz7/app.bsky.feed.post/3lusgjgy3vk26</guid></item><item><link>https://bsky.app/profile/lhoestq.hf.co/post/3lpcedioyqk2e</link><description>The CDC (content-defined chunking) Parquet writer is out in PyArrow nightlies 🔥🔥&#xA;&#xA;$ pip install \&#xA;  -i https://pypi.anaconda.org/scientific-python-nightly-wheels/simple \&#xA;  &#34;pyarrow&gt;=21.0.0.dev0&#34;&#xA;&#xA;it&#39;s changing the way I view data versioning 👇</description><pubDate>16 May 2025 15:38 +0000</pubDate><guid isPermaLink="false">at://did:plc:nj2acksctlolkzquptyutvz7/app.bsky.feed.post/3lpcedioyqk2e</guid></item></channel></rss>