Analyzing LeRobot Datasets on Hugging Face

Note: this article is for hobbyists and people familiar with LeRobot.
While playing with my LeRobot and the SO-101, I started to wonder: "What are other people doing with this project?"
Curiosity won, so I put together a small Python script to crawl through all Hugging Face datasets with the “LeRobot” tag.
The script found 16,065 datasets in total. This count excludes gated and private datasets, so the true total is likely higher.
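If you want to reproduce the crawl, here's a minimal sketch using the huggingface_hub client. It's not my exact script: the tag filter and the meta/info.json path assume the current v2 dataset layout, and older layouts keep metadata elsewhere.

```python
import json
from huggingface_hub import HfApi, hf_hub_download

api = HfApi()
# Public datasets carrying the "LeRobot" tag; gated/private repos are not returned
repos = list(api.list_datasets(filter="LeRobot"))
print(f"Found {len(repos)} tagged datasets")

records = []
for repo in repos:
    try:
        # The v2 layout stores dataset metadata in meta/info.json
        path = hf_hub_download(repo_id=repo.id, filename="meta/info.json",
                               repo_type="dataset")
        with open(path) as f:
            records.append({"repo_id": repo.id, **json.load(f)})
    except Exception:
        pass  # older layout or missing metadata file
```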
In this article, I’ll share some of the stats I found. Hopefully, they’ll give you a better picture of how the project is growing and what the community is working on.
How Many Datasets Were Created Over Time?
Looking at the chart, it’s clear that 2025 is when the project really started to gain traction, with most of the datasets created this year.
This looks pretty impressive, considering the very first commit to the LeRobot GitHub repository was only made on January 26, 2024.
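Creation dates come from the Hub API itself. A rough sketch, assuming the repos list from the crawler above was fetched with expand=["createdAt"] so that created_at is populated:

```python
import pandas as pd
import matplotlib.pyplot as plt

# repos from: api.list_datasets(filter="LeRobot", expand=["createdAt"])
created = pd.to_datetime([r.created_at for r in repos], utc=True).tz_convert(None)
per_month = created.to_period("M").value_counts().sort_index()

per_month.plot(kind="bar")
plt.ylabel("New datasets per month")
plt.show()
```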
How Many Episodes Does a Dataset Have on Average?
One of my first questions with LeRobot was: “How many episodes are enough for a good dataset?” After recording a few datasets on my own, I quickly realized that recording lots of episodes is harder than it looks. That made me wonder — how big are the datasets other people are recording?
To answer that, I filtered the data a bit:
- Early versions of LeRobotDataset didn't track metadata like robot_type or total_episodes, so I excluded those.
- I also removed any datasets where total_episodes was zero.
That left 14,997 datasets for analysis.
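In pandas terms, the filtering is straightforward (records comes from the crawler sketch earlier):

```python
import pandas as pd

df = pd.DataFrame(records)  # one row per dataset, built from meta/info.json
df = df.dropna(subset=["robot_type", "total_episodes"])  # early versions lack these fields
df = df[df["total_episodes"] > 0]                        # drop empty datasets
print(len(df))  # 14,997 in my crawl
```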
The histogram below shows how many datasets exist at different sizes:
Key observations:
- Small datasets dominate. Around 6,000 datasets (~43%) contain just 1–5 episodes. This suggests that many contributions are likely experimental, incomplete, or quick tests rather than large-scale datasets.
- While most datasets are small, a few very large ones exist — the largest dataset has 442,226 episodes.
- The median of total_episodes is 10 episodes, while the mean is 273. The mean is heavily skewed by a small number of massive datasets.
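Here's a sketch of how these stats and the histogram can be computed; log-spaced bins keep the long tail visible (df comes from the filtering step above):

```python
import numpy as np
import matplotlib.pyplot as plt

print(df["total_episodes"].median())  # 10
print(df["total_episodes"].mean())    # ~273, pulled up by a few huge datasets

bins = np.logspace(0, np.log10(df["total_episodes"].max()), 40)
plt.hist(df["total_episodes"], bins=bins)
plt.xscale("log")
plt.xlabel("Episodes per dataset")
plt.ylabel("Number of datasets")
plt.show()
```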
Which Robots Are Most Popular?
LeRobot supports many different robot embodiments. So the next logical question for me was: "Which robots are the most popular?"
I started by running DISTINCT on robot_type and found around 296 unique values. Some of these are just variations (like so101, so101_follower, or so100-blue), but overall it shows that there's a pretty decent variety of embodiments people are experimenting with.
This chart shows the number of datasets for each robot_type.
To keep things simple, I highlighted only the top groups and combined the rest into “Other”; even with this, the overall trend is easy to see.
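The grouping is just a value_counts with the tail folded into one bucket; the cutoff of 10 groups below is arbitrary, not necessarily the one I used for the chart:

```python
import matplotlib.pyplot as plt

counts = df["robot_type"].value_counts()
print(counts.size)  # ~296 distinct values

top = counts.head(10).copy()
top["Other"] = counts.iloc[10:].sum()  # fold the long tail into one bucket

top.plot(kind="bar")
plt.ylabel("Datasets")
plt.show()
```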
Key observations:
- SO-100, SO-101, Koch, and LeKiwi dominate the dataset count. More than half of all datasets come from these robots. The reason is clear: they are relatively affordable and easy to set up. As a result, many hobbyists record small experiments, which end up as separate datasets.
Here’s the full list of robot_type values I found:
How Many Cameras Should a Robot Have?
Out of the ~16k datasets from Hugging Face I analyzed, ~13k include camera info, which is enough to draw some conclusions (older LeRobotDataset versions didn't track this).
This chart shows how many cameras people have in their datasets.
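Camera streams show up in meta/info.json as features with dtype "video", so counting them per dataset looks roughly like this (my reading of the v2 schema):

```python
import pandas as pd

def count_cameras(info: dict) -> int:
    """Count video features in a parsed meta/info.json dict."""
    features = info.get("features", {})
    return sum(1 for feat in features.values() if feat.get("dtype") == "video")

camera_counts = pd.Series([count_cameras(r) for r in records])
print(camera_counts.value_counts().sort_index())
```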
Key Observations:
- Most robots have 1–2 cameras, with the 2-camera setup being the most common configuration.
- There are a few datasets with robots using 8 or even 9 cameras. I checked some of those repositories, and all of them had robot_type set to either a2d or R1Pro, which are next-generation humanoid robots.
What Resolution and FPS Should Cameras Have?
When I started recording my first dataset, I used 1920×1080 resolution and ended up with a really heavy dataset. SmolVLA (which I’d been fine-tuning) managed to digest such big videos and still performed well, but after reading a few whitepapers I realized you don’t actually need such high resolution.
So let's take a look at the top 10 camera resolutions and fps people are using in their datasets:
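To extract resolution and fps per camera, I read the video feature shapes together with the dataset-level fps. As far as I can tell, v2 stores shapes as [height, width, channels], but treat that as an assumption:

```python
import pandas as pd

cams = []
for info in records:  # parsed meta/info.json dicts from the crawler
    fps = info.get("fps")
    for feat in info.get("features", {}).values():
        if feat.get("dtype") == "video" and feat.get("shape"):
            h, w = feat["shape"][:2]
            cams.append(f"{w}x{h} @ {fps}fps")

print(pd.Series(cams).value_counts().head(10))
```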
Key Observations:
- 640×480 at 30fps dominates: ~17k cameras (~72%) use this resolution and fps.
- Resolutions like 672×376 and 640×360 appear, indicating non-standard or cropped recordings.
How Long Are Training Episodes?
Once your robot is assembled and the cameras are configured, it’s tempting — especially after watching glossy demo videos — to aim for long, complex tasks.
I did the same at first: lots of variability, long episodes, and high expectations. But after some experimenting (and a bit of reading), I realized that short, focused training episodes actually work best. They make it much easier for the model to lock onto the behavior you’re really trying to teach.
Maybe in the future, as the number of training datasets grows and synthetic data generation tools like NVIDIA Cosmos become more accessible, long-horizon tasks will be within reach. But for now, getting stable robot behavior with short-horizon tasks is much more realistic, especially in home lab setups.
So let’s take a look at what the community is actually experimenting with.
To calculate average episode length in seconds for each dataset, we will use this formula: total_frames / (fps * total_episodes)
Now, let’s build a histogram.
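In code, with df from the filtering step (clipping the tail keeps the chart readable):

```python
import matplotlib.pyplot as plt

# Average episode length in seconds, per dataset
df["avg_episode_s"] = df["total_frames"] / (df["fps"] * df["total_episodes"])

print(df["avg_episode_s"].median())  # ~17.4
print(df["avg_episode_s"].mean())    # ~24.8

df["avg_episode_s"].clip(upper=120).plot(kind="hist", bins=60)
plt.xlabel("Average episode length (s)")
plt.show()
```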
Key Observations:
- Median episode length is 17.4 seconds, while the mean is 24.8 seconds — showing that most episodes across all datasets are fairly short, but a few very long ones pull the average up.
- The vast majority of datasets (12,240) have episodes under 30 seconds. This suggests that most contributors are experimenting with short-horizon tasks like pick-and-place.
- In the few datasets with episodes averaging 300+ seconds, I noticed they often contain recordings of evaluating trained models rather than recordings useful for training.
How Many Episodes Were Recorded per Robot?
In my view, the number of datasets shows how many different people are trying out the robot, while the number of episodes signals the scale and seriousness of the work. A dataset with only a few episodes is a quick test, whereas hundreds of episodes suggest long, focused work.
In the chart below, I’ve summed up total_episodes per robot_type to get the overall episode count for each robot.
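The aggregation itself is a one-liner:

```python
episodes_per_robot = (df.groupby("robot_type")["total_episodes"]
                        .sum()
                        .sort_values(ascending=False))
print(episodes_per_robot.head(15))
```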
Key observations:
- Platforms like XArm, Franka, and KUKA IIWA stand out when looking at total episodes. They may not have many datasets, but each dataset contains a very large number of episodes, suggesting longer and more structured experiments.
- While SO-100/101 datasets often come from hobby projects, robots like XArm, Franka, and KUKA IIWA are expensive and usually found in labs or companies, where paid staff produce longer recordings and bigger datasets.
How Many Tasks Do Datasets Have?
Currently, 91% of all datasets are single-task, and only a small fraction include two or more. However, there are exceptions: repositories like DROID contain thousands of tasks, and these are usually the result of large community efforts rather than individual contributors.
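The v2 metadata includes a total_tasks field, so the single-task share is a quick check (assuming the field is present; older versions may lack it):

```python
single_task_share = (df["total_tasks"] == 1).mean()
print(f"{single_task_share:.0%} of datasets are single-task")  # ~91% in my crawl
```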
Here are the top 20 datasets with the most tasks in them.
Some seem to be forks, but I’ll dig into that later.
How Long Are The Task Descriptions Themselves?
Looking at the task descriptions across datasets, most are simple and to the point — usually short phrases describing pick-and-place, grabbing, or collecting actions.
The graph below makes this clear:
Key Observations:
- Task descriptions are generally short, with a mean length of ~45 characters.
- The longest outlier reaches nearly 2,000 characters, but such cases are rare.
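In the v2 layout, the task strings themselves live in meta/tasks.jsonl, one JSON object per line. A sketch for a single dataset (the repo id is a placeholder):

```python
import json
from huggingface_hub import hf_hub_download

# "user/demo-dataset" is a placeholder; substitute any v2 LeRobot dataset
path = hf_hub_download(repo_id="user/demo-dataset", filename="meta/tasks.jsonl",
                       repo_type="dataset")
with open(path) as f:
    tasks = [json.loads(line)["task"] for line in f]

lengths = [len(t) for t in tasks]
print(sum(lengths) / len(lengths))  # the ~45-char mean above is across all datasets
```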
SO-10x, LeKiwi, Koch Datasets:
You can put together a dataset of 50–60 short episodes in just a day or two. But creating something with 200+ episodes requires real effort — and that makes those datasets especially interesting to learn from.
I haven’t yet had the chance to review all of them, but here are the largest datasets for SO-100, SO-101, Koch, and LeKiwi.
Conclusion:
Now we know the typical setup:
- SO-100 or SO-101 robots
- two cameras at 640×480, 30 fps
- short training episodes, typically under 30 seconds
- usually just one task per dataset
- short-horizon tasks like pick, place, grab, and collect
This was only a first look at the numbers, but I hope you found it useful and interesting.