Update: NVIDIA Cosmos Transfer2.5

Right after publishing my article about Cosmos Transfer1, NVIDIA released Cosmos Transfer2.5.
Below is a concise, step-by-step guide on how to run it on RunPod, along with examples from my early experiments.
Disclaimer: I’m not affiliated with RunPod in any way, and this isn’t a paid promotion. Feel free to try this Docker image with any similar service that supports NVIDIA GPUs.
Step 1: Choose a GPU
Sign in to RunPod and click this link — you should see a list of available GPUs.
For some reason, the RTX PRO 6000 no longer works: the script fails with "RuntimeError: FlashAttention only supports Ampere GPUs or newer."
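If you want to see what the Pod's GPU reports before starting a long run, here is a minimal sketch. It assumes PyTorch is installed in the container and that "Ampere or newer" corresponds to CUDA compute capability 8.0 or higher; the helper names are my own, not part of Cosmos Transfer. It only checks the hardware generation and does not verify that the installed FlashAttention build actually supports the card.

```python
# Assumption: Ampere corresponds to CUDA compute capability 8.0.
AMPERE_COMPUTE_CAPABILITY = (8, 0)


def is_ampere_or_newer(capability):
    """Return True if a (major, minor) compute capability is at least 8.0."""
    return tuple(capability) >= AMPERE_COMPUTE_CAPABILITY


def report_local_gpu():
    """Describe the first visible CUDA device, or explain why none was found."""
    try:
        import torch  # imported lazily so the helper above works without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if not torch.cuda.is_available():
        return "No CUDA device is visible"
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    verdict = "Ampere or newer" if is_ampere_or_newer((major, minor)) else "older than Ampere"
    return f"{name}: compute capability {major}.{minor} ({verdict})"


print(report_local_gpu())
```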
I recommend selecting the H100 SXM. After testing several GPUs on the same video, I found that it runs about 25–30% faster than the H100 PCIe, while costing only ~12% more per hour.
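To make that trade-off concrete, here is a quick back-of-the-envelope calculation. It assumes "25–30% faster" means 1.25x to 1.30x the throughput, and that you pay per hour of runtime; the function name is just for illustration.

```python
def relative_cost_per_video(speedup, price_ratio):
    """Cost of one video on the faster GPU relative to the slower one.

    speedup: throughput ratio (1.25 means 25% faster).
    price_ratio: hourly price ratio (1.12 means 12% more per hour).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # The job takes 1/speedup of the time, billed at price_ratio times the rate.
    return price_ratio / speedup


for speedup in (1.25, 1.30):
    ratio = relative_cost_per_video(speedup, price_ratio=1.12)
    print(f"{speedup:.2f}x faster -> {ratio:.3f} of the PCIe cost per video")
```

Under these assumptions the H100 SXM comes out roughly 10 to 14% cheaper per generated video, despite the higher hourly price.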
Step 2: Run the Pod
Click the Deploy On Demand button to start your Pod.
It takes about 10-15 minutes to download the Docker image and initialize everything.
Step 3: First Run
Once your Pod is ready, log in via SSH.
You can find the official connection guide 👉 here.
After connecting, go to the /workspace directory and install the remaining dependencies:
Step 4: Hugging Face
Before you run inference, make sure you’re logged in to Hugging Face:
Not sure where to find your token? 👉 Here’s a guide.
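If you prefer to log in from Python rather than interactively, this is one possible sketch. It assumes the huggingface_hub package is available in the container and that you have exported your token as the HF_TOKEN environment variable; the two helper functions are my own names.

```python
import os


def read_hf_token(env=None):
    """Read the access token from HF_TOKEN and fail early if it is missing."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable first")
    return token


def login_to_hugging_face():
    """Log in with the token from the environment (needs huggingface_hub)."""
    from huggingface_hub import login  # imported lazily; assumed to be installed

    login(token=read_hf_token())


# Call once per Pod session, after exporting HF_TOKEN:
# login_to_hugging_face()
```

Keeping the token in an environment variable avoids pasting it into scripts or shell history that might end up in a shared image.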
You no longer need to pre-download any models. They will be downloaded automatically before inference. As a result, the first run may take a bit longer.
Step 5: Run Inference
Upload your training episode(s) to the container using SCP, create your params_file file, and run inference with:
Here’s an example params_file from my setup:
Example prompt_path file:
You can find a bit more detail on 👉 GitHub. At the time of writing, NVIDIA still hasn’t published official documentation on their website.
My Experiments
I can’t really call my first experiments a success. Most of the generated videos turned out a bit strange: some were almost okay, others completely missed the mark, and a few got blocked by the built-in guardrail model.
Task: Change Block Colors
Around the middle of the video, you’ll notice some anomalies, such as blocks taking strange positions, and toward the end the work surface changes.
Task: Change Work Surface
In this video, you’ll notice an extra block, and around the middle, the perspective shifts and strange objects start to appear.
Generating a single video on the H100 SXM took roughly 50 minutes.
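At that speed, augmenting a whole dataset adds up quickly. Here is a rough planning sketch, assuming every video takes about the same 50 minutes and that each GPU processes one video at a time; the function and its parameters are hypothetical.

```python
import math


def estimate_batch(num_videos, num_gpus=1, minutes_per_video=50):
    """Return (wall_clock_hours, gpu_hours) for a batch of videos.

    Assumes one video per GPU at a time and a constant time per video.
    """
    if num_videos < 0 or num_gpus < 1:
        raise ValueError("need num_videos >= 0 and num_gpus >= 1")
    rounds = math.ceil(num_videos / num_gpus)  # sequential passes over the GPUs
    wall_clock_hours = rounds * minutes_per_video / 60
    gpu_hours = num_videos * minutes_per_video / 60  # what you are billed for
    return wall_clock_hours, gpu_hours


wall, billed = estimate_batch(num_videos=12, num_gpus=4)
print(f"12 videos on 4 GPUs: ~{wall:.1f} h wall clock, ~{billed:.1f} GPU-hours")
```

Multiplying the GPU-hours by your provider's hourly rate gives a ballpark budget before you commit to a large run.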
Conclusion
It’s clear that augmenting robot training datasets isn’t as simple as it seems — and I’m probably still missing a few key steps along the way. It definitely requires plenty of trial and error, and I suspect the model needs some fine-tuning to better align with each robot’s embodiment and setup.
While digging through the docs, I also noticed that NVIDIA mentions two Cosmos Transfer versions: 2B and 12B. It looks like the larger 12B variant might be released soon.
Still, it’s exciting to see even partial results.
If you’ve experimented with Cosmos Transfer or similar models, I’d love to hear your experience. Drop me a comment or message — sharing insights helps the whole community move forward faster.