Installing NVIDIA Isaac GR00T on Ubuntu 24.04 with an RTX 5090

If you’re like me and just bought a brand-new NVIDIA RTX 5090 to play with NVIDIA Isaac GR00T and LeRobot, you’ve probably run straight into "RuntimeError: CUDA error: no kernel image is available for execution on the device" as soon as you tried to start fine-tuning with GR00T.

As of August 2, 2025:

PyTorch 2.5.1 does not support the RTX 5090; you need version 2.7.0 or later.
Flash Attention 2 officially supports only Ampere, Ada, and Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). The Blackwell-based RTX 5090 is not on that list, so you’ll have to build it from source.
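If you’re not sure which PyTorch your GR00T environment currently has, here’s a minimal sketch of a version check against the 2.7.0 minimum. The helper name version_ge is my own (hypothetical), and it assumes GNU sort with -V (version ordering), which Ubuntu ships.

```shell
# Minimal sketch: compare the installed PyTorch against the 2.7.0 minimum.
# version_ge is a hypothetical helper; it relies on GNU `sort -V`.

# Succeeds if version $1 >= version $2. A local suffix such as "+cu128" is ignored.
version_ge() {
  local have="${1%%+*}" need="$2"
  # The smaller of the two sorts first; if that is $need, then $have >= $need.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Only query Python if torch is importable in the active environment.
if torch_ver=$(python -c 'import torch; print(torch.__version__)' 2>/dev/null); then
  if version_ge "$torch_ver" "2.7.0"; then
    echo "PyTorch $torch_ver is new enough for the RTX 5090"
  else
    echo "PyTorch $torch_ver is too old: upgrade to 2.7.0 or later"
  fi
fi
```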

The CUDA ecosystem moves fast, and new releases could break the instructions in this article overnight.

STEP 1: Install GR00T

Follow the instructions on the Isaac‑GR00T GitHub page. At the time of writing, the relevant section looks like this:

However, running python scripts/gr00t_finetune.py immediately throws "RuntimeError: CUDA error: ....". Let’s fix that.

STEP 2: Upgrade PyTorch

GR00T currently bundles PyTorch 2.5.1, which lacks sm_120 (Blackwell) support.
Install PyTorch 2.7.1 (CUDA 12.8 build) together with the matching torchvision 0.22.1:

$ pip uninstall -y torch torchvision
$ pip install torch==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl/cu128
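To double-check that the new wheel actually contains Blackwell kernels, you can ask PyTorch which architectures it was compiled for with torch.cuda.get_arch_list(). Here’s a minimal sketch; has_sm120 is a hypothetical helper, and the list comes back empty when PyTorch can’t see a GPU, so run it on the RTX 5090 machine itself.

```shell
# Minimal sketch: confirm the installed PyTorch build targets sm_120 (Blackwell).
# has_sm120 is a hypothetical helper name.

# Succeeds if the space-separated architecture list in $1 contains sm_120.
has_sm120() {
  case " $1 " in
    *" sm_120 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# torch.cuda.get_arch_list() returns [] when no CUDA device is available.
if archs=$(python -c 'import torch; print(" ".join(torch.cuda.get_arch_list()))' 2>/dev/null); then
  if has_sm120 "$archs"; then
    echo "sm_120 kernels present: $archs"
  else
    echo "no sm_120 in this build (${archs:-empty list}); check the cu128 index URL"
  fi
fi
```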

STEP 3: Prepare the Flash Attention build

STEP 3.1: Install CUDA 12.9 Toolkit

You’ll need a CUDA toolkit recent enough to target Blackwell (sm_120 support arrived in CUDA 12.8) to compile Flash Attention, so first check which version you have:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2025 NVIDIA Corporation
Built on Tue_May_27_02:21:03_PDT_2025
Cuda compilation tools, release 12.9, V12.9.86
Build cuda_12.9.r12.9/compiler.36037853_0
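If you’d like to script this check, here’s a minimal sketch that pulls the release number out of the nvcc --version output and compares it with 12.8, the first CUDA release that can target sm_120. Both helper names are hypothetical.

```shell
# Minimal sketch: parse `nvcc --version` and check the release is >= 12.8.
# nvcc_release and nvcc_supports_sm120 are hypothetical helper names.

# Reads nvcc --version text on stdin and prints "MAJOR.MINOR" (e.g. 12.9).
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeeds if release $1 is 12.8 or newer.
nvcc_supports_sm120() {
  local major minor
  [ -n "$1" ] || return 1
  major="${1%%.*}"
  minor="${1#*.}"
  [ "$major" -gt 12 ] || { [ "$major" -eq 12 ] && [ "$minor" -ge 8 ]; }
}

if command -v nvcc >/dev/null 2>&1; then
  rel=$(nvcc --version | nvcc_release)
  if nvcc_supports_sm120 "$rel"; then
    echo "nvcc $rel can target sm_120"
  else
    echo "nvcc ${rel:-?} is too old for sm_120: install CUDA 12.9.1"
  fi
else
  echo "nvcc not found on PATH"
fi
```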

If nvcc is missing or reports an older release, install CUDA 12.9.1:

$ cd /tmp
$ wget https://developer.download.nvidia.com/compute/cuda/12.9.1/local_installers/cuda_12.9.1_575.57.08_linux.run
$ sudo sh cuda_12.9.1_575.57.08_linux.run

During installation, select only the toolkit and uncheck the driver. I’m assuming your GPU driver is already installed and working; if it isn’t, get it working first.

It should look like this:

CUDA 12.9 installer with only Toolkit selected
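If you’d rather skip the interactive menu, the runfile also has a silent mode. A minimal sketch, assuming the same runfile name as above: --silent and --toolkit are runfile installer options that install only the toolkit without prompting (silent mode implies accepting the EULA), while the DRY_RUN switch and the install_cuda_toolkit helper are my own hypothetical additions, so by default it only prints the command.

```shell
# Minimal sketch: toolkit-only, non-interactive CUDA install.
# install_cuda_toolkit and DRY_RUN are hypothetical; DRY_RUN defaults to 1,
# which prints the command instead of running it.
install_cuda_toolkit() {
  local runfile="$1"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "sudo sh $runfile --silent --toolkit"
  else
    # --toolkit installs only the toolkit, leaving the existing driver alone.
    sudo sh "$runfile" --silent --toolkit
  fi
}

install_cuda_toolkit cuda_12.9.1_575.57.08_linux.run
```

To actually install, define the function and run DRY_RUN=0 install_cuda_toolkit cuda_12.9.1_575.57.08_linux.run from /tmp.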

STEP 3.2: Verify CUDA Toolkit installation

Run nvcc --version again; you should see the same 12.9 build info as in Step 3.1. Also verify the CUDA_HOME variable:

$ echo $CUDA_HOME
/usr/local/cuda-12.9

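If CUDA_HOME is empty, locate nvcc (if which prints nothing, the toolkit isn’t on your PATH yet; the runfile installs it under /usr/local/cuda-12.9 by default):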

$ which nvcc
/usr/local/cuda-12.9/bin/nvcc
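The base path is just two directory levels above the nvcc binary. A minimal sketch that derives it (cuda_home_from_nvcc is a hypothetical helper name):

```shell
# Minimal sketch: derive CUDA_HOME by stripping the trailing /bin/nvcc.
# cuda_home_from_nvcc is a hypothetical helper name.
cuda_home_from_nvcc() {
  dirname "$(dirname "$1")"
}

if nvcc_path=$(command -v nvcc); then
  echo "CUDA_HOME should be: $(cuda_home_from_nvcc "$nvcc_path")"
fi
```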

Use only the base path, /usr/local/cuda-12.9 (drop the trailing /bin/nvcc), and add these lines to ~/.bashrc or ~/.zshrc:

export CUDA_HOME=/usr/local/cuda-12.9
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

Reload the environment:

$ source ~/.bashrc     # or ~/.zshrc
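If you rerun this setup, it’s easy to end up with duplicated export lines. Here’s a minimal sketch that appends each line only when it isn’t already in the file; add_line_once and add_cuda_env are hypothetical helpers, and nothing is written until you call add_cuda_env with your rc file.

```shell
# Minimal sketch: idempotently add the CUDA exports to a shell rc file.
# add_line_once and add_cuda_env are hypothetical helper names.

# Appends line $1 to file $2 unless an identical line already exists.
add_line_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Lines are single-quoted, so $CUDA_HOME etc. expand only when the rc file is sourced.
add_cuda_env() {
  add_line_once 'export CUDA_HOME=/usr/local/cuda-12.9' "$1"
  add_line_once 'export PATH="$CUDA_HOME/bin:$PATH"' "$1"
  add_line_once 'export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"' "$1"
}
```

For example, run add_cuda_env ~/.bashrc (or ~/.zshrc), then source the file as before. The ${LD_LIBRARY_PATH:+...} form avoids a trailing empty entry when LD_LIBRARY_PATH was previously unset.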

STEP 3.3: Install Ninja

Without Ninja, the Flash Attention 2 build doesn’t run compile jobs in parallel and can take hours.

$ pip install ninja
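The flash-attention README also recommends making sure Ninja actually works: ninja --version should exit with status 0, and if it doesn’t, reinstalling it usually helps. A minimal sketch of that check; ninja_ok is a hypothetical helper that takes an optional binary name, defaulting to ninja.

```shell
# Minimal sketch: verify that `ninja --version` exits with status 0.
# ninja_ok is a hypothetical helper; $1 optionally names the binary to test.
ninja_ok() {
  local bin="${1:-ninja}"
  command -v "$bin" >/dev/null 2>&1 && "$bin" --version >/dev/null 2>&1
}

if ninja_ok; then
  echo "ninja $(ninja --version) works"
else
  echo "ninja is missing or broken; try: pip uninstall -y ninja && pip install ninja"
fi
```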

STEP 3.4: Clone Flash Attention 2 repository

$ git clone https://github.com/Dao-AILab/flash-attention.git
$ cd flash-attention
$ git checkout v2.8.2

STEP 4: Compile Flash Attention 2

Before you start compiling, pay attention to the MAX_JOBS setting: it controls how many compilation jobs the build runs in parallel. On my machine (128 GB RAM and an Intel Core Ultra 9 285K with 24 cores), MAX_JOBS=10 ran smoothly while using only about half of the memory. Set it too high and you may run out of RAM, so experiment to find the sweet spot; a value of 4 is usually safe on most PCs.
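To turn that into a starting value, here’s a minimal heuristic sketch that caps the job count by both CPU cores and RAM. My build used roughly 64 GB for 10 jobs, about 6.4 GB per job, so the hypothetical suggest_max_jobs helper assumes 12 GB per job by default as a safety margin, which gives 10 on a 128 GB machine like mine. The per-job figure is my assumption, not something flash-attention documents, so adjust the third argument to match what you observe.

```shell
# Minimal heuristic sketch (hypothetical, not part of flash-attention):
# MAX_JOBS = min(cores, RAM_GB / GB_PER_JOB), but at least 1.
suggest_max_jobs() {
  local mem_gb="$1" cores="$2" gb_per_job="${3:-12}" n
  n=$(( mem_gb / gb_per_job ))
  if [ "$n" -gt "$cores" ]; then n="$cores"; fi
  if [ "$n" -lt 1 ]; then n=1; fi
  echo "$n"
}

# Total RAM in GB from /proc/meminfo (Linux only; falls back to 0) and CPU count.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
mem_gb=$(( ${mem_kb:-0} / 1024 / 1024 ))
cores=$(nproc 2>/dev/null || echo 1)
echo "Suggested MAX_JOBS=$(suggest_max_jobs "$mem_gb" "$cores")"
```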

The single most important setting in the entire process is FLASH_ATTN_CUDA_ARCHS=120: it tells setup.py that we’re compiling for the RTX 5090’s Blackwell architecture (sm_120).

$ FLASH_ATTN_CUDA_ARCHS=120 MAX_JOBS=10 python setup.py install

On my hardware, the build took about 20–30 minutes.
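Once the build finishes, a quick sanity check is to import the package and its compiled CUDA extension from outside the source tree, so Python picks up the installed copy rather than the checkout. A minimal sketch, assuming the extension module is named flash_attn_2_cuda (as in the v2.8.x setup.py); check_flash_attn is a hypothetical helper.

```shell
# Minimal sketch: import flash_attn plus its compiled extension and compare versions.
# check_flash_attn is hypothetical; flash_attn_2_cuda is the assumed extension name.
check_flash_attn() {
  local expected="$1" got
  # cd /tmp inside the subshell so the local ./flash_attn source dir isn't imported.
  got=$(cd /tmp && python -c 'import flash_attn, flash_attn_2_cuda; print(flash_attn.__version__)' 2>/dev/null) || return 1
  [ "$got" = "$expected" ]
}

if check_flash_attn "2.8.2"; then
  echo "flash-attn 2.8.2 and its CUDA extension import correctly"
else
  echo "flash-attn failed to import or is not version 2.8.2"
fi
```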

STEP 4.1: Verification

Run the unit tests from the flash-attention directory. Be patient: the full suite took me almost 40 minutes.

$ pytest tests/test_flash_attn.py

My run finished with:

10 failed, 312346 passed, 196416 skipped in 2366.50s (0:39:26)

With only 10 failures out of 312,356 tests that actually ran, I consider that a success.
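For perspective, here’s a small sketch that turns a pytest summary line into a failure rate over the tests that actually ran (skipped ones excluded). failure_rate is a hypothetical helper; for my run it prints 0.0032%.

```shell
# Minimal sketch: failure rate = failed / (failed + passed), skipped excluded.
# failure_rate is a hypothetical helper; $1 is a pytest summary line.
failure_rate() {
  printf '%s\n' "$1" | awk '{
    f = 0; p = 0
    for (i = 1; i < NF; i++) {
      w = $(i + 1)
      sub(/,$/, "", w)            # "failed," -> "failed"
      if (w == "failed") f = $i + 0
      if (w == "passed") p = $i + 0
    }
    if (f + p == 0) { print "n/a"; exit }
    printf "%.4f%%\n", 100 * f / (f + p)
  }'
}

failure_rate "10 failed, 312346 passed, 196416 skipped in 2366.50s (0:39:26)"
```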

If you get a similar result, congrats! You can now proceed to fine-tuning with GR00T.