Orin Nano Standup Beyond LLMs

Setting Up Orin Nano for Beyond LLMs Workflows

This post walks through the full bring-up of a fresh Jetson Orin Nano: from re-flashing the OS over USB because the bundled JetPack was too old, through installing CUDA libraries and a CUDA-enabled PyTorch, all the way to running Depth-Anything-V3 (DA3-Small) on the GPU.

If you are starting with a Jetson that shipped with an older L4T release and you want to run modern ML workloads on it, this is roughly the path you will need to take.


Step 1: Why we had to re-flash

The board arrived with L4T 36.4.3 (an older JetPack 6.x point release). Most current ML stacks — and in particular the NVIDIA-built PyTorch wheels for Jetson — target a newer JetPack. Trying to install them on the old base image produced CUDA/driver mismatches that no amount of pip gymnastics was going to fix.

The fix is to re-flash the device using NVIDIA SDK Manager running on a separate Linux host.

A second wrinkle: the microSD card slot was not reliably detected during flashing. We worked around this by flashing JetPack directly onto an NVMe SSD installed in the M.2 slot, which also turned out to be much faster at runtime.

Step 2: Re-flashing with SDK Manager (host-side)

You will need:

  • A separate Linux machine (Ubuntu 22.04 worked for us) with NVIDIA SDK Manager installed.
  • A USB-C cable from the Jetson's "recovery" USB port to the host.
  • The NVMe SSD already seated in the Jetson's M.2 slot.

Steps:

  1. Put the Jetson into Force Recovery mode. With the device powered off, jumper the FC REC and GND pins on the carrier board's button header (or hold the Recovery button on dev kits), then plug in power. The board will boot into recovery silently — no display output.
  2. On the host, confirm the device is visible (you should see an NVIDIA Corp. entry):
     lsusb | grep -i nvidia
  3. Launch SDK Manager, log in, and pick the target board. Select the latest JetPack that supports your module.
  4. Important: in the storage step, choose NVMe as the install target rather than the eMMC/SD slot. This both avoids the SD-detection issue and gives you a much larger, faster root filesystem.
  5. Let the host download and flash. The full download, flash, and on-device installation takes 60–90 minutes end to end, depending on your network.
  6. When prompted, complete the on-device OEM setup (username, password, locale) over the serial console or with a display attached.

After this you should be booted from NVMe with a current JetPack, current kernel, and a matching CUDA toolkit.
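A quick way to sanity-check the result from the device itself (the device path shown is what an NVMe flash typically produces; yours may differ):

```shell
# Check which device the root filesystem is mounted from
# (expect something like /dev/nvme0n1p1 after an NVMe flash)
findmnt -n -o SOURCE /

# Check the flashed L4T release string
cat /etc/nv_tegra_release 2>/dev/null || echo "no /etc/nv_tegra_release (not an L4T system?)"
```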

Step 3: Install JetPack CUDA Components

Even after a successful SDK Manager flash, you'll want to make sure the full JetPack stack — CUDA, cuDNN, TensorRT, and VPI — is installed and up to date on the device:

sudo apt update
sudo apt install nvidia-jetpack

Check. Verify the CUDA compiler is available and the toolkit is in place:

nvcc --version
ls /usr/local/cuda

If jtop later shows CUDA as MISSING, this is often just a display issue with /etc/nv_tegra_release — verify CUDA is actually present using nvcc --version rather than trusting the dashboard.

Step 4: Install jtop (System Monitor)

jtop is the go-to live monitor on Jetson — it shows GPU/CPU/RAM/EMC utilization, power mode, and the installed CUDA/cuDNN/TensorRT versions in one place.

sudo pip3 install jetson-stats
sudo reboot

Check. After reboot, run jtop and confirm GPU, CPU, RAM, and CUDA version are all visible.

Step 5: Set Up Python Environment

We use uv as the Python package manager — it's significantly faster than pip and handles virtualenvs cleanly, which matters on a Jetson where every install touches an SD/NVMe and ARM wheel resolution can be slow.

# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

# Create a virtual environment with Python 3.10
uv venv env_your_project --python 3.10
source env_your_project/bin/activate

Python 3.10 is intentional: NVIDIA's official Jetson PyTorch wheel for JetPack 6 ships as cp310, so the venv must match.

Step 6: Add ~/.local/bin to PATH

uv and several pip-installed CLIs land in ~/.local/bin, which isn't always on the default PATH:

echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc

Check. echo $PATH should now show /home/<user>/.local/bin at the beginning.

Step 7: Install PyTorch (NVIDIA Jetson Wheel)

Do NOT use a plain pip install torch — PyPI will hand you an x86 build, or at best a CPU-only aarch64 build, neither of which talks to the Jetson GPU. You want NVIDIA's prebuilt wheel that is compiled against the JetPack CUDA you just installed:

# Download NVIDIA's official Jetson wheel for PyTorch 2.5.0 (Python 3.10)
wget https://developer.download.nvidia.com/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

# Install prerequisites
uv pip install numpy==1.26.1 setuptools

# Install PyTorch
uv pip install torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

The nv24.08 tag in the wheel name pins it to a specific JetPack release — make sure it matches the JetPack you flashed in Step 2.
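If you're ever unsure which release a given wheel targets, you can pull the nvYY.MM tag straight out of the filename before installing (a small sketch; the year.month tag is NVIDIA's naming convention for these builds):

```shell
# Extract the NVIDIA build tag (nvYY.MM) from a Jetson wheel filename
WHEEL="torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl"
TAG=$(echo "$WHEEL" | grep -o 'nv[0-9]*\.[0-9]*' | head -n 1)
echo "wheel built against JetPack stack tagged: $TAG"
# → wheel built against JetPack stack tagged: nv24.08
```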

Step 8: Fix Missing libcusparseLt

PyTorch 2.5+ links against cuSPARSELt, which JetPack does not include by default. Without it, import torch will fail to load CUDA kernels at runtime. Install it manually from NVIDIA's archive — note the sbsa (Server Base System Architecture) build, which is the right one for Jetson's aarch64 userspace; do not grab the x86_64 archive:

wget https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.5.2.1-archive.tar.xz
tar xf libcusparse_lt-linux-sbsa-0.5.2.1-archive.tar.xz
sudo cp -a libcusparse_lt-linux-sbsa-0.5.2.1-archive/include/* /usr/local/cuda/include/
sudo cp -a libcusparse_lt-linux-sbsa-0.5.2.1-archive/lib/*     /usr/local/cuda/lib64/
sudo ldconfig

Check. CUDA should now be visible to PyTorch:

python3 -c "import torch; print(torch.cuda.is_available())"
# Should print: True

If it still prints False, the usual suspects are: wrong wheel for your JetPack, the venv isn't activated, or ldconfig didn't pick up the new .so files.
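A few one-liners that help narrow down which of those it is (illustrative; run them with the venv activated):

```shell
# Which interpreter is actually active? It should point into your venv.
which python3

# Did ldconfig register cuSPARSELt?
ldconfig -p 2>/dev/null | grep -i cusparselt || echo "libcusparseLt not in the ldconfig cache"

# Does the installed torch carry the NVIDIA Jetson build tag (nv24.xx)?
python3 -c "import torch; print(torch.__version__)" 2>/dev/null || echo "torch not importable in this environment"
```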

Step 9: Install TorchVision (Build from Source)

There is no official torchvision wheel for Jetson that matches NVIDIA's custom torch build, so it has to be built from source. Pick the torchvision branch that pairs with your torch version — for torch 2.5.x, that's v0.20.0.

A few things worth knowing:

  • TORCH_CUDA_ARCH_LIST="8.7" is the SM architecture for Orin (Ampere). Setting it explicitly avoids torchvision compiling kernels for every GPU NVIDIA has ever made.
  • MAX_JOBS=2 keeps the parallel C++ compile from OOM-killing itself on the Orin Nano's 8 GB of RAM.
  • --no-build-isolation --no-deps ensures the build links against the torch already installed in your venv rather than pulling a fresh CPU build from PyPI.
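Putting those flags together, the build looks roughly like this (a sketch, assuming torch 2.5.x is already installed in the active venv and that upstream torchvision builds cleanly at the v0.20.0 tag):

```shell
# Clone the torchvision release that pairs with torch 2.5.x
git clone --branch v0.20.0 --depth 1 https://github.com/pytorch/vision.git
cd vision

# Compile only Orin's SM 8.7 kernels, and cap parallelism to avoid OOM
export TORCH_CUDA_ARCH_LIST="8.7"
export MAX_JOBS=2

# Link against the torch already in the venv; expect this to take a while
uv pip install . --no-build-isolation --no-deps
cd ..
```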

Check.

python3 -c "import torchvision; print(torchvision.__version__)"
# Should show: 0.20.0

Step 10: Install Depth-Anything-V3 Requirements

With Torch and torchvision in place, clone Depth-Anything-V3:

git clone https://github.com/<...>/Depth-Anything-3
cd Depth-Anything-3

The upstream requirements.txt is written for x86 + datacenter GPUs and pulls in a few packages that either have no aarch64 wheel or require a heavy from-source build that is not worth it just to run inference. The two we had to comment out to get a clean install on Jetson were:

  • xformers — no Jetson wheel; only needed for some training / memory-efficient attention paths. Inference works without it (PyTorch falls back to standard attention).
  • pycolmap — pulls in COLMAP and a long C++ build chain. Only needed for the SfM/reconstruction utilities, not for running the depth model.
  • One side effect: Depth-Anything-3/src/depth_anything_3/api.py imports utils.export, which depends on pycolmap. To avoid the import error, comment out the utils.export import in api.py as well.

Edit requirements.txt and prefix those lines with #, then:

uv pip install -r requirements.txt
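If you prefer making the requirements.txt edit from the shell, something like this works (assumes the packages appear as plain xformers / pycolmap lines; inspect the file first):

```shell
# Comment out the two packages that won't install cleanly on Jetson
sed -i 's/^xformers/# xformers/' requirements.txt
sed -i 's/^pycolmap/# pycolmap/' requirements.txt

# Confirm the lines are now commented
grep -n '^#' requirements.txt
```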

If anything else fails to build, the same principle applies: figure out whether it is on the inference path or only used by training/eval tooling, and skip it if you can.

Step 11: Running DA3-Small on CUDA

At this point the DA3-Small checkpoint runs end-to-end on the Jetson GPU. A minimal smoke test:

import torch
from depth_anything_3 import DepthAnythingV3   # adjust import to match the repo (package lives under src/depth_anything_3)

device = "cuda"
model = DepthAnythingV3.from_pretrained("depth-anything-v3-small").to(device).eval()

img = torch.randn(1, 3, 518, 518, device=device)
with torch.no_grad():
    depth = model(img)
print(depth.shape, depth.device)

You should see the tensor on cuda:0, and jtop (or tegrastats) should show the GPU lighting up.

Recap

The shape of the work was:

  1. Re-flash with SDK Manager from a Linux host, into NVMe, with the Jetson in recovery mode (60–90 minutes).
  2. Install the full JetPack CUDA stack via nvidia-jetpack.
  3. Install jtop for live monitoring.
  4. Set up a Python 3.10 virtualenv with uv, and put ~/.local/bin on PATH.
  5. Install the NVIDIA-built PyTorch wheel that matches your JetPack.
  6. Drop in the missing cuSPARSELt (sbsa build) so torch.cuda.is_available() returns True.
  7. Build torchvision from source, pinned to the matching version, with TORCH_CUDA_ARCH_LIST="8.7" and MAX_JOBS=2.
  8. Install Depth-Anything-V3's requirements with xformers and pycolmap commented out.
  9. Load DA3-Small on cuda and confirm it runs.

Most of the friction is in the flashing, the PyTorch wheel, and the torchvision build — once you have a Jetson-correct PyTorch with a matching torchvision, the rest of the ML ecosystem mostly behaves.