Setting Up Orin Nano for Beyond LLMs Workflows
This post walks through the full bring-up of a fresh Jetson Orin Nano: from re-flashing the OS over USB because the bundled JetPack was too old, through installing CUDA libraries and a CUDA-enabled PyTorch, all the way to running Depth-Anything-V3 (DA3-Small) on the GPU.
If you are starting with a Jetson that shipped with an older L4T release and you want to run modern ML workloads on it, this is roughly the path you will need to take.
Step 1: Why we had to re-flash
The board arrived with L4T 36.4.3 (an older JetPack 6.x point release). Most current ML stacks — and in particular the NVIDIA-built PyTorch wheels for Jetson — target a newer JetPack. Trying to install them on the old base image produced CUDA/driver mismatches that no amount of pip gymnastics was going to fix.
The fix is to re-flash the device using NVIDIA SDK Manager running on a separate Linux host.
A second wrinkle: the microSD card slot was not reliably detected during flashing. We worked around this by flashing JetPack directly onto an NVMe SSD installed in the M.2 slot, which also turned out to be much faster at runtime.
Step 2: Re-flashing with SDK Manager (host-side)
You will need:
- A separate Linux machine (Ubuntu 22.04 worked for us) with NVIDIA SDK Manager installed.
- A USB-C cable from the Jetson's "recovery" USB port to the host.
- The NVMe SSD already seated in the Jetson's M.2 slot.
Steps:
- Put the Jetson into Force Recovery mode. With the device powered off, jumper the `FC REC` and `GND` pins on the carrier board's button header (or hold the Recovery button on dev kits), then plug in power. The board will boot into recovery silently, with no display output.
- On the host, confirm the device is visible:

```shell
lsusb | grep -i nvidia
```

You should see an `NVIDIA Corp.` entry.
- Launch SDK Manager, log in, and pick the target board. Select the latest JetPack that supports your module.
- Important: in the storage step, choose NVMe as the install target rather than the eMMC/SD slot. This both avoids the SD-detection issue and gives you a much larger, faster root filesystem.
- Let the host download and flash. The full download, flash, and on-device installation takes 60–90 minutes end to end depending on your network.
- When prompted, complete the on-device OEM setup (username, password, locale) over the serial console or with a display attached.

After this you should be booted from NVMe with a current JetPack, current kernel, and a matching CUDA toolkit.
Step 3: Install JetPack CUDA Components
Even after a successful SDK Manager flash, you'll want to make sure the full JetPack stack — CUDA, cuDNN, TensorRT, and VPI — is installed and up to date on the device:
```shell
sudo apt update
sudo apt install nvidia-jetpack
```

Check. Verify the CUDA compiler is available and the toolkit is in place:

```shell
nvcc --version
ls /usr/local/cuda
```

If `jtop` later shows CUDA as `MISSING`, this is often just a display issue with `/etc/nv_tegra_release`; verify CUDA is actually present with `nvcc --version` rather than trusting the dashboard.
Step 4: Install jtop (System Monitor)
jtop is the go-to live monitor on Jetson — it shows GPU/CPU/RAM/EMC utilization, power mode, and the installed CUDA/cuDNN/TensorRT versions in one place.
```shell
sudo pip3 install jetson-stats
sudo reboot
```

Check. After reboot, run `jtop` and confirm GPU, CPU, RAM, and CUDA version are all visible.
Step 5: Set Up Python Environment
We use uv as the Python package manager — it's significantly faster than pip and handles virtualenvs cleanly, which matters on a Jetson where every install touches an SD/NVMe and ARM wheel resolution can be slow.
```shell
# Install uv (fast Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc

# Create a virtual environment with Python 3.10
uv venv env_your_project --python 3.10
source env_your_project/bin/activate
```

Python 3.10 is intentional: NVIDIA's official Jetson PyTorch wheel for JetPack 6 ships as `cp310`, so the venv must match.
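A mismatched interpreter only surfaces later as a confusing "not a supported wheel on this platform" error, so it is worth confirming the tag up front. A quick stdlib check (run inside the activated venv):

```python
# Print the CPython wheel tag for the current interpreter; inside the
# env_your_project venv this must be cp310 to match NVIDIA's wheel.
import sys

tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
print(tag)
```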
Step 6: Add ~/.local/bin to PATH
uv and several pip-installed CLIs land in ~/.local/bin, which isn't always on the default PATH:
```shell
echo 'export PATH="$HOME/.local/bin:$PATH"' >> ~/.bashrc
source ~/.bashrc
```

Check. `echo $PATH` should now show `/home/<user>/.local/bin` at the beginning.
Step 7: Install PyTorch (NVIDIA Jetson Wheel)
Do NOT use a plain pip install torch — PyPI will hand you an x86 build, or at best a CPU-only aarch64 build, neither of which talks to the Jetson GPU. You want NVIDIA's prebuilt wheel that is compiled against the JetPack CUDA you just installed:
```shell
# Download NVIDIA's official Jetson wheel for PyTorch 2.5.0 (Python 3.10)
wget https://developer.download.nvidia.com/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

# Install prerequisites
uv pip install numpy==1.26.1 setuptools

# Install PyTorch
uv pip install torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl
```

The `nv24.08` tag in the wheel name pins it to a specific JetPack release; make sure it matches the JetPack you flashed in Step 2.
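Everything that has to line up is encoded in the wheel filename itself. A purely illustrative parse of the filename from above makes the constraints explicit:

```python
# Wheel filenames follow {name}-{version}-{python tag}-{abi tag}-{platform}.whl;
# each field below has to match your venv and device.
wheel = "torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl"
name, version, py_tag, abi_tag, platform = wheel[: -len(".whl")].split("-")

print(version)   # 2.5.0a0+872d972e41.nv24.08.17622132 -> torch 2.5.0, nv24.08 JetPack build
print(py_tag)    # cp310 -> must match your Python 3.10 venv
print(platform)  # linux_aarch64 -> Jetson's 64-bit ARM userspace
```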
Step 8: Fix Missing libcusparseLt
PyTorch 2.5+ links against cuSPARSELt, which JetPack does not include by default. Without it, import torch will fail to load CUDA kernels at runtime. Install it manually from NVIDIA's archive — note the sbsa (Server Base System Architecture) build, which is the right one for Jetson's aarch64 userspace; do not grab the x86_64 archive:
```shell
wget https://developer.download.nvidia.com/compute/cusparselt/redist/libcusparse_lt/linux-sbsa/libcusparse_lt-linux-sbsa-0.5.2.1-archive.tar.xz
tar xf libcusparse_lt-linux-sbsa-0.5.2.1-archive.tar.xz
sudo cp -a libcusparse_lt-linux-sbsa-0.5.2.1-archive/include/* /usr/local/cuda/include/
sudo cp -a libcusparse_lt-linux-sbsa-0.5.2.1-archive/lib/* /usr/local/cuda/lib64/
sudo ldconfig
```

Check. CUDA should now be visible to PyTorch:

```shell
python3 -c "import torch; print(torch.cuda.is_available())"
# Should print: True
```

If it still prints `False`, the usual suspects are: wrong wheel for your JetPack, the venv isn't activated, or `ldconfig` didn't pick up the new `.so` files.
Step 9: Install TorchVision (Build from Source)
There is no official torchvision wheel for Jetson that matches NVIDIA's custom torch build, so it has to be built from source. Pick the torchvision branch that pairs with your torch version — for torch 2.5.x, that's v0.20.0.
A few things worth knowing:
- `TORCH_CUDA_ARCH_LIST="8.7"` is the SM architecture for Orin (Ampere). Setting it explicitly avoids torchvision compiling kernels for every GPU NVIDIA has ever made.
- `MAX_JOBS=2` keeps the parallel C++ compile from OOM-killing itself on the Orin Nano's 8 GB of RAM.
- `--no-build-isolation --no-deps` ensures the build links against the `torch` already installed in your venv rather than pulling a fresh CPU build from PyPI.
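Putting those flags together, the build looks roughly like this (a sketch, not a verbatim transcript: the repo URL and tag are upstream pytorch/vision's, and it assumes the venv from Step 5 is active):

```shell
# Build torchvision v0.20.0 against the NVIDIA torch already in the venv.
git clone --branch v0.20.0 --depth 1 https://github.com/pytorch/vision.git
cd vision

# Orin is SM 8.7; cap parallel jobs so the 8 GB Orin Nano doesn't OOM.
export TORCH_CUDA_ARCH_LIST="8.7"
export MAX_JOBS=2

# --no-build-isolation / --no-deps: link against the installed torch
# instead of letting the resolver pull a CPU-only torch from PyPI.
uv pip install --no-build-isolation --no-deps .
cd ..
```

Expect the compile itself to take a while at `MAX_JOBS=2`; that is the price of not running out of memory.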
Check.
```shell
python3 -c "import torchvision; print(torchvision.__version__)"
# Should show: 0.20.0
```

Step 10: Depth-Anything-V3 — installing the requirements
With Torch and torchvision in place, clone Depth-Anything-V3:
```shell
git clone https://github.com/<...>/Depth-Anything-3
cd Depth-Anything-3
```

The upstream `requirements.txt` is written for x86 + datacenter GPUs and pulls in a few packages that either have no aarch64 wheel or require a heavy from-source build that is not worth it just to run inference. The two we had to comment out to get a clean install on Jetson were:
- `xformers`: no Jetson wheel; only needed for some training / memory-efficient attention paths. Inference works without it (PyTorch falls back to standard attention).
- `pycolmap`: pulls in COLMAP and a long C++ build chain. Only needed for the SfM/reconstruction utilities, not for running the depth model.
- Note that `Depth-Anything-3/src/depth_anything_3/api.py` tries to import `utils.export`, which has a `pycolmap` dependency; to avoid the import error, we just comment out the `utils.export` import in `api.py`.
Edit `requirements.txt` and prefix those lines with `#`, then:

```shell
uv pip install -r requirements.txt
```

If anything else fails to build, the same principle applies: figure out whether it is on the inference path or only used by training/eval tooling, and skip it if you can.
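If you would rather script the edit than open the file by hand, a sed one-liner does the same commenting-out (GNU sed, run from the repo root; guarded so it is a no-op elsewhere):

```shell
# Comment out the two packages with no Jetson-friendly install path.
if [ -f requirements.txt ]; then
    sed -i -e 's/^xformers/# xformers/' -e 's/^pycolmap/# pycolmap/' requirements.txt
fi
```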
Step 11: Running DA3-Small on CUDA
At this point the DA3-Small checkpoint runs end-to-end on the Jetson GPU. A minimal smoke test:
```python
import torch
from depth_anything_v3 import DepthAnythingV3  # adjust import to match repo

device = "cuda"
model = DepthAnythingV3.from_pretrained("depth-anything-v3-small").to(device).eval()

img = torch.randn(1, 3, 518, 518, device=device)
with torch.no_grad():
    depth = model(img)

print(depth.shape, depth.device)
```

You should see the tensor on `cuda:0`, and `jtop` (or `tegrastats`) should show the GPU lighting up.
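One detail worth noting in the smoke test is the 518×518 input. The DINOv2-style ViT backbones the Depth Anything family builds on consume 14-pixel patches, so input sides should be multiples of 14 (518 = 37 × 14); treat the patch size as an assumption and confirm it against the repo's preprocessing code. A small helper to snap arbitrary sizes onto that grid:

```python
# Snap an image side length to the ViT patch grid (patch size assumed to be
# 14, as in the DINOv2 backbones; check the repo's preprocessing to confirm).
PATCH = 14

def nearest_valid_size(x: int, patch: int = PATCH) -> int:
    """Round x to the nearest positive multiple of the patch size."""
    return max(patch, round(x / patch) * patch)

print(nearest_valid_size(518))  # 518: already 37 * 14
print(nearest_valid_size(640))  # 644: nearest multiple of 14
```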
Recap
The shape of the work was:
- Re-flash with SDK Manager from a Linux host, into NVMe, with the Jetson in recovery mode (60–90 minutes).
- Install the full JetPack CUDA stack via `nvidia-jetpack`.
- Install `jtop` for live monitoring.
- Set up a Python 3.10 virtualenv with `uv`, and put `~/.local/bin` on `PATH`.
- Install the NVIDIA-built PyTorch wheel that matches your JetPack.
- Drop in the missing cuSPARSELt (`sbsa` build) so `torch.cuda.is_available()` returns `True`.
- Build torchvision from source, pinned to the matching version, with `TORCH_CUDA_ARCH_LIST="8.7"` and `MAX_JOBS=2`.
- Install Depth-Anything-V3's requirements with `xformers` and `pycolmap` commented out.
- Load DA3-Small on `cuda` and confirm it runs.
Most of the friction is in the flashing, the PyTorch wheel, and the torchvision build — once you have a Jetson-correct PyTorch with a matching torchvision, the rest of the ML ecosystem mostly behaves.