How to Write CUDA GPU Kernels in Rust with NVIDIA's cuda-oxide Compiler


Introduction

NVIDIA's cuda-oxide is an experimental compiler that lets you write CUDA SIMT (Single Instruction, Multiple Threads) GPU kernels directly in standard Rust. Instead of switching to C++ or relying on Python-level abstractions, you can now compile Rust code straight to PTX (Parallel Thread Execution)—the intermediate representation used by CUDA for NVIDIA GPUs. This guide walks you through setting up and using cuda-oxide to create your first Rust-based GPU kernel, explaining the unique compilation pipeline and how it fits into the Rust GPU ecosystem.

Source: www.marktechpost.com

What You Need

  1. Rust installed via rustup, plus the nightly toolchain (Step 1).
  2. A checkout of the cuda-oxide repository (Step 2).
  3. The CUDA Toolkit, which provides nvcc and ptxas for Steps 6 and 7.
  4. An NVIDIA GPU if you want to run the compiled kernel; compiling to PTX alone does not require one.

Step-by-Step Guide

Step 1: Set Up the Rust Nightly Toolchain

cuda-oxide uses unstable Rust features that are only available in nightly builds. Install nightly via rustup and set it as default for your project (or use a toolchain override).

  1. Install Rust if you haven't: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
  2. Install nightly: rustup install nightly
  3. Set nightly as default: rustup default nightly (or use rustup override set nightly in your project directory).
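If you prefer not to change your global default, the standard rustup convention is to check a rust-toolchain.toml file into the crate root; rustup then selects nightly automatically whenever you build inside that directory:

```toml
# rust-toolchain.toml — picked up automatically by rustup in this directory
[toolchain]
channel = "nightly"
```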

Step 2: Clone and Build cuda-oxide

Download the cuda-oxide repository. It includes the custom codegen backend and all necessary crates (like rustc-codegen-cuda and Pliron-based dialects).

  1. Clone the repo: git clone https://github.com/NVIDIA/cuda-oxide.git
  2. Navigate into the directory: cd cuda-oxide
  3. Build the project: cargo build --release
  4. Note the path to the compiled backend – you'll use it later to compile kernels.

Step 3: Create a New Rust Project for Your Kernel

cuda-oxide compiles entire #![no_std] crates into PTX, so the kernel lives in a library crate with a few crate-level attributes.

  1. Create a new cargo project: cargo new my_gpu_kernel --lib
  2. Edit Cargo.toml: under [lib], set crate-type = ["lib"], and under [package], set edition = "2021".
  3. Add #![no_std] and #![feature(abi_c_cmse_nonsecure_call)] (or other required features) to lib.rs.
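Putting the settings from this step together, a minimal Cargo.toml might look like the following (the crate name comes from the cargo new command above; the rest is standard Cargo syntax):

```toml
[package]
name = "my_gpu_kernel"
version = "0.1.0"
edition = "2021"

[lib]
crate-type = ["lib"]
```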

Step 4: Write a SIMT Kernel in Rust

Define a function that will run on the GPU. Export it with #[no_mangle] so its symbol survives into the PTX (cuda-oxide also provides a #[cuda_kernel] attribute for marking kernels), and read the thread coordinates via device intrinsics.

// lib.rs
#![no_std]
// Nightly feature gate required by the backend (see Step 3).
#![feature(abi_c_cmse_nonsecure_call)]

// Device intrinsics exposing the CUDA thread/block coordinates.
extern crate cuda_oxide_intrinsics;
use cuda_oxide_intrinsics::{thread_idx_x, block_idx_x, block_dim_x};

// Element-wise vector addition: each GPU thread handles one element.
#[no_mangle]
pub unsafe extern "C" fn vector_add(
    a: *const f32,
    b: *const f32,
    c: *mut f32,
    n: u32,
) {
    // Global index of this thread across the whole grid.
    let idx = block_idx_x() * block_dim_x() + thread_idx_x();
    // Guard against threads past the end of the buffers.
    if idx < n {
        *c.add(idx as usize) = *a.add(idx as usize) + *b.add(idx as usize);
    }
}
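The idx computation gives each GPU thread a unique global index, and the bounds check handles the case where the grid launches more threads than there are elements. A CPU-side model (plain Rust, no GPU required; the function name here is illustrative) shows how the block/thread coordinates tile the input exactly once:

```rust
// CPU-side model of the SIMT index computation in vector_add.
// Each (block, thread) pair maps to one global index; indices >= n
// are skipped by the bounds check, exactly as in the kernel.
fn global_indices(grid_dim: u32, block_dim: u32, n: u32) -> Vec<u32> {
    let mut covered = Vec::new();
    for block_idx in 0..grid_dim {
        for thread_idx in 0..block_dim {
            // Same formula as the kernel body.
            let idx = block_idx * block_dim + thread_idx;
            if idx < n {
                covered.push(idx);
            }
        }
    }
    covered
}
```

For example, a launch of 4 blocks of 256 threads (1024 threads total) covers n = 1000 elements with each index touched exactly once; the last 24 threads fail the bounds check and do nothing.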

Step 5: Compile Your Kernel with cuda-oxide

Invoke the cuda-oxide compiler (a custom rustc wrapper) to produce a PTX file.

  1. Build using the custom codegen backend (note that -Zcodegen-backend is a rustc flag, so pass it via RUSTFLAGS): RUSTFLAGS="-Zcodegen-backend=/path/to/cuda-oxide/target/release/librustc_codegen_cuda.so" cargo +nightly build --release --target-dir ptx
  2. Find the generated PTX file in ptx/release/my_gpu_kernel.ptx.

Step 6: Integrate PTX into a CUDA Host Program (Optional)

To run the kernel on actual hardware, you need a C/C++ host program that loads the PTX and launches it. cuda-oxide focuses on the kernel compilation step; you can use standard CUDA runtime APIs.

  1. Write a simple host program (e.g., in C) that calls cuModuleLoadData and cuLaunchKernel with the PTX.
  2. Compile with NVCC, linking the CUDA driver library that provides cuModuleLoadData and cuLaunchKernel: nvcc host.cu -o host -lcuda
  3. Run: ./host
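Whatever host language you use, the launch configuration follows the same rule: enough blocks to cover all n elements, computed by ceiling division. A small sketch in Rust (the arithmetic is identical in C; the function name is illustrative):

```rust
// Number of blocks needed so that grid_dim * block_dim >= n.
// This is the grid dimension the host passes to cuLaunchKernel.
fn grid_dim_for(n: u32, block_dim: u32) -> u32 {
    // Ceiling division: rounds up instead of truncating.
    (n + block_dim - 1) / block_dim
}
```

With a block size of 256, n = 1000 needs 4 blocks (1024 threads); the surplus threads are filtered out by the kernel's bounds check.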

Step 7: Test and Debug

Validate the kernel by comparing its output with a CPU implementation of the same computation. You can also run the PTX through ptxas (e.g., ptxas -arch=sm_80 ptx/release/my_gpu_kernel.ptx -o kernel.cubin) to confirm it assembles into a cubin. Note that cuda-oxide is experimental, so expect occasional failures.
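For the CPU comparison, a plain-Rust reference implementation of the vector addition is enough; copy the GPU output buffer back to the host and run it through a check like this (both function names are illustrative):

```rust
// CPU reference for vector_add: out[i] = a[i] + b[i].
fn vector_add_cpu(a: &[f32], b: &[f32]) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| x + y).collect()
}

// Compare GPU output against the reference within a small tolerance,
// since floating-point results need not match bit-for-bit.
fn matches_reference(gpu_out: &[f32], expected: &[f32], eps: f32) -> bool {
    gpu_out.len() == expected.len()
        && gpu_out.iter().zip(expected).all(|(g, e)| (g - e).abs() <= eps)
}
```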

Tips for Success

  1. Keep kernel crates #![no_std] and free of heavy dependencies; the device target has no standard library.
  2. Inspect the generated PTX (it is plain text) and assemble it with ptxas before debugging on hardware.
  3. Compare every kernel's output against a simple CPU implementation.
  4. cuda-oxide is experimental, so pin the exact nightly toolchain once a build succeeds.
