Updates in 2025.3

General

  • Added support for CUDA 13.0. See the tool’s CUDA driver system requirements.

  • Added or improved support for Blackwell chips.

  • For Green Context launches, launch__waves_per_multiprocessor is now scaled to the number of SMs in the Green Context.

  • Added support for profiling individual nodes of device-launchable CUDA graphs launched from the host.

  • Added metric launch__persisting_l2_cache_size to the Memory Workload Analysis section.

  • Removed metric profiler__pmsampler_dropped_samples.

  • Added support for skipping the import of SASS cubins into the report.
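As a rough illustration of the Green Context scaling noted above, the sketch below assumes that waves per multiprocessor is the grid size divided by the number of thread blocks that can be resident at once (SM count times resident blocks per SM). The helper name, its parameters, the formula, and the sample numbers are all illustrative assumptions, not the tool's actual implementation.

```python
def waves_per_multiprocessor(grid_size, sm_count, max_blocks_per_sm):
    """Hypothetical helper: blocks launched divided by blocks resident at once.

    sm_count is the number of SMs the launch can actually use. For a Green
    Context launch this would be the SMs in the Green Context, not the
    whole device (assumption based on the release note above).
    """
    if sm_count <= 0 or max_blocks_per_sm <= 0:
        raise ValueError("sm_count and max_blocks_per_sm must be positive")
    return grid_size / (sm_count * max_blocks_per_sm)


# Illustrative numbers only: the same 1024-block grid, first measured
# against a 128-SM device, then against a 32-SM Green Context.
full_device = waves_per_multiprocessor(1024, sm_count=128, max_blocks_per_sm=2)
green_context = waves_per_multiprocessor(1024, sm_count=32, max_blocks_per_sm=2)

print(full_device)    # 4.0
print(green_context)  # 16.0
```

Under these assumptions, the same grid needs four times as many waves when restricted to a quarter of the SMs, which is the effect the scaled metric is meant to reflect.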

NVIDIA Nsight Compute

NVIDIA Nsight Compute CLI

  • Added the option --forward-signals to transparently forward signals to the profiled application.
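To make the idea of signal forwarding concrete, here is a minimal conceptual sketch of a launcher that relays signals it receives to a child process. It is not how ncu implements --forward-signals; the function name run_with_forwarded_signals and the default choice of SIGINT and SIGTERM are assumptions made for illustration only.

```python
import signal
import subprocess
import sys


def run_with_forwarded_signals(cmd, signals=(signal.SIGINT, signal.SIGTERM)):
    """Hypothetical launcher: start cmd and relay the given signals to it.

    Must be called from the main thread, because Python only allows
    signal handlers to be installed there.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Relay the signal only while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    # Install the forwarding handler, remembering the previous handlers.
    previous = {sig: signal.signal(sig, forward) for sig in signals}
    try:
        # wait() resumes automatically after a handler has run.
        return child.wait()
    finally:
        # Restore the original handlers once the child has exited.
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

For example, calling run_with_forwarded_signals([sys.executable, "-c", "print('hello')"]) runs the child to completion and returns its exit code. A signal sent to the launcher in the meantime is delivered to the child, so the launcher does not terminate on its own.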

Resolved Issues

  • Fixed an issue where some ncu console messages were truncated after 1024 characters.

  • Fixed some display issues related to Green Context tables.

  • Improved the performance of remote profiling in application replay mode.

  • Fixed a hang in certain scenarios when profiling dependent kernels with device-mapped host allocations.

  • Fixed missing correlation between JIT-compiled PTX and SASS in some situations.

  • Fixed an error when profiling a CUDA graph kernel node doing a cluster launch on driver 580 or newer.