Updates in 2025.3.1

General

  • Improved the charts in the Compute Workload Analysis section to better distinguish between per_cycle_active and per_cycle_elapsed metrics.

Resolved Issues

  • Fixed an issue where kernels using the compile-time attribute __block_size__ were launched with incorrect grid dimensions.

  • Fixed an issue where timeline y-axis labels showed unexpected units for small maximum values.

  • Fixed a crash when stepping applications in interactive profiling mode.

  • Fixed an issue where roofline charts did not show achieved value data in some cases.

  • Fixed an issue where duplicate tooltips could be shown for some links in the Memory Chart.

  • Fixed a potential hang when setting --pm-sampling-buffer-size to very large values.

  • Fixed several rules so that they no longer show non-actionable warnings for unsupported or missing metrics when profiling on mobile chips.