## Nsight Profilers
NVIDIA provides two profilers, Nsight Compute (`ncu`) and Nsight Systems (`nsys`), along with a GUI viewer (`nsys-ui`).

Unless you change the system-wide default configuration of the NVIDIA kernel module, `ncu` needs superuser permission **to generate any profiling report**.

Also, install the conda package `cuda` or its subset `nvidia::nsight-compute`.

To profile, switch to the root account (or use `sudo`), `conda activate` your environment, and run:

In [6]:
%%bash
echo YOUR_ROOT_PASSWD_HERE | sudo -S ncu --set detailed -o test.ncu-rep -- python -c 'import torch; print(torch.tensor([1], device="cuda:0") * 10)'

[sudo] password for tk: 

==PROF== Connected to process 401642 (/home/tk/Desktop/jupyter/simp-intelligence/.pixi/envs/default/bin/python3.13)
==PROF== Profiling "vectorized_elementwise_kernel" - 0: 0%....50%....100% - 18 passes
tensor([10], device='cuda:0')
==PROF== Disconnected from process 401642
==PROF== Report: /home/tk/Desktop/jupyter/simp-intelligence/simp_intelligence/cuda/test.ncu-rep
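
The same invocation can be assembled in Python, which avoids pasting the root password into a notebook cell. This is only a sketch: `build_ncu_command` is a hypothetical helper, and it uses just the flags shown above (`--set`, `-o`, `--`).

```python
import shlex


def build_ncu_command(report_path, target_argv, section_set="detailed"):
    """Return the argv list for an ncu run that writes a report file.

    Hypothetical helper: it only assembles the flags used in the cell above.
    """
    if not target_argv:
        raise ValueError("target_argv must name the program to profile")
    return ["ncu", "--set", section_set, "-o", report_path, "--", *target_argv]


cmd = build_ncu_command(
    "test.ncu-rep",
    ["python", "-c", 'import torch; print(torch.tensor([1], device="cuda:0") * 10)'],
)

# Print a shell-quoted version that can be pasted into a root terminal.
print(shlex.join(cmd))

# To launch it directly (needs ncu, a CUDA GPU, and a terminal for the sudo prompt):
# import subprocess
# subprocess.run(["sudo", *cmd], check=True)
```

Passing the arguments as a list keeps the quoting of the `python -c` snippet intact, which is easy to get wrong in a hand-written shell line.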


If you ever see 
> ==ERROR== ERR_NVGPUCTRPERM - The user does not have permission to access NVIDIA GPU Performance Counters on the target device 0. For instructions on enabling permissions and to get more information see https://developer.nvidia.com/ERR_NVGPUCTRPERM

it means you are running as a non-root user and the NVIDIA kernel module has not been configured to let non-admin users access the GPU performance counters (see ref 1).
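
To check whether the restriction is active, you can inspect the driver parameters. The sketch below assumes the driver exposes them at `/proc/driver/nvidia/params` and reports the setting as `RmProfilingAdminOnly: 1` (admin only) or `0` (all users); both the path and the key name are assumptions about your driver version, and the function names are made up for illustration.

```python
from pathlib import Path

# Assumption: the NVIDIA driver lists its parameters here, one "Key: value" per line.
PARAMS_PATH = Path("/proc/driver/nvidia/params")


def profiling_admin_only(params_text):
    """Return True/False for the RmProfilingAdminOnly value, or None if absent."""
    for line in params_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "RmProfilingAdminOnly":
            return value.strip() == "1"
    return None


def check_system():
    """Read the live setting; returns None when the driver file is missing."""
    if not PARAMS_PATH.exists():
        return None
    return profiling_admin_only(PARAMS_PATH.read_text())


if __name__ == "__main__":
    state = check_system()
    if state is None:
        print("Could not determine the profiling restriction")
    elif state:
        print("Profiling is restricted to admin users")
    else:
        print("Profiling is allowed for all users")
```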

To list the section sets available in `ncu` (the estimated metric count indicates the relative runtime cost of each set):

In [7]:
!ncu --list-sets

---------- --------------------------------------------------------------------------- ------- -----------------
Identifier Sections                                                                    Enabled Estimated Metrics
---------- --------------------------------------------------------------------------- ------- -----------------
basic      LaunchStats, Occupancy, SpeedOfLight, WorkloadDistribution                  yes     191              
detailed   ComputeWorkloadAnalysis, LaunchStats, MemoryWorkloadAnalysis, MemoryWorkloa no      560              
           dAnalysis_Chart, Occupancy, SourceCounters, SpeedOfLight, SpeedOfLight_Roof                          
           lineChart, WorkloadDistribution                                                                      
full       ComputeWorkloadAnalysis, InstructionStats, LaunchStats, MemoryWorkloadAnaly no      6581             
           sis, MemoryWorkloadAnalysis_Chart, MemoryWorkloadAnalysis_Tables, NumaAffin          
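
If you want to compare sets programmatically, the table can be parsed with a small regular expression. This is a rough sketch that assumes the column layout shown above (identifier first, then `yes`/`no`, then the metric count on the same line); `parse_sets` is a hypothetical helper, and wrapped continuation lines are simply skipped.

```python
import re

# A data row starts with the identifier and ends with "<yes|no> <count>".
# Header, separator, and wrapped continuation lines do not match.
ROW = re.compile(r"^(\S+)\s+.*?\s+(yes|no)\s+(\d+)\s*$")


def parse_sets(table_text):
    """Map each set identifier to its enabled flag and estimated metric count."""
    sets = {}
    for line in table_text.splitlines():
        match = ROW.match(line)
        if match:
            name, enabled, metrics = match.groups()
            sets[name] = {"enabled": enabled == "yes", "metrics": int(metrics)}
    return sets


sample = """\
Identifier Sections                                                    Enabled Estimated Metrics
basic      LaunchStats, Occupancy, SpeedOfLight, WorkloadDistribution  yes     191
detailed   ComputeWorkloadAnalysis, LaunchStats, MemoryWorkloa         no      560
           dAnalysis_Chart, Occupancy, SourceCounters
"""

sets = parse_sets(sample)
# Pick the set with the fewest estimated metrics, i.e. the lowest overhead.
cheapest = min(sets, key=lambda name: sets[name]["metrics"])
print(cheapest, sets[cheapest])
```

In practice you would feed it the captured output of the listing command instead of the hard-coded `sample`.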

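To use `ncu` inside Docker, or as a non-root user in general, change the NVIDIA kernel module option so that all users may access the performance counters (also see ref 2):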
```bash
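$ systemctl isolate multi-user   # leave the graphical session so the NVIDIA modules can be unloaded
$ sudo modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia-vgpu-vfio nvidia   # unload the NVIDIA kernel modules
$ sudo vim /etc/modprobe.d/nvidia-compute.conf   # add the line: options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
$ sudo update-initramfs -u   # rebuild the initramfs so the option is applied at boot
$ sudo reboot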
```

## References
1. https://developer.nvidia.com/nvidia-development-tools-solutions-err_nvgpuctrperm-permission-issue-performance-counters
2. https://forums.developer.nvidia.com/t/use-nsight-compute-in-docker/73344/8