# Analyze performance with Taichi Profiler
Ref: Yuanming Hu, Taichi Cookbook 001, 27.07.22

My program takes a long time to run. How can I find out which Taichi kernel is the most time-consuming?

In [1]:
import taichi as ti

[Taichi] version 1.7.3, llvm 15.0.4, commit 5ec301be, linux, python 3.12.9




In [3]:
ti.init(arch=ti.cpu, kernel_profiler=True)
f = ti.field(ti.i32, shape=(32,32))

[Taichi] Starting on arch=x64


In [4]:
@ti.kernel
def foo():
    for i in range(400_000):
        f[0, 0] += i
    
    for j in range(100_000):
        f[0, 1] += j

@ti.kernel
def bar():
    for i in range(1000):
        a = f[0, 31]

In [7]:
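# Call foo() ten times and bar() once to generate profiling data.
# Profiler records accumulate until they are cleared, so rerunning this
# cell adds to the counts (the 20x in the report below is consistent
# with the cell having been run twice).
for i in range(10):
    foo()
bar()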

# Print the kernel profiler information
ti.sync()
ti.profiler.print_kernel_profiler_info()

Kernel Profiler(count, default) @ X64 
[      %     total   count |      min       avg       max   ] Kernel name
-------------------------------------------------------------------------
[ 95.90%   0.192 s     20x |    8.179     9.618    10.904 ms] foo_c76_0_kernel_0_range_for
[  4.10%   0.008 s     20x |    0.275     0.411     0.638 ms] foo_c76_0_kernel_1_range_for
-------------------------------------------------------------------------
[100.00%] Total execution time:   0.201 s   number of results: 2
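
The columns of the report appear to be related by simple arithmetic: `total` matches `count` × `avg`, and the percentage is each record's share of the summed totals. The sketch below checks this against the two record lines shown above, using only the Python standard library. The regular expression `RECORD` and the helper `parse_records` are hypothetical names, and the line layout is inferred from this one report rather than from a documented format.

```python
import re

# The two record lines copied from the report above.
report = """\
[ 95.90%   0.192 s     20x |    8.179     9.618    10.904 ms] foo_c76_0_kernel_0_range_for
[  4.10%   0.008 s     20x |    0.275     0.411     0.638 ms] foo_c76_0_kernel_1_range_for
"""

# Hypothetical pattern for one record line (assumption: layout as printed above).
RECORD = re.compile(
    r"\[\s*(?P<pct>[\d.]+)%\s+(?P<total>[\d.]+) s\s+(?P<count>\d+)x \|"
    r"\s+(?P<min>[\d.]+)\s+(?P<avg>[\d.]+)\s+(?P<max>[\d.]+) ms\]\s+(?P<name>\S+)"
)


def parse_records(text):
    """Return one dict per record line, with the total recomputed as count * avg."""
    records = []
    for line in text.splitlines():
        m = RECORD.match(line)
        if m:
            count = int(m["count"])
            avg_ms = float(m["avg"])
            records.append({
                "name": m["name"],
                "count": count,
                "avg_ms": avg_ms,
                "reported_pct": float(m["pct"]),
                "total_ms": count * avg_ms,
            })
    return records


records = parse_records(report)
grand_total_ms = sum(r["total_ms"] for r in records)

# Rank the records by recomputed total time, most expensive first.
for r in sorted(records, key=lambda r: r["total_ms"], reverse=True):
    share = 100 * r["total_ms"] / grand_total_ms
    print(f'{r["name"]}: {r["total_ms"]:.2f} ms, {share:.2f}% of total')
```

Sorting by the recomputed total makes the most expensive task stand out immediately, which is the question this recipe set out to answer.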


To sum up:

1. A kernel that the compiler optimizes away entirely generates no profiling records. The `bar` kernel above is such a case: its loop only reads `f[0, 31]` into an unused local variable, so it does not appear in the report.

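2. One kernel may generate multiple records, because its parallel for loops are divided into separate tasks that are launched and timed individually. Here `foo` produces two records, `foo_c76_0_kernel_0_range_for` and `foo_c76_0_kernel_1_range_for`, one per loop.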

3. If the program runs on a GPU, make sure you call `ti.sync()` before reading the profiling results.

4. Records named `jit_evaluator_xxx` can be ignored; the system generates them automatically.

5. Currently, `kernel_profiler` supports only the CPU and CUDA backends. (Contributions that add more backends are very welcome!)

6. Run the profiling several times and look at the minimum or average execution time rather than relying on a single run.