# Profiling in Python

"Premature optimization is the root of all evil" - Donald Knuth

It’s usually more important that your code meets the business requirements and that other team members can understand it than that it’s the most efficient solution.

The actual time-saver might be elsewhere, for example in being able to quickly extend your code with new features to meet user needs.

Sometimes, the return on investment in performance optimizations just isn’t worth the effort. If you only run your code once or twice, or if it takes longer to improve the code than execute it, then what’s the point?

Code will often become faster just as a result of fixing the bugs and refactoring. One of the creators of Erlang once said:

"Make it work, then make it beautiful, then if you really, really have to, make it fast. 90 percent of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!" - Joe Armstrong

Optimize performance as a final step if it's necessary.


Software profiling is the process of collecting and analyzing various metrics of a running program to identify performance bottlenecks.

Profiling can tell you whether optimizing the code is necessary at all and, if so, which parts of the code to focus on.

Note: A performance profiler is a valuable tool for identifying hot spots in existing code, but it won’t tell you how to write efficient code from the start.

It’s often the choice of the underlying algorithm or data structure that can make the biggest difference. Even when you throw the most advanced hardware available on the market at some computational problem, an algorithm with a poor time or space complexity may never finish in a reasonable time.
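As a rough illustration of how much a data structure can matter, the sketch below times the same membership test against a list and a set. The workload (100,000 integers and a value that is never found) is a made-up assumption, and absolute timings will differ between machines.

```python
from timeit import timeit

# Hypothetical workload: look up a value that is absent from 100,000 integers
items_list = list(range(100_000))
items_set = set(items_list)
missing = -1

# A list is scanned element by element, so a failed lookup is O(n)
list_time = timeit(lambda: missing in items_list, number=200)

# A set uses hashing, so the same lookup is O(1) on average
set_time = timeit(lambda: missing in items_set, number=200)

print(f"list lookup: {list_time:.4f} seconds for 200 lookups")
print(f"set lookup:  {set_time:.6f} seconds for 200 lookups")
```

Both containers give the same answer, but the list version typically takes orders of magnitude longer, and no amount of micro-tuning of the list lookup would close that gap.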

Bottlenecks can arise for a number of reasons, including excessive memory use, inefficient CPU utilization, or a suboptimal data layout that causes frequent cache misses and increases latency.

A bottleneck might also lie not in the code’s own execution time but in network communication.

There are different types of Python profilers, so pick the right tool for the job:
- Timers like the `time` and `timeit` standard library modules, or the `codetiming` third-party package
- Deterministic profilers like `profile`, `cProfile`, and `line_profiler`
- Statistical profilers like `Pyinstrument` and the Linux `perf` profiler
- `Scalene`, a sampling profiler that tracks both CPU and memory usage
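To make the difference between deterministic and statistical profilers more concrete, here is a toy sketch of the statistical approach: a background thread periodically peeks at what the main thread is executing and counts the function names it sees. It is not how `Pyinstrument` or `perf` are implemented. The names `sample_stacks` and `busy_work` are hypothetical, and `sys._current_frames()` is a CPython-specific function.

```python
import collections
import sys
import threading
import time


def sample_stacks(thread_id, counts, stop_event, interval=0.001):
    """Record which function the target thread is running at each sample."""
    while not stop_event.is_set():
        # CPython-specific: maps thread ids to their current stack frames
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            counts[frame.f_code.co_name] += 1
        time.sleep(interval)


def busy_work():
    # Hypothetical CPU-bound workload to sample
    total = 0
    for i in range(3_000_000):
        total += i * i
    return total


counts = collections.Counter()
stop_event = threading.Event()
sampler = threading.Thread(
    target=sample_stacks,
    args=(threading.get_ident(), counts, stop_event),
)
sampler.start()
try:
    result = busy_work()
finally:
    stop_event.set()
    sampler.join()

# Functions that show up in more samples consumed more time
print(counts.most_common(3))
```

Because the sampler only looks at the program every so often, its overhead stays low, but the counts are estimates and vary from run to run. A deterministic profiler records every call instead.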


## `time` for measuring the exact execution time

The time module is versatile and quick to set up, making it suitable for temporary checks. It’ll give you a faithful impression of runtime in real-world conditions, taking into account factors like the current system load.

```python
import time


# Ask the OS to suspend the current thread of execution for 1.75 seconds.
# During this time, the function remains dormant without occupying the CPU.
def sleeper():
    time.sleep(1.75)


# Perform busy waiting, wasting CPU cycles without doing any useful work
def spinlock():
    for _ in range(100_000_000):
        pass


for function in sleeper, spinlock:
    # Get the wall-clock time and the CPU time
    t1 = time.perf_counter(), time.process_time()
    function()
    t2 = time.perf_counter(), time.process_time()
    print(f"{function.__name__}()")
    print(f" Real time: {t2[0] - t1[0]:.2f} seconds")
    print(f" CPU time: {t2[1] - t1[1]:.2f} seconds")
    print()
```

```
sleeper()
 Real time: 1.75 seconds
 CPU time: 0.00 seconds

spinlock()
 Real time: 1.95 seconds
 CPU time: 1.95 seconds
```
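If you take such measurements often, the two clock readings can be wrapped in a small context manager. The `stopwatch` helper below is a hypothetical convenience written for this note, not part of the `time` module.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label):
    """Hypothetical helper: report wall-clock and CPU time of a code block."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    try:
        yield
    finally:
        # Report even if the timed block raises an exception
        wall = time.perf_counter() - wall_start
        cpu = time.process_time() - cpu_start
        print(f"{label}: real {wall:.2f} s, CPU {cpu:.2f} s")


with stopwatch("sleep"):
    time.sleep(0.2)
```

A large gap between real time and CPU time, as in this sleeping example, suggests that the code spends most of its time waiting rather than computing.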



## `timeit` for benchmarking code snippets

Timing results can be skewed by factors such as system load, garbage collection, or other processes running concurrently. The `timeit` module helps mitigate these factors: it runs the snippet repeatedly and, by default, temporarily turns off garbage collection while timing, which gives a more reliable measure of execution time.

```python
from timeit import timeit


# A recursive function that calculates the nth element of the Fibonacci sequence
def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


# Run the snippet 100 times and measure the total elapsed time
iterations = 100
total_time = timeit("fib(30)", number=iterations, globals=globals())

# In IPython or Jupyter, the magic command handles the repetition for you:
# %timeit fib(30)

# Divide by the number of runs to average out random fluctuations in
# execution time that may come from other processes running on your computer
print(f"Average time is {total_time / iterations:.2f} seconds")
```

The first line of the output below comes from the `%timeit` magic command, the second from the `print()` call:

```
284 ms ± 7.33 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Average time is 0.28 seconds
```
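Besides averaging, `timeit.repeat()` runs several independent rounds so that you can take the fastest one, which is usually the round least disturbed by other activity on the machine. The sketch below uses `fib(20)` instead of `fib(30)` purely to keep it quick; the round counts are arbitrary choices.

```python
from timeit import repeat


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


calls_per_round = 10

# Five independent rounds of ten calls each; returns one total per round
round_times = repeat(
    "fib(20)", number=calls_per_round, repeat=5, globals=globals()
)

# The minimum is the measurement with the least outside interference
best_per_call = min(round_times) / calls_per_round
print(f"Best of 5 rounds: {best_per_call * 1000:.2f} ms per call")
```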

## `cProfile` for collecting detailed runtime statistics

`cProfile` is a deterministic profiler, which can help you answer questions like how many times a particular function was called or how much total time was spent inside that function. Because it traces every function call in your program, a deterministic profiler records the same call counts on every run under the same conditions.
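The idea of tracing every call can be sketched with `sys.setprofile()`, the standard library hook that the interpreter invokes on each function call and return. This is only a toy call counter to show the principle, not how `cProfile` is implemented. The `count_calls` hook is a hypothetical name.

```python
import sys
from collections import Counter

call_counts = Counter()


def count_calls(frame, event, arg):
    # The interpreter invokes this hook for every profiling event;
    # "call" marks the entry into a Python-level function
    if event == "call":
        call_counts[frame.f_code.co_name] += 1


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


sys.setprofile(count_calls)
try:
    fib(10)
finally:
    sys.setprofile(None)  # Always remove the hook again

print(f"fib was called {call_counts['fib']} times")
# → fib was called 177 times
```

The count is exact and identical on every run, which is what makes the approach deterministic. The price is that the hook runs on every single call, so the profiled program slows down noticeably.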

You can run `cProfile` against your whole program from the command line or profile a narrow code fragment programmatically.
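Either way, the raw statistics can be saved to a file and inspected later with `pstats`. On the command line that is `python -m cProfile -o fib.prof script.py`. The sketch below does the same thing programmatically, where the `fib.prof` file name and the temporary directory are arbitrary choices made for illustration.

```python
import io
import os
import tempfile
from cProfile import Profile
from pstats import SortKey, Stats


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


# Profile only the fragment between enable() and disable()
profiler = Profile()
profiler.enable()
fib(20)
profiler.disable()

# Save the raw statistics to a file (hypothetical name) for later analysis
stats_path = os.path.join(tempfile.mkdtemp(), "fib.prof")
profiler.dump_stats(stats_path)

# Load the saved file and write the five most expensive entries to a string
report = io.StringIO()
stats = Stats(stats_path, stream=report)
stats.strip_dirs().sort_stats(SortKey.CUMULATIVE).print_stats(5)
print(report.getvalue())
```

Keeping the `.prof` file around lets you re-sort and filter the same measurements without rerunning a slow program.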

```python
from cProfile import Profile
from pstats import SortKey, Stats


def fib(n):
    return n if n < 2 else fib(n - 2) + fib(n - 1)


with Profile() as profile:
    print(f"{fib(35) = }")
    # Sort the report by the number of calls, with directory names stripped
    Stats(profile).strip_dirs().sort_stats(SortKey.CALLS).print_stats()
```

fib(35) = 9227465
         29860736 function calls (34 primitive calls) in 7.329 seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
29860703/1    7.329    0.000    7.329    7.329 1786731988.py:5(fib)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.isinstance}
        3    0.000    0.000    0.000    0.000 {built-in method builtins.len}
        2    0.000    0.000    0.000    0.000 {built-in method posix.getpid}
        2    0.000    0.000    0.000    0.000 {method '__exit__' of '_thread.RLock' objects}
        2    0.000    0.000    0.000    0.000 {method 'write' of '_io.StringIO' objects}
        2    0.000    0.000    0.000    0.000 iostream.py:444(_is_master_process)
        2    0.000    0.000    0.000    0.000 iostream.py:465(_schedule_flush)
        2    0.000    0.000    0.000    0.000 iostream.py:535(write)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects