## Lecture 02  
## Python Performance Tips  
### Feb. 08, 2021

## Useful resources
- https://nyu-cds.github.io/python-performance-tips/ (link seems broken, a copy is available [here](http://alberto.bietti.me/python-performance-tips/))

- https://wiki.python.org/moin/PythonSpeed/PerformanceTips

- Python is a dynamic interpreted language (not a compiled language)

- It is not compiled ahead of time to native machine code; the interpreter executes the program at run time

- **Types** of variables, function arguments, etc. are not known until the program runs

- Dynamic interpreted languages have great flexibility, but suffer significant performance limitations

- Difficult to optimize, dependence on the interpreter


---

- Python is easy to learn, write, read, debug

- A large standard library and many built-in functions: https://docs.python.org/3/library/functions.html

---


### How to optimize?

- Get the program to give correct results

- Then time it to see whether the correct program is actually too slow

- Profile to find which parts of the program consume most of the time 

- Repeat
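
The profiling step can be sketched with the standard-library `cProfile` and `pstats` modules. This is a minimal illustration, not part of the original lecture: `slow_part`, `fast_part`, and `program` are hypothetical stand-ins for the parts of a real program.

```python
import cProfile
import io
import pstats

def slow_part(n):
    # hypothetical hot spot: a pure-Python loop
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_part(n):
    # hypothetical cheap part: a single built-in call
    return sum(range(n))

def program():
    return slow_part(200_000) + fast_part(200_000)

# collect profiling data while the program runs
profiler = cProfile.Profile()
profiler.enable()
result = program()
profiler.disable()

# sort by cumulative time and print the five most expensive entries
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists, per function, the number of calls and the time spent, which shows where optimization effort is likely to pay off.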

### Today's topics: 

- Built-in functions

- Function Call Overhead

- Function Decorator

- Loops, and built-in operators

- Membership operator **in** 


## Timing Python Code

In [7]:
# timing manually:

import time

start = time.time()

# factorial 500! = 1 * 2 * 3 * ... * 500
fact = 1
for i in range(1, 501):
    fact *= i

end = time.time()
print("run_time: %f" % (end - start))

run_time: 0.000000


In [17]:
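# Timing Python code with the timeit module:
# timeit.timeit runs the function k times and returns the total time,
# so dividing by k gives the average time per run (here k = 10_000).

import timeit

def my_function():
    fact = 1
    # note: range(500) starts at 0, so fact is 0 after the first step;
    # use range(1, 501) to actually compute 500!
    for i in range(500): 
        fact *= i

k = 10_000
print("run_time:", timeit.timeit(my_function, number=k)/k)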

run_time: 1.7912330000001474e-05


In [22]:
%timeit -r 5 -n 1000 my_function(100)

3.76 µs ± 416 ns per loop (mean ± std. dev. of 5 runs, 1000 loops each)


In [113]:
Fact = 1
def my_function(n):
    global Fact
    for i in range(n): 
        Fact *= i

%timeit my_function(100)

5.04 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [None]:
Fact = 1
def my_function(n, fact):
    for i in range(n): 
        fact *= i
    return fact

%timeit my_function(100, Fact)

In [20]:
%timeit -n 1000 my_function(500)

18.7 µs ± 759 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [None]:
#if using IPython

# by default, %timeit chooses the number of loops automatically; -n sets it explicitly
%timeit my_function()

%timeit -n 1000 my_function()



In [None]:
#if using IPython

# -r sets the number of repeats (7 by default); -n sets the loops per repeat
%timeit -r 10 my_function()

%timeit -r 10 -n 1000 my_function()
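
Outside IPython, a similar measurement can be sketched with `timeit.repeat` from the standard library. This is an illustrative sketch, not from the original notebook: this version of `my_function` takes an optional `n` and starts the product at 1, and it reports the best repeat rather than the mean and standard deviation that `%timeit` prints.

```python
import timeit

def my_function(n=500):
    # n! computed with a plain loop
    fact = 1
    for i in range(1, n + 1):
        fact *= i
    return fact

# 5 repeats of 1000 loops each, comparable to "%timeit -r 5 -n 1000 my_function()"
loops = 1000
totals = timeit.repeat(my_function, repeat=5, number=loops)

# each entry is the total time for 1000 loops; report the best per-loop time
best = min(totals) / loops
print("best of 5: %.3g s per loop" % best)
```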



In [None]:
def f(x):
    return x**2

def g(x):
    return x**4

def h(x):
    return x**8


%timeit -n 100 f(5)

%timeit -n 100 g(5)

%timeit -n 100 h(5)


# Built-in Functions


- One of the easiest ways to improve Python performance is not to execute any Python code at all! 

- Python provides a large number of built-in functions that perform a wide variety of operations. 

- These built-in functions are written in C, and so are generally very fast. 

- See the Python documentation for a list of the available functions: https://docs.python.org/3/library/functions.html


In [None]:
import random

def my_min(values):
    min_value = values[0]
    for v in values:
        if v < min_value:
            min_value = v
    return min_value


random_numbers = [random.random() for _ in range(0,100_000)]

print(my_min(random_numbers), min(random_numbers))

#time "my_min()"
%timeit -n 100 my_min(random_numbers)

#Python already provides the built-in function "min()"
%timeit -n 100 min(random_numbers)


In [23]:
import numpy as np

def my_min(values):
    min_value = values[0]
    for v in values:
        if v < min_value:
            min_value = v
    return min_value


random_numbers = [np.random.rand() for _ in range(0,100_000)]

print(my_min(random_numbers), min(random_numbers))

#time "my_min()"
%timeit -n 100 my_min(random_numbers)

#Python already provides the built-in function "min()"
%timeit -n 100 min(random_numbers)


9.314628581025275e-05 9.314628581025275e-05
2.71 ms ± 69.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
930 µs ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


# Function Call Overhead  


**How do functions affect a program’s performance?**  

- Function call overhead in Python is relatively high, especially compared with the execution speed of built-in functions. 

- The overhead comes from, e.g., loading the function object, loading and checking the function arguments, and the dynamic type checks that must be performed around each call.

- One idea is to minimize the number of function calls by handling aggregates: pass a whole collection to a single call instead of calling a function once per element.


In [37]:
total_sum = 0
def inner(i):
    global total_sum
    total_sum += i

    
# The sum of the first n non-negative integers
# S_{n} = 1 + ... + n 

def outer_1():
    for i in range(10_000 + 1): 
        inner(i)

outer_1()
print(total_sum)

%timeit -n 100 -r 3 outer_1()

50005000
1.4 ms ± 75.5 µs per loop (mean ± std. dev. of 3 runs, 100 loops each)


- The "inner" function was called 10,001 times (once for each value in `range(10_000 + 1)`). 

- Instead, move the loop inside the "aggregate" function and call it only once!

In [105]:
x = 0
def aggregate(l):
    global x
    for i in l:
        x = x + i

def outer_2():
    aggregate(range(10000))

outer_2()
print(x)

%timeit -n 1000 -r 3 outer_2()

49995000
739 µs ± 38.7 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)


In [None]:
def aggregate(l):
    x = 0
    for i in l:
        x = x + i
    return x

def outer_2():
    # return the result; "x" is now local to aggregate()
    return aggregate(range(10000))

print(outer_2())

%timeit -n 1000 -r 3 outer_2()

## Membership Testing

- Python provides the **in** operator (a membership operator) to check if an element exists in a collection. 

- The **in** operator is very fast at checking if an element exists in a **dict** or a **set**, because both dict and set are implemented using a **hash table**. 


In [None]:
letters = 'abcdefghijklmnopqrstuvwxyz'
letters_list = [x + y + z for x in letters for y in letters for z in letters]

print("first 10 members:", letters_list[:10])
print("last  10 members:", letters_list[-10:])

In [None]:
print("len_letters_list = %d" % len(letters_list))
print("len_letters ** 3 = %d = %d ** 3" % (len(letters) ** 3, len(letters)))

In [None]:
"aaa" in letters_list

In [None]:
"zzz" in letters_list

In [None]:
#Membership of a List
%timeit ("aaa" in letters_list)
%timeit ("mmm" in letters_list)
%timeit ("zzz" in letters_list)

### Checking for membership in a list or tuple is not as efficient!

In [None]:
#Membership of a Dictionary

# identity mapping: 
letters_dict = dict([(x, x) for x in letters_list])

for k, v in letters_dict.items():
    print(k, ":", v)

In [None]:
# "aaa" in letters_dict
# "zzz" in letters_dict

In [None]:
%timeit ("aaa" in letters_dict)
%timeit ("mmm" in letters_dict)
%timeit ("zzz" in letters_dict)

### You could also convert a list into a set and check for a membership

---

In [None]:
letters_set = set(letters_list)

%timeit ("aaa" in letters_set)
%timeit ("mmm" in letters_set)
%timeit ("zzz" in letters_set)
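
The same comparison can be sketched without IPython magics, using the `timeit` module. This is a minimal sketch: the loop count of 200 is an arbitrary choice, and the measured times depend on the machine.

```python
import timeit

letters = 'abcdefghijklmnopqrstuvwxyz'
letters_list = [x + y + z for x in letters for y in letters for z in letters]
letters_set = set(letters_list)

# "zzz" is the last element, so the list must be scanned to the very end,
# while the set finds it with a single hash lookup
t_list = timeit.timeit(lambda: "zzz" in letters_list, number=200)
t_set = timeit.timeit(lambda: "zzz" in letters_set, number=200)

print("list: %.3g s, set: %.3g s" % (t_list, t_set))
```

For a list, the cost of a membership test grows with the position of the element; for a set or dict it stays roughly constant.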

## String Concatenation

In [40]:
def make_string(string_list):
    my_string = ''
    for character in string_list:
        my_string += character
    return my_string

str_list = [character for character in 'abcdefghijklmnopqrstuvwxyz']
print(str_list)

['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']


In [42]:
%timeit -n 1000 list('abcdefghijklmnopqrstuvwxyz')

310 ns ± 52.5 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [43]:
%timeit -n 1000 [character for character in 'abcdefghijklmnopqrstuvwxyz']

934 ns ± 205 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [44]:
%timeit make_string(str_list)

1.28 µs ± 21 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [45]:
%timeit "".join(str_list)

230 ns ± 5.47 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


# Decorator Caching

## Function Decorator



---

- The symbol **@** is Python decorator syntax. 


- A Python decorator is a callable object that is used to modify a function, method, or class definition.   


- Python decorators are normally used for tracking, locking, or logging  


- The wise use of decorators can improve the performance of code.  


- For example, decorate a Python function so that it remembers (caches) results that are needed again later  

---

In [46]:
def decorating_a_function(func):
    def function_wrapper(x):
        print("Now \"" + func.__name__ + "\" becomes decorated.")
        print("The attribute:")
        func(x)
        print("is surrounded by all these text!")
    return function_wrapper

def foo(x):
    print(x)

In [47]:
# run the function
foo("test")

# separator
print("-" * 50)

# get the function name
print("function_name:", foo.__name__)

test
--------------------------------------------------
function_name: foo


In [48]:
foo = decorating_a_function(foo)

# run the function
foo("test")

# separator
print("-" * 50)

# get the function name
print(foo.__name__)

Now "foo" becomes decorated.
The attribute:
test
is surrounded by all these text!
--------------------------------------------------
function_wrapper


In [50]:
# run the function
foo("hel")

# separator
print("-" * 50)

# get the function name
print(foo.__name__)

Now "foo" becomes decorated.
The attribute:
hel
is surrounded by all these text!
--------------------------------------------------
function_wrapper


In [51]:
def decorating_a_function(func):
    def function_wrapper(x):
        print("Now the function called \"" + func.__name__ + "\" becomes decorated.")
        print("The attribute of the function:")
        func(x)
        print("is surrounded by all this text!")
    return function_wrapper

@decorating_a_function
def foo(x):
    print(x)

In [52]:
# Test 1:
foo("test")

Now the function called "foo" becomes decorated.
The attribute of the function:
test
is surrounded by all this text!


In [None]:
# Test 2:
print(foo.__name__)

## Imported functions can also be decorated

In [53]:
from math import sin, cos, pi

def our_decorator(func):
    def function_wrapper(x):
        print("The result of %s(%0.4f) is: %0.4f" % (func.__name__, x, func(x)))
    return function_wrapper

# the @ syntax only applies at a function definition, so imported functions are wrapped by hand
sin = our_decorator(sin)
cos = our_decorator(cos)

for f in [sin, cos]: f(pi/2)

for f in [sin, cos]: f(pi)
    


The result of sin(1.5708) is: 1.0000
The result of cos(1.5708) is: 0.0000
The result of sin(3.1416) is: 0.0000
The result of cos(3.1416) is: -1.0000


In [55]:
from math import sin, cos, pi

sin(pi/2)

1.0

In [61]:
from math import sin, cos, pi

def our_decorator(func):
    def function_wrapper(x):
        print("The result of %s(%0.4f) is: %0.4f" % (func.__name__, x, func(x)))
        func(x)
    return function_wrapper

# the @ syntax only applies at a function definition, so imported functions are wrapped by hand
sin = our_decorator(sin)
cos = our_decorator(cos)

sin(pi/4)

The result of sin(0.7854) is: 0.7071


In [66]:
from collections import Counter

def decorator(func):
    def wrapper(x):
        print("The result of function {} is {}".format(func.__name__, func(x)))
        func(x)
    return wrapper

counter = decorator(Counter)
counter([1, 2, 3, 4])


The result of function Counter is Counter({1: 1, 2: 1, 3: 1, 4: 1})


## Using wraps from module functools

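- `functools` is a standard-library module with higher-order functions and operations on callable objects; `wraps` copies the name, docstring, and module of the wrapped function onto the wrapper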

In [83]:
from functools import wraps

def greeting(func):
    @wraps(func)
    def function_wrapper(x):
        """function_wrapper of greeting"""
        res = func(x)
        print("Hello, this function " + func.__name__ + " at value " + str(x) + " returns:", res)
        return res
    return function_wrapper

@greeting
def simple_f(x):
    """this is a docstring of some simple function"""
    return (x + 500)

In [84]:
#call simple_f
a = simple_f(10)

Hello, this function simple_f at value 10 returns: 510


In [85]:
a

510

In [78]:
print("function name: " + simple_f.__name__)
print("docstring: " + simple_f.__doc__)
print("module name: " + simple_f.__module__)

function name: simple_f
docstring: this is a docstring of some simple function
module name: __main__
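
As an illustration of the "tracking" use mentioned earlier, a decorator can record how long each call takes. This is a sketch under assumptions: `timed` and its `last_elapsed` attribute are hypothetical names, not part of any library.

```python
import time
from functools import wraps

def timed(func):
    """Hypothetical helper: remember how long the last call to func took."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.last_elapsed = time.perf_counter() - start
        return result
    wrapper.last_elapsed = None
    return wrapper

@timed
def factorial(n):
    fact = 1
    for i in range(1, n + 1):
        fact *= i
    return fact

value = factorial(500)
print("%s(500) took %.3g s" % (factorial.__name__, factorial.last_elapsed))
```

Because the wrapper uses `wraps`, `factorial.__name__` still reports the original name, and the wrapper returns the result so the decorated function behaves like the original.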


# Using Decorators for Caching

In [81]:
import time

# Consider Fibonacci numbers: 
# defined as f_n = f_{n - 1} + f_{n - 2} for n >= 2 
# where f_0 = 0 and f_1 = 1
# https://en.wikipedia.org/wiki/Fibonacci_number
    

# a simple recursion: 
def fib(i):
    if i <= 1: return i
    return fib(i - 1) + fib(i - 2)

#0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...

t = time.process_time() 
fib_result = fib(33)
elapsed_time = time.process_time() - t
print("fibonacci time: %0.10f; fib_result: %d" % (elapsed_time, fib_result))

fibonacci time: 1.2031250000; fib_result: 3524578


In [82]:
start = time.time()
fib_result = fib(33)
end = time.time()
print(end - start)

1.1839494705200195


# Memoization

In [98]:
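# The idea is memoization: 
# introduce a map (dictionary) "memo"
# in which to save intermediate steps
# of calculations

# note: the default dict is created once, when the function is defined,
# so the same cache is shared by all calls that do not pass "memo"
def fib_memo(i, memo=dict()):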
    if i <= 1: 
        return i
    if i in memo: 
        return memo[i]

    memo[i] = fib_memo(i - 1, memo) + fib_memo(i - 2, memo)
    return memo[i]

t = time.process_time() 
fib_m = fib_memo(33)
elapsed_time_memo = time.process_time() - t
print("fibonacci time: %0.10f; fib_result: %d" % (elapsed_time_memo, fib_m))

fibonacci time: 0.0000000000; fib_result: 3524578


In [None]:
# We can create a decorator that saves
# each intermediate value in memory 
# rather than calculating it every time.

from functools import wraps

def cache(f):
    memo = {}
    @wraps(f)
    def function_wrapper(*arg):
        if arg not in memo:
            memo[arg] = f(*arg)
        return memo[arg]
    
    return function_wrapper

@cache
def fib_cache(i):
    if i < 2: return i
    return fib_cache(i - 1) + fib_cache(i - 2)


t = time.process_time() 
fib_c = fib_cache(33)
elapsed_time_c = time.process_time() - t
print("fibonacci time: %0.10f; result: %d" % (elapsed_time_c, fib_c))
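
The standard library offers a ready-made caching decorator, `functools.lru_cache`, that plays the same role as the hand-written `cache` above. A minimal sketch (the name `fib_lru` is chosen here for illustration):

```python
from functools import lru_cache

@lru_cache(maxsize=None)   # maxsize=None: keep every result, never evict
def fib_lru(i):
    if i < 2:
        return i
    return fib_lru(i - 1) + fib_lru(i - 2)

print(fib_lru(33))           # 3524578, the same value as fib(33)
print(fib_lru.cache_info())  # hit/miss statistics of the cache
```

Each distinct argument is computed only once; every later request for it is answered from the cache.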

# Optimizing Loops

In [86]:
import random

lowerlist = ['abcdefghijklmnopqrstuvwxyz'[:random.randint(0, 26)] for x in range(1_000)]
upperlist = []

# get the first 20 elements
lowerlist[ : 20]

['abcdefghijklmnopqr',
 'abcdefghijklmnopqrstu',
 'abcdefghijkl',
 'abcdefghijklmnopqrstuvw',
 'abcdefghijklm',
 'abcdefghijklmno',
 'abc',
 'abcdefghijklmnopqrst',
 'abcdef',
 'abcdefghijkl',
 'abcdefg',
 'abcdefghijkl',
 'abcdefghijklmnopq',
 'abcdefghijklmnopq',
 '',
 'abc',
 'abcdefgh',
 'abcdef',
 'abcdefghijklmnopqrs',
 'abcdefghijklmnopqrst']

### Task: From the lowerlist build the upperlist

In [107]:
X = [1, 2, 3, 4, 5]
newList = []
def global_access():
    for i in X:
        newList.append(i)
%timeit global_access()

454 ns ± 10.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [108]:
X = [1, 2, 3, 4, 5]
newList = []
def local_access():
    x = X
    newlist = newList
    for i in x:
        newlist.append(i)
%timeit local_access()

424 ns ± 15 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [103]:
upperlist = []

def to_upper_1():
    for word in lowerlist:
        upperlist.append(str.upper(word))
    return upperlist

%timeit -n 100 to_upper_1()

158 µs ± 8.92 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


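- The loop looks up and calls two methods, "upperlist.append" and "str.upper", on every iteration. 

- Because Python supports dynamic attributes as well as multiple namespaces, these names and attributes are resolved again at run time on each pass through the loop. 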


In [104]:

upperlist = []
f_upper = str.upper
f_append = upperlist.append

def to_upper_2():
    for word in lowerlist:
        f_append(f_upper(word))

%timeit -n 100 to_upper_2()

116 µs ± 6.99 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:

def to_upper_3():
    upperlist = []
    f_upper = str.upper
    f_append = upperlist.append

    for word in lowerlist:
        f_append(f_upper(word))

%timeit -n 100 to_upper_3()

### Avoid the loop

In [87]:
# A "map" is often called "apply-to-all" when considered in functional form;
# e.g. a map applied on all elements of a list

simple_mapping = map(lambda x : x + 100, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

list(simple_mapping)

[100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110]

In [88]:
# Apply the method "upper" on strings

print(lowerlist[0], str.upper(lowerlist[0]))

abcdefghijklmnopqr ABCDEFGHIJKLMNOPQR


In [90]:

# avoiding the loop by using "map"
upper = str.upper

%timeit -n 100 upperlist = list(map(str.upper, lowerlist))

58 µs ± 8.82 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [101]:

# avoiding the loop by using "list comprehension"
f_upper = str.upper 
%timeit -n 100 upperlist = [f_upper(word) for word in lowerlist]

88.4 µs ± 9.43 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [102]:
#f_upper = str.upper 
%timeit -n 100 upperlist = [str.upper(word) for word in lowerlist]

114 µs ± 6.27 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [93]:
# add two random vectors; given as lists

import random 

random_numbers1 = [random.random() for _ in range(0,100000)]
random_numbers2 = [random.random() for _ in range(0,100000)]

%timeit res1 = list(map(lambda x, y: x + y, random_numbers1, random_numbers2))

# Note: with two lists, map pairs the elements by position (element-wise addition of two vectors);
# it does not combine every element of one list with every element of the other

8.63 ms ± 131 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:
len(list(map(lambda x, y: x + y, random_numbers1, random_numbers2)))

## Intrinsic Operators

- Another performance improvement is to use intrinsic operators (+, -, *, etc.) instead of a user defined function.  


- The operator module exports a set of efficient functions corresponding to the intrinsic operators of Python.   


- operator.add(x, y) is equivalent to the expression x + y.   


- `import operator`

https://docs.python.org/3.4/library/operator.html

In [96]:
import operator

%timeit res2 = list(map(operator.add, random_numbers1, random_numbers2))

3.56 ms ± 64.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### using maps

- In Python 3, **the map function returns an iterator that does not evaluate the arguments until it needs to**. 

-  By converting the iterator to a list, we are forcing map to compute every value.
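
The lazy behaviour of `map` can be made visible with a function that records its own calls. This is a small sketch; `noisy_square` and `calls` are names invented for the illustration.

```python
calls = []

def noisy_square(x):
    # record each call so we can see when map actually runs the function
    calls.append(x)
    return x * x

lazy = map(noisy_square, [1, 2, 3])
calls_before = list(calls)     # nothing has been computed yet

first = next(lazy)             # computes only the first element
calls_after_one = list(calls)

rest = list(lazy)              # forces the remaining elements
print(calls_before, calls_after_one, first, rest)
# prints: [] [1] 1 [4, 9]
```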


In [None]:
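#Here we use the operator "add" directly without "map".
#Caution: applied to two lists, operator.add (like +) concatenates them;
#it does not add element-wise, so res3 has 200_000 elements: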

%timeit res3 = operator.add(random_numbers1, random_numbers2)
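
The difference between calling `operator.add` on two whole lists and mapping it over their elements is easy to check on a small example. A minimal sketch with made-up values:

```python
import operator

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]

concatenated = operator.add(a, b)            # same as a + b: joins the lists
elementwise = list(map(operator.add, a, b))  # pairs the elements by position

print(concatenated)   # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(elementwise)    # [11.0, 22.0, 33.0]
```

Only the `map` version (or NumPy arrays) performs vector addition; the timings of the two are therefore not directly comparable.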

In [94]:
# Using numpy: exploit contiguous memory, special instructions

import numpy as np

rand1_np = np.array(random_numbers1)
rand2_np = np.array(random_numbers2)

%timeit res4 = rand1_np + rand2_np

41.2 µs ± 987 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [97]:
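# Using Python arrays (if you don't have NumPy...)
# Caution: + on array.array objects concatenates, as it does for lists, so this
# timing measures concatenation rather than element-wise addition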

import array

rand1_arr = array.array('d', random_numbers1)
rand2_arr = array.array('d', random_numbers2)

%timeit res4 = operator.add(rand1_arr, rand2_arr)

366 µs ± 3.18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
