# Measuring and optimising python code

We will look at measuring how much time python code takes to run. 

We will compare standard python code with numpy.

## Timing your code

Lets look how much it costs to create 1000 lists with random numbers, each of length 10000, and then compute the sum of the list.

For any of the tests in this assignment, if you get very small numbers, or very large numbers, change the number of times you run the tests. I got 

0.086 for test1()
0.845 for test2()

Exactly what the numbers are is not important however. What we see here is that test2 is approximately one order of magnitude slower.

(one order of magnitude = 10 x    two orders of magnitude = 100 x   three orders = 1000 x and so on)

Lets try to create a list by first creating a numpy array and then converting it to a list.

This will be our test3(). I got

0.454 for test3()

This is often (but not always) true:  

- code in pure python is often slower than code written with numpy.


## Task

### a) 

Run the code and give use your timing data for test1() test2() and test3() on your computer. Adjust number of tests if needed (I am running on an old laptop).  

Identify where you see performance which is 2x, 5x, 10x (and so on) compared to other tests. Looking at the data in this way is often more useful than the actual numbers.



In [7]:
import numpy as np
import time
import random
import math as m

# start_time = time.time()
# run your code here
# end_time = time.time()
# print("It took: ",end_time - start_time)

a = np.random.random(10000)
b = [random.random() for i in range(10000)]

def test1():
    return np.random.random(10000)

def test2():
    return [random.random() for i in range(10000)]

def test3():
    return list(np.random.random(10000))


start_time = time.time()
for i in range(1000): test1()
end_time = time.time()

print("Time of test1(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test2()
end_time = time.time()

print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3()
end_time = time.time()

print("Time of test3(): ", end_time - start_time)

Time of test1():  0.057518959045410156
Time of test2():  0.8845329284667969
Time of test3():  0.43084049224853516


Write your answer here.
test 1 is 1 magnitude faster than the others.
test 3 is 2x faster than test2
test1 : 0.0575
test2 : 0.8845
test3 : 0.4308

## Doing sums in different ways

We will create 10000 random numbers and compute the sum.

We will do this in different ways.

## Task 2

Analyse the results and try to come up with some reasons/conclusions for the numbers that you see.

It is more interesting to find patterns of 2 times, 5 times, 10 times, 100 times performance. 

If you see large performance differences, try to come up with reasons for that. Discuss with your peers and use the internet if you can't come up with anything.



In [8]:
import time 
import random
import numpy as np

# start_time = time.time()
# run your code here
# end_time = time.time()
# print("It took: ",end_time - start_time)

a = np.random.random(10000)
b = [random.random() for i in range(10000)]

def test1(a):
    return np.sum(a)

def test2(a):
    return sum(a)

def test3(a):
    answer = 0
    for i in range(10000):
        answer += a[i]
    return answer

def test4(a):
    answer = 0
    for v in a:
        answer += v
    return answer

print("Testing on numpy.array")

start_time = time.time()
for i in range(1000): test1(a)
end_time = time.time()
print("Time of test1(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test2(a)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(a)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test4(a)
end_time = time.time()
print("Time of test4(): ", end_time - start_time)

print("Testing on lists")

start_time = time.time()
for i in range(1000): test1(b)
end_time = time.time()
print("Time of test1(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test2(b)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(b)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test4(b)
end_time = time.time()
print("Time of test4(): ", end_time - start_time)




Testing on numpy.array
Time of test1():  0.007001399993896484
Time of test2():  0.8768908977508545
Time of test3():  1.5880420207977295
Time of test4():  1.1344342231750488
Testing on lists
Time of test1():  0.3366506099700928
Time of test2():  0.025015830993652344
Time of test3():  0.5148112773895264
Time of test4():  0.24549198150634766


## Some heavier math

We run a made up computation on some data and compare times. 

We use np.vectorize to turn a python function into a numpy-type function. This can be extremely useful when writing code, but how is the performance? We will find out now.

## Task 3

Compute the timing data and explain what you see.

Look for patterns where something is 10 x, 2 x and so on, compared to something else, and identify which ones are roughly the same.

Try to explain why you see what you see. Discuss with your peers or search for information on the internet if you don't understand what is going on.

How does np.vectorize perform compared to writing pure numpy code? When should you use np.vectorize? When should you not use it?



In [1]:
import time 
import random
import numpy as np
import math as m

# start_time = time.time()
# run your code here
# end_time = time.time()
# print("It took: ",end_time - start_time)

a = np.random.random(10000)
b = [random.random() for i in range(10000)]

def test1(a):
    return np.sum(np.arctan(np.sin(np.sqrt(a) + np.cos(a*a))))

# create a np.function
fun = np.vectorize(lambda v : m.atan(m.sin(m.sqrt(v) + m.cos(v*v))))

def test2(a):
    return np.sum(fun(a))


# we are often forced to write code like this, even with numpy, since
# numpy doesn't do everything we want...
def test3(a):
    answer = 0
    for i in range(10000):
        answer += m.atan(m.sin(m.sqrt(a[i]) + m.cos(a[i]*a[i])))
    return answer

def test4(a):
    answer = 0
    for v in a:
        answer += m.atan(m.sin(m.sqrt(v) + m.cos(v*v)))
    return answer

print("Testing on numpy.array")

start_time = time.time()
for i in range(1000): test1(a)
end_time = time.time()
print("Time of test1(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test2(a)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(a)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test4(a)
end_time = time.time()
print("Time of test4(): ", end_time - start_time)

print("Testing on lists")

start_time = time.time()
for i in range(1000): test2(b)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(b)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test4(b)
end_time = time.time()
print("Time of test4(): ", end_time - start_time)



Testing on numpy.array
Time of test1():  0.1879744529724121
Time of test2():  4.845792055130005
Time of test3():  7.773846626281738
Time of test4():  5.285832166671753
Testing on lists
Time of test2():  5.140161037445068
Time of test3():  4.2525553703308105
Time of test4():  3.8058457374572754


np.array
Test 1 takes in a variable the does a calculation on it.
Test 2 takes in a variable and sends it to a vectorized function, this example is slow since it creates a new vectorized function for each time its called.
Test 3 takes in a variable and goes through every index, it needs to call upon each index which will make it slower.
Test 4 takes in a variable and goes through every index, but unlike test 3 it works with the original, so this is faster since it does not need to look up the array and go to the right index each time.

lists
Test 1 is quite slower than in np.array since it is a python list.
Test 2 is quite faster than in np.array this is since the IDE does not need to change to c language for "i" amount of times.
Test 3 is faster than np.array in the same sense as Test 2(lists) was faster than Test 2(arrays) as well this->Test is faster than Test 2(lists) since it does not need to look up the list for "i" times.

np.vectorize returns a pyfunc that takes in arrays, in the examples above, it does not work that much better.
you want to use it when you have a function that you have to run 1000x times, instead of running it n many times you can run it once by vectorizing and sending in an array instead of. and you don't want to use it in times when you only have to run a function once.

## Task 4

Write example code and give evidence for or against the following statement:

"Numpy performs better than python on large collections of data. On small collections on data, it is better to write standard python".

We have already seen that numpy performs better on large data sets. So focus on the second half of the statement.

We suggest you try to do some calculations with 2d vectors using numpy array with 2 elements, a list with 2 elements, a tuple with 2 elements and a pyglet.math vec2 and compare these 4 different approaches.

We look forward to seeing what you find out. 



# An amazing magic trick

We go back to some tests we have already done.

Now install numba.  (pip install  numba)

We add the decorator @njit

We try to go even faster by turning off some failsafes when handling floating points (fastmath=True).

If you get the same results as me: Slow numpy code is now considerably faster. A 10 x improvement.

If we work with lists, we don't get any benefit from numba. So we can't hope to improve all code by plastering @njit everywhere.

Note! pythran is another library which does similar things.I don't know anything about it, but by all means, try it out.

Note!! For really heavy calculations, numba can also help with moving computation over to the GPU.



In [1]:
import time 
import random
import numpy as np
import math as m
from numba import njit,float64

# start_time = time.time()
# run your code here
# end_time = time.time()
# print("It took: ",end_time - start_time)

a = np.random.random(10000)
b = [random.random() for i in range(10000)]

@njit
def test1(a):
    return np.sum(np.arctan(np.sin(np.sqrt(a) + np.cos(a*a))))

@njit
def test2(a):
    answer = 0
    for i in range(10000):
        answer += m.atan(m.sin(m.sqrt(a[i]) + m.cos(a[i]*a[i])))
    return answer

@njit
def test3(a):
    answer = 0
    for v in a:
        answer += m.atan(m.sin(m.sqrt(v) + m.cos(v*v)))
    return answer


@njit(fastmath=True)
def test1fast(a):
    return np.sum(np.arctan(np.sin(np.sqrt(a) + np.cos(a*a))))

@njit(fastmath=True)
def test2fast(a):
    answer = 0
    for i in range(10000):
        answer += m.atan(m.sin(m.sqrt(a[i]) + m.cos(a[i]*a[i])))
    return answer

@njit(fastmath=True)
def test3fast(a):
    answer = 0
    for v in a:
        answer += m.atan(m.sin(m.sqrt(v) + m.cos(v*v)))
    return answer


@njit
def test5(a):
    answer = 0
    for i in range(10000):
        answer += m.atan(m.sin(m.sqrt(a[i]) + m.cos(a[i]*a[i])))
    return answer

@njit
def test6(a):
    answer = 0
    for v in a:
        answer += m.atan(m.sin(m.sqrt(v) + m.cos(v*v)))
    return answer


print("Testing on numpy.array")

start_time = time.time()
for i in range(1000): test1(a)
end_time = time.time()
print("Time of test1(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test2(a)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(a)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)


print("Testing on numpy.array with fastmath")

start_time = time.time()
for i in range(1000): test1(a)
end_time = time.time()
print("Time of test1(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test2(a)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(a)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)


print("Testing on lists")

start_time = time.time()
for i in range(1000): test2(b)
end_time = time.time()
print("Time of test2(): ", end_time - start_time)

start_time = time.time()
for i in range(1000): test3(b)
end_time = time.time()
print("Time of test3(): ", end_time - start_time)


Testing on numpy.array
Time of test1():  1.2681100368499756
Time of test2():  0.31825828552246094
Time of test3():  0.3161141872406006
Testing on numpy.array with fastmath
Time of test1():  0.2668263912200928
Time of test2():  0.25173330307006836
Time of test3():  0.25018858909606934
Testing on lists
Time of test2():  12.048087120056152
Time of test3():  11.60790753364563


In [68]:
import numpy as np
import time
from pyglet import math
import math as m

def matXvec(u, m):
    return np.array([sum(u[j] * m[i, j] for j in range(len(u))) for i in range(len(m))])

def matXmat(m1, m2):
    return np.array([[np.sum([m * m2[i][x] for i, m in enumerate(row)]) for x in range(len(m1))] for row in m1])

n = 1000000
m_value = 100000

listvec = [1, 2]
arrayvec = np.array([1, 2])
tuplevec = (1, 2)
pygletvec = math.Vec2(x=1.0, y=2.0)

listmat = [[1,2], [3,4]]
arraymat = np.array([[1,2], [3,4]])
tuplemat = ((1,2),(3,4))

mat1 = np.array([[m.cos(m.pi/2), -m.sin(m.pi/2)], [m.sin(m.pi/2), m.cos(m.pi/2)]])

start_time = time.time()
for _ in range(n): 
    matXvec(listvec, mat1)
end_time = time.time()
print("list took : ", end_time - start_time)

start_time = time.time()
for _ in range(n): 
    matXvec(arrayvec, mat1)
end_time = time.time()
print("numpy array took : ", end_time - start_time)

start_time = time.time()
for _ in range(n): 
    matXvec(tuplevec, mat1)
end_time = time.time()
print("tuple took : ", end_time - start_time)

start_time = time.time()
for _ in range(n): 
    matXvec(pygletvec, mat1)
end_time = time.time()
print("pyglet vector took : ", end_time - start_time)

print("---------------------------")

start_time = time.time()
for _ in range(m_value): 
    matXmat(listmat, mat1)
end_time = time.time()
print("List mat took : ", end_time - start_time)

start_time = time.time()
for _ in range(m_value): 
    matXmat(arraymat, mat1)
end_time = time.time()
print("numpy array mat took : ", end_time - start_time)

start_time = time.time()
for _ in range(m_value): 
    matXmat(tuplemat, mat1)
end_time = time.time()
print("Tuple mat took : ", end_time - start_time)

dimension = 2
while True:
    dimension += 1
    testarray = np.ones((dimension, dimension))*random.randint(1, 100)
    testlist = [[i for i in range(dimension)] for x in range(dimension)]
    start_time_array = time.time()
    matXmat(testarray, testarray)
    end_time_array = time.time()
    start_time_list = time.time()
    matXmat(testlist, testlist)
    end_time_list = time.time()
    if end_time_array - start_time_array < end_time_list - start_time_list:
        break
print("Done!, Dimensions before array is faster than list : ", dimension)

list took :  4.2596025466918945
numpy array took :  4.715015649795532
tuple took :  4.141317367553711
pyglet vector took :  5.029679536819458
---------------------------
List mat took :  3.228597402572632
numpy array mat took :  3.908308744430542
Tuple mat took :  3.232940673828125
Done!, Dimensions before array is faster than list :  5


It would seem as though using lists and tuples are marginally faster when working with smaller data structures.
Even making the data structures a little bigger only shows a marginal increase in performance. 
Making the dimensions of the data structure anywhere from between 4x4 to 7x7 (from my testing) makes numpy perform better than python lists