# Python loops
(by Tevfik Aytekin)

Examples showing the speed issues of loops in Python and how to solve them.

### Dot Product

In [5]:
import numpy as np
import time

size = 1000000
a = np.random.rand(size)
b = np.random.rand(size)


c = 0
tic = time.time()
for i in range(size):
    c+=a[i]*b[i]
toc = time.time()

print(c)
print("For loop: "+str(1000*(toc-tic))+"ms")

tic = time.time()
c = np.dot(a,b)
toc = time.time()

print(c)
print("Numpy version: "+str(1000*(toc-tic))+"ms")



249873.36956749737
For loop: 437.9768371582031ms
249873.36956749423
Numpy version: 0.8718967437744141ms


The main reason for this difference is that NumPy runs compiled code, whereas a Python loop is executed by the Python interpreter, which has to do dynamic type checks and create a new object for every element it touches. Another possible reason is that NumPy functions may use parallelism (for example, through a multithreaded BLAS library), whereas the Python loop above runs sequentially on a single core.
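
A single `time.time()` measurement can be noisy, especially for the sub-millisecond NumPy call. The sketch below repeats the comparison with the standard `timeit` module and keeps the best of several runs. The helper name `loop_dot` and the smaller array size are assumptions made here to keep the example quick; absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

size = 100_000  # smaller than the 1,000,000 used above, to keep this quick
rng = np.random.default_rng(0)
a = rng.random(size)
b = rng.random(size)


def loop_dot(x, y):
    """Dot product with an explicit Python loop (hypothetical helper)."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


# The best of several runs is less sensitive to background noise
loop_time = min(timeit.repeat(lambda: loop_dot(a, b), number=1, repeat=3))
numpy_time = min(timeit.repeat(lambda: np.dot(a, b), number=10, repeat=3)) / 10

print(f"For loop: {1000 * loop_time:.3f} ms")
print(f"NumPy:    {1000 * numpy_time:.3f} ms")
```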

What should you do if you have a heavy computation? Here are some possibilities:
- Write your code in C/C++
- Use Cython
- Use only fast libraries like numpy.
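
As a rough illustration of the last option, the sketch below rewrites a loop that contains a condition as a boolean mask followed by a single NumPy reduction. The computation (sum of squares of the entries above 0.5) is an arbitrary example chosen here, not something from the timings above.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(10_000)

# Loop version: sum of squares of the entries greater than 0.5
loop_total = 0.0
for value in x:
    if value > 0.5:
        loop_total += value * value

# Vectorized version: a boolean mask selects the entries,
# then one compiled reduction does the summation
mask = x > 0.5
vec_total = np.sum(x[mask] ** 2)

print(np.isclose(loop_total, vec_total))
```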

Each option has pros and cons, which I will not discuss here. Below are some examples of using Cython and calling C/C++ functions from Python.

In [2]:
%load_ext cython

In [3]:
%%cython -a
import numpy as np

def my_dot(double[:] a, double[:] b, int size):
    cdef double c = 0
    cdef int i
    for i in range(size):
        c+=a[i]*b[i]
    print(c)


In [4]:
size = 10000000
a = np.random.rand(size)
b = np.random.rand(size)


In [5]:
%%timeit -n1 -r1
my_dot(a, b, size)

2499525.450996904
13.3 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)


### Matrix Multiplication

In [6]:
Ax = 5000
Ay = 5000
Bx = 5000
By = 5000
A = np.random.rand(Ax,Ay)
B = np.random.rand(Bx,By)
"""
result = np.zeros((Ax,  By))
tic = time.time()
for i in range(Ax):
    for j in range(By):
        for k in range(Bx):
            result[i][j] += A[i][k] * B[k][j]
toc = time.time()
print("For loop: "+str(1000*(toc-tic))+"ms")
"""
tic = time.time()
result = np.dot(A,B)
toc = time.time()
print("Vectorized version: "+str(1000*(toc-tic))+"ms")


Vectorized version: 3983.288049697876ms
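
The triple loop is left commented out, presumably because at 5000 x 5000 it needs 5000^3 = 1.25e11 multiply-adds; at roughly 0.4 microseconds per iteration (the rate measured in the dot product example) that would be on the order of 15 hours. A minimal sketch on much smaller matrices, assuming a hypothetical helper `matmul_loop`, can still confirm that the loop and `np.dot` compute the same result:

```python
import numpy as np


def matmul_loop(A, B):
    """Naive triple-loop matrix product, same logic as the commented-out code."""
    n_rows, n_inner = A.shape
    n_inner_b, n_cols = B.shape
    if n_inner != n_inner_b:
        raise ValueError("inner dimensions do not match")
    result = np.zeros((n_rows, n_cols))
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(n_inner):
                result[i, j] += A[i, k] * B[k, j]
    return result


rng = np.random.default_rng(0)
A_small = rng.random((30, 40))
B_small = rng.random((40, 20))

loop_result = matmul_loop(A_small, B_small)
print(np.allclose(loop_result, np.dot(A_small, B_small)))
```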


### dot vs matmul

In NumPy, in addition to the dot function, there is another matrix multiplication function called matmul (also available as the `@` operator). See this [discussion](https://stackoverflow.com/questions/34142485/difference-between-numpy-dot-and-python-3-5-matrix-multiplication) for the differences.
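
A short sketch of the most visible difference: for 2-D arrays the two functions agree, but for arrays with more dimensions `matmul` treats the leading axes as a batch of matrices, while `dot` pairs every matrix in the first argument with every matrix in the second. The array shapes below are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# 2-D inputs: dot, matmul, and @ all give the same matrix product
A = rng.random((3, 4))
B = rng.random((4, 5))
same_2d = np.allclose(np.dot(A, B), np.matmul(A, B)) and np.allclose(A @ B, np.matmul(A, B))

# 3-D inputs: matmul multiplies X[0] with Y[0] and X[1] with Y[1],
# while dot combines every matrix in X with every matrix in Y
X = rng.random((2, 3, 4))
Y = rng.random((2, 4, 5))
matmul_shape = np.matmul(X, Y).shape
dot_shape = np.dot(X, Y).shape

print(same_2d, matmul_shape, dot_shape)
```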

### Calling a C/C++ function from Python

You can write the most time-consuming part of your code in C/C++ and then call it from Python. To do this, first create a C/C++ function.

Then, create a shared library using the following command (replace `my_function` with the name of your own source file):

cc -fPIC -shared -o my_function.so my_function.c

Then call your function from Python code as follows:

In [None]:
import numpy as np
import time
from ctypes import *

# Load the shared library and look up the C function by name
lib = cdll.LoadLibrary("c_codes/c_dot.so")
c_dot = lib.c_dot
c_dot.restype = c_int  # tell ctypes that the return value is a C int


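# Note: the array dtype must match the element type the C function expects.
# randint returns the platform default integer (often 64-bit), so cast with
# .astype(np.intc) if c_dot takes int* arguments.
a = np.random.randint(1,100,100000)
b = np.random.randint(1,100,100000)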
print(a)
print(b)

c = 0
tic = time.time()
for i in range(len(a)):
    c+=a[i]*b[i]
toc = time.time()

print(c)
print("Python: "+str(1000*(toc-tic))+"ms")


tic = time.time()
c = c_dot(c_void_p(a.ctypes.data), c_void_p(b.ctypes.data), len(a))

toc = time.time()

print(c)
print("C function: "+str(1000*(toc-tic))+"ms")



In [None]:
import numpy as np
import time
from ctypes import *

lib = cdll.LoadLibrary("c_codes/c_dot.so")
c_sum = lib.c_sum
c_sum.restype = c_int


a = np.random.randint(1,5,6)
print(a)

c = c_sum(c_void_p(a.ctypes.data), c_int(len(a)))
print(c)
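
The three steps in this section (write a C function, build a shared library, call it from Python) can be sketched end to end in a single script. The C source of `c_dot` is not shown in this notebook, so `C_SOURCE` below is a hypothetical stand-in, and the helper `build_c_dot` is likewise an assumption made for illustration. The sketch needs a C compiler available as `cc`; without one it returns `None` and skips the call.

```python
import ctypes
import shutil
import subprocess
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical C source: the notebook's c_dot.c is not shown, so this is a guess
C_SOURCE = """
long long c_dot(const int *a, const int *b, int n) {
    long long total = 0;
    int i;
    for (i = 0; i < n; i++) {
        total += (long long)a[i] * b[i];
    }
    return total;
}
"""


def build_c_dot():
    """Compile C_SOURCE and return the ctypes function, or None on failure."""
    compiler = shutil.which("cc")
    if compiler is None:
        return None  # no C compiler found on PATH
    workdir = Path(tempfile.mkdtemp())
    src = workdir / "c_dot.c"
    lib_path = workdir / "c_dot.so"
    src.write_text(C_SOURCE)
    try:
        subprocess.run(
            [compiler, "-fPIC", "-shared", "-o", str(lib_path), str(src)],
            check=True,
        )
        lib = ctypes.CDLL(str(lib_path))
    except (subprocess.CalledProcessError, OSError):
        return None
    # Declare the signature so ctypes checks dtype, dimensionality, and layout
    int_array = np.ctypeslib.ndpointer(dtype=np.intc, ndim=1, flags="C_CONTIGUOUS")
    lib.c_dot.argtypes = [int_array, int_array, ctypes.c_int]
    lib.c_dot.restype = ctypes.c_longlong
    return lib.c_dot


# Cast to C int explicitly; randint's default dtype may be wider
a = np.random.randint(1, 100, 100000).astype(np.intc)
b = np.random.randint(1, 100, 100000).astype(np.intc)
expected = int(np.dot(a.astype(np.int64), b.astype(np.int64)))

c_dot = build_c_dot()
result = c_dot(a, b, a.size) if c_dot is not None else None
print(result, expected)
```

Declaring `argtypes` with `np.ctypeslib.ndpointer` makes ctypes reject arrays of the wrong dtype or layout instead of silently passing mismatched memory to the C function, which is the main risk when raw `c_void_p` pointers are used.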
