### Performance of Python data structures

<font size = "4">

Create a list of the integers $0, 1, 2, \dots, 999$ in each of four ways. Which one is fastest?

In [2]:
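def test1():
    # Concatenation: each `l + [i]` builds a new list and copies every element.
    l = []
    for i in range(1000):
        l = l + [i]


def test2():
    # Appending: `append` adds to the existing list in place.
    l = []
    for i in range(1000):
        l.append(i)


def test3():
    # List comprehension.
    l = [i for i in range(1000)]


def test4():
    # Passing the range object directly to the list constructor.
    l = list(range(1000))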

In [3]:
from timeit import timeit, Timer

num_repeats = 1000

# Time each function and report the average time per call in milliseconds.
for name, func in [("test1", test1), ("test2", test2), ("test3", test3), ("test4", test4)]:
    total_time = timeit(stmt="f()", number=num_repeats, globals={"f": func})
    print(f"For {name}, the average time was {1000*total_time/num_repeats} milliseconds")

For test1, the average time was 0.482604666845873 milliseconds
For test2, the average time was 0.0095528329256922 milliseconds
For test3, the average time was 0.007301457924768329 milliseconds
For test4, the average time was 0.005186874885112047 milliseconds
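One likely reason for the gap between `test1` and `test2` is that `l + [i]` creates a new list on every iteration, copying all existing elements, while `append` modifies the existing list in place. The short sketch below checks object identity before and after each operation; the variable names are chosen only for this illustration.

```python
a = []  # grown by concatenation
b = []  # grown by append

concat_same_object = []
append_same_object = []

for i in range(5):
    before = a
    a = a + [i]                      # builds a brand-new list object
    concat_same_object.append(a is before)

    before = b
    b.append(i)                      # mutates the same list object
    append_same_object.append(b is before)

print("concatenation keeps the same object:", concat_same_object)
print("append keeps the same object:       ", append_same_object)
print("same contents:", a == b)
```

Both lists end up with the same contents, but only `append` reuses the original object at every step.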


<font size = "4">

- We can also time the code using instances of the `Timer` class.

- The `timeit` function actually uses an instance of `Timer` internally.
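As a minimal sketch of that relationship, the two measurements below time the same statement, once through the module-level `timeit` function and once through an explicit `Timer` instance. The helper `build_list` is a made-up name for this illustration. `Timer.repeat` runs the whole measurement several times; taking the minimum of the rounds is a common way to reduce noise from other processes.

```python
from timeit import timeit, Timer


def build_list():
    # Hypothetical helper, used only for this illustration.
    return list(range(1000))


number = 1000

# Module-level convenience function.
total_a = timeit(stmt="f()", number=number, globals={"f": build_list})

# The same measurement with an explicit Timer instance.
timer = Timer(stmt="f()", globals={"f": build_list})
total_b = timer.timeit(number=number)

# repeat() returns a list with one total time (in seconds) per round.
rounds = timer.repeat(repeat=5, number=number)
best = min(rounds)

print(f"timeit():       {1000 * total_a / number:.6f} milliseconds per call")
print(f"Timer.timeit(): {1000 * total_b / number:.6f} milliseconds per call")
print(f"best of 5:      {1000 * best / number:.6f} milliseconds per call")
```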

In [4]:
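# Timer.timeit returns the total time in seconds for `number` runs. With number=1000,
# that total is numerically equal to the average time per run in milliseconds.
t1 = Timer("test1()", "from __main__ import test1")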
print(f"concatenation: {t1.timeit(number=1000):15.4f} milliseconds")
t2 = Timer("test2()", "from __main__ import test2")
print(f"appending: {t2.timeit(number=1000):19.4f} milliseconds")
t3 = Timer("test3()", "from __main__ import test3")
print(f"list comprehension: {t3.timeit(number=1000):10.4f} milliseconds")
t4 = Timer("test4()", "from __main__ import test4")
print(f"list range: {t4.timeit(number=1000):18.4f} milliseconds")

concatenation:          0.4804 milliseconds
appending:              0.0095 milliseconds
list comprehension:     0.0077 milliseconds
list range:             0.0051 milliseconds


<font size = "4">

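Using `Timer` is convenient when the code you are testing changes a mutable object. The setup statement runs once per call to `timeit`, so every repetition acts on the same object, and that object has to be large enough to survive all of them.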

In [5]:
# demonstration of ".pop()" method

x = [1, 2, 3, 4, 5]
val = x.pop()
print("popped value:", val)
print("x =", x)
print()

x = [1, 2, 3, 4, 5]
val = x.pop(0)
print("popped value:", val)
print("x =", x)
print()
x = [1, 2, 3, 4, 5]
val = x.pop(1)
print("popped value:", val)
print("x =", x)


popped value: 5
x = [1, 2, 3, 4]

popped value: 1
x = [2, 3, 4, 5]

popped value: 2
x = [1, 3, 4, 5]


In [6]:
# This will cause an error: x.pop() changes x, and x has only 200 elements,
# so the list is empty long before the 1000 repetitions are done.
x = list(range(200))
total_time = timeit(stmt = "x.pop()", number = 1000, 
    globals = {"x" : x})
print(f"x.pop(), the average time was {1000*total_time/num_repeats} milliseconds")

IndexError: pop from empty list
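One way to avoid this error, sketched here under the assumption that the list only needs to be as long as the number of repetitions, is to build the list in the `setup` argument. The setup code runs once per call to `timeit`, before the timed loop starts.

```python
from timeit import timeit

number = 1000

# setup runs once, before the timed loop, so x starts with exactly
# `number` elements and the last repetition pops the final one.
total_time = timeit(stmt="x.pop()", setup=f"x = list(range({number}))", number=number)
print(f"x.pop(), the average time was {1000 * total_time / number} milliseconds")

# One repetition too many still fails, as in the cell above.
try:
    timeit(stmt="x.pop()", setup=f"x = list(range({number}))", number=number + 1)
    raised = False
except IndexError:
    raised = True
print("IndexError with one extra repetition:", raised)
```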

In [7]:
pop_zero = Timer("x.pop(0)", "from __main__ import x")
pop_end = Timer("x.pop()", "from __main__ import x")

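# x has to exist before timeit() is called, because that is when the setup import runs.
# The list is far longer than the 1000 pops performed by each timer.
x = list(range(2_000_000))
# timeit() returns total seconds for 1000 runs, which equals the average milliseconds per run.
time1 = pop_zero.timeit(number=1000)
time2 = pop_end.timeit(number=1000)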


print(f"pop(0): {time1:10.8f} milliseconds")
print(f"pop(): {time2:11.8f} milliseconds")


pop(0): 0.26030337 milliseconds
pop():  0.00001025 milliseconds
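The difference arises because `pop(0)` has to shift every remaining element one position to the left, while `pop()` only removes the last one. A rough sketch of how this scales with list length follows; the sizes are arbitrary choices for illustration, and the exact timings will vary by machine.

```python
from timeit import Timer

number = 1000  # pops per measurement; every list below is much longer than this
results = {}

for n in [10_000, 100_000, 1_000_000]:
    # Each Timer builds its own fresh list in the setup statement.
    setup = f"x = list(range({n}))"
    t_front = Timer("x.pop(0)", setup).timeit(number=number)
    t_end = Timer("x.pop()", setup).timeit(number=number)
    results[n] = (t_front, t_end)
    # Total seconds for 1000 runs equals average milliseconds per run.
    print(f"n = {n:>9}: pop(0) {t_front:.8f} ms, pop() {t_end:.8f} ms")
```

If the shifting explanation holds, the `pop(0)` column should grow roughly in proportion to `n`, while the `pop()` column stays about the same.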


### Test: the `in` operator with lists and dictionaries

In [None]:
# Reminder: Dictionaries vs. lists

family_ages = [36, 12, 37, 24, 45]
print(family_ages[3])

family_ages = {"Kevin" : 36, "Mary": 12, "Ken" : 37, "Peter" : 24, "Bob" : 45, 55: "DATASCI"}
print(family_ages["Peter"])
print(family_ages[55])

24
24
DATASCI


In [9]:
# Demonstration of the "in" operator for lists and dictionaries

import random

n = 8 
x = list(range(n))
y = {j: None for j in range(n)}
print("x:", x)
print("y:", y , '\n')
t = random.randrange(n)
print("t:", t)
print("t in x:", t in x)
print("t in y:",t in y)

x: [0, 1, 2, 3, 4, 5, 6, 7]
y: {0: None, 1: None, 2: None, 3: None, 4: None, 5: None, 6: None, 7: None} 

t: 1
t in x: True
t in y: True


In [10]:
n_vals = [10_000, 100_000, 1_000_000, 10_000_000]

num_repeats = 100

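# Note: each timed statement also includes one call to random.randrange,
# so the measured time is the lookup plus the cost of drawing the target.
print("List test:")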
for n in n_vals:
    x = list(range(n))
    total_time = timeit(stmt="f(n) in x", number = num_repeats, 
        globals = {"f" : random.randrange, "n" : n, "x" : x})
    print(f"For n = {n}, the average time was {1000*total_time/num_repeats} milliseconds")
print()
print("Dict test:")
for n in n_vals:
    y = {j: None for j in range(n)}
    total_time = timeit(stmt="f(n) in y", number = num_repeats, 
        globals = {"f" : random.randrange, "n" : n, "y" : y})
    print(f"For n = {n}, the average time was {1000*total_time/num_repeats} milliseconds")

List test:
For n = 10000, the average time was 0.026165831368416548 milliseconds
For n = 100000, the average time was 0.3321683290414512 milliseconds
For n = 1000000, the average time was 1.3050504098646343 milliseconds
For n = 10000000, the average time was 13.553817078936845 milliseconds

Dict test:
For n = 10000, the average time was 0.00020333100110292435 milliseconds
For n = 100000, the average time was 0.0002804095856845379 milliseconds
For n = 1000000, the average time was 0.0017433310858905315 milliseconds
For n = 10000000, the average time was 0.0013658404350280762 milliseconds
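The list times grow roughly in proportion to `n`, while the dictionary times stay nearly flat. Python's `set` type uses the same kind of hash-based lookup as a dictionary, so it is a natural choice when only membership matters and no values need to be stored. The sketch below compares a list and a set built from the same values; the choice of `n` and of the target value is arbitrary.

```python
from timeit import timeit

n = 100_000
num_repeats = 100

x = list(range(n))
s = set(x)

# The last element forces the list search to scan the whole list.
target = n - 1

list_time = timeit("target in x", number=num_repeats, globals={"target": target, "x": x})
set_time = timeit("target in s", number=num_repeats, globals={"target": target, "s": s})

print(f"list: {1000 * list_time / num_repeats} milliseconds")
print(f"set:  {1000 * set_time / num_repeats} milliseconds")
```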


<font size = "4">

- The following reference summarizes the time complexity of common operations on the standard built-in Python data structures: [Time Complexity Wiki](https://wiki.python.org/moin/TimeComplexity)

- An exhaustive look at complexity in Python can be found at [pythoncomplexity.com](https://pythoncomplexity.com/)