# Optimize Performance and Memory Use

When you begin to use Python regularly in your work, you'll start noticing bottlenecks in your code. Some workflows may run at lightning speed, while others take hours of processing time to complete, or even crash.

Avoiding bloat is invaluable as you move toward using code for automation, bigger data, and working with APIs. Code efficiency means:
- Less chance of a slowdown or crash: the dreaded MemoryError.
- Quicker response time and fewer bottlenecks for the larger workflow.
- Better scaling.
- Efficient code is often (but not always!) cleaner and more readable.

Let's look at some ways you can reduce bloat in your code.


Access and store only what you need, no more.
- __Storage__: avoid a list where you could use a tuple
- __Membership look-up__: avoid a list (or tuple) where you could use a set (or dictionary)
- __Iteration__: avoid a function (or list comprehension) where you could use a generator (or generator expression)
- __Profile__: make time for performance checks by profiling your code for bottlenecks

## Use more tuples and fewer lists

If you have a collection of values, your first thought may be to store them in a list.

In [2]:
data_list = [17999712, 2015, 'Hawkins Road', 'Linden ', 'NC', 28356]

Lists are nice because they are very flexible. You can change the values in the list, including appending and removing values. But that flexibility comes at a cost. Lists are less efficient than tuples. For example, they use more memory.

In [3]:
import sys

data_tuple = (17999712, 2015, 'Hawkins Road', 'Linden ', 'NC', 28356)

print(sys.getsizeof(data_list))
print(sys.getsizeof(data_tuple))

104
88


If you aren't going to be changing the values in a collection, use a tuple instead of a list.
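As a quick check of the claim above, the sketch below compares a larger list and tuple holding the same values, and shows that a tuple rejects item assignment. Note that `sys.getsizeof` reports only the size of the container itself, not of the objects it references, and the exact byte counts are a CPython implementation detail that can vary between versions. The variable names are illustrative.

```python
import sys

values = range(1000)
as_list = list(values)
as_tuple = tuple(values)

# Size of the containers only; the integers they point to are not counted.
print("list bytes: ", sys.getsizeof(as_list))
print("tuple bytes:", sys.getsizeof(as_tuple))

# A tuple cannot be modified after it is created.
tuple_is_immutable = False
try:
    as_tuple[0] = -1
except TypeError as err:
    tuple_is_immutable = True
    print("Tuples are immutable:", err)
```

If you later find you do need to change the values, converting with `list(as_tuple)` gives you a mutable copy.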

### Membership look-up: sequential vs. hash-based

However, when you want to see if an element _already exists_ in a collection of elements, use a set or dictionary to store that collection if possible.

- List and tuple lookup is **sequential**: Python compares elements one by one, so the bigger the collection, the longer look-up takes.
- Set and dictionary lookup is **hash-based**: Python computes a hash of the value and goes straight to where it would be stored. On average, lookup takes the same amount of time, no matter how much data there is.
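Sets and dictionary keys rely on hashing, so their members must themselves be hashable. The short sketch below, using made-up placename pairs, shows that tuples can be stored in a set while lists cannot, which is one more reason to prefer tuples for fixed records.

```python
# Tuples of hashable values are hashable, so they can be set members.
placenames = {("Duluth", "MN"), ("Linden", "NC")}
print(("Duluth", "MN") in placenames)

# Lists are mutable and therefore unhashable; Python refuses to store them in a set.
list_rejected = False
try:
    bad = {["Duluth", "MN"]}
except TypeError as err:
    list_rejected = True
    print("Lists cannot be set members:", err)
```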

The example below shows that a set is over 100x faster than a list in calculating the first 10,000 values of [Recaman's sequence](https://oeis.org/search?q=recaman&language=english&go=Search).

In [None]:
def recaman_check(cur, i, visited):
    return (cur - i) < 0 or (cur - i) in visited

def recaman_list(n: int) -> list[int]:
    """
    return a list of the first n numbers of the Recaman series
    """

    visited_list = [0]
    current = 0
    for i in range(1, n):
        if recaman_check(current, i, visited_list):
            current += i
        else:
            current -= i
        visited_list.append(current)
    return visited_list

In [None]:
%%timeit
recaman_list(10000)

94 ms ± 655 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [6]:
def recaman_set(n: int) -> set[int]:
    """
    return a set of the first n numbers of the Recaman series
    """
    visited_set = {0}
    current = 0
    for i in range(1, n):
        if recaman_check(current, i, visited_set):
            current += i
        else:
            current -= i
        visited_set.add(current)
    return visited_set

In [7]:
%%timeit
recaman_set(10000)

913 μs ± 5.31 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
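One thing to keep in mind is that a set is unordered, so `recaman_set` tells you which values were visited but not in what order. If you need the sequence itself, a possible variation is to keep a list for the ordered output and a set purely for the membership check. The function name `recaman_ordered` is hypothetical and the sketch follows the same rule as the cells above.

```python
def recaman_ordered(n: int) -> list[int]:
    """Return the first n Recaman values in order, using a set for look-ups."""
    sequence = [0]   # ordered output
    visited = {0}    # fast membership checks
    current = 0
    for i in range(1, n):
        candidate = current - i
        if candidate < 0 or candidate in visited:
            current += i
        else:
            current = candidate
        sequence.append(current)
        visited.add(current)
    return sequence

print(recaman_ordered(10))  # [0, 1, 3, 6, 2, 7, 13, 20, 12, 21]
```

This uses more memory than either version alone, since values are stored twice, but the membership check stays fast and the order is preserved.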


## Use more generators

We often use functions to operate on data, but generators can be more memory-efficient and faster for certain tasks.

**Regular functions and comprehensions** typically store outputs into containers, like lists or dictionaries. This can take up unnecessary memory, especially when we're creating multi-step workflows with many intermediate outputs.

In contrast, **generators** only hold one data item in memory at a time. A generator is a type of iterator that produces results on-demand (lazily), maintaining its state between iterations.

In [9]:
def massive_rf():
  """A regular function that produces even numbers, endlessly."""
  x_list = []
  x = 0
  while True:
    x_list.append(x)
    x += 2

In [None]:
# Eventually x_list will be bigger than memory
massive_rf()

In [None]:
def massive_gen():
  """A generator that produces even numbers, endlessly."""
  x = 0
  while True:
    yield x
    x += 2

# This will run until the machine loses power
for x in massive_gen():
  print(x)
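If you only want a few values from an endless generator, you can pull them on demand rather than looping forever. The sketch below repeats the `massive_gen` definition so it runs on its own, then uses `next` and the standard-library `itertools.islice` to take a fixed number of values.

```python
from itertools import islice

def massive_gen():
    """A generator that produces even numbers, endlessly."""
    x = 0
    while True:
        yield x
        x += 2

# Ask for one value at a time; the generator pauses at `yield` between calls.
evens = massive_gen()
first = next(evens)
second = next(evens)
print(first, second)  # 0 2

# Take the first five values from a fresh generator without an infinite loop.
first_five = list(islice(massive_gen(), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```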

What goes for functions also goes for list comprehensions. You can often use a generator expression in place of a list comprehension. We've already seen an example of a generator expression in the n-dimensional distance function:

In [17]:
coords = (1, 1, 1, 1)
sum(d ** 2 for d in coords)


4

Compare that example to one that uses a list comprehension:

In [18]:
coords = (1, 1, 1, 1)
sum([d ** 2 for d in coords])

4

The `sum` function operates by looping over an iterable and adding the value to a running total. In the first case, the iterable is a generator that produces a single value at a time. 

In the second case, the list comprehension loops over `coords` to produce a list where every value is stored in memory. Then the `sum` function loops over that list.
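To make the difference concrete, the sketch below builds both forms over a larger tuple and compares the size of the resulting objects. The variable names are illustrative, and `sys.getsizeof` counts only the list or generator object itself, so treat the numbers as a rough indication rather than a full memory measurement.

```python
import sys

coords = (1, 1) * 100_000

squares_list = [d ** 2 for d in coords]   # every value is stored up front
squares_gen = (d ** 2 for d in coords)    # nothing is computed yet

print("list bytes:     ", sys.getsizeof(squares_list))
print("generator bytes:", sys.getsizeof(squares_gen))

# Both give the same total once they are consumed.
total_from_list = sum(squares_list)
total_from_gen = sum(squares_gen)
print(total_from_list == total_from_gen)  # True
```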

An important limitation of generators is that because they produce a single value at a time and then forget about it, you cannot reuse them.

In [45]:
generator = (d ** 2 for d in coords)
sum(generator)

4

In [46]:
max(generator)

ValueError: max() iterable argument is empty

**Big Takeaway**: If you're only going to use a value once, you should probably use a generator. If you need to use it again, you probably need to store it in something like a list or tuple.
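A minimal sketch of that trade-off, using the small `coords` tuple from earlier: once a generator has been consumed it yields nothing more, whereas a tuple built from the same expression can be read as many times as you like.

```python
coords = (1, 1, 1, 1)

# A generator can be consumed only once.
squares_gen = (d ** 2 for d in coords)
total = sum(squares_gen)
leftover = list(squares_gen)   # the generator is already exhausted
print(total, leftover)         # 4 []

# Storing the values in a tuple lets you reuse them.
squares = tuple(d ** 2 for d in coords)
total_again = sum(squares)
largest = max(squares)
print(total_again, largest)    # 4 1
```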

## Profile, don't guess.

Profiling is any technique used to measure the performance of your code, in particular its speed. There are dozens of tools available for profiling. We'll use a few to:
1. **Check memory use:** Use `tracemalloc` to check the memory usage of code.
1. **Spot-profile your code:** Use the `timeit` notebook magic to perform some basic profiling by cell or by line.
1. **Profile your script comprehensively:** The `cProfile` module breaks execution down call by call, reporting how many times each function was called and the total time spent in it.

To make profiling easier, the cell below defines functions for calculating a sum on a generator expression and on a list comprehension. Both functions will be called with a very large number of coordinates to make profile differences more obvious.

In [1]:
coords = (1, 1) * 1_000_000
def sum_generator(coords):
    return sum(d ** 2 for d in coords)

def sum_list_comprehension(coords):
    return sum([d ** 2 for d in coords])

### Check memory use

The cells below use `tracemalloc` to capture information about memory usage for the two versions of the function.
 

In [2]:
import tracemalloc

tracemalloc.start()

sum_generator(coords)

current, peak = tracemalloc.get_traced_memory()
print(peak)

22542


In [2]:
import tracemalloc

tracemalloc.start()

sum_list_comprehension(coords)

current, peak = tracemalloc.get_traced_memory()
print(peak)

17129595
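Because `tracemalloc.get_traced_memory()` reports the peak since tracing started, measuring two functions in the same session can let the first result carry over into the second. One possible approach, sketched below, is to wrap each measurement in a small helper that starts and stops `tracemalloc` around the call so every measurement begins fresh. The helper name `peak_bytes` is hypothetical, not part of `tracemalloc`, and the smaller `coords` simply keeps the sketch quick.

```python
import tracemalloc

def sum_generator(coords):
    return sum(d ** 2 for d in coords)

def sum_list_comprehension(coords):
    return sum([d ** 2 for d in coords])

def peak_bytes(func, *args):
    """Return the peak traced memory, in bytes, while func(*args) runs."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Stopping clears the traces, so the next call starts from zero.
        tracemalloc.stop()
    return peak

coords = (1, 1) * 100_000
peak_gen = peak_bytes(sum_generator, coords)
peak_list = peak_bytes(sum_list_comprehension, coords)

print("generator peak:         ", peak_gen)
print("list comprehension peak:", peak_list)
```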


### Spot-check speed with `%%timeit`

The `timeit` module measures the execution time of a snippet of code. In notebooks, one of the most common ways you'll see it used is through "magic" commands.

`%%timeit` is a form of cell magic. It measures the execution time of the entire notebook cell.

In [3]:
%%timeit
sum_generator(coords)

49.6 ms ± 277 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [4]:
%%timeit
sum_list_comprehension(coords)

31.1 ms ± 430 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)


If you just want to check the timing for a single line, you can use the `%timeit` line magic. That's useful if your cell contains code that takes some time to run, but you don't want it affecting the `timeit` results. Compare the use of cell magic and line magic in the next two cells.

In [30]:
%%timeit
from time import sleep

sleep(1)

sum_list_comprehension(coords)

1.05 s ± 1.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [31]:
from time import sleep

sleep(1)

%timeit sum_list_comprehension(coords)

30.9 ms ± 134 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
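The magics only work inside IPython and Jupyter. In a plain Python script you can get a similar measurement from the `timeit` module directly. The sketch below is one way to do it, with a smaller `coords` and arbitrary `repeat` and `number` values chosen only to keep it quick.

```python
import timeit

def sum_list_comprehension(coords):
    return sum([d ** 2 for d in coords])

# Setup work stays outside the timed call, just like with the line magic.
coords = (1, 1) * 100_000

calls_per_run = 5
timings = timeit.repeat(
    lambda: sum_list_comprehension(coords),
    repeat=3,              # number of separate timing runs
    number=calls_per_run,  # calls per run
)

# Each entry is the total time for one run, so divide by the calls per run.
best_per_call = min(timings) / calls_per_run
print(f"best of {len(timings)} runs: {best_per_call * 1000:.2f} ms per call")
```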


### Profile with `cProfile`

While `timeit` is a quick way to test speed, `cProfile` is useful as a comprehensive and holistic code profiler. Some perks of `cProfile`:
 - Compare which functions take longest to execute
 - See how often a function is executed
 - Sort profiling results by time
 - See which functions call, or are called by, a given function (through the `pstats` module)
 - Print detailed reports with multiple statistics

In [9]:
import cProfile

cProfile.run('sum_generator(coords)', sort='tottime')

         2000672 function calls (2000662 primitive calls) in 0.272 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.136    0.136    0.232    0.232 {built-in method builtins.sum}
  2000001    0.109    0.000    0.109    0.000 1708522561.py:3(<genexpr>)
        2    0.006    0.003    0.010    0.005 socket.py:700(send_multipart)
        2    0.005    0.003    0.016    0.008 iostream.py:276(<lambda>)
        2    0.005    0.002    0.008    0.004 {method '__exit__' of 'sqlite3.Connection' objects}
        3    0.004    0.001    0.031    0.010 base_events.py:1953(_run_once)
       14    0.003    0.000    0.005    0.000 socket.py:623(send)
        2    0.003    0.002    0.005    0.003 {method 'recv' of '_socket.socket' objects}
      5/3    0.000    0.000    0.006    0.002 events.py:87(_run)
        1    0.000    0.000    0.000    0.000 inspect.py:3120(_bind)
        1    0.000    0.000    0.000    0.000 {method 'disa

In [8]:
cProfile.run('sum_list_comprehension(coords)', sort='tottime')

         387 function calls (378 primitive calls) in 0.041 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        4    0.015    0.004    0.015    0.004 socket.py:623(send)
        1    0.006    0.006    0.006    0.006 {built-in method builtins.sum}
      2/1    0.005    0.003    0.026    0.026 history.py:92(only_when_enabled)
        2    0.005    0.003    0.005    0.003 {method 'recv' of '_socket.socket' objects}
        1    0.005    0.005    0.005    0.005 {method 'poll' of 'select.epoll' objects}
        1    0.002    0.002    0.008    0.008 1708522561.py:5(sum_list_comprehension)
      2/1    0.000    0.000    0.015    0.015 history.py:1008(_writeout_input_cache)
      4/1    0.000    0.000    0.034    0.034 events.py:87(_run)
        1    0.000    0.000    0.000    0.000 inspect.py:3120(_bind)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000   
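The reports above include rows from the notebook machinery itself (for example `socket.py` and `history.py`), which can make them long. A sketch of one way to get a shorter report, assuming you are comfortable with the standard-library `pstats` module: collect the profile with a `cProfile.Profile` object, then sort the statistics and print only the top few rows. The smaller `coords` and the row limit of 5 are arbitrary choices for illustration.

```python
import cProfile
import io
import pstats

def sum_generator(coords):
    return sum(d ** 2 for d in coords)

coords = (1, 1) * 100_000

# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
sum_generator(coords)
profiler.disable()

# Write the report to a string so it can be printed, trimmed, or saved.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("tottime").print_stats(5)  # at most 5 rows

report = buffer.getvalue()
print(report)
```

Sorting by `"cumtime"` instead of `"tottime"` ranks functions by the time spent in them including everything they call, which can be more helpful for finding the slow step in a larger workflow.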

## Exercises

### 1) Use tuples

The code below creates a list containing all years in a research study timeframe, from 1900 to 2030.

The values in this collection will not need to be changed because the study will always use this timeframe.

In [11]:
import sys

def listFromRange(r1, r2):
  """Create a list from a range of values"""
  return list(range(r1, r2+1))

start = 1900
end = 2030

studyYears = listFromRange(start, end)

print(studyYears)
print("Bytes used: ", sys.getsizeof(studyYears))

[1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030]
Bytes used:  1112


Write a different implementation using a storage option that takes up less memory.

### 2) Use sets

The code below assigns a collection of placenames to a list. Then, it checks whether a placename is in the list. If not, the placename is reported missing.

If you have 1 million placenames to look up and 6 names in the list, that’s up to 6 million checks.

In [12]:
placeNames_list = ["Kinshasa", "Duluth", "Uruguay", "Doherty Residence", "Dinkytown", "Khazad-dum"]

# List look-up: O(n), each name may be compared in turn
if "Dinkytown" not in placeNames_list:
    print("Missing.")

Write a different implementation using a storage option that allows quicker checks for membership at scale.

### 3) Use generators

The code below uses a generator to create vertices for triangles from a random selection. It also defines a function for calculating the area of a polygon from its vertices.

In [None]:
from itertools import cycle
from random import choice

def generate_triangles(num_triangles):
    vertex_a_options = ((0, 0), (1, 1), (2, 2))
    vertex_b_options = ((0, 10), (1, 11), (2, 12))
    vertex_c_options = ((10, 0), (11, 1), (12, 2))

    for _ in range(num_triangles):
        vertex_a = choice(vertex_a_options)
        vertex_b = choice(vertex_b_options)
        vertex_c = choice(vertex_c_options)
        yield (vertex_a, vertex_b, vertex_c)

def calculate_area(vertices):
    subtotals = []
    vertex_cycle = cycle(vertices)
    next(vertex_cycle)
    for vertex in vertices:
        x, y = vertex
        nextx, nexty = next(vertex_cycle)
        subtotal = x * nexty - y * nextx
        subtotals.append(subtotal)
    area = abs(sum(subtotals) / 2)
    return area

The code below generates 1 million triangles and finds the area of the largest one using a list comprehension.

In [44]:
triangles = generate_triangles(1_000_000)
max([calculate_area(triangle) for triangle in triangles])


70.0

Rewrite the code above to use less memory. 

Hint: The easiest fix is to replace the list comprehension with a generator expression. Harder would be writing a generator using the `yield` statement.

### 4) Compare execution speed

Using `timeit`, compare the execution time of the two versions of the maximum polygon area code.

Hint: Think about whether you should use the `%%timeit` cell magic or the `%timeit` line magic.

### 5) Check memory usage

Using `tracemalloc`, compare the memory usage of the two versions of the maximum polygon area code.

Hint: Because the notebook keeps many variables in memory, you will want to restart the notebook kernel between runs of each version to get a valid comparison.



---

