# Debugging, Profiling and Testing in Python

## Python under-the-hood

### What happens when I click run on a Python script/cell?

1. **Parser**: Checks syntax and breaks down Python code into manageable parts.
  
1. **Interpreter**: 
    - Translates parsed code into bytecode, optimizing it for execution.
    - Executes bytecode, producing the output defined by the Python code, and manages program execution.

As programmers, we hope that all our bugs are syntax errors or runtime errors (the first two kinds listed below). That way, Python catches our mistakes for us: it reports an error message and stops execution.
Logical errors, by contrast, do not stop execution. The program runs to completion and silently produces a wrong result.
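
The two stages described above can be observed from within Python itself. The sketch below is an illustration only (the source strings and the `<example>` / `<broken>` file labels are made up for this purpose): the built-in `compile()` parses and compiles source without running it, `dis.dis()` lists the resulting bytecode, and a missing colon is rejected before a single line executes.

```python
import dis

# Stage 1: parse and compile to bytecode. Nothing is executed yet.
source_ok = "x = 1 + 2\nprint(x)"
code = compile(source_ok, "<example>", "exec")
dis.dis(code)  # list the bytecode instructions (exact output varies by Python version)

# Stage 2: execute the bytecode.
namespace = {}
exec(code, namespace)  # prints 3

# A syntax error is caught at stage 1, so no part of the source runs,
# not even the valid print() on its first line.
source_broken = "print('start')\ndef say_hello(name)\n    print('Hello', name)"
try:
    compile(source_broken, "<broken>", "exec")
    syntax_error_caught = False
except SyntaxError:
    syntax_error_caught = True

print("Syntax error caught before execution:", syntax_error_caught)
```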

### The three main types of program errors:

- Syntax Errors
- Runtime Errors
- Logical Errors

### Syntax error example:

In [2]:
def say_hello(name)
    print("Hello" ,name,"!")

SyntaxError: expected ':' (1972436306.py, line 1)

### Runtime error example:

In [3]:
numerator = 10
denominator = 0

result = numerator/denominator
result

ZeroDivisionError: division by zero

#### Another example:

In [4]:
result = "Hello" + 42
result

TypeError: can only concatenate str (not "int") to str

### Logical error example:

In [5]:
data = [10, 20, 30, 40, 50]
total = sum(data)
count = len(data)
average = total / (count + 1)

average

25.0

#### Another example:

In [6]:
def calculate_discounted_price(original_price, discount_percentage):
    discounted_price = original_price - discount_percentage
    return discounted_price

pants_price = 100
discount = 20
final_price = calculate_discounted_price(pants_price, discount)
final_price

80
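
Since Python raises no exception for a logical error, the only way to catch one is to compare the result with a value known to be correct. Here is a minimal sketch, assuming the intended behavior is a plain arithmetic mean and a percentage discount; the `_buggy` / `_fixed` function names and the second test price of 50 are introduced here for illustration. Note that the buggy discount function happens to return the right answer for a price of 100, which is why a single check is not enough.

```python
import math
import statistics

data = [10, 20, 30, 40, 50]
buggy_average = sum(data) / (len(data) + 1)  # divisor is off by one
fixed_average = sum(data) / len(data)

# An independent reference exposes the bug.
assert fixed_average == statistics.mean(data)
assert buggy_average != statistics.mean(data)


def calculate_discounted_price_buggy(original_price, discount_percentage):
    # Subtracts the percentage as if it were an amount of money.
    return original_price - discount_percentage


def calculate_discounted_price_fixed(original_price, discount_percentage):
    # Removes discount_percentage percent of the original price.
    return original_price * (1 - discount_percentage / 100)


# At a price of 100 both versions agree, so the bug stays hidden.
assert math.isclose(calculate_discounted_price_buggy(100, 20), 80)
assert math.isclose(calculate_discounted_price_fixed(100, 20), 80)

# At a price of 50 only the fixed version gives 20% off (40).
assert math.isclose(calculate_discounted_price_fixed(50, 20), 40)
assert not math.isclose(calculate_discounted_price_buggy(50, 20), 40)
```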

### Exercises

Fix the following bugs:

In [7]:
x = 1
y = 0
while x < 4:
    y += x
print(y)

KeyboardInterrupt: 

In [8]:
switch = 'on'
if switch = 'off':
    print('go home')

SyntaxError: invalid syntax. Maybe you meant '==' or ':=' instead of '='? (905021021.py, line 2)

In [9]:
range(2.5)

TypeError: 'float' object cannot be interpreted as an integer

In [10]:
range(2,3,0)

ValueError: range() arg 3 must not be zero

### How do we debug our code?

- **Print Statements**: Output variable values or execution flow strategically.
  
- **Debugger**: Use built-in or IDE debuggers for real-time code inspection and variable tracking.
  
- **Logging**: Record diagnostic info with Python's logging module.
  
- **Assertions**: Validate conditions with `assert` statements to catch errors.
  
- **Interactive Exploration**: Experiment interactively in environments like Jupyter.
  
- **Profiling Tools**: Analyze code performance with tools like `snakeviz` for optimization.

We will focus on real-time code inspection, interactive exploration, and profiling.
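
Two of the techniques listed above, logging and assertions, fit in a few lines. This is a sketch rather than a recommended setup: the function `mean_of`, the logger name `debug_demo`, and the message format are chosen here for illustration.

```python
import logging

# Show DEBUG messages and above; a real project would usually configure this once.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s:%(name)s:%(message)s")
logger = logging.getLogger("debug_demo")


def mean_of(values):
    # Assertion: fail loudly on input the function was never meant to handle.
    assert len(values) > 0, "mean_of() needs at least one value"
    total = sum(values)
    # Logging: record intermediate state without changing the return value.
    logger.debug("total=%s count=%s", total, len(values))
    return total / len(values)


result = mean_of([10, 20, 30, 40, 50])
print(result)

try:
    mean_of([])
except AssertionError as err:
    print("Assertion failed:", err)
```

Assertions are skipped when Python runs with the `-O` flag, so they suit internal sanity checks rather than validation of user input.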

### Exercise

Find the bug in the following code:

In [29]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

result = factorial(5) # try a float
print("Factorial of 5:", result)

Factorial of 5: 120


### Mutability vs. immutability

In [35]:
def add_to_list(item, my_list=[]):  # Mutable default argument!
    my_list.append(item)
    return my_list

# First call
print("First call:")
result1 = add_to_list("apple")
print(result1)  # Output: ['apple']

# Second call - expecting a new list with just "banana"
print("\nSecond call:")
result2 = add_to_list("banana")
print(result2)  # Output: ['apple', 'banana'] - SURPRISE!

First call:
['apple']

Second call:
['apple', 'banana']


In [36]:
def add_to_list_fixed(item, my_list=None):
    if my_list is None:
        my_list = []  # Create a new list each time
    my_list.append(item)
    return my_list

print("\n\nFixed version:")

# First call
print("First call (fixed):")
fixed1 = add_to_list_fixed("apple")
print(fixed1)  # Output: ['apple']

# Second call
print("\nSecond call (fixed):")
fixed2 = add_to_list_fixed("banana")
print(fixed2)  # Output: ['banana'] - Works as expected!



Fixed version:
First call (fixed):
['apple']

Second call (fixed):
['banana']
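
The surprise comes from when the default value is created: once, when the `def` statement runs, not on every call. The sketch below inspects the function's `__defaults__` attribute (a standard attribute of Python functions) to show that both calls return the very same list object. It then contrasts this with an immutable tuple, where `+=` builds a new object instead of changing the existing one.

```python
def add_to_list(item, my_list=[]):
    my_list.append(item)
    return my_list


print(add_to_list.__defaults__)  # ([],)  the default list exists before any call

first = add_to_list("apple")
second = add_to_list("banana")

# Both results are the same object as the stored default.
print(first is second)                       # True
print(first is add_to_list.__defaults__[0])  # True
print(add_to_list.__defaults__)              # (['apple', 'banana'],)

# Immutable objects cannot be changed in place: += rebinds the name instead.
point = (1, 2)
alias = point
point += (3,)
print(alias)  # (1, 2)  the original tuple is untouched
print(point)  # (1, 2, 3)
```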


https://pythontutor.com/

https://pythonspeed.com/articles/minimizing-copying/

## Profiling

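Profiling measures where a program actually spends its time, so that optimization effort goes to the real bottlenecks rather than to guesses. Below we use `snakeviz`, a browser-based viewer for profiles recorded with Python's built-in `cProfile` module; in a notebook, the `%%snakeviz` cell magic profiles a whole cell.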

In [14]:
%load_ext snakeviz
%snakeviz_config -h localhost -p 8900

The snakeviz extension is already loaded. To reload it, use:
  %reload_ext snakeviz
Snakeviz configured with host localhost and port 8900


In [15]:
%%snakeviz
# https://blog.finxter.com/python-cprofile-a-helpful-guide-with-prime-example/
import random

def guess():
    ''' Returns a random number '''
    return random.randint(2, 1000)

def is_prime(n):
    ''' Checks whether n is prime '''
    for i in range(2, n):
        for j in range(2, n):
            if i * j == n:
                return False
    return True

def find_primes(num):
    primes = []
    while len(primes) < num:
        p = guess()
        if is_prime(p):
            primes.append(p)
    return primes

print(find_primes(100))

[131, 281, 101, 269, 617, 577, 941, 593, 787, 467, 311, 211, 421, 373, 263, 619, 953, 751, 953, 83, 233, 257, 19, 619, 691, 313, 643, 631, 499, 821, 787, 809, 557, 907, 17, 809, 197, 661, 401, 593, 797, 613, 701, 617, 97, 857, 79, 223, 349, 367, 17, 151, 241, 137, 137, 191, 797, 953, 719, 449, 31, 863, 397, 251, 313, 761, 727, 743, 967, 421, 389, 769, 113, 11, 661, 227, 541, 691, 241, 863, 997, 547, 113, 617, 463, 359, 29, 107, 823, 433, 431, 11, 223, 607, 53, 839, 53, 577, 53, 823]
 
*** Profile stats marshalled to file '/tmp/tmp4tqq92pq'.
Embedding SnakeViz in this document...
<function display at 0x7fcbf4512c20>
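
SnakeViz displays profiles recorded by Python's built-in `cProfile` module, and the same data can be read as plain text with `pstats`. The following is a minimal sketch on a reduced version of the example above; `slow_is_prime`, `count_primes`, and the limit of 60 are stand-ins chosen here so that it runs quickly.

```python
import cProfile
import io
import pstats


def slow_is_prime(n):
    """Deliberately slow primality check, mirroring the nested-loop version above."""
    for i in range(2, n):
        for j in range(2, n):
            if i * j == n:
                return False
    return True


def count_primes(limit):
    """Count the primes in the range [2, limit)."""
    return sum(1 for n in range(2, limit) if slow_is_prime(n))


# Record a profile of a single call.
profiler = cProfile.Profile()
profiler.enable()
prime_count = count_primes(60)
profiler.disable()

# Format the five most expensive entries, sorted by cumulative time.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report)
stats.sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

In the report, `cumtime` is the time spent in a function including everything it calls, while `tottime` excludes the calls it makes.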


### Another example:

In [17]:
%%snakeviz
# try this example without functions
def load_word_list(file_path):
    with open(file_path, 'r') as file:
        return set(word.strip() for word in file)

def check_words_against_file(main_file_path, word_file_path):
    unmatched_words = []
    
    word_list = load_word_list(word_file_path)

    with open(main_file_path, 'r') as file:
        for line in file:
            words = line.split()
            for word in words:
                word = word.strip(',.?!;:"\'').lower()
                if word not in word_list:
                    unmatched_words.append(word)

    return unmatched_words

# Example usage:
main_file_path = '../data/anage_data.txt'
word_file_path = '../data/to_match.txt'


# %lprun -f check_words_against_file check_words_against_file(main_file_path, word_file_path)
unmatched_words = check_words_against_file(main_file_path, word_file_path)

# Print unmatched words
print("Number of words not found in the word list:")
print(len(unmatched_words))

Number of words not found in the word list:
61917
 
*** Profile stats marshalled to file '/tmp/tmpbfr04gos'.
Embedding SnakeViz in this document...
<function display at 0x7fcbf4512c20>
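
One design choice in `load_word_list` matters for the profile: the words are stored in a `set`, so each `word not in word_list` test is a hash lookup rather than a scan through every word. The sketch below measures the difference with `timeit` on synthetic data (the `word0` to `word19999` strings are made up; the real data files are not needed). Absolute timings depend on the machine, so none are quoted.

```python
import timeit

# Synthetic word list standing in for the contents of the word file.
words_as_list = [f"word{i}" for i in range(20_000)]
words_as_set = set(words_as_list)

# Worst case for the list: the probe is its last element.
probe = "word19999"

list_time = timeit.timeit(lambda: probe in words_as_list, number=200)
set_time = timeit.timeit(lambda: probe in words_as_set, number=200)

print(f"list membership: {list_time:.6f} s for 200 lookups")
print(f"set membership:  {set_time:.6f} s for 200 lookups")
```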


## Unit testing

- **Early Bug Detection**: Unit tests catch bugs early, reducing debugging time.
  
- **Safe Refactoring**: Unit tests ensure code changes don't break existing functionality.
  
- **Improved Code Quality**: Unit tests encourage modular, maintainable code design.

### How do we unit-test?

Pytest is a popular and powerful framework for writing tests in Python.

- It allows you to write various types of tests like unit tests, integration tests, and functional tests.
- It is easy to use and can handle complex testing needs.


Ipytest is a Python library that allows you to run tests written with the pytest framework inside Jupyter notebooks.



In [18]:
import ipytest
ipytest.autoconfig()

In [19]:
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)
    
def test_factorial():
    assert factorial(0) == 1
    assert factorial(1) == 1
    assert factorial(2) == 2
    assert factorial(3) == 6
    assert factorial(4) == 24
    assert factorial(5) == 120

In [20]:
ipytest.run('-v')

platform linux -- Python 3.10.12, pytest-8.2.1, pluggy-1.5.0
rootdir: /home/pupkolab/Dev/PyDataSciBio/sessions
plugins: anyio-4.3.0
collected 1 item

t_2bff95c1d7f74212a31c3da6882e4fe6.py .                                                      [100%]



<ExitCode.OK: 0>
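
The safe-refactoring benefit listed above can be made concrete. In this sketch, `factorial_iterative` is a hypothetical rewrite of the recursive function, and the same checks are run against both versions, with `math.factorial` as the reference. The tests are called directly here so the block runs anywhere; under pytest or ipytest, functions whose names start with `test_` are collected automatically.

```python
import math


def factorial_recursive(n):
    """Recursive implementation, as in the cell above."""
    if n == 0:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n):
    """Hypothetical refactoring: same contract, no recursion."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


def check_factorial(func):
    """Shared checks for n = 0..9; math.factorial serves as the reference."""
    for n in range(10):
        assert func(n) == math.factorial(n), f"{func.__name__}({n}) is wrong"


def test_factorial_recursive():
    check_factorial(factorial_recursive)


def test_factorial_iterative():
    check_factorial(factorial_iterative)


# Direct calls; a failing check would raise AssertionError here.
test_factorial_recursive()
test_factorial_iterative()
print("All factorial checks passed")
```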

### Exercise

Below is a function that calculates the length of the longest side (the hypotenuse) of a right triangle from the lengths of the other two sides, using the [Pythagorean theorem](http://en.wikipedia.org/wiki/Pythagorean_theorem):

$$ a^2 + b^2 = c^2 $$

In [3]:
def pythagoras(a,b):
    c2 = a**2 + b**2
    return c2**0.5

Write a series of tests to test the function.

In [None]:
ipytest.autoconfig()



### Exercise

Here is code that calculates the mass of a protein from its amino acid sequence. The code has several bugs; fix them all until the assertion below succeeds silently.

In [15]:
weights = {'D': 115.02694, 'E': 129.04259, 'R': 156.10111, 'S': 87.03203, 'M': 131.04049, 'W': 186.07931, 'P': 97.05276, 'C': 103.00919, 'V': 99.06841, 'I': 113.08406, 'G': 57.02146, 'A': 71.03711, 'L': 113.08406, 'N': 114.04293, 'T': 101.04768, 'k': 128.09496, 'Q': 128.05858, 'H': 137.05891, 'F': 147.06841, 'Y': '163.06333'}

In [23]:
def protein_mass(sequence)
    mass = 0
    for aa in sequence:
        if aa not in weights:
            raise ValueError "Input sequence contains an illegal aa: {}".format(aa)
        mass =+ weigts(aa)
    return mass

In [24]:
seq = 'SKADYEK'
assert round(protein_mass(seq), 3) == 821.392

S 87.03203
K 128.09496
A 71.03711
D 115.02694
Y 163.06333
E 129.04259
K 128.09496
