# Software Carpentry Exercises

### Instructions

Exercises are marked with a colored question mark (❓/❔). Aim to complete the ❓ questions during the breakout session; the ❔ questions are optional and can be treated as homework.

⚠️ Make sure to execute the cells containing code in the question itself to create the required variables. If you encounter a `NameError` in your solution, you probably did not execute all cells containing code in the question.

🔎 **Solutions** are provided for each question. If the solution comes in the form of code, first uncomment the line containing `%load solutions/<solution.py>` (remove the #-symbol, but keep the %-symbol), and then execute the cell. After running the cell, the answer is shown. You can then run the cell again to run the code of the solution.

**Legend**  
❓ = Question to cover in the breakout session  
❔ = Optional question / homework  
💡 = hints  
🔎 = solution

## Breakout Session 1

### ❓ <ins>Arithmetic with different types</ins>

Where reasonable, `float()` will convert a string to a floating point number, and `int()` will convert a floating point number to an integer:

In [None]:
print("string to float:", float("3.4"))
print("float to int:", int(3.4))
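
As a minimal sketch of what "where reasonable" means here, the cell below uses values that are not part of the exercise. It shows that `int()` truncates toward zero rather than rounding, and that `float()` raises a `ValueError` when the string does not look like a number.

```python
# int() drops the fractional part (truncation toward zero); it does not round.
truncated_positive = int(3.9)
truncated_negative = int(-3.9)
print("int(3.9):", truncated_positive)
print("int(-3.9):", truncated_negative)

# float() only converts strings that look like numbers.
try:
    float("three point four")
    conversion_failed = False
except ValueError as error:
    conversion_failed = True
    print("ValueError:", error)
```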

**Question**  
Given this information, which of the following will return the floating point number `2.0`? Discuss your answer.  
_Note: there may be more than one right answer._

In [1]:
first = 1.0
second = "1"
third = "1.1"

1. `first + float(second)`
1. `float(second) + float(third)`
1. `first + int(third)`
1. `first + int(float(third))`
1. `int(first) + int(float(third))`
1. `2.0 * second`

In [4]:
# Your code here


<details>
<summary>🔎 Solution</summary>

Answer: 1 and 4 <br>
    
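Options 2 and 5 run without error but do not return `2.0`: option 2 gives `2.1` and option 5 gives the integer `2`. Options 3 and 6 raise errors: option 3 raises a `ValueError` because `int()` cannot convert the string `"1.1"`, and option 6 raises a `TypeError` because a float cannot be multiplied by a string.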
    
</details>

### ❓ <ins>Slicing strings</ins>

“Indexing” means referring to an element of a list by its position within the list. “Slicing” means getting a subset of elements from a list based on their indices. We can take slices of character strings as well:

In [None]:
element = "oxygen"
print('first three characters:', element[0:3]) # from index 0 until, but not including, 3
print('last three characters:', element[3:6])
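
The difference between indexing and slicing can be sketched with a different word (a hypothetical example string, not the one used in the question below). Indexing returns a single character, while a slice `[begin:end]` returns `end - begin` characters.

```python
# Hypothetical example string, chosen only for illustration.
word = "helium"

single = word[1]      # indexing: the one character at index 1
middle = word[1:4]    # slicing: the characters at indices 1, 2 and 3

print("word[1]:", single)
print("word[1:4]:", middle)
print("length of slice:", len(middle))  # equals 4 - 1
```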

**Question**  

What are the values of the following slices?
```python
element[4]
element[4:]
element[:] 
element[-1]
element[-3:]
```

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/slicing_strings.py

### ❔ <ins>Slicing lists with steps</ins>

We’ve seen how to use slicing to take single blocks of successive entries from a sequence. But what if we want to take a subset of entries that aren’t next to each other in the sequence? We can achieve this by providing a third argument to the range within the brackets, called the step size.

The full syntax for creating slices is `[begin:end:step]`, although you most often find a short-hand notation as we've seen in the above exercise.

The example below shows how you can take every third entry in a list:

In [None]:
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37]
subset = primes[0:12:3]
print('subset', subset)

Given the following list of months:

In [None]:
months = ['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec']

**Questions**

1. What slice of `months` will produce the following output: `['jan', 'mar', 'may', 'jul', 'sep', 'nov']`?

1. Given the short-hand notation we used for the character string in the previous exercise (i.e. `element[:2] == element[0:2]`), can you find the short-hand notation for question 1? Which do you find easier to read?

1. Using the step size parameter, can you think of a way to reverse the list?

<details>
<summary>💡 Click here for hints</summary>

- Note that `months[-1]` will retrieve the last element (`dec`), but `months[0:-1]` will not include it. Slicing is "up to, but not including".
- The step parameter accepts negative values, which reverse the slicing direction.

</details>

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/slicing_steps.py

## Breakout Session 2

### ❓ <ins>Change in inflammation</ins>

The patient data is longitudinal in the sense that each row represents a series of observations relating to one individual. This means that the change in inflammation over time is a meaningful concept.

The `numpy.diff()` function takes an array and returns the differences between two successive values. Let’s use it to examine the changes per day across the first week of patient 3 from our inflammation dataset:

In [None]:
# Load data
import numpy
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [None]:
patient3_week1 = data[3, :7]
print(patient3_week1)

Calling `numpy.diff(patient3_week1)` would do the following calculations:

```python
[ 0 - 0, 2 - 0, 0 - 2, 4 - 0, 2 - 4, 2 - 2 ]
```
and return the 6 difference values in a new array:

In [None]:
numpy.diff(patient3_week1)
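
For a one-dimensional array, the same result can be obtained by subtracting two shifted slices. The sketch below uses hypothetical values chosen only for illustration and compares the two approaches.

```python
import numpy

# Hypothetical values, not taken from the inflammation dataset.
values = numpy.array([1, 4, 9, 16])

differences = numpy.diff(values)
# Each element minus its predecessor, written with slices.
by_slicing = values[1:] - values[:-1]

print("numpy.diff:", differences)
print("by slicing:", by_slicing)
```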

**Questions**

1. When calling `numpy.diff()` with a multi-dimensional array, an `axis` argument may be passed to the function to specify which axis to process. When applying `numpy.diff()` to our 2D inflammation array `data`, which axis would we specify?

1. If the shape of an individual data file is `(60, 40)` (60 rows and 40 columns), what would the `shape` of the array be after you run the `diff()` function and why?

1. How would you find the largest change in inflammation for each patient? Does it matter if the change in inflammation is an increase or a decrease?

1. Plot a histogram using `matplotlib.pyplot.hist()` of the change per day for patient 3 across all 40 days. 

<details>
<summary>💡 Click here for hints</summary>

1) Using `axis=0` calculates the difference between consecutive patients (for each day). Using `axis=1` calculates the difference between consecutive days (for each patient). Since we want to follow how each patient's inflammation changes over the course of the clinical trial, it makes more sense to look at the change between consecutive days.
    
    ![](https://swcarpentry.github.io/python-novice-inflammation/fig/python-operations-across-axes.png)

2) Note that the array of differences is shorter by one element.
    
3) By using the `numpy.max()` function after you apply the `numpy.diff()` function, you will get the largest difference between days.
</details>

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/inflammation.py

### ❔ <ins>Plotting error bars</ins>

1. Create a plot showing the standard deviation (`numpy.std`) of the inflammation data for each day across all patients.


In [None]:
# Load data
import numpy
import matplotlib.pyplot as plt

data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/error_bars1.py

2. Use the standard deviation to plot error bars around the mean of the inflammation data for each day across all patients. Make use of `matplotlib.pyplot.errorbar()` and optionally change the style through the arguments `linestyle`, `marker`, and `capsize`. Note that `errorbar()` requires both x and y data. See the [documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.errorbar.html) for more information.

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/error_bars2.py

## Breakout Session 3

### ❓ <ins>Summing a list</ins>

Write a loop that calculates the sum of the elements in a list by adding them up one at a time, then prints the final value. For example, `[124, 402, 36]` should print `562`.

<details>
<summary>💡 Click here for hints</summary>

**Steps:**  
1. Define the input variable `numbers` and output variable `summed`
1. Create a for-loop over the elements of `numbers` and add each `num` to `summed`, e.g.
    
    ```python
    for num in numbers:
        # Add num to summed
    ```
1. Print the result with `print()`
    

</details>

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/summing_list.py

### ❓ <ins>Computing the value of a polynomial</ins>

The built-in function `enumerate` takes a sequence (e.g. a list) and generates a new sequence of the same length. Each element of the new sequence is a pair composed of the index (0, 1, 2,…) and the value from the original sequence:

In [None]:
a_list = ["apple", "banana", "orange"]
for count, value in enumerate(a_list):
    print(count, value)

The code above loops through `a_list`, assigning the index to `count` and the value to `value`.
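
To make the pairs visible, the sketch below collects them into a list (strictly speaking, `enumerate` returns an iterator, so `list()` is used here to materialize it). It then prints the same output using explicit indices, for comparison.

```python
a_list = ["apple", "banana", "orange"]

# Collect the (index, value) pairs produced by enumerate.
pairs = list(enumerate(a_list))
print(pairs)

# Equivalent loop written with explicit indices.
for index in range(len(a_list)):
    print(index, a_list[index])
```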

Suppose you have encoded a polynomial as a list of coefficients in the following way: the first element is the constant term, the second element is the coefficient of the linear term, the third is the coefficient of the quadratic term, etc.

$$y = p_0 + p_1 x^1 + p_2 x^2 + ... + p_n x^n $$

For example, the value of $y$ at $x=5$ with the coefficients $p_0=2, p_1=4, p_2=3$ is

In [None]:
x = 5
coefs = [2, 4, 3]
y = coefs[0] * x**0 + coefs[1] * x**1 + coefs[2] * x**2
print(y)

Write a loop using `enumerate(coefs)` which computes the value `y` of any polynomial, given `x` and `coefs`.

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/polynomial.py

### ❔ <ins>Plotting differences</ins>

Plot the difference between the average inflammations reported in the first and second datasets (stored in `inflammation-01.csv` and `inflammation-02.csv`, respectively).

Steps:

1. Import libraries
1. Import data
1. Calculate difference
1. Create and annotate figure

In [None]:
# Your code here


#### 🔎 Solution

In [None]:
# %load solutions/plotting_differences.py