# Neuroanalytics Lecture 2: Python Crash Course - Numpy

The aim of the first section of the course is to cover fundamentals of Python. By the end of this section, you should be able to load in your data, perform manipulations or selections within the data, visualize the data and compute basic statistics like mean and median. 

In the first lecture, you worked with a list of timestamps at which a neuron recorded from a live animal fired action potentials. While the electrical activity of this neuron was being recorded, a few loud sounds were presented to the animal. You first learned a few basic syntax rules of Python and were then introduced to important data types, most importantly lists. After learning lists, you [imported](#timestampsload) the "list" of timestamps at which the above neuron fired action potentials. Once the timestamps were imported, you filtered the list in multiple ways to include only spike times within a period of interest. Lastly, you learned to create a binned spike count of the timestamps by writing a function.

In this lecture, you will be doing the same exercises but using a special module in Python called numpy. 

At the end of this lecture, we will also teach you how to programmatically [browse](#filebrowsing) files or subfolders within a given folder.

<br>
__NumPy__

<a href="http://www.numpy.org/">NumPy</a> (or Numpy) is a linear algebra <a href="https://stackoverflow.com/questions/19198166/whats-the-difference-between-a-module-and-a-library-in-python">package</a> for Python. It is so important for analysis with Python because almost all of the libraries in the PyData ecosystem rely on NumPy as one of their main building blocks.

Numpy is also incredibly fast, as it has bindings to C libraries. For more info on why you would want to use Arrays instead of lists, check out this great [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).

---
## Contents
- [Installation](#install)
- [Importing](#import)
- [Numpy arrays](#array)
- [Generating random numbers](#random)
- [Array attributes and methods](#attributes)
- [Indexing](#index)
- [Broadcasting](#broadcast)
- [Numpy operations](#operations)
- [More numpy](#more)

---
<a id='install'></a>
## Installation Instructions

**It is highly recommended that you install Python using the Anaconda distribution** so that all underlying dependencies (such as linear algebra libraries) stay in sync when you use `conda install`. If you have Anaconda, install NumPy by going to your terminal or command prompt and typing:
    
    conda install numpy
    
If you do not have Anaconda and cannot install it, please refer to [Numpy's official documentation on various installation instructions.](http://docs.scipy.org/doc/numpy-1.10.1/user/install.html)

---
<a id='import'></a>
## Importing numpy

Once you've installed NumPy you can import it as a library:

In [None]:
import numpy as np

> __*Importing modules and packages*__

> Modules are typically imported as the very first step in your script/analysis. Here are common ways to do this:

> ```
> import <package name>
> import <package name> as <new name>
> from <package name> import <specific package/module within>
> ```

> Try to avoid `from <package name> import *`. This will likely import a massive number of functions and variables that will be hard to manage since they are not preceded by the package name (`np`, for instance).

> Modules, if changed, can be reloaded using the function `reload` from the `importlib` module, i.e. `importlib.reload(<module>)` in Python 3.

<a id='array'></a>
## Numpy arrays

- Collection of objects of the __same__ type
- Can be multidimensional
- Nomenclature: __vectors__ are 1-dimensional, __matrices__ are 2-dimensional

### `array`
Create numpy array from existing array-like object.

In [None]:
my_list = [1,2,3]
my_list

In [None]:
my_array = np.array(my_list)
my_array

In [None]:
print(type(my_list))
print(type(my_array))

In [None]:
my_nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
my_nested_list

In [None]:
my_matrix = np.array(my_nested_list)
my_matrix

Notice the _two_ sets of brackets when there are _two_ dimensions.

> __*Row-major order*__

> Numpy uses the convention in C where the last dimension is "filled" first. This is referred to as [__row-major order__](https://en.wikipedia.org/wiki/Row-_and_column-major_order). If you are coming from MATLAB, which uses column-major order, this is the opposite convention. In `my_matrix` above, each inner list became one row; `reshape`, covered later in this lecture, fills rows first in the same way.
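
As a small illustration of the two conventions, the sketch below reshapes the same six values in both orders. The variable names are chosen for this example only; `order='F'` is the `reshape` option for column-major (Fortran/MATLAB-style) filling.

```python
import numpy as np

values = np.arange(6)

# Default (C / row-major) order: the last axis is filled first,
# so each row is completed before moving on to the next one.
row_major = values.reshape(2, 3)
print(row_major)
# rows are [0, 1, 2] and [3, 4, 5]

# Fortran / column-major order (the MATLAB convention): the first
# axis is filled first, so each column is completed first.
col_major = values.reshape(2, 3, order='F')
print(col_major)
# rows are [0, 2, 4] and [1, 3, 5]
```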

### `arange`

Return evenly spaced values within a given interval.

In [None]:
my_array = np.arange(0, 10)
print(my_array)

In [None]:
my_array

In [None]:
print(type(my_array))

In [None]:
frame_rate = 5.0
time_points = np.arange(100) / frame_rate
print(time_points)

In [None]:
np.arange(0, 20)

In [None]:
np.arange(0, 20.)

In [None]:
np.arange(0, 20, dtype=float)

### `zeros` and `ones`

Generate arrays of zeros or ones.

In [None]:
np.zeros(3)

In [None]:
np.zeros((5, 5))

In [None]:
np.nan * np.zeros((5, 5))

In [None]:
np.ones(3)

In [None]:
np.ones((3, 3), dtype=int)

### `linspace`
- Return evenly spaced numbers over a specified interval
- Similar to `range` and `arange`, but you give the number of points instead of the step size, and the end point is __included__.

In [None]:
np.linspace(1, 5, 5)

In [None]:
np.arange(0, 1000, 250)
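
To make the end-point difference concrete, here is a minimal side-by-side sketch over the same interval. The variable names are illustrative only.

```python
import numpy as np

# arange takes a step size and excludes the stop value.
by_step = np.arange(0, 1000, 250)
print(by_step)    # values: 0, 250, 500, 750

# linspace takes the number of points and includes the stop value.
by_count = np.linspace(0, 1000, 5)
print(by_count)   # values: 0, 250, 500, 750, 1000 (as floats)
```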

---
<a id = 'random'></a>
## Generating random numbers

Numpy also has lots of ways to create random number arrays:

### `random.rand`
Create an array of the given shape and populate it with random samples from a __uniform distribution__ over ``[0, 1)``.

In [None]:
np.random.rand(3)

In [None]:
np.random.rand(5,5)

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

random_numbers = np.random.rand(1000)
plt.hist(random_numbers, bins=10); plt.xlabel('value'); plt.ylabel('count')

### `random.randn`

Return a sample (or samples) from the __standard normal__ distribution, unlike `rand` which is uniform.

In [None]:
np.random.randn(2)

In [None]:
np.random.randn(5,5)

In [None]:
random_numbers = np.random.randn(1000)
plt.hist(random_numbers, bins=10); plt.xlabel('value'); plt.ylabel('count')

### `random.choice`

Sample from a given array.

In [None]:
my_array = [14, 98, 1, -3, 6, 100]
np.random.choice(my_array, 6, replace=False) # set replace=True to allow duplicates

---
<a id='attributes'></a>
## Array attributes and methods

### Shape of arrays

`shape` is an attribute that arrays have (not a method, notice no parentheses). `shape` is a tuple of the length of each dimension.

In [None]:
arr0 = np.arange(20)
arr0.shape

In [None]:
nested_list = [list(range(i, i + 4)) for i in range(0, 12, 4)]  # list() so the values display in Python 3
nested_list

In [None]:
arr1 = np.array(nested_list)
arr1

In [None]:
arr1.shape

In [None]:
nested_list.shape  # raises AttributeError: lists do not have a shape attribute

The function `np.shape` can also compute the shape of array-like objects that are not arrays.

In [None]:
np.shape(nested_list)

In [None]:
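# Rows of unequal length cannot form a regular 2-D array. Recent NumPy
# versions raise a ValueError unless dtype=object is given explicitly.
a = np.array([
    [1,2,3],
    [4,5]
], dtype=object)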

In [None]:
a.shape

In [None]:
a

### Reshaping arrays
`reshape` returns an array containing the same data with a new shape.
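
"The same data" can be taken literally: for a contiguous array, the reshaped result is usually a view that shares memory with the original rather than a copy. The following is a minimal sketch of how to check this, using `np.shares_memory` and example names that are not part of the lecture material.

```python
import numpy as np

original = np.arange(6)
reshaped = original.reshape(2, 3)

# The original keeps its shape; reshape returned a new array object...
print(original.shape, reshaped.shape)        # (6,) (2, 3)

# ...but both objects share the same underlying data buffer.
print(np.shares_memory(original, reshaped))  # True

# Writing through the reshaped view is therefore visible in the original.
reshaped[0, 0] = -1
print(original[0])                           # -1
```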

In [None]:
arr = np.arange(20)
arr

In [None]:
arr.reshape(5, 4)

In [None]:
arr2 = arr.reshape(1, 20)
arr2

In [None]:
# This illustrates a concept that helps with memory management:
# reshape does not copy the data. arr keeps its original shape, and
# arr2 is a new view onto the same underlying data.
# We'll see more of this shortly.
arr

### `max`, `min`, `argmax`, `argmin`

`max` and `min` are useful methods for finding the largest and smallest values; `argmax` and `argmin` return their index locations.

In [None]:
arr = np.random.randint(0, 100, 20)
arr

In [None]:
arr.max()

In [None]:
arr.argmax()

Optional lesson point: Finding max and min positions in 2D arrays

In [None]:
arr = arr.reshape(4, 5)
arr

In [None]:
arr[2, 3] = 100
arr

In [None]:
print('min is {} at {}'.format(arr.min(), arr.argmin()))
print('max is {} at {}'.format(arr.max(), arr.argmax()))

In [None]:
arr[np.unravel_index(arr.argmax(), arr.shape)]  # arr[arr.argmax()] fails: argmax returns an index into the flattened array
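
For 2D arrays, `argmax` and `argmin` return a position in the flattened array, so it has to be converted back to a (row, column) pair before it can be used for indexing. A small sketch with a known maximum position (the array `grid` is made up for this example):

```python
import numpy as np

grid = np.arange(20).reshape(4, 5)
grid[2, 3] = 100                      # put the maximum at a known position

flat_index = grid.argmax()            # position in the flattened array
row, col = np.unravel_index(flat_index, grid.shape)

print(flat_index)       # 13, i.e. 2 * 5 + 3
print(row, col)         # 2 3
print(grid[row, col])   # 100
```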

### `mean` and `std`

In [None]:
arr.mean(axis=-1)

In [None]:
arr.std()

In [None]:
arr

In [None]:
arr.flatten(order='F')

### `dtype`

You can also grab the data type of the elements in the array.

In [None]:
arr.dtype

__In-class practice__

Numpy also provides a useful function for loading in data from text files: `loadtxt`. Use this to load in our example data `ca-traces.txt`.

In [None]:
fname = 'sample_data/ca-traces.txt'
ca_data = np.loadtxt(fname, delimiter=',')

In [None]:
ca_data.shape

Let's visualize the data. We will cover plotting in a later lecture.

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

In [None]:
# Plot time series of sample neuron
plt.plot(ca_data[9]);

How could we downsample the data by averaging every 5 data points using the methods `reshape` and `mean`?

In [None]:
ca_data[9].shape

In [None]:
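# Code to downsample data: each row of the reshaped array holds 5
# consecutive samples (3000 samples in total), averaged along axis=1
ca_data_ds = ca_data[9].reshape(3000 // 5, 5).mean(axis=1)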

In [None]:
# Plot time series of sample neuron downsampled
plt.plot(ca_data_ds);

The reshape dimensions and the averaged dimension matter! Each row must hold consecutive samples, and we average within each row (`axis=1`).

In [None]:
np.arange(20).reshape(10,2)
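
The effect of getting the dimensions wrong is easiest to see on a tiny stand-in trace. This is only a sketch: `trace` and `bin_size` are made-up names, and `-1` asks `reshape` to infer that dimension from the array length.

```python
import numpy as np

trace = np.arange(20)          # stand-in for one neuron's time series
bin_size = 5                   # number of consecutive samples to average

# Each row holds one bin of consecutive samples...
binned = trace.reshape(-1, bin_size)
# ...so averaging along axis=1 collapses each row to a single value.
downsampled = binned.mean(axis=1)
print(downsampled)             # means of samples 0-4, 5-9, 10-14, 15-19

# Swapping the reshape dimensions and the axis averages every 4th sample
# together instead, which is not a downsampled version of the trace.
wrong = trace.reshape(bin_size, -1).mean(axis=0)
print(wrong)
```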

---
<a id='index'></a>
## Indexing

### Bracket indexing
- Indexing is similar to lists: use brackets
- However, you can list the indices for all dimensions within one bracket, e.g., `arr[1, 2]`
- For 2-dimensional arrays, the first index specifies the row

In [None]:
import numpy as np

In [None]:
arr0 = np.arange(0, 10)

In [None]:
arr0

In [None]:
arr0[8]

In [None]:
arr0[1:5]

__Important to note__: slicing does not copy the data. We have seen this before with `reshape`.

_Note to instructor: Go over if no one is confused!_

In [None]:
arr2 = np.arange(10)
slice_of_arr2 = arr2[0:6]

In [None]:
slice_of_arr2[:] = 99
slice_of_arr2

Now note the changes also occur in our original array!

In [None]:
arr2

Data is not copied, it's a view of the original array! This avoids memory problems!

Use the method `copy` to explicitly create a copy.

In [None]:
slice_copy = arr2[0:6].copy()
slice_copy[:] = 42
arr2

### Indexing 2-D arrays (matrices)

The general format is **arr_2d[row][col]** or **arr_2d[row,col]**. I recommend usually using the comma notation for clarity.

In [None]:
arr_2d = np.array(([5,10,15],[20,25,30],[35,40,45]))
arr_2d

In [None]:
arr_2d[1, 2]

In [None]:
arr_2d[1]

In [None]:
arr_2d[1, :]

In [None]:
arr_2d[:, 1]

In [None]:
nested = [[5,10,15],[20,25,30],[35,40,45]]

In [None]:
nested[1][2]

In [None]:
nested[1, 2]  # raises TypeError: lists do not accept comma-separated indices

Transposing a matrix switches rows and columns. For a matrix of shape (m, n), the transpose will have shape (n, m).

In [None]:
arr2 = np.arange(20).reshape(4, 5)

In [None]:
arr2

In [None]:
arr2.T

Slicing can occur in one or more dimensions

In [None]:
arr2

In [None]:
arr2[0, 0:3]

In [None]:
arr2[1:3, 0:3]

### Fancy Indexing

- Fancy indexing allows more options for selecting elements than slicing
- Fancy indexing is done by providing a list for each dimension that specifies where to index

In [None]:
arr3 = np.arange(100).reshape(10, 10)
arr3

In [None]:
arr3[[2, 4, 6, 8], :]

In [None]:
arr3[[6, 4, 2, 8], :]

In [None]:
ix = [6, 4, 2, 8]
arr3[ix, :]

Often you'll need to index based on the values at each location. Comparison operators will be useful in these cases.

In [None]:
arr = np.arange(1,11)
arr

In [None]:
arr > 4

In [None]:
bool_arr = arr>4

In [None]:
arr[bool_arr]

In [None]:
arr[arr>4]
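
Boolean indexing is the numpy counterpart of the list filtering from the first lecture. The sketch below uses made-up spike times (not the course data) and assumes a window of interest of 1 to 3 seconds; `np.histogram` is one possible way to get binned spike counts and is not covered elsewhere in this lecture.

```python
import numpy as np

# Hypothetical spike times in seconds (not the course data).
spike_times = np.array([0.12, 0.48, 0.95, 1.30, 1.31, 2.75, 3.10, 4.60])

# Combine two comparisons with & (each wrapped in parentheses)
# to keep only spikes inside the window [1.0, 3.0).
in_window = (spike_times >= 1.0) & (spike_times < 3.0)
print(spike_times[in_window])      # 1.3, 1.31, 2.75

# Binned spike counts: one count per 1-second bin from 0 to 5 s.
bin_edges = np.arange(0, 6)
counts, _ = np.histogram(spike_times, bins=bin_edges)
print(counts)                      # 3, 2, 1, 1, 1
```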

### More Indexing Help
Indexing a 2-D matrix can be a bit confusing at first, especially when you start to add in step size. Try a Google image search for "NumPy indexing" to find useful images, like this one:

<img src= 'http://memory.osu.edu/classes/python/_images/numpy_indexing.png' width=500/>

---
<a id='broadcast'></a>
## Broadcasting

From [numpy](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.broadcasting.html): "The term broadcasting describes how numpy treats arrays with different shapes during arithmetic operations."

Interpretation: you can 'intuitively' propagate operations along particular axes to make performing operations easier.

In [None]:
arr = np.arange(10)
arr[0:5] = 100
arr

Doesn't work as intended with lists!

In [None]:
my_list = list(range(10))
my_list[0:5] = 100  # raises TypeError: a list cannot broadcast a scalar across a slice
my_list

What broadcasting is actually doing.

In [None]:
100 * np.ones(5)

In [None]:
arr[0:5] = 100 * np.ones(5)
arr

Another example:  
Normalizing data with z score: $\dfrac{x - \mu}{\sigma}$

In [None]:
# Number of nosepokes for 5 mice (columns) over 10 sessions (rows)
nosepokes = np.array([
    [ 89,  43,  97,  60,  29],
    [ 81,  42, 101,  56,  14],
    [ 91,  43,  94,  53,  18],
    [ 94,  40,  94,  55,  14],
    [ 83,  30,  99,  61,  14],
    [ 82,  43,  93,  59,  16],
    [ 85,  31, 101,  58,  30],
    [ 88,  36,  92,  58,  21],
    [ 82,  30,  88,  56,  28],
    [ 92,  44,  99,  64,  16]
])

In [None]:
pokemean = nosepokes.mean(axis=0)
pokemean

In [None]:
pokestd = nosepokes.std(axis=0)
pokestd

In [None]:
(nosepokes - pokemean) / pokestd
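
A quick way to check a z-score is to confirm that each normalized column has mean 0 and standard deviation 1. The sketch below does this on a small made-up array, and also shows the row-wise variant, where `keepdims=True` (an option not used above) keeps the statistics in a shape that broadcasts against the rows.

```python
import numpy as np

# Small made-up data set: 3 sessions (rows) by 2 animals (columns).
scores = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0]])

# Column-wise z-score: the (2,) statistics broadcast across the 3 rows.
z_cols = (scores - scores.mean(axis=0)) / scores.std(axis=0)
print(z_cols.mean(axis=0))   # approximately 0 for each column
print(z_cols.std(axis=0))    # approximately 1 for each column

# Row-wise z-score: keepdims=True keeps the statistics as shape (3, 1),
# which lines up with the rows; plain (3,) statistics would not broadcast.
row_mean = scores.mean(axis=1, keepdims=True)
row_std = scores.std(axis=1, keepdims=True)
z_rows = (scores - row_mean) / row_std
print(z_rows.shape)          # (3, 2)
```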

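What to watch for: shapes must be compatible along each dimension.

In [None]:
arr1 = np.arange(20).reshape(4, 5)
arr1 + np.ones(4)  # raises ValueError: shapes (4,5) and (4,) cannot be broadcast together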

In [None]:
arr1 = np.arange(20).reshape(4, 5)
arr1 + np.ones(4).reshape(4, 1)
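
The rule behind these two cases can be sketched as follows: shapes are compared starting from the last axis, and two axes are compatible when they are equal or when one of them is 1. The names `per_column` and `per_row` are used for this illustration only.

```python
import numpy as np

data = np.arange(20).reshape(4, 5)

# Shapes are compared from the last axis backwards; two axes are
# compatible when they are equal or when one of them is 1.
per_column = np.ones(5)              # (5,)   vs (4, 5): 5 == 5, OK
per_row = np.ones(4).reshape(4, 1)   # (4, 1) vs (4, 5): 1 stretches to 5

print((data + per_column).shape)     # (4, 5)
print((data + per_row).shape)        # (4, 5)

# np.ones(4) has shape (4,), and 4 != 5 on the last axis, so it fails.
try:
    data + np.ones(4)
except ValueError as err:
    print('Broadcast error:', err)
```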

---
<a id='operations'></a>
## Numpy operations

### Arithmetic

You can easily perform array-with-array arithmetic or scalar-with-array arithmetic

In [None]:
arr = np.arange(0, 10).astype(float)
arr

In [None]:
np.arange(0., 10)

In [None]:
np.arange(0, 10, dtype=float)

In [None]:
arr + arr

In [None]:
arr * arr

In [None]:
arr - arr

In [None]:
arr / arr

In [None]:
1 / arr

Watch out for division by zero __warnings__. The code will still execute, but the result will contain `inf` (for `1 / 0`) or `nan` (for `0 / 0`).
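
If the `inf` and `nan` values are expected and you only want to quiet the warnings, one option is the `np.errstate` context manager. This is a minimal sketch of that approach, not something the rest of the lecture relies on.

```python
import numpy as np

arr = np.arange(0, 5).astype(float)

# Silence the warnings only inside this block; the results are unchanged.
with np.errstate(divide='ignore', invalid='ignore'):
    reciprocal = 1 / arr      # 1 / 0 gives inf
    ratio = arr / arr         # 0 / 0 gives nan

print(reciprocal[0])          # inf
print(ratio[0])               # nan
```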

Operations and broadcasting can be used together.

In [None]:
np.ones(3)[:, np.newaxis]   # Another way to add a 'singleton' dimension

### Universal Array Functions

- [Universal array functions](http://docs.scipy.org/doc/numpy/reference/ufuncs.html) apply an operation element by element across the array

In [None]:
arr = np.arange(10)

In [None]:
np.sqrt(arr)

In [None]:
np.exp(arr)

In [None]:
np.max(arr) #same as arr.max()

In [None]:
np.max([np.nan, 10, 1, 2])

In [None]:
np.nanmax([np.nan, np.inf, 10, 1, 2])

In [None]:
np.sin(arr)

In [None]:
np.log(arr)

In [None]:
np.log(100)

---
<a id='more'></a>
## More numpy

### Saving with `np.savetxt` and `np.save`.
- `savetxt` is useful for sharing data with other applications, but only handles __1- or 2-dimensional__ data
- `save` is useful for saving data for future use in numpy and handles n-dimensional data of __any__ shape

In [None]:
import numpy as np

In [None]:
arr = np.arange(20).reshape(4, 5)
np.savetxt('sample_data/created/test.txt', arr, fmt='%.2e')  # forward slashes work on all platforms

In [None]:
arr3 = np.arange(60).reshape(3, 4, 5)
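# Raises ValueError: savetxt only handles 1-D or 2-D arrays.
# A separate file name is used so the failed call does not overwrite test.txt.
np.savetxt('sample_data/created/test3d.txt', arr3)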

In [None]:
np.save('sample_data/created/test3d.npy', arr3)

### Loading data with `np.loadtxt` and `np.load`

In [None]:
loaded_arr3 = np.load('sample_data/created/test3d.npy')
loaded_arr3

In [None]:
loaded_arr3.shape

In [None]:
loaded_arr = np.loadtxt('sample_data/created/test.txt')
loaded_arr

Notice `loadtxt` creates an array of floats, whereas `load` preserves the data type.
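
The difference can be checked with a round trip. The sketch below writes to a temporary directory (an assumption made so that it runs anywhere, instead of the `sample_data` folder) and compares the data types after loading.

```python
import os
import tempfile

import numpy as np

arr = np.arange(20).reshape(4, 5)          # integer array

with tempfile.TemporaryDirectory() as tmp_dir:
    txt_path = os.path.join(tmp_dir, 'example.txt')
    npy_path = os.path.join(tmp_dir, 'example.npy')

    np.savetxt(txt_path, arr)              # plain text, readable elsewhere
    np.save(npy_path, arr)                 # numpy's binary format

    from_txt = np.loadtxt(txt_path)
    from_npy = np.load(npy_path)

print(arr.dtype, from_txt.dtype, from_npy.dtype)
# The values survive both round trips; only the text file loses the dtype.
print(np.array_equal(from_txt, arr), np.array_equal(from_npy, arr))
```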

### Joining arrays
- `concatenate`: adds to __existing__ dimension
- `stack`: adds to __new__ dimension

In [None]:
mouse1_wts = np.array([29, 29, 30, 29, 31])
mouse4_wts = np.array([31, 32, 32, 32, 33])

In [None]:
np.concatenate([mouse1_wts, mouse4_wts], axis=0)

In [None]:
np.concatenate([mouse1_wts, mouse4_wts], axis=1)  # raises an error: 1-D arrays have no axis 1

In [None]:
np.stack([mouse1_wts, mouse4_wts], axis=1)

`stack` will always add a new dimension, thus dimensionality increases by 1.

In [None]:
all_wts = np.stack([mouse1_wts, mouse4_wts], axis=0)
all_wts

Now, you can do things like take an average.

In [None]:
all_wts.mean(axis=1)

In [None]:
np.vstack([mouse1_wts, mouse4_wts])  # row_stack is an alias of vstack, deprecated in NumPy 2.0

In [None]:
mouse3_wts = [28, 28, 29, 27, 27]

In [None]:
all_wts

In [None]:
np.concatenate([all_wts, mouse3_wts], axis=0)  # raises ValueError (2-D vs 1-D inputs); try vstack instead
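
Two possible ways to add the third mouse as a new row are sketched below. Both assume the new weights are first converted to an array; `np.newaxis` and `np.vstack` are one choice among several.

```python
import numpy as np

all_wts = np.array([[29, 29, 30, 29, 31],
                    [31, 32, 32, 32, 33]])   # shape (2, 5)
mouse3_wts = np.array([28, 28, 29, 27, 27])  # shape (5,)

# Option 1: give the new row a leading axis so it is (1, 5),
# then concatenate along the existing row axis.
joined = np.concatenate([all_wts, mouse3_wts[np.newaxis, :]], axis=0)

# Option 2: vstack promotes 1-D inputs to rows automatically.
stacked = np.vstack([all_wts, mouse3_wts])

print(joined.shape)                      # (3, 5)
print(np.array_equal(joined, stacked))   # True
```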