#MLHEP2015: on unsupervised learning and fish brains

Certain fish species are beloved by neurobiologists because they are (almost) completely transparent during their first weeks of life. One such species is the zebrafish (Danio rerio).


The dataset you are going to work with contains time series of zebrafish brain scans: each row is one pixel (neuron) of the brain image, and each column is one time point. Each time series is 240 s long, with one data point per second.

In [None]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
%load_ext autoreload
%autoreload 2

In [None]:
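# index_col=0: the first CSV column is the row index (the deprecated and since removed DataFrame.from_csv did this by default)
timeSeries = pd.read_csv("Mr.fish", index_col=0)

# rows: pixels (neurons), columns: time points (one per second)
timeSeries = np.array(timeSeries)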

#Greet Mr. Fish
draw_component renders a vector with one value per pixel as an image of the zebrafish brain. Here we draw a snapshot of the brain at t=0.

In [None]:
from zebrafish_drawing_factory import draw_component

draw_component(timeSeries[:,0]) #snapshot at t=0 (column 0)

#####QUEST:
Plot a snapshot (the image at a fixed moment in time) for a different t between 0 and 239.

In [None]:
<your code>

#Individual neurons
Here we plot the activity time series of a few arbitrarily chosen neurons (pixels).

In [None]:
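time_points = np.arange(timeSeries.shape[1])  # 240 time points, one per second

plt.figure(figsize=[10,10])

# each row of timeSeries is the activity of one pixel over time
for pixel_id in [20222, 12345, 10000]:
    plt.plot(time_points, timeSeries[pixel_id,:], label="pixel %d" % pixel_id)

plt.xlabel("time, s")
plt.ylabel("activity")
plt.legend()
plt.show()
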

#####QUEST:
You are given several pixel indices. 
Try plotting the activities of the corresponding pixels.

In [None]:
pixel_ids = np.random.randint(0,timeSeries.shape[0],30)

#your code: plot the pixels with corresponding ids

#Let machines do the dirty work for us
#####QUEST: 
Find the top 10 (or more) principal components of the data.
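
A minimal sketch of fitting `sklearn.decomposition.PCA`, shown on synthetic data rather than on `Mr.fish` (the array `X_demo`, its size, and the two sine "sources" are made up for illustration). Rows are samples (pixels) and columns are time points, matching the layout of `timeSeries`, so each fitted component is a 240-point temporal pattern.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for timeSeries (assumption, not the real data):
# 1000 "pixels" x 240 time points, each pixel a random mix of two sine waves plus noise
rng = np.random.RandomState(0)
t = np.arange(240)
slow = np.sin(2 * np.pi * t / 60.0)
fast = np.sin(2 * np.pi * t / 12.0)
weights = rng.randn(1000, 2)
X_demo = weights[:, :1] * slow + weights[:, 1:] * fast + 0.1 * rng.randn(1000, 240)

# rows are samples (pixels), columns are features (time points)
pca_demo = PCA(n_components=10).fit(X_demo)

print(pca_demo.components_.shape)          # (10, 240): one temporal pattern per component
print(pca_demo.explained_variance_ratio_[:3])
```

Because the synthetic signal is a mix of only two patterns, the first two components should explain almost all of the variance; on real brain data the spectrum of `explained_variance_ratio_` is a useful guide for how many components are worth drawing.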

In [None]:
from sklearn.decomposition import PCA

pca_model = <Your code: create and fit the PCA model>


In [None]:
#let us view the components

for i in range(10):
    plt.plot(pca_model.components_[i])
    plt.show()

####Now we compute the principal component values for each neuron

In [None]:
timeSeries_pca = pca_model.transform(timeSeries)

In [None]:
print(timeSeries_pca[:10,:5])  # values of the first 5 components for the first 10 pixels
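
What `transform` computes can be checked by hand: with the default `whiten=False`, each pixel's component values are its centered time series projected onto the component vectors. A minimal sketch on random synthetic data (the array `X` and its size are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(1)
X = rng.randn(200, 240)               # synthetic stand-in: 200 pixels x 240 time points
pca = PCA(n_components=10).fit(X)

scores = pca.transform(X)             # (200, 10): one row of component values per pixel
# manual version: subtract the per-time-point mean, then take dot products with each component
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(scores, manual))    # True
```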

###... and view the areas where these components occur
#####QUEST:
Draw the intensity map of the 2nd principal component (column 1 of timeSeries_pca) with draw_component, just like you did before.

In [None]:
<your code>

In [None]:
from zebrafish_drawing_factory import draw_two_components

draw_two_components(timeSeries_pca[:,3],timeSeries_pca[:,4])

In [None]:
#plot pairs of different components
<your code>

#Extracting features from time series
Vanilla PCA only captures linear structure in the data.
To find nonlinear structure, we first extract features from each time series and then feed those features to PCA.

In [None]:
def extract_features(impulses):
    features = []
    
    #example features: sum every 10-th element for each of 10 features (with different starting positions)
    for i in range(0,10,1):
        features.append(np.sum(impulses[i::10]))
        
    return features
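
The extractor above folds each series with period 10 and sums every "phase": feature i is `impulses[i] + impulses[i+10] + impulses[i+20] + ...`. When the series length is a multiple of the period (240 = 24 x 10), the same numbers come out of a reshape, which also works on the whole matrix at once. A small sketch; `phase_sums_loop` is a hypothetical name and the inputs are toy data:

```python
import numpy as np

def phase_sums_loop(impulses, period=10):
    # same computation as extract_features: one sum per starting position 0..period-1
    return np.array([np.sum(impulses[i::period]) for i in range(period)])

x = np.arange(240, dtype=float)
loop_features = phase_sums_loop(x)

# vectorized equivalent: fold the series into rows of 10 consecutive samples,
# so column i holds x[i], x[i+10], x[i+20], ...; then sum down each column
reshaped_features = x.reshape(-1, 10).sum(axis=0)
print(loop_features[:3])                               # [2760. 2784. 2808.]
print(np.allclose(loop_features, reshaped_features))   # True

# the same trick on a whole (n_pixels, 240) matrix: result has shape (n_pixels, 10)
X = np.random.RandomState(0).randn(5, 240)
batch = X.reshape(X.shape[0], -1, 10).sum(axis=1)
print(batch.shape)                                     # (5, 10)
```

The reshape only works when the period divides the series length; the slicing loop has no such restriction.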

In [None]:
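# apply the extractor to every pixel's time series (in Python 3, map() is lazy, so a list comprehension is used)
timeSeries_features = np.array([extract_features(row) for row in timeSeries]).astype(float)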

In [None]:
timeSeries_features[:10,:5]

#####QUEST:
Fit a PCA model on the extracted features. NOTE that this model only knows how to handle the feature vectors, not the raw time series.

In [None]:
pca_features = <your code>

In [None]:
timeSeries_features_pca = pca_features.transform(timeSeries_features)
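
The NOTE above in practice: a PCA fitted on 10 phase-sum features expects exactly 10 input columns, so passing it the 240-column raw series raises a `ValueError`. A minimal sketch on random synthetic data (array names and sizes are assumptions):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
raw = rng.randn(500, 240)                         # synthetic stand-in: 500 pixels x 240 time points
features = raw.reshape(500, -1, 10).sum(axis=1)   # period-10 phase sums -> (500, 10)

pca_features_demo = PCA(n_components=5).fit(features)
projected = pca_features_demo.transform(features) # OK: same 10 columns as during fit
print(projected.shape)                            # (500, 5)

try:
    pca_features_demo.transform(raw)              # 240 columns, but the model was fitted on 10
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print("ValueError:", err)
```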

###See what we found...

#####QUEST: draw some plots to explore the data

In [None]:
#optional: plot the components 


In [None]:
draw_component(timeSeries_features_pca[:,1])

In [None]:
draw_two_components(timeSeries_features_pca[:,5],timeSeries_features_pca[:,6])

In [None]:
#draw some other components of the data. Feel free to explore
<your code>

#####QUEST:
Try changing the period in the features extractor above. Try collecting features for several different periods together.
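
One way to collect features for several periods together is to concatenate the phase sums computed for each period. A minimal sketch; the helper name `extract_features_multi_period` and the default periods (5, 10, 20) are hypothetical choices:

```python
import numpy as np

def extract_features_multi_period(impulses, periods=(5, 10, 20)):
    """Hypothetical helper: concatenate phase-wise sums for several periods."""
    impulses = np.asarray(impulses, dtype=float)
    features = []
    for period in periods:
        # one feature per starting position 0..period-1
        features.extend(np.sum(impulses[i::period]) for i in range(period))
    return np.array(features)

x = np.arange(240, dtype=float)
f = extract_features_multi_period(x)
print(f.shape)   # (35,) = 5 + 10 + 20 features
```

Within each period block the features add up to the total sum of the series, so every block carries one redundant direction; PCA handles this gracefully, but it is worth keeping in mind when reading the components.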

#Fourier features
#####QUEST:
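Implement the extract_features_fourier function below so that it returns the Fourier transform of the time series as features.
The Fourier transform can be computed with the np.fft.fft function. Note that np.fft.fft returns complex numbers, while the cell that calls your function casts the features to float (which keeps only the real part), so you may want to return real-valued quantities such as the magnitudes np.abs(...).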

In [None]:
def extract_features_fourier(impulses):
    features = []
    
    <your code here>
    
    return features
#numpy.fft.fft
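
A minimal sketch of one possible Fourier feature extractor, under the assumption that only the amplitude spectrum matters (the phase is discarded). It uses np.fft.rfft, the real-input variant of np.fft.fft: for a real signal the negative-frequency half of the np.fft.fft output mirrors the positive half, so rfft returns only the `len(x)//2 + 1` non-redundant coefficients. The name `extract_features_fourier_sketch` and the `n_freqs` cut-off are hypothetical.

```python
import numpy as np

def extract_features_fourier_sketch(impulses, n_freqs=None):
    """Hypothetical helper: magnitudes of the one-sided Fourier spectrum."""
    impulses = np.asarray(impulses, dtype=float)
    spectrum = np.fft.rfft(impulses)   # complex, length len(impulses)//2 + 1
    magnitudes = np.abs(spectrum)      # drop the phase, keep real-valued amplitudes
    if n_freqs is not None:
        magnitudes = magnitudes[:n_freqs]
    return magnitudes

t = np.arange(240)
signal = np.sin(2 * np.pi * 24 * t / 240)   # exactly 24 cycles in 240 s
feats = extract_features_fourier_sketch(signal)
print(feats.shape)                # (121,)
print(int(np.argmax(feats)))      # 24: the peak sits at the signal's frequency index
```

Index k of the result corresponds to k cycles per 240 s, i.e. k/240 Hz with one sample per second.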

In [None]:
#extract features
timeSeries_features_fft = np.array([extract_features_fourier(row) for row in timeSeries]).astype(float)
#train PCA
pca_fft = PCA().fit(timeSeries_features_fft)
#transform data
timeSeries_pca_fft = pca_fft.transform(timeSeries_features_fft)


In [None]:
#Draw the components
for i in range(len(pca_fft.components_)):
    plt.plot(pca_fft.components_[i])
    plt.show()

In [None]:
#Draw the component activity maps
for i in range(1,20):
    draw_component(timeSeries_pca_fft[:,i])


#Let us invent...
#####Boss fight:
It is now your turn to try features of your own invention.
Here are some suggestions:
* a) Try different intervals (why should the period be 10, or 20, after all?)
* b) Sum with different periods (instead of different starting points)
* c) Fourier transform (np.fft, google me)
* d) Combine different approaches
* e) Look at what your neighbor is doing and add it to (d)

 1) think it;  
 2) code it;   
 3) try it;         
 4) boast about it;        
 5) go to step 1; 
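
A minimal sketch of suggestion (d), combining simple summary statistics, phase sums, and low-frequency Fourier magnitudes into one vector. Because these features live on very different scales, this sketch standardizes each column with `StandardScaler` before PCA; that choice, the helper name `extract_features_combined`, and its defaults are all assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def extract_features_combined(impulses, period=10, n_freqs=20):
    """Hypothetical combination: summary stats + phase sums + low-frequency spectrum."""
    impulses = np.asarray(impulses, dtype=float)
    stats = [impulses.mean(), impulses.std(), impulses.max() - impulses.min()]
    phase_sums = [np.sum(impulses[i::period]) for i in range(period)]
    # skip index 0 (the zero-frequency term), which is proportional to the mean already included
    spectrum = np.abs(np.fft.rfft(impulses))[1:n_freqs + 1]
    return np.concatenate([stats, phase_sums, spectrum])

rng = np.random.RandomState(0)
raw = rng.randn(300, 240)   # synthetic stand-in for timeSeries
features = np.array([extract_features_combined(row) for row in raw])

# scale every feature to zero mean and unit variance, then project with PCA
model = make_pipeline(StandardScaler(), PCA(n_components=5))
projected = model.fit_transform(features)
print(features.shape, projected.shape)   # (300, 33) (300, 5)
```

Without the scaling step, PCA would be dominated by whichever features happen to have the largest raw variance (here, typically the phase sums and spectrum magnitudes).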

In [None]:
def extract_features2(impulses):
    features = []
    
    <your imagination goes here>
    
    #OLD ONE: sum every 20-th element, starting at positions 0, 2, ..., 18 (10 features)
    #for i in range(0,20,2):
    #    features.append(np.sum(impulses[i::20]))

    
    return features

In [None]:
#extract features
timeSeries_features2 = np.array([extract_features2(row) for row in timeSeries]).astype(float)
#train PCA
pca2 = PCA().fit(timeSeries_features2)
#transform data
timeSeries_pca2 = pca2.transform(timeSeries_features2)


In [None]:
#explore them!
draw_two_components(timeSeries_pca2[:,5],timeSeries_pca2[:,6])

In [None]:
#MOAR BRAINZ!!1
