# NMR spectra investigation

This document presents an investigation of four classes of multi-component NMR signals with chemical shift.

The first aim of this investigation is to see whether unsupervised clusterisation can produce convincing results.
The second aim is to find out whether it is possible to set up a neural-network-based (NN) signal classifier.

## 1. Data overview

The available dataset has 59 samples belonging to 4 categories: 'Pre' (19 samples), 'W2' (14 samples), 'W8' (13 samples) and 'W6' (13 samples).
![](whole_dataset.png)
<text><center>Pic.1</center></text>

As seen in Pic.1, there is no clear pattern for any of the signal types. Pic.2 and Pic.3 show, respectively, all signal groups plotted separately and a random example of each signal type.
<center>All signals</center> | <center>Signals plotted by class</center>
- | - 
![alt](all_signals.png) | ![alt](all_signals_examples.png)
<center>Pic.2</center> | <center>Pic.3</center>

As seen, there is no obvious pattern that would allow one to visually discern the signals and determine which class they belong to.

## 2. Clusterisation

### 2.1 PCA
PCA (strictly a dimensionality-reduction method rather than a clustering one) was the first unsupervised method applied to the dataset:

 <center>All signals</center>| <center>Signals plotted by class</center>
- | - 
![alt](pca_y.png) | ![alt](pca_all_2d.png)

 <center>All signals - scaled</center>| <center>Signals plotted by class - scaled</center>
- | - 
![alt](pca_y_scalled.png) | ![alt](pca_all_2d_scalled.png)

<center>All signals - normalized</center>| <center>Signals plotted by class - normalized</center>
- | - 
![alt](pca_all_2d_norm.png) | ![alt](pca_y_norm.png)
<center>Pic.4</center> | <center>Pic.5</center>

Only the class "Pre" shows a clear pattern in the PCA projection. For the other three classes, it is impossible to determine class membership with an acceptable error rate.
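
A minimal NumPy sketch of the three PCA variants shown above. The text does not say how the "scaled" and "normalized" inputs were produced, so this assumes "scaled" means per-feature standardization and "normalized" means dividing each spectrum by its L2 norm; `preprocess` and `pca_2d` are hypothetical helper names.

```python
import numpy as np


def preprocess(X, mode="raw"):
    """Return a preprocessed copy of X (rows = spectra, columns = chemical-shift points).

    Assumed meanings:
      "scaled"     -> each column standardized to zero mean and unit variance
      "normalized" -> each spectrum (row) divided by its L2 norm
    """
    X = np.asarray(X, dtype=float)
    if mode == "raw":
        return X.copy()
    if mode == "scaled":
        std = X.std(axis=0)
        std[std == 0] = 1.0  # avoid division by zero for constant columns
        return (X - X.mean(axis=0)) / std
    if mode == "normalized":
        norms = np.linalg.norm(X, axis=1, keepdims=True)
        norms[norms == 0] = 1.0  # avoid division by zero for all-zero spectra
        return X / norms
    raise ValueError(f"unknown mode: {mode}")


def pca_2d(X):
    """Project the rows of X onto the first two principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # PCA works on centred data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:2].T  # PC1/PC2 coordinates of every spectrum
    explained = S[:2] ** 2 / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained
```

Each variant is then plotted as PC1 against PC2, coloured by class label; the explained-variance fractions indicate how much of the spectral variation the 2D picture actually captures.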

### 2.2 K-Means

K-Means did not show satisfactory results either (Pic.6).
<center>All signals</center>| <center>Signals plotted by class</center>
- | - 
![alt](KMean_all.png) | ![alt](KMeans_separated.png)

 <center>All signals - scaled</center>| <center>Signals plotted by class - scaled</center>
- | - 
![alt](KMean_all_scaled.png) | ![alt](KMeans_separated_scaled.png)

<center>All signals - normalized</center>| <center>Signals plotted by class - normalized</center>
- | - 
![alt](KMean_all_norm.png) | ![alt](KMeans_separated_norm.png)

<center>Pic.6</center> 
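
A sketch of the K-Means step with scikit-learn, assuming the spectra are stacked row-wise in an array `X`, the true classes are integer-coded in `y`, and four clusters (one per class) are requested. The adjusted Rand index used here to score agreement with the classes is an illustrative choice, not necessarily the one behind the plots; `kmeans_agreement` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def kmeans_agreement(X, y, n_clusters=4, seed=0):
    """Cluster the spectra (rows of X) with K-Means and compare with the true classes y.

    Returns (labels, ari): ari is 1.0 for a perfect match and about 0 for a
    random assignment, independent of how the cluster ids are numbered.
    """
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = model.fit_predict(np.asarray(X, dtype=float))
    return labels, adjusted_rand_score(y, labels)
```

The same call can be repeated on the raw, scaled and normalized inputs to compare the three rows of Pic.6 numerically.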

### 2.3 Spectral clustering

Next, K-Means was applied to the eigenvectors of the graph Laplacian, i.e. <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.165.9323">spectral clustering</a>:

<center>Spectral clustering</center>|
- |
![alt](SC_1.png)| 
![alt](SC_2.png)|
![alt](SC_3.png)|
<center>Pic.7</center> |

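The resulting accuracy is about 33%, which is close to the 32% (19/59) that would be obtained by assigning every signal to the largest class, "Pre".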
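
Clustering accuracy needs a mapping from arbitrary cluster ids to class labels. The sketch below (scikit-learn and SciPy; helper names hypothetical) runs spectral clustering on a k-nearest-neighbour affinity graph, an assumed choice since the affinity used here is not stated, and scores the result with the best one-to-one cluster-to-class assignment found by the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`). How the reported accuracy was actually computed is not stated, so this is one possible definition.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import SpectralClustering


def cluster_accuracy(y_true, y_pred):
    """Fraction of samples labelled correctly after the best one-to-one
    matching of cluster ids to class ids (Hungarian algorithm)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    # counts[i, j] = number of samples in cluster i that belong to class j
    counts = np.array([[np.sum((y_pred == c) & (y_true == t)) for t in classes]
                       for c in clusters])
    rows, cols = linear_sum_assignment(counts, maximize=True)
    return counts[rows, cols].sum() / len(y_true)


def spectral_labels(X, n_clusters=4, n_neighbors=10, seed=0):
    """Spectral clustering: K-Means on the spectral embedding (eigenvectors of
    the normalized Laplacian) of a symmetrised k-nearest-neighbour graph."""
    model = SpectralClustering(n_clusters=n_clusters, affinity="nearest_neighbors",
                               n_neighbors=n_neighbors, random_state=seed)
    return model.fit_predict(np.asarray(X, dtype=float))
```

For example, `cluster_accuracy(y, np.zeros(59, dtype=int))` with the class sizes 19/14/13/13 gives the majority-class baseline of 19/59.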

### 2.4 Affinity propagation

Affinity propagation was applied with damping = 0.9:

<center>Affinity propagation</center>|
- |
![alt](at_1.png)| 
![alt](at2.png)|
![alt](at3.png)|
<center>Pic.8</center> |
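
Damping is a parameter of affinity propagation (in scikit-learn, `AffinityPropagation(damping=...)`, with a value in [0.5, 1)). A sketch of that step, assuming scikit-learn was used; the helper name is hypothetical. Unlike K-Means, affinity propagation chooses the number of clusters itself, so it need not return four.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation


def affinity_propagation_labels(X, damping=0.9, seed=0):
    """Cluster the spectra (rows of X) with affinity propagation.

    The number of clusters is not fixed in advance: it follows from the
    `preference` parameter (by default the median of the input similarities).
    A high damping slows the message updates, so max_iter is raised.
    Returns (labels, number_of_clusters_found).
    """
    model = AffinityPropagation(damping=damping, max_iter=1000, random_state=seed)
    labels = model.fit_predict(np.asarray(X, dtype=float))
    return labels, len(model.cluster_centers_indices_)
```

If the algorithm does not converge, scikit-learn marks every sample with label -1, so it is worth checking the number of clusters found before interpreting the plots.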

### 2.5 Clusterisation Results and Discussion

The presented clusterisation methods do not provide satisfactory results. This means that either they lack the required sensitivity to the relevant features, or there are simply no systematic differences between the presented classes. The next logical step is to apply geometrical feature analysis to the dataset.

## 3. Geometrical Feature Search

In this section, the dataset is analysed class-wise to find out whether there are significant features that would allow more sensitive clusterisation or classification.

### 3.1 Persistence diagrams

<a href="https://en.wikipedia.org/wiki/Persistent_homology">Persistent homology</a> is a well-known tool for extracting true (persistent) features rather than artifacts and/or noise. The results of the homological analysis are presented in Pic.9.


| <center>Pre</center> | <center>W2</center>|
| - | - |
|![](PD_pre.png)|![](PD_W2.png)|

| <center>W6</center> | <center>W8</center> |
| - | - |
|![alt](PD_W6.png)|![alt](PD_W8.png)|

<center>Pic.9</center>

Persistent homology shows a clear difference between the presented classes.
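
The document does not say how the persistence diagrams were computed. One common construction for a 1D spectrum is the sublevel-set filtration: sweep a threshold upward and track the connected components of the points lying below it. Each local minimum gives birth to a component, which dies when it merges with an older one (the "elder rule"). The pure-Python sketch below computes these 0-dimensional (birth, death) pairs with a union-find; it illustrates the idea and is not necessarily the method used for the diagrams above.

```python
import math


def sublevel_persistence_1d(signal):
    """0-dimensional persistence pairs (birth, death) of the sublevel-set
    filtration of a 1D signal. The global minimum never dies (death = inf)."""
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])  # add points from low to high
    parent = [-1] * n  # -1 means "not yet in the filtration"
    birth = [0.0] * n  # birth value, meaningful at component roots

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i] = i
        birth[i] = signal[i]
        for j in (i - 1, i + 1):  # neighbours already below the threshold
            if 0 <= j < n and parent[j] != -1:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # elder rule: the younger component (higher birth) dies here
                young, old = (ri, rj) if birth[ri] > birth[rj] else (rj, ri)
                if signal[i] > birth[young]:  # skip zero-length bars
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    pairs.append((birth[find(order[0])], math.inf))
    return pairs
```

For `[0, 2, 1, 3, 0.5]` this gives the pairs (1, 2), (0.5, 3) and (0, inf): the local minima at 1 and 0.5 merge into the global minimum at levels 2 and 3.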

### 3.2 Persistence entropy

To verify the results obtained with persistence diagrams, <a href="https://link.springer.com/article/10.1007/s10844-017-0473-4">persistence entropy</a> is also applied (Pic.10).

| <center>10% of samples</center> | <center>30% of samples</center>|
| - | - |
|![](pers_entr_01.png)|![](pers_entr_03.png)|

| <center>50% of samples</center> | <center>all samples</center> |
| - | - |
|![alt](pers_entr_05.png)|![alt](pers_entr_1.png)|

<center>Pic.10</center>

This method also shows clear feature differences between all four classes.
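
For reference, persistence entropy in its usual form is the Shannon entropy of the normalized bar lengths of a diagram: with bar lengths l_i = d_i - b_i and total length L = sum(l_i), E = -sum((l_i / L) * log(l_i / L)). A minimal sketch follows; dropping infinite bars is an assumption (some implementations truncate them at a chosen value instead), and the natural logarithm is used.

```python
import math


def persistence_entropy(pairs):
    """Shannon entropy of the bar lengths of a persistence diagram.

    pairs: iterable of (birth, death). Infinite and zero-length bars are
    dropped here (an assumption, see above). Natural logarithm.
    """
    lengths = [d - b for b, d in pairs if math.isfinite(d) and d > b]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((length / total) * math.log(length / total) for length in lengths)
```

A diagram with a single dominant bar has entropy near 0, while n bars of equal length give log(n), so the value summarizes how evenly the topological features of a signal are distributed.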

## Worth trying:
### Persistence entropy with preprocessed (projected) signals
Each signal is reshaped into a 90x91 array.

| <center>with outlier</center> | <center>all</center> |
| - | - |
|![alt](pe_reshaped_outl.png)|![alt](pe_reshaped_all.png)|

<center>Pic.10</center>

### Fuzzy K-Means
The result shows that the sets are very similar, which agrees with the previous clustering results.

| <center>all</center> | <center>separated</center> |
| - | - |
|![alt](fuzzy_c_means_all.png)|![alt](fuzzy_c_means_sep.png)|

<center>Pic.10</center>
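
Fuzzy K-Means (usually called fuzzy C-means) gives each signal a degree of membership in every cluster instead of a hard label. A minimal NumPy sketch, assuming the common fuzzifier m = 2 and random initial memberships; `fuzzy_c_means` is a hypothetical helper, not the implementation used for the plots. The fuzzy partition coefficient it returns is 1 for a crisp partition and 1/c when all memberships are equal, so values close to 1/c are one way to quantify that the sets are very similar.

```python
import numpy as np


def fuzzy_c_means(X, c=4, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Return (centers, U, fpc): U[i, j] is the membership of sample j in cluster i."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W @ X) / W.sum(axis=1, keepdims=True)  # membership-weighted means
        # distance from every centre to every sample, shape (c, n)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero when a sample sits on a centre
        # U[i, j] = 1 / sum_k (d[i, j] / d[k, j]) ** (2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    fpc = float((U ** 2).sum() / n)  # fuzzy partition coefficient, in [1/c, 1]
    return centers, U, fpc
```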


- Fourier transform (convolution theorem)
- Takens' embedding for each class
- Iterative Signature Algorithm

- SVD/SVM
- Cut out important pieces

## 4. Feature Extraction with Convolutional Neural Network

The next logical step is to check whether a deep learning method is able to capture the features of the deformed signals in the dataset (experience suggests that it can, but this needs to be verified).
A pretrained ResNet50 was used as the feature extractor; the thresholded results are presented in Pic.11:

| <center>Pre</center> | <center>W2</center>|
| - | - |
|![](NN_pre.png)|![](NN_W2.png)|

| <center>W6</center> | <center>W8</center> |
| - | - |
|![alt](NN_W6.png)|![alt](NN_W8.png)|

<center>Pic.11</center>

The NN appears to be sensitive to the features of the signals, even with limited input (the best-represented class, "Pre", has only 19 samples). A classification is therefore worth trying.

| <center>488</center> | <center>696</center>|<center>868</center>|<center>904</center>|
| - | - |- | - |
|![](488-488.png)|![](488-696.png)|![](488-868.png)|![](488-904.png)|
|![alt](696-488.png)|![alt](696-696.png)| ![](696-868.png)|![](696-904.png)|
|![alt](868-488.png)|![alt](868-696.png)| ![](868-868.png)|![](868-904.png)|
|![alt](904-488.png)|![alt](904-696.png)| ![](904-868.png)|![](904-904.png)|


<center>Pic.12</center>

The plots in Pic.12 show the most represented NN output coordinates (488, 696, 868 and 904) plotted against each other. As expected, the NN also extracted the most represented features of the signals.
### Worth trying:
- Clustered anomaly detection in the multidimensional output space.

## 5. Signal Classification with Neural Net

### 5.1 Dataset Augmentation

The available dataset has a limited number of samples per class (min 13, max 19).

Data augmentation can be applied to the dataset with the following algorithm:

0. Find the "peak zones".
1. Define the number of transformations N = [0, n].
2. Apply the stretch transformation ST = [-20, 20] N times to the "peak zones".
3. Stretch or compress the "zones without signals" to compensate for ST.
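
The steps above can be sketched in NumPy as follows. Several details are assumptions not stated in the text: a "peak zone" is taken to be a contiguous run of points above 10% of the spectrum maximum, ST is interpreted as a change of each peak zone's length by an integer number of points in [-20, 20], and the signal-free zones absorb the total change in proportion to their length so that every augmented spectrum keeps its original length. All function names are hypothetical.

```python
import numpy as np


def find_zones(mask):
    """Split a boolean mask into contiguous runs: list of (start, end, value), end exclusive."""
    zones, start = [], 0
    for i in range(1, len(mask) + 1):
        if i == len(mask) or mask[i] != mask[start]:
            zones.append((start, i, bool(mask[start])))
            start = i
    return zones


def resample(segment, new_len):
    """Linearly resample a 1D segment to new_len points."""
    old = np.arange(len(segment))
    return np.interp(np.linspace(0, len(segment) - 1, new_len), old, segment)


def stretch_augment(signal, max_shift=20, rel_threshold=0.1, rng=None):
    """One random stretch of the peak zones, compensated in the signal-free zones."""
    rng = np.random.default_rng(rng)
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    # step 0: peak zones = runs above a fraction of the maximum (assumed criterion)
    zones = find_zones(signal > rel_threshold * signal.max())
    peaks = [z for z in zones if z[2]]
    gaps = [z for z in zones if not z[2]]
    if not peaks or not gaps:
        return signal.copy()
    # step 2: change each peak zone's length by a random number of points
    new_len = {}
    for s, e, _ in peaks:
        shift = int(rng.integers(-max_shift, max_shift + 1))
        new_len[s] = max(2, (e - s) + shift)
    # step 3: signal-free zones absorb the difference, proportionally to their length
    remaining = n - sum(new_len.values())
    if remaining < len(gaps):
        return signal.copy()  # peaks grew too much to compensate; skip this draw
    gap_total = sum(e - s for s, e, _ in gaps)
    for s, e, _ in gaps:
        new_len[s] = max(1, int(round(remaining * (e - s) / gap_total)))
    largest_gap = max(gaps, key=lambda z: z[1] - z[0])[0]
    new_len[largest_gap] += n - sum(new_len.values())  # fix rounding: total stays n
    if new_len[largest_gap] < 1:
        return signal.copy()
    return np.concatenate([resample(signal[s:e], new_len[s]) for s, e, _ in zones])


def augment(signal, n_copies, rng=None, **kwargs):
    """Step 1: produce n_copies independently stretched versions of one spectrum."""
    rng = np.random.default_rng(rng)
    return [stretch_augment(signal, rng=rng, **kwargs) for _ in range(n_copies)]
```

Resampling each zone separately keeps peak heights within their original range, while the position and width of every peak vary slightly between copies, which mimics small chemical-shift deformations.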
