# DeepInsight Example
- This notebook shows how to convert SNP tabular data into SNP images with [DeepInsight](https://alok-ai-lab.github.io/DeepInsight/).
- Only the SNPs on chromosome 1 are used here as an example.
- More details about DeepInsight can be found in [the official GitHub repository](https://github.com/alok-ai-lab/pyDeepInsight).



In [None]:
from pyDeepInsight import ImageTransformer
from pyDeepInsight.utils import Norm2Scaler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.manifold import TSNE
import pandas as pd
import numpy as np
import pyreadr

## Training t-SNE

### Load the per-chromosome SNP file

In [8]:
# Load the per-chromosome SNP RData file,
# which is generated by DataPreprocessing.Rmd in the PreProcessing folder.
chrr = "1"  # Only the SNPs from chromosome 1 are loaded here.
result = pyreadr.read_r('../../../Geno/SNPchrs/SNPchr'+ chrr+ '.RData')
geno_chr = result["geno_chr"]
expr = geno_chr
y = expr.index
X = expr.values

### Normalize data using Norm2Scaler (log normalization)

In [None]:
ln = Norm2Scaler() 
X_norm = ln.fit_transform(X)
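
To make the normalization step concrete, here is a minimal NumPy sketch of a log-based scaling in the spirit of the norm-2 procedure from the DeepInsight paper. The function name `log_scale_sketch` and the toy matrix are hypothetical, and this is an illustration of the idea rather than the pyDeepInsight implementation: each feature is shifted so the log argument is at least 1, and the result is divided by one global maximum so every value falls in [0, 1].

```python
import numpy as np

def log_scale_sketch(X):
    """Log-scale a samples x features matrix into the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    # Shift each feature (column) by the absolute value of its minimum,
    # plus 1, so the argument of the log is always at least 1.
    shifted = X + np.abs(X.min(axis=0)) + 1.0
    logged = np.log(shifted)
    # Divide by the single global maximum so all values fall in [0, 1].
    return logged / logged.max()

# Toy genotype-like matrix: 3 samples x 4 SNPs coded as 0/1/2.
X_demo = np.array([[0, 1, 2, 0],
                   [1, 1, 0, 2],
                   [2, 0, 1, 1]])
X_demo_norm = log_scale_sketch(X_demo)
print(X_demo_norm.min(), X_demo_norm.max())
```

A quick check of the minimum and maximum like this is also a cheap way to confirm that `X_norm` is in the range the image transformer expects.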

### Create t-SNE object

In [None]:
distance_metric = 'cosine'
reducer = TSNE(
    n_components=2,  # embed features in two dimensions
    metric=distance_metric,
    init='random',
    learning_rate='auto',
    n_jobs=-1,
    random_state=42
)
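
In DeepInsight the points being embedded are the features (here, SNPs), not the samples, so with `metric='cosine'` the distances are computed between columns of the data matrix. The sketch below illustrates that on a toy matrix using plain NumPy. The helper `cosine_distance_matrix` and the toy data are hypothetical and stand in for what the distance metric computes; they are not part of scikit-learn or pyDeepInsight.

```python
import numpy as np

def cosine_distance_matrix(points):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    points = np.asarray(points, dtype=float)
    norms = np.linalg.norm(points, axis=1, keepdims=True)
    unit = points / norms  # assumes no all-zero rows
    return 1.0 - unit @ unit.T

# Toy matrix: 4 samples x 3 SNPs. SNP 0 and SNP 1 are proportional.
X_demo = np.array([[1.0, 2.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [2.0, 4.0, 0.0],
                   [1.0, 2.0, 1.0]])

# Transpose so that each row is one SNP described by its values across samples.
feature_dist = cosine_distance_matrix(X_demo.T)
print(np.round(feature_dist, 3))
```

SNPs whose values rise and fall together across samples get a distance near 0 and so tend to land close together in the embedding, which is what later places related SNPs in neighbouring pixels.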

### Initialize image transformer.


In [None]:
resolution = 277  # optional resolution (image height and width in pixels)
pixel_size = (resolution, resolution)
it = ImageTransformer(
    feature_extractor=reducer, 
    pixels=pixel_size)
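
The `pixels` argument fixes the grid onto which the 2D feature coordinates are mapped. As a rough illustration, the sketch below rescales a few made-up coordinates to a small grid and rounds them to integer pixel indices. The function `coords_to_pixels` is hypothetical and deliberately simplified: the DeepInsight paper also describes a rotation step based on the convex hull of the points, which is omitted here.

```python
import numpy as np

def coords_to_pixels(coords, pixels):
    """Map 2D feature coordinates onto integer (row, col) pixel indices."""
    coords = np.asarray(coords, dtype=float)
    height, width = pixels
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    # Rescale each axis to [0, 1]; assumes the points differ along both axes.
    unit = (coords - mins) / spans
    cols = np.rint(unit[:, 0] * (width - 1)).astype(int)
    rows = np.rint(unit[:, 1] * (height - 1)).astype(int)
    return np.column_stack([rows, cols])

# Four made-up feature coordinates; the last two are very close together.
coords_demo = np.array([[-3.0, 1.0], [0.0, 0.0], [3.0, -1.0], [2.9, -1.0]])
px = coords_to_pixels(coords_demo, pixels=(5, 5))
print(px)
```

The last two features end up in the same pixel. A larger resolution reduces such collisions at the cost of bigger, sparser images.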

### Train the image transformer (this may take a long time)

In [None]:
# Fit the image transformer on the normalized data.
# Input values should be between 0 and 1.
it.fit(X_norm, plot=True)

### Convert all SNP tabular files into SNP images

In [None]:
X_img = it.transform(X_norm, empty_value=1)
np.save(f'../../SNPimg/Chr{chrr}_tsne_{resolution}.npy', X_img)
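
To show what the transform step does with `empty_value`, the sketch below places one sample's feature values on a tiny grid. It assumes that features sharing a pixel are averaged and that pixels with no feature receive `empty_value`. The function `features_to_image` and the pixel indices are hypothetical, and the real transformer may differ in detail (for example in output shape or colour channels).

```python
import numpy as np

def features_to_image(values, pixel_idx, pixels, empty_value=0.0):
    """Place one sample's feature values on a 2D grid."""
    values = np.asarray(values, dtype=float)
    rows, cols = np.asarray(pixel_idx).T
    total = np.zeros(pixels)
    count = np.zeros(pixels)
    # np.add.at accumulates correctly even when indices repeat.
    np.add.at(total, (rows, cols), values)
    np.add.at(count, (rows, cols), 1)
    image = np.full(pixels, float(empty_value))
    occupied = count > 0
    # Features that share a pixel are averaged (an assumption of this sketch).
    image[occupied] = total[occupied] / count[occupied]
    return image

# Three features; the last two map to the same pixel (1, 1).
pixel_idx_demo = [(0, 0), (1, 1), (1, 1)]
img_demo = features_to_image([0.2, 0.4, 0.8], pixel_idx_demo,
                             pixels=(2, 2), empty_value=1)
print(img_demo)
```

With `empty_value=1`, unoccupied pixels take the top of the normalized range, so the background cannot be confused with a feature whose value is 0.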

In [None]:
# The feature density matrix can be extracted from the trained transformer 
# in order to view overall feature overlap.
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import seaborn as sns

fdm = it.feature_density_matrix()
fdm[fdm == 0] = np.nan

plt.figure(figsize=(10, 7.5))

ax = sns.heatmap(fdm, cmap="viridis", linewidths=0., 
                 linecolor="lightgrey", square=True)
ax.xaxis.set_major_locator(ticker.MultipleLocator(5))
ax.yaxis.set_major_locator(ticker.MultipleLocator(5))
for _, spine in ax.spines.items():
    spine.set_visible(True)
_ = plt.title("SNPs per pixel")
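
The heatmap above is easier to read alongside a couple of summary numbers. The sketch below builds a toy density matrix from hypothetical pixel indices and reports how many pixels are occupied and how many features share a pixel. The helper `density_from_pixels` is an illustration only; in the notebook the same statistics could be computed from the array returned by `it.feature_density_matrix()` before zeros are replaced with NaN.

```python
import numpy as np

def density_from_pixels(pixel_idx, pixels):
    """Count how many features fall on each pixel of the grid."""
    counts = np.zeros(pixels, dtype=int)
    rows, cols = np.asarray(pixel_idx).T
    # np.add.at handles repeated (row, col) pairs correctly.
    np.add.at(counts, (rows, cols), 1)
    return counts

# Four features; two of them land on pixel (1, 1).
pixel_idx_demo = [(0, 0), (1, 1), (1, 1), (2, 0)]
density = density_from_pixels(pixel_idx_demo, pixels=(3, 3))

n_features = len(pixel_idx_demo)
occupied = int((density > 0).sum())
shared = int(density[density > 1].sum())
print(f"occupied pixels: {occupied} of {density.size}")
print(f"features sharing a pixel: {shared} of {n_features}")
```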

## Show t-SNE images

The plots below show the image matrices of the samples, four per row, each titled with its sample ID.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Create a grid of subplots to display images
rows, cols = 49, 4
fig, axes = plt.subplots(rows, cols, figsize=(8, 98))

# Loop through the axes and display images
for i in range(rows):
    for j in range(cols):
        index = i * cols + j
        if index < len(X_img):
            axes[i, j].imshow(X_img[index])
            axes[i, j].title.set_text(f"{y[index]}")
            axes[i, j].axis('off')

# Adjust layout and display the plot
plt.tight_layout()
# plt.savefig(f'../../SNPimg/Chr{chrr}_tsne_{resolution}.png')
# plt.show()
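
The grid above hardcodes 49 rows. A small helper can derive the grid from the number of images instead, so the same cell works for other chromosomes or sample counts. The function `grid_shape` and the sample count of 196 are assumptions for illustration; 196 is simply the largest count that fits the 49 x 4 grid used above.

```python
import math

def grid_shape(n_images, cols=4):
    """Return (rows, cols) large enough to hold n_images."""
    rows = math.ceil(n_images / cols)
    # Keep at least one row so plt.subplots always gets a valid grid.
    return max(rows, 1), cols

rows, cols = grid_shape(196)
# Roughly 2 inches per image in each direction.
figsize = (2 * cols, 2 * rows)
print(rows, cols, figsize)
```

In the notebook, `grid_shape(len(X_img))` could replace the fixed `rows, cols = 49, 4`, with `figsize` passed to `plt.subplots`.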

## Reference
- Sharma, Alok, et al. "DeepInsight: A methodology to transform a non-image data to an image for convolution neural network architecture." Scientific reports 9.1 (2019): 11399.