# Graphing RNAseq Counts with Search Functionality

Welcome to my notebook! This code graphs basic RNA sequencing data: counts per gene, for genes you select by name. The data set covers _Drosophila melanogaster_ wing and eye imaginal discs, with three trials each of eye and wing disc RNAseq counts.

In [106]:
# Load necessary packages

import os  # used in the last cell to build the path to the saved HTML file
import pandas as pd
import numpy as np
import plotly.express as px
import webbrowser

# You might need to install plotly in your terminal by using the following code: python -m pip install plotly


## Loading and organizing the data

In [107]:
# Load data file and keep only the gene name column. Gene names keep their original case.
df = pd.read_csv('final results clean.csv')
data = df[['gene name']].copy()

# Average the RNAseq counts across the three trials and add those columns to the data set.
# iloc slices exclude the end index, so 2:5 selects columns 2, 3 and 4 (assumed here to be
# the three eye disc trials) and 5:8 selects columns 5, 6 and 7 (assumed to be the wing disc trials).
# Re-running this whole cell is safe because `data` is rebuilt above; calling insert again
# on its own raises a ValueError because the column already exists.
eye_data_mean = df.iloc[:,2:5].mean(axis=1)
wing_data_mean = df.iloc[:,5:8].mean(axis=1)
data.insert(1,"Eye Disc", eye_data_mean)
data.insert(1,"Wing Disc", wing_data_mean)

data_tidy = data.melt(
    id_vars=['gene name'],
    var_name='Imaginal Disc'
)
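
To make the averaging and reshaping steps concrete, here is a minimal sketch on a made-up two-gene table. The column names (`eye_1` to `wing_3`) and all counts are invented for illustration and do not come from the real data file. The sketch picks trial columns by name prefix instead of by position, which is one possible alternative to the `iloc` slices and only works if the file's columns are named consistently.

```python
import pandas as pd

# Hypothetical toy table: invented counts and column names, for illustration only
toy = pd.DataFrame({
    "gene name": ["geneA", "geneB"],
    "eye_1": [10, 100], "eye_2": [20, 200], "eye_3": [30, 300],
    "wing_1": [1, 4], "wing_2": [2, 5], "wing_3": [3, 6],
})

# Average the three trials per tissue, selecting columns by name prefix
toy_means = pd.DataFrame({
    "gene name": toy["gene name"],
    "Eye Disc": toy.filter(regex=r"^eye_").mean(axis=1),
    "Wing Disc": toy.filter(regex=r"^wing_").mean(axis=1),
})

# Reshape from wide (one column per tissue) to long (one row per gene and tissue)
toy_tidy = toy_means.melt(id_vars=["gene name"], var_name="Imaginal Disc")
print(toy_tidy)
```

The long table has one row per gene and tissue, with the averaged count in a column called `value`, which is the shape the plotting step expects.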

## Searching for genes to compare

The cells below let you search for genes in the data set and add them to a running list, which is then shown on a graph. First, the list is blanked; you can re-blank it at any time to start over. Next, search for a full or partial gene name and add the results to the list. Repeat the search and add steps for as many genes as you like. If a search returns multiple genes, all of them are added. When you're done adding genes, move on to the next step.

The search is case sensitive.
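
If case sensitivity gets in the way, a case-insensitive search is one option. The sketch below is not part of the original workflow: it uses the pandas `Series.str.contains` method on a small made-up list of names (the `names` Series is a hypothetical stand-in for `df['gene name']`).

```python
import pandas as pd

# Hypothetical gene names standing in for df['gene name']
names = pd.Series(["Wnt2", "Wnt5", "wg", "wntD", None])

search_input = "wnt"

# case=False ignores upper/lower case, regex=False treats the input as plain text,
# and na=False counts missing names as non-matches
mask = names.str.contains(search_input, case=False, regex=False, na=False)
search_result = names[mask].tolist()
print(search_result)
```

Because the match ignores case, a lowercase search such as `wnt` also finds names that start with a capital letter.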

In [126]:
# Blank the list
genes = []

In [127]:
# Search by gene name. When run, a search box will appear above the notebook.
search_result = []

search_input = input("Please enter a search criterion: ")

# Collect every gene whose name contains the search text (case sensitive)
for gene in df['gene name']:
    if search_input in str(gene):
        search_result.append(gene)

if len(search_result) == 0:
    print("No gene matches your search criterion of ->", search_input)
else:
    print("We have a match!")

for name in search_result:
    print(name)

We have a match!
Wnt2
Wnt5
Wnt4
Wnt6
Wnt10


In [128]:
# If you would like the gene you searched to be included in the graph, run this code to add it to the list. 
# Continue searching and adding until your list is complete.
genes.extend(search_result)
genes

['Wnt2', 'Wnt5', 'Wnt4', 'Wnt6', 'Wnt10']

In [129]:
# You can also extend the list manually, if you already know what genes you want to add.
# This is also helpful if you can't narrow down the search to one specific gene only.
genes.extend(["wg"])
genes

['Wnt2', 'Wnt5', 'Wnt4', 'Wnt6', 'Wnt10', 'wg']
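
Running the add cell twice for the same search puts the same genes in the list twice. One way to clean that up without blanking the list is sketched below; `example_genes` is a made-up list used only for illustration.

```python
# Hypothetical list in which one search result was added twice
example_genes = ["Wnt2", "Wnt5", "wg", "Wnt2", "Wnt5"]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving the order genes were added
example_genes = list(dict.fromkeys(example_genes))
print(example_genes)
```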

## Graphing the data

In [130]:
# Index the data based on the list of genes you've created.
data_sample = data_tidy[data_tidy['gene name'].isin(genes)]
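
# Note: isin silently skips any name in the list that is not in the data,
# so a mistyped manual entry simply won't appear on the graph.

A quick check for such names might look like the sketch below. The table and list here (`toy_tidy`, `wanted`) are hypothetical stand-ins for `data_tidy` and `genes`, with invented counts.

```python
import pandas as pd

# Hypothetical stand-ins for data_tidy and genes; counts are invented
toy_tidy = pd.DataFrame({
    "gene name": ["Wnt2", "wg", "Wnt2", "wg"],
    "Imaginal Disc": ["Wing Disc", "Wing Disc", "Eye Disc", "Eye Disc"],
    "value": [5.0, 900.0, 3.0, 40.0],
})
wanted = ["Wnt2", "wg", "Wg"]  # "Wg" is a deliberate typo

# Names requested but absent from the data (matching is case sensitive)
known = set(toy_tidy["gene name"])
missing = [g for g in wanted if g not in known]
print("Not found in the data:", missing)
```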



In [131]:
# Graph the data. Feel free to change the font.
fig = px.line(
    data_sample, x='gene name', y='value', color='Imaginal Disc', markers=True,
    labels={"gene name": "Gene", "value": "Counts"},
    title="RNAseq Counts in Imaginal Discs",
)
fig.update_layout(
    font_family="Arial",
    title_font_family="Arial",
)
fig.update_xaxes(title_font_family="Arial")
fig.show()

In [114]:
# Since this is an interactive plot, save it as an HTML file. You can then open it in your browser.
fig.write_html("file.html")
webbrowser.open('file://' + os.path.realpath('file.html'))

True