# Plate Analysis Interactive
**by Conrad Hall** <br><br>

This is an attempt to automate the analysis of a high-content screen of a small-molecule library.
The five parameters tracked in this screen were:
-  ValidObjectCount
-  MEAN_TargetTotalIntenCh2
-  MEAN_ObjectAvgIntenCh1
-  MEAN_ObjectAreaCh1
-  %HIGH_TargetTotalIntenCh2

The plots are interactive, so every parameter on every plate can be visually inspected as a heatmap.

In [1]:
import ipywidgets as widgets
import glob
import re
from ipywidgets import fixed
from IPython.display import display, Markdown, display_markdown
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from collections import OrderedDict
%matplotlib notebook

***
### Import Data
Imports every CSV file from a hardcoded path and stores each plate's data as a DataFrame in a dictionary. The plate names are parsed from the file names and stored in a list.

**The code from "join lopac features to CSV" is applied here, so the molecule data is now part of `df_dict`.**
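The plate name is taken from the second capture group of the filename regex used in the import cell. A minimal sketch of that parsing step follows; the helper `parse_plate_filename` and both file names are hypothetical and serve only to show what the pattern captures.

```python
import re

# Same pattern as the import cell: group 1 is the library name (non-digits),
# group 2 is the plate name (digits, optionally followed by a non-digit suffix).
PLATE_REGEX = r'^.*[\/\\](\D+)(\d+|\d+\D+)\.\w+'

def parse_plate_filename(filename):
    """Return (library, plate) parsed from a path, or None if it does not match."""
    match = re.match(PLATE_REGEX, filename)
    return match.groups() if match else None

# Hypothetical file names, used only to illustrate the pattern.
print(parse_plate_filename('MoleculeData/LOPAC1245.csv'))    # ('LOPAC', '1245')
print(parse_plate_filename(r'MoleculeData\LOPAC1245a.csv'))  # ('LOPAC', '1245a')
```

Because the pattern requires a path separator before the library name, a bare file name with no directory part does not match.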

In [2]:
# Load feature data from each CSV file into a DataFrame
path = r'MoleculeData'
filenames = glob.glob(path + '/*.csv')
df_dict = {}      # plate index -> feature DataFrame
plate_name = []   # plate names, in the same order as df_dict
# The regex captures two groups from the file name: first the library name, then the plate name
regex = r'^.*[\/\\](\D+)(\d+|\d+\D+)\.\w+'
for i, filename in enumerate(filenames):
    df_dict[i] = pd.read_csv(filename, index_col=0)
    plate_name.append(re.findall(regex, filename)[0][1])
    
# Load Compound List from CSV into a DataFrame
drugpath = r'MoleculeData/drug_list'
drugfilename = glob.glob(drugpath + '/*.csv')
drugdf = pd.read_csv(drugfilename[0], encoding='cp1252')

# Rearrange the compound DataFrame to match the feature DataFrame:
# convert well labels such as 'A1' into numeric WellIDs such as 101
drug_regex = r'^(\w)(\d+)'
ID = []
for entry in drugdf["well"].tolist():
    row, col = re.findall(drug_regex, entry)[0]
    ID.append(100*(ord(row)-64) + int(col))  # 'A' -> 1, 'B' -> 2, ...

drugdf['WellID'] = ID
drugdf = drugdf.set_index('WellID')
drugdf = drugdf.drop(columns='well')

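# Merge the compound names into each plate's feature DataFrame
plateregex = r'^\d+'  # the leading digits of the plate name give the plate number
for i, name in enumerate(plate_name):
    plate_number = int(re.findall(plateregex, name)[0])
    # Keep the rows for this plate that have a molecule name
    df1 = drugdf.loc[(drugdf['plate #'] == plate_number) & drugdf['molecule'].notna()]
    df_dict[i] = pd.concat([df1['molecule'], df_dict[i]], axis=1, sort=False)
    # Wells without a listed compound are labelled as DMSO controls
    df_dict[i]['molecule'] = df_dict[i]['molecule'].fillna('DMSO')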
# Display a sample of the assembled DataFrame
display_markdown('### This is a sample of the structure of the imported data for plate ' + plate_name[0]+':',raw=True)
df_dict[0].head()

### This is a sample of the structure of the imported data for plate 1245:

| WellID | molecule | pRow | pCol | %HIGH_TargetTotalIntenCh2 | MEAN_ObjectAreaCh1 | MEAN_ObjectAvgIntenCh1 | MEAN_TargetTotalIntenCh2 | ValidObjectCount |
|---|---|---|---|---|---|---|---|---|
| 101 | DMSO | 1 | 1 | 7.96 | 158.97 | 3594.84 | 83311.16 | 1019 |
| 102 | DMSO | 1 | 2 | 12.94 | 173.76 | 3502.27 | 129541.95 | 2080 |
| 103 | DL-alpha-Methyl-p-tyrosine | 1 | 3 | 14.2 | 181.96 | 3465.02 | 142029.32 | 1888 |
| 104 | N-Acetyl-L-Cysteine | 1 | 4 | 20.92 | 187.53 | 3362.29 | 198354.63 | 1822 |
| 105 | 6-Methoxy-1,2,3,4-tetrahydro-9H-pyrido[3,4b] i... | 1 | 5 | 16.29 | 166.75 | 3969.12 | 171912.72 | 1830 |

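The import cell turns well labels into numeric WellIDs (row letter times 100 plus column number) and then attaches compound names to the feature table, filling unlisted wells with `DMSO`. The sketch below reproduces that idea on three wells taken from the sample above. The helper name `well_to_id` is hypothetical, and unlike the notebook's `^(\w)(\d+)` pattern it accepts only letters for the row and tolerates lower case.

```python
import re
import pandas as pd

def well_to_id(well):
    """Convert a well label such as 'B7' to a numeric ID such as 207."""
    row, col = re.match(r'^([A-Za-z])(\d+)', well).groups()
    # Row letter -> 1-based row number (A=1, B=2, ...), then 100*row + column.
    return 100 * (ord(row.upper()) - ord('A') + 1) + int(col)

print(well_to_id('A1'))   # 101
print(well_to_id('B07'))  # 207
print(well_to_id('H12'))  # 812

# Feature table indexed by WellID (three wells from the sample shown above).
features = pd.DataFrame(
    {'pRow': [1, 1, 1], 'pCol': [1, 2, 3], 'ValidObjectCount': [1019, 2080, 1888]},
    index=pd.Index([101, 102, 103], name='WellID'),
)

# Compound list keyed by well label; only well A3 carries a compound here.
compounds = pd.DataFrame({'well': ['A3'], 'molecule': ['DL-alpha-Methyl-p-tyrosine']})
compounds.index = pd.Index([well_to_id(w) for w in compounds['well']], name='WellID')

# Align on WellID; wells with no listed compound become DMSO controls.
merged = pd.concat([compounds['molecule'], features], axis=1, sort=False)
merged['molecule'] = merged['molecule'].fillna('DMSO')
print(merged)
```

Encoding the row in the hundreds place keeps the ID readable (207 is row 2, column 7) as long as a plate has fewer than 100 columns.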

***
### Generate Heatmaps

Takes the imported data from the previous section and draws a heatmap laid out as a plate view. Two sliders let the user select which plate and which feature to display.

**The previous section imports the data and organizes it into a dictionary of DataFrames. The plate names are stored in a list.**

Now take that data and create a continuously updated heatmap as the sliders for the plate and the feature are adjusted.
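The plate view relies on pivoting the long, one-row-per-well table into a grid with `pRow` as the index and `pCol` as the columns. A minimal sketch of that reshaping step, using a made-up 2x3 plate (the values are illustrative, not screen data):

```python
import pandas as pd

# Toy 2x3 "plate" in long format: one row per well (values are illustrative).
wells = pd.DataFrame({
    'pRow': [1, 1, 1, 2, 2, 2],
    'pCol': [1, 2, 3, 1, 2, 3],
    'ValidObjectCount': [1019, 2080, 1888, 1822, 1830, 1750],
})

# Rows of the grid are plate rows, columns are plate columns.
plateview = wells.pivot(index='pRow', columns='pCol', values='ValidObjectCount')
print(plateview)
```

The resulting grid is the kind of 2D table that `sns.heatmap` accepts directly, so each cell of the heatmap corresponds to one physical well.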

In [3]:
def update_heatmap(feature, plate, table, plate_name):
    # Display a heatmap of the selected feature for the selected plate
    df = table[plate]
    col_name = df.columns[feature+3]  # features start after 'molecule', 'pRow' and 'pCol'
    plateview = df.pivot(index='pRow', columns='pCol', values=col_name)
    dyn_map = sns.heatmap(plateview, linewidths=.05, square=True, cmap='coolwarm', cbar_kws={"orientation": "horizontal"})
    dyn_map.set_title('Plate: ' + plate_name[plate] + '   Feature: ' + col_name)
    display_markdown('### Feature Heatmap\n ### Plate: ' + plate_name[plate] + '  Feature: ' + col_name, raw=True)
    plt.show()
    
    # Display a table of compound positions that updates as the plate slider is moved
    drug_plateview = df.pivot(index='pRow', columns='pCol', values='molecule')
    display_markdown('### Compound Positions Plate: ' + plate_name[plate], raw=True)
    display(drug_plateview)

# Widgets and controls for heatmap that will update in realtime
sns.set(rc={'figure.figsize':(11.7,8.27)})
plate = widgets.IntSlider(min=0,max=(len(plate_name)-1),description='Plate:') 
feature = widgets.IntSlider(min=0,max=(len(df_dict[0].columns)-4),description='Feature:')
widgets.interactive(update_heatmap,feature=feature,plate=plate,table=fixed(df_dict),plate_name=fixed(plate_name))


***
### Identify Candidates
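Normalizes the data against the DMSO controls: each value is expressed as a percent reduction relative to the plate's DMSO mean, 100 × (1 − value / mean). A candidate list is then displayed based on the thresholding parameters, and the values of all wells are plotted.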
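The normalization in the cells below expresses each well as a percent reduction relative to the mean of the plate's DMSO wells. A small sketch of that calculation on made-up numbers follows; the helper `percent_reduction`, the compound names, and the counts are all hypothetical.

```python
import pandas as pd

def percent_reduction(values, control_mean):
    """Percent reduction relative to the control mean: 100 * (1 - value / mean)."""
    return 100 * (1 - values / control_mean)

# Made-up plate with two DMSO control wells and two treated wells.
plate = pd.DataFrame({
    'molecule': ['DMSO', 'DMSO', 'compound_A', 'compound_B'],
    'ValidObjectCount': [1000, 1200, 550, 1100],
})

dmso = plate.loc[plate['molecule'] == 'DMSO', 'ValidObjectCount']
control_mean = dmso.mean()  # 1100.0
control_std = dmso.std()    # sample standard deviation (pandas default, ddof=1)

plate['reduction_pct'] = percent_reduction(plate['ValidObjectCount'], control_mean)
print(plate)
```

With this convention a well equal to the control mean scores 0, a well with half the control value scores 50, and a well above the control mean gets a negative score.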


In [4]:
# Calculate the DMSO controls' mean and standard deviation for every feature of each plate. Save the result as a DataFrame called statdf
dmsostat = []
featurecol = df_dict[0].columns[3:]  # skip 'molecule', 'pRow' and 'pCol'
for i in df_dict:
    dmsodf = df_dict[i].loc[df_dict[i]['molecule']=='DMSO']
    
    for j in featurecol:
        stat=OrderedDict()
        stat['plate'] = plate_name[i]
        stat['feature'] = j
        stat['mean'] = dmsodf[j].mean()
        stat['std'] = dmsodf[j].std()
        dmsostat.append(stat)

statdf = pd.DataFrame(dmsostat)

In [5]:
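# df_norm is a dictionary of normalized copies of the DataFrames in df_dict.
# Copying keeps the raw values in df_dict intact and makes this cell safe to re-run.
df_norm = {i: df.copy() for i, df in df_dict.items()}
for i in df_norm:
    statdf_temp = statdf.loc[statdf['plate']==plate_name[i]]
    for j in featurecol:
        # Mean of the DMSO controls for feature j on this plate
        val = statdf_temp.loc[statdf_temp['feature'] == j, 'mean'].iloc[0]
        # Express each well as a percent reduction relative to the DMSO mean
        df_norm[i][j] = 100*(1-df_norm[i][j]/val)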

In [6]:
def update_candidates(plate_norm,feature_norm,sigma_thresh,celldeath_thresh,plate_name,df_norm,statdf,featurecol):
    # Look up the DMSO control mean and standard deviation for the selected feature and for the cell count
    avgfeature = (statdf.loc[(statdf['plate']== plate_name[plate_norm]) & (statdf['feature']==featurecol[feature_norm])]['mean']).to_list()[0]
    stdfeature = (statdf.loc[(statdf['plate']== plate_name[plate_norm]) & (statdf['feature']==featurecol[feature_norm])]['std']).to_list()[0]
    
    avgcell = (statdf.loc[(statdf['plate']== plate_name[plate_norm]) & (statdf['feature']=='ValidObjectCount')]['mean']).to_list()[0]
    stdcell = (statdf.loc[(statdf['plate']== plate_name[plate_norm]) & (statdf['feature']=='ValidObjectCount')]['std']).to_list()[0]
    print('plate #:',plate_name[plate_norm], '\nDMSO Control Value:', featurecol[feature_norm]+':', int(avgfeature), '+-', int(stdfeature), 'CellCount:', int(avgcell), '+-', int(stdcell))
    
    dfsort = df_norm[plate_norm].sort_values(by=featurecol[feature_norm])
    
    # Scatterplot of the percent reduction in cell count (blue, left axis) and in the
    # selected feature (red, right axis), with wells ordered by the feature value
    sns.set(style="whitegrid")
    fig, ax = plt.subplots()
    rank = list(range(1, len(dfsort.index)+1))
    ax.scatter(x=rank, y='ValidObjectCount', data=dfsort)
    ax2 = ax.twinx()
    ax2.scatter(x=rank, y=featurecol[feature_norm], data=dfsort, color='r')
    ax.set(ylabel='%Reduced_CellCount(BLUE)')
    ax2.set(ylabel='%Reduced_'+featurecol[feature_norm]+'(RED)')
    plt.show()

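    # Display the list of candidates that satisfy the criteria.
    # df_norm holds percent reductions, so the thresholds are converted to percent here
    # (an assumed reading of the sliders): the feature must be reduced by at least
    # sigma_thresh DMSO standard deviations, and the cell count by no more than
    # celldeath_thresh percent.
    feature_cutoff = sigma_thresh * 100 * stdfeature / avgfeature
    platedf = df_norm[plate_norm]
    hitslist = platedf.loc[(platedf[featurecol[feature_norm]] >= feature_cutoff) & (platedf['ValidObjectCount'] <= celldeath_thresh), ['molecule', featurecol[feature_norm], 'ValidObjectCount']]
    display('Number of candidates:', len(hitslist))
    display(hitslist)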
    
# Widgets for graph and candidate list that updates in realtime
plate_norm = widgets.IntSlider(value=0,min=0,max=(len(plate_name)-1),description='Plate:')
feature_norm = widgets.IntSlider(min=0,max=(len(featurecol)-1),description='Feature:')
sigma_thresh = widgets.FloatSlider(min=1,max=10,description='Sigma multiplier:') 
celldeath_thresh = widgets.IntSlider(min=1,max=100,description='Reduction in cell count:')

widgets.interactive(update_candidates, plate_norm=plate_norm, feature_norm=feature_norm, sigma_thresh=sigma_thresh,celldeath_thresh=celldeath_thresh,plate_name=fixed(plate_name),df_norm=fixed(df_norm),statdf=fixed(statdf),featurecol =fixed(featurecol))

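Candidate selection comes down to a boolean mask over the normalized table: keep wells where the feature is strongly reduced while the cell count is not. The sketch below shows one possible form of that filter. The function `select_hits`, its cutoff parameters, and the data are hypothetical, and both cutoffs are assumed to be in the same percent-reduction units as the normalized values.

```python
import pandas as pd

def select_hits(norm, feature, min_feature_reduction, max_cell_reduction):
    """Return wells whose feature reduction is at least min_feature_reduction
    while the cell-count reduction stays at or below max_cell_reduction."""
    mask = (norm[feature] >= min_feature_reduction) & (norm['ValidObjectCount'] <= max_cell_reduction)
    return norm.loc[mask, ['molecule', feature, 'ValidObjectCount']]

# Made-up normalized plate: every value is a percent reduction versus DMSO.
norm = pd.DataFrame({
    'molecule': ['DMSO', 'compound_A', 'compound_B', 'compound_C'],
    'MEAN_TargetTotalIntenCh2': [2.0, 60.0, 75.0, 10.0],
    'ValidObjectCount': [-1.0, 15.0, 80.0, 5.0],
}, index=[101, 102, 103, 104])

hits = select_hits(norm, 'MEAN_TargetTotalIntenCh2',
                   min_feature_reduction=50.0, max_cell_reduction=30.0)
print(hits)
```

Here `compound_B` is excluded even though its signal drops the most, because an 80% loss of cells suggests toxicity rather than a specific effect on the measured feature.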