# Visualizing SSNs using an interactive network

Networks have nodes and edges. Nodes are the connection points, and edges are the connections between them. In this exercise each protein is a node, and an edge between two proteins indicates a BLAST evalue (expectation value) smaller than 10e-40.
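
To make the node and edge idea concrete, here is a minimal sketch in plain Python. The protein names and evalues are made up for illustration and are not taken from our BLAST results. It also shows that the cutoff written as 10e-40 is the same number as 1e-39.

```python
# Hypothetical pairwise hits: (protein_a, protein_b, evalue). Not real BLAST output.
hits = [
    ("protA", "protB", 1e-60),
    ("protA", "protC", 1e-45),
    ("protB", "protD", 1e-20),  # above the cutoff, so this pair gets no edge
]

CUTOFF = 10e-40  # scientific notation for 10 x 10^-40, i.e. 1e-39

# Keep only the pairs whose evalue is smaller than the cutoff; these are the edges.
edge_list = [(a, b) for a, b, evalue in hits if evalue < CUTOFF]

# In this sketch, every protein that appears in at least one edge is a node.
node_set = sorted({protein for edge in edge_list for protein in edge})

print(edge_list)
print(node_set)
```

In this toy example protD never becomes part of the network, because its only hit is weaker than the cutoff.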

## Examining the BLAST output

<font color=blue><b>STEP 1:</b></font> Let's first look at the output of our BLASTP search using the "!head" command. 

In [None]:
!head files/BLASTe40_out

***
You should see three columns. They aren't labeled, but we dictated them using the -outfmt option in our BLAST search. To refresh your memory, here is the command we ran:

~~~python
!blastp -db files/finalpro_40 -query files/final_40.fasta -outfmt "6 qseqid sseqid evalue" -out files/BLASTe40_out -num_threads 4 -evalue 10e-40
~~~

    - qseqid is the query sequence id (we will call this the source)
    - sseqid is the subject sequence id (we will call this the target)
    - evalue is the expectation value
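
If it helps to see how one line of this three-column layout breaks apart, the sketch below splits a couple of tab-separated lines using only the Python standard library. The two sample lines are invented for illustration; your file will contain different ids and values.

```python
import csv
import io

# Two hypothetical lines in the "qseqid sseqid evalue" layout, separated by tabs.
sample = "protA\tprotB\t3.2e-62\nprotA\tprotC\t7.0e-45\n"

rows = []
for qseqid, sseqid, evalue in csv.reader(io.StringIO(sample), delimiter="\t"):
    # Rename the fields the way this exercise does: query -> source, subject -> target.
    rows.append({"source": qseqid, "target": sseqid, "evalue": float(evalue)})

for row in rows:
    print(row)
```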

<font color=blue><b>STEP 2:</b></font> Answer the following questions:

    1. What is the expectation value when the source and target are the same?
    2. Given that the expectation value is the number of hits this good that we would expect to find by chance, does your answer to question 1 make sense?
    3. In the first ten results in the file, what is the expectation value of the closest non-identical match (give the source, target, and evalue)?
    
***

## Creating a Dataframe from the BLAST output

A little lingo here. An API is an Application Programming Interface. APIs are pieces of software that allow applications to talk to each other. A dataframe is a popular data structure that resembles a spreadsheet, and the pandas library provides the API for working with it. The BLAST output is a tab-separated file, and we will use pandas, a "powerful Python data analysis toolkit", to read our file and convert it to a dataframe.

<font color=blue><b>STEP 3:</b></font> Edit the code to replace the <b>\<<<your file here\>>></b> with the BLAST output file. Then run the code below to convert the BLASTe40_out file into a dataframe. Since our BLAST output did not contain any headers, we can add them in.


In [None]:
import pandas as pd # imports the pandas functions
#import json
#import ipycytoscape

headerList = ['source', 'target', 'evalue']  # names for the three unlabeled columns

# reads the BLAST output, looks for 'tab' to separate the values, and assigns names to the columns
blast_data_con = pd.read_csv('files/BLASTe40_out', sep='\t', header=None, names=headerList)

blast_data_con    # show what is in the dataframe.

Note that the complete dataframe is not shown, but that it contains over 2200 rows, each of which is a candidate edge.
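
Since only part of a large dataframe is displayed, a few summary calls can confirm what was loaded. This is a minimal sketch on a made-up three-row dataframe (the name `toy` and its values are assumptions for illustration); the same calls should work on blast_data_con.

```python
import pandas as pd

# A tiny stand-in for the BLAST dataframe, with invented ids and evalues.
toy = pd.DataFrame(
    [["protA", "protB", 3.2e-62],
     ["protB", "protA", 4.1e-62],
     ["protA", "protC", 7.0e-45]],
    columns=["source", "target", "evalue"],
)

n_rows = len(toy)                       # one row per BLAST hit
n_sources = toy["source"].nunique()     # how many different query proteins appear
all_pass = bool((toy["evalue"] <= 10e-40).all())  # every hit should respect the cutoff

print(n_rows, n_sources, all_pass)
print(toy.dtypes)                       # evalue should be a float column, not text
```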
***
## Creating dataframes of edges and nodes

## Removing duplicates and self-references from edges

The code below removes duplicates (e.g. if <font color="blue">a</font> finds <font color="blue">b</font> and <font color="blue">b</font> finds <font color="blue">a</font>, we only need to keep one of them) and self-references (e.g. remove all instances of <font color="blue">a</font> finds <font color="blue">a</font>).

The code is a bit complicated and uses another library called numpy. Briefly, the code uses the numpy.sort function to put each source/target pair in alphabetical order, then uses pandas to flag any pair that has already appeared. Only the rows that are not flagged are kept.

We call the new dataframe edges.
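
To see what each step does, here is a small sketch of the same idea on a four-row toy dataframe. The ids a, b, and c and the variable names are made up for illustration, and the evalue column is left out to keep the example short.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame(
    [["a", "a"],   # self-reference
     ["a", "b"],
     ["b", "a"],   # same pair as the row above, in the other direction
     ["b", "c"]],
    columns=["source", "target"],
)

# Sort each row's pair alphabetically so that (b, a) becomes (a, b).
sorted_pairs = pd.DataFrame(
    np.sort(toy[["source", "target"]].to_numpy(), axis=1), index=toy.index
)

# True for every row whose sorted pair already appeared in an earlier row.
is_repeat = sorted_pairs.duplicated()

unique_pairs = toy[~is_repeat]                                        # drop the repeats
toy_edges = unique_pairs[unique_pairs["source"] != unique_pairs["target"]]  # drop a-finds-a

print(toy_edges)
```

Only the rows a/b and b/c survive: the b/a row is flagged as a repeat, and the a/a row is removed as a self-reference.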

<font color=blue><b>STEP 4:</b></font> Run the code below to create a dataframe of unique edges.

In [None]:
import numpy as np

df = blast_data_con    

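# remove duplicates: sort each source/target pair so (b, a) matches (a, b), then flag repeats
m = pd.DataFrame(np.sort(df[['source', 'target']].values, axis=1), index=df.index).duplicated()
df = df[~m]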

#removes self-reference
df = df[df.source != df.target]

edges = df    # this is a unique set of edges
edges         # show us the edges dataframe

## Creating a unique list of nodes

We will use the numpy.unique function to read through the sources and targets in the dataframe and make a list (called uniq_list) of nodes.

Then we will use pandas to convert this list into a simple dataframe of nodes.
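
As a quick illustration, the sketch below runs the same two steps on a made-up edge table (the names `toy_edges`, `toy_ids`, and `toy_nodes` are assumptions for this example only).

```python
import numpy as np
import pandas as pd

# A hypothetical set of unique edges.
toy_edges = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"]})

# numpy.unique looks through both columns at once and returns each id one time, sorted.
toy_ids = np.unique(toy_edges[["source", "target"]].values)

# Wrap the ids in a one-column dataframe with the header id.
toy_nodes = pd.DataFrame(toy_ids, columns=["id"])

print(toy_nodes)
```

Because the ids come from the edge table, a protein that has no remaining edges will not show up as a node.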

<font color=blue><b>STEP 5:</b></font> Run the code below to create a dataframe of unique nodes.

In [None]:

uniq_list = np.unique(df[['source', 'target']].values)   # find the unique values and put them in a list

nodes = pd.DataFrame(uniq_list, columns = ['id'])  # make a node dataframe with the column header id

nodes     # show us the nodes dataframe

In [None]:

uniq_list = np.unique(df[['source', 'target']].values)

nodes = pd.DataFrame(uniq_list, columns = ['id'])

# Let's add some new columns to our dataframe
nodes['label'] = nodes['id']
nodes['background-color']='cyan'    # our default color is cyan, but could be anything.
nodes['width']='24'
nodes['height']='24'
nodes['text-valign']='center'
nodes['text-halign']='center'
nodes['count'] = '1'

"""
#Let's change the size of the nodes based on the number of connections. 
col_one_list = nodes['id'].tolist()     # make a list from the dataframe

for item in col_one_list: 
    size = len(edges[edges['source']==item]) + len(edges[edges['target']==item])
    nodes.loc[nodes['id'] == item, 'count']=str(size)
    size = size*10
    nodes.loc[nodes['id'] == item, 'width']=str(size)
    nodes.loc[nodes['id'] == item, 'height']=str(size)
"""


#Here we can assign colors to nodes that connect to one of our knowns!

nodes.loc[nodes['id'] == '1U8R_IDER','background-color']  = 'red'
nodes.loc[nodes['id'] == '1C0W_DTXR','background-color']  = 'orange'
nodes.loc[nodes['id'] == '6O5C_MTSR','background-color']  = 'yellow'
nodes.loc[nodes['id'] == '3HRT_SCAR','background-color']  = 'green'
nodes.loc[nodes['id'] == '5CVI_SLOR','background-color']  = 'blue'
nodes.loc[nodes['id'] == '3R60_MNTR','background-color']  = 'magenta'

"""
records = edges.to_records(index=False)
result = list(records)

for item in result:
    #print(item)
    if item[1] == '1U8R_IDER':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'red'
    if item[1] == '5CVI_SLOR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'blue'
    if item[1] == '3HRT_SCAR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'green'
    if item[1] == '1C0W_DTXR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'orange'
    if item[1] == '3R60_MNTR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'magenta'
    if item[1] == '6O5C_MTSR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'yellow'
"""
nodes


In [None]:
#nodes = df.drop_duplicates('source')
uniq_list = np.unique(df[['source', 'target']].values)
#print(uniq_list)
nodes = pd.DataFrame(uniq_list, columns = ['id'])
#nodes = nodes.drop(columns=['target', 'evalue'])
#nodes.columns=['id']

nodes['label'] = nodes['id']
nodes['background-color']='cyan'

#nodes.loc['3HRT_SCAR']
nodes['width']='24'
nodes['height']='24'
nodes['text-valign']='center'
nodes['text-halign']='center'

nodes['count'] = '1'

col_one_list = nodes['id'].tolist()

for item in col_one_list: 
    #print(item)
    size = len(edges[edges['source']==item]) + len(edges[edges['target']==item])
    nodes.loc[nodes['id'] == item, 'count']=str(size)
    size = size*10
    nodes.loc[nodes['id'] == item, 'width']=str(size)
    nodes.loc[nodes['id'] == item, 'height']=str(size)

records = edges.to_records(index=False)
result = list(records)
#print(result)
    
#edge_tar_list = edges['target'].tolist()

#edge_tar_list = edges.iterrows().tolist()

for item in result:
    #print(item)
    if item[1] == '1U8R_IDER':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'red'
    if item[1] == '5CVI_SLOR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'blue'
    if item[1] == '3HRT_SCAR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'green'
    if item[1] == '1C0W_DTXR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'orange'
    if item[1] == '3R60_MNTR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'magenta'
    if item[1] == '6O5C_MTSR':
        nodes.loc[nodes['id'] == item[0],'background-color']  = 'yellow'

nodes.loc[nodes['id'] == '1U8R_IDER','background-color']  = 'red'
nodes.loc[nodes['id'] == '1C0W_DTXR','background-color']  = 'orange'
nodes.loc[nodes['id'] == '6O5C_MTSR','background-color']  = 'yellow'
nodes.loc[nodes['id'] == '3HRT_SCAR','background-color']  = 'green'
nodes.loc[nodes['id'] == '5CVI_SLOR','background-color']  = 'blue'
nodes.loc[nodes['id'] == '3R60_MNTR','background-color']  = 'magenta'
    
nodes
#nodes.to_csv('files/nodes_40', index=True, sep='\t')
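
The loop in the cell above counts the connections for one node at a time. As an alternative, here is a sketch that counts them all at once with pandas. It uses made-up toy dataframes, and unlike the cell above it stores count as a number rather than as text, which is a choice made for this example.

```python
import pandas as pd

# Hypothetical edges and nodes; "e" is included to show a node with no edges.
toy_edges = pd.DataFrame({"source": ["a", "a", "a"], "target": ["b", "c", "d"]})
toy_nodes = pd.DataFrame({"id": ["a", "b", "c", "d", "e"]})

# Stack the source and target columns, then count how often each id occurs.
degree = pd.concat([toy_edges["source"], toy_edges["target"]]).value_counts()

# Look up each node's count; ids that never occur in an edge get 0.
toy_nodes["count"] = toy_nodes["id"].map(degree).fillna(0).astype(int)

# Scale the node size by the count, stored as text like the other style columns.
toy_nodes["width"] = (toy_nodes["count"] * 10).astype(str)
toy_nodes["height"] = toy_nodes["width"]

print(toy_nodes)
```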

In [None]:
import json
import ipycytoscape

def transform_into_ipycytoscape(nodes_df,edges_df):
    
    nodes_dict = nodes_df.to_dict('records')
    edges_dict = edges_df.to_dict('records')
    """   
    for thing in edges_dict:
        if thing['target'] == "1U8R_IDER":
            print(thing["source"])
    """
    # building nodes

    data_keys = ['id']  #this is a list of keys in stations (nodes)
    position_keys = ['position_x','position_y']
    rest_keys = ['score','idInt','name','group','removed','selected','selectable','locked','grabbed',
                 'grabbable']
    
    nodes_graph_list=[] #an empty list for making the json-like? file
    for node in nodes_dict: #iterating over each node
        dict_node = {}
        data_sub_dict = {'data':{el:node[el] for el in data_keys}}
        rest_sub_dict = {el:node[el] for el in node.keys() if el in rest_keys}
        posi_sub_dict = {}
        if 'position_x' in node.keys() and 'position_y' in node.keys():
            #print(node.keys())
            posi_sub_dict = {'position':{el:node[el] for el in node.keys() if el in position_keys}}
        
        dict_node = {**data_sub_dict,**rest_sub_dict,**posi_sub_dict}
        nodes_graph_list.append(dict_node)
    #print(nodes_graph_list) #NOTE this works to here!!!
    
    # building edges
    
    data_keys  = ['source','target','evalue'] #this is a list of keys in rails (edges)
    data_keys2 = ['label','classes'] #these are from the stations I think
    rest_keys  = ['score','weight','group','networkId','networkGroupId','intn','rIntnId','group','removed','selected','selectable','locked','grabbed','grabbable','classes']
    position_keys = ['position_x','position_y']
    
    edges_graph_list = []
    for edge in edges_dict:
        dict_edge = {}
        data_sub_dict = {el:edge[el] for el in data_keys}
        data_sub_dict2 = {el:edge[el] for el in edge.keys() if el in data_keys2}
        rest_sub_dict = {el:edge[el] for el in edge.keys() if el in rest_keys}
        
        dict_edge = {'data':{**data_sub_dict,**data_sub_dict2},**rest_sub_dict}
        edges_graph_list.append(dict_edge)
    
    #print(edges_graph_list)
    
    total_graph_dict = {'nodes': nodes_graph_list, 'edges':edges_graph_list}
    
    #print(total_graph_dict)
    
    # building the style
    all_node_style = ['background-color','background-opacity',
                     'font-family','font-size','label','width',
                     'shape','height','width','text-valign','text-halign']
    all_edge_style = ['background-color','background-opacity',
                     'font-family','font-size','label','width','line-color', 
                     ]
    
    total_style_dict = {}
    style_elements=[]
    for node in nodes_dict:
        node_dict = {'selector': f'node[id = \"{node["id"]}\"]'}
        style_dict ={"style": { el:node[el] for el in node.keys() if el in all_node_style}}
        node_dict.update(style_dict)
        style_elements.append(node_dict)
    
    for edge in edges_dict:
        # select edges by their source attribute (the edge data built above has no matching id)
        edge_dict = {'selector': f'edge[source = \"{edge["source"]}\"]'}
        style_dict ={"style": { el:edge[el] for el in edge.keys() if el in all_edge_style}}
        edge_dict.update(style_dict)
        style_elements.append(edge_dict)
    
    # the graph
    data_graph = json.dumps(total_graph_dict)
    json_to_python = json.loads(data_graph)
    result_cyto = ipycytoscape.CytoscapeWidget()
    result_cyto.graph.add_graph_from_json(json_to_python)    
    result_cyto.set_style(style_elements)
    result_cyto.set_layout(name='grid')
    
    return result_cyto

G=transform_into_ipycytoscape(nodes,edges)
display(G)
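
The function above is long, so it may help to see the shape of the dictionary it builds before handing it to ipycytoscape. The sketch below builds that same nodes/edges structure from two made-up dataframes and prints it as JSON. It leaves out the styling and the widget itself, and the names `toy_nodes`, `toy_edges`, and `graph` are assumptions for this example.

```python
import json
import pandas as pd

# Hypothetical nodes and edges, much smaller than the real dataframes.
toy_nodes = pd.DataFrame({"id": ["a", "b"], "label": ["a", "b"]})
toy_edges = pd.DataFrame({"source": ["a"], "target": ["b"], "evalue": [1e-50]})

graph = {
    # Each node keeps its id inside a "data" dictionary.
    "nodes": [{"data": {"id": row["id"]}} for row in toy_nodes.to_dict("records")],
    # Each edge keeps its source, target, and evalue inside a "data" dictionary.
    "edges": [
        {"data": {"source": row["source"], "target": row["target"], "evalue": row["evalue"]}}
        for row in toy_edges.to_dict("records")
    ],
}

print(json.dumps(graph, indent=2))
```

The colors and sizes are not part of this dictionary. In the function above they travel separately, as the list of style rules passed to set_style.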