# Space Mice Genes
## Heat Propagation and Clustering Package


----------------------

Author: Mikayla Webster (13webstermj@gmail.com)

Date: 2nd May, 2018

----------------------

<a id='toc'></a>
## Table of Contents
1. [Background](#background)
2. [Import packages](#import)
3. [Define Analysis Preferences](#pref)
4. [Load Networks](#load)
5. [Run Heat Propagation](#heat)
6. [Clustering](#cluster)

## Background
<a id='background'></a>

## Import packages
<a id='import'></a>

In [105]:
import sys
code_path = '../../network_bio_toolkit'
sys.path.append(code_path)

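import Heat
try:
    from importlib import reload  # Python 3; reload is a builtin in Python 2
except ImportError:
    pass
reload(Heat)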

import pandas as pd

## Define Analysis Preferences
<a id='pref'></a>

In [106]:
symbol = 'symbol'
entrez = 'entrez'

human = 'human'
mouse = 'mouse'

heat = Heat.Heat(gene_type = symbol, species = mouse)

## Load Networks
<a id='load'></a>

1. Load DEG file 
2. Load STRING background network
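
As a rough illustration of what the `p_value_filter` argument implies, the sketch below filters a small in-memory table with the standard `csv` module. The column names (`gene_symbol`, `p_value`), the toy rows, the strict `<` comparison, and the helper `filter_deg` are all assumptions made for this example. They are not taken from the `Heat` package, whose `create_DEG_list` may behave differently.

```python
import csv
import io

# Hypothetical DEG table; real DEG files may use different column names.
toy_csv = io.StringIO(
    "gene_symbol,log_fold_change,p_value\n"
    "Actb,0.10,0.80\n"
    "Per2,1.75,0.001\n"
    "Dbp,-2.10,0.03\n"
    "Gapdh,0.05,0.65\n"
)


def filter_deg(handle, p_value_filter=0.05, sep=','):
    """Return the symbols of rows whose p-value is below the cutoff.

    Illustrative helper only; not part of the Heat package.
    """
    reader = csv.DictReader(handle, delimiter=sep)
    return [row['gene_symbol'] for row in reader
            if float(row['p_value']) < p_value_filter]


deg_list = filter_deg(toy_csv)
print("Number of DEGs: " + str(len(deg_list)))
```

Only the genes that pass the cutoff would then serve as seed nodes for the propagation step.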

In [107]:
# load DEG file
DEG_filename = "../../DEG_databases/DE_CoeffspaceFlight - groundControl_glds48_20180312.csv"  
heat.create_DEG_list(DEG_filename, p_value_filter = 0.05, sep = ',')

print('Number of DEG\'s: ' + str(len(heat.DEG_list)))

Number of DEG's: 181


In [108]:
# load background network from BIOGRID ndex2 network 
heat.load_ndex_from_server(UUID = '52d57bf9-23dc-11e8-b939-0ac135e8bacf', relabel_node_field = 'name')

print('\nNumber of nodes: ' + str(len(list(heat.DG_universe.nodes()))))


Number of nodes: 12669


## Run Heat Propagation
<a id='heat'></a>

In [109]:
Wprime = heat.normalized_adj_matrix() # optional. Saves time to do it once here.
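
To make the role of `Wprime` more concrete, here is a minimal pure-Python sketch of heat propagation on a four-node toy graph. It assumes the heat-conserving normalization in which entry `W[i][j]` is `1/deg(j)` for linked nodes, and the commonly used update rule `F_new = alpha * W * F_old + (1 - alpha) * Y` from the network propagation literature. The toy graph, the helper names, and the exact update rule are assumptions for illustration; the package's own implementation may differ.

```python
def normalized_adjacency(neighbors):
    """Return (nodes, W) with W[i][j] = 1/deg(j) when i and j are linked.

    Each column sums to 1, so a node splits its heat evenly among neighbors.
    """
    nodes = sorted(neighbors)
    index = {name: i for i, name in enumerate(nodes)}
    size = len(nodes)
    W = [[0.0] * size for _ in range(size)]
    for j_name, nbrs in neighbors.items():
        j = index[j_name]
        for i_name in nbrs:
            W[index[i_name]][j] = 1.0 / len(nbrs)
    return nodes, W


def propagate(W, Y, alpha=0.5, num_its=20):
    """Iterate F = alpha * W * F + (1 - alpha) * Y, starting from F = Y."""
    F = list(Y)
    size = len(F)
    for _ in range(num_its):
        F = [alpha * sum(W[i][j] * F[j] for j in range(size))
             + (1 - alpha) * Y[i]
             for i in range(size)]
    return F


# Hypothetical undirected toy graph given as an adjacency list.
toy = {'A': ['B', 'C'], 'B': ['A', 'C'], 'C': ['A', 'B', 'D'], 'D': ['C']}
nodes, W = normalized_adjacency(toy)

# All initial heat is placed on the seed nodes and sums to 1.
seeds = ['A']
Y = [1.0 / len(seeds) if name in seeds else 0.0 for name in nodes]

heat_by_node = dict(zip(nodes, propagate(W, Y)))
for name in nodes:
    print(name, round(heat_by_node[name], 4))
```

Because every column of `W` sums to 1 and `Y` sums to 1, the total heat stays at 1 across iterations. Computing the matrix once and reusing it avoids rebuilding it for each call.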

In [110]:
heat.draw_heat_prop(Wprime = Wprime, # optional; calculated automatically if omitted
                  num_nodes = 500,
#                  random_walk = False,
                  edge_width = 2,
                  edge_smooth_enabled = True,
                  edge_smooth_type = 'bezier',
                  node_size_multiplier = 5,
                  hover = False,
                  hover_connected_edges = False,
                  largest_connected_component = True,
                  physics_enabled = True,
                  node_font_size = 20,
                  graph_id = 1,
                  node_shadow_x = 6)

## Clustering 
<a id='cluster'></a>

Parameter information:
- **G_DEG**: background network filtered by DEG list, output of load_STRING_to_digraph
- **DG_universe**: full background network, output of create_graph.load_STRING_to_digraph
- **seed_nodes**: list of DEGs, output of create_graph.create_DEG_list
- **Wprime**: normalized adjacency matrix, output of visualizations.normalized_adj_matrix. Calculated automatically if not specified
- **num_top_genes**: number of genes to display in the output graph
- **cluster_size_cut_off**: clusters smaller than this threshold are colored grey
- **remove_stray_nodes**: remove clusters smaller than cluster_size_cut_off
- **r**: increases spacing between clusters. Recommended value between 0.5 and 4.0
- **x_offset**: modify if some clusters overlap. Most helpful when x_offset != y_offset
- **y_offset**: modify if some clusters overlap. Most helpful when x_offset != y_offset
- **node_spacing**: recommended value between 500 and 2000
- **node_size_multiplier**: scale this number along with node_spacing. Recommended value between 5 and 25
- **physics_enabled**: nodes bounce around when you click and drag them. Only set to True when the graph has 200 nodes or fewer
- **node_font_size**: scale this number along with node_spacing. Recommended value between 20 and 50
- **graph_id**: allows rendering multiple graphs in one notebook. Each graph needs a unique id.
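
The interaction between `cluster_size_cut_off` and `remove_stray_nodes` can be sketched as follows. This is an illustrative reading of the parameter descriptions above, not the package's implementation: the helper `apply_cluster_cut_off`, the `'grey'` marker, and the toy cluster assignment are all hypothetical.

```python
from collections import Counter


def apply_cluster_cut_off(node_to_cluster, cluster_size_cut_off=5,
                          remove_stray_nodes=True):
    """Keep nodes in large clusters; drop or grey out nodes in small ones.

    Illustrative helper only; not part of the Heat package.
    """
    sizes = Counter(node_to_cluster.values())
    result = {}
    for node, cluster in node_to_cluster.items():
        if sizes[cluster] >= cluster_size_cut_off:
            result[node] = cluster      # cluster is large enough: keep as is
        elif not remove_stray_nodes:
            result[node] = 'grey'       # small cluster: keep the node, grey it
        # otherwise the node is dropped from the output entirely
    return result


# Hypothetical assignment: cluster 0 has three nodes, 1 has one, 2 has two.
toy_clusters = {'a': 0, 'b': 0, 'c': 0, 'd': 1, 'e': 2, 'f': 2}

kept = apply_cluster_cut_off(toy_clusters, cluster_size_cut_off=2)
greyed = apply_cluster_cut_off(toy_clusters, cluster_size_cut_off=2,
                               remove_stray_nodes=False)
print(sorted(kept))
print(greyed['d'])
```

With `remove_stray_nodes = True`, the single-node cluster disappears from the output. With `False`, its node stays but is marked grey.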

In [111]:
heat.draw_clustering(rad_positions = False,
                Wprime = Wprime,
                k = None,
                largest_connected_component = True,
                num_top_genes = 500,
                cluster_size_cut_off = 5,
                remove_stray_nodes = True,
                node_spacing = 700,
                node_size_multiplier = 5,
                physics_enabled = False,
                node_font_size = 16,
                graph_id = 2
               )

500
64


In [113]:
heat.draw_clustering(Wprime = Wprime,
                num_top_genes = 500,
                cluster_size_cut_off = 5,
                remove_stray_nodes = True,
                r = 1.2,
#                x_offset = 2,
#                y_offset = 2,
                node_spacing = 700,
                node_size_multiplier = 12,
                physics_enabled = False,
                node_font_size = 45,
                graph_id = 3,
                node_shadow_x = 6
               )

500
64


In [32]:
'''
--------------------------------------------------------

Authors:
    - Brin Rosenthal (sbrosenthal@ucsd.edu)
    - Julia Len (jlen@ucsd.edu)
    - Mikayla Webster (13webstermj@gmail.com)

--------------------------------------------------------
'''

from __future__ import print_function
import json
import math
import matplotlib as mpl
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd
#import visJS_module # use this for local testing
import visJS2jupyter.visJS_module as visJS_module

def draw_graph_overlap(G1, G2,
                       edge_cmap=plt.cm.coolwarm,
                       export_file='graph_overlap.json',
                       export_network=False,
                       highlight_nodes=None,
                       k=None,
                       node_cmap=plt.cm.autumn,
                       node_name_1='graph 1',
                       node_name_2='graph 2',
                       node_size=10,
                       physics_enabled=False,
                       **kwargs):
    '''
    Takes two networkX graphs and displays their overlap, where intersecting
    nodes are triangles. Additional kwargs are passed to visJS_module.

    Inputs:
        - G1: a networkX graph
        - G2: a networkX graph
        - edge_cmap: matplotlib colormap for edges, default: matplotlib.cm.coolwarm
        - export_file: JSON file to export graph data, default: 'graph_overlap.json'
        - export_network: export network to Cytoscape, default: False
        - highlight_nodes: list of nodes to place borders around, default: None
        - k: float, optimal distance between nodes for nx.spring_layout(), default: None
        - node_cmap: matplotlib colormap for nodes, default: matplotlib.cm.autumn
        - node_name_1: string to name first graph's nodes, default: 'graph 1'
        - node_name_2: string to name second graph's nodes, default: 'graph 2'
        - node_size: size of nodes, default: 10
        - physics_enabled: enable physics simulation, default: False

    Returns:
        - VisJS html network plot (iframe) of the graph overlap.
    '''

    G_overlap = create_graph_overlap(G1, G2, node_name_1, node_name_2)

    # create nodes dict and edges dict for input to visjs
    nodes = list(G_overlap.nodes())
    edges = list(G_overlap.edges())

    # set the position of each node
    if k is None:
        pos = nx.spring_layout(G_overlap)
    else:
        pos = nx.spring_layout(G_overlap,k=k)

    xpos,ypos=zip(*pos.values())
    nx.set_node_attributes(G_overlap, name = 'xpos', values = dict(zip(pos.keys(),[x*1000 for x in xpos])))
    nx.set_node_attributes(G_overlap, name = 'ypos', values = dict(zip(pos.keys(),[y*1000 for y in ypos])))

    # set the border width of nodes
    if 'node_border_width' not in kwargs.keys():
        kwargs['node_border_width'] = 2

    border_width = {}
    for n in nodes:
        if highlight_nodes is not None and n in highlight_nodes:
            border_width[n] = kwargs['node_border_width']
        else:
            border_width[n] = 0

    nx.set_node_attributes(G_overlap, name = 'nodeOutline', values = border_width)

    # set the shape of each node
    nodes_shape=[]
    for node in G_overlap.nodes(data=True):
        if node[1]['node_overlap']==0:
            nodes_shape.append('dot')
        elif node[1]['node_overlap']==2:
            nodes_shape.append('square')
        elif node[1]['node_overlap']==1:
            nodes_shape.append('triangle')
    node_to_shape=dict(zip(G_overlap.nodes(),nodes_shape))
    nx.set_node_attributes(G_overlap, name = 'nodeShape', values = node_to_shape)

    # set the node label of each node
    if highlight_nodes:
        node_labels = {}
        for node in nodes:
            if node in highlight_nodes:
                node_labels[node] = str(node)
            else:
                node_labels[node] = ''
    else:
        node_labels = {n:str(n) for n in nodes}

    nx.set_node_attributes(G_overlap, name = 'nodeLabel', values = node_labels)

    # set the node title of each node
    node_titles = [ node[1]['node_name_membership'] + '<br/>' + str(node[0])
                    for node in G_overlap.nodes(data=True) ]
    node_titles = dict(zip(G_overlap.nodes(),node_titles))
    nx.set_node_attributes(G_overlap, name = 'nodeTitle', values = node_titles)

    # set color of each node
    node_to_color = visJS_module.return_node_to_color(G_overlap,
                                                      field_to_map='node_overlap',
                                                      cmap=node_cmap,
                                                      color_max_frac=.9,
                                                      color_min_frac=.1)

    # set color of each edge
    edge_to_color = visJS_module.return_edge_to_color(G_overlap,
                                                      field_to_map='edge_weight',
                                                      cmap=edge_cmap,
                                                      alpha=.3)

    # create the nodes_dict with all relevant fields
    nodes_dict = [{'id':str(n),
                   'border_width':border_width[n],
                   'color':node_to_color[n],
                   'degree':G_overlap.degree(n),
                   'node_label':node_labels[n],
                   'node_shape':node_to_shape[n],
                   'node_size':node_size,
                   'title':node_titles[n],
                   'x':np.float64(pos[n][0]).item()*1000,
                   'y':np.float64(pos[n][1]).item()*1000}
                  for n in nodes]

    # map nodes to indices for source/target in edges
    node_map = dict(zip(nodes,range(len(nodes))))

    # create the edges_dict with all relevant fields
    edges_dict = [{'source':node_map[edges[i][0]],
                   'target':node_map[edges[i][1]],
                   'color':edge_to_color[edges[i]]}
                  for i in range(len(edges))]

    # set node_size_multiplier to increase node size as graph gets smaller
    if 'node_size_multiplier' not in kwargs.keys():
        if len(nodes) > 500:
            kwargs['node_size_multiplier'] = 3
        elif len(nodes) > 200:
            kwargs['node_size_multiplier'] = 5
        else:
            kwargs['node_size_multiplier'] = 7

    kwargs['physics_enabled'] = physics_enabled

    # if node hovering color not set, set default to black
    if 'node_color_hover_background' not in kwargs.keys():
        kwargs['node_color_hover_background'] = 'black'

    # node size determined by size in nodes_dict, not by id
    if 'node_size_field' not in kwargs.keys():
        kwargs['node_size_field'] = 'node_size'

    # node label determined by value in nodes_dict
    if 'node_label_field' not in kwargs.keys():
        kwargs['node_label_field'] = 'node_label'

    # export the network to JSON for Cytoscape
    if export_network:
        node_colors = map_node_to_color(G_overlap,'node_overlap',False)
        nx.set_node_attributes(G_overlap, name = 'nodeColor', values = node_colors)
        edge_colors = map_edge_to_color(G_overlap,'edge_weight',False)
        nx.set_edge_attributes(G_overlap, name = 'edgeColor', values = edge_colors)
        visJS_module.export_to_cytoscape(G = G_overlap, export_file = export_file)

    return visJS_module.visjs_network(nodes_dict,edges_dict,**kwargs)


def create_graph_overlap(G1,G2,node_name_1,node_name_2):
    '''
    Create and return the overlap of two graphs.

    Inputs:
        - G1: a networkX graph
        - G2: a networkX graph
        - node_name_1: string to name first graph's nodes
        - node_name_2: string to name second graph's nodes

    Returns:
        - A networkX graph that is the overlap of G1 and G2.
    '''

    overlap_graph = nx.Graph()
    node_union = list(np.union1d(list(G1.nodes()),list(G2.nodes())))
    node_intersect = list(np.intersect1d(list(G1.nodes()),list(G2.nodes())))
    nodes_1only = np.setdiff1d(list(G1.nodes()),node_intersect)
    nodes_2only = np.setdiff1d(list(G2.nodes()),node_intersect)

    edges_total = list(G1.edges())
    edges_total.extend(list(G2.edges()))

    overlap_graph.add_nodes_from(node_union)

    # set node attributes to distinguish which graph the node belongs to
    node_overlap=[]
    node_name_membership=[]
    for node in node_union:
        if node in nodes_1only:
            node_overlap.append(0)
            node_name_membership.append(node_name_1)
        elif node in nodes_2only:
            node_overlap.append(2)
            node_name_membership.append(node_name_2)
        else:
            node_overlap.append(1)
            node_name_membership.append(node_name_1+' + '+node_name_2)

    nx.set_node_attributes(overlap_graph,
                           name = 'node_overlap',
                           values = dict(zip(node_union,node_overlap)))
    nx.set_node_attributes(overlap_graph,
                           name = 'node_name_membership',
                           values = dict(zip(node_union,node_name_membership)))

    nodes_total = list(overlap_graph.nodes())
    intersecting_edge_val = int(math.floor(math.log10(len(nodes_total)))) * 10

    # set the edge weights; an edge present in both graphs gets a higher weight
    edge_weights = {}
    for e in edges_total:
        eflip = (e[1],e[0])
        if e in edge_weights:
            edge_weights[e]+=intersecting_edge_val
        elif eflip in edge_weights:
            edge_weights[eflip]+=intersecting_edge_val
        else:
            edge_weights[e]=1

    # build (source, target, weight) triples; also works when there are no edges
    edges = [(v1,v2,w) for (v1,v2),w in edge_weights.items()]

    overlap_graph.add_weighted_edges_from(edges)
    nx.set_edge_attributes(overlap_graph, name = 'edge_weight', values = edge_weights)
    return overlap_graph


def draw_heat_prop(G, seed_nodes, random_walk = True,
                   edge_cmap=plt.cm.autumn_r,
                   export_file='heat_prop.json',
                   export_network=False,
                   highlight_nodes=None,
                   k=None,
                   largest_connected_component=False,
                   node_cmap=plt.cm.autumn_r,
                   node_size=10,
                   num_nodes=None,
                   physics_enabled=False,
                   Wprime=None,
                   **kwargs):
    '''
    Implements and displays the network propagation for a given graph and seed
    nodes. Additional kwargs are passed to visJS_module.

    Inputs:
        - G: a networkX graph
        - seed_nodes: nodes on which to initialize the simulation (must be a dict if random_walk = False)
        - random_walk: True to perform a random walk style heat propagation, False to perform a diffusion style one.
        - edge_cmap: matplotlib colormap for edges, default: matplotlib.cm.autumn_r
        - export_file: JSON file to export graph data, default: 'heat_prop.json'
        - export_network: export network to Cytoscape, default: False
        - highlight_nodes: list of nodes to place borders around, default: None
        - k: float, optimal distance between nodes for nx.spring_layout(), default: None
        - largest_connected_component: boolean, whether or not to display largest_connected_component,
                                       default: False
        - node_cmap: matplotlib colormap for nodes, default: matplotlib.cm.autumn_r
        - node_size: size of nodes, default: 10
        - num_nodes: the number of the hottest nodes to graph, default: None (all nodes will be graphed)
        - physics_enabled: enable physics simulation, default: False
        - Wprime: normalized adjacency matrix (from function normalized_adj_matrix())

    Returns:
        - VisJS html network plot (iframe) of the heat propagation.
    '''

    # check for invalid nodes in seed_nodes
    invalid_nodes = [node for node in seed_nodes if node not in G.nodes()]
    for node in invalid_nodes:
        print ('Node {} not in graph'.format(node))
    if invalid_nodes:
        return

    # perform the network propagation
    if random_walk: # perform random walk style heat propagation
        if Wprime is None:
            Wprime = normalized_adj_matrix(G)
        prop_graph = network_propagation(G, Wprime, seed_nodes).to_dict()
        nx.set_node_attributes(G, name = 'node_heat', values = prop_graph)

    # find top num_nodes hottest nodes and connected component if requested
    G = set_num_nodes(G,num_nodes)

    print('nodes after num_nodes filter: ' + str(len(G.nodes())))
    print('edges after num_nodes filter: ' + str(len(G.edges())))

    if largest_connected_component:
        # nx.connected_component_subgraphs was removed in networkx 2.4
        G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    nodes = list(G.nodes())
    edges = list(G.edges())

    print('nodes after component filter: ' + str(len(G.nodes())))
    print('edges after component filter: ' + str(len(G.edges())))

    # check for empty nodes and edges after getting subgraph of G
    if not nodes:
        print ('There are no nodes in the graph. Try increasing num_nodes.')
        return
    if not edges:
        print ('There are no edges in the graph. Try increasing num_nodes.')
        return

    # set the position of each node
    if k is None:
        pos = nx.spring_layout(G)
    else:
        pos = nx.spring_layout(G,k=k)

    xpos,ypos=zip(*pos.values())
    nx.set_node_attributes(G, name = 'xpos', values = dict(zip(pos.keys(),[x*1000 for x in xpos])))
    nx.set_node_attributes(G, name = 'ypos', values = dict(zip(pos.keys(),[y*1000 for y in ypos])))

    # set the border width of nodes
    if 'node_border_width' not in kwargs.keys():
        kwargs['node_border_width'] = 2

    border_width = {}
    for n in nodes:
        if n in seed_nodes:
            border_width[n] = kwargs['node_border_width']
        elif highlight_nodes is not None and n in highlight_nodes:
            border_width[n] = kwargs['node_border_width']
        else:
            border_width[n] = 0

    nx.set_node_attributes(G, name = 'nodeOutline', values = border_width)

    # set the shape of each node
    nodes_shape=[]
    for node in G.nodes():
        if node in seed_nodes:
            nodes_shape.append('triangle')
        else:
            nodes_shape.append('dot')
    node_to_shape=dict(zip(G.nodes(),nodes_shape))
    nx.set_node_attributes(G, name = 'nodeShape', values = node_to_shape)

    # add a field for node labels
    if highlight_nodes:
        node_labels = {}
        for node in nodes:
            if node in seed_nodes:
                node_labels[node] = str(node)
            elif node in highlight_nodes:
                node_labels[node] = str(node)
            else:
                node_labels[node] = ''
    else:
        node_labels = {n:str(n) for n in nodes}

    nx.set_node_attributes(G, name = 'nodeLabel', values = node_labels)

    # set title for each node
    node_titles = [str(node[0]) + '<br/>heat = ' + str(round(node[1]['node_heat'],5))
                   for node in G.nodes(data=True)]
    node_titles = dict(zip(G.nodes(),node_titles))
    nx.set_node_attributes(G, name = 'nodeTitle', values = node_titles)

    # set color of each node
    node_to_color = visJS_module.return_node_to_color(G,
                                                      field_to_map='node_heat',
                                                      cmap=node_cmap,
                                                      color_vals_transform='log')

    # set heat value of edge based off hottest connecting node's value
    node_attr = nx.get_node_attributes(G,'node_heat')
    edge_weights = {}
    for e in edges:
        if node_attr[e[0]] > node_attr[e[1]]:
            edge_weights[e] = node_attr[e[0]]
        else:
            edge_weights[e] = node_attr[e[1]]

    nx.set_edge_attributes(G, name = 'edge_weight', values = edge_weights)

    # set color of each edge
    edge_to_color = visJS_module.return_edge_to_color(G,
                                                      field_to_map='edge_weight',
                                                      cmap=edge_cmap,
                                                      color_vals_transform='log')

    # create the nodes_dict with all relevant fields
    nodes_dict = [{'id':str(n),
                   'border_width':border_width[n],
                   'degree':G.degree(n),
                   'color':node_to_color[n],
                   'node_label':node_labels[n],
                   'node_size':node_size,
                   'node_shape':node_to_shape[n],
                   'title':node_titles[n],
                   'x':np.float64(pos[n][0]).item()*1000,
                   'y':np.float64(pos[n][1]).item()*1000} for n in nodes]

    # map nodes to indices for source/target in edges
    node_map = dict(zip(nodes,range(len(nodes))))

    # create the edges_dict with all relevant fields
    edges_dict = [{'source':node_map[edges[i][0]],
                   'target':node_map[edges[i][1]],
                   'color':edge_to_color[edges[i]]} for i in range(len(edges))]

    # set node_size_multiplier to increase node size as graph gets smaller
    if 'node_size_multiplier' not in kwargs.keys():
        if len(nodes) > 500:
            kwargs['node_size_multiplier'] = 3
        elif len(nodes) > 200:
            kwargs['node_size_multiplier'] = 5
        else:
            kwargs['node_size_multiplier'] = 7

    kwargs['physics_enabled'] = physics_enabled

    # if node hovering color not set, set default to black
    if 'node_color_hover_background' not in kwargs.keys():
        kwargs['node_color_hover_background'] = 'black'

    # node size determined by size in nodes_dict, not by id
    if 'node_size_field' not in kwargs.keys():
        kwargs['node_size_field'] = 'node_size'

    # node label determined by value in nodes_dict
    if 'node_label_field' not in kwargs.keys():
        kwargs['node_label_field'] = 'node_label'

    # export the network to JSON for Cytoscape
    if export_network:
        node_colors = map_node_to_color(G,'node_heat',True)
        nx.set_node_attributes(G, name = 'nodeColor', values = node_colors)
        edge_colors = map_edge_to_color(G,'edge_weight',True)
        nx.set_edge_attributes(G, name = 'edgeColor', values = edge_colors)
        visJS_module.export_to_cytoscape(G = G,export_file = export_file)

    return visJS_module.visjs_network(nodes_dict,edges_dict,**kwargs)


def draw_colocalization(G, seed_nodes_1, seed_nodes_2,
                        edge_cmap=plt.cm.autumn_r,
                        export_file='colocalization.json',
                        export_network=False,
                        highlight_nodes=None,
                        k=None,
                        largest_connected_component=False,
                        node_cmap=plt.cm.autumn_r,
                        node_size=10,
                        num_nodes=None,
                        physics_enabled=False,
                        Wprime=None,
                        **kwargs):
    '''
    Implements and displays the network propagation for a given graph and two
    sets of seed nodes. Additional kwargs are passed to visJS_module.

    Inputs:
        - G: a networkX graph
        - seed_nodes_1: first set of nodes on which to initialize the simulation
        - seed_nodes_2: second set of nodes on which to initialize the simulation
        - edge_cmap: matplotlib colormap for edges, optional, default: matplotlib.cm.autumn_r
        - export_file: JSON file to export graph data, default: 'colocalization.json'
        - export_network: export network to Cytoscape, default: False
        - highlight_nodes: list of nodes to place borders around, default: None
        - k: float, optional, optimal distance between nodes for nx.spring_layout(), default: None
        - largest_connected_component: boolean, optional, whether or not to display largest_connected_component,
                                       default: False
        - node_cmap: matplotlib colormap for nodes, optional, default: matplotlib.cm.autumn_r
        - node_size: size of nodes, default: 10
        - num_nodes: the number of the hottest nodes to graph, default: None (all nodes will be graphed)
        - physics_enabled: enable physics simulation, default: False
        - Wprime:  Normalized adjacency matrix (from normalized_adj_matrix)

    Returns:
        - VisJS html network plot (iframe) of the colocalization.
    '''

    # check for invalid nodes in seed_nodes
    invalid_nodes = [(node,'seed_nodes_1') for node in seed_nodes_1 if node not in G.nodes()]
    invalid_nodes.extend([(node,'seed_nodes_2') for node in seed_nodes_2 if node not in G.nodes()])
    for node in invalid_nodes:
        print ('Node {} in {} not in graph'.format(node[0], node[1]))
    if invalid_nodes:
        return

    # perform the colocalization
    if Wprime is None:
        Wprime = normalized_adj_matrix(G)
    prop_graph_1 = network_propagation(G, Wprime, seed_nodes_1).to_dict()
    prop_graph_2 = network_propagation(G, Wprime, seed_nodes_2).to_dict()
    prop_graph = {node:(prop_graph_1[node]*prop_graph_2[node]) for node in prop_graph_1}
    nx.set_node_attributes(G, name = 'node_heat', values = prop_graph)

    # find top num_nodes hottest nodes and connected component if requested
    G = set_num_nodes(G,num_nodes)
    if largest_connected_component:
        # nx.connected_component_subgraphs was removed in networkx 2.4
        G = G.subgraph(max(nx.connected_components(G), key=len)).copy()
    nodes = list(G.nodes())
    edges = list(G.edges())

    # check for empty nodes and edges after getting subgraph of G
    if not nodes:
        print ('There are no nodes in the graph. Try increasing num_nodes.')
        return
    if not edges:
        print ('There are no edges in the graph. Try increasing num_nodes.')
        return

    # set position of each node
    if k is None:
        pos = nx.spring_layout(G)
    else:
        pos = nx.spring_layout(G,k=k)

    xpos,ypos=zip(*pos.values())
    nx.set_node_attributes(G, name = 'xpos', values = dict(zip(pos.keys(),[x*1000 for x in xpos])))
    nx.set_node_attributes(G, name = 'ypos', values = dict(zip(pos.keys(),[y*1000 for y in ypos])))

    # set the border width of nodes
    if 'node_border_width' not in kwargs.keys():
        kwargs['node_border_width'] = 2

    border_width = {}
    for n in nodes:
        if n in seed_nodes_1 or n in seed_nodes_2:
            border_width[n] = kwargs['node_border_width']
        elif highlight_nodes is not None and n in highlight_nodes:
            border_width[n] = kwargs['node_border_width']
        else:
            border_width[n] = 0

    nx.set_node_attributes(G, name = 'nodeOutline', values = border_width)

    # set the shape of each node
    nodes_shape=[]
    for node in G.nodes():
        if node in seed_nodes_1:
            nodes_shape.append('triangle')
        elif node in seed_nodes_2:
            nodes_shape.append('square')
        else:
            nodes_shape.append('dot')
    node_to_shape=dict(zip(G.nodes(),nodes_shape))
    nx.set_node_attributes(G, name = 'nodeShape', values = node_to_shape)

    # add a field for node labels
    if highlight_nodes:
        node_labels = {}
        for node in nodes:
            if node in seed_nodes_1 or node in seed_nodes_2:
                node_labels[node] = str(node)
            elif node in highlight_nodes:
                node_labels[node] = str(node)
            else:
                node_labels[node] = ''
    else:
        node_labels = {n:str(n) for n in nodes}

    nx.set_node_attributes(G, name = 'nodeLabel', values = node_labels)

    # set the title of each node
    node_titles = [str(node[0]) + '<br/>heat = ' + str(round(node[1]['node_heat'],10))
                   for node in G.nodes(data=True)]
    node_titles = dict(zip(nodes,node_titles))
    nx.set_node_attributes(G, name = 'nodeTitle', values = node_titles)

    # set the color of each node
    node_to_color = visJS_module.return_node_to_color(G,
                                                      field_to_map='node_heat',
                                                      cmap=node_cmap,
                                                      color_vals_transform='log')

    # set heat value of edge based off hottest connecting node's value
    node_attr = nx.get_node_attributes(G,'node_heat')
    edge_weights = {}
    for e in edges:
        if node_attr[e[0]] > node_attr[e[1]]:
            edge_weights[e] = node_attr[e[0]]
        else:
            edge_weights[e] = node_attr[e[1]]

    nx.set_edge_attributes(G, name = 'edge_weight', values = edge_weights)

    # set the color of each edge
    edge_to_color = visJS_module.return_edge_to_color(G,
                                                      field_to_map = 'edge_weight',
                                                      cmap=edge_cmap,
                                                      color_vals_transform = 'log')

    # create the nodes_dict with all relevant fields
    nodes_dict = [{'id':str(n),
                   'border_width':border_width[n],
                   'degree':G.degree(n),
                   'color':node_to_color[n],
                   'node_label':node_labels[n],
                   'node_size':node_size,
                   'node_shape':node_to_shape[n],
                   'title':node_titles[n],
                   'x':np.float64(pos[n][0]).item()*1000,
                   'y':np.float64(pos[n][1]).item()*1000} for n in nodes]

    # map nodes to indices for source/target in edges
    node_map = dict(zip(nodes, range(len(nodes))))

    # create the edges_dict with all relevant fields
    edges_dict = [{'source':node_map[edges[i][0]],
                   'target':node_map[edges[i][1]],
                   'color':edge_to_color[edges[i]]} for i in range(len(edges))]

    # set node_size_multiplier to increase node size as graph gets smaller
    if 'node_size_multiplier' not in kwargs.keys():
        if len(nodes) > 500:
            kwargs['node_size_multiplier'] = 1
        elif len(nodes) > 200:
            kwargs['node_size_multiplier'] = 3
        else:
            kwargs['node_size_multiplier'] = 5

    kwargs['physics_enabled'] = physics_enabled

    # if node hovering color not set, set default to black
    if 'node_color_hover_background' not in kwargs.keys():
        kwargs['node_color_hover_background'] = 'black'

    # node size determined by size in nodes_dict, not by id
    if 'node_size_field' not in kwargs.keys():
        kwargs['node_size_field'] = 'node_size'

    # node label determined by value in nodes_dict
    if 'node_label_field' not in kwargs.keys():
        kwargs['node_label_field'] = 'node_label'

    # export the network to JSON for Cytoscape
    if export_network:
        node_colors = map_node_to_color(G,'node_heat',True)
        nx.set_node_attributes(G, name = 'nodeColor', values = node_colors)
        edge_colors = map_edge_to_color(G,'edge_weight',True)
        nx.set_edge_attributes(G, name = 'edgeColor', values = edge_colors)
        visJS_module.export_to_cytoscape(G = G,export_file = export_file)

    return visJS_module.visjs_network(nodes_dict,edges_dict,**kwargs)


def normalized_adj_matrix(G,conserve_heat=True,weighted=False):
    '''
    Returns the normalized adjacency matrix of G.

    Inputs:
        - G: NetworkX graph from which to calculate normalized adjacency matrix
        - conserve_heat:
            - True: Heat will be conserved (sum of heat vector = 1).  Graph asymmetric.
            - False:  Heat will not be conserved.  Graph symmetric.
        - weighted:
            - True: Use each edge's 'weight' attribute.
            - False: Treat every edge as having weight 1.

    Returns:
        - numpy array of the normalized adjacency matrix.
    '''

    wvec=[]
    for e in G.edges(data=True):
        v1 = e[0]
        v2 = e[1]
        deg1 = G.degree(v1)
        deg2 = G.degree(v2)

        if weighted:
            weight = e[2]['weight']
        else:
            weight=1

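        if conserve_heat:
            # entry (i,j) is weight/degree(j); for an unweighted, undirected G
            # every column then sums to 1, which is what conserves heat
            wvec.append((v1,v2,weight/float(deg2)))
            wvec.append((v2,v1,weight/float(deg1)))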
        else:
            wvec.append((v1,v2,weight/np.sqrt(deg1*deg2)))

    if conserve_heat:
        # if conserving heat, make G_weighted a di-graph (not symmetric)
        G_weighted= nx.DiGraph()
    else:
        # if not conserving heat, make G_weighted a simple graph (symmetric)
        G_weighted = nx.Graph()

    G_weighted.add_weighted_edges_from(wvec)

    # to_numpy_array returns an ndarray directly (to_numpy_matrix was removed in NetworkX 3.0)
    Wprime = nx.to_numpy_array(G_weighted,nodelist=list(G.nodes()))

    return Wprime


def network_propagation(G,Wprime,seed_nodes,alpha=.5, num_its=20):
    '''
    This function implements network propagation, as detailed in:
    Vanunu, Oron, et al. 'Associating genes and protein complexes with disease
    via network propagation.' PLoS Computational Biology 6.1 (2010): e1000641.

    Inputs:
        - G: NetworkX graph on which to run simulation
        - Wprime:  Normalized adjacency matrix (from normalized_adj_matrix)
        - seed_nodes:  Genes on which to initialize the simulation.
        - alpha:  Heat dissipation coefficient.  Default = 0.5
        - num_its:  Number of iterations (Default = 20.  Convergence usually happens within 10)

    Returns:
        - Fnew: heat vector after propagation
    '''

    nodes = list(G.nodes())
    numnodes = len(nodes)

    # Y holds the initial heat: the seeds share one unit of heat equally
    Y = pd.Series(np.zeros(numnodes),index=nodes)
    for g in seed_nodes:
        # normalize total amount of heat added, allow for replacement
        Y[g] = Y[g]+1/float(len(seed_nodes))
    Fold = Y.copy(deep=True)
    Fnew = Fold # returned as-is if num_its is 0

    for t in range(num_its):
        Fnew = alpha*np.dot(Wprime,Fold) + np.multiply(1-alpha,Y)
        Fold=Fnew

    return Fnew


def map_node_to_color(G,field_to_map,color_vals_transform):
    '''
    Maps node to color value between 0 and 1 based on the given field.

    Inputs:
        - G: networkX graph
        - field_to_map: node attribute to map color to
        - color_vals_transform: to calculate color vals with log (boolean)

    Returns:
        - Dictionary that maps node to color value.
    '''

    node_to_field = [(n[0], max(n[1][field_to_map], 10**-18))
                        for n in G.nodes(data=True)]
    nodes,data = zip(*node_to_field)
    if color_vals_transform:
        nonzero_list = [d for d in data if d>(10**-18)]
        if not nonzero_list:
            data = [1 for d in data]
        else:
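            min_val = min(nonzero_list)
            data = [np.log(max(d,min_val)) for d in data] #set 0 vals to min val
            min_log = np.min(data) #compute once instead of once per element
            data = [(d-min_log) for d in data] #shift so we don't have neg vals
    min_val = np.min(data)
    max_val = np.max(data) - min_val #range of the data
    if max_val == 0:
        #all values identical: map every node to 0 rather than dividing by zero
        return dict(zip(G.nodes(),[0.0 for d in data]))
    color_list = [float(d-min_val)/max_val for d in data]
    return dict(zip(G.nodes(),color_list))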


def map_edge_to_color(G,field_to_map,color_vals_transform):
    '''
    Maps edge to color value between 0 and 1 based on the given field.

    Inputs:
        - G: networkX graph
        - field_to_map: edge attribute to map color to
        - color_vals_transform: to calculate color vals with log (boolean)

    Returns:
        - Dictionary that maps edge to color value.
    '''

    edges_data = [(e[0],e[1],e[2][field_to_map])
                  for e in G.edges(data=True)]
    edges1,edges2,data = zip(*edges_data)
    if color_vals_transform:
        nonzero_list = [d for d in data if d>(10**-18)]
        if not nonzero_list:
            data = [1 for d in data]
        else:
            min_dn0 = min(nonzero_list)
            data = [np.log(max(d,min_dn0)) for d in data]  #set 0 vals to min val
            min_log = np.min(data) #compute once instead of once per element
            data = [(d-min_log) for d in data] #shift so we don't have neg vals
    edges_data = zip(zip(edges1,edges2),data)
    edge_to_field = dict(edges_data)
    min_val = np.min(list(edge_to_field.values()))
    max_val = np.max(list(edge_to_field.values())) - min_val #range of the data
    if max_val == 0:
        #all values identical: map every edge to 0 rather than dividing by zero
        return dict(zip(G.edges(),[0.0 for e in G.edges()]))
    color_list = [float(edge_to_field[e]-min_val)/max_val for e in G.edges()]
    return dict(zip(G.edges(),color_list))
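
The two color-mapping functions share the same recipe: floor the values, optionally take logs, then min-max scale into [0, 1]. Below is a dependency-free sketch of that recipe, not the package's own code. The name `scale_to_unit_interval` and the choice to return 0.0 for constant input are assumptions made for illustration.

```python
import math

def scale_to_unit_interval(values, log_transform=True, floor=1e-18):
    """Map non-negative values to [0, 1], optionally on a log scale."""
    values = [max(v, floor) for v in values]
    if log_transform:
        positive = [v for v in values if v > floor]
        if not positive:
            # nothing above the floor: treat all values as equal
            values = [1.0] * len(values)
        else:
            smallest = min(positive)
            # floored values are raised to the smallest positive value
            values = [math.log(max(v, smallest)) for v in values]
    low = min(values)
    span = max(values) - low
    if span == 0:
        # assumption: constant input maps to 0.0 instead of dividing by zero
        return [0.0] * len(values)
    return [(v - low) / span for v in values]

# hypothetical heat values spanning several orders of magnitude
colors = scale_to_unit_interval([0.0, 1e-6, 1e-4, 1e-2])
```

On a log scale the middle value lands halfway between the extremes, which is why the transform helps when heat values differ by orders of magnitude.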


In [35]:
G_heat = nx.Graph(heat.DG_universe)
seed_nodes = [n for n in heat.DEG_list if n in heat.DG_universe]

In [127]:
draw_heat_prop(heat.DG_universe, seed_nodes, Wprime = Wprime, # you don't have to pass this argument. Will calculate automatically
                  num_nodes = 500,
#                  random_walk = False,
                  edge_width = 2,
                  edge_smooth_enabled = True,
                  edge_smooth_type = 'bezier',
                  node_size_multiplier = 5,
                  hover = False,
                  hover_connected_edges = False,
                  largest_connected_component = True,
                  physics_enabled = True,
                  node_font_size = 20,
                  graph_id = 1,
                  node_shadow_x = 6)

nodes: 500
edges: 3607


NetworkXNotImplemented: not implemented for directed type

In [69]:
def set_num_nodes(G, num_nodes):
    '''
    Restricts the graph to its hottest nodes. If num_nodes is None or is not
    smaller than the number of nodes in G, G is returned unchanged.

    Inputs:
        - G: a networkX graph
        - num_nodes: the number of the hottest nodes to graph

    Returns:
        - networkX graph that is the subgraph of G with the num_nodes hottest
          nodes
    '''

    if num_nodes is not None and num_nodes < len(G.nodes()):
        node_heat = [(node[0], node[1]['node_heat']) for node in G.nodes(data=True)]
        nodes_sorted = sorted(node_heat, key=lambda x: x[1], reverse=True) 
        top_hottest_nodes = [nodes_sorted[i][0] for i in range(num_nodes)]
        return G.subgraph(top_hottest_nodes)
    return G
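
`G.subgraph(...)` keeps only the edges whose two endpoints both survive the cut, so a top-N subgraph can have far fewer edges than the full network. The sketch below shows the same selection without NetworkX. `hottest_subgraph` and the toy heat values are hypothetical and exist only for this illustration.

```python
import heapq

def hottest_subgraph(node_heat, edges, num_nodes):
    """Return the num_nodes hottest nodes and the edges joining them."""
    top = set(heapq.nlargest(num_nodes, node_heat, key=node_heat.get))
    # an edge is kept only when both of its endpoints are in the top set
    kept_edges = [(u, v) for u, v in edges if u in top and v in top]
    return top, kept_edges

node_heat = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}  # hypothetical values
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]

top_two, kept_two = hottest_subgraph(node_heat, edges, 2)
top_three, kept_three = hottest_subgraph(node_heat, edges, 3)
```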

In [70]:
Fnew = network_propagation(G_heat, Wprime, seed_nodes)
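
To see what the update `Fnew = alpha*W'*Fold + (1-alpha)*Y` does, here is a minimal pure-Python sketch on a four-node path graph. It assumes an unweighted, undirected graph with heat-conserving normalization, where each node passes its heat on divided by its own degree. The function `propagate` and the toy graph are illustrative and not part of the package.

```python
def propagate(adjacency, seeds, alpha=0.5, num_its=20):
    """Iterate F <- alpha * W' F + (1 - alpha) * Y on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbours
    seeds: nodes that share one unit of initial heat
    """
    nodes = list(adjacency)
    # Y: initial heat, split evenly across the seeds
    Y = {n: 0.0 for n in nodes}
    for s in seeds:
        Y[s] += 1.0 / len(seeds)
    F = dict(Y)
    for _ in range(num_its):
        F_next = {}
        for n in nodes:
            # each neighbour m hands over its heat divided by its own degree
            incoming = sum(F[m] / len(adjacency[m]) for m in adjacency[n])
            F_next[n] = alpha * incoming + (1 - alpha) * Y[n]
        F = F_next
    return F

# hypothetical path graph a - b - c - d, seeded at "a"
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
toy_heat = propagate(path, seeds=["a"])
total_heat = sum(toy_heat.values())
```

In this toy case the total heat stays at one unit after every iteration, and the heat falls off with distance from the seed.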

In [74]:
heat_tuples = [(i,j) for i,j in zip(Fnew.index, list(Fnew))]

In [75]:
nodes_sorted = sorted(heat_tuples, key=lambda x: x[1], reverse=True)

In [81]:
top_hottest_nodes = [nodes_sorted[i][0] for i in range(500)]
top_hottest_nodes

[u'Nek6',
 u'Epha1',
 u'Nme1',
 u'Trim2',
 u'Mcm10',
 u'Pnrc1',
 u'Zyx',
 u'Bcl2l11',
 u'Irgm1',
 u'Ripk2',
 u'Ypel2',
 u'Tnk2',
 u'Ifitm3',
 u'Bcl6b',
 u'Cndp2',
 u'Acaa1a',
 u'Per3',
 u'Cela1',
 u'Leo1',
 u'Arntl',
 u'Acat2',
 u'Ablim3',
 u'Capn2',
 u'S100a10',
 u'Ugdh',
 u'Stat1',
 u'Clpx',
 u'Hdc',
 u'Per1',
 u'Gnl1',
 u'Adssl1',
 u'Nedd4l',
 u'Slc20a1',
 u'Serpine1',
 u'Tsc22d1',
 u'Plin2',
 u'Rps7',
 u'Lrp5',
 u'Cbx7',
 u'Ngef',
 u'Dhcr7',
 u'Cdkn1a',
 u'Txnip',
 u'Pycrl',
 u'Ifit3',
 u'Slc25a45',
 u'Fbxo6',
 u'Tfrc',
 u'Ddx60',
 u'Lad1',
 u'Tab2',
 u'Rsad2',
 u'Fabp5',
 u'Tubb2a',
 u'Bcl7c',
 u'Nmral1',
 u'Ifih1',
 u'Dnmt3b',
 u'Abcd2',
 u'Ppargc1b',
 u'Net1',
 u'Ndrg1',
 u'Hmgcs1',
 u'Phospho1',
 u'Asap2',
 u'Pcsk4',
 u'Rnf19b',
 u'Nfil3',
 u'Isg15',
 u'Ppp1r3b',
 u'Tef',
 u'Rrm2',
 u'Rad23b',
 u'Bhlhe41',
 u'Tars',
 u'Ilvbl',
 u'Dbp',
 u'Sdf2l1',
 u'GORASP1',
 u'HACE1',
 u'TMED7',
 u'Sorcs2',
 u'Trerf1',
 u'EPN2',
 u'KAT7',
 u'POLR2G',
 u'Emr1',
 u'ADD1',
 u'Pctp',
 u'Tssk3',


In [78]:
top_genes = Fnew.sort_values(ascending=False)[0:500].index

In [79]:
top_genes

Index([u'Nek6', u'Epha1', u'Nme1', u'Trim2', u'Mcm10', u'Pnrc1', u'Zyx',
       u'Bcl2l11', u'Irgm1', u'Ripk2',
       ...
       u'Cbx4', u'Rbm4b', u'TRIP4', u'Sh3bp1', u'Nrep', u'Cbx6', u'Cbx8',
       u'Gm9127', u'C1orf198', u'CDYL2'],
      dtype='object', length=500)

In [82]:
top_hottest_nodes == top_genes

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True, False, False, False, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,
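
One plausible explanation for the few `False` entries, which the output above does not confirm, is tied heat values. Python's `sorted` is stable, while pandas `sort_values` uses quicksort by default and does not promise a stable order, so tied genes can swap places. The sketch below uses made-up gene names and heat values to show two orderings that disagree element by element yet contain the same members.

```python
# hypothetical heat values with a tie between GeneC and GeneB
heat_pairs = [("GeneA", 0.4), ("GeneC", 0.2), ("GeneB", 0.2), ("GeneD", 0.1)]

# stable sort: tied genes keep their input order
order_stable = [g for g, h in sorted(heat_pairs, key=lambda p: p[1], reverse=True)]
# same heat ranking, but ties broken alphabetically
order_by_name = [g for g, h in sorted(heat_pairs, key=lambda p: (-p[1], p[0]))]

elementwise = [a == b for a, b in zip(order_stable, order_by_name)]
same_members = set(order_stable) == set(order_by_name)
```

Comparing the two selections as sets is therefore a more robust equality check than comparing them position by position.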

In [87]:
G_Brin = nx.Graph(heat.DG_universe).subgraph(top_hottest_nodes)
G_Mikayla = nx.Graph(heat.DG_universe).subgraph(top_genes)

In [88]:
print(len(G_Brin.nodes()))
print(len(G_Brin.edges()))
print(len(G_Mikayla.nodes()))
print(len(G_Mikayla.edges()))

500
64
500
64


In [None]:
Fnew = visualizations.network_propagation(nx.Graph(DG_universe), Wprime, seed_nodes)
top_genes = Fnew.sort_values(ascending=False)[0:num_top_genes].index
G_top_genes = nx.Graph(DG_universe).subgraph(top_genes) # casting to Graph to match heat prop

In [None]:
G_heat = nx.Graph(heat.DG_universe)
seed_nodes = [n for n in heat.DEG_list if n in heat.DG_universe]
Fnew = network_propagation(G_heat, Wprime, seed_nodes)

In [118]:
Fnew_directed = network_propagation(heat.DG_universe, Wprime, seed_nodes)

In [119]:
Fnew_undirected = network_propagation(G_heat, Wprime, seed_nodes)

In [122]:
Fnew_directed

Plekhg3     9.716678e-07
Plekhg1     4.007960e-07
Hspbap1     4.374433e-08
Plekhg5     5.010924e-06
Nsa2        9.716678e-07
ATRX        2.191131e-07
RNF10       2.440887e-08
ASS1        1.436320e-07
COPRS       2.286252e-07
TCOF1       4.562373e-08
NSRP1       1.625083e-08
SP2         2.358970e-10
GOLIM4      4.417233e-07
Mef2d       4.839180e-07
Dag1        1.945464e-07
OPA1        1.745084e-07
Sacs        1.270937e-06
ITGA8       9.327994e-08
ITGA9       9.327994e-08
ATP2A2      1.494400e-07
Mllt4       2.424921e-06
ITGA1       1.575902e-06
ITGA2       1.896011e-07
ITGA3       1.289144e-07
Mllt1       1.192351e-05
ITGA6       1.289144e-07
Mllt3       1.326987e-06
Krt6b       2.116077e-07
Krt6a       2.638473e-06
Snrpd3      3.823175e-06
                ...     
Psmd10      1.849905e-06
Psmd13      5.934512e-06
Psmd12      1.282669e-05
CACYBP      5.733012e-09
Scn11a      6.165697e-06
CAND1       4.053970e-07
PRPS1       3.201835e-08
Calml3      2.142708e-05
BRPF3       4.882166e-08


In [123]:
Fnew_undirected

RNF14       1.499572e-06
Plekhg3     7.613141e-08
Cers4       5.802772e-07
Plekhg1     9.889776e-07
Hspbap1     1.499572e-06
Plekhg5     1.259016e-06
Nsa2        4.692992e-08
Ube2d3      1.207916e-07
UCHL5       4.633240e-07
Cers5       1.243005e-07
MZT2A       1.271239e-08
MZT2B       5.634794e-10
ATRX        1.554427e-06
RNF10       5.503405e-07
Ube2d1      1.601040e-06
ASS1        2.148881e-08
Shank2      6.837703e-07
Shank1      3.829876e-07
Strn3       3.829876e-07
Camkk2      1.012914e-07
CAMK1       4.277927e-06
Syt1        1.774062e-07
COPRS       1.214238e-06
Syt3        2.678719e-08
Syt5        7.041024e-06
ZC3H13      2.678719e-08
ZC3H15      1.563822e-06
ZC3H14      4.865039e-07
ZC3H18      8.901745e-07
Casp8ap2    2.422472e-06
                ...     
smad3a      4.374071e-06
smad3b      3.189981e-06
Cd19        5.193392e-06
LDHA        2.191962e-08
LDHB        4.633225e-07
Lhb         1.568109e-06
ODF2L       7.026811e-06
Cspp1       1.435883e-04
Tekt4       3.751594e-07


In [125]:
normalized_adj_matrix(heat.DG_universe)

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
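
The printed corners are all zero because the matrix is sparse, so the display says little about whether the normalization worked. A more telling check, at least for an unweighted, undirected input, is that every column sums to 1 when `conserve_heat=True`. The sketch below reproduces that weighting rule on a three-node path without NumPy. `heat_conserving_weights` is a hypothetical helper written for this example.

```python
def heat_conserving_weights(adjacency):
    """W[(i, j)] = 1 / degree(j) for every undirected edge i - j."""
    return {(i, j): 1.0 / len(adjacency[j])
            for i in adjacency for j in adjacency[i]}

# hypothetical path graph a - b - c
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
W = heat_conserving_weights(adjacency)

# sum the entries of each column j over all rows i
column_sums = {j: sum(w for (i, k), w in W.items() if k == j)
               for j in adjacency}
```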

In [130]:
nx.connected_component_subgraphs(heat.DG_universe) # root cause of the draw_heat_prop failure: not implemented for directed graphs

NetworkXNotImplemented: not implemented for directed type
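
NetworkX defines connected components only for undirected graphs, which is why passing the directed `heat.DG_universe` fails while the `nx.Graph(...)` copy used for `G_heat` does not. The dependency-free sketch below shows the underlying idea: ignore edge direction, then group nodes by reachability. The function name and the toy edge list are assumptions for illustration only.

```python
from collections import defaultdict

def undirected_components(directed_edges):
    """Group nodes into connected components, ignoring edge direction."""
    neighbours = defaultdict(set)
    for u, v in directed_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in neighbours:
        if start in seen:
            continue
        # depth-first walk from an unvisited node collects one component
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components

# hypothetical directed edges, including one reciprocal pair
components = undirected_components([("a", "b"), ("b", "a"), ("c", "b"), ("x", "y")])
largest = max(components, key=len)
```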

In [131]:
len(heat.DG_universe.edges())

75422

In [132]:
len(nx.Graph(heat.DG_universe.edges())) # note: len() of a Graph counts nodes, not edges

12669
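
Because `len()` of a NetworkX graph counts nodes, the 12669 above is a node count and is not comparable to the 75422 directed edges. The sketch below separates the three quantities on a made-up edge list, using plain Python sets, to show how reciprocal directed edges collapse into a single undirected edge.

```python
# hypothetical directed edge list with two reciprocal pairs
directed_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c")]

# every endpoint that appears in any edge
nodes = {n for edge in directed_edges for n in edge}
# a frozenset drops direction, so (a, b) and (b, a) become one edge
undirected_edges = {frozenset(edge) for edge in directed_edges}

num_directed = len(directed_edges)
num_nodes = len(nodes)
num_undirected = len(undirected_edges)
```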