# Practice Session 03: Management of networks data

Author: <font color="black">Tània Pazos</font>

E-mail: <font color="black">tania.pazos01@estudiant.upf.edu</font>

Date: <font color="black">13/10/2023</font>

# 1. The flavors bipartite graph

## 1.1. Read the bipartite graph in a dataframe


In [1]:
import io
import csv
import pandas as pd
import networkx as nx

from networkx.algorithms import bipartite

import numpy as np
import matplotlib
import scipy

import itertools

from IPython.display import Image

In [2]:
INPUT_INGR_FILENAME = "data/flavors-network/ingredients.tsv"
INPUT_COMP_FILENAME = "data/flavors-network/compounds.tsv"
INPUT_INGR_COMP_FILENAME = "data/flavors-network/ingredient-compound.tsv"

In [3]:
# Leave this code as-is

ingredients = pd.read_csv(INPUT_INGR_FILENAME, sep="\t")
display(ingredients.head(3))

compounds = pd.read_csv(INPUT_COMP_FILENAME, sep="\t")
display(compounds.head(3))

ingr_comp = pd.read_csv(INPUT_INGR_COMP_FILENAME, sep="\t")
display(ingr_comp.head(3))


Unnamed: 0,ingredient_id,ingredient_name,ingredient_category
0,0,magnolia_tripetala,flower
1,1,calyptranthes_parriculata,plant
2,2,chamaecyparis_pisifera_oil,plant derivative


Unnamed: 0,compound_id,compound_name,compound_code
0,0,jasmone,488-10-8
1,1,5-methylhexanoic_acid,628-46-6
2,2,l-glutamine,56-85-9


Unnamed: 0,ingredient_id,compound_id
0,1392,906
1,1259,861
2,1079,673


## 1.2. Create the flavors bipartite network

In [4]:
# First, we join ingredients and ingr_comp
result = ingredients.set_index('ingredient_id').join(ingr_comp.set_index('ingredient_id'), how='inner')

# We join the result with compounds
flavors = result.set_index('compound_id').join(compounds.set_index('compound_id'), how='inner')

# Show the first 20 rows of the flavors dataframe
display(flavors.head(20))

compound_id,ingredient_name,ingredient_category,compound_name,compound_code
0,red_bean,vegetable,jasmone,488-10-8
0,jasmine_tea,plant derivative,jasmone,488-10-8
0,jasmine,flower,jasmone,488-10-8
0,soybean,vegetable,jasmone,488-10-8
0,dried_black_tea,plant derivative,jasmone,488-10-8
0,ceylon_tea,plant derivative,jasmone,488-10-8
0,pittosporum_glabratum,plant,jasmone,488-10-8
0,mung_bean,vegetable,jasmone,488-10-8
0,fermented_tea,plant derivative,jasmone,488-10-8
0,fermented_russian_black_tea,plant derivative,jasmone,488-10-8
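
The two joins above can also be written as chained merges. The following is a minimal sketch on small invented tables: the `_toy` names, the basil row, and the soybean/maltol link are assumptions for illustration, not course data. It shows that an inner join drops any ingredient without an entry in the link table, so the joined dataframe can contain fewer distinct ingredients than the ingredients table.

```python
import pandas as pd

# Toy stand-ins for ingredients.tsv, compounds.tsv and ingredient-compound.tsv
ingredients_toy = pd.DataFrame({
    "ingredient_id": [0, 1, 2],
    "ingredient_name": ["jasmine", "soybean", "basil"],
    "ingredient_category": ["flower", "vegetable", "herb"],
})
compounds_toy = pd.DataFrame({
    "compound_id": [0, 1],
    "compound_name": ["jasmone", "maltol"],
})
# Link table: basil (id 2) has no compounds on purpose
ingr_comp_toy = pd.DataFrame({
    "ingredient_id": [0, 1, 1],
    "compound_id": [0, 0, 1],
})

# Inner joins keep only rows whose keys appear in both tables
flavors_toy = (
    ingredients_toy
    .merge(ingr_comp_toy, on="ingredient_id", how="inner")
    .merge(compounds_toy, on="compound_id", how="inner")
)

print(flavors_toy[["ingredient_name", "ingredient_category", "compound_name"]])
```

With `merge`, the key columns stay as regular columns, so no `set_index` call is needed before joining.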


In [5]:
# Drop the compound_code column
flavors = flavors.drop(columns=['compound_code'])

# Sort by ingredient_name, then by compound_name
flavors = flavors.sort_values(['ingredient_name', 'compound_name'])

# Reset index
flavors = flavors.reset_index(drop=True)

# Show first 20 rows of dataframe flavors
display(flavors.head(20))

Unnamed: 0,ingredient_name,ingredient_category,compound_name
0,abies_alba,plant,bornyl_acetate
1,abies_alba_pine_needle,plant,maltol
2,abies_balsamea_oil,plant derivative,myrcene
3,abies_canadensis,plant,bornyl_acetate
4,abies_concolor,plant,bornyl_acetate
5,abies_sibirica,plant,bornyl_acetate
6,abies_sibirica,plant,camphene
7,abies_sibirica,plant,isoborneol
8,acacia,plant,(e)-2-hexenyl_hexanoate
9,acacia,plant,benzyl_acetate


In [6]:
# Save flavors dataframe into a tab-separated file
flavors.to_csv("flavors.tsv", sep='\t')

## 1.3. Open this bipartite network in Cytoscape


The following figure shows clusters representing ingredient categories in the flavors network. All compound nodes remain in white.

In [7]:
Image(url="flavors.png", width=1200)

The figure below shows all the compounds that onion and garlic have in common. Onion appears at the top of the image, the shared compounds in the middle, and garlic at the bottom.

In [8]:
Image(url="compounds-in-common.png", width=1200)

In this subnetwork, the onion and garlic nodes both have degree 16, and every compound shown is linked to both of them, so the two ingredients have 16 compounds in common. Judging only by their names, 10 of these compounds (such as allyl_sulfide and methyl_sulfide) contain sulfur.
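
The same count can be cross-checked outside Cytoscape with a set intersection. This is a minimal sketch on an invented toy dataframe: the rows, the helper `compounds_of`, and the name-based sulfur filter are assumptions for illustration, not the method used for the figure.

```python
import pandas as pd

# Toy rows in the same shape as the flavors dataframe (not the real data)
flavors_toy = pd.DataFrame({
    "ingredient_name": ["onion", "onion", "onion", "garlic", "garlic"],
    "compound_name": ["allyl_sulfide", "methyl_sulfide", "hexanal",
                      "allyl_sulfide", "methyl_sulfide"],
})

def compounds_of(df, ingredient):
    """Return the set of compound names linked to one ingredient."""
    return set(df.loc[df["ingredient_name"] == ingredient, "compound_name"])

# Compounds linked to both ingredients
common = compounds_of(flavors_toy, "onion") & compounds_of(flavors_toy, "garlic")

# Rough name-based heuristic: flag names containing "sulf" or "thio"
sulfur_like = {c for c in common if "sulf" in c or "thio" in c}

print(sorted(common))
print(len(sulfur_like))
```

A substring filter like this can only approximate a chemical property, so any count it produces should be checked by hand.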

# 2. The ingredient-ingredient graph

## 2.1. Create an ingredient-ingredient.csv file


We first convert the ingredient_name column of the ingredients dataframe into an array, ingredients_array.

In [9]:
ingredients_array = np.asarray(ingredients['ingredient_name'])

print("There are %d ingredients."% len(ingredients_array))

There are 1530 ingredients.


Next, we create a dictionary ingredient_to_compounds in which keys are ingredient names and values are sets of compound names.

In [10]:
# Creating a dictionary
ingredient_to_compounds = {}

# Loop through each row in the flavors dataframe
for index, row in flavors.iterrows():
    # Extract ingredient_name and compound_name from each row
    ingredient_name = row['ingredient_name']
    compound_name = row['compound_name']
    
    # Check if ingredient_name is in ingredients_array
    if ingredient_name in ingredients_array:
        # If ingredient_name is not a key yet, create an empty set of compounds for it
        if ingredient_name not in ingredient_to_compounds:
            ingredient_to_compounds[ingredient_name] = set()
        # Add the compound to the set associated with the ingredient_name
        ingredient_to_compounds[ingredient_name].add(compound_name)
        
# Calculate the number of keys in the dictionary
num_keys = len(ingredient_to_compounds)

print("The dictionary has %d keys" % num_keys)

The dictionary has 1525 keys
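
The row-by-row loop above can also be expressed with a pandas groupby. The sketch below is one possible alternative, shown on three rows taken from the flavors table printed earlier; the `_toy` variable names are assumptions for illustration.

```python
import pandas as pd

# Three rows copied from the flavors dataframe shown earlier
flavors_toy = pd.DataFrame({
    "ingredient_name": ["abies_alba", "acacia", "acacia"],
    "compound_name": ["bornyl_acetate", "(e)-2-hexenyl_hexanoate", "benzyl_acetate"],
})

# Group compound names by ingredient and collect each group into a set
ingredient_to_compounds_toy = (
    flavors_toy.groupby("ingredient_name")["compound_name"]
    .apply(set)
    .to_dict()
)

print("The dictionary has %d keys" % len(ingredient_to_compounds_toy))
```

Only ingredients that appear in at least one row of the joined dataframe become keys, which is consistent with the dictionary having fewer keys than there are ingredients.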


We now create a NetworkX graph with nodes representing ingredients and edges of weight x connecting two ingredients having x flavor compounds in common.

In [11]:
ingredient_ingredient = nx.Graph()

# Minimum number of common compounds, set to 75 so that the graph has around 150 nodes
MIN_COMMON_COMPOUNDS = 75

for u, v in itertools.combinations(ingredient_to_compounds.keys(), 2):
    common_compounds = ingredient_to_compounds[u].intersection(ingredient_to_compounds[v])
    if len(common_compounds) >= MIN_COMMON_COMPOUNDS:
        ingredient_ingredient.add_edge(u, v, weight=len(common_compounds))

In [12]:
print("The ingredient-ingredient graph has %d nodes and %d edges" %
      (ingredient_ingredient.number_of_nodes(), ingredient_ingredient.number_of_edges()))

The ingredient-ingredient graph has 152 nodes and 1647 edges
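
The graph built above is a thresholded one-mode projection of the bipartite network. As a hedged cross-check, the sketch below builds a similar projection with NetworkX's `bipartite.weighted_projected_graph` on an invented toy dictionary; the node names and the threshold `MIN_COMMON_TOY` are assumptions for illustration.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Toy ingredient -> compounds mapping (invented names)
ingredient_to_compounds_toy = {
    "ingr_a": {"comp_1", "comp_2", "comp_3"},
    "ingr_b": {"comp_2", "comp_3"},
    "ingr_c": {"comp_3"},
}

# Build the bipartite graph: ingredients on one side, compounds on the other
B = nx.Graph()
for ingredient, compounds in ingredient_to_compounds_toy.items():
    B.add_node(ingredient, bipartite=0)
    for compound in compounds:
        B.add_node(compound, bipartite=1)
        B.add_edge(ingredient, compound)

# Project onto ingredients; each edge weight counts the shared compounds
projected = bipartite.weighted_projected_graph(B, list(ingredient_to_compounds_toy))

# Keep only pairs sharing at least MIN_COMMON_TOY compounds
MIN_COMMON_TOY = 2
filtered = nx.Graph()
filtered.add_edges_from(
    (u, v, d) for u, v, d in projected.edges(data=True)
    if d["weight"] >= MIN_COMMON_TOY
)

print(list(filtered.edges(data=True)))
```

This approach assumes that no ingredient shares its name with a compound, since both kinds of node live in the same graph. If names could clash, the dictionary-based loop avoids the problem.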


Save the resulting graph into a GML file.

In [13]:
# Output file, written relative to the notebook's working directory
OUTPUT_INGR_INGR_FILENAME = 'ingredient-ingredient.gml'

In [14]:
nx.write_gml(ingredient_ingredient, OUTPUT_INGR_INGR_FILENAME)
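
Before opening the file in Cytoscape, it can be useful to confirm that edge weights survive the export. The sketch below does a write/read round trip on a one-edge toy graph in a temporary directory; the weight 120 and the file name toy.gml are invented for illustration.

```python
import os
import tempfile

import networkx as nx

# One-edge toy graph; the weight is an invented value
toy = nx.Graph()
toy.add_edge("white_wine", "red_wine", weight=120)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.gml")
    nx.write_gml(toy, path)       # export to GML
    reloaded = nx.read_gml(path)  # read it back, using node labels as node names

print(reloaded.edges(data=True))
```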

## 2.2. Work with this file in Cytoscape

The image below shows the visualization of the ingredient-ingredient file. The edge width is directly proportional to the number of compounds that two ingredients (nodes) have in common, and the edge color reflects the same quantity. The node color indicates the ingredient category. The style used in the graph is summarized in the legend displayed right below the network.

In [15]:
display(Image(url="ingr-ingr.png", width=1200))

display(Image(url="ingr-ingr-legend.gif", width=400))

The network structure shows that ingredients of the same category, such as white wine and red wine, tend to have more compounds in common. However, precisely because they belong to the same ingredient category (alcoholic beverages), they do not constitute an interesting combination.
In contrast, parmesan cheese and white wine could combine well: they are joined by an orange edge, meaning that they have around one hundred compounds in common. Another interesting pairing would be beer and roasted beef, which share approximately the same number of compounds.
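
The visual search for strong cross-category pairs can also be approximated in code. This is a minimal sketch on an invented toy graph: the edge weights, the category labels "dairy" and "meat", and the `category` node attribute are all assumptions. The graph built in this notebook stores only edge weights, so categories would have to be attached separately.

```python
import networkx as nx

# Toy graph with invented weights, loosely mirroring the pairs discussed above
g = nx.Graph()
g.add_edge("white_wine", "red_wine", weight=150)
g.add_edge("parmesan_cheese", "white_wine", weight=100)
g.add_edge("beer", "roasted_beef", weight=98)

# Hypothetical category labels attached as a node attribute
categories_toy = {
    "white_wine": "alcoholic beverages",
    "red_wine": "alcoholic beverages",
    "beer": "alcoholic beverages",
    "parmesan_cheese": "dairy",
    "roasted_beef": "meat",
}
nx.set_node_attributes(g, categories_toy, "category")

# Keep edges linking different categories, strongest first
cross = sorted(
    (
        (u, v, d["weight"])
        for u, v, d in g.edges(data=True)
        if g.nodes[u]["category"] != g.nodes[v]["category"]
    ),
    key=lambda edge: edge[2],
    reverse=True,
)

for u, v, weight in cross:
    print(u, v, weight)
```

Ranking by raw weight favors ingredients with many compounds overall, so a list like this is a starting point for inspection rather than a measure of how well two ingredients pair.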

<font size="+2" color="#003300">I hereby declare that, except for the code provided by the course instructors, all of my code, report, and figures were produced by myself.</font>
