<a href="https://colab.research.google.com/github/yavuzuzun/projects/blob/main/Lab1_ECE442.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# This notebook is a homework prepared for the ECE 442 Network Science Analytics class taught by Gonzalo Mateos during Spring 2023
## Manipulating network graphs, introduction to NetworkX and PyTorch Geometric

In this first laboratory we will work with a real dataset, generate a network graph and analyze it using the Python package **[NetworkX](https://networkx.org/)**. We will also introduce **[pandas](https://pandas.pydata.org/)**, an excellent library to load and process datasets efficiently. A third goal of this assignment is to start familiarizing ourselves with **[PyTorch Geometric](https://pytorch-geometric.readthedocs.io/en/latest/)**, a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to network data.

To this end, we will study the email graph of the Enron corporation. Emails exchanged among several Enron employees in the period between November 1998 and June 2002 were made publicly available during the federal investigation; for additional details about the Enron scandal see https://en.wikipedia.org/wiki/Enron_scandal. The complete dataset can be accessed from http://www.cs.cmu.edu/~enron/. Here we will use a smaller and curated version of the email corpus (for instance, with the email body removed), which can be obtained from http://cis.jhu.edu/~parky/Enron/enron.html.

For those of you who have never worked with the aforementioned libraries, we hope this laboratory will provide a useful first exposure and bring you up to speed with what you will need for the rest of the course. We ask you to upload to Gradescope the answers to all the questions that follow in a report submitted as a single pdf file. You are welcome to explore and play with the data beyond what we ask; let us know what you find!

### Network graph generation

In [None]:
# load the libraries we will use
import pandas as pd
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# get the dataset (see http://cis.jhu.edu/~parky/Enron/enron.html for additional details)
!wget http://cis.jhu.edu/~parky/Enron/employees
!wget http://cis.jhu.edu/~parky/Enron/execs.email.linesnum

In [None]:
# load the data
df_mails = pd.read_csv('execs.email.linesnum', names=['time','from','to'], sep=' ')
df_employees = pd.read_csv('employees', sep='\t', names=['mail', 'name and more'])


In the variable `df_mails` we store a pandas [`DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) with the id of the sender (`from` column) and recipient (`to`) of an email sent at a given timestamp (`time`). In addition, the email user account and other information from the employees are stored in the dataframe `df_employees`. You can think of a dataframe as an indexed table, but pandas offers plenty of additional functionalities, some of which we will leverage to process the data and generate the network graph.

In [None]:
# compute the dates from the timestamp (in seconds from 1/1/1970)
df_mails['date'] = pd.to_datetime(df_mails.time, unit='s')

# strangely enough there are dates from 1979. Let's remove those.
df_mails = df_mails[df_mails.date.dt.year>1980]

df_mails.head()

### Graph construction for the entire time horizon

First we construct a network graph spanning all emails.
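
As a small illustration of the steps carried out in the next cells, the sketch below builds a weighted directed graph from a made-up email log. The names `toy_mails`, `toy_counts` and `toy_G` are hypothetical and are not part of the Enron data; the sketch also uses `groupby(...).size()` to count emails, whereas the notebook cells use `count()` followed by a column rename.

```python
import pandas as pd
import networkx as nx

# Hypothetical toy log: one row per email (sender id, recipient id)
toy_mails = pd.DataFrame({
    'from': [0, 0, 1, 2, 2, 2],
    'to':   [1, 1, 0, 2, 0, 0],
})

# One row per ordered (from, to) pair; 'weight' is the number of emails exchanged
toy_counts = toy_mails.groupby(['from', 'to']).size().reset_index(name='weight')

# Each row becomes an arc whose 'weight' attribute is the email count
toy_G = nx.from_pandas_edgelist(toy_counts, source='from', target='to',
                                edge_attr='weight', create_using=nx.DiGraph)

# Drop the self-loop 2 -> 2 (an employee emailing themselves)
toy_G.remove_edges_from(list(nx.selfloop_edges(toy_G)))

print(sorted(toy_G.edges(data='weight')))
```

The two emails from 0 to 1 collapse into a single arc of weight 2, while the email from 1 to 0 is kept as a separate arc because the graph is directed.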


In [None]:
# count number of emails between a pair of users
mails_exchanged = df_mails.groupby(['from', 'to']).count().reset_index()
mails_exchanged.head()


In [None]:
# the columns "time" and "date" contain the same information (the email count), so we arbitrarily rename one of them to "weight" and use it to define the edge weights
mails_exchanged.rename(columns={'time':'weight'}, inplace=True)
mails_exchanged.head()

In [None]:
# and here is something nice: pandas can be interfaced with networkx. 
G = nx.from_pandas_edgelist(mails_exchanged, source='from', target='to', edge_attr='weight', create_using=nx.DiGraph)

# remove self loops
G.remove_edges_from(nx.selfloop_edges(G))

# generating a graph visualization is easy...
nx.draw_networkx(G)
plt.show()

In [None]:
# ... but we cannot see much: this is the typical ball-of-yarn phenomenon we encounter with large graphs.

# so let's be a little bit more creative
positions = nx.circular_layout(G)
edges = G.edges()
weights = np.array([G[u][v]['weight'] for u,v in edges])

between_dict = nx.betweenness_centrality(G)
between = np.array(list(between_dict.values()))

plt.figure(figsize=(15,15))
nx.draw_networkx_nodes(G, pos=positions, node_color=10*np.log(1+between/(np.min(between)+1e-9)), cmap='Blues')
nx.draw_networkx_edges(G, alpha=0.1, width=np.log10(weights+1), pos=positions)
nx.draw_networkx_labels(G, pos=positions, font_color='black')
plt.title('Network graph of emails exchanged during the whole time period.\n Edge width is proportional to the number of emails exchanged (log scale).\n \
  Vertex color intensity is proportional to its betweenness centrality (log scale).', fontsize=18)
plt.show()

### Interfacing NetworkX with NumPy

In [None]:
# in addition to interfacing with pandas, NetworkX can work with NumPy and matrices

# for instance, obtaining the adjacency matrix is as simple as this
G_np = nx.to_numpy_array(G,nodelist=range(G.number_of_nodes()))
# we plot it using seaborn
sns.heatmap(np.log10(G_np+1), cmap='Greys')
plt.show()
# or we can exclusively focus on the connectivity pattern...
sns.heatmap(G_np>0, cmap='Greys')
plt.show()


## Network analysis

Now you should use the NetworkX or NumPy APIs to compute various summary statistics of the network graph `G(V,E)`:


1.   Number of directed edges (arcs) in the network, i.e., the number of unique ordered pairs $(u,v)\in E$,
where $u,v\in V$.
2.   Number of undirected edges in the network, i.e., the number of unique unordered pairs $(u,v)\in E$,
where $u,v\in V$. (This means that if at least one of $(u,v)\in E$ or $(v,u)\in E$, you count the pair as a single undirected edge.)
3.   Number of mutual arcs in the network, i.e., the number of pairs $(u,v)$, where $\{(u,v),(v,u)\}\subseteq E$
and $u,v\in V$. (This means that if both $(u,v)\in E$ and $(v,u)\in E$, you count the pair as a mutual arc.)
4.   Number of nodes with $d_v^{\text{in}}=0$, and list the corresponding employee names.
5.   Number of nodes with $d_v^{\text{out}}=0$, and list the corresponding employee names.
6.   Number of employees that have been contacted by 30 or more employees. Generate a new graph visualization and: (i) color these nodes in red; (ii) label these nodes with the corresponding employee names.  
7.   Number of employees that have contacted 30 or more employees. Generate a new graph visualization and: (i) color these nodes in red; (ii) label these nodes with the corresponding employee names. 
8.   Histogram of vertex degrees (separate $d_v^{\text{in}}$ and $d_v^{\text{out}}$). You can for instance use the histplot tool in seaborn.



1. `G` was created as an `nx.DiGraph`, so its edge set already contains each ordered pair exactly once (self-loops were removed earlier). We can simply count its edges.

In [None]:
print('There are' + ' ' + str(len(G.edges())) + ' ' + 'unique directed edges in the graph')

2. We can get the unique undirected edges by generating an undirected graph.
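
To make the relation between arcs, undirected edges and mutual arcs concrete, here is a minimal sketch on a made-up directed graph (the graph `toy` and its edges are assumptions for illustration only). It uses `to_undirected()` instead of rebuilding the graph from the edge list, which is an alternative to the approach in the cell below.

```python
import networkx as nx

# Hypothetical directed graph with two reciprocated pairs: {0, 1} and {2, 3}
toy = nx.DiGraph([(0, 1), (1, 0), (1, 2), (2, 3), (3, 2), (0, 3)])

# Number of unique ordered pairs (arcs)
n_arcs = toy.number_of_edges()

# Collapsing directions merges (u, v) and (v, u) into a single undirected edge
n_undirected = toy.to_undirected().number_of_edges()

# Count each reciprocated pair once by only looking at pairs with u < v
n_mutual = sum(1 for u, v in toy.edges() if u < v and toy.has_edge(v, u))

print(n_arcs, n_undirected, n_mutual)
```

Every mutual pair contributes two arcs but only one undirected edge, so the number of arcs equals the number of undirected edges plus the number of mutual pairs, provided that self-loops have been removed from both graphs.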

In [None]:
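G_unDir = nx.from_pandas_edgelist(mails_exchanged, source='from', target='to', edge_attr='weight', create_using=nx.Graph)
# remove self loops, as we did for the directed graph G, so that both edge counts are comparable
G_unDir.remove_edges_from(list(nx.selfloop_edges(G_unDir)))
print('There are' + ' ' + str(len(G_unDir.edges())) + ' ' + 'unique undirected edges in the graph')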


3.

In [None]:
print('To prevent double counting, one needs to subtract the number of edges\n\
that run both ways in the directed graph to find the number of undirected edges.\n\
So the difference between the counts of unique directed and undirected edges gives\n\
the number of mutual arcs, which is' + ' ' + str(len(G.edges())-len(G_unDir.edges())) + '.')


4. The in-edges of a node are stored in the corresponding column of the adjacency matrix. We can therefore find the nodes without incoming edges by summing the adjacency matrix along the first dimension (i.e., over the rows) and looking for the zeros of the resulting vector.
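
The following sketch illustrates this on a made-up three-node graph (the graph `toy` and its weights are assumptions for illustration). It also shows the difference between the weighted column sums, which count emails, and the in-degrees, which count distinct senders.

```python
import numpy as np
import networkx as nx

# Hypothetical weighted digraph: arcs 0 -> 1 (5 emails), 0 -> 2 (1 email), 1 -> 2 (3 emails)
toy = nx.DiGraph()
toy.add_weighted_edges_from([(0, 1, 5), (0, 2, 1), (1, 2, 3)])

# A[u, v] holds the weight of arc u -> v, so column v collects the in-edges of v
A = nx.to_numpy_array(toy, nodelist=[0, 1, 2])

emails_received = A.sum(axis=0)       # weighted column sums (number of emails)
in_degree = (A > 0).sum(axis=0)       # number of distinct senders
out_degree = (A > 0).sum(axis=1)      # number of distinct recipients

no_incoming = np.where(in_degree == 0)[0]
no_outgoing = np.where(out_degree == 0)[0]
print(no_incoming, no_outgoing)
```

Since all weights are positive, a column sum is zero exactly when the in-degree is zero, so either vector can be used to detect nodes without incoming edges. The two vectors differ, however, when thresholding on the number of distinct contacts.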

In [None]:
# Nodes without an incoming edge
d_in = np.sum(G_np,0)
d_in_isolated = np.where(d_in == 0)[0]
d_in_isolated 


In [None]:
# Corresponding names of the nodes are shown below
df_employees.loc[d_in_isolated,"name and more"]


5.

In [None]:
# Nodes without an out edge
d_out = np.sum(G_np,1)
d_out_isolated = np.where(d_out == 0)[0]
d_out_isolated 


In [None]:
# Corresponding names of the nodes are shown below
df_employees.loc[d_out_isolated,"name and more"]


6. We can find the nodes with a high in-degree by thresholding.

In [None]:
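# the question asks for the number of distinct senders, so count the nonzero entries per column rather than summing the weights
d_in_dense = np.where(np.sum(G_np > 0, axis=0) >= 30)[0]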

In [None]:
# color the nodes with d_in >= 30 in red and label only those nodes with the employee names
# (iterate over G.nodes() so that the colors follow the order in which NetworkX draws the nodes)
color_map = ['red' if node in d_in_dense else 'blue' for node in G.nodes()]
labels = {node: df_employees.loc[node, "name and more"] for node in G.nodes() if node in d_in_dense}

pos = nx.circular_layout(G)
weights = np.array([G[u][v]['weight'] for u, v in G.edges()])

plt.figure(figsize=(15,15))
nx.draw_networkx_nodes(G, pos=pos, node_color=color_map)
nx.draw_networkx_edges(G, alpha=0.1, width=np.log10(weights+1), pos=pos)
nx.draw_networkx_labels(G, pos, labels, font_color='black')
plt.title('Network graph of emails exchanged during the whole time period.\n \
  Vertex color is red for d_in >= 30, and blue for d_in < 30.', fontsize=18)
plt.show()


7.

In [None]:
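# the question asks for the number of distinct recipients, so count the nonzero entries per row rather than summing the weights
d_out_dense = np.where(np.sum(G_np > 0, axis=1) >= 30)[0]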

In [None]:
# color the nodes with d_out >= 30 in red and label only those nodes with the employee names
# (iterate over G.nodes() so that the colors follow the order in which NetworkX draws the nodes)
color_map = ['red' if node in d_out_dense else 'blue' for node in G.nodes()]
labels = {node: df_employees.loc[node, "name and more"] for node in G.nodes() if node in d_out_dense}

pos = nx.circular_layout(G)
weights = np.array([G[u][v]['weight'] for u, v in G.edges()])

plt.figure(figsize=(15,15))
nx.draw_networkx_nodes(G, pos=pos, node_color=color_map)
nx.draw_networkx_edges(G, alpha=0.1, width=np.log10(weights+1), pos=pos)
nx.draw_networkx_labels(G, pos, labels, font_color='black')
plt.title('Network graph of emails exchanged during the whole time period.\n \
  Vertex color is red for d_out >= 30, and blue for d_out < 30.', fontsize=18)
plt.show()


8. Histogram of vertex degrees (separate $d_v^{\text{in}}$ and $d_v^{\text{out}}$), generated with the histplot tool in seaborn.

In [None]:
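# in-degree: number of distinct senders per node (nonzero entries in each column)
sns.histplot(np.sum(G_np > 0, axis=0))
plt.title('Distribution of d_in')
plt.xlabel('in-degree')
plt.ylabel('count')
plt.show()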


In [None]:
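# out-degree: number of distinct recipients per node (nonzero entries in each row)
sns.histplot(np.sum(G_np > 0, axis=1))
plt.title('Distribution of d_out')
plt.xlabel('out-degree')
plt.ylabel('count')
plt.show()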


## Dynamic (temporal) network analysis

So far we have examined the entire dataset and ignored its temporal dimension. To bridge this gap, in this section we will carry out a simple dynamic network analysis to study how the graph changes across time.
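
The weekly aggregation used below relies on pandas periods. As a minimal sketch on made-up timestamps (the dataframe `toy` is hypothetical), this is how `dt.to_period('W')` assigns each email to a calendar week that can then be used as a grouping key:

```python
import pandas as pd

# Hypothetical timestamps in seconds since 1/1/1970: day 0, day 1 and day 8
toy = pd.DataFrame({'time': [0, 86400, 8 * 86400]})
toy['date'] = pd.to_datetime(toy['time'], unit='s')

# Map every date to the weekly period that contains it
toy['week'] = toy['date'].dt.to_period('W')

# Number of emails per week
sizes = toy.groupby('week').size()
print(sizes)
```

The first two timestamps (a Thursday and a Friday) fall in the same weekly period, while the third one lands in the following week, so the grouping yields two weeks with two emails and one email, respectively.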

In [None]:
# let's cluster emails per week, so we first check which week a given email corresponds to and then we add it to df_mails
df_mails['week'] = df_mails.date.dt.to_period('W')
print(df_mails.head())

# per week aggregation. This generates a GroupBy object over which we can iterate, and contains all data for each week
grouped_week = df_mails.groupby('week')
# list that will contain the weekly network graphs
graphs = []
# list that will contain the weeks themselves. Can be used to identify timestamps down the road.
weeks = []

for week_id, mails_group in grouped_week:
    # we basically repeated what we did for the entire graph, but on a per week basis. 
    # we will be storing the weekly graphs in a list. Arguably not the most efficient approach, but the dataset is not that large

    # count number of emails between a pair of users this week
    mails_exchanged = mails_group.groupby(['from', 'to']).count().reset_index()
    # the columns have the same information, so we arbitrarily rename one to "weight" and use it to define the edge weights
    mails_exchanged.rename(columns={'week':'weight'}, inplace=True)
    G = nx.from_pandas_edgelist(mails_exchanged, source='from', target='to', edge_attr='weight', create_using=nx.DiGraph)
    
    # remove self loops
    G.remove_edges_from(nx.selfloop_edges(G))
    
    # add the new graph to the list
    graphs.append(G)
    weeks.append(week_id)
    


In [None]:
# let's examine the temporal evolution of some simple summary statistics

num_nodes = [current_graph.number_of_nodes() for current_graph in graphs]
num_arcs = [current_graph.number_of_edges() for current_graph in graphs]
pd.DataFrame({'n_nodes':num_nodes, 'n_arcs':num_arcs}, index=weeks).plot(figsize=(12,6))
plt.grid()
plt.legend(['Number of nodes', 'Number of arcs'])
plt.show()


### Changes in the network graph
9. Pick two node centrality measures of your choice (see e.g., Ch. 4 of E. Kolaczyk's book Statistical Analysis of Network Data, the [lecture slides on centrality](https://www.hajim.rochester.edu/ece/sites/gmateos/ECE442/Slides/block_3_descriptive_analysis_properties_part_c.pdf), or the [NetworkX documentation](https://networkx.org/documentation/stable/reference/algorithms/centrality.html)) and indicate who was the most central Enron employee each week according to each of these measures.  Compare your results with what you obtain for the "entire" graph (namely, the network constructed earlier using data for the whole time horizon). 
10. Experiment with a few graph-level summary statistics (e.g., number of nodes, edges, average degree, average clustering coefficient, or any other of your liking) and use them to identify some of the major events tied to the scandal (Figure 8 in https://arxiv.org/abs/1403.0989 has a very nice timeline that could help). Likely you should be able to spot the launch of Enron online and Stephen Cooper's ascent to the CEO role.

9. Here I find the most central node using betweenness centrality and eigenvector centrality. I first show the results for the entire graph, without taking time into account, and then the weekly results over the course of the scandal.
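
As a minimal sketch of the procedure, the example below picks the most central node of a made-up graph under both measures. The graph `toy` is an assumption for illustration: node 0 is connected to everyone else, and all arcs are reciprocated so that the graph is strongly connected.

```python
import networkx as nx

# Hypothetical contact pattern: node 0 talks to everyone, nodes 1 and 2 also talk to each other
pairs = [(0, 1), (0, 2), (0, 3), (1, 2)]
toy = nx.DiGraph()
toy.add_edges_from(pairs)
toy.add_edges_from((v, u) for u, v in pairs)  # add the reverse arcs

# Both functions return a dict {node: centrality score}
bc = nx.betweenness_centrality(toy)
ec = nx.eigenvector_centrality(toy, max_iter=1000)

# The most central node is the key with the largest score
top_bc = max(bc, key=bc.get)
top_ec = max(ec, key=ec.get)
print(top_bc, top_ec)
```

Both measures single out node 0 here, but in general they need not agree: betweenness rewards nodes that sit on many shortest paths, whereas eigenvector centrality rewards nodes whose neighbors are themselves central.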

In [None]:
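# Generate the directed graph for the entire time horizon.
# `mails_exchanged` was overwritten inside the weekly loop above, so we recompute the counts from the full `df_mails`.
mails_all = df_mails.groupby(['from', 'to']).count().reset_index().rename(columns={'time': 'weight'})
G_all = nx.from_pandas_edgelist(mails_all, source='from', target='to', edge_attr='weight', create_using=nx.DiGraph)
G_all.remove_edges_from(list(nx.selfloop_edges(G_all)))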

In [None]:
bc_all = nx.betweenness_centrality(G_all)
# node with the maximum betweenness centrality
max_node_bc  = max(bc_all, key=bc_all.get)
print(df_employees.loc[max_node_bc,"name and more"] , ' ' , 'is the most central node according to the betweenness\n\
 centrality without taking the temporal part of the graph into consideration')

In [None]:
ec_all = nx.eigenvector_centrality(G_all, max_iter=10000)
# node with the maximum eigenvector centrality
max_node_ec  = max(ec_all, key=ec_all.get)
print(df_employees.loc[max_node_ec,"name and more"] , ' ' , 'is the most central node according to the eigenvector\n\
 centrality without taking the temporal part of the graph into consideration')

In [None]:
# Data frame already has a column for weeks. We can directly use it.
# per week aggregation. This generates a GroupBy object over which we can iterate, and contains all data for each week
grouped_week = df_mails.groupby('week')
# list that will contain the weekly network graphs
graphs = []
# list that will contain the weeks themselves. Can be used to identify timestamps down the road.
weeks = []

# create list to store central nodes for each week 
betweenness_c = []
eigenvector_c = []

for week_id, mails_group in grouped_week:
    # we basically repeated what we did for the entire graph, but on a per week basis. 
    # we will be storing the weekly graphs in a list. Arguably not the most efficient approach, but the dataset is not that large

    # count number of emails between a pair of users this week
    mails_exchanged = mails_group.groupby(['from', 'to']).count().reset_index()
    # the columns have the same information, so we arbitrarily rename one to "weight" and use it to define the edge weights
    mails_exchanged.rename(columns={'week':'weight'}, inplace=True)
    G = nx.from_pandas_edgelist(mails_exchanged, source='from', target='to', edge_attr='weight', create_using=nx.DiGraph)
    # remove self loops
    G.remove_edges_from(nx.selfloop_edges(G))

    bc = nx.betweenness_centrality(G)
    # node with the maximum betweenness centrality
    max_node_bc  = max(bc, key=bc.get)
    print(df_employees.loc[max_node_bc,"name and more"] , 'is the most central node according to the betweenness\n\
    centrality for week ', week_id)

    ec = nx.eigenvector_centrality(G, max_iter=10000)
    # node with the maximum eigenvector centrality
    max_node_ec  = max(ec, key=ec.get)
    print(df_employees.loc[max_node_ec,"name and more"] , ' ' , 'is the most central node according to the eigenvector\n\
    centrality for week ', week_id)
    
    # add the new graph to the list
    graphs.append(G)
    weeks.append(week_id)
    betweenness_c.append(max_node_bc)
    eigenvector_c.append(max_node_ec)

Overall, Chris Germany is the most central node according to both betweenness and eigenvector centrality. However, the weekly analysis indicates that Chris Germany becomes the most central node only during the last month and a half.

10. Number of nodes, number of edges, average degree and average clustering coefficient as indicators of changes in the network.

In [None]:
# Collect the summary statistics of the graph for each week and plot them

num_nodes = [current_graph.number_of_nodes() for current_graph in graphs]
num_arcs = [current_graph.number_of_edges() for current_graph in graphs]
averageDegree = [current_graph.number_of_edges()/current_graph.number_of_nodes() for current_graph in graphs]
averageClustering = [nx.average_clustering(current_graph) for current_graph in graphs]

pd.DataFrame({'n_nodes':num_nodes, 'n_arcs':num_arcs}, index=weeks).plot(figsize=(12,6))
plt.grid()
plt.legend(['Number of nodes', 'Number of arcs'])
plt.show()

pd.DataFrame({'averageDegree' : averageDegree, 'averageClustering' : averageClustering}, index=weeks).plot(figsize=(12,6))
plt.grid()
plt.legend(['Average degree', 'Average Clustering'])
plt.show()

The launch of Enron online corresponds to the fluctuations of the average clustering coefficient at the end of 1999, and Stephen Cooper's ascent to the CEO role corresponds to the plummeting of the average clustering coefficient in 2002.

## Introduction to PyTorch Geometric (PyG)
**[PyTorch Geometric](https://github.com/rusty1s/pytorch_geometric)** is a Python library for deep learning on graphs, which provides the required functionality to work with Graph Neural Networks (GNNs). The library is an extension of **[PyTorch](https://pytorch.org/)**, arguably the most widely adopted open source deep learning framework.

In [None]:
import torch
print(f"PyTorch version is {torch.__version__}")

In [None]:
# install PyG for the working version of PyTorch
!pip install torch-scatter -f https://data.pyg.org/whl/torch-{torch.__version__}.html
!pip install torch-sparse -f https://data.pyg.org/whl/torch-{torch.__version__}.html
!pip install torch-geometric

PyG includes several network datasets in the package **[torch_geometric.datasets](https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html#torch_geometric.datasets)**. In this part of the laboratory we will work with a dataset that has become a *de facto* testbed for community detection algorithms, namely [**Zachary's karate club network**](https://en.wikipedia.org/wiki/Zachary%27s_karate_club).

In [None]:
from torch_geometric.datasets import KarateClub

dataset = KarateClub()
print(f'Dataset: {dataset}:')
print('======================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

The dataset consists of a single network graph, each vertex has an associated vector in $\mathbb{R}^{34}$ (a so-termed nodal *feature* vector), and nodes are partitioned in 4 classes. Let's examine some other network summary statistics:

In [None]:
# focus on the first (and only) graph
data = dataset[0]

print(data)
print('==============================================================')

# network characteristics
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
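# edge_index stores every undirected edge in both directions, so num_edges already counts each edge twice
print(f'Average degree: {data.num_edges / data.num_nodes:.2f}')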
print(f'Graph has isolated nodes: {data.has_isolated_nodes()}')
print(f'Graph has self loops: {data.has_self_loops()}')
print(f'Graph is undirected: {data.is_undirected()}')

A graph in PyG is represented by an object of type [`Data`](https://pytorch-geometric.readthedocs.io/en/latest/modules/data.html#torch_geometric.data.Data). Each of these objects has at least 5 attributes:
- **`x`**: is a network-wide feature matrix associated to the vertices (that is, a matrix whose rows are the nodal feature vectors). It is an object of type [`tensor`](https://pytorch.org/docs/stable/tensors.html), torch's native type to store matrices (the equivalent to [`ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html) in numpy).
- **`edge_index`**: is the graph's connectivity matrix in [COO](https://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_(COO)) format. This format is very useful to store and work with *sparse* matrices (those having a large number of zeros, here denoting non-edges). It only stores a list of nodes connected by edges, instead of storing the whole adjacency matrix.
- **`y`**: is a matrix of nodal labels (for the Karate club, the matrix that encodes the class membership of each vertex).
- **`train_mask`**: binary matrix indicating the subset of vertices that are part of the training set. This will be useful down the road when we e.g.,  build and train a GNN model for node classification.
- **`edge_attr`**: is a network-wide feature matrix associated to the edges. Since the Karate club network is unweighted, the dataset has no edge features. 
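
To illustrate the COO idea behind `edge_index`, here is a minimal sketch that uses plain NumPy as a stand-in for torch tensors. The matrix `A` is a made-up example, and the array built here only mimics the layout of `edge_index`; it is not produced by PyG.

```python
import numpy as np

# Hypothetical undirected graph on 3 nodes with edges {0, 1} and {0, 2}
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])

# COO layout: keep only the (row, column) coordinates of the nonzero entries
rows, cols = np.nonzero(A)
edge_index = np.vstack([rows, cols])  # shape (2, number of stored entries)
print(edge_index)

# The dense adjacency matrix can be recovered from the coordinates
A_rebuilt = np.zeros_like(A)
A_rebuilt[edge_index[0], edge_index[1]] = 1
```

Each undirected edge appears twice in this layout, once per direction, which is why the number of stored entries is twice the number of undirected edges.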

In [None]:
print('data.x')
print('========================================')
print(data.x)
print('\ndata.edge_index')
print('=========================================')
print(data.edge_index.t())
print('\ndata.y')
print('=========================================')
print(data.y)
print('\ndata.train_mask')
print('=========================================')
print(data.train_mask)
print('\ndata.edge_attr')
print('=========================================')
print(data.edge_attr)

PyG offers a simple interface to convert a graph into NetworkX's format:

In [None]:
from torch_geometric.utils import to_networkx
G = to_networkx(data, to_undirected=True)
nx.draw_networkx(G,node_color=data.y,pos=nx.spring_layout(G, seed=42))

## Verify properties of the graph Laplacian
The goal of the following questions is to empirically verify a few properties of the graph Laplacian matrix. In the **optional** exercise below, you are asked to mathematically establish those properties.

11. Compute the graph Laplacian matrix $\mathbf{L}$ for Zachary's karate club network. You are encouraged to use some suitable function from the subpackage [`torch_geometric.utils`](https://pytorch-geometric.readthedocs.io/en/latest/modules/utils.html).
12. Check that $\mathbf{L}$ has a 0 eigenvalue and verify that the vector of all ones $[1,1,\dots,1]^\top$ is the corresponding eigenvector. The subpackage [`torch.linalg`](https://pytorch.org/docs/stable/linalg.html) may be useful to that end.
13. Corroborate that $\mathbf{L}$ is a symmetric positive semidefinite matrix.
14. Form a matrix $\tilde{\mathbf{B}}$ as described in Part 2 of the optional exercise below and verify that $\mathbf{L}=\tilde{\mathbf{B}}\tilde{\mathbf{B}}^\top$. You are encouraged to use the function [`networkx.incidence_matrix`](https://networkx.org/documentation/stable/reference/generated/networkx.linalg.graphmatrix.incidence_matrix.html).

11. Graph Laplacian

In [None]:
from scipy.sparse.csgraph import laplacian
aa = nx.to_numpy_array(G)
L = laplacian(aa)
L

In [None]:
L[0]

12.

In [None]:
from scipy.linalg import eig
w, v = eig(L)

In [None]:
w # Eigenvalue array

In [None]:
# The only negative eigenvalue is on the order of -1e-15, which is numerical round-off, so we can take it to be zero.
# The corresponding eigenvector (rescaled to have unit mean) is
eigMin = np.real(v[:, np.argmin(np.real(w))])
eigMin = eigMin/np.mean(eigMin)
eigMin

This confirms that the corresponding eigenvector is (a multiple of) the vector of all ones.
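
The same property can be checked without an eigendecomposition, since every row of $\mathbf{L}$ sums to zero. The sketch below is self-contained and therefore makes two assumptions: it loads the karate club graph from NetworkX (`nx.karate_club_graph()`) rather than from PyG, and it builds the unweighted Laplacian directly as degree matrix minus adjacency matrix. The names `G_check`, `A_check` and `L_check` are hypothetical.

```python
import numpy as np
import networkx as nx

# Assumption: NetworkX's copy of Zachary's karate club, treated as unweighted
G_check = nx.karate_club_graph()
A_check = nx.to_numpy_array(G_check, weight=None)

# Combinatorial Laplacian: degree matrix minus adjacency matrix
L_check = np.diag(A_check.sum(axis=1)) - A_check

# L times the all-ones vector should be the zero vector
ones = np.ones(L_check.shape[0])
residual = L_check @ ones
print(np.allclose(residual, 0))
```

If $\mathbf{L}\mathbf{1}=\mathbf{0}$ then $\mathbf{1}$ is an eigenvector with eigenvalue 0, which is what the eigendecomposition above shows numerically.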

13. To check whether the Laplacian is symmetric or not we can compare it with its transpose.

In [None]:
np.allclose(L,np.transpose(L))

A symmetric matrix is positive semidefinite when all its eigenvalues are non-negative. We can check this once we account for numerical artifacts: to be on the safe side, I add a small positive number to the eigenvalues before checking.
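
A complementary check, sketched under the same assumptions as before (NetworkX's copy of the karate club graph, unweighted, with hypothetical names `G_psd`, `A_psd` and `L_psd`), uses `np.linalg.eigvalsh`, which is designed for symmetric matrices and returns real eigenvalues. It also verifies the identity $\mathbf{x}^\top\mathbf{L}\mathbf{x}=\sum_{(i,j)\in E}(x_i-x_j)^2$ for a random vector, which is the reason why the quadratic form cannot be negative.

```python
import numpy as np
import networkx as nx

# Assumption: NetworkX's copy of Zachary's karate club, treated as unweighted
G_psd = nx.karate_club_graph()
A_psd = nx.to_numpy_array(G_psd, weight=None)
L_psd = np.diag(A_psd.sum(axis=1)) - A_psd

# eigvalsh exploits symmetry and returns real eigenvalues in ascending order
eigenvalues = np.linalg.eigvalsh(L_psd)
is_psd = eigenvalues.min() >= -1e-10  # tolerance for round-off
print(is_psd)

# Quadratic form versus the sum of squared differences over the edges
rng = np.random.default_rng(0)
x = rng.standard_normal(L_psd.shape[0])
quad = x @ L_psd @ x
edge_sum = sum((x[u] - x[v]) ** 2 for u, v in G_psd.edges())
print(np.isclose(quad, edge_sum))
```

The node labels of this graph are the integers 0 to 33, so they can be used directly as indices into `x`.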

In [None]:
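# scipy's eig returns complex eigenvalues, so compare the real parts (the imaginary parts vanish for a symmetric matrix)
np.all(np.real(w) + 1e-10 >= 0)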

14. Since the choice of the start and the end node of each edge is arbitrary, we can multiply the upper triangle by -1 and check the hypothesis.

In [None]:
G = to_networkx(data, to_undirected=False)

In [None]:
B = nx.to_numpy_array(G)
L == np.matmul(B,np.transpose(B))

In [None]:
aa = nx.to_numpy_array(G)
aa

In [None]:
bb = aa
for ii in range(np.size(bb,0)):
  for jj in range(ii, np.size(bb,1)-1):
    bb[ii,jj+1] = bb[ii,jj+1]*-1
bb

In [None]:
L == np.matmul(bb,np.transpose(bb))

I tried both a direct calculation and assigning in- and out-edges, but could not show that the last equality holds.
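
One possible route, sketched here under stated assumptions, is to use the signed incidence matrix rather than the adjacency matrix. The sketch assumes that $\tilde{\mathbf{B}}$ denotes the node-by-edge incidence matrix with one $+1$ and one $-1$ per column (an arbitrary orientation of each edge), which is what `networkx.incidence_matrix` returns with `oriented=True`. It also loads the karate club graph from NetworkX instead of PyG so that the example is self-contained; the names `G_inc`, `B_tilde` and `L_inc` are hypothetical.

```python
import numpy as np
import networkx as nx

# Assumption: NetworkX's copy of Zachary's karate club, treated as unweighted
G_inc = nx.karate_club_graph()

# Signed incidence matrix: rows are nodes, columns are edges,
# each column has a -1 and a +1 at the two endpoints of the edge
B_tilde = nx.incidence_matrix(G_inc, oriented=True).toarray()

# Unweighted Laplacian with the same node ordering
L_inc = nx.laplacian_matrix(G_inc, weight=None).toarray()

print(B_tilde.shape)
print(np.allclose(L_inc, B_tilde @ B_tilde.T))
```

The diagonal entries of $\tilde{\mathbf{B}}\tilde{\mathbf{B}}^\top$ count the edges incident to each node (its degree), and the off-diagonal entry for an adjacent pair is $(+1)(-1)=-1$ regardless of the orientation chosen, which matches the degree matrix minus the adjacency matrix.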

# Acknowledgements

An initial version of this Laboratory (in Spanish) was conceived and developed by colleagues from [Facultad de Ingenieria](https://www.fing.edu.uy) in Montevideo, Uruguay and myself, for the course **[Aprendizaje Automático para Datos en Grafos](https://eva.fing.edu.uy/course/view.php?id=1626&section=0)**.

The first part in the section '**Introduction to PyTorch Geometric**' is based on [this notebook](https://colab.research.google.com/drive/16tqEHKOLUgYvXKx1V3blfYGpQb1_09MG?usp=sharing#scrollTo=bbny-iTO7NQN) from Stanford's course **[CS224W: Machine Learning with Graphs](http://web.stanford.edu/class/cs224w)**.