Project SNA by Aleksandra Elena Getman (r0884498) and Vaishnav Dilip (r0872689)

<img src ="https://99designs-blog.imgix.net/blog/wp-content/uploads/2016/04/got-title.jpg?auto=format&q=60&fit=max&w=930"></img>

# Introduction

## About the Game of Thrones (HBO Series)

Game of Thrones is a fantasy drama television series that ran for 8 seasons and a total of 73 episodes.

__The series follows three main storylines:__
1. The fight for the Iron Throne of the Seven Kingdoms, in which various nobles weave a web of political conflicts, either to claim the throne or to gain independence from whoever sits on it
2. Legal descendants try to reclaim their birthright to the Iron Throne
3. The Night's Watch (a military order stationed at the northern border) tries to protect the Kingdoms from mystical creatures

(Source: https://en.wikipedia.org/wiki/Game_of_Thrones#:~:text=Game%20of%20Thrones%20is%20an,is%20A%20Game%20of%20Thrones)

__Database__

In the following notebooks, we present our graph database (Neo4j) project, built on the Game of Thrones dataset available in the following GitHub repository: https://github.com/mathbeveridge/gameofthrones
This repository contains the pairs of characters from the HBO series, connected by undirected edges that are weighted by the number of interactions between the two characters.

__The dataset counts 5 types of interactions:__
1. Character A speaks directly after Character B
2. Character A speaks about Character B
3. Character C speaks about Characters A and B
4. Characters A and B are mentioned in the same stage direction
5. Characters A and B appear in a scene together

However, the dataset does not record the type of each individual interaction; only the overall interaction count per pair of characters is provided.

## What is the Project all about?

__Our project objectives__

1. Community mining
2. Link prediction by using network embeddings

__This project consists of 3 notebooks__
1. The first notebook presents the introduction and creates the graph from the dataset
2. The second notebook identifies the communities found in the different seasons
3. The third notebook predicts future links for seasons 7 and 8 using network embeddings

# Create graph

This section creates the graph. Nodes and edges are imported season by season from the dataset. Because the show has 8 seasons, the graph uses eight season-specific node labels (Person_1 to Person_8) and eight season-specific relationship types (INTERACTS_1 to INTERACTS_8).

A character is not restricted to a single season and can return in later ones. Each character is therefore stored as one Person node that additionally receives a Person_<season> label for every season in which the character appears. For example, Person_1 covers the characters of season 1 and Person_2 covers the characters of season 2. Person_1 and Person_2 are not mutually exclusive: a character from season 1 can also be present in season 2, but with different edges. We keep this per-season distinction, rather than using a single node type for all characters across all seasons, because it makes the later community detection more convenient: each season's subgraph can be selected directly, without extra queries.
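As an illustration of why the per-season labels are convenient, the sketch below builds a Cypher query that selects the weighted edges of a single season. The helper name `season_subgraph_query` and the returned column names are hypothetical and not part of the original notebooks; only the `Person_<season>` labels, the `INTERACTS_<season>` relationship types, and the `id` and `weight` properties come from the import code in this notebook.

```python
def season_subgraph_query(season: int) -> str:
    """Build a Cypher query returning every weighted edge of one season.

    Hypothetical helper for illustration; not part of the original notebooks.
    """
    if season not in range(1, 9):
        raise ValueError("season must be between 1 and 8")
    # Labels and relationship types cannot be passed as Cypher parameters,
    # so the validated season number is interpolated into the query text.
    return (
        f"MATCH (a:Person_{season})-[r:INTERACTS_{season}]-(b:Person_{season}) "
        # The undirected pattern matches each edge twice; keep one orientation.
        "WHERE a.id < b.id "
        "RETURN a.id AS source, b.id AS target, r.weight AS weight"
    )


# Usage against a populated database (requires the `graph` object of this notebook):
# season_1_edges = graph.run(season_subgraph_query(1)).to_data_frame()
```

Because the season is encoded in the label and the relationship type, no filtering on a property or joining across queries is needed to isolate one season.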

In [1]:
from py2neo import Graph
import pandas as pd
import numpy as np
import os

In [7]:
graph = Graph("bolt://localhost:7687", auth=("neo4j", "neo4jneo4j"))

In [8]:
# Delete all existing nodes and relationships so the import starts from an empty database
graph.run("MATCH (n) DETACH DELETE n")

In [9]:
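#Making nodes function
def create_nodes(df, season):
    """Create or update one Person node per row of a season's nodes csv file.

    Args:
        df: DataFrame with 'Id' and 'Label' columns.
        season: Season number (1-8), used for the Person_<season> label.

    Returns:
        str: Acknowledgement string
    """
    # Labels cannot be Cypher parameters, so the season is interpolated;
    # property values are passed as parameters to avoid quoting problems.
    cypher = (
        "MERGE (a:Person {id: $id, label: $label}) "
        "SET a.seed = $seed "
        f"SET a:Person_{int(season)}"
    )
    # seed is the row position within this season's file, so it is
    # overwritten when a character reappears in a later season.
    for seed, idx in enumerate(df.index):
        id_ = df.loc[idx, 'Id']
        label = df.loc[idx, 'Label'].replace('\'', '')
        graph.run(cypher, id=id_, label=label, seed=seed)
    return "Done creating nodes"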

In [10]:
#Making edges
def create_edges(df):
    """Create one weighted INTERACTS_<season> relationship per row of an edges csv file.

    Args:
        df: DataFrame with 'Source', 'Target', 'Weight' and 'Season' columns.

    Returns:
        str: Acknowledgement string
    """
    for idx in df.index:
        src = df.loc[idx, 'Source']
        tar = df.loc[idx, 'Target']
        # Cast NumPy integers to plain int so they can be sent as parameters.
        weight = int(df.loc[idx, 'Weight'])
        season = int(df.loc[idx, 'Season'])
        # Relationship types cannot be parameters, so the season is interpolated.
        cypher = (
            "MATCH (src:Person {id: $src}), (tar:Person {id: $tar}) "
            f"MERGE (src)-[r:INTERACTS_{season}]-(tar) "
            "SET r.weight = $weight SET r.season = $season"
        )
        graph.run(cypher, src=src, tar=tar, weight=weight, season=season)
    return "Done creating edges"
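The edge import above sends one query per row. A possible alternative, sketched here under assumptions, is to send all rows of a season as a single list parameter and let Cypher iterate over it with `UNWIND`. The helpers `edge_rows` and `batched_edge_query` are hypothetical names introduced for this illustration; the column names and the `INTERACTS_<season>` relationship type follow the import code of this notebook.

```python
def edge_rows(records):
    """Convert edge records (dicts with 'Source', 'Target', 'Weight') into
    plain-Python dicts that can be sent as one Cypher list parameter.

    Hypothetical helper for illustration; not part of the original notebooks.
    """
    return [
        {
            "source": str(r["Source"]),
            "target": str(r["Target"]),
            # int() also converts NumPy integers and numeric strings.
            "weight": int(r["Weight"]),
        }
        for r in records
    ]


def batched_edge_query(season: int) -> str:
    """Build one Cypher statement that creates all edges of a season."""
    return (
        "UNWIND $rows AS row "
        "MATCH (src:Person {id: row.source}), (tar:Person {id: row.target}) "
        # Relationship types cannot be parameters, so the season is interpolated.
        f"MERGE (src)-[r:INTERACTS_{int(season)}]-(tar) "
        "SET r.weight = row.weight, r.season = $season"
    )


# Usage with the `graph` object and a season's edges DataFrame `df` (assumed):
# graph.run(batched_edge_query(1), rows=edge_rows(df.to_dict("records")), season=1)
```

Whether batching is noticeably faster depends on the database setup; for a dataset of this size the row-by-row version is also workable.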

In [11]:
#Creating graph
def create_graph():
    """Function to create the graph from the csv files.

    Returns:
        str: Acknowledgement string
    """
    base_url = "https://raw.githubusercontent.com/mathbeveridge/gameofthrones/master/data/"
    for season in range(1,9):
        # Nodes must be imported before edges, because edges are matched on existing nodes.
        # The loop variable is named kind to avoid shadowing the built-in type().
        for kind in ['nodes','edges']:
            final_url = base_url+"got-s"+str(season)+"-"+kind+".csv"
            df = pd.read_csv(final_url)
            if kind == 'nodes':
                create_nodes(df, season)
            else:
                create_edges(df)
    return "Done creating graph"

In [12]:
create_graph()

'Done creating graph'
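After the import, a quick sanity check could count the nodes and relationships per season. The sketch below only builds the Cypher strings; the helper name `season_count_queries` and the result aliases are hypothetical, and running the queries assumes the `graph` connection defined earlier in this notebook.

```python
def season_count_queries(seasons=range(1, 9)):
    """Return {season: (node_count_query, edge_count_query)}.

    Hypothetical helper for illustration; not part of the original notebooks.
    """
    queries = {}
    for s in seasons:
        node_q = f"MATCH (n:Person_{s}) RETURN count(n) AS n_nodes"
        # A directed pattern counts every stored relationship exactly once.
        edge_q = f"MATCH ()-[r:INTERACTS_{s}]->() RETURN count(r) AS n_edges"
        queries[s] = (node_q, edge_q)
    return queries


# Usage against a populated database (requires the `graph` object of this notebook):
# for s, (node_q, edge_q) in season_count_queries().items():
#     print(s, graph.evaluate(node_q), graph.evaluate(edge_q))
```

The counts can then be compared with the number of rows in the corresponding csv files.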

Now, let's move to the second notebook, which carries out the community mining.