# Network Analysis

Script based on https://towardsdatascience.com/data-science-in-venture-capital-8c13ec0c8458

### The Data

- Data comes from Crunchbase
- Focused on Singaporean funds and startups

### Importing Libraries

In [1]:
# First, import all of the necessary libraries

import matplotlib.pyplot as plt # used for creating plots

import networkx as nx # used for analysing the structure of networks

import pandas as pd # used for data manipulation and analysis

import numpy as np # used for creating n-dimensional arrays

from itertools import combinations # for creating combinations (nCr)
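
As a quick illustration of what `combinations` produces, here is a minimal sketch on a made-up list. The fund names are hypothetical placeholders and not part of the Crunchbase data.

```python
from itertools import combinations

# Hypothetical investor names, used only to show what combinations() yields.
investors = ["Fund A", "Fund B", "Fund C"]

# All unordered pairs of investors (3 choose 2 = 3 pairs), in input order.
pairs = list(combinations(investors, 2))
print(pairs)
```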

### Importing Data

We will now read two datasets.

- df_investors includes the names of all VC funds headquartered in Singapore with at least one investment
- Under VC funds, we include the following categories: VCs, CVCs, Micro VCs, Family Offices and Venture Debt
- df_startups includes the names of all Singaporean startups with a total funding amount higher than SGD 500k, as well as the names of their investors

EDIT: It turns out that the Crunchbase Pro trial doesn't allow downloading, and PitchBook Enterprise (school version) only allows 10 downloads a day, so putting the datasets together by hand would waste too much time.

In [None]:
# Investors Dataset

# read the csv file of investors
df_investors = pd.read_csv("Investors.csv")
print(len(df_investors)) # get a sense of how many investors there are

# Startups Dataset

# read the csv file of startups
df_startups = pd.read_csv("Companies.csv")
print(len(df_startups)) # get a sense of how many startups there are
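
Since the later cells rely on specific column names, a small check right after loading can catch a mismatched export early. The sketch below is only an illustration: `missing_columns` is a hypothetical helper, the toy frames stand in for the real CSV files, and the column names (Organization, Investors, Location) are inferred from how the data is used further down. "Funding" is a made-up column used only to show a failing check.

```python
import pandas as pd

def missing_columns(df, required):
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Toy frames standing in for the real CSV exports (hypothetical values).
toy_investors = pd.DataFrame({"Investors": ["Fund A"], "Location": ["Singapore"]})
toy_startups = pd.DataFrame({"Organization": ["Startup X"],
                             "Investors": ["Fund A, Fund B"]})

# An empty list means every expected column is present.
print(missing_columns(toy_investors, ["Investors", "Location"]))
print(missing_columns(toy_startups, ["Organization", "Investors", "Funding"]))
```

The same helper could be called on df_investors and df_startups once the real files are available.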

### Basic Data Exploration

In [None]:
df_investors.head()

In [None]:
df_startups.head()

### Data Cleaning

The "Investors" column in the df_startups dataset lists every investor in each startup, regardless of where the investor is based. For example, a Singaporean startup may have Japanese investors. We therefore filter df_startups so that it only contains Singaporean VCs.

In [None]:
# split each startup's comma-separated "Investors" string into a list, and build a wide
# dataframe (one investor per column) indexed by the startup name ("Organization")
# stack() then gives one row per (startup, investor); drop the leftover positional level

df_startups = (
    pd.DataFrame(df_startups.Investors.str.split(',').tolist(),
                 index = df_startups.Organization)
    .stack()
    .reset_index(level = 1, drop = True)
    .reset_index()
)


# the result has two columns, "Organization" and 0, so rename column 0 to "Investors"
df_startups.rename({0: 'Investors'}, axis = 1, inplace = True)
# remove the leading space left behind by splitting on ","
df_startups['Investors'] = df_startups["Investors"].str.lstrip()

# merge the datasets to attach each investor's location
# (with no "on" argument, pd.merge joins on the columns the two dataframes share)
df = pd.merge(df_startups, df_investors, how = "outer")

# keep only Singaporean investors, drop rows with NA values, drop the now-redundant Location column, and reset the index
df = df[df["Location"] == "Singapore"].dropna().drop(["Location"], axis = 1).reset_index(drop = True)
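
To make the cleaning steps above easier to follow, here is a minimal sketch on toy data. It is not the notebook's exact code: it swaps in `explode` for the split-and-stack approach and an inner merge for the outer merge plus filter. All fund and startup names are hypothetical, and the column names are assumed to match the real files.

```python
import pandas as pd

# Hypothetical toy data; the real notebook reads these from CSV files.
toy_startups = pd.DataFrame({
    "Organization": ["Startup X", "Startup Y"],
    "Investors": ["Fund A, Fund B", "Fund B, Fund C"],
})
toy_investors = pd.DataFrame({
    "Investors": ["Fund A", "Fund B", "Fund C"],
    "Location": ["Singapore", "Tokyo", "Singapore"],
})

# One row per (startup, investor): split the string, then explode the lists.
pairs = toy_startups.assign(
    Investors=toy_startups["Investors"].str.split(",")
).explode("Investors")
pairs["Investors"] = pairs["Investors"].str.strip()

# Keep only Singapore-based investors via an inner merge on the investor name.
local = toy_investors[toy_investors["Location"] == "Singapore"]
edges = pairs.merge(local[["Investors"]], on="Investors", how="inner")
edges = edges.reset_index(drop=True)
print(edges)
```

In this toy example, Fund B is based in Tokyo, so both of its investments are dropped and only the Fund A and Fund C rows remain.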