# Network Analysis

Script based on https://towardsdatascience.com/data-science-in-venture-capital-8c13ec0c8458

### The Data

- Data comes from Crunchbase
- Focused on Singaporean funds and startups

### Importing Libraries

In [73]:
# First, import all of the necessary libraries

import matplotlib.pyplot as plt # used for creating plots

import networkx as nx # used for analysing the structure of networks

import pandas as pd # used for data manipulation and analysis

import numpy as np # used for creating n-dimensional arrays

from itertools import combinations # for creating combinations (nCr)

### Importing Data

We will now read two datasets:

- df_investors lists every VC fund headquartered in Singapore with at least one investment.
  - "VC fund" here covers several categories: VCs, CVCs, Micro VCs, Family Offices and Venture Debt.
- df_startups lists every Singaporean startup with total funding above SGD 500k, along with the names of its investors.
  - Crunchbase only exposes the top 5 investors per startup, so the investor list may be incomplete. We work within that limitation.

In [82]:
# Investors Dataset

# read the csv file of investors
df_investors = pd.read_csv("investors.csv")
print(len(df_investors)) # get a sense how many investors there are

# Startups Dataset
df_startups = pd.read_csv("companies.csv")
print(len(df_startups)) # get a sense how many startups there are

302
971


### Basic Data Exploration

In [83]:
# slice the investors dataframe so that we only keep the name and location columns
df_investors = df_investors.iloc[:, [0,2]]

# now rename the column header to something a bit more palatable
df_investors = df_investors.rename(columns={"Organization/Person Name" : "Investors"})

df_investors.head()

Unnamed: 0,Investors,Location
0,Wavemaker Partners,"Singapore, Central Region, Singapore"
1,Antler,"Singapore, Central Region, Singapore"
2,BEENEXT,"Singapore, Central Region, Singapore"
3,JAFCO Asia,"Singapore, Central Region, Singapore"
4,EDBI,"Singapore, Central Region, Singapore"


In [84]:
# we slice the startups database so we only get the relevant columns
df_startups = df_startups.iloc[:, [0, 19]]

# let's rename the column header to something a bit more palatable
df_startups = df_startups.rename(columns = {"Top 5 Investors" : "Investors", "Organization Name" : "Organization"})

df_startups.head()

Unnamed: 0,Organization,Investors
0,Sea,"Tencent Holdings, General Atlantic, Hillhouse ..."
1,Funding Societies,"Sequoia Capital India, Line Corporation, Alpha..."
2,Carousell,"Naver, Sequoia Capital India, 500 Startups, Ra..."
3,LongHash Ventures,"HashKey Capital, Fenbushi Capital"
4,Grab Financial Group,"GGV Capital, Flourish Ventures, Arbor Ventures..."


### Data Cleaning

The "Investors" column in df_startups lists every investor in a startup, regardless of where the investor is based. A Singaporean startup may have Japanese investors, for example. We therefore want to filter df_startups so that it only contains Singaporean VCs.

In [89]:
# plan: split the "Investors" column of df_startups into a list per startup,
# index the result by startup, stack the lists so each investor gets its own row,
# and then reset the index to create a new index column

# for now, pull the "Investors" column into its own dataframe
investors = pd.DataFrame(df_startups['Investors'])

# sg_investors = investors[...]  # filter for Singaporean investors; condition not yet written

investors



Unnamed: 0,Investors
0,"Tencent Holdings, General Atlantic, Hillhouse ..."
1,"Sequoia Capital India, Line Corporation, Alpha..."
2,"Naver, Sequoia Capital India, 500 Startups, Ra..."
3,"HashKey Capital, Fenbushi Capital"
4,"GGV Capital, Flourish Ventures, Arbor Ventures..."
...,...
966,MAPE Advisory Group
967,
968,Actis
969,"Southern Cross Venture Partners, Talu Ventures"
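
The split-and-stack step described in the comments above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the notebook's actual cell: the `startups` and `sg_funds` frames and the fund names are hypothetical stand-ins for df_startups and df_investors, and it uses `DataFrame.explode` in place of the split/stack/reset-index sequence.

```python
import pandas as pd

# Hypothetical stand-ins for df_startups and df_investors (not the Crunchbase data)
startups = pd.DataFrame({
    "Organization": ["Alpha", "Beta", "Gamma"],
    "Investors": ["Fund A, Fund X", "Fund B", None],
})
sg_funds = pd.DataFrame({"Investors": ["Fund A", "Fund B"]})

# Split each comma-separated string into a list, then give every investor its own row
exploded = startups.assign(
    Investors=startups["Investors"].str.split(",")
).explode("Investors")

# Trim the whitespace left after each comma and drop startups with no listed investor
exploded["Investors"] = exploded["Investors"].str.strip()
exploded = exploded.dropna(subset=["Investors"])

# Keep only the rows whose investor appears in the Singaporean fund list
sg_only = exploded[exploded["Investors"].isin(sg_funds["Investors"])].reset_index(drop=True)
print(sg_only)
```

Filtering with `isin` after exploding avoids substring matching on the original comma-separated strings, which could confuse funds with similar names.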


In [None]:
# after splitting and stacking, the investor names sit in a column labelled 0, so rename it to "Investors"
df_startups.rename({0: 'Investors'}, axis = 1, inplace = True)
# strip the leading space left behind after each comma
df_startups['Investors'] = df_startups["Investors"].str.lstrip()

# merge the datasets on the investor name to attach each investor's location
df = pd.merge(df_startups, df_investors, how = "outer", on = "Investors")

# keep only the rows matched to a Singaporean investor (Location reads e.g. "Singapore, Central Region, Singapore"),
# then drop NA values, drop the Location column, and reset the index
df = df[df["Location"].str.contains("Singapore", na = False)].dropna().drop(["Location"], axis = 1).reset_index(drop = True)

Next, exclude the startups with only one investor.
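
One way to do this, sketched on a hypothetical long-format table with one row per startup-investor pair (the rows and the name `df_multi` are illustrative assumptions, not the notebook's data). The likely motivation is that a startup with a single investor cannot contribute a co-investor pair to the network.

```python
import pandas as pd

# Hypothetical long-format table: one row per (startup, Singaporean investor) pair
df = pd.DataFrame({
    "Organization": ["Alpha", "Alpha", "Beta", "Gamma", "Gamma", "Gamma"],
    "Investors": ["Fund A", "Fund B", "Fund A", "Fund B", "Fund C", "Fund D"],
})

# Count distinct investors per startup and broadcast that count back onto every row
n_investors = df.groupby("Organization")["Investors"].transform("nunique")

# Keep only the startups backed by at least two investors
df_multi = df[n_investors > 1].reset_index(drop=True)
print(df_multi)
```

Using `transform` keeps the result aligned with the original rows, so the count can be used directly as a boolean mask.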