# INDIAN START-UP FUNDING ANALYSIS

# Hypothesis

Null hypothesis: The sector a start-up belongs to did not influence the funding amount the start-up received.

Alternate hypothesis: The sector a start-up belongs to influenced the funding amount the start-up received.
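The notebook imports scipy.stats, but this section does not say which test will decide between the two hypotheses. One possible choice, sketched here as an assumption rather than the author's method, is the Kruskal-Wallis H-test (scipy.stats.kruskal). It is a rank-based alternative to one-way ANOVA that does not assume normally distributed amounts, which suits heavily skewed funding data. The helper name sector_funding_test, the 0.05 significance level and the toy rows are hypothetical; the column names match the cleaned datasets.

```python
import pandas as pd
from scipy import stats


def sector_funding_test(df: pd.DataFrame, alpha: float = 0.05):
    """Kruskal-Wallis H-test: do funding amounts differ across sectors?

    Hypothetical helper; assumes columns "Sector" and "Amount($)".
    Rows with a zero amount (the placeholder this notebook uses for
    undisclosed funding) are excluded so they do not distort the ranks.
    """
    disclosed = df[df["Amount($)"] > 0]
    # One array of funding amounts per sector
    groups = [g["Amount($)"].to_numpy() for _, g in disclosed.groupby("Sector")]
    h_stat, p_value = stats.kruskal(*groups)
    # Reject the null hypothesis (sector has no influence) when p < alpha
    return h_stat, p_value, p_value < alpha


# Made-up data: three sectors with clearly different amounts
toy = pd.DataFrame({
    "Sector": ["Fintech"] * 5 + ["Edtech"] * 5 + ["Health"] * 5,
    "Amount($)": [1, 2, 3, 4, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24],
})
h, p, reject = sector_funding_test(toy)
print(f"H = {h:.2f}, p = {p:.4f}, reject null: {reject}")
```

A small p-value only says the sectors' amount distributions differ; it does not say which sectors differ, so a post-hoc comparison would still be needed.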

# Questions

1. How many start-ups got funded each year?
2. What is the total funding amount received by start-ups each year?
3. Which ten start-ups received the most funding?
4. Which ten start-ups received the least funding?
5. Which ten sectors received the most funding?
6. Which ten sectors received the least funding?
7. Which ten headquarters received the most funding?
8. Which ten headquarters received the least funding?
9. What is the trend of funding received by start-ups in the top 5 headquarters from 2018 to 2021?
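Once the four years are combined into one DataFrame, most of these questions reduce to a group-by followed by a sort. A minimal sketch on made-up rows, assuming the final columns are "Company/Brand", "HeadQuarter", "Sector", "Amount($)" and "Funding Year" (the names produced by the cleaning steps). Whether question 1 should count rows (funding deals) or distinct companies is an open choice, so both are shown.

```python
import pandas as pd

# Made-up rows in the shape of the merged dataset
df = pd.DataFrame({
    "Company/Brand": ["A", "B", "A", "C", "D", "E"],
    "HeadQuarter": ["Bengaluru", "Mumbai", "Bengaluru", "Delhi", "Mumbai", "Pune"],
    "Sector": ["Fintech", "Edtech", "Fintech", "Health", "Edtech", "Retail"],
    "Amount($)": [100, 50, 200, 0, 75, 10],
    "Funding Year": [2018, 2018, 2018, 2019, 2019, 2021],
})

# Q1: funding deals per year, and distinct companies funded per year
deals_per_year = df.groupby("Funding Year").size()
companies_per_year = df.groupby("Funding Year")["Company/Brand"].nunique()

# Q2: total funding per year
total_per_year = df.groupby("Funding Year")["Amount($)"].sum()

# Q3: start-ups ranked by total funding (Q4 uses nsmallest instead)
top_companies = df.groupby("Company/Brand")["Amount($)"].sum().nlargest(10)

# Q5/Q6: sectors ranked by total funding; Q7/Q8 follow the same pattern with "HeadQuarter"
by_sector = df.groupby("Sector")["Amount($)"].sum()
top_sectors = by_sector.nlargest(10)
bottom_sectors = by_sector.nsmallest(10)

print(deals_per_year, companies_per_year, total_per_year, top_sectors, sep="\n\n")
```

Because undisclosed amounts are later stored as 0, those start-ups will sit at the bottom of any "least funding" ranking unless they are filtered out first.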

In [None]:
# Importing libraries

import pandas as pd
import numpy as np
import matplotlib as mp
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
import os

: 

In [None]:
# Loading the datasets

df1 = pd.read_csv('startup_funding2018.csv')
df2 = pd.read_csv('startup_funding2019.csv')
df3 = pd.read_csv('startup_funding2020.csv')
df4 = pd.read_csv('startup_funding2021.csv')

: 

In [None]:
# Evaluating the first dataset

df1.head()

: 

In [None]:
# Evaluating the second dataset

df2.head()

: 

In [None]:
# Evaluating the third dataset

df3.head()

: 

In [None]:
# Evaluating the fourth dataset

df4.head()

: 

# Observations

The first dataset does not have the "Founded", "Founders" and "Investor" columns that appear in the other three datasets. Since these columns do not affect our hypothesis or questions, they will be dropped so that all the datasets can be merged. The "What it does" and "Stage" columns are also irrelevant to the hypothesis and analytical questions, so they will be dropped as well to make the data more manageable and reduce noise. This will be done while cleaning each dataset.

The third dataset has a column named "Unnamed: 9". Its first five rows are empty, and none of the other datasets has such a column. This column will be examined while cleaning the third dataset.

All the datasets share the same column names except the first one, so the columns of the first dataset need to be renamed to match the others. This will be done while cleaning the first dataset.

Each dataset also needs a "Funding Year" column recording the year of funding that each row belongs to. This makes the rows easy to identify once the datasets are merged, and it is needed to answer our analytical questions.
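The plan above (rename the 2018 columns, drop the columns that are not needed, tag each row with its year, then stack the four frames) can be sketched as one reusable function. This is a hedged illustration on two tiny made-up frames, not the notebook's code: harmonize, KEEP and the sample values are hypothetical, while the rename mapping mirrors the one applied to the first dataset. Selecting only the shared columns drops the unwanted ones implicitly.

```python
from typing import Optional

import pandas as pd

# Final column layout shared by all four years
KEEP = ["Company/Brand", "HeadQuarter", "Sector", "Amount($)"]

# Mapping for the 2018 file, whose headers differ from the other years
RENAME_2018 = {
    "Company Name": "Company/Brand",
    "Industry": "Sector",
    "Location": "HeadQuarter",
    "Amount": "Amount($)",
}


def harmonize(frame: pd.DataFrame, year: int, rename: Optional[dict] = None) -> pd.DataFrame:
    """Rename columns, keep only the shared ones, and add a Funding Year column."""
    out = frame.rename(columns=rename or {})[KEEP].copy()
    out["Funding Year"] = year
    return out


# Made-up single-row frames with the raw headers of each file type
raw_2018 = pd.DataFrame({
    "Company Name": ["Alpha"], "Industry": ["Fintech"], "Round/Series": ["Seed"],
    "Amount": ["250000"], "Location": ["Bengaluru"], "About Company": ["Payments app"],
})
raw_2019 = pd.DataFrame({
    "Company/Brand": ["Beta"], "Founded": [2015], "HeadQuarter": ["Mumbai"],
    "Sector": ["Edtech"], "What it does": ["Tutoring"], "Founders": ["X"],
    "Investor": ["Y"], "Amount($)": ["$1,000,000"], "Stage": ["Series A"],
})

combined = pd.concat(
    [harmonize(raw_2018, 2018, RENAME_2018), harmonize(raw_2019, 2019)],
    ignore_index=True,  # fresh 0..n-1 index instead of repeated per-file labels
)
print(combined)
```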

# Cleaning the first dataset

In [None]:
# Evaluating the dataset

df1.head()

: 

In [None]:
# Changing the column names using a dictionary, to fit the column names of the other datasets.

column_dict = {
    'Company Name': 'Company/Brand',
    'Industry': 'Sector',
    'Round/Series': 'Stage',
    'Location': 'HeadQuarter',
    'About Company': 'What it does',
    'Amount': 'Amount($)'
}

df1 = df1.rename(columns=column_dict)
df1.head()

: 

In [None]:
# Dropping the "Stage" and "What it does" columns that are irrelevant to our hypothesis and analytical questions

df1 = df1.drop(["Stage", "What it does"], axis=1)
df1.head()

: 

In [None]:
# Checking the cell values

df1.info()

: 

In [None]:
# Confirming that there are no empty cells

df1.isna().sum()

: 

There are no empty cells in the dataset.

In [None]:
# Checking for duplicate rows

df1.duplicated().sum()

: 

The dataset has one duplicate row.

In [None]:
# Dropping the duplicate row

df1 = df1.drop_duplicates()

# Confirming that the duplicate row has been dropped

df1.duplicated().sum()

: 

The duplicate row has been dropped

In [None]:
df1.info()

: 

After dropping the duplicate row, the number of rows decreased from 526 to 525.

# Checking for data quality

In [None]:
# Rearranging the columns in the dataset to fit the format in the second, third and fourth datasets

df1 = df1.loc[:, ["Company/Brand", "HeadQuarter", "Sector", "Amount($)"]]
df1.head()

: 

In [None]:
# Inspecting the "Company/Brand" column

df1["Company/Brand"].unique()

: 

In [None]:
# Converting the values in the "Company/Brand" column to title case

df1["Company/Brand"] = df1["Company/Brand"].str.title()
df1["Company/Brand"].unique()

: 

In [None]:
# Inspecting the "HeadQuarter" column

df1["HeadQuarter"].unique()

: 

Each cell in the "HeadQuarter" column contains a city, state and country. Only the city is needed, and it is the text before the first comma, so that part of each cell will be kept and the rest removed.
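A small illustration of the split on made-up values of the form "City, State, Country". The extra .str.strip() is an addition not present in the notebook cell; it guards against stray spaces around the city name. Missing values pass through the chain as NaN.

```python
import pandas as pd

locations = pd.Series(["Bangalore, Karnataka, India", " Mumbai, Maharashtra, India", "India", None])

# Keep the text before the first comma, then trim surrounding whitespace
cities = locations.str.split(",").str[0].str.strip()
print(cities.tolist())
```

A value with no comma, such as "India", is returned unchanged, which is why such rows need the separate check that follows.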

In [None]:
# Splitting the "HeadQuarter" column values by comma and keeping only the first part (name of the city)

df1["HeadQuarter"] = df1["HeadQuarter"].str.split(',').str[0]
df1

: 

In [None]:
# Inspecting the "HeadQuarter" column again

df1["HeadQuarter"].unique()

: 

In [None]:
# Identifying the rows that have 'India' in the "HeadQuarter" column.

df1.loc[df1["HeadQuarter"] == 'India']

: 

All of these rows come from the 2018 (first) dataset. Let's examine them in the original 2018 file to see whether any city or state was recorded.
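Because drop_duplicates keeps the original index labels, those labels can be reused to pull all the matching rows from the freshly loaded file in one step, instead of one iloc slice per row (where it is easy to confuse a label with a row position). A sketch on made-up frames, assuming the cleaned frame still carries the default index labels that pd.read_csv assigned; in the notebook, original would be ddf1 and cleaned would be df1.

```python
import pandas as pd

original = pd.DataFrame({
    "Company Name": ["A", "B", "B", "C"],
    "Location": ["Mumbai, Maharashtra, India", "India", "India", "India"],
})

# Simulate the cleaning: drop duplicates, keep the text before the first comma
cleaned = original.drop_duplicates().copy()
cleaned["Location"] = cleaned["Location"].str.split(",").str[0]

# The index labels of the suspicious rows survive the cleaning...
labels = cleaned.index[cleaned["Location"] == "India"]
# ...so .loc fetches exactly those rows from the untouched original
print(original.loc[labels])
```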

In [None]:
# Loading the original 2018 dataset with a different name
ddf1 = pd.read_csv('startup_funding2018.csv')

# Inspecting the 13th row in the original 2018 dataset
ddf1.iloc[12:13]

: 

In [None]:
# Inspecting the 43rd row in the original 2018 dataset
ddf1.iloc[42:43]

: 

In [None]:
# Inspecting the 60th row (index 59) in the original 2018 dataset
ddf1.iloc[59:60]

: 

In [None]:
# Inspecting the 200th row (index 199) in the original 2018 dataset
ddf1.iloc[199:200]

: 

None of these rows has a city or state name in the "Location" (now "HeadQuarter") column, so 'India' will be replaced with 'N/A'. Keeping 'India' as a headquarters would be a misleading generalization, since every other location in the column is also in India.
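A caution on the replace calls used throughout this notebook: with regex=True, pandas treats each string as a pattern and substitutes it wherever it occurs inside a value, not only when the whole cell matches. The sketch below shows the effect on a made-up Series. Exact whole-cell replacement (the default, regex=False) with a dictionary avoids partial rewrites such as "Bangalore City" becoming "Bengaluru City".

```python
import pandas as pd

hq = pd.Series(["Bangalore", "Bangalore City", "India", "Mumbai"])

# Substring (regex) replacement: "Bangalore" is rewritten inside "Bangalore City" first
regex_result = hq.replace(["Bangalore", "Bangalore City"], "Bengaluru", regex=True)

# Exact whole-cell replacement via a mapping
exact_result = hq.replace({"Bangalore": "Bengaluru", "Bangalore City": "Bengaluru", "India": "N/A"})

print(regex_result.tolist())
print(exact_result.tolist())
```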

In [None]:
# Replacing 'India' with 'N/A'.
df1["HeadQuarter"] = df1["HeadQuarter"].replace('India', 'N/A', regex=True)

: 

In [None]:
# Replacing Bangalore City and Bangalore with Bengaluru, which is the official name of the city.
# 'Bangalore City' is listed first so that the shorter pattern cannot leave 'Bengaluru City' behind.
df1["HeadQuarter"] = df1["HeadQuarter"].replace(['Bangalore City', 'Bangalore'], 'Bengaluru', regex=True)
df1["HeadQuarter"].unique()

: 

In [None]:
# Inspecting the "Sector" column

df1["Sector"].unique()

: 

Unlike the "Sector" column in the other datasets, the "Sector" column in the first dataset lists several comma-separated categories for each company. Only the category that best describes each company should be kept.

The first category in each cell generally describes the sector best, so it will be kept and the remaining categories dropped.

In [None]:
# Splitting the "Sector" column values by comma and keeping only the first category

df1["Sector"] = df1["Sector"].str.split(',').str[0]
df1

: 

In [None]:
# Inspecting the "Sector" column again

df1["Sector"].unique()

: 

In [None]:
# Replacing the dash symbol in the "Sector" column with N/A

df1["Sector"]= df1["Sector"].replace('—','N/A')
df1["Sector"].unique()

: 

In [None]:
# Inspecting the "Amount($)" column

df1["Amount($)"].unique()

: 

The "Amount($)" column contains amounts in rupees, amounts in dollars, and dash symbols. Since the amounts in the other datasets are in dollars, the rupee amounts will be converted to dollars using the average 2018 rupee-to-dollar exchange rate, which gives a reasonable approximation of each amount's dollar value.

Only the rupee amounts need to be converted. Because they contain the rupee symbol and commas, they are stored as strings and must be turned into numbers before they can be multiplied by the exchange rate. We will first remove the non-numerical characters (the rupee symbol and the comma) and then apply .astype("int64"). The lambda function only edits values that are strings and passes any other value through unchanged, so no NaN values are introduced.
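The conversion described above can also be written as a single element-wise parser that handles the rupee symbol, the dollar symbol, thousands separators and the dash placeholder together. A sketch under the notebook's own assumptions (an average 2018 rate of 0.0146 dollars per rupee, and a dash treated as 0); amount_to_usd and the sample values are hypothetical. It returns floats, so round or cast to integer afterwards if whole dollars are wanted.

```python
import pandas as pd

INR_TO_USD_2018 = 0.0146  # average 2018 rate used in the notebook (1 rupee = 0.0146 dollar)


def amount_to_usd(value, inr_to_usd: float = INR_TO_USD_2018) -> float:
    """Parse one raw 2018 amount into US dollars (hypothetical helper)."""
    if pd.isna(value):
        return 0.0
    text = str(value).strip()
    if text in ("", "—"):  # dash means no amount recorded; the notebook stores 0
        return 0.0
    is_rupee = "₹" in text
    # Drop currency symbols and thousands separators (Indian grouping included)
    number = float(text.replace("₹", "").replace("$", "").replace(",", ""))
    return number * inr_to_usd if is_rupee else number


raw = pd.Series(["₹1,00,000", "$250,000", "—", "40,000,000"])
usd = raw.map(amount_to_usd)
print(usd.round(2).tolist())
```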

In [None]:
# Define the average exchange rate in 2018
exchange_rate = 0.0146  # That is, 1 Rupee = 0.0146 Dollar

# Select the rows with rupee amounts
rupee_rows = df1["Amount($)"].str.contains('₹') # Note that only the rupee amounts should be converted 

# Remove the non-numerical values and change the datatype to integer
df1.loc[rupee_rows, "Amount($)"] = df1.loc[rupee_rows, "Amount($)"].apply(lambda x: x.replace('₹', '').replace(',', '')
                                if isinstance(x, str) else x).astype("int64")

# Convert the rupee amounts to dollars
df1.loc[rupee_rows, "Amount($)"] = df1.loc[rupee_rows, "Amount($)"] * exchange_rate

df1

: 

# Handling the dash symbols

In [None]:
# Counting the number of cells containing the dash symbol

(df1["Amount($)"] == '—').sum()

: 

It will be assumed that the funding amounts for these 148 companies/brands were either lost during data collection or not disclosed. The dash symbols will be replaced with 0 so that the column can be converted to integer.
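Replacing undisclosed amounts with 0 makes the integer conversion easy, but those zeros are then counted as real deals worth nothing, which pulls averages down. A hedged alternative, shown on made-up values: pd.to_numeric with errors="coerce" turns the placeholders into NaN, which pandas aggregations such as mean skip by default. The trade-off is that the column becomes float rather than int64.

```python
import pandas as pd

raw = pd.Series(["$100", "$200", "—"])
cleaned = raw.str.replace("$", "", regex=False).str.replace(",", "", regex=False)

as_zero = pd.to_numeric(cleaned.replace("—", "0"))    # the notebook's approach
as_missing = pd.to_numeric(cleaned, errors="coerce")  # dash becomes NaN

print(as_zero.mean(), as_missing.mean())  # 100.0 150.0
print(as_missing.count(), "disclosed amounts")
```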

In [None]:
# Replacing the dash symbol with zero and investigating the column again

df1["Amount($)"] = df1["Amount($)"].replace('—', 0, regex=True)
df1["Amount($)"].unique()

: 

Some cells still have the dollar and comma symbols which are non-numeric characters. These non-numeric characters need to be removed.

In [None]:
# Removing the dollar and comma symbols in the entire Amount($) column

df1["Amount($)"] = df1["Amount($)"].replace([r'\$', ','], '', regex=True)
df1["Amount($)"].unique()

: 

In [None]:
# Converting the datatype of the Amount($) column to integer

df1["Amount($)"] = df1["Amount($)"].astype("int64")
df1["Amount($)"].unique()

: 

In [None]:
# Evaluating the entire dataset

df1.head()

: 

In [None]:
# Confirming that there are no empty cells

df1.isna().sum()

: 

There are no empty cells in our dataset, and the columns have been inspected and cleaned.

Next is to add the "Funding Year" column.

In [None]:
# Creating a new column named "Funding Year" and filling it with 2018.

df1["Funding Year"] = 2018
df1.head()

: 

# Cleaning the second data set

In [None]:
# Evaluating the dataset

df2

: 

In [None]:
# Dropping the "Founded", "What it does", "Founders", "Investor" and "Stage" columns

df2 = df2.drop(["Founded", "What it does", "Founders", "Investor", "Stage"], axis =1)
df2.head()

: 

In [None]:
# Checking the cell values

df2.info()

: 

In [None]:
# Confirming the number of empty cells

df2.isna().sum()

: 

There are nineteen empty cells in the "HeadQuarter" column and five empty cells in the "Sector" column. These will be handled when inspecting the columns.

In [None]:
# Checking for duplicate rows

df2.duplicated().sum()

: 

There are no duplicate rows in the dataset.

# Checking the data quality

In [None]:
# Inspecting the "Company/Brand" column

df2["Company/Brand"].unique()

: 

In [None]:
# Changing the name "Fireflies .ai" to "Fireflies.ai"

df2["Company/Brand"]= df2["Company/Brand"].replace('Fireflies .ai','Fireflies.ai')
df2["Company/Brand"].unique()

: 

In [None]:
# Inspecting the "HeadQuarter" column

df2["HeadQuarter"].unique()

: 

In [None]:
# Filling the empty cells in the "HeadQuarter" column with N/A

df2["HeadQuarter"]= df2["HeadQuarter"].fillna('N/A')
df2["HeadQuarter"].unique()

: 

In [None]:
# Replacing Bangalore with Bengaluru

df2["HeadQuarter"] = df2["HeadQuarter"].replace('Bangalore', 'Bengaluru', regex=True)
df2["HeadQuarter"].unique()

: 

In [None]:
# Replacing Uttar pradesh with Uttar Pradesh

df2["HeadQuarter"] = df2["HeadQuarter"].replace('Uttar pradesh', 'Uttar Pradesh', regex=True)
df2["HeadQuarter"].unique()

: 

In [None]:
# Inspecting the "Sector" column

df2["Sector"].unique()

: 

In [None]:
# Converting the values in the "Sector" column to title case

df2["Sector"] = df2["Sector"].str.title()
df2["Sector"].unique()

: 

In [None]:
# Filling the empty cells in the "Sector" column with N/A

df2["Sector"]= df2["Sector"].fillna('N/A')
df2["Sector"].unique()

: 

In [None]:
# Inspecting the "Amount($)" column

df2["Amount($)"].unique()

: 

In [None]:
# Removing the dollar and comma symbols in the Amount($) column

df2["Amount($)"] = df2["Amount($)"].replace([r'\$', ','], '', regex=True)
df2["Amount($)"].unique()

: 

In [None]:
# Replacing Undisclosed in the "Amount($)" column with zero, in order to change the datatype to integer

df2["Amount($)"] = df2["Amount($)"].replace('Undisclosed', 0, regex=True)
df2["Amount($)"].unique()

: 

In [None]:
# Converting the datatype of the Amount($) column to integer

df2["Amount($)"] = df2["Amount($)"].astype("int64")
df2["Amount($)"].unique()

: 

In [None]:
# Evaluating the entire dataset

df2.head()

: 

In [None]:
# Confirming that there are no empty cells

df2.isna().sum()

: 

There are no empty cells in our dataset, and the columns have been inspected and cleaned.

Next is to add the "Funding Year" column.

In [None]:
# Creating a new column named "Funding Year" and filling it with 2019

df2["Funding Year"] = 2019
df2.head()

: 

# Cleaning the third data set

In [None]:
# Evaluating the third dataset

df3.head()

: 

In [None]:
# Dropping the "Founded", "What it does", "Founders", "Investor" and "Stage" columns

df3 = df3.drop(["Founded", "What it does", "Founders", "Investor", "Stage"], axis =1)
df3.head()

: 

In [None]:
# Checking the cell values

df3.info()

: 

In [None]:
# Confirming the number of empty cells

df3.isna().sum()

: 

There are ninety-four empty cells in the "HeadQuarter" column, thirteen in the "Sector" column, three in the "Amount($)" column and one thousand and fifty-three in the "Unnamed: 9" column. The empty cells in the "HeadQuarter" and "Sector" columns will be filled with N/A, and those in the "Amount($)" column with zero, when each column is inspected.

The "Unnamed: 9" column has values in only two cells, is absent from the other datasets, and has no bearing on our hypothesis or questions. For these reasons, it will be dropped.

In [None]:
# Dropping the "Unnamed: 9" column

df3 = df3.drop(["Unnamed: 9"], axis =1)
df3.head()

: 

In [None]:
# Re-checking the empty cells after dropping the "Unnamed: 9" column

df3.isna().sum()

: 

In [None]:
# Checking for duplicate rows

df3.duplicated().sum()

: 

In [None]:
# Dropping the duplicate rows

df3 = df3.drop_duplicates()

# Confirming that the duplicate rows have been dropped

df3.duplicated().sum()

: 

In [None]:
df3.info()

: 

After dropping the duplicate rows, the number of rows reduced from 1055 to 1048.

# Checking the data quality

In [None]:
# Inspecting the "Company/Brand" column

df3["Company/Brand"].unique()

: 

In [None]:
# Inspecting the "HeadQuarter" column

df3["HeadQuarter"].unique()

: 

Some cells in the "HeadQuarter" column contain a city, state and country. Only the city is needed, and it is the text before the first comma, so that part of each cell will be kept and the rest removed.

In [None]:
# Splitting the "HeadQuarter" column values by comma and keeping only the first part (name of the city)

df3["HeadQuarter"] = df3["HeadQuarter"].str.split(',').str[0]
df3["HeadQuarter"].unique()

: 

In [None]:
# Filling the empty cells in the "HeadQuarter" column with N/A

df3["HeadQuarter"] = df3["HeadQuarter"].fillna('N/A')
df3["HeadQuarter"].unique()

: 

In [None]:
# Replacing Bangalore and Banglore with Bengaluru

df3["HeadQuarter"] = df3["HeadQuarter"].replace(['Bangalore', 'Banglore'], 'Bengaluru', regex=True)
df3["HeadQuarter"].unique()

: 

In [None]:
# Inspecting the "Sector" column

df3["Sector"].unique()

: 

In [None]:
# Filling the empty cells in the "Sector" column with N/A

df3["Sector"] = df3["Sector"].fillna('N/A')
df3["Sector"].unique()

: 

In [None]:
# Inspecting the "Amount($)" column

df3["Amount($)"].unique()

: 

In [None]:
# Replacing Undisclosed, Undiclsosed and Undislosed in the "Amount($)" column with zero

df3["Amount($)"] = df3["Amount($)"].replace(['Undisclosed', 'Undiclsosed', 'Undislosed'], 0, regex=True)
df3["Amount($)"].unique()

: 

In [None]:
# Filling the empty cells in the "Amount($)" column with zero

df3["Amount($)"] = df3["Amount($)"].fillna(0)
df3["Amount($)"].unique()

: 

In [None]:
# Removing the dollar and comma symbols in the Amount($) column

df3["Amount($)"] = df3["Amount($)"].replace([r'\$', ','], '', regex=True)
df3["Amount($)"].unique()

: 

At this point, the "Amount($)" column still cannot be converted to integer because some of its values are not whole numbers written as digits. To find them, a boolean mask will mark every value whose string form is not made up entirely of digits (each value is converted with str(x) first, so numbers such as 0 are tested the same way as strings). The mask is then used to build a new DataFrame containing only those rows.
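The isdigit test flags anything containing a dot, a space or a sign, including a float such as 8700000.0, whose string form is "8700000.0". A hedged complement, shown on made-up values, is to let pd.to_numeric with errors="coerce" attempt the conversion and list only the values it cannot parse at all. Note that to_numeric happily accepts "9.600000" as 9.6, so values with suspicious magnitudes still need the manual judgment the notebook applies.

```python
import pandas as pd

amounts = pd.Series(["1000000", 0, "887000 23000000", "9.600000", 8700000.0])

# Approach in the notebook: anything whose string form is not all digits
not_digits = amounts[~amounts.astype(str).str.isdigit()]

# Complement: values pandas cannot interpret as a number at all
parsed = pd.to_numeric(amounts, errors="coerce")
unparseable = amounts[parsed.isna() & amounts.notna()]

print(not_digits.tolist())
print(unparseable.tolist())
```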

In [None]:
# Creating a boolean mask to identify non-integer values in Amount($) column
non_int_mask = df3["Amount($)"].apply(lambda x: not str(x).isdigit())

# Creating a new DataFrame with only the non-integer values
non_int_df3_amounts = df3[non_int_mask]
non_int_df3_amounts

: 

Now that the non-integer values have been identified, each one will be corrected and converted to an integer value.

In [None]:
# Replacing the amounts "887000 23000000" and "800000000 to 850000000" with the average of each pair

df3["Amount($)"] = df3["Amount($)"].replace("887000 23000000", np.mean([887000, 23000000]), regex=True)
df3["Amount($)"] = df3["Amount($)"].replace("800000000 to 850000000", np.mean([800000000, 850000000]), regex=True)

: 

In [None]:
# Checking the datatype of 8700000 in Amount($)

type(df3.loc[548, "Amount($)"]) # 8700000 has the index label 548

: 

Since the datatype of 8700000 is a string, it will be converted to an integer easily when converting the entire column to integer.

In [None]:
# Removing the misplaced decimal points in 9.600000 and 42.23000 (the dots are escaped so they match literally)

df3["Amount($)"] = df3["Amount($)"].replace(r'9\.600000', '9600000', regex=True)
df3["Amount($)"] = df3["Amount($)"].replace(r'42\.23000', '4223000', regex=True)

: 

In [None]:
# Converting the datatype of the Amount($) column to integer

df3["Amount($)"] = df3["Amount($)"].astype("int64")
df3["Amount($)"].unique()

: 

In [None]:
# Evaluating the entire dataset

df3.head()

: 

In [None]:
# Confirming that there are no empty cells

df3.isna().sum()

: 

There are no empty cells in our dataset, and the columns have been inspected and cleaned.

Next is to add the "Funding Year" column.

In [None]:
# Creating a new column named "Funding Year" and filling it with 2020

df3["Funding Year"] = 2020
df3.head()

: 

# Cleaning the fourth data set

In [None]:
# Evaluating the fourth dataset

df4.head()

: 

In [None]:
# Dropping the "Founded", "What it does", "Founders", "Investor" and "Stage" columns

df4 = df4.drop(["Founded", "What it does", "Founders", "Investor", "Stage"], axis =1)
df4.head()

: 

In [None]:
# Checking the cell values

df4.info()

: 

In [None]:
# Confirming the number of empty cells

df4.isna().sum()

: 

The "HeadQuarter" column has one empty cell, which will be filled with N/A, and the "Amount($)" column has three empty cells, which will be filled with zero. Both will be handled when the columns are inspected.

In [None]:
# Checking for duplicate rows

df4.duplicated().sum()

: 

The dataset has thirty-one duplicate rows.

In [None]:
# Dropping the duplicate rows

df4 = df4.drop_duplicates()

# Confirming that the duplicate rows have been dropped

df4.duplicated().sum()

: 

In [None]:
df4.info()

: 

After dropping the duplicate rows, the number of rows reduced from 1209 to 1178.

# Checking the data quality

In [None]:
# Inspecting the "Company/Brand" column

df4["Company/Brand"].unique()

: 

In [None]:
# Inspecting the "HeadQuarter" column

df4["HeadQuarter"].unique()

: 

In [None]:
# Filling the empty cells in the "HeadQuarter" column with N/A

df4["HeadQuarter"]= df4["HeadQuarter"].fillna('N/A')
df4["HeadQuarter"].unique()

: 

In [None]:
# Replacing 'Small Towns, Andhra Pradesh' with 'Andhra Pradesh'

df4["HeadQuarter"] = df4["HeadQuarter"].replace('Small Towns, Andhra Pradesh', 'Andhra Pradesh', regex=True)
df4["HeadQuarter"].unique()

: 

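Some values in the "HeadQuarter" column are not locations: they are sector names or contain the spreadsheet error "#REF!", which suggests that the cells in those rows were shifted into the wrong columns. These rows will be examined one by one.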
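Rather than scanning the unique values by eye, candidate misaligned rows can also be flagged programmatically: a "HeadQuarter" value that also appears in the "Sector" column, or that contains the spreadsheet error marker #REF!, is a likely sign of shifted cells. A sketch on made-up rows; both rules are heuristics assumed here, so every flagged row should still be checked against the original file, as the notebook does.

```python
import pandas as pd

df = pd.DataFrame({
    "Company/Brand": ["A", "B", "C", "D"],
    "HeadQuarter": ["Mumbai", "Computer Games", "Gurugram\t#REF!", "Pune"],
    "Sector": ["Computer Games", "Edtech", "Fintech", "Fintech"],
})

# Rule 1: the "location" is actually a value seen in the Sector column
known_sectors = set(df["Sector"].dropna())
looks_like_sector = df["HeadQuarter"].isin(known_sectors)

# Rule 2: the cell carries a spreadsheet reference error
has_ref_error = df["HeadQuarter"].str.contains("#REF!", regex=False, na=False)

suspect = df[looks_like_sector | has_ref_error]
print(suspect)
```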

In [None]:
# Identifying the rows that have 'Computer Games' on the "HeadQuarter" Column

df4.loc[df4["HeadQuarter"] == 'Computer Games']

: 

The values in this row appear to be in the wrong columns. Because some columns have already been dropped, the original dataset will be reloaded under a different variable name so that the full row can be seen with all its original columns.

In [None]:
# loading the original dataset using a different variable name

ddf4 = pd.read_csv('startup_funding2021.csv')

: 

In [None]:
# Finding the row in the original dataset with 'Computer Games' in the "HeadQuarter" column

ddf4.loc[ddf4["HeadQuarter"] == 'Computer Games']

: 

This row appears twice in the original dataset; its duplicate was dropped earlier.

The "HeadQuarter" value is unknown and will be filled with 'N/A'. The "Sector" value is 'Computer Games', while the Amount value is $1200000.

In [None]:
# Filling in the correct values in the row with 'Computer Games' in the "HeadQuarter" column

df4.loc[df4["Company/Brand"] == "FanPlay", ["HeadQuarter","Amount($)"]] = ['N/A','$1200000',]
df4.loc[df4["Company/Brand"] == "FanPlay"]

: 

In [None]:
# Identifying the rows that have 'Food & Beverages' on the "HeadQuarter" Column

df4.loc[df4["HeadQuarter"] == 'Food & Beverages']

: 

The "HeadQuarter" value should be 'Hauz Khas', and the "Sector" value should be 'Food & Beverages'.

In [None]:
# Filling in the correct values for the "HeadQuarter" and "Sector" columns

df4.loc[df4["Company/Brand"] == "MasterChow", ['HeadQuarter',"Sector"]] = ['Hauz Khas','Food & Beverages']
df4.loc[df4["Company/Brand"] == "MasterChow"]

: 

In [None]:
# Identifying the rows that have 'Pharmaceuticals\t#REF!' on the "HeadQuarter" Column

df4.loc[df4["HeadQuarter"] == 'Pharmaceuticals\t#REF!']

: 

In [None]:
# Finding the row in the original dataset with 'Pharmaceuticals\t#REF!' in the "HeadQuarter" column

ddf4.loc[ddf4["HeadQuarter"] == 'Pharmaceuticals\t#REF!']

: 

This row appears twice in the original dataset; its duplicate was dropped earlier.

The "HeadQuarter" value is unknown and will be filled with 'N/A'. The "Sector" value is 'Pharmaceuticals', while the Amount value is $22000000.

In [None]:
# Filling in the correct values in the row with 'Pharmaceuticals\t#REF!' in the "HeadQuarter" column

df4.loc[df4["Company/Brand"] == "Fullife Healthcare", ["HeadQuarter", "Sector", "Amount($)"]] = ['N/A', 'Pharmaceuticals', '$22000000',]
df4.loc[df4["Company/Brand"] == "Fullife Healthcare"]

: 

In [None]:
# Identifying the rows that have 'Gurugram\t#REF!' on the "HeadQuarter" Column

df4.loc[df4["HeadQuarter"] == 'Gurugram\t#REF!']

: 

In [None]:
# Finding the row in the original dataset with 'Gurugram\t#REF!' in the "HeadQuarter" column

ddf4.loc[ddf4["HeadQuarter"] == 'Gurugram\t#REF!']

: 

The "HeadQuarter" value is 'Gurugram', the "Sector" value is unknown and will be filled with 'N/A' and the "Amount" column value is $5000000.

In [None]:
# Filling in the correct values in the row with 'Gurugram\t#REF!' in the "HeadQuarter" column

df4.loc[df4["Company/Brand"] == "MoEVing", ["HeadQuarter", "Sector", "Amount($)"]] = ['Gurugram', 'N/A', '$5000000']
df4.loc[df4["Company/Brand"] == "MoEVing"]

: 

In [None]:
# Identifying the rows that have 'Online Media\t#REF!' on the "HeadQuarter" Column

df4.loc[df4["HeadQuarter"] == 'Online Media\t#REF!']

: 

In [None]:
# Finding the row in the original dataset with 'Online Media\t#REF!' in the "HeadQuarter" column

ddf4.loc[ddf4["HeadQuarter"] == 'Online Media\t#REF!']

: 

The "HeadQuarter" value is unknown and will be filled with 'N/A', the "Sector" value is Online Media, and the "Amount" column value is undisclosed and will be filled with zero.

In [None]:
# Filling in the correct values in the row with 'Online Media\t#REF!' in the "HeadQuarter" column

df4.loc[df4["Company/Brand"] == "Sochcast", ["HeadQuarter", "Sector", "Amount($)"]] = ['N/A', 'Online Media', '0']
df4.loc[df4["Company/Brand"] == "Sochcast"]

: 

In [None]:
# Identifying the rows that have 'Information Technology & Services' on the "HeadQuarter" Column

df4.loc[df4["HeadQuarter"] == 'Information Technology & Services']

: 

In [None]:
# Filling in the correct values in the row with 'Information Technology & Services' in the "HeadQuarter" column

df4.loc[df4["Company/Brand"] == "Peak", ["HeadQuarter","Sector"]] = ['Manchester','Information Technology & Services']
df4.loc[df4["Company/Brand"] == "Peak"]

: 

In [None]:
# Inspecting the "HeadQuarter" column again

df4["HeadQuarter"].unique()

: 

In [None]:
# Replacing Bangalore with Bengaluru

df4["HeadQuarter"] = df4["HeadQuarter"].replace('Bangalore', 'Bengaluru', regex=True)
df4["HeadQuarter"].unique()

: 

In [None]:
# Inspecting the "Sector" column

df4["Sector"].unique()

: 

In [None]:
# Replacing Helathcare with Healthcare

df4["Sector"] = df4["Sector"].replace('Helathcare', 'Healthcare', regex=True)

: 

In [None]:
# Replacing Aeorspace with Aerospace

df4["Sector"] = df4["Sector"].replace('Aeorspace', 'Aerospace', regex=True)

: 

In [None]:
# Inspecting the "Amount($)" column

df4["Amount($)"].unique()

: 

In [None]:
# Removing the dollar and comma symbols in the Amount($) column

df4["Amount($)"] = df4["Amount($)"].replace([r'\$', ','], '', regex=True)
df4["Amount($)"].unique()

: 

In [None]:
# Creating a boolean mask to identify non-integer values in Amount($) column
non_int_mask = df4["Amount($)"].apply(lambda x: not str(x).isdigit())

# Creating a new DataFrame with only the non-integer values
non_int_df4_amounts = df4[non_int_mask]

: 

In [None]:
# Inspecting the values in the "Amount($)" column of the non-integer DataFrame

non_int_df4_amounts["Amount($)"].unique()

: 

In [None]:
# Replacing 'Undisclosed', 'undisclosed' and '' with zero

df4["Amount($)"] = df4["Amount($)"].replace(['Undisclosed', 'undisclosed', ''], 0, regex=True)

: 

In [None]:
# Filling the empty cells in the "Amount($)" column with zero

df4["Amount($)"] = df4["Amount($)"].fillna(0)
df4["Amount($)"].unique()

: 

In [None]:
# Replacing ' 5000000' with 5000000

df4["Amount($)"] = df4["Amount($)"].replace(' 5000000', 5000000, regex=True)

: 

In [None]:
# Identifying the rows that have 'ah! Ventures' on the "Amount($)" Column on the original dataset

ddf4.loc[ddf4["Amount($)"] == 'ah! Ventures']

: 

In [None]:
# Filling in the correct values in the row with 'ah! Ventures' in the "Amount($)" column of df4

df4.loc[df4["Company/Brand"] == "Little Leap", "Amount($)"] = 300000
df4.loc[df4["Company/Brand"] == "Little Leap"]

: 

In [None]:
# Identifying the rows that have 'Pre-series A' on the "Amount($)" Column on the original dataset

ddf4.loc[ddf4["Amount($)"] == 'Pre-series A']

: 

In [None]:
# Filling in the correct values in the row with 'Pre-series A' in the "Amount($)" column of df4

df4.loc[df4["Company/Brand"] == "AdmitKard", "Amount($)"] = 1000000
df4.loc[df4["Company/Brand"] == "AdmitKard"]

: 

In [None]:
# Creating another boolean mask to identify the remaining non-integer values in Amount($) column
non_int_mask1 = df4["Amount($)"].apply(lambda x: not str(x).isdigit())

# Creating a new DataFrame with only the non-integer values
non_int_df4_amounts1 = df4[non_int_mask1]
non_int_df4_amounts1

: 

In [None]:
# Identifying the rows that have 'ITO Angel Network LetsVenture' on df4

df4.loc[df4["Amount($)"] == 'ITO Angel Network LetsVenture']

: 

In [None]:
# Identifying the rows that have 'BHyve' on the "Company/Brand" Column on the original dataset

ddf4.loc[ddf4["Company/Brand"] == 'BHyve']

: 

In [None]:
# Filling in the correct values in the row with 'ITO Angel Network LetsVenture' in the "Amount($)" column of df4

df4.loc[df4["Company/Brand"] == "BHyve", "Amount($)"] = 300000
df4.loc[df4["Company/Brand"] == "BHyve"]

: 

In [None]:
# Identifying the rows that have 'JITO Angel Network LetsVenture' on df4

df4.loc[df4["Amount($)"] == 'JITO Angel Network LetsVenture']

: 

In [None]:
# Identifying the rows that have 'Saarthi Pedagogy' on the "Company/Brand" Column on the original dataset

ddf4.loc[ddf4["Company/Brand"] == 'Saarthi Pedagogy']

: 

In [None]:
# Filling in the correct values in the row with 'JITO Angel Network LetsVenture' in the "Amount($)" column of df4

df4.loc[df4["Company/Brand"] == "Saarthi Pedagogy", "Amount($)"] = 1000000
df4.loc[df4["Company/Brand"] == "Saarthi Pedagogy"]

: 

In [None]:
# Identifying the rows that have 'Seed' on the "Amount($)" Column on the original dataset

ddf4.loc[ddf4["Amount($)"] == 'Seed']

: 

In [None]:
# Filling in the correct values in the rows with 'Seed' in the "Amount($)" column of df4

df4.loc[df4["Company/Brand"] == "MoEVing", "Amount($)"] = 5000000
df4.loc[df4["Company/Brand"] == "Godamwale", "Amount($)"] = 1000000

: 

In [None]:
# Converting the datatype of the Amount($) column to integer

df4["Amount($)"] = df4["Amount($)"].astype("int64")
df4["Amount($)"].unique()

: 

In [None]:
# Evaluating the entire dataset

df4.head()

: 

In [None]:
# Confirming that there are no empty cells

df4.isna().sum()

: 

There are no empty cells in our dataset, and the columns have been inspected and cleaned.

Next is to add the "Funding Year" column.

In [None]:
# Creating a new column named "Funding Year" and filling it with 2021

df4["Funding Year"] = 2021
df4.head()

: 

# Merging the datasets

In [None]:
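# Using pd.concat to stack the four datasets into one DataFrame

sets = [df1, df2, df3, df4]
df = pd.concat(sets, ignore_index=True)  # renumber the rows so that index labels are unique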
df

: 

# Performing EDA

In [None]:
df.info()

: 

There are no empty cells in the dataset.

In [None]:
# Inspecting the "HeadQuarter" column thoroughly

df["HeadQuarter"].unique()

: 

In [None]:
# changing 'Gurgaon' to 'Gurugram' since Gurugram is the official name.
df["HeadQuarter"] = df["HeadQuarter"].replace('Gurgaon', 'Gurugram', regex=True)

# changing 'New Delhi' to 'Delhi' since it's the same city.
df["HeadQuarter"] = df["HeadQuarter"].replace('New Delhi', 'Delhi', regex=True)

# Changing 'Bengaluru City' to 'Bengaluru', since it's the same city.
df["HeadQuarter"] = df["HeadQuarter"].replace('Bengaluru City', 'Bengaluru', regex=True)

# Changing 'Telugana' to 'Telangana', since Telangana is the official name.
df["HeadQuarter"] = df["HeadQuarter"].replace('Telugana', 'Telangana', regex=True)

# Changing 'Faridabad, Haryana' to 'Faridabad'.
df["HeadQuarter"] = df["HeadQuarter"].replace('Faridabad, Haryana', 'Faridabad', regex=True)

# Changing 'Irvine', 'San Francisco', 'San Ramon', 'San Francisco Bay Area', 'San Franciscao' (misspelling)
# and 'Mountain View, CA' to California, since they are all in California.
df["HeadQuarter"] = df["HeadQuarter"].replace(['Irvine', 'San Francisco', 'San Ramon', 'San Francisco Bay Area', 'San Franciscao', 'Mountain View, CA'], 'California', regex=True)

# Changing 'Frisco' and 'Plano' to 'Texas', since they are both in Texas.
df["HeadQuarter"] = df["HeadQuarter"].replace(['Frisco', 'Plano'], 'Texas', regex=True)

# Changing the misspellings 'Samsitpur' and 'Samstipur' to 'Samastipur'.
df["HeadQuarter"] = df["HeadQuarter"].replace(['Samsitpur', 'Samstipur'], 'Samastipur', regex=True)

# Changing 'France' to 'Paris', since the "HeadQuarter" column carries names of cities, not countries.
df["HeadQuarter"] = df["HeadQuarter"].replace('France', 'Paris', regex=True)

# Changing 'Milano' (the Italian spelling) to 'Milan', the English name of the city.
df["HeadQuarter"] = df["HeadQuarter"].replace('Milano', 'Milan', regex=True)

# Changing 'Orissia' to 'Odisha', since Odisha is the official name.
df["HeadQuarter"] = df["HeadQuarter"].replace('Orissia', 'Odisha', regex=True)

df["HeadQuarter"].unique()

: 

In [None]:
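# The substring replacement above left 'California Bay Area' behind (from 'San Francisco Bay Area');
# collapsing it to 'California' as well.
df["HeadQuarter"] = df["HeadQuarter"].replace('California Bay Area', 'California', regex=True)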

df["HeadQuarter"].unique()

: 
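Because `regex=True` matches substrings, one pattern can rewrite part of a longer value, which is why the extra 'California Bay Area' step was needed. A minimal sketch of an alternative, assuming whole-value matching is what is wanted: collect the fixes in a dictionary and call `replace` without `regex`, which only replaces exact values. The series `hq_sample` and the dictionary `hq_fixes` are small hypothetical examples, not the notebook's data.

```python
import pandas as pd

hq_sample = pd.Series(["San Francisco", "San Francisco Bay Area", "Gurgaon", "New Delhi"])

# Substring (regex) replacement: 'San Francisco' also matches inside
# 'San Francisco Bay Area' and leaves 'California Bay Area' behind.
substring_result = hq_sample.replace("San Francisco", "California", regex=True)
print(substring_result.tolist())

# Exact-value replacement (regex=False is the default): a key must match the whole value.
hq_fixes = {
    "San Francisco": "California",
    "San Francisco Bay Area": "California",
    "Gurgaon": "Gurugram",
    "New Delhi": "Delhi",
}
exact_result = hq_sample.replace(hq_fixes)
print(exact_result.tolist())
```

The trade-off is that exact matching will not catch variants that carry extra text or whitespace (for example 'Faridabad, Haryana'), so every variant has to be listed explicitly.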

The HeadQuarter column mixes the names of states with the names of cities inside those states. For example, Uttar Pradesh appears as a value, while Noida, Kanpur and Ghaziabad, which are cities in Uttar Pradesh, appear as separate values. As a result, grouping by HeadQuarter would understate the total funding for Uttar Pradesh, and the same problem affects several other states.

To handle this, each city will be replaced with the name of the state it is in. Aggregating funding at the state level then counts every start-up located in a state, which makes the comparison between locations more valid.
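A small worked example makes the effect concrete. The rows and amounts below are invented for illustration, and `city_to_state` is a hypothetical four-entry excerpt of the mapping idea, not the notebook's full dictionary.

```python
import pandas as pd

# Hypothetical funding rows; the amounts are invented purely for illustration.
toy_funding = pd.DataFrame({
    "HeadQuarter": ["Uttar Pradesh", "Noida", "Kanpur", "Ghaziabad", "Mumbai"],
    "Amount": [10.0, 20.0, 5.0, 15.0, 40.0],
})

# Without mapping, only rows labelled exactly "Uttar Pradesh" count towards the state.
before = toy_funding.groupby("HeadQuarter")["Amount"].sum()

# Tiny excerpt of the city-to-state idea; unknown locations are kept as they are.
city_to_state = {
    "Noida": "Uttar Pradesh",
    "Kanpur": "Uttar Pradesh",
    "Ghaziabad": "Uttar Pradesh",
    "Mumbai": "Maharashtra",
}
mapped_hq = toy_funding["HeadQuarter"].map(lambda loc: city_to_state.get(loc, loc))
after = toy_funding.assign(HeadQuarter=mapped_hq).groupby("HeadQuarter")["Amount"].sum()

print(before["Uttar Pradesh"])  # 10.0
print(after["Uttar Pradesh"])   # 50.0
```

In this toy data, Uttar Pradesh's total rises from 10 to 50 once its cities are folded in.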

In [None]:
# Creating a dictionary called "state_mapping" that maps each city to the state it is in.

state_mapping = {
    'Ahmedabad': 'Gujarat',
    'Alwar': 'Rajasthan',
    'Alleppey': 'Kerala',
    'Ambernath': 'Maharashtra',
    'Anand': 'Gujarat',
    'Andheri': 'Maharashtra',
    'Azadpur': 'Delhi',
    'Bengaluru': 'Karnataka',
    'Belgaum': 'Karnataka',
    'Bhilwara': 'Rajasthan',
    'Bhopal': 'Madhya Pradesh',
    'Bhubaneswar': 'Odisha',
    'Chennai': 'Tamil Nadu',
    'Cochin': 'Kerala',
    'Dehradun': 'Uttarakhand',
    'Dhingsara': 'Haryana',
    'Coimbatore': 'Tamil Nadu',
    'Ernakulam': 'Kerala',
    'Faridabad': 'Haryana',
    'Gandhinagar': 'Gujarat',
    'Ghaziabad': 'Uttar Pradesh',
    'Guindy': 'Tamil Nadu',
    'Gurugram': 'Haryana',
    'Guwahati': 'Assam',
    'Hauz Khas': 'Delhi',
    'Hubli': 'Karnataka',
    'Hyderabad': 'Telangana',
    'Indore': 'Madhya Pradesh',
    'Jaipur': 'Rajasthan',
    'Jodhpur': 'Rajasthan',
    'Kalkaji': 'Delhi',
    'Kalpakkam': 'Tamil Nadu',
    'Kannur': 'Kerala',
    'Kanpur': 'Uttar Pradesh',
    'Kochi': 'Kerala',
    'Kolkata': 'West Bengal',
    'Kota': 'Rajasthan',
    'Kormangala': 'Karnataka',
    'Kottayam': 'Kerala',
    'Lucknow': 'Uttar Pradesh',
    'Ludhiana': 'Punjab',
    'Margão': 'Goa',
    'Mangalore': 'Karnataka',
    'Mohali': 'Punjab',
    'Mumbai': 'Maharashtra',
    'Mylapore': 'Tamil Nadu',
    'Nagpur': 'Maharashtra',
    'Noida': 'Uttar Pradesh',
    'Panaji': 'Goa',
    'Panchkula': 'Haryana',
    'Patna': 'Bihar',
    'Powai': 'Maharashtra',
    'Pune': 'Maharashtra',
    'Rajsamand': 'Rajasthan',
    'Ranchi': 'Jharkhand',
    'Roorkee': 'Uttarakhand',
    'Samastipur': 'Bihar',
    'Santra': 'Chhattisgarh',
    'Satara': 'Maharashtra',
    'Surat': 'Gujarat',
    'Thane': 'Maharashtra',
    'The Nilgiris': 'Tamil Nadu',
    'Thiruvananthapuram': 'Kerala',
    'Tirunelveli': 'Tamil Nadu',
    'Trivandrum': 'Kerala',
    'Tumkur': 'Karnataka',
    'Vadodara': 'Gujarat',
    'Warangal': 'Telangana',
    'Worli': 'Maharashtra'
    }

: 

Chandigarh is a union territory that serves as the shared capital of Haryana and Punjab, and Silvassa lies in the union territory of Dadra and Nagar Haveli and Daman and Diu, between Gujarat and Maharashtra. Since neither belongs to a single state, both are left unmapped.

In [None]:
# Replacing each city found in the dictionary with its state name. Locations not in the
# dictionary (Indian states or places outside India) are returned unchanged.

df["HeadQuarter"] = [state_mapping.get(city, city) for city in df["HeadQuarter"]]
df["HeadQuarter"].unique()

: 
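The list comprehension above works as intended. For reference, pandas also offers vectorised equivalents that leave unmapped locations unchanged. The sketch runs on a hypothetical excerpt: `mapping_excerpt` and `hq_sample` are illustrative, not the notebook's full `state_mapping` or column.

```python
import pandas as pd

# Hypothetical excerpt of the city-to-state dictionary and of the column.
mapping_excerpt = {"Bengaluru": "Karnataka", "Gurugram": "Haryana", "Pune": "Maharashtra"}
hq_sample = pd.Series(["Bengaluru", "Gurugram", "Chandigarh", "Singapore", "Pune"])

# The notebook's approach: dict.get falls back to the location itself.
via_comprehension = pd.Series([mapping_excerpt.get(loc, loc) for loc in hq_sample])

# Series.replace with a dict swaps whole values and leaves the rest untouched.
via_replace = hq_sample.replace(mapping_excerpt)

# Series.map gives NaN for unmapped values, so fill those back from the original column.
via_map = hq_sample.map(mapping_excerpt).fillna(hq_sample)

print(via_replace.tolist())
```

All three produce the same labels here. `replace` is the closest drop-in, since, like `dict.get(city, city)`, it only touches exact matches.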

In [None]:
# Inspecting the "Sector" column thoroughly

df["Sector"].unique()

: 

The "Sector" column lists many similar kinds of businesses as separate sectors. For example, 'food', 'food and beverages', 'beverages' and 'food industry' describe closely related businesses but appear as separate sectors. Other sectors show the same pattern, so related sectors need to be mapped to a common category.
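Many of the variants differ only in letter case or stray spaces (for example 'Fantasy Sports' and 'Fantasy sports', or 'AI platform' and 'AI Platform'). One possible refinement normalises labels before looking them up, so a single dictionary entry covers several spellings. It assumes that case never distinguishes two genuinely different sectors. Using `map` also leaves unmapped labels as NaN, which makes gaps in the dictionary easy to list. The labels and `mapping_excerpt` below are hypothetical.

```python
import pandas as pd

# Hypothetical raw labels showing case and whitespace variants.
raw_sectors = pd.Series([
    "Fantasy Sports", "Fantasy sports", " FoodTech", "Foodtech",
    "AI platform", "AI Platform", "Quantum computing",
])

# Hypothetical excerpt of a sector dictionary, one entry per concept.
mapping_excerpt = {
    "Fantasy Sports": "Healthcare and Fitness",
    "FoodTech": "Agriculture and Food Production",
    "AI Platform": "Information Technology (IT)",
}

# Normalise the dictionary keys and the column the same way: trim and lower-case.
normalised_mapping = {key.strip().lower(): value for key, value in mapping_excerpt.items()}
mapped = raw_sectors.str.strip().str.lower().map(normalised_mapping)

# Labels without an entry come back as NaN, so they can be listed and reviewed.
unmapped = raw_sectors[mapped.isna()].unique().tolist()

print(mapped.tolist())
print("Unmapped:", unmapped)
```

Printing `sorted(set(mapping.values()))` for the full dictionary is a similarly cheap check that surfaces near-duplicate target categories, such as 'Transportation & Tourism' next to 'Transportation and Tourism'.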

In [None]:
sector_mapping = {'3D Printing': 'Information Technology (IT)',
                  'Accomodation': 'Real Estate',
                  'Accounting': 'Banking',
                  'Ad-tech': 'Information Technology (IT)',
                  'Advertisement': 'Media and Entertainment',
                  'Advertising': 'Media and Entertainment',
                  'Advisory firm': 'Career and Personal Development',
                  'Aero company': 'Transportation and Tourism',
                  'Aerospace': 'Transportation and Tourism',
                  'Agri tech': 'Information Technology (IT)',
                  'Agriculture': 'Agriculture and Food Production',
                  'AgriTech': 'Information Technology (IT)',
                  'AgTech': 'Agriculture and Food Production',
                  'AI': 'Information Technology (IT)',
                  'Ai': 'Information Technology (IT)',
                  'Ai & Tech': 'Information Technology (IT)',
                  'AI and Tech': 'Information Technology (IT)',
                  'AI Company': 'Information Technology (IT)',
                  'AI startup': 'Information Technology (IT)',
                  'AI & Data science': 'Information Technology (IT)',
                  'AI & Debt': 'Information Technology (IT)',
                  'AI & Deep learning': 'Information Technology (IT)',
                  'AI & Media': 'Information Technology (IT)',
                  'AI & Tech': 'Information Technology (IT)',
                  'AI Chatbot': 'Information Technology (IT)',
                  'AI company': 'Information Technology (IT)',
                  'AI health': 'Information Technology (IT)',
                  'AI platform': 'Information Technology (IT)',
                  'AI Platform': 'Information Technology (IT)',
                  'AI Robotics': 'Information Technology (IT)',
                  'AI Startup': 'Information Technology (IT)',
                  'Agritech': 'Information Technology (IT)',
                  'Agritech startup': 'Information Technology (IT)',
                  'Agritech/Commerce': 'Information Technology (IT)',
                  'Air Transportation': 'Transportation and Tourism',
                  'Alternative Medicine': 'Healthcare and Fitness',
                  'Analytics': 'Information Technology (IT)',
                  'Android': 'Information Technology (IT)',
                  'API platform': 'Information Technology (IT)',
                  'Appliance': 'Information Technology (IT)',
                  'Apparel & Fashion': 'Fashion and Beauty',
                  'Apps': 'Information Technology (IT)',
                  'AR platform': 'Information Technology (IT)',
                  'AR startup': 'Information Technology (IT)',
                  'AR/VR': 'Information Technology (IT)',
                  'AR/VR startup': 'Information Technology (IT)',
                  'Artificial Intelligence': 'Information Technology (IT)',
                  'Arts & Crafts': 'Cultural Heritage',
                  'Audio': 'Media and Entertainment',
                  'Augmented reality': 'Information Technology (IT)',
                  'Auto-tech': 'Manufacturing',
                  'Automation': 'Manufacturing',
                  'Automation tech': 'Manufacturing',
                  'Automobile': 'Manufacturing',
                  'Automobile & Technology': 'Manufacturing',
                  'Automobile Technology': 'Manufacturing',
                  'Automobiles': 'Manufacturing',
                  'Automotive': 'Manufacturing',
                  'Automotive Startup': 'Manufacturing',
                  'Automotive Tech': 'Manufacturing',
                  'Automotive and Rentals': 'Manufacturing',
                  'Automotive company': 'Manufacturing',
                  'Automotive tech': 'Manufacturing',
                  'Autonomous Vehicles': 'Manufacturing',
                  'Aviation': 'Transportation and Tourism',
                  'Aviation & Aerospace': 'Transportation and Tourism',
                  'Ayurveda tech': 'Healthcare and Fitness',
                  'B2B Agritech': 'E-Commerce and Retail',
                  'B2B E-commerce': 'E-Commerce and Retail',
                  'B2B Ecommerce': 'E-Commerce and Retail',
                  'B2B Marketplace': 'E-Commerce and Retail',
                  'B2B Manufacturing': 'E-Commerce and Retail',
                  'B2B Service': 'E-Commerce and Retail',
                  'B2B startup': 'E-Commerce and Retail',
                  'B2B Supply Chain': 'E-Commerce and Retail',
                  'B2B Travel': 'E-Commerce and Retail',
                  'B2B marketplace': 'E-Commerce and Retail',
                  'B2B service': 'E-Commerce and Retail',
                  'Basketball': 'Healthcare and Fitness',
                  'Battery': 'Manufacturing',
                  'Battery design': 'Manufacturing',
                  'Battery manufacturer': 'Manufacturing',
                  'Beauty': 'Fashion and Beauty',
                  'Beauty & wellness': 'Fashion and Beauty',
                  'Beauty products': 'Fashion and Beauty',
                  'Bevarages': 'Agriculture and Food Production',
                  'Beverage': 'Agriculture and Food Production',
                  'Beverages': 'Agriculture and Food Production',
                  'Big Data': 'Information Technology (IT)',
                  'Bike Rental': 'E-Commerce and Retail',
                  'Bike marketplace': 'E-Commerce and Retail',
                  'Biopharma': 'Healthcare and Fitness',
                  'Biotech': 'Information Technology (IT)',
                  'BioTechnology': 'Information Technology (IT)',
                  'Biotechnology': 'Information Technology (IT)',
                  'Biotechnology company': 'Information Technology (IT)',
                  'Biomaterial startup': 'Manufacturing',
                  'Blockchain startup': 'Blockchain',
                  'Blogging': 'E-Commerce and Retail',
                  'Brand Marketing': 'E-Commerce and Retail',
                  'Broadcasting': 'Media and Entertainment',
                  'B2B': 'E-Commerce and Retail',
                  'Business Development': 'E-Commerce and Retail',
                  'Business Intelligence': 'Information Technology (IT)',
                  'Business Supplies & Equipment': 'E-Commerce and Retail',
                  'Business Travel': 'Transportation and Tourism',
                  'Business and Rentals': 'E-Commerce and Retail',
                  'Business company': 'E-Commerce and Retail',
                  'Business conglomerate company': 'E-Commerce and Retail',
                  'Business platform': 'E-Commerce and Retail',
                  'Business software': 'Information Technology (IT)',
                  'Cannabis startup': 'Healthcare and Fitness',
                  'Capital Markets': 'Banking',
                  'Car Service': 'Transportation and Tourism',
                  'Car Trade': 'E-Commerce and Retail',
                  'Career Planning': 'Career and Personal Development',
                  'Catering': 'E-Commerce and Retail',
                  'Celebrity Engagement': 'Career and Personal Development',
                  'Child Care': 'Healthcare and Fitness',
                  'Children': 'Career and Personal Development',
                  'Classifieds': 'Media and Entertainment',
                  'Clean Energy': 'Energy',
                  'CleanTech': 'Information Technology (IT)',
                  'Cleantech': 'Information Technology (IT)',
                  'Clothing': 'Fashion and Beauty',
                  'Cloud Computing': 'Information Technology (IT)',
                  'Cloud Infrastructure': 'Information Technology (IT)',
                  'Cloud Kitchen': 'Information Technology (IT)',
                  'Cloud kitchen': 'Information Technology (IT)',
                  'Cloud company': 'Information Technology (IT)',
                  'Collaboration': 'Career and Personal Development',
                  'Coworking': 'Career and Personal Development',
                  'Co-living': 'Career and Personal Development',
                  'Co-working': 'Career and Personal Development',
                  'Co-working Startup': 'Career and Personal Development',
                  'Commerce': 'E-Commerce and Retail',
                  'Commercial': 'E-Commerce and Retail',
                  'Commercial Real Estate': 'E-Commerce and Retail',
                  'Communities': 'Social Development',
                  'Community': 'Social Development',
                  'Community platform': 'Media and Entertainment',
                  'Conversational AI platform': 'Media and Entertainment',
                  'Company-as-a-Service': 'E-Commerce and Retail',
                  'Computer': 'Information Technology (IT)',
                  'Computer Games': 'Media and Entertainment',
                  'Computer Software': 'Information Technology (IT)',
                  'Computer software': 'Information Technology (IT)',
                  'Computer & Network Security': 'Information Technology (IT)',
                  'Construction tech': 'Construction',
                  'Consultancy': 'Career and Personal Development',
                  'Consulting': 'Career and Personal Development',
                  'Consumer': 'E-Commerce and Retail',
                  'Consumer Applications': 'Information Technology (IT)',
                  'Consumer Electronics': 'Information Technology (IT)',
                  'Consumer Goods': 'E-Commerce and Retail',
                  'Consumer Lending': 'Banking',
                  'Consumer Service': 'E-Commerce and Retail',
                  "Consumer service": 'E-Commerce and Retail',
                  'Consumer Services': 'E-Commerce and Retail',
                  'Consumer appliances': 'Manufacturing',
                  'Consumer goods': 'E-Commerce and Retail',
                  'Consumer lending': 'Banking',
                  'Consumer software': 'Information Technology (IT)',
                  'Content Marktplace': 'E-Commerce and Retail',
                  'Content commerce': 'E-Commerce and Retail',
                  'Content creation': 'Career and Personal Development',
                  'Content management': 'Career and Personal Development',
                  'Content marketplace': 'E-Commerce and Retail',
                  'Content publishing': 'E-Commerce and Retail',
                  'Continuing Education': 'Education',
                  'Cooking': 'Agriculture and Food Production',
                  'Cosmetics': 'Fashion and Beauty',
                  'Craft Beer': 'Cultural Heritage',
                  'Creative Agency': 'Career and Personal Development',
                  'Credit': 'Banking',
                  'Credit Cards': 'Banking',
                  'CRM': 'Career and Personal Development',
                  'Crowdfunding': 'Banking',
                  'Crowdsourcing': 'Banking',
                  'Crypto': 'Blockchain',
                  'Cryptocurrency': 'Blockchain',
                  'Cultural': 'Cultural Heritage',
                  'Customer Service': 'Career and Personal Development',
                  'Customer service company': 'Career and Personal Development',
                  'Cybersecurity': 'Information Technology (IT)',
                  'D2C': 'E-Commerce and Retail',
                  'D2C Business': 'E-Commerce and Retail',
                  'D2C Fashion': 'E-Commerce and Retail',
                  'D2C jewellery': 'E-Commerce and Retail',
                  'D2C startup': 'E-Commerce and Retail',
                  'Dairy': 'Agriculture and Food Production',
                  'Dairy startup': 'Agriculture and Food Production',
                  'Data Analytics': 'Information Technology (IT)',
                  'Data Intelligence': 'Information Technology (IT)',
                  'Data Science': 'Information Technology (IT)',
                  'Deep Tech': 'Information Technology (IT)',
                  'Deep Tech AI': 'Information Technology (IT)',
                  'Dating': 'Social Development',
                  'Dating app': 'Media and Entertainment',
                  'Delivery': 'Transportation and Tourism',
                  'Delivery Service': 'Transportation and Tourism',
                  'Delivery service': 'Transportation and Tourism',
                  'Dental': 'Healthcare and Fitness',
                  'Deeptech': 'Information Technology (IT)',
                  'Deeptech startup': 'Information Technology (IT)',
                  'Defense & Space': 'Defense',
                  'Defense tech': 'Defense',
                  'Deisgning': 'Career and Personal Development',
                  'Design': 'Career and Personal Development',
                  'Dietary Supplements': 'Healthcare and Fitness',
                  'Digital mortgage': 'Banking',
                  'Digital platform': 'Media and Entertainment',
                  'Digital tech': 'Media and Entertainment',
                  'Digital Entertainment': 'Media and Entertainment',
                  'Digital Marketing': 'Media and Entertainment',
                  'Digital Media': 'Media and Entertainment',
                  'Drone': 'Media and Entertainment',
                  'E store': 'E-Commerce and Retail',
                  'Estore': 'E-Commerce and Retail',
                  'E-Sports': 'Healthcare and Fitness',
                  'E tailor': 'E-Commerce and Retail',
                  'E-Commerce': 'E-Commerce and Retail',
                  'E-commerce': 'E-Commerce and Retail',
                  'E-Commerce & Ar': 'E-Commerce and Retail',
                  'E-Commerce Platforms': 'E-Commerce and Retail',
                  'E-market': 'E-Commerce and Retail',
                  'E-Marketplace': 'E-Commerce and Retail',
                  'E-connect': 'E-Commerce and Retail',
                  'E-learning': 'Education',
                  'E-Mobility': 'Transportation and Tourism',
                  'E-mobility': 'Transportation and Tourism',
                  'E-tail': 'E-Commerce and Retail',
                  'eCommerce': 'E-Commerce and Retail',
                  'Ecommerce': 'E-Commerce and Retail',
                  'EdTech': 'Education',
                  'Edtech': 'Education',
                  'EdTech Startup': 'Education',
                  'EdtTech': 'Education',
                  'Educational Technology': 'Education',
                  'Education Management': 'Education',
                  'E-Learning': 'Education',
                  'Electric Vehicle': 'Manufacturing',
                  'Electricity': 'Manufacturing',
                  'Electronics': 'Manufacturing',
                  'Embedded Systems': 'Information Technology (IT)',
                  'eMobility': 'Transportation and Tourism',
                  'Enterprise Resource Planning (ERP)': 'Information Technology (IT)',
                  'Enterprise Software': 'Information Technology (IT)',
                  'Entertainment': 'Media and Entertainment',
                  'Entreprenurship': 'E-Commerce and Retail',
                  'Environment': 'Water and Environment',
                  'Environmental Consulting': 'Career and Personal Development',
                  'Environmental service': 'Water and Environment',
                  'Environmental Services': 'Water and Environment',
                  'Equity Management': 'Banking',
                  'Escrow': 'E-Commerce and Retail',
                  'eSports': 'Healthcare and Fitness',
                  'ETech': 'Education',
                  'EV': 'Manufacturing',
                  'EV startup': 'Manufacturing',
                  'Events': 'Events and Hospitality',
                  'Eyeglasses': 'Fashion and Beauty',
                  'Eyewear': 'Fashion and Beauty',
                  'Eye Wear': 'Fashion and Beauty',
                  'Eye wear': 'Fashion and Beauty',
                  'Facilities Services': 'Career and Personal Development',
                  'Facilities Support Services': 'Career and Personal Development',
                  'Fantasy Sports': 'Healthcare and Fitness',
                  'Fantasy sports': 'Healthcare and Fitness',
                  'Farming': 'Agriculture and Food Production',
                  'Fashion': 'Fashion and Beauty',
                  'Fashion & Lifestyle': 'Fashion and Beauty',
                  'Fashion and lifestyle': 'Fashion and Beauty',
                  'Fashion startup': 'Fashion and Beauty',
                  'Fashion Tech': 'Fashion and Beauty',
                  'FemTech': 'Information Technology (IT)',
                  'Femtech': 'Information Technology (IT)',
                  'Fertility tech': 'Healthcare and Fitness',
                  'File Sharing': 'Information Technology (IT)',
                  'Finance': 'Banking',
                  'Finance company': 'Banking',
                  'Financial Services': 'Banking',
                  'FinTech': 'Banking',
                  'Fintech': 'Banking',
                  'Fishery': 'Agriculture and Food Production',
                  'Fitness': 'Healthcare and Fitness',
                  'Fitness startup': 'Healthcare and Fitness',
                  'FM': 'Career and Personal Development',
                  'FMCG': 'Manufacturing',
                  'Food': 'Agriculture and Food Production',
                  'Food & Bevarages': 'Agriculture and Food Production',
                  'Food & Beverages': 'Agriculture and Food Production',
                  'Food & Logistics': 'Agriculture and Food Production',
                  'Food & Nutrition': 'Agriculture and Food Production',
                  'Food & Tech': 'Agriculture and Food Production',
                  'Food and Beverages': 'Agriculture and Food Production',
                  'Food and Beverage': 'Agriculture and Food Production',
                  'Food Delivery': 'Agriculture and Food Production',
                  'Food delivery': 'Agriculture and Food Production',
                  'Food devlivery': 'Agriculture and Food Production',
                  'Food diet': 'Agriculture and Food Production',
                  'Food Industry': 'Agriculture and Food Production',
                  'Food Processing': 'Agriculture and Food Production',
                  'Food Production': 'Agriculture and Food Production',
                  'Food Startup': 'Agriculture and Food Production',
                  'Food Tech': 'Agriculture and Food Production',
                  'Food tech': 'Agriculture and Food Production',
                  'Food Technology': 'Agriculture and Food Production',
                  'FoodTech': 'Agriculture and Food Production',
                  'Foodtech': 'Agriculture and Food Production',
                  'Foodtech & Logistics': 'Agriculture and Food Production',
                  'Foootwear': 'Fashion and Beauty',
                  'Fragrance': 'Fashion and Beauty',
                  'Fraud Detection': 'Information Technology (IT)',
                  'Funding Platform': 'Banking',
                  'Furniture': 'Manufacturing',
                  'Furniture Rental': 'E-Commerce and Retail',
                  'Fusion beverages': 'Agriculture and Food Production',
                  'Gaming': 'Media and Entertainment',
                  'Games': 'Media and Entertainment',
                  'Gaming startup': 'Media and Entertainment',
                  'Government': 'Social Development',
                  'Graphics': 'Media and Entertainment',
                  'Hardware': 'Information Technology (IT)',
                  'Health': 'Healthcare and Fitness',
                  'Health and Wellness': 'Healthcare and Fitness',
                  'Health care': 'Healthcare and Fitness',
                  'Health Care': 'Healthcare and Fitness',
                  'Health Diagnostics': 'Healthcare and Fitness',
                  'Health Insurance': 'Healthcare and Fitness',
                  'Health Tech': 'Healthcare and Fitness',
                  'Health, Wellness & Fitness': 'Healthcare and Fitness',
                  'Healthcare': 'Healthcare and Fitness',
                  'Healthcare/Edtech': 'Healthcare and Fitness',
                  'HealthTech': 'Healthcare and Fitness',
                  'Health and Fitness': 'Healthcare and Fitness',
                  'Health & Fitness': 'Healthcare and Fitness',
                  'Health & Wellness': 'Healthcare and Fitness',
                  'HealthCare': 'Healthcare and Fitness',
                  'Healthcare & Fitness': 'Healthcare and Fitness',
                  'Healthtech': 'Healthcare and Fitness',
                  'Healtcare': 'Healthcare and Fitness',
                  'Heathcare': 'Healthcare and Fitness',
                  'HeathTech': 'Healthcare and Fitness',
                  'Higher Education': 'Education',
                  'Home Decor': 'Career and Personal Development',
                  'Home Design': 'Career and Personal Development',
                  'Home interior services': 'Career and Personal Development',
                  'Home services': 'Career and Personal Development',
                  'Hospital': 'Healthcare and Fitness',
                  'Hospital & Health Care': 'Healthcare and Fitness',
                  'Hospitality': 'Events and Hospitality',
                  'Housing': 'Real Estate',
                  'Housing & Rentals': 'Real Estate',
                  'Housing Marketplace': 'Real Estate',
                  'HR': 'Career and Personal Development',
                  'HR Tech': 'Career and Personal Development',
                  'HR tech': 'Career and Personal Development',
                  'Hr Tech': 'Career and Personal Development',
                  'Human Resources': 'Career and Personal Development',
                  'HR Tech startup': 'Career and Personal Development',
                  'HrTech': 'Career and Personal Development',
                  'HRTech': 'Career and Personal Development',
                  'Hygiene': 'Healthcare and Fitness',
                  'Hygiene management': 'Healthcare and Fitness',
                  'Industrial': 'Manufacturing',
                  'Industrial Automation': 'Manufacturing',
                  'Information Services': 'Information Technology (IT)',
                  'Information Technology': 'Information Technology (IT)',
                  'Information Technology & Services': 'Information Technology (IT)',
                  'Infratech': 'Information Technology (IT)',
                  'Information Technology (IT)': 'Information Technology (IT)',
                  'Innovation Management': 'Career and Personal Development',
                  'Innovation management': 'Career and Personal Development',
                  'InsureTech': 'Banking',
                  'Insuretech': 'Banking',
                  'Insurance': 'Banking',
                  'Insurance Tech': 'Banking',
                  'Insurance Technology': 'Banking',
                  'Insurance technology': 'Banking',
                  'Insurtech': 'Banking',
                  'Interior & decor': 'Career and Personal Development',
                  'Interior design': 'Career and Personal Development',
                  'Interior Design': 'Career and Personal Development',
                  'Internet': 'Information Technology (IT)',
                  'Internet of Things': 'Information Technology (IT)',
                  'Investment': 'Banking',
                  'Investment Banking': 'Banking',
                  'Investment Management': 'Banking',
                  'Investment Tech': 'Banking',
                  'IoT': 'Information Technology (IT)',
                  'Iot': 'Information Technology (IT)',
                  'IoT/Automobile': 'Information Technology (IT)',
                  'IoT platform': 'Information Technology (IT)',
                  'IoT startup': 'Information Technology (IT)',
                  'IT': 'Information Technology (IT)',
                  'IT company': 'Information Technology (IT)',
                  'IT startup': 'Information Technology (IT)',
                  'Last Mile Transportation': 'Transportation and Tourism',
                  'Legal': 'Career and Personal Development',
                  'Legal Services': 'Career and Personal Development',
                  'LegalTech': 'Career and Personal Development',
                  'Legaltech': 'Career and Personal Development',
                  'Legal Tech': 'Career and Personal Development',
                  'Legal tech': 'Career and Personal Development',
                  'Life sciences': 'Education',
                  'Jewellery': 'Fashion and Beauty',
                  'Jewellery startup': 'Fashion and Beauty',
                  'Job discovery platform': 'Media and Entertainment',
                  'Job portal': 'Information Technology (IT)',
                  'Lifestyle': 'Career and Personal Development',
                  'Linguistic Spiritual': 'Career and Personal Development',
                  'Location Analytics': 'Information Technology (IT)',
                  'Logitech': 'Information Technology (IT)',
                  'Logistics': 'Transportation and Tourism',
                  'Logistics & Supply Chain': 'Transportation and Tourism',
                  'Luxury car startup': 'Transportation and Tourism',
                  'Machine Learning': 'Information Technology (IT)',
                  'Management Consulting': 'Career and Personal Development',
                  'Manufacturing startup': 'Manufacturing',
                  'Market Research': 'E-Commerce and Retail',
                  'Marketplace': 'E-Commerce and Retail',
                  'Marketing': 'E-Commerce and Retail',
                  'Marketing & Advertising': 'E-Commerce and Retail',
                  'Marketing & Customer Loyalty': 'E-Commerce and Retail',
                  'Marketing company': 'E-Commerce and Retail',
                  'Marketing startup': 'E-Commerce and Retail',
                  'Maritime': 'Transportation and Tourism',
                  'MarTech': 'Information Technology (IT)',
                  'Matrimony': 'Career and Personal Development',
                  'Mechanical & Industrial Engineering': 'Career and Personal Development',
                  'Mechanical Or Industrial Engineering': 'Career and Personal Development',
                  'Mechanical and Industrial Engineering': 'Career and Personal Development',
                  'Med Tech': 'Healthcare and Fitness',
                  'Media': 'Media and Entertainment',
                  'Media & Networking': 'Media and Entertainment',
                  'Media Tech': 'Media and Entertainment',
                  'Medical': 'Healthcare and Fitness',
                  'Medical Device': 'Healthcare and Fitness',
                  'Medical Devices': 'Healthcare and Fitness',
                  'Medical technology': 'Healthcare and Fitness',
                  'Medicine': 'Healthcare and Fitness',
                  'Medtech': 'Healthcare and Fitness',
                  'Menstrual hygiene': 'Healthcare and Fitness',
                  'Mental Health': 'Healthcare and Fitness',
                  'Mental Health Tech': 'Healthcare and Fitness',
                  'Merchandise': 'E-Commerce and Retail',
                  'Microfinance': 'Banking',
                  'Micro-mobiity': 'Manufacturing',
                  'Milk startup': 'Agriculture and Food Production',
                  'MLOps platform': 'Information Technology (IT)',
                  'Mobile': 'Information Technology (IT)',
                  'Mobile App': 'Media and Entertainment',
                  'Mobile Payments': 'Banking',
                  'Mobile Development': 'Media and Entertainment',
                  'Mobile Games': 'Media and Entertainment',
                  'Mobile Tech': 'Media and Entertainment',
                  'Mobile Technology': 'Media and Entertainment',
                  'Mobility': 'Transportation and Tourism',
                  'Mobility Solutions': 'Transportation and Tourism',
                  'Mobility tech': 'Transportation and Tourism',
                  'Mobility/Transport': 'Transportation and Tourism',
                  'Multinational conglomerate company': 'Multinational Conglomerate Company',
                  'Music': 'Media and Entertainment',
                  'Music Streaming': 'Media and Entertainment',
                  'Mutual Funds': 'Banking',
                  'Nano Tech': 'Information Technology (IT)',
                  'Nanotechnology': 'Information Technology (IT)',
                  'Nano Distribution Network': 'Multinational Conglomerate Company',
                  'Natural Language Processing': 'Information Technology (IT)',
                  'Natural Resources': 'Water and Environment',
                  'Natural Sciences': 'Education',
                  'Neo-banking': 'Banking',
                  'Networking': 'Social Development',
                  'Networking platform': 'Media and Entertainment',
                  'Neuroscience': 'Information Technology (IT)',
                  'News': 'Information Technology (IT)',
                  'NFT': 'Blockchain',
                  'NFT Marketplace': 'Blockchain',
                  'NFT marketplace': 'Blockchain',
                  'Nutrition': 'Agriculture and Food Production',
                  'Nutrition sector': 'Agriculture and Food Production',
                  'Nutrition tech': 'Agriculture and Food Production',
                  'Nutrition Tech': 'Agriculture and Food Production',
                  'Oil & Energy': 'Energy',
                  'Oil and Energy': 'Energy',
                  'Online credit management startup': 'Banking',
                  'Online Education': 'Information Technology (IT)',
                  'Online financial service': 'Banking',
                  'Online Games': 'Media and Entertainment',
                  'Online Gaming': 'Media and Entertainment',
                  'Online Grocery': 'E-Commerce and Retail',
                  'Online Learning': 'Education',
                  'Online Marketplace': 'E-Commerce and Retail',
                  'Online Media': 'Media and Entertainment',
                  ' Online Media': 'Media and Entertainment',
                  'Online media': 'Media and Entertainment',
                  'Online Payments': 'Banking',
                  'Online Portals': 'Information Technology (IT)',
                  'Online Retail': 'E-Commerce and Retail',
                  'Online Services': 'E-Commerce and Retail',
                  'Online Shopping': 'E-Commerce and Retail',
                  'Online storytelling': 'Career and Personal Development',
                  'Online Travel': 'Transportation and Tourism',
                  'Outsourcing/Offshoring': 'Career and Personal Development',
                  'OTT': 'Media and Entertainment',
                  'PaaS startup': 'Information Technology (IT)',
                  'Packaging': 'Manufacturing',
                  'Packaging Services': 'Manufacturing',
                  'Packaging solution startup': 'Manufacturing',
                  'Payment': 'Banking',
                  'Payment Gateway': 'Banking',
                  'Payment Solutions': 'Banking',
                  'Personal Care': 'Healthcare and Fitness',
                  'Personal care startup': 'Healthcare and Fitness',
                  'Pet care': 'Career and Personal Development',
                  'Pharmaceutical': 'Healthcare and Fitness',
                  'Pharma': 'Healthcare and Fitness',
                  'Pharmacy': 'Healthcare and Fitness',
                  'Photonics startup': 'Information Technology (IT)',
                  'Personal Finance': 'Banking',
                  'Pharmaceuticals': 'Healthcare and Fitness',
                  'Photography': 'Career and Personal Development',
                  'Plastic Surgery': 'Healthcare and Fitness',
                  'Platform': 'Media and Entertainment',
                  'Platform as a Service (PaaS)': 'Media and Entertainment',
                  'Podcast': 'Media and Entertainment',
                  'Pollution control equiptment': 'Water and Environment',
                  'Political Organization': 'Social Development',
                  'Preschool Daycare': 'Education',
                  'Productivity': 'Career and Personal Development',
                  'Product studio': 'Information Technology (IT)',
                  'Professional Training & Coaching': 'Career and Personal Development',
                  'Professional Services': 'Career and Personal Development',
                  'Property Management': 'Career and Personal Development',
                  'Proptech': 'Career and Personal Development',
                  'Publication': 'Information Technology (IT)',
                  'Public Relations': 'Social Development',
                  'Public Services': 'E-Commerce and Retail',
                  'Publishing': 'E-Commerce and Retail',
                  'QR Code': 'Information Technology (IT)',
                  'QSR startup': 'E-Commerce and Retail',
                  'Reading Apps': 'Media and Entertainment',
                  'Real estate': 'Real Estate',
                  'Real Estate Tech': 'Real Estate',
                  'Recruitment': 'Career and Personal Development',
                  'Recruitment platform': 'Career and Personal Development',
                  'Recruitment Services': 'Career and Personal Development',
                  'Recruitment startup': 'Career and Personal Development',
                  'Recycling': 'Water and Environment',
                  'Renewable Energy': 'Energy',
                  'Renewables player': 'Energy',
                  'Renewable player': 'Energy',
                  'Renewables & Environment': 'Energy',
                  'Rental': 'E-Commerce and Retail',
                  'Rental space': 'E-Commerce and Retail',
                  'Retail': 'E-Commerce and Retail',
                  'Retail Aggregator': 'E-Commerce and Retail',
                  'Retail startup': 'E-Commerce and Retail',
                  'Reatil startup': 'E-Commerce and Retail',
                  'Retail Tech': 'Information Technology (IT)',
                  'Retail Technology': 'Information Technology (IT)',
                  'Ride-hailing': 'E-Commerce and Retail',
                  'Robotics': 'Information Technology (IT)',
                  'Robotics & Ai': 'Information Technology (IT)',
                  'Saas': 'Information Technology (IT)',
                  'saas': 'Information Technology (IT)',
                  'SaaS': 'Information Technology (IT)',
                  'SaaS B2B': 'Information Technology (IT)',
                  'SaaS-based platform': 'Information Technology (IT)',
                  'SaaS Platform': 'Information Technology (IT)',
                  'SaaS platform': 'Information Technology (IT)',
                  'SaaS Startup': 'Information Technology (IT)',
                  'SaaS startup': 'Information Technology (IT)',
                  'SaaS/Edtech': 'Information Technology (IT)',
                  'SaaS\xa0\xa0startup': 'Information Technology (IT)',
                  'Safety Tech': 'Information Technology (IT)',
                  'Sales & Marketing': 'E-Commerce and Retail',
                  'Sales and Marketing': 'E-Commerce and Retail',
                  'Salesforce': 'E-Commerce and Retail',
                  'Scanning app': 'Information Technology (IT)',
                  'Search Engine': 'Information Technology (IT)',
                  'Service industry': 'Career and Personal Development',
                  'Sales and Distribution': 'E-Commerce and Retail',
                  'Sales & Services': 'E-Commerce and Retail',
                  'Sles and marketing': 'E-Commerce and Retail',
                  'Sles and Marketing': 'E-Commerce and Retail',
                  'Sanitation solutions': 'Water and Environment',
                  'Security': 'Defense',
                  'Security Solutions': 'Defense',
                  'Self-Driving Cars': 'Manufacturing',
                  'Self-Improvement': 'Career and Personal Development',
                  'Semiconductor': 'Manufacturing',
                  'Sharing Economy': 'Social Development',
                  'Shipping': 'Transportation and Tourism',
                  'Shipping Tech': 'Transportation and Tourism',
                  'Shoes': 'Manufacturing',
                  'Skill development': 'Career and Personal Development',
                  'Skincare startup': 'Fashion and Beauty',
                  'Smart Cities': 'Information Technology (IT)',
                  'Social audio': 'Media and Entertainment',
                  'Social Commerce': 'E-Commerce and Retail',
                  'Social commerce': 'E-Commerce and Retail',
                  'Social community': 'Social Development',
                  'Social e-commerce': 'E-Commerce and Retail',
                  'Social Impact': 'Social Development',
                  'Social Media': 'Media and Entertainment',
                  'Social media': 'Media and Entertainment',
                  'Social Media Marketing': 'E-Commerce and Retail',
                  'Social Network': 'Media and Entertainment',
                  'Social network': 'Media and Entertainment',
                  'Social platform': 'Media and Entertainment',
                  'Spiritual': 'Career and Personal Development',
                  'Spacetech': 'Information Technology (IT)',
                  'Social Networking': 'Media and Entertainment',
                  'Software': 'Information Technology (IT)',
                  'Software as a Service (SaaS)': 'Information Technology (IT)',
                  'Software company': 'Information Technology (IT)',
                  'Software Company': 'Information Technology (IT)',
                  'Software Development': 'Information Technology (IT)',
                  'Software Engineering': 'Career and Personal Development',
                  'Software Solutions': 'Information Technology (IT)',
                  'Software Startup': 'Information Technology (IT)',
                  'Soil-Tech': 'Information Technology (IT)',
                  'Solar': 'Energy',
                  'Solar Energy': 'Energy',
                  'Solar Monitoring Company': 'Energy',
                  'Solar SaaS': 'Energy',
                  'Solar Solution': 'Energy',
                  'Solar solution': 'Energy',
                  'Space Tech': 'Information Technology (IT)',
                  'SpaceTech': 'Information Technology (IT)',
                  'Sports': 'Healthcare and Fitness',
                  'sports': 'Healthcare and Fitness',
                  'Sports startup': 'Healthcare and Fitness',
                  'SportsTech': 'Healthcare and Fitness',
                  'Sports Tech': 'Healthcare and Fitness',
                  'Staffing & Recruiting': 'Career and Personal Development',
                  'Startup': 'N/A',
                  'Startup laboratory': 'Healthcare and Fitness',
                  'Startup Studio': 'Media and Entertainment',
                  'Stationery': 'Manufacturing',
                  'Student Housing': 'Real Estate',
                  'Supply Chain': 'Transportation and Tourism',
                  'Supply chain, Agritech': 'Transportation and Tourism',
                  'Supply chain platform': 'Transportation and Tourism',
                  'Supply Chain Management': 'Transportation and Tourism',
                  'Supply Chain Solutions': 'Transportation and Tourism',
                  'Sustainable Development': 'Social Development',
                  'TaaS startup': 'Career and Personal Development',
                  'Taxation': 'Social Development',
                  'Tech': 'Information Technology (IT)',
                  'Tech company': 'Information Technology (IT)',
                  'Tech Hub': 'Information Technology (IT)',
                  'Tech hub': 'Information Technology (IT)',
                  'Tech Platform': 'Information Technology (IT)',
                  'Tech platform': 'Information Technology (IT)',
                  'Tech Startup': 'Information Technology (IT)',
                  'Tech startup': 'Information Technology (IT)',
                  'Technology': 'Information Technology (IT)',
                  'Techonology': 'Information Technology (IT)',
                  'Telecommunication': 'Information Technology (IT)',
                  'Telecommuncation': 'Information Technology (IT)',
                  'Telecommunications': 'Information Technology (IT)',
                  'Textile': 'Manufacturing',
                  'Textiles': 'Manufacturing',
                  'Tyre management': 'Manufacturing',
                  'Ticketing': 'E-Commerce and Retail',
                  'Tobacco': 'Agriculture and Food Production',
                  'Tourism': 'Transportation and Tourism',
                  'Tourism & EV': 'Transportation and Tourism',
                  'Toy': 'Manufacturing',
                  'Training': 'Career and Personal Development',
                  'Trading': 'E-Commerce and Retail',
                  'Trading Platform': 'E-Commerce and Retail',
                  'Trading platform': 'E-Commerce and Retail',
                  'Translation & Localization': 'Career and Personal Development',
                  'Transport': 'Transportation and Tourism',
                  'Transport Automation': 'Transportation and Tourism',
                  'Travel Tech': 'Transportation and Tourism',
                  'Travel tech': 'Transportation and Tourism',
                  'Travel & SaaS': 'Transportation and Tourism',
                  'TravelTech': 'Transportation and Tourism',
                  'Transportation': 'Transportation and Tourism',
                  'Transportation & Tourism': 'Transportation and Tourism',
                  'Transportation Tech': 'Transportation and Tourism',
                  'Travel': 'Transportation and Tourism',
                  'Transport & Rentals': 'Transportation and Tourism',
                  'Travel & Tourism': 'Transportation and Tourism',
                  'Travel and Tourism': 'Transportation and Tourism',
                  'UI/UX Design': 'Information Technology (IT)',
                  'Vehicle repair startup': 'Transportation and Tourism',
                  'Venture Capital': 'Banking',
                  'Venture capital': 'Banking',
                  'Venture capitalist': 'Banking',
                  'Venture Capital & Private Equity': 'Banking',
                  'Veterinary': 'Agriculture and Food Production',
                  'Video': 'Media and Entertainment',
                  'Video communication': 'Media and Entertainment',
                  'Video Games': 'Media and Entertainment',
                  'Video personalization': 'Media and Entertainment',
                  'Video platform': 'Media and Entertainment',
                  'Video Production': 'Media and Entertainment',
                  'Video sharing platform': 'Media and Entertainment',
                  'Video streaming platform': 'Media and Entertainment',
                  'Visual Media': 'Media and Entertainment',
                  'Virtual auditing startup': 'Banking',
                  'Virtual Banking': 'Banking',
                  'Virtual Reality': 'Information Technology (IT)',
                  'VR': 'Information Technology (IT)',
                  'VR & SaaS': 'Information Technology (IT)',
                  'VR/AR': 'Information Technology (IT)',
                  'Warehouse': 'E-Commerce and Retail',
                  'Waste Management': 'Water and Environment',
                  'Water': 'Water and Environment',
                  'Water purification': 'Water and Environment',
                  'Wealth Management': 'Banking',
                  'Wearable Tech': 'Information Technology (IT)',
                  'Wearable Technology': 'Information Technology (IT)',
                  'Wedding': 'Events and Hospitality',
                  'Web Design': 'Information Technology (IT)',
                  'Web Development': 'Information Technology (IT)',
                  'Wellness': 'Healthcare and Fitness',
                  'Wellness and Fitness': 'Healthcare and Fitness',
                  'Wholesale': 'E-Commerce and Retail',
                  'Wine & Spirits': 'Manufacturing',
                  'Wireless': 'Information Technology (IT)',
                  'WL & RAC protection': 'Transportation and Tourism',
                  "Women's Health": 'Healthcare and Fitness',
                  'Workforce Development': 'Career and Personal Development',
                  'Work fulfillment': 'Career and Personal Development',
                  'Workplace Safety': 'Career and Personal Development',
                  'Writing and Editing': 'Career and Personal Development',
                  'Yoga': 'Healthcare and Fitness',
                  'Yoga & Wellness': 'Healthcare and Fitness',
                  'Yoga & wellness': 'Healthcare and Fitness',
                  'Youth Development': 'Career and Personal Development'
                 }
    


: 

The generic label "Startup" does not indicate any specific sector, so it is mapped to N/A.

In [None]:
# Mapping each raw sector label to its common category; a label that is not in the dictionary
# is kept unchanged because .get() falls back to the label itself.

df["Sector"] = [sector_mapping.get(sector, sector) for sector in df["Sector"]]
df["Sector"].unique()

: 
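The list comprehension relies on `dict.get(sector, sector)`: the second argument is the fallback, so any label missing from `sector_mapping` passes through unchanged. The sketch below uses a small made-up mapping (not the notebook's full dictionary) to show that behaviour, together with an equivalent vectorised form using `Series.map` followed by `fillna`, assuming no mapped value is itself missing.

```python
import pandas as pd

# A tiny, made-up subset of the sector mapping (illustrative only)
sector_mapping = {
    "SaaS startup": "Information Technology (IT)",
    "Startup": "N/A",
    "Travel tech": "Transportation and Tourism",
}

raw = pd.Series(["SaaS startup", "Startup", "Fintech", "Travel tech"])

# Approach used in the notebook: dict.get falls back to the original label
via_get = pd.Series([sector_mapping.get(s, s) for s in raw])

# Vectorised alternative: map() yields NaN for unmapped labels, fillna restores them
via_map = raw.map(sector_mapping).fillna(raw)

print(via_get.tolist())
# ['Information Technology (IT)', 'N/A', 'Fintech', 'Transportation and Tourism']
print(via_get.tolist() == via_map.tolist())  # True
```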

In [None]:
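# Replacing 'Banking' with 'Banking and Financial Services'
# (regex=False replaces whole values only, so re-running this cell leaves existing labels unchanged)

df["Sector"] = df["Sector"].replace('Banking', 'Banking and Financial Services', regex=False)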
df["Sector"].unique()

: 
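With `regex=True`, `Series.replace` treats the first argument as a pattern and substitutes every matching substring, not just whole values. A label that already contains "Banking" is therefore rewritten again, so running the cell a second time appends the suffix twice. A minimal sketch on a made-up Series:

```python
import pandas as pd

sectors = pd.Series(["Banking", "Banking and Financial Services", "Education"])

# regex=True performs substring substitution, so labels that already contain
# "Banking" are rewritten again and the step is not idempotent
substring = sectors.replace("Banking", "Banking and Financial Services", regex=True)
print(substring.tolist())
# ['Banking and Financial Services', 'Banking and Financial Services and Financial Services', 'Education']

# regex=False (the default) only replaces values that equal "Banking" exactly
exact = sectors.replace("Banking", "Banking and Financial Services", regex=False)
print(exact.tolist())
# ['Banking and Financial Services', 'Banking and Financial Services', 'Education']
```

The same consideration applies to the other label replacements below, although their replacement strings do not contain the pattern, so repeating them is harmless.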

In [None]:
# Replacing 'Career and Personal Development' with 'Career, Special Duties and Personal Development'

df["Sector"] = df["Sector"].replace('Career and Personal Development', 'Career, Special Duties and Personal Development', regex=True)
df["Sector"].unique()

: 

In [None]:
# Replacing 'AI and Tech' with 'Information Technology (IT)'

df["Sector"] = df["Sector"].replace('AI and Tech', 'Information Technology (IT)', regex=True)
df["Sector"].unique()

: 

In [None]:
# Replacing 'Transportation & Tourism' with 'Transportation and Tourism'

df["Sector"] = df["Sector"].replace('Transportation & Tourism', 'Transportation and Tourism', regex=True)
df["Sector"].unique()

: 

In [None]:
# Replacing 'Health & Fitness' and 'Healthcare & Fitness' with 'Healthcare and Fitness'

df["Sector"] = df["Sector"].replace(['Health & Fitness', 'Healthcare & Fitness'], 'Healthcare and Fitness', regex=True)
df["Sector"].unique()

: 

In [None]:
# Inspecting the "Amount($)" column thoroughly

df["Amount($)"].unique()

: 

In [None]:
# Evaluating the "Amount($)" column

df["Amount($)"].describe()

: 

In [None]:
# Saving the cleaned dataset into a new csv file

df.to_csv('cleaned_dataset.csv', index = False)

# Viewing the saved dataset to confirm it's accurate
data = pd.read_csv('cleaned_dataset.csv')
data.head()

: 

# Hypothesis Testing

Null hypothesis: The sector a start-up belongs to did not influence the funding amount the start-up received.

Alternate hypothesis: The sector a start-up belongs to influenced the funding amount the start-up received.

The hypothesis was tested using Analysis of Variance (ANOVA).
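One-way ANOVA compares the variance of the sector means around the overall mean (between-group variance) with the variance of amounts inside each sector (within-group variance). `scipy.stats.f_oneway` returns this ratio, the F statistic, together with a p-value from the F distribution with (k − 1, N − k) degrees of freedom, where k is the number of sectors and N the number of rows. The pure-Python sketch below computes only the F statistic on hypothetical toy groups, to show what the test measures; it is not the notebook's code.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)  # mean square between groups
    ms_within = ss_within / (n - k)    # mean square within groups
    return ms_between / ms_within

# Toy funding amounts for three hypothetical sectors (not from the dataset)
print(one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```

A large F means the sector means differ a lot relative to the spread inside each sector; an F close to 1 or below (as implied by a p-value near 1) means they do not.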

In [None]:
# Creating a DataFrame with the sector and funding amount data, dropping rows with missing values
hypothesis_data_df = df[["Sector", "Amount($)"]].dropna()

# Performing a one-way ANOVA: one array of funding amounts per sector
grouped_data = hypothesis_data_df.groupby("Sector")["Amount($)"]
grouped_arrays = [group.to_numpy() for _, group in grouped_data]

f_statistic, p_value = stats.f_oneway(*grouped_arrays)

# Printing the results
print("F-Statistic:", f_statistic)
print("p-value:", p_value)

# Checking if the null hypothesis is rejected or not
alpha = 0.05  # Significance level
if p_value < alpha:
    print("Reject the null hypothesis.")
else:
    print("Fail to reject the null hypothesis.")

: 

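Since the p-value (0.9997970614101779) is greater than the significance level (0.05), we fail to reject the null hypothesis. In other words, the data provide no statistically significant evidence that the sector a start-up belongs to influenced the funding amount it received; failing to reject the null hypothesis does not prove that it is true.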
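One-way ANOVA assumes roughly normal values with similar variance in every group. The funding amounts in this dataset span several orders of magnitude (from a few thousand to billions of US Dollars, as the later questions show), so these assumptions are doubtful. As a robustness check that the original analysis did not perform, a rank-based Kruskal-Wallis H test (`scipy.stats.kruskal`) could be run on the same per-sector arrays. The sketch below uses made-up groups; passing `*grouped_arrays` from the ANOVA cell instead is the assumed way to apply it here.

```python
from scipy import stats

# Toy groups standing in for per-sector funding amounts (hypothetical values)
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Kruskal-Wallis compares the groups' ranks rather than their raw values,
# so a single huge amount influences it far less than it influences ANOVA
h_stat, kw_p_value = stats.kruskal(*groups)
print(round(h_stat, 4))      # 7.2
print(round(kw_p_value, 4))  # 0.0273
```

Distinct variable names are used so the ANOVA results (`f_statistic`, `p_value`) are not overwritten.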

# Answering Questions with visualizations

In [None]:
df

: 

# Questions

1. How many start-ups got funded each year?
2. What is the total funding amount received by start-ups each year?
3. Which ten start-ups received the most funding?
4. Which ten start-ups received the least funding?
5. Which ten sectors received the most funding?
6. Which ten sectors received the least funding?
7. Which ten headquarters received the most funding?
8. Which ten headquarters received the least funding?
9. What is the trend of funding received by start-ups in the top 5 headquarters from 2018 to 2021?

The first eight questions look at funding along one grouping variable at a time (funding year, start-up, sector or headquarters), whereas the last question is a multivariate analysis, since it combines several variables (headquarters, funding year and amount).

# Question 1: How many start-ups got funded each year?

In [None]:
# Using value_counts to count the funding records in each year

Number_of_funding_each_year = df["Funding Year"].value_counts()
Number_of_funding_each_year

: 
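`value_counts` counts rows, so a start-up that closed more than one funding round in the same year is counted once per round, and the result is ordered by frequency rather than by year. If "how many start-ups" is meant as distinct companies, counting unique `Company/Brand` values per year is one option. A minimal sketch on hypothetical records:

```python
import pandas as pd

# Hypothetical records: "Acme" raises twice in 2020
toy = pd.DataFrame({
    "Company/Brand": ["Acme", "Acme", "Beta", "Gamma"],
    "Funding Year": [2020, 2020, 2020, 2021],
})

# Number of funding records per year, ordered chronologically
rounds_per_year = toy["Funding Year"].value_counts().sort_index()

# Number of distinct start-ups funded per year
startups_per_year = toy.groupby("Funding Year")["Company/Brand"].nunique()

print(rounds_per_year.to_dict())    # {2020: 3, 2021: 1}
print(startups_per_year.to_dict())  # {2020: 2, 2021: 1}
```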

In [None]:
# Creating a bar chart to illustrate the number of start-ups that got funded each year

# Funding_Year_Chart = sns.barplot(x = Number_of_funding_each_year.index, y = Number_of_funding_each_year.values);
# Funding_Year_Chart.set(xlabel = "Funding Year", ylabel = "Number of funding each year", title = "Number Of Start-Ups That Got Funded Each Year");

# Set the figure background color to light blue
# fig, ax = plt.subplots(facecolor='lightblue')

# Set the color palette
sns.set_palette('Blues')

# Create a countplot
sns.countplot(x="Funding Year", data=df)

# Set labels and title
plt.xlabel("Funding Year")
plt.ylabel("Number of funded start-ups")
plt.title("Number Of Start-Ups That Got Funded Each Year")

# Save the chart as an image file
plt.savefig("Number Of Start-Ups That Got Funded Each Year.png")

# Show the plot
plt.show()

: 

In 2018, 525 start-ups received funding. This number declined to 89 in 2019, which is the year with the lowest number of start-ups receiving funding. There was a sharp rise to 1048 in 2020. The number increased further to 1178 in 2021, which is the year with the highest number of start-ups receiving funding.

# Question 2: What is the total funding amount received each year?

In [None]:
# Using groupby to calculate the amount of funding received each year

Total_funding_each_year = df.groupby(["Funding Year"]).sum(numeric_only=True).reset_index()
Total_funding_each_year

: 
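Calling `.sum(numeric_only=True)` on the whole grouped frame adds up every numeric column. Selecting `Amount($)` before summing restricts the result to the column the question is about. A small sketch with made-up amounts:

```python
import pandas as pd

# Hypothetical funding records
toy = pd.DataFrame({
    "Funding Year": [2018, 2018, 2019],
    "Amount($)": [1_000_000, 250_000, 400_000],
})

# as_index=False keeps "Funding Year" as a regular column, like reset_index()
total_per_year = toy.groupby("Funding Year", as_index=False)["Amount($)"].sum()
print(total_per_year)
```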

In [None]:
# Creating a bar chart to illustrate the total funding amount each year

# Funding_Amount_Chart = sns.barplot(x = Total_funding_each_year["Funding Year"], y = Total_funding_each_year["Amount($)"], errorbar = None);
# Funding_Amount_Chart.set(xlabel = "Funding Year", ylabel = "Amount($)", title = "Total Funding Amount Received Each Year");

plt.title("Total Funding Amount Each Year")
Funding_Amount_Chart = sns.barplot(y = Total_funding_each_year["Amount($)"], x = Total_funding_each_year["Funding Year"], palette='Blues', errorbar = None)
Funding_Amount_Chart.set(xlabel = "Funding Year", ylabel = "Amount($)")

# Save the chart as an image file
plt.savefig("Total Funding Amount Each Year.png")

plt.show()

: 

In 2018, the total funding amount received was 6,641,523,177 US Dollars. This amount declined to 3,336,433,200 US Dollars in 2019, which is the year with the lowest funding amount received. There was a sharp rise to 90,691,243,104 US Dollars in 2020. The amount increased further to 179,293,326,000 US Dollars in 2021, which is the year with the highest funding amount received.

# Question 3: Which ten start-ups received the most funding?

In [None]:
# The amounts received by the start-ups will be grouped by Company/Brand

Funded_Startups = df.groupby("Company/Brand").sum(numeric_only=True).reset_index()

: 

In [None]:
# Sorting from the highest to lowest and applying .head(10)

Funded_Startups.sort_values(['Amount($)'], ascending = False, inplace=True)
Most_Funded_Startups = Funded_Startups.head(10)
Most_Funded_Startups

: 
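Sorting the whole frame in place and taking `.head(10)` works; `DataFrame.nlargest` is an equivalent shortcut that leaves the frame's order unchanged. Note that the least-funded cell below relies on the in-place sort, so it would need its own sort if this shortcut replaced it. A sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical per-company totals
toy = pd.DataFrame({
    "Company/Brand": ["A", "B", "C", "D"],
    "Amount($)": [500, 2_000, 0, 1_200],
})

# Two largest totals, returned in descending order
top_two = toy.nlargest(2, "Amount($)")
print(top_two["Company/Brand"].tolist())  # ['B', 'D']
```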

In [None]:
# Creating a horizontal bar chart to illustrate the 10 start-ups that received the most funding

# Most_Funded_Startups_Chart = sns.barplot(x = Most_Funded_Startups["Amount($)"], y = Most_Funded_Startups["Company/Brand"], orient = 'h', errorbar = None);
# Most_Funded_Startups_Chart.set(xlabel = "Funding Amount Received", ylabel = "Company/Brand", title = "The 10 Start-Ups That Received The Most Funding");

# Set the color palette
color_palette = sns.color_palette('Blues')

# Create a horizontal bar plot
sns.barplot(x="Amount($)", y="Company/Brand", data=Most_Funded_Startups, palette=color_palette, orient='h')

# Set labels and title
plt.xlabel("Funding Amount Received")
plt.ylabel("Company/Brand")
plt.title("The 10 Start-Ups That Received The Most Funding")

# Save the chart as an image file
plt.savefig("The 10 Start-Ups That Received The Most Funding.png")

# Show the plot
plt.show()

: 

Alteria Capital got the highest funding amount (150,000,000,000 US Dollars), followed by Reliance Retail Ventures Ltd (70,000,000,000 US Dollars), Snowflake (3,000,000,000 US Dollars), Reliance (2,200,000,000 US Dollars), Swiggy (1,956,000,000 US Dollars), VerSe Innovation (1,550,000,000 US Dollars), BYJU'S (1,260,000,000 US Dollars), Dream Sports (1,240,000,000 US Dollars), Zomato (1,239,000,000 US Dollars) and Zetwerk (925,200,000 US Dollars).

# Question 4: Which ten start-ups received the least funding?

In [None]:
# Keeping only positive totals; since the frame is sorted in descending order, tail(10) gives the ten smallest
Least_Funded_Startups = Funded_Startups[Funded_Startups['Amount($)']>0].tail(10)
Least_Funded_Startups

: 
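Because `Funded_Startups` was sorted in descending order, `.tail(10)` returns the ten smallest positive totals, listed from largest to smallest. `DataFrame.nsmallest` returns the same lowest amounts in ascending order (tie-breaking at the cut-off may differ), and `keep="all"` keeps every row tied at the cut-off instead of dropping some of them. A sketch with hypothetical totals:

```python
import pandas as pd

# Hypothetical per-company totals, including a zero and a tie
toy = pd.DataFrame({
    "Company/Brand": ["A", "B", "C", "D", "E"],
    "Amount($)": [0, 1_460, 1_460, 7_300, 50_000],
})

# Exclude zero amounts, as the notebook does
positive = toy[toy["Amount($)"] > 0]

# Two smallest positive totals
bottom_two = positive.nsmallest(2, "Amount($)")
print(sorted(bottom_two["Company/Brand"].tolist()))  # ['B', 'C']

# Asking for one row but keeping every row tied for the smallest value
bottom_ties = positive.nsmallest(1, "Amount($)", keep="all")
print(sorted(bottom_ties["Company/Brand"].tolist()))  # ['B', 'C']
```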

In [None]:
# Creating a horizontal bar chart to illustrate the 10 start-ups that received the least funding

# Least_Funded_Startups_Chart = sns.barplot(x = Least_Funded_Startups["Amount($)"], y = Least_Funded_Startups["Company/Brand"], orient = 'h', errorbar = None);
# Least_Funded_Startups_Chart.set(xlabel = "Funding Amount Received", ylabel = "Company/Brand", title = "The 10 Start-Ups That Received The Least Funding");

# Set the color palette
color_palette = sns.color_palette('Blues')

# Create a horizontal bar plot
sns.barplot(x="Amount($)", y="Company/Brand", data=Least_Funded_Startups, palette=color_palette, orient='h')

# Set labels and title
plt.xlabel("Funding Amount Received")
plt.ylabel("Company/Brand")
plt.title("The 10 Start-Ups That Received The Least Funding")

# Save the chart as an image file
plt.savefig("The 10 Start-Ups That Received The Least Funding.png")

# Show the plot
plt.show()

: 

Classworks India, Next Digital Solutions and Enlyft Digital Solutions Private Limited received the joint lowest funding amount (1,460 US Dollars each), followed by Antariksh Waste Ventures Pvt Ltd (7,300 US Dollars) and Mombay (7,500 US Dollars). Glii and Authmetrik came next with 10,000 US Dollars each, followed by Monech, Medicus and Teach Us with 12,700 US Dollars each.

# Question 5: Which ten sectors received the most funding?

In [None]:
# The amounts received by the start-ups will be grouped by Sector 

Funded_Sectors = df.groupby("Sector").sum(numeric_only=True).reset_index()
Funded_Sectors

: 

In [None]:
# Sorting from the highest to lowest and applying .head(10)

Funded_Sectors.sort_values(["Amount($)"], ascending = False, inplace=True)
Most_Funded_Sectors = Funded_Sectors.head(10)
Most_Funded_Sectors

: 

In [None]:
# Creating a horizontal bar chart to illustrate the 10 sectors that received the most funding

# Most_Funded_Sectors_Chart = sns.barplot(x = Most_Funded_Sectors["Amount($)"], y = Most_Funded_Sectors["Sector"], orient = 'h', errorbar = None);
# Most_Funded_Sectors_Chart.set(xlabel = "Funding Amount Received", ylabel = "Sector", title = "The 10 Sectors That Received The Most Funding");

# Set the color palette
color_palette = sns.color_palette('Blues')

# Create a horizontal bar plot
sns.barplot(x="Amount($)", y="Sector", data=Most_Funded_Sectors, palette=color_palette, orient='h')

# Set labels and title
plt.xlabel("Funding Amount Received")
plt.ylabel("Sector")
plt.title("The 10 Sectors That Received The Most Funding")

# Save the chart as an image file
plt.savefig("The 10 Sectors That Received The Most Funding.png")

# Show the plot
plt.show()

: 

The Banking and Financial Services sector received the highest amount of funding (160,149,076,746 US Dollars), followed by the E-Commerce and Retail sector (76,411,115,900 US Dollars), the Information Technology (IT) sector (9,313,075,410 US Dollars), the Education sector (6,915,347,290 US Dollars), the Agriculture and Food Production sector (4,369,672,700 US Dollars), the Healthcare and Fitness sector (3,970,320,985 US Dollars), the Manufacturing sector (3,237,776,242 US Dollars), the Career, Special Duties and Personal Development sector (3,011,784,860 US Dollars), the Media and Entertainment sector (2,941,253,400 US Dollars) and the Transportation and Tourism sector (2,791,082,500 US Dollars).

# Question 6: Which ten sectors received the least funding?

In [None]:
# Keeping only positive totals; since the frame is sorted in descending order, tail(10) gives the ten smallest
Least_Funded_Sectors = Funded_Sectors[Funded_Sectors['Amount($)']>0].tail(10)
Least_Funded_Sectors

: 

In [None]:
# Creating a horizontal bar chart to illustrate the 10 sectors that received the least funding

# Least_Funded_Sectors_Chart = sns.barplot(x = Least_Funded_Sectors["Amount($)"], y = Least_Funded_Sectors["Sector"], orient = 'h', errorbar = None);
# Least_Funded_Sectors_Chart.set(xlabel = "Funding Amount Received", ylabel = "Sector", title = "The 10 Sectors That Received The Least Funding");

# Set the color palette
color_palette = sns.color_palette('Blues')

# Create a horizontal bar plot
sns.barplot(x="Amount($)", y="Sector", data=Least_Funded_Sectors, palette=color_palette, orient='h')

# Set labels and title
plt.xlabel("Funding Amount Received")
plt.ylabel("Sector")
plt.title("The 10 Sectors That Received The Least Funding")

# Save the chart as an image file
plt.savefig("The 10 Sectors That Received The Least Funding.png")

# Show the plot
plt.show()

: 

The Defense sector received the lowest amount of funding (1,000,000 US Dollars), followed by the Cultural Heritage sector (1,064,000 US Dollars), the Social Development sector (7,717,000 US Dollars), the Water and Environment sector (22,600,000 US Dollars), the Real Estate sector (134,072,328 US Dollars), the Construction sector (234,315,000 US Dollars), the Blockchain sector (303,002,000 US Dollars), the Fashion and Beauty sector (763,150,200 US Dollars) and the Energy sector (1,220,325,000 US Dollars). The remaining entry, N/A, refers to start-ups whose sector was not disclosed.

# Question 7: Which ten headquarters received the most funding?

In [None]:
# The amounts received by the start-ups will be grouped by HeadQuarters 

Funded_HeadQuarters = df.groupby("HeadQuarter").sum(numeric_only=True).reset_index()
Funded_HeadQuarters

: 

In [None]:
# Sorting from the highest to lowest and applying .head(10)

Funded_HeadQuarters.sort_values(["Amount($)"], ascending = False, inplace=True)
Most_Funded_HeadQuarters = Funded_HeadQuarters.head(10)
Most_Funded_HeadQuarters

: 

In [None]:
# Creating a horizontal bar chart to illustrate the 10 headquarters that received the most funding.

# Most_Funded_HeadQuarters_Chart = sns.barplot(x = Most_Funded_HeadQuarters["Amount($)"], y = Most_Funded_HeadQuarters["HeadQuarter"], orient = 'h', errorbar = None);
# Most_Funded_HeadQuarters_Chart.set(xlabel = "Funding Amount Received", ylabel = "Headquarter", title = "The 10 Headquarters That Received The Most Funding");

# Set the color palette (one shade per bar so the colors do not repeat)
color_palette = sns.color_palette('Blues', n_colors=len(Most_Funded_HeadQuarters))

# Create a horizontal bar plot
sns.barplot(x="Amount($)", y="HeadQuarter", data=Most_Funded_HeadQuarters, palette=color_palette, orient='h')

# Set labels and title
plt.xlabel("Funding Amount Received")
plt.ylabel("Headquarter")
plt.title("The 10 Headquarters That Received The Most Funding")

# Save the chart as an image file
plt.savefig("The 10 Headquarters That Received The Most Funding.png")

# Show the plot
plt.show()

: 

Start-ups in Maharashtra received the highest funding amount (233,189,364,405 US Dollars), followed by start-ups in Karnataka (23,961,669,610 US Dollars), Haryana (8,197,616,180 US Dollars), Delhi (4,252,140,240 US Dollars), California (3,419,500,000 US Dollars), Tamil Nadu (1,401,925,826 US Dollars), Rajasthan (625,383,460 US Dollars), Uttar Pradesh (596,949,000 US Dollars) and Telangana (420,662,440 US Dollars). N/A refers to start-ups whose headquarters were not disclosed (2,027,544,300 US Dollars).

# Question 8: Which ten headquarters received the least funding?

In [None]:
# Selecting the least amounts greater than 0

Least_Funded_HeadQuarters = Funded_HeadQuarters[Funded_HeadQuarters['Amount($)']>0].tail(10)
Least_Funded_HeadQuarters

: 

In [None]:
# Creating a horizontal bar chart to illustrate the 10 headquarters that received the least funding.

# Least_Funded_HeadQuarters_Chart = sns.barplot(x = Least_Funded_HeadQuarters["Amount($)"], y = Least_Funded_HeadQuarters["HeadQuarter"], orient = 'h', errorbar = None);
# Least_Funded_HeadQuarters_Chart.set(xlabel = "Funding Amount Received", ylabel = "Headquarter", title = "The 10 Headquarters That Received The Least Funding");


# Set the color palette (one shade per bar so the colors do not repeat)
color_palette = sns.color_palette('Blues', n_colors=len(Least_Funded_HeadQuarters))

# Create a horizontal bar plot
sns.barplot(x="Amount($)", y="HeadQuarter", data=Least_Funded_HeadQuarters, palette=color_palette, orient='h')

# Set labels and title
plt.xlabel("Funding Amount Received")
plt.ylabel("Headquarter")
plt.title("The 10 Headquarters That Received The Least Funding")

# Save the chart as an image file
plt.savefig("The 10 Headquarters That Received The Least Funding.png")

# Show the plot
plt.show()

: 

Start-ups in Jharkand received the least funding (100,000 US Dollars), followed by start-ups in Guntur (130,000 US Dollars), Newcastle Upon Tyne (300,000 US Dollars), Andhra Pradesh (800,000 US Dollars), Silvassa and Rajastan (1,000,000 US Dollars each), Seattle and Hyderebad (1,100,000 US Dollars each), Riyadh (1,300,000 US Dollars) and Seoul (1,400,000 US Dollars). Four of these locations (Newcastle Upon Tyne, Seattle, Riyadh and Seoul) are outside India.
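The labels 'Rajastan' and 'Hyderebad' in the list above look like misspellings of Rajasthan (which also appears in the Question 7 results) and Hyderabad, so their amounts may belong with those locations. A sketch of mapping label variants to one canonical name before grouping follows. The `LOCATION_FIXES` mapping is hypothetical and would need to be checked against the full list of HeadQuarter values; the amounts are invented:

```python
import pandas as pd

# Hypothetical mapping from misspelled labels to canonical names
LOCATION_FIXES = {"Rajastan": "Rajasthan", "Hyderebad": "Hyderabad"}

# Invented rows containing both spellings of each location
toy = pd.DataFrame({
    "HeadQuarter": ["Rajasthan", "Rajastan", "Hyderebad", "Hyderabad"],
    "Amount($)": [600.0, 100.0, 110.0, 90.0],
})

# Replace known variants, then regroup so each location has a single total
toy["HeadQuarter"] = toy["HeadQuarter"].replace(LOCATION_FIXES)
totals = toy.groupby("HeadQuarter")["Amount($)"].sum()
print(totals)
```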

# Question 9: What is the trend of funding received by start-ups in the top 5 headquarters from 2018 to 2021?

From the answer to Question 7, the top 5 headquarters where start-ups received the highest funding are Maharashtra (233,189,364,405 US Dollars), Karnataka (23,961,669,610 US Dollars), Haryana (8,197,616,180 US Dollars), Delhi (4,252,140,240 US Dollars) and California (3,419,500,000 US Dollars), in descending order.

In [None]:
# Grouping 'Maharashtra','Karnataka','Haryana', 'Delhi','California' in the HeadQuarter column by the funding years
Top_5_HeadQuarter_Funding_Trend = df.query("HeadQuarter == ['Maharashtra','Karnataka','Haryana', 'Delhi','California']").groupby(["HeadQuarter","Funding Year"]).sum(numeric_only=True).reset_index()

# Sorting the funding years in ascending order
Top_5_HeadQuarter_Funding_Trend.sort_values(["Funding Year"], ascending = True, inplace=True)
Top_5_HeadQuarter_Funding_Trend

: 
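The grouped table lists one row per headquarters and year, which is hard to scan for a trend. A minimal sketch that reshapes it with `pivot_table` (one row per headquarters, one column per funding year), using `isin` in place of the list comparison inside `query`. The rows are invented stand-ins for the merged dataset `df`:

```python
import pandas as pd

TOP_5 = ["Maharashtra", "Karnataka", "Haryana", "Delhi", "California"]

# Hypothetical rows; real values would come from the merged dataset `df`
toy = pd.DataFrame({
    "HeadQuarter": ["Maharashtra", "Maharashtra", "Karnataka", "Goa", "Delhi"],
    "Funding Year": [2020, 2021, 2020, 2021, 2021],
    "Amount($)": [900.0, 1200.0, 300.0, 40.0, 150.0],
})

# Keep only the top 5 headquarters, then spread the years across columns
trend = toy[toy["HeadQuarter"].isin(TOP_5)].pivot_table(
    index="HeadQuarter",
    columns="Funding Year",
    values="Amount($)",
    aggfunc="sum",
    fill_value=0,
)
print(trend)
```

Years with no recorded funding for a headquarters show as 0 because of `fill_value=0`.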

In [None]:
# Set the color palette
color_palette = sns.color_palette('Blues')

# Creating a bar chart to illustrate the trend of funding received by start-ups in the top 5 headquarters from 2018 to 2021.

sns.barplot(x="Funding Year", y="Amount($)", hue="HeadQuarter", data=Top_5_HeadQuarter_Funding_Trend, palette=color_palette)
plt.title("The Trend Of Funding Received By Startups In The Top 5 Headquarters From 2018 to 2021")

# Moving the legend to the top centre of the plot
plt.legend(loc='upper center', bbox_to_anchor=(0.5, 1))

# Save the chart as an image file
plt.savefig("The Trend Of Funding Received By Startups In The Top 5 Headquarters From 2018 to 2021.png")

# Show the plot
plt.show()

: 

Start-ups in Maharashtra received very high funding in 2020 and 2021 compared to start-ups in other headquarters in these same years.

# Evaluation
Since the p-value (0.9997970614101779) is greater than the significance level (0.05), we fail to reject the null hypothesis. This means the data provide no evidence that the sector a start-up belongs to influenced the funding amount it received; it does not prove that the null hypothesis is true.
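The test that produced this p-value is not shown in this section. As an illustration only, the sketch below assumes a one-way ANOVA (`scipy.stats.f_oneway`) across sectors, which may differ from the test actually used, and applies the same decision rule to the reported p-value. The helper `decide` and the toy data are hypothetical:

```python
import pandas as pd
from scipy import stats


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the hypothesis-test decision for a given p-value."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical data: three sectors with the same mean funding amount
toy = pd.DataFrame({
    "Sector": ["A"] * 3 + ["B"] * 3 + ["C"] * 3,
    "Amount($)": [10, 11, 12, 12, 10, 11, 11, 12, 10],
})

# One sample of amounts per sector, compared with a one-way ANOVA
samples = [group.to_numpy() for _, group in toy.groupby("Sector")["Amount($)"]]
f_stat, p_value = stats.f_oneway(*samples)
print(f"F = {f_stat:.3f}, p = {p_value:.3f} -> {decide(p_value)}")

# Applying the same rule to the p-value reported in this notebook
print(decide(0.9997970614101779))
```

Because every toy sector has the same mean, the test finds no difference between them; as with the notebook's result, a large p-value only says the data are consistent with the null hypothesis.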

The top 5 start-ups that received the highest funding are Alteria Capital (150,000,000,000 US Dollars), Reliance Retail Ventures Ltd (70,000,000,000 US Dollars), Snowflake (3,000,000,000 US Dollars), Reliance (2,200,000,000 US Dollars) and Swiggy (1,956,000,000 US Dollars), in descending order.

The top 5 sectors whose start-ups received the highest funding are the Banking and Financial Services sector (160,149,076,746 US Dollars), the E-Commerce and Retail sector (76,411,115,900 US Dollars), the Information Technology (IT) sector (9,313,075,410 US Dollars), the Education sector (6,915,347,290 US Dollars) and the Agriculture and Food Production sector (4,369,672,700 US Dollars) in descending order.

The top 5 headquarters where start-ups received the highest funding are Maharashtra (233,189,364,405 US Dollars), Karnataka (23,961,669,610 US Dollars), Haryana (8,197,616,180 US Dollars), Delhi (4,252,140,240 US Dollars) and California (3,419,500,000 US Dollars) in descending order.

The charts for the 10 start-ups and the 10 sectors that received the most funding are similar: in each, the top two entries (Alteria Capital and Reliance Retail Ventures Ltd; the Banking and Financial Services and the E-Commerce and Retail sectors) far exceed the rest. The chart for the 10 headquarters that received the most funding follows a similar pattern, although it is dominated by a single entry, Maharashtra. The two start-ups that received the highest funding (Alteria Capital and Reliance Retail Ventures Ltd) will be evaluated further, since their funding amounts are so high.

In [None]:
# Identifying the rows that have 'Alteria Capital' in the "Company/Brand" column of the merged dataset

df.loc[df["Company/Brand"] == 'Alteria Capital']

: 

Alteria Capital, the start-up that received the highest funding amount, was funded in 2021. It belongs to the Banking and Financial Services sector and is located in Maharashtra. The investor responsible for this funding will be identified by evaluating the original 2021 dataset.

In [None]:
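# Loading the original 2021 dataset with a different name
ddf4 = pd.read_csv('startup_funding2021.csv')

# Identifying the row that has 'Alteria Capital' in the "Company/Brand" column of the original 2021 dataset
ddf4.loc[ddf4["Company/Brand"] == 'Alteria Capital']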

: 

Unfortunately, the investor information for this row is not given. However, the row shows that the city of this start-up is Mumbai.

In [None]:
# Identifying the rows that have 'Reliance Retail Ventures Ltd' in the "Company/Brand" column of the merged dataset

df.loc[df["Company/Brand"] == 'Reliance Retail Ventures Ltd']

: 

In [None]:
# Loading the original 2020 dataset with a different name

ddf3 = pd.read_csv('startup_funding2020.csv')

# Identifying the row that has 'Reliance Retail Ventures Ltd' in the "Company/Brand" column of the original 2020 dataset

ddf3.loc[ddf3["Company/Brand"] == 'Reliance Retail Ventures Ltd']

: 

The investors responsible for this funding are Silver Lake and Mubadala Investment Company. The city of this start-up is also Mumbai.
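The look-ups for Alteria Capital and Reliance Retail Ventures Ltd repeat the same steps on different yearly files. A hypothetical helper, `find_company`, could search several raw yearly tables at once and tag each match with its year. The toy frames are invented stand-ins for the original CSV files; the "Investor" column name follows the earlier observation that the 2019 to 2021 datasets include it:

```python
import pandas as pd


def find_company(frames: dict, name: str) -> pd.DataFrame:
    """Return all rows for `name` across yearly frames, tagged with their year."""
    matches = [
        frame.loc[frame["Company/Brand"] == name].assign(Year=year)
        for year, frame in frames.items()
    ]
    return pd.concat(matches, ignore_index=True)


# Invented stand-ins for the original yearly CSV files
raw = {
    2020: pd.DataFrame({"Company/Brand": ["Company X", "Company Y"],
                        "Investor": ["Investor A", "Investor B"]}),
    2021: pd.DataFrame({"Company/Brand": ["Company X"],
                        "Investor": [None]}),
}

result = find_company(raw, "Company X")
print(result)
```

With the real files, the call would look like `find_company({2020: pd.read_csv('startup_funding2020.csv'), 2021: pd.read_csv('startup_funding2021.csv')}, 'Reliance Retail Ventures Ltd')`.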

# Recommendation

Based on the results of the analysis, the team should consider Mumbai, in the state of Maharashtra, as the start-up location. The Banking and Financial Services sector and the E-Commerce and Retail sector received the most funding and are the top sectors to consider. Silver Lake and Mubadala Investment Company, the investors behind the second-largest funding amount (Reliance Retail Ventures Ltd), are worth approaching for funding.