# **CAREER ACCELERATOR LP1 - PROJECT**

### **Introduction:**

Ideas, creativity, and execution are essential for a start-up to flourish. But are they enough? Investors provide start-ups and other entrepreneurial ventures with the capital, popularly known as "funding", to think big, grow rich, and leave a lasting impact. In this project, you are going to analyse funding received by start-ups in India from 2018 to 2021. You will find the data for each year of funding in a separate CSV file in the dataset provided. In these files you'll find the start-ups' details, the funding amounts received, and the investors' information.


### **Scenario:**
My team has been tasked with analyzing the Indian Startup Ecosystem. The analysis should provide insight into the best course of action for the company.

### **Task:**

Our task is to develop a unique story from this dataset by stating and testing a hypothesis, asking questions, performing analysis, and sharing insights with appropriate visualisations.

# **INDIAN STARTUP ECOSYSTEM ANALYSIS 2018 - 2021**

# **1. Business Understanding**

To understand anything, we must first break it apart and examine its components before we can understand how it works as a whole. The task is to perform an analysis of the 'Indian Start-Up Ecosystem', but what exactly does each of these terms mean? Let's dive into the definitions of each element in the task.

#### **Definitions** ####
##### **Ecosystem:**
In the natural sciences, an ‘ecosystem’ is generally defined as a system, or a group of interconnected elements, formed by the interaction of a community of organisms with their environment.

##### **Startup:**
A startup or start-up is a company or project undertaken by an entrepreneur to seek, develop, and validate a scalable business model. Startups are new businesses that intend to grow large beyond the solo founder. At the beginning, startups face high uncertainty and have high rates of failure, but a minority of them do go on to become successful and influential.

##### **India:** 
India is a country that occupies the greater part of South Asia. India is made up of 28 states and eight union territories, and its national capital is New Delhi. It is the seventh-largest country by area and the most populous country as of June 2023.

A startup ecosystem is a community of people, startups in their various stages, and various types of organizations (funders, governments, etc.) in a location (physical or virtual), interacting as a system to create and scale new startups.

Neither biological nor startup ecosystems can be created, designed or built by an outside actor. While this makes the term ‘start-up ecosystem’ hard to grasp, it does underline that start-ups operate in complex and highly dynamic environments. For this reason, it is particularly important to take sufficient time to analyse and understand the ecosystem before designing interventions to partake in it.

Just like biological ecosystems, a startup ecosystem consists of different elements, which can be individuals, groups, organisations and institutions that form a community by interacting with one another, but also environmental determinants that have an influence on how these actors work and interconnect; in startup ecosystems, these can be laws and policies or cultural norms.

![**A Start-Up Ecosystem**](https://upload.wikimedia.org/wikipedia/commons/thumb/3/35/StartupEcosystem.png/300px-StartupEcosystem.png)

#### **Previous Studies / Research**

In nature, for any and all participants to thrive, the ecosystem must be healthy and in balance. For a company, this could be the best indicator of whether or not to invest in an ecosystem. Previous studies have identified five key aspects of an ecosystem that can be tracked to measure its vibrancy:


**1. What is the Density and ecosystem value?**  \
A first step in mapping an ecosystem is to look at its actual size, growth, and value. This can be tracked by the number of new startups founded in a region during a specific period, and also by the total combined valuation of these companies over time, broken down by funding year to monitor each cohort. The number of exits, especially the larger ones, is also an interesting indicator of startup success.

**2. How does the Funding activity look in the Ecosystem?** \
To assess the health of a startup ecosystem, we need to keep an eye on the quality, quantity, and ease of access to funding. To evaluate ease of access, we can track early-stage funding rounds: their volume and growth over time show whether start-ups are getting the support they need to get their business off the ground. The location of the investors helps identify foreign VCs already investing in the Indian startup ecosystem and allows us to build bridges for potential collaboration and partnerships.

**3. Market reach and scaling opportunities** \
The easiest way to gauge the success of the startups in an ecosystem is to watch its unicorns (companies valued at over $1 billion). Although this metric may become less relevant in the future as the number of unicorns increases, it remains an interesting indicator of startup ecosystem success.

**4. Knowledge and innovation** \
Innovation and entrepreneurship often flourish alongside world-class knowledge institutes and R&D incentives. These institutions often foster high-impact innovation, collaboration, and success across sectors. You can measure the level of innovation and new technology in your local ecosystem through research and patent activity, and by keeping tabs on the number of spinouts your local knowledge institutions produce. 

**5. Connectedness, Talent, Diversity, and more…** \
A vibrant ecosystem is not simply a collection of isolated elements; the connections between the elements matter just as much as the elements themselves. The metrics for connectedness and for access to quality, diverse talent are a little more complex. You could, however, look at the number of accelerators and incubators in your region, at job boards to assess the type of talent your startups are looking for most, and at investment heatmaps to understand the breadth of industries or depth of expertise present in your community.

### **Business Objective** 
To find out whether to invest in the Indian start-up ecosystem or not.

#### **Hypothesis**
Null - The Indian Startup Ecosystem is healthy and worth an investment\
Alternative  - The Indian Startup Ecosystem is weak and not worthy of investment

#### **Key Questions**

Using metrics similar to those of previous researchers enables the company to easily compare the Indian case with other global ecosystems, giving the company a broader worldview and the ability to make a more informed decision. 
Our key questions are therefore heavily influenced by the body of previous research.

**1. What is the Total Value of the Indian Startup Ecosystem?**
* How many startups were founded in the period?
* How much money has the ecosystem received in funding?

**2. How has the Ecosystem changed over time?**
* What is the change in performance year on year?
* Which region has the best performance?

**3. What is the Success rate of Start-ups in the ecosystem?**
* Are there any unicorns from the ecosystem?
* How many unicorns are there?

**4. Who is already in the Ecosystem?**
* How many companies are already involved in the ecosystem?
* What fields are they invested in?

**5. Which is the best performing sector in the ecosystem?**
* Sector with highest amount raised
* Sector with most start-ups


#### **Success Criteria**

1. To produce a dashboard that showcases the metrics monitoring the health of the Indian Start-up Ecosystem.
2. To provide an objective metric that can be used to compare with other startup ecosystems.
3. If the decision is to invest, to provide guidance on the best path of investment into the Indian Startup Ecosystem.

# **2. Data Understanding**

### **2.1: Data Preparation**

#### **2.1.1: Importations**

In [3]:
# import all necessary libraries
import os
import pandas as pd
import numpy as np
import pyodbc
from dotenv import dotenv_values
import matplotlib.pyplot as plt
import seaborn as sns
from thefuzz import process, fuzz

#remove pandas display limits
pd.set_option('display.max_columns', None)

#hide warnings
import warnings

warnings.filterwarnings('ignore')


# confirm that all libraries loaded
print("all libraries loaded successfully")

all libraries loaded successfully


#### **2.1.2: Database Connection**

In [4]:
#reading data from database
#Load environment variables from .env file into a dictionary variable
environment_variables=dotenv_values('.env')

# Get the values for the credentials you set in the '.env' file
database = environment_variables.get("DB_NAME")
server = environment_variables.get("SERVER_NAME")
username = environment_variables.get("USERNAME")
password = environment_variables.get("PASSWORD")

#Connecting to the database
connection_string = f"DRIVER={{SQL Server}};SERVER={server};DATABASE={database};UID={username};PWD={password}"

# Using the connect method of the pyodbc library.
# This will connect to the server. 
connection=pyodbc.connect(connection_string)

print("connected successfully")

connected successfully


### **Note:** If the connection stops working, try restarting the kernel
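
A connection can also fail because a credential is missing from the `.env` file, in which case restarting the kernel will not help. The sketch below is one hedged way to catch that before calling `pyodbc.connect`. The helper `build_connection_string`, the constant `REQUIRED_KEYS`, and the values in `example_env` are hypothetical names and placeholders introduced here for illustration; only the four key names and the connection string layout come from the cell above.

```python
# Keys the notebook reads from the .env file
REQUIRED_KEYS = ("DB_NAME", "SERVER_NAME", "USERNAME", "PASSWORD")


def build_connection_string(env):
    """Return an ODBC connection string, or raise if a credential is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"Missing .env entries: {', '.join(missing)}")
    # Same layout as the connection string used in the notebook cell
    return (
        "DRIVER={SQL Server};"
        f"SERVER={env['SERVER_NAME']};"
        f"DATABASE={env['DB_NAME']};"
        f"UID={env['USERNAME']};"
        f"PWD={env['PASSWORD']}"
    )


# Placeholder values only; real credentials stay in the .env file
example_env = {
    "DB_NAME": "example_db",
    "SERVER_NAME": "example-server",
    "USERNAME": "example_user",
    "PASSWORD": "example_password",
}
print(build_connection_string(example_env))
```

In the notebook, the same helper could be called as `build_connection_string(environment_variables)` and its result passed to `pyodbc.connect`.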

#### **2.1.3: Reading the Data**

##### *YEAR: 2018*

In [5]:
# import 2018 data from GitHub
# Available from Azubi Africa Career Accelerator LP1 Repository as csv

df_2018 = pd.read_csv("https://raw.githubusercontent.com/Azubi-Africa/Career_Accelerator_LP1-Data_Analysis/main/startup_funding2018.csv")

df_2018.head()

Unnamed: 0,Company Name,Industry,Round/Series,Amount,Location,About Company
0,TheCollegeFever,"Brand Marketing, Event Promotion, Marketing, S...",Seed,250000,"Bangalore, Karnataka, India","TheCollegeFever is a hub for fun, fiesta and f..."
1,Happy Cow Dairy,"Agriculture, Farming",Seed,"₹40,000,000","Mumbai, Maharashtra, India",A startup which aggregates milk from dairy far...
2,MyLoanCare,"Credit, Financial Services, Lending, Marketplace",Series A,"₹65,000,000","Gurgaon, Haryana, India",Leading Online Loans Marketplace in India
3,PayMe India,"Financial Services, FinTech",Angel,2000000,"Noida, Uttar Pradesh, India",PayMe India is an innovative FinTech organizat...
4,Eunimart,"E-Commerce Platforms, Retail, SaaS",Seed,—,"Hyderabad, Andhra Pradesh, India",Eunimart is a one stop solution for merchants ...


##### *YEAR: 2019*

In [6]:
# import 2019 data from csv
# forward slash avoids the invalid "\s" escape and works on every operating system
df_2019 = pd.read_csv("datasets/startup_funding2019.csv")
df_2019.head()

Unnamed: 0,Company/Brand,Founded,HeadQuarter,Sector,What it does,Founders,Investor,Amount($),Stage
0,Bombay Shaving,,,Ecommerce,Provides a range of male grooming products,Shantanu Deshpande,Sixth Sense Ventures,"$6,300,000",
1,Ruangguru,2014.0,Mumbai,Edtech,A learning platform that provides topic-based ...,"Adamas Belva Syah Devara, Iman Usman.",General Atlantic,"$150,000,000",Series C
2,Eduisfun,,Mumbai,Edtech,It aims to make learning fun via games.,Jatin Solanki,"Deepak Parekh, Amitabh Bachchan, Piyush Pandey","$28,000,000",Fresh funding
3,HomeLane,2014.0,Chennai,Interior design,Provides interior designing solutions,"Srikanth Iyer, Rama Harinath","Evolvence India Fund (EIF), Pidilite Group, FJ...","$30,000,000",Series D
4,Nu Genes,2004.0,Telangana,AgriTech,"It is a seed company engaged in production, pr...",Narayana Reddy Punyala,Innovation in Food and Agriculture (IFA),"$6,000,000",


##### *YEAR: 2020*

In [7]:
#reading the 2020 SQL table into a dataframe

query='''SELECT * 
        FROM dbo.LP1_startup_funding2020'''
        
df_2020=pd.read_sql(query,connection)

df_2020.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage,column10
0,Aqgromalin,2019.0,Chennai,AgriTech,Cultivating Ideas for Profit,"Prasanna Manogaran, Bharani C L",Angel investors,200000.0,,
1,Krayonnz,2019.0,Bangalore,EdTech,An academy-guardian-scholar centric ecosystem ...,"Saurabh Dixit, Gurudutt Upadhyay",GSF Accelerator,100000.0,Pre-seed,
2,PadCare Labs,2018.0,Pune,Hygiene management,Converting bio-hazardous waste to harmless waste,Ajinkya Dhariya,Venture Center,,Pre-seed,
3,NCOME,2020.0,New Delhi,Escrow,Escrow-as-a-service platform,Ritesh Tiwari,"Venture Catalysts, PointOne Capital",400000.0,,
4,Gramophone,2016.0,Indore,AgriTech,Gramophone is an AgTech platform enabling acce...,"Ashish Rajan Singh, Harshit Gupta, Nishant Mah...","Siana Capital Management, Info Edge",340000.0,,


##### *YEAR: 2021*

In [8]:
#reading the 2021 SQL table into a dataframe

query='''SELECT * 
        FROM dbo.LP1_startup_funding2021'''
        
df_2021=pd.read_sql(query,connection)

df_2021.head()

Unnamed: 0,Company_Brand,Founded,HeadQuarter,Sector,What_it_does,Founders,Investor,Amount,Stage
0,Unbox Robotics,2019.0,Bangalore,AI startup,Unbox Robotics builds on-demand AI-driven ware...,"Pramod Ghadge, Shahid Memon","BEENEXT, Entrepreneur First","$1,200,000",Pre-series A
1,upGrad,2015.0,Mumbai,EdTech,UpGrad is an online higher education platform.,"Mayank Kumar, Phalgun Kompalli, Ravijot Chugh,...","Unilazer Ventures, IIFL Asset Management","$120,000,000",
2,Lead School,2012.0,Mumbai,EdTech,LEAD School offers technology based school tra...,"Smita Deorah, Sumeet Mehta","GSV Ventures, Westbridge Capital","$30,000,000",Series D
3,Bizongo,2015.0,Mumbai,B2B E-commerce,Bizongo is a business-to-business online marke...,"Aniket Deb, Ankit Tomar, Sachin Agrawal","CDC Group, IDG Capital","$51,000,000",Series C
4,FypMoney,2021.0,Gurugram,FinTech,"FypMoney is Digital NEO Bank for Teenagers, em...",Kapil Banwari,"Liberatha Kallat, Mukesh Yadav, Dinesh Nagpal","$2,000,000",Seed


**Notes:** \
    1. The data for each year is saved in its own variable named `df_<year>` (for example, `df_2018`).

### **2.2: Exploratory Data Analysis**

The data provided is expected to have the following columns to be used in the analysis:


|  | **COLUMN NAME** | **DESCRIPTION** | **EXPECTED DATATYPE** |
|--|-----------------|-----------------|-----------------------|
|**1**| **Company** | Name of the company/start-up | Object |
|**2**| **Founded** | Year start-up was founded | Datetime[Y] / int |
|**3**| **Sector** | Sector/ Industry | Category |
|**4**| **Description** | Description about Company | Object |
|**5**| **Founders** | Founders of the Company | Object |
|**6**| **Investor** | Investors | Category |
|**7**| **Amount** | Raised funds | float64 / int64 |
|**8**| **Stage** | Round of funding reached | Category |
|**9**| **Location** | City/ Region of Startup | Category |

**Key Assumption** \
Based on our business understanding and the key questions asked, we have created the expected datatype column to guide our EDA.
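
As a rough sketch of how the yearly column names could be brought in line with the expected table, the example below renames columns with a per-year mapping and then reindexes to the expected schema. The mappings in `RENAME_MAPS` (for instance, treating `Industry` as `Sector` and `HeadQuarter` as `Location`) are assumptions about which source columns correspond to which expected columns, and `standardise_columns` and `toy_2018` are hypothetical names used only for illustration.

```python
import pandas as pd

# Column names from the expected table above
EXPECTED_COLUMNS = [
    "Company", "Founded", "Sector", "Description", "Founders",
    "Investor", "Amount", "Stage", "Location",
]

# Assumed correspondence between each year's columns and the expected names
RENAME_MAPS = {
    2018: {"Company Name": "Company", "Industry": "Sector",
           "Round/Series": "Stage", "About Company": "Description"},
    2019: {"Company/Brand": "Company", "HeadQuarter": "Location",
           "What it does": "Description", "Amount($)": "Amount"},
    2020: {"Company_Brand": "Company", "HeadQuarter": "Location",
           "What_it_does": "Description"},
    2021: {"Company_Brand": "Company", "HeadQuarter": "Location",
           "What_it_does": "Description"},
}


def standardise_columns(df, year):
    """Rename a year's columns, then keep exactly the expected columns in order."""
    renamed = df.rename(columns=RENAME_MAPS[year])
    # reindex adds any missing expected column as NaN and drops unexpected ones
    return renamed.reindex(columns=EXPECTED_COLUMNS)


# Toy one-row frame with the 2018 column layout (description is a placeholder)
toy_2018 = pd.DataFrame({
    "Company Name": ["TheCollegeFever"],
    "Industry": ["Brand Marketing"],
    "Round/Series": ["Seed"],
    "Amount": ["250000"],
    "Location": ["Bangalore, Karnataka, India"],
    "About Company": ["placeholder description"],
})
standardised_2018 = standardise_columns(toy_2018, 2018)
print(standardised_2018.columns.tolist())
```

Because `reindex` drops anything outside `EXPECTED_COLUMNS`, a column that should survive (such as a year identifier) would need to be appended to that list first.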

In [9]:
#checking the 2018 info
df_2018.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 526 entries, 0 to 525
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company Name   526 non-null    object
 1   Industry       526 non-null    object
 2   Round/Series   526 non-null    object
 3   Amount         526 non-null    object
 4   Location       526 non-null    object
 5   About Company  526 non-null    object
dtypes: object(6)
memory usage: 24.8+ KB


In [10]:
#checking 2019 info
df_2019.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 89 entries, 0 to 88
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company/Brand  89 non-null     object 
 1   Founded        60 non-null     float64
 2   HeadQuarter    70 non-null     object 
 3   Sector         84 non-null     object 
 4   What it does   89 non-null     object 
 5   Founders       86 non-null     object 
 6   Investor       89 non-null     object 
 7   Amount($)      89 non-null     object 
 8   Stage          43 non-null     object 
dtypes: float64(1), object(8)
memory usage: 6.4+ KB


In [11]:
#checking 2020 info
df_2020.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1055 entries, 0 to 1054
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1055 non-null   object 
 1   Founded        842 non-null    float64
 2   HeadQuarter    961 non-null    object 
 3   Sector         1042 non-null   object 
 4   What_it_does   1055 non-null   object 
 5   Founders       1043 non-null   object 
 6   Investor       1017 non-null   object 
 7   Amount         801 non-null    float64
 8   Stage          591 non-null    object 
 9   column10       2 non-null      object 
dtypes: float64(2), object(8)
memory usage: 82.6+ KB


In [12]:
#checking 2021 info
df_2021.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1209 entries, 0 to 1208
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Company_Brand  1209 non-null   object 
 1   Founded        1208 non-null   float64
 2   HeadQuarter    1208 non-null   object 
 3   Sector         1209 non-null   object 
 4   What_it_does   1209 non-null   object 
 5   Founders       1205 non-null   object 
 6   Investor       1147 non-null   object 
 7   Amount         1206 non-null   object 
 8   Stage          781 non-null    object 
dtypes: float64(1), object(8)
memory usage: 85.1+ KB


##### **Notes:** #####
1. The 2018 dataset has fewer columns than expected (6 instead of 9), and all of them are in the object datatype.
2. The 2019 and 2021 datasets have the expected number of columns, and all columns share the object datatype except the Founded column, which is float64.
3. The 2020 dataset has one more column than expected (`column10`, with only 2 non-null values), and two of its columns are numeric (Founded and Amount, both float64).
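
The comparison in these notes was made by reading four separate `info()` outputs. A side-by-side table can make the same check easier to repeat; the sketch below is one possible way to build it. The helper `dtype_summary` and the frames in `toy_frames` are hypothetical and use only a few illustrative columns, not the full datasets.

```python
import pandas as pd


def dtype_summary(frames):
    """Return a table with one row per column name and one column per year.

    A cell holds the dtype as text, or NaN when that year lacks the column.
    """
    return pd.DataFrame(
        {year: df.dtypes.astype(str) for year, df in frames.items()}
    )


# Toy frames standing in for the real yearly data (illustrative subset only)
toy_frames = {
    2019: pd.DataFrame({"Founded": [2014.0], "Amount($)": ["$150,000,000"]}),
    2020: pd.DataFrame({"Founded": [2019.0], "Amount": [200000.0], "column10": [None]}),
}
summary = dtype_summary(toy_frames)
print(summary)
```

With the notebook's data, the call would be `dtype_summary({2018: df_2018, 2019: df_2019, 2020: df_2020, 2021: df_2021})`.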


##### **Decisions:** #####
We will clean each year's data separately, as the columns are not in the expected datatypes.\
To identify the year in which each record was collected, we will add a year column to all datasets.

In [15]:
#adding a year column to identify each year's data
df_2018['year'] = 2018
df_2019['year'] = 2019
df_2020['year'] = 2020
df_2021['year'] = 2021

## **Collaboration Tip:**
Clean the data with the goal of producing a dataframe whose `info()` output matches the expected columns table above.

**Reference:** \
Effective Pandas by Matt Harrison - https://www.youtube.com/watch?v=zgbUk90aQ6A&t=4084s

## **2018 CLEANING**

In [16]:
#checking if column names are as expected
(df_2018
.columns)

Index(['Company Name', 'Industry', 'Round/Series', 'Amount', 'Location',
       'About Company', 'year'],
      dtype='object')
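
With the column names confirmed, one likely cleaning step is the `Amount` column, which the 2018 preview shows mixing plain numbers, rupee-prefixed values, and a dash for missing entries. The sketch below is a minimal, hedged example of separating the currency from the numeric value. The function `parse_amount` is hypothetical, it treats values without a symbol as having an unknown currency (an assumption), and it deliberately performs no currency conversion because no exchange rate is given in this notebook.

```python
import math


def parse_amount(raw):
    """Split a raw amount string into a (currency, value) pair.

    currency is "INR", "USD", or None when no symbol is present;
    value is a float, or NaN when the text is not a number.
    """
    text = str(raw).strip()
    if text.startswith("₹"):
        currency, digits = "INR", text[1:]
    elif text.startswith("$"):
        currency, digits = "USD", text[1:]
    else:
        currency, digits = None, text  # assumption: no symbol means unknown currency
    digits = digits.replace(",", "")  # drop thousands separators
    try:
        return currency, float(digits)
    except ValueError:
        return currency, math.nan  # e.g. the dash placeholder for missing amounts


# Values taken from the 2018 and 2019 previews shown earlier
for raw in ["250000", "₹40,000,000", "—", "$6,300,000"]:
    print(repr(raw), parse_amount(raw))
```

Applied to the dataframe, this could look like `df_2018["Amount"].map(parse_amount)`, followed by splitting the resulting pairs into separate currency and value columns.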