In [1]:
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Import the numpy and pandas packages
import numpy as np
import pandas as pd

# Business and Data Understanding
Spark Funds has two minor constraints for investments:
* It wants to invest between 5 and 15 million USD per round of investment
* It wants to invest only in English-speaking countries because of the ease of communication with the companies it would invest in


## Checkpoints - Part 1
### Checkpoint 1: Data Cleaning 1
1.	Load the companies and rounds data (provided on the previous page) into two data frames and name them companies and rounds2 respectively.


In [2]:
# Reading the CSV files into data frames
# Using the ISO-8859 encoding because reading the files as utf-8 fails
rounds2 = pd.read_csv("rounds2.csv", sep=",", encoding='iso8859')
print(len(rounds2.index))
companies = pd.read_csv("companies.txt",sep="\t", encoding='iso8859')
print(len(companies.index))

114949
66368
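
The comment in the cell above notes that reading the files as UTF-8 fails. A minimal sketch of that kind of fallback, assuming a hypothetical helper `read_csv_with_fallback` and a few in-memory sample bytes rather than the real files:

```python
import io
import pandas as pd

def read_csv_with_fallback(raw_bytes, sep=",", encodings=("utf-8", "iso8859_1")):
    """Hypothetical helper: try each encoding in turn, return (frame, encoding used)."""
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw_bytes), sep=sep, encoding=enc), enc
        except UnicodeDecodeError as err:
            last_error = err  # remember the failure and try the next encoding
    raise last_error

# Illustrative bytes only: 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 here.
sample = b"permalink,name\n/organization/cafe,Caf\xe9\n"
frame, used = read_csv_with_fallback(sample)
print(used, frame.shape)
```

ISO-8859-1 accepts every byte value, so the fallback always succeeds, but characters that were really UTF-8 come out garbled. That is why the permalink keys need the cleanup done later in this notebook.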


##### Inspect the 'companies' dataframe

In [3]:
# Print the first 5 rows of the dataframe
print(companies.head(5))

#Check the Shape and Dimensions of the dataframe
print(companies.shape)
print(companies.ndim)

#Check the columns and their datatypes within the dataframe
print(companies.dtypes)

#Check the basic statistics of the columns in the dataframe
print(companies.describe())

                             permalink                    name  \
0                  /Organization/-Fame                   #fame   
1               /Organization/-Qounter                :Qounter   
2  /Organization/-The-One-Of-Them-Inc-  (THE) ONE of THEM,Inc.   
3                /Organization/0-6-Com                 0-6.com   
4       /Organization/004-Technologies        004 Technologies   

                        homepage_url  \
0                 http://livfame.com   
1             http://www.qounter.com   
2                http://oneofthem.jp   
3                 http://www.0-6.com   
4  http://004gmbh.de/en/004-interact   

                                       category_list     status country_code  \
0                                              Media  operating          IND   
1  Application Platforms|Real Time|Social Network...  operating          USA   
2                                  Apps|Games|Mobile  operating          NaN   
3                                        C

##### Inspect the 'rounds2' dataframe

In [4]:
# Print the first 5 rows of the dataframe
print(rounds2.head(5))

#Check the Shape and Dimensions of the dataframe
print(rounds2.shape)
print(rounds2.ndim)

#Check the columns and their datatypes within the dataframe
print(rounds2.dtypes)

#Check the basic statistics of the columns in the dataframe
print(rounds2.describe())

                     company_permalink  \
0                  /organization/-fame   
1               /ORGANIZATION/-QOUNTER   
2               /organization/-qounter   
3  /ORGANIZATION/-THE-ONE-OF-THEM-INC-   
4                /organization/0-6-com   

                           funding_round_permalink funding_round_type  \
0  /funding-round/9a01d05418af9f794eebff7ace91f638            venture   
1  /funding-round/22dacff496eb7acb2b901dec1dfe5633            venture   
2  /funding-round/b44fbb94153f6cdef13083530bb48030               seed   
3  /funding-round/650b8f704416801069bb178a1418776b            venture   
4  /funding-round/5727accaeaa57461bd22a9bdd945382d            venture   

  funding_round_code   funded_at  raised_amount_usd  
0                  B  05-01-2015         10000000.0  
1                  A  14-10-2014                NaN  
2                NaN  01-03-2014           700000.0  
3                  B  30-01-2014          3406878.0  
4                  A  19-03-2008      

### Results Expected:
1. How many unique companies are present in rounds2?
2. How many unique companies are present in companies?
3. In the companies data frame, which column can be used as the unique key for each company? Write the name of the column.
4. Are there any companies in the rounds2 file which are not present in companies? Answer yes or no: Y/N
5. Merge the two data frames so that all variables (columns) in the companies frame are added to the rounds2 data frame. Name the merged frame master_frame. How many observations are present in master_frame?


In [5]:
# 2. How many unique companies are present in companies?
ucount = companies.permalink.nunique()
print(ucount)
# 1. How many unique companies are present in rounds2?
rcount = rounds2.company_permalink.nunique()
print(rcount)
# These counts are taken before any cleanup of the permalink columns.

66368
90247


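The permalink columns in rounds2 and companies use mixed case, which inflates the raw count of unique companies in rounds2 (90247 against 66368 in companies). Because the permalink is used as the join key, it has to be cleaned up first.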
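
A small illustration of the effect, using made-up rows rather than the real data: the same company written in two casings counts as two unique keys until the casing is normalised.

```python
import pandas as pd

# Toy data, not the real file: one company appears under two casings.
rounds_demo = pd.DataFrame({
    "company_permalink": [
        "/organization/-qounter",
        "/ORGANIZATION/-QOUNTER",
        "/organization/0-6-com",
    ]
})

raw_unique = rounds_demo["company_permalink"].nunique()
normalised_unique = rounds_demo["company_permalink"].str.upper().nunique()
print(raw_unique, normalised_unique)
```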

### Data Cleanup for Permalink

In [6]:
# Before Clean Up
rounds2.iloc[[29597,31863],:]

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd
29597,/ORGANIZATION/E-CÃBICA,/funding-round/8491f74869e4fe8ba9c378394f8fbdea,seed,,01-02-2015,
31863,/ORGANIZATION/ENERGYSTONE-GAMES-ÇµÇ³Æ¸¸Æ,/funding-round/b89553f3d2279c5683ae93f45a21cfe0,seed,,09-08-2014,


In [7]:
# Build a cleaned key column, clean_permalink, in both frames. It is used later to create the master data frame.

# First attempt: re-decode the permalinks as UTF-8 and convert them to upper case
companies['clean_permalink'] = companies['permalink'].str.encode('iso8859').str.decode('utf8','ignore').str.upper()
rounds2['clean_permalink'] = rounds2['company_permalink'].str.encode('iso8859').str.decode('utf8','ignore').str.upper()
# Verification
#rounds2.iloc[[29597,31863],:]
rounds2.loc[~rounds2['clean_permalink'].isin(companies['clean_permalink']), :]
# The UTF-8 conversion did not resolve the mismatches, so decode as ASCII instead (non-ASCII bytes are dropped)
companies['clean_permalink'] = companies['permalink'].str.encode('iso8859').str.decode('ascii','ignore').str.upper()
rounds2['clean_permalink'] = rounds2['company_permalink'].str.encode('iso8859').str.decode('ascii','ignore').str.upper()
# Only the key is converted. Other columns, such as the company name, should stay in UTF-8
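
To show what the two conversions in the cell above do to a single value, here is a sketch on a made-up permalink. It assumes the garbling comes from UTF-8 bytes being read as ISO-8859-1, which may not hold for every row of the real data.

```python
# Illustrative value only: a key whose UTF-8 bytes were read as ISO-8859-1.
true_key = "/organization/e-c\u00fabica"
garbled = true_key.encode("utf-8").decode("iso8859_1")

# Same chain as the cell above, applied to one string instead of a column.
utf8_key = garbled.encode("iso8859_1").decode("utf8", "ignore").upper()
ascii_key = garbled.encode("iso8859_1").decode("ascii", "ignore").upper()

print(utf8_key)   # accented letter restored
print(ascii_key)  # accented letter dropped entirely
```

Dropping the non-ASCII bytes makes the key stable across both files. The trade-off is that two names differing only in such characters would collapse to the same key.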

In [8]:
# 2. How many unique companies are present in companies?
ucount = companies.clean_permalink.nunique()
print('companies count: ', ucount)
# 1. How many unique companies are present in rounds2?
rcount = rounds2.clean_permalink.nunique()
print('rounds_company count: ',rcount)

companies count:  66368
rounds_company count:  66368


In the companies data frame, which column can be used as the unique key for each company?
Name of the column: permalink

In [9]:
# 4. Are there any companies in the rounds2 file which are not present in companies? Answer yes or no: Y/N :
comp_notin_rounds = rounds2.loc[~rounds2['clean_permalink'].isin(companies['clean_permalink']), :]
if len(comp_notin_rounds.index) == 0: print('Answer: ','N') 
else: print('Answer: ','Y')
# After the cleanup, every company in rounds2 is also present in companies

Answer:  N


Are there any companies in the rounds2 file which are not present in companies? : Answer is NO
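
The same question can also be answered with a left merge and pandas' `indicator=True` option, which labels each row by where its key was found. A sketch on toy frames (the frame names and values are illustrative):

```python
import pandas as pd

# Toy frames standing in for rounds2 and companies.
rounds_demo = pd.DataFrame(
    {"clean_permalink": ["/ORGANIZATION/A", "/ORGANIZATION/A", "/ORGANIZATION/B"]}
)
companies_demo = pd.DataFrame(
    {"clean_permalink": ["/ORGANIZATION/A", "/ORGANIZATION/B", "/ORGANIZATION/C"]}
)

check = rounds_demo.merge(
    companies_demo[["clean_permalink"]].drop_duplicates(),
    on="clean_permalink",
    how="left",
    indicator=True,  # adds a _merge column: 'both' or 'left_only'
)
missing = check.loc[check["_merge"] == "left_only", "clean_permalink"].unique()
print("Answer:", "Y" if len(missing) else "N")
```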

### Merging Companies and Rounds

In [10]:
# Creating Master Dataframe
master_frame = pd.merge(rounds2, companies,how='left',on='clean_permalink')
print(master_frame.shape)
print(100*(master_frame.isnull().sum()/len(master_frame.index)))

(114949, 17)
company_permalink           0.000000
funding_round_permalink     0.000000
funding_round_type          0.000000
funding_round_code         72.909725
funded_at                   0.000000
raised_amount_usd          17.390321
clean_permalink             0.000000
permalink                   0.000000
name                        0.000870
homepage_url                5.336280
category_list               2.966533
status                      0.000000
country_code                7.549435
state_code                  9.522484
region                      8.844792
city                        8.842182
founded_at                 17.852265
dtype: float64


> **Observation**: About 17% of the values in raised_amount_usd are null. Imputing them, or replacing them with 0, would distort the averages, and dropping those rows would change the funding-round counts. Hence no further cleaning is applied.
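
A short worked example of that trade-off, with made-up amounts: pandas skips nulls when averaging, filling them with 0 pulls the average down, and dropping them reduces the number of rounds.

```python
import numpy as np
import pandas as pd

# Made-up amounts: one of three rounds has no disclosed value.
amounts = pd.Series([10_000_000.0, np.nan, 2_000_000.0], name="raised_amount_usd")

mean_skipping_nulls = amounts.mean()         # nulls ignored: (10M + 2M) / 2
mean_with_zeros = amounts.fillna(0).mean()   # nulls as 0:    (10M + 0 + 2M) / 3
rounds_kept = len(amounts)                   # all rounds still counted
rounds_after_drop = len(amounts.dropna())    # one round lost

print(mean_skipping_nulls, mean_with_zeros, rounds_kept, rounds_after_drop)
```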

In [11]:
master_frame

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd,clean_permalink,permalink,name,homepage_url,category_list,status,country_code,state_code,region,city,founded_at
0,/organization/-fame,/funding-round/9a01d05418af9f794eebff7ace91f638,venture,B,05-01-2015,10000000.0,/ORGANIZATION/-FAME,/Organization/-Fame,#fame,http://livfame.com,Media,operating,IND,16,Mumbai,Mumbai,
1,/ORGANIZATION/-QOUNTER,/funding-round/22dacff496eb7acb2b901dec1dfe5633,venture,A,14-10-2014,,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014
2,/organization/-qounter,/funding-round/b44fbb94153f6cdef13083530bb48030,seed,,01-03-2014,700000.0,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014
3,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/funding-round/650b8f704416801069bb178a1418776b,venture,B,30-01-2014,3406878.0,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/Organization/-The-One-Of-Them-Inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,Apps|Games|Mobile,operating,,,,,
4,/organization/0-6-com,/funding-round/5727accaeaa57461bd22a9bdd945382d,venture,A,19-03-2008,2000000.0,/ORGANIZATION/0-6-COM,/Organization/0-6-Com,0-6.com,http://www.0-6.com,Curated Web,operating,CHN,22,Beijing,Beijing,01-01-2007
5,/ORGANIZATION/004-TECHNOLOGIES,/funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830,venture,,24-07-2014,,/ORGANIZATION/004-TECHNOLOGIES,/Organization/004-Technologies,004 Technologies,http://004gmbh.de/en/004-interact,Software,operating,USA,IL,"Springfield, Illinois",Champaign,01-01-2010
6,/organization/01games-technology,/funding-round/7d53696f2b4f607a2f2a8cbb83d01839,undisclosed,,01-07-2014,41250.0,/ORGANIZATION/01GAMES-TECHNOLOGY,/Organization/01Games-Technology,01Games Technology,http://www.01games.hk/,Games,operating,HKG,,Hong Kong,Hong Kong,
7,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/funding-round/2b9d3ac293d5cdccbecff5c8cb0f327d,seed,,11-09-2009,43360.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997
8,/organization/0ndine-biomedical-inc,/funding-round/954b9499724b946ad8c396a57a5f3b72,venture,,21-12-2009,719491.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997
9,/ORGANIZATION/0XDATA,/funding-round/383a9bd2c04f7038bb543ccef5ba3eae,seed,,22-05-2013,3000000.0,/ORGANIZATION/0XDATA,/Organization/0Xdata,H2O.ai,http://h2o.ai/,Analytics,operating,USA,CA,SF Bay Area,Mountain View,01-01-2011


In [12]:
# Create a cleaned company name (re-decoded as UTF-8, then title-cased) that can be used for reporting if needed
master_frame['company_name'] = master_frame['name'].str.encode('iso8859').str.decode('utf8','ignore').str.lower().str.title()
master_frame

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd,clean_permalink,permalink,name,homepage_url,category_list,status,country_code,state_code,region,city,founded_at,company_name
0,/organization/-fame,/funding-round/9a01d05418af9f794eebff7ace91f638,venture,B,05-01-2015,10000000.0,/ORGANIZATION/-FAME,/Organization/-Fame,#fame,http://livfame.com,Media,operating,IND,16,Mumbai,Mumbai,,#Fame
1,/ORGANIZATION/-QOUNTER,/funding-round/22dacff496eb7acb2b901dec1dfe5633,venture,A,14-10-2014,,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter
2,/organization/-qounter,/funding-round/b44fbb94153f6cdef13083530bb48030,seed,,01-03-2014,700000.0,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter
3,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/funding-round/650b8f704416801069bb178a1418776b,venture,B,30-01-2014,3406878.0,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/Organization/-The-One-Of-Them-Inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,Apps|Games|Mobile,operating,,,,,,"(The) One Of Them,Inc."
4,/organization/0-6-com,/funding-round/5727accaeaa57461bd22a9bdd945382d,venture,A,19-03-2008,2000000.0,/ORGANIZATION/0-6-COM,/Organization/0-6-Com,0-6.com,http://www.0-6.com,Curated Web,operating,CHN,22,Beijing,Beijing,01-01-2007,0-6.Com
5,/ORGANIZATION/004-TECHNOLOGIES,/funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830,venture,,24-07-2014,,/ORGANIZATION/004-TECHNOLOGIES,/Organization/004-Technologies,004 Technologies,http://004gmbh.de/en/004-interact,Software,operating,USA,IL,"Springfield, Illinois",Champaign,01-01-2010,004 Technologies
6,/organization/01games-technology,/funding-round/7d53696f2b4f607a2f2a8cbb83d01839,undisclosed,,01-07-2014,41250.0,/ORGANIZATION/01GAMES-TECHNOLOGY,/Organization/01Games-Technology,01Games Technology,http://www.01games.hk/,Games,operating,HKG,,Hong Kong,Hong Kong,,01Games Technology
7,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/funding-round/2b9d3ac293d5cdccbecff5c8cb0f327d,seed,,11-09-2009,43360.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.
8,/organization/0ndine-biomedical-inc,/funding-round/954b9499724b946ad8c396a57a5f3b72,venture,,21-12-2009,719491.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.
9,/ORGANIZATION/0XDATA,/funding-round/383a9bd2c04f7038bb543ccef5ba3eae,seed,,22-05-2013,3000000.0,/ORGANIZATION/0XDATA,/Organization/0Xdata,H2O.ai,http://h2o.ai/,Analytics,operating,USA,CA,SF Bay Area,Mountain View,01-01-2011,H2O.Ai


In [13]:
master_frame.loc[master_frame.clean_permalink=='/ORGANIZATION/ASYS-2']

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd,clean_permalink,permalink,name,homepage_url,category_list,status,country_code,state_code,region,city,founded_at,company_name
114947,/ORGANIZATION/ÃASYS-2,/funding-round/35f09d0794651719b02bbfd859ba9ff5,seed,,01-01-2015,18192.0,/ORGANIZATION/ASYS-2,/Organization/ÃAsys-2,Ãasys,http://www.oasys.io/,Consumer Electronics|Internet of Things|Teleco...,operating,USA,CA,SF Bay Area,San Francisco,01-01-2014,Ôasys


Merge the two data frames so that all variables (columns) in the companies frame are added to the rounds2 data frame. Name the merged frame master_frame. How many observations are present in master_frame?
114949
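
The merged frame has the same number of rows as rounds2 because each round matches at most one company. One way to make pandas enforce that assumption is the `validate` argument of `pd.merge`. A sketch on toy frames, not the real data:

```python
import pandas as pd

# Toy frames: two rounds for company A, one for company B.
rounds_demo = pd.DataFrame({
    "clean_permalink": ["/ORGANIZATION/A", "/ORGANIZATION/A", "/ORGANIZATION/B"],
    "raised_amount_usd": [1.0, 2.0, 3.0],
})
companies_demo = pd.DataFrame({
    "clean_permalink": ["/ORGANIZATION/A", "/ORGANIZATION/B"],
    "name": ["A Inc", "B Ltd"],
})

# validate="many_to_one" raises MergeError if a key repeats in companies_demo,
# which is the case that would otherwise add extra rows silently.
merged_demo = pd.merge(
    rounds_demo, companies_demo,
    how="left", on="clean_permalink", validate="many_to_one",
)
print(len(rounds_demo), len(merged_demo))
```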

>> ### Checkpoint 1: Data Cleaning 1 Complete

### Checkpoint 2: Funding Type Analysis

Considering the constraints of Spark Funds, you have to decide one funding type which is most suitable for them.
1.	Calculate the average investment amount for each of the four funding types (venture, angel, seed, and private equity).
2.	Based on the average investment amount calculated above, which investment type do you think is the most suitable for Spark Funds?


In [14]:
# 1. Creating group by on funding type
master_funding_type = master_frame.groupby(['funding_round_type'])

# 2. Finding the average raised amount (in USD) for each funding type
funding_amt_per_type = pd.DataFrame(master_funding_type['raised_amount_usd'].mean())
funding_amt_per_type.sort_values(by=['raised_amount_usd'],ascending=False, inplace=True)
print(funding_amt_per_type.raised_amount_usd.apply(lambda x: '%.2f' % x))


funding_round_type
post_ipo_debt            168704571.82
post_ipo_equity           82182493.87
secondary_market          79649630.10
private_equity            73308593.03
undisclosed               19242370.23
debt_financing            17043526.02
venture                   11748949.13
grant                      4300576.34
convertible_note           1453438.54
product_crowdfunding       1363131.07
angel                       958694.47
seed                        719818.00
equity_crowdfunding         538368.21
non_equity_assistance       411203.05
Name: raised_amount_usd, dtype: object


As seen in the result above, among venture, angel, seed, and private equity, **Venture** is the most suitable funding type for Spark Funds: it is the only one of the four whose average investment falls within the 5 to 15 million USD range. ***Venture has an average funding of about 11.75 million USD.***
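
The selection can also be expressed as a range filter. This sketch copies the four averages from the output above and keeps those inside the 5 to 15 million USD range (the variable names are illustrative):

```python
import pandas as pd

# Average raised amount in USD, copied from the output above.
avg_by_type = pd.Series({
    "private_equity": 73308593.03,
    "venture": 11748949.13,
    "angel": 958694.47,
    "seed": 719818.00,
})

low, high = 5_000_000, 15_000_000
suitable = avg_by_type[avg_by_type.between(low, high)]  # bounds are inclusive
print((suitable / 1e6).round(2))
```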

### Checkpoint 2 Complete

## Checkpoint 3: Country Analysis

**Spark Funds** wants to invest in countries with the highest amount of funding for the chosen investment type. This is a part of its broader strategy to invest where most investments are occurring.
 
1.	Spark Funds wants to see the **top nine countries** which have received the highest total funding (across ALL sectors for the chosen investment type)
2.	For the chosen investment type, make a data frame named top9 with the top nine countries (based on the total investment amount each country has received)
 
Identifying the top three English-speaking countries in the data frame top9.


In [15]:
#Spark Funds wants to see the top nine countries which have received the highest total funding 
#(across ALL sectors for the chosen investment type)

# List of English Speaking Countries for further use 
# Country name-code mapping data obtained from secondary source
eng_speaking_countries = pd.DataFrame({ 'country_name' : ['Botswana','Cameroon','Ethiopia','Eritrea','The Gambia','Ghana','Kenya','Lesotho','Liberia',
'Malawi','Mauritius','Namibia','Nigeria','Rwanda','Seychelles','Sierra Leone','South Africa',
'South Sudan','Sudan','Swaziland','Tanzania','Uganda','Zambia','Zimbabwe','Antigua and Barbuda',
'The Bahamas','Barbados','Belize','Canada','Dominica','Grenada','Guyana','Jamaica','Saint Kitts and Nevis',
'Saint Lucia','Saint Vincent and the Grenadines','Trinidad and Tobago','United States','India',
'Pakistan','Philippines','Singapore','Australia','Fiji','Kiribati',
'Marshall Islands','Federated States of Micronesia','Nauru','New Zealand','Palau','Papua New Guinea',
'Samoa','Solomon Islands','Tonga','Tuvalu','Vanuatu','Ireland','Malta','United Kingdom'],
                              'country_code' : ['BWA','CMR','ETH','ERI','GMB','GHA','KEN','LSO','LBR','MWI','MUS','NAM','NGA',
'RWA','SYC','SLE','ZAF','SSD','SDN','SWZ','TZA','UGA','ZMB','ZWE','ATG','BHS',
'BRB','BLZ','CAN','DMA','GRD','GUY','JAM','KNA','LCA','VCT','TTO','USA','IND',
'PAK','PHL','SGP','AUS','FJI','KIR','MHL','FSM','NRU','NZL','PLW','PNG','WSM',
'SLB','TON','TUV','VUT','IRL','MLT','GBR']
                                    })

# Optional (left disabled): merge the country names into master_frame so they can be used in reporting for English-speaking countries.
#master_frame = pd.merge(master_frame,eng_speaking_countries,how='left', on='country_code')
print(len(master_frame.index))

# Keep only the venture rounds, since venture is the chosen investment type.
venture_master_frame = master_frame.loc[(master_frame['funding_round_type']=='venture')]

# Group by country to find the top countries overall for venture investments.
funding_per_country = venture_master_frame.groupby(['country_code'])
funding_per_country = venture_master_frame.groupby(['country_code'])

#Country-wise raised amount in millions
tot_investments_per_country = pd.DataFrame(funding_per_country['raised_amount_usd'].sum()/1000000)
tot_investments_per_country.columns = ['raised_amount_m_usd']
top9 = pd.DataFrame(tot_investments_per_country.sort_values(['raised_amount_m_usd'], ascending=False).head(9)).reset_index()

# The top 9 countries for venture investments
print('Top 9 Countries for Venture: \n',top9)

# Finding the Top 3 English Speaking Countries
Top3_Eng = top9.loc[top9['country_code'].isin(eng_speaking_countries.country_code)].head(3)
print('\nTop 3 English Speaking Countries: \n',Top3_Eng)



114949
Top 9 Countries for Venture: 
   country_code  raised_amount_m_usd
0          USA        422510.842796
1          CHN         39835.418773
2          GBR         20245.627416
3          IND         14391.858718
4          CAN          9583.332317
5          FRA          7259.536732
6          ISR          6907.514579
7          DEU          6346.959822
8          JPN          3363.676611

Top 3 English Speaking Countries: 
   country_code  raised_amount_m_usd
0          USA        422510.842796
2          GBR         20245.627416
3          IND         14391.858718


>> As seen above, the top three English-speaking countries for venture funding are:
 1. **USA**
 2. **GBR**
 3. **IND**


## Checkpoint 3 Complete

## Checkpoint 4: Sector Analysis 1

The **business rule** is that the first string before the vertical bar is considered the primary sector.
1.	Extract the primary sector of each category list from the category_list column
2.	Use the mapping file 'mapping.csv' to map each primary sector to one of the eight main sectors (Note that ‘Others’ is also considered one of the main sectors)

**Expected Results:** Code for a merged data frame with each primary sector mapped to its main sector (the primary sector should be present in a separate column).


In [16]:
# Import mapping.csv into a dataframe. It is unpivoted with melt further below for ease of manipulation.
mapping = pd.read_csv("mapping.csv",sep=",",encoding='iso8859')

# The category names have '0' in place of 'na', so replace every '0' with 'na'
mapping['category_list'] = mapping['category_list'].str.replace('0', 'na', regex=False)
# The step above also turns 'Enterprise 2.0' into 'Enterprise 2.na', so restore it
mapping['category_list'] = mapping['category_list'].str.replace('Enterprise 2.na', 'Enterprise 2.0', regex=False)


# Add a column named primary_sector to the master_frame
master_frame = master_frame.assign(primary_sector=np.nan)

In [17]:
# Copy the category_list data into the primary_sector column
master_frame['primary_sector'] = master_frame['category_list']

# Remove all the sectors after the first pipe, leaving behind only the primary sector
master_frame['primary_sector']=master_frame['primary_sector'].str.split('|').str[0]
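
A worked example of the split on a few made-up values. It also shows that a missing category_list stays missing rather than raising an error:

```python
import numpy as np
import pandas as pd

# Made-up category lists, including a single-category value and a null.
category_demo = pd.Series(
    ["Apps|Games|Mobile", "Curated Web", np.nan], name="category_list"
)

# Split on the vertical bar and keep the first piece as the primary sector.
primary_demo = category_demo.str.split("|").str[0]
print(primary_demo.tolist())
```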

In [18]:
master_frame

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd,clean_permalink,permalink,name,homepage_url,category_list,status,country_code,state_code,region,city,founded_at,company_name,primary_sector
0,/organization/-fame,/funding-round/9a01d05418af9f794eebff7ace91f638,venture,B,05-01-2015,10000000.0,/ORGANIZATION/-FAME,/Organization/-Fame,#fame,http://livfame.com,Media,operating,IND,16,Mumbai,Mumbai,,#Fame,Media
1,/ORGANIZATION/-QOUNTER,/funding-round/22dacff496eb7acb2b901dec1dfe5633,venture,A,14-10-2014,,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter,Application Platforms
2,/organization/-qounter,/funding-round/b44fbb94153f6cdef13083530bb48030,seed,,01-03-2014,700000.0,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter,Application Platforms
3,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/funding-round/650b8f704416801069bb178a1418776b,venture,B,30-01-2014,3406878.0,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/Organization/-The-One-Of-Them-Inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,Apps|Games|Mobile,operating,,,,,,"(The) One Of Them,Inc.",Apps
4,/organization/0-6-com,/funding-round/5727accaeaa57461bd22a9bdd945382d,venture,A,19-03-2008,2000000.0,/ORGANIZATION/0-6-COM,/Organization/0-6-Com,0-6.com,http://www.0-6.com,Curated Web,operating,CHN,22,Beijing,Beijing,01-01-2007,0-6.Com,Curated Web
5,/ORGANIZATION/004-TECHNOLOGIES,/funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830,venture,,24-07-2014,,/ORGANIZATION/004-TECHNOLOGIES,/Organization/004-Technologies,004 Technologies,http://004gmbh.de/en/004-interact,Software,operating,USA,IL,"Springfield, Illinois",Champaign,01-01-2010,004 Technologies,Software
6,/organization/01games-technology,/funding-round/7d53696f2b4f607a2f2a8cbb83d01839,undisclosed,,01-07-2014,41250.0,/ORGANIZATION/01GAMES-TECHNOLOGY,/Organization/01Games-Technology,01Games Technology,http://www.01games.hk/,Games,operating,HKG,,Hong Kong,Hong Kong,,01Games Technology,Games
7,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/funding-round/2b9d3ac293d5cdccbecff5c8cb0f327d,seed,,11-09-2009,43360.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.,Biotechnology
8,/organization/0ndine-biomedical-inc,/funding-round/954b9499724b946ad8c396a57a5f3b72,venture,,21-12-2009,719491.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.,Biotechnology
9,/ORGANIZATION/0XDATA,/funding-round/383a9bd2c04f7038bb543ccef5ba3eae,seed,,22-05-2013,3000000.0,/ORGANIZATION/0XDATA,/Organization/0Xdata,H2O.ai,http://h2o.ai/,Analytics,operating,USA,CA,SF Bay Area,Mountain View,01-01-2011,H2O.Ai,Analytics


In [19]:
# Use the melt function on the mapping dataframe to unpivot the data and store it in a new dataframe, mapping_refined
mapping_refined = pd.melt(mapping, id_vars='category_list')

# Select only the rows where value is 1
mapping_refined = mapping_refined.loc[mapping_refined['value']==1,:]

# Get rid of the 'value' column in mapping_refined
mapping_refined = mapping_refined.drop(columns=['value'])

# Change the column names in the mapping_refined table
mapping_refined.columns=['primary', 'main_sector']
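
To make the unpivot step concrete, here is a sketch on a toy table. It assumes the same layout the cell above relies on: one row per category and one 0/1 indicator column per main sector. The category and sector values are illustrative.

```python
import pandas as pd

# Toy wide table: a 1 marks the main sector each category belongs to.
wide = pd.DataFrame({
    "category_list": ["Analytics", "Games", "Software"],
    "Entertainment": [0, 1, 0],
    "Others": [0, 0, 1],
    "Social, Finance, Analytics, Advertising": [1, 0, 0],
})

# melt gives one row per (category, sector) pair. Keeping the rows where
# value == 1 leaves exactly one main sector per category.
long = pd.melt(wide, id_vars="category_list", var_name="main_sector")
lookup = (
    long.loc[long["value"] == 1, ["category_list", "main_sector"]]
    .rename(columns={"category_list": "primary"})
    .reset_index(drop=True)
)
print(lookup)
```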

In [20]:
mapping_refined

Unnamed: 0,primary,main_sector
8,Adventure Travel,Automotive & Sports
14,Aerospace,Automotive & Sports
45,Auto,Automotive & Sports
46,Automated Kiosk,Automotive & Sports
47,Automotive,Automotive & Sports
57,Bicycles,Automotive & Sports
69,Boating Industry,Automotive & Sports
87,CAD,Automotive & Sports
93,Cars,Automotive & Sports
188,Design,Automotive & Sports


In [21]:
# Perform a join between master_frame and the mapping_refined table to identify the main sector corresponding to each primary sector
master_frame = pd.merge(master_frame, mapping_refined, how='left', left_on = 'primary_sector', right_on = 'primary')


In [22]:
master_frame

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd,clean_permalink,permalink,name,homepage_url,...,status,country_code,state_code,region,city,founded_at,company_name,primary_sector,primary,main_sector
0,/organization/-fame,/funding-round/9a01d05418af9f794eebff7ace91f638,venture,B,05-01-2015,10000000.0,/ORGANIZATION/-FAME,/Organization/-Fame,#fame,http://livfame.com,...,operating,IND,16,Mumbai,Mumbai,,#Fame,Media,Media,Entertainment
1,/ORGANIZATION/-QOUNTER,/funding-round/22dacff496eb7acb2b901dec1dfe5633,venture,A,14-10-2014,,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter,Application Platforms,Application Platforms,"News, Search and Messaging"
2,/organization/-qounter,/funding-round/b44fbb94153f6cdef13083530bb48030,seed,,01-03-2014,700000.0,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter,Application Platforms,Application Platforms,"News, Search and Messaging"
3,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/funding-round/650b8f704416801069bb178a1418776b,venture,B,30-01-2014,3406878.0,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/Organization/-The-One-Of-Them-Inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,...,operating,,,,,,"(The) One Of Them,Inc.",Apps,Apps,"News, Search and Messaging"
4,/organization/0-6-com,/funding-round/5727accaeaa57461bd22a9bdd945382d,venture,A,19-03-2008,2000000.0,/ORGANIZATION/0-6-COM,/Organization/0-6-Com,0-6.com,http://www.0-6.com,...,operating,CHN,22,Beijing,Beijing,01-01-2007,0-6.Com,Curated Web,Curated Web,"News, Search and Messaging"
5,/ORGANIZATION/004-TECHNOLOGIES,/funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830,venture,,24-07-2014,,/ORGANIZATION/004-TECHNOLOGIES,/Organization/004-Technologies,004 Technologies,http://004gmbh.de/en/004-interact,...,operating,USA,IL,"Springfield, Illinois",Champaign,01-01-2010,004 Technologies,Software,Software,Others
6,/organization/01games-technology,/funding-round/7d53696f2b4f607a2f2a8cbb83d01839,undisclosed,,01-07-2014,41250.0,/ORGANIZATION/01GAMES-TECHNOLOGY,/Organization/01Games-Technology,01Games Technology,http://www.01games.hk/,...,operating,HKG,,Hong Kong,Hong Kong,,01Games Technology,Games,Games,Entertainment
7,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/funding-round/2b9d3ac293d5cdccbecff5c8cb0f327d,seed,,11-09-2009,43360.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,...,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.,Biotechnology,Biotechnology,Cleantech / Semiconductors
8,/organization/0ndine-biomedical-inc,/funding-round/954b9499724b946ad8c396a57a5f3b72,venture,,21-12-2009,719491.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,...,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.,Biotechnology,Biotechnology,Cleantech / Semiconductors
9,/ORGANIZATION/0XDATA,/funding-round/383a9bd2c04f7038bb543ccef5ba3eae,seed,,22-05-2013,3000000.0,/ORGANIZATION/0XDATA,/Organization/0Xdata,H2O.ai,http://h2o.ai/,...,operating,USA,CA,SF Bay Area,Mountain View,01-01-2011,H2O.Ai,Analytics,Analytics,"Social, Finance, Analytics, Advertising"


In [23]:
#Get rid of the unnecessary join column in master_frame
master_frame.drop(columns=['primary'], inplace= True)

In [24]:
master_frame

Unnamed: 0,company_permalink,funding_round_permalink,funding_round_type,funding_round_code,funded_at,raised_amount_usd,clean_permalink,permalink,name,homepage_url,category_list,status,country_code,state_code,region,city,founded_at,company_name,primary_sector,main_sector
0,/organization/-fame,/funding-round/9a01d05418af9f794eebff7ace91f638,venture,B,05-01-2015,10000000.0,/ORGANIZATION/-FAME,/Organization/-Fame,#fame,http://livfame.com,Media,operating,IND,16,Mumbai,Mumbai,,#Fame,Media,Entertainment
1,/ORGANIZATION/-QOUNTER,/funding-round/22dacff496eb7acb2b901dec1dfe5633,venture,A,14-10-2014,,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter,Application Platforms,"News, Search and Messaging"
2,/organization/-qounter,/funding-round/b44fbb94153f6cdef13083530bb48030,seed,,01-03-2014,700000.0,/ORGANIZATION/-QOUNTER,/Organization/-Qounter,:Qounter,http://www.qounter.com,Application Platforms|Real Time|Social Network...,operating,USA,DE,DE - Other,Delaware City,04-09-2014,:Qounter,Application Platforms,"News, Search and Messaging"
3,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/funding-round/650b8f704416801069bb178a1418776b,venture,B,30-01-2014,3406878.0,/ORGANIZATION/-THE-ONE-OF-THEM-INC-,/Organization/-The-One-Of-Them-Inc-,"(THE) ONE of THEM,Inc.",http://oneofthem.jp,Apps|Games|Mobile,operating,,,,,,"(The) One Of Them,Inc.",Apps,"News, Search and Messaging"
4,/organization/0-6-com,/funding-round/5727accaeaa57461bd22a9bdd945382d,venture,A,19-03-2008,2000000.0,/ORGANIZATION/0-6-COM,/Organization/0-6-Com,0-6.com,http://www.0-6.com,Curated Web,operating,CHN,22,Beijing,Beijing,01-01-2007,0-6.Com,Curated Web,"News, Search and Messaging"
5,/ORGANIZATION/004-TECHNOLOGIES,/funding-round/1278dd4e6a37fa4b7d7e06c21b3c1830,venture,,24-07-2014,,/ORGANIZATION/004-TECHNOLOGIES,/Organization/004-Technologies,004 Technologies,http://004gmbh.de/en/004-interact,Software,operating,USA,IL,"Springfield, Illinois",Champaign,01-01-2010,004 Technologies,Software,Others
6,/organization/01games-technology,/funding-round/7d53696f2b4f607a2f2a8cbb83d01839,undisclosed,,01-07-2014,41250.0,/ORGANIZATION/01GAMES-TECHNOLOGY,/Organization/01Games-Technology,01Games Technology,http://www.01games.hk/,Games,operating,HKG,,Hong Kong,Hong Kong,,01Games Technology,Games,Entertainment
7,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/funding-round/2b9d3ac293d5cdccbecff5c8cb0f327d,seed,,11-09-2009,43360.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.,Biotechnology,Cleantech / Semiconductors
8,/organization/0ndine-biomedical-inc,/funding-round/954b9499724b946ad8c396a57a5f3b72,venture,,21-12-2009,719491.0,/ORGANIZATION/0NDINE-BIOMEDICAL-INC,/Organization/0Ndine-Biomedical-Inc,Ondine Biomedical Inc.,http://ondinebio.com,Biotechnology,operating,CAN,BC,Vancouver,Vancouver,01-01-1997,Ondine Biomedical Inc.,Biotechnology,Cleantech / Semiconductors
9,/ORGANIZATION/0XDATA,/funding-round/383a9bd2c04f7038bb543ccef5ba3eae,seed,,22-05-2013,3000000.0,/ORGANIZATION/0XDATA,/Organization/0Xdata,H2O.ai,http://h2o.ai/,Analytics,operating,USA,CA,SF Bay Area,Mountain View,01-01-2011,H2O.Ai,Analytics,"Social, Finance, Analytics, Advertising"


### Checkpoint 4 Complete
The merged data frame now maps each primary sector to its main sector, with the primary sector kept in a separate column.

## Checkpoint 5: Sector Analysis 2

The range of funding preferred by Spark Funds is 5 to 15 million USD.
 
Now, the aim is to find the most heavily invested main sectors in each of the three countries (for the Venture funding type and the 5-15 million USD investment range).
1.	Create three separate data frames D1, D2 and D3, one for each of the three countries, containing the observations of funding type FT that fall within the 5-15 million USD range. The three data frames should contain:
    * All the columns of the master_frame along with the primary sector and the main sector
    * The total number (or count) of investments for each main sector in a separate column
    * The total amount invested in each main sector in a separate column

Using these three data frames, we can calculate the total number and amount of investments in each main sector.
 
**Result Expected**
1.	Three data frames D1, D2 and D3 
2.	Based on the analysis of the sectors, which main sectors and countries would you recommend Spark Funds to invest in? Present your conclusions in the presentation. The conclusions are subjective (i.e. there may be no ‘one right answer’), but they should be based on the basic strategy: invest in sectors where most investments are occurring.
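
The task asks for the per-sector count and total amount to appear as separate columns on D1, D2 and D3, whereas the cells in this notebook compute them in separate summary frames. As a minimal sketch of how such columns could be attached, assuming a toy frame with made-up values and the illustrative column names `sector_investment_count` and `sector_amount_usd` (neither is used elsewhere in this notebook):

```python
import pandas as pd

# Toy stand-in for one country's filtered frame (hypothetical values)
toy = pd.DataFrame({
    "funding_round_permalink": ["r1", "r2", "r3", "r4"],
    "main_sector": ["Others", "Others", "Health", "Others"],
    "raised_amount_usd": [6_000_000.0, 9_000_000.0, 5_500_000.0, 12_000_000.0],
})

# transform() returns one value per original row, so the per-sector
# aggregates line up with the rows they describe
by_sector = toy.groupby("main_sector")["raised_amount_usd"]
toy_with_totals = toy.assign(
    sector_investment_count=by_sector.transform("count"),
    sector_amount_usd=by_sector.transform("sum"),
)
print(toy_with_totals)
```

Unlike `agg`, `transform` keeps the original row count, so every investment row carries the totals of its own main sector.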


In [25]:
# Keep only the rows of master_frame whose funding type is 'venture'
final_venture_master_frame = master_frame.loc[(master_frame['funding_round_type']=='venture')]

# Filtering and creating separate data frames for the three countries - USA, GBR and IND
D1 = final_venture_master_frame.loc[(final_venture_master_frame.country_code == 'USA')&(final_venture_master_frame['raised_amount_usd'] >= 5000000) & (final_venture_master_frame['raised_amount_usd'] <= 15000000)]
D2 = final_venture_master_frame.loc[(final_venture_master_frame.country_code == 'GBR')&(final_venture_master_frame['raised_amount_usd'] >= 5000000) & (final_venture_master_frame['raised_amount_usd'] <= 15000000)]
D3 = final_venture_master_frame.loc[(final_venture_master_frame.country_code == 'IND')&(final_venture_master_frame['raised_amount_usd'] >= 5000000) & (final_venture_master_frame['raised_amount_usd'] <= 15000000)]
print('USA D1: ',len(D1.index))
print('GBR D2: ',len(D2.index))
print('IND D3: ',len(D3.index))


USA D1:  12150
GBR D2:  628
IND D3:  330
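
The three filters above differ only in the country code, so the same logic could be factored into one helper. The following is a sketch under assumptions: `filter_country_range` is a hypothetical name, and the toy frame holds made-up values. It relies on `Series.between`, which is inclusive on both ends by default and treats missing amounts as out of range.

```python
import pandas as pd

def filter_country_range(frame, country, low=5_000_000, high=15_000_000):
    """Return rows for one country whose raised amount lies in [low, high]."""
    in_range = frame["raised_amount_usd"].between(low, high)  # inclusive bounds
    return frame.loc[(frame["country_code"] == country) & in_range]

# Toy data (hypothetical values), including a missing amount
toy = pd.DataFrame({
    "country_code": ["USA", "USA", "GBR", "IND", "USA"],
    "raised_amount_usd": [5_000_000.0, 20_000_000.0, 7_000_000.0, 15_000_000.0, None],
})

usa = filter_country_range(toy, "USA")
gbr = filter_country_range(toy, "GBR")
ind = filter_country_range(toy, "IND")
print(len(usa), len(gbr), len(ind))
```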


Now that the three data frames are created, each restricted to the Venture funding type and the 5 to 15 million USD funding range, we can calculate the different metrics.

In [26]:
# 1. Total number of investments (count)
print('USA Total Investments :- ',D1.funding_round_permalink.nunique())
print('GBR Total Investments :- ',D2.funding_round_permalink.nunique())
print('IND Total Investments :- ',D3.funding_round_permalink.nunique())



USA Total Investments :-  12150
GBR Total Investments :-  628
IND Total Investments :-  330


In [27]:
# 2. Total amount of investment (in millions of USD)
print('USA AMT Raised in M :- ',D1.raised_amount_usd.sum()/1000000)
print('GBR AMT Raised in M :- ',D2.raised_amount_usd.sum()/1000000)
print('IND AMT Raised in M :- ',D3.raised_amount_usd.sum()/1000000)


USA AMT Raised in M :-  108531.347515
GBR AMT Raised in M :-  5436.843539
IND AMT Raised in M :-  2976.543602


In [28]:
#############################################################################################
# 3. Top sector (based on count of investments)
# 4. Second-best sector (based on count of investments)
# 5. Third-best sector (based on count of investments)
# 6. Number of investments in the top sector (refer to point 3)
# 7. Number of investments in the second-best sector (refer to point 4)
# 8. Number of investments in the third-best sector (refer to point 5)
#############################################################################################
################# All of the above are answered from the three data frames #################
#############################################################################################

# Top investment sectors in USA

D1_USA_Investments = pd.DataFrame(D1.groupby(D1['main_sector']).agg({'raised_amount_usd':'sum','funding_round_permalink':'count'})).reset_index()
D1_USA_Investments['total_amount_invested_M_usd'] = D1_USA_Investments['raised_amount_usd']/1000000
D1_USA_Investments = D1_USA_Investments.rename(columns={'funding_round_permalink':'total_num_investments' })
D1_USA_Investments = D1_USA_Investments.drop(columns=['raised_amount_usd'])

# Sort by number of investments, then by total amount invested (in millions of USD)
D1_USA_Investments.sort_values(['total_num_investments','total_amount_invested_M_usd'],ascending=False)



# Points 9 and 10 (the company that received the highest investment in the top and
# second-best sectors, count-wise) are answered per country in the cells that follow.

Unnamed: 0,main_sector,total_num_investments,total_amount_invested_M_usd
7,Others,2950,26321.007002
8,"Social, Finance, Analytics, Advertising",2714,23807.376964
2,Cleantech / Semiconductors,2300,21206.628192
6,"News, Search and Messaging",1582,13959.567428
4,Health,909,8211.859357
5,Manufacturing,799,7258.553378
3,Entertainment,591,5099.197982
0,Automotive & Sports,167,1454.104361
1,Blanks,86,764.763292


In [29]:
# Top investment sectors in GBR

D2_GBR_Investments = pd.DataFrame(D2.groupby(D2['main_sector']).agg({'raised_amount_usd':'sum','funding_round_permalink':'count'})).reset_index()
D2_GBR_Investments['total_amount_invested_M_usd'] = D2_GBR_Investments['raised_amount_usd']/1000000
D2_GBR_Investments = D2_GBR_Investments.rename(columns={'funding_round_permalink':'total_num_investments' })
D2_GBR_Investments = D2_GBR_Investments.drop(columns=['raised_amount_usd'])
D2_GBR_Investments.sort_values(['total_num_investments','total_amount_invested_M_usd'],ascending=False)


Unnamed: 0,main_sector,total_num_investments,total_amount_invested_M_usd
7,Others,147,1283.624289
8,"Social, Finance, Analytics, Advertising",133,1089.404014
2,Cleantech / Semiconductors,128,1150.139665
6,"News, Search and Messaging",73,615.746235
3,Entertainment,56,482.784687
5,Manufacturing,42,361.940335
4,Health,24,214.53751
0,Automotive & Sports,16,167.051565
1,Blanks,7,57.764848


In [30]:
# Top investment sectors in IND

D3_IND_Investments = pd.DataFrame(D3.groupby(D3['main_sector']).agg({'raised_amount_usd':'sum','funding_round_permalink':'count'})).reset_index()
D3_IND_Investments['total_amount_invested_M_usd'] = D3_IND_Investments['raised_amount_usd']/1000000
D3_IND_Investments = D3_IND_Investments.rename(columns={'funding_round_permalink':'total_num_investments' })
D3_IND_Investments = D3_IND_Investments.drop(columns=['raised_amount_usd'])
D3_IND_Investments.sort_values(['total_num_investments','total_amount_invested_M_usd'],ascending=False)

Unnamed: 0,main_sector,total_num_investments,total_amount_invested_M_usd
7,Others,110,1013.409507
8,"Social, Finance, Analytics, Advertising",60,550.54955
6,"News, Search and Messaging",52,433.834545
3,Entertainment,33,280.83
5,Manufacturing,21,200.9
2,Cleantech / Semiconductors,20,165.38
4,Health,19,167.74
0,Automotive & Sports,13,136.9
1,Blanks,2,27.0
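
The three summary cells above repeat the same steps for each country. One way to express them once is pandas named aggregation, which produces the final column names directly and removes the rename and drop steps. This is a sketch only: `sector_summary` is a hypothetical helper, and the toy frame uses made-up values.

```python
import pandas as pd

def sector_summary(frame):
    """Count and total amount (in millions of USD) per main sector, largest first."""
    summary = (
        frame.groupby("main_sector")
        .agg(
            total_num_investments=("funding_round_permalink", "count"),
            total_amount_invested_M_usd=("raised_amount_usd", "sum"),
        )
        .reset_index()
    )
    summary["total_amount_invested_M_usd"] /= 1_000_000
    return summary.sort_values(
        ["total_num_investments", "total_amount_invested_M_usd"], ascending=False
    ).reset_index(drop=True)

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "funding_round_permalink": ["r1", "r2", "r3"],
    "main_sector": ["Others", "Health", "Others"],
    "raised_amount_usd": [6_000_000.0, 5_500_000.0, 9_000_000.0],
})
toy_summary = sector_summary(toy)
print(toy_summary)
```

With a helper like this, the per-country summaries would reduce to one call each, for example `sector_summary(D1)`.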


In [31]:
# Rank the USA sectors by number of investments (sorted once) and pick the top three
USA_sectors_by_count = D1_USA_Investments.sort_values(['total_num_investments'],ascending=False).main_sector
USA_Top1_Sector = USA_sectors_by_count[:1]
USA_Top2_Sector = USA_sectors_by_count[1:2]
USA_Top3_Sector = USA_sectors_by_count[2:3]
print(USA_Top1_Sector.to_string(index=False))
print(USA_Top2_Sector.to_string(index=False))
print(USA_Top3_Sector.to_string(index=False))

Others
Social, Finance, Analytics, Advertising
Cleantech / Semiconductors


In [32]:
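# Points 9 and 10 for USA: the company with the highest total investment in the top
# sector and in the second-best sector (sector names taken from the ranking above)
D1_top_sector = D1.loc[D1['main_sector']=='Others',:]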
D1_groupby_company = D1_top_sector.groupby('company_name')
D1_company_final = pd.DataFrame(D1_groupby_company['raised_amount_usd'].sum().sort_values(ascending = False)).reset_index()

print('Top Company for Top Sector:')
print(D1_company_final.company_name[0])

D1_second_sector = D1.loc[D1['main_sector']=='Social, Finance, Analytics, Advertising',:]
D1_groupby_secsector_company = D1_second_sector.groupby('company_name')
D1_secsector_company_final = pd.DataFrame(D1_groupby_secsector_company['raised_amount_usd'].sum().sort_values(ascending = False)).reset_index()

print('\nTop Company for Second Sector:')
print(D1_secsector_company_final.company_name[0])

Top Company for Top Sector:
Virtustream

Top Company for Second Sector:
Sst Inc. (Formerly Shotspotter)


In [33]:
# Rank the GBR sectors by number of investments (sorted once) and pick the top three
GBR_sectors_by_count = D2_GBR_Investments.sort_values(['total_num_investments'],ascending=False).main_sector
GBR_Top1_Sector = GBR_sectors_by_count[:1]
GBR_Top2_Sector = GBR_sectors_by_count[1:2]
GBR_Top3_Sector = GBR_sectors_by_count[2:3]

print(GBR_Top1_Sector.to_string(index=False))
print(GBR_Top2_Sector.to_string(index=False))
print(GBR_Top3_Sector.to_string(index=False))

Others
Social, Finance, Analytics, Advertising
Cleantech / Semiconductors


In [34]:
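# Points 9 and 10 for GBR: the company with the highest total investment in the top
# sector and in the second-best sector (sector names taken from the ranking above)
D2_top_sector = D2.loc[D2['main_sector']=='Others',:]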
D2_groupby_company = D2_top_sector.groupby('company_name')
D2_company_final = pd.DataFrame(D2_groupby_company['raised_amount_usd'].sum().sort_values(ascending = False)).reset_index()

print('Top Company for Top Sector:')
print(D2_company_final.company_name[0])

D2_second_sector = D2.loc[D2['main_sector']=='Social, Finance, Analytics, Advertising',:]
D2_groupby_secsector_company = D2_second_sector.groupby('company_name')
D2_secsector_company_final = pd.DataFrame(D2_groupby_secsector_company['raised_amount_usd'].sum().sort_values(ascending = False)).reset_index()

print('\nTop Company for Second Sector:')
print(D2_secsector_company_final.company_name[0])

Top Company for Top Sector:
Electric Cloud

Top Company for Second Sector:
Celltick Technologies


In [35]:
# Rank the IND sectors by number of investments (sorted once) and pick the top three
IND_sectors_by_count = D3_IND_Investments.sort_values(['total_num_investments'],ascending=False).main_sector
IND_Top1_Sector = IND_sectors_by_count[:1]
IND_Top2_Sector = IND_sectors_by_count[1:2]
IND_Top3_Sector = IND_sectors_by_count[2:3]

print(IND_Top1_Sector.to_string(index=False))
print(IND_Top2_Sector.to_string(index=False))
print(IND_Top3_Sector.to_string(index=False))

Others
Social, Finance, Analytics, Advertising
News, Search and Messaging


In [36]:
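# Points 9 and 10 for IND: the company with the highest total investment in the top
# sector and in the second-best sector (sector names taken from the ranking above)
D3_top_sector = D3.loc[D3['main_sector']=='Others',:]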
D3_groupby_company = D3_top_sector.groupby('company_name')
D3_company_final = pd.DataFrame(D3_groupby_company['raised_amount_usd'].sum().sort_values(ascending = False)).reset_index()

print('Top Company for Top Sector:')
print(D3_company_final.company_name[0])

D3_second_sector = D3.loc[D3['main_sector']=='Social, Finance, Analytics, Advertising',:]
D3_groupby_secsector_company = D3_second_sector.groupby('company_name')
D3_secsector_company_final = pd.DataFrame(D3_groupby_secsector_company['raised_amount_usd'].sum().sort_values(ascending = False)).reset_index()

print('\nTop Company for Second Sector:')
print(D3_secsector_company_final.company_name[0])

Top Company for Top Sector:
Firstcry.Com

Top Company for Second Sector:
Manthan Systems
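
The top-company cells above hard-code the sector names ('Others' and 'Social, Finance, Analytics, Advertising'). A possible alternative, sketched here with a hypothetical helper `top_company_in_ranked_sector` and made-up toy data, derives the sector from the counts and then picks the company with the largest total. It assumes one row per funding round, so `value_counts` on `main_sector` matches the investment count.

```python
import pandas as pd

def top_company_in_ranked_sector(frame, rank=0):
    """Return (sector, company) for the sector at the given count rank (0 = top)."""
    # value_counts() orders sectors by descending number of rows
    sector = frame["main_sector"].value_counts().index[rank]
    totals = (
        frame.loc[frame["main_sector"] == sector]
        .groupby("company_name")["raised_amount_usd"]
        .sum()
    )
    return sector, totals.idxmax()

# Toy data (hypothetical companies and amounts)
toy = pd.DataFrame({
    "main_sector": ["Others", "Others", "Others", "Health", "Health"],
    "company_name": ["A", "A", "B", "C", "D"],
    "raised_amount_usd": [6_000_000.0, 7_000_000.0, 12_000_000.0, 5_000_000.0, 9_000_000.0],
})
top = top_company_in_ranked_sector(toy, rank=0)
second = top_company_in_ranked_sector(toy, rank=1)
print(top, second)
```

In the toy data, company A wins the top sector because its two rounds sum to more than B's single larger round, which mirrors the sum-per-company logic used in the cells above.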


### Checkpoint 5 Complete

## Checkpoint 6: Plots

The findings are to be presented to the CEO of Spark Funds. Specifically, she wants to see the following plots:
1.	A plot showing the fraction of total investments (globally) in venture, seed, and private equity, and the average amount of investment in each funding type. This chart should make it clear that a certain funding type (FT) is best suited for Spark Funds.
2.	A plot showing the top 9 countries against the total amount of investments of funding type FT. This should make the top 3 countries (Country 1, Country 2, and Country 3) very clear.
3.	A plot showing the number of investments in the top 3 sectors of the top 3 countries on one chart (for the chosen investment type FT). 
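
The numbers behind the first plot can be prepared in pandas before charting. The sketch below rests on assumptions: the toy values are made up, `'private_equity'` is assumed to be how that funding type is spelled in the data, and "fraction of total investments" is interpreted as the share of the total amount rather than the share of the count.

```python
import pandas as pd

# Toy data (hypothetical values); 'angel' is present only to show it is excluded
toy = pd.DataFrame({
    "funding_round_type": ["venture", "venture", "seed", "private_equity", "seed", "angel"],
    "raised_amount_usd": [10_000_000.0, 6_000_000.0, 500_000.0,
                          70_000_000.0, 1_500_000.0, 200_000.0],
})

types_of_interest = ["venture", "seed", "private_equity"]
subset = toy[toy["funding_round_type"].isin(types_of_interest)]

# Total and average amount per funding type
plot1 = subset.groupby("funding_round_type")["raised_amount_usd"].agg(
    total_usd="sum", avg_usd="mean"
)
# Share of the combined amount across the three funding types
plot1["fraction_of_total"] = plot1["total_usd"] / plot1["total_usd"].sum()
print(plot1)
```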

Export the data frames to CSV files so that they can be used in Tableau.

In [37]:
# Export the data for the plots. The CSV files are used to generate the plots.

# 1. Exporting Cleaned Master frame for Venture analysis
master_frame.to_csv('master_frame.csv', sep=',')

# 2. Export the venture-only data frame, so that it can be used for plotting both countries and sectors
final_venture_master_frame.to_csv('Venture_all_countries.csv',sep=',')

# 3. Export the venture-only data frame restricted to the 5 to 15 million USD range, so that it can be used for plotting both countries and sectors
Final_venture_5_15M = final_venture_master_frame.loc[(final_venture_master_frame['raised_amount_usd'] >= 5000000) & (final_venture_master_frame['raised_amount_usd'] <= 15000000)]
Final_venture_5_15M.to_csv('venture_5_15m.csv',sep=',')
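
By default `to_csv` also writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back. If that column is not wanted downstream, passing `index=False` omits it. A small round-trip sketch on made-up data, using an in-memory buffer in place of a file:

```python
import io
import pandas as pd

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "main_sector": ["Others", "Health"],
    "raised_amount_usd": [6_000_000.0, 9_000_000.0],
})

# Default export: the index becomes an extra, unnamed column
with_index = io.StringIO()
toy.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()

# Export without the index
without_index = io.StringIO()
toy.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()

print(cols_with)
print(cols_without)
```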


### All Checkpoints Completed