# **Context** 

Forbes is a media and publishing company based in the United States, controlled by Integrated Whale Media Investments and the Forbes family. It is well known for its magazine and for its rankings of billionaires and sports teams.
As a newly hired Data Scientist at Forbes, you have been tasked with analyzing the net worth of the Forbes Top Billionaires 2020 and drawing insights from the data.

## **Problem Statement**

Analyze the net worth of Forbes Top Billionaires 2020.

**Data Dictionary**

Name - Name of the person
<br>
Net worth in billions - Net worth of the person in billions
<br>
Country - Country where the person is from
<br>
Source - Source of the income
<br>
Rank - Rank of the person in the Billionaires list
<br>
Age - Age of the person
<br>
Industry - Industry with which the person is associated

## Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns 

## Loading dataset and understanding it

In [2]:
df = pd.read_csv('Forbes Billionaire 2020.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'Forbes Billionaire 2020.csv'
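`pd.read_csv` raises `FileNotFoundError` when the CSV is not in the notebook's working directory. The sketch below wraps the load in a check that reports where the file was expected; `load_billionaires` is a hypothetical helper name, not part of the original notebook.

```python
from pathlib import Path

import pandas as pd


def load_billionaires(csv_path):
    """Read the CSV into a DataFrame, with a clearer error if the file is missing."""
    path = Path(csv_path)
    if not path.is_file():
        # Report the folder that was searched so the path is easy to correct
        raise FileNotFoundError(
            f"{path.name!r} not found in {path.resolve().parent}; "
            "check the file name and the working directory"
        )
    return pd.read_csv(path)
```

In the notebook this would be called as `df = load_billionaires('Forbes Billionaire 2020.csv')`, after placing the file next to the notebook or passing its full path.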

### Looking at top 5 rows 

In [None]:
df.head()

### Dimension of Data

In [None]:
df.shape

### The Datatypes of the columns in the dataframe

In [None]:
df.info()

### Checking for missing values

In [None]:
df.isnull().sum().sum()

In [None]:
df.isnull().sum()
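The two cells above give the total and the per-column count of missing values. A small sketch of how the count and the percentage can be shown side by side, using a made-up `sample` frame in place of `df`:

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only; the notebook would use `df` here
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [54.0, np.nan, 35.0, np.nan],
    "Country": ["India", "China", None, "France"],
})

# isnull().sum() counts missing cells; isnull().mean() gives the missing fraction
missing = pd.DataFrame({
    "missing": sample.isnull().sum(),
    "percent": sample.isnull().mean() * 100,
})
print(missing)
```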

### Plots and Their Uses

#### Boxplot - helps in checking for Outliers

In [None]:
plt.figure(figsize=(5,7))
sns.boxplot(x = 'Age', data = df, showmeans=True)
plt.show() 
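By default, the whiskers of a seaborn boxplot reach the most extreme points within 1.5 × IQR of the quartiles, and anything beyond them is drawn as an outlier. A minimal sketch of the same rule in pandas, using made-up ages rather than the real `Age` column:

```python
import pandas as pd

# Made-up ages for illustration only
ages = pd.Series([30, 45, 50, 55, 60, 62, 65, 70, 98], name="Age")

# First and third quartiles, and the interquartile range
q1, q3 = ages.quantile([0.25, 0.75])
iqr = q3 - q1

# Fences used by the default 1.5 * IQR whisker rule
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are the points a boxplot marks as outliers
outliers = ages[(ages < lower) | (ages > upper)]
print(lower, upper, outliers.tolist())  # 27.5 87.5 [98]
```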

## Data Analysis

### Summary of data

In [None]:
df[['NetWorth in billions','Age']].describe()

#### Distribution of Age variable

#### displot (histogram)

In [None]:
sns.displot(df["Age"],kde = False, color = 'red',bins=6)
plt.show()

#### Which 5 countries have the most billionaires?

In [None]:
df.Country.value_counts().head()

#### Which countries do the 5 youngest billionaires belong to?

In [None]:
df.sort_values(by = 'Age').head(10)

In [None]:
df.sort_values(by = 'Age').head(5)['Country'].unique()
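Sorting the whole frame and taking `head(5)` works; `DataFrame.nsmallest` is an alternative that selects the youngest rows directly. A sketch on made-up rows (the names, ages and countries below are illustrative, not taken from the dataset):

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [70, 23, 35, 51, 29],
    "Country": ["USA", "Norway", "USA", "India", "Norway"],
})

# The three youngest people, youngest first
youngest = sample.nsmallest(3, "Age")

# Distinct countries among them, in order of first appearance
countries = youngest["Country"].unique()
print(list(countries))  # ['Norway', 'USA']
```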

#### Countplot for industry

In [None]:
plt.figure(figsize=(30,10))
# pass the column by keyword; recent seaborn versions no longer accept it positionally
sns.countplot(x='Industry', data=df)
plt.show()

In [None]:
df['Industry'].value_counts()

#### Name all the billionaires in India who are younger than 50 years. 

In [None]:
Indian_billionaires = df[df['Country']=='India']
Indian_billionaires

In [None]:
print('The names of the Indian Billionaires under the age of 50 are:')
Indian_billionaires[Indian_billionaires['Age']<50]['Name'].values
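The two-step filter above can also be written as a single boolean mask, combining both conditions with `&` (each condition needs its own parentheses). A sketch on made-up rows with hypothetical names:

```python
import pandas as pd

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Name": ["Person A", "Person B", "Person C", "Person D"],
    "Age": [45, 62, 38, 41],
    "Country": ["India", "India", "India", "China"],
})

# Both conditions must hold: country is India AND age is under 50
mask = (sample["Country"] == "India") & (sample["Age"] < 50)
names = sample.loc[mask, "Name"].tolist()
print(names)  # ['Person A', 'Person C']
```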

#### Pieplot

In [None]:
plt.figure(figsize=(11,10))
industry_counts = df["Industry"].value_counts()
# offset only the first (largest) slice; the list length always matches the number of industries
explode = [0.3] + [0] * (len(industry_counts) - 1)
plt.pie(industry_counts, labels=industry_counts.index, autopct='%.2f%%', explode=explode)
# autopct='%.2f%%' displays each slice's percentage to 2 decimal places
# explode pulls the first slice ('Finance & Investment') outward by 0.3 of the radius
plt.show()

#### To check the age distribution by Industry

#### Use boxplot

In [None]:
plt.figure(figsize=(15,10))
sns.boxplot(x= 'Age', y='Industry', data=df)
plt.grid()
plt.show()

#### Use Strip Plot

Plot a strip plot to check the relationship between the variables 'Industry' and 'Age'.

In [None]:
# strip plot
plt.figure(figsize=(30,10))
sns.stripplot(x = 'Industry', y = 'Age', data = df)

# display the plot
plt.show()

#### Pairplot (pairwise scatterplots)

In [None]:
# plot a pair plot; pairplot creates its own figure, so no plt.figure() call is needed
sns.pairplot(data=df, diag_kws={'bins':6})

# display the plot
plt.show()

In [None]:
df.columns

#### Heatmap

In [None]:
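# compute correlation between the numeric columns only
# (on pandas 2.0+, df.corr() raises an error if text columns such as Name are included)
corr_matrix = df.select_dtypes(include='number').corr()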

corr_matrix
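Pearson correlation is defined only for numeric columns, so text columns have to be left out before calling `corr()`. A small sketch on made-up values, where net worth falls by the same amount at each rank and the correlation is therefore -1:

```python
import pandas as pd

# Made-up rows for illustration only; `NetWorth` is a shortened, hypothetical column name
sample = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Rank": [1, 2, 3, 4],
    "NetWorth": [100.0, 90.0, 80.0, 70.0],
})

# Keep only numeric columns, then compute pairwise Pearson correlations
corr = sample.select_dtypes(include="number").corr()
print(corr.round(2))
```

A value near -1 or 1 indicates a strong linear relationship, while a value near 0 indicates little or none.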

In [None]:
# plot heatmap
# 'annot=True' writes the correlation value inside each cell
sns.heatmap(corr_matrix, annot = True)

# display the plot
plt.show()

## Happy Learning!