## Who is the richest?

### Goal of this assignment: 
Parse the file and answer the following questions: 
1. Who is the richest in this list?
2. How many people are without an email?
3. How many people are without phone numbers?

In [22]:
import pandas as pd

In [23]:
# Load the file as a dataframe
data = pd.read_excel('richest_people.xlsx')

In [24]:
data.head(10)

| | Name | Email | Website | Phone Number | Net Worth |
|---|---|---|---|---|---|
| 0 | Anna Jenkins | | http://tran.com/ | | $74 billion |
| 1 | Nishith Wadhwa | vsachdeva@gmail.com | http://thaker.com/ | 4726597841 | $66 billion |
| 2 | Gerd Sørensen | arild80@hotmail.com | http://antonsen.com/ | +47 32 90 93 83 | $74 billion |
| 3 | 王超 | hgu@gmail.com | http://www.73.net/ | 18099330306 | $123 billion |
| 4 | 임현정 | jeonghocoe@hanmail.net | https://jusighoesa.com/ | | $126 billion |
| 5 | 田中 里佳 | | https://tanaka.net/ | 070-8821-2415 | $144 billion |
| 6 | Deanna Bender | rvasquez@gmail.com | https://arnold.net/ | 4839936016 | $90 billion |
| 7 | Kavya Kalla | nmann@gmail.com | http://www.shere.com/ | 2357487988 | $148 billion |
| 8 | Grete Jacobsen | johannessenjenny@yahoo.com | http://tveit.com/ | | $42 billion |
| 9 | 林桂英 | linjun@hotmail.com | http://qiang.cn/ | 13609064748 | $56 billion |


In [25]:
data.columns

Index(['Name', 'Email', 'Website', 'Phone Number', 'Net Worth'], dtype='object')

In [26]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Name          50 non-null     object
 1   Email         40 non-null     object
 2   Website       50 non-null     object
 3   Phone Number  37 non-null     object
 4   Net Worth     50 non-null     object
dtypes: object(5)
memory usage: 2.1+ KB


### Who is the richest?

In [30]:
# Convert the net worth string (e.g. '$74 billion') to a float in dollars

def apply_scale(net_worth):
    if 'billion' in net_worth:
        return float(net_worth.replace('$', '').replace(' billion', '')) * 1e9
    elif 'million' in net_worth:
        return float(net_worth.replace('$', '').replace(' million', '')) * 1e6
    else:
        # No scale word: treat the remainder as a plain dollar amount
        return float(net_worth.replace('$', ''))

data['Net Worth Value'] = data['Net Worth'].apply(apply_scale)
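
The same conversion can also be done without a row-by-row `apply`. The sketch below is one possible vectorized variant, not the approach used in this notebook. The `sample` series, the `SCALES` mapping, and the regular expression are assumptions made for illustration, based only on the `$<number> <scale>` format visible in the output above.

```python
import pandas as pd

# Hypothetical sample mirroring the "Net Worth" format seen in the data.
sample = pd.Series(["$74 billion", "$66 billion", "$1.5 million", "$900"])

# Multipliers for each scale word; a missing scale word means plain dollars.
SCALES = {"billion": 1e9, "million": 1e6}

# Capture the numeric part and the optional scale word in named groups.
parts = sample.str.extract(r"\$\s*(?P<amount>[\d.]+)\s*(?P<scale>billion|million)?")

amount = parts["amount"].astype(float)
multiplier = parts["scale"].map(SCALES).fillna(1.0)

# Net worth in dollars, as floats.
net_worth_value = amount * multiplier
print(net_worth_value)
```

On the real dataframe, `sample` would be replaced by `data['Net Worth']`. Values that do not match the pattern would come out as `NaN` rather than raising an error, which may or may not be the desired behavior.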

In [31]:
data.head()

| | Name | Email | Website | Phone Number | Net Worth | Net Worth Value |
|---|---|---|---|---|---|---|
| 0 | Anna Jenkins | | http://tran.com/ | | $74 billion | 74000000000.0 |
| 1 | Nishith Wadhwa | vsachdeva@gmail.com | http://thaker.com/ | 4726597841 | $66 billion | 66000000000.0 |
| 2 | Gerd Sørensen | arild80@hotmail.com | http://antonsen.com/ | +47 32 90 93 83 | $74 billion | 74000000000.0 |
| 3 | 王超 | hgu@gmail.com | http://www.73.net/ | 18099330306 | $123 billion | 123000000000.0 |
| 4 | 임현정 | jeonghocoe@hanmail.net | https://jusighoesa.com/ | | $126 billion | 126000000000.0 |


In [37]:
# take the row with the largest net worth value and read the name from it

print("{} is the richest person".format(data.nlargest(1, 'Net Worth Value')['Name'].values[0]))


鈴木 知実 is the richest person


In [51]:
# A longer approach, for practice: find the max, its index label, then the row

print(data['Net Worth Value'].max())
print(data['Net Worth Value'].idxmax())
data.loc[[data['Net Worth Value'].idxmax()]]


196000000000.0
35


| | Name | Email | Website | Phone Number | Net Worth | Net Worth Value |
|---|---|---|---|---|---|---|
| 35 | 鈴木 知実 | | http://kondo.com/ | 070-8335-7090 | $196 billion | 196000000000.0 |
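
Both `nlargest(1, ...)` and `idxmax()` return a single row, so a tie for first place would go unnoticed. The first rows of the data already show two people at $74 billion, so ties are possible in principle. The sketch below shows one way to keep every tied row, using the `keep='all'` option of `DataFrame.nlargest`. The small `people` frame is a hypothetical stand-in built from three rows shown earlier, not the full dataset.

```python
import pandas as pd

# Hypothetical three-row stand-in; two people share the largest value.
people = pd.DataFrame({
    "Name": ["Anna Jenkins", "Nishith Wadhwa", "Gerd Sørensen"],
    "Net Worth Value": [74e9, 66e9, 74e9],
})

# keep="all" returns every row tied for the largest value,
# even if that means more than one row comes back.
richest = people.nlargest(1, "Net Worth Value", keep="all")
richest_names = richest["Name"].tolist()

print(richest_names)
```

On the actual data this would presumably return only the single $196 billion row, but it makes the "who is the richest" answer robust to duplicates.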


### How many people are without an email?

In [40]:
print("There are {} people with no email".format(data['Email'].isna().sum()))

There are 10 people with no email


### How many people are without phone numbers?

In [41]:
print("There are {} people with no phone number".format(data['Phone Number'].isna().sum()))

There are 13 people with no phone number
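
The two counts can also be computed in one pass, along with the number of people who have neither contact detail. This is a minimal sketch on a hypothetical `contacts` frame assembled from a few rows shown earlier. It assumes missing values are real `NaN`/`None` entries, as `data.info()` suggests, and not empty strings.

```python
import pandas as pd

# Hypothetical stand-in for the real dataframe, using rows shown earlier.
contacts = pd.DataFrame({
    "Name": ["Anna Jenkins", "Nishith Wadhwa", "Grete Jacobsen", "Deanna Bender"],
    "Email": [None, "vsachdeva@gmail.com", "johannessenjenny@yahoo.com", "rvasquez@gmail.com"],
    "Phone Number": [None, "4726597841", None, "4839936016"],
})

contact_cols = ["Email", "Phone Number"]

# Missing values per column, computed together.
missing_counts = contacts[contact_cols].isna().sum()

# Rows where every contact column is missing.
missing_both = contacts[contact_cols].isna().all(axis=1).sum()

print(missing_counts)
print("People with neither email nor phone number:", missing_both)
```

Applied to `data`, `missing_counts` should reproduce the two answers above (10 and 13), since it uses the same `isna().sum()` logic.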
