# Demographic Data Analyzer

In this project, we analyze demographic data using Pandas. The dataset was extracted from the 1994 Census database.

The following questions must be answered:

- How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
- What is the average age of men?
- What is the percentage of people who have a Bachelor's degree?
- What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
- What percentage of people without advanced education make more than 50K?
- What is the minimum number of hours a person works per week?
- What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
- What country has the highest percentage of people that earn >50K and what is that percentage?
- Identify the most popular occupation for those who earn >50K in India.

In [1]:
import pandas as pd

In [2]:
url = 'adult.data.csv'

df = pd.read_csv(url)

df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


## Answering the questions
### 1. How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)

In [20]:
df['race'].value_counts()

White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: race, dtype: int64

In [4]:
white = df[df['race'] == 'White'].shape[0]

total = df.shape[0]

white_percent = white/total

white_percent*100

85.42735173981143

As shown above, White is the most represented race in the dataset, at 85.43% of the total.
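The same share can be read off for every race at once with `value_counts(normalize=True)`. The sketch below uses a small made-up frame (`toy` is a hypothetical stand-in, not the census file), so its numbers are illustrative only.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'race' column of the census file.
toy = pd.DataFrame({"race": ["White", "White", "White", "Black", "Other"]})

# normalize=True returns fractions instead of counts; scale to percent and round.
race_share = (toy["race"].value_counts(normalize=True) * 100).round(1)

print(race_share)
```

Applied to the real data, the call would be `df['race'].value_counts(normalize=True) * 100`.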

### 2. What is the average age of men?

In [5]:
round(df[df['sex'] == 'Male']['age'].mean(),1)

39.4

The average age of men in the dataset is 39.43 years.

### 3. What is the percentage of people who have a Bachelor's degree?

In [6]:
round(df[df['education'] == 'Bachelors']['education'].shape[0] / df.shape[0] * 100, 1)

16.4

The percentage of people with a Bachelor's degree is 16.4%.

### 4. What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
### 5. What percentage of people without advanced education make more than 50K?

In [7]:
higher_education = df[df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]

In [8]:
higher_education_rich = round(len(higher_education[higher_education['salary'] == '>50K']) / len(higher_education) * 100, 1)
higher_education_rich

46.5

The percentage of people with advanced education who make more than 50K is 46.5%.

In [9]:
lower_education = df[~df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]

In [10]:
lower_education_rich = round(len(lower_education[lower_education['salary'] == '>50K']) / len(lower_education) * 100, 1)
lower_education_rich

17.4

The percentage of people without advanced education who make more than 50K is 17.4%.
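Both percentages can also be computed in one pass by grouping a boolean "earns >50K" mask by an education flag. This is a minimal sketch on made-up rows. The `toy` frame and the labels `"advanced"` and `"other"` are assumptions introduced for illustration.

```python
import pandas as pd

# Hypothetical toy rows; the column names mirror the census file.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "11th", "Doctorate", "HS-grad"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# Flag advanced education, then relabel the flag so the result has readable keys.
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
group = advanced.map({True: "advanced", False: "other"})

# The mean of a boolean Series is the fraction of True values.
rich = toy["salary"] == ">50K"
share_by_group = (rich.groupby(group).mean() * 100).round(1)

higher_education_rich = share_by_group["advanced"]
lower_education_rich = share_by_group["other"]
print(share_by_group)
```

Grouping keeps the two groups complementary by construction, so the filter list only needs to be written once.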

### 6. What is the minimum number of hours a person works per week?

In [11]:
num_min_workers = df['hours-per-week'].min()
num_min_workers

1

The minimum number of hours a person works per week is 1 hour.

### 7. What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

In [12]:
min_workers = df[df['hours-per-week'] == 1]

rich_percentage = len(min_workers[min_workers['salary'] == '>50K']) / len(min_workers) * 100

rich_percentage

10.0

The percentage of people working the minimum number of hours per week who have a salary of more than 50K is 10%.
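The ratio here puts the high earners among minimum-hours workers in the numerator and all minimum-hours workers in the denominator. A minimal sketch on made-up rows (the `toy` frame is hypothetical) makes the direction of the division explicit and avoids hard-coding the minimum:

```python
import pandas as pd

# Hypothetical toy rows; four people work the minimum of 1 hour, one of them earns >50K.
toy = pd.DataFrame({
    "hours-per-week": [1, 1, 40, 1, 1, 20],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

# Look up the minimum rather than typing it in by hand.
min_hours = toy["hours-per-week"].min()
min_workers = toy[toy["hours-per-week"] == min_hours]

# Fraction of minimum-hours workers earning >50K, expressed as a percentage.
rich_percentage = (min_workers["salary"] == ">50K").mean() * 100
print(rich_percentage)
```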

### 8. What country has the highest percentage of people that earn >50K and what is that percentage?

In [13]:
df.loc[df['salary'] == '>50K', 'native-country'].value_counts()

United-States         7171
?                      146
Philippines             61
Germany                 44
India                   40
Canada                  39
Mexico                  33
England                 30
Italy                   25
Cuba                    25
Japan                   24
Taiwan                  20
China                   20
Iran                    18
South                   16
Puerto-Rico             12
Poland                  12
France                  12
Jamaica                 10
El-Salvador              9
Greece                   8
Cambodia                 7
Hong                     6
Yugoslavia               6
Ireland                  5
Vietnam                  5
Portugal                 4
Haiti                    4
Ecuador                  4
Thailand                 3
Hungary                  3
Guatemala                3
Scotland                 3
Nicaragua                2
Trinadad&Tobago          2
Laos                     2
Columbia                 2
D

In [21]:
highest_earning_country = round(df.loc[df['salary'] == '>50K', 'native-country'].value_counts() / df['native-country'].value_counts() * 100,1).sort_values(ascending=False).index[0]

highest_earning_country

'Iran'

The country with the highest percentage of people that earn >50K is Iran!

In [15]:
highest_earning_country_percentage = round(df.loc[df['salary'] == '>50K', 'native-country'].value_counts() / df['native-country'].value_counts() * 100, 1).max()
highest_earning_country_percentage

41.9

And the percentage is: 41.9%
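One way to get the country and its percentage from a single Series is to group a boolean ">50K" mask by country and then use `idxmax()` and `max()`. The sketch below runs on made-up rows. The `toy` frame and the country labels `"A"`, `"B"`, `"C"` are hypothetical.

```python
import pandas as pd

# Hypothetical toy rows with three placeholder countries.
toy = pd.DataFrame({
    "native-country": ["A", "A", "A", "A", "B", "B", "C"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K"],
})

# Per-country share of >50K earners, in percent.
rich = toy["salary"] == ">50K"
rich_share = (rich.groupby(toy["native-country"]).mean() * 100).round(1)

# idxmax gives the label of the largest value, max gives the value itself.
highest_earning_country = rich_share.idxmax()
highest_earning_country_percentage = rich_share.max()
print(highest_earning_country, highest_earning_country_percentage)
```

Unlike dividing two `value_counts()` results, this approach yields 0 rather than `NaN` for a country with no >50K earners.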

### 9. Identify the most popular occupation for those who earn >50K in India.

In [18]:
top_IN_occupation = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')]['occupation'].value_counts().index[0]

top_IN_occupation

'Prof-specialty'

The most popular occupation for those who earn >50K in India is Prof-specialty.