# Demographic Data Analyzer (freeCodeCamp - Data Analysis with Python Projects #2)

In this challenge you must analyze demographic data using Pandas. You are given a dataset of demographic data that was extracted from the 1994 Census database. Here is a sample of what the data looks like:

You must use Pandas to answer the following questions:

- How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels. (race column)
- What is the average age of men?
- What is the percentage of people who have a Bachelor's degree?
- What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?
- What percentage of people without advanced education make more than 50K?
- What is the minimum number of hours a person works per week?
- What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?
- What country has the highest percentage of people that earn >50K and what is that percentage?
- Identify the most popular occupation for those who earn >50K in India.

Use the starter code in the file demographic_data_analyzer.py. Update the code so all variables set to None are set to the appropriate calculation or code. Round all decimals to the nearest tenth.

In [1]:
# Import the pandas library
import pandas as pd

# Read the adult dataset
df = pd.read_csv('adult.data.csv')

In [2]:
# List the DataFrame's columns

df.columns

Index(['age', 'workclass', 'fnlwgt', 'education', 'education-num',
       'marital-status', 'occupation', 'relationship', 'race', 'sex',
       'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
       'salary'],
      dtype='object')

## Answers to the questions

In [3]:
# How many people of each race are represented in this dataset?

race = df['race'].value_counts()

In [4]:
race

race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: count, dtype: int64

In [5]:
# What is the average age of men?

avg_male_age = df[df['sex'] == 'Male']['age'].mean()

In [6]:
print(f"The average age of men is {avg_male_age:.2f} years.")

The average age of men is 39.43 years.
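
The challenge asks for decimals rounded to the nearest tenth, whereas the `:.2f` format above only changes how the value is displayed. A minimal sketch of rounding the stored value itself, assuming a small made-up frame (`toy` and its values are hypothetical, not taken from the census file):

```python
import pandas as pd

# Hypothetical toy data; not the real census rows.
toy = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Male"],
    "age": [30, 52, 41, 47],
})

# Select the ages of men, average them, then round the number itself
# to one decimal place (the f-string in the cell above only formats text).
average_age_men = round(toy.loc[toy["sex"] == "Male", "age"].mean(), 1)
print(average_age_men)
```

The same `round(value, 1)` call could be applied to each percentage computed in the later cells.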


In [7]:
# What is the percentage of people who have a Bachelor's degree?

bachelor = (df['education'].value_counts(normalize = True)*100)['Bachelors']

In [8]:
print(f"The percentage of people with a Bachelor's degree is {bachelor:.2f}%.")

The percentage of people with a Bachelor's degree is 16.45%.


In [9]:
# What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

adv_education = df[df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]

In [10]:
adv_more_50k = (adv_education['salary'].value_counts(normalize = True)*100)['>50K']

In [11]:
print(f"The percentage of people with advanced education who earn more than 50K is {adv_more_50k:.2f}%.")

The percentage of people with advanced education who earn more than 50K is 46.54%.


In [12]:
# What percentage of people without advanced education make more than 50K?

no_adv_education_salary = (df[~df['education'].isin(['Bachelors', 'Masters', 'Doctorate'])]['salary'].value_counts(normalize = True)*100)['>50K']

print(f"Among people without advanced education, {no_adv_education_salary:.2f}% earn more than 50K per year.")

Among people without advanced education, 17.37% earn more than 50K per year.
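
The two education questions split the same rows on one condition, so both percentages can be read from a single table. A minimal sketch using `pd.crosstab` with `normalize="index"`, assuming a small made-up frame (`toy`, the `education_group` labels, and the values are hypothetical, not taken from the census file):

```python
import pandas as pd

# Hypothetical toy data; not the real census rows.
toy = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Doctorate",
                  "HS-grad", "HS-grad", "Some-college"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", ">50K", "<=50K"],
})

# Build the boolean mask once, then turn it into readable group labels.
advanced = toy["education"].isin(["Bachelors", "Masters", "Doctorate"])
education_group = advanced.map({True: "advanced", False: "other"}).rename("education_group")

# Row-normalised cross-tabulation: each row sums to 100 percent.
table = (pd.crosstab(education_group, toy["salary"], normalize="index") * 100).round(1)

higher_education_rich = table.loc["advanced", ">50K"]
lower_education_rich = table.loc["other", ">50K"]
print(table)
```

This is an alternative to the two separate `value_counts` calls above, not a replacement the challenge requires.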


In [13]:
# What is the minimum number of hours a person works per week?

minimum_hours = df['hours-per-week'].min()

print(f"The minimum number of hours worked per week is {minimum_hours}.")

The minimum number of hours worked per week is 1.


In [14]:
# What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

# Reuse minimum_hours from the previous cell instead of hard-coding 1
minimum_hours_salary = (df[df['hours-per-week'] == minimum_hours]['salary'].value_counts(normalize = True)*100)['>50K']

print(f"The percentage of people who work the minimum number of hours per week and earn more than 50K is {minimum_hours_salary}%.")

The percentage of people who work the minimum number of hours per week and earn more than 50K is 10.0%.
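
Looking up `'>50K'` in the result of `value_counts` works here because at least one minimum-hours worker earns more than 50K. On a subset with no such row, the label is absent and the lookup raises `KeyError`. A minimal sketch of a more defensive variant, assuming a small made-up frame (`toy` and its values are hypothetical), which averages a boolean comparison instead:

```python
import pandas as pd

# Hypothetical toy data in which nobody working the minimum hours earns >50K.
toy = pd.DataFrame({
    "hours-per-week": [1, 1, 40],
    "salary": ["<=50K", "<=50K", ">50K"],
})

min_hours = toy["hours-per-week"].min()
subset = toy.loc[toy["hours-per-week"] == min_hours, "salary"]

# The label lookup fails when '>50K' never occurs in the subset.
try:
    subset.value_counts(normalize=True)[">50K"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# The mean of a boolean Series is the share of True values, so it yields 0.0 here.
pct = round(subset.eq(">50K").mean() * 100, 1)
print(lookup_failed, pct)
```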


In [15]:
# What country has the highest percentage of people that earn >50K and what is that percentage?

# Group by country
country_salary_pct = df.groupby('native-country')['salary'].value_counts(normalize=True).unstack()

# Keep only the '>50K' column
percent_50k = country_salary_pct['>50K'] * 100

highest_country = percent_50k.idxmax()
highest_percentage = percent_50k.max()

print(f"{highest_country} has the highest percentage of people earning >50K: {highest_percentage:.2f}%")


Iran has the highest percentage of people earning >50K: 41.86%
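
A per-country percentage can be driven by a handful of respondents, so it may help to keep the group size next to each share when reading the ranking. A minimal sketch with named aggregation, assuming a small made-up frame (`toy`, the `rich` helper column, and the values are hypothetical, not taken from the census file):

```python
import pandas as pd

# Hypothetical toy data; not the real census rows.
toy = pd.DataFrame({
    "native-country": ["Iran", "Iran", "India", "India", "India", "India", "Peru"],
    "salary": [">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K"],
})

# 'rich' is True for >50K earners; its mean per country is the share of such rows,
# and its count is the number of respondents from that country.
summary = (
    toy.assign(rich=toy["salary"].eq(">50K"))
       .groupby("native-country")["rich"]
       .agg(pct="mean", n="count")
)
summary["pct"] = (summary["pct"] * 100).round(1)

top_country = summary["pct"].idxmax()
print(summary.sort_values("pct", ascending=False))
```

Countries with no `>50K` rows appear with a share of 0.0 in this variant, rather than as missing values in the unstacked table.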


In [16]:
# Identify the most popular occupation for those who earn >50K in India.

popular_india_occupation = df[(df['native-country'] == 'India') & (df['salary'] == '>50K')]['occupation'].mode()

In [17]:
print(popular_india_occupation)

0    Prof-specialty
Name: occupation, dtype: object
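
As the output shows, `mode()` returns a Series rather than a single string. A minimal sketch of two ways to get a plain value, assuming a small made-up frame (`toy` and its values are hypothetical, not taken from the census file):

```python
import pandas as pd

# Hypothetical toy data; not the real census rows.
toy = pd.DataFrame({
    "native-country": ["India", "India", "India", "India", "Japan"],
    "salary": [">50K", ">50K", ">50K", "<=50K", ">50K"],
    "occupation": ["Prof-specialty", "Prof-specialty", "Sales", "Sales", "Sales"],
})

mask = (toy["native-country"] == "India") & (toy["salary"] == ">50K")
occupations = toy.loc[mask, "occupation"]

# Option 1: take the first entry of the Series returned by mode().
top_via_mode = occupations.mode().iloc[0]

# Option 2: count occurrences and take the label with the largest count.
top_via_counts = occupations.value_counts().idxmax()
print(top_via_mode, top_via_counts)
```

When several occupations tie for first place, `mode()` returns all of them, so taking only the first entry silently picks one.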
