# Demographic Data Analyzer

- How many people of each race are represented in this dataset? This should be a Pandas series with race names as the index labels (`race` column).

- What is the average age of men?

- What is the percentage of people who have a Bachelor's degree?

- What percentage of people with advanced education (Bachelors, Masters, or Doctorate) make more than 50K?

- What percentage of people without advanced education make more than 50K?

- What is the minimum number of hours a person works per week?

- What percentage of the people who work the minimum number of hours per week have a salary of more than 50K?

- What country has the highest percentage of people that earn >50K and what is that percentage?

- Identify the most popular occupation for those who earn >50K in India.

In [2]:
import pandas as pd

In [3]:
import os

print(os.getcwd())

/home/duyla/data-science/analysis/projects/Demographic Data Analyzer


In [4]:
# Reading the dataset
data = pd.read_csv("adultData.csv", delimiter=',')

print(data.head())

data.head()

   age         workclass  fnlwgt  education  education-num  \
0   39         State-gov   77516  Bachelors             13   
1   50  Self-emp-not-inc   83311  Bachelors             13   
2   38           Private  215646    HS-grad              9   
3   53           Private  234721       11th              7   
4   28           Private  338409  Bachelors             13   

       marital-status         occupation   relationship   race     sex  \
0       Never-married       Adm-clerical  Not-in-family  White    Male   
1  Married-civ-spouse    Exec-managerial        Husband  White    Male   
2            Divorced  Handlers-cleaners  Not-in-family  White    Male   
3  Married-civ-spouse  Handlers-cleaners        Husband  Black    Male   
4  Married-civ-spouse     Prof-specialty           Wife  Black  Female   

   capital-gain  capital-loss  hours-per-week native-country salary  
0          2174             0              40  United-States  <=50K  
1             0             0             

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K


## Analysis

In [40]:
# Races that appear in the dataset

print(data["race"])

print("\n\n")

print(data["race"].value_counts())

dist_races = data["race"].unique()

print(dist_races)

print("Amount of unique races: {}".format(len(dist_races)))

0        White
1        White
2        White
3        Black
4        Black
         ...  
32556    White
32557    White
32558    White
32559    White
32560    White
Name: race, Length: 32561, dtype: object



race
White                 27816
Black                  3124
Asian-Pac-Islander     1039
Amer-Indian-Eskimo      311
Other                   271
Name: count, dtype: int64
['White' 'Black' 'Asian-Pac-Islander' 'Amer-Indian-Eskimo' 'Other']
Amount of unique races: 5


- Five distinct races appear in the dataset:
    + White
    + Black
    + Asian-Pacific-Islander (https://en.wikipedia.org/wiki/Asian_Pacific_Americans)
    + Amer-Indian-Eskimo 
    + Other

In statistics, `mean` and `median` are two measures of central tendency.

- `Mean` is the simple average of the given set of values or quantities
    + Arithmetic average
    + Sensitive to outliers

- `Median` is the middle number in an ordered list of values
    + Positional average
    + Less sensitive to outliers
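
The difference in outlier sensitivity can be seen on a small made-up list. This is only an illustrative sketch: the ages below are hypothetical and are not taken from the dataset.

```python
from statistics import mean, median

# Hypothetical ages; the value 300 is a deliberate outlier.
ages = [25, 30, 35, 40, 45]
ages_with_outlier = ages + [300]

# Without the outlier, mean and median agree.
print(mean(ages), median(ages))

# With the outlier, the mean is pulled far upward while the median barely moves.
print(mean(ages_with_outlier), median(ages_with_outlier))
```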

In [6]:
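# Average age
# NOTE: this averages every row; no filter on the `sex` column is applied here
avg_men_age = data["age"].mean()

print(avg_men_age)


38.58164675532078


The cell above averages `age` over every row, so `38.58164675532078` is the mean age of all people in the dataset, not of men only. Answering the question requires filtering on `sex == "Male"` before taking the mean.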
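
Restricting the mean to men needs a row filter on the `sex` column before averaging. Below is a minimal sketch on a made-up sample: the rows are hypothetical, and only the column names `age` and `sex` and the label `Male` come from the dataset.

```python
import pandas as pd

# Tiny hypothetical sample that reuses the dataset's column names.
sample = pd.DataFrame({
    "age": [39, 50, 28, 37, 45],
    "sex": ["Male", "Male", "Female", "Female", "Male"],
})

# Mean over every row (what an unfiltered .mean() computes).
overall_mean_age = sample["age"].mean()

# Mean over the rows where sex is "Male" only.
men_mean_age = sample.loc[sample["sex"] == "Male", "age"].mean()

print(overall_mean_age, men_mean_age)
```

On the real data the same pattern would be applied to `data` instead of `sample`.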

In [7]:
university_grad_citizens = data[data["education"] == "Bachelors"]

print(university_grad_citizens)

amount_university_grad_citizens = university_grad_citizens.shape
print(amount_university_grad_citizens)

uni_grad_percentage = university_grad_citizens.shape[0] / data.shape[0]

print(uni_grad_percentage)

       age         workclass  fnlwgt  education  education-num  \
0       39         State-gov   77516  Bachelors             13   
1       50  Self-emp-not-inc   83311  Bachelors             13   
4       28           Private  338409  Bachelors             13   
9       42           Private  159449  Bachelors             13   
11      30         State-gov  141297  Bachelors             13   
...    ...               ...     ...        ...            ...   
32530   35                 ?  320084  Bachelors             13   
32531   30                 ?   33811  Bachelors             13   
32533   54           Private  337992  Bachelors             13   
32536   34           Private  160216  Bachelors             13   
32538   38           Private  139180  Bachelors             13   

           marital-status       occupation   relationship                race  \
0           Never-married     Adm-clerical  Not-in-family               White   
1      Married-civ-spouse  Exec-managerial   

`5355` people in the dataset hold a Bachelor's degree, roughly `16.4%` of all records.
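
A count and a percentage like these can also be read off a boolean mask directly, since the mean of a boolean Series is the fraction of `True` values. A small sketch, assuming a hypothetical five-row sample rather than the real file:

```python
import pandas as pd

# Hypothetical sample; only the column name and labels mirror the dataset.
sample = pd.DataFrame({
    "education": ["Bachelors", "HS-grad", "11th", "Bachelors", "Masters"],
})

is_bachelor = sample["education"] == "Bachelors"

# sum() counts the True values, mean() gives their share of all rows.
bachelors_count = int(is_bachelor.sum())
bachelors_pct = round(is_bachelor.mean() * 100, 1)

print(bachelors_count, bachelors_pct)
```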

In [8]:
mySet = {1, 69, "hello"}

ele = "hello"

print(ele in mySet)

True


In [9]:
# People with advanced education (Bachelor, Master or Doctorate)
advanced_educated_citizen = data[data["education"].isin(["Bachelors", "Masters", "Doctorate"])]

# print(advanced_educated_citizen[advanced_educated_citizen["education"] == "Masters"])
amountOfBachelors = (data[data["education"] == "Bachelors"].shape)[0]
amountOfBachelorsEarnMore50K = (data[(data["education"] == "Bachelors") & (data["salary"] == ">50K")].shape)[0]
amountOfMasters = (data[data["education"] == "Masters"].shape)[0]
amountOfMastersEarnMore50K = (data[(data["education"] == "Masters") & (data["salary"] == ">50K")].shape)[0]
amountOfDoctorates = (data[data["education"] == "Doctorate"].shape)[0]
amountOfDoctoratesEarnMore50K = (data[(data["education"] == "Doctorate") & (data["salary"] == ">50K")].shape)[0]

print("Amount of bachelors: {}\nAmount of Masters: {}\nAmount of Doctorate: {}".format(amountOfBachelors, amountOfMasters, amountOfDoctorates))
print("{} >><< {} >><< {}".format(amountOfBachelorsEarnMore50K, amountOfMastersEarnMore50K, amountOfDoctoratesEarnMore50K))

print(amountOfBachelorsEarnMore50K + amountOfMastersEarnMore50K + amountOfDoctoratesEarnMore50K)

# print("The percentage of bachelors can earn more than 50K per year: {}".format())

print("\n\n")

thoseWhoEarnsMoreThan50K = advanced_educated_citizen[advanced_educated_citizen["salary"] == ">50K"]
print(thoseWhoEarnsMoreThan50K)

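# Share of >50K earners among people with advanced education (not among the whole dataset)
advanced_educ_people_salary_percentage = thoseWhoEarnsMoreThan50K.shape[0] / advanced_educated_citizen.shape[0]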
print(advanced_educ_people_salary_percentage)


Amount of bachelors: 5355
Amount of Masters: 1723
Amount of Doctorate: 413
2221 >><< 959 >><< 306
3486



       age         workclass  fnlwgt  education  education-num  \
8       31           Private   45781    Masters             14   
9       42           Private  159449  Bachelors             13   
11      30         State-gov  141297  Bachelors             13   
19      43  Self-emp-not-inc  292175    Masters             14   
20      40           Private  193524  Doctorate             16   
...    ...               ...     ...        ...            ...   
32533   54           Private  337992  Bachelors             13   
32536   34           Private  160216  Bachelors             13   
32538   38           Private  139180  Bachelors             13   
32539   71                 ?  287372  Doctorate             16   
32554   53           Private  321865    Masters             14   

           marital-status       occupation   relationship                race  \
8           Never-ma

Among people with advanced education (Bachelors, Masters, or Doctorate), 3486 out of 7491, approximately 46.5%, earn more than 50K per year.

- Bachelors: 5355 people, of whom 2221 earn more than 50K per year
- Masters: 1723 people, of whom 959 earn more than 50K per year
- Doctorate: 413 people, of whom 306 earn more than 50K per year
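
The per-degree counts listed above can be turned into rates with plain arithmetic. The sketch below only reuses the numbers already printed by the notebook; the dictionary layout and variable names are illustrative choices.

```python
# (total people, people earning >50K) per degree, copied from the recorded output
counts = {
    "Bachelors": (5355, 2221),
    "Masters": (1723, 959),
    "Doctorate": (413, 306),
}

# Percentage of >50K earners within each degree
rates = {
    degree: round(high / total * 100, 1)
    for degree, (total, high) in counts.items()
}

# Percentage of >50K earners across all three degrees combined
total_all = sum(total for total, _ in counts.values())
high_all = sum(high for _, high in counts.values())
overall_rate = round(high_all / total_all * 100, 1)

print(rates)
print(overall_rate)
```

The rate rises with each degree level, which the combined figure alone does not show.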

In [46]:
print((3486 / (5355 + 1723 + 413)) * 100)

46.535843011613935


In [10]:
# Percentage of people without an advanced degree who earn more than 50K per year

rangeOfEducations = data["education"].unique()
print(rangeOfEducations)

nonAdvancedDegreeFolks = data[~data["education"].isin(["Bachelors", "Masters", "Doctorate"])]

amountOfNonAdvancedEduFolks = (nonAdvancedDegreeFolks.shape)[0]
print(amountOfNonAdvancedEduFolks)

nonAdvancedDegreeFolksEarnMoreThan50K = nonAdvancedDegreeFolks[nonAdvancedDegreeFolks["salary"] == ">50K"]
print(nonAdvancedDegreeFolksEarnMoreThan50K.head())
print((nonAdvancedDegreeFolksEarnMoreThan50K.shape)[0])


# totalAmountOfPeopleWithoutAdvDegree = (data.shape)[0] 

['Bachelors' 'HS-grad' '11th' 'Masters' '9th' 'Some-college' 'Assoc-acdm'
 'Assoc-voc' '7th-8th' 'Doctorate' 'Prof-school' '5th-6th' '10th'
 '1st-4th' 'Preschool' '12th']
25070
    age         workclass  fnlwgt     education  education-num  \
7    52  Self-emp-not-inc  209642       HS-grad              9   
10   37           Private  280464  Some-college             10   
14   40           Private  121772     Assoc-voc             11   
27   54                 ?  180211  Some-college             10   
38   31           Private   84154  Some-college             10   

        marital-status       occupation relationship                race  \
7   Married-civ-spouse  Exec-managerial      Husband               White   
10  Married-civ-spouse  Exec-managerial      Husband               Black   
14  Married-civ-spouse     Craft-repair      Husband  Asian-Pac-Islander   
27  Married-civ-spouse                ?      Husband  Asian-Pac-Islander   
38  Married-civ-spouse            Sales      H

Of the 25070 people without an advanced degree (Bachelors, Masters, or Doctorate), only 4355 earn more than 50K per year, approximately 17.4%.
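
The percentages for people with and without advanced education can both come from one mask and its negation, which avoids hard-coding counts. This is a sketch under assumptions: `pct_over_50k` and `ADVANCED` are hypothetical names, and the sample rows are made up.

```python
import pandas as pd

ADVANCED = ["Bachelors", "Masters", "Doctorate"]  # hypothetical constant name


def pct_over_50k(df: pd.DataFrame) -> float:
    """Percentage of rows in df whose salary is '>50K', rounded to one decimal."""
    return round((df["salary"] == ">50K").mean() * 100, 1)


# Hypothetical sample that reuses the dataset's column names and labels.
sample = pd.DataFrame({
    "education": ["Bachelors", "Masters", "HS-grad", "11th", "Doctorate", "Some-college"],
    "salary": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

has_advanced = sample["education"].isin(ADVANCED)

higher_pct = pct_over_50k(sample[has_advanced])   # rows with an advanced degree
lower_pct = pct_over_50k(sample[~has_advanced])   # every other row

print(higher_pct, lower_pct)
```

Each percentage is taken relative to its own group, not to the whole dataset.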



In [44]:
print((4355 / 25070) * 100)

print(round((4355 / 25070) * 100, 1))

17.3713601914639
17.4


In [11]:
# Distinct values of hours worked per week
distinct_work_hours = data["hours-per-week"].unique()
print(distinct_work_hours)

min_work_hours_per_week = min(distinct_work_hours)
print(min_work_hours_per_week)


[40 13 16 45 50 80 30 35 60 20 52 44 15 25 38 43 55 48 58 32 70  2 22 56
 41 28 36 24 46 42 12 65  1 10 34 75 98 33 54  8  6 64 19 18 72  5  9 47
 37 21 26 14  4 59  7 99 53 39 62 57 78 90 66 11 49 84  3 17 68 27 85 31
 51 77 63 23 87 88 73 89 97 94 29 96 67 82 86 91 81 76 92 61 74 95]
1


The minimum number of hours worked per week in this dataset is `1 hour`

In [12]:
# Find the percentage of people who work the minimum number of hours per week
# but still earn more than 50K per year

# Open question: how few hours per week can someone plausibly work
# and still earn that salary?

# Select everyone who earns more than 50K, regardless of occupation
# or educational background
wellEarnedPeople = data[data["salary"] == ">50K"]

print(wellEarnedPeople.shape)

print(wellEarnedPeople["hours-per-week"].dtype)

print(wellEarnedPeople["hours-per-week"].unique())

print(wellEarnedPeople["hours-per-week"].min())

# Count how many high earners work only the minimum number of hours per week
print(wellEarnedPeople[wellEarnedPeople["hours-per-week"] == 1])
print((wellEarnedPeople[wellEarnedPeople["hours-per-week"] == 1].shape)[0])


(7841, 15)
int64
[45 50 40 80 60 38 55 70 48 42  1 36 35 75 54  6 65 25 16 58 30 47 26 59
 24 56 44 99 43 52 46 20 72  7 90  5 84 18 10  8 68 33 49 12 15 98  2 32
 39 41 37 57 97 14 85 29 64 63 67 27 21 17 13 53 76 78  4 28 51 62 61 22
 34 96 66 95  9  3 73 89]
1
       age  workclass  fnlwgt  education  education-num      marital-status  \
189     58  State-gov  109567  Doctorate             16  Married-civ-spouse   
20072   65          ?   76043    HS-grad              9  Married-civ-spouse   

           occupation relationship   race   sex  capital-gain  capital-loss  \
189    Prof-specialty      Husband  White  Male             0             0   
20072               ?      Husband  White  Male             0             0   

       hours-per-week native-country salary  
189                 1  United-States   >50K  
20072               1  United-States   >50K  
2


In [48]:
print(wellEarnedPeople[wellEarnedPeople["fnlwgt"] == 109567])

print(wellEarnedPeople["fnlwgt"].dtype)

print(len(data[data["hours-per-week"] == 1]))

wellEarnedPeople[wellEarnedPeople["hours-per-week"] == 1]

     age  workclass  fnlwgt  education  education-num      marital-status  \
189   58  State-gov  109567  Doctorate             16  Married-civ-spouse   

         occupation relationship   race   sex  capital-gain  capital-loss  \
189  Prof-specialty      Husband  White  Male             0             0   

     hours-per-week native-country salary  
189               1  United-States   >50K  
int64
20


Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
189,58,State-gov,109567,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,1,United-States,>50K
20072,65,?,76043,HS-grad,9,Married-civ-spouse,?,Husband,White,Male,0,0,1,United-States,>50K


There are 20 people in the dataset who work only 1 hour per week.

Of those, only 2 earn more than 50K per year (the rows with `fnlwgt` values `109567` and `76043`), so the percentage is (2 / 20) * 100 = 10%.
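
The same result can be computed without hard-coding the minimum or the counts. A minimal sketch on a hypothetical sample (the rows are made up; the column names match the dataset):

```python
import pandas as pd

# Hypothetical sample: four people work the minimum of 1 hour, one of them earns >50K.
sample = pd.DataFrame({
    "hours-per-week": [1, 40, 1, 60, 1, 1, 35],
    "salary": [">50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

# Smallest number of weekly hours in the data.
min_hours = sample["hours-per-week"].min()

# Everyone who works exactly that many hours.
min_workers = sample[sample["hours-per-week"] == min_hours]

# Share of those people who earn more than 50K, as a percentage.
rich_pct = (min_workers["salary"] == ">50K").mean() * 100

print(min_hours, len(min_workers), rich_pct)
```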

In [14]:
# Finding which country has the highest percentage of people that earn more than $50K and its percentage

# List the distinct countries in the data
dist_countries = data["native-country"].unique()

print(dist_countries)

# Group rows by country and salary bracket to see the high earners in each country
# ern_each_country = data.groupby("native-country")[["salary"]]

# index = pd.MultiIndex.from_arrays(list(dist_countries), names=("Country"))

earn_each_country = data.groupby(["native-country", "salary"])

# print(earn_each_country.index)

print("\n\n")
# print(ern_each_country.head())
earn_each_country.head()

['United-States' 'Cuba' 'Jamaica' 'India' '?' 'Mexico' 'South'
 'Puerto-Rico' 'Honduras' 'England' 'Canada' 'Germany' 'Iran'
 'Philippines' 'Italy' 'Poland' 'Columbia' 'Cambodia' 'Thailand' 'Ecuador'
 'Laos' 'Taiwan' 'Haiti' 'Portugal' 'Dominican-Republic' 'El-Salvador'
 'France' 'Guatemala' 'China' 'Japan' 'Yugoslavia' 'Peru'
 'Outlying-US(Guam-USVI-etc)' 'Scotland' 'Trinadad&Tobago' 'Greece'
 'Nicaragua' 'Vietnam' 'Hong' 'Ireland' 'Hungary' 'Holand-Netherlands']





Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,39,State-gov,77516,Bachelors,13,Never-married,Adm-clerical,Not-in-family,White,Male,2174,0,40,United-States,<=50K
1,50,Self-emp-not-inc,83311,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,13,United-States,<=50K
2,38,Private,215646,HS-grad,9,Divorced,Handlers-cleaners,Not-in-family,White,Male,0,0,40,United-States,<=50K
3,53,Private,234721,11th,7,Married-civ-spouse,Handlers-cleaners,Husband,Black,Male,0,0,40,United-States,<=50K
4,28,Private,338409,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,40,Cuba,<=50K
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
31457,40,Private,226608,Some-college,10,Divorced,Tech-support,Not-in-family,White,Male,0,0,30,Guatemala,>50K
31486,41,Private,227968,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,Black,Female,0,0,35,Haiti,>50K
32203,51,Federal-gov,223206,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,15024,0,40,Vietnam,>50K
32214,22,Local-gov,164775,5th-6th,3,Never-married,Handlers-cleaners,Other-relative,White,Male,0,0,40,Guatemala,>50K


In [15]:
m = data[(data["native-country"] == "Vietnam") & (data["salary"] == ">50K")]

print(m)

print("\n\n")

print(len(m))
print((m.shape)[0])

       age    workclass  fnlwgt  education  education-num      marital-status  \
3879    43      Private  128212    5th-6th              3  Married-civ-spouse   
8633    23      Private  106957       11th              7       Never-married   
27002   30      Private   95299    HS-grad              9  Married-civ-spouse   
27214   30      Private  154568  Bachelors             13  Married-civ-spouse   
32203   51  Federal-gov  223206  Doctorate             16  Married-civ-spouse   

              occupation relationship                race     sex  \
3879   Machine-op-inspct         Wife  Asian-Pac-Islander  Female   
8633        Craft-repair    Own-child  Asian-Pac-Islander    Male   
27002  Machine-op-inspct      Husband  Asian-Pac-Islander    Male   
27214       Craft-repair      Husband  Asian-Pac-Islander    Male   
32203     Prof-specialty      Husband  Asian-Pac-Islander    Male   

       capital-gain  capital-loss  hours-per-week native-country salary  
3879              0     

In [21]:
def getSalaryStats(df: pd.DataFrame, country_name: str) -> dict:
    """Count >50K earners and all people for one country, plus the ratio between them."""
    inCountry = df["native-country"] == country_name
    totalHighEarnFolks = (df[inCountry & (df["salary"] == ">50K")].shape)[0]
    totalFolksInCountry = (df[inCountry].shape)[0]
    highEarnedPercentage = totalHighEarnFolks / totalFolksInCountry
    return {
        "Country": country_name,
        "Total high earned folks": totalHighEarnFolks,
        "Total people in the country": totalFolksInCountry,
        "High earned people's percentage": highEarnedPercentage
    }

getSalaryStats(data, "Vietnam")

salaryStats = []
for country in dist_countries:
    statsData = getSalaryStats(data, country)
    salaryStats.append(statsData)

print(salaryStats)

# maxHighEarnPercentageInCountry = -999999
# for each in salaryStats:
#     if each["High earned people's percentage"] > maxHighEarnPercentageInCountry:
#         maxHighEarnPercentageInCountry = each["High earned people's percentage"]

# print(maxHighEarnPercentageInCountry)

def findTheHighestEarnRatioCountry(stats: list) -> dict:
    # Return the entry with the largest high-earner ratio (None if stats is empty)
    return max(stats, key=lambda each: each["High earned people's percentage"], default=None)

print(findTheHighestEarnRatioCountry(salaryStats))

[{'Country': 'United-States', 'Total high earned folks': 7171, 'Total people in the country': 29170, "High earned people's percentage": 0.24583476174151525}, {'Country': 'Cuba', 'Total high earned folks': 25, 'Total people in the country': 95, "High earned people's percentage": 0.2631578947368421}, {'Country': 'Jamaica', 'Total high earned folks': 10, 'Total people in the country': 81, "High earned people's percentage": 0.12345679012345678}, {'Country': 'India', 'Total high earned folks': 40, 'Total people in the country': 100, "High earned people's percentage": 0.4}, {'Country': '?', 'Total high earned folks': 146, 'Total people in the country': 583, "High earned people's percentage": 0.2504288164665523}, {'Country': 'Mexico', 'Total high earned folks': 33, 'Total people in the country': 643, "High earned people's percentage": 0.05132192846034215}, {'Country': 'South', 'Total high earned folks': 16, 'Total people in the country': 80, "High earned people's percentage": 0.2}, {'Country'

In [22]:
print(data[data["native-country"] == "Iran"].info())

print(len(data[(data["native-country"] == "Iran") & (data["salary"] == ">50K")]))

print(18 / 43)

data[data["native-country"] == "Iran"]

<class 'pandas.core.frame.DataFrame'>
Index: 43 entries, 135 to 31985
Data columns (total 15 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   age             43 non-null     int64 
 1   workclass       43 non-null     object
 2   fnlwgt          43 non-null     int64 
 3   education       43 non-null     object
 4   education-num   43 non-null     int64 
 5   marital-status  43 non-null     object
 6   occupation      43 non-null     object
 7   relationship    43 non-null     object
 8   race            43 non-null     object
 9   sex             43 non-null     object
 10  capital-gain    43 non-null     int64 
 11  capital-loss    43 non-null     int64 
 12  hours-per-week  43 non-null     int64 
 13  native-country  43 non-null     object
 14  salary          43 non-null     object
dtypes: int64(6), object(9)
memory usage: 5.4+ KB
None
18
0.4186046511627907


Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
135,38,Federal-gov,125933,Masters,14,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,40,Iran,>50K
237,41,Private,180138,Bachelors,13,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,50,Iran,>50K
348,56,Private,101128,Assoc-acdm,12,Married-spouse-absent,Other-service,Not-in-family,White,Male,0,0,25,Iran,<=50K
1054,38,Private,193026,Some-college,10,Divorced,Craft-repair,Not-in-family,White,Male,0,0,40,Iran,<=50K
1387,40,Self-emp-not-inc,266324,Some-college,10,Divorced,Exec-managerial,Other-relative,White,Male,0,1564,70,Iran,>50K
2652,34,Private,213226,Bachelors,13,Married-civ-spouse,Prof-specialty,Husband,White,Male,0,0,40,Iran,>50K
3389,51,Private,293196,Doctorate,16,Married-civ-spouse,Prof-specialty,Husband,White,Male,15024,0,60,Iran,>50K
3390,45,Private,214627,Doctorate,16,Widowed,Prof-specialty,Unmarried,White,Male,15020,0,40,Iran,>50K
4145,36,Private,95654,Bachelors,13,Married-civ-spouse,Prof-specialty,Wife,White,Female,0,0,50,Iran,>50K
4369,22,?,211968,Some-college,10,Never-married,?,Own-child,White,Female,0,0,40,Iran,<=50K


`Iran` has the highest percentage of people earning >50K: 18 of the 43 people recorded for that country, or `41.86%`.
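
The per-country loop can also be expressed as a single group-by over a boolean mask. This is an alternative sketch, not the notebook's method, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample; country labels are borrowed from the dataset, the rows are made up.
sample = pd.DataFrame({
    "native-country": ["Iran", "Iran", "Iran", "India", "India", "Mexico", "Mexico"],
    "salary": [">50K", ">50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

# True where the person earns more than 50K.
is_high = sample["salary"] == ">50K"

# Mean of the mask per country = share of high earners, scaled to a percentage.
share_by_country = is_high.groupby(sample["native-country"]).mean() * 100

top_country = share_by_country.idxmax()       # index label of the largest share
top_pct = round(share_by_country.max(), 1)    # the largest share itself

print(top_country, top_pct)
```

Countries with very few rows can rank first on a ratio like this, so it helps to look at the group sizes alongside the percentages.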

In [38]:
# Identify the most popular occupation for those who earn >50K in India.
peopleWhoEarnedMoreInIndia = data[(data["native-country"] == "India") & (data["salary"] == ">50K")]

print(peopleWhoEarnedMoreInIndia.head())

jobs_freq = peopleWhoEarnedMoreInIndia["occupation"].value_counts()

print("\n\n")
print(jobs_freq.sort_values())
print(jobs_freq.max())


      age  workclass  fnlwgt    education  education-num      marital-status  \
11     30  State-gov  141297    Bachelors             13  Married-civ-spouse   
968    48    Private  164966    Bachelors             13  Married-civ-spouse   
1327   52    Private  168381      HS-grad              9             Widowed   
7258   42  State-gov  102343  Prof-school             15  Married-civ-spouse   
7285   54  State-gov   93449      Masters             14  Married-civ-spouse   

           occupation relationship                race     sex  capital-gain  \
11     Prof-specialty      Husband  Asian-Pac-Islander    Male             0   
968   Exec-managerial      Husband  Asian-Pac-Islander    Male             0   
1327    Other-service    Unmarried  Asian-Pac-Islander  Female             0   
7258   Prof-specialty      Husband  Asian-Pac-Islander    Male             0   
7285   Prof-specialty      Husband  Asian-Pac-Islander    Male             0   

      capital-loss  hours-per-week nat

The most popular occupation for those who earn >50K in India is `Prof-specialty`
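
Note that `value_counts().max()` returns the highest count, not the occupation name; `idxmax()` returns the label. A short sketch on a hypothetical sample that reuses the dataset's column names:

```python
import pandas as pd

# Hypothetical rows; only the column names and labels mirror the dataset.
sample = pd.DataFrame({
    "native-country": ["India", "India", "India", "India", "Cuba"],
    "salary": [">50K", ">50K", ">50K", "<=50K", ">50K"],
    "occupation": ["Prof-specialty", "Exec-managerial", "Prof-specialty", "Sales", "Sales"],
})

# High earners whose native country is India.
high_in_india = sample[
    (sample["native-country"] == "India") & (sample["salary"] == ">50K")
]

# idxmax() on the frequency table gives the most common occupation label.
top_occupation = high_in_india["occupation"].value_counts().idxmax()

print(top_occupation)
```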