<div align="center"><h1> Demystifying Income Disparities: A Comprehensive Analysis of Factors Shaping Income Levels and Inequality </h1></div>

<div align="center"><b> Data605 Final project by Xinzheng Tang </b></div>

Factors such as age, education level, occupation, and work hours per week can significantly impact the likelihood of earning more than $50K per year. Understanding these income groups and their defining features can provide valuable insights for policymakers and individuals seeking to address income inequality and improve economic opportunities.

Income inequality is a prevalent issue in the United States, with various factors contributing to the disparities in earnings among different individuals. In this data story, we explore the Census Income dataset, extracted from the 1994 U.S. Census database, which provides information on various demographic, educational, and occupational factors of individuals. Our goal is to analyze the relationships between these factors and the likelihood of earning more than $50K per year.

To achieve this goal, we have structured our data story around six clear and explicit analysis questions. For each question, we describe the analysis process and present our findings using visualizations and statistical results. Our analysis will help shed light on the key factors influencing income levels and provide insights for policymakers and individuals seeking to address income inequality and improve economic opportunities in the United States.



## Question 1: Unlocking the Earning Potential: The Role of Education

*Does higher education truly lead to higher incomes? Let's delve into the correlation between education levels and the likelihood of earning more than $50K per year.*

To investigate the relationship between education levels and earning potential, we grouped individuals in the Census Income data by education level and calculated the percentage of people within each group earning more than $50K per year. To make the data easier to read, we mapped each education level to its numerical order and sorted the income percentages accordingly. The results were then visualized in a bar chart.
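
The grouping and ordering steps described above can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`education`, `education_num`, `income`), the `>50K` label, and the toy rows are all assumptions made for the example.

```python
import pandas as pd

# Toy rows for illustration only; these are not values from the real dataset.
df = pd.DataFrame({
    "education": ["HS-grad", "HS-grad", "Bachelors", "Bachelors", "Doctorate", "11th"],
    "education_num": [9, 9, 13, 13, 16, 7],
    "income": ["<=50K", ">50K", ">50K", "<=50K", ">50K", "<=50K"],
})


def pct_high_income_by_education(data: pd.DataFrame) -> pd.Series:
    """Percentage of rows labelled '>50K' per education level, lowest level first."""
    high = data["income"].eq(">50K")
    # The mean of a boolean column is the share of True values in each group.
    pct = high.groupby(data["education"]).mean().mul(100)
    # Look up the numeric rank of each education label so the bars sort meaningfully.
    order = data.drop_duplicates("education").set_index("education")["education_num"]
    return pct.loc[order.sort_values().index]


result = pct_high_income_by_education(df)
print(result)
```

Sorting by the numeric rank rather than alphabetically keeps adjacent bars comparable, which is what makes the upward trend visible in a chart.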


<p align="center">
  <img src="question 1.png" alt="Question 1 Visualization">
</p>
<div align="center"><b> Figure 1 Percentage of Individuals Earning >50K by Education Level </b></div>


Our findings paint a clear picture: there is a strong correlation between education level and earning potential. As the level of education increases, so does the likelihood of earning more than $50K per year. This trend suggests that investing in higher education can significantly improve an individual's chances of earning a higher income.

The gap between the extremes is large. Individuals with a Doctorate or Professional school degree have a significantly higher likelihood of earning more than $50K per year (over 50%), whereas individuals with less than a high school education have a much lower likelihood (less than 5%).

However, education is not the only factor. Occupation, work hours per week, and native country may also influence income levels, so it's crucial to weigh them alongside education when making decisions about education and career paths.


## Question 2: Climbing the Occupational Ladder: How Job Roles Impact Income Levels

*Are some jobs more financially rewarding than others? Let's identify specific occupations with a higher proportion of individuals earning more than $50K per year.*

To answer this intriguing question, we delved into the Census Income data, grouping individuals by their occupation. We then calculated the percentage of individuals within each occupation earning more than $50K per year and visualized the results in a bar chart.
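
One way to compute such per-occupation shares is a row-normalised cross-tabulation. The sketch below assumes columns named `occupation` and `income` and uses made-up rows; it illustrates the technique rather than reproducing the original analysis.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "occupation": ["Exec-managerial", "Exec-managerial", "Handlers-cleaners",
                   "Handlers-cleaners", "Prof-specialty", "Prof-specialty"],
    "income": [">50K", "<=50K", "<=50K", "<=50K", ">50K", ">50K"],
})

# normalize="index" makes each occupation's row sum to 1 across the income labels.
shares = pd.crosstab(df["occupation"], df["income"], normalize="index")

# Keep the '>50K' column, convert to percent, and rank occupations from highest to lowest.
occupation_pct = shares[">50K"].mul(100).sort_values(ascending=False)
print(occupation_pct)
```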


<p align="center">
  <img src="question 2.png" alt="Question 2 Visualization">
</p>
<div align="center"><b> Figure 2 Percentage of Individuals Earning >50K by Occupation </b></div>

Our analysis reveals a significant difference in income levels across occupations. Occupations such as "Exec-managerial" and "Prof-specialty" have a higher percentage of individuals earning more than $50K per year (over 30%), which suggests that individuals in these roles are more likely to earn higher incomes.

On the other hand, occupations such as "Handlers-cleaners," "Priv-house-serv," and "Farming-fishing" have a lower percentage of individuals earning more than $50K per year (below 10%). Individuals in these roles are less likely to earn higher incomes.

The chart also highlights that income levels can vary significantly across different occupations, indicating that an individual's occupation plays a crucial role in determining their income level. These findings can be invaluable for individuals contemplating various career paths and for policymakers striving to address income inequality across different sectors.



## Question 3: The Age-Income Connection: Exploring the Link Between Age and Earnings

*Does age play a role in determining income levels? Let's uncover if older individuals tend to earn more than younger individuals or if there's a specific age range where income peaks.*

Our findings unveil that the relationship between age and income levels follows a non-linear trend. To better understand this relationship, we first analyzed the age group distribution in the Census Income dataset. We found that the distribution is skewed towards younger age groups, with a higher frequency of individuals in their 20s and 30s.

<p align="center">
  <img src="question 3.png" alt="Question 3 Visualization">
</p>
<div align="center"><b> Figure 3 Income Levels Based on Age and Age Group Distribution </b></div>

Next, we explored the percentage of individuals earning more than $50K per year across different age groups. Our results show that income levels tend to rise with age during the early 20s to mid-50s, peak around the mid-50s, and then decline as individuals approach retirement age. This pattern suggests that income levels generally increase as individuals gain more experience and advance in their careers, but they may decline as people reduce their working hours or retire altogether.
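
The age-group comparison can be sketched by binning ages and taking the share of high earners in each bin. The ten-year bin edges, the column names, and the toy rows below are assumptions for illustration; the original analysis may have used different groupings.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "age": [22, 27, 34, 38, 45, 47, 52, 55, 63, 70],
    "income": ["<=50K", "<=50K", "<=50K", ">50K", ">50K",
               "<=50K", ">50K", ">50K", "<=50K", "<=50K"],
})

# Ten-year bins are an assumption; right=False gives intervals [20, 30), [30, 40), ...
bins = [20, 30, 40, 50, 60, 70, 80]
labels = ["20-29", "30-39", "40-49", "50-59", "60-69", "70-79"]
df["age_group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)

# Share of '>50K' rows per age group, in percent.
age_pct = (
    df["income"].eq(">50K")
    .groupby(df["age_group"], observed=False)
    .mean()
    .mul(100)
)

# Label of the age group with the highest share.
peak_group = age_pct.idxmax()
print(age_pct)
print("Peak group:", peak_group)
```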

Understanding the relationship between age and income levels can help inform policy decisions related to employment, retirement, and social welfare. For instance, policies that encourage older workers to stay in the workforce longer or provide training opportunities for mid-career professionals could help maintain higher income levels for a more extended period. On the other hand, policies that support younger workers in building their skills and advancing in their careers could help them achieve higher incomes earlier in their working lives.


## Question 4: Gender, Race, and the Income Gap: Exploring the Influence of Socio-Demographic Factors on Earnings

*Do gender and race contribute to income inequality? Let's delve into how these factors influence the probability of earning more than $50K per year.*

We began our analysis by examining the Census Income data, considering both gender and race categories. Our goal was to uncover any disparities that might exist between different groups when it comes to earning more than $50K per year. To visualize our findings, we created a combined bar plot that showcased the income percentages across various gender and racial groups.
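
A two-way breakdown like this can be built by grouping on both attributes and pivoting one of them into columns. The sketch assumes columns named `race`, `sex`, and `income`; the rows are toy data and do not reflect the dataset's actual percentages.

```python
import pandas as pd

# Toy rows for illustration only.
df = pd.DataFrame({
    "race": ["White", "White", "White", "White", "Black", "Black", "Black", "Black"],
    "sex": ["Male", "Male", "Female", "Female", "Male", "Male", "Female", "Female"],
    "income": [">50K", "<=50K", "<=50K", "<=50K", ">50K", "<=50K", "<=50K", "<=50K"],
})

high = df["income"].eq(">50K")

# Rows are racial groups, columns are genders, cells are percentages of '>50K' earners.
table = high.groupby([df["race"], df["sex"]]).mean().mul(100).unstack("sex")

# Percentage-point gap between males and females within each racial group.
table["gap_male_minus_female"] = table["Male"] - table["Female"]
print(table)
```

A table in this shape maps directly onto a grouped bar plot, with one cluster of bars per racial group.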

<p align="center">
  <img src="question 4 a.png" alt="Question 4 a Visualization">
</p>
<div align="center"><b> Figure 4a Income Levels Based on Gender and Race </b></div>

Our findings reveal a striking income gap between genders within each racial group. Males consistently have a higher percentage of individuals earning more than $50K per year compared to females. This disparity highlights the influence of gender on income levels and calls attention to the need for policies that address gender-based income inequality.

Furthermore, our analysis shows that the Asian-Pac-Islander and White racial groups have a higher percentage of individuals earning more than $50K per year compared to other racial groups, such as Black, Amer-Indian-Eskimo, and Other. This difference suggests that race also plays a role in determining the likelihood of earning a higher income.

<p align="center">
  <img src="question 4 b.png" alt="Question 4 b Visualization">
</p>
<div align="center"><b> Figure 4b Patterns of Work Hours, Education Levels, and Capital Gains </b></div>

The earnings gap between females and males could be attributed to several factors, including longer work hours and higher education levels among males compared to females. Additionally, potential sexism in the workplace and women's greater involvement in family responsibilities may also contribute to this disparity. Interestingly, our findings show no significant difference in capital gains between females and males, suggesting that investment income is unlikely to explain this earnings gap.

Understanding the impact of gender and race on income levels can help policymakers and businesses develop targeted strategies to address income inequality and promote equal opportunities for all. For instance, policies promoting equal pay for equal work, diversity and inclusion initiatives, and skills training for underrepresented groups could help bridge the income gap and create a more equitable workforce.


## Question 5: The Native Country Factor: Examining the Impact of Origin on Income Levels

*Do individuals from certain countries have a higher likelihood of earning more than $50K per year? Let's explore the potential reasons for these differences.*

To explore the influence of native countries on income levels, we analyzed the Census Income data and grouped individuals by their country of origin. We then calculated the percentage of individuals within each native country group earning more than $50K per year. To make our findings more accessible, we visualized the data in a bar chart.

<p align="center">
  <img src="question 5 a.png" alt="Question 5 a Visualization">
</p>
<div align="center"><b> Figure 5a Percentage of Individuals Earning >50K by Native Country </b></div>

Our analysis reveals that the likelihood of earning more than $50K per year varies significantly across different native countries. Individuals from countries such as Iran, Taiwan, and France have a higher percentage of earners above $50K per year, while those from countries like Vietnam, Mexico, and the Dominican Republic have a lower percentage. This suggests that factors such as cultural background, access to education, and economic opportunities in the native country may play a role in determining an individual's earning potential in the United States.

<p align="center">
  <img src="question 5 b.png" alt="Question 5 b Visualization">
</p>
<div align="center"><b> Figure 5b Percentage of Individuals Earning >50K by Continent </b></div>

People whose native countries are in Asia (AS), Europe (EU), and North America (NA) have a higher average percentage of individuals earning more than $50K per year. This could be due to the presence of more developed countries in Europe and North America, such as the USA and Canada, which typically have higher income levels. In contrast, Central America (CA) and South America (SA) have a lower average percentage of individuals earning more than $50K per year. This could be due to the presence of more developing countries in these regions, which typically have lower income levels.

<p align="center">
  <img src="question 5 c.png" alt="Question 5 c Visualization">
</p>
<div align="center"><b> Figure 5c Work Hours, Education Level, and Earning More Than $50K by Continent </b></div>

For individuals from Asian countries, the patterns of work hours, education level, and earnings suggest that higher education levels (an average above 11) and longer working hours (about 42 hours per week) contribute to this group having the highest percentage earning more than $50K. The pattern is similar for individuals from European countries. In contrast, people from Central America have a relatively lower education level (about 8.5) and shorter working hours (less than 39 hours per week), which may lower their earning potential. The situation is similar for people from South America, though slightly better than for Central America. Notably, people from North America have a lower education level (about 8.5) but longer working hours (about 41 hours per week); Mexico's lower level of development pulls down North America's education and income averages.
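
The continent-level comparison can be sketched by mapping each native country to a continent code and aggregating several columns at once. The `CONTINENT` lookup below is a hypothetical, partial mapping, the column names are assumptions, and the rows are toy data; here the percentage is pooled over individuals, which may differ from how the original averages were computed.

```python
import pandas as pd

# Hypothetical, partial country-to-continent lookup using the codes from the text.
CONTINENT = {
    "India": "AS", "Taiwan": "AS",
    "France": "EU", "Germany": "EU",
    "Mexico": "NA", "Canada": "NA",
    "Guatemala": "CA",
    "Peru": "SA",
}

# Toy rows for illustration only.
df = pd.DataFrame({
    "native_country": ["India", "Taiwan", "France", "Mexico",
                       "Mexico", "Guatemala", "Peru", "Canada"],
    "education_num": [13, 14, 12, 6, 8, 7, 10, 12],
    "hours_per_week": [45, 40, 40, 40, 44, 38, 40, 40],
    "income": [">50K", ">50K", "<=50K", "<=50K", "<=50K", "<=50K", "<=50K", ">50K"],
})

df["continent"] = df["native_country"].map(CONTINENT)
df["high_income"] = df["income"].eq(">50K")

# Named aggregation: one output column per (input column, function) pair.
summary = df.groupby("continent").agg(
    pct_high_income=("high_income", lambda s: 100 * s.mean()),
    mean_education_num=("education_num", "mean"),
    mean_hours=("hours_per_week", "mean"),
)
print(summary)
```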

<p align="center">
  <img src="question 5 d.png" alt="Question 5 d Visualization">
</p>
<div align="center"><b> Figure 5d Distribution of Individuals Earning >50K by Developing and Developed Countries </b></div>

Developed countries have a higher median percentage of individuals earning more than $50K per year than developing countries. This is consistent with our previous findings from the continent-based analysis. The distributions within both developing and developed countries vary, with developed countries having a wider range of income percentages than developing countries.
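
The median and spread being compared here are the quantities a box plot summarises. As a rough sketch, assuming a per-country table with a hypothetical `status` label and made-up percentages (none of these numbers come from the dataset):

```python
import pandas as pd

# Made-up per-country percentages and a hypothetical development label, for illustration only.
country_pct = pd.DataFrame({
    "native_country": ["France", "Canada", "Japan", "Mexico", "Vietnam", "Peru"],
    "status": ["developed", "developed", "developed",
               "developing", "developing", "developing"],
    "pct_high_income": [40.0, 32.0, 36.0, 5.0, 8.0, 6.0],
})

# Median and range (max minus min) of the per-country percentages within each group.
stats = country_pct.groupby("status")["pct_high_income"].agg(
    median="median",
    value_range=lambda s: s.max() - s.min(),
)
print(stats)
```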

It's worth noting that our analysis doesn't delve into the reasons behind these disparities, but it does provide insights into how native country may influence income levels. Policymakers, educators, and employers can use these insights to develop targeted strategies aimed at addressing income inequality and promoting economic opportunities for individuals from various backgrounds.

For instance, policies focused on providing equal access to education and skill-building opportunities for immigrants from underrepresented countries could help bridge the income gap. Additionally, diversity and inclusion initiatives that promote cultural understanding and create a more welcoming environment for individuals from different backgrounds can further contribute to a more equitable workforce.


## Question 6: Uncovering Distinct Income Groups: Using Clustering and Classification Techniques to Analyze the Interplay of Age, Education, Occupation, and Work Hours

*Can we identify distinct income groups based on a combination of features such as age, education, occupation, and work hours per week? What are the key characteristics that define these groups, and how do they differ in terms of the likelihood of earning more than $50K per year?*

To answer this intriguing question, we employed clustering and classification techniques on the Census Income data, focusing on the features of age, education, occupation, and work hours per week. Our goal was to identify distinct income groups and understand the key characteristics that define them, as well as how these groups differ in their likelihood of earning more than $50K per year.

<p align="center">
  <img src="question 6.png" alt="Question 6 Visualization">
</p>
<div align="center"><b> Figure 6 Clustering Results Visualized with PCA </b></div>

After applying clustering algorithms, we discovered several distinct income groups, each with unique characteristics. The key features defining these groups included:

1. Age: Younger individuals were more likely to be in lower-income groups, while older individuals were more prevalent in higher-income groups.
2. Education: Higher levels of education were associated with a greater likelihood of belonging to a higher-income group.
3. Occupation: Certain occupations, such as "Exec-managerial" and "Prof-specialty", were more prevalent in higher-income groups, while others, like "Handlers-cleaners" and "Farming-fishing", were more common in lower-income groups.
4. Work Hours per Week: Individuals working more hours per week were more likely to belong to higher-income groups, while those working fewer hours were more common in lower-income groups.
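
The clustering step and the PCA projection shown in Figure 6 could look roughly like the sketch below. The source does not name the clustering algorithm or the number of clusters, so k-means with two clusters is used here purely as a stand-in; the column names and the synthetic rows are likewise assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic rows for illustration only: two loosely separated groups of workers.
rng = np.random.default_rng(0)
n = 50
younger = pd.DataFrame({
    "age": rng.normal(25, 3, n),
    "education_num": rng.normal(9, 1, n),
    "hours_per_week": rng.normal(35, 4, n),
    "occupation": rng.choice(["Handlers-cleaners", "Farming-fishing"], n),
})
older = pd.DataFrame({
    "age": rng.normal(48, 3, n),
    "education_num": rng.normal(14, 1, n),
    "hours_per_week": rng.normal(46, 4, n),
    "occupation": rng.choice(["Exec-managerial", "Prof-specialty"], n),
})
df = pd.concat([younger, older], ignore_index=True)

# One-hot encode occupation, then standardise so no single feature dominates the distances.
features = pd.get_dummies(df, columns=["occupation"], dtype=float)
X = StandardScaler().fit_transform(features)

# k=2 is an assumption for this toy data, not a value taken from the source.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Project onto two principal components, e.g. for a scatter plot coloured by cluster.
coords = PCA(n_components=2).fit_transform(X)

# Profile each cluster by its mean numeric features.
profile = (
    df.assign(cluster=labels)
    .groupby("cluster")[["age", "education_num", "hours_per_week"]]
    .mean()
)
print(profile)
```

Comparing the per-cluster means in `profile`, together with the share of high earners in each cluster on real data, is one way to arrive at group descriptions like the four listed above.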

By understanding the characteristics that define these distinct income groups, we can gain valuable insights into the factors that influence an individual's likelihood of earning more than $50K per year. These insights can be used by individuals to make informed decisions about their career paths, as well as by policymakers and employers to develop targeted strategies aimed at promoting economic opportunities and addressing income inequality. For instance, policies that encourage skill development, improve access to education, and promote diversity in higher-paying occupations could help individuals from lower-income groups transition to higher-income groups, ultimately reducing income inequality.


## Conclusion and Recommendations

Our data-driven exploration of the Census Income dataset has uncovered valuable insights into the factors that influence income levels and contribute to income inequality. By examining the interplay of age, education, occupation, work hours, gender, race, and native country, we have identified key characteristics that define distinct income groups and affect an individual's likelihood of earning more than $50K per year.

Based on our findings, we offer the following practical recommendations to individuals, policymakers, and employers:

1. **Invest in education and skill development**: Pursuing higher education and acquiring relevant skills can significantly increase an individual's earning potential. Policymakers and employers should support initiatives that provide equal access to education and skill-building opportunities, especially for underrepresented groups.

2. **Promote diversity and inclusion in the workplace**: Addressing gender and racial disparities in income levels requires concerted efforts from both policymakers and employers. Implementing policies that ensure equal pay for equal work and fostering a diverse and inclusive work environment can help bridge the income gap.

3. **Encourage lifelong learning and career advancement**: As age and work experience play a crucial role in determining income levels, individuals should seek opportunities for continuous learning and professional growth. Employers and policymakers can support this by offering training programs and incentives for career development, particularly for mid-career professionals.

4. **Support work-life balance**: While working longer hours may lead to higher incomes, it's essential to consider the impact of work hours on overall well-being. Employers should encourage a healthy work-life balance by providing flexible work arrangements and promoting a supportive work culture.

5. **Address income disparities based on native country**: Policymakers and educators should focus on providing equal opportunities for immigrants from underrepresented countries, helping them bridge the income gap through targeted education, skill development, and cultural integration initiatives.

By implementing these recommendations, we can work towards creating a more equitable workforce and fostering equal opportunities for all, regardless of age, education, occupation, work hours, gender, race, or native country. Through a collaborative effort among individuals, policymakers, and employers, we can address income inequality and promote economic growth for everyone.



## Limitations

While our analysis has provided valuable insights into the factors that influence income levels and contribute to income inequality, it's important to acknowledge the limitations of our study:

1. **Dataset limitations**: The Census Income dataset, though comprehensive, might not capture all relevant factors that influence income levels. Additionally, the data is limited to the United States and may not be directly applicable to other countries.

2. **Causality**: Our study identified correlations between various factors and income levels, but it's essential to note that correlation does not imply causation. Further research is needed to establish causal relationships between these factors and income levels.

3. **Model assumptions**: The clustering and classification techniques used in our analysis rely on certain assumptions and simplifications, which may not always hold true in real-world scenarios. The results should be interpreted with caution, keeping these assumptions in mind.

4. **Generalizability**: Our findings are based on a specific dataset, and the results may not be generalizable to all populations or time periods. Future studies could benefit from analyzing more recent data or exploring income disparities in different countries or regions.

**Acknowledgements**

We would like to express our gratitude to the creators of the Census Income dataset for providing a rich source of data that enabled us to conduct this study. We also thank our peers and colleagues for their valuable feedback and support throughout the analysis process. Finally, we acknowledge the broader community of data scientists and researchers, whose work has laid the foundation for our study and continues to inspire us to explore and uncover new insights from data.
