# Introduction
The goal of this analysis is to investigate the relationship between education and income in the United States in 2021, focusing on how this relationship varies across race and gender. Education and income are key indicators of socioeconomic status, and understanding the patterns and disparities in their relationship can inform policy decisions aimed at reducing inequality. From a human-centered perspective, the analysis is useful because it shows how different groups are affected by socioeconomic inequality, which can guide interventions targeted at those disparities.

# Data selected for analysis:
The dataset used for this analysis is the American Community Survey (ACS) Public Use Microdata Sample (PUMS) data for the year of 2021, which is available on the US Census Bureau website. This dataset includes detailed information on individuals and households, including educational attainment, income, race, gender, age, and other demographic characteristics. The dataset is publicly available and can be accessed at the following link: https://www.census.gov/programs-surveys/acs/microdata/access.html. As a work of the US federal government, the data is in the public domain in the United States, which allows for free use, reproduction, and distribution of the data.

The ACS PUMS dataset is useful in addressing the problem statement because it is representative of the United States population, and includes a large sample size that allows for meaningful analysis. However, an ethical consideration in using this dataset is that it includes sensitive personal information. To address this concern, the Census Bureau anonymizes the household and individual records before releasing them, so the public files do not identify individual respondents.

In the data analysis, I will use measures such as mean, median, and standard deviation to describe the distribution of income and education by race and gender. I will use correlation analysis and linear regression to identify the association between education and income for each demographic group.
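As a rough illustration of the descriptive measures mentioned above, the following sketch computes the mean, median, and sample standard deviation of income for each race and gender combination. It uses only the Python standard library. The records are made-up placeholder values, not ACS figures, and the `describe_by_group` helper is a hypothetical name introduced here for illustration.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Illustrative records only (NOT real ACS values): (race, sex, income in dollars).
records = [
    ("White", "Male", 60000),
    ("White", "Male", 40000),
    ("White", "Female", 45000),
    ("White", "Female", 35000),
    ("Asian", "Male", 70000),
    ("Asian", "Male", 50000),
]

def describe_by_group(rows):
    """Return {(race, sex): (mean, median, sample standard deviation)} of income."""
    groups = defaultdict(list)
    for race, sex, income in rows:
        groups[(race, sex)].append(income)
    summary = {}
    for key, incomes in groups.items():
        # stdev needs at least two observations; report None otherwise.
        spread = stdev(incomes) if len(incomes) > 1 else None
        summary[key] = (mean(incomes), median(incomes), spread)
    return summary

summary = describe_by_group(records)
for key, stats in sorted(summary.items()):
    print(key, stats)
```

This sketch treats every record equally. An analysis of the real PUMS files would likely need to apply the survey's person weights before reporting population-level figures.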

# Background and Related Work
Previous research has provided valuable insights into the relationship between education and income in the United States, as well as the disparities that exist across race and gender. Several studies have examined the impact of education on income, highlighting the importance of educational attainment in socioeconomic outcomes.

For instance, a study conducted by Carneiro and Heckman (2003)[1] found that education significantly affects individuals' earnings and long-term outcomes, such as employment rates and job stability. The study emphasized the positive relationship between educational attainment and income, suggesting that higher levels of education are associated with higher earnings.

Moreover, research on the gender wage gap has demonstrated persistent disparities in income between men and women. According to Blau and Kahn (2017)[2], even after accounting for factors such as education, experience, and occupation, a gender wage gap still exists, indicating that women, on average, earn less than men.

Furthermore, racial disparities in income have been extensively studied. A study by Chetty et al. (2018)[3] examined intergenerational income mobility across racial and ethnic groups in the United States. The research revealed substantial variation in upward mobility, with individuals from certain racial backgrounds experiencing lower levels of upward mobility compared to others.

The existing research informs my decision to do this project by highlighting the significance of investigating the relationship between education and income across race and gender. The findings of previous studies provide a foundation for my research questions and hypotheses, allowing me to explore specific dimensions of educational attainment and income disparities. By building upon the existing knowledge, I aim to contribute to the understanding of socioeconomic inequalities and inform potential interventions to reduce disparities.

# Research Questions
1. How does the relationship between education and income vary among different racial groups in the United States? Are there significant disparities in income levels based on educational attainment within each racial group?
2. Is there a gender-based wage gap in the United States, and does this gap differ across educational levels? How does the relationship between education and income differ between males and females?
3. Does the impact of educational attainment on income differ between racial groups? For example, do individuals from certain racial backgrounds experience a larger income increase for each level of education compared to others?

# Hypotheses
H1: Individuals from racial groups with historically lower average incomes, such as African Americans and Hispanics, will face income disparities even when comparing individuals with similar educational attainment within their respective racial groups.

H2: There is a gender-based wage gap in the United States, and the disparity increases as educational levels decrease.

H3: The impact of educational attainment on income will vary across racial groups, with some groups experiencing a larger income increase for each level of education compared to others.

# Methodology
## Analytical Method:
I will conduct subgroup analysis to explore variations in the education-income relationship within demographic groups of race and gender. This will involve running separate analyses for different racial and gender groups to understand how the relationship manifests within each subgroup.
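One way to picture running a separate analysis per subgroup is sketched below: for each group, it computes the Pearson correlation and the least-squares slope of income on education. This is a minimal sketch under assumptions, not the procedure used in this report. In particular, treating education as an evenly spaced ordinal level (1, 2, 3, ...) is an assumption, the group names and numbers are placeholders, and the helper names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def pearson_and_slope(xs, ys):
    """Return (Pearson r, least-squares slope of y on x) for paired samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # Correlation and slope are undefined when either variable is constant.
        raise ValueError("education and income must both vary within the group")
    return sxy / sqrt(sxx * syy), sxy / sxx

def fit_by_group(rows):
    """Fit each subgroup separately; rows are (group, education_level, income)."""
    grouped = defaultdict(lambda: ([], []))
    for group, level, income in rows:
        grouped[group][0].append(level)
        grouped[group][1].append(income)
    return {group: pearson_and_slope(xs, ys) for group, (xs, ys) in grouped.items()}

# Illustrative rows only (NOT real ACS values); education level is an assumed
# ordinal coding such as 1 = high school, 2 = bachelor's, 3 = master's.
rows = [
    ("Group A", 1, 30000), ("Group A", 2, 40000), ("Group A", 3, 50000),
    ("Group B", 1, 30000), ("Group B", 2, 35000), ("Group B", 3, 40000),
]

fits = fit_by_group(rows)
for group, (r, slope) in sorted(fits.items()):
    print(f"{group}: r = {r:.2f}, income change per education level = {slope:.0f}")
```

Comparing the fitted slopes across groups gives one possible reading of "a larger income increase for each level of education", although the result depends on how the education levels are coded.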

## Presentation Method:
To present my findings, I will use data visualizations, such as bar charts and line graphs, to illustrate the distribution of educational attainment and income across race and gender. I will also use stacked bar charts to showcase variations in the education-income relationship among subgroups.

## Appropriateness and Usefulness of Methods:
Subgroup analysis aligns with my research questions by allowing me to investigate the relationship between education and income across race and gender. Data visualizations will enhance the presentation of my findings by offering visual representations of the data. The combination of these methods will provide a comprehensive exploration of the relationship between education and income across race and gender, and will aid in presenting the findings in a clear and concise manner, enhancing the overall understanding of the research outcomes.

## Unknowns and dependencies:
One factor outside of my control that might impact my ability to complete this project before the end of the quarter is the complexity of the analysis required to fully understand the relationship between education and income across different demographic groups. Additionally, the data cleaning and preprocessing required to analyze the data might take a lot of time, and there might be missing or incomplete data that could impact the accuracy of the results.

# Process
## Downloading Datasets
All datasets were downloaded from https://data.census.gov/mdat/#/search?ds=ACSPUMS1Y2021. Since the original dataset is too large to work with, I used the US Census Bureau's official online CSV formatting tool for creating subsets of the dataset that only include variables that concern my research questions. These variables are 'RAC1P' (race), 'SEX' (gender), 'SCHL' (educational attainment), and 'PINCP' (income). By placing different combinations of these variables on rows (x-axis), columns (y-axis), and values in data cells, I generated eight different tables of aggregate values which I used to explore the relationships between the four variables. I downloaded each dataset in table view (.CSV) and imported them onto individual pages on Google Sheets. The Google Sheets document containing all my datasets and data visualizations can be found here: https://docs.google.com/spreadsheets/d/1YuMvDZsYcTpA_iy6qM3s5g0YD4QwpMxsRR_9JI_Zv9c/edit?usp=sharing.
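The tables in this report were produced with the Census Bureau's online tool, not with code. For readers who prefer to script the same kind of table, the sketch below shows one possible equivalent: it cross-tabulates mean 'PINCP' by two chosen variables from person-level rows. The CSV text is a tiny made-up extract, not real data, and `mean_income_table` is a hypothetical helper. The coded values in the sample should be checked against the PUMS data dictionary before interpreting them.

```python
import csv
import io
from collections import defaultdict

# Tiny made-up extract in the shape of a PUMS person file (values are NOT real).
SAMPLE_CSV = """RAC1P,SEX,SCHL,PINCP
1,1,21,60000
1,2,21,50000
1,1,16,30000
6,2,21,55000
6,2,21,65000
"""

def mean_income_table(csv_text, row_var, col_var):
    """Return {(row value, column value): mean PINCP} for the two chosen variables."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in csv.DictReader(io.StringIO(csv_text)):
        key = (rec[row_var], rec[col_var])
        totals[key] += float(rec["PINCP"])
        counts[key] += 1
    # Unweighted mean per cell; a real estimate would apply the person weights.
    return {key: totals[key] / counts[key] for key in totals}

table = mean_income_table(SAMPLE_CSV, "RAC1P", "SEX")
for (race, sex), avg in sorted(table.items()):
    print(f"RAC1P={race} SEX={sex}: mean PINCP = {avg:.0f}")
```

Swapping the `row_var` and `col_var` arguments (for example `"SCHL"` and `"SEX"`) yields the other variable combinations described above.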

## Data Cleaning
For each table that concerned educational attainment, I removed the data for all grades below grade 12 to focus my research on adult populations. Several tables contained columns and rows for totals which I also removed.
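The cleaning described above was done by hand in the spreadsheet. A scripted version of the same two steps might look like the following sketch, which drops rows for grades below grade 12 and removes total rows and columns. The row labels, the `"Total"` label, and the `clean_table` helper are assumptions for illustration; the labels exported by the Census tool may be worded differently.

```python
# Hypothetical row labels; the labels exported by the Census tool may differ.
BELOW_GRADE_12 = {"No schooling completed", "Grade 9", "Grade 10", "Grade 11"}
TOTAL_LABEL = "Total"

def clean_table(rows, label_key="Education"):
    """Drop below-grade-12 rows, total rows, and the total column."""
    cleaned = []
    for row in rows:
        label = row[label_key]
        if label in BELOW_GRADE_12 or label == TOTAL_LABEL:
            continue
        # Keep every column except the total column.
        cleaned.append({k: v for k, v in row.items() if k != TOTAL_LABEL})
    return cleaned

# Made-up counts for illustration only (NOT real ACS values).
raw = [
    {"Education": "Grade 10", "Male": 10, "Female": 12, "Total": 22},
    {"Education": "Regular high school diploma", "Male": 50, "Female": 48, "Total": 98},
    {"Education": "Bachelor's degree", "Male": 30, "Female": 33, "Total": 63},
    {"Education": "Total", "Male": 90, "Female": 93, "Total": 183},
]

cleaned = clean_table(raw)
for row in cleaned:
    print(row)
```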

## Organizing the Data
I organized the tables onto different sheets based on their combinations of variables and what research question they relate to. The tables were organized into the following sheets:
- Education vs Income, Race vs Income, Gender vs Income (Overall income trends)
- Education vs Race & Gender (Q1)
- Income vs Race & Gender (Q1)
- Income vs Gender & Education (Q2)
- Education & Income vs Race & Gender (Q2)
- Income & Education vs Race (Q3)

# Findings
## Exploring overall income trends
![Chart of education vs income](dataviz/Education_vs_Income.png)

This bar chart was created by placing educational attainment on the x-axis and income on the y-axis. Based on this chart, we can see that people with higher educational attainment tend to have higher incomes, and most people have at least a bachelor's degree.

![Chart of race vs income](dataviz/Race_vs_Income.png)

This bar chart was created by placing race codes on the x-axis and income on the y-axis. From this chart, we can see that White and Asian subgroups receive about double the income of any other race.

![Chart of gender vs income](dataviz/Gender_vs_Income.png)

This bar chart was created by placing gender on the x-axis and income on the y-axis. From this chart, we can see that males earn almost twice as much as females. This suggests a gender-based wage gap in the United States, which the subgroup analysis below examines across education levels and races.

## Subgroup analysis
### Education vs Race & Gender
![Chart of education vs race](dataviz/Education_vs_Race.png)

This stacked bar chart was created by placing education on the x-axis, percentage of individuals on the y-axis, and racial groups in the stacked bars to show the distribution of educational attainment within each racial group. This chart shows that the White racial group has the most representation across all levels of educational attainment, especially in higher education levels. Similarly, the Asian racial group has most of its population holding a bachelor's degree or higher. For the Black, other, and two or more races groups, the population distribution is skewed to the left, with most people holding educational attainments below a bachelor's degree.

![Chart of education vs gender](dataviz/Educational_vs_Gender.png)

This grouped bar chart was created by placing education on the x-axis, number of people on the y-axis, and splitting the grouped bars by gender. This chart, which pools all racial groups, reveals that the numbers of males and females in each education bracket are fairly similar, with slightly more females receiving associate's, bachelor's, and master's degrees.

![Chart of education vs gender & race](dataviz/Education_vs_Gender_&_Race.png)

The eight grouped bar charts above were created by placing education on the x-axis, number of people on the y-axis, and splitting the grouped bars by gender for every racial group. Creating individual charts for each racial group allows us to take a closer look at how the gender gap in education varies across racial groups, as well as how educational attainment is distributed within each racial group.

In the White racial group, we can see that most people hold high school diplomas or bachelor's degrees and the proportion of males to females within each education bracket is fairly even, again with slightly more females holding some college, associate's, bachelor's, and master's degrees. By comparing the magnitude of the y-axis scale to those of other racial groups, we can also see that the White racial group has more people in every education bracket.

Trends in the two or more races subgroup are similar to those in the White racial group, where most of the population holds a high school diploma or bachelor's degree, although the proportion of bachelor's degree holders to high school diploma holders is half that of the White racial group.

In the Asian subgroup, the most common attainments are a bachelor's degree, master's degree, and high school diploma, in that order. The notable trend here is that while the most common attainment in all other subgroups is a high school diploma, in the Asian subgroup it is a bachelor's degree. This subgroup tends to skew towards attaining higher levels of education, and females are more represented than males across all education brackets except at the doctorate level.

In the Black racial group, most of the population holds high school diplomas. Like the White and two or more races subgroups, the next most populated education brackets are bachelor's and associate's degrees, although they are smaller in magnitude. Although more Black males than Black females hold high school diplomas, females are more represented than males at higher levels of education, especially at the master's level, where twice as many females as males hold the degree.

The Native Hawaiian and Pacific Islander, some other race, and American Indian subgroups show similar trends: most of the population holds high school diplomas, followed by bachelor's, associate's, some college, and GED. Compared to other racial groups, these groups have less of their population in education brackets beyond high school. By gender, more males than females hold high school diplomas, but females outnumber males in education brackets beyond some college. Compared to the aforementioned racial groups, these groups also have far fewer people holding master's degrees and beyond.

Finally, in the Alaska Native subgroup, most of the population also holds high school diplomas, with more males than females in this bracket. The notable difference between this group and the other racial groups is that little of the population holds an educational attainment beyond some college. However, far more females than males hold bachelor's and master's degrees, while more males than females hold high school diplomas or no high school diploma.

### Income vs Race & Gender
![Chart of income vs race & gender](dataviz/Income_vs_Race_&_Gender.png)

The grouped bar chart above was created by placing race on the x-axis, income on the y-axis, and splitting the grouped bars by gender. This chart reveals that across all races, males earn higher incomes than females. The difference is especially shocking in the White and Asian racial groups, where males earn almost double what their female counterparts earn.

### Income vs Gender & Education 
![Chart of income vs gender & education](dataviz/Income_vs_Gender_&_Education.png)

This line chart was created by placing education on the x-axis, income on the y-axis, and having one line for males and one line for females, showing the average income for each gender at each educational level. Consistent with the previous finding, this chart shows that males earn more than females even when they hold the same level of education. The gap widens as educational attainment increases, with the biggest disparities at the professional, doctorate, and master's levels.
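How large the gap looks at each education level depends on how it is measured. The sketch below computes both the absolute difference and the female-to-male income ratio for each level. The income figures are made-up placeholders, not values read from the chart, and `wage_gap_by_level` is a hypothetical helper introduced here.

```python
def wage_gap_by_level(male_income, female_income):
    """Return {level: {"difference": male - female, "ratio": female / male}}."""
    gaps = {}
    for level, male in male_income.items():
        if level not in female_income:
            continue  # Skip levels that are missing for one gender.
        female = female_income[level]
        gaps[level] = {"difference": male - female, "ratio": female / male}
    return gaps

# Made-up average incomes for illustration only (NOT real ACS values).
male_income = {"High school diploma": 40000, "Bachelor's degree": 80000}
female_income = {"High school diploma": 30000, "Bachelor's degree": 60000}

gaps = wage_gap_by_level(male_income, female_income)
for level, gap in gaps.items():
    print(f"{level}: difference = {gap['difference']}, ratio = {gap['ratio']:.2f}")
```

In this placeholder example the dollar difference doubles between the two levels while the ratio stays at 0.75, so a gap that widens in dollar terms does not necessarily widen in relative terms. Reporting both measures makes the comparison against H2 clearer.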

### Education & Income vs Race & Gender
![Chart of education & income vs race & gender](dataviz/Education_&_Income_vs_Race_&_Gender.png)

This line chart was created by placing education on the x-axis, income on the y-axis, and having one line for each race and gender combination. Despite the many lines on this chart, we can see that income disparity increases as the level of educational attainment increases. Similar to the previous finding, the biggest disparities exist at the professional, doctorate, and master's levels. The most shocking difference here is that Asian males are earning 8 times more than Alaska Native females despite both holding professional degrees. Other high earners are White males and males of two or more races, while low earners are Black females and Native Hawaiian females, despite holding the same levels of education.

### Income & Education vs Race
![Chart of income & education vs race](dataviz/Income_&_Education_vs_Race.png)

Lastly, this line chart was created by placing education on the x-axis, income on the y-axis, and having one line for each race. This chart also reveals that income disparity increases at higher levels of education, with the biggest disparities at the professional and doctorate levels. Trend lines for the White and Asian subgroups show that they receive the highest income at each educational bracket, while people from the American Indian, other race, and Black subgroups consistently receive the lowest incomes.

# Discussion

The findings of this analysis provide insights into the relationship between education and income in the United States, with a focus on how this relationship varies across race and gender. The following discussion will address the limitations of the study and the implications of the findings.

## Limitations
1. **Data limitations**: The analysis is based on the American Community Survey (ACS) Public Use Microdata Sample (PUMS) data for the year of 2021. While the dataset is representative of the United States population, it may not capture all nuances and variations within different racial and gender groups. The dataset also relies on self-reported data, which may introduce biases and inaccuracies.

2. **Sample size**: Although the ACS PUMS dataset is large, the analysis focuses on specific subgroups, such as different racial and gender groups. The sample sizes within these subgroups may vary, leading to potential limitations in generalizing the findings to the entire population.

3. **Missing data**: There might be missing or incomplete data, which could impact the accuracy of the results. The exclusion of certain variables or cases due to missing data may limit the comprehensiveness of the analysis.

## Implications
1. **Income disparities within racial groups**: The findings reveal that even within racial groups, income disparities exist based on educational attainment. This suggests that historical factors and systemic inequalities may continue to influence income levels, perpetuating socioeconomic disparities. Policymakers should consider targeted interventions to address these disparities and provide equal opportunities for individuals within racial groups.

2. **Gender-based wage gap**: The analysis confirms the presence of a gender-based wage gap in the United States across all education levels and races. This finding highlights the need for policies and initiatives aimed at reducing gender-based income disparities. Efforts should focus on promoting equal pay for equal work, addressing workplace discrimination, and supporting women's career advancement.

3. **Variations across racial groups**: The analysis shows that the impact of educational attainment on income varies across racial groups. Some racial groups experience a larger income increase for each level of education compared to others. Understanding these variations can inform targeted interventions and educational initiatives to promote equal economic opportunities for all racial groups.

4. **Educational attainment distribution**: The subgroup analysis provides insights into the distribution of educational attainment within different racial and gender groups. Identifying educational gaps can help policymakers and educators design targeted programs to improve educational outcomes and increase access to higher education, particularly for underrepresented groups.

5. **Intersectionality**: The analysis examines the intersection of race and gender, highlighting the unique experiences and disparities faced by individuals who belong to multiple marginalized groups. Policymakers and researchers should consider the intersectionality of race and gender when addressing socioeconomic inequalities to ensure inclusive and equitable policies and interventions.

# Conclusion
The goal of this research was to investigate the relationship between education and income in the United States, with a focus on how this relationship varies across race and gender. By analyzing the American Community Survey (ACS) Public Use Microdata Sample (PUMS) data for the year 2021, several key findings have emerged.

Firstly, the analysis of overall income trends revealed a clear positive relationship between educational attainment and income. Individuals with higher levels of education tend to have higher incomes. Furthermore, it was observed that the White and Asian racial groups had higher incomes compared to other racial groups, while males earned significantly more than females across all education levels and races. These findings suggest the presence of income disparities based on both education and demographic factors.

Subgroup analysis provided deeper insights into the education-income relationship within specific racial and gender groups. The distribution of educational attainment varied across racial groups, with the White and Asian groups having higher proportions of individuals with bachelor's degrees or higher. In terms of gender, there were slight differences in educational attainment, with slightly more females holding associate's, bachelor's, and master's degrees compared to males.

The analysis of income by race and gender confirmed the presence of a gender-based wage gap, with males consistently earning higher incomes than females across all racial groups. The income disparities were particularly pronounced in the White and Asian racial groups.

The relationship between education and income within each racial group and gender category also showed variations. While higher education generally corresponded to higher incomes, the magnitude of income increase varied across racial groups. Some racial groups experienced a larger income increase for each level of education compared to others.

This research has provided valuable insights into the relationship between education and income in the United States, considering the variations across race and gender. The findings confirm the existence of income disparities based on education, race, and gender, with certain racial and gender groups facing greater disadvantages. These disparities highlight the importance of addressing socioeconomic inequality and developing targeted interventions to reduce the gaps. The results of this research can inform policy decisions aimed at promoting equal opportunities and reducing inequalities in education and income. Efforts should focus on eliminating barriers to education and economic opportunities, promoting pay equity, and addressing systemic factors that contribute to income disparities. By addressing these challenges, policymakers can work towards creating a more inclusive and equitable society, where individuals have equal opportunities for upward mobility and economic well-being.

# References
[1] Carneiro, P., & Heckman, J. J. (2003). Human Capital Policy. National Bureau of Economic Research. Retrieved from https://www.nber.org/papers/w9495

[2] Blau, F. D., & Kahn, L. M. (2017). The gender wage gap: Extent, trends, and explanations. Journal of Economic Literature, 55(3), 789-865. doi: 10.1257/jel.20160995

[3] Chetty, R., Hendren, N., Jones, M. R., & Porter, S. R. (2018). Race and economic opportunity in the United States: An intergenerational perspective. The Quarterly Journal of Economics, 133(2), 697-747. doi: 10.1093/qje/qjy004