# Introduction
The goal of this analysis is to investigate the relationship between education and income in the United States in 2021, focusing on how this relationship varies across race and gender. Education and income are key indicators of socioeconomic status, and understanding the patterns and disparities in their relationship can inform policy decisions aimed at reducing inequality. From a human-centered perspective, the analysis can show how different groups are affected by socioeconomic inequality and can inform interventions aimed at reducing these disparities.

# Data selected for analysis:
The dataset used for this analysis is the 2021 American Community Survey (ACS) 1-year Public Use Microdata Sample (PUMS), published by the US Census Bureau. It contains detailed records on individuals and households, including educational attainment, income, race, gender, age, and other demographic characteristics. The dataset is publicly available at https://www.census.gov/programs-surveys/acs/microdata/access.html. As a product of the US federal government, the data is in the public domain and may be freely used, reproduced, and distributed.

The ACS PUMS dataset is well suited to the problem statement because its large sample is designed to be representative of the US population when the survey's person weights are applied, which allows for meaningful analysis. One possible ethical consideration is that the dataset includes sensitive personal information. To address this, the Census Bureau removes names and addresses from the records, limits geographic detail to Public Use Microdata Areas (PUMAs) of at least 100,000 people, and applies disclosure-avoidance measures such as top-coding high incomes.

In the data analysis, I will use measures such as mean, median, and standard deviation to describe the distribution of income and education by race and gender. I will use correlation analysis and linear regression to identify the association between education and income for each demographic group.

# Background and Related Work
Previous research has provided valuable insights into the relationship between education and income in the United States, as well as the disparities that exist across race and gender. Several studies have examined the impact of education on income, highlighting the importance of educational attainment in socioeconomic outcomes.

For instance, a study conducted by Carneiro and Heckman (2003)[1] found that education significantly affects individuals' earnings and long-term outcomes, such as employment rates and job stability. The study emphasized the positive relationship between educational attainment and income, suggesting that higher levels of education are associated with higher earnings.

Moreover, research on the gender wage gap has demonstrated persistent disparities in income between men and women. According to Blau and Kahn (2017)[2], even after accounting for factors such as education, experience, and occupation, a gender wage gap still exists, indicating that women, on average, earn less than men.

Furthermore, racial disparities in income have been extensively studied. A study by Chetty et al. (2018)[3] examined intergenerational income mobility across racial and ethnic groups in the United States. The research revealed substantial variation in upward mobility, with individuals from certain racial backgrounds experiencing lower levels of upward mobility compared to others.

The existing research informs my decision to do this project by highlighting the significance of investigating the relationship between education and income across race and gender. The findings of previous studies provide a foundation for my research questions and hypotheses, allowing me to explore specific dimensions of educational attainment and income disparities. By building upon the existing knowledge, I aim to contribute to the understanding of socioeconomic inequalities and inform potential interventions to reduce disparities.

# Research Questions
1. How does the relationship between education and income vary among different racial groups in the United States? Are there significant disparities in income levels based on educational attainment within each racial group?
2. Is there a gender-based wage gap in the United States, and does this gap differ across educational levels? How does the relationship between education and income differ between males and females?
3. Does the impact of educational attainment on income differ between racial groups? For example, do individuals from certain racial backgrounds experience a larger income increase for each level of education compared to others?

# Hypothesis
H1: Individuals from racial groups with historically lower average incomes, such as African Americans and Hispanics, will face income disparities relative to other racial groups even when they have similar educational attainment.

H2: There is a gender-based wage gap in the United States, and the disparity increases as educational levels decrease.

H3: The impact of educational attainment on income will vary across racial groups, with some groups experiencing a larger income increase for each level of education compared to others.

# Methodology
## Analytical Methods:
I will use statistical measures such as mean, median, and standard deviation to examine the distribution of education levels and income across different races and genders.
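A minimal sketch of the income side of these descriptive statistics, using only Python's standard library. It assumes person-level PUMS records loaded as dictionaries keyed by the ACS variable names; the records are invented for illustration, and the function name `describe_income` is hypothetical. It also reports a weighted mean using the ACS person weight `PWGTP`, since unweighted PUMS statistics are not population estimates; the aggregate tables used later in this project do not carry that variable.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical person-level records using ACS PUMS variable names.
# The values are invented for illustration only.
records = [
    {"RAC1P": 1, "SEX": 1, "SCHL": 21, "PINCP": 72000, "PWGTP": 80},
    {"RAC1P": 1, "SEX": 2, "SCHL": 21, "PINCP": 61000, "PWGTP": 95},
    {"RAC1P": 1, "SEX": 1, "SCHL": 16, "PINCP": 41000, "PWGTP": 60},
    {"RAC1P": 2, "SEX": 2, "SCHL": 16, "PINCP": 30000, "PWGTP": 110},
    {"RAC1P": 2, "SEX": 1, "SCHL": 22, "PINCP": 68000, "PWGTP": 70},
    {"RAC1P": 2, "SEX": 2, "SCHL": 21, "PINCP": 52000, "PWGTP": 90},
]


def describe_income(rows, keys=("RAC1P", "SEX")):
    """Summarize PINCP (personal income) within each group defined by `keys`."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r)

    summary = {}
    for group, members in sorted(groups.items()):
        incomes = [m["PINCP"] for m in members]
        weights = [m["PWGTP"] for m in members]
        summary[group] = {
            "n": len(incomes),
            "mean": mean(incomes),
            "median": median(incomes),
            # The sample standard deviation needs at least two observations.
            "stdev": stdev(incomes) if len(incomes) > 1 else None,
            # Weighted mean: each record stands for PWGTP people.
            "weighted_mean": sum(i * w for i, w in zip(incomes, weights)) / sum(weights),
        }
    return summary


for group, stats in describe_income(records).items():
    print(group, stats)
```

Passing `keys=("RAC1P",)` or `keys=("SEX",)` gives the race-only or gender-only breakdowns.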

To investigate the relationship between education and income, I will use inferential statistics, such as correlation analysis and linear regression. Correlation analysis will help me understand the strength and direction of the relationship between these variables. Linear regression will allow me to estimate the association between education and income while controlling for race and gender.

I will also conduct a subgroup analysis to explore how the education-income relationship varies across racial and gender groups. This will involve running separate analyses for each racial and gender group to understand how the relationship manifests within each subgroup.

## Presentation Method:
To present my findings, I will use a combination of data visualizations, model output interpretations, and tables. Data visualizations, such as bar charts and line graphs, will be used to illustrate the distribution of educational attainment and income across race and gender. I may use heatmaps or stacked bar charts to showcase variations in the education-income relationship among subgroups.

For the results of the inferential analyses, I will present the output of regression models, including coefficients and statistical significance. I will interpret these coefficients to provide insights into the associations between education and income, considering the influence of race and gender. Additionally, I will present tables summarizing key findings and statistical measures.
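As one example of such a summary table for the second research question, the sketch below computes the ratio of female to male median income at each `SCHL` level; under H2, the ratio would be lowest at the lowest education levels. It assumes person-level records with `SEX` coded 1 for male and 2 for female, as in the ACS PUMS. The records and the function name `gender_ratio_by_education` are hypothetical, and the medians are unweighted.

```python
from collections import defaultdict
from statistics import median

# Hypothetical person-level records (values invented for illustration).
records = [
    {"SEX": 1, "SCHL": 16, "PINCP": 40000},
    {"SEX": 1, "SCHL": 16, "PINCP": 36000},
    {"SEX": 2, "SCHL": 16, "PINCP": 27000},
    {"SEX": 2, "SCHL": 16, "PINCP": 29000},
    {"SEX": 1, "SCHL": 21, "PINCP": 70000},
    {"SEX": 2, "SCHL": 21, "PINCP": 59500},
    {"SEX": 2, "SCHL": 24, "PINCP": 95000},  # no male record at this level
]


def gender_ratio_by_education(rows):
    """Return {SCHL code: female median PINCP / male median PINCP}."""
    incomes = defaultdict(list)
    for r in rows:
        incomes[(r["SCHL"], r["SEX"])].append(r["PINCP"])

    table = {}
    for schl in sorted({schl for schl, _ in incomes}):
        male, female = incomes.get((schl, 1)), incomes.get((schl, 2))
        if male and female:  # skip levels missing either group
            table[schl] = median(female) / median(male)
    return table


print("SCHL  F/M median income")
for schl, ratio in gender_ratio_by_education(records).items():
    print(f"{schl:>4}  {ratio:.2f}")
```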

## Appropriateness and Usefulness of Methods:
The chosen analytical methods align with my research questions by allowing me to investigate the relationship between education and income across race and gender. Descriptive statistics will provide a clear overview of the distribution of educational attainment and income, highlighting potential disparities. Correlation analysis and linear regression will enable me to examine the strength and direction of the association between education and income.

Data visualizations will enhance the presentation of my findings by offering visual representations of the data. Model output interpretation and coefficient analysis will provide a deeper understanding of the relationships between variables and help answer the research questions more comprehensively. Tables will be employed to present concise summaries of the results, facilitating easy comparison and reference.

The combination of these methods will provide a comprehensive exploration of the relationship between education and income across race and gender. The visualizations, model output interpretation, and tables will aid in presenting the findings in a clear and concise manner, enhancing the overall understanding of the research outcomes.

## Unknowns and dependencies:
One factor that might affect my ability to complete this project before the end of the quarter is the complexity of the analysis required to fully understand the relationship between education and income across different demographic groups. In addition, cleaning and preprocessing the data may be time-consuming, and missing or incomplete data could affect the accuracy of the results.

# Process
## Downloading Datasets
All datasets were downloaded from https://data.census.gov/mdat/#/search?ds=ACSPUMS1Y2021. Because the full dataset is too large to work with, I used the US Census Bureau's official online Microdata Access Tool (MDAT) to create subsets that include only the variables relevant to my research questions: 'RAC1P' (race), 'SEX' (gender), 'SCHL' (educational attainment), and 'PINCP' (income). By placing different combinations of these variables on the rows, columns, and data cells of a table, I generated eight tables of aggregate values, which I used to explore the relationships among the four variables. I downloaded each table in table view (.CSV) and imported it into its own sheet in Google Sheets. The Google Sheets document containing all my datasets and data visualizations can be found here: https://docs.google.com/spreadsheets/d/1YuMvDZsYcTpA_iy6qM3s5g0YD4QwpMxsRR_9JI_Zv9c/edit?usp=sharing.
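For work with these variables outside the MDAT interface, the mapping below decodes the 2021 ACS PUMS codes for `RAC1P`, `SEX`, and the upper range of `SCHL` (code 15 and above). The labels follow the ACS PUMS data dictionary, with label 5 abbreviated, and should be checked against it before use; the helper name `label` is hypothetical. Note that `RAC1P` does not record Hispanic origin, which the ACS captures in a separate variable (`HISP`).

```python
# Code labels from the 2021 ACS PUMS data dictionary (label 5 abbreviated; verify before use).
RAC1P = {
    1: "White alone",
    2: "Black or African American alone",
    3: "American Indian alone",
    4: "Alaska Native alone",
    5: "American Indian and Alaska Native tribes specified, or not specified",
    6: "Asian alone",
    7: "Native Hawaiian and Other Pacific Islander alone",
    8: "Some Other Race alone",
    9: "Two or More Races",
}
SEX = {1: "Male", 2: "Female"}
SCHL = {
    15: "12th grade, no diploma",
    16: "Regular high school diploma",
    17: "GED or alternative credential",
    18: "Some college, less than 1 year",
    19: "1 or more years of college credit, no degree",
    20: "Associate's degree",
    21: "Bachelor's degree",
    22: "Master's degree",
    23: "Professional degree beyond a bachelor's degree",
    24: "Doctorate degree",
}
CODEBOOKS = {"RAC1P": RAC1P, "SEX": SEX, "SCHL": SCHL}


def label(variable, code):
    """Return the text label for a PUMS code, or the code itself if unmapped."""
    # Codes may arrive as strings from a CSV file, so normalize to int first.
    return CODEBOOKS[variable].get(int(code), str(code))


print(label("RAC1P", "2"), "|", label("SEX", 2), "|", label("SCHL", "21"))
```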

## Data Cleaning
For each table involving educational attainment, I removed the data for all grades below grade 12 to focus my research on adult populations. I also removed the total rows and columns that several tables contained.
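The same cleaning could also be scripted. The sketch below assumes a hypothetical layout for an exported table: a header row of column labels and a first column holding the `SCHL` code or the word `Total`. The actual MDAT export may use text labels instead, in which case the checks would need adjusting. Reading "below grade 12" literally, it keeps `SCHL` code 15 (12th grade, no diploma) and above. The function name `clean_table` and the sample table are invented for illustration.

```python
import csv
import io

TOTAL_LABEL = "Total"  # assumed label of total rows/columns in the export
MIN_SCHL = 15          # SCHL 15 = 12th grade, no diploma; lower codes are earlier grades


def clean_table(csv_text):
    """Drop total rows/columns and rows for SCHL codes below MIN_SCHL."""
    header, *body = list(csv.reader(io.StringIO(csv_text)))
    keep = [i for i, name in enumerate(header) if name.strip() != TOTAL_LABEL]

    cleaned = [[header[i] for i in keep]]
    for row in body:
        if not row:  # skip blank lines
            continue
        code = row[0].strip()
        if code == TOTAL_LABEL:
            continue
        if code.isdigit() and int(code) < MIN_SCHL:
            continue
        cleaned.append([row[i] for i in keep])
    return cleaned


# Invented sample table in the assumed layout.
sample = "SCHL,Male,Female,Total\n14,10,12,22\n15,5,6,11\n21,30,28,58\nTotal,45,46,91\n"
for row in clean_table(sample):
    print(row)
```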

## Organizing the Data
I organized the tables onto different sheets based on their combinations of variables and what research question they relate to. The tables were organized into the following sheets:
- Education vs Income, Race vs Income, Gender vs Income (Overall income trends)
- Education vs Race & Gender (Q1)
- Income vs Race & Gender (Q1)
- Income vs Gender & Education (Q2)
- Education & Income vs Race & Gender (Q2)
- Income & Education vs Race (Q3)

# Findings
## Exploring overall income trends
![Chart of education vs income](dataviz/Education_vs_Income.png)
![Chart of race vs income](dataviz/Race_vs_Income.png)
![Chart of gender vs income](dataviz/Gender_vs_Income.png)

## Subgroup analysis
### Education vs Race & Gender
![Chart of education vs race](dataviz/Education_vs_Race.png)
![Chart of education vs gender](dataviz/Educational_vs_Gender.png)
![Chart of education vs gender & race](dataviz/Education_vs_Gender_&_Race.png)

### Income vs Race & Gender
![Chart of income vs race & gender](dataviz/Income_vs_Race_&_Gender.png)

### Income vs Gender & Education 
![Chart of income vs gender & education](dataviz/Income_vs_Gender_&_Education.png)

### Education & Income vs Race & Gender
![Chart of education & income vs race & gender](dataviz/Education_&_Income_vs_Race_&_Gender.png)

### Income & Education vs Race
![Chart of income & education vs race](dataviz/Income_&_Education_vs_Race.png)

## Statistical measures

# Discussion
## Limitations and Implications

# Conclusion

# References
[1] Carneiro, P., & Heckman, J. J. (2003). Human capital policy (NBER Working Paper No. 9495). National Bureau of Economic Research. https://www.nber.org/papers/w9495

[2] Blau, F. D., & Kahn, L. M. (2017). The gender wage gap: Extent, trends, and explanations. Journal of Economic Literature, 55(3), 789-865. doi: 10.1257/jel.20160995

[3] Chetty, R., Hendren, N., Jones, M. R., & Porter, S. R. (2018). Race and economic opportunity in the United States: An intergenerational perspective. The Quarterly Journal of Economics, 133(2), 697-747. doi: 10.1093/qje/qjy004