In this project, I will take on the role of a Data Analyst at the Census Bureau, which collects census data and creates interesting visualizations and insights from it. I will clean and organize the data. The first visualization I will make is a scatterplot of average income in each state against the proportion of women in that state. From there, I will make histograms and bar graphs of the race data grouped by state.
I will be using glob to combine the files, regex to clean and replace data in columns, fillna and drop_duplicates to handle missing and duplicate data, and matplotlib to plot the cleaned data as visualizations.
The data has been provided and organized in 10 files named states[0-9].
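As a rough sketch of the combining step, the files can be matched with a glob pattern and stacked into one DataFrame. The `.csv` extension, the `load_states` helper, and the name `us_census` are assumptions for illustration; the demo writes two tiny made-up files to a temporary folder so it runs without the real data.

```python
import glob
import os
import tempfile

import pandas as pd


def load_states(pattern):
    """Read every file matching `pattern` and stack the rows into one DataFrame."""
    files = sorted(glob.glob(pattern))
    frames = [pd.read_csv(path) for path in files]
    return pd.concat(frames, ignore_index=True)


# Demo only: two made-up files standing in for the real states[0-9] files.
demo_dir = tempfile.mkdtemp()
for i, name in enumerate(["State A", "State B"]):
    pd.DataFrame({"State": [name], "TotalPop": [1000 + i]}).to_csv(
        os.path.join(demo_dir, f"states{i}.csv"), index=False
    )

# The [0-9] character class matches exactly one digit in the file name.
us_census = load_states(os.path.join(demo_dir, "states[0-9].csv"))
print(us_census)
```

With the real data, the same call would point at the folder that holds the ten files instead of the temporary one.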
Each file contains the following columns:
State: the name of the state
TotalPop: total population of the state, as a whole number
Hispanic: percentage of population with indicated race
White: percentage of population with indicated race
Black: percentage of population with indicated race
Native: percentage of population with indicated race
Asian: percentage of population with indicated race
Pacific: percentage of population with indicated race
Income: average income of the state, in dollar formatting
GenderPop: population by gender, expressed as numberM_numberF
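The Income and GenderPop formats above are the ones that need cleaning before plotting. The following is a minimal sketch of one way to do it, not necessarily the exact steps used later. The sample rows are made up, the exact dollar format (`$50,000.00`) is assumed, and the `Men`, `Women`, and `WomenShare` column names are hypothetical. Filling a missing female count with TotalPop minus the male count is also an assumption about how to treat the gap.

```python
import pandas as pd

# Made-up rows: one duplicate row and one entry with the female count missing.
raw = pd.DataFrame({
    "State": ["State A", "State B", "State B"],
    "TotalPop": [1000, 1000, 1000],
    "Income": ["$50,000.00", "$60,500.50", "$60,500.50"],
    "GenderPop": ["400M_600F", "300M_F", "300M_F"],
})

df = raw.copy()

# Strip the dollar sign and thousands separators, then convert to a number.
df["Income"] = pd.to_numeric(df["Income"].str.replace(r"[$,]", "", regex=True))

# Split numberM_numberF into two columns; the second group is optional,
# so a missing female count comes out as NaN.
gender = df["GenderPop"].str.extract(r"^(\d+)M_(\d+)?F$")
df["Men"] = pd.to_numeric(gender[0], errors="coerce")
df["Women"] = pd.to_numeric(gender[1], errors="coerce")

# Assumption: a missing female count equals total population minus men.
df["Women"] = df["Women"].fillna(df["TotalPop"] - df["Men"])

# Remove rows that are exact repeats of an earlier row.
df = df.drop_duplicates().reset_index(drop=True)

# Proportion of women, the x-axis value for the income scatterplot.
df["WomenShare"] = df["Women"] / df["TotalPop"]
print(df[["State", "Income", "Men", "Women", "WomenShare"]])
```

Filling the missing values before calling drop_duplicates keeps the order of operations simple: rows that were identical in the raw data are still identical after cleaning, so the duplicate is removed either way.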