# Employment data from census FactFinder Advanced Search utility

### Reference:  


# Employment Data for Top 5 Counties with Most Migration from SF County

## Data Workflow

1. Use the graphical interface to select as many locations as you want (by city, county, state, etc.).
Once you have selected all the key locations, you can save the query, which saves time if you plan to run other analyses later on.

** In our case we took the top 5 counties with the most migration from SF County, both within California and in other states.

** The census data turns out to contain a lot of duplication, so it needs a fair amount of data cleaning.

2. Import the csv
*  Method: pd.read_csv

3. Clean up the data
* Reduce the data size and clean up the naming (for easier reference later on)

* df.drop() - Drop all the columns that have no important data
* df.dropna() - Drop rows with NaN
* df.reset_index() - Reset the index because a few rows were dropped
* df[:x] - Keep only the first x rows and drop the rest
* df.rename() - Rename the columns with shorter names so the plots look OK
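
The cleanup calls listed above can be sketched on a small made-up table. The column values, the short names, and the `raw` / `df` variables below are illustrative assumptions, not the actual census export:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the census CSV (values are made up)
raw = pd.DataFrame({
    "GEO.id": ["a", "b", "c", "d"],
    "GEO.display-label": ["County A", "County B", "County C", "County D"],
    "HC01_EST_VC01": ["100", "200", np.nan, "400"],
})

df = raw.drop(columns=["GEO.id"])   # drop a column with no useful data
df = df.dropna()                    # drop rows containing NaN (County C)
df = df.reset_index(drop=True)      # renumber the rows 0..n-1 after the drop
df = df[:2]                         # keep only the first 2 rows
df = df.rename(columns={            # shorter names for plotting
    "GEO.display-label": "County",
    "HC01_EST_VC01": "Units",
})
print(df)
```

Assigning each result back to `df` avoids `inplace=True`, which makes it easier to re-run a single step without side effects.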
    
3.2      Transform the data. Because of the %, $ and , characters, every column in the dataframe is read in as an object
*   Cleaning the entire table in one pass lets the user choose different facts to use later on
     
*    df.set_index  - Move the "Fact" column into the index, so that only the data columns are stripped of these characters
*    df.replace    - Replace the %, $ and , characters in the data with blanks
*    df.apply(pd.to_numeric) - Convert the objects in each column to numerics; "apply" runs the conversion on every column
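
As a rough sketch of the transform step, the snippet below cleans a tiny made-up table. The fact names, county names, and values are hypothetical; only `set_index`, `replace`, and `apply(pd.to_numeric)` come from the workflow above:

```python
import pandas as pd

# Hypothetical table with currency and percent strings (values are made up)
raw = pd.DataFrame({
    "Fact": ["Median income", "Poverty rate"],
    "County A": ["$1,234", "12.5%"],
    "County B": ["$5,678", "7%"],
})

# Move the text labels into the index so they are not touched by the cleaning
df = raw.set_index("Fact")

# Strip $, commas and % from every cell; raw strings keep the regex escape intact
df = df.replace({r"\$": "", ",": "", "%": ""}, regex=True)

# Convert every column to a numeric dtype; anything unparseable becomes NaN
df = df.apply(pd.to_numeric, errors="coerce")
print(df.dtypes)
```

Note that `pd.to_numeric` is passed to `apply` without parentheses, so that `apply` can call it once per column.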

3.3      Set up the plot and draw it
*    df[x:y]        - Select the row or rows of data you need
*    x_axis = np.arange(len(x_data)) - Set the x axis positions
*    median_wage = wages.values[0]   - Set the values, extract if needed
*    plt.bar()      - Plot the bar chart
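
The plotting setup can be sketched as follows, assuming a small numeric table that has already been cleaned. The table contents and the `row` / `values` names are hypothetical; the sketch uses the object-style `ax` interface rather than the `plt.` calls in the cells below:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical cleaned table: one row per fact, one column per county
df = pd.DataFrame(
    {"County A": [100.0, 5.0], "County B": [250.0, 7.5], "County C": [175.0, 6.0]},
    index=["Fact 1", "Fact 2"],
)

row = df[0:1]                      # select the single row to plot
x_data = row.columns.tolist()      # county names for the tick labels
x_axis = np.arange(len(x_data))    # bar positions 0, 1, 2
values = row.values[0]             # first (and only) row as a 1-D array

fig, ax = plt.subplots()
ax.bar(x_axis, values, color="blue", alpha=0.5, align="center")
ax.set_xticks(x_axis)
ax.set_xticklabels(x_data, rotation=35, ha="right")  # rotation is a number, not a string
ax.set_ylim(0, max(values) * 1.1)
ax.set_ylabel("Value")
```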

    

In [1]:
# Dependencies and Setup
%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Files to load
ACS_data = pd.read_csv("2017_S2506_Earning_and_housing.csv")
#non_CA_counties = pd.read_csv("non_CA_counties.csv")



ACS_data.drop(columns = ['GEO.id', 'GEO.id2','HC02_EST_VC01', 'HC02_EST_VC09','HC02_EST_VC28',
                         'HC01_EST_VC31','HC01_EST_VC32','HC01_EST_VC33','HC01_EST_VC34','HC01_EST_VC35',
                         'HC02_EST_VC31','HC02_EST_VC32','HC02_EST_VC33','HC02_EST_VC34','HC02_EST_VC35',
                         'HC02_EST_VC48', 'HC02_EST_VC78'], 
                        inplace=True)
ACS_data

Unnamed: 0,GEO.display-label,HC01_EST_VC01,HC01_EST_VC03,HC02_EST_VC03,HC01_EST_VC04,HC02_EST_VC04,HC01_EST_VC05,HC02_EST_VC05,HC01_EST_VC06,HC02_EST_VC06,...,HC02_EST_VC71,HC01_EST_VC74,HC02_EST_VC74,HC01_EST_VC75,HC02_EST_VC75,HC01_EST_VC76,HC02_EST_VC76,HC01_EST_VC77,HC02_EST_VC77,HC01_EST_VC78
0,Geography,Owner-occupied housing units with a mortgage; ...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...
1,"Alameda County, California",224114,2783,1.2,1402,0.6,15765,7,46577,20.8,...,0.4,3681,1.6,9423,4.2,209077,93.3,1933,0.9,5647
2,"Contra Costa County, California",189407,2472,1.3,1501,0.8,26919,14.2,56410,29.8,...,0.4,3431,1.8,8911,4.7,174901,92.3,2164,1.1,5018
3,"Los Angeles County, California",1103229,17447,1.6,9763,0.9,144831,13.1,382607,34.7,...,0.6,27443,2.5,77947,7.1,983436,89.1,14403,1.3,4095
4,"San Francisco County, California",90019,942,1,340,0.4,1819,2,5249,5.8,...,0.4,1962,2.2,4032,4.5,83203,92.4,822,0.9,7202
5,"San Mateo County, California",111963,1101,1,576,0.5,2250,2,7527,6.7,...,0.4,1368,1.2,4050,3.6,105626,94.3,919,0.8,6795
6,"Santa Clara County, California",256192,2842,1.1,1659,0.6,7873,3.1,27811,10.9,...,0.4,4112,1.6,8587,3.4,240208,93.8,3285,1.3,6893
7,"Cook County, Illinois",735268,21492,2.9,63230,8.6,390553,53.1,160250,21.8,...,0.6,18023,2.5,40264,5.5,668487,90.9,8494,1.2,4997
8,"Kings County, New York",175420,2926,1.7,1731,1,14969,8.5,41142,23.5,...,0.7,11131,6.3,12594,7.2,136860,78,14835,8.5,3894
9,"New York County, New York",93419,975,1,785,0.8,3949,4.2,10968,11.7,...,0.8,3605,3.9,4154,4.4,65611,70.2,20049,21.5,8170


In [None]:

#ACS_data = ACS_data.rename(columns={"HC01_EST_VC01": "# Owner-occupied units with a mortgage",
#                                 "HC01_EST_VC03":"#Units with mortgage <$50,000",
#                                 "HC02_EST_VC03":"%Units with a mortgage <$50,000",
#                                 "HC01_EST_VC04":"#Units Mortgage from $50K-$99K",
#                                "HC02_EST_VC04":"%Units Mortgage value $100K-$199K"
#                                            })

ACS_data.columns = ['#Owner occupied units with a mortgage',
                    '#Units with mortgage <$50K',
                    '%Units with mortgage <$50K',
                    '#Units with mortgage $50K-99K',
                    '%Units with mortgage $50K-99K',
                    '#Units with mortgage $100K-299K',
                    '%Units with mortgage $100K-299K',
                    '# $300-499K','% $300-499K', '# $500-749K', '% $500-749K',
                    '# $750K-$999K',"% $750K-$999K", "# >$1M", "% >$1M",
                    'Mortgage Median Value $',
                    '# Mortgages with either 2nd mortgage or home eq loan, but not both',
                    '% Mortgages with either 2nd mortgage or home eq loan, but not both',
                    '# Mortgages with 2nd mortgage only',
                    '% Mortgages with 2nd mortgage only',
                    '# Mortgages with home eq loan only',
                    '% Mortgages with home eq loan only',
                    '# with both 2nd and home eq loan',
                    '% with both 2nd and home eq loan',
                    '# Mortgages with only 1 loan',
                    '% Mortgages with only 1 loan',
                    '#Household income <$10K', '% income <$10K', '#Household income $10K-$24K', '% income $10K-24K',
                    '#$25K-$34K','%$25K-34K','# $35K-$49K','% $35K-$49K',
                    '#$50K-74K', '%$50K-74K', '#$75K-99K', '%$75K-99K',
                    '#$100K-$149K', '%$100K-149K','# >$150K', '% >$150K',
                    '#Monthly housing costs <$200', '%Monthly housing costs <$200',
                    '# $200-399', '% $200-399', '# $400-599', '% $400-599', '# $600-799', '% $600-799',
                    '# $800-999', '% $800-999', '# $1K-1.49K', '% $1K-1.49K', '# $1.5K-1.99K', '% $1.5K-1.99K',
                    '# $2K-2.49K', "% $2K - 2.49K", "# $2.5K-2.99K", "% $2.5K-2.99K", "# > $3K", "% >3K",
                    'Median monthly cost ($)',
                    '#Income <$20K Total','%Income <20K Total',
                     '# Income <$20K, Monthly cost <20% of income', '% Income <$20K, Monthly cost <20% of income',
                    '# Income <$20K, Monthly cost 20%-29% of income', '% Income <$20K, Monthly cost 20%-29% of income',
                    '# Income <$20K, Monthly cost >30% of income', '% Income <$20K, Monthly cost >30% of income',
                     '#Income $20K-34K Total','%Income 20K-34K Total',
                      '# Income $20-34K, Monthly cost <20% of income', '% Income $20K-34K, Monthly cost <20% of income',
                    '# Income $20K-34, Monthly cost 20%-29% of income', '% Income $20-34K, Monthly cost 20%-29% of income',
                    '# Income $20K-34, Monthly cost >30% of income', '% Income $20K-34K, Monthly cost >30% of income',
                     '#Income $35K-50K Total','%Income 35K-50K Total',
                      '# Income $35-50K, Monthly cost <20% of income', '% Income $35K-50K, Monthly cost <20% of income',
                    '# Income $35K-50K, Monthly cost 20%-29% of income', '% Income $35-50K, Monthly cost 20%-29% of income',
                    '# Income $35K-50K, Monthly cost >30% of income', '% Income $35K-50K, Monthly cost >30% of income',
                        '#Income $50K-74K Total','%Income 50K-74K Total',
                     '# Income $50-75K, Monthly cost <20% of income', '% Income $50K-75K, Monthly cost <20% of income',
                    '# Income $50K-75K, Monthly cost 20%-29% of income', '% Income $50-75K, Monthly cost 20%-29% of income',
                    '# Income $50K-75K, Monthly cost >30% of income', '% Income $50K-75K, Monthly cost >30% of income',
                     '#Income >$75K Total','%Income >$75K Total',
                      '# Income >$75, Monthly cost <20% of income', '% Income >$75K, Monthly cost <20% of income',
                    '# Income >$75K, Monthly cost 20%-29% of income', '% Income >$75K, Monthly cost 20%-29% of income',
                    '# Income >$75K, Monthly cost >30% of income', '% Income >$75K, Monthly cost >30% of income',
                      '#No income Total','%No income Total',
                    '# Real estate tax <$800', '% Real estate tax <$800',
                    '# Real estate tax $800-1500', '% Real estate tax $800-1500',
                    '# Real estate tax >$1500', '% Real estate tax >$1500',
                    '# Real estate tax none', '% Real estate tax none',
                    '# Real estate tax Median ($)'
                    
                   ]




In [2]:
ACS_data.set_index('GEO.display-label',drop=True, inplace=True)
ACS_data.head()

Unnamed: 0_level_0,HC01_EST_VC01,HC01_EST_VC03,HC02_EST_VC03,HC01_EST_VC04,HC02_EST_VC04,HC01_EST_VC05,HC02_EST_VC05,HC01_EST_VC06,HC02_EST_VC06,HC01_EST_VC07,...,HC02_EST_VC71,HC01_EST_VC74,HC02_EST_VC74,HC01_EST_VC75,HC02_EST_VC75,HC01_EST_VC76,HC02_EST_VC76,HC01_EST_VC77,HC02_EST_VC77,HC01_EST_VC78
GEO.display-label,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Geography,Owner-occupied housing units with a mortgage; ...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...,Percent owner-occupied housing units with a mo...,Owner-occupied housing units with a mortgage; ...
"Alameda County, California",224114,2783,1.2,1402,0.6,15765,7,46577,20.8,70199,...,0.4,3681,1.6,9423,4.2,209077,93.3,1933,0.9,5647
"Contra Costa County, California",189407,2472,1.3,1501,0.8,26919,14.2,56410,29.8,42685,...,0.4,3431,1.8,8911,4.7,174901,92.3,2164,1.1,5018
"Los Angeles County, California",1103229,17447,1.6,9763,0.9,144831,13.1,382607,34.7,285254,...,0.6,27443,2.5,77947,7.1,983436,89.1,14403,1.3,4095
"San Francisco County, California",90019,942,1,340,0.4,1819,2,5249,5.8,19632,...,0.4,1962,2.2,4032,4.5,83203,92.4,822,0.9,7202


In [3]:
ACS_dataT = ACS_data.T
#ACS_dataT.set_index('Geography', inplace=True)
ACS_dataT.drop(columns = ['Brooklyn borough, Kings County, New York', 'Manhattan borough, New York County, New York'], 
                        inplace=True)
ACS_dataT

GEO.display-label,Geography,"Alameda County, California","Contra Costa County, California","Los Angeles County, California","San Francisco County, California","San Mateo County, California","Santa Clara County, California","Cook County, Illinois","Kings County, New York","New York County, New York","Multnomah County, Oregon","Travis County, Texas","King County, Washington"
HC01_EST_VC01,Owner-occupied housing units with a mortgage; ...,224114,189407,1103229,90019,111963,256192,735268,175420,93419,127239,163943,357304
HC01_EST_VC03,Owner-occupied housing units with a mortgage; ...,2783,2472,17447,942,1101,2842,21492,2926,975,2459,3633,5454
HC02_EST_VC03,Percent owner-occupied housing units with a mo...,1.2,1.3,1.6,1,1,1.1,2.9,1.7,1,1.9,2.2,1.5
HC01_EST_VC04,Owner-occupied housing units with a mortgage; ...,1402,1501,9763,340,576,1659,63230,1731,785,1025,5312,2633
HC02_EST_VC04,Percent owner-occupied housing units with a mo...,0.6,0.8,0.9,0.4,0.5,0.6,8.6,1,0.8,0.8,3.2,0.7
HC01_EST_VC05,Owner-occupied housing units with a mortgage; ...,15765,26919,144831,1819,2250,7873,390553,14969,3949,49929,80090,76749
HC02_EST_VC05,Percent owner-occupied housing units with a mo...,7,14.2,13.1,2,2,3.1,53.1,8.5,4.2,39.2,48.9,21.5
HC01_EST_VC06,Owner-occupied housing units with a mortgage; ...,46577,56410,382607,5249,7527,27811,160250,41142,10968,46111,42958,119640
HC02_EST_VC06,Percent owner-occupied housing units with a mo...,20.8,29.8,34.7,5.8,6.7,10.9,21.8,23.5,11.7,36.2,26.2,33.5
HC01_EST_VC07,Owner-occupied housing units with a mortgage; ...,70199,42685,285254,19632,25533,68270,57233,51174,19175,18961,19510,84628


In [4]:


# Remove the rows that contain NaNs; inplace=True modifies ACS_dataT directly
ACS_dataT.dropna(inplace=True)


# Reset the index to keep everything in order. drop=True is not passed here, so the original index
# (the column codes) is kept as a new "index" column rather than discarded
# Do this because we need one DF that shows the row number as a reference (later code)
# Reference:  https://stackoverflow.com/questions/33165734/update-index-after-sorting-data-frame

ACS_dataT.reset_index(inplace=True)


# Only keep the top 116 rows of data

ACS_dataT = ACS_dataT[:116]


ACS_dataT

GEO.display-label,index,Geography,"Alameda County, California","Contra Costa County, California","Los Angeles County, California","San Francisco County, California","San Mateo County, California","Santa Clara County, California","Cook County, Illinois","Kings County, New York","New York County, New York","Multnomah County, Oregon","Travis County, Texas","King County, Washington"
0,HC01_EST_VC01,Owner-occupied housing units with a mortgage; ...,224114,189407,1103229,90019,111963,256192,735268,175420,93419,127239,163943,357304
1,HC01_EST_VC03,Owner-occupied housing units with a mortgage; ...,2783,2472,17447,942,1101,2842,21492,2926,975,2459,3633,5454
2,HC02_EST_VC03,Percent owner-occupied housing units with a mo...,1.2,1.3,1.6,1,1,1.1,2.9,1.7,1,1.9,2.2,1.5
3,HC01_EST_VC04,Owner-occupied housing units with a mortgage; ...,1402,1501,9763,340,576,1659,63230,1731,785,1025,5312,2633
4,HC02_EST_VC04,Percent owner-occupied housing units with a mo...,0.6,0.8,0.9,0.4,0.5,0.6,8.6,1,0.8,0.8,3.2,0.7
5,HC01_EST_VC05,Owner-occupied housing units with a mortgage; ...,15765,26919,144831,1819,2250,7873,390553,14969,3949,49929,80090,76749
6,HC02_EST_VC05,Percent owner-occupied housing units with a mo...,7,14.2,13.1,2,2,3.1,53.1,8.5,4.2,39.2,48.9,21.5
7,HC01_EST_VC06,Owner-occupied housing units with a mortgage; ...,46577,56410,382607,5249,7527,27811,160250,41142,10968,46111,42958,119640
8,HC02_EST_VC06,Percent owner-occupied housing units with a mo...,20.8,29.8,34.7,5.8,6.7,10.9,21.8,23.5,11.7,36.2,26.2,33.5
9,HC01_EST_VC07,Owner-occupied housing units with a mortgage; ...,70199,42685,285254,19632,25533,68270,57233,51174,19175,18961,19510,84628


In [8]:
# Rename the columns, look at the DataFrame
ACS_dataT = ACS_dataT.rename(columns={"San Francisco County, California": "San Francisco",
                                 "Alameda County, California":"Alameda",
                                 "San Mateo County, California":"San Mateo",
                                 "Contra Costa County, California":"Contra Costa",
                                "Los Angeles County, California":"Los Angeles",
                                "Santa Clara County, California":"Santa Clara"
                                            })


# Move the Facts into the index to get them out of the way, since that column holds no numbers to clean
# Making a new DF ca_data means you can always refer back to ca_df to see the row number
#ca_data = CA_counties.set_index("Fact")

# Clean the $, % and " signs from multiple columns; first collect the column names
# Reference:  https://stackoverflow.com/questions/38516481/trying-to-remove-commas-and-dollars-signs-with-pandas-in-python

cols = ACS_dataT.columns

# Pass cols to df.replace(), specifying that $, %, " and , are replaced by blanks
# Raw strings (r'...') keep the regex escapes intact

ACS_dataT[cols] = ACS_dataT[cols].replace({r'\$': '', ',': '', r'\%':'', r'\"': ''}, regex=True)

ACS_dataT.head()

TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'

In [6]:
# convert all objects to numerics
# reference:  https://stackoverflow.com/questions/36814100/pandas-to-numeric-for-multiple-columns
#cols = ACS_dataT.columns[ACS_dataT.dtypes.eq('object')]
ACS_dataT = ACS_dataT[cols].apply(pd.to_numeric, errors='coerce')
ACS_dataT

GEO.display-label,index,Geography,Alameda,Contra Costa,Los Angeles,San Francisco,San Mateo,Santa Clara,"Cook County, Illinois","Kings County, New York","New York County, New York","Multnomah County, Oregon","Travis County, Texas","King County, Washington"
0,,,224114.0,189407.0,1103229.0,90019.0,111963.0,256192.0,735268.0,175420.0,93419.0,127239.0,163943.0,357304.0
1,,,2783.0,2472.0,17447.0,942.0,1101.0,2842.0,21492.0,2926.0,975.0,2459.0,3633.0,5454.0
2,,,1.2,1.3,1.6,1.0,1.0,1.1,2.9,1.7,1.0,1.9,2.2,1.5
3,,,1402.0,1501.0,9763.0,340.0,576.0,1659.0,63230.0,1731.0,785.0,1025.0,5312.0,2633.0
4,,,0.6,0.8,0.9,0.4,0.5,0.6,8.6,1.0,0.8,0.8,3.2,0.7
5,,,15765.0,26919.0,144831.0,1819.0,2250.0,7873.0,390553.0,14969.0,3949.0,49929.0,80090.0,76749.0
6,,,7.0,14.2,13.1,2.0,2.0,3.1,53.1,8.5,4.2,39.2,48.9,21.5
7,,,46577.0,56410.0,382607.0,5249.0,7527.0,27811.0,160250.0,41142.0,10968.0,46111.0,42958.0,119640.0
8,,,20.8,29.8,34.7,5.8,6.7,10.9,21.8,23.5,11.7,36.2,26.2,33.5
9,,,70199.0,42685.0,285254.0,19632.0,25533.0,68270.0,57233.0,51174.0,19175.0,18961.0,19510.0,84628.0


In [None]:
# Remove extraneous rows, keep only Median Wages
wages = clean_ca[44:45]

#clean_ca.rename({"Median household income (in 2017 dollars), 2013-2017":"Median Income $"},axis=1)

wages

In [None]:
# Now setup x and y axes for the bar plot

x_data = wages.columns.tolist()   #  x_data has the list of counties to be plotted
x_axis = np.arange(len(x_data))

print(x_data)            # Check that this is a list of counties
print(wages.values[0])   # this is accessing the first (and only) row of data


median_wage = wages.values[0]    #  [[1,2,3,4,5]] ->  row zero [0] [1,2,3,4,5]




In [7]:
plt.bar(x_axis, median_wage, color='blue', alpha=0.5, align="center")

tick_locations = [value for value in x_axis]
#print(tick_locations)

plt.xticks(tick_locations, x_data, rotation=35, ha='right')   # ha sets the alignment of the x label with the tick mark
plt.xlim(-0.75, len(x_axis)-0.25)
plt.ylim(0, max(median_wage) + 20000)
plt.title("Median Wages for People Leaving San Francisco County")
plt.xlabel("Comparison between SF and Top 5 CA counties")
plt.ylabel("Median Salary, ($)")

# Add labels to give more context 

style = dict(size=8, color ='black')
plt.text(0.3,120000, "US Census 2013-2017, All data shown in 2017 dollars", **style)
#plt.text(1,-70000, "All data shown in 2017 dollars", **style)





NameError: name 'x_axis' is not defined

In [None]:
non_ca_df = non_CA_counties.rename(columns={"San Francisco County, California":"San Francisco",
                                 "New York County (Manhattan Borough), New York":"NY (Manhattan), NY",
                                 "King County, Washington":"King, WA",
                                "Multnomah County, Oregon":"Multnomah, OR",
                                "Kings County (Brooklyn Borough), New York":"Kings (Brooklyn), NY",
                                "Cook County, Illinois":"Cook, IL",
                                "Travis County, Texas":"Travis, TX"
                                            })

# Move the Facts into the index to get them out of the way, since that column holds no numbers to clean
# Build non_ca_data from the renamed non_ca_df so the shorter county names are kept
non_ca_data = non_ca_df.set_index("Fact")

# Clean the $, % and " signs from multiple columns; first collect the column names
# Reference:  https://stackoverflow.com/questions/38516481/trying-to-remove-commas-and-dollars-signs-with-pandas-in-python

cols = non_ca_data.columns

# Pass cols to df.replace(), specifying that $, %, " and , are replaced by blanks
# Raw strings (r'...') keep the regex escapes intact

non_ca_data[cols] = non_ca_data[cols].replace({r'\$': '', ',': '', r'\%':'', r'\"': ''}, regex=True)

non_ca_data.head()

In [None]:
# convert all objects to numerics
# reference:  https://stackoverflow.com/questions/36814100/pandas-to-numeric-for-multiple-columns
#cols = ca_data.columns[ca_data.dtypes.eq('object')]
non_ca_data = non_ca_data[cols].apply(pd.to_numeric, errors='coerce')
non_ca_data.head()

In [None]:
# Remove extraneous rows, keep only Median Wages
wages_nonCA = non_ca_data[44:45]

#clean_ca.rename({"Median household income (in 2017 dollars), 2013-2017":"Median Income $"},axis=1)

wages_nonCA

In [None]:
# Now setup x and y axes for the bar plot

x2_data = wages_nonCA.columns.tolist()   #  x_data has the list of counties to be plotted
x2_axis = np.arange(len(x2_data))

print(x2_data)            # Check that this is a list of counties
print(wages_nonCA.values[0])   # this is accessing the first (and only) row of data


median_wage_nonCA = wages_nonCA.values[0]    #  [[1,2,3,4,5]] ->  row zero [0] [1,2,3,4,5]




In [None]:
plt.bar(x2_axis, median_wage_nonCA, color='blue', alpha=0.5, align="center")

tick_locations = [value for value in x2_axis]
#print(tick_locations)

plt.xticks(tick_locations, x2_data, rotation=35, ha='right')   # ha sets the alignment of the x label with the tick mark
plt.xlim(-0.75, len(x2_axis)-0.25)
plt.ylim(0, max(median_wage_nonCA) + 20000)
plt.title("Median Wages for People Leaving San Francisco County")
plt.xlabel("Comparison between SF and Top 5 non-CA counties")
plt.ylabel("Median Salary, ($)")

# Add labels to give more context 

style = dict(size=8, color ='black')
plt.text(0.4,120000, "US Census 2013-2017, All data shown in 2017 dollars", **style)
#plt.text(1,-70000, "All data shown in 2017 dollars", **style)



