Import the required libraries.

*   The pandas library lets us work with data in a DataFrame [(pandas - Python Data Analysis Library, 2021)](https://pandas.pydata.org/)
*   The NumPy library provides additional tools for numerical work [(NumPy, 2021)](https://numpy.org/)
*   The matplotlib library provides tools for creating graphs; only its colormap module `matplotlib.cm` is imported here [(Matplotlib — Visualization with Python, 2021)](https://matplotlib.org/)
*   The plotly library lets us build interactive data visualisations [(Plotly, 2022)](https://plotly.com/python/)
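
Before running the imports, it may be worth confirming that the four packages are installed in the current environment. The following is a minimal sketch using only the standard library; the helper name `missing_packages` is our own and not part of any of the libraries listed above.

```python
from importlib.util import find_spec

# Import names of the libraries used in this notebook.
REQUIRED = ["pandas", "numpy", "matplotlib", "plotly"]


def missing_packages(names):
    """Return the names in `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages, install them first:", ", ".join(missing))
else:
    print("All required packages are available.")
```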

In [5]:
import pandas as pd
import numpy as np
import matplotlib.cm as cm
import plotly.graph_objects as go


The following two functions are based on the Plotly box plot example [(Plotly, 2019a)](https://plotly.com/python/box-plots/).

To show county or precinct names as a label on each point, the code was adapted from this Stack Overflow thread [(Stackoverflow, 2020)](https://stackoverflow.com/questions/54368158/add-multiple-text-labels-from-dataframe-columns-in-plotly).

In [6]:
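# Box plot of population density by race, one box per group.

def distribution(data_set, t, name_of_the_html_file, label):
  """Draw one box per racial group and save the figure as HTML.

  data_set: six sequences of values, ordered like x_data below.
  t: figure title.
  name_of_the_html_file: output path passed to fig.write_html.
  label: per-point hover text, e.g. county or precinct names.
  """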

  colors = ['rgba(93, 164, 214, 0.5)', 'rgba(255, 144, 14, 0.5)', 'rgba(44, 160, 101, 0.5)',
            'rgba(255, 65, 54, 0.5)', 'rgba(207, 114, 255, 0.5)', 'rgba(127, 96, 0, 0.5)']

  x_data = ['Hispanic','White','Black','Asian','Mixed','Others']

  fig = go.Figure()

  for xd, yd, cls in zip(x_data, data_set, colors):
    fig.add_trace(go.Box(
        y=yd,
        name=xd,
        boxpoints='all',
        jitter=0.5,
        whiskerwidth=0.2,
        fillcolor=cls,
        marker_size=2,
        line_width=2,
        boxmean=True,
        text=label,
    ))

  fig.update_layout(
      title=t,
      yaxis=dict(
          autorange=True,
          showgrid=True,
          zeroline=True,
          dtick=0.1,
          gridcolor='rgb(255, 255, 255)',
          gridwidth=0.1,
          zerolinecolor='rgb(255, 255, 255)',
          zerolinewidth=1,
      ),
      margin=dict(
          l=40,
          r=30,
          b=80,
          t=100,
      ),
      paper_bgcolor='rgb(243, 243, 243)',
      plot_bgcolor='rgb(243, 243, 243)',
      showlegend=False
  )

  fig.write_html(name_of_the_html_file)
  fig.show()
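
`fig.write_html` writes to the path it is given, so the sketch below assumes the output folder (`html_files/` in the calls that follow) may not exist yet and creates it first. The helper name `ensure_parent_dir` is hypothetical, and the demonstration uses a temporary directory rather than the notebook's real output folder.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(file_path):
    """Create the folder that will contain `file_path` if it is missing."""
    path = Path(file_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


# Demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    out_path = ensure_parent_dir(Path(tmp) / "html_files" / "example.html")
    print(out_path.parent.is_dir())
```

In the notebook this could be called on `name_of_the_html_file` just before `fig.write_html(...)`.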


In [40]:
# opening files and selecting the right columns:
race =  pd.read_csv('data/race_county_data/cleaned_georgia_race_county.csv', index_col = 0)
racedenscounty = race[[
    'Population Density: Hispanic',
    'Population Density: White',
    'Population Density: Black',
    'Population Density: Asian',
    'Population Density: Mixed',
    'Population Density: Others',
]]

In [39]:
# racial density distribution county level
# Georgia State 2020 county level population density distribution by race
# list1 is a list for county level pop density
list1 = [
    racedenscounty['Population Density: Hispanic'],
    racedenscounty['Population Density: White'],
    racedenscounty['Population Density: Black'],
    racedenscounty['Population Density: Asian'],
    racedenscounty['Population Density: Mixed'],
    racedenscounty['Population Density: Others'],
]
distribution(list1, 'Georgia 2020 County Level Population Density Distribution by Race', 'html_files/county_race_density.html', race['Area Name'])
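
Listing the six density columns by hand makes it easy to get them out of order relative to the `x_data` labels inside `distribution`. One possible alternative, shown here on a small made-up table, builds the list from the group names. `GROUPS`, `density_series` and the toy values are assumptions for illustration only.

```python
import pandas as pd

# Same order as x_data inside the plotting functions.
GROUPS = ["Hispanic", "White", "Black", "Asian", "Mixed", "Others"]


def density_series(df, groups=GROUPS, prefix="Population Density: "):
    """Return one Series per group, ordered like `groups`."""
    return [df[prefix + group] for group in groups]


# Toy table with two rows; the values are invented.
toy = pd.DataFrame({
    "Area Name": ["County A", "County B"],
    **{"Population Density: " + g: [0.1, 0.2] for g in GROUPS},
})

series_list = density_series(toy)
print([s.name for s in series_list])
```

With the real data this might be called as `distribution(density_series(race), ...)`, and passing `prefix=''` would select the raw count columns instead.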


In [9]:
# Box plot of raw (non-density) population counts, one box per group.
def raw_distribution(data_set, t, name_of_the_html_file, label):
  """Same as distribution(), but styled for raw counts (no fixed dtick)."""

  colors = ['rgba(93, 164, 214, 0.5)', 'rgba(255, 144, 14, 0.5)', 'rgba(44, 160, 101, 0.5)',
            'rgba(255, 65, 54, 0.5)', 'rgba(207, 114, 255, 0.5)', 'rgba(127, 96, 0, 0.5)']

  x_data = ['Hispanic','White','Black','Asian','Mixed','Others']

  fig = go.Figure()

  for xd, yd, cls in zip(x_data, data_set, colors):
    fig.add_trace(go.Box(
        y=yd,
        name=xd,
        boxpoints='all',
        jitter=0.5,
        whiskerwidth=0.2,
        fillcolor=cls,
        marker_size=2,
        line_width=1,
        boxmean=True,
        text=label,
    ))

  fig.update_layout(
      title=t,
      yaxis=dict(
          autorange=True,
          showgrid=True,
          zeroline=True,
          gridcolor='rgb(255, 255, 255)',
          zerolinecolor='rgb(255, 255, 255)',
          zerolinewidth=2,
      ),
      margin=dict(
          l=40,
          r=30,
          b=80,
          t=100,
      ),
      paper_bgcolor='rgb(243, 243, 243)',
      plot_bgcolor='rgb(243, 243, 243)',
      showlegend=False
  )

  fig.write_html(name_of_the_html_file)
  fig.show()

In [10]:
# Georgia state 2020 county level population distribution by race
# output file: html_files/county_raw_distribution.html
# list2 is a list for raw data of county level population by race
list2 = [race['Hispanic'], race['White'], race['Black'], race['Asian'], race['Mixed'], race['Others']]
raw_distribution(list2, 'Georgia 2020 County Level Population Distribution by Race', 'html_files/county_raw_distribution.html', race['Area Name'])

In [52]:
# precinct level data
precincts = pd.read_csv('data/polling_site_data/clean_precincts_with_polling_site.csv', index_col = 0)

In [12]:
# raw precincts
# Georgia state 2020 precinct level population distribution by race
# output file: html_files/raw_precincts.html
# list3 is a list for raw precinct level population data by race
list3 = [precincts['Hispanic'], precincts['White'], precincts['Black'], precincts['Asian'], precincts['Mixed'], precincts['Others']]
raw_distribution(list3, 'Georgia 2020 Precinct Level Population Distribution by Race', 'html_files/raw_precincts.html', precincts['Area Name'])

In [42]:
# general precinct racial density distribution
# Georgia State 2020 precinct level population density distribution by race
# output file: html_files/precinct_general_dist.html
# list4 is a list for precinct level population density by race
list4 = [
    precincts['Population Density: Hispanic'],
    precincts['Population Density: White'],
    precincts['Population Density: Black'],
    precincts['Population Density: Asian'],
    precincts['Population Density: Mixed'],
    precincts['Population Density: Others'],
]
distribution(list4, 'Georgia 2020 Precinct Level Population Density Distribution by Race', 'html_files/precinct_general_dist.html', precincts['Area Name'])

Citations:

Matplotlib.org. 2021. Matplotlib — Visualization with Python. [online] Available at: <https://matplotlib.org/> [Accessed 10 September 2021].

Numpy.org. 2021. NumPy. [online] Available at: <https://numpy.org/> [Accessed 10 September 2021].

Pandas.pydata.org. 2021. pandas - Python Data Analysis Library. [online] Available at: <https://pandas.pydata.org/> [Accessed 10 September 2021].

Plotly. 2019a. Box Plots. [online] Available at: <https://plotly.com/python/box-plots/> [Accessed 11 January 2022].

Plotly. 2019b. Tables in Python. [online] Available at: <https://plotly.com/python/table/#styled-table> [Accessed 11 January 2022].

Plotly. 2022. Plotly. [online] Available at: <https://plotly.com/python/> [Accessed 10 September 2021].

Stackoverflow. 2020. Add multiple text labels from DataFrame columns in Plotly. [online] Available at: <https://stackoverflow.com/questions/54368158/add-multiple-text-labels-from-dataframe-columns-in-plotly> [Accessed 11 January 2022].