# A Barchart Race - California Population Growth by County 1970 - 2018

## Introduction

In this notebook, I visualize county-level population growth in the State of California from 1970 to 2018 with a bar chart race.

## Import Data

In [1]:
import numpy as np
import pandas as pd
import bar_chart_race as bcr

In [2]:
filepath = r'C:\Users\yunyi\OneDrive\Desktop\barchart_race\population_by_county_ca(1970-2018).xlsx'
data = pd.read_excel(filepath)
data

,County,Jurisdiction,Year,Population
0,Alameda,Alameda,1970,70968
1,Alameda,Albany,1970,15561
2,Alameda,Berkeley,1970,114091
3,Alameda,Dublin,1970,0
4,Alameda,Emeryville,1970,2681
...,...,...,...,...
26448,Yolo,Winters,2018,7292
26449,Yolo,Woodland,2018,60426
26450,Yuba,Marysville,2018,11883
26451,Yuba,Unincorporated Yuba,2018,59347


In [3]:
data.shape # check the shape

(26453, 4)

In [4]:
data.isna().sum() # check if there are missing values

County          0
Jurisdiction    0
Year            0
Population      0
dtype: int64

In [5]:
data['County'].nunique() # count the number of counties in California

58

## Data Preprocessing

__Remove Unnecessary Feature__

Since the bar chart race is at the county level, the jurisdiction column is not needed.

In [6]:
data = data.drop('Jurisdiction',axis=1)
data.head()

,County,Year,Population
0,Alameda,1970,70968
1,Alameda,1970,15561
2,Alameda,1970,114091
3,Alameda,1970,0
4,Alameda,1970,2681


__Transform the Data__

According to the documentation of the bar_chart_race package, the input DataFrame should have the time periods as its index and one column per category to plot (here, one per county). A pivot table reshapes the original dataset into this form, summing the jurisdiction populations within each county and year.

In [7]:
# sum the jurisdiction populations for each (Year, County) pair
df = data.pivot_table(values='Population', index=['Year'], columns=['County'], aggfunc='sum')
df

Year,Alameda,Alpine,Amador,Butte,Calaveras,Colusa,Contra Costa,Del Norte,El Dorado,Fresno,...,Sonoma,Stanislaus,Sutter,Tehama,Trinity,Tulare,Tuolumne,Ventura,Yolo,Yuba
1970,1071446,484,11821,101969,13585,12430,556116,14580,43833,413329,...,204885,194506,41935,29517,7615,188322,22169,378497,91788,44736
1971,1082330,550,12375,104260,13780,12340,562770,14900,45700,418575,...,210470,198940,42570,29920,7750,192440,22660,387925,93240,44870
1972,1093880,600,12860,108295,14060,12325,569980,15290,48575,427260,...,219335,204985,43230,30390,8200,197935,23500,400450,94810,45320
1973,1094600,650,13405,112170,14590,12300,574400,15430,51275,432960,...,229645,210685,44150,30835,8700,202795,24340,412275,96430,45100
1974,1097670,700,14490,115905,15080,12450,578050,15740,54400,437290,...,239075,216460,45070,31625,9150,207635,24940,423350,97780,44310
1975,1102810,750,15130,119395,15450,12715,583370,16090,57725,446330,...,245475,222200,45940,32870,9600,213375,25950,434800,100310,44480
1976,1107350,800,15280,122880,15750,12900,594450,16260,60750,462405,...,252115,228980,47220,33050,10050,218435,26540,449500,104100,46010
1977,1106020,850,15790,127545,16490,12955,610230,16530,64850,478980,...,264565,239090,48550,33925,10500,223900,27870,467925,107410,46925
1978,1109850,950,16870,132415,17860,12910,625340,17210,70850,490530,...,277805,249375,49700,35155,11000,230645,29610,485375,108950,48180
1979,1104250,1050,18040,136530,19280,13010,638170,17660,77525,500115,...,288110,257370,51000,36555,11500,237180,32800,502575,110690,48820
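
To make the reshaping step concrete, here is a minimal sketch on a made-up long-format table that uses the same column names as the dataset above. The variable names `toy` and `toy_wide` and all population values are invented for illustration; only the `pivot_table` call mirrors the cell above.

```python
import pandas as pd

# Toy long-format data with the same columns as the notebook's table
# after dropping 'Jurisdiction'. All numbers are illustrative only.
toy = pd.DataFrame({
    'County': ['Alameda', 'Alameda', 'Yolo', 'Alameda', 'Yolo'],
    'Year': [1970, 1970, 1970, 1971, 1971],
    'Population': [100, 50, 30, 160, 35],
})

# One row per year, one column per county. Rows that share the same
# (Year, County) pair are summed, which rolls jurisdictions up to counties.
toy_wide = toy.pivot_table(values='Population', index='Year',
                           columns='County', aggfunc='sum')
print(toy_wide)
```

The two Alameda rows for 1970 collapse into a single cell holding their sum, which is the same roll-up that turns jurisdiction rows into county totals in the real table.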


The year index is of integer type, which may cause problems later in the animation, so it is converted to strings.

In [8]:
df.index = df.index.map(str)
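
As a quick check of what this conversion does, the sketch below applies the same `index.map(str)` call to a small made-up table. The name `toy_wide` and its values are assumptions for illustration, not part of the dataset.

```python
import pandas as pd

# Toy wide table with an integer year index (illustrative values).
toy_wide = pd.DataFrame({'Alameda': [150, 160], 'Yolo': [30, 35]},
                        index=pd.Index([1970, 1971], name='Year'))

# Same conversion as in the cell above: each year label becomes a string.
toy_wide.index = toy_wide.index.map(str)

# The labels keep their order; only their type changes.
print(toy_wide.index.tolist())
```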

## Generating Bar Chart Race Animation

In [9]:
bcr.bar_chart_race(df=df,
                   n_bars=10,          # show the 10 largest counties in each frame
                   sort='desc',
                   title='California County Population Since 1970',
                   period_length=750,  # milliseconds per year
                   filename='ca_county_pop.mp4',
                   filter_column_colors=True)

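
To give a rough sense of what a single frame of the race displays, the sketch below ranks the columns of a small made-up table for its last period and keeps the largest few. This is not how bar_chart_race builds its frames internally; it is only an illustration of the ranking implied by `n_bars` and `sort='desc'`, and the names `toy_wide` and `top_last` and all values are hypothetical.

```python
import pandas as pd

# Toy wide table: string year index, one column per county (illustrative values).
toy_wide = pd.DataFrame(
    {'Alameda': [150, 160], 'Yolo': [30, 35], 'Yuba': [40, 33]},
    index=['1970', '1971'],
)

n_bars = 2  # the notebook uses 10; 2 keeps the toy output short

# Largest n_bars values in the last row, in descending order.
top_last = toy_wide.iloc[-1].nlargest(n_bars)
print(top_last)
```

Running the same two lines on `df` with `n_bars = 10` would list the counties expected in the final frame of the animation, which can serve as a sanity check on the rendered video.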
