# DATA VISUALISATION AND INTERPRETATION

This Jupyter notebook is used to visualise and interpret the collected data. Its final output is our interpretation of the trends observed at the regional and brick level in the Czech Republic, using two charts as a reference: a simple bar chart at the regional level and an interactive heatmap at the brick level.

## Import of packages

The geopandas, pandas and json packages are used in the first part of this Jupyter notebook to convert our ready-to-use datasets to a bokeh-compatible format. bokeh is used for the visualisation itself.

In [1]:
import geopandas as gpd
import pandas as pd
import json
from bokeh.io import output_notebook, show, curdoc
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LogColorMapper, LogTicker, ColorBar, HoverTool
from bokeh.palettes import brewer
from bokeh.layouts import row, column

## Data loading and final adjustments

After a lot of research we found https://kge.zcu.cz/pesonal/PERSON/Novotna/vyuka/HGCR/Rgcr_s/data/okresy.zip, which provides a ready-to-use map of the Czech bricks in shx format. As geopandas is designed specifically to deal with such data formats, we use it to read and process this file.

As nice as the dataset is, it is fairly old (2009) and uses a legacy encoding that caused problems when reading it. Luckily, geopandas is able to minimise the damage to the data.

Next, we load the data generated by our DATA SCRAPER and DATA PROCESSOR using regular pandas.

In [2]:
bricks = gpd.read_file("C:/Users/jhabetinek/Desktop/Škola/Python/Project_work/Okresy/okresy.shx", encoding="Windows-1250")

brick_data = pd.read_csv("brick_data_final.csv", index_col = 0)
regional_data = pd.read_csv("regional_data_final.csv", index_col = 0)

We need to merge the data about dentists and population on brick level with the geospatial dataset. Therefore we display the column names of both datasets in order to find a common identifier. We conclude that we will join NAZEV (map data) on OKRES (dentist data):

In [3]:
print(bricks.columns.values)
print(brick_data.columns.values)

['OBJECTID' 'AREA' 'PERIMETER' 'NAZEV' 'OB91' 'OB01' 'OB_311202' 'OKRES'
 'NUTS4' 'NUTS3' 'NUTS2' 'KRAJ1960' 'NK' 'KN' 'KNOK' 'NAZKR' 'Shape_Leng'
 'Shape_Area' 'geometry']
['OKRES' 'DENTISTS' 'POPULACE' 'PATIENTS_PER_DENTIST']


To avoid problems with merging we check whether the columns indeed contain identical values:

In [4]:
print(set(bricks['NAZEV'])-set(brick_data['OKRES']))

{'-stí nad Labem', 'Leská Lípa', 'Leský Krumlov', '-stí nad Orlicí', 'Leské Budějovice'}
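
The check above looks in one direction only (map names missing from the dentist data). A minimal sketch of a two-way check is shown here on made-up name sets; `map_names` and `data_names` are hypothetical stand-ins for `bricks['NAZEV']` and `brick_data['OKRES']`:

```python
# Toy stand-ins for the two key columns (not the real data).
map_names = {'Benešov', 'Beroun', 'Leská Lípa'}
data_names = {'Benešov', 'Beroun', 'Česká Lípa'}

# Names on the map that have no partner in the dentist data:
# after a left merge these rows would carry missing values.
only_in_map = map_names - data_names

# Names in the dentist data that have no polygon on the map:
# after a left merge these rows would be dropped silently.
only_in_data = data_names - map_names

print(sorted(only_in_map))
print(sorted(only_in_data))
```

If both sets are empty, every row on either side has exactly one partner by name.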


As expected, the old encoding resulted in limited corruption of special characters. Thanks to geopandas' ability to take care of most of the issues, the fix is rather quick and simple:

In [5]:
string_in = ['Leská Lípa', '-stí nad Labem', '-stí nad Orlicí', 'Leský Krumlov', 'Leské Budějovice']
string_out = ['Česká Lípa', 'Ústí nad Labem', 'Ústí nad Orlicí', 'Český Krumlov', 'České Budějovice']

for old, new in zip(string_in, string_out):
    #.loc (unlike .at) accepts a boolean mask; regex=False matches the text literally
    bricks.loc[bricks['NAZEV'].str.contains(old, regex=False), 'NAZEV'] = new

print(set(bricks['NAZEV'])-set(brick_data['OKRES']))

set()
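
The same kind of correction can also be written as a lookup table. The sketch below is an alternative formulation on a toy pandas Series, not the code used in this notebook; `names` and `fixes` are hypothetical names introduced for illustration:

```python
import pandas as pd

# Toy stand-in for bricks['NAZEV']; the real column comes from the map file.
names = pd.Series(['Leská Lípa', 'Kolín', '-stí nad Labem', 'Beroun'])

# Hypothetical lookup table: corrupted spelling -> correct spelling.
fixes = {
    'Leská Lípa': 'Česká Lípa',
    '-stí nad Labem': 'Ústí nad Labem',
}

# Series.replace with a dict swaps exact matches and leaves all other values untouched.
fixed = names.replace(fixes)
print(fixed.tolist())
```

Keeping the pairs in one dictionary avoids two parallel lists that have to stay in the same order.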


Now we are ready to perform a few concluding adjustments of our data:

In [6]:
#merging of data about dentists and population with the geospatial data
brick_final = bricks.merge(brick_data, how= 'left', left_on= 'NAZEV', right_on= 'OKRES')

#identification of needed columns
colnames = ['NAZEV','DENTISTS','POPULACE', 'PATIENTS_PER_DENTIST', 'geometry']

#reduction of the dataset to needed extent only
brick_final = brick_final[colnames]

#standardisation of the column names to English only
brick_final.rename(columns= {'NAZEV':'NAME', 'POPULACE':'POPULATION'}, inplace= True)

#rounding of the data on brick level to whole numbers only
brick_final = brick_final.round(0)

#rounding of the data on regional level to whole numbers only and sorting it
region_final = regional_data.round(0).sort_values('PATIENTS_PER_DENTIST')

#examining the structure of our final data
print(brick_final.head(5))
print(region_final.head(5))

                 NAME  DENTISTS  POPULATION  PATIENTS_PER_DENTIST  \
0  Hlavní město Praha    1825.0     1308632                 717.0   
1             Benešov      59.0       98708                1665.0   
2              Beroun      57.0       93726                1634.0   
3              Kladno      66.0      165271                2517.0   
4               Kolín      48.0      101604                2123.0   

                                            geometry  
0  POLYGON ((3468220.512840505 5542427.192383988,...  
1  POLYGON ((3457760.067134763 5517172.618280468,...  
2  POLYGON ((3412975.112725054 5519197.552682353,...  
3  POLYGON ((3420326.737435901 5567914.871895725,...  
4  POLYGON ((3483150.11336641 5536861.955130804, ...  
                    KRAJ  DENTISTS  POPULACE  PATIENTS_PER_DENTIST
0     Hlavní město Praha    1825.0   1308632                 717.0
8         Olomoucký kraj     569.0    632492                1111.0
1      Jihomoravský kraj    1036.0   1187667          

Since we are using the bokeh package to create the interactive heatmap, we need to convert our geopandas dataframe into GeoJSON: bokeh cannot read a geopandas dataframe directly, but it can read GeoJSON.

In [7]:
#to_json() returns a GeoJSON string; loads/dumps parses it and serialises it again
brick_json = json.loads(brick_final.to_json())
brick_json = json.dumps(brick_json)
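
To make the target format more concrete, here is a hand-built sketch of the kind of GeoJSON text that the conversion above produces. The feature, its attribute values and its square outline are made up for illustration; only the column names are taken from our data:

```python
import json

# Hypothetical single-brick feature with a made-up square outline.
feature = {
    "type": "Feature",
    # Attribute columns of the dataframe end up as "properties".
    "properties": {
        "NAME": "Example",
        "DENTISTS": 50,
        "POPULATION": 100000,
        "PATIENTS_PER_DENTIST": 2000,
    },
    # The geometry column ends up as a GeoJSON geometry object.
    "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
    },
}

collection = {"type": "FeatureCollection", "features": [feature]}

# GeoJSON travels as plain text, so the dictionary is serialised to a string.
geojson_text = json.dumps(collection)

# Parsing it back shows that the attributes survive the round trip.
parsed = json.loads(geojson_text)
print(parsed["features"][0]["properties"]["PATIENTS_PER_DENTIST"])
```

The property names are the ones the hover tooltips later refer to with the `@` prefix, for example `@NAME`.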

## Data visualisation

Finally, we are able to analyse our data. We will do so with the help of two charts generated using the bokeh package:

1. **Bar chart showing number of patients per one dentist in Czech regions**
2. **Interactive heatmap on brick level showing**
    - _Brick name_
    - _Number of dentists_
    - _Population_
    - _Average number of patients per dentist (colors are assigned based on the value of this variable)_

The creation of those charts is described in detail in the code cell below:

In [8]:
###########################################################
# INTERACTIVE MAP CREATION
###########################################################

#Loading GeoJSON source that contains features for plotting
geosource = GeoJSONDataSource(geojson = brick_json)

#Defining a sequential color palette
palette = brewer['OrRd'][9]

#Reversing color order so that dark red is the worst availability
palette = palette[::-1]

#Instantiating LogColorMapper that logarithmically maps numbers in a range into a sequence of colors
color_mapper = LogColorMapper(palette = palette, low = 480, high = 3000)

#Adding hover tool - source of interactivity
hover = HoverTool(tooltips = [ ('Name','@NAME'),
                               ('N. of Dentists', '@DENTISTS'),
                               ('Population', '@POPULATION'),
                               ('AVG patients per dentist', '@PATIENTS_PER_DENTIST')])

#Creating color bar that will serve as a legend
color_bar = ColorBar(color_mapper = color_mapper, 
                     ticker = LogTicker(), 
                     label_standoff = 12 ,
                     width = 20, 
                     height = 500,
                     border_line_color = None,
                     location = (0,0))

#Creating figure object
p1 = figure(title = 'Stomatological care availability in Czech bricks in 2019 (interactive plot)',
            plot_height = 600 , 
            plot_width = 950,
            toolbar_location = None, 
            tools = [hover])

#Dropping gridlines
p1.xgrid.grid_line_color = None
p1.ygrid.grid_line_color = None

#Dropping axes
p1.axis.visible = False

#Filling the figure object with a map and coloring it
p1.patches('xs',
           'ys', 
           source = geosource,
           fill_color = {'field' :'PATIENTS_PER_DENTIST', 'transform' : color_mapper},
           line_color = 'black', line_width = 0.25, fill_alpha = 1)

#Specifying layout of the chart and legend
p1.add_layout(color_bar, 'right')

#########################################################################################
# SIMPLE BAR CHART CREATION
#########################################################################################

#Creating second figure object
p2 = figure(title = "Stomatological care availability in Czech regions in 2019 (simple plot)",
            x_range = region_final["KRAJ"],
            plot_height=600,
            plot_width = 950,
            toolbar_location = None
            )

#Filling it with a simple bar chart
p2.vbar(x= region_final["KRAJ"], 
        top= region_final["PATIENTS_PER_DENTIST"], 
        width = 0.9,
        color = [palette[1],
                 palette[3],
                 palette[3],
                 palette[3],
                 palette[3],
                 palette[5],
                 palette[5],
                 palette[5],
                 palette[5],
                 palette[5],
                 palette[5],
                 palette[5],
                 palette[8],
                 palette[8]])

#Getting rid of gridlines of x axis
p2.xgrid.grid_line_color = None

#Starting the y axis from 500 rather than 0
p2.y_range.start = 500

#Rotating labels such that they do not overlap
p2.xaxis.major_label_orientation = 1

#Adding y axis label
p2.yaxis.axis_label = 'Number of patients per one dentist'

#########################################################################################
# DASHBOARD CREATION
#########################################################################################

#Specification of overall layout
layout = column(p2, p1)
curdoc().add_root(layout)

#Showing plots within the Jupyter notebook
output_notebook()

#Closing with command to show the above generated dashboard
show(layout)
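
The heatmap in the cell above colors bricks on a logarithmic scale between 480 and 3000 patients per dentist, using nine colors. The sketch below approximates how such a mapping assigns a value to a palette position. It is a simplified illustration under the assumption that values outside the range take the end colors, not bokeh's actual implementation; `log_bin` is a hypothetical helper:

```python
import math

def log_bin(value, low=480, high=3000, n_colors=9):
    """Approximate palette index of value on a log scale between low and high."""
    # Assumption: values outside [low, high] are clamped to the end colors.
    clamped = min(max(value, low), high)
    # Relative position on the log scale, from 0.0 (low) to 1.0 (high).
    position = math.log(clamped / low) / math.log(high / low)
    # Split the scale into n_colors equal log-width bins; the top edge joins the last bin.
    return min(int(position * n_colors), n_colors - 1)

# Values taken from the brick-level preview printed earlier in this notebook.
for patients in (717, 1665, 2517):
    print(patients, log_bin(patients))
```

Because the bins have equal width on the log scale, differences among low values get more color resolution than the same absolute differences among high values.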

## Results interpretation