# Creating graphs from Folkhälsomyndigheten's Covid19 statistics

This Jupyter Notebook uses the Covid19 statistics file from Folkhälsomyndigheten in Sweden to produce graphs of interest regarding the Covid-19 situation in Sweden. 

The notebook uses the following packages:

In [1]:
# Pandas for dataframes and Excel/csv file import
import pandas as pd

# Bokeh for graphs and palettes
from bokeh.plotting import figure, show
from bokeh.io import output_notebook, output_file
from datetime import date
from bokeh.models import CustomJS, DateRangeSlider
import bokeh.palettes
from bokeh.palettes import Category20_20

# Define that Bokeh shall produce output within the Jupyter Notebook
output_notebook()

# Pick a palette for comparison graphs
colo = bokeh.palettes.d3['Category20b'][20]

The (almost) daily statistics files from Folkhälsomyndigheten are made available in Excel format (.xlsx) with various data on different sheets. We also need the population of the different regions of Sweden for normalization purposes. This data can be found in "befolkning.csv". The population of Sweden is defined as the sum of the populations of all the regions.

We load the statistics file that contains the data we want to visualize. For the latest available data, we can use "Folkhalsomyndigheten_Covid19_latest.xlsx".



In [2]:
fhm_data = pd.ExcelFile(r'https://github.com/unilsson/covid19-FHM/raw/main/data/Folkhalsomyndigheten_Covid19_latest.xlsx')
region_population = pd.read_csv(r'https://github.com/unilsson/covid19-FHM/raw/main/data/befolkning.csv', sep=';')
Sweden_population = sum(region_population.loc[:,'Antal'])
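
As a minimal sketch of the population step, the snippet below parses a small semicolon-separated table shaped like "befolkning.csv" and sums it. The figures are invented placeholders, and the assumption that the file has the columns "Region" and "Antal" is inferred from how this notebook uses it, not from the file itself.

```python
import io
import pandas as pd

# Hypothetical stand-in for befolkning.csv; the figures are invented.
sample_csv = "Region;Antal\nStockholm;2000000\nSkåne;1400000\nGotland;60000\n"

sample_population = pd.read_csv(io.StringIO(sample_csv), sep=';')

# Index by region name so later lookups are by label, not by position.
population_by_region = sample_population.set_index('Region')['Antal']

# The national population is taken as the sum over all regions.
sample_total = int(population_by_region.sum())
print(sample_total)
```

Indexing by region name makes a lookup such as `population_by_region['Gotland']` fail loudly with a `KeyError` if a region name in the statistics file does not match the population file.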

## Swedish Covid-19 cases

Here we look at the number of Covid-19 cases in Sweden and its regions from various points of view. The number of cases per day is located on the sheet "Antal per dag region". The sheet is read and stored in a pandas dataframe.


In [3]:
covid19_cases = fhm_data.parse(sheet_name='Antal per dag region')

Besides the raw statistics, we also want to consider a 7-day rolling mean of the number of cases as well as the two-week cumulative case incidence, both for the whole of Sweden and per region. We use the built-in rolling-window function in pandas to calculate the 7-day rolling mean for every column except the date. The results are stored in the pandas dataframe "covid19_cases_rolling_mean".

In [4]:
covid19_cases_rolling_mean = covid19_cases[['Statistikdatum']].copy()
covid19_total_cases = covid19_cases[['Statistikdatum', 'Totalt_antal_fall' ]]
for i in range(1,covid19_cases.shape[1]):
    covid19_cases_rolling_mean[covid19_cases.columns[i]] = (covid19_cases.iloc[:,i].rolling(window=7).mean())
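
The loop above can also be written without indexing columns by position. The sketch below uses a synthetic ten-day frame (the counts are illustrative only) and shows that the first six rows of a 7-day rolling mean are NaN, since a full window is not yet available.

```python
import pandas as pd

# Synthetic daily counts for ten days; the numbers are illustrative only.
demo = pd.DataFrame({
    'Statistikdatum': pd.date_range('2021-01-01', periods=10, freq='D'),
    'Totalt_antal_fall': [7, 14, 21, 28, 35, 42, 49, 56, 63, 70],
})

# Rolling over all numeric columns at once instead of looping by position.
demo_rolling = (
    demo.set_index('Statistikdatum')
        .rolling(window=7)
        .mean()
        .reset_index()
)

# The first six rows are NaN because a full 7-day window is not yet available.
first_valid = demo_rolling['Totalt_antal_fall'].first_valid_index()
print(demo_rolling)
```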

For later convenience, we extract the names of all the Swedish regions into a variable called "regions".

In [5]:
regions = covid19_cases_rolling_mean.columns[2:]

The two-week cumulative Covid-19 case incidence is the sum of new cases over the latest two weeks, normalized to 100,000 inhabitants. The normalization enables comparisons between different countries and between different regions within a country. The result is stored in the pandas dataframe "covid19_cases_2w_inc".

In [6]:
covid19_cases_2w_inc = covid19_cases[['Statistikdatum']].copy()
covid19_cases_2w_inc['Totalt'] = covid19_cases.iloc[:,1].rolling(window=14).sum()*100000/Sweden_population
for i in range(2,covid19_cases.shape[1]):
    region = covid19_cases.columns[i]
    pop = (region_population.loc[ region_population['Region']==region]).iloc[0,1]
    covid19_cases_2w_inc[covid19_cases.columns[i]] = covid19_cases.iloc[:,i].rolling(window=14).sum()*100000/pop
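
To make the normalization concrete, here is a small worked sketch. The helper name `two_week_incidence` and the region with 50,000 inhabitants and 10 new cases per day are hypothetical; only the formula (14-day sum times 100,000 divided by population) comes from the cell above.

```python
import pandas as pd

def two_week_incidence(daily_cases, population, per=100_000):
    """Sum of the latest 14 daily counts, scaled to cases per `per` inhabitants."""
    return daily_cases.rolling(window=14).sum() * per / population

# Hypothetical region with 50,000 inhabitants and 10 new cases every day.
daily = pd.Series([10] * 20)
incidence = two_week_incidence(daily, population=50_000)
print(incidence.iloc[13])
```

Fourteen days with 10 cases give 140 cases, which corresponds to 280 cases per 100,000 inhabitants for a population of 50,000. The first 13 values are NaN because the 14-day window is not yet full.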

We can now create some graphs to visualize the data. The first one will be the total number of Covid-19 cases in Sweden, using the 7-day rolling mean values.

In [7]:
p=figure(x_axis_type='datetime', sizing_mode='stretch_both', 
        title="Number of new Covid-19 cases",
        x_axis_label="Date", y_axis_label='Number of cases', toolbar_location=None)
p.vbar(x="Statistikdatum", top='Totalt_antal_fall', source=covid19_total_cases, line_color='blue', 
       legend_label='Reported cases')
p.line(x="Statistikdatum", y='Totalt_antal_fall', source=covid19_cases_rolling_mean, line_width=2, 
       line_color='red', legend_label='7-day rolling mean')
show(p)

We can also look at the number of new Covid-19 cases per region in Sweden, using the dataframes we constructed above. Since we use Bokeh to produce the graphs, they are interactive, which is convenient for comparisons. For simplicity, we make Stockholm the only graph that is visible by default. Clicking on another region in the legend makes its graph visible.

In [8]:
p=figure(x_axis_type='datetime', sizing_mode='stretch_both', 
        title="Number of new Covid-19 cases in different regions of Sweden (7-day rolling mean)",
        x_axis_label="Date", y_axis_label='Number of new cases', toolbar_location=None)
c=0
for r in regions:
    li=p.line(x="Statistikdatum", y=r, source=covid19_cases_rolling_mean, line_color=colo[c % 20], 
           legend_label=r, line_width=3)
    if r=='Stockholm':
        li.visible=True
    else:
        li.visible=False
    c += 1
p.legend.location = "top_left"
p.legend.click_policy="hide"
show(p)
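
The legend-toggle pattern used in this cell can be reduced to a few lines. The sketch below uses toy data and three hand-picked colours (all hypothetical) and sets the initial visibility directly through the `visible` keyword instead of an if/else branch. It does not call `show`, so it only builds the figure.

```python
from bokeh.plotting import figure

# Toy data: three hypothetical series sharing one x axis.
x = [1, 2, 3]
series = {'Stockholm': [3, 5, 4], 'Skåne': [2, 2, 3], 'Gotland': [0, 1, 0]}
colors = ['#393b79', '#637939', '#8c6d31']

demo_plot = figure(title='Toggle lines via the legend')
for (name, y), color in zip(series.items(), colors):
    # Only the default region starts visible; the rest appear on legend click.
    demo_plot.line(x=x, y=y, legend_label=name, line_color=color,
                   line_width=3, visible=(name == 'Stockholm'))

# "hide" toggles a line's visibility when its legend entry is clicked.
demo_plot.legend.click_policy = 'hide'

visibility = [r.visible for r in demo_plot.renderers]
print(visibility)
```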

We now turn to the two-week cumulative Covid-19 case incidence and produce the same type of graphs as above. First we have the cumulative incidence for Sweden as a whole.

In [9]:
p=figure(x_axis_type='datetime', sizing_mode='stretch_both', 
        title="Two-week cumulative Covid-19 case incidence in Sweden",
        x_axis_label="Date", y_axis_label='Cases per 100,000 inhabitants', toolbar_location=None)
p.line(x="Statistikdatum", y='Totalt', source=covid19_cases_2w_inc, line_width=2, line_color='blue')
show(p)

We then consider the cumulative incidence for every region in Sweden. The graph is interactive and we again begin with Stockholm as the only visible graph.

In [10]:
p=figure(x_axis_type='datetime', sizing_mode='stretch_both', 
        title="Two-week cumulative Covid-19 case incidence in different regions of Sweden per 100,000 inhabitants",
        x_axis_label="Date", y_axis_label='Cases per 100,000 inhabitants', toolbar_location=None)
c=0
for r in regions:
    li=p.line(x="Statistikdatum", y=r, source=covid19_cases_2w_inc, line_color=colo[c % 20], 
           legend_label=r, line_width=3)
    if r=='Stockholm':
        li.visible=True
    else:
        li.visible=False
    c += 1
p.legend.location = "top_left"
p.legend.click_policy="hide"
show(p)

## Lethal cases of Covid-19 in Sweden

We now turn to the number of lethal cases of Covid-19 in Sweden. The information is available in the Excel sheet called "Antal avlidna per dag". The statistics contain a category of cases that cannot be attributed to a specific day. Since this is a small number, we discard it in the analysis by dropping the last row of the sheet. We again add the 7-day rolling mean values to the dataframe.

In [11]:
covid19_deceased = fhm_data.parse(sheet_name='Antal avlidna per dag')
# Drop the last row (cases without a specific date); copy so a column can be added safely
covid19_deceased = covid19_deceased[:-1].copy()
covid19_deceased['Smoothed'] = (covid19_deceased['Antal_avlidna'].rolling(window=7).mean()).values

The graph is again rendered using Bokeh.

In [12]:
p=figure(x_axis_type='datetime', sizing_mode='stretch_both', title="Number of lethal Covid-19 cases", 
         x_axis_label="Date", y_axis_label='Number of lethal cases', toolbar_location=None)
p.vbar(x="Datum_avliden", top='Antal_avlidna', source=covid19_deceased, line_color='blue', legend_label='Reported cases')
p.line(x='Datum_avliden', y='Smoothed', source = covid19_deceased, line_color='red', line_width=2, legend_label="7-day rolling mean")
show(p)

## Cases in Intensive Care

We now turn to the number of Covid-19 cases in Intensive Care. The number of new Intensive Care cases per day is in the sheet called "Antal intensivvårdade per dag".

In [13]:
covid19_iva = fhm_data.parse(sheet_name='Antal intensivvårdade per dag')

We calculate and add the 7-day rolling mean to the dataframe.

In [14]:
covid19_iva['Smoothed'] = (covid19_iva['Antal_intensivvårdade'].rolling(window=7).mean()).values
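
Since the same smoothing step is applied to both the deceased and the Intensive Care data, it could be wrapped in a small helper. The function `add_smoothed` below is a hypothetical sketch, not part of this notebook. It also shows how the trailing window used here reacts to a single spike, compared with a centred window, which is included only for comparison.

```python
import pandas as pd

def add_smoothed(frame, column, window=7, center=False):
    """Return a copy of `frame` with a 'Smoothed' rolling-mean column."""
    out = frame.copy()
    out['Smoothed'] = out[column].rolling(window=window, center=center).mean()
    return out

# Synthetic counts: a single spike of 70 on the tenth day (index 9).
demo = pd.DataFrame({'count': [0] * 9 + [70] + [0] * 9})

trailing = add_smoothed(demo, 'count')               # window ends on each day
centered = add_smoothed(demo, 'count', center=True)  # window is centred on each day
print(trailing['Smoothed'].tolist())
print(centered['Smoothed'].tolist())
```

With the trailing window, the spike raises the mean on the day it occurs and on the six following days. With the centred window, the raised values are spread over the three days before and after the spike.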

The graph is generated using Bokeh. 

In [15]:
p=figure(x_axis_type='datetime', sizing_mode='stretch_both', title="Number of new Covid-19 cases per day requiring Intensive Care", 
         x_axis_label="Date", y_axis_label='Number of new cases', toolbar_location=None)
p.vbar(x="Datum_vårdstart", top='Antal_intensivvårdade', source=covid19_iva, line_color='blue', legend_label="Reported cases")
p.line(x='Datum_vårdstart', y='Smoothed', source = covid19_iva, line_color='red', line_width=2, legend_label="7-day rolling mean")

show(p)

## Municipalities (kommuner) in Sweden

Once a week, Folkhälsomyndigheten presents Covid-19 data for all the Swedish municipalities. This is the most granular level at which statistics are presented. Sweden has 290 municipalities, so plotting them all in one graph is not practical. Instead we create a function that takes a list of one or several municipalities to compare and plots the corresponding data.

First we extract the information from the Excel statistics file.

In [16]:
covid19_kommuner = fhm_data.parse(sheet_name='Veckodata Kommun_stadsdel')

The municipalities in Sweden can be extracted from the dataframe. These are the names that we can use as input to the plotting function we will define below. 

In [17]:
covid19_kommuner[ 'KnNamn'].unique()

array(['Ale', 'Alingsås', 'Alvesta', 'Aneby', 'Arboga', 'Arjeplog',
       'Arvidsjaur', 'Arvika', 'Askersund', 'Avesta', 'Bengtsfors',
       'Berg', 'Bjurholm', 'Bjuv', 'Boden', 'Bollebygd', 'Bollnäs',
       'Borgholm', 'Borlänge', 'Borås', 'Botkyrka', 'Boxholm', 'Bromölla',
       'Bräcke', 'Burlöv', 'Båstad', 'Dals-Ed', 'Danderyd', 'Degerfors',
       'Dorotea', 'Eda', 'Ekerö', 'Eksjö', 'Emmaboda', 'Enköping',
       'Eskilstuna', 'Eslöv', 'Essunga', 'Fagersta', 'Falkenberg',
       'Falköping', 'Falun', 'Filipstad', 'Finspång', 'Flen', 'Forshaga',
       'Färgelanda', 'Gagnef', 'Gislaved', 'Gnesta', 'Gnosjö', 'Gotland',
       'Grums', 'Grästorp', 'Gullspång', 'Gällivare', 'Gävle', 'Göteborg',
       'Götene', 'Habo', 'Hagfors', 'Hallsberg', 'Hallstahammar',
       'Halmstad', 'Hammarö', 'Haninge', 'Haparanda', 'Heby', 'Hedemora',
       'Helsingborg', 'Herrljunga', 'Hjo', 'Hofors', 'Huddinge',
       'Hudiksvall', 'Hultsfred', 'Hylte', 'Hällefors', 'Härjedalen',
       'Härnösan

The function "kommun" takes as input the full dataframe (covid19_kommuner), a list of municipalities to compare, and the palette defined earlier.

In [18]:
def kommun( covid19_kommuner, mun, colo):
    # Plot the weekly number of new cases for every municipality in the list mun
    q = figure(sizing_mode='stretch_both', y_range=(0,600), title='Number of new Covid19 cases per week per municipality',
              x_axis_label="Week", y_axis_label='New cases', toolbar_location=None)
    c=0
    for k in mun:
        n = covid19_kommuner.loc[ covid19_kommuner['KnNamn']==k, :].copy()
        # Entries containing '<' (assumed to be masked small counts) are not numeric; plot them as 0
        n['nya_fall_vecka'] = pd.to_numeric(n['nya_fall_vecka'], errors='coerce').fillna(0)
        q.line(x=n['veckonummer'], y=n['nya_fall_vecka'], line_color = colo[c % 20], line_width=2, legend_label=k)
        c += 1
    q.legend.location = "top_left"
    q.legend.click_policy="hide"
    show(q)
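
The function above has to deal with entries that contain '<', which suggests that small counts are published as text rather than as numbers. The sketch below shows one way to turn such a column into plain integers. The sample rows and the exact form '<15' are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical weekly rows; '<15' stands in for a masked small count.
weekly = pd.DataFrame({
    'veckonummer': [40, 41, 42, 43],
    'nya_fall_vecka': ['<15', '23', 31, '<15'],
})

# Non-numeric entries become NaN, which is then replaced by 0.
weekly['nya_fall_vecka'] = (
    pd.to_numeric(weekly['nya_fall_vecka'], errors='coerce')
      .fillna(0)
      .astype(int)
)
print(weekly)
```

Replacing masked values with 0 understates the true count, so the lowest parts of the resulting graphs should be read with that in mind.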

We can now use this to graph the number of Covid-19 cases in a specific municipality, for example 'Botkyrka'.

In [19]:
kommun( covid19_kommuner, ['Botkyrka'], colo)

We can now extend the list of municipalities to produce a nice comparison graph.

In [20]:
kommun( covid19_kommuner, ['Haninge', 'Botkyrka', 'Huddinge', 'Tyresö', 'Nynäshamn'], colo)