# A descriptive data analysis of the Danish Regions

## 0. <a id='toc0_'></a>[Preamble](#toc0_)

You may need to install the DST API data reader to run all the code in this project. Uncomment the following cell and run it to install the package.

In [1]:
# The DST API wrapper
    # %pip install git+https://github.com/alemartinello/dstapi

**Imports and set magics:**

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from dstapi import DstApi

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# import pyfile with plotting functions
import dataprojectplot as DLP 

**Table of contents**<a id='toc0_'></a>    
- 1. [Fetching and exploring data](#toc1_)
  - 1.1. [Dataset 1: The accounts of the regions, REGR11 from DST](#toc1_1_) 
  - 1.2. [Dataset 2: Fulltime employees over time per region, from KRL](#toc1_2_)     
- 2. [Merging the data sets](#toc2_)
- 3. [Calculating summary statistics](#toc3_)     


## 1. <a id='toc1_'></a>[Fetching and exploring data](#toc0_)

### 1.1. <a id='toc1_1_'></a>[Dataset 1: The accounts of the regions, REGR11 from DST](#toc0_)

We'll use [dstapi](https://github.com/alemartinello/dstapi) by Alessandro Martinello to fetch data from Danmarks Statistik. 

First, we create a `DstApi` **object** that will allow us to interact with the DST server. 

In [3]:
ind = DstApi('REGR11') # object to interact with DST server

A quick overview of the available data in REGR11:

In [4]:
tabsum = ind.tablesummary(language='en')
display(tabsum)

Table REGR11: Regions accounts by main accounts by region, main account, dranst, kind, price unit and time
Last update: 2023-04-21T08:00:00


Unnamed: 0,variable name,# values,First value,First value label,Last value,Last value label,Time variable
0,OMRÅDE,6,000,All Denmark,081,Region Nordjylland,False
1,FUNK1,6,X,I alt hovedkonto 0-5,5,5 Interest etc.,False
2,DRANST,5,1,1 Current expenditure,7,7 Financing,False
3,ART,52,UE,Expenses exclusive calculating expenses,97,9.7 Internal revenues,False
4,PRISENHED,2,LOBM,"Current prices (DKK 1,000)",INDL,"Per capita, current prices (DKK)",False
5,Tid,16,2007,2007,2022,2022,True


To get an overview of the available values for each variable in the dataset, we make a loop:

In [5]:
# The available values for each variable: 
for variable in tabsum['variable name']:
    print(variable+':')
    display(ind.variable_levels(variable, language='en'))

OMRÅDE:


Unnamed: 0,id,text
0,0,All Denmark
1,84,Region Hovedstaden
2,85,Region Sjælland
3,83,Region Syddanmark
4,82,Region Midtjylland
5,81,Region Nordjylland


FUNK1:


Unnamed: 0,id,text
0,X,I alt hovedkonto 0-5
1,1,1 Healthcare
2,2,2 Social and specialeducation
3,3,3 County development
4,4,4 Joint purpose and administration
5,5,5 Interest etc.


DRANST:


Unnamed: 0,id,text
0,1,1 Current expenditure
1,2,2 Reimbursement from central government
2,3,3 Capital expenditure
3,4,4 Interests
4,7,7 Financing


ART:


Unnamed: 0,id,text
0,UE,Expenses exclusive calculating expenses
1,UI,Expenses inclusive calculating expenses
2,TOT,Total
3,I,Incomes
4,S0,0 Calculating expenses
5,00,"0.0 Balance sheets, entries"
6,01,0.1 Depreciation
7,02,0.2 Changes in stocks
8,03,0.3 Pension provision for civil servants
9,04,0.4 Interest


PRISENHED:


Unnamed: 0,id,text
0,LOBM,"Current prices (DKK 1,000)"
1,INDL,"Per capita, current prices (DKK)"


Tid:


Unnamed: 0,id,text
0,2007,2007
1,2008,2008
2,2009,2009
3,2010,2010
4,2011,2011
5,2012,2012
6,2013,2013
7,2014,2014
8,2015,2015
9,2016,2016


Now we choose the data from the dataset that we want to focus on in this project:

In [6]:
# the _define_base_params -method gives us a nice template (selects all available data)
params = ind._define_base_params(language='en')
params

{'table': 'regr11',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['*']},
  {'code': 'FUNK1', 'values': ['*']},
  {'code': 'DRANST', 'values': ['*']},
  {'code': 'ART', 'values': ['*']},
  {'code': 'PRISENHED', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}

In [7]:
# manually selecting the data we want by editing the above template
params = {'table': 'regr11',
 'format': 'BULK',  
 'lang': 'en',
 'variables': [{'code': 'OMRÅDE', 'values': ['*']},
  {'code': 'FUNK1', 'values': ['X']},           #choosing X = "I alt hovedkonto 0-5", i.e. sum over all main accounts  
  {'code': 'DRANST', 'values': ['1']},          #dranst1 = current expenditure 
  {'code': 'ART', 'values': ['TOT']},           #TOT = total, i.e. all "art" (types) of expenses
  {'code': 'PRISENHED', 'values': ['LOBM']},    #LOBM = Current prices (DKK 1,000)
  {'code': 'Tid', 'values': ['*']}]}            #choosing all available years
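
Instead of editing the template by hand, the same selection can be written as a small function that overrides only the variables of interest. The sketch below is a hypothetical helper (`select_values` is not part of dstapi) and assumes the template has the structure shown in the output of In [6].

```python
import copy

def select_values(template, selections):
    """Return a copy of a DST-style params dict with some variables restricted.

    `selections` maps a variable code to the list of value ids to keep.
    Variables that are not mentioned keep their original values (here ['*']).
    Hypothetical helper, not part of dstapi.
    """
    params = copy.deepcopy(template)  # leave the template untouched
    known = {var['code'] for var in params['variables']}
    unknown = set(selections) - known
    if unknown:
        raise KeyError(f'unknown variable codes: {sorted(unknown)}')
    for var in params['variables']:
        if var['code'] in selections:
            var['values'] = list(selections[var['code']])
    return params

# template with the same shape as the output of In [6]
template = {
    'table': 'regr11',
    'format': 'BULK',
    'lang': 'en',
    'variables': [{'code': code, 'values': ['*']}
                  for code in ['OMRÅDE', 'FUNK1', 'DRANST', 'ART', 'PRISENHED', 'Tid']],
}

params_sketch = select_values(template, {
    'FUNK1': ['X'], 'DRANST': ['1'], 'ART': ['TOT'], 'PRISENHED': ['LOBM'],
})
```

Raising on unknown codes catches typos such as `'TID'` instead of `'Tid'` before any request is sent.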

Now we can load the data from DST via the API, using the selections specified in the `params` dictionary. 

In [8]:
reg_api = ind.get_data(params=params)
reg_api.head(5)

Unnamed: 0,OMRÅDE,FUNK1,DRANST,ART,PRISENHED,TID,INDHOLD
0,Region Syddanmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2009,20462606
1,Region Hovedstaden,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2009,30587411
2,Region Syddanmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2022,28435709
3,Region Hovedstaden,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2022,41964302
4,Region Sjælland,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2022,21327292


We can sort by OMRÅDE and TID to get a nicer structure in the data. 

In [9]:
reg_api.sort_values(by=['OMRÅDE', 'TID'], inplace=True)
reg_api.reset_index(inplace=True) #resetting index 
reg_api.head(5)

Unnamed: 0,index,OMRÅDE,FUNK1,DRANST,ART,PRISENHED,TID,INDHOLD
0,8,All Denmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2007,84398718
1,25,All Denmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2008,90810648
2,10,All Denmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2009,96968991
3,20,All Denmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2010,99429448
4,50,All Denmark,I alt hovedkonto 0-5,1 Current expenditure,Total,"Current prices (DKK 1,000)",2011,99328557
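
The extra `index` column in the output above is the old row index that `reset_index` keeps by default. If that column is not needed, a minimal sketch of the same sort with `drop=True` looks as follows. The small frame only mimics the column names of `reg_api`, with three values copied from the outputs above.

```python
import pandas as pd

# small illustrative frame with the same column names as reg_api
sample = pd.DataFrame({
    'OMRÅDE': ['Region Syddanmark', 'All Denmark', 'All Denmark'],
    'TID': [2009, 2008, 2007],
    'INDHOLD': [20462606, 90810648, 84398718],
})

# sort, then discard the old index instead of keeping it as a column
sample_sorted = sample.sort_values(by=['OMRÅDE', 'TID']).reset_index(drop=True)
print(sample_sorted)
```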


We then choose the exact data we want, rename columns and do some scaling:

In [10]:
exp_df = reg_api[['OMRÅDE', 'TID', 'INDHOLD']] #selecting the relevant columns from the above DataFrame and storing them in a new DataFrame
exp_df = exp_df.rename(columns={'OMRÅDE': 'region', 'TID': 'year', 'INDHOLD': 'expenditure'}) #renaming columns
exp_df['expenditure'] = exp_df['expenditure'].div(10**6)  #scale expenditures from DKK 1,000 to DKK billions 

exp_df.head(5)

Unnamed: 0,region,year,expenditure
0,All Denmark,2007,84.398718
1,All Denmark,2008,90.810648
2,All Denmark,2009,96.968991
3,All Denmark,2010,99.429448
4,All Denmark,2011,99.328557


Lastly, as DST does not offer fixed prices as an option, we adjust everything using the CPI for Denmark. One should probably use a health- or region-specific price index, but here we simply use the general CPI. The CPI data is from DST and is stored in the Excel file CPI_PRIS8.xlsx.

In [11]:
#read CPI data 
filename = 'CPI_PRIS8.xlsx'
cpi = pd.read_excel(filename, skiprows=2)
cpi = cpi.iloc[:-2] #drop last two rows
cpi = cpi.rename(index={0: 'CPI'}) #name index

#calculate growth rate and append
growth_rates = 1+cpi.pct_change(periods=1, axis=1)
cpi.loc['Growth Rates'] = growth_rates.iloc[-1]

#multipliers: the number you have to multiply the current price by to achieve 2022 prices
#(cumulative product of the yearly growth factors, from each year's own factor through 2022)
multiplier = np.cumprod(growth_rates.iloc[-1,::-1])
cpi.loc['Multiplier'] = multiplier[::-1]

#cleaning 
cpi = cpi.transpose()
cpi = cpi.reset_index()
cpi = cpi.rename(columns={'index': 'year'})
cpi = cpi[['year', 'Multiplier']]
cpi['year']=cpi['year'].astype(int)
cpi.head(5)

Unnamed: 0,year,Multiplier
0,2006,
1,2007,1.325254
2,2008,1.30295
3,2009,1.260113
4,2010,1.243677


So now we have the multipliers for each year to adjust from current prices to fixed 2022 prices. Now we use them to do exactly that: 

In [12]:
# merging the datasets 
fixp = pd.merge(exp_df, cpi, on=['year'], how='left')
fixp.sample(5)

Unnamed: 0,region,year,expenditure,Multiplier
7,All Denmark,2014,107.159252,1.146313
29,Region Hovedstaden,2020,39.072816,1.101578
27,Region Hovedstaden,2018,36.631218,1.118918
6,All Denmark,2013,104.810414,1.15529
38,Region Midtjylland,2013,22.305412,1.15529


In [13]:
# multiplying everything up to 2022 fixed prices 
fixp['Product'] = fixp['expenditure'] * fixp['Multiplier']
fixp = fixp[['region', 'year', 'Product']]
fixp = fixp.rename(columns={'Product': 'expenditure'})
exp_df = fixp #set the exp_df to the df with the fixed prices expenditures
exp_df.head()

Unnamed: 0,region,year,expenditure
0,All Denmark,2007,111.849759
1,All Denmark,2008,118.321689
2,All Denmark,2009,122.191868
3,All Denmark,2010,123.65816
4,All Denmark,2011,120.747821
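
As a cross-check on the price adjustment, the multiplier for year $t$ can also be written directly as $\text{CPI}_{2022} / \text{CPI}_t$, which is exactly 1 in the base year. The sketch below uses made-up index values (not the PRIS8 figures) and shows that a cumulative product of growth factors gives the same numbers only when it starts from the factor of year $t+1$. A product that also includes year $t$'s own factor yields $\text{CPI}_{2022} / \text{CPI}_{t-1}$ instead.

```python
import pandas as pd

# hypothetical index values, for illustration only (not the PRIS8 figures)
cpi_index = pd.Series({2020: 100.0, 2021: 102.0, 2022: 110.0}, name='cpi')
base_year = 2022

# direct form: multiplier_t = CPI_base / CPI_t, so the base year gets exactly 1
multiplier_direct = cpi_index.loc[base_year] / cpi_index

# chained form: multiply the growth factors of all years AFTER t
growth = cpi_index / cpi_index.shift(1)   # g_t = CPI_t / CPI_{t-1}
multiplier_chain = growth[::-1].cumprod().shift(1, fill_value=1.0)[::-1]

# hypothetical nominal spending, converted to 2022 prices
nominal = pd.Series({2020: 50.0, 2021: 52.0, 2022: 55.0})
real_2022_prices = nominal * multiplier_direct
```

The direct ratio is easier to audit: checking that the base-year multiplier equals 1 is a quick test for any deflator series.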


We can now make an interactive plot to inspect the total operating expenditures, region by region, from 2007 to 2022 in Denmark.

In [14]:
DLP.plot_exp(exp_df)

[Interactive plot: expenditure over time, with dropdown selectors]

### 1.2. <a id='toc1_2_'></a>[Dataset 2: Fulltime employees over time per region, from KRL](#toc0_)

The second dataset we will be looking at comes from Kommunernes og Regionernes Løndatakontor (KRL). 

The access to KRL's databases goes through their tool SIRKA. SIRKA is a reporting tool, which can produce a range of reports and tables on personnel, pay and absence in the Danish regions and municipalities. 

KRL provides an API at the address https://www.krl.dk/sirka/sirkaApi/tableApi. A request consists of a JSON object, and the API can answer in JSON, CSV or XLSX. 

For more info and documentation of the API follow this link: https://www.krl.dk/#/apibeta/description. 

...

In the following, we access a table with the number of full-time employees per region in January of each year:

In [15]:
# accessing the KRL API
import requests 
import json
from pandas import json_normalize  # the pandas.io.json import path is deprecated

d = {
  "table": "Personale-måned",
  "time": [
    {"y1": str(year), "m1": "01"} for year in range(2023, 2006, -1)  # January of each year, 2023 back to 2007
  ],
  "control": [
    "kom_reg"
  ],
  "data": [
    "fuldtid"
  ],
  "selection": [
    {
      "name": "Udvalgte population",
      "filters": {
        "omr": [
          "1",
          "8"
        ]
      }
    }
  ],
  "options": {
    "totals": True,
    "outputFormat": "json",
    "actions": [],
    "tableName": "Antal ansatte",
    "subLimit": 5,
    "modelName": "SIRKA",
    "timeIncreasing": True
  },
  "dimension": {
    "viewportHeight": 591,
    "viewportWidth": 638,
    "xsMaxWidth": 768,
    "smMaxWidth": 992,
    "mdMaxWidth": 1200,
    "CONSTANTS": {
      "XS": 0,
      "SM": 1,
      "MD": 2,
      "LG": 3,
      "MAIL": 4
    }
  }
}

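r = requests.post("https://www.krl.dk/sirka/sirkaApi/tableApi", data=json.dumps(d))
r.raise_for_status()  # stop early if the request failed

response_data = r.json()  # parse the JSON body (avoids the private r._content and shadowing the built-in dict)
emp_df = pd.json_normalize(response_data)  # flatten the records into a DataFrame
display(emp_df)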



Unnamed: 0,_YM,_BM,kom_reg,fuldtid
0,200701,Udvalgte population,081,11143.754125
1,200701,Udvalgte population,082,24408.394261
2,200701,Udvalgte population,083,23871.169375
3,200701,Udvalgte population,084,35617.518759
4,200701,Udvalgte population,085,15271.505287
...,...,...,...,...
148,202301,Udvalgte population,085,17323.430604
149,202301,Udvalgte population,999,890.179819
150,202301,Udvalgte population,,128846.836405
151,202301,,,128846.836405
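
To see how the JSON answer turns into the table above, the sketch below parses a small hand-written response offline. It assumes the API returns a JSON array of flat records with the keys shown in the output. The first two records reuse values from the table, while the third (a totals row with a missing region code) is made up for illustration.

```python
import json
import pandas as pd

# hypothetical response body: a JSON array of flat records
body = '''[
  {"_YM": "200701", "_BM": "Udvalgte population", "kom_reg": "081", "fuldtid": 11143.754125},
  {"_YM": "200701", "_BM": "Udvalgte population", "kom_reg": "082", "fuldtid": 24408.394261},
  {"_YM": "200701", "_BM": "Udvalgte population", "kom_reg": null, "fuldtid": 35552.148386}
]'''

records = json.loads(body)             # list of dicts; JSON null becomes None
sample_df = pd.json_normalize(records)  # one row per record, one column per key

# rows without a region code are the totals rows
print(sample_df[sample_df['kom_reg'].isna()])
```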


Let's clean the data:

In [16]:
# rename columns 
emp_df = emp_df.rename(columns={'_YM': 'year', 'kom_reg': 'region', 'fuldtid': 'fulltime_emp'}) 

# select the relevant columns 
emp_df = emp_df[['year', 'region', 'fulltime_emp']] 

# give regions names based on their kom_reg number, which can be found at KRL or DST websites 
region_names = {
    '081': 'Region Nordjylland',
    '082': 'Region Midtjylland',
    '083': 'Region Syddanmark',
    '084': 'Region Hovedstaden',
    '085': 'Region Sjælland',
    '999': 'Øvrige',
}
emp_df['region'] = emp_df['region'].replace(region_names)

# drop rows with missing values: the API returns the "i alt" (total) rows without a region code
emp_df = emp_df.dropna()

# sort values on year and region
emp_df.sort_values(by=['year', 'region'], inplace=True) 

emp_df.sample(10)


Unnamed: 0,year,region,fulltime_emp
1,200701,Region Midtjylland,24408.394261
59,201301,Øvrige,502.242758
45,201201,Region Nordjylland,12763.882086
76,201501,Region Sjælland,15630.318222
9,200801,Region Nordjylland,11321.328307
66,201401,Region Hovedstaden,38913.814186
113,201901,Øvrige,761.284779
92,201701,Region Syddanmark,23896.940634
129,202101,Region Hovedstaden,41437.232412
130,202101,Region Sjælland,16656.824661


Because the "All Denmark" totals were dropped together with the missing values, we recreate them by summing over the regions in each year.

In [17]:
# making the "All Denmark" rows, i.e. the sum over all regions in each year 
df_alt = emp_df.groupby('year', as_index=False)['fulltime_emp'].sum()

# adding the region column with values "All Denmark", to be ready to combine with emp_df
df_alt['region'] = 'All Denmark' 
df_alt.tail()

Unnamed: 0,year,fulltime_emp,region
12,201901,120500.020156,All Denmark
13,202001,122103.375728,All Denmark
14,202101,128186.597949,All Denmark
15,202201,131915.486956,All Denmark
16,202301,128846.836405,All Denmark
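
The summed value for 202301 (128846.836405) equals the total that the API returned in its own totals row, which suggests the reconstruction is sound. A minimal sketch of how such a check could be automated is shown below. All numbers and the `reported_totals` series are hypothetical stand-ins for totals kept aside before the missing values are dropped.

```python
import numpy as np
import pandas as pd

# hypothetical long-format data with two regions and two periods
emp_sample = pd.DataFrame({
    'year': ['202201', '202201', '202301', '202301'],
    'region': ['Region A', 'Region B', 'Region A', 'Region B'],
    'fulltime_emp': [100.5, 200.25, 110.0, 190.75],
})

# hypothetical totals as reported by the data source
reported_totals = pd.Series({'202201': 300.75, '202301': 300.75})

# sum over regions per period and compare with the reported totals
summed = emp_sample.groupby('year')['fulltime_emp'].sum()
matches = np.isclose(summed, reported_totals.reindex(summed.index))
print(dict(zip(summed.index, matches)))
```

Comparing with `np.isclose` rather than `==` avoids false alarms from floating-point rounding.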


Now we add the "All Denmark" data to the regional employment data. 

In [18]:
# combine the emp_df with the df_alt
df_combined = pd.concat([emp_df,df_alt], axis=0)

# drop the month indicating characters in year
df_combined['year'] = df_combined['year'].apply(lambda x: x[:-2])

# reset index
df_combined.reset_index(inplace=True)
df_combined = df_combined.drop('index', axis=1)

# the regional rows were already sorted by year and region above; the "All Denmark" rows follow at the end

# renaming and display final dataset
emp_df = df_combined
emp_df.head()


Unnamed: 0,year,region,fulltime_emp
0,2007,Region Hovedstaden,35617.518759
1,2007,Region Midtjylland,24408.394261
2,2007,Region Nordjylland,11143.754125
3,2007,Region Sjælland,15271.505287
4,2007,Region Syddanmark,23871.169375
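
Slicing the string keeps `year` as text. A merge on `year` needs the same dtype in both DataFrames, so if the year column in the expenditure data is numeric (an assumption; it depends on how the DST data were parsed), the employment years can be converted to integers as sketched here. The two values are the Region Nordjylland figures from the output above.

```python
import pandas as pd

emp_sample = pd.DataFrame({
    'year': ['200701', '200801'],
    'region': ['Region Nordjylland', 'Region Nordjylland'],
    'fulltime_emp': [11143.754125, 11321.328307],
})

# keep the four year digits and convert them to integers
emp_sample['year'] = emp_sample['year'].str[:4].astype(int)
print(emp_sample.dtypes)
```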


Now let us plot the number of full-time employees region by region, just as we did with expenditures earlier.

In [19]:
DLP.plot_emp(emp_df)

[Interactive plot: full-time employees over time, with dropdown selectors]

## 2. <a id='toc2_'></a>[Merging the data sets](#toc0_)

Now we want to merge the data for expenditures and employment for the regions.

First, let us understand the differences between the datasets we have set up.

**Find differences:**

In [20]:
# find differences
diff_x = [x for x in exp_df.year.unique() if x not in emp_df.year.unique()] 
print(f'years in exp_df data, but not in emp_df data: {diff_x}')

diff_y = [y for y in exp_df.region.unique() if y not in emp_df.region.unique()] 
print(f'regions in exp_df data, but not in emp_df data: {diff_y}')

diff_z = [z for z in emp_df.region.unique() if z not in exp_df.region.unique()] 
print(f'regions in emp_df data, but not in exp_df data: {diff_z}')

years in exp_df data, but not in emp_df data: []
regions in exp_df data, but not in emp_df data: []
regions in emp_df data, but not in exp_df data: ['Øvrige']


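Because we cleaned the data while setting up the DataFrames, there are no major differences between the datasets. The only mismatch is the residual category 'Øvrige', which appears in the employment data but not in the expenditure data.

We now perform a **left join** (one-to-one), which keeps every observation in the left dataset (exp_df) and adds the matching values from emp_df. Rows that exist only in emp_df, such as 'Øvrige', are dropped.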

This gives us the final combined dataset which we have been aiming for. 

In [21]:
# merging the datasets 
reg_exp_emp = pd.merge(exp_df, emp_df, on=['region','year'], how='left')
reg_exp_emp.head(10)

Unnamed: 0,region,year,expenditure,fulltime_emp
0,All Denmark,2007,111.849759,110800.519217
1,All Denmark,2008,118.321689,109977.924505
2,All Denmark,2009,122.191868,113612.885754
3,All Denmark,2010,123.65816,118856.231359
4,All Denmark,2011,120.747821,116444.296419
5,All Denmark,2012,122.574946,115318.482198
6,All Denmark,2013,121.086381,117186.444884
7,All Denmark,2014,122.838028,119226.61548
8,All Denmark,2015,124.802303,120088.195058
9,All Denmark,2016,127.424307,118814.060742
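
When merging, pandas can verify the one-to-one assumption and report where each row came from. The sketch below uses the `validate` and `indicator` arguments of `pd.merge` on a small sample. The regional values for 2007 are copied from this notebook's data, and the 'Øvrige' value is a placeholder.

```python
import pandas as pd

exp_sample = pd.DataFrame({
    'region': ['Region Nordjylland', 'Region Sjælland'],
    'year': [2007, 2007],
    'expenditure': [11.585159, 17.532909],
})
emp_sample = pd.DataFrame({
    'region': ['Region Nordjylland', 'Region Sjælland', 'Øvrige'],
    'year': [2007, 2007, 2007],
    'fulltime_emp': [11143.754125, 15271.505287, 1.0],  # 'Øvrige' value is a placeholder
})

merged = pd.merge(
    exp_sample, emp_sample,
    on=['region', 'year'],
    how='left',
    validate='one_to_one',  # raises MergeError if a key is duplicated on either side
    indicator=True,         # adds a _merge column: 'both' or 'left_only'
)
print(merged['_merge'].value_counts())
```

Any `left_only` rows would point to expenditure observations without a matching employment figure.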


We then want to make an interactive plot where we can look at the development in expenses and employees region by region. 

The Regions are mainly responsible for health care services in Denmark, so the number of employees is naturally one of the key variables in understanding the development in expenses. We can get an impression of the relationship using the interactive plot below. 

From the plot, there appears to be a clear relationship between the number of employees and expenditures. However, even this simple analysis shows that the relationship varies considerably across regions.

For example, Region Midtjylland has had a relatively stable number of employees since 2013, but its expenditures have increased anyway. In Region Hovedstaden, both the number of employees and expenditures have been increasing. Such differences may be explained by, for example, differences in demand for services and in the ability to recruit workers. They also show that forces other than the number of employees drive the increase in expenses. 

In [22]:
DLP.plot_merged(reg_exp_emp)

[Interactive plot: full-time employees and expenditure over time, with dropdown selectors]
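
One way to put numbers on the relationship is to compare growth in expenditure with growth in employment, and to compute expenditure per full-time employee. The sketch below does this for two regions, using the 2007 and 2022 values from this notebook's merged data (expenditure in DKK billions, price-adjusted as computed above). The derived column names are illustrative.

```python
import pandas as pd

sample = pd.DataFrame({
    'region': ['Region Hovedstaden', 'Region Hovedstaden',
               'Region Midtjylland', 'Region Midtjylland'],
    'year': [2007, 2022, 2007, 2022],
    'expenditure': [35.382572, 45.195438, 23.727946, 31.137855],
    'fulltime_emp': [35617.518759, 42139.14861, 24408.394261, 30484.169791],
})

# expenditure per full-time employee, in DKK millions (1 billion = 1,000 million)
sample['exp_per_fte_mio'] = sample['expenditure'] * 1000 / sample['fulltime_emp']

# percentage growth from 2007 to 2022 for each variable
cols = ['expenditure', 'fulltime_emp', 'exp_per_fte_mio']
first = sample[sample['year'] == 2007].set_index('region')[cols]
last = sample[sample['year'] == 2022].set_index('region')[cols]
growth_pct = (last / first - 1) * 100
print(growth_pct.round(1))
```

A positive growth rate in `exp_per_fte_mio` means expenditure grew faster than the number of employees over the period.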

## 3. <a id='toc3_'></a>[Calculating summary statistics](#toc0_)

To get a quick overview of the data, we show some **summary statistics** on a meaningful aggregation.

First, we drop the "All Denmark" rows, since including the national total alongside the regions would distort the statistics.

Then we pivot the dataset we have created.

In [23]:
# drop all rows where Region is 'All Denmark' (to get meaningful descriptive statistics)
reg_exp_emp = reg_exp_emp.drop(reg_exp_emp[reg_exp_emp['region'] == 'All Denmark'].index)

In [24]:
# pivot: "region" becomes the row index and "year" becomes a column level
df_pivot = reg_exp_emp.pivot(index='region', columns='year')

# swap the two levels of the multi-level column index, so that the years become the top level
df_pivot = df_pivot.swaplevel(axis=1)

# sort the column index by the top level, which is now the years
df_pivot = df_pivot.sort_index(axis=1)

# rename and print the pivoted dataframe
reg_exp_emp = df_pivot
display(reg_exp_emp)


year,2007,2007,2008,2008,2009,2009,2010,2010,2011,2011,...,2018,2018,2019,2019,2020,2020,2021,2021,2022,2022
region,expenditure,fulltime_emp,expenditure,fulltime_emp,expenditure,fulltime_emp,expenditure,fulltime_emp,expenditure,fulltime_emp,...,expenditure,fulltime_emp,expenditure,fulltime_emp,expenditure,fulltime_emp,expenditure,fulltime_emp,expenditure,fulltime_emp
Region Hovedstaden,35.382572,35617.518759,37.13929,34602.377841,38.543588,35741.521158,38.28894,37829.511637,37.377024,36252.762736,...,40.987335,38649.541892,41.447802,39092.312202,43.041751,39412.585161,44.413748,41437.232412,45.195438,42139.14861
Region Midtjylland,23.727946,24408.394261,25.074702,25427.344754,25.81874,26448.80972,26.503497,27843.600491,25.832499,27558.52501,...,27.866908,27254.629531,28.203453,27242.838212,29.533833,27517.651209,30.508896,29134.004179,31.137855,30484.169791
Region Nordjylland,11.585159,11143.754125,12.453331,11321.328307,12.90229,12113.164528,12.987085,12918.091286,12.595956,12824.35372,...,13.371206,12405.144169,13.475563,12433.889366,14.124125,12557.495233,14.603258,13225.388547,14.912652,13423.704289
Region Sjælland,17.532909,15271.505287,18.710484,14772.524183,19.142057,15027.910046,19.542165,15517.610603,18.741028,15179.296882,...,20.377304,15789.60861,20.516175,15684.546298,21.516331,15998.882329,22.472986,16656.824661,22.969435,17605.566451
Region Syddanmark,23.621172,23871.169375,24.943883,23394.697863,25.785192,23861.508604,26.336473,24203.558309,26.201315,24339.892253,...,27.773015,24841.622235,28.146161,25285.149299,29.543556,25726.86799,29.857796,26877.504902,30.62518,27380.872419
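
The effect of the pivot, `swaplevel` and `sort_index` steps is easier to follow on a small sample. The sketch below repeats them for two regions and two years (values copied from the table above) and then shows two ways of slicing the resulting two-level columns.

```python
import pandas as pd

long_df = pd.DataFrame({
    'region': ['Region Nordjylland', 'Region Nordjylland',
               'Region Sjælland', 'Region Sjælland'],
    'year': [2007, 2008, 2007, 2008],
    'expenditure': [11.585159, 12.453331, 17.532909, 18.710484],
    'fulltime_emp': [11143.754125, 11321.328307, 15271.505287, 14772.524183],
})

wide = long_df.pivot(index='region', columns='year')  # columns: (variable, year)
wide = wide.swaplevel(axis=1).sort_index(axis=1)       # columns: (year, variable)

# one variable across all years, and one year across all variables
emp_only = wide.xs('fulltime_emp', axis=1, level=1)
year_2008 = wide[2008]
print(wide.columns.tolist())
```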


Now we have a dataset that lends itself to summary statistics, for example for employment. It can be used to quickly look up summary statistics about employment in the Danish regions:

In [25]:
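# summary statistics across the five regions, for the employment columns only
reg_exp_emp.xs('fulltime_emp', axis=1, level=1, drop_level=False).describe()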

year,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019,2020,2021,2022
Unnamed: 0_level_1,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp,fulltime_emp
count,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0
mean,22062.468361,21903.654589,22638.582811,23662.474465,23230.96612,23009.728819,23336.840425,23740.874508,23935.16733,23713.107881,23656.365605,23788.109287,23947.747076,24242.696385,25466.19094,26206.692312
std,9458.457314,9202.388617,9439.374183,10003.250797,9521.707165,9365.925909,9857.361881,10462.026122,10677.872695,10702.154775,10536.841521,10338.480778,10522.042692,10577.783512,11157.519933,11301.579813
min,11143.754125,11321.328307,12113.164528,12918.091286,12824.35372,12763.882086,12677.143903,12622.031084,12481.28113,12200.701636,12393.890065,12405.144169,12433.889366,12557.495233,13225.388547,13423.704289
25%,15271.505287,14772.524183,15027.910046,15517.610603,15179.296882,15052.018656,15185.360986,15365.711316,15630.318222,15532.997042,15654.200622,15789.60861,15684.546298,15998.882329,16656.824661,17605.566451
50%,23871.169375,23394.697863,23861.508604,24203.558309,24339.892253,24555.506007,24339.851404,24403.072815,24500.139137,24209.07247,23896.940634,24841.622235,25285.149299,25726.86799,26877.504902,27380.872419
75%,24408.394261,25427.344754,26448.80972,27843.600491,27558.52501,26786.173905,27290.004567,27399.743136,27557.672058,27220.613991,27159.791736,27254.629531,27242.838212,27517.651209,29134.004179,30484.169791
max,35617.518759,34602.377841,35741.521158,37829.511637,36252.762736,35891.063441,37191.841266,38913.814186,39506.4261,39402.154266,39177.004969,38649.541892,39092.312202,39412.585161,41437.232412,42139.14861
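
Each column of this table summarises only five observations, one per region. The sketch below reproduces the 2007 column from the five regional values shown earlier. With five observations, the quartiles coincide with the sorted values: the 25% quantile is the second smallest, the median the third, and the 75% quantile the fourth.

```python
import pandas as pd

# full-time employees in 2007, copied from the pivoted table
emp_2007 = pd.Series({
    'Region Hovedstaden': 35617.518759,
    'Region Midtjylland': 24408.394261,
    'Region Nordjylland': 11143.754125,
    'Region Sjælland': 15271.505287,
    'Region Syddanmark': 23871.169375,
}, name='fulltime_emp')

stats_2007 = emp_2007.describe()  # count, mean, std (sample, ddof=1), min, quartiles, max
print(stats_2007)
```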
