# Data API for Statistics Denmark (DST)

This notebook gives a short demo of how to call the DST API.

An alternative for Python is Kristian Urup Larsen's [PyDST](https://github.com/Kristianuruplarsen/PyDST) repository, which implements functions for pulling data.

The notebook defines a function that takes a table code and an optional dict of query options and returns a pandas DataFrame. It then shows an example of how to apply the function and structure the results.

In [52]:
import requests
from io import StringIO

import pandas as pd

dst_api_base = 'https://api.statbank.dk/v1/data/%s/csv?'
options_base =  {'valuePresentation':'Default',
                'timeOrder':'Ascending',
                'allowVariablesInHead':'true'}

def dst_api_dataframe(dataset, options_input=None):
    '''
    This function produces a dataframe for the desired 
    Statistics Denmark (DST) table. It can be restricted 
    to pull particular column values.
    
    Parameters
    ----------
    dataset : str
        Name/code of the DST table to fetch.
    options_input : dict, optional
        Extra options that can be specified to pull particular
        column values. E.g. {'ABC':'*'} will return all values 
        of the column 'ABC'. Note that multiple column values 
        must be separated with comma_char, see the example below.
        
    Returns    
    -------
    df_out : pandas.DataFrame
        The requested table as a pandas DataFrame.
    '''
    
    url = dst_api_base % dataset
    
    # copy the defaults so repeated calls do not modify options_base
    options = dict(options_base)
    options.update(options_input or {})
    
    # append the options as a query string
    url += '&'.join('%s=%s' % (k, v) for k, v in options.items())
    
    response = requests.get(url)
    response.raise_for_status()  # fail early on HTTP errors
    
    df_out = pd.read_csv(StringIO(response.text), sep=';') 
    
    return df_out 
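
The query string can also be assembled with the standard library rather than by hand. The sketch below is one possible variant and not part of the function above: `build_dst_url`, `DST_API_BASE` and `DEFAULT_OPTIONS` are hypothetical names, and it assumes the API accepts percent-encoded values in the usual way. `urllib.parse.urlencode` turns commas into `%2C` itself, so values can be written with plain commas.

```python
from urllib.parse import urlencode

DST_API_BASE = 'https://api.statbank.dk/v1/data/%s/csv?'
DEFAULT_OPTIONS = {'valuePresentation': 'Default',
                   'timeOrder': 'Ascending',
                   'allowVariablesInHead': 'true'}

def build_dst_url(dataset, options_input=None):
    """Hypothetical helper: return the request URL for a DST table."""
    # merge into a fresh dict so the defaults are never modified
    options = {**DEFAULT_OPTIONS, **(options_input or {})}
    # safe='*' keeps the wildcard readable; commas become %2C
    return DST_API_BASE % dataset + urlencode(options, safe='*')

url = build_dst_url('KM5', {'SOGN': '*', 'TID': '2008,2009,2010'})
print(url)
```

Building the URL separately from the request makes it easy to inspect the exact query before any data is fetched.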
    


# Example of calling API

In the following example we call the API to fetch the table `KM5`, which contains parish-level demographic data. We pull all parishes and all age groups for the years 2008-2010.

In [53]:
parishes_str = '*' # all parishes
ages_str = '*' # all age groups
years = range(2008, 2011) # 2008, 2009 and 2010

comma_char = '%2C' # URL-encoded comma, separates values in the url

request_options =  {'SOGN': parishes_str, 
                    'TID': comma_char.join(map(str, years)),
                    'ALDER': ages_str}
    
dst_data = dst_api_dataframe('KM5', options_input=request_options)
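
A selection can be either a wildcard string or a collection of values, and joining over a plain string would split it into single characters. The following sketch shows one way to handle both cases; `join_values` and `COMMA_CHAR` are hypothetical names introduced only for illustration.

```python
COMMA_CHAR = '%2C'  # URL-encoded comma

def join_values(values):
    """Hypothetical helper: build the value string for one variable."""
    # a plain string (e.g. '*') is passed through unchanged;
    # joining over it would split it into single characters
    if isinstance(values, str):
        return values
    return COMMA_CHAR.join(map(str, values))

print(join_values(range(2008, 2011)))  # 2008%2C2009%2C2010
print(join_values('*'))                # *
```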

Next comes basic structuring of the dataset, which includes renaming columns to English names. The step also preserves the codes for municipality and parish for easy joining with other datasets. Note that `sognekode` means parish code.

In [54]:
# lower bound of each age group: the first run of digits in the ALDER label
dst_data['age_lb'] = dst_data.ALDER.str.extract('([0-9]+)', expand=False).astype(int)

dst_data.drop('ALDER', axis=1, inplace=True)

col_map = {'TID':'year', 
           'INDHOLD':'obs_value'}

dst_data['sognekode'] = dst_data.SOGN.str[:4]
dst_data.rename(columns=col_map, inplace=True)
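
To see what the structuring step does without calling the API, the sketch below applies the same operations to a tiny hand-made table. The rows are invented and the label formats (a four-digit code followed by a name, an age label starting with digits) are assumptions about the raw response, not actual DST output.

```python
import pandas as pd

# invented rows in the assumed shape of the raw response
toy = pd.DataFrame({
    'SOGN': ['7001 Example parish', '7002 Another parish'],
    'ALDER': ['0-4 years', '20-24 years'],
    'TID': [2008, 2008],
    'INDHOLD': [10, 30],
})

# lower bound of the age group = first run of digits in the label
toy['age_lb'] = toy['ALDER'].str.extract(r'(\d+)', expand=False).astype(int)
# parish code = first four characters of the parish label
toy['sognekode'] = toy['SOGN'].str[:4]

# drop the raw age label and switch to English column names
toy = toy.drop(columns='ALDER').rename(columns={'TID': 'year',
                                                'INDHOLD': 'obs_value'})
print(toy)
```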

### Structuring the data

We compute the share of young people in each parish.

In [67]:
# define youth variable
dst_data['young'] = dst_data.age_lb.between(20,29)
c_share = 'Share of young (20-29)'
c_count = 'Count of young (20-29)'

# count number of young using groupby
young_count = dst_data\
                .groupby(['sognekode', 'year', 'young'])\
                .obs_value.sum()\
                .unstack(level=2)

# share of young = count of young / total count per parish and year
young_share = young_count\
                .pipe(lambda df: df[True]/df.sum(axis=1))\
                .rename(c_share)

# combine share and count, blanking rows that contain missing values
youth_stats = pd.concat([young_share, 
                         young_count[True].rename(c_count)], axis=1)\
                .pipe(lambda df: df.mask(df.isnull().any(axis=1)).reset_index())
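
The share computation can be checked by hand on a small example. The sketch below uses invented numbers and the hypothetical output names `share_young` and `count_young`; it follows the same groupby, unstack and divide steps as the cell above.

```python
import pandas as pd

# invented population counts for two parishes in one year
toy = pd.DataFrame({
    'sognekode': ['7001'] * 4 + ['7002'] * 2,
    'year': [2008] * 6,
    'age_lb': [0, 20, 25, 40, 20, 50],
    'obs_value': [10, 30, 20, 40, 5, 15],
})

# flag age groups whose lower bound lies in 20-29 (inclusive)
toy['young'] = toy['age_lb'].between(20, 29)

# one row per parish and year, one column per value of `young`
counts = toy.groupby(['sognekode', 'year', 'young'])['obs_value']\
            .sum()\
            .unstack('young')

# young population divided by total population
share = counts[True] / counts.sum(axis=1)

stats = pd.DataFrame({'share_young': share,
                      'count_young': counts[True]}).reset_index()
print(stats)
```

In this toy table parish `7001` has 50 young out of 100 people and parish `7002` has 5 out of 20, so the shares are 0.5 and 0.25.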