# Combining the Data

I now combine all the data into a single DataFrame, which keeps the analysis flexible.

## Data
### Library statistics

In [1]:
# import modules
import pandas as pd
import numpy as np

In [2]:
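# read the DBS export; header=[2,3] builds a two-level column header from rows 2 and 3 (zero-based)
df = pd.read_csv('./data/1_dbs_data.csv', delimiter=';', header=[2,3], encoding='ISO-8859-1', low_memory=False)

# keep only the totals row ('Summe')
sum_row = df.loc[(df[('NR','Unnamed: 0_level_1')] == 'Summe')]

# work on strings so the German number format can be cleaned up
library_df = sum_row.astype('string')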

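# remove the thousands separator and turn the decimal comma into a point;
# regex=False so that '.' is matched literally and not as "any character"
# (iteritems() was removed in pandas 2.0, so iterate over the columns directly)
for label in library_df.columns:
    library_df[label] = library_df[label].str.replace('.', '', regex=False)
    library_df[label] = library_df[label].str.replace(',', '.', regex=False)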

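# wide -> long: one row per (Auspraegung, Jahr) pair
library_df = pd.melt(library_df)
library_df = library_df.rename(columns={"variable_0": "Auspraegung", 'variable_1': 'Jahr', 'value': 'Wert'})

# drop rows by label that should not end up in the pivot (labels hard-coded for this file)
library_df = library_df.drop([0, 1, 2, 3, 103])

# long -> wide: one row per year, one column per Auspraegung
library_df = library_df.pivot(index='Jahr', columns='Auspraegung', values='Wert')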

# convert every column to float (iteritems() no longer exists in pandas 2.0)
library_df = library_df.astype(float)

library_df.reset_index(inplace=True)

library_df

| Auspraegung | Jahr | Ausg. Erwerbung | Best. virt.Best. | Bestand insges | Entl. ab 60 J. | Entl. bis 12 J. | Entl. virt.Best. | Entleih. insges. | Entleiher | Lfd. Ausgaben |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2010 | 96842620.0 | 457360.0 | 123027730.0 | 737336.0 | 2076475.0 | 961683.0 | 375793896.0 | 7844530.0 | 822217200.0 |
| 1 | 2011 | 99526560.0 | 591069.0 | 122796523.0 | 760140.0 | 2081832.0 | 1443091.0 | 378719659.0 | 7725936.0 | 845951000.0 |
| 2 | 2012 | 100690600.0 | 829951.0 | 122025276.0 | 805601.0 | 2034793.0 | 3418398.0 | 375565974.0 | 7604293.0 | 858229300.0 |
| 3 | 2013 | 104213600.0 | 1265499.0 | 121235283.0 | 819270.0 | 2015921.0 | 7060821.0 | 373419034.0 | 7479921.0 | 874840000.0 |
| 4 | 2014 | 104353400.0 | 1622216.0 | 119858476.0 | 846045.0 | 2003788.0 | 11436854.0 | 362542582.0 | 7355708.0 | 902105000.0 |
| 5 | 2015 | 104934800.0 | 1134853.0 | 117511823.0 | 868571.0 | 1973670.0 | 16314414.0 | 361763984.0 | 7249936.0 | 921825100.0 |
| 6 | 2016 | 108944500.0 | 1491330.0 | 116080149.0 | 919600.0 | 2034986.0 | 19813365.0 | 355303531.0 | 7431335.0 | 948461400.0 |
| 7 | 2017 | 109927700.0 | 1305260.0 | 113808636.0 | 932875.0 | 2028492.0 | 24350279.0 | 344544584.0 | 7306049.0 | 946273400.0 |
| 8 | 2018 | 111316100.0 | 1780716.0 | 112176481.0 | 970320.0 | 2063797.0 | 28979729.0 | 338358540.0 | 7264051.0 | 971207500.0 |
| 9 | 2019 | 114191000.0 | 2977275.0 | 110813254.0 | 999957.0 | 2108134.0 | 33410061.0 | 338357207.0 | 7309666.0 | 1006269000.0 |
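
The reshaping in cell `In [2]` can be hard to follow on the real file, so here is a minimal sketch of the same idea on toy data. The names `toy_row`, `toy_long` and `toy_wide` and the two-measure layout are assumptions for illustration; only the four number strings mirror values from the output. It shows the three steps: clean the German number format, melt the two-level header into rows, and pivot back to one row per year.

```python
import pandas as pd

# Toy stand-in for the 'Summe' row: two header levels (measure, year),
# values as German-formatted strings. Layout is assumed, not the real file.
columns = pd.MultiIndex.from_tuples([
    ('Entleiher', '2010'), ('Entleiher', '2011'),
    ('Lfd. Ausgaben', '2010'), ('Lfd. Ausgaben', '2011'),
])
toy_row = pd.DataFrame(
    [['7.844.530', '7.725.936', '822.217,2', '845.951,0']],
    columns=columns,
)

# German number format -> float-parsable strings (literal replacement, no regex)
toy_row = toy_row.apply(
    lambda col: col.str.replace('.', '', regex=False).str.replace(',', '.', regex=False)
)

# wide -> long: one row per (measure, year) pair
toy_long = pd.melt(toy_row)
toy_long.columns = ['Auspraegung', 'Jahr', 'Wert']

# long -> wide: one row per year, one column per measure
toy_wide = (
    toy_long.pivot(index='Jahr', columns='Auspraegung', values='Wert')
    .astype(float)
    .reset_index()
)
print(toy_wide)
```

If the raw file contained only numeric cells, passing `thousands='.'` and `decimal=','` to `pd.read_csv` would probably make the string replacement unnecessary; I have not checked this against the DBS export.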


### Population statistics

In [3]:
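# read the population export; the column header sits on row 5 (zero-based)
pop_df = pd.read_csv('./data/1_pop_data.csv', delimiter=';', encoding='ISO-8859-1', header=[5])

# drop rows 11-14 by label (hard-coded for this file)
pop_df = pop_df.drop([11, 12, 13, 14])

pop_df = pop_df.rename(columns={"Unnamed: 0": "Jahr"})

pop_df['Anzahl'] = pop_df['Anzahl'].astype(int)

# parse the reference date (dd.mm.yyyy) into a datetime
pop_df['Jahr'] = pd.to_datetime(pop_df['Jahr'], format='%d.%m.%Y')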

pop_df

| | Jahr | Anzahl |
| --- | --- | --- |
| 0 | 2010-12-31 | 81751602 |
| 1 | 2011-12-31 | 80327900 |
| 2 | 2012-12-31 | 80523746 |
| 3 | 2013-12-31 | 80767463 |
| 4 | 2014-12-31 | 81197537 |
| 5 | 2015-12-31 | 82175684 |
| 6 | 2016-12-31 | 82521653 |
| 7 | 2017-12-31 | 82792351 |
| 8 | 2018-12-31 | 83019213 |
| 9 | 2019-12-31 | 83166711 |
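
The `Jahr` column here holds full reference dates, while the library data uses plain years. A minimal sketch of how the year could be extracted from such dates, should a common key be needed later; the two sample dates and the names `dates`, `parsed` and `years` are made up for illustration.

```python
import pandas as pd

# Made-up reference dates in the same dd.mm.yyyy layout as the population file
dates = pd.Series(['31.12.2010', '31.12.2011'])

# An explicit format avoids any day/month ambiguity
parsed = pd.to_datetime(dates, format='%d.%m.%Y')

# .dt.year yields the calendar year as an integer
years = parsed.dt.year
print(years.tolist())  # [2010, 2011]
```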


## Merging the data

The DataFrame contains the following fields for the **period 2010-2020**:

| Column name | Description |
| --- | --- |
| year | Year |
| population | Population size |
| digital_readers | Number of people who read e-books |
| readers_weekly | Number of people who read several times a week |
| readers_monthly | Number of people who read several times a month |
| readers_once_month | Number of people who read once a month |
| ebook_sales | Number of e-books sold |
| lenders | Number of active library users |
| lendings | Number of media items borrowed |
| digital_lendings | Number of digital media items borrowed |

In [4]:
df = pd.DataFrame(
    {
        # general info
        'year' : ([2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020]),
        'population' : pop_df['Anzahl'],
        
        # reading habits
        'digital_readers': ([np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 7650000, 8520000, 8680000, 9120000]),

        'readers_weekly' : ([20400000, 20500000, 20000000, 19300000, 18500000, 18100000, 18100000, 11640000, 11930000, 11960000, 12620000]),
        'readers_monthly' : ([20900000, 20500000, 20600000, 21100000, 21300000, 20200000, 20100000, 15070000, 14840000, 14730000, 14710000]),
        'readers_once_month' : ([12000000, 11500000, 10100000, 10000000, 10400000, 10400000, 10100000, 6870000, 7060000, 7110000, 7030000]),

        # publishing information
        'ebook_sales': ([1900000, 4300000, 13200000, 21500000, 24800000, 27000000, 28100000, 29100000, 32800000, 32400000, 35800000]),

        # library information
        'lenders': library_df['Entleiher'],
        'lendings': library_df['Entleih. insges.'],
        'digital_lendings': library_df['Entl. virt.Best.'], 
        
    }
)
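
The cell above mixes plain lists with columns taken from `pop_df` and `library_df`. A minimal sketch of how pandas treats the two kinds of input, on toy values (the names `toy_pop`, `toy_lenders` and `toy_df` and all numbers are made up): lists are placed by position, while Series are aligned on their index labels, so a Series that is one year short yields `NaN` in the missing row rather than an error.

```python
import pandas as pd

# Toy stand-ins (made-up values): 3 years of population, 2 years of library data
toy_pop = pd.Series([100, 101, 102])   # index labels 0, 1, 2
toy_lenders = pd.Series([10.0, 11.0])  # index labels 0, 1

toy_df = pd.DataFrame({
    'year': [2010, 2011, 2012],  # plain list: its length must match the final index
    'population': toy_pop,       # Series: aligned on index labels
    'lenders': toy_lenders,      # shorter Series: label 2 is missing and becomes NaN
})
print(toy_df)
```

Because the alignment relies on the row labels 0, 1, 2, ... standing for the same years in every source, merging on an explicit year column would be a more defensive alternative.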

### Aggregating new data
We also derive a few additional figures:
- Number of people who read regularly (the sum of weekly readers, monthly readers, and those who read only once a month)
- Share of the population that reads e-books
- Share of the population that reads regularly

In [5]:
# regular readers
df['readers'] = df['readers_weekly'] + df['readers_monthly'] + df['readers_once_month']

# readers relative to the population over the years
df['perc_digital_readers'] = df['digital_readers'] / df['population']
df['perc_readers'] = df['readers'] / df['population']
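
As a quick sanity check of these formulas, here is a small sketch that recomputes them for two years, 2016 and 2019, using the input values from the cells above. The frame name `check` is an assumption for illustration. It also shows that a missing `digital_readers` value simply propagates as `NaN` into the share.

```python
import numpy as np
import pandas as pd

# Input values for 2016 and 2019, copied from the cells above
check = pd.DataFrame({
    'year': [2016, 2019],
    'population': [82521653, 83166711],
    'digital_readers': [np.nan, 8680000],
    'readers_weekly': [18100000, 11960000],
    'readers_monthly': [20100000, 14730000],
    'readers_once_month': [10100000, 7110000],
})

# same formulas as in the notebook cell
check['readers'] = check['readers_weekly'] + check['readers_monthly'] + check['readers_once_month']
check['perc_readers'] = check['readers'] / check['population']
check['perc_digital_readers'] = check['digital_readers'] / check['population']

print(check[['year', 'readers', 'perc_readers', 'perc_digital_readers']].round(6))
```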

## Result

In [6]:
df

| | year | population | digital_readers | readers_weekly | readers_monthly | readers_once_month | ebook_sales | lenders | lendings | digital_lendings | readers | perc_digital_readers | perc_readers |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2010 | 81751602 | NaN | 20400000 | 20900000 | 12000000 | 1900000 | 7844530.0 | 375793896.0 | 961683.0 | 53300000 | NaN | 0.651975 |
| 1 | 2011 | 80327900 | NaN | 20500000 | 20500000 | 11500000 | 4300000 | 7725936.0 | 378719659.0 | 1443091.0 | 52500000 | NaN | 0.653571 |
| 2 | 2012 | 80523746 | NaN | 20000000 | 20600000 | 10100000 | 13200000 | 7604293.0 | 375565974.0 | 3418398.0 | 50700000 | NaN | 0.629628 |
| 3 | 2013 | 80767463 | NaN | 19300000 | 21100000 | 10000000 | 21500000 | 7479921.0 | 373419034.0 | 7060821.0 | 50400000 | NaN | 0.624014 |
| 4 | 2014 | 81197537 | NaN | 18500000 | 21300000 | 10400000 | 24800000 | 7355708.0 | 362542582.0 | 11436854.0 | 50200000 | NaN | 0.618245 |
| 5 | 2015 | 82175684 | NaN | 18100000 | 20200000 | 10400000 | 27000000 | 7249936.0 | 361763984.0 | 16314414.0 | 48700000 | NaN | 0.592633 |
| 6 | 2016 | 82521653 | NaN | 18100000 | 20100000 | 10100000 | 28100000 | 7431335.0 | 355303531.0 | 19813365.0 | 48300000 | NaN | 0.585301 |
| 7 | 2017 | 82792351 | 7650000.0 | 11640000 | 15070000 | 6870000 | 29100000 | 7306049.0 | 344544584.0 | 24350279.0 | 33580000 | 0.0924 | 0.405593 |
| 8 | 2018 | 83019213 | 8520000.0 | 11930000 | 14840000 | 7060000 | 32800000 | 7264051.0 | 338358540.0 | 28979729.0 | 33830000 | 0.102627 | 0.407496 |
| 9 | 2019 | 83166711 | 8680000.0 | 11960000 | 14730000 | 7110000 | 32400000 | 7309666.0 | 338357207.0 | 33410061.0 | 33800000 | 0.104369 | 0.406413 |
