
The required Python libraries were imported.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

The 'Data' sheet was loaded from the Excel workbook available at https://ec.europa.eu/newsroom/dae/redirection/document/106734, which is linked from https://digital-strategy.ec.europa.eu/en/library/digital-decade-2024-broadband-coverage-europe-2023. The first six rows of the sheet were skipped, and only the columns of interest were kept: the country, the metric, the geography level and the yearly values for 2018 to 2023.

In [2]:
# Skip the six rows that precede the header row of the 'Data' sheet.
eu_broadband = pd.read_excel('https://ec.europa.eu/newsroom/dae/redirection/document/106734',
                             sheet_name='Data', skiprows=6)
# The year columns are selected by their integer labels.
eu_columns = ['Country', 'Metric', 'Geography level', 2018, 2019, 2020, 2021, 2022, 2023]
eu_broadband = eu_broadband[eu_columns]
display(eu_broadband.head())

Unnamed: 0,Country,Metric,Geography level,2018,2019,2020,2021,2022,2023
0,Austria,Land area,Total,83879.0,83879.0,83879.0,83927.0,83927.0,83927.0
1,Austria,Population,Total,8772865.0,8858775.0,8901064.0,8932664.0,8978929.0,9104772.0
2,Austria,Households,Total,3935534.0,3883312.0,3918929.0,3959143.0,3995050.0,4033080.0
3,Austria,Broadband coverage (>2Mbps),Total,3858862.0,3813412.384,3863423.0,,,
4,Austria,Broadband coverage (>30Mbps),Total,2847375.0,3058873.0,3394576.0,3694166.0,3787714.0,3797226.0
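
The column selection above fails with a `KeyError` if any expected column is absent, for example if a future release of the workbook labels the years differently. As a minimal sketch, the check can be made explicit before selecting. The helper `select_columns` and the toy dataframe are hypothetical and stand in for the real sheet.

```python
import pandas as pd

def select_columns(df, wanted):
    """Return df restricted to `wanted`, raising if any column is absent."""
    # Hypothetical helper: collects every missing label for a clearer error.
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        raise KeyError(f"Missing columns: {missing}")
    return df[wanted]

# Toy stand-in for the 'Data' sheet; the values are illustrative only.
toy = pd.DataFrame({
    'Country': ['A', 'B'],
    'Metric': ['FTTP', 'FTTP'],
    'Geography level': ['Total', 'Rural'],
    2022: [1.0, 2.0],
    2023: [1.5, 2.5],
    'Notes': ['x', 'y'],
})

subset = select_columns(toy, ['Country', 'Metric', 'Geography level', 2022, 2023])
```

Note that the year labels are integers here, as in the notebook, so `2022` and `'2022'` are different column names.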


The column datatypes and non-null counts of the base dataframe are as follows:

In [3]:
# info() prints its summary and returns None, so display() is not needed.
eu_broadband.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1650 entries, 0 to 1649
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Country          1650 non-null   object 
 1   Metric           1650 non-null   object 
 2   Geography level  1650 non-null   object 
 3   2018             1122 non-null   float64
 4   2019             1091 non-null   float64
 5   2020             1157 non-null   float64
 6   2021             1138 non-null   float64
 7   2022             1170 non-null   float64
 8   2023             1138 non-null   float64
dtypes: float64(6), object(3)
memory usage: 116.1+ KB
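
The summary reports non-null counts, so the number of missing values per column has to be derived by subtracting from the total row count. A minimal sketch of counting them directly with `isna().sum()` follows. The toy dataframe is an assumption made for illustration and is not taken from the workbook.

```python
import pandas as pd

# Toy stand-in with gaps in the year columns; values are illustrative only.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B'],
    'Metric': ['FTTP', 'Households', 'FTTP'],
    2022: [1.0, float('nan'), 3.0],
    2023: [float('nan'), float('nan'), 6.0],
})

# isna() marks missing cells as True; sum() counts them per column.
missing_per_column = toy.isna().sum()
print(missing_per_column)
```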

The distinct values in the 'Geography level' column are as follows:

In [4]:
display(eu_broadband['Geography level'].unique())

array(['Total', 'Rural'], dtype=object)

The EU broadband dataframe was filtered to keep only the rows whose 'Geography level' is 'Total'.

In [5]:
eu_broadband = eu_broadband.query('`Geography level` == "Total"')
display(eu_broadband['Geography level'].unique())
display(eu_broadband.head())

array(['Total'], dtype=object)

Unnamed: 0,Country,Metric,Geography level,2018,2019,2020,2021,2022,2023
0,Austria,Land area,Total,83879.0,83879.0,83879.0,83927.0,83927.0,83927.0
1,Austria,Population,Total,8772865.0,8858775.0,8901064.0,8932664.0,8978929.0,9104772.0
2,Austria,Households,Total,3935534.0,3883312.0,3918929.0,3959143.0,3995050.0,4033080.0
3,Austria,Broadband coverage (>2Mbps),Total,3858862.0,3813412.384,3863423.0,,,
4,Austria,Broadband coverage (>30Mbps),Total,2847375.0,3058873.0,3394576.0,3694166.0,3787714.0,3797226.0
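
The `query` call above needs backticks because the column name contains a space. As a sketch on a made-up dataframe, the same filter can also be written as a boolean mask, which avoids the quoting and gives an identical result.

```python
import pandas as pd

# Toy stand-in; values are illustrative only.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'B', 'B'],
    'Geography level': ['Total', 'Rural', 'Total', 'Rural'],
    2023: [10.0, 4.0, 20.0, 7.0],
})

# Backticks let query() refer to a column name that contains a space.
via_query = toy.query('`Geography level` == "Total"')

# Equivalent boolean-mask form.
via_mask = toy[toy['Geography level'] == 'Total']
```

Both forms keep the original index labels, which is why the filtered outputs in this notebook show non-consecutive row numbers.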


Following this filtering step, the 'Geography level' column held a single value and was dropped altogether.

In [6]:
eu_broadband = eu_broadband.drop(columns=['Geography level'])
display(eu_broadband.head())

Unnamed: 0,Country,Metric,2018,2019,2020,2021,2022,2023
0,Austria,Land area,83879.0,83879.0,83879.0,83927.0,83927.0,83927.0
1,Austria,Population,8772865.0,8858775.0,8901064.0,8932664.0,8978929.0,9104772.0
2,Austria,Households,3935534.0,3883312.0,3918929.0,3959143.0,3995050.0,4033080.0
3,Austria,Broadband coverage (>2Mbps),3858862.0,3813412.384,3863423.0,,,
4,Austria,Broadband coverage (>30Mbps),2847375.0,3058873.0,3394576.0,3694166.0,3787714.0,3797226.0


The distinct metrics in the base dataframe are as follows:

In [7]:
display(eu_broadband['Metric'].unique())

array(['Land area', 'Population', 'Households',
       'Broadband coverage (>2Mbps)', 'Broadband coverage (>30Mbps)',
       'Broadband coverage (>100Mbps)', 'Broadband coverage (>1Gbps)',
       'Broadband coverage (>1Gbps upload and download)',
       'Fixed broadband coverage', 'NGA coverage',
       'Fixed VHCN coverage (FTTP & DOCSIS 3.1)',
       'VHCN coverage (as defined by BEREC)', 'DSL', 'VDSL',
       'VDSL 2 Vectoring', 'FTTP', 'Cable modem DOCSIS 3.0',
       'Cable modem DOCSIS 3.1', 'FWA', 'LTE', 'Average LTE coverage',
       '5G', '5G in the 3.4–3.8\xa0GHz band', 'Satellite',
       'Overall broadband coverage', 'DOCSIS 3.0 & FTTP coverage',
       'Cable modem', 'WiMAX', 'HSPA'], dtype=object)

Only the 'Households' and 'FTTP' metrics were kept.

In [8]:
eu_broadband = eu_broadband.query('`Metric` == "FTTP" | `Metric` == "Households"')
display(eu_broadband['Metric'].unique())
display(eu_broadband.head())

array(['Households', 'FTTP'], dtype=object)

Unnamed: 0,Country,Metric,2018,2019,2020,2021,2022,2023
2,Austria,Households,3935534.0,3883312.0,3918929.0,3959143.0,3995050.0,4033080.0
15,Austria,FTTP,512932.4,534791.0,805015.0,1054017.0,1463133.0,1652409.0
31,Belgium,Households,4914168.0,4899404.0,4751936.0,4989764.0,5022036.0,4818475.0
44,Belgium,FTTP,68688.76,174923.2,309472.3,503256.6,861948.1,1204619.0
60,Bulgaria,Households,2930380.0,2877314.0,2888188.0,2881895.0,2849557.0,2803352.0
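
When several metrics are kept, chaining `==` comparisons with `|` grows quickly. One alternative, sketched here on a made-up dataframe, is `Series.isin` with a list of metric names. The list name `metrics_of_interest` is hypothetical.

```python
import pandas as pd

# Toy stand-in; values are illustrative only.
toy = pd.DataFrame({
    'Country': ['A', 'A', 'A'],
    'Metric': ['Households', 'FTTP', 'DSL'],
    2023: [100.0, 40.0, 90.0],
})

# Hypothetical name for the list of metrics to keep.
metrics_of_interest = ['FTTP', 'Households']

# isin() is True for rows whose Metric appears in the list.
filtered = toy[toy['Metric'].isin(metrics_of_interest)]
```

Row order follows the dataframe, not the list, so 'Households' still comes before 'FTTP' in the result.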


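The year columns were then reshaped from wide to long format, giving one row per country, metric and year.

In [9]: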
eu_broadband = eu_broadband.melt(id_vars=['Country', 'Metric'], var_name='Year', value_name='Number of households')
display(eu_broadband.head())

Unnamed: 0,Country,Metric,Year,Number of households
0,Austria,Households,2018,3935534.0
1,Austria,FTTP,2018,512932.4
2,Belgium,Households,2018,4914168.0
3,Belgium,FTTP,2018,68688.76
4,Bulgaria,Households,2018,2930380.0
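
To make the reshaping concrete, here is a small sketch of `melt` on a made-up two-row dataframe. Each year column becomes a value in the 'Year' column, so two rows with two year columns yield four rows. The data is illustrative only.

```python
import pandas as pd

# Toy wide-format frame: one column per year.
wide = pd.DataFrame({
    'Country': ['A', 'A'],
    'Metric': ['Households', 'FTTP'],
    2022: [100.0, 40.0],
    2023: [110.0, 55.0],
})

# Columns not listed in id_vars are unpivoted into (Year, value) pairs.
long = wide.melt(id_vars=['Country', 'Metric'],
                 var_name='Year',
                 value_name='Number of households')
print(long)
```

The rows are ordered year by year, which matches the output above where all 2018 values appear first.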
