# Practical I

## Data extraction

### Utility functions

The following function downloads the raw data frame for a given Eurostat dataset code.

In [1]:
import pandas as pd

def get_raw_data_frame( key, gziped ):
    """Download the Eurostat TSV extraction for dataset `key` as a raw DataFrame."""

    # Alternative SDMX 2.1 endpoint, currently disabled:
    # url_template = 'https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/%s$DEFAULTVIEW/?format=TSV&compressed=false'

    url_template = 'https://ec.europa.eu/eurostat/databrowser-backend/api/extraction/1.0/LIVE/false/tsv/%s?i'

    url = url_template % key

    # `gziped` tells pandas whether the response body is gzip-compressed.
    return pd.read_table( url, compression = 'gzip' ) if gziped else pd.read_table( url )
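
To see the shape of the raw frame without calling the Eurostat service, the sketch below feeds `pd.read_table` an in-memory sample. The two data rows reuse values from the NRG_PC_204_C output shown at the end of this notebook; the `sample_tsv` name and the trailing spaces after each value are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory sample laid out like a Eurostat TSV extraction:
# one comma-separated key column followed by one column per period.
sample_tsv = (
    "freq,nrg_cons,nrg_prc,currency,geo\\TIME_PERIOD\t2016-S2 \t2017 \t2018 \n"
    "A,KWH2500-4999,NETC,EUR,AT\t: \t0.0606 \t0.0626 \n"
    "A,KWH2500-4999,NETC,EUR,BA\t: \t0.0381 \t0.0388 \n"
)

# read_table splits on tabs by default, so the commas stay inside the key column.
raw = pd.read_table(io.StringIO(sample_tsv))

print(raw.shape)          # (2, 4)
print(list(raw.columns))  # the period names keep their trailing spaces
```

The key column and the padded column names are what the cleaning functions in the rest of this section deal with.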


Extract the country code from the first column of the data frame.

In [2]:
def extract_country( data ):
    
    new_data = data.rename( columns={data.iloc[:, 0].name :'country'} )

    new_data['country'] = new_data['country'].str.replace(r'^.*,(.*)$', r'\1', regex=True)
    
    return new_data
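
The regular expression in `extract_country` keeps only what follows the last comma of the row key. A minimal sketch with the standard `re` module shows the effect on two keys taken from the NRG_PC_204_C output in this notebook; `country_from_key` is a hypothetical helper used only for illustration.

```python
import re

# Same pattern as in extract_country: a greedy prefix up to the last comma,
# then a capture group holding the remainder of the key.
COUNTRY_PATTERN = r'^.*,(.*)$'

def country_from_key(row_key):
    """Return the text after the last comma of a Eurostat row key."""
    return re.sub(COUNTRY_PATTERN, r'\1', row_key)

print(country_from_key("A,KWH2500-4999,NETC,EUR,AT"))        # AT
print(country_from_key("S,KWH2500-4999,VAT,EUR,EU27_2020"))  # EU27_2020
```

A key without any comma does not match the pattern and is returned unchanged.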


Remove leading and trailing spaces from the column names.

In [3]:
def trim_column_names( data ):
    
    for col in data.columns:
    
        data = data.rename( columns={col :col.strip()} )
    
    return data
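
As a possible alternative to renaming one column per loop iteration, `DataFrame.rename` also accepts a function that is applied to every column name. The sketch below uses a made-up one-row frame (`frame` is an assumption, not data from the notebook) with padded column names.

```python
import pandas as pd

# Hypothetical frame whose period columns carry stray spaces.
frame = pd.DataFrame({"country": ["AT"], "2017 ": [0.0606], " 2018": [0.0626]})

# str.strip is called once per column name; the original frame is left untouched.
trimmed = frame.rename(columns=str.strip)

print(list(trimmed.columns))  # ['country', '2017', '2018']
```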
    

Clean and convert every numeric column.

In [4]:
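def clean_numeric_columns( data ):
    """Strip Eurostat flags from every column after the first and convert the values to numbers."""

    data.iloc[:,1:] = data.iloc[:,1:].replace(
        # An explicit NaN is used because a None replacement is handled differently across pandas versions
        r'^.*[:].*$', float('nan'), regex=True # Not available (':'), with or without a confidential flag
    ).replace(
        r'e', '', regex=True # Remove the "estimated" flag
    ).replace(
        r'd', '', regex=True # Remove the "definition differs" flag
    ).replace(
        r'^(.+) +$', r'\1', regex = True # Right trim
    ).replace(
        r'^ +(.+)$', r'\1', regex = True # Left trim
    )

    for col in data.iloc[:,1:].columns:
        data[col] = pd.to_numeric( data[col] )

    return data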
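
The same cleaning rules can be followed cell by cell with the standard library. In this sketch `parse_eurostat_cell` is a hypothetical helper, not part of the notebook; the value `'0.0205 d'` appears in the NRG_PC_204_C output, while `': c'` and `'0.0240 e'` are assumed examples of the confidential and estimated flags.

```python
import math
import re

def parse_eurostat_cell(cell):
    """Convert one raw cell to a float, mapping ':' (not available) to NaN."""
    text = str(cell).strip()
    if ':' in text:
        return math.nan
    # Drop the 'e' (estimated) and 'd' (definition differs) flags, then any leftover spaces.
    text = re.sub(r'[ed]', '', text).strip()
    return float(text)

cells = ['0.0606 ', '0.0205 d', ': ', ': c', '0.0240 e']
print([parse_eurostat_cell(cell) for cell in cells])
# [0.0606, 0.0205, nan, nan, 0.024]
```

Removing every `e` would also corrupt a value written in scientific notation such as `1e-05`, so this approach assumes the source only publishes plain decimals.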
    
    

Build a function that keeps only the rows whose first column contains the given filter string.

In [5]:
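def filter_data( filter ):
    """Build a function that keeps the rows whose first column contains `filter`."""
    def _filter_data( data ):
        # str.contains interprets `filter` as a regular expression
        new_data = data[data.iloc[:, 0].str.contains( filter )]
        # Return a re-indexed copy instead of modifying the filtered slice in place
        return new_data.reset_index( drop = True )
    return _filter_data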
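
The row filter can be tried on a small frame. The keys below come from the NRG_PC_204_C output in this notebook; the `raw` frame itself and the use of `regex=False` are assumptions of this sketch (the notebook's function relies on the default regular-expression matching).

```python
import pandas as pd

# Hypothetical raw frame: one semi-annual row followed by two annual rows.
raw = pd.DataFrame({
    "key": [
        "S,KWH2500-4999,VAT,EUR,EA",
        "A,KWH2500-4999,NETC,EUR,AT",
        "A,KWH2500-4999,NETC,EUR,BA",
    ],
    "2017": [": ", "0.0606 ", "0.0381 "],
})

# regex=False treats the filter as literal text, which avoids surprises
# if a filter ever contains characters such as '.' or '+'.
mask = raw.iloc[:, 0].str.contains("A,KWH2500-4999,NETC,EUR", regex=False)
annual = raw[mask].reset_index(drop=True)

print(annual.shape)        # (2, 2)
print(list(annual.index))  # [0, 1]
```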
    

Function that runs the whole data extraction and cleaning process.

In [6]:
class Compose:
    """Wrap a one-argument function so that further steps can be chained onto it."""
    def __init__(self, f):
        self._f = f
    def andThen( self, g ):
        # New Compose that runs the wrapped function first and then g
        return Compose( lambda s: g( self._f(s) ) )
    def apply(self, a):
        return self._f( a )


def flow( filter ) :
    # Cleaning pipeline: filter rows, extract the country, trim column names, convert values
    return Compose(
        filter_data( filter )
    ).andThen(
        extract_country
    ).andThen(
        trim_column_names
    ).andThen(
        clean_numeric_columns
    )

def dataframe_by_key( key, filter, gziped = False ):
    return flow(filter).apply( get_raw_data_frame( key, gziped ) )
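
Left-to-right composition of this kind can also be sketched with `functools.reduce`. The `pipeline` helper below is hypothetical and is shown on plain strings so that the order of application is easy to follow.

```python
from functools import reduce

def pipeline(*steps):
    """Return a function that applies `steps` one after another, left to right."""
    return lambda value: reduce(lambda acc, step: step(acc), steps, value)

# Strip first, then upper-case: the same order in which the steps are listed.
normalise = pipeline(str.strip, str.upper)

print(normalise("  at "))  # AT
```

pandas offers a comparable chaining style through `DataFrame.pipe`, which passes the frame to each function in turn.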


### _DATASET I_: Household gas price in € per kWh

Taken from the data source [Gas prices components for household consumers - annual data](https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_202_c/default/table?lang=en)

Dataset identification key: **`NRG_PC_202_C`**

The data are filtered by:

 - Annual data
 - Energy price component: _"Energy and supply"_
 - Energy consumption: in gigajoules, all bands
 - Currency: euro (€)
 - Unit of measure: kilowatt-hour
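
These criteria correspond to the comma-separated filter string passed to `dataframe_by_key` in the next cell. The sketch below spells out one reading of that string; the mapping of each code to a criterion is an assumption based on the order of the list, and `dimension_codes` and `row_filter` are names introduced only for illustration.

```python
# One code per criterion, in the order used by the row key (assumed mapping).
dimension_codes = [
    "A",        # annual data
    "NRG_SUP",  # price component: energy and supply
    "TOT_GJ",   # consumption: all bands, in gigajoules
    "EUR",      # currency: euro
    "KWH",      # unit of measure: kilowatt-hour
]

row_filter = ",".join(dimension_codes)
print(row_filter)  # A,NRG_SUP,TOT_GJ,EUR,KWH
```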

In [7]:
data_gas_prices_household_consumers = dataframe_by_key( 'NRG_PC_202_C', filter = 'A,NRG_SUP,TOT_GJ,EUR,KWH'  ) 

Dataset columns:

In [8]:
display(data_gas_prices_household_consumers.dtypes)

country     object
2017       float64
2018       float64
2019       float64
2020       float64
2021       float64
dtype: object

Sample values:

In [9]:
data_gas_prices_household_consumers

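| | country | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|
| 0 | AT | 0.0299 | 0.0304 | 0.0312 | 0.0308 | |
| 1 | BA | 0.024 | 0.024 | 0.0249 | 0.0258 | |
| 2 | BE | 0.0283 | 0.0288 | 0.0289 | 0.0252 | |
| 3 | BG | 0.017 | 0.0209 | 0.024 | 0.0177 | |
| 4 | CZ | 0.036 | 0.039 | 0.0455 | 0.0431 | 0.0448 |
| 5 | DE | | | 0.0278 | 0.0292 | |
| 6 | DK | 0.0234 | 0.0259 | 0.0209 | 0.016 | |
| 7 | EA | 0.0295 | 0.0303 | 0.0319 | 0.0302 | |
| 8 | EE | 0.0234 | 0.0239 | 0.0253 | 0.024 | |
| 9 | EL | | 0.0311 | 0.0338 | 0.0258 | |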


### _DATASET II_: Non-household gas price in € per kWh

Taken from the data source [Gas prices components for non-household consumers - annual data](https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_203_c/default/table?lang=en)

Dataset identification key: **`NRG_PC_203_C`**

The data are filtered by:

 - Annual data
 - Energy price component: _"Energy and supply"_
 - Energy consumption: in gigajoules, all bands
 - Currency: euro (€)
 - Unit of measure: kilowatt-hour

In [10]:
data_gas_prices_nonhousehold_consumers = dataframe_by_key( 'NRG_PC_203_C', filter = 'A,NRG_SUP,TOT_GJ,EUR,KWH'  ) 

Dataset columns:

In [11]:
display(data_gas_prices_nonhousehold_consumers.dtypes)

country     object
2017       float64
2018       float64
2019       float64
2020       float64
2021       float64
dtype: object

Sample values:

In [12]:
data_gas_prices_nonhousehold_consumers

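| | country | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|
| 0 | AT | | | 0.0184 | 0.0168 | |
| 1 | BA | | | 0.0257 | 0.0259 | |
| 2 | BE | | | 0.0189 | 0.0148 | |
| 3 | BG | | | 0.0213 | 0.0142 | |
| 4 | CZ | | | 0.0226 | 0.0192 | |
| 5 | DE | | | 0.0196 | 0.0171 | |
| 6 | DK | | | 0.0178 | 0.0137 | |
| 7 | EA | 0.022 | 0.024 | 0.0211 | 0.0175 | |
| 8 | EE | | | 0.0213 | 0.0155 | |
| 9 | EL | | | 0.026 | 0.0165 | |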


### _DATASET III_: Household electricity price for the 2,500 to 4,999 kWh band

Taken from the data source [Electricity prices components for household consumers - annual data (from 2007 onwards)](https://ec.europa.eu/eurostat/databrowser/view/NRG_PC_204_C__custom_2388428/default/table?lang=en)

Dataset identification key: **`NRG_PC_204_C`**

In [13]:
# TEST CELL: DELETE

url = 'https://ec.europa.eu/eurostat/databrowser-backend/api/extraction/1.0/LIVE/false/tsv/NRG_PC_204_C__custom_2388428?i'

pd.read_table( url, compression = 'gzip' )

# TEST CELL: DELETE
                          

| | freq,nrg_cons,nrg_prc,currency,geo\TIME_PERIOD | 2012-S2 | 2013-S2 | 2014-S2 | 2015-S2 | 2016-S2 | 2017 | 2018 | 2019 | 2020 | 2021 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A,KWH2500-4999,NETC,EUR,AL | : | : | : | : | : | 0.0000 | 0.0000 | 0.0000 | : | : |
| 1 | A,KWH2500-4999,NETC,EUR,AT | : | : | : | : | : | 0.0606 | 0.0626 | 0.0645 | 0.0639 | : |
| 2 | A,KWH2500-4999,NETC,EUR,BA | : | : | : | : | : | 0.0381 | 0.0388 | 0.0367 | 0.0370 | : |
| 3 | A,KWH2500-4999,NETC,EUR,BE | : | : | : | : | : | 0.1055 | 0.1116 | 0.1092 | 0.1049 | : |
| 4 | A,KWH2500-4999,NETC,EUR,BG | : | : | : | : | : | 0.0232 | 0.0242 | 0.0256 | 0.0265 | : |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 514 | S,KWH2500-4999,TAX_RNW,EUR,LI | : | : | : | : | 0.0092 | : | : | : | : | : |
| 515 | S,KWH2500-4999,VAT,EUR,EA | : | : | : | : | 0.0205 | : | : | : | : | : |
| 516 | S,KWH2500-4999,VAT,EUR,EU27_2020 | : | : | : | : | 0.0205 | : | : | : | : | : |
| 517 | S,KWH2500-4999,VAT,EUR,IT | : | : | : | : | 0.0205 d | : | : | : | : | : |
