# Assignment I

## Data extraction

### Utility functions

We define a function that downloads a dataset as a DataFrame, given its Eurostat dataset code.

In [1]:
import pandas as pd

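def get_raw_data_frame(key):
    # Eurostat dissemination API endpoint; %s is replaced by the dataset code.
    url_template = 'https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/%s$DEFAULTVIEW/?format=TSV&compressed=false'

    url = url_template % key

    # The response is tab-separated, which is read_table's default separator.
    return pd.read_table(url)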
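
To make the request concrete, the sketch below only builds the URL that `get_raw_data_frame` would fetch, without touching the network. `build_url` and `URL_TEMPLATE` are hypothetical names introduced for illustration; the template string itself is the one used above.

```python
# Hypothetical helper (not part of the notebook): build the request URL only.
URL_TEMPLATE = (
    'https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/'
    '%s$DEFAULTVIEW/?format=TSV&compressed=false'
)


def build_url(key):
    """Insert the dataset code into the URL template."""
    return URL_TEMPLATE % key


print(build_url('NRG_PC_202_C'))
```

Printing the URL before calling `pd.read_table` is a cheap way to check that the dataset code ends up in the right place.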


Extract the country code from the first column of the DataFrame.

In [2]:
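def extract_country(data):
    # Rename the first column, whatever its original header, to 'country'.
    new_data = data.rename(columns={data.columns[0]: 'country'})

    # Keep only the text after the last comma, i.e. the country code.
    new_data['country'] = new_data['country'].str.replace(r'^.*,(.*)$', r'\1', regex=True)

    return new_data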
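
The regular expression in `extract_country` can be tried in isolation with the standard `re` module. This is a minimal sketch: `last_field` is a hypothetical helper, and the sample label is made up to resemble a comma-separated dimension key that ends in a country code.

```python
import re

# Same pattern as in extract_country: the greedy '.*' consumes everything
# up to the last comma, and the group captures whatever follows it.
COUNTRY_PATTERN = re.compile(r'^.*,(.*)$')


def last_field(label):
    """Return the text after the last comma, or the label itself if it has none."""
    return COUNTRY_PATTERN.sub(r'\1', label)


print(last_field('A,4100,TOT_GJ,NRG_SUP,EUR,AT'))  # AT
print(last_field('AT'))                            # AT (no comma, left unchanged)
```

Because every field before the last comma is discarded, rows that differ only in those fields end up with the same `country` value.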


Strip whitespace from the column names.

In [3]:
def trim_column_names(data):
    # rename accepts a function and applies it to every column label.
    return data.rename(columns=str.strip)
    

Clean all the numeric columns and convert them to floats.

In [4]:
import numpy as np

def clean_numeric_columns(data):
    # Every column after 'country' holds values: drop the flags, trim, cast to float.
    data.iloc[:, 1:] = data.iloc[:, 1:].replace(
        r'^.*[:].*$', np.nan, regex=True  # not available / confidential flag
    ).replace(
        r'e', '', regex=True  # remove the "estimated" flag
    ).replace(
        r'd', '', regex=True  # remove the "definition differs" flag
    ).replace(
        r'^(.+) +$', r'\1', regex=True  # right trim
    ).replace(
        r'^ +(.+)$', r'\1', regex=True  # left trim
    ).astype('float64')

    return data
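
The chain of replacements is easier to follow on a single cell. The sketch below applies the same rules to one string using only the standard library; `clean_cell` is a hypothetical helper and the sample inputs are made up, on the assumption that raw cells look like a number optionally followed by a flag letter, or a colon when no value is available.

```python
import math
import re


def clean_cell(raw):
    """Apply the cleaning rules of clean_numeric_columns to one raw string."""
    # Any cell containing ':' is treated as missing (not available / confidential).
    if ':' in raw:
        return math.nan
    # Drop the 'e' (estimated) and 'd' (definition differs) flags, then trim.
    cleaned = re.sub(r'[ed]', '', raw).strip()
    return float(cleaned)


print(clean_cell('0.0219 '))   # 0.0219
print(clean_cell('0.0135 e'))  # 0.0135
print(clean_cell(': c'))       # nan
```

Only the `e` and `d` flags are removed, so a cell carrying any other letter would still fail the conversion to float.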
    
    

Function that runs the whole data extraction and cleaning process.

In [5]:
def dataframe_by_key( key ):
    return clean_numeric_columns(
        trim_column_names(
            extract_country( 
                get_raw_data_frame( key )  
            )
        )
    )
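
The nested calls in `dataframe_by_key` run inside-out: download, then country extraction, then header trimming, then numeric cleaning. As an illustration only, the same ordering can be written left to right with a small helper. `apply_steps` and the toy string steps below are hypothetical and stand in for the DataFrame functions.

```python
from functools import reduce


def apply_steps(value, *steps):
    """Feed `value` through each step in order, passing each result to the next."""
    return reduce(lambda acc, step: step(acc), steps, value)


# Toy steps on a string, standing in for the notebook's DataFrame functions.
result = apply_steps(
    '  A,B,AT e ',
    str.strip,                      # trim surrounding whitespace
    lambda s: s.split(',')[-1],     # keep the last comma-separated field
    lambda s: s.replace(' e', ''),  # drop a trailing flag
)
print(result)  # AT
```

With the notebook's own functions the call would read `apply_steps(get_raw_data_frame(key), extract_country, trim_column_names, clean_numeric_columns)`. pandas also provides `DataFrame.pipe` for this style of chaining.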

### _DATASET I_: Household gas prices in € per kWh

Obtained from the data source [Gas prices components for household consumers - annual data](https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_202_c/default/table?lang=en)

Dataset identification key: **`NRG_PC_202_C`**

In [6]:
data_gas_prices_household_consumers = dataframe_by_key( 'NRG_PC_202_C' ) 

Dataset columns:

In [7]:
display(data_gas_prices_household_consumers.dtypes)

country     object
2017       float64
2018       float64
2019       float64
2020       float64
2021       float64
dtype: object

Sample values:

In [8]:
data_gas_prices_household_consumers

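|     | country | 2017   | 2018   | 2019   | 2020   | 2021   |
|-----|---------|--------|--------|--------|--------|--------|
| 0   | AT      | 0.0219 | 0.0200 | 0.0179 | 0.0171 |        |
| 1   | BA      | 0.0039 | 0.0039 | 0.0036 | 0.0037 |        |
| 2   | BE      | 0.0163 | 0.0176 | 0.0165 | 0.0144 |        |
| 3   | BG      | 0.0126 | 0.0133 | 0.0133 | 0.0135 |        |
| 4   | CZ      | 0.0135 | 0.0134 | 0.0060 | 0.0058 | 0.0059 |
| ... | ...     | ...    | ...    | ...    | ...    | ...    |
| 275 | SI      | 0.0097 | 0.0100 | 0.0103 | 0.0102 | 0.0101 |
| 276 | SK      | 0.0074 | 0.0076 | 0.0080 | 0.0081 | 0.0072 |
| 277 | TR      | 0.0037 | 0.0028 | 0.0033 | 0.0029 |        |
| 278 | UA      |        | 0.0000 | 0.0000 | 0.0042 |        |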


### _DATASET II_: Non-household gas prices in € per kWh

Obtained from the data source [Gas prices components for non-household consumers - annual data](https://ec.europa.eu/eurostat/databrowser/view/nrg_pc_203_c/default/table?lang=en)

Dataset identification key: **`NRG_PC_203_C`**

In [9]:
data_gas_prices_nonhousehold_consumers = dataframe_by_key( 'NRG_PC_203_C' ) 

Dataset columns:

In [10]:
display(data_gas_prices_nonhousehold_consumers.dtypes)

country    object
2017       object
2018       object
2019       object
2020       object
2021       object
dtype: object
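
The year columns above are reported as `object` rather than `float64`. If numeric columns are needed for later steps, one option is to convert each column explicitly with `pd.to_numeric`. This is a minimal sketch on a toy frame: the variable names and values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset; values are illustrative only.
toy = pd.DataFrame(
    {
        'country': ['AT', 'BE'],
        '2017': ['3.1561', None],
        '2018': [2.8492, 2.1592],
    },
    dtype=object,
)

# Convert every column after 'country'; unparseable cells become NaN.
for col in toy.columns[1:]:
    toy[col] = pd.to_numeric(toy[col], errors='coerce')

print(toy.dtypes)
```

Assigning one column at a time replaces the column outright, so the result takes the dtype returned by `pd.to_numeric`.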

Sample values:

In [11]:
data_gas_prices_nonhousehold_consumers

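|      | country | 2017   | 2018   | 2019   | 2020   | 2021   |
|------|---------|--------|--------|--------|--------|--------|
| 0    | AT      | 3.1561 | 2.8492 | 2.7004 | 2.6527 |        |
| 1    | BA      | 3.3336 | 3.2463 | 3.1931 | 3.2187 |        |
| 2    | BE      | 2.3286 | 2.1592 | 1.8494 | 1.6252 |        |
| 3    | BG      | 2.658  | 2.94   | 2.9073 | 3.0556 |        |
| 4    | CZ      | 1.821  | 2.0446 | 1.8549 | 1.752  |        |
| ...  | ...     | ...    | ...    | ...    | ...    | ...    |
| 3771 | SI      |        |        | 0.0068 | 0.0061 | 0.0077 |
| 3772 | SK      |        |        | 0.0062 | 0.0059 | 0.0068 |
| 3773 | TR      |        |        | 0.004  | 0.0029 |        |
| 3774 | UA      |        |        | 0.0    | 0.003  |        |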

