# Obtain weekday prototypes
The main objective of this task is to create the weekday prototypes. We want to model two types of days, based on the consumption activity of each building:
- **Active** day.
- **Inactive** day.

Thus, for each counter in the database, we'll get 13 day prototypes (6 working days * 2 types of day + 1 inactive day corresponding to Sundays). Since there are 97 different buildings, we expect to get 13 * 97 = 1261 prototypical days.

Activity is defined from the mean of each building's Sunday consumption: a day whose consumption is greater than this value plus some margin is labelled as an active day, while a day whose consumption is lower than or equal to this value plus the margin is labelled as an inactive day.
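
The labelling rule above can be sketched as follows. This is only an illustration under stated assumptions: the notebook does not say how large the margin is or whether it is additive or relative, so `MARGIN` (a relative 10 %) and the helper `label_active_days` are hypothetical. The column names follow the `consumptions.zip` table loaded below, and Sundays are forced to inactive to match the 13-prototype count.

```python
import numpy as np
import pandas as pd

MARGIN = 0.1  # hypothetical relative margin (10 %); the real value is not given


def label_active_days(df: pd.DataFrame, margin: float = MARGIN) -> pd.DataFrame:
    """Return a copy of df with 'daily' totals and a boolean 'active' column."""
    out = df.copy()
    # Total consumption per day, ignoring missing hourly readings.
    out['daily'] = out['consumptions'].apply(np.nansum)
    # Mean daily total over Sundays (weekday == 6), computed per building.
    sunday_mean = out[out['weekday'] == 6].groupby('building_id')['daily'].mean()
    # Threshold = Sunday mean plus the (assumed relative) margin.
    threshold = out['building_id'].map(sunday_mean) * (1 + margin)
    # Strictly above the threshold -> active; lower or equal -> inactive.
    out['active'] = out['daily'] > threshold
    # Assumption: Sundays are always inactive, as the prototype count implies.
    out.loc[out['weekday'] == 6, 'active'] = False
    return out


# Toy data: one building, two Sundays, one Monday and one Tuesday.
toy = pd.DataFrame({
    'building_id': [1, 1, 1, 1],
    'weekday': [6, 6, 0, 1],
    'consumptions': [[10.0, 10.0], [12.0, np.nan], [30.0, 30.0], [8.0, 8.0]],
})
labelled = label_active_days(toy)
print(labelled[['weekday', 'daily', 'active']])
```

In the toy data the Sunday mean is 16, so the threshold is 17.6: the Monday (60) is labelled active and the Tuesday (16) inactive.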

#### Directory structure
./<br></br>
notebook/<br></br>
    &emsp;|--- data_preprocessing<br></br>
    &emsp;&emsp;&emsp;&emsp;|--- weekday_prototypes.ipynb<br></br>
out/<br></br>
    &emsp;|--- consumptions.zip

In [1]:
CONS_PATH = 'C:/Users/thmas/OneDrive - Universidad de Castilla-La Mancha/Informática/TFG/out/'

In [2]:
import pandas as pd
import numpy as np

In [3]:
raw = pd.read_pickle(CONS_PATH + 'consumptions.zip')
raw

day,building_id,weekday,active,type,consumptions
2012-02-24,89,4,False,1,"[nan, nan, nan, nan, 0.0, 25.9682072759303, 34..."
2012-02-25,89,5,False,1,"[8.0, 8.56965980289508, 7.83041664589254, 7.83..."
2012-02-26,89,6,False,1,"[9.0, 9.0, 8.47872481882854, 8.52127518117146,..."
2012-02-27,89,0,False,1,"[9.93594069444675, 9.0, 10.0, 18.4133936140153..."
2012-02-28,89,1,False,1,"[15.0, 15.0, 15.0, 23.0, 41.3474893206788, 39...."
...,...,...,...,...,...
2020-03-28,2233,5,False,1,"[8.96294314928535, 9.1999884489703, 9.22916758..."
2020-03-29,2233,6,False,1,"[9.05122649923577, 9.10856876843712, 9.0668798..."
2020-03-30,2233,0,False,1,"[9.14786320617928, 9.46424320377272, 12.979311..."
2020-03-31,2233,1,False,1,"[9.09777728991234, 9.49875136817228, 13.959012..."


In [4]:
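def get_prototype(df: pd.DataFrame, mode: str = 'mean') -> pd.DataFrame:
    """Build one prototype row from the days in `df`.

    `df` must hold days that share the same weekday, activity flag and
    consumer type. `mode` selects the hourly statistic: 'std' for the
    standard deviation, anything else for the mean. NaN readings are ignored.
    """
    # All rows share these values, so read them from the first row.
    weekday = df['weekday'].iloc[0]
    active = df['active'].iloc[0]
    consumer_type = df['type'].iloc[0]

    stat = np.nanstd if mode == 'std' else np.nanmean

    cons = []
    for i in range(24):
        # Readings of hour i across every day in the group.
        i_consumptions = [day[i] for day in df['consumptions']]
        cons.append(stat(i_consumptions))

    return pd.DataFrame({'weekday': weekday, 'active': active, 'type': consumer_type, 'consumptions': [cons]})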
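
As a possible alternative to the hour-by-hour loop, the same statistic could be computed in one step by stacking the daily lists into a matrix. This is a minimal sketch, not the notebook's method: `hourly_stat` is a hypothetical helper, and it assumes every day holds the same number of readings.

```python
import numpy as np
import pandas as pd


def hourly_stat(df: pd.DataFrame, mode: str = 'mean') -> np.ndarray:
    """Hour-by-hour mean or std over all days in df, ignoring NaNs."""
    # Stack the per-day lists into an (n_days, n_hours) matrix.
    # Assumes all lists have equal length.
    matrix = np.array(df['consumptions'].tolist(), dtype=float)
    reducer = np.nanstd if mode == 'std' else np.nanmean
    # Reduce over the day axis, leaving one value per hour.
    return reducer(matrix, axis=0)


# Toy data: two days with three readings each, one of them missing.
toy = pd.DataFrame({'consumptions': [[1.0, np.nan, 3.0], [3.0, 4.0, 5.0]]})
print(hourly_stat(toy, 'mean'))  # [2. 4. 4.]
print(hourly_stat(toy, 'std'))   # [1. 0. 1.]
```

The missing reading in the second position is skipped, so that column's mean comes from the single remaining value.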

In [6]:
mean_rows, std_rows = [], []
for d in range(7):
    # Copy the slice so that adding a column does not raise SettingWithCopyWarning.
    df = raw[raw['weekday'] == d].copy()
    df['daily'] = df['consumptions'].apply(np.nansum)

    for a in (True, False):
        df_a = df[df['active'] == a]

        for t in df_a['type'].unique():
            df_t = df_a[df_a['type'] == t]

            mean_rows.append(get_prototype(df_t, mode='mean'))
            std_rows.append(get_prototype(df_t, mode='std'))

# DataFrame.append was removed in pandas 2.0, so the rows are concatenated instead.
mean_proto = pd.concat(mean_rows, ignore_index=True)
std_proto = pd.concat(std_rows, ignore_index=True)

mean_proto


,weekday,active,type,consumptions
0,0,True,1,"[10.895687980366292, 14.315760487655078, 17.42..."
1,0,True,2,"[39.54708221775872, 51.43227218871105, 68.6322..."
2,0,True,0,"[2.0315225688299905, 2.401072742409194, 2.7675..."
3,0,False,1,"[21.13959733877297, 23.531137443405353, 25.752..."
4,0,False,0,"[2.018456316902241, 2.4330310719642037, 2.7912..."
5,1,True,1,"[11.002494318189354, 14.320689738548413, 17.51..."
6,1,True,2,"[40.17847127749248, 52.17325169642676, 69.7777..."
7,1,True,0,"[2.1494240500050656, 2.504473065200021, 2.8623..."
8,1,False,1,"[21.34350289297516, 23.851559912570817, 26.363..."
9,1,False,0,"[2.0329991548331976, 2.451854301429503, 2.8260..."
