# Obtain weekday prototypes
The main objective of this task is to create every weekday prototypes. We want to model two types of days based on the consumption activity of each building:
- **Active** day.
- **Inactive** day.

Thus, for each counter on the database, we'll get 13 day prototypes (6 working days * 2 types of days + 1 inactive day corresponding to Sundays). Moreover, there are 97 different buildings, so we expect to get 13 * 97 prototypical days.

This previouisly mentioned activity will be defined by calculating the mean of Sundays' consumptions for each building (greater than this value plus some margin will indicate an active day; lower or equal than this value plus the margin should be labelled as an inactive day).

#### Directory structure
./<br></br>
notebook/<br></br>
    &emsp;|--- data-preprocessing<br></br>
    &emsp;&emsp;&emsp;&emsp;|--- weekday_prototypes.ipynb<br></br>
out/<br></br>
    &emsp;|--- consumptions_byday/<br></br>

In [1]:
CONS_PATH = 'C:/Users/thmas/OneDrive - Universidad de Castilla-La Mancha/Informática/TFG/out/consumptions_byday/'

In [2]:
import pandas as pd

In [3]:
counter_id = 27 # Counter ID example

raw_df = pd.read_pickle(CONS_PATH + 'counter_' + str(counter_id) + '_byDay.zip')
raw_df

Unnamed: 0_level_0,building_id,weekday,0,1,2,3,4,5,6,7,...,14,15,16,17,18,19,20,21,22,23
day,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2011-07-26,27,1,,,,,,,,,...,41.846246,22.805419,20.887574,18.846172,18.846057,18.420198,18.000000,18.118729,17.881271,18.000000
2011-07-27,27,2,17.000000,19.000000,18.350795,35.846313,47.846263,50.846379,149.316601,137.956260,...,39.845466,21.500125,19.000000,20.000000,18.000000,18.728452,17.845798,19.425750,18.000000,17.111296
2011-07-28,27,3,18.888704,18.803009,18.845892,35.845713,47.845857,51.845696,162.877204,147.690832,...,42.795578,23.204422,21.000000,22.000000,21.000000,21.023872,19.976128,20.714894,20.285106,20.000000
2011-07-29,27,4,20.000000,21.000000,20.000000,37.788779,45.845705,50.845726,162.822891,143.039176,...,38.845451,21.910895,19.000000,19.000000,19.000000,18.318000,17.845896,17.836104,18.855870,17.846017
2011-07-30,27,5,17.298113,17.000000,17.239697,17.845833,17.914470,18.000000,18.623151,19.376849,...,20.000000,19.235492,19.764508,19.000000,19.000000,19.000000,18.000000,19.000000,18.000000,17.000549
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-02-22,27,5,21.935497,21.935320,21.064881,21.946185,20.946242,21.784327,21.935417,22.118729,...,22.687338,21.935400,22.215645,21.946152,22.644214,22.247960,21.946055,21.611551,22.280723,20.946171
2020-02-23,27,6,21.946155,21.568840,21.323437,21.946155,21.536587,22.355722,21.946187,22.504338,...,22.452667,21.418263,22.935361,21.935446,21.495630,22.375282,21.935428,21.935402,21.935434,21.935470
2020-02-24,27,0,21.935432,24.299828,42.046687,56.888749,67.824054,77.116559,75.473832,73.795139,...,59.236310,32.813565,24.935450,25.127386,25.743428,23.935426,24.935405,23.935371,24.935494,24.808040
2020-02-25,27,1,24.179701,27.030773,43.485146,63.186978,77.027611,86.836100,88.814914,84.587054,...,59.872437,36.163066,24.980219,23.935414,24.935374,24.847482,25.023232,24.935294,24.815093,24.055454


### Obtaining prototype measures
In order to get the required measures for every day, first we obtain them for sundays, which are supposed to be inactive days. After that, we base the active/inactive result depending on:

- **Inactive days** &rarr; daily consumption mean within [sundays.mean + 2 * sundays.std, +$\infty$)
- **Active days** &rarr; daily consumption mean within [0, sundays.mean + 2 * sundays.std)

We'll store all this prototypical days (every building has 13, as previously discussed) in a pandas DataFrame for later use

In [4]:
def get_threshold(df: pd.DataFrame) -> float:
    df.loc[:, 'mean'] = df.loc[:, '0':].mean(axis=1) # Calculate daily consumption mean
    mean, std = df['mean'].mean(), df['mean'].std()
    
    return mean + 2 * std # Calculate threshold based on sundays

In [5]:
def get_prototype(df: pd.DataFrame, counter_id: int, weekday: int, active: bool, type: str = 'mean') -> pd.DataFrame:
    if type == 'std':
        proto = df.loc[:, '0':].std(axis=0) # Calculate hour consumption std
    else:
        proto = df.loc[:, '0':].mean(axis=0) # Calculate hour consumption mean
    
    proto = pd.DataFrame(proto).T # Force it to be a row
    proto.insert(0, 'active', active)
    proto.insert(0, 'weekday', weekday)
    proto.insert(0, 'building_id', counter_id)
    
    del proto['mean']
    
    return proto

In [6]:
clean_df = raw_df.dropna()

sundays = clean_df[clean_df['weekday'] == 6] # Select Sundays
sundays

Unnamed: 0_level_0,building_id,weekday,0,1,2,3,4,5,6,7,...,14,15,16,17,18,19,20,21,22,23
day,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2011-07-31,27,6,18.845599,17.153853,18.000000,18.000000,18.000000,20.000000,19.000000,20.000000,...,19.000000,19.000000,19.000000,19.223473,18.776527,18.000000,18.761067,18.238933,17.452738,17.547262
2011-08-07,27,6,11.949745,11.050255,11.641160,11.358840,11.000000,12.000000,12.024355,12.975645,...,13.000000,13.000000,13.000000,13.000000,11.000000,12.020763,11.979237,12.000000,12.000000,12.000000
2011-08-14,27,6,11.846081,11.959588,12.000000,12.000000,13.000000,13.000000,13.000000,13.000000,...,13.000000,13.000000,13.574492,12.845753,12.579755,12.000000,12.000000,12.000000,12.000000,12.000000
2011-08-21,27,6,11.000000,11.970775,11.029225,11.662400,12.337600,12.000000,13.000000,12.044951,...,12.963477,13.036523,12.000000,13.000000,12.000000,11.192110,11.807890,11.883617,11.116383,12.000000
2011-08-28,27,6,11.000000,11.000000,11.000000,11.000000,11.000000,12.000000,12.000000,13.000000,...,12.829029,12.828929,12.703811,12.954430,11.829000,12.216570,12.000000,12.000000,12.000000,11.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2020-01-26,27,6,21.063754,20.946135,20.946081,19.946294,20.946161,20.946174,20.946151,21.946118,...,19.946099,21.946171,20.946211,19.946171,20.946245,20.946208,20.946195,19.946217,20.946229,21.946260
2020-02-02,27,6,21.935458,21.883851,21.946212,20.976291,21.935492,21.926825,21.946190,20.946240,...,24.023709,21.946211,20.946231,21.825915,20.935508,21.935519,20.935357,21.098925,21.946214,21.761323
2020-02-09,27,6,22.199418,21.935248,21.935288,22.714313,20.945995,21.946031,22.134570,22.756935,...,22.048326,23.843842,22.026832,23.935392,21.935316,21.935441,21.897741,21.946235,20.962361,22.930089
2020-02-16,27,6,23.492198,23.378940,23.935471,22.935478,22.935598,23.935543,23.935378,23.935642,...,24.631899,22.935445,23.935483,23.935471,23.935664,23.935431,22.935426,23.935331,22.935337,23.935503


In [7]:
threshold = get_threshold(sundays)

mean_proto = get_prototype(sundays, counter_id, 6, False, type='mean') # Get Sundays prototype
mean_proto

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[key] = _infer_fill_value(value)
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s


Unnamed: 0,building_id,weekday,active,0,1,2,3,4,5,6,...,14,15,16,17,18,19,20,21,22,23
0,27,6,False,21.782368,21.680815,21.677122,21.712221,21.784557,22.121077,22.543591,...,22.811096,22.542973,22.313685,22.218645,22.135549,22.215568,22.177961,22.100104,22.057921,22.027624


In [8]:
std_proto = get_prototype(sundays, counter_id, 6, False, type='std') # Get Sundays prototype
std_proto

Unnamed: 0,building_id,weekday,active,0,1,2,3,4,5,6,...,14,15,16,17,18,19,20,21,22,23
0,27,6,False,6.049485,6.073791,6.057298,6.052764,6.06851,6.178819,6.248896,...,6.039515,6.022084,5.990873,6.002408,6.026091,6.115214,6.104574,6.169124,6.176622,6.193037


In [9]:
for i in range(0, 6):
    df = clean_df[clean_df['weekday'] == i]
    df.loc[:, 'mean'] = df.loc[:, '0':].mean(axis=1) # Calculate daily consumption mean
    
    df_a = df.loc[df['mean'] >= threshold] # Select active days
    mean_proto = mean_proto.append(get_prototype(df_a, counter_id, i, True, type='mean'))
    std_proto = std_proto.append(get_prototype(df_a, counter_id, i, True, type='std'))
    
    df_i = df.loc[df['mean'] < threshold] # Select inactive days
    mean_proto = mean_proto.append(get_prototype(df_a, counter_id, i, False, type='mean'))
    std_proto = std_proto.append(get_prototype(df_a, counter_id, i, False, type='std'))

mean_proto.reset_index(drop=True, inplace=True)
std_proto.reset_index(drop=True, inplace=True)
mean_proto

Unnamed: 0,building_id,weekday,active,0,1,2,3,4,5,6,...,14,15,16,17,18,19,20,21,22,23
0,27,6,False,21.782368,21.680815,21.677122,21.712221,21.784557,22.121077,22.543591,...,22.811096,22.542973,22.313685,22.218645,22.135549,22.215568,22.177961,22.100104,22.057921,22.027624
1,27,0,True,21.987414,23.547828,31.403885,51.294441,67.658796,83.231615,89.186817,...,51.763729,30.319845,25.149737,24.340377,24.223884,24.019317,23.916323,23.931445,23.917818,23.849526
2,27,0,False,21.987414,23.547828,31.403885,51.294441,67.658796,83.231615,89.186817,...,51.763729,30.319845,25.149737,24.340377,24.223884,24.019317,23.916323,23.931445,23.917818,23.849526
3,27,1,True,23.785422,25.229916,32.915017,52.278982,70.171602,84.2536,89.299668,...,48.405252,30.0954,25.37615,24.661971,24.498305,24.422049,24.419125,24.301736,24.349805,24.209923
4,27,1,False,23.785422,25.229916,32.915017,52.278982,70.171602,84.2536,89.299668,...,48.405252,30.0954,25.37615,24.661971,24.498305,24.422049,24.419125,24.301736,24.349805,24.209923
5,27,2,True,24.307285,25.706407,33.602966,52.904333,68.126702,82.148117,87.621479,...,48.234694,30.049083,25.274668,24.568892,24.341906,24.252489,24.248906,24.157215,24.122491,24.081269
6,27,2,False,24.307285,25.706407,33.602966,52.904333,68.126702,82.148117,87.621479,...,48.234694,30.049083,25.274668,24.568892,24.341906,24.252489,24.248906,24.157215,24.122491,24.081269
7,27,3,True,24.208587,25.562588,33.593101,52.67768,67.600205,80.564639,86.113119,...,47.541625,29.938253,25.234953,24.482791,24.352541,24.2215,24.17496,24.042571,24.026416,23.976951
8,27,3,False,24.208587,25.562588,33.593101,52.67768,67.600205,80.564639,86.113119,...,47.541625,29.938253,25.234953,24.482791,24.352541,24.2215,24.17496,24.042571,24.026416,23.976951
9,27,4,True,24.094901,25.344169,33.469687,51.967893,67.941058,81.400857,86.824998,...,40.827632,25.945963,22.885125,22.556892,22.32271,22.22217,22.167155,22.060855,22.086728,21.976244
