# GreEn-ER Dataset
The dataset consists of electrical consumption data for the GreEn-ER building, as well as meteorological data, collected since September 2016.

This notebook uses a subset of the available measurements. The objective is to predict the electricity consumption due to heating three days ahead (at day D+3), using the following data available on day D, sampled once per hour:
  - Electrical consumption for heating (kWh)
  - Outdoor temperature (Deg C)
  - Global measured radiation (RSR2 sensor, averaged over one hour, W/m^2)
  - Time of day for measurements (0 to 23 hours)
  - Number of the day of the week (1 for Monday, 7 for Sunday)
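
To make the prediction setup concrete, here is a minimal sketch of how hourly feature rows can be paired with a target located 72 hours (3 days) later. It runs on a synthetic heating series, not on the real measurements. The helper name `make_lagged_pairs` and the 72-hour horizon are assumptions made for illustration; how the provided files were actually built is not documented here.

```python
import numpy as np

def make_lagged_pairs(features, series, horizon=72):
    """Pair each feature row at hour t with series[t + horizon].

    Hypothetical helper, for illustration only.
    """
    features = np.asarray(features)
    series = np.asarray(series)
    n = len(series) - horizon  # the last `horizon` rows have no target yet
    return features[:n], series[horizon:horizon + n]

# Synthetic stand-in: 10 days of hourly data (values are made up)
hours = np.arange(10 * 24)
heating = 50 + 10 * np.sin(2 * np.pi * hours / 24)
features = np.column_stack([heating, hours % 24])

X, y = make_lagged_pairs(features, heating)
print(X.shape, y.shape)
```

Each row of `X` describes hour `t`, and the matching entry of `y` is the heating value at hour `t + 72`.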


## Loading training data

In [None]:
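import scipy.io as sio
import numpy as np

# Load the MATLAB file; loadmat returns a dict keyed by variable name
greener_train = sio.loadmat('greener_train.mat')
X_train = greener_train['data']      # features, one row per hour
Y_train = greener_train['targets']   # responses (heating at day D+3)
X_train.shape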

### features
We have $n=4000$ samples in the training set with $p=5$ features:
 - `X_train[:,0]` Heating (kWh)
 - `X_train[:,1]` Temperature (Deg C)
 - `X_train[:,2]` Global measured radiation  (W/m^2)
 - `X_train[:,3]` Time of day (0 to 23 hours)
 - `X_train[:,4]` Day of the week (1 for Monday, 7 for Sunday)
 
We can plot some of the features:

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
plt.figure(figsize=(10,5))
plt.subplot(2,1,1)
plt.plot(X_train[:,0])
plt.grid()
plt.xlabel('hours')
plt.ylabel('kWh')
plt.subplot(2,1,2)
plt.plot(X_train[:,1])
plt.grid()
plt.xlabel('hours')
plt.ylabel('Temp (Deg C)')
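
Since the five features are expressed in different units (kWh, Deg C, W/m^2, hours, day index), it can help to look at their ranges before using a distance-based method. The sketch below does this on a synthetic stand-in array with the same five-column layout. The values, the array name `X_demo` and the `feature_names` labels are assumptions for illustration; with the real data, `X_demo` would be replaced by `X_train`. Standardizing is one possible choice, not a step prescribed by this notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic stand-in with the same column layout as X_train (values are made up)
X_demo = np.column_stack([
    rng.uniform(0, 100, n),        # heating (kWh)
    rng.uniform(-5, 30, n),        # temperature (Deg C)
    rng.uniform(0, 800, n),        # radiation (W/m^2)
    np.arange(n) % 24,             # time of day (0 to 23)
    (np.arange(n) // 24) % 7 + 1,  # day of the week (1 to 7)
])
feature_names = ['heating', 'temperature', 'radiation', 'hour', 'weekday']

# Per-feature summary: the ranges differ by orders of magnitude
for name, col in zip(feature_names, X_demo.T):
    print(f"{name:12s} min={col.min():8.2f} max={col.max():8.2f} mean={col.mean():8.2f}")

# Standardize each column to zero mean and unit variance
mu = X_demo.mean(axis=0)
sigma = X_demo.std(axis=0)
X_scaled = (X_demo - mu) / sigma
```

If this scaling is used, the same `mu` and `sigma` computed on the training set should also be applied to the test set.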


### Responses
We have $n=4000$ samples of the heating electrical consumption at day D+3. We can plot them:

In [None]:
%matplotlib notebook
import matplotlib.pyplot as plt
plt.figure(figsize=(10,5))
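ax1 = plt.subplot(211)
# Shift the targets by 3 days (72 hours) so they line up with the heating curve below
plt.plot(np.arange(len(Y_train)) + 3*24, Y_train, 'r')
plt.grid()
plt.xlabel('hours')
plt.ylabel('Heating D+3 (kWh)')
# Share the x axis so both panels zoom and pan together
plt.subplot(212, sharex=ax1)
plt.plot(X_train[:,0])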
plt.grid()
plt.xlabel('hours')
plt.ylabel('Heating (kWh)')


## Loading test data
A separate test set is provided to estimate the generalization performance and to validate the method (prediction rule).


In [None]:
greener_test = sio.loadmat('greener_test.mat')
X_test = greener_test['data']
Y_test = greener_test['targets']

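## TODO

- Train a k-NN model to predict the responses (heating electrical consumption at day D+3). Since the response is a continuous quantity in kWh, this is a regression task, so a k-NN regressor is appropriate rather than a classifier.
- Plot the results.
- Evaluate both the training and test errors as a function of k (number of neighbors).
- Validate the parameters and methods.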
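
As a starting point for the tasks above, here is a minimal sketch of k-NN regression written with NumPy only, evaluated for several values of k. It runs on synthetic data so that it stays self-contained. The helper names `knn_predict` and `rmse`, the choice of RMSE as the error measure and the list of k values are all assumptions for illustration, not the method prescribed by this notebook.

```python
import numpy as np

def knn_predict(X_ref, y_ref, X_query, k):
    """Predict by averaging the targets of the k nearest reference points."""
    # Squared Euclidean distances, shape (n_query, n_ref)
    d2 = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest reference points for each query row
    nearest = np.argsort(d2, axis=1)[:, :k]
    return y_ref[nearest].mean(axis=1)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic regression problem (values are made up, not the GreEn-ER data)
rng = np.random.default_rng(0)
X_all = rng.uniform(-3, 3, size=(300, 2))
y_all = np.sin(X_all[:, 0]) + 0.5 * X_all[:, 1] + rng.normal(0, 0.2, 300)
X_tr, y_tr = X_all[:200], y_all[:200]
X_te, y_te = X_all[200:], y_all[200:]

ks = [1, 3, 5, 10, 20, 50]
train_err = [rmse(y_tr, knn_predict(X_tr, y_tr, X_tr, k)) for k in ks]
test_err = [rmse(y_te, knn_predict(X_tr, y_tr, X_te, k)) for k in ks]

for k, e_tr, e_te in zip(ks, train_err, test_err):
    print(f"k={k:3d}  train RMSE={e_tr:.3f}  test RMSE={e_te:.3f}")
```

With k=1 the training error is zero because every training point is its own nearest neighbor, which is why the training error alone cannot be used to choose k. To apply this to the real data, `X_tr`, `y_tr`, `X_te`, `y_te` would be replaced by `X_train`, `Y_train.ravel()`, `X_test`, `Y_test.ravel()`, assuming the targets are loaded as a single column. The full pairwise distance matrix can become large at $n=4000$, so a library implementation such as scikit-learn's `KNeighborsRegressor` may be a more practical choice.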