# Energy efficiency

A multivariate linear regression model is built to predict the heating load of a residential building from a set of descriptive features that characterize the building. Heating load is the amount of heat energy required to keep a building at a specified temperature during winter, regardless of the outside temperature.

The descriptive features used were (a) the overall surface area of the building, (b) the height of the building, (c) the area of the building's roof, and (d) the percentage of wall area in the building that is glazed. This kind of model would be useful to architects or engineers when designing a new building.
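
As a minimal sketch of what such a model looks like, the block below fits a linear regression with an intercept to four feature columns using ordinary least squares. The data are synthetic and the weights are hypothetical, chosen only so the fit can be checked; nothing here comes from the real data set, and `predict` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up feature values, for illustration only.
# Columns: surface area, height, roof area, glazing percentage.
X = rng.uniform(0.0, 1.0, size=(50, 4))

# Hypothetical "true" parameters used to generate a noise-free target.
true_weights = np.array([2.0, -1.0, 0.5, 3.0])
true_intercept = 4.0
y = X @ true_weights + true_intercept

# Prepend a column of ones so the intercept is estimated with the weights.
X_design = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, weights = coef[0], coef[1:]


def predict(features):
    """Predict the target for one row (or several rows) of four features."""
    return intercept + np.asarray(features, dtype=float) @ weights
```

Because the synthetic target is exactly linear in the features, the fitted `weights` and `intercept` match the generating values up to floating-point error. On real measurements the fit would only approximate the relationship.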

## Objective

This notebook reproduces the analysis published in the paper _Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools_.

## Energy efficiency data set

The data set contains eight features, denoted by $X1, X2, \ldots, X8$, and two outcome variables, $y1$ and $y2$. The goal is to use the eight features to predict each of the two outcome variables.

* **Attributes:**

   * **X1**: relative compactness
   * **X2**: surface area
   * **X3**: wall area
   * **X4**: roof area
   * **X5**: overall height
   * **X6**: orientation
   * **X7**: glazing area
   * **X8**: glazing area distribution
   * **y1**: heating load
   * **y2**: cooling load
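
The coded names are terse, so a lookup table from codes to descriptive names can make later cells easier to read. The sketch below builds one from the attribute list above. The snake_case names and the constants `COLUMN_NAMES`, `FEATURES` and `TARGETS` are illustrative choices, not part of the data set.

```python
# Descriptive names for the coded columns, taken from the attribute list.
# The snake_case spellings are an illustrative assumption.
COLUMN_NAMES = {
    "X1": "relative_compactness",
    "X2": "surface_area",
    "X3": "wall_area",
    "X4": "roof_area",
    "X5": "overall_height",
    "X6": "orientation",
    "X7": "glazing_area",
    "X8": "glazing_area_distribution",
    "y1": "heating_load",
    "y2": "cooling_load",
}

# Split the descriptive names into model inputs and outcomes.
FEATURES = [name for code, name in COLUMN_NAMES.items() if code.startswith("X")]
TARGETS = [name for code, name in COLUMN_NAMES.items() if code.startswith("y")]
```

With a pandas DataFrame `df` whose headers match these codes, `df.rename(columns=COLUMN_NAMES)` would apply the names. The exact header spelling in the spreadsheet (for example `Y1` versus `y1`) is an assumption worth checking first.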

**Source**: https://archive.ics.uci.edu/ml/datasets/energy+efficiency

## References

  * Tsanas A, Xifara A. <a href="http://people.maths.ox.ac.uk/tsanas/Preprints/ENB2012.pdf" target="_blank"><em>Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools</em></a>. Energy and Buildings. 2012 Jun 1;49:560-7.

### Setup: imports and version checks

In [None]:
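import sys
assert sys.version_info >= (3, 6)

# Compare versions numerically: plain string comparison misorders
# versions such as "1.9" and "1.17".
# Assumes the `packaging` package is installed.
from packaging.version import Version

import numpy
assert Version(numpy.__version__) >= Version("1.17.3")
import numpy as np

import matplotlib.pyplot as plt

import pandas
assert Version(pandas.__version__) >= Version("0.25.1")
import pandas as pd

# Note: xlrd 2.0+ reads only .xls files; pandas then needs openpyxl to read .xlsx.
import xlrd
assert Version(xlrd.__version__) >= Version("1.2.0")

import sklearn
assert Version(sklearn.__version__) >= Version("0.21.3")

%matplotlib inline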

### 1. Loading the energy efficiency data set

In [None]:
energy_effiency = pandas.read_excel("https://archive.ics.uci.edu/ml/machine-learning-databases/00242/ENB2012_data.xlsx")
energy_effiency.shape

In [None]:
energy_effiency.head(n = 10)
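
Before modelling, it can help to confirm that the loaded frame has the expected columns and no stray empty rows or columns. The following is a minimal sketch of such a check, not a step from the paper. The helper `check_energy_frame`, the header names in `EXPECTED_COLUMNS`, and the tiny `demo` frame are assumptions made for illustration.

```python
import pandas as pd

# Assumed header names; compared case-insensitively below.
EXPECTED_COLUMNS = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8", "Y1", "Y2"]


def check_energy_frame(df, expected=EXPECTED_COLUMNS):
    """Drop fully empty rows and columns, then verify the expected columns exist."""
    cleaned = df.dropna(axis=0, how="all").dropna(axis=1, how="all")
    present = {str(col).strip().upper() for col in cleaned.columns}
    missing = [col for col in expected if col not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return cleaned


# Tiny made-up frame standing in for the real data: two valid rows,
# one empty row, and one empty extra column.
demo = pd.DataFrame({name: [1.0, 2.0, None] for name in EXPECTED_COLUMNS})
demo["Unnamed: 10"] = float("nan")
cleaned = check_energy_frame(demo)
```

In the notebook, the same helper could be applied as `check_energy_frame(energy_effiency)` right after loading.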