# Energy Demand Forecast [ML Fortnight 2024]

This is the starter notebook for the "Energy Demand Forecast [ML Fortnight 2024]" competition. In this competition, you will be predicting the energy demand for the next 14 days based on historical data.

This notebook assumes the data is in the `data/` directory, with the following files:
- `train.csv`: This is the training data that you will use to train your model.
- `test.csv`: This is the test data on which you will apply your model to predict the energy demand for the next 14 days.
- `sample_submission.csv`: This is the format for the submission file that you will submit.

In [1]:
import os            # file-system helpers (os.walk, os.path)

import numpy as np   # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

for dirname, _, filenames in os.walk('./data'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
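The loop above only lists what is present. If you would rather have the notebook stop early when one of the three files is missing, a small check along these lines is enough. `missing_data_files` and `EXPECTED_FILES` are hypothetical names; the file names are the ones listed at the top of this notebook.

```python
import os
import tempfile

# Hypothetical constant: the three files this notebook expects in data/
EXPECTED_FILES = ["train.csv", "test.csv", "sample_submission.csv"]


def missing_data_files(data_dir="data"):
    """Hypothetical helper: return the expected files that are not present in data_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(data_dir, name))]


# Usage example on a throwaway directory that only contains train.csv
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "train.csv"), "w").close()
    missing = missing_data_files(tmp)

print(missing)  # ['test.csv', 'sample_submission.csv']
```

In the notebook itself you would call `missing_data_files("data")` and raise an error if the list is not empty.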

Let's load these files and see what each one contains. The `train` file includes the following features:
- `measurement_time`: timestamp
- `source_1_temperature` to `source_4_temperature`: Temperatures from various energy sources.
- `mean_room_temperature`: Average temperature within the building.
- `sun_radiation_east`, `sun_radiation_west`, `sun_radiation_south`, `sun_radiation_north`, `sun_radiation_perpendicular`: Solar radiation levels from different directions.
- `outside_temperature`: Temperature outside the building.
- `wind_speed`: Speed of the wind.
- `wind_direction`: Direction of the wind.
- `clouds`: Cloud coverage.

as well as the `target` column. You use this data to train your models.
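`measurement_time` is read as plain text, while most scikit-learn estimators need numeric inputs. Below is a minimal sketch of turning it into calendar features. The helper name `add_time_features` and the choice of hour, day of week and month are assumptions (energy demand plausibly follows daily and weekly cycles), not part of the competition setup. Apply the same function to both `train` and `test` so their columns line up.

```python
import pandas as pd


def add_time_features(df):
    """Hypothetical helper: return a copy of df with calendar features
    derived from measurement_time, and the raw timestamp column dropped."""
    out = df.copy()
    ts = pd.to_datetime(out["measurement_time"])
    out["hour"] = ts.dt.hour
    out["dayofweek"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = ts.dt.month
    return out.drop(columns=["measurement_time"])


# Tiny illustrative frame using two timestamps from the train preview
demo = pd.DataFrame({
    "measurement_time": ["2023-11-01 00:00:00", "2023-11-01 04:00:00"],
    "outside_temperature": [8.97, 9.99],
})
features = add_time_features(demo)
print(features)
```

2023-11-01 was a Wednesday, so both rows get `dayofweek == 2`.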

In [None]:
train = pd.read_csv("data/train.csv", index_col="ID")
train.head()

| ID | measurement_time | target | source_1_temperature | source_2_temperature | source_3_temperature | source_4_temperature | mean_room_temperature | sun_radiation_east | sun_radiation_west | sun_radiation_south | sun_radiation_north | sun_radiation_perpendicular | outside_temperature | wind_speed | wind_direction | clouds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-11-01 00:00:00 | 3.4 | 27.6 | 18.799999 | 19.75 | 21.1 | 20.129892 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.97 | 2.06 | 140.0 | 20.0 |
| 1 | 2023-11-01 01:00:00 | 2.933333 | 28.4 | 18.933333 | 19.833333 | 21.033333 | 20.052919 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.19 | 2.06 | 110.0 | 100.0 |
| 2 | 2023-11-01 02:00:00 | 7.166667 | 29.4 | 19.0 | 19.799999 | 21.0 | 19.992375 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.42 | 2.57 | 140.0 | 20.0 |
| 3 | 2023-11-01 03:00:00 | 10.5 | 30.1 | 19.033333 | 19.933333 | 24.6 | 19.941565 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.19 | 2.57 | 150.0 | 100.0 |
| 4 | 2023-11-01 04:00:00 | 8.733334 | 31.866666 | 19.1 | 20.0 | 24.7 | 19.924502 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.99 | 2.57 | 160.0 | 100.0 |
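The preview shows one row per hour. Before relying on that for time-based features, you may want to confirm it holds across the whole file; this notebook does not show whether the real data has gaps. The sketch below uses a hypothetical helper, `missing_hours`, that lists any hours absent between the first and last timestamp.

```python
import pandas as pd


def missing_hours(timestamps):
    """Hypothetical helper: return the hourly timestamps between the first
    and last value that never occur in `timestamps`."""
    ts = pd.DatetimeIndex(pd.to_datetime(pd.Series(timestamps)))
    full_range = pd.date_range(ts.min(), ts.max(), freq="h")
    return full_range.difference(ts)


# Illustrative input with 02:00 deliberately left out
gaps = missing_hours(["2023-11-01 00:00:00", "2023-11-01 01:00:00", "2023-11-01 03:00:00"])
print(list(gaps))
```

On the real data, `missing_hours(train["measurement_time"])` returns an empty index if there are no gaps.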


The `test` file has the same columns as the `train` file, except for the `target` column. You apply your model to this data to get predictions, which you then upload to Kaggle to be evaluated.
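Once both files are loaded, a quick sanity check can confirm that `test` has the training columns minus `target`, in the same order, and that its timestamps start after the training period ends (the test preview starts on 2024-08-20, well after the first training row). `check_train_test` is a hypothetical helper, shown here on tiny stand-in frames.

```python
import pandas as pd


def check_train_test(train_df, test_df, target="target"):
    """Hypothetical helper: raise if test_df does not have train_df's columns
    minus the target (same order). Return True if every test timestamp is
    later than every training timestamp."""
    expected = [col for col in train_df.columns if col != target]
    if list(test_df.columns) != expected:
        raise ValueError(f"unexpected test columns: {list(test_df.columns)}")
    train_end = pd.to_datetime(train_df["measurement_time"]).max()
    test_start = pd.to_datetime(test_df["measurement_time"]).min()
    return bool(test_start > train_end)


# Tiny stand-in frames using values from the previews
train_demo = pd.DataFrame({
    "measurement_time": ["2023-11-01 00:00:00"],
    "target": [3.4],
    "wind_speed": [2.06],
})
test_demo = pd.DataFrame({
    "measurement_time": ["2024-08-20 15:00:00"],
    "wind_speed": [6.17],
})
print(check_train_test(train_demo, test_demo))  # True
```

With the real frames, call `check_train_test(train, test)` after both are loaded.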

In [None]:
test = pd.read_csv("data/test.csv", index_col="ID")
test.head()

| ID | measurement_time | source_1_temperature | source_2_temperature | source_3_temperature | source_4_temperature | mean_room_temperature | sun_radiation_east | sun_radiation_west | sun_radiation_south | sun_radiation_north | sun_radiation_perpendicular | outside_temperature | wind_speed | wind_direction | clouds |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7047 | 2024-08-20 15:00:00 | 40.349999 | 16.625 | 16.325 | 18.65 | 21.17107 | 114.935484 | 622.645161 | 565.741935 | 115.774194 | 814.483871 | 23.43 | 6.17 | 200.0 | 20.0 |
| 7048 | 2024-08-20 16:00:00 | 39.5 | 19.333333 | 17.966667 | 19.033333 | 21.464404 | 97.774194 | 735.258065 | 406.677419 | 99.612903 | 766.774194 | 19.1 | 5.66 | 210.0 | 40.0 |
| 7049 | 2024-08-20 17:00:00 | 37.600001 | 19.566667 | 18.775 | 19.150001 | 21.56125 | 76.612903 | 749.064516 | 226.290323 | 81.967742 | 681.032258 | 19.23 | 2.06 | 230.0 | 20.0 |
| 7050 | 2024-08-20 18:00:00 | 37.299999 | 19.35 | 19.35 | 19.05 | 21.498269 | 52.967742 | 616.451613 | 67.612903 | 70.387097 | 524.322581 | 18.83 | 3.6 | 220.0 | 20.0 |
| 7051 | 2024-08-20 19:00:00 | 33.025001 | 19.066667 | 19.75 | 19.175001 | 21.417638 | 25.16129 | 301.096774 | 26.83871 | 88.903226 | 249.935484 | 18.12 | 2.57 | 220.0 | 20.0 |


Building the model is your task, but here is a minimal example of the workflow.

In [None]:
X = train.drop(columns=["target"])
y = train["target"]

In [None]:
# Fit a baseline that ignores the features and always predicts the training mean
from sklearn.dummy import DummyRegressor

model = DummyRegressor()
model.fit(X, y)
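`DummyRegressor` is only a reference point. Because this is a forecasting task, one reasonable way to compare models before submitting is to hold out the most recent rows of `train` rather than a random sample. The sketch below assumes the rows are sorted by `measurement_time` and uses mean absolute error purely as an example metric; check the competition page for the metric you are actually scored on. The data is synthetic, and `time_holdout_split` is a hypothetical helper (scikit-learn's `TimeSeriesSplit` offers a cross-validated version of the same idea).

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_absolute_error


def time_holdout_split(X, y, holdout_fraction=0.2):
    """Hypothetical helper: keep the last `holdout_fraction` of rows
    (assumed to be in time order) as a validation set."""
    cut = int(len(X) * (1 - holdout_fraction))
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


# Synthetic stand-in for train: 100 hourly rows with a daily cycle plus noise
rng = np.random.default_rng(0)
hours = np.arange(100)
X_demo = pd.DataFrame({"hour": hours % 24})
y_demo = pd.Series(10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.5, 100))

X_tr, X_val, y_tr, y_val = time_holdout_split(X_demo, y_demo)
baseline = DummyRegressor().fit(X_tr, y_tr)
mae = mean_absolute_error(y_val, baseline.predict(X_val))
print(f"baseline MAE on the last 20% of rows: {mae:.3f}")
```

Any model you try should beat this baseline score on the same hold-out before it is worth submitting.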

Let's use this trained dummy model to make predictions on the test set.

In [None]:
X_test = test

test_predictions = model.predict(X_test)

In [None]:
# let's see what the predictions look like
test_predictions

array([13.90299228, 13.90299228, 13.90299228, ..., 13.90299228,
       13.90299228, 13.90299228])

These predictions are a NumPy array (see above), which is not the format Kaggle accepts. We need to put them in the same format as the `sample_submission` file.
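All the predictions above are identical because `DummyRegressor` ignores the features: with its default `strategy="mean"` it predicts the mean of the training target for every row, so 13.90299228 is the mean of `train["target"]`. The toy example below, on made-up numbers, shows the same behaviour.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor

# Made-up training data: the target mean is (3 + 5 + 10) / 3 = 6
X_demo = pd.DataFrame({"feature": [1.0, 2.0, 3.0]})
y_demo = pd.Series([3.0, 5.0, 10.0])

model_demo = DummyRegressor(strategy="mean").fit(X_demo, y_demo)

# Feature values are ignored, so any input gets the training mean
X_new = pd.DataFrame({"feature": [100.0, -5.0]})
preds = model_demo.predict(X_new)
print(preds)  # [6. 6.]
```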

In [None]:
sample_submission = pd.read_csv("data/sample_submission.csv", index_col="ID")
sample_submission.head()

| ID | target |
|---|---|
| 7047 | 13.9 |
| 7048 | 13.9 |
| 7049 | 13.9 |
| 7050 | 13.9 |
| 7051 | 13.9 |


Let's copy the sample submission and replace its values with our predictions:

In [None]:
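test_submission = sample_submission.copy()
# Assigning a plain array matches values by position, so make sure the IDs line up first
assert sample_submission.index.equals(test.index), "sample_submission and test IDs differ"
test_submission["target"] = test_predictions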

test_submission

| ID | target |
|---|---|
| 7047 | 13.902992 |
| 7048 | 13.902992 |
| 7049 | 13.902992 |
| 7050 | 13.902992 |
| 7051 | 13.902992 |
| ... | ... |
| 8804 | 13.902992 |
| 8805 | 13.902992 |
| 8806 | 13.902992 |
| 8807 | 13.902992 |


Your submission file has two columns: `ID`, which comes from the test data, and `target`, which holds your predictions.
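Copying `sample_submission` works as long as its rows are in the same order as `test`. A sketch that avoids that assumption builds the frame directly from the test index. `make_submission` is a hypothetical helper, and the IDs and values in the example are made up.

```python
import io

import numpy as np
import pandas as pd


def make_submission(test_index, predictions):
    """Hypothetical helper: build a frame with an ID index and a single target column."""
    if len(test_index) != len(predictions):
        raise ValueError("one prediction per test row is required")
    return pd.DataFrame({"target": np.asarray(predictions)},
                        index=pd.Index(test_index, name="ID"))


# Made-up IDs and predictions; written to an in-memory buffer instead of a file
sub = make_submission([7047, 7048, 7049], [13.9, 14.1, 12.7])
buf = io.StringIO()
sub.to_csv(buf)
print(buf.getvalue().splitlines()[0])  # ID,target
```

In the notebook, `make_submission(test.index, test_predictions).to_csv("submission.csv")` would produce the same two-column file.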

In [None]:
# Write to CSV; the ID index is written as the first column
test_submission.to_csv("submission.csv")
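Before uploading, you can read the file back and compare it with `sample_submission`, which catches missing IDs, reordered rows, or empty predictions. `validate_submission` is a hypothetical helper; the example runs on in-memory CSV text rather than the real file.

```python
import io

import pandas as pd


def validate_submission(csv_source, sample_df):
    """Hypothetical helper: return a list of problems found when comparing
    a submission CSV with the sample submission (empty list means OK)."""
    sub = pd.read_csv(csv_source, index_col="ID")
    problems = []
    if list(sub.columns) != list(sample_df.columns):
        problems.append(f"columns {list(sub.columns)} != {list(sample_df.columns)}")
    if not sub.index.equals(sample_df.index):
        problems.append("ID values or order differ from the sample submission")
    if sub.isna().any().any():
        problems.append("submission contains missing values")
    return problems


# Made-up sample and two candidate files: one valid, one with a wrong ID and a blank value
sample_demo = pd.DataFrame({"target": [13.9, 13.9]}, index=pd.Index([7047, 7048], name="ID"))
good_csv = io.StringIO("ID,target\n7047,12.5\n7048,14.0\n")
bad_csv = io.StringIO("ID,target\n7047,12.5\n7049,\n")

good_problems = validate_submission(good_csv, sample_demo)
bad_problems = validate_submission(bad_csv, sample_demo)
print(good_problems)  # []
print(bad_problems)
```

For the real file, `validate_submission("submission.csv", sample_submission)` should return an empty list.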

Finally, upload this `submission.csv` file to the Kaggle competition page (top right button) and see how you did! 🚀