# Solar radiation prediction
This notebook analyzes a four-month dataset collected at the HI-SEAS weather station (Hawaii), sampled roughly every 5 minutes. The objective is to build a machine learning (ML) model that forecasts solar radiation from the available features.
## Dataset
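The dataset consists of the following measurements, alongside timestamp columns (`UNIXTime`, `Data`, `Time`, `TimeSunRise`, `TimeSunSet`):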
* Solar radiation [W/m^2]
* Temperature [F]
* Atmospheric pressure [inHg]
* Humidity [%]
* Wind speed [miles/h]
* Wind direction [degrees]



In [3]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import TimeSeriesSplit


In [2]:
df = pd.read_csv("solar_radiation_dataset.csv")
df.head()

Unnamed: 0,UNIXTime,Data,Time,Radiation,Temperature,Pressure,Humidity,WindDirection(Degrees),Speed,TimeSunRise,TimeSunSet
0,1475229326,9/29/2016 12:00:00 AM,23:55:26,1.21,48,30.46,59,177.39,5.62,06:13:00,18:13:00
1,1475229023,9/29/2016 12:00:00 AM,23:50:23,1.21,48,30.46,58,176.78,3.37,06:13:00,18:13:00
2,1475228726,9/29/2016 12:00:00 AM,23:45:26,1.23,48,30.46,57,158.75,3.37,06:13:00,18:13:00
3,1475228421,9/29/2016 12:00:00 AM,23:40:21,1.21,48,30.46,60,137.71,3.37,06:13:00,18:13:00
4,1475228124,9/29/2016 12:00:00 AM,23:35:24,1.17,48,30.46,62,104.95,5.62,06:13:00,18:13:00
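
The preview above lists rows with decreasing `UNIXTime`, so at least the first rows are in reverse chronological order, while a time-ordered split assumes that a higher row position means a later time. The following is a minimal sketch, assuming that sorting by `UNIXTime` is the intended ordering; the names `preview`, `ordered`, `gaps`, and `median_gap` are illustrative and are not part of the original notebook. It rebuilds the five preview rows, sorts them oldest-first, and measures the gap between consecutive samples.

```python
import pandas as pd

# Toy frame built from the five preview rows (UNIXTime and Radiation only).
preview = pd.DataFrame({
    "UNIXTime": [1475229326, 1475229023, 1475228726, 1475228421, 1475228124],
    "Radiation": [1.21, 1.21, 1.23, 1.21, 1.17],
})

# Sort oldest-first so that positional indices follow time.
ordered = preview.sort_values("UNIXTime").reset_index(drop=True)

# Gaps between consecutive samples, in seconds.
gaps = ordered["UNIXTime"].diff().dropna()
median_gap = gaps.median()

print(ordered)
print("median gap [s]:", median_gap)
```

On these five rows the gaps range from 297 to 305 seconds, which is consistent with the roughly 5-minute sampling stated in the introduction. Whether the full file needs the same sort depends on how the remaining rows are ordered, which the preview alone does not show.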


## Split dataset

`TimeSeriesSplit` provides train/test indices for splitting time series samples that are observed at fixed time intervals. In each successive split, the test indices must be higher than in the previous one, so shuffling in the cross-validator is inappropriate.

This cross-validation object is a variation of `KFold`: in the k-th split, it returns the first k folds as the training set and the (k+1)-th fold as the test set.
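
The expanding-window behaviour described above is easier to see on a small input. The following is a minimal sketch on a hypothetical 12-sample series (`toy` and `splits` are illustrative names, not part of the original notebook); it collects the index pairs produced by `TimeSeriesSplit(n_splits=5)`.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical toy series: 12 samples, assumed to be in chronological order.
toy = np.arange(12)

splits = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(toy):
    # Store plain lists so the pairs are easy to inspect and compare.
    splits.append((train_idx.tolist(), test_idx.tolist()))
    print("train:", train_idx, "test:", test_idx)
```

Each training set starts at index 0 and grows by one fold per split, and every test fold lies strictly after its training set, so no future sample leaks into training.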

In [8]:
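# Five expanding-window splits over the rows of df (positional indices)
tscv = TimeSeriesSplit(n_splits=5)
print(tscv)
# Each iteration yields the train indices followed by the next fold as test indices
for train, test in tscv.split(df):
    print(f"{train} {test}")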

TimeSeriesSplit(max_train_size=None, n_splits=5)
[   0    1    2 ... 5448 5449 5450] [ 5451  5452  5453 ... 10895 10896 10897]
[    0     1     2 ... 10895 10896 10897] [10898 10899 10900 ... 16342 16343 16344]
[    0     1     2 ... 16342 16343 16344] [16345 16346 16347 ... 21789 21790 21791]
[    0     1     2 ... 21789 21790 21791] [21792 21793 21794 ... 27236 27237 27238]
[    0     1     2 ... 27236 27237 27238] [27239 27240 27241 ... 32683 32684 32685]
