### Long Short-Term Memory (LSTM) neural network model:

LSTMs are a type of recurrent neural network (RNN) that carries a gated internal state from one time step to the next, so information from earlier observations can inform later predictions. They are particularly useful for time series data like the pollution data in this example, where past observations can be used to predict future values.
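
To make "past observations predict future values" concrete, a series is usually recast as supervised pairs: each input is a window of the previous few time steps, and the target is the value that follows. The sketch below illustrates this on a made-up series; the helper name `make_windows` and the readings are assumptions for illustration, not part of the notebook's data.

```python
def make_windows(series, n_timesteps):
    """Split a sequence into (window of past values, next value) pairs."""
    inputs, targets = [], []
    for i in range(n_timesteps, len(series)):
        inputs.append(series[i - n_timesteps:i])  # the n_timesteps values before position i
        targets.append(series[i])                 # the value the model should predict
    return inputs, targets


# Made-up daily readings, for illustration only
readings = [10, 12, 11, 15, 14, 13]
X, y = make_windows(readings, n_timesteps=3)

for window, target in zip(X, y):
    print(window, "->", target)
```

Six readings with a window of three give three training pairs, so the number of samples is always the series length minus the window length.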

LSTM models can capture both short-term and long-term dependencies in the data, making them well-suited for time series prediction tasks. They are also capable of handling non-linear relationships between inputs and outputs, which can be useful when dealing with complex, real-world data.
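
The mechanism behind these dependencies is the cell state, which a set of gates updates at every step. The following is a simplified single-unit sketch of one LSTM step using the standard gate equations. The weights are arbitrary made-up numbers, and `lstm_step` is an illustrative helper rather than anything Keras exposes.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM with a scalar input.

    w maps each gate name to (input weight, recurrent weight, bias).
    """
    def pre_activation(gate):
        w_x, w_h, b = w[gate]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # how much new information to admit
    g = math.tanh(pre_activation("candidate"))  # the new information itself
    o = sigmoid(pre_activation("output"))       # how much of the cell state to expose

    c = f * c_prev + i * g   # long-term memory
    h = o * math.tanh(c)     # short-term output passed to the next step
    return h, c


# Arbitrary weights, for illustration only
weights = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "candidate": (0.9, 0.3, 0.0),
    "output": (0.4, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x in [0.2, 0.5, 0.1]:
    h, c = lstm_step(x, h, c, weights)
    print(f"x={x:.1f}  h={h:+.4f}  c={c:+.4f}")
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through almost unchanged, which is how information can persist across many steps.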

One potential downside of LSTM models is that they can be computationally expensive to train, particularly when working with large datasets. Additionally, they can be difficult to interpret, which may make it challenging to understand how the model is making its predictions.
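
Part of the training cost comes from the number of weights: each of the four gates has its own input weights, recurrent weights, and biases. As a rough worked example, the helpers below (hypothetical names) apply the standard count of 4 × units × (units + n_features + 1) to a 50-unit LSTM layer with 17 input features, which matches the 19 columns in this dataset minus the two that are dropped. This counts parameters only; per-step compute also grows with the sequence length.

```python
def lstm_param_count(units, n_features):
    # Four gates, each with input weights, recurrent weights, and one bias per unit
    return 4 * units * (units + n_features + 1)


def dense_param_count(n_inputs, n_outputs):
    # One weight per input-output pair plus one bias per output
    return n_inputs * n_outputs + n_outputs


lstm_params = lstm_param_count(units=50, n_features=17)
dense_params = dense_param_count(n_inputs=50, n_outputs=1)

print(lstm_params, dense_params, lstm_params + dense_params)  # 13600 51 13651
```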

In [44]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

In [45]:
df = pd.read_excel('Final_Data.xlsx')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1461 entries, 0 to 1460
Data columns (total 19 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Date/Time                    1461 non-null   object 
 1   Air Quality Index            1461 non-null   float64
 2   Carbon Monoxide              1461 non-null   float64
 3   Hydrogen Sulphide            1448 non-null   float64
 4   Methane                      1459 non-null   float64
 5   Nitric Oxide                 1461 non-null   float64
 6   Nitrogen Dioxide             1461 non-null   float64
 7   Non-methane Hydrocarbons     1455 non-null   float64
 8   Outdoor Air Temperature      1461 non-null   float64
 9   Ozone                        1461 non-null   float64
 10  PM10 Mass                    412 non-null    float64
 11  PM2.5 Mass                   1461 non-null   float64
 12  Relative Humidity            1452 non-null   float64
 13  Std. Dev. of Wind 

In [49]:
df = df.drop('PM10 Mass', axis=1)
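
Dropping `PM10 Mass` follows from the `df.info()` counts: it has 412 non-null values out of 1461 rows, far fewer than any other listed column. The arithmetic is sketched below using the counts printed above. The 50% cut-off is an arbitrary assumption for illustration, not a rule taken from the notebook. On a live DataFrame, `df.isna().mean()` gives the same fractions directly.

```python
n_rows = 1461

# Non-null counts copied from the df.info() output for the columns with gaps
non_null = {
    "Hydrogen Sulphide": 1448,
    "Methane": 1459,
    "Non-methane Hydrocarbons": 1455,
    "PM10 Mass": 412,
    "Relative Humidity": 1452,
}

missing_fraction = {col: 1 - count / n_rows for col, count in non_null.items()}

threshold = 0.5  # assumed cut-off, for illustration only
to_drop = [col for col, frac in missing_fraction.items() if frac > threshold]

for col, frac in missing_fraction.items():
    print(f"{col:26s} {frac:6.1%}")
print("Columns above threshold:", to_drop)
```

Columns with only a handful of gaps are cheaper to handle by dropping or imputing the affected rows than by discarding the whole column.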

In [47]:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM

# Load the data
train_data = pd.read_excel('Final_Data.xlsx', sheet_name=None)

# Concatenate all sheets into a single dataframe
train_data = pd.concat(train_data, ignore_index=True)

# Drop the 'Date/Time' and 'PM10 Mass' columns
train_data = train_data.drop(['Date/Time', 'PM10 Mass'], axis=1)

# Drop rows with missing values
train_data = train_data.dropna()

# Scale the data
scaler = MinMaxScaler()
train_data_scaled = scaler.fit_transform(train_data)

# Split the data into inputs and outputs
n_timesteps = 3
# Look up the target column by name instead of hard-coding its position;
# in the df.info() listing, 'Air Quality Index' comes right after 'Date/Time',
# so index 8 would have pointed at a different column
target_idx = train_data.columns.get_loc('Air Quality Index')
train_X, train_y = [], []
for i in range(n_timesteps, len(train_data_scaled)):
    train_X.append(train_data_scaled[i-n_timesteps:i])
    train_y.append(train_data_scaled[i, target_idx])
train_X, train_y = np.array(train_X), np.array(train_y)

# train_X already has shape (samples, n_timesteps, n_features), so no reshape is needed
n_features = train_X.shape[2]

# Define the LSTM model
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(n_timesteps, n_features)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')

# Fit the model
model.fit(train_X, train_y, epochs=200, verbose=0)

# Make a prediction for the next day's Air Quality Index
last_3_days = train_data_scaled[-n_timesteps:]
last_3_days = last_3_days.reshape((1, n_timesteps, n_features))
prediction_scaled = model.predict(last_3_days)[0, 0]

# The scaler was fitted on all 17 columns, so scaler.inverse_transform() on the
# (1, 1) prediction raises "ValueError: non-broadcastable output operand with
# shape (1,1) doesn't match the broadcast shape (1,17)".
# Undo the min-max scaling for the target column only.
target_min = scaler.data_min_[target_idx]
target_max = scaler.data_max_[target_idx]
prediction = prediction_scaled * (target_max - target_min) + target_min
print(f"The predicted Air Quality Index for the next day is: {prediction:.2f}")
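
A `MinMaxScaler` fitted on all 17 columns expects 17 columns back in `inverse_transform`, so a single predicted value has to be rescaled with the statistics of its own column. Below is a minimal sketch of that arithmetic, assuming the default `feature_range=(0, 1)`. The numbers are made up and the helper names are hypothetical; with a fitted scaler, the per-column minimum and maximum are available as `scaler.data_min_` and `scaler.data_max_`.

```python
def minmax_scale(value, col_min, col_max):
    """Map a raw value into [0, 1] using one column's min and max."""
    return (value - col_min) / (col_max - col_min)


def minmax_inverse(scaled, col_min, col_max):
    """Map a scaled value back to the column's original units."""
    return scaled * (col_max - col_min) + col_min


# Made-up statistics for the target column, for illustration only
aqi_min, aqi_max = 1.0, 9.0

scaled_prediction = 0.25  # stand-in for a model output in scaled units
print(minmax_inverse(scaled_prediction, aqi_min, aqi_max))  # 3.0
```

An alternative is to fit a second scaler on the target column alone, which keeps `inverse_transform` usable at the cost of tracking two scaler objects.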