
In [None]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt

# Load the data
data1 = pd.read_csv("/content/train.csv")
data2 = pd.read_csv("/content/train_preprocessed.csv", compression='gzip')
data_sales = data1['Sales']

# Concatenate the data
rossman_processed_data = pd.concat([data2, data_sales], axis=1)
rossman_features = rossman_processed_data.iloc[:, :-1].values
rossman_target1 = rossman_processed_data.iloc[:, -1].values

# Normalizing the target variable
scaler = MinMaxScaler(feature_range=(0, 1))
rossman_target = scaler.fit_transform(rossman_target1.reshape(-1, 1))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(rossman_features, rossman_target, test_size=0.2, random_state=42)

# Reshape the data for LSTM input [samples, time steps, features]
X_train = X_train.reshape((-1, 1, X_train.shape[1]))
X_test = X_test.reshape((-1, 1, X_test.shape[1]))

# Build the LSTM model
model = Sequential()
model.add(LSTM(128, input_shape=(1, X_train.shape[2])))
model.add(Dense(1))

# Compile the model
model.compile(loss='mse', optimizer='adam')

# Train the model
history = model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=1)

# Evaluate the model
test_loss = model.evaluate(X_test, y_test)
print('Test loss:', test_loss)

# Make predictions
predictions = model.predict(X_test)

# Denormalizing the predicted values
predictions = scaler.inverse_transform(predictions)
y_test = scaler.inverse_transform(y_test)

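# Convert predictions to binary.
# Note: both arrays are already back in original sales units at this point,
# so the 0.5 threshold only separates values below 0.5 from everything else.
threshold = 0.5
predictions_binary = np.where(predictions >= threshold, 1, 0)
y_test_binary = np.where(y_test >= threshold, 1, 0)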

# Calculating accuracy
accuracy = accuracy_score(y_test_binary, predictions_binary)
print('Accuracy:', accuracy)

# Plot actual vs. predicted sales
plt.plot(y_test, label='Actual')
plt.plot(predictions, label='Predicted')
plt.xlabel('Data Point')
plt.ylabel('Sales')
plt.title('Actual vs. Predicted Sales')
plt.legend()
plt.show()

# Plot the loss values
loss_history = history.history['loss']
plt.plot(loss_history)
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training Loss')
plt.show()

Here's an explanation of the code in simple terms:

1. First, we import the necessary libraries and classes: pandas, numpy, MinMaxScaler, train_test_split, Sequential, LSTM, Dense, accuracy_score, and matplotlib.pyplot. These provide the tools for data manipulation, preprocessing, model building, evaluation, and visualization.

2. We load the data from two CSV files, "train.csv" and "train_preprocessed.csv", using pandas. The second file is read with compression='gzip', so it is expected to be gzip-compressed despite its .csv extension. From "train.csv" we keep only the Sales column.

3. We concatenate the preprocessed data and the Sales column side by side (axis=1) into a single DataFrame called "rossman_processed_data", with Sales as the last column. This relies on both files having their rows in the same order.

4. We separate the features (rossman_features, every column except the last) from the target variable (rossman_target1, the last column, Sales).

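5. The target variable is normalized using MinMaxScaler, which rescales the values to the range 0 to 1. The scaler is fitted on the whole target column before the split, so test-set values also influence the scaling.

6. The data is split into training and testing sets using the train_test_split function. 80% of the data is used for training (X_train and y_train), and 20% is used for testing (X_test and y_test). Setting random_state=42 makes the split reproducible.

7. The data is reshaped to fit the input requirements of an LSTM model. The shape is changed to (samples, time steps, features) using the reshape function. The number of time steps is 1, so each row becomes a sequence of length one.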

8. We build an LSTM model using the Sequential class from Keras. The model has an LSTM layer with 128 units followed by a Dense layer with 1 unit, which outputs the predicted (scaled) sales value.

9. The model is compiled with a mean squared error (MSE) loss function and the Adam optimizer.

10. The model is trained on the training data for 10 epochs, with a batch size of 32.

11. The model is evaluated on the testing data, and the test loss (MSE on the scaled target) is calculated and printed.

12. Predictions are made on the testing data using the trained model.

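13. The predicted values and the actual test values are denormalized using the scaler's inverse_transform function, which returns them to the original sales units.

14. The predicted and actual values are converted to binary values using a threshold of 0.5. Because this happens after denormalization, the threshold is applied in sales units: values below 0.5 become 0 and everything else becomes 1.

15. The accuracy of these binary predictions is calculated using the accuracy_score function. It measures how often the predicted and actual values fall on the same side of the threshold, not how close the predicted sales amounts are.

16. The actual and predicted sales values are plotted using matplotlib.

17. The loss values during training are plotted to visualize the training progress.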

18. For this run, the printed accuracy was 0.8630912004404203 (about 86.3%).
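
To make the scaling and reshaping steps more concrete, here is a small sketch on toy arrays. The numbers are made up and are not Rossmann data, and the min-max scaling is written out by hand with numpy on the assumption that it mirrors what MinMaxScaler does for the default range of 0 to 1. The names features, target, and features_3d are hypothetical stand-ins for the notebook's variables.

```python
import numpy as np

# Toy stand-ins for the notebook's arrays (made-up numbers, not Rossmann data).
features = np.arange(12, dtype=float).reshape(4, 3)   # 4 rows, 3 feature columns
target = np.array([0.0, 2500.0, 5000.0, 10000.0]).reshape(-1, 1)

# Min-max scaling to the range [0, 1]: (x - min) / (max - min).
t_min, t_max = target.min(), target.max()
target_scaled = (target - t_min) / (t_max - t_min)

# Inverse transform: map the scaled values back to the original units.
target_restored = target_scaled * (t_max - t_min) + t_min

# Reshape 2-D features (samples, features) into the 3-D layout an LSTM
# expects: (samples, time steps, features), with one time step per row.
features_3d = features.reshape((-1, 1, features.shape[1]))

print(target_scaled.ravel())   # scaled target values
print(features_3d.shape)       # (4, 1, 3)
```

The reshape does not change any values. It only adds a time-step axis of length one, so the LSTM sees each row as a one-step sequence.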
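
The thresholding and accuracy steps can be illustrated with a few made-up sales figures. This is only a sketch: the arrays y_true and y_pred are hypothetical, and the mean absolute error (MAE) and root mean squared error (RMSE) shown next to the accuracy are not computed in the notebook. They are included here as one possible way to see how large the misses are in sales units.

```python
import numpy as np

# Made-up sales figures in original units (hypothetical, for illustration only).
y_true = np.array([0.0, 5200.0, 0.0, 7400.0, 6100.0])
y_pred = np.array([150.0, 4800.0, 0.2, 9000.0, 6000.0])

# Same binarization as in the notebook: 1 if the value is at least 0.5.
threshold = 0.5
true_binary = np.where(y_true >= threshold, 1, 0)
pred_binary = np.where(y_pred >= threshold, 1, 0)

# Accuracy = share of positions where the two binary labels agree.
accuracy = np.mean(true_binary == pred_binary)

# Regression errors in sales units, which reflect the size of the misses.
mae = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

print("Accuracy:", accuracy)
print("MAE:", mae)
print("RMSE:", rmse)
```

In this toy case the prediction of 9000 against an actual value of 7400 counts as correct for the accuracy, because both values are above the threshold. The MAE and RMSE still register that miss.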
