# **Store Sales Forecasting with RNNs** 📈📉
# 2nd part - Building our ML Model

## Introduction ✏️

Time series forecasting is one of the most common and valuable tasks in business. It is hard, and no model can predict the future perfectly, but machine learning models can still produce useful estimates. Recurrent neural networks (RNNs) are a natural fit for time series data because they carry a memory state forward from previous time steps.
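To make the idea of a memory state concrete, here is a minimal NumPy sketch of a plain (Elman-style) recurrent step. It is an illustration only, not the model built in this notebook: the function name `rnn_forward`, the random weights, and the toy sales values are all assumptions made for the example.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run a plain RNN over a 1-D sequence and return every hidden state."""
    hidden = np.zeros(w_h.shape[0])  # the memory state starts at zero
    states = []
    for x_t in inputs:
        # The new state mixes the current input with the previous state
        hidden = np.tanh(w_x * x_t + w_h @ hidden + b)
        states.append(hidden)
    return np.array(states)

rng = np.random.default_rng(0)
hidden_size = 4
w_x = rng.normal(size=hidden_size)                 # input-to-hidden weights (random, untrained)
w_h = rng.normal(size=(hidden_size, hidden_size))  # hidden-to-hidden weights (random, untrained)
b = np.zeros(hidden_size)

sales = np.array([0.1, 0.5, 0.3, 0.9, 0.7])        # toy, made-up values
states = rnn_forward(sales, w_x, w_h, b)
print(states.shape)  # (5, 4): one hidden state per time step
```

Because each hidden state depends on the one before it, the state at a given step reflects the earlier values in the sequence as well as the current input.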

To apply this concept, we will use the [Store Sales - Time Series Forecasting](https://www.kaggle.com/c/store-sales-time-series-forecasting/data) dataset from Kaggle to predict a store's sales over the next two weeks. We will read, manipulate, and visualize the data, and then build a model to predict the sales.

In the first notebook, we analyzed the data and engineered features. In this one, we will build the model and apply it. Let's get started!

## Dependencies 👪

In [3]:
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import os

## Reading Data 📖

In [4]:
train_data = pd.read_csv('../data/train_data_cleaned.csv')
test_data = pd.read_csv('../data/test_data_cleaned.csv')

train_data.head()

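|   | date | store_nbr | family | sales | onpromotion | city | type_of_store | cluster | dcoilwtico | transactions | n_holidays |
|---|------|-----------|--------|-------|-------------|------|---------------|---------|------------|--------------|------------|
| 0 | 2013-01-01 | 1 | Others | 0.0 | 0 | Quito | D | 13 | 93.14 | NaN | 1.0 |
| 1 | 2013-01-01 | 1 | Others | 0.0 | 0 | Quito | D | 13 | 93.14 | NaN | 1.0 |
| 2 | 2013-01-01 | 1 | Others | 0.0 | 0 | Quito | D | 13 | 93.14 | NaN | 1.0 |
| 3 | 2013-01-01 | 1 | BEVERAGES | 0.0 | 0 | Quito | D | 13 | 93.14 | NaN | 1.0 |
| 4 | 2013-01-01 | 1 | Others | 0.0 | 0 | Quito | D | 13 | 93.14 | NaN | 1.0 |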


## Data Manipulation 📝
### Replacing Missing Values

In [5]:
train_data.isnull().sum()

date                  0
store_nbr             0
family                0
sales                 0
onpromotion           0
city                  0
type_of_store         0
cluster               0
dcoilwtico       857142
transactions     245784
n_holidays            0
dtype: int64

In [6]:
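# Forward-fill oil prices: carry the last known value into each gap
train_data['dcoilwtico'] = train_data['dcoilwtico'].ffill()
# Treat missing transaction counts as zero
train_data['transactions'] = train_data['transactions'].fillna(0)

test_data['dcoilwtico'] = test_data['dcoilwtico'].ffill()
test_data['transactions'] = test_data['transactions'].fillna(0)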
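The two filling strategies used above behave differently, which is easier to see on a tiny frame. The sketch below uses a made-up DataFrame called `toy` (the values are invented for illustration and are not taken from the dataset): the oil price column is forward-filled, while missing transaction counts are replaced with zero.

```python
import numpy as np
import pandas as pd

# Toy, made-up data with gaps in both columns
toy = pd.DataFrame({
    "date": pd.date_range("2013-01-01", periods=5, freq="D"),
    "dcoilwtico": [93.14, np.nan, np.nan, 92.97, np.nan],
    "transactions": [np.nan, 2111.0, np.nan, 1833.0, 1863.0],
})

# Forward fill: each gap takes the last observed price
toy["dcoilwtico"] = toy["dcoilwtico"].ffill()
# Constant fill: a missing count is treated as zero transactions
toy["transactions"] = toy["transactions"].fillna(0)

print(toy)
```

Note that a forward fill has nothing to carry into a gap at the very start of a column, so any leading missing values would remain `NaN` and need separate handling.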

### Correlation Matrix

In [7]:
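# numeric_only=True skips text columns such as family and city, which corr() cannot handle
corr = train_data.corr(numeric_only=True)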
fig = px.imshow(corr)
fig.update_layout(title='Correlation Matrix')