## Store Sales - Time Series Forecasting

### **Descriptions**

**train.csv:**
* store_nbr = identifies the store at which the products are sold.
* family = identifies the type of product sold.
* sales = total sales for a product family at a particular store at a given date. Fractional values are possible since products can be sold in fractional units (1.5 kg of cheese, for instance, as opposed to 1 bag of chips).
* onpromotion = the total number of items in a product family that were being promoted at a store at a given date.

**test.csv:**
* Same features as the training data. The target to predict is the sales for the dates in this file.
* The dates in the test data cover the 15 days after the last date in the training data.
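The 15-day horizon described for test.csv can be checked once both files are loaded. The following is a minimal sketch on toy dates, not the real files; the helper name `check_forecast_horizon` is hypothetical, and it assumes both frames carry a `date` column that pandas can parse.

```python
import pandas as pd

def check_forecast_horizon(train_dates, test_dates, horizon_days=15):
    """Return True if the test dates are exactly the `horizon_days` days after train ends."""
    train_end = pd.to_datetime(train_dates).max()
    # Distinct calendar days present in the test set, in order
    test_days = sorted(set(pd.to_datetime(test_dates)))
    expected = pd.date_range(train_end + pd.Timedelta(days=1),
                             periods=horizon_days, freq="D")
    return test_days == list(expected)

# Toy stand-ins (made-up dates, for illustration only)
train_toy = pd.DataFrame({"date": ["2020-01-30", "2020-01-31"]})
test_toy = pd.DataFrame({"date": pd.date_range("2020-02-01", periods=15).astype(str)})

horizon_ok = check_forecast_horizon(train_toy["date"], test_toy["date"])
```

With the real data, the same call would take `train["date"]` and the `date` column of the loaded test file.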

**stores.csv**

* Store metadata: city, state, type and cluster.
* Cluster is a grouping of similar stores.

**oil.csv**

* Daily oil price. Includes values during both the train and test data timeframes.
* Ecuador is an oil-dependent country and its economic health is highly vulnerable to shocks in oil prices.


**holidays_events.csv**

* Holidays and events, with metadata.
* Transferred column: a transferred holiday officially falls on that calendar day but was moved to another date, so the transferred day is more like a normal day than a holiday.
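One way to act on the transferred flag is to exclude flagged rows before treating a date as a holiday. The sketch below uses a toy frame with made-up rows; the column names `date` and `transferred` are assumptions about the file layout, and the real file may need extra handling.

```python
import pandas as pd

# Toy stand-in for holidays_events.csv (rows and column names are assumptions)
events_toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-05-24", "2020-05-29", "2020-12-25"]),
    "transferred": [True, False, False],
})

# Keep only days that were actually observed as holidays
observed = events_toy.loc[~events_toy["transferred"]]
holiday_dates = set(observed["date"])
```

Dates left out of `holiday_dates` are then treated like ordinary days when building holiday features.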

**Additional Notes**
* Wages in the public sector are paid every two weeks, on the 15th and on the last day of the month. Supermarket sales could be affected by this. (Seasonality?)
* A magnitude 7.8 earthquake struck Ecuador on `April 16, 2016`. People rallied in relief efforts, donating water and other basic necessities, which greatly affected supermarket sales for several weeks after the earthquake.
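Both notes could be turned into simple calendar flags. This is a minimal sketch on toy dates: the helper `add_calendar_flags` and the column names `is_payday` and `post_earthquake` are hypothetical, and the 28-day window is only an assumed reading of "several weeks".

```python
import pandas as pd

def add_calendar_flags(frame, quake_date="2016-04-16", quake_window_days=28):
    """Add payday and post-earthquake flags based on the `date` column."""
    out = frame.copy()
    dates = pd.to_datetime(out["date"])
    # Public-sector paydays: the 15th and the last day of each month
    out["is_payday"] = (dates.dt.day == 15) | dates.dt.is_month_end
    # Assumed window: the earthquake day plus the following 28 days (inclusive)
    quake_start = pd.Timestamp(quake_date)
    quake_end = quake_start + pd.Timedelta(days=quake_window_days)
    out["post_earthquake"] = dates.between(quake_start, quake_end)
    return out

toy = pd.DataFrame({"date": ["2016-04-15", "2016-04-16", "2016-04-30", "2016-06-01"]})
flags = add_calendar_flags(toy)
```

The window length is a tunable guess; plotting sales around the earthquake date would be a better basis for choosing it.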

In [1]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

### 1st Part: **EDA**

In [2]:
# Paths
# Forward slashes work on Windows, Linux and macOS
train_pth = "data/train.csv"
test_pth = "data/test.csv"
stores_pth = "data/stores.csv"
oil_pth = "data/oil.csv"
transactions_pth = "data/transactions.csv"
holidays_events_pth = "data/holidays_events.csv"

In [3]:
# Datasets
train = pd.read_csv(train_pth)
stores = pd.read_csv(stores_pth)
oil = pd.read_csv(oil_pth)
events = pd.read_csv(holidays_events_pth)
transactions = pd.read_csv(transactions_pth)

# Merge train with store metadata, then with daily transactions per store
df = pd.merge(train, stores, on='store_nbr')
# Note: merge_ordered defaults to an outer join
df = pd.merge_ordered(df, transactions, on=['date', 'store_nbr'])

In [4]:
df.head()

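|   | id | date | store_nbr | family | sales | onpromotion | city | state | type | cluster | transactions |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 2013-01-01 | 1 | AUTOMOTIVE | 0.0 | 0 | Quito | Pichincha | D | 13 | NaN |
| 1 | 1 | 2013-01-01 | 1 | BABY CARE | 0.0 | 0 | Quito | Pichincha | D | 13 | NaN |
| 2 | 2 | 2013-01-01 | 1 | BEAUTY | 0.0 | 0 | Quito | Pichincha | D | 13 | NaN |
| 3 | 3 | 2013-01-01 | 1 | BEVERAGES | 0.0 | 0 | Quito | Pichincha | D | 13 | NaN |
| 4 | 4 | 2013-01-01 | 1 | BOOKS | 0.0 | 0 | Quito | Pichincha | D | 13 | NaN |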


In [5]:
df.shape

(3000888, 11)
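Because `pd.merge_ordered` defaults to an outer join, it can add rows for any (date, store) pair that appears only in the transactions table, so it is worth comparing this row count with `len(train)`. The sketch below illustrates the difference on toy frames with made-up values; a left join is shown as one possible alternative, not as the approach used in this notebook.

```python
import pandas as pd

train_toy = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-01", "2020-01-02"],
    "store_nbr": [1, 2, 1],
    "sales": [0.0, 3.5, 2.0],
})
transactions_toy = pd.DataFrame({
    "date": ["2020-01-02", "2020-01-03"],
    "store_nbr": [1, 1],
    "transactions": [120, 95],
})

# Default outer join: keeps keys found in either frame
outer = pd.merge_ordered(train_toy, transactions_toy, on=["date", "store_nbr"])
# Left join: keeps exactly the rows of the left frame
left = pd.merge_ordered(train_toy, transactions_toy,
                        on=["date", "store_nbr"], how="left")

# Rows introduced by keys that exist only in transactions_toy
extra_rows = len(outer) - len(train_toy)
```

If `extra_rows` is zero on the real data, the outer join and the left join give the same rows.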

In [6]:
events.shape

(350, 6)

In [7]:
oil.date.is_unique

True
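Since each date appears only once in `oil`, the price can be joined onto the daily rows as a many-to-one merge without duplicating sales rows. A minimal sketch on toy frames follows; the column name `oil_price` and all values are made up for illustration.

```python
import pandas as pd

sales_toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
    "store_nbr": [1, 2, 1],
})
oil_toy = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02"]),
    "oil_price": [61.2, 60.8],  # hypothetical column name and values
})

# validate="many_to_one" raises MergeError if oil_toy has duplicate dates
merged = sales_toy.merge(oil_toy, on="date", how="left", validate="many_to_one")
```

The `validate` argument turns the uniqueness check into a guard that runs every time the merge is executed.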