# Fashion_Retail_Sales


# Outline

- [&nbsp;&nbsp;1.1 Tools](#l1.1)
- [&nbsp;&nbsp;1.2 Data Loading](#l1.2)
- [&nbsp;&nbsp;1.3 Goal](#l1.3)

- [2 Data Observation](#l2)
- [&nbsp;&nbsp;2.1 Null Values](#l2.1)
- [&nbsp;&nbsp;2.2 Duplicated Data](#l2.2)
- [&nbsp;&nbsp;2.3 General Observation](#l2.3)


- [3 Preprocessing](#l3)
- [&nbsp;&nbsp;3.1 Date Purchase](#l3.1)
- [&nbsp;&nbsp;3.2 Item Purchased](#l3.2)

- [4 Visualization](#l4)
- [&nbsp;&nbsp;4.1 Outliers](#l4.1)

- [5 Splitting Data](#l5)

- [6 Param Tuning](#l6)

- [7 Evaluation](#l7)

In [None]:
#%pip install -i https://test.pypi.org/simple/ vectice==23.4.2.0a11304232
%pip install vectice
%pip install vectice[autolog]

In [None]:
from vectice import autolog
autolog.config(api_token="your_api_key", phase="_")

<a name="l1.1"></a>
##  1.1 Tools
In this lab, we will make use of the following libraries:

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV, ShuffleSplit
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import ConfusionMatrixDisplay, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import RidgeClassifier
from sklearn.ensemble import GradientBoostingRegressor
import seaborn as sn
import warnings
from xgboost import XGBClassifier
warnings.filterwarnings('ignore')

<a name="l1.2"></a>
## 1.2 Data Loading

- The dataset used in this project is the Fashion Retail Sales dataset, loaded from `Fashion_Retail_Sales.csv`.

In [None]:
df = pd.read_csv("/Users/bryandaversa/kaggle_test/Fashion_Retail_Sales.csv")
df.head()

<a name="l1.3"></a>
## 1.3 Goal

<a name="l2"></a>
#  2 Data Observation

<a name="l2.1"></a>
##  2.1 Null Values

In [None]:
df.isna().sum()

In [None]:
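from sklearn.impute import KNNImputer
# Each column is imputed on its own. With a single feature KNNImputer has no
# other columns to measure distance on, so missing values get the column mean.
imputer = KNNImputer(n_neighbors=5)
df['Purchase Amount (USD)'] = imputer.fit_transform(df[['Purchase Amount (USD)']]).ravel()
df['Review Rating'] = imputer.fit_transform(df[['Review Rating']]).ravel()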

<a name="l2.2"></a>
## 2.2 Duplicated Data

In [None]:
df.duplicated().sum()

<a name="l2.3"></a>
## 2.3 General Observation

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
for column in df:
    print(f"{column} - {len(df[column].unique())} : {df[column].unique()}")

<a name="l3"></a>
#  3 Preprocessing

<a name="l3.1"></a>
##  3.1 Date Purchase

In [None]:
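def dateconvertor(row):
    # Called once per row via df.apply(..., axis=1), so `row` is a Series.
    # Assumes "Date Purchase" is ordered year-month-day; if the file stores
    # day-month-year instead, swap the indices used for "Year" and "Day".
    new_row = row.copy()
    token = row["Date Purchase"].split("-")
    new_row["Year"] = int(token[0])
    new_row["Month"] = int(token[1])
    new_row["Day"] = int(token[2])
    return new_row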

In [None]:
df1 = df.apply(dateconvertor,axis=1)
df1.head()
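
The split-based parser above depends on the order of the date components. As a minimal sketch, assuming the column holds day-first strings such as `05-02-2023` (an assumption; check `df["Date Purchase"].head()` first), `datetime.strptime` makes the expected format explicit and raises an error on anything else. The helper name `split_date` and the `DATE_FORMAT` constant are illustrative, not part of the notebook.

```python
from datetime import datetime

# Assumed format: day-month-year, e.g. "05-02-2023".
# Change to "%Y-%m-%d" if the file stores dates year-first.
DATE_FORMAT = "%d-%m-%Y"

def split_date(text, fmt=DATE_FORMAT):
    """Return (year, month, day) parsed from a date string."""
    parsed = datetime.strptime(text, fmt)
    return parsed.year, parsed.month, parsed.day

year, month, day = split_date("05-02-2023")
print(year, month, day)
```

A string in a different layout raises `ValueError` instead of silently putting the day into the `Year` column.
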

In [None]:
df1.drop(["Date Purchase", "Customer Reference ID"], axis = 1, inplace = True)

<a name="l3.2"></a>

## 3.2 Item Purchased

In [None]:
df1["Item Purchased"].value_counts()

In [None]:
df1

In [None]:
df2 = pd.get_dummies(df1, columns = ["Item Purchased"])
df2.head()
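
To make the effect of `pd.get_dummies` concrete, here is a dependency-free sketch of one-hot encoding: each distinct value becomes its own indicator column named `<column>_<value>`. The `one_hot` helper and the toy item names are illustrative and not taken from the dataset.

```python
def one_hot(values, prefix):
    """Build one indicator column per distinct value, named '<prefix>_<value>'."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }

items = ["Dress", "Jeans", "Dress", "Tunic"]  # toy values for illustration
encoded = one_hot(items, prefix="Item Purchased")
for name, column in encoded.items():
    print(name, column)
```

Every row has exactly one `1` across the generated columns, which is why the original `Item Purchased` column can be dropped after encoding.
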

In [None]:
columns = ["Purchase Amount (USD)", "Review Rating", "Year", "Month", "Day"]
scaler = StandardScaler()
df2[columns] = scaler.fit_transform(df2[columns])
df2.head()
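
`StandardScaler` rescales each column to zero mean and unit variance, using the population standard deviation. The arithmetic can be sketched without scikit-learn as follows; the `standardize` helper and the sample amounts are illustrative only.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: (value - mean) / population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)  # population std (ddof=0), matching StandardScaler
    return [(v - mean) / std for v in values]

amounts = [10.0, 20.0, 30.0]  # toy purchase amounts
z_scores = standardize(amounts)
print([round(z, 4) for z in z_scores])  # [-1.2247, 0.0, 1.2247]
```
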

In [None]:
class_name = ["Credit Card", "Cash"]
df2["Payment Method"] = df2["Payment Method"].replace(class_name, [1, 0])
df2.head()

In [None]:
df2.info()

<a name="l4"></a>
#  4 Visualization

In [None]:
df2.hist(figsize = (10, 10), rwidth = 0.95, color = "skyblue", grid = False)
# suptitle labels the whole figure; plt.title would only title the last subplot
plt.suptitle("Distributions")
plt.savefig("Distributions.png")

<a name="l4.1"></a>
##  4.1 Outliers

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(12, 6))

sn.boxplot(x=df1["Purchase Amount (USD)"], ax=axes[0])
axes[0].set_title("Box-plot of Purchase Amount")

sn.boxplot(x=df1["Review Rating"], ax=axes[1])
axes[1].set_title("Box-plot of Review Rating")

plt.tight_layout()
plt.savefig("box_plot.png")
plt.show()

In [None]:
def outliers(attr):
    # Rows of df1 whose value lies outside the 1.5 * IQR fences
    Q1 = df1[attr].quantile(0.25)
    Q3 = df1[attr].quantile(0.75)
    
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5*IQR
    upper_bound = Q3 + 1.5*IQR
    
    return df1[(df1[attr] < lower_bound) | (df1[attr] > upper_bound)]

In [None]:
len(outliers("Purchase Amount (USD)"))
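
As a worked example of the 1.5 * IQR rule used above, the sketch below computes the fences on a small toy list. The `quantile` helper uses linear interpolation between ranks, which is the default behaviour of pandas' `quantile`; the helper names and the values are illustrative and not drawn from the dataset.

```python
def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two closest ranks."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def iqr_outliers(values):
    """Return (values outside the 1.5 * IQR fences, (lower fence, upper fence))."""
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    lower_bound, upper_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [v for v in values if v < lower_bound or v > upper_bound]
    return flagged, (lower_bound, upper_bound)

amounts = [10, 12, 14, 15, 16, 18, 20, 95]  # toy values
flagged, bounds = iqr_outliers(amounts)
print(bounds, flagged)  # (6.0, 26.0) [95]
```

Here Q1 = 13.5 and Q3 = 18.5, so IQR = 5 and only the value 95 falls outside the fences.
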

<a name="l5"></a>
#  5 Splitting Data

In [None]:
X = df2.drop(["Payment Method"], axis = 1)
y = df2["Payment Method"]

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size = 0.2, random_state = 42)

In [None]:
XGB_model = XGBClassifier(n_estimators = 100, learning_rate=0.01)
XGB_model.fit(train_x, train_y)
XGB_model.score(test_x, test_y)

<a name="l7"></a>
# 7 Evaluation

In [None]:
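from sklearn.metrics import accuracy_score, roc_auc_score
y_pred = XGB_model.predict(test_x)
# Both metrics take the ground truth first. ROC AUC is computed from the
# predicted probability of the positive class rather than from hard labels.
y_proba = XGB_model.predict_proba(test_x)[:, 1]
roc_auc = roc_auc_score(test_y, y_proba)
accuracy = accuracy_score(test_y, y_pred)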
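
ROC AUC measures how well a model ranks positives above negatives, so it is more informative when computed from scores or probabilities than from hard 0/1 predictions. The dependency-free sketch below shows the difference on toy values (not results from this notebook); the `roc_auc` helper is illustrative and counts ties as half a win.

```python
def roc_auc(y_true, y_score):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

y_true = [0, 0, 1, 1]
probabilities = [0.1, 0.3, 0.2, 0.4]  # toy scores for the positive class
hard_labels = [1 if p >= 0.5 else 0 for p in probabilities]

auc_from_scores = roc_auc(y_true, probabilities)
auc_from_labels = roc_auc(y_true, hard_labels)
print(auc_from_scores, auc_from_labels)  # 0.75 0.5
```

Thresholding throws away the ranking information: every toy prediction becomes 0, so the label-based AUC collapses to 0.5 even though the scores rank three of four pairs correctly.
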

In [None]:
# classification_report expects (y_true, y_pred); swapping them exchanges precision and recall
print(classification_report(test_y, y_pred))
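
`classification_report` takes the ground truth as its first argument and the predictions as its second, and the order matters because precision and recall are not symmetric. A minimal sketch with toy labels (the `precision_recall` helper is illustrative):

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0]  # toy ground truth
y_pred = [1, 0, 0, 0, 0]  # toy predictions

correct_order = precision_recall(y_true, y_pred)
swapped_order = precision_recall(y_pred, y_true)
print(correct_order, swapped_order)
```

With the arguments in the right order the toy model has perfect precision and a recall of one third; swapping them reports the two numbers the other way round.
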

In [None]:
autolog.notebook()