xGabrielR/Rossmann-Store-Sales


Rossmann Store Sales -> Data Science Project

rossmann

Full PDF Documentation PT-BR.

0. Rossmann Stores Data and Info

Rossmann is a chain of pharmacies located in Europe, mainly in Germany, with around 56,200 employees and more than 4,000 stores. The company was founded by Dirk Rossmann and has its headquarters in Burgwedel, near Hanover, Germany (source: Wikipedia).

What is a pharmacy chain?
A chain starts with a single shop, called the 'matrix'; every shop opened after it is a branch. There are two types of chain: an associated chain, in which several pharmacies with different owners join together, and a private chain, which grows out of the 'matrix' company and has a single general owner.

At a monthly results meeting, Rossmann's CFO asked all store managers for a sales forecast for the next six months.

The CFO wants to know future sales in order to start renovating all the stores.

Pharmaceutical Business Model

Rossmann operates through e-commerce and physical stores, selling household items, makeup and, of course, drugstore items. Because the chain is spread over several parts of Europe, it can select regions with greater growth potential and lower competition.

First Assumptions

Based on the course Seja Um Data Scientist.

    Market Size.
    All persons over 18 years old, with preference for older persons.
    Marketing Channels.
    Rossmann Website & Shops.
    Principal Metrics.
    Channel Offline: Working on physical stores.
    Recency: Purchases over time.
    Frequency: Shop sales frequency for sales forecast.
    Market Share: Sales competitions.

chanel

  1. Do older customers buy more from physical stores or from competitors?
  2. How does the marketing investment in e-commerce compare with the investment in physical stores?
  3. What are the new products that make customers buy from Rossmann stores instead of competing stores?
  4. How do these stores handle receiving new merchandise to sell to new customers?
  5. Are the products sold easily accessible?
  6. How do product prices vary with the location of the stores?
  7. How are Rossmann products and stores being evaluated?
  8. What is the buying process like for these customers?
  9. Would a customer who bought in a physical store buy again?
  10. How much does a customer cost for physical stores?
  11. Who are the main partners of the Rossmann brand?
  12. Is there a community of customers where they can engage with the products? ...

Data Information at: https://www.kaggle.com/c/rossmann-store-sales

First deployment: Telegram Bot.

rossbot

Second deployment: executable software.

In development.

1. Business Problem

Rossmann's CFO would like to predict how much revenue the stores will generate, in order to renovate them in the future.

The CFO asked every store manager to send him this prediction. Faced with this request, the store managers turned to the data/analysis team to produce it.

New Version of project (04/02/2021)

2. Solution Strategy & Assumptions

First CRISP Cycle

2.1. After Stakeholder Interview

How might we identify the budget needed for store renovation?

2.2. Data Product

An AI model that forecasts sales, accessible from a smartphone.

    Data Cleaning & Descriptive Statistics.
    The first real step is to download the dataset, import it into Jupyter and work through seven steps: changing data types, checking data dimensions, filling in NAs... For the first statistical view of the dataframe, I used simple descriptive statistics to check how the data is organized.
    Feature Engineering.
    In this step I used coggle.it to build a mind map and derived a list of hypotheses from it. From that list, I created some new features based on date.
    Data Filtering.
    A simple way to reduce the dimensionality of the dataset.
    Exploratory Data Analysis.
    Validation of the full hypothesis list against the data.
    Data Preparation.
    Split & Prepare and Prepare & Split: two versions of the preparation, compared because the order of the steps can cause data leakage.
    Machine Learning Modeling.
    Selection of four ML models: a baseline, a linear model and two tree-based models.
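The date-based features mentioned in the Feature Engineering step can be sketched as follows. This is a minimal illustration that assumes the date arrives as an ISO string (for example "2015-07-31"); the function name `date_features` and the chosen feature names are hypothetical and not taken from the project notebooks.

```python
from datetime import date


def date_features(iso_day: str) -> dict:
    """Derive simple calendar features from an ISO date string (YYYY-MM-DD)."""
    d = date.fromisoformat(iso_day)
    # isocalendar() yields (ISO year, ISO week number, ISO weekday)
    iso_year, iso_week, _ = d.isocalendar()
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "week_of_year": iso_week,
        "year_week": f"{iso_year}-{iso_week:02d}",
    }


features = date_features("2015-07-31")
print(features)
```

Features like these make it possible to test hypotheses about seasonality (for example, sales by month or by week of the year) during the exploratory analysis.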

3. EDA Insights

After brainstorming and hypothesis validation, some insights appeared.

Top 3 Insights

  • Stores with a larger assortment sell less.
  • sales

  • Stores with consecutive promotions sell less when the promotion runs for a long time.
  • promo

  • Stores with nearby competitors sell more.
  • less

4. Data Preparation

When a dataset has "date" features, data leakage can occur during model training. I used two types of preparation, one that splits train and test after data preparation and one that splits before it, to check for data leakage.
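A leakage-safe ordering can be sketched as below: split by date first, then fit any preparation parameters on the training rows only and reuse them on the test rows. This is an illustration with made-up toy values, not the project's pipeline; the helper names, the `customers` field and the min-max rescaling are assumptions chosen only to show the ordering.

```python
def time_split(rows, cutoff):
    """Split rows chronologically; ISO date strings compare correctly as text."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


def fit_min_max(values):
    """Learn the rescaling parameters from the given values only."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Rescale values with parameters that were learned elsewhere."""
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


# Toy rows (hypothetical values, for illustration only)
rows = [
    {"date": "2015-06-01", "customers": 500},
    {"date": "2015-06-08", "customers": 650},
    {"date": "2015-06-15", "customers": 800},
    {"date": "2015-07-20", "customers": 1200},
]

train, test = time_split(rows, cutoff="2015-07-01")

# Fit on train only, then apply the same parameters to test
lo, hi = fit_min_max([r["customers"] for r in train])
train_scaled = apply_min_max([r["customers"] for r in train], lo, hi)
test_scaled = apply_min_max([r["customers"] for r in test], lo, hi)
print(train_scaled, test_scaled)
```

Fitting the parameters on the full dataset before splitting would let information from the future test period influence the training features, which is the leak being checked for.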

    Categorical Data.
    Frequency Encoding was used for all categorical data.
    Normalization.
    After the KS test and the QQ plot, normalization was not necessary because the data does not follow a normal distribution.
    Nature Transformation.
    Sin/Cos transformation for seasonal data.
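The Sin/Cos transformation for seasonal data can be sketched as follows. It is a minimal example, assuming a month number with a period of 12; the function names are hypothetical.

```python
import math


def cyclical_encode(value, period):
    """Map a periodic value (e.g. month 1..12) onto the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two values after sin/cos encoding."""
    sin_a, cos_a = cyclical_encode(a, period)
    sin_b, cos_b = cyclical_encode(b, period)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)


# December and January are neighbours on the circle, December and June are opposite
dec_to_jan = encoded_distance(12, 1, period=12)
dec_to_jun = encoded_distance(12, 6, period=12)
print(dec_to_jan, dec_to_jun)
```

The point of the encoding is that December and January end up close together, which a plain 1 to 12 integer encoding does not capture.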

4.1. Frequency Encoding

It is an encoding method that takes into account the number of times each value appears. For example, in 10 records of which 5 are blue and 5 are red, each value has a frequency of 0.5 (50%).
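A minimal sketch of frequency encoding, using ten records with five "blue" and five "red" values. The function name `frequency_encode` is hypothetical and the project's own implementation may differ.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


colors = ["blue"] * 5 + ["red"] * 5
encoded = frequency_encode(colors)
print(encoded)
```

In a leakage-aware setup, the frequencies would be computed on the training rows and then reused to encode the test rows.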

4.2. QQPlot

With a QQ plot (Quantile-Quantile plot) it is possible to observe how close the tested distribution is to a normal distribution. The data is considered normally distributed when the blue line coincides with the red line. There are other ways of doing this check, such as statistical tests.
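The points behind a QQ plot can be computed with the standard library alone. The sketch below pairs each sorted observation with the quantile a normal distribution of the same mean and standard deviation would have; the plotting positions (i + 0.5) / n are one common convention and are an assumption here, as is the function name.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (theoretical normal quantile, observed value) pairs."""
    ordered = sorted(sample)
    n = len(ordered)
    # Reference normal distribution fitted to the sample itself
    reference = NormalDist(mean(ordered), stdev(ordered))
    return [
        (reference.inv_cdf((i + 0.5) / n), observed)
        for i, observed in enumerate(ordered)
    ]


# Toy right-skewed sample (hypothetical values)
points = qq_points([1, 2, 2, 3, 3, 3, 4, 4, 5, 20])
for theoretical, observed in points:
    print(round(theoretical, 2), observed)
```

If the data were normal, the two numbers in each pair would be close and the points would fall on a straight line; large gaps in the tails indicate a departure from normality.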

1

4.3. Feature Selection

XGBoost feature importance is a fast and effective way to see which features matter. Feature selection then keeps only the most relevant features, which improves model performance and follows the principle of Occam's Razor.

feature_importance

Feature selection is one of the most important steps in a data science project.

5. Machine Learning Models

I used three models: SVR (Support Vector Regression), Random Forest and XGBoost (gradient-boosted decision trees).

models

I selected XGBoost over the other two models for production. In the hyperparameter fine-tuning step I used a technique called Random Search, and I tested the trained model on the dataset with data leakage and on the dataset without data leakage. The details are in the notebook m03_machine_learningII.

Neural network performance over approx. 40 epochs.
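Random Search can be sketched as follows: draw hyperparameter combinations at random, score each one and keep the best. The search space values are illustrative, and `toy_error` is a stand-in for training a model and measuring its validation error; neither comes from the project notebook.

```python
import random


def random_search(param_space, score_fn, n_iter, seed=0):
    """Sample n_iter random combinations and return the lowest-error one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_iter):
        candidate = {name: rng.choice(options) for name, options in param_space.items()}
        score = score_fn(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score


# Illustrative search space (values are assumptions, not the project's)
param_space = {
    "n_estimators": [100, 500, 1000],
    "eta": [0.01, 0.03, 0.1],
    "max_depth": [3, 5, 9],
}


def toy_error(params):
    """Stand-in for 'train the model and return its validation error'."""
    return abs(params["max_depth"] - 5) + abs(params["eta"] - 0.03) * 10


best_params, best_error = random_search(param_space, toy_error, n_iter=10)
print(best_params, best_error)
```

Compared with an exhaustive grid, random search evaluates only a fixed number of combinations, which keeps tuning time bounded when each evaluation means retraining the model.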

nn

6. Business Results

This step converts the model performance into money.

Below is the model performance for two of the hardest stores to forecast. There are stores whose sales the algorithm cannot predict well, so their RMSE was high and their MAE was higher too. Reducing this error requires training the model further and working on better features. There are two columns, worst and best scenario, obtained by adding and subtracting the MAE, respectively, for each model forecast.
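The error metrics and the two scenario columns can be sketched as below. The numbers are toy values, and applying the MAE once to the store's summed forecast is an assumption about how the scenarios are built; the variable names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Toy daily sales for one store (hypothetical values)
actual = [100.0, 120.0, 90.0, 110.0]
predicted = [110.0, 115.0, 100.0, 105.0]

store_mae = mae(actual, predicted)
store_rmse = rmse(actual, predicted)

# Scenario columns: summed forecast shifted by the store's MAE in each direction
total_forecast = sum(predicted)
scenario_minus_mae = total_forecast - store_mae
scenario_plus_mae = total_forecast + store_mae
print(store_mae, store_rmse, scenario_minus_mae, scenario_plus_mae)
```

Expressing the error in the same unit as sales is what lets the forecast be read as a monetary range rather than a single number.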

hard_shops

Below is the sum of sales for each scenario.

model_money

7. Model Deployment

For deployment I selected Heroku as the cloud platform, since it is free and available 24/7.

I built a Telegram bot and a personal '.exe' app so the CFO can check the sales on smartphone and desktop.

sales

img
