## PyCaret and Streamlit: Create and Deploy Wine Classifier Web App

Originally appeared as a Medium article by Ruben Winastwan. The article can be found at
```https://towardsdatascience.com/pycaret-and-streamlit-how-to-create-and-deploy-data-science-web-app-273d205271a3```

### Modified by: Venki Ramachandran as part of his Data Science Practice

In [30]:
import pandas as pd
import numpy as np

In [31]:
# Load the data set
wine_df = pd.read_csv("winequality-red.csv")

In [32]:
wine_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5


In [33]:
# What is the range of the quality column?
wine_df.quality.value_counts()

5    681
6    638
7    199
4     53
8     18
3     10
Name: quality, dtype: int64

In [34]:
# Treat a quality score of 6 or higher as a good wine
# Replace the value in 'quality' with 'Good' or 'Bad' based on that threshold
wine_df.quality = np.where(wine_df.quality >= 6,'Good', 'Bad')
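
To sanity-check the relabeling, the class balance can be worked out from the `value_counts()` output above. This is a minimal sketch that types the counts in by hand instead of reading the CSV, so the `counts` series is an assumption copied from that output:

```python
import numpy as np
import pandas as pd

# Quality counts copied from the value_counts() output above.
counts = pd.Series({5: 681, 6: 638, 7: 199, 4: 53, 8: 18, 3: 10})

# Same rule as the notebook: a score of 6 or higher is 'Good', otherwise 'Bad'.
labels = np.where(counts.index >= 6, 'Good', 'Bad')

# Sum the row counts per label to get the class balance.
balance = counts.groupby(labels).sum()
print(balance)
```

With this threshold the data has 855 'Good' and 744 'Bad' rows, so the two classes are reasonably balanced.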

In [35]:
wine_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,Bad
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,Bad
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,Bad
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,Good
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,Bad


### Install PyCaret using the Terminal or an iTerm window

```pip install pycaret```

<img src="./PyCaret-Install-MacOs.png">

Check that you get the success message shown above after the install.

PyCaret is AWESOME. It is a low-code machine learning library that automates most of the machine learning workflow. It provides a wrapper around popular machine learning libraries such as scikit-learn, XGBoost, LightGBM, CatBoost, and many more.

In [36]:
# Step 1: set up the environment
from pycaret.classification import *
exp_clf01 = setup(data = wine_df, target = 'quality', session_id = 123)

Unnamed: 0,Description,Value
0,session_id,123
1,Target,quality
2,Target Type,Binary
3,Label Encoded,"Bad: 0, Good: 1"
4,Original Data,"(1599, 12)"
5,Missing Values,False
6,Numeric Features,11
7,Categorical Features,0
8,Ordinal Features,False
9,High Cardinality Features,False


2021-05-12 12:35:19.411 INFO    logs: create_model_container: 0
2021-05-12 12:35:19.412 INFO    logs: master_model_container: 0
2021-05-12 12:35:19.413 INFO    logs: display_container: 1
2021-05-12 12:35:19.418 INFO    logs: Pipeline(memory=None,
         steps=[('dtypes',
                 DataTypes_Auto_infer(categorical_features=[],
                                      display_types=True, features_todrop=[],
                                      id_columns=[],
                                      ml_usecase='classification',
                                      numerical_features=[], target='quality',
                                      time_features=[])),
                ('imputer',
                 Simple_Imputer(categorical_strategy='not_available',
                                fill_value_categorical=None,
                                fill_value_numerical=None,
                                numeric_stra...
                ('scaling', 'passthrough'), ('P_transform', 'p

We passed the following parameters:

    * data: Our input data.
    * target: The name of the column that we want to predict (dependent variable).
    * session_id: The random seed for the experiment. Reusing the same value makes the results reproducible.


Look at the output carefully, because the function sometimes infers the data types incorrectly. If you find that one of the features is inferred incorrectly, you can correct it by doing the following:

```exp_clf01 = setup(data = wine_df, target = 'quality', session_id = 123, categorical_features = ['feature1', 'feature2'], numerical_features = ['feature3', 'feature4'])```

You can use the categorical_features or numerical_features parameter to override the data types that the setup() function inferred incorrectly.
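
Before calling setup(), it can help to look at the pandas dtypes yourself. The sketch below uses a small made-up frame (the `vintage_label` column is hypothetical and is not part of the wine data) and `select_dtypes` to list candidates for each parameter. PyCaret applies its own heuristics on top of the dtypes, so treat this only as a first check:

```python
import pandas as pd

# Toy frame: two real column names from the wine data plus one
# hypothetical text column ('vintage_label') added for illustration.
df = pd.DataFrame({
    'alcohol': [9.4, 9.8, 9.8],
    'pH': [3.51, 3.20, 3.26],
    'vintage_label': ['old', 'new', 'new'],
})

# Columns pandas stores as numbers vs. everything else.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
categorical_cols = df.select_dtypes(exclude='number').columns.tolist()

print('numeric:', numeric_cols)
print('categorical:', categorical_cols)
```

The two lists can then be passed as `numerical_features` and `categorical_features` if the summary printed by setup() disagrees with them.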

In [37]:
# Building a machine learning model
# With PyCaret, you can compare the performance of
# different kinds of classification models with a single line of code
best = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
rf,Random Forest Classifier,0.8222,0.8973,0.8384,0.8357,0.8364,0.6416,0.6429,0.054
et,Extra Trees Classifier,0.8159,0.9044,0.8319,0.8302,0.8306,0.629,0.6299,0.043
xgboost,Extreme Gradient Boosting,0.8132,0.8799,0.822,0.8327,0.8267,0.6242,0.6254,0.291
lightgbm,Light Gradient Boosting Machine,0.8132,0.8849,0.8204,0.8346,0.8266,0.6242,0.6257,0.593
gbc,Gradient Boosting Classifier,0.7855,0.8593,0.799,0.8071,0.8018,0.5682,0.5703,0.026
ridge,Ridge Classifier,0.7569,0.0,0.7497,0.791,0.7688,0.5131,0.5151,0.006
lr,Logistic Regression,0.7507,0.8174,0.748,0.7825,0.7642,0.5,0.5015,0.504
lda,Linear Discriminant Analysis,0.7489,0.8173,0.7513,0.7779,0.7635,0.496,0.4974,0.005
dt,Decision Tree Classifier,0.7444,0.7411,0.7809,0.7568,0.7684,0.4835,0.4841,0.005
nb,Naive Bayes,0.7418,0.8043,0.7646,0.7615,0.7621,0.4798,0.4811,0.005


2021-05-12 12:35:40.933 INFO    logs: create_model_container: 14
2021-05-12 12:35:40.934 INFO    logs: master_model_container: 14
2021-05-12 12:35:40.934 INFO    logs: display_container: 2
2021-05-12 12:35:40.936 INFO    logs: RandomForestClassifier(bootstrap=True, ccp_alpha=0.0, class_weight=None,
                       criterion='gini', max_depth=None, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=100,
                       n_jobs=-1, oob_score=False, random_state=123, verbose=0,
                       warm_start=False)
2021-05-12 12:35:40.936 INFO    logs: compare_models() succesfully completed......................................


### Experiment 2: Tuned Setup
Before we go further, let’s see whether we can improve the performance of the models by tuning our setup() function.

In [11]:
exp_clf102 = setup(data = wine_df, target = 'quality',session_id=123, normalize = True, transformation = True)

Unnamed: 0,Description,Value
0,session_id,123
1,Target,quality
2,Target Type,Binary
3,Label Encoded,"Bad: 0, Good: 1"
4,Original Data,"(1599, 12)"
5,Missing Values,False
6,Numeric Features,11
7,Categorical Features,0
8,Ordinal Features,False
9,High Cardinality Features,False


In [12]:
# Run compare_models() again with the tuned setup
best = compare_models()

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC,TT (Sec)
et,Extra Trees Classifier,0.8231,0.9032,0.8434,0.8338,0.8379,0.6431,0.6443,0.041
rf,Random Forest Classifier,0.8222,0.8976,0.8351,0.838,0.8359,0.6418,0.643,0.051
lightgbm,Light Gradient Boosting Machine,0.8141,0.8835,0.8237,0.8337,0.8275,0.626,0.6277,0.529
xgboost,Extreme Gradient Boosting,0.8132,0.8799,0.822,0.8327,0.8267,0.6242,0.6254,0.22
gbc,Gradient Boosting Classifier,0.7873,0.8596,0.7991,0.8095,0.803,0.572,0.5741,0.026
lr,Logistic Regression,0.7525,0.8201,0.7727,0.7719,0.7711,0.5015,0.5032,0.009
qda,Quadratic Discriminant Analysis,0.7507,0.8123,0.776,0.7679,0.7711,0.4972,0.4985,0.004
ridge,Ridge Classifier,0.7498,0.0,0.7595,0.775,0.7659,0.4972,0.499,0.005
lda,Linear Discriminant Analysis,0.7498,0.8215,0.7595,0.775,0.7659,0.4972,0.499,0.005
dt,Decision Tree Classifier,0.7444,0.7413,0.7793,0.7578,0.768,0.4837,0.4844,0.006


### Normalize and Transform the features
We passed several additional parameters there to tune our setup:

    * normalize: Rescales the numeric features to a common scale (by default a z-score, i.e. zero mean and unit variance).
    * transformation: Applies a power transform so that the features are closer to a normal distribution. This can be helpful for models like Logistic Regression, LDA, or Gaussian Naive Bayes.
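
To get a feel for what these two options do, here is a sketch with scikit-learn on synthetic skewed data. It assumes that PyCaret's defaults are z-score scaling for `normalize` and a Yeo-Johnson power transform for `transformation`; that is worth checking against the documentation of your PyCaret version, and this is not PyCaret's internal code:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic right-skewed feature (log-normal); not taken from the wine data.
rng = np.random.default_rng(123)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

# z-score scaling: subtract the mean, divide by the standard deviation.
x_scaled = StandardScaler().fit_transform(x)

# Yeo-Johnson power transform: reshapes the distribution to be closer to normal.
x_power = PowerTransformer(method='yeo-johnson').fit_transform(x)

skew_before = pd.Series(x.ravel()).skew()
skew_scaled = pd.Series(x_scaled.ravel()).skew()
skew_power = pd.Series(x_power.ravel()).skew()
print(f'skew raw={skew_before:.2f} scaled={skew_scaled:.2f} power={skew_power:.2f}')
```

Scaling alone changes the units but leaves the shape of the distribution (its skew) unchanged, while the power transform pulls the long tail in. Tree-based models such as Extra Trees are largely insensitive to both, which fits the small changes in the tables above.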

Also note that we use the same session_id as our previous setup() function. This is to make sure that all of the future improvements on the model are solely due to the change that we’ve implemented in this setup() function.

As you can see, most of the metrics improve slightly after we tune the setup. For the Extra Trees classifier, the F1 score goes from 0.8306 before tuning to 0.8379 after.
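
To put a number on "slightly", the Extra Trees rows of the two tables can be compared directly. A small sketch, with the values typed in from the two compare_models() outputs above:

```python
import pandas as pd

# Extra Trees rows copied from the two compare_models() tables above.
metrics = pd.DataFrame(
    {
        'Accuracy': [0.8159, 0.8231],
        'AUC': [0.9044, 0.9032],
        'F1': [0.8306, 0.8379],
    },
    index=['default setup', 'normalize + transformation'],
)

# Change in each metric after tuning setup(); positive means improvement.
delta = (metrics.loc['normalize + transformation'] - metrics.loc['default setup']).round(4)
print(delta)
```

Accuracy and F1 move up by about 0.007 while AUC moves down by about 0.001, so the gain is real but small.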


In [13]:
# Based on this result, let's build our Extra Trees classifier. We can do this with a single line of code
et_model = create_model('et')

Unnamed: 0,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,0.8304,0.9087,0.7869,0.8889,0.8348,0.6618,0.667
1,0.8214,0.9182,0.8361,0.8361,0.8361,0.64,0.64
2,0.8482,0.9346,0.8525,0.8667,0.8595,0.6945,0.6946
3,0.8393,0.9039,0.8689,0.8413,0.8548,0.6749,0.6754
4,0.8393,0.8941,0.8852,0.8308,0.8571,0.6739,0.6757
5,0.8482,0.9261,0.8525,0.8667,0.8595,0.6945,0.6946
6,0.8214,0.8987,0.8689,0.8154,0.8413,0.6377,0.6393
7,0.7768,0.8699,0.8333,0.7692,0.8,0.5484,0.5506
8,0.7768,0.8827,0.8,0.7869,0.7934,0.5507,0.5508
9,0.8288,0.8953,0.85,0.8361,0.843,0.6549,0.655
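
Each row in this table is one cross-validation fold (ten here), and the score reported by compare_models() is the average over the folds. A quick check with NumPy, with the Accuracy column typed in by hand from the output above:

```python
import numpy as np

# Per-fold accuracy copied from the create_model('et') output above.
fold_accuracy = np.array([0.8304, 0.8214, 0.8482, 0.8393, 0.8393,
                          0.8482, 0.8214, 0.7768, 0.7768, 0.8288])

mean_acc = fold_accuracy.mean()
std_acc = fold_accuracy.std()  # population standard deviation across folds

print(f'mean={mean_acc:.4f} std={std_acc:.4f}')
```

The mean matches the 0.8231 accuracy shown for Extra Trees in the comparison table, and the fold-to-fold spread of roughly 0.025 is a useful reminder that differences of less than a percentage point between models are within the noise.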


In [15]:
# Next, you can evaluate your model by looking at the visualization of the ROC curve, 
# feature importance, or confusion matrix of your model.
evaluate_model(et_model)

# Click on each of the buttons on the Plot Type and see the magic


We can now use our Extra Trees classifier to predict the test data that PyCaret set aside. When we ran the setup() function at the very first step, PyCaret automatically split our data into a training set and a test set. All of the performance metrics that we've seen so far are cross-validation scores computed on the training set only.

To use the model to predict the test data, we can use the predict_model() function.

In [16]:
predict_model(et_model)

Unnamed: 0,Model,Accuracy,AUC,Recall,Prec.,F1,Kappa,MCC
0,Extra Trees Classifier,0.7875,0.8858,0.8387,0.7704,0.8031,0.5732,0.5757


Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,Label,Score
0,1.020588,-0.262337,0.879940,0.909526,-0.811347,-1.539989,-1.767543,-0.001569,0.179457,-0.026319,1.507669,Good,Good,0.88
1,0.803912,0.365825,0.069809,0.182406,0.169895,0.853181,1.990040,0.562903,-0.216540,-0.678448,-1.071227,Bad,Bad,0.95
2,-0.548070,1.883447,-0.893328,-0.877415,-0.243956,0.626979,0.590835,-0.012134,0.309834,-0.576033,-1.403862,Bad,Bad,0.79
3,0.107601,-1.862275,0.790650,-0.626787,-0.545447,-0.666601,-1.164981,-1.619909,-0.619242,-0.783892,1.558527,Good,Good,0.85
4,-0.094336,-0.848967,-0.143101,-0.877415,-0.079076,0.853181,0.377180,-0.235514,-0.150025,0.344294,-1.234888,Bad,Bad,0.62
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
475,-0.548070,-0.140215,0.322042,-0.184964,-1.025574,1.330936,0.533354,0.282229,-0.017589,0.205814,-0.205689,Good,Good,1.00
476,1.829642,-0.021461,1.097954,0.007663,0.466399,-1.351598,-1.357751,1.774647,-0.754683,-2.001079,-0.760163,Bad,Bad,1.00
477,-0.629806,-0.781026,0.222889,-0.877415,-0.301560,1.330936,0.934478,0.056549,0.761128,-0.287161,-0.912956,Good,Good,1.00
478,0.936346,-1.341807,0.654205,-1.148794,-1.025574,-1.735120,-1.697007,-1.047378,-1.095523,-0.476666,1.112170,Good,Good,0.90


### Test Data

Check that the output above has 480 rows, which is the size of the test set that PyCaret created when it automatically split the data into training and test sets.
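
Where does 480 come from? The sketch below reproduces the arithmetic with NumPy, assuming PyCaret's default hold-out split of 70% training and 30% test. It illustrates the idea of a seeded split (the role that session_id plays) and is not PyCaret's actual implementation; `split_indices` is a helper name made up for this example:

```python
import numpy as np

n_rows = 1599         # rows in the wine data, from the setup() summary
train_fraction = 0.7  # assumed default hold-out split (70% train, 30% test)
seed = 123            # same role as session_id: fixes the random shuffle

def split_indices(n, train_fraction, seed):
    """Shuffle row indices with a fixed seed and cut them into train/test."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n)
    n_train = int(n * train_fraction)  # floor of 1119.3 gives 1119 training rows
    return shuffled[:n_train], shuffled[n_train:]

train_idx, test_idx = split_indices(n_rows, train_fraction, seed)
print(len(train_idx), len(test_idx))
```

Calling the function again with the same seed returns the same rows, which is why keeping session_id fixed makes two experiments comparable.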

In [17]:
# Let us save the model
save_model(et_model, model_name = 'extra_tree_model')

Transformation Pipeline and Model Succesfully Saved


(Pipeline(memory=None,
          steps=[('dtypes',
                  DataTypes_Auto_infer(categorical_features=[],
                                       display_types=True, features_todrop=[],
                                       id_columns=[],
                                       ml_usecase='classification',
                                       numerical_features=[], target='quality',
                                       time_features=[])),
                 ('imputer',
                  Simple_Imputer(categorical_strategy='not_available',
                                 fill_value_categorical=None,
                                 fill_value_numerical=None,
                                 numeric_stra...
                  ExtraTreesClassifier(bootstrap=False, ccp_alpha=0.0,
                                       class_weight=None, criterion='gini',
                                       max_depth=None, max_features='auto',
                                       max_leaf_nod

## Build the Web App with Streamlit

Now it’s time for us to build our wine classifier web app. In this post, we’re going to use Streamlit to build the web app as it is more beginner friendly than Flask.

```pip install streamlit```

<img src="./streamlit_install.png">

The first thing that we need to do is import all of the relevant libraries. Note that we're going to use the Extra Trees classifier model that we saved earlier with PyCaret. To load the model and make a prediction with it, we can use the load_model and predict_model functions from PyCaret.

In [27]:
# Create a new .py script in the same location and copy this code.
# Load the libraries and set the landing page

from pycaret.classification import load_model, predict_model
import streamlit as st
import pandas as pd
import numpy as np


def predict_quality(model, df):
    # Run the saved pipeline on the input row and return the predicted class
    predictions_data = predict_model(estimator = model, data = df)
    return predictions_data['Label'][0]
    
model = load_model('extra_tree_model')


st.title('Wine Quality Classifier Web App')
st.subheader('Life is too short to drink Bad Wine - Old Jungle Saying!!')
st.write(
    "This is a web app to classify the quality of your wine based on\n"
    "several features that you can see in the sidebar. Please adjust the\n"
    "value of each feature. After that, click on the Predict button at the bottom to\n"
    "see the prediction of the classifier.\n\n"
    "Coding details originally appeared in a Medium article by Ruben Winastwan. All credit goes to him.\n\n"
    "Modified By: Venki Ramachandran on 12-May-2021."
)

2021-05-12 12:34:33.678 INFO    logs: Initializing load_model()
2021-05-12 12:34:33.679 INFO    logs: load_model(model_name=extra_tree_model, platform=None, authentication=None, verbose=True)


Transformation Pipeline and Model Successfully Loaded
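
One thing to watch: the `Label` column name used in predict_quality() depends on the PyCaret version. As far as we know, PyCaret 3.x names this column `prediction_label` instead, so the app can break after an upgrade. A defensive sketch follows; the helper name `extract_label` and the demo frame are made up for this example:

```python
import pandas as pd

def extract_label(predictions: pd.DataFrame) -> str:
    """Return the predicted class of the first row, whichever column name is used."""
    for column in ('Label', 'prediction_label'):
        if column in predictions.columns:
            return str(predictions[column].iloc[0])
    raise KeyError('no prediction column found in predict_model output')

# Stand-in for a predict_model() result; values are made up for the demo.
demo = pd.DataFrame({'alcohol': [10.5], 'Label': ['Good'], 'Score': [0.88]})
print(extract_label(demo))
```

In the app, predict_quality() could return `extract_label(predictions_data)` instead of indexing `'Label'` directly.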


Next, we need to let the user specify the value of each feature. Since our features are all numeric, a slider widget is a good way to represent them. To create a slider widget, we can use the slider() function from Streamlit.

In [28]:
# Allow the user to select new features for prediction
fixed_acidity = st.sidebar.slider(label = 'Fixed Acidity', min_value = 4.0,
                          max_value = 16.0 ,
                          value = 10.0,
                          step = 0.1)

volatile_acidity = st.sidebar.slider(label = 'Volatile Acidity', min_value = 0.00,
                          max_value = 2.00 ,
                          value = 1.00,
                          step = 0.01)
                          
citric_acid = st.sidebar.slider(label = 'Citric Acid', min_value = 0.00,
                          max_value = 1.00 ,
                          value = 0.50,
                          step = 0.01)                          

residual_sugar = st.sidebar.slider(label = 'Residual Sugar', min_value = 0.0,
                          max_value = 16.0 ,
                          value = 8.0,
                          step = 0.1)

chlorides = st.sidebar.slider(label = 'Chlorides', min_value = 0.000,
                          max_value = 1.000 ,
                          value = 0.500,
                          step = 0.001)
   
f_sulf_diox = st.sidebar.slider(label = 'Free Sulfur Dioxide', min_value = 1,
                          max_value = 72,
                          value = 36,
                          step = 1)

t_sulf_diox = st.sidebar.slider(label = 'Total Sulfur Dioxide', min_value = 6,
                          max_value = 289 ,
                          value = 144,
                          step = 1)

density = st.sidebar.slider(label = 'Density', min_value = 0.0000,
                          max_value = 2.0000 ,
                          value = 0.9900,
                          step = 0.0001)

ph = st.sidebar.slider(label = 'pH', min_value = 2.00,
                          max_value = 5.00 ,
                          value = 3.00,
                          step = 0.01)
                          
sulphates = st.sidebar.slider(label = 'Sulphates', min_value = 0.00,
                          max_value = 2.00,
                          value = 0.50,
                          step = 0.01)

alcohol = st.sidebar.slider(label = 'Alcohol', min_value = 8.0,
                          max_value = 15.0,
                          value = 10.5,
                          step = 0.1)
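
The slider limits above were typed in by hand. One alternative is to derive them from the training data so that the user cannot pick values far outside what the model has seen. A sketch, assuming a hypothetical helper `slider_kwargs` and using only the five rows printed by `wine_df.head()` as sample data (in practice you would pass the full `wine_df`):

```python
import pandas as pd

# First rows of the wine data, copied from wine_df.head() above (two columns only).
sample = pd.DataFrame({
    'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4],
    'pH': [3.51, 3.20, 3.26, 3.16, 3.51],
})

def slider_kwargs(df, column, step):
    """Build label/min_value/max_value/value/step for one numeric column."""
    return {
        'label': column,
        'min_value': float(df[column].min()),
        'max_value': float(df[column].max()),
        'value': float(df[column].median()),  # start the slider at a typical value
        'step': float(step),
    }

alcohol_args = slider_kwargs(sample, 'alcohol', step=0.1)
print(alcohol_args)
# In the app: alcohol = st.sidebar.slider(**alcohol_args)
```

All numeric arguments are cast to float because Streamlit expects the slider's numeric arguments to share one type.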

Next, we need to convert all of those user input values into a dataframe. Then we can use the dataframe as the input for our model's prediction.

In [29]:
features = {'fixed acidity': fixed_acidity, 'volatile acidity': volatile_acidity,
            'citric acid': citric_acid, 'residual sugar': residual_sugar,
            'chlorides': chlorides, 'free sulfur dioxide': f_sulf_diox,
            'total sulfur dioxide': t_sulf_diox, 'density': density,
            'pH': ph, 'sulphates': sulphates, 'alcohol': alcohol
            }
 

features_df  = pd.DataFrame([features])

st.table(features_df)  

if st.button('Predict'):
    
    prediction = predict_quality(model, features_df)
    
    st.write(' Based on feature values, your wine quality is '+ str(prediction))
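
The dictionary keys must match the column names the model was trained on, spaces included, or the prediction will fail. A small guard can catch a misspelled key before the model is called. This is a sketch; `to_feature_frame` and `EXPECTED_COLUMNS` are names made up for this example:

```python
import pandas as pd

# Column names the model was trained on (the 11 features from wine_df).
EXPECTED_COLUMNS = [
    'fixed acidity', 'volatile acidity', 'citric acid', 'residual sugar',
    'chlorides', 'free sulfur dioxide', 'total sulfur dioxide', 'density',
    'pH', 'sulphates', 'alcohol',
]

def to_feature_frame(features: dict) -> pd.DataFrame:
    """Turn one dict of inputs into a single-row frame in training column order."""
    missing = [c for c in EXPECTED_COLUMNS if c not in features]
    if missing:
        raise ValueError(f'missing features: {missing}')
    return pd.DataFrame([features])[EXPECTED_COLUMNS]

# Slider default values from the app above.
defaults = {
    'fixed acidity': 10.0, 'volatile acidity': 1.00, 'citric acid': 0.50,
    'residual sugar': 8.0, 'chlorides': 0.500, 'free sulfur dioxide': 36,
    'total sulfur dioxide': 144, 'density': 0.9900, 'pH': 3.00,
    'sulphates': 0.50, 'alcohol': 10.5,
}
row = to_feature_frame(defaults)
print(row.shape)
```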


### Deployment to localhost (your PC)
To check your web app, open a terminal and go to the working directory of your Python file.
In that directory, type the following:

```streamlit run app.py```

Run the above command without a trailing & because you may have to answer some prompts. Wait until you see:

<img src="./streamlit_deploy_success.png">

Streamlit then automatically opens http://localhost:8501 in your default browser. It should look like the screen below:

<img src="./localhost_landing_page.png">

### Deploying on streamlit.io to share with the world

Please check out the public site at: https://share.streamlit.io/venkir/wine-classifier/main/app.py