
# Lesson 95 -  Streamlit Widgets II

### Teacher-Student Activities

In the previous class, we started building a web app that implements a multiclass classification model capable of predicting the glass type. Continuing from that class, we will explore a few Streamlit widgets for displaying charts and plots, such as the count plot, pie chart, correlation heatmap and pair plot. These help us explore the dataset and extract useful information from it.

Today, we will also add more functionality to this glass-type prediction app. The user will be able to select the classification ML algorithm on which the model is built. The user will also be able to tune its hyperparameters to obtain a more accurate model.

Let's quickly go through the activities covered in the previous class and begin this class from **Activity 1: Displaying Plots** section.

---

#### Problem statement

Recall the glass-type classification that you had performed in one of your previous classes wherein you classified different types of glasses based on their chemical and physical composition.

**Dataset Description:**

The dataset used in this problem statement involves the classification of samples of different glasses based on their physical and chemical properties. They are as follows:

1. **RI:** Refractive Index

2. **Na:** Sodium

3. **Mg:** Magnesium

4. **Al:** Aluminum

5. **Si:** Silicon

6. **K:** Potassium

7. **Ca:** Calcium

8. **Ba:** Barium

9. **Fe:** Iron


There are seven types (classes or labels) of glass listed; they are:

* **Class 1:** used for making building windows (float processed)

* **Class 2:** used for making building windows (non-float processed)

* **Class 3:** used for making vehicle windows (float processed)

* **Class 4:** used for making vehicle windows (non-float processed)

* **Class 5:** used for making containers

* **Class 6:** used for making tableware

* **Class 7:** used for making headlamps


**Dataset Credits:** https://archive.ics.uci.edu/ml/datasets/Glass+Identification


**Citation:** Dua, D., & Graff, C. (2017). UCI Machine Learning Repository.






---

#### Importing modules and Loading Data




First, create a Python file `glass_type_app.py` in the Sublime editor and save it in the `Python_scripts` folder created earlier. Copy the code given below into the `glass_type_app.py` file.

The code given below performs the following tasks *(learnt in **Logistic Regression - Multiclass Classification I**)*:
1. Imports the necessary libraries, including `streamlit`.
2. Drops unnecessary columns and provides suitable column headers for the independent variables.
3. Creates the feature DataFrame and the target variable.
4. Splits the dataset into train and test sets using the `train_test_split()` function.

**Note:** Do not run the code shown below. It will throw an error.


In [None]:
# Open Sublime text editor, create a new Python file, copy the following code in it and save it as 'glass_type_app.py'.
# You have already created this ML model in one of the previous classes.

# Importing the necessary Python modules.
import numpy as np
import pandas as pd
import streamlit as st
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import plot_confusion_matrix  # Removed in scikit-learn 1.2; use ConfusionMatrixDisplay on newer versions.

# ML classifier Python modules
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Loading the dataset.
@st.cache()
def load_data():
    file_path = "glass-types.csv"
    df = pd.read_csv(file_path, header = None)
    # Dropping the 0th column as it contains only the serial numbers.
    df.drop(columns = 0, inplace = True)
    column_headers = ['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'GlassType']
    columns_dict = {}
    # Renaming columns with suitable column headers.
    for i in df.columns:
        columns_dict[i] = column_headers[i - 1]
    # Rename the columns once the mapping is complete.
    df.rename(columns_dict, axis = 1, inplace = True)
    return df

glass_df = load_data()

# Creating the features data-frame holding all the columns except the last column.
X = glass_df.iloc[:, :-1]

# Creating the target series that holds last column.
y = glass_df['GlassType']

# Splitting the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)

**Note:** You have to store the `glass-types.csv` file on your computer, in the same folder that contains the above Python script. You can download the `glass-types.csv` file from the link provided below.

https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/glass-types.csv


In the above code, we wrapped part of the code inside a function, `load_data()`, and added the `@st.cache()` decorator just above it. Let us understand why.

**The `st.cache()` decorator:**

- Each time you rerun your Streamlit app or change any widget value, the whole script runs again from top to bottom.
- This is inefficient for machine learning apps, because slow steps such as loading a dataset or training a model would be repeated on every rerun.
- Streamlit provides a functionality called **caching**. The first time a cached function runs, Streamlit stores its return value. On later reruns with the same inputs (and unchanged function code), Streamlit returns the stored value instead of executing the function again. This saves CPU time and makes the web app noticeably faster.


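In our case, we added the `@st.cache()` decorator to the `load_data()` function because the data it loads does not change between reruns. Note that newer Streamlit releases deprecate `st.cache()` in favour of `@st.cache_data` for functions that return data, such as this one.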
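The caching idea can be illustrated with Python's built-in `functools.lru_cache`. This is only an analogy under simplifying assumptions, not how Streamlit implements `st.cache()`. The decorated function's body runs on the first call, and later calls with the same argument return the stored result. The `load_data(path)` stand-in and the `call_count` dictionary are hypothetical names used only for this illustration.

```python
from functools import lru_cache

# Counts how many times the body of load_data() really executes.
call_count = {"load_data": 0}

@lru_cache(maxsize=None)
def load_data(path):
    """Hypothetical stand-in for an expensive step such as pd.read_csv(path)."""
    call_count["load_data"] += 1
    # A tuple is immutable, so callers cannot modify the cached result.
    return tuple(range(5))

# Simulate three reruns of the script that all ask for the same file.
for _ in range(3):
    data = load_data("glass-types.csv")

print(call_count["load_data"])  # 1
```

Calling `load_data()` with a different path is a cache miss, so the body would run again. Streamlit's cache behaves similarly when a cached function's inputs change.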

---

#### Adding `prediction()` Function

As done in the previous class, let us add a function, say `prediction()`, that will predict the type of glass for every unique combination of `RI`, `Na`, `Mg`, `Al`, `Si`, `K`, `Ca`, `Ba` and `Fe` values.

1. The `prediction()` function takes 10 inputs:

  - `model` (the trained classifier chosen by the user)
  - `RI`
  - `Na`
  - `Mg`
  - `Al`
  - `Si`
  - `K`
  - `Ca`
  - `Ba`
  - `Fe`

2. Inside the `prediction()` function:

  - Call the `predict()` function on the `model` object.
   
  - The `predict()` function returns an array containing a single integer value, which can be 1, 2, 3, 5, 6 or 7, where
    
    - `1` denotes the glass used for making building windows (float processed), i.e. `building windows float processed`
    
    - `2` denotes the glass used for making building windows (non-float processed) i.e. `building windows non float processed`
    
    - `3` denotes the glass used for making vehicle windows (float processed) i.e. `vehicle windows float processed`
    
    - `4` denotes the glass used for making vehicle windows (non-float processed) i.e. `vehicle windows non float processed`
    
    - `5` denotes the glass used for making containers i.e. `containers`
    
    - `6` denotes the glass used for making tableware i.e. `tableware`
    
    - `7` denotes the glass used for making headlamps i.e. `headlamps`

    **Note:** There are no records for glass type `4` in the dataset. Nevertheless, we still account for it in our code.
   
  - Extract the integer value using the indexing method i.e. `array_name[0]` and store it in the `glass_type` variable.

  - Return the type of glass (in uppercase, using `upper()`) by checking the value of the `glass_type` variable, i.e.,
  
    - If `glass_type == 1`, then return `"building windows float processed"`
  
    - Else if `glass_type == 2`, then return `"building windows non float processed"`

    - Else if `glass_type == 3`, then return `"vehicle windows float processed"`

    - Else if `glass_type == 4`, then return `"vehicle windows non float processed"`

    - Else if `glass_type == 5`, then return `"containers"`

    - Else if `glass_type == 6`, then return `"tableware"`
  
    - Else return `"headlamps"`

3. Also mark the `prediction()` function with Streamlit decorator `@st.cache()`.

**Note:** Do not run the code shown below. It will throw an error.

In [None]:
# Create a function that accepts an ML model object say 'model' and the nine features as inputs
# and returns the glass type.
@st.cache()
def prediction(model, ri, na, mg, al, si, k, ca, ba, fe):
    glass_type = model.predict([[ri, na, mg, al, si, k, ca, ba, fe]])
    glass_type = glass_type[0]
    if glass_type == 1:
        return "building windows float processed".upper()
    elif glass_type == 2:
        return "building windows non float processed".upper()
    elif glass_type == 3:
        return "vehicle windows float processed".upper()
    elif glass_type == 4:
        return "vehicle windows non float processed".upper()
    elif glass_type == 5:
        return "containers".upper()
    elif glass_type == 6:
        return "tableware".upper()
    else:
        return "headlamps".upper()
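
As a side note, the `if`/`elif` chain in `prediction()` can also be written as a dictionary lookup. The sketch below is a hypothetical alternative and is not part of the lesson's app. `GLASS_TYPES` and `glass_type_label()` are illustrative names, and the labels are copied from the chain above.

```python
# Hypothetical lookup table with the same labels as the if/elif chain in prediction().
GLASS_TYPES = {
    1: "building windows float processed",
    2: "building windows non float processed",
    3: "vehicle windows float processed",
    4: "vehicle windows non float processed",
    5: "containers",
    6: "tableware",
    7: "headlamps",
}

def glass_type_label(predicted):
    """Turn the one-element array returned by model.predict() into an uppercase label."""
    code = int(predicted[0])
    # Unlike the else branch, an unknown code raises KeyError instead of becoming "HEADLAMPS".
    return GLASS_TYPES[code].upper()

print(glass_type_label([5]))  # CONTAINERS
```

One design difference stands out. The original `else` branch turns any unexpected value into `"HEADLAMPS"`. The dictionary raises a `KeyError` instead, which makes an unexpected prediction easier to notice.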

The next step is to add some Streamlit code to create our front end.

---

#### Displaying Title and Sidebar Title

Now it is time to add Streamlit widgets one by one. Let us add a title and the sidebar title for our Streamlit dashboard. This can be done using the `st.title()` and `st.sidebar.title()` functions.

**Syntax:** `st.title(some_title)`

In [None]:
# Add title on the main page and in the sidebar.
st.title("Glass Type Predictor")
st.sidebar.title("Exploratory Data Analysis")

After adding the above code, run your web app using the following command:

`streamlit run glass_type_app.py`

You will see the following output:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/67c23ee6-6e4e-499c-924b-15a498bafec6.PNG"></center>

As the image above shows, anything we call through the sidebar object appears on the left-hand side of the web page.

**Important points about `st.sidebar`:**

- It gives your app a cleaner look by moving your widgets to the left-hand side of the screen. This keeps the main content at the centre, while the widgets are pinned to the left.

- The syntax for adding any widget to the sidebar is `st.sidebar.[widget]()`
   
  For example: `st.sidebar.title()`, `st.sidebar.checkbox()`, `st.sidebar.slider()` etc.

**Note:** For the upcoming activities, keep appending the code in the `glass_type_app.py` file using the Sublime editor and then rerun your app on your local machine.

---

#### Displaying Raw Data

You can display the dataset in raw form using `st.dataframe()` function.

**Syntax:** `st.dataframe(data)` where `data` is the pandas DataFrame object.

You can even display your dataset using `st.write()` function as follows:
    
> `st.write(data)`

Let us also add a checkbox widget in the sidebar so that the `glass_df` DataFrame is displayed only when this checkbox is ticked. It should look like this:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/b428f116-91be-4db4-a03b-755eed5e3bcc.PNG"></center>

**Syntax:** `st.checkbox(label)`

To add the checkbox in the sidebar, simply use `st.sidebar.checkbox()` instead of `st.checkbox()`.

When the user clicks on the checkbox labeled **`Show raw data`**,  
- Display a subheader on the main page with the label `"Full Dataset"` using the `st.subheader()` function.

  **Syntax: `st.subheader(some_text)`**
  
- Below the subheader, display raw data using `st.dataframe()` function.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# Using if statement, display raw data on the click of the checkbox.
if st.sidebar.checkbox("Show raw data"):
    st.subheader("Full Dataset")
    st.dataframe(glass_df)

After running the app, it can be visualised as below:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/0360d045-be96-4dd4-942d-1c8bd86c62bd.PNG"></center>

---

#### Creating Plots

Streamlit supports many visualisation libraries, so you can display any `matplotlib` or `seaborn` plot in your app. For our web app, we will create the following plots:

1. Scatter plot
2. Histogram
3. Box plot
4. Count plot
5. Pie chart
6. Correlation heatmap
7. Pair plot

Let's first try to create scatter plots between all the features and the target variable.

Let us add a drop-down list that allows the user to select multiple options. The user will pick the features to be plotted on the $x$-axis of the scatter plots, while the $y$-axis is fixed to the `GlassType` column.

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/e05308c2-dc7a-4126-a968-56d81b911c86.gif"></center>

To select multiple values from a drop-down list, we use Streamlit's **multiselect** widget. It returns a list of the selected options.

**Syntax:** `st.multiselect(label, (options))`

Here,
  - `label`: A short label explaining to the user what this widget is for.
  - `(options)`: Options that will go into the drop-down. These options are provided either in the form of a `list` or in the form of a `tuple`.

Let us now add a multiselect widget in the sidebar that will allow us to choose multiple $x$-axis values for the scatter plots.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# Scatter Plot between the features and the target variable.
# Add a subheader in the sidebar with label "Scatter Plot".
st.sidebar.subheader("Scatter Plot")

# Choosing x-axis values for the scatter plot.
# Add a multiselect in the sidebar with the 'Select the x-axis values:' label
# and pass all the 9 features as a tuple i.e. ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe') as options.
# Store the current value of this widget in the 'features_list' variable.
features_list = st.sidebar.multiselect("Select the x-axis values:",
                                            ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))

In the above code, the `features_list` holds multiple features selected by the user. The scatter plot will be plotted with each selected feature on the $x$-axis and `GlassType` column on $y$-axis.

**Displaying Scatter plot:**

You can display any `matplotlib` or `seaborn` plot using the `st.pyplot()` function. You can think of `st.pyplot()` as the Streamlit equivalent of `plt.show()`.

Now we will create the scatter plot for each of the selected features by iterating through each feature contained in the `features_list` and using the `scatterplot()` function of the `seaborn` module. You may use the `scatter()` function of the `matplotlib.pyplot` module as well for the same task.


**Note:**
- When you call `st.pyplot()` without passing it a figure, you will see a deprecation warning on your web page. To hide that warning, add the following line before using `st.pyplot()`:

  `st.set_option('deprecation.showPyplotGlobalUse', False)`

- Don't run the code shown below. It will throw an error.

In [None]:
# Create scatter plots between the features and the target variable.
# Remove deprecation warning.
st.set_option('deprecation.showPyplotGlobalUse', False)

for feature in features_list:
    st.subheader(f"Scatter plot between {feature} and GlassType")
    plt.figure(figsize = (12, 6))
    sns.scatterplot(x = feature, y = 'GlassType', data = glass_df)
    st.pyplot()

In the above code, observe that `st.pyplot()` is used for plotting each scatter plot. If it is not specified after creating a plot, then that plot object will reside in memory but it won't be rendered on the screen.
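In newer Streamlit versions, calling `st.pyplot()` without an argument relies on matplotlib's global figure, which is what triggers the deprecation warning. More recent releases may no longer accept the `deprecation.showPyplotGlobalUse` option at all. A sketch of the alternative is to create the figure explicitly and pass it to `st.pyplot(fig)`.

The helper `make_scatter()` and the tiny `demo_df` are hypothetical. To keep the example small, it uses matplotlib's `ax.scatter()` instead of seaborn; `sns.scatterplot(..., ax=ax)` would draw on the same axes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

def make_scatter(df, feature, target="GlassType"):
    """Build a scatter plot on its own Figure and return that Figure."""
    fig, ax = plt.subplots(figsize=(12, 6))
    ax.scatter(df[feature], df[target])
    ax.set_xlabel(feature)
    ax.set_ylabel(target)
    return fig

# Tiny made-up frame standing in for glass_df.
demo_df = pd.DataFrame({"RI": [1.51, 1.52, 1.53], "GlassType": [1, 2, 7]})
fig = make_scatter(demo_df, "RI")
# Inside the app, this Figure would be passed straight to Streamlit:
#     st.pyplot(fig)
plt.close(fig)
```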

In the same way, you can display histograms and box plots for all the desired columns. Use the **multiselect** widget together with the `hist()` function of the `matplotlib.pyplot` module for histograms and the `boxplot()` function of the `seaborn` module for box plots.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# Create histograms for all the features.
# Sidebar for histograms.
st.sidebar.subheader("Histogram")

# Choosing features for histograms.
hist_features = st.sidebar.multiselect("Select features to create histograms:",
                                            ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))
# Create histograms.
for feature in hist_features:
    st.subheader(f"Histogram for {feature}")
    plt.figure(figsize = (12, 6))
    plt.hist(glass_df[feature], bins = 'sturges', edgecolor = 'black')
    st.pyplot()

# Create box plots for all the columns.
# Sidebar for box plots.
st.sidebar.subheader("Box Plot")

# Choosing columns for box plots.
box_plot_cols = st.sidebar.multiselect("Select the columns to create box plots:",
                                            ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'GlassType'))

# Create box plots.
for col in box_plot_cols:
    st.subheader(f"Box plot for {col}")
    plt.figure(figsize = (12, 2))
    sns.boxplot(x = glass_df[col])
    st.pyplot()

After adding the code for all three plots, the app can be visualised as follows:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/b4885645-8ef9-4e39-ad1c-699c7d1efa02.gif"/></center>
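The histogram code above passes `bins = 'sturges'`, which asks for Sturges' rule: $k = \lceil \log_2 n \rceil + 1$ bins for $n$ samples. matplotlib hands string values of `bins` to NumPy, so the bin count can be checked with `np.histogram_bin_edges()`.

The sketch uses $n = 214$ (the number of samples in the UCI glass dataset) and evenly spaced made-up values. NumPy's variant works out to $\lceil \log_2 n + 1 \rceil$ bins, which gives the same count here.

```python
import math
import numpy as np

n = 214
data = np.arange(n, dtype=float)  # made-up, evenly spaced values

# Sturges' rule: number of bins = ceil(log2(n)) + 1
k_formula = math.ceil(math.log2(n)) + 1

# NumPy computes the edges for bins='sturges'; there is one more edge than bins.
edges = np.histogram_bin_edges(data, bins="sturges")
k_numpy = len(edges) - 1

print(k_formula, k_numpy)  # 9 9
```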


---

#### Activity 1: Displaying Plots

Let's continue creating the remaining plots, but with a slight twist: we will add a new **multiselect** widget that lets the user choose the types of plots. In the end, we will have two **multiselect** widgets for data visualisation:

- One for the scatter plot and

- Another for all the other types of plots

It will look like this:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/014cdce8-5072-4ca7-9c74-f568f7bac886.PNG"/></center>


In [None]:
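# S1.1: Remove the multiselect widgets for histograms and box plots and add a new multiselect widget to choose a type of visualisation.
# Sidebar subheader for scatter plot
st.sidebar.subheader("Scatter Plot")

# Remove deprecation warning.
st.set_option('deprecation.showPyplotGlobalUse', False)

# Choosing x-axis values for scatter plots.
features_list = st.sidebar.multiselect("Select the x-axis values:",
                                       ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))

# Creating scatter plots.
for feature in features_list:
    st.subheader(f"Scatter plot between {feature} and GlassType")
    plt.figure(figsize = (12, 6))
    sns.scatterplot(x = feature, y = 'GlassType', data = glass_df)
    st.pyplot()

# Remove the code blocks for histogram and box plots.

# Add a subheader in the sidebar with label "Visualisation Selector"
st.sidebar.subheader("Visualisation Selector")

# Add a multiselect in the sidebar with label 'Select the Charts/Plots:'
# and with 6 options passed as a tuple ('Histogram', 'Box Plot', 'Count Plot', 'Pie Chart', 'Correlation Heatmap', 'Pair Plot').
# Store the current value of this widget in a variable 'plot_types'.
plot_types = st.sidebar.multiselect("Select the Charts/Plots:",
                                    ('Histogram', 'Box Plot', 'Count Plot', 'Pie Chart', 'Correlation Heatmap', 'Pair Plot'))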


In the above code, the variable `plot_types` holds a list of selected options.

Based on the selected option, the corresponding plot must be displayed. For this, we will use the `if` statement and `in` keyword to check if a plot exists in the `plot_types` list in the following way:
```python
if 'Histogram' in plot_types:
    ...  # plot histogram
if 'Box Plot' in plot_types:
    ...  # plot box plot
if 'Count Plot' in plot_types:
    ...  # plot count plot
if 'Pie Chart' in plot_types:
    ...  # plot pie chart
if 'Correlation Heatmap' in plot_types:
    ...  # plot correlation heatmap
if 'Pair Plot' in plot_types:
    ...  # plot pair plot
```
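The chain of `if` statements can also be organised as a dispatch table that maps each multiselect option to a function. This is a hypothetical restructuring, not the lesson's code. The `draw_*()` functions are placeholders that return strings; in the app, each would draw a plot and call `st.pyplot()`.

```python
# Hypothetical placeholder renderers; in the app each would draw a plot and call st.pyplot().
def draw_histogram():
    return "histogram drawn"

def draw_box_plot():
    return "box plot drawn"

def draw_count_plot():
    return "count plot drawn"

# Map each multiselect option to the function that renders it.
PLOT_RENDERERS = {
    "Histogram": draw_histogram,
    "Box Plot": draw_box_plot,
    "Count Plot": draw_count_plot,
}

def render_selected(plot_types):
    """Call the renderer for every selected option, in the dictionary's fixed order."""
    results = []
    for name, renderer in PLOT_RENDERERS.items():  # dicts keep insertion order
        if name in plot_types:  # same membership test as the if chain
            results.append(renderer())
    return results

print(render_selected(["Count Plot", "Histogram"]))
# ['histogram drawn', 'count plot drawn']
```

The loop walks over the dictionary rather than over `plot_types`. As a result, the plots always appear in the same order, just like with the `if` chain, no matter in which order the user picks them.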

**Displaying Histogram:**

In the previous class, we used a multiselect widget to display histograms for multiple features selected by the user. However, to display a histogram for only a single feature, we can use Streamlit's **selectbox** widget.

The selectbox widget is used to create a drop-down where the user can select only one option. It will look like this:

<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/0e042d2a-d72f-4d3e-8384-b2f01ec83ced.gif"/></center>




**Syntax:** `st.selectbox(label, (options))`

Here,
  - `label`: A short label explaining to the user what this widget is for.
  - `(options)`: Options that will go into the drop-down. These options are either provided in the form of a `list` or in the form of a `tuple`.

Unlike the multiselect widget, the selectbox widget returns only a single value.

Let us add this selectbox widget, populate it with all the column names and also display the histogram for the corresponding column.

**Note:** Don't run the code shown below. It will throw an error.


In [None]:
# S1.2: Create histograms for the selected features using the 'selectbox' widget.


In the above code, the `columns` variable holds the single column selected by the user for which the histogram is to be plotted.

Similarly, display the box plot using the selectbox widget and `boxplot()` function of the `seaborn` module.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# S1.3: Create box plots for the selected column using the 'selectbox' widget.


Similarly, let us display the following remaining plots for the `glass_df` DataFrame using the `st.pyplot()` function:

- Count plot
- Pie chart
- Correlation heatmap
- Pair plot

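For the correlation heatmap, we will adjust the limits of the $y$-axis to obtain a clearer view of the heatmap. This is a well-known workaround for matplotlib 3.1.1, which cut the top and bottom rows of seaborn heatmaps in half; on other versions it simply adds half a row of padding. For this, we will use the `get_ylim()` and `set_ylim()` functions while plotting the heatmap.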

- The `get_ylim()` function: This function returns the $y$-coordinates of the top and bottom points of an axis.
  
  **Syntax:**
  `ax.get_ylim()` where `ax` is the object of the desired plot's axis. In our case, it will be seaborn's heatmap axis.

- The `set_ylim()` function: This function sets the new $y$-coordinates of the top and bottom points of an axis.

  **Syntax:** `ax.set_ylim(bottom = None, top = None)` where
  - `ax` is the object of the desired plot's axis.
  - `bottom` and `top` are the new $y$-axis limits.

For our app, adjust the heatmap's $y$-axis limits by increasing the bottom margin and decreasing the top margin using the following steps:

1. Use `heatmap()` function of `seaborn` module to create correlation heatmap and store its object in the `ax` variable.

2. Call `get_ylim()` function using the `ax` object and store the returned values in the `bottom` and `top` variables.

3. Call the `set_ylim()` function using the `ax` object and pass the new bottom and top margin limits to this function. The new bottom and top margin limits would be `bottom + 0.5` and  `top - 0.5`.
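
The arithmetic of steps 2 and 3 can be checked on a plain matplotlib axis. This sketch is a stand-in for `sns.heatmap()` under stated assumptions. It draws a random $10 \times 10$ correlation matrix with `ax.pcolormesh()` and inverts the $y$-axis by hand so that the first row is on top, as it is in a heatmap. `pad_ylim()` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

def pad_ylim(ax, pad=0.5):
    """Apply the bottom + pad / top - pad adjustment and return the new limits."""
    bottom, top = ax.get_ylim()
    ax.set_ylim(bottom + pad, top - pad)
    return tuple(float(v) for v in ax.get_ylim())

# A 10 x 10 correlation matrix of random data stands in for glass_df.corr().
rng = np.random.default_rng(0)
corr = np.corrcoef(rng.normal(size=(10, 50)))

fig, ax = plt.subplots()
ax.pcolormesh(corr)
ax.set_ylim(corr.shape[0], 0)  # inverted y-axis: first row on top, as in a heatmap
new_limits = pad_ylim(ax)
print(new_limits)  # (10.5, -0.5)
plt.close(fig)
```

On this inverted axis, `get_ylim()` returns `(10.0, 0.0)`. Adding 0.5 to the bottom and subtracting 0.5 from the top therefore widens the visible range by half a row at each end.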
     


In [None]:
# S1.4: Create count plot, pie chart, correlation heatmap and pair plot.
# (This cell also contains the histogram and box plot code from S1.2 and S1.3.)
# Create a histogram for the selected column using the 'selectbox' widget.
if 'Histogram' in plot_types:
    st.subheader("Histogram")
    columns = st.sidebar.selectbox("Select the column to create its histogram",
                                  ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))
    # Note: Histograms are generally created for continuous values, not discrete values.
    plt.figure(figsize = (12, 6))
    plt.title(f"Histogram for {columns}")
    plt.hist(glass_df[columns], bins = 'sturges', edgecolor = 'black')
    st.pyplot()

# Create a box plot for the selected column using the 'selectbox' widget.
if 'Box Plot' in plot_types:
    st.subheader("Box Plot")
    columns = st.sidebar.selectbox("Select the column to create its box plot",
                                  ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'GlassType'))
    plt.figure(figsize = (12, 2))
    plt.title(f"Box plot for {columns}")
    sns.boxplot(x = glass_df[columns])
    st.pyplot()

# Create count plot using the 'seaborn' module and the 'st.pyplot()' function.
if 'Count Plot' in plot_types:
    st.subheader("Count plot")
    plt.figure(figsize = (12, 6))  # Fresh figure, so the plot is not drawn on a previous one.
    sns.countplot(x = 'GlassType', data = glass_df)
    st.pyplot()

# Create pie chart using the 'matplotlib.pyplot' module and the 'st.pyplot()' function.
if 'Pie Chart' in plot_types:
    st.subheader("Pie Chart")
    pie_data = glass_df['GlassType'].value_counts()
    plt.figure(figsize = (5, 5))
    plt.pie(pie_data, labels = pie_data.index, autopct = '%1.2f%%',
            startangle = 30, explode = np.linspace(.06, .16, len(pie_data)))  # One explode offset per wedge.
    st.pyplot()

# Display correlation heatmap using the 'seaborn' module and the 'st.pyplot()' function.
if 'Correlation Heatmap' in plot_types:
    st.subheader("Correlation Heatmap")
    plt.figure(figsize = (10, 6))
    ax = sns.heatmap(glass_df.corr(), annot = True) # Creating an object of seaborn axis and storing it in 'ax' variable
    bottom, top = ax.get_ylim() # Getting the top and bottom margin limits.
    ax.set_ylim(bottom + 0.5, top - 0.5) # Increasing the bottom and decreasing the top margins respectively.
    st.pyplot()

# Display pair plots using the 'seaborn' module and the 'st.pyplot()' function.
if 'Pair Plot' in plot_types:
    st.subheader("Pair Plots")
    # sns.pairplot() creates its own figure, so no plt.figure() call is needed here.
    sns.pairplot(glass_df)
    st.pyplot()

After adding the code for all the 6 plots, the app can be visualised as follows:

<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/762fc8b6-6c4e-4528-aea9-580e93b540e3.gif"/></center>
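Both the count plot and the pie chart are built from the class frequencies returned by `value_counts()`. The sketch below uses a made-up label series in place of `glass_df['GlassType']`. It shows what the wedges and their `autopct = '%1.2f%%'` labels contain, and it sizes the `explode` array from the number of classes rather than assuming a fixed count.

```python
import numpy as np
import pandas as pd

# Made-up class labels standing in for glass_df['GlassType'].
labels = pd.Series([1, 1, 2, 2, 2, 7])

counts = labels.value_counts()                      # bar heights / wedge sizes, most frequent first
shares = labels.value_counts(normalize=True) * 100  # percentage of all samples

# autopct='%1.2f%%' formats each wedge's percentage like this:
formatted = {cls: '%1.2f%%' % pct for cls, pct in shares.items()}

# One explode offset per wedge, so the array length follows the number of classes.
explode = np.linspace(.06, .16, len(counts))
```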

---

#### Activity 2: Adding Slider Widgets


The next step is to collect user input for making a new prediction. All 9 features (`'RI'`, `'Na'`, `'Mg'`, `'Al'`, `'Si'`, `'K'`, `'Ca'`, `'Ba'`, `'Fe'`) are numerical, so we will use slider widgets to accept their values from the user. By dragging a slider left or right, the user can dynamically choose the value of that feature.

<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/dd135d1a-b6e9-45f3-a74e-147410b7d898.gif"/></center>

To add a slider widget, use the following syntax:

`st.slider(label, min_value, max_value)`

Here,
 - `label`:  A short label explaining to the user what this slider is for.
 - `min_value`: The minimum value. If this parameter is not specified, the default value is `0` in case of integer values and `0.0` in case of float values.
 - `max_value`: The maximum value. If this parameter is not specified, the default value is `100` in case of integer values and `1.0` in case of float values.


Now, add 9 sliders for the 9 features and store their return values in the following 9 variables: `ri, na, mg, al, si, k, ca, ba, fe`.

**Note:**
 - The minimum and maximum values for these sliders would be the corresponding minimum and maximum values of that particular feature.
 - Don't run the code shown below. It will throw an error.

In [None]:
# S2.1: Add 9 slider widgets for accepting user input for 9 features.


After adding the above Streamlit code and rerunning the entire app, it will look like this:

<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/ec334fc2-3633-4d05-8c3c-a0f47b701ee5.gif"/></center>
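
The slider bounds described in the note above (each feature's minimum and maximum) can be computed in one pass over the DataFrame. This is a sketch under assumptions: `slider_bounds()` and the tiny `demo` DataFrame are hypothetical. The values are converted to plain Python floats so that the `min_value` and `max_value` passed to `st.slider()` have the same type.

```python
import pandas as pd

FEATURES = ['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe']

def slider_bounds(df, features=FEATURES):
    """Return {feature: (min, max)} as plain Python floats."""
    return {f: (float(df[f].min()), float(df[f].max())) for f in features}

# Made-up rows standing in for glass_df.
demo = pd.DataFrame({f: [0.0, 1.0, 2.5] for f in FEATURES})
bounds = slider_bounds(demo)

# In the app, each slider could then be created like this:
#     ri = st.sidebar.slider("RI", *bounds['RI'])
```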

---

#### Activity 3: Choosing the Classifier

Once the user has selected the desired values of all the features, the next step is to choose the classifier. For our app, we are using three classifiers, namely Support Vector Machine, Random Forest Classifier and Logistic Regression.

For our app, we will use the **selectbox** widget to select the classifier that will be used to build the machine learning model.

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/086060dd-130b-4425-bba4-b1015107bcea.PNG"></center>


Let us now add a selectbox widget below the sliders that will allow us to choose the classifier to be implemented.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
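# S3.1: Add a subheader and selectbox widget.
# Add a subheader in the sidebar with label "Choose Classifier"
st.sidebar.subheader("Choose Classifier")

# Add a selectbox in the sidebar with label 'Classifier'
# and with 3 options passed as a tuple ('Support Vector Machine', 'Random Forest Classifier', 'Logistic Regression').
# Store the current value of this selectbox in a variable 'classifier'.
classifier = st.sidebar.selectbox('Classifier',
                                  ('Support Vector Machine', 'Random Forest Classifier', 'Logistic Regression'))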


In the above code, the variable `classifier` holds the name of the user selected classifier. Thus,
  - If `classifier == 'Support Vector Machine'`, implement SVM classifier
  - If `classifier == 'Random Forest Classifier'`, implement Random Forest classifier
  - If `classifier == 'Logistic Regression'`, implement Logistic Regression classifier
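
`classifier` is just a string, so one way to connect it to scikit-learn is a lookup table from the selectbox label to the estimator class. This is a hypothetical helper; `CLASSIFIERS` and `build_classifier()` are illustrative names. The lesson itself uses separate `if` branches because each classifier needs its own hyperparameter widgets.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Hypothetical lookup from the selectbox label to the scikit-learn class it names.
CLASSIFIERS = {
    'Support Vector Machine': SVC,
    'Random Forest Classifier': RandomForestClassifier,
    'Logistic Regression': LogisticRegression,
}

def build_classifier(name, **hyperparameters):
    """Create an unfitted classifier for the chosen label."""
    return CLASSIFIERS[name](**hyperparameters)

model = build_classifier('Support Vector Machine', C=1.0, kernel='linear', gamma=1.0)
```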

---

#### Activity 4: Implementing SVM with Hyperparameter Tuning

In one of the previous classes, we learned the process of hyperparameter tuning for SVM, i.e. how the values of the hyperparameters `C`, `kernel` and `gamma` are optimised so that the model can make accurate predictions.

For our app, we will allow the user to adjust the values of these hyperparameters using various Streamlit widgets.

For SVM, the front-end for hyperparameter tuning must look like this:
<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/944cbc3c-e309-4f3b-a0a9-19db8a27ddd0.PNG"/></center>

For creating the above user interface, we will use the following Streamlit widgets:

1. `st.number_input(label, min_value, max_value):`  Displays a numeric input widget. Here,
    - `label`: A short label explaining to the user what this input is for.
    - `min_value`: The minimum value. If not specified, then there will be no minimum.
    - `max_value`: The maximum value. If not specified, then there will be no maximum.
    
   We will use this widget to enable the user to input the regularisation parameter `C` (the penalty for misclassifying training points) and the `gamma` value.

2. `st.radio(label, (options))`: Displays a radio button widget, where the user can select only one option from a set of options. Here,
    - `label`: A short label explaining to the user what this radio group is for.
    - `(options)`: Labels for the radio options. These options are either provided in the form of a `list` or in the form of a `tuple`.
  
   We will use this widget to enable the user to choose the desired `kernel` for implementing SVM model.

3. `st.button(label)`: Displays a button widget. Here,
    - `label`: A short label explaining to the user what this button is for.

   We will use this widget to create a button labeled `Classify`.

When the user clicks on `Classify` button, following tasks must be performed:
 - The object of `SVC` class must be created based on the hyperparameter values selected by user.
 - The model should be trained and accuracy score must be calculated.
 - The glass type must be predicted based on the feature values selected by user.
 - The confusion matrix must be plotted.

To plot the confusion matrix, we will import the `plot_confusion_matrix` function from the `sklearn.metrics` module. Note that this function was deprecated in scikit-learn 1.0 and removed in scikit-learn 1.2.

**Syntax:**
```python
from sklearn.metrics import plot_confusion_matrix
plot_confusion_matrix(model, x, y)
```
Here,
 - `model` is the fitted classifier object.
 - `x` is the feature values array.
 - `y` is the target value array.

Follow the steps given below to implement SVM with hyperparameter tuning:

- Import `plot_confusion_matrix` function from `sklearn.metrics` module.

- If `classifier == 'Support Vector Machine'`

   1. Display a subheader with label `"Model Hyperparameters:"` in the sidebar.
   
   2. Accept the values of the hyperparameters `C` and `gamma` using the `st.sidebar.number_input()` function, and `kernel` using the `st.sidebar.radio()` function. The minimum values of `C` and `gamma` should be 1.
   
   3. Store the current value of these hyperparameters in three different variables `c_value`, `kernel_input` and `gamma_input`.
   
   4. If the user clicks on `Classify` button created in sidebar:
      - Create an object of `SVC` class and store it in a variable, say `svc_model`. Pass the hyperparameter values to its constructor as follows:
      ```python
        SVC(C = c_value, kernel = kernel_input, gamma = gamma_input)
      ```
      - Call the `fit()` function on `SVC` object created above with the train set as inputs.
      - Determine the accuracy of the model by calling the `score()` function on the test set.
      - Call the `prediction()` function and pass the `SVC` object, i.e. `svc_model`, and the input widget values to this function. Store the glass type returned by this function in the `glass_type` variable.
      - Print the predicted glass type (`glass_type`) and accuracy score of the model.
      - Call `plot_confusion_matrix` function with `svc_model` and test set as inputs. Use `st.pyplot()` after `plot_confusion_matrix()` function, otherwise the confusion matrix will not be plotted.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# S4.1: Implement SVM with hyperparameter tuning
# if classifier == 'Support Vector Machine', ask user to input the values of 'C','kernel' and 'gamma'.

    # If the user clicks 'Classify' button, perform prediction and display accuracy score and confusion matrix.
    # This 'if' statement must be inside the above 'if' statement.


After adding the above Streamlit code and rerunning the entire app, it will look like this:
<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/c7aed461-cfb8-4eaf-b14a-b53c06ae953c.PNG"/></center>

In the above app, the user can adjust the hyperparameter values to achieve the desired accuracy.
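
Adjusting the sidebar values amounts to refitting the model once per setting and comparing the scores. A minimal sketch of that loop, again assuming the Iris dataset as a stand-in and an arbitrary handful of `C` and `gamma` values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: Iris (bundled with scikit-learn) stands in for the glass data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Each (C, gamma) pair mimics one set of values a user might enter in the sidebar.
results = {}
for c_value in (1, 10, 100):
    for gamma_input in (1, 10):
        model = SVC(C=c_value, kernel="rbf", gamma=gamma_input)
        model.fit(X_train, y_train)
        results[(c_value, gamma_input)] = model.score(X_test, y_test)

for (c_value, gamma_input), acc in sorted(results.items()):
    print(f"C={c_value:>3}, gamma={gamma_input:>2} -> accuracy {acc:.2f}")

# The setting with the highest test accuracy on this particular split.
best_params = max(results, key=results.get)
print("Best on this split:", best_params)
```

Picking the setting with the best test accuracy, as the app's user does by hand, tends to overestimate how well the model generalises; holding out a separate validation set or using cross-validation is the usual remedy.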

Similarly, let us implement the Random Forest Classifier, for which the user can try different hyperparameter values and predict the label of new data instances.

---

#### Activity 5: Implementing Random Forest Classifier with Hyperparameter Tuning

For Random Forest, the front-end for hyperparameter tuning must look like this:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/dfa2f4ee-b538-4464-ba17-01ca307ee7a6.PNG"></center>

Recall the syntax for creating an object of the `RandomForestClassifier` class:

`RandomForestClassifier(n_estimators, max_depth, n_jobs = -1)`

where,
   - `n_estimators` is the number of trees in the forest.
   - `max_depth` is the maximum depth of each tree.
   - `n_jobs = -1`: for the time being, ignore the reason for passing this parameter as an input.
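
To see what the two hyperparameters control, the sketch below fits a small forest (assuming the Iris dataset as a stand-in and arbitrary values of 50 trees and depth 3) and inspects the fitted trees through the `estimators_` attribute and each tree's `get_depth()` method. It passes the arguments by keyword, since recent scikit-learn versions accept every parameter after `n_estimators` only as a keyword argument.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Assumption: Iris (bundled with scikit-learn) stands in for the glass data.
X, y = load_iris(return_X_y=True)

rf = RandomForestClassifier(n_estimators=50, max_depth=3, n_jobs=-1, random_state=0)
rf.fit(X, y)

# One fitted decision tree per estimator.
n_trees = len(rf.estimators_)
# No tree may grow deeper than max_depth.
deepest = max(tree.get_depth() for tree in rf.estimators_)
print("Trees:", n_trees, "| deepest tree:", deepest)
```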

For our app, we will ask the user to input the `n_estimators` and `max_depth` values using the `st.sidebar.number_input()` widget.

When the user clicks the `Classify` button, the app performs the same tasks as it does for the SVM classifier.
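
Because only the model object differs between the two branches, the shared tasks could be collected in one function. A hedged sketch of that design, where `run_classifier()` is a hypothetical helper (not part of the lesson) and Iris again stands in for the glass data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: Iris (bundled with scikit-learn) stands in for the glass data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)


def run_classifier(model, X_train, y_train, X_test, y_test):
    """Hypothetical helper: fit a classifier, then return its test accuracy and confusion matrix."""
    model.fit(X_train, y_train)
    accuracy = model.score(X_test, y_test)
    cm = confusion_matrix(y_test, model.predict(X_test))
    return accuracy, cm


# Only the constructor call differs between the two branches of the app.
models = {
    "Support Vector Machine": SVC(C=1, kernel="rbf", gamma=1),
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=100, max_depth=5, n_jobs=-1, random_state=0),
}

scores = {}
for name, model in models.items():
    accuracy, cm = run_classifier(model, X_train, y_train, X_test, y_test)
    scores[name] = accuracy
    print(name, round(accuracy, 2))
```

In the app, each branch would then only build its model from the widget values and call the helper; the Streamlit calls (`st.write()`, `st.pyplot()`) are left out here so the sketch runs on its own.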

Follow the steps given below to implement Random Forest Classifier with hyperparameter tuning:

- If `classifier == 'Random Forest Classifier'`,
  
  1. Display a subheader with label `"Model Hyperparameters:"` in the sidebar.
  
  2. Accept the values of the hyperparameters, i.e. `n_estimators` and `max_depth`, using the `st.sidebar.number_input()` function. Use `100` and `5000` as the minimum and maximum values for `n_estimators`, and `1` and `20` as the minimum and maximum values for `max_depth`.
  
  3. Store the current values of these hyperparameters in two variables: `n_estimators_input` and `max_depth_input`.
  
  4. If the user clicks the `Classify` button created in the sidebar:
     
     - Create an object of the `RandomForestClassifier` class and store it in a variable, say `rf_clf`. Pass the hyperparameter values to its constructor as follows:
       ```python
       RandomForestClassifier(n_estimators = n_estimators_input, max_depth = max_depth_input, n_jobs = -1)
       ```
     - Call the `fit()` function on the `RandomForestClassifier` object created above, passing the training set as input.
     - Determine the accuracy of the model by calling the `score()` function on the test set.
     - Call the `prediction()` function, passing the `RandomForestClassifier` object (`rf_clf`) and the values of the input widgets. Store the glass type returned by this function in the `glass_type` variable.
     - Print the predicted glass type (`glass_type`) and the accuracy score of the model.
     - Call the `plot_confusion_matrix()` function with `rf_clf` and the test set as inputs. Call `st.pyplot()` after `plot_confusion_matrix()`.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# S5.1: Implement Random Forest Classifier with hyperparameter tuning.
# if classifier == 'Random Forest Classifier', ask user to input the values of 'n_estimators' and 'max_depth'.

    # If the user clicks 'Classify' button, perform prediction and display accuracy score and confusion matrix.
    # This 'if' statement must be inside the above 'if' statement.


After adding the above Streamlit code and rerunning the entire app, it will look like this:
<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/abe0cc6d-364f-4034-bf3d-97d6a540effa.PNG"></center>

In the above app, the user can adjust the hyperparameter values to achieve the desired accuracy.


**Note to the Teacher:**
You can download the entire `glass_type_app.py` file from the link given below:

https://drive.google.com/file/d/15LtVOA6S5K3tahGitpcuZ71YC6y1NDek

Let's stop here. In the next class, you will learn to add a Logistic Regression classifier as well. You can also host the web app on Streamlit Sharing or Heroku using the process you learnt in the previous classes.

---

### **Project**
You can now attempt the **Applied Tech. Project 95 - Streamlit Widgets II** on your own.

**Applied Tech. Project 95 - Streamlit Widgets II**: https://colab.research.google.com/drive/1LyiJiGzlhTQLskXk9E-ZMWejk50NM9aL

---

In [None]:
# S1.1: Remove the multiselect widgets for histograms and box plots and add a new multiselect widget to choose a type of visualisation.
# Sidebar subheader for scatter plot
st.sidebar.subheader("Scatter Plot")

# Remove deprecation warning.
st.set_option('deprecation.showPyplotGlobalUse', False)

# Choosing x-axis values for scatter plots.
features_list = st.sidebar.multiselect("Select the x-axis values:",
                                        ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))
# Creating scatter plots.
for feature in features_list:
    st.subheader(f"Scatter plot between {feature} and GlassType")
    plt.figure(figsize = (12, 6))
    sns.scatterplot(x = feature, y = 'GlassType', data = glass_df)
    st.pyplot()

# Remove the code blocks for histogram and box plots.

# Add a subheader in the sidebar with label "Visualisation Selector"
st.sidebar.subheader("Visualisation Selector")

# Add a multiselect in the sidebar with label 'Select the charts or plots:'
# and with 6 options passed as a tuple ('Histogram', 'Box Plot', 'Count Plot', 'Pie Chart', 'Correlation Heatmap', 'Pair Plot').
# Store the current value of this widget in a variable 'plot_types'.
plot_types = st.sidebar.multiselect("Select the charts or plots:",
                                    ('Histogram', 'Box Plot', 'Count Plot', 'Pie Chart', 'Correlation Heatmap', 'Pair Plot'))


if 'Histogram' in plot_types:
    st.subheader("Histogram")
    columns = st.sidebar.selectbox("Select the column to create its histogram",
                                  ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))
    # Note: Histograms are generally created for continuous values, not for discrete values.
    plt.figure(figsize = (12, 6))
    plt.title(f"Histogram for {columns}")
    plt.hist(glass_df[columns], bins = 'sturges', edgecolor = 'black')
    st.pyplot()
if 'Box Plot' in plot_types:
    st.subheader("Box Plot")
    columns = st.sidebar.selectbox("Select the column to create its box plot",
                                  ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'GlassType'))
    plt.figure(figsize = (12, 2))
    plt.title(f"Box plot for {columns}")
    sns.boxplot(x = glass_df[columns]) # Passing the column as 'x' draws a horizontal box plot that fits the wide (12, 2) figure.
    st.pyplot()
if 'Count Plot' in plot_types:
    st.subheader("Count plot")
    sns.countplot(x = 'GlassType', data = glass_df)
    st.pyplot()

# Create pie chart using the 'matplotlib.pyplot' module and the 'st.pyplot()' function.
if 'Pie Chart' in plot_types:
    st.subheader("Pie Chart")
    pie_data = glass_df['GlassType'].value_counts()
    plt.figure(figsize = (5, 5))
    plt.pie(pie_data, labels = pie_data.index, autopct = '%1.2f%%',
            startangle = 30, explode = np.linspace(.06, .16, 6))
    st.pyplot()

# Display correlation heatmap using the 'seaborn' module and the 'st.pyplot()' function.
if 'Correlation Heatmap' in plot_types:
    st.subheader("Correlation Heatmap")
    plt.figure(figsize = (10, 6))
    ax = sns.heatmap(glass_df.corr(), annot = True) # Creating an object of seaborn axis and storing it in 'ax' variable
    bottom, top = ax.get_ylim() # Getting the top and bottom margin limits.
    ax.set_ylim(bottom + 0.5, top - 0.5) # Increasing the bottom and decreasing the top margins respectively.
    st.pyplot()

# Display pair plots using the 'seaborn' module and the 'st.pyplot()' function.
if 'Pair Plot' in plot_types:
    st.subheader("Pair Plots")
    plt.figure(figsize = (15, 15))
    sns.pairplot(glass_df)
    st.pyplot()

In [None]:
# S2.1: Add 9 slider widgets for accepting user input for 9 features.
st.sidebar.subheader("Select your values:")
ri = st.sidebar.slider("Input RI", float(glass_df['RI'].min()), float(glass_df['RI'].max()))
na = st.sidebar.slider("Input Na", float(glass_df['Na'].min()), float(glass_df['Na'].max()))
mg = st.sidebar.slider("Input Mg", float(glass_df['Mg'].min()), float(glass_df['Mg'].max()))
al = st.sidebar.slider("Input Al", float(glass_df['Al'].min()), float(glass_df['Al'].max()))
si = st.sidebar.slider("Input Si", float(glass_df['Si'].min()), float(glass_df['Si'].max()))
k = st.sidebar.slider("Input K", float(glass_df['K'].min()), float(glass_df['K'].max()))
ca = st.sidebar.slider("Input Ca", float(glass_df['Ca'].min()), float(glass_df['Ca'].max()))
ba = st.sidebar.slider("Input Ba", float(glass_df['Ba'].min()), float(glass_df['Ba'].max()))
fe = st.sidebar.slider("Input Fe", float(glass_df['Fe'].min()), float(glass_df['Fe'].max()))


st.sidebar.subheader("Choose Classifier")

# Add a selectbox in the sidebar with label 'Classifier'.
# and with 3 options passed as a tuple ('Support Vector Machine', 'Random Forest Classifier', 'Logistic Regression').
# 'Logistic Regression' is implemented in the next class.
# Store the current value of this selectbox in a variable 'classifier'.

classifier = st.sidebar.selectbox("Classifier",
                                 ('Support Vector Machine', 'Random Forest Classifier', 'Logistic Regression'))

# S4.1: Implement SVM with hyperparameter tuning
# Import the 'plot_confusion_matrix' function.
# Note: 'plot_confusion_matrix' exists only in scikit-learn versions before 1.2.
# On newer versions, use 'ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)' instead.
from sklearn.metrics import plot_confusion_matrix

# if classifier == 'Support Vector Machine', ask user to input the values of 'C', 'kernel' and 'gamma'.
if classifier == 'Support Vector Machine':
    st.sidebar.subheader("Model Hyperparameters")
    c_value = st.sidebar.number_input("C (Error Rate)", 1, 100, step = 1)
    kernel_input = st.sidebar.radio("Kernel", ("linear", "rbf", "poly"))
    gamma_input = st.sidebar.number_input("Gamma", 1, 100, step = 1)

    # If the user clicks 'Classify' button, perform prediction and display accuracy score and confusion matrix.
    # This 'if' statement must be inside the above 'if' statement.
    if st.sidebar.button('Classify'):
        st.subheader("Support Vector Machine")
        svc_model = SVC(C = c_value, kernel = kernel_input, gamma = gamma_input)
        svc_model.fit(X_train, y_train)
        accuracy = svc_model.score(X_test, y_test)
        glass_type = prediction(svc_model, ri, na, mg, al, si, k, ca, ba, fe)
        st.write("The Type of glass predicted is:", glass_type)
        st.write("Accuracy", round(accuracy, 2))
        plot_confusion_matrix(svc_model, X_test, y_test)
        st.pyplot()

# S5.1: Implement Random Forest Classifier with hyperparameter tuning.
# if classifier == 'Random Forest Classifier', ask user to input the values of 'n_estimators' and 'max_depth'.
if classifier == 'Random Forest Classifier':
    st.sidebar.subheader("Model Hyperparameters")
    n_estimators_input = st.sidebar.number_input("Number of trees in the forest", 100, 5000, step = 10)
    max_depth_input = st.sidebar.number_input("Maximum depth of the tree", 1, 20, step = 1)

    # If the user clicks 'Classify' button, perform prediction and display accuracy score and confusion matrix.
    # This 'if' statement must be inside the above 'if' statement.
    if st.sidebar.button('Classify'):
        st.subheader("Random Forest Classifier")
        rf_clf = RandomForestClassifier(n_estimators = n_estimators_input, max_depth = max_depth_input, n_jobs = -1)
        rf_clf.fit(X_train, y_train)
        accuracy = rf_clf.score(X_test, y_test)
        glass_type = prediction(rf_clf, ri, na, mg, al, si, k, ca, ba, fe)
        st.write("The Type of glass predicted is:", glass_type)
        st.write("Accuracy", round(accuracy, 2))
        plot_confusion_matrix(rf_clf, X_test, y_test)
        st.pyplot()