
# Lesson 96 -  Multipage Streamlit App I

### Teacher-Student Activities

In the previous class, we added various functionalities to the glass type prediction app such as data exploration and SVM and Random Forest Classifier implementation. In this process, we explored many useful Streamlit widgets which not only helped us in visualising data but also proved useful in performing hyperparameter tuning smoothly.  

Continuing from the previous class, today we will add a Logistic Regression classifier to our app with hyperparameter tuning.
Further, we will start learning how to build a multi-page web app using Streamlit.


Let's quickly go through the activities covered in the previous class and begin this class from **Activity 1: Implementing Logistic Regression with Hyperparameter Tuning** section.



---

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# Importing the necessary Python modules.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import streamlit as st

from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import plot_confusion_matrix

# ML classifier Python modules
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Loading the dataset.
@st.cache()
def load_data():
    file_path = "glass-types.csv"
    df = pd.read_csv(file_path, header = None)
    # Dropping the 0th column as it contains only the serial numbers.
    df.drop(columns = 0, inplace = True)
    column_headers = ['RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'GlassType']
    columns_dict = {}
    # Building a dictionary that maps the old column labels to suitable column headers.
    for i in df.columns:
        columns_dict[i] = column_headers[i - 1]
    # Rename the columns once the dictionary is complete.
    df.rename(columns_dict, axis = 1, inplace = True)

    return df

glass_df = load_data()

# Creating the features data-frame holding all the columns except the last column.
X = glass_df.iloc[:, :-1]

# Creating the target series that holds last column.
y = glass_df['GlassType']

# Splitting the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state = 42)

@st.cache()
def prediction(model, ri, na, mg, al, si, k, ca, ba, fe):
    glass_type = model.predict([[ri, na, mg, al, si, k, ca, ba, fe]])
    glass_type = glass_type[0]
    if glass_type == 1:
        return "building windows float processed".upper()
    elif glass_type == 2:
        return "building windows non float processed".upper()
    elif glass_type == 3:
        return "vehicle windows float processed".upper()
    elif glass_type == 4:
        return "vehicle windows non float processed".upper()
    elif glass_type == 5:
        return "containers".upper()
    elif glass_type == 6:
        return "tableware".upper()
    else:
        return "headlamps".upper()

# Add title on the main page and in the sidebar.
st.title("Glass Type Predictor")
st.sidebar.title("Exploratory Data Analysis")

# Using if statement, display raw data on the click of the checkbox.
if st.sidebar.checkbox("Show raw data"):
    st.subheader("Full Dataset")
    st.dataframe(glass_df)

# Sidebar for scatter plot
st.sidebar.subheader("Scatter Plot")

# Remove deprecation warning.
st.set_option('deprecation.showPyplotGlobalUse', False)

# Choosing x-axis values for scatter plots.
features_list = st.sidebar.multiselect("Select the x-axis values:",
                                        ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))
# Create scatter plots.
for feature in features_list:
    st.subheader(f"Scatter plot between {feature} and GlassType")
    plt.figure(figsize = (12, 6))
    sns.scatterplot(x = feature, y = 'GlassType', data = glass_df)
    st.pyplot()

# Add a multiselect widget to allow the user to select multiple visualisation.
# Add a subheader in the sidebar with label "Visualisation Selector"
st.sidebar.subheader("Visualisation Selector")

# Add a multiselect in the sidebar with label 'Select the charts or plots:'
# and pass the remaining 6 plot types as a tuple i.e. ('Histogram', 'Box Plot', 'Count Plot', 'Pie Chart', 'Correlation Heatmap', 'Pair Plot').
# Store the current value of this widget in a variable 'plot_types'.
plot_types = st.sidebar.multiselect("Select the charts or plots:",
                                    ('Histogram', 'Box Plot', 'Count Plot', 'Pie Chart', 'Correlation Heatmap', 'Pair Plot'))


# Display histogram using the 'matplotlib.pyplot' module and the 'st.pyplot()' function.
if 'Histogram' in plot_types:
    st.subheader("Histogram")
    columns = st.sidebar.selectbox("Select the column to create its histogram",
                                  ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe'))
    # Note: A histogram is generally created for continuous values, not for discrete values.
    plt.figure(figsize = (12, 6))
    plt.title(f"Histogram for {columns}")
    plt.hist(glass_df[columns], bins = 'sturges', edgecolor = 'black')
    st.pyplot()

# Create box plot using the 'seaborn' module and the 'st.pyplot()' function.
if 'Box Plot' in plot_types:
    st.subheader("Box Plot")
    columns = st.sidebar.selectbox("Select the column to create its box plot",
                                  ('RI', 'Na', 'Mg', 'Al', 'Si', 'K', 'Ca', 'Ba', 'Fe', 'GlassType'))
    plt.figure(figsize = (12, 2))
    plt.title(f"Box plot for {columns}")
    sns.boxplot(glass_df[columns])
    st.pyplot()

# Create count plot using the 'seaborn' module and the 'st.pyplot()' function.
if 'Count Plot' in plot_types:
    st.subheader("Count plot")
    sns.countplot(x = 'GlassType', data = glass_df)
    st.pyplot()

# Create a pie chart using the 'matplotlib.pyplot' module and the 'st.pyplot()' function.
if 'Pie Chart' in plot_types:
    st.subheader("Pie Chart")
    pie_data = glass_df['GlassType'].value_counts()
    plt.figure(figsize = (5, 5))
    plt.pie(pie_data, labels = pie_data.index, autopct = '%1.2f%%',
            startangle = 30, explode = np.linspace(.06, .16, 6))
    st.pyplot()

# Display correlation heatmap using the 'seaborn' module and the 'st.pyplot()' function.
if 'Correlation Heatmap' in plot_types:
    st.subheader("Correlation Heatmap")
    plt.figure(figsize = (10, 6))
    ax = sns.heatmap(glass_df.corr(), annot = True) # Creating an object of seaborn axis and storing it in 'ax' variable
    bottom, top = ax.get_ylim() # Getting the top and bottom margin limits.
    ax.set_ylim(bottom + 0.5, top - 0.5) # Increasing the bottom and decreasing the top limits respectively.
    st.pyplot()

# Display pair plots using the 'seaborn' module and the 'st.pyplot()' function.
if 'Pair Plot' in plot_types:
    st.subheader("Pair Plots")
    plt.figure(figsize = (15, 15))
    sns.pairplot(glass_df)
    st.pyplot()

# Add 9 slider widgets for accepting user input for 9 features.
st.sidebar.subheader("Select your values:")
ri = st.sidebar.slider("Input Ri", float(glass_df['RI'].min()), float(glass_df['RI'].max()))
na = st.sidebar.slider("Input Na", float(glass_df['Na'].min()), float(glass_df['Na'].max()))
mg = st.sidebar.slider("Input Mg", float(glass_df['Mg'].min()), float(glass_df['Mg'].max()))
al = st.sidebar.slider("Input Al", float(glass_df['Al'].min()), float(glass_df['Al'].max()))
si = st.sidebar.slider("Input Si", float(glass_df['Si'].min()), float(glass_df['Si'].max()))
k = st.sidebar.slider("Input K", float(glass_df['K'].min()), float(glass_df['K'].max()))
ca = st.sidebar.slider("Input Ca", float(glass_df['Ca'].min()), float(glass_df['Ca'].max()))
ba = st.sidebar.slider("Input Ba", float(glass_df['Ba'].min()), float(glass_df['Ba'].max()))
fe = st.sidebar.slider("Input Fe", float(glass_df['Fe'].min()), float(glass_df['Fe'].max()))

# Add a subheader and a selectbox widget.
# Add a subheader in the sidebar with label "Choose Classifier"
st.sidebar.subheader("Choose Classifier")

# Add a selectbox in the sidebar with label 'Classifier'
# and with 3 options passed as a tuple ('Support Vector Machine', 'Random Forest Classifier', 'Logistic Regression').
# Store the current value of this selectbox in a variable 'classifier'.
classifier = st.sidebar.selectbox("Classifier",
                                 ('Support Vector Machine', 'Random Forest Classifier', 'Logistic Regression'))

# Implement SVM with hyperparameter tuning
# if classifier == 'Support Vector Machine', ask user to input the values of 'C','kernel' and 'gamma'.
if classifier == 'Support Vector Machine':
    st.sidebar.subheader("Model Hyperparameters")
    c_value = st.sidebar.number_input("C (Error Rate)", 1, 100, step = 1)
    kernel_input = st.sidebar.radio("Kernel", ("linear", "rbf", "poly"))
    gamma_input = st.sidebar.number_input("Gamma", 1, 100, step = 1)

    if st.sidebar.button('Classify'):
        st.subheader("Support Vector Machine")
        svc_model=SVC(C = c_value, kernel = kernel_input, gamma = gamma_input)
        svc_model.fit(X_train,y_train)
        y_pred = svc_model.predict(X_test)
        accuracy = svc_model.score(X_test, y_test)
        glass_type = prediction(svc_model, ri, na, mg, al, si, k, ca, ba, fe)
        st.write("The Type of glass predicted is:", glass_type)
        st.write("Accuracy", accuracy.round(2))
        plot_confusion_matrix(svc_model, X_test, y_test)
        st.pyplot()

# Random Forest Classifier
if classifier == 'Random Forest Classifier':
    st.sidebar.subheader("Model Hyperparameters")
    n_estimators_input = st.sidebar.number_input("Number of trees in the forest", 100, 5000, step = 10)
    max_depth_input = st.sidebar.number_input("Maximum depth of the tree", 1, 100, step = 1)

    if st.sidebar.button('Classify'):
        st.subheader("Random Forest Classifier")
        rf_clf = RandomForestClassifier(n_estimators = n_estimators_input, max_depth = max_depth_input, n_jobs = -1)
        rf_clf.fit(X_train,y_train)
        accuracy = rf_clf.score(X_test, y_test)
        glass_type = prediction(rf_clf, ri, na, mg, al, si, k, ca, ba, fe)
        st.write("The Type of glass predicted is:", glass_type)
        st.write("Accuracy", accuracy.round(2))
        plot_confusion_matrix(rf_clf, X_test, y_test)
        st.pyplot()

---

#### Activity 1: Implementing Logistic Regression with Hyperparameter Tuning

For Logistic Regression, the front-end for hyperparameter tuning must look like this:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/7fd71f2c-ca3c-4437-997c-8a26eb96bbf1.PNG"/></center>

Recall the syntax for creating object of Logistic Regression:

`LogisticRegression(C, max_iter)`

where,
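   - `C` is the regularisation hyperparameter, referred to as the error rate in the Support Vector Machine section. In scikit-learn it is the inverse of the regularisation strength, so smaller values mean stronger regularisation.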
   - `max_iter` is the maximum number of iterations taken by the solvers during model fitting. Default value is `100`.
  

For our app, we will ask the user to input the `C` and `max_iter` values using `st.number_input()` and `st.slider()` widgets.


Follow the steps given below to implement Logistic Regression with hyperparameter tuning:

- If `classifier == 'Logistic Regression'`,
  
  1. Display a subheader with label `"Model Hyperparameters:"` in the sidebar.
  
  2. Accept the values of the hyperparameters, i.e. `C` using the `st.sidebar.number_input()` function and `max_iter` using the `st.sidebar.slider()` function. The minimum values of `C` and `max_iter` should be `1` and `100` respectively.
  
  3. Store the current value of these hyperparameters in two different variables `c_value` and `max_iter_input`.
  
  4. If the user clicks on `Classify` button created in sidebar:
     
     - Create an object of `LogisticRegression` class and store it in a variable, say `log_reg`. Pass the hyperparameter values to its constructor as follows:
      ```python
      LogisticRegression(C = c_value, max_iter = max_iter_input)
      ```
     - Call the `fit()` function on `LogisticRegression` object created above with the train set as inputs.
     - Determine the accuracy of the model by calling the `score()` function on the test set.
     - Call the `prediction()` function and pass the `LogisticRegression` object, i.e. `log_reg`, and the input widget values to this function. Store the glass type returned by this function in the `glass_type` variable.
     - Print the predicted glass type (`glass_type`) and accuracy score of the model.
     - Call `plot_confusion_matrix` function with `log_reg` and test set as inputs. Use `st.pyplot()` after `plot_confusion_matrix()` function.

**Note:** Don't run the code shown below. It will throw an error.

In [None]:
# S1.1: Implement Logistic Regression with hyperparameter tuning
if classifier == 'Logistic Regression':
    st.sidebar.subheader("Model Hyperparameters")
    c_value = st.sidebar.number_input("C", 1, 100, step = 1)
    max_iter_input = st.sidebar.slider("Maximum iterations", 100, 1000, step = 10)

    if st.sidebar.button('Classify'):
        st.subheader("Logistic regression")
        log_reg = LogisticRegression(C = c_value, max_iter = max_iter_input)
        log_reg.fit(X_train, y_train)
        accuracy = log_reg.score(X_test, y_test)
        glass_type = prediction(log_reg, ri, na, mg, al, si, k, ca, ba, fe)
        st.write("The Type of glass predicted is:", glass_type)
        st.write("Accuracy", accuracy.round(2))
        plot_confusion_matrix(log_reg, X_test, y_test)
        st.pyplot()

After adding the above Streamlit code and rerunning the entire app, it will look like this:
<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/9f401dd8-3fd3-4969-905b-e8f50eb3b40c.PNG"/></center>


In the above app, a user can adjust the hyperparameter values to achieve the desired accuracy.

Hence, we have completed the entire code needed for creating our glass type prediction web app.

Further, you can host the web app on Streamlit Sharing or Heroku using the process you learnt in the previous classes. After hosting it on Heroku, your app should look like the following:

https://glass-type-predictor.herokuapp.com/

Here's the GitHub repository for the same:

https://github.com/srahuliitb/glass-type-predictor

You may deploy it on Streamlit sharing service as well.

Now, let us develop a new Streamlit multi-page web app to implement a regression problem. Before that, let's quickly go through the dataset used for this regression problem.


---

#### Car Price Prediction - Problem Statement

Recall the linear regression model that you built in one of your previous classes, wherein you predicted the price of cars based on their technical specifications such as the car manufacturer, engine capacity, fuel efficiency and body type.


**Dataset Description:**

The dataset contains 205 rows and 26 columns. Each column represents an attribute of a car as described in the table below.

|Sr No.|Attribute|Attribute Information|
|-|-|-|
|1|Car_ID|Unique id of each car (Integer)|
|2|Symboling|Assigned insurance risk rating; a value of +3 indicates that the car is risky; -3 suggests that it is probably a safe car (Categorical)|
|3|carCompany|Name of car company (Categorical)|
|4|fueltype| fuel-type i.e. petrol or diesel (Categorical)|
|5|aspiration|Aspiration used in a car (Categorical)|
|6|doornumber|Number of doors in a car (Categorical)|
|7|carbody|Body-type of a car (Categorical)|
|8|drivewheel|Type of drive wheel (Categorical)|
|9|enginelocation|Location of car engine (Categorical)|
|10|wheelbase|Wheelbase of car (Numeric)|
|11|carlength|Length of car (Numeric)|
|12|carwidth|Width of car (Numeric)|
|13|carheight|Height of car (Numeric)|
|14|curbweight|The weight of a car without occupants or baggage (Numeric)|
|15|enginetype|Type of engine (Categorical)|
|16|cylindernumber|Number of cylinders placed in the car engine (Categorical)|
|17|enginesize|Capacity of an engine (Numeric)|
|18|fuelsystem|Fuel system of a car (Categorical)|
|19|boreratio|Bore ratio of car (Numeric)|
|20|stroke|Stroke or volume inside the engine (Numeric)|
|21|compressionratio|Compression ratio of an engine (Numeric)|
|22|horsepower|Power output of an engine (Numeric)|
|23|peakrpm|Peak revolutions per minute (Numeric)|
|24|citympg|Mileage in city (Numeric)|
|25|highwaympg|Mileage on highway (Numeric)|
|26|price(Dependent variable)|Price of a car (Numeric)|


**Dataset source:** https://archive.ics.uci.edu/ml/datasets/Automobile

**Credits:** `Dua, D., & Graff, C.. (2017). UCI Machine Learning Repository`

For ease of implementation, we will use only the following five features that we had obtained after performing feature selection using RFE (Recursive Feature Elimination) in one of the previous lessons:

|Sr No.|Features|Target|
|-|-|-|
|1|`'carwidth'`|price|
|2|`'enginesize'`||
|3|`'horsepower'`||
|4|`'drivewheel_fwd'`||
|5|`'car_company_buick'`||

Let's start creating a multi-page application to implement the above regression problem using Streamlit.







---

#### Activity 2: Main Page Configuration

A multipage application is a web app having multiple web pages linked with the home page. Each web page has its own widgets. To get a better understanding, let's learn to create a web app with a home page and three other web pages in Streamlit, using radio buttons for navigation.

This web app will do the following:

1. Display the name of the app and its brief description.

2. Display raw data and provide data description through one of the pages.

3. Create different types of charts or plots to find a pattern (if exists) in the data through another web page.

4. Build an ML model to make predictions through the final web page.

To do this, we will need to create five Python (`.py`) scripts. Four of them will perform the above-mentioned tasks, and the fifth one (**main page**) will act as an anchor for the four web pages.

Let's first set the title of the home page that will appear on the browser tab. For this, you need the `set_page_config()` function of the `streamlit` module.

The `set_page_config()` function configures the default settings of the page. Its **syntax** is:

```python
streamlit.set_page_config(page_title = None,
                          page_icon = None,
                          layout = 'centered',
                          initial_sidebar_state = 'auto')
```

- **`page_title` (`str` or `None`):** The page title, shown in the browser tab. If set to `None`, defaults to the filename of the script (e.g. `app.py` would show `app • Streamlit`).

- **`page_icon` (Anything supported by `st.image` or `str` or `None`):** The page favicon. Besides the types supported by `st.image` (like URLs or numpy arrays), you can pass in an emoji as a string (`"🦈"`) or a shortcode (`":shark:"`). If you're feeling lucky, try `"random"` for a random emoji! Emoji icons are courtesy of [Twemoji](https://twemoji.twitter.com/) and loaded from [MaxCDN](https://www.stackpath.com/maxCDN).

- **`layout` (`"centered"` or `"wide"`):** How the page content should be laid out. Defaults to `"centered"`, which constrains the elements into a centered column of fixed width; `"wide"` uses the entire screen.

- **`initial_sidebar_state` (`"auto"` or `"expanded"` or `"collapsed"`):** How the sidebar should start out. Defaults to `"auto"`, which hides the sidebar on mobile-sized devices, and shows it otherwise. `"expanded"` shows the sidebar initially; `"collapsed"` hides it.

**Note:** The `set_page_config()` **must be the first** Streamlit function used in your app (of course after importing the `streamlit` module), and **must only be set once**.

Now create the main Python file `main_app.py` in the Sublime editor and save it in a new folder (say **multipage**). In the `main_app.py` file:

- Import the `streamlit` module

- Configure your main page by setting its title and icon that will be displayed in a browser tab.

- Load the dataset.

- Create a navigation bar to navigate through multiple pages.

**Note:** Do not run the code shown below in Google Colab. It will throw an error.

In [None]:
# S2.1: Configure your home page by setting its title and icon that will be displayed in a browser tab.
# Importing the necessary Python modules.

# Configure your home page.


---

#### Activity 3: Loading Data

In the `main_app.py` file, import the `numpy` and `pandas` modules and create a function (say `load_data()`) that loads the dataset and returns a Pandas data-frame. The body of the function should contain the following code that you already created in one of the previous lesson(s):

```python
    # Reading the dataset
    cars_df = pd.read_csv("car-prices.csv")
    # Extract the names of the manufacturers from the car names.
    car_companies = pd.Series([car.split(" ")[0] for car in cars_df['CarName']], index = cars_df.index)
    # Create a new column named 'car_company'. It should store the company names of the cars.
    cars_df['car_company'] = car_companies
    # Replace the misspelled 'car_company' names with their correct names.
    cars_df.loc[(cars_df['car_company'] == "vw") | (cars_df['car_company'] == "vokswagen"), 'car_company'] = 'volkswagen'
    cars_df.loc[cars_df['car_company'] == "porcshce", 'car_company'] = 'porsche'
    cars_df.loc[cars_df['car_company'] == "toyouta", 'car_company'] = 'toyota'
    cars_df.loc[cars_df['car_company'] == "Nissan", 'car_company'] = 'nissan'
    cars_df.loc[cars_df['car_company'] == "maxda", 'car_company'] = 'mazda'
    cars_df.drop(columns= ['CarName'], axis = 1, inplace = True)
    cars_numeric_df = cars_df.select_dtypes(include = ['int64', 'float64'])
    cars_numeric_df.drop(columns = ['car_ID'], axis = 1, inplace = True)
    # Map the values of the 'doornumber' and 'cylindernumber' columns to their corresponding numeric values.
    cars_df[['cylindernumber', 'doornumber']] = cars_df[['cylindernumber', 'doornumber']].apply(num_map, axis = 1)
    # Create dummy variables for the 'carbody' columns.
    car_body_dummies = pd.get_dummies(cars_df['carbody'], dtype = int)
    # Create dummy variables for the 'carbody' columns with 1 column less.
    car_body_new_dummies = pd.get_dummies(cars_df['carbody'], drop_first = True, dtype = int)
    # Create a DataFrame containing all the non-numeric type features.
    cars_categorical_df = cars_df.select_dtypes(include = ['object'])
    #Get dummy variables for all the categorical type columns using the dummy coding process.
    cars_dummies_df = pd.get_dummies(cars_categorical_df, drop_first = True, dtype = int)
    #  Drop the categorical type columns from the 'cars_df' DataFrame.
    cars_df.drop(list(cars_categorical_df.columns), axis = 1, inplace = True)
    # Concatenate the 'cars_df' and 'cars_dummies_df' DataFrames.
    cars_df = pd.concat([cars_df, cars_dummies_df], axis = 1)
    #  Drop the 'car_ID' column
    cars_df.drop('car_ID', axis = 1, inplace = True)
    final_columns = ['carwidth', 'enginesize', 'horsepower', 'drivewheel_fwd', 'car_company_buick', 'price']
```
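
To see what the name-cleaning steps in the code above do, here is a small sketch that works on a plain Python list instead of a DataFrame. The sample car names, the `SPELLING_FIXES` dictionary name and the `extract_company()` helper are made up for illustration; the misspellings and their corrections are the ones handled in the code above.

```python
# Misspelled company names mapped to their corrected forms.
SPELLING_FIXES = {
    "vw": "volkswagen",
    "vokswagen": "volkswagen",
    "porcshce": "porsche",
    "toyouta": "toyota",
    "Nissan": "nissan",
    "maxda": "mazda",
}

def extract_company(car_name):
    """Return the corrected manufacturer name from a full car name."""
    company = car_name.split(" ")[0]             # the first word is the manufacturer
    return SPELLING_FIXES.get(company, company)  # change it only if it is a known misspelling

# Hypothetical sample values in the style of the 'CarName' column.
sample_names = ["vw rabbit", "toyouta tercel", "Nissan latio", "audi 100ls"]
print([extract_company(name) for name in sample_names])
# → ['volkswagen', 'toyota', 'nissan', 'audi']
```

A dictionary of corrections keeps all the fixes in one place, whereas the lesson's code applies one `.loc` assignment per misspelling; both give the same cleaned names.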

Also, add the following dictionary and function before the `load_data()` function.

```python
# Dictionary containing positive integers in the form of words as keys and their corresponding numeric forms as values.
words_dict = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "eight": 8, "twelve": 12}
def num_map(series):
    return series.map(words_dict)
```

**Note:** Do not run the code shown below. It will throw an error.

In [None]:
# S3.1: Create a function, say, 'load_data()' in the 'main_app.py' file to load the dataset.
import numpy as np
import pandas as pd

# Dictionary containing positive integers in the form of words as keys and their corresponding numeric forms as values.
words_dict = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6, "eight": 8, "twelve": 12}
def num_map(series):
    return series.map(words_dict)

# Loading the dataset.
@st.cache()
def load_data():
    # Reading the dataset
    cars_df = pd.read_csv("car-prices.csv")
    # Extract the names of the manufacturers from the car names.
    car_companies = pd.Series([car.split(" ")[0] for car in cars_df['CarName']], index = cars_df.index)
    # Create a new column named 'car_company'. It should store the company names of the cars.
    cars_df['car_company'] = car_companies
    # Replace the misspelled 'car_company' names with their correct names.
    cars_df.loc[(cars_df['car_company'] == "vw") | (cars_df['car_company'] == "vokswagen"), 'car_company'] = 'volkswagen'
    cars_df.loc[cars_df['car_company'] == "porcshce", 'car_company'] = 'porsche'
    cars_df.loc[cars_df['car_company'] == "toyouta", 'car_company'] = 'toyota'
    cars_df.loc[cars_df['car_company'] == "Nissan", 'car_company'] = 'nissan'
    cars_df.loc[cars_df['car_company'] == "maxda", 'car_company'] = 'mazda'
    cars_df.drop(columns= ['CarName'], axis = 1, inplace = True)
    cars_numeric_df = cars_df.select_dtypes(include = ['int64', 'float64'])
    cars_numeric_df.drop(columns = ['car_ID'], axis = 1, inplace = True)
    # Map the values of the 'doornumber' and 'cylindernumber' columns to their corresponding numeric values.
    cars_df[['cylindernumber', 'doornumber']] = cars_df[['cylindernumber', 'doornumber']].apply(num_map, axis = 1)
    # Create dummy variables for the 'carbody' columns.
    car_body_dummies = pd.get_dummies(cars_df['carbody'], dtype = int)
    # Create dummy variables for the 'carbody' columns with 1 column less.
    car_body_new_dummies = pd.get_dummies(cars_df['carbody'], drop_first = True, dtype = int)
    # Create a DataFrame containing all the non-numeric type features.
    cars_categorical_df = cars_df.select_dtypes(include = ['object'])
    #Get dummy variables for all the categorical type columns using the dummy coding process.
    cars_dummies_df = pd.get_dummies(cars_categorical_df, drop_first = True, dtype = int)
    #  Drop the categorical type columns from the 'cars_df' DataFrame.
    cars_df.drop(list(cars_categorical_df.columns), axis = 1, inplace = True)
    # Concatenate the 'cars_df' and 'cars_dummies_df' DataFrames.
    cars_df = pd.concat([cars_df, cars_dummies_df], axis = 1)
    #  Drop the 'car_ID' column
    cars_df.drop('car_ID', axis = 1, inplace = True)
    final_columns = ['carwidth', 'enginesize', 'horsepower', 'drivewheel_fwd', 'car_company_buick', 'price']
    return cars_df[final_columns]

final_cars_df = load_data()

**Note:** You have to store the `car-prices.csv` file in your computer in the same folder that contains the above Python script. You can download the `car-prices.csv` file from the link provided below.

https://s3-student-datasets-bucket.whjr.online/whitehat-ds-datasets/car-prices.csv


---

#### Activity 4: Pages Navigator

Now, let's add another widget to navigate through multiple web pages in the web app as shown in the image below.

<img src="https://s3-whjr-v2-prod-bucket.whjr.online/d632d029-0970-4b30-8854-2e9c3794254e.png">

But before that, let's create four empty Python files, namely `home.py`, `data.py`, `plots.py` and `predict.py`, inside the same folder that contains `main_app.py`. The objective of creating these four files is the following:

- When a user selects the `Home` option, the `home.py` script will be rendered which contains the code to display only title and a brief description of the web app.

- When a user selects the `View Data` option, the `data.py` script will be rendered which contains the code to display raw data and provide data description.

- When a user selects the `Visualise Data` option, the `plots.py` script will be rendered which contains the code to create different types of charts or plots to find a pattern (if exists) in the data.

- When a user selects the `Predict` option, the `predict.py` script will be rendered which contains the code to build an ML model to make predictions.

To create this navigation bar, perform the following tasks:

1. Import the `home.py`, `data.py`, `plots.py` and `predict.py` files in the `main_app.py` file as `home`, `data`, `plots` and `predict` respectively.

2. Create a dictionary, say `pages_dict`, with keys being the label to be displayed in the navigation bar and values being the name of Python script to be rendered:

  ```python
  pages_dict = {
                "Home": home,
                "View Data": data,
                "Visualise Data": plots,
                "Predict": predict
            }
  ```

3. Add a title in the sidebar with the label `'Navigation'`.

4. Add a radio button widget with label `'Go to'` and options as keys of the `pages_dict` dictionary. Pass these keys in the form of a list or a tuple as the options to the radio button widget can only be provided in the form of a list or a tuple.

5. Store the current value of this radio button widget in a variable `user_choice`.

6. Obtain the corresponding value of the key stored in `user_choice` variable by passing it to the `pages_dict` dictionary. Store the value obtained from dictionary in a variable, say `selected_page`. It will have any value amongst `home`, `data`, `plots` or `predict`.

7. Call a user-defined function, say `app()`, on `selected_page` with `final_cars_df` as its input (except for `home`). We will define the `app()` function in all four files to perform their respective tasks, as you will see shortly.

  For instance, if we select `'Visualise Data'` option in the navigation, the `app()` function inside the `plots.py` file will be called.
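
The lookup-and-call logic in steps 2 to 7 can be tried out without Streamlit. In the sketch below, `SimpleNamespace` objects stand in for the imported page modules, the `app()` functions just return strings, and `render_page()` and `fake_df` are hypothetical names used only for illustration. In the real `main_app.py`, `user_choice` comes from the radio button widget and the DataFrame is `final_cars_df`.

```python
from types import SimpleNamespace

# Stand-ins for the imported modules home, data, plots and predict.
home = SimpleNamespace(app=lambda: "home page")
data = SimpleNamespace(app=lambda df: f"data page with {len(df)} rows")
plots = SimpleNamespace(app=lambda df: f"plots page with {len(df)} rows")
predict = SimpleNamespace(app=lambda df: f"predict page with {len(df)} rows")

pages_dict = {"Home": home, "View Data": data, "Visualise Data": plots, "Predict": predict}

def render_page(user_choice, df):
    """Look up the selected page and call its app() function."""
    selected_page = pages_dict[user_choice]
    if user_choice == "Home":
        return selected_page.app()   # the home page needs no data
    return selected_page.app(df)     # every other page receives the DataFrame

fake_df = [1, 2, 3]  # placeholder for final_cars_df
print(render_page("Home", fake_df))            # → home page
print(render_page("Visualise Data", fake_df))  # → plots page with 3 rows
```

Because the dictionary maps each label to a page object, adding a new page only needs one more dictionary entry rather than another `if` branch.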

**Note:**

- Except for `home`, the `final_cars_df` DataFrame must be passed as an input to the `app()` function so that there is no need to load and create the DataFrame again on every page. Thus the `final_cars_df` is being transported from the main page to the respective python script.

- If you do not create four empty Python files (i.e. `home.py`, `data.py`, `plots.py` and `predict.py`), then the above code will throw `ModuleNotFoundError` as shown below:

  <center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/acf63f9e-55e5-4716-b811-0ebcda2ba290.PNG"></center>

  We will write the codes for the above files shortly.

- Do not run the code shown below in Google Colab. It will throw an error.


In [None]:
# S4.1: Adding a navigation in the sidebar using radio buttons
# Import the individual Python files
import home
import data
import plots
import predict

# create a dictionary 'pages_dict'
pages_dict = {
                "Home": home,
                "View Data": data,
                "Visualise Data": plots,
                "Predict": predict
            }

# Add radio buttons in the sidebar for navigation and call the respective pages based on 'user_choice'.
st.sidebar.title('Navigation')
user_choice = st.sidebar.radio("Go to", tuple(pages_dict.keys()))
if user_choice == "Home":
    home.app() # The 'app()' function should not take any input if the selection option is "Home".
else:
    selected_page = pages_dict[user_choice]
    selected_page.app(final_cars_df)

After adding the above Streamlit code and rerunning the entire app, it will look as shown below:

<img src="https://s3-whjr-v2-prod-bucket.whjr.online/1d764910-d926-43af-9b5f-48aac2632302.png">

At this point, if you select any option, you will get an `AttributeError`. This is because the `app()` function is called whenever the user selects a page, but this function is not yet defined in any of the individual pages. We will define this function while creating the individual Python scripts (`home.py`, `data.py`, `plots.py`, `predict.py`).
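
This error is easy to reproduce outside Streamlit. The sketch below creates an empty module object in memory as a stand-in for an empty `home.py` and tries to call `app()` on it; `empty_page` is a hypothetical name used only for this illustration.

```python
from types import ModuleType

# An empty module object behaves like an imported file with no code in it.
empty_page = ModuleType("home")

try:
    empty_page.app()
except AttributeError as error:
    message = str(error)
    print("AttributeError:", message)

# Once the module defines app(), the same call succeeds.
empty_page.app = lambda: "home page rendered"
print(empty_page.app())
```

The error disappears as soon as each page file defines its own `app()` function, which is what the following activities do.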

Thus, we have completed the `main_app.py` file. You can download the `main_app.py` from the link given below:

https://drive.google.com/file/d/1xHzwCXZIuxiug3IfwMCk-Hgqwz9MkBMH/view?usp=sharing

---

#### Activity 5: Home Page Configuration

Now, let's define our home page by writing Python code for the same so that it appears as shown below:

<center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/e9181831-7df6-424a-aa22-7520d057b4d5.png"></center>

To define the home page:
- Open the empty Python file `home.py` that you created in the previous activity.

- Import the `streamlit` module.

- Define the `app()` function which doesn't take any input and doesn't return anything. Inside this function:

  - Set the title for the home page using the `header()` function of the `streamlit` module.

  - Write a brief description of your web app, say, `This web app allows a user to predict the prices of a car based on their engine size, horse power, dimensions and the drive wheel type parameters.` using the `text()` function of the `streamlit` module.


**Note:** Do not run the code shown below in Google Colab. It will throw an error.

In [None]:
# S5.1: Configure the home as directed above.
import streamlit as st

def app():
    st.header("Car Price Prediction App")
    st.text("""
            This web app allows a user to predict the prices of a car based on their
            engine size, horse power, dimensions and the drive wheel type parameters.
            """)

---

#### Activity 6: The `View Data` Page Configuration

Now, let's write a Python program in the empty `data.py` file to display the raw data and its description. When a user chooses the `View Data` option in the navigation section of the sidebar, `main_app.py` should call the `app()` function of the `data.py` file and pass the car DataFrame to it as the `car_df` input. This function doesn't return anything.

You need to perform the following tasks inside the `app()` function of `data.py` file:

**1. Display the original dataset:**

- We have already used `st.dataframe()` function to display a scrollable and interactive DataFrame.

- However, instead of drawing a DataFrame as an interactive table, you may want to draw it as a static table. For this, use the `st.table()` function.
  
  **Syntax:**  `st.table(data)` where `data` is the pandas DataFrame.

- Use `st.beta_expander` to display or hide the DataFrame. (Newer Streamlit releases name this function `st.expander`.)

**Note:** Do not run the code shown below. It will throw an error.

In [None]:
# S6.1: Design the View Data page of the multipage app.
# Import necessary modules
import numpy as np
import pandas as pd
import streamlit as st

# Define a function 'app()' which accepts 'car_df' as an input.
def app(car_df):
    st.header("View Data")
    # Add an expander and display the dataset as a static table within the expander.
    with st.beta_expander("View Dataset"):
        st.table(car_df)

**2. Measures of central tendency**

You can display the mean, median, quartile and standard deviation values of the numeric columns of a dataset by passing the output of the `describe()` function of the DataFrame to the `table()` function of the Streamlit module. Hence,

  - Using the `subheader()` function of the Streamlit module, display `Column Description:` as text.

  - Display the descriptive statistics of the numeric columns of `car_df` data-frame.
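
For intuition about the numbers in such a summary, the sketch below computes the same kind of statistics for a single column using only Python's standard `statistics` module. The `engine_sizes` values are hypothetical and are not taken from the car dataset.

```python
import statistics

# Hypothetical engine sizes; not taken from the actual car dataset.
engine_sizes = [90, 100, 110, 120, 130]

mean_value = statistics.mean(engine_sizes)
median_value = statistics.median(engine_sizes)
# Sample standard deviation (divides by n - 1).
std_value = statistics.stdev(engine_sizes)
# First, second and third quartiles, interpolating linearly between data points.
q1, q2, q3 = statistics.quantiles(engine_sizes, n=4, method="inclusive")

print("mean:", mean_value)
print("median:", median_value)
print("std:", round(std_value, 2))
print("quartiles:", q1, q2, q3)
```

In the output of `describe()`, these values appear in the `mean`, `std`, `25%`, `50%` and `75%` rows, where the `50%` row is the median.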


In [None]:
# S6.2: Design the View Data page of the multipage app.
# Import necessary modules
import numpy as np
import pandas as pd
import streamlit as st

# Define a function 'app()' which accepts 'car_df' as an input.
def app(car_df):
    st.header("View Data")
    # Add an expander and display the dataset as a static table within the expander.
    with st.beta_expander("View Dataset"):
        st.table(car_df)

    # ADD NEW CODE HERE.


**3. Display widgets horizontally:**

While creating web apps, sometimes we wish to customise the horizontal layout of the web page such that multiple elements or widgets can appear side by side as shown below:
  
  <center><img src="https://s3-whjr-v2-prod-bucket.whjr.online/612c1abe-0c03-45b2-afc1-e790a3aafdc1.PNG"></center>

- In Streamlit, this is done using the `beta_columns` function. It allows you to make multiple containers appear side by side. (Newer Streamlit releases name this function `st.columns`.)

  **Syntax:** `st.beta_columns(spec)` where `spec` is an integer or a list of numbers.
    
    - If `spec` is an integer, the `beta_columns` function returns that many columns of equal width.
    
    For example, `st.beta_columns(3)` will return three equal-width columns.
    
    - If `spec` is a list of numbers, the `beta_columns` function returns a column for each number, and each column's width is proportional to the number provided. The numbers can be integers or floats, but they must be positive.

    For example, `st.beta_columns([3, 1, 2])` creates 3 columns where the first column is 3 times the width of the second, and the last column is 2 times the width of the second.
  
- To add one or more elements to the columns, use `with` notation as follows:

  ```python
  beta_col1, beta_col2 = st.beta_columns(2)
  with beta_col1:
    # adding widgets to the first column
    ...
  with beta_col2:
    # adding widgets to the second column
    ...
  ```

- For our app, we will create three columns inside the `app()` function.

  - First column must display all the column names on the click of a checkbox

  - Second column must display the data type of each column on the click of a checkbox

  - Third column must display all the rows of the selected column on the click of a checkbox as shown below:

    <center><img src = "https://s3-whjr-v2-prod-bucket.whjr.online/7bf48efc-30ab-4d70-af91-e62d01deeda8.gif"></center>

Let us create the horizontal layout as visible in the above gif using the `st.beta_columns()` function.
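
To see what a list `spec` means in practice, the sketch below converts relative weights into percentage shares of the available width. The helper `column_shares` is hypothetical and written only for this illustration. It is not a Streamlit function, and it ignores the gaps that Streamlit leaves between columns.

```python
def column_shares(spec):
    """Return the percentage of the row width given to each column."""
    if isinstance(spec, int):
        # An integer spec means that many equal-width columns.
        spec = [1] * spec
    if any(weight <= 0 for weight in spec):
        raise ValueError("Column weights must be positive.")
    total = sum(spec)
    return [round(100 * weight / total, 1) for weight in spec]

print(column_shares(3))          # three equal columns
print(column_shares([3, 1, 2]))  # first is 3x and last is 2x the second
```

With the weights `[3, 1, 2]`, the first column takes about half of the row, the second about a sixth and the third about a third.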

**Note:**

- Be careful with the indentation. The code for creating `beta_columns` must be strictly placed inside `app()` function created above.

- Do not run the code shown below. It will throw an error.

In [None]:
# S6.3: Divide the web page into three columns to add more widgets.
def app(car_df):
    # Displaying original dataset
    st.header("View Data")
    # Add an expander and display the dataset as a static table within the expander.
    with st.beta_expander("View Dataset"):
        st.table(car_df)

    # Display descriptive statistics.
    st.subheader("Columns Description:")
    if st.checkbox("Show summary"):
        st.table(car_df.describe())

    # ADD NEW CODE FROM HERE
    # Create three columns and store them in three separate variables.
    beta_col1, beta_col2, beta_col3 = st.beta_columns(3)

    # Add a checkbox in the first column. Display the column names of 'car_df' on the click of checkbox.
    with beta_col1:
        if st.checkbox("Show all column names"):
            st.table(list(car_df.columns))

    # Add a checkbox in the second column. Display the column data-types of 'car_df' on the click of checkbox.
    with beta_col2:
        if st.checkbox("View column data-type"):
            st.table(car_df.dtypes)

    # Add a checkbox in the third column followed by a selectbox which accepts the column name whose data needs to be displayed.
    with beta_col3:
        if st.checkbox("View column data"):
            column_data = st.selectbox('Select column', tuple(car_df.columns))
            st.write(car_df[column_data])


Hence, we have completed the entire code for the `data.py` file. Now run your web app by running the `main_app.py` file using the following command:

`streamlit run main_app.py`

You will see the following output:

<center>
<img src="https://s3-whjr-v2-prod-bucket.whjr.online/8583eed7-ff66-4cc5-a3ea-9ddaab722918.gif"/></center>

**Note to the Teacher:** You can download the entire `data.py` file from the link given below:

https://drive.google.com/file/d/1uMWpePAoAcqW8_Jc6qK9So8MiHz03sgr


We will stop here. In the next class, we will create the remaining two Python pages.

---

### **Project**
You can now attempt the **Applied Tech.Project 96 - Multipage Streamlit App I** on your own.
**Applied Tech.Project 96 - Multipage Streamlit App I**: https://colab.research.google.com/drive/1_2dMWdc8o-GTY0_ktUikE1aQD9Ia5C2d

---