# **BOOTCAMP @ GIKI (Content designed by Usama Arshad) WEEK 2**

**Lab sessions you will conduct:**

---



*   Lab 6: Implementing SVM, Decision Trees, and Evaluation Metrics (Day 6)
* Lab 7: Implementing Unsupervised Learning Algorithms (Day 7)
* Lab 8: Feature Engineering and Model Selection Techniques (Day 8)
* Lab 9: Implementing Regression Models (Day 9)
* **Lab 10: Implementing Classification Models (Day 10)**

### Logistic Regression

#### Introduction to Logistic Regression
Logistic Regression is a statistical method for analyzing datasets in which there are one or more independent variables that determine an outcome. The outcome is measured with a dichotomous variable (in which there are only two possible outcomes). Logistic regression is used to describe data and to explain the relationship between one dependent binary variable and one or more nominal, ordinal, interval, or ratio-level independent variables.

#### How Logistic Regression Works
- Logistic regression uses the logistic function to model a binary dependent variable.
- The logistic function, also called the sigmoid function, is an S-shaped curve that can take any real-valued number and map it into a value between 0 and 1.
- The logistic function is defined as: \( \sigma(t) = \frac{1}{1 + e^{-t}} \)

#### Steps in Logistic Regression
1. **Model Specification**: Define the logistic regression model.
2. **Parameter Estimation**: Use maximum likelihood estimation to estimate the parameters.
3. **Model Evaluation**: Evaluate the model using metrics such as accuracy, precision, recall, and the ROC curve.

#### Applications
- Medical field to determine the presence or absence of a disease.
- Credit scoring in finance to predict the probability of default.
- Marketing for predicting whether a customer will buy a product.

### K-Nearest Neighbors (KNN)

#### Introduction to K-Nearest Neighbors
K-Nearest Neighbors (KNN) is a simple, easy-to-implement supervised machine learning algorithm that can be used for both classification and regression problems. It is non-parametric, meaning it does not make any assumptions on the underlying data distribution.

#### How KNN Works
- The algorithm classifies a new data point based on its similarity to the points in the training set.
- It uses a distance metric (usually Euclidean distance) to find the k-nearest neighbors to the new data point.
- The new data point is then classified based on the majority class among its k-nearest neighbors.

#### Steps in KNN
1. **Choose the number of k**: The number of nearest neighbors to consider.
2. **Calculate the distance**: Compute the distance between the new data point and all the training data points.
3. **Find the nearest neighbors**: Identify the k-nearest neighbors to the new data point.
4. **Make predictions**: Classify the new data point based on the majority vote of its k-nearest neighbors.

#### Applications
- Recommender systems (e.g., suggesting movies or products).
- Pattern recognition (e.g., handwriting detection).
- Medical diagnosis.

### Support Vector Machines (SVM)

#### Introduction to Support Vector Machines
Support Vector Machines (SVM) are powerful supervised learning models used for classification and regression tasks. SVMs are particularly well-suited for complex but small-to-medium-sized datasets.

#### How SVM Works
- SVM constructs a hyperplane or set of hyperplanes in a high-dimensional space to separate different classes.
- The goal is to find a hyperplane that maximizes the margin between the classes. This hyperplane is known as the maximum margin hyperplane.
- SVM uses kernel functions to transform the data into a higher-dimensional space where a hyperplane can be used to separate the classes.

#### Steps in SVM
1. **Select a kernel function**: Choose a kernel function (e.g., linear, polynomial, RBF).
2. **Fit the SVM model**: Train the SVM model to find the maximum margin hyperplane.
3. **Make predictions**: Use the trained SVM model to classify new data points.

#### Applications
- Image classification.
- Text classification and spam detection.
- Bioinformatics for gene classification.

### Decision Trees and Random Forests

#### Introduction to Decision Trees
Decision Trees are a type of supervised learning algorithm used for classification and regression tasks. They work by splitting the data into subsets based on the value of input features.

#### How Decision Trees Work
- Decision trees split the data into branches based on feature values, creating a tree-like model of decisions.
- Each node represents a decision point, each branch represents the outcome of the decision, and each leaf node represents a final output (class or value).

#### Steps in Decision Trees
1. **Choose the best feature to split on**: Use criteria like Gini impurity or information gain to select the best feature.
2. **Split the data**: Divide the dataset into subsets based on the best feature.
3. **Repeat recursively**: Repeat the process for each subset until the stopping criterion is met (e.g., maximum depth, minimum samples per leaf).

#### Introduction to Random Forests
Random Forests are an ensemble learning method that combines multiple decision trees to improve the model's accuracy and reduce overfitting.

#### How Random Forests Work
- Random forests create multiple decision trees using different subsets of the data and features.
- Each tree makes a prediction, and the final output is determined by averaging (for regression) or majority voting (for classification) the predictions of all trees.

#### Steps in Random Forests
1. **Bootstrap sampling**: Create multiple subsets of the data by randomly sampling with replacement.
2. **Train decision trees**: Train a decision tree on each subset of data.
3. **Aggregate predictions**: Combine the predictions from all trees to make the final prediction.

#### Applications
- Fraud detection.
- Stock market analysis.
- Customer segmentation in marketing.


In [33]:
# Importing necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet, LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score, classification_report
import ipywidgets as widgets
from IPython.display import display, Markdown
import io

# Global variables to hold data, model, and scalers
data = None
model = None
label_encoder = None
feature_widgets = {}

# Step 1: Data Collection
# Create a file upload widget
upload_button = widgets.FileUpload(description="Upload CSV", accept='.csv')

# Function to load the dataset
def load_dataset(change):
    global data
    uploaded_file = upload_button.value
    if uploaded_file:
        file_content = uploaded_file[list(uploaded_file.keys())[0]]['content']
        data = pd.read_csv(io.BytesIO(file_content))
        display(Markdown("### Dataset Information"))
        display(Markdown(f"**Number of instances:** {data.shape[0]}"))
        display(Markdown(f"**Number of features:** {data.shape[1]}"))
        display(Markdown("### First 5 Rows of the Dataset"))
        display(data.head())
        # Automatically select the last column as the target
        target_input.value = data.columns[-1]
        feature_dropdown.options = data.columns[:-1].tolist()
        columns_to_drop.options = data.columns.tolist()
        columns_to_fill.options = data.columns[data.isnull().any()].tolist()
        categorical_columns.options = data.select_dtypes(include=['object', 'category']).columns.tolist()
        update_missing_values_info()
        update_feature_widgets()

# Attach the load_dataset function to the file upload button
upload_button.observe(load_dataset, names='value')

# Display the upload button
display(Markdown("## Step 1: Data Collection"))
display(upload_button)

# Step 2: Data Preprocessing
# Create a dropdown widget for selecting the feature to visualize and use for modeling
feature_dropdown = widgets.Dropdown(
    options=[],
    description='Feature:',
    disabled=False,
)

# Create a text widget for the target column name (auto-filled after loading data)
target_input = widgets.Text(
    value='',
    description='Target:',
    disabled=False,
)

# Create a dropdown widget for selecting the preprocessing method
preprocess_dropdown = widgets.Dropdown(
    options=['None', 'Standard Scaler', 'Min-Max Scaler', 'Robust Scaler'],
    value='None',
    description='Preprocess:',
    disabled=False,
)

# Create a widget to select columns to drop
columns_to_drop = widgets.SelectMultiple(
    options=[],
    description='Drop Columns:',
    disabled=False,
)

# Create a widget to select columns to fill missing values
columns_to_fill = widgets.SelectMultiple(
    options=[],
    description='Fill Columns:',
    disabled=False,
)

# Create a dropdown for filling method
fill_method_dropdown = widgets.Dropdown(
    options=['Mean', 'Median', 'Mode'],
    value='Mean',
    description='Fill Method:',
    disabled=False,
)

# Create a widget to select categorical columns
categorical_columns = widgets.SelectMultiple(
    options=[],
    description='Categorical Columns:',
    disabled=False,
)

# Create a dropdown for selecting the encoding method
encoding_method_dropdown = widgets.Dropdown(
    options=['Label Encoding', 'One-Hot Encoding'],
    value='Label Encoding',
    description='Encoding Method:',
    disabled=False,
)

# Function to update missing values information
def update_missing_values_info():
    if data is not None:
        missing_info = data.isnull().sum()
        missing_info = missing_info[missing_info > 0]
        if not missing_info.empty:
            display(Markdown("### Missing Values Information"))
            display(missing_info)

# Function to visualize the dataset
def visualize_data(change=None):
    feature = feature_dropdown.value
    target = target_input.value
    if feature and target:
        X = data[feature].values  # Feature
        y = data[target].values  # Target

        plt.figure(figsize=(10, 6))
        plt.scatter(X, y, color='blue', label='Data points')
        plt.title(f'{feature} vs {target}')
        plt.xlabel(feature)
        plt.ylabel(target)
        plt.legend()
        plt.show()
        display(Markdown(f"### Visualization of {feature}"))
        display(Markdown(f"This scatter plot shows the relationship between the selected feature **{feature}** and the target variable **{target}**."))

# Button to visualize data
visualize_data_button = widgets.Button(description="Visualize Data")
visualize_data_button.on_click(visualize_data)

# Display the feature dropdown and target input
display(Markdown("## Step 2: Data Preprocessing"))
display(feature_dropdown)
display(target_input)
display(preprocess_dropdown)
display(columns_to_drop)
display(columns_to_fill)
display(fill_method_dropdown)
display(categorical_columns)
display(encoding_method_dropdown)
display(visualize_data_button)

# Function to drop selected columns
def drop_selected_columns(b):
    global data
    if columns_to_drop.value:
        data.drop(columns=list(columns_to_drop.value), inplace=True)
        display(Markdown("### Dropped Selected Columns"))
        display(data.head())

drop_columns_button = widgets.Button(description="Drop Columns")
drop_columns_button.on_click(drop_selected_columns)
display(drop_columns_button)

# Function to fill missing values
def fill_missing_values(b):
    global data
    if columns_to_fill.value:
        for column in columns_to_fill.value:
            if fill_method_dropdown.value == 'Mean':
                data[column].fillna(data[column].mean(), inplace=True)
            elif fill_method_dropdown.value == 'Median':
                data[column].fillna(data[column].median(), inplace=True)
            elif fill_method_dropdown.value == 'Mode':
                data[column].fillna(data[column].mode()[0], inplace=True)
        display(Markdown("### Filled Missing Values"))
        display(data.head())

fill_missing_values_button = widgets.Button(description="Fill Missing Values")
fill_missing_values_button.on_click(fill_missing_values)
display(fill_missing_values_button)

# Function to handle categorical data
def handle_categorical_data(b):
    global data
    categorical_cols = list(categorical_columns.value)
    if categorical_cols:
        if encoding_method_dropdown.value == 'Label Encoding':
            label_encoder = LabelEncoder()
            for col in categorical_cols:
                data[col] = label_encoder.fit_transform(data[col])
        elif encoding_method_dropdown.value == 'One-Hot Encoding':
            data = pd.get_dummies(data, columns=categorical_cols, drop_first=True)
        display(Markdown("### Handled Categorical Data"))
        display(data.head())

handle_categorical_data_button = widgets.Button(description="Handle Categorical Data")
handle_categorical_data_button.on_click(handle_categorical_data)
display(handle_categorical_data_button)

# Step 3: Choosing the Right Model
# Create a dropdown widget for selecting the model type (Regression or Classification)
model_type_dropdown = widgets.Dropdown(
    options=['Regression', 'Classification'],
    value='Regression',
    description='Model Type:',
    disabled=False,
)

# Create a dropdown widget for selecting the model
model_dropdown = widgets.Dropdown(
    options=[],
    description='Model:',
    disabled=False,
)

# Function to update the model dropdown based on the model type
def update_model_dropdown(change):
    if model_type_dropdown.value == 'Regression':
        model_dropdown.options = ['Linear Regression', 'Ridge', 'Lasso', 'Elastic Net']
    else:
        model_dropdown.options = ['Logistic Regression', 'KNN', 'SVM', 'Decision Tree', 'Random Forest']

model_type_dropdown.observe(update_model_dropdown, names='value')

# Display the model type dropdown and model dropdown
display(Markdown("## Step 3: Choosing the Right Model"))
display(model_type_dropdown)
display(model_dropdown)

# Step 4: Training the Model
# Create a float input widget for entering the alpha value (for regularization models)
alpha_input = widgets.FloatText(
    value=1.0,
    description='Alpha:',
    disabled=True,
)

# Update the visibility of alpha input based on the selected model
def update_alpha_input(change):
    if model_dropdown.value in ['Ridge', 'Lasso', 'Elastic Net']:
        alpha_input.disabled = False
    else:
        alpha_input.disabled = True

model_dropdown.observe(update_alpha_input, names='value')

# Display the alpha input
display(alpha_input)

# Function to perform Regularization (Ridge, Lasso, Elastic Net)
def perform_regularization():
    global X_train, X_test, y_train, y_test, model, y_pred
    feature = feature_dropdown.value
    target = target_input.value
    preprocess = preprocess_dropdown.value
    method = model_dropdown.value
    alpha = alpha_input.value

    X = data.drop(columns=[target]).values
    y = data[target].values

    # Preprocess the data
    if preprocess == 'Standard Scaler':
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
    elif preprocess == 'Min-Max Scaler':
        scaler = MinMaxScaler()
        X = scaler.fit_transform(X)
    elif preprocess == 'Robust Scaler':
        scaler = RobustScaler()
        X = scaler.fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    if method == 'Ridge':
        model = Ridge(alpha=alpha)
    elif method == 'Lasso':
        model = Lasso(alpha=alpha)
    elif method == 'Elastic Net':
        model = ElasticNet(alpha=alpha)
    else:
        model = LinearRegression()

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

# Function to perform Classification
def perform_classification():
    global X_train, X_test, y_train, y_test, model, y_pred
    feature = feature_dropdown.value
    target = target_input.value
    preprocess = preprocess_dropdown.value
    method = model_dropdown.value

    X = data.drop(columns=[target]).values
    y = data[target].values

    # Preprocess the data
    if preprocess == 'Standard Scaler':
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
    elif preprocess == 'Min-Max Scaler':
        scaler = MinMaxScaler()
        X = scaler.fit_transform(X)
    elif preprocess == 'Robust Scaler':
        scaler = RobustScaler()
        X = scaler.fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    if method == 'Logistic Regression':
        model = LogisticRegression()
    elif method == 'KNN':
        model = KNeighborsClassifier()
    elif method == 'SVM':
        model = SVC()
    elif method == 'Decision Tree':
        model = DecisionTreeClassifier()
    elif method == 'Random Forest':
        model = RandomForestClassifier()
    else:
        model = LogisticRegression()

    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

# Button to train the model
def train_model(b):
    if model_type_dropdown.value == 'Regression':
        perform_regularization()
    else:
        perform_classification()

    if model is not None:
        display(Markdown("### Model Training Completed"))
        display(Markdown("### Training Data Shape:"))
        display(Markdown(f"**X_train shape:** {X_train.shape}"))
        display(Markdown(f"**y_train shape:** {y_train.shape}"))
        display(Markdown("### Test Data Shape:"))
        display(Markdown(f"**X_test shape:** {X_test.shape}"))
        display(Markdown(f"**y_test shape:** {y_test.shape}"))

train_model_button = widgets.Button(description="Train Model")
train_model_button.on_click(train_model)

# Display the train model button
display(Markdown("## Step 4: Training the Model"))
display(train_model_button)

# Step 5: Evaluating the Model
# Function to evaluate the model
def evaluate_model(b):
    global X_train, X_test, y_train, y_test, y_pred
    if model is None:
        display(Markdown("### Error: Model is not trained."))
        return

    if model_type_dropdown.value == 'Regression':
        mse = mean_squared_error(y_test, y_pred)
        r2 = r2_score(y_test, y_pred)

        display(Markdown("### Model Evaluation"))
        display(Markdown(f"**Mean Squared Error:** {mse}"))
        display(Markdown(f"**R^2 Score:** {r2}"))

        plt.figure(figsize=(10, 6))
        plt.scatter(X_train[:, 0], y_train, color='blue', label='Training data')
        plt.plot(X_train[:, 0], model.predict(X_train), color='red', label='Regression line')
        plt.title(f'{model_dropdown.value} Regression on Training Data')
        plt.xlabel(feature_dropdown.value)
        plt.ylabel(target_input.value)
        plt.legend()
        plt.show()

        plt.figure(figsize=(10, 6))
        plt.scatter(X_test[:, 0], y_test, color='blue', label='Testing data')
        plt.plot(X_test[:, 0], y_pred, color='red', label='Regression line')
        plt.title(f'{model_dropdown.value} Regression on Test Data')
        plt.xlabel(feature_dropdown.value)
        plt.ylabel(target_input.value)
        plt.legend()
        plt.show()
    else:
        accuracy = accuracy_score(y_test, y_pred)
        display(Markdown("### Model Evaluation"))
        display(Markdown(f"**Accuracy:** {accuracy}"))
        display(Markdown(f"**Classification Report:**\n {classification_report(y_test, y_pred)}"))

evaluate_model_button = widgets.Button(description="Evaluate Model")
evaluate_model_button.on_click(evaluate_model)

# Display the evaluate model button
display(Markdown("## Step 5: Evaluating the Model"))
display(evaluate_model_button)

# Step 6: Hyperparameter Tuning and Optimization
# Function to perform hyperparameter tuning
def perform_hyperparameter_tuning(b):
    feature = feature_dropdown.value
    target = target_input.value
    preprocess = preprocess_dropdown.value
    method = model_dropdown.value

    X = data.drop(columns=[target]).values
    y = data[target].values

    # Preprocess the data
    if preprocess == 'Standard Scaler':
        scaler = StandardScaler()
        X = scaler.fit_transform(X)
    elif preprocess == 'Min-Max Scaler':
        scaler = MinMaxScaler()
        X = scaler.fit_transform(X)
    elif preprocess == 'Robust Scaler':
        scaler = RobustScaler()
        X = scaler.fit_transform(X)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    if model_type_dropdown.value == 'Regression':
        if method == 'Ridge':
            param_grid = {'alpha': np.logspace(-4, 4, 50)}
            grid_search = GridSearchCV(Ridge(), param_grid, cv=5)
        elif method == 'Lasso':
            param_grid = {'alpha': np.logspace(-4, 4, 50)}
            grid_search = GridSearchCV(Lasso(), param_grid, cv=5)
        elif method == 'Elastic Net':
            param_grid = {'alpha': np.logspace(-4, 4, 50), 'l1_ratio': np.linspace(0, 1, 10)}
            grid_search = GridSearchCV(ElasticNet(), param_grid, cv=5)
        else:
            grid_search = None
    else:
        if method == 'Logistic Regression':
            param_grid = {'C': np.logspace(-4, 4, 50)}
            grid_search = GridSearchCV(LogisticRegression(), param_grid, cv=5)
        elif method == 'KNN':
            param_grid = {'n_neighbors': np.arange(1, 31)}
            grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
        elif method == 'SVM':
            param_grid = {'C': np.logspace(-4, 4, 50), 'kernel': ['linear', 'rbf']}
            grid_search = GridSearchCV(SVC(), param_grid, cv=5)
        elif method == 'Decision Tree':
            param_grid = {'max_depth': np.arange(1, 21)}
            grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
        elif method == 'Random Forest':
            param_grid = {'n_estimators': [10, 50, 100, 200], 'max_depth': np.arange(1, 21)}
            grid_search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
        else:
            grid_search = None

    if grid_search:
        grid_search.fit(X_train, y_train)
        best_params = grid_search.best_params_
        best_score = grid_search.best_score_
        display(Markdown("### Hyperparameter Tuning Results"))
        display(Markdown(f"**Best Parameters:** {best_params}"))
        display(Markdown(f"**Best Cross-Validation Score:** {best_score}"))

hyperparameter_tuning_button = widgets.Button(description="Perform Hyperparameter Tuning")
hyperparameter_tuning_button.on_click(perform_hyperparameter_tuning)

# Display the hyperparameter tuning button
display(Markdown("## Step 6: Hyperparameter Tuning and Optimization"))
display(hyperparameter_tuning_button)

# Step 7: Predictions and Deployment
# Function to create input widgets for each feature
def update_feature_widgets():
    global feature_widgets
    feature_widgets = {}
    if data is not None:
        for column in data.columns[:-1]:
            feature_widgets[column] = widgets.FloatText(
                value=0.0,
                description=column,
                disabled=False,
            )
        display(widgets.VBox(list(feature_widgets.values())))

# Function to make predictions with new input values
def make_predictions(b):
    global model, label_encoder
    if model is None:
        display(Markdown("### Error: Model is not trained."))
        return

    input_data = np.array([[widget.value for widget in feature_widgets.values()]])
    prediction = model.predict(input_data)
    if label_encoder:
        prediction = label_encoder.inverse_transform(prediction)
    display(Markdown(f"### Prediction for Input:"))
    for col, widget in feature_widgets.items():
        display(Markdown(f"**{col}:** {widget.value}"))
    display(Markdown(f"**Predicted Value:** {prediction[0]}"))

predict_button = widgets.Button(description="Make Prediction")
predict_button.on_click(make_predictions)

# Display the prediction widgets and button
display(Markdown("## Step 7: Predictions and Deployment"))
display(predict_button)


## Step 1: Data Collection

FileUpload(value={}, accept='.csv', description='Upload CSV')

## Step 2: Data Preprocessing

Dropdown(description='Feature:', options=(), value=None)

Text(value='', description='Target:')

Dropdown(description='Preprocess:', options=('None', 'Standard Scaler', 'Min-Max Scaler', 'Robust Scaler'), va…

SelectMultiple(description='Drop Columns:', options=(), value=())

SelectMultiple(description='Fill Columns:', options=(), value=())

Dropdown(description='Fill Method:', options=('Mean', 'Median', 'Mode'), value='Mean')

SelectMultiple(description='Categorical Columns:', options=(), value=())

Dropdown(description='Encoding Method:', options=('Label Encoding', 'One-Hot Encoding'), value='Label Encoding…

Button(description='Visualize Data', style=ButtonStyle())

Button(description='Drop Columns', style=ButtonStyle())

Button(description='Fill Missing Values', style=ButtonStyle())

Button(description='Handle Categorical Data', style=ButtonStyle())

## Step 3: Choosing the Right Model

Dropdown(description='Model Type:', options=('Regression', 'Classification'), value='Regression')

Dropdown(description='Model:', options=(), value=None)

FloatText(value=1.0, description='Alpha:', disabled=True)

## Step 4: Training the Model

Button(description='Train Model', style=ButtonStyle())

## Step 5: Evaluating the Model

Button(description='Evaluate Model', style=ButtonStyle())

## Step 6: Hyperparameter Tuning and Optimization

Button(description='Perform Hyperparameter Tuning', style=ButtonStyle())

## Step 7: Predictions and Deployment

Button(description='Make Prediction', style=ButtonStyle())

### Dataset Information

**Number of instances:** 7043

**Number of features:** 21

### First 5 Rows of the Dataset

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
1,5575-GNVDE,Male,0,No,No,34,Yes,No,DSL,Yes,...,Yes,No,No,No,One year,No,Mailed check,56.95,1889.5,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes


VBox(children=(FloatText(value=0.0, description='customerID'), FloatText(value=0.0, description='gender'), Flo…

### Handled Categorical Data

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,5375,0,0,1,0,1,0,1,0,0,...,0,0,0,0,0,1,2,29.85,2505,0
1,3962,1,0,0,0,34,1,0,0,2,...,2,0,0,0,1,0,3,56.95,1466,0
2,2564,1,0,0,0,2,1,0,0,2,...,0,0,0,0,0,1,3,53.85,157,1
3,5535,1,0,0,0,45,0,1,0,2,...,2,2,0,0,1,0,0,42.3,1400,0
4,6511,0,0,0,0,2,1,0,1,0,...,0,0,0,0,0,1,2,70.7,925,1


### Model Training Completed

### Training Data Shape:

**X_train shape:** (5634, 20)

**y_train shape:** (5634,)

### Test Data Shape:

**X_test shape:** (1409, 20)

**y_test shape:** (1409,)

### Model Evaluation

**Accuracy:** 0.7210787792760823

**Classification Report:**
               precision    recall  f1-score   support

           0       0.82      0.80      0.81      1041
           1       0.47      0.49      0.48       368

    accuracy                           0.72      1409
   macro avg       0.64      0.65      0.64      1409
weighted avg       0.72      0.72      0.72      1409


### Hyperparameter Tuning Results

**Best Parameters:** {'max_depth': 4}

**Best Cross-Validation Score:** 0.7845218526054333

### Prediction for Input:

**customerID:** 0.0

**gender:** 0.0

**SeniorCitizen:** 0.0

**Partner:** 0.0

**Dependents:** 0.0

**tenure:** 0.0

**PhoneService:** 0.0

**MultipleLines:** 0.0

**InternetService:** 0.0

**OnlineSecurity:** 0.0

**OnlineBackup:** 0.0

**DeviceProtection:** 0.0

**TechSupport:** 0.0

**StreamingTV:** 0.0

**StreamingMovies:** 0.0

**Contract:** 0.0

**PaperlessBilling:** 0.0

**PaymentMethod:** 0.0

**MonthlyCharges:** 0.0

**TotalCharges:** 0.0

**Predicted Value:** 1

### Prediction for Input:

**customerID:** 0.0

**gender:** 4.0

**SeniorCitizen:** 0.0

**Partner:** 0.0

**Dependents:** 0.0

**tenure:** 3.0

**PhoneService:** 0.0

**MultipleLines:** 0.0

**InternetService:** 0.0

**OnlineSecurity:** 4.0

**OnlineBackup:** 0.0

**DeviceProtection:** 0.0

**TechSupport:** 0.0

**StreamingTV:** 0.0

**StreamingMovies:** 0.0

**Contract:** 0.0

**PaperlessBilling:** 0.0

**PaymentMethod:** 0.0

**MonthlyCharges:** 0.0

**TotalCharges:** 0.0

**Predicted Value:** 1