<p style="text-align:center">
    <a href="https://skills.network" target="_blank">
    <img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/assets/logos/SN_web_lightmode.png" width="200" alt="Skills Network Logo">
    </a>
</p>


## Metrics for Classification


Estimated time needed: **30** minutes


<p style='color: red'>The purpose of this lab is to show you how to evaluate a classification model using various metrics.</p>


## __Table of Contents__

<ol>
    <li><a href="#Objectives">Objectives</a></li>
    <li><a href="#Datasets">Datasets</a></li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-Required-Libraries">Installing Required Libraries</a></li>
            <li><a href="#Importing-Required-Libraries">Importing Required Libraries</a></li>
        </ol>
    </li>
    <li>
        <a href="#Examples">Examples</a>
        <ol>
            <li><a href="#Task-1---Load-the-data-in-a-csv-file-into-a-dataframe">Task 1 - Load the data in a csv file into a dataframe</a></li>
            <li><a href="#Task-2---Identify-the-target-column-and-the-data-columns">Task 2 - Identify the target column and the data columns</a></li>
            <li><a href="#Task-3---Split-the-data-set">Task 3 - Split the data set</a></li>
            <li><a href="#Task-4---Build-and-train-a-classifier">Task 4 - Build and train a classifier</a></li>
            <li><a href="#Task-5---Evaluate-the-model">Task 5 - Evaluate the model</a></li>
        </ol>
    </li>

    <li>
        <a href="#Exercises">Exercises</a>
        <ol>
            <li><a href="#Exercise-1---Load-a-dataset">Exercise 1 - Load a dataset</a></li>
            <li><a href="#Exercise-2---Identify-the-target-column-and-the-data-columns">Exercise 2 - Identify the target column and the data columns</a></li>
            <li><a href="#Exercise-3---Split-the-data">Exercise 3 - Split the data</a></li>
            <li><a href="#Exercise-4---Build-and-Train-a-new-classifier">Exercise 4 - Build and Train a new classifier</a></li>
            <li><a href="#Exercise-5---Evaluate-the-model">Exercise 5 - Evaluate the model</a></li>
        </ol>
    </li>
</ol>




## Objectives

After completing this lab, you will be able to:

 - Use Pandas to load data sets.
 - Identify the target and features.
 - Use Logistic Regression to build a classifier.
 - Use metrics to evaluate the model.
 - Make predictions using a trained model.


## Datasets

In this lab you will be using the following dataset:

 - Pima Indians Diabetes Database. Available at https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database


## Setup


For this lab, we will be using the following libraries:

*   [`pandas`](https://pandas.pydata.org/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for managing the data.
*   [`sklearn`](https://scikit-learn.org/stable/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMML0187ENSkillsNetwork31430127-2021-01-01) for machine learning and machine-learning-pipeline related functions.


### Installing Required Libraries

The following required libraries are pre-installed in the Skills Network Labs environment. However, if you run this notebook's commands in a different Jupyter environment (e.g. Watson Studio or Anaconda), you will need to install these libraries by removing the `#` sign before `!pip` in the code cell below.


In [ ]:
# All libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented out.
# !pip install pandas==1.3.4
# !pip install scikit-learn==0.20.1




### Importing Required Libraries

_We recommend you import all required libraries in one place (here):_


In [ ]:
# You can also use this section to suppress warnings generated by your code:
def warn(*args, **kwargs):
    pass
import warnings
warnings.warn = warn
warnings.filterwarnings('ignore')

import pandas as pd
from sklearn.linear_model import LogisticRegression

# function for the train/test split

from sklearn.model_selection import train_test_split


# functions for metrics

from sklearn.metrics import confusion_matrix
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score

## Task 1 - Load the data in a csv file into a dataframe


In [ ]:
# the data set is available at the url below.
URL = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-BD0231EN-SkillsNetwork/datasets/diabetes.csv"

# using the read_csv function in the pandas library, we load the data into a dataframe.

df = pd.read_csv(URL)

Let's look at some sample rows from the dataset we loaded:


In [ ]:
# show 5 random rows from the dataset
df.sample(5)

Let's find out the number of rows and columns in the dataset:


In [ ]:
df.shape

Let's count the values in the Outcome column and plot them:


In [ ]:
df.Outcome.value_counts()

In [ ]:
df.Outcome.value_counts().plot.bar()

There are 500 people without diabetes and 268 people with diabetes in this dataset.
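
The two classes are not equally represented, which matters when reading accuracy later on. As a rough sketch that uses only the two counts stated above (the variable names are illustrative), the class share and the accuracy of a trivial "always predict the majority class" baseline can be worked out as follows:

```python
# Class counts reported above for the Outcome column
no_diabetes = 500
diabetes = 268
total = no_diabetes + diabetes

# Share of the positive (diabetes) class in the dataset
positive_share = diabetes / total

# Accuracy obtained by always predicting the larger class
majority_baseline_accuracy = max(no_diabetes, diabetes) / total

print(f"Share of positive class: {positive_share:.3f}")
print(f"Majority-class baseline accuracy: {majority_baseline_accuracy:.3f}")
```

A classifier is only useful if it does better than this baseline, which is one reason to look at precision and recall in addition to accuracy.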


## Task 2 - Identify the target column and the data columns


First, we identify the target. The target is the value that our machine learning model needs to classify.


In [ ]:
y = df["Outcome"]

Next, we identify the features. Features are the input values our machine learning model learns from.


In [ ]:
X = df[['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
       'BMI', 'DiabetesPedigreeFunction', 'Age']]
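
When every column except the target is a feature, listing the columns by hand is optional. The sketch below shows one possible alternative using `DataFrame.drop`; the tiny `toy` frame and its values are made up for illustration and are not the lab's dataset.

```python
import pandas as pd

# Tiny made-up frame (hypothetical values) with a target column named Outcome
toy = pd.DataFrame({
    "Glucose": [148, 85, 183],
    "BMI": [33.6, 26.6, 23.3],
    "Outcome": [1, 0, 1],
})

# Target: the column to predict
y_toy = toy["Outcome"]

# Features: every remaining column
X_toy = toy.drop(columns=["Outcome"])

print(list(X_toy.columns))
```

Listing the columns explicitly, as the lab does, is still the clearer choice when only a subset of columns should be used.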

## Task 3 - Split the data set


We split the data set in a 70:30 ratio: 70% for training and 30% for testing.


In [ ]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
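
With imbalanced classes, a purely random split may leave the training and testing sets with slightly different class ratios. The lab's split does not do this, but as an optional variation, `train_test_split` accepts a `stratify` argument that keeps the ratio roughly equal in both parts. The sketch below uses synthetic stand-in data (`X_demo`, `y_demo` are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (hypothetical; not the lab's dataset)
X_demo = np.arange(200).reshape(-1, 1)
y_demo = np.array([0] * 130 + [1] * 70)

# stratify=y_demo asks for the same class ratio in both parts of the split
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=40, stratify=y_demo
)

print(len(Xd_train), len(Xd_test))
print(y_demo.mean(), yd_train.mean(), yd_test.mean())
```

The three printed means are the share of positive labels overall, in the training part, and in the testing part.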

## Task 4 - Build and train a classifier


Create a Logistic Regression model:


In [ ]:
classifier = LogisticRegression()

Train (fit) the model on the training data:


In [ ]:
classifier.fit(X_train,y_train)

## Task 5 - Evaluate the model


Your model is now trained. Time to evaluate the model.


In [ ]:
# score() returns the mean accuracy on the given test data; the higher the score, the better the model.
classifier.score(X_test,y_test)
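
For a scikit-learn classifier, `score` is the accuracy: the fraction of samples whose predicted label matches the true label. The sketch below checks this on synthetic stand-in data (generated with `make_classification`, an assumption for illustration) by comparing `score` with `accuracy_score`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in data (hypothetical; not the lab's dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)

demo_clf = LogisticRegression().fit(X_demo, y_demo)

# Accuracy computed two ways: via score() and via accuracy_score()
score_value = demo_clf.score(X_demo, y_demo)
accuracy_value = accuracy_score(y_demo, demo_clf.predict(X_demo))

print(score_value, accuracy_value)
```

Both printed values should be identical, since `score` evaluates the model's own predictions against the given labels.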

To compute the detailed metrics we need two sets of values: the original (actual) outcomes and the predicted outcomes.


In [ ]:
original_values = y_test
predicted_values = classifier.predict(X_test)

##### Precision


In [ ]:
precision_score(original_values, predicted_values) # fraction of predicted positives that are actually positive; higher is better

##### Recall


In [ ]:
recall_score(original_values, predicted_values) # fraction of actual positives that the model found; higher is better

##### F1 Score


In [ ]:
f1_score(original_values, predicted_values) # harmonic mean of precision and recall; higher is better

##### Confusion Matrix


In [ ]:
confusion_matrix(original_values, predicted_values) # rows are actual classes, columns are predicted classes; can be used to manually calculate various metrics
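
To make the link between the confusion matrix and the other metrics concrete, the sketch below derives precision, recall, and F1 by hand from the four cells of the matrix. The labels `y_true` and `y_pred` are small hand-made examples (hypothetical, not the lab's predictions).

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Small hand-made labels for illustration (hypothetical, not taken from the lab's data)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]

# For binary labels 0/1, scikit-learn lays the matrix out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(tn, fp, fn, tp)
print(precision, recall, f1)
```

For these labels the counts are TN=4, FP=1, FN=2, TP=3, so precision is 3/4 and recall is 3/5. The hand-computed values agree with `precision_score`, `recall_score`, and `f1_score` on the same labels.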

## Exercises


In [ ]:
URL2 = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-BD0231EN-SkillsNetwork/datasets/diabetes.csv"


### Exercise 1 - Load a dataset


Load the diabetes dataset available at URL2.


In [ ]:
df2 = # TODO

<details>
    <summary>Click here for a Hint</summary>
    
Use the `read_csv` function.

</details>


<details>
    <summary>Click here for Solution</summary>

```python
df2 = pd.read_csv(URL2)
```

</details>


### Exercise 2 - Identify the target column and the data columns


 - Use the Outcome column as the target.
 - Use the columns 'Pregnancies', 'Glucose', 'Insulin', 'DiabetesPedigreeFunction', 'Age' as features.


In [ ]:
y = # TODO
X = # TODO

<details>
    <summary>Click here for a Hint</summary>
    
Refer to Task 2.
</details>


<details>
    <summary>Click here for Solution</summary>

```python
y = df2["Outcome"]
X = df2[['Pregnancies', 'Glucose', 'Insulin', 'DiabetesPedigreeFunction', 'Age']]
```

</details>


### Exercise 3 - Split the data


Split the dataset into training and testing sets. Use 33% of the data as the testing set and 40 as the random state.


In [ ]:
X_train, X_test, y_train, y_test =#TODO

<details>
    <summary>Click here for a Hint</summary>
    
Use the `train_test_split` function.
</details>


<details>
    <summary>Click here for Solution</summary>

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=40)
```

</details>


### Exercise 4 - Build and Train a new classifier


Create a new classifier and train it using the training data.


In [ ]:
classifier2 = #TODO
classifier2.fit #TODO

<details>
    <summary>Click here for a Hint</summary>
    
Fit the classifier using the training features and target.
</details>


<details>
    <summary>Click here for Solution</summary>

```python
classifier2 = LogisticRegression()
classifier2.fit(X_train, y_train)
```

</details>


### Exercise 5 - Evaluate the model


In [ ]:
original_values = y_test
predicted_values = classifier2.predict(X_test)

Print the metrics:
- Precision Score
- Recall Score
- F1 Score


In [ ]:
#your code goes here

<details>
    <summary>Click here for a Hint</summary>
    
Use the metrics functions imported in the Setup section.
</details>


<details>
    <summary>Click here for Solution</summary>

```python
print(precision_score(original_values, predicted_values))
print(recall_score(original_values, predicted_values))
print(f1_score(original_values, predicted_values))
```

</details>


Congratulations! You have completed this lab.


## Authors


[Ramesh Sannareddy](https://www.linkedin.com/in/rsannareddy/)




Copyright © 2023 IBM Corporation. All rights reserved.


<!-- ## Change Log
-->


<!--|Date (YYYY-MM-DD)|Version|Changed By|Change Description|
|-|-|-|-|
|2023-04-15|0.1|Ramesh Sannareddy|Initial Version Created|
-->
