# Advanced Certification in AIML
## A Program by IIIT-H and TalentSprint


## Learning Objective

At the end of the experiment, you will be able to:

* Train and evaluate a bagging classifier

### Dataset Description

The penguins dataset contains size measurements for three penguin species (Chinstrap, Adelie, and Gentoo) observed on three islands (Dream, Torgersen, and Biscoe).

The dataset consists of the following 7 columns:

- **species:** penguin species (Chinstrap, Adelie, or Gentoo)
- **culmen_length_mm & culmen_depth_mm:** culmen length and depth (the culmen is the upper ridge of a bird's beak)
- **flipper_length_mm:** flipper length
- **body_mass_g:** body mass
- **island:** island name (Dream, Torgersen, or Biscoe)
- **sex:** penguin sex

## Setup Steps

In [None]:
#@title Please enter your registration id to start: { run: "auto", display-mode: "form" }
Id = "" #@param {type:"string"}


In [None]:
#@title Please enter your password (normally your phone number) to continue: { run: "auto", display-mode: "form" }
password = "" #@param {type:"string"}


In [None]:
#@title Run this cell to complete the setup for this Notebook
from IPython import get_ipython

ipython = get_ipython()

notebook= "U1W4_15_Bagging_Classifier_Penguins" #name of the notebook
Answer = "Ungraded"
def setup():
#  ipython.magic("sx pip3 install torch")
    from IPython.display import HTML, display
    ipython.magic("sx wget -qq https://cdn.iiith.talentsprint.com/aiml/Experiment_related_data/penguins.zip")
    ipython.magic("sx unzip -qq penguins.zip")
    display(HTML('<script src="https://dashboard.talentsprint.com/aiml/record_ip.html?traineeId={0}&recordId={1}"></script>'.format(getId(),submission_id)))
    print("Setup completed successfully")
    return

def submit_notebook():

    ipython.magic("notebook -e "+ notebook + ".ipynb")

    import requests, json, base64, datetime

    url = "https://dashboard.talentsprint.com/xp/app/save_notebook_attempts"
    if not submission_id:
      data = {"id" : getId(), "notebook" : notebook, "mobile" : getPassword()}
      r = requests.post(url, data = data)
      r = json.loads(r.text)

      if r["status"] == "Success":
          return r["record_id"]
      elif "err" in r:
        print(r["err"])
        return None
      else:
        print ("Something is wrong, the notebook will not be submitted for grading")
        return None

    elif getComplexity() and getAdditional() and getConcepts() and getComments():
      f = open(notebook + ".ipynb", "rb")
      file_hash = base64.b64encode(f.read())

      data = {"complexity" : Complexity, "additional" :Additional,
              "concepts" : Concepts, "record_id" : submission_id,
              "id" : Id, "file_hash" : file_hash,
              "feedback_experiments_input" : Comments, "notebook" : notebook}

      r = requests.post(url, data = data)
      r = json.loads(r.text)
      if "err" in r:
        print(r["err"])
        return None
      else:
        print("Your submission is successful.")
        print("Ref Id:", submission_id)
        print("Date of submission: ", r["date"])
        print("Time of submission: ", r["time"])
        print("View your submissions: https://learn-iiith.talentsprint.com/notebook_submissions")
        # print("For any queries/discrepancies, please connect with mentors through the chat icon in LMS dashboard.")
      return submission_id
    else: return submission_id


def getAdditional():
  try:
    if not Additional:
      raise NameError
    else:
      return Additional
  except NameError:
    print ("Please answer Additional Question")
    return None
def getComments():
  try:
    if not Comments:
      raise NameError
    else:
      return Comments
  except NameError:
    print ("Please answer Comments Question")
    return None

def getComplexity():
  try:
    if not Complexity:
      raise NameError
    else:
      return Complexity
  except NameError:
    print ("Please answer Complexity Question")
    return None

def getConcepts():
  try:
    if not Concepts:
      raise NameError
    else:
      return Concepts
  except NameError:
    print ("Please answer Concepts Question")
    return None

def getId():
  try:
    return Id if Id else None
  except NameError:
    return None

def getPassword():
  try:
    return password if password else None
  except NameError:
    return None

submission_id = None
### Setup
if getPassword() and getId():
  submission_id = submit_notebook()
  if submission_id:
    setup()

else:
  print ("Please complete Id and Password cells before running setup")


## Import required packages

In [None]:
import numpy as np
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC

#### Load the `penguins_size.csv` data and print the first five records

In [None]:
df = pd.read_csv('penguins_size.csv')
df.head()

### Data Pre-Processing

#### Data Cleaning

* Count the NaN values in each column of the dataframe
* Drop the records where the sex column has NaN values
    * Print the unique values from the sex column after dropping NaN values
* Drop the records where the sex column has '.' values
    * Print the unique values after removing records with '.'

In [None]:
# Count NaN values in each column of the dataframe
df.isna().sum()

In [None]:
# Drop the records where the sex column has NaN values
df = df.dropna(subset=['sex'])

# Print the unique values from the sex column
print("Unique values after dropping NaN values : ", df.sex.unique())

# Drop the records where the sex column has '.' values
# (.copy() returns an independent DataFrame, which avoids SettingWithCopyWarning in later assignments)
df = df[df.sex != '.'].copy()

print("Unique values after removing records with '.' : ", df.sex.unique())
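
The two cleaning steps handle different kinds of bad values: NaN is a true missing value, while '.' is an ordinary string that `isna()` does not count. The following minimal sketch shows the difference on a small hypothetical frame (`toy` and its rows are invented for illustration and are not part of the penguins data):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame that mimics the two kinds of bad values in the sex column
toy = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "sex": ["MALE", np.nan, ".", "FEMALE", "MALE"],
})

# Count missing values per column; '.' is a string, so it is not counted as NaN
nan_counts = toy.isna().sum()

# Step 1: drop rows whose sex is NaN
toy_clean = toy.dropna(subset=["sex"])

# Step 2: drop rows whose sex is the placeholder '.'
toy_clean = toy_clean[toy_clean["sex"] != "."].copy()

print("NaN values in sex:", nan_counts["sex"])
print("Remaining values:", sorted(toy_clean["sex"].unique()))
```

Only one row is reported as NaN even though two rows are unusable, which is why the placeholder needs its own filter.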

#### Convert categorical values to numerical
**Hint:** Use the `fit_transform` method of scikit-learn's [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html)

In [None]:
LE = preprocessing.LabelEncoder()
df['island'] = LE.fit_transform(df['island'])
df['sex'] = LE.fit_transform(df['sex'])
df['species'] = LE.fit_transform(df['species'])
df.head()

In [None]:
df['sex'].unique()

In [None]:
df['species'].unique()

In [None]:
df['island'].unique()
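
Because the encoding cell refits the same `LE` object on each column in turn, after the last call it only remembers the classes of `species`. If the integer codes need to be mapped back to names later, one option is to keep a separate encoder per column. The sketch below illustrates this on a hypothetical toy column (`species`, `species_encoder`, and the sample values are assumptions made for this example):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column; the notebook itself encodes df['species'], df['island'] and df['sex']
species = pd.Series(["Gentoo", "Adelie", "Chinstrap", "Adelie"])

# A dedicated encoder for this column, so its classes are not overwritten by later fits
species_encoder = LabelEncoder()
codes = species_encoder.fit_transform(species)

# classes_ holds the original labels in sorted order; the position of each label is its code
mapping = {label: int(code) for code, label in enumerate(species_encoder.classes_)}
print(mapping)

# inverse_transform turns integer codes back into the original labels
decoded = species_encoder.inverse_transform(codes)
print(list(decoded))
```

The codes follow the sorted order of the labels, so they carry no meaning beyond identifying the category.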

#### Use **species** as the target label and the remaining columns as features

* Print the shape of the features and labels

In [None]:
X = df.drop(['species'], axis=1)
y = df['species']

# Print the shape of the features and labels
X.shape, y.shape

#### Split the data into training and test sets

In [None]:
# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
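
The split above is purely random. When classes are unevenly represented, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both splits. The following is a minimal sketch on hypothetical toy labels (`X_toy`, `y_toy`, and the class counts are invented for illustration); whether stratification helps on the penguins data is not something this notebook evaluates.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy labels: 60 of class 0, 30 of class 1, 10 of class 2
y_toy = np.array([0] * 60 + [1] * 30 + [2] * 10)
X_toy = np.arange(len(y_toy)).reshape(-1, 1)

# stratify=y_toy asks train_test_split to preserve class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

# Per-class counts in the training and test splits
print("Train counts:", np.bincount(y_tr))
print("Test counts: ", np.bincount(y_te))
```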

#### Train a Bagging classifier on the prepared data

In [None]:
# Instantiate classifiers
knn = KNN(3)
dt = DecisionTreeClassifier(max_depth=2)
svm = SVC()

classifiers = [('KNN', knn), ('Decision_Tree', dt), ('SVM', svm)]

# Use each classifier in turn as the base estimator
# (the parameter is named `estimator` in scikit-learn >= 1.2; older versions call it `base_estimator`)
for clf_name, clf in classifiers:
    # Instantiate bagging classifier
    model = BaggingClassifier(estimator = clf, bootstrap=True)

    # Fit model on training dataset
    model.fit(X_train, y_train)

    # Prediction on test dataset
    y_pred = model.predict(X_test)

    # Evaluate the accuracy of the bagged model on the test set
    print('{:s} : {:.3f}'.format(clf_name, accuracy_score(y_test, y_pred)))
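
To make the idea behind `BaggingClassifier` concrete, the sketch below performs bagging by hand: each base model is trained on a bootstrap sample (rows drawn with replacement), and the final prediction is a majority vote. This is a simplified illustration, not scikit-learn's implementation: it uses synthetic data from `make_classification` in place of the penguin features, and a plain vote, whereas `BaggingClassifier` averages predicted probabilities when the base estimator provides them. All variable names here are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class data stands in for the penguin features (an assumption for this sketch)
X_syn, y_syn = make_classification(n_samples=300, n_features=6, n_informative=4,
                                   n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
n_estimators = 10
all_preds = []

for _ in range(n_estimators):
    # Bootstrap: draw len(X_tr) row indices with replacement
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    tree = DecisionTreeClassifier(max_depth=2, random_state=0)
    tree.fit(X_tr[idx], y_tr[idx])
    all_preds.append(tree.predict(X_te))

# Stack predictions: one row per base model, one column per test sample
all_preds = np.array(all_preds)

# Count the votes for each class per test sample, then pick the most voted class
votes = np.apply_along_axis(np.bincount, 0, all_preds, minlength=3)
bagged_pred = votes.argmax(axis=0)

print("Manual bagging accuracy: {:.3f}".format((bagged_pred == y_te).mean()))
```

Because each tree sees a slightly different sample, their individual errors differ, and the vote tends to smooth them out.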

## Please answer the questions below to complete the experiment:

In [None]:
#@title How was the experiment? { run: "auto", form-width: "500px", display-mode: "form" }
Complexity = "" #@param ["","Too Simple, I am wasting time", "Good, But Not Challenging for me", "Good and Challenging for me", "Was Tough, but I did it", "Too Difficult for me"]


In [None]:
#@title If it was very easy, what more would you have liked to have been added? If it was very difficult, what would you have liked to have been removed? { run: "auto", display-mode: "form" }
Additional = "" #@param {type:"string"}

In [None]:
#@title Can you identify the concepts from the lecture which this experiment covered? { run: "auto", vertical-output: true, display-mode: "form" }
Concepts = "" #@param ["","Yes", "No"]

In [None]:
#@title Text and image description/explanation and code comments within the experiment: { run: "auto", vertical-output: true, display-mode: "form" }
Comments = "" #@param ["","Very Useful", "Somewhat Useful", "Not Useful", "Didn't use"]


In [None]:
#@title Run this cell to submit your notebook  { vertical-output: true }
try:
  if submission_id:
      return_id = submit_notebook()
      if return_id : submission_id =return_id
  else:
      print("Please complete the setup first.")
except NameError:
  print ("Please complete the setup first.")