# Foundations of Artificial Intelligence and Machine Learning
## A Program by IIIT-H and TalentSprint

The objective of this experiment is to understand decision trees.

#### Decision Tree

As the name suggests, a decision tree is a tree-shaped model that supports decision-making. It can be
used for both classification and regression, and it is one of the most basic and important predictive learning algorithms.

    1. It is intuitive: unlike many other models, it reaches a prediction by making one decision at a time.
    2. It is non-parametric: it does not assume a fixed functional form for the data, and prediction is fast and efficient.

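A decision tree consists of nodes linked by parent-child relationships: each internal node tests a feature, each branch corresponds to one value of that feature, and each leaf holds a prediction.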

The core algorithm for building decision trees is ID3, by J. R. Quinlan. It employs a top-down, greedy search through the space of possible branches, with no backtracking, and uses entropy and information gain to choose each split.

For comparison: the ZeroR model uses no predictor, the OneR model tries to find the single best predictor, and naive Bayes includes all predictors using Bayes' rule under the assumption that the predictors are independent. A decision tree also includes all predictors, but allows for dependence between them.
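To make entropy and information gain concrete, here is a minimal sketch that computes both on a tiny hand-made table. The helper names `entropy` and `information_gain` and the toy rows are assumptions for illustration only; they are not taken from `utils.ID3` or from `Zoo_New.csv`.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Entropy of `target` minus the weighted entropy after splitting on `feature`."""
    parent = entropy([row[target] for row in rows])
    # Group the target labels by the value each row has for `feature`
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return parent - remainder


# Hypothetical toy data (not from the zoo dataset)
rows = [
    {"feathers": 1, "milk": 0, "class": "bird"},
    {"feathers": 1, "milk": 0, "class": "bird"},
    {"feathers": 0, "milk": 1, "class": "mammal"},
    {"feathers": 0, "milk": 0, "class": "fish"},
]

for feature in ("feathers", "milk"):
    print(feature, information_gain(rows, feature, "class"))
```

A greedy, ID3-style learner would split on the feature with the largest gain, then repeat the same calculation inside each branch. On this toy table, `feathers` has the higher gain, because one of its branches contains only birds.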

#### Importing Required Packages

In [None]:
import pandas as pd
import numpy as np
from utils import ID3

#### Loading Dataset

In [None]:
# Load the data, naming every column explicitly
dataset = pd.read_csv('Zoo_New.csv',
                      names=['animal_name', 'hair', 'feathers', 'eggs', 'milk',
                             'airbone', 'aquatic', 'predator', 'toothed', 'backbone',
                             'breathes', 'venomous', 'fins', 'legs', 'tail',
                             'domestic', 'catsize', 'class'])
# Drop the animal names, since they are not a good feature to split the data on
dataset = dataset.drop('animal_name', axis=1)

In [None]:
dataset.head()

#### Predict function to find the class label

In [None]:
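def predict(query, tree, default=1):
    # 1. Look for the query feature that is tested at this node of the tree
    for key in query:
        if key in tree:
            # 2. Follow the branch for the query's value of that feature; if the
            #    value was never seen during training, return the default class
            try:
                result = tree[key][query[key]]
            except KeyError:
                return default
            # 3. A dict is a subtree, so recurse into it (keeping the same
            #    default); anything else is a leaf holding the class label
            if isinstance(result, dict):
                return predict(query, result, default)
            return result
    # 4. No feature of the query is tested at this node: use the default class
    return default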

In [None]:
dataset.shape

#### Splitting the dataset into train and test sets

In [None]:
def train_test_split(dataset):
    # Use the first 80 rows for training and the rest for testing.
    # Reset the index of each part so that its row labels start from 0,
    # which avoids errors when rows are later addressed by label.
    training_data = dataset.iloc[:80].reset_index(drop=True)
    testing_data = dataset.iloc[80:].reset_index(drop=True)
    return training_data, testing_data

# Call the function once and unpack both parts
training_data, testing_data = train_test_split(dataset)


#### Function to Predict the class of test data

In [None]:
def test(data, tree):
    # Create the query instances by removing the target column (the last one)
    # and converting each remaining row to a dictionary
    queries = data.iloc[:, :-1].to_dict(orient="records")
    print(len(queries))
    # Create an empty DataFrame whose column stores the predictions of the tree
    predicted = pd.DataFrame(columns=["predicted"])

    # Predict the class of every query
    for i in range(len(data)):
        predicted.loc[i, "predicted"] = predict(queries[i], tree, 1.0)
    # Calculate the prediction accuracy
    print('The prediction accuracy is: ', (np.sum(predicted["predicted"] == data["class"]) / len(data)) * 100, '%')

In [None]:
tree = ID3(training_data,training_data,training_data.columns[:-1])
print(tree)
test(testing_data,tree)

#### Exercise 1

Try to understand the tree, which is represented as a Python dictionary.
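One way to read such a dictionary is to flatten it into if-then rules. The `predict` function looks up `tree[feature][value]`, so each level maps a feature name to a dictionary of its values, and each value maps to either a subtree or a class label. The sketch below relies only on that structure; `tree_to_rules` and `toy_tree` are hypothetical names, and the toy tree is made up rather than produced by ID3.

```python
def tree_to_rules(tree, path=()):
    """Flatten a nested-dict decision tree into a list of readable rules."""
    rules = []
    for feature, branches in tree.items():
        for value, subtree in branches.items():
            step = path + (f"{feature} == {value}",)
            if isinstance(subtree, dict):
                # Inner node: keep descending along this branch
                rules.extend(tree_to_rules(subtree, step))
            else:
                # Leaf: the accumulated tests lead to one class label
                rules.append(" and ".join(step) + f" -> class {subtree}")
    return rules


# Hypothetical hand-written tree, for illustration only
toy_tree = {"legs": {0: {"fins": {0: 3, 1: 4}}, 2: 2, 4: 1}}

for rule in tree_to_rules(toy_tree):
    print(rule)
```

Calling `tree_to_rules(tree)` on the dictionary printed in the cell above should list one rule per leaf, which is usually easier to follow than the raw nested output.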

#### Exercise 2

Change the train/test split ratio and observe the change in accuracy.
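A possible starting point is a split function that takes the ratio as a parameter instead of hard-coding row 80. This is a sketch under assumptions: `split_by_fraction`, `train_fraction`, and `seed` are hypothetical names, the optional shuffle is an addition that the original split does not perform, and the toy DataFrame only stands in for the zoo data.

```python
import pandas as pd


def split_by_fraction(dataset, train_fraction=0.8, seed=None):
    """Split a DataFrame into train and test parts by a given fraction."""
    if seed is not None:
        # Optional: shuffle the rows reproducibly before splitting
        dataset = dataset.sample(frac=1, random_state=seed)
    n_train = round(len(dataset) * train_fraction)
    training_data = dataset.iloc[:n_train].reset_index(drop=True)
    testing_data = dataset.iloc[n_train:].reset_index(drop=True)
    return training_data, testing_data


# Hypothetical toy data standing in for the zoo dataset
toy = pd.DataFrame({"legs": range(10), "class": [1, 2] * 5})

for fraction in (0.5, 0.7, 0.9):
    train_part, test_part = split_by_fraction(toy, fraction)
    print(fraction, len(train_part), len(test_part))
```

With the real data, each pair returned by `split_by_fraction(dataset, fraction)` could be passed to `ID3` and `test` in the same way as `training_data` and `testing_data` above, so that the accuracies for different ratios can be compared.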