
This project consists of creating a Decision Tree Classifier using ID3 (entropy-based). It was developed in the second semester of the second year of the Bachelor's Degree in Artificial Intelligence and Data Science.


Maguids/Decision-Tree-Classifier


Decision Tree Classifier

This project was developed for the "Artificial Intelligence" course and aims to create a Decision Tree Classifier using ID3 (entropy-based). The course belongs to the second semester of the second year of the Bachelor's Degree in Artificial Intelligence and Data Science.


Programming Language:


Python


The Decision Tree

To create this decision tree, we had to take a few requirements into account:

  • The input must be in CSV (Comma-Separated Values) format; each column is an attribute, and the last column is the target variable for classification;
  • The program must be able to read any dataset and learn the appropriate decision tree;
  • The program must also accept a file of test examples as input: after generating the tree, it must apply the tree to new examples and classify them appropriately;
  • Using scikit-learn or other libraries to automatically define and train the decision trees is not allowed;
  • The decision tree must be printed in the following format:

  • Attribute: the root of each subtree;
  • Value: one of the values of the attribute (one of the branches of the tree);
  • Class: the class assigned to that branch of the tree (it corresponds to a leaf);
  • Counter: the number of examples that reach that branch of the tree.
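The requirements above can be illustrated with a minimal ID3 sketch in plain Python. It is not the code from the project notebook: the function names (`entropy`, `information_gain`, `id3`, `render`, `classify`), the toy data, and the exact indentation of the printed tree are assumptions made for illustration. Only the Attribute / Value / Class / Counter elements come from the description above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on column index `attr`."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a leaf (class, counter) or a node {"attr": index, "branches": {...}}."""
    majority, _ = Counter(labels).most_common(1)[0]
    if len(set(labels)) == 1 or not attrs:
        return (majority, len(labels))
    # ID3 picks the attribute with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    branches = {}
    for value in sorted(set(row[best] for row in rows)):
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in keep], [labels[i] for i in keep],
                              [a for a in attrs if a != best])
    return {"attr": best, "branches": branches}


def render(tree, names, indent=0):
    """Render the tree as '<Attribute>' and 'Value: Class (Counter)' lines (assumed layout)."""
    pad = " " * indent
    if isinstance(tree, tuple):
        return [f"{pad}{tree[0]} ({tree[1]})"]
    lines = [f"{pad}<{names[tree['attr']]}>"]
    for value, child in tree["branches"].items():
        if isinstance(child, tuple):
            lines.append(f"{pad}  {value}: {child[0]} ({child[1]})")
        else:
            lines.append(f"{pad}  {value}:")
            lines.extend(render(child, names, indent + 4))
    return lines


def classify(tree, row, default=None):
    """Follow the branches matching `row`; return `default` for an unseen value."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], default)
    return tree[0] if isinstance(tree, tuple) else tree


# Hypothetical toy data: two attributes, the class kept in a separate list.
names = ["Outlook", "Windy"]
rows = [["sunny", "no"], ["sunny", "yes"], ["rainy", "no"],
        ["rainy", "yes"], ["overcast", "yes"]]
labels = ["yes", "yes", "yes", "no", "yes"]

tree = id3(rows, labels, [0, 1])
print("\n".join(render(tree, names)))
print(classify(tree, ["rainy", "yes"]))
```

Leaves are stored as `(class, counter)` pairs so that the counter can be printed next to the class without walking the training data a second time.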


The Datasets and Their Problems:

In this task, we were asked to test the decision tree implementation on 4 different datasets, each with specific characteristics that we had to take into account to improve our decision tree:

  • restaurant: A simple dataset with information about customers and restaurants (type of food, waiting time, price, etc.). It has only two classes ("Yes", "No"), and the data type is "Object" (string).
  • weather: This dataset contains information about weather conditions for playing tennis. It contains continuous values and also shows some imbalance between classes.
  • iris: This dataset contains numerical information about plants from three classes: iris setosa, iris virginica and iris versicolor, so we are faced with a non-binary classification (more than two classes).
  • connect4: This dataset contains board configurations of the Connect Four game and classifies each entry as "win", "loss" or "draw". After analyzing the dataset, we concluded that it required no further changes to our decision tree.
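As a rough illustration of how the characteristics listed above could be detected, the following sketch loads a CSV whose last column is the class and reports numeric columns, class counts, and the number of classes. The helper names (`load_csv`, `is_number`, `profile`), the assumption that the file has a header row, and the sample rows are all hypothetical; they are not taken from the project notebook or its datasets.

```python
import csv
import io
from collections import Counter


def load_csv(handle):
    """Read a CSV with a header row (assumed); the last column is the class."""
    reader = csv.reader(handle)
    header = next(reader)
    data = [row for row in reader if row]  # skip blank lines
    return header, [r[:-1] for r in data], [r[-1] for r in data]


def is_number(text):
    """True when the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def profile(header, rows, labels):
    """Summarise numeric attributes, class balance, and class count for one dataset."""
    continuous = [header[i] for i in range(len(header) - 1)
                  if all(is_number(r[i]) for r in rows)]
    return {"continuous": continuous,
            "class_counts": dict(Counter(labels)),
            "n_classes": len(set(labels))}


# Hypothetical sample; a real file would be opened with open(path, newline="").
sample = io.StringIO("Weather,Temp,Play\nsunny,85,no\nrainy,70,yes\novercast,64,yes\n")
header, rows, labels = load_csv(sample)
summary = profile(header, rows, labels)
print(summary)
```

A column of integer-coded categories would also be reported as numeric by this check, so the result is a hint for manual inspection and not a definitive classification of the attributes.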

To address the issues detected in these datasets, we added the following features to our decision tree:

| Problem | Solution |
| --- | --- |
| Unbalanced classes | SMOTE |
| Continuous values | Binning |
| Multiple classes | One-vs-All |
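The three techniques named above can be sketched in a few lines each. This is a simplified illustration under stated assumptions, not the implementation from the notebook: equal-width binning is only one possible binning strategy, `smote_like` is a reduced version of SMOTE that works on numeric features only and needs at least two minority points, and all function names, parameters, and sample values are hypothetical.

```python
import random


def equal_width_bins(values, n_bins=3):
    """Map each number to a bin index in [0, n_bins - 1] using equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_vs_all(labels, positive):
    """Relabel a multiclass target as `positive` versus "rest"."""
    return [lab if lab == positive else "rest" for lab in labels]


def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment between base and other
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic


# Hypothetical values for illustration.
bins = equal_width_bins([64, 70, 85], n_bins=3)
binary = one_vs_all(["setosa", "virginica", "versicolor", "setosa"], "setosa")
points = [[1.0, 1.0], [2.0, 2.0], [3.0, 1.0]]
extra = smote_like(points, n_new=4)
print(bins, binary, len(extra))
```

With One-vs-All, one binary tree would be trained per class on the relabelled target, and a new example would be assigned to the class whose tree answers positively; how ties or multiple positive answers are resolved is a design choice this sketch leaves open.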

About the repository:

  • ds ➡️ Folder with the datasets used;
  • report_PL3_6.ipynb ➡️ Notebook with the detailed work and all of its code;
  • IA_2324_Trab2.pdf ➡️ Project statement.

Link to the course:

This course is part of the second semester of the second year of the Bachelor's Degree in Artificial Intelligence and Data Science at FCUP and FEUP in the academic year 2023/2024. You can find more information about this course at the following link:
