Decision Tree Classifier

This project was developed for the "Artificial Intelligence" course and aims to create a Decision Tree Classifier using ID3 (entropy-based). It was completed in the second semester of the second year of the Bachelor's Degree in Artificial Intelligence and Data Science.
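As a reminder of what "entropy-based" means for ID3: at each node, the algorithm splits on the attribute with the highest information gain, i.e. the largest reduction in the entropy of the class labels. The sketch below is only an illustration of that criterion, not the code from the notebook; the function names `entropy`, `information_gain` and `best_attribute` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attribute_index):
    """Entropy reduction obtained by splitting on one attribute column."""
    # Group the class labels by the value each row takes for the attribute.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute_index], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels):
    """Index of the attribute with the highest information gain."""
    return max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))
```

ID3 applies `best_attribute` recursively to each subset until a subset contains a single class or no attributes remain.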

Programming Language:

The Decision Tree

In order to create this decision tree, there were a few things we had to take into account:

The Decision Tree input needs to be in CSV (Comma-Separated Values) format, where each column is an attribute and the last column is the variable of interest for classification;
The program needs to be able to read any dataset and learn the appropriate decision tree;
The program must also accept a file with test examples as input (after generating the tree, it must be able to apply it to new examples and classify them appropriately);
We were not allowed to use scikit-learn or other libraries that automatically define and train decision trees;
The printed decision tree must be in the following format:

Attribute: the root of each subtree;
Value: one of the values of the attribute (one of the branches of the tree);
Class: the class value assigned to that branch in the tree (corresponds to a leaf);
Counter: the number of examples corresponding to that tree branch.
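The CSV input convention described above (every column is an attribute, except the last, which is the class) could be handled roughly as follows. This is a minimal sketch using only the standard library. It assumes the file starts with a header row, which the requirements do not state, and `load_dataset` is a hypothetical name, not a function from the notebook.

```python
import csv
import io


def load_dataset(csv_file):
    """Split CSV text into attribute names, attribute rows, and class labels."""
    reader = csv.reader(csv_file)
    header = next(reader)  # assumption: the first row holds the column names
    attributes = header[:-1]  # every column except the last
    rows, labels = [], []
    for record in reader:
        if not record:  # skip blank lines
            continue
        rows.append(record[:-1])
        labels.append(record[-1])  # the last column is the class
    return attributes, rows, labels


# Tiny made-up sample, used only for illustration.
sample = io.StringIO("Outlook,Windy,Play\nsunny,false,no\nrain,true,yes\n")
attributes, rows, labels = load_dataset(sample)
```

The same function works on a real file by passing `open(path, newline="")` instead of the in-memory sample.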

The Datasets and Their Problems:

In this task, we were asked to consider 4 different datasets to test the decision tree implementation, each of which has specific characteristics that we had to take into account to improve our decision tree:

restaurant: A simple dataset that contains information about customers and restaurants (type of food, waiting time, price, etc.). It has only two classes ("Yes", "No") and the data type is "Object" (string).
weather: This dataset contains information about weather conditions for playing tennis. It has two issues: it contains continuous values, and its classes are somewhat imbalanced.
iris: This dataset contains numerical information about plants from three classes (iris setosa, iris virginica and iris versicolor), so we are faced with a non-binary classification (more than two classes).
connect4: This dataset contains board configurations of the Connect Four game and classifies each entry as "win", "loss" or "draw". After analyzing the dataset, we realized that we wouldn't need to make any further changes to our decision tree.

To address the issues detected in each dataset, we developed the following features in our decision tree to deal with them:

Problem	Solution
Imbalanced Classes	SMOTE
Continuous Values	Binning
Multiple Classes	One-vs-All
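Two of the techniques in the table are simple enough to sketch in a few lines. The example below is an illustration under stated assumptions, not the notebook's code. The README does not say which binning strategy was used, so equal-width binning is an assumption here. For One-vs-All, only the relabeling step is shown (one binary problem per class), and the "rest" label and both function names are hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:  # all values identical: a single bin
        return [0 for _ in values]
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def one_vs_all_labels(labels, positive_class):
    """Relabel a multi-class target as positive_class versus 'rest'."""
    return [label if label == positive_class else "rest" for label in labels]
```

After binning, a continuous column can be treated like any other categorical attribute. With One-vs-All, one binary tree would be trained per class on the relabeled targets.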

About the repository:

ds ➡️ Folder with the datasets used;
report_PL3_6.ipynb ➡️ The work developed in detail, with all its code;
IA_2324_Trab2.pdf ➡️ Project statement.

Link to the course:

This course is part of the second semester of the second year of the Bachelor's Degree in Artificial Intelligence and Data Science at FCUP and FEUP in the academic year 2023/2024. You can find more information about this course at the following link:
