nobbas_assignment

Solution to the Machine Learning Assignment for Internship at Nobbas Technologies

My Approach

I used the Pandas module for data handling and cleaning, since it comes with a large number of built-in functions.
I loaded the provided datasets and inspected their features and values. The CSV 2 and CSV 3 files turned out to be unusable, since their features contained only 0 values.
After that, I removed the features that either contained many null values or were not very useful for making predictions.
Then I converted various categorical features from text to numerical form.
Then I trained various classification algorithms on the data and evaluated their performance.
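The cleaning and encoding steps above can be sketched roughly as follows. This is a minimal illustration, not the code from solution.ipynb: the toy DataFrame, the column names, the clean_frame helper, and the 0.5 null-fraction threshold are all assumptions made for the example.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def clean_frame(df, max_null_fraction=0.5):
    """Drop uninformative columns and encode text columns as integers.

    Hypothetical helper; the threshold is an assumed value.
    """
    # Drop columns whose share of missing values exceeds the threshold.
    null_fraction = df.isna().mean()
    df = df.loc[:, null_fraction <= max_null_fraction]

    # Drop columns that hold a single value (for example, all zeros).
    df = df.loc[:, df.nunique(dropna=False) > 1]

    # Replace each non-numeric column with integer category codes.
    df = df.copy()
    for col in df.columns:
        if not is_numeric_dtype(df[col]):
            df[col] = df[col].astype("category").cat.codes
    return df


# Made-up data standing in for the provided CSV files.
toy = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "city": ["Delhi", "Pune", "Delhi", "Mumbai"],
    "all_zero": [0, 0, 0, 0],
    "mostly_null": [None, None, None, 1.0],
})

cleaned = clean_frame(toy)
print(cleaned)
```

Category codes are one simple way to turn text into numbers. They impose an arbitrary ordering on the categories, so one-hot encoding (for example with pandas.get_dummies) may suit distance-based or linear models better.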

Observations

Simple classification models such as Logistic Regression and KNN fail to reach high accuracy.
Ensemble models such as Random Forest and the AdaBoost Classifier give good accuracy, but as the dataset grows, training will become slower and performance may also degrade.
Classification using neural networks would probably be the best approach, regardless of the type of data provided, because they can take advantage of all the different features and have many hyperparameters that can be fine-tuned to perform even better.
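A comparison like the one described above could be set up along these lines. This is a sketch under stated assumptions, not the notebook's actual code: it uses synthetic data from make_classification in place of the assignment dataset, and the hyperparameters and 5-fold cross-validation are illustrative choices. The scores it prints say nothing about the results reported here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned assignment data (an assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Scale-sensitive models are wrapped in a pipeline with a scaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Neural Network (MLP)": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
    ),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Cross-validation is used here instead of a single train/test split so that each model is scored on several held-out folds, which makes the comparison less dependent on one particular split.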

How to run

Python 3.5 or later is supported.
Install the required modules using pip3 install -r requirements.txt
Start Jupyter with the jupyter notebook command and open solution.ipynb

References

I referred to the official documentation of the following modules to understand how their functions are used:

Pandas : https://pandas.pydata.org/pandas-docs/stable

scikit-learn : https://scikit-learn.org/stable

Matplotlib : https://matplotlib.org/3.0.3/index.html

techytushar/nobbas_assignment
