This repository was created to deepen my understanding of machine learning methods by implementing them from scratch in Jupyter notebooks (Python) and comparing the results with the equivalent models in existing libraries.
The notebooks are organized to demonstrate the step-by-step process of building and evaluating algorithms. They rely on a set of widely used Python libraries:
- NumPy – for vectorized and matrix operations, linear algebra, and numerical routines.
- Pandas – for structured data manipulation, preprocessing, and tabular analysis.
- Matplotlib and Seaborn – for data visualization, including exploratory analysis and graphical representation of algorithm results.
- Scikit-learn – for access to datasets, utility functions, and baseline models for validation.

Each notebook typically includes:
- Math – key formulas and theoretical background for the algorithm or evaluation metric.
- Implementation – step-by-step Python code that reproduces the method without relying on high-level machine learning functions.
- Datasets – description and links to datasets used in the experiments.
- Visualization – plots illustrating the behavior of the algorithm, decision boundaries, performance metrics, or error analysis.
- Comparison – evaluation of the custom implementation against Scikit-learn (or other libraries) to verify correctness and performance.
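
As a rough illustration of the Implementation and Comparison steps above, here is a minimal sketch that fits a linear regression with plain NumPy and checks it against Scikit-learn. The function name `fit_linear_regression` and the synthetic data are assumptions made for this example; they are not taken from the notebooks.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (hypothetical helper)."""
    # Prepend a column of ones so the intercept is learned with the weights.
    X_design = np.column_stack([np.ones(len(X)), X])
    # Solve (X^T X) w = X^T y without forming an explicit matrix inverse.
    w = np.linalg.solve(X_design.T @ X_design, X_design.T @ y)
    return w[0], w[1:]


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

intercept, coef = fit_linear_regression(X, y)
baseline = LinearRegression().fit(X, y)

# The from-scratch fit should agree with the library baseline.
print("coefficients match:", np.allclose(coef, baseline.coef_))
print("intercepts match:  ", np.isclose(intercept, baseline.intercept_))
```

Agreement with the baseline up to floating-point tolerance is a reasonable correctness check; timing both fits on larger data would be one way to look at the performance side.
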

| Notebook | Description |
|---|---|
| EDA COVID-19 | A small exploratory data analysis (EDA) of COVID-19 datasets, focusing on general trends and basic insights from the data. |
| Linear Regression | Analysis and implementation of linear regression to identify factors that are linearly related to students’ learning outcomes. |
| K-Nearest Neighbors (KNN) | Implementation and evaluation of the KNN algorithm for both classification and regression tasks. |
| Principal Component Analysis (PCA) | Application of PCA for dimensionality reduction and visualization, highlighting how the leading principal components capture most of the variance in the dataset. |
| Clustering Algorithms | Exploration of clustering methods including K-Means, DBSCAN, and Agglomerative Clustering to identify hidden patterns and group structures in the data. |
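
To give a sense of what "from scratch" can look like for one of the entries above, here is a minimal sketch of a KNN classifier written with NumPy only. The function name `knn_predict`, the Euclidean metric, and the toy data are assumptions for illustration and may differ from what the notebook does.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Classify each query point by majority vote among its k nearest training points."""
    # Pairwise Euclidean distances, shape (n_query, n_train), via broadcasting.
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    # Indices of the k closest training points for every query point.
    nearest = np.argsort(dists, axis=1)[:, :k]
    # Majority vote over non-negative integer labels; ties go to the smaller label.
    return np.array([np.bincount(y_train[row]).argmax() for row in nearest])


# Toy data: two well-separated groups of points.
X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                    [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_query = np.array([[0.5, 0.5], [5.5, 5.5]])

predictions = knn_predict(X_train, y_train, X_query, k=3)
print(predictions)  # prints [0 1]
```

For the regression variant, the vote would be replaced by the mean of the neighbors' target values.
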

Created by Denys Bondarchuk. Feel free to reach out or contribute to the project!