thejvdev/ml-from-scratch

Machine Learning from Scratch

This repository was created to deepen my understanding of machine learning methods by implementing them from scratch in Python Jupyter Notebooks and comparing their results with the models provided by existing libraries.

Description

The notebooks are organized to demonstrate the step-by-step process of building and evaluating algorithms. They rely on a set of widely used Python libraries:

  • NumPy – for vectorized and matrix operations, linear algebra, and numerical routines.
  • Pandas – for structured data manipulation, preprocessing, and tabular analysis.
  • Matplotlib and Seaborn – for data visualization, including exploratory analysis and graphical representation of algorithm results.
  • Scikit-learn – for access to datasets, utility functions, and baseline models for validation.

Each notebook typically includes:

  1. Math – key formulas and theoretical background for the algorithm or evaluation metric.
  2. Implementation – step-by-step Python code that reproduces the method without relying on high-level machine learning functions.
  3. Datasets – description and links to datasets used in the experiments.
  4. Visualization – plots illustrating the behavior of the algorithm, decision boundaries, performance metrics, or error analysis.
  5. Comparison – evaluation of the custom implementation against Scikit-learn (or other libraries) to verify correctness and performance.
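The Implementation and Comparison steps above can be sketched as follows. This is a minimal illustration, not code taken from the notebooks: the helper `fit_linear_regression` and the synthetic data are assumptions made for the example, and the only library model used is Scikit-learn's `LinearRegression` as the baseline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fit_linear_regression(X, y):
    """Hypothetical helper: ordinary least squares with an intercept term.

    Returns (intercept, coefficients).
    """
    # Prepend a column of ones so the first fitted parameter is the intercept.
    X_design = np.column_stack([np.ones(X.shape[0]), X])
    # Solve the least-squares problem directly with NumPy.
    theta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return theta[0], theta[1:]


# Synthetic data (an assumption for this sketch, not a dataset from the repo).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# From-scratch fit and library baseline on the same data.
intercept, coef = fit_linear_regression(X, y)
reference = LinearRegression().fit(X, y)

# The two fits should agree up to floating-point error.
max_coef_gap = np.max(np.abs(coef - reference.coef_))
intercept_gap = abs(intercept - reference.intercept_)
print(f"max coefficient gap: {max_coef_gap:.2e}")
print(f"intercept gap:       {intercept_gap:.2e}")
```

Comparing fitted parameters, rather than only a score, makes a mismatch between the custom code and the baseline easier to locate.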

Contents

Notebook | Description
--- | ---
EDA COVID-19 | A small exploratory data analysis (EDA) of COVID-19 datasets, focusing on general trends and basic insights from the data.
Linear Regression | Analysis and implementation of linear regression to identify linear relationships that may influence students’ learning outcomes.
K-Nearest Neighbors (KNN) | Implementation and evaluation of the K-Nearest Neighbors (KNN) algorithm for both classification and regression tasks.
Principal Component Analysis (PCA) | Application of PCA for dimensionality reduction and visualization, highlighting how the leading principal components capture most of the variance in the dataset.
Clustering Algorithms | Exploration of clustering methods, including K-Means, DBSCAN, and Agglomerative Clustering, to identify hidden patterns and group structures in the data.
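
As an illustration of the kind of from-scratch code the PCA notebook describes, here is a minimal sketch of PCA using only NumPy. It is not taken from the notebook: the function name `pca`, the SVD-based approach, and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def pca(X, n_components):
    """Hypothetical helper: PCA via SVD of the mean-centered data.

    Returns (projected data, principal axes, explained-variance ratios).
    """
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    # Squared singular values are proportional to the variance along each axis.
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    return X_centered @ components.T, components, ratio[:n_components]


# Synthetic data (an assumption): three features driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
X = latent * np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.05, size=(300, 3))

projected, components, ratio = pca(X, n_components=2)
print("explained variance ratio:", np.round(ratio, 4))
```

Because the three features share a single latent factor, the first component should account for nearly all of the variance, which is the behavior the description above refers to.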

Author

Created by Denys Bondarchuk. Feel free to reach out or contribute to the project!