Learn how machine learning models work by building them from scratch!

Skratch

About

Machine Learning from Skratch

Machine learning combines statistics, linear algebra, information theory, and computer science, which makes it a complex field to learn. Today, many machine learning libraries, online services, and frameworks allow virtually anyone to train machine learning models, so no one strictly needs to know how to build models from scratch. However unattractive it may seem, understanding the inner workings of machine learning can make a huge difference: it can help you pick an appropriate model, process the data sensibly, or make custom adjustments.

Finding a balance

Many great learning resources already exist. However, they tend either to delve deeply into theory and complex mathematics, or never to go further than high-level intuitions. Skratch’s goal is to sit somewhere in the middle: this online course breaks machine learning concepts down to their core and shows how to build them back up.

Python code

Python was picked for its readability and its popularity in the machine learning community. However, the code was not written to be language-specific, so it can also be useful to people who use other languages. The code is documented and follows the scikit-learn interface and naming conventions as much as possible. To ensure that the implementations are correct, tests compare the models to their scikit-learn counterparts. The code can be found on GitHub and is fully open-source.
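As a rough illustration of what following the scikit-learn interface can look like, here is a minimal sketch of a from-scratch classifier that exposes `fit` and `predict`. The class name `KNNClassifier`, its `n_neighbors` parameter, and the toy data are assumptions made for this example; they are not taken from the Skratch codebase.

```python
import math
from collections import Counter


class KNNClassifier:
    """Hypothetical k-nearest-neighbours classifier with a scikit-learn-style API."""

    def __init__(self, n_neighbors=3):
        # Hyperparameters are set in the constructor, as scikit-learn does.
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # k-NN has no real training step: it only stores the data.
        # Learned attributes get a trailing underscore by convention.
        self.X_ = [list(row) for row in X]
        self.y_ = list(y)
        return self  # returning self allows chaining: Model().fit(X, y)

    def predict(self, X):
        return [self._predict_one(row) for row in X]

    def _predict_one(self, row):
        # Rank training points by Euclidean distance to the query point.
        order = sorted(range(len(self.X_)), key=lambda i: math.dist(row, self.X_[i]))
        nearest_labels = [self.y_[i] for i in order[: self.n_neighbors]]
        # Majority vote among the closest neighbours.
        return Counter(nearest_labels).most_common(1)[0][0]


# Toy data: two well-separated clusters labelled "a" and "b".
X_train = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
y_train = ["a", "a", "a", "b", "b", "b"]

model = KNNClassifier(n_neighbors=3).fit(X_train, y_train)
predictions = model.predict([[0.05, 0.05], [1.0, 0.95]])
print(predictions)
```

Keeping the constructor, `fit`, and `predict` split this way means anyone familiar with scikit-learn can read the from-scratch version without learning a new API.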

Visualizations

Understanding a topic typically starts with some intuition. This is why Skratch illustrates machine learning concepts with visualizations, whether images or animations. And to stay in the spirit of implementing things from Skratch, clicking on any visualization in a learning unit will take you to the piece of code that generated it. This way, you can play around with the code and create different visualizations. Matplotlib was chosen for its simplicity and popularity.

Blogs, videos, Jupyter notebooks

Everyone learns differently. Some prefer reading a textbook, others prefer attending a lecture. Because of this, we tried to provide material in as many formats as possible. First and foremost, the GitHub repository provides all of the source code and the tests. Each learning unit is also provided as a Jupyter Notebook, so the code involved in a unit can be run right in the browser. On top of the regular learning units, you’ll also find blog posts discussing machine learning topics. These range from intuitions about theoretical concepts to opinion pieces about the ethical implications of machine learning.

Useful Resources

Skratch is not the only learning resource when it comes to machine learning, nor is it trying to be. Everyone learns in different ways, and plenty of great resources for learning machine learning already exist. This is why each learning unit lists other useful resources on its topic. On a more general note, useful resources such as podcasts, academic papers, and online courses are also provided.

Open-source and for everyone

Skratch is and will remain a work in progress, and all remarks and suggestions that help it improve are welcome. The code is fully open-source, so anyone can reuse it or contribute to it. Regarding the website, feel free to use the contact page to get in touch with me. Skratch is for anyone who is interested in machine learning, especially those who like to build things themselves. It tries to fill the gap between “too technical” and “too high-level”. This means that it won’t delve deeply into statistics; where necessary, it will provide relevant links to other resources.

Who am I?

My name is Valentin and I originally come from Belgium. I lived in the United States for a bit and then went to Maastricht University to study data science. I now work as a data scientist, which means that I am able to fully embrace my passion for machine learning on a daily basis. I am passionate about teaching and really believe that you cannot master a topic until you are able to explain it to others. This is one of the many reasons I decided to start this website. On the one hand, I wanted to spread good resources about machine learning, and on the other, I wanted to further my own education on the topic. This website is and will remain a work in progress. I welcome any criticism, suggestions, and even contributions to the project with open arms.

FAQ

Can I reuse the code?

Yes, the project is 100% open-source. Feel free to use, modify, or even contribute to the codebase. Do remember though that the code was not written to be robust or fast.

How did you create the images/GIFs?

I used Matplotlib, a popular Python plotting library. If you click on an image or GIF, it will take you to the Python file used to generate the figure! The code can also be found directly on GitHub.

How do I get in touch with you if I have questions?

You can make use of the contact page, send an e-mail to skratch@valentincalomme.com, or even find me on LinkedIn!

How do I know the implementations are correct?

Skratch aims to be as transparent as possible. This is why we wrote tests for the various machine learning models, ensuring that they perform similarly to their scikit-learn counterparts.

I don't know Python, is it a problem?

No! Even though the code was written in Python, it is not language-specific. You might need a basic understanding of Python syntax, but that’s it!

How much statistics do I need to know?

Statistics is a vital part of machine learning, and I can’t recommend learning it enough. However, this course is not focused on statistics, so basic knowledge should be plenty!