Code for the Inquiryum Machine Learning Fundamentals Course

zacharski/ml-class

CAP4770 Introduction to Data Mining

Summer 2024

essentials

| # | Resource |
| --- | --- |
| 1 | The current version of this syllabus and GitHub repository |
| 2 | Welcome video |
| 3 | What should you do the first week of the course |
| 4 | Instructor: Ron Zacharski, ron.zacharski@gmail.com, 575.680.4041 |
| 5 | Experience Point Sheet |
| 6 | The FIU Deep Learning Slack workspace |
| 7 | The Inquiryum Machine Learning Course |
| 8 | The Lab Submission Form |

Important: How to pass the course

As you will read in the details below, this class is a programming-intensive course where you work at your own pace. Historically, about ⅓ of the students get an A, ⅓ an F, and ⅓ between an A and an F. What separates the 'A' students from the 'F' ones is that the 'A' students keep a regular schedule and consistently submit their work. If they have a question or need help debugging, they message me on Slack. They are not necessarily the most proficient programmers or the best at math. The attribute that best defines them is self-discipline.

Course Content

Course Catalog Description

Data mining applications, data preparation, data reduction and various data mining techniques such as association, clustering, classification, anomaly detection.

Course Description

This course provides an introduction to practical machine learning tools for data mining with an emphasis on XGBoost and Deep Learning.

Course Objectives

Students will gain hands-on experience with the following algorithms and libraries, learning when and how to apply them to problems in data mining:

  • NumPy, Pandas, scikit-learn (sklearn)

  • entropy and decision trees

  • bagging and pasting

  • random forest

  • XGBoost

  • deep learning basics

  • Convolutional Neural Networks (CNN)

  • Clustering

  • Working with text

Expected Outcomes

Basic Machine Learning (ML) Techniques

Students should be able to

  • architect a scalable ML pipeline
  • run ML jobs on a GPU using Jupyter Notebooks in Colab
  • evaluate different ML models
  • determine the best ML algorithm to use for an application
  • reduce the dimensionality of a dataset
  • develop different linear models to solve classification problems
  • communicate effectively about ML applications (terminology)

XGBoost

Students should be able to

  • apply decision tree algorithms to create a classifier
  • use random forest techniques
  • combine a number of weak classifiers into a strong one by using boosting
  • effectively use the XGBoost algorithm

Deep Learning

Students should be able to

  • build a simple deep learning system for image classification
  • build CNNs for computer vision
  • pre-process text datasets into a form usable for classification
  • build a CNN for text classification
  • adjust hyperparameters to improve performance

Prerequisites

Prerequisite Course: COP 3530 Data Structures

Corequisite Course: COP 4710 Database Management

Note: While very little material from either of these courses will be used in this course, these prerequisites give you a level of programming maturity that is required.

An asynchronous online class

This class is asynchronous, meaning there is no mandatory real-time interaction. You will be working through the Inquiryum Machine Learning Fundamentals Course. You can watch the videos anytime you want. You can play them at a faster speed, rewatch them, or pause them. You can work on the course material in 20-minute blocks throughout a day, or devote a large contiguous block of time once per week. When you need help, you can use the FIU Deep Learning Slack workspace to get assistance from me or your classmates.

The advantage of this approach is that it allows you great flexibility in when you want to work on the material and for how long. And, as described below under mastery learning, it allows you to work at your own pace.

Instructor availability

Slack Office Hours: Monday 1-2pm ET and Tuesday 4-5:30pm ET

I will be sitting at my laptop on the Slack channel on the days and times listed above. This means that if you message me on Slack, I will respond within 5 minutes unless I am helping another student. Do not hesitate to message me at other times during normal business hours and early evening. Most of the time I can answer you within 30 minutes. Feel free to message me outside of those times, but my response delay might be significant. Also, there may be other times when I don't have cell or wifi coverage. In those cases I will post a message on Slack beforehand.

If your questions require something that can be better addressed over a video call, we can arrange a meeting time through Slack. I also encourage those in class to help others (see my honor code policy below).

The above hours may be subject to change if other times benefit more students. These changes will be announced in the Slack channel.

Labs and Projects

The majority of effort in the course goes into labs and projects, which have different levels of expected knowledge and independence.

Labs

  • In the form of Jupyter Notebook tutorials which provide detailed explanations and sample executable code.
  • You are to:
    • write a small amount of code to complete the task
    • answer any non-coding questions the Notebook may ask.

Projects

  • Follows examples shown in the course videos and in the labs.
  • Builds off of concepts and skills you learned completing the labs.
  • Project definition provides
    • a dataset
    • a short problem description
  • You are to
    • design and create the machine learning algorithm used to solve the problem.
    • write the code in a Jupyter Notebook
    • test and evaluate your solution.
    • save your notebook to GitHub.

Mastery Learning

Traditional classes use time-based learning. You spend a specific amount of time on a topic and then you move on to the next topic. For example, in a traditional intro course on Python programming you might cover for loops in week 5, take a quiz on them, and then move on to Python dictionaries in week 6. Suppose you got a 75% on that quiz in week 5. That means that you did not learn 25% of the material. Then perhaps in week 10 you take a test on list comprehensions and get an 80% (you did not master 20% of the material). These gaps in your mastery start adding up, and eventually, either in some future class or on the job, you hit a wall because your current task requires skills in areas that you failed to master.

This class doesn't work like that.

In contrast to time-based learning, in mastery learning you stay on a topic until you master it, and you work at your own pace. This online class is based on this approach. As I mentioned, the lectures are a set of videos (mostly screencasts) that you can watch at any time. If the material is easy for you, you can speed up the videos and watch them at 1.5 speed. If you find the material challenging, you can rewatch the videos, google for more information, or interact with other learners on the Slack channel.

Obviously, the work-at-your-own-pace approach will collide with the end of the semester, and there will be some material that you will not cover. The course is designed so that the essential core information is presented first, to enable you to develop solid foundational skills with no gaps.

Mastery Learning Difficulties

This course is work at your own pace. Other courses you might be taking have fixed deadlines. So, for example, you might have a gnarly project for a programming class due this week and a big operating systems project due next week. It is likely that you will work on those projects, since they have immediate deadlines, and ignore this course. It is human nature. Just block out a regular time each week to work on the course and you will do fine.

Starting in week 8, there is a limit of 3 submissions per week.

The course material

| Order | Lesson |
| --- | --- |
| 1 | JumpStart |
| 2 | Labs |
| 3 | Projects |

Again, the class is work-at-your-own-pace, but I provide a suggested schedule below.

Suggested Week-by-Week Schedule

| Week | Date | Unit | Topics | Labs and projects |
| --- | --- | --- | --- | --- |
| 1 | 6 May | Intro | Intro to class & Quickstart to ML | Quickstart lab |
| 2 | 13 May | basics | Numpy, Pandas | Numpy & Pandas labs |
| 3 | 20 May | basics | kNN, sklearn | sklearn lab |
| 4 | 27 May | basics | entropy and decision trees | decision tree lab |
| 5 | 3 June | basics | one-hot encoding, cross-validation, hyperparameters | working with data lab |
| 6 | 10 June | basics | Regression & Clustering | regression and clustering labs |
| 7 | 17 June | XGBoost | Intro to boosting, bagging & pasting | bagging and pasting lab |
| 8 | 24 June | XGBoost | random forest, patches, xgboost | XGBoost lab; First Project |
| 9 | 1 July | DNN | Neural Network anatomy & classification; our first neural network (classifying images) | a first look at deep learning lab |
| 10 | 8 July | DNN | Introduction to Convolutional Neural Networks (CNN) | CNN lab |
| 11 | 15 July | DNN | CNNs and text classification | NLP & Embeddings lab; Amazon Reviews Project |
| 12 | 22 July | DNN | project work | Projects 2 & 3 |

Required materials

Google Colab Cloud Account

While the free Colab account is the minimum requirement, for the last 6 weeks of the class it may be beneficial to subscribe to Google Colab Pro for $9.99/mo.

Laptop

Inquiryum’s Machine Learning Fundamentals Course

No purchases of books or equipment are required.

Slack

Slack is a work chat application that many tech companies use. We are going to be using Slack in a number of ways. First, all my announcements for the class will be in Slack. Second, if you have a particular programming question, you can ask it in a general channel and hopefully you will get an answer or suggestion quickly from either me or fellow learners.

Slack check-in

Twice per week one of our Slackbots will ask you three questions:

  1. What have you accomplished since the last class?
  2. What are you working on now?
  3. What is holding you back?

Failure to do the Slack check-in will result in the following deduction of points:

| Number of missed check-ins | Points deducted |
| --- | --- |
| 2 | 0 |
| 3 | 10 |
| 4 | 25 |
| 5 | 100 |
| 6 | 250 |
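
As a rough illustration, the deduction table above can be read as a simple lookup. This is only a sketch: the function name `checkin_deduction` is hypothetical, treating 0 or 1 missed check-ins as no deduction is an assumption (the table starts at 2), and the table does not say what happens beyond 6 missed check-ins, so the sketch refuses to guess.

```python
# Deduction table copied from the syllabus: missed check-ins -> points deducted.
CHECKIN_DEDUCTIONS = {2: 0, 3: 10, 4: 25, 5: 100, 6: 250}


def checkin_deduction(missed: int) -> int:
    """Return the table entry for a given number of missed check-ins."""
    if missed < 0:
        raise ValueError("missed check-ins cannot be negative")
    if missed < 2:
        # Assumption: the table starts at 2 with a deduction of 0,
        # so 0 or 1 missed check-ins are treated as no deduction.
        return 0
    if missed not in CHECKIN_DEDUCTIONS:
        # The syllabus lists no value beyond 6, so do not guess one.
        raise ValueError(f"no deduction listed for {missed} missed check-ins")
    return CHECKIN_DEDUCTIONS[missed]


if __name__ == "__main__":
    for missed in range(0, 7):
        print(missed, checkin_deduction(missed))
```

Note how quickly the listed deduction grows: the fifth and sixth missed check-ins cost far more than the third and fourth.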

You will be responsible for logging into Slack on Tuesdays and Fridays to answer these questions. When you initially sign in to Slack make sure to join the scrum channel.

Sign up for Slack here.

Okay, but how do I pass?

Grading is based on a method developed by Professor Lee Sheldon at Indiana University. It is based on obtaining experience points (XP). The number of XP determines what level you are at. You start the class at Level Zero and with 0 XP. The level you obtain at the end of the semester determines your final grade. Here is the chart:

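| Level | XP | Grade |
| --- | --- | --- |
| Zero | 0 | F |
| One | 550 | D |
| Two | 740 | C |
| Three | 800 | C+ |
| Four | 840 | B- |
| Five | 871 | B |
| Six | 914 | B+ |
| Seven | 950 | A- |
| Eight | 990 | A |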
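
To make the chart concrete, here is a minimal sketch of how a final XP total could be mapped to a level and grade. It assumes the XP column is the minimum needed to reach each level; the function name `level_for_xp` is hypothetical and is not part of the course tooling.

```python
# (level, minimum XP, grade) rows copied from the chart above, lowest first.
LEVELS = [
    ("Zero", 0, "F"),
    ("One", 550, "D"),
    ("Two", 740, "C"),
    ("Three", 800, "C+"),
    ("Four", 840, "B-"),
    ("Five", 871, "B"),
    ("Six", 914, "B+"),
    ("Seven", 950, "A-"),
    ("Eight", 990, "A"),
]


def level_for_xp(xp: int) -> tuple[str, str]:
    """Return (level, grade) for an XP total.

    Assumption: each XP value in the chart is the minimum for that level.
    """
    if xp < 0:
        raise ValueError("XP cannot be negative")
    level, grade = LEVELS[0][0], LEVELS[0][2]
    for name, minimum, letter in LEVELS:
        if xp >= minimum:
            # Keep the highest level whose threshold has been reached.
            level, grade = name, letter
    return level, grade


if __name__ == "__main__":
    for xp in (0, 549, 550, 900, 990):
        print(xp, level_for_xp(xp))
```

Under that reading, 900 XP falls between the thresholds for Level Five (871) and Level Six (914), so it corresponds to Level Five and a B.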

Here are the ways of earning XP:

  • There will be around 15 labs. On average, each will be worth 30 XP.

  • There are 4-5 machine learning projects. On average, each is worth 150 XP.
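
For a sense of scale, the averages above give a rough total of available XP. The sketch below is an estimate only: it takes "around 15 labs" as exactly 15, uses the stated averages as if every lab and project were worth the same, and ignores any check-in deductions. The function name `estimated_total_xp` is hypothetical.

```python
def estimated_total_xp(labs: int = 15, projects: int = 4,
                       lab_xp: int = 30, project_xp: int = 150) -> int:
    """Estimate available XP from the average values stated in the syllabus."""
    return labs * lab_xp + projects * project_xp


if __name__ == "__main__":
    # The syllabus says 4-5 projects, so show both ends of that range.
    for projects in (4, 5):
        print(projects, "projects:", estimated_total_xp(projects=projects))
```

With these assumptions, labs contribute 15 × 30 = 450 XP and projects contribute 600 to 750 XP, for roughly 1050 to 1200 XP in total, compared with the 990 XP listed for Level Eight.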

Accessibility Statement

The Office of Disability Resources has been designated by the college as the primary office to guide, counsel, and assist students with disabilities. If you receive services through the Office of Disability Resources and require accommodations for this class, make an appointment with me as soon as possible to discuss your approved accommodation needs. Bring your accommodation letter, along with a copy of our class syllabus, to the appointment. I will hold any information you share with me in strictest confidence unless you give me permission to do otherwise.

If you have not made contact with the Office of Disability Resources and have reasonable accommodation needs (note-taking assistance, extended time for tests, etc.), I will be happy to refer you. The office will require appropriate documentation of disability.

Title IX Statement

Florida International University's faculty are committed to supporting students and upholding the University’s Policy on Sexual Harassment and Sexual Misconduct. Under Title IX and this Policy, discrimination based upon sex or gender is prohibited. If you experience an incident of sex- or gender-based discrimination, we encourage you to report it. While you may talk to me, understand that as a “Responsible Employee” of the University, I MUST report to FIU's Title IX Coordinator what you share. If you wish to speak to someone confidentially, please contact the confidential resources described on the FIU Title IX webpage (https://dei.fiu.edu/crca/title-ix). They can connect you with support services and help you explore your options. You may also seek assistance from FIU’s Title IX Coordinator.

Honor Code Policy

The general policy for any computer science class is

  1. You must write all programs yourself. 

  2. Do not share your code with other students.

  3. Do not post your code or class materials to any publicly-available website. 

  4. Explicitly cite any sources you use.

  5. Do not look at solutions from previous semesters. 

  6. Be prepared to explain every single line of code you submit, demonstrating your thought process behind it.

Common Sense

My rule of thumb is: what would a responsible adult do on the job? If you have a deadline on the job as a software developer and don't know how to do something, the responsible thing isn't to sit at your workstation getting more and more frustrated and depressed and missing the deadline. The responsible person would get whatever help was necessary to complete the task. On the other hand, a responsible person wouldn't let someone else do all the work and present it as their own. That would be a violation of this policy, as would having an AI bot write your code.

From the other perspective, that of helping someone in the situation mentioned above, sharing your complete assignment would violate this policy, but sharing a code snippet, or occasionally helping someone find a bug, is fine and encouraged.

In your submission, you should acknowledge in writing the people who helped you. For example, "Ann Mulkern helped me with the code to divide the dataset into training and testing sets."

Avatar names, pseudonyms, noms de plume

During the first week of class you will need to fill out the Avatar Form for your avatar name, pseudonym, whatever. This is the name that will appear on the Experience Point Google Spreadsheet that will be viewable by everyone in the class. If you wish to remain anonymous, don’t share your avatar name with anyone. To further protect the anonymity of those who wish to remain anonymous, the spreadsheet will also be populated by fictitious avatar names.
