# | Resource |
---|---|
1 | The current version of this syllabus and GitHub repository |
2 | Welcome video |
3 | What should you do the first week of the course |
4 | Instructor: Ron Zacharski, ron.zacharski@gmail.com, 575.680.4041 |
5 | Experience Point Sheet |
6 | the FIU Deep Learning Slack workspace |
7 | The Inquiryum Machine Learning Course |
8 | The Lab Submission Form |
As you will read in the details below, this class is a programming-intensive course where you work at your own pace. Historically, about ⅓ of the students get an A, ⅓ an F, and ⅓ between an A and an F. What separates the 'A' students from the 'F' ones is that the 'A' students keep a regular schedule and consistently submit their work. If they have a question or need help debugging, they message me on Slack. They are not necessarily the most proficient programmers, or the best at math. The attribute that best defines them is self-discipline.
Data mining applications, data preparation, data reduction and various data mining techniques such as association, clustering, classification, anomaly detection.
This course provides an introduction to practical machine learning tools for data mining with an emphasis on XGBoost and Deep Learning.
Students will gain hands-on experience with the following algorithms and libraries, learning when and how to apply them to problems in data mining:
- Numpy, Pandas, sklearn
- entropy and decision trees
- bagging and pasting
- random forest
- XGBoost
- deep learning basics
- Convolutional Neural Networks (CNN)
- Clustering
- Working with text
Students should be able to
- architect a scalable ML pipeline
- run ML jobs on a GPU using Jupyter Notebooks in Colab
- evaluate different ML models
- determine the best ML algorithm to use for an application
- reduce the dimensionality of a dataset
- develop different linear models to solve classification problems
- communicate effectively about ML applications (terminology)
Students should be able to
- apply decision tree algorithms to create a classifier
- use random forest techniques
- combine a number of weak classifiers into a strong one by using boosting
- effectively use the XGBoost algorithm
Students should be able to
- build a simple deep learning system for image classification
- build CNNs for computer vision
- pre-process text datasets into a form usable for classification
- build a CNN for text classification
- adjust hyperparameters to improve performance
Prerequisite Course: COP 3530 Data Structures
Corequisite Course: COP 4710 Database Management
Note: While very little material from either of these courses will be used in this course, these prerequisites give you a level of programming maturity that is required.
This class is asynchronous, meaning there is no mandatory real-time interaction. You will be working through the Inquiryum Machine Learning Fundamentals Course. You can watch the videos anytime you want. You can play them at a faster speed, and you can rewatch or pause them. You can work on the course material in 20-minute blocks throughout a day, or devote a large contiguous block of time once per week. When you need help, you can use the FIU Deep Learning Slack workspace to get assistance from me or your classmates.
The advantage of this approach is that it allows you great flexibility in when you want to work on the material and for how long. And, as described below under mastery learning, it allows you to work at your own pace.
Slack Office Hours: Monday 1-2pm ET and Tuesday 4-5:30pm ET
I will be sitting at my laptop on the Slack channel on the days and times listed above. This means that if you Slack message me, I will respond within 5 minutes unless I am helping another student. Do not hesitate to message me at other times during normal business hours and early evening. Most of the time I can answer you within 30 minutes. Feel free to message me outside of those times, but my response delay might be significant. There may also be other times when I don't have cell or wifi coverage. In those cases I will post a message on Slack beforehand.
If your questions require something that can be better addressed over a video call, we can arrange a meeting time through Slack. I also encourage those in class to help others (see my honor code policy below).
The above hours may be subject to change if other times benefit more students. These changes will be announced in the Slack channel.
The majority of effort in the course goes into working on labs and projects, which have different levels of expected knowledge and independence.
Labs

- Take the form of Jupyter Notebook tutorials that provide detailed explanations and sample executable code.
- You are to:
  - write a small amount of code to complete the task
  - answer any non-coding questions the Notebook may ask

Projects

- Follow examples shown in the course videos and in the labs.
- Build on concepts and skills you learned completing the labs.
- The project definition provides:
  - a dataset
  - a short problem description
- You are to:
  - design and create the machine learning algorithm used to solve the problem
  - write the code in a Jupyter Notebook
  - test and evaluate your solution
  - save your notebook to GitHub
Traditional classes are time-based learning. You spend a specific amount of time on a topic and then you move on to the next topic. For example, in a traditional intro course on Python programming you might cover for loops in week 5, take a quiz on them, and then move on to Python dictionaries in week 6. Suppose you got a 75% on that quiz in week 5. That means that you did not learn 25% of the material. Then perhaps in week 10 you take a test on list comprehensions and get an 80% (you did not master 20% of the material). These gaps in your mastery start adding up, and eventually, either in some future class or on the job, you hit a wall because your current task requires skill in areas that you failed to master.
This class doesn't work like that.
In contrast to time-based learning, in mastery learning you stay on a topic until you master it, working at your own pace. This online class is based on that approach. As I mentioned, the lectures are a set of videos (mostly screencasts) that you can watch at any time. If the material is easy for you, you can speed up the videos and watch them at 1.5x speed. If you find the material challenging, you can rewatch the videos, search for more information, or interact with other learners on the Slack channel.
Obviously, the work-at-your-own-pace approach will collide with the end of the semester and there will be some material that you will not cover. The course is designed so that the essential core information is presented first, to enable you to develop solid foundational skills with no gaps.
This course is work-at-your-own-pace. Other courses you might be taking have fixed deadlines. For example, you might have a gnarly project for a programming class due this week and a big operating systems project due next week. It is likely that you will work on those projects, since they have immediate deadlines, and put off working on this course. It is human nature. Block out a regular time each week to work on the course and you will do fine.
Order | Lesson |
---|---|
1 | JumpStart |
2 | Labs |
3 | Projects |
Again, the class is work-at-your-own pace, but I provide a suggested schedule below.
Week | Date | Unit | Topics | labs and projects |
---|---|---|---|---|
1 | 6 May | Intro | Intro to class & Quickstart to ML | Quickstart lab |
2 | 13 May | basics | Numpy, Pandas | Numpy & Pandas labs |
3 | 20 May | basics | kNN sklearn | sklearn lab |
4 | 27 May | basics | entropy and decision trees | decision tree lab |
5 | 3 June | basics | one-hot encoding, cross-validation, hyperparameters | working with data lab |
6 | 10 June | basics | Regression & Clustering | regression and clustering labs |
7 | 17 June | XGBoost | Intro to boosting, bagging & pasting | bagging and pasting lab |
8 | 24 June | XGBoost | random forest, patches, xgboost | XGBoost lab, First Project |
9 | 1 July | DNN | Neural Network anatomy & classification; our first neural network: classifying images | a first look at deep learning lab |
10 | 8 July | DNN | Introduction to Convolutional Neural Networks (CNN) | CNN lab |
11 | 15 July | DNN | CNNs and text classification | NLP & Embeddings lab, Amazon Reviews Project |
12 | 22 July | DNN | project work | Projects 2 & 3 |
While the free Colab account is the minimum requirement, for the last 6 weeks of the class it may be beneficial to subscribe to Google Colab Pro for $9.99/mo.
Laptop
Inquiryum’s Machine Learning Fundamentals Course
No purchases of books or equipment are required.
Slack is a work chat application that many tech companies use. We are going to be using Slack in a number of ways. First, all my announcements for the class will be in Slack. Second, if you have a particular programming question, you can ask it in a general channel and hopefully you will get an answer or suggestion quickly from either me or fellow learners.
Twice per week one of our Slackbots will ask you three questions:
- What have you accomplished since the last class?
- What are you working on now?
- What is holding you back?
Failure to do the Slack check-in will result in the following deduction of points:
number of missed check-ins | points deducted |
---|---|
2 | 0 |
3 | 10 |
4 | 25 |
5 | 100 |
6 | 250 |
You will be responsible for logging into Slack on Tuesdays and Fridays to answer these questions. When you initially sign in to Slack make sure to join the scrum channel.
Grading is based on a method developed by Professor Lee Sheldon at Indiana University. It is based on obtaining experience points (XP). The number of XP determines what level you are at. You start the class at Level Zero and with 0 XP. The level you obtain at the end of the semester determines your final grade. Here is the chart:
Level | XP | Grade |
---|---|---|
Zero | 0 | F |
One | 550 | D |
Two | 740 | C |
Three | 800 | C+ |
Four | 840 | B- |
Five | 871 | B |
Six | 914 | B+ |
Seven | 950 | A- |
Eight | 990 | A |
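
If you want to check where a given XP total lands, the chart above can be read as a simple lookup. The following is a minimal sketch, not the official grade calculation: it assumes each XP value in the chart is the minimum needed to reach that level, and that a total below 0 still counts as Level Zero. The name `level_for` is made up for this illustration.

```python
from bisect import bisect_right

# (minimum XP, level, grade) rows copied from the chart above.
LEVELS = [
    (0, "Zero", "F"),
    (550, "One", "D"),
    (740, "Two", "C"),
    (800, "Three", "C+"),
    (840, "Four", "B-"),
    (871, "Five", "B"),
    (914, "Six", "B+"),
    (950, "Seven", "A-"),
    (990, "Eight", "A"),
]
THRESHOLDS = [row[0] for row in LEVELS]


def level_for(xp):
    """Return (level, grade) for an XP total.

    Assumption: each threshold is the minimum XP for that level,
    and negative totals are treated as Level Zero.
    """
    # Index of the last threshold that is <= xp, never below the first row.
    index = max(bisect_right(THRESHOLDS, xp) - 1, 0)
    _, level, grade = LEVELS[index]
    return level, grade


print(level_for(870))  # ('Four', 'B-')
print(level_for(871))  # ('Five', 'B')
```

Under that reading, a single XP point can separate two grades, so it is worth checking the Experience Point Sheet as the semester ends.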
Here are the ways of earning XP:
- There will be around 15 labs. On average, each will be worth 30 XP.
- There are 4-5 machine learning projects. On average, they are each worth 150 XP.
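
To get a rough sense of how much XP is available, you can multiply out the figures above. This is only a back-of-the-envelope sketch: the counts and per-item values are the approximate averages stated in the list, so actual totals will differ, and the variable names are made up for this illustration.

```python
# Approximate figures from the list above (averages, not exact values).
LAB_COUNT = 15
XP_PER_LAB = 30
PROJECT_COUNT_RANGE = (4, 5)  # "4-5 machine learning projects"
XP_PER_PROJECT = 150

# XP from labs alone, then totals with the fewest and most projects.
lab_xp = LAB_COUNT * XP_PER_LAB
low_total = lab_xp + PROJECT_COUNT_RANGE[0] * XP_PER_PROJECT
high_total = lab_xp + PROJECT_COUNT_RANGE[1] * XP_PER_PROJECT

print(lab_xp, low_total, high_total)  # 450 1050 1200
```

With these approximate figures, labs alone come to about 450 XP, which is below the 550 XP needed for Level One, so the projects carry a large share of the grade.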
The Office of Disability Resources has been designated by the college as the primary office to guide, counsel, and assist students with disabilities. If you receive services through the Office of Disability Resources and require accommodations for this class, make an appointment with me as soon as possible to discuss your approved accommodation needs. Bring your accommodation letter, along with a copy of our class syllabus with you to the appointment. I will hold any information you share with me in strictest confidence unless you give me permission to do otherwise.
If you have not made contact with the Office of Disability Resources and have reasonable accommodation needs (note-taking assistance, extended time for tests, etc.), I will be happy to refer you. The office will require appropriate documentation of disability.
Florida International University's faculty are committed to supporting students and upholding the University’s Policy on Sexual Harassment and Sexual Misconduct. Under Title IX and this Policy, discrimination based upon sex or gender is prohibited. If you experience an incident of sex- or gender-based discrimination, we encourage you to report it. While you may talk to me, understand that as a “Responsible Employee” of the University, I MUST report to FIU's Title IX Coordinator what you share. If you wish to speak to someone confidentially, please contact the confidential resources described on the [FIU Title IX webpage](https://dei.fiu.edu/crca/title-ix). They can connect you with support services and help you explore your options. You may also seek assistance from FIU’s Title IX Coordinator.
The general policy for any computer science class is
- You must write all programs yourself.
- Do not share your code with other students.
- Do not post your code or class materials to any publicly-available website.
- Explicitly cite any sources you use.
- Do not look at solutions from previous semesters.
- Be prepared to explain every single line of code you submit, demonstrating your thought process behind it.
Common Sense
My rule of thumb is: what would a responsible adult do on the job? If you had a deadline on the job as a software developer and didn't know how to do something, the responsible thing wouldn't be to sit at your workstation just getting more and more frustrated and depressed and missing the deadline. The responsible person would get whatever help was necessary to complete the task. On the other hand, a responsible person wouldn't let someone else do all the work and present it as their own. That would be a violation of this policy, as would having an AI bot write your code.
From the other perspective, that of helping someone in the situation mentioned above, sharing your complete assignment would violate this policy, but sharing a code snippet, or occasionally helping someone find a bug, is fine and encouraged.
You should acknowledge, in writing in your submission, the people who helped you. For example, "Ann Mulkern helped me with the code to divide the dataset into training and testing sets."
During the first week of class you will need to fill out the Avatar Form for your avatar name, pseudonym, whatever. This is the name that will appear on the Experience Point Google Spreadsheet that will be viewable by everyone in the class. If you wish to remain anonymous, don’t share your avatar name with anyone. To further protect the anonymity of those who wish to remain anonymous, the spreadsheet will also be populated by fictitious avatar names.